Set Up Stirling PDF for Self-Hosted Document Processing

What Stirling PDF Actually Does (and Why It Matters)

Stirling PDF is an open-source, self-hosted web application that handles practically every PDF operation you would otherwise pay a SaaS subscription to perform. Split, merge, compress, rotate, watermark, convert, OCR, redact – it bundles all of that into a single browser-based interface running entirely on your own hardware. No file is ever sent to a third-party server, which makes it an obvious choice for anyone handling contracts, medical records, legal documents, or anything else that has no business sitting on a stranger’s cloud storage.

The project runs as a Docker container, which means setup is fast and the environment stays consistent whether you are running it on a home server, a VPS, or a Raspberry Pi 4 with RAM to spare. The interface is clean and requires no login by default, though authentication can be added for multi-user households or small teams. Recent builds support over 50 PDF operations, and the project ships updates often enough that its GitHub releases page is worth bookmarking.

This guide walks through a complete Docker Compose installation, covers the most useful configuration options, and explains how to connect it to a reverse proxy so you can reach it from a proper domain instead of a raw IP address.

Prerequisites and System Requirements

Before pulling any images, confirm your host machine is running Docker Engine 20.10 or later, along with Docker Compose V2. On Ubuntu or Debian, the quickest path is installing Docker via the official convenience script, then confirming the install with docker --version and docker compose version. Stirling PDF does not demand extraordinary resources: 1 GB of RAM is workable for light use, but 2 GB is more comfortable if you plan to run OCR jobs on multi-page scans. OCR processing is CPU-bound, so CPU usage can spike sharply while a conversion runs; do not be alarmed if you see that on a lower-powered machine.
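If you want to script that check, the minimal Python sketch below parses the output of the two commands and compares it with the 20.10 minimum. It assumes Python 3.9 or later, and the helper names are hypothetical. Note that docker --version reports the version of the Docker CLI; docker version shows the client and the daemon side by side.

```python
import re
import subprocess

# Minimum Docker Engine version named in this guide.
MIN_ENGINE = (20, 10)


def parse_version(text: str) -> tuple[int, int]:
    """Return the first MAJOR.MINOR pair in a version string as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def engine_ok(version_output: str, minimum: tuple[int, int] = MIN_ENGINE) -> bool:
    """True if `docker --version` output reports at least the minimum version."""
    return parse_version(version_output) >= minimum


def compose_v2_ok(compose_output: str) -> bool:
    """True if `docker compose version` reports major version 2 or newer."""
    return parse_version(compose_output)[0] >= 2


def check_host() -> dict:
    """Run both commands on this machine; needs Docker installed and on PATH."""
    def run(*cmd: str) -> str:
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

    return {
        "engine_ok": engine_ok(run("docker", "--version")),
        "compose_v2_ok": compose_v2_ok(run("docker", "compose", "version")),
    }
```

Calling check_host() on the Docker host returns two booleans. The legacy standalone docker-compose 1.x binary fails the second check.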

You will also want a dedicated directory on your host to store configuration files and any custom fonts or scripts you add later. A path like /opt/stirling-pdf works well. Create subdirectories inside it for configs, logs, customFiles, and trainingData before writing the Compose file. Docker would create missing bind-mount directories on its own, but they end up owned by root, and explicit host paths that you created yourself make backups and migrations far simpler. If you already run Paperless-ngx for searchable document archiving, Stirling PDF pairs naturally with it: use Stirling to preprocess and clean up raw scans before they land in Paperless for indexing.
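To script the layout, the Python sketch below creates the four subdirectories named above. It assumes the /opt/stirling-pdf base path from this guide and must run with permission to write there (for example via sudo). create_layout is a hypothetical helper name.

```python
from pathlib import Path

# Base path suggested in this guide; subdirectories match the Compose volumes.
BASE_DIR = Path("/opt/stirling-pdf")
SUBDIRS = ("configs", "logs", "customFiles", "trainingData")


def create_layout(base: Path = BASE_DIR) -> list[Path]:
    """Create base/<name> for every bind-mount directory and return the paths."""
    created = []
    for name in SUBDIRS:
        path = base / name
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created
```

Because mkdir is called with exist_ok=True, re-running the script on an existing installation is harmless.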

Network access is worth planning before you start. If this installation is for personal use on a local network only, you can expose the port directly and skip the reverse proxy section. For anything accessible from outside your home network, running it behind Nginx Proxy Manager or Caddy with a valid TLS certificate is strongly recommended. Leaving a document-processing tool exposed on a raw port without HTTPS is an unnecessary risk, even if the documents you process are not sensitive.

Installing with Docker Compose

Create your Compose file at /opt/stirling-pdf/docker-compose.yml and paste in the following configuration:

```yaml
services:
  stirling-pdf:
    image: stirlingtools/stirling-pdf:latest  # formerly published as frooodle/s-pdf
    container_name: stirling-pdf
    ports:
      - "8080:8080"  # host:container; change the left side if 8080 is taken
    volumes:
      - ./configs:/configs  # settings.yml is generated here
      - ./logs:/logs
      - ./customFiles:/customFiles
      - ./trainingData:/usr/share/tessdata  # Tesseract OCR language models
    environment:
      - DOCKER_ENABLE_SECURITY=false
      - INSTALL_BOOK_AND_ADVANCED_HTML_OPS=false
      - LANGS=en_GB
    restart: unless-stopped
```

Run docker compose up -d from that directory and give it a minute to pull the image and start. Once the container is running, open a browser and navigate to http://your-server-ip:8080. You should see the Stirling PDF dashboard with all available tools organized into categories: organize, convert, security, and more. If the page does not load, check docker logs stirling-pdf for error output – the most common early issue is a port conflict, which you resolve by changing the left side of the ports mapping to any unused port on your host.
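Instead of refreshing the browser while the image downloads, you can poll the address until the dashboard answers. The standard-library sketch below treats an HTTP 200 as "ready". The URL is a placeholder for your own server and port, and the timeout values are arbitrary. The opener, sleep and clock parameters exist only so the loop can be tested without a live server.

```python
import time
import urllib.request

# Placeholder: use your server's address and the host port from the ports mapping.
DEFAULT_URL = "http://your-server-ip:8080"


def wait_until_ready(url=DEFAULT_URL, timeout=120.0, interval=5.0,
                     opener=urllib.request.urlopen, sleep=time.sleep,
                     clock=time.monotonic):
    """Poll url until it answers HTTP 200; return False once timeout seconds pass."""
    deadline = clock() + timeout
    while True:
        try:
            with opener(url, timeout=10) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError and connection resets are all OSError subclasses:
            # the container is still starting or the port is not open yet.
            pass
        if clock() >= deadline:
            return False
        sleep(interval)
```

If wait_until_ready() returns False, docker logs stirling-pdf is the next place to look.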

The environment variables control meaningful behavior. Setting DOCKER_ENABLE_SECURITY=true pulls in the login and user-management features. Login itself is then switched on with SECURITY_ENABLELOGIN=true, or with security.enableLogin: true in the settings.yml file generated inside the configs volume. The initial admin credentials can be predefined with SECURITY_INITIALLOGIN_USERNAME and SECURITY_INITIALLOGIN_PASSWORD; otherwise, change the default password at first login. The LANGS variable installs extra font packages so that conversions render text in those languages correctly. Supply additional locale codes as a comma-separated list, such as en_GB,de_DE. It does not select OCR languages; those come from the Tesseract model files in the trainingData volume. The INSTALL_BOOK_AND_ADVANCED_HTML_OPS flag downloads Calibre and additional conversion dependencies when the container starts. Leave it false unless you specifically need ebook-to-PDF conversion, since it adds considerable disk usage and startup time.

Connecting a Reverse Proxy and Finishing Configuration

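If you are using Nginx Proxy Manager, add a new proxy host with stirling-pdf as the forward hostname and 8080 as the forward port. That container name only resolves when Nginx Proxy Manager shares a Docker network with Stirling PDF; otherwise, use the host's IP address and the published port. Request a certificate through the built-in Let’s Encrypt integration and enable Force SSL. That is the entire proxy setup for most home server configurations. Caddy users can add a short site block to their Caddyfile: the domain name followed by reverse_proxy localhost:8080. This works when Caddy runs directly on the host; a containerized Caddy on the same Docker network would use stirling-pdf:8080 instead. Caddy obtains and renews the certificate automatically with no additional configuration, provided the domain's DNS points at your server and ports 80 and 443 are reachable.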
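Once the proxy is running, one quick way to confirm that HTTPS is actually forced is to request the plain-HTTP address without following redirects and check where the reply points. This Python sketch accepts any 301, 302, 307 or 308 redirect to https:// on the same host name. The domain pdf.example.com in the usage note is a hypothetical placeholder.

```python
import http.client
from typing import Optional
from urllib.parse import urlsplit

REDIRECT_CODES = {301, 302, 307, 308}


def is_https_redirect(status: int, location: Optional[str], domain: str) -> bool:
    """True if a plain-HTTP reply redirects to https:// on the same host name."""
    if status not in REDIRECT_CODES or not location:
        return False
    target = urlsplit(location)
    return target.scheme == "https" and target.hostname == domain.lower()


def check_force_https(domain: str) -> bool:
    """Send GET http://<domain>/ without following redirects and inspect the reply."""
    conn = http.client.HTTPConnection(domain, 80, timeout=10)
    try:
        conn.request("GET", "/")
        response = conn.getresponse()
        return is_https_redirect(response.status, response.getheader("Location"), domain)
    finally:
        conn.close()
```

For example, check_force_https("pdf.example.com"), with your real domain substituted, should return True once Force SSL (or Caddy's automatic HTTPS) is active. http.client is used instead of urllib because it does not follow redirects on its own.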

With the proxy in place, turn to branding and language. The application name and the default interface language live in settings.yml, at /opt/stirling-pdf/configs/settings.yml on the host, under keys such as ui.appName and system.defaultLocale. Edit the file and restart the container to apply changes. Replacement images, such as a custom logo, go in the customFiles volume. If you have enabled security, change the application name shown on the login page right away: the default text tells anyone who stumbles across the URL exactly which software they are looking at, which is unnecessary information to expose.

OCR quality depends on the Tesseract language models in the trainingData volume. The default English model is adequate for clean printed text. For lower-quality scans, replacing eng.traineddata with the more accurate model from the tesseract-ocr/tessdata_best repository on GitHub can noticeably improve results, typically at some cost in speed. Tesseract is built for printed text, so do not expect usable results on handwritten notes, whichever model you use. To add another OCR language, drop its .traineddata file into the same directory. You should not need to rebuild or restart the container: Stirling PDF invokes Tesseract for each job, so a new or updated model file takes effect on the next OCR request.
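To see which OCR models are present, you can list the .traineddata files in the mounted host directory. This sketch assumes the trainingData path used in this guide. Each file's stem (eng, deu, and so on) is the Tesseract language code.

```python
from pathlib import Path

# Assumption: the host directory mounted as the container's tessdata path.
TESSDATA_DIR = Path("/opt/stirling-pdf/trainingData")


def installed_ocr_languages(tessdata: Path = TESSDATA_DIR) -> dict[str, int]:
    """Map each Tesseract language code (the file stem) to its model size in bytes."""
    if not tessdata.is_dir():
        return {}
    return {
        model.stem: model.stat().st_size
        for model in sorted(tessdata.glob("*.traineddata"))
    }
```

Comparing the reported size of eng before and after swapping in a tessdata_best model is a simple way to confirm that the new file is the one in place.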

Running Your First Jobs

The interface is self-explanatory for most operations, but a few tools deserve attention. The PDF/A conversion option under the Convert section is useful for long-term archiving – PDF/A is the ISO standard format designed for documents that need to remain readable without depending on external fonts or resources. The “Remove Blanks” tool under Organize is a quiet workhorse: it scans a document and strips out pages that fall below a configurable ink coverage threshold, which saves considerable cleanup time after scanning a physical document with occasional empty pages.
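A small Python sketch can illustrate the idea behind blank-page detection. Render a page to grayscale, count the pixels darker than a near-white cutoff, and call the page blank when that fraction stays below a limit. This is a conceptual illustration only. Stirling PDF's own implementation, parameter names and defaults may differ, and the cutoff values here are arbitrary.

```python
from typing import Sequence


def ink_fraction(pixels: Sequence[int], white_cutoff: int = 245) -> float:
    """Share of grayscale pixels (0 = black, 255 = white) darker than white_cutoff."""
    if not pixels:
        return 0.0
    dark = sum(1 for value in pixels if value < white_cutoff)
    return dark / len(pixels)


def is_blank_page(pixels: Sequence[int], white_cutoff: int = 245,
                  max_ink: float = 0.001) -> bool:
    """Call a page blank when at most max_ink of its pixels carry ink."""
    return ink_fraction(pixels, white_cutoff) <= max_ink
```

A slightly grey scanner background stays above the cutoff and counts as white, while a page with even a few lines of text quickly exceeds the limit. This is why a coverage threshold tolerates real-world scans better than a test for pure white.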

Batch processing works by uploading multiple files at once to most tools. Drag several PDFs onto the merge tool and reorder them by dragging thumbnails before confirming. The compress tool lets you pick an optimization level, where higher levels trade image quality for size, or enter a target output size instead. For redaction, the manual tool lets you draw boxes over regions on specific pages or across all pages at once, and the automatic tool redacts every match of a search term. To make sure the underlying text is actually removed rather than only covered, keep the option that converts the output to an image-based PDF enabled where it is offered. Then check the result by trying to select or search for a redacted phrase. This matters if the document will ever be opened in a tool that can reveal covered text.

Stirling PDF also exposes a full REST API, documented at /swagger-ui/index.html on your instance. Every operation available in the UI is accessible programmatically, which opens the door to scripted workflows – automatically compressing PDFs dropped into a watched folder, for instance, or building a simple integration that calls the merge endpoint from another application. Whether you use the web interface daily or wire it into an automation pipeline, the API documentation is detailed enough that you will not need to hunt for examples elsewhere.
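As an example of a scripted workflow, the Python sketch below merges local PDFs through the API using only the standard library. The endpoint path /api/v1/general/merge-pdfs and the multipart field name fileInput are assumptions based on recent versions, so confirm both in the Swagger UI of your own instance. The X-API-KEY header is also an assumption and only matters when login is enabled. BASE_URL is a placeholder.

```python
import urllib.request
import uuid
from pathlib import Path

# Placeholders/assumptions: check the endpoint and field name in your instance's
# Swagger UI (/swagger-ui/index.html) before relying on them.
BASE_URL = "http://your-server-ip:8080"
MERGE_ENDPOINT = "/api/v1/general/merge-pdfs"
FILE_FIELD = "fileInput"


def encode_multipart(files, field=FILE_FIELD, boundary=None):
    """Build a multipart/form-data body from (filename, bytes) pairs, in order."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for filename, data in files:
        safe_name = filename.replace('"', "")  # keep the header well-formed
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{safe_name}"\r\n'
            "Content-Type: application/pdf\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def merge_pdfs(paths, output, base_url=BASE_URL, api_key=None):
    """Upload PDFs to the merge endpoint in the given order and save the result."""
    body, content_type = encode_multipart(
        (Path(p).name, Path(p).read_bytes()) for p in paths
    )
    request = urllib.request.Request(
        base_url.rstrip("/") + MERGE_ENDPOINT, data=body, method="POST"
    )
    request.add_header("Content-Type", content_type)
    if api_key:
        # Assumption: with login enabled, the API accepts a key in X-API-KEY.
        request.add_header("X-API-KEY", api_key)
    with urllib.request.urlopen(request, timeout=300) as response:
        Path(output).write_bytes(response.read())
```

A call such as merge_pdfs(["scan-1.pdf", "scan-2.pdf"], "merged.pdf"), with hypothetical file names, sends the files in the order given. Building the multipart body by hand avoids third-party dependencies. If the requests library is available, its files= parameter does the same job more concisely.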