Set Up Paperless-NGX for Fully Searchable Document Archiving

Why Paperless-NGX Is Worth the Setup Time

Paper documents pile up fast – tax returns, insurance policies, medical records, receipts – and most people’s solution is a drawer that becomes a box that becomes a problem. Paperless-NGX is an open-source document management system that ingests, processes, and indexes every document you feed it, making everything full-text searchable in seconds. The scanning itself is done by your scanner; Paperless-NGX takes over once the file exists. It runs entirely on your own hardware, meaning no subscription fees, no third-party cloud storage, and no one else reading your files.

The appeal goes beyond just going green. Paperless-NGX uses OCR (optical character recognition) to extract text from PDFs and scanned images, then stores that text alongside the original file. Search for “dentist invoice 2023” and it surfaces the document immediately, regardless of what you named the file or which folder you put it in. For anyone managing a household, small business, or home office archive, that capability alone changes how you interact with your records.

A document scanner on a desk next to a stack of paper files — Photo by iMin Technology / Pexels

What You Need Before You Start

Paperless-NGX runs best via Docker, and the official setup assumes you have Docker and Docker Compose installed. A Raspberry Pi 4 with 4GB of RAM can handle light workloads, but a dedicated x86 machine or a NAS device with Docker support will perform noticeably better, especially if you plan to process large batches of scanned documents. You will also need a document scanner capable of outputting PDF or image files – most modern all-in-one printers qualify.

Before touching any configuration files, create a dedicated folder structure on your host machine. You will need directories for incoming documents (the “consume” folder), processed storage, and database files. These paths get mapped into the Docker container, so keeping them organized from the start prevents confusion when things inevitably need troubleshooting later.
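A minimal sketch of that folder layout follows. The base path and the folder names other than “consume” are assumptions chosen for illustration, and the export folder is an extra that becomes useful for backups.

```shell
#!/bin/sh
# Hypothetical base directory; pick any persistent location on the host.
PAPERLESS_BASE="${PAPERLESS_BASE:-./paperless}"

# consume: incoming documents    media: processed storage
# data: application data         pgdata: database files
# export: target folder for backups (an addition, not required)
for dir in consume media data pgdata export; do
    mkdir -p "$PAPERLESS_BASE/$dir"
done

# Show the result
ls -1 "$PAPERLESS_BASE"
```

Because mkdir -p is used, running the script a second time is harmless.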

A small home server or NAS device set up in a home office — Photo by panumas nikhomkhai / Pexels

Deploying the Stack with Docker Compose

The official Paperless-NGX repository includes Docker Compose templates for several databases. The PostgreSQL variant spins up three containers: the main application, a Redis broker for background tasks, and a PostgreSQL database. Download docker-compose.postgres.yml and docker-compose.env from the docker/compose directory of the GitHub repository, and save the compose file as docker-compose.yml. Then edit the environment file to set your time zone and secret key, and edit the volumes section of the compose file to use the host paths you mapped out earlier.
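As a rough sketch, the time zone and secret key could be appended to the environment file like this. The time zone value and the way the key is generated are assumptions for illustration; any long random string works as a secret key.

```shell
#!/bin/sh
# Path to the environment file; adjust if yours lives elsewhere.
ENV_FILE="${ENV_FILE:-./docker-compose.env}"
TIME_ZONE="Europe/Berlin"   # example value, use your own zone

# 48 random bytes become 64 base64 characters; tr keeps only
# letters and digits so the value needs no quoting.
SECRET_KEY="$(head -c 48 /dev/urandom | base64 | tr -dc 'A-Za-z0-9')"

# Append both settings to the environment file
cat >> "$ENV_FILE" <<EOF
PAPERLESS_TIME_ZONE=$TIME_ZONE
PAPERLESS_SECRET_KEY=$SECRET_KEY
EOF
```

The script appends rather than overwrites, so run it once; if a variable is set twice in the file, remove the older line by hand.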

The consume folder is where files get picked up automatically: any document dropped there is processed without manual intervention. In a Docker deployment, PAPERLESS_CONSUMPTION_DIR, PAPERLESS_DATA_DIR, and PAPERLESS_MEDIA_ROOT refer to paths inside the container, so they are normally left at their defaults. Instead, map your host folders onto those container paths in the volumes section of the compose file, and keep the data and media locations on persistent storage so your archive survives container rebuilds.
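One way to wire host folders into the container without editing the downloaded compose file is a docker-compose.override.yml, which docker compose merges automatically. The sketch below assumes the service is named webserver and that the container paths are /usr/src/paperless/consume, data, and media, as in the official templates; check both against your own compose file. The host base path is hypothetical.

```shell
#!/bin/sh
# Hypothetical host base directory; match it to your own folder layout.
PAPERLESS_BASE="${PAPERLESS_BASE:-/srv/paperless}"
OVERRIDE_FILE="${OVERRIDE_FILE:-./docker-compose.override.yml}"

# Map host folders (left of the colon) to container paths (right).
cat > "$OVERRIDE_FILE" <<EOF
services:
  webserver:
    volumes:
      - $PAPERLESS_BASE/consume:/usr/src/paperless/consume
      - $PAPERLESS_BASE/data:/usr/src/paperless/data
      - $PAPERLESS_BASE/media:/usr/src/paperless/media
EOF

# Print the generated file for review
cat "$OVERRIDE_FILE"
```

Running docker compose config afterwards prints the merged result, which is a quick way to confirm that the mappings ended up where you expected.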

Run docker compose up -d from the directory containing your compose file. On first launch, Paperless-NGX initializes the database and runs migrations, which takes a minute or two. After that, create your admin account with the createsuperuser command inside the running container: docker exec -it paperless_webserver_1 python manage.py createsuperuser. The container name depends on your project directory and Compose version (Compose v2 typically uses hyphens, as in paperless-webserver-1), so confirm it with docker ps, or address the service by name instead: docker compose exec webserver python manage.py createsuperuser.

Access the web interface at http://your-server-ip:8000. The default port is 8000, but you can remap it in the compose file if that conflicts with anything else running on your network. If you are already running self-hosted monitoring tools like Uptime Kuma, add Paperless-NGX as a monitored service so you get alerts if the container goes offline.

Configuring OCR and Auto-Tagging

OCR quality depends on the language settings in your environment file. Set PAPERLESS_OCR_LANGUAGE to the appropriate ISO 639-2 language code – eng for English, deu for German, and so on. Paperless-NGX uses Tesseract under the hood, and you can specify multiple languages separated by plus signs if your documents mix languages. For most home users, a single language setting is enough.
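For a mixed-language archive, the setting might be written as in the sketch below. The choice of German plus English and the file path are assumptions; the script drops any earlier uncommented PAPERLESS_OCR_LANGUAGE line so the variable is only set once.

```shell
#!/bin/sh
ENV_FILE="${ENV_FILE:-./docker-compose.env}"
OCR_LANG="deu+eng"   # example: German and English, joined by a plus sign

touch "$ENV_FILE"
# Copy every line except an existing setting, then add the new one.
# grep exits non-zero when it prints nothing, hence the "|| true".
grep -v '^PAPERLESS_OCR_LANGUAGE=' "$ENV_FILE" > "$ENV_FILE.tmp" || true
echo "PAPERLESS_OCR_LANGUAGE=$OCR_LANG" >> "$ENV_FILE.tmp"
mv "$ENV_FILE.tmp" "$ENV_FILE"
```

The container needs to be recreated (docker compose up -d) before a changed environment file takes effect.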

The auto-tagging and correspondent matching system is where Paperless-NGX earns its setup cost. Each tag, document type, and correspondent carries its own matching rule: when you create or edit one in the web interface, you choose a matching algorithm (any word, all words, exact phrase, regular expression, fuzzy, or automatic) and a pattern. A tag called “Billing” that matches the word “invoice” is applied to every newly processed document containing that word. You can also match sender names, account numbers, or any other recurring text. Over time, your archive largely organizes itself, with little manual filing.
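Before entering a regular expression as a matching pattern, it can help to try it against some sample text on the command line. The sketch below uses grep as a stand-in: Paperless-NGX evaluates patterns with Python's regex engine, so this is only an approximation, and the sample text and pattern are made up.

```shell
#!/bin/sh
# Made-up OCR text, standing in for a processed document
SAMPLE_TEXT="ACME Dental Clinic
Invoice No. 2023-0417
Amount due: 120.00 EUR"

# Made-up pattern: the words "invoice no." followed by a number
PATTERN='invoice no\. [0-9]{4}-[0-9]+'

# -E: extended regex, -i: ignore case, -q: only set the exit status
if printf '%s\n' "$SAMPLE_TEXT" | grep -Eiq "$PATTERN"; then
    echo "match"
else
    echo "no match"
fi
```

Python-only syntax such as \d or lookaheads will not work in grep -E, so stick to constructs both engines share when testing this way.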

Setting Up Automated Document Ingestion

The consume folder method works, but documents can also arrive by email, and many scanners can send their output to a mailbox or an SMB network share. Paperless-NGX supports polling an IMAP mailbox: add a mail account and at least one mail rule in the web interface, and it will pull attachments from a designated inbox on a set schedule. The PAPERLESS_EMAIL_HOST variable and its siblings configure outgoing SMTP mail, not this import, so they are not needed here. Mail import is particularly useful for email-delivered invoices, bank statements, and receipts that you would otherwise download manually.

For physical documents, the cleanest workflow is a scanner with a sheet feeder set to save directly to the consume folder over your local network. Many all-in-one printers support SMB or FTP output natively. Set the scanner to output multipage PDFs, keep DPI at 300 for clean OCR results without creating enormous files, and name the files with a timestamp so duplicates are easy to spot.
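If the scanner cannot add timestamps itself, a small script can do it while moving finished scans into the consume folder. This is a sketch under assumptions: the staging and consume paths are hypothetical, and the function name ingest_scans is invented for this example.

```shell
#!/bin/sh
# Hypothetical folders: the scanner writes to STAGING, Paperless-NGX
# watches CONSUME.
STAGING="${STAGING:-./scans-staging}"
CONSUME="${CONSUME:-./paperless/consume}"

ingest_scans() {
    mkdir -p "$STAGING" "$CONSUME"
    stamp="$(date +%Y%m%d-%H%M%S)"
    for f in "$STAGING"/*.pdf; do
        # With no matching files the glob stays literal; skip it.
        [ -e "$f" ] || continue
        # Prefix the original name with the timestamp and move the file.
        mv -- "$f" "$CONSUME/${stamp}_$(basename "$f")"
    done
}

ingest_scans
```

Keeping both folders on the same filesystem makes the move a rename, so the consumer never sees a half-copied file.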

Scheduled backups are non-negotiable for an archive you plan to rely on. Use docker exec to run document_exporter on a cron schedule, which exports all documents and metadata to a folder you can then sync to an external drive or offsite storage. The export includes the original files and a JSON manifest – enough to fully restore the archive to a new instance if your server fails. Storing your documents in one place is only as safe as your backup strategy, and a paperless archive with no backup is no more reliable than the drawer it replaced.
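The backup routine could be sketched as a generated script plus a crontab line. The paths and the 02:30 schedule are assumptions, as is the service name webserver; the ../export argument points at the container's export folder, which needs to be mapped to a host folder in the compose file for the files to land on the host.

```shell
#!/bin/sh
# Hypothetical locations: where the compose file lives, and where the
# backup script should be written (absolute, since cron needs that).
COMPOSE_DIR="${COMPOSE_DIR:-/srv/paperless}"
BACKUP_SCRIPT="${BACKUP_SCRIPT:-$PWD/paperless-backup.sh}"

cat > "$BACKUP_SCRIPT" <<EOF
#!/bin/sh
# Run the exporter inside the webserver service. -T disables the TTY,
# which cron does not provide.
cd "$COMPOSE_DIR" || exit 1
docker compose exec -T webserver document_exporter ../export
EOF
chmod +x "$BACKUP_SCRIPT"

# Crontab entry for a nightly run at 02:30; add it with "crontab -e".
echo "30 2 * * * $BACKUP_SCRIPT"
```

Syncing the host's export folder to an external drive or offsite storage is a separate step that can follow in the same cron job.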