
Paperless on k8s

Paperless is a document management system that turns scanned paper documents into an organized, searchable digital archive. Its coolest features are:

  • automatic tagging of documents based on their contents
  • OCR of scanned documents
  • full-text search across all documents

So let’s install it into my personal k8s cluster!

Configuring Redis

Paperless uses Redis as the message broker between its web/API process and its background workers, and Redis also holds the queue of documents yet to be processed. Let’s run a simple, non-persistent instance.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: paperless
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7  # pin a major version instead of "latest"
          ports:
            - containerPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: paperless
spec:
  selector:
    app: redis
  ports:
    - protocol: TCP
      port: 6379
      targetPort: 6379
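
Both manifests above (and everything that follows) live in the paperless namespace, which has to exist before kubectl apply will accept them. A minimal Namespace manifest, applied first, takes care of that:

```yaml
# Creates the namespace that all paperless resources in this post use.
apiVersion: v1
kind: Namespace
metadata:
  name: paperless
```

Running kubectl create namespace paperless once achieves the same thing if you prefer not to keep it as a file.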

Storage

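Persistent storage in k8s is managed through PersistentVolumes (PVs) and PersistentVolumeClaims (PVCs): a PVC requests storage, and the cluster binds it to a matching PV, provisioning one dynamically when the storage class supports it. See the official documentation: https://kubernetes.io/docs/concepts/storage/persistent-volumes/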

The storage classes available in your cluster depend on the k8s distribution or cloud provider you are using; in my case that is DigitalOcean. You can list them like so:

$ kubectl get storageclass
NAME                          PROVISIONER                 RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
do-block-storage (default)    dobs.csi.digitalocean.com   Delete          Immediate           true                   40h
do-block-storage-retain       dobs.csi.digitalocean.com   Retain          Immediate           true                   40h
do-block-storage-xfs          dobs.csi.digitalocean.com   Delete          Immediate           true                   40h
do-block-storage-xfs-retain   dobs.csi.digitalocean.com   Retain          Immediate           true                   40h

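It’s advisable to pick a network-attached rather than node-local storage class, so the data survives our paperless pod being rescheduled onto another node. Internally, do-block-storage volumes are handled by DigitalOcean’s CSI driver (dobs.csi.digitalocean.com), which attaches the block volume to the node our pod is running on and mounts it into the pod. The -retain variants keep the underlying volume when its PVC is deleted, which may be worth it for documents you can’t afford to lose. We are creating three separate PVCs for paperless: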

  • data (management information and database state)
  • media (the scanned documents)
  • consume (incoming files waiting to be consumed)
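
The three claims could look like the sketch below. The names match the claimName values used in the paperless Deployment; the sizes are my own guesses, so adjust them to your document volume. It assumes the default do-block-storage class (swap in do-block-storage-retain if you want the volumes to outlive the PVCs) and ReadWriteOnce, since a DigitalOcean block volume can only be attached to one node at a time.

```yaml
# PVC for paperless state (SQLite database, index, etc.); size is illustrative.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: paperless-data
  namespace: paperless
spec:
  accessModes:
    - ReadWriteOnce   # block storage attaches to a single node
  storageClassName: do-block-storage
  resources:
    requests:
      storage: 1Gi
---
# PVC for the archived documents; likely the one that grows the most.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: paperless-media
  namespace: paperless
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: do-block-storage
  resources:
    requests:
      storage: 10Gi
---
# PVC for incoming files that paperless has not consumed yet.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: paperless-consume
  namespace: paperless
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: do-block-storage
  resources:
    requests:
      storage: 1Gi
```

Because the class uses Immediate volume binding, the volumes are provisioned as soon as the claims are created, before any pod uses them.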

Paperless

Installing Paperless is mostly a matter of following the documentation and translating its docker-compose sample into k8s manifests.

Documentation: https://docs.paperless-ngx.com/

apiVersion: apps/v1
kind: Deployment
metadata:
  name: paperless
  namespace: paperless
  labels:
    app: paperless
spec:
  replicas: 1
  strategy:
    type: Recreate  # stop the old pod first so its ReadWriteOnce volumes can be reattached
  selector:
    matchLabels:
      app: paperless
  template:
    metadata:
      labels:
        app: paperless
    spec:
      volumes:
        - name: paperless-data
          persistentVolumeClaim:
            claimName: paperless-data
        - name: paperless-media
          persistentVolumeClaim:
            claimName: paperless-media
        - name: paperless-consume
          persistentVolumeClaim:
            claimName: paperless-consume
      containers:
        - name: paperless
          image: ghcr.io/paperless-ngx/paperless-ngx:1.17.2
          ports:
            - containerPort: 8000
          imagePullPolicy: Always
          volumeMounts:
            - mountPath: "/data/data"
              name: paperless-data
            - mountPath: "/data/media"
              name: paperless-media
            - mountPath: "/data/consume"
              name: paperless-consume
          env:
            - name: PAPERLESS_REDIS
              value: "redis://redis:6379"
            - name: PAPERLESS_DATA_DIR
              value: "/data/data"
            - name: PAPERLESS_MEDIA_ROOT
              value: "/data/media"
            - name: PAPERLESS_CONSUMPTION_DIR
              value: "/data/consume"
            - name: PAPERLESS_ADMIN_USER
              value: "root"
            - name: PAPERLESS_ADMIN_PASSWORD
              value: "***"
            - name: PAPERLESS_URL
              value: "https://paperless.yourdomain.com"
            - name: PAPERLESS_BIND_ADDR
              value: "0.0.0.0"
            - name: PAPERLESS_PORT
              value: "8000"
---
kind: Service
apiVersion: v1
metadata:
  name: paperless
  namespace: paperless
spec:
  selector:
    app: paperless
  ports:
    - port: 8000
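
This Service is only reachable inside the cluster, while PAPERLESS_URL points at a public HTTPS host. One way to expose it is an Ingress. The sketch below assumes an ingress-nginx controller with the class name nginx and a TLS certificate stored in a Secret named paperless-tls (for example issued by cert-manager); both names are assumptions, and the host must match PAPERLESS_URL.

```yaml
# Hypothetical Ingress routing paperless.yourdomain.com to the paperless Service.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: paperless
  namespace: paperless
  annotations:
    # ingress-nginx caps request bodies at 1m by default; raise it for document uploads
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - paperless.yourdomain.com
      secretName: paperless-tls   # assumed to exist or be created by cert-manager
  rules:
    - host: paperless.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: paperless
                port:
                  number: 8000
```

For a quick test without any ingress, kubectl port-forward -n paperless svc/paperless 8000:8000 makes the UI available on localhost:8000.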

Warning! Storing PAPERLESS_ADMIN_PASSWORD as plain text in the Deployment manifest is a bad idea. Load the value from a Secret instead.
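
A minimal sketch of that approach, assuming a hypothetical Secret named paperless-admin with a single key called password. The env entry in the Deployment then references the Secret through secretKeyRef instead of carrying the value itself:

```yaml
# Hypothetical Secret holding the paperless admin password.
apiVersion: v1
kind: Secret
metadata:
  name: paperless-admin
  namespace: paperless
type: Opaque
stringData:
  password: "change-me"   # placeholder; the API server stores it base64-encoded under data
# In the paperless Deployment, the env entry then becomes:
#   - name: PAPERLESS_ADMIN_PASSWORD
#     valueFrom:
#       secretKeyRef:
#         name: paperless-admin
#         key: password
```

To keep the password out of files entirely, create the Secret imperatively with kubectl create secret generic paperless-admin --namespace paperless --from-literal=password='<your password>'. Note that Secret values are only base64-encoded, not encrypted, unless encryption at rest is enabled in your cluster.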
Julius Hinze