Paperless on k8s
Paperless is a software package that scans and organizes documents digitally. The coolest features are:
- automatic tagging of documents based on their contents
- scanning and OCR of documents
- full-text search across all documents
So let’s install it into my personal k8s cluster!
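Every manifest in this post sets namespace: paperless, so that namespace has to exist before anything else is applied. A minimal sketch of the namespace object (only the name is taken from the manifests in this post, nothing else is configured):

```yaml
# Namespace that all other resources in this post are created in.
apiVersion: v1
kind: Namespace
metadata:
  name: paperless
```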
Configuring Redis
Paperless uses Redis to communicate between its API and worker processes. Redis also holds the queue of documents that are yet to be processed. Let’s run a simple single-replica instance.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: paperless
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:latest
          ports:
            - containerPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: paperless
spec:
  selector:
    app: redis
  ports:
    - protocol: TCP
      port: 6379
      targetPort: 6379
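The Deployment above pulls redis:latest, so the Redis version can change whenever the pod is recreated. A possible variation, sketched here under assumptions, pins the image tag and adds a readiness probe so the Service only routes traffic once the port accepts connections. The tag redis:7 and the probe timings are illustrative choices, not values from the original setup.

```yaml
# Variation of the Redis Deployment with a pinned image and a readiness probe.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: paperless
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7              # assumption: example of a pinned tag
          ports:
            - containerPort: 6379
          readinessProbe:
            tcpSocket:
              port: 6379              # ready once Redis accepts TCP connections
            initialDelaySeconds: 5    # assumption: illustrative timing
            periodSeconds: 10         # assumption: illustrative timing
```

The Service definition stays exactly as shown above, since the labels and the port are unchanged.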
Storage
Persistent storage in k8s is managed through PVs (persistent volumes) and PVCs (persistent volume claims). For details, please refer to the official documentation: https://kubernetes.io/docs/concepts/storage/persistent-volumes/ .
The storage classes available in your cluster depend on the k8s distribution you are using. In my case I chose DigitalOcean. You can list the storage classes like so:
$ kubectl get storageclass
NAME                          PROVISIONER                 RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
do-block-storage (default)    dobs.csi.digitalocean.com   Delete          Immediate           true                   40h
do-block-storage-retain       dobs.csi.digitalocean.com   Retain          Immediate           true                   40h
do-block-storage-xfs          dobs.csi.digitalocean.com   Delete          Immediate           true                   40h
do-block-storage-xfs-retain   dobs.csi.digitalocean.com   Retain          Immediate           true                   40h
It’s advisable to choose a storage class that is not node-local, so the data stays available when our paperless pod restarts, possibly on a different node.
Internally, a do-block-storage volume is handled by DigitalOcean’s CSI driver (dobs.csi.digitalocean.com in the listing above), which attaches the volume to the node our pod is running on and passes it through to the pod. We are creating three separate PVCs for paperless:
- data (management information and database state)
- media (the scanned documents)
- consume (files that are yet to be consumed)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: paperless-data
  namespace: paperless
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi
  storageClassName: do-block-storage
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: paperless-media
  namespace: paperless
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi
  storageClassName: do-block-storage
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: paperless-consume
  namespace: paperless
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: do-block-storage
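The storage class listing also shows variants with the Retain reclaim policy. With the Delete policy used above, removing a PVC also removes the underlying volume, which may not be what you want for scanned documents. As a sketch of the alternative, and assuming you prefer to keep the volume around after the claim is deleted, the media claim could use do-block-storage-retain instead:

```yaml
# Alternative media claim using the Retain reclaim policy from the listing above.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: paperless-media
  namespace: paperless
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi
  storageClassName: do-block-storage-retain   # volume is kept when the PVC is deleted
```

The storage class of an existing PVC cannot be changed afterwards, so this is a choice to make before the claim is created.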
Paperless
Installing paperless comes down to reading the documentation and translating the docker-compose sample into k8s manifests.
Homepage: https://docs.paperless-ngx.com/ .
apiVersion: apps/v1
kind: Deployment
metadata:
  name: paperless
  namespace: paperless
  labels:
    app: paperless
spec:
  replicas: 1
  selector:
    matchLabels:
      app: paperless
  template:
    metadata:
      labels:
        app: paperless
    spec:
      volumes:
        - name: paperless-data
          persistentVolumeClaim:
            claimName: paperless-data
        - name: paperless-media
          persistentVolumeClaim:
            claimName: paperless-media
        - name: paperless-consume
          persistentVolumeClaim:
            claimName: paperless-consume
      containers:
        - name: paperless
          image: ghcr.io/paperless-ngx/paperless-ngx:1.17.2
          ports:
            - containerPort: 8000
          imagePullPolicy: Always
          volumeMounts:
            - mountPath: "/data/data"
              name: paperless-data
            - mountPath: "/data/media"
              name: paperless-media
            - mountPath: "/data/consume"
              name: paperless-consume
          env:
            - name: PAPERLESS_REDIS
              value: "redis://redis:6379"
            - name: PAPERLESS_DATA_DIR
              value: "/data/data"
            - name: PAPERLESS_MEDIA_ROOT
              value: "/data/media"
            - name: PAPERLESS_CONSUMPTION_DIR
              value: "/data/consume"
            - name: PAPERLESS_ADMIN_USER
              value: "root"
            - name: PAPERLESS_ADMIN_PASSWORD
              value: "***"
            - name: PAPERLESS_URL
              value: "https://paperless.yourdomain.com"
            - name: PAPERLESS_BIND_ADDR
              value: "0.0.0.0"
            - name: PAPERLESS_PORT
              value: "8000"
---
kind: Service
apiVersion: v1
metadata:
  name: paperless
  namespace: paperless
spec:
  selector:
    app: paperless
  ports:
    - port: 8000
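One thing to watch with this Deployment: all three PVCs are ReadWriteOnce, and the default rolling update starts the new pod before the old one is stopped. If the new pod is scheduled onto a different node, the volumes may not be attachable until the old pod is gone, and the rollout can stall. A possible way around this, sketched as a fragment of the Deployment spec above, is the Recreate strategy, which stops the old pod first at the cost of a short downtime:

```yaml
# Fragment of the paperless Deployment; only the "strategy" field is new.
spec:
  replicas: 1
  strategy:
    type: Recreate   # stop the old pod before starting the new one
```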
Setting PAPERLESS_ADMIN_PASSWORD as a plain value in k8s is a bad idea. Load the value from a secret instead.
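A minimal sketch of what that could look like. The secret name paperless-admin and the key password are hypothetical names picked for this example, and the password value is a placeholder.

```yaml
# Secret holding the admin password (name and key are illustrative).
apiVersion: v1
kind: Secret
metadata:
  name: paperless-admin
  namespace: paperless
type: Opaque
stringData:
  password: "change-me"   # placeholder, replace with a real password
```

The env entry in the paperless Deployment then references the secret instead of carrying the value inline:

```yaml
# Fragment of the container's env list in the paperless Deployment.
env:
  - name: PAPERLESS_ADMIN_PASSWORD
    valueFrom:
      secretKeyRef:
        name: paperless-admin   # must match the Secret's metadata.name
        key: password           # must match the key under stringData
```

Keep in mind that a Secret manifest checked into a repository still exposes the password, so the Secret itself is best created outside of version control.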