Storage

How to customize storage and provide persistent storage for the scheduler and for pipeline jobs.

Scheduler

By default, a PVC is created that stores the SQLite database and the logs from the pipelines. It is mounted at /var/run/owl.
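
For reference, the claim is similar in shape to the following sketch (the name, size, and storage class here are illustrative, not the chart's actual defaults):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: owl-scheduler        # illustrative name
spec:
  accessModes:
    - ReadWriteOnce          # the scheduler runs as a single pod, so RWO suffices
  resources:
    requests:
      storage: 1Gi           # illustrative size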

Customization of the scheduler storage is TBD.

Pipeline jobs

Storage provided to the pipeline is mounted by the pipeline pod and by all Dask workers. Because these pods can be scheduled on different nodes and mount the volume simultaneously, the storage must support the ReadWriteMany access mode.

Option 1: hostPath

If an NFS server is available and all nodes mount the same shares at the same location, we can use hostPath to define user storage. In this case the Helm values.yaml would contain the following:

pipeline:
  extraVolumeMounts:
    - name: user-storage
      mountPath: /storage/${USER}
  extraVolumes:
    - name: user-storage
      hostPath:
        path: /storage/${USER}
        type: Directory

If user joe submits a pipeline, it is expected that all nodes have access to /storage/joe, mounted from the NFS server with write permissions for user id 1000 (the default id of the user running the pods). The pipeline and Dask pods then mount this directory inside each container at the same mount point, /storage/joe.
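
With ${USER} expanded for joe, each pipeline and Dask pod ends up with a volume definition equivalent to this sketch (fragments of the rendered pod spec):

volumeMounts:
  - name: user-storage
    mountPath: /storage/joe
volumes:
  - name: user-storage
    hostPath:
      path: /storage/joe
      type: Directory   # the pod fails to start if this directory is missing on the node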

Option 2: NFS

Another option also uses an external NFS server, but instead of hostPath the PV and PVC are created manually with the nfs volume type. In this case, again, it is necessary that the /storage/joe directory already exists in the NFS share with the right permissions.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-storage-joe
  labels:
    username: joe
spec:
  storageClassName: nfs-direct
  capacity:
    storage: 10Mi
  accessModes:
    - ReadWriteMany
  nfs:
    path: /storage/joe
    server: 192.168.100.10
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-storage-joe
spec:
  storageClassName: nfs-direct
  accessModes:
    - ReadWriteMany
  selector:
    matchLabels:
      username: joe
  resources:
    requests:
      storage: 10Mi
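
Note that the capacity here is largely nominal: Kubernetes requires the storage field but does not enforce a size limit on nfs volumes. The username label and the matching selector on the claim ensure that the claim binds to this specific PV rather than to any other nfs-direct volume.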

Then, in the Helm values.yaml, we set:

pipeline:
  extraVolumeMounts:
    - name: user-storage
      mountPath: /storage/${USER}
  extraVolumes:
    - name: user-storage
      persistentVolumeClaim:
        claimName: nfs-storage-${USER}
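
Since the claim name includes ${USER}, a PV/PVC pair like the one above must be created in advance for every user who submits pipelines.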

Option 3: Ceph

Using Ceph. TBD.