Shell
The Shell pipeline allows you to submit scripts that are run as shell commands.
Install
curl -O https://raw.githubusercontent.com/eddienko/owl-shell-pipeline/main/shell_pipeline/signature.yaml
owl admin pdef add signature.yaml
Pipeline definition file
The following example pipeline definition file executes a shell script on the cluster.
version: 1.2
name: shell
command: |
  #!/bin/bash
  echo "hello" > hello
  sleep 600
  cat hello
# optional
# use_dask: false
# output directory (optional)
#  - sets the directory where the script is run
#  - stores pipeline logs
# output: /storage/user/output
resources:
  workers: 1
  memory: 2
  cores: 2
The command argument is anything that can be executed in a shell script, or indeed a Python script if the shebang line #!/usr/bin/env python is used. Another example command:
command: |
  #!/usr/bin/env python
  import random
  print(random.randint(1, 10))
The use_dask argument needs a bit of explanation. In the default mode (false) the script runs on a single worker with access to as much memory and as many cores as requested. Internally the (Python) script can use Dask, multiprocessing, multithreading or any other mechanism, but the resources are fixed to that one worker.
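As an illustration of the default mode, here is a minimal sketch of a script that uses multithreading inside its single worker. The names N_THREADS, word_count and count_all are made up for this example, and the thread count of 2 is only an assumption chosen to match cores: 2 in the definition above. Threads suit I/O-bound work; for CPU-bound work a process pool would be the usual substitute.

```python
#!/usr/bin/env python
from concurrent.futures import ThreadPoolExecutor

# Assumption: chosen to match `cores: 2` in the resources section above.
N_THREADS = 2


def word_count(text):
    """Toy task standing in for real per-item work."""
    return len(text.split())


def count_all(texts, max_workers=N_THREADS):
    # executor.map returns results in the same order as the inputs
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(word_count, texts))


if __name__ == "__main__":
    print(count_all(["hello world", "one two three", ""]))
```

A script like this could be placed under command: in the pipeline definition file, with use_dask left at its default.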
If use_dask is true, it makes sense to request more than one worker. The command is then assumed to be a Python script that connects to the Dask scheduler as follows:
import os
from distributed import Client

# Address of the Dask scheduler, read from the environment
DASK_SCHEDULER = os.getenv("DASK_SCHEDULER_ADDRESS")
client = Client(DASK_SCHEDULER)
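Once connected, the client can be used to spread work over the requested workers. The following is a minimal sketch rather than a prescribed pattern: square and sum_of_squares are hypothetical helpers, and the script only connects when DASK_SCHEDULER_ADDRESS is actually set.

```python
#!/usr/bin/env python
import os


def square(x):
    return x * x


def sum_of_squares(client, n):
    # client.map schedules square(i) for each i and returns a list of futures
    futures = client.map(square, range(n))
    # client.gather waits for the futures and returns their results as a list
    return sum(client.gather(futures))


if __name__ == "__main__":
    address = os.getenv("DASK_SCHEDULER_ADDRESS")
    if address:
        # Imported here so the helpers above can be used without distributed
        from distributed import Client

        client = Client(address)
        print(sum_of_squares(client, 10))
    else:
        print("DASK_SCHEDULER_ADDRESS is not set; nothing to connect to")
```

Keeping the computation in a function that takes the client as an argument makes it easy to try the same code against a local Client() before submitting it as a pipeline.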