Owl Job Scheduler

Owl is a job scheduler that allows users to submit jobs to a Kubernetes cluster from anywhere.


Open source and MIT licensed. Latest release on GitHub: v0.8.3.


Owl Server

Allow external users to run parameterized jobs in a Kubernetes cluster.


Owl Client

Submit parameterized pipelines and custom jobs to the cluster from anywhere.


Owl Develop

Develop reusable pipelines that can be run at scale with custom parameters.

Submit pipelines from anywhere

Authenticate to the Owl Server and submit jobs that run in a remote Kubernetes cluster.

# Authenticate
owl auth login

# Submit pipeline
owl job submit pipeline.yml

Run parameterized pipelines

Execute pipelines at scale with user-supplied parameters, allocating custom server resources. Parallel and distributed computation is powered by Dask.

The basic idea is that a user can run pipelines (or data analysis recipes) with different data or different parameters without writing any code. Out-of-the-box pipelines include an example pipeline for demonstration purposes, a shell pipeline that executes a command or script, and a papermill pipeline that runs a parameterized Jupyter notebook.

More complex pipelines for image analysis and data processing will be available in the showcase section.

version: 1

# Name of the pipeline
name: example

# Pipeline arguments
datalen: 100

# Resources requested
resources:
  threads: 10
  workers: 2
  memory: 10
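
For comparison, a shell pipeline definition might look like the sketch below. This is only an illustration: the command key and its value are assumptions made for this example, not documented options of the shell pipeline.

version: 1

# Use the built-in shell pipeline
name: shell

# Command or script to execute (hypothetical key)
command: "bash process.sh"

# Resources requested
resources:
  threads: 4
  workers: 1
  memory: 4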
 

Build reusable pipelines

Develop reusable pipelines that can be run by all users with custom parameters.

Owl pipelines are Python packages that can be installed using pip.

from pathlib import Path

from dask import delayed
from owl_dev import pipeline
from owl_dev.logging import logger


def inc(x):
    return x + 1


def double(x):
    return 2 * x


def add(a, b):
    return a + b


@pipeline
def main(*, datalen: int, output: Path = None):
    # "output" is an optional path for pipeline products (unused in this minimal example)
    logger.info("Computing...")

    # Build a graph of delayed tasks; nothing is computed yet
    results = []
    for x in range(datalen):
        a = delayed(inc)(x)
        b = delayed(double)(x)
        c = delayed(add)(a, b)
        results.append(c)

    # Sum the partial results and execute the graph
    total = delayed(sum)(results)
    return total.compute()
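
Once a pipeline package is published, installing and running it could look like the following sketch. The package name owl-pipeline-example is hypothetical; the owl commands are the ones shown above.

# Install the pipeline package (hypothetical name)
pip install owl-pipeline-example

# Authenticate and submit a job that uses it
owl auth login
owl job submit pipeline.yml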