@livingdata/pipex
v0.0.9
Execution engine for containerized pipeline steps
Pipex
Execution engine for containerized steps via Docker CLI.
Runs containers with explicit volume mounts and manages artifacts through a staging/commit lifecycle. Designed to be driven by different orchestrators (CLI included, AI agent planned).
Prerequisites
- Node.js 24+
- Docker CLI installed and accessible
Quick Start
Run directly without installing:
npx @livingdata/pipex run pipeline.yaml
Or install globally:
npm install -g @livingdata/pipex
pipex run pipeline.yaml
Usage
# Interactive mode (default)
pipex run pipeline.yaml
# JSON mode (for CI/CD)
pipex run pipeline.yaml --json
# Custom workdir
pipex run pipeline.yaml --workdir /tmp/builds
Inspecting runs
Each step execution produces a run containing artifacts, logs (stdout/stderr), and metadata:
# Show all steps and their last run (status, duration, size, date)
pipex show my-pipeline
# Show logs from the last run of a step
pipex logs my-pipeline download
pipex logs my-pipeline download --stream stderr
# Show execution metadata (image, cmd, duration, exit code, fingerprint…)
pipex inspect my-pipeline download
pipex inspect my-pipeline download --json
# Export artifacts from a step to the host filesystem
pipex export my-pipeline download ./output-dir
Managing workspaces
# List workspaces (with run/cache counts and disk size)
pipex list
pipex ls --json
# Remove old runs (keeps only current ones)
pipex prune my-pipeline
# Remove specific workspaces
pipex rm my-build other-build
# Remove all workspaces
pipex clean
Commands
| Command | Description |
|---------|-------------|
| run <pipeline> | Execute a pipeline |
| show <workspace> | Show steps and runs in a workspace (with artifact sizes) |
| logs <workspace> <step> | Show stdout/stderr from last run |
| inspect <workspace> <step> | Show run metadata (meta.json) |
| export <workspace> <step> <dest> | Extract artifacts from a step run to the host filesystem |
| prune <workspace> | Remove old runs not referenced by current state |
| list (alias ls) | List workspaces (with disk sizes) |
| rm <workspace...> | Remove one or more workspaces |
| clean | Remove all workspaces |
Global Options
| Option | Description |
|--------|-------------|
| --workdir <path> | Workspaces root directory (default: ./workdir) |
| --json | Structured JSON logs instead of interactive UI |
Run Options
| Option | Alias | Description |
|--------|-------|-------------|
| --workspace <name> | -w | Workspace name for caching |
| --force [steps] | -f | Skip cache for all steps, or a comma-separated list |
| --dry-run | | Validate pipeline, compute fingerprints, show what would run without executing |
| --verbose | | Stream container logs in real-time (interactive mode) |
Pipeline Format
Pipeline files can be written in YAML (.yaml / .yml) or JSON (.json). YAML is recommended for readability; JSON is still fully supported.
Steps can be defined in two ways: raw steps with explicit image/cmd, or kit steps using uses for common patterns. Both can coexist in the same pipeline.
Pipeline and Step Identity
Both pipelines and steps support an id/name duality:
- id -- Machine identifier (alphanum, dash, underscore). Used for caching, state, artifacts.
- name -- Human-readable label (free-form text). Used for display.
- At least one must be defined. If id is missing, it is derived from name via slugification (e.g. "Données préparées" → donnees-preparees). If name is missing, id is used for display.
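The id derivation described above can be sketched as follows. This is a minimal illustration, not Pipex's actual code: the function names `slugify` and `resolveIdentity` are hypothetical, and the exact slugification rules (for example how underscores or repeated separators are treated) are assumptions.

```typescript
// Hypothetical helper: derive a machine id from a human-readable name.
// The precise rules Pipex applies are an assumption here.
function slugify(label: string): string {
  return label
    .normalize("NFD") // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9_-]+/g, "-") // replace anything outside alphanum/dash/underscore
    .replace(/^-+|-+$/g, ""); // trim leading and trailing dashes
}

// Resolve the id/name pair: at least one must be set, each falls back to the other.
function resolveIdentity(spec: { id?: string; name?: string }): { id: string; label: string } {
  if (!spec.id && !spec.name) {
    throw new Error("At least one of id or name must be defined");
  }
  const id = spec.id || slugify(spec.name ?? "");
  const label = spec.name || id;
  return { id, label };
}
```

With these assumptions, a step declared only as `name: Build Assets` would resolve to the id `build-assets`, which is the value used for caching and artifact paths.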
# Pipeline with both id and name
id: data-pipeline
name: Data Processing Pipeline
steps:
  # Step with only id (current style, still works)
  - id: download
    image: alpine:3.19
    cmd: [sh, -c, "echo hello > /output/hello.txt"]
  # Step with only name (id auto-derived to "build-assets")
  - name: Build Assets
    image: node:22-alpine
    cmd: [sh, -c, "echo done > /output/result.txt"]
  # Step with both
  - id: deploy
    name: Deploy to Staging
    image: alpine:3.19
    cmd: [echo, deployed]
Kit Steps
Kits are reusable templates that generate the image, command, caches, and mounts for common runtimes. Use uses to select a kit and with to pass parameters:
name: my-pipeline
steps:
  - id: build
    uses: node
    with: { script: build.js, src: src/app }
  - id: analyze
    uses: python
    with: { script: analyze.py, src: scripts }
  - id: extract
    uses: shell
    with: { packages: [unzip], run: "unzip /input/build/archive.zip -d /output/" }
    inputs: [{ step: build }]
uses and image/cmd are mutually exclusive. All other step fields (env, inputs, mounts, sources, caches, timeoutSec, allowFailure, allowNetwork) remain available and merge with kit defaults (user values take priority). The src parameter in with copies the host directory into /app in the container's writable layer (see Sources).
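The merge rule can be illustrated with a small sketch. It covers only scalar fields and env, because the README does not say how list fields such as mounts or caches are combined; the `StepFields` type and both function names are hypothetical.

```typescript
// Hypothetical subset of step fields, for illustration only.
interface StepFields {
  image?: string;
  cmd?: string[];
  uses?: string;
  env?: Record<string, string>;
  timeoutSec?: number;
  allowNetwork?: boolean;
}

// Reject steps that mix the kit form (uses) with the raw form (image/cmd).
function assertExclusive(step: StepFields): void {
  const isRaw = step.image !== undefined || step.cmd !== undefined;
  if (step.uses !== undefined && isRaw) {
    throw new Error("uses and image/cmd are mutually exclusive");
  }
}

// Kit defaults first, user fields second, so user values take priority.
// env is merged key by key instead of being replaced wholesale (an assumption).
function mergeWithKitDefaults(kitDefaults: StepFields, user: StepFields): StepFields {
  assertExclusive(user);
  return {
    ...kitDefaults,
    ...user,
    env: { ...kitDefaults.env, ...user.env },
  };
}
```

Under these assumptions, a user-supplied env variable overrides a kit default of the same name while the kit's other variables are preserved.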
Available Kits
node -- Run a Node.js script with automatic dependency installation.
| Parameter | Default | Description |
|-----------|---------|-------------|
| script | (required) | Script to run (relative to /app) |
| src | -- | Host directory to copy into /app |
| version | "24" | Node.js version |
| packageManager | "npm" | "npm", "pnpm", or "yarn" |
| install | true | Run package install before script |
| variant | "alpine" | Image variant |
python -- Run a Python script with automatic dependency installation from requirements.txt.
| Parameter | Default | Description |
|-----------|---------|-------------|
| script | (required) | Script to run (relative to /app) |
| src | -- | Host directory to copy into /app |
| version | "3.12" | Python version |
| packageManager | "pip" | "pip" or "uv" |
| install | true | Run dependency install before script |
| variant | "slim" | Image variant |
shell -- Run a shell command in a container, with optional apt package installation.
| Parameter | Default | Description |
|-----------|---------|-------------|
| run | (required) | Shell command to execute |
| packages | -- | Apt packages to install before running |
| src | -- | Host directory to mount read-only at /app |
| image | "alpine:3.20" | Docker image (defaults to "debian:bookworm-slim" when packages is set) |
When packages is provided, the kit automatically switches to a Debian image, enables network access, and provides an apt-cache cache. Without packages, it runs on a minimal Alpine image with no network.
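The selection rules above could be sketched roughly as follows. The `ResolvedShellStep` shape and the function name are hypothetical, and the cache is represented by name only because the README does not state where it is mounted.

```typescript
// Parameters accepted by the shell kit, as listed in the table above.
interface ShellKitParams {
  run: string;
  packages?: string[];
  image?: string;
}

// Hypothetical result shape; the real kit also generates the command.
interface ResolvedShellStep {
  image: string;
  allowNetwork: boolean;
  cacheNames: string[];
}

function resolveShellKit(params: ShellKitParams): ResolvedShellStep {
  const needsPackages = (params.packages ?? []).length > 0;
  return {
    // An explicit image wins; otherwise the default depends on packages.
    image: params.image ?? (needsPackages ? "debian:bookworm-slim" : "alpine:3.20"),
    // Network is only needed to download apt packages.
    allowNetwork: needsPackages,
    cacheNames: needsPackages ? ["apt-cache"] : [],
  };
}
```

Treating an empty packages list the same as an absent one is a choice made for this sketch, not documented behavior.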
# Simple command (alpine, no network)
- id: list-files
  uses: shell
  with:
    run: ls -lhR /input/data/
# With system packages (debian, network + apt cache)
- id: extract
  uses: shell
  with:
    packages: [unzip, jq]
    run: unzip /input/download/data.zip -d /output/
  inputs: [{ step: download }]
Raw Steps
For full control, define image and cmd directly:
name: my-pipeline
steps:
  - id: download
    image: alpine:3.19
    cmd: [sh, -c, "echo hello > /output/hello.txt"]
  - id: process
    image: alpine:3.19
    cmd: [cat, /input/download/hello.txt]
    inputs: [{ step: download }]
Step Options
| Field | Type | Description |
|-------|------|-------------|
| id | string | Step identifier (at least one of id/name required) |
| name | string | Human-readable display name |
| image | string | Docker image (required for raw steps) |
| cmd | string[] | Command to execute (required for raw steps) |
| uses | string | Kit name (required for kit steps) |
| with | object | Kit parameters |
| inputs | InputSpec[] | Previous steps to mount as read-only |
| env | Record<string, string> | Environment variables |
| outputPath | string | Output mount point (default: /output) |
| mounts | MountSpec[] | Host directories to bind mount (read-only) |
| sources | MountSpec[] | Host directories copied into the container's writable layer |
| caches | CacheSpec[] | Persistent caches to mount |
| timeoutSec | number | Execution timeout |
| allowFailure | boolean | Continue pipeline if step fails |
| allowNetwork | boolean | Enable network access |
Inputs
Mount previous steps as read-only:
inputs:
  - step: step1
  - step: step2
    copyToOutput: true
- Mounted under /input/{stepName}/
- copyToOutput: true copies content to output before execution
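Since Pipex drives the Docker CLI, each input presumably ends up as a read-only volume flag on the container. The sketch below shows one way to build those flags using Docker's `-v host:container:ro` syntax. The `artifactDir` callback, which maps a step to the host directory holding its artifacts, is hypothetical, and copyToOutput handling is left out.

```typescript
interface InputSpec {
  step: string;
  copyToOutput?: boolean;
}

// Build `docker run` volume arguments that expose each input step read-only
// under /input/<step>/. artifactDir is a hypothetical lookup supplied by the caller.
function inputVolumeArgs(
  inputs: InputSpec[],
  artifactDir: (step: string) => string
): string[] {
  const args: string[] = [];
  for (const input of inputs) {
    // The ":ro" suffix makes the bind mount read-only inside the container.
    args.push("-v", `${artifactDir(input.step)}:/input/${input.step}:ro`);
  }
  return args;
}
```

For example, with a lookup returning `/work/runs/<step>`, the input `download` would yield the pair `-v /work/runs/download:/input/download:ro`.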
Host Mounts
Mount host directories into containers as read-only:
mounts:
  - host: src/app
    container: /app
  - host: config
    container: /config
- host must be a relative path (resolved from the pipeline file's directory)
- container must be an absolute path
- Neither path can contain ..
- Always mounted read-only -- containers cannot modify host files
This means a pipeline at /project/ci/pipeline.yaml can only mount subdirectories of /project/ci/. Use /tmp or /output inside the container for writes.
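The path rules above amount to a small validation routine. The following is a sketch under stated assumptions: POSIX-style paths, `..` rejected only when it appears as a whole path segment, and a hypothetical function name `validateMount`.

```typescript
interface MountSpec {
  host: string;
  container: string;
}

// Returns a list of rule violations; an empty list means the spec is acceptable.
function validateMount(spec: MountSpec): string[] {
  const errors: string[] = [];
  if (spec.host.startsWith("/")) {
    errors.push("host must be a relative path");
  }
  if (!spec.container.startsWith("/")) {
    errors.push("container must be an absolute path");
  }
  for (const candidate of [spec.host, spec.container]) {
    // Reject parent-directory segments so a mount cannot escape the pipeline directory.
    if (candidate.split("/").includes("..")) {
      errors.push(`path must not contain "..": ${candidate}`);
    }
  }
  return errors;
}
```

Collecting every violation instead of throwing on the first one lets a loader report all problems in a pipeline file at once.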
Sources
Copy host directories into the container's writable layer. Unlike bind mounts, copied files live inside the container so the step can create new files and subdirectories alongside them (e.g. node_modules after npm install).
sources:
  - host: src/app
    container: /app
- Same path rules as mounts (host relative, container absolute, no ..)
- Files are snapshotted at step start -- changes on the host during execution are not reflected
- The container can write next to source files without affecting the host
When to use sources vs mounts:
- Use sources when the step needs to write alongside the source files (install dependencies, generate build artifacts next to sources)
- Use mounts when read-only access is sufficient (config files, static data)
Kits use sources internally: the node kit's src parameter copies into /app so that npm install can create node_modules.
Caches
Persistent read-write directories shared across steps and executions:
caches:
  - name: pnpm-store
    path: /root/.local/share/pnpm/store
  - name: build-cache
    path: /tmp/cache
- Persistent: Caches survive across pipeline executions
- Shared: Multiple steps can use the same cache
- Mutable: Steps can read and write to caches
Common use cases:
- Package manager caches (pnpm, npm, cargo, maven)
- Build caches (gradle, ccache)
- Downloaded assets
Note: Caches are workspace-scoped (not global). Different workspaces have isolated caches.
Examples
Geodata Processing
The examples/geodata/ pipeline downloads a shapefile archive, extracts it, and produces a CSV inventory, using the debian and bash kits:
examples/geodata/
└── pipeline.yaml
Steps: download → extract → list-files / build-csv
pipex run examples/geodata/pipeline.yaml
Multi-Language
The examples/multi-language/ pipeline chains Node.js and Python steps using kits:
examples/multi-language/
├── pipeline.yaml
└── scripts/
    ├── nodejs/          # lodash-based data analysis
    │   ├── package.json
    │   ├── analyze.js
    │   └── transform.js
    └── python/          # pyyaml-based enrichment
        ├── pyproject.toml
        ├── requirements.txt
        ├── analyze.py
        └── transform.py
Steps: node-analyze → node-transform → python-analyze → python-transform
pipex run examples/multi-language/pipeline.yaml
Caching & Workspaces
Workspaces enable caching across runs. The workspace ID is determined by:
- CLI flag --workspace (highest priority)
- Pipeline id (explicit or derived from name)
Cache behavior: Steps are skipped if image, cmd, env, inputs, and mounts haven't changed. See code documentation for details.
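The skip decision can be pictured as comparing a fingerprint of the listed fields against the one recorded for the previous run. The sketch below builds a canonical string from those fields. It is an illustration only: the field shapes, the function names, and the use of a plain string instead of a hash (a real implementation would likely hash it, for example with SHA-256) are all assumptions.

```typescript
// Hypothetical view of the fields that take part in cache invalidation.
interface CacheableStep {
  image: string;
  cmd: string[];
  env?: Record<string, string>;
  inputs?: string[]; // assumed: identifiers of the runs consumed as inputs
  mounts?: { host: string; container: string }[];
}

// Serialize with sorted object keys so logically equal values give equal strings.
function canonical(value: unknown): string {
  if (value === undefined) return "null";
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonical(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).filter((k) => obj[k] !== undefined).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

function fingerprintInput(step: CacheableStep): string {
  return canonical({
    image: step.image,
    cmd: step.cmd,
    env: step.env ?? {},
    inputs: step.inputs ?? [],
    mounts: step.mounts ?? [],
  });
}

// A step is skipped only when a previous fingerprint exists and still matches.
function shouldSkip(step: CacheableStep, previous: string | undefined): boolean {
  return previous !== undefined && previous === fingerprintInput(step);
}
```

Sorting keys before serializing means that reordering env entries in the pipeline file does not invalidate the cache in this sketch, while any change to image or cmd does.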
Troubleshooting
Docker not found
# Verify Docker is accessible
docker --version
docker ps
Permission denied (Linux)
sudo usermod -aG docker $USER
newgrp docker
Workspace disk full
Clean old workspaces:
pipex list
pipex rm old-workspace-id
# Or remove all at once
pipex clean
Cached step with missing run
Force re-execution:
rm $PIPEX_WORKDIR/{workspace-id}/state.json
Development
git clone https://github.com/livingdata-co/pipex.git
cd pipex
npm install
cp .env.example .env
Run the CLI without building (via tsx):
npm run cli -- run pipeline.yaml
npm run cli -- list
Other commands:
npm run build # Compile TypeScript (tsc → dist/)
npm run lint # Lint with XO
npm run lint:fix # Auto-fix lint issues
Architecture
For implementation details, see code documentation in:
- src/engine/ - Low-level container execution (workspace, executor)
- src/cli/ - Pipeline orchestration (runner, loader, state)
- src/kits/ - Kit system (registry, built-in kit implementations)
