workflowskill v0.9.0 — Declarative YAML workflow runtime: authoring skill and schema
WorkflowSkill
Agents improvise. Workflows deliver.
An open standard for turning agent skills into durable, deterministic workflows that run on any platform.
```text
$ claude
> /workflow-author Write me a workflow that fetches my last 10 Gmail messages,
  summarizes them, and posts the summary to #daily-digest in Slack.

$ workflowskill run workflows/gmail-to-slack.md
Running gmail-to-slack
  toolkit: weldable (mock mode)
⟳ gmail.search_messages
✓ gmail.search_messages (12ms)
⟳ anthropic.llm
✓ anthropic.llm (9ms)
⟳ slack.post_message
✓ slack.post_message (4ms)
╭──────── gmail-to-slack ─────────╮
│ { "message_ts": "172..." }      │
╰─────────────────────────────────╯
```

Why WorkflowSkill?
Agents are great at reasoning, but not every task needs reasoning. When an agent fetches your emails, summarizes them, and posts to Slack — that's a predictable sequence of actions. Running an agent through it every time means paying for inference, waiting on model calls, and hoping it doesn't hallucinate a step. The tasks where this hurts most are:
- Structured — the work is predictable and can be defined ahead of time
- Multi-step — useful automation chains together multiple actions
- Repetitive — they run on a schedule or in response to a trigger, not just once
- Action-oriented — the value comes from doing something (fetching a page, comparing prices, sending an email), not from open-ended reasoning
WorkflowSkill lets agents delegate these tasks to a runtime instead of improvising them. A workflow is authored once, then runs as deterministic code — no inference on every execution, no token burn, no drift. LLM calls only happen where you actually need intelligence.
Because the logic is code — not a prompt being re-interpreted — the runtime can offer capabilities that agents can't: durable execution that survives failures, automatic retries, timeouts, pausing and resuming, deterministic outcomes, and scheduling on a timer or triggering from external events.
Workflow Skills are portable across any platform that implements the WorkflowSkill standard, and they're built on the open Agent Skills spec. A workflow authored for one platform runs on any other that supports the same actions — no rewriting, no lock-in. The goal is to do for durable execution what Agent Skills did for skills — an open ecosystem of workflows where the whole community moves forward together.
To support WorkflowSkill, a platform implements two things: a toolkit (which handles action execution — routing execute_activity() calls to the platform's integrations) and a runtime (which handles orchestration — durability, checkpointing, retries, and pause/resume). These are independent extension points: any toolkit works with any runtime.
Quickstart
Prerequisites: Node.js 20+ · pnpm · Claude Code
1. Install
```shell
git clone https://github.com/matthew-h-cromer/workflowskill.git
cd workflowskill
pnpm install
pnpm build
```

The CLI is not yet published to npm. Invoke it via `node dist/cli/index.js` (shown below) or add it to your shell with `npm link` to get a global `workflowskill` binary.
2. Run the hello-world example
```shell
node dist/cli/index.js run examples/hello-world.md
```

```text
hello-world
───────────
✓ Workflow complete
Output:
{
  "greeting": "Hello, World!"
}
```

Override the default input:

```shell
node dist/cli/index.js run examples/hello-world.md -i name=Linus
```

3. Author your own
Open Claude Code in this directory and use the /workflow-author skill:
```text
> /workflow-author Write me a workflow that takes a name as input and returns a greeting.
```

Claude generates a .md file with YAML frontmatter and saves it to workflows/. Run it:

```shell
node dist/cli/index.js run workflows/greeting.md -i name=Linus
```

4. Call real services (mock mode)
The CLI ships with a mock Weldable toolkit that simulates action calls deterministically — no API keys, no network, no authentication. It's designed for authoring iteration: an agent can draft a workflow that uses Slack, Gmail, GitHub, Anthropic, or any of the other built-in integrations, and you can run it locally to verify structure and data flow before deploying to a hosted runtime.
```text
$ claude
> /workflow-author Write me a workflow that fetches my last 10 Gmail messages,
  summarizes them, and posts the summary to #daily-digest in Slack.
```

```shell
node dist/cli/index.js run workflows/gmail-to-slack.md
```

Running this same workflow against a live toolkit (with real credentials and durable execution) is the job of a hosted runtime like Weldable — the workflow YAML is unchanged; only the execution environment differs.
Toolkits
A toolkit handles action execution — it routes each action step to the right API, SDK, or service. Toolkit authors implement a single method: `execute(action, args, idempotencyKey) -> unknown`. The workflow YAML is unchanged regardless of which toolkit runs it.
| Toolkit | Platform | Actions |
|---------|----------|---------|
| weldable (built-in, mock-only in this CLI) | Weldable | 11 integrations shipped locally: Anthropic, Discord, GitHub, Gmail, Google Calendar/Docs/Drive/Sheets/Tasks, Slack, Web. Hosted Weldable exposes 264+. |
The CLI's run command uses the mock Weldable toolkit. Real action execution happens on hosted runtimes that implement the toolkit protocol against live credentials.
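To make the toolkit contract concrete, here is a minimal sketch of what an implementation might look like, assuming the `execute` signature above. The `Toolkit` type name and the canned mock responses are illustrative only — they are not the repo's actual types or data.

```typescript
// Illustrative sketch only — the type name and mock data are hypothetical,
// not the repo's actual toolkit implementation.
type Toolkit = {
  execute(action: string, args: unknown, idempotencyKey: string): Promise<unknown>;
};

// A deterministic mock toolkit: the same action always yields the same
// canned response, which is what makes local authoring runs reproducible.
const mockToolkit: Toolkit = {
  async execute(action, args, idempotencyKey) {
    switch (action) {
      case "gmail.search_messages":
        return { messages: [{ id: "m1", subject: "Hello" }] };
      case "slack.post_message":
        return { message_ts: "1720000000.000100" };
      default:
        throw new Error(`Unknown action: ${action}`);
    }
  },
};
```

Because the toolkit is the only seam between the interpreter and the outside world, swapping this mock for a live implementation changes nothing in the workflow YAML.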
Runtimes
A runtime handles workflow orchestration — durability, checkpointing, retries, pause/resume, and signals. The same workflow code runs on any runtime; only the execution guarantees differ.
| Runtime | Description |
|---------|-------------|
| in-memory (built-in) | Single-process, non-durable. Signals via EventEmitter. Used by the CLI for local authoring and by the conformance suite. |
| dbos (planned) | Durable execution via DBOS. Each step checkpointed; crash recovery resumes from the last completed step. The Runtime protocol in src/runtime/protocol.ts maps directly to DBOS primitives. |
The CLI run command always uses the in-memory runtime. Hosted platforms supply their own durable runtime against the same protocol.
Specification
Workflows are declarative YAML documents. The interpreter traverses them deterministically against any conforming runtime and toolkit. This section is the authoritative format reference.
Document structure
Workflows live in .workflow.md files. The YAML workflow definition goes in the frontmatter (between --- delimiters); the markdown body below is a plain-language description of the workflow.
```yaml
---
version: 1
name: <slug>
description: <string>
inputs:
  <name>:
    type: <schema>
    default: <value>        # optional
    description: <string>   # optional
outputs:
  <name>: "{{ <expression> }}"
steps:
  - <step>
  - <step>
---
```

Relationship to Agent Skills
WorkflowSkill extends the open Agent Skills standard. The name and description fields inherit the standard's constraints unchanged, so any conforming Agent Skills validator accepts a WorkflowSkill frontmatter block on those fields. The remaining fields (version, inputs, outputs, steps) are WorkflowSkill extensions.
| Field | Source | Rule |
|---|---|---|
| name | Agent Skills | Lowercase letters, numbers, and hyphens only. Max 64 chars. Cannot contain the reserved words anthropic or claude. |
| description | Agent Skills | Non-empty. Max 1024 chars. Should describe both what the workflow does and when to run it. |
| version | WorkflowSkill | Integer-major schema version. Always 1 in v1. |
| inputs | WorkflowSkill | Declared input parameters. |
| outputs | WorkflowSkill | Top-level output expressions (alternative to a return step). |
| steps | WorkflowSkill | Sequential list of step primitives — the workflow body. |
Agent Skills optional frontmatter fields that Claude Code recognizes (when_to_use, argument-hint, disable-model-invocation, user-invocable, allowed-tools, model, effort, context, agent, hooks, paths, shell) are not consumed by the WorkflowSkill interpreter, but are neither forbidden nor rewritten. A consumer that treats a workflow file as an Agent Skill will see them; the interpreter ignores them.
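Putting the fields above together, a minimal workflow file might look like the following. This is an illustrative sketch (the step and expression are invented for the example), not a file shipped in the repo:

```yaml
---
version: 1
name: greeting
description: Takes a name as input and returns a greeting. Run when a friendly hello is needed.
inputs:
  name:
    type: string
    default: "World"
outputs:
  greeting: "{{ steps.build.output }}"
steps:
  - id: build
    description: Build the greeting string
    type: transform
    expr: "'Hello, ' & input.name & '!'"
---
Greets the given name.
```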
Expression language
JSONata is the only expression language. It handles both data reshaping and predicates.
No general-purpose code execution is supported in v1. If a transformation cannot be expressed in JSONata, the correct response is to expose the missing capability as an integration action, not to embed code in the workflow. This is a deliberate constraint: it keeps multi-tenant execution safe by construction, keeps workflows fully portable, and forces integration coverage to grow where real demand exists.
Expression delimiters:
- `{{ ... }}` — JSONata expression embedded in a string-typed field, evaluated against the execution context.
- Bare JSONata (no delimiters) in `transform.expr`, `if.when`, `while.when`, and `switch.on`. In predicate positions, any truthy JSONata result selects the branch. `foreach.items` uses `{{ }}` template syntax.
Execution context:
| Name | Contents |
| -------------------------- | ---------------------------------------------------------- |
| steps.<id>.output | Output value of a completed step |
| steps.<id>.error | Error object (only in scope inside catch) |
| input.<name> | Workflow inputs declared in top-level inputs: |
| workflow.owner | Publisher / runner identity |
| workflow.run_id | Current run identifier |
| workflow.name | Workflow name |
| workflow.started_at | ISO timestamp |
| env.<name> | Publisher-scoped environment variables (allowlisted) |
Step scoping:
`steps` is lexically scoped. Inside a foreach iteration or parallel branch, a new scope is pushed. Resolution walks inner-to-outer (read). Writes land in the innermost scope.

- Step `id`s must be unique among sibling steps (same lexical scope). Shadowing an outer id is a validation error.
- Inside a `foreach` body: `steps.<id>` resolves local-first. `<as>` and `$index` are bound by the loop.
- From outside a `foreach`: `steps.<loop_id>.output[i].<inner_id>.output` addresses iteration i's inner step.
- From outside a `parallel`: `steps.<par_id>.branches.<name>.<inner_id>.output` addresses a branch's inner step.
Non-deterministic JSONata built-ins:
`$now()`, `$millis()`, `$random()` — because every step is wrapped in an idempotent checkpoint, these functions are evaluated once at first execution and their result is cached on replay. This is the correct durability semantic. Authors should be aware that `$now()` inside a transform returns the time of first execution, not replay time.
Data flow
Implicit inside the workflow body: steps reference each other via expressions. Normal steps do not declare inputs or outputs.
Explicit only at one typed boundary: `inputs:` and `outputs:` at the workflow top level.
Rationale: implicit expression references keep normal wiring tight and readable; explicit typed signatures at the workflow boundary give the workflow-level contract what it needs.
Step primitives
Every step requires a unique id and a description (1–80 chars, single line). No exceptions — wait and return steps also require both.
action
Call an integration action.
```yaml
- id: list_unread
  type: action
  uses: gmail.search
  with:
    q: "is:unread newer_than:1d"
    maxResults: 50
```

transform
Reshape data via JSONata. Checkpointed as a first-class step.
```yaml
- id: urgent
  type: transform
  expr: |
    steps.list_unread.output.messages[
      "IMPORTANT" in labelIds
    ].{
      id: id,
      subject: headers.subject
    }
```

if
Conditional branch. JSONata predicate (bare, no {{ }}).
```yaml
- id: has_urgent
  type: if
  when: "$count(steps.urgent.output) >= input.threshold"
  then:
    - <step>
  else:            # optional
    - <step>
```

switch
Multi-way branch on a value.
```yaml
- id: route
  type: switch
  on: "steps.classify.output.category"
  cases:
    billing: [<step>]
    support: [<step>]
  default: [<step>]
```

foreach
Iterate a collection with bounded concurrency and optional rate limiting. Output is an ordered array of per-iteration scope maps, indexed by source item order.
```yaml
- id: enrich
  type: foreach
  items: "{{ steps.urgent.output }}"
  as: msg
  concurrency: 5       # optional, default 1 (sequential)
  rate_limit:          # optional
    max: 10
    per: "1s"          # "1s" | "1m" | "1h"
  body:
    - id: lookup
      type: action
      uses: clearbit.person_lookup
      with: { email: "{{ msg.from }}" }
```

`rate_limit` caps iteration-body starts across the whole loop. Combines with `concurrency`: `concurrency` bounds in-flight count, `rate_limit` bounds start frequency.
Outer-visible output: `steps.enrich.output[i].lookup.output` reaches iteration i's `lookup` step.
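One way to picture how the two bounds compose is the scheduler sketch below: a fixed pool of workers enforces the in-flight bound, and a trailing-window check spaces out body starts. This is illustrative only — the function name, options shape, and algorithm are assumptions, not the interpreter's actual code.

```typescript
// Illustrative sketch: `concurrency` bounds in-flight bodies, `rateLimit`
// bounds how often bodies may start. Not the interpreter's real scheduler.
async function runForeach<T, R>(
  items: T[],
  body: (item: T, index: number) => Promise<R>,
  opts: { concurrency: number; rateLimit?: { max: number; perMs: number } },
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  const starts: number[] = []; // timestamps of recent body starts
  const rl = opts.rateLimit;
  let next = 0;

  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++; // claim the next item (single-threaded event loop, no race)
      if (rl) {
        // Block until fewer than rl.max starts fall inside the trailing window.
        while (true) {
          const cutoff = Date.now() - rl.perMs;
          while (starts.length && starts[0] <= cutoff) starts.shift();
          if (starts.length < rl.max) break;
          await new Promise((r) => setTimeout(r, starts[0] + rl.perMs - Date.now() + 1));
        }
        starts.push(Date.now());
      }
      results[i] = await body(items[i], i); // output stays ordered by source index
    }
  }

  // N workers drain the list; each has at most one body in flight, so the
  // in-flight count never exceeds the concurrency bound.
  const n = Math.max(1, Math.min(opts.concurrency, items.length));
  await Promise.all(Array.from({ length: n }, () => worker()));
  return results;
}
```

Note that results are written by source index, which is why the loop's output array stays ordered even when iterations finish out of order.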
while
Conditional loop. max_iterations is required. Supports optional rate limiting for polling patterns.
```yaml
- id: poll
  type: while
  when: "steps.check.output.status != 'done'"
  max_iterations: 60
  rate_limit:    # optional
    max: 1
    per: "5s"
  body:
    - <step>
```

parallel
Explicit fan-out with named branches. Output: `steps.<id>.branches.<name>.<inner_id>.output` is a branch's inner step output.
```yaml
- id: lookup
  type: parallel
  branches:
    clearbit:
      - type: action
        uses: clearbit.person
        with: { email: "{{ input.email }}" }
    hubspot:
      - type: action
        uses: hubspot.contact_by_email
        with: { email: "{{ input.email }}" }
```

try / catch / finally
Structured error handling. Inside `catch`, the error is in scope as `error`.
```yaml
- id: risky
  type: try
  body:
    - type: action
      uses: external.sync
  catch:
    - type: action
      uses: slack.post_message
      with:
        channel: "#ops"
        text: "{{ 'Sync failed: ' & error.message }}"
  finally:       # optional
    - type: action
      uses: metrics.increment
      with: { name: "sync_attempts" }
```

wait
Time-based suspension.
```yaml
- type: wait
  duration: "5m"

- type: wait
  until: "{{ input.scheduled_at }}"
```

wait_for_signal
External signal. Used for webhooks, inbound events, cross-workflow coordination, and any asynchronous resume.
```yaml
- id: await_payment
  type: wait_for_signal
  signal: "stripe.payment_succeeded"
  match:
    "customer.email": "{{ input.email }}"
  timeout: "7d"
  on_timeout: abort   # | continue (default: abort)
```

Output when received: the signal payload. Output on `on_timeout: continue`: `null`.
return
Explicit workflow output. Alternative to top-level `outputs:`.

```yaml
- type: return
  value: "{{ steps.final.output }}"
```

Step-level properties
`retry` applies to `action` steps only. It is meaningless on pure/deterministic steps and is rejected by the schema for all other step types.
```yaml
retry:
  max_attempts: <int>
  backoff: exponential | linear | fixed
  on: [<error_code>, ...]   # optional allowlist
timeout: <duration>
continue_on_error: <bool>   # default false
```

Checkpointing is not author-controllable. Every step checkpoints; this is a platform guarantee required by durable resume, the step inspector, and fork-based debug. Large-output concerns are handled by the interpreter (size caps, blob-backed references), not by per-step opt-outs.
- `retry` applies before error propagation. Exhausted attempts surface the last error.
- `continue_on_error: true` captures the error into `steps.<id>.error`, sets `steps.<id>.output` to null, and continues.
Idempotency is automatic, not declared. The interpreter passes a deterministic key derived from run_id + nested step path + iteration index (for foreach) + branch name (for parallel) to every action invocation. Integrations that honor idempotency headers (Stripe, Square, and similar) consume it transparently; others ignore it. This makes within-run retries and resumes safe without any author input.
For cross-run idempotency ("send this invoice exactly once, ever"), use the integration's native idempotency argument (e.g., Stripe's idempotency_key) as a normal with: value. Business-level dedup belongs to the integration contract, not the workflow primitive set.
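The derivation described above might look something like the sketch below. The hash choice, separators, and function shape are assumptions for illustration — only the set of inputs (run id, nested step path, iteration index, branch name) comes from the spec text.

```typescript
import { createHash } from "node:crypto";

// Illustrative derivation of a deterministic idempotency key. The sha256
// hash and "/" separator are assumptions, not the interpreter's actual scheme.
function idempotencyKey(
  runId: string,
  stepPath: string[],      // e.g. ["enrich", "lookup"] for a nested step
  iterationIndex?: number, // set inside foreach
  branchName?: string,     // set inside parallel
): string {
  const parts = [runId, ...stepPath];
  if (iterationIndex !== undefined) parts.push(`i:${iterationIndex}`);
  if (branchName !== undefined) parts.push(`b:${branchName}`);
  return createHash("sha256").update(parts.join("/")).digest("hex");
}
```

Because every input is stable across a resume, a retried or replayed run derives the same key for the same logical invocation, so idempotency-aware integrations can deduplicate the call.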
Error semantics
- Uncaught errors propagate to the nearest `try.catch`; if none exists, the workflow fails.
- Error object shape: `{ message, code, step_id, retryable, details }`.
- Inside `catch`, `error` refers to the most recent caught error.
Checkpointing and durability
- Each step produces a checkpoint row: `{ run_id, step_id, status, input, output, error, started_at, ended_at }`.
- On worker crash or restart, execution resumes from the last completed step.
- Transforms are deterministic and re-run safely; actions rely on `idempotency_key` for safe replay.
- Forked runs (debug mode) clone checkpoint history up to a chosen step and execute forward on a new run_id.
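The resume behavior can be sketched as a wrapper that records each step's output and short-circuits on replay. This is an illustrative reduction, not the actual runtime in src/runtime/protocol.ts; the type and class names are invented:

```typescript
// Illustrative sketch of checkpoint-based resume — not the actual runtime.
type Checkpoint = { status: "completed"; output: unknown };

class CheckpointStore {
  private rows = new Map<string, Checkpoint>(); // keyed by step_id, per run

  // Run a step thunk at most once per run: on replay after a crash,
  // completed steps return their recorded output instead of re-executing.
  async step(stepId: string, thunk: () => Promise<unknown>): Promise<unknown> {
    const row = this.rows.get(stepId);
    if (row) return row.output;    // already completed — replay from checkpoint
    const output = await thunk();  // first execution — run and record
    this.rows.set(stepId, { status: "completed", output });
    return output;
  }
}
```

Transforms can safely skip this memoization (they are deterministic), but routing actions through it is what makes crash recovery invisible to the workflow author.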
Interpreter determinism contract
The interpreter MUST traverse the YAML tree deterministically. Same workflow + same inputs → same sequence of runtime calls in the same order. This is the invariant that DBOS-backed (ordinal-based) and similar replay runtimes depend on.
All non-determinism (clocks, randomness, I/O) lives inside step thunks, never in the interpreter's control flow between steps.
YAML edits invalidate in-progress runs. If the tree structure changes, step ordinals shift and replay-based runtimes will detect a non-determinism error. Edits must produce new workflow instances; mid-run upgrades are not supported.
env.* source
`env.*` is never populated from `process.env` or any ambient shell environment. Doing so would leak secrets into step inputs and checkpoint rows.
- CLI mock mode: `env` is empty by default. Populated only via explicit `--env KEY=VALUE` flags or `--env-file <path>`. Documented as "fake values for mock authoring only."
- Hosted runtimes: `env` is populated from the publisher's allowlisted, secrets-system-resolved values.
Secrets for real action calls are resolved by the integration layer at execution time, not surfaced to the workflow.
Portability and conformance
Every workflow declares version: 1. Any conforming interpreter must:
- Evaluate JSONata per the pinned spec version.
- Support every primitive in this document.
- Pass the published conformance test suite.
Interpreters MAY add non-standard primitives prefixed with `x-` (e.g., `type: x-custom`). Workflows using `x-` primitives are non-portable and must be flagged as such.
Versioning
- `version` is integer-major only. Breaking changes bump the major; interpreters reject unknown versions.
- Non-breaking additions (new primitives, new step properties) are published as minor schema updates within the same major.
Resolved design decisions
- Sequential list, not DAG. Workflow body is a sequential `steps:` list; branching emerges from nested primitives (`if`, `switch`, `parallel`, `foreach`). Revisit only if real composition hits limits.
- Saga / compensation deferred. `compensate:` is reserved as a step-level key name; not implemented in v1.
- Rate limiting lives on loop primitives. `foreach` and `while` support `rate_limit`. Keeps rate control a workflow concern, not an integration implementation burden.
- No workflow-scoped mutable state. Transforms-as-steps subsume it. Reconsider only if durable demand emerges.
- String interpolation is single-mode. `{{ ... }}` is always JSONata; outside the delimiters is literal. Multi-valued interpolation uses JSONata's `&` operator inside a single `{{ }}`.
- Step reference scoping. All completed steps in the same scope are referenceable by id. Ids must be unique per lexical scope; shadowing is a validation error. Per-iteration references use the iteration body's local ids plus `$index`.
- `retry` applies to `action` steps only. Schema rejects `retry` on all other step types. Retry on deterministic steps is meaningless.
- `on_timeout: continue` output is always `null` for `wait_for_signal`.
- Parallel fan-out uses child workflows. On replay-based durable runtimes (e.g. DBOS), `foreach` and `parallel` spawn child workflows per branch/iteration to keep each branch's step ordinal space independent.
