itsharness

v0.8.0

Published

23 days ago

**Build, run, and observe AI agent workflows.**

0High
0Medium
0Low

philiparxist

itsharness

Build, run, and observe AI agent workflows.

Draw a flow on the canvas → export a runtime-agnostic spec → compile to your framework → run, trace, debug, and deploy — all from one tool.

flow.json  →  [ langgraph adapter ]  →  Python / LangGraph
           →  [ crewai adapter ]     →  Python / CrewAI
           →  [ mastra adapter ]     →  TypeScript / Mastra
           →  [ maf adapter ]        →  Python / MS Agent Framework
           →  [ REST endpoint ]      →  POST /flows/{id}/invoke
           →  [ MCP tool ]           →  Claude Desktop + any MCP client
           →  [ A2A agent ]          →  any A2A-compatible runtime

Current version: v0.8.0 — all four phases complete. 240 items shipped.

Quick start

1. Run setup

./setup-env.sh

That single command does everything needed before docker compose up:

Generates all cryptographic secrets and writes them into .env (replacing any placeholders from .env.example in-place — no duplicate lines)
Prompts for the two values that can't be auto-generated: your Langfuse admin email and password
Optionally prompts for LLM API keys (OpenAI / Anthropic) — press Enter to skip and add them later
Writes .env.local for the Vite canvas dev server
Verifies every required secret with scripts/check-env.sh before proceeding
Then asks (separately, each with Y/n) whether to also:
- Create the Python virtual environment and install adapter dependencies
- Generate mastra-runner/package-lock.json (requires Node.js, one-time)
- Start the full Docker stack immediately

Re-run safe — if a secret is already set to a real value, setup-env.sh keeps it and skips it. Run it again any time to repair a partially-filled .env or add secrets that were introduced after your initial setup.

2. Start the stack

If you didn't start it inside setup-env.sh:

docker compose up

| Service | URL | |---|---| | Canvas | http://localhost:3000 | | Adapter API | http://localhost:8000/health | | Langfuse | http://localhost:3001 |

Nine services start: canvas, adapter, mastra-runner, postgres, redis, clickhouse, litellm, langfuse-web, langfuse-worker.

Startup errors? See docs/troubleshooting.md. The most common causes are a stale Postgres volume (./scripts/reset-volumes.sh fixes it) or a secret that's wrong length — bash scripts/check-env.sh will identify it exactly.

Real-time collaboration is opt-in — see docs/collab.md.

On-prem / Kubernetes — see docs/deployment.md.

Check secrets any time

bash scripts/check-env.sh

Checks every required secret is present, non-placeholder, and — for LANGFUSE_ENCRYPTION_KEY specifically — exactly 64 hex characters. Exits 0 if all good, 1 with a clear error for each failing key.

Without Docker

./setup-env.sh                  # handles secrets, venv, and deps
source adapter/.venv/bin/activate
npm install && npm run dev       # canvas → http://localhost:3000
cd adapter && python main.py     # adapter → http://localhost:8000

Tests

npm test                         # Vitest — validates all 5 reference flows
pytest adapter/tests/ -v         # adapter unit + integration suite
pytest adapter/tests/test_maf_adapter.py -v   # MAF adapter suite (742 tests)

LLM provider setup

itsharness routes all LLM calls through LiteLLM — a unified proxy that sits between the adapters and the actual model providers. You pick a model name in your flow spec; LiteLLM sends it to the right provider.

flow spec  →  adapter  →  LiteLLM proxy  →  OpenAI   (gpt-4o, gpt-4o-mini)
                       ↗                 →  Anthropic (claude-sonnet, claude-haiku, claude-opus)
                       ↗                 →  Ollama    (mistral, qwen3, qwen2.5-coder)

Quick reference — model names

| Model name in flow spec | Provider | Key required in .env | |---|---|---| | gpt-4o | OpenAI | OPENAI_API_KEY | | gpt-4o-mini | OpenAI | OPENAI_API_KEY | | claude-sonnet | Anthropic | ANTHROPIC_API_KEY | | claude-haiku | Anthropic | ANTHROPIC_API_KEY | | claude-opus | Anthropic | ANTHROPIC_API_KEY | | mistral | Ollama (local) | none | | qwen3 | Ollama (local) | none | | qwen2.5-coder | Ollama (local) | none |

All four adapters (LangGraph, CrewAI, Mastra, MS Agent Framework) use the same routing — the model name in your spec determines the provider automatically.

Option A — OpenAI

Add your key to .env:
```
OPENAI_API_KEY=sk-...
```
In your flow spec, set model_defaults.model or any llm_call node's model field:
```
{ "model_defaults": { "model": "gpt-4o-mini" } }
```
Start the stack: docker compose up

Option B — Anthropic (Claude)

Add your key to .env:
```
ANTHROPIC_API_KEY=sk-ant-...
```

In your flow spec, use a Claude model name:

{ "model_defaults": { "model": "claude-sonnet" } }

Start the stack: docker compose up

That's it. LiteLLM handles the Anthropic API — no other changes needed.

Option C — Local Ollama (no API keys required)

Run every adapter entirely offline against a local Ollama server. No OpenAI or Anthropic account needed.

Step 1 — Install Ollama

| Platform | Command | |---|---| | macOS | brew install ollama or download the desktop app | | Linux | curl -fsSL https://ollama.com/install.sh \| sh | | Windows | Download the installer |

Step 2 — Pull a model

ollama pull mistral:latest        # ~4 GB, fast — recommended for testing
# or
ollama pull qwen3:latest          # higher quality, larger
# or
ollama pull qwen2.5-coder:7b      # good for code-heavy flows

Check what you have: ollama list

Step 3 — Configure the Docker stack

Add two lines to your .env so the adapter and Mastra runner containers reach Ollama on your host:

# .env — add these lines (or uncomment if already present)
OPENAI_BASE_URL=http://host.docker.internal:11434/v1
OPENAI_API_KEY=ollama

host.docker.internal is the Docker-internal hostname that resolves to your Mac or Linux host. On Linux, run docker compose up with --add-host=host.docker.internal:host-gateway if this hostname isn't available in your Docker version.

Then restart the adapter (or the whole stack) to pick up the new env vars:

docker compose up -d              # full stack
# or, if already running:
docker compose restart adapter mastra-runner

Step 4 — Run the adapter test

setup-ollama.sh submits flows/06-ollama-simple-flow.json to all four adapters, polls for completion, and verifies each response mentions the test topic.

# Basic — mistral:latest, all 4 runtimes:
./setup-ollama.sh

# Different model:
./setup-ollama.sh qwen3:latest

# Different test topic:
./setup-ollama.sh mistral:latest "quantum computing"

# Single runtime only:
RUNTIME=langgraph ./setup-ollama.sh
RUNTIME=mastra    ./setup-ollama.sh

# Non-interactive / CI — skip the email prompt:
[email protected] TEST_PASSWORD=CiPass99! ./setup-ollama.sh

What you'll see:

━━  Preflight  ━━
  ✓  Ollama is running at http://localhost:11434
  ✓  Model 'mistral:latest' is available
  ✓  Adapter v0.7.0 is running at http://localhost:8000

━━  Authentication  ━━
  Email:    [email protected]
  Password: ········
  ✓  Logged in as [email protected]

━━  Submitting jobs  ━━
  ✓  langgraph                       → 4f1a…
  ✓  crewai                          → 8c2b…
  ✓  microsoft_agent_framework       → d91e…
  ✓  mastra                          → 3f7c…

━━  Waiting for results  ━━
  ✓  langgraph                      done in 5s — topic verified ✓
     Photosynthesis is a process used by plants…
  ✓  mastra                         done in 12s — topic verified ✓
     Photosynthesis is a process used by plants…
  ✓  microsoft_agent_framework      done in 18s — topic verified ✓
     Photosynthesis is a process by which plants…
  ✓  crewai                         done in 28s — topic verified ✓
     Photosynthesis is a process used by plants…

━━  Summary  ━━
  ✓  langgraph                      PASS
  ✓  crewai                         PASS
  ✓  microsoft_agent_framework      PASS
  ✓  mastra                         PASS

  ✓  All 4 runtime(s) passed

Troubleshooting Ollama

| Symptom | Fix | |---|---| | Ollama is not running | Run ollama serve (or open the macOS app) | | Model not found | Run ollama pull mistral:latest | | Adapter returns wrong topic / empty result | Check docker compose logs adapter --tail 30 — OPENAI_BASE_URL may not be set | | host.docker.internal not resolving (Linux) | Add --add-host=host.docker.internal:host-gateway to the adapter's docker-compose service, or set OPENAI_BASE_URL=http://172.17.0.1:11434/v1 | | Timeout on Mastra | Mastra compiles TypeScript and spins up a vm.Module — allow 30-60 s for first run |

Without Docker (local dev)

export OPENAI_BASE_URL=http://localhost:11434/v1
export OPENAI_API_KEY=ollama
cd adapter && uvicorn main:app --host 0.0.0.0 --port 8000 --reload &
./setup-ollama.sh mistral:latest

How it works under the hood

In Docker, the adapter and Mastra runner containers have:

OPENAI_BASE_URL = http://litellm:4000   (default — the LiteLLM proxy)
OPENAI_API_KEY  = <LITELLM_MASTER_KEY>  (authenticates to LiteLLM)

LiteLLM reads OPENAI_API_KEY and ANTHROPIC_API_KEY from the host .env to call the actual APIs. Every LLM call is also traced in Langfuse automatically.

When you set OPENAI_BASE_URL to an Ollama URL in .env, that value overrides the default, bypassing LiteLLM entirely and hitting Ollama directly.

Adding a custom model: edit adapter/litellm_config.yaml and restart the litellm container:

- model_name: my-model          # use this name in the flow spec
  litellm_params:
    model: openai/gpt-4.1       # or anthropic/..., ollama/..., etc.
    api_key: os.environ/OPENAI_API_KEY

What it does

Draw — 14 node types on a visual canvas. Every spec field is directly editable.
Own the spec — the canvas emits a versioned, runtime-agnostic JSON spec you control.
Compile — one API call transforms the spec into runnable code for your chosen framework.
Run and observe — live node overlays, per-node token counts, Langfuse trace links, HITL pause/resume.
Deploy — one click publishes the flow as a REST endpoint, MCP tool, and A2A agent simultaneously.
Collaborate — real-time multi-user editing with Yjs CRDT, live cursors, and offline persistence.
Embed — drop the canvas into your own portal with the @itsharness/canvas npm package.

The spec is the contract. The canvas is the editor. The adapters are the compilers.

Repository structure

itsharness/
│
├── spec/                        ← @itsharness/flow-spec (published npm)
│   ├── schema.ts                  Canonical Zod schema — source of truth
│   ├── schema.json                Derived JSON Schema
│   └── CHANGELOG.md
│
├── flows/                       ← 5 reference flows (JSON)
│
├── packages/
│   └── canvas/                  ← @itsharness/canvas (published npm)
│       ├── src/ItsHarnessCanvas.tsx
│       ├── src/store/create.ts    Per-instance Zustand store (no singleton)
│       ├── vite.config.lib.ts     Lib build → ESM + CJS + types
│       └── README.md
│
├── src/                         ← Canvas app (React + TypeScript + XYFlow)
│   ├── collab/                    Yjs CRDT real-time collaboration layer
│   ├── spec/                      Canvas schema + validation
│   ├── store/index.ts             Zustand store
│   ├── canvas/nodes/              14 node components
│   └── components/                Sidebar, ConfigPanel, deploy panels, HITL
│
├── adapter/                     ← FastAPI backend (v0.8.0)
│   ├── langgraph_adapter.py       LangGraph codegen — all 14 nodes
│   ├── crewai_adapter.py          CrewAI codegen — all 14 nodes
│   ├── mastra_adapter.py          Mastra TypeScript codegen
│   ├── maf_adapter.py             MS Agent Framework codegen — all 14 nodes
│   ├── sso_auth.py                OIDC + SCIM 2.0
│   ├── migrations/versions/       Alembic migrations 0001–0008
│   └── tests/                     Pytest suite
│
├── mastra-runner/               ← Node.js sidecar for Mastra execution
│
├── deploy/helm/itsharness/      ← On-prem Helm chart (v0.1.0)
│
├── .github/workflows/
│   ├── ci.yml                     PR checks: lint · typecheck · tests · canvas-package build
│   ├── eval.yml                   Spec-validation · debate quality metrics (nightly + push)
│   ├── deploy.yml                 5-stage: test → build → staging → prod → post-eval
│   ├── publish-spec.yml           Publishes @itsharness/flow-spec on spec-v* tags
│   └── publish-canvas.yml         Publishes @itsharness/canvas on canvas-v* tags
│
├── docs/
│   ├── architecture.md            System design, data flows, key decisions
│   ├── api.md                     Full API reference
│   ├── collab.md                  Real-time collaboration setup and internals
│   ├── deployment.md              Docker, Helm, SSO/OIDC, env var reference
│   └── adr/001-codegen-field-semantics.md
│
├── docker-compose.yml           ← 9 services
├── docker-compose.collab.yml    ← y-websocket overlay (opt-in collab)
└── CONTRIBUTING.md

spec/schema.ts vs src/spec/schema.ts — spec/schema.ts is the canonical published schema. The canvas copy omits .refine() calls required by Zod's discriminated union. When the spec changes, update both and run scripts/check-schema-sync.mjs.

The spec — `@itsharness/flow-spec`

Current version: 0.2.0 · RFC: closed — field semantics in docs/adr/001

Node types

| Node | What it does | Runtime support | |---|---|---| | input | Flow entry point | All | | output | Flow exit point | All | | llm_call | LLM invocation — structured output, validator, fail_branch, managed prompts | All | | tool_invoke | Named tool from the flow's tools[] registry | All | | condition | Branching — JSONPath or fn_ref | All | | parallel_fork | Fan-out to N concurrent branches | All | | parallel_join | Fan-in — merge / append / fn_ref reducer | All | | hitl_breakpoint | Suspend; wait for a typed human resume payload | All | | memory_read | Read from key-value or semantic store | All | | memory_write | Write to a named store | All | | subgraph | Embed another flow as a node | LG/MA: full · CR: partial | | transform | State transform — mapping or fn_ref | All | | agent_role | Execute an agent persona from agents[] | CR: native · others: synthesised | | agent_debate | Multi-agent loop with termination condition | MAF: native · others: synthesised |

Example flows

| Flow | Runtime | Exercises | |---|---|---| | 01 — RAG Agent | LangGraph | memory_read semantic, transform fn_ref | | 02 — Content Moderation + HITL | Mastra | llm_call structured output, hitl_breakpoint | | 03 — Parallel Risk Assessment | CrewAI | parallel_fork/join, agent_role ×3 | | 04 — Research Crew | CrewAI | context_from on edges, tool_approval: "human" | | 05 — Debate Agent + A2A | MS Agent Framework | agent_debate, a2a_config |

Adapter coverage

| Runtime | Status | Key notes | |---|---|---| | LangGraph · Python | ✅ Full | @observe trace + child spans · HITL via interrupt() | | CrewAI · Python | ✅ Full | context_from → Task.context · tier-aware Crew() memory | | Mastra · TypeScript | ✅ Full | Node.js sidecar · suspend()/resume() HITL | | MS Agent Framework · Python / semantic-kernel 1.x | ✅ Full | AgentGroupChat native · HITL via _HitlPause · OTel → Langfuse |

Core workflows

Compile a flow

TOKEN=$(curl -s -X POST http://localhost:8000/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email": "[email protected]", "password": "YourPassword1"}' | jq -r .token)

curl -s -X POST "http://localhost:8000/compile?runtime=langgraph" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d "{\"spec\": $(cat flows/01-rag-agent-flow.json)}" | jq -r .code

Execute a flow

# The 'inputs' object seeds the flow's initial state.
# Keys must match the fields declared in the flow's state_schema.
JOB=$(curl -s -X POST "http://localhost:8000/run?runtime=langgraph" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d "{
    \"spec\":   $(cat flows/06-ollama-simple-flow.json),
    \"inputs\": {\"topic\": \"quantum computing\"}
  }" | jq -r .job_id)

# Poll for status + result
curl -s -H "Authorization: Bearer $TOKEN" "http://localhost:8000/run/$JOB" \
  | jq '{status, result, trace_url}'

Deploy a flow

# One-click deploy: REST + MCP + A2A simultaneously
curl -s -X POST "http://localhost:8000/deploy/my-flow" \
  -H "Authorization: Bearer $TOKEN"

# Invoke synchronously
curl -s -X POST "http://localhost:8000/flows/my-flow/invoke" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"input": {"query": "What is the capital of France?"}}'

The `@itsharness/canvas` package

The canvas is also published as a standalone npm package for embedding in your own tools:

npm install @itsharness/canvas

import { ItsHarnessCanvas } from '@itsharness/canvas'
import '@itsharness/canvas/styles.css'

<ItsHarnessCanvas
  initialSpec={mySpec}
  onSpecChange={(updated) => save(updated)}
  onNodeSelect={(id) => setInspector(id)}
  execStats={runState.nodeStats}
  theme="dark"
/>

See packages/canvas/README.md for full props reference and usage patterns.

License

Apache 2.0 — see LICENSE.

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

itsharness

Quick start

1. Run setup

2. Start the stack

Check secrets any time

Without Docker

Tests

LLM provider setup

Quick reference — model names

Option A — OpenAI

Option B — Anthropic (Claude)

Option C — Local Ollama (no API keys required)

Step 1 — Install Ollama

Step 2 — Pull a model

Step 3 — Configure the Docker stack

Step 4 — Run the adapter test

Troubleshooting Ollama

Without Docker (local dev)

How it works under the hood

What it does

Repository structure

The spec — @itsharness/flow-spec

Node types

Example flows

Adapter coverage

Core workflows

Compile a flow

Execute a flow

Deploy a flow

The @itsharness/canvas package

Further reading

License

The spec — `@itsharness/flow-spec`

The `@itsharness/canvas` package