just-bash-harness

v0.3.0

Published

2 months ago

Single-agent harness on top of just-bash and the agent-skills ecosystem. Sandboxed tool execution, derived approval gates, persisted sessions, swappable LLM providers (Anthropic + Cloudflare Workers AI), cross-session memory, optional AES-256-GCM at rest.


rckflr

agent agentic agent-skills just-bash llm-tools sandboxed-execution anthropic cloudflare-workers-ai approval-gate memory

just-bash-harness

Single-agent loop on top of just-bash and the agent-skills ecosystem. Sandboxed tool execution, derived approval gates, persisted sessions, swappable LLM providers.

Version: 0.3.0 · Status: the v0 contract is complete, packaged, under CI, and polished. Shipped on top of it: the applicable_when filter; cross-session memory; search/stats/export; compaction (with optional rolling LLM summary); AES-256-GCM at rest (with harness rekey rotation); a retrieval bench; an interactive REPL; chains (with chain-aware approval, no bypass); a Hermes parser with diagnostics; AbortSignal propagation; per-tool-call rationale; secret redaction at persistence boundaries; approval-fatigue metrics; and Cloudflare provider rate limiting with backoff. 195/195 unit tests pass. Validated end to end against real Gemma 4 26B and Hermes 2 Pro on Cloudflare Workers AI, with the public [email protected] subscribed. Published as just-bash-harness on the npm registry.

Intended audience

This is maintainer-grade software for a specific ecosystem, not a generic agent harness aiming for mass adoption. It's designed for:

The maintainer of the agent-skills / just-bash stack (@rckflr/agent-skills-cli, just-bash-data, just-bash-wiki) and tightly-integrated downstreams.
Early-adopter engineers comfortable reading the source, willing to track a small stack of related packages, and able to make their own calls on the open trade-offs (no built-in secret redaction in tool stdout, optional-but-not-rotatable encryption key, single-tenant by design).

If you want a broader-purpose agent runtime with multi-tenancy, GUI, and strong integration with arbitrary tool ecosystems, this isn't it. If you want a small, auditable loop that composes the agent-skills spec primitives end-to-end, it is.

What it is

A thin orchestrator (~4100 LOC TypeScript in src/, plus ~2400 LOC of unit tests) that:

Runs a turn loop: prompt → tool calls → results → next turn → end.
Resolves tool calls to agent-skills subscribed in a local FileBank.
Executes each skill in runExec's per-skill sandboxed just-bash instance (FS scratch, network allowlist, env scoping — already provided by @rckflr/agent-skills-cli).
Categorizes each tool call as prohibited / explicit / regular, derived from existing skill metadata (no new spec field), and applies the policy matrix.
Persists session state via db sessions/turns/approvals collections in a dedicated just-bash-data bank.
Speaks to Anthropic Messages API or Cloudflare Workers AI (default model: @cf/google/gemma-4-26b-a4b-it).
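
The turn loop described in the list above can be sketched as follows. This is a minimal, synchronous illustration under stated assumptions, not the code in src/loop.ts: the `Message`, `ToolCall`, `Model`, and `RunTool` types, the `runTurnLoop` name, and the `maxTurns` cap are all hypothetical, and the real loop also resolves skills, applies the approval gate, and persists each turn.

```typescript
// Hypothetical types for illustration; the harness's real contracts live in src/types.ts.
type ToolCall = { id: string; skill: string; input: string };
type ModelOutput = { text: string; toolCalls: ToolCall[] };
type Message =
  | { role: "user"; text: string }
  | { role: "assistant"; text: string; toolCalls: ToolCall[] }
  | { role: "tool"; callId: string; output: string };

type Model = (history: Message[]) => ModelOutput;
type RunTool = (call: ToolCall) => string;

// prompt -> tool calls -> results -> next turn -> end
function runTurnLoop(prompt: string, model: Model, runTool: RunTool, maxTurns = 8): Message[] {
  const history: Message[] = [{ role: "user", text: prompt }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const out = model(history);
    history.push({ role: "assistant", text: out.text, toolCalls: out.toolCalls });
    if (out.toolCalls.length === 0) break; // no tool calls requested: the loop ends
    for (const call of out.toolCalls) {
      // Feed each tool result back so the next turn can see it.
      history.push({ role: "tool", callId: call.id, output: runTool(call) });
    }
  }
  return history;
}

// Scripted stand-in for an LLM: requests one tool call, then finishes.
const scriptedModel: Model = (history) =>
  history.some((m) => m.role === "tool")
    ? { text: "done", toolCalls: [] }
    : { text: "", toolCalls: [{ id: "c1", skill: "echo", input: "hi" }] };

const transcript = runTurnLoop("say hi", scriptedModel, (call) => `ran ${call.skill}: ${call.input}`);
console.log(transcript.map((m) => m.role).join(" > ")); // user > assistant > tool > assistant
```

The cap on turns is an assumption added so that a model which never stops requesting tools cannot spin forever.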

What it is not

A multi-agent orchestrator. Single agent only.
A multi-tenant deployment. It assumes a single user.
A sandbox for untrusted user code. The user is trusted; the LLM and skills are not (see DESIGN.md §2).
A web UI. CLI / TTY only.

Quickstart

git clone <this-repo> harness
cd harness
npm install
npm run build                                         # → dist/cli.js, dist/index.js

# Pick a provider via env (auto-detected). Either of these works:
export CF_ACCOUNT_ID=...   CF_API_TOKEN=...           # → Gemma 4 26B
export ANTHROPIC_API_KEY=...                          # → claude-opus-4-7

# Optional: real semantic retrieval (else stub embedder)
export OLLAMA_MODEL=nomic-embed-text                  # or OPENAI_*, CF_*

# Subscribe a skill pack (signed git tag enforced by default)
node dist/cli.js skills add github.com/foo/[email protected]

# Run a chat turn
SID=$(node dist/cli.js new)
echo "say hi using the available tools" | node dist/cli.js chat "$SID"

# Resume later
node dist/cli.js resume "$SID"

Working with unsigned skills (local development)

The default policy is signature.require_signed: true, so any pack whose git tag isn't gitsign / GitHub-OIDC verified resolves to category prohibited and gets hard-denied at the approval gate. The deny error in tool stdout now points at three remediations: sign the tag, add a per-skill override, or pass --allow-unsigned to the chat command for development:

# Drop the signature gate for one chat invocation (development only)
harness chat "$SID" --allow-unsigned --message "test the local skill"

--allow-unsigned flips signature.require_signed to false in memory for that invocation only. Unsigned skills then fall through to the capability heuristics (network/filesystem/idempotency) and most will resolve as explicit — meaning the user gets prompted at the TTY before each call, instead of being silently denied.
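
One way to get the "in memory, for that invocation only" behaviour is to derive a new policy object instead of mutating the loaded one. The sketch below assumes a hypothetical `Policy` shape and a hypothetical `withAllowUnsigned` helper; only the `signature.require_signed` field name comes from this README.

```typescript
// Hypothetical policy shape; only signature.require_signed is taken from the README.
type Policy = { signature: { require_signed: boolean } };

// Returns a copy with the signature gate dropped, leaving the loaded policy untouched.
function withAllowUnsigned(policy: Policy, allowUnsigned: boolean): Policy {
  if (!allowUnsigned) return policy;
  return { ...policy, signature: { ...policy.signature, require_signed: false } };
}

const loaded: Policy = { signature: { require_signed: true } };
const effective = withAllowUnsigned(loaded, true);

console.log(loaded.signature.require_signed);    // true  (what was loaded from YAML is unchanged)
console.log(effective.signature.require_signed); // false (applies to this invocation only)
```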

For a permanent override on a specific trusted skill, use policy.skills.overrides[skill.id] = "regular" | "explicit" in your policy YAML.

Install globally

npm link                       # makes `harness` available on PATH
harness --help

Or run directly through tsx during development without building:

npx tsx src/cli.ts --help

Architecture in one diagram

┌──────────────────────────────────────────┐
│  cli (TTY)                               │  user-facing
├──────────────────────────────────────────┤
│  loop                                    │  turn protocol
├──────────────────────────────────────────┤
│  provider   approval   session   policy  │  cross-cutting
├──────────────────────────────────────────┤
│  toolbox  ←  FileBank + runQuery/runExec │  skill resolution + execution
└──────────────────────────────────────────┘
                 │
                 ▼
   agent-skills-cli (handles per-skill sandbox)
                 │
                 ▼
            just-bash + just-bash-data

There is no Sandbox layer of our own. runExec already builds a per-skill sandboxed just-bash instance with the skill's declared network / filesystem / required_env constraints from the spec. Re-implementing that here would diverge from the canonical enforcement.

See DESIGN.md for full layer contracts and DESIGN.md §4 for the turn protocol.

Providers

Two LLM providers ship today; the factory auto-detects from env:

| Provider | Default model | Required env |
|---|---|---|
| Anthropic Messages API | claude-opus-4-7 | ANTHROPIC_API_KEY |
| Cloudflare Workers AI | @cf/google/gemma-4-26b-a4b-it | CF_ACCOUNT_ID, CF_API_TOKEN |

Auto-detect prefers Cloudflare when both sets of credentials are present. Override the provider via HARNESS_PROVIDER=anthropic|cloudflare. Override the model via the --model <id> flag, HARNESS_DEFAULT_MODEL (Anthropic), or CF_LLM_MODEL (Cloudflare).
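
The selection rules above can be sketched as a pure function over the environment. This is an illustration, not the factory in src/provider.ts: `selectProvider`, `Env`, and `Selection` are hypothetical names, and the precedence among model overrides (flag first, then the provider's env variable, then the default) is an assumption the README does not spell out.

```typescript
type Env = Record<string, string | undefined>;
type ProviderName = "anthropic" | "cloudflare";
type Selection = { provider: ProviderName; model: string };

// Default models as listed in the provider table.
const DEFAULT_MODELS: Record<ProviderName, string> = {
  anthropic: "claude-opus-4-7",
  cloudflare: "@cf/google/gemma-4-26b-a4b-it",
};

function selectProvider(env: Env, modelFlag?: string): Selection {
  const hasCloudflare = Boolean(env.CF_ACCOUNT_ID && env.CF_API_TOKEN);
  const hasAnthropic = Boolean(env.ANTHROPIC_API_KEY);

  let provider: ProviderName;
  const forced = env.HARNESS_PROVIDER;
  if (forced === "anthropic" || forced === "cloudflare") {
    provider = forced; // explicit override wins over auto-detection
  } else if (hasCloudflare) {
    provider = "cloudflare"; // preferred when both credential sets are present
  } else if (hasAnthropic) {
    provider = "anthropic";
  } else {
    throw new Error("no provider credentials found in env");
  }

  // Assumed precedence: --model flag, then provider-specific env var, then default.
  const envModel = provider === "anthropic" ? env.HARNESS_DEFAULT_MODEL : env.CF_LLM_MODEL;
  return { provider, model: modelFlag ?? envModel ?? DEFAULT_MODELS[provider] };
}

const bothSets: Env = { CF_ACCOUNT_ID: "acct", CF_API_TOKEN: "tok", ANTHROPIC_API_KEY: "key" };
console.log(selectProvider(bothSets).provider);                                           // cloudflare
console.log(selectProvider({ ...bothSets, HARNESS_PROVIDER: "anthropic" }).model);        // claude-opus-4-7
```

Taking the environment as a parameter instead of reading a global keeps the rule easy to unit-test.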

See PROVIDERS.md for adding a new provider.

Approval categories — derived, not declared

The harness derives a category for every tool call from existing spec fields:

| Signal | Effect |
|---|---|
| provenance.signature_status !== "valid" while policy.signature.require_signed: true | → prohibited (hard deny) |
| network[] non-empty | → escalate to explicit |
| filesystem[] non-empty | → escalate to explicit |
| idempotent: false | → escalate to explicit |
| Override map matches by full id or shortId | → forced category (escape hatch) |
| Otherwise | → regular |
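
Read as code, the signal table might look like the sketch below. The function name `deriveCategory` appears in the layout for src/approval.ts, but the signature, the `SkillMeta` and `Policy` shapes, and the reason strings here are assumptions. Checking the override map first is also an assumption, inferred from the README listing a per-skill override as a remedy for unsigned skills.

```typescript
type Category = "prohibited" | "explicit" | "regular";

// Hypothetical shapes; field names follow the spec fields named in the table.
interface SkillMeta {
  id: string; // full id
  shortId: string;
  network: string[];
  filesystem: string[];
  idempotent: boolean;
  provenance: { signature_status: string };
}

interface Policy {
  signature: { require_signed: boolean };
  skills: { overrides: { [id: string]: Category | undefined } };
}

function deriveCategory(skill: SkillMeta, policy: Policy): { category: Category; reasons: string[] } {
  // Escape hatch: an override matching the full id or shortId forces the category.
  const forced = policy.skills.overrides[skill.id] ?? policy.skills.overrides[skill.shortId];
  if (forced) return { category: forced, reasons: ["override"] };

  if (policy.signature.require_signed && skill.provenance.signature_status !== "valid") {
    return { category: "prohibited", reasons: ["signature"] };
  }

  // Any capability signal escalates the call to explicit.
  const reasons: string[] = [];
  if (skill.network.length > 0) reasons.push("network");
  if (skill.filesystem.length > 0) reasons.push("filesystem");
  if (skill.idempotent === false) reasons.push("non-idempotent");
  return { category: reasons.length > 0 ? "explicit" : "regular", reasons };
}

const strictPolicy: Policy = { signature: { require_signed: true }, skills: { overrides: {} } };
const pureSkill: SkillMeta = {
  id: "example/pack/echo", // hypothetical id
  shortId: "echo",
  network: [],
  filesystem: [],
  idempotent: true,
  provenance: { signature_status: "valid" },
};

console.log(deriveCategory(pureSkill, strictPolicy).category);                                      // regular
console.log(deriveCategory({ ...pureSkill, network: ["api.example.com"] }, strictPolicy).category); // explicit
```

Returning the reasons alongside the category makes it possible to record why a call was escalated.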

Default policy matrix:

prohibited → deny       (hard, never asks)
explicit   → ask        (TTY prompt unless host injects custom gate)
regular    → allow      (auto-approved, audit-only)

This means no spec changes were needed to ship the harness — the security category is a function of fields the spec already defines (network, filesystem, idempotent, provenance).
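
The default matrix above amounts to a lookup from category to action, with the "ask" branch delegated to whatever prompt the host supplies. The sketch below is illustrative only: `gate`, `Decision`, and the `askUser` callback are hypothetical names, and the real gate in src/approval.ts prompts on the TTY by default.

```typescript
type Category = "prohibited" | "explicit" | "regular";
type Action = "deny" | "ask" | "allow";
type Decision = { allowed: boolean; source: "policy" | "user" };

// Default policy matrix as stated in the README.
const DEFAULT_MATRIX: Record<Category, Action> = {
  prohibited: "deny",
  explicit: "ask",
  regular: "allow",
};

// askUser stands in for the TTY prompt or a host-injected custom gate.
function gate(
  category: Category,
  askUser: () => boolean,
  matrix: Record<Category, Action> = DEFAULT_MATRIX,
): Decision {
  const action = matrix[category];
  if (action === "deny") return { allowed: false, source: "policy" }; // hard deny, never asks
  if (action === "allow") return { allowed: true, source: "policy" }; // auto-approved
  return { allowed: askUser(), source: "user" };
}

let prompts = 0;
const alwaysYes = () => {
  prompts += 1;
  return true;
};

console.log(gate("prohibited", alwaysYes)); // denied by policy, no prompt shown
console.log(gate("explicit", alwaysYes));   // allowed by the user after one prompt
console.log(gate("regular", alwaysYes));    // allowed by policy, no prompt shown
```

Recording the decision source mirrors the "source: policy or user" field the README lists for the approvals audit.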

Sessions

Each session lives under <sessionsRoot>/<sessionId>/ and is backed by a dedicated just-bash-data bank with three collections:

db sessions    — one document with policy snapshot + metadata
db turns       — append-only history; each Turn includes user message,
                 LLM output, tool calls, approvals
db approvals   — flat audit of every approval decision (allow/deny,
                 source: policy or user, derivation reasons)

harness resume <id> re-opens the dir; db turns find '{}' --sort ts:1 rehydrates history. db <coll> export produces JSON snapshots; db <coll> import round-trips them.
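
The resume path described above (an append-only turns collection, rehydrated in ascending ts order, with an export/import round trip) can be modelled in memory as below. This is a stand-in for illustration and not the just-bash-data API: `InMemoryTurns`, its methods, and the reduced `Turn` shape are all hypothetical.

```typescript
// Reduced, hypothetical Turn shape; real turns also carry tool calls and approvals.
type Turn = { ts: number; userMessage: string; llmOutput: string };

class InMemoryTurns {
  private readonly docs: Turn[] = [];

  // Append-only: documents are added, never edited in place.
  insert(turn: Turn): void {
    this.docs.push({ ...turn });
  }

  // Counterpart of a find-all sorted by ts ascending, used to rehydrate history.
  findSortedByTs(): Turn[] {
    return this.docs.map((d) => ({ ...d })).sort((a, b) => a.ts - b.ts);
  }

  // JSON snapshot of the collection.
  exportJson(): string {
    return JSON.stringify(this.docs);
  }

  // Rebuilds a collection from a snapshot.
  static importJson(json: string): InMemoryTurns {
    const restored = new InMemoryTurns();
    for (const doc of JSON.parse(json) as Turn[]) restored.insert(doc);
    return restored;
  }
}

const live = new InMemoryTurns();
live.insert({ ts: 2, userMessage: "second", llmOutput: "b" });
live.insert({ ts: 1, userMessage: "first", llmOutput: "a" });

// Round-trip through a snapshot, then rehydrate in timestamp order.
const resumed = InMemoryTurns.importJson(live.exportJson());
console.log(resumed.findSortedByTs().map((t) => t.userMessage).join(", ")); // first, second
```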

The skills FileBank and the session bank live in separate directories. They never share state.

Testing

| Layer | Tests | Where |
|---|---|---|
| Unit | 100 in 6 suites | src/*.test.ts |
| Integration (no LLM) | 4 (slice) + 5 (e2e scripted) | scratch/{slice,e2e}.ts |
| Live LLM (Gemma) | 1 PASS | scratch/e2e-cf-driven.ts |
| Live LLM (CF, real fetch) | listed, opt-in | scratch/e2e-cloudflare.ts |

npm run test               # all unit tests, compact reporter
npm run test:list          # all unit tests, spec reporter
npm run smoke:slice        # FileBank + runExec round-trip
npm run smoke:e2e          # full loop, 5 approval scenarios
npm run smoke:cf-driven    # full loop, replayed Gemma decisions

CI runs the typecheck + tests + build + the three credential-free smokes on every push to main and every PR. See .github/workflows/ci.yml and TESTING.md for what's covered and what's intentionally not unit-tested.

Lessons

LESSONS.md captures operational doctrines distilled from real bugs: each entry is anchored to the release where the bug surfaced (e.g. v0.2.3 chains-bypass-approval) and stated as a one-sentence rule that should fire in code review for related future features. Read at design time, not incident time.

Layout

src/
  index.ts                library barrel — programmatic API
  types.ts                shared interface contracts (DESIGN §3)
  toolbox.ts              FileBank + runQuery + runExec
  provider.ts             provider barrel + env factory
  provider-anthropic.ts   Anthropic Messages API adapter
  provider-cloudflare.ts  Cloudflare Workers AI (OpenAI-compat endpoint)
  approval.ts             gate + deriveCategory + TTY prompt
  session.ts              createBankBash-backed db wrappers
  policy.ts               YAML loader + DEFAULT_POLICY
  loop.ts                 turn orchestrator
  cli.ts                  entry point — built into bin/harness
  cli-args.ts             argv parser (extracted for testability)
  *.test.ts               70 unit tests (cli-args, approval, policy,
                          provider factory, cloudflare provider)
scratch/                  smoke/integration scripts
examples/                 example policy YAML
dist/                     build output (gitignored, npm-published)
  cli.js                  the harness binary (shebang preserved)
  index.js                programmatic library entry
  *.d.ts                  TypeScript declarations
tsup.config.ts            build config (ESM, node22 target)
DESIGN.md                 contract — read first
PROVIDERS.md              provider abstraction + how to add one
TESTING.md                test layout + coverage notes
COEVOLUTION.md            upstream changes plan (mostly cancelled — see file)
CHANGELOG.md              project journey, v0 + v0.1

Stack version pins

| Package | Version pinned to | Notes |
|---|---|---|
| just-bash | ^2.14.3 | beta but stable surface |
| @rckflr/agent-skills-cli | local file:../agent-skills-cli | uses STABLE-tier exports + one INTERNAL (createBankBash) |
| just-bash-data | local file:../just-bash-data | bash-first; db/vec subcommands |
| @anthropic-ai/sdk | ^0.40.0 | for Anthropic provider |
| yaml | ^2.5.0 | policy parsing |
| Node | >=22 | required by agent-skills-cli + native fetch/ReadableStream |

License

Same as the surrounding ecosystem (MIT). Copy attribution from contributing repos when forking.
