@hummer98/ctxd-claude-plugin
v0.2.0
Declarative companion to shell state mutations (cd, export, git checkout/switch). Emits structured JSON describing cwd, env, and git state.
ctxd
Declarative CLI commands that pass structured context to AI agents.
ctxd wraps the shell operations that AI agents get lost in — cd, export, git checkout — and returns structured JSON so the agent knows exactly what changed.
# Before: silent, agent has to guess
cd /foo && git checkout main
# After: agent sees the state
ctxd chdir /foo
# {"ok":true,"cmd":"chdir","result":{"cwd":"/foo","git_branch":"main","listing":["src","docs","go.mod"]}}
ctxd git-switch main
# {"ok":true,"cmd":"git-switch","result":{"branch":"main","dirty":false,"ahead":0,"behind":0}}
Why
The silent CLI problem
Unix's "rule of silence" — successful commands produce no output — was designed for human operators who can perceive context implicitly. For AI agents, it is a systematic blind spot.
| Command | What it does silently | What the agent loses |
|---|---|---|
| cd /foo | Changes working directory | New cwd, git branch, file listing |
| export FOO=bar | Sets environment variable | Which variables changed, what their values are |
| git checkout main | Switches branch | Branch, dirty state, divergence from remote |
| kill -STOP <pid> | Pauses a process | Process state |
| umask 022 | Changes file creation mask | Effective permissions for new files |
The agent's only recovery is to emit a follow-up command (pwd, env, git status), which burns extra tokens and an extra round-trip, or to reason from context, which drifts over long conversations.
Transformers can't track sequential state
Mozer et al. "The Topological Trouble With Transformers" (2026) formalizes the issue: feedforward architectures cannot maintain evolving state across depth. The longer an agent session runs, the more it relies on external signals to reconstruct where it is.
This isn't a prompt-engineering problem. It's structural. The fix is structural too: make the state external and machine-readable at the point of mutation.
Infrastructure did this already
Server orchestration faced the same problem in the 2010s — imperative shell scripts drifted, state was implicit, failures were opaque. The industry converged on Terraform, Kubernetes, and GitOps: declare intent, verify postconditions, report structured diffs.
ctxd applies that same pattern to the local shell, scoped to the operations AI agents use most.
How it works
Every ctxd command:
- Executes the underlying operation
- Observes the resulting state
- Returns a structured JSON report
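The three steps above can be pictured with a short sketch. This is an illustrative Python model, not ctxd's Go implementation: `run_declarative`, `observe_cwd`, and the `operation_failed` error code are hypothetical names chosen for the example.

```python
import os
import time


def run_declarative(cmd, args, operate, observe):
    """Execute an operation, observe the state, and wrap both in a report."""
    started = time.monotonic()
    report = {"ok": True, "cmd": cmd, "args": list(args)}
    try:
        operate(*args)                 # 1. execute the underlying operation
        report["result"] = observe()   # 2. observe the resulting state
    except OSError as exc:
        report["ok"] = False
        # Hypothetical error code; ctxd defines its own vocabulary.
        report["error"] = {
            "code": "operation_failed",
            "message": str(exc),
            "retryable": False,
        }
    report["elapsed_ms"] = int((time.monotonic() - started) * 1000)
    return report                      # 3. structured report, ready for json.dumps


def observe_cwd():
    """Snapshot the state an agent would otherwise have to ask for."""
    cwd = os.getcwd()
    return {"cwd": cwd, "listing": sorted(os.listdir(cwd))}
```

Calling `run_declarative("chdir", ["."], os.chdir, observe_cwd)` yields a dictionary with the same top-level keys as the reports in this section.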
{
"ok": true,
"cmd": "chdir",
"args": ["/foo"],
"result": {
"cwd": "/foo",
"git_branch": "main",
"listing": ["src", "docs", "go.mod", "README.md"]
},
"postcondition": { "passed": true, "checks": [] },
"elapsed_ms": 4
}
On failure:
{
"ok": false,
"cmd": "chdir",
"args": ["/nonexistent"],
"error": {
"code": "not_found",
"message": "no such file or directory: /nonexistent",
"retryable": false
}
}
Postconditions
Declare what you expect the state to be after the command. ctxd verifies it and reports a clear pass/fail:
ctxd git-switch main --expect branch=main --expect dirty=false
{
"ok": true,
"postcondition": {
"passed": true,
"checks": [
{"key": "branch", "expected": "main", "actual": "main", "passed": true},
{"key": "dirty", "expected": "false", "actual": "false", "passed": true}
]
}
}
Commands (MVP)
| Command | Replaces | Key output fields |
|---|---|---|
| ctxd chdir <path> | cd | cwd, git_branch, listing |
| ctxd git-switch <branch> | git checkout / git switch | branch, dirty, ahead, behind |
| ctxd env-set <KEY=val>… | export | set, diff.added, diff.changed |
--human flag switches to human-readable output for debugging.
ctxd chdir
Resolve a path, list its contents, and report the git branch (if any):
ctxd chdir /path/to/repo
{
"ok": true,
"cmd": "chdir",
"args": ["/path/to/repo"],
"result": {
"cwd": "/path/to/repo",
"git_branch": "main",
"listing": ["docs", "go.mod", "src"]
},
"elapsed_ms": 3
}
git_branch is null when the path is outside a git working tree or HEAD is detached.
Errors return ok: false with error.code of not_found (path missing) or not_a_directory (path is a file).
The parent shell's cwd is not modified — pass the resolved cwd to the next command instead.
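Because the parent shell's cwd never changes, the caller has to carry `result.cwd` forward itself. A minimal sketch of that hand-off, assuming the report arrives as JSON text on ctxd's stdout; `resolved_cwd` and `run_in` are hypothetical helper names, not part of ctxd:

```python
import json
import subprocess


def resolved_cwd(report_text):
    """Return result.cwd from a chdir report, or raise if the command failed."""
    report = json.loads(report_text)
    if not report.get("ok"):
        err = report.get("error", {})
        raise RuntimeError(f"{err.get('code')}: {err.get('message')}")
    return report["result"]["cwd"]


def run_in(report_text, argv):
    """Run argv in the directory that chdir resolved."""
    return subprocess.run(
        argv,
        cwd=resolved_cwd(report_text),  # explicit cwd instead of relying on cd
        capture_output=True,
        text=True,
        check=False,
    )
```

For instance, `run_in(report_text, ["git", "status", "--short"])` would run git status in the resolved directory without touching the caller's own cwd.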
ctxd git-switch
Switch a git branch and report the resulting working tree state:
ctxd git-switch main
{
"ok": true,
"cmd": "git-switch",
"args": ["main"],
"result": {
"branch": "main",
"dirty": false,
"ahead": 0,
"behind": 0
},
"elapsed_ms": 32
}
branch is null when HEAD is detached. ahead / behind are 0 when no upstream is configured.
On failure, error.code is one of not_a_git_repo, branch_not_found, dirty_tree, or git_not_found.
The switch is real: the repository's HEAD is updated, but the parent shell's cwd is not changed.
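A caller might fold a git-switch report into a one-line summary along these lines. This is a sketch under the field names and error codes listed above; `summarize_switch` and the wording of its output are hypothetical:

```python
import json

# Error codes named in this section; anything else is reported as unknown.
KNOWN_ERRORS = {"not_a_git_repo", "branch_not_found", "dirty_tree", "git_not_found"}


def summarize_switch(report_text):
    """Turn a git-switch report into a short line for a log or prompt."""
    report = json.loads(report_text)
    if not report["ok"]:
        code = report["error"]["code"]
        label = code if code in KNOWN_ERRORS else f"unknown error ({code})"
        return f"switch failed: {label}"
    result = report["result"]
    # branch is null (None) when HEAD is detached.
    branch = result["branch"] if result["branch"] is not None else "detached HEAD"
    state = "dirty" if result["dirty"] else "clean"
    # ahead/behind are 0 both when in sync and when no upstream is configured.
    return f"on {branch}, {state}, ahead {result['ahead']}, behind {result['behind']}"
```

Note that a zero in ahead or behind cannot distinguish "in sync" from "no upstream", so the summary reports the numbers without interpreting them.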
ctxd env-set
Set one or more environment variables in this child process and report the resulting set map and diff (added / changed):
ctxd env-set FOO=bar BAZ=qux
{
"ok": true,
"cmd": "env-set",
"args": ["FOO=bar", "BAZ=qux"],
"result": {
"set": {"FOO": "bar", "BAZ": "qux"},
"diff": {
"added": ["BAZ"],
"changed": ["FOO"]
}
},
"elapsed_ms": 1
}
set is the final KEY → value map of this invocation (last-write-wins
when the same KEY is repeated). diff.added lists keys that were not
previously in the environment; diff.changed lists keys whose value
differed from the previous value. Keys whose value is unchanged appear
in neither list.
The argument format is KEY=VAL. The first = is the separator, so
values may contain = (e.g. URL=http://x?a=b). An empty value
(KEY=) is valid. On failure, error.code is invalid_args (missing
=, empty KEY, or zero arguments) or exec_failed (the underlying
os.Setenv call failed).
The parent shell's environment is not modified — pass the resolved
set to the next command, or read the JSON to know which variables
the child process saw.
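The parsing and diff rules above can be modelled in a few lines. This is a Python sketch of the described behaviour, not ctxd's Go code: `env_set`, its `previous` parameter, and the error messages are hypothetical, and the ordering of the added / changed lists is an assumption.

```python
def env_set(args, previous):
    """Compute the set map and diff lists for KEY=VAL args against `previous`."""
    if not args:
        return {"ok": False, "error": {"code": "invalid_args", "message": "no arguments"}}
    final = {}
    for arg in args:
        key, sep, value = arg.partition("=")  # split on the first "=" only
        if not sep or not key:                # missing "=" or empty KEY
            return {
                "ok": False,
                "error": {"code": "invalid_args", "message": f"bad argument: {arg!r}"},
            }
        final[key] = value                    # last write wins for a repeated KEY
    added = [k for k in final if k not in previous]
    changed = [k for k in final if k in previous and previous[k] != final[k]]
    return {"ok": True, "result": {"set": final, "diff": {"added": added, "changed": changed}}}
```

With `previous = {"FOO": "old"}`, the call `env_set(["FOO=bar", "BAZ=qux"], previous)` reproduces the diff in the example above: BAZ is added and FOO is changed.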
Installation
Work in progress. Binary releases coming soon.
Claude Code Plugin (recommended for Claude Code users):
claude plugins:install @hummer98/ctxd-claude-plugin
Installs both the ctxd binary and the Skill that teaches Claude to use it.
Standalone CLI:
# npm install -g @hummer98/ctxd # (coming soon — Go binary npm wrapper not yet published)
brew install hummer98/tap/ctxd
Skill bundle
ctxd ships with a SKILL.md compliant with the Anthropic Agent Skills specification.
The skill does not enforce usage. It nudges: when the agent reaches for cd, export, or git checkout, the skill surfaces the ctxd equivalent and explains what context would be gained. Adoption is the agent's choice.
Compatible with Claude Code, OpenCode, Codex, Cursor, Gemini CLI, and any host that supports Agent Skills.
Design principles
- Don't shadow existing commands: new command names only, no aliases over cd
- JSON by default, human optional: --human for readable output
- Postconditions are opt-in: useful when you want them, invisible when you don't
- Narrow and deep: top 20–30 commands done well, not full POSIX coverage
- Pluggable: users can add their own declarative wrappers
Development
Prerequisites
- Go 1.26 or later
Build
go build -o ctxd ./cmd/ctxd
Run
./ctxd --version
./ctxd --help
Test
go test ./...
Architecture decisions
See docs/adr/ for design decisions (CLI framework selection etc.).
Eval harness
evals/ contains a SKILL adherence harness that drives a real claude process inside an isolated cmux workspace and measures whether the agent reaches for ctxd chdir / ctxd git-switch / ctxd env-set when the SKILL says it should.
bash evals/run.sh
# or override the per-scenario trial count
EVAL_N=1 bash evals/run.sh
# or pin a different model (default: claude-opus-4-7)
EVAL_MODEL=claude-sonnet-4-5 bash evals/run.sh
Each trial spins up claude --settings <per-trial>.json so a Stop hook
touches a sentinel file when the session ends, and a PostToolUse hook
appends each tool_use to session-<id>-<trial>.tools.jsonl. The runner
waits on the sentinel instead of scraping the screen, and summarize.py
reads tool_use from the hook JSONL first (falling back to the raw Claude
session JSONL if the hook output is empty).
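The wait-then-read flow described above might look roughly like this. It is a sketch, not the contents of run.sh or summarize.py: the function names are hypothetical, the hook log is assumed to hold one JSON object per line, and the raw session records are assumed to carry tool calls as `message.content` items with `type` set to `tool_use`.

```python
import json
import time
from pathlib import Path


def wait_for_sentinel(path, timeout_s=600.0, poll_s=0.5):
    """Poll until the Stop hook's sentinel file exists; False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if Path(path).exists():
            return True
        time.sleep(poll_s)
    return Path(path).exists()


def read_jsonl(path):
    """Parse a JSONL file, treating a missing file as empty."""
    p = Path(path)
    if not p.exists():
        return []
    return [json.loads(line) for line in p.read_text().splitlines() if line.strip()]


def load_tool_uses(hook_path, session_path):
    """Prefer the PostToolUse hook log; fall back to the raw session JSONL."""
    hooked = read_jsonl(hook_path)
    if hooked:
        return hooked
    tool_uses = []
    for record in read_jsonl(session_path):
        content = (record.get("message") or {}).get("content")
        if isinstance(content, list):  # assumed record shape, see lead-in
            tool_uses += [c for c in content
                          if isinstance(c, dict) and c.get("type") == "tool_use"]
    return tool_uses
```

Waiting on a file rather than on terminal output keeps the runner independent of how the session renders on screen, which is the motivation the paragraph above gives for the sentinel.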
Outputs land in evals/results/<UTC-timestamp>/:
- session-<id>-<trial>.jsonl: raw Claude Code session JSONL (one per trial, git-ignored)
- session-<id>-<trial>.meta.json: exit_status, wall time, session id (git-ignored)
- session-<id>-<trial>.tools.jsonl: PostToolUse hook output, one tool_use per line (git-ignored)
- session-<id>-<trial>.done: Stop hook sentinel marking session completion (git-ignored)
- session-<id>-<trial>.settings.json: per-trial claude --settings payload wiring the hooks (git-ignored)
- summary.md: overall and per-scenario success rate, plus the first failing tool_use quoted for context. The header records plugin version, git SHA, git branch, claude version, and model so each run is uniquely traceable.
Cross-run trend lives in evals/results/index.md and evals/results/index.csv (one row per run). Both are committed; the heavy JSONL / meta files are not — re-running the harness regenerates them.
The plugin version comes from .claude-plugin/plugin.json and acts as the canonical unit for comparing measurements. See CLAUDE.md for the bump policy when SKILL.md changes.
Cost / time budget: each trial costs a few cents of model usage. The default EVAL_N=3 × 5 scenarios (15 trials) comes to somewhere between a few dimes and about $1, and takes 5–10 minutes of wall-clock time with EVAL_MODEL=claude-opus-4-7 (the default). Switching to a faster or cheaper model via EVAL_MODEL shifts both axes. See evals/scenarios.jsonl for the prompts and expected patterns.
evals/.eval-plugin/ (plugin shim that wires skills/ctxd into the Skills loader) is git-ignored — the harness regenerates the shim on every run, dynamically writing the version from .claude-plugin/plugin.json.
Adherence over plugin versions
How often the agent reaches for ctxd when the SKILL says it should, across plugin versions.
The figures below come from the evals/run.sh harness; the chdir, git-switch, and env-set columns give the per-command breakdown.
| model | plugin version | N | trials | overall | chdir | git-switch | env-set | notes |
|---|---|---:|---:|---:|---:|---:|---:|---|
| claude-opus-4-7 | 0.1.0 | 3 | 15 | 0.0% | 0/6 | 0/6 | 0/3 | Initial baseline; hook-based harness landed (T013–T015) |
| claude-opus-4-7 | 0.1.1 | 3 | 15 | 6.7% | 0/6 | 0/6 | 1/3 | SKILL.md trigger reinforced (description, ❌→✅ examples) (T016) |
| claude-opus-4-7 | 0.1.2 | 3 | 15 | 53.3% | 5/6 | 1/6 | 2/3 | disambiguation + NEVER phrasing + Precondition section (T017) |
| claude-opus-4-7 | 0.1.3 | 3 | 15 | 100.0% | 6/6 | 6/6 | 3/3 | pattern matcher tightened + scenario setup hooks + plugin author (T018) |
| claude-opus-4-7 | 0.1.3 | 10 | 50 | 98.0% | 20/20 | 19/20 | 10/10 | Variance check at N=10 (T019) |
| claude-opus-4-7 | 0.2.0 | 3 | 15 | 100.0% | 6/6 | 6/6 | 3/3 | SKILL.md Postcondition section rewritten to match T010 final implementation (--expect DSL + Result.Data preserved on failure) (T023) |
N is trials per scenario; trials is N × 5 scenarios. Each cell shows passes / trials for that command family.
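As a worked check of the arithmetic, the overall column can be recomputed from the three per-command cells. The sketch assumes overall is pooled passes over pooled trials, which agrees with the rows in the table; `parse_cell` and `overall_rate` are hypothetical helper names.

```python
def parse_cell(cell):
    """Split a 'passes/trials' cell such as '5/6' into two integers."""
    passes, trials = cell.split("/")
    return int(passes), int(trials)


def overall_rate(cells):
    """Overall adherence in percent: total passes over total trials."""
    pairs = [parse_cell(c) for c in cells]
    passes = sum(p for p, _ in pairs)
    trials = sum(t for _, t in pairs)
    return round(100 * passes / trials, 1)
```

For the 0.1.2 row, `overall_rate(["5/6", "1/6", "2/3"])` pools 8 passes over 15 trials and gives 53.3; for the N=10 row, 49 passes over 50 trials gives 98.0.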
Model is part of the run identity — switching to a different EVAL_MODEL (e.g. claude-sonnet-4-5) requires re-running the baseline; rows are not directly comparable across models.
The latest baseline lives in evals/results/index.md — the table here is updated by hand, see CLAUDE.md.
Efficiency over plugin versions
What did adopting the SKILL cost in tokens, tool-uses, and wall time? Adherence
(above) measures whether the agent reaches for ctxd; efficiency measures what
it took to get there. Numbers are per-trial averages collected by the same
evals/run.sh harness, with EVAL_N=3 × 5 scenarios.
| model | plugin version | avg_tool_uses | avg_input_tokens | avg_output_tokens | avg_wall_ms | notes |
|---|---|---:|---:|---:|---:|---|
| claude-opus-4-7 | ≤ 0.2.0 | – | – | – | – | Pre-T024 runs; raw session JSONL was git-ignored and pruned, so retroactive efficiency numbers are unavailable. |
avg_input_tokens is the sum of fresh input_tokens across all assistant
messages in a trial (cache writes and reads are tracked separately in each
run's summary.md). avg_wall_ms is integer-second precision (the raw
wall_seconds from meta.json × 1000). The Stage 1 baseline gets seeded the
next time evals/run.sh runs after this commit; until then, the row above
documents that nothing was recoverable from past runs.
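The two metric definitions above could be computed along these lines. This is a sketch rather than the harness's actual code: the helper names are hypothetical, and the session records are assumed to be JSON lines with `type` set to `assistant` and a `message.usage.input_tokens` field, each assistant message appearing once.

```python
import json


def trial_input_tokens(session_lines):
    """Sum fresh input_tokens over the assistant messages of one trial."""
    total = 0
    for line in session_lines:
        if not line.strip():
            continue
        record = json.loads(line)
        if record.get("type") != "assistant":
            continue
        usage = (record.get("message") or {}).get("usage") or {}
        # Cache writes and reads sit in separate usage fields and are not added.
        total += int(usage.get("input_tokens", 0))
    return total


def wall_ms(meta):
    """meta.json stores whole seconds, so the result is a multiple of 1000."""
    return int(meta["wall_seconds"]) * 1000


def average(values):
    """Per-trial average, or None when there are no trials to average."""
    return sum(values) / len(values) if values else None
```

Because wall time is recorded in whole seconds, differences under one second between runs will not show up in avg_wall_ms.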
Status
Early development. The design is settled; the implementation is not.
Contributions, feedback, and use-case reports welcome via issues.
License
MIT
