@nimblehq/claude-sdk

v1.4.0

Published

2 months ago

Anthropic Claude Agent SDK automations (spec, tasks, implement, review, etc.)

dev-nimbl3

olivierobert

dev-nimblehq

Claude SDK automations

Anthropic-backed automations: spec, task breakdown, implement, create-pr, review-pr, test gap, changelog, release gate, and CLAUDE.md generation.

Consumer docs: GitHub Actions · npm / local

Stages

| Folder | Stage | Purpose |
|---|---|---|
| spec-generator/ | spec_generator | Request → cross-platform PRD |
| task-breakdown/ | task_breakdown | Spec → sprint-ready tasks |
| implement/ | implement | Branch, code, commit, push |
| create-pr/ | create_pr_draft + create_pr | Draft body, open PR, ensure labels, request CODEOWNERS reviewers |
| review-pr/ | review_pr | Inline review; structured verdict + deletions recommended; always COMMENT |
| test-gap-detector/ | test_gap_detector | Missing-test coverage notes on PR diffs |
| changelog-generator/ | changelog | Changelog from commits since last release tag |
| release-gate/ | release_gate | Pre-release safety check |
| claude-md-generator/ | claude_md_generator | Generate / refresh CLAUDE.md |
| pipeline/ | n/a | Matrix builder + summary for full-pipeline workflow |

Models per stage

| Stage | Model |
|---|---|
| triage, spec_generator, task_breakdown, implement, fix_pr_comments, claude_md_generator | claude-sonnet-4-6 |
| create_issue_draft, create_issue, create_pr_draft, create_pr, test_gap_detector, changelog | claude-haiku-4-5 |
| review_pr, release_gate | claude-opus-4-8 |

Defined in agent-utils.ts (MODEL_PER_STAGE). Override by passing model to a stage input.
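The lookup-with-override behaviour described above can be sketched as follows. This is a minimal illustration, not the code in agent-utils.ts: the mapping is copied from the table, but the `modelForStage(stage, override)` signature and the handling of blank overrides are assumptions.

```typescript
type Stage =
  | "triage" | "spec_generator" | "task_breakdown" | "implement"
  | "fix_pr_comments" | "claude_md_generator"
  | "create_issue_draft" | "create_issue" | "create_pr_draft" | "create_pr"
  | "test_gap_detector" | "changelog"
  | "review_pr" | "release_gate";

// Stage-to-model mapping, copied from the table above.
const MODEL_PER_STAGE: Record<Stage, string> = {
  triage: "claude-sonnet-4-6",
  spec_generator: "claude-sonnet-4-6",
  task_breakdown: "claude-sonnet-4-6",
  implement: "claude-sonnet-4-6",
  fix_pr_comments: "claude-sonnet-4-6",
  claude_md_generator: "claude-sonnet-4-6",
  create_issue_draft: "claude-haiku-4-5",
  create_issue: "claude-haiku-4-5",
  create_pr_draft: "claude-haiku-4-5",
  create_pr: "claude-haiku-4-5",
  test_gap_detector: "claude-haiku-4-5",
  changelog: "claude-haiku-4-5",
  review_pr: "claude-opus-4-8",
  release_gate: "claude-opus-4-8",
};

// Hypothetical signature: a non-empty override wins, otherwise the stage default applies.
function modelForStage(stage: Stage, override?: string): string {
  const trimmed = override ? override.trim() : "";
  return trimmed !== "" ? trimmed : MODEL_PER_STAGE[stage];
}

const reviewModel = modelForStage("review_pr");
const changelogModel = modelForStage("changelog", "claude-sonnet-4-6");
```

Keeping the defaults in one table means a stage input only has to carry a model name when it deviates from the default.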

Knowledge layer

Nimble conventions injected into every prompt by loadKnowledge(projectRoot). SDK defaults in knowledge/shared/; projects override per-file in <repo>/.claude/knowledge/. Hard cap: 32 KB total.

| File | Purpose |
|---|---|
| git-conventions.md | Branch / commit / PR title format |
| pr-checklist.md | Definition of done for drafter + reviewer |
| issue-title-conventions.md | Issue title format |
| coding-standards.md | Implementation conventions, tests, error handling |
| security-guidelines.md | Secrets, auth, deps, logging |
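The two-layer, per-file override can be pictured with an in-memory merge. This is a sketch under stated assumptions: `mergeKnowledge` and `KnowledgeFiles` are hypothetical names, files are modelled as a name-to-content map rather than read from disk, and throwing when the 32 KB cap is exceeded is a guess (the real loader may truncate or skip files instead).

```typescript
// Hypothetical in-memory model: file name -> file contents.
type KnowledgeFiles = Record<string, string>;

// 32 KB hard cap, as stated above.
const MAX_KNOWLEDGE_BYTES = 32 * 1024;

// UTF-8 byte length of a string.
function utf8Bytes(text: string): number {
  return new TextEncoder().encode(text).length;
}

function mergeKnowledge(shared: KnowledgeFiles, project: KnowledgeFiles): KnowledgeFiles {
  // A project file replaces the SDK default with the same name; other files pass through.
  const merged: KnowledgeFiles = { ...shared, ...project };

  let totalBytes = 0;
  for (const name of Object.keys(merged)) {
    totalBytes += utf8Bytes(merged[name]);
  }

  // Assumption: exceeding the cap is treated as an error.
  if (totalBytes > MAX_KNOWLEDGE_BYTES) {
    throw new Error(`Knowledge layer is ${totalBytes} bytes; the cap is ${MAX_KNOWLEDGE_BYTES}`);
  }
  return merged;
}
```

With this shape, a project that only ships its own git-conventions.md still inherits the other four SDK defaults.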

Skills

Loaded from skills/ via combineSkills() in config/skills.ts.

| Skill | Stage |
|---|---|
| code-review-and-quality | review_pr |
| security-and-hardening | review_pr, release_gate |
| thermo-nuclear-code-quality-review | review_pr |
| spec-driven-development | spec_generator |
| api-and-interface-design | spec_generator |
| planning-and-task-breakdown | task_breakdown |
| incremental-implementation | implement (feature, chore) |
| test-driven-development | implement (bug, feature) |
| debugging-and-error-recovery | implement (bug) |
| shipping-and-launch | release_gate |
| documentation-and-adrs | changelog |
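For the implement stage, the table implies a selection by work type. A minimal sketch of that selection, assuming a `skillsForWorkType(workType)` signature that returns skill names (the real function in config/skills.ts may return loaded skill text and may take different arguments):

```typescript
type WorkType = "bug" | "feature" | "chore";

// Work types per implement skill, copied from the table above.
const IMPLEMENT_SKILLS: Array<{ skill: string; workTypes: WorkType[] }> = [
  { skill: "incremental-implementation", workTypes: ["feature", "chore"] },
  { skill: "test-driven-development", workTypes: ["bug", "feature"] },
  { skill: "debugging-and-error-recovery", workTypes: ["bug"] },
];

// Hypothetical signature: returns the names of the skills that apply to a work type.
function skillsForWorkType(workType: WorkType): string[] {
  return IMPLEMENT_SKILLS
    .filter((entry) => entry.workTypes.indexOf(workType) !== -1)
    .map((entry) => entry.skill);
}

const bugSkills = skillsForWorkType("bug");
```

Under these assumptions a bug gets test-driven-development plus debugging-and-error-recovery, while a chore gets only incremental-implementation.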

Shared modules

| Module | Purpose |
|---|---|
| agent-utils.ts | MODEL_PER_STAGE, modelForStage(), withModelFallback(), stage timeouts, lifecycle logging |
| knowledge.ts | Two-layer loader (SDK shared + project overrides) |
| state.ts | .claude-pipeline-state.json load/save; includes traceId; loadState() validates all required fields and throws on a corrupt file |
| git-env.ts | resolveGithubRepo, resolveBaseBranch (auto-detect from git) |
| config/client.ts | Anthropic SDK client; re-exports MODEL_PER_STAGE |
| config/skills.ts | Loads skills from skills/; skillsForWorkType per-workType composition |
| config/stacks.ts | Stack hints for implement + review_pr |
| config/cost-tracker.ts | Per-run cost logging to .observability/costs.jsonl |
| config/batch.ts | runBatchMessage(): Batch API wrapper with polling, fallback, and env-var tuning |

Environment variables

| Variable | Required | Description |
|---|---|---|
| `ANTHROPIC_API_KEY` | yes | Anthropic API key |
| `GH_TOKEN` | recommended | PAT with repo + workflow scopes for git push and gh CLI |
| `LINEAR_API_KEY` | no | Routes spec publishing to Linear |
| `CLAUDE_SDK_DEBUG` | no | Set to 1 to dump raw Anthropic SDK events to stderr |
| `CLAUDE_MAX_USD_PER_STAGE` | no | Global USD cap per stage. Default: see Cost & timeout gates below |
| `CLAUDE_MAX_USD_<STAGE>` | no | Per-stage USD override (e.g. CLAUDE_MAX_USD_IMPLEMENT=15). Wins over the global cap |
| `DISABLE_BATCH_API` | no | Set to true to fall back to synchronous messages.create() for changelog and test_gap_detector |
| `BATCH_POLL_INTERVAL_MS` | no | Milliseconds between Batch API status checks (default 10000) |
| `BATCH_MAX_ATTEMPTS` | no | Max Batch API poll iterations before timing out (default 12, ~120 s) |
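The three batch-related variables combine into a polling budget: interval times attempts gives the maximum wait, which is where the ~120 s default comes from. A minimal sketch of reading them, assuming a hypothetical `readBatchConfig` helper that takes the environment as a plain object and falls back to the documented defaults on missing or invalid values (the fallback-on-invalid behaviour is an assumption):

```typescript
type Env = Record<string, string | undefined>;

interface BatchPollConfig {
  disabled: boolean;
  pollIntervalMs: number;
  maxAttempts: number;
  maxWaitMs: number;
}

// Parses a positive integer, returning the fallback for missing or invalid input.
function positiveInt(raw: string | undefined, fallback: number): number {
  const parsed = Number(raw);
  const valid = isFinite(parsed) && Math.floor(parsed) === parsed && parsed > 0;
  return valid ? parsed : fallback;
}

// Hypothetical helper; defaults are the ones documented in the table above.
function readBatchConfig(env: Env): BatchPollConfig {
  const pollIntervalMs = positiveInt(env.BATCH_POLL_INTERVAL_MS, 10000);
  const maxAttempts = positiveInt(env.BATCH_MAX_ATTEMPTS, 12);
  return {
    disabled: env.DISABLE_BATCH_API === "true",
    pollIntervalMs,
    maxAttempts,
    // Upper bound on time spent polling before the wrapper gives up.
    maxWaitMs: pollIntervalMs * maxAttempts,
  };
}

const defaults = readBatchConfig({});
```

In a Node process the same function could be called with `process.env`.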

Cost & timeout gates

Two hard gates keep a runaway stage from burning unbounded API spend:

1. Wall-clock timeout per stage: implement 30 min, review_pr / release_gate 15 min, drafts 10 min. Passed to runWithTimeout() at each call site.
2. USD budget: every result message from the agent stream is checked against maxBudgetForStage(stage); the stage throws as soon as total_cost_usd exceeds the cap.

| Stage | Default cap |
|---|---|
| implement | $10 |
| review_pr, release_gate | $5 |
| all other stages | $2 |

Override with CLAUDE_MAX_USD_<STAGE> (per stage) or CLAUDE_MAX_USD_PER_STAGE (global). The per-stage value wins.
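The precedence (per-stage variable, then global variable, then the default table) can be sketched as below. This is an illustration, not the SDK source: the `maxBudgetForStage(stage, env)` signature, the upper-casing of the stage name to form the variable, and ignoring non-positive or non-numeric values are all assumptions.

```typescript
type Env = Record<string, string | undefined>;

// Default caps in USD, copied from the table above.
const DEFAULT_CAP_USD: Record<string, number> = {
  implement: 10,
  review_pr: 5,
  release_gate: 5,
};
const OTHER_STAGES_CAP_USD = 2;

// Returns a positive number, or undefined when the variable is unset or unusable.
function parseUsd(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const value = Number(raw);
  return isFinite(value) && value > 0 ? value : undefined;
}

function maxBudgetForStage(stage: string, env: Env): number {
  // 1. Per-stage override, e.g. CLAUDE_MAX_USD_IMPLEMENT.
  const perStage = parseUsd(env[`CLAUDE_MAX_USD_${stage.toUpperCase()}`]);
  if (perStage !== undefined) return perStage;

  // 2. Global override.
  const globalCap = parseUsd(env.CLAUDE_MAX_USD_PER_STAGE);
  if (globalCap !== undefined) return globalCap;

  // 3. Built-in default.
  const builtIn = DEFAULT_CAP_USD[stage];
  return builtIn !== undefined ? builtIn : OTHER_STAGES_CAP_USD;
}

// Throws once the running total passes the cap, mirroring the gate described above.
function assertWithinBudget(stage: string, totalCostUsd: number, env: Env): void {
  const cap = maxBudgetForStage(stage, env);
  if (totalCostUsd > cap) {
    throw new Error(`${stage} spent $${totalCostUsd.toFixed(2)}, over the $${cap} cap`);
  }
}
```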

When $GITHUB_STEP_SUMMARY is set, each stage appends a Stage | Duration | Cost | Runs | Status row so the spend is visible on the Actions run page.

Tool admission control (`makeToolPolicy`)

Every query() call site passes a stage-bound hook built by makeToolPolicy({ stage, traceId }) from hooks/safety.ts. Two layers run on each tool call:

| Layer | Rule |
|---|---|
| Per-stage allowlist | A stage may only call the tools in TOOL_ALLOWLIST[stage], enforced even if allowedTools is widened by mistake |
| Bash denylist | Destructive shell commands are blocked before they reach the runner |

TOOL_ALLOWLIST: implement → Read, Write, Edit, Bash, Glob, Grep; review_pr → Read, Glob, Grep; create_pr_draft / create_pr → none (text only). Denylist patterns are sourced from automations/shared/safety-denylist.ts — the same list Cursor SDK's hooks consume. Blocks include rm -rf ~/.ssh, cat .env, curl … | sh, gh secret set, git filter-branch, --no-verify, force-push, and fork bombs.
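The two layers can be pictured as a closure that is bound to a stage and checks each tool call in order. The sketch below is a simplification and does not reproduce the Agent SDK's hook signature or the contents of safety-denylist.ts: the `ToolDecision` shape, the `(tool, input)` call signature, and the regular expressions are all hypothetical, chosen only to mirror a few of the blocked commands listed above.

```typescript
interface ToolDecision {
  stage: string;
  traceId: string;
  tool: string;
  allowed: boolean;
  reason: string;
}

// Allowlists as listed above; stages not named in the README are omitted here.
const TOOL_ALLOWLIST: Record<string, string[]> = {
  implement: ["Read", "Write", "Edit", "Bash", "Glob", "Grep"],
  review_pr: ["Read", "Glob", "Grep"],
  create_pr_draft: [],
  create_pr: [],
};

// Illustrative patterns only; the canonical list lives in safety-denylist.ts.
const BASH_DENYLIST: RegExp[] = [
  /\brm\s+-rf\s+~\/\.ssh\b/,
  /\bcat\s+\.env\b/,
  /\bcurl\b[^|]*\|\s*sh\b/,
  /--no-verify\b/,
  /\bgit\s+push\b.*--force\b/,
];

function makeToolPolicy(opts: { stage: string; traceId: string }) {
  return (tool: string, input: { command?: string }): ToolDecision => {
    const decide = (allowed: boolean, reason: string): ToolDecision =>
      ({ stage: opts.stage, traceId: opts.traceId, tool, allowed, reason });

    // Layer 1: per-stage allowlist. Unknown stages get no tools.
    const allowedTools = TOOL_ALLOWLIST[opts.stage] || [];
    if (allowedTools.indexOf(tool) === -1) {
      return decide(false, `${tool} is not in the ${opts.stage} allowlist`);
    }

    // Layer 2: Bash denylist.
    if (tool === "Bash") {
      const command = input.command || "";
      if (BASH_DENYLIST.some((pattern) => pattern.test(command))) {
        return decide(false, "command matches the Bash denylist");
      }
    }
    return decide(true, "allowed");
  };
}

const implementPolicy = makeToolPolicy({ stage: "implement", traceId: "trace-123" });
const decision = implementPolicy("Bash", { command: "npm test" });
```

Returning a full decision record for both outcomes keeps the policy check and the audit entry in one place.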

Every decision (allow and deny) is appended to .observability/tool-calls.jsonl with the stage, traceId, tool, and reason — a forensic record of what each agent asked to do. Override the path with TOOL_AUDIT_LOG_PATH.

The local copy at safety-denylist.ts is regenerated from the canonical file via npm run prebuild. Do not edit the local copy.

Input sanitization

sanitize.ts (autogenerated from automations/shared/sanitize.ts via npm run prebuild) guards user-controlled text before it reaches any LLM prompt.

sanitizeUntrustedInput({ text, source }) handles free-form user content:

1. Strips `<tool>`, `<function_calls>`, `<invoke>`, `<result>`, `<param>` XML markup (tag content preserved).
2. Checks 16 denylist patterns: instruction overrides, role-change attacks, jailbreak keywords, system-tag injection. Returns `[TRUNCATED: <source>]` and logs [SECURITY] on match.
3. Truncates at 4 000 chars and wraps clean text in `<UNTRUSTED_USER_INPUT source="…">` boundary tags.
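The three steps can be sketched in order as follows. This is not the generated sanitize.ts: the denylist entries are illustrative stand-ins (the 16 real patterns are not listed in this README), and the closing boundary tag and the exact log message are assumptions.

```typescript
// Illustrative patterns only; the real module checks 16 patterns.
const DENYLIST: RegExp[] = [
  /ignore (all )?(previous|prior) instructions/i,
  /you are now\b/i,
  /<\/?system>/i,
];

const MAX_CHARS = 4000;

// Matches opening and closing forms of the tool-call tags named above.
const TOOL_TAGS = /<\/?(tool|function_calls|invoke|result|param)\b[^>]*>/gi;

function sanitizeUntrustedInput(args: { text: string; source: string }): string {
  // 1. Strip tool-call markup but keep the text between the tags.
  const stripped = args.text.replace(TOOL_TAGS, "");

  // 2. Reject the whole input on any denylist match.
  if (DENYLIST.some((pattern) => pattern.test(stripped))) {
    console.warn(`[SECURITY] denylist match in ${args.source}`);
    return `[TRUNCATED: ${args.source}]`;
  }

  // 3. Truncate, then wrap in boundary tags (closing tag is an assumption).
  const clipped = stripped.slice(0, MAX_CHARS);
  return `<UNTRUSTED_USER_INPUT source="${args.source}">${clipped}</UNTRUSTED_USER_INPUT>`;
}

const wrapped = sanitizeUntrustedInput({ text: "<tool>run</tool> tests", source: "PR_TITLE" });
```

The boundary tags let the prompt tell the model which span is data rather than instructions.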

sanitizePipelineHint(value, source) handles short workflow metadata:

- Strips newlines, caps at 200 chars, applies the same denylist. Returns empty string on a match.
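A minimal sketch of the hint variant, following the order given above (newlines, cap, denylist). The patterns are illustrative stand-ins for the shared denylist, and replacing newlines with a single space rather than deleting them is an assumption.

```typescript
// Illustrative patterns only; the real function reuses the shared denylist.
const HINT_DENYLIST: RegExp[] = [
  /ignore (all )?(previous|prior) instructions/i,
  /you are now\b/i,
  /<\/?system>/i,
];

const MAX_HINT_CHARS = 200;

function sanitizePipelineHint(value: string, source: string): string {
  // Collapse line breaks so the hint stays on one line (assumption: replaced by a space).
  const singleLine = value.replace(/[\r\n]+/g, " ").trim();
  const capped = singleLine.slice(0, MAX_HINT_CHARS);

  // A match drops the hint entirely instead of returning a marker string.
  if (HINT_DENYLIST.some((pattern) => pattern.test(capped))) {
    console.warn(`[SECURITY] denylist match in ${source}`);
    return "";
  }
  return capped;
}

const hint = sanitizePipelineHint("Fix login\nbug", "WORK_ITEM_TITLE");
```

Returning an empty string keeps callers simple: a rejected hint behaves like a missing one.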

Currently wired into: implement (WORK_ITEM_TITLE, each ACCEPTANCE_CRITERIA), review_pr (PR_TITLE, PR_DESCRIPTION), spec_generator (SPEC_REQUEST, SPEC_CONTEXT).

Scripts

| Script | Purpose |
|---|---|
| scripts/pipeline.ts | Full pipeline CLI: spec → breakdown → implement → create-pr |
| scripts/test-pipeline.ts | Dry run (stages 1–2 only) |
| scripts/smoke-test.ts | Verifies ANTHROPIC_API_KEY and SDK wiring |
| cost-report.ts | Summary from .observability/costs.jsonl (bin: claude-sdk-cost-report) |
| scripts/test-implement.ts | Local implement test against TEST_REPO_PATH |
| scripts/test-review-pr.ts | Local review_pr smoke test |
| scripts/test-changelog-generator.ts | Local changelog test |

Observability

Cost log: .observability/costs.jsonl (gitignored). Every stage run appends one JSONL record via config/cost-tracker.ts (withCostTracking()). Fields: automation, runId, traceId, tokens, cost estimate, cache write/read tokens, duration, success/failure.
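To make the record shape concrete, here is a sketch of one JSONL entry and a per-trace roll-up. Only `automation`, `runId`, `traceId`, and `cacheCreationTokens` are names that appear in this README; the remaining field names and both helper functions are hypothetical.

```typescript
interface CostRecord {
  automation: string;
  runId: string;
  traceId: string;
  inputTokens: number;         // hypothetical field name
  outputTokens: number;        // hypothetical field name
  cacheCreationTokens: number; // cache write tokens
  cacheReadTokens: number;     // hypothetical field name
  estimatedCostUsd: number;    // hypothetical field name
  durationMs: number;          // hypothetical field name
  success: boolean;            // hypothetical field name
}

// One record per line, as JSONL requires.
function toJsonl(record: CostRecord): string {
  return JSON.stringify(record) + "\n";
}

// Sums estimated cost per traceId, i.e. per pipeline invocation.
function totalCostByTrace(jsonl: string): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    const record = JSON.parse(line) as CostRecord;
    totals[record.traceId] = (totals[record.traceId] || 0) + record.estimatedCostUsd;
  }
  return totals;
}
```

Grouping by `traceId` rather than `runId` is what lets one pipeline invocation be costed across several stage jobs.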

traceId is a UUID generated by pipeline/build-implementation-matrix.ts and emitted as trace_id to GITHUB_OUTPUT. The workflow passes it to each stage job as TRACE_ID; action files read it and thread it to withCostTracking() so all costs for one pipeline invocation share a common ID.

Summary: npx claude-sdk-cost-report

The report includes per-automation cache hit rates, a write-to-read ROI line, and a break-even annotation. If cacheCreationTokens is 0 across all runs, prompt caching is not active and the report prints a setup reminder.

Every agent-running CI job uploads .observability/costs.jsonl as an observability-<runId>-<stage> artifact (30-day retention, if: always()). Download from the Actions run page to debug cost spikes or timeouts post-mortem.

Prompt caching

System prompts in spec_generator, task_breakdown, changelog, release_gate, and test_gap_detector are marked with cache_control: { type: 'ephemeral' }. The stable skill content (2–8 KB) is written to cache on first call and read at ~0.10× base input cost on subsequent calls within the 5-minute TTL window.

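At Sonnet 4.6 pricing ($3.00/MTok input, $3.75/MTok cache write, $0.30/MTok cache read), a cache write costs $0.75/MTok more than an uncached call and each cache read saves $2.70/MTok, so the write premium is recovered by the first read (break-even at roughly 0.28 reads per write). Reads only hit the cache within the 5-minute TTL, so the savings go to stages that run again within that window; calls spaced further apart pay the write price each time.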
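The listed prices can be turned into a direct cost comparison for a cached prefix. A minimal sketch, assuming one cold call that writes the prefix followed by a number of warm calls that all land within the TTL; both function names are hypothetical and only the cached prefix is costed, not the rest of the request.

```typescript
// USD per million tokens, from the paragraph above.
const PRICE = { input: 3.0, cacheWrite: 3.75, cacheRead: 0.3 };

// One cold call that writes the prefix to cache, plus `reads` warm calls.
function cachedCostUsd(prefixTokens: number, reads: number): number {
  const mtok = prefixTokens / 1e6;
  return mtok * (PRICE.cacheWrite + reads * PRICE.cacheRead);
}

// The same number of calls with caching switched off.
function uncachedCostUsd(prefixTokens: number, reads: number): number {
  const mtok = prefixTokens / 1e6;
  return mtok * PRICE.input * (reads + 1);
}

// Positive when caching is cheaper for the given prefix size and read count.
function savingsUsd(prefixTokens: number, reads: number): number {
  return uncachedCostUsd(prefixTokens, reads) - cachedCostUsd(prefixTokens, reads);
}
```

Calling `savingsUsd` with a stage's typical prefix size and calls per TTL window gives a quick estimate to compare against the numbers in the cost report.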

Publishing (maintainers)

Driven by sdk-publish.yml, triggered on a published GitHub Release with a claude-sdk/vX.Y.Z tag.

One-time setup

- secrets.NPM_TOKEN: npm automation token with publish access to the @nimblehq scope.
- The @nimblehq scope must exist on npm and the publishing user must have developer/owner access.

Procedure

1. Bump version in package.json and add a matching entry to CHANGELOG.md. Merge to main.
2. Run Create GitHub Release → select claude-sdk. Version is read from package.json; no input needed.

The workflow reads the version from automations/claude-sdk/package.json, creates the claude-sdk/vX.Y.Z tag, publishes the release, and triggers sdk-publish.yml to build and push to npm and GitHub Packages. sdk-publish.yml re-checks that the tag matches package.json before publishing.
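The tag-versus-package.json re-check amounts to parsing the tag and comparing strings. A minimal sketch of such a guard, with hypothetical function names; it assumes plain X.Y.Z versions and does not handle pre-release suffixes, and the real check in sdk-publish.yml may be written as a shell step instead.

```typescript
// Matches tags of the form claude-sdk/vX.Y.Z and captures the version.
const RELEASE_TAG = /^claude-sdk\/v(\d+\.\d+\.\d+)$/;

function versionFromTag(tag: string): string | null {
  const match = RELEASE_TAG.exec(tag);
  return match ? match[1] : null;
}

// Throws when the tag is malformed or disagrees with package.json.
function assertTagMatchesPackage(tag: string, packageVersion: string): void {
  const tagVersion = versionFromTag(tag);
  if (tagVersion === null) {
    throw new Error(`Unexpected tag format: ${tag}`);
  }
  if (tagVersion !== packageVersion) {
    throw new Error(`Tag ${tag} does not match package.json version ${packageVersion}`);
  }
}
```

Failing before the publish step keeps a mistyped tag from producing a package whose version disagrees with its release.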

Diagnostics

| File | Contents |
|---|---|
| `.claude-pipeline-state.json` | Stage handoff state |
| `.claude-pipeline-runs.jsonl` | Stage and run lifecycle events |
| `.observability/costs.jsonl` | Per-run cost log (also uploaded as observability artifact) |
| `.observability/tool-calls.jsonl` | Per-tool-call audit log: stage, traceId, tool, allow/deny decision (also uploaded) |
| `.claude-pipeline-review-raw-<ts>.txt` | Raw review_pr output when JSON parsing fails |