@nimblehq/claude-sdk
v1.3.0
Anthropic Claude Agent SDK automations (spec, tasks, implement, review, etc.)
Claude SDK automations
Anthropic-backed automations: spec, task breakdown, implement, create-pr, review-pr, test gap, changelog, release gate, and CLAUDE.md generation.
Consumer docs: GitHub Actions · npm / local
Stages
| Folder | Stage | Purpose |
|---|---|---|
| spec-generator/ | spec_generator | Request → cross-platform PRD |
| task-breakdown/ | task_breakdown | Spec → sprint-ready tasks |
| implement/ | implement | Branch, code, commit, push |
| create-pr/ | create_pr_draft + create_pr | Draft body, open PR, ensure labels, request CODEOWNERS reviewers |
| review-pr/ | review_pr | Inline review; structured verdict + deletions recommended; always COMMENT |
| test-gap-detector/ | test_gap_detector | Missing-test coverage notes on PR diffs |
| changelog-generator/ | changelog | Changelog from commits since last release tag |
| release-gate/ | release_gate | Pre-release safety check |
| claude-md-generator/ | claude_md_generator | Generate / refresh CLAUDE.md |
| pipeline/ | — | Matrix builder + summary for full-pipeline workflow |
Models per stage
| Stage | Model |
|---|---|
| triage, spec_generator, task_breakdown, implement, fix_pr_comments, claude_md_generator | claude-sonnet-4-6 |
| create_issue_draft, create_issue, create_pr_draft, create_pr, test_gap_detector, changelog | claude-haiku-4-5 |
| review_pr, release_gate | claude-opus-4-7 |
Defined in agent-utils.ts (MODEL_PER_STAGE). Override by passing model as a stage input.
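A minimal sketch of how the lookup and override could fit together. The stage names and model IDs are copied from the table above; the signature of modelForStage() and the "non-empty override wins" rule are assumptions, not the code in agent-utils.ts.

```typescript
// Stage names and model IDs come from the README table.
type Stage =
  | "triage" | "spec_generator" | "task_breakdown" | "implement"
  | "fix_pr_comments" | "claude_md_generator"
  | "create_issue_draft" | "create_issue" | "create_pr_draft" | "create_pr"
  | "test_gap_detector" | "changelog"
  | "review_pr" | "release_gate";

const SONNET = "claude-sonnet-4-6";
const HAIKU = "claude-haiku-4-5";
const OPUS = "claude-opus-4-7";

const MODEL_PER_STAGE: Record<Stage, string> = {
  triage: SONNET,
  spec_generator: SONNET,
  task_breakdown: SONNET,
  implement: SONNET,
  fix_pr_comments: SONNET,
  claude_md_generator: SONNET,
  create_issue_draft: HAIKU,
  create_issue: HAIKU,
  create_pr_draft: HAIKU,
  create_pr: HAIKU,
  test_gap_detector: HAIKU,
  changelog: HAIKU,
  review_pr: OPUS,
  release_gate: OPUS,
};

// Assumed signature: an explicit, non-empty override wins over the default.
function modelForStage(stage: Stage, override?: string): string {
  const trimmed = override?.trim();
  return trimmed ? trimmed : MODEL_PER_STAGE[stage];
}
```

Under these assumptions, modelForStage("review_pr") resolves to the Opus default, while passing a second argument replaces it for that call only.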
Knowledge layer
Nimble conventions are injected into every prompt by loadKnowledge(projectRoot). SDK defaults live in knowledge/shared/; projects override them per file in <repo>/.claude/knowledge/. Hard cap: 32 KB total.
| File | Purpose |
|---|---|
| git-conventions.md | Branch / commit / PR title format |
| pr-checklist.md | Definition of done for drafter + reviewer |
| issue-title-conventions.md | Issue title format |
| coding-standards.md | Implementation conventions, tests, error handling |
| security-guidelines.md | Secrets, auth, deps, logging |
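The two-layer override can be pictured as a per-file merge followed by a size check. This is a sketch only: mergeKnowledge() and assertWithinCap() are hypothetical names, and treating an over-cap result as an error is an assumption (the real loader may truncate instead).

```typescript
// Maps a knowledge file name (e.g. "git-conventions.md") to its content.
type KnowledgeFiles = Record<string, string>;

const MAX_KNOWLEDGE_BYTES = 32 * 1024; // 32 KB hard cap from the README

// A project file replaces the SDK default of the same name;
// defaults without a project counterpart are kept as-is.
function mergeKnowledge(shared: KnowledgeFiles, project: KnowledgeFiles): KnowledgeFiles {
  return { ...shared, ...project };
}

// Assumption: going over the cap throws. Returns the total UTF-8 size otherwise.
function assertWithinCap(files: KnowledgeFiles): number {
  const encoder = new TextEncoder();
  const total = Object.values(files).reduce(
    (sum, text) => sum + encoder.encode(text).length,
    0,
  );
  if (total > MAX_KNOWLEDGE_BYTES) {
    throw new Error(`Knowledge layer is ${total} bytes; cap is ${MAX_KNOWLEDGE_BYTES}`);
  }
  return total;
}
```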
Skills
Loaded from skills/ via combineSkills() in config/skills.ts.
| Skill | Stage |
|---|---|
| code-review-and-quality | review_pr |
| security-and-hardening | review_pr, release_gate |
| thermo-nuclear-code-quality-review | review_pr |
| spec-driven-development | spec_generator |
| api-and-interface-design | spec_generator |
| planning-and-task-breakdown | task_breakdown |
| incremental-implementation | implement (feature, chore) |
| test-driven-development | implement (bug, feature) |
| debugging-and-error-recovery | implement (bug) |
| shipping-and-launch | release_gate |
| documentation-and-adrs | changelog |
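For the implement stage, the table implies a per-work-type selection. The sketch below encodes only those three rows; the function body and the resulting skill order are assumptions about how skillsForWorkType in config/skills.ts might behave.

```typescript
type WorkType = "bug" | "feature" | "chore";

// Skill-to-work-type mapping taken from the implement rows of the table.
const IMPLEMENT_SKILLS: Record<string, WorkType[]> = {
  "incremental-implementation": ["feature", "chore"],
  "test-driven-development": ["bug", "feature"],
  "debugging-and-error-recovery": ["bug"],
};

// Returns the skills that apply to one work type, in table order.
function skillsForWorkType(workType: WorkType): string[] {
  return Object.entries(IMPLEMENT_SKILLS)
    .filter(([, workTypes]) => workTypes.includes(workType))
    .map(([skillName]) => skillName);
}
```

With this mapping, a bug gets test-driven-development plus debugging-and-error-recovery, while a chore gets incremental-implementation only.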
Shared modules
| Module | Purpose |
|---|---|
| agent-utils.ts | MODEL_PER_STAGE, modelForStage(), withModelFallback(), stage timeouts, lifecycle logging |
| knowledge.ts | Two-layer loader (SDK shared + project overrides) |
| state.ts | .claude-pipeline-state.json load/save; includes traceId; loadState() validates all required fields and throws on a corrupt file |
| git-env.ts | resolveGithubRepo, resolveBaseBranch (auto-detect from git) |
| config/client.ts | Anthropic SDK client; re-exports MODEL_PER_STAGE |
| config/skills.ts | Loads skills from skills/; skillsForWorkType per-workType composition |
| config/stacks.ts | Stack hints for implement + review_pr |
| config/cost-tracker.ts | Per-run cost logging to .observability/costs.jsonl |
| config/batch.ts | runBatchMessage() — Batch API wrapper with polling, fallback, and env-var tuning |
Environment variables
| Variable | Required | Description |
|---|---|---|
| ANTHROPIC_API_KEY | yes | Anthropic API key |
| GH_TOKEN | recommended | PAT with repo + workflow scopes for git push and gh CLI |
| LINEAR_API_KEY | no | Routes spec publishing to Linear |
| CLAUDE_SDK_DEBUG | no | Set to 1 to dump raw Anthropic SDK events to stderr |
| CLAUDE_MAX_USD_PER_STAGE | no | Global USD cap per stage. Default: see Cost & timeout gates below |
| CLAUDE_MAX_USD_<STAGE> | no | Per-stage USD override (e.g. CLAUDE_MAX_USD_IMPLEMENT=15). Wins over the global cap |
| DISABLE_BATCH_API | no | Set to true to fall back to synchronous messages.create() for changelog and test_gap_detector |
| BATCH_POLL_INTERVAL_MS | no | Milliseconds between Batch API status checks (default 10000) |
| BATCH_MAX_ATTEMPTS | no | Max Batch API poll iterations before timing out (default 12, ~120 s) |
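The three Batch API variables combine into a maximum wait of interval × attempts, which is where the ~120 s default comes from. A sketch of reading them, assuming a hypothetical readBatchConfig() helper and assuming invalid values fall back to the documented defaults (the behaviour of config/batch.ts is not shown in this README):

```typescript
type Env = Record<string, string | undefined>;

interface BatchConfig {
  disabled: boolean;
  pollIntervalMs: number;
  maxAttempts: number;
  maxWaitMs: number;
}

// Parses a positive integer; missing or invalid input yields the fallback.
function positiveInt(raw: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

function readBatchConfig(env: Env): BatchConfig {
  const pollIntervalMs = positiveInt(env.BATCH_POLL_INTERVAL_MS, 10_000);
  const maxAttempts = positiveInt(env.BATCH_MAX_ATTEMPTS, 12);
  return {
    disabled: env.DISABLE_BATCH_API === "true",
    pollIntervalMs,
    maxAttempts,
    // Upper bound on time spent polling before the wrapper gives up.
    maxWaitMs: pollIntervalMs * maxAttempts,
  };
}
```

Passing the environment as an argument, rather than reading a global, keeps the sketch easy to test; in a Node process the argument would typically be process.env.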
Cost & timeout gates
Two hard gates keep a runaway stage from burning unbounded API spend:
- Wall-clock timeout per stage: implement 30 min, review_pr / release_gate 15 min, drafts 10 min. Passed to runWithTimeout() at each call site.
- USD budget: every result message from the agent stream is checked against maxBudgetForStage(stage); the stage throws as soon as total_cost_usd exceeds the cap.
| Stage | Default cap |
|---|---|
| implement | $10 |
| review_pr, release_gate | $5 |
| all other stages | $2 |
Override with CLAUDE_MAX_USD_<STAGE> (per stage) or CLAUDE_MAX_USD_PER_STAGE (global). The per-stage value wins.
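The precedence described above (per-stage variable, then global variable, then built-in default) can be sketched as follows. maxBudgetForStage is named in this README, but its signature here, the upper-casing of the stage name into the variable suffix, and the handling of invalid values are assumptions.

```typescript
type Env = Record<string, string | undefined>;

// Default caps from the table above; every other stage falls back to $2.
const DEFAULT_CAP_USD: Record<string, number> = {
  implement: 10,
  review_pr: 5,
  release_gate: 5,
};
const FALLBACK_CAP_USD = 2;

// Returns a positive number, or undefined when the value is missing or invalid.
function parseUsd(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const value = Number(raw);
  return Number.isFinite(value) && value > 0 ? value : undefined;
}

// Precedence: CLAUDE_MAX_USD_<STAGE>, then CLAUDE_MAX_USD_PER_STAGE, then defaults.
function maxBudgetForStage(stage: string, env: Env): number {
  const perStage = parseUsd(env[`CLAUDE_MAX_USD_${stage.toUpperCase()}`]);
  const globalCap = parseUsd(env.CLAUDE_MAX_USD_PER_STAGE);
  return perStage ?? globalCap ?? DEFAULT_CAP_USD[stage] ?? FALLBACK_CAP_USD;
}

// Mirrors the gate: throw once the reported spend exceeds the cap.
function assertWithinBudget(stage: string, totalCostUsd: number, env: Env): void {
  const cap = maxBudgetForStage(stage, env);
  if (totalCostUsd > cap) {
    throw new Error(
      `${stage} exceeded its budget: $${totalCostUsd.toFixed(2)} > $${cap.toFixed(2)}`,
    );
  }
}
```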
When $GITHUB_STEP_SUMMARY is set, each stage appends a Stage | Duration | Cost | Runs | Status row so the spend is visible on the Actions run page.
Safety denylist (canUseTool)
Every query() call site passes a canUseTool hook from hooks/safety.ts that denies destructive Bash invocations before they reach the runner. Patterns are sourced from automations/shared/safety-denylist.ts — the same list Cursor SDK's block-destructive.js consumes. Blocks include rm -rf ~/.ssh, cat .env, curl … | sh, gh secret set, git filter-branch, --no-verify, force-push, and fork bombs.
The local copy at safety-denylist.ts is regenerated from the canonical file via npm run prebuild. Do not edit the local copy.
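As an illustration of the idea, a denylist check over a Bash command string might look like the sketch below. The five patterns are illustrative stand-ins written for this example, not the contents of automations/shared/safety-denylist.ts, and checkBashCommand() with its allow/deny result shape is a hypothetical helper rather than the hook in hooks/safety.ts.

```typescript
type PermissionDecision =
  | { behavior: "allow" }
  | { behavior: "deny"; message: string };

// Illustrative patterns only; the canonical list is not reproduced here.
const DENY_PATTERNS: { name: string; pattern: RegExp }[] = [
  { name: "pipe-to-shell", pattern: /\bcurl\b[^|]*\|\s*(sh|bash)\b/ },
  { name: "skip-hooks", pattern: /--no-verify\b/ },
  { name: "force-push", pattern: /\bgit\s+push\b.*(--force\b|\s-f\b)/ },
  { name: "read-dotenv", pattern: /\bcat\s+(\S*\/)?\.env\b/ },
  { name: "history-rewrite", pattern: /\bgit\s+filter-branch\b/ },
];

// Denies the command if any pattern matches; allows it otherwise.
function checkBashCommand(command: string): PermissionDecision {
  const hit = DENY_PATTERNS.find(({ pattern }) => pattern.test(command));
  return hit
    ? { behavior: "deny", message: `Blocked by safety denylist (${hit.name})` }
    : { behavior: "allow" };
}
```

A real hook would run a check like this only for Bash tool invocations and pass other tools through unchanged.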
Input sanitization
sanitize.ts (autogenerated from automations/shared/sanitize.ts via npm run prebuild) guards user-controlled text before it reaches any LLM prompt.
sanitizeUntrustedInput({ text, source }) — for free-form user content:
- Strips <tool>, <function_calls>, <invoke>, <result>, <param> XML markup (tag content preserved).
- Checks 16 denylist patterns: instruction overrides, role-change attacks, jailbreak keywords, system-tag injection. Returns [TRUNCATED: <source>] and logs [SECURITY] on match.
- Truncates at 4 000 chars and wraps clean text in <UNTRUSTED_USER_INPUT source="…"> boundary tags.
sanitizePipelineHint(value, source) — for short workflow metadata:
- Strips newlines, caps at 200 chars, applies the same denylist. Returns empty string on a match.
Currently wired into: implement (WORK_ITEM_TITLE, each ACCEPTANCE_CRITERIA), review_pr (PR_TITLE, PR_DESCRIPTION), spec_generator (SPEC_REQUEST, SPEC_CONTEXT).
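The two helpers can be approximated as below. This is a sketch under stated assumptions: the two denylist regexes are illustrative (the real module checks 16 patterns that this README does not list), newlines are replaced with a space rather than removed, and the closing boundary tag format is a guess.

```typescript
// Illustrative denylist; stands in for the 16 real patterns.
const DENYLIST: RegExp[] = [
  /ignore (all )?(previous|prior) instructions/i,
  /you are now\b/i,
];

// Opening and closing forms of the tool-style tags named above.
const TOOL_TAGS = /<\/?(tool|function_calls|invoke|result|param)\b[^>]*>/gi;

function matchesDenylist(text: string): boolean {
  return DENYLIST.some((pattern) => pattern.test(text));
}

// Free-form content: strip tags, check denylist, truncate, wrap.
function sanitizeUntrustedInput({ text, source }: { text: string; source: string }): string {
  const stripped = text.replace(TOOL_TAGS, ""); // tag content is preserved
  if (matchesDenylist(stripped)) {
    console.warn(`[SECURITY] denylist match in ${source}`);
    return `[TRUNCATED: ${source}]`;
  }
  const clipped = stripped.slice(0, 4000);
  return `<UNTRUSTED_USER_INPUT source="${source}">${clipped}</UNTRUSTED_USER_INPUT>`;
}

// Short metadata: single line, 200 chars max, empty string on a match.
function sanitizePipelineHint(value: string, source: string): string {
  const oneLine = value.replace(/[\r\n]+/g, " ").slice(0, 200);
  if (matchesDenylist(oneLine)) {
    console.warn(`[SECURITY] denylist match in ${source}`);
    return "";
  }
  return oneLine;
}
```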
Scripts
| Script | Purpose |
|---|---|
| scripts/pipeline.ts | Full pipeline CLI: spec → breakdown → implement → create-pr |
| scripts/test-pipeline.ts | Dry run (stages 1–2 only) |
| scripts/smoke-test.ts | Verifies ANTHROPIC_API_KEY and SDK wiring |
| cost-report.ts | Summary from .observability/costs.jsonl (bin: claude-sdk-cost-report) |
| scripts/test-implement.ts | Local implement test against TEST_REPO_PATH |
| scripts/test-review-pr.ts | Local review_pr smoke test |
| scripts/test-changelog-generator.ts | Local changelog test |
Observability
Cost log: .observability/costs.jsonl (gitignored). Every stage run appends one JSONL record via config/cost-tracker.ts (withCostTracking()). Fields: automation, runId, traceId, tokens, cost estimate, cache write/read tokens, duration, success/failure.
traceId is a UUID generated by pipeline/build-implementation-matrix.ts and emitted as trace_id to GITHUB_OUTPUT. The workflow passes it to each stage job as TRACE_ID; action files read it and thread it to withCostTracking() so all costs for one pipeline invocation share a common ID.
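Because every record carries the same traceId for one pipeline invocation, total spend per invocation falls out of a simple group-and-sum over the JSONL file. In this sketch traceId, runId and automation come from the field list above, while costUsd and success are assumed property names; parseJsonl() and costByTrace() are hypothetical helpers.

```typescript
interface CostRecord {
  automation: string;
  runId: string;
  traceId: string;
  costUsd: number; // assumed name for the cost estimate field
  success: boolean; // assumed name for the success/failure field
}

// One JSON object per non-empty line.
function parseJsonl(jsonl: string): CostRecord[] {
  return jsonl
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as CostRecord);
}

// Sums the cost of all stage runs that share a traceId.
function costByTrace(records: CostRecord[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const record of records) {
    totals.set(record.traceId, (totals.get(record.traceId) ?? 0) + record.costUsd);
  }
  return totals;
}
```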
Summary: npx claude-sdk-cost-report
The report includes per-automation cache hit rates, a write-to-read ROI line, and a break-even annotation. If cacheCreationTokens is 0 across all runs, the report prints a setup reminder — prompt caching is not active.
Every agent-running CI job uploads .observability/costs.jsonl as an observability-<runId>-<stage> artifact (30-day retention, if: always()). Download from the Actions run page to debug cost spikes or timeouts post-mortem.
Prompt caching
System prompts in spec_generator, task_breakdown, changelog, release_gate, and test_gap_detector are marked with cache_control: { type: 'ephemeral' }. The stable skill content (2–8 KB) is written to cache on first call and read at ~0.10× base input cost on subsequent calls within the 5-minute TTL window.
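In the Anthropic Messages API, a system prompt can be sent as an array of text blocks, and a cache_control marker on a block caches the prompt prefix up to and including that block. The sketch below builds such an array; the split into a stable block and a per-run block, and the buildSystemBlocks() name, are assumptions about how the stages might arrange their prompts.

```typescript
interface SystemTextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Stable content (skills, conventions) goes first and carries the cache marker;
// per-run instructions follow and stay uncached.
function buildSystemBlocks(stableSkillText: string, perRunText: string): SystemTextBlock[] {
  return [
    { type: "text", text: stableSkillText, cache_control: { type: "ephemeral" } },
    { type: "text", text: perRunText },
  ];
}
```

Keeping the cached block byte-for-byte identical between calls matters here, since any change to the prefix results in a new cache write instead of a read.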
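At Sonnet 4.6 pricing ($3.00/MTok input, $3.75/MTok cache write, $0.30/MTok cache read), a cache write costs 12.5× a cache read. Measured against uncached input, the write premium is $0.75/MTok and each read saves $2.70/MTok, so caching pays for itself from the first read that lands inside the TTL window. Stages that run several times within that window recover the write cost immediately; runs spaced further apart than the TTL pay the write price again.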
Publishing (maintainers)
Driven by sdk-publish.yml, triggered on a published GitHub Release with a claude-sdk/vX.Y.Z tag.
One-time setup
- secrets.NPM_TOKEN: npm automation token with publish access to the @nimblehq scope.
- The @nimblehq scope must exist on npm and the publishing user must have developer/owner access.
Procedure
- Bump version in package.json and add a matching entry to CHANGELOG.md. Merge to main.
- Run Create GitHub Release → select claude-sdk. Version is read from package.json; no input needed.
The workflow reads the version from automations/claude-sdk/package.json, creates the claude-sdk/vX.Y.Z tag, publishes the release, and triggers sdk-publish.yml to build and push to npm and GitHub Packages. sdk-publish.yml re-checks that the tag matches package.json before publishing.
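The tag re-check amounts to comparing the release tag with the claude-sdk/vX.Y.Z string derived from package.json. The workflow most likely does this in a shell step; the TypeScript below is only a sketch of the comparison, with hypothetical function names and an assumed strict X.Y.Z version format.

```typescript
// Tag format from the README: claude-sdk/vX.Y.Z
function expectedTag(packageVersion: string): string {
  return `claude-sdk/v${packageVersion}`;
}

// Throws when the tag and the package.json version disagree.
function assertTagMatchesVersion(tag: string, packageVersion: string): void {
  // Assumption: plain X.Y.Z only, no pre-release suffixes.
  if (!/^\d+\.\d+\.\d+$/.test(packageVersion)) {
    throw new Error(`Unexpected version format: ${packageVersion}`);
  }
  const wanted = expectedTag(packageVersion);
  if (tag !== wanted) {
    throw new Error(`Tag ${tag} does not match package.json (${wanted})`);
  }
}
```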
Diagnostics
| File | Contents |
|---|---|
| .claude-pipeline-state.json | Stage handoff state |
| .claude-pipeline-runs.jsonl | Stage and run lifecycle events |
| .observability/costs.jsonl | Per-run cost log (also uploaded as observability artifact) |
| .claude-pipeline-review-raw-<ts>.txt | Raw review_pr output when JSON parsing fails |
