autoctx
v0.5.1
Published
autocontext — always-on agent evaluation harness
Maintainers
Readme
autoctx — autocontext TypeScript Package
autoctx is the Node/TypeScript package for autocontext. It provides the operator-facing CLI, simulation, investigation, analysis, mission, and trace surfaces for Node environments:
The intended use is to point the harness at a real task, simulation, investigation, or mission, let it produce a rich execution history, and then use the returned traces, reports, datasets, packages, and artifacts to improve or operationalize that workflow.
Need the canonical product/runtime vocabulary first? Start with docs/concept-model.md.
- Scenario execution: run generation loops with tournament scoring and Elo progression
- Simulation surface: plain-language simulations with sweeps, replay, compare, and export
- Investigation surface: evidence-driven diagnosis with hypotheses and confidence scoring
- Analysis surface: interpret and compare runs, simulations, investigations, and missions
- Mission surface: adaptive execution, mission artifacts, and verifier-driven control plane
- Knowledge system: versioned playbooks, score trajectories, session reports, dead-end tracking
- Interactive server: HTTP API, WebSocket control plane, bundled Ink TUI
- MCP control plane: 40+ tools covering scenarios, runs, knowledge, evaluation, feedback, solve, sandbox, and export
- Provider routing: Anthropic, OpenAI-compatible, Gemini, Mistral, Groq, OpenRouter, Azure OpenAI, Ollama, vLLM, Hermes, Pi, Pi-RPC, deterministic
- Evaluation: one-shot judging, multi-round improvement loops, REPL-loop sessions
- Package management: strategy package export/import, training data export
- Training hook surface: dataset validation and executor-backed
trainentry point - Runtime workspace/session primitives: workspace-scoped filesystem/shell contracts, scoped command/tool grants with redacted lifecycle events, runtime session event logs, child-task lineage helpers, a runtime-session facade, and AgentRuntime session recording
- Experimental agent handler surface at
autoctx/agent-runtime: discover and invoke.autoctx/agents/*.tshandlers backed by runtime sessions - Production-traces emit SDK at
autoctx/production-traces— customer-facing emit APIs mirroring the Python SDK (A2-II-a)
Runtime command grants are host-created capability handles, not prompt text.
Trusted env values are injected only into the grant handler or local child
process. Local grant wrappers do not inherit process.env; callers must opt in
with an explicit inheritEnv allowlist. Runtime-session logs record
start/end/error events with command name, args summary, exit code, and
redaction metadata; they redact values from the exact env supplied to the
grant. Prompt-scoped grants last only for that runtime call. Child tasks receive
grants only when the caller passes grants to runChildTask() or when an
already-granted workspace contains grants whose policy allows child-task
inheritance.
TypeScript callers can also adapt remote streamable-HTTP MCP servers into
scoped runtime tool grants with connectMcpRuntimeTools(). The trusted host
code owns the MCP URL and headers, discovered tool names are normalized with
collision suffixes, input schemas are preserved, and tool results are converted
to model-safe text while keeping structured content available to callers:
import { createInMemoryWorkspaceEnv } from "autoctx";
import { connectMcpRuntimeTools } from "autoctx/runtimes/mcp";
const mcpTools = await connectMcpRuntimeTools({
url: "https://mcp.example.com/rpc",
headers: { Authorization: `Bearer ${process.env.MCP_TOKEN ?? ""}` },
namePrefix: "docs",
});
const workspace = await createInMemoryWorkspaceEnv({ cwd: "/repo" }).scope({
tools: mcpTools.tools,
});
const result = await workspace.tools?.[0]?.execute?.({ q: "runtime sessions" });
await mcpTools.close();The package also includes an experimental programmable-agent authoring surface
at autoctx/agent-runtime. It discovers handlers from .autoctx/agents only
and avoids colliding with .autoctx/skills, scenario directories, or hosted
deployment concerns. The invoker supplies payload, env, workspace, and an
AgentRuntime; env values are available to trusted handler code but are not
automatically inserted into prompts. The package uses its bundled tsx loader
for .ts, .tsx, and .mts agent files on Node 18+.
// .autoctx/agents/support.ts
import type { AutoctxAgentContext } from "autoctx/agent-runtime";
type Payload = { threadId?: string; message: string };
export const triggers = { webhook: true };
export default async function ({ id, init, payload }: AutoctxAgentContext<Payload>) {
const runtime = await init();
const session = await runtime.session(payload.threadId ?? id ?? "default");
return session.prompt(payload.message, { role: "support-triager" });
}import { discoverAutoctxAgents, invokeAutoctxAgent, loadAutoctxAgent } from "autoctx/agent-runtime";
import { createInMemoryWorkspaceEnv } from "autoctx";
const [entry] = await discoverAutoctxAgents({ cwd: process.cwd() });
const agent = await loadAutoctxAgent(entry);
const result = await invokeAutoctxAgent(agent, {
payload: { threadId: "ticket-123", message: "Please triage this ticket." },
env: { SUPPORT_TOKEN: process.env.SUPPORT_TOKEN },
runtime: myAgentRuntime,
workspace: createInMemoryWorkspaceEnv({ cwd: "/repo" }),
});For local iteration, the npm CLI can invoke the same handlers by name or expose
a tiny dev server. Env file loading is explicit: pass --env FILE; values
already set in the shell win over values in that file.
autoctx agent run support \
--id ticket-123 \
--payload '{"threadId":"ticket-123","message":"Please triage this ticket."}' \
--env .env.local \
--json
autoctx agent dev --port 3583 --env .env.local
curl http://127.0.0.1:3583/manifest
curl -X POST http://127.0.0.1:3583/agents/support/invoke \
-H 'content-type: application/json' \
-d '{"id":"ticket-123","payload":{"message":"Please triage this ticket."}}'The TypeScript package includes mirrored deterministic semantic prompt
compaction for long-lived playbooks, trajectories, and session reports.
Standalone npm runs compact prompt context before the coarse budget fallback,
then record Pi-shaped entries via the ArtifactStore ledger contract:
appendCompactionEntries(), readCompactionEntries(), and
latestCompactionEntryId() persist/read runs/<run_id>/compactions.jsonl plus
the compactions.latest sidecar for cheap latest-entry lookups.
The TypeScript runtime also mirrors the Python extension hook bus for
standalone npm runs. Set AUTOCONTEXT_EXTENSIONS to a comma-separated list of
JavaScript/ESM modules or module:callable targets, and set
AUTOCONTEXT_EXTENSION_FAIL_FAST=true when hook errors should stop the run.
Extensions receive ordered Pi-shaped events for run lifecycle, context
assembly, semantic compaction, provider calls, judge calls, and artifact
writes:
export function register(api) {
api.on("context", (event) => ({
roles: {
...event.payload.roles,
competitor: `${event.payload.roles.competitor}\nPrefer concise, testable strategies.`,
},
}));
}Install
npm install autoctxThe current npm release line is [email protected].
Important: use autoctx, not autocontext.
autocontext on npm is a different package and not this project.
From source:
cd ts
npm install
npm run buildEmit SDK: autoctx/production-traces
Customer applications can emit production traces directly from their
TypeScript code using the autoctx/production-traces subpath. This is the
TS mirror of the Python autocontext.production_traces emit module;
customers using both languages get one mental model, enforced at the byte
level by cross-runtime property tests.
import {
buildTrace,
writeJsonl,
TraceBatch,
hashUserId,
loadInstallSalt,
} from "autoctx/production-traces";
// 1) Hash personally-identifying identifiers with the install salt.
const salt = (await loadInstallSalt(process.cwd())) ?? "";
const userIdHash = hashUserId(session.user.id, salt);
// 2) Build and validate a ProductionTrace. Throws ValidationError on
// invalid input with per-field detail.
const trace = buildTrace({
provider: "openai",
model: "gpt-4o-mini",
messages: [{ role: "user", content: prompt, timestamp: new Date().toISOString() }],
timing: {
startedAt: "2026-04-17T12:00:00.000Z",
endedAt: "2026-04-17T12:00:01.250Z",
latencyMs: 1250,
},
usage: { tokensIn: 42, tokensOut: 88 },
env: { environmentTag: "production", appId: "my-app" },
session: { userIdHash },
});
// 3) Persist: one file per call, or batch across many calls.
writeJsonl(trace);
const batch = new TraceBatch();
for (const event of stream) batch.add(buildTrace(/* ... */));
batch.flush(); // writes accumulated traces as one fileBoth ESM and CommonJS consumers are supported via the "exports" map:
// ESM
import { buildTrace } from "autoctx/production-traces";
// CJS
const { buildTrace } = require("autoctx/production-traces");Zero telemetry
Traces go where you put them. The SDK itself emits zero telemetry
about its own usage. No analytics, no phone-home, no opt-out toggle
needed. CI script check:no-telemetry greps the SDK source plus every
transitive dep for suspicious network patterns on every PR.
Enterprise-discipline guarantees
- Bundle size: ~48 kB gzipped at ship, enforced in CI at a
100 kB ceiling. Tree-shakable via
"sideEffects"discipline. - License compatibility: every dep in the SDK's transitive closure
carries an MIT / Apache-2.0 / BSD / ISC / 0BSD license, enforced by
check:license-compatibility. - No install scripts:
autoctxdeclares nopreinstall,install, orpostinstalllifecycle hooks. Safe to deploy withnpm install --ignore-scripts. - Dual ESM + CJS: both
importandrequire()work via thepackage.json"exports"map. - Cross-runtime parity: TS
buildTraceand Pythonbuild_traceproduce byte-identical canonical JSON, enforced at 50 property runs plus 7 committed fixtures.
See src/production-traces/sdk/STABILITY.md
for the API-stability commitment and
src/production-traces/sdk/BUDGET.md
for the bundle-size budget details.
CLI Commands
The package ships a full autoctx CLI with commands including:
# Project setup and discovery
autoctx init
autoctx capabilities
autoctx login
autoctx whoami
autoctx logout
autoctx providers
autoctx models
# Scenario execution
autoctx solve "improve customer-support replies for billing disputes" --iterations 3 --json
autoctx run support_triage --iterations 3 --json
autoctx run --scenario support_triage --iterations 3 --json
autoctx list --json
autoctx runtime-sessions list --limit 10
autoctx runtime-sessions show --run-id <run-id> --json
autoctx runtime-sessions timeline --run-id <run-id> --json
autoctx context-selection --run-id <run-id> --json
autoctx agent run support --id ticket-123 --payload '{"message":"Please triage this."}' --env .env.local --json
autoctx agent dev --port 3583 --env .env.local
autoctx status <run-id>
autoctx show <run-id> --best
autoctx watch <run-id>
autoctx replay --run-id <id> --generation 1
autoctx benchmark --scenario support_triage --runs 5
# Package management
autoctx export <run-id> --output pkg.json
autoctx export-training-data --run-id <id> --output data.jsonl
autoctx import-package --file pkg.json
autoctx new-scenario --description "Test summarization quality"
autoctx new-scenario --template prompt-optimization --name support_triage
# Interactive, simulations, and missions
autoctx tui [--port 8000]
autoctx serve [--port 8000] [--json] # HTTP API
autoctx worker [--poll-interval 5] [--concurrency 2]
autoctx mcp-serve # MCP server on stdio
autoctx simulate -d "simulate deploying a web service with rollback"
autoctx simulate -d "simulate escalation thresholds" --sweep max_escalations=1:5:1
autoctx investigate -d "why did conversion drop after Tuesday's release"
autoctx investigate -d "checkout is failing" --browser-url https://status.example.com
autoctx analyze --id deploy_sim --type simulation
autoctx analyze --left sim_a --right sim_b --type simulation
autoctx trace-findings --trace ./trace.json # Markdown report
autoctx trace-findings --trace ./trace.json --json # TraceFindingReport JSON
autoctx mission create --name "Ship login" --goal "Implement OAuth"
autoctx mission create --type code --name "Fix login" --goal "Tests pass" --repo-path . --test-command "npm test"
autoctx mission run --id <mission-id> --max-iterations 3
autoctx mission status --id <mission-id>
autoctx mission artifacts --id <mission-id>
autoctx train --scenario support_triage --dataset data.jsonl --backend cuda
# Evaluation
autoctx judge -p <prompt> -o <output> -r <rubric>
autoctx judge --scenario my_saved_task -o <output>
autoctx improve -p <prompt> -r <rubric> [-n rounds]
autoctx improve -p <prompt> -o <output> -r <rubric> [-n rounds]
autoctx improve --scenario my_saved_task [-o <output>]
autoctx repl --scenario my_saved_task
# Task queue
autoctx queue -s <spec> [--priority N] [--browser-url https://status.example.com]
autoctx worker --poll-interval 5 --concurrency 2
autoctx statusStateful persistent providers, including persistent Pi RPC, run with effective worker concurrency 1 to keep long-lived runtime sessions isolated.
Provider Configuration
Configure the agent provider via environment variables:
# Anthropic (default)
ANTHROPIC_API_KEY=sk-ant-... autoctx run support_triage --json
# OpenAI-compatible
AUTOCONTEXT_AGENT_PROVIDER=openai-compatible \
AUTOCONTEXT_AGENT_API_KEY=sk-... \
AUTOCONTEXT_AGENT_BASE_URL=https://api.openai.com/v1 \
autoctx run support_triage --json
# Role-scoped override: competitor uses a separate gateway/key
AUTOCONTEXT_AGENT_PROVIDER=anthropic \
ANTHROPIC_API_KEY=sk-ant-primary \
AUTOCONTEXT_COMPETITOR_PROVIDER=openai-compatible \
AUTOCONTEXT_COMPETITOR_API_KEY=sk-role \
AUTOCONTEXT_COMPETITOR_BASE_URL=http://localhost:8000/v1 \
autoctx run support_triage --json
# Ollama (local)
AUTOCONTEXT_AGENT_PROVIDER=ollama autoctx run support_triage --json
# Hermes (via OpenAI-compatible gateway)
AUTOCONTEXT_AGENT_PROVIDER=openai-compatible \
AUTOCONTEXT_AGENT_BASE_URL=http://localhost:8080/v1 \
AUTOCONTEXT_AGENT_DEFAULT_MODEL=hermes-3-llama-3.1-8b \
autoctx run support_triage --json
# Hermes shortcut provider (same gateway path, Hermes defaults)
AUTOCONTEXT_AGENT_PROVIDER=hermes \
AUTOCONTEXT_AGENT_BASE_URL=http://localhost:8080/v1 \
autoctx run support_triage --json
# Claude CLI (local authenticated Claude Code runtime)
AUTOCONTEXT_AGENT_PROVIDER=claude-cli \
AUTOCONTEXT_CLAUDE_MODEL=sonnet \
autoctx run support_triage --json
# Codex CLI (local authenticated Codex runtime)
AUTOCONTEXT_AGENT_PROVIDER=codex \
AUTOCONTEXT_CODEX_MODEL=o4-mini \
autoctx run support_triage --json
# Pi CLI
AUTOCONTEXT_AGENT_PROVIDER=pi autoctx run support_triage --json
# Pi RPC with one long-lived subprocess
AUTOCONTEXT_AGENT_PROVIDER=pi-rpc \
AUTOCONTEXT_PI_RPC_PERSISTENT=true \
autoctx run support_triage --json
# Deterministic (CI/testing)
AUTOCONTEXT_AGENT_PROVIDER=deterministic autoctx run support_triage --jsonANTHROPIC_API_KEY is the preferred Anthropic credential env var. AUTOCONTEXT_ANTHROPIC_API_KEY remains supported as a compatibility alias.
Supported providers: anthropic, openai, openai-compatible, gemini, mistral, groq, openrouter, azure-openai, ollama, vllm, hermes, claude-cli, codex, pi, pi-rpc, deterministic.
Programmatic callers can pass runtimeSession, runtimeSessionRole,
runtimeSessionCwd, and runtimeSessionCommands to createProvider() for
CLI-backed providers (claude-cli, codex, pi, pi-rpc). Those options
wrap the underlying AgentRuntime at the provider bridge so provider
completions are recorded in the RuntimeSession event log.
autoctx run also creates a run-scoped runtime session automatically for
CLI-backed providers and persists it in the configured SQLite database. Use
autoctx runtime-sessions list and autoctx runtime-sessions show
--run-id <run-id> --json to inspect those recorded provider prompts,
messages, and child-task events. Use autoctx runtime-sessions timeline
--run-id <run-id> --json when operators need a grouped prompt/response and
child-task timeline instead of the raw event log. autoctx status <run-id> --json,
autoctx show <run-id> --json, and autoctx watch <run-id> --json include a
runtime_session summary when that persisted log exists. HTTP clients can read
the same data from GET /api/cockpit/runtime-sessions,
GET /api/cockpit/runtime-sessions/:session_id, and
GET /api/cockpit/runs/:run_id/runtime-session; timeline views are available
at GET /api/cockpit/runtime-sessions/:session_id/timeline and
GET /api/cockpit/runs/:run_id/runtime-session/timeline. Cockpit run list, status, and
resume responses also include runtime_session (a summary or null) plus
runtime_session_url for direct log discovery. The interactive server emits
live runtime updates on /ws/events as runtime_session_event envelopes on the
runtime_session channel; each payload includes the current session summary and
the appended event.
Inside autoctx tui, operators can run /timeline <run-id> to render the same
runtime-session timeline in the recent-activity pane. If a run is active,
/timeline uses that active run id. The TUI recent-activity feed also
summarizes live runtime-session prompt, assistant, shell, tool, and child-task
events as they arrive. Use
/activity [status|reset|<all|runtime|prompts|commands|children|errors> [quiet|normal|verbose]]
to focus that live feed and tune how much detail each runtime event includes.
Those activity settings are saved in the resolved autoctx config directory and
reloaded when the TUI starts again; /activity reset clears the saved
preference and returns the feed to all normal. On startup, Recent Activity
logs the loaded activity setting before the command help. Bare /activity and
/activity status report the current setting without rewriting the saved
preference.
autoctx simulate and autoctx investigate require a configured provider for spec generation. If you want synthetic placeholder behavior for CI/testing, select the deterministic provider explicitly instead of relying on implicit fallback.
Key environment variables:
| Variable | Purpose |
| -------------------------------------------------------------------- | ----------------------------------------------------------- |
| AUTOCONTEXT_AGENT_PROVIDER | Agent provider selection |
| AUTOCONTEXT_AGENT_API_KEY | Global API key override (or use provider-specific env vars) |
| AUTOCONTEXT_AGENT_BASE_URL | Global base URL override for compatible providers |
| AUTOCONTEXT_AGENT_DEFAULT_MODEL | Override default model |
| AUTOCONTEXT_COMPETITOR_API_KEY / AUTOCONTEXT_COMPETITOR_BASE_URL | Optional competitor-specific credential/endpoint override |
| AUTOCONTEXT_ANALYST_API_KEY / AUTOCONTEXT_ANALYST_BASE_URL | Optional analyst-specific credential/endpoint override |
| AUTOCONTEXT_COACH_API_KEY / AUTOCONTEXT_COACH_BASE_URL | Optional coach-specific credential/endpoint override |
| AUTOCONTEXT_ARCHITECT_API_KEY / AUTOCONTEXT_ARCHITECT_BASE_URL | Optional architect-specific credential/endpoint override |
| AUTOCONTEXT_CLAUDE_MODEL | Claude CLI model alias override |
| AUTOCONTEXT_CODEX_MODEL | Codex CLI model override |
| AUTOCONTEXT_PI_RPC_PERSISTENT | Reuse one Pi RPC subprocess across provider calls |
| AUTOCONTEXT_CONFIG_DIR | Override where login / whoami read saved credentials |
| AUTOCONTEXT_DB_PATH | SQLite database path |
Credential resolution order is:
- Environment variables
- CLI flags
- Project config (
.autoctx.json) - Credential store (
~/.config/autoctx/credentials.json)
Project Defaults
autoctx init scaffolds a .autoctx.json file in your project. When present, the CLI uses it for:
- Default provider selection
- Default model preference
- Default scenario for
run,benchmark, andexport - Project
runs/andknowledge/roots - The default SQLite database location under the configured
runs_dir
autoctx init also writes an AGENTS.md block with the recommended local autocontext workflow.
autoctx capabilities returns structured JSON describing commands, providers, scenarios, the canonical concept model, and project-specific state such as the current project config, active runs, and knowledge directory summary.
autoctx login can prompt interactively for provider credentials. autoctx login --provider ollama validates that a local Ollama server is reachable before persisting the connection details, and autoctx logout clears the stored credentials.
autoctx replay writes the selected generation and available generations to stderr before printing the replay JSON payload. autoctx export-training-data writes progress updates to stderr while keeping JSONL records on stdout.
Saved custom scenarios under knowledge/_custom_scenarios/ can be reused directly from the TS CLI. Saved parametric scenarios can now be targeted by name in run and benchmark, while saved agent-task scenarios remain directly usable in judge, improve, repl, and queue without retyping their prompt and rubric.
Control-Plane Strategy Identity
autoctx candidate register records deterministic strategy identity metadata for
each candidate artifact. The identity includes a canonical strategy fingerprint,
per-file component fingerprints, parent strategy fingerprints, and an exact or
near duplicate assessment when the candidate matches an existing strategy surface
for the same scenario and actuator. Environments do not split this identity
surface: a repeated strategy in staging is still a duplicate of the same
scenario/actuator strategy from production.
autoctx candidate register \
--scenario grid_ctf \
--actuator prompt-patch \
--payload ./payload \
--output jsonautoctx candidate show <artifact-id> --output json returns the full
strategyIdentity block. autoctx candidate list --output json includes compact
strategyFingerprint and duplicateKind fields for automation.
If a newly registered candidate is an exact or near duplicate of a disabled or
already-quarantined strategy, the artifact also records strategyQuarantine.
Promotion decisions reject quarantined strategies as non-promotion evidence, and
autoctx candidate list --output json includes quarantineReason for quick
triage. Operational memory can also skip findings tied to quarantined strategy
fingerprints before they are rendered back into agent context. Older artifacts
without strategyIdentity can still seed exact duplicate/quarantine checks via
their content-addressed payloadHash.
Control-Plane Eval Tracks
autoctx eval attach accepts --track verified|experimental for attached
EvalRuns. Verified runs are promotion-grade evidence; experimental runs remain
inspectable but are rejected by promotion decisions by default.
autoctx eval attach <artifact-id> \
--suite prod-eval \
--metrics metrics.json \
--dataset-provenance dataset.json \
--track experimental \
--output jsonautoctx eval list <artifact-id> --output json includes the effective track for
each EvalRun. Legacy clean EvalRuns without an explicit track are reported as
verified; non-clean integrity still blocks ingestion and promotion evidence.
For accepted strategy or harness changes, promotion can also require explicit ablation evidence. Attach an ablation verification object to the EvalRun:
{
"status": "passed",
"targets": ["strategy", "harness"],
"verifiedAt": "2026-05-13T12:00:00.000Z",
"evidenceRefs": ["runs/ablation/run_1.json"]
}autoctx eval attach <artifact-id> \
--suite prod-eval \
--metrics metrics.json \
--dataset-provenance dataset.json \
--ablation-verification ablation.jsonAblation checks are opt in at decision time:
autoctx promotion decide <artifact-id> \
--baseline auto \
--require-ablation \
--ablation-targets strategy,harness \
--output jsonWhen enabled, the PromotionDecision includes an ablationVerification
assessment and fails if the latest candidate EvalRun is missing evidence, has a
failed or incomplete status, or does not cover every required target.
Control-Plane Harness Proposals
autoctx harness proposal create records evidence-backed harness/context change
proposals before they can affect the loop. A proposal carries finding lineage,
the target surface, concrete patches, expected impact, rollback criteria, and
provenance.
autoctx harness proposal create \
--finding finding-1 \
--surface prompt \
--summary "tighten verifier-facing prompt" \
--patches patches.json \
--expected-impact impact.json \
--rollback "revert if heldout quality drops" \
--output jsonautoctx harness proposal decide gates a proposal against the candidate
artifact's EvalRun evidence for the requested suite. heldout and fresh
validation can accept or reject the proposal when compared with matching-suite
baseline evidence; dev or missing-baseline validation stays inconclusive.
Promotion-grade decisions must also include at least one --evidence-ref;
omitting evidence refs keeps the durable proposal decision inconclusive.
autoctx harness proposal decide <proposal-id> \
--candidate <artifact-id> \
--baseline <artifact-id>|auto|none \
--validation heldout \
--suite prod-heldout \
--evidence-ref runs/heldout/candidate.json \
--output jsonUse autoctx harness proposal list --output json for a compact review queue and
autoctx harness proposal show <proposal-id> --output json for the full durable
record under .autocontext/harness-proposals/.
MCP Tools (40+)
mcp-serve starts the MCP server on stdio with tools across these families:
| Family | Tools | | ------------- | -------------------------------------------------------------------------------------------------------------------------------------- | | Scenarios | list_scenarios, get_scenario, validate_strategy, run_match, run_tournament, run_scenario | | Runs | list_runs, get_run_status, get_generation_detail, list_runtime_sessions, get_runtime_session, get_runtime_session_timeline, run_replay | | Knowledge | get_playbook, read_trajectory, read_hints, read_analysis, read_tools, read_skills | | Evaluation | evaluate_output, run_improvement_loop, run_repl_session, generate_output | | Task queue | queue_task, get_queue_status, get_task_result | | Export/Search | export_skill, export_package, import_package, list_solved, search_strategies | | Feedback | record_feedback, get_feedback | | Solve | solve_scenario, solve_status, solve_result | | Sandbox | sandbox_create, sandbox_run, sandbox_status, sandbox_playbook, sandbox_list, sandbox_destroy | | Agent tasks | create_agent_task, list_agent_tasks, get_agent_task | | Missions | create_mission, mission_status, mission_result, mission_artifacts, pause_mission, resume_mission, cancel_mission | | Discovery | capabilities |
create_mission and autoctx mission create both support a code-mission variant with type=code plus repo_path / test_command (and optional lint_command / build_command) so mission success is tied to external checks instead of model self-report.
Claude Code integration
{
"mcpServers": {
"autocontext": {
"command": "npx",
"args": ["autoctx", "mcp-serve"],
"env": {
"ANTHROPIC_API_KEY": "sk-ant-..."
}
}
}
}Library Usage
import { createProvider, LLMJudge, ImprovementLoop, SimpleAgentTask } from "autoctx";
// One-shot evaluation
const provider = createProvider({ providerType: "anthropic", apiKey: "sk-ant-..." });
const judge = new LLMJudge({ provider, rubric: "Score clarity and correctness." });
const result = await judge.evaluate({
taskPrompt: "Explain binary search.",
agentOutput: "Binary search halves the search space each step.",
});
// Multi-round improvement
const task = new SimpleAgentTask(
"Draft a support reply for a billing dispute.",
"Score accuracy, policy compliance, and tone.",
provider,
);
const loop = new ImprovementLoop({ task, maxRounds: 3, qualityThreshold: 0.9 });
const improved = await loop.run({
initialOutput: "We can help with that billing issue.",
state: {},
});TS / Python Scope
The TypeScript package includes the current 0.4.x operator-facing surfaces:
simulateinvestigateanalyzecontext-selectiontrace-findings— extract structured findings (TraceFindingReport) from aPublicTraceJSON file (AC-679)missiontrainas a validation plus executor-hook surface
campaign now ships as a first-class TypeScript CLI/API/MCP workflow for multi-mission coordination.
context-selection reads persisted per-run context-selection artifacts and
renders the same budget, semantic compaction cache, diagnostics, and selected
context telemetry cards exposed through Cockpit HTTP.
For end-to-end local MLX/CUDA training, the Python package is still the canonical out-of-the-box runtime.
Browser Exploration Contract
The TypeScript package exposes the shared browser exploration contract and policy helpers from the package root. Browser exploration is disabled by default and configured through AUTOCONTEXT_BROWSER_* settings such as AUTOCONTEXT_BROWSER_ENABLED, AUTOCONTEXT_BROWSER_ALLOWED_DOMAINS, and AUTOCONTEXT_BROWSER_PROFILE_MODE.
Use resolveBrowserSessionConfig(...), evaluateBrowserActionPolicy(...), and the validateBrowser* helpers when integrating a browser backend or agent harness.
When browser exploration is enabled, the TS CLI can capture a policy-gated Chrome DevTools Protocol snapshot and attach it as evidence for autoctx investigate --browser-url <url>. Queued agent tasks can also store --browser-url; the runner resolves it through an injected browser-context service so enterprise deployments can keep browser access disabled by default, domain-scoped, and audit-artifact backed.
Python-Only Commands
These workflows require infrastructure not available in the npm package:
ecosystem— Multi-provider cyclingab-test— Requires ecosystem runnerresume/wait— Run recovery infrastructurehermes inspect/hermes export-skill— Hermes v0.12 Curator inspection and Hermes skill exporttrigger-distillation— Training pipeline- Monitor conditions — Monitoring engine
train is exposed in the TS CLI as a validation plus executor-hook surface, but the npm package does not bundle a real MLX/CUDA trainer. For end-to-end local training, use the Python package (pip install autocontext) or inject a real TrainingRunner executor from code.
OpenAI integration
Autocontext ships a zero-configuration OpenAI instrumentation path that
automatically wraps your existing new OpenAI(...) calls and emits structured
traces to a sink of your choice.
1. Register detectors
Create .autoctx.instrument.config.mjs at the root of your repo:
// .autoctx.instrument.config.mjs
import { registerDetectorPlugin } from "autoctx/control-plane/instrument";
import { plugin as openaiTsPlugin } from "autoctx/detectors/openai-ts";
registerDetectorPlugin(openaiTsPlugin);2. Run instrument
Preview changes without touching any files:
autoctx instrument --dry-runApply changes on a new branch for review:
autoctx instrument --apply --branch autoctx/instrument3. Review the PR
The instrument command opens a branch. Open the PR and review the diff — you
will see your new OpenAI(...) calls wrapped with instrumentClient(...).
Edit the generated TODO comment to point at your FileSink:
// Before (generated):
const client = instrumentClient(new OpenAI(), { sink: /* TODO: pass your TraceSink here */ });
// After (your edit):
import { FileSink } from "autoctx/integrations/openai";
const sink = new FileSink("./traces/openai.jsonl");
const client = instrumentClient(new OpenAI(), { sink });Merge the PR.
4. Customer code emits traces
Your code is unchanged beyond the wrap. Every chat.completions.create call
now emits a JSONL trace line to your sink:
import OpenAI from "openai";
import { instrumentClient, FileSink, autocontextSession } from "autoctx/integrations/openai";
const sink = new FileSink("./traces/openai.jsonl");
const client = instrumentClient(new OpenAI(), { sink });
await autocontextSession({ userId: "u_123" }, async () => {
const res = await client.chat.completions.create({
model: "gpt-4o",
messages: [{ role: "user", content: "Hello!" }],
});
console.log(res.choices[0].message.content);
});
await sink.close();Emitted trace line (pretty-printed for readability):
{
"schemaVersion": "1.0",
"traceId": "...",
"sessionContext": {
"userId": "u_123"
},
"request": {
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": "Hello!"
}
]
},
"response": {
"id": "...",
"choices": [
{
"message": {
"role": "assistant",
"content": "Hi! How can I help?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 9,
"completion_tokens": 7,
"total_tokens": 16
}
},
"durationMs": 342,
"errorReason": null
}For the Python equivalent, see
autocontext/src/autocontext/integrations/openai/STABILITY.md.
Anthropic integration
Autocontext ships a zero-configuration Anthropic instrumentation path that
automatically wraps your existing new Anthropic(...) calls and emits structured
traces to a sink of your choice.
1. Register detectors
Create .autoctx.instrument.config.mjs at the root of your repo:
// .autoctx.instrument.config.mjs
import { registerDetectorPlugin } from "autoctx/control-plane/instrument";
import { plugin as anthropicTsPlugin } from "autoctx/detectors/anthropic-ts";
registerDetectorPlugin(anthropicTsPlugin);2. Run instrument
Preview changes without touching any files:
autoctx instrument --dry-runApply changes on a new branch for review:
autoctx instrument --apply --branch autoctx/instrument3. Review the PR
The instrument command opens a branch. Open the PR and review the diff — you
will see your new Anthropic(...) calls wrapped with instrumentClient(...).
Edit the generated TODO comment to point at your FileSink:
// Before (generated):
const client = instrumentClient(new Anthropic(), { sink: /* TODO: pass your TraceSink here */ });
// After (your edit):
import { FileSink } from "autoctx/integrations/anthropic";
const sink = new FileSink("./traces/anthropic.jsonl");
const client = instrumentClient(new Anthropic(), { sink });Merge the PR.
4. Customer code emits traces
Your code is unchanged beyond the wrap. Every messages.create call now emits
a JSONL trace line to your sink:
import Anthropic from "@anthropic-ai/sdk";
import { instrumentClient, FileSink, autocontextSession } from "autoctx/integrations/anthropic";
const sink = new FileSink("./traces/anthropic.jsonl");
const client = instrumentClient(new Anthropic(), { sink });
await autocontextSession({ userId: "u_123" }, async () => {
const res = await client.messages.create({
model: "claude-opus-4-7-20251101",
max_tokens: 256,
messages: [{ role: "user", content: "Hello!" }],
});
console.log(res.content[0].type === "text" ? res.content[0].text : "");
});
await sink.close();Emitted trace line (pretty-printed for readability):
{
"schemaVersion": "1.0",
"traceId": "...",
"sessionContext": {
"userId": "u_123"
},
"request": {
"model": "claude-opus-4-7-20251101",
"messages": [
{
"role": "user",
"content": "Hello!"
}
]
},
"response": {
"id": "...",
"content": [
{
"type": "text",
"text": "Hi! How can I help?"
}
],
"stop_reason": "end_turn",
"usage": {
"input_tokens": 9,
"output_tokens": 7
}
},
"durationMs": 342,
"errorReason": null
}For the Python equivalent, see
autocontext/src/autocontext/integrations/anthropic/STABILITY.md.
Development
cd ts
npm install
npm test # vitest
npm run lint # tsc --noEmit
npm run build # tsc (outputs to dist/)
npm run check:a2-ii-a-all # enterprise discipline checks for the SDK subpath