cto-ai-cli v8.1.0
AI context selection done right. Picks the right files, sanitizes secrets, learns from your feedback. --context, --audit, --accept/--reject.
CTO — AI Context Selection Engine
The most complete AI context selection engine in open source. Picks the right code chunks (not just files), auto-redacts secrets, learns from feedback. 18 signals. Zero AI dependencies.
```shell
cto --context "fix the seller info cache invalidation on KVS delete" --stdout | pbcopy
```

→ 166 relevant chunks from 59 files (26K tokens, 0 secrets)
→ Full chain: DeleteEndpoint → Router → UseCase → CacheService → KvsRepository

202KB package · 1,133 tests · 96 source modules · Zero AI dependencies.
The Problem
When developers use AI coding assistants, they need to provide context — the right source files. Today, most teams either:
- Send everything → expensive, slow, hits token limits
- Pick files manually → miss dependencies, forget test files, leak secrets
CTO solves both: it automatically selects the most relevant files for any task, sanitizes secrets before they reach any AI provider, and learns from feedback to get better over time.
Quick Demo
```shell
cto --demo   # Run a live showcase on your project
```

This runs a self-contained presentation that shows: project analysis, semantic matching proof, secret sanitization, ROI calculation, and benchmark results.
Benchmark Results
Eval Harness v8.1 — 20-file Java enterprise project, 4 tasks with expert-labeled ground truth:
| Metric | v8.0 | v8.1 |
|---|---|---|
| Must-have recall | 100% | 100% |
| Precision | 38% | 60% (+22pp) |
| F1 | 55% | 74% (+19pp) |
| Noise rate | 11.3% | 5.7% (-5.6pp) |
Real production repos (Java monoliths):
| Repo | Files | Without CTO | With CTO v8.0 |
|---|---|---|---|
| seller-info-service | 219 | 212 files (97%) | 166 chunks from 59 files |
| sizechart-middleend | 1,719 | 230 files | 72 chunks from 37 files |
| charts-backend | 1,261 | 685 files (54%) | 142 chunks from 16 files |
Internal benchmark (8 tasks, own codebase):
| Strategy | Precision | Recall | F1 |
|---|---|---|---|
| CTO + Reranker | 96.9% | 100% | 98.4% |
| TF-IDF only | 54.6% | 87.5% | 62.0% |
| Random | 7.7% | 6.3% | 2.8% |
ROI
On a typical 130-file TypeScript project:
| Metric | Without CTO | With CTO |
|---|---|---|
| Tokens per interaction | 370K (all files) | ~28K (selected) |
| Cost per interaction (Sonnet) | $1.11 | $0.08 |
| Monthly cost (10 devs, 40/day) | $8,880 | $640 |
| Annual savings | — | ~$99,000 |
Plus: fewer hallucinations (right context), zero secret leaks, and the learner gets smarter with every --accept / --reject.
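The arithmetic behind these figures can be reproduced in a few lines. This is a back-of-envelope sketch, assuming 20 working days per month (the assumption implied by the monthly numbers); per-interaction prices are taken from the table:

```typescript
// ROI sanity check using the table's own per-interaction prices.
const costWithout = 1.11;           // $/interaction, full 370K-token context
const costWith = 0.08;              // $/interaction, ~28K selected tokens
const interactionsPerDay = 10 * 40; // 10 devs x 40 interactions each
const workingDays = 20;             // assumption: working days per month

const monthlyWithout = costWithout * interactionsPerDay * workingDays; // ~$8,880
const monthlyWith = costWith * interactionsPerDay * workingDays;       // ~$640
const annualSavings = (monthlyWithout - monthlyWith) * 12;             // ~$98,880
```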
How it Works (v8.0 Pipeline)
```
Task → Query Intent Parser → structured action/entities/layers
  │
  ▼
BM25 (weighted) ──────┐
TF-IDF Embedding ─────┤──→ RRF Fusion ─→ 8-signal Boosting ─→ Reranker
Multi-hop (auto) ─────┘                                          │
                                                                 ▼
                                    Selection ─→ Chunk Extraction ─→ Output
                                                (methods, not files)
```

10-step pipeline:
| # | Step | What it does |
|---|---|---|
| 0 | Query Intent | Parses "fix cache invalidation on delete" → action:fix, entities:[cache,kvs], layers:[cache] |
| 1 | BM25 + Embedding | Lexical matching + TF-IDF cosine vectors, merged via Reciprocal Rank Fusion |
| 2 | Multi-hop | Complex queries auto-detected → iterative BM25 expansion via deps + call graph (2 hops) |
| 3 | Path IDF Boost | Query terms in file paths get boosted |
| 4 | Layer Boost | Architectural layer matching (controller, service, repository) |
| 5 | Import Boost | Dependencies of top-ranked files get pulled in |
| 6 | Call Graph Boost | Cross-file method calls traced (Java/TS/Python/Go) |
| 7 | Git Co-Change | Files frequently modified together (Jaccard similarity from commits) |
| 8 | Reranker | 5-signal quality gate: term coverage, specificity, bigram proximity, deps, path |
| 9 | Chunk Extraction | Extracts relevant functions/methods — not whole files. 10x token efficiency |
No AI is used for selection. Same input → same output. Deterministic.
Install
```shell
npm i -g cto-ai-cli   # global
npx cto-ai-cli        # or one-shot
```

Context Selection
```shell
cto --context "refactor the auth middleware"                  # human-readable summary
cto --context "fix login bug" --stdout | pbcopy               # pipe to clipboard
cto --context "add tests" --output context.md                 # save to file
cto --context "fix login" --prompt "Refactor to async/await"  # full AI prompt
cto --context "debug scoring" --json                          # JSON for tooling
cto --context "fix auth" --budget 30000                       # custom token budget
```

Output includes full file contents in markdown, ready for Claude, ChatGPT, or any AI. Secrets are automatically redacted — API keys, tokens, passwords, and PII are replaced with **** before output.
Feedback Loop
CTO learns from real feedback, not from itself:
```shell
cto --accept                          # last selection was good
cto --reject                          # last selection was bad
cto --reject --missing src/auth.ts    # this file was missing
cto --stats                           # see what CTO has learned
```

On --reject, CTO also detects files you edited after the selection that weren't in the context — those get automatically boosted for next time.
Secret Audit
```shell
cto --audit                # scan all files
cto --audit --init-hook    # install pre-commit hook
cto --audit --full-scan    # ignore cache, scan everything
cto --audit --json         # machine-readable output
```

45+ patterns (AWS, Stripe, GitHub, OpenAI, Slack, Cloudflare...) plus Shannon entropy analysis. The real value: audit protects context — every --stdout, --output, and --prompt auto-sanitizes secrets before output.
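The entropy side of pattern + entropy detection can be sketched in a few lines. This is illustrative, not CTO's implementation; the length and entropy thresholds here are assumptions:

```typescript
// Shannon entropy in bits per character: random-looking tokens score high,
// natural-language words score low.
function shannonEntropy(s: string): number {
  const freq = new Map<string, number>();
  for (const ch of s) freq.set(ch, (freq.get(ch) ?? 0) + 1);
  let h = 0;
  for (const n of freq.values()) {
    const p = n / s.length;
    h -= p * Math.log2(p);
  }
  return h;
}

// Hypothetical thresholds: long strings with high per-character entropy
// are flagged as candidate secrets.
const looksLikeSecret = (s: string) => s.length >= 20 && shannonEntropy(s) > 4.0;
```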
```
Before: OPENAI_KEY = "sk-Rk8bN3xYz2Wq5PmL7jCvT1aBcDe"
After:  OPENAI_KEY = "sk-R********************De"
```

AI Gateway (Enterprise)
A transparent HTTP proxy between your developers and AI providers. Automatically injects optimized context, redacts secrets, and tracks costs — without changing developer workflow.
```shell
cto --gateway                        # Start on port 8787
cto --gateway --port 9000            # Custom port
cto --gateway --block-secrets        # Block requests with critical secrets
cto --gateway --budget-daily 50      # $50/day budget limit
cto --gateway --budget-monthly 500   # $500/month budget limit
```

```
Developer → CTO Gateway → [context injection + sanitization + cost tracking] → AI Provider
                 ↓
          Dashboard (http://localhost:8787/__cto)
```

What the gateway does automatically:
- Injects CTO-selected context into every AI request (TF-IDF + composite scoring)
- Redacts secrets before they leave the network (45+ patterns)
- Tracks costs per model, per day, per month with budget alerts
- Streams responses with zero-copy SSE passthrough
- Serves a live dashboard at /__cto with real-time metrics
Supports OpenAI, Anthropic, Google, and Azure OpenAI. SSRF protection built-in.
Cross-Repo Context
When working on a task, CTO can pull relevant files from sibling repositories — not just the current project.
```shell
cto --context "fix payment webhook" --auto-repos                           # Auto-discover sibling repos
cto --context "fix payment webhook" --repos shared-types,payment-service   # Explicit list
```

How it works:
- Discovers sibling repos in the parent directory (any dir with package.json, tsconfig.json, Cargo.toml, etc.)
- Builds a lightweight TF-IDF index per sibling (reads source files, no full analysis)
- Queries each sibling with the task description
- Returns ranked matches with repo attribution and content
Real use case: You're fixing a webhook handler in api-gateway — CTO finds the Payment interface in shared-types and the consumer in notification-service automatically.
Cost-Aware Model Routing
CTO analyzes the actual selected context (not just the project) to recommend the cheapest model that can handle the task.
```shell
cto --context "update readme" --route   # → Haiku ($0.08/call, 73% cheaper)
cto --context "fix auth bug" --route    # → Opus ($1.33/call, critical complexity)
cto --context "refactor API" --route    # → Sonnet ($0.30/call, balanced)
```

Complexity is computed from real signals:
- Token density (% of budget used)
- Risk concentration (top-5 file avg risk vs project max)
- Directory diversity (cross-cutting = harder)
- Dependency density among selected files
The gateway also uses this: every proxied request gets a model recommendation in the injected context.
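A minimal sketch of how such a composite could route between tiers. The weights, thresholds, and function names here are illustrative assumptions, not CTO's internal values:

```typescript
// Hypothetical composite of the four signals above, each normalized to 0..1.
interface ContextSignals {
  tokenDensity: number;       // selected tokens / budget
  riskConcentration: number;  // top-5 file avg risk / project max
  directoryDiversity: number; // distinct dirs / selected files
  dependencyDensity: number;  // intra-selection edges / possible edges
}

function recommendModel(s: ContextSignals): "haiku" | "sonnet" | "opus" {
  const complexity =
    0.35 * s.tokenDensity +
    0.25 * s.riskConcentration +
    0.2 * s.directoryDiversity +
    0.2 * s.dependencyDensity;
  if (complexity < 0.3) return "haiku";  // simple, cheap
  if (complexity < 0.6) return "sonnet"; // balanced
  return "opus";                         // critical complexity
}
```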
MCP Server
Works as an MCP server for AI editors (Windsurf, Claude Desktop, Cursor).
3 tools: cto_select_context, cto_audit_secrets, cto_explain
```
// Windsurf: ~/.codeium/windsurf/mcp_config.json
{ "mcpServers": { "cto": { "command": "cto-mcp" } } }

// Claude Desktop
{ "mcpServers": { "cto": { "command": "npx", "args": ["-y", "cto-ai-cli"] } } }
```

MCP output is also auto-sanitized when includeContents: true.
Programmatic API
```typescript
import { analyzeProject, selectContext, buildIndex, query } from 'cto-ai-cli';

const analysis = await analyzeProject('./my-project');
const index = buildIndex(files);
const semanticScores = query(index, 'fix auth', 50)
  .map(m => ({ filePath: m.filePath, score: m.score }));

const selection = await selectContext({
  task: 'fix auth',
  analysis,
  budget: 50_000,
  semanticScores,
});
```

v8.0 — What's New
Chunk-Level Retrieval (the big one)
Instead of including entire files, CTO now extracts only the relevant functions and methods. A 2000-line file with 1 relevant method → 50 lines included, not 2000.
### src/main/java/com/example/cache/CacheService.java
```java
// L15-22: method invalidate
public void invalidate(String id) {
    redis.delete("cache:seller:" + id);
}

// ... lines 23-45 omitted ...

// L46-52: method retrieve
public SellerDTO retrieve(String id) {
    return redis.opsForValue().get("cache:seller:" + id);
}
```

Supports Java, TypeScript, Python, Go.
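Regex-based method chunking of the kind described above can be approximated in a short sketch. The signature regex and brace-depth tracking here are illustrative and handle only plain Java-style methods; CTO's real extractor covers more syntax and records line ranges:

```typescript
// Hypothetical chunker: finds Java-style method signatures, then tracks
// brace depth to capture each method body as its own chunk.
function extractMethods(source: string): { name: string; body: string }[] {
  const lines = source.split("\n");
  const sig = /^\s*(?:public|private|protected)[\w<>,\s[\]]*\s(\w+)\s*\(/;
  const chunks: { name: string; body: string }[] = [];
  let current: { name: string; start: number } | null = null;
  let depth = 0;
  lines.forEach((line, i) => {
    const m = line.match(sig);
    if (!current && m) current = { name: m[1], start: i };
    if (current) {
      depth += (line.match(/{/g) ?? []).length;
      depth -= (line.match(/}/g) ?? []).length;
      // Method closes when brace depth returns to zero.
      if (depth === 0 && line.includes("}")) {
        chunks.push({
          name: current.name,
          body: lines.slice(current.start, i + 1).join("\n"),
        });
        current = null;
      }
    }
  });
  return chunks;
}
```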
Query Intent Parsing
Before searching, CTO parses your task into structured intent:
```
"fix the seller cache invalidation on KVS delete"
→ action:     fix
→ entities:   [seller, kvs] (3× weight)
→ operations: [invalidate, delete] (2× weight)
→ layers:     [cache]
```

Entities get 3× BM25 weight, operations get 2×. Much better precision on enterprise queries.
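The parsing step can be sketched with tiny keyword dictionaries. This is a toy illustration; CTO's actual vocabularies, stemming, and weighting are richer:

```typescript
// Illustrative mini-vocabularies; real intent parsing covers far more terms.
const ACTIONS = ["fix", "add", "refactor", "debug"];
const OPERATIONS = ["invalidate", "invalidation", "delete", "create", "update"];
const LAYERS = ["cache", "controller", "service", "repository"];

interface Intent {
  action?: string;
  operations: string[];
  layers: string[];
  entities: string[]; // leftover domain nouns, boosted 3x in BM25
}

function parseIntent(task: string): Intent {
  const words = task.toLowerCase().match(/[a-z]+/g) ?? [];
  const stop = new Set(["the", "on", "a", "an", "in", "to", "for"]);
  const intent: Intent = { operations: [], layers: [], entities: [] };
  for (const w of words) {
    if (!intent.action && ACTIONS.includes(w)) intent.action = w;
    else if (OPERATIONS.includes(w)) intent.operations.push(w);
    else if (LAYERS.includes(w)) intent.layers.push(w);
    else if (!stop.has(w)) intent.entities.push(w);
  }
  return intent;
}
```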
Embedding Search + RRF Fusion
TF-IDF cosine embedding vectors complement BM25 lexical matching. Merged via Reciprocal Rank Fusion (60/40 BM25/embedding). Catches semantic similarity that BM25 misses.
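The fusion step can be sketched as rank-based RRF with the standard k = 60 damping constant and the 60/40 weights mentioned above. Function and constant names are illustrative, not CTO's internals:

```typescript
const K = 60; // standard RRF damping constant

// Merge two rankings (best first) into one, weighting each list's
// reciprocal-rank contribution.
function rrfFuse(
  bm25Ranking: string[],
  embeddingRanking: string[],
  wBm25 = 0.6,
  wEmb = 0.4,
): string[] {
  const scores = new Map<string, number>();
  const add = (ranking: string[], w: number) =>
    ranking.forEach((path, rank) => {
      scores.set(path, (scores.get(path) ?? 0) + w / (K + rank + 1));
    });
  add(bm25Ranking, wBm25);
  add(embeddingRanking, wEmb);
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([path]) => path);
}
```

A file ranked moderately by both signals can outscore a file ranked first by only one, which is the point of the fusion.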
Cross-File Call Graph
Traces method calls across files: cacheService.invalidate() in UseCase → finds CacheService.java. Regex-based, works for Java/TS/Python/Go.
Git Co-Change Signal
Files frequently modified together in git history get boosted. Jaccard similarity from commit co-occurrence.
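Jaccard over commit co-occurrence is straightforward to sketch. The `commitsTouching` map (file → set of commit hashes, e.g. built from `git log --name-only`) is a hypothetical input shape; CTO's actual representation may differ:

```typescript
// Jaccard similarity: |commits touching both| / |commits touching either|.
function coChangeScore(
  a: string,
  b: string,
  commitsTouching: Map<string, Set<string>>,
): number {
  const ca = commitsTouching.get(a) ?? new Set<string>();
  const cb = commitsTouching.get(b) ?? new Set<string>();
  const intersection = [...ca].filter((c) => cb.has(c)).length;
  const union = new Set([...ca, ...cb]).size;
  return union === 0 ? 0 : intersection / union;
}
```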
Multi-Hop Reasoning
Complex enterprise queries auto-detected. Iterative BM25: top matches → expand via deps + call graph → re-query. Traces full execution chains (4/4 hops).
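The expansion loop can be sketched as breadth-first growth over a combined deps + call graph. The `neighbors` adjacency map and function name are hypothetical; the real pipeline also re-queries each new frontier against BM25:

```typescript
// Expand an initial seed set (top BM25 hits) by following graph edges
// for a fixed number of hops.
function multiHop(
  seeds: string[],
  neighbors: Map<string, string[]>,
  hops = 2,
): Set<string> {
  const selected = new Set(seeds);
  let frontier = seeds;
  for (let h = 0; h < hops; h++) {
    const next: string[] = [];
    for (const file of frontier) {
      for (const dep of neighbors.get(file) ?? []) {
        if (!selected.has(dep)) {
          selected.add(dep);
          next.push(dep); // re-scored against the query in the real pipeline
        }
      }
    }
    frontier = next;
  }
  return selected;
}
```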
Evaluation Harness
Ground truth benchmark with must-have/relevant/noise labels. 100% must-have recall on 4-task Java enterprise benchmark.
Enterprise Features
- AI Gateway — transparent HTTP proxy with context injection, secret redaction, cost tracking
- Team Auth — per-team API keys, JWT (HS256/RS256), rate limiting, OIDC discovery
- Policy Engine — model overrides by task type, cost caps, block rules
- Metrics — Prometheus, Datadog JSON, StatsD UDP
- A/B Testing — context strategy experiments with z-test significance
- LSP Bridge — JSON-RPC 2.0 for VS Code, JetBrains, Neovim
- Persistent Index Cache — 50K-file repos: 5s → <100ms on warm cache
Competitor Comparison
| Feature | CTO v8 | Cursor | Sourcegraph Cody |
|---|---|---|---|
| BM25 retrieval | ✅ | ✅ | ✅ |
| Embedding search | ✅ TF-IDF cosine + RRF | ✅ | ✅ |
| Chunk-level retrieval | ✅ 4 langs | ✅ | ✅ |
| Multi-signal RRF fusion | ✅ 8-signal | ❌ | ❌ |
| Cross-file call graph | ✅ | ❌ | ❌ |
| Git co-change signal | ✅ | ❌ | ❌ |
| Multi-hop reasoning | ✅ | ❌ | ❌ |
| Query intent parsing | ✅ | ❌ | ❌ |
| Feedback learning | ✅ | ❌ | ❌ |
| Secret redaction | ✅ | ❌ | ❌ |
| Total signals | 18 | ~3 | ~5 |
Honest Limitations
- TypeScript/JavaScript gets AST analysis. Python/Go/Java/Rust get regex-based parsing (good for graphs + chunking, not AST-precise).
- Embeddings are TF-IDF cosine, not neural. ONNX infrastructure ready — neural model would add ~5-10% recall.
- Learning needs ~5 feedback cycles to start influencing selection. First runs are pure pipeline.
- Chunk extraction is regex-based — works for standard methods/functions, may miss DSLs or deeply nested code.
- Benchmarked against naive baselines. Not compared against Cursor/Copilot internal context engines.
Contributing
```shell
git clone https://github.com/cto-ai/cto-ai-cli.git && cd cto-ai-cli
npm install && npm run build && npm test   # 1,133 tests
```