cto-ai-cli v8.1.0
AI context selection done right. Picks the right files, sanitizes secrets, learns from your feedback. --context, --audit, --accept/--reject.
CTO — AI Context Selection Engine
The most complete AI context selection engine in open source. Picks the right code chunks (not just files), auto-redacts secrets, learns from feedback. 18 signals. Zero AI dependencies.
```shell
cto --context "fix the seller info cache invalidation on KVS delete" --stdout | pbcopy
```

→ 166 relevant chunks from 59 files (26K tokens, 0 secrets)
→ Full chain: DeleteEndpoint → Router → UseCase → CacheService → KvsRepository

202KB package · 1,133 tests · 96 source modules · Zero AI dependencies.
The Problem
When developers use AI coding assistants, they need to provide context — the right source files. Today, most teams either:
- Send everything → expensive, slow, hits token limits
- Pick files manually → miss dependencies, forget test files, leak secrets
CTO solves both: it automatically selects the most relevant files for any task, sanitizes secrets before they reach any AI provider, and learns from feedback to get better over time.
Quick Demo
```shell
cto --demo   # Run a live showcase on your project
```

This runs a self-contained presentation that shows: project analysis, semantic matching proof, secret sanitization, ROI calculation, and benchmark results.
Benchmark Results
Eval Harness v8.1 — 20-file Java enterprise project, 4 tasks with expert-labeled ground truth:
| Metric | v8.0 | v8.1 |
|---|---|---|
| Must-have recall | 100% | 100% |
| Precision | 38% | 60% (+22pp) |
| F1 | 55% | 74% (+19pp) |
| Noise rate | 11.3% | 5.7% (-5.6pp) |
Real production repos (Java monoliths):
| Repo | Files | Without CTO | With CTO v8.0 |
|---|---|---|---|
| seller-info-service | 219 | 212 files (97%) | 166 chunks from 59 files |
| sizechart-middleend | 1,719 | 230 files | 72 chunks from 37 files |
| charts-backend | 1,261 | 685 files (54%) | 142 chunks from 16 files |
Internal benchmark (8 tasks, own codebase):
| Strategy | Precision | Recall | F1 |
|---|---|---|---|
| CTO + Reranker | 96.9% | 100% | 98.4% |
| TF-IDF only | 54.6% | 87.5% | 62.0% |
| Random | 7.7% | 6.3% | 2.8% |
ROI
On a typical 130-file TypeScript project:
| Metric | Without CTO | With CTO |
|---|---|---|
| Tokens per interaction | 370K (all files) | ~28K (selected) |
| Cost per interaction (Sonnet) | $1.11 | $0.08 |
| Monthly cost (10 devs, 40/day) | $8,880 | $640 |
| Annual savings | — | ~$99,000 |
Plus: fewer hallucinations (right context), zero secret leaks, and the learner gets smarter with every --accept / --reject.
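The arithmetic behind these figures can be reproduced in a few lines. This is a back-of-envelope sketch, assuming 20 working days per month (the assumption implied by the monthly numbers); per-interaction prices are taken from the table:

```typescript
// ROI sanity check using the table's own per-interaction prices.
const costWithout = 1.11;           // $/interaction, full 370K-token context
const costWith = 0.08;              // $/interaction, ~28K selected tokens
const interactionsPerDay = 10 * 40; // 10 devs x 40 interactions each
const workingDays = 20;             // assumption: working days per month

const monthlyWithout = costWithout * interactionsPerDay * workingDays; // ~$8,880
const monthlyWith = costWith * interactionsPerDay * workingDays;       // ~$640
const annualSavings = (monthlyWithout - monthlyWith) * 12;             // ~$98,880
```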
How it Works (v8.0 Pipeline)
```
Task → Query Intent Parser → structured action/entities/layers
  │
  ▼
BM25 (weighted) ──────┐
TF-IDF Embedding ─────┤──→ RRF Fusion ─→ 8-signal Boosting ─→ Reranker
Multi-hop (auto) ─────┘                                          │
                                                                 ▼
                                    Selection ─→ Chunk Extraction ─→ Output
                                                (methods, not files)
```

10-step pipeline:
| # | Step | What it does |
|---|---|---|
| 0 | Query Intent | Parses "fix cache invalidation on delete" → action:fix, entities:[cache,kvs], layers:[cache] |
| 1 | BM25 + Embedding | Lexical matching + TF-IDF cosine vectors, merged via Reciprocal Rank Fusion |
| 2 | Multi-hop | Complex queries auto-detected → iterative BM25 expansion via deps + call graph (2 hops) |
| 3 | Path IDF Boost | Query terms in file paths get boosted |
| 4 | Layer Boost | Architectural layer matching (controller, service, repository) |
| 5 | Import Boost | Dependencies of top-ranked files get pulled in |
| 6 | Call Graph Boost | Cross-file method calls traced (Java/TS/Python/Go) |
| 7 | Git Co-Change | Files frequently modified together (Jaccard similarity from commits) |
| 8 | Reranker | 5-signal quality gate: term coverage, specificity, bigram proximity, deps, path |
| 9 | Chunk Extraction | Extracts relevant functions/methods — not whole files. 10x token efficiency |
No AI is used for selection. Same input → same output. Deterministic.
Install
```shell
npm i -g cto-ai-cli   # global
npx cto-ai-cli        # or one-shot
```

Context Selection
```shell
cto --context "refactor the auth middleware"                  # human-readable summary
cto --context "fix login bug" --stdout | pbcopy               # pipe to clipboard
cto --context "add tests" --output context.md                 # save to file
cto --context "fix login" --prompt "Refactor to async/await"  # full AI prompt
cto --context "debug scoring" --json                          # JSON for tooling
cto --context "fix auth" --budget 30000                       # custom token budget
```

Output includes full file contents in markdown, ready for Claude, ChatGPT, or any AI. Secrets are automatically redacted — API keys, tokens, passwords, and PII are replaced with **** before output.
Feedback Loop
CTO learns from real feedback, not from itself:
```shell
cto --accept                          # last selection was good
cto --reject                          # last selection was bad
cto --reject --missing src/auth.ts    # this file was missing
cto --stats                           # see what CTO has learned
```

On --reject, CTO also detects files you edited after the selection that weren't in the context — those get automatically boosted for next time.
Secret Audit
```shell
cto --audit                # scan all files
cto --audit --init-hook    # install pre-commit hook
cto --audit --full-scan    # ignore cache, scan everything
cto --audit --json         # machine-readable output
```

45+ patterns (AWS, Stripe, GitHub, OpenAI, Slack, Cloudflare...) plus Shannon entropy analysis. The real value: audit protects context — every --stdout, --output, and --prompt auto-sanitizes secrets before output.
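The entropy side of pattern + entropy detection can be sketched in a few lines. This is illustrative, not CTO's implementation; the length and entropy thresholds here are assumptions:

```typescript
// Shannon entropy in bits per character: random-looking tokens score high,
// natural-language words score low.
function shannonEntropy(s: string): number {
  const freq = new Map<string, number>();
  for (const ch of s) freq.set(ch, (freq.get(ch) ?? 0) + 1);
  let h = 0;
  for (const n of freq.values()) {
    const p = n / s.length;
    h -= p * Math.log2(p);
  }
  return h;
}

// Hypothetical thresholds: long strings with high per-character entropy
// are flagged as candidate secrets.
const looksLikeSecret = (s: string) => s.length >= 20 && shannonEntropy(s) > 4.0;
```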
```
Before: OPENAI_KEY = "sk-Rk8bN3xYz2Wq5PmL7jCvT1aBcDe"
After:  OPENAI_KEY = "sk-R********************De"
```

AI Gateway (Enterprise)
A transparent HTTP proxy between your developers and AI providers. Automatically injects optimized context, redacts secrets, and tracks costs — without changing developer workflow.
```shell
cto --gateway                        # Start on port 8787
cto --gateway --port 9000            # Custom port
cto --gateway --block-secrets        # Block requests with critical secrets
cto --gateway --budget-daily 50      # $50/day budget limit
cto --gateway --budget-monthly 500   # $500/month budget limit
```

```
Developer → CTO Gateway → [context injection + sanitization + cost tracking] → AI Provider
                 ↓
          Dashboard (http://localhost:8787/__cto)
```

What the gateway does automatically:
- Injects CTO-selected context into every AI request (TF-IDF + composite scoring)
- Redacts secrets before they leave the network (45+ patterns)
- Tracks costs per model, per day, per month with budget alerts
- Streams responses with zero-copy SSE passthrough
- Serves a live dashboard at /__cto with real-time metrics
Supports OpenAI, Anthropic, Google, and Azure OpenAI. SSRF protection built-in.
Cross-Repo Context
When working on a task, CTO can pull relevant files from sibling repositories — not just the current project.
```shell
cto --context "fix payment webhook" --auto-repos                           # Auto-discover sibling repos
cto --context "fix payment webhook" --repos shared-types,payment-service   # Explicit list
```

How it works:
- Discovers sibling repos in the parent directory (any dir with package.json, tsconfig.json, Cargo.toml, etc.)
- Builds a lightweight TF-IDF index per sibling (reads source files, no full analysis)
- Queries each sibling with the task description
- Returns ranked matches with repo attribution and content
Real use case: You're fixing a webhook handler in api-gateway — CTO finds the Payment interface in shared-types and the consumer in notification-service automatically.
Cost-Aware Model Routing
CTO analyzes the actual selected context (not just the project) to recommend the cheapest model that can handle the task.
```shell
cto --context "update readme" --route   # → Haiku ($0.08/call, 73% cheaper)
cto --context "fix auth bug" --route    # → Opus ($1.33/call, critical complexity)
cto --context "refactor API" --route    # → Sonnet ($0.30/call, balanced)
```

Complexity is computed from real signals:
- Token density (% of budget used)
- Risk concentration (top-5 file avg risk vs project max)
- Directory diversity (cross-cutting = harder)
- Dependency density among selected files
The gateway also uses this: every proxied request gets a model recommendation in the injected context.
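A minimal sketch of how such a composite could route between tiers. The weights, thresholds, and function names here are illustrative assumptions, not CTO's internal values:

```typescript
// Hypothetical composite of the four signals above, each normalized to 0..1.
interface ContextSignals {
  tokenDensity: number;       // selected tokens / budget
  riskConcentration: number;  // top-5 file avg risk / project max
  directoryDiversity: number; // distinct dirs / selected files
  dependencyDensity: number;  // intra-selection edges / possible edges
}

function recommendModel(s: ContextSignals): "haiku" | "sonnet" | "opus" {
  const complexity =
    0.35 * s.tokenDensity +
    0.25 * s.riskConcentration +
    0.2 * s.directoryDiversity +
    0.2 * s.dependencyDensity;
  if (complexity < 0.3) return "haiku";  // simple, cheap
  if (complexity < 0.6) return "sonnet"; // balanced
  return "opus";                         // critical complexity
}
```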
MCP Server
Works as an MCP server for AI editors (Windsurf, Claude Desktop, Cursor).
3 tools: cto_select_context, cto_audit_secrets, cto_explain
```
// Windsurf: ~/.codeium/windsurf/mcp_config.json
{ "mcpServers": { "cto": { "command": "cto-mcp" } } }

// Claude Desktop
{ "mcpServers": { "cto": { "command": "npx", "args": ["-y", "cto-ai-cli"] } } }
```

MCP output is also auto-sanitized when includeContents: true.
Programmatic API
```typescript
import { analyzeProject, selectContext, buildIndex, query } from 'cto-ai-cli';

const analysis = await analyzeProject('./my-project');
const index = buildIndex(files);
const semanticScores = query(index, 'fix auth', 50)
  .map(m => ({ filePath: m.filePath, score: m.score }));

const selection = await selectContext({
  task: 'fix auth',
  analysis,
  budget: 50_000,
  semanticScores,
});
```

v8.0 — What's New
Chunk-Level Retrieval (the big one)
Instead of including entire files, CTO now extracts only the relevant functions and methods. A 2000-line file with 1 relevant method → 50 lines included, not 2000.
### src/main/java/com/example/cache/CacheService.java
```java
// L15-22: method invalidate
public void invalidate(String id) {
    redis.delete("cache:seller:" + id);
}

// ... lines 23-45 omitted ...

// L46-52: method retrieve
public SellerDTO retrieve(String id) {
    return redis.opsForValue().get("cache:seller:" + id);
}
```

Supports Java, TypeScript, Python, Go.
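Regex-based method chunking of the kind described above can be approximated in a short sketch. The signature regex and brace-depth tracking here are illustrative and handle only plain Java-style methods; CTO's real extractor covers more syntax and records line ranges:

```typescript
// Hypothetical chunker: finds Java-style method signatures, then tracks
// brace depth to capture each method body as its own chunk.
function extractMethods(source: string): { name: string; body: string }[] {
  const lines = source.split("\n");
  const sig = /^\s*(?:public|private|protected)[\w<>,\s[\]]*\s(\w+)\s*\(/;
  const chunks: { name: string; body: string }[] = [];
  let current: { name: string; start: number } | null = null;
  let depth = 0;
  lines.forEach((line, i) => {
    const m = line.match(sig);
    if (!current && m) current = { name: m[1], start: i };
    if (current) {
      depth += (line.match(/{/g) ?? []).length;
      depth -= (line.match(/}/g) ?? []).length;
      // Method closes when brace depth returns to zero.
      if (depth === 0 && line.includes("}")) {
        chunks.push({
          name: current.name,
          body: lines.slice(current.start, i + 1).join("\n"),
        });
        current = null;
      }
    }
  });
  return chunks;
}
```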
Query Intent Parsing
Before searching, CTO parses your task into structured intent:
```
"fix the seller cache invalidation on KVS delete"
→ action:     fix
→ entities:   [seller, kvs] (3× weight)
→ operations: [invalidate, delete] (2× weight)
→ layers:     [cache]
```

Entities get 3× BM25 weight, operations get 2×. Much better precision on enterprise queries.
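The parsing step can be sketched with tiny keyword dictionaries. This is a toy illustration; CTO's actual vocabularies, stemming, and weighting are richer:

```typescript
// Illustrative mini-vocabularies; real intent parsing covers far more terms.
const ACTIONS = ["fix", "add", "refactor", "debug"];
const OPERATIONS = ["invalidate", "invalidation", "delete", "create", "update"];
const LAYERS = ["cache", "controller", "service", "repository"];

interface Intent {
  action?: string;
  operations: string[];
  layers: string[];
  entities: string[]; // leftover domain nouns, boosted 3x in BM25
}

function parseIntent(task: string): Intent {
  const words = task.toLowerCase().match(/[a-z]+/g) ?? [];
  const stop = new Set(["the", "on", "a", "an", "in", "to", "for"]);
  const intent: Intent = { operations: [], layers: [], entities: [] };
  for (const w of words) {
    if (!intent.action && ACTIONS.includes(w)) intent.action = w;
    else if (OPERATIONS.includes(w)) intent.operations.push(w);
    else if (LAYERS.includes(w)) intent.layers.push(w);
    else if (!stop.has(w)) intent.entities.push(w);
  }
  return intent;
}
```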
Embedding Search + RRF Fusion
TF-IDF cosine embedding vectors complement BM25 lexical matching. Merged via Reciprocal Rank Fusion (60/40 BM25/embedding). Catches semantic similarity that BM25 misses.
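The fusion step can be sketched as rank-based RRF with the standard k = 60 damping constant and the 60/40 weights mentioned above. Function and constant names are illustrative, not CTO's internals:

```typescript
const K = 60; // standard RRF damping constant

// Merge two rankings (best first) into one, weighting each list's
// reciprocal-rank contribution.
function rrfFuse(
  bm25Ranking: string[],
  embeddingRanking: string[],
  wBm25 = 0.6,
  wEmb = 0.4,
): string[] {
  const scores = new Map<string, number>();
  const add = (ranking: string[], w: number) =>
    ranking.forEach((path, rank) => {
      scores.set(path, (scores.get(path) ?? 0) + w / (K + rank + 1));
    });
  add(bm25Ranking, wBm25);
  add(embeddingRanking, wEmb);
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([path]) => path);
}
```

A file ranked moderately by both signals can outscore a file ranked first by only one, which is the point of the fusion.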
Cross-File Call Graph
Traces method calls across files: cacheService.invalidate() in UseCase → finds CacheService.java. Regex-based, works for Java/TS/Python/Go.
Git Co-Change Signal
Files frequently modified together in git history get boosted. Jaccard similarity from commit co-occurrence.
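Jaccard over commit co-occurrence is straightforward to sketch. The `commitsTouching` map (file → set of commit hashes, e.g. built from `git log --name-only`) is a hypothetical input shape; CTO's actual representation may differ:

```typescript
// Jaccard similarity: |commits touching both| / |commits touching either|.
function coChangeScore(
  a: string,
  b: string,
  commitsTouching: Map<string, Set<string>>,
): number {
  const ca = commitsTouching.get(a) ?? new Set<string>();
  const cb = commitsTouching.get(b) ?? new Set<string>();
  const intersection = [...ca].filter((c) => cb.has(c)).length;
  const union = new Set([...ca, ...cb]).size;
  return union === 0 ? 0 : intersection / union;
}
```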
Multi-Hop Reasoning
Complex enterprise queries auto-detected. Iterative BM25: top matches → expand via deps + call graph → re-query. Traces full execution chains (4/4 hops).
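The expansion loop can be sketched as breadth-first growth over a combined deps + call graph. The `neighbors` adjacency map and function name are hypothetical; the real pipeline also re-queries each new frontier against BM25:

```typescript
// Expand an initial seed set (top BM25 hits) by following graph edges
// for a fixed number of hops.
function multiHop(
  seeds: string[],
  neighbors: Map<string, string[]>,
  hops = 2,
): Set<string> {
  const selected = new Set(seeds);
  let frontier = seeds;
  for (let h = 0; h < hops; h++) {
    const next: string[] = [];
    for (const file of frontier) {
      for (const dep of neighbors.get(file) ?? []) {
        if (!selected.has(dep)) {
          selected.add(dep);
          next.push(dep); // re-scored against the query in the real pipeline
        }
      }
    }
    frontier = next;
  }
  return selected;
}
```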
Evaluation Harness
Ground truth benchmark with must-have/relevant/noise labels. 100% must-have recall on 4-task Java enterprise benchmark.
Enterprise Features
- AI Gateway — transparent HTTP proxy with context injection, secret redaction, cost tracking
- Team Auth — per-team API keys, JWT (HS256/RS256), rate limiting, OIDC discovery
- Policy Engine — model overrides by task type, cost caps, block rules
- Metrics — Prometheus, Datadog JSON, StatsD UDP
- A/B Testing — context strategy experiments with z-test significance
- LSP Bridge — JSON-RPC 2.0 for VS Code, JetBrains, Neovim
- Persistent Index Cache — 50K-file repos: 5s → <100ms on warm cache
Competitor Comparison
| Feature | CTO v8 | Cursor | Sourcegraph Cody |
|---|---|---|---|
| BM25 retrieval | ✅ | ✅ | ✅ |
| Embedding search | ✅ TF-IDF cosine + RRF | ✅ | ✅ |
| Chunk-level retrieval | ✅ 4 langs | ✅ | ✅ |
| Multi-signal RRF fusion | ✅ 8-signal | ❌ | ❌ |
| Cross-file call graph | ✅ | ❌ | ❌ |
| Git co-change signal | ✅ | ❌ | ❌ |
| Multi-hop reasoning | ✅ | ❌ | ❌ |
| Query intent parsing | ✅ | ❌ | ❌ |
| Feedback learning | ✅ | ❌ | ❌ |
| Secret redaction | ✅ | ❌ | ❌ |
| Total signals | 18 | ~3 | ~5 |
Honest Limitations
- TypeScript/JavaScript gets AST analysis. Python/Go/Java/Rust get regex-based parsing (good for graphs + chunking, not AST-precise).
- Embeddings are TF-IDF cosine, not neural. ONNX infrastructure ready — neural model would add ~5-10% recall.
- Learning needs ~5 feedback cycles to start influencing selection. First runs are pure pipeline.
- Chunk extraction is regex-based — works for standard methods/functions, may miss DSLs or deeply nested code.
- Benchmarked against naive baselines. Not compared against Cursor/Copilot internal context engines.
Contributing
```shell
git clone https://github.com/cto-ai/cto-ai-cli.git && cd cto-ai-cli
npm install && npm run build && npm test   # 1,133 tests
```