winnow-mcp
v0.2.3
MCP server for Winnow neural code pruning - reduces LLM context size 23-38%
Neural code pruning for AI agents. Cuts 23-38% of tokens from source code before your LLM sees it, keeping only the lines relevant to the task.
Before / After
Without Winnow, your agent reads the entire file:
agent reads mass_spring.py --> 847 tokens --> LLM
With Winnow, the agent sets a focus question and gets only what matters:
agent reads mass_spring.py
focus: "damping coefficient calculation"
--> Winnow prunes irrelevant lines
--> 412 tokens --> LLM
Pruned output preserves AST structure: (filtered N lines) placeholders replace removed blocks, so the LLM understands the code layout without paying for every line.
Install
Get an API key at winnow.o8s.ai, then configure your client:
Claude Code
claude mcp add winnow -e WINNOW_API_KEY=sk_winnow_... -- npx -y winnow-mcp@latest
Codex CLI
codex mcp add winnow --env WINNOW_API_KEY=sk_winnow_... -- npx -y winnow-mcp@latest
Or add manually to ~/.codex/config.toml:
[mcp_servers.winnow]
command = "npx"
args = ["-y", "winnow-mcp@latest"]
[mcp_servers.winnow.env]
WINNOW_API_KEY = "sk_winnow_..."
Cursor
Add to .cursor/mcp.json:
{
  "mcpServers": {
    "winnow": {
      "command": "npx",
      "args": ["-y", "winnow-mcp@latest"],
      "env": {
        "WINNOW_API_KEY": "sk_winnow_..."
      }
    }
  }
}
VS Code
Add to .vscode/mcp.json:
{
  "inputs": [
    {
      "type": "promptString",
      "id": "winnowKey",
      "description": "Winnow API Key",
      "password": true
    }
  ],
  "servers": {
    "winnow": {
      "command": "npx",
      "args": ["-y", "winnow-mcp@latest"],
      "env": {
        "WINNOW_API_KEY": "${input:winnowKey}"
      }
    }
  }
}
Windsurf
Add to ~/.codeium/windsurf/mcp_config.json:
{
  "mcpServers": {
    "winnow": {
      "command": "npx",
      "args": ["-y", "winnow-mcp@latest"],
      "env": {
        "WINNOW_API_KEY": "sk_winnow_..."
      }
    }
  }
}
Zed
Add to .zed/settings.json:
{
  "context_servers": {
    "winnow": {
      "source": "custom",
      "command": "npx",
      "args": ["-y", "winnow-mcp@latest"],
      "env": {
        "WINNOW_API_KEY": "sk_winnow_..."
      }
    }
  }
}
Other MCP clients
For any MCP client not listed above, use the mcpServers format with npx -y winnow-mcp@latest as the command.
To use a self-hosted API, add "WINNOW_API_URL": "https://your-instance.example.com" to the env block.
Hosted data handling
When WINNOW_API_URL points to hosted Winnow, the MCP server sends submitted code and query text to the API so the model can prune it. Hosted Winnow does not store submitted source code or query text in its application database. It does retain operational metadata, such as token counts, latency, agent and model identifiers, threshold, and error state, for billing, abuse prevention, and usage reporting.
If you need code never to leave your environment, point WINNOW_API_URL at a local or self-hosted Winnow instance instead.
Making agents prefer Winnow
Three mechanisms work together to make agents choose Winnow over built-in file tools:
MCP server instructions. During the MCP handshake, the server sends: "Use Winnow INSTEAD of Read/Grep for CODE files >200 lines." Claude Code and Codex surface this in the system prompt automatically.
Tool descriptions. Each tool says "PREFERRED over built-in Read/Grep", which influences the agent's tool selection when multiple options exist.
AGENTS.md reinforcement (recommended). For consistent behavior across all agents, add this to your project's AGENTS.md or CLAUDE.md:
## Winnow: MANDATORY for code files
**NEVER use Read/Grep on code files. Use Winnow MCP tools instead.**
| Instead of | Use |
|------------|-----|
| `Read` on `*.ts,js,py,go,rs,tsx,jsx,java,cpp,c,rb,php` | `mcp__winnow__read_file` with `context_focus_question` |
| `Grep` for code searches | `mcp__winnow__grep` with `context_focus_question` |
**Exceptions (use regular Read):**
- Markdown, JSON, YAML, config files
- When you need exact line numbers for editing (`force_full=true`)
- Files under 50 lines
MCP config alone works in most cases. For consistent behavior across all agents, add the AGENTS.md directive.
Tools
read_file
Read file with neural pruning. PREFERRED over built-in Read for code files >200 lines.
| Parameter | Type | Default | Description |
|---|---|---|---|
| path | string | required | Absolute or relative path to the file |
| context_focus_question | string | "" | What you're looking for. Triggers pruning when set. |
| prune_threshold | number | 0.5 | 0.7 aggressive, 0.5 balanced, 0.3 conservative |
| force_full | boolean | false | Return full file (use when you need exact line numbers for editing) |
| model | string | "" | Your model identifier for analytics |
Non-code files (.md, .json, .yaml, configs) are returned in full, unpruned.
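As an illustration of the parameters above, the following Python sketch builds the JSON-RPC `tools/call` request an MCP client might send for read_file. The helper name `build_read_file_call`, the request id, and the example path are hypothetical; only the argument names come from the table, and the envelope is assumed to follow the standard MCP `tools/call` shape.

```python
import json


def build_read_file_call(path, focus="", threshold=0.5, force_full=False, request_id=1):
    """Build an MCP tools/call request for Winnow's read_file tool.

    Hypothetical helper for illustration: argument names mirror the
    parameter table, everything else is an assumption.
    """
    arguments = {"path": path}
    if focus:
        # A non-empty focus question is what triggers pruning.
        arguments["context_focus_question"] = focus
        arguments["prune_threshold"] = threshold
    if force_full:
        # Ask for the whole file, e.g. when exact line numbers are needed.
        arguments["force_full"] = True
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "read_file", "arguments": arguments},
    }


request = build_read_file_call(
    "src/mass_spring.py", focus="damping coefficient calculation"
)
print(json.dumps(request, indent=2))
```

In practice the agent's MCP client builds this request for you; the sketch only shows which arguments matter when a focus question is set.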
grep
Search files with optional pruning of results. Uses ripgrep when available, falls back to a built-in scanner.
| Parameter | Type | Default | Description |
|---|---|---|---|
| pattern | string | required | Search pattern (regex by default, literal with regex=false) |
| path | string | "." | File or directory to search (absolute or relative) |
| context_focus_question | string | "" | Focus question to prune grep output |
| prune_threshold | number | 0.5 | Pruning threshold |
| recursive | boolean | true | Search subdirectories |
| regex | boolean | true | Treat pattern as regex |
| force_full | boolean | false | Skip pruning and size enforcement |
| model | string | "" | Your model identifier for analytics |
prune_code
Prune code that's already in the conversation. Useful when code was pasted or returned by another tool and you want to trim it before reasoning over it.
| Parameter | Type | Default | Description |
|---|---|---|---|
| code | string | required | Source code to prune |
| query | string | required | What to focus on |
| threshold | number | 0.5 | Pruning threshold |
| model | string | "" | Your model identifier for analytics |
Configuration
| Variable | Default | Description |
|---|---|---|
| WINNOW_API_KEY | required | Bearer token for the Winnow API |
| WINNOW_API_URL | https://winnow.o8s.ai | API endpoint |
| WINNOW_DEFAULT_THRESHOLD | 0.5 | Default pruning aggressiveness |
| WINNOW_MAX_LINES | 200 | Files larger than this require a focus question (or force_full) |
| WINNOW_MAX_CHARS | 20000 | Char limit before requiring a focus question |
| WINNOW_ENFORCE_FOCUS | true | Set to false to disable the size guard |
| WINNOW_ALLOWED_ROOTS | unset | Comma-separated allowlist of roots. Defaults to detected git root (or current working directory). |
| WINNOW_BLOCK_SECRET_PATHS | true | Blocks .env*, key material, and common secret directories (.git, .ssh, .aws, .gnupg). |
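The size guard described by WINNOW_MAX_LINES, WINNOW_MAX_CHARS, and WINNOW_ENFORCE_FOCUS can be pictured with a short Python sketch. It uses the documented variable names and defaults, but the function name `needs_focus_question` and the exact comparison rules are assumptions, not the server's actual code.

```python
import os


def needs_focus_question(text, focus="", force_full=False, env=None):
    """Return True when the size guard would demand a focus question.

    Sketch only: reads the documented variables and defaults, but the
    precise comparison rules are assumed.
    """
    env = os.environ if env is None else env
    if env.get("WINNOW_ENFORCE_FOCUS", "true").lower() == "false":
        return False  # guard disabled
    if focus or force_full:
        return False  # caller already narrowed the request or opted out
    max_lines = int(env.get("WINNOW_MAX_LINES", "200"))
    max_chars = int(env.get("WINNOW_MAX_CHARS", "20000"))
    line_count = len(text.splitlines())
    return line_count > max_lines or len(text) > max_chars


small = "x = 1\n" * 50    # 50 lines, 300 characters
large = "x = 1\n" * 500   # 500 lines, 3000 characters

print(needs_focus_question(small, env={}))                          # False
print(needs_focus_question(large, env={}))                          # True
print(needs_focus_question(large, focus="where is x set", env={}))  # False
```

Passing `env` explicitly keeps the example independent of the real process environment.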
Security defaults
- Path access is restricted to allowed roots (WINNOW_ALLOWED_ROOTS) or the detected project root.
- grep and manual fallback skip common secret paths by default.
- ripgrep is executed without a shell (execFileSync) to avoid command injection via patterns.
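To make the path rules concrete, here is a minimal Python sketch of an allowed-roots check combined with secret-path blocking. It is not the server's code: the function name `is_path_allowed`, the `SECRET_DIRS` set, and the example paths are hypothetical, and the real blocklist likely covers more cases (for example key material).

```python
import os
from pathlib import Path

# Illustrative subset of secret directories; the real blocklist may differ.
SECRET_DIRS = {".git", ".ssh", ".aws", ".gnupg"}


def is_path_allowed(candidate, allowed_roots, block_secrets=True):
    """Check a path against allowed roots and common secret locations."""
    # Resolve first so "../" segments and symlinks cannot escape a root.
    resolved = Path(os.path.realpath(candidate))
    relative = None
    for root in allowed_roots:
        root_path = Path(os.path.realpath(root))
        if resolved == root_path or root_path in resolved.parents:
            relative = resolved.relative_to(root_path)
            break
    if relative is None:
        return False  # outside every allowed root
    if block_secrets:
        if any(part in SECRET_DIRS for part in relative.parts):
            return False
        if relative.name.startswith(".env"):
            return False
    return True


root = "/srv/project"
print(is_path_allowed("/srv/project/src/app.py", [root]))       # True
print(is_path_allowed("/srv/project/../other/app.py", [root]))  # False
print(is_path_allowed("/srv/project/.env.local", [root]))       # False
print(is_path_allowed("/srv/project/.git/config", [root]))      # False
```

Checking the path relative to the matched root, rather than the absolute path, avoids rejecting a project that happens to live under a directory with a blocked name.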
Runtime hardening (optional)
For local installs, you can run Node with permissions to reduce blast radius:
node --permission \
--allow-fs-read=/your/project/root \
--allow-child-process \
dist/index.js
This is optional but recommended when you want stricter local process controls.
Client conformance matrix
The MCP package runs a real stdio compatibility matrix against common coding-agent client profiles:
- claude-code
- codex-cli
- cursor
Each profile must pass:
- list_tools schema parity (same tool contract across clients)
- read_file and grep operational behavior
- prune_code end-to-end request contract, including propagated client_info.agent
Run it with:
npm run test:conformance-matrix
How it works
- Agent calls read_file with a focus question like "error handling logic"
- MCP server reads the file from disk
- If the file is source code and large enough, sends code + query to the Winnow API
- Winnow's 0.6B neural model scores every token by relevance
- Token scores are aggregated to line-level scores
- Lines below the threshold are replaced with (filtered N lines) placeholders
- Pruned code is returned to the agent
The model prunes along AST boundaries rather than cutting mid-expression.
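The scoring and placeholder steps above can be sketched in Python. This is not Winnow's implementation: the function names are hypothetical, the scores are made up, taking the maximum token score per line is an assumed aggregation rule, and AST-boundary handling is left out entirely.

```python
def aggregate_line_scores(token_scores_per_line):
    """Collapse per-token scores into one score per line.

    Assumption: the line score is the maximum token score; the real
    aggregation may differ (mean, weighted, and so on).
    """
    return [max(scores) if scores else 0.0 for scores in token_scores_per_line]


def prune_lines(lines, line_scores, threshold=0.5):
    """Replace runs of low-scoring lines with '(filtered N lines)' placeholders."""
    output = []
    filtered = 0
    for line, score in zip(lines, line_scores):
        if score >= threshold:
            if filtered:
                # Close the current run of removed lines before keeping this one.
                output.append(f"(filtered {filtered} lines)")
                filtered = 0
            output.append(line)
        else:
            filtered += 1
    if filtered:
        output.append(f"(filtered {filtered} lines)")
    return output


lines = [
    "def damping(c, m, k):",
    "    log.debug('called')",
    "    audit(c)",
    "    return c / (2 * (m * k) ** 0.5)",
]
# Made-up token scores, one inner list per source line.
token_scores = [[0.9, 0.8], [0.1, 0.2], [0.05], [0.7, 0.95]]

line_scores = aggregate_line_scores(token_scores)
pruned = prune_lines(lines, line_scores, threshold=0.5)
print("\n".join(pruned))
# def damping(c, m, k):
# (filtered 2 lines)
#     return c / (2 * (m * k) ** 0.5)
```

Merging adjacent removed lines into a single placeholder is what lets the output keep the file's overall shape while dropping most of its tokens.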
Supported languages
C, C#, C++, Clojure, Dart, Elixir, Erlang, Go, Haskell, Java, JavaScript, Julia, Kotlin, Lua, Nim, Objective-C, OCaml, Perl, PHP, PowerShell, Python, R, Ruby, Rust, Scala, Shell (bash/zsh/fish), Svelte, Swift, TypeScript, Verilog, Vue, Zig.
Non-code files are always returned in full.
Threshold guide
| Threshold | Behavior | Use when |
|---|---|---|
| 0.3 | Conservative. Keeps most code. | You need broad context, small savings are fine |
| 0.5 | Balanced. Good default. | General-purpose reading and exploration |
| 0.7 | Aggressive. Keeps only high-relevance lines. | Large files, focused questions, maximum savings |
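To see how the threshold changes what survives, the Python sketch below applies the three documented thresholds to one set of line scores. The scores are invented for illustration and the helper `kept_fraction` is hypothetical; real savings depend on the file and the focus question.

```python
# Made-up relevance scores for a ten-line file (illustrative only).
line_scores = [0.92, 0.15, 0.41, 0.66, 0.08, 0.55, 0.35, 0.73, 0.22, 0.48]


def kept_fraction(scores, threshold):
    """Fraction of lines whose score meets or exceeds the threshold."""
    kept = sum(1 for score in scores if score >= threshold)
    return kept / len(scores)


for threshold in (0.3, 0.5, 0.7):
    print(threshold, kept_fraction(line_scores, threshold))
# 0.3 0.7
# 0.5 0.4
# 0.7 0.2
```

With these sample scores, moving from 0.3 to 0.7 drops the kept lines from seven to two, which matches the conservative-to-aggressive ordering in the table.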
License
MIT
