winnow-mcp

v0.2.3

Published

2 months ago

MCP server for Winnow neural code pruning - reduces LLM context size 23-38%

0High
0Medium
0Low

vadimcomanescu

mcp code-pruning llm token-optimization claude-code

winnow-mcp

Neural code pruning for AI agents. Cuts 23-38% of tokens from source code before your LLM sees it, keeping only the lines relevant to the task.

Before / After

Without Winnow, your agent reads the entire file:

agent reads mass_spring.py  -->  847 tokens  -->  LLM

With Winnow, the agent sets a focus question and gets only what matters:

agent reads mass_spring.py
  focus: "damping coefficient calculation"
  -->  Winnow prunes irrelevant lines
  -->  412 tokens  -->  LLM

Pruned output preserves AST structure: (filtered N lines) placeholders replace removed blocks so the LLM understands code layout without paying for every line.

Install

Get an API key at winnow.o8s.ai, then configure your client:

Claude Code

claude mcp add winnow -e WINNOW_API_KEY=sk_winnow_... -- npx -y winnow-mcp@latest

Codex CLI

codex mcp add winnow --env WINNOW_API_KEY=sk_winnow_... -- npx -y winnow-mcp@latest

Or add manually to ~/.codex/config.toml:

[mcp_servers.winnow]
command = "npx"
args = ["-y", "winnow-mcp@latest"]

[mcp_servers.winnow.env]
WINNOW_API_KEY = "sk_winnow_..."

Cursor

Add to .cursor/mcp.json:

{
  "mcpServers": {
    "winnow": {
      "command": "npx",
      "args": ["-y", "winnow-mcp@latest"],
      "env": {
        "WINNOW_API_KEY": "sk_winnow_..."
      }
    }
  }
}

VS Code

Add to .vscode/mcp.json:

{
  "inputs": [
    {
      "type": "promptString",
      "id": "winnowKey",
      "description": "Winnow API Key",
      "password": true
    }
  ],
  "servers": {
    "winnow": {
      "command": "npx",
      "args": ["-y", "winnow-mcp@latest"],
      "env": {
        "WINNOW_API_KEY": "${input:winnowKey}"
      }
    }
  }
}

Windsurf

Add to ~/.codeium/windsurf/mcp_config.json:

{
  "mcpServers": {
    "winnow": {
      "command": "npx",
      "args": ["-y", "winnow-mcp@latest"],
      "env": {
        "WINNOW_API_KEY": "sk_winnow_..."
      }
    }
  }
}

Zed

Add to .zed/settings.json:

{
  "context_servers": {
    "winnow": {
      "source": "custom",
      "command": "npx",
      "args": ["-y", "winnow-mcp@latest"],
      "env": {
        "WINNOW_API_KEY": "sk_winnow_..."
      }
    }
  }
}

Other MCP clients

For any MCP client not listed above, use the mcpServers format with npx -y winnow-mcp@latest as the command.

To use a self-hosted API, add "WINNOW_API_URL": "https://your-instance.example.com" to the env block.

Hosted data handling

When WINNOW_API_URL points to hosted Winnow, the MCP server sends submitted code and query text to the API so the model can prune it. Hosted Winnow does not store submitted source code or query text in its application database and retains operational metadata such as token counts, latency, agent and model identifiers, threshold, and error state for billing, abuse prevention, and usage reporting.

If you need code never to leave your environment, point WINNOW_API_URL at a local or self-hosted Winnow instance instead.

Making agents prefer Winnow

Three mechanisms work together to make agents choose Winnow over built-in file tools:

MCP server instructions. During the MCP handshake, the server sends: "Use Winnow INSTEAD of Read/Grep for CODE files >200 lines." Claude Code and Codex surface this in the system prompt automatically.
Tool descriptions. Each tool says "PREFERRED over built-in Read/Grep", which influences the agent's tool selection when multiple options exist.
AGENTS.md reinforcement (recommended). For consistent behavior across all agents, add this to your project's AGENTS.md or CLAUDE.md:

## SWE-Pruner: MANDATORY for code files

**NEVER use Read/Grep on code files. Use SWE-Pruner MCP tools instead.**

| Instead of | Use |
|------------|-----|
| `Read` on `*.ts,js,py,go,rs,tsx,jsx,java,cpp,c,rb,php` | `mcp__winnow__read_file` with `context_focus_question` |
| `Grep` for code searches | `mcp__winnow__grep` with `context_focus_question` |

**Exceptions (use regular Read):**
- Markdown, JSON, YAML, config files
- When you need exact line numbers for editing (`force_full=true`)
- Files under 50 lines

MCP config alone works in most cases. For consistent behavior across all agents, add the AGENTS.md directive.

Tools

`read_file`

Read file with neural pruning. PREFERRED over built-in Read for code files >200 lines.

| Parameter | Type | Default | Description | |---|---|---|---| | path | string | required | Absolute or relative path to the file | | context_focus_question | string | "" | What you're looking for. Triggers pruning when set. | | prune_threshold | number | 0.5 | 0.7 aggressive, 0.5 balanced, 0.3 conservative | | force_full | boolean | false | Return full file (use when you need exact line numbers for editing) | | model | string | "" | Your model identifier for analytics |

Non-code files (.md, .json, .yaml, configs) are returned in full, unpruned.

`grep`

Search files with optional pruning of results. Uses ripgrep when available, falls back to a built-in scanner.

| Parameter | Type | Default | Description | |---|---|---|---| | pattern | string | required | Search pattern (regex by default, literal with regex=false) | | path | string | "." | File or directory to search (absolute or relative) | | context_focus_question | string | "" | Focus question to prune grep output | | prune_threshold | number | 0.5 | Pruning threshold | | recursive | boolean | true | Search subdirectories | | regex | boolean | true | Treat pattern as regex | | force_full | boolean | false | Skip pruning and size enforcement | | model | string | "" | Your model identifier for analytics |

`prune_code`

Prune code that's already in the conversation. Useful when code was pasted or returned by another tool and you want to trim it before reasoning over it.

| Parameter | Type | Default | Description | |---|---|---|---| | code | string | required | Source code to prune | | query | string | required | What to focus on | | threshold | number | 0.5 | Pruning threshold | | model | string | "" | Your model identifier for analytics |

Configuration

| Variable | Default | Description | |---|---|---| | WINNOW_API_KEY | required | Bearer token for the Winnow API | | WINNOW_API_URL | https://winnow.o8s.ai | API endpoint | | WINNOW_DEFAULT_THRESHOLD | 0.5 | Default pruning aggressiveness | | WINNOW_MAX_LINES | 200 | Files larger than this require a focus question (or force_full) | | WINNOW_MAX_CHARS | 20000 | Char limit before requiring a focus question | | WINNOW_ENFORCE_FOCUS | true | Set to false to disable the size guard | | WINNOW_ALLOWED_ROOTS | unset | Comma-separated allowlist of roots. Defaults to detected git root (or current working directory). | | WINNOW_BLOCK_SECRET_PATHS | true | Blocks .env*, key material, and common secret directories (.git, .ssh, .aws, .gnupg). |

Security defaults

Path access is restricted to allowed roots (WINNOW_ALLOWED_ROOTS) or the detected project root.
grep and manual fallback skip common secret paths by default.
ripgrep is executed without a shell (execFileSync) to avoid command-injection via patterns.

Runtime hardening (optional)

For local installs, you can run Node with permissions to reduce blast radius:

node --permission \
  --allow-fs-read=/your/project/root \
  --allow-child-process \
  dist/index.js

This is optional but recommended when you want stricter local process controls.

Client conformance matrix

The MCP package runs a real stdio compatibility matrix against common coding-agent client profiles:

claude-code
codex-cli
cursor

Each profile must pass:

list_tools schema parity (same tool contract across clients)
read_file and grep operational behavior
prune_code end-to-end request contract, including propagated client_info.agent

Run it with:

npm run test:conformance-matrix

How it works

Agent calls read_file with a focus question like "error handling logic"
MCP server reads the file from disk
If the file is source code and large enough, sends code + query to the Winnow API
Winnow's 0.6B neural model scores every token by relevance
Token scores are aggregated to line-level scores
Lines below the threshold are replaced with (filtered N lines) placeholders
Pruned code is returned to the agent

The model prunes along AST boundaries rather than cutting mid-expression.

Supported languages

C, C#, C++, Clojure, Dart, Elixir, Erlang, Go, Haskell, Java, JavaScript, Julia, Kotlin, Lua, Nim, Objective-C, OCaml, Perl, PHP, PowerShell, Python, R, Ruby, Rust, Scala, Shell (bash/zsh/fish), Svelte, Swift, TypeScript, Verilog, Vue, Zig.

Non-code files are always returned in full.

Threshold guide

| Threshold | Behavior | Use when | |---|---|---| | 0.3 | Conservative. Keeps most code. | You need broad context, small savings are fine | | 0.5 | Balanced. Good default. | General-purpose reading and exploration | | 0.7 | Aggressive. Keeps only high-relevance lines. | Large files, focused questions, maximum savings |

License

MIT

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

winnow-mcp

Before / After

Install

Claude Code

Codex CLI

Cursor

VS Code

Windsurf

Zed

Other MCP clients

Hosted data handling

Making agents prefer Winnow

Tools

read_file

grep

prune_code

Configuration

Security defaults

Runtime hardening (optional)

Client conformance matrix

How it works

Supported languages

Threshold guide

License

`read_file`

`grep`

`prune_code`