toolsift
v0.1.0
A local, model-agnostic MCP proxy that fixes tool-definition bloat. It sits between your agent and N upstream MCP servers and exposes a tiny surface, `search_tools` + `invoke_tool`, so the agent retrieves only the few tools it needs instead of paying for 50+ JSON definitions up front.
When an agent is wired to several MCP servers, their tool definitions can eat a
huge slice of the context window before the first message — schemas,
descriptions, and parameter lists for dozens of tools the agent won't use this
turn. toolsift connects to your upstream servers as a client, aggregates their
tools into a local index, and re-exposes just two meta-tools to the agent. The
agent calls search_tools("open a github issue") to pull back the handful of
relevant definitions, then invoke_tool(...) to run one. This is the
RAG-MCP / lazy-tool-loading pattern.
100% local. No API keys, no network beyond your own upstream servers, no telemetry. BM25 retrieval is the zero-dependency default.
Why this matters
RAG-MCP showed that retrieving tools on demand instead of loading them all raises tool-selection accuracy (from ~14% with the all-tools baseline to ~43% with retrieval) while cutting prompt tokens by ~50%. The intuition: a smaller, relevant tool list is easier for the model to choose from and cheaper to carry. toolsift brings that pattern to any MCP client, and ships an eval harness so you can measure the token side of it on your own toolset.
Install
```sh
npx toolsift mcp     # zero-install, or:
npm i -g toolsift    # then the `toolsift` command is on your PATH
```
Requires Node ≥ 18.
Quick start
1. Tell toolsift about your upstream servers. Create toolsift.json in your
project (see toolsift.example.json):
```json
{
  "servers": [
    {
      "name": "github",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_…" }
    },
    {
      "name": "fs",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "."]
    }
  ],
  "pinned": ["read_file"],
  "retriever": "bm25",
  "topK": 5
}
```
`pinned` tools are exposed directly (pass-through) so always-needed tools skip
the search. HTTP upstreams work too: `{ "name": "remote", "url": "https://…/mcp" }`.
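To make the config shape above easier to reason about, here is a minimal sketch of it as TypeScript types with a small sanity check. The type names, `checkConfig`, and the rule that each server has exactly one of `command` or `url` are assumptions inferred from the example, not toolsift's published typings or validation.

```ts
// Hypothetical types inferred from the example config; not toolsift's own typings.
type StdioServer = { name: string; command: string; args?: string[]; env?: Record<string, string> };
type HttpServer = { name: string; url: string };

interface ToolsiftConfig {
  servers: Array<StdioServer | HttpServer>;
  pinned?: string[];
  retriever?: "bm25" | "embeddings" | "hybrid";
  topK?: number;
}

const RETRIEVERS: string[] = ["bm25", "embeddings", "hybrid"];

// Returns human-readable problems; an empty list means the shape looks plausible.
function checkConfig(raw: unknown): string[] {
  if (typeof raw !== "object" || raw === null) return ["config must be a JSON object"];
  const cfg = raw as Record<string, unknown>;
  const problems: string[] = [];

  const servers = cfg["servers"];
  if (!Array.isArray(servers) || servers.length === 0) {
    problems.push("servers must be a non-empty array");
  } else {
    const seen = new Set<string>();
    for (const s of servers as unknown[]) {
      if (typeof s !== "object" || s === null) {
        problems.push("each server must be an object");
        continue;
      }
      const entry = s as Record<string, unknown>;
      const name = entry["name"];
      if (typeof name !== "string" || name === "") {
        problems.push("server is missing a name");
        continue;
      }
      if (seen.has(name)) problems.push(`duplicate server name: ${name}`);
      seen.add(name);
      // Assumption: a server is either stdio (command) or HTTP (url), never both.
      const hasCommand = typeof entry["command"] === "string";
      const hasUrl = typeof entry["url"] === "string";
      if (hasCommand === hasUrl) problems.push(`server ${name} needs exactly one of "command" or "url"`);
    }
  }

  const retriever = cfg["retriever"];
  if (retriever !== undefined && RETRIEVERS.indexOf(retriever as string) === -1) {
    problems.push("retriever must be bm25, embeddings, or hybrid");
  }
  const topK = cfg["topK"];
  if (topK !== undefined && !(typeof topK === "number" && Number.isInteger(topK) && topK > 0)) {
    problems.push("topK must be a positive integer");
  }
  return problems;
}
```

Running something like `checkConfig(JSON.parse(text))` before starting the proxy would catch a server entry that has neither a `command` nor a `url`.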
2. Wire toolsift into your agents with the one-liner:
```sh
npx toolsift install   # adds toolsift to Claude Code + Cursor
```
That writes the MCP server config (non-destructively). Restart the agent and
you're done. `--agent claude|cursor` targets one; `--global` writes to your home
dir instead of the project.
Prefer to wire it by hand? The server command is `npx -y toolsift mcp`.
Claude Code (`.mcp.json` or `claude mcp add`):
```json
{
  "mcpServers": {
    "toolsift": { "command": "npx", "args": ["-y", "toolsift", "mcp"] }
  }
}
```
Cursor / Codex / any MCP client: use the same command,
`npx -y toolsift mcp`, over stdio.
MCP tools the agent sees
| tool | what it does |
|------|--------------|
| search_tools(query, k?) | retrieve the top-k relevant upstream tool definitions (server, name, description, inputSchema) for a task |
| invoke_tool(server, name, arguments) | run one upstream tool and proxy its result back unchanged |
| list_servers() | summary of the upstream servers and their tool counts (orientation) |
| each pinned tool | exposed directly as a pass-through, no search needed |
Tool descriptions are written for the agent — they nudge it to search first and invoke second, instead of expecting every tool to be in context.
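As a rough illustration of the search-first, invoke-second flow, the sketch below builds the two MCP `tools/call` requests an agent might send. The JSON-RPC envelope follows the MCP specification; the argument keys (`query`, `k`, `server`, `name`, `arguments`) are assumed to match the signatures in the table, and the `create_issue` payload is purely illustrative.

```ts
// JSON-RPC 2.0 envelope used by MCP's tools/call method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

// Hypothetical helper: wraps a tool name and its arguments in a tools/call request.
function toolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name, arguments: args } };
}

// Step 1: ask the proxy for the few definitions relevant to the task.
const searchRequest = toolCall("search_tools", { query: "open a github issue", k: 5 });

// Step 2: run one of the returned tools through the proxy.
// The upstream tool and its arguments here are illustrative, not a real search result.
const invokeRequest = toolCall("invoke_tool", {
  server: "github",
  name: "create_issue",
  arguments: { title: "Crash on startup" },
});
```

In practice the MCP client library builds these envelopes; the point is that only the meta-tool schemas need to be in context before step 1.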
The eval harness — measure it on your own toolset
This is the part to actually run. toolsift eval is a deterministic,
no-API-key benchmark that scores retrieval against a labelled
query→tool dataset and reports, per retriever:
- recall@k — is the right tool in the top-k retrieved?
- MRR — how high does it rank, on average?
- token reduction — tokens of the all-tools surface (what your agent pays
today) vs the surface toolsift exposes, via
gpt-tokenizer.
Run it with no arguments for BM25 on the built-in toolset (28 tools across
GitHub / Slack / filesystem / Postgres / calendar), or pass `--embeddings` to
also score the semantic and hybrid retrievers:
```
$ toolsift eval --embeddings
tools: 28 · queries: 28 · k: 5 · baseline surface: 1342 tokens
retriever   recall@5  mrr   tokens  saved
----------  --------  ----  ------  -----
all-tools   1.00      1.00  1342    0%
bm25        0.89      0.67  223     83%
embeddings  1.00      0.90  242     82%
hybrid      0.96      0.80  236     82%
→ embeddings: recall@5 1.00 · 82% fewer tokens than loading every tool
```
The all-tools baseline trivially has recall 1.0 and 0% reduction; that's what every agent pays right now. Retrieval keeps the right tool in the top 5 for 89% to 100% of queries while cutting the surface the model carries by ~80%. On a realistic 60-tool multi-server setup that becomes ~92% fewer tokens at 0.83 recall@5 (embeddings). Full methodology, per-domain generalization, and honest caveats are in BENCHMARKS.md.
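For readers who want the three metrics spelled out, here is a minimal sketch of how they are conventionally computed. The function names are hypothetical and this is not toolsift's implementation; in particular, whether toolsift truncates MRR at k is not stated, so this version does not.

```ts
// ranks[i] is the 1-based rank of the expected tool for query i,
// or null when the retriever did not return it at all.
type Rank = number | null;

// Fraction of queries whose expected tool appears within the top k results.
function recallAtK(ranks: Rank[], k: number): number {
  if (ranks.length === 0) return 0;
  const hits = ranks.filter((r) => r !== null && r <= k).length;
  return hits / ranks.length;
}

// Mean of 1/rank over all queries; a miss contributes 0.
function meanReciprocalRank(ranks: Rank[]): number {
  if (ranks.length === 0) return 0;
  const sum = ranks.reduce<number>((acc, r) => acc + (r === null ? 0 : 1 / r), 0);
  return sum / ranks.length;
}

// Share of the baseline token cost that the smaller exposed surface avoids.
function tokenReduction(baselineTokens: number, exposedTokens: number): number {
  return 1 - exposedTokens / baselineTokens;
}
```

Plugging in the table's BM25 row, `tokenReduction(1342, 223)` is about 0.83, which matches the 83% in the `saved` column.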
Point it at your own toolset, the viral artifact:
```sh
toolsift eval --config toolsift.json --dataset my-queries.json
```
where `my-queries.json` is an array (or `{ "examples": [...] }`) of:
```json
[
  { "query": "open a bug report", "expectedServer": "github", "expectedTool": "create_issue" },
  { "query": "post to the team channel", "expectedServer": "slack", "expectedTool": "send_message" }
]
```
toolsift connects to your configured upstreams, pulls their real tool list, and
tells you exactly how many tokens you'd save and whether retrieval still finds
the tools you need. An end-to-end `--llm` accuracy mode (real model, needs an API
key) is planned; the default eval stays deterministic and key-free.
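Since the dataset may be either a bare array or an object with an `examples` key, a small normalizer can be handy when generating or checking the file yourself. This is a sketch under that assumption; `EvalExample` and `normalizeDataset` are hypothetical names, not part of toolsift's API.

```ts
// Hypothetical type mirroring the fields shown in the example dataset.
interface EvalExample {
  query: string;
  expectedServer: string;
  expectedTool: string;
}

// Accepts either [...] or { "examples": [...] } and returns a validated array.
function normalizeDataset(raw: unknown): EvalExample[] {
  let list: unknown[] | null = null;
  if (Array.isArray(raw)) {
    list = raw;
  } else if (typeof raw === "object" && raw !== null) {
    const examples = (raw as { examples?: unknown }).examples;
    if (Array.isArray(examples)) list = examples;
  }
  if (list === null) throw new Error('dataset must be an array or { "examples": [...] }');

  return list.map((item, i) => {
    const e = (typeof item === "object" && item !== null ? item : {}) as Partial<EvalExample>;
    if (
      typeof e.query !== "string" ||
      typeof e.expectedServer !== "string" ||
      typeof e.expectedTool !== "string"
    ) {
      throw new Error(`example ${i} needs string query, expectedServer, and expectedTool`);
    }
    return { query: e.query, expectedServer: e.expectedServer, expectedTool: e.expectedTool };
  });
}
```

A call such as `normalizeDataset(JSON.parse(fileText))` fails fast on a misspelled key instead of producing a silently empty benchmark.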
How it works
- Aggregate. toolsift connects to each configured upstream MCP server as a
client (stdio or HTTP), calls `tools/list`, and builds one in-memory registry of `{ server, name, description, inputSchema }`.
- Index. The registry is indexed by a pluggable retriever. The default is a self-contained BM25 over each tool's name (weighted), description, and parameter names: zero dependencies, instant, and tuned for the camelCase / snake_case identifiers tools use. The corpus is tiny (dozens to hundreds of tools), so it lives in memory; no vector DB.
- Expose. toolsift runs as its own MCP server over stdio and advertises
`search_tools` + `invoke_tool` (+ any pinned tools), not the upstream definitions. The agent searches, gets a few schemas, and invokes.
- Proxy. `invoke_tool` forwards the call to the owning upstream and returns its result unchanged.
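The Index step can be sketched as a small in-memory BM25 over identifier-aware tokens. This is an illustration of the general technique, not toolsift's code: the `IndexedTool` type, the name weight of 2 (applied by repeating name tokens), and the `k1`/`b` defaults are all assumptions.

```ts
// Hypothetical slice of a registry entry: only the fields the index reads.
interface IndexedTool {
  server: string;
  name: string;
  description: string;
  paramNames: string[];
}

// Split camelCase and snake_case identifiers into lowercase word tokens.
function tokenize(text: string): string[] {
  return text
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
}

// Builds the index once and returns a search function over it.
function buildBm25(tools: IndexedTool[], k1 = 1.2, b = 0.75, nameWeight = 2) {
  const docFreq = new Map<string, number>();
  const docs = tools.map((tool) => {
    const tokens = tokenize(tool.description).concat(tokenize(tool.paramNames.join(" ")));
    // Weight the name by repeating its tokens (an assumed weighting scheme).
    for (let i = 0; i < nameWeight; i++) tokens.push(...tokenize(tool.name));
    const termFreq = new Map<string, number>();
    for (const t of tokens) termFreq.set(t, (termFreq.get(t) ?? 0) + 1);
    termFreq.forEach((_count, term) => docFreq.set(term, (docFreq.get(term) ?? 0) + 1));
    return { tool, termFreq, length: tokens.length };
  });
  const totalLength = docs.reduce((sum, d) => sum + d.length, 0);
  const avgLength = docs.length > 0 ? totalLength / docs.length : 0;

  return function search(query: string, k: number): Array<{ tool: IndexedTool; score: number }> {
    const terms = tokenize(query);
    return docs
      .map((d) => {
        let score = 0;
        for (const term of terms) {
          const f = d.termFreq.get(term) ?? 0;
          if (f === 0) continue;
          const df = docFreq.get(term) ?? 0;
          const idf = Math.log(1 + (docs.length - df + 0.5) / (df + 0.5));
          const norm = f + k1 * (1 - b + (b * d.length) / avgLength);
          score += (idf * f * (k1 + 1)) / norm;
        }
        return { tool: d.tool, score };
      })
      .filter((r) => r.score > 0)
      .sort((x, y) => y.score - x.score)
      .slice(0, k);
  };
}
```

Because `tokenize` splits `createIssue` and `create_issue` into the same tokens, a query written in either style matches the same tool.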
Programmatic API
Everything is importable:
```ts
import { UpstreamManager, createServer, createRetriever, runEval } from "toolsift";

const manager = new UpstreamManager(config.servers);
await manager.connect();

const retriever = createRetriever("hybrid"); // or "bm25" | "embeddings"
await retriever.index(manager.tools());
await retriever.search("open a github issue", 5); // → scored tool refs

// inject your own embedding backend (model-agnostic):
const custom = createRetriever("embeddings", { embed: async (texts) => myApi(texts) });

// build the proxy MCP server directly:
const server = createServer({ manager, config });

// or benchmark a toolset yourself (compare retrievers):
const report = await runEval(manager.tools(), dataset, {
  k: 5,
  retrievers: ["all-tools", "bm25", "embeddings", "hybrid"],
});
```
Retrievers
| retriever | needs | notes |
|-----------|-------|-------|
| bm25 | nothing (default) | self-contained lexical retrieval; zero-dependency, in-memory, instant. Excels at identifier/keyword queries. |
| embeddings | @huggingface/transformers (optional peer) | semantic retrieval via a small model run locally (no API key, model fetched once and cached). Catches paraphrases BM25 misses. |
| hybrid | @huggingface/transformers | fuses BM25 + embeddings with Reciprocal Rank Fusion — best recall of the three. |
```sh
npm i @huggingface/transformers   # opt in to embeddings / hybrid
```
Set `"retriever": "hybrid"` in toolsift.json. The embedding backend is
pluggable: inject any `EmbedFn` (a hosted embedding API, a different local
model) via the programmatic API. BM25 stays the zero-install default so the core
package is lean.
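The hybrid retriever's fusion step, Reciprocal Rank Fusion, can be sketched in a few lines: each id scores the sum of 1 / (k + rank) over the ranked lists it appears in. The function name and the constant `k = 60` (the value commonly used in the RRF literature) are assumptions; toolsift's own constant and tie-breaking are not documented here.

```ts
// Fuse several ranked lists of ids into one ranking.
// rankings[i][0] is the top result of list i; ranks are 1-based in the formula.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Illustrative ids only; a real caller would pass the BM25 and embedding result lists.
const lexical = ["github/create_issue", "github/list_issues", "slack/send_message"];
const semantic = ["github/create_issue", "github/add_comment", "github/list_issues"];
const fused = reciprocalRankFusion([lexical, semantic]);
```

Only ranks are used, so the BM25 and embedding scores never need to be on a comparable scale; an item that both lists rank reasonably well beats one that a single list ranks slightly higher.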
Limitations
- The default retriever is lexical (BM25) and excels at identifier/keyword
queries; install `@huggingface/transformers` and set `"retriever": "hybrid"` for paraphrase-heavy toolsets (it fuses lexical + semantic).
- Tool definitions are aggregated at startup (`tools/list`); a refresh hook exists (`UpstreamManager.refresh()`) but live upstream tool-list change subscriptions aren't wired into the proxy yet.
- `pinned` matches by tool name across servers; if two upstreams expose the same tool name, review your config before pinning that name.
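Because `pinned` matches on the bare tool name, it can help to check for names that more than one upstream exposes before pinning them. The sketch below is a hypothetical helper, not part of toolsift; it assumes you have a list of `{ server, name }` entries such as the registry described under "How it works".

```ts
// Hypothetical minimal view of a registry entry.
interface ToolRef {
  server: string;
  name: string;
}

// Map each tool name exposed by more than one server to those servers.
function findNameCollisions(tools: ToolRef[]): Map<string, string[]> {
  const byName = new Map<string, string[]>();
  for (const t of tools) {
    const servers = byName.get(t.name) ?? [];
    if (servers.indexOf(t.server) === -1) servers.push(t.server);
    byName.set(t.name, servers);
  }
  const collisions = new Map<string, string[]>();
  byName.forEach((servers, name) => {
    if (servers.length > 1) collisions.set(name, servers);
  });
  return collisions;
}

// Pinned names that are ambiguous because several upstreams share them.
function ambiguousPins(pinned: string[], tools: ToolRef[]): string[] {
  const collisions = findNameCollisions(tools);
  return pinned.filter((name) => collisions.has(name));
}
```

An empty result from `ambiguousPins` means every pinned name resolves to a single upstream.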
Roadmap
- `--llm` eval mode: end-to-end tool-selection accuracy with a real model.
- Live tool-list refresh: react to upstream `notifications/tools/list_changed`.
- Persistent embedding cache: reuse tool embeddings across restarts.
Contributing
Contributions are very welcome — toolsift is small, fully tested, and has no network in its test suite. See CONTRIBUTING.md.
```sh
pnpm install && pnpm test && pnpm typecheck && pnpm build
```
License
MIT © Abdulmunim Jemal
