@portkey-ai/mcp-tool-filter
Ultra-fast semantic tool filtering for MCP (Model Context Protocol) servers using embedding similarity. Reduce your tool context from 1000+ tools down to the most relevant 10-20 tools in under 10ms.
Features
- ⚡ Lightning Fast: <10ms filtering latency for 1000+ tools with built-in optimizations
- 🚀 Performance Optimized: 6-8x faster dot product, smart top-K selection, true LRU cache
- 🎯 Semantic Understanding: Uses embeddings for intelligent tool matching
- 📦 Zero Runtime Dependencies: Only requires an embedding provider API
- 🔄 Flexible Input: Accepts chat completion messages or raw strings
- 💾 Smart Caching: Caches embeddings and context for optimal performance
- 🎛️ Configurable: Tune scoring thresholds, top-k, and always-include tools
- 📊 Performance Metrics: Built-in timing for optimization
Installation
npm install @portkey-ai/mcp-tool-filter
Quick Start
import { MCPToolFilter } from '@portkey-ai/mcp-tool-filter';
// 1. Initialize the filter (choose embedding provider)
// Option A: Local Embeddings (RECOMMENDED for low latency < 5ms)
const filter = new MCPToolFilter({
embedding: {
provider: 'local',
}
});
// Option B: API Embeddings (for highest accuracy)
const filter = new MCPToolFilter({
embedding: {
provider: 'openai',
apiKey: process.env.OPENAI_API_KEY,
}
});
// 2. Load your MCP servers (one-time setup)
await filter.initialize(mcpServers);
// 3. Filter tools based on context
const result = await filter.filter(
"Search my emails for the Q4 budget discussion"
);
// 4. Use the filtered tools in your LLM request
console.log(result.tools); // Top 20 most relevant tools
console.log(result.metrics.totalTime); // e.g., "2ms" for local, "500ms" for API
Embedding Provider Options
Local Embeddings (Recommended)
Pros:
- ⚡ Ultra-fast: 1-5ms latency
- 🔒 Private: No data sent to external APIs
- 💰 Free: No API costs
- 🌐 Offline: Works without internet
Cons:
- Slightly lower accuracy than API models
- First initialization downloads model (~25MB)
const filter = new MCPToolFilter({
embedding: {
provider: 'local',
model: 'Xenova/all-MiniLM-L6-v2', // Optional: default model
quantized: true, // Optional: use quantized model for speed (default: true)
}
});
Available Models:
- Xenova/all-MiniLM-L6-v2 (default) - 384 dimensions, very fast
- Xenova/all-MiniLM-L12-v2 - 384 dimensions, more accurate
- Xenova/bge-small-en-v1.5 - 384 dimensions, good balance
- Xenova/bge-base-en-v1.5 - 768 dimensions, higher quality
Performance:
- Initialization: 100ms-4s (one-time, downloads model)
- Filter request: 1-5ms
- Cached request: <1ms
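The 100ms-4s initialization (the model download happens on first run) can be hidden at process startup. A minimal warm-up sketch, assuming your server definitions live in a hypothetical ./servers module:
import { MCPToolFilter } from '@portkey-ai/mcp-tool-filter';
import { mcpServers } from './servers'; // hypothetical module holding your MCP server definitions
// Start the one-time model download + embedding precompute at startup,
// so the first user request doesn't pay the initialization cost.
const filter = new MCPToolFilter({ embedding: { provider: 'local' } });
const ready = filter.initialize(mcpServers); // kick off now, don't await yet
export async function getReadyFilter(): Promise<MCPToolFilter> {
  await ready; // resolves immediately once initialization has completed
  return filter;
}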
API Embeddings
For highest accuracy, use OpenAI or other API providers:
const filter = new MCPToolFilter({
embedding: {
provider: 'openai',
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small', // Optional
dimensions: 384, // Optional: match local model for fair comparison
}
});
Pros:
- 🎯 Highest accuracy: 5-15% better than local
- 🔄 Easy to switch models
- 🌐 No local resources needed
Cons:
- 🐌 Slow: 400-800ms per request
- 💰 Costs money: ~$0.02 per 1M tokens
- 🔒 Data sent to external API
- 📶 Requires internet connection
Performance:
- Initialization: 200ms-60s (depends on tool count)
- Filter request: 400-800ms
- Cached request: 1-3ms
Quick Comparison
| Aspect | Local | API | Winner |
|--------|-------|-----|--------|
| Speed | 1-5ms | 400-800ms | 🏆 Local (200x faster) |
| Accuracy | Good (85-90%) | Best (100%) | 🏆 API |
| Cost | Free | ~$0.02/1M tokens | 🏆 Local |
| Privacy | Fully local | Data sent to API | 🏆 Local |
| Offline | ✅ Works offline | ❌ Needs internet | 🏆 Local |
| Setup | Zero config | Needs API key | 🏆 Local |
📊 See TRADEOFFS.md for detailed analysis
MCP Server JSON Format
The library expects an array of MCP servers with the following structure:
[
{
"id": "gmail",
"name": "Gmail MCP Server",
"description": "Email management tools",
"categories": ["email", "communication"],
"tools": [
{
"name": "search_gmail_messages",
"description": "Search and find email messages in Gmail inbox. Use when user wants to find, search, look up emails...",
"keywords": ["email", "search", "inbox", "messages"],
"category": "email-search",
"inputSchema": {
"type": "object",
"properties": {
"q": { "type": "string" }
}
}
}
]
}
]
Field Descriptions
Required Fields:
- id: Unique identifier for the server
- name: Human-readable server name
- tools: Array of tool definitions, where each tool has:
  - name: Unique tool name
  - description: Rich description of what the tool does and when to use it
Optional but Recommended:
- description: Server-level description
- categories: Array of category tags for hierarchical filtering
- keywords: Array of synonym/related terms for better matching
- category: Tool-level category
- inputSchema: JSON schema for parameters (parameter names are used for matching)
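For reference, these fields correspond roughly to the following TypeScript shapes (an informal sketch inferred from the JSON example above, not the package's exported type definitions):
// Informal sketch of the expected shapes; the package exports its own
// MCPServer/MCPTool types, which may differ in detail.
interface MCPTool {
  name: string;                          // required, unique tool name
  description: string;                   // required, rich description
  keywords?: string[];                   // optional synonyms/related terms
  category?: string;                     // optional tool-level category
  inputSchema?: Record<string, unknown>; // optional JSON schema for parameters
}
interface MCPServer {
  id: string;            // required, unique server id
  name: string;          // required, human-readable name
  description?: string;  // optional server-level description
  categories?: string[]; // optional category tags
  tools: MCPTool[];      // required
}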
Tips for Best Results
Rich Descriptions: Write detailed descriptions with use cases
"description": "Search emails in Gmail. Use when user wants to find, lookup, or retrieve messages, correspondence, or mail."Add Keywords: Include synonyms and variations
"keywords": ["email", "mail", "inbox", "messages", "correspondence"]Mention Use Cases: Explicitly state when to use the tool
"description": "... Use when user wants to draft, compose, write, or prepare an email to send later."
API Reference
MCPToolFilter
Main class for tool filtering.
Constructor
new MCPToolFilter(config: MCPToolFilterConfig)
Config Options:
{
embedding: {
// Local embeddings (recommended)
provider: 'local',
model?: string, // Default: 'Xenova/all-MiniLM-L6-v2'
quantized?: boolean, // Default: true
// OR API embeddings
provider: 'openai' | 'voyage' | 'cohere',
apiKey: string,
model?: string, // Default: 'text-embedding-3-small'
dimensions?: number, // Default: 1536 (or 384 for local)
baseURL?: string, // For custom endpoints
},
defaultOptions?: {
topK?: number, // Default: 20
minScore?: number, // Default: 0.3
contextMessages?: number, // Default: 3
alwaysInclude?: string[], // Always include these tools
exclude?: string[], // Never include these tools
maxContextTokens?: number, // Default: 500
},
includeServerDescription?: boolean, // Default: false (see below)
debug?: boolean // Enable debug logging
}
About includeServerDescription:
When enabled, this option includes the MCP server description in the tool embeddings, providing additional context about the domain/category of tools.
// Enable server descriptions in embeddings
const filter = new MCPToolFilter({
embedding: { provider: 'local' },
includeServerDescription: true // Default: false
});
Tradeoffs:
- ✅ Helps: General intent queries like "manage my local files" (+25% improvement)
- ❌ Hurts: Specific tool queries like "Execute this SQL query" (-50% degradation)
- ≈ Neutral: Overall impact is neutral (0% change)
Recommendation: Keep this disabled (default: false) unless your use case primarily involves high-level intent queries. See examples/benchmark-server-description.ts for detailed benchmarks.
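If you want to measure the effect on your own tool set, here is a small A/B sketch using only the options documented above (assumes mcpServers is in scope; results will vary by query style):
// A/B sketch over the same servers and query
const withDesc = new MCPToolFilter({
  embedding: { provider: 'local' },
  includeServerDescription: true,
});
const withoutDesc = new MCPToolFilter({
  embedding: { provider: 'local' },
});
await Promise.all([withDesc.initialize(mcpServers), withoutDesc.initialize(mcpServers)]);
const query = 'manage my local files'; // a general intent query
console.log((await withDesc.filter(query)).tools.map(t => t.toolName));
console.log((await withoutDesc.filter(query)).tools.map(t => t.toolName));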
Methods
initialize(servers: MCPServer[]): Promise<void>
Initialize the filter with MCP servers. This precomputes and caches all tool embeddings.
Note: Call this once during startup. It's an async operation that may take a few seconds depending on the number of tools.
await filter.initialize(servers);
filter(input: FilterInput, options?: FilterOptions): Promise<FilterResult>
Filter tools based on the input context.
Input Types:
// String input
await filter.filter("Search my emails about the project");
// Chat messages
await filter.filter([
{ role: 'user', content: 'What meetings do I have today?' },
{ role: 'assistant', content: 'Let me check your calendar.' }
]);
Options (all optional, override defaults):
{
topK?: number, // Max tools to return
minScore?: number, // Minimum similarity score (0-1)
contextMessages?: number, // How many recent messages to use
alwaysInclude?: string[], // Tool names to always include
exclude?: string[], // Tool names to exclude
maxContextTokens?: number, // Max context size
}
Returns:
{
tools: ScoredTool[], // Filtered and ranked tools
metrics: {
totalTime: number, // Total time in ms
embeddingTime: number, // Time to embed context
similarityTime: number, // Time to compute similarities
toolsEvaluated: number, // Total tools evaluated
}
}
getStats()
Get statistics about the filter state.
const stats = filter.getStats();
// {
// initialized: true,
// toolCount: 25,
// cacheSize: 5,
// embeddingDimensions: 1536
// }
clearCache()
Clear the context embedding cache.
filter.clearCache();
Performance Optimization
Built-in Optimizations
The library includes several performance optimizations out of the box:
- 🚀 Loop-Unrolled Dot Product - Vector similarity computation is 6-8x faster through CPU pipeline optimization
- 📊 Smart Top-K Selection - Hybrid algorithm uses fast built-in sort for typical workloads, switches to heap-based selection for 500+ tools
- 💾 True LRU Cache - Intelligent cache eviction based on access patterns, not just insertion order
- 🎯 In-Place Operations - Reduced memory allocations through in-place vector normalization
- ⚡ Set-Based Lookups - O(1) exclusion checking instead of O(n) array scanning
These optimizations are automatic and transparent - no configuration needed!
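To illustrate the loop-unrolling idea, here is a sketch of the technique in TypeScript. This is not the library's internal code, just a demonstration of why multiple independent accumulators speed up a dot product:
// Illustration of the loop-unrolling technique (a sketch, not the library's code)
function dotProduct(a: Float32Array, b: Float32Array): number {
  let s0 = 0, s1 = 0, s2 = 0, s3 = 0;
  const n = a.length;
  const limit = n - (n % 4);
  // Four independent accumulators remove the dependency chain between
  // iterations, letting the CPU pipeline the multiply-adds in parallel.
  for (let i = 0; i < limit; i += 4) {
    s0 += a[i] * b[i];
    s1 += a[i + 1] * b[i + 1];
    s2 += a[i + 2] * b[i + 2];
    s3 += a[i + 3] * b[i + 3];
  }
  let sum = s0 + s1 + s2 + s3;
  for (let i = limit; i < n; i++) sum += a[i] * b[i]; // handle the tail
  return sum;
}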
Latency Breakdown
Typical performance for 1000 tools:
Building context: <1ms
Embedding API call: 3-5ms (cached: 0ms)
Similarity computation: 1-2ms (6-8x faster with optimizations)
Sorting/filtering: <1ms (hybrid algorithm)
─────────────────────────────
Total: 5-9ms
User Configuration Tips
Use Smaller Embeddings: 512 or 1024 dimensions for faster computation
embedding: {
  provider: 'openai',
  model: 'text-embedding-3-small',
  dimensions: 512 // Faster than 1536
}
Reduce Context Size: Fewer messages = faster embedding
defaultOptions: {
  contextMessages: 2, // Instead of 3-5
  maxContextTokens: 300
}
Leverage Caching: Identical contexts reuse cached embeddings (0ms)
Tune topK: Request fewer tools if you don't need 20
await filter.filter(input, { topK: 10 });
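If latency is critical, the tips above can be combined into a single configuration. A sketch with illustrative values:
// Latency-tuned setup combining the tips above (illustrative values)
const fastFilter = new MCPToolFilter({
  embedding: {
    provider: 'openai',
    apiKey: process.env.OPENAI_API_KEY,
    model: 'text-embedding-3-small',
    dimensions: 512,       // smaller vectors, faster similarity math
  },
  defaultOptions: {
    topK: 10,              // request only what you need
    contextMessages: 2,    // fewer messages to embed
    maxContextTokens: 300, // cap context size
  },
});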
Performance Benchmarks
Micro-benchmarks showing optimization improvements:
Dot Product (1536 dims): 0.001ms vs 0.006ms (6x faster)
Vector Normalization: 0.003ms vs 0.006ms (2x faster)
Top-K Selection (<500 tools): Uses optimized built-in sort
Top-K Selection (500+ tools): O(n log k) heap-based selection
LRU Cache Access: True access-order tracking
See the existing benchmark examples for end-to-end performance testing:
npx ts-node examples/benchmark.ts
Integration Examples
With Portkey AI Gateway
import Portkey from 'portkey-ai';
import { MCPToolFilter } from '@portkey-ai/mcp-tool-filter';
const portkey = new Portkey({ apiKey: '...' });
const filter = new MCPToolFilter({ /* ... */ });
await filter.initialize(mcpServers);
// Filter tools based on conversation
const { tools } = await filter.filter(messages);
// Convert to OpenAI tool format
const openaiTools = tools.map(t => ({
type: 'function',
function: {
name: t.toolName,
description: t.tool.description,
parameters: t.tool.inputSchema,
}
}));
// Make LLM request with filtered tools
const completion = await portkey.chat.completions.create({
model: 'gpt-4',
messages: messages,
tools: openaiTools,
});
With LangChain
import { ChatOpenAI } from 'langchain/chat_models/openai';
import { MCPToolFilter } from '@portkey-ai/mcp-tool-filter';
const filter = new MCPToolFilter({ /* ... */ });
await filter.initialize(mcpServers);
// Create a custom tool selector
async function selectTools(messages) {
const { tools } = await filter.filter(messages);
return tools.map(t => convertToLangChainTool(t));
}
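// convertToLangChainTool isn't defined in this snippet; one possible sketch,
// assuming ChatOpenAI accepts OpenAI-style function tools at invoke time:
function convertToLangChainTool(t: { toolName: string; tool: { description: string; inputSchema?: object } }) {
  return {
    type: 'function' as const,
    function: {
      name: t.toolName,
      description: t.tool.description,
      parameters: t.tool.inputSchema ?? { type: 'object', properties: {} },
    },
  };
}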
// Use in your agent
const model = new ChatOpenAI();
const tools = await selectTools(messages);
const response = await model.invoke(messages, { tools });
Caching Strategy
// Recommended: Initialize once at startup
let filterInstance: MCPToolFilter;
async function getFilter() {
if (!filterInstance) {
filterInstance = new MCPToolFilter({ /* ... */ });
await filterInstance.initialize(mcpServers);
}
return filterInstance;
}
// Use in request handlers
app.post('/chat', async (req, res) => {
const filter = await getFilter();
const result = await filter.filter(req.body.messages);
// ... use filtered tools
});
Benchmarks
Performance on various tool counts (M1 Max):
Local Embeddings (Xenova/all-MiniLM-L6-v2):
| Tools | Initialization | Filter (Cold) | Filter (Cached) |
|-------|----------------|---------------|-----------------|
| 10 | ~100ms | 2ms | <1ms |
| 100 | ~500ms | 3ms | <1ms |
| 500 | ~2s | 4ms | 1ms |
| 1000 | ~4s | 5ms | 1ms |
| 5000 | ~20s | 8ms | 2ms |
API Embeddings (OpenAI text-embedding-3-small):
| Tools | Initialization | Filter (Cold) | Filter (Cached) |
|-------|----------------|---------------|-----------------|
| 10 | ~200ms | 500ms | 1ms |
| 100 | ~1.5s | 550ms | 2ms |
| 500 | ~6s | 600ms | 2ms |
| 1000 | ~12s | 650ms | 3ms |
| 5000 | ~60s | 800ms | 4ms |
Key Takeaways:
- 🚀 Local embeddings are 200-300x faster for filter requests
- ✅ Local embeddings meet the <50ms target easily
- 💰 Local embeddings have no API costs
- 📊 API embeddings may have slightly higher accuracy
- ⚡ Both benefit significantly from caching
Note: Initialization is a one-time cost. Choose local embeddings for low latency, API embeddings for maximum accuracy.
When to Use Local vs API Embeddings
Use Local Embeddings when:
- ⚡ You need ultra-low latency (<10ms)
- 🔒 Privacy is important (no external API calls)
- 💰 You want zero API costs
- 🌐 You need offline operation
- 📊 "Good enough" accuracy is acceptable
Use API Embeddings when:
- 🎯 You need maximum accuracy
- 🌍 You have good internet connectivity
- 💵 API costs are not a concern
- 📈 You're dealing with complex/nuanced queries
Recommendation: Start with local embeddings. Only switch to API if accuracy is insufficient.
Testing Local vs API
Compare performance for your use case:
npx ts-node examples/test-local-embeddings.ts
This will benchmark both providers and show you:
- Initialization time
- Average filter time
- Cached filter time
- Speed comparison
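If you prefer a DIY comparison using only the documented API, a minimal harness might look like this (assumes mcpServers is in scope and OPENAI_API_KEY is set):
// DIY comparison sketch using only the documented API
async function bench(label: string, filter: MCPToolFilter) {
  const t0 = performance.now();
  await filter.initialize(mcpServers);
  const initMs = Math.round(performance.now() - t0);
  const query = 'search my emails for the Q4 budget discussion';
  const cold = (await filter.filter(query)).metrics.totalTime;
  const cached = (await filter.filter(query)).metrics.totalTime; // second call hits the cache
  console.log(label, { initMs, cold, cached });
}
await bench('local', new MCPToolFilter({ embedding: { provider: 'local' } }));
await bench('openai', new MCPToolFilter({
  embedding: { provider: 'openai', apiKey: process.env.OPENAI_API_KEY },
}));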
Debugging & Performance Monitoring
Enable Debug Logging
To see detailed timing logs for each request, enable debug mode:
const filter = new MCPToolFilter({
embedding: { /* ... */ },
debug: true // Enable detailed timing logs
});
This will output detailed logs for each filter request:
=== Starting filter request ===
[1/5] Options merged: 0.12ms
[2/5] Context built (156 chars): 0.34ms
[3/5] Cache MISS (lookup: 0.08ms)
→ Embedding generated: 1247.56ms
[4/5] Similarities computed: 1.23ms (25 tools, 0.049ms/tool)
[5/5] Tools selected & ranked: 0.15ms (5 tools returned)
=== Total filter time: 1249.48ms ===
Breakdown: merge=0.12ms, context=0.34ms, cache=0.08ms, embedding=1247.56ms, similarity=1.23ms, selection=0.15ms
Timing Breakdown
Each filter request logs 5 steps:
- Options Merging (merge): Merge provided options with defaults
- Context Building (context): Build the context string from input messages
- Cache Lookup & Embedding (cache + embedding):
  - Cache HIT: 0ms embedding time (reuses cached embedding)
  - Cache MISS: Calls embedding API (typically 200-2000ms depending on provider)
- Similarity Computation (similarity): Compute cosine similarity for all tools; also shows per-tool average time
- Tool Selection (selection): Filter by score and select top-K tools
Example: Testing Timings
See examples/test-timings.ts for a complete example:
export OPENAI_API_KEY=your-key-here
npx ts-node examples/test-timings.ts
This will run multiple filter requests showing:
- Cache miss vs cache hit performance
- Different query types
- Chat message context handling
Performance Metrics
Every filter request returns detailed metrics:
const result = await filter.filter(input);
console.log(result.metrics);
// {
// totalTime: 1249.48, // Total request time in ms
// embeddingTime: 1247.56, // Time spent on embedding API
// similarityTime: 1.23, // Time computing similarities
// toolsEvaluated: 25 // Number of tools evaluated
// }
Monitoring in Production
const result = await filter.filter(messages);
// Log metrics for monitoring
logger.info('Tool filter performance', {
totalTime: result.metrics.totalTime,
embeddingTime: result.metrics.embeddingTime,
cached: result.metrics.embeddingTime === 0,
toolsReturned: result.tools.length,
});
// Alert if too slow
if (result.metrics.totalTime > 5000) {
logger.warn('Slow filter request', result.metrics);
}
Advanced Usage
Two-Stage Filtering
For very large tool sets, use hierarchical filtering:
// Stage 1: Filter servers by category
const relevantServers = mcpServers.filter(server =>
  server.categories?.some(cat => userIntent.includes(cat))
);
// Stage 2: Filter tools within the relevant servers
// (initialize a second filter over only the stage-1 servers so the narrowing takes effect)
const stageTwoFilter = new MCPToolFilter({ embedding: { provider: 'local' } });
await stageTwoFilter.initialize(relevantServers);
const result = await stageTwoFilter.filter(messages);
Custom Scoring
Combine embedding similarity with keyword matching:
const { tools } = await filter.filter(input);
// Boost tools with exact keyword matches
const boostedTools = tools.map(tool => {
const hasKeywordMatch = tool.tool.keywords?.some(kw =>
input.toLowerCase().includes(kw.toLowerCase())
);
return {
...tool,
score: hasKeywordMatch ? tool.score * 1.2 : tool.score
};
}).sort((a, b) => b.score - a.score);
Always-Include Power Tools
Always include certain essential tools:
const filter = new MCPToolFilter({
// ...
defaultOptions: {
alwaysInclude: [
'web_search', // Always useful
'conversation_search', // Access to context
],
}
});
Troubleshooting
Slow First Request
Problem: First filter call is slow.
Solution: The first call must generate a fresh context embedding (1-5ms with local embeddings, 400-800ms with an API provider). Subsequent calls with similar context hit the cache and are much faster.
// Warm up the cache
await filter.filter("hello"); // ~5ms
await filter.filter("hello"); // ~1ms (cached)Poor Tool Selection
Problem: Wrong tools are being selected.
Solutions:
- Improve tool descriptions with more keywords and use cases
- Lower the minScore threshold
- Increase topK to include more tools
- Add important tools to alwaysInclude (see the sketch below)
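For example, a looser selection combining these options (the 'web_search' tool name is hypothetical):
// Looser selection for better recall ('web_search' is a hypothetical tool name)
const result = await filter.filter(messages, {
  minScore: 0.2,                 // below the 0.3 default
  topK: 30,                      // consider more candidates
  alwaysInclude: ['web_search'], // never drop an essential tool
});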
Memory Usage
Problem: High memory usage with many tools.
Solution: Use smaller embedding dimensions:
embedding: {
dimensions: 512 // Instead of 1536
}
This reduces memory by ~66% with minimal accuracy loss (512 of 1536 dimensions keeps roughly one-third of the original footprint).
License
MIT
Contributing
Contributions welcome! Please open an issue or PR.
Support
- GitHub Issues: github.com/portkey-ai/mcp-tool-filter
- Email: [email protected]
