@portkey-ai/mcp-tool-filter
Ultra-fast semantic tool filtering for MCP (Model Context Protocol) servers using embedding similarity. Reduce your tool context from 1000+ tools down to the most relevant 10-20 tools in under 10ms.
Features
- ⚡ Lightning Fast: <10ms filtering latency for 1000+ tools with built-in optimizations
- 🚀 Performance Optimized: 6-8x faster dot product, smart top-K selection, true LRU cache
- 🎯 Semantic Understanding: Uses embeddings for intelligent tool matching
- 📦 Zero Runtime Dependencies: Only requires an embedding provider API
- 🔄 Flexible Input: Accepts chat completion messages or raw strings
- 💾 Smart Caching: Caches embeddings and context for optimal performance
- 🎛️ Configurable: Tune scoring thresholds, top-k, and always-include tools
- 📊 Performance Metrics: Built-in timing for optimization
Installation
npm install @portkey-ai/mcp-tool-filter
Quick Start
import { MCPToolFilter } from '@portkey-ai/mcp-tool-filter';
// 1. Initialize the filter (choose embedding provider)
// Option A: Local Embeddings (RECOMMENDED for low latency < 5ms)
const filter = new MCPToolFilter({
embedding: {
provider: 'local',
}
});
// Option B: API Embeddings (for highest accuracy)
const filter = new MCPToolFilter({
embedding: {
provider: 'openai',
apiKey: process.env.OPENAI_API_KEY,
}
});
// 2. Load your MCP servers (one-time setup)
await filter.initialize(mcpServers);
// 3. Filter tools based on context
const result = await filter.filter(
"Search my emails for the Q4 budget discussion"
);
// 4. Use the filtered tools in your LLM request
console.log(result.tools); // Top 20 most relevant tools
console.log(result.metrics.totalTime); // e.g., "2ms" for local, "500ms" for API
Embedding Provider Options
Local Embeddings (Recommended)
Pros:
- ⚡ Ultra-fast: 1-5ms latency
- 🔒 Private: No data sent to external APIs
- 💰 Free: No API costs
- 🌐 Offline: Works without internet
Cons:
- Slightly lower accuracy than API models
- First initialization downloads model (~25MB)
const filter = new MCPToolFilter({
embedding: {
provider: 'local',
model: 'Xenova/all-MiniLM-L6-v2', // Optional: default model
quantized: true, // Optional: use quantized model for speed (default: true)
}
});
Available Models:
- Xenova/all-MiniLM-L6-v2 (default) - 384 dimensions, very fast
- Xenova/all-MiniLM-L12-v2 - 384 dimensions, more accurate
- Xenova/bge-small-en-v1.5 - 384 dimensions, good balance
- Xenova/bge-base-en-v1.5 - 768 dimensions, higher quality
Performance:
- Initialization: 100ms-4s (one-time, downloads model)
- Filter request: 1-5ms
- Cached request: <1ms
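The 100ms-4s initialization (the model download happens on first run) can be hidden at process startup. A minimal warm-up sketch, assuming your server definitions live in a hypothetical ./servers module:
import { MCPToolFilter } from '@portkey-ai/mcp-tool-filter';
import { mcpServers } from './servers'; // hypothetical module holding your MCP server definitions
// Start the one-time model download + embedding precompute at startup,
// so the first user request doesn't pay the initialization cost.
const filter = new MCPToolFilter({ embedding: { provider: 'local' } });
const ready = filter.initialize(mcpServers); // kick off now, don't await yet
export async function getReadyFilter(): Promise<MCPToolFilter> {
  await ready; // resolves immediately once initialization has completed
  return filter;
}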
API Embeddings
For highest accuracy, use OpenAI or other API providers:
const filter = new MCPToolFilter({
embedding: {
provider: 'openai',
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small', // Optional
dimensions: 384, // Optional: match local model for fair comparison
}
});
Pros:
- 🎯 Highest accuracy: 5-15% better than local
- 🔄 Easy to switch models
- 🌐 No local resources needed
Cons:
- 🐌 Slow: 400-800ms per request
- 💰 Costs money: ~$0.02 per 1M tokens
- 🔒 Data sent to external API
- 📶 Requires internet connection
Performance:
- Initialization: 200ms-60s (depends on tool count)
- Filter request: 400-800ms
- Cached request: 1-3ms
Quick Comparison
| Aspect | Local | API | Winner |
|--------|-------|-----|--------|
| Speed | 1-5ms | 400-800ms | 🏆 Local (200x faster) |
| Accuracy | Good (85-90%) | Best (100%) | 🏆 API |
| Cost | Free | ~$0.02/1M tokens | 🏆 Local |
| Privacy | Fully local | Data sent to API | 🏆 Local |
| Offline | ✅ Works offline | ❌ Needs internet | 🏆 Local |
| Setup | Zero config | Needs API key | 🏆 Local |
📊 See TRADEOFFS.md for detailed analysis
MCP Server JSON Format
The library expects an array of MCP servers with the following structure:
[
{
"id": "gmail",
"name": "Gmail MCP Server",
"description": "Email management tools",
"categories": ["email", "communication"],
"tools": [
{
"name": "search_gmail_messages",
"description": "Search and find email messages in Gmail inbox. Use when user wants to find, search, look up emails...",
"keywords": ["email", "search", "inbox", "messages"],
"category": "email-search",
"inputSchema": {
"type": "object",
"properties": {
"q": { "type": "string" }
}
}
}
]
}
]
Field Descriptions
Required Fields:
- id: Unique identifier for the server
- name: Human-readable server name
- tools: Array of tool definitions, where each tool has:
  - name: Unique tool name
  - description: Rich description of what the tool does and when to use it
Optional but Recommended:
- description: Server-level description
- categories: Array of category tags for hierarchical filtering
- keywords: Array of synonym/related terms for better matching
- category: Tool-level category
- inputSchema: JSON schema for parameters (parameter names are used for matching)
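For reference, these fields correspond roughly to the following TypeScript shapes (an informal sketch inferred from the JSON example above, not the package's exported type definitions):
// Informal sketch of the expected shapes; the package exports its own
// MCPServer/MCPTool types, which may differ in detail.
interface MCPTool {
  name: string;                          // required, unique tool name
  description: string;                   // required, rich description
  keywords?: string[];                   // optional synonyms/related terms
  category?: string;                     // optional tool-level category
  inputSchema?: Record<string, unknown>; // optional JSON schema for parameters
}
interface MCPServer {
  id: string;            // required, unique server id
  name: string;          // required, human-readable name
  description?: string;  // optional server-level description
  categories?: string[]; // optional category tags
  tools: MCPTool[];      // required
}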
Tips for Best Results
Rich Descriptions: Write detailed descriptions with use cases
"description": "Search emails in Gmail. Use when user wants to find, lookup, or retrieve messages, correspondence, or mail."Add Keywords: Include synonyms and variations
"keywords": ["email", "mail", "inbox", "messages", "correspondence"]Mention Use Cases: Explicitly state when to use the tool
"description": "... Use when user wants to draft, compose, write, or prepare an email to send later."
API Reference
MCPToolFilter
Main class for tool filtering.
Constructor
new MCPToolFilter(config: MCPToolFilterConfig)
Config Options:
{
embedding: {
// Local embeddings (recommended)
provider: 'local',
model?: string, // Default: 'Xenova/all-MiniLM-L6-v2'
quantized?: boolean, // Default: true
// OR API embeddings
provider: 'openai' | 'voyage' | 'cohere',
apiKey: string,
model?: string, // Default: 'text-embedding-3-small'
dimensions?: number, // Default: 1536 (or 384 for local)
baseURL?: string, // For custom endpoints
},
defaultOptions?: {
topK?: number, // Default: 20
minScore?: number, // Default: 0.3
contextMessages?: number, // Default: 3
alwaysInclude?: string[], // Always include these tools
exclude?: string[], // Never include these tools
maxContextTokens?: number, // Default: 500
},
includeServerDescription?: boolean, // Default: false (see below)
debug?: boolean // Enable debug logging
}
About includeServerDescription:
When enabled, this option includes the MCP server description in the tool embeddings, providing additional context about the domain/category of tools.
// Enable server descriptions in embeddings
const filter = new MCPToolFilter({
embedding: { provider: 'local' },
includeServerDescription: true // Default: false
});
Tradeoffs:
- ✅ Helps: General intent queries like "manage my local files" (+25% improvement)
- ❌ Hurts: Specific tool queries like "Execute this SQL query" (-50% degradation)
- ≈ Neutral: Overall impact is neutral (0% change)
Recommendation: Keep this disabled (default: false) unless your use case primarily involves high-level intent queries. See examples/benchmark-server-description.ts for detailed benchmarks.
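If you want to measure the effect on your own tool set, here is a small A/B sketch using only the options documented above (assumes mcpServers is in scope; results will vary by query style):
// A/B sketch over the same servers and query
const withDesc = new MCPToolFilter({
  embedding: { provider: 'local' },
  includeServerDescription: true,
});
const withoutDesc = new MCPToolFilter({
  embedding: { provider: 'local' },
});
await Promise.all([withDesc.initialize(mcpServers), withoutDesc.initialize(mcpServers)]);
const query = 'manage my local files'; // a general intent query
console.log((await withDesc.filter(query)).tools.map(t => t.toolName));
console.log((await withoutDesc.filter(query)).tools.map(t => t.toolName));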
Methods
initialize(servers: MCPServer[]): Promise<void>
Initialize the filter with MCP servers. This precomputes and caches all tool embeddings.
Note: Call this once during startup. It's an async operation that may take a few seconds depending on the number of tools.
await filter.initialize(servers);
filter(input: FilterInput, options?: FilterOptions): Promise<FilterResult>
Filter tools based on the input context.
Input Types:
// String input
await filter.filter("Search my emails about the project");
// Chat messages
await filter.filter([
{ role: 'user', content: 'What meetings do I have today?' },
{ role: 'assistant', content: 'Let me check your calendar.' }
]);
Options (all optional, override defaults):
{
topK?: number, // Max tools to return
minScore?: number, // Minimum similarity score (0-1)
contextMessages?: number, // How many recent messages to use
alwaysInclude?: string[], // Tool names to always include
exclude?: string[], // Tool names to exclude
maxContextTokens?: number, // Max context size
}
Returns:
{
tools: ScoredTool[], // Filtered and ranked tools
metrics: {
totalTime: number, // Total time in ms
embeddingTime: number, // Time to embed context
similarityTime: number, // Time to compute similarities
toolsEvaluated: number, // Total tools evaluated
}
}
getStats()
Get statistics about the filter state.
const stats = filter.getStats();
// {
// initialized: true,
// toolCount: 25,
// cacheSize: 5,
// embeddingDimensions: 1536
// }
clearCache()
Clear the context embedding cache.
filter.clearCache();
Performance Optimization
Built-in Optimizations
The library includes several performance optimizations out of the box:
- 🚀 Loop-Unrolled Dot Product - Vector similarity computation is 6-8x faster through CPU pipeline optimization
- 📊 Smart Top-K Selection - Hybrid algorithm uses fast built-in sort for typical workloads, switches to heap-based selection for 500+ tools
- 💾 True LRU Cache - Intelligent cache eviction based on access patterns, not just insertion order
- 🎯 In-Place Operations - Reduced memory allocations through in-place vector normalization
- ⚡ Set-Based Lookups - O(1) exclusion checking instead of O(n) array scanning
These optimizations are automatic and transparent - no configuration needed!
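To illustrate the loop-unrolling idea, here is a sketch of the technique in TypeScript. This is not the library's internal code, just a demonstration of why multiple independent accumulators speed up a dot product:
// Illustration of the loop-unrolling technique (a sketch, not the library's code)
function dotProduct(a: Float32Array, b: Float32Array): number {
  let s0 = 0, s1 = 0, s2 = 0, s3 = 0;
  const n = a.length;
  const limit = n - (n % 4);
  // Four independent accumulators remove the dependency chain between
  // iterations, letting the CPU pipeline the multiply-adds in parallel.
  for (let i = 0; i < limit; i += 4) {
    s0 += a[i] * b[i];
    s1 += a[i + 1] * b[i + 1];
    s2 += a[i + 2] * b[i + 2];
    s3 += a[i + 3] * b[i + 3];
  }
  let sum = s0 + s1 + s2 + s3;
  for (let i = limit; i < n; i++) sum += a[i] * b[i]; // handle the tail
  return sum;
}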
Latency Breakdown
Typical performance for 1000 tools:
Building context: <1ms
Embedding API call: 3-5ms (cached: 0ms)
Similarity computation: 1-2ms (6-8x faster with optimizations)
Sorting/filtering: <1ms (hybrid algorithm)
─────────────────────────────
Total: 5-9ms
User Configuration Tips
Use Smaller Embeddings: 512 or 1024 dimensions for faster computation
embedding: {
  provider: 'openai',
  model: 'text-embedding-3-small',
  dimensions: 512 // Faster than 1536
}
Reduce Context Size: Fewer messages = faster embedding
defaultOptions: {
  contextMessages: 2, // Instead of 3-5
  maxContextTokens: 300
}
Leverage Caching: Identical contexts reuse cached embeddings (0ms)
Tune topK: Request fewer tools if you don't need 20
await filter.filter(input, { topK: 10 });
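If latency is critical, the tips above can be combined into a single configuration. A sketch with illustrative values:
// Latency-tuned setup combining the tips above (illustrative values)
const fastFilter = new MCPToolFilter({
  embedding: {
    provider: 'openai',
    apiKey: process.env.OPENAI_API_KEY,
    model: 'text-embedding-3-small',
    dimensions: 512,       // smaller vectors, faster similarity math
  },
  defaultOptions: {
    topK: 10,              // request only what you need
    contextMessages: 2,    // fewer messages to embed
    maxContextTokens: 300, // cap context size
  },
});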
Performance Benchmarks
Micro-benchmarks showing optimization improvements:
Dot Product (1536 dims): 0.001ms vs 0.006ms (6x faster)
Vector Normalization: 0.003ms vs 0.006ms (2x faster)
Top-K Selection (<500 tools): Uses optimized built-in sort
Top-K Selection (500+ tools): O(n log k) heap-based selection
LRU Cache Access: True access-order tracking
See the existing benchmark examples for end-to-end performance testing:
npx ts-node examples/benchmark.ts
Integration Examples
With Portkey AI Gateway
import Portkey from 'portkey-ai';
import { MCPToolFilter } from '@portkey-ai/mcp-tool-filter';
const portkey = new Portkey({ apiKey: '...' });
const filter = new MCPToolFilter({ /* ... */ });
await filter.initialize(mcpServers);
// Filter tools based on conversation
const { tools } = await filter.filter(messages);
// Convert to OpenAI tool format
const openaiTools = tools.map(t => ({
type: 'function',
function: {
name: t.toolName,
description: t.tool.description,
parameters: t.tool.inputSchema,
}
}));
// Make LLM request with filtered tools
const completion = await portkey.chat.completions.create({
model: 'gpt-4',
messages: messages,
tools: openaiTools,
});
With LangChain
import { ChatOpenAI } from 'langchain/chat_models/openai';
import { MCPToolFilter } from '@portkey-ai/mcp-tool-filter';
const filter = new MCPToolFilter({ /* ... */ });
await filter.initialize(mcpServers);
// Create a custom tool selector
async function selectTools(messages) {
const { tools } = await filter.filter(messages);
return tools.map(t => convertToLangChainTool(t));
}
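// convertToLangChainTool isn't defined in this snippet; one possible sketch,
// assuming ChatOpenAI accepts OpenAI-style function tools at invoke time:
function convertToLangChainTool(t: { toolName: string; tool: { description: string; inputSchema?: object } }) {
  return {
    type: 'function' as const,
    function: {
      name: t.toolName,
      description: t.tool.description,
      parameters: t.tool.inputSchema ?? { type: 'object', properties: {} },
    },
  };
}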
// Use in your agent
const model = new ChatOpenAI();
const tools = await selectTools(messages);
const response = await model.invoke(messages, { tools });
Caching Strategy
// Recommended: Initialize once at startup
let filterInstance: MCPToolFilter;
async function getFilter() {
if (!filterInstance) {
filterInstance = new MCPToolFilter({ /* ... */ });
await filterInstance.initialize(mcpServers);
}
return filterInstance;
}
// Use in request handlers
app.post('/chat', async (req, res) => {
const filter = await getFilter();
const result = await filter.filter(req.body.messages);
// ... use filtered tools
});
Benchmarks
Performance on various tool counts (M1 Max):
Local Embeddings (Xenova/all-MiniLM-L6-v2):
| Tools | Initialization | Filter (Cold) | Filter (Cached) |
|-------|----------------|---------------|-----------------|
| 10 | ~100ms | 2ms | <1ms |
| 100 | ~500ms | 3ms | <1ms |
| 500 | ~2s | 4ms | 1ms |
| 1000 | ~4s | 5ms | 1ms |
| 5000 | ~20s | 8ms | 2ms |
API Embeddings (OpenAI text-embedding-3-small):
| Tools | Initialization | Filter (Cold) | Filter (Cached) |
|-------|----------------|---------------|-----------------|
| 10 | ~200ms | 500ms | 1ms |
| 100 | ~1.5s | 550ms | 2ms |
| 500 | ~6s | 600ms | 2ms |
| 1000 | ~12s | 650ms | 3ms |
| 5000 | ~60s | 800ms | 4ms |
Key Takeaways:
- 🚀 Local embeddings are 200-300x faster for filter requests
- ✅ Local embeddings meet the <50ms target easily
- 💰 Local embeddings have no API costs
- 📊 API embeddings may have slightly higher accuracy
- ⚡ Both benefit significantly from caching
Note: Initialization is a one-time cost. Choose local embeddings for low latency, API embeddings for maximum accuracy.
When to Use Local vs API Embeddings
Use Local Embeddings when:
- ⚡ You need ultra-low latency (<10ms)
- 🔒 Privacy is important (no external API calls)
- 💰 You want zero API costs
- 🌐 You need offline operation
- 📊 "Good enough" accuracy is acceptable
Use API Embeddings when:
- 🎯 You need maximum accuracy
- 🌍 You have good internet connectivity
- 💵 API costs are not a concern
- 📈 You're dealing with complex/nuanced queries
Recommendation: Start with local embeddings. Only switch to API if accuracy is insufficient.
Testing Local vs API
Compare performance for your use case:
npx ts-node examples/test-local-embeddings.ts
This will benchmark both providers and show you:
- Initialization time
- Average filter time
- Cached filter time
- Speed comparison
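If you prefer a DIY comparison using only the documented API, a minimal harness might look like this (assumes mcpServers is in scope and OPENAI_API_KEY is set):
// DIY comparison sketch using only the documented API
async function bench(label: string, filter: MCPToolFilter) {
  const t0 = performance.now();
  await filter.initialize(mcpServers);
  const initMs = Math.round(performance.now() - t0);
  const query = 'search my emails for the Q4 budget discussion';
  const cold = (await filter.filter(query)).metrics.totalTime;
  const cached = (await filter.filter(query)).metrics.totalTime; // second call hits the cache
  console.log(label, { initMs, cold, cached });
}
await bench('local', new MCPToolFilter({ embedding: { provider: 'local' } }));
await bench('openai', new MCPToolFilter({
  embedding: { provider: 'openai', apiKey: process.env.OPENAI_API_KEY },
}));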
Debugging & Performance Monitoring
Enable Debug Logging
To see detailed timing logs for each request, enable debug mode:
const filter = new MCPToolFilter({
embedding: { /* ... */ },
debug: true // Enable detailed timing logs
});
This will output detailed logs for each filter request:
=== Starting filter request ===
[1/5] Options merged: 0.12ms
[2/5] Context built (156 chars): 0.34ms
[3/5] Cache MISS (lookup: 0.08ms)
→ Embedding generated: 1247.56ms
[4/5] Similarities computed: 1.23ms (25 tools, 0.049ms/tool)
[5/5] Tools selected & ranked: 0.15ms (5 tools returned)
=== Total filter time: 1249.48ms ===
Breakdown: merge=0.12ms, context=0.34ms, cache=0.08ms, embedding=1247.56ms, similarity=1.23ms, selection=0.15ms
Timing Breakdown
Each filter request logs 5 steps:
- Options Merging (merge): Merge provided options with defaults
- Context Building (context): Build the context string from input messages
- Cache Lookup & Embedding (cache + embedding):
  - Cache HIT: 0ms embedding time (reuses cached embedding)
  - Cache MISS: Calls embedding API (typically 200-2000ms depending on provider)
- Similarity Computation (similarity): Compute cosine similarity for all tools; also shows per-tool average time
- Tool Selection (selection): Filter by score and select top-K tools
Example: Testing Timings
See examples/test-timings.ts for a complete example:
export OPENAI_API_KEY=your-key-here
npx ts-node examples/test-timings.ts
This will run multiple filter requests showing:
- Cache miss vs cache hit performance
- Different query types
- Chat message context handling
Performance Metrics
Every filter request returns detailed metrics:
const result = await filter.filter(input);
console.log(result.metrics);
// {
// totalTime: 1249.48, // Total request time in ms
// embeddingTime: 1247.56, // Time spent on embedding API
// similarityTime: 1.23, // Time computing similarities
// toolsEvaluated: 25 // Number of tools evaluated
// }
Monitoring in Production
const result = await filter.filter(messages);
// Log metrics for monitoring
logger.info('Tool filter performance', {
totalTime: result.metrics.totalTime,
embeddingTime: result.metrics.embeddingTime,
cached: result.metrics.embeddingTime === 0,
toolsReturned: result.tools.length,
});
// Alert if too slow
if (result.metrics.totalTime > 5000) {
logger.warn('Slow filter request', result.metrics);
}
Advanced Usage
Two-Stage Filtering
For very large tool sets, use hierarchical filtering:
// Stage 1: Filter servers by category
const relevantServers = mcpServers.filter(server =>
  server.categories?.some(cat => userIntent.includes(cat))
);
// Stage 2: Filter tools within the relevant servers
// (initialize a second filter over only the stage-1 servers so the narrowing takes effect)
const stageTwoFilter = new MCPToolFilter({ embedding: { provider: 'local' } });
await stageTwoFilter.initialize(relevantServers);
const result = await stageTwoFilter.filter(messages);
Custom Scoring
Combine embedding similarity with keyword matching:
const { tools } = await filter.filter(input);
// Boost tools with exact keyword matches
const boostedTools = tools.map(tool => {
const hasKeywordMatch = tool.tool.keywords?.some(kw =>
input.toLowerCase().includes(kw.toLowerCase())
);
return {
...tool,
score: hasKeywordMatch ? tool.score * 1.2 : tool.score
};
}).sort((a, b) => b.score - a.score);
Always-Include Power Tools
Always include certain essential tools:
const filter = new MCPToolFilter({
// ...
defaultOptions: {
alwaysInclude: [
'web_search', // Always useful
'conversation_search', // Access to context
],
}
});
Troubleshooting
Slow First Request
Problem: First filter call is slow.
Solution: The first call must generate a fresh context embedding (1-5ms with local embeddings, 400-800ms with an API provider). Subsequent calls with similar context hit the cache and are much faster.
// Warm up the cache
await filter.filter("hello"); // ~5ms
await filter.filter("hello"); // ~1ms (cached)Poor Tool Selection
Problem: Wrong tools are being selected.
Solutions:
- Improve tool descriptions with more keywords and use cases
- Lower the minScore threshold
- Increase topK to include more tools
- Add important tools to alwaysInclude (see the sketch below)
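For example, a looser selection combining these options (the 'web_search' tool name is hypothetical):
// Looser selection for better recall ('web_search' is a hypothetical tool name)
const result = await filter.filter(messages, {
  minScore: 0.2,                 // below the 0.3 default
  topK: 30,                      // consider more candidates
  alwaysInclude: ['web_search'], // never drop an essential tool
});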
Memory Usage
Problem: High memory usage with many tools.
Solution: Use smaller embedding dimensions:
embedding: {
dimensions: 512 // Instead of 1536
}
This reduces memory by ~66% with minimal accuracy loss (512 of 1536 dimensions keeps roughly one-third of the original footprint).
License
MIT
Contributing
Contributions welcome! Please open an issue or PR.
Support
- GitHub Issues: github.com/portkey-ai/mcp-tool-filter
- Email: [email protected]
