npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

mcp-query-assistant

v0.1.3

Published

A Model Context Protocol server that helps generate queries using semantic search over training examples

Downloads

12

Readme

Query Assistant MCP Server

A Model Context Protocol (MCP) server that helps generate queries using semantic search over training examples. This server provides few-shot learning capabilities by finding similar questions and their corresponding queries to guide query generation for various query languages (Cypher, SPARQL, SQL, etc.).

Features

  • Semantic Search: Uses OpenAI embeddings to find similar questions in your training dataset
  • Few-Shot Learning: Returns relevant examples to help generate accurate queries
  • Training Data Management: Add, list, and manage question-query pairs with duplicate detection
  • Vector Storage: Efficient similarity search using HNSW (Hierarchical Navigable Small World) algorithm
  • Metadata Support: Organize examples by domain, complexity, and tags
  • Multi-Language Support: Works with various query languages (Cypher, SPARQL, SQL, etc.)

Installation

  1. Prerequisites: Node.js 18+ and npm

  2. Build the server:

    cd /Users/alkhalili/Documents/Cline/MCP/mcp-query-assistant
    npm install
    npm run build
  3. Configure OpenAI API Key:

    • Get your API key from OpenAI Platform
    • Update the MCP settings file at: /Users/alkhalili/Library/Application Support/Code/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json
    • Replace your_openai_api_key_here with your actual API key
  4. Configure Data Directory (Optional):

    • By default, data is stored in the data/ folder relative to the project
    • To use a custom data directory, you can:
      • Set the DATA_DIR environment variable, or
      • Use the --data-dir command line argument
  5. Server Configuration: The server is already configured in your MCP settings as:

    "query-assistant": {
      "command": "npx",
      "args": ["mcp-query-assistant"],
      "env": {
        "OPENAI_API_KEY": "your_openai_api_key_here",
        "DATA_DIR": "/optional/custom/data/path"
      },
      "disabled": false,
      "autoApprove": []
    }

    Alternative with command line arguments:

    "query-assistant": {
      "command": "npx",
      "args": ["mcp-query-assistant", "--openai-key", "your_openai_api_key_here", "--data-dir", "/custom/data/path"],
      "disabled": false,
      "autoApprove": []
    }

Available Tools

1. find_similar_queries

Find similar query examples based on a natural language question.

Parameters:

  • question (required): The natural language question to find similar examples for
  • limit (optional): Maximum number of similar examples to return (default: 3, max: 10)
  • threshold (optional): Minimum similarity threshold 0-1 (default: 0.7)

Example Usage:

Use the find_similar_queries tool to find examples for: "Give me the list of CDEs in the lineage"

2. add_training_example

Add a new question-query pair to the training dataset.

Parameters:

  • question (required): The natural language question
  • query (required): The corresponding query (Cypher, SPARQL, SQL, etc.)
  • metadata (optional): Additional metadata (domain, complexity, tags)

Example Usage:

Add a training example:
Question: "Find users who bought expensive products"
Query: "MATCH (u:User)-[:PURCHASED]->(p:Product) WHERE p.price > 1000 RETURN u"
Metadata: {"domain": "user_analytics", "complexity": "medium"}

3. list_training_examples

List all training examples in the dataset.

Parameters:

  • limit (optional): Maximum number of examples to return (default: 10, max: 100)
  • domain (optional): Filter by domain

4. find_duplicates

Find duplicate training examples based on question and query.

Parameters: None

Example Usage:

Use find_duplicates to identify duplicate training examples in your dataset.

5. remove_duplicates

Remove duplicate training examples, keeping only the first occurrence of each unique question-query pair.

Parameters:

  • confirm (optional): Set to true to confirm removal of duplicates (default: false)

Example Usage:

Use remove_duplicates with confirm=true to clean up duplicate examples.

Default Training Examples

The server comes with 1 default example covering data lineage patterns:

  1. Data Lineage: "Give me the list of CDEs in the lineage"
    • Query: MATCH (cde:CDE) RETURN cde.name, cde.description, cde.layer, cde.fqn ORDER BY cde.name
    • Domain: Data Lineage
    • Complexity: simple

Usage Workflow

  1. Ask for similar queries: When you need to write a query, use find_similar_queries with your natural language question
  2. Get few-shot examples: The server returns similar questions and their queries with similarity scores
  3. Generate your query: Use the examples as guidance to write your specific query
  4. Add new examples: Use add_training_example to expand your training dataset with new patterns
  5. Manage duplicates: Use find_duplicates and remove_duplicates to keep your dataset clean

Data Storage

  • Training Data: Stored in {DATA_DIR}/training_data.json (default: data/training_data.json)
  • Vector Index: Stored in {DATA_DIR}/vector_index.bin (default: data/vector_index.bin)
  • Embeddings: Generated using OpenAI's text-embedding-3-small model (1536 dimensions)
  • Data Directory: Configurable via DATA_DIR environment variable or --data-dir command line argument

Example Interaction

User: "Show me all data elements with their descriptions"

Agent: Let me find similar examples for you.
[Uses find_similar_queries tool]

Server Response:
Found 1 similar examples for: "Show me all data elements with their descriptions"

Example 1 (similarity: 0.823):
Question: Give me the list of CDEs in the lineage
Query: MATCH (cde:CDE) RETURN cde.name, cde.description, cde.layer, cde.fqn ORDER BY cde.name
Domain: Data Lineage
Complexity: simple

Agent: Based on this example, here's a query for showing data elements with descriptions:
MATCH (cde:CDE) 
RETURN cde.name, cde.description 
ORDER BY cde.name

Troubleshooting

  1. "OpenAI API key not configured": Make sure you've set your API key in the MCP settings
  2. "No similar examples found": Try lowering the similarity threshold or adding more training data
  3. Server not connecting: Check that the build path is correct and the server compiled successfully

Contributing

To add more training examples or improve the server:

  1. Use the add_training_example tool to add new question-query pairs
  2. Organize examples with appropriate metadata (domain, complexity, tags)
  3. Test similarity search with various question phrasings
  4. Consider adding domain-specific examples for your use case

Technical Details

  • Vector Database: HNSW (Hierarchical Navigable Small World) for efficient similarity search
  • Embedding Model: OpenAI text-embedding-3-small (1536 dimensions)
  • Similarity Metric: Cosine similarity
  • Storage Format: JSON for training data, binary for vector index
  • Max Capacity: 10,000 training examples (configurable)