ncbi-mcp
v2.1.0
Model Context Protocol (MCP) server that integrates NCBI Entrez capabilities.
NCBI Model Context Protocol (MCP)
A Python implementation of the Model Context Protocol for interacting with NCBI databases.
Setup
- Clone this repository
- Install dependencies:
pip install -r requirements.txt
- Create a .env file with your NCBI API key:
NCBI_API_KEY=your_api_key_here
[email protected]
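As a rough illustration of how a `KEY=VALUE` file such as `.env` can be read, here is a minimal sketch. The helper names `parse_env_text` and `load_env_file` are hypothetical and not part of this project; how `ncbi_mcp.py` itself loads the file is not shown in this README.

```python
import os


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path=".env"):
    """Hypothetical helper: copy values from the file into os.environ.

    Variables that are already set in the environment are left untouched.
    """
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as handle:
        values = parse_env_text(handle.read())
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values
```

With this sketch, `load_env_file()` followed by `os.environ.get("NCBI_API_KEY")` would return the key from the file, unless the variable was already set in the shell.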
Running the MCP Server
python ncbi_mcp.py
Using with Cursor/Claude
Once the MCP server is running, you can interact with it using natural language in Cursor/Claude.
Using Natural Language Queries
You can use natural language to perform searches and retrieve information:
tools/call
{
"name": "nlp-query",
"arguments": {
"query": "Find research articles about BRCA1"
}
}
Or, more simply, use the query directly:
@ncbi-mcp Find research articles about BRCA1
Example Natural Language Queries
Here are some example natural language queries you can try:
Gene function information:
@ncbi-mcp Please summarize the function of TNF-alpha
Genome size and statistics:
@ncbi-mcp How big is the genome for Saccharomyces cerevisiae?
Assembly statistics:
@ncbi-mcp What is the reported L50 and N50 statistics for the most recent E.coli genome?
Dataset counts:
@ncbi-mcp How many datasets are available in the biosample database for b16f10 mouse melanoma cells?
Search for scientific articles:
@ncbi-mcp Find the latest research on COVID-19 vaccines
Get gene information:
@ncbi-mcp Tell me about the BRCA1 gene
Fetch genome information:
@ncbi-mcp Get genome information for Homo sapiens
Testing
To test the MCP server with various queries, you can use the included test files:
# Test natural language query functionality (default)
.\run_test.bat
# Test all tools
.\run_test.bat all
# Test specific test file
.\run_test.bat test_all_tools.jsonl
# Test high-level tools
.\run_test.bat test_high_level_tools.jsonl
The test script will:
- Start the MCP server in background
- Send test requests from the specified file
- Wait for a few seconds to allow processing
- Terminate the server and display the output
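The four steps above can be approximated in Python on any platform. This is a sketch, not the contents of `run_test.bat`; the function name `run_requests` and the default wait time are assumptions made for illustration.

```python
import subprocess
import sys
import time


def run_requests(server_cmd, request_lines, wait_seconds=3.0):
    """Start a process, send request lines on stdin, wait, then stop it.

    Returns everything the process wrote to stdout/stderr before it was stopped.
    """
    proc = subprocess.Popen(
        server_cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    try:
        # Send one request per line, as in a .jsonl test file.
        for line in request_lines:
            proc.stdin.write(line.rstrip("\n") + "\n")
        proc.stdin.flush()
        # Give the long-running server time to process the requests.
        time.sleep(wait_seconds)
    finally:
        proc.terminate()
    output, _ = proc.communicate(timeout=10)
    return output


# Possible usage (assumes ncbi_mcp.py and the test file are in the current directory):
#   with open("test_nlp_query.jsonl", encoding="utf-8") as fh:
#       print(run_requests([sys.executable, "ncbi_mcp.py"], fh.readlines()))
```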
This approach is used because the MCP server is designed to run continuously as a service. For manual testing without automatic termination, you can use:
# Run manually with any test file
type test_nlp_query.jsonl | python ncbi_mcp.py
The test files contain example JSON-RPC requests that simulate how Cursor/Claude would interact with the MCP server.
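For orientation, a request of this kind might be built as follows. The sketch assumes the standard JSON-RPC 2.0 envelope that MCP uses for `tools/call`, with one request per line; the exact contents of the bundled `.jsonl` files are not reproduced here, and the helper names are hypothetical.

```python
import json


def make_tool_call(request_id, tool_name, arguments):
    """Wrap a tool name and its arguments in a JSON-RPC 2.0 tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def to_jsonl(requests):
    """Serialize requests as JSON Lines: one compact JSON object per line."""
    return "".join(json.dumps(req, separators=(",", ":")) + "\n" for req in requests)


# Example: a single nlp-query request, using the query shown in this README.
example = to_jsonl(
    [make_tool_call(1, "nlp-query", {"query": "Find research articles about BRCA1"})]
)
```

Writing `example` to a file gives input that could be piped to the server in the same way as the bundled test files, provided the server accepts this envelope.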
Available Tools
The NCBI MCP provides both high-level tools that understand natural language and low-level tools for direct database interaction.
Tool Usage Guidelines for LLMs
Recommended Workflow Patterns
For most biological queries, start with nlp-query: it handles complex questions and automatically routes them to the appropriate specialized tools.
Common Research Workflows:
Gene Analysis Workflow:
- Start with nlp-query for general gene questions
- Use summarize-gene for comprehensive gene information
- Use get_gene_info for detailed structured data
- Use ncbi-search + ncbi-fetch for specific database queries
Genome Analysis Workflow:
- Use genome-stats for organism genome statistics
- Use get_genome_info for detailed genome metadata
- Use count-datasets to explore available genome assemblies
Literature Research Workflow:
- Use nlp-query for natural language literature searches
- Use ncbi-search with database="pubmed" for precise searches
- Use ncbi-fetch to get full publication details
Dataset Discovery Workflow:
- Use count-datasets to assess data availability
- Use nlp-query to explore datasets with natural language
- Use ncbi-search for systematic database exploration
E-utilities Workflow (Advanced):
- Use ncbi-info to discover available databases
- Use ncbi-global-query to see which databases contain your search term
- Use ncbi-search to find specific UIDs in target databases
- Use ncbi-summary to get overview information about records
- Use ncbi-fetch to retrieve complete records
- Use ncbi-link to find related records across databases
Cross-Database Analysis Workflow:
- Use ncbi-search to find genes of interest
- Use ncbi-link to find related proteins, structures, or literature
- Use ncbi-summary to get metadata about related records
- Use ncbi-fetch to retrieve detailed information
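The search, link, summary, and fetch steps in the last two workflows correspond to the public NCBI E-utilities endpoints. The sketch below only builds the request URLs and makes no network calls. It assumes the low-level tools map onto these endpoints as their names suggest; the helper `eutils_url` is hypothetical and not part of this project.

```python
from urllib.parse import urlencode

# Public base URL of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def eutils_url(endpoint, **params):
    """Build a URL for one E-utility, dropping parameters that are None."""
    clean = {key: value for key, value in params.items() if value is not None}
    return f"{EUTILS_BASE}/{endpoint}.fcgi?{urlencode(clean)}"


# Step 1 (ESearch): find gene UIDs matching a term.
search = eutils_url("esearch", db="gene", term="BRCA1[Gene]", retmode="json")

# Step 2 (ELink): find protein records linked to a gene UID.
# 672 is the Gene ID for BRCA1 quoted in this README; in practice the UIDs
# would come from the ESearch response.
link = eutils_url("elink", dbfrom="gene", db="protein", id="672")

# Steps 3 and 4 (ESummary, EFetch): inspect and retrieve the linked records.
# "PROTEIN_UID" is a placeholder for a UID returned by ELink.
summary = eutils_url("esummary", db="protein", id="PROTEIN_UID", retmode="json")
fetch = eutils_url("efetch", db="protein", id="PROTEIN_UID", rettype="fasta", retmode="text")
```

An `api_key` parameter can be passed in the same way, for example `eutils_url("esearch", db="gene", term="BRCA1[Gene]", api_key=key)`.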
Tool Selection Guide
High-Level Tools (Recommended for most users):
- nlp-query: Use for general biological questions, complex queries, and when you're unsure which tool to use
- summarize-gene: Use for comprehensive gene analysis and understanding gene function
- genome-stats: Use for genome size, assembly quality, and organism comparison
- count-datasets: Use for research planning and data availability assessment
- get_gene_info: Use for detailed, structured gene information
- get_genome_info: Use for detailed, structured genome information
Low-Level E-utilities Tools (For advanced users):
- ncbi-search (ESearch): Use for precise database searches with specific filters, Boolean operators, and field qualifiers
- ncbi-fetch (EFetch): Use to retrieve complete records after searching; supports multiple formats (GenBank, FASTA, XML)
- ncbi-summary (ESummary): Use to get document summaries without fetching complete records
- ncbi-link (ELink): Use to find related records across databases (e.g., gene to protein, protein to structure)
- ncbi-info (EInfo): Use to discover available databases and their capabilities
- ncbi-global-query (EGQuery): Use to search across all databases simultaneously
- ncbi-spell (ESpell): Use to get spelling suggestions for search terms
- ncbi-citation-match (ECitMatch): Use to find PMIDs from citation information
Biological Context and Terminology
Understanding NCBI Databases:
- Gene: Contains gene records with symbols, names, functions, and genomic locations
- Protein: Contains protein sequences and annotations
- Nucleotide: Contains DNA/RNA sequences (genes, transcripts, genomic regions)
- PubMed: Contains scientific literature and publications
- BioSample: Contains biological sample metadata (tissues, cell lines, etc.)
- BioProject: Contains research project information
- SRA: Contains raw sequencing data
- Assembly: Contains genome assembly information
Common Biological Terms:
- Gene Symbol: Short abbreviation (e.g., BRCA1, TP53, TNF)
- Gene ID: Unique NCBI identifier (e.g., 672 for BRCA1)
- Accession: Unique sequence identifier (e.g., NM_001126114.3)
- N50/L50: Assembly contiguity metrics (a larger N50 and a smaller L50 generally indicate a more contiguous assembly)
- Reference Genome: High-quality representative genome for a species
- Organism: Use scientific names (Homo sapiens) or common names (human)
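To make the N50/L50 entry concrete, here is a small sketch that computes both values from a list of contig lengths using the usual definition: sort contigs from longest to shortest and stop once the running total reaches half of the assembly length. The function name `n50_l50` is chosen for illustration and is not part of this project.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a collection of contig lengths.

    N50 is the length of the contig at which the running total, taken from
    longest to shortest, first reaches half of the total assembly length.
    L50 is the number of contigs needed to reach that point.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            return length, count


# Total length is 290, so the halfway point is 145.
# 80 + 70 = 150 reaches it after two contigs: N50 = 70, L50 = 2.
print(n50_l50([80, 70, 50, 40, 30, 20]))
```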
Search Strategies:
- Use specific gene symbols for precise results
- Include organism names to avoid ambiguity
- Use Boolean operators (AND, OR, NOT) for complex searches
- Use field qualifiers like [Gene], [Organism], [Protein Name] for targeted searches
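These strategies can be combined into a single search term. The sketch below joins field-qualified clauses with one Boolean operator, using the qualifiers listed above. The helper names are hypothetical, and which qualifiers a given database accepts should be checked against that database.

```python
def field_clause(value, field):
    """Format one clause such as BRCA1[Gene]; multi-word values are quoted."""
    text = f'"{value}"' if " " in value else value
    return f"{text}[{field}]"


def build_term(clauses, operator="AND"):
    """Join (value, field) pairs into one term with a single Boolean operator."""
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {operator}")
    return f" {operator} ".join(field_clause(value, field) for value, field in clauses)


# A gene symbol restricted to one organism, to avoid cross-species ambiguity.
term = build_term([("BRCA1", "Gene"), ("Homo sapiens", "Organism")])
```

The resulting string could then be passed as the `term` argument of `ncbi-search`.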
High-Level Tools
Natural Language Query Processor
tools/call
{
"name": "nlp-query",
"arguments": {
"query": "Please summarize the function of TNF-alpha"
}
}
Gene Summarizer
tools/call
{
"name": "summarize-gene",
"arguments": {
"gene_name": "BRCA1"
}
}
Genome Statistics
tools/call
{
"name": "genome-stats",
"arguments": {
"organism": "Escherichia coli"
}
}
Dataset Counter
tools/call
{
"name": "count-datasets",
"arguments": {
"database": "biosample",
"query": "mouse melanoma b16f10"
}
}
Low-Level Tools
Search NCBI Databases
tools/call
{
"name": "ncbi-search",
"arguments": {
"database": "pubmed",
"term": "BRCA1",
"filters": {
"organism": "Homo sapiens",
"date_range": {
"start": "2020"
}
}
}
}
Fetch NCBI Records
tools/call
{
"name": "ncbi-fetch",
"arguments": {
"database": "gene",
"ids": ["70"],
"rettype": "gb"
}
}
Get Gene Information
tools/call
{
"name": "get_gene_info",
"arguments": {
"gene_id": "672"
}
}
Get Genome Information
tools/call
{
"name": "get_genome_info",
"arguments": {
"organism": "Homo sapiens",
"reference": true
}
}
License
Apache-2.0
