gemini-multimodal-mcp
v1.1.4
MCP server with multimodal capabilities - process documents, images, videos, audio using Gemini Pro with 1M context window
Gemini Document MCP
A Model Context Protocol (MCP) server that uses Google's Gemini Pro and its 1M-token context window to index and search document contents. It provides document processing, semantic search, and Q&A for a range of document formats.
Features
- Document Indexing: Index PDF, DOCX, Excel, HTML, TXT, MD, JSON, and CSV files
- Semantic Search: Use Gemini Pro's 1M context window for intelligent document search
- Document Q&A: Ask questions about specific documents using natural language
- Summarization: Generate brief, detailed, or key-point summaries of documents
- Multiple Formats: Support for various document types with automatic content extraction
- Metadata Support: Store and search documents with custom metadata
Prerequisites
- Node.js 18.0.0 or higher
- Google Gemini API key
Installation
- Clone or download this MCP project
- Install dependencies:
  npm install
- Create a .env file based on .env.example:
  cp .env.example .env
- Add your Gemini API key to the .env file:
  GEMINI_API_KEY=your_actual_api_key_here
- Build the project:
  npm run build
Getting a Gemini API Key
- Go to Google AI Studio
- Sign in with your Google account
- Click "Create API Key"
- Copy the generated API key
- Add it to your .env file
Usage
Running the MCP Server
npm start
For development with auto-reload:
npm run dev
Available Tools
1. index_document
Index a document for later search and analysis.
Parameters:
- file_path (string, required): Path to the document
- document_id (string, required): Unique identifier for the document
- metadata (object, optional): Custom metadata for the document
Example:
{
"file_path": "./documents/report.pdf",
"document_id": "quarterly-report-2024",
"metadata": {
"title": "Q1 2024 Financial Report",
"department": "Finance",
"author": "John Doe"
}
}
2. search_documents
Search indexed documents using natural language queries.
Parameters:
- query (string, required): Natural language search query
- max_results (number, optional): Maximum number of results (default: 5)
- include_content (boolean, optional): Include full content in results (default: false)
Example:
{
"query": "financial performance and revenue growth",
"max_results": 3,
"include_content": false
}
3. get_document_content
Retrieve the full content of a specific document.
Parameters:
- document_id (string, required): Unique identifier of the document
4. ask_document
Ask questions about a specific document.
Parameters:
- document_id (string, required): Unique identifier of the document
- question (string, required): Question to ask about the document
Example:
{
"document_id": "quarterly-report-2024",
"question": "What was the revenue growth percentage in Q1?"
}
5. summarize_document
Generate a summary of a document.
Parameters:
- document_id (string, required): Unique identifier of the document
- summary_type (string, optional): Type of summary - "brief", "detailed", or "key_points" (default: "brief")
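Since summarize_document has no example above, here is a minimal sketch of how a caller might validate summary_type before sending it, applying the documented default of "brief". The helper name resolveSummaryType is hypothetical and not part of this server.

```typescript
// The three values documented for summary_type.
const SUMMARY_TYPES = ["brief", "detailed", "key_points"] as const;
type SummaryType = (typeof SUMMARY_TYPES)[number];

// Hypothetical helper: returns the documented default when the value is
// omitted and rejects anything outside the documented set.
function resolveSummaryType(value?: string): SummaryType {
  if (value === undefined) return "brief";
  const match = SUMMARY_TYPES.find((t) => t === value);
  if (match === undefined) {
    throw new Error(`summary_type must be one of: ${SUMMARY_TYPES.join(", ")}`);
  }
  return match;
}

// Arguments for a summarize_document call.
const summarizeArgs = {
  document_id: "quarterly-report-2024",
  summary_type: resolveSummaryType("key_points"),
};
```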
6. list_indexed_documents
List all indexed documents with their metadata.
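MCP clients invoke tools such as the six above through the protocol's tools/call JSON-RPC request, usually via an SDK rather than by hand. The sketch below only shows the shape of that request; buildToolCall is a hypothetical helper, and the transport that carries the message is left out.

```typescript
// Shape of an MCP "tools/call" JSON-RPC 2.0 request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper that wraps a tool name and its arguments in the envelope.
function buildToolCall(
  id: number,
  toolName: string,
  toolArgs: Record<string, unknown> = {}
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: toolArgs },
  };
}

// Example: the ask_document call from the section above.
const askRequest = buildToolCall(1, "ask_document", {
  document_id: "quarterly-report-2024",
  question: "What was the revenue growth percentage in Q1?",
});

// Serialized form, ready to hand to whichever transport the client uses.
const askRequestJson = JSON.stringify(askRequest);
```

A tool without parameters, such as list_indexed_documents, would be built as buildToolCall(2, "list_indexed_documents").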
Supported File Formats
- PDF (.pdf): Text extraction from PDF documents
- Word Documents (.docx): Microsoft Word documents
- Excel Files (.xlsx, .xls): Spreadsheets with all sheets
- HTML (.html, .htm): Web pages with text content extraction
- Text Files (.txt, .md): Plain text and Markdown files
- JSON (.json): Structured JSON data
- CSV (.csv): Comma-separated value files
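As an illustration of how the extensions above could map to an extraction path, the following sketch checks a file path against the supported list. The names DocumentFormat, FORMAT_BY_EXTENSION, and detectFormat are assumptions made for this example; the server's own document processor may be organized differently.

```typescript
// Illustrative format labels, one per entry in the supported list.
type DocumentFormat = "pdf" | "docx" | "excel" | "html" | "text" | "json" | "csv";

// Extension table taken from the supported formats listed above.
const FORMAT_BY_EXTENSION: Record<string, DocumentFormat> = {
  ".pdf": "pdf",
  ".docx": "docx",
  ".xlsx": "excel",
  ".xls": "excel",
  ".html": "html",
  ".htm": "html",
  ".txt": "text",
  ".md": "text",
  ".json": "json",
  ".csv": "csv",
};

// Hypothetical helper: resolves a path to a format or throws for unsupported types.
function detectFormat(filePath: string): DocumentFormat {
  // Take the last path segment, accepting both / and \ separators.
  const fileName = filePath.split(/[\\/]/).pop() ?? "";
  const dot = fileName.lastIndexOf(".");
  // A leading dot (as in ".env") is not treated as an extension.
  const extension = dot > 0 ? fileName.slice(dot).toLowerCase() : "";
  const format = FORMAT_BY_EXTENSION[extension];
  if (format === undefined) {
    throw new Error(`Unsupported file type: ${extension || "(none)"}`);
  }
  return format;
}
```

Running a check like this before calling index_document lets a client catch an unsupported file, such as a legacy .doc, without a round trip to the server.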
Configuration
Environment Variables
- GEMINI_API_KEY: Your Google Gemini API key (required)
- GEMINI_MODEL: Gemini model to use (default: "gemini-pro")
- GEMINI_MAX_TOKENS: Maximum tokens for generation (default: 8192)
- PORT: Server port (default: 3000)
- LOG_LEVEL: Logging level (default: "info")
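To make the defaults concrete, the sketch below resolves these variables from an environment map. ServerConfig, loadConfig, and parsePositiveInt are hypothetical names, and the fallback for invalid numeric values is an assumption; only the variable names, the defaults, and the error message come from this README.

```typescript
// Hypothetical config shape; field names are illustrative.
interface ServerConfig {
  apiKey: string;
  model: string;
  maxTokens: number;
  port: number;
  logLevel: string;
}

type Env = Record<string, string | undefined>;

// Parse a positive integer, falling back to the default when the value
// is unset or not a valid positive integer (an assumed policy).
function parsePositiveInt(value: string | undefined, fallback: number): number {
  if (value === undefined || value.trim() === "") return fallback;
  const parsed = Number(value);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

function loadConfig(env: Env): ServerConfig {
  const apiKey = env.GEMINI_API_KEY;
  if (!apiKey) {
    // Same message as listed under Troubleshooting.
    throw new Error("GEMINI_API_KEY environment variable is required");
  }
  return {
    apiKey,
    model: env.GEMINI_MODEL || "gemini-pro",
    maxTokens: parsePositiveInt(env.GEMINI_MAX_TOKENS, 8192),
    port: parsePositiveInt(env.PORT, 3000),
    logLevel: env.LOG_LEVEL || "info",
  };
}
```

In Node.js this would typically be called as loadConfig(process.env) after the .env file has been loaded.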
Data Storage
Documents are stored locally in the ./data/ directory as JSON files. The indexed content includes both the original text and Gemini's semantic analysis for enhanced search capabilities.
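The exact JSON layout is not documented here, so the following is only a guess at what a stored record might look like. The StoredDocument fields, the one-file-per-document layout under ./data/, and the document_id restriction are all assumptions made for illustration.

```typescript
// Hypothetical record shape; the server's actual JSON layout may differ.
interface StoredDocument {
  document_id: string;
  metadata: Record<string, unknown>;
  content: string; // original extracted text
  analysis: string; // Gemini's semantic analysis
  indexed_at: string; // ISO 8601 timestamp
}

// Assumed layout: one JSON file per document under ./data/.
function storagePathFor(documentId: string): string {
  // Assumed policy: reject IDs that could escape the data directory.
  if (!/^[A-Za-z0-9._-]+$/.test(documentId) || documentId.startsWith(".")) {
    throw new Error(`Invalid document_id: ${documentId}`);
  }
  return `./data/${documentId}.json`;
}

// Pretty-printed JSON keeps the stored files easy to inspect by hand.
function serializeDocument(doc: StoredDocument): string {
  return JSON.stringify(doc, null, 2);
}
```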
How It Works
- Document Processing: When you index a document, the MCP extracts text content based on the file type
- Semantic Analysis: Gemini Pro analyzes the document to understand its structure, key topics, entities, and concepts
- Enhanced Storage: Both original content and semantic analysis are stored for comprehensive search
- Intelligent Search: When searching, Gemini Pro analyzes your query against all indexed documents and ranks results by relevance
- Context-Aware Q&A: Questions are answered using Gemini's 1M context window to understand and analyze the full document content
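The search step above can be pictured as assembling the query and every indexed document into a single prompt, which is what a large context window makes practical. The sketch below is one plausible way to do that, not the prompt this server actually sends; IndexedDocument and buildSearchPrompt are hypothetical names, and the model call itself is omitted.

```typescript
// Minimal view of an indexed document for prompt assembly (assumed shape).
interface IndexedDocument {
  document_id: string;
  content: string;
}

// Hypothetical helper: puts the query and all documents into one prompt
// so the model can rank them in a single pass.
function buildSearchPrompt(
  query: string,
  documents: IndexedDocument[],
  maxResults = 5
): string {
  const corpus = documents
    .map((doc) => `<document id="${doc.document_id}">\n${doc.content}\n</document>`)
    .join("\n");
  return [
    `Rank the documents below by relevance to the query and return at most ${maxResults} document IDs.`,
    `Query: ${query}`,
    corpus,
  ].join("\n\n");
}
```

The default of 5 mirrors the documented default for max_results in search_documents.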
Examples
Index a PDF document
{
"tool": "index_document",
"arguments": {
"file_path": "./1737toi-tai-gioi-ban-cung-the.pdf",
"document_id": "sample-document",
"metadata": {
"title": "Sample Document",
"type": "report"
}
}
}
Search for content
{
"tool": "search_documents",
"arguments": {
"query": "main topics and key insights from the document",
"max_results": 3
}
}
Ask a question about the document
{
"tool": "ask_document",
"arguments": {
"document_id": "sample-document",
"question": "What are the main conclusions or recommendations in this document?"
}
}
Troubleshooting
Common Issues
"GEMINI_API_KEY environment variable is required"
- Make sure you've created a .env file with your API key
"Unsupported file type"
- Check that your file format is in the supported list
- Try converting to a supported format (e.g., DOC to DOCX)
"Document not found"
- Ensure the document ID matches exactly what you used when indexing
- Use list_indexed_documents to see all available documents
Memory issues with large documents
- Gemini Pro has a 1M token context limit
- Very large documents may need to be split into smaller sections
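One way to split a very large document before indexing is sketched below. It estimates tokens with a rough 4-characters-per-token heuristic, which is an assumption and not Gemini's tokenizer, and it prefers paragraph boundaries. The function names are hypothetical.

```typescript
// Rough heuristic, not a tokenizer: assumes about 4 characters per token.
const APPROX_CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / APPROX_CHARS_PER_TOKEN);
}

// Split text into sections of at most maxTokens (estimated),
// keeping paragraphs together where they fit.
function splitIntoSections(text: string, maxTokens: number): string[] {
  if (maxTokens <= 0) throw new Error("maxTokens must be positive");
  const maxChars = maxTokens * APPROX_CHARS_PER_TOKEN;
  const sections: string[] = [];
  let current = "";
  for (const paragraph of text.split(/\n{2,}/)) {
    // Hard-split any single paragraph that is itself too long.
    const pieces: string[] = [];
    for (let i = 0; i < paragraph.length; i += maxChars) {
      pieces.push(paragraph.slice(i, i + maxChars));
    }
    for (const piece of pieces) {
      const candidate = current === "" ? piece : current + "\n\n" + piece;
      if (candidate.length <= maxChars) {
        current = candidate;
      } else {
        // The current section is full; start a new one with this piece.
        sections.push(current);
        current = piece;
      }
    }
  }
  if (current !== "") sections.push(current);
  return sections;
}
```

Each resulting section could then be indexed under its own document_id, for example by appending a part number to the original ID.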
Development
Project Structure
src/
├── index.ts                    # Main MCP server entry point
├── services/
│   └── gemini-document.ts      # Gemini integration service
└── utils/
    ├── document-processor.ts   # Document text extraction
    └── document-store.ts       # Local document storage
Building and Testing
# Build the project
npm run build
# Run in development mode
npm run dev
# Clean build artifacts
npm run clean
License
MIT License - feel free to use this MCP in your projects!
Contributing
Contributions are welcome! Please feel free to submit issues and pull requests.
