gemini-multimodal-mcp

v1.1.4

Published

5 months ago

MCP server with multimodal capabilities - process documents, images, videos, audio using Gemini Pro with 1M context window


anbaba

mcp gemini multimodal document video audio image ai search

Gemini Document MCP

A Model Context Protocol (MCP) server that uses Google's Gemini Pro, with its context window of up to 1M tokens, to index and search document contents. It provides document processing, semantic search, and Q&A for a range of document formats.

Features

Document Indexing: Index PDF, DOCX, Excel, HTML, TXT, MD, JSON, and CSV files
Semantic Search: Use Gemini Pro's 1M context window for intelligent document search
Document Q&A: Ask questions about specific documents using natural language
Summarization: Generate brief, detailed, or key-point summaries of documents
Multiple Formats: Support for various document types with automatic content extraction
Metadata Support: Store and search documents with custom metadata

Prerequisites

Node.js 18.0.0 or higher
Google Gemini API key

Installation

1. Clone or download this MCP project.
2. Install dependencies:
```
npm install
```
3. Create a .env file based on .env.example:
```
cp .env.example .env
```
4. Add your Gemini API key to the .env file:
```
GEMINI_API_KEY=your_actual_api_key_here
```
5. Build the project:
```
npm run build
```

Getting a Gemini API Key

1. Go to Google AI Studio.
2. Sign in with your Google account.
3. Click "Create API Key".
4. Copy the generated API key.
5. Add it to your .env file.

Usage

Running the MCP Server

npm start

For development with auto-reload:

npm run dev

Available Tools

1. `index_document`

Index a document for later search and analysis.

Parameters:

file_path (string, required): Path to the document
document_id (string, required): Unique identifier for the document
metadata (object, optional): Custom metadata for the document

Example:

{
  "file_path": "./documents/report.pdf",
  "document_id": "quarterly-report-2024",
  "metadata": {
    "title": "Q1 2024 Financial Report",
    "department": "Finance",
    "author": "John Doe"
  }
}

2. `search_documents`

Search indexed documents using natural language queries.

Parameters:

query (string, required): Natural language search query
max_results (number, optional): Maximum number of results (default: 5)
include_content (boolean, optional): Include full content in results (default: false)

Example:

{
  "query": "financial performance and revenue growth",
  "max_results": 3,
  "include_content": false
}

3. `get_document_content`

Retrieve the full content of a specific document.

Parameters:

document_id (string, required): Unique identifier of the document

4. `ask_document`

Ask questions about a specific document.

Parameters:

document_id (string, required): Unique identifier of the document
question (string, required): Question to ask about the document

Example:

{
  "document_id": "quarterly-report-2024",
  "question": "What was the revenue growth percentage in Q1?"
}

5. `summarize_document`

Generate a summary of a document.

Parameters:

document_id (string, required): Unique identifier of the document
summary_type (string, optional): Type of summary - "brief", "detailed", or "key_points" (default: "brief")

6. `list_indexed_documents`

List all indexed documents with their metadata.
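
The parameter lists above can be written down as TypeScript types. The following is a minimal sketch, not the server's actual source: the interface names and the `withSearchDefaults` and `resolveSummaryType` helpers are hypothetical, and only the field names, types, and defaults are taken from the tool descriptions.

```typescript
// Argument shapes for the tools, transcribed from the parameter lists above.
// Interface and function names are illustrative, not taken from the project.
interface IndexDocumentArgs {
  file_path: string;
  document_id: string;
  metadata?: Record<string, unknown>;
}

interface SearchDocumentsArgs {
  query: string;
  max_results?: number;      // documented default: 5
  include_content?: boolean; // documented default: false
}

// Used by get_document_content.
interface DocumentIdArgs {
  document_id: string;
}

interface AskDocumentArgs {
  document_id: string;
  question: string;
}

type SummaryType = "brief" | "detailed" | "key_points";

interface SummarizeDocumentArgs {
  document_id: string;
  summary_type?: SummaryType; // documented default: "brief"
}

// list_indexed_documents takes no parameters.

// Fill in the documented defaults for search_documents.
function withSearchDefaults(args: SearchDocumentsArgs): Required<SearchDocumentsArgs> {
  return {
    query: args.query,
    max_results: args.max_results ?? 5,
    include_content: args.include_content ?? false,
  };
}

// Validate summary_type, falling back to the documented default.
function resolveSummaryType(value: unknown): SummaryType {
  if (value === undefined) return "brief";
  if (value === "brief" || value === "detailed" || value === "key_points") return value;
  throw new Error(`Invalid summary_type: ${String(value)}`);
}
```

How the server itself reacts to a missing or invalid optional parameter is not documented here, so the error thrown by `resolveSummaryType` is an assumption.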

Supported File Formats

PDF (.pdf): Text extraction from PDF documents
Word Documents (.docx): Microsoft Word documents
Excel Files (.xlsx, .xls): Spreadsheets with all sheets
HTML (.html, .htm): Web pages with text content extraction
Text Files (.txt, .md): Plain text and Markdown files
JSON (.json): Structured JSON data
CSV (.csv): Comma-separated value files
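
As an illustration of how a file path could be mapped to one of the formats above, here is a small sketch. The `DocumentFormat` type, the lookup table, and `detectFormat` are hypothetical names; the project's own document-processor.ts may detect formats differently.

```typescript
// Hypothetical lookup from file extension to the format families listed above.
type DocumentFormat = "pdf" | "word" | "excel" | "html" | "text" | "json" | "csv";

const FORMAT_BY_EXTENSION: Record<string, DocumentFormat> = {
  ".pdf": "pdf",
  ".docx": "word",
  ".xlsx": "excel",
  ".xls": "excel",
  ".html": "html",
  ".htm": "html",
  ".txt": "text",
  ".md": "text",
  ".json": "json",
  ".csv": "csv",
};

function detectFormat(filePath: string): DocumentFormat {
  // Take the last path segment, accepting both / and \ as separators.
  const fileName = filePath.split(/[\\/]/).pop() ?? "";
  const dot = fileName.lastIndexOf(".");
  // A leading dot (e.g. ".env") is treated as "no extension".
  const extension = dot > 0 ? fileName.slice(dot).toLowerCase() : "";
  const format: DocumentFormat | undefined = FORMAT_BY_EXTENSION[extension];
  if (format === undefined) {
    throw new Error(`Unsupported file type: ${extension || "(none)"}`);
  }
  return format;
}
```

Matching is case-insensitive in this sketch, so `Sheet.XLSX` is treated like `sheet.xlsx`; whether the server does the same is not stated.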

Configuration

Environment Variables

GEMINI_API_KEY: Your Google Gemini API key (required)
GEMINI_MODEL: Gemini model to use (default: "gemini-pro")
GEMINI_MAX_TOKENS: Maximum tokens for generation (default: 8192)
PORT: Server port (default: 3000)
LOG_LEVEL: Logging level (default: "info")
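
The variables and defaults above could be read into a typed configuration object roughly as follows. This is a sketch under assumptions: `ServerConfig`, `loadConfig`, and `parsePositiveInt` are illustrative names, and the validation of numeric values is not something the README specifies.

```typescript
// Sketch of reading the documented variables with their documented defaults.
// `ServerConfig` and `loadConfig` are illustrative names, not the project's API.
interface ServerConfig {
  apiKey: string;
  model: string;
  maxTokens: number;
  port: number;
  logLevel: string;
}

type Env = Record<string, string | undefined>;

// Parse an optional positive integer, using the fallback when unset or blank.
function parsePositiveInt(name: string, raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

function loadConfig(env: Env): ServerConfig {
  const apiKey = env["GEMINI_API_KEY"]?.trim();
  if (!apiKey) {
    // Same message as the one listed under Troubleshooting.
    throw new Error("GEMINI_API_KEY environment variable is required");
  }
  return {
    apiKey,
    model: env["GEMINI_MODEL"] || "gemini-pro",
    maxTokens: parsePositiveInt("GEMINI_MAX_TOKENS", env["GEMINI_MAX_TOKENS"], 8192),
    port: parsePositiveInt("PORT", env["PORT"], 3000),
    logLevel: env["LOG_LEVEL"] || "info",
  };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)` after the .env file has been loaded.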

Data Storage

Documents are stored locally in the ./data/ directory as JSON files. The indexed content includes both the original text and Gemini's semantic analysis for enhanced search capabilities.

How It Works

Document Processing: When you index a document, the MCP extracts text content based on the file type
Semantic Analysis: Gemini Pro analyzes the document to understand its structure, key topics, entities, and concepts
Enhanced Storage: Both original content and semantic analysis are stored for comprehensive search
Intelligent Search: When searching, Gemini Pro analyzes your query against all indexed documents and ranks results by relevance
Context-Aware Q&A: Questions are answered using Gemini's 1M context window to understand and analyze the full document content
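
To make the search step concrete, the sketch below ranks stored documents against a query and applies the `max_results` and `include_content` options. It is illustrative only: in the server, relevance is judged by Gemini, whereas here a caller-supplied `score` function (with a naive keyword-overlap placeholder) stands in for that call. The `IndexedDocument` and `SearchResult` shapes are assumptions, not the project's stored format.

```typescript
// Illustrative only: the real server asks Gemini to judge relevance.
// A pluggable `score` function stands in so the flow can be followed offline.
interface IndexedDocument {
  id: string;
  content: string;  // extracted text
  analysis: string; // stored semantic analysis (assumed here to be plain text)
  metadata?: Record<string, unknown>;
}

interface SearchResult {
  document_id: string;
  relevance: number;
  content?: string;
}

type Scorer = (query: string, doc: IndexedDocument) => number;

// Naive placeholder scorer: fraction of query words found in the document.
const keywordOverlap: Scorer = (query, doc) => {
  const words = query.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  if (words.length === 0) return 0;
  const haystack = `${doc.content} ${doc.analysis}`.toLowerCase();
  const hits = words.filter((w) => haystack.includes(w)).length;
  return hits / words.length;
};

function searchDocuments(
  docs: IndexedDocument[],
  query: string,
  score: Scorer = keywordOverlap,
  maxResults = 5,
  includeContent = false,
): SearchResult[] {
  return docs
    .map((doc) => ({ doc, relevance: score(query, doc) }))
    .filter((entry) => entry.relevance > 0)     // drop documents with no match
    .sort((a, b) => b.relevance - a.relevance)  // most relevant first
    .slice(0, maxResults)
    .map(({ doc, relevance }) => ({
      document_id: doc.id,
      relevance,
      ...(includeContent ? { content: doc.content } : {}),
    }));
}
```

Keeping the scorer as a parameter means the ranking and truncation logic can be exercised without an API key, while a Gemini-backed scorer could be swapped in later.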

Examples

Index a PDF document

{
  "tool": "index_document",
  "arguments": {
    "file_path": "./1737toi-tai-gioi-ban-cung-the.pdf",
    "document_id": "sample-document",
    "metadata": {
      "title": "Sample Document",
      "type": "report"
    }
  }
}

Search for content

{
  "tool": "search_documents",
  "arguments": {
    "query": "main topics and key insights from the document",
    "max_results": 3
  }
}

Ask a question about the document

{
  "tool": "ask_document",
  "arguments": {
    "document_id": "sample-document",
    "question": "What are the main conclusions or recommendations in this document?"
  }
}
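
The examples above name a tool and its arguments. On the wire, MCP clients send tool invocations as JSON-RPC 2.0 requests with the `tools/call` method, and most clients build that message for you. The sketch below shows the mapping; `ToolExample` and `toToolCallRequest` are hypothetical names introduced for illustration.

```typescript
// Shape used by the examples above.
interface ToolExample {
  tool: string;
  arguments: Record<string, unknown>;
}

// JSON-RPC 2.0 request for MCP's `tools/call` method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Wrap a { tool, arguments } example in a tools/call request.
function toToolCallRequest(example: ToolExample, id: number): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: example.tool, arguments: example.arguments },
  };
}

const request = toToolCallRequest(
  {
    tool: "ask_document",
    arguments: {
      document_id: "sample-document",
      question: "What are the main conclusions or recommendations in this document?",
    },
  },
  1,
);

console.log(JSON.stringify(request, null, 2));
```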

Troubleshooting

Common Issues

"GEMINI_API_KEY environment variable is required"
- Make sure you've created a .env file with your API key
"Unsupported file type"
- Check that your file format is in the supported list
- Try converting to a supported format (e.g., DOC to DOCX)
"Document not found"
- Ensure the document ID matches exactly what you used when indexing
- Use list_indexed_documents to see all available documents
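Memory issues with large documents
- Gemini Pro's context window is limited to about 1M tokens; the exact limit depends on the model set in GEMINI_MODEL
- Documents that exceed the limit need to be split into smaller sections and indexed separately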
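
One way to split a large document before indexing is to group paragraphs under a size budget. The sketch below assumes roughly 4 characters per token, which is a rough heuristic rather than a real tokenizer, and `splitIntoSections` is a hypothetical helper that is not part of this project.

```typescript
// Rough heuristic, not a tokenizer: assume about 4 characters per token.
const APPROX_CHARS_PER_TOKEN = 4;

function splitIntoSections(text: string, maxTokensPerSection: number): string[] {
  const maxChars = maxTokensPerSection * APPROX_CHARS_PER_TOKEN;
  if (maxChars <= 0) throw new Error("maxTokensPerSection must be positive");

  const sections: string[] = [];
  let current = "";

  // Paragraphs are separated by one or more blank lines.
  for (const paragraph of text.split(/\n\s*\n/)) {
    const piece = paragraph.trim();
    if (piece.length === 0) continue;

    // Flush the current section if adding this paragraph would exceed the budget.
    if (current.length > 0 && current.length + 2 + piece.length > maxChars) {
      sections.push(current);
      current = "";
    }

    if (piece.length > maxChars) {
      // A single oversized paragraph is cut into fixed-size slices.
      for (let start = 0; start < piece.length; start += maxChars) {
        sections.push(piece.slice(start, start + maxChars));
      }
      continue;
    }

    current = current.length > 0 ? `${current}\n\n${piece}` : piece;
  }

  if (current.length > 0) sections.push(current);
  return sections;
}
```

Each section could then be indexed with its own `document_id`, for example by appending a part number to a common prefix; that naming scheme is a suggestion, not a convention of the server.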

Development

Project Structure

src/
├── index.ts                 # Main MCP server entry point
├── services/
│   └── gemini-document.ts   # Gemini integration service
└── utils/
    ├── document-processor.ts # Document text extraction
    └── document-store.ts     # Local document storage

Building and Testing

# Build the project
npm run build

# Run in development mode
npm run dev

# Clean build artifacts
npm run clean

License

MIT License - feel free to use this MCP in your projects!

Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.
