gemini-multimodal-mcp

v1.1.4

MCP server with multimodal capabilities - process documents, images, videos, audio using Gemini Pro with 1M context window

Downloads

16

Gemini Document MCP

A Model Context Protocol (MCP) server that uses Google's Gemini Pro, with its 1M-token context window, to index and search document contents. It provides intelligent document processing, semantic search, and Q&A for a range of document formats.

Features

  • Document Indexing: Index PDF, DOCX, Excel, HTML, TXT, MD, JSON, and CSV files
  • Semantic Search: Use Gemini Pro's 1M context window for intelligent document search
  • Document Q&A: Ask questions about specific documents using natural language
  • Summarization: Generate brief, detailed, or key-point summaries of documents
  • Multiple Formats: Support for various document types with automatic content extraction
  • Metadata Support: Store and search documents with custom metadata

Prerequisites

  • Node.js 18.0.0 or higher
  • Google Gemini API key

Installation

  1. Clone or download this MCP project

  2. Install dependencies:

    npm install

  3. Create a .env file based on .env.example:

    cp .env.example .env

  4. Add your Gemini API key to the .env file:

    GEMINI_API_KEY=your_actual_api_key_here

  5. Build the project:

    npm run build

Getting a Gemini API Key

  1. Go to Google AI Studio
  2. Sign in with your Google account
  3. Click "Create API Key"
  4. Copy the generated API key
  5. Add it to your .env file

Usage

Running the MCP Server

npm start

For development with auto-reload:

npm run dev

Available Tools

1. index_document

Index a document for later search and analysis.

Parameters:

  • file_path (string, required): Path to the document
  • document_id (string, required): Unique identifier for the document
  • metadata (object, optional): Custom metadata for the document

Example:

{
  "file_path": "./documents/report.pdf",
  "document_id": "quarterly-report-2024",
  "metadata": {
    "title": "Q1 2024 Financial Report",
    "department": "Finance",
    "author": "John Doe"
  }
}
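
If you build these arguments in TypeScript, it can help to check them before sending. The sketch below is illustrative only: the field names come from the parameter list above, but `IndexDocumentArgs` and `parseIndexDocumentArgs` are hypothetical names, not part of this package.

```typescript
// Hypothetical type mirroring the documented index_document parameters.
interface IndexDocumentArgs {
  file_path: string;
  document_id: string;
  metadata?: Record<string, unknown>;
}

// Validates untyped input and returns it as IndexDocumentArgs, or throws.
function parseIndexDocumentArgs(input: unknown): IndexDocumentArgs {
  if (typeof input !== "object" || input === null) {
    throw new Error("arguments must be an object");
  }
  const raw = input as Record<string, unknown>;

  // Both required parameters must be non-empty strings.
  for (const key of ["file_path", "document_id"] as const) {
    if (typeof raw[key] !== "string" || raw[key] === "") {
      throw new Error(`${key} is required and must be a non-empty string`);
    }
  }

  // metadata is optional, but when present it must be a plain object.
  const metadata = raw["metadata"];
  if (
    metadata !== undefined &&
    (typeof metadata !== "object" || metadata === null || Array.isArray(metadata))
  ) {
    throw new Error("metadata must be an object when provided");
  }

  return {
    file_path: raw["file_path"] as string,
    document_id: raw["document_id"] as string,
    ...(metadata !== undefined ? { metadata: metadata as Record<string, unknown> } : {}),
  };
}
```

Calling `parseIndexDocumentArgs` with the example object above returns it unchanged. Omitting `document_id` throws before any request is made.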

2. search_documents

Search indexed documents using natural language queries.

Parameters:

  • query (string, required): Natural language search query
  • max_results (number, optional): Maximum number of results (default: 5)
  • include_content (boolean, optional): Include full content in results (default: false)

Example:

{
  "query": "financial performance and revenue growth",
  "max_results": 3,
  "include_content": false
}
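
The optional parameters fall back to the documented defaults when omitted. A minimal sketch of that resolution follows. The default values are taken from the list above, while `SearchDocumentsArgs` and `withSearchDefaults` are hypothetical names used for illustration.

```typescript
// Hypothetical type mirroring the documented search_documents parameters.
interface SearchDocumentsArgs {
  query: string;
  max_results?: number;
  include_content?: boolean;
}

// Fills in the documented defaults (max_results: 5, include_content: false).
function withSearchDefaults(args: SearchDocumentsArgs): Required<SearchDocumentsArgs> {
  return {
    query: args.query,
    max_results: args.max_results ?? 5,
    include_content: args.include_content ?? false,
  };
}
```

Under these assumptions, a call that passes only `query` behaves like one that passes `max_results: 5` and `include_content: false` explicitly.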

3. get_document_content

Retrieve the full content of a specific document.

Parameters:

  • document_id (string, required): Unique identifier of the document

4. ask_document

Ask questions about a specific document.

Parameters:

  • document_id (string, required): Unique identifier of the document
  • question (string, required): Question to ask about the document

Example:

{
  "document_id": "quarterly-report-2024",
  "question": "What was the revenue growth percentage in Q1?"
}

5. summarize_document

Generate a summary of a document.

Parameters:

  • document_id (string, required): Unique identifier of the document
  • summary_type (string, optional): Type of summary - "brief", "detailed", or "key_points" (default: "brief")

6. list_indexed_documents

List all indexed documents with their metadata.

Supported File Formats

  • PDF (.pdf): Text extraction from PDF documents
  • Word Documents (.docx): Microsoft Word documents
  • Excel Files (.xlsx, .xls): Spreadsheets with all sheets
  • HTML (.html, .htm): Web pages with text content extraction
  • Text Files (.txt, .md): Plain text and Markdown files
  • JSON (.json): Structured JSON data
  • CSV (.csv): Comma-separated value files
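
The format list above amounts to a lookup from file extension to extraction strategy. The sketch below shows one way such a lookup could be written. It is not the package's actual code: `DocumentFormat`, `EXTENSION_TO_FORMAT`, and `detectFormat` are hypothetical names, and only the extensions themselves come from this README.

```typescript
// Hypothetical format labels; one per row in the supported-formats list.
type DocumentFormat = "pdf" | "docx" | "excel" | "html" | "text" | "json" | "csv";

const EXTENSION_TO_FORMAT: Record<string, DocumentFormat> = {
  ".pdf": "pdf",
  ".docx": "docx",
  ".xlsx": "excel",
  ".xls": "excel",
  ".html": "html",
  ".htm": "html",
  ".txt": "text",
  ".md": "text",
  ".json": "json",
  ".csv": "csv",
};

// Maps a path to a format, or throws for anything outside the list.
function detectFormat(filePath: string): DocumentFormat {
  // Take the last path segment, accepting both "/" and "\" separators.
  const fileName = filePath.split(/[\\/]/).pop() ?? "";
  const dot = fileName.lastIndexOf(".");
  // A leading dot (e.g. ".env") is treated as a hidden file, not an extension.
  const ext = dot > 0 ? fileName.slice(dot).toLowerCase() : "";
  const format = EXTENSION_TO_FORMAT[ext];
  if (format === undefined) {
    throw new Error(`Unsupported file type: ${ext || "(no extension)"}`);
  }
  return format;
}
```

Matching is case-insensitive here, so `Report.PDF` and `report.pdf` resolve the same way. Whether the real server does the same is not stated in this README.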

Configuration

Environment Variables

  • GEMINI_API_KEY: Your Google Gemini API key (required)
  • GEMINI_MODEL: Gemini model to use (default: "gemini-pro")
  • GEMINI_MAX_TOKENS: Maximum tokens for generation (default: 8192)
  • PORT: Server port (default: 3000)
  • LOG_LEVEL: Logging level (default: "info")
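
To make the defaults concrete, here is a minimal sketch of how these variables could be read and validated. The variable names, defaults, and the missing-key error message come from this README. `ServerConfig`, `loadConfig`, and `parsePositiveInt` are hypothetical names, and the real server may load its configuration differently.

```typescript
// Hypothetical resolved-configuration shape.
interface ServerConfig {
  apiKey: string;
  model: string;
  maxTokens: number;
  port: number;
  logLevel: string;
}

// An environment is passed in explicitly so the function is easy to test.
type Env = Record<string, string | undefined>;

// Parses a positive integer, falling back when the variable is unset or blank.
function parsePositiveInt(value: string | undefined, fallback: number, name: string): number {
  if (value === undefined || value.trim() === "") return fallback;
  const parsed = Number(value);
  if (!Number.isInteger(parsed) || parsed <= 0) {
    throw new Error(`${name} must be a positive integer, got "${value}"`);
  }
  return parsed;
}

function loadConfig(env: Env): ServerConfig {
  const apiKey = env["GEMINI_API_KEY"];
  if (!apiKey) {
    // Same message as listed under Troubleshooting.
    throw new Error("GEMINI_API_KEY environment variable is required");
  }
  return {
    apiKey,
    model: env["GEMINI_MODEL"] || "gemini-pro",
    maxTokens: parsePositiveInt(env["GEMINI_MAX_TOKENS"], 8192, "GEMINI_MAX_TOKENS"),
    port: parsePositiveInt(env["PORT"], 3000, "PORT"),
    logLevel: env["LOG_LEVEL"] || "info",
  };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)` once the .env file has been loaded.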

Data Storage

Documents are stored locally in the ./data/ directory as JSON files. The indexed content includes both the original text and Gemini's semantic analysis for enhanced search capabilities.
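
This README does not document the on-disk schema, so the following is only a guess at what a stored record might look like. Every field name in `StoredDocument`, the one-file-per-document naming in `storagePathFor`, and the helper functions are assumptions made for illustration.

```typescript
// Assumed record shape: original text plus Gemini's analysis, as described above.
interface StoredDocument {
  document_id: string;
  file_path: string;
  metadata: Record<string, unknown>;
  content: string;  // original extracted text
  analysis: string; // semantic analysis (its real structure is not documented)
}

// Assumed naming scheme: one JSON file per document ID inside the data directory.
function storagePathFor(documentId: string, dataDir: string = "./data"): string {
  // Encoding the ID keeps characters such as "/" from escaping the directory.
  return `${dataDir}/${encodeURIComponent(documentId)}.json`;
}

function serializeDocument(doc: StoredDocument): string {
  return JSON.stringify(doc, null, 2);
}

function deserializeDocument(json: string): StoredDocument {
  const parsed = JSON.parse(json) as Partial<StoredDocument>;
  if (typeof parsed.document_id !== "string" || typeof parsed.content !== "string") {
    throw new Error("stored document is missing document_id or content");
  }
  return parsed as StoredDocument;
}
```

To see the actual layout, inspect the files in ./data/ after indexing a document.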

How It Works

  1. Document Processing: When you index a document, the MCP extracts text content based on the file type
  2. Semantic Analysis: Gemini Pro analyzes the document to understand its structure, key topics, entities, and concepts
  3. Enhanced Storage: Both original content and semantic analysis are stored for comprehensive search
  4. Intelligent Search: When searching, Gemini Pro analyzes your query against all indexed documents and ranks results by relevance
  5. Context-Aware Q&A: Questions are answered using Gemini's 1M context window to understand and analyze the full document content
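
The flow above can be sketched as a small in-memory index. This is a simplified illustration, not the package's implementation. The `analyze` and `score` callbacks stand in for the Gemini calls, which are not shown, and `DocumentIndex`, `IndexedDocument`, and `SearchHit` are hypothetical names.

```typescript
interface IndexedDocument {
  id: string;
  content: string;  // extracted text (step 1 is assumed to have happened already)
  analysis: string; // output of the semantic-analysis step
}

interface SearchHit {
  id: string;
  score: number;
}

// Stand-ins for the model calls; a real server would call Gemini here.
type Analyze = (content: string) => Promise<string>;
type Score = (query: string, doc: IndexedDocument) => Promise<number>;

class DocumentIndex {
  private readonly docs = new Map<string, IndexedDocument>();
  private readonly analyze: Analyze;
  private readonly score: Score;

  constructor(analyze: Analyze, score: Score) {
    this.analyze = analyze;
    this.score = score;
  }

  // Steps 2 and 3: analyze the text, then store text and analysis together.
  async index(id: string, content: string): Promise<void> {
    const analysis = await this.analyze(content);
    this.docs.set(id, { id, content, analysis });
  }

  // Step 4: score every stored document against the query and rank by relevance.
  async search(query: string, maxResults: number = 5): Promise<SearchHit[]> {
    const hits = await Promise.all(
      Array.from(this.docs.values()).map(async (doc) => ({
        id: doc.id,
        score: await this.score(query, doc),
      })),
    );
    return hits.sort((a, b) => b.score - a.score).slice(0, maxResults);
  }
}
```

Injecting the two callbacks keeps the sketch runnable without an API key. In tests they can be replaced by simple string matching.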

Examples

Index a PDF document

{
  "tool": "index_document",
  "arguments": {
    "file_path": "./1737toi-tai-gioi-ban-cung-the.pdf",
    "document_id": "sample-document",
    "metadata": {
      "title": "Sample Document",
      "type": "report"
    }
  }
}

Search for content

{
  "tool": "search_documents",
  "arguments": {
    "query": "main topics and key insights from the document",
    "max_results": 3
  }
}

Ask a question about the document

{
  "tool": "ask_document",
  "arguments": {
    "document_id": "sample-document",
    "question": "What are the main conclusions or recommendations in this document?"
  }
}
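
The `tool` / `arguments` objects in these examples are shorthand. In the MCP specification, as I understand it, a client invokes a tool by sending a JSON-RPC 2.0 request with method `tools/call`, carrying the tool name in `params.name` and its arguments in `params.arguments`. Most MCP clients build this envelope for you. The sketch below only shows the mapping, and `ToolInvocation` and `toToolsCallRequest` are hypothetical names.

```typescript
// Shape used by the examples in this README.
interface ToolInvocation {
  tool: string;
  arguments: Record<string, unknown>;
}

// JSON-RPC 2.0 envelope for an MCP tool call.
interface ToolsCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Wraps a README-style example in the request a client would send.
function toToolsCallRequest(invocation: ToolInvocation, id: number): ToolsCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: invocation.tool, arguments: invocation.arguments },
  };
}
```

For instance, passing the `ask_document` example above with `id` 1 yields a request whose `params.name` is `"ask_document"`.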

Troubleshooting

Common Issues

  1. "GEMINI_API_KEY environment variable is required"

    • Make sure you've created a .env file with your API key
  2. "Unsupported file type"

    • Check that your file format is in the supported list
    • Try converting to a supported format (e.g., DOC to DOCX)
  3. "Document not found"

    • Ensure the document ID matches exactly what you used when indexing
    • Use list_indexed_documents to see all available documents
  4. Memory issues with large documents

    • Gemini Pro has a 1M token context limit
    • Very large documents may need to be split into smaller sections
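
For the last case, one rough way to split a long text before indexing is to pack paragraphs into chunks under a character budget. The sketch below assumes about four characters per token, which is only a heuristic and not a figure from Gemini's tokenizer. `splitIntoChunks` is a hypothetical helper and not part of this package.

```typescript
// Splits text into chunks of at most maxTokens * charsPerToken characters,
// breaking on blank lines where possible.
function splitIntoChunks(text: string, maxTokens: number, charsPerToken: number = 4): string[] {
  const maxChars = Math.max(1, Math.floor(maxTokens * charsPerToken));
  const chunks: string[] = [];
  let current = "";

  for (const paragraph of text.split(/\n{2,}/)) {
    // A paragraph longer than the budget is hard-split into fixed-size pieces.
    const pieces: string[] = [];
    for (let i = 0; i < paragraph.length; i += maxChars) {
      pieces.push(paragraph.slice(i, i + maxChars));
    }

    for (const piece of pieces) {
      const candidate = current === "" ? piece : `${current}\n\n${piece}`;
      if (candidate.length <= maxChars) {
        current = candidate; // still fits: keep packing
      } else {
        if (current !== "") chunks.push(current);
        current = piece; // start a new chunk
      }
    }
  }

  if (current !== "") chunks.push(current);
  return chunks;
}
```

Each chunk can then be indexed under its own `document_id`, for example by appending a part number to the original ID.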

Development

Project Structure

src/
├── index.ts                 # Main MCP server entry point
├── services/
│   └── gemini-document.ts   # Gemini integration service
└── utils/
    ├── document-processor.ts # Document text extraction
    └── document-store.ts     # Local document storage

Building and Testing

# Build the project
npm run build

# Run in development mode
npm run dev

# Clean build artifacts
npm run clean

License

MIT License - feel free to use this MCP in your projects!

Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.