@vinhnguyen/glm-ocr-mcp
v3.0.0
MCP server for GLM-OCR - local OCR capabilities via Ollama
GLM-OCR MCP Server
A Model Context Protocol (MCP) server that provides local OCR capabilities using the GLM-OCR model via Ollama. This server enables AI assistants (like Claude Desktop or VS Code with Continue) to extract text, tables, and mathematical formulas from images locally.
🌟 Features
- 📝 Text Extraction: Convert images into clean Markdown text.
- 📊 Table Recovery: Preserve complex table structures in Markdown format.
- 🧮 Math Support: Automatically convert mathematical formulas to LaTeX.
- 🖼️ Broad Support: Works with PNG, JPG, and other standard image formats.
- 🔧 Privacy-Focused: Local processing via Ollama (no cloud dependencies).
📋 Prerequisites
- Node.js (required for npx)
- Ollama installed and running locally
- The GLM-OCR model pulled into Ollama:

```
ollama pull glm-ocr
```

🚀 Quick Start (via npx)
You can run this server directly without cloning the repository by using npx.
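As a sketch, a one-off invocation from the terminal would look like this (npx fetches the package on first use; note that MCP servers speak over stdio, so in practice your MCP client launches this command for you):

```shell
npx -y @vinhnguyen/glm-ocr-mcp
```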
⚙️ Configuration
Claude Desktop
Add the following to your Claude Desktop configuration file:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
```json
{
  "mcpServers": {
    "glm-ocr": {
      "command": "npx",
      "args": ["-y", "@vinhnguyen/glm-ocr-mcp"],
      "env": {}
    }
  }
}
```

Continue.dev (VS Code / JetBrains)
Add this to your config.json (usually found at ~/.continue/config.json or ~/.continue/config.yaml):
```json
{
  "mcpServers": {
    "glm-ocr": {
      "command": "npx",
      "args": ["-y", "@vinhnguyen/glm-ocr-mcp"],
      "env": {},
      "description": "GLM-OCR document processing via Ollama",
      "timeout": 120
    }
  }
}
```

🛠️ Usage
Once configured and Ollama is running, your AI assistant will have a new tool called ocr_document. You can use natural language to trigger it:
- "Extract the text from this screenshot: /path/to/image.png"
- "Read the table in /Users/me/Downloads/invoice.jpg and format it as markdown."
- "What is the math formula in this image? /path/to/notes.png"
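Behind the scenes, the assistant turns such a request into an MCP tool call. A plausible shape is sketched below (the tool name ocr_document comes from this README; the image_path argument name is an assumption for illustration, not confirmed from the source):

```json
{
  "name": "ocr_document",
  "arguments": {
    "image_path": "/path/to/image.png"
  }
}
```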
How It Works
- The assistant sends a local image path to the server.
- The server validates the file and converts it to base64.
- It communicates with your local Ollama instance using the glm-ocr model.
- The extracted content is returned directly into your chat.
📂 Manual Installation (For Development)
If you want to modify the server or run it from source:
- Clone the repository:

```
git clone https://github.com/vinhnguyen/glm-ocr-mcp.git
cd glm-ocr-mcp
```

- Install dependencies:

```
npm install
```

- Run locally for testing:

```
npm start
```
❓ Troubleshooting
Error: "Ollama connection refused"
Ensure Ollama is running (check your system tray), or start it with `ollama serve` in a terminal.
Error: "model 'glm-ocr' not found"
Run `ollama pull glm-ocr` to download the OCR model.
Timeout Issues
OCR processing is GPU/CPU intensive. If the process times out, ensure your configuration has a timeout value of at least 120 seconds.
📄 License
ISC
⚠️ Note for Publishing
If this is your first time publishing this scoped package to NPM, remember to use the public access flag:
```
npm publish --access public
```