feishu-ocr-assistant
v1.0.0
MCP tool for OCR image recognition and automatic data entry into Feishu Bitable
Feishu OCR Assistant
An MCP (Model Context Protocol) tool that recognizes content in images and automatically populates Feishu Bitable (multidimensional tables).
Features
- Uses the Qwen-VL-Plus model to recognize content in images
- Dynamically fetches field definitions from Feishu Bitable
- Constructs dynamic prompts based on table structure
- Automatically inserts structured data into Feishu Bitable
Prerequisites
Before you begin, ensure you have obtained the following credentials:
- DashScope API Key (for Qwen-VL-Plus)
- Feishu App ID and App Secret
- Feishu Bitable App Token and Table ID
Installation
- Clone the repository
- Install dependencies:
  npm install
- Copy .env.example to .env and fill in your credentials:
  cp .env.example .env
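As a rough illustration of how the credentials from .env might be validated at startup, here is a minimal sketch. The variable names (`DASHSCOPE_API_KEY`, `FEISHU_APP_ID`, and so on), the `AppConfig` shape, and the `loadConfig` helper are assumptions made for this example and may not match the names in .env.example or in the project's code.

```typescript
// Hypothetical config shape; the real project may name these differently.
interface AppConfig {
  dashscopeApiKey: string;
  feishuAppId: string;
  feishuAppSecret: string;
  bitableAppToken: string;
  bitableTableId: string;
}

// Assumed environment variable names; check .env.example for the real ones.
const REQUIRED_VARS: Record<keyof AppConfig, string> = {
  dashscopeApiKey: "DASHSCOPE_API_KEY",
  feishuAppId: "FEISHU_APP_ID",
  feishuAppSecret: "FEISHU_APP_SECRET",
  bitableAppToken: "FEISHU_BITABLE_APP_TOKEN",
  bitableTableId: "FEISHU_BITABLE_TABLE_ID",
};

// Reads every required variable and reports all missing ones in a single error.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const missing: string[] = [];
  const config = {} as AppConfig;
  for (const key of Object.keys(REQUIRED_VARS) as (keyof AppConfig)[]) {
    const value = env[REQUIRED_VARS[key]]?.trim();
    if (value) {
      config[key] = value;
    } else {
      missing.push(REQUIRED_VARS[key]);
    }
  }
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return config;
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)` once the .env file has been loaded. Failing fast with the full list of missing variables avoids discovering them one at a time.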
Usage
Build the project:
npm run build
Start the service:
npm start
Use with an MCP-compatible client (e.g., Claude Desktop): send an image URL with a command like "Please help me enter the content of this invoice/picture into the system."
How It Works
- Initialize Configuration: Load API keys and Feishu app credentials
- Get Feishu Token: Call Feishu's auth/v3/app_access_token/internal endpoint to get an access token
- Get Table Definition: Call Feishu's bitable/v1/.../fields endpoint to analyze table columns
- Build Dynamic Prompt: Convert field list into a prompt, specifying which fields to extract
- Call Vision Model: Send image and prompt to Qwen-VL-Plus model
- Write Back Data: Parse the JSON response and call Feishu's records/batch_create to insert data
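The prompt-building and write-back steps above can be sketched as pure functions, leaving the HTTP calls aside. This is only an illustration under stated assumptions: `TableField`, `buildPrompt`, `parseModelJson`, and `toBatchCreateBody` are hypothetical names, the prompt wording is invented, and the project's actual implementation may differ. The `{ records: [{ fields }] }` shape follows the request body that Feishu's batch_create endpoint expects.

```typescript
// Simplified, assumed view of one field definition returned by the fields endpoint.
interface TableField {
  field_name: string;
  type: number;
}

// Turns the table's field list into an instruction for the vision model.
function buildPrompt(fields: TableField[]): string {
  const names = fields.map((f) => `- ${f.field_name}`).join("\n");
  return [
    "Extract the following fields from the image.",
    "Reply with a single JSON object whose keys are exactly these field names;",
    "use null for any field that is not visible in the image.",
    "",
    names,
  ].join("\n");
}

// Models often wrap JSON in extra text, so take the span from the first "{" to the last "}".
function parseModelJson(reply: string): Record<string, unknown> {
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in model reply");
  }
  return JSON.parse(reply.slice(start, end + 1)) as Record<string, unknown>;
}

// Keeps only keys that match known table fields and drops null values.
function toBatchCreateBody(
  extracted: Record<string, unknown>,
  fields: TableField[],
): { records: { fields: Record<string, unknown> }[] } {
  const allowed = new Set(fields.map((f) => f.field_name));
  const recordFields: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(extracted)) {
    if (allowed.has(key) && value !== null) {
      recordFields[key] = value;
    }
  }
  return { records: [{ fields: recordFields }] };
}
```

Filtering against the fetched field names is a defensive choice in this sketch: it keeps keys the model invented from being sent to a table that has no such column.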
Development
For development, you can run the service in dev mode:
npm run dev