docsmith-mcp
v0.0.3
Python-powered document processing MCP for Excel, Word, PDF
Python-powered document processing MCP server with MCP Apps. Process Excel, Word, PDF, and PowerPoint documents using Python, and view them through an interactive MCP App.
Features
- Excel: Read/write `.xlsx` files with sheet support and pagination
- Word: Read/write `.docx` files with paragraph and table support
- PDF: Read `.pdf` files with text extraction and pagination
- PowerPoint: Read `.pptx` files with slide content extraction
- Text Files: Read/write `.txt`, `.csv`, `.md`, `.json`, `.yaml`, `.yml` with pagination support
- Run Python: Execute Python code for flexible file operations and data processing
- MCP App: Beautiful React + Tailwind CSS app for viewing all document types
- Flexible Reading Modes: Raw full read or paginated for large files
- Powered by Pyodide: Runs in a secure WebAssembly sandbox via code-runner-mcp
Quick Start
MCP Configuration
Add to your MCP client configuration (e.g., Claude Desktop or Cline):
Via npx (recommended):
{
"mcpServers": {
"docsmith": {
"command": "npx",
"args": ["-y", "docsmith-mcp"],
"env": {
"DOC_PAGE_SIZE": "100"
}
}
}
}
Via global installation:
npm install -g docsmith-mcp
{
"mcpServers": {
"docsmith": {
"command": "docsmith-mcp",
"env": {
"DOC_PAGE_SIZE": "100"
}
}
}
}
Via local path:
{
"mcpServers": {
"docsmith": {
"command": "node",
"args": ["/path/to/docsmith-mcp/dist/index.js"]
}
}
}
Then use the `read_document` tool:
{
"file_path": "/path/to/document.xlsx",
"mode": "paginated",
"page": 1,
"page_size": 50
}
The MCP App will automatically open to display the document content.
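Reading a large file in full means calling `read_document` once per page. The sketch below only builds the argument objects for those calls. It assumes pages are 1-indexed (consistent with the documented default of 1) and that the caller already knows the total item count; how that count is obtained is left open. The helper name `page_requests` is hypothetical and not part of docsmith-mcp.

```python
import math


def page_requests(file_path, total_items, page_size=100):
    """Build one read_document argument dict per page (hypothetical helper)."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Assumption: pages are 1-indexed, matching the documented default of 1.
    page_count = math.ceil(total_items / page_size)
    return [
        {
            "file_path": file_path,
            "mode": "paginated",
            "page": page,
            "page_size": page_size,
        }
        for page in range(1, page_count + 1)
    ]


requests = page_requests("/path/to/document.xlsx", total_items=120, page_size=50)
print(len(requests))         # number of calls needed
print(requests[-1]["page"])  # page number of the final call
```

Each dict in the returned list can be sent as the arguments of a separate `read_document` call.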
Supported Formats
| Format | Extensions | Read | Write | Notes |
|--------|-----------|------|-------|-------|
| Excel | .xlsx | ✅ | ✅ | Multi-sheet support, pagination |
| Word | .docx | ✅ | ✅ | Paragraphs and tables |
| PDF | .pdf | ✅ | ❌ | Text extraction with pagination |
| PowerPoint | .pptx | ✅ | ❌ | Slide content extraction |
| CSV | .csv | ✅ | ✅ | - |
| Text | .txt, .md | ✅ | ✅ | Pagination support |
| JSON | .json | ✅ | ✅ | - |
| YAML | .yaml, .yml | ✅ | ✅ | - |
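A client may want to check whether a file can be written before choosing a tool. The following is a minimal sketch that transcribes the table above into a lookup keyed by file extension. It assumes the format is identified by extension alone, which may differ from how the server detects formats; `FORMAT_SUPPORT` and `describe` are hypothetical names.

```python
from pathlib import Path

# (format name, can read, can write), transcribed from the Supported Formats table.
FORMAT_SUPPORT = {
    ".xlsx": ("Excel", True, True),
    ".docx": ("Word", True, True),
    ".pdf": ("PDF", True, False),
    ".pptx": ("PowerPoint", True, False),
    ".csv": ("CSV", True, True),
    ".txt": ("Text", True, True),
    ".md": ("Text", True, True),
    ".json": ("JSON", True, True),
    ".yaml": ("YAML", True, True),
    ".yml": ("YAML", True, True),
}


def describe(path):
    """Return read/write support for a path, or None if the extension is unlisted."""
    # Assumption: matching is case-insensitive and based only on the extension.
    ext = Path(path).suffix.lower()
    if ext not in FORMAT_SUPPORT:
        return None
    name, can_read, can_write = FORMAT_SUPPORT[ext]
    return {"format": name, "read": can_read, "write": can_write}


print(describe("/path/to/report.pdf"))
print(describe("/path/to/archive.zip"))
```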
Tools
read_document
Read document content with automatic format detection.
Parameters:
- `file_path` (string, required): Path to the document
- `mode` (string, optional): `"paginated"` or `"raw"` (default: `"paginated"`)
- `page` (number, optional): Page number for paginated mode (default: 1)
- `page_size` (number, optional): Items per page (default: 100)
- `sheet_name` (string, optional): Sheet name for Excel files
Example:
{
"file_path": "/path/to/document.xlsx",
"mode": "paginated",
"page": 1,
"page_size": 50,
"sheet_name": "Sheet1"
}
write_document
Write document content.
Parameters:
- `file_path` (string, required): Output path
- `format` (string, required): `"excel"`, `"word"`, `"csv"`, `"txt"`, `"json"`, `"yaml"`
- `data` (array/object, required): Document content
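Tabular data often arrives as a list of records rather than a list of rows. The sketch below converts records into a header row followed by value rows, assuming that is the shape the Excel `data` array takes (as in this section's example). The helper `records_to_rows` is hypothetical and not part of docsmith-mcp.

```python
import json


def records_to_rows(records):
    """Turn a list of dicts into a header row followed by value rows."""
    if not records:
        return []
    # Assumption: the first record's keys define the column order.
    headers = list(records[0].keys())
    rows = [headers]
    for record in records:
        # Missing keys become None so every row has the same length.
        rows.append([record.get(header) for header in headers])
    return rows


records = [
    {"Product": "Laptop", "Q1": 100, "Q2": 150},
    {"Product": "Mouse", "Q1": 500, "Q2": 600},
]
arguments = {
    "file_path": "/path/to/output.xlsx",
    "format": "excel",
    "data": records_to_rows(records),
}
print(json.dumps(arguments, indent=2))
```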
Example:
{
"file_path": "/path/to/output.xlsx",
"format": "excel",
"data": [
["Product", "Q1", "Q2"],
["Laptop", 100, 150],
["Mouse", 500, 600]
]
}
get_document_info
Get document metadata without reading full content.
Parameters:
- `file_path` (string, required): Path to the document
Example:
{
"file_path": "/path/to/document.pdf"
}
run_python
Execute Python code for flexible file operations, data processing, and custom tasks. Supports any file format and Python libraries.
Parameters:
- `code` (string, required): Python code to execute
- `packages` (object, optional): Package mappings (import_name -> pypi_name) for required dependencies
- `file_paths` (array, optional): File paths that the code needs to access
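Because `code` travels as a JSON string, newlines and quotes in the Python source must be escaped. Rather than escaping by hand, a client can build the request from ordinary multi-line source and let `json.dumps` do the escaping. This is a minimal sketch; the helper name `run_python_request` is hypothetical, and only the three documented parameters are used.

```python
import json
import textwrap


def run_python_request(source, file_paths=None, packages=None):
    """Build run_python arguments from multi-line Python source (hypothetical helper)."""
    # dedent() lets the source be indented naturally inside a triple-quoted string.
    request = {"code": textwrap.dedent(source).strip()}
    if packages:
        request["packages"] = packages
    if file_paths:
        request["file_paths"] = file_paths
    return request


source = """
    import json
    with open('/path/to/file.json') as f:
        data = json.load(f)
    print(json.dumps({'count': len(data)}))
"""
request = run_python_request(source, file_paths=["/path/to/file.json"])
# json.dumps escapes the newlines, producing a string in the same style as the examples.
print(json.dumps(request, indent=2))
```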
Examples:
Read and process any file:
{
"code": "import json\nwith open('/path/to/file.json') as f:\n    data = json.load(f)\n    result = len(data)\n    print(json.dumps({'count': result}))",
"file_paths": ["/path/to/file.json"]
}
Batch rename files with regex:
{
"code": "import json, os, re\nfolder = '/path/to/files'\nfor name in os.listdir(folder):\n    new_name = re.sub(r'old_', 'new_', name)\n    os.rename(os.path.join(folder, name), os.path.join(folder, new_name))\nprint(json.dumps({'success': True}))",
"file_paths": ["/path/to/files"]
}
Process data with pandas:
{
"code": "import json\nimport pandas as pd\ndf = pd.read_csv('/path/to/data.csv')\nsummary = df.describe().to_dict()\nprint(json.dumps(summary))",
"packages": {"pandas": "pandas"},
"file_paths": ["/path/to/data.csv"]
}
Extract archive files:
{
"code": "import json, os, zipfile\nwith zipfile.ZipFile('/path/to/archive.zip', 'r') as z:\n    z.extractall('/path/to/output')\nfiles = os.listdir('/path/to/output')\nprint(json.dumps({'extracted_files': files}))",
"file_paths": ["/path/to/archive.zip", "/path/to/output"]
}
MCP App
The built-in MCP App provides a beautiful, interactive interface for viewing documents:
- Excel: Interactive tables with sticky headers
- PDF: Page-by-page text viewing
- Word: Paragraph and table rendering
- PowerPoint: Slide navigation
Built with React 19, Tailwind CSS v4, and Lucide icons.
Configuration
Environment variables for customizing behavior:
| Variable | Description | Default |
|----------|-------------|---------|
| DOC_RAW_FULL_READ | Enable full raw read mode | false |
| DOC_PAGE_SIZE | Default items per page | 100 |
| DOC_MAX_FILE_SIZE | Max file size in MB | 50 |
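As an illustration of how these variables and their defaults fit together, the sketch below resolves them from an environment mapping. It is not the server's actual implementation: the function name `load_config`, the returned key names, and the set of strings accepted as "true" are all assumptions.

```python
import os


def load_config(env=None):
    """Resolve the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    # Assumption: "1", "true", and "yes" (any case) enable the flag.
    raw_flag = env.get("DOC_RAW_FULL_READ", "false").strip().lower()
    return {
        "raw_full_read": raw_flag in ("1", "true", "yes"),
        "page_size": int(env.get("DOC_PAGE_SIZE", "100")),
        "max_file_size_mb": int(env.get("DOC_MAX_FILE_SIZE", "50")),
    }


# Passing a plain dict instead of os.environ keeps the example reproducible.
print(load_config({"DOC_PAGE_SIZE": "50"}))
```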
Contributing
See CONTRIBUTING.md for development setup and contribution guidelines.
License
MIT
