n8n-nodes-mineru

v0.1.10

Published

2 months ago

Free and comprehensive document parsing capabilities

clenlu

n8n-community-node-package

📖 Introduction

n8n-nodes-mineru is an n8n community node package that integrates the MinerU document parsing API, giving n8n workflows free document parsing. It parses PDF, Word, and PowerPoint documents as well as images, and automatically recognizes text, tables, formulas, and image content.

✨ Key Features

🚀 Multi-format Support: Supports PDF, DOC, DOCX, PPT, PPTX, PNG, JPG, JPEG and other formats
🧠 Intelligent Recognition: Automatically recognizes text, tables, formulas, and images in documents
🌐 Dual Service Modes: Supports both online API service and local self-deployed service
📊 Multiple Output Formats: Supports Markdown, JSON, DOCX, HTML, LaTeX and other output formats
🔧 Flexible Configuration: Exposes parsing options such as OCR, formula and table recognition, document language, and output formats
🌍 Multi-language Support: Supports Chinese, English, and automatic language detection

📦 Included Nodes

1. MinerU Node

Function: Uses the MinerU online API service to parse documents
Features: Automatically creates a parsing task, waits for it to finish, and returns the results as a ZIP file
Use Case: Users who want to use the official API service

2. MinerU Custom Service Node

Function: Connects to a self-deployed MinerU API server
Features: Supports local file upload and offers more configuration options
Use Case: Users who self-host MinerU or need more control

🛠️ Installation

Method 1: Install via n8n Community Nodes

Open the n8n interface
Go to Settings > Community Nodes
Click Install Community Node
Enter package name: n8n-nodes-mineru
Click Install

Method 2: Install via npm

# Execute in n8n root directory
npm install n8n-nodes-mineru

Method 3: Manual Installation

# Clone repository
git clone https://github.com/opendatalab/awsome-mineru.git
cd awsome-mineru/n8n-nodes-mineru

# Install dependencies
npm install

# Build project
npm run build

# Link to n8n (for development environment)
npm link

🔑 Credential Configuration

MinerU API Credentials

Create new credentials in n8n
Select MinerU API type
Enter your API Token
Save credentials

Get API Token:

Visit the MinerU official website
Register an account and obtain an API Token

📋 Usage Guide

MinerU Node Usage

Add Node: Add the "MinerU" node to your workflow
Configure Credentials: Select the MinerU API credentials you created
Set Parameters:
- Document URL: Link to the document to be parsed (required)
- Enable OCR: Whether to enable text recognition in images
- Enable Formula Recognition: Whether to recognize mathematical formulas
- Enable Table Recognition: Whether to recognize table structures
- Document Language: The main language of the document
- Extra Export Format: Additional output formats to generate
- Model Version: The MinerU model version to use
Execute Node: The node automatically creates a parsing task and waits for it to complete
Get Results: Once parsing completes, the node returns a ZIP file containing all results

MinerU Custom Service Node Usage

Deploy Service: Deploy a MinerU API server first
Add Node: Add the "MinerU Custom Service" node to your workflow
Configure Parameters:
- API Version: Select V1 or V2
- File URL: Link to the document to be parsed
- API Server Address: Your MinerU server address
- Output Directory: Output directory for parsing results
- Other parameters: Configure them according to the selected API version
Execute Node: The node calls your server directly to parse the document

🔧 Parameter Description

Common Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| Document URL | String | - | URL address of the document to be parsed |
| Enable OCR | Boolean | false | Whether to enable optical character recognition |
| Enable Formula Recognition | Boolean | true | Whether to recognize mathematical formulas |
| Enable Table Recognition | Boolean | true | Whether to recognize table structures |
| Document Language | Option | Chinese | Main language of the document |

MinerU Node Specific Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| Data ID | String | - | Optional data identifier |
| Page Range | String | - | Specify the page range to parse |
| Extra Export Format | Multi-select | [] | Additional output formats besides default |
| Polling Interval | Number | 5 | Interval time to check task status (seconds) |
| Maximum Wait Time | Number | 10 | Maximum time to wait for task completion (minutes) |
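
The Polling Interval and Maximum Wait Time parameters together bound how long the MinerU node waits for a task. Below is a minimal Python sketch of that kind of loop, not the node's actual implementation. The function `wait_for_task`, the `check_status` callback, and the state names `done` and `failed` are assumptions made for illustration.

```python
import time
from typing import Callable


def wait_for_task(
    check_status: Callable[[], str],
    polling_interval_s: float = 5,   # mirrors the "Polling Interval" default (seconds)
    max_wait_min: float = 10,        # mirrors the "Maximum Wait Time" default (minutes)
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll check_status() until it reports a terminal state or the deadline passes."""
    deadline = clock() + max_wait_min * 60
    while True:
        status = check_status()
        if status in ("done", "failed"):  # hypothetical terminal state names
            return status
        # Stop if the next poll would land after the deadline.
        if clock() + polling_interval_s > deadline:
            raise TimeoutError(f"task still '{status}' after {max_wait_min} min")
        sleep(polling_interval_s)
```

With the defaults in the table, such a loop would check the task roughly 120 times (10 minutes × 60 seconds ÷ 5 seconds) before giving up, which is why raising Maximum Wait Time helps with large documents.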

Custom Service Node Specific Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| API Server Address | String | http://localhost:8000 | MinerU server address |
| Output Directory | String | ./output | Output directory for parsing results |
| Backend Engine | Option | pipeline | Processing engine type |
| Return Markdown | Boolean | true | Whether to return Markdown format results |

🌟 Usage Examples

Example 1: Parse PDF Document and Extract Text

{
  "nodes": [
    {
      "name": "MinerU",
      "type": "n8n-nodes-mineru.mineru",
      "parameters": {
        "url": "https://example.com/document.pdf",
        "isOcr": true,
        "enableFormula": true,
        "enableTable": true,
        "language": "ch",
        "extraFormats": ["docx", "html"]
      }
    }
  ]
}

Example 2: Parse Multiple Format Documents Using Custom Service

{
  "nodes": [
    {
      "name": "MinerU Custom Service",
      "type": "n8n-nodes-mineru.mineruCustom",
      "parameters": {
        "apiVersion": "v2",
        "fileUrl": "https://example.com/presentation.pptx",
        "serverUrl": "http://your-mineru-server:8000",
        "langList": "auto",
        "formulaEnable": true,
        "tableEnable": true,
        "returnMd": true
      }
    }
  ]
}
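
When many documents need the same settings, a node definition like the one in Example 2 can be generated rather than written by hand. The Python sketch below reuses only the parameter names shown in Example 2. The helper `custom_service_node` is hypothetical, and a full n8n workflow export carries additional fields that these examples omit.

```python
import json
from typing import Any, Dict


def custom_service_node(
    file_url: str,
    server_url: str = "http://localhost:8000",  # default from the parameter table
) -> Dict[str, Any]:
    """Build a node entry shaped like Example 2 for one document."""
    return {
        "name": "MinerU Custom Service",
        "type": "n8n-nodes-mineru.mineruCustom",
        "parameters": {
            "apiVersion": "v2",
            "fileUrl": file_url,
            "serverUrl": server_url,
            "langList": "auto",
            "formulaEnable": True,
            "tableEnable": True,
            "returnMd": True,
        },
    }


# Serialize a one-node fragment; Python booleans become JSON true/false.
workflow = {"nodes": [custom_service_node("https://example.com/presentation.pptx")]}
workflow_json = json.dumps(workflow, indent=2)
```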

🚀 Advanced Usage

Batch Document Processing

Combine the MinerU node with other n8n nodes to process documents in batches:

Use the HTTP Request node to get the document list
Use the Split In Batches node to process the list in batches
Use the MinerU node to parse each document
Use the Merge node to combine the results
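
The batching step above amounts to cutting the document list into fixed-size groups. A short Python sketch of that idea follows; it is a conceptual stand-in, not the Split In Batches node itself, and the function name and example URLs are illustrative.

```python
from typing import Iterator, List, Sequence


def split_in_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Five placeholder document URLs split into groups of two.
document_urls = [f"https://example.com/doc-{i}.pdf" for i in range(1, 6)]
batches = list(split_in_batches(document_urls, batch_size=2))
```

Here the last batch holds the single leftover URL, so a workflow should not assume every batch is full.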

Result Post-processing

After parsing completes, you can:

Use the Move Binary Data node to process the returned files
Use the HTTP Request node to upload results to cloud storage
Use the Email node to send the parsing results
Use the Webhook node to trigger subsequent processes
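
Because the MinerU node returns its results as a ZIP file, post-processing usually starts by unpacking it. The Python sketch below pulls the Markdown files out of an archive held in memory. The archive layout and the file name `full.md` are assumptions for the demonstration; real result archives may be organized differently.

```python
import io
import zipfile
from typing import Dict


def extract_markdown(zip_bytes: bytes) -> Dict[str, str]:
    """Return {archive path: text} for every .md file in the ZIP."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {
            name: archive.read(name).decode("utf-8")
            for name in archive.namelist()
            if name.lower().endswith(".md")
        }


# Build a small stand-in archive instead of calling the real service.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("full.md", "# Title\n\nBody text")
    archive.writestr("images/figure1.png", b"placeholder bytes")

markdown_files = extract_markdown(buffer.getvalue())
```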

🔍 Troubleshooting

Common Issues

Q: Node execution fails with "API Token verification failed"
A: Check that your API Token is correct and that it was obtained from the MinerU official website.

Q: Document parsing times out
A: Increase the "Maximum Wait Time" parameter, or check whether the document is too large.

Q: Custom service connection fails
A: Make sure your MinerU server is running and reachable over the network from n8n.

Q: Some document formats cannot be parsed
A: Confirm that the document format is in the supported list and that the file is not corrupted.
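
A quick pre-check on the file extension can catch unsupported formats before a task is submitted. The Python sketch below uses the extensions named under Key Features. Since that list ends with "and other formats", treat the set as a conservative assumption rather than the complete list, and note that a URL without an extension may still point to a valid document.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions named in the feature list; the service may accept more.
SUPPORTED_EXTENSIONS = {".pdf", ".doc", ".docx", ".ppt", ".pptx", ".png", ".jpg", ".jpeg"}


def has_supported_extension(document_url: str) -> bool:
    """Check the URL path's extension, ignoring case and any query string."""
    path = urlparse(document_url).path
    return PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS
```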

Debugging Tips

Enable Node Debug: Enable the "Continue On Fail" option in the node settings
Check Error Logs: Check the n8n error logs for detailed information
Test Connection: Test with a simple document first to confirm the connection works
Check Parameters: Ensure all required parameters are set correctly
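
For the connection test with a self-deployed server, one option is to confirm that the server address accepts TCP connections at all before involving n8n. The Python sketch below does only that. It does not call any MinerU endpoint, and the function name `server_is_reachable` is hypothetical.

```python
import socket
from urllib.parse import urlparse


def server_is_reachable(server_url: str, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(server_url)
    if not parsed.hostname:
        raise ValueError(f"no host found in {server_url!r}; include the scheme, e.g. http://")
    # Fall back to the scheme's usual port when none is given.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

A `True` result only shows that something is listening on that port. A `False` result points to a stopped server, a wrong address, or a firewall between n8n and the server.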

🤝 Contributing

We welcome community contributions! If you want to contribute to the project:

Fork this repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE.md file for details.

🔗 Related Links

👥 Contact Us

Author: opendatalab
Email: [email protected]
GitHub: @opendatalab

If this project helps you, please give us a ⭐️!
