paddleocr-skills

v1.2.1

Published

4 months ago

PaddleOCR Skills - Install PP-OCRv5 and PaddleOCR-VL 1.5 for Claude Code

aidenwu0209

paddle-ocr ocr claude-code skill ai-studio ppocrv5 paddleocr-vl-1.5

PaddleOCR Skills

One-command installer to add PaddleOCR Skills to your Claude Code project.

Quick Start

Install PaddleOCR Skills in your Claude Code project with a single command:

npx paddleocr-skills

The installer will:

Prompt you to select skills (PP-OCRv5 and/or PaddleOCR-VL)
Copy skill files to your project
Install Python dependencies
Guide you through API configuration
Verify the installation

What's Included

PP-OCRv5 (Text OCR)

Fast text recognition for images and documents
Adaptive quality modes (auto/fast/quality)
Supports URLs and local files
Confidence scoring and quality metrics

PaddleOCR-VL (Document Parsing)

Advanced document structure analysis
Table, formula, and chart recognition
Layout detection (headers, footers, page numbers)
Complete document parsing with reading order

Prerequisites

Node.js: 14.0.0 or higher
Python: 3.7 or higher
Claude Code: Installed and configured
API Access: Get your API credentials at Baidu AI Studio
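
The version requirements above can be checked before running the installer. The following is a minimal sketch, not part of the package: the helper names `parse_version`, `node_version`, and `check_prerequisites` are hypothetical, and it assumes `node --version` prints a string such as `v14.17.0`.

```python
import re
import shutil
import subprocess
import sys

MIN_NODE = (14, 0, 0)  # minimum Node.js version from the Prerequisites list
MIN_PYTHON = (3, 7)    # minimum Python version from the Prerequisites list


def parse_version(text):
    """Extract a (major, minor, patch) tuple from text such as 'v14.17.0'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def node_version():
    """Return the installed Node.js version tuple, or None if node is not on PATH."""
    node = shutil.which("node")
    if node is None:
        return None
    result = subprocess.run(
        [node, "--version"], capture_output=True, text=True, check=True
    )
    return parse_version(result.stdout)


def check_prerequisites():
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python 3.7 or higher is required")
    found = node_version()
    if found is None:
        problems.append("Node.js was not found on PATH")
    elif found < MIN_NODE:
        problems.append("Node.js 14.0.0 or higher is required")
    return problems
```

Calling `check_prerequisites()` and printing each returned string gives a quick report. It does not check the Claude Code installation or API access.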

Installation

Interactive Mode (Recommended)

npx paddleocr-skills

The installer will guide you through:

Skill selection
Python dependency installation
API configuration

Skip Configuration

If you want to configure later:

npx paddleocr-skills
# Choose "No" when asked about configuration

Then configure manually:

# For PP-OCRv5
python scripts/ppocrv5/configure.py

# For PaddleOCR-VL
python scripts/paddleocr-vl-1.5/configure.py

Usage

After installation, use the skills in your Claude Code session:

PP-OCRv5 Example

# Extract text from an image
python scripts/ppocrv5/ocr_caller.py --file-url "https://example.com/image.jpg" --pretty

# Save result to file
python scripts/ppocrv5/ocr_caller.py --file-path "document.pdf" --output result.json --pretty

PaddleOCR-VL Example

# Parse a complex document
python scripts/paddleocr-vl-1.5/vl_caller.py --file-url "https://example.com/paper.pdf" --pretty

# Save result to file
python scripts/paddleocr-vl-1.5/vl_caller.py --file-path "invoice.pdf" --output result.json --pretty
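
When calling these scripts from your own Python code, the command line can be assembled programmatically. The sketch below uses only the flags shown in the examples above (`--file-url`, `--file-path`, `--output`, `--pretty`). The helper `build_command` is hypothetical, and treating any `http`/`https` source as a URL is an assumption of this sketch.

```python
import sys
from urllib.parse import urlparse


def build_command(script, source, output=None, pretty=True):
    """Build the argument list for ocr_caller.py or vl_caller.py.

    Sources with an http/https scheme are passed with --file-url;
    anything else is treated as a local path and passed with --file-path.
    """
    is_url = urlparse(source).scheme in ("http", "https")
    command = [sys.executable, script, "--file-url" if is_url else "--file-path", source]
    if output:
        command += ["--output", output]
    if pretty:
        command.append("--pretty")
    return command
```

The resulting list can be passed to `subprocess.run(command, check=True)`. Passing a list rather than a shell string avoids quoting problems with paths that contain spaces.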

Project Structure

After installation, your project will have:

your-project/
├── skills/
│   ├── ppocrv5/
│   │   └── SKILL.md
│   └── paddleocr-vl-1.5/
│       └── SKILL.md
├── scripts/
│   ├── ppocrv5/
│   │   ├── ocr_caller.py
│   │   ├── configure.py
│   │   ├── smoke_test.py
│   │   └── requirements.txt
│   └── paddleocr-vl-1.5/
│       ├── vl_caller.py
│       ├── configure.py
│       ├── smoke_test.py
│       └── requirements.txt
├── references/
│   ├── ppocrv5/
│   └── paddleocr-vl-1.5/
└── .env.example
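
To confirm that the installer copied everything, the tree above can be compared against the files on disk. This is a minimal sketch: the file list is transcribed from the tree, and the `missing_files` helper is hypothetical rather than something the package provides.

```python
from pathlib import Path

# Paths transcribed from the project tree, relative to the project root.
EXPECTED_FILES = {
    "ppocrv5": [
        "skills/ppocrv5/SKILL.md",
        "scripts/ppocrv5/ocr_caller.py",
        "scripts/ppocrv5/configure.py",
        "scripts/ppocrv5/smoke_test.py",
        "scripts/ppocrv5/requirements.txt",
    ],
    "paddleocr-vl-1.5": [
        "skills/paddleocr-vl-1.5/SKILL.md",
        "scripts/paddleocr-vl-1.5/vl_caller.py",
        "scripts/paddleocr-vl-1.5/configure.py",
        "scripts/paddleocr-vl-1.5/smoke_test.py",
        "scripts/paddleocr-vl-1.5/requirements.txt",
    ],
}


def missing_files(project_root, skills=tuple(EXPECTED_FILES)):
    """Return the expected files that do not exist under project_root."""
    root = Path(project_root)
    return [
        relative
        for skill in skills
        for relative in EXPECTED_FILES[skill]
        if not (root / relative).is_file()
    ]
```

If only one skill was selected during installation, pass just that name, for example `missing_files(".", ["ppocrv5"])`.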

Configuration

Manual Configuration

If you skipped auto-configuration, create a .env file:

# Copy the example file
cp .env.example .env

# Edit with your credentials
nano .env

Add your API credentials:

# PP-OCRv5
API_URL=https://your-api-url.aistudio-app.com/ocr
TOKEN=your-token-here

# PaddleOCR-VL
VL_API_URL=https://your-vl-api-url.com/v1
VL_TOKEN=your-vl-token-here
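
A quick way to confirm that the .env file defines every variable a skill needs is to parse it and look for gaps. The sketch below is a standalone illustration. How the bundled scripts actually read .env is not documented here, so `parse_env` and `missing_keys` are hypothetical helpers that handle only simple `KEY=value` lines.

```python
# Variable names taken from the .env example above.
REQUIRED_KEYS = {
    "ppocrv5": ("API_URL", "TOKEN"),
    "paddleocr-vl-1.5": ("VL_API_URL", "VL_TOKEN"),
}


def parse_env(text):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, skill):
    """Return the required keys for a skill that are absent or empty."""
    return [key for key in REQUIRED_KEYS[skill] if not values.get(key)]
```

For example, `missing_keys(parse_env(open(".env").read()), "ppocrv5")` returns an empty list when both PP-OCRv5 variables are set.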

Using Configuration Scripts

# Configure PP-OCRv5
python scripts/ppocrv5/configure.py --api-url "YOUR_URL" --token "YOUR_TOKEN"

# Configure PaddleOCR-VL
python scripts/paddleocr-vl-1.5/configure.py --api-url "YOUR_URL" --token "YOUR_TOKEN"

Verification

Test your installation:

# Test PP-OCRv5
python scripts/ppocrv5/smoke_test.py

# Test PaddleOCR-VL
python scripts/paddleocr-vl-1.5/smoke_test.py
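
Both smoke tests can be run in one pass from Python. This sketch assumes that each smoke test signals failure through a non-zero exit code, which is not confirmed by the documentation above. `run_smoke_tests` is a hypothetical helper.

```python
import subprocess
import sys

# Script paths from the Verification commands above.
SMOKE_TESTS = {
    "PP-OCRv5": "scripts/ppocrv5/smoke_test.py",
    "PaddleOCR-VL": "scripts/paddleocr-vl-1.5/smoke_test.py",
}


def run_smoke_tests(tests=SMOKE_TESTS, runner=subprocess.run):
    """Run each smoke test and map its name to True (exit code 0) or False.

    The runner argument is injectable so the function can be exercised
    without calling the real API.
    """
    results = {}
    for name, script in tests.items():
        completed = runner([sys.executable, script])
        results[name] = completed.returncode == 0
    return results
```

Run it from the project root so the relative script paths resolve. If only one skill is installed, pass a dictionary containing just that entry.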

Troubleshooting

Python Not Found

Ensure Python is in your PATH:

python --version

If the command is not found, install Python 3.7+ from python.org. On some systems the interpreter is invoked as python3 rather than python.

API Configuration Error

If a script reports an API configuration error, obtain your API credentials and configure again:

Visit Baidu AI Studio
Create a new task or use an existing one
Copy the API URL and TOKEN
Run the configuration script
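
One common cause of configuration errors is leaving the example values in place. The sketch below flags settings that still look like the placeholders used in this README (`your-token-here`, `YOUR_URL`, and so on). The marker list and the `placeholder_keys` helper are assumptions made for illustration, not behavior of the bundled scripts.

```python
# Substrings that appear in the example values shown in this README.
PLACEHOLDER_MARKERS = ("your-", "YOUR_")


def placeholder_keys(values):
    """Return the sorted keys whose values are empty or still look like placeholders."""
    return sorted(
        key
        for key, value in values.items()
        if not value or any(marker in value for marker in PLACEHOLDER_MARKERS)
    )
```

Here `values` is a dictionary of setting names to strings, for example one built from the contents of your .env file.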

Permission Denied

On Windows, run your terminal as Administrator if you encounter permission errors.

Documentation

Each skill includes comprehensive documentation:

skills/ppocrv5/SKILL.md: PP-OCRv5 usage guide
skills/paddleocr-vl-1.5/SKILL.md: PaddleOCR-VL usage guide
references/: Technical reference documentation

License

MIT

Support

For issues and questions:

GitHub Issues: Report a bug
Documentation: See skills/*/SKILL.md files

Credits

Built with:

PaddleOCR by PaddlePaddle
Claude Code by Anthropic