@kafkaliu/nezha-ocr-cli
v0.1.2
CLI tool to convert PDF to JSON-AST using PaddleOCR API
nezha-ocr-cli
A CLI tool to convert PDF documents to JSON-AST format using PaddleOCR API.
Features
- 📄 PDF Processing: Support for multi-page PDF document OCR recognition
- 🔄 Split and Merge: Automatic splitting for large files, with automatic result merging
- 📊 Structured Output: Returns structured data including text, layout, tables, and more
- 🎯 High Accuracy: Support for document orientation classification, table recognition, and other advanced features
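The split step can be pictured as dividing the document into consecutive page ranges, each no larger than the per-call page limit (the `--max-pages` option, default 100). The sketch below is illustrative only: `planPageRanges` and `PageRange` are hypothetical names, not part of this package's API, and the actual logic in `pdf-splitter.ts` may differ.

```typescript
// Hypothetical helper for illustration; not exported by nezha-ocr-cli.
interface PageRange {
  start: number; // 1-based, inclusive
  end: number; // 1-based, inclusive
}

// Divide `totalPages` into consecutive ranges of at most `maxPagesPerCall` pages.
function planPageRanges(totalPages: number, maxPagesPerCall: number): PageRange[] {
  if (!Number.isInteger(totalPages) || totalPages < 0) {
    throw new RangeError("totalPages must be a non-negative integer");
  }
  if (!Number.isInteger(maxPagesPerCall) || maxPagesPerCall < 1) {
    throw new RangeError("maxPagesPerCall must be a positive integer");
  }
  const ranges: PageRange[] = [];
  for (let start = 1; start <= totalPages; start += maxPagesPerCall) {
    ranges.push({ start, end: Math.min(start + maxPagesPerCall - 1, totalPages) });
  }
  return ranges;
}
```

Under these assumptions, a 250-page PDF with a limit of 100 would be sent as three requests covering pages 1-100, 101-200, and 201-250, and the per-range results would then be concatenated in the same order.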
Installation
npm install -g @kafkaliu/nezha-ocr-cli
After installation, use the nezha-ocr command to run.
Or use npx directly (no installation required):
npx @kafkaliu/nezha-ocr-cli input.pdf
Usage
Basic Usage
nezha-ocr input.pdf
Output to File
nezha-ocr input.pdf -o output.json
Using Environment Variables
OCR_API_URL="https://your-api-url.com" \
OCR_API_TOKEN="your-token" \
nezha-ocr input.pdf -o output.json
Command Line Options
| Option | Short | Description | Default |
|--------|-------|-------------|----------|
| --api-url <url> | -u | PaddleOCR API URL | OCR_API_URL env var |
| --token <token> | -t | PaddleOCR API token | OCR_API_TOKEN env var |
| --output <file> | -o | Output file name | stdout |
| --format <format> | -f | Output format | json |
| --max-pages <number> | -m | Maximum pages per API call | 100 |
| --file-type <type> | | File type (0=PDF, 1=images) | 0 |
| --use-doc-orientation-classify | | Use document orientation classification | false |
| --use-doc-unwarping | | Use document unwarping | false |
| --use-chart-recognition | | Use chart recognition | false |
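Since the table lists the environment variables as the defaults for `--api-url` and `--token`, an explicit flag presumably overrides the corresponding variable. A minimal sketch of that fallback follows. `resolveConfig`, `CliFlags`, and `OcrConfig` are hypothetical names for illustration, and the error raised when neither source is set is an assumption, not documented behaviour of the CLI.

```typescript
// Hypothetical types and helper; the real option parsing in cli.ts may differ.
interface CliFlags {
  apiUrl?: string; // value of --api-url / -u, if given
  token?: string; // value of --token / -t, if given
}

interface OcrConfig {
  apiUrl: string;
  token: string;
}

// Prefer an explicit flag, then fall back to the environment variable.
function resolveConfig(
  flags: CliFlags,
  env: Record<string, string | undefined>,
): OcrConfig {
  const apiUrl = flags.apiUrl ?? env["OCR_API_URL"];
  const token = flags.token ?? env["OCR_API_TOKEN"];
  if (!apiUrl) {
    throw new Error("Missing API URL: pass --api-url or set OCR_API_URL");
  }
  if (!token) {
    throw new Error("Missing API token: pass --token or set OCR_API_TOKEN");
  }
  return { apiUrl, token };
}
```

In a Node.js program this would typically be called as `resolveConfig(parsedFlags, process.env)`, where `parsedFlags` stands for whatever the argument parser produced.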
Output Format
The OCR API returns data in the following structure:
{
  logId: string;
  errorCode: number;
  errorMsg: string;
  result: {
    layoutParsingResults: [
      {
        prunedResult: {
          page_count: number;
          width: number;
          height: number;
          model_settings: ModelSettings;
          parsing_res_list: [
            {
              block_label: string; // Block type (text, title, table, etc.)
              block_content: string; // OCR recognized text
              block_bbox: number[]; // Bounding box [x1, y1, x2, y2]
              block_id: number;
              block_order: number;
              group_id: number;
              block_polygon_points: number[][]; // Polygon coordinates
            }
          ];
        };
        markdown: {
          text: string;
          images: Record<string, string>;
        };
      }
    ];
    dataInfo: {
      type: 'pdf' | 'image';
      numPages: number;
      pages: Array<{ width: number; height: number }>;
    };
  };
}
Development
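When developing against the response structure shown under Output Format, a common first step is to pull the recognized text out in reading order. The sketch below declares only the fields it uses. `extractPageTexts` is a hypothetical helper, and treating `errorCode === 0` as success and sorting by `block_order` are assumptions that should be checked against the actual API responses.

```typescript
// Minimal subset of the response structure; field names follow the README.
interface ParsedBlock {
  block_label: string;
  block_content: string;
  block_order: number;
}

interface LayoutParsingResult {
  prunedResult: { parsing_res_list: ParsedBlock[] };
}

interface OcrResponse {
  errorCode: number;
  errorMsg: string;
  result: { layoutParsingResults: LayoutParsingResult[] };
}

// Hypothetical helper: returns one string per layoutParsingResults entry,
// with block contents joined in ascending block_order.
function extractPageTexts(response: OcrResponse): string[] {
  // Assumption: a non-zero errorCode signals failure.
  if (response.errorCode !== 0) {
    throw new Error(`OCR failed (${response.errorCode}): ${response.errorMsg}`);
  }
  return response.result.layoutParsingResults.map((page) =>
    [...page.prunedResult.parsing_res_list] // copy so the input is not mutated
      .sort((a, b) => a.block_order - b.block_order)
      .map((block) => block.block_content)
      .join("\n"),
  );
}
```

A caller would usually obtain the `OcrResponse` value by parsing the CLI's JSON output, for example with `JSON.parse` on the contents of `output.json`.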
Install Dependencies
npm install
Build
npm run build
Run Tests
# Run all tests (unit tests only, no API required)
npm test
# Run only unit tests
npm run test:unit
# Run integration tests (requires OCR environment variables)
npm run test:integration
# Watch mode
npm run test:watch
# Generate coverage report
npm run test:coverage
Test Environment Variables
Integration tests require the following environment variables:
export OCR_API_URL="https://your-api-url.com/layout-parsing"
export OCR_API_TOKEN="your-api-token"
If these are not set, integration tests will be skipped.
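One way such a skip can be decided is by checking for both variables before the integration suite is registered. The following is a sketch under that assumption. `missingOcrEnv` is a hypothetical helper, and the project's actual logic in `tests/setup.ts` may work differently.

```typescript
// Variables the integration tests need, as listed above.
const REQUIRED_OCR_ENV = ["OCR_API_URL", "OCR_API_TOKEN"] as const;

// Hypothetical helper: returns the names of required variables that are
// unset or empty, so an empty array means integration tests can run.
function missingOcrEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_OCR_ENV.filter((name) => !env[name]);
}
```

A test file could call `missingOcrEnv(process.env)` and skip its suite when the result is non-empty, using whatever conditional-skip mechanism the test runner provides.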
Generate Test PDF
npm run test:fixtures
This generates a test PDF file in the test-data/ directory.
Publishing
# Update version and create tag (automatically updates package.json)
npm version patch # 0.1.0 → 0.1.1 (bug fixes)
npm version minor # 0.1.0 → 0.2.0 (new features)
npm version major # 0.1.0 → 1.0.0 (breaking changes)
# Push code and tag
git push
git push origin v0.1.1
After pushing the tag, GitHub Actions will automatically run tests, build, and publish to npm.
Project Structure
nezha-ocr-cli/
├── src/ # Source code
│ ├── cli.ts # CLI entry point
│ ├── ocr-client.ts # OCR API client
│ ├── pdf-splitter.ts # PDF splitting
│ ├── result-merger.ts # Result merging
│ └── types.ts # Type definitions
├── tests/ # Test code
│ ├── unit/ # Unit tests
│ ├── integration/ # Integration tests
│ ├── fixtures/ # Test utilities
│ └── setup.ts # Test configuration
├── test-data/ # Test data
├── dist/ # Build output
└── package.json
License
MIT
