exine
v0.1.2
Universal Markdown extraction engine (CLI)
Exine
Universal Markdown extraction engine for Rust.
37+ formats. Zero external dependencies. 10–96× faster than Pandoc and Markitdown on HTML.
Performance
| vs Competitor | HTML Speed | Text Speed |
|---------------|------------|------------|
| vs Pandoc | 10–55× faster | 3.6–4.7× faster |
| vs Markitdown | 73–96× faster | 77–114× faster |
| vs html2text | 8–17× faster | n/a |
Sample timings: HTML (133 KB) in 6.3 ms, DOCX (37 KB) in 25 ms. Plain text is near-instant.
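To put the sample timings in more familiar terms, they can be converted to throughput. The sketch below is plain arithmetic on the two figures quoted above, assuming 1 KB = 1000 bytes; the function name is illustrative and not part of exine.

```rust
/// Throughput in MB/s, assuming 1 KB = 1000 bytes and 1 MB = 1000 KB.
/// Kilobytes per millisecond is numerically equal to megabytes per second.
fn throughput_mb_per_s(size_kb: f64, elapsed_ms: f64) -> f64 {
    size_kb / elapsed_ms
}

fn main() {
    // Figures taken from the timings quoted in this README.
    let html = throughput_mb_per_s(133.0, 6.3);
    let docx = throughput_mb_per_s(37.0, 25.0);
    println!("HTML: {html:.1} MB/s"); // roughly 21 MB/s
    println!("DOCX: {docx:.2} MB/s"); // roughly 1.5 MB/s
}
```

These are single-file numbers, so they indicate an order of magnitude rather than a sustained rate.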
Supported Formats
PDF, DOCX, PPTX, XLSX, ODT/ODS/ODP, EPUB, RTF, SVG, HTML, EML, MSG, plain text — plus URL fetching and Vision AI escalation for images (Gemini, Claude, OpenAI, Mistral).
Installation
CLI
cargo install exine
Library
[dependencies]
exine = "0.1"
Usage
Library
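The library call shown in this section takes the format as a bare extension string such as `"pdf"`, so callers usually need to derive that string from a file name first. A minimal sketch of one way to do so with the standard library follows. `extension_of` is a hypothetical helper, not part of exine, and lowercasing the extension is an assumption about what the library expects.

```rust
use std::path::Path;

/// Hypothetical helper (not part of exine): the lowercase extension of a
/// path, without the leading dot, or `None` if the path has no extension.
fn extension_of(path: &str) -> Option<String> {
    Path::new(path)
        .extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| ext.to_ascii_lowercase())
}

fn main() {
    for path in ["report.pdf", "Slides.PPTX", "README"] {
        match extension_of(path) {
            Some(ext) => println!("{path}: pass {ext:?} as the extension"),
            None => println!("{path}: no extension, format unknown"),
        }
    }
}
```

The returned string can then be passed as the first argument of the extraction call.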
use exine::extract::extract_by_extension;
let bytes = std::fs::read("report.pdf").unwrap();
let markdown = extract_by_extension("pdf", &bytes).unwrap();
CLI
exine report.pdf # Extract to stdout
exine report.pdf -o output.md # Extract to file
exine https://example.com # Fetch URL and extract
exine image.png --vision gemini # Vision AI for images
Findability Shield
Protects content from AI scrapers while allowing search engines.
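For readers unfamiliar with how a robots.txt file can separate the two kinds of crawler, here is a minimal sketch of the general technique: one `Disallow: /` group per blocked user agent, followed by a default group that allows everything else. It does not show what `exine shield --robots` actually emits; the function and the list of user agents are illustrative assumptions.

```rust
/// Build a robots.txt body that disallows the given crawlers site-wide and
/// leaves every other crawler (including search engines) allowed.
fn robots_txt(blocked_agents: &[&str]) -> String {
    let mut out = String::new();
    for agent in blocked_agents {
        out.push_str(&format!("User-agent: {agent}\nDisallow: /\n\n"));
    }
    // Default group: any crawler not named above may fetch everything.
    out.push_str("User-agent: *\nAllow: /\n");
    out
}

fn main() {
    // Illustrative list of AI crawler user agents; not taken from exine.
    let blocked = ["GPTBot", "CCBot", "ClaudeBot"];
    print!("{}", robots_txt(&blocked));
}
```

robots.txt is advisory, so this only affects crawlers that choose to honor it.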
# Generate robots.txt
exine shield --robots > robots.txt
# Deploy to CDN (S3-compatible) with content
exine shield \
--robots \
--s3-bucket my-bucket \
--s3-region us-east-1 \
--content-dir ./output
Web Scraping
Stealth scraping with CAPTCHA solving and pagination.
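The crawl commands in this section take a `--depth` option. This README does not define its exact semantics, so the sketch below only illustrates the usual meaning of a depth limit: a breadth-first traversal that stops following links once a page is `max_depth` hops from the start. The in-memory link graph and the `crawl_order` function are hypothetical stand-ins for real fetching.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Breadth-first traversal of a link graph, visiting each page at most once
/// and never going more than `max_depth` links away from `start`.
fn crawl_order<'a>(
    links: &HashMap<&'a str, Vec<&'a str>>,
    start: &'a str,
    max_depth: usize,
) -> Vec<&'a str> {
    let mut seen = HashSet::from([start]);
    let mut queue = VecDeque::from([(start, 0usize)]);
    let mut order = Vec::new();
    while let Some((page, depth)) = queue.pop_front() {
        order.push(page);
        if depth == max_depth {
            continue; // at the limit: record the page but do not follow its links
        }
        for &next in links.get(page).into_iter().flatten() {
            if seen.insert(next) {
                queue.push_back((next, depth + 1));
            }
        }
    }
    order
}

fn main() {
    // Hypothetical site structure used only for illustration.
    let links = HashMap::from([
        ("/", vec!["/a", "/b"]),
        ("/a", vec!["/a/1"]),
        ("/a/1", vec!["/a/1/x"]),
        ("/b", vec!["/"]),
    ]);
    println!("{:?}", crawl_order(&links, "/", 2));
}
```

With a limit of 2, `/a/1` is visited but the page it links to is not.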
# Crawl with stealth and pagination
exine crawl "https://example.com" --stealth --depth 2
# Crawl with CAPTCHA escalation
exine crawl "https://site.com" --captcha --render
Vision AI (Optional)
For scanned PDFs and images, Exine escalates to Vision AI:
export GEMINI_API_KEY=...
exine scanned.pdf --vision auto # Auto-selects best available provider
Supported providers: gemini, claude, openai, mistral
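How `--vision auto` chooses a provider is not documented here. One plausible reading is "the first provider whose API key is configured", which can be sketched as below. Only `GEMINI_API_KEY` appears in this README; the other environment variable names, the priority order, and the `pick_provider` function are assumptions made for illustration.

```rust
/// Provider names from this README, each paired with an environment variable.
/// Only GEMINI_API_KEY is documented; the other names and the order are assumed.
const PROVIDERS: [(&str, &str); 4] = [
    ("gemini", "GEMINI_API_KEY"),
    ("claude", "ANTHROPIC_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
    ("mistral", "MISTRAL_API_KEY"),
];

/// Return the first provider whose key is set and non-empty.
/// `lookup` abstracts over the environment so the logic is easy to test.
fn pick_provider(lookup: impl Fn(&str) -> Option<String>) -> Option<&'static str> {
    for (name, var) in PROVIDERS {
        if lookup(var).map_or(false, |value| !value.is_empty()) {
            return Some(name);
        }
    }
    None
}

fn main() {
    match pick_provider(|var| std::env::var(var).ok()) {
        Some(name) => println!("would use provider: {name}"),
        None => println!("no provider key found"),
    }
}
```

Passing the lookup as a closure keeps the selection logic independent of the process environment.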
Feature Flags
| Flag | Default | Description |
|------|---------|-------------|
| dashboard | ✅ | Axum web dashboard |
| ocr | ❌ | Tesseract OCR (requires libtesseract) |
| stt | ❌ | Whisper.cpp STT (requires model file) |
| vision | ❌ | Vision AI extraction |
| headless | ❌ | Headless Chrome via chromiumoxide |
Contributing
See CONTRIBUTING.md for guidelines.
Built by NMA
Exine powers the FIELD ecosystem (GRID + SCALAR + STRIA) — AI-native tools for European and Israeli startup fundraising. https://nma.vc
