@codesota/ocr
v0.2.0
Turn any PDF or image into clean Markdown or LaTeX. Free keyless API.
hardparse
Turn any PDF or image into clean Markdown — tables as real Markdown tables, formulas as LaTeX, handwriting as readable text.
Free, keyless API. 100 requests/day per IP. Zero dependencies. Works in Node 18+ and modern browsers.
Install
npm i @codesota/ocr
Use
import { parse } from "@codesota/ocr";
const md = await parse("invoice.pdf"); // Node: filesystem path
console.log(md);
In the browser, pass a File from an <input type=file>:
import { parse } from "@codesota/ocr";
input.addEventListener("change", async () => {
  const md = await parse(input.files![0]);
  output.textContent = md;
});
Want LaTeX instead?
import { parseLatex } from "@codesota/ocr";
import { writeFile } from "node:fs/promises";
const tex = await parseLatex("paper.pdf");
await writeFile("out.tex", tex);
// $ xelatex out.tex → out.pdf, ready to include in your paper
The returned string is a compilable standalone .tex document: the preamble includes fontspec, amsmath, and longtable, so it is ready for xelatex.
More
import { parseFull } from "@codesota/ocr";
const result = await parseFull("scan.png");
console.log(result.markdown);
console.log(`${result.pageCount} pages, ${result.processingTimeMs} ms`);
console.log("remaining today:", result.rateLimit.remaining);
Rate limits
Anonymous, per-IP: 100 requests / 24h rolling window.
import { parse, RateLimitError } from "@codesota/ocr";
try {
  const md = await parse(file);
} catch (err) {
  if (err instanceof RateLimitError) {
    console.log(`hit limit, retry in ${err.retryAfter}s`);
  }
}
Need more? Email [email protected].
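The catch block above only logs the limit. One way to act on it is to wait and try again. The following is a minimal sketch, not part of the package: `retryDelayMs` and `withRateLimitRetry` are hypothetical helpers, and the code assumes that `retryAfter` is a number of seconds, as the log message above suggests. The error is duck-typed so the sketch stays self-contained.

```typescript
// Sketch only: `retryDelayMs` and `withRateLimitRetry` are hypothetical helpers,
// not part of @codesota/ocr.

/** Returns how long to wait in ms, or null if the error should not be retried. */
function retryDelayMs(err: unknown, maxWaitMs: number): number | null {
  // Duck-type the error instead of importing RateLimitError.
  const retryAfter = (err as { retryAfter?: unknown } | null)?.retryAfter;
  if (typeof retryAfter !== "number" || !Number.isFinite(retryAfter) || retryAfter < 0) {
    return null;
  }
  const waitMs = retryAfter * 1000; // assumes retryAfter is expressed in seconds
  return waitMs <= maxWaitMs ? waitMs : null;
}

/** Runs `task`, sleeping and retrying after rate-limit errors, up to `maxAttempts` attempts. */
async function withRateLimitRetry<T>(
  task: () => Promise<T>,
  maxAttempts = 3,
  maxWaitMs = 60_000,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      const waitMs = retryDelayMs(err, maxWaitMs);
      // Rethrow anything that is not a retryable rate-limit error.
      if (waitMs === null || attempt >= maxAttempts) throw err;
      await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
    }
  }
}
```

With the package imported, a call might look like `await withRateLimitRetry(() => parse("invoice.pdf"))`. The `maxWaitMs` cap matters because a rolling 24h window can produce long waits; past the cap it is usually better to surface the error than to sleep.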
Self-host / custom endpoint
import { Client } from "@codesota/ocr";
const client = new Client({ baseUrl: "https://your-host" });
const md = await client.parse("doc.pdf");
Or set HARDPARSE_BASE_URL.
Supported inputs
PDF, PNG, JPG, TIFF, HEIC, WEBP, BMP. 200 MB max.
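Since each request counts against the daily limit, it can help to reject unsupported or oversized files before uploading. The following is a rough sketch: `checkInput` is a hypothetical helper, not part of the package. The extension list is an assumed mapping of the formats above, and "200 MB" is assumed to mean 200 × 1024 × 1024 bytes, which may not match how the service counts.

```typescript
// Sketch only: `checkInput` is a hypothetical helper, not part of @codesota/ocr.

// Extensions assumed to correspond to the formats listed above.
const SUPPORTED_EXTENSIONS = new Set([
  "pdf", "png", "jpg", "jpeg", "tif", "tiff", "heic", "webp", "bmp",
]);

// Assumes "200 MB" means 200 * 1024 * 1024 bytes; the service may count differently.
const MAX_BYTES = 200 * 1024 * 1024;

type InputCheck = { ok: true } | { ok: false; reason: string };

function checkInput(fileName: string, sizeBytes: number): InputCheck {
  // Take the text after the last dot as the extension, case-insensitively.
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  if (!SUPPORTED_EXTENSIONS.has(ext)) {
    return { ok: false, reason: `unsupported extension: "${ext}"` };
  }
  if (sizeBytes > MAX_BYTES) {
    return { ok: false, reason: `file is ${sizeBytes} bytes, limit is ${MAX_BYTES}` };
  }
  return { ok: true };
}
```

In the browser this could be called as `checkInput(file.name, file.size)` before passing the File to `parse`. The check looks only at the file name, so a mislabeled file would still pass.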
