resume-extract

Fast, local resume extraction using a fine-tuned DistilBERT NER model. Extracts structured data from resume text, PDF, or DOCX via local document parsing + ONNX inference.

Installation

Binary (recommended):

curl -fsSL https://raw.githubusercontent.com/somus/resume-extract/main/scripts/install-release.sh | bash
resume-extract --help

The installer downloads the latest GitHub Release asset into ~/.local/bin. Override INSTALL_DIR, REPO, or VERSION if needed:

INSTALL_DIR=/usr/local/bin VERSION=v0.1.0 curl -fsSL https://raw.githubusercontent.com/somus/resume-extract/main/scripts/install-release.sh | bash

As a library:

bun add resume-extract

Build from source:

bun install
bun run build:bin
./dist/resume-extract --input ./resume.pdf --ats

Notes:

  • parseResume() is the text-only fast path.
  • parseResumePdf() and parseResumeDocx() use @kreuzberg/node for local document text extraction.
  • parseResumePdf(..., { ocr: true }) enables OCR for scanned PDFs (defaults to Tesseract). Supports tesseract, easyocr, and paddleocr backends via { ocr: { backend: "easyocr" } }. OCR is much slower than text parsing. See the sketch after these notes.
  • On first run, the CLI automatically downloads the required oksomu/resume-ner model files into a local cache if they are missing and shows download progress. Pass --model to use a custom directory or --no-download to require a pre-populated model directory.
  • Library consumers should manage model directories explicitly.
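
Putting these notes together, a minimal library-side sketch (the option shapes follow the notes above; the paths are placeholders):

import { parseResumePdf } from "resume-extract";

// Library consumers pass the model directory explicitly (no auto-download).
const modelDir = "/path/to/model";

// Digital PDF: text-layer extraction, no OCR.
const digital = await parseResumePdf("./resume.pdf", modelDir);

// Scanned PDF: enable OCR and pick a backend. Much slower than text parsing.
const scanned = await parseResumePdf("./scan.pdf", modelDir, {
  ocr: { backend: "easyocr" },
});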

Features

  • Structured extraction: name, email, phone, location, companies, titles, education, skills
  • Document input support: parse raw text, PDF, or DOCX
  • ATS scoring: completeness score with actionable issues list
  • Seniority inference: from job titles + years of experience
  • Country detection: from location + phone prefix
  • Experience years: computed from employment dates
  • Section-aware chunking: splits long resumes at paragraph boundaries for texts over 512 tokens (see the sketch after this list)
  • Section detection: rule-based gap-filling for skills, certifications, and languages the model misses
  • 100% local: runs offline via ONNX, no API calls
  • Fast text parsing: ~15ms per resume after model load
  • Optional document parsing: PDF via Kreuzberg, including OCR when enabled; DOCX via Kreuzberg
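
The section-aware chunking above can be pictured with a short sketch. This is illustrative only, not the library's implementation, and the chars/4 token estimate is a crude stand-in for the real tokenizer:

// Illustrative only: greedy paragraph packing under a ~512-token budget.
const MAX_TOKENS = 512;
const estimateTokens = (s: string): number => Math.ceil(s.length / 4); // crude heuristic

function chunkByParagraphs(text: string): string[] {
  const paragraphs = text.split(/\n\s*\n/); // split at paragraph boundaries
  const chunks: string[] = [];
  let current = "";
  for (const p of paragraphs) {
    const candidate = current ? current + "\n\n" + p : p;
    if (current && estimateTokens(candidate) > MAX_TOKENS) {
      chunks.push(current); // close the current chunk, start a new one
      current = p;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}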

Model

Uses oksomu/resume-ner — a DistilBERT model fine-tuned for resume NER and exported to ONNX for local structured extraction.

Latest model metrics (from the model card: noise-augmented training, 25 epochs, entity-level exact match via seqeval):

  • entity F1: 97.77%
  • structured micro F1: 97.88%
  • clean resume F1: 99.18%
  • noisy resume F1: 69.24% (OCR/scraped text)
  • quantized ONNX size: 63MB

Entity types:

  • NAME, EMAIL, PHONE, LOCATION, COMPANY, TITLE, DATE, DEGREE, INSTITUTION, FIELD, SKILL, CERT, LANGUAGE

Model directory should include:

  • resume_config.json — pre-processing, post-processing, and inference rules
  • companies.json — company gazetteer for post-processing
  • city_country_map.json — 317 cities for country inference
  • tokenizer/config files
  • onnx/model_quantized.onnx or onnx/model.onnx

Usage

import {
  computeATSScore,
  parseResume,
  parseResumeDocx,
  parseResumePdf,
} from "resume-extract";

// Example inputs: raw text for parseResume, bytes for in-memory PDFs.
const resumeText = await Bun.file("./resume.txt").text();
const pdfBytes = new Uint8Array(await Bun.file("./resume.pdf").arrayBuffer());

const result = await parseResume(resumeText, "/path/to/model");
const fromPdf = await parseResumePdf("/path/to/resume.pdf", "/path/to/model");
const fromScannedPdf = await parseResumePdf(pdfBytes, "/path/to/model", { ocr: true });
const fromDocx = await parseResumeDocx("/path/to/resume.docx", "/path/to/model");

// result.personal: { name, email, phone, location }
// result.experience: [{ title, company, start_date, end_date }]
// result.education: [{ degree, field, institution }]
// result.skills: ["Python", "AWS", ...]
// result.seniority: "Senior"
// result.country: "India"
// result.experience_years: 10

const ats = computeATSScore(result);
// ats.score: 87
// ats.issues: [{ severity: "medium", message: "..." }]
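
One way to act on the ATS output, as a sketch: only score and issues[].severity/message are documented above, so the "high" severity value and the score threshold here are assumptions for illustration.

// Hypothetical gating logic; "high" is an assumed severity value.
const blocking = ats.issues.filter((i) => i.severity === "high");
if (ats.score < 80 || blocking.length > 0) {
  for (const issue of ats.issues) {
    console.warn(`[${issue.severity}] ${issue.message}`);
  }
}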

CLI

Run directly with Bun:

bun run cli ./resume.pdf --ats
bun run cli --text "Jane Doe..."
bun run cli ./resume.pdf --view json --output result.json
cat ./resume.txt | bun run cli

# Batch mode
bun run cli batch ./resumes/*.pdf --ats
bun run cli batch --input-dir ./resumes --glob '**/*' --output batch.jsonl
bun run cli batch --input-dir ./resumes --output batch.csv --output-format csv
bun run cli batch --input-dir ./resumes --fail-fast

# Explicit model setup and diagnostics
bun run cli setup-model
bun run cli doctor --ocr
bun run cli doctor --fix
bun run cli doctor --json

Common flags:

  • --model <path>: model directory
  • --model-repo <repo>: alternate Hugging Face repo for first-run download
  • --model-revision <rev>: alternate model revision for first-run download
  • --no-download: disable automatic model download
  • --input <path>: input file path
  • --text <text>: inline text input
  • --format <auto|text|pdf|docx>: override format detection
  • --ocr: enable PDF OCR (defaults to Tesseract)
  • --ocr-backend <backend>: OCR backend: tesseract, easyocr, or paddleocr
  • --ats: include ATS scoring in output
  • --view <json|pretty>: render machine JSON or human-friendly terminal output
  • --output <path>: write structured output to a file
  • --compact: emit minified JSON

Batch command and flags:

  • batch [inputs...]: process many resumes at once
  • --input-dir <path>: scan a directory for resumes
  • --glob <pattern>: file selection pattern for directory scanning
  • --concurrency <n>: parallel batch workers, defaults to 4
  • --fail-fast: stop batch processing on the first extraction error
  • --output-format <json|jsonl|csv>: structured batch output format

Extra commands:

  • setup-model: download the configured model into the local cache or custom --model path
  • update-model: pull the latest model from Hugging Face, re-downloading all files
  • doctor: inspect model readiness, file integrity, writable cache paths, runtime platform, and optional OCR availability
  • doctor --fix: download/repair the configured model, then report status
  • doctor --json: emit machine-readable diagnostics

The CLI checks for model updates once per day. If a newer model is available on Hugging Face, a warning is shown on stderr. Run update-model to pull the latest.

Output behavior:

  • Single resume commands default to pretty view on a TTY and json otherwise.
  • Batch commands default to pretty summaries on a TTY and structured JSON otherwise.
  • Use --view json when piping to other tools.
  • Use --output with batch plus --output-format jsonl for machine-friendly bulk processing (see the sketch below).
  • Use --output-format csv when you want spreadsheet-friendly flat output with summary fields plus numbered experience and education columns.
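
JSONL batch output is straightforward to post-process. A minimal sketch, assuming one JSON object per line (which is what JSONL implies; the record fields are not specified here):

// Parse a batch.jsonl produced by:
//   bun run cli batch --input-dir ./resumes --output batch.jsonl --output-format jsonl
const text = await Bun.file("batch.jsonl").text();
const records = text
  .trim()
  .split("\n")
  .filter((line) => line.length > 0)
  .map((line) => JSON.parse(line));
console.log(`parsed ${records.length} batch records`);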

Limitations

  • English resumes only
  • Max 512 tokens per chunk (section-aware chunking splits at paragraph boundaries for longer resumes)
  • Image-based/scanned PDFs require OCR before text extraction
  • Two-column PDF layouts may flatten during text extraction

Development

bun run test        # Run tests
bun run check       # Biome lint + format check
bun run typecheck   # TypeScript type check
bun run format      # Auto-format

License

MIT