@cesarandreslopez/occ

v0.2.0

Office Cloc and Count — scc-style summary tables for office documents

What is this?

OCC scans directories for office documents (DOCX, XLSX, PPTX, PDF, ODT, ODS, ODP), extracts metrics like word counts, page counts, slide counts, and cell counts, and displays them in scc-style summary tables. When code files are also present, it auto-detects them and shells out to scc for code metrics, printing both sections together.

Features

  • Office document metrics — words, pages, paragraphs, slides, sheets, rows, cells
  • Seven formats supported — DOCX, XLSX, PPTX, PDF, ODT, ODS, ODP
  • Document structure extraction: --structure parses heading hierarchy into a navigable tree with dotted section codes (1, 1.1, 1.2, ...)
  • Code metrics via scc — auto-detects code files and integrates scc output
  • Multiple output modes — grouped by type, per-file breakdown, or JSON
  • CI-friendly — ASCII-only, no-color mode for pipelines
  • Flexible filtering — include/exclude extensions, exclude directories, .gitignore-aware
  • Progress bar — with ETA for large scans
  • Zero config — auto-downloads scc binary on install, works out of the box

Quick Start

Global install:

npm i -g @cesarandreslopez/occ
occ

No-install usage:

npx @cesarandreslopez/occ docs/ reports/

From source:

git clone https://github.com/cesarandreslopez/occ.git && cd occ
npm install
npm run build
npm start

Usage

# Scan current directory
occ

# Scan specific directories
occ docs/ reports/

# Per-file breakdown
occ --by-file docs/

# JSON output
occ --format json docs/

# Extract document structure (heading hierarchy)
occ --structure docs/

# Structure as JSON
occ --structure --format json docs/

# Only specific formats
occ --include-ext pdf,docx docs/

# Skip code analysis
occ --no-code docs/

# CI-friendly (ASCII, no color)
occ --ci docs/

Example Output

-- Documents ---------------------------------------------------------------
  Format    Files    Words    Pages                  Details      Size
----------------------------------------------------------------------------
  Word         12   34,210      137              1,203 paras    1.2 MB
  PDF           8   22,540       64                             4.5 MB
  Excel         3                                12 sheets      890 KB
----------------------------------------------------------------------------
  Total        23   56,750      201              1,203 paras    6.5 MB

-- Code (via scc) ----------------------------------------------------------
  Language    Files    Lines   Blanks  Comments     Code
----------------------------------------------------------------------------
  JavaScript     15     2340      180       320     1840
  Python          8     1200       90       150      960
----------------------------------------------------------------------------
  Total          23     3540      270       470     2800

Scanned 23 documents (56,750 words, 201 pages) in 120ms

Structure Output (--structure)

-- Structure: report.docx --------------------------------------------------
1   Executive Summary
  1.1   Background ......................................... p.1
  1.2   Key Findings ....................................... p.1-2
2   Methodology
  2.1   Data Collection .................................... p.3
  2.2   Analysis Framework ................................. p.4
    2.2.1   Quantitative Methods ........................... p.4
    2.2.2   Qualitative Methods ............................ p.5
3   Results ................................................ p.6-8
4   Conclusions ............................................ p.9

4 sections, 10 nodes, max depth 3

Supported Formats

| Format | Extension | Metrics | Structure |
|--------|-----------|---------|-----------|
| Word | .docx | words, pages*, paragraphs | Yes |
| PDF | .pdf | words, pages | Yes (with page mapping) |
| Excel | .xlsx | sheets, rows, cells | — |
| PowerPoint | .pptx | words, slides | Yes (slide headers) |
| ODT | .odt | words, pages*, paragraphs | Yes (best-effort) |
| ODS | .ods | sheets, rows, cells | — |
| ODP | .odp | words, slides | Yes (slide headers) |

* Pages for Word/ODT are estimated at 250 words/page.
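
The estimate in the footnote can be sketched in a few lines. This is an illustration only: the 250 words/page rate comes from the note above, but the rounding mode (rounding up) and the function name `estimatePages` are assumptions, not OCC's confirmed implementation.

```typescript
// Rate stated in the footnote above.
const WORDS_PER_PAGE = 250;

// Assumption: partial pages round up; OCC may round differently.
function estimatePages(words: number): number {
  if (words <= 0) return 0;
  return Math.ceil(words / WORDS_PER_PAGE);
}

// 34,210 words is the Word row in the example output.
console.log(estimatePages(34210)); // 137
```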

CLI Flags

| Flag | Description | Default |
|------|-------------|---------|
| --by-file / -f | Row per file | grouped by type |
| --format <type> | tabular or json | tabular |
| --structure | Extract and display document heading hierarchy | off |
| --include-ext <exts> | Comma-separated extensions | all supported |
| --exclude-ext <exts> | Comma-separated to skip | none |
| --exclude-dir <dirs> | Directories to skip | node_modules,.git |
| --no-gitignore | Disable .gitignore respect | enabled |
| --sort <col> | Sort by: files, name, words, size | files |
| --output <file> / -o | Write to file | stdout |
| --ci | ASCII-only, no color | off |
| --large-file-limit <mb> | Skip files over this size | 50 |
| --no-code | Skip scc code analysis | off |
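
When calling occ from a script, it can help to assemble the argument list from a typed options object. The sketch below is hypothetical: the `OccOptions` shape and `buildOccArgs` helper are not part of OCC, only the flag names are taken from the table above, and joining directory lists with commas is an assumption based on the default shown for --exclude-dir.

```typescript
// Hypothetical option bag; only the flag names come from the CLI Flags table.
interface OccOptions {
  byFile?: boolean;
  format?: "tabular" | "json";
  structure?: boolean;
  includeExt?: string[];
  excludeExt?: string[];
  excludeDir?: string[];
  noGitignore?: boolean;
  sort?: "files" | "name" | "words" | "size";
  output?: string;
  ci?: boolean;
  largeFileLimitMb?: number;
  noCode?: boolean;
}

// Builds the argv for occ: flags first, then the paths to scan.
function buildOccArgs(paths: string[], opts: OccOptions = {}): string[] {
  const args: string[] = [];
  if (opts.byFile) args.push("--by-file");
  if (opts.format) args.push("--format", opts.format);
  if (opts.structure) args.push("--structure");
  if (opts.includeExt?.length) args.push("--include-ext", opts.includeExt.join(","));
  if (opts.excludeExt?.length) args.push("--exclude-ext", opts.excludeExt.join(","));
  // Assumption: directories are comma-separated, like the default value.
  if (opts.excludeDir?.length) args.push("--exclude-dir", opts.excludeDir.join(","));
  if (opts.noGitignore) args.push("--no-gitignore");
  if (opts.sort) args.push("--sort", opts.sort);
  if (opts.output) args.push("--output", opts.output);
  if (opts.ci) args.push("--ci");
  if (opts.largeFileLimitMb !== undefined) {
    args.push("--large-file-limit", String(opts.largeFileLimitMb));
  }
  if (opts.noCode) args.push("--no-code");
  return [...args, ...paths];
}

console.log(buildOccArgs(["docs/"], { format: "json", structure: true }).join(" "));
```

The resulting array can be passed to Node's `child_process.execFileSync("occ", args)`, which avoids shell quoting issues with paths that contain spaces.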

Documentation

Full documentation is available at cesarandreslopez.github.io/occ.

Why OCC?

Tools like scc, cloc, and tokei give you instant visibility into codebases — lines, languages, complexity. But most projects also contain Word documents, PDFs, spreadsheets, and presentations that are invisible to these tools. OCC fills that gap.

For Humans

  • Project audits — instantly see how much documentation lives alongside your code: total word counts, page counts, spreadsheet sizes, and presentation lengths
  • Tracking documentation growth — run OCC in CI to monitor how documentation scales over time, catch bloat early, or enforce minimums
  • Onboarding — new team members get a quick sense of a project's documentation footprint before diving in
  • Migration planning — when moving to a new platform, know exactly what you're dealing with across hundreds of files and formats

For AI Agents

  • Context budgeting — LLMs have finite context windows. OCC's word and page counts let agents estimate how much of a document set they can ingest before hitting token limits
  • Prioritization — an agent deciding which documents to read can use OCC's JSON output to rank files by size, word count, or type, focusing on the most relevant content first
  • RAG chunk mapping: --structure --format json outputs heading trees with character offsets, enabling chunk-to-section mapping, scoped retrieval, and citation paths in RAG pipelines
  • Repository mapping — agents exploring an unfamiliar codebase can run occ --format json to build a structured inventory of all non-code content alongside scc code metrics
  • Pipeline integration — JSON output pipes directly into agent toolchains for automated document analysis, summarization, or compliance checking
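
As a rough illustration of context budgeting, the sketch below picks documents that fit a token budget from per-file word counts. Everything here is an assumption for illustration: the `DocStat` record shape is hypothetical (OCC's actual JSON schema may differ), and the ratio of 1.3 tokens per word is a common rule of thumb for English text, not a figure from OCC.

```typescript
// Hypothetical per-file record; map OCC's real JSON output onto this shape.
interface DocStat {
  path: string;
  words: number;
}

// Assumption: about 1.3 tokens per English word; tune for your tokenizer.
const TOKENS_PER_WORD = 1.3;

function estimateTokens(words: number): number {
  return Math.ceil(words * TOKENS_PER_WORD);
}

// Greedy selection, smallest documents first, so as many files as
// possible fit before the budget runs out.
function selectWithinBudget(docs: DocStat[], tokenBudget: number): DocStat[] {
  const picked: DocStat[] = [];
  let used = 0;
  for (const doc of [...docs].sort((a, b) => a.words - b.words)) {
    const cost = estimateTokens(doc.words);
    if (used + cost > tokenBudget) break;
    picked.push(doc);
    used += cost;
  }
  return picked;
}
```

Sorting by ascending word count maximizes the number of files read; an agent that cares more about specific documents could sort by relevance instead and keep the same budget check.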

How It Works

OCC is written in TypeScript. It uses fast-glob for file discovery, dispatches each file to a format-specific parser (mammoth for DOCX, pdf-parse for PDF, SheetJS for XLSX, JSZip + officeparser for PPTX/ODF), aggregates the metrics, and renders output via cli-table3. For code metrics, it shells out to a vendored scc binary (auto-downloaded during npm install, with PATH fallback).
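
The dispatch-and-aggregate step can be pictured roughly as follows. This is a simplified sketch, not OCC's source: the `FileMetrics` and `Summary` types and both functions are hypothetical, the parsers themselves are left out, and only the extension-to-format mapping is taken from the Supported Formats table.

```typescript
type Format = "Word" | "PDF" | "Excel" | "PowerPoint" | "ODT" | "ODS" | "ODP";

// Extension-to-format mapping from the Supported Formats table.
const FORMAT_BY_EXT: Record<string, Format> = {
  ".docx": "Word",
  ".pdf": "PDF",
  ".xlsx": "Excel",
  ".pptx": "PowerPoint",
  ".odt": "ODT",
  ".ods": "ODS",
  ".odp": "ODP",
};

// Hypothetical per-file result that a format-specific parser would produce.
interface FileMetrics {
  path: string;
  words: number;
}

interface Summary {
  files: number;
  words: number;
}

// Picks the format from the file extension (case-insensitive).
function formatOf(path: string): Format | undefined {
  const dot = path.lastIndexOf(".");
  if (dot < 0) return undefined;
  return FORMAT_BY_EXT[path.slice(dot).toLowerCase()];
}

// Groups per-file metrics into one summary row per format.
function aggregate(results: FileMetrics[]): Map<Format, Summary> {
  const byFormat = new Map<Format, Summary>();
  for (const r of results) {
    const format = formatOf(r.path);
    if (!format) continue; // unsupported extension: skipped
    const row = byFormat.get(format) ?? { files: 0, words: 0 };
    row.files += 1;
    row.words += r.words;
    byFormat.set(format, row);
  }
  return byFormat;
}
```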

For structure extraction (--structure), documents are first converted to markdown (mammoth + turndown for DOCX, pdf-parse with page markers for PDF), then headers are extracted and assembled into a tree with dotted section codes.
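
The last step, turning markdown headers into dotted section codes, can be sketched like this. It is a minimal illustration under stated assumptions: the `Heading` type and `numberHeadings` function are hypothetical, only ATX-style headers (`#`, `##`, ...) are recognized, and a header that skips a level is clamped to one level below its predecessor. OCC's own handling may differ.

```typescript
interface Heading {
  code: string; // dotted section code such as "2.2.1"
  level: number;
  title: string;
}

function numberHeadings(markdown: string): Heading[] {
  const counters: number[] = []; // one running counter per depth
  const out: Heading[] = [];
  for (const line of markdown.split("\n")) {
    const m = /^(#{1,6})\s+(.+?)\s*$/.exec(line);
    if (!m) continue;
    // Assumption: a skipped level is treated as the next depth down.
    const level = Math.min(m[1].length, counters.length + 1);
    while (counters.length < level) counters.push(0);
    counters.length = level; // drop counters for deeper levels
    counters[level - 1] += 1;
    out.push({ code: counters.join("."), level, title: m[2] });
  }
  return out;
}

const sample = [
  "# Methodology",
  "## Data Collection",
  "## Analysis Framework",
  "### Quantitative Methods",
].join("\n");

for (const h of numberHeadings(sample)) {
  console.log(`${"  ".repeat(h.level - 1)}${h.code}   ${h.title}`);
}
```

A flat list with `level` and `code` is enough to rebuild the tree later, since each node's parent is the code with its last segment removed.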

Contributing

Contributions are welcome! See CONTRIBUTING.md for setup instructions and guidelines.

License

MIT