
@dev-pi2pie/word-counter

v0.0.9


Word Counter

Locale-aware word counting powered by the Web API Intl.Segmenter. The script automatically detects the primary writing system for each portion of the input, segments the text with the matching locale, and reports word totals per language.

How It Works

  • The runtime inspects each character's Unicode script to infer its likely locale (e.g., und-Latn, zh-Hans, ja).
  • Adjacent characters that share the same locale are grouped into a chunk.
  • Each chunk is counted with Intl.Segmenter at granularity: "word", caching segmenters to avoid re-instantiation.
  • Per-locale counts are summed into an overall total and printed to stdout.
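The pipeline above can be sketched in plain JavaScript using only the standard `Intl.Segmenter` API. This is an illustrative approximation, not the package's actual internals: the script-to-locale table, the handling of script-less characters (spaces, punctuation), and all function names here are assumptions.

```javascript
// Hypothetical sketch of the chunk-and-count approach described above.
const SCRIPT_LOCALES = [
  [/\p{Script=Han}/u, "zh"],
  [/\p{Script=Hiragana}|\p{Script=Katakana}/u, "ja"],
  [/\p{Script=Hangul}/u, "ko"],
  [/\p{Script=Arabic}/u, "ar"],
  [/\p{Script=Latin}/u, "und-Latn"],
];

function guessLocale(char) {
  for (const [re, locale] of SCRIPT_LOCALES) {
    if (re.test(char)) return locale;
  }
  return null; // no script (spaces, punctuation): attach to the current chunk
}

function chunkByLocale(text) {
  const chunks = [];
  for (const char of text) {
    const locale = guessLocale(char);
    const last = chunks[chunks.length - 1];
    if (last && (locale === null || last.locale === locale)) {
      last.text += char; // extend the current same-locale chunk
    } else if (locale !== null) {
      chunks.push({ locale, text: char }); // start a new chunk
    }
    // leading script-less characters are simply dropped in this sketch
  }
  return chunks;
}

const segmenters = new Map(); // cache one segmenter per locale
function countWords(text, locale) {
  if (!segmenters.has(locale)) {
    segmenters.set(locale, new Intl.Segmenter(locale, { granularity: "word" }));
  }
  let words = 0;
  for (const seg of segmenters.get(locale).segment(text)) {
    if (seg.isWordLike) words++;
  }
  return words;
}

const totals = {};
for (const { locale, text } of chunkByLocale("Hello 世界 안녕")) {
  totals[locale] = (totals[locale] ?? 0) + countWords(text, locale);
}
console.log(totals);
```

Caching segmenters matters because constructing an `Intl.Segmenter` is comparatively expensive, while reusing one across chunks of the same locale is cheap.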

Installation

For Development

Clone the repository and set up locally:

git clone https://github.com/dev-pi2pie/word-counter.git
cd word-counter
bun install
bun run build
npm link

After linking, you can use the word-counter command globally:

word-counter "Hello 世界 안녕"

To use the linked package inside another project:

npm link @dev-pi2pie/word-counter

To uninstall the global link:

npm unlink --global @dev-pi2pie/word-counter

From npm Registry (npmjs.com)

npm install -g @dev-pi2pie/word-counter@latest

Usage

Once installed (via npm link or the npm registry), you can use the CLI directly:

word-counter "Hello 世界 안녕"

Alternatively, run the built CLI with Node:

node dist/esm/bin.mjs "Hello 世界 안녕"

You can also pipe text:

echo "こんにちは world مرحبا" | word-counter

Hint a locale for ambiguous Latin text (ASCII-heavy content):

word-counter --latin-locale en "Hello world"

Collect non-word segments (emoji, symbols, punctuation):

word-counter --non-words "Hi 👋, world!"

When enabled, total includes words + non-words (emoji, symbols, punctuation).

Or read from a file:

word-counter --path ./fixtures/sample.txt

Library Usage

The package exports can be used after installing from the npm registry or linking locally with npm link.

ESM

import wordCounter, {
  countCharsForLocale,
  countWordsForLocale,
  countSections,
  parseMarkdown,
  segmentTextByLocale,
  showSingularOrPluralWord,
} from "@dev-pi2pie/word-counter";

wordCounter("Hello world", { latinLocaleHint: "en" });
wordCounter("Hi 👋, world!", { nonWords: true });
wordCounter("Hi 👋, world!", { mode: "char", nonWords: true });
wordCounter("Hi\tthere\n", { nonWords: true, includeWhitespace: true });
countCharsForLocale("👋", "en");

Note: includeWhitespace only affects results when nonWords: true is enabled.

Sample output (with nonWords: true and includeWhitespace: true):

{
  "total": 4,
  "counts": { "words": 2, "nonWords": 2, "total": 4 },
  "breakdown": {
    "mode": "chunk",
    "items": [
      {
        // ...
        "words": 2,
        "nonWords": {
          "emoji": [],
          "symbols": [],
          "punctuation": [],
          "counts": { "emoji": 0, "symbols": 0, "punctuation": 0, "whitespace": 2 },
          "whitespace": { "spaces": 0, "tabs": 1, "newlines": 1, "other": 0 }
        }
      }
    ]
  }
}

CJS

const wordCounter = require("@dev-pi2pie/word-counter");
const {
  countCharsForLocale,
  countWordsForLocale,
  countSections,
  parseMarkdown,
  segmentTextByLocale,
  showSingularOrPluralWord,
} = wordCounter;

wordCounter("Hello world", { latinLocaleHint: "en" });
wordCounter("Hi 👋, world!", { nonWords: true });
wordCounter("Hi 👋, world!", { mode: "char", nonWords: true });
wordCounter("Hi\tthere\n", { nonWords: true, includeWhitespace: true });
countCharsForLocale("👋", "en");

Note: includeWhitespace only affects results when nonWords: true is enabled.

Sample output (with nonWords: true and includeWhitespace: true):

{
  "total": 4,
  "counts": { "words": 2, "nonWords": 2, "total": 4 },
  "breakdown": {
    "mode": "chunk",
    "items": [
      {
        // ...
        "words": 2,
        "nonWords": {
          "emoji": [],
          "symbols": [],
          "punctuation": [],
          "counts": { "emoji": 0, "symbols": 0, "punctuation": 0, "whitespace": 2 },
          "whitespace": { "spaces": 0, "tabs": 1, "newlines": 1, "other": 0 }
        }
      }
    ]
  }
}

Export Summary

Core API

| Export | Kind | Notes |
| --------------------- | -------- | -------------------------------------------------- |
| `default` | function | `wordCounter(text, options?) -> WordCounterResult` |
| `wordCounter` | function | Alias of the default export. |
| `countCharsForLocale` | function | Low-level helper for per-locale char counts. |
| `countWordsForLocale` | function | Low-level helper for per-locale word counts. |
| `segmentTextByLocale` | function | Low-level helper for locale-aware segmentation. |

Markdown Helpers

| Export | Kind | Notes |
| --------------- | -------- | --------------------------------------------- |
| `parseMarkdown` | function | Parses Markdown and detects frontmatter. |
| `countSections` | function | Counts words by frontmatter/content sections. |

Utility Helpers

| Export | Kind | Notes |
| -------------------------- | -------- | ------------------------------ |
| `showSingularOrPluralWord` | function | Formats singular/plural words. |

Types

| Export | Kind | Notes |
| ---------------------- | ---- | ------------------------------------------------- |
| `WordCounterOptions` | type | Options for the `wordCounter` function. |
| `WordCounterResult` | type | Returned by `wordCounter`. |
| `WordCounterBreakdown` | type | Breakdown payload in `WordCounterResult`. |
| `WordCounterMode` | type | `"chunk" \| "segments" \| "collector" \| "char"` |
| `NonWordCollection` | type | Non-word segments + counts payload. |

Display Modes

Choose a breakdown style with --mode (or -m):

  • chunk (default) – list each contiguous locale block in order of appearance.
  • segments – show the actual wordlike segments used for counting.
  • collector – aggregate counts per locale regardless of text position.
  • char – count grapheme clusters (user-perceived characters) per locale.

Aliases are normalized for CLI + API:

  • chunk, chunks
  • segments, segment, seg
  • collector, collect, colle
  • char, chars, character, characters

Examples:

# chunk mode (default)
word-counter "飛鳥 bird 貓 cat; how do you do?"

# show captured segments
word-counter --mode segments "飛鳥 bird 貓 cat; how do you do?"

# aggregate per locale
word-counter -m collector "飛鳥 bird 貓 cat; how do you do?"

# grapheme-aware character count
word-counter -m char "Hi 👋, world!"
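The char mode described above counts grapheme clusters rather than UTF-16 code units. A minimal sketch of that distinction, using the same `Intl.Segmenter` Web API (the per-locale bookkeeping the tool performs is omitted here):

```javascript
// Grapheme-aware counting: Intl.Segmenter with granularity "grapheme"
// treats an emoji such as 👋 as a single user-perceived character,
// whereas String.prototype.length counts UTF-16 code units.
const graphemes = new Intl.Segmenter("en", { granularity: "grapheme" });
const countGraphemes = (text) => [...graphemes.segment(text)].length;

console.log(countGraphemes("Hi 👋")); // 4 graphemes
console.log("Hi 👋".length);          // 5 code units (👋 is a surrogate pair)
```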

Section Modes (Frontmatter)

Use --section to control which parts of a markdown document are counted:

  • all (default) – count the whole file (fast path, no section split).
  • split – count frontmatter and content separately.
  • frontmatter – count frontmatter only.
  • content – count content only.
  • per-key – count frontmatter per key (frontmatter only).
  • split-per-key – per-key frontmatter counts plus a content total.

Supported frontmatter formats:

  • YAML fenced with ---
  • TOML fenced with +++
  • JSON fenced with ;;; or a top-of-file JSON object ({ ... })
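For intuition, a minimal splitter for the fenced formats listed above might look like the following. This is a simplified sketch, not the package's actual parser: it handles only the fenced variants (not the top-of-file JSON object form), and the function name and return shape are assumptions.

```javascript
// Illustrative frontmatter splitter for fenced YAML/TOML/JSON frontmatter.
const FENCES = { "---": "yaml", "+++": "toml", ";;;": "json" };

function splitFrontmatter(doc) {
  const lines = doc.split("\n");
  const fence = lines[0]?.trim();
  const type = FENCES[fence];
  if (!type) return { type: null, frontmatter: "", content: doc };
  const end = lines.indexOf(fence, 1); // closing fence must match the opener
  if (end === -1) return { type: null, frontmatter: "", content: doc };
  return {
    type,
    frontmatter: lines.slice(1, end).join("\n"),
    content: lines.slice(end + 1).join("\n"),
  };
}

const doc = "---\ntitle: Hello\n---\nBody text here.";
console.log(splitFrontmatter(doc));
// → { type: "yaml", frontmatter: "title: Hello", content: "Body text here." }
```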

Examples:

word-counter --section split -p examples/yaml-basic.md
word-counter --section per-key -p examples/yaml-basic.md
word-counter --section split-per-key -p examples/yaml-basic.md

JSON output includes a source field (frontmatter or content) to avoid key collisions:

word-counter --section split-per-key --format json -p examples/yaml-content-key.md

Example (trimmed):

{
  "section": "split-per-key",
  "frontmatterType": "yaml",
  "total": 7,
  "items": [
    { "name": "content", "source": "frontmatter", "result": { "total": 3 } },
    { "name": "content", "source": "content", "result": { "total": 4 } }
  ]
}

Output Formats

Select how results are printed with --format:

  • standard (default) – total plus per-locale breakdown.
  • raw – only the total count (single number).
  • json – machine-readable output; add --pretty for indentation.

Examples:

word-counter --format raw "Hello world"
word-counter --format json --pretty "Hello world"

Non-Word Collection

Use --non-words (or nonWords: true in the API) to collect emoji, symbols, and punctuation as separate categories. When enabled, the total includes both words and non-words.

word-counter --non-words "Hi 👋, world!"

Example: total = words + emoji + symbols + punctuation when enabled. Standard output labels this as Total count to reflect the combined total; --format raw still prints a single number.

Include whitespace-like characters in the non-words bucket (API: includeWhitespace: true):

word-counter --include-whitespace "Hi\tthere\n"
word-counter --misc "Hi\tthere\n"

In the CLI, --include-whitespace implies --non-words (same behavior as --misc); --non-words alone does not include whitespace. When enabled, whitespace counts appear under nonWords.whitespace, and total = words + nonWords (emoji + symbols + punctuation + whitespace). JSON output also includes top-level counts when nonWords is enabled. See docs/schemas/whitespace-categories.md for how whitespace is categorized.

Example JSON (trimmed):

{
  "total": 5,
  "counts": { "words": 2, "nonWords": 3, "total": 5 },
  "breakdown": {
    "mode": "chunk",
    "items": [
      {
        "locale": "und-Latn",
        "words": 2,
        "nonWords": {
          "counts": { "emoji": 0, "symbols": 0, "punctuation": 0, "whitespace": 3 },
          "whitespace": { "spaces": 1, "tabs": 1, "newlines": 1, "other": 0 }
        }
      }
    ]
  }
}

> [!NOTE]
> Text-default symbols (e.g. ©) count as symbols unless explicitly emoji-presented (e.g. ©️ with VS16).
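The text-default vs. emoji-presented distinction can be approximated with Unicode property escapes in regular JavaScript regexes. This rough classifier is an assumption for illustration only; the package's exact categorization rules are not shown here.

```javascript
// Rough classifier for non-word segments, in the spirit of the categories above.
// A character counts as emoji if it is pictographic AND either defaults to emoji
// presentation or carries an explicit VS16 (U+FE0F) selector.
function classifyNonWord(seg) {
  if (
    /\p{Extended_Pictographic}/u.test(seg) &&
    (/\p{Emoji_Presentation}/u.test(seg) || seg.includes("\uFE0F"))
  ) {
    return "emoji";
  }
  if (/\p{P}/u.test(seg)) return "punctuation";
  if (/\p{S}/u.test(seg)) return "symbols";
  return "other";
}

console.log(classifyNonWord("👋"));  // "emoji"
console.log(classifyNonWord(","));   // "punctuation"
console.log(classifyNonWord("©"));   // "symbols" (text-default, no VS16)
console.log(classifyNonWord("©️")); // "emoji" (© followed by VS16)
```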

Locale Detection Notes (Migration)

  • Ambiguous Latin text now uses und-Latn instead of defaulting to en.
  • Use --mode chunk/--mode segments or --format json to see the exact locale assigned to each chunk.
  • Regex/script-only detection cannot reliably identify English vs. other Latin-script languages; 100% certainty requires explicit metadata (document language tags, user-provided locale, headers) or a language-ID model.
  • Provide a hint with --latin-locale <locale> or latinLocaleHint when you know the intended Latin language.
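One reason und-Latn is a safe default: for Latin-script text, word-boundary results from `Intl.Segmenter` generally do not depend on which Latin locale is chosen, so the hint mainly affects how the locale is reported. A quick check under that assumption:

```javascript
// Word counts for Latin text are typically identical whether segmented
// under the generic "und-Latn" tag or a specific locale like "en".
const count = (locale, text) =>
  [...new Intl.Segmenter(locale, { granularity: "word" }).segment(text)]
    .filter((s) => s.isWordLike).length;

console.log(count("und-Latn", "Hello world")); // 2
console.log(count("en", "Hello world"));       // 2
```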

Testing

Run the build before tests so the CJS interop test can load the emitted dist/cjs/index.cjs bundle:

bun run build
bun test

Sample Inputs

Try the following mixed-locale phrases to see how detection behaves:

  • "Hello world 你好世界"
  • "Bonjour le monde こんにちは 세계"
  • "¡Hola! مرحبا Hello"

Each run prints the total word count plus a per-locale breakdown, helping you understand how multilingual text is segmented.

License

This project is licensed under the MIT License — see the LICENSE file for details.