@twocaretcat/tally-ts
v2.0.0
A TypeScript word counting library. Count the number of characters, words, sentences, paragraphs, and lines in your text instantly with tally-ts.
👋 About
[!NOTE] We use the terms graphemes and characters interchangeably in this README, although technically we are counting Unicode grapheme clusters rather than Unicode characters.
tally-ts is a TypeScript library that uses modern APIs like Intl.Segmenter to count the number of characters,
words, sentences, paragraphs, and lines in the input. It can also show breakdowns for different types of characters like letters,
digits, spaces, punctuation, and symbols/special characters.
Features
- 🧮 View text metrics: Count the number of characters, words, sentences, paragraphs, and lines in your text.
- 📊 View character composition: View the number of spaces, digits, letters, punctuation, and symbols/special characters in the input.
- 🌍 Multilingual support: Uses Intl.Segmenter for accurate word and character segmentation across many languages and scripts.
- 👨🏻‍💻 Open-source: Know how to code? Help make tally-ts better by contributing to the project on GitHub, or copy it and make your own version!
Use Cases
- 📚 Students & Educators: Check essay lengths and assignment limits quickly and accurately.
- ✍️ Writers & Bloggers: Track writing progress and optimize structure for readability.
- 📄 Legal & Business Professionals: Ensure documents meet required character or word counts.
- 📱 Social Media Managers: Stay within platform limits for tweets, posts, and bios.
- 🧪 Developers & Testers: Analyze input strings and view line counts for code and data.
- 🌐 SEO Specialists: Optimize content length for meta descriptions, headings, and body text.
📦 Installation
[!TIP] JSR has some advantages if you're using TypeScript or Deno:
- It ships typed, modern ESM code by default
- No need for separate type declarations
- Faster, leaner installs without extraneous files
You can use JSR with your favorite package manager.
This package is available on both JSR and npm. Install it using your preferred package manager:
deno add jsr:@twocaretcat/tally-ts # JSR (recommended)
deno add npm:@twocaretcat/tally-ts # npm
bunx jsr add @twocaretcat/tally-ts # JSR
bun add @twocaretcat/tally-ts # npm
npx jsr add @twocaretcat/tally-ts # JSR
npm install @twocaretcat/tally-ts # npm
pnpm i jsr:@twocaretcat/tally-ts # JSR
pnpm add @twocaretcat/tally-ts # npm
yarn add jsr:@twocaretcat/tally-ts # JSR
yarn add @twocaretcat/tally-ts # npm
vlt install jsr:@twocaretcat/tally-ts # JSR
vlt install @twocaretcat/tally-ts # npm
🕹️ Usage
[!WARNING] Some Caveats:
- This library relies on the Intl.Segmenter API (or a compatible replacement) to split the input into graphemes, words, and sentences. Thus, the exact behavior and reproducibility of output counts depend on the JavaScript runtime used. Results may vary between browsers, Node versions, or polyfills.
- There may be slight variations between the counts generated by tally-ts and other libraries due to differences in how they are implemented.
- Languages like Chinese that do not have clearly defined words may have inaccurate word counts due to the segmentation algorithm used. If you need consistent or linguistically precise segmentation for these languages, use a dedicated tool instead. For Chinese, see Jieba, Stanford Segmenter, or pkuseg.
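To see how a given runtime segments a piece of text, you can inspect the word-like segments directly. The snippet below is a minimal sketch that uses the built-in Intl.Segmenter on its own; the helper name listWordSegments is hypothetical and not part of tally-ts.

```typescript
// Hypothetical helper (not part of tally-ts): returns the word-like segments
// that the runtime's Intl.Segmenter produces for the given text and locale.
function listWordSegments(text: string, locale: string): string[] {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'word' });
  const words: string[] = [];
  for (const part of segmenter.segment(text)) {
    // Spaces and punctuation are returned as segments too; keep words only.
    if (part.isWordLike) words.push(part.segment);
  }
  return words;
}

console.log(listWordSegments('How are you?', 'en'));
// The output for unspaced scripts depends on the runtime's segmentation data.
console.log(listWordSegments('你好世界', 'zh'));
```

Running the same call in different browsers or Node versions is a quick way to check whether their counts would agree for your text.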
Getting Started
To get started, import the Tally class and create a new instance of it. I recommend setting the locale like so:
import { Tally } from '@twocaretcat/tally-ts';
const tally = new Tally({ locales: 'en' });
Counting Sentences & Words
Use individual methods to get counts for sentences and words:
tally.countWords('How are you?');
// → { total: 3 }
tally.countSentences('¿Como estas?');
// → { total: 1 }
Counting Graphemes
You can get the number of graphemes (characters) the same way:
tally.countGraphemes('Hello world!');
// → {
// total: 12,
// by: {
// spaces: { total: 1 },
// letters: { total: 10 },
// digits: { total: 0 },
// punctuation: { total: 1 },
// symbols: { total: 0 },
// },
// related: {
// paragraphs: { total: 1 },
// lines: { total: 1 },
// }
// }
This method has some extra features. You can access breakdown counts of the graphemes by type:
const result = tally.countGraphemes('Hi there!');
console.debug(result.by);
// → {
// spaces: { total: 1 },
// letters: { total: 7 },
// digits: { total: 0 },
// punctuation: { total: 1 },
// symbols: { total: 0 }
// }
As well as related features that were computed at the same time:
console.debug(result.related);
// → {
// paragraphs: { total: 1 },
// lines: { total: 1 }
// }
Kitchen Sink
To get all counts at once, use the countAll() method:
const all = tally.countAll(`Hello world!\n\nThis is a test.`);
console.debug(all);
/* →
{
graphemes: {
total: 27,
by: {
spaces: { total: 4 },
letters: { total: 21 },
digits: { total: 0 },
punctuation: { total: 2 },
symbols: { total: 0 },
},
related: {
paragraphs: { total: 2 },
lines: { total: 3 },
}
},
words: { total: 6 },
sentences: { total: 2 },
paragraphs: { total: 2 },
lines: { total: 3 }
}
*/
🤖 Advanced Usage
Setting a Locale
You can pass a locale (or an array of locales) via the locales option. This value is forwarded directly to
Intl.Segmenter and determines how the input string is split into graphemes, words, and sentences:
// Single locale
new Tally({ locales: 'en' });
// Multiple locales (preference order)
new Tally({ locales: ['fr-CA', 'fr'] });
If locales is not provided, Intl.Segmenter will resolve the runtime's best locale automatically.
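Because the locales value is forwarded to Intl.Segmenter, you can preview how a runtime will treat it using the built-in API alone. This is a minimal sketch; the helper name resolveSegmenterLocale is hypothetical, and the printed values depend on the locale data shipped with your runtime.

```typescript
// Hypothetical helper (not part of tally-ts): asks Intl.Segmenter which locale
// it settles on for the requested locale(s), or for the runtime default.
function resolveSegmenterLocale(locales?: string | string[]): string {
  const segmenter = new Intl.Segmenter(locales, { granularity: 'word' });
  return segmenter.resolvedOptions().locale;
}

console.log(resolveSegmenterLocale(['fr-CA', 'fr'])); // preference order
console.log(resolveSegmenterLocale()); // runtime default

// Filters a list down to the locales the runtime can segment without fallback.
console.log(Intl.Segmenter.supportedLocalesOf(['fr-CA', 'en']));
```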
Getting the Resolved Locale
[!NOTE] Even if you provide a locale, the resolved locale may be different if Intl.Segmenter doesn't support the one you've provided. In this case, another locale may be picked automatically.
If you didn't provide a locale, you might want to know which locale was actually used by Intl.Segmenter. You can get
it like so:
const tally = new Tally();
console.debug(tally.getResolvedLocale());
// → "en-US"
Using a Custom Segmenter Implementation
If your environment doesn't support Intl.Segmenter (or the exact locale you want to use), you can provide a custom
implementation or polyfill instead:
new Tally({ Segmenter: SomeSegmenter });
This is also useful if you want to get consistent results across different runtimes. If you don't provide a segmenter,
we will try to use the native Intl.Segmenter implementation.
Internally, we will call the constructor of Segmenter to create segmenters of different granularities.
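As a rough illustration of what a replacement could look like, here is a tiny regex-based class that mirrors the basic shape of Intl.Segmenter: a constructor taking locales and a granularity option, a segment() method, and resolvedOptions(). Everything in it (the SimpleSegmenter name, the patterns, and the set of members) is an assumption made for this sketch. Which members tally-ts actually requires is not specified here, and the patterns are far cruder than real Unicode segmentation.

```typescript
// Hypothetical stand-in for Intl.Segmenter; not a tally-ts API and not a
// faithful Unicode segmenter.
type Granularity = 'grapheme' | 'word' | 'sentence';

interface SegmentData {
  segment: string;
  index: number;
  input: string;
  isWordLike?: boolean;
}

// One pattern per granularity; each match becomes one segment.
const SEGMENT_PATTERNS: Record<Granularity, string> = {
  grapheme: '[\\s\\S]', // one code point (not a true grapheme cluster)
  word: '[\\p{L}\\p{N}]+|[^\\p{L}\\p{N}]+', // runs of word / non-word characters
  sentence: '[^.!?]*[.!?]+\\s*|[^.!?]+', // text up to and including end marks
};

const WORD_START = new RegExp('^[\\p{L}\\p{N}]', 'u');

class SimpleSegmenter {
  private readonly locale: string;
  private readonly granularity: Granularity;

  constructor(locales?: string | string[], options: { granularity?: Granularity } = {}) {
    const first = Array.isArray(locales) ? locales[0] : locales;
    this.locale = first ?? 'und';
    this.granularity = options.granularity ?? 'grapheme';
  }

  resolvedOptions(): { locale: string; granularity: Granularity } {
    return { locale: this.locale, granularity: this.granularity };
  }

  // Returns a plain array, which is iterable like the native Segments object.
  segment(input: string): SegmentData[] {
    const pattern = new RegExp(SEGMENT_PATTERNS[this.granularity], 'gu');
    return Array.from(input.matchAll(pattern), (match) => {
      const data: SegmentData = { segment: match[0], index: match.index ?? 0, input };
      if (this.granularity === 'word') data.isWordLike = WORD_START.test(match[0]);
      return data;
    });
  }
}
```

A class like this would be passed via the Segmenter option in place of SomeSegmenter. Since it behaves the same in every runtime, it trades linguistic accuracy for reproducibility.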
⚠️ Usage (legacy)
[!WARNING] Deprecated: The legacy implementation is no longer maintained and it has limited support for languages other than English. Use the class-based Tally API instead if possible.
The legacy implementation exposes a single function, getCounts(), that can be used to get the number of characters,
words, sentences, paragraphs, lines, spaces, letters, digits, and symbols at once:
import { getCounts } from '@twocaretcat/tally-ts/legacy';
const counts = await getCounts(`Hello world!\n\nThis is a test.`);
console.debug(counts);
/* →
{
characters: 27,
words: 6,
sentences: 2,
paragraphs: 2,
lines: 3,
spaces: 4,
letters: 21,
digits: 0,
symbols: 2
}
*/
You can provide an optional locale to improve segmentation accuracy for non-English text:
const counts = await getCounts(`Hello world!\n\nThis is a test.`, 'de-DE');
Note that this only affects the segmentation of characters. If your language doesn't use spaces to separate words or
uses letters outside of the ASCII range, for example, you will still not get accurate results. For multilingual
counting, use the class-based Tally API instead.
🧠 Implementation Details
[!NOTE] In this section, we refer to words, graphemes, spaces, lines, etc. as tokens for simplicity.
Here are some more details about how tally-ts does its magic.
Algorithm
The class-based implementation uses Intl.Segmenter for locale-aware text segmentation at three granularities:
- grapheme with countGraphemes()
- word with countWords()
- sentence with countSentences()
Each segmenter operates independently, and the results are combined when using countAll().
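The three granularities can be sketched with the built-in Intl.Segmenter alone. The helper names below are hypothetical and the code is not the library's implementation; in particular, skipping whitespace-only sentence segments is an assumption made here.

```typescript
// Hypothetical helpers (not part of tally-ts) showing one segmenter per granularity.

function graphemeCount(text: string, locale = 'en'): number {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
  return Array.from(segmenter.segment(text)).length;
}

function wordCount(text: string, locale = 'en'): number {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'word' });
  // Word segmentation also yields spaces and punctuation; keep word-like parts.
  return Array.from(segmenter.segment(text)).filter((part) => part.isWordLike).length;
}

function sentenceCount(text: string, locale = 'en'): number {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'sentence' });
  // Assumption: segments containing only whitespace are not sentences.
  return Array.from(segmenter.segment(text)).filter((part) => part.segment.trim() !== '').length;
}

console.log(graphemeCount('e\u0301')); // "e" + combining accent: 2 code points, 1 grapheme
console.log(wordCount('Hello world'));
console.log(sentenceCount('Hello world! This is a test.'));
```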
The counting functions are implemented as single-pass parsers for performance reasons. Each grapheme in the input string
is classified using Unicode General Categories (e.g., \p{L}, \p{Nd}, \p{Zs}), providing accurate results for all
languages and scripts supported by the platform’s ICU data.
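A classification step along these lines can be sketched with Unicode property escapes. This is an illustration, not the library's code: the classifyGraphemes name, the extra "other" bucket, and the choice to classify each grapheme by its first code point are all assumptions.

```typescript
// Hypothetical sketch (not part of tally-ts): buckets graphemes by the Unicode
// General Category of their first code point.
type GraphemeKind = 'spaces' | 'letters' | 'digits' | 'punctuation' | 'symbols' | 'other';

const CATEGORY_PATTERNS: Array<[GraphemeKind, RegExp]> = [
  ['spaces', new RegExp('^\\p{Zs}', 'u')],
  ['letters', new RegExp('^\\p{L}', 'u')],
  ['digits', new RegExp('^\\p{Nd}', 'u')],
  ['punctuation', new RegExp('^\\p{P}', 'u')],
  ['symbols', new RegExp('^\\p{S}', 'u')],
];

function classifyGraphemes(text: string, locale = 'en'): Record<GraphemeKind, number> {
  const counts: Record<GraphemeKind, number> = {
    spaces: 0, letters: 0, digits: 0, punctuation: 0, symbols: 0, other: 0,
  };
  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
  // Single pass: each grapheme is visited once and lands in exactly one bucket.
  for (const { segment } of segmenter.segment(text)) {
    const match = CATEGORY_PATTERNS.find(([, pattern]) => pattern.test(segment));
    counts[match ? match[0] : 'other']++;
  }
  return counts;
}

console.log(classifyGraphemes('Hello world!'));
// → { spaces: 1, letters: 10, digits: 0, punctuation: 1, symbols: 0, other: 0 }
```

Graphemes that fit none of the five categories, such as newlines, end up in the "other" bucket in this sketch.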
Here’s how counts are determined for each token type:
| Count Type | Description |
| --------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| grapheme        | A user-perceived character as defined by Intl.Segmenter with granularity: "grapheme". Multi-codepoint characters (e.g., emojis, accented letters, combined scripts) are counted as one. Examples: a, é, 😊, 👩‍🚀, 貓. |
| word | Counted using Intl.Segmenter with granularity: "word". Each segment where isWordLike is true increments the word count. This is locale-aware and works for non-Latin scripts (e.g., Chinese, Arabic). Examples: "Hello world" → 2, "你好世界" → 1. |
| sentence | Counted using Intl.Segmenter with granularity: "sentence". Each non-empty segment increments the sentence count. Works for punctuation and locale rules (e.g., handling ¿ and !). |
| space | A grapheme that matches the Unicode Space Separator category (\p{Zs}). Includes ordinary spaces and non-breaking spaces. Examples: ' ', \u00A0. |
| letter | A grapheme in the Unicode Letter category (\p{L}). Includes characters from all alphabets. Examples: A, ß, д, あ, م. |
| digit | A grapheme in the Unicode Decimal Digit category (\p{Nd}). Works across scripts (e.g., Arabic-Indic, Devanagari). Examples: 0, ९, ٢. |
| punctuation | A grapheme in the Unicode Punctuation category (\p{P}). Examples: ., ,, !, ¿, “”. |
| symbol | A grapheme in the Unicode Symbol category (\p{S}). Includes math, currency, emoji, and miscellaneous symbols. Examples: +, $, ©, 🔥, ™. |
| line | Determined by newline graphemes ('\n'). Each newline increments the line count. A final line is counted even if the text doesn’t end with a newline, unless the input is empty, in which case the line count is 0. |
| paragraph | A non-empty, non-newline string, separated from other paragraphs by one or more newline characters. A trailing paragraph is counted even if the text doesn’t end with a newline, unless the input is empty, in which case the paragraph count is 0. Example: "Hello\n\nWorld" → 2 paragraphs. |
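The line and paragraph rules in the table can be approximated with plain string operations. This is a sketch of one reading of those rules, not the library's single-pass implementation: the function names are hypothetical, and treating a trailing newline as closing the last line rather than opening a new one is an assumption.

```typescript
// Hypothetical helpers (not part of tally-ts).

function countLines(text: string): number {
  if (text.length === 0) return 0; // empty input has no lines
  const newlines = text.split('\n').length - 1;
  // A final unterminated line still counts; a trailing newline adds nothing extra.
  return text.endsWith('\n') ? newlines : newlines + 1;
}

function countParagraphs(text: string): number {
  // Paragraphs are non-empty chunks separated by one or more newlines.
  return text.split(/\n+/).filter((chunk) => chunk.length > 0).length;
}

console.log(countLines('Hello world!\n\nThis is a test.')); // → 3
console.log(countParagraphs('Hello\n\nWorld')); // → 2
console.log(countLines(''), countParagraphs('')); // → 0 0
```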
Legacy
The legacy implementation exposes a single function, getCounts(), that can be used to get the number of characters,
words, sentences, paragraphs, lines, spaces, letters, digits, and symbols at once.
Algorithm
The counting function is implemented as a single-pass parser for performance reasons. State transitions (sentence terminator → letter, letter → space, etc.) are used to determine when to increment the counts for each token type.
The following characters are used to separate tokens:
- Space: ' '
- Newline: \n
- End Mark: ., !, ?
End of Input can also be considered a separator because words, sentences, paragraphs, and lines at the end of the
input are counted even if not specifically terminated. For example, Something is counted as a word, sentence,
paragraph, and line.
Here is an overview of how we determine the counts for each token type:
| Count Type | Description |
| ------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| character | A Unicode grapheme cluster (user-perceived character), as determined by Intl.Segmenter. Using this method, Emojis and other multi-codepoint characters are counted as a single character. Examples: a, 2, !, 🔥, 貓 |
| word | A contiguous sequence of one or more letters or digits followed by a space, end mark, or newline. Symbols by themselves are not considered words. Examples: space, Whoa!, newline\n, 42. |
| sentence      | A contiguous sequence of one or more words followed by an end mark. Examples: Hello, world!, 20 93.. |
| paragraph | A contiguous sequence of one or more sentences followed by a newline. Examples: The quick brown cat jumps over the lazy dog\n, Hello world! Bye world!\n, 42\n. |
| space         | A literal space character (' '). Other whitespace (e.g., tabs, newlines) is not included. |
| letter | A character in the ASCII ranges A–Z or a–z. Examples: A, j, z. |
| digit | A character in the ASCII range 0-9. Examples: 0, 5, 9. |
| symbol | A non-letter, non-digit, non-space, non-newline character. This includes emojis, symbols, punctuation, and most whitespace. Examples: ,, %, #, 😊, 貓, \t. |
| line | A literal newline character (\n). |
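To make the state-transition idea concrete, here is a simplified single-pass counter that follows the rules above for words, sentences, spaces, letters, and digits. It is a sketch written for this README, not the legacy getCounts() code: the function name is hypothetical, paragraphs, lines, and symbols are left out, and only end marks and the end of input close a sentence.

```typescript
// Hypothetical sketch (not the legacy implementation).
interface SimpleCounts {
  words: number;
  sentences: number;
  spaces: number;
  letters: number;
  digits: number;
}

function countSinglePass(text: string): SimpleCounts {
  const counts: SimpleCounts = { words: 0, sentences: 0, spaces: 0, letters: 0, digits: 0 };
  let inWord = false; // currently inside a run of letters/digits
  let wordsInSentence = 0; // words seen since the last end mark

  const closeWord = (): void => {
    if (inWord) {
      counts.words++;
      wordsInSentence++;
      inWord = false;
    }
  };
  const closeSentence = (): void => {
    if (wordsInSentence > 0) {
      counts.sentences++;
      wordsInSentence = 0;
    }
  };

  for (const char of text) {
    if (/[A-Za-z]/.test(char)) {
      counts.letters++;
      inWord = true;
    } else if (/[0-9]/.test(char)) {
      counts.digits++;
      inWord = true;
    } else if (char === ' ') {
      counts.spaces++;
      closeWord();
    } else if (char === '\n') {
      closeWord();
    } else if (char === '.' || char === '!' || char === '?') {
      closeWord();
      closeSentence();
    }
    // Any other character neither starts nor ends a word in this sketch.
  }

  // End of input acts as a separator for unterminated words and sentences.
  closeWord();
  closeSentence();
  return counts;
}

console.log(countSinglePass('Something'));
// → { words: 1, sentences: 1, spaces: 0, letters: 9, digits: 0 }
```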
🤝 Contributing
Pull requests, bug reports, feature requests, and other kinds of contributions are welcome. See the contribution guide for more details.
🧾 License
Copyright © 2025 John Goodliff (@twocaretcat).
This project is licensed under the MIT license. See the license for more details.
🖇️ Related
Recommended
Other projects you might like:
- 👤 Tally Chrome Extension: A Chrome extension to easily count the number of words, characters, and paragraphs on any site
Used By
Notable projects that depend on this one:
- 👤 Tally: A free online tool to count the number of characters, words, paragraphs, and lines in your text. Tally uses this library to compute counts
Alternatives
Similar projects you might want to use instead:
- 🌐 Alfaaz: An alternative multilingual word counting library with fewer features, but faster execution
💕 Funding
Find this project useful? Sponsoring me will help me cover costs and commit more time to open-source.
If you can't donate but still want to contribute, don't worry. There are many other ways to help out, like:
- 📢 reporting (submitting feature requests & bug reports)
- 👨‍💻 coding (implementing features & fixing bugs)
- 📝 writing (documenting & translating)
- 💬 spreading the word
- ⭐ starring the project
I appreciate the support!
