segmenter
v2.0.1
Work with graphemes, words, and sentences through a small, simple, and fast API built on Intl.Segmenter
Install
npm install segmenter
Why
- Intl.Segmenter is supported in all major browsers, and 94% of users have it available. It's time for adoption.
- If you have a use case other than iterating over all graphemes/words/sentences in a text, Intl.Segmenter can be a little hard to work with.
- In many cases, working with graphemes is preferable to characters. Graphemes are what the end user sees. For example, the emoji 👨‍🔧️ is a single grapheme but consists of 6 UTF-16 code units. An index-based for loop over it makes 6 iterations and a for...of loop makes 4 (one per code point). It's confusing, so just use graphemes.
- Before Intl.Segmenter, working with graphemes required libraries like graphemer, which is 94KB in size.
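The three counts in the emoji example can be checked with built-ins alone. This is a minimal sketch that assumes the emoji is the sequence man, zero-width joiner, wrench, variation selector-16, written with escapes so the invisible code points survive copy and paste. The variable names are illustrative only.

```typescript
// Assumed sequence: U+1F468 (man) + U+200D (ZWJ) + U+1F527 (wrench) + U+FE0F (VS16).
const mechanic = "\u{1F468}\u200D\u{1F527}\uFE0F";

// String length counts UTF-16 code units, which is what an index-based for loop walks.
const codeUnitCount = mechanic.length;

// String iteration (for...of, Array.from) walks code points.
const codePointCount = Array.from(mechanic).length;

// Intl.Segmenter with "grapheme" granularity walks user-perceived characters.
const graphemeSegmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
const graphemeCount = Array.from(graphemeSegmenter.segment(mechanic)).length;

console.log(codeUnitCount, codePointCount, graphemeCount); // 6 4 1
```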
Usage
import { graphemeAt, graphemeRangeAt, wordAt, wordRangeAt } from "segmenter";
graphemeAt("👨‍🔧️ the fixer", 3); // "👨‍🔧️"
graphemeRangeAt("👨‍🔧️ the fixer", 3); // { start: 0, end: 6 }
wordAt("hello-world", 0); // "hello"
wordRangeAt("hello-world", 0); // { start: 0, end: 5 }
API
Graphemes
graphemeAt(string: string, position: number): string | undefined
Get the grapheme at position in string. Returns undefined if position is out of bounds or string is empty.
graphemeRangeAt(string: string, position: number): { start: number; end: number; } | undefined
Get the start and end positions of the grapheme at position in string. Returns undefined if position is out of bounds or string is empty.
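To show how a position lookup like this can be built on Intl.Segmenter, here is a rough sketch. It is not the package's source: graphemeRangeAtSketch and graphemeAtSketch are hypothetical names, and the sketch assumes position is a UTF-16 code unit index, which is what the standard Segments.containing() method expects.

```typescript
// Hypothetical helper, not the package's implementation:
// range of the grapheme that covers the code unit at `position`.
function graphemeRangeAtSketch(
  text: string,
  position: number,
): { start: number; end: number } | undefined {
  const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
  // containing() yields undefined when position is outside [0, text.length),
  // which also covers the empty-string case.
  const hit = segmenter.segment(text).containing(position);
  if (!hit) return undefined;
  return { start: hit.index, end: hit.index + hit.segment.length };
}

// Hypothetical helper: the grapheme itself, sliced out of the range above.
function graphemeAtSketch(text: string, position: number): string | undefined {
  const range = graphemeRangeAtSketch(text, position);
  return range === undefined ? undefined : text.slice(range.start, range.end);
}

// Man + ZWJ + wrench + VS16 (6 code units), followed by plain text.
const graphemeSample = "\u{1F468}\u200D\u{1F527}\uFE0F the fixer";
console.log(graphemeRangeAtSketch(graphemeSample, 3)); // { start: 0, end: 6 }
console.log(graphemeRangeAtSketch(graphemeSample, 99)); // undefined
```

Any position from 0 to 5 lands inside the emoji here, so each of them returns the same range.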
graphemes(string: string): string[]
Get all graphemes in string as an array.
Words
wordAt(string: string, position: number): string | undefined
Get the word at position in string. Returns undefined if position is out of bounds or string is empty.
wordRangeAt(string: string, position: number): { start: number; end: number; } | undefined
Get the start and end positions of the word at position in string. Returns undefined if position is out of bounds or string is empty.
words(string: string): string[]
Get all words in string as an array.
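For comparison, word handling with Intl.Segmenter alone might look like the sketch below. The helper names wordsSketch and wordAtSketch are hypothetical, and filtering on isWordLike is an assumption: the word segmenter also emits punctuation and whitespace as segments, and this README does not say whether the package keeps or drops them.

```typescript
const wordSegmenter = new Intl.Segmenter("en", { granularity: "word" });

// Hypothetical helper: keep only word-like segments.
// Assumption: separators such as "-" and spaces are dropped.
function wordsSketch(text: string): string[] {
  return Array.from(wordSegmenter.segment(text))
    .filter((part) => part.isWordLike)
    .map((part) => part.segment);
}

// Hypothetical helper: the segment covering the code unit at `position`,
// or undefined when position is out of bounds.
function wordAtSketch(text: string, position: number): string | undefined {
  return wordSegmenter.segment(text).containing(position)?.segment;
}

console.log(wordsSketch("hello-world")); // [ "hello", "world" ]
console.log(wordAtSketch("hello-world", 0)); // "hello"
```

Note that wordAtSketch returns whatever segment covers the position, so position 5 in "hello-world" yields "-" in this sketch.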
Sentences
Note: Intl.Segmenter doesn't do a perfect job of detecting sentences. For example, "I went to Dr. Smith's office" will be split into two sentences.
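The behaviour described in the note can be inspected directly with Intl.Segmenter. This is only a sketch with illustrative variable names. The exact split can depend on the JavaScript engine and its Unicode data, so no expected output is shown.

```typescript
const sentenceSegmenter = new Intl.Segmenter("en", { granularity: "sentence" });

const abbreviationText = "I went to Dr. Smith's office.";

// Collect the text of every sentence segment the engine reports.
const sentenceParts = Array.from(
  sentenceSegmenter.segment(abbreviationText),
  (part) => part.segment,
);

// Print each detected sentence on its own line, quoted so that
// trailing spaces are visible.
sentenceParts.forEach((sentence, i) => {
  console.log(i, JSON.stringify(sentence));
});
```

The segments always cover the whole input, so joining them gives back the original string regardless of where the breaks fall.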
sentenceAt(string: string, position: number): string | undefined
Get the sentence at position in string. Returns undefined if position is out of bounds or string is empty.
sentenceRangeAt(string: string, position: number): { start: number; end: number; } | undefined
Get the start and end positions of the sentence at position in string. Returns undefined if position is out of bounds or string is empty.
sentences(string: string): string[]
Get all sentences in string as an array.
