sentencex
v1.0.16
sentence segmentation library
Node.js bindings for SentenceX, a high-performance multilingual sentence segmentation library written in Rust.
Installation
npm install sentencex
Usage
import { segment } from 'sentencex';
const sentences = segment("en", "This is first sentence. This is another one.");
console.log(sentences);
// [ 'This is first sentence. ', 'This is another one.' ]
Get Sentence Boundaries
For detailed boundary information (character offsets, boundary punctuation, and paragraph breaks), use get_sentence_boundaries:
import { get_sentence_boundaries } from 'sentencex';
const boundaries = get_sentence_boundaries("en", "This is first sentence. This is another one.");
console.log(boundaries);
// [ { start_index: 0, end_index: 24, text: 'This is first sentence. ' }, ... ]
CommonJS
const { segment } = require('sentencex');
API
segment(languageCode: string, text: string): string[] - Segment text into sentences.
get_sentence_boundaries(languageCode: string, text: string): SentenceBoundary[] - Get detailed boundary information for each sentence.
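The Usage example shows segment() keeping the trailing space on the first sentence. The sketch below works on that documented output copied as a literal (not a live call to the library), checks that the pieces join back into the input for this example, and trims them for display. That concatenation reproduces the input for every text is an assumption, not something this README states.

```javascript
// Input and output copied from the Usage example; in real code
// `sentences` would come from segment("en", text).
const text = "This is first sentence. This is another one.";
const sentences = ['This is first sentence. ', 'This is another one.'];

// For this example, the segments join back into the original text,
// so the space after each period stays with the preceding sentence.
const roundTrips = sentences.join('') === text;
console.log(roundTrips); // true

// Trim whitespace and drop empty strings when only the sentence text is needed.
const trimmed = sentences.map((s) => s.trim()).filter((s) => s.length > 0);
console.log(trimmed); // [ 'This is first sentence.', 'This is another one.' ]
```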
Each SentenceBoundary object contains:
start_index: Character position where the sentence starts
end_index: Character position where the sentence ends
text: The sentence text
boundary_symbol: Punctuation mark that ended the sentence (if any)
is_paragraph_break: Whether this boundary represents a paragraph break
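The field list leaves two details open: whether end_index is exclusive, and what unit "character position" counts. The get_sentence_boundaries example (start_index 0, end_index 24 for the 24-character 'This is first sentence. ') is consistent with an exclusive end. The sketch below assumes both offsets count Unicode code points, which differ from JavaScript's UTF-16 string indices for characters such as many emoji, so it slices with Array.from rather than String.prototype.slice. It also groups boundaries into paragraphs, assuming is_paragraph_break is set on the last sentence of a paragraph. The helper names sliceByCharIndex and groupIntoParagraphs are hypothetical, and the sample boundary objects are hand-written, not library output.

```javascript
// Hypothetical helpers; not part of the sentencex API.

// Slice `source` by code-point positions, assuming `end` is exclusive.
// Array.from() splits a string into code points, so an emoji such as "😀"
// counts as one position here but as two in source.slice().
function sliceByCharIndex(source, start, end) {
  return Array.from(source).slice(start, end).join('');
}

// Group sentence texts into paragraphs, assuming is_paragraph_break marks
// the last sentence of a paragraph.
function groupIntoParagraphs(boundaries) {
  const paragraphs = [];
  let current = [];
  for (const b of boundaries) {
    current.push(b.text);
    if (b.is_paragraph_break) {
      paragraphs.push(current);
      current = [];
    }
  }
  if (current.length > 0) paragraphs.push(current); // trailing paragraph
  return paragraphs;
}

// First boundary from the get_sentence_boundaries example.
const text = "This is first sentence. This is another one.";
const first = { start_index: 0, end_index: 24, text: 'This is first sentence. ' };
console.log(sliceByCharIndex(text, first.start_index, first.end_index) === first.text); // true

// Hand-written boundaries (not real library output) to show the grouping.
const sample = [
  { text: 'First. ', is_paragraph_break: false },
  { text: 'Second.\n\n', is_paragraph_break: true },
  { text: 'Third.', is_paragraph_break: false },
];
console.log(JSON.stringify(groupIntoParagraphs(sample))); // [["First. ","Second.\n\n"],["Third."]]
```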
Language Support
Supports approximately 244 languages, with automatic language fallback chains. See the upstream SentenceX documentation for details.
Performance
See the upstream benchmarks for a comparison with other libraries.
