paraseg
v2.1.0
Like `Intl.Segmenter`, but for paragraphs instead of graphemes/words/sentences
- How do I install it?
- How do I use it?
- Does it handle both Unix and Windows line endings?
- How do I specify custom paragraph separators?
- How do I trim leading and trailing spaces from segmented paragraphs?
- Can I save memory by only returning offset and length data for each segment?
- Is there a change log?
- How do I set up the dev environment?
- What versions of Node.js does it support?
- What license is it released under?
How do I install it?
If you're using npm:
npm i paraseg --save

Or if you just want the git repo:
git clone [email protected]:philbooth/paraseg.git

How do I use it?
import * as assert from 'node:assert';
import { ParagraphSegmenter } from 'paraseg';
// Two paragraphs, separated by a blank line
const text = `The quick brown fox
jumps over the lazy dog.

How now brown cow?`;
const segmenter = new ParagraphSegmenter();
const paragraphs = [];
for (const paragraph of segmenter.segment(text)) {
  paragraphs.push(paragraph);
}
assert.equal(paragraphs.length, 2);
// Each segment keeps the separator that terminates it
assert.equal(paragraphs[0].segment, 'The quick brown fox\njumps over the lazy dog.\n\n');
assert.equal(paragraphs[1].segment, 'How now brown cow?');

Does it work with both Unix and Windows line endings?
Yes.
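As a rough illustration of what treating both line endings alike can mean, here is a minimal sketch that splits on blank lines whether they are written as `\n\n` or `\r\n\r\n`. `splitParagraphs` is a hypothetical helper written for this README, not part of paraseg's API, and paraseg's own matching rules may differ.

```javascript
// Hypothetical helper, NOT paraseg's implementation:
// split on runs of two or more line endings, Unix or Windows style.
function splitParagraphs(text) {
  const separator = /(?:\r?\n){2,}/g;
  const paragraphs = [];
  let start = 0;
  for (const match of text.matchAll(separator)) {
    const end = match.index + match[0].length;
    // Keep the separator attached to the paragraph it terminates
    paragraphs.push(text.slice(start, end));
    start = end;
  }
  // Whatever follows the last separator is the final paragraph
  if (start < text.length) {
    paragraphs.push(text.slice(start));
  }
  return paragraphs;
}

const unix = 'The quick brown fox\njumps over the lazy dog.\n\nHow now brown cow?';
const windows = unix.replaceAll('\n', '\r\n');

console.log(splitParagraphs(unix).length);    // 2
console.log(splitParagraphs(windows).length); // 2
```

A single line break inside a paragraph is left alone in both cases, because the pattern only matches two or more consecutive line endings.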
How do I specify custom paragraph separators?
Pass the `separators` option to the constructor.
`separators` is an array of candidate substrings;
wherever one of them matches, the text is split.
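To picture what matching against an array of candidate substrings involves, the sketch below builds a single pattern from the candidates and cuts the text after each match. `splitOnSeparators` is a hypothetical stand-in, not paraseg's code. The longest-first ordering and the assumption that every candidate is a non-empty string are choices made for this sketch only.

```javascript
// Hypothetical helper, NOT paraseg's implementation.
// Assumes every separator is a non-empty string.
function splitOnSeparators(text, separators) {
  if (separators.length === 0) {
    return [text];
  }
  const pattern = new RegExp(
    [...separators]
      // Try longer candidates first so '\n\n' wins over '\n'
      .sort((a, b) => b.length - a.length)
      // Escape regex metacharacters so candidates match literally
      .map((s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
      .join('|'),
    'g',
  );
  const segments = [];
  let start = 0;
  for (const match of text.matchAll(pattern)) {
    const end = match.index + match[0].length;
    // Each segment keeps the separator that ended it
    segments.push(text.slice(start, end));
    start = end;
  }
  if (start < text.length) {
    segments.push(text.slice(start));
  }
  return segments;
}

console.log(splitOnSeparators('one---two\nthree', ['---', '\n']));
// [ 'one---', 'two\n', 'three' ]
```

With paraseg itself you only pass the array; the example that follows shows the real constructor option.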
import * as assert from 'node:assert';
import { ParagraphSegmenter } from 'paraseg';
const text = `The quick brown fox jumps over the lazy dog.
How now brown cow?`;
// Treat every single newline as a paragraph boundary
const segmenter = new ParagraphSegmenter({
  separators: ['\n'],
});
const paragraphs = [];
for (const paragraph of segmenter.segment(text)) {
  paragraphs.push(paragraph);
}
assert.equal(paragraphs.length, 2);
assert.equal(paragraphs[0].segment, 'The quick brown fox jumps over the lazy dog.\n');
assert.equal(paragraphs[1].segment, 'How now brown cow?');

How do I trim leading and trailing spaces from segmented paragraphs?
Pass the `trim` option to the constructor.
`trim` is a boolean;
set it to `true` to remove whitespace
from the start and end of each paragraph.
import * as assert from 'node:assert';
import { ParagraphSegmenter } from 'paraseg';
const text = ` The quick brown fox
 jumps over the lazy dog.

How now brown cow? `;
const segmenter = new ParagraphSegmenter({ trim: true });
const paragraphs = [];
for (const paragraph of segmenter.segment(text)) {
  paragraphs.push(paragraph);
}
assert.equal(paragraphs.length, 2);
// Only the ends are trimmed; whitespace inside the paragraph is kept
assert.equal(paragraphs[0].segment, 'The quick brown fox\n jumps over the lazy dog.');
assert.equal(paragraphs[1].segment, 'How now brown cow?');

Can I save memory by only returning offset and length data for each segment?
Yes, pass the `slim: true` option to the constructor.
import * as assert from 'node:assert';
import { ParagraphSegmenter } from 'paraseg';
const text = `The quick brown fox
jumps over the lazy dog.

How now brown cow?`;
const segmenter = new ParagraphSegmenter({ slim: true });
const paragraphs = [];
for (const paragraph of segmenter.segment(text)) {
  paragraphs.push(paragraph);
}
assert.equal(paragraphs.length, 2);
// Slim results carry no segment text, only positions
assert.equal(Object.keys(paragraphs[0]).includes('segment'), false);
assert.equal(Object.keys(paragraphs[1]).includes('segment'), false);
assert.equal(paragraphs[0].offset, 0);
assert.equal(paragraphs[1].offset, text.indexOf('How now brown cow?'));
assert.equal(paragraphs[0].length, paragraphs[1].offset);
assert.equal(paragraphs[1].length, text.length - paragraphs[0].length);

Note that `slim: true` is mutually exclusive with the `trim` option.
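If you later need the text of a slim paragraph, you can slice it out of the original string on demand. The sketch below uses hand-built `{ offset, length }` objects in place of real paraseg output, and `materialize` is a hypothetical helper name. It assumes offsets and lengths are ordinary string indices, which is what the `text.indexOf` and `text.length` assertions above suggest.

```javascript
const text = 'The quick brown fox\njumps over the lazy dog.\n\nHow now brown cow?';

// Hand-built stand-ins for slim results (assumed shape: { offset, length })
const secondOffset = text.indexOf('How now brown cow?');
const slimParagraphs = [
  { offset: 0, length: secondOffset },
  { offset: secondOffset, length: text.length - secondOffset },
];

// Hypothetical helper: recover a paragraph's text only when it is needed
function materialize(source, { offset, length }) {
  return source.slice(offset, offset + length);
}

console.log(JSON.stringify(materialize(text, slimParagraphs[1])));
// "How now brown cow?"
```

This way only the source string and a pair of numbers per paragraph stay in memory until a paragraph is actually read.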
Is there a change log?
Yes.
How do I set up the dev environment?
To compile TypeScript:
make build

To lint the code:
make lint

To run the tests:
make test

What versions of Node does it support?
Node versions 20 or greater are supported.
What license is it released under?
MIT.
