@knaw-huc/text-annotation-segmenter
v0.1.0
Utility functions to help render overlapping annotations in a text.
Annotations on a text have a non-hierarchical nature, i.e., they can overlap:
Aa<bc>bb<cd>cc</bc>dd</cd>ee.
However, HTML is hierarchical. How can annotations be displayed when they neither nest inside nor sit next to each other, but cut across each other?
The segment function creates an array of segments: a flat, non-overlapping list where each segment links to both the text and all the annotations that apply. Each segment translates into a single DOM element. Elements linked to multiple overlapping annotations can now be decorated with their own styling, classes and callbacks.
A special case is the marker: an annotation of zero width marking a position in the text. Markers result in zero-width segments, which also link to all annotations that start at, end at, or span across that position.
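To make the idea concrete, here is a minimal sketch of how such a segmentation could work. It is not the package's implementation: the names `AnnotationSpan`, `Segment` and `naiveSegment` are hypothetical, the shapes are inferred from the example further down, and offsets are assumed to lie within the text. The sketch cuts the text at every annotation boundary and attaches to each slice the annotations that cover it. Zero-width markers only add a cut point here; emitting zero-width segments for them is left out.

```typescript
// Hypothetical shapes inferred from the README example; the real types may differ.
interface AnnotationSpan<T> {
  begin: number;
  end: number;
  body: T;
}

interface Segment<T> {
  id: string;
  begin: number;
  end: number;
  annotations: T[];
}

// Naive boundary sweep (quadratic): cut at every begin/end offset, then
// collect, for each slice, the bodies of all annotations that cover it.
function naiveSegment<T>(text: string, spans: AnnotationSpan<T>[]): Segment<T>[] {
  const cuts = new Set<number>([0, text.length]);
  for (const span of spans) {
    cuts.add(span.begin);
    cuts.add(span.end);
  }
  const offsets = Array.from(cuts).sort((a, b) => a - b);

  const segments: Segment<T>[] = [];
  for (let i = 0; i + 1 < offsets.length; i++) {
    const begin = offsets[i];
    const end = offsets[i + 1];
    segments.push({
      id: String(i),
      begin,
      end,
      // An annotation applies when it fully covers the slice.
      annotations: spans
        .filter((span) => span.begin <= begin && end <= span.end)
        .map((span) => span.body),
    });
  }
  return segments;
}

// Usage: two overlapping annotations on "abc" yield three segments.
const sketchSegments = naiveSegment("abc", [
  { begin: 0, end: 2, body: { id: "ab" } },
  { begin: 1, end: 3, body: { id: "bc" } },
]);
console.log(sketchSegments.map((s) => `${s.begin}-${s.end}`)); // [ '0-1', '1-2', '2-3' ]
```

Because every slice lies between two adjacent boundaries, no annotation can start or stop inside a slice, which is what makes the result flat and non-overlapping.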
API
- segment<T>(text, annotations): TextSegment<T>[]
  Split a text into TextSegments with char offsets to the text and a list of applying annotations.
- AnnotationSegment<T>
  Input list of objects linking annotations to the text using character offsets.
- TextSegment<T>
  Output list of segments with character offsets and the annotations that apply.
- groupSegments<T>(segments, predicate): Group<T>[]
  Group segments into higher-level units (e.g., words, sentences, entities) by collecting all segments that share a matching annotation.
- Group<T>
  Output group of segments matching the same predicate result.
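The grouping step can be pictured with a small sketch. This is one possible reading of "collecting all segments that share a matching annotation", not the package's `groupSegments`: the names `groupByAnnotation` and `AnnotationGroup`, the shape of the group, and the form of the predicate are all assumptions made for illustration.

```typescript
// Hypothetical shapes; the package's TextSegment<T> and Group<T> may differ.
interface Segment<T> {
  id: string;
  begin: number;
  end: number;
  annotations: T[];
}

interface AnnotationGroup<T> {
  annotation: T;
  segments: Segment<T>[];
}

// Collect, per annotation accepted by the predicate, every segment it applies to.
// Groups are keyed by object identity and returned in order of first appearance.
function groupByAnnotation<T>(
  segments: Segment<T>[],
  isGrouping: (annotation: T) => boolean,
): AnnotationGroup<T>[] {
  const groups = new Map<T, AnnotationGroup<T>>();
  for (const segment of segments) {
    for (const annotation of segment.annotations) {
      if (!isGrouping(annotation)) continue;
      let group = groups.get(annotation);
      if (!group) {
        group = { annotation, segments: [] };
        groups.set(annotation, group);
      }
      group.segments.push(segment);
    }
  }
  return Array.from(groups.values());
}

// Usage: group the segments of "abc" by annotations of a made-up type "word".
const word = { id: "ab", type: "word" };
const note = { id: "bc", type: "note" };
const wordGroups = groupByAnnotation(
  [
    { id: "0", begin: 0, end: 1, annotations: [word] },
    { id: "1", begin: 1, end: 2, annotations: [word, note] },
    { id: "2", begin: 2, end: 3, annotations: [note] },
  ],
  (annotation) => annotation.type === "word",
);
console.log(wordGroups.map((g) => g.segments.map((s) => s.id))); // [ [ '0', '1' ] ]
```

Keying on object identity keeps the sketch short; a real implementation might instead group on an id or on the predicate's return value.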
Example
Given the text "abc" with two overlapping annotations:
text:          abc
annotation ab: __
annotation bc:  __

import { segment } from "@knaw-huc/text-annotation-segmenter";
const text = 'abc';
const ab = {id: 'ab'};
const bc = {id: 'bc'};
const segments = segment(text, [
{begin: 0, end: 2, body: ab},
{begin: 1, end: 3, body: bc},
]);
expect(segments).toEqual([
{id: '0', begin: 0, end: 1, annotations: [ab]},
{id: '1', begin: 1, end: 2, annotations: [ab, bc]},
{id: '2', begin: 2, end: 3, annotations: [bc]},
]);

More examples:
- For edge cases, see segment.spec.ts.
- For benchmarks, see segment.bench.ts.
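
Since each segment translates into a single element, segments like those in the example above could be rendered roughly as follows. This is a sketch, not part of the package: `renderSegments`, `escapeHtml` and the `ann-` class prefix are hypothetical, and annotation bodies are assumed to carry an `id` that is safe to use in a class name.

```typescript
// Hypothetical segment shape, mirroring the output in the example above.
interface Segment<T> {
  id: string;
  begin: number;
  end: number;
  annotations: T[];
}

// Escape the characters that are significant in HTML text and attribute values.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render every segment as one <span> carrying a class per applying annotation.
function renderSegments<T extends { id: string }>(text: string, segments: Segment<T>[]): string {
  return segments
    .map((segment) => {
      const classes = segment.annotations.map((a) => `ann-${a.id}`).join(" ");
      const content = escapeHtml(text.slice(segment.begin, segment.end));
      return `<span class="${classes}">${content}</span>`;
    })
    .join("");
}

// Usage with the segments of "abc" from the example.
const html = renderSegments("abc", [
  { id: "0", begin: 0, end: 1, annotations: [{ id: "ab" }] },
  { id: "1", begin: 1, end: 2, annotations: [{ id: "ab" }, { id: "bc" }] },
  { id: "2", begin: 2, end: 3, annotations: [{ id: "bc" }] },
]);
console.log(html);
// <span class="ann-ab">a</span><span class="ann-ab ann-bc">b</span><span class="ann-bc">c</span>
```

The middle span carries both classes, so overlapping annotations can be styled independently without nesting elements.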
