booknlp-ts

v1.1.0 (Published)

TypeScript port of entity, event, and supersense extraction pipelines of BookNLP (spaCy extraction excluded)

Downloads: 975

Readme

BookNLP TypeScript Library

Browser-compatible TypeScript implementation of BookNLP for client-side NLP inference on long documents. This library provides complete entity recognition, supersense tagging, and event detection using pre-converted ONNX models running entirely in the browser via WebAssembly.

Quick Start

Installation

npm install booknlp-ts

Usage

import { BookNLP, SpaCyContext } from 'booknlp-ts';

const config = {
  // Optional: uses Hugging Face model by default
  // modelPath: 'Terraa/entities_google_bert_uncased_L-4_H-256_A-4-v1.0-ONNX',
  pipeline: ['entity', 'supersense', 'event'],
  // Optional: specify execution providers (default: ['wasm'])
  executionProviders: ['wasm'], // or ['webgl'], ['webgpu']
};

const booknlp = new BookNLP();
await booknlp.initialize(config);

// SpaCyContext must be provided (from spaCy preprocessing)
const spaCyContext: SpaCyContext = {
  tokens: [
    {
      text: 'Harry',
      startByte: 0,
      endByte: 5,
      pos: 'PROPN',
      finePos: 'NNP',
      lemma: 'Harry',
      deprel: 'nsubj',
      dephead: 1,
      morph: {},
      likeNum: false,
      isStop: false,
      sentenceId: 0,
      withinSentenceId: 0,
    },
    // ... more tokens
  ],
  sentences: [{ start: 0, end: 10 }],
};

const result = await booknlp.process(spaCyContext);
console.log('Entities:', result.entities);
console.log('Supersense:', result.supersense);
console.log('Events:', result.tokens.filter(t => t.event));

Browser Deployment

Bundled Resources

The library ships with the required resource files (entity tagset, supersense tagset, WordNet data) bundled from the source repository, so no external network requests are needed for them.

WASM Configuration

For custom WASM paths (advanced usage):

const config = {
  pipeline: ['entity'],
  wasmPaths: {
    'ort-wasm.wasm': '/custom/path/to/ort-wasm.wasm',
    'ort-wasm-simd.wasm': '/custom/path/to/ort-wasm-simd.wasm',
  },
};
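
Presumably these paths are forwarded to onnxruntime-web's global WASM settings; for reference, the equivalent direct configuration (an assumption about the library's internals, not its documented behavior) is:

import * as ort from 'onnxruntime-web';

// Equivalent low-level setting; onnxruntime-web accepts either a URL
// prefix string or a per-file map of filenames to URLs.
ort.env.wasm.wasmPaths = '/custom/path/to/';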

External Resource URLs

If you prefer to host resources externally:

const config = {
  pipeline: ['entity', 'supersense'],
  // external resource URLs would be configured here; the exact fields
  // are not shown in this example
};

Required Input: SpaCyContext

The TypeScript implementation cannot process raw text; it requires pre-processed input from spaCy. The full Python preprocessing script and the complete field list are given under Input Requirements below.

Validation Results

The validation suite compares Python and TypeScript outputs on identical input. Expected results:

| Metric            | Status            | Notes                       |
| ----------------- | ----------------- | --------------------------- |
| Token count       | ✅ Exact match    | All tokens preserved        |
| Token text        | ✅ Exact match    | Character-level accuracy    |
| Entity spans      | ✅ Match expected | Boundaries align            |
| Entity categories | ✅ Match expected | PER, LOC, FAC, etc.         |
| Supersense spans  | ✅ Match expected | Annotation boundaries align |
| Event markers     | ✅ Match expected | Token-level event flags     |

Running Validation

cd validation
./run_validation.sh

This automatically:

  1. Checks dependencies
  2. Builds TypeScript
  3. Runs Python BookNLP (generates baseline)
  4. Runs TypeScript BookNLP (with same input)
  5. Compares outputs and reports mismatches (a sketch of this comparison follows)
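
Step 5's comparison can be pictured as a span-by-span diff. A minimal TypeScript sketch, using hypothetical output file paths (the actual script's locations may differ):

import * as fs from 'fs';

// Hypothetical output locations; the validation script's real paths may differ.
const py = JSON.parse(fs.readFileSync('out/python_entities.json', 'utf-8'));
const ts = JSON.parse(fs.readFileSync('out/ts_entities.json', 'utf-8'));

// Key each entity by its span and category, then report anything unmatched.
const key = (e: { startToken: number; endToken: number; cat: string }) =>
  `${e.startToken}-${e.endToken}-${e.cat}`;
const expected = new Set(py.map(key));
const mismatches = ts.filter((e: any) => !expected.has(key(e)));
console.log(`${mismatches.length} entity mismatches`);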

Known Limitations

  1. No bundled model: the ONNX model is downloaded at runtime (from Hugging Face by default) or supplied via modelPath
  2. Requires spaCy preprocessing: cannot process raw text directly
  3. CPU-focused: CUDA support exists but not extensively tested
  4. No quote/coreference handling: Future enhancement

Future Enhancements

  • [ ] Direct text input (integrate spaCy in TypeScript)
  • [ ] WebAssembly optimization for browser use
  • [ ] Quote and coreference chain extraction
  • [ ] Big model variant support
  • [ ] Streaming/incremental processing for very long documents
  • [ ] GPU optimization and multi-GPU support

Comparison: Python vs TypeScript

| Feature             | Python        | TypeScript (Browser) | Notes                       |
| ------------------- | ------------- | -------------------- | --------------------------- |
| Entity recognition  | ✅            | ✅                   | Equivalent                  |
| Supersense tagging  | ✅            | ✅                   | Equivalent                  |
| Event detection     | ✅            | ✅                   | Equivalent                  |
| ONNX inference      | ✅            | ✅                   | Equivalent                  |
| SpaCy preprocessing | ✅ Integrated | ⚠️ External          | Requires external spaCy     |
| Raw text input      | ✅            | ❌                   | Requires SpaCy context      |
| Model conversion    | ✅ Native     | ⚠️ Via Python        | ONNX export                 |
| Deployment          | Server-side   | Browser/Client-side  | No server required          |
| GPU acceleration    | CUDA          | WebGL/WebGPU         | Different acceleration tech |

Key Features

Type-Safe Interfaces

All data structures have complete TypeScript type definitions with validation:

interface SpaCyToken {
  text: string;
  startByte: number;
  endByte: number;
  pos: string;          // Coarse POS (NOUN, VERB, etc.)
  finePos: string;      // Fine POS (NN, VBD, etc.)
  lemma: string;
  deprel: string;       // Dependency relation
  dephead: number;      // Head token index
  morph: Record<string, string>;
  likeNum: boolean;
  isStop: boolean;
  sentenceId: number;
  withinSentenceId: number;
}

interface EntityAnnotation {
  startToken: number;
  endToken: number;
  cat: string;          // PER, LOC, FAC, GPE, ORG, VEH
  text: string;
  prop: string;         // PROP, NOM, PRON
}

type SupersenseAnnotation = [number, number, string, string];
// [startToken, endToken, category, text]
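
A runtime guard over untyped JSON input (for example, a parsed spacy_context.json) might look like the following sketch; this is an illustration, not the library's actual validator:

// Minimal runtime check that an unknown value has the SpaCyToken shape.
function isSpaCyToken(t: any): t is SpaCyToken {
  return typeof t === 'object' && t !== null &&
    typeof t.text === 'string' &&
    typeof t.startByte === 'number' &&
    typeof t.endByte === 'number' &&
    typeof t.pos === 'string' &&
    typeof t.finePos === 'string' &&
    typeof t.lemma === 'string' &&
    typeof t.deprel === 'string' &&
    typeof t.dephead === 'number' &&
    typeof t.sentenceId === 'number' &&
    typeof t.withinSentenceId === 'number';
}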

ONNX Inference

Complete integration with ONNX Runtime:

// Automatic tensor creation with correct shapes and types
const predictions = await controller.predict(
  inputIds,           // int64[batch, seq_len]
  attentionMask,      // int64[batch, seq_len]
  transforms,         // float32[batch, seq_len, seq_len]
  matrix1,            // float32[batch, seq_len, seq_len]
  matrix2,            // float32[batch, seq_len, seq_len]
  wn,                 // int64[batch, seq_len]
  seqLengths,         // int64[batch]
  doEntity,
  doSupersense,
  doEvent
);

The ONNX model outputs final predictions (already CRF-decoded), not logits or emissions. No Viterbi decoding needed in TypeScript.
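
Consuming those decoded IDs is therefore a direct table lookup. A minimal sketch (the tagset array here is illustrative; the real tagset ships with the bundled resources):

// Illustrative tagset; the real one comes from the bundled entity tagset file.
const tagset = ['O', 'B-PER', 'I-PER', 'B-LOC', 'I-LOC'];

// predictions holds one decoded tag ID per token, straight from the model;
// no Viterbi pass is needed on the TypeScript side.
function idsToLabels(predictions: ArrayLike<number>, tags: string[]): string[] {
  return Array.from(predictions, (id) => tags[id]);
}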

Entity Recognition

Hierarchical 3-layer entity detection with proper BIO tag fixing:

// Automatically handles:
// - Invalid BIO sequences (I-PER without B-PER)
// - Entity type classification (PROP_PER → PER)
// - Hierarchical merging (3 LSTM layers)
// - Overlapping entity resolution

const entities = await tagger.tag(tokens, spaCyTokens, true, false, false);
// Returns: EntityAnnotation[] with startToken, endToken, cat, text, prop
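
For intuition, the invalid-sequence repair mentioned in the comments can be sketched like this (an illustration of the general BIO-fixing idea, not the library's exact code):

// Promote I-X to B-X when it does not continue a span of the same type,
// e.g. [O, I-PER, I-PER] becomes [O, B-PER, I-PER].
function fixBIO(tags: string[]): string[] {
  const fixed: string[] = [];
  for (const tag of tags) {
    if (tag.startsWith('I-')) {
      const type = tag.slice(2);
      const prev = fixed.length > 0 ? fixed[fixed.length - 1] : 'O';
      if (prev !== `B-${type}` && prev !== `I-${type}`) {
        fixed.push(`B-${type}`);
        continue;
      }
    }
    fixed.push(tag);
  }
  return fixed;
}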

Supersense Tagging

WordNet-based semantic annotation:

// Uses WordNet first sense mappings
// Categories: noun.person, noun.location, verb.communication, etc.
const supersense = await tagger.tag(tokens, spaCyTokens, false, true, false);
// Returns: [startToken, endToken, category, text][]

Event Detection

Token-level event markers:

const events = await tagger.tag(tokens, spaCyTokens, false, false, true);
// Returns: Set<tokenId> of tokens that are events
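
Mapping those token IDs back to surface text is then a lookup into the token array (a usage sketch built on the Set<tokenId> return shown above):

// events is the Set<tokenId> from the call above; spaCyTokens is the same
// token array that was passed to the tagger.
const eventWords = [...events].map((id) => spaCyTokens[id].text);
console.log('Event triggers:', eventWords.join(', '));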

Architecture

Data Flow

SpaCy Context (input)
    ↓
Token Conversion
    ↓
Entity Tagger
    ├─→ Tokenization (with [CAP] tokens)
    ├─→ Transform Matrix Creation
    ├─→ WordNet Sense Lookup
    ├─→ ONNX Inference (predictions)
    ├─→ Postprocessing (BIO fixing)
    └─→ Entity Extraction
    ↓
BookNLP Result (output)

Configuration

BookNLPConfig

interface BookNLPConfig {
  modelPath?: string;            // Optional: Hugging Face repo ID or URL
                                 // Default: 'Terraa/entities_google_bert_uncased_L-4_H-256_A-4-v1.0-ONNX'
  pipeline: string[];            // ['entity', 'supersense', 'event']
  verbose?: boolean;             // Logging verbosity
  executionProviders?: ExecutionProvider[];  // ['wasm', 'webgl', 'webgpu']
  wasmPaths?: string | Record<string, string>;  // Custom WASM paths
}

Execution Providers

Choose the best backend for your deployment:

  • wasm (default): Universal compatibility, works in all browsers
  • webgl: GPU acceleration via WebGL (faster inference)
  • webgpu: Next-gen GPU acceleration (Chrome/Edge 113+, best performance)

Providers are tried in order, so you can list a preferred backend with a WASM fallback:

const config = {
  pipeline: ['entity'],
  executionProviders: ['webgpu', 'wasm'], // Try WebGPU, fallback to WASM
};

Model Loading Options

Option 1: Use Hugging Face (Automatic Download, Default)

const config = {
  pipeline: ['entity', 'supersense'],
};
// Automatically downloads from Hugging Face and caches in browser

Option 2: Specify Hugging Face Repository

const config = {
  modelPath: 'Terraa/entities_google_bert_uncased_L-4_H-256_A-4-v1.0-ONNX',
  pipeline: ['entity', 'supersense'],
};

Option 3: Use Custom URL

const config = {
  modelPath: 'https://your-cdn.com/model.onnx',
  pipeline: ['entity', 'supersense'],
};

Input Requirements

SpaCy Preprocessing

The TypeScript implementation requires complete linguistic annotations from spaCy. Here's how to generate the required input in Python:

import spacy
import json

nlp = spacy.load("en_core_web_sm")
text = "Harry Potter walked through the castle."
doc = nlp(text)

spacy_context = {
    "tokens": [
        {
            "text": token.text,
            "startByte": token.idx,
            "endByte": token.idx + len(token.text),
            "pos": token.pos_,
            "finePos": token.tag_,
            "lemma": token.lemma_,
            "deprel": token.dep_,
            "dephead": token.head.i,
            "morph": {str(k): str(v) for k, v in token.morph.to_dict().items()},
            "likeNum": token.like_num,
            "isStop": token.is_stop,
            "sentenceId": token.sent.start,
            "withinSentenceId": token.i - token.sent.start
        }
        for token in doc
    ],
    "sentences": [
        {"start": sent.start, "end": sent.end}
        for sent in doc.sents
    ]
}

# Save to JSON for TypeScript
with open('spacy_context.json', 'w') as f:
    json.dump(spacy_context, f)

Then in TypeScript:

import * as fs from 'fs';

const spaCyContext = JSON.parse(
  fs.readFileSync('spacy_context.json', 'utf-8')
);

const result = await booknlp.process(spaCyContext);
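
The fs example above is Node-specific. In a browser, the same JSON would typically be fetched over HTTP instead (a sketch; the URL is a placeholder):

import { SpaCyContext } from 'booknlp-ts';

// Placeholder URL; host spacy_context.json wherever suits your app.
// booknlp is the initialized instance from the Usage section above.
const response = await fetch('/spacy_context.json');
const spaCyContext: SpaCyContext = await response.json();

const result = await booknlp.process(spaCyContext);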

Required Fields

All SpaCyToken fields are required:

  • text: Token text
  • startByte, endByte: Character offsets
  • pos, finePos: POS tags
  • lemma: Lemmatized form
  • deprel, dephead: Dependency parse
  • morph: Morphological features (can be empty object)
  • likeNum, isStop: Token properties
  • sentenceId, withinSentenceId: Sentence information

Output Format

BookNLPResult

interface BookNLPResult {
  tokens: Token[];                    // Annotated tokens
  sents: any[];                       // Sentence info (future)
  nounChunks: any[];                  // Noun chunks (future)
  entities: EntityAnnotation[];       // Detected entities
  supersense: SupersenseAnnotation[]; // Supersense annotations
  timing: Record<string, number>;     // Performance metrics
}
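
As a consumption example, grouping detected entities by category and inspecting the timing map (a usage sketch relying only on the fields above; the exact timing keys are not specified here):

const byCat: Record<string, string[]> = {};
for (const e of result.entities) {
  (byCat[e.cat] ??= []).push(e.text);
}
console.log(byCat);         // e.g. { PER: ['Harry Potter'], ... }
console.log(result.timing); // per-stage durations; exact keys depend on the pipeline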

Entity Format

{
  startToken: 0,
  endToken: 2,
  cat: "PER",          // Entity category
  text: "Harry Potter",
  prop: "PROP"         // PROP, NOM, or PRON
}

Categories: PER, LOC, FAC, GPE, ORG, VEH

Supersense Format

[0, 2, "noun.person", "Harry Potter"]
// [startToken, endToken, category, text]

Categories include:

  • Nouns: noun.person, noun.location, noun.artifact, noun.cognition, etc.
  • Verbs: verb.communication, verb.motion, verb.cognition, etc.

Batch Processing

The TypeScript implementation uses sentence-level batch processing (matching Python behavior):

// Automatically batches sentences for efficient processing
// - Groups tokens into sentence batches (max 500 tokens per batch)
// - Processes up to 32 batches in parallel
// - Reconstructs results with correct token offsets

const result = await booknlp.process(spaCyContext);
// All batching is handled internally
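
Conceptually, the batching amounts to greedily packing whole sentences into token-capped groups, something like this simplified sketch (not the library's actual code):

// Pack whole sentences into batches of at most maxTokens tokens each.
function batchSentences(
  sentences: { start: number; end: number }[],
  maxTokens = 500,
): { start: number; end: number }[][] {
  const batches: { start: number; end: number }[][] = [];
  let current: { start: number; end: number }[] = [];
  let count = 0;
  for (const s of sentences) {
    const len = s.end - s.start;
    if (count + len > maxTokens && current.length > 0) {
      batches.push(current);
      current = [];
      count = 0;
    }
    current.push(s);
    count += len;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}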

Why Batch Processing?

  • Memory efficiency: Processes long documents in manageable chunks
  • Performance: Parallel processing of multiple sentences
  • ONNX compatibility: Matches Python implementation's batching strategy

Dependencies

Runtime Dependencies

{
  "@huggingface/transformers": "^2.6.0",  // BERT tokenization
  "onnxruntime-web": "^1.16.0"            // ONNX inference (browser)
}

Development Dependencies

{
  "@types/node": "^20.0.0",
  "typescript": "^5.0.0",
  "vite": "^5.0.0",                   // Build tool
  "vite-plugin-dts": "^3.0.0",        // TypeScript declarations
  "eslint": "^8.0.0"
}

Build and Development

Development Mode

npm run dev
# Watches for changes and rebuilds automatically

Production Build

npm run build
# Compiles TypeScript and bundles with Vite
# Output: dist/booknlp.js (ES module), dist/booknlp.umd.cjs (UMD)

Using in Your Project

ES Modules (Recommended)

import { BookNLP } from 'booknlp-ts';

UMD (Script tag)

<script src="node_modules/booknlp-ts/dist/booknlp.umd.cjs"></script>
<script>
  const booknlp = new window.BookNLP.BookNLP();
</script>

License

Same as BookNLP (Python version).

References

  • BookNLP Python: https://github.com/dbamman/book-nlp
  • ONNX Runtime: https://onnxruntime.ai/
  • Transformers.js: https://xenova.github.io/transformers.js/
  • BERT: https://arxiv.org/abs/1810.04805
  • CRF: https://en.wikipedia.org/wiki/Conditional_random_field