TokenLoom - The ultimate token stream parser

TokenLoom is a TypeScript library for progressively parsing streamed text (LLM/SSE-like) into structured events. It detects:

  • Custom tags like <think>...</think> (non-nested in v1)
  • Fenced code blocks (``` or ~~~), including language info strings
  • Plain text emitted as tokens/words/graphemes
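
For example, feeding a small document produces a stream of events roughly like the following (a sketch; the exact sequence and payloads depend on configuration and chunking):

import { TokenLoom } from "tokenloom";

const parser = new TokenLoom({ tags: ["think"] });

// Log every event type as it is emitted
parser.on("*", (event) => console.log(event.type));

parser.feed({ text: "Hi <think>plan</think>\n```js\nconsole.log(1);\n```\n" });
await parser.flush();

// Roughly: text, tag-open, text, tag-close, text,
//          code-fence-start, code-fence-chunk, code-fence-end, flush, end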

Why TokenLoom?

The Problem: When working with streaming text from LLMs, SSE endpoints, or real-time data sources, you often need to parse structured content that arrives in arbitrary chunks. Traditional parsers fail because they expect complete, well-formed input. You might receive fragments like:

  • "<thi" + "nk>reasoning</think>" (tag split across chunks)
  • "```java" + "script\nconsole.log('hello');\n```" (code fence fragmented)
  • Incomplete sequences that need buffering without blocking the stream

Existing Solutions Fall Short:

  • DOM parsers require complete markup and fail on fragments
  • Markdown parsers expect full documents and don't handle streaming
  • Regex-based approaches struggle with boundary conditions and backtracking
  • Custom state machines are complex to implement correctly for edge cases

TokenLoom's Solution:

  • Stream-native design that handles arbitrary chunk boundaries gracefully
  • Progressive emission - start processing immediately, don't wait for completion
  • Intelligent buffering with configurable limits to prevent memory issues
  • Robust boundary detection that works even when tags/fences split mid-sequence
  • Plugin architecture for flexible post-processing and output formatting

Perfect for AI applications, real-time chat systems, streaming markdown processors, and any scenario where structured text arrives incrementally.

Design intent:

  • Tolerate arbitrary chunk fragmentation (e.g., <thi + nk> or ``` + javascript\n)
  • Emit start → progressive chunks → end; do not stall waiting for closers
  • Bound buffers with a high-water mark; flush when needed

Key features

  • Streaming-safe detection of custom tags and code fences
  • Incremental emission: does not block waiting for closers; emits start, progressive chunks, then end
  • Configurable segmentation: token, word, or grapheme units with named constants (EmitUnit.Token, EmitUnit.Word, EmitUnit.Grapheme)
  • Controlled emission timing: configurable delays between outputs for smooth streaming
  • Async completion tracking: flush() returns a Promise; the end event signals complete processing
  • Buffer monitoring: buffer-released events track when output buffer becomes empty
  • Non-interfering display: once() method for status updates that wait for buffer to be empty
  • Plugin system: pluggable post-processing via simple event hooks
  • Backpressure-friendly: exposes high-water marks and flushing

Status

  • v1 supports custom tags and fenced code blocks; Markdown headings and nested structures are intentionally out-of-scope for now.

Installation

npm install tokenloom

Browser Usage

TokenLoom includes a browser-compatible build that can be used directly in web browsers:

<script src="node_modules/tokenloom/dist/index.browser.js"></script>
<script>
  // Simple syntax - TokenLoom is available directly
  const parser = new TokenLoom();

  // All exports are also available as properties
  const { EmitUnit, LoggerPlugin } = TokenLoom;

  // Use parser as normal...
</script>

Or with a CDN:

<script src="https://unpkg.com/tokenloom/dist/index.browser.js"></script>

The browser build includes all necessary polyfills and works in modern browsers without additional dependencies.

Development

For development:

npm ci
npm run build

Requirements: Node 18+

Quick start

import { TokenLoom, EmitUnit } from "tokenloom";

const parser = new TokenLoom({
  tags: ["think"], // tags to recognize
  emitUnit: EmitUnit.Word, // emit words instead of tokens
  emitDelay: 50, // 50ms delay between emissions for smooth output
});

// Listen to events directly
parser.on("text", (event) => process.stdout.write(event.text));
parser.on("tag-open", (event) => console.log(`\n[${event.name}]`));
parser.on("end", () => console.log("\n✅ Processing complete!"));

// Non-interfering information display
parser.once("status", () => console.log("📊 Status: Ready"));

const input = `Hello <think>reasoning</think> world!`;

// Simulate streaming chunks
for (const chunk of ["Hello <thi", "nk>reason", "ing</think> world!"]) {
  parser.feed({ text: chunk });
}

// Wait for all processing to complete
await parser.flush();

See examples/ directory for advanced usage including syntax highlighting, async processing, and custom plugins.

API overview

Construction

new TokenLoom(opts?: ParserOptions)
// Named constants for emit units
namespace EmitUnit {
  export const Token = "token";
  export const Word = "word";
  export const Grapheme = "grapheme";
  export const Char = "grapheme"; // Alias for Grapheme
}

type EmitUnit =
  | typeof EmitUnit.Token
  | typeof EmitUnit.Word
  | typeof EmitUnit.Grapheme;

interface ParserOptions {
  emitUnit?: EmitUnit; // default "token"
  bufferLength?: number; // maximum buffered characters before attempting flush (default 2048)
  tags?: string[]; // tags to recognize e.g., ["think", "plan"]
  /**
   * Maximum number of characters to wait (from the start of a special sequence)
   * for it to complete (e.g., '>' for a tag open or a newline after a fence
   * opener). If exceeded, the partial special is treated as plain text and
   * emitted. Defaults to bufferLength when not provided.
   */
  specBufferLength?: number;
  /**
   * Minimum buffered characters to accumulate before attempting to parse a
   * special sequence (tags or fences). This helps avoid boundary issues when
   * very small chunks arrive (e.g., 1–3 chars). Defaults to 10.
   */
  specMinParseLength?: number;
  /**
   * Whether to suppress plugin error logging to console. Defaults to false.
   * Useful for testing or when you want to handle plugin errors silently.
   */
  suppressPluginErrors?: boolean;
  /**
   * Output release delay in milliseconds. Controls the emission rate by adding
   * a delay between outputs when tokens are still available in the output buffer.
   * This helps make emission smoother and more controlled. Defaults to 0 (no delay).
   */
  emitDelay?: number;
}
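
As an illustration, a parser tuned for very small incoming chunks and larger internal buffers might be constructed like this (the option names are those documented above; the values are illustrative only):

import { TokenLoom, EmitUnit } from "tokenloom";

const parser = new TokenLoom({
  tags: ["think", "plan"],    // custom tags to recognize
  emitUnit: EmitUnit.Token,   // emit raw tokens
  bufferLength: 4096,         // allow more buffered characters before flushing
  specBufferLength: 256,      // give up on an unfinished tag/fence opener after 256 chars
  specMinParseLength: 10,     // wait for at least 10 buffered chars before parsing specials
  suppressPluginErrors: true, // handle plugin errors silently instead of logging them
  emitDelay: 0,               // no artificial delay between emissions
});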

Core methods

  • use(plugin: IPlugin): this – registers a plugin
  • remove(plugin: IPlugin): this – removes a plugin
  • feed(chunk: SourceChunk): void – push-mode; feed streamed text
  • flush(): Promise<void> – force flush remaining buffered content and emit flush, resolves when all output is released
  • once(eventType: string, listener: Function): this – add one-time listener that waits for buffer to be empty before executing
  • dispose(): void – cleanup resources and dispose all plugins
  • getSharedContext(): Record<string, any> – access the shared context object used across events
  • [Symbol.asyncIterator](): AsyncIterator<Event> – pull-mode consumption

Event Emitter methods

TokenLoom extends Node.js EventEmitter, so you can listen to events directly:

  • on(event: string, listener: Function): this – listen to specific event types or '*' for all events
  • emit(event: string, ...args: any[]): boolean – emit events (used internally)
  • All other EventEmitter methods are available (once, off, removeAllListeners, etc.)
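
For instance, a listener registered on the '*' wildcard can trace everything the parser emits, and can be detached later with the standard EventEmitter API (a minimal sketch relying only on the documented event.type field):

const trace = (event) => console.log(`[trace] ${event.type}`);

// Listen to every event via the '*' wildcard...
parser.on("*", trace);

// ...and detach the listener later with the standard EventEmitter API
parser.off("*", trace);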

Events

TokenLoom emits the following event types:

  • text - Plain text content
  • tag-open - Custom tag start (e.g., <think>)
  • tag-close - Custom tag end (e.g., </think>)
  • code-fence-start - Code block start (e.g., ```javascript)
  • code-fence-chunk - Code block content
  • code-fence-end - Code block end
  • flush - Parsing complete, buffers flushed
  • end - Emitted after flush when all output processing is complete
  • buffer-released - Emitted whenever the output buffer is completely emptied

Each event includes:

  • context: Shared object for plugin state coordination
  • metadata: Optional plugin-attached data
  • in: Current parsing context (inside tag/fence)
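
As an example, the fence events can be combined to reassemble a streamed code block (a sketch that assumes code-fence-chunk events carry their content in event.text, mirroring the text event; the field name is not confirmed here):

let currentCode = "";

parser.on("code-fence-start", () => {
  currentCode = ""; // a new code block has started
});

parser.on("code-fence-chunk", (event) => {
  currentCode += event.text; // accumulate the block's content as it streams in
});

parser.on("code-fence-end", () => {
  console.log("Collected code block:\n" + currentCode);
});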

Plugins

Plugins use a transformation pipeline with three optional stages:

  • preTransform - Early processing, metadata injection
  • transform - Main content transformation
  • postTransform - Final processing, analytics

parser.use({
  name: "my-plugin",
  transform(event, api) {
    if (event.type === "text") {
      return { ...event, text: event.text.toUpperCase() };
    }
    return event;
  },
});
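
A plugin can also hook the other two stages; here is a minimal sketch, assuming all three stages share the (event, api) signature shown above and that metadata can be extended by returning a modified event:

let textEventCount = 0;

parser.use({
  name: "stats-plugin",
  preTransform(event, api) {
    // Early stage: attach metadata before the main transformation runs
    if (event.type === "text") {
      return { ...event, metadata: { ...event.metadata, seenBy: "stats-plugin" } };
    }
    return event;
  },
  postTransform(event, api) {
    // Final stage: gather analytics after all transformations have run
    if (event.type === "text") textEventCount++;
    return event;
  },
});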

Built-in plugins:

  • LoggerPlugin() - Console logging
  • TextCollectorPlugin() - Text accumulation

See examples/syntax-highlighting-demo.js for advanced plugin usage.

Usage patterns

Streaming text processing

const parser = new TokenLoom({
  tags: ["think"],
  emitUnit: EmitUnit.Word,
  emitDelay: 100, // Smooth output with 100ms delays
});

parser.on("text", (event) => process.stdout.write(event.text));
parser.on("tag-open", (event) => console.log(`[${event.name}]`));
parser.on("buffer-released", () => console.log("📤 Buffer empty"));

// Non-interfering status updates
parser.once("debug-info", () => console.log("🔍 Debug: Processing stream"));

// Simulate streaming chunks
for (const chunk of ["Hello <thi", "nk>thought</th", "ink> world"]) {
  parser.feed({ text: chunk });
}

await parser.flush(); // Wait for completion

AsyncIterator support

for await (const event of parser) {
  console.log(`${event.type}: ${event.text || event.name || ""}`);
  if (event.type === "end") break; // Wait for complete processing
}

Advanced features

Controlled emission timing

const parser = new TokenLoom({
  emitDelay: 200, // 200ms between emissions
  emitUnit: EmitUnit.Grapheme,
});

// Events will be emitted with smooth 200ms delays
parser.feed({ text: "Streaming text..." });
await parser.flush(); // Waits for all delayed emissions

Non-interfering information display

// Display info without interrupting the stream
parser.once("status-update", () => {
  console.log("📊 Processing 50% complete");
});

parser.once("debug-info", () => {
  console.log("🔍 Memory usage: 45MB");
});

// These will execute when buffer is empty, not interfering with output

Buffer monitoring

parser.on("buffer-released", (event) => {
  console.log(`📤 Buffer emptied at ${event.metadata.timestamp}`);
  // Triggered every time output buffer becomes completely empty
});

parser.on("end", () => {
  console.log("🏁 All processing complete");
  // Triggered after flush() when everything is done
});

Examples

You can run the examples after building the project:

# Build first
npm run build

# Basic parsing with plugins and direct event listening
node examples/basic-parsing.js

# Streaming simulation with random chunking and event tracing
node examples/streaming-simulation.js

# Syntax highlighting demo with transformation pipeline
node examples/streaming-syntax-coloring/index.js

# Pipeline phases demonstration
node examples/pipeline-phases-demo.js

# Async processing demo
node examples/async-processing.js

# Custom plugin example
node examples/custom-plugin.js

Development

Scripts

npm ci                  # install
npm run build           # build with rollup
npm run dev             # watch build
npm test                # run tests (vitest)
npm run test:run        # run tests once
npm run test:coverage   # coverage report

Architecture & Design

TokenLoom uses a handler-based architecture that switches between specialized parsers:

  • TextHandler - Plain text and special sequence detection
  • TagHandler - Custom tag content processing
  • FenceHandler - Code fence content processing

Key Features

  • Stream-safe: Handles arbitrary chunk fragmentation (<thi + nk>)
  • Progressive: Emits events immediately, doesn't wait for completion
  • Bounded buffers: Configurable limits prevent memory issues
  • Enhanced segmentation: Comment operators (//, /*, */) as single units
  • No nesting: Tags and fences are non-nested in v1

Roadmap

  • Optional nested tag/block support
  • Markdown structures (headings, lists, etc.)
  • More robust Unicode segmentation and locale controls
  • Additional built-in plugins (terminal colorizer, markdown renderer)
  • Performance optimizations for very large streams

License

MIT