llm-text-parser

v1.0.1

Published

8 months ago

A streaming text parser designed for processing LLM (Large Language Model) output with HTML-like tags and structured content.

zfh521

LLM Text Parser

Features

Streaming Processing: Parse text content incrementally as it arrives
Tag Recognition: Automatically detects and parses HTML-like tags (<tag>content</tag>)
Syntax Tree Generation: Builds a structured syntax tree from parsed tokens
State Management: Maintains parsing state across multiple text chunks
TypeScript Support: Fully typed for a better development experience

Installation

pnpm add llm-text-parser

Usage

Basic Usage

import { StreamingParser } from 'llm-text-parser';

const parser = new StreamingParser();

// Append text chunks
parser.appendText("Hello <call>");
parser.appendText("{\"name\": \"function\"}</call> world");

// Get the syntax tree
const syntaxTree = parser.getSyntaxTree();
console.log(JSON.stringify(syntaxTree, null, 2));

Example Output

For input:

Hello <call>{"name": "function"}</call> world

The parser generates:

[
  {
    "type": "text",
    "data": "Hello ",
    "start": { "row": 0, "column": 0 },
    "end": { "row": 0, "column": 6 }
  },
  {
    "type": "tag",
    "open": {
      "type": "tag_open",
      "data": "<call>",
      "start": { "row": 0, "column": 6 },
      "end": { "row": 0, "column": 12 }
    },
    "content": [
      {
        "type": "tag_content",
        "data": "{\"name\": \"function\"}",
        "start": { "row": 0, "column": 12 },
        "end": { "row": 0, "column": 31 }
      }
    ],
    "close": {
      "type": "tag_close",
      "data": "</call>",
      "start": { "row": 0, "column": 31 },
      "end": { "row": 0, "column": 38 }
    }
  },
  {
    "type": "text",
    "data": " world",
    "start": { "row": 0, "column": 38 },
    "end": { "row": 0, "column": 44 }
  }
]
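
To consume a tree shaped like the one above, a small walker can pull out each tag's name and payload. The sketch below is only an illustration: the `Position`, `Token`, `TagNode`, and `TreeNode` types are inferred from the example output rather than taken from the package's exported typings, nested tags are not modeled, and `extractTagPayloads` is a hypothetical helper, not part of the library.

```typescript
// Shapes inferred from the example output; the package's real typings may differ.
interface Position { row: number; column: number; }

interface Token {
  type: "text" | "tag_open" | "tag_content" | "tag_close" | "incomplete_tag";
  data: string;
  start: Position;
  end: Position;
}

interface TagNode {
  type: "tag";
  open: Token;
  content: Token[];
  close?: Token; // assumed optional: a tag may still be open mid-stream
}

type TreeNode = Token | TagNode;

// Hypothetical helper: collect the name and raw content of every top-level tag.
function extractTagPayloads(tree: TreeNode[]): Array<{ name: string; body: string }> {
  const payloads: Array<{ name: string; body: string }> = [];
  for (const node of tree) {
    if (node.type !== "tag") continue; // skip plain text nodes
    const name = node.open.data.replace(/^<|>$/g, ""); // "<call>" -> "call"
    const body = node.content
      .filter((child) => child.type === "tag_content")
      .map((child) => child.data)
      .join("");
    payloads.push({ name, body });
  }
  return payloads;
}

// A trimmed copy of the example tree (positions copied from the output above).
const tree: TreeNode[] = [
  { type: "text", data: "Hello ", start: { row: 0, column: 0 }, end: { row: 0, column: 6 } },
  {
    type: "tag",
    open: { type: "tag_open", data: "<call>", start: { row: 0, column: 6 }, end: { row: 0, column: 12 } },
    content: [
      { type: "tag_content", data: "{\"name\": \"function\"}", start: { row: 0, column: 12 }, end: { row: 0, column: 31 } }
    ],
    close: { type: "tag_close", data: "</call>", start: { row: 0, column: 31 }, end: { row: 0, column: 38 } }
  }
];

const payloads = extractTagPayloads(tree);
console.log(payloads);
```

Because the tag body here is JSON, a caller could pass `body` to `JSON.parse` once the closing tag has arrived.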

API Reference

StreamingParser

Methods

appendText(text: string): Append text content to the parser
end(): Signal the end of input and finalize parsing
getSyntaxTree(): Get the current syntax tree
getStack(): Get the current token stack
getTextByRange(start: Position, end: Position): Extract text by position range
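
The methods above suggest a simple call order for streamed input: append each chunk, signal the end, then read the tree. The following sketch shows that order with a hypothetical `feedChunks` helper. `ParserLike` is a structural interface written for this example (its return types are assumptions), and `RecordingParser` is a stand-in that only records calls and does no parsing.

```typescript
// Structural view of three methods from the list above; return types are assumptions.
interface ParserLike {
  appendText(text: string): void;
  end(): void;
  getSyntaxTree(): unknown;
}

// Hypothetical helper: feed every chunk, finalize, and return the resulting tree.
function feedChunks(parser: ParserLike, chunks: readonly string[]): unknown {
  for (const chunk of chunks) {
    parser.appendText(chunk);
  }
  parser.end(); // signal the end of input before reading the final tree
  return parser.getSyntaxTree();
}

// Stand-in used only to show the call order; it performs no real parsing.
class RecordingParser implements ParserLike {
  calls: string[] = [];
  appendText(text: string): void { this.calls.push("appendText:" + text); }
  end(): void { this.calls.push("end"); }
  getSyntaxTree(): unknown { this.calls.push("getSyntaxTree"); return []; }
}

const recorder = new RecordingParser();
feedChunks(recorder, ["Hello <ca", "ll>{}</call>"]);
console.log(recorder.calls);
```

If the real `StreamingParser` matches this shape, an instance of it could be passed to `feedChunks` in place of the stand-in.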

Properties

syntaxTree: The current syntax tree structure
stack: The token stack being processed
textBuf: Current text buffer content
state: Current parsing state

Token Types

text: Plain text content
tag_open: Opening tag (e.g., <call>)
tag_content: Content between tags
tag_close: Closing tag (e.g., </call>)
incomplete_tag: Partially parsed tags
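
One way to handle these token types in calling code is a union type with an exhaustive `switch`. This is a sketch under assumptions: the type names come from the list above, but the `Token` shape is reduced to `type` and `data`, and `describeToken` is a hypothetical helper rather than a package export.

```typescript
// Token type names are taken from the list above; the Token shape is an assumption.
type TokenType = "text" | "tag_open" | "tag_content" | "tag_close" | "incomplete_tag";

interface Token {
  type: TokenType;
  data: string;
}

// Hypothetical helper: describe a token. The `never` assignment makes the
// compiler flag this switch if a new token type is added to the union.
function describeToken(token: Token): string {
  switch (token.type) {
    case "text":
      return "text: " + JSON.stringify(token.data);
    case "tag_open":
      return "open " + token.data;
    case "tag_content":
      return "content: " + JSON.stringify(token.data);
    case "tag_close":
      return "close " + token.data;
    case "incomplete_tag":
      return "incomplete: " + token.data;
    default: {
      const unreachable: never = token.type;
      return unreachable;
    }
  }
}

const described = describeToken({ type: "tag_open", data: "<call>" });
console.log(described);
```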

Development

Build

pnpm run build

Watch Mode

pnpm run watch

Test

# Run the example parser
npx ts-node src/parser.ts

Architecture

The parser uses a state machine approach with the following states:

text: Processing plain text content
tag_open: Processing an opening tag
read_tag_content: Reading content between tags
tag_close: Processing a closing tag
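
To make the four states concrete, here is a deliberately simplified scanner that moves between them one character at a time. It is not the package's implementation: `MiniTagScanner` and `SimpleToken` are hypothetical names, positions are not tracked, every `<` is treated as the start of a tag, and tag names are not matched or nested.

```typescript
// State names come from the list above; everything else is an illustrative assumption.
type State = "text" | "tag_open" | "read_tag_content" | "tag_close";

interface SimpleToken {
  type: "text" | "tag_open" | "tag_content" | "tag_close";
  data: string;
}

class MiniTagScanner {
  private state: State = "text";
  private buf = "";
  readonly tokens: SimpleToken[] = [];

  // State persists between calls, so a tag may be split across chunks.
  appendText(chunk: string): void {
    for (let i = 0; i < chunk.length; i++) {
      this.step(chunk.charAt(i));
    }
  }

  // Flush leftovers; in this sketch a partial tag falls back to plain text.
  end(): void {
    this.emit(this.state === "read_tag_content" ? "tag_content" : "text");
  }

  private emit(type: SimpleToken["type"]): void {
    if (this.buf.length > 0) {
      this.tokens.push({ type, data: this.buf });
    }
    this.buf = "";
  }

  private step(ch: string): void {
    switch (this.state) {
      case "text":
        if (ch === "<") { this.emit("text"); this.buf = ch; this.state = "tag_open"; }
        else { this.buf += ch; }
        break;
      case "tag_open":
        this.buf += ch;
        if (ch === ">") { this.emit("tag_open"); this.state = "read_tag_content"; }
        break;
      case "read_tag_content":
        if (ch === "<") { this.emit("tag_content"); this.buf = ch; this.state = "tag_close"; }
        else { this.buf += ch; }
        break;
      case "tag_close":
        this.buf += ch;
        if (ch === ">") { this.emit("tag_close"); this.state = "text"; }
        break;
    }
  }
}

const scanner = new MiniTagScanner();
scanner.appendText("Hello <ca"); // the opening tag is split across two chunks
scanner.appendText("ll>{\"name\": \"function\"}</call> world");
scanner.end();
console.log(scanner.tokens.map((t) => t.type + ":" + t.data));
```

Keeping the current state and buffer on the instance is what lets the scanner resume in the middle of a tag when the next chunk arrives.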

License

ISC License