llm-text-parser
v1.0.1
LLM Text Parser
A streaming text parser designed for processing LLM (Large Language Model) output with HTML-like tags and structured content.
Features
- Streaming Processing: Parse text content incrementally as it arrives
- Tag Recognition: Automatically detects and parses HTML-like tags (<tag>content</tag>)
- Syntax Tree Generation: Builds a structured syntax tree from parsed tokens
- State Management: Maintains parsing state across multiple text chunks
- TypeScript Support: Fully typed for a better development experience
Installation
pnpm install
Usage
Basic Usage
import { StreamingParser } from 'llm-text-parser';
const parser = new StreamingParser();
// Append text chunks
parser.appendText("Hello <call>");
parser.appendText("{\"name\": \"function\"}</call> world");
// Get the syntax tree
const syntaxTree = parser.getSyntaxTree();
console.log(JSON.stringify(syntaxTree, null, 2));
Example Output
For input:
Hello <call>{"name": "function"}</call> world
The parser generates:
[
  {
    "type": "text",
    "data": "Hello ",
    "start": { "row": 0, "column": 0 },
    "end": { "row": 0, "column": 6 }
  },
  {
    "type": "tag",
    "open": {
      "type": "tag_open",
      "data": "<call>",
      "start": { "row": 0, "column": 6 },
      "end": { "row": 0, "column": 12 }
    },
    "content": [
      {
        "type": "tag_content",
        "data": "{\"name\": \"function\"}",
        "start": { "row": 0, "column": 12 },
        "end": { "row": 0, "column": 31 }
      }
    ],
    "close": {
      "type": "tag_close",
      "data": "</call>",
      "start": { "row": 0, "column": 31 },
      "end": { "row": 0, "column": 38 }
    }
  },
  {
    "type": "text",
    "data": " world",
    "start": { "row": 0, "column": 38 },
    "end": { "row": 0, "column": 44 }
  }
]
API Reference
StreamingParser
Methods
- appendText(text: string): Append text content to the parser
- end(): Signal the end of input and finalize parsing
- getSyntaxTree(): Get the current syntax tree
- getStack(): Get the current token stack
- getTextByRange(start: Position, end: Position): Extract text by position range
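To illustrate what extracting text by a position range can look like, here is a minimal, self-contained sketch. It does not use the library. `Position`, `toOffset`, and `sliceByRange` are hypothetical names, and the 0-based rows and columns with an exclusive end are assumptions read off the example output, not documented behavior of getTextByRange.

```typescript
// Hypothetical shape: the README names Position but does not define it.
interface Position {
  row: number; // assumed 0-based line index
  column: number; // assumed 0-based offset within the line
}

// Convert a row/column pair into an absolute offset into `text`.
function toOffset(text: string, pos: Position): number {
  let offset = 0;
  for (let row = 0; row < pos.row; row++) {
    const newline = text.indexOf("\n", offset);
    if (newline === -1) {
      return text.length; // requested row is past the last line
    }
    offset = newline + 1;
  }
  return Math.min(offset + pos.column, text.length);
}

// Return the text between two positions, treating `end` as exclusive (an assumption).
function sliceByRange(text: string, start: Position, end: Position): string {
  return text.slice(toOffset(text, start), toOffset(text, end));
}

const rangeInput = 'Hello <call>{"name": "function"}</call> world';
console.log(sliceByRange(rangeInput, { row: 0, column: 6 }, { row: 0, column: 12 }));
// prints: <call>
```

The real method may clamp or validate positions differently; the sketch only shows the row/column to offset conversion.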
Properties
- syntaxTree: The current syntax tree structure
- stack: The token stack being processed
- textBuf: Current text buffer content
- state: Current parsing state
Token Types
- text: Plain text content
- tag_open: Opening tag (e.g., <call>)
- tag_content: Content between tags
- tag_close: Closing tag (e.g., </call>)
- incomplete_tag: Partially parsed tags
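The token types above, together with the "tag" node seen in the example output, can be modeled as a discriminated union. The following is a sketch under assumptions: the type names (`Token`, `TagNode`, `SyntaxNode`) and the helper `tagContents` are hypothetical and inferred from the example output, not taken from the package's published typings.

```typescript
interface Position {
  row: number;
  column: number;
}

type TokenType = "text" | "tag_open" | "tag_content" | "tag_close" | "incomplete_tag";

interface Token {
  type: TokenType;
  data: string;
  start: Position;
  end: Position;
}

// A "tag" node groups the open, content, and close tokens, as in the example output.
interface TagNode {
  type: "tag";
  open: Token;
  content: Token[];
  close: Token;
}

type SyntaxNode = Token | TagNode;

// Collect the raw content of every top-level tag with the given name.
function tagContents(tree: SyntaxNode[], name: string): string[] {
  const results: string[] = [];
  for (const node of tree) {
    // Checking `type` narrows the union to TagNode.
    if (node.type === "tag" && node.open.data === `<${name}>`) {
      results.push(node.content.map((t) => t.data).join(""));
    }
  }
  return results;
}

// Hand-written tree mirroring the example output (all positions on row 0).
const at = (column: number): Position => ({ row: 0, column });
const sampleTree: SyntaxNode[] = [
  { type: "text", data: "Hello ", start: at(0), end: at(6) },
  {
    type: "tag",
    open: { type: "tag_open", data: "<call>", start: at(6), end: at(12) },
    content: [
      { type: "tag_content", data: '{"name": "function"}', start: at(12), end: at(31) },
    ],
    close: { type: "tag_close", data: "</call>", start: at(31), end: at(38) },
  },
  { type: "text", data: " world", start: at(38), end: at(44) },
];

for (const raw of tagContents(sampleTree, "call")) {
  console.log(JSON.parse(raw).name); // prints: function
}
```

With a real parser instance, the array returned by getSyntaxTree() would take the place of `sampleTree`, provided its shape matches the example output.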
Development
Build
pnpm run build
Watch Mode
pnpm run watch
Test
# Run the example parser
npx ts-node src/parser.ts
Architecture
The parser uses a state machine approach with the following states:
- text: Processing plain text content
- tag_open: Processing an opening tag
- read_tag_content: Reading content between tags
- tag_close: Processing a closing tag
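As a rough illustration of how these four states can carry parsing across chunk boundaries, here is a toy scanner. It is not the library's implementation: `ToyTagScanner` and `ToyToken` are hypothetical names, nesting and positions are ignored, and any `<` inside tag content is assumed to start the closing tag.

```typescript
type State = "text" | "tag_open" | "read_tag_content" | "tag_close";

interface ToyToken {
  type: "text" | "tag_open" | "tag_content" | "tag_close";
  data: string;
}

class ToyTagScanner {
  private state: State = "text";
  private buf = "";
  readonly tokens: ToyToken[] = [];

  // Feed one chunk; state and buffer persist between calls.
  append(chunk: string): void {
    for (let i = 0; i < chunk.length; i++) {
      this.step(chunk.charAt(i));
    }
  }

  // Flush trailing plain text once the input is complete.
  end(): void {
    if (this.state === "text") {
      this.flush("text");
    }
  }

  private step(ch: string): void {
    switch (this.state) {
      case "text":
        if (ch === "<") {
          this.flush("text");
          this.buf = ch;
          this.state = "tag_open";
        } else {
          this.buf += ch;
        }
        break;
      case "tag_open":
        this.buf += ch;
        if (ch === ">") {
          this.flush("tag_open");
          this.state = "read_tag_content";
        }
        break;
      case "read_tag_content":
        if (ch === "<") {
          this.flush("tag_content");
          this.buf = ch;
          this.state = "tag_close";
        } else {
          this.buf += ch;
        }
        break;
      case "tag_close":
        this.buf += ch;
        if (ch === ">") {
          this.flush("tag_close");
          this.state = "text";
        }
        break;
    }
  }

  // Emit the buffered characters as one token (skipping empty buffers).
  private flush(type: ToyToken["type"]): void {
    if (this.buf !== "") {
      this.tokens.push({ type, data: this.buf });
    }
    this.buf = "";
  }
}

const scanner = new ToyTagScanner();
scanner.append("Hello <ca"); // chunk boundary falls inside the opening tag
scanner.append('ll>{"name": "function"}</call> world');
scanner.end();
console.log(scanner.tokens.map((t) => t.type).join(", "));
// prints: text, tag_open, tag_content, tag_close, text
```

Because the state and buffer live on the instance, a tag split across two chunks is completed when the second chunk arrives, which is the property the streaming design relies on.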
License
ISC License
