@mdocui/core

v0.7.2

Published

a month ago

Streaming Markdoc parser, component registry, and system prompt generator for LLM generative UI

Downloads

245

0High
0Medium
0Low

pnut

markdoc llm generative-ui streaming parser

@mdocui/core

Framework-agnostic streaming parser, component registry, and system prompt generator for LLM generative UI. Parses Markdoc {% %} tag syntax from streamed LLM output into a typed AST that any renderer can consume.

Part of the mdocui monorepo.

Install

npm install @mdocui/core

Overview

@mdocui/core provides three things:

Streaming parser -- incrementally tokenizes and parses Markdoc tags from chunked text (e.g. an LLM response stream).
Component registry -- defines available components with Zod schemas so the parser can validate props and the prompt generator can describe them to the model.
Prompt generator -- turns a registry into a system prompt section that teaches an LLM how to emit mdocUI markup.

How the parser separates prose from components

The LLM writes a single stream containing both standard markdown and {% %} component tags. The parser's job is to split that stream into two node types:

Prose nodes -- everything outside {% %} delimiters. This is standard markdown (headings, bold, lists, code blocks, etc.) and is passed through as-is for a markdown renderer to handle.
Component nodes -- everything inside {% %} delimiters. The parser extracts the tag name, parses attributes, and validates props against the registry's Zod schemas.

The {% sequence never appears in normal prose or fenced code blocks, so the tokenizer can reliably detect tag boundaries character-by-character during streaming -- no lookahead or backtracking required.

API Reference

`Tokenizer`

Low-level character-by-character tokenizer that splits raw text into Token objects. Handles {% tag %} boundaries, string quoting, and escape sequences. Most users should use StreamingParser instead.

import { Tokenizer, TokenType } from '@mdocui/core'

const tokenizer = new Tokenizer()

const tokens = tokenizer.write('Hello {% button action="go" label="Click" /%}')
// tokens[0] => { type: 'PROSE', raw: 'Hello ' }
// tokens[1] => { type: 'TAG_SELF_CLOSE', raw: '{% button ... /%}', name: 'button', attrs: 'action="go" label="Click"' }

// Flush any remaining buffer when the stream ends
const remaining = tokenizer.flush()

// Reset for reuse
tokenizer.reset()

Token types: PROSE, TAG_OPEN, TAG_SELF_CLOSE, TAG_CLOSE

Tokenizer states: IN_PROSE, IN_TAG, IN_STRING

`StreamingParser`

Incremental parser that converts a stream of text chunks into an ASTNode[] tree. Tags are matched by name, nested correctly, and force-closed on flush if unclosed.

import { StreamingParser } from '@mdocui/core'

const parser = new StreamingParser({
  knownTags: new Set(['card', 'button']),
  dropUnknown: true, // default -- silently drops unknown tags
})

// Feed chunks as they arrive from the LLM
let newNodes = parser.write('Here is a card:\n{% card title="Hello" %}')
newNodes = parser.write('Card body content')
newNodes = parser.write('{% /card %}')

// Finalize -- force-closes any unclosed tags
const finalNodes = parser.flush()

// Access the full AST
const allNodes = parser.getNodes() // ASTNode[]

// Inspect errors and status
const meta = parser.getMeta() // ParseMeta

`ParserOptions`

| Option | Type | Default | Description | |--------|------|---------|-------------| | knownTags | Set<string> | new Set() (allow all) | Tags the parser accepts. Empty set allows everything. | | dropUnknown | boolean | true | When true, unknown tags are silently dropped. When false, they are emitted as prose. |

`ComponentRegistry`

Typed store of component definitions. Used to generate the knownTags set for the parser and the system prompt for the LLM.

import { ComponentRegistry, defineComponent } from '@mdocui/core'
import { z } from 'zod'

const registry = new ComponentRegistry({ coerce: true }) // LLM-friendly mode

registry.register(
  defineComponent({
    name: 'alert',
    description: 'Displays a colored alert box',
    props: z.object({
      severity: z.enum(['info', 'warning', 'error']).describe('Alert severity'),
      title: z.string().optional().describe('Optional heading'),
    }),
    children: 'any',   // 'none' | 'any' | string[]
  })
)

// Batch register
registry.registerAll([alertDef, cardDef])

// Query
registry.has('alert')       // true
registry.get('alert')       // ComponentDefinition | undefined
registry.names()            // ['alert', ...]
registry.all()              // ComponentDefinition[]
registry.knownTags()        // Set<string>

// Validate props against the Zod schema
const result = registry.validate('alert', { severity: 'info' })
// { valid: true, errors: [], props: { severity: 'info' } }

`RegistryOptions`

| Option | Type | Default | Description | |--------|------|---------|-------------| | coerce | boolean | false | LLM-friendly mode: when validation fails, fall back to raw props instead of rejecting. Built-in definitions use z.coerce.* and case-insensitive enums, so most LLM output validates. The fallback is a safety net for edge cases. |

`defineComponent`

Identity helper that returns a ComponentDefinition unchanged. Provides type inference when defining components outside a registry.

import { defineComponent } from '@mdocui/core'
import { z } from 'zod'

export const myComponent = defineComponent({
  name: 'my-component',
  description: 'Does something useful',
  props: z.object({
    value: z.number().describe('A numeric value'),
  }),
  children: 'none',
  streaming: { value: true }, // mark props that can stream partial values
})

Always add `.describe()` to every prop

generatePrompt() sources prop descriptions directly from Zod's .describe(). Without it, the generated prompt shows a blank description and the LLM will guess — producing hallucinated field names, wrong types, and malformed values at runtime.

Always describe:

Array props — type alone (string[]) tells the LLM nothing about element shape
Enum props — add context for when each variant should be used
Any prop whose purpose isn't obvious from its name

// ❌ LLM sees:  labels — string[]:
labels: z.array(z.string()),

// ✓ LLM sees:  labels — string[]: Category labels for the X-axis, e.g. ["Q1","Q2","Q3"]
labels: z.array(z.string()).describe('Category labels for the X-axis, e.g. ["Q1","Q2","Q3"]'),

The built-in definitions in packages/core/src/definitions.ts follow this convention throughout — use them as a reference.

`generatePrompt`

Generates a complete system prompt from a registry. The prompt merges two layers:

Library layer (auto-generated from the registry):

TAG SYNTAX reference — self-closing and body tag forms
COMPONENTS list — all registered components with prop types and allowed children
COMPOSITION rules — nesting examples showing card > grid > stat, form > inputs, etc.
STREAMING GUIDELINE — prose-before-components rendering advice

App layer (your options):

preamble — domain context ("You are an e-commerce assistant...")
groups — organize components under headings with domain-specific notes
additionalRules — domain rules ("Use stat inside grid for KPI dashboards")
examples — domain examples showing expected output format

The final prompt structure:

[preamble]              ← your app context
## TAG SYNTAX            ← auto from library
## COMPONENTS            ← auto from registry + your groups
## COMPOSITION           ← auto from library
## STREAMING GUIDELINE   ← auto from library
## RULES                 ← your additionalRules
## EXAMPLE               ← your examples

You never write syntax docs or component signatures — generatePrompt() handles that from the registry. You only provide domain context and usage guidance.

import { generatePrompt, ComponentRegistry } from '@mdocui/core'

const registry = new ComponentRegistry()
// ... register components ...

const prompt = generatePrompt(registry, {
  preamble: 'You are a helpful assistant.',
  additionalRules: [
    'Always use a card for structured answers.',
    'Never nest more than 3 levels deep.',
  ],
  examples: [
    '{% card title="Weather" %}\nSunny, 72F\n{% /card %}',
  ],
  groups: [
    {
      name: 'Layout',
      components: ['stack', 'grid', 'card'],
      notes: ['Use stack for vertical/horizontal layouts'],
    },
  ],
})

`PromptOptions`

| Option | Type | Description | |--------|------|-------------| | preamble | string | Text prepended before the syntax section | | additionalRules | string[] | Extra rules appended as a bullet list | | examples | string[] | Example markup blocks appended at the end | | groups | ComponentGroup[] | Groups components under named headings with optional notes | | verbosity | 'minimal' \| 'default' \| 'detailed' | Prompt detail level. minimal outputs only component signatures (~90% fewer tokens). Default includes syntax docs, composition examples, and streaming guidelines |

`parseAttributes`

Parses the attribute string inside a {% tag ... %} into a key-value record. Handles quoted strings (with escape sequences), arrays via JSON [...], bare booleans, numbers, and null. Prototype pollution keys (__proto__, constructor, prototype) are silently skipped.

import { parseAttributes } from '@mdocui/core'

parseAttributes('action="go" label="Click me" count=42 disabled')
// { action: 'go', label: 'Click me', count: 42, disabled: true }

parseAttributes('options=["a","b","c"] required=true')
// { options: ['a', 'b', 'c'], required: true }

Types

`ASTNode`

type ASTNode = ProseNode | ComponentNode

`ProseNode`

interface ProseNode {
  type: 'prose'
  content: string
}

`ComponentNode`

interface ComponentNode {
  type: 'component'
  name: string
  props: Record<string, unknown>
  children: ASTNode[]
  selfClosing: boolean
}

`ComponentDefinition`

interface ComponentDefinition {
  name: string
  description: string
  props: z.ZodObject<z.ZodRawShape>
  children?: 'none' | 'any' | string[]
  streaming?: Record<string, boolean>
}

`ActionEvent`

Fired by interactive components in the renderer layer.

interface ActionEvent {
  type: 'button_click' | 'form_submit' | 'select_change' | 'link_click'
  action: string
  label?: string
  formName?: string
  formState?: Record<string, unknown>
  tagName: string
  params?: Record<string, unknown>
}

`ParseMeta`

Returned by parser.getMeta().

interface ParseMeta {
  errors: ParseError[]
  nodeCount: number
  isComplete: boolean
  pendingTag?: string    // tag name being buffered (e.g. "chart")
  bufferLength?: number  // bytes buffered for the pending tag
}

`ParseError`

interface ParseError {
  code: 'unknown_tag' | 'validation' | 'malformed' | 'unclosed'
  tagName: string
  message: string
  raw?: string
}

`ValidationResult`

Returned by registry.validate().

interface ValidationResult {
  valid: boolean
  errors: string[]
  props?: Record<string, unknown>
}

License

See the root mdocui repository for license details.

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

@mdocui/core

Install

Overview

How the parser separates prose from components

API Reference

Tokenizer

StreamingParser

ParserOptions

ComponentRegistry

RegistryOptions

defineComponent

Always add .describe() to every prop

generatePrompt

PromptOptions

parseAttributes

Types

ASTNode

ProseNode

ComponentNode

ComponentDefinition

ActionEvent

ParseMeta

ParseError

ValidationResult

License

`Tokenizer`

`StreamingParser`

`ParserOptions`

`ComponentRegistry`

`RegistryOptions`

`defineComponent`

Always add `.describe()` to every prop

`generatePrompt`

`PromptOptions`

`parseAttributes`

`ASTNode`

`ProseNode`

`ComponentNode`

`ComponentDefinition`

`ActionEvent`

`ParseMeta`

`ParseError`

`ValidationResult`