yume-dsl-rich-text
v1.0.7
Published
Single-pass recursive rich-text DSL parser without regex, with pluggable tag handlers. Markdown alternative.
yume-dsl-rich-text(ユメテキスト)
Important: Upgrade to 1.0.7 or later if you use tags that support both inline and block/raw forms. Versions before 1.0.7 had a serious parsing bug where inline $$tag(...)$$ could incorrectly consume the following newline and change rendered output.
▶ Live Demo — DSL Fallback Museum
Shiki code-highlighting plugin · legitimate plugins · intentional malformed markup · error reporting
Zero-dependency, single-pass, pluggable-semantics rich-text DSL parser. Turns text into a token tree — tag semantics, rendering, and UI integration are all yours to define.
- No regex backtracking — deterministic linear scan
- Inline / Raw / Block — three tag forms, one parser
- Fully configurable syntax tokens and tag-name rules
Core parsing API is stable. Some utility and ambient-state APIs are transitional — see Deprecated API. Breaking changes, if any, will land in major versions with explicit migration notes.
Ecosystem
| Package | Role |
|------------------------------------------------------------------------------------|--------------------------------------------------|
| yume-dsl-rich-text | Parser core — text to token tree (this package) |
| yume-dsl-token-walker | Interpreter — token tree to output nodes |
| yume-dsl-shiki-highlight | Syntax highlighting — tokens or TextMate grammar |
Recommended combinations:
- Parse DSL into tokens only → yume-dsl-rich-text
- Interpret token trees into arbitrary output nodes → add yume-dsl-token-walker
- Source-level highlighting or TextMate grammar → add yume-dsl-shiki-highlight
Table of Contents
- Design Philosophy
- Install
- Quick Start
- DSL Syntax
- API
- Custom Syntax
- Custom Tag Name Characters
- Handler Helpers
- ParseOptions
- Token Structure
- Writing Tag Handlers
- Utility Exports
- Source Position Tracking
- Error Handling
- Graceful Degradation
- Vue 3 Rendering
- Deprecated API
- Changelog
- License
Design Philosophy
- No built-in tags. Every tag's meaning is defined by the handler you register.
- Handlers are the semantic layer. A handler receives parsed tokens and returns a TokenDraft: output shape, extra fields, and behavior are all yours.
- Rendering is not our job. The parser produces a token tree; how you render it (React, Vue, plain HTML, terminal) is entirely up to you.
- Graceful degradation. Unknown or unsupported tags never throw — they degrade silently.
- Everything is configurable. Syntax tokens, tag-name rules, nesting depth — override what you need, keep defaults for the rest.
Install
npm install yume-dsl-rich-text
pnpm add yume-dsl-rich-text
yarn add yume-dsl-rich-text
Quick Start
1. Create a parser and register your tags
import {
createParser,
createSimpleInlineHandlers,
createSimpleBlockHandlers,
createSimpleRawHandlers,
declareMultilineTags,
} from "yume-dsl-rich-text";
const dsl = createParser({
handlers: {
...createSimpleInlineHandlers(["bold", "italic", "underline", "strike"]),
...createSimpleBlockHandlers(["info", "warning"]),
...createSimpleRawHandlers(["code"]),
},
blockTags: declareMultilineTags(["info", "warning", "code"]),
});
2. Parse
const tokens = dsl.parse("Hello $$bold(world)$$!");
Result:
[
{type: "text", value: "Hello ", id: "rt-0"},
{
type: "bold",
value: [{type: "text", value: "world", id: "rt-1"}],
id: "rt-2",
},
{type: "text", value: "!", id: "rt-3"},
]
3. Strip to plain text
const plain = dsl.strip("Hello $$bold(world)$$!");
// "Hello world!"
Useful for extracting searchable plain text, generating previews, or building accessibility labels.
Unregistered tags degrade gracefully instead of throwing or crashing.
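To make the idea behind stripping concrete, here is a minimal sketch of flattening a token tree back to plain text. It is not the library's implementation of `dsl.strip`; the `Token` type and the `toPlainText` function are hypothetical names that only mirror the token shape shown in the parse result above.

```typescript
// Hypothetical local type mirroring the token shape shown in the Quick Start result.
interface Token {
  type: string;
  value: string | Token[];
  id?: string;
}

// Concatenate every string leaf, depth first.
function toPlainText(tokens: Token[]): string {
  return tokens
    .map((t) => (typeof t.value === "string" ? t.value : toPlainText(t.value)))
    .join("");
}

const sample: Token[] = [
  { type: "text", value: "Hello ", id: "rt-0" },
  { type: "bold", value: [{ type: "text", value: "world", id: "rt-1" }], id: "rt-2" },
  { type: "text", value: "!", id: "rt-3" },
];

const plain = toPlainText(sample);
```

The same walk can feed a search index or an accessibility label, since it ignores tag types and keeps only text leaves.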
Recommended reading order
First-time users:
- Quick Start (you are here)
- DSL Syntax — the three tag forms
- createParser — the main entry point
- Handler Helpers — bulk-register tags without boilerplate
- Writing Tag Handlers — custom handler logic
- parseStructural — for structural consumers (highlighting, linting, editors, source inspection)
DSL Syntax
By default, the DSL uses $$ as the tag prefix. All syntax tokens (prefix, delimiters, escape character, block/raw
markers) are fully configurable — see Custom Syntax to adapt the DSL to your host markup.
Tag names allow a-z, A-Z, 0-9, _, - (first character must not be a digit or -).
See Custom Tag Name Characters to override these rules.
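The default naming rule above can be sketched as two character predicates plus a whole-name check. This is an illustration under the assumption that only the listed ASCII characters are accepted; `isValidTagName` is a hypothetical helper, not a package export.

```typescript
// Sketch of the default rules described above (assumed: ASCII only).
const isTagStartChar = (ch: string): boolean => /^[A-Za-z_]$/.test(ch);
const isTagChar = (ch: string): boolean => /^[A-Za-z0-9_-]$/.test(ch);

// Hypothetical helper: check a whole candidate name against both rules.
function isValidTagName(candidate: string): boolean {
  if (candidate.length === 0 || !isTagStartChar(candidate.charAt(0))) {
    return false;
  }
  for (let i = 1; i < candidate.length; i++) {
    if (!isTagChar(candidate.charAt(i))) return false;
  }
  return true;
}

const acceptsMyTag = isValidTagName("my-tag");
const acceptsDigitStart = isValidTagName("1tag");
```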
Three forms are supported:
Inline
$$tagName(content)$$
Inline content is parsed recursively, so nesting works naturally.
$$bold(Hello $$italic(world)$$)$$
Raw
$$tagName(arg)%
raw content preserved as-is
%end$$
Raw content is not recursively parsed.
The close marker %end$$ must be on its own line.
Block
$$tagName(arg)*
block content parsed recursively
*end$$
Block content is parsed recursively.
The close marker *end$$ must be on its own line.
Pipe Parameters
Inside arguments, | separates parameters.
$$link(https://example.com | click here)$$
$$code(js | Title | label)%
const x = 1;
%end$$
Use \| to escape a literal pipe.
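As a rough illustration of pipe splitting with escaped pipes, the sketch below splits an argument string on unescaped | characters. It is not the package's parser: `splitPipeParams` is a hypothetical name, and trimming each part is an assumption made for this example.

```typescript
// Hypothetical helper: split on unescaped "|" and trim each part.
function splitPipeParams(arg: string): string[] {
  const parts: string[] = [];
  let current = "";
  for (let i = 0; i < arg.length; i++) {
    const ch = arg.charAt(i);
    if (ch === "\\" && arg.charAt(i + 1) === "|") {
      current += "|"; // "\|" is a literal pipe, not a separator
      i++; // skip the pipe that was just consumed
    } else if (ch === "|") {
      parts.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  parts.push(current.trim());
  return parts;
}

const linkParams = splitPipeParams("https://example.com | click here");
const escapedParams = splitPipeParams("a \\| b | c");
```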
Escape Sequences
Prefix syntax tokens with \ to produce them literally.
- \( → (
- \) → )
- \| → |
- \\ → \
- \%end$$ → %end$$
- \*end$$ → *end$$
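The escape table above can be read as "escape character followed by a known token yields that token". The following sketch shows one way to apply it with the default tokens. `unescapeText` and `ESCAPABLE` are hypothetical names, and leaving a backslash untouched when it precedes anything else is an assumption of this example, not documented behavior.

```typescript
// Tokens that may follow the escape character. Longer tokens come first,
// which matters once a custom syntax makes one token a prefix of another.
const ESCAPABLE = ["%end$$", "*end$$", "(", ")", "|", "\\"];

// Hypothetical helper: replace each escape sequence with its literal token.
function unescapeText(src: string): string {
  let out = "";
  let i = 0;
  while (i < src.length) {
    if (src.charAt(i) === "\\") {
      const hit = ESCAPABLE.find((tok) => src.startsWith(tok, i + 1));
      if (hit !== undefined) {
        out += hit;
        i += 1 + hit.length; // skip the backslash and the token
        continue;
      }
    }
    out += src.charAt(i);
    i++;
  }
  return out;
}

const literalParen = unescapeText("f\\(x\\)");
const literalRawClose = unescapeText("\\%end$$");
```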
API
createParser(defaults) — recommended entry point
createParser binds your ParseOptions (handlers, syntax, tagName, depthLimit, onError, trackPositions) into a
reusable instance.
This is the recommended way to use the parser — define your tag handlers once, then call dsl.parse() /
dsl.strip() everywhere without repeating config.
import {
createParser,
createSimpleInlineHandlers,
parsePipeArgs,
} from "yume-dsl-rich-text";
const dsl = createParser({
handlers: {
...createSimpleInlineHandlers(["bold", "italic", "underline"]),
link: {
inline: (tokens, ctx) => {
const args = parsePipeArgs(tokens, ctx);
return {
type: "link",
url: args.text(0),
value: args.materializedTailTokens(1),
};
},
},
},
});
// Use everywhere — handlers are already bound
dsl.parse("Hello $$bold(world)$$!");
dsl.strip("Hello $$bold(world)$$!");
// Per-call overrides are shallow-merged onto defaults
dsl.parse(text, {onError: (e) => console.warn(e)});
What createParser binds:
| Option | What it does when pre-bound |
|------------------|--------------------------------------------------------------------------|
| handlers | Your tag definitions — no need to pass them on every call |
| allowForms | Restrict accepted tag forms (default: all forms enabled) |
| syntax | Custom syntax tokens (if you override $$ prefix, etc.) |
| tagName | Custom tag-name character rules |
| depthLimit | Nesting limit — rarely changes per call |
| createId | Custom token id generator (can be overridden per call) |
| blockTags | Tags that receive block-level line-break normalization |
| onError | Default error handler (can still be overridden per call) |
| trackPositions | Attach source positions to all output nodes (can be overridden per call) |
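To illustrate what "shallow-merged onto defaults" implies, here is a small sketch that binds defaults once and merges per-call overrides with an object spread. `Options`, `bindOptions`, and `resolveOptions` are hypothetical stand-ins, and the example assumes the merge is a plain top-level spread as the wording suggests.

```typescript
// Hypothetical stand-in; the real ParseOptions has more fields.
interface Options {
  handlers?: Record<string, unknown>;
  depthLimit?: number;
  onError?: (e: unknown) => void;
}

// Bind defaults once; each call shallow-merges its overrides on top.
function bindOptions(defaults: Options) {
  return (overrides: Options = {}): Options => ({ ...defaults, ...overrides });
}

const resolveOptions = bindOptions({
  handlers: { bold: {}, italic: {} },
  depthLimit: 50,
});

const unchanged = resolveOptions();
const shallower = resolveOptions({ depthLimit: 10 });
// A shallow merge replaces whole top-level values: passing `handlers`
// per call swaps the entire map instead of merging into the bound one.
const replaced = resolveOptions({ handlers: { link: {} } });
```

Under that assumption, a per-call `handlers` override should contain every tag needed for that call, not only the additional ones.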
Without createParser you must pass the full options object on every call:
// Repetitive — must pass handlers everywhere
parseRichText(text1, {handlers});
parseRichText(text2, {handlers});
stripRichText(text3, {handlers});
parseStructural(text4, {handlers});
// With createParser — bind once, use everywhere
const dsl = createParser({handlers});
dsl.parse(text1);
dsl.parse(text2);
dsl.strip(text3);
dsl.structural(text4);
interface Parser {
parse: (text: string, overrides?: ParseOptions) => TextToken[];
strip: (text: string, overrides?: ParseOptions) => string;
structural: (text: string, overrides?: StructuralParseOptions) => StructuralNode[];
}
structural shares handlers, allowForms, syntax, tagName, depthLimit, and trackPositions
from defaults. Semantic-only options (blockTags, onError, createId) are
excluded because StructuralParseOptions does not declare them.
parseRichText / stripRichText
Low-level stateless functions. Useful for one-off calls or when you need full control per invocation.
function parseRichText(text: string, options?: ParseOptions): TextToken[];
function stripRichText(text: string, options?: ParseOptions): string;
ParseOptions includes handlers, allowForms, syntax, tagName, depthLimit, createId, blockTags,
onError, and trackPositions. See ParseOptions for full details.
Application code should generally use createParser; reach for the bare functions only in one-off utility scripts
or when you need full per-call control.
parseStructural — structural parse
parseStructural is for structural consumers — highlighting, linting, editors, source inspection, or any
scenario where you need to know which tag form was used, not just the semantic result. It preserves the tag form
(inline / raw / block) explicitly in the output tree.
It shares the same language configuration (handlers, allowForms, syntax, tagName, depthLimit,
trackPositions) as parseRichText, so you don't maintain two separate sets of DSL rules.
import {parseStructural} from "yume-dsl-rich-text";
const tree = parseStructural("$$bold(hello)$$ and $$code(ts)%\nconst x = 1;\n%end$$");
// [
// { type: "inline", tag: "bold", children: [{ type: "text", value: "hello" }] },
// { type: "text", value: " and " },
// { type: "raw", tag: "code",
// args: [{ type: "text", value: "ts" }],
// content: "\nconst x = 1;\n" },
// ]
function parseStructural(text: string, options?: StructuralParseOptions): StructuralNode[]
StructuralParseOptions extends ParserBaseOptions, the same base shared by ParseOptions:
interface ParserBaseOptions {
handlers?: Record<string, TagHandler>;
allowForms?: readonly TagForm[];
depthLimit?: number;
syntax?: Partial<SyntaxInput>;
tagName?: Partial<TagNameConfig>;
baseOffset?: number;
tracker?: PositionTracker;
}
interface ParseOptions extends ParserBaseOptions {
createId?: (token: TokenDraft) => string;
blockTags?: readonly BlockTagInput[];
mode?: "render"; // deprecated
onError?: (error: ParseError) => void; // semantic-only
trackPositions?: boolean; // shared with StructuralParseOptions
}
interface StructuralParseOptions extends ParserBaseOptions {
trackPositions?: boolean;
}
| Param | Type | Description |
|--------------------------|------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|
| text | string | DSL source |
| options.handlers | Record<string, TagHandler> | Tag recognition & form gating (same rules as parseRichText). Omit to accept all syntactically valid tags/forms without semantic gating. |
| options.allowForms | readonly TagForm[] | Restrict accepted forms (requires handlers) |
| options.depthLimit | number | Max nesting depth (default 50) |
| options.syntax | Partial<SyntaxInput> | Override syntax tokens |
| options.tagName | Partial<TagNameConfig> | Override tag-name character rules |
| options.trackPositions | boolean | Attach position to every node (default false) |
When handlers is provided, tag recognition and form gating are identical to parseRichText — the same
supportsInlineForm decision table and filterHandlersByForms logic are used (shared code, not mirrored).
Handler functions themselves are never called; only the presence of inline / raw / block methods matters.
When handlers is omitted, all syntactically valid tags in all forms are accepted.
Ambient capture: when called without syntax / tagName overrides, parseStructural captures the
current getSyntax() / getTagNameConfig() values once at entry and threads them explicitly through the
parse. This makes it composable inside withSyntax / withTagNameConfig wrappers:
withSyntax(customSyntax, () => {
parseStructural(text); // captures customSyntax at entry
parseStructural(text2); // also captures customSyntax at entry
});
StructuralNode variants:
| Type | Fields | Description |
|-------------|----------------------------------|-------------------------------|
| text | value: string | Plain text |
| escape | raw: string | Escape sequence (e.g. \)) |
| separator | — | Pipe \| divider (args only) |
| inline | tag, children | $$tag(…)$$ |
| raw | tag, args, content: string | $$tag(…)% … %end$$ |
| block | tag, args, children | $$tag(…)* … *end$$ |
All variants carry an optional position?: SourceSpan when trackPositions is enabled.
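For structural consumers, the variants in the table map naturally onto a discriminated union. The sketch below reconstructs such a union from the table (with `position` omitted) and walks a tree to list which tag used which form. `SNode` and `listTagForms` are hypothetical names; the package's own `StructuralNode` type may differ in detail.

```typescript
// Hypothetical types reconstructed from the table above; `position` omitted.
type SNode =
  | { type: "text"; value: string }
  | { type: "escape"; raw: string }
  | { type: "separator" }
  | { type: "inline"; tag: string; children: SNode[] }
  | { type: "raw"; tag: string; args: SNode[]; content: string }
  | { type: "block"; tag: string; args: SNode[]; children: SNode[] };

// Collect "tag:form" labels in document order.
function listTagForms(nodes: SNode[], out: string[] = []): string[] {
  for (const node of nodes) {
    switch (node.type) {
      case "inline":
        out.push(`${node.tag}:inline`);
        listTagForms(node.children, out);
        break;
      case "raw":
        out.push(`${node.tag}:raw`);
        listTagForms(node.args, out);
        break;
      case "block":
        out.push(`${node.tag}:block`);
        listTagForms(node.args, out);
        listTagForms(node.children, out);
        break;
      default:
        break; // text, escape, and separator carry no tag
    }
  }
  return out;
}

// Same shape as the parseStructural example output shown earlier.
const tree: SNode[] = [
  { type: "inline", tag: "bold", children: [{ type: "text", value: "hello" }] },
  { type: "text", value: " and " },
  { type: "raw", tag: "code", args: [{ type: "text", value: "ts" }], content: "\nconst x = 1;\n" },
];

const tagForms = listTagForms(tree);
```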
Differences from parseRichText (features, not bugs):
| | parseRichText | parseStructural |
|--------------------------|-------------------------------------------------------------|------------------------------------------------------------------|
| Tag recognition | Same (shared ParserBaseOptions) | Same (shared ParserBaseOptions) |
| Form gating | Same | Same |
| Line-break normalization | Always strips (render mode) | Always preserves |
| Pipe \| | Part of text | separator node in args; text elsewhere |
| Error reporting | onError callback | Silent degradation |
| Escape handling | Unescaped at root level | Structural escape nodes |
| Position tracking | trackPositions on TextToken.position (normalized spans) | trackPositions on StructuralNode.position (raw syntax spans) |
| Output type | TextToken[] | StructuralNode[] |
Which one do I use? If your goal is rendering content, use parseRichText.
If your goal is analyzing source structure, use parseStructural.
Custom Syntax
Every syntax token — prefix, open/close delimiters, pipe divider, escape character, and block/raw markers — can be
overridden through options.syntax. This lets you adapt the DSL to any host markup without conflicts.
import {createEasySyntax, parseRichText} from "yume-dsl-rich-text";
const syntax = createEasySyntax({tagPrefix: "@@"});
// endTag, rawClose, blockClose are derived automatically: ")@@", "%end@@", "*end@@"
const tokens = parseRichText("@@bold(hello)@@", {
syntax,
handlers: {
bold: {
inline: (tokens, _ctx) => ({type: "bold", value: tokens}),
},
},
});
Default Syntax
The default tokens and where they appear:
Inline: $$tag(content)$$
- tagOpen: ( opens the content
- endTag: )$$ closes the inline tag
Nested: $$tag(fn(x) text)$$
- tagOpen ( and tagClose ) are counted, so depth tracking keeps inner parens balanced
With arg: $$tag(arg | content)$$
- tagDivider: | separates parameters
Raw:
$$tag(arg)%
raw content (no parsing)
%end$$
- rawOpen: )%
- rawClose: %end$$
Block:
$$tag(arg)*
block content (recursive parsing)
*end$$
- blockOpen: )*
- blockClose: *end$$
Escape: \) \\ \|
- escapeChar: \
import {DEFAULT_SYNTAX} from "yume-dsl-rich-text";
// DEFAULT_SYNTAX.tagPrefix === "$$" // tag start marker
// DEFAULT_SYNTAX.tagOpen === "(" // opens the tag argument/content
// DEFAULT_SYNTAX.tagClose === ")" // paired with tagOpen for nested-paren depth matching in args
// DEFAULT_SYNTAX.tagDivider === "|" // separates params inside (…)
// DEFAULT_SYNTAX.endTag === ")$$" // closes an inline tag
// DEFAULT_SYNTAX.rawOpen === ")%" // switches from args to raw content
// DEFAULT_SYNTAX.blockOpen === ")*" // switches from args to block content
// DEFAULT_SYNTAX.rawClose === "%end$$" // closes a raw tag (must be on its own line)
// DEFAULT_SYNTAX.blockClose === "*end$$" // closes a block tag (must be on its own line)
// DEFAULT_SYNTAX.escapeChar === "\\" // escapes the next syntax token literally
Warning: Syntax tokens must remain distinguishable from one another. If two tokens are configured to the same string, behavior is undefined.
Token dependency — createSyntax does a plain shallow merge; no auto-derivation.
The parser has hard couplings between certain tokens — break them and tags stop working:
| Token | Constraint | Why |
|--------------|---------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------|
| tagClose | endTag, rawOpen, blockOpen must start with it | getTagCloserType matches these three from the position where findTagArgClose stopped — that position points to tagClose |
| tagOpen | Must pair with tagClose | findTagArgClose counts tagOpen/tagClose for nested depth matching |
| endTag | Must start with tagClose | See tagClose above |
| rawOpen | Must start with tagClose | See tagClose above |
| blockOpen | Must start with tagClose | See tagClose above |
| tagPrefix | — | Independent |
| rawClose | — | Independent (whole-line match) |
| blockClose | — | Independent (whole-line match) |
| tagDivider | — | Independent |
| escapeChar | — | Independent |
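The constraints in this table, together with the warning about indistinguishable tokens, are easy to check before handing a hand-built syntax to the parser. The sketch below is one possible pre-flight check. `SyntaxTokens` and `checkSyntax` are hypothetical names, and the package is not known to export such a validator.

```typescript
// Hypothetical shape holding only the tokens named in the table above.
interface SyntaxTokens {
  tagPrefix: string;
  tagOpen: string;
  tagClose: string;
  tagDivider: string;
  endTag: string;
  rawOpen: string;
  blockOpen: string;
  rawClose: string;
  blockClose: string;
  escapeChar: string;
}

// Return human-readable problems; an empty array means the checks passed.
function checkSyntax(s: SyntaxTokens): string[] {
  const problems: string[] = [];
  for (const key of ["endTag", "rawOpen", "blockOpen"] as const) {
    if (!s[key].startsWith(s.tagClose)) {
      problems.push(`${key} must start with tagClose`);
    }
  }
  const values = Object.keys(s).map((k) => s[k as keyof SyntaxTokens]);
  if (new Set(values).size !== values.length) {
    problems.push("two tokens share the same string");
  }
  return problems;
}

// Default token values as listed for DEFAULT_SYNTAX.
const defaultTokens: SyntaxTokens = {
  tagPrefix: "$$", tagOpen: "(", tagClose: ")", tagDivider: "|",
  endTag: ")$$", rawOpen: ")%", blockOpen: ")*",
  rawClose: "%end$$", blockClose: "*end$$", escapeChar: "\\",
};

const defaultProblems = checkSyntax(defaultTokens);
// Changing tagClose alone breaks all three compound tokens.
const brokenProblems = checkSyntax({ ...defaultTokens, tagClose: "]" });
```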
createEasySyntax (recommended)
function createEasySyntax(overrides?: Partial<SyntaxInput>): SyntaxConfig
Change the base tokens, and the compound tokens stay in sync automatically.
Accepts any subset of SyntaxInput — base tokens drive derivation, explicit compound overrides take precedence.
| Base tokens (you set) | Compound tokens (auto-derived) |
|----------------------------------------------------------------|------------------------------------------------------------|
| tagPrefix, tagOpen, tagClose, tagDivider, escapeChar | endTag, rawOpen, blockOpen, rawClose, blockClose |
Derivation rules:
endTag = tagClose + tagPrefix, e.g. ")" + "$$" → ")$$"
rawOpen = tagClose + "%", e.g. ")" + "%" → ")%"
blockOpen = tagClose + "*", e.g. ")" + "*" → ")*"
rawClose = "%" + "end" + tagPrefix, e.g. "%end" + "$$" → "%end$$"
blockClose = "*" + "end" + tagPrefix, e.g. "*end" + "$$" → "*end$$"
import {createEasySyntax} from "yume-dsl-rich-text";
// Change prefix — compounds follow
createEasySyntax({tagPrefix: "@@"});
// endTag → ")@@" rawClose → "%end@@" blockClose → "*end@@"
// Change prefix + closer — compounds adapt to both
createEasySyntax({tagPrefix: "@@", tagClose: "]"});
// endTag → "]@@" rawOpen → "]%" blockOpen → "]*"
When your opening/closing protocol is irregular (e.g. rawOpen: "<raw>", or raw and block use different close keywords),
derivation can't help; use createSyntax instead.
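The five derivation rules listed in this section can be written out directly. The sketch below is a standalone restatement of those rules, not the source of `createEasySyntax`. `deriveCompound`, `BaseTokens`, and `CompoundTokens` are hypothetical names.

```typescript
// Only the base tokens that take part in derivation.
interface BaseTokens {
  tagPrefix: string;
  tagClose: string;
}

interface CompoundTokens {
  endTag: string;
  rawOpen: string;
  blockOpen: string;
  rawClose: string;
  blockClose: string;
}

// Apply the derivation rules from the list in this section.
function deriveCompound(base: BaseTokens): CompoundTokens {
  return {
    endTag: base.tagClose + base.tagPrefix,
    rawOpen: base.tagClose + "%",
    blockOpen: base.tagClose + "*",
    rawClose: "%end" + base.tagPrefix,
    blockClose: "*end" + base.tagPrefix,
  };
}

const bracketSyntax = deriveCompound({ tagPrefix: "@@", tagClose: "]" });
```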
createSyntax (low-level)
Plain shallow merge onto DEFAULT_SYNTAX — no derivation. Use this only when you need full manual control over every
token.
import {createSyntax} from "yume-dsl-rich-text";
const syntax = createSyntax({tagPrefix: "@@", endTag: ")@@"});
// You must update endTag, rawClose, blockClose yourself; there is no auto-derivation
interface SyntaxConfig extends SyntaxInput {
escapableTokens: string[]; // precomputed, sorted by length (descending)
}
Note: Internally, parser state is passed explicitly through the parse pipeline. parseRichText preserves module-local ambient wrapping (withSyntax / withCreateId) for backward compatibility in handler utility calls. This is safe for normal synchronous calls, but if you share one module instance across concurrent async request flows, isolate parser work carefully, or pass DslContext explicitly to eliminate the ambient dependency.
Custom Tag Name Characters
function createTagNameConfig(overrides?: Partial<TagNameConfig>): TagNameConfig
Controls which characters the parser accepts in tag names. Provide only the functions you want to change; the rest
fall back to DEFAULT_TAG_NAME.
| Function | Default | Role | Example match |
|------------------|-------------------------------|----------------------|-----------------------|
| isTagStartChar | a-z, A-Z, _ | First character | $$bold( — b |
| isTagChar | a-z, A-Z, 0-9, _, - | Remaining characters | $$my-tag( — y-tag |
By default, $$ui:button(...)$$ would fail because : is not in isTagChar. To allow it:
import {createParser, createTagNameConfig} from "yume-dsl-rich-text";
const dsl = createParser({
handlers: {
"ui:button": {inline: (value, _ctx) => ({type: "ui:button", value})},
},
// Only override isTagChar — isTagStartChar keeps the default.
// Keep the normal tag characters, and additionally allow ":" after the first character.
tagName: createTagNameConfig({
isTagChar: (char) => /[A-Za-z0-9_-]/.test(char) || char === ":",
}),
});
dsl.parse("$$ui:button(hello)$$"); // works
You can also pass a plain partial object directly to tagName; createTagNameConfig is optional:
parseRichText("$$1tag(hello)$$", {
handlers: {"1tag": {inline: (v, _ctx) => ({type: "1tag", value: v})}},
tagName: {
isTagStartChar: (char) => /[A-Za-z0-9_]/.test(char), // allow digit start
isTagChar: (char) => /[A-Za-z0-9_-]/.test(char) || char === ":", // keep normal chars, also allow ":"
},
});
Handler Helpers
Handler helpers let you register tags in bulk without writing repetitive handler objects.
createPipeHandlers(definitions)
The recommended handler helper for tags that need pipe parameters, multiple forms, or any custom logic beyond simple
wrapping. Supports any combination of inline, raw, and block per tag in a single definition object.
Each handler receives pre-parsed PipeArgs — no manual parsePipeArgs / parsePipeTextArgs boilerplate needed.
raw and block handlers also receive the original rawArg string for cases where you need the unparsed value.
import {createParser, createPipeHandlers, createSimpleInlineHandlers} from "yume-dsl-rich-text";
const dsl = createParser({
handlers: {
// Simple tags — use createSimpleInlineHandlers
...createSimpleInlineHandlers(["bold", "italic", "underline"]),
// Tags with pipe parameters or multiple forms — use createPipeHandlers
...createPipeHandlers({
link: {
inline: (args) => ({
type: "link",
url: args.text(0),
value: args.materializedTailTokens(1),
}),
},
info: {
inline: (args) => ({
type: "info",
title: args.text(0, "Info"),
value: args.materializedTailTokens(1),
}),
block: (args, content, _ctx, rawArg) => ({
type: "info",
title: rawArg || "Info",
value: content,
}),
},
code: {
raw: (args, content) => ({
type: "raw-code",
lang: args.text(0, "text"),
title: args.text(1, "Code:"),
value: content,
}),
},
}),
},
});
When to use which helper:
| Scenario | Use |
|-------------------------------------------|------------------------------|
| Simple inline (bold, italic, etc.) | createSimpleInlineHandlers |
| Simple block (info, warning, etc.) | createSimpleBlockHandlers |
| Simple raw (code, math, etc.) | createSimpleRawHandlers |
| Pipe parameters ($$link(url \| text)$$) | createPipeHandlers |
| Multiple forms (inline + block + raw) | createPipeHandlers |
| Raw/block tags with structured args | createPipeHandlers |
createSimpleInlineHandlers(names) / createSimpleBlockHandlers(names) / createSimpleRawHandlers(names)
Bulk-register tags that don't need pipe parameters or custom logic. Each form produces a minimal token:
| Helper | Token shape |
|------------------------------|---------------------------------------------------|
| createSimpleInlineHandlers | { type: tagName, value: materializedTokens } |
| createSimpleBlockHandlers | { type: tagName, arg, value: content } |
| createSimpleRawHandlers | { type: tagName, arg, value: content } (string) |
import {
createParser,
createSimpleInlineHandlers,
createSimpleBlockHandlers,
createSimpleRawHandlers,
} from "yume-dsl-rich-text";
const dsl = createParser({
handlers: {
...createSimpleInlineHandlers(["bold", "italic", "underline", "strike", "code"]),
...createSimpleBlockHandlers(["info", "warning"]),
...createSimpleRawHandlers(["math"]),
},
});
function createSimpleInlineHandlers(names: readonly string[]): Record<string, TagHandler>;
function createSimpleBlockHandlers(names: readonly string[]): Record<string, TagHandler>;
function createSimpleRawHandlers(names: readonly string[]): Record<string, TagHandler>;
createPipeBlockHandlers(names) / createPipeRawHandlers(names)
Deprecated. See Deprecated API.
declareMultilineTags(names)
Declares which already-registered tags are multiline types. Returns a BlockTagInput[] to pass as
ParseOptions.blockTags.
This does not register tags or create handlers — it only tells the parser which tags need line-break normalization (
stripping the leading \n after )* / )% openers and the trailing \n before *end$$ / %end$$ closers).
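As a small illustration of the line-break normalization described above, the sketch below drops one leading and one trailing newline from a block or raw body. `trimBlockEdges` is a hypothetical helper. Handling only "\n" (not "\r\n") is an assumption of this example.

```typescript
// Hypothetical helper: drop one leading and one trailing "\n", if present.
function trimBlockEdges(content: string): string {
  let start = 0;
  let end = content.length;
  if (content.startsWith("\n")) start = 1;
  // Guard against consuming the same newline twice when content is "\n".
  if (end > start && content.endsWith("\n")) end -= 1;
  return content.slice(start, end);
}

// Raw body as it appears between ")%" and "%end$$" in the structural example.
const normalizedBody = trimBlockEdges("\nconst x = 1;\n");
```

Only the outermost newlines are removed, so blank lines that the author placed inside the body survive.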
Each entry is either a plain tag name (normalization for both raw and block forms — backward compatible) or an
object with a forms array to restrict normalization to specific multiline forms.
import {createParser, createSimpleInlineHandlers, declareMultilineTags} from "yume-dsl-rich-text";
// Basic usage — all multiline forms normalized (backward compatible)
const dsl = createParser({
handlers: {
...createSimpleInlineHandlers(["bold", "italic"]),
info: { /* custom handler registered separately */},
warning: { /* custom handler registered separately */},
},
blockTags: declareMultilineTags(["info", "warning"]),
});
// Granular — restrict normalization to specific forms
const dsl2 = createParser({
handlers: { /* ... */},
blockTags: declareMultilineTags([
"info", // both raw & block normalized
{tag: "code", forms: ["raw"]}, // only raw form normalized
{tag: "note", forms: ["block"]}, // only block form normalized
]),
});
Note: If you omit blockTags, the parser auto-derives it from handlers that have raw or block methods. Use declareMultilineTags when you need explicit control over which tags receive line-break normalization.
type MultilineForm = "raw" | "block";
type BlockTagInput = string | { tag: string; forms?: readonly MultilineForm[] };
function declareMultilineTags(names: readonly BlockTagInput[]): BlockTagInput[];
createPassthroughTags(names)
Deprecated. See Deprecated API.
ParseOptions
Both ParseOptions and StructuralParseOptions extend ParserBaseOptions:
interface ParserBaseOptions {
handlers?: Record<string, TagHandler>;
allowForms?: readonly ("inline" | "raw" | "block")[];
depthLimit?: number;
syntax?: Partial<SyntaxInput>;
tagName?: Partial<TagNameConfig>;
baseOffset?: number;
tracker?: PositionTracker;
}
interface ParseOptions extends ParserBaseOptions {
createId?: (token: TokenDraft) => string;
blockTags?: readonly BlockTagInput[];
mode?: "render"; // deprecated
onError?: (error: ParseError) => void;
trackPositions?: boolean;
}
interface StructuralParseOptions extends ParserBaseOptions {
trackPositions?: boolean;
}
Fields: shared (ParserBaseOptions)
- handlers: tag name → handler definition
- allowForms: restrict which tag forms are parsed (default: all forms enabled)
- depthLimit: maximum nesting depth, default 50
- syntax: override default syntax tokens
- tagName: override tag-name character rules
- baseOffset: base offset for position tracking when parsing substrings (default 0). See Parsing substrings with baseOffset and tracker
- tracker: pre-built PositionTracker from the original full document for correct line / column. See Parsing substrings with baseOffset and tracker
Fields: ParseOptions only
- createId: override token id generation for this parse
- blockTags: tags treated as block-level for line-break normalization; accepts plain strings or { tag, forms } objects for per-form control
- mode: deprecated, see Deprecated API
- onError: callback for parse errors
- trackPositions: attach source position info (position) to every TextToken (default false). See Source Position Tracking
allowForms
Controls which tag forms the parser will accept. Forms not listed are treated as if the handler does not support them — the parser degrades gracefully.
In practice, disabled forms are left as plain text. This applies globally, including unregistered tags. If "inline"
is disabled, $$unknown(...)$$ is preserved literally instead of being unwrapped.
// Only allow inline tags — block and raw syntax is ignored
const dsl = createParser({
handlers,
allowForms: ["inline"],
});
// Allow inline and block, but not raw
const dsl2 = createParser({
handlers,
allowForms: ["inline", "block"],
});
This is useful for user-generated content (comments, chat messages) where you want to allow simple inline formatting but prevent multi-line block or raw tags.
When omitted, all forms are enabled.
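One way to picture form gating is as a filter that hides handler methods for disallowed forms, so those tags look as if they never supported the form. The sketch below is an illustration only: `restrictForms` and `HandlerLike` are hypothetical names, and the package's internal logic may differ, in particular for unregistered tags.

```typescript
type TagForm = "inline" | "raw" | "block";

// Hypothetical handler shape: only the presence of each method matters here.
type HandlerLike = Partial<Record<TagForm, (...args: unknown[]) => unknown>>;

// Keep only the methods whose form is allowed. Tags stay registered even
// when no method survives, so they are not mistaken for unknown tags.
function restrictForms(
  handlers: Record<string, HandlerLike>,
  allowForms: readonly TagForm[],
): Record<string, HandlerLike> {
  const out: Record<string, HandlerLike> = {};
  for (const tag of Object.keys(handlers)) {
    const handler = handlers[tag];
    if (!handler) continue;
    const kept: HandlerLike = {};
    for (const form of allowForms) {
      const fn = handler[form];
      if (fn) kept[form] = fn;
    }
    out[tag] = kept;
  }
  return out;
}

const noop = () => ({});
const demoHandlers: Record<string, HandlerLike> = {
  bold: { inline: noop },
  info: { inline: noop, block: noop },
  code: { raw: noop },
};

const inlineOnly = restrictForms(demoHandlers, ["inline"]);
```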
Token Structure
interface TextToken {
type: string;
value: string | TextToken[];
id: string;
position?: SourceSpan;
[key: string]: unknown;
}
TextToken is the parser's output type. The type and value fields are intentionally loose (string) so the parser
can represent any tag without knowing your schema.
The optional position field is present when trackPositions is enabled. It records the
source span (offset, line, column) of the original text that produced this token.
Extra fields returned by handlers (e.g. url, lang, title) are preserved on the resulting TextToken and
accessible as unknown. You can read them directly without a cast — just narrow the type before use:
const token = tokens[0];
if (token.type === "link" && typeof token.url === "string") {
console.log(token.url); // works, no cast needed
}
Handlers return TokenDraft, which shares the same open structure:
interface TokenDraft {
type: string;
value: string | TextToken[];
[key: string]: unknown;
}
Strong Typing
For simple use cases, you can access extra fields directly via typeof narrowing — no cast needed.
For full type safety across your entire token schema, define typed interfaces that extend TextToken and cast once at
the call site:
import {parseRichText, type TextToken} from "yume-dsl-rich-text";
// 1. Define your token types — extend TextToken for compatibility
interface PlainText extends TextToken {
type: "text";
value: string;
}
interface BoldToken extends TextToken {
type: "bold";
value: MyToken[];
}
interface LinkToken extends TextToken {
type: "link";
url: string;
value: MyToken[];
}
interface CodeBlockToken extends TextToken {
type: "code-block";
lang: string;
value: string;
}
type MyToken = PlainText | BoldToken | LinkToken | CodeBlockToken;
// 2. Cast once at the call site
const tokens = parseRichText(input, options) as MyToken[];
// 3. Narrow with discriminated unions
function render(token: MyToken): string {
switch (token.type) {
case "text":
return token.value; // string
case "bold":
return `<b>${token.value.map(render).join("")}</b>`;
case "link":
return `<a href="${token.url}">${token.value.map(render).join("")}</a>`;
case "code-block":
return `<pre data-lang="${token.lang}">${token.value}</pre>`;
}
}
The cast is safe as long as your handlers return drafts that match the union.
If you add or remove tags, update the union accordingly — TypeScript will flag any unhandled type in exhaustive
switches.
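To make the compiler enforce exhaustiveness explicitly, a common TypeScript pattern is a `never`-typed helper in the `default` branch. The sketch below uses a reduced, hypothetical `DemoToken` union instead of the full union above; `assertNever` and `renderDemo` are illustrative names, not package exports.

```typescript
// Hypothetical reduced union; extend it to match your own handlers.
type DemoToken =
  | { type: "text"; value: string }
  | { type: "bold"; value: DemoToken[] };

// Calling this in `default` makes the compiler reject unhandled variants,
// because a still-possible variant is not assignable to `never`.
function assertNever(x: never): never {
  throw new Error(`Unhandled token: ${JSON.stringify(x)}`);
}

function renderDemo(token: DemoToken): string {
  switch (token.type) {
    case "text":
      return token.value;
    case "bold":
      return `<b>${token.value.map(renderDemo).join("")}</b>`;
    default:
      return assertNever(token);
  }
}

const demoHtml = renderDemo({
  type: "bold",
  value: [{ type: "text", value: "hi" }],
});
```

If a new variant is added to the union without a matching `case`, the call to `assertNever` stops compiling, which points at every renderer that needs updating.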
Writing Tag Handlers
For tags that need custom logic — extracting parameters, attaching extra fields, supporting multiple forms — you write a
TagHandler manually.
Use handler helpers for simple wrapper tags. Write custom handlers when you need:
- Pipe parameters, e.g. $$link(url | display text)$$
- Extra fields on the output token, e.g. url, lang, title
- Multiple forms: the same tag supporting inline, raw, and block syntax
- Transformation logic — e.g., language alias mapping for code blocks
TagHandler interface
interface TagHandler {
inline?: (tokens: TextToken[], ctx?: DslContext) => TokenDraft;
raw?: (arg: string | undefined, content: string, ctx?: DslContext) => TokenDraft;
block?: (arg: string | undefined, content: TextToken[], ctx?: DslContext) => TokenDraft;
}
You only need to implement the forms your tag supports. Unsupported forms fall back gracefully instead of breaking the parse.
Handlers should accept ctx and pass it through when calling public utility functions such as parsePipeArgs,
parsePipeTextList, materializeTextTokens, unescapeInline, or createToken.
Example: full handler set
import {
createParser,
createSimpleInlineHandlers,
extractText,
parsePipeArgs,
} from "yume-dsl-rich-text";
const dsl = createParser({
handlers: {
// Simple tags — use helpers
...createSimpleInlineHandlers(["bold", "italic", "underline"]),
// Custom: pipe parameters → extra fields
link: {
inline: (tokens, ctx) => {
const args = parsePipeArgs(tokens, ctx);
return {
type: "link",
url: args.text(0),
value:
args.parts.length > 1
? args.materializedTailTokens(1)
: args.materializedTokens(0),
};
},
},
// Custom: raw form → preserves content as-is
code: {
raw: (arg, content, _ctx) => ({
type: "code-block",
lang: arg ?? "text",
value: content,
}),
},
// Custom: supports both inline and block forms
info: {
inline: (tokens, ctx) => {
const args = parsePipeArgs(tokens, ctx);
return {
type: "info",
title: extractText(args.materializedTokens(0)),
value: args.materializedTailTokens(1),
};
},
block: (arg, content, _ctx) => ({
type: "info",
title: arg || "Info",
value: content,
}),
},
},
});
Input:
Hello $$bold(world)$$!
$$info(Notice)*
This is a $$bold(block)$$ example.
*end$$
$$code(ts)%
const answer = 42;
%end$$
const tokens = dsl.parse(input);
Utility Exports
Configuration
See Custom Syntax and Custom Tag Name Characters for full documentation.
| Export | Description |
|----------------------------------|---------------------------------------------------------|
| DEFAULT_SYNTAX | The built-in syntax tokens ($$, (, )$$, etc.) |
| createEasySyntax(overrides) | Build SyntaxConfig with auto-derivation (recommended) |
| createSyntax(overrides) | Build SyntaxConfig with plain merge (low-level) |
| DEFAULT_TAG_NAME | The built-in tag-name character rules |
| createTagNameConfig(overrides) | Build a full TagNameConfig from partial overrides |
Handler Helpers
Convenience functions for creating handlers in bulk — most projects only need these.
Recommended
| Export | Description |
|-------------------------------------|--------------------------------------------------------------------|
| createPipeHandlers(definitions) | Pipe-aware handler builder for any combination of inline/raw/block |
| createSimpleInlineHandlers(names) | Create inline handlers for simple tags in bulk |
| createSimpleBlockHandlers(names) | Create block-form handlers for simple tags in bulk |
| createSimpleRawHandlers(names) | Create raw handlers for simple tags in bulk |
| declareMultilineTags(names) | Declare which tags need multiline normalization |
See also Deprecated API for createPipeBlockHandlers, createPipeRawHandlers,
createPassthroughTags.
Handler Utilities
Lower-level tools for writing custom TagHandler implementations.
You will not need these if you only use the handler helpers above.
The ctx? parameter exists for backward compatibility. New code should treat it as required in practice —
pass the DslContext received from the handler callback or construct one explicitly.
See DslContext below.
| Export | Who uses it | Description |
|---------------------------------------|--------------------------------------------|----------------------------------------------------------|
| parsePipeArgs(tokens, ctx?) | Custom handlers with \|-separated params | Split tokens by pipe and access parsed parts |
| parsePipeTextArgs(text, ctx?) | Custom handlers parsing raw args | Same as above, but from a plain text string |
| parsePipeTextList(text, ctx?) | Custom handlers needing string[] args | Split a pipe-delimited string into trimmed string[] |
| splitTokensByPipe(tokens, ctx?) | Low-level handler code | Raw token splitter without helper methods |
| extractText(tokens) | Handlers that need plain-text values | Flatten a token tree into a single string |
| materializeTextTokens(tokens, ctx?) | Handlers returning processed child tokens | Recursively unescape text tokens in a tree |
| unescapeInline(str, ctx?) | Handlers processing raw strings | Unescape DSL escape sequences in a single string |
| readEscapedSequence(text, i, ctx?) | Handlers inspecting escape sequences | Read one escape sequence at position i |
| createTextToken(value, ctx?) | Handlers creating plain text leaf tokens | Create a { type: "text", value } token with id |
| createToken(draft, position?, ctx?) | Handlers building tokens manually | Add an id (and optional position) to a TokenDraft |
| resetTokenIdSeed() | Test code | Reset the token id counter for deterministic test output |
During parsing, token ids default to a parse-local sequence (rt-0, rt-1, ...). createToken() only uses the module-level counter when called outside an active parse, and resetTokenIdSeed() is mainly intended for tests around that standalone usage. If you need strict request isolation for SSR or concurrent async parsing, prefer isolating parser usage per runtime boundary.
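The difference between a parse-local id sequence and a shared module-level counter can be illustrated with a small standalone sketch. `createIdSequence` is a hypothetical helper written for this example, not an export of the package; it only imitates the `rt-0`, `rt-1`, ... format described above.

```typescript
// Hypothetical helper (not exported by yume-dsl-rich-text): every call returns
// an independent id generator, similar in spirit to a parse-local sequence.
type IdGenerator = () => string;

function createIdSequence(prefix = "rt"): IdGenerator {
  let next = 0; // captured per sequence, so no module-level state is shared
  return () => `${prefix}-${next++}`;
}

// Two independent "parses" each start from zero and never affect each other.
const parseA = createIdSequence();
const parseB = createIdSequence();

const idsA = [parseA(), parseA(), parseA()];
const idsB = [parseB(), parseB()];

console.log(idsA); // ["rt-0", "rt-1", "rt-2"]
console.log(idsB); // ["rt-0", "rt-1"]
```

With a single shared counter, the second sequence would have continued from `rt-3`, which is the kind of cross-request coupling the note above warns about.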
DslContext
DslContext is the lightweight context for public utility functions:
interface DslContext {
syntax: SyntaxConfig;
createId?: CreateId;
}
| Field      | Description                                                              |
|------------|--------------------------------------------------------------------------|
| syntax | The active SyntaxConfig — controls escape characters, delimiters, etc. |
| createId | Optional token id generator — used by createToken when building tokens |
What ctx actually is:
- Inside a TagHandler, ctx is the last argument passed in by the parser for the current parse: the second argument for inline, the third for raw and block.
- It carries the active syntax and token-id generator for that parse.
- When a handler calls public utilities, pass the same ctx through so those utilities stay on the same parse-local configuration.
- Outside parsing, you can construct DslContext yourself and pass it explicitly.
Use DslContext consistently:
- Inside a TagHandler, receive ctx from the parser and pass it through to public utilities.
- Outside parsing, construct DslContext explicitly and pass it yourself.
- Treat explicit DslContext as the intended 2.0 contract.
// Inside a handler: reuse the parse-local ctx passed in by the parser
link: {
inline: (tokens, ctx) => {
const args = parsePipeArgs(tokens, ctx);
return {type: "link", url: args.text(0), value: args.materializedTailTokens(1)};
},
}
// Outside parsing: construct DslContext explicitly
const ctx: DslContext = {syntax: createSyntax(), createId: (draft) => `demo-${draft.type}`};
const args = parsePipeTextArgs("ts | Demo", ctx);
const token = createTextToken("hello", ctx);
This keeps handler code, standalone scripts, and future 2.0 usage on the same explicit model.
Migration Guide
Use this section as the migration target for 2.0 preparation.
Affected APIs
This migration affects the handler-to-utility call chain, not the entire library:
- TagHandler
- parsePipeArgs
- parsePipeTextArgs
- parsePipeTextList
- splitTokensByPipe
- materializeTextTokens
- unescapeInline
- readEscapedSequence
- createToken
Target pattern
link: {
inline: (tokens, ctx) => {
const args = parsePipeArgs(tokens, ctx);
return {type: "link", url: args.text(0), value: args.materializedTailTokens(1)};
},
}
const ctx: DslContext = {syntax: createSyntax(), createId: (draft) => `demo-${draft.type}`};
const args = parsePipeTextArgs("ts | Demo", ctx);
const token = createToken({type: "text", value: "hello"}, undefined, ctx);
Recommended migration order
- Update custom TagHandler signatures to accept ctx.
- Pass that ctx through to every public utility used inside handlers.
- Update standalone scripts/tests to construct and pass DslContext explicitly.
- Review examples and internal docs so they no longer show implicit utility calls.
PipeArgs / parsePipeTextList
parsePipeArgs and parsePipeTextArgs return a PipeArgs object:
interface PipeArgs {
parts: TextToken[][];
has: (index: number) => boolean;
text: (index: number, fallback?: string) => string;
materializedTokens: (index: number, fallback?: TextToken[]) => TextToken[];
materializedTailTokens: (startIndex: number, fallback?: TextToken[]) => TextToken[];
}
| Field                                  | Description                                                  |
|----------------------------------------|--------------------------------------------------------------|
| parts | Raw token arrays split by \| |
| has(i) | Whether part i exists |
| text(i, fallback?) | Plain text of part i, unescaped and trimmed |
| materializedTokens(i, fallback?) | Unescaped tokens of part i |
| materializedTailTokens(i, fallback?) | All parts from index i onward, merged into one token array |
If you only need string[] without token trees, use parsePipeTextList instead:
parsePipeTextList("ts | Demo | Label"); // → ["ts", "Demo", "Label"]
Source Position Tracking
Pass trackPositions: true to attach a position (source span) to every output node. Disabled by default — when off,
no line table is built and no position fields appear.
import {parseRichText, type SourceSpan} from "yume-dsl-rich-text";
const tokens = parseRichText("hello $$bold(world)$$", {
handlers: {bold: {inline: (t, _ctx) => ({type: "bold", value: t})}},
trackPositions: true,
});
// tokens[0].position
// {
// start: { offset: 0, line: 1, column: 1 },
// end: { offset: 6, line: 1, column: 7 }
// }
// tokens[1].position
// {
// start: { offset: 6, line: 1, column: 7 },
// end: { offset: 21, line: 1, column: 22 }
// }
parseStructural supports the same option:
import {parseStructural} from "yume-dsl-rich-text";
const nodes = parseStructural("$$bold(hi)$$", {trackPositions: true});
// nodes[0].position → { start: { offset: 0, ... }, end: { offset: 12, ... } }
Types
interface SourcePosition {
offset: number; // 0-indexed string offset (UTF-16 code unit)
line: number; // 1-indexed
column: number; // 1-indexed
}
interface SourceSpan {
start: SourcePosition;
end: SourcePosition;
}
Parsing substrings with baseOffset and tracker
TL;DR: baseOffset maps substring positions back to absolute offsets in the original text; tracker resolves those absolute offsets into correct line/column. Pass both for full accuracy.
When you parse a substring extracted from a larger document, pass baseOffset and a pre-built tracker
so that all position fields (offset, line, column) point back into the original source:
import {parseRichText, buildPositionTracker} from "yume-dsl-rich-text";
const fullText = "first line\nprefix $$bold(world)$$ suffix";
const tracker = buildPositionTracker(fullText);
const start = 18; // "$$bold(world)$$" starts at offset 18
const slice = fullText.slice(start, 33);
const tokens = parseRichText(slice, {
handlers: {bold: {inline: (t) => ({type: "bold", value: t})}},
trackPositions: true,
baseOffset: start,
tracker, // ← built from the full document
});
// tokens[0].position.start.offset → 18 (absolute, in fullText)
// tokens[0].position.start.line → 2 (correct line in fullText)
// tokens[0].position.start.column → 8   (correct column in fullText)
| Option       | Purpose                                                                        |
|--------------|--------------------------------------------------------------------------------|
| baseOffset | Shift all offsets by this amount (default 0) |
| tracker | Pre-built tracker from the full document — enables correct line/column too |
Both options apply to parseRichText and parseStructural. They require trackPositions: true;
when position tracking is off, both are ignored.
Without tracker (only baseOffset): offset is shifted correctly, but line/column are
resolved locally against the substring. This is sufficient when you only need offset-based lookups.
With tracker (recommended): all three fields are fully correct against the original document.
Build the tracker once with buildPositionTracker(fullText) and reuse it across all subsequent slice parses.
Do not call buildPositionTracker per slice — it rebuilds the line table from scratch each time.
What position covers
Each token's position spans the source range for that parser's own output model.
- In parseRichText, block/raw token spans include any trailing line break consumed by line-break normalization.
- In parseStructural, spans follow the raw structural syntax and therefore stop at *end$$ / %end$$.
For example, given this input (27 characters, counting each \n as a single character; the ruler below shows offsets 0 to 26):
$$info()*\nhello\n*end$$\nnext
0         1         2
012345678901234567890123456
| API               | info token position.end.offset     | Covers                       |
|-------------------|------------------------------------|------------------------------|
| parseRichText | 23 (past the \n after $$) | $$info()*\nhello\n*end$$\n |
| parseStructural | 22 (stops at $$) | $$info()*\nhello\n*end$$ |
parseRichText consumes the trailing \n as part of block line-break normalization;
parseStructural stops at the raw syntax boundary. The \n at offset 22 becomes the start of the next text node.
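The offsets in the table can be checked with plain string arithmetic. The sketch below does not call the parser; it only locates the close marker in the example input and mirrors the rule described above (the structural span stops after `*end$$`, the normalized span also covers one following `\n`).

```typescript
// Plain string arithmetic only; no parser is involved in this check.
const source = "$$info()*\nhello\n*end$$\nnext";
const closeMarker = "*end$$";

// Offset just past the block close marker: where the structural span ends.
const structuralEnd = source.indexOf(closeMarker) + closeMarker.length;

// If a "\n" directly follows the close marker, the normalized span covers it too.
const richEnd = source.charAt(structuralEnd) === "\n" ? structuralEnd + 1 : structuralEnd;

console.log(source.length); // 27
console.log(structuralEnd); // 22
console.log(richEnd); // 23
console.log(JSON.stringify(source.slice(richEnd))); // "next"
```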
Semantic differences between parseRichText and parseStructural
| Aspect | parseRichText | parseStructural |
|-----------------------|----------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------|
| Block children offset | Adjusted for leading line-break normalization — inner position maps back to the original source through the normalized content | Raw syntax positions — no normalization adjustment, children start at the content delimiter |
Both APIs use the same SourceSpan type, but the inner child positions reflect their respective processing models.
If you compare child positions across the two APIs on the same input, block content may show an offset difference
equal to the stripped leading line break (1 for \n, 2 for \r\n).
Performance
When trackPositions is false (default):
- No line-offset table is allocated
- No position objects are produced
- Remaining overhead is limited to a few null-check branches in the parse pipeline, negligible in practice
When enabled, a line-offset table is built once (O(n) scan), and each position resolution uses O(log n) binary search.
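As an illustration of that cost model, here is a minimal sketch of a line-offset table with binary-search lookup. It is not the package's implementation: `buildLineStarts`, `resolvePosition`, and `ResolvedPosition` are hypothetical names, and only `\n` is treated as a line terminator.

```typescript
// Hypothetical sketch; names and details are assumptions, not library internals.
interface ResolvedPosition {
  offset: number; // 0-indexed UTF-16 code unit offset
  line: number; // 1-indexed
  column: number; // 1-indexed
}

// One O(n) scan: record the offset at which each line starts.
function buildLineStarts(text: string): number[] {
  const starts = [0];
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) === 10 /* "\n" */) starts.push(i + 1);
  }
  return starts;
}

// O(log n): binary search for the last line start that is <= offset.
function resolvePosition(lineStarts: number[], offset: number): ResolvedPosition {
  let lo = 0;
  let hi = lineStarts.length - 1;
  while (lo < hi) {
    const mid = (lo + hi + 1) >> 1; // upper middle, so the range always shrinks
    if (lineStarts[mid]! <= offset) lo = mid;
    else hi = mid - 1;
  }
  return { offset, line: lo + 1, column: offset - lineStarts[lo]! + 1 };
}

const fullText = "first line\nprefix $$bold(world)$$ suffix";
const lineStarts = buildLineStarts(fullText); // built once, reused for every lookup

console.log(resolvePosition(lineStarts, 0)); // { offset: 0, line: 1, column: 1 }
console.log(resolvePosition(lineStarts, 18)); // { offset: 18, line: 2, column: 8 }
```

The second lookup reproduces the line 2, column 8 result from the substring example earlier in this section, which is why a table built from the full document is needed for correct line/column values.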
Performance should be understood in tiers: parseStructural is a lightweight syntax/structure scanner suited for
high-throughput scenarios; parseRichText is a semantic parser that, beyond the state-machine scan, includes handler
execution, token-tree construction, and content normalization — the cost difference reflects capability overhead,
not a scanner implementation deficiency.
Baseline throughput (~48 KB DSL input, single-threaded microbenchmark):
| API | Time / call |
|-------------------|-------------|
| parseRichText | ~360 ms |
| stripRichText | ~358 ms |
| parseStructural | ~7.1 ms |
stripRichText internally calls parseRichText then extractText, so its cost is essentially the same.
parseStructural skips handlers, token construction, and materialization — roughly 50x faster than parseRichText
on the same input.
trackPositions overhead (same input):
| API | Without | With | Overhead |
|-------------------|---------|--------|----------|
| parseRichText | 360 ms | 359 ms | ~0% |
| stripRichText | 358 ms | 360 ms | ~0% |
| parseStructural | 7.1 ms | 7.6 ms | ~7% |
parseRichText / stripRichText have heavier per-token work (handlers, recursion, materialization), so position
tracking is lost in measurement noise. parseStructural is inherently lighter, so the relative cost of producing position
objects and resolving offsets is more visible (about 7%), though still small in absolute terms (about 0.5 ms on this input).
Measured on Kunpeng 920 24C / 32 GB (2x16 GB DDR4-2666). Local microbenchmark — magnitude is reliable; exact figures will vary by platform.
Error Handling
Use onError to collect parse errors.
import type {ParseError} from "yume-dsl-rich-text";
const errors: ParseError[] = [];
parseRichText("$$bold(unclosed", {
onError: (error) => errors.push(error),
});
// errors[0]
// {
// code: "INLINE_NOT_CLOSED",
// message: "(L1:C1) Inline tag not closed: >>>$$bold(<<< unclosed",
// line: 1,
// column: 1,
// snippet: " >>>$$bold(<<< unclosed"
// }
If onError is omitted, malformed markup degrades gracefully and errors are silently discarded.
Error Codes
ParseError.code is typed as ErrorCode, a union of all possible error codes:
type ErrorCode =
| "DEPTH_LIMIT"
| "UNEXPECTED_CLOSE"
| "INLINE_NOT_CLOSED"
| "BLOCK_NOT_CLOSED"
| "BLOCK_CLOSE_MALFORMED"
| "RAW_NOT_CLOSED"
| "RAW_CLOSE_MALFORMED";
| Code                    | Meaning                                    |
|-------------------------|--------------------------------------------|
| DEPTH_LIMIT | Nesting exceeded depthLimit |
| UNEXPECTED_CLOSE | Stray close tag with no matching open |
| INLINE_NOT_CLOSED | Inline tag was never closed |
| BLOCK_NOT_CLOSED | Block close marker is missing |
| BLOCK_CLOSE_MALFORMED | Block close marker exists but is malformed |
| RAW_NOT_CLOSED | Raw close marker is missing |
| RAW_CLOSE_MALFORMED | Raw close marker exists but is malformed |
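Because the codes form a closed union, collected errors can be grouped with an exhaustive switch. The sketch below is standalone: the `ErrorCode` union is copied from the definition above, while `ParseErrorLike`, `ErrorCategory`, `classify`, and `countByCategory` are hypothetical names introduced for illustration, and the second sample message is a placeholder.

```typescript
// Union copied from the documented ErrorCode type; everything else is hypothetical.
type ErrorCode =
  | "DEPTH_LIMIT"
  | "UNEXPECTED_CLOSE"
  | "INLINE_NOT_CLOSED"
  | "BLOCK_NOT_CLOSED"
  | "BLOCK_CLOSE_MALFORMED"
  | "RAW_NOT_CLOSED"
  | "RAW_CLOSE_MALFORMED";

// Reduced stand-in for the error objects delivered to onError.
interface ParseErrorLike {
  code: ErrorCode;
  message: string;
  line: number;
  column: number;
}

type ErrorCategory = "unclosed" | "malformed" | "other";

function classify(code: ErrorCode): ErrorCategory {
  switch (code) {
    case "INLINE_NOT_CLOSED":
    case "BLOCK_NOT_CLOSED":
    case "RAW_NOT_CLOSED":
      return "unclosed";
    case "BLOCK_CLOSE_MALFORMED":
    case "RAW_CLOSE_MALFORMED":
      return "malformed";
    case "DEPTH_LIMIT":
    case "UNEXPECTED_CLOSE":
      return "other";
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a code is missed.
      const unreachable: never = code;
      return unreachable;
    }
  }
}

function countByCategory(errors: ParseErrorLike[]): Record<ErrorCategory, number> {
  const counts: Record<ErrorCategory, number> = { unclosed: 0, malformed: 0, other: 0 };
  for (const error of errors) counts[classify(error.code)]++;
  return counts;
}

const sampleErrors: ParseErrorLike[] = [
  { code: "INLINE_NOT_CLOSED", message: "(L1:C1) Inline tag not closed", line: 1, column: 1 },
  { code: "RAW_CLOSE_MALFORMED", message: "placeholder message", line: 3, column: 1 },
];

console.log(countByCategory(sampleErrors)); // { unclosed: 1, malformed: 1, other: 0 }
```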
Graceful Degradation
The parser never throws on malformed or unrecognized input. Instead, it degrades content to plain text and optionally
reports errors via onError. Below are the concrete degradation scenarios.
Unregistered tags → plain text
Tags not present in handlers are not recognized. Their content is unwrapped as plain text.
const dsl = createParser({
handlers: {
...createSimpleInlineHandlers(["bold"]),
// "italic" is NOT registered
},
});
dsl.parse("Hello $$bold(world)$$ and $$italic(goodbye)$$");
[
{type: "text", value: "Hello ", id: "rt-0"},
{type: "bold", value: [{type: "text", value: "world", id: "rt-1"}], id: "rt-2"},
{type: "text", value: " and goodbye", id: "rt-3"},
// ↑ "italic" is unregistered — content becomes plain text
]
Unsupported form on a registered tag → fallback text
A handler only needs to implement the forms it supports. If a tag is used in a form the handler doesn't cover, the entire markup degrades to plain text.
const dsl = createParser({
handlers: {
// "note" only supports inline, not raw
note: {inline: (tokens, _ctx) => ({type: "note", value: tokens})},
},
});
dsl.parse("$$note(ok)%\nraw content\n%end$$");
// The raw form is not supported → entire tag degrades to fallback text
[
{type: "text", value: "$$note(ok)%\nraw content\n%end$$", id: "rt-0"},
]
allowForms restriction → form stripped
When allowForms excludes a form, the parser acts as if handlers don't support it — even if they do.
const dsl = createParser({
handlers: {
bold: {inline: (tokens, _ctx) => ({type: "bold", value: tokens})},
code: {raw: (arg, content, _ctx) => ({type: "code", lang: arg ?? "text", value: content})},
},
allowForms: ["inline"], // ← raw and block disabled
});
dsl.parse("$$bold(hello)$$");
// → [{ type: "bold", ... }] ✓ inline works
dsl.parse("$$code(ts)%\nconst x = 1;\n%end$$");
// → [{ type: "text", value: "$$code(ts)%\nconst x = 1;\n%end$$", ... }]
//   ↑ raw form is disabled; entire tag degrades to plain text
Unclosed tags → partial text recovery
When a tag is opened but never closed, the parser reports an error and recovers the opening markup as plain text.
const errors: ParseError[] = [];
dsl.parse("Hello $$bold(world", {onError: (e) => errors.push(e)});
// → [{ type: "text", value: "Hello $$bold(world", id: "rt-0" }]
//
// errors[0].code === "INLINE_NOT_CLOSED"
Without onError, the same recovery happens silently; no error is thrown.
Vue 3 Rendering
The parser produces a TextToken[] tree — here is a drop-in recursive Vue 3 component that renders it.
1. Set up the parser
// dsl.ts
import {
createParser,
createSimpleInlineHandlers,
parsePipeArgs,
parsePipeTextArgs,
createToken,
materializeTextTokens,
type TagHandler,
type TokenDraft,
} from "yume-dsl-rich-text";
const titledHandler = (type: string, defaultTitle: string): TagHandler => ({
inline: (tokens, ctx): TokenDraft => {
const args = parsePipeArgs(tokens, ctx);
if (args.parts.length <= 1) {
return {type, title: defaultTitle, value: args.materializedTokens(0)};
}
return {type, title: args.text(0), value: args.materializedTailTokens(1)};
},
block: (arg, tokens, _ctx): TokenDraft => ({
type,
title: arg || defaultTitle,
value: tokens,
}),
raw: (arg, content, ctx): TokenDraft => ({
type,
title: arg || defaultTitle,
value: [createToken({type: "text", value: content}, undefined, ctx)],
}),
});
const collapseBase = titledHandler("collapse", "Click to expand");
export const dsl = createParser({
handlers: {
...createSimpleInlineHandlers([
"bold", "thin", "underline", "strike", "code", "center",
]),
link: {
inline: (tokens, ctx): TokenDraft => {
const args = parsePipeArgs(tokens, ctx);
const url = args.text(0);
const display =
args.parts.length > 1
? args.materializedTailTokens(1)
: args.materializedTokens(0);
return {type: "link", url, value: display};
},
},
info: titledHandler("info", "Info"),
warning: titledHandler("warning", "Warning"),
collapse: {block: collapseBase.block, raw: collaps