@aiconnect/process-tags

v0.3.0

Published

4 months ago

Core tag parsing library for Process Tags - recognizes and extracts custom tags from text content

ericsantos

parser tags template process-tags text-processing

Process Tags Core

A robust parsing engine for identifying and extracting custom Process Tags from text content. Supports 7 distinct tag syntaxes with module names, input values, and default values.

Features

✅ 7 Tag Syntaxes: Inline and block variants with different parameter combinations
✅ Regex-Based Parsing: Fast, efficient pattern matching with proper precedence
✅ Escape Sequences: Full support for \" and \\ in string values
✅ TypeScript-First: Complete type definitions and IDE support
✅ Zero Dependencies: No runtime dependencies, minimal footprint
✅ Dual Module Formats: Both ESM and CommonJS support
✅ Lenient Parsing: Invalid tags are silently ignored instead of raising errors
✅ Position Tracking: Source mapping for each tag
✅ Parse Options: Content normalization and runtime limits
✅ Range-Based Parsing: Parse specific content regions with absolute positions
✅ Utility Functions: Tag walking, filtering, and overlap detection

Installation

npm install @aiconnect/process-tags

Package published at: https://www.npmjs.com/package/@aiconnect/process-tags

Quick Start

ESM (Recommended)

import { parse } from '@aiconnect/process-tags';

const content = 'Hello [|username|"World"|] from [|app|]!';
const result = parse(content);

console.log(result.tags);
// [
//   { type: 'inline', module: 'username', input: 'World', ... },
//   { type: 'inline', module: 'app', ... }
// ]

CommonJS

const { parse } = require('@aiconnect/process-tags');

const content = 'Hello [|username|"World"|] from [|app|]!';
const result = parse(content);

console.log(result.tags);
// [
//   { type: 'inline', module: 'username', input: 'World', ... },
//   { type: 'inline', module: 'app', ... }
// ]

Tag Syntaxes

Inline Tags

Basic: [|module|]
With Input: [|module|"value"|]
With Default: [|module||"default"|]
Full: [|module|"input"|"default"|]
Compact: [|module§"value"|] (value serves as both input and default)

Block Tags

Simple: [|module]content[/module|]
Compact: [|module§]content[/module§|] (content serves as both input and default)
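
To make the five inline forms concrete, here is a minimal sketch that recognizes them with one regular expression. It is not the pattern the library uses: `INLINE_TAG`, `InlineParts`, and `splitInline` are hypothetical names, the captured values are left raw (escape sequences are not decoded), and the way each form maps to `input` and `default` is assumed from the list above.

```typescript
// Hypothetical sketch, NOT the regex used by @aiconnect/process-tags.
interface InlineParts {
  module: string;
  input?: string;
  default?: string;
}

// Capture groups: 1 = module, 2 = compact value, 3 = input,
// 4 = default (full form), 5 = default (default-only form).
const INLINE_TAG =
  /^\[\|([A-Za-z0-9_-]+)(?:§"((?:[^"\\]|\\.)*)"|\|"((?:[^"\\]|\\.)*)"(?:\|"((?:[^"\\]|\\.)*)")?|\|\|"((?:[^"\\]|\\.)*)")?\|\]$/;

function splitInline(tag: string): InlineParts | null {
  const m = INLINE_TAG.exec(tag);
  if (m === null) return null; // not one of the five inline forms

  const moduleName = m[1] ?? '';
  const compact = m[2];
  if (compact !== undefined) {
    // Compact form: the single value serves as both input and default.
    return { module: moduleName, input: compact, default: compact };
  }

  const parts: InlineParts = { module: moduleName };
  if (m[3] !== undefined) parts.input = m[3];
  const fallback = m[4] ?? m[5];
  if (fallback !== undefined) parts.default = fallback;
  return parts;
}
```

For example, `splitInline('[|module||"default"|]')` yields a `default` but no `input`, while `splitInline('[|invalid tag|]')` returns `null` because of the space in the module name.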

Nested Tag Detection

Inline tags can appear within block tag content, and both will be detected as independent tags:

const content = `[|description]
  This is a description with [|author|"John Doe"|] inline.
[/description|]`;

const result = parse(content);
// Returns 2 tags:
// 1. Block tag with full content (including the raw inline tag text)
// 2. Inline tag with its parsed values

console.log(result.tags);
// [
//   {
//     type: 'block',
//     module: 'description',
//     input: 'This is a description with [|author|"John Doe"|] inline.',
//     position: { start: 0, end: ... }
//   },
//   {
//     type: 'inline',
//     module: 'author',
//     input: 'John Doe',
//     position: { start: ..., end: ... }
//   }
// ]

Note: The parser uses a two-pass strategy to detect both block and inline tags independently. Block tags within block tags (hierarchical nesting) are not supported.
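
The two-pass idea can be sketched as follows. This is an illustration only, not the library's implementation: `FoundTag`, `collect`, and `twoPassScan` are hypothetical names, and the inline pattern covers just the basic and with-input forms to keep the example short.

```typescript
// Hypothetical sketch of a two-pass scan, NOT the library's parser.
interface FoundTag {
  type: 'inline' | 'block';
  module: string;
  start: number; // inclusive offset
  end: number;   // exclusive offset
}

// Pass 1 pattern: simple block tags; the backreference \1 requires the
// closing name to match the opening name.
const BLOCK = /\[\|([A-Za-z0-9_-]+)\]([\s\S]*?)\[\/\1\|\]/g;
// Pass 2 pattern: basic and with-input inline tags only.
const INLINE = /\[\|([A-Za-z0-9_-]+)(?:\|"(?:[^"\\]|\\.)*")?\|\]/g;

function collect(content: string, type: FoundTag['type'], pattern: RegExp): FoundTag[] {
  const out: FoundTag[] = [];
  pattern.lastIndex = 0; // global regexes keep state between calls
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(content)) !== null) {
    out.push({ type, module: m[1] ?? '', start: m.index, end: m.index + m[0].length });
  }
  return out;
}

function twoPassScan(content: string): FoundTag[] {
  // Each pass runs over the whole content, so an inline tag inside a block
  // is reported in addition to the block that contains it.
  const found = [...collect(content, 'block', BLOCK), ...collect(content, 'inline', INLINE)];
  return found.sort((a, b) => a.start - b.start);
}
```

Because neither pass consumes text from the other, the block's content still contains the raw inline tag, which matches the output shown above.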

API

`parse(content: string, options?: ParseOptions): ParseResult`

Main parsing function. Returns all tags found in the content.

const result = parse('[|title|"My Page"|]');
// { tags: [...], original: "[|title|\"My Page\"|]" }

`findTags(content: string): ProcessTag[]`

Convenience function that returns just the tags array.

const tags = findTags('[|title|"My Page"|]');
// [{ type: 'inline', module: 'title', input: 'My Page', ... }]

`isValidTag(tagString: string): boolean`

Checks whether a string is a valid Process Tag.

isValidTag('[|valid|]'); // true
isValidTag('[|invalid tag|]'); // false

`extractModule(tagString: string): string | null`

Extracts the module name from a tag string.

extractModule('[|myModule|]'); // "myModule"
extractModule('invalid'); // null

Advanced Features

Parse Options

Configure parsing behavior with options for normalization and limits:

import { parse, type ParseOptions } from '@aiconnect/process-tags';

const options: ParseOptions = {
  normalizeBlockContent: true,     // Normalize block tag content
  maxTags: 100,                     // Stop after 100 tags
  maxBlockContentBytes: 10000,     // Skip blocks > 10KB
  maxBlockContentLines: 50         // Skip blocks > 50 lines
};

const result = parse(content, options);

Content Normalization

When normalizeBlockContent: true, block tags get an additional normalizedInput field with:

Leading/trailing empty lines removed
Common indentation stripped

const content = `[|code]
    function hello() {
        console.log("Hi");
    }
[/code|]`;

const result = parse(content, { normalizeBlockContent: true });

console.log(result.tags[0].normalizedInput);
// "function hello() {\n    console.log(\"Hi\");\n}"
// (common indentation removed)

console.log(result.tags[0].input);
// Original content preserved (trimmed)

Parse-Time Limits

Protect against resource exhaustion in untrusted content:

// Limit total tags
parse(content, { maxTags: 10 });  // Parse up to 10 tags only

// Skip large blocks
parse(content, {
  maxBlockContentBytes: 1000,     // Skip blocks > 1KB
  maxBlockContentLines: 20        // Skip blocks > 20 lines
});

Note: Limits are applied to normalized content when normalizeBlockContent: true.
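
As a rough illustration of what a block-size check involves, the sketch below measures a block's content in bytes and lines and compares it to the limits. The names `BlockLimits`, `utf8ByteLength`, and `exceedsLimits` are hypothetical, and two details are assumptions rather than documented behavior: bytes are counted as UTF-8, and lines are split on `\n`.

```typescript
// Hypothetical sketch, NOT the library's limit check.
interface BlockLimits {
  maxBlockContentBytes?: number;
  maxBlockContentLines?: number;
}

// Count UTF-8 bytes without relying on TextEncoder or Buffer.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    if (unit < 0x80) {
      bytes += 1;
    } else if (unit < 0x800) {
      bytes += 2;
    } else if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < text.length) {
      const next = text.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        bytes += 4; // surrogate pair encodes one 4-byte code point
        i++;
      } else {
        bytes += 3; // lone surrogate, counted like U+FFFD
      }
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// True when the block should be skipped (strictly greater than a limit).
function exceedsLimits(blockContent: string, limits: BlockLimits): boolean {
  const { maxBlockContentBytes, maxBlockContentLines } = limits;
  if (maxBlockContentBytes !== undefined && utf8ByteLength(blockContent) > maxBlockContentBytes) {
    return true;
  }
  if (maxBlockContentLines !== undefined && blockContent.split('\n').length > maxBlockContentLines) {
    return true;
  }
  return false;
}
```

Counting bytes rather than characters matters for untrusted input: a string of multi-byte characters can be several times larger in memory than its `length` suggests.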

Range-Based Parsing

Parse only a specific portion of content with absolute position tracking:

import { parseInRange } from '@aiconnect/process-tags';

const content = "prefix [|tag1|] middle [|tag2|] suffix";
const range = { start: 7, end: 35 };  // Skip the leading "prefix "; both tags fall inside this range

const result = parseInRange(content, range);

// Returns tags with absolute positions (relative to original content)
console.log(result.tags[0].position);
// { start: 7, end: 15 } (absolute positions)

Use Cases:

Re-parse inline tags within block content
Process specific sections of large documents
Avoid re-parsing the entire document (10-100x faster)
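
The absolute-position behavior comes down to adding the range's start offset to each match found in the slice. Here is a minimal sketch of that arithmetic; `TagRange`, `Located`, and `scanBasicTagsInRange` are hypothetical names, and only the basic `[|module|]` form is matched.

```typescript
// Hypothetical sketch of position shifting, NOT the library's parseInRange.
interface TagRange {
  start: number; // inclusive
  end: number;   // exclusive
}

interface Located {
  module: string;
  position: TagRange;
}

function scanBasicTagsInRange(content: string, range: TagRange): Located[] {
  const slice = content.slice(range.start, range.end);
  const basic = /\[\|([A-Za-z0-9_-]+)\|\]/g; // basic form only
  const out: Located[] = [];
  let m: RegExpExecArray | null;
  while ((m = basic.exec(slice)) !== null) {
    // m.index is relative to the slice; shift it back to the full content.
    const start = range.start + m.index;
    out.push({ module: m[1] ?? '', position: { start, end: start + m[0].length } });
  }
  return out;
}
```

With the content from the example above and the range `{ start: 7, end: 35 }`, this reports `tag1` at 7 to 15 and `tag2` at 23 to 31, both measured against the original string.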

Utility Functions

`walkTags(tags, callbacks)`

Iterate over tags with type-specific callbacks:

import { walkTags } from '@aiconnect/process-tags';

walkTags(result.tags, {
  onBlock: (tag, index) => {
    console.log(`Block: ${tag.module}`);
    return true; // continue iteration
  },
  onInline: (tag, index) => {
    console.log(`Inline: ${tag.module}`);
    if (tag.module === 'stop') return false; // stop iteration
  }
});
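
The early-exit contract (returning `false` stops the walk, anything else continues) can be sketched like this. `WalkTag`, `Callbacks`, and `walkSketch` are hypothetical stand-ins for the real types, and the returned visit count is an addition for illustration; the source does not say what `walkTags` returns.

```typescript
// Hypothetical sketch of the callback dispatch, NOT the library's walkTags.
interface WalkTag {
  type: 'inline' | 'block';
  module: string;
}

interface Callbacks {
  onBlock?: (tag: WalkTag, index: number) => void | boolean;
  onInline?: (tag: WalkTag, index: number) => void | boolean;
}

// Returns how many tags were visited before the walk ended.
function walkSketch(tags: WalkTag[], callbacks: Callbacks): number {
  let visited = 0;
  for (let i = 0; i < tags.length; i++) {
    const tag = tags[i];
    if (tag === undefined) continue;
    const handler = tag.type === 'block' ? callbacks.onBlock : callbacks.onInline;
    visited++;
    // Only an explicit `false` stops iteration; `undefined` and `true` continue.
    if (handler !== undefined && handler(tag, i) === false) break;
  }
  return visited;
}
```

Treating `undefined` as "continue" is what lets a callback omit its return statement in the common case.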

`filterNested(tags, parentRange)`

Filter tags contained within a parent range:

import { filterNested } from '@aiconnect/process-tags';

const blockTag = result.tags[0]; // A block tag
const nestedTags = filterNested(result.tags, blockTag.position);

// Returns only tags fully within the block's range
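
"Fully within" is a containment test on half-open ranges. The sketch below shows one way to express it; `Span`, `Positioned`, `isWithin`, and `nestedWithin` are hypothetical names. Excluding a tag whose range equals the parent range (so a block is not reported as nested in itself) is an assumption, not documented behavior.

```typescript
// Hypothetical sketch of range containment, NOT the library's filterNested.
interface Span {
  start: number; // inclusive
  end: number;   // exclusive
}

interface Positioned {
  module: string;
  position: Span;
}

function isWithin(inner: Span, outer: Span): boolean {
  return inner.start >= outer.start && inner.end <= outer.end;
}

function nestedWithin(tags: Positioned[], parentRange: Span): Positioned[] {
  return tags.filter((tag) => {
    const sameAsParent =
      tag.position.start === parentRange.start && tag.position.end === parentRange.end;
    // Assumption: the parent itself is not counted as its own child.
    return isWithin(tag.position, parentRange) && !sameAsParent;
  });
}
```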

`hasOverlap(range1, range2)`

Check if two ranges overlap:

import { hasOverlap } from '@aiconnect/process-tags';

hasOverlap(
  { start: 0, end: 10 },
  { start: 5, end: 15 }
); // true (overlapping)

hasOverlap(
  { start: 0, end: 10 },
  { start: 10, end: 20 }
); // false (adjacent, not overlapping)
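
The adjacent case returns `false` because ranges are half-open: `end` is exclusive, so `{ start: 0, end: 10 }` covers offsets 0 through 9 and shares nothing with a range starting at 10. A minimal sketch of a predicate with that behavior follows; `HalfOpen` and `overlaps` are hypothetical names, not the library's code.

```typescript
// Hypothetical sketch of half-open interval overlap.
interface HalfOpen {
  start: number; // inclusive
  end: number;   // exclusive
}

function overlaps(a: HalfOpen, b: HalfOpen): boolean {
  // Two half-open ranges intersect when each one starts before the other ends.
  return a.start < b.end && b.start < a.end;
}
```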

`normalizeContent(content)`

Normalize content (remove leading/trailing empty lines and common indentation):

import { normalizeContent } from '@aiconnect/process-tags';

const input = "\n    Line 1\n    Line 2\n";
const normalized = normalizeContent(input);
// "Line 1\nLine 2"
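
For readers curious how this kind of normalization works, here is a minimal dedent sketch. `dedentSketch` is a hypothetical name and this is not the library's implementation; it assumes indentation is measured per character (a tab counts as one) and that whitespace-only lines are treated as empty.

```typescript
// Hypothetical sketch of dedent-style normalization.
function dedentSketch(content: string): string {
  const lines = content.split('\n');

  // Drop leading and trailing blank (whitespace-only) lines.
  while (lines.length > 0 && (lines[0] ?? '').trim() === '') lines.shift();
  while (lines.length > 0 && (lines[lines.length - 1] ?? '').trim() === '') lines.pop();

  // Find the smallest indentation among the remaining non-blank lines.
  let common = Infinity;
  for (const line of lines) {
    if (line.trim() === '') continue; // blank lines do not affect the minimum
    const indent = line.search(/[^ \t]/); // index of first non-indent character
    common = Math.min(common, indent);
  }
  const cut = common === Infinity ? 0 : common;

  // Strip that many leading characters from every line.
  return lines.map((line) => line.slice(cut)).join('\n');
}
```

Applied to `"\n    Line 1\n    Line 2\n"`, this produces `"Line 1\nLine 2"`, and relative indentation inside the block is preserved because only the shared prefix is removed.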

TypeScript Types

interface ProcessTag {
  type: 'inline' | 'block';
  module: string;
  input?: string;
  default?: string;
  raw: string;
  position: { start: number; end: number };
  normalizedInput?: string;  // Present when normalizeBlockContent is enabled
}

interface ParseResult {
  tags: ProcessTag[];
  original: string;
}

interface ParseOptions {
  normalizeBlockContent?: boolean;
  maxTags?: number;
  maxBlockContentBytes?: number;
  maxBlockContentLines?: number;
}

interface Range {
  start: number;  // Inclusive
  end: number;    // Exclusive
}

interface WalkCallbacks {
  onBlock?: (tag: ProcessTag, index: number) => void | boolean;
  onInline?: (tag: ProcessTag, index: number) => void | boolean;
}

Module Name Rules

Module names must match /^[a-zA-Z0-9_-]+$/:

✅ Allowed: letters, digits, underscore, hyphen
❌ Not allowed: dots, spaces, special characters
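
The rule is easy to check directly with the pattern quoted above. In this small sketch, `MODULE_NAME` and `isModuleName` are hypothetical helper names, not exports of the package.

```typescript
// The pattern is copied from the rule above; the helper name is hypothetical.
const MODULE_NAME = /^[a-zA-Z0-9_-]+$/;

function isModuleName(candidate: string): boolean {
  return MODULE_NAME.test(candidate);
}

console.log(isModuleName('my-module_1')); // letters, digits, underscore, hyphen
console.log(isModuleName('invalid tag')); // contains a space
console.log(isModuleName('a.b'));         // contains a dot
```

The three calls print `true`, `false`, and `false`. The `+` quantifier also means an empty name is rejected.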

Escape Sequences

Strings support two escape sequences:

\" → " (literal quote)
\\ → \ (literal backslash)

Example:

const result = parse('[|text|"He said \\"Hello\\""|]');
// result.tags[0].input === 'He said "Hello"'
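
Decoding these two escapes is best done in a single left-to-right pass; replacing `\"` and `\\` in two separate steps can misread a sequence such as `\\"` (an escaped backslash followed by a quote). The sketch below shows the single-pass approach. `decodeTagString` is a hypothetical name, and leaving unknown sequences such as `\n` untouched is an assumption.

```typescript
// Hypothetical sketch of escape decoding, NOT the library's implementation.
function decodeTagString(raw: string): string {
  // Match a backslash followed by a quote or a backslash and keep only the
  // second character. Other backslash sequences are left as they are.
  return raw.replace(/\\(["\\])/g, '$1');
}
```

Here `decodeTagString('He said \\"Hello\\"')` yields `He said "Hello"`; note that the JavaScript string literal itself needs doubled backslashes to contain a single one.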

Development

# Install dependencies
npm install

# Run tests
npm test

# Build
npm run build

# Lint
npm run lint

Performance

< 5ms for typical documents (< 10KB)
Handles 1000+ tags efficiently
No blocking operations

License

MIT