@bernierllc/markdown-parser

Version: v1.2.0
Published: 2 months ago
Vulnerabilities: 0 high, 0 medium, 0 low
Maintainers: alikhan410, mkbernier
Keywords: markdown, parser, ast, commonmark, gfm, github-flavored-markdown

High-performance markdown parser with CommonMark, GitHub-Flavored Markdown (GFM), and plugin support. Converts markdown text to Abstract Syntax Tree (AST) format for further processing.

Installation

npm install @bernierllc/markdown-parser

Features

CommonMark Support - Full CommonMark v0.31.2 specification compliance
GitHub-Flavored Markdown - Task lists, tables, strikethrough, autolinks
Plugin System - Extend parser with custom transformations
Caching - Optional in-memory caching with TTL and size limits
Performance - Fast parsing with position tracking
TypeScript - Strict typing with comprehensive type definitions
No Third-Party Dependencies - Depends only on other @bernierllc packages

Usage

Basic Parsing

import { MarkdownParser } from '@bernierllc/markdown-parser';

const parser = new MarkdownParser();

const result = await parser.parse('# Hello World\n\nThis is **bold** text.');

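if (result.success) {
  console.log('Parsed AST:', result.ast);
  // metadata is optional on ParseResult, so read it with optional chaining
  console.log('Node count:', result.metadata?.nodeCount);
  console.log('Parse time:', result.metadata?.parseTime, 'ms');
} else {
  console.error('Parse failed:', result.error);
}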

With Caching

const parser = new MarkdownParser({
  cache: {
    enabled: true,
    maxSize: 1000,  // Maximum 1000 cached documents
    ttl: 3600,      // Cache for 1 hour
  },
});

// First parse - not cached
const result1 = await parser.parse('# Document');
console.log('Cached:', result1.metadata?.cached); // false

// Second parse of the same input - served from the cache
const result2 = await parser.parse('# Document');
console.log('Cached:', result2.metadata?.cached); // true
// A cache hit is typically faster, but timings this small can tie
console.log('Parse times (ms):', result1.metadata?.parseTime, result2.metadata?.parseTime);

With Custom Plugins

import { MarkdownParser, transformNodes, type MarkdownPlugin } from '@bernierllc/markdown-parser';

// Create a plugin that converts all text to uppercase
const uppercasePlugin: MarkdownPlugin = {
  name: 'uppercase',
  version: '1.0.0',
  transform: (ast) => {
    return transformNodes(ast, (node) => {
      if (node.type === 'text' && node.value) {
        return { ...node, value: node.value.toUpperCase() };
      }
      return node;
    });
  },
};

const parser = new MarkdownParser({
  plugins: [uppercasePlugin],
});

const result = await parser.parse('# hello world');
// Result: heading with text "HELLO WORLD"

GFM Features

const parser = new MarkdownParser({ gfm: true });

// Task lists
const tasks = await parser.parse(`
- [ ] Unchecked task
- [x] Checked task
`);

// Tables
const table = await parser.parse(`
| Header 1 | Header 2 |
| -------- | -------- |
| Cell 1   | Cell 2   |
`);

// Strikethrough
const strikethrough = await parser.parse('This is ~~deleted~~ text');

// Autolinks
const autolink = await parser.parse('Visit https://example.com for more');

With Logger

import { MarkdownParser } from '@bernierllc/markdown-parser';
import { Logger } from '@bernierllc/logger';

const logger = new Logger({ level: 'debug' });

const parser = new MarkdownParser({ logger });

await parser.parse('# Test');
// Logs: "Markdown parsed successfully" with metadata

API Reference

`MarkdownParser`

Main parser class for converting markdown to AST.

Constructor

new MarkdownParser(config?: MarkdownParserConfig)

Config Options:

gfm?: boolean - Enable GitHub-Flavored Markdown (default: true)
commonmark?: boolean - Enable CommonMark strict mode (default: true)
plugins?: MarkdownPlugin[] - Custom plugins to apply (default: [])
cache?: CacheConfig - Caching configuration
  - enabled: boolean - Enable caching
  - maxSize?: number - Maximum cache entries (default: 1000)
  - ttl?: number - Time to live in seconds (default: 3600)
logger?: Logger - Logger instance for debug output

Methods

`parse(markdown: string): Promise<ParseResult>`

Parse markdown string to AST.

Returns:

{
  success: boolean;
  ast?: ASTNode;
  error?: string;
  metadata?: {
    parseTime: number;    // Milliseconds
    nodeCount: number;    // Total nodes in AST
    cached: boolean;      // Whether result was cached
  };
}
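
Because ast, error, and metadata are all optional on the result, callers need to narrow before using them. The following is a minimal sketch of one way to do that. The types are local copies of the shapes documented above so the snippet runs without the package installed, unwrapAst is a hypothetical helper that is not part of the package, and the two result objects are hand-written stand-ins for what parse() might return.

```typescript
// Local copies of the documented shapes (assumption: simplified for this sketch).
interface ASTNode {
  type: string; // the package uses a NodeType union; string keeps the sketch short
  value?: string;
  children?: ASTNode[];
  attributes?: Record<string, unknown>;
}

interface ParseResult {
  success: boolean;
  ast?: ASTNode;
  error?: string;
  metadata?: { parseTime: number; nodeCount: number; cached: boolean };
}

// Hypothetical helper (not part of the package): return the AST or throw.
function unwrapAst(result: ParseResult): ASTNode {
  if (!result.success || !result.ast) {
    throw new Error(result.error ?? 'Markdown parsing failed');
  }
  return result.ast;
}

// Hand-written results standing in for what parser.parse() might return.
const okResult: ParseResult = {
  success: true,
  ast: { type: 'root', children: [] },
  metadata: { parseTime: 2, nodeCount: 1, cached: false },
};
const failedResult: ParseResult = { success: false, error: 'Unexpected input' };

console.log(unwrapAst(okResult).type);
// metadata may be missing, so fall back to a default instead of asserting
console.log(okResult.metadata?.cached ?? false);
```

Centralizing the check in one helper avoids scattering non-null assertions such as result.ast! through calling code.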

`clearCache(): void`

Clear all cached parse results.

`getCacheStats(): { size: number; enabled: boolean }`

Get current cache statistics.

`pruneCache(): number`

Remove expired cache entries. Returns number of entries removed.
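
To make the cache options and methods above concrete, here is a small sketch of a cache with a TTL in seconds, a maximum size, and a prune operation that reports how many entries it removed. This is not the package's implementation: TtlCache, its injectable clock, and the choice to evict the oldest inserted entry when full are all assumptions made for illustration.

```typescript
// Hypothetical cache class illustrating the documented options; not the package's code.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private maxSize: number;
  private ttlSeconds: number;
  private now: () => number;

  // Defaults mirror the documented ones: maxSize 1000, ttl 3600 seconds.
  constructor(maxSize = 1000, ttlSeconds = 3600, now: () => number = () => Date.now()) {
    this.maxSize = maxSize;
    this.ttlSeconds = ttlSeconds;
    this.now = now; // injectable clock so expiry can be exercised without waiting
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired entries count as misses
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key); // re-inserting moves the key to the end of insertion order
    if (this.entries.size >= this.maxSize) {
      // Assumption: evict the oldest inserted entry when the cache is full.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlSeconds * 1000 });
  }

  // Analogous to pruneCache(): drop expired entries and return how many were removed.
  prune(): number {
    let removed = 0;
    for (const [key, entry] of this.entries) {
      if (entry.expiresAt <= this.now()) {
        this.entries.delete(key);
        removed++;
      }
    }
    return removed;
  }

  clear(): void {
    this.entries.clear();
  }

  get size(): number {
    return this.entries.size;
  }
}

// Usage with a fake clock: capacity 2, TTL 60 seconds.
let fakeNow = 0;
const demoCache = new TtlCache<string>(2, 60, () => fakeNow);
demoCache.set('a', 'A');
demoCache.set('b', 'B');
demoCache.set('c', 'C'); // cache is full, so 'a' is evicted
const evictedLookup = demoCache.get('a');
const liveLookup = demoCache.get('b');
fakeNow = 61000; // move the clock past the TTL
const removedCount = demoCache.prune();
console.log(evictedLookup, liveLookup, removedCount, demoCache.size);
```

Whether the real parser evicts by insertion order, by least recent use, or by some other policy is not stated in this README.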

AST Node Types

The parser generates nodes with the following structure:

interface ASTNode {
  type: NodeType;
  value?: string;
  children?: ASTNode[];
  attributes?: Record<string, unknown>;
  position?: Position;
}

Supported Node Types:

Block: root, heading, paragraph, blockquote, list, listItem, codeBlock, table, tableRow, tableCell, horizontalRule
Inline: text, emphasis, strong, code, link, image, lineBreak
GFM: strikethrough, taskList, taskListItem, autolink
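
As an illustration of the node structure, the sketch below hand-builds a tree for the input '# Hello World\n\nThis is **bold** text.' and prints an indented outline. The exact tree the parser produces may differ: the attributes.level key on the heading is an assumption, the position field is left out because its type is not shown in this README, and outline is a hypothetical helper.

```typescript
// Node type names taken from the "Supported Node Types" list above.
type NodeType =
  | 'root' | 'heading' | 'paragraph' | 'blockquote' | 'list' | 'listItem'
  | 'codeBlock' | 'table' | 'tableRow' | 'tableCell' | 'horizontalRule'
  | 'text' | 'emphasis' | 'strong' | 'code' | 'link' | 'image' | 'lineBreak'
  | 'strikethrough' | 'taskList' | 'taskListItem' | 'autolink';

interface ASTNode {
  type: NodeType;
  value?: string;
  children?: ASTNode[];
  attributes?: Record<string, unknown>;
}

// Hand-built tree; an assumption about the shape, not captured parser output.
const sampleAst: ASTNode = {
  type: 'root',
  children: [
    {
      type: 'heading',
      attributes: { level: 1 }, // assumed attribute name
      children: [{ type: 'text', value: 'Hello World' }],
    },
    {
      type: 'paragraph',
      children: [
        { type: 'text', value: 'This is ' },
        { type: 'strong', children: [{ type: 'text', value: 'bold' }] },
        { type: 'text', value: ' text.' },
      ],
    },
  ],
};

// Hypothetical helper: one line per node, indented two spaces per depth level.
function outline(node: ASTNode, depth = 0): string[] {
  const label =
    node.value !== undefined ? `${node.type}: ${JSON.stringify(node.value)}` : node.type;
  const lines = ['  '.repeat(depth) + label];
  for (const child of node.children ?? []) {
    lines.push(...outline(child, depth + 1));
  }
  return lines;
}

console.log(outline(sampleAst).join('\n'));
```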

Plugin Interface

interface MarkdownPlugin {
  name: string;
  version: string;
  transform: (ast: ASTNode) => ASTNode;
}
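
Since each plugin maps one AST to another, several plugins can be chained. The sketch below shows two plugins applied in sequence. It uses local copies of the documented interfaces so it runs on its own; mapTree and applyPlugins are hypothetical helpers, and applying plugins in array order is an assumption about how the parser behaves.

```typescript
// Local copies of the documented interfaces (NodeType simplified to string).
interface ASTNode {
  type: string;
  value?: string;
  children?: ASTNode[];
  attributes?: Record<string, unknown>;
}

interface MarkdownPlugin {
  name: string;
  version: string;
  transform: (ast: ASTNode) => ASTNode;
}

// Hypothetical helper: rebuild the tree bottom-up without mutating the input.
function mapTree(node: ASTNode, fn: (n: ASTNode) => ASTNode): ASTNode {
  const children = node.children?.map((child) => mapTree(child, fn));
  return fn(children ? { ...node, children } : { ...node });
}

// Plugin 1: drop strikethrough nodes from every parent.
const removeStrikethrough: MarkdownPlugin = {
  name: 'remove-strikethrough',
  version: '1.0.0',
  transform: (ast) =>
    mapTree(ast, (n) =>
      n.children
        ? { ...n, children: n.children.filter((c) => c.type !== 'strikethrough') }
        : n,
    ),
};

// Plugin 2: uppercase every text node.
const shoutText: MarkdownPlugin = {
  name: 'shout-text',
  version: '1.0.0',
  transform: (ast) =>
    mapTree(ast, (n) =>
      n.type === 'text' && n.value !== undefined
        ? { ...n, value: n.value.toUpperCase() }
        : n,
    ),
};

// Hypothetical helper; assumes plugins run in array order.
function applyPlugins(ast: ASTNode, plugins: MarkdownPlugin[]): ASTNode {
  return plugins.reduce((tree, plugin) => plugin.transform(tree), ast);
}

const inputAst: ASTNode = {
  type: 'root',
  children: [
    {
      type: 'paragraph',
      children: [
        { type: 'text', value: 'keep ' },
        { type: 'strikethrough', children: [{ type: 'text', value: 'drop' }] },
        { type: 'text', value: 'this' },
      ],
    },
  ],
};

const transformed = applyPlugins(inputAst, [removeStrikethrough, shoutText]);
console.log(JSON.stringify(transformed));
```

Returning new objects instead of mutating the input keeps each plugin safe to reuse, which matters if parse results are cached and shared.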

Utility Functions

`countNodes(node: ASTNode): number`

Count total nodes in AST tree.

`visitNodes(node: ASTNode, visitor: (node: ASTNode) => void): void`

Visit each node in AST tree with callback.

`transformNodes(node: ASTNode, transformer: (node: ASTNode) => ASTNode): ASTNode`

Transform AST nodes recursively.

`findNodes(node: ASTNode, predicate: (node: ASTNode) => boolean): ASTNode[]`

Find all nodes matching predicate.

`cloneNode(node: ASTNode): ASTNode`

Create deep copy of AST node.
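
The signatures above suggest straightforward recursive traversals. The following sketch shows how helpers with these signatures could be written; it is not the package's source. The depth-first, parent-before-children order, applying the transformer to a node before its children, and the JSON round-trip used for cloning are all assumptions.

```typescript
interface ASTNode {
  type: string; // simplified from the package's NodeType union
  value?: string;
  children?: ASTNode[];
  attributes?: Record<string, unknown>;
}

// Depth-first traversal, visiting a parent before its children (assumed order).
function visitNodes(node: ASTNode, visitor: (node: ASTNode) => void): void {
  visitor(node);
  for (const child of node.children ?? []) {
    visitNodes(child, visitor);
  }
}

function countNodes(node: ASTNode): number {
  let total = 0;
  visitNodes(node, () => {
    total++;
  });
  return total;
}

function findNodes(node: ASTNode, predicate: (node: ASTNode) => boolean): ASTNode[] {
  const matches: ASTNode[] = [];
  visitNodes(node, (n) => {
    if (predicate(n)) matches.push(n);
  });
  return matches;
}

// Transform the node first, then recurse into the children of the result.
function transformNodes(node: ASTNode, transformer: (node: ASTNode) => ASTNode): ASTNode {
  const next = transformer(node);
  if (!next.children) return next;
  return { ...next, children: next.children.map((c) => transformNodes(c, transformer)) };
}

// JSON round-trip: a deep copy that is only safe for JSON-serializable attributes.
function cloneNode(node: ASTNode): ASTNode {
  return JSON.parse(JSON.stringify(node)) as ASTNode;
}

const docTree: ASTNode = {
  type: 'root',
  children: [
    { type: 'heading', children: [{ type: 'text', value: 'Title' }] },
    {
      type: 'paragraph',
      children: [
        { type: 'text', value: 'Body' },
        {
          type: 'link',
          attributes: { href: 'https://example.com' },
          children: [{ type: 'text', value: 'site' }],
        },
      ],
    },
  ],
};

const textNodes = findNodes(docTree, (n) => n.type === 'text');
const linkTargets = findNodes(docTree, (n) => n.type === 'link').map((l) => l.attributes?.href);
const shouted = transformNodes(docTree, (n) =>
  n.type === 'text' && n.value !== undefined ? { ...n, value: n.value.toUpperCase() } : n,
);
console.log(countNodes(docTree), textNodes.length, linkTargets, shouted.children?.[0]);
```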

Examples

Example 1: Parsing Headers

import { MarkdownParser, findNodes } from '@bernierllc/markdown-parser';

const parser = new MarkdownParser();
const result = await parser.parse(`
# Heading 1
## Heading 2
### Heading 3
`);

// Find all headings
const headings = findNodes(result.ast!, (node) => node.type === 'heading');
console.log('Found', headings.length, 'headings');

Example 2: Extract All Links

const parser = new MarkdownParser();
const result = await parser.parse('Check [Google](https://google.com) and [GitHub](https://github.com)');

const links = findNodes(result.ast!, (node) => node.type === 'link');
links.forEach((link) => {
  console.log('Link:', link.attributes?.href); // attributes is optional on ASTNode
});

Example 3: Count Code Blocks

const parser = new MarkdownParser();
const result = await parser.parse(`
\`\`\`javascript
const x = 1;
\`\`\`

\`\`\`python
print("hello")
\`\`\`
`);

const codeBlocks = findNodes(result.ast!, (node) => node.type === 'codeBlock');
console.log('Found', codeBlocks.length, 'code blocks');

Example 4: Performance Monitoring

const parser = new MarkdownParser({
  cache: { enabled: true },
});

const markdown = '# Heading\n\n' + 'Paragraph text. '.repeat(1000);

// First parse
const result1 = await parser.parse(markdown);
const firstMs = result1.metadata?.parseTime ?? 0;
console.log('First parse:', firstMs, 'ms');

// Cached parse
const result2 = await parser.parse(markdown);
const cachedMs = result2.metadata?.parseTime ?? 0;
console.log('Cached parse:', cachedMs, 'ms');
// Guard against dividing by zero when the cached parse rounds to 0 ms
if (cachedMs > 0) {
  console.log('Speedup:', (firstMs / cachedMs).toFixed(1) + 'x');
}

Integration Status

Logger Integration

Status: Integrated

Justification: This package uses @bernierllc/logger for debug output during markdown parsing operations. Logging includes parse timing, cache hits/misses, and error details to help with debugging and performance monitoring.

Pattern: Direct integration - @bernierllc/logger is a required package dependency, while passing a logger instance through the logger config option is optional.

NeverHub Integration

Status: Not applicable

Justification: This is a core utility package that performs markdown parsing. It does not participate in service discovery, event publishing, or service mesh operations. Markdown parsing is a stateless utility operation that doesn't require service registration or discovery.

Pattern: Core utility - no service mesh integration needed.

Docs-Suite Integration

Status: Ready

Format: TypeDoc-compatible JSDoc comments are included throughout the source code. All public APIs are documented with examples and type information.

Performance

Large documents: Parses 1000 headings in <1 second
Caching: 10-100x speedup for repeated parsing of the same input when caching is enabled
Memory efficient: Configurable cache size limits
Async API: parse() returns a Promise, so it fits into async/await code

Related Packages

@bernierllc/markdown-renderer - Renders AST to HTML
@bernierllc/logger - Structured logging
@bernierllc/crypto-utils - Hash generation for cache keys

License

This file is licensed to the client under a limited-use license. The client may use and modify this code only within the scope of the project it was delivered for. Redistribution or use in other products or commercial offerings is not permitted without written consent from Bernier LLC.
