nx-md-parser

v2.2.1 (published 2 months ago)

Extensible Multi-Format AI-Powered Markdown to JSON Transformer with support for any markdown variant through custom parsers, intelligent schema validation, auto-fixing, table parsing, and optional persistent machine learning

nx-md-parser

Extensible Multi-Format AI-Powered Markdown to JSON Transformer - Transform markdown documents into structured JSON with support for multiple markdown formats, intelligent schema validation, auto-fixing, and machine learning capabilities. Built with an extensible parser architecture that can handle any markdown variant.

✨ Features

📝 Extensible Multi-Format Markdown Support

Auto-Detection: Automatically detects the markdown format and selects the appropriate parser
Built-in Formats: Heading (### Section), bullet (- Section), and colon (Key: Value) formats
Custom Parsers: Easy to add support for any markdown variant (YAML frontmatter, numbered sections, etc.)
Parser Registry: Register multiple parsers for different formats
Format Selection: Explicitly specify format or let auto-detection choose
Backward Compatible: All existing code continues to work unchanged

🤖 Advanced AI-Powered Matching

Multi-Algorithm Fuzzy Matching: Jaccard tokens, Jaro-Winkler, Dice coefficient, Levenshtein ratio
Configurable Weights & Thresholds: Fine-tune matching sensitivity for different use cases
Machine Learning: Learn aliases and improve matching accuracy over time
Context-Aware: Different thresholds for key-to-key, title-to-key, and object matching
Persistent Learning (Optional): Save/load ML data with `@xronoces/xronox-ml`
Schema Consistency: Same intelligent matching for objects AND arrays

🔧 Intelligent Auto-Fixing

Typo Correction: Automatically fix property name typos using advanced fuzzy matching
Case Normalization: Handle camelCase, snake_case, Title Case seamlessly
Type Conversion: Smart conversion (string → number, string → boolean, etc.)
Structural Repair: Restructure flat objects into nested schemas
Missing Data: Add missing properties with sensible defaults
Content Intelligence: Parse tables, lists, key-value pairs automatically
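The "Missing Data" behaviour can be pictured with a small standalone sketch. The `fillMissing` helper and its per-type defaults (`""`, `0`, `false`, `[]`) are assumptions chosen for illustration; nx-md-parser's actual defaults and the way it reports fixes may differ.

```typescript
// Hypothetical sketch: fill properties missing from `data` with type-based defaults.
// The default values below are ASSUMPTIONS, not nx-md-parser's documented defaults.

type FieldType = "string" | "number" | "boolean" | "array";

const DEFAULTS: Record<FieldType, () => unknown> = {
  string: () => "",
  number: () => 0,
  boolean: () => false,
  array: () => [], // factory, so every result gets its own array instance
};

function fillMissing(
  data: Record<string, unknown>,
  shape: Record<string, FieldType>,
): { result: Record<string, unknown>; added: string[] } {
  const result: Record<string, unknown> = { ...data }; // never mutate the input
  const added: string[] = [];
  for (const [key, type] of Object.entries(shape)) {
    if (!(key in result)) {
      result[key] = DEFAULTS[type]();
      added.push(key); // record what was filled, e.g. to report a "fixed" status
    }
  }
  return { result, added };
}
```

For example, `fillMissing({ title: "X" }, { title: "string", tags: "array", active: "boolean" })` keeps `title` and adds `tags: []` and `active: false`, listing both in `added`.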

🏗️ Modular Architecture

Extensible Parser System: Add support for any markdown format by implementing BaseMarkdownParser
Clean Separation: Parsers, converters, transformers, and utilities in separate modules
Plugin Architecture: Register custom parsers for specialized formats (YAML, XML, custom syntax)
Format Detection: Intelligent auto-selection of appropriate parsers from registered options
Multiple Parsers: Support multiple parsers simultaneously for different document types

📋 Schema Validation

Intuitive Schema DSL: Clean, TypeScript-friendly schema definition
Nested Objects: Unlimited depth object support
Array Handling: Complex array schemas with validation
Table Parsing: Markdown tables (| Header |) → Arrays of objects
Advanced Content: Key-value pairs, nested structures, mixed content types
Validation Status: Clear validated, fixed, or failed status reporting

🔄 Enterprise-Grade nx-helpers Integration

Advanced Merging: Intelligent object merging with deduplication
Role-Based Aggregation: Merge data with specific roles using mergeWithRoles
Schema Loading: Load schemas from JSON files
Dual Transformer Support: Use either nx-md-parser or nx-helpers JSONTransformer

📝 Markdown Parsing Capabilities

nx-md-parser intelligently parses various markdown structures with support for multiple formats:

Heading Format (###)

### User Profile
John Doe

### Settings
Dark mode enabled

{
  "userProfile": "John Doe",
  "settings": "Dark mode enabled"
}
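For intuition, here is a rough sketch of how `###` sections could be split into camelCased keys, in plain TypeScript with no dependency on nx-md-parser. `splitHeadingSections` is a hypothetical name, and this toy version treats nested `####` subsections (as used in the Quick Start) as plain body text.

```typescript
// Hypothetical sketch, not nx-md-parser's HeadingParser: split "### Heading"
// sections into a key -> content map with camelCased keys.

function camelKey(heading: string): string {
  const words = heading.trim().toLowerCase().split(/\s+/);
  return words.map((w, i) => (i === 0 ? w : w.charAt(0).toUpperCase() + w.slice(1))).join("");
}

function splitHeadingSections(markdown: string): Record<string, string> {
  const result: Record<string, string> = {};
  let currentKey: string | null = null;
  let buffer: string[] = [];

  const flush = () => {
    if (currentKey !== null) result[currentKey] = buffer.join("\n").trim();
  };

  for (const line of markdown.split("\n")) {
    const match = /^###\s+(.+)$/.exec(line); // "####" does not match: "###" must be followed by whitespace
    if (match) {
      flush(); // close the previous section
      currentKey = camelKey(match[1]);
      buffer = [];
    } else if (currentKey !== null) {
      buffer.push(line); // body line of the current section
    }
  }
  flush();
  return result;
}
```

Applied to the heading example above, this returns `{ userProfile: "John Doe", settings: "Dark mode enabled" }`.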

Bullet Format (-)

- User Profile
John Doe

- Settings
Dark mode enabled

{
  "userProfile": "John Doe",
  "settings": "Dark mode enabled"
}

Auto-Detection

Both formats work identically - nx-md-parser automatically detects which format you're using!
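How detection works internally is not specified here. One plausible heuristic, sketched below as an assumption rather than a description of the library's FormatDetector, scores each format by the share of non-empty lines that match its section marker.

```typescript
// Hypothetical heuristic, not nx-md-parser's FormatDetector.

type Format = "heading" | "bullet" | "colon";

function detectFormat(markdown: string): { format: Format; confidence: number } {
  const lines = markdown.split("\n").map((l) => l.trim()).filter((l) => l.length > 0);
  if (lines.length === 0) return { format: "heading", confidence: 0 }; // arbitrary fallback

  const counts: Record<Format, number> = {
    heading: lines.filter((l) => /^#{1,6}\s+\S/.test(l)).length,
    bullet: lines.filter((l) => /^[-*]\s+\S/.test(l)).length,
    colon: lines.filter((l) => /^[A-Za-z][\w ]*:\s*\S/.test(l)).length,
  };

  // Pick the format whose marker covers the most lines; ties keep the earlier format.
  let best: Format = "heading";
  for (const f of Object.keys(counts) as Format[]) {
    if (counts[f] > counts[best]) best = f;
  }
  return { format: best, confidence: counts[best] / lines.length };
}
```

On the heading example above, half of the non-empty lines are `###` markers, so the sketch reports `heading` with confidence 0.5.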

Tables → Arrays of Objects

| Name | Age | Active |
|------|-----|--------|
| Alice | 28  | true   |
| Bob   | 34  | false  |

[
  { "name": "Alice", "age": 28, "active": true },
  { "name": "Bob", "age": 34, "active": false }
]
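A minimal table-to-array converter helps show what this transformation involves. It assumes a GitHub-style pipe table whose second row is the `|---|` separator, and it mirrors the lowercase headers and number/boolean coercion visible in the output above; `parseTable` is a hypothetical helper, not an nx-md-parser export.

```typescript
// Hypothetical sketch: convert a markdown pipe table into an array of objects.

type Cell = string | number | boolean;

function coerce(cell: string): Cell {
  if (/^(true|false)$/i.test(cell)) return cell.toLowerCase() === "true";
  if (/^-?\d+(\.\d+)?$/.test(cell)) return Number(cell);
  return cell;
}

function splitRow(line: string): string[] {
  // Drop the leading/trailing pipes, then split and trim each cell.
  return line.trim().replace(/^\|/, "").replace(/\|$/, "").split("|").map((c) => c.trim());
}

function parseTable(markdown: string): Record<string, Cell>[] {
  const rows = markdown.split("\n").map((l) => l.trim()).filter((l) => l.startsWith("|"));
  if (rows.length < 2) return [];
  const headers = splitRow(rows[0]).map((h) => h.toLowerCase());
  // rows[1] is assumed to be the |---|---| separator, so data starts at index 2.
  return rows.slice(2).map((row) => {
    const cells = splitRow(row);
    const obj: Record<string, Cell> = {};
    headers.forEach((h, i) => {
      obj[h] = coerce(cells[i] ?? "");
    });
    return obj;
  });
}
```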

Lists → Arrays

### Features
- Schema validation
- Auto-fixing capabilities
- Machine learning integration

{
  "features": [
    "Schema validation",
    "Auto-fixing capabilities",
    "Machine learning integration"
  ]
}

Key-Value Pairs → Nested Objects

### Database
Host: localhost
Port: 5432
SSL: true

{
  "database": {
    "host": "localhost",
    "port": 5432,
    "ssl": true
  }
}

Colon-Separated Format

Title: My Project
Description: Project description
Tags: TypeScript, React
Active: true

{
  "title": "My Project",
  "description": "Project description",
  "tags": ["TypeScript", "React"],
  "active": true
}
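The colon format maps naturally onto a line-by-line parser. The sketch below is an assumption about the general technique, not nx-md-parser's ColonParser: it camelCases keys, turns comma-separated values into arrays, and coerces booleans and plain numbers. `parseColonFormat` is a hypothetical name.

```typescript
// Hypothetical sketch of "Key: Value" parsing.

type Value = string | number | boolean | string[];

function camel(key: string): string {
  return key.trim().toLowerCase().split(/\s+/)
    .map((w, i) => (i === 0 ? w : w.charAt(0).toUpperCase() + w.slice(1)))
    .join("");
}

function parseScalar(raw: string): Value {
  if (/^(true|false)$/i.test(raw)) return raw.toLowerCase() === "true";
  if (/^-?\d+(\.\d+)?$/.test(raw)) return Number(raw); // "1.0.0" stays a string
  if (raw.includes(",")) return raw.split(",").map((part) => part.trim()).filter(Boolean);
  return raw;
}

function parseColonFormat(text: string): Record<string, Value> {
  const result: Record<string, Value> = {};
  for (const line of text.split("\n")) {
    const idx = line.indexOf(":"); // split on the first colon only, so URLs in values survive
    if (idx <= 0) continue; // skip lines without a "Key:" prefix
    result[camel(line.slice(0, idx))] = parseScalar(line.slice(idx + 1).trim());
  }
  return result;
}
```

Splitting on commas is a deliberate simplification: a sentence such as `Description: Fast, small, typed` would also become an array, so a schema-aware parser would presumably split only when the target field is declared as an array.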

Mixed Content Types

All parsing types can be nested and combined for complex document structures.

🚀 Installation

npm install nx-md-parser

Note: nx-helpers is a peer dependency. npm 7 and later install peer dependencies automatically; with older npm versions, install it explicitly (npm install nx-helpers).

🔍 Logging Configuration

The parser uses micro-logs for detailed logging of internal decision-making processes. Set the DEBUG_LEVEL environment variable to control log verbosity:

# Very verbose - shows all internal reasoning and decisions
DEBUG_LEVEL=debug npm test

# Important decisions and results
DEBUG_LEVEL=info npm test

# Warnings and potential issues only
DEBUG_LEVEL=warn npm test

# Errors only
DEBUG_LEVEL=error npm test

Or create a .env file:

DEBUG_LEVEL=debug

The logs provide visibility into:

Format detection reasoning and confidence scores
Parser selection decisions
Section header vs content classification
Content merging logic for bullet formats
Schema transformation steps

📖 Quick Start

import { JSONTransformer, Schema } from 'nx-md-parser';

// Define your schema
const schema = Schema.object({
  title: Schema.string(),
  tags: Schema.array(Schema.string()),
  metadata: Schema.object({
    author: Schema.string(),
    priority: Schema.string(),
  }),
  active: Schema.boolean(),
});

// Create transformer (auto-detects format)
const transformer = new JSONTransformer(schema);

// Works with heading format (###)
const headingResult = transformer.transformMarkdown(`
### Title
My Awesome Project

### Tags
- TypeScript
- React
- Node.js

### Metadata
#### Author
John Doe

#### Priority
High

### Active
true
`);

// Also works with bullet format (-)
const bulletResult = transformer.transformMarkdown(`
- Title
My Awesome Project

- Tags
- TypeScript
- React
- Node.js

- Metadata
Author: John Doe
Priority: High

- Active
true
`);

// And with colon format (Key: Value)
const colonResult = transformer.transformMarkdown(`
Title: My Awesome Project
Tags: TypeScript, React, Node.js
Metadata: Author - John Doe, Priority - High
Active: true
Version: 1.0.0
`);

// All formats produce the same structured result!
console.log(headingResult.result);  // Your structured JSON
console.log(bulletResult.result);   // Same structured JSON
console.log(colonResult.result);    // Same structured JSON

Format Selection & Auto-Detection

import { JSONTransformer, Schema, MarkdownFormat, analyzeMarkdownFormat } from 'nx-md-parser';

// Auto-detect format (recommended)
const transformer = new JSONTransformer(schema); // Automatically chooses best parser

// Force specific built-in format
const headingTransformer = new JSONTransformer(schema, {
  parserOptions: { format: MarkdownFormat.HEADING }
});

const bulletTransformer = new JSONTransformer(schema, {
  parserOptions: { format: MarkdownFormat.BULLET }
});

// Analyze what formats your markdown supports
const analysis = analyzeMarkdownFormat(yourMarkdown);
console.log('Primary format:', analysis.primaryFormat);
console.log('Confidence:', analysis.allMatches[0]?.confidence);
console.log('Section ranges:', analysis.allMatches[0]?.sectionRanges);

// Works with any registered format - the system is extensible!

🎯 Advanced Usage

Custom Fuzzy Matching Configuration

import { JSONTransformer, Schema, defaultMatcherConfig } from 'nx-md-parser';

const schema = Schema.object({
  title: Schema.string(),
  description: Schema.string(),
});

const transformer = new JSONTransformer(schema, {
  // Custom matcher configuration, passed via the matcherConfig option
  matcherConfig: {
    thresholds: {
      keyToKey: 0.8,        // Stricter threshold for key-to-key matching
      titleToKey: 0.6,      // More lenient threshold for title-to-key matching
      generic: 0.5          // Baseline threshold
    },
    weights: {
      jaroWinkler: 0.5,     // 50% weight on character similarity
      jaccardTokens: 0.3,   // 30% weight on token similarity
      dice: 0.2,            // 20% weight on n-gram similarity
    }
  }
});

Machine Learning - Learning Aliases

import { learnAliasesFromTransformations } from 'nx-md-parser';

// Learn from successful transformations
const learningResult = learnAliasesFromTransformations([
  {
    input: { "Projct Name": "Test", "Desc": "Test description" },
    output: { title: "Test", description: "Test description" },
    schema: yourSchema
  },
  // ... more examples
]);

console.log(learningResult.proposedAliases);
// { "Projct Name": ["title"], "Desc": ["description"] }
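One simple way to derive such proposals is to pair each input key with the output key that holds an identical value. The sketch below is a hypothetical illustration of that idea (`proposeAliases` is not an nx-md-parser export, and it ignores the `schema` field); it happens to reproduce the proposals shown above.

```typescript
// Hypothetical sketch of value-based alias learning.

interface Example {
  input: Record<string, unknown>;
  output: Record<string, unknown>;
}

function proposeAliases(examples: Example[]): Record<string, string[]> {
  const aliases: Record<string, string[]> = {};
  for (const { input, output } of examples) {
    for (const [inKey, inValue] of Object.entries(input)) {
      for (const [outKey, outValue] of Object.entries(output)) {
        // Compare via JSON so arrays/objects with equal content also match.
        if (inKey === outKey || JSON.stringify(inValue) !== JSON.stringify(outValue)) continue;
        if (!aliases[inKey]) aliases[inKey] = [];
        if (aliases[inKey].indexOf(outKey) === -1) aliases[inKey].push(outKey);
      }
    }
  }
  return aliases;
}
```

Matching on equal values alone is crude: two fields that both hold `true` would be proposed as aliases of each other, which is why a real learner would likely also weigh key similarity and require repeated evidence across examples.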

Schema Loading from Files

import { createTransformerFromSchemaFile } from 'nx-md-parser';

// schema.json
// {
//   "type": "object",
//   "properties": {
//     "title": { "type": "string" },
//     "tags": { "type": "array", "items": { "type": "string" } }
//   }
// }

const transformer = createTransformerFromSchemaFile('./schema.json');

Advanced Merging with Roles

import { mergeWithRoles } from 'nx-md-parser';

const roleBasedData = [
  { role: 'user-profile', value: { name: 'Alice', email: 'alice@example.com' } },
  { role: 'user-preferences', value: { theme: 'dark', notifications: true } },
  { role: 'account-settings', value: { plan: 'premium', storage: '100GB' } }
];

const merged = mergeWithRoles(roleBasedData);
// {
//   userProfile: { name: 'Alice', email: 'alice@example.com' },
//   userPreferences: { theme: 'dark', notifications: true },
//   accountSettings: { plan: 'premium', storage: '100GB' }
// }
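The role-to-key behaviour shown above can be approximated in a few lines. This is a hypothetical re-implementation for illustration only: `mergeByRole` is not the library's `mergeWithRoles`, and the rule that later items win when two items share a role is an assumption.

```typescript
// Hypothetical sketch: kebab-case roles become camelCase keys, and values that
// share a role are shallow-merged (later items win on conflicting fields).

interface RoleItem {
  role: string;
  value: Record<string, unknown>;
}

function roleToKey(role: string): string {
  return role.replace(/-([a-z0-9])/g, (_: string, ch: string) => ch.toUpperCase());
}

function mergeByRole(items: RoleItem[]): Record<string, Record<string, unknown>> {
  const merged: Record<string, Record<string, unknown>> = {};
  for (const { role, value } of items) {
    const key = roleToKey(role);
    merged[key] = { ...(merged[key] ?? {}), ...value };
  }
  return merged;
}
```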

Custom Parsers & Format Extension

import { BaseMarkdownParser, MarkdownFormat, MarkdownSection, getFormatDetector } from 'nx-md-parser';

// Example: YAML Frontmatter parser
class YamlFrontmatterParser extends BaseMarkdownParser {
  canParse(markdown: string): boolean {
    return markdown.startsWith('---\n');
  }

  parseSections(markdown: string): MarkdownSection[] {
    // Parse YAML frontmatter + markdown body
    return [];
  }

  getFormatName(): MarkdownFormat {
    return 'yaml-frontmatter' as any;
  }
}

// Example: Numbered sections parser
class NumberedSectionsParser extends BaseMarkdownParser {
  canParse(markdown: string): boolean {
    return /^\d+\.\s/.test(markdown);
  }

  parseSections(markdown: string): MarkdownSection[] {
    // Parse numbered sections like "1. Introduction"
    return [];
  }

  getFormatName(): MarkdownFormat {
    return 'numbered-sections' as any;
  }
}

// Register multiple custom parsers
const detector = getFormatDetector();
detector.registerParser(new YamlFrontmatterParser());
detector.registerParser(new NumberedSectionsParser());

// Now supports: headings, bullets, colon format, YAML frontmatter, numbered sections, etc.
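The `parseSections` bodies above are left empty. As a hedged sketch of what the numbered-sections logic might look like, here is a standalone version; `MarkdownSection` is redeclared locally (mirroring the shape listed in the API reference, with `format` widened to `string`) so the code runs without nx-md-parser, and wiring it into a `BaseMarkdownParser` subclass is left to you.

```typescript
// Standalone sketch; function names are hypothetical.

interface MarkdownSection {
  heading: string;
  content: string;
  level: number;
  format: string;
}

const NUMBERED = /^(\d+(?:\.\d+)*)\.\s+(.+)$/; // matches "1. Intro" and "2.1. Details"

function canParseNumbered(markdown: string): boolean {
  return /^\d+(?:\.\d+)*\.\s/m.test(markdown); // any line, not just the first
}

function parseNumberedSections(markdown: string): MarkdownSection[] {
  const sections: MarkdownSection[] = [];
  let current: MarkdownSection | null = null;
  for (const line of markdown.split("\n")) {
    const m = NUMBERED.exec(line.trim());
    if (m) {
      current = {
        heading: m[2].trim(),
        content: "",
        level: m[1].split(".").length, // "2.1" -> level 2
        format: "numbered-sections",
      };
      sections.push(current);
    } else if (current) {
      current.content += (current.content ? "\n" : "") + line;
    }
  }
  // Trim surrounding blank lines from each section body.
  return sections.map((s) => ({ ...s, content: s.content.trim() }));
}
```

A numbered list inside a section body (`1. step one`) would be misread as a new section; a production parser would need an extra rule, for example requiring a blank line before each numbered heading.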

JSON to Markdown Generation

import { jsonToMarkdown } from 'nx-md-parser';

const data = {
  title: "Project Alpha",
  features: ["AI", "ML", "Cloud"],
  metadata: { version: "1.0.0" }
};

console.log(jsonToMarkdown(data));
// # Title
// Project Alpha
//
// # Features
// - AI
// - ML
// - Cloud
//
// # Metadata
// ## Version
// 1.0.0
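The output above follows a simple recursive rule: keys become headings, arrays become bullet lists, and nested objects recurse one heading level deeper. A sketch of that rule, as an illustration rather than the library's `jsonToMarkdown` source (the helper names are hypothetical), reproduces the output shown:

```typescript
// Hypothetical sketch of JSON -> markdown rendering.

function titleCase(key: string): string {
  // "userProfile" -> "User Profile" (camelCase splitting is an assumption)
  const spaced = key.replace(/([a-z0-9])([A-Z])/g, "$1 $2");
  return spaced.charAt(0).toUpperCase() + spaced.slice(1);
}

function toMarkdown(data: Record<string, unknown>, level = 1): string {
  const blocks: string[] = [];
  for (const [key, value] of Object.entries(data)) {
    const heading = `${"#".repeat(level)} ${titleCase(key)}`;
    let body: string;
    if (Array.isArray(value)) {
      body = value.map((item) => `- ${String(item)}`).join("\n"); // arrays -> bullets
    } else if (value !== null && typeof value === "object") {
      body = toMarkdown(value as Record<string, unknown>, level + 1); // nest one level deeper
    } else {
      body = String(value);
    }
    blocks.push(`${heading}\n${body}`);
  }
  return blocks.join("\n\n");
}
```

The example data above only has single-word keys, so how `jsonToMarkdown` renders keys like `userProfile` is not confirmed here.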

📚 API Reference

Core Classes

`JSONTransformer`

new JSONTransformer(
  schema: SchemaType,
  options?: {
    matcherConfig?: Partial<MatcherConfig>;
    parserOptions?: ParserOptions;
  }
)

transformMarkdown(markdown: string): TransformResult
transform(input: any): TransformResult

Parser Options:

interface ParserOptions {
  format?: MarkdownFormat;           // AUTO, HEADING, BULLET, COLON, MIXED
  sectionKeywords?: string[];        // Keywords for bullet section detection
  fuzzyThreshold?: number;          // Fuzzy matching threshold
}

`LearningTransformer` (Optional)

new LearningTransformer(
  schema: SchemaType,
  matcherConfig?: Partial<MatcherConfig>,
  mlOptions?: {
    storage?: { type: 'file' | 'database', path?: string },
    enableLearning?: boolean
  }
)

transformMarkdown(markdown: string): TransformResult
transform(input: any): TransformResult
transformMarkdownWithLearning(markdown: string): Promise<TransformResult>
transformWithLearning(input: any): Promise<TransformResult>

Requires: npm install @xronoces/xronox-ml

Features:

Persistent machine learning data storage
Continuous improvement from transformation history
Automatic loading of learned configurations
Graceful fallback when the ML package is unavailable

Parser Classes

BaseMarkdownParser - Abstract base class for creating custom parsers

abstract class BaseMarkdownParser {
  abstract canParse(markdown: string): boolean;
  abstract parseSections(markdown: string): MarkdownSection[];
  abstract getFormatName(): MarkdownFormat;
}

HeadingParser - Parses ### Section format

import { HeadingParser } from 'nx-md-parser';
const parser = new HeadingParser();

BulletParser - Parses - Section bullet format

import { BulletParser } from 'nx-md-parser';
const parser = new BulletParser();

ColonParser - Parses Key: Value colon format

import { ColonParser } from 'nx-md-parser';
const parser = new ColonParser();

FormatDetector - Auto-detects and selects appropriate parsers

import { FormatDetector, getFormatDetector, analyzeMarkdownFormat } from 'nx-md-parser';

const detector = getFormatDetector();
const format = detector.detect(markdown);  // MarkdownFormat
const parser = detector.getParser(format, markdown);

// Advanced format analysis with confidence scores and line ranges
const analysis = analyzeMarkdownFormat(markdown);
console.log(analysis.primaryFormat);     // 'heading' | 'bullet' | 'colon'
console.log(analysis.allMatches[0]);     // { format, confidence, sections, sectionRanges }

Schema Builders

Schema.string(): SchemaType
Schema.number(): SchemaType
Schema.boolean(): SchemaType
Schema.array(items: SchemaType): SchemaType
Schema.object(properties: Record<string, SchemaType>): SchemaType
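To make the builder signatures concrete, here is a hypothetical mini DSL in the same spirit, with a validator that reports problems by path. `S` and `validate` are illustrative names; this sketch only reports mismatches and does not attempt the fuzzy repairs that would lead nx-md-parser to a "fixed" status.

```typescript
// Hypothetical mini schema DSL, not nx-md-parser's SchemaType.

type SchemaNode =
  | { kind: "string" }
  | { kind: "number" }
  | { kind: "boolean" }
  | { kind: "array"; items: SchemaNode }
  | { kind: "object"; properties: Record<string, SchemaNode> };

const S = {
  string: (): SchemaNode => ({ kind: "string" }),
  number: (): SchemaNode => ({ kind: "number" }),
  boolean: (): SchemaNode => ({ kind: "boolean" }),
  array: (items: SchemaNode): SchemaNode => ({ kind: "array", items }),
  object: (properties: Record<string, SchemaNode>): SchemaNode => ({ kind: "object", properties }),
};

/** Returns "path: problem" messages; an empty array means the value matches the schema. */
function validate(node: SchemaNode, value: unknown, path = "$"): string[] {
  switch (node.kind) {
    case "string":
    case "number":
    case "boolean":
      return typeof value === node.kind ? [] : [`${path}: expected ${node.kind}`];
    case "array": {
      if (!Array.isArray(value)) return [`${path}: expected array`];
      const items = node.items; // capture the narrowed field for use inside the callback
      return value.reduce<string[]>((errs, item, i) => errs.concat(validate(items, item, `${path}[${i}]`)), []);
    }
    case "object": {
      if (value === null || typeof value !== "object" || Array.isArray(value)) return [`${path}: expected object`];
      const props = node.properties;
      const obj = value as Record<string, unknown>;
      return Object.keys(props).reduce<string[]>(
        (errs, key) =>
          errs.concat(key in obj ? validate(props[key], obj[key], `${path}.${key}`) : [`${path}.${key}: missing`]),
        [],
      );
    }
  }
}
```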

Utility Functions

Transformation Utilities

mergeTransformResults(...results: TransformResult[]): TransformResult
jsonToMarkdown(data: any, level?: number): string

Format Analysis

analyzeMarkdownFormat(markdown: string): FormatAnalysisResult
// Returns detailed analysis of what formats the markdown supports
// with confidence scores, section counts, and line ranges

Schema Management

loadSchemaFromFile(filePath: string): SchemaType
createTransformerFromSchemaFile(schemaFilePath: string): JSONTransformer
createNxHelpersTransformer(schema: SchemaType, config?: Partial<MatcherConfig>): any

Machine Learning

learnAliasesFromTransformations(transformations: TransformationExample[]): LearningResult

Parser Types & Enums

enum MarkdownFormat {
  AUTO = 'auto',       // Auto-detect format
  HEADING = 'heading',  // ### Section format
  BULLET = 'bullet',    // - Section format
  COLON = 'colon',      // Key: Value format
  MIXED = 'mixed'       // Mixed formats
}

interface MarkdownSection {
  heading: string;
  content: string;
  level: number;
  format: 'heading' | 'bullet' | 'mixed';
}

interface ParserOptions {
  format?: MarkdownFormat;
  sectionKeywords?: string[];
  fuzzyThreshold?: number;
}

nx-helpers Integration

// Merging
mergeNoRedundancy<T>(base: T, override: Partial<T>): T
mergeMultiple<T>(...objects: Partial<T>[]): T
mergeWithRoles(items: MergableItem[]): any

// Matching
bestMatchOneToMany(term: string, candidates: string[], config: MatcherConfig): StringScore | null
defaultMatcherConfig(): MatcherConfig

// Schema Building (nx-helpers)
nxString: SchemaNode
nxNumber: SchemaNode
nxBoolean: SchemaNode
nxArray(items: SchemaNode): SchemaNode
nxObject(properties: Record<string, SchemaNode>): SchemaNode

🔬 Examples

Run the comprehensive examples:

# Basic usage
npm run example

# Advanced features (merging, ML, etc.)
npm run integration-example

🧪 Testing

npm test              # Run test suite
npm run test:watch    # Watch mode

📊 Performance & Accuracy

Matching Algorithms (nx-helpers v1.5.0)

Jaro-Winkler: Character-level similarity (40% weight)
Jaccard Tokens: Token-based similarity (30% weight)
Dice Coefficient: N-gram similarity (20% weight)
Levenshtein Ratio: Edit distance (10% weight)
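To illustrate how these four metrics and weights could combine into one score, here is a self-contained sketch. The metric implementations are standard textbook versions, and blending them as a weighted average is an assumption; nx-helpers' actual formulas and normalisation may differ.

```typescript
// Illustrative metrics blended as a weighted average (ASSUMED blending formula).

interface Weights { jaroWinkler: number; jaccardTokens: number; dice: number; levenshtein: number }

const DEFAULT_WEIGHTS: Weights = { jaroWinkler: 0.4, jaccardTokens: 0.3, dice: 0.2, levenshtein: 0.1 };

// Character-level metrics compare lowercase alphanumerics only.
const normalize = (s: string): string => s.toLowerCase().replace(/[^a-z0-9]/g, "");

// The token-level metric splits on camelCase boundaries and non-alphanumerics.
const tokens = (s: string): string[] =>
  s.replace(/([a-z0-9])([A-Z])/g, "$1 $2").toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);

function jaro(a: string, b: string): number {
  if (a === b) return 1;
  if (a.length === 0 || b.length === 0) return 0;
  const win = Math.max(0, Math.floor(Math.max(a.length, b.length) / 2) - 1);
  const aHit: boolean[] = new Array(a.length).fill(false);
  const bHit: boolean[] = new Array(b.length).fill(false);
  let m = 0;
  for (let i = 0; i < a.length; i++) {
    for (let j = Math.max(0, i - win); j < Math.min(b.length, i + win + 1); j++) {
      if (!bHit[j] && a[i] === b[j]) { aHit[i] = bHit[j] = true; m++; break; }
    }
  }
  if (m === 0) return 0;
  let t = 0; // matched characters that appear in a different order
  for (let i = 0, k = 0; i < a.length; i++) {
    if (!aHit[i]) continue;
    while (!bHit[k]) k++;
    if (a[i] !== b[k++]) t++;
  }
  return (m / a.length + m / b.length + (m - t / 2) / m) / 3;
}

// Jaro-Winkler boosts Jaro for a shared prefix of up to 4 characters.
function jaroWinkler(a: string, b: string, p = 0.1): number {
  const j = jaro(a, b);
  let l = 0;
  while (l < Math.min(4, a.length, b.length) && a[l] === b[l]) l++;
  return j + l * p * (1 - j);
}

function jaccardTokens(a: string, b: string): number {
  const ta = Array.from(new Set(tokens(a)));
  const tb = new Set(tokens(b));
  if (ta.length === 0 && tb.size === 0) return 1;
  const shared = ta.filter((t) => tb.has(t)).length;
  return shared / (ta.length + tb.size - shared);
}

// Sørensen-Dice over character bigrams (multiset overlap).
function dice(a: string, b: string): number {
  const grams = (s: string) => Array.from({ length: Math.max(0, s.length - 1) }, (_, i) => s.slice(i, i + 2));
  const x = grams(a);
  const y = grams(b);
  if (x.length + y.length === 0) return a === b ? 1 : 0;
  const pool = new Map<string, number>();
  for (const g of x) pool.set(g, (pool.get(g) ?? 0) + 1);
  let overlap = 0;
  for (const g of y) {
    const left = pool.get(g) ?? 0;
    if (left > 0) { overlap++; pool.set(g, left - 1); }
  }
  return (2 * overlap) / (x.length + y.length);
}

// 1 - (edit distance / longer length), using a single-row dynamic program.
function levenshteinRatio(a: string, b: string): number {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0];
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      row[j] = Math.min(above + 1, row[j - 1] + 1, diag + (a[i - 1] === b[j - 1] ? 0 : 1));
      diag = above;
    }
  }
  const maxLen = Math.max(a.length, b.length);
  return maxLen === 0 ? 1 : 1 - row[b.length] / maxLen;
}

function similarity(a: string, b: string, w: Weights = DEFAULT_WEIGHTS): number {
  const total = w.jaroWinkler + w.jaccardTokens + w.dice + w.levenshtein;
  if (total === 0) return 0;
  const [na, nb] = [normalize(a), normalize(b)];
  const score =
    w.jaroWinkler * jaroWinkler(na, nb) +
    w.jaccardTokens * jaccardTokens(a, b) +
    w.dice * dice(na, nb) +
    w.levenshtein * levenshteinRatio(na, nb);
  return score / total; // normalise so custom weights need not sum to 1
}
```

Under this sketch, `similarity("Project Name", "projectName")` is 1 (pure case variation), while the typo `Projct Name` scores about 0.75 against `projectName`: enough to clear the example `titleToKey` threshold of 0.6 from the configuration section, but not the `keyToKey` threshold of 0.8. Real nx-helpers scores may differ.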

Real-World Results

Typo Correction: 85%+ accuracy on common typos
Case Handling: 100% accuracy on case variations
Context Awareness: Different thresholds for different match types
Machine Learning: Continuous improvement with usage data

🤝 Contributing

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

📄 License

ISC License - see LICENSE file for details.

🙏 Acknowledgments

Built on nx-helpers for advanced AI capabilities
Inspired by the need for intelligent markdown processing in enterprise workflows
Thanks to the nx-intelligence team for the powerful fuzzy matching algorithms

📞 Support

Issues: GitHub Issues
Discussions: GitHub Discussions

📋 What's New in v2.1

Format Analysis API: New analyzeMarkdownFormat() function provides detailed format detection with confidence scores and line ranges
Multi-Format Detection: Detects all supported formats in a document with ranking by confidence
Section Range Analysis: Get exact line numbers for each section in your markdown
Mixed Content Detection: Identifies documents that contain multiple format types

📋 What's New in v2.0

Extensible Multi-Format Architecture: Support for any markdown variant through custom parsers, not just headings and bullets
Intelligent Parser System: Auto-detection and selection from multiple registered parsers
Plugin Architecture: Easy to extend with custom parsers for YAML frontmatter, numbered sections, XML, or any format
Modular Design: Clean separation of parsing, conversion, and transformation logic
Backward Compatibility: All existing code continues to work unchanged
Enhanced TypeScript: Better type safety with extensible parser interfaces

Made with ❤️ by the nx-intelligence team
