nx-md-parser
v2.2.1
Extensible Multi-Format AI-Powered Markdown to JSON Transformer with support for any markdown variant through custom parsers, intelligent schema validation, auto-fixing, table parsing, and optional persistent machine learning
Extensible Multi-Format AI-Powered Markdown to JSON Transformer - Transform markdown documents into structured JSON with support for multiple markdown formats, intelligent schema validation, auto-fixing, and machine learning capabilities. Built with an extensible parser architecture that can handle any markdown variant.
✨ Features
📝 Extensible Multi-Format Markdown Support
- Auto-Detection: Automatically detects and selects appropriate parser for any markdown format
- Built-in Formats: Heading (### Section), bullet (- Section), and colon (Key: Value) formats
- Custom Parsers: Easy to add support for any markdown variant (YAML frontmatter, numbered sections, etc.)
- Parser Registry: Register multiple parsers for different formats
- Format Selection: Explicitly specify format or let auto-detection choose
- Backward Compatible: All existing code continues to work unchanged
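As a rough illustration of how auto-detection of the three built-in formats could work, the sketch below counts lines that look like headings, bullets, or Key: Value pairs and ranks the formats by their share of non-empty lines. It is not the library's detector: `scoreFormats`, `FormatScore`, and the confidence formula are assumptions made up for this example.

```typescript
// Hypothetical sketch, not nx-md-parser's actual detection logic.
type DetectedFormat = "heading" | "bullet" | "colon";

interface FormatScore {
  format: DetectedFormat;
  confidence: number; // share of non-empty lines that carry this format's marker
}

function scoreFormats(markdown: string): FormatScore[] {
  const lines = markdown
    .split("\n")
    .map((l) => l.trim())
    .filter((l) => l.length > 0);
  const counts: Record<DetectedFormat, number> = { heading: 0, bullet: 0, colon: 0 };
  for (const line of lines) {
    if (/^#{1,6}\s+\S/.test(line)) counts.heading++;
    else if (/^[-*]\s+\S/.test(line)) counts.bullet++;
    else if (/^[^:]+:\s+\S/.test(line)) counts.colon++;
  }
  const total = lines.length || 1;
  const formats: DetectedFormat[] = ["heading", "bullet", "colon"];
  // Highest confidence first.
  return formats
    .map((format) => ({ format, confidence: counts[format] / total }))
    .sort((a, b) => b.confidence - a.confidence);
}

const ranked = scoreFormats("### Title\nMy Project\n### Active\ntrue");
console.log(ranked[0].format); // heading
```

A real detector would also need to tell section bullets apart from list items inside a heading section, which this line count does not attempt.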
🤖 Advanced AI-Powered Matching
- Multi-Algorithm Fuzzy Matching: Jaccard tokens, Jaro-Winkler, Dice coefficient, Levenshtein ratio
- Configurable Weights & Thresholds: Fine-tune matching sensitivity for different use cases
- Machine Learning: Learn aliases and improve matching accuracy over time
- Context-Aware: Different thresholds for key-to-key, title-to-key, and object matching
- Persistent Learning (Optional): Save/load ML data with @xronoces/xronox-ml
- Schema Consistency: Same intelligent matching for objects AND arrays
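To make two of the similarity measures named above concrete, here is a minimal sketch of token-based Jaccard similarity and bigram-based Dice similarity. The function names and the tokenization rules are assumptions for illustration; the matching code in nx-helpers may tokenize and normalize differently.

```typescript
// Hypothetical helpers, not the nx-helpers implementation.

// Split on case boundaries, underscores, hyphens, and whitespace, then lowercase.
function tokenize(name: string): string[] {
  return name
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .split(/[\s_\-]+/)
    .map((t) => t.toLowerCase())
    .filter((t) => t.length > 0);
}

function unique(items: string[]): string[] {
  return items.filter((item, i) => items.indexOf(item) === i);
}

// Jaccard similarity over token sets: |A ∩ B| / |A ∪ B|
function jaccardTokens(a: string, b: string): number {
  const ta = unique(tokenize(a));
  const tb = unique(tokenize(b));
  if (ta.length === 0 && tb.length === 0) return 1;
  const shared = ta.filter((t) => tb.indexOf(t) !== -1).length;
  return shared / (ta.length + tb.length - shared);
}

// Character bigrams of the lowercased string with non-alphanumerics removed.
function bigrams(s: string): string[] {
  const flat = s.toLowerCase().replace(/[^a-z0-9]/g, "");
  const out: string[] = [];
  for (let i = 0; i < flat.length - 1; i++) out.push(flat.slice(i, i + 2));
  return out;
}

// Dice coefficient over bigram multisets: 2|A ∩ B| / (|A| + |B|)
function diceBigrams(a: string, b: string): number {
  const ba = bigrams(a);
  const bb = bigrams(b);
  if (ba.length + bb.length === 0) return 1;
  const remaining = bb.slice();
  let shared = 0;
  for (const g of ba) {
    const idx = remaining.indexOf(g);
    if (idx !== -1) {
      shared++;
      remaining.splice(idx, 1); // each bigram in b can be matched only once
    }
  }
  return (2 * shared) / (ba.length + bb.length);
}

console.log(jaccardTokens("user_profile", "User Profile")); // 1
console.log(diceBigrams("Projct Name", "projectName"));
```

Token-based scores are forgiving about case and separators, while bigram scores tolerate typos inside a word, which is one reason to combine several measures.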
🔧 Intelligent Auto-Fixing
- Typo Correction: Automatically fix property name typos using advanced fuzzy matching
- Case Normalization: Handle camelCase, snake_case, Title Case seamlessly
- Type Conversion: Smart conversion (string → number, string → boolean, etc.)
- Structural Repair: Restructure flat objects into nested schemas
- Missing Data: Add missing properties with sensible defaults
- Content Intelligence: Parse tables, lists, key-value pairs automatically
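The case normalization, type conversion, and key-value parsing items above can be pictured together in one small sketch. The helper names (`toCamelKey`, `coerceScalar`, `parseKeyValueLines`) are hypothetical and the rules are deliberately simple; the library's own conversion logic is not shown in this README and may differ.

```typescript
// Hypothetical sketch of key normalization and scalar conversion.
type Scalar = string | number | boolean;

// "User Profile", "user_profile", "user-profile", "userProfile" -> "userProfile"
function toCamelKey(label: string): string {
  const words = label
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .split(/[\s_\-]+/)
    .filter((w) => w.length > 0)
    .map((w) => w.toLowerCase());
  return words
    .map((w, i) => (i === 0 ? w : w.charAt(0).toUpperCase() + w.slice(1)))
    .join("");
}

// "5432" -> 5432, "true" -> true, anything else stays a trimmed string.
function coerceScalar(raw: string): Scalar {
  const text = raw.trim();
  if (/^(true|false)$/i.test(text)) return text.toLowerCase() === "true";
  if (/^-?\d+(\.\d+)?$/.test(text)) return Number(text);
  return text;
}

// Parse "Key: Value" lines into an object with normalized keys and typed values.
function parseKeyValueLines(block: string): Record<string, Scalar> {
  const result: Record<string, Scalar> = {};
  for (const line of block.split("\n")) {
    const sep = line.indexOf(":");
    if (sep <= 0) continue; // skip lines without a key
    result[toCamelKey(line.slice(0, sep))] = coerceScalar(line.slice(sep + 1));
  }
  return result;
}

console.log(parseKeyValueLines("Host: localhost\nPort: 5432\nSSL: true"));
// { host: 'localhost', port: 5432, ssl: true }
```

Note that a value such as 1.0.0 is left as a string because it does not match the numeric pattern.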
🏗️ Modular Architecture
- Extensible Parser System: Add support for any markdown format by implementing BaseMarkdownParser
- Clean Separation: Parsers, converters, transformers, and utilities in separate modules
- Plugin Architecture: Register custom parsers for specialized formats (YAML, XML, custom syntax)
- Format Detection: Intelligent auto-selection of appropriate parsers from registered options
- Multiple Parsers: Support multiple parsers simultaneously for different document types
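The registry idea behind this architecture can be sketched without the library: parsers declare whether they can handle a document, and a registry picks the first one that says yes. `SketchParser`, `SketchRegistry`, and the priority rule (latest registration wins) are assumptions for this example, not the package's classes.

```typescript
// Hypothetical registry sketch, independent of nx-md-parser's real classes.
interface SketchSection {
  heading: string;
  content: string;
}

interface SketchParser {
  name: string;
  canParse(markdown: string): boolean;
  parseSections(markdown: string): SketchSection[];
}

class SketchRegistry {
  private parsers: SketchParser[] = [];

  register(parser: SketchParser): void {
    // Assumption: later registrations are tried first.
    this.parsers.unshift(parser);
  }

  select(markdown: string): SketchParser | undefined {
    for (const parser of this.parsers) {
      if (parser.canParse(markdown)) return parser;
    }
    return undefined;
  }
}

// Numbered sections such as "1. Introduction" followed by body lines.
const numberedParser: SketchParser = {
  name: "numbered-sections",
  canParse: (md) => /^\d+\.\s/m.test(md),
  parseSections: (md) => {
    const sections: SketchSection[] = [];
    for (const line of md.split("\n")) {
      const match = /^\d+\.\s+(.*)$/.exec(line);
      if (match) {
        sections.push({ heading: match[1] || "", content: "" });
      } else if (sections.length > 0 && line.trim().length > 0) {
        const last = sections[sections.length - 1];
        last.content = last.content ? last.content + "\n" + line.trim() : line.trim();
      }
    }
    return sections;
  },
};

const registry = new SketchRegistry();
registry.register(numberedParser);
const doc = "1. Introduction\nHello\n2. Usage\nRun it";
const chosen = registry.select(doc);
console.log(chosen ? chosen.parseSections(doc) : []);
```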
📋 Schema Validation
- Intuitive Schema DSL: Clean, TypeScript-friendly schema definition
- Nested Objects: Unlimited depth object support
- Array Handling: Complex array schemas with validation
- Table Parsing: Markdown tables (| Header |) → Arrays of objects
- Advanced Content: Key-value pairs, nested structures, mixed content types
- Validation Status: Clear validated, fixed, or failed status reporting
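One way to read the three statuses: validated means the input already matched the schema, fixed means it matched after automatic repair, and failed means at least one field could not be repaired. The sketch below applies that reading to a flat schema. `checkAgainstSchema` and the shape of its return value are assumptions for illustration and do not describe the library's TransformResult.

```typescript
// Hypothetical sketch of status reporting on a flat schema.
type FieldType = "string" | "number" | "boolean";
type Status = "validated" | "fixed" | "failed";

interface CheckResult {
  status: Status;
  result: Record<string, unknown>;
  errors: string[];
}

function checkAgainstSchema(
  schema: Record<string, FieldType>,
  input: Record<string, unknown>
): CheckResult {
  const result: Record<string, unknown> = {};
  const errors: string[] = [];
  let fixed = false;

  for (const key of Object.keys(schema)) {
    const expected = schema[key];
    const value = input[key];
    if (typeof value === expected) {
      result[key] = value; // already valid
    } else if (
      expected === "number" &&
      typeof value === "string" &&
      value.trim() !== "" &&
      !isNaN(Number(value))
    ) {
      result[key] = Number(value); // repaired by string -> number conversion
      fixed = true;
    } else if (expected === "boolean" && (value === "true" || value === "false")) {
      result[key] = value === "true"; // repaired by string -> boolean conversion
      fixed = true;
    } else {
      errors.push(key + ": expected " + expected + ", got " + typeof value);
    }
  }

  const status: Status = errors.length > 0 ? "failed" : fixed ? "fixed" : "validated";
  return { status, result, errors };
}

console.log(checkAgainstSchema({ age: "number" }, { age: "28" }).status); // fixed
```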
🔄 Enterprise-Grade nx-helpers Integration
- Advanced Merging: Intelligent object merging with deduplication
- Role-Based Aggregation: Merge data with specific roles using mergeWithRoles
- Schema Loading: Load schemas from JSON files
- Dual Transformer Support: Use either nx-md-parser or nx-helpers JSONTransformer
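Role-based aggregation can be pictured as grouping values under a key derived from each role name. The sketch below turns kebab-case roles into camelCase keys, which matches the output shown in the mergeWithRoles example later in this README. `mergeByRole` is a hypothetical stand-in, and the rule that later items override earlier fields for the same role is an assumption.

```typescript
// Hypothetical sketch; the real mergeWithRoles comes from nx-helpers.
interface RoleItem {
  role: string;
  value: Record<string, unknown>;
}

// "user-profile" -> "userProfile"
function kebabToCamel(role: string): string {
  return role.replace(/-([a-z0-9])/g, (_m, ch: string) => ch.toUpperCase());
}

function mergeByRole(items: RoleItem[]): Record<string, Record<string, unknown>> {
  const merged: Record<string, Record<string, unknown>> = {};
  for (const item of items) {
    const key = kebabToCamel(item.role);
    const target = merged[key] || {};
    // Assumption: fields from later items overwrite earlier ones for the same role.
    for (const field of Object.keys(item.value)) {
      target[field] = item.value[field];
    }
    merged[key] = target;
  }
  return merged;
}

console.log(
  mergeByRole([
    { role: "user-preferences", value: { theme: "dark" } },
    { role: "user-preferences", value: { notifications: true } },
  ])
);
// { userPreferences: { theme: 'dark', notifications: true } }
```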
📝 Markdown Parsing Capabilities
nx-md-parser intelligently parses various markdown structures with support for multiple formats:
Heading Format (###)
### User Profile
John Doe
### Settings
Dark mode enabled
{
  "userProfile": "John Doe",
  "settings": "Dark mode enabled"
}
Bullet Format (-)
- User Profile
John Doe
- Settings
Dark mode enabled
{
  "userProfile": "John Doe",
  "settings": "Dark mode enabled"
}
Auto-Detection
Both formats work identically - nx-md-parser automatically detects which format you're using!
Tables → Arrays of Objects
| Name | Age | Active |
|------|-----|--------|
| Alice | 28 | true |
| Bob | 34 | false |
[
  { "name": "Alice", "age": 28, "active": true },
  { "name": "Bob", "age": 34, "active": false }
]
Lists → Arrays
### Features
- Schema validation
- Auto-fixing capabilities
- Machine learning integration
{
  "features": [
    "Schema validation",
    "Auto-fixing capabilities",
    "Machine learning integration"
  ]
}
Key-Value Pairs → Nested Objects
### Database
Host: localhost
Port: 5432
SSL: true
{
  "database": {
    "host": "localhost",
    "port": 5432,
    "ssl": true
  }
}
Colon-Separated Format
Title: My Project
Description: Project description
Tags: TypeScript, React
Active: true
{
  "title": "My Project",
  "description": "Project description",
  "tags": ["TypeScript", "React"],
  "active": true
}
Mixed Content Types
All parsing types can be nested and combined for complex document structures.
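The Tables → Arrays of Objects conversion shown earlier in this section can be approximated in a few lines. This is a sketch under simple assumptions (one header row, one separator row, single-word headers lowercased on their first letter); `parseMarkdownTable` is a hypothetical name and not the library's table parser.

```typescript
// Hypothetical sketch of markdown table parsing.
type Cell = string | number | boolean;

// "| a | b |" -> ["a", "b"]
function splitRow(line: string): string[] {
  return line
    .trim()
    .replace(/^\|/, "")
    .replace(/\|$/, "")
    .split("|")
    .map((c) => c.trim());
}

function toCell(text: string): Cell {
  if (text === "true" || text === "false") return text === "true";
  if (/^-?\d+(\.\d+)?$/.test(text)) return Number(text);
  return text;
}

function parseMarkdownTable(table: string): Array<Record<string, Cell>> {
  const lines = table
    .split("\n")
    .map((l) => l.trim())
    .filter((l) => l.charAt(0) === "|");
  if (lines.length < 2) return [];
  const headers = splitRow(lines[0] || "").map(
    (h) => h.charAt(0).toLowerCase() + h.slice(1)
  );
  const rows: Array<Record<string, Cell>> = [];
  // lines[1] is the |---|---| separator row, so data starts at index 2.
  for (const line of lines.slice(2)) {
    const cells = splitRow(line);
    const row: Record<string, Cell> = {};
    headers.forEach((header, i) => {
      row[header] = toCell(cells[i] || "");
    });
    rows.push(row);
  }
  return rows;
}

const tableText = [
  "| Name | Age | Active |",
  "|------|-----|--------|",
  "| Alice | 28 | true |",
  "| Bob | 34 | false |",
].join("\n");
console.log(parseMarkdownTable(tableText));
```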
🚀 Installation
npm install nx-md-parser
Note: nx-helpers is a peer dependency and will be installed automatically.
🔍 Logging Configuration
The parser uses micro-logs for detailed logging of internal decision-making processes. Set the DEBUG_LEVEL environment variable to control log verbosity:
# Very verbose - shows all internal reasoning and decisions
DEBUG_LEVEL=debug npm test
# Important decisions and results
DEBUG_LEVEL=info npm test
# Warnings and potential issues only
DEBUG_LEVEL=warn npm test
# Errors only
DEBUG_LEVEL=error npm test
Or create a .env file:
DEBUG_LEVEL=debug
The logs provide visibility into:
- Format detection reasoning and confidence scores
- Parser selection decisions
- Section header vs content classification
- Content merging logic for bullet formats
- Schema transformation steps
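The four DEBUG_LEVEL values behave like an ordered threshold: each level also shows everything more severe than itself. The sketch below shows that ordering in isolation. `shouldLog` is a hypothetical helper, and falling back to info when the variable is unset or unrecognized is an assumption, not documented micro-logs behavior.

```typescript
// Hypothetical sketch of level filtering; micro-logs itself is not used here.
const LEVELS = ["debug", "info", "warn", "error"];

// A message is emitted when its level is at or above the configured level.
function shouldLog(configured: string | undefined, messageLevel: string): boolean {
  const fallback = LEVELS.indexOf("info"); // assumed default
  const threshold = LEVELS.indexOf((configured || "info").toLowerCase());
  const level = LEVELS.indexOf(messageLevel.toLowerCase());
  if (level === -1) return false; // unknown message level
  return level >= (threshold === -1 ? fallback : threshold);
}

console.log(shouldLog("warn", "info")); // false
console.log(shouldLog("debug", "info")); // true
```

In a Node.js process the first argument would typically be process.env.DEBUG_LEVEL.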
📖 Quick Start
import { JSONTransformer, Schema } from 'nx-md-parser';
// Define your schema
const schema = Schema.object({
  title: Schema.string(),
  tags: Schema.array(Schema.string()),
  metadata: Schema.object({
    author: Schema.string(),
    priority: Schema.string(),
  }),
  active: Schema.boolean(),
});
// Create transformer (auto-detects format)
const transformer = new JSONTransformer(schema);
// Works with heading format (###)
const headingResult = transformer.transformMarkdown(`
### Title
My Awesome Project
### Tags
- TypeScript
- React
- Node.js
### Metadata
#### Author
John Doe
#### Priority
High
### Active
true
`);
// Also works with bullet format (-)
const bulletResult = transformer.transformMarkdown(`
- Title
My Awesome Project
- Tags
- TypeScript
- React
- Node.js
- Metadata
Author: John Doe
Priority: High
- Active
true
`);
// And with colon format (Key: Value)
const colonResult = transformer.transformMarkdown(`
Title: My Awesome Project
Tags: TypeScript, React, Node.js
Metadata: Author - John Doe, Priority - High
Active: true
Version: 1.0.0
`);
// All formats produce the same structured result!
console.log(headingResult.result); // Your structured JSON
console.log(bulletResult.result); // Same structured JSON
console.log(colonResult.result); // Same structured JSON
Format Selection & Auto-Detection
import { JSONTransformer, Schema, MarkdownFormat, analyzeMarkdownFormat } from 'nx-md-parser';
// Auto-detect format (recommended)
const transformer = new JSONTransformer(schema); // Automatically chooses best parser
// Force specific built-in format
const headingTransformer = new JSONTransformer(schema, {
  parserOptions: { format: MarkdownFormat.HEADING }
});
const bulletTransformer = new JSONTransformer(schema, {
  parserOptions: { format: MarkdownFormat.BULLET }
});
// Analyze what formats your markdown supports
const analysis = analyzeMarkdownFormat(yourMarkdown);
console.log('Primary format:', analysis.primaryFormat);
console.log('Confidence:', analysis.allMatches[0]?.confidence);
console.log('Section ranges:', analysis.allMatches[0]?.sectionRanges);
// Works with any registered format - the system is extensible!
🎯 Advanced Usage
Custom Fuzzy Matching Configuration
import { JSONTransformer, Schema, defaultMatcherConfig } from 'nx-md-parser';
const schema = Schema.object({
  title: Schema.string(),
  description: Schema.string(),
});
const transformer = new JSONTransformer(schema, {
  // Custom matcher configuration
  thresholds: {
    keyToKey: 0.8,   // Higher threshold for key matching
    titleToKey: 0.6, // Lower threshold for title matching
    generic: 0.5     // Baseline threshold
  },
  weights: {
    jaroWinkler: 0.5,   // 50% weight on character similarity
    jaccardTokens: 0.3, // 30% weight on token similarity
    dice: 0.2,          // 20% weight on n-gram similarity
  }
});
Machine Learning - Learning Aliases
import { learnAliasesFromTransformations } from 'nx-md-parser';
// Learn from successful transformations
const learningResult = learnAliasesFromTransformations([
  {
    input: { "Projct Name": "Test", "Desc": "Test description" },
    output: { title: "Test", description: "Test description" },
    schema: yourSchema
  },
  // ... more examples
]);
console.log(learningResult.proposedAliases);
// { "Projct Name": ["title"], "Desc": ["description"] }
Schema Loading from Files
import { createTransformerFromSchemaFile } from 'nx-md-parser';
// schema.json
// {
// "type": "object",
// "properties": {
// "title": { "type": "string" },
// "tags": { "type": "array", "items": { "type": "string" } }
// }
// }
const transformer = createTransformerFromSchemaFile('./schema.json');
Advanced Merging with Roles
import { mergeWithRoles } from 'nx-md-parser';
const roleBasedData = [
  { role: 'user-profile', value: { name: 'Alice', email: '[email protected]' } },
  { role: 'user-preferences', value: { theme: 'dark', notifications: true } },
  { role: 'account-settings', value: { plan: 'premium', storage: '100GB' } }
];
const merged = mergeWithRoles(roleBasedData);
// {
//   userProfile: { name: 'Alice', email: '[email protected]' },
//   userPreferences: { theme: 'dark', notifications: true },
//   accountSettings: { plan: 'premium', storage: '100GB' }
// }
Custom Parsers & Format Extension
// MarkdownSection is assumed to be exported alongside the other parser types
import { BaseMarkdownParser, MarkdownFormat, MarkdownSection, getFormatDetector } from 'nx-md-parser';
// Example: YAML Frontmatter parser
class YamlFrontmatterParser extends BaseMarkdownParser {
  canParse(markdown: string): boolean {
    return markdown.startsWith('---\n');
  }
  parseSections(markdown: string): MarkdownSection[] {
    // Parse YAML frontmatter + markdown body
    return [];
  }
  getFormatName(): MarkdownFormat {
    return 'yaml-frontmatter' as any;
  }
}
// Example: Numbered sections parser
class NumberedSectionsParser extends BaseMarkdownParser {
  canParse(markdown: string): boolean {
    return /^\d+\.\s/.test(markdown);
  }
  parseSections(markdown: string): MarkdownSection[] {
    // Parse numbered sections like "1. Introduction"
    return [];
  }
  getFormatName(): MarkdownFormat {
    return 'numbered-sections' as any;
  }
}
// Register multiple custom parsers
const detector = getFormatDetector();
detector.registerParser(new YamlFrontmatterParser());
detector.registerParser(new NumberedSectionsParser());
// Now supports: headings, bullets, colon format, YAML frontmatter, numbered sections, etc.
JSON to Markdown Generation
import { jsonToMarkdown } from 'nx-md-parser';
const data = {
  title: "Project Alpha",
  features: ["AI", "ML", "Cloud"],
  metadata: { version: "1.0.0" }
};
console.log(jsonToMarkdown(data));
// # Title
// Project Alpha
//
// # Features
// - AI
// - ML
// - Cloud
//
// # Metadata
// ## Version
// 1.0.0
📚 API Reference
Core Classes
JSONTransformer
new JSONTransformer(
  schema: SchemaType,
  options?: {
    matcherConfig?: Partial<MatcherConfig>;
    parserOptions?: ParserOptions;
  }
)
transformMarkdown(markdown: string): TransformResult
transform(input: any): TransformResult
Parser Options:
interface ParserOptions {
  format?: MarkdownFormat;    // AUTO, HEADING, BULLET, COLON, MIXED
  sectionKeywords?: string[]; // Keywords for bullet section detection
  fuzzyThreshold?: number;    // Fuzzy matching threshold
}
LearningTransformer (Optional)
new LearningTransformer(
  schema: SchemaType,
  matcherConfig?: Partial<MatcherConfig>,
  mlOptions?: {
    storage?: { type: 'file' | 'database', path?: string },
    enableLearning?: boolean
  }
)
transformMarkdown(markdown: string): TransformResult
transform(input: any): TransformResult
transformMarkdownWithLearning(markdown: string): Promise<TransformResult>
transformWithLearning(input: any): Promise<TransformResult>
Requires: npm install @xronoces/xronox-ml
Features:
- Persistent machine learning data storage
- Continuous improvement from transformation history
- Automatic loading of learned configurations
- Graceful fallback when ML package unavailable
Parser Classes
BaseMarkdownParser - Abstract base class for creating custom parsers
abstract class BaseMarkdownParser {
  canParse(markdown: string): boolean;
  parseSections(markdown: string): MarkdownSection[];
  getFormatName(): MarkdownFormat;
}
HeadingParser - Parses ### Section format
import { HeadingParser } from 'nx-md-parser';
const parser = new HeadingParser();
BulletParser - Parses - Section bullet format
import { BulletParser } from 'nx-md-parser';
const parser = new BulletParser();
ColonParser - Parses Key: Value colon format
import { ColonParser } from 'nx-md-parser';
const parser = new ColonParser();
FormatDetector - Auto-detects and selects appropriate parsers
import { FormatDetector, getFormatDetector, analyzeMarkdownFormat } from 'nx-md-parser';
const detector = getFormatDetector();
const format = detector.detect(markdown); // MarkdownFormat
const parser = detector.getParser(format, markdown);
// Advanced format analysis with confidence scores and line ranges
const analysis = analyzeMarkdownFormat(markdown);
console.log(analysis.primaryFormat); // 'heading' | 'bullet' | 'colon'
console.log(analysis.allMatches[0]); // { format, confidence, sections, sectionRanges }
Schema Builders
Schema.string(): SchemaType
Schema.number(): SchemaType
Schema.boolean(): SchemaType
Schema.array(items: SchemaType): SchemaType
Schema.object(properties: Record<string, SchemaType>): SchemaType
Utility Functions
Transformation Utilities
mergeTransformResults(...results: TransformResult[]): TransformResult
jsonToMarkdown(data: any, level?: number): string
Format Analysis
analyzeMarkdownFormat(markdown: string): FormatAnalysisResult
// Returns detailed analysis of what formats the markdown supports
// with confidence scores, section counts, and line ranges
Schema Management
loadSchemaFromFile(filePath: string): SchemaType
createTransformerFromSchemaFile(schemaFilePath: string): JSONTransformer
createNxHelpersTransformer(schema: SchemaType, config?: Partial<MatcherConfig>): any
Machine Learning
learnAliasesFromTransformations(transformations: TransformationExample[]): LearningResult
Parser Types & Enums
enum MarkdownFormat {
  AUTO = 'auto',       // Auto-detect format
  HEADING = 'heading', // ### Section format
  BULLET = 'bullet',   // - Section format
  COLON = 'colon',     // Key: Value format
  MIXED = 'mixed'      // Mixed formats
}
interface MarkdownSection {
  heading: string;
  content: string;
  level: number;
  format: 'heading' | 'bullet' | 'mixed';
}
interface ParserOptions {
  format?: MarkdownFormat;
  sectionKeywords?: string[];
  fuzzyThreshold?: number;
}
nx-helpers Integration
// Merging
mergeNoRedundancy(base: T, override: Partial<T>): T
mergeMultiple(...objects: Partial<T>[]): T
mergeWithRoles(items: MergableItem[]): any
// Matching
bestMatchOneToMany(term: string, candidates: string[], config: MatcherConfig): StringScore | null
defaultMatcherConfig(): MatcherConfig
// Schema Building (nx-helpers)
nxString: SchemaNode
nxNumber: SchemaNode
nxBoolean: SchemaNode
nxArray(items: SchemaNode): SchemaNode
nxObject(properties: Record<string, SchemaNode>): SchemaNode
🔬 Examples
Run the comprehensive examples:
# Basic usage
npm run example
# Advanced features (merging, ML, etc.)
npm run integration-example
🧪 Testing
npm test # Run test suite
npm run test:watch # Watch mode
📊 Performance & Accuracy
Matching Algorithms (nx-helpers v1.5.0)
- Jaro-Winkler: Character-level similarity (40% weight)
- Jaccard Tokens: Token-based similarity (30% weight)
- Dice Coefficient: N-gram similarity (20% weight)
- Levenshtein Ratio: Edit distance (10% weight)
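To show how the listed weights could turn four per-algorithm scores into one match score, the sketch below computes a Levenshtein ratio and blends the scores as a weighted sum. Treating the combination as a plain weighted sum is an assumption; nx-helpers may combine or normalize scores differently, and all names here are hypothetical.

```typescript
// Hypothetical sketch; not the nx-helpers scoring code.
interface AlgorithmScores {
  jaroWinkler: number;
  jaccardTokens: number;
  dice: number;
  levenshteinRatio: number;
}

// Weights taken from the list above.
const LISTED_WEIGHTS: AlgorithmScores = {
  jaroWinkler: 0.4,
  jaccardTokens: 0.3,
  dice: 0.2,
  levenshteinRatio: 0.1,
};

// Classic dynamic-programming edit distance, keeping only two rows.
function levenshteinDistance(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// 1 means identical strings; 0 means every character differs.
function levenshteinRatio(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshteinDistance(a, b) / longest;
}

// Weighted sum of the four scores (assumed combination rule).
function combineScores(scores: AlgorithmScores, weights: AlgorithmScores): number {
  return (
    scores.jaroWinkler * weights.jaroWinkler +
    scores.jaccardTokens * weights.jaccardTokens +
    scores.dice * weights.dice +
    scores.levenshteinRatio * weights.levenshteinRatio
  );
}

console.log(levenshteinDistance("kitten", "sitting")); // 3
console.log(
  combineScores(
    { jaroWinkler: 0.9, jaccardTokens: 0.5, dice: 0.8, levenshteinRatio: 0.7 },
    LISTED_WEIGHTS
  )
);
```

The combined score would then be compared against a threshold such as the keyToKey or titleToKey values from the matcher configuration.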
Real-World Results
- Typo Correction: 85%+ accuracy on common typos
- Case Handling: 100% accuracy on case variations
- Context Awareness: Different thresholds for different match types
- Machine Learning: Continuous improvement with usage data
🤝 Contributing
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
📄 License
ISC License - see LICENSE file for details.
🙏 Acknowledgments
- Built on nx-helpers for advanced AI capabilities
- Inspired by the need for intelligent markdown processing in enterprise workflows
- Thanks to the nx-intelligence team for the powerful fuzzy matching algorithms
📞 Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
📋 What's New in v2.1
- Format Analysis API: New analyzeMarkdownFormat() function provides detailed format detection with confidence scores and line ranges
- Multi-Format Detection: Detects all supported formats in a document with ranking by confidence
- Section Range Analysis: Get exact line numbers for each section in your markdown
- Mixed Content Detection: Identifies documents that contain multiple format types
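Section range analysis can be illustrated for the heading format with a short sketch that records where each section starts and ends. The `SectionRange` shape and the 1-based, inclusive line numbering are assumptions for this example; the actual sectionRanges structure returned by analyzeMarkdownFormat() is not documented here.

```typescript
// Hypothetical sketch of per-section line ranges for heading-style markdown.
interface SectionRange {
  heading: string;
  startLine: number; // 1-based line of the heading itself
  endLine: number;   // 1-based, inclusive; last line before the next heading
}

function headingSectionRanges(markdown: string): SectionRange[] {
  const lines = markdown.split("\n");
  const ranges: SectionRange[] = [];
  lines.forEach((line, index) => {
    const match = /^#{1,6}\s+(.*)$/.exec(line);
    if (!match) return;
    // The previous section ends on the line just before this heading.
    if (ranges.length > 0) ranges[ranges.length - 1].endLine = index;
    ranges.push({
      heading: (match[1] || "").trim(),
      startLine: index + 1,
      endLine: lines.length, // provisional: runs to the end of the document
    });
  });
  return ranges;
}

console.log(headingSectionRanges("### Title\nMy Project\n\n### Tags\n- a\n- b"));
// [ { heading: 'Title', startLine: 1, endLine: 3 },
//   { heading: 'Tags', startLine: 4, endLine: 6 } ]
```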
📋 What's New in v2.0
- Extensible Multi-Format Architecture: Support for any markdown variant through custom parsers, not just headings and bullets
- Intelligent Parser System: Auto-detection and selection from multiple registered parsers
- Plugin Architecture: Easy to extend with custom parsers for YAML frontmatter, numbered sections, XML, or any format
- Modular Design: Clean separation of parsing, conversion, and transformation logic
- Backward Compatibility: All existing code continues to work unchanged
- Enhanced TypeScript: Better type safety with extensible parser interfaces
Made with ❤️ by the nx-intelligence team
