@bottxrnif/latex-json-ast-converter
v1.0.0
Convert LaTeX documents to JSON abstract syntax trees with advanced statistics and section counting
@latex-converter/json-ast
A powerful TypeScript library for converting LaTeX documents to JSON abstract syntax trees with advanced statistics and section counting. Perfect for document analysis, content management systems, and academic processing workflows.
✨ Features
- 🚀 Fast LaTeX Parsing - Convert LaTeX to structured JSON AST
- 📊 Advanced Statistics - Word counts, math expressions, section analysis
- 🎯 Section Hierarchy - Accurate section/subsection/subsubsection parsing
- 📐 Math Detection - Separate counting for inline and displayed math
- 🌐 Unicode Support - Handle international characters and special symbols
- 🔧 Robust Error Handling - Graceful processing of malformed LaTeX
- 📱 Web Interface - Built-in web UI with file upload and visualization
- 🧪 Comprehensive Testing - 66+ tests with edge case coverage
📦 Installation
```bash
npm install @latex-converter/json-ast
```
🚀 Quick Start
Basic Usage
```typescript
import { LaTeXToJSONAST } from '@latex-converter/json-ast';

// String.raw keeps the backslashes intact; in a plain template literal,
// sequences such as \b (in \begin) would be read as escape characters.
const latex = String.raw`
\documentclass{article}
\begin{document}

\section{Introduction}
This is the introduction section with math: $E = mc^2$.

\subsection{Background}
Some background information.

\end{document}`;

const ast = LaTeXToJSONAST.convert(latex);
console.log(ast);
```
Advanced Statistics
```typescript
import { AccurateLaTeXParser } from '@latex-converter/json-ast';

const result = AccurateLaTeXParser.parseWithStatistics(latex);

console.log('Document Statistics:');
console.log(`Words in text: ${result.stats.wordsInText}`);
console.log(`Math inlines: ${result.stats.mathInlines}`);
console.log(`Total sections: ${result.stats.numberOfHeaders}`);
```
📊 Output Format
The JSON AST follows this structure:
```typescript
interface LaTeXAST {
  type: 'document';
  title?: string;
  sections: SectionNode[];
  metadata: {
    totalSections: number;
    totalSubsections: number;
    totalSubsubsections: number;
    maxDepth: number;
  };
}

interface SectionNode {
  type: 'section' | 'subsection' | 'subsubsection';
  title: string;
  level: number;
  children: SectionNode[];
  content?: string;
}
```
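Because `SectionNode` is recursive through `children`, most consumers end up walking the tree depth-first. The sketch below shows one way to turn a list of nodes into an indented outline. The `outline` helper and the hand-written `sampleSections` data are illustrative assumptions, not part of the library, and the `level` values (1 for a section, 2 for a subsection) are assumed as well.

```typescript
// Local copy of the SectionNode interface so the sketch is self-contained.
interface SectionNode {
  type: 'section' | 'subsection' | 'subsubsection';
  title: string;
  level: number;
  children: SectionNode[];
  content?: string;
}

// Hypothetical helper (not a library export): depth-first walk that
// returns one indented line per section title.
function outline(nodes: SectionNode[], depth: number = 0): string[] {
  const lines: string[] = [];
  for (const node of nodes) {
    lines.push('  '.repeat(depth) + node.title);
    // Recurse into nested sections with one more level of indentation.
    lines.push(...outline(node.children, depth + 1));
  }
  return lines;
}

// Hand-written sample data; level numbering is an assumption.
const sampleSections: SectionNode[] = [
  {
    type: 'section',
    title: 'Introduction',
    level: 1,
    content: 'This is the introduction section.',
    children: [
      { type: 'subsection', title: 'Background', level: 2, children: [] },
    ],
  },
];

console.log(outline(sampleSections).join('\n'));
```

The same traversal pattern works for collecting word counts or section titles; only the per-node action changes.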
🔧 Advanced Features
Section-Level Statistics
Get detailed statistics for each section:
```typescript
const result = AccurateLaTeXParser.parseWithStatistics(latex);

result.stats.sectionStats.forEach(stat => {
  console.log(`${stat.type}: ${stat.title}`);
  console.log(`  Words: ${stat.wordsInText}, Headers: ${stat.wordsInHeaders}`);
  console.log(`  Math: ${stat.mathInlines} inline, ${stat.mathDisplayed} displayed`);
  console.log(`  Subcounts: ${stat.subcounts}`);
});
```
Round-trip Conversion
Convert JSON back to LaTeX:
```typescript
import { JSONToLaTeX } from '@latex-converter/json-ast';

const latex = JSONToLaTeX.convert(ast);
console.log(latex);
```
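To give a feel for what serializing the tree back to LaTeX involves, here is a minimal sketch. It is not the implementation behind `JSONToLaTeX.convert`; the `sectionToLatex` function is a hypothetical stand-in that assumes each node's `type` doubles as the LaTeX command name and that `content` is emitted verbatim before the children.

```typescript
// Local copy of the SectionNode interface so the sketch is self-contained.
interface SectionNode {
  type: 'section' | 'subsection' | 'subsubsection';
  title: string;
  level: number;
  children: SectionNode[];
  content?: string;
}

// Hypothetical serializer (an assumption, not the library's code):
// emits \<type>{<title>}, then the node's content, then its children.
function sectionToLatex(node: SectionNode): string {
  const parts: string[] = ['\\' + node.type + '{' + node.title + '}'];
  if (node.content) {
    parts.push(node.content);
  }
  for (const child of node.children) {
    parts.push(sectionToLatex(child));
  }
  return parts.join('\n');
}

// Hand-written sample node; level numbering is an assumption.
const intro: SectionNode = {
  type: 'section',
  title: 'Introduction',
  level: 1,
  content: 'Intro text.',
  children: [
    {
      type: 'subsection',
      title: 'Background',
      level: 2,
      content: 'Some background information.',
      children: [],
    },
  ],
};

console.log(sectionToLatex(intro));
```

A real round trip also has to restore the preamble and the `document` environment, which this sketch leaves out.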
Web Interface
Launch the built-in web interface:
```bash
npm start
# Visit http://localhost:8080
```
📈 Statistics Features
The parser provides comprehensive document statistics:
- Word Counting: Separate counts for text, headers, and captions
- Math Analysis: Distinguishes inline (`$...$`) from displayed (`$$...$$`, `\begin{equation}...`)
- Section Hierarchy: Accurate parsing of nested sections
- Float Detection: Count figures and tables
- Unicode Support: Handle international characters
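The inline versus displayed distinction listed above can be approximated with a few regular expressions. The following is a simplified sketch, not the parser's actual algorithm: `countMath` is a hypothetical function, and it deliberately ignores `\(...\)`, `\[...\]`, comments, and verbatim environments.

```typescript
// Hypothetical helper (an assumption, not a library export): counts
// displayed math ($$...$$ and equation environments) and inline math ($...$).
function countMath(source: string): { inline: number; displayed: number } {
  // Drop escaped dollar signs (\$) so they are not mistaken for delimiters.
  const unescaped = source.replace(/\\\$/g, '');

  // Displayed math: $$...$$ or \begin{equation}...\end{equation} (starred too).
  const displayedPattern =
    /\$\$[\s\S]*?\$\$|\\begin\{equation\*?\}[\s\S]*?\\end\{equation\*?\}/g;
  const displayed = (unescaped.match(displayedPattern) || []).length;

  // Remove displayed math first so its $$ delimiters are not counted as inline.
  const withoutDisplayed = unescaped.replace(displayedPattern, ' ');
  const inline = (withoutDisplayed.match(/\$[^$]+\$/g) || []).length;

  return { inline, displayed };
}

const mathSample = String.raw`Cost: \$5. Inline $a+b$ and $c$. $$x^2$$ \begin{equation}y=1\end{equation}`;
console.log(countMath(mathSample));
```

Removing displayed math before counting inline math is the key ordering decision; doing it the other way round would split each `$$...$$` block into spurious inline matches.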
Example Statistics Output
```json
{
  "wordsInText": 21157,
  "wordsInHeaders": 203,
  "wordsOutsideText": 25,
  "mathInlines": 3992,
  "mathDisplayed": 72,
  "numberOfHeaders": 80,
  "numberOfFloats": 0,
  "sectionStats": [
    {
      "title": "Introduction",
      "type": "section",
      "level": 1,
      "wordsInText": 150,
      "wordsInHeaders": 1,
      "mathInlines": 5,
      "mathDisplayed": 0,
      "subcounts": "150+1+0 (1/0/5/0)"
    }
  ]
}
```
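The `subcounts` string packs several numbers into one field. Its layout resembles the TeXcount convention `text+headers+captions (#headers/#floats/#inlines/#displayed)`, and the example values are consistent with that reading (150 words of text, 1 header word, 5 inline math). Treat that interpretation as an assumption; the sketch below, with its hypothetical `parseSubcounts` helper and field names, simply splits the string on that basis.

```typescript
// Field names are illustrative assumptions based on the TeXcount-style layout.
interface Subcounts {
  text: number;
  headers: number;
  captions: number;
  headerCount: number;
  floatCount: number;
  inlineMath: number;
  displayedMath: number;
}

// Hypothetical helper (not a library export): returns null when the string
// does not match the assumed "a+b+c (d/e/f/g)" layout.
function parseSubcounts(value: string): Subcounts | null {
  const match = /^(\d+)\+(\d+)\+(\d+) \((\d+)\/(\d+)\/(\d+)\/(\d+)\)$/.exec(value.trim());
  if (!match) {
    return null;
  }
  const n = match.slice(1).map(Number);
  return {
    text: n[0],
    headers: n[1],
    captions: n[2],
    headerCount: n[3],
    floatCount: n[4],
    inlineMath: n[5],
    displayedMath: n[6],
  };
}

console.log(parseSubcounts('150+1+0 (1/0/5/0)'));
```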
🧪 Testing
Run the comprehensive test suite:
```bash
npm test
```
Run tests with coverage:
```bash
npm run test:coverage
```
🏗️ Development
Building
```bash
npm run build
```
Development Mode
```bash
npm run dev
```
Linting
```bash
npm run lint
npm run lint:fix
```
📚 API Reference
LaTeXToJSONAST
Basic LaTeX to JSON conversion.
```typescript
static convert(latex: string): LaTeXAST
```
AccurateLaTeXParser
Advanced parsing with detailed statistics.
```typescript
static parseWithStatistics(latex: string): { ast: LaTeXAST; stats: AccurateStatistics }
```
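The `AccurateStatistics` type is not spelled out in this README. Judging only from the example statistics output, its shape might look like the interfaces below; the names `AssumedStatistics` and `AssumedSectionStatistics` and the `mathDensity` helper are hypothetical and exist only to show how the fields could be consumed in typed code.

```typescript
// Assumed shapes, inferred from the example statistics output in this README.
interface AssumedSectionStatistics {
  title: string;
  type: 'section' | 'subsection' | 'subsubsection';
  level: number;
  wordsInText: number;
  wordsInHeaders: number;
  mathInlines: number;
  mathDisplayed: number;
  subcounts: string;
}

interface AssumedStatistics {
  wordsInText: number;
  wordsInHeaders: number;
  wordsOutsideText: number;
  mathInlines: number;
  mathDisplayed: number;
  numberOfHeaders: number;
  numberOfFloats: number;
  sectionStats: AssumedSectionStatistics[];
}

// Hypothetical helper: math expressions per 1000 words of body text.
function mathDensity(stats: AssumedStatistics): number {
  if (stats.wordsInText === 0) {
    return 0; // avoid dividing by zero for empty documents
  }
  return ((stats.mathInlines + stats.mathDisplayed) * 1000) / stats.wordsInText;
}

// Hand-written sample values, not real parser output.
const sampleStats: AssumedStatistics = {
  wordsInText: 2000,
  wordsInHeaders: 10,
  wordsOutsideText: 0,
  mathInlines: 3,
  mathDisplayed: 1,
  numberOfHeaders: 4,
  numberOfFloats: 0,
  sectionStats: [],
};

console.log(mathDensity(sampleStats));
```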
JSONToLaTeX
Convert JSON AST back to LaTeX.
```typescript
static convert(ast: LaTeXAST): string
static generateRandomAST(maxDepth?: number, maxSectionsPerLevel?: number): LaTeXAST
```
🎯 Use Cases
- Academic Content Management - Analyze research papers and theses
- Document Processing - Extract structure from LaTeX documents
- Content Analysis - Generate statistics for large document collections
- Educational Tools - Create LaTeX learning applications
- Publishing Systems - Integrate with content management workflows
🔍 Supported LaTeX Features
- Section Commands: `\section`, `\subsection`, `\subsubsection`
- Math Environments: Inline `$...$`, displayed `$$...$$`, `\begin{equation}...`
- Document Classes: `article`, `report`, `book`, `amsart`
- Unicode Characters: International text and symbols
- Comments: Proper handling of LaTeX comments
- Escaped Characters: `\$`, `\&`, `\%`, etc.
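Comments and escaped characters interact: a `%` starts a comment, but `\%` is a literal percent sign. The sketch below illustrates one way to strip comments while respecting that rule. `stripComments` is a hypothetical helper rather than the library's code, and it does not handle verbatim environments or the way a real LaTeX comment also swallows the line break.

```typescript
// Hypothetical helper (an assumption, not a library export): removes
// everything from the first unescaped % to the end of each line.
function stripComments(source: string): string {
  return source
    .split('\n')
    .map(line => {
      for (let i = 0; i < line.length; i++) {
        if (line[i] === '\\') {
          // A backslash escapes the next character (\%, \\, \$ ...), so skip it.
          i++;
          continue;
        }
        if (line[i] === '%') {
          return line.slice(0, i);
        }
      }
      return line;
    })
    .join('\n');
}

console.log(stripComments('50\\% done % this part is a comment'));
```

Skipping the character after every backslash also covers `\\%`: the two backslashes form a line break command, so the `%` that follows is correctly treated as the start of a comment.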
🤝 Contributing
Contributions are welcome! Please read our Contributing Guide and submit a Pull Request.
📄 License
MIT © LaTeX Converter Team
🔗 Related Projects
- TexSoup - Python LaTeX parsing library
- LaTeX.js - JavaScript LaTeX parser
- KaTeX - Fast math typesetting library
📞 Support
- 📧 Email: [email protected]
- 🐛 Issues: GitHub Issues
- 📖 Docs: Documentation
