document-outline-extractor
v1.0.1
Extract structured outlines from documents with optional AI enhancement
A flexible TypeScript library for extracting structured outlines from documents of arbitrary length, with optional OpenAI/Azure OpenAI integration for enhanced outline generation.
Features
- 📝 Extract outlines from Markdown documents
- 🤖 Optional AI-powered outline generation using OpenAI/Azure OpenAI
- 📊 Automatic document chunking for large documents
- 🎯 Smart quality scoring to determine if existing outline is sufficient
- 🔧 Multiple output formats (tree, markdown, JSON, flat)
- ⚡ Fallback to regex-based extraction when AI is unavailable
- 🖥️ Command-line interface for quick testing
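To give a feel for what regex-based extraction involves, here is a minimal standalone sketch. It is not the library's implementation: the `Heading` type and the `extractHeadings` function are hypothetical names used only for illustration, and the sketch assumes ATX-style (`#`) headings and skips fenced code blocks.

```typescript
// Hypothetical sketch of regex-based heading extraction.
// This does NOT reproduce the package's internals.
interface Heading {
  level: number; // 1-6, taken from the number of leading '#'
  text: string;  // heading text without the '#' markers
  line: number;  // 1-based line number in the source
}

function extractHeadings(markdown: string, maxDepth = 6): Heading[] {
  const headings: Heading[] = [];
  let inFence = false;
  markdown.split(/\r?\n/).forEach((raw, index) => {
    // Toggle on fence lines so that '#' comments inside code are ignored.
    if (/^\s*(`{3}|~{3})/.test(raw)) {
      inFence = !inFence;
      return;
    }
    if (inFence) return;
    // ATX heading: 1-6 '#', whitespace, text, optional closing '#'s.
    const match = /^(#{1,6})\s+(.+?)\s*#*\s*$/.exec(raw);
    if (!match) return;
    const level = match[1].length;
    if (level <= maxDepth) {
      headings.push({ level, text: match[2], line: index + 1 });
    }
  });
  return headings;
}

// Usage: headings inside the ~~~ fence are skipped.
const sample = [
  "# Guide",
  "",
  "## Install",
  "~~~",
  "# not a heading",
  "~~~",
  "### Options",
].join("\n");

const found = extractHeadings(sample);
const headingCount = found.length;
const depth = Math.max(0, ...found.map((h) => h.level));
console.log(headingCount, depth);
```

Simple counts such as the number of headings and the maximum depth fall out of the extracted list directly, which is why a pass like this can run without any AI backend.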
Installation
npm install -g document-outline-extractor
Or as a library:
npm install document-outline-extractor
CLI Usage
Basic Commands
# Extract outline from file
outline-extractor -i document.md
# Extract with specific format
outline-extractor -i document.md -f json -o outline.json
# Use OpenAI for enhanced extraction
outline-extractor -i document.md --openai-key sk-... --model gpt-4o
# Check document quality
outline-extractor -i document.md -q
# Pipe content
cat document.md | outline-extractor -f markdown
# Use configuration file
outline-extractor -i document.md -c config.json
CLI Options
- -i, --input <file>: Input Markdown file path
- -o, --output <file>: Output file path (default: stdout)
- -f, --format <format>: Output format: tree, markdown, json, flat
- -d, --max-depth <n>: Maximum heading depth to include
- -q, --quality: Show quality metrics instead of outline
- -c, --config <file>: Configuration file path (JSON)
- --openai-key <key>: OpenAI API key
- --openai-url <url>: OpenAI base URL
- --model <name>: Model name
- -h, --help: Show help message
- -v, --version: Show version
Configuration File
Create a config.json file:
{
"format": "markdown",
"maxDepth": 3,
"openai": {
"apiKey": "your-api-key",
"baseUrl": "https://api.openai.com/v1",
"model": "gpt-4o-mini",
"temperature": 0.3,
"maxTokens": 2000
},
"extractor": {
"chunkSize": 5000,
"qualityThreshold": 0.8,
"defaultFormat": "tree"
}
}
Library Usage
Basic Usage
import { OutlineExtractor } from 'document-outline-extractor';
const extractor = new OutlineExtractor();
const outline = await extractor.extract(markdownContent);
console.log(outline);
With OpenAI Configuration
import { OutlineExtractor } from 'document-outline-extractor';
const extractor = new OutlineExtractor({
openai: {
baseUrl: 'https://api.openai.com/v1',
apiKey: 'your-api-key',
model: 'gpt-4o-mini',
temperature: 0.5,
maxTokens: 3000
}
});
const outline = await extractor.extract(markdownContent, {
format: 'json',
maxDepth: 3
});
Quality Evaluation
const extractor = new OutlineExtractor();
const metrics = extractor.evaluateQuality(markdownContent);
console.log('Quality Score:', metrics.score);
console.log('Heading Count:', metrics.headingCount);
console.log('Max Depth:', metrics.depth);
Document Chunking
const extractor = new OutlineExtractor({ chunkSize: 3000 });
const chunks = extractor.splitDocument(longDocument, 'smart');
for (const chunk of chunks) {
console.log('Chunk length:', chunk.length);
}
Custom OpenAI Parameters per Request
// Override temperature and max tokens for specific requests
const extractor = new OutlineExtractor({
openai: {
baseUrl: 'https://api.openai.com/v1',
apiKey: 'your-api-key',
model: 'gpt-4o-mini'
}
});
// Pass custom parameters to generateOutlineWithAI
// (content and systemPrompt are strings supplied by the caller)
const outline = await extractor.generateOutlineWithAI(content, systemPrompt, {
temperature: 0.7,
maxTokens: 4000,
maxCompletionTokens: 3500 // Use max_completion_tokens instead of max_tokens
});
API Reference
OutlineExtractor
Main class for extracting outlines.
Constructor Options
interface OutlineExtractorConfig {
openai?: OpenAIConfig; // OpenAI configuration
chunkSize?: number; // Max chunk size (default: 5000)
qualityThreshold?: number; // Min quality score (default: 0.8)
defaultFormat?: OutlineFormat; // Default output format
caching?: boolean; // Enable caching (default: true)
}
Methods
- extract(content: string, options?: ExtractOptions): Extract outline from content
- evaluateQuality(content: string): Evaluate outline quality score
- splitDocument(content: string, strategy?: ChunkingStrategy): Split document into chunks
- clearCache(): Clear internal cache
- updateConfig(config: Partial<OutlineExtractorConfig>): Update configuration
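To make the idea behind splitDocument more concrete, the sketch below shows one plausible way to chunk a long Markdown document: break it at heading lines, then pack whole sections into chunks of at most chunkSize characters. This is an assumption about how a heading-aware strategy could work, not the library's actual 'smart' strategy; `splitByHeadings` is a hypothetical name, and the sketch does not treat `#` lines inside code fences specially.

```typescript
// Hypothetical sketch of heading-aware chunking; not the library's code.
function splitByHeadings(content: string, chunkSize = 5000): string[] {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");

  // Break the document into sections, each starting at a heading line.
  const sections: string[] = [];
  let current: string[] = [];
  for (const line of content.split("\n")) {
    if (/^#{1,6}\s/.test(line) && current.length > 0) {
      sections.push(current.join("\n"));
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) sections.push(current.join("\n"));

  // Greedily pack whole sections into chunks of at most chunkSize characters.
  const chunks: string[] = [];
  let buffer = "";
  for (const section of sections) {
    const candidate = buffer ? buffer + "\n" + section : section;
    if (candidate.length <= chunkSize) {
      buffer = candidate;
      continue;
    }
    if (buffer) chunks.push(buffer);
    // A single section larger than chunkSize is cut at fixed offsets.
    let rest = section;
    while (rest.length > chunkSize) {
      chunks.push(rest.slice(0, chunkSize));
      rest = rest.slice(chunkSize);
    }
    buffer = rest;
  }
  if (buffer) chunks.push(buffer);
  return chunks;
}

// Usage: three short sections packed into chunks of at most 20 characters.
const doc = ["# A", "aaaa", "## B", "bbbb", "## C", "cccc"].join("\n");
const pieces = splitByHeadings(doc, 20);
for (const piece of pieces) {
  console.log("Chunk length:", piece.length);
}
```

Keeping sections intact where possible means each chunk starts at a heading, which tends to give per-chunk outlines that are easier to merge afterwards.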
Output Formats
- tree - Indented tree structure
- markdown - Markdown headings
- json - JSON object with hierarchy
- flat - Numbered flat list
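The four formats can be pictured as different renderings of the same list of headings. The sketch below is illustrative only: the `OutlineItem` and `OutlineNode` types and the `toTree`, `toMarkdown`, `toFlat`, and `toJson` helpers are hypothetical, and the exact text the package emits for each format may differ.

```typescript
// Hypothetical renderers for the four format names; the library's
// real output may be shaped differently.
interface OutlineItem {
  level: number; // heading level, 1 = top
  text: string;
}

interface OutlineNode {
  text: string;
  children: OutlineNode[];
}

// tree: two spaces of indentation per level below the top.
function toTree(items: OutlineItem[]): string {
  return items
    .map((i) => "  ".repeat(Math.max(0, i.level - 1)) + i.text)
    .join("\n");
}

// markdown: one '#' per level.
function toMarkdown(items: OutlineItem[]): string {
  return items.map((i) => "#".repeat(i.level) + " " + i.text).join("\n");
}

// flat: numbered in document order, ignoring nesting.
function toFlat(items: OutlineItem[]): string {
  return items.map((i, n) => `${n + 1}. ${i.text}`).join("\n");
}

// json: nest each item under the closest preceding item with a smaller level.
function toJson(items: OutlineItem[]): OutlineNode[] {
  const roots: OutlineNode[] = [];
  const stack: { level: number; node: OutlineNode }[] = [];
  for (const item of items) {
    const node: OutlineNode = { text: item.text, children: [] };
    let top = stack[stack.length - 1];
    while (top !== undefined && top.level >= item.level) {
      stack.pop();
      top = stack[stack.length - 1];
    }
    if (top !== undefined) {
      top.node.children.push(node);
    } else {
      roots.push(node);
    }
    stack.push({ level: item.level, node });
  }
  return roots;
}

// Usage
const items: OutlineItem[] = [
  { level: 1, text: "Guide" },
  { level: 2, text: "Install" },
  { level: 2, text: "Usage" },
  { level: 3, text: "CLI" },
];
console.log(toTree(items));
console.log(toMarkdown(items));
console.log(toFlat(items));
console.log(JSON.stringify(toJson(items), null, 2));
```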
Examples
Extract from README
outline-extractor -i README.md -f tree
Generate JSON Outline
outline-extractor -i document.md -f json -o outline.json
Quality Check
outline-extractor -i document.md -q
Output:
Document Outline Quality Metrics:
────────────────────────────────────
Overall Score: 85.3%
Richness: 50.0%
Balance: 92.1%
Coherence: 100.0%
Coverage: 8.5%
Heading Count: 12
Max Depth: 3
────────────────────────────────────
✓ Document has good outline structure
Development
# Install dependencies
npm install
# Build
npm run build
# Test
npm test
# Run CLI in development
npm run cli -- -i document.md
License
MIT
