@voicenter-team/nuxt-llms-generator
v0.1.12
Nuxt 3 module for automatically generating AI-optimized documentation files (llms.txt, llms-full.txt, and individual .md files) from Umbraco CMS data using Anthropic's Claude API.
🤖 LLMS Documentation Generator for Nuxt 3
Transform your Umbraco CMS content into AI-optimized documentation following the 2024 LLMS.txt standard
Generate high-quality, AI-optimized markdown documentation from your Umbraco CMS content using Claude AI. Perfect for creating LLMS.txt files that help AI systems understand your website.
🔄 How It Works
```mermaid
flowchart TB
    subgraph "INPUT"
        JSON[UmbracoData.json<br/>📋 CMS Content]
        CONFIG[nuxt.config.ts<br/>⚙️ Configuration]
        API_KEY[🔑 Anthropic API Key]
    end

    subgraph "PROCESSING PIPELINE"
        START([🚀 Build Process Starts])

        subgraph "1️⃣ INITIALIZATION"
            LOAD[Load Configuration]
            VALIDATE[Validate API Connection]
            CACHE_CHECK[Check Template Cache]
        end

        subgraph "2️⃣ CONTENT ANALYSIS"
            FILTER["Filter Visible Pages<br/>📊 Skip hidePage: '1'"]
            EXTRACT[Extract Page Content<br/>🔍 JSONPath Resolution]
            HASH[Generate Structure Hash<br/>🏗️ Detect Changes]
        end

        subgraph "3️⃣ TEMPLATE GENERATION"
            CACHE_HIT{Cache Hit?}
            CLAUDE[🤖 Claude AI Analysis<br/>Semantic Understanding]
            TEMPLATE[Generate Mustache Template<br/>📝 AI-Optimized Structure]
            STORE_CACHE[💾 Store in Cache]
        end

        subgraph "4️⃣ CLEANUP & OPTIMIZATION"
            CLEANUP[🧹 Orphaned Template Cleanup<br/>Remove deleted/hidden pages]
            HTML_CLEAN[🔧 HTML-to-Markdown<br/>Clean Artifacts & Entities]
        end

        subgraph "5️⃣ FILE GENERATION"
            RENDER[Render Templates<br/>🎨 Mustache + Data]
            POST_PROCESS[Post-Process Markdown<br/>✨ Final Quality Pass]
        end
    end

    subgraph "OUTPUT FILES"
        LLMS_TXT[📄 llms.txt<br/>Navigation Index]
        LLMS_FULL[📄 llms-full.txt<br/>Complete Documentation]
        MD_FILES[📁 Individual .md Files<br/>Per-Page Documentation]
    end

    subgraph "MULTI-SITE SUPPORT"
        ENV1[🌐 Site 1<br/>SITE_ENV=main]
        ENV2[🌐 Site 2<br/>SITE_ENV=partner]
        ENV3[🌐 Site 3<br/>SITE_ENV=staging]
        CACHE1[💾 .llms-templates/main/]
        CACHE2[💾 .llms-templates/partner/]
        CACHE3[💾 .llms-templates/staging/]
        OUT1[📂 .output/llms/main/]
        OUT2[📂 .output/llms/partner/]
        OUT3[📂 .output/llms/staging/]
    end

    %% Flow connections
    JSON --> START
    CONFIG --> START
    API_KEY --> START
    START --> LOAD
    LOAD --> VALIDATE
    VALIDATE --> CACHE_CHECK
    CACHE_CHECK --> FILTER
    FILTER --> EXTRACT
    EXTRACT --> HASH
    HASH --> CACHE_HIT
    CACHE_HIT -->|❌ No| CLAUDE
    CACHE_HIT -->|✅ Yes| CLEANUP
    CLAUDE --> TEMPLATE
    TEMPLATE --> STORE_CACHE
    STORE_CACHE --> CLEANUP
    CLEANUP --> HTML_CLEAN
    HTML_CLEAN --> RENDER
    RENDER --> POST_PROCESS
    POST_PROCESS --> LLMS_TXT
    POST_PROCESS --> LLMS_FULL
    POST_PROCESS --> MD_FILES

    %% Multi-site flows
    CONFIG -.-> ENV1
    CONFIG -.-> ENV2
    CONFIG -.-> ENV3
    ENV1 -.-> CACHE1
    ENV2 -.-> CACHE2
    ENV3 -.-> CACHE3
    CACHE1 -.-> OUT1
    CACHE2 -.-> OUT2
    CACHE3 -.-> OUT3

    %% Styling
    classDef input fill:#e1f5fe
    classDef process fill:#f3e5f5
    classDef output fill:#e8f5e8
    classDef multisite fill:#fff3e0
    class JSON,CONFIG,API_KEY input
    class LLMS_TXT,LLMS_FULL,MD_FILES output
    class ENV1,ENV2,ENV3,CACHE1,CACHE2,CACHE3,OUT1,OUT2,OUT3 multisite
```

🎯 Key Process Details
| Phase | What Happens | Why It Matters |
|-------|--------------|----------------|
| 🔍 Content Analysis | Filters visible pages, extracts content via JSONPath, generates structure hashes | Only public pages are processed; structural changes are detected separately from routine content edits |
| 🤖 AI Generation | Claude analyzes page structure and generates semantic Mustache templates | Creates context-aware templates that understand your business domain |
| 💾 Smart Caching | Stores templates with structure hashes, reuses unchanged templates | Saves API costs and generation time on subsequent builds |
| 🧹 Automatic Cleanup | Removes templates for deleted/hidden pages, syncs with current content | Prevents cache bloat and maintains accuracy |
| 🔧 Post-Processing | Converts HTML to clean markdown, removes artifacts and entities | Ensures AI-optimized output that follows the 2024 LLMS.txt standard |
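The structure-hash step can be sketched as follows. This is an illustrative implementation, not the module's actual code: the hash covers a page's key paths (its shape) rather than its values, so routine content edits hit the template cache while schema changes trigger a fresh Claude call.

```typescript
import { createHash } from "node:crypto";

// Collect the key paths of a page object, ignoring values.
// Arrays collapse to a single "[]" path segment so that adding
// or removing list items does not count as a structure change.
function keyPaths(node: unknown, prefix = ""): string[] {
  if (node === null || typeof node !== "object") return [prefix];
  if (Array.isArray(node)) {
    return node.flatMap((v) => keyPaths(v, `${prefix}[]`));
  }
  return Object.entries(node as Record<string, unknown>).flatMap(([k, v]) =>
    keyPaths(v, prefix ? `${prefix}.${k}` : k)
  );
}

// Same keys => same hash, regardless of content values.
export function structureHash(page: object): string {
  const unique = [...new Set(keyPaths(page))].sort();
  return createHash("sha256").update(unique.join("\n")).digest("hex");
}
```

Two pages with identical field layouts produce identical hashes even when every value differs, which is exactly the property the cache relies on.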
🏢 Multi-Site Architecture
The system automatically adapts to different environments using the `SITE_ENV` variable:
```
SITE_ENV=main    → UmbracoData-main.json    → .llms-templates/main/    → .output/llms/main/
SITE_ENV=partner → UmbracoData-partner.json → .llms-templates/partner/ → .output/llms/partner/
SITE_ENV=staging → UmbracoData-staging.json → .llms-templates/staging/ → .output/llms/staging/
```

Each environment maintains its own isolated cache and output, preventing conflicts while sharing the same codebase and configuration logic.
🌟 Features
🤖 AI-Powered Template Generation
- Claude API Integration: Uses Anthropic's Claude for intelligent content analysis
- 2024 LLMS.txt Compliance: Follows the latest LLMS.txt standard for AI consumption
- Semantic Understanding: Automatically detects content types and generates appropriate templates
- Multi-language Support: Handles Hebrew/English mixed content and RTL text
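Handling mixed Hebrew/English content requires deciding a field's dominant direction. A minimal heuristic could look like this; `isMostlyHebrew` is illustrative only, and the module's actual detection logic may differ:

```typescript
// Hebrew letters live in the Unicode block U+0590–U+05FF.
// Flag a string as RTL-dominant when Hebrew letters outnumber others.
export function isMostlyHebrew(text: string): boolean {
  const letters = text.match(/\p{L}/gu) ?? []; // all letters, any script
  if (letters.length === 0) return false;      // digits/punctuation only
  const hebrew = letters.filter((ch) => /[\u0590-\u05FF]/.test(ch)).length;
  return hebrew / letters.length > 0.5;
}
```

A per-field check like this lets a generator emit `dir="rtl"` hints or pick RTL-aware markdown formatting only where it is actually needed.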
⚡ Smart Caching System
- Structure-Based Detection: Only regenerates when page structure changes (not content values)
- Incremental Updates: Process only changed pages for faster builds
- API Cost Optimization: Avoids unnecessary Claude API calls
- Git-Friendly: Templates stored in git, outputs excluded
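The caching behaviour described above (a structure hash stored alongside each template, reuse on match) could be sketched like this. The file layout, `getTemplate`, and the JSON cache format are assumptions for illustration, not the module's real internals:

```typescript
import { existsSync, readFileSync, writeFileSync, mkdirSync } from "node:fs";
import { join } from "node:path";

interface CachedTemplate {
  structureHash: string; // hash of the page's shape at generation time
  template: string;      // the generated Mustache template
}

// Reuse a cached template when the page structure is unchanged;
// otherwise call the (expensive) generator and refresh the cache entry.
export async function getTemplate(
  dir: string,
  pageId: string,
  hash: string,
  generate: () => Promise<string>
): Promise<string> {
  mkdirSync(dir, { recursive: true });
  const file = join(dir, `${pageId}.json`);
  if (existsSync(file)) {
    const cached: CachedTemplate = JSON.parse(readFileSync(file, "utf8"));
    if (cached.structureHash === hash) return cached.template; // cache hit
  }
  const template = await generate(); // cache miss: regenerate via the API
  writeFileSync(file, JSON.stringify({ structureHash: hash, template }));
  return template;
}
```

Because cache files are plain JSON keyed by page id, they diff cleanly in git, which is what makes the "templates in git, outputs excluded" workflow practical.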
🧹 Automatic Cleanup
- Orphaned Template Detection: Removes templates for deleted pages
- Hidden Page Handling: Cleans up templates for pages marked as hidden
- Cache Synchronization: Keeps template cache aligned with Umbraco content
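Put concretely, cleanup can be as simple as diffing the cached template files against the currently visible page ids. A sketch under an assumed naming scheme (`<pageId>.mustache`; the module's real layout may differ):

```typescript
import { readdirSync, rmSync } from "node:fs";
import { join } from "node:path";

interface Page {
  id: string;
  hidePage?: string; // "1" marks a hidden page, matching the Umbraco flag
}

// Remove cached templates whose page was deleted or marked hidden.
export function cleanupOrphanedTemplates(
  templatesDir: string,
  pages: Page[]
): string[] {
  const visible = new Set(
    pages.filter((p) => p.hidePage !== "1").map((p) => p.id)
  );
  const removed: string[] = [];
  for (const file of readdirSync(templatesDir)) {
    if (!file.endsWith(".mustache")) continue;
    const id = file.slice(0, -".mustache".length);
    if (!visible.has(id)) {
      rmSync(join(templatesDir, file));
      removed.push(file);
    }
  }
  return removed;
}
```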
📄 Multiple Output Formats
- `llms.txt`: Navigation index following the 2024 standard
- `llms-full.txt`: Complete site documentation in a single file
- Individual `.md` files: Clean, AI-optimized markdown per page
🎯 Production Ready
- 26 Passing Tests: Comprehensive test coverage
- TypeScript Support: Full type safety throughout
- Parallel Processing: Configurable concurrency for large sites
- Error Resilience: Graceful handling of failures with detailed logging
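The `maxConcurrent` option bounds how many API requests run simultaneously. A limiter of that kind can be sketched in a few lines; `runWithLimit` is illustrative, not part of the module's public API:

```typescript
// Run async task factories with at most `limit` tasks in flight at once.
export async function runWithLimit<T>(
  tasks: Array<() => Promise<T>>,
  limit: number
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0; // shared cursor: each worker claims the next unstarted task

  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++;           // claim an index synchronously
      results[i] = await tasks[i](); // then run it, keeping result order
    }
  }

  const workers = Array.from(
    { length: Math.min(limit, tasks.length) },
    () => worker()
  );
  await Promise.all(workers);
  return results;
}
```

Results come back in input order even though completion order varies, which keeps page-to-template mapping deterministic.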
🚀 Quick Start
Installation
```bash
npm install nuxt-llms-generator
# or
yarn add nuxt-llms-generator
# or
pnpm add nuxt-llms-generator
```

Basic Configuration
```ts
// nuxt.config.ts
export default defineNuxtConfig({
  modules: ['nuxt-llms-generator'],
  llmsGenerator: {
    anthropicApiKey: process.env.ANTHROPIC_API_KEY,
    umbracoDataPath: './public/UmbracoData.json',
    finalOutputDir: './.output/llms'
  }
})
```

Environment Variables

```bash
# .env
ANTHROPIC_API_KEY=sk-ant-api03-your-key-here
```

Generate Documentation

```bash
npm run build
# Documentation is generated automatically during the build process
```

⚙️ Configuration Options
Core Settings
| Option | Type | Required | Default | Description |
|----------------------|----------|----------|------------------|--------------------------------------------|
| anthropicApiKey | string | ✅ | - | Your Claude API key from Anthropic |
| umbracoDataPath | string | ✅ | - | Path to your UmbracoData.json file |
| templatesDir | string | ❌ | ./.llms-templates | Directory for templates and cache files |
| finalOutputDir | string | ❌ | ./.output/llms | Output directory for final documentation |
Generation Options
| Option | Type | Default | Description |
|------------------------|-----------|------------------------------|------------------------------------------|
| enableIndividualMd | boolean | true | Generate individual .md files per page |
| enableLLMSFullTxt | boolean | true | Generate combined llms-full.txt file |
| enableHtmlToMarkdown | boolean | true | Convert HTML content to markdown using node-html-markdown |
| maxConcurrent | number | 5 | Maximum concurrent API requests |
| maxTokens | number | 65000 | Maximum tokens for page content before truncation (Claude context limit protection) |
| anthropicModel | string | claude-3-5-sonnet-20241022 | Claude model to use |
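The `maxTokens` guard can be approximated with a simple character-based heuristic. This is illustrative only: `truncateToTokens` and the roughly-4-characters-per-token ratio are assumptions, not the module's actual tokenizer.

```typescript
// Rough heuristic: ~4 characters per token for English text.
// Truncate page content before sending it to the model, leaving a marker.
export function truncateToTokens(text: string, maxTokens: number): string {
  const maxChars = maxTokens * 4;
  if (text.length <= maxChars) return text;
  return text.slice(0, maxChars) + "\n\n[content truncated]";
}
```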
Cleanup Options
| Option | Type | Default | Description |
|---------------------|-----------|---------|----------------------------------------|
| enableAutoCleanup | boolean | true | Automatically clean orphaned templates |
| cleanupOrphaned | boolean | true | Remove templates for deleted pages |
| cleanupHidden | boolean | true | Remove templates for hidden pages |
🏢 Multi-Site Implementation
Perfect for projects where one codebase generates multiple websites based on environment variables.
Environment-Based Configuration
```ts
// nuxt.config.ts
const siteEnv = process.env.SITE_ENV || 'main' // 'main', 'staging', 'partner', etc.

export default defineNuxtConfig({
  modules: ['nuxt-llms-generator'],
  llmsGenerator: {
    anthropicApiKey: process.env.ANTHROPIC_API_KEY,

    // Environment-specific paths
    umbracoDataPath: `./public/UmbracoData-${siteEnv}.json`,
    templatesDir: `./.llms-templates/${siteEnv}`,
    finalOutputDir: `./.output/llms/${siteEnv}`,

    // Shared settings
    maxConcurrent: 5,
    enableAutoCleanup: true
  }
})
```

Build Commands
```jsonc
// package.json
{
  "scripts": {
    "build:main": "SITE_ENV=main nuxt build",
    "build:partner": "SITE_ENV=partner nuxt build",
    "build:staging": "SITE_ENV=staging nuxt build"
  }
}
```

Directory Structure
```
project/
├── .llms-templates/
│   ├── main/          # Main site templates & cache
│   ├── partner/       # Partner site templates & cache
│   └── staging/       # Staging site templates & cache
├── .output/
│   └── llms/
│       ├── main/      # Main site docs
│       ├── partner/   # Partner site docs
│       └── staging/   # Staging site docs
├── public/
│   ├── UmbracoData-main.json
│   ├── UmbracoData-partner.json
│   └── UmbracoData-staging.json
└── templates/
    ├── main/          # Main site templates
    ├── partner/       # Partner site templates
    └── staging/       # Staging site templates
```

📊 Generated Output Examples
llms.txt (Navigation Index)
```markdown
# Business Communication Solutions | Voicenter

> Thousands of organizations in Israel manage their business communications through our advanced cloud platform

This website contains comprehensive information about business communication solutions. The content is organized into the following sections:

## Services
- [Call Center Solutions](call-center-solutions.md): Complete call center management tools
- [Smart PBX for Business](smart-pbx-business.md): Advanced business telephony services
- [Mobile Solutions](mobile-solutions.md): Unlimited mobile communication solutions

## Technical
- [API Integration](api-integration.md): Developer tools and API documentation
- [CRM Connectivity](crm-connectivity.md): Full CRM integration capabilities

## Optional
- [Complete Documentation](llms-full.txt): All content combined in a single file
```

Individual .md Files
```markdown
# Call Center Solutions

> Complete call center management tools for modern businesses

## Overview
Our call center solutions provide comprehensive tools for managing customer communications efficiently. Built on advanced cloud technology, these tools enable seamless implementation and superior organizational management.

## Key Features
- **Advanced Queue Management**: Intelligent call routing and distribution
- **Automated Callbacks**: Smart callback scheduling system
- **CRM Integration**: Seamless connection with existing CRM systems
- **Real-time Analytics**: Live monitoring and performance dashboards
- **Multi-channel Support**: Handle calls, emails, and chat in one platform

## Benefits
- Reduced customer wait times
- Increased agent productivity
- Better customer satisfaction scores
- Scalable solution that grows with your business

*Generated with Claude AI | Last updated: 2024-01-16*
```

🔧 Advanced Usage
Custom Build Script
```js
// scripts/generate-docs.js
import { LLMSFilesGenerator } from 'nuxt-llms-generator'

const config = {
  anthropicApiKey: process.env.ANTHROPIC_API_KEY,
  umbracoDataPath: './public/UmbracoData.json',
  finalOutputDir: './docs/ai-generated'
}

const generator = new LLMSFilesGenerator(config)

try {
  const files = await generator.generateAllFiles()
  console.log(`✅ Generated ${files.individualMdFiles?.length || 0} markdown files`)
  console.log('📝 LLMS documentation generation complete!')
} catch (error) {
  console.error('❌ Generation failed:', error)
  process.exit(1)
}
```

Development vs Production
```ts
// nuxt.config.ts
const isDev = process.env.NODE_ENV === 'development'

export default defineNuxtConfig({
  llmsGenerator: {
    anthropicApiKey: process.env.ANTHROPIC_API_KEY,
    umbracoDataPath: './public/UmbracoData.json',

    // Generate fewer files during development
    enableIndividualMd: !isDev,
    enableLLMSFullTxt: !isDev,

    // Lower concurrency in development
    maxConcurrent: isDev ? 2 : 8
  }
})
```

CI/CD Integration
```yaml
# .github/workflows/build.yml
name: Generate Documentation

on:
  push:
    branches: [main]

jobs:
  docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: 18
      - run: npm ci
      - run: npm run build
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
      - name: Deploy docs
        run: cp -r .output/llms/* ./public/docs/
```

🧪 Testing
Run the comprehensive test suite:
```bash
npm test
# or
npm run test:watch  # Watch mode
```

Test Coverage
- ✅ Template generation and caching
- ✅ HTML-to-markdown conversion
- ✅ Multi-language content handling
- ✅ Page visibility filtering
- ✅ Orphaned template cleanup
- ✅ Configuration validation
- ✅ Error handling and resilience
🐛 Troubleshooting
Common Issues
❌ "Claude API key not found"

```bash
# Make sure your API key is set
echo $ANTHROPIC_API_KEY
# Should show: sk-ant-api03-...
```

❌ "UmbracoData.json not found"

```bash
# Check that the file exists
ls -la public/UmbracoData.json
# Verify the path in nuxt.config.ts matches
```

❌ "Template generation failed"

- Check Claude API quota and rate limits
- Verify UmbracoData.json has a valid structure
- Enable debug logging:

```bash
DEBUG=llms:* npm run build
```
Performance Tips
Large Sites (1000+ pages):
```ts
{
  maxConcurrent: 8,        // Higher concurrency
  maxTokens: 80000,        // More content per page (if using larger models)
  enableAutoCleanup: true  // Keep cache clean
}
```

Development Speed:

```ts
{
  enableIndividualMd: false, // Skip individual files
  maxConcurrent: 2,          // Lower API usage
  maxTokens: 50000           // Smaller context for faster processing
}
```

Production Optimization:

```ts
{
  enableAutoCleanup: true,
  cleanupOrphaned: true,
  cleanupHidden: true,
  maxTokens: 65000,           // Balance between detail and API limits
  enableHtmlToMarkdown: true  // Clean HTML from CMS content
}
```

HTML Content Processing:

```ts
// Convert <p>, <h1>, etc. to clean markdown:
{ enableHtmlToMarkdown: true }

// Or keep HTML as-is (if the AI already generates clean content):
{ enableHtmlToMarkdown: false }
```
📚 API Reference
LLMSFilesGenerator
```ts
import { LLMSFilesGenerator } from 'nuxt-llms-generator'

const generator = new LLMSFilesGenerator({
  anthropicApiKey: 'your-api-key',
  umbracoDataPath: './data.json',
  finalOutputDir: './output'
})

// Generate all documentation files
const files = await generator.generateAllFiles()

// Test Claude API connection
const isConnected = await generator.testConnection()

// Clear template cache
generator.clearCache()

// Get generation statistics
const stats = generator.getStats()
```

Configuration Interface
```ts
interface LLMSConfig {
  // Required
  anthropicApiKey: string;
  umbracoDataPath: string;

  // Optional with defaults
  templatesDir?: string;          // './.llms-templates'
  finalOutputDir?: string;        // './.output/llms'
  anthropicModel?: string;        // 'claude-3-5-sonnet-20241022'
  maxTokens?: number;             // 65000
  maxConcurrent?: number;         // 5
  enableLLMSFullTxt?: boolean;    // true
  enableIndividualMd?: boolean;   // true
  enableHtmlToMarkdown?: boolean; // true
  enableAutoCleanup?: boolean;    // true
  cleanupOrphaned?: boolean;      // true
  cleanupHidden?: boolean;        // true
}
```

🤝 Contributing
We welcome contributions! Please see our Contributing Guide for details.
Development Setup
```bash
git clone https://github.com/your-org/nuxt-llms-generator.git
cd nuxt-llms-generator
npm install
npm run dev
```

Running Tests

```bash
npm test              # Run all tests
npm run test:watch    # Watch mode
npm run test:coverage # Coverage report
```

📄 License
MIT License - see LICENSE file for details.
🙏 Acknowledgments
- Anthropic for Claude AI API
- Jeremy Howard for the 2024 LLMS.txt standard
- Nuxt 3 for the amazing framework
- The open-source community for inspiration and feedback
Made with ❤️ for the AI-first web
Transform your CMS content into AI-optimized documentation that helps AI systems understand your business better.
