@voicenter-team/nuxt-llms-generator
v0.1.12
Nuxt 3 module for automatically generating AI-optimized documentation files (llms.txt, llms-full.txt, and individual .md files) from Umbraco CMS data using Anthropic's Claude API.
🤖 LLMS Documentation Generator for Nuxt 3
Transform your Umbraco CMS content into AI-optimized documentation following the 2024 LLMS.txt standard
Generate high-quality, AI-optimized markdown documentation from your Umbraco CMS content using Claude AI. Perfect for creating LLMS.txt files that help AI systems understand your website.
🔄 How It Works
```mermaid
flowchart TB
    subgraph "INPUT"
        JSON[UmbracoData.json<br/>📋 CMS Content]
        CONFIG[nuxt.config.ts<br/>⚙️ Configuration]
        API_KEY[🔑 Anthropic API Key]
    end

    subgraph "PROCESSING PIPELINE"
        START([🚀 Build Process Starts])

        subgraph "1️⃣ INITIALIZATION"
            LOAD[Load Configuration]
            VALIDATE[Validate API Connection]
            CACHE_CHECK[Check Template Cache]
        end

        subgraph "2️⃣ CONTENT ANALYSIS"
            FILTER["Filter Visible Pages<br/>📊 Skip hidePage: '1'"]
            EXTRACT[Extract Page Content<br/>🔍 JSONPath Resolution]
            HASH[Generate Structure Hash<br/>🏗️ Detect Changes]
        end

        subgraph "3️⃣ TEMPLATE GENERATION"
            CACHE_HIT{Cache Hit?}
            CLAUDE[🤖 Claude AI Analysis<br/>Semantic Understanding]
            TEMPLATE[Generate Mustache Template<br/>📝 AI-Optimized Structure]
            STORE_CACHE[💾 Store in Cache]
        end

        subgraph "4️⃣ CLEANUP & OPTIMIZATION"
            CLEANUP[🧹 Orphaned Template Cleanup<br/>Remove deleted/hidden pages]
            HTML_CLEAN[🔧 HTML-to-Markdown<br/>Clean Artifacts & Entities]
        end

        subgraph "5️⃣ FILE GENERATION"
            RENDER[Render Templates<br/>🎨 Mustache + Data]
            POST_PROCESS[Post-Process Markdown<br/>✨ Final Quality Pass]
        end
    end

    subgraph "OUTPUT FILES"
        LLMS_TXT[📄 llms.txt<br/>Navigation Index]
        LLMS_FULL[📄 llms-full.txt<br/>Complete Documentation]
        MD_FILES[📁 Individual .md Files<br/>Per-Page Documentation]
    end

    subgraph "MULTI-SITE SUPPORT"
        ENV1[🌐 Site 1<br/>SITE_ENV=main]
        ENV2[🌐 Site 2<br/>SITE_ENV=partner]
        ENV3[🌐 Site 3<br/>SITE_ENV=staging]
        CACHE1[💾 .llms-templates/main/]
        CACHE2[💾 .llms-templates/partner/]
        CACHE3[💾 .llms-templates/staging/]
        OUT1[📂 .output/llms/main/]
        OUT2[📂 .output/llms/partner/]
        OUT3[📂 .output/llms/staging/]
    end

    %% Flow connections
    JSON --> START
    CONFIG --> START
    API_KEY --> START
    START --> LOAD
    LOAD --> VALIDATE
    VALIDATE --> CACHE_CHECK
    CACHE_CHECK --> FILTER
    FILTER --> EXTRACT
    EXTRACT --> HASH
    HASH --> CACHE_HIT
    CACHE_HIT -->|❌ No| CLAUDE
    CACHE_HIT -->|✅ Yes| CLEANUP
    CLAUDE --> TEMPLATE
    TEMPLATE --> STORE_CACHE
    STORE_CACHE --> CLEANUP
    CLEANUP --> HTML_CLEAN
    HTML_CLEAN --> RENDER
    RENDER --> POST_PROCESS
    POST_PROCESS --> LLMS_TXT
    POST_PROCESS --> LLMS_FULL
    POST_PROCESS --> MD_FILES

    %% Multi-site flows
    CONFIG -.-> ENV1
    CONFIG -.-> ENV2
    CONFIG -.-> ENV3
    ENV1 -.-> CACHE1
    ENV2 -.-> CACHE2
    ENV3 -.-> CACHE3
    CACHE1 -.-> OUT1
    CACHE2 -.-> OUT2
    CACHE3 -.-> OUT3

    %% Styling
    classDef input fill:#e1f5fe
    classDef process fill:#f3e5f5
    classDef output fill:#e8f5e8
    classDef multisite fill:#fff3e0
    class JSON,CONFIG,API_KEY input
    class LLMS_TXT,LLMS_FULL,MD_FILES output
    class ENV1,ENV2,ENV3,CACHE1,CACHE2,CACHE3,OUT1,OUT2,OUT3 multisite
```

🎯 Key Process Details
| Phase | What Happens | Why It Matters |
|-------|--------------|----------------|
| 🔍 Content Analysis | Filters visible pages, extracts content via JSONPath, generates structure hashes | Only public pages are processed; structural changes are detected separately from routine content edits |
| 🤖 AI Generation | Claude analyzes page structure and generates semantic Mustache templates | Creates context-aware templates that understand your business domain |
| 💾 Smart Caching | Stores templates with structure hashes, reuses unchanged templates | Saves API costs and generation time on subsequent builds |
| 🧹 Automatic Cleanup | Removes templates for deleted/hidden pages, syncs with current content | Prevents cache bloat and maintains accuracy |
| 🔧 Post-Processing | Converts HTML to clean markdown, removes artifacts and entities | Ensures AI-optimized output that follows the 2024 LLMS.txt standard |
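The structure-hash step can be sketched as follows. This is an illustrative implementation, not the module's actual code: the hash covers a page's key paths (its shape) rather than its values, so routine content edits hit the template cache while schema changes trigger a fresh Claude call.

```typescript
import { createHash } from "node:crypto";

// Collect the key paths of a page object, ignoring values.
// Arrays collapse to a single "[]" path segment so that adding
// or removing list items does not count as a structure change.
function keyPaths(node: unknown, prefix = ""): string[] {
  if (node === null || typeof node !== "object") return [prefix];
  if (Array.isArray(node)) {
    return node.flatMap((v) => keyPaths(v, `${prefix}[]`));
  }
  return Object.entries(node as Record<string, unknown>).flatMap(([k, v]) =>
    keyPaths(v, prefix ? `${prefix}.${k}` : k)
  );
}

// Same keys => same hash, regardless of content values.
export function structureHash(page: object): string {
  const unique = [...new Set(keyPaths(page))].sort();
  return createHash("sha256").update(unique.join("\n")).digest("hex");
}
```

Two pages with identical field layouts produce identical hashes even when every value differs, which is exactly the property the cache relies on.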
🏢 Multi-Site Architecture
The system automatically adapts to different environments using the `SITE_ENV` variable:
```
SITE_ENV=main    → UmbracoData-main.json    → .llms-templates/main/    → .output/llms/main/
SITE_ENV=partner → UmbracoData-partner.json → .llms-templates/partner/ → .output/llms/partner/
SITE_ENV=staging → UmbracoData-staging.json → .llms-templates/staging/ → .output/llms/staging/
```

Each environment maintains its own isolated cache and output, preventing conflicts while sharing the same codebase and configuration logic.
🌟 Features
🤖 AI-Powered Template Generation
- Claude API Integration: Uses Anthropic's Claude for intelligent content analysis
- 2024 LLMS.txt Compliance: Follows the latest LLMS.txt standard for AI consumption
- Semantic Understanding: Automatically detects content types and generates appropriate templates
- Multi-language Support: Handles Hebrew/English mixed content and RTL text
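Handling mixed Hebrew/English content requires deciding a field's dominant direction. A minimal heuristic could look like this; `isMostlyHebrew` is illustrative only, and the module's actual detection logic may differ:

```typescript
// Hebrew letters live in the Unicode block U+0590–U+05FF.
// Flag a string as RTL-dominant when Hebrew letters outnumber others.
export function isMostlyHebrew(text: string): boolean {
  const letters = text.match(/\p{L}/gu) ?? []; // all letters, any script
  if (letters.length === 0) return false;      // digits/punctuation only
  const hebrew = letters.filter((ch) => /[\u0590-\u05FF]/.test(ch)).length;
  return hebrew / letters.length > 0.5;
}
```

A per-field check like this lets a generator emit `dir="rtl"` hints or pick RTL-aware markdown formatting only where it is actually needed.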
⚡ Smart Caching System
- Structure-Based Detection: Only regenerates when page structure changes (not content values)
- Incremental Updates: Process only changed pages for faster builds
- API Cost Optimization: Avoids unnecessary Claude API calls
- Git-Friendly: Templates stored in git, outputs excluded
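The caching behaviour described above (a structure hash stored alongside each template, reuse on match) could be sketched like this. The file layout, `getTemplate`, and the JSON cache format are assumptions for illustration, not the module's real internals:

```typescript
import { existsSync, readFileSync, writeFileSync, mkdirSync } from "node:fs";
import { join } from "node:path";

interface CachedTemplate {
  structureHash: string; // hash of the page's shape at generation time
  template: string;      // the generated Mustache template
}

// Reuse a cached template when the page structure is unchanged;
// otherwise call the (expensive) generator and refresh the cache entry.
export async function getTemplate(
  dir: string,
  pageId: string,
  hash: string,
  generate: () => Promise<string>
): Promise<string> {
  mkdirSync(dir, { recursive: true });
  const file = join(dir, `${pageId}.json`);
  if (existsSync(file)) {
    const cached: CachedTemplate = JSON.parse(readFileSync(file, "utf8"));
    if (cached.structureHash === hash) return cached.template; // cache hit
  }
  const template = await generate(); // cache miss: regenerate via the API
  writeFileSync(file, JSON.stringify({ structureHash: hash, template }));
  return template;
}
```

Because cache files are plain JSON keyed by page id, they diff cleanly in git, which is what makes the "templates in git, outputs excluded" workflow practical.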
🧹 Automatic Cleanup
- Orphaned Template Detection: Removes templates for deleted pages
- Hidden Page Handling: Cleans up templates for pages marked as hidden
- Cache Synchronization: Keeps template cache aligned with Umbraco content
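Put concretely, cleanup can be as simple as diffing the cached template files against the currently visible page ids. A sketch under an assumed naming scheme (`<pageId>.mustache`; the module's real layout may differ):

```typescript
import { readdirSync, rmSync } from "node:fs";
import { join } from "node:path";

interface Page {
  id: string;
  hidePage?: string; // "1" marks a hidden page, matching the Umbraco flag
}

// Remove cached templates whose page was deleted or marked hidden.
export function cleanupOrphanedTemplates(
  templatesDir: string,
  pages: Page[]
): string[] {
  const visible = new Set(
    pages.filter((p) => p.hidePage !== "1").map((p) => p.id)
  );
  const removed: string[] = [];
  for (const file of readdirSync(templatesDir)) {
    if (!file.endsWith(".mustache")) continue;
    const id = file.slice(0, -".mustache".length);
    if (!visible.has(id)) {
      rmSync(join(templatesDir, file));
      removed.push(file);
    }
  }
  return removed;
}
```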
📄 Multiple Output Formats
- `llms.txt`: Navigation index following the 2024 standard
- `llms-full.txt`: Complete site documentation in a single file
- Individual `.md` files: Clean, AI-optimized markdown per page
🎯 Production Ready
- 26 Passing Tests: Comprehensive test coverage
- TypeScript Support: Full type safety throughout
- Parallel Processing: Configurable concurrency for large sites
- Error Resilience: Graceful handling of failures with detailed logging
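The `maxConcurrent` option bounds how many API requests run simultaneously. A limiter of that kind can be sketched in a few lines; `runWithLimit` is illustrative, not part of the module's public API:

```typescript
// Run async task factories with at most `limit` tasks in flight at once.
export async function runWithLimit<T>(
  tasks: Array<() => Promise<T>>,
  limit: number
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0; // shared cursor: each worker claims the next unstarted task

  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++;           // claim an index synchronously
      results[i] = await tasks[i](); // then run it, keeping result order
    }
  }

  const workers = Array.from(
    { length: Math.min(limit, tasks.length) },
    () => worker()
  );
  await Promise.all(workers);
  return results;
}
```

Results come back in input order even though completion order varies, which keeps page-to-template mapping deterministic.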
🚀 Quick Start
Installation
```bash
npm install nuxt-llms-generator
# or
yarn add nuxt-llms-generator
# or
pnpm add nuxt-llms-generator
```

Basic Configuration
```ts
// nuxt.config.ts
export default defineNuxtConfig({
  modules: ['nuxt-llms-generator'],
  llmsGenerator: {
    anthropicApiKey: process.env.ANTHROPIC_API_KEY,
    umbracoDataPath: './public/UmbracoData.json',
    finalOutputDir: './.output/llms'
  }
})
```

Environment Variables

```bash
# .env
ANTHROPIC_API_KEY=sk-ant-api03-your-key-here
```

Generate Documentation

```bash
npm run build
# Documentation is generated automatically during the build process
```

⚙️ Configuration Options
Core Settings
| Option | Type | Required | Default | Description |
|----------------------|----------|----------|------------------|--------------------------------------------|
| anthropicApiKey | string | ✅ | - | Your Claude API key from Anthropic |
| umbracoDataPath | string | ✅ | - | Path to your UmbracoData.json file |
| templatesDir | string | ❌ | ./.llms-templates | Directory for templates and cache files |
| finalOutputDir | string | ❌ | ./.output/llms | Output directory for final documentation |
Generation Options
| Option | Type | Default | Description |
|------------------------|-----------|------------------------------|------------------------------------------|
| enableIndividualMd | boolean | true | Generate individual .md files per page |
| enableLLMSFullTxt | boolean | true | Generate combined llms-full.txt file |
| enableHtmlToMarkdown | boolean | true | Convert HTML content to markdown using node-html-markdown |
| maxConcurrent | number | 5 | Maximum concurrent API requests |
| maxTokens | number | 65000 | Maximum tokens for page content before truncation (Claude context limit protection) |
| anthropicModel | string | claude-3-5-sonnet-20241022 | Claude model to use |
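The `maxTokens` guard can be approximated with a simple character-based heuristic. This is illustrative only: `truncateToTokens` and the roughly-4-characters-per-token ratio are assumptions, not the module's actual tokenizer.

```typescript
// Rough heuristic: ~4 characters per token for English text.
// Truncate page content before sending it to the model, leaving a marker.
export function truncateToTokens(text: string, maxTokens: number): string {
  const maxChars = maxTokens * 4;
  if (text.length <= maxChars) return text;
  return text.slice(0, maxChars) + "\n\n[content truncated]";
}
```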
Cleanup Options
| Option | Type | Default | Description |
|---------------------|-----------|---------|----------------------------------------|
| enableAutoCleanup | boolean | true | Automatically clean orphaned templates |
| cleanupOrphaned | boolean | true | Remove templates for deleted pages |
| cleanupHidden | boolean | true | Remove templates for hidden pages |
🏢 Multi-Site Implementation
Perfect for projects where one codebase generates multiple websites based on environment variables.
Environment-Based Configuration
```ts
// nuxt.config.ts
const siteEnv = process.env.SITE_ENV || 'main' // 'main', 'staging', 'partner', etc.

export default defineNuxtConfig({
  modules: ['nuxt-llms-generator'],
  llmsGenerator: {
    anthropicApiKey: process.env.ANTHROPIC_API_KEY,

    // Environment-specific paths
    umbracoDataPath: `./public/UmbracoData-${siteEnv}.json`,
    templatesDir: `./.llms-templates/${siteEnv}`,
    finalOutputDir: `./.output/llms/${siteEnv}`,

    // Shared settings
    maxConcurrent: 5,
    enableAutoCleanup: true
  }
})
```

Build Commands
```jsonc
// package.json
{
  "scripts": {
    "build:main": "SITE_ENV=main nuxt build",
    "build:partner": "SITE_ENV=partner nuxt build",
    "build:staging": "SITE_ENV=staging nuxt build"
  }
}
```

Directory Structure
```
project/
├── .llms-templates/
│   ├── main/          # Main site templates & cache
│   ├── partner/       # Partner site templates & cache
│   └── staging/       # Staging site templates & cache
├── .output/
│   └── llms/
│       ├── main/      # Main site docs
│       ├── partner/   # Partner site docs
│       └── staging/   # Staging site docs
├── public/
│   ├── UmbracoData-main.json
│   ├── UmbracoData-partner.json
│   └── UmbracoData-staging.json
└── templates/
    ├── main/          # Main site templates
    ├── partner/       # Partner site templates
    └── staging/       # Staging site templates
```

📊 Generated Output Examples
llms.txt (Navigation Index)
```markdown
# Business Communication Solutions | Voicenter

> Thousands of organizations in Israel manage their business communications through our advanced cloud platform

This website contains comprehensive information about business communication solutions. The content is organized into the following sections:

## Services
- [Call Center Solutions](call-center-solutions.md): Complete call center management tools
- [Smart PBX for Business](smart-pbx-business.md): Advanced business telephony services
- [Mobile Solutions](mobile-solutions.md): Unlimited mobile communication solutions

## Technical
- [API Integration](api-integration.md): Developer tools and API documentation
- [CRM Connectivity](crm-connectivity.md): Full CRM integration capabilities

## Optional
- [Complete Documentation](llms-full.txt): All content combined in a single file
```

Individual .md Files
```markdown
# Call Center Solutions

> Complete call center management tools for modern businesses

## Overview
Our call center solutions provide comprehensive tools for managing customer communications efficiently. Built on advanced cloud technology, these tools enable seamless implementation and superior organizational management.

## Key Features
- **Advanced Queue Management**: Intelligent call routing and distribution
- **Automated Callbacks**: Smart callback scheduling system
- **CRM Integration**: Seamless connection with existing CRM systems
- **Real-time Analytics**: Live monitoring and performance dashboards
- **Multi-channel Support**: Handle calls, emails, and chat in one platform

## Benefits
- Reduced customer wait times
- Increased agent productivity
- Better customer satisfaction scores
- Scalable solution that grows with your business

*Generated with Claude AI | Last updated: 2024-01-16*
```

🔧 Advanced Usage
Custom Build Script
```js
// scripts/generate-docs.js
import { LLMSFilesGenerator } from 'nuxt-llms-generator'

const config = {
  anthropicApiKey: process.env.ANTHROPIC_API_KEY,
  umbracoDataPath: './public/UmbracoData.json',
  finalOutputDir: './docs/ai-generated'
}

const generator = new LLMSFilesGenerator(config)

try {
  const files = await generator.generateAllFiles()
  console.log(`✅ Generated ${files.individualMdFiles?.length || 0} markdown files`)
  console.log('📝 LLMS documentation generation complete!')
} catch (error) {
  console.error('❌ Generation failed:', error)
  process.exit(1)
}
```

Development vs Production
```ts
// nuxt.config.ts
const isDev = process.env.NODE_ENV === 'development'

export default defineNuxtConfig({
  llmsGenerator: {
    anthropicApiKey: process.env.ANTHROPIC_API_KEY,
    umbracoDataPath: './public/UmbracoData.json',

    // Generate fewer files during development
    enableIndividualMd: !isDev,
    enableLLMSFullTxt: !isDev,

    // Lower concurrency in development
    maxConcurrent: isDev ? 2 : 8
  }
})
```

CI/CD Integration
```yaml
# .github/workflows/build.yml
name: Generate Documentation

on:
  push:
    branches: [main]

jobs:
  docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: 18
      - run: npm ci
      - run: npm run build
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
      - name: Deploy docs
        run: cp -r .output/llms/* ./public/docs/
```

🧪 Testing
Run the comprehensive test suite:
```bash
npm test
# or
npm run test:watch  # Watch mode
```

Test Coverage
- ✅ Template generation and caching
- ✅ HTML-to-markdown conversion
- ✅ Multi-language content handling
- ✅ Page visibility filtering
- ✅ Orphaned template cleanup
- ✅ Configuration validation
- ✅ Error handling and resilience
🐛 Troubleshooting
Common Issues
❌ "Claude API key not found"

```bash
# Make sure your API key is set
echo $ANTHROPIC_API_KEY
# Should show: sk-ant-api03-...
```

❌ "UmbracoData.json not found"

```bash
# Check that the file exists
ls -la public/UmbracoData.json
# Verify the path in nuxt.config.ts matches
```

❌ "Template generation failed"

- Check Claude API quota and rate limits
- Verify UmbracoData.json has a valid structure
- Enable debug logging:

```bash
DEBUG=llms:* npm run build
```
Performance Tips
Large Sites (1000+ pages):
```ts
{
  maxConcurrent: 8,        // Higher concurrency
  maxTokens: 80000,        // More content per page (if using larger models)
  enableAutoCleanup: true  // Keep cache clean
}
```

Development Speed:

```ts
{
  enableIndividualMd: false, // Skip individual files
  maxConcurrent: 2,          // Lower API usage
  maxTokens: 50000           // Smaller context for faster processing
}
```

Production Optimization:

```ts
{
  enableAutoCleanup: true,
  cleanupOrphaned: true,
  cleanupHidden: true,
  maxTokens: 65000,           // Balance between detail and API limits
  enableHtmlToMarkdown: true  // Clean HTML from CMS content
}
```

HTML Content Processing:

```ts
// Convert <p>, <h1>, etc. to clean markdown:
{ enableHtmlToMarkdown: true }

// Or keep HTML as-is (if the AI already generates clean content):
{ enableHtmlToMarkdown: false }
```
📚 API Reference
LLMSFilesGenerator
```ts
import { LLMSFilesGenerator } from 'nuxt-llms-generator'

const generator = new LLMSFilesGenerator({
  anthropicApiKey: 'your-api-key',
  umbracoDataPath: './data.json',
  finalOutputDir: './output'
})

// Generate all documentation files
const files = await generator.generateAllFiles()

// Test Claude API connection
const isConnected = await generator.testConnection()

// Clear template cache
generator.clearCache()

// Get generation statistics
const stats = generator.getStats()
```

Configuration Interface
```ts
interface LLMSConfig {
  // Required
  anthropicApiKey: string;
  umbracoDataPath: string;

  // Optional with defaults
  templatesDir?: string;          // './.llms-templates'
  finalOutputDir?: string;        // './.output/llms'
  anthropicModel?: string;        // 'claude-3-5-sonnet-20241022'
  maxTokens?: number;             // 65000
  maxConcurrent?: number;         // 5
  enableLLMSFullTxt?: boolean;    // true
  enableIndividualMd?: boolean;   // true
  enableHtmlToMarkdown?: boolean; // true
  enableAutoCleanup?: boolean;    // true
  cleanupOrphaned?: boolean;      // true
  cleanupHidden?: boolean;        // true
}
```

🤝 Contributing
We welcome contributions! Please see our Contributing Guide for details.
Development Setup
```bash
git clone https://github.com/your-org/nuxt-llms-generator.git
cd nuxt-llms-generator
npm install
npm run dev
```

Running Tests

```bash
npm test              # Run all tests
npm run test:watch    # Watch mode
npm run test:coverage # Coverage report
```

📄 License
MIT License - see LICENSE file for details.
🙏 Acknowledgments
- Anthropic for Claude AI API
- Jeremy Howard for the 2024 LLMS.txt standard
- Nuxt 3 for the amazing framework
- The open-source community for inspiration and feedback
Made with ❤️ for the AI-first web
Transform your CMS content into AI-optimized documentation that helps AI systems understand your business better.
