@waldheimdev/astro-ai-llms-txt
v1.3.0
An Astro integration that automatically generates LLM-optimized llms.txt and llms-full.txt files in your build output. It uses AI to summarize your pages, making them perfectly digestible for Large Language Models.
Follows the llms.txt standard.
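For reference, a generated file following that standard has this general shape (contents illustrative):

```markdown
# My Awesome Project

> A deep dive into awesome things.

## Blog

- [Post Title](https://example.com/blog/post/): one-line AI-generated summary

## Optional

- [Changelog](https://example.com/changelog/): secondary content
```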
🚀 Features
- 🤖 AI-Powered Summarization: Uses OpenAI, Google Gemini, Anthropic Claude, or local Ollama models.
- 💻 CLI Provider: Use any CLI tool (e.g., `gemini-cli`, `copilot-cli`, `claude-code`) as a provider.
- 📂 Automatic Sectioning: Groups pages by their root directories (e.g., `/blog/`, `/docs/`).
- ⚡ Concurrency Control: Limit simultaneous AI requests to avoid rate limits.
- 📜 Full Content Support: Optionally generate `llms-full.txt` — in Markdown or XML (`<document>`) format.
- 💾 Caching: AI responses are cached locally (`.llms-txt-cache/`) to speed up subsequent builds.
- 🌍 Multi-language Support: Customize prompts based on your site's language (`en`, `de`, `fr`).
- 🔍 GEO Linter: Build-time warnings for content that may be hard for LLMs to interpret (word count, untagged code fences).
- 🗄️ Astro 5 DataStore Adapter: Pull content directly from an Astro 5 DataStore instead of HTML files.
- 🔪 Chunking Pipeline: Split full-content output into fixed, recursive, structure-aware, or semantic chunks — exportable as JSONL.
- 🔌 MCP Integration: Auto-generate `.cursor/mcp.json`, `.vscode/mcp.json`, and `.mcp.json` at build time, and serve a live Model Context Protocol SSE endpoint during `astro dev`.
- 🏷️ `data-llm` Metadata: Embed structured JSON metadata on any HTML element for LLM consumption.
- 🛠️ Robust & Fast: Optimized for Astro 5+ and Node 24+.
📋 Requirements
- Node.js: 24.x or higher
- Astro: 5.0.0 or higher, including Astro 6.x
📦 Installation
```sh
npm install @waldheimdev/astro-ai-llms-txt
```

🛠️ Usage
Add the integration to your astro.config.mjs:
```js
import { defineConfig } from 'astro/config';
import llmsTxt from '@waldheimdev/astro-ai-llms-txt';

export default defineConfig({
  site: 'https://example.com',
  integrations: [
    llmsTxt({
      projectName: 'My Awesome Project',
      description: 'A deep dive into awesome things.',
      aiProvider: 'openai',
      aiApiKey: process.env.OPENAI_API_KEY,
      aiModel: 'gpt-4o-mini',
      llmsFull: true,
    }),
  ],
});
```

🧠 AI Provider Examples
Anthropic Claude
```js
llmsTxt({
  aiProvider: 'claude',
  aiApiKey: process.env.ANTHROPIC_API_KEY,
  aiModel: 'claude-3-5-sonnet-latest',
});
```

Google Gemini
```js
llmsTxt({
  aiProvider: 'gemini',
  aiApiKey: process.env.GEMINI_API_KEY,
  aiModel: 'gemini-2.5-flash',
  // Optional: enable extended thinking
  geminiThinkingLevel: 'low', // 'low' | 'medium' | 'high' | 'minimal'
  geminiThinkingBudget: 1024,
});
```

Local LLM (Ollama)
```js
llmsTxt({
  aiProvider: 'ollama',
  aiModel: 'llama3', // ensure this model is pulled in Ollama
});
```

CLI Tool Provider
Use any CLI tool that accepts a prompt + text via stdin and returns the summary on stdout.
```js
llmsTxt({
  aiProvider: 'cli',
  cliCommand: 'gemini summarize',
});
```

✅ GEO Linter
The GEO (Generative Engine Optimization) linter runs automatically during every build and emits warnings for pages that may be difficult for LLMs to consume:
- Content exceeding 400 words — consider splitting into smaller sections.
- Code fences without a language tag — add a language identifier (e.g.,
```typescript).
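These two checks can be approximated in a few lines of JavaScript (a simplified sketch for illustration, not the linter's actual implementation):

```javascript
// Simplified sketch of the two documented GEO checks (illustrative only).
const FENCE = '`'.repeat(3); // a literal triple backtick, built up so this sample stays renderable

function geoLint(markdown) {
  const warnings = [];
  const wordCount = markdown.split(/\s+/).filter(Boolean).length;
  if (wordCount > 400) {
    warnings.push(`page has ${wordCount} words (> 400); consider splitting it`);
  }
  let insideFence = false;
  for (const line of markdown.split('\n')) {
    if (line.trimStart().startsWith(FENCE)) {
      // Opening fences should carry a language tag; closing fences are bare by design.
      if (!insideFence && line.trim() === FENCE) {
        warnings.push('code fence without a language tag');
      }
      insideFence = !insideFence;
    }
  }
  return warnings;
}

console.log(geoLint(['intro', FENCE, 'const x = 1;', FENCE].join('\n')));
// → [ 'code fence without a language tag' ]
```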
To disable the linter:
```js
llmsTxt({ geoLinter: false });
```

📄 Full Content Formats
Markdown (default)
```js
llmsTxt({ llmsFull: true }); // generates llms-full.txt in Markdown
```

XML (`<document>` tags, compatible with Anthropic prompt format)

```js
llmsTxt({ llmsFull: true, llmsFullFormat: 'xml' });
```

🗄️ Astro 5 DataStore Adapter
Pull content from an Astro 5 DataStore instead of scanning HTML output files. Useful for content-collection-heavy sites.
```js
llmsTxt({ contentSource: 'datastore' }); // use DataStore only
llmsTxt({ contentSource: 'auto' });      // prefer DataStore, fall back to HTML
llmsTxt({ contentSource: 'html' });      // default: scan HTML build output
```

DataStore entries may include the following data fields:
| Field | Type | Description |
| :-------------- | :-------- | :----------------------------------------------- |
| title | string | Page title |
| description | string | Page summary / meta description |
| llmsOptional | boolean | When true, page is placed in ## Optional |
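For illustration, a DataStore entry carrying these fields might look like the following (the entry shape around `data` is a simplified assumption):

```javascript
// Illustrative DataStore entry; the integration reads the documented
// title / description / llmsOptional fields from `data`.
const entry = {
  id: 'docs/getting-started',
  data: {
    title: 'Getting Started',
    description: 'Install and configure the integration.',
    llmsOptional: false, // set to true to move the page under "## Optional"
  },
};

console.log(entry.data.title); // "Getting Started"
```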
🔪 Chunking Pipeline
Generate semantically segmented output alongside llms-full.txt. Useful for embedding pipelines and RAG systems.
```js
llmsTxt({
  llmsFull: true,
  chunking: {
    strategy: 'structure', // 'none' | 'fixed' | 'recursive' | 'structure' | 'semantic'
    chunkSize: 1500, // characters per chunk (for fixed/recursive)
    chunkOverlap: 200, // overlap between adjacent chunks
  },
  chunkExport: 'jsonl', // writes llms-chunks.jsonl to the build output
});
```

Chunking Strategies
| Strategy | Description |
| :---------- | :---------------------------------------------------------------------------- |
| none | No chunking (default) |
| fixed | Split at exact character boundaries |
| recursive | Split at sentence/paragraph boundaries with overlap |
| structure | Split on Markdown headings and blank-line separated blocks |
| semantic | Embedding-based semantic similarity split (requires @xenova/transformers) |
Note: The `semantic` strategy requires the optional peer dependency `@xenova/transformers`:

```sh
npm install @xenova/transformers
```
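To make the size/overlap interaction concrete, here is a minimal sketch of the `fixed` strategy (illustrative only, not the integration's internal implementation), with each chunk serialized as a JSONL line as in the export format described below:

```javascript
// Minimal sketch of fixed-size chunking with overlap (illustrative only).
function chunkFixed(text, chunkSize = 1500, chunkOverlap = 200) {
  const step = chunkSize - chunkOverlap;
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached
  }
  return chunks;
}

// Serialize each chunk as one JSONL line (metadata fields are illustrative).
const chunks = chunkFixed('a'.repeat(3000), 1500, 200);
const jsonl = chunks
  .map((text, index) => JSON.stringify({ text, metadata: { index } }))
  .join('\n');

console.log(chunks.length); // → 3 chunks: [0,1500), [1300,2800), [2600,3000)
```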
JSONL Export Format
Each line in llms-chunks.jsonl is a JSON object:
```json
{
  "text": "chunk content...",
  "metadata": { "title": "Page Title", "filePath": "/docs/guide", "topic": "guide", "index": 0 },
  "formatted": "# Page Title\n\nchunk content..."
}
```

🔌 MCP Integration (Model Context Protocol)
Enable the MCP integration to make your site's content available to AI coding assistants (Cursor, VS Code, etc.).
```js
llmsTxt({ mcp: true }); // enable with defaults
```

Or with fine-grained control:
```js
llmsTxt({
  mcp: {
    manifests: true, // write .cursor/mcp.json, .vscode/mcp.json, .mcp.json at build
    devServer: true, // serve a live SSE endpoint during `astro dev`
    serverPath: '/__mcp/sse', // custom endpoint path
  },
});
```

Generated Files
At build time the following manifest files are written to your project root:
- `.cursor/mcp.json`
- `.vscode/mcp.json`
- `.mcp.json`
Each manifest registers:
- `astro-docs` — an SSE server pointing to `<siteUrl><serverPath>`
- `astro-docs-full` — a direct URL to `llms-full.txt` (when `llmsFull: true`)
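For illustration, a generated manifest might look roughly like this (the exact schema is an assumption based on common MCP client configs, not taken from this package's output):

```json
{
  "mcpServers": {
    "astro-docs": {
      "url": "https://example.com/__mcp/sse"
    },
    "astro-docs-full": {
      "url": "https://example.com/llms-full.txt"
    }
  }
}
```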
Dev Server SSE Endpoint
During astro dev, a live SSE endpoint is available at /__mcp/sse (or your custom serverPath). It responds with a JSON-RPC 2.0 resources/list message containing all pages processed in the last build.
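A client can decode those frames with a few lines of JavaScript; the sketch below assumes standard SSE `data:` framing, and the payload shown is illustrative:

```javascript
// Parse one SSE frame ("data: <json>\n\n") into a JSON-RPC 2.0 message.
function parseSseFrame(frame) {
  const payload = frame
    .split('\n')
    .filter((line) => line.startsWith('data:'))
    .map((line) => line.slice('data:'.length).trim())
    .join('\n');
  return JSON.parse(payload);
}

// Illustrative resources/list payload, as described above.
const frame =
  'data: {"jsonrpc":"2.0","method":"resources/list","params":{"resources":[{"uri":"https://example.com/docs/intro","name":"Intro"}]}}\n\n';

console.log(parseSseFrame(frame).method); // "resources/list"
```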
🏷️ data-llm Metadata
Embed structured JSON metadata directly on any HTML element to include it in llms-full.txt:
```html
<div data-llm='{"type":"pricing","product":"Pro","monthly_cost":29}'>
  Pro Plan — $29/month
</div>
```

The metadata is extracted at build time and appended as LLM-readable comments to the full-content section of that page:

```html
<!-- LLM Metadata [div]: {"type":"pricing","product":"Pro","monthly_cost":29} -->
```

🔖 llms-optional Meta Tag
Mark pages as optional in the llms.txt spec by adding:
```html
<meta name="llms-optional" content="true" />
```

Optional pages are grouped under a separate `## Optional` section in llms.txt.
⚙️ Configuration Options
| Option | Type | Default | Description |
| :---------------------- | :---------------------------- | :--------------- | :-------------------------------------------------------------------------------- |
| projectName | string | 'Projectname' | H1 title in llms.txt |
| description | string | 'Auto-generated...' | Blockquote description in llms.txt |
| aiProvider | string | 'ollama' | openai, gemini, claude, ollama, or cli |
| aiApiKey | string | '' | API key for the selected provider |
| aiModel | string | 'llama3' | Model name |
| cliCommand | string | 'cat' | CLI command when aiProvider is cli |
| llmsFull | boolean | false | Generate llms-full.txt |
| llmsFullFormat | 'markdown' \| 'xml' | 'markdown' | Output format for llms-full.txt |
| concurrency | number | 5 | Max simultaneous AI requests |
| language | string | 'en' | Prompt language (en, de, fr) |
| maxInputLength | number | 8000 | Max characters sent to AI per page |
| debug | boolean | false | Verbose logging |
| geoLinter | boolean | true | Run GEO linter during build |
| contentSource | 'html' \| 'datastore' \| 'auto' | 'html' | Content source for extraction |
| chunking | ChunkingOptions | undefined | Chunking configuration (see Chunking Pipeline) |
| chunkExport | 'none' \| 'jsonl' | 'none' | Export chunked output as llms-chunks.jsonl |
| mcp | McpOptions \| boolean | undefined | MCP integration (see MCP Integration) |
| geminiThinkingLevel | 'low' \| 'medium' \| 'high' \| 'minimal' | undefined | Gemini thinking level |
| geminiThinkingBudget | number | undefined | Gemini thinking token budget |
| site | string | '' | Base URL (usually set via Astro's site config) |
ChunkingOptions
| Option | Type | Default | Description |
| :------------------- | :-------- | :------ | :-------------------------------------------------------- |
| strategy | string | 'none' | Chunking strategy (see Chunking Strategies table) |
| chunkSize | number | 1500 | Target chunk size in characters |
| chunkOverlap | number | 200 | Character overlap between adjacent chunks |
| similarityThreshold | number | 0.5 | Cosine similarity threshold for semantic chunking |
McpOptions
| Option | Type | Default | Description |
| :----------- | :-------- | :------------- | :-------------------------------------------------------- |
| manifests | boolean | true | Write MCP manifest files during build |
| devServer | boolean | true | Serve SSE endpoint during astro dev |
| serverPath | string | '/__mcp/sse' | SSE endpoint path |
📄 License
MIT © Waldheim-dev
