@silupanda/llms-txt
v0.2.1
Auto-generate and serve llms.txt for AI agent discoverability
llms-txt
Make your website discoverable by AI agents in minutes.
Every AI agent — ChatGPT, Claude, Perplexity, Copilot — needs to understand what your site offers. Instead of letting them blindly crawl, guess, or hallucinate, llms-txt auto-generates a structured llms.txt manifest that tells agents exactly what's on your site, organized and ready to consume.
Point it at your HTML files, a sitemap, or a live URL. It discovers your pages, extracts titles, descriptions, headings, JSON-LD, and OpenGraph metadata, organizes everything into sections, and serves the result as drop-in middleware for Express, Fastify, Next.js, or Hono.
Features
- 🔍 Auto-Discovery — Scan a local directory, parse a sitemap, or crawl a live site. Combine all three — results are deduplicated automatically.
- 🧠 Smart Extraction — Pulls titles, descriptions, headings, JSON-LD structured data, and OpenGraph metadata from raw HTML using Cheerio.
- 📑 Organized Sections — Group pages by path prefix (/docs, /blog) or explicit page lists. Unmatched pages go to "Other" automatically.
- 📄 Three Output Formats — Concise llms.txt index, llms-full.txt with full page content, and llms-small.txt with just titles and URLs.
- 🔌 Framework Adapters — Drop-in middleware for Express, Fastify, Next.js (App Router), and Hono. Two lines of code.
- ⚡ Built-in Caching — TTL-based in-memory cache so you're not regenerating on every request.
- 👁️ Watch Mode — Auto-regenerate when your source files change.
- 🖥️ CLI — Generate files from the command line, from a config file, or in watch mode.
- 🇹 Type-Safe — 100% TypeScript with full type exports.
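The caching feature is described only as a TTL-based in-memory cache. The following is a minimal sketch of how such a cache avoids regenerating output on every request. `TtlCache` and `getOrCompute` are hypothetical names (this is not the exported `Cache` class). Treating a TTL of 0 as "caching disabled" is an assumption modeled on the `cacheTtl` option.

```typescript
// Hypothetical TTL cache for illustration; not the package's exported `Cache` class.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so expiry can be demonstrated without real waiting.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: evict lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    if (this.ttlMs <= 0) return; // assumption: a TTL of 0 disables caching
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Returns the cached value, or computes, stores, and returns a fresh one.
  getOrCompute(key: string, compute: () => V): V {
    const hit = this.get(key);
    if (hit !== undefined) return hit;
    const value = compute();
    this.set(key, value);
    return value;
  }
}

// Usage with a fake clock and a 60 s TTL (the documented cacheTtl default is 60000 ms).
let fakeTime = 0;
const cache = new TtlCache<string>(60_000, () => fakeTime);
let builds = 0;
const build = (): string => `llms.txt build #${++builds}`;

const a = cache.getOrCompute("llms.txt", build); // miss: computes build #1
const b = cache.getOrCompute("llms.txt", build); // hit: still build #1
fakeTime = 60_000; // the TTL has elapsed
const c = cache.getOrCompute("llms.txt", build); // expired: computes build #2
console.log(a, b, c);
```

Expiry is checked lazily on read, so no timers are needed. The injected clock exists only to make the example deterministic.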
Installation
npm install @silupanda/llms-txt
Requires Node.js >= 18.
Quick Start
Generate from a local directory
import { generate } from '@silupanda/llms-txt';
const output = await generate({
site: {
name: 'My Docs',
url: 'https://docs.example.com',
description: 'Developer documentation for the Example platform.',
},
discover: { dir: './public' },
sections: [
{ name: 'Guides', pathPrefix: '/guides' },
{ name: 'API Reference', pathPrefix: '/api' },
],
});
console.log(output.llmsTxt); // llms.txt content
console.log(output.llmsFullTxt); // llms-full.txt content
console.log(output.pages); // PageInfo[] with extracted metadata
Generate from a live website
const output = await generate({
site: {
name: 'Express.js',
url: 'https://expressjs.com',
description: 'Fast, unopinionated, minimalist web framework for Node.js.',
},
discover: {
crawlUrl: 'https://expressjs.com', // follows internal links
maxDepth: 2,
},
sections: [
{ name: 'Getting Started', pathPrefix: '/en/starter' },
{ name: 'Guide', pathPrefix: '/en/guide' },
{ name: 'API', pathPrefix: '/en/api' },
],
});
Generate from a sitemap
const output = await generate({
site: {
name: 'Vite',
url: 'https://vite.dev',
description: 'Next generation frontend tooling.',
},
discover: { sitemapUrl: 'https://vite.dev/sitemap.xml' },
sections: [
{ name: 'Guide', pathPrefix: '/guide' },
{ name: 'Config', pathPrefix: '/config' },
],
});
Manual pages (no discovery)
const output = await generate({
site: {
name: 'My Site',
url: 'https://example.com',
description: 'A brief description of the site.',
},
pages: [
{ path: '/', title: 'Home', description: 'Welcome to our site' },
{ path: '/about', title: 'About Us', description: 'Company overview' },
{ path: '/pricing', title: 'Pricing', description: 'Plans and pricing' },
],
sections: [
{ name: 'Product', pages: ['/pricing'] },
],
});
Output Preview
llms.txt
# My Docs
> Developer documentation for the Example platform.
## Guides
- [Getting Started](https://docs.example.com/guides/getting-started): Step-by-step setup guide
- [Authentication](https://docs.example.com/guides/auth): How to authenticate API requests
## API Reference
- [REST API](https://docs.example.com/api/rest): Full REST endpoint reference
- [Webhooks](https://docs.example.com/api/webhooks): Receive real-time event notifications
llms-full.txt
Same structure with full page content included under each entry:
# My Docs
> Developer documentation for the Example platform.
## Guides
### Getting Started
URL: https://docs.example.com/guides/getting-started
Step-by-step setup guide for the Example platform. Install the SDK,
configure your API key, and make your first request in under 5 minutes...
---
### Authentication
URL: https://docs.example.com/guides/auth
How to authenticate API requests using API keys or OAuth tokens...
llms-small.txt
Ultra-compact format with just titles and URLs — ideal for token-constrained scenarios:
# My Docs
## Guides
- [Getting Started](https://docs.example.com/guides/getting-started)
- [Authentication](https://docs.example.com/guides/auth)
## API Reference
- [REST API](https://docs.example.com/api/rest)
- [Webhooks](https://docs.example.com/api/webhooks)
Framework Integration
Express
import express from 'express';
import { llmsTxt } from '@silupanda/llms-txt/express';
const app = express();
app.use(llmsTxt({
site: { name: 'My App', url: 'https://example.com', description: 'My application' },
discover: { dir: './public' },
}));
// Serves /llms.txt, /llms-full.txt, /llms-small.txt, and per-page .md endpoints
app.listen(3000);
Next.js (App Router)
// app/llms.txt/route.ts
import { serveLlmsTxt } from '@silupanda/llms-txt/next';
export const GET = serveLlmsTxt({
site: { name: 'My App', url: 'https://example.com', description: 'My application' },
pages: [
{ path: '/docs', title: 'Documentation', description: 'API docs' },
],
});
Or use the { GET } export pattern:
// app/llms.txt/route.ts
import { createLlmsTxtHandler } from '@silupanda/llms-txt/next';
export const { GET } = createLlmsTxtHandler({
site: { name: 'My App', url: 'https://example.com', description: 'My application' },
discover: { dir: './public' },
});
Fastify
import Fastify from 'fastify';
import { llmsTxtPlugin } from '@silupanda/llms-txt/fastify';
const app = Fastify();
app.register(llmsTxtPlugin({
site: { name: 'My App', url: 'https://example.com', description: 'My application' },
discover: { dir: './public' },
}));
app.listen({ port: 3000 });
Hono
import { Hono } from 'hono';
import { llmsTxt } from '@silupanda/llms-txt/hono';
const app = new Hono();
app.use('*', llmsTxt({
site: { name: 'My App', url: 'https://example.com', description: 'My application' },
discover: { dir: './public' },
}));
export default app;
CLI
# From a local directory
npx llms-txt generate \
--name "My Docs" \
--description "Developer documentation" \
--dir ./public \
--output ./dist
# From a sitemap
npx llms-txt generate \
--name "My Site" \
--description "Main website" \
--sitemap https://example.com/sitemap.xml \
--output ./dist
# Crawl a live site
npx llms-txt generate \
--name "My Site" \
--description "Main website" \
--url https://example.com \
--output ./dist
# Watch mode (auto-regenerate on file changes)
npx llms-txt generate --name "My Site" --description "..." --dir ./public --watch
# From a config file
npx llms-txt generate --config ./llms-txt.config.ts
# CI check (exit 1 if output would change)
npx llms-txt generate --config ./llms-txt.config.ts --check
# Validate an existing llms.txt file
npx llms-txt validate ./dist/llms.txt
CLI Options
| Option | Description |
|--------|-------------|
| -u, --url <url> | Site URL (used as base URL and for crawling) |
| -d, --dir <dir> | Directory to scan for HTML files |
| -s, --sitemap <url> | Sitemap URL to parse |
| -n, --name <name> | Site name (required unless using --config) |
| --description <text> | Site description (required unless using --config) |
| -o, --output <dir> | Output directory (default: .) |
| -c, --config <file> | Config file path (JS/TS with default export) |
| --no-full | Skip generating llms-full.txt |
| --no-small | Skip generating llms-small.txt |
| --check | Exit non-zero if output files would change (for CI) |
| --dry-run | Preview output without writing files |
| -w, --watch | Watch source directory and regenerate on changes |
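
--check is documented only as exiting non-zero when regenerating would change the output files. The following is a sketch of that comparison. It assumes the CLI compares generated text to existing files exactly, treats a missing file as a change, and leaves out files disabled by --no-full or --no-small. `outputFiles` and `changedFiles` are hypothetical names. File I/O sits behind a callback so the logic stands alone; in Node, that callback could wrap `readFileSync`.

```typescript
// Hypothetical sketch of the `--check` comparison; not the CLI's actual source.
interface OutputFlags {
  full: boolean; // false when --no-full is passed
  small: boolean; // false when --no-small is passed
}

// Names of the files the CLI writes into the --output directory.
function outputFiles(flags: OutputFlags): string[] {
  const files = ["llms.txt"];
  if (flags.full) files.push("llms-full.txt");
  if (flags.small) files.push("llms-small.txt");
  return files;
}

// Lists files whose existing content differs from the freshly generated content.
// `readExisting` returns undefined for a missing file, which counts as a change (assumption).
function changedFiles(
  generated: Record<string, string>,
  flags: OutputFlags,
  readExisting: (name: string) => string | undefined,
): string[] {
  return outputFiles(flags).filter((name) => readExisting(name) !== generated[name]);
}

// Usage with an in-memory stand-in for the output directory.
const onDisk: Record<string, string | undefined> = { "llms.txt": "# My Docs\n" };
const changed = changedFiles(
  { "llms.txt": "# My Docs\n", "llms-full.txt": "# My Docs\n(full content)\n" },
  { full: true, small: false },
  (name) => onDisk[name],
);
const exitCode = changed.length > 0 ? 1 : 0; // non-zero tells CI the files are stale
console.log(changed, exitCode);
```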
Configuration
LlmsTxtConfig
interface LlmsTxtConfig {
site: {
name: string; // Site name — rendered as the # heading
url: string; // Base URL for resolving page paths
description: string; // Site description — rendered as > blockquote
};
pages?: PageConfig[]; // Manually specified pages
discover?: DiscoverOptions; // Auto-discovery settings
sections?: SectionConfig[]; // Page grouping rules
options?: GenerateOptions; // Generation settings
}
Discovery Options
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| dir | string | — | Directory to scan for HTML files. |
| sitemapUrl | string | — | Sitemap URL to fetch and parse for page URLs. |
| crawlUrl | string | — | URL to crawl by following internal links. |
| maxDepth | number | 3 | Maximum crawl depth (link hops from start URL). |
| include | string[] | ['**/*.html'] | Glob patterns to include when scanning a directory. |
| exclude | string[] | [] | Glob patterns to exclude. |
All three discovery methods can be combined — results are deduplicated by normalized path.
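Deduplication requires a canonical form for paths that arrive in different shapes: file paths from a directory scan, absolute URLs from a sitemap, and crawled links with query strings or fragments. The README does not specify the normalization rules. The rules below are assumptions: drop the query and fragment, map index.html to its directory, drop .html, and trim trailing slashes. `normalizePath` and `dedupe` are hypothetical names.

```typescript
// Hypothetical normalization rules chosen for illustration; the package's rules may differ.
function normalizePath(input: string, baseUrl: string): string {
  const url = new URL(input, baseUrl); // accepts absolute URLs and bare paths alike
  let path = url.pathname; // pathname excludes the query string and #fragment
  path = path.replace(/\/index\.html?$/, "/"); // "/docs/index.html" -> "/docs/"
  path = path.replace(/\.html?$/, ""); // "/about.html" -> "/about"
  if (path.length > 1) path = path.replace(/\/+$/, ""); // "/docs/" -> "/docs", but keep "/"
  return path;
}

// Merges results from several discovery methods, keeping the first
// occurrence of each normalized path in source order.
function dedupe(sources: string[][], baseUrl: string): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const source of sources) {
    for (const entry of source) {
      const path = normalizePath(entry, baseUrl);
      if (!seen.has(path)) {
        seen.add(path);
        result.push(path);
      }
    }
  }
  return result;
}

const paths = dedupe(
  [
    ["/index.html", "/guides/auth.html"], // directory scan
    ["https://docs.example.com/", "https://docs.example.com/guides/auth/"], // sitemap
    ["https://docs.example.com/guides/auth?ref=nav#top", "https://docs.example.com/api/rest"], // crawl
  ],
  "https://docs.example.com",
);
console.log(paths); // ["/", "/guides/auth", "/api/rest"]
```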
Sections
Pages can be organized into named sections using path prefixes or explicit page lists:
import { defineConfig } from '@silupanda/llms-txt';
const config = defineConfig({
// ...
sections: [
{ name: 'Documentation', pathPrefix: '/docs', description: 'Technical guides' },
{ name: 'Blog', pathPrefix: '/blog' },
{ name: 'Legal', pages: ['/terms', '/privacy'] },
],
});
First match wins: a page is assigned to the first section whose pathPrefix matches or whose pages list includes it. Unmatched pages are placed in an "Other" section automatically.
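The following sketch shows first-match-wins assignment and the index layout from Output Preview. The `Page` and `Section` shapes, `assignSections`, `matchesPrefix`, and `renderIndex` are hypothetical names. Two behaviors are assumptions, not documented facts: a prefix such as /docs matches /docs and /docs/intro but not /docsearch, and sections that end up empty are omitted. Section descriptions are ignored, and blank-line spacing is simplified.

```typescript
// Hypothetical shapes for illustration; not the package's exported types.
interface Page {
  path: string;
  title: string;
  description: string;
}

interface Section {
  name: string;
  pathPrefix?: string;
  pages?: string[];
}

// Assumption: a prefix matches the exact path or any path below it.
function matchesPrefix(path: string, prefix: string): boolean {
  return path === prefix || path.startsWith(prefix.endsWith("/") ? prefix : prefix + "/");
}

// First match wins: each page goes to the first section whose pathPrefix matches
// or whose explicit pages list contains it; everything else goes to "Other".
function assignSections(pages: Page[], sections: Section[]): Map<string, Page[]> {
  const groups = new Map<string, Page[]>();
  for (const s of sections) groups.set(s.name, []);
  const other: Page[] = [];
  for (const page of pages) {
    const hit = sections.find(
      (s) =>
        (s.pathPrefix !== undefined && matchesPrefix(page.path, s.pathPrefix)) ||
        (s.pages?.includes(page.path) ?? false),
    );
    if (hit) groups.get(hit.name)!.push(page);
    else other.push(page);
  }
  if (other.length > 0) groups.set("Other", other);
  return groups;
}

// Renders "# name", "> description", then "## section" with "- [title](url): description".
function renderIndex(
  site: { name: string; url: string; description: string },
  groups: Map<string, Page[]>,
): string {
  const lines = [`# ${site.name}`, `> ${site.description}`];
  for (const [name, pages] of groups) {
    if (pages.length === 0) continue; // assumption: empty sections are omitted
    lines.push(`## ${name}`);
    for (const p of pages) {
      const url = new URL(p.path, site.url).toString(); // resolve path against the base URL
      lines.push(`- [${p.title}](${url})${p.description ? `: ${p.description}` : ""}`);
    }
  }
  return lines.join("\n") + "\n";
}

const docs: Page[] = [
  { path: "/guides/auth", title: "Authentication", description: "How to authenticate API requests" },
  { path: "/api/rest", title: "REST API", description: "Full REST endpoint reference" },
  { path: "/changelog", title: "Changelog", description: "" },
];
const groups = assignSections(docs, [
  { name: "Guides", pathPrefix: "/guides" },
  { name: "API Reference", pathPrefix: "/api" },
]);
const text = renderIndex(
  { name: "My Docs", url: "https://docs.example.com", description: "Developer documentation for the Example platform." },
  groups,
);
console.log(text);
```

Here /changelog matches neither prefix, so it lands in the trailing "Other" section.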
Generation Options
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| full | boolean | true | Generate llms-full.txt alongside llms.txt. |
| small | boolean | true | Generate llms-small.txt (titles + URLs only). |
| preamble | string | — | Custom markdown inserted after the site description. |
| additionalSections | Record<string, string> | — | Extra sections appended to the end of the output. |
| cacheTtl | number | 60000 | Cache TTL in milliseconds. Set to 0 to disable caching. |
| filterPages | (page: PageInfo) => boolean | — | Custom filter — return false to exclude a page from output. |
| maxContentLength | number | — | Truncate page content to this many characters, marking the cut with an ellipsis (...). |
| noiseSelectors | string[] | — | Extra CSS selectors for noise elements to strip (e.g., '.cookie-banner'). |
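
`filterPages` and `maxContentLength` both act on extracted pages before output is rendered. The sketch below rests on these assumptions: filtering runs before truncation, truncation keeps the first `maxContentLength` characters and appends "...", and `PageLike` is a simplified stand-in for `PageInfo`. `approxTokens` applies the ~4 characters/token heuristic cited under Return Values; rounding up is an assumption. All function names are hypothetical.

```typescript
// Simplified stand-in for PageInfo (the real type has more fields).
interface PageLike {
  path: string;
  title: string;
  content: string;
}

interface ContentOptions {
  filterPages?: (page: PageLike) => boolean;
  maxContentLength?: number;
}

// Assumption: keep the first `max` characters and append "..." when cut.
function truncate(text: string, max: number): string {
  return text.length > max ? text.slice(0, max) + "..." : text;
}

function applyContentOptions(pages: PageLike[], opts: ContentOptions): PageLike[] {
  const max = opts.maxContentLength;
  return pages
    .filter((page) => opts.filterPages?.(page) ?? true) // returning false excludes the page
    .map((page) => (max === undefined ? page : { ...page, content: truncate(page.content, max) }));
}

// Rough token estimate: ~4 characters per token, rounded up (assumption).
function approxTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

const pages: PageLike[] = [
  { path: "/guides/auth", title: "Authentication", content: "How to authenticate API requests using API keys or OAuth tokens." },
  { path: "/internal/runbook", title: "Runbook", content: "On-call notes." },
];
const kept = applyContentOptions(pages, {
  filterPages: (page) => !page.path.includes("/internal/"), // same filter as the config file example
  maxContentLength: 20,
});
console.log(kept.map((p) => p.content), approxTokens(kept[0]?.content ?? ""));
```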
Config File
Create a llms-txt.config.ts (or .js) with a default export:
// llms-txt.config.ts
import { defineConfig } from '@silupanda/llms-txt';
export default defineConfig({
site: {
name: 'My Docs',
url: 'https://docs.example.com',
description: 'Developer documentation for the Example platform.',
},
discover: {
dir: './public',
include: ['**/*.html'],
exclude: ['**/drafts/**'],
},
sections: [
{ name: 'Guides', pathPrefix: '/guides' },
{ name: 'API Reference', pathPrefix: '/api' },
],
options: {
preamble: 'This documentation covers the Example REST API and client SDKs.',
cacheTtl: 300_000,
filterPages: (page) => !page.path.includes('/internal/'),
},
});
Return Values
generate() → LlmsTxtOutput
{
llmsTxt: string; // The generated llms.txt content
llmsFullTxt?: string; // The generated llms-full.txt content (when full: true)
llmsSmallTxt?: string; // The generated llms-small.txt content (when small: true)
pages: PageInfo[]; // All resolved pages with extracted metadata
generatedAt: Date; // Timestamp of generation
tokenCounts?: { // Approximate token counts (~4 chars/token)
llmsTxt: number;
llmsFullTxt?: number;
llmsSmallTxt?: number;
};
}
PageInfo (extracted from HTML)
{
path: string; // URL path (e.g., "/docs/getting-started")
title: string; // From <title> or <h1>
description: string; // From <meta name="description">
content: string; // Main text content (from <main>, <article>, or <body>)
headings: string[]; // All h2/h3 heading text
structuredData?: Record<string, unknown>[]; // JSON-LD data
og?: { // OpenGraph metadata
title?: string;
description?: string;
type?: string;
image?: string;
};
}
Core API
For advanced use cases, all internal modules are exported:
import {
// High-level
generate, // Full pipeline: discover → extract → generate
defineConfig, // Type-safe config helper
// Discovery
discoverPages, // Run all discovery methods and deduplicate
discoverFromDirectory, // Scan local HTML files
discoverFromSitemap, // Fetch and parse a sitemap
crawlUrl, // Crawl a URL following internal links
// Extraction
extractPageInfo, // Parse HTML → PageInfo with cheerio
// Generation
generateLlmsTxt, // PageInfo[] → llms.txt string
generateLlmsFullTxt, // PageInfo[] → llms-full.txt string
generateLlmsSmallTxt, // PageInfo[] → llms-small.txt string
estimateTokens, // Rough token count (~4 chars/token)
// Validation
validateLlmsTxt, // Validate llms.txt against the spec
// Utilities
Cache, // TTL-based in-memory cache
watch, // File system watcher for auto-regeneration
} from '@silupanda/llms-txt';
License
MIT
