@alexcarol/astro-llms-txt
v0.1.1
Astro integration that generates llms.txt and llms-full.txt at build time
astro-llms-txt
Astro integration that generates llms.txt and llms-full.txt at build
time by reading your rendered HTML output.
- llms.txt - a lightweight index of all pages with links and descriptions
- llms-full.txt - the full text content of every page
Install
npx astro add @alexcarol/astro-llms-txt
Note: astro add will auto-generate the import name alexcarolllmsTxt from the scoped package name. You may want to rename it to llmsTxt for readability.
Or install manually:
npm install @alexcarol/astro-llms-txt
Quick start
If you used astro add, your config is already set up. Otherwise, add the integration manually:
// astro.config.mjs
import { defineConfig } from 'astro/config';
import llmsTxt from '@alexcarol/astro-llms-txt';
export default defineConfig({
site: 'https://example.com', // required
integrations: [llmsTxt()],
});
That's it. With no options the integration will:
- Use the homepage <h1> as the # heading (falls back to hostname)
- Use <h1> text as page titles (falls back to <title> if no <h1> is present)
- Exclude /404 pages
- Extract content from <main>, descriptions from <meta name="description">
Options
All options are optional.
| Option | Type | Default | Description |
|---|---|---|---|
| name | string | homepage <h1>, then hostname | Site or brand name used as the # heading in generated files. |
| titleSource | 'h1' \| 'title' | 'h1' | Which HTML element to read page titles from. 'h1' uses the first <h1> (falls back to <title>). 'title' uses the raw <title> tag text. |
| excludedPaths | string[] | ['404'] | Path segments to exclude. Matched without leading/trailing slashes (e.g. 'admin' excludes /admin/). |
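As a rough illustration of how excludedPaths entries line up with page paths, the sketch below trims leading and trailing slashes from both sides before comparing. The helper names trimSlashes and isExcluded are hypothetical, and the sketch assumes an exact match on the trimmed path; the integration's actual matching (for example, whether nested paths such as /admin/users/ are also excluded) may differ.

```javascript
// Hypothetical helpers for illustration; not part of the package's API.
// Strip leading and trailing slashes: '/admin/' -> 'admin', '/' -> ''.
function trimSlashes(path) {
  return path.replace(/^\/+|\/+$/g, '');
}

// Assumption: a page is excluded when its trimmed path equals a trimmed entry.
function isExcluded(pathname, excludedPaths = ['404']) {
  const normalized = trimSlashes(pathname);
  return excludedPaths.some((entry) => trimSlashes(entry) === normalized);
}

console.log(isExcluded('/404/'));               // true
console.log(isExcluded('/admin/', ['admin']));  // true
console.log(isExcluded('/about/', ['admin']));  // false
```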
Examples
// Zero-config
llmsTxt()
// Explicit name (recommended for personal/brand sites)
llmsTxt({ name: 'Alex Carol' })
// Use raw <title> tags instead of <h1>
llmsTxt({ name: 'My Site', titleSource: 'title' })
// Custom exclusions
llmsTxt({ name: 'My Site', excludedPaths: ['404', 'admin', 'drafts'] })
Title source
The titleSource option controls where page titles come from:
- 'h1' (default) - reads the first <h1> on the page. This avoids the common "Page Title - Site Name" suffix pattern in <title> tags, producing cleaner output. Falls back to <title> if no <h1> is found.
- 'title' - reads the raw <title> tag, including any site name suffix.
Most static sites use an <h1> that matches the clean page name, so the default works
without configuration.
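The fallback behaviour described above can be sketched roughly as follows. This is an illustration only, not the package's implementation: extractTitle and stripTags are hypothetical names, and regular expressions stand in for whatever HTML parsing the integration actually uses.

```javascript
// Hypothetical helpers for illustration; not the package's implementation.
// Regexes keep the sketch dependency-free; a real HTML parser is more robust.
function stripTags(fragment) {
  return fragment.replace(/<[^>]*>/g, '').replace(/\s+/g, ' ').trim();
}

function extractTitle(html, titleSource = 'h1') {
  const title = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i);
  const h1 = html.match(/<h1[^>]*>([\s\S]*?)<\/h1>/i);
  const titleText = title ? stripTags(title[1]) : '';
  // 'title' mode returns the raw <title> text, suffix included.
  if (titleSource === 'title') return titleText;
  // 'h1' mode prefers the first <h1> and falls back to <title>.
  const h1Text = h1 ? stripTags(h1[1]) : '';
  return h1Text || titleText;
}

const page = '<html><head><title>About - My Site</title></head>' +
  '<body><main><h1>About</h1></main></body></html>';

console.log(extractTitle(page));           // "About"
console.log(extractTitle(page, 'title'));  // "About - My Site"
```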
How it works
The integration hooks into two Astro lifecycle events:
- astro:config:done - validates that site is set in your Astro config (throws if missing).
- astro:build:done - reads each built index.html in parallel, extracts page data, infers the site name from the homepage <h1> if not provided, then writes both output files.
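For readers unfamiliar with Astro integrations, the two hooks above fit into a skeleton roughly like the one below. The hook names and the shape of the returned object come from Astro's integration API; the function name llmsTxtSketch, the integration name, and the error message are hypothetical, and the file reading and writing steps are left out.

```javascript
// Hypothetical skeleton for illustration; only the hook names are Astro's.
function llmsTxtSketch() {
  return {
    name: 'llms-txt-sketch',
    hooks: {
      // Runs once the final Astro config has been resolved.
      'astro:config:done': ({ config }) => {
        if (!config.site) {
          throw new Error('llms-txt-sketch: set `site` in your Astro config.');
        }
      },
      // Runs after the build; `dir` is a URL pointing at the output directory.
      'astro:build:done': async ({ dir }) => {
        // The real integration reads each built index.html under `dir`,
        // extracts page data, and writes llms.txt and llms-full.txt.
        // Those steps are omitted from this sketch.
      },
    },
  };
}
```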
Pages are sorted with the homepage first, then alphabetically by path.
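A minimal sketch of that ordering, assuming pages are identified by their URL path and the homepage is '/'. The sortPages name is hypothetical, and the sketch uses plain string comparison; the integration may compare paths differently (for example, locale-aware).

```javascript
// Hypothetical helper: homepage ('/') first, then the rest alphabetically by path.
function sortPages(paths) {
  return [...paths].sort((a, b) => {
    if (a === '/') return b === '/' ? 0 : -1;
    if (b === '/') return 1;
    return a < b ? -1 : a > b ? 1 : 0;
  });
}

console.log(sortPages(['/blog/', '/', '/about/']));
// [ '/', '/about/', '/blog/' ]
```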
Output format
Follows the llms.txt specification:
llms.txt
# Site Name
> Homepage description from meta tag
## Pages
- [Home](https://example.com/): Homepage description
- [About](https://example.com/about/): About page description
llms-full.txt
# Site Name
> Homepage description from meta tag
## [Home](https://example.com/)
Full text content of the homepage...
## [About](https://example.com/about/)
Full text content of the about page...
Requirements
- site must be set in your Astro config (the integration throws a clear error if missing)
- Pages should use <main> for primary content
- Pages should have <meta name="description"> for summaries
- For the default titleSource: 'h1', pages should have an <h1> element
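If you want to check a built page against these requirements before relying on the generated files, a small pre-flight helper along these lines may be useful. It is a hypothetical sketch, not something the package provides: missingRequirements is an invented name, and the regular expressions are a rough stand-in for real HTML parsing.

```javascript
// Hypothetical pre-flight check; not part of the package.
// Regexes keep the sketch dependency-free; they are not a full HTML parser.
function missingRequirements(html) {
  const missing = [];
  if (!/<main[\s>]/i.test(html)) missing.push('<main>');
  if (!/<meta\s[^>]*name=["']description["']/i.test(html)) {
    missing.push('<meta name="description">');
  }
  if (!/<h1[\s>]/i.test(html)) missing.push('<h1>');
  return missing;
}

const sample = '<html><head><title>Draft</title></head>' +
  '<body><main><p>No heading yet.</p></main></body></html>';

console.log(missingRequirements(sample));
// [ '<meta name="description">', '<h1>' ]
```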
License
MIT
