# @casoon/astro-site-files

v0.1.6

Astro integration that generates robots.txt, llms.txt, sitemap.xml, security.txt, and humans.txt at build time from typed configuration.
## What it does

- Generates `robots.txt` — crawl rules with per-agent overrides and automatic sitemap reference
- Generates `llms.txt` — AI model discovery file following the llmstxt.org specification
- Generates `sitemap.xml` — built-in, enabled by default, with i18n hreflang and sitemap-index support
- Generates `/.well-known/security.txt` — vulnerability disclosure contact per RFC 9116
- Generates `humans.txt` — team and technology credits per humanstxt.org
All files are written to the build output directory when `astro build` runs.
**Successor package.** This integration replaces `@casoon/astro-crawler-policy` (robots.txt + llms.txt) and `@casoon/astro-sitemap` (sitemap.xml). Neither predecessor package is actively maintained.
## Installation

```sh
npm install @casoon/astro-site-files
```

## Quick start
```ts
// astro.config.ts
import { defineConfig } from 'astro/config'
import siteFiles from '@casoon/astro-site-files'

export default defineConfig({
  site: 'https://example.com',
  integrations: [
    siteFiles({
      robots: { disallow: ['/admin'] },
      llms: { title: 'Example', description: 'An example website.' },
      security: { contact: 'mailto:[email protected]' },
      humans: {
        team: [{ name: 'Alice', role: 'Development' }],
        technology: ['Astro', 'TypeScript']
      }
    })
  ]
})
```

`robots.txt` and `sitemap.xml` are enabled by default. The other three files are generated only when their option is configured.
## robots.txt
```ts
siteFiles({
  robots: {
    disallow: ['/admin', '/private/'],
    allow: ['/admin/public/'],
    crawlDelay: 2,
    sitemap: true, // auto-derive from astro.config site URL (default)
    agents: [
      {
        userAgent: 'Googlebot',
        crawlDelay: 1
      }
    ]
  }
})
```

Option reference:
| Option | Type | Default | Description |
|---|---|---|---|
| disallow | string[] | [] | Paths to disallow for User-agent: * |
| allow | string[] | [] | Paths to explicitly allow for User-agent: * |
| crawlDelay | number | — | Crawl-delay for User-agent: * |
| sitemap | boolean \| string | true | true = derive URL from astro.config.site, string = explicit URL, false = omit |
| agents | AgentRule[] | [] | Additional per-agent rule blocks |
Each entry in agents:
| Field | Type | Description |
|---|---|---|
| userAgent | string \| string[] | User-agent value(s) |
| allow | string[] | Paths to allow |
| disallow | string[] | Paths to disallow |
| crawlDelay | number | Crawl-delay for this agent |
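As an illustration of how an `agents` entry serializes, here is a minimal sketch. The `renderAgentBlock` helper is hypothetical, not part of the package's API; it assumes a `string[]` value for `userAgent` yields one `User-agent:` line per agent, which is the standard robots.txt convention for applying one rule block to several agents.

```typescript
// Hypothetical sketch (not the package's internals) of how an AgentRule
// maps to a robots.txt rule block.
interface AgentRule {
  userAgent: string | string[]
  allow?: string[]
  disallow?: string[]
  crawlDelay?: number
}

function renderAgentBlock(rule: AgentRule): string {
  const agents = Array.isArray(rule.userAgent) ? rule.userAgent : [rule.userAgent]
  const lines = agents.map(a => `User-agent: ${a}`)
  for (const path of rule.disallow ?? []) lines.push(`Disallow: ${path}`)
  for (const path of rule.allow ?? []) lines.push(`Allow: ${path}`)
  if (rule.crawlDelay !== undefined) lines.push(`Crawl-delay: ${rule.crawlDelay}`)
  return lines.join('\n')
}

// A string[] userAgent produces one User-agent line per agent:
renderAgentBlock({ userAgent: ['GPTBot', 'ClaudeBot'], disallow: ['/private/'] })
```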
Disable: `robots: false`
Generated output:

```txt
User-agent: *
Disallow: /admin
Disallow: /private/
Allow: /admin/public/
Crawl-delay: 2

User-agent: Googlebot
Crawl-delay: 1

Sitemap: https://example.com/sitemap.xml
```

## llms.txt
Follows the llmstxt.org specification. Provides structured metadata for AI models discovering what your site is about.
```ts
siteFiles({
  llms: {
    title: 'Example',
    description: 'An example website focused on TypeScript tooling.',
    details: 'This site documents internal tools and workflows.',
    sections: [
      {
        title: 'Documentation',
        links: [
          { title: 'Getting started', url: '/docs/start', description: 'Setup guide' },
          { title: 'API reference', url: '/docs/api' }
        ]
      }
    ]
  }
})
```

Option reference:
| Option | Type | Description |
|---|---|---|
| title | string | Required. Site or project name |
| description | string | Short description rendered as a blockquote |
| details | string | Additional plain-text context |
| sections | Section[] | Named sections with link lists |
Each entry in sections:
| Field | Type | Description |
|---|---|---|
| title | string | Section heading |
| links | Link[] | Optional list of links |
Each entry in links:
| Field | Type | Description |
|---|---|---|
| title | string | Link label |
| url | string | Absolute or relative URL |
| description | string | Optional inline description after the link |
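As a sketch of how a `Link` entry becomes a line of llms.txt, the format follows the generated output in this section. The `renderLink` helper is hypothetical and not exported by the package:

```typescript
// Hypothetical helper showing how a Link entry maps to an llms.txt bullet.
interface Link { title: string; url: string; description?: string }

function renderLink(link: Link): string {
  const base = `- [${link.title}](${link.url})`
  // The optional description is appended after a colon.
  return link.description ? `${base}: ${link.description}` : base
}

renderLink({ title: 'Getting started', url: '/docs/start', description: 'Setup guide' })
// → - [Getting started](/docs/start): Setup guide
```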
Disable: Omit the option or set `llms: false`
Generated output:

```txt
# Example

> An example website focused on TypeScript tooling.

This site documents internal tools and workflows.

## Documentation

- [Getting started](/docs/start): Setup guide
- [API reference](/docs/api)
```

## sitemap.xml
Sitemap generation is built-in and enabled by default. Static pages are discovered automatically from Astro's build output. Dynamic URLs can be added via `sources`.
```ts
siteFiles({
  sitemap: {
    exclude: ['/landing/'],
    priority: [{ pattern: '/blog/', priority: 0.9 }],
    sources: [
      async () => {
        // getCollection comes from 'astro:content'
        const posts = await getCollection('blog')
        return posts.map(p => ({ loc: `/blog/${p.id}/`, lastmod: p.data.date }))
      }
    ]
  }
})
```

Option reference:
| Option | Type | Description |
|---|---|---|
| siteUrl | string | Override the site URL (auto-detected from astro.config.site) |
| sources | SitemapSource[] | Async functions returning additional SitemapEntry[] |
| exclude | (string \| RegExp)[] | URL paths or patterns to exclude |
| filter | (url: string) => boolean | Custom filter on the full absolute URL |
| priority | PriorityRule[] | Pattern-based priority overrides (first match wins) |
| changefreq | ChangefreqRule[] | Pattern-based changefreq overrides (first match wins) |
| serialize | (entry) => entry \| undefined | Per-item transform or filter hook |
| i18n | { defaultLocale, locales } | Generates <xhtml:link rel="alternate"> hreflang entries |
| output.mode | 'single' \| 'index' | index splits into numbered chunks (auto when > maxUrls) |
| output.maxUrls | number | Max URLs per file in index mode — default 50 000 |
| output.filename | string | Output filename — default sitemap.xml |
| audit.warnOnEmpty | boolean | Warn when sitemap has zero entries — default true |
| audit.errorOnDuplicates | boolean | Emit error instead of warning for duplicate URLs — default false |
Built-in exclusions (always applied): `/404`, `/500`, `/_*`, `/api/`, `/landing/`, `/drafts/`, `sitemap.xml`, `robots.txt`, `llms.txt`, `rss.xml`.

Built-in priority defaults: `/` → 1.0, depth 1 → 0.9, depth 2 → 0.8, depth 3+ → 0.7.

Built-in changefreq defaults: `/` and content paths (`/blog/`, `/artikel/`, etc.) → weekly; everything else → monthly.
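Pattern-based `priority` and `changefreq` rules resolve in order, first match wins. A plain-TypeScript sketch of that resolution logic (illustrative only; `resolvePriority` is not the package's implementation):

```typescript
// Illustrative "first match wins" rule resolution, as used conceptually
// by the priority and changefreq options. Not the package's internals.
interface PriorityRule { pattern: string | RegExp; priority: number }

function resolvePriority(path: string, rules: PriorityRule[], fallback: number): number {
  for (const rule of rules) {
    const hit = typeof rule.pattern === 'string'
      ? path.includes(rule.pattern)
      : rule.pattern.test(path)
    if (hit) return rule.priority // earlier rules shadow later ones
  }
  return fallback
}

const rules: PriorityRule[] = [
  { pattern: '/blog/featured/', priority: 1.0 }, // more specific rule first
  { pattern: '/blog/', priority: 0.9 },
]
resolvePriority('/blog/featured/post/', rules, 0.7) // → 1.0
resolvePriority('/about/', rules, 0.7)              // → 0.7 (fallback)
```

Because earlier rules win, more specific patterns should be listed before broader ones.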
Disable: `sitemap: false`
## security.txt
Generated at `/.well-known/security.txt` per RFC 9116. The `contact` field is required by the specification.
```ts
siteFiles({
  security: {
    contact: 'mailto:[email protected]',
    policy: 'https://example.com/security-policy',
    acknowledgments: 'https://example.com/hall-of-fame',
    preferredLanguages: ['en', 'de'],
    expires: '2027-01-01T00:00:00.000Z',
    hiring: 'https://example.com/jobs'
  }
})
```

Option reference:
| Option | Type | Description |
|---|---|---|
| contact | string \| string[] | Required. mailto: or https: URI for reporting vulnerabilities |
| policy | string | URL of the security policy |
| acknowledgments | string | URL of the acknowledgments or hall-of-fame page |
| preferredLanguages | string[] | BCP 47 language tags, e.g. ['en', 'de'] |
| expires | string \| Date | ISO 8601 expiry date — when to renew the file |
| encryption | string | URL of the PGP public key |
| canonical | string | Canonical URL of this security.txt file |
| hiring | string | URL of a security-focused jobs page |
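RFC 9116 requires `Expires` and recommends a value less than a year in the future, so the file gets renewed regularly. One way to keep it fresh is to compute the date at build time; a minimal sketch (plain `Date` math, not a package API):

```typescript
// Compute an ISO 8601 Expires value one year out at build time.
// Hypothetical helper; not part of @casoon/astro-site-files.
function expiresInOneYear(from: Date = new Date()): string {
  const d = new Date(from)
  d.setUTCFullYear(d.getUTCFullYear() + 1)
  return d.toISOString()
}

expiresInOneYear(new Date('2026-01-01T00:00:00.000Z'))
// → 2027-01-01T00:00:00.000Z
```

The result could then be passed as `expires: expiresInOneYear()` in the `security` options, so every build pushes the expiry forward.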
Disable: Omit the option or set `security: false`
Generated output:

```txt
Contact: mailto:[email protected]
Expires: 2027-01-01T00:00:00.000Z
Acknowledgments: https://example.com/hall-of-fame
Preferred-Languages: en, de
Policy: https://example.com/security-policy
Hiring: https://example.com/jobs
```

## humans.txt
Follows the humanstxt.org convention.
```ts
siteFiles({
  humans: {
    team: [
      { name: 'Alice', role: 'Development', location: 'Berlin' },
      { name: 'Bob', role: 'Design', twitter: '@bob' }
    ],
    thanks: ['Open Source Community', 'Our early users'],
    technology: ['Astro', 'TypeScript', 'Tailwind CSS'],
    note: 'Built with care.'
  }
})
```

Option reference:
| Option | Type | Description |
|---|---|---|
| team | TeamMember[] | List of team members |
| thanks | string[] | Acknowledgment entries |
| technology | string[] | Technologies used — rendered as a comma-separated list |
| note | string | Free-form note |
| lastUpdate | string \| Date | Defaults to the build date |
Each entry in team:
| Field | Type | Description |
|---|---|---|
| name | string | Required. Full name |
| role | string | Job title or role |
| twitter | string | Twitter / X handle |
| location | string | City or country |
| email | string | Contact email |
Disable: Omit the option or set `humans: false`
Generated output:

```txt
/* TEAM */
Name: Alice
Role: Development
Location: Berlin

/* SITE LAST UPDATED */
2026-05-06

/* TECHNOLOGY COLOPHON */
Astro, TypeScript, Tailwind CSS
```

## Build-time audit hints
The integration emits build-time hints when configuration looks incomplete or incorrect. Each hint has a rule ID, a level (`info` / `warn`), and a help message.
All rule IDs:
| Rule ID | Level | Triggered when |
|---|---|---|
| robots/legal-pages-blocked | warn | A legal page (/privacy, /terms, /impressum, …) is in disallow |
| llms/no-description | info | llms has no description |
| llms/no-sections | info | llms has no sections |
| llms/sections-without-links | info | Sections exist but none have links |
| security/no-expires | warn | security has no expires date (required by RFC 9116) |
| security/no-policy | info | security has no policy URL |
| humans/no-team | info | humans has no team entries |
| humans/no-technology | info | humans has no technology entries |
Disable all hints:

```ts
siteFiles({ audit: false })
```

Suppress specific rules:
```ts
siteFiles({
  audit: {
    disable: [
      'llms/no-description',
      'security/no-expires',
    ],
  },
})
```

`audit` option reference:
| Option | Type | Description |
|---|---|---|
| enabled | boolean | Set to false to silence all hints |
| disable | string[] | Rule IDs to suppress individually |
Passing `audit: false` is equivalent to `audit: { enabled: false }`.
## Option defaults
| Option | Default behavior |
|---|---|
| robots | Enabled — generates robots.txt with Disallow: (allow all) |
| llms | Disabled — requires { title } |
| sitemap | Enabled — built-in sitemap generation from Astro's build output |
| security | Disabled — requires { contact } |
| humans | Disabled — generates when any option is provided |
| audit | Enabled — emits build-time hints for all generated files |
## Programmatic usage

The renderer functions are exported for use outside the Astro integration:
```ts
import {
  renderRobotsTxt,
  renderLlmsTxt,
  renderSecurityTxt,
  renderHumansTxt,
  renderSitemapXml,
  renderSitemapIndex,
  resolveEntry,
  deduplicateEntries,
  auditSitemap,
  auditRobots,
  auditLlms,
  auditSecurity,
  auditHumans,
  filterIssues,
} from '@casoon/astro-site-files'
import type { AuditOptions, AuditIssue } from '@casoon/astro-site-files'

const robots = renderRobotsTxt({ disallow: ['/admin'] }, 'https://example.com')
const llms = renderLlmsTxt({ title: 'My Site', description: 'A site.' })
const security = renderSecurityTxt({ contact: 'mailto:[email protected]' })
const humans = renderHumansTxt({ team: [{ name: 'Alice' }], technology: ['Astro'] })

const entries = [{ loc: '/blog/post/' }].map(e => resolveEntry(e, {}, 'https://example.com'))
const xml = renderSitemapXml(deduplicateEntries(entries))
```

This package covers static file generation. Actual crawl enforcement depends on whether bots respect these files — many do not.
