@nuasite/llm-enhancements
v0.0.75
Expose pages as .md endpoints for Astro
An Astro integration that exposes pages as .md endpoints. During development, any page can be accessed as markdown by appending .md to its URL. In production builds, corresponding .md files are generated alongside your HTML output.
Features
- Dev Server Support: Access any page as markdown (e.g., /about.md)
- Build Output: Generates .md files during astro build
- Content Collections: Preserves original markdown from Astro content collections
- HTML to Markdown: Converts static pages to clean markdown
- Alternate Links: Injects <link rel="alternate" type="text/markdown"> into HTML
- Frontmatter: Includes metadata like title, description, and source path
- LLM Discovery: Auto-generated /.well-known/llm.md endpoint for LLM-friendly site discovery
- llms.txt: Auto-generated /llms.txt following the llms.txt convention for crawler/LLM guidance
Installation
bun add -D @nuasite/llm-enhancements
# or: npm install -D @nuasite/llm-enhancements
Usage
Add the integration to your astro.config.mjs:
import pageMarkdown from '@nuasite/llm-enhancements'
import { defineConfig } from 'astro/config'
export default defineConfig({
  integrations: [
    pageMarkdown({
      // Optional configuration
      contentDir: 'src/content',
      includeStaticPages: true,
      includeFrontmatter: true,
      llmEndpoint: true, // or configure with options
      llmsTxt: true, // or configure with options
    }),
  ],
})
How It Works
Development Mode
When running astro dev, any page can be accessed as markdown by appending .md to its URL:
/about → /about.md
/blog/hello → /blog/hello.md
/ → /index.md
The dev server intercepts these requests and generates markdown on the fly.
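The mapping above is simple enough to express as a small function. The following is a minimal sketch rather than the integration's actual code: `toMarkdownPath` is a hypothetical name, and the treatment of trailing slashes is an assumption.

```typescript
// Hypothetical helper illustrating the URL mapping shown above.
// Not part of the @nuasite/llm-enhancements API.
function toMarkdownPath(pagePath: string): string {
  // Assumption: a trailing slash is ignored, so "/about/" behaves like "/about".
  const trimmed = pagePath.replace(/\/+$/, '')
  // The root page has no name of its own, so it maps to /index.md.
  return trimmed === '' ? '/index.md' : `${trimmed}.md`
}

console.log(toMarkdownPath('/about')) // /about.md
console.log(toMarkdownPath('/blog/hello')) // /blog/hello.md
console.log(toMarkdownPath('/')) // /index.md
```

A client or crawler could use a helper like this to derive the markdown URL for a page it has already discovered.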
Production Build
During astro build, the integration processes each page and generates a corresponding .md file in the dist directory:
dist/
├── index.html
├── index.md
├── about/
│   └── index.html
├── about.md
└── blog/
    ├── hello/
    │   └── index.html
    └── hello.md
Content Collection Pages
For pages that come from Astro content collections, the integration reads the original markdown source and preserves it in the output:
---
title: My Blog Post
description: An example post
url: /blog/hello
type: collection
source: src/content/blog/hello.md
generatedAt: 2024-01-15T10:30:00.000Z
---
# My Blog Post
This is the original markdown content from the collection...
Static Pages
For static .astro pages, the integration converts the rendered HTML to markdown:
---
title: About Us
description: Learn more about our company
url: /about
type: static
generatedAt: 2024-01-15T10:30:00.000Z
---
# About Us
Our company was founded in 2020...
HTML Alternate Links
The integration automatically injects a <link> tag into HTML pages pointing to their markdown version:
<head>
  <link rel="alternate" type="text/markdown" href="/about.md">
</head>
LLM Discovery Endpoint
Note: The /.well-known/llm.md and /llms.txt endpoints require a site to be configured in your astro.config.mjs to generate valid absolute URLs. If no site is configured, these endpoints will be skipped with a warning.
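To see why a configured site matters, here is a minimal sketch of how a relative markdown path might be turned into an absolute URL. `absoluteMarkdownUrl` is a hypothetical helper, and returning `null` when no site is set is only an assumption that mirrors the "skipped" behavior described in the note.

```typescript
// Hypothetical helper: resolves a markdown path against the configured site.
// Returns null when no site is set, since no valid absolute URL can be built.
function absoluteMarkdownUrl(site: string | undefined, mdPath: string): string | null {
  if (!site) return null
  // The standard URL constructor resolves the path relative to the base URL.
  return new URL(mdPath, site).href
}

console.log(absoluteMarkdownUrl('https://example.com', '/about.md')) // https://example.com/about.md
console.log(absoluteMarkdownUrl(undefined, '/about.md')) // null
```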
The integration generates a /.well-known/llm.md endpoint that provides LLM-friendly site discovery information:
http://localhost:4321/.well-known/llm.md
This endpoint includes:
- Site title and description (extracted from homepage metadata)
- List of all available markdown endpoints
- Usage instructions for accessing markdown versions
Example output:
---
generatedAt: 2024-01-15T10:30:00.000Z
---
# My Site
Welcome to my site.
## Markdown Endpoints
This site exposes page content as markdown at `.md` URLs.
### Pages
- [https://example.com/index.md](https://example.com/index.md) - My Site
- [https://example.com/about.md](https://example.com/about.md) - About Us
- [https://example.com/blog/hello.md](https://example.com/blog/hello.md) - Hello World
## Usage
Append `.md` to any page URL to get the markdown version:
- `https://example.com/about` → `https://example.com/about.md`
- `https://example.com/blog/hello` → `https://example.com/blog/hello.md`
llms.txt Endpoint
The integration also generates a /llms.txt file following the llms.txt convention. This provides a standardized way to communicate site structure to LLMs and crawlers:
http://localhost:4321/llms.txt
Example output:
# My Site
> Welcome to my site.
This site provides markdown versions of all pages for LLM consumption.
## LLM Discovery
- [LLM Discovery Endpoint](https://example.com/.well-known/llm.md): Full site map with all available markdown endpoints
## Markdown Endpoints
All pages are available as markdown by appending `.md` to the URL.
### Content
- [My Blog Post](https://example.com/blog/hello.md): /blog/hello
### Pages
- [My Site](https://example.com/index.md): /
- [About Us](https://example.com/about.md): /about
## Permissions
LLMs and crawlers are welcome to access markdown endpoints.
Configuration Options
contentDir
- Type: string
- Default: 'src/content'
- Directory containing Astro content collections.
includeStaticPages
- Type: boolean
- Default: true
- Whether to generate markdown for static (non-collection) pages.
includeFrontmatter
- Type: boolean
- Default: true
- Whether to include YAML frontmatter in the output.
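When frontmatter is included, a consumer of the generated files may want to separate it from the markdown body. The sketch below is one way to do that on the consumer side. `splitFrontmatter` is a hypothetical helper, and it only understands the flat `key: value` lines seen in this README's examples; it is not a full YAML parser.

```typescript
// Hypothetical consumer-side helper, not part of the package.
// Handles only flat "key: value" lines; nested YAML is out of scope.
function splitFrontmatter(markdown: string): { frontmatter: Record<string, string>; body: string } {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(markdown)
  // No frontmatter block (e.g. includeFrontmatter: false): return the text unchanged.
  if (!match) return { frontmatter: {}, body: markdown }

  const frontmatter: Record<string, string> = {}
  for (const line of match[1].split(/\r?\n/)) {
    // Split on the first colon only, so values such as timestamps stay intact.
    const idx = line.indexOf(':')
    if (idx === -1) continue
    frontmatter[line.slice(0, idx).trim()] = line.slice(idx + 1).trim()
  }
  return { frontmatter, body: markdown.slice(match[0].length) }
}

const sample = [
  '---',
  'title: About Us',
  'url: /about',
  'type: static',
  'generatedAt: 2024-01-15T10:30:00.000Z',
  '---',
  '# About Us',
].join('\n')

const parsed = splitFrontmatter(sample)
console.log(parsed.frontmatter['title']) // About Us
console.log(parsed.body) // # About Us
```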
llmEndpoint
- Type: boolean | LlmEndpointOptions
- Default: true
- Enable or configure the /.well-known/llm.md endpoint.
When set to true, the endpoint is enabled with default settings. You can also pass an options object:
pageMarkdown({
  llmEndpoint: {
    siteName: 'My Custom Site Name',
    description: 'A custom description for my site',
    additionalContent: '## Contact\n\nReach us at [email protected]',
  },
})
LlmEndpointOptions
| Option | Type | Description |
| ------------------- | -------- | ---------------------------------------------- |
| siteName | string | Override the site name in llm.md |
| description | string | Override the site description |
| baseUrl | string | Override base URL (defaults to Astro's site) |
| additionalContent | string | Additional markdown content to append |
Set to false to disable the endpoint entirely:
pageMarkdown({
  llmEndpoint: false,
})
llmsTxt
- Type: boolean | LlmsTxtOptions
- Default: true
- Enable or configure the /llms.txt endpoint.
When set to true, the endpoint is enabled with default settings. URLs are generated using the site value from your Astro config. You can also pass an options object:
pageMarkdown({
  llmsTxt: {
    siteName: 'My Custom Site Name',
    description: 'A custom description for my site',
    baseUrl: 'https://example.com', // Override Astro's site config
    allowCrawling: true,
    instructions: 'Please be respectful of rate limits.',
    additionalContent: '## Contact\n\nReach us at [email protected]',
  },
})
LlmsTxtOptions
| Option | Type | Description |
| ------------------- | --------- | ---------------------------------------------- |
| siteName | string | Override the site name in llms.txt |
| description | string | Override the site description |
| baseUrl | string | Override base URL (defaults to Astro's site) |
| allowCrawling | boolean | Whether crawling is allowed (default: true) |
| instructions | string | Custom instructions for LLMs |
| additionalContent | string | Additional content to append |
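The `boolean | LlmsTxtOptions` type covers three cases: `true` (defaults), `false` (disabled), and an options object. The sketch below shows one way such a value could be normalized. The interface is reconstructed from the table above, with every field assumed to be optional, and `resolveLlmsTxt` is a hypothetical helper; the package's exported types and internal logic may differ.

```typescript
// Shape reconstructed from the LlmsTxtOptions table; assumed, not copied
// from the package's type definitions.
interface LlmsTxtOptions {
  siteName?: string
  description?: string
  baseUrl?: string
  allowCrawling?: boolean
  instructions?: string
  additionalContent?: string
}

// Hypothetical helper: turns `boolean | LlmsTxtOptions` into either an
// options object with defaults applied, or null when the endpoint is disabled.
function resolveLlmsTxt(option: boolean | LlmsTxtOptions = true): LlmsTxtOptions | null {
  if (option === false) return null
  const custom: LlmsTxtOptions = option === true ? {} : option
  // allowCrawling defaults to true, per the table above.
  return { allowCrawling: true, ...custom }
}

console.log(resolveLlmsTxt(false)) // null
console.log(resolveLlmsTxt({ siteName: 'My Custom Site Name', allowCrawling: false }))
```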
Set to false to disable the endpoint entirely:
pageMarkdown({
  llmsTxt: false,
})
HTML to Markdown Conversion
When converting static pages, the integration:
- Extracts main content from <main>, <article>, or similar containers
- Converts headings, paragraphs, lists, code blocks, tables, and links
- Excludes navigation, footer, header, scripts, and forms
- Extracts title and description from meta tags
- Cleans up excessive whitespace
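The extraction and cleanup steps in this list can be sketched roughly as follows. This is illustrative only: it uses regular expressions for brevity, which are fragile on real-world HTML, and the README does not show how the integration actually implements these steps. All three function names are hypothetical.

```typescript
// Illustrative sketch; the integration's real extraction logic is not shown
// in this README and most likely relies on a proper HTML parser.
function extractMeta(html: string): { title: string | undefined; description: string | undefined } {
  const title = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html)?.[1]?.trim()
  // Assumption: the name attribute comes before content inside the <meta> tag.
  const description = /<meta\s+name=["']description["']\s+content=["']([^"']*)["']/i.exec(html)?.[1]
  return { title, description }
}

function extractMainContent(html: string): string {
  // Prefer <main>, then <article>; fall back to the whole document.
  const container =
    /<main[^>]*>([\s\S]*?)<\/main>/i.exec(html)?.[1] ??
    /<article[^>]*>([\s\S]*?)<\/article>/i.exec(html)?.[1] ??
    html
  // Drop the element types the list above names as excluded.
  return container.replace(/<(nav|footer|header|script|form)\b[^>]*>[\s\S]*?<\/\1>/gi, '')
}

function cleanWhitespace(text: string): string {
  // Collapse runs of three or more newlines into a single blank line.
  return text.replace(/\n{3,}/g, '\n\n').trim()
}

const page =
  '<html><head><title>About Us</title>' +
  '<meta name="description" content="Learn more about our company"></head>' +
  '<body><header>Top</header><main><nav>Menu</nav><h1>About Us</h1>\n\n\n\n<p>Hi</p></main>' +
  '<footer>Bottom</footer></body></html>'

console.log(extractMeta(page).title) // About Us
console.log(cleanWhitespace(extractMainContent(page)))
```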
Supported Elements
| HTML | Markdown |
| ---------------------- | --------------- |
| <h1> - <h6> | # - ###### |
| <p> | Paragraph |
| <strong>, <b> | **bold** |
| <em>, <i> | *italic* |
| <code> | `code` |
| <pre><code> | Code blocks |
| <a> | [text](url) |
| <img> | ![alt](src) |
| <ul>, <ol>, <li> | Lists |
| <blockquote> | > quote |
| <table> | Markdown tables |
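As a rough illustration of a few rows from this table, the sketch below converts headings, bold, italic, inline code, links, and paragraphs. It assumes well-formed, non-nested tags and uses regular expressions for brevity. `convertBasicElements` is a hypothetical name, and this is not the integration's converter.

```typescript
// Illustrative subset of the table above. A real converter would use an
// HTML parser and also handle lists, images, blockquotes, and tables.
function convertBasicElements(html: string): string {
  return html
    // <h1>..<h6> become one to six leading "#" characters.
    .replace(/<h([1-6])[^>]*>([\s\S]*?)<\/h\1>/gi, (_m, level: string, text: string) =>
      `${'#'.repeat(Number(level))} ${text}\n\n`)
    .replace(/<(strong|b)>([\s\S]*?)<\/\1>/gi, '**$2**')
    .replace(/<(em|i)>([\s\S]*?)<\/\1>/gi, '*$2*')
    .replace(/<code>([\s\S]*?)<\/code>/gi, '`$1`')
    .replace(/<a\s+href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, '[$2]($1)')
    // Match <p> and <p attr="..."> but not <pre>.
    .replace(/<p(?:\s[^>]*)?>([\s\S]*?)<\/p>/gi, '$1\n\n')
    .trim()
}

const snippet =
  '<h1>About Us</h1><p>Our <strong>company</strong> was <em>founded</em> in ' +
  '<a href="/history">2020</a>. Run <code>astro build</code>.</p>'

console.log(convertBasicElements(snippet))
```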
Integration with @nuasite/cms-marker
When used alongside @nuasite/cms-marker, the integration can access content collection data through the CMS manifest. @nuasite/cms-marker is optional; the integration works without it.
import cmsMarker from '@nuasite/cms-marker'
import pageMarkdown from '@nuasite/llm-enhancements'
import { defineConfig } from 'astro/config'
export default defineConfig({
  integrations: [
    cmsMarker(),
    pageMarkdown(),
  ],
})
Output Structure
Each markdown file includes:
interface MarkdownOutput {
  /** YAML frontmatter fields */
  frontmatter: Record<string, unknown>
  /** Markdown body content */
  body: string
  /** Path to the original source file (if from collection) */
  sourcePath?: string
}
Contributing
Contributions are welcome! Please feel free to submit issues and pull requests.
