@bridgetoagent-com/llms-txt-validator
v0.1.0
Validator for llms.txt — checks parser conformance against the llmstxt.org reference spec, link reachability, and missing sections.
llms-txt-validator
Validator for llms.txt — checks parser conformance against the reference spec, link reachability, and missing or malformed sections.
Ships as:
- A hosted web tool at bridgetoagent.com/tools/llms-txt-validator — paste, upload, or fetch by URL. No signup, no email gate.
- A CLI for local files and CI pipelines.
- A JavaScript / TypeScript library for embedding in your own validation flow.
MIT licensed. No telemetry. No external dependencies beyond fetch.
Install
npm install --save-dev @bridgetoagent-com/llms-txt-validator
# or
pnpm add -D @bridgetoagent-com/llms-txt-validator
# or
yarn add -D @bridgetoagent-com/llms-txt-validator
Requires Node 18 or newer (uses the native fetch API).
Quick start
CLI
# Validate a local file
npx @bridgetoagent-com/llms-txt-validator ./llms.txt
# Fetch and validate from a URL
npx @bridgetoagent-com/llms-txt-validator https://example.com/llms.txt
# Also verify every link is reachable
npx @bridgetoagent-com/llms-txt-validator ./llms.txt --check-links
# Machine-readable output for CI
npx @bridgetoagent-com/llms-txt-validator ./llms.txt --json
Exit codes:
| Code | Meaning |
| ---- | ------- |
| 0 | Valid (status: pass) |
| 1 | Warnings only (status: pass_with_warnings) |
| 2 | Errors found (status: fail) |
| 64 | Bad command-line usage |
| 66 | Could not read input |
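For wrappers that call the library rather than the CLI, the table above can be mirrored with a small mapping. This is only a sketch: the names exitCodeForStatus and isBlocking are hypothetical and not part of the package. Only the three status values and their codes come from the table.

```javascript
// Hypothetical helper (not exported by the package): mirrors the exit-code
// table for the three report statuses.
const EXIT_CODES = new Map([
  ["pass", 0],
  ["pass_with_warnings", 1],
  ["fail", 2],
]);

function exitCodeForStatus(status) {
  if (!EXIT_CODES.has(status)) {
    throw new Error(`Unknown report status: ${status}`);
  }
  return EXIT_CODES.get(status);
}

// Example policy: let warnings through and block only on errors.
function isBlocking(status) {
  return exitCodeForStatus(status) >= 2;
}
```

A pipeline that should tolerate warnings could branch on isBlocking(report.status) instead of relying on a non-zero exit code alone.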
Library
import fs from "node:fs/promises";
import { validate } from "@bridgetoagent-com/llms-txt-validator";
const source = await fs.readFile("./llms.txt", "utf8");
const report = await validate(source);
console.log(report.status); // "pass" | "pass_with_warnings" | "fail"
console.log(report.summary); // { errors, warnings, infos, sections, links }
console.log(report.issues); // [{ severity, code, message, line, ... }, ...]
console.log(report.parsed); // parsed document tree
Enable reachability checking:
const report = await validate(source, {
checkReachability: true,
concurrency: 8, // default
timeoutMs: 5000, // default
slowThresholdMs: 3000 // default
});
console.log(report.reachability); // { checked, failed, slow }
Parser only (no validation, no I/O):
import { parse } from "@bridgetoagent-com/llms-txt-validator";
const doc = parse(source);
// doc.title, doc.description, doc.sections[].links, etc.
What gets checked
Structure (always on)
- # Title is present and is the first non-blank content
- No duplicate H1 headings
- Optional > blockquote description captured immediately after the title
- ## Section headings used for resource groups (H2 level)
- H3+ headings flagged (info; uncommon in llms.txt)
- Empty sections flagged
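To make the title rules concrete, here is a minimal sketch of how the first two checks could be expressed. It is not the validator's actual implementation: the function name checkTitle is hypothetical, and it deliberately ignores edge cases such as # lines inside fenced code blocks. The issue codes and severities are the ones listed under Issue codes.

```javascript
// Illustrative only: a simplified take on the title checks.
function checkTitle(source) {
  const issues = [];
  const lines = source.split(/\r?\n/);
  const h1Lines = []; // 1-based line numbers of every "# ..." heading
  let firstContentIndex = -1; // 0-based index of the first non-blank line

  lines.forEach((text, index) => {
    if (firstContentIndex === -1 && text.trim() !== "") firstContentIndex = index;
    if (/^# \S/.test(text)) h1Lines.push(index + 1);
  });

  if (h1Lines.length === 0) {
    issues.push({ severity: "error", code: "missing-title", line: 1 });
    return issues;
  }
  // The title must be the first non-blank content in the file.
  if (h1Lines[0] !== firstContentIndex + 1) {
    issues.push({ severity: "error", code: "title-not-first", line: h1Lines[0] });
  }
  // Every H1 after the first counts as a duplicate title.
  for (const line of h1Lines.slice(1)) {
    issues.push({ severity: "warning", code: "duplicate-title", line });
  }
  return issues;
}
```

Calling checkTitle on a file that opens with a paragraph before its # heading yields a single title-not-first issue pointing at the heading's line.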
Link bullets
- Link bullet syntax: - [text](url) or - [text](url): description
- Malformed Markdown links flagged
- Empty link text or URL
- Relative URLs (/docs/foo): llms.txt is consumed by external agents, so URLs must be absolute
- Fragment-only URLs (#anchor)
- mailto: URLs flagged as unusual
- http:// URLs flagged (prefer https://)
- Duplicate URLs (warning) and duplicate link text (info)
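The link-bullet rules above can be pictured as a single classification pass over each bullet line. The sketch below is an assumption about how such a pass might look, not the package's parser: the regular expression and the function name classifyLinkBullet are illustrative, and duplicate detection is left out because it needs the whole document.

```javascript
// Illustrative only: matches "- [text](url)" with an optional ": description".
const LINK_BULLET = /^- \[([^\]]*)\]\(([^)]*)\)(?::\s*(.*))?$/;

// Returns the issue codes (from the Issue codes table) that apply to one line.
function classifyLinkBullet(line) {
  const match = LINK_BULLET.exec(line.trim());
  if (!match) return ["malformed-link"];

  const [, text, url] = match;
  const codes = [];
  if (text.trim() === "") codes.push("link-empty-text");
  if (url.trim() === "") {
    codes.push("link-missing-url");
    return codes; // nothing further to say about an empty URL
  }
  if (url.startsWith("#")) codes.push("link-hash-only");
  else if (url.startsWith("/")) codes.push("link-relative-url");
  else if (url.startsWith("mailto:")) codes.push("link-mailto");
  else if (url.startsWith("http://")) codes.push("link-non-https");
  return codes;
}
```

An empty array means the bullet raised none of these findings; a line such as - [Docs](/docs/foo) comes back with link-relative-url.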
Reachability (opt-in)
Pass --check-links (CLI) or { checkReachability: true } (library) to enable network checks:
- Bounded-concurrency HEAD requests against every link
- Falls back to GET when HEAD returns 405 or 501
- link-non-2xx for 4xx/5xx responses
- link-unreachable for network errors and timeouts
- link-slow warning for links above the slow threshold (default 3s)
User-agent: bridgetoagent-llms-txt-validator/0.1 (+https://github.com/bridgetoagent/llms-txt-validator) — identifies itself so server logs aren't anonymous.
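The reachability behaviour described above (bounded concurrency, HEAD first, GET fallback on 405 or 501) can be sketched with the native fetch API. This is an approximation under stated assumptions, not the validator's source: checkLink, checkLinks, and the fetchImpl option are hypothetical names, and the defaults simply reuse the documented concurrency, timeoutMs, and slowThresholdMs values.

```javascript
// Illustrative only. fetchImpl is injectable so the logic can run without a network.
async function checkLink(url, { timeoutMs = 5000, slowThresholdMs = 3000, fetchImpl = fetch } = {}) {
  const started = Date.now();
  try {
    // HEAD first; retry with GET only when the server rejects HEAD with 405 or 501.
    let response = await fetchImpl(url, { method: "HEAD", signal: AbortSignal.timeout(timeoutMs) });
    if (response.status === 405 || response.status === 501) {
      response = await fetchImpl(url, { method: "GET", signal: AbortSignal.timeout(timeoutMs) });
    }
    if (response.status >= 400) return { url, code: "link-non-2xx", status: response.status };
    const elapsedMs = Date.now() - started;
    return elapsedMs > slowThresholdMs ? { url, code: "link-slow", elapsedMs } : { url, code: null };
  } catch {
    // Network errors and timeouts both end up here.
    return { url, code: "link-unreachable" };
  }
}

// Runs at most `concurrency` checks at a time; results keep the input order.
async function checkLinks(urls, { concurrency = 8, ...options } = {}) {
  const results = new Array(urls.length);
  let next = 0;
  async function worker() {
    while (next < urls.length) {
      const index = next++;
      results[index] = await checkLink(urls[index], options);
    }
  }
  const workerCount = Math.min(concurrency, urls.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

The worker-pool shape keeps the number of in-flight requests capped without any dependency, which fits the package's stated goal of relying on nothing beyond fetch.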
Issue codes
Stable identifiers for every kind of finding — pin on these in CI or tooling.
| Code | Severity | Meaning |
| ---- | -------- | ------- |
| missing-title | error | No # Title heading found |
| title-not-first | error | Content appears before the title |
| title-not-h1 | error | First heading is not H1 |
| duplicate-title | warning | Multiple H1 headings |
| section-wrong-level | info | H3+ heading where H2 is conventional |
| empty-section | warning | Section has no link bullets |
| malformed-link | error | Bullet line is not a valid Markdown link |
| link-missing-url | error | Link [text]() has no URL |
| link-empty-text | warning | Link [](url) has no display text |
| link-relative-url | warning | Root-relative URL — must be absolute |
| link-hash-only | warning | Fragment-only URL — meaningless to external agents |
| link-mailto | info | mailto: URL — unusual in llms.txt |
| link-non-https | warning | http:// — prefer https:// |
| duplicate-url | warning | Same URL appears more than once |
| duplicate-link-text | info | Same link text appears more than once |
| link-unreachable | error | Network error or timeout (reachability mode) |
| link-non-2xx | error | HTTP 4xx or 5xx response (reachability mode) |
| link-slow | warning | Response slower than threshold (reachability mode) |
| no-content-after-title | warning | Title exists but nothing else |
| trailing-whitespace | info | Line has trailing whitespace |
| tabs-instead-of-spaces | info | Line contains tab characters |
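Because the codes are stable, a CI gate can block on a chosen subset instead of on every finding. The sketch below assumes a report object shaped like the one validate returns (an issues array whose entries carry severity, code, message, and line). The helper names and the particular set of blocking codes are hypothetical choices, not package defaults.

```javascript
// Hypothetical CI policy: only these codes fail the build.
const BLOCKING_CODES = new Set([
  "missing-title",
  "malformed-link",
  "link-missing-url",
  "link-non-2xx",
]);

// Returns the issues whose code is in the pinned set.
function findBlockingIssues(report, blockingCodes = BLOCKING_CODES) {
  return report.issues.filter((issue) => blockingCodes.has(issue.code));
}

// One-line summary suitable for CI logs.
function formatIssue(issue) {
  return `${issue.severity} ${issue.code} (line ${issue.line}): ${issue.message}`;
}
```

A wrapper script could print formatIssue for each blocking issue and exit non-zero only when findBlockingIssues returns a non-empty array.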
CI / pre-commit usage
GitHub Actions
- name: Validate llms.txt
  run: npx @bridgetoagent-com/llms-txt-validator ./public/llms.txt --check-links
Husky / lint-staged
{
  "lint-staged": {
    "public/llms.txt": "npx @bridgetoagent-com/llms-txt-validator"
  }
}
Standalone script
#!/usr/bin/env bash
set -euo pipefail
npx @bridgetoagent-com/llms-txt-validator ./public/llms.txt --check-links --json > .llms-txt-report.json
How this compares to other tools
This validator focuses on conformance to the llmstxt.org reference spec plus link health.
It does not:
- Generate llms.txt from a site's DOM (see BridgeToAgent for that)
- Score how good the content of your llms.txt is for any particular agent
- Crawl the URLs it references to validate the linked content
Generating a complete agent-readiness kit from your real site is a separate problem — that's what the BridgeToAgent kit at bridgetoagent.com does ($49, generates agents.json + llms.txt + agent-instructions.md together).
Contributing
Bug reports, edge cases, and PRs welcome. See CONTRIBUTING.md.
Good first issues:
- Real-world llms.txt files we misparse (open an issue with the source URL)
- Additional issue codes for spec corner cases
- Performance improvements on large files
License
MIT — copyright 2026 BridgeToAgent editorial.
See also
- The llmstxt.org reference specification
- BridgeToAgent's agent-readiness blog coverage
- The Lighthouse Agentic Browsing audit suite, which includes the llms-txt-well-formed audit this validator helps you pass
