robots-txt-kit
v0.1.0
Parse and evaluate robots.txt rules with structured diagnostics.
robots-txt-kit is a clean-room TypeScript draft for tools that need to inspect crawl policy without fetching files, caching domains, or depending on Node-only APIs.
Install
npm install robots-txt-kit
Quick Start
import { checkRobotsTxt, parseRobotsTxt } from "robots-txt-kit";
const robots = `
User-agent: *
Disallow: /private
Allow: /private/public
Sitemap: https://example.com/sitemap.xml
`;
const parsed = parseRobotsTxt(robots);
const decision = checkRobotsTxt(robots, "https://example.com/private/public/page", {
  userAgent: "ExampleBot"
});
console.log(parsed.document.sitemaps);
console.log(decision.allowed); // true
console.log(decision.rule?.line); // 4
API
parseRobotsTxt(input)
Parses a string into groups, rules, sitemaps and diagnostics. Expected problems return stable diagnostics instead of throwing.
const result = parseRobotsTxt("User-agent: *\nDisallow: /tmp");
if (result.ok) {
  console.log(result.document.groups[0]?.rules);
}
checkRobotsTxt(input, urlOrPath, options?)
Parses and evaluates in one call. urlOrPath may be an absolute URL or a path beginning with /.
checkRobotsTxt("User-agent: *\nDisallow: /*.json$", "/feed.json");
matchRobotsTxt(document, urlOrPath, options?)
Evaluates a pre-parsed document.
const parsed = parseRobotsTxt(robots);
const decision = matchRobotsTxt(parsed.document, "/admin", {
  userAgent: "Googlebot"
});
listRobotsTxtSitemaps(input)
Small helper for extracting valid Sitemap: directives.
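As a rough illustration of what extracting valid Sitemap: directives can involve, here is a minimal, self-contained sketch. The function name extractSitemapUrls and the validity rule (an absolute http or https URL) are assumptions made for this example; they are not taken from the package source and may differ from what listRobotsTxtSitemaps actually does.

```typescript
// Hypothetical helper for illustration; not part of robots-txt-kit.
function extractSitemapUrls(input: string): string[] {
  const urls: string[] = [];
  for (const rawLine of input.split(/\r?\n/)) {
    // Drop a trailing "#" comment, then surrounding whitespace.
    const hashAt = rawLine.indexOf("#");
    const line = (hashAt === -1 ? rawLine : rawLine.slice(0, hashAt)).trim();
    const colonAt = line.indexOf(":");
    if (colonAt === -1) continue;
    // Field names are compared case-insensitively.
    const field = line.slice(0, colonAt).trim().toLowerCase();
    if (field !== "sitemap") continue;
    const value = line.slice(colonAt + 1).trim();
    try {
      const url = new URL(value);
      // Assumption: only absolute http(s) URLs count as valid.
      if (url.protocol === "http:" || url.protocol === "https:") {
        urls.push(url.href);
      }
    } catch {
      // Assumption: values that do not parse as a URL are skipped.
    }
  }
  return urls;
}

console.log(extractSitemapUrls("Sitemap: https://example.com/sitemap.xml"));
// [ "https://example.com/sitemap.xml" ]
```

The sketch silently drops invalid values; a parser with diagnostics would more likely report them, for example with a code such as invalid-url.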
Options
| Option | Default | Description |
| --- | --- | --- |
| userAgent | "*" | User agent used to select the best group. Matching is lowercase and substring-based. |
| defaultAllowed | true | Decision when no matching group or rule exists. |
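To make the two options more concrete, the sketch below shows one way lowercase, substring-based group selection could work. It is not the package's implementation: the SketchGroup shape, the selectRules name, and the choice of "longest matching token wins" as the meaning of "best" are all assumptions for illustration.

```typescript
// Hypothetical types and names for illustration; not the package's API.
interface SketchGroup {
  userAgents: string[]; // values of the group's User-agent lines
  rules: string[];      // rule lines kept as plain strings for brevity
}

// Returns the merged rules of the best-matching groups, or undefined when no
// group applies (a caller would then fall back to defaultAllowed).
function selectRules(groups: SketchGroup[], userAgent: string): string[] | undefined {
  const wanted = userAgent.toLowerCase();
  let bestToken: string | undefined;
  for (const group of groups) {
    for (const agent of group.userAgents) {
      const token = agent.toLowerCase();
      if (token === "*" || token === "") continue; // "*" is the fallback below
      // Assumption: a group token matches when it is a substring of the
      // requested user agent, and the longest such token is the best one.
      const isMatch = wanted.indexOf(token) !== -1;
      if (isMatch && (bestToken === undefined || token.length > bestToken.length)) {
        bestToken = token;
      }
    }
  }
  const chosen = bestToken ?? "*";
  const merged: string[] = [];
  let found = false;
  for (const group of groups) {
    if (group.userAgents.some((agent) => agent.toLowerCase() === chosen)) {
      found = true;
      merged.push(...group.rules); // merge rules from every group that ties
    }
  }
  return found ? merged : undefined;
}

const exampleGroups: SketchGroup[] = [
  { userAgents: ["*"], rules: ["Disallow: /private"] },
  { userAgents: ["googlebot"], rules: ["Disallow: /admin"] },
  { userAgents: ["Googlebot"], rules: ["Allow: /admin/public"] },
];
console.log(selectRules(exampleGroups, "Googlebot-Image"));
// [ "Disallow: /admin", "Allow: /admin/public" ]
console.log(selectRules(exampleGroups, "ExampleBot"));
// [ "Disallow: /private" ]
```

When selectRules returns undefined, nothing in the file applies to the requested user agent, which is the case the defaultAllowed option covers.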
Diagnostics
Diagnostics are objects with stable code values and optional line numbers:
- invalid-input
- invalid-options
- empty-input
- missing-colon
- empty-directive
- empty-user-agent
- rule-before-user-agent
- unsupported-directive
- invalid-crawl-delay
- invalid-url
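The sketch below illustrates the "return diagnostics instead of throwing" style with a tiny line checker. Only the code strings come from the list above; the SketchDiagnostic shape, the lintRobotsTxt name, and the exact conditions that trigger each code are guesses inferred from the code names, so the real parser may behave differently.

```typescript
// Hypothetical shape: a stable code plus an optional 1-based line number.
interface SketchDiagnostic {
  code: string;
  line?: number;
}

const KNOWN_FIELDS = ["user-agent", "allow", "disallow", "sitemap", "crawl-delay"];

// Illustrative checker; covers only a few of the listed codes.
function lintRobotsTxt(input: string): SketchDiagnostic[] {
  if (input.trim() === "") return [{ code: "empty-input" }];
  const diagnostics: SketchDiagnostic[] = [];
  const lines = input.split(/\r?\n/);
  let sawUserAgent = false;
  for (let index = 0; index < lines.length; index++) {
    const rawLine = lines[index] ?? "";
    const lineNumber = index + 1;
    const hashAt = rawLine.indexOf("#");
    const line = (hashAt === -1 ? rawLine : rawLine.slice(0, hashAt)).trim();
    if (line === "") continue; // blank or comment-only line
    const colonAt = line.indexOf(":");
    if (colonAt === -1) {
      diagnostics.push({ code: "missing-colon", line: lineNumber });
      continue;
    }
    const field = line.slice(0, colonAt).trim().toLowerCase();
    const value = line.slice(colonAt + 1).trim();
    if (KNOWN_FIELDS.indexOf(field) === -1) {
      diagnostics.push({ code: "unsupported-directive", line: lineNumber });
    } else if (field === "user-agent") {
      if (value === "") diagnostics.push({ code: "empty-user-agent", line: lineNumber });
      else sawUserAgent = true;
    } else if ((field === "allow" || field === "disallow") && !sawUserAgent) {
      // Assumption: a rule with no preceding User-agent line is reported.
      diagnostics.push({ code: "rule-before-user-agent", line: lineNumber });
    }
  }
  return diagnostics;
}

console.log(lintRobotsTxt("Disallow: /tmp\nUser-agent *\nUser-agent: *\nFoo: bar"));
// [ { code: "rule-before-user-agent", line: 1 },
//   { code: "missing-colon", line: 2 },
//   { code: "unsupported-directive", line: 4 } ]
```

Because the codes are stable strings, callers can branch on them (for example, treating unsupported-directive as a warning) without parsing human-readable messages.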
Scope
The MVP supports:
- User-agent, Allow, Disallow, Sitemap and Crawl-delay;
- grouped adjacent User-agent lines;
- merging rules from multiple groups with the same best matching user-agent;
- wildcard * and end-anchor $ path matching;
- percent-encoding normalization for path inputs such as /café;
- most-specific rule selection, with Allow winning specificity ties;
- browser, worker and build-tool usage with no runtime dependencies.
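The matching behaviour listed above can be sketched in a few self-contained functions. This is an illustration under stated assumptions, not the package's code: the names SketchRule, toPath, patternToRegExp and decide are hypothetical, specificity is assumed to mean pattern length, and path normalization is delegated to the standard URL parser.

```typescript
// Illustrative sketch only; names and rules here are assumptions.
interface SketchRule {
  type: "allow" | "disallow";
  pattern: string;
}

// Accepts an absolute URL or a path beginning with "/" and returns the
// percent-encoded path plus query string, relying on the WHATWG URL parser.
function toPath(urlOrPath: string): string {
  const url = urlOrPath.charAt(0) === "/"
    ? new URL("http://localhost" + urlOrPath) // placeholder origin, discarded
    : new URL(urlOrPath);
  return url.pathname + url.search;
}

// Translates a path pattern into a RegExp: "*" matches any run of characters,
// a trailing "$" anchors the end, and otherwise the pattern is a prefix match.
function patternToRegExp(pattern: string): RegExp {
  const anchored = pattern.charAt(pattern.length - 1) === "$";
  const body = anchored ? pattern.slice(0, -1) : pattern;
  const source = body
    .split("*")
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + source + (anchored ? "$" : ""));
}

// Picks the matching rule with the longest pattern (assumed specificity
// measure); on equal length an Allow rule wins over a Disallow rule.
function decide(rules: SketchRule[], urlOrPath: string, defaultAllowed = true): boolean {
  const path = toPath(urlOrPath);
  let best: SketchRule | undefined;
  for (const rule of rules) {
    if (rule.pattern === "") continue; // assumption: empty patterns match nothing
    if (!patternToRegExp(rule.pattern).test(path)) continue;
    const longer = best === undefined || rule.pattern.length > best.pattern.length;
    const tieWonByAllow =
      best !== undefined &&
      rule.pattern.length === best.pattern.length &&
      rule.type === "allow";
    if (longer || tieWonByAllow) best = rule;
  }
  return best === undefined ? defaultAllowed : best.type === "allow";
}

const exampleRules: SketchRule[] = [
  { type: "disallow", pattern: "/private" },
  { type: "allow", pattern: "/private/public" },
];
console.log(decide(exampleRules, "https://example.com/private/public/page")); // true
console.log(decide(exampleRules, "/private/secret")); // false
console.log(decide([{ type: "disallow", pattern: "/*.json$" }], "/feed.json")); // false
console.log(toPath("/café")); // "/caf%C3%A9"
```

The sketch normalizes only the path input, not the patterns, and it treats a pattern's length as its specificity; both are simplifications chosen to keep the example short.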
It intentionally does not fetch remote robots.txt files, cache domains, implement every crawler-specific extension, ship a public suffix list, or replace crawler-specific validators. Treat it as a portable inspector for local policy checks.
Package quality
- TypeScript types are generated from the source.
- ESM-only package with no runtime dependencies.
- Defensive API: invalid inputs and invalid runtime options return diagnostics instead of throwing.
- CI runs npm ci, typecheck, build, and test.
- Tested on Node.js 20 and 22 with GitHub Actions.
License
MPL-2.0
