
robotstxt-parser — Robots.txt Parser

A comprehensive JavaScript library for parsing, validating, and generating robots.txt files with ease.

✨ Features

  • 📖 Parse robots.txt files with full directive support
  • ✅ Validate robots.txt content and catch common mistakes
  • 🔨 Generate robots.txt files programmatically
  • 🎯 Check URL permissions for specific user agents
  • ⏱️ Extract crawl delays and sitemap URLs
  • 🔗 Fluent API with method chaining support

📦 Installation

bun install @xiaozhu2007/robotstxt-parser
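
Bun is used throughout this README, but since this is a regular npm package it should also install with npm or yarn:

npm install @xiaozhu2007/robotstxt-parser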

🚀 Quick Start

Parsing a robots.txt file

import { RobotsParser } from "@xiaozhu2007/robotstxt-parser";

const parser = new RobotsParser();
const content = `
User-agent: *
Disallow: /admin/
Allow: /admin/public/
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml
`;

parser.parse(content);

// Check if a URL is allowed
console.log(parser.isAllowed("Googlebot", "/admin/public/page.html")); // true
console.log(parser.isAllowed("Googlebot", "/admin/secret.html")); // false

// Get crawl delay
console.log(parser.getCrawlDelay("Googlebot")); // 10

// Get sitemaps
console.log(parser.getSitemaps()); // ['https://example.com/sitemap.xml']

Building a robots.txt file

import { RobotsBuilder } from "@xiaozhu2007/robotstxt-parser";

const robots = new RobotsBuilder()
  .userAgent("*")
  .disallow("/admin/")
  .disallow("/private/")
  .allow("/admin/public/")
  .crawlDelay(5)
  .userAgent("Googlebot")
  .disallow("/temp/")
  .sitemap("https://example.com/sitemap.xml")
  .build();

console.log(robots);

Output:

# Robots.txt file generated by RobotsParser

User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /admin/public/
Crawl-delay: 5

User-agent: Googlebot
Disallow: /temp/

# Sitemaps
Sitemap: https://example.com/sitemap.xml

📚 API Reference

RobotsParser

parse(content)

Parse robots.txt content and extract all directives.

Parameters:

  • content (string): The robots.txt file content

Returns: RobotsParser - Returns the parser instance for chaining
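
Since parse() returns the instance, it can be chained directly with the query methods documented below; a minimal sketch:

import { RobotsParser } from "@xiaozhu2007/robotstxt-parser";

// Parse and query in one chained expression
const allowed = new RobotsParser()
  .parse("User-agent: *\nDisallow: /admin/")
  .isAllowed("Googlebot", "/admin/login");

console.log(allowed); // false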

isAllowed(userAgent, url)

Check if a URL is allowed for a specific user agent.

Parameters:

  • userAgent (string): The user agent string
  • url (string): The URL path to check

Returns: boolean - True if allowed, false if disallowed

getCrawlDelay(userAgent)

Get the crawl delay for a specific user agent.

Parameters:

  • userAgent (string): The user agent string

Returns: number|null - Crawl delay in seconds or null if not set

getSitemaps()

Get all sitemap URLs from the robots.txt file.

Returns: string[] - Array of sitemap URLs

validate()

Validate the parsed robots.txt content.

Returns: Object with validation results:

{
  valid: boolean,
  errors: Array<{type: string, message: string, line?: number}>,
  warnings: Array<{type: string, message: string}>
}

generate(options)

Generate a robots.txt file from the current rules.

Parameters:

  • options (object): Generation options
    • includeComments (boolean): Include header comments (default: true)
    • sortUserAgents (boolean): Sort user agents alphabetically (default: false)

Returns: string - Generated robots.txt content
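
A short sketch of a parse-and-regenerate round trip, using the option names from this reference (the exact formatting of the regenerated output is up to the library):

import { RobotsParser } from "@xiaozhu2007/robotstxt-parser";

const parser = new RobotsParser();
parser.parse("User-agent: *\nDisallow: /admin/\nSitemap: https://example.com/sitemap.xml");

// Regenerate the file without the header comment block
console.log(parser.generate({ includeComments: false }));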

getSummary()

Get a summary of the parsed robots.txt file.

Returns: Object with summary information:

{
  userAgents: string[],
  totalRules: number,
  sitemapCount: number,
  commentCount: number
}

RobotsBuilder

userAgent(userAgent)

Set the current user agent for subsequent rules.

Parameters:

  • userAgent (string): User agent name (e.g., '*', 'Googlebot')

Returns: RobotsBuilder - Returns the builder for chaining

disallow(path)

Add a disallow rule for the current user agent.

Parameters:

  • path (string): Path to disallow

Returns: RobotsBuilder - Returns the builder for chaining

allow(path)

Add an allow rule for the current user agent.

Parameters:

  • path (string): Path to allow

Returns: RobotsBuilder - Returns the builder for chaining

crawlDelay(seconds)

Set crawl delay for the current user agent.

Parameters:

  • seconds (number): Delay in seconds

Returns: RobotsBuilder - Returns the builder for chaining

sitemap(url)

Add a sitemap URL.

Parameters:

  • url (string): Sitemap URL

Returns: RobotsBuilder - Returns the builder for chaining

build(options)

Build and return the robots.txt content.

Parameters:

  • options (object): Same as RobotsParser.generate() options

Returns: string - Generated robots.txt content
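
For example, the options documented for generate() can be passed straight to build(); a sketch, with the exact output formatting assumed:

import { RobotsBuilder } from "@xiaozhu2007/robotstxt-parser";

// Build rules for two user agents, then emit them sorted and without comments
const robots = new RobotsBuilder()
  .userAgent("Googlebot")
  .disallow("/private/")
  .userAgent("*")
  .disallow("/tmp/")
  .build({ includeComments: false, sortUserAgents: true });

console.log(robots);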

🔍 Advanced Examples

Validating a robots.txt file

const parser = new RobotsParser();
parser.parse(content);

const validation = parser.validate();

if (!validation.valid) {
  console.error("Validation errors:", validation.errors);
}

if (validation.warnings.length > 0) {
  console.warn("Warnings:", validation.warnings);
}

Handling wildcards and patterns

const parser = new RobotsParser();
parser.parse(`
User-agent: *
Disallow: /*.json$
Allow: /api/*.json$
Disallow: /temp*
`);

console.log(parser.isAllowed("*", "/data.json")); // false
console.log(parser.isAllowed("*", "/api/users.json")); // true
console.log(parser.isAllowed("*", "/temporary")); // false

Getting detailed summaries

const parser = new RobotsParser();
parser.parse(content);

const summary = parser.getSummary();
console.log(
  `Found ${summary.totalRules} rules for ${summary.userAgents.length} user agents`,
);
console.log(`Sitemaps: ${summary.sitemapCount}`);

🎯 Supported Directives

  • User-agent: Specify target crawler
  • Disallow: Block access to paths
  • Allow: Explicitly allow access to paths
  • Crawl-delay: Set delay between requests
  • Sitemap: Specify sitemap locations
  • ✅ Pattern matching with * and $
  • ✅ Comments with #

🤝 Contributing

To install dependencies:

bun install

To run:

bun run index.ts

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

MIT License - feel free to use this library in your projects!

🐛 Bug Reports

If you discover any bugs, please create an issue with detailed information about the problem.