
@playfulsparkle/robotstxt-js

v1.0.10

A lightweight, Open Source robots.txt parser written in JavaScript

Readme

robotstxt.js

robotstxt.js is a lightweight JavaScript library for parsing robots.txt files. It provides a specification-compliant parser that works in both browser and Node.js environments.

Directives

  • Clean-param
  • Host
  • Sitemap
  • User-agent
    • Allow
    • Disallow
    • Crawl-delay
    • Cache-delay
    • Comment
    • NoIndex
    • Request-rate
    • Robot-version
    • Visit-time

Benefits

  • Accurately parse and interpret robots.txt rules.
  • Ensure compliance with robots.txt standards to avoid accidental blocking of legitimate bots.
  • Easily check URL permissions for different user agents programmatically.
  • Simplify the process of working with robots.txt in JavaScript applications.

Usage

Here's how to use robotstxt.js to analyze robots.txt content and check crawler permissions.

Node.js

```javascript
const { robotstxt } = require("@playfulsparkle/robotstxt-js")
// ...
```

JavaScript

```javascript
// Parse robots.txt content
const robotsTxtContent = `
User-Agent: GoogleBot
Allow: /public
Disallow: /private
Crawl-Delay: 5
Sitemap: https://example.com/sitemap.xml
`;

const parser = robotstxt(robotsTxtContent);

// Check URL permissions
console.log(parser.isAllowed("/public/data", "GoogleBot"));   // true
console.log(parser.isDisallowed("/private/admin", "GoogleBot")); // true

// Get specific user agent group
const googleBotGroup = parser.getGroup("googlebot"); // Case-insensitive
if (googleBotGroup) {
    console.log("Crawl Delay:", googleBotGroup.getCrawlDelay()); // 5
    console.log("Rules:", googleBotGroup.getRules().map(rule =>
        `${rule.type}: ${rule.path}`
    )); // ["allow: /public", "disallow: /private"]
}

// Get all sitemaps
console.log("Sitemaps:", parser.getSitemaps()); // ["https://example.com/sitemap.xml"]

// Check default rules (wildcard *)
console.log(parser.isAllowed("/protected", "*")); // true (if no wildcard rules exist)
```

Installation

NPM

npm i @playfulsparkle/robotstxt-js

Yarn

yarn add @playfulsparkle/robotstxt-js

Bower (deprecated)

bower install playfulsparkle/robotstxt.js

API Documentation

Core Methods

  • robotstxt(content: string): RobotsTxtParser - Creates a new parser instance with the provided robots.txt content.
  • getReports(): string[] - Get an array of parsing errors, warnings, etc.
  • isAllowed(url: string, userAgent: string): boolean - Check if a URL is allowed for the specified user agent (throws if parameters are missing).
  • isDisallowed(url: string, userAgent: string): boolean - Check if a URL is disallowed for the specified user agent (throws if parameters are missing).
  • getGroup(userAgent: string): Group | undefined - Get the rules group for a specific user agent (case-insensitive match).
  • getSitemaps(): string[] - Get an array of discovered sitemap URLs from Sitemap directives.
  • getCleanParams(): string[] - Retrieve Clean-param directives for URL parameter sanitization.
  • getHost(): string | undefined - Get canonical host declaration for domain normalization.
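
The sketch below exercises these core methods against a small, made-up robots.txt. The sample content and the exact shape of some returned values (for example, how Clean-param entries are stringified) are assumptions for illustration, not guaranteed output.

```javascript
const { robotstxt } = require("@playfulsparkle/robotstxt-js");

// Hypothetical robots.txt content, used only to illustrate the core methods.
const parser = robotstxt(`
User-agent: *
Disallow: /tmp/
Clean-param: ref /articles/
Host: example.com
Sitemap: https://example.com/sitemap.xml
`);

console.log(parser.getReports());      // parsing errors/warnings, if any were encountered
console.log(parser.getSitemaps());     // ["https://example.com/sitemap.xml"]
console.log(parser.getCleanParams());  // Clean-param entries, e.g. ["ref /articles/"] (format assumed)
console.log(parser.getHost());         // "example.com", or undefined when no Host directive is present
console.log(parser.isAllowed("/blog/post", "SomeBot"));    // true (not covered by Disallow: /tmp/)
console.log(parser.isDisallowed("/tmp/cache", "SomeBot")); // true (matches Disallow: /tmp/)
```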

Group Methods (via getGroup() result)

User Agent Info

  • getName(): string - User agent name for this group.
  • getComment(): string[] - Associated comments from Comment directives.
  • getRobotVersion(): string | undefined - Robots.txt specification version.
  • getVisitTime(): string | undefined - Recommended crawl time window.

Crawl Management

  • getCacheDelay(): number | undefined - Cache delay in seconds.
  • getCrawlDelay(): number | undefined - Crawl delay in seconds.
  • getRequestRates(): string[] - Request rate limitations.

Rule Access

  • getRules(): Rule[] - All rules (allow/disallow/noindex) for this group.
  • addRule(type: string, path: string): void - Add a rule to the group (throws if type or path is missing).
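
As a rough sketch of these group accessors, assume a robots.txt that sets several group-level directives; the sample bot name and directive values are invented, and the expected outputs in the comments follow the documented signatures rather than verified results.

```javascript
const { robotstxt } = require("@playfulsparkle/robotstxt-js");

// Hypothetical content exercising several group-level directives.
const parser = robotstxt(`
User-agent: ExampleBot
Allow: /docs
Disallow: /internal
Crawl-delay: 10
Visit-time: 0600-0845
`);

const group = parser.getGroup("examplebot"); // case-insensitive lookup
if (group) {
    console.log(group.getName());         // "ExampleBot"
    console.log(group.getCrawlDelay());   // 10
    console.log(group.getCacheDelay());   // undefined (no Cache-delay directive in this sample)
    console.log(group.getVisitTime());    // the Visit-time value, e.g. "0600-0845"
    console.log(group.getRequestRates()); // [] (no Request-rate directives in this sample)

    // Inspect the allow/disallow rules, then add one programmatically.
    group.getRules().forEach(rule => console.log(`${rule.type}: ${rule.path}`));
    group.addRule("disallow", "/tmp"); // throws if type or path is missing
}
```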

Specification Support

Full Support

  • User-agent groups and inheritance
  • Allow/Disallow directives
  • Wildcard pattern matching (*)
  • End-of-path matching ($)
  • Crawl-delay directives
  • Sitemap discovery
  • Case-insensitive matching
  • Default user-agent (*) handling
  • Multiple user-agent declarations
  • Rule precedence by specificity
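
For instance, wildcard (*) and end-of-path ($) matching together with precedence by specificity can be exercised as below. The sample rules and bot name are made up, and the commented results reflect the standard robots.txt matching semantics listed above, not verified library output.

```javascript
const { robotstxt } = require("@playfulsparkle/robotstxt-js");

// Hypothetical rules combining wildcard and end-of-path patterns.
const parser = robotstxt(`
User-agent: *
Disallow: /*.pdf$
Disallow: /drafts/*
Allow: /drafts/public
`);

console.log(parser.isDisallowed("/reports/annual.pdf", "AnyBot"));  // true (path ends in .pdf, matches /*.pdf$)
console.log(parser.isAllowed("/reports/annual.pdf.bak", "AnyBot")); // true (the $ anchor stops /*.pdf$ from matching)
console.log(parser.isDisallowed("/drafts/wip", "AnyBot"));          // true (matches /drafts/*)
console.log(parser.isAllowed("/drafts/public", "AnyBot"));          // true (the longer, more specific Allow rule is expected to win)
```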

Support

Node.js

robotstxt.js runs on Node.js 6.x and later, including all actively maintained Node.js versions.

Browser Support

This library is written using modern JavaScript ES2015 (ES6) features. It is expected to work in the following browser versions and later:

| Browser | Minimum Supported Version |
|--------------------------|---------------------------|
| Desktop Browsers | |
| Chrome | 49 |
| Edge | 13 |
| Firefox | 45 |
| Opera | 36 |
| Safari | 14.1 |
| Mobile Browsers | |
| Chrome Android | 49 |
| Firefox for Android | 45 |
| Opera Android | 36 |
| Safari on iOS | 14.5 |
| Samsung Internet | 5.0 |
| WebView Android | 49 |
| WebView on iOS | 14.5 |
| Other | |
| Node.js | 6.13.0 |

Specifications

License

robotstxt.js is licensed under the terms of the BSD 3-Clause License.