Crawler Detector

⚡ High-speed crawler detection library for JavaScript using optimized local databases. Detects web crawlers and bots using IP addresses and user-agent patterns with microsecond-level performance.

Features

  • 🚀 Blazing Fast - Optimized detection with early-exit logic (< 0.02ms per check)
  • 🌐 IPv4 & IPv6 Support - Full support for both IPv4 and IPv6 addresses using BigInt
  • 📦 Zero Runtime Dependencies - No external API calls, all data is local
  • 🎯 Dual Detection - Identifies crawlers by IP address and user-agent patterns
  • 🔄 Auto-Updated Patterns - Easy rebuild from latest crawler sources
  • 📊 Binary Search - Efficient CIDR range lookup using integer comparisons
  • ✅ Well Tested - Comprehensive test suite with 39 passing tests

Coverage

IP Database:

  • IPv4: 20.7 million addresses (11,057 exact IPs + 749 CIDR ranges)
  • IPv6: 2.6 sextillion addresses (475 exact IPs + 777 CIDR blocks)
  • Sources: Googlebot (official API), Yandex, Meta/Facebook, TikTok, plus 27 bot sources from the GoodBots repository

User-Agent Database:

  • Patterns sourced from monperrus/crawler-user-agents
  • Split into substring patterns (fast path) and regex patterns (slower path)

Installation

npm install @k0nf/crawler-detector

Usage

Basic Detection

const { isCrawler } = require('@k0nf/crawler-detector');

// Check both IP and user-agent
const isBot = isCrawler(
  '66.249.64.1',
  'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
);
console.log(isBot); // true

IP-Only Detection

const { isCrawlerByIP } = require('@k0nf/crawler-detector');

const isFromCrawlerIP = isCrawlerByIP('66.249.64.1');
console.log(isFromCrawlerIP); // true (Googlebot IP range)

User-Agent-Only Detection

const { isCrawlerByUA } = require('@k0nf/crawler-detector');

const hasCrawlerUA = isCrawlerByUA('Mozilla/5.0 (compatible; Googlebot/2.1)');
console.log(hasCrawlerUA); // true

IPv6 Support

const { isCrawlerByIP } = require('@k0nf/crawler-detector');

// Detect Googlebot IPv6
const isGooglebot = isCrawlerByIP('2001:4860:4801:10::1');
console.log(isGooglebot); // true

// Detect Yandex IPv6
const isYandex = isCrawlerByIP('2a02:6b8::1');
console.log(isYandex); // true

How It Works

The library uses two optimized local databases:

1. User-Agent Patterns

Patterns are sourced from monperrus/crawler-user-agents (raw JSON).

During build time, patterns are:

  • Downloaded from the GitHub repository
  • Classified into substrings (fast path) or regex patterns (slower path)
  • Normalized and lowercased for case-insensitive matching
  • Deduplicated and optimized

Detection order (fastest first, early-exit; a sketch follows this list):

  1. Substring matching - simple .includes() checks
  2. Regex pattern matching - pre-compiled patterns
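
As a rough illustration (not the library's actual source), the early-exit order could look like the following, where substrings and regexes are hypothetical stand-ins for the built pattern database:

function matchesCrawlerUA(userAgent, substrings, regexes) {
  // Normalize once; patterns are lowercased at build time
  const ua = String(userAgent).toLowerCase();

  // 1. Fast path: simple substring checks
  for (const s of substrings) {
    if (ua.includes(s)) return true;
  }

  // 2. Slower path: pre-compiled regex patterns
  for (const re of regexes) {
    if (re.test(ua)) return true;
  }

  return false;
}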

2. IP Database

IP ranges are manually curated from known crawler sources and stored as:

  • Exact IPs - Direct Set lookup (O(1))
  • CIDR ranges - Converted to integer pairs and sorted for binary search (O(log n))

Detection order (fastest first, early-exit; see the sketch below):

  1. Exact IP match
  2. CIDR range binary search
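
A minimal IPv4 sketch of that two-step lookup, assuming exactIps is a Set of IP strings and ranges is an array of [start, end] integer pairs sorted by start (both names are hypothetical):

function ipv4ToInt(ip) {
  // Convert a dotted-quad IPv4 address to a 32-bit integer
  return ip.split('.').reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

function isCrawlerIPv4(ip, exactIps, ranges) {
  // 1. Exact IP match - direct Set lookup, O(1)
  if (exactIps.has(ip)) return true;

  // 2. CIDR range match - binary search over sorted,
  //    non-overlapping integer ranges, O(log n)
  const n = ipv4ToInt(ip);
  let lo = 0;
  let hi = ranges.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (n < ranges[mid][0]) hi = mid - 1;
    else if (n > ranges[mid][1]) lo = mid + 1;
    else return true;
  }
  return false;
}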

Performance

Expected performance characteristics:

  • User-Agent Detection: < 0.1ms average (substring match)
  • IP Detection: < 0.01ms average (exact match) or < 0.1ms (CIDR range)
  • Combined Detection: < 1ms total

Actual test results (10,000 iterations):

Ran 10,000 detections in 110ms
Average: 0.011ms per detection
✅ Well under 1ms target
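
Numbers like these can be reproduced with a simple loop (a rough sketch; results will vary by machine):

const { isCrawler } = require('@k0nf/crawler-detector');

const ITERATIONS = 10000;
const start = process.hrtime.bigint();
for (let i = 0; i < ITERATIONS; i++) {
  isCrawler('66.249.64.1', 'Mozilla/5.0 (compatible; Googlebot/2.1)');
}
const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
console.log(`Ran ${ITERATIONS} detections in ${Math.round(elapsedMs)}ms`);
console.log(`Average: ${(elapsedMs / ITERATIONS).toFixed(3)}ms per detection`);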

Updating Databases

Rebuild All Databases

npm run build

This will:

  1. Fetch the latest user-agent patterns from GitHub
  2. Build the optimized user-agent database
  3. Build the IP database from seed file
  4. Validate all generated data

Update Only User-Agent Patterns

npm run build:ua

Update Only IP Database

npm run build:ip

Validate Databases

npm run validate

Adding Custom Crawler IPs

Edit scripts/seed-crawler-ips.txt and add IPs or CIDR ranges (one per line):

# Custom crawler IPs
203.0.113.5
198.51.100.0/24

Then rebuild:

npm run build:ip

Testing

Run the test suite:

npm test

Tests cover:

  • ✅ Known crawler detection (Googlebot, Bingbot, etc.)
  • ✅ Regular browser exclusion (Chrome, Firefox, Safari)
  • ✅ Edge cases (null, undefined, empty strings)
  • ✅ IP conversion and binary search logic
  • ✅ Performance benchmarks

Architecture

crawler-detector/
├── index.js                      # Main API entry point
├── lib/
│   ├── ip-detector.js           # IP detection logic
│   └── ua-detector.js           # User-agent detection logic
├── data/
│   ├── crawler-ips.json         # Built IP database
│   └── crawler-ua-patterns.json # Built UA patterns database
├── scripts/
│   ├── fetch-ua-patterns.js     # Fetch patterns from GitHub
│   ├── build-ip-database.js     # Build IP database
│   ├── validate-data.js         # Validate databases
│   ├── build-all.js             # Build all databases
│   └── seed-crawler-ips.txt     # Source IP list
└── test/
    └── test.js                   # Test suite

Detection Flow

isCrawler(ip, userAgent)
    │
    ├─> Check IP (if provided)
    │   ├─> Exact match? → return true
    │   └─> In CIDR range? → return true
    │
    ├─> Check User-Agent (if provided)
    │   ├─> Substring match? → return true
    │   └─> Regex match? → return true
    │
    └─> Otherwise → return false
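
In plain code, the flow is roughly equivalent to this sketch (not the actual implementation):

const { isCrawlerByIP, isCrawlerByUA } = require('@k0nf/crawler-detector');

// Equivalent sketch of isCrawler(ip, userAgent)
function isCrawlerSketch(ip, userAgent) {
  if (ip && isCrawlerByIP(ip)) return true;               // exact IP, then CIDR ranges
  if (userAgent && isCrawlerByUA(userAgent)) return true; // substrings, then regexes
  return false;
}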

Data Sources

  • User-Agent Patterns: monperrus/crawler-user-agents
  • IP Ranges (IPv4 & IPv6):
    • Googlebot: Official IPv4/IPv6 ranges from Google API
      • 142 IPv6 /64 blocks automatically fetched
    • 27 Bot Sources: GoodBots Repository
      • Includes Bingbot, Yandex, Facebook, Twitter, Telegram, Ahrefs, and more
    • Manual Additions:
      • Yandex: 15 IPv4 ranges + 2a02:6b8::/29 IPv6
      • Meta/Facebook: 7 IPv4 ranges (AS32934)
      • TikTok: 67 IPv4 ranges (AS138699, AS137775, AS396986)

API Reference

isCrawler(ip, userAgent)

Detects if either the IP or user-agent belongs to a known crawler.

Parameters:

  • ip (string|null) - IP address to check
  • userAgent (string|null) - User-Agent string to check

Returns: boolean - true if crawler detected
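
Since each check only runs when its argument is provided, either argument can be passed as null and the other is still checked. For example, based on the detection flow above:

const { isCrawler } = require('@k0nf/crawler-detector');

isCrawler(null, 'Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)'); // true (user-agent match)
isCrawler('66.249.64.1', null); // true (IP match)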

isCrawlerByIP(ip)

Detects if the IP address belongs to a known crawler.

Parameters:

  • ip (string) - IP address to check

Returns: boolean - true if crawler IP detected

isCrawlerByUA(userAgent)

Detects if the user-agent belongs to a known crawler.

Parameters:

  • userAgent (string) - User-Agent string to check

Returns: boolean - true if crawler user-agent detected

License

MIT

Contributing

Contributions welcome! Please:

  1. Add tests for new functionality
  2. Update documentation
  3. Run npm test before submitting

Credits

Special thanks to monperrus/crawler-user-agents for maintaining the comprehensive crawler user-agent database.