@k0nf/crawler-detector

v1.3.0

High-speed crawler detection library using optimized local databases for IP and user-agent matching

Crawler Detector

⚡ High-speed crawler detection library for JavaScript using optimized local databases. Detects web crawlers and bots using IP addresses and user-agent patterns with microsecond-level performance.

Features

  • 🚀 Blazing Fast - Optimized detection with early-exit logic (< 0.02ms per check)
  • 🌐 IPv4 & IPv6 Support - Full support for both IPv4 and IPv6 addresses using BigInt
  • 📦 Zero Runtime Dependencies - No external API calls, all data is local
  • 🎯 Dual Detection - Identifies crawlers by IP address and user-agent patterns
  • 🔄 Auto-Updated Patterns - Easy rebuild from latest crawler sources
  • 📊 Binary Search - Efficient CIDR range lookup using integer comparisons
  • 🔌 Framework Middleware - Built-in support for Express, Next.js, Remix, Koa, Fastify, Hapi
  • Well Tested - Comprehensive test suite with 94 passing tests
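
The IPv6 support above works because a 128-bit address fits in a BigInt, which turns range checks into plain integer comparisons. Here is a minimal sketch of such a conversion (illustrative only; the library's internal helper names may differ):

```javascript
// Sketch: map an IPv6 address to a single 128-bit BigInt so that
// range membership becomes an integer comparison.
function ipv6ToBigInt(ip) {
  // Expand the "::" shorthand into eight 16-bit groups
  const [head, tail = ''] = ip.split('::');
  const headParts = head ? head.split(':') : [];
  const tailParts = tail ? tail.split(':') : [];
  const missing = 8 - headParts.length - tailParts.length;
  const groups = [...headParts, ...Array(missing).fill('0'), ...tailParts];

  // Fold the groups into one BigInt, 16 bits at a time
  return groups.reduce((acc, g) => (acc << 16n) | BigInt(parseInt(g, 16)), 0n);
}

console.log(ipv6ToBigInt('::1') === 1n); // true
```

With addresses represented this way, a CIDR block is just a `[start, end]` BigInt pair.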

Coverage

IP Database:

  • IPv4: 20.7 million addresses (11,057 exact IPs + 749 CIDR ranges)
  • IPv6: 2.6 sextillion addresses (475 exact IPs + 777 CIDR blocks)
  • Sources: Googlebot (official API), Yandex, Meta/Facebook, TikTok, 27 sources from GoodBots

User-Agent Database:

Installation

npm install @k0nf/crawler-detector

Usage

Basic Detection

const { isCrawler } = require('@k0nf/crawler-detector');

// Check both IP and user-agent
const isBot = isCrawler(
  '66.249.64.1',
  'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
);
console.log(isBot); // true

IP-Only Detection

const { isCrawlerByIP } = require('@k0nf/crawler-detector');

const isFromCrawlerIP = isCrawlerByIP('66.249.64.1');
console.log(isFromCrawlerIP); // true (Googlebot IP range)

User-Agent-Only Detection

const { isCrawlerByUA } = require('@k0nf/crawler-detector');

const hasCrawlerUA = isCrawlerByUA('Mozilla/5.0 (compatible; Googlebot/2.1)');
console.log(hasCrawlerUA); // true

IPv6 Support

const { isCrawlerByIP } = require('@k0nf/crawler-detector');

// Detect Googlebot IPv6
const isGooglebot = isCrawlerByIP('2001:4860:4801:10::1');
console.log(isGooglebot); // true

// Detect Yandex IPv6
const isYandex = isCrawlerByIP('2a02:6b8::1');
console.log(isYandex); // true

Framework Middleware

Built-in middleware for popular Node.js frameworks. All middleware support these options:

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| block | boolean | false | Block crawlers with an HTTP error |
| blockStatusCode | number | 403 | Status code for blocked requests |
| blockMessage | string | 'Forbidden' | Response body for blocked requests |
| onCrawlerDetected | function | null | Custom callback when a crawler is detected |
| ipHeaders | string[] | ['x-forwarded-for', ...] | Headers to check for the client IP |
| ipOnly | boolean | false | Only check IP, skip user-agent |
| uaOnly | boolean | false | Only check user-agent, skip IP |

Express / Connect

const express = require('express');
const crawlerDetector = require('@k0nf/crawler-detector/middleware/express');

const app = express();

// Tag all requests with req.isCrawler (boolean)
app.use(crawlerDetector());

// Block crawlers from specific routes
app.use('/api', crawlerDetector({ block: true }));

// Custom handling
app.use(crawlerDetector({
  onCrawlerDetected: (req, res, next) => {
    console.log(`Bot detected: ${req.headers['user-agent']}`);
    next(); // continue processing
  }
}));

app.get('/', (req, res) => {
  if (req.isCrawler) {
    res.send('Hello bot!');
  } else {
    res.send('Hello human!');
  }
});

Next.js

API Routes (Pages Router):

// pages/api/hello.js
const { withCrawlerDetection } = require('@k0nf/crawler-detector/middleware/nextjs');

function handler(req, res) {
  res.json({ isCrawler: req.isCrawler });
}

module.exports = withCrawlerDetection(handler);

// Alternatively, block crawlers outright:
// module.exports = withCrawlerDetection(handler, { block: true });

getServerSideProps:

// pages/index.js
const { isCrawlerRequest } = require('@k0nf/crawler-detector/middleware/nextjs');

export async function getServerSideProps(context) {
  const isBot = isCrawlerRequest(context);
  return { props: { isBot } };
}

App Router Route Handlers:

// app/api/hello/route.js
const { isCrawlerAppRoute } = require('@k0nf/crawler-detector/middleware/nextjs');

export const runtime = 'nodejs'; // Required - not compatible with Edge runtime

export async function GET(request) {
  const isBot = isCrawlerAppRoute(request);
  return Response.json({ isBot });
}

Note: Requires Node.js runtime. Not compatible with Edge runtime since the detection databases are loaded from the filesystem.

Remix / React Router v7

In loaders and actions:

// app/routes/index.tsx
import { json } from '@remix-run/node';
import { isCrawlerRequest } from '@k0nf/crawler-detector/middleware/remix';

export async function loader({ request }) {
  const isBot = isCrawlerRequest(request);

  if (isBot) {
    return json({ content: 'SEO-optimized content', isBot: true });
  }

  return json({ content: 'Full interactive content', isBot: false });
}

Block crawlers with middleware:

import { createCrawlerMiddleware } from '@k0nf/crawler-detector/middleware/remix';

const crawlerMiddleware = createCrawlerMiddleware({ block: true });

export function middleware(request) {
  const blocked = crawlerMiddleware(request);
  if (blocked) return blocked;
}

Koa

const Koa = require('koa');
const crawlerDetector = require('@k0nf/crawler-detector/middleware/koa');

const app = new Koa();

// Tag all requests - sets ctx.state.isCrawler
app.use(crawlerDetector());

// Block crawlers
app.use(crawlerDetector({ block: true }));

app.use(async (ctx) => {
  if (ctx.state.isCrawler) {
    ctx.body = 'Hello bot!';
  } else {
    ctx.body = 'Hello human!';
  }
});

Fastify

const fastify = require('fastify')();
const crawlerDetectorPlugin = require('@k0nf/crawler-detector/middleware/fastify');

// Register plugin - decorates request.isCrawler
fastify.register(crawlerDetectorPlugin);

// With options
fastify.register(crawlerDetectorPlugin, { block: true });

fastify.get('/', (request, reply) => {
  if (request.isCrawler) {
    reply.send('Hello bot!');
  } else {
    reply.send('Hello human!');
  }
});

Hapi

const Hapi = require('@hapi/hapi');
const crawlerDetectorPlugin = require('@k0nf/crawler-detector/middleware/hapi');

const init = async () => {
  const server = Hapi.server({ port: 3000 });

  // register() returns a promise, so await it inside an async function
  await server.register({
    plugin: crawlerDetectorPlugin,
    options: { block: false }
  });

  server.route({
    method: 'GET',
    path: '/',
    handler: (request, h) => {
      if (request.plugins.crawlerDetector.isCrawler) {
        return 'Hello bot!';
      }
      return 'Hello human!';
    }
  });

  await server.start();
};

init();

Raw Node.js HTTP

const http = require('http');
const withCrawlerDetection = require('@k0nf/crawler-detector/middleware/http');

const server = http.createServer(
  withCrawlerDetection((req, res) => {
    if (req.isCrawler) {
      res.end('Hello bot!');
    } else {
      res.end('Hello human!');
    }
  })
);

// Block crawlers (assumes `handler` is a request handler function)
const blockingServer = http.createServer(
  withCrawlerDetection(handler, { block: true })
);

How It Works

The library uses two optimized local databases:

1. User-Agent Patterns

Patterns are sourced from monperrus/crawler-user-agents (raw JSON).

During build time, patterns are:

  • Downloaded from the GitHub repository
  • Classified into substrings (fast path) or regex patterns (slower path)
  • Normalized and lowercased for case-insensitive matching
  • Deduplicated and optimized

Detection order (fastest first, early-exit):

  1. Substring matching - simple .includes() checks
  2. Regex pattern matching - pre-compiled patterns

2. IP Database

IP ranges are manually curated from known crawler sources and stored as:

  • Exact IPs - Direct Set lookup (O(1))
  • CIDR ranges - Converted to integer pairs and sorted for binary search (O(log n))

Detection order (fastest first, early-exit):

  1. Exact IP match
  2. CIDR range binary search
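
The IPv4 side of this lookup can be sketched like so, with toy data (the helper names here are illustrative, not the package's actual internals):

```javascript
// Convert dotted-quad IPv4 to a 32-bit integer
function ipv4ToInt(ip) {
  return ip.split('.').reduce((acc, o) => acc * 256 + Number(o), 0);
}

// Toy database: one exact IP plus one CIDR range (66.249.64.0/20)
const exactIPs = new Set([ipv4ToInt('203.0.113.5')]);
const ranges = [[ipv4ToInt('66.249.64.0'), ipv4ToInt('66.249.79.255')]]
  .sort((a, b) => a[0] - b[0]); // sorted by range start for binary search

function isCrawlerIPv4(ip) {
  const n = ipv4ToInt(ip);
  if (exactIPs.has(n)) return true; // O(1) exact match, early exit

  // O(log n) binary search over sorted, non-overlapping ranges
  let lo = 0, hi = ranges.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const [start, end] = ranges[mid];
    if (n < start) hi = mid - 1;
    else if (n > end) lo = mid + 1;
    else return true; // n falls inside this range
  }
  return false;
}

console.log(isCrawlerIPv4('66.249.64.1')); // true (inside the range)
console.log(isCrawlerIPv4('8.8.8.8')); // false
```

The same shape works for IPv6 with BigInt range endpoints instead of 32-bit integers.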

Performance

Expected performance characteristics:

  • User-Agent Detection: < 0.1ms average (substring match)
  • IP Detection: < 0.01ms average (exact match) or < 0.1ms (CIDR range)
  • Combined Detection: < 1ms total

Actual test results (10,000 iterations):

Ran 10,000 detections in 110ms
Average: 0.011ms per detection
✅ Well under 1ms target
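
A measurement of this shape can be reproduced with a small harness like the one below (the check here is a stand-in; a real run would call the package's isCrawler):

```javascript
// Toy benchmark harness mirroring the numbers above.
const check = (ua) => ua.toLowerCase().includes('googlebot'); // stand-in detector

const ua = 'Mozilla/5.0 (compatible; Googlebot/2.1)';
const iterations = 10000;

const start = process.hrtime.bigint();
for (let i = 0; i < iterations; i++) check(ua);
const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;

console.log(`Ran ${iterations} detections in ${elapsedMs.toFixed(1)}ms`);
console.log(`Average: ${(elapsedMs / iterations).toFixed(4)}ms per detection`);
```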

Updating Databases

Rebuild All Databases

npm run build

This will:

  1. Fetch the latest user-agent patterns from GitHub
  2. Build the optimized user-agent database
  3. Build the IP database from seed file
  4. Validate all generated data

Update Only User-Agent Patterns

npm run build:ua

Update Only IP Database

npm run build:ip

Validate Databases

npm run validate

Adding Custom Crawler IPs

Edit scripts/seed-crawler-ips.txt and add IPs or CIDR ranges (one per line):

# Custom crawler IPs
203.0.113.5
198.51.100.0/24

Then rebuild:

npm run build:ip

Testing

Run the test suite:

npm test

Tests cover:

  • ✅ Known crawler detection (Googlebot, Bingbot, etc.)
  • ✅ Regular browser exclusion (Chrome, Firefox, Safari)
  • ✅ Edge cases (null, undefined, empty strings)
  • ✅ IP conversion and binary search logic
  • ✅ Performance benchmarks

Architecture

crawler-detector/
├── index.js                      # Main API entry point
├── lib/
│   ├── ip-detector.js           # IP detection logic
│   └── ua-detector.js           # User-agent detection logic
├── data/
│   ├── crawler-ips.json         # Built IP database
│   └── crawler-ua-patterns.json # Built UA patterns database
├── scripts/
│   ├── fetch-ua-patterns.js     # Fetch patterns from GitHub
│   ├── build-ip-database.js     # Build IP database
│   ├── validate-data.js         # Validate databases
│   ├── build-all.js             # Build all databases
│   └── seed-crawler-ips.txt     # Source IP list
└── test/
    └── test.js                   # Test suite

Detection Flow

isCrawler(ip, userAgent)
    │
    ├─> Check IP (if provided)
    │   ├─> Exact match? → return true
    │   └─> In CIDR range? → return true
    │
    └─> Check User-Agent (if provided)
        ├─> Substring match? → return true
        └─> Regex match? → return true
        
return false
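
The flow above composes the two checks so that either signal alone classifies the request. A sketch (the two stage checks are stand-ins here; in the real package they would be the exported isCrawlerByIP and isCrawlerByUA):

```javascript
const checkIP = (ip) => ip === '66.249.64.1'; // stand-in for isCrawlerByIP
const checkUA = (ua) => /googlebot/i.test(ua); // stand-in for isCrawlerByUA

function isCrawler(ip, userAgent) {
  if (ip && checkIP(ip)) return true; // IP hit short-circuits the UA check
  if (userAgent && checkUA(userAgent)) return true; // then user-agent
  return false; // neither signal matched
}

console.log(isCrawler('66.249.64.1', null)); // true (IP alone is enough)
console.log(isCrawler(null, 'Googlebot/2.1')); // true (UA alone is enough)
console.log(isCrawler('10.0.0.1', 'Chrome/120')); // false
```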

Data Sources

  • User-Agent Patterns: monperrus/crawler-user-agents
  • IP Ranges (IPv4 & IPv6):
    • Googlebot: Official IPv4/IPv6 ranges from Google API
      • 142 IPv6 /64 blocks automatically fetched
    • 27 Bot Sources: GoodBots Repository
      • Includes Bingbot, Yandex, Facebook, Twitter, Telegram, Ahrefs, and more
    • Manual Additions:
      • Yandex: 15 IPv4 ranges + 2a02:6b8::/29 IPv6
      • Meta/Facebook: 7 IPv4 ranges (AS32934)
      • TikTok: 67 IPv4 ranges (AS138699, AS137775, AS396986)

API Reference

isCrawler(ip, userAgent)

Detects if either the IP or user-agent belongs to a known crawler.

Parameters:

  • ip (string|null) - IP address to check
  • userAgent (string|null) - User-Agent string to check

Returns: boolean - true if crawler detected

isCrawlerByIP(ip)

Detects if the IP address belongs to a known crawler.

Parameters:

  • ip (string) - IP address to check

Returns: boolean - true if crawler IP detected

isCrawlerByUA(userAgent)

Detects if the user-agent belongs to a known crawler.

Parameters:

  • userAgent (string) - User-Agent string to check

Returns: boolean - true if crawler user-agent detected

License

MIT

Contributing

Contributions welcome! Please:

  1. Add tests for new functionality
  2. Update documentation
  3. Run npm test before submitting

Credits

Special thanks to monperrus/crawler-user-agents for maintaining the comprehensive crawler user-agent database.