@chambrin/ai-crawler-guard

v0.1.0

Published

2 months ago

Detect and control AI crawlers (GPTBot, ClaudeBot, PerplexityBot) with configurable actions


Detect and control AI crawlers (GPTBot, ClaudeBot, PerplexityBot, etc.) with configurable server-side actions.

A lightweight, framework-agnostic TypeScript library to detect AI crawlers and execute customizable actions like blocking images, redirecting, or logging visits. Works seamlessly with Next.js, Express, Hono, Nuxt, and SvelteKit.

Features

Server-side only - No client-side JavaScript needed
Framework-agnostic - Works with any Node.js framework
Ready-to-use middlewares for Next.js, Express, Hono, and H3 (Nuxt/SvelteKit)
Configurable actions - Block images, redirect, log, or create custom actions
robots.txt generator - Generate robots.txt rules automatically
TypeScript - Fully typed with strict types
Lightweight - Zero dependencies for core functionality
Extensible - Add custom bot detection patterns

Installation

npm install @chambrin/ai-crawler-guard

Quick Start

Next.js (App Router)

// middleware.ts
import { nextMiddleware } from '@chambrin/ai-crawler-guard';

export const config = {
  matcher: ['/((?!api|_next/static|_next/image|favicon.ico).*)'],
};

export default nextMiddleware({
  blockImagesFor: ['gptbot', 'claudebot', 'perplexitybot'],
  redirectUrls: {
    gptbot: '/blocked',
  },
  logLevel: 'info',
});

Express

import express from 'express';
import { expressMiddleware } from '@chambrin/ai-crawler-guard/core';

const app = express();

app.use(expressMiddleware({
  blockImagesFor: ['gptbot', 'claudebot'],
  logLevel: 'warn',
}));

Hono

import { Hono } from 'hono';
import { honoMiddleware } from '@chambrin/ai-crawler-guard/core';

const app = new Hono();

app.use('*', honoMiddleware({
  blockImagesFor: ['gptbot', 'claudebot'],
  redirectUrls: {
    perplexitybot: '/no-ai',
  },
}));

Nuxt / SvelteKit (H3)

// server/middleware/ai-guard.ts
import { h3Middleware } from '@chambrin/ai-crawler-guard/core';

export default h3Middleware({
  blockImagesFor: ['gptbot', 'claudebot'],
  logLevel: 'info',
});

API Reference

Detection

`detectAiCrawler(request: Request): AiCrawlerMatch`

Detect AI crawler from a Web Request object.

import { detectAiCrawler } from '@chambrin/ai-crawler-guard/core';

const match = detectAiCrawler(request);

if (match.type === 'gptbot') {
  console.log('GPTBot detected!');
}

`detectAiCrawler(userAgent: string, ip?: string): AiCrawlerMatch`

Detect AI crawler from user agent string.

const match = detectAiCrawler('Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)');
// match.type === 'gptbot'

`isAiCrawler(requestOrUserAgent: Request | string, type?: AiCrawlerType): boolean`

Quick check if request is from an AI crawler.

if (isAiCrawler(request)) {
  // Any AI crawler detected
}

if (isAiCrawler(request, 'gptbot')) {
  // Specifically GPTBot
}

Types

type AiCrawlerType =
  | 'gptbot'
  | 'claudebot'
  | 'perplexitybot'
  | 'anthropic-ai'
  | 'google-extended'
  | 'bytespider'
  | 'ccbot'
  | 'custom'
  | null;

interface AiCrawlerMatch {
  type: AiCrawlerType;
  confidence: number; // 0-1
  userAgent: string;
  ip?: string;
  isKnown: boolean;
}

Actions

Actions are composable functions that execute when an AI crawler is detected.

`blockImages()`

Block all image requests with 403 Forbidden.

import { AiCrawlerGuard, blockImages } from '@chambrin/ai-crawler-guard/core';

const guard = new AiCrawlerGuard()
  .addAction(blockImages());
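
How a request gets classified as an image request is not described here. One plausible approach, sketched below as an assumption rather than the library's actual behavior, is to look at the file extension of the URL path. The `IMAGE_EXTENSIONS` list and the `looksLikeImageRequest` name are hypothetical.

```typescript
// Hypothetical extension list; a real implementation might also inspect the Accept header.
const IMAGE_EXTENSIONS = [".png", ".jpg", ".jpeg", ".gif", ".webp", ".avif", ".svg", ".ico"];

// Classify by path only, so a query string or fragment does not affect the result.
function looksLikeImageRequest(url: string): boolean {
  const end = url.search(/[?#]/);
  const path = (end === -1 ? url : url.slice(0, end)).toLowerCase();
  return IMAGE_EXTENSIONS.some((ext) => path.endsWith(ext));
}

const isImage = looksLikeImageRequest("/img/logo.PNG?v=2"); // true
const isPage = looksLikeImageRequest("/about");             // false
```

Extension matching is cheap but misses images served from extensionless routes, which is why checking headers as well may be worth considering.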

`redirect(url: string, statusCode?: number)`

Redirect AI crawlers to a specific URL.

guard.addAction(redirect('/no-ai', 302));

`log(level?: 'info' | 'warn' | 'error')`

Log AI crawler visits.

guard.addAction(log('info'));

`textOnly()`

Block all non-text content (images, CSS, JS, fonts, etc.).

guard.addAction(textOnly());

Guard

The AiCrawlerGuard class manages a pipeline of actions.

import { AiCrawlerGuard, detectAiCrawler, blockImages, redirect, log } from '@chambrin/ai-crawler-guard/core';

const guard = new AiCrawlerGuard()
  .addAction(log('info'))
  .addAction(blockImages())
  .addAction(redirect('/blocked'));

const match = detectAiCrawler(request);
if (match.type) {
  const response = guard.execute(match, request);
  if (response) {
    return response; // Return the response from the first action that returns one
  }
}
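
The comment in the example above says the first action that returns a response wins. A minimal sketch of a pipeline with that behavior follows. It is not the library's source: `MiniGuard` and the `Sketch*` types are hypothetical stand-ins, and a plain object replaces the Web `Response` to keep the example self-contained.

```typescript
// Simplified stand-ins for the library's types (assumptions for this sketch).
interface SketchMatch { type: string | null; }
interface SketchResult { status: number; body: string; }
interface SketchAction {
  execute(match: SketchMatch): SketchResult | void;
}

class MiniGuard {
  private readonly actions: SketchAction[] = [];

  // Returning `this` is what allows the chained .addAction(...) style.
  addAction(action: SketchAction): this {
    this.actions.push(action);
    return this;
  }

  // Run actions in insertion order and stop at the first one that returns a result.
  execute(match: SketchMatch): SketchResult | undefined {
    for (const action of this.actions) {
      const result = action.execute(match);
      if (result) return result;
    }
    return undefined;
  }
}

const seen: string[] = [];
const sketchGuard = new MiniGuard()
  .addAction({ execute: (m) => { seen.push(`log:${m.type}`); } })   // side effect only
  .addAction({ execute: () => ({ status: 403, body: "blocked" }) }) // ends the pipeline
  .addAction({ execute: () => { seen.push("unreachable"); } });     // never runs

const sketchResult = sketchGuard.execute({ type: "gptbot" });
```

Under this design, ordering matters: logging actions should come before any action that returns a response, otherwise they never run.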

Configuration

interface AiCrawlerConfig {
  knownBots: Record<string, AiCrawlerType>;
  blockImagesFor: AiCrawlerType[];
  redirectUrls: Partial<Record<AiCrawlerType, string>>;
  logLevel: 'none' | 'info' | 'warn';
  enableIpTracking?: boolean;
}

Robots.txt Generation

`generateRobotsTxt(config: Partial<AiCrawlerConfig>): string`

Generate robots.txt content based on configuration.

import { generateRobotsTxt } from '@chambrin/ai-crawler-guard/robots-txt';

const robotsTxt = generateRobotsTxt({
  blockImagesFor: ['gptbot', 'claudebot'],
});

// In your route handler:
// app.get('/robots.txt', (req, res) => {
//   res.setHeader('Content-Type', 'text/plain');
//   res.send(robotsTxt);
// });
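
For orientation, robots.txt rules that deny specific crawlers are groups made of a `User-agent` line followed by `Disallow` directives. The sketch below builds such text by hand. It does not reproduce the output of `generateRobotsTxt`, whose exact format is not shown here; the `sketchRobotsTxt` helper and the user-agent tokens passed to it are assumptions.

```typescript
// Hypothetical helper: one rule group per crawler token, each denying the given paths.
function sketchRobotsTxt(userAgents: string[], disallow: string[] = ["/"]): string {
  const groups = userAgents.map((agent) => {
    const rules = disallow.map((path) => `Disallow: ${path}`);
    return [`User-agent: ${agent}`, ...rules].join("\n");
  });
  // Blank line between groups, trailing newline at the end of the file.
  return groups.join("\n\n") + "\n";
}

const sketchRobots = sketchRobotsTxt(["GPTBot", "ClaudeBot"]);
// sketchRobots is:
// User-agent: GPTBot
// Disallow: /
//
// User-agent: ClaudeBot
// Disallow: /
```

Keep in mind that robots.txt is advisory: it only affects crawlers that choose to honor it, which is why the server-side actions exist alongside it.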

Presets

import {
  defaultAiBotsRobotsTxt,
  blockImagesPreset,
  blockGPTBotOnly
} from '@chambrin/ai-crawler-guard/robots-txt';

// Block all AI crawlers completely
console.log(defaultAiBotsRobotsTxt);

// Block only images
console.log(blockImagesPreset);

// Block only GPTBot
console.log(blockGPTBotOnly);

Detected Bots

The library detects the following AI crawlers by default:

| Bot Type | User-Agent Patterns |
|----------|-------------------|
| gptbot | GPTBot, ChatGPT-User |
| claudebot | ClaudeBot, Claude-Web |
| anthropic-ai | anthropic-ai |
| perplexitybot | PerplexityBot |
| google-extended | Google-Extended |
| bytespider | Bytespider (ByteDance) |
| ccbot | CCBot (Common Crawl) |

Additional bots detected with lower confidence:

Cohere-AI
Omgilibot
Diffbot
FacebookBot
Various AI scrapers
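
Detection of this kind comes down to searching the User-Agent header for known substrings. The following is a minimal sketch of that idea, not the library's implementation: the `PATTERNS` table, the `sketchDetect` name, and the confidence values (1 for a listed pattern, 0 for no match) are assumptions made for illustration.

```typescript
// Hypothetical pattern table: lowercase User-Agent substring -> bot type.
// The entries mirror the default bot table; the library's real data may differ.
const PATTERNS: ReadonlyArray<readonly [string, string]> = [
  ["gptbot", "gptbot"],
  ["chatgpt-user", "gptbot"],
  ["claudebot", "claudebot"],
  ["claude-web", "claudebot"],
  ["anthropic-ai", "anthropic-ai"],
  ["perplexitybot", "perplexitybot"],
  ["google-extended", "google-extended"],
  ["bytespider", "bytespider"],
  ["ccbot", "ccbot"],
];

interface SketchDetection {
  type: string | null;
  confidence: number; // assumed scale: 1 = listed pattern, 0 = no match
  userAgent: string;
}

// Case-insensitive substring search; the first matching pattern wins.
function sketchDetect(userAgent: string): SketchDetection {
  const ua = userAgent.toLowerCase();
  for (const [needle, type] of PATTERNS) {
    if (ua.includes(needle)) {
      return { type, confidence: 1, userAgent };
    }
  }
  return { type: null, confidence: 0, userAgent };
}

const detected = sketchDetect("Mozilla/5.0 (compatible; GPTBot/1.0)");
// detected.type === "gptbot"
```

Since any client can send any User-Agent string, a match of this kind identifies crawlers that announce themselves and nothing more.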

Advanced Usage

Custom Actions

Create your own action executor:

import { AiCrawlerGuard, type ActionExecutor, type AiCrawlerMatch } from '@chambrin/ai-crawler-guard/core';

function customBlock(): ActionExecutor {
  return {
    execute(match: AiCrawlerMatch, request?: Request): Response | void {
      if (match.type === 'gptbot') {
        return new Response('GPTBot not allowed', { status: 403 });
      }
    }
  };
}

const guard = new AiCrawlerGuard()
  .addAction(customBlock());

Add Custom Bots

Extend the known bots list:

import { DEFAULT_KNOWN_BOTS } from '@chambrin/ai-crawler-guard/core';

const customConfig = {
  knownBots: {
    ...DEFAULT_KNOWN_BOTS,
    'my-custom-bot': 'custom',
  },
  blockImagesFor: ['custom'],
};

Next.js Custom Middleware

import { createNextMiddleware, blockImages, log } from '@chambrin/ai-crawler-guard/core';

export default createNextMiddleware((guard, config) => {
  guard
    .addAction(log('warn'))
    .addAction(blockImages());
}, {
  logLevel: 'warn',
  blockImagesFor: ['gptbot'],
});

Conditional Actions

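import { AiCrawlerGuard, detectAiCrawler, blockImages, redirect } from '@chambrin/ai-crawler-guard/core';

const guard = new AiCrawlerGuard();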

const match = detectAiCrawler(request);

if (match.type === 'gptbot') {
  guard.addAction(redirect('/gptbot-blocked'));
} else if (match.type === 'claudebot') {
  guard.addAction(blockImages());
}

const response = guard.execute(match, request);
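
The branching above adds actions after detection, so the guard has to be built per request. An alternative, sketched below under the assumption that actions have the `execute(match, request)` shape shown in Custom Actions, is a wrapper that runs an action only for selected crawler types. The `onlyFor` helper and the simplified `Sketch*` types are hypothetical and not part of the library.

```typescript
// Simplified stand-ins for the library's types (assumptions for this sketch).
interface SketchMatch { type: string | null; }
interface SketchAction<R> {
  execute(match: SketchMatch): R | void;
}

// Wrap an action so it only runs when the detected type is listed in `types`.
function onlyFor<R>(types: string[], action: SketchAction<R>): SketchAction<R> {
  return {
    execute(match: SketchMatch): R | void {
      if (match.type !== null && types.indexOf(match.type) !== -1) {
        return action.execute(match);
      }
      // Otherwise return nothing so later actions in a pipeline can still run.
    },
  };
}

// A plain object stands in for a redirect response here.
const redirectGpt = onlyFor(["gptbot"], {
  execute: () => ({ status: 302, location: "/gptbot-blocked" }),
});

const forGpt = redirectGpt.execute({ type: "gptbot" });       // the redirect object
const forClaude = redirectGpt.execute({ type: "claudebot" }); // undefined
```

With wrappers like this, a single guard could be configured once at startup and reused for every request.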

Examples

See the /examples directory for complete working examples:

Next.js 15 App Router
Express server
Hono API
Nuxt 4 application
SvelteKit application

Important Notes

SEO Disclaimer

This library is designed to control AI crawlers for legitimate purposes such as:

Protecting proprietary content from being used in AI training
Reducing server load from AI crawlers
Enforcing terms of service

DO NOT use this library for:

Cloaking content from search engines (violates Google's guidelines)
Serving different content to users vs. crawlers
Any deceptive SEO practices

Always respect search engine guidelines and robots.txt standards.

Server-Side Only

This library works exclusively on the server side. Client-side detection is ineffective against most crawlers because they typically fetch pages without executing JavaScript.

Performance

The library is lightweight and has minimal performance impact:

User agent detection is based on simple string matching
No external API calls
No heavy dependencies

License

MIT

Contributing

Contributions are welcome! Please open an issue or submit a pull request.
