@searchablehq/middleware
v0.4.0
Server-side middleware that captures request data from AI crawlers and human traffic, sending events to the Searchable analytics pipeline.
@searchable/middleware
Installation
npm install @searchable/middleware
Quick Start (Next.js)
// middleware.ts
import { withSearchable } from "@searchable/middleware/nextjs";

export default withSearchable({
  siteToken: "st_your_token_here",
  apiKey: process.env.SEARCHABLE_API_KEY!, // sk_live_*
});
With an existing middleware:
import { withSearchable } from "@searchable/middleware/nextjs";
import { NextResponse } from "next/server";

export default withSearchable(
  {
    siteToken: "st_your_token_here",
    apiKey: process.env.SEARCHABLE_API_KEY!, // sk_live_*
    debug: true, // logs events to console
  },
  async (request) => {
    // Your existing middleware logic
    return NextResponse.next();
  }
);
Configuration
interface SearchableConfig {
  /** Site token (st_* from the Searchable dashboard). Required. */
  siteToken: string;
  /** Workspace API key (sk_live_*), sent as a Bearer token for edge auth. Required. */
  apiKey: string;
  /** Collector endpoint URL. Default: Searchable Worker */
  endpoint?: string;
  /** Anonymize IP addresses (zero last octet). Default: true */
  anonymizeIp?: boolean;
  /** Log events to console. Default: false */
  debug?: boolean;
  /** Skip capturing for certain paths. Return true to skip. */
  ignore?: (path: string) => boolean;
  /** Inject custom properties into the event. */
  custom?: (request: any) => Record<string, string>;
}
Example with all options
withSearchable({
  siteToken: "st_abc123",
  apiKey: process.env.SEARCHABLE_API_KEY!,
  endpoint: "https://your-custom-collector.com/v1/middleware",
  anonymizeIp: true,
  debug: process.env.NODE_ENV === "development",
  ignore: (path) => path.startsWith("/api/health") || path.startsWith("/api/internal"),
  custom: (request) => ({
    tenant: request.headers.get("x-tenant-id") ?? "unknown",
  }),
});
What Gets Captured
For every non-static request:
- HTTP method, path, URL, status code, response time
- User agent (for bot detection downstream)
- IP address (anonymized by default)
- Referrer and referrer domain
- UTM parameters (extracted from URL)
- Geo location (country, region, city — when available from edge runtime)
- Filtered request headers (safe allowlist only)
- Query parameters (excluding UTM params)
- Custom properties (from your custom function)
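Two of the items above, IP anonymization and the UTM/query split, can be sketched as follows. This is a minimal illustration, not the package's actual implementation: the helper names anonymizeIpv4 and splitQuery are hypothetical, and only IPv4 is handled because the README describes anonymization only as zeroing the last octet.

```typescript
// Hypothetical helpers for illustration; not part of the package's public API.

/** Zero the last octet of an IPv4 address, e.g. "1.2.3.4" becomes "1.2.3.0". */
function anonymizeIpv4(ip: string): string {
  const octets = ip.split(".");
  // Anything that is not a dotted quad is returned unchanged
  // (how IPv6 is treated is not described in the README).
  if (octets.length !== 4) return ip;
  octets[3] = "0";
  return octets.join(".");
}

// The five UTM parameters listed in the short key reference.
const UTM_KEYS = new Set<string>([
  "utm_source",
  "utm_medium",
  "utm_campaign",
  "utm_term",
  "utm_content",
]);

/** Split a URL's query string into UTM parameters and all other parameters. */
function splitQuery(url: string): {
  utm: Record<string, string>;
  query: Record<string, string>;
} {
  const utm: Record<string, string> = {};
  const query: Record<string, string> = {};
  new URL(url).searchParams.forEach((value, key) => {
    if (UTM_KEYS.has(key)) {
      utm[key] = value;
    } else {
      query[key] = value;
    }
  });
  return { utm, query };
}
```

For a URL such as https://example.com/pricing?ref=blog&utm_source=newsletter, splitQuery would place ref under query and utm_source under utm.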
Auto-skipped paths
- /_next/* (Next.js internal)
- Static files: .js, .css, .png, .jpg, .svg, .ico, .woff, .woff2, .ttf, .map
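The auto-skip rule can be pictured as a simple predicate over the request path. The sketch below is an assumption about how such a check might look: the name isAutoSkipped is hypothetical, and matching is case-sensitive suffix matching, which may differ from what the package does.

```typescript
// Extensions taken from the list above; the function itself is a hypothetical sketch.
const STATIC_EXTENSIONS = [
  ".js", ".css", ".png", ".jpg", ".svg",
  ".ico", ".woff", ".woff2", ".ttf", ".map",
];

/** Return true when a path would be skipped without any user configuration. */
function isAutoSkipped(path: string): boolean {
  // Next.js internal assets
  if (path.startsWith("/_next/")) return true;
  // Static files, matched by file extension
  return STATIC_EXTENSIONS.some((ext) => path.endsWith(ext));
}
```

Paths that are not covered here, such as health checks, can still be excluded with the ignore option.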
Event Pipeline
Customer's Next.js app
→ withSearchable middleware intercepts request
→ Captures request metadata (fire-and-forget, non-blocking)
→ POST to Cloudflare Worker (/v1/middleware)
→ Worker resolves site token → domain_id
→ Worker forwards to ingest service
→ Ingest transforms and writes to ClickHouse
Native Payload Format
The middleware sends a compact JSON payload using short keys to minimize bandwidth. This is the same format as the browser beacon (packages/tracker).
Envelope
{
  "v": 1,
  "d": "example.com",
  "tk": "st_abc123",
  "src": "middleware",
  "events": [{ ... }]
}
| Field | Type | Description |
|-------|--------|--------------------------------------------------|
| v | number | Protocol version (always 1) |
| d | string | Customer domain |
| tk | string | Site token (from domains table) |
| src | string | Event source: "middleware" (vs "beacon" for browser) |
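To make the envelope and the fire-and-forget step of the pipeline concrete, here is a minimal sketch. The names Envelope, PostFn, buildEnvelope and sendEnvelope are hypothetical, and the Authorization header shape is inferred from the apiKey description ("sent as Bearer"). The HTTP call is injected as a parameter so the sketch does not assume a particular runtime.

```typescript
// Hypothetical types and helpers; not the package's actual internals.
interface Envelope {
  v: 1;
  d: string;
  tk: string;
  src: "middleware";
  events: Record<string, unknown>[];
}

// Minimal shape of an HTTP POST function (the global fetch is compatible with it).
type PostFn = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<unknown>;

/** Wrap events in the envelope described in the table above. */
function buildEnvelope(
  domain: string,
  siteToken: string,
  events: Record<string, unknown>[],
): Envelope {
  return { v: 1, d: domain, tk: siteToken, src: "middleware", events };
}

/** Start the POST and return immediately; failures are swallowed. */
function sendEnvelope(
  post: PostFn,
  endpoint: string,
  apiKey: string,
  envelope: Envelope,
): void {
  post(endpoint, {
    method: "POST",
    headers: {
      "content-type": "application/json",
      authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(envelope),
  }).catch(() => {
    // Analytics delivery must never affect the customer's response.
  });
}
```

In a real deployment the global fetch could be passed as the post argument. On edge runtimes, a pending request like this is usually registered with the platform's waitUntil mechanism so it is not cancelled once the response is returned; whether the package does so is not stated here.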
Event Short Key Reference
Each event in the events array uses these short keys:
| Short Key | Full Name | Type | Description |
|-----------|---------------------|--------|------------------------------------------|
| t | event_name | string | Always "server_request" for middleware |
| ts | timestamp | number | Unix timestamp in milliseconds |
| p | path | string | Request path (e.g., /pricing) |
| u | url | string | Full request URL |
| r | referrer | string | HTTP Referer header |
| rd | referrer_domain | string | Parsed referrer hostname |
| us | utm_source | string | UTM source parameter |
| um | utm_medium | string | UTM medium parameter |
| uc | utm_campaign | string | UTM campaign parameter |
| ut | utm_term | string | UTM term parameter |
| uco | utm_content | string | UTM content parameter |
| method | method | string | HTTP method (GET, POST, etc.) |
| sc | status_code | number | HTTP response status code |
| rt | response_time_ms | number | Server response time in milliseconds |
| ua | user_agent | string | User-Agent header value |
| ip | ip_address | string | Client IP (anonymized by default) |
| geo | geo | object | { country, region, city } if available |
| hdrs | headers | object | Filtered request headers (safe allowlist)|
| qp | query_parameters | object | Non-UTM query parameters |
| props | custom_properties | object | User-defined custom properties |
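When reading payloads by hand or writing a consumer, it can help to expand the short keys back to their full names. The sketch below builds a lookup from the table above; the expandEvent function is hypothetical and is not a description of how the ingest service actually transforms events.

```typescript
// Short key to full name, copied from the reference table above.
const SHORT_TO_FULL = new Map<string, string>([
  ["t", "event_name"],
  ["ts", "timestamp"],
  ["p", "path"],
  ["u", "url"],
  ["r", "referrer"],
  ["rd", "referrer_domain"],
  ["us", "utm_source"],
  ["um", "utm_medium"],
  ["uc", "utm_campaign"],
  ["ut", "utm_term"],
  ["uco", "utm_content"],
  ["method", "method"],
  ["sc", "status_code"],
  ["rt", "response_time_ms"],
  ["ua", "user_agent"],
  ["ip", "ip_address"],
  ["geo", "geo"],
  ["hdrs", "headers"],
  ["qp", "query_parameters"],
  ["props", "custom_properties"],
]);

/** Hypothetical helper: rename short keys to full names, leaving unknown keys as they are. */
function expandEvent(event: Record<string, unknown>): Record<string, unknown> {
  const expanded: Record<string, unknown> = {};
  for (const key of Object.keys(event)) {
    const fullName = SHORT_TO_FULL.get(key) ?? key;
    expanded[fullName] = event[key];
  }
  return expanded;
}
```

Applied to an event such as { t: "server_request", sc: 200 }, this would yield an object with event_name and status_code keys.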
Example payload
{
  "v": 1,
  "d": "example.com",
  "tk": "st_abc123",
  "src": "middleware",
  "events": [{
    "t": "server_request",
    "ts": 1711756800000,
    "p": "/pricing",
    "u": "https://example.com/pricing?ref=blog",
    "r": "https://chatgpt.com/c/abc123",
    "rd": "chatgpt.com",
    "method": "GET",
    "sc": 200,
    "rt": 45,
    "ua": "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "ip": "1.2.3.0",
    "geo": { "country": "US", "region": "CA", "city": "San Francisco" },
    "hdrs": { "accept-language": "en-US", "host": "example.com" },
    "qp": { "ref": "blog" }
  }]
}
Differences from Beacon (Browser Tracker)
| Feature | Middleware (server-side) | Beacon (client-side) |
|------------------|----------------------------------|---------------------------------|
| Event source | src: "middleware" | No src field (default beacon) |
| Event name | server_request | pageview, page_leave, etc. |
| Identity | No visitor/session ID | vid, sid, sn |
| Server fields | method, sc, rt, ip, hdrs, qp | Not available |
| Engagement | Not available | et, sd (engagement, scroll)|
| ClickHouse table | events | web_events |
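Because both sources share one envelope format, a consumer can tell them apart by the src field alone. The following sketch encodes the first and last rows of the table above; the function names are hypothetical, and treating a missing src as a beacon payload follows the "No src field" note rather than any documented routing logic.

```typescript
type PayloadSource = "middleware" | "beacon";

/** Hypothetical helper: classify an envelope by its src field. */
function payloadSource(envelope: { src?: string }): PayloadSource {
  // Beacon payloads carry no src field, so anything else falls back to "beacon".
  return envelope.src === "middleware" ? "middleware" : "beacon";
}

/** Hypothetical helper: the ClickHouse table listed for each source. */
function targetTable(envelope: { src?: string }): "events" | "web_events" {
  return payloadSource(envelope) === "middleware" ? "events" : "web_events";
}
```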
Development
# Run tests
pnpm test
# Build
pnpm build
# Type check
pnpm typecheck