web-sentinel
v0.1.2
High performance protection against bots and probes
Sentinel
Ultra-high performance blocking of bots and vulnerability scanners.
When you host a service on the web you'll inevitably get hit by bots scanning for vulnerable websites. The most common target is WordPress due to its poor security history, but there are others too, and sometimes the target is a modern site where someone has accidentally deployed credentials or other sensitive files.
Even if your site isn't vulnerable, these bots can be a nuisance sending hundreds of requests that could trigger database lookups, wake instances, fill your logs with noise, and generally consume your resources.
Sure, there are some very nice protection services available but they cost money and take time to set up and manage. Wouldn't it be nice to just detect invalid requests and slam the door shut as quickly as possible? After all, you moved on from WordPress 20 years ago, nothing on your site is serving up .php files, so why let requests looking for those even hit your app? The faster you can swat these things away, the better, and sending the minimum response possible is better than a pretty rendered 404 page, because they are just bots.
It's provided as a low-level function generator, as Node or Polka http middleware, or as a SvelteKit hooks.server handle function. Even if you are using SvelteKit, it's better to put it in front of your app using the middleware option with a custom Node server.
Effectiveness
It's very effective at blocking what you tell it to with minimal performance overhead. While requests will still hit your instances, they are rejected so quickly they no longer have a damaging impact. Here's an example of the endless WordPress / PHP traffic any server on the internet is constantly dealing with, terminated instantly. It becomes a very minor blip of sudden requests and nothing more:

Here are some charts of a Google Cloud Run service being hit with different unwanted bot activity and web-sentinel being deployed, indicated in the first chart by response status codes changing from 2xx (blue) to 4xx (purple). You can see it dramatically reduces the impact: the number of instances and request latency are reduced, saving costs, with CPU and memory dedicated to serving legitimate traffic, which is then unaffected. It even saves on egress traffic, which again can save money and prioritize your genuine visitors.

While using a firewall service such as Google Cloud Armor can prevent this traffic hitting the service at all, it can be complex to configure and be more expensive than having your front-end instances able to deal with it.
Installation
Install using your package manager of choice, I like pnpm:
pnpm add web-sentinel
Next, add it to your app so it can intercept requests.
as standard http middleware
Adding the middleware to a regular http server:
import polka from 'polka'
import compression from '@polka/compression'
import { handler } from './build/handler.js'
import { middleware } from 'web-sentinel/middleware'
const sentinel = middleware(/* custom config here */)
const compress = compression({ brotli: true })
polka().use(sentinel).use(compress).use(handler).listen(process.env.PORT)
using SvelteKit hooks.server.ts
Adding a SvelteKit handle function. If using sequence to combine multiple hooks, add it to the start of the sequence.
import { createHandler } from 'web-sentinel/hooks'
export const handle = createHandler(/* custom config here */)
Configuration
You'll likely want to customize the default configuration, explained below. If you only want to override a single option you can import default_options and use spread operators to pull in or replace individual pieces, or just build up your own rules based on your request log files. For whatever part of the request is being matched (hostname, pathname, user_agent, search_params) you can define a list of prefix and suffix strings, which match at the start or end of the value, a list of exact matches (less common), and a list of contain strings, which match anywhere in the value. The latter should be avoided if at all possible as it's less efficient than the others.
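As a rough illustration of the paragraph above, the sketch below overrides part of a configuration with spread syntax and spells out what the four list types mean. `base_options` is a small stand-in for the library's `default_options` export so the snippet runs on its own, `/phpmyadmin` is just an example rule, and `naiveMatch` is a hypothetical helper that only demonstrates the matching semantics. It is not how web-sentinel evaluates rules internally.

```javascript
// Stand-in for `default_options` from 'web-sentinel'
// (assumption: same shape as the config shown in this README).
const base_options = {
  log: true,
  preview: true,
  pathname: { prefix: ['/.env', '/wp-admin'], suffix: ['.php', '.zip'] },
  user_agent: { exact: [''], prefix: ['curl/'], contain: ['HeadlessChrome'] },
}

// Keep everything, but stop blocking .zip downloads and add one extra prefix.
const custom_options = {
  ...base_options,
  pathname: {
    ...base_options.pathname,
    prefix: [...base_options.pathname.prefix, '/phpmyadmin'],
    suffix: base_options.pathname.suffix.filter((s) => s !== '.zip'),
  },
}

// Hypothetical reference matcher: returns the kind of rule that matched, or null.
function naiveMatch(value, rules = {}) {
  if ((rules.exact ?? []).includes(value)) return 'exact'
  if ((rules.prefix ?? []).some((p) => value.startsWith(p))) return 'prefix'
  if ((rules.suffix ?? []).some((s) => value.endsWith(s))) return 'suffix'
  if ((rules.contain ?? []).some((c) => value.includes(c))) return 'contain'
  return null
}

console.log(naiveMatch('/wp-admin/setup.php', custom_options.pathname)) // 'prefix'
console.log(naiveMatch('/downloads/archive.zip', custom_options.pathname)) // null
console.log(naiveMatch('', custom_options.user_agent)) // 'exact'
```

Spreading the nested object as well as the top level matters here: a shallow spread alone would replace the whole pathname section rather than a single list.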
Above all else, think carefully about the rules and what your app needs. Do you serve out .zip files? If so, you don't want to block them. If in doubt, run it with both the log and preview modes enabled (the default) which will log URLs that would be blocked and why, but without actually blocking them. Once you are happy with the rules, you can disable preview to have URLs blocked, and also set log to false if you don't need to see the rule matches.
The stats_path property allows you to define an endpoint that will render a table of rule hits and misses, useful to determine which rules are needed and which could be removed. If undefined (the default) it will be disabled.
const config: Config = {
  log: true,
  preview: true,
  stats_path: undefined,
  hostname: {
    suffix: ['.bc.googleusercontent.com', '.appspot.com', '.google.com'],
  },
  pathname: {
    prefix: [
      '/.env',
      '/.git',
      '/.ssh',
      '/.map',
      '/.yml',
      '/.yaml',
      '/.npmrc',
      '/.well-known/security.txt',
      '/.aws/credentials',
      '/wp-admin',
      '/wp-config',
      '/wp-content',
      '/wp-includes',
      '/cgi-bin',
      '/bash_history',
      '/etc/passwd',
    ],
    suffix: [
      '.env',
      '.bak',
      '.cgi',
      '.php',
      '.dat',
      '.rar',
      '.tar',
      '.zip',
      '.gz',
      '.sql',
      '/wlwmanifest.xml',
      '/credentials.txt',
      '/package.json',
    ],
  },
  user_agent: {
    exact: [''],
    prefix: ['python-requests/', 'Go-http-client/', 'curl/', 'Wget/', 'Scrapy/', 'Python-urllib/', 'axios/'],
    contain: ['HeadlessChrome', 'aiohttp'],
  },
  search_params: {
    contain: ['../'],
  },
  http_status: 404,
}
Live Configuration Updates
Hard-coding the configuration is the simplest approach and provides the fastest app startup, as it doesn't need to load any external resource. But you may want to centralize configuration, allow updating the configuration without having to redeploy your app, and periodically check for new configurations. The middleware and handle hook both include an extra update(config) method that allows you to control fetching a new configuration and re-building the running matcher.
Here's an example of loading a web-sentinel.json configuration object from Google Cloud Storage:
import polka from 'polka'
import compression from '@polka/compression'
import { handler } from './build/handler.js'
import { default_options } from 'web-sentinel'
import { middleware } from 'web-sentinel/middleware'
const sentinel = await initSentinel()
const compress = compression({ brotli: true })
polka().use(sentinel).use(compress).use(handler).listen(process.env.PORT)
/**
 * Initializes the sentinel middleware, loading config from GCS with
 * fallback to defaults. Sets up hourly update with ETag caching.
 */
async function initSentinel() {
  let lastEtag = null
  async function getAccessToken(signal) {
    if (process.env.LOCAL_TOKEN) return process.env.LOCAL_TOKEN
    const url = 'http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token'
    const response = await fetch(url, {
      headers: { 'Metadata-Flavor': 'Google' },
      signal,
    })
    if (!response.ok) throw new Error('Could not fetch access token from Metadata Server')
    const { access_token } = await response.json()
    return access_token
  }
  async function fetchConfig() {
    const BUCKET = process.env.PUBLIC_FIREBASE_STORAGE_BUCKET
    const FILE = 'web-sentinel.json'
    const controller = new AbortController()
    const timeout = setTimeout(() => controller.abort(), 2000)
    try {
      console.time('sentinel fetch')
      const token = await getAccessToken(controller.signal)
      const url = `https://storage.googleapis.com/storage/v1/b/${BUCKET}/o/${FILE}?alt=media`
      const headers = { Authorization: `Bearer ${token}` }
      if (lastEtag) {
        headers['If-None-Match'] = lastEtag
      }
      const response = await fetch(url, {
        headers,
        signal: controller.signal,
      })
      if (response.status === 304) return null
      if (!response.ok) {
        throw new Error(`gcs response ${response.status}`)
      }
      lastEtag = response.headers.get('etag')
      return await response.json()
    } catch (err) {
      console.error('sentinel fetch failed', err)
      return null
    } finally {
      console.timeEnd('sentinel fetch')
      clearTimeout(timeout)
    }
  }
  const initialConfig = await fetchConfig()
  console.time('sentinel init')
  const sentinel = middleware(initialConfig ?? default_options)
  console.timeEnd('sentinel init')
  setInterval(async () => {
    const config = await fetchConfig()
    if (config) {
      console.time('sentinel update')
      sentinel.update(config)
      console.timeEnd('sentinel update')
    }
  }, 3_600_000)
  return sentinel
}
Performance: The "Jump-Trie" Approach
Why is it fast? This library uses a pre-compiled, static prefix tree (trie), in effect a deterministic finite automaton, implemented via nested switch statements. This approach is specifically engineered to leverage the architectural strengths of modern CPUs and the V8 JavaScript engine's optimization pipeline to produce the fastest pattern checks possible. See below for an example of the code it generates.
🚀 Performance Benchmarks
On an M4 Mac Mini, this implementation achieves:
- Throughput: ~33,000,000+ operations per second.
- Latency: < 0.0001ms (Mean).
- Stability: ±0.08% RME (Relative Margin of Error).
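If you want a rough feel for numbers like these on your own machine, a hand-rolled timing loop along the following lines can help. This is only a sketch: `check` is a hypothetical stand-in for a generated matcher, `measure` is not part of web-sentinel, and the figures above were presumably produced with a proper benchmark harness, so results will not be directly comparable.

```javascript
// Hypothetical stand-in for a generated check: flags paths starting with '/.'.
// Swap in the real matcher function when measuring.
function check(pathname) {
  if (pathname.length > 1 && pathname.charCodeAt(0) === 47 && pathname.charCodeAt(1) === 46) return 1
  return 0
}

// Times `iterations` calls of `fn`, cycling through `inputs`.
function measure(fn, inputs, iterations = 1_000_000) {
  let sink = 0 // accumulate results so the calls cannot be discarded as dead code
  for (let i = 0; i < 10_000; i++) sink += fn(inputs[i % inputs.length]) // warm-up
  const start = performance.now()
  for (let i = 0; i < iterations; i++) sink += fn(inputs[i % inputs.length])
  const ms = performance.now() - start
  return { opsPerSecond: Math.round(iterations / (ms / 1000)), sink }
}

const result = measure(check, ['/', '/.env', '/about', '/wp-admin/index.php'])
console.log(`${result.opsPerSecond.toLocaleString()} ops/sec`)
```

A single timed loop like this is noisy, so treat it as a sanity check rather than a measurement.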
🧠 How the "Jump Code" Works
Standard routers or filters typically use a Map, Set, or Regex. While flexible, those methods incur overhead from hashing, state machine initialization, or object property lookups.
This implementation compiles your blocklists into a Hard-Coded Prefix Tree (Trie) using path.charCodeAt(n) with a fast path.startsWith() static check to confirm a match once there is only one option remaining. No garbage, no string operations, around 2x faster than a Regex approach.
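To make the idea concrete, here is a minimal sketch of compiling a prefix list into nested switch statements, with a `startsWith` check once only one candidate remains. It is an illustration of the general technique under simplifying assumptions (prefix rules only, a single string argument), not the library's actual generator; `buildTrie`, `singleTail`, `emit` and `compilePrefixes` are hypothetical names.

```javascript
// Build a character trie from the prefixes. A node with rule > 0 ends a prefix.
function buildTrie(prefixes) {
  const root = { children: new Map(), rule: 0 }
  prefixes.forEach((prefix, index) => {
    let node = root
    for (let i = 0; i < prefix.length; i++) {
      const code = prefix.charCodeAt(i)
      if (!node.children.has(code)) node.children.set(code, { children: new Map(), rule: 0 })
      node = node.children.get(code)
    }
    node.rule = index + 1 // rule numbers start at 1; 0 means "no match"
  })
  return root
}

// If exactly one rule is reachable below `node`, return its remaining text and rule number.
function singleTail(node) {
  let text = ''
  while (node.rule === 0 && node.children.size === 1) {
    const [code, child] = node.children.entries().next().value
    text += String.fromCharCode(code)
    node = child
  }
  return node.rule !== 0 ? { text, rule: node.rule } : null
}

// Emit JavaScript source for the checks starting at string offset `depth`.
function emit(node, depth, pad) {
  if (node.rule !== 0) return `${pad}return ${node.rule}\n`
  const tail = singleTail(node)
  if (tail) return `${pad}if (s.startsWith(${JSON.stringify(tail.text)}, ${depth})) return ${tail.rule}\n`
  let src = `${pad}if (s.length > ${depth}) {\n${pad}  switch (s.charCodeAt(${depth})) {\n`
  for (const [code, child] of [...node.children].sort((a, b) => a[0] - b[0])) {
    src += `${pad}    case ${code}:\n${emit(child, depth + 1, pad + '      ')}${pad}      break\n`
  }
  return src + `${pad}  }\n${pad}}\n`
}

// Compile the prefixes into a function returning the matching rule number, or 0.
function compilePrefixes(prefixes) {
  const body = emit(buildTrie(prefixes), 0, '  ') + '  return 0\n'
  return new Function('s', body)
}

const check = compilePrefixes(['/.env', '/.git', '/wp-admin', '/wp-config'])
console.log(check('/.env.local')) // 1
console.log(check('/wp-config.php')) // 4
console.log(check('/about')) // 0
```

Calling `check.toString()` prints the generated source, which has the same overall shape as the generated code listed at the end of this README.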
1. The Power of charCodeAt
Unlike path[0] or path.substring(), charCodeAt returns a raw integer representing the character at a specific memory offset. This maps directly to low-level CPU instructions, avoiding the creation of new string objects. By checking string locations are within bounds before reading characters we ensure it never returns NaN, which would cause the engine to de-optimize, preserving the fastest code path.
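A few lines are enough to see the difference described above. The snippet uses only standard string methods; `secondCharIsDot` is a hypothetical helper showing the "check the length first" pattern.

```javascript
const path = '/.env'

// charCodeAt yields a plain number; indexing yields a one-character string.
console.log(path.charCodeAt(1)) // 46, the code for '.'
console.log(path[1]) // '.'

// Past the end of the string charCodeAt yields NaN, so guard with the length first.
console.log(path.charCodeAt(10)) // NaN

function secondCharIsDot(s) {
  return s.length > 1 && s.charCodeAt(1) === 46
}

console.log(secondCharIsDot('/.git')) // true
console.log(secondCharIsDot('/')) // false
```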
2. Jump Tables vs. Linear Search
When the JavaScript engine (V8) encounters a switch statement with integer cases, it doesn't have to perform a series of "if/else" checks. If there are enough cases and their values are sufficiently dense, it can create a Jump Table.
Instead of checking every possibility, the CPU calculates a memory offset based on the character code and "jumps" directly to the next block of code. This makes the search complexity O(L), where L is the length of the string to match, regardless of how many total paths are in your block rules. What does this mean? Pure speed baby!
3. Early Exit Strategy
Most bot probes can be rejected after checking only one or two characters.
- Traditional Regex: Scans the string for patterns, often looking at the entire path. Multiple regexes scan the same path over and over.
- Jump-Trie: If a path starts with /a... and your filter only cares about /w... (WordPress) and /.e... (.env), the function returns on the very first character after the leading slash.
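The early exit can be made visible by counting how many characters are read before a decision is reached. The sketch below walks a simple Map-based trie instead of generated code, purely so that the count can be reported; `makeTrie` and `probe` are hypothetical helpers, not part of web-sentinel.

```javascript
// Build a trie of nested Maps keyed by character code.
function makeTrie(prefixes) {
  const root = new Map()
  for (const prefix of prefixes) {
    let node = root
    for (let i = 0; i < prefix.length; i++) {
      const code = prefix.charCodeAt(i)
      if (!node.has(code)) node.set(code, new Map())
      node = node.get(code)
    }
    node.end = true // marks a complete prefix
  }
  return root
}

// Walk the trie and report how many characters were inspected before deciding.
function probe(root, path) {
  let node = root
  for (let i = 0; i < path.length; i++) {
    node = node.get(path.charCodeAt(i))
    if (node === undefined) return { blocked: false, inspected: i + 1 }
    if (node.end) return { blocked: true, inspected: i + 1 }
  }
  return { blocked: false, inspected: path.length }
}

const trie = makeTrie(['/wp-admin', '/.env', '/.git'])
console.log(probe(trie, '/about/team')) // { blocked: false, inspected: 2 }
console.log(probe(trie, '/.env.production')) // { blocked: true, inspected: 5 }
```

A legitimate path such as /about/team is cleared after two characters, however long the path or the rule list is.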
🛠 Why it’s so fast on modern hardware
Mechanical Sympathy
Modern CPUs feature highly advanced Branch Predictors. Because the logic is "baked" into the source code rather than stored in a data structure, the CPU can "learn" the structure of your filter. It begins speculatively executing the next switch level before the current one has even finished, effectively hiding the latency of the check.
Zero Allocations
This filter is "garbage collector friendly." It performs:
- Zero array iterations.
- Zero object allocations.
- Zero string slicing.
It operates entirely on the stack using primitives. It always returns a single integer to indicate which rule matched, or 0 if the request passed all checks.
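Because the result is just a number, turning it into something readable is a plain array lookup done outside the hot path. The table below is a hypothetical illustration (the entries mirror the first two rules in the generated code listed later); how web-sentinel stores its rule descriptions internally is not shown in this README.

```javascript
// Hypothetical rule table; index 0 is reserved for "no match".
const rules = [
  null,
  { part: 'user_agent', kind: 'exact', value: '' },
  { part: 'user_agent', kind: 'prefix', value: 'Go-http-client/' },
]

// Turn a numeric result into a human-readable description.
function describe(id) {
  const rule = rules[id]
  return rule ? `${rule.part} ${rule.kind} '${rule.value}'` : 'no match'
}

console.log(describe(2)) // user_agent prefix 'Go-http-client/'
console.log(describe(0)) // no match
```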
⚠️ Limitations & Best Practices
- Static Nature: This is not a dynamic router. Any changes to the rules require a re-compile, but that is also blazingly fast to do during app startup, generating the optimized code that can then be re-used.
- Code Size: While extremely fast, a list of 10,000+ paths will result in a large JS file. This may eventually exceed the CPU’s L1 Instruction Cache, leading to a slight performance dip. It's intended for tens to hundreds of patterns.
- Type Safety: Ensure only strings are passed to the test() function. Passing undefined or an object will cause a V8 "De-optimization," dropping performance significantly.
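One way to follow the type-safety advice is to normalize values before they reach the matcher. The helper below is a hypothetical sketch, not part of web-sentinel; it relies only on the fact that a request header may be missing (undefined) or, in Node, occasionally an array.

```javascript
// Hypothetical helper: coerce anything that is not a string to ''.
const asString = (value) => (typeof value === 'string' ? value : '')

const headers = { host: 'example.com' } // no user-agent header present
const user_agent = asString(headers['user-agent'])
const hostname = asString(headers.host)

console.log(JSON.stringify(user_agent)) // ""
console.log(hostname) // example.com
```

Note that with the default configuration an empty user agent is itself a blocked value (the exact: [''] rule), so a missing header normalized this way would be rejected.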
Generated Code
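Here's an example of the code that is generated. It was produced from a rule set close to the default config shown above; a few entries differ (for example it includes '/.vscode' and '.py' rules and has no '.zip' rule):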
/* user_agent checks */
const user_agent_length = user_agent.length
if (user_agent === '') return 1 // user_agent exact ''
if (user_agent_length > 0) {
  switch (user_agent.charCodeAt(0)) {
    case 71: // 'G'
      if (user_agent.startsWith('o-http-client/', 1)) return 2 // user_agent prefix 'Go-http-client/'
      break
    case 80: // 'P'
      if (user_agent.startsWith('ython-urllib/', 1)) return 3 // user_agent prefix 'Python-urllib/'
      break
    case 83: // 'S'
      if (user_agent.startsWith('crapy/', 1)) return 4 // user_agent prefix 'Scrapy/'
      break
    case 87: // 'W'
      if (user_agent.startsWith('get/', 1)) return 5 // user_agent prefix 'Wget/'
      break
    case 97: // 'a'
      if (user_agent.startsWith('xios/', 1)) return 6 // user_agent prefix 'axios/'
      break
    case 99: // 'c'
      if (user_agent.startsWith('url/', 1)) return 7 // user_agent prefix 'curl/'
      break
    case 112: // 'p'
      if (user_agent.startsWith('ython-requests/', 1)) return 8 // user_agent prefix 'python-requests/'
      break
  }
}
if (user_agent.includes('HeadlessChrome')) return 9 // user_agent contain 'HeadlessChrome'
if (user_agent.includes('aiohttp')) return 10 // user_agent contain 'aiohttp'
/* pathname checks */
const pathname_length = pathname.length
if (pathname_length > 0) {
  switch (pathname.charCodeAt(0)) {
    case 47: // '/'
      if (pathname_length > 1) {
        switch (pathname.charCodeAt(1)) {
          case 46: // '.'
            if (pathname_length > 2) {
              switch (pathname.charCodeAt(2)) {
                case 97: // 'a'
                  if (pathname.startsWith('ws/credentials', 3)) return 11 // pathname prefix '/.aws/credentials'
                  break
                case 101: // 'e'
                  if (pathname.charCodeAt(3) === 110 && pathname.charCodeAt(4) === 118) return 12 // pathname prefix '/.env'
                  break
                case 103: // 'g'
                  if (pathname.charCodeAt(3) === 105 && pathname.charCodeAt(4) === 116) return 13 // pathname prefix '/.git'
                  break
                case 109: // 'm'
                  if (pathname.charCodeAt(3) === 97 && pathname.charCodeAt(4) === 112) return 14 // pathname prefix '/.map'
                  break
                case 110: // 'n'
                  if (pathname.startsWith('pmrc', 3)) return 15 // pathname prefix '/.npmrc'
                  break
                case 115: // 's'
                  if (pathname.charCodeAt(3) === 115 && pathname.charCodeAt(4) === 104) return 16 // pathname prefix '/.ssh'
                  break
                case 118: // 'v'
                  if (pathname.startsWith('scode', 3)) return 17 // pathname prefix '/.vscode'
                  break
                case 119: // 'w'
                  if (pathname.startsWith('ell-known/security.txt', 3)) return 18 // pathname prefix '/.well-known/security.txt'
                  break
                case 121: // 'y'
                  if (pathname_length > 3) {
                    switch (pathname.charCodeAt(3)) {
                      case 97: // 'a'
                        if (pathname.charCodeAt(4) === 109 && pathname.charCodeAt(5) === 108) return 19 // pathname prefix '/.yaml'
                        break
                      case 109: // 'm'
                        if (pathname.charCodeAt(4) === 108) return 20 // pathname prefix '/.yml'
                        break
                    }
                  }
                  break
              }
            }
            break
          case 98: // 'b'
            if (pathname.startsWith('ash_history', 2)) return 21 // pathname prefix '/bash_history'
            break
          case 99: // 'c'
            if (pathname.startsWith('gi-bin', 2)) return 22 // pathname prefix '/cgi-bin'
            break
          case 101: // 'e'
            if (pathname.startsWith('tc/passwd', 2)) return 23 // pathname prefix '/etc/passwd'
            break
          case 119: // 'w'
            if (pathname_length > 2) {
              switch (pathname.charCodeAt(2)) {
                case 112: // 'p'
                  if (pathname_length > 3) {
                    switch (pathname.charCodeAt(3)) {
                      case 45: // '-'
                        if (pathname_length > 4) {
                          switch (pathname.charCodeAt(4)) {
                            case 97: // 'a'
                              if (pathname.startsWith('dmin', 5)) return 24 // pathname prefix '/wp-admin'
                              break
                            case 99: // 'c'
                              if (pathname_length > 5) {
                                switch (pathname.charCodeAt(5)) {
                                  case 111: // 'o'
                                    if (pathname_length > 6) {
                                      switch (pathname.charCodeAt(6)) {
                                        case 110: // 'n'
                                          if (pathname_length > 7) {
                                            switch (pathname.charCodeAt(7)) {
                                              case 102: // 'f'
                                                if (pathname.charCodeAt(8) === 105 && pathname.charCodeAt(9) === 103)
                                                  return 25 // pathname prefix '/wp-config'
                                                break
                                              case 116: // 't'
                                                if (pathname.startsWith('ent', 8)) return 26 // pathname prefix '/wp-content'
                                                break
                                            }
                                          }
                                          break
                                      }
                                    }
                                    break
                                }
                              }
                              break
                            case 105: // 'i'
                              if (pathname.startsWith('ncludes', 5)) return 27 // pathname prefix '/wp-includes'
                              break
                          }
                        }
                        break
                    }
                  }
                  break
              }
            }
            break
        }
      }
      break
  }
}
if (pathname_length > 0) {
  switch (pathname.charCodeAt(pathname_length - 1)) {
    case 98: // 'b'
      if (pathname.charCodeAt(pathname_length - 2) === 114 && pathname.charCodeAt(pathname_length - 3) === 46) return 28 // pathname suffix '.rb'
      break
    case 105: // 'i'
      if (
        pathname.charCodeAt(pathname_length - 2) === 103 &&
        pathname.charCodeAt(pathname_length - 3) === 99 &&
        pathname.charCodeAt(pathname_length - 4) === 46
      )
        return 29 // pathname suffix '.cgi'
      break
    case 107: // 'k'
      if (
        pathname.charCodeAt(pathname_length - 2) === 97 &&
        pathname.charCodeAt(pathname_length - 3) === 98 &&
        pathname.charCodeAt(pathname_length - 4) === 46
      )
        return 30 // pathname suffix '.bak'
      break
    case 108: // 'l'
      if (pathname_length > 1) {
        switch (pathname.charCodeAt(pathname_length - 2)) {
          case 109: // 'm'
            if (pathname.endsWith('/wlwmanifest.x', pathname_length - 2)) return 31 // pathname suffix '/wlwmanifest.xml'
            break
          case 113: // 'q'
            if (pathname.charCodeAt(pathname_length - 3) === 115 && pathname.charCodeAt(pathname_length - 4) === 46)
              return 32 // pathname suffix '.sql'
            break
        }
      }
      break
    case 110: // 'n'
      if (pathname.endsWith('/package.jso', pathname_length - 1)) return 33 // pathname suffix '/package.json'
      break
    case 111: // 'o'
      if (pathname.charCodeAt(pathname_length - 2) === 103 && pathname.charCodeAt(pathname_length - 3) === 46) return 34 // pathname suffix '.go'
      break
    case 112: // 'p'
      if (
        pathname.charCodeAt(pathname_length - 2) === 104 &&
        pathname.charCodeAt(pathname_length - 3) === 112 &&
        pathname.charCodeAt(pathname_length - 4) === 46
      )
        return 35 // pathname suffix '.php'
      break
    case 114: // 'r'
      if (pathname_length > 1) {
        switch (pathname.charCodeAt(pathname_length - 2)) {
          case 97: // 'a'
            if (pathname_length > 2) {
              switch (pathname.charCodeAt(pathname_length - 3)) {
                case 114: // 'r'
                  if (pathname.charCodeAt(pathname_length - 4) === 46) return 36 // pathname suffix '.rar'
                  break
                case 116: // 't'
                  if (pathname.charCodeAt(pathname_length - 4) === 46) return 37 // pathname suffix '.tar'
                  break
              }
            }
            break
        }
      }
      break
    case 116: // 't'
      if (pathname_length > 1) {
        switch (pathname.charCodeAt(pathname_length - 2)) {
          case 97: // 'a'
            if (pathname.charCodeAt(pathname_length - 3) === 100 && pathname.charCodeAt(pathname_length - 4) === 46)
              return 38 // pathname suffix '.dat'
            break
          case 102: // 'f'
            if (pathname.endsWith('.swi', pathname_length - 2)) return 39 // pathname suffix '.swift'
            break
          case 120: // 'x'
            if (pathname.endsWith('/credentials.t', pathname_length - 2)) return 40 // pathname suffix '/credentials.txt'
            break
        }
      }
      break
    case 118: // 'v'
      if (
        pathname.charCodeAt(pathname_length - 2) === 110 &&
        pathname.charCodeAt(pathname_length - 3) === 101 &&
        pathname.charCodeAt(pathname_length - 4) === 46
      )
        return 41 // pathname suffix '.env'
      break
    case 121: // 'y'
      if (pathname.charCodeAt(pathname_length - 2) === 112 && pathname.charCodeAt(pathname_length - 3) === 46) return 42 // pathname suffix '.py'
      break
    case 122: // 'z'
      if (pathname.charCodeAt(pathname_length - 2) === 103 && pathname.charCodeAt(pathname_length - 3) === 46) return 43 // pathname suffix '.gz'
      break
  }
}
/* search_params checks */
const search_params_length = search_params.length
if (search_params.includes('../')) return 44 // search_params contain '../'
/* hostname checks */
const hostname_length = hostname.length
if (hostname_length > 0) {
  switch (hostname.charCodeAt(hostname_length - 1)) {
    case 109: // 'm'
      if (hostname_length > 1) {
        switch (hostname.charCodeAt(hostname_length - 2)) {
          case 111: // 'o'
            if (hostname_length > 2) {
              switch (hostname.charCodeAt(hostname_length - 3)) {
                case 99: // 'c'
                  if (hostname_length > 3) {
                    switch (hostname.charCodeAt(hostname_length - 4)) {
                      case 46: // '.'
                        if (hostname_length > 4) {
                          switch (hostname.charCodeAt(hostname_length - 5)) {
                            case 101: // 'e'
                              if (hostname.endsWith('.googl', hostname_length - 5)) return 45 // hostname suffix '.google.com'
                              break
                            case 116: // 't'
                              if (hostname_length > 5) {
                                switch (hostname.charCodeAt(hostname_length - 6)) {
                                  case 110: // 'n'
                                    if (hostname.endsWith('.bc.googleuserconte', hostname_length - 6)) return 46 // hostname suffix '.bc.googleusercontent.com'
                                    break
                                  case 111: // 'o'
                                    if (hostname.endsWith('.appsp', hostname_length - 6)) return 47 // hostname suffix '.appspot.com'
                                    break
                                }
                              }
                              break
                          }
                        }
                        break
                    }
                  }
                  break
              }
            }
            break
        }
      }
      break
  }
}
return 0 // no match