kashsearch
v0.3.0
Browser automation for AI coding agents — intent-based, token-efficient, agent-first
KashSearch
Browser automation built for AI agents, not humans. Intent-based, token-efficient, agent-first.
Why KashSearch?
Current browser tools (Playwright MCP, agent-browser) were built for human developers and adapted for AI. They dump thousands of tokens of DOM trees and accessibility snapshots into the context window. KashSearch was designed from the agent's perspective:
- 20-50x fewer tokens -- compressed semantic page model instead of raw DOM dumps
- Intent-based actions -- act("click Sign In") not click('[data-testid="signin-btn"]')
- Diff-only responses -- after the first page load, only sends what changed
- Smart queries -- assert("page title contains Dashboard") costs 5 tokens, not a 1,000-token screenshot
Quick Start
npm install -g kashsearch
kashsearch init # writes MCP config to .claude/settings.json
That's it. Claude Code picks it up automatically.
How It Works
A real GitHub page through Playwright MCP's accessibility snapshot (~2,000 tokens):
- navigation "Global"
- link "Skip to content"
- link "Homepage" [ref=s1e3]
- button "Toggle navigation" [ref=s1e4]
- link "Sign in" [ref=s1e5]
- link "Sign up" [ref=s1e6]
- search "Search or jump to..."
- combobox "Search or jump to..." [ref=s1e7]
- link "Product" [ref=s1e8]
- link "Solutions" [ref=s1e9]
- link "Resources" [ref=s1e10]
- link "Open Source" [ref=s1e11]
- link "Enterprise" [ref=s1e12]
- link "Pricing" [ref=s1e13]
- main
- heading "Build and ship software on a single..."
- link "Sign up for GitHub" [ref=s1e14]
- link "Start a free enterprise trial" [ref=s1e15]
...400+ more nodes
The same page through KashSearch (~50 tokens):
[nav: Skip to content(e0) | Homepage(e1) | Sign in(e2) | Sign up(e3) | Product(e4) | Solutions(e5) | Resources(e6) | Open Source(e7) | Enterprise(e8) | Pricing(e9)]
[input: Search or jump to...(e10)]
[h1: Build and ship software on a single...]
[link: Sign up for GitHub(e11)]
[link: Start a free enterprise trial(e12)]
After clicking "Sign in", Playwright dumps the full page again (~2,000 more tokens). KashSearch sends only the diff:
added: [input: Username or email(e13), input: Password(e14), btn: Sign in(e15)]
Tools
Core Actions
| Tool | Params | Description | Example |
|------|--------|-------------|---------|
| navigate | url | Go to a URL | navigate({url: "https://github.com"}) |
| act | intent | Perform intent-based actions | act({intent: "click Sign In"}) |
| query | question | Ask about the page | query({question: "what is the page title?"}) |
| assert | condition | Check a page condition | assert({condition: "url contains /dashboard"}) |
| scope | region | Focus on a page region | scope({region: "navigation"}) |
| extract | what | Pull structured data | extract({what: "all links"}) |
| screenshot | region? | Capture the viewport | screenshot({region: "header"}) |
| inject | js | Run JavaScript in page | inject({js: "document.title"}) |
Navigation
| Tool | Params | Description |
|------|--------|-------------|
| back | -- | Browser back |
| forward | -- | Browser forward |
| tabs | -- | List open tabs |
| switch_tab | tabId | Switch to tab by index or ID |
| reorient | -- | Rebuild the page model from scratch |
Monitoring
| Tool | Params | Description | Example |
|------|--------|-------------|---------|
| watch | condition, timeout? | Wait for a condition | watch({condition: "Sign in visible"}) |
| intercept | pattern | Monitor network requests | intercept({pattern: "/api/users"}) |
| diff | -- | Show changes since last snapshot | |
Automation
| Tool | Params | Description | Example |
|------|--------|-------------|---------|
| record | command | Record action sequences | record({command: "start login-flow"}) |
| replay | id | Replay a recording | replay({id: "login-flow"}) |
Action Batching
The act tool parses compound intents separated by commas or "then":
act({intent: "fill Username with admin, fill Password with secret, click Sign In"})
This executes as three steps in sequence, rebuilding the page model between each step. The response includes only the final diff -- not three separate snapshots.
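To make the splitting rule concrete, here is a minimal sketch of how a compound intent could be broken into steps. The `splitIntent` function and `Step` type are hypothetical names invented for this example; the real parser inside KashSearch may work differently, for instance when a value itself contains a comma.

```typescript
// Hypothetical helper, not part of the KashSearch API.
type Step = { verb: string; target: string };

function splitIntent(intent: string): Step[] {
  return intent
    // Split on a comma (optionally followed by "then") or on a standalone "then".
    .split(/\s*,\s*(?:then\s+)?|\s+then\s+/i)
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part) => {
      // The first word is treated as the verb, the remainder as its target.
      const space = part.indexOf(" ");
      return space === -1
        ? { verb: part.toLowerCase(), target: "" }
        : { verb: part.slice(0, space).toLowerCase(), target: part.slice(space + 1) };
    });
}

const steps = splitIntent(
  "fill Username with admin, fill Password with secret, click Sign In"
);
console.log(steps.map((s) => s.verb)); // [ 'fill', 'fill', 'click' ]
```

A caller would then run each step in order and keep only the diff produced after the last one, which matches the behavior described above.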
Element Resolution
When you say act({intent: "click Sign In"}), KashSearch resolves "Sign In" through a 5-stage pipeline:
- ID match -- e2 resolves directly to the registered element
- Exact text -- case-insensitive match against element text
- Semantic -- partial word matching across text and role ("sign" matches "Sign In")
- Spatial -- "button near Password" uses bounding box proximity
- Fuzzy -- Levenshtein-style similarity for typos (threshold: 0.6)
The first match wins. Confidence scores are returned so the agent can decide whether to retry with a more specific reference.
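The five stages can be sketched as a chain of fallbacks. Everything in the block below is an assumption made for illustration: the `El` and `Match` types, the per-stage confidence values, the use of center points instead of full bounding boxes, and the exact matching rules are not taken from the KashSearch source. Only the stage order and the 0.6 fuzzy threshold come from the list above.

```typescript
// Illustrative only: types, stage names, and confidence values are assumptions.
interface El { id: string; role: string; text: string; x: number; y: number }
interface Match { el: El; stage: string; confidence: number }

// Levenshtein edit distance using a single rolling row.
function levenshtein(a: string, b: string): number {
  const row: number[] = [];
  for (let j = 0; j <= b.length; j++) row.push(j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0];
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diag + cost);
      diag = above;
    }
  }
  return row[b.length];
}

// Similarity in [0, 1]; 1 means the strings are identical.
function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

function resolve(ref: string, els: El[]): Match | null {
  const q = ref.trim().toLowerCase();
  if (q === "") return null;

  // Stage 1: ID match.
  for (const el of els) if (el.id === q) return { el, stage: "id", confidence: 1 };

  // Stage 2: exact text, case-insensitive.
  for (const el of els) {
    if (el.text.toLowerCase() === q) return { el, stage: "exact", confidence: 0.9 };
  }

  // Stage 3: semantic. Every query word must appear in the role or the text.
  const words = q.split(/\s+/);
  for (const el of els) {
    const hay = (el.role + " " + el.text).toLowerCase();
    if (words.every((w) => hay.indexOf(w) !== -1)) return { el, stage: "semantic", confidence: 0.75 };
  }

  // Stage 4: spatial. "<role> near <text>" picks the closest element with that role.
  const m = /^(\w+) near (.+)$/.exec(q);
  if (m) {
    const role = m[1];
    const anchorText = m[2];
    const anchors = els.filter((e) => e.text.toLowerCase() === anchorText);
    if (anchors.length > 0) {
      const anchor = anchors[0];
      const near = els
        .filter((e) => e.role === role && e.id !== anchor.id)
        .sort((a, b) =>
          Math.hypot(a.x - anchor.x, a.y - anchor.y) - Math.hypot(b.x - anchor.x, b.y - anchor.y));
      if (near.length > 0) return { el: near[0], stage: "spatial", confidence: 0.7 };
    }
  }

  // Stage 5: fuzzy. Best Levenshtein similarity at or above the 0.6 threshold.
  let best: Match | null = null;
  for (const el of els) {
    const score = similarity(q, el.text.toLowerCase());
    if (score >= 0.6 && (best === null || score > best.confidence)) {
      best = { el, stage: "fuzzy", confidence: score };
    }
  }
  return best;
}

const els: El[] = [
  { id: "e13", role: "input", text: "Username or email", x: 100, y: 100 },
  { id: "e14", role: "input", text: "Password", x: 100, y: 150 },
  { id: "e15", role: "button", text: "Sign in", x: 100, y: 200 },
  { id: "e16", role: "button", text: "Cancel", x: 400, y: 600 },
];
const typo = resolve("Pasword", els);
console.log(typo ? typo.stage : "none"); // fuzzy
```

Returning the stage and a score alongside the element mirrors the idea that the agent can inspect the confidence and retry with a more specific reference.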
Plugin System
Register custom extractors that hook into the extract tool:
import { KashSearchEngine } from "kashsearch";
import type { ExtractorPlugin } from "kashsearch";

const priceExtractor: ExtractorPlugin = {
  name: "prices",
  description: "Extract product prices from the page",
  keywords: ["price", "cost", "pricing"],
  extract: async (engine) => {
    const result = await engine.browser.evaluate(`
      JSON.stringify([...document.querySelectorAll('[class*="price"]')]
        .map(el => ({ text: el.textContent.trim() })))
    `);
    return result as string;
  },
};

const engine = new KashSearchEngine();
engine.plugins.registerExtractor(priceExtractor);
When the agent calls extract({what: "product prices"}), the keyword "price" matches and your custom extractor runs instead of the built-in logic.
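The keyword dispatch described above could look roughly like the following. This is a stand-in written for illustration: `SketchExtractor`, `SketchRegistry`, and the substring matching rule are assumptions, and the real plugin registry may select extractors differently.

```typescript
// Hypothetical stand-in for the plugin registry; not the KashSearch implementation.
interface SketchExtractor {
  name: string;
  keywords: string[];
  extract: () => Promise<string>;
}

class SketchRegistry {
  private extractors: SketchExtractor[] = [];

  registerExtractor(plugin: SketchExtractor): void {
    this.extractors.push(plugin);
  }

  // Return the first extractor with a keyword that occurs in the request text.
  find(what: string): SketchExtractor | undefined {
    const needle = what.toLowerCase();
    return this.extractors.find((plugin) =>
      plugin.keywords.some((k) => needle.indexOf(k.toLowerCase()) !== -1)
    );
  }
}

const registry = new SketchRegistry();
registry.registerExtractor({
  name: "prices",
  keywords: ["price", "cost", "pricing"],
  extract: async () => "[]",
});

const hit = registry.find("product prices");
console.log(hit ? hit.name : "built-in"); // prices
```

With this rule, a request such as "all links" finds no plugin and would fall through to the built-in extraction logic.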
CLI Usage
# Start MCP server (used by Claude Code)
kashsearch serve
kashsearch serve --headed # visible browser window
# Write MCP config for Claude Code
kashsearch init
# Standalone commands (launch browser, run, exit)
kashsearch navigate https://example.com
kashsearch act "click Sign In"
kashsearch query "how many links?"
kashsearch assert "page title contains Example"
kashsearch extract "all links"
kashsearch screenshot
kashsearch inject "document.title"
kashsearch tabs
kashsearch switch-tab 2
kashsearch watch "Sign in visible" --timeout 5000
kashsearch intercept "/api/*"
kashsearch record "start my-flow"
kashsearch replay my-flow
Configuration
MCP (Claude Code)
Run kashsearch init in your project root, or add manually to .claude/settings.json:
{
  "mcpServers": {
    "kashsearch": {
      "command": "kashsearch",
      "args": ["serve"]
    }
  }
}
Headed Mode
Pass --headed to see the browser window:
kashsearch serve --headed
Or when using the engine programmatically:
const engine = new KashSearchEngine({ headless: false });
Architecture
KashSearch has four subsystems:
              MCP Server (stdio)
                     |
              KashSearchEngine
            /      |       \        \
      Browser   Semantic   Intent    Response
      Manager   Model      Resolver  Formatter
         |      Builder       |
      Chrome       |       Action
       CDP      Element    Executor
                Registry      |
                   |        Diff
                Plugin     Engine
                Registry
- Browser Manager -- launches Chrome, manages CDP connection, handles navigation and JS evaluation
- Semantic Model Builder -- walks the DOM, compresses interactive elements into a compact notation ([btn: Sign In(e2)]), collapses navigation links into single lines
- Intent Resolver -- parses natural-language intents into action steps, resolves element references through the 5-stage pipeline
- Diff Engine -- compares consecutive page models, emits only what changed (added, removed, modified elements). When >70% of the page changed, sends the full compressed model instead
Supporting modules: ElementRegistry (stable IDs across page rebuilds), ResponseFormatter (consistent output shape), Recorder (capture/replay action sequences), PluginRegistry (custom extractors).
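As a rough illustration of the Diff Engine's behavior, the sketch below compares two page models keyed by element ID and falls back to a full model when most of the page changed. The `Model` and `Diff` shapes, the `diffModels` name, and the way the changed fraction is measured (changed IDs over the union of IDs) are assumptions for this example; only the 70% cutoff and the added/removed/modified categories come from the description above.

```typescript
// Illustrative types; the real page model is not documented in this README.
type Model = Map<string, string>; // element ID -> compact notation

interface Diff {
  full: boolean; // true when the caller should send the whole compressed model
  added: string[];
  removed: string[];
  modified: string[];
}

function diffModels(prev: Model, next: Model, threshold = 0.7): Diff {
  const added: string[] = [];
  const removed: string[] = [];
  const modified: string[] = [];

  next.forEach((text, id) => {
    const old = prev.get(id);
    if (old === undefined) added.push(id);
    else if (old !== text) modified.push(id);
  });
  prev.forEach((_text, id) => {
    if (!next.has(id)) removed.push(id);
  });

  // Assumed metric: changed IDs divided by the union of IDs in both models.
  const changed = added.length + removed.length + modified.length;
  const total = prev.size + added.length;
  const full = total > 0 && changed / total > threshold;
  return { full, added, removed, modified };
}

const before: Model = new Map<string, string>([
  ["e0", "link: Skip to content"],
  ["e1", "link: Homepage"],
  ["e2", "link: Sign in"],
  ["e3", "link: Sign up"],
]);
const after: Model = new Map<string, string>(before);
after.set("e13", "input: Username or email");
after.set("e14", "input: Password");
after.set("e15", "btn: Sign in");

const d = diffModels(before, after);
console.log(d.full, d.added); // false [ 'e13', 'e14', 'e15' ]
```

Here three of seven elements changed, which is below the cutoff, so only the added entries would be sent.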
Development
git clone https://github.com/nicktash/kashsearch.git
cd kashsearch
npm install
npm test
npm run build
npm run dev # tsc --watch
npm run test:watch # vitest in watch mode
License
MIT
