kashsearch

v0.3.0

Published

17 days ago

Browser automation for AI coding agents — intent-based, token-efficient, agent-first


lkash


KashSearch

Browser automation built for AI agents, not humans. Intent-based, token-efficient, agent-first.

Why KashSearch?

Current browser tools (Playwright MCP, agent-browser) were built for human developers and adapted for AI. They dump thousands of tokens of DOM trees and accessibility snapshots into the context window. KashSearch was designed from the agent's perspective:

20-50x fewer tokens -- compressed semantic page model instead of raw DOM dumps
Intent-based actions -- act("click Sign In") not click('[data-testid="signin-btn"]')
Diff-only responses -- after the first page load, only sends what changed
Smart queries -- assert("page title contains Dashboard") costs 5 tokens instead of the 1,000 tokens a screenshot would take

Quick Start

npm install -g kashsearch
kashsearch init  # writes MCP config to .claude/settings.json

That's it. Claude Code picks it up automatically.

How It Works

A real GitHub page through Playwright MCP's accessibility snapshot (~2,000 tokens):

- navigation "Global"
  - link "Skip to content"
  - link "Homepage" [ref=s1e3]
  - button "Toggle navigation" [ref=s1e4]
  - link "Sign in" [ref=s1e5]
  - link "Sign up" [ref=s1e6]
  - search "Search or jump to..."
    - combobox "Search or jump to..." [ref=s1e7]
  - link "Product" [ref=s1e8]
  - link "Solutions" [ref=s1e9]
  - link "Resources" [ref=s1e10]
  - link "Open Source" [ref=s1e11]
  - link "Enterprise" [ref=s1e12]
  - link "Pricing" [ref=s1e13]
- main
  - heading "Build and ship software on a single..."
  - link "Sign up for GitHub" [ref=s1e14]
  - link "Start a free enterprise trial" [ref=s1e15]
  ...400+ more nodes

The same page through KashSearch (~50 tokens):

[nav: Skip to content(e0) | Homepage(e1) | Sign in(e2) | Sign up(e3) | Product(e4) | Solutions(e5) | Resources(e6) | Open Source(e7) | Enterprise(e8) | Pricing(e9)]
[input: Search or jump to...(e10)]
[h1: Build and ship software on a single...]
[link: Sign up for GitHub(e11)]
[link: Start a free enterprise trial(e12)]

After clicking "Sign in", Playwright dumps the full page again (~2,000 more tokens). KashSearch sends only the diff:

added: [input: Username or email(e13), input: Password(e14), btn: Sign in(e15)]
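The diff step above can be sketched as a comparison of two element lists keyed by ID. This is a minimal illustration, not KashSearch's actual implementation: the `PageElement` shape, the function names, and the way the changed fraction is measured are all assumptions. The 0.7 fallback mirrors the ">70% changed" rule described under Architecture.

```typescript
// Hypothetical element shape; KashSearch's real page model is not documented here.
interface PageElement {
  id: string;   // stable element ID such as "e13"
  kind: string; // "input", "btn", "link", ...
  text: string; // visible label
}

interface PageDiff {
  added: PageElement[];
  removed: PageElement[];
  modified: PageElement[];
}

type DiffResult =
  | { type: "diff"; diff: PageDiff }
  | { type: "full"; model: PageElement[] };

// Assumed to correspond to the ">70% of the page changed" rule.
const FULL_MODEL_THRESHOLD = 0.7;

function diffModels(prev: PageElement[], next: PageElement[]): DiffResult {
  const prevById = new Map<string, PageElement>();
  for (const el of prev) prevById.set(el.id, el);
  const nextIds = new Set<string>(next.map((el) => el.id));

  const diff: PageDiff = { added: [], removed: [], modified: [] };
  for (const el of next) {
    const old = prevById.get(el.id);
    if (!old) diff.added.push(el);
    else if (old.kind !== el.kind || old.text !== el.text) diff.modified.push(el);
  }
  for (const el of prev) {
    if (!nextIds.has(el.id)) diff.removed.push(el);
  }

  // Changed fraction, measured against the larger of the two models (an assumption).
  const changed = diff.added.length + diff.removed.length + diff.modified.length;
  const total = Math.max(prev.length, next.length, 1);
  if (changed / total > FULL_MODEL_THRESHOLD) return { type: "full", model: next };
  return { type: "diff", diff };
}

// Render a diff in the compact "added: [...]" style shown above.
function formatDiff(diff: PageDiff): string {
  const fmt = (els: PageElement[]): string =>
    els.map((el) => `${el.kind}: ${el.text}(${el.id})`).join(", ");
  const lines: string[] = [];
  if (diff.added.length) lines.push(`added: [${fmt(diff.added)}]`);
  if (diff.removed.length) lines.push(`removed: [${fmt(diff.removed)}]`);
  if (diff.modified.length) lines.push(`modified: [${fmt(diff.modified)}]`);
  return lines.join("\n");
}
```

Keying on element IDs only works if IDs stay stable across page rebuilds, which is the role the README assigns to the ElementRegistry.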

Tools

Core Actions

| Tool | Params | Description | Example |
|------|--------|-------------|---------|
| navigate | url | Go to a URL | navigate({url: "https://github.com"}) |
| act | intent | Perform intent-based actions | act({intent: "click Sign In"}) |
| query | question | Ask about the page | query({question: "what is the page title?"}) |
| assert | condition | Check a page condition | assert({condition: "url contains /dashboard"}) |
| scope | region | Focus on a page region | scope({region: "navigation"}) |
| extract | what | Pull structured data | extract({what: "all links"}) |
| screenshot | region? | Capture the viewport | screenshot({region: "header"}) |
| inject | js | Run JavaScript in page | inject({js: "document.title"}) |

Navigation

| Tool | Params | Description |
|------|--------|-------------|
| back | -- | Browser back |
| forward | -- | Browser forward |
| tabs | -- | List open tabs |
| switch_tab | tabId | Switch to tab by index or ID |
| reorient | -- | Rebuild the page model from scratch |

Monitoring

| Tool | Params | Description | Example |
|------|--------|-------------|---------|
| watch | condition, timeout? | Wait for a condition | watch({condition: "Sign in visible"}) |
| intercept | pattern | Monitor network requests | intercept({pattern: "/api/users"}) |
| diff | -- | Show changes since last snapshot | |

Automation

| Tool | Params | Description | Example |
|------|--------|-------------|---------|
| record | command | Record action sequences | record({command: "start login-flow"}) |
| replay | id | Replay a recording | replay({id: "login-flow"}) |

Action Batching

The act tool parses compound intents separated by commas or "then":

act({intent: "fill Username with admin, fill Password with secret, click Sign In"})

This executes as three steps in sequence, rebuilding the page model after each one. The response includes only the final diff -- not three separate snapshots.
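The splitting step could look roughly like the sketch below. The function name `splitIntents` and the exact separator rules are assumptions for illustration; the README only states that commas and "then" act as separators.

```typescript
// Split a compound intent on commas or the standalone word "then".
// Assumption: a comma may optionally be followed by "then" (", then ").
function splitIntents(intent: string): string[] {
  return intent
    .split(/\s*,\s*(?:then\s+)?|\s+then\s+/i)
    .map((step) => step.trim())
    .filter((step) => step.length > 0);
}

const steps = splitIntents(
  "fill Username with admin, fill Password with secret, click Sign In"
);
// steps holds three entries, one per action, in the original order.
```

A splitter this simple would also break apart a value that itself contains a comma or the word "then" (for example a name typed as "Doe, Jane"), so a real parser would need quoting or escaping rules.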

Element Resolution

When you say act({intent: "click Sign In"}), KashSearch resolves "Sign In" through a 5-stage pipeline:

1. ID match -- e2 resolves directly to the registered element
2. Exact text -- case-insensitive match against element text
3. Semantic -- partial word matching across text and role ("sign" matches "Sign In")
4. Spatial -- "button near Password" uses bounding box proximity
5. Fuzzy -- Levenshtein-style similarity for typos (threshold: 0.6)

The first match wins. Confidence scores are returned so the agent can decide whether to retry with a more specific reference.
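A first-match-wins pipeline of this kind might be structured as in the sketch below. The types, function names, and confidence values are hypothetical, and the spatial stage is left out because it needs bounding boxes that this example does not model. The fuzzy stage uses plain Levenshtein distance normalized by the longer string, which is one plausible reading of "Levenshtein-style similarity".

```typescript
// Hypothetical shapes for illustration; not KashSearch's actual types.
interface Candidate {
  id: string;   // registered element ID, e.g. "e2"
  text: string; // visible label
  role: string; // e.g. "link", "btn", "input"
}

interface Resolution {
  element: Candidate;
  stage: "id" | "exact" | "semantic" | "fuzzy";
  confidence: number; // 0..1; the scoring scale is an assumption
}

// Classic edit distance, computed with a single rolling row.
function levenshtein(a: string, b: string): number {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0];
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diagonal + cost);
      diagonal = above;
    }
  }
  return row[b.length];
}

function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

const FUZZY_THRESHOLD = 0.6; // threshold quoted in the list above

function resolve(target: string, elements: Candidate[]): Resolution | null {
  const wanted = target.trim().toLowerCase();

  // Stage 1: direct ID reference such as "e2".
  const byId = elements.find((el) => el.id === target.trim());
  if (byId) return { element: byId, stage: "id", confidence: 1 };

  // Stage 2: case-insensitive exact text.
  const exact = elements.find((el) => el.text.toLowerCase() === wanted);
  if (exact) return { element: exact, stage: "exact", confidence: 1 };

  // Stage 3: partial word matching across text and role; best score wins.
  const words = wanted.split(/\s+/).filter(Boolean);
  let bestSemantic: Resolution | null = null;
  for (const el of elements) {
    const haystack = `${el.text} ${el.role}`.toLowerCase();
    const hits = words.filter((w) => haystack.includes(w)).length;
    const score = words.length ? hits / words.length : 0;
    if (score > 0 && (!bestSemantic || score > bestSemantic.confidence)) {
      bestSemantic = { element: el, stage: "semantic", confidence: score };
    }
  }
  if (bestSemantic) return bestSemantic;

  // Stage 4 (spatial) is omitted: it requires bounding boxes.

  // Stage 5: fuzzy match for typos, accepted at or above the threshold.
  let bestFuzzy: Resolution | null = null;
  for (const el of elements) {
    const score = similarity(wanted, el.text.toLowerCase());
    if (score >= FUZZY_THRESHOLD && (!bestFuzzy || score > bestFuzzy.confidence)) {
      bestFuzzy = { element: el, stage: "fuzzy", confidence: score };
    }
  }
  return bestFuzzy;
}
```

Returning the stage alongside the confidence lets a caller treat a semantic or fuzzy hit with more suspicion than an ID or exact match, which fits the retry behavior described above.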

Plugin System

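import { KashSearchEngine } from "kashsearch";
import type { ExtractorPlugin } from "kashsearch";

const priceExtractor: ExtractorPlugin = {
  name: "prices",
  description: "Extract product prices from the page",
  // The extractor is selected when an extract() request mentions one of these keywords.
  keywords: ["price", "cost", "pricing"],
  extract: async (engine) => {
    // Runs in the page: collect every element whose class contains "price"
    // and return the trimmed text of each as a JSON string.
    const result = await engine.browser.evaluate(`
      JSON.stringify([...document.querySelectorAll('[class*="price"]')]
        .map(el => ({ text: el.textContent.trim() })))
    `);
    return result as string;
  },
};

// Register the extractor so it runs instead of the built-in logic on a keyword match.
const engine = new KashSearchEngine();
engine.plugins.registerExtractor(priceExtractor);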

When the agent calls extract({what: "product prices"}), the keyword "price" matches and your custom extractor runs instead of the built-in logic.
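The keyword dispatch could work along the lines of the sketch below. This is an assumption about the matching rule, inferred only from the example where "price" matches "product prices"; the `ExtractorLike` type and `findExtractor` function are hypothetical and not part of the kashsearch API.

```typescript
// Hypothetical minimal view of an extractor; only the fields needed for matching.
interface ExtractorLike {
  name: string;
  keywords: string[];
}

// Return the first registered extractor with a keyword that appears
// (case-insensitively, as a substring) in the request text.
function findExtractor<T extends ExtractorLike>(what: string, plugins: T[]): T | undefined {
  const request = what.toLowerCase();
  return plugins.find((plugin) =>
    plugin.keywords.some((keyword) => request.includes(keyword.toLowerCase()))
  );
}

const registered: ExtractorLike[] = [
  { name: "prices", keywords: ["price", "cost", "pricing"] },
];

const match = findExtractor("product prices", registered);
// match refers to the "prices" extractor; a request such as "all links"
// matches nothing here and would fall through to the built-in logic.
```

With substring matching, registration order decides which extractor wins when two plugins share a keyword, so keywords are best kept specific.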

CLI Usage

# Start MCP server (used by Claude Code)
kashsearch serve
kashsearch serve --headed    # visible browser window

# Write MCP config for Claude Code
kashsearch init

# Standalone commands (launch browser, run, exit)
kashsearch navigate https://example.com
kashsearch act "click Sign In"
kashsearch query "how many links?"
kashsearch assert "page title contains Example"
kashsearch extract "all links"
kashsearch screenshot
kashsearch inject "document.title"
kashsearch tabs
kashsearch switch-tab 2
kashsearch watch "Sign in visible" --timeout 5000
kashsearch intercept "/api/*"
kashsearch record "start my-flow"
kashsearch replay my-flow

Configuration

MCP (Claude Code)

Run kashsearch init in your project root, or add manually to .claude/settings.json:

{
  "mcpServers": {
    "kashsearch": {
      "command": "kashsearch",
      "args": ["serve"]
    }
  }
}
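When editing the file by hand or from a script, the entry should be merged into any settings that already exist rather than overwriting them. The helper below is a sketch of that merge as a pure function; the type names and `withKashSearch` are hypothetical, and this is not necessarily what kashsearch init does internally.

```typescript
// Hypothetical types describing only the part of the settings file used here.
interface McpServerEntry {
  command: string;
  args: string[];
}

interface ClaudeSettings {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown; // any other settings are passed through untouched
}

// Return a copy of the settings with the kashsearch server entry added,
// keeping every existing server and unrelated key.
function withKashSearch(settings: ClaudeSettings): ClaudeSettings {
  return {
    ...settings,
    mcpServers: {
      ...(settings.mcpServers ?? {}),
      kashsearch: { command: "kashsearch", args: ["serve"] },
    },
  };
}

const merged = withKashSearch({
  mcpServers: { other: { command: "other-server", args: [] } },
});
// merged.mcpServers now contains both "other" and "kashsearch".
```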

Headed Mode

Pass --headed to see the browser window:

kashsearch serve --headed

Or when using the engine programmatically:

const engine = new KashSearchEngine({ headless: false });

Architecture

KashSearch has four subsystems:

                    MCP Server (stdio)
                         |
                   KashSearchEngine
                    /    |    \     \
            Browser  Semantic  Intent  Response
            Manager  Model     Resolver Formatter
               |     Builder      |
            Chrome      |     Action
             CDP    Element   Executor
                   Registry     |
                       |     Diff
                    Plugin   Engine
                   Registry

Browser Manager -- launches Chrome, manages CDP connection, handles navigation and JS evaluation
Semantic Model Builder -- walks the DOM, compresses interactive elements into a compact notation ([btn: Sign In(e2)]), collapses navigation links into single lines
Intent Resolver -- parses natural-language intents into action steps, resolves element references through the 5-stage pipeline
Diff Engine -- compares consecutive page models and emits only what changed (added, removed, modified elements). When more than 70% of the page has changed, it sends the full compressed model instead

Supporting modules: ElementRegistry (stable IDs across page rebuilds), ResponseFormatter (consistent output shape), Recorder (capture/replay action sequences), PluginRegistry (custom extractors).
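The compact notation produced by the Semantic Model Builder can be illustrated with a small serializer. The `ModelElement` shape, the `inNav` flag, and the choice to emit the navigation line first are assumptions made for this sketch; the real builder walks the DOM, which is not shown here.

```typescript
// Hypothetical element shape for illustration only.
interface ModelElement {
  id?: string;     // interactive elements carry an ID such as "e2"; headings do not
  kind: string;    // "link", "btn", "input", "h1", ...
  text: string;    // visible label
  inNav?: boolean; // true when the element sits inside a navigation region
}

function formatModel(elements: ModelElement[]): string {
  // "Sign in(e2)" for interactive elements, bare text otherwise.
  const label = (el: ModelElement): string =>
    el.id ? `${el.text}(${el.id})` : el.text;

  const lines: string[] = [];

  // Collapse all navigation links into a single line.
  const nav = elements.filter((el) => el.inNav);
  if (nav.length > 0) {
    lines.push(`[nav: ${nav.map(label).join(" | ")}]`);
  }

  // Every other element gets its own "[kind: label]" line.
  for (const el of elements) {
    if (!el.inNav) lines.push(`[${el.kind}: ${label(el)}]`);
  }
  return lines.join("\n");
}

const compact = formatModel([
  { id: "e0", kind: "link", text: "Skip to content", inNav: true },
  { id: "e1", kind: "link", text: "Homepage", inNav: true },
  { id: "e10", kind: "input", text: "Search or jump to..." },
  { kind: "h1", text: "Build and ship software on a single..." },
]);
// compact is three lines: one nav line, one input line, one h1 line.
```

Most of the token savings in this format come from dropping structural nodes and per-node indentation, and from packing sibling links onto one line.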

Development

git clone https://github.com/nicktash/kashsearch.git
cd kashsearch
npm install
npm test
npm run build

npm run dev          # tsc --watch
npm run test:watch   # vitest in watch mode

License

MIT