@duckstech/flatbrowser

v1.0.1

Published

a month ago

Semantic headless browser — any website becomes structured JSON ordered by z-index

Downloads

178

0High
0Medium
0Low

dirgogo

headless browser scraping semantic cdp chrome automation z-index

Flatbrowser

A Chrome built for automation and AI. Any website becomes structured JSON — no selectors, no scrapers, no vision models.

Flatbrowser renders pages with Chrome via CDP but returns semantic data instead of pixels. Elements are ordered by z-index (modals and toasts appear before page content), interactive vs. content is separated, and everything behind overlays is marked blocked. Works as a CLI and a programmatic library.

import { flatbrowser } from "@duckstech/flatbrowser";

const fb = flatbrowser();
const s = fb.session("login");

await s.navigate("https://app.example.com/login");
await s.action("label=Email", "fill", "[email protected]");
await s.action("label=Password", "fill", "secret");
await s.action("text=Sign in", "click");
await s.waitForText("Dashboard");

console.log(s.recording());       // every step auto-recorded
console.log(s.exportPuppeteer()); // export as a replayable script

await fb.close();

Why Flatbrowser

| Most tools... | Flatbrowser... | |---|---| | Return pixels (need vision models) | Returns semantic JSON (LLMs read directly) | | Require CSS selectors | Accepts text=Login, role=button, label=Email | | Fight async onclick handlers | Uses CDP input events — no page.evaluate() hangs | | Force you to pass Page everywhere | Session encapsulates everything | | Manual recording | Auto-records every operation | | Ignore z-index | First-class overlay and stacking context handling |

Installation

npm install @duckstech/flatbrowser
# or: bun add @duckstech/flatbrowser / pnpm add @duckstech/flatbrowser / yarn add @duckstech/flatbrowser

Requires Node.js 20+ and Chrome or Chromium installed. Flatbrowser auto-detects Chrome on Linux, macOS, and Windows. Override with CHROME_PATH if needed.

# If Chrome is not in a standard location
export CHROME_PATH=/path/to/chrome

# Or install Chrome programmatically
npx @puppeteer/browsers install chrome@stable

CLI

# Extract any page as structured JSON
npx flatbrowser navigate https://example.com

# Human-readable for terminal or LLM input
npx flatbrowser navigate https://example.com --format text

# Only interactive elements (compact)
npx flatbrowser navigate https://example.com --compact

# Interactive shell with persistent session
npx flatbrowser shell

12 commands: navigate, action, read, screenshot, find, evaluate, console, network, storage, wait, session, shell. See CLI.md for the full reference.

Library

Quick start

import { flatbrowser } from "@duckstech/flatbrowser";

const fb = flatbrowser();

// One-shot: uses the "default" session
const page = await fb.navigate("https://example.com");
console.log(page.title, page.layers);

await fb.close();

Session-based workflow

const fb = flatbrowser();
const s = fb.session("checkout");

await s.navigate("https://shop.example.com/cart");
await s.action("text=Add to cart", "click");
await s.action("text=Checkout", "click");
await s.waitForSelector("form[name=payment]");

// Everything auto-recorded — no store.record() calls
const script = s.exportPuppeteer();
fs.writeFileSync("checkout-flow.ts", script);

await fb.close();

Multi-session (parallel)

Each session has its own page, console log, network log, and recording. Safe for Promise.all.

const fb = flatbrowser();
const buyer = fb.session("buyer");
const seller = fb.session("seller");

await Promise.all([
  buyer.navigate("https://marketplace.com/buy"),
  seller.navigate("https://marketplace.com/sell"),
]);

await fb.close();

Feeding pages to an LLM

import { flatbrowser, formatPageForAI } from "@duckstech/flatbrowser";

const fb = flatbrowser();
const s = fb.session();

const page = await s.navigate("https://news.ycombinator.com");
const prompt = formatPageForAI(page, { compact: true });

// Send `prompt` to Claude, GPT, Gemini — they see the page as text
// e.g. "[a] selector=... 'Show HN: ...' → /item?id=..."

await fb.close();

Recording, export, and replay

const fb = flatbrowser();
const s = fb.session("record-me");

await s.navigate("https://site.com");
await s.action("text=Next", "click");

// Persist the recording
fs.writeFileSync("flow.json", s.exportJson());
fs.writeFileSync("flow.ts", s.exportPuppeteer()); // standalone Puppeteer script

// Later — replay in a fresh session
const replayer = fb.session("replay");
replayer.loadRecording("flow.json");
await replayer.replay();

await fb.close();

Semantic targeting

No CSS selectors required:

| Prefix | Finds by | Example | |---|---|---| | text= | Visible text | text=Sign in | | role= | ARIA role + optional name | role=button[Submit] | | label= | Associated <label> | label=Email | | placeholder= | Placeholder attr | placeholder=Search | | title= | title attr | title=Close | | (no prefix) | Treated as CSS selector | #main > .item |

FlatSession API

All operations are auto-recorded and return the session's new page state where applicable.

s.navigate(url, options?)          // → FlatPage
s.read()                           // → FlatPage
s.action(target, action, value?)   // → FlatPage
s.actions(steps[])                 // → FlatPage (batch)
s.find(target)                     // → { found, selector?, tag?, text? }

s.waitForSelector(sel, { timeout })
s.waitForHidden(sel, { timeout })
s.waitForText(text, { timeout })
s.waitForNetwork({ timeout })

s.evaluate(expression)             // → unknown (JS in page)
s.screenshot({ selector?, fullPage? })  // → Buffer
s.storage({ type, action, key?, value? }) // cookies / localStorage / sessionStorage

s.emulate({ locale?, timezone?, geolocation?, touchEnabled? })

s.console(options?)                // → ConsoleEntry[]
s.network(options?)                // → NetworkEntry[]
s.responseBody(requestId)          // → { body, base64Encoded } | null

s.recording()                      // → RecordedStep[]
s.exportJson()                     // → string
s.exportPuppeteer()                // → string
s.loadRecording(filePath)          // → number (steps loaded)
s.replay()                         // → FlatPage

s.close()                          // close this session's page
s.page()                           // → Puppeteer Page (escape hatch)
s.cdpSession()                     // → CDPSession (escape hatch)

Low-level API

For cases the session API doesn't cover, the standalone functions are still exported:

import {
  newPage,
  extractPage,
  performAction,
  findElement,
  resolveTarget,
  getCDPSession,
  captureScreenshot,
  setEmulation,
} from "@duckstech/flatbrowser";

The `FlatPage` format

{
  url: string,
  title: string,
  layers: [
    {
      zIndex: number,
      isOverlay: boolean,   // true if this layer covers >50% of viewport
      content: FlatElement[],      // h1, p, img, etc.
      interactive: FlatElement[],  // a, button, input, etc.
    },
    // ... sorted by zIndex, highest first
  ],
  forms: [
    { id?, action?, method?, fields: FlatElement[] },
  ],
  timestamp: string,
}

Each FlatElement has tag, role?, text?, href?, rect, zIndex, blocked, selector (unique CSS), plus tag-specific fields (value, checked, placeholder, etc.).

How it works

Chrome renders the page (via CDP, not Puppeteer abstractions).
Flatbrowser injects a vanilla JS extraction script in an isolated world — traverses the DOM, computes effective z-index per element (respects position, opacity, transform, will-change), generates unique selectors, and groups into layers.
Elements beneath overlays (>50% viewport) are marked blocked.
Actions (click, fill, hover) dispatch trusted input events via CDP — avoids the page.evaluate() deadlock when onclick handlers call fetch.
If the main context is corrupted, a DOMSnapshot fallback runs with no JS execution.

License

MIT