navia-ai
v0.25.3
Autonomous AI browser agent (Claude). Opens a real Chrome or Firefox, reads the page, and operates it from a single natural-language instruction. Includes a CDP anti-Cloudflare trick.
███╗ ██╗ █████╗ ██╗ ██╗██╗ █████╗
████╗ ██║██╔══██╗██║ ██║██║██╔══██╗
██╔██╗ ██║███████║██║ ██║██║███████║
██║╚██╗██║██╔══██║╚██╗ ██╔╝██║██╔══██║
██║ ╚████║██║ ██║ ╚████╔╝ ██║██║ ██║
╚═╝ ╚═══╝╚═╝ ╚═╝ ╚═══╝ ╚═╝╚═╝ ╚═╝
🌐 Navia
Automate any repetitive task on any web portal — in plain language. Fill and submit forms, update records, create entries, download reports, move data between systems, extract tables… Navia opens a real browser, logs in (solving text captchas locally for free), and does the busywork for you. Just like a person — but tireless.
npm i -g navia-ai && navia
> [!NOTE]
> Works with your Anthropic (Claude) API key, or with no key at all, using the claude/ant CLI already signed in on your terminal. No per-site scripts: the AI discovers buttons and fields live.
📑 Table of contents
- What can you automate?
- Why Navia
- Quick start
- How it works
- The login + captcha flow
- CLI usage
- Credentials, 2FA & sessions
- Per-domain memory
- No API key (terminal AI)
- Deterministic macros
- Structured extraction
- Library usage
- MCP server
- Engines
- Responsible use
💡 What can you automate?
Anything you'd do by hand in a web portal, described in one sentence:
navia "log into my-portal.com and fill the new-client form with: name Ada Lovelace, email [email protected], plan Pro"
navia "update my profile phone number to +52 55 1234 5678 and save"
navia "download every invoice from this quarter into my Downloads folder"
navia "go through the pending tickets and mark as resolved the ones older than 30 days"
navia "register these 20 rows from a CSV as new products" --record macro.jsonl # then replay daily, free
navia extract "all clients with name, email and status" --url ... --schema clients.json # web → typed JSON
…forms, data entry, updates, bulk actions, downloads, scraping to JSON, moving info between systems: the boring repetitive stuff. The login (and its captcha) is just the first step Navia handles on the way.
✨ Why Navia
| | |
|---|---|
| 🧠 One instruction, not a script | Describe the task in plain language; Navia discovers the buttons/fields and does the steps. No per-site coding. |
| 📝 Forms & data entry on autopilot | Fills inputs, dropdowns, checkboxes, uploads files, submits, and confirms it worked — across multi-step flows. |
| 🔁 Do it once, repeat forever | Record a flow and replay it daily with no LLM, no API key (free & fast). Self-heals if the site changes. |
| 🪄 Zero setup, nothing to remember | Auto-detects login, auto-downloads the browser, auto-installs the captcha reader. You just answer the task. |
| 🔓 Text captchas solved automatically & free | Local OCR reads "PCF53"-style captchas on your machine — no paid service, no API, not the LLM. On by default. |
| 🔐 Secrets the model never sees | Encrypted vault for passwords/2FA, domain-bound (anti-phishing). Injected locally, outside the prompt. |
| 🛡️ Anti-Cloudflare built in | --browser chrome connects via CDP to your real Chrome → navigator.webdriver=false. Not evasion — it's your own browser. |
| 👁️ Reads like a human | Accessibility tree (not pixels), traverses shadow DOM + cross-origin iframes, stable versioned refs. |
| 🎛️ Four primitives — a dial | agent (autonomous), observe (propose), act (run one, no LLM), extract (typed JSON). |
| 💬 Conversation mode | Keeps the browser + session open and takes follow-up commands — do task after task without re-logging in. |
| 📦 CLI + library + MCP server | TypeScript/ESM. Use it from the terminal, your code, or inside Claude Desktop/Code/Cursor. |
🚀 Quick start
npm i -g navia-ai # install once → use the `navia` command
navia # launches the guided wizard
On the first run Navia downloads the browser by itself if missing (no manual playwright install) and installs the local captcha reader on demand. Optionally, set an API key for faster runs (vision + prompt caching); without it, Navia uses the claude/ant CLI on your terminal:
ANTHROPIC_API_KEY=sk-ant-... npx navia-ai "open example.com and tell me what the page is about"
Run navia doctor anytime to check your environment.
🔧 How it works
flowchart LR
U([Your instruction]) --> A
subgraph Loop["BrowserAgent · tool-use loop"]
A["🧠 Claude / CLI"] -->|"navigate, click, type, fill_credential…"| D[BrowserDriver]
D -->|"accessibility snapshot + change-observation"| A
end
D --> E{Engine}
E -->|CDP| C[Real Chrome 🔑]
E --> CH[Chromium]
E --> FF[Firefox]
E --> PR[patchright 🥷]
C & CH & FF & PR --> W([🌐 The website])
- snapshot = accessibility tree, one ref per element (the AI acts by ref).
  - Chromium/Chrome: built with CDP (Accessibility.getFullAXTree): doesn't mutate the DOM, traverses shadow DOM and iframes (cross-origin/OOPIF like Turnstile via a dedicated CDP session), refs are stable (backendNodeId).
  - Firefox: JS-injection snapshot as fallback.
  - refs are versioned (v<N>:id): using a stale ref from an old snapshot is rejected instead of hitting the wrong node.
- evaluate runs JS for bulk extraction or stubborn clicks (gate it off with --no-eval). batch_actions runs several actions in one tool call.
- detectChallenge recognizes anti-bot walls (Cloudflare/Turnstile/hCaptcha/reCAPTCHA/DataDome).
- The system prompt treats all page content as untrusted data, never instructions (prompt-injection spotlighting).
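The versioned-ref rule above (`v<N>:id`, stale refs rejected) can be illustrated with a small standalone sketch. This is not Navia's implementation: `RefRegistry`, its methods, and the numeric node ids (standing in for something like a `backendNodeId`) are hypothetical names chosen for illustration.

```typescript
// Hypothetical sketch: mint versioned refs ("v<N>:id") per snapshot and
// refuse any ref that was minted by an older snapshot.
type NodeId = number;

class RefRegistry {
  private version = 0;
  private nodes = new Map<string, NodeId>();

  // Take a new snapshot: bump the version and mint one ref per node.
  snapshot(nodeIds: NodeId[]): string[] {
    this.version += 1;
    this.nodes.clear();
    return nodeIds.map((id) => {
      const ref = `v${this.version}:${id}`;
      this.nodes.set(ref, id);
      return ref;
    });
  }

  // Resolve a ref, rejecting anything malformed, stale, or unknown.
  resolve(ref: string): NodeId {
    const match = /^v(\d+):(\d+)$/.exec(ref);
    if (!match) throw new Error(`malformed ref: ${ref}`);
    if (Number(match[1]) !== this.version) {
      throw new Error(`stale ref ${ref}: current snapshot is v${this.version}`);
    }
    const id = this.nodes.get(ref);
    if (id === undefined) throw new Error(`unknown ref: ${ref}`);
    return id;
  }
}

const registry = new RefRegistry();
const oldRefs = registry.snapshot([101, 102]); // ["v1:101", "v1:102"]
registry.snapshot([101, 103]); // page changed: current refs are now v2
// registry.resolve("v1:101") would now throw a "stale ref" error.
```

Failing loudly on a stale ref means the agent has to take a fresh snapshot, instead of silently acting on whatever node now sits where the old one was.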
🔓 The login + captcha flow
Most portal automation starts behind a login, which is the step that usually breaks other tools. Navia handles it automatically and deterministically, with a hard retry cap instead of open-ended loops, so it can get on to the actual task (the form, the update, the report):
flowchart TD
S([Login page]) --> U[Type username]
U --> P["fill_credential password — never seen by the model"]
P --> SUB{About to submit?}
SUB -->|"captcha empty"| OCR["🔓 Local OCR reads the captcha<br/>ddddocr · free · on your machine"]
OCR --> CL["Click 'Sign in' — same step"]
SUB -->|"no captcha"| CL
CL --> V{assessLoginOutcome}
V -->|"private URL + logout link + no error"| OK([✅ Logged in])
V -->|"still on login / error"| RETRY["Re-type & retry · max 2-3 · then stop honestly"]
RETRY --> SUB
OCR -.->|"cannot read / disabled"| HUMAN["🙋 Hand the window to you"]
- Text captchas → solved automatically by local OCR before submitting (default --captcha local).
- Empty captcha → submit is blocked (no blind sends, no infinite loops; hard retry cap).
- Interactive captchas (reCAPTCHA grid, hCaptcha, sliders) & 2FA → handed to you.
- Success is verified — Navia won't claim "logged in" unless it really is.
The LLM is never asked to "solve" a captcha (Claude declines that by policy). The OCR is a separate, dedicated, local tool — for your own authorized accounts.
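The control flow of the flowchart (read the captcha first, never submit it empty, cap the retries, verify the outcome) can be sketched as a plain loop. Everything here is a hypothetical model of that flow, not Navia's code: `LoginPage`, `login`, and the result shape are invented for illustration, and the OCR and browser are replaced by callbacks.

```typescript
// Hypothetical model of the login flow: names and shapes are illustrative only.
type LoginOutcome = "logged_in" | "still_on_login";
type LoginResult = { status: "ok" | "failed" | "needs_human"; attempts: number };

interface LoginPage {
  hasCaptcha: boolean;
  readCaptcha(): string | null; // stands in for local OCR; null means "could not read"
  submit(captcha: string | null): LoginOutcome; // stands in for click + outcome check
}

// `attempts` in the result counts how many submits were actually made.
function login(page: LoginPage, maxAttempts = 3): LoginResult {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    let captcha: string | null = null;
    if (page.hasCaptcha) {
      captcha = page.readCaptcha();
      // Never submit with an empty captcha: hand the window to the user instead.
      if (!captcha) return { status: "needs_human", attempts: attempt - 1 };
    }
    if (page.submit(captcha) === "logged_in") {
      return { status: "ok", attempts: attempt };
    }
  }
  // Hard cap reached: stop and report failure rather than loop forever.
  return { status: "failed", attempts: maxAttempts };
}
```

The two early exits are the point of the design: an unreadable captcha goes to a human, and a wrong password stops after a bounded number of tries.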
🖥️ CLI usage
# Guided wizard (recommended): just run navia
navia
# → asks the start URL, auto-detects login, asks user + hidden password,
# the task, the browser, and where to save the journal. Captcha is automatic.
# Conversational: keeps the session open and asks "what now?". Press ESC to quit.
# Direct task
navia "search 't-shirts' on example-shop.com and list the first 5 with prices"
# Conversation mode for a one-off too (stays open, asks for the next)
navia run "explore this site and map its sections" --chat
# Cloudflare-walled sites → real Chrome via CDP
navia chrome # 1) launch Chrome with debugging
navia run "search jobs on {portal}" --browser chrome # 2) the task
navia "..." --browser firefox|chrome|patchright # engine (default chromium)
navia "..." --headless # no visible window
navia "..." --slow-mo 300 # go slow (anti rate-limit)
navia "..." --start-url https://... # open a URL before starting
navia "..." --model claude-opus-4-8 # another model
navia "..." --workspace # per-task log/brain folder (asks where)
navia "..." --validate # an LLM judge re-checks the result and retries once
navia "..." --captcha off # disable local captcha OCR (default: local)
navia "..." --no-eval # disable the evaluate JS tool (untrusted sites)
navia "..." --allow-domain example.com # network allow-list (repeatable, anti-exfiltration)
navia "..." --yes # auto-approve irreversible actions (TEST ONLY)
Set your defaults once · scaffold a project
navia init # save model/engine/profile/provider to ~/.navia/config.json
navia create my-bot # scaffold: navia.config.json, .env.example, tasks.txt, run.mjs
Precedence: CLI flag > env var > ~/.navia/config.json > built-in default.
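The precedence rule reads naturally as a chain of fallbacks. The sketch below is only an illustration of that rule: `resolveSettings`, the `Settings` shape, the environment variable names `NAVIA_MODEL` and `NAVIA_BROWSER`, and the `"default-model"` placeholder are assumptions, not documented Navia names. Only the `chromium` default comes from this README.

```typescript
// Hypothetical sketch of "CLI flag > env var > config file > built-in default".
type Settings = { model?: string; browser?: string };

function resolveSettings(
  flags: Settings,
  env: Record<string, string | undefined>, // e.g. a copy of the process environment
  fileConfig: Settings, // e.g. parsed ~/.navia/config.json
): Required<Settings> {
  const defaults: Required<Settings> = { model: "default-model", browser: "chromium" };
  return {
    // NAVIA_MODEL / NAVIA_BROWSER are assumed variable names, for illustration only.
    model: flags.model ?? env["NAVIA_MODEL"] ?? fileConfig.model ?? defaults.model,
    browser: flags.browser ?? env["NAVIA_BROWSER"] ?? fileConfig.browser ?? defaults.browser,
  };
}

const resolved = resolveSettings(
  { model: "flag-model" },
  { NAVIA_BROWSER: "firefox" },
  { model: "file-model", browser: "chrome" },
);
// resolved is { model: "flag-model", browser: "firefox" }
```

Using `??` rather than `||` means only a missing value falls through to the next source.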
🔐 Credentials, 2FA & sessions
Store passwords / 2FA in an encrypted vault; the AI uses them by key but never sees the value:
navia secret set shop.password # prompts, hidden
navia secret set shop.password --origin https://accounts.x.com # bind it: only fills on this origin
navia secret totp shop.2fa # TOTP base32 from your authenticator
navia secret list # keys only, no values
In a task the AI uses fill_credential(ref, "shop.password") / fill_totp(ref, "shop.2fa"); the real value is injected locally, outside the prompt.
- 🔒 Encrypted by default (AES-256-GCM, auto-key at ~/.navia/key). Set NAVIA_SECRET for your own passphrase (key never touches disk).
- 🎯 Domain binding (anti-phishing): with --origin, the secret fills only when the element's real frame origin matches; typing your password into an unexpected/cross-origin frame is hard-rejected.
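Domain binding comes down to comparing origins (scheme, host, and port) rather than matching substrings. A minimal sketch, assuming a hypothetical `BoundSecret` record and `canFill` helper; treating an unbound secret as unrestricted is also an assumption, not something this README states.

```typescript
// Hypothetical sketch of an origin check before a secret is filled.
interface BoundSecret {
  key: string;
  origin?: string; // set when the secret was stored with --origin
}

function canFill(secret: BoundSecret, frameUrl: string): boolean {
  // Assumption: a secret stored without --origin has no origin restriction.
  if (secret.origin === undefined) return true;
  try {
    // Compare full origins (scheme + host + port), never substrings.
    return new URL(frameUrl).origin === new URL(secret.origin).origin;
  } catch {
    return false; // unparsable URL: refuse rather than guess
  }
}

const shopPassword: BoundSecret = { key: "shop.password", origin: "https://accounts.x.com" };
const sameOrigin = canFill(shopPassword, "https://accounts.x.com/login");
const lookalike = canFill(shopPassword, "https://accounts.x.com.evil.example/login");
```

A lookalike host that merely starts with the trusted name has a different origin, so `lookalike` is `false` while `sameOrigin` is `true`.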
navia login my-portal --start-url https://my-portal.com/login # sign in once, save the profile
navia run "download my latest invoice" --profile my-portal # reuse it, already authenticated
Profiles live in ~/.navia/profiles/ (gitignored), encrypted.
🧠 Per-domain memory (playbooks)
Navia learns reusable "operating tips" per site and re-injects them next time it visits — so it stops rediscovering each site from scratch.
navia playbook add example.com --note "the 'Sign in' button enables only after re-typing the email"
navia playbook show example.com
navia playbook list
Tips are also captured automatically from your wait_for_human notes. Disable with --no-memory. Stored in ~/.navia/playbooks/.
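Per-domain memory can be pictured as notes keyed by hostname that get prepended to the prompt on the next visit. The sketch below is a rough in-memory illustration only: `PlaybookStore`, its methods, and the prompt wording are hypothetical, and the on-disk format under ~/.navia/playbooks/ is not described in this README.

```typescript
// Hypothetical in-memory model of per-domain tips (not Navia's storage format).
class PlaybookStore {
  private tips = new Map<string, string[]>();

  // Record a tip for a domain, ignoring exact duplicates.
  add(domain: string, note: string): void {
    const key = domain.toLowerCase();
    const notes = this.tips.get(key) ?? [];
    if (notes.indexOf(note) === -1) notes.push(note);
    this.tips.set(key, notes);
  }

  // Build the text that would be re-injected when visiting `url`.
  promptFor(url: string): string {
    const host = new URL(url).hostname.toLowerCase();
    const notes = this.tips.get(host) ?? [];
    if (notes.length === 0) return "";
    return `Known tips for ${host}:\n` + notes.map((n) => `- ${n}`).join("\n");
  }
}

const playbooks = new PlaybookStore();
playbooks.add("example.com", "the 'Sign in' button enables only after re-typing the email");
const injected = playbooks.promptFor("https://example.com/login");
```

Keying on the hostname keeps tips from one site from leaking into prompts for another.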
🔑 No API key — use your terminal's AI CLI
Navia can "think" with an AI CLI already authenticated on your terminal, with no ANTHROPIC_API_KEY:
navia run "..." --provider claude-cli # uses `claude` (Claude Code)
navia run "..." --provider claude-cli --cli-command ant # recommended: Anthropic CLI
- auto (default): API key if present; otherwise the claude CLI.
- ant recommended: ant auth login once → clean single-shot completion over your login. claude works as a slower fallback.
- Any other terminal AI: NAVIA_CLI_CMD="my-cli --flags".
CLI mode spawns one process per step → slower than --provider api, but needs no key. With the claude/ant CLI, Navia can also pass the captcha image to it for tasks that need vision.
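The `auto` behaviour described above (API key if present, otherwise the CLI) amounts to a small decision function. A sketch, assuming a hypothetical `pickProvider` helper; the provider names `api` and `claude-cli` and the `ANTHROPIC_API_KEY` variable are taken from this README, and treating a blank key as absent is an assumption.

```typescript
// Hypothetical sketch of provider selection for --provider auto.
type Provider = "api" | "claude-cli";
type ProviderFlag = Provider | "auto";

function pickProvider(
  flag: ProviderFlag,
  env: Record<string, string | undefined>,
): Provider {
  // An explicit choice always wins.
  if (flag !== "auto") return flag;
  // Assumption: an empty or whitespace-only key counts as "no key".
  const key = env["ANTHROPIC_API_KEY"];
  return key !== undefined && key.trim() !== "" ? "api" : "claude-cli";
}

const withKey = pickProvider("auto", { ANTHROPIC_API_KEY: "sk-ant-placeholder" }); // "api"
const withoutKey = pickProvider("auto", {}); // "claude-cli"
```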
🔁 Deterministic macros (record & replay, no AI)
Record once, replay forever with no LLM and no API key — fast and free. Replay uses stable locators (role + name) and self-heals if the site drifts:
navia "sign in and download this month's invoice" --record ./invoice.jsonl
navia replay ./invoice.jsonl --profile my-portal
Secrets aren't stored in the macro: fill_credential/fill_totp are re-injected fresh from the vault each replay.
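To make the "secrets aren't stored in the macro" point concrete, here is a sketch of a recorded flow that keeps only role + name locators and a secret key, with the value looked up at replay time. The JSONL step format, `parseMacro`, and `materialize` are invented for illustration; the real macro file format is not documented in this README.

```typescript
// Hypothetical macro step format: locators by role + name, secrets by key only.
type MacroStep =
  | { action: "click"; role: string; name: string }
  | { action: "type"; role: string; name: string; text: string }
  | { action: "fill_credential"; role: string; name: string; secretKey: string };

// Parse one JSON object per non-empty line.
function parseMacro(jsonl: string): MacroStep[] {
  return jsonl
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as MacroStep);
}

// Describe what replay would do, pulling secrets from the vault at replay time.
function materialize(steps: MacroStep[], vault: Map<string, string>): string[] {
  return steps.map((step) => {
    const target = `${step.role}[name="${step.name}"]`;
    switch (step.action) {
      case "click":
        return `click ${target}`;
      case "type":
        return `type ${target} "${step.text}"`;
      case "fill_credential": {
        const secret = vault.get(step.secretKey);
        if (secret === undefined) throw new Error(`missing secret: ${step.secretKey}`);
        return `fill ${target} <${secret.length} hidden chars>`;
      }
    }
  });
}

const recorded = [
  '{"action":"type","role":"textbox","name":"Username","text":"ada"}',
  '{"action":"fill_credential","role":"textbox","name":"Password","secretKey":"shop.password"}',
  '{"action":"click","role":"button","name":"Sign in"}',
].join("\n");

const plan = materialize(parseMacro(recorded), new Map([["shop.password", "s3cret"]]));
```

The recorded text never contains the password, so the macro file can be shared or committed while the vault stays local.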
🧱 Structured extraction (web → typed JSON)
Get schema-validated data: Navia forces the model to answer through a tool whose schema is your schema (with retry). Requires an API key.
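The validate-then-retry idea can be sketched independently of any model call. This is a hand-rolled illustration, not how Navia validates: `isExtraction`, `extractWithRetry`, and the `callModel` callback are hypothetical, and the item shape (a required `title`, an optional numeric `points`) is borrowed from the library example in this section.

```typescript
// Hypothetical sketch: accept a model answer only if it matches the expected shape.
type Item = { title: string; points?: number };
type Extraction = { items: Item[] };

// Minimal structural check for { items: [{ title: string, points?: number }] }.
function isExtraction(value: unknown): value is Extraction {
  if (typeof value !== "object" || value === null) return false;
  const items = (value as { items?: unknown }).items;
  if (!Array.isArray(items)) return false;
  return items.every((item) => {
    if (typeof item !== "object" || item === null) return false;
    const { title, points } = item as { title?: unknown; points?: unknown };
    return typeof title === "string" && (points === undefined || typeof points === "number");
  });
}

// Ask again (up to maxAttempts) until the answer passes validation.
function extractWithRetry(
  callModel: (attempt: number) => unknown, // stand-in for the real model/tool call
  maxAttempts = 2,
): Extraction {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const candidate = callModel(attempt);
    if (isExtraction(candidate)) return candidate;
  }
  throw new Error(`no schema-valid answer after ${maxAttempts} attempts`);
}
```

In practice a JSON Schema validator would replace the hand-written check; the retry loop around it stays the same.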
navia extract "the first 5 products with name and price" --url https://example-shop.com --schema ./schema.json
import { extract } from "navia-ai";
const data = await extract({
url: "https://news.example.com",
instruction: "the top 5 headlines with title and points",
schema: {
type: "object",
properties: {
items: { type: "array", items: { type: "object",
properties: { title: { type: "string" }, points: { type: "number" } }, required: ["title"] } },
},
required: ["items"],
},
});
📊 Reliability & evals
Every run reports metrics beyond pass/fail (steps, tokens, recoveries, repeated-action loops). Benchmark on live-site tasks with an LLM judge:
navia eval --dataset ./tasks.jsonl --report ./report.json # Online-Mind2Web-ish; ships a sample set
🧑‍💻 Library usage
import { runNavia } from "navia-ai";
const { summary, steps, metrics } = await runNavia({
task: "Open example.com and extract all the main-menu links",
browser: "chromium",
validate: true,
hooks: { log: (m) => console.log(m) },
});
console.log(summary, metrics); // steps, toolCalls, toolErrors, tokensIn/Out, recoveries, loopHits
See candidate actions without running them, then run exactly one by ref, with no extra LLM call.
import { BrowserDriver, observe, act } from "navia-ai";
const driver = await BrowserDriver.create({ engine: "chromium" });
await driver.navigate("https://example.com");
const actions = await observe({ instruction: "the 'More information' link", driver });
await act(actions[0], { driver }); // deterministic, no LLM
// or one-shot: await act("click 'More information'", { driver });
🔌 As an MCP server (Claude Desktop / Code / Cursor)
Navia exposes its browser tools as an MCP server — the client's model drives them (CDP snapshot, stable refs, captcha detection, profiles, vault).
Claude Code:
claude mcp add navia -- npx -y navia-ai mcp --browser chromium
Claude Desktop / Cursor (JSON):
{ "mcpServers": { "navia": { "command": "npx", "args": ["-y", "navia-ai", "mcp", "--browser", "chromium"] } } }
🔐 Secure credential elicitation: if a task needs a vault secret that isn't stored, the server asks you through your client's secure prompt (MCP elicitation) and saves it encrypted, never through the model.
🧭 Browser engines
| Engine | When to use it |
|---|---|
| chromium (default) | Most sites. |
| firefox | Alternative; some portals behave better. |
| chrome (CDP) | 🔑 Cloudflare-walled sites. Launches your real Chrome and connects via CDP. |
| patchright | 🥷 Anti-detection without pre-opening Chrome (removes the Runtime.enable leak). Opt-in: npm i patchright. |
⚠️ Responsible use
Navia drives a real browser with your credentials and session. Use it only on sites and accounts you own or are authorized to access, respecting their Terms of Service. The CDP mode does not forcibly bypass protections — it uses your real browser. Navia bundles no third-party (paid) captcha-solving services; the local OCR is a dedicated tool for your own authorized login, and interactive/behavioral captchas + 2FA are always handed to you.
Made with ❤️ for people tired of doing the same portal busywork every day.
