navia-ai

v0.25.3

Published

12 hours ago

Agente de navegador autónomo con IA (Claude). Abre Chrome o Firefox reales, lee la página y la opera con una sola instrucción en lenguaje natural. Incluye truco CDP anti-Cloudflare.

Downloads

3,836

0High
0Medium
0Low

samael-azull1

browser-automation ai-agent claude anthropic playwright web-automation cloudflare cdp chrome firefox llm agent

███╗   ██╗ █████╗ ██╗   ██╗██╗ █████╗
████╗  ██║██╔══██╗██║   ██║██║██╔══██╗
██╔██╗ ██║███████║██║   ██║██║███████║
██║╚██╗██║██╔══██║╚██╗ ██╔╝██║██╔══██║
██║ ╚████║██║  ██║ ╚████╔╝ ██║██║  ██║
╚═╝  ╚═══╝╚═╝  ╚═╝  ╚═══╝  ╚═╝╚═╝  ╚═╝

🌐 Navia

Automate any repetitive task on any web portal — in plain language. Fill and submit forms, update records, create entries, download reports, move data between systems, extract tables… Navia opens a real browser, logs in (solving text captchas locally for free), and does the busywork for you. Just like a person — but tireless.

npm i -g navia-ai   &&   navia

[!NOTE] Works with your Anthropic (Claude) API key — or with no key at all, using the claude/ant CLI already signed in on your terminal. No per-site scripts: the AI discovers buttons and fields live.

📑 Table of contents

💡 What can you automate?

Anything you'd do by hand in a web portal, described in one sentence:

navia "log into my-portal.com and fill the new-client form with: name Ada Lovelace, email [email protected], plan Pro"
navia "update my profile phone number to +52 55 1234 5678 and save"
navia "download every invoice from this quarter into my Downloads folder"
navia "go through the pending tickets and mark as resolved the ones older than 30 days"
navia "register these 20 rows from a CSV as new products" --record macro.jsonl   # then replay daily, free
navia extract "all clients with name, email and status" --url ... --schema clients.json   # web → typed JSON

…forms, data entry, updates, bulk actions, downloads, scraping to JSON, moving info between systems — the boring repetitive stuff. The login (and its captcha) is just the first step Navia handles on the way.

✨ Why Navia

| | | |---|---| | 🧠 One instruction, not a script | Describe the task in plain language; Navia discovers the buttons/fields and does the steps. No per-site coding. | | 📝 Forms & data entry on autopilot | Fills inputs, dropdowns, checkboxes, uploads files, submits, and confirms it worked — across multi-step flows. | | 🔁 Do it once, repeat forever | Record a flow and replay it daily with no LLM, no API key (free & fast). Self-heals if the site changes. | | 🪄 Zero setup, nothing to remember | Auto-detects login, auto-downloads the browser, auto-installs the captcha reader. You just answer the task. | | 🔓 Text captchas solved automatically & free | Local OCR reads "PCF53"-style captchas on your machine — no paid service, no API, not the LLM. On by default. | | 🔐 Secrets the model never sees | Encrypted vault for passwords/2FA, domain-bound (anti-phishing). Injected locally, outside the prompt. | | 🛡️ Anti-Cloudflare built in | --browser chrome connects via CDP to your real Chrome → navigator.webdriver=false. Not evasion — it's your own browser. | | 👁️ Reads like a human | Accessibility tree (not pixels), traverses shadow DOM + cross-origin iframes, stable versioned refs. | | 🎛️ Four primitives — a dial | agent (autonomous), observe (propose), act (run one, no LLM), extract (typed JSON). | | 💬 Conversation mode | Keeps the browser + session open and takes follow-up commands — do task after task without re-logging in. | | 📦 CLI + library + MCP server | TypeScript/ESM. Use it from the terminal, your code, or inside Claude Desktop/Code/Cursor. |

🚀 Quick start

npm i -g navia-ai     # install once → use the `navia` command
navia                 # launches the guided wizard

On the first run Navia downloads the browser by itself if missing (no manual playwright install) and installs the local captcha reader on demand. Optionally, set an API key for faster runs (vision + prompt caching); without it, Navia uses the claude/ant CLI on your terminal:

ANTHROPIC_API_KEY=sk-ant-...

npx navia-ai "open example.com and tell me what the page is about"

Run navia doctor anytime to check your environment.

🔧 How it works

flowchart LR
    U([Your instruction]) --> A
    subgraph Loop["BrowserAgent · tool-use loop"]
      A["🧠 Claude / CLI"] -->|"navigate, click, type, fill_credential…"| D[BrowserDriver]
      D -->|"accessibility snapshot + change-observation"| A
    end
    D --> E{Engine}
    E -->|CDP| C[Real Chrome 🔑]
    E --> CH[Chromium]
    E --> FF[Firefox]
    E --> PR[patchright 🥷]
    C & CH & FF & PR --> W([🌐 The website])

snapshot = accessibility tree, one ref per element (the AI acts by ref).
- Chromium/Chrome: built with CDP (Accessibility.getFullAXTree) — doesn't mutate the DOM, traverses shadow DOM and iframes (cross-origin/OOPIF like Turnstile via a dedicated CDP session), refs are stable (backendNodeId).
- Firefox: JS-injection snapshot as fallback.
- refs are versioned (v<N>:id): using a stale ref from an old snapshot is rejected instead of hitting the wrong node.
evaluate runs JS for bulk extraction or stubborn clicks (gate it off with --no-eval). batch_actions runs several actions in one tool call.
detectChallenge recognizes anti-bot walls (Cloudflare/Turnstile/hCaptcha/reCAPTCHA/DataDome).
The system prompt treats all page content as untrusted data, never instructions (prompt-injection spotlighting).

🔓 The login + captcha flow

Most portal automation starts behind a login. This is the part that usually breaks other tools — Navia makes it fully automatic, deterministic, no loops — so it can get to the actual task (the form, the update, the report):

flowchart TD
    S([Login page]) --> U[Type username]
    U --> P["fill_credential password — never seen by the model"]
    P --> SUB{About to submit?}
    SUB -->|"captcha empty"| OCR["🔓 Local OCR reads the captcha<br/>ddddocr · free · on your machine"]
    OCR --> CL["Click 'Sign in' — same step"]
    SUB -->|"no captcha"| CL
    CL --> V{assessLoginOutcome}
    V -->|"private URL + logout link + no error"| OK([✅ Logged in])
    V -->|"still on login / error"| RETRY["Re-type & retry · max 2-3 · then stop honestly"]
    RETRY --> SUB
    OCR -.->|"cannot read / disabled"| HUMAN["🙋 Hand the window to you"]

Text captchas → solved automatically by local OCR before submitting (default --captcha local).
Empty captcha → submit is blocked (no blind sends, no infinite loops; hard retry cap).
Interactive captchas (reCAPTCHA grid, hCaptcha, sliders) & 2FA → handed to you.
Success is verified — Navia won't claim "logged in" unless it really is.

The LLM is never asked to "solve" a captcha (Claude declines that by policy). The OCR is a separate, dedicated, local tool — for your own authorized accounts.

🖥️ CLI usage

# Guided wizard (recommended): just run navia
navia
#  → asks the start URL, auto-detects login, asks user + hidden password,
#    the task, the browser, and where to save the journal. Captcha is automatic.
#    Conversational: keeps the session open and asks "what now?". Press ESC to quit.

# Direct task
navia "search 't-shirts' on example-shop.com and list the first 5 with prices"

# Conversation mode for a one-off too (stays open, asks for the next)
navia run "explore this site and map its sections" --chat

# Cloudflare-walled sites → real Chrome via CDP
navia chrome                                          # 1) launch Chrome with debugging
navia run "search jobs on {portal}" --browser chrome  # 2) the task

navia "..." --browser firefox|chrome|patchright   # engine (default chromium)
navia "..." --headless                            # no visible window
navia "..." --slow-mo 300                         # go slow (anti rate-limit)
navia "..." --start-url https://...               # open a URL before starting
navia "..." --model claude-opus-4-8               # another model
navia "..." --workspace                           # per-task log/brain folder (asks where)
navia "..." --validate                            # an LLM judge re-checks the result and retries once
navia "..." --captcha off                         # disable local captcha OCR (default: local)
navia "..." --no-eval                             # disable the evaluate JS tool (untrusted sites)
navia "..." --allow-domain example.com            # network allow-list (repeatable, anti-exfiltration)
navia "..." --yes                                 # auto-approve irreversible actions (TEST ONLY)

Set your defaults once · scaffold a project

navia init                     # save model/engine/profile/provider to ~/.navia/config.json
navia create my-bot            # scaffold: navia.config.json, .env.example, tasks.txt, run.mjs

Precedence: CLI flag > env var > ~/.navia/config.json > built-in default.

🔐 Credentials, 2FA & sessions

Store passwords / 2FA in an encrypted vault; the AI uses them by key but never sees the value:

navia secret set shop.password                                   # prompts, hidden
navia secret set shop.password --origin https://accounts.x.com   # bind it: only fills on this origin
navia secret totp shop.2fa                                       # TOTP base32 from your authenticator
navia secret list                                                # keys only, no values

In a task the AI uses fill_credential(ref, "shop.password") / fill_totp(ref, "shop.2fa") — the real value is injected locally, outside the prompt.

🔒 Encrypted by default (AES-256-GCM, auto-key at ~/.navia/key). Set NAVIA_SECRET for your own passphrase (key never touches disk).
🎯 Domain binding (anti-phishing): with --origin, the secret fills only when the element's real frame origin matches — typing your password into an unexpected/cross-origin frame is hard-rejected.

navia login my-portal --start-url https://my-portal.com/login   # sign in once, save the profile
navia run "download my latest invoice" --profile my-portal      # reuse it, already authenticated

Profiles live in ~/.navia/profiles/ (gitignored), encrypted.

🧠 Per-domain memory (playbooks)

Navia learns reusable "operating tips" per site and re-injects them next time it visits — so it stops rediscovering each site from scratch.

navia playbook add example.com --note "the 'Sign in' button enables only after re-typing the email"
navia playbook show example.com
navia playbook list

Tips are also captured automatically from your wait_for_human notes. Disable with --no-memory. Stored in ~/.navia/playbooks/.

🔑 No API key — use your terminal's AI CLI

Navia can "think" with an AI CLI already authenticated on your terminal, with no ANTHROPIC_API_KEY:

navia run "..." --provider claude-cli                    # uses `claude` (Claude Code)
navia run "..." --provider claude-cli --cli-command ant  # recommended: Anthropic CLI

auto (default): API key if present; otherwise the claude CLI.
ant recommended: ant auth login once → clean single-shot completion over your login. claude works as a slower fallback.
Any other terminal AI: NAVIA_CLI_CMD="my-cli --flags".

CLI mode spawns one process per step → slower than --provider api, but needs no key. With the claude/ant CLI, Navia can also pass the captcha image to it for tasks that need vision.

🔁 Deterministic macros (record & replay, no AI)

Record once, replay forever with no LLM and no API key — fast and free. Replay uses stable locators (role + name) and self-heals if the site drifts:

navia "sign in and download this month's invoice" --record ./invoice.jsonl
navia replay ./invoice.jsonl --profile my-portal

Secrets aren't stored in the macro: fill_credential/fill_totp are re-injected fresh from the vault each replay.

🧱 Structured extraction (web → typed JSON)

Get schema-validated data: Navia forces the model to answer through a tool whose schema is your schema (with retry). Requires an API key.

navia extract "the first 5 products with name and price" --url https://example-shop.com --schema ./schema.json

import { extract } from "navia-ai";

const data = await extract({
  url: "https://news.example.com",
  instruction: "the top 5 headlines with title and points",
  schema: {
    type: "object",
    properties: {
      items: { type: "array", items: { type: "object",
        properties: { title: { type: "string" }, points: { type: "number" } }, required: ["title"] } },
    },
    required: ["items"],
  },
});

📊 Reliability & evals

Every run reports metrics beyond pass/fail (steps, tokens, recoveries, repeated-action loops). Benchmark on live-site tasks with an LLM judge:

navia eval --dataset ./tasks.jsonl --report ./report.json   # Online-Mind2Web-ish; ships a sample set

🧑‍💻 Library usage

import { runNavia } from "navia-ai";

const { summary, steps, metrics } = await runNavia({
  task: "Open example.com and extract all the main-menu links",
  browser: "chromium",
  validate: true,
  hooks: { log: (m) => console.log(m) },
});

console.log(summary, metrics); // steps, toolCalls, toolErrors, tokensIn/Out, recoveries, loopHits

See candidate actions without running them, then run exactly one — by ref, with no extra LLM call.

import { BrowserDriver, observe, act } from "navia-ai";

const driver = await BrowserDriver.create({ engine: "chromium" });
await driver.navigate("https://example.com");

const actions = await observe({ instruction: "the 'More information' link", driver });
await act(actions[0], { driver });          // deterministic, no LLM
// or one-shot:  await act("click 'More information'", { driver });

🔌 As an MCP server (Claude Desktop / Code / Cursor)

Navia exposes its browser tools as an MCP server — the client's model drives them (CDP snapshot, stable refs, captcha detection, profiles, vault).

Claude Code:

claude mcp add navia -- npx -y navia-ai mcp --browser chromium

Claude Desktop / Cursor (JSON):

{ "mcpServers": { "navia": { "command": "npx", "args": ["-y", "navia-ai", "mcp", "--browser", "chromium"] } } }

🔐 Secure credential elicitation: if a task needs a vault secret that isn't stored, the server asks you through your client's secure prompt (MCP elicitation) and saves it encrypted — never through the model.

🧭 Browser engines

| Engine | When to use it | |---|---| | chromium (default) | Most sites. | | firefox | Alternative; some portals behave better. | | chrome (CDP) | 🔑 Cloudflare-walled sites. Launches your real Chrome and connects via CDP. | | patchright | 🥷 Anti-detection without pre-opening Chrome (removes the Runtime.enable leak). Opt-in: npm i patchright. |

⚠️ Responsible use

Navia drives a real browser with your credentials and session. Use it only on sites and accounts you own or are authorized to access, respecting their Terms of Service. The CDP mode does not forcibly bypass protections — it uses your real browser. Navia bundles no third-party (paid) captcha-solving services; the local OCR is a dedicated tool for your own authorized login, and interactive/behavioral captchas + 2FA are always handed to you.

npm · Issues · MIT License

Made with ❤️ for people tired of doing the same portal busywork every day.