site-agent-pro
v1.0.12
AI-powered browser agent that tests websites like a real user and produces evidence-based, scored reports.
Site Agent Pro
AI-powered browser agent that executes real user tasks on any website, captures step-by-step evidence, and produces scored, actionable reports.
Playwright · OpenAI / Ollama · axe-core · TypeScript · Zod
How It Works
User submits URL + tasks
│
▼
┌─────────────────────────────┐
│ Chromium launches │
│ (desktop 1440×900 │
│ or mobile 390×844) │
└──────────┬──────────────────┘
▼
┌─────────────────────────────┐
│ For each task: │
│ 1. Capture page state │
│ 2. LLM plans next action │
│ 3. Playwright executes it │
│ 4. Repeat until done │
└──────────┬──────────────────┘
▼
┌─────────────────────────────┐
│ Site checks: │
│ SEO · Performance · │
│ Security · Accessibility · │
│ Mobile · Content · CRO │
└──────────┬──────────────────┘
▼
┌─────────────────────────────┐
│ LLM evaluates the run │
│ → Scored report (1-10) │
│ → HTML / Markdown / JSON │
│ → Activity replay animation │
└─────────────────────────────┘
Features
- Task-driven execution — the agent follows only the tasks you provide, nothing more
- Step-by-step evidence — every interaction, page state, relevant console signal, and network failure is logged
- Ordered instruction parsing — pasted instructions, bullet lists, JSON tasks, and uploaded text files are normalized into accepted task lanes
- Independent evaluation — the LLM scores from captured evidence, not from the agent's own impressions
- Multi-agent perspectives — run 1–5 agents with different personas on the same site, merged into one report
- Auth-aware — detects login walls mid-run, fills signup forms, polls IMAP for OTP/verification emails
- Supplemental audits — SEO crawl, security headers, performance timings, accessibility (axe-core), CRO signals, content readability, mobile layout
- Activity replay — compact animated WebP that overlays all recorded agent actions onto the captured click frames
- Exchange-flow QA — safely tests Naira/crypto buy and sell flows with harmless values and stops before real transfers
- Paystack Integration — provision dedicated virtual Naira accounts for agents and execute autonomous bank transfers during tasks
- Dual LLM support — OpenAI (GPT-5) for production, Ollama for local/private development
- Two deployment modes — CLI or web dashboard, including Render web service deployment
Quick Start
Site Agent Pro is a zero-config AI visitor. Install it, initialize your identity, and run your first audit in seconds.
1. Install & Initialize
# Install globally
npm install -g site-agent-pro
# Set up your identity (OpenAI Key, Wallet, etc.)
site-agent-pro init
2. Run an Audit
The agent handles everything else. It automatically uses your existing Google Chrome or Edge browser (saving disk space and time). If no system browser is found, it safely self-heals by downloading its own.
# Run a quick audit
site-agent-pro --url https://google.com --task "Check the homepage"
# OR launch the web UI to enter tasks visually
site-agent-pro ui
3. View Results
Artifacts are saved to data/runs/<run-id>/ (or to runs/<run-id>/ in your current directory if using the local .env):
| File | Contents |
|---|---|
| report.html | Standalone shareable report |
| report.json | Machine-readable scored report |
| report.md | Markdown report |
| task-results.json | Per-task step history and outcomes |
| raw-events.json | Every browser event, console log, and network request |
| accessibility.json | axe-core violation list |
| site-checks.json | SEO, performance, security, CRO, content, mobile checks |
| click-replay.webp | Compact animated activity replay with click screenshots and overlays for all recorded actions |
| *.webm | Full browser session video recording (enabled via RECORD_VIDEO=true in .env), with a red mouse cursor tracking the agent's exact movements |
| inputs.json | Run configuration and timing metadata |
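The machine-readable report can be consumed from a script after a CLI run. The following is a minimal sketch, assuming report.json exposes the same `overall_score` and `summary` fields as `result.report` in the Programmatic API example; the helper names `parseReport` and `exitCodeFor` and the default threshold of 7 are illustrative, not part of the package.

```typescript
// Hypothetical helpers: turn the text of report.json into a CI exit code.
// Assumes `overall_score` (1-10) and `summary` exist in the file, mirroring
// the `result.report` fields used in the Programmatic API section.
interface ScoredReport {
  overall_score: number;
  summary: string;
}

function parseReport(jsonText: string): ScoredReport {
  const data: unknown = JSON.parse(jsonText);
  if (typeof data !== "object" || data === null) {
    throw new Error("report.json must contain a JSON object");
  }
  const record = data as Record<string, unknown>;
  const score = record["overall_score"];
  const summary = record["summary"];
  if (typeof score !== "number" || !Number.isFinite(score)) {
    throw new Error("report.json is missing a numeric overall_score");
  }
  return { overall_score: score, summary: typeof summary === "string" ? summary : "" };
}

// Returns 0 when the score meets the threshold and 1 otherwise.
function exitCodeFor(report: ScoredReport, minScore = 7): number {
  return report.overall_score >= minScore ? 0 : 1;
}
```

In a real pipeline the JSON text would come from reading the report.json file inside the run directory, and the returned code would be passed to `process.exit`.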
CLI Reference
# Basic single-task run
npm run dev -- --url https://example.com --task "Click the pricing tab"
# Multiple tasks
npm run dev -- --url https://example.com \
--task "Read the visible how-to-play section" \
--task "Play the game five times and record each win or loss"
# Mobile viewport (iPhone 13, 390×844)
npm run dev -- --url https://example.com --task "Check the mobile nav" --mobile
# Headed mode (visible browser)
npm run dev -- --url https://example.com --task "Open pricing" --headed
# Ollama for local sites
npm run dev -- --url http://127.0.0.1:3000 --task "Check the homepage CTA" \
--llm-provider ollama --model llama3.1:8b
# Allow self-signed HTTPS certificates
npm run dev -- --url https://localhost:3000 --task "Check the homepage" --ignore-https-errors
# Exchange-flow QA without real transfers
npm run dev -- --url https://example.com \
--task "Click Buy; enter 50000 NGN; confirm the crypto preview updates; copy the account number if available; stop before making any real payment" \
--task "Click Sell; enter 0.01 USDT; confirm the Naira payout preview updates; stop before sending any real crypto"
# Deterministic onchain validation in dry-run mode
npm run dev -- --url https://example-dapp.test \
--task "Sell 0.01 USDC using the visible deposit address" \
--trade-dry-run --trade-strategy deposit_only
# Autonomous Paystack transfer during a task
npm run dev -- --url https://example-shop.com \
--task "Pay 100 Naira to the bank account shown on the checkout page" \
--trade-enabled
Note: Every CLI run requires at least one --task flag. Runs with no tasks are rejected.
All CLI Options
| Flag | Description |
|---|---|
| --url <url> | (Required) Website URL to test |
| --task <task> | (Required) Task for the agent. Repeat for multiple tasks |
| --headed | Run browser in headed (visible) mode |
| --mobile | Use iPhone 13 mobile viewport |
| --ignore-https-errors | Allow invalid or self-signed HTTPS certificates |
| --llm-provider <name> | LLM provider: openai or ollama |
| --model <name> | Override the model name |
| --ollama-base-url <url> | Override the Ollama endpoint |
| --storage-state <path> | Load Playwright storage state JSON before the run |
| --save-storage-state <path> | Save Playwright storage state JSON after the run |
| --auth-flow | Bootstrap a test account (signup/login/OTP), then run tasks |
| --auth-only | Bootstrap a test account and save session — skip task run |
| --signup-url <url> | Signup page URL (absolute or relative) |
| --login-url <url> | Login page URL (absolute or relative) |
| --access-url <url> | Protected page URL to verify after login |
| --trade-enabled | Allow deterministic onchain trade execution for this run |
| --trade-dry-run | Validate extracted trade details without broadcasting a transaction |
| --trade-strategy <strategy> | Trade strategy: auto, dapp_only, or deposit_only |
| --trade-confirmations <count> | Confirmations to wait for before marking a trade confirmed, 0–12 |
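When the CLI is launched from another script, building the argument list programmatically avoids shell-quoting mistakes in long task strings. The sketch below covers a subset of the flags in the table above; the `AuditCliOptions` type and `buildCliArgs` function are hypothetical names introduced for illustration.

```typescript
// Hypothetical options object; only flags documented in the table are emitted.
interface AuditCliOptions {
  url: string;
  tasks: string[];
  mobile?: boolean;
  headed?: boolean;
  ignoreHttpsErrors?: boolean;
  llmProvider?: "openai" | "ollama";
  model?: string;
  storageState?: string;
  tradeDryRun?: boolean;
  tradeStrategy?: "auto" | "dapp_only" | "deposit_only";
}

function buildCliArgs(options: AuditCliOptions): string[] {
  // The CLI rejects runs without tasks, so fail early here as well.
  if (options.tasks.length === 0) {
    throw new Error("At least one --task is required");
  }
  const args: string[] = ["--url", options.url];
  for (const task of options.tasks) {
    args.push("--task", task); // --task is repeated once per task
  }
  if (options.mobile) args.push("--mobile");
  if (options.headed) args.push("--headed");
  if (options.ignoreHttpsErrors) args.push("--ignore-https-errors");
  if (options.llmProvider) args.push("--llm-provider", options.llmProvider);
  if (options.model) args.push("--model", options.model);
  if (options.storageState) args.push("--storage-state", options.storageState);
  if (options.tradeDryRun) args.push("--trade-dry-run");
  if (options.tradeStrategy) args.push("--trade-strategy", options.tradeStrategy);
  return args;
}
```

The resulting array can be handed to Node's `child_process.spawn("site-agent-pro", args)`, which passes each element as a separate argument without going through a shell.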
Programmatic API
Install site-agent-pro as a devDependency in your own project — no cloning required:
npm install --save-dev site-agent-pro
Then run audits from any script, test file, or CI pipeline:
import { runAudit } from "site-agent-pro";
const result = await runAudit({
url: "http://localhost:3000", // works with localhost dev servers
tasks: [
"Open the pricing page and note the visible plans",
"Click the sign-up button and check the form fields",
],
});
console.log(`Score: ${result.report.overall_score}/10`);
console.log(`Summary: ${result.report.summary}`);
console.log(`Artifacts saved to: ${result.runDir}`);
// Gate your CI pipeline on a minimum quality score
if (result.report.overall_score < 7) {
process.exit(1);
}
Or add site-agent-pro to your package.json scripts and run it from the CLI:
"scripts": {
"audit": "site-agent-pro --url http://localhost:3000 --task 'Check the homepage CTA'",
"audit:ui": "site-agent-pro ui",
"audit:mobile": "site-agent-pro --url http://localhost:3000 --task 'Check mobile nav' --mobile",
"audit:auth": "site-agent-pro --url http://localhost:3000 --task 'Reach the dashboard' --auth-flow"
}
npm run dev & # your app
npm run audit # site-agent-pro hits it live
For full control, import the lower-level API directly:
import { runAuditJob, buildCustomTaskSuite, normalizeCustomTasks } from "site-agent-pro";
Note: site-agent-pro automatically uses your system's Google Chrome or Chromium. You do not need to manually install Playwright browsers.
Web Dashboard
npm run dashboard
| URL | Purpose |
|---|---|
| http://localhost:4173/ | Public submission form — enter URL, paste instructions, and select the number of agents (1-5) |
| http://localhost:4173/dashboard | Internal run dashboard — inspect all results |
| /submissions/<id> | Submission progress tracking |
| /r/<token> | Public shareable report link (valid 30 days) |
| /outputs/<run-id> | Standalone HTML report for any run |
| /api/runs | REST API — list all runs |
| /api/runs/<id> | REST API — run detail |
The dashboard supports:
- Instruction paste box plus optional .txt, .md, .json, or .csv upload
- Public-hosted or localhost/private target mode for local development
- 1–5 concurrent agent perspectives per submission
- Per-submission trade controls for enablement, dry-run, strategy, and confirmation count
- Aggregate and per-agent report inspection
- Artifact downloads (JSON, Markdown, HTML, WebP activity replay)
- Strengths, weaknesses, and top fix recommendations
Authentication
The agent can bootstrap authenticated sessions for sites that require signup/login.
Quick Example
npm run dev -- --url https://example.com \
--task "Reach the account dashboard and confirm billing is visible" \
--auth-flow --signup-url /register --login-url /login --access-url /app
What Auth Bootstrap Does
- Fills visible signup fields with your configured test identity
- Polls your IMAP inbox for OTP or verification emails
- Submits the OTP code or opens the verification link
- Logs in with the same credentials
- Verifies a protected page (if --access-url is provided)
- Saves the authenticated session to .auth/session.json
Successful auth also caches the working identity in .auth/credentials.json, keyed by target origin, so later runs against the same site can reuse the saved username or email plus password.
Auth walls detected mid-task are handled automatically when auth credentials are configured or a working identity has already been cached for that target origin.
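The OTP step above depends on pulling a code of a known length out of an email body. A minimal sketch of that idea follows, using the default length of 6 from AUTH_OTP_LENGTH; the function name `extractOtp` is hypothetical and the parser inside auth/inbox.ts may work differently.

```typescript
// Hypothetical helper: find the first standalone run of exactly `length` digits.
function extractOtp(emailBody: string, length = 6): string | null {
  if (!Number.isInteger(length) || length < 1) {
    throw new Error("length must be a positive integer");
  }
  // The lookarounds reject longer digit runs such as phone numbers or order IDs.
  const pattern = new RegExp(`(?<!\\d)\\d{${length}}(?!\\d)`);
  const match = pattern.exec(emailBody);
  return match ? match[0] : null;
}
```

Matching on an exact digit count keeps the helper from returning a slice of a longer number, which is the most common false positive in verification emails.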
Auth-Only Mode
Save an authenticated session without running tasks:
npm run dev -- --url https://example.com --auth-only \
--signup-url /register --login-url /login --access-url /dashboard
Session Reuse
For sites behind verified sessions, load a saved Playwright storage state:
# Via CLI flag
npm run dev -- --url https://example.com --task "Reach the dashboard" \
--storage-state .auth/session.json
# Via .env (auto-loaded on every run)
PLAYWRIGHT_STORAGE_STATE_PATH=.auth/session.json
Important: This does not bypass CAPTCHA, MFA, or anti-bot controls. It reuses legitimate sessions you've established.
Auth Environment Variables
Required for a fresh auth bootstrap (not needed when the target origin already has cached credentials):
| Variable | Description |
|---|---|
| AUTH_TEST_EMAIL | Email address for signup/login |
| AUTH_TEST_PASSWORD | Password for signup/login |
Optional login field:
AUTH_TEST_USERNAME
IMAP inbox (for OTP/verification email polling):
| Variable | Default | Description |
|---|---|---|
| AUTH_IMAP_HOST | — | IMAP server hostname |
| AUTH_IMAP_PORT | 993 | IMAP port |
| AUTH_IMAP_SECURE | true | Use TLS |
| AUTH_IMAP_USER | — | IMAP username |
| AUTH_IMAP_PASSWORD | — | IMAP password |
| AUTH_IMAP_MAILBOX | INBOX | Mailbox to poll |
Optional identity fields:
AUTH_TEST_FIRST_NAME · AUTH_TEST_LAST_NAME · AUTH_TEST_PHONE · AUTH_TEST_ADDRESS_LINE1 · AUTH_TEST_ADDRESS_LINE2 · AUTH_TEST_CITY · AUTH_TEST_STATE · AUTH_TEST_POSTAL_CODE · AUTH_TEST_COUNTRY · AUTH_TEST_COMPANY
Optional tuning:
| Variable | Default | Description |
|---|---|---|
| AUTH_EMAIL_POLL_TIMEOUT_MS | 180000 | Max wait time for verification email |
| AUTH_EMAIL_POLL_INTERVAL_MS | 5000 | Poll frequency |
| AUTH_OTP_LENGTH | 6 | Expected OTP digit count |
| AUTH_EMAIL_FROM_FILTER | — | Filter emails by sender |
| AUTH_EMAIL_SUBJECT_FILTER | — | Filter emails by subject |
| AUTH_GENERATED_IDENTITY_MAX_ATTEMPTS | 5 | Signup retry count when a generated identity is rejected |
| AUTH_EMAIL_DOMAIN | — | Override the generated plus-address domain |
| AUTH_SIGNUP_URL | — | Default signup URL (instead of CLI flag) |
| AUTH_LOGIN_URL | — | Default login URL |
| AUTH_ACCESS_URL | — | Default protected page URL |
| AUTH_SESSION_STATE_PATH | .auth/session.json | Where to save the session |
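The two polling variables above describe a timeout-bounded retry loop. The sketch below shows one way such a loop could be written, with the defaults taken from AUTH_EMAIL_POLL_TIMEOUT_MS and AUTH_EMAIL_POLL_INTERVAL_MS; `pollUntil` and its `check` callback are hypothetical, and the callback stands in for whatever IMAP search the agent actually performs.

```typescript
// Hypothetical polling helper. `check` returns a value once the email has
// arrived and null while it is still missing.
async function pollUntil<T>(
  check: () => Promise<T | null>,
  timeoutMs = 180000,
  intervalMs = 5000,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
  now: () => number = Date.now,
): Promise<T | null> {
  const deadline = now() + timeoutMs;
  while (true) {
    const result = await check();
    if (result !== null) return result;
    // Stop when the next attempt would start after the deadline.
    if (now() + intervalMs > deadline) return null;
    await sleep(intervalMs);
  }
}
```

Injecting `sleep` and `now` is a design choice for testability: a fake clock lets the loop be exercised without waiting for real time to pass.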
Web3 Wallet Integration
The agent has built-in support for interacting with Web3 dApps. It uses a dual-mode architecture:
- Programmatic Provider (Default): Injects a secure, headless-compatible window.ethereum provider. Transaction signing requests are intercepted and sent to a local HTTP relay running securely inside the Node.js process. The private key never enters the browser.
- MetaMask Extension Mode (Optional): Runs a full headed browser with the MetaMask extension loaded, and automatically clicks "Connect", "Confirm", or "Sign" in MetaMask popups.
Quick Setup
Configure your wallet in .env:
# Required
WALLET_PRIVATE_KEY=your_private_key_here
WALLET_RPC_URL=https://eth-sepolia.g.alchemy.com/v2/...
WALLET_CHAIN_ID=11155111
# Optional: Mnemonic instead of private key
# WALLET_MNEMONIC="word1 word2 ..."
Once configured, the agent will automatically inject the wallet into every page it visits. The LLM planner is also aware of its wallet address and can interact with "Connect Wallet" flows.
Deterministic Trade Execution
Trade execution is off by default unless TRADE_ENABLED=true or a run explicitly passes --trade-enabled or --trade-dry-run. When enabled, the agent only attempts a deterministic EVM sell/deposit handoff when the visible page and task provide enough evidence for recipient address, token, chain, and amount. Dry-run mode validates the extracted instruction and writes trade-executions.json without broadcasting.
Useful controls:
# Validate only; do not broadcast
--trade-dry-run
# Broadcast only if validation passes
--trade-enabled
# Choose how to handle the visible trade path
--trade-strategy auto|dapp_only|deposit_only
For a standalone trade instruction JSON file, use:
npm run trade:run -- --instruction ./sell-instruction.json --strategy deposit_only
# Add --broadcast only when you are ready to send the transaction
npm run trade:run -- --instruction ./sell-instruction.json --broadcast
Using MetaMask Extension Mode
If you specifically need to test how a dApp interacts with the MetaMask UI, you can run the agent with the extension loaded. This requires extracting the MetaMask extension folder (not a .crx file).
How to get the Extension Path (Mac):
- Ensure MetaMask is installed in your normal Google Chrome browser.
- The extracted path is typically located at: /Users/<YourUsername>/Library/Application Support/Google/Chrome/Default/Extensions/nkbihfbeogaeaoehlefnkodbefgpgknn/<version_number>
- Set the environment variables in .env:
WALLET_METAMASK_EXTENSION_PATH=/Users/YourUsername/Library/.../11.14.2_0
WALLET_METAMASK_USER_DATA_DIR=/Users/YourUsername/.site-agent-metamask-profile
Note: Using this mode forces the agent to run in headed (visible) mode. For real MetaMask signing/confirm popups, point WALLET_METAMASK_USER_DATA_DIR at a persistent Chromium profile where MetaMask is already set up and unlocked.
Paystack Integration
The agent has built-in support for the Paystack API (Nigeria) to handle Naira payments and payouts. This enables "Agent-as-a-Service" monetization flows.
Features
- Dedicated Virtual Accounts (DVA): Provisions or reuses a real Nigerian bank account number (Wema/GTB) for the agent to receive payments.
- Naira Transfers: Initiates outbound transfers to any Nigerian bank account via the Transfers API.
- Webhook Processing: Securely handles charge.success and transfer.success events with HMAC-SHA512 verification.
- Zero-Dependency Client: Uses Node 20+ native fetch (no axios required).
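The HMAC-SHA512 webhook check mentioned above can be sketched with Node's built-in crypto module. Paystack sends the signature as a hex string in the `x-paystack-signature` header, computed over the raw request body with the secret key. The function name `isValidPaystackSignature` is hypothetical, and the handler shipped in paystack/* may be structured differently.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Returns true when the hex signature from the `x-paystack-signature` header
// matches an HMAC-SHA512 of the raw request body keyed with the secret key.
function isValidPaystackSignature(
  rawBody: string,
  signatureHeader: string,
  secretKey: string,
): boolean {
  const expected = createHmac("sha512", secretKey).update(rawBody, "utf8").digest();
  const received = Buffer.from(signatureHeader, "hex");
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

The check must run against the raw body exactly as received. Re-serializing parsed JSON can change whitespace or key order and invalidate an otherwise correct signature.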
Quick Setup
Configure Paystack in .env:
PAYSTACK_SECRET_KEY=sk_live_...
PAYSTACK_PUBLIC_KEY=pk_live_...
PAYSTACK_DVA_PROVIDER=wema-bank
PAYSTACK_AGENT_EMAIL=<email address> # Email of the Paystack customer that holds the DVA
PAYSTACK_AGENT_FIRST_NAME=Site
PAYSTACK_AGENT_LAST_NAME=Agent
PAYSTACK_AGENT_PHONE=+234... # Required for Live DVA creation
PAYSTACK_TRANSFER_ENABLED=false # Set to true only when ready for real transfers
Reusing an Existing DVA (Recommended for Live Keys)
On startup, getAgentAccount() looks up the customer by PAYSTACK_AGENT_EMAIL first. If that customer already has a DVA on your Paystack account, it is returned immediately — no new DVA is created. This means you can point the agent at any existing Paystack customer and reuse their bank account number without provisioning anything new.
To find your existing DVAs:
source .env && curl -s \
-H "Authorization: Bearer $PAYSTACK_SECRET_KEY" \
"https://api.paystack.co/dedicated_account" | python3 -m json.tool
Then set PAYSTACK_AGENT_EMAIL to the email of whichever customer owns the DVA you want to use.
Settlement Timing & Balance
Important: DVA receipts (money sent to the virtual account number) are not immediately spendable. Paystack settles DVA payments to your Paystack balance on a T+1 business day schedule.
Two funding strategies:
| Strategy | Receiving Account | Spend Delay |
|---|---|---|
| DVA receipts | Share the DVA account number | T+1 business day |
| Pre-fund balance | Fund via Paystack Dashboard → Fund Balance | Available immediately |
The agent's outbound transfers always draw from the Paystack balance (source: 'balance'), not the DVA directly. If you pre-fund the balance via the dashboard, the agent can spend immediately without waiting for settlement.
To check your current spendable balance:
source .env && curl -s \
-H "Authorization: Bearer $PAYSTACK_SECRET_KEY" \
"https://api.paystack.co/balance" | python3 -m json.tool
Autonomous Transfers & Verification
The agent can autonomously fulfill payment requests and verify incoming funds or tokens using its direct API/on-chain access:
- Sending Naira: When it encounters a task like "Pay the vendor ₦500," it extracts the bank details from the page and initiates a transfer using the pay action with format amount:bankCode:accountNumber (e.g. 5000:058:0123456789).
- Verifying Naira: If tasked to "wait for payment before releasing tokens," the agent monitors its own Paystack transaction history. It will only proceed to click "Release" or "Confirm" once it sees the matching successful transaction in its account.
- Verifying Tokens: If tasked to "confirm tokens have arrived before paying Naira," the agent monitors its actual on-chain wallet balance via its RPC connection. It ignores potentially fake website UIs and only pays once the blockchain confirms the receipt of funds.
Safety: Transfers are only broadcast if PAYSTACK_TRANSFER_ENABLED=true. Otherwise, it performs a dry-run validation.
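The pay action format amount:bankCode:accountNumber can be validated before any transfer is attempted. The sketch below is one possible parser; `parsePayAction` and `PayInstruction` are hypothetical names. It assumes 10-digit account numbers (the Nigerian NUBAN format) and bank codes of 3 to 6 digits. The README does not state whether the amount is in Naira or kobo, so the value is kept as a plain integer.

```typescript
interface PayInstruction {
  amount: number;
  bankCode: string; // kept as a string so leading zeros such as "058" survive
  accountNumber: string;
}

// Hypothetical parser for the `amount:bankCode:accountNumber` pay action value.
function parsePayAction(value: string): PayInstruction {
  const parts = value.trim().split(":");
  if (parts.length !== 3) {
    throw new Error("Expected amount:bankCode:accountNumber");
  }
  const [amountText = "", bankCode = "", accountNumber = ""] = parts;
  if (!/^\d+$/.test(amountText) || Number(amountText) <= 0) {
    throw new Error("amount must be a positive whole number");
  }
  if (!/^\d{3,6}$/.test(bankCode)) {
    throw new Error("bankCode must be 3 to 6 digits");
  }
  if (!/^\d{10}$/.test(accountNumber)) {
    throw new Error("accountNumber must be 10 digits");
  }
  return { amount: Number(amountText), bankCode, accountNumber };
}
```

Rejecting malformed values up front keeps a misread page from turning into a transfer request with a truncated account number.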
Testing the Integration
Run the standalone smoke test to verify your API keys and DVA connectivity:
npm run paystack:test
Note on live keys: The smoke test's step 5 (transfer) uses a placeholder account number. On live keys, Paystack validates that the account exists at the specified bank, so the test will fail at step 5 unless you replace 0123456789 with a real account. Steps 1–4 (environment, DVA, bank list, bank code resolution) are the meaningful validation checks.
Configuration
All settings are read from environment variables (.env file).
Core Settings
| Variable | Default | Description |
|---|---|---|
| LLM_PROVIDER | openai | LLM backend: openai or ollama |
| OPENAI_API_KEY | — | (Required for OpenAI) API key |
| OPENAI_MODEL | gpt-5 | Model for planning and evaluation |
| OLLAMA_BASE_URL | http://127.0.0.1:11434 | Ollama server URL |
| OLLAMA_MODEL | llama3.1:8b | Ollama model name |
Execution Limits
| Variable | Default | Description |
|---|---|---|
| MAX_SESSION_DURATION_MS | 600000 | Total run time cap (clamped 60s–600s) |
| MAX_STEPS_PER_TASK | 32 | Max actions per task |
| ACTION_DELAY_MS | 600 | Delay between actions (human-like pacing) |
| NAVIGATION_TIMEOUT_MS | 25000 | Page load timeout |
| RECORD_VIDEO | false | Record Playwright video into the run directory |
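The clamping noted for MAX_SESSION_DURATION_MS can be illustrated with a small resolver. This is a sketch under stated assumptions: the function name is hypothetical, and falling back to the default on a missing or non-numeric value is a guess about behavior the README does not describe.

```typescript
// Hypothetical resolver for MAX_SESSION_DURATION_MS: default 600000 ms,
// clamped to the documented 60s-600s range.
function resolveSessionDurationMs(raw: string | undefined): number {
  const minMs = 60000;
  const maxMs = 600000;
  const defaultMs = 600000;
  if (raw === undefined || raw.trim() === "") return defaultMs;
  const parsed = Number(raw);
  // Assumption: non-numeric input falls back to the default.
  if (!Number.isFinite(parsed)) return defaultMs;
  return Math.min(maxMs, Math.max(minMs, Math.floor(parsed)));
}
```

For example, a value of 5000 resolves to 60000 and a value of 900000 resolves to 600000, so a typo in .env cannot produce a run that is effectively instant or unbounded.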
Browser
| Variable | Default | Description |
|---|---|---|
| HEADLESS | true | Set false for headed mode |
| PLAYWRIGHT_STORAGE_STATE_PATH | — | Auto-load session state JSON |
| PLAYWRIGHT_EXECUTABLE_PATH | — | Custom Chromium binary path |
| USE_SERVERLESS_CHROMIUM | — | Force @sparticuz/chromium |
| SPARTICUZ_CHROMIUM_LOCATION | — | Chromium binary location hint |
Trade Policy
| Variable | Default | Description |
|---|---|---|
| TRADE_ENABLED | false | Enable deterministic trade execution by default |
| TRADE_ALLOWLISTED_CHAIN_IDS | — | Comma-separated chain IDs allowed for trade execution |
| TRADE_TOKEN_REGISTRY | [] | JSON array of { chainId, symbol, assetKind, contract?, decimals } entries |
| TRADE_MAX_TOKEN_AMOUNT | — | Maximum token amount allowed by policy |
| TRADE_REQUIRE_EXACT_TOKEN_CONTRACT | true | Require ERC-20 contract matches when validating trades |
| TRADE_CONFIRMATIONS_REQUIRED | 1 | Default confirmations to wait for, 0–12 |
| TRADE_RECEIPT_TIMEOUT_MS | 120000 | Max wait time for transaction receipt/confirmation |
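The policy variables above combine into a pre-broadcast check. The following sketch shows how such a check could look. The `TokenEntry` fields follow the TRADE_TOKEN_REGISTRY description, while `TradeRequest`, `TradePolicy`, `parseChainIds`, and `validateTrade` are hypothetical. Treating an empty allowlist as "nothing allowed" is an assumption, and the real validation in trade/* may apply different rules.

```typescript
interface TokenEntry {
  chainId: number;
  symbol: string;
  assetKind: string;
  contract?: string;
  decimals: number;
}

interface TradePolicy {
  allowlistedChainIds: number[];
  tokenRegistry: TokenEntry[];
  maxTokenAmount: number | null;
  requireExactTokenContract: boolean;
}

interface TradeRequest {
  chainId: number;
  symbol: string;
  amount: number;
  contract: string | null;
}

// Parses a value such as "1, 11155111" into [1, 11155111].
function parseChainIds(raw: string | undefined): number[] {
  return (raw ?? "")
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part) => Number(part))
    .filter((id) => Number.isInteger(id) && id > 0);
}

// Returns a list of policy violations; an empty list means the trade may proceed.
function validateTrade(request: TradeRequest, policy: TradePolicy): string[] {
  const violations: string[] = [];
  // Assumption: an empty allowlist permits nothing.
  if (!policy.allowlistedChainIds.includes(request.chainId)) {
    violations.push(`chain ${request.chainId} is not allowlisted`);
  }
  const token = policy.tokenRegistry.find(
    (entry) =>
      entry.chainId === request.chainId &&
      entry.symbol.toUpperCase() === request.symbol.toUpperCase(),
  );
  if (!token) {
    violations.push(`token ${request.symbol} is not registered for chain ${request.chainId}`);
  }
  if (!(request.amount > 0)) {
    violations.push("amount must be positive");
  }
  if (policy.maxTokenAmount !== null && request.amount > policy.maxTokenAmount) {
    violations.push(`amount ${request.amount} exceeds the maximum ${policy.maxTokenAmount}`);
  }
  const expectedContract = token?.contract;
  if (policy.requireExactTokenContract && expectedContract !== undefined) {
    const matches =
      request.contract !== null &&
      request.contract.toLowerCase() === expectedContract.toLowerCase();
    if (!matches) violations.push("token contract does not match the registry entry");
  }
  return violations;
}
```

Returning every violation rather than stopping at the first one makes dry-run output more useful, since a single pass shows everything that would block a broadcast.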
Dashboard & Deployment
| Variable | Default | Description |
|---|---|---|
| APP_BASE_URL | — | Production URL for public report links. On Render, RENDER_EXTERNAL_URL is used automatically when this is unset. |
| SITE_AGENT_DATA_DIR | — | Root directory for persisted runs and submissions. Set this to your Render disk mount path for durable storage. |
| PORT | 10000 on Render | Public HTTP port for Render web services |
| DASHBOARD_PORT | 4173 | Dashboard server port |
| DASHBOARD_HOST | 127.0.0.1 locally, 0.0.0.0 on Render | Dashboard server host |
| REPORT_TTL_DAYS | 30 | Public report link expiry |
| INTERNAL_JOB_SECRET | — | Restrict background job invocation |
Writing Effective Tasks
Since the agent follows only your tasks, structure them as focused coverage lanes:
# Map the main journey
--task "Navigate to pricing and compare the monthly vs yearly plans"
# Inspect discovery paths
--task "Use the site search to find 'refund policy' and read the visible result"
# Follow the conversion path
--task "Click the Sign Up Free tab, fill every visible detail, and submit"
# Probe edge cases
--task "Enter an invalid email in the signup form and check the error message"
# Safely test an exchange flow
--task "Click Buy; enter 50000 NGN; confirm the crypto preview updates; provide a harmless test wallet address; verify the payment account card appears; stop before making any real payment"
# Ask for monitoring evidence
--task "Check exchange-flow monitoring evidence for amount entry, wallet submission, bank submission, displayed account details, copy actions, and transfer attempts"
# Autonomous fulfillment
--task "Buy the token; pay ₦500 to the Zenith Bank account shown on the confirmation screen"
Tips for better results:
- Write specific, concrete actions — not "explore the site"
- Use ordered verbs like click, enter, copy, scroll, wait, go back, and stop when the sequence matters
- Include literal values when needed, for example enter 50000 NGN or type "<email address>" into email
- Split large journeys into separate tasks so early clicks don't consume the entire budget
- Paste multi-line instructions or upload text/JSON files in the dashboard when tasks come from a spec
- A combined Naira/crypto exchange spec that mentions Buy flow, Sell flow, Naira, crypto, and logging/monitoring/events is expanded into separate Buy, Sell, and monitoring tasks
- For slow sites, increase NAVIGATION_TIMEOUT_MS before increasing step counts
- Use --storage-state for pages behind authentication
- Run multiple agent perspectives (2-5) when you want broader coverage
- For game sites, be explicit: "Play 5 rounds and record each win or loss"
- For exchange/payment QA, use harmless test values and explicitly tell the agent to stop before any real payment, crypto transfer, purchase, or payout
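When tasks come from a spec, it can help to pre-split pasted text into one task per line before passing them as repeated --task flags. The sketch below is a simplified illustration of that normalization; `splitInstructions` is a hypothetical helper and does not reproduce the package's own normalizeCustomTasks, whose signature is not documented here. It accepts a JSON array of strings or plain text with bullet or numbered lines.

```typescript
// Hypothetical helper: turn pasted instructions into an ordered task list.
function splitInstructions(text: string): string[] {
  const trimmed = text.trim();
  if (trimmed.startsWith("[")) {
    try {
      const parsed: unknown = JSON.parse(trimmed);
      if (Array.isArray(parsed)) {
        return parsed
          .filter((item): item is string => typeof item === "string")
          .map((item) => item.trim())
          .filter((item) => item.length > 0);
      }
    } catch {
      // Not valid JSON; fall through to line-based parsing.
    }
  }
  return trimmed
    .split(/\r?\n/)
    // Strip a leading bullet ("-", "*", "•") or number ("1.", "2)") marker.
    .map((line) => line.replace(/^\s*(?:[-*•]|\d+[.)])\s+/, "").trim())
    .filter((line) => line.length > 0);
}
```

Order is preserved, which matters because the agent treats instructions as an ordered sequence.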
Render Deployment
This repo now targets a standard Render web service deployment:
- Dashboard server (src/dashboard/server.ts): handles the app, submission routes, public reports, and dashboard APIs
- Local filesystem persistence: submissions and run artifacts are stored under SITE_AGENT_DATA_DIR
- Render Blueprint (render.yaml): defines the Render web service, health check, and persistent disk mount
- Full Playwright runtime: the build installs Chromium for the dashboard worker process
Included render.yaml
The repo root includes a Render Blueprint with:
- runtime: node
- buildCommand: npm ci && npm run build &&
- startCommand: npm run render:start
- healthCheckPath: /health
- a persistent disk mounted at /opt/render/project/src/data
Required Environment Variables
| Variable | Description |
|---|---|
| OPENAI_API_KEY | OpenAI API key when LLM_PROVIDER=openai |
Recommended Environment Variables
| Variable | Description |
|---|---|
| LLM_PROVIDER | Use openai for a single-service Render deployment unless you are also hosting Ollama separately |
| APP_BASE_URL | Optional. If unset on Render, the app falls back to RENDER_EXTERNAL_URL |
| SITE_AGENT_DATA_DIR | Override only if you change the disk mount path from the default in render.yaml |
| INTERNAL_JOB_SECRET | Optional hardening for internal job-style routes |
Note: Render web services must bind to 0.0.0.0:$PORT, and persistent filesystem data survives deploys only when it is written under the attached disk mount path. See the official Render docs for web services, persistent disks, and the Blueprint spec.
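The host, port, and base URL rules described in the configuration tables can be summarized in one resolver. This is a sketch only: `resolveServerConfig` is a hypothetical name, detecting Render through a `RENDER=true` environment variable is an assumption, and so is giving PORT precedence over DASHBOARD_PORT.

```typescript
type Env = Record<string, string | undefined>;

interface ServerConfig {
  host: string;
  port: number;
  baseUrl: string | undefined;
}

// Hypothetical resolver mirroring the documented defaults:
// 127.0.0.1:4173 locally, 0.0.0.0:$PORT on Render.
function resolveServerConfig(env: Env): ServerConfig {
  // Assumption: the platform marks its runtime with RENDER=true.
  const onRender = env["RENDER"] === "true";
  const host = env["DASHBOARD_HOST"] ?? (onRender ? "0.0.0.0" : "127.0.0.1");
  // Assumption: PORT wins over DASHBOARD_PORT when both are set.
  const port = Number(env["PORT"] ?? env["DASHBOARD_PORT"] ?? "4173");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error("Invalid port value");
  }
  // APP_BASE_URL falls back to RENDER_EXTERNAL_URL when unset.
  const baseUrl = env["APP_BASE_URL"] ?? env["RENDER_EXTERNAL_URL"];
  return { host, port, baseUrl };
}
```

Passing the environment in as a plain record, instead of reading `process.env` inside the function, keeps the resolver easy to test with different deployment scenarios.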
Architecture
For a detailed technical breakdown of every module, see ARCHITECTURE.md.
High-level summary:
| Layer | Key Files | Purpose |
|---|---|---|
| Entry points | cli/run.ts, dashboard/server.ts | CLI and web UI |
| Orchestration | runAuditJob.ts, processSubmissionBatch.ts | Single-run and multi-agent execution |
| Agent loop | runner.ts → planner.ts → executor.ts | Capture state → LLM plans → Playwright acts |
| Page understanding | pageState.ts, siteBrief.ts, taskDirectives.ts, submissions/customTasks.ts | DOM snapshots, site comprehension, ordered instruction and upload parsing |
| Authentication | auth/profile.ts, auth/inbox.ts, auth/runner.ts | Identity management, IMAP OTP polling, login flows |
| Evaluation | evaluator.ts, aggregateReport.ts | LLM scoring, multi-agent result merging |
| Site checks | siteChecks.ts, audit.ts | SEO, performance, security, accessibility |
| Paystack | paystack/* | Dedicated virtual accounts, Naira transfers, webhooks |
| Reporting | reporting/html.ts, reporting/markdown.ts, clickReplay.ts | HTML/MD/JSON reports, activity replay animation |
| Trade safety | trade/*, wallet/* | Wallet injection, deterministic trade extraction, policy validation, dry-run/broadcast records |
| LLM | llm/client.ts, prompts/browserAgent.ts, prompts/reviewer.ts | OpenAI + Ollama client, system prompts |
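The agent loop row above (capture state, LLM plans, Playwright acts) can be sketched as a small driver with injected dependencies. All types and names here are illustrative, since the real page state and action schema in runner.ts, planner.ts, and executor.ts are not shown in this README. The default of 32 steps matches MAX_STEPS_PER_TASK.

```typescript
// Illustrative types; the real page state and action schema are not documented here.
interface PageState {
  url: string;
  visibleText: string;
}

type Action =
  | { kind: "click"; target: string }
  | { kind: "type"; target: string; text: string }
  | { kind: "done"; reason: string };

interface StepRecord {
  step: number;
  state: PageState;
  action: Action;
}

interface AgentDeps {
  capture: () => Promise<PageState>;
  plan: (task: string, state: PageState, history: StepRecord[]) => Promise<Action>;
  execute: (action: Action) => Promise<void>;
}

// Runs one task: capture, plan, record, then act, until the planner reports
// "done" or the step budget is exhausted.
async function runTask(task: string, deps: AgentDeps, maxSteps = 32): Promise<StepRecord[]> {
  const history: StepRecord[] = [];
  for (let step = 1; step <= maxSteps; step += 1) {
    const state = await deps.capture();
    const action = await deps.plan(task, state, history);
    history.push({ step, state, action }); // evidence is recorded before acting
    if (action.kind === "done") break;
    await deps.execute(action);
  }
  return history;
}
```

Keeping the planner and executor behind interfaces means the loop can be tested with stubs, and the recorded history can later be handed to a separate evaluator that scores from evidence alone.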
Important Constraints
- No CAPTCHA/MFA bypass — the agent does not solve CAPTCHAs, MFA challenges, or anti-bot controls
- No hidden DOM access — the agent interacts only with visible elements, like a real user
- No unsupported claims — the evaluator scores from evidence only, not from the agent's impressions
- Task-required — every run must have at least one explicit task
- Legitimate sessions only — storage state reuse is for approved, pre-established sessions
- Trade-safe by default — onchain execution is disabled unless explicitly enabled, and exchange-flow QA should stop before real-world transfers
Step-by-Step Guides
| Guide | Topic |
|---|---|
| docs/01-installation.md | Installation and setup |
| docs/02-running-your-first-audit.md | Your first run |
| docs/03-configuration.md | Configuration deep-dive |
| docs/04-how-the-agent-thinks.md | Agent planning internals |
| docs/05-extending-personas-and-tasks.md | Custom personas and tasks |
| docs/06-hardening-for-production.md | Production deployment |
Recommended Rollout
- Start local: run manually on desktop, inspect logs and reports
- Tune tasks: write focused coverage lanes for your product
- Add mobile: include --mobile runs
- Multi-agent: use 2-5 perspectives for broader coverage
- CI integration: only after you've validated the scores match your expectations
Treat the scores as signals, not ground truth, until you've calibrated them against your own quality bar.
