ultimatrix
v1.0.2
Published
AI-powered security team-in-a-box. Multi-agent security testing platform with AutoGen-style orchestration.
Ultimatrix
AI-Powered Autonomous Security Operator — Single-agent loop that explores, maps, analyses, and attacks web applications. The LLM crafts every payload dynamically based on response analysis. No canned payload lists, no fixed pipeline.
⚠️ Under active development. Not yet published. API and behavior may change without notice.
Quick Start
# Install
npm install
# Interactive setup (writes ultimatrix.yaml)
npx tsx src/cli/index.ts init
# Full assessment — auto-explore then attack
npx tsx src/cli/index.ts assess -t https://your-app.com -o ./output
Modes
assess — Full assessment (primary entry point)
Two-phase operation:
Phase 1 — Automated Exploration (pre-map): BFS crawl with Playwright. Intercepts all HTTP traffic via page.route(). Captures DOM before/after every click and form submit. Correlates network requests with DOM transitions to build the workflow graph automatically.
Phase 2 — LLM Attack: The agent starts with a pre-mapped workflow graph (nodes, edges, endpoints, forms, auth boundaries, classified parameters). It reads the map and attacks known endpoints with crafted payloads — no blind re-exploration needed.
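As a rough illustration of the correlation step in Phase 1, the sketch below attaches each captured request to the interaction whose time window contains it and emits one workflow edge per interaction. The type and function names (`Interaction`, `CapturedRequest`, `buildEdges`) are hypothetical, and the timestamp-window rule is an assumption; the actual explorer may correlate requests differently.

```typescript
// Hypothetical shapes; the real Ultimatrix types are not shown in this README.
interface Interaction {
  action: string;    // e.g. "submit #login"
  fromUrl: string;   // page URL before the interaction
  toUrl: string;     // page URL after the interaction
  startedAt: number; // ms timestamp when the interaction began
  settledAt: number; // ms timestamp when the DOM settled
}

interface CapturedRequest {
  method: string;
  url: string;
  timestamp: number; // ms timestamp when the request was sent
}

interface WorkflowEdge {
  from: string;
  to: string;
  action: string;
  requests: string[]; // "METHOD url" for each correlated request
}

// Assumption: a request belongs to an interaction if it was sent between
// the start of the interaction and the moment the DOM settled.
function buildEdges(
  interactions: Interaction[],
  requests: CapturedRequest[],
): WorkflowEdge[] {
  return interactions.map((i) => ({
    from: i.fromUrl,
    to: i.toUrl,
    action: i.action,
    requests: requests
      .filter((r) => r.timestamp >= i.startedAt && r.timestamp <= i.settledAt)
      .map((r) => `${r.method} ${r.url}`),
  }));
}

const edges = buildEdges(
  [{ action: "submit #login", fromUrl: "/login", toUrl: "/dashboard", startedAt: 100, settledAt: 400 }],
  [
    { method: "POST", url: "/api/login", timestamp: 150 },
    { method: "GET", url: "/api/me", timestamp: 900 }, // outside the window, not attached
  ],
);
```

Here `edges[0].requests` contains only `POST /api/login`, which is the kind of edge-to-endpoint link the pre-mapped workflow graph is described as carrying.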
# Basic
ultimatrix assess -t https://target.com -o ./output
# With automated exploration (default depth=2)
ultimatrix assess -t https://target.com -o ./output --depth 3
# Skip exploration for fast re-scans
ultimatrix assess -t https://target.com -o ./output --skip-explore
# Pre-populate from existing artifacts
ultimatrix assess -t https://target.com -o ./output \
--with-openapi ./api-spec.yaml \
--with-har ./session.har \
--with-postman ./collection.json \
--with-src ./src
# Live dashboard
ultimatrix assess -t https://target.com -o ./output --dashboard
scan — Autonomous pentest
Legacy entry point. It checks for an existing app-model.json in the output directory and passes it to the agent.
ultimatrix scan -t https://target.com -o ./output
verify — Re-run findings against new deployment
Replays the exact same payloads against a fresh target and classifies each finding as fixed/regressed/unchanged.
ultimatrix verify -a ./output/app-model.json -t https://new-deployment.com
interact — Live REPL chat loop with manual recording
Start a chat session with the LLM agent. Use /record start to toggle manual browser interaction — the visible Playwright window opens so you can click, type, and navigate directly. Type /record stop when done to save captured steps to the app model for later replay.
ultimatrix interact -t https://target.com
init — Interactive config wizard
ultimatrix init
Automated Exploration (Pre-Map Phase)
When --depth N (default 2) is specified during assess, Ultimatrix performs a fully automated crawl before the LLM gets involved:
- Network interception: Every request/response is captured via Playwright page.route() (method, URL, status, headers, bodies, timing).
- DOM snapshots: Before and after every interaction (click, form fill, submit), the full page state is captured: URL, title, forms, interactive elements, text content.
- Hash-based diffing: DOM hashes are compared to detect real state changes. If the DOM didn't change, the interaction is skipped.
- Context-aware form filling: Fields are auto-filled with realistic test data based on field name, type, and placeholder (e.g., email→a placeholder email address, price→100).
- BFS queue: New URLs discovered during the crawl are added to a breadth-first queue with depth tracking.
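The BFS queue and hash-based diffing from the list above can be sketched without a browser. This is a minimal illustration, not the crawler's actual code: `hashDom`, `domChanged`, `crawl`, and the `getLinks` callback are hypothetical names, and FNV-1a is only a stand-in for whatever hash the real explorer uses.

```typescript
// FNV-1a 32-bit hash; a cheap stand-in for the real DOM hash (assumption).
function hashDom(dom: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < dom.length; i++) {
    h ^= dom.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}

// An interaction is only worth recording if the DOM hash changed.
function domChanged(before: string, after: string): boolean {
  return hashDom(before) !== hashDom(after);
}

// Breadth-first crawl with depth tracking. getLinks is a hypothetical
// callback that returns the URLs discovered on a page.
function crawl(
  start: string,
  maxDepth: number,
  getLinks: (url: string) => string[],
): string[] {
  const seen = new Set<string>([start]);
  const queue: Array<{ url: string; depth: number }> = [{ url: start, depth: 0 }];
  const visitOrder: string[] = [];

  while (queue.length > 0) {
    const { url, depth } = queue.shift()!;
    visitOrder.push(url);
    if (depth >= maxDepth) continue; // do not enqueue links past the depth limit
    for (const link of getLinks(url)) {
      if (!seen.has(link)) {
        seen.add(link);
        queue.push({ url: link, depth: depth + 1 });
      }
    }
  }
  return visitOrder;
}

// A toy site: "/d" sits at depth 3 and is never reached with maxDepth = 2.
const site: Record<string, string[]> = {
  "/": ["/a", "/b"],
  "/a": ["/c"],
  "/b": ["/"],
  "/c": ["/d"],
  "/d": [],
};
const order = crawl("/", 2, (url) => site[url] || []);
// order is ["/", "/a", "/b", "/c"]
```

With `--depth 2`, a crawl of this shape would visit the start page plus everything reachable within two link hops, which matches the depth semantics implied by the CLI flag.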
The result is a complete workflow graph (nodes + edges) + discovered endpoints + forms + auth boundaries + parameter classifications — all pre-populated in app-model.json before the LLM sees it.
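The context-aware form filling described above could look roughly like the following rule table. Only the `price→100` mapping comes from this README; the other patterns, the sample values, and the names `FieldInfo`, `RULES`, and `testValueFor` are assumptions made for illustration.

```typescript
interface FieldInfo {
  name: string;
  type: string;
  placeholder?: string;
}

// Hypothetical rule table: the first matching pattern wins.
const RULES: Array<{ pattern: RegExp; value: string }> = [
  { pattern: /e-?mail/i, value: "tester@example.com" },
  { pattern: /price|amount|total/i, value: "100" },
  { pattern: /phone|\btel\b/i, value: "5550100" },
  { pattern: /pass(word)?/i, value: "Test-Passw0rd!" },
];

// Match against the field name, type, and placeholder together.
function testValueFor(field: FieldInfo): string {
  const haystack = `${field.name} ${field.type} ${field.placeholder ?? ""}`;
  for (const rule of RULES) {
    if (rule.pattern.test(haystack)) return rule.value;
  }
  // Fallback when nothing matches: keep numeric inputs numeric.
  return field.type === "number" ? "1" : "test";
}

const emailValue = testValueFor({ name: "user_email", type: "text" });
const priceValue = testValueFor({ name: "price", type: "number" });
const fallbackValue = testValueFor({ name: "comment", type: "text" });
```

Keeping the rules in a flat, ordered table makes it straightforward to add field types without touching the matching logic.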
App Model (Knowledge Graph)
The agent's persistent memory is a structured JSON file (app-model.json) with 18 sections:
| Section | Purpose |
|---------|---------|
| target | Target URL |
| techStack | Detected technologies |
| auth | Auth type, login endpoint, cookies, tokens, sessions |
| workflow | Nodes (pages/APIs) + Edges (transitions) — the state machine |
| endpoints | Known API routes with params, methods, response patterns |
| forms | Form inputs found on each page |
| scripts | External JS loaded on pages |
| cookies | Active cookies |
| localStorage | LocalStorage values |
| findings | Vulnerabilities found with structured evidence |
| verifications | Re-run results from verify command |
| parameterClassifications | What each parameter is FOR (id, email, price, etc.) |
| authBoundaries | Which URLs require auth, proven by request comparison |
| recordedSessions | Named macros (login flows, multi-step workflows) |
| hypotheses | Things to test next |
| nextSteps | Ordered action plan |
| visitedUrls | URLs already visited |
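For orientation, the sections in the table above could be typed roughly as follows. The section names are taken from the table; every field type and the names `AppModelSketch` and `emptyAppModel` are assumptions, since the real AppModel definition lives in src/core and is not reproduced in this README. The sketch covers only the sections listed in the table.

```typescript
// Section names come from the table above; all field shapes are assumptions.
interface AppModelSketch {
  target: string;
  techStack: string[];
  auth: { type?: string; loginEndpoint?: string };
  workflow: {
    nodes: { id: string; url: string }[];
    edges: { from: string; to: string; action: string }[];
  };
  endpoints: { method: string; path: string; params: string[] }[];
  forms: unknown[];
  scripts: string[];
  cookies: unknown[];
  localStorage: Record<string, string>;
  findings: { title: string; severity: string; evidence: string }[];
  verifications: unknown[];
  parameterClassifications: Record<string, string>;
  authBoundaries: { url: string; requiresAuth: boolean }[];
  recordedSessions: Record<string, unknown[]>;
  hypotheses: string[];
  nextSteps: string[];
  visitedUrls: string[];
}

// Build an empty model for a new target.
function emptyAppModel(target: string): AppModelSketch {
  return {
    target,
    techStack: [],
    auth: {},
    workflow: { nodes: [], edges: [] },
    endpoints: [],
    forms: [],
    scripts: [],
    cookies: [],
    localStorage: {},
    findings: [],
    verifications: [],
    parameterClassifications: {},
    authBoundaries: [],
    recordedSessions: {},
    hypotheses: [],
    nextSteps: [],
    visitedUrls: [],
  };
}

const model = emptyAppModel("https://target.com");
model.parameterClassifications["user_id"] = "id";
model.visitedUrls.push("https://target.com/login");
```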
Tools (47 total)
All tools are payload-in, response-out — the LLM crafts every payload dynamically.
Browser Tools (21)
| Tool | Purpose |
|------|---------|
| navigate | Navigate browser to URL |
| click | Click element on page |
| fill | Fill form field |
| press_key | Send keyboard events |
| screenshot | Capture page screenshot |
| extract | Extract text, HTML, or links |
| evaluate | Execute JS in browser |
| get_forms | Get all forms with fields |
| get_cookies | Get active cookies |
| get_scripts | Get external scripts |
| get_storage | Get localStorage |
| close | Close browser session |
| get_page_info | Get URL, title, readyState, text length, link/form count |
| inject_cookie | Set cookies in browser context |
| macro_record_start | Start recording browser actions (LLM-driven) |
| macro_record_stop | Stop recording, get steps |
| browser_get_recording | View recording without stopping |
| browser_replay_macro | Replay saved macro steps |
| macro_list | List saved macros |
| manual_record_start | Start recording DIRECT human browser interactions (visible window) |
| manual_record_stop | Stop manual recording, return captured steps |
Session Recording & Trace (8)
| Tool | Purpose |
|------|---------|
| browser_start_trace | Start network trace |
| browser_stop_trace | Stop trace, get entries |
| browser_get_trace | View trace entries |
| create_browser_session | Create isolated browser context (multi-role) |
| list_browser_sessions | List all sessions |
| save_storage_state | Save cookies + localStorage to JSON file |
| load_storage_state | Restore cookies + localStorage from JSON file |
Network (3)
| Tool | Purpose |
|------|---------|
| http_request | Send arbitrary HTTP request |
| port_scan | Scan for open ports |
| header_analyze | Analyze HTTP security headers |
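To make the header_analyze row concrete, here is a minimal sketch of one way a security-header check can work: compare the response headers against a list of commonly recommended ones. The list and the name `missingSecurityHeaders` are assumptions; this README does not say which headers the tool actually inspects.

```typescript
// Commonly recommended response headers (an assumed list, not the tool's).
const EXPECTED_HEADERS = [
  "strict-transport-security",
  "content-security-policy",
  "x-content-type-options",
  "x-frame-options",
  "referrer-policy",
];

// Header names are case-insensitive, so normalize before comparing.
function missingSecurityHeaders(headers: Record<string, string>): string[] {
  const present = new Set(Object.keys(headers).map((h) => h.toLowerCase()));
  return EXPECTED_HEADERS.filter((h) => !present.has(h));
}

const missing = missingSecurityHeaders({
  "Content-Security-Policy": "default-src 'self'",
  "X-Frame-Options": "DENY",
  "Content-Type": "text/html",
});
// missing is ["strict-transport-security", "x-content-type-options", "referrer-policy"]
```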
Exploit (2)
| Tool | Purpose |
|------|---------|
| sql_inject | Send SQL injection payload (LLM-crafted) |
| xss_inject | Send XSS payload (LLM-crafted) |
Recon (5)
| Tool | Purpose |
|------|---------|
| auth_probe | Compare response with/without cookies |
| subdomain_enum | Passive subdomain enumeration |
| dir_bruteforce | Discover hidden directories |
| jwt_parse | Decode JWT tokens |
| graphql_introspect | Query GraphQL introspection |
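As an example of what a tool like jwt_parse has to do, the sketch below decodes the header and payload of a JWT by splitting on dots and base64url-decoding the first two segments. It does not verify the signature. The names `base64UrlDecode` and `decodeJwt` are hypothetical, and the decoder is written by hand so the block has no dependencies; the token is the widely used jwt.io sample.

```typescript
const B64URL = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Decode a base64url segment into a UTF-8 string.
function base64UrlDecode(segment: string): string {
  const input = segment.replace(/=+$/, ""); // padding is optional in JWTs
  let bits = 0;
  let bitCount = 0;
  let percentEncoded = "";
  for (let i = 0; i < input.length; i++) {
    const value = B64URL.indexOf(input[i]);
    if (value === -1) throw new Error("Invalid base64url character: " + input[i]);
    bits = (bits << 6) | value;
    bitCount += 6;
    if (bitCount >= 8) {
      bitCount -= 8;
      const byte = (bits >> bitCount) & 0xff;
      bits &= (1 << bitCount) - 1; // keep only the leftover bits
      percentEncoded += "%" + (byte < 16 ? "0" : "") + byte.toString(16);
    }
  }
  // decodeURIComponent interprets the %XX bytes as UTF-8.
  return decodeURIComponent(percentEncoded);
}

// Decode header and payload only; the signature is NOT checked.
function decodeJwt(token: string): {
  header: Record<string, unknown>;
  payload: Record<string, unknown>;
} {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("Expected header.payload.signature");
  return {
    header: JSON.parse(base64UrlDecode(parts[0])),
    payload: JSON.parse(base64UrlDecode(parts[1])),
  };
}

const sampleToken =
  "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9." +
  "eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ." +
  "SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c";
const decoded = decodeJwt(sampleToken);
// decoded.header.alg is "HS256"; decoded.payload.name is "John Doe"
```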
Knowledge (3)
| Tool | Purpose |
|------|---------|
| calculate_risk | Get risk score from findings |
| render_workflow_graph | See workflow graph as Mermaid diagram |
| classify_parameter | Save parameter purpose classification |
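The render_workflow_graph row mentions Mermaid output. A minimal sketch of turning nodes and edges into Mermaid flowchart text is shown below; `GraphNode`, `GraphEdge`, and `toMermaid` are hypothetical names, and the real tool may emit a different layout or labeling.

```typescript
interface GraphNode {
  id: string;    // must be a valid Mermaid node id (no spaces)
  label: string; // human-readable text shown in the box
}

interface GraphEdge {
  from: string;
  to: string;
  action?: string; // optional edge label, e.g. "submit"
}

// Emit a top-down Mermaid flowchart definition.
function toMermaid(nodes: GraphNode[], edges: GraphEdge[]): string {
  const lines = ["graph TD"];
  for (const n of nodes) {
    // Double quotes inside a label are written as the #quot; entity code.
    lines.push(`  ${n.id}["${n.label.replace(/"/g, "#quot;")}"]`);
  }
  for (const e of edges) {
    lines.push(
      e.action ? `  ${e.from} -->|${e.action}| ${e.to}` : `  ${e.from} --> ${e.to}`,
    );
  }
  return lines.join("\n");
}

const diagram = toMermaid(
  [
    { id: "login", label: "Login" },
    { id: "dash", label: "Dashboard" },
  ],
  [{ from: "login", to: "dash", action: "submit" }],
);
```

The resulting string can be pasted into any Mermaid renderer to view the state machine.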
App Model (2)
| Tool | Purpose |
|------|---------|
| read_app_model | Read a section of the app model |
| update_app_model | Write findings/hypotheses/nodes to the app model |
Architecture
assess → [Pre-map Phase] → [Agent Phase] → [Report]
                │                │
                ▼                ▼
      BFS crawler             LLM single-agent loop
      Network interception    explore→analyze→attack
      DOM diffing             50 tool-call budget
      Auto form filling       Dashboard events
      Workflow graph build    Auto-report compilation
- Single-agent loop: THREAT_MODEL_PROMPT drives explore→analyze→attack→re-analyze. No sub-agents. Agent reads/writes app-model.json via read_app_model/update_app_model tools.
- 47 tools total: browser (21), session/recording/trace (8), network (3), exploit (2), recon (5), knowledge (3), app-model (2), utility (3).
- 11 LLM providers via @langchain: OpenAI, Azure, Anthropic, Bedrock, Gemini, Groq, Together, Mistral, NIM, OpenRouter, Mock.
- BrowserSession: Persistent Playwright sessions with fill() contenteditable/JS fallback, pressKey(), extraction, addCookie(), hasSession(), saveStorageState()/loadStorageState(), recording + replay + manual recording, network trace.
- AppModel type (18 sections): workflow graph, recorded sessions, parameter classifications, auth boundaries, structured evidence, risk scoring, report compilation.
- Auto-report: compileReport() generates HTML, JSON, or Markdown from app model findings, even if the LLM never calls write_file. Always written after the agent completes.
- Dashboard: Optional WebSocket + HTML server (--dashboard flag). Streams real-time tool calls, risk changes, status, and errors.
- Verification: verify command re-runs findings against a fresh deployment, classifies each as fixed/regressed/unchanged/unknown.
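The single-agent loop with its 50 tool-call budget can be pictured with the simplified sketch below. It is synchronous and uses a stand-in `decide` function in place of an LLM, so it only illustrates the control flow; `runAgentLoop`, `ToolCall`, and the tool implementations are hypothetical, and the real orchestrator in src/pipeline is not shown in this README.

```typescript
type ToolCall = { tool: string; input: string };
// Stand-in for the LLM: returns the next tool call, or null when done.
type Decide = (history: string[]) => ToolCall | null;
type Tools = Record<string, (input: string) => string>;

// Run decide -> tool -> record until the agent stops or the budget is spent.
function runAgentLoop(decide: Decide, tools: Tools, budget = 50): string[] {
  const history: string[] = [];
  for (let used = 0; used < budget; used++) {
    const call = decide(history);
    if (call === null) break; // agent signals completion
    const tool = tools[call.tool];
    const result = tool ? tool(call.input) : `unknown tool: ${call.tool}`;
    history.push(`${call.tool}(${call.input}) -> ${result}`);
  }
  return history;
}

// Toy tools that mimic reading from and writing to the app model.
const notes: string[] = [];
const toyTools: Tools = {
  read_app_model: (section) => `section ${section} has ${notes.length} notes`,
  update_app_model: (note) => {
    notes.push(note);
    return "ok";
  },
};

// Scripted "agent": read once, write once, then stop.
const scripted: Decide = (history) => {
  if (history.length === 0) return { tool: "read_app_model", input: "endpoints" };
  if (history.length === 1) return { tool: "update_app_model", input: "test /api/login" };
  return null;
};

const finished = runAgentLoop(scripted, toyTools);
// An agent that never stops is cut off by the budget.
const capped = runAgentLoop(() => ({ tool: "read_app_model", input: "findings" }), toyTools, 3);
```

The budget acts as a hard stop, so a report can still be compiled from whatever the app model contains when the loop ends.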
Project Structure
src/
├── cli/ — CLI commands (assess, scan, verify, interact, init)
├── core/ — AppModel types, BrowserSessionManager, fix-todos middleware
├── providers/ — 11 LLM provider factories
├── tools/ — 47 tools + tool-registry
├── pipeline/ — AutonomousOrchestrator + THREAT_MODEL_PROMPT
├── explorer/ — Pre-map phase (network-recorder, dom-observer, crawler, workflow-builder)
├── dashboard/ — WebSocket + HTML live dashboard
├── ingestion/ — OpenAPI/HAR/Postman/source-code parsers
├── verification/ — Re-run findings against new deployment
├── flow/ — Trace-to-HAR converter
└── agents/ — Agent registry
Testing
npx vitest run
# 327 tests, 22 files, 0 failures
npx tsc --noEmit # 0 type errors
npm run build # 0 warnings
License
MIT
