@equinor/fusion-framework-cli-plugin-copilot
v2.0.2
Published
Copilot SDK powered evaluation plugin for Fusion Framework CLI
Downloads
178
Readme
@equinor/fusion-framework-cli-plugin-copilot
Agentic evaluation plugin for the Fusion Framework CLI. It uses the GitHub Copilot SDK to create a conversational session in which an LLM autonomously drives agent-browser to navigate, interact with, and judge a running Fusion application against eval criteria written in plain markdown.
When to use this
- You have a Fusion application (cookbook or production app) and want to verify it works end-to-end.
- You want an LLM to autonomously explore and validate acceptance criteria without writing Playwright scripts.
- You want structured, evidence-backed pass/fail verdicts with screenshots, snapshots, and computed style checks.
How it works
The plugin follows a single-session agentic workflow: the Copilot agent reads your eval markdown, decides what to check, drives the browser through tool calls, collects evidence, and produces a structured JSON verdict in one session.
flowchart LR
A[CLI] -->|starts| B[Dev Server]
A -->|reads| C[eval/*.md]
A -->|creates| D[Copilot Session]
D -->|system prompt +\neval markdown| E[LLM]
E <-->|tool calls| F[agent-browser]
F <-->|headless Chrome| B
E -->|JSON verdict| AEval lifecycle
sequenceDiagram
participant CLI as ffc copilot app eval
participant Server as Dev Server
participant SDK as Copilot SDK
participant LLM as LLM Agent
participant AB as agent-browser
participant Browser as Headless Chrome
CLI->>CLI: Resolve eval files from app/eval/*.md
CLI->>Server: Start dev server (ffc app serve)
CLI->>CLI: Wait for server ready
CLI->>AB: Reset daemon (kill stale sessions)
loop For each eval file
CLI->>CLI: Create timestamped run directory
CLI->>SDK: Create session with browser tools
SDK->>LLM: System prompt + eval markdown
loop Agent autonomously evaluates
LLM->>AB: browser_navigate(url)
AB->>Browser: Open page
LLM->>AB: browser_wait(networkidle)
LLM->>AB: browser_snapshot()
AB-->>LLM: Accessibility tree
LLM->>AB: browser_get_styles(selector)
AB-->>LLM: Computed CSS values
LLM->>AB: browser_screenshot()
AB-->>LLM: Image data
LLM->>AB: browser_errors()
AB-->>LLM: Console errors
end
LLM-->>CLI: JSON verdict {pass, reasoning, steps}
CLI->>CLI: Save verdict + evidence artifacts
end
CLI->>AB: Reset daemon
CLI->>Server: Stop dev server
CLI->>CLI: Exit 0 (all pass) or 1 (any fail)Quick start
# Evaluate all eval files in a cookbook
ffc copilot app eval ./cookbooks/app-react
# Or use the workspace-root shortcut
pnpm eval:app ./cookbooks/app-react
# Run a specific eval
ffc copilot app eval ./cookbooks/app-react --eval smoke
# Use a specific LLM model
ffc copilot app eval ./cookbooks/app-react --model claude-sonnet-4
# Point at an already-running server
ffc copilot app eval . --url http://localhost:3000/apps/my-appMSAL login (first-time setup)
If the application requires authentication, run the login flow once to persist MSAL tokens in the Chrome profile:
ffc copilot app eval ./cookbooks/app-react --login
# Alias
ffc copilot app eval ./cookbooks/app-react --logonThis opens a headed browser for interactive login. After you authenticate, press Ctrl+C. Subsequent eval runs reuse the saved session automatically.
Writing eval files
Eval files are plain markdown files placed in the app's eval/ directory. The plugin passes the raw markdown directly to the LLM as the user prompt. No special parsing is required.
my-app/
├── eval/
│ ├── smoke.md
│ └── accessibility.md
├── src/
└── package.jsonEval file format
Write eval files as you would a user story or test plan. The LLM reads the markdown and decides how to verify each criterion.
---
name: smoke-test
---
## User Story
As a user, I need to see the application load successfully so I know
the deployment is healthy.
## Acceptance Criteria
- must see a heading with the application name
- must not have any JavaScript errors in the console
- should load within 5 seconds
- should display navigation elementsBoth story-driven formats (user story + acceptance criteria) and instruction-driven formats (numbered steps + assertions) work. The LLM adapts to the structure you provide.
CLI reference
ffc copilot app eval <path> [options]| Option | Description | Default |
|--------|-------------|---------|
| <path> | Path to the Fusion application directory | required |
| --eval <name-or-path> | Run a specific eval by name or file path | All files in eval/ |
| --port <port> | Port for the app dev server | 3000 |
| --host <host> | Host for the app dev server | 0.0.0.0 |
| --url <url> | Skip server start and use an already-running URL | none |
| --verbose | Show agent-browser commands and app server output | false |
| --login, --logon | Open a headed browser for interactive MSAL login | false |
| -m, --model <model> | LLM model to use, for example claude-sonnet-4 | SDK default |
| -o, --output <dir> | Output directory for run artifacts | .tmp/copilot/ |
Run artifacts
Each eval run creates a timestamped directory containing evidence and metadata:
.tmp/copilot/smoke-test_143052/
├── copilot-log.jsonl # Tool call log (timestamp, tool name, arguments)
├── copilot-response.md # Full LLM response text
├── verdict.json # Structured pass/fail verdict
└── evidence/
├── snapshot.txt # Accessibility tree snapshot
├── errors.txt # JavaScript console errors
├── url.txt # Final page URL
├── screenshot-*.jpg # Visual evidence screenshots
├── styles-*.txt # Computed CSS style captures
└── eval-*.json # JavaScript evaluation resultsAvailable browser tools
The LLM can use 18 browser tools during an eval session:
| Tool | Description |
|------|-------------|
| browser_navigate | Open a URL in the browser |
| browser_snapshot | Capture an accessibility tree with element refs (@e1, @e2) |
| browser_screenshot | Take a screenshot and return the image to the model |
| browser_get_styles | Get computed CSS styles of an element |
| browser_eval | Evaluate a JavaScript expression in the page context |
| browser_click | Click an element by ref, CSS selector, or text |
| browser_fill | Fill a form field (clears existing content first) |
| browser_type | Type text character by character |
| browser_press_key | Press a keyboard key (Enter, Tab, Escape, etc.) |
| browser_hover | Hover over an element |
| browser_select | Select an option from a dropdown |
| browser_scroll | Scroll the page or scroll an element into view |
| browser_wait | Wait for load state, text, element, or timeout |
| browser_find | Semantic element lookup by role, text, or label |
| browser_errors | Get JavaScript console errors |
| browser_get_url | Get the current page URL |
| browser_go_back | Navigate back in browser history |
| browser_reload | Reload the current page |
Package structure
graph TD
subgraph "@equinor/fusion-framework-cli-plugin-copilot"
IDX[index.ts — plugin entry point]
CMD[commands/app/command.ts — CLI command tree]
EVAL[commands/app/eval.ts — Copilot session runner]
LOGIN[commands/app/login.ts — MSAL login flow]
RESOLVE[eval-resolve.ts — eval file discovery]
TOOLS[commands/app/tools/ — 18 browser tool wrappers]
UTILS[utils/ — agent-browser, daemon, server helpers]
end
IDX -->|registers| CMD
CMD -->|--login| LOGIN
CMD -->|resolves| RESOLVE
CMD -->|runs| EVAL
EVAL -->|creates| TOOLS
EVAL -->|uses| UTILS
LOGIN -->|uses| UTILS
TOOLS -->|calls| UTILSPrerequisites
- Node.js >= 20
- pnpm as the workspace package manager
- agent-browser installed globally or available on
PATH - GitHub Copilot access in VS Code, because the SDK authenticates through the GitHub Copilot extension
