@arkxronts/ariadne
v0.1.0
Local CLI for coding-agent reliability and testing evals.
Ariadne
Ariadne is a local CLI for running coding-agent reliability evals. It executes task prompts against a configured agent command, captures traces, scores behavior, and writes JSON plus HTML reports.
Install
pnpm install
pnpm build
Developer preview installation
Developer preview installation requires pnpm 10.34.1:
git clone https://github.com/ArkXero/Ariadne.git ariadne
cd ariadne
pnpm install
pnpm build
pnpm link --global
ariadne --help
If pnpm reports that its global bin directory is missing from PATH, run pnpm setup, restart the shell, and repeat the link command.
The linked ariadne command runs the built dist/cli.js. Rerun pnpm build after source edits.
Commands
pnpm ariadne --help
pnpm ariadne -h
pnpm ariadne init
pnpm ariadne doctor
pnpm ariadne run
pnpm ariadne list
pnpm ariadne report
pnpm ariadne -- --help also works, for compatibility with the -- argument separator used by package managers.
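Accepting both forms usually comes down to dropping a leading -- before dispatching, because a package manager may forward the separator to the script verbatim. A minimal TypeScript sketch, assuming hypothetical normalizeArgs and wantsHelp helpers; this is not Ariadne's actual argument parser:

```typescript
// Hypothetical helper, not Ariadne's real parser.
// Drops a leading "--" that a package manager may forward verbatim,
// so `pnpm ariadne -- --help` behaves like `pnpm ariadne --help`.
function normalizeArgs(argv: string[]): string[] {
  return argv[0] === "--" ? argv.slice(1) : argv;
}

// Hypothetical helper: treats -h and --help as the same request.
function wantsHelp(args: string[]): boolean {
  return args.includes("--help") || args.includes("-h");
}
```

In a CLI entry point these would typically be applied to process.argv.slice(2), which skips the Node binary and script path.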
Checks
pnpm check
See TESTING.md for full setup, smoke tests, expected results, and failure-debugging notes.
During development, use:
pnpm dev init
pnpm dev doctor
pnpm dev run
pnpm dev list
pnpm dev report
Workflow
ariadne init creates:
- ariadne.yml
- .ariadne/tasks/example.yml
- .ariadne/runs/
- .gitignore entries for /.ariadne/ and /ariadne.yml
The ignore entries keep Ariadne config, tasks, and generated run artifacts out of host-project git status and formatter checks. Running ariadne init again updates existing projects without duplicating entries.
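Rerunning init safely mostly comes down to appending only the ignore entries that are not already present. A minimal TypeScript sketch, assuming a hypothetical addIgnoreEntries helper that operates on the .gitignore text; Ariadne's real implementation may differ:

```typescript
// Hypothetical sketch of an idempotent ignore-file update, not Ariadne's source.
// Returns the new .gitignore text with any missing entries appended once.
function addIgnoreEntries(current: string, entries: string[]): string {
  // Compare trimmed lines so trailing spaces or CRLF endings don't cause duplicates.
  const existing = new Set(current.split(/\r?\n/).map((line) => line.trim()));
  const missing = entries.filter((entry) => !existing.has(entry));
  if (missing.length === 0) {
    return current; // Nothing to add: a second init leaves the file unchanged.
  }
  // Make sure the appended entries start on a fresh line.
  const prefix = current.length > 0 && !current.endsWith("\n") ? "\n" : "";
  return current + prefix + missing.join("\n") + "\n";
}
```

Returning the input unchanged when nothing is missing also lets a caller skip the write entirely, which keeps file timestamps stable across repeated runs.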
For Codex, set agent.command so that it explicitly reads the prompt Ariadne sends on stdin:
agent:
  command: "codex exec --sandbox workspace-write -"
ariadne run reads ariadne.yml, loads YAML tasks, sends each task prompt to agent.command via stdin, runs configured verification commands, captures git traces, scores checks, and writes .ariadne/runs/<timestamp>.json.
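Delivering the prompt on stdin is what lets a command ending in - read it. A minimal TypeScript sketch using Node's child_process.spawnSync; the runAgent name and AgentResult shape are hypothetical, not Ariadne's API:

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical result shape for one agent invocation.
interface AgentResult {
  exitCode: number | null; // null if the process was terminated by a signal
  stdout: string;
  stderr: string;
}

// Hypothetical sketch: run a configured agent command through the shell
// and write the task prompt to its stdin.
function runAgent(command: string, prompt: string, cwd: string = "."): AgentResult {
  const result = spawnSync(command, {
    cwd,
    input: prompt, // written to the child's stdin, which is then closed
    shell: true, // lets the command be a full command line with quoted arguments
    encoding: "utf8",
  });
  if (result.error) {
    throw result.error; // e.g. the shell itself could not be spawned
  }
  return { exitCode: result.status, stdout: result.stdout, stderr: result.stderr };
}
```

Running through a shell keeps a quoted agent.command working as written; the trade-off is that the string is interpreted by the shell, which is reasonable only for trusted local config.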
ariadne doctor validates config and task files, checks command executables, and detects missing package manager scripts before a run.
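For example, a verification command such as pnpm lint fails only at run time if package.json has no lint script. A minimal TypeScript sketch of catching that up front, assuming pnpm-style commands and hypothetical helper names; Ariadne's actual doctor logic may differ:

```typescript
// pnpm built-in commands that are not package.json scripts
// (partial, illustrative list).
const PNPM_BUILTINS = new Set(["install", "i", "add", "remove", "update", "exec", "dlx", "link", "why"]);

// Hypothetical helper: returns the script a command invokes, or null.
function referencedScript(command: string): string | null {
  const [tool, first, second] = command.trim().split(/\s+/);
  if (tool !== "pnpm" || first === undefined) return null;
  if (first === "run") return second ?? null; // `pnpm run lint` -> "lint"
  if (first.startsWith("-") || PNPM_BUILTINS.has(first)) return null;
  return first; // `pnpm test` -> "test"
}

// Hypothetical helper: lists each referenced script missing from package.json.
function missingScripts(commands: string[], scripts: Record<string, string>): string[] {
  const missing: string[] = [];
  for (const command of commands) {
    const name = referencedScript(command);
    if (name !== null && !(name in scripts) && !missing.includes(name)) {
      missing.push(name);
    }
  }
  return missing;
}
```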
ariadne list prints every run in the project, newest first, in a compact table with task IDs and short run IDs.
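The listing behavior (newest first, short IDs, aligned columns) can be sketched in a few lines of TypeScript. The RunSummary shape and helper names are hypothetical, since the run JSON schema is not documented here:

```typescript
// Hypothetical run summary shape; the real run JSON may use other fields.
interface RunSummary {
  id: string;
  startedAt: string; // ISO 8601 timestamp
  taskIds: string[];
}

// Newest first: compare parsed timestamps, without mutating the input.
function sortNewestFirst(runs: RunSummary[]): RunSummary[] {
  return [...runs].sort((a, b) => Date.parse(b.startedAt) - Date.parse(a.startedAt));
}

// Shorten IDs for display, similar in spirit to abbreviated git hashes.
function shortId(id: string, length = 8): string {
  return id.slice(0, length);
}

function formatTable(runs: RunSummary[]): string {
  const header = ["RUN", "STARTED", "TASKS"];
  const rows = sortNewestFirst(runs).map((run) => [shortId(run.id), run.startedAt, run.taskIds.join(",")]);
  const all = [header, ...rows];
  // Pad each column to its widest cell so the columns line up.
  const widths = header.map((_, col) => Math.max(...all.map((row) => row[col].length)));
  return all
    .map((row) => row.map((cell, col) => cell.padEnd(widths[col])).join("  ").trimEnd())
    .join("\n");
}
```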
Use explicit output modes for full details and exports:
ariadne list --wide # Full task names and JSON paths
ariadne list --csv # Write .ariadne/runs/runs.csv
ariadne list --md # Write .ariadne/runs/runs.md
ariadne list --json # Write .ariadne/runs/runs.json
ariadne report reads the latest run JSON, prints a terminal summary, and writes .ariadne/runs/latest-report.html.
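Because an HTML report embeds text that can originate from task files and agent output, escaping matters. A minimal TypeScript sketch of turning per-task results into a summary line and an HTML table; the TaskResult shape and helper names are hypothetical, not the report's actual format:

```typescript
// Hypothetical per-task result shape.
interface TaskResult {
  taskId: string;
  passed: boolean;
  failedChecks: string[];
}

// Escape text before embedding it in HTML so characters like < and &
// are shown literally instead of being parsed as markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// One-line summary suitable for a terminal, e.g. "2/3 tasks passed".
function summarize(results: TaskResult[]): string {
  const passed = results.filter((r) => r.passed).length;
  return `${passed}/${results.length} tasks passed`;
}

function renderReport(results: TaskResult[]): string {
  const rows = results
    .map((r) => {
      const status = r.passed ? "PASS" : "FAIL";
      const details = r.failedChecks.map(escapeHtml).join("<br>");
      return `<tr><td>${escapeHtml(r.taskId)}</td><td>${status}</td><td>${details}</td></tr>`;
    })
    .join("\n");
  return [
    "<!doctype html>",
    '<meta charset="utf-8">',
    `<h1>${escapeHtml(summarize(results))}</h1>`,
    "<table>",
    "<tr><th>Task</th><th>Status</th><th>Failed checks</th></tr>",
    rows,
    "</table>",
  ].join("\n");
}
```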
MVP checks
- Agent command must exit with code 0.
- Verification commands must pass.
- Forbidden files must not be modified.
- Changed files must not exceed checks.max_changed_files.
- Diff lines must not exceed checks.max_diff_lines.
- Forbidden command strings must not appear in logs or observed commands.
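The checks above reduce to comparisons over data collected during a run. A minimal TypeScript sketch, assuming hypothetical Observation and CheckConfig shapes, exact-path matching for forbidden files, and substring matching for forbidden commands; Ariadne's real scoring may differ (for example, by supporting globs):

```typescript
// Hypothetical observations gathered during one task run.
interface Observation {
  agentExitCode: number | null;
  verificationExitCodes: number[];
  numstat: string; // output of `git diff --numstat`
  logs: string; // agent logs plus observed commands
}

// Hypothetical limits mirroring the checks listed above.
interface CheckConfig {
  maxChangedFiles: number; // checks.max_changed_files
  maxDiffLines: number; // checks.max_diff_lines
  forbiddenFiles: string[];
  forbiddenCommands: string[];
}

// `git diff --numstat` prints "added<TAB>deleted<TAB>path" per file;
// binary files report "-" for both counts, counted here as 0 lines.
function parseNumstat(output: string): { files: string[]; lines: number } {
  const files: string[] = [];
  let lines = 0;
  for (const row of output.split("\n")) {
    if (row.trim() === "") continue;
    const [added, deleted, ...pathParts] = row.split("\t");
    files.push(pathParts.join("\t"));
    lines += (Number(added) || 0) + (Number(deleted) || 0);
  }
  return { files, lines };
}

// Returns the names of failed checks; an empty array means the task passed.
function failedChecks(obs: Observation, config: CheckConfig): string[] {
  const failures: string[] = [];
  const diff = parseNumstat(obs.numstat);
  if (obs.agentExitCode !== 0) failures.push("agent_exit_code");
  if (obs.verificationExitCodes.some((code) => code !== 0)) failures.push("verification");
  if (diff.files.some((file) => config.forbiddenFiles.includes(file))) failures.push("forbidden_files");
  if (diff.files.length > config.maxChangedFiles) failures.push("max_changed_files");
  if (diff.lines > config.maxDiffLines) failures.push("max_diff_lines");
  if (config.forbiddenCommands.some((cmd) => obs.logs.includes(cmd))) failures.push("forbidden_commands");
  return failures;
}
```

Note that git diff --numstat covers tracked files only, so a fuller implementation would also need to account for untracked files the agent created.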
