helixevo

v0.10.0, published 4 months ago



danielchen26


HelixEvo

Co-evolving skill and project brain for AI agents. HelixEvo captures failures, traces activations, models adaptation pressure, and routes governed responses. It promotes cross-project transfer, reviews structural topology changes, and safely executes accepted topology transitions with rollback. Approved ontology concepts become active semantic consumers inside the live control loop, Proof becomes a bounded steering input for future control, and automatic theory-conformance verification runs contract-backed scenarios plus bounded live smoke checks.

How it works

HelixEvo builds on ideas from EvoSkill and AutoResearch to create a three-directional evolution system:

Generalize ↑ — Detect cross-project patterns and promote them to abstract skills
Specialize ↓ — Create project-specific skills from domain skills + project failures
Lateral ↔ — Merge, split, and resolve conflicts between skills

Every proposed change goes through:

3 independent LLM judges (Task Completion, Correction Alignment, Side-Effect Check)
Regression testing against skill tests
3-day canary deployment with auto-rollback
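The gating sequence above can be pictured as two small decision functions. This is a minimal TypeScript sketch, not HelixEvo's implementation: the `JudgeVerdict` shape, the assumption that all three judges must pass, and the rule that any new failure during the canary window triggers rollback are all hypothetical.

```typescript
// Hypothetical shapes: HelixEvo's real proposal and judge types are not documented here.
type JudgeName = "task-completion" | "correction-alignment" | "side-effect-check";

interface JudgeVerdict {
  judge: JudgeName;
  pass: boolean;
}

const CANARY_WINDOW_MS = 3 * 24 * 60 * 60 * 1000; // 3-day canary window

// Pre-deployment gate. Assumption: every judge must return a verdict, no verdict
// may fail, and the regression suite must report zero failures.
function passesGate(verdicts: JudgeVerdict[], regressionFailures: number): boolean {
  const required: JudgeName[] = ["task-completion", "correction-alignment", "side-effect-check"];
  const allJudgesPass =
    required.every((name) => verdicts.some((v) => v.judge === name)) &&
    verdicts.every((v) => v.pass);
  return allJudgesPass && regressionFailures === 0;
}

type CanaryStatus = "monitoring" | "promote" | "rollback";

// Canary check. Assumption: any new failure attributed to the changed skill inside
// the window triggers auto-rollback; a clean full window promotes the change.
function canaryStatus(deployedAt: number, now: number, newFailures: number): CanaryStatus {
  if (newFailures > 0) return "rollback";
  return now - deployedAt >= CANARY_WINDOW_MS ? "promote" : "monitoring";
}
```

Keeping the gate and the canary check separate mirrors the README's ordering: nothing reaches a canary until the judges and regression tests have all passed.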

Prerequisites

Node.js 18+
Bun — used for building (curl -fsSL https://bun.sh/install | bash)
Claude CLI — installed and authenticated
- Requires a Claude Max plan subscription
- Claude Code remains the default provider for HelixEvo
- Prefer credentials managed by claude auth login over exporting a hardcoded CLAUDE_CODE_OAUTH_TOKEN
- If an inherited CLAUDE_CODE_OAUTH_TOKEN is stale but local Claude auth is valid, HelixEvo retries once without that override
Optional providers
- Codex CLI (codex) for GPT Codex on shared prompt-in / text-out paths
- Ollama (ollama + local daemon) for shared local-model prompt-in / text-out paths
- Web-search and research tooling remain Claude-only
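The one-time token retry described above can be expressed as a small wrapper. This sketch is hypothetical: `runClaude` stands in for however HelixEvo actually invokes the Claude CLI, and the assumption that the caller can recognize an auth failure is labeled in the code.

```typescript
// Hypothetical result of invoking the Claude CLI once.
interface RunResult {
  ok: boolean;
  authError: boolean; // assumption: the caller can detect an auth failure
  output: string;
}

type Env = Record<string, string | undefined>;
type Runner = (env: Env) => RunResult;

// Run once with the inherited environment. If that fails with an auth error while
// CLAUDE_CODE_OAUTH_TOKEN is set, retry exactly once without the override so the
// CLI can fall back to credentials managed by `claude auth login`.
function runWithTokenFallback(runClaude: Runner, env: Env): RunResult {
  const first = runClaude(env);
  if (first.ok || !first.authError || env.CLAUDE_CODE_OAUTH_TOKEN === undefined) {
    return first;
  }
  const withoutToken: Env = { ...env }; // copy so the caller's env is untouched
  delete withoutToken.CLAUDE_CODE_OAUTH_TOKEN;
  return runClaude(withoutToken);
}
```

Retrying only once keeps a genuinely broken login from looping; the second result is returned as-is whether or not it succeeds.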

Verify prerequisites:

node --version    # v18+
bun --version     # any
claude --version  # default provider
codex --version   # optional
ollama --version  # optional

Install

From npm (recommended)

npm install -g helixevo

From GitHub

npm install -g github:danielchen26/helixevo

From source

git clone https://github.com/danielchen26/helixevo.git
cd helixevo
npm install
npm run build
npm link

Quick Start

# 1. Initialize — imports existing skills + generates skill tests
helixevo init

# 2. Capture failures from a session
helixevo capture path/to/session.json --project myapp

# 3. Evolve skills from failures
helixevo evolve --verbose

# 4. View the skill network
helixevo graph

# 5. Open the web dashboard
helixevo dashboard

Commands

| Command | Description |
|---------|-------------|
| helixevo watch | Always-on learning: auto-capture + auto-evolve |
| helixevo metrics | Correction rates, skill trends, evolution impact |
| helixevo proof | Outcome attribution, proof review, and steering summaries across interventions, transfer, topology, ontology, and evolution |
| helixevo verify-brain | Automatic theory-conformance runner across deterministic scenarios plus bounded live smoke checks |
| helixevo health | Network health: cohesion, coverage, balance, transfer |
| helixevo init | Import existing skills + generate skill tests |
| helixevo capture <session> | Extract failures from a session file |
| helixevo project-setup <path> | Analyze a project, match skills, and surface capability gaps |
| helixevo evolve | Evolve skills from captured failures |
| helixevo generalize | Promote cross-project patterns ↑ |
| helixevo specialize --project <name> | Create project-specific skills ↓ |
| helixevo graph | View skill network in terminal |
| helixevo ontology | Refresh, review, adopt, and inspect ontology concepts plus semantic control coverage |
| helixevo topology | Prepare, apply, roll back, and inspect reviewed topology execution |
| helixevo research | Proactive web research for skill improvement (Claude-scoped web-tool path) |
| helixevo dashboard [--port <n>] | Open web dashboard, preferring localhost:3847 and falling forward if occupied |
| helixevo status | Show system health plus provider-control truth |
| helixevo report | Generate evolution report |

Common options

Most commands support:

--dry-run — Preview changes without applying
--verbose — Show detailed LLM interactions

Graph, ontology, topology, proof, and verification options

helixevo graph                    # TUI view (instant, cached)
helixevo graph --mermaid          # Open in browser as Mermaid diagram
helixevo graph --obsidian ~/vault # Sync to Obsidian vault
helixevo graph --rebuild          # Re-infer relationships (LLM call)
helixevo graph --optimize         # Refresh topology review queue first, then report full vs partial conflict enrichment
helixevo ontology --status        # Show ontology kernel / frontier / extension / adoption state
helixevo ontology --status --verbose
                                   # Show top active concepts, unused extensions, and deprecation-sensitive concepts
helixevo ontology --refresh       # Derive frontier concepts from recurring evidence
helixevo ontology --review <id> --decision promote
                                   # Promote a reviewed frontier concept into approved extensions
helixevo topology --status        # Show reviewed topology execution state
helixevo topology --prepare <id>  # Prepare an accepted topology candidate
helixevo topology --apply <id>    # Apply a safe prepared topology plan
helixevo topology --rollback <id> # Roll back an applied topology plan
helixevo proof --status           # Review proof state across the live loop
helixevo proof --review <id> --decision verify
                                   # Verify a proof record after operator review
helixevo verify-brain --verbose   # Run the contract-backed brain verification workflow
helixevo verify-brain --release   # Run stricter release-grade conformance handling
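The topology commands above imply a prepare, apply, rollback lifecycle for accepted candidates. A minimal sketch of that lifecycle as a guarded state machine follows; the state names, the transition table, and the `snapshotRef` field are assumptions inferred from the command descriptions, not HelixEvo's internal model.

```typescript
// Hypothetical lifecycle of a reviewed, accepted topology candidate.
type PlanState = "accepted" | "prepared" | "applied" | "rolled-back";
type Action = "prepare" | "apply" | "rollback";

interface TopologyPlan {
  id: string;
  state: PlanState;
  snapshotRef?: string; // recorded on apply so rollback has something to restore
}

// Assumed transition table: the state each action requires and the state it produces.
const TRANSITIONS: Record<Action, { from: PlanState; to: PlanState }> = {
  prepare: { from: "accepted", to: "prepared" },
  apply: { from: "prepared", to: "applied" },
  rollback: { from: "applied", to: "rolled-back" },
};

function transition(plan: TopologyPlan, action: Action, snapshotRef?: string): TopologyPlan {
  const { from, to } = TRANSITIONS[action];
  if (plan.state !== from) {
    throw new Error(`cannot ${action} plan ${plan.id} in state ${plan.state}`);
  }
  if (action === "apply" && !snapshotRef) {
    throw new Error(`apply requires a snapshot for plan ${plan.id}`);
  }
  if (action === "rollback" && !plan.snapshotRef) {
    throw new Error(`no snapshot recorded for plan ${plan.id}`);
  }
  // Return a new object instead of mutating, so callers can keep an execution history.
  return action === "apply" ? { ...plan, state: to, snapshotRef } : { ...plan, state: to };
}
```

Refusing an apply without a snapshot is what makes every applied plan rollbackable in this sketch.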

Research options

helixevo research --verbose             # Full output
helixevo research --project ./myapp     # Focus research on a project
helixevo research --max-hypotheses 5    # Test more hypotheses
helixevo research --dry-run             # Preview without creating skills

Data

All data is stored in ~/.helix/:

~/.helix/
├── config.json              # Configuration
├── failures.jsonl           # Captured failures
├── activation-traces.jsonl  # Native + derived activation traces
├── pressure-signals.jsonl   # Native + derived adaptation pressure
├── pressure-interventions.jsonl # Routed intervention ledger across response lanes
├── transfer-events.jsonl    # Promotion / transfer evidence across motifs and projects
├── governance-state.json    # Operator steering for active governance mode
├── llm-runtime-state.json   # Default provider, per-provider health, last execution, and fallback truth
├── topology-review-candidates.json # Persisted structural review queue
├── topology-review-decisions.jsonl # Operator accept/reject/defer decision ledger
├── topology-optimize-status.json # Last full/partial optimize refresh status + queue/enrichment summary
├── topology-overrides.json   # Applied safe structural topology overrides
├── topology-snapshots.json   # Snapshot refs for reviewed execution and rollback
├── topology-apply-plans.json # Prepared reviewed topology plans
├── topology-executions.jsonl # Prepared/applied/rolled-back execution ledger
├── topology-artifacts.jsonl  # Evidence artifacts for reviewed structural execution
├── proof-reviews.jsonl      # Operator verify/defer/contest ledger for derived proof records
├── evolution-artifacts.jsonl # Evolution + ontology-review evidence artifacts
├── theory-conformance/
│   ├── latest.json          # Latest contract-backed brain verification result
│   ├── reports/             # Human-readable theory-conformance reports
│   └── runs/                # Per-run scenario artifacts and structured outputs
├── ontology/
│   ├── kernel.json          # Materialized ontology kernel snapshot
│   ├── extensions.json      # Approved ontology extensions
│   ├── frontier.json        # Provisional frontier concepts awaiting review
│   ├── reviews.jsonl        # Ontology review decisions
│   └── change-log.jsonl     # Native ontology change events
├── frontier.json            # Pareto frontier (top-k configurations)
├── evolution-history.json   # All evolution runs + proposals
├── skill-tests.jsonl        # Regression test cases
├── skill-graph.json         # Cached network (nodes + edges + ontology version)
├── canary-registry.json     # Active canary deployments
├── knowledge-buffer.json    # Research discoveries + drafts
├── general/                 # Skills (SKILL.md files)
│   ├── my-skill/SKILL.md
│   └── ...
├── backups/                 # Pre-canary skill backups
└── reports/                 # Generated reports
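Most of the ledgers above are JSON Lines files (one JSON object per line). A minimal sketch of a tolerant reader follows; the `FailureRecord` fields are hypothetical, since the actual record schema of failures.jsonl is not documented here.

```typescript
// Hypothetical record shape; the real failures.jsonl schema may differ.
interface FailureRecord {
  project?: string;
  skill?: string;
  message?: string;
}

// Parse JSON Lines text: one JSON value per line, blank lines ignored.
// Malformed lines are collected instead of aborting the whole read, so a
// partially written trailing line does not hide earlier records.
function parseJsonl<T>(text: string): { records: T[]; badLines: number[] } {
  const records: T[] = [];
  const badLines: number[] = [];
  text.split(/\r?\n/).forEach((line, i) => {
    if (line.trim() === "") return;
    try {
      records.push(JSON.parse(line) as T);
    } catch {
      badLines.push(i + 1); // 1-based line number for error reporting
    }
  });
  return { records, badLines };
}
```

In Node.js the input text would typically come from `readFileSync(join(homedir(), ".helix", "failures.jsonl"), "utf8")`, using `node:fs`, `node:path`, and `node:os`.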

Web Dashboard

The dashboard provides an interactive view of your skill ecosystem:

helixevo dashboard
# Prefers http://localhost:3847 and falls forward if that port is occupied

helixevo dashboard --port 3900
# Prefer port 3900 first
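"Falls forward" presumably means trying successive ports until one is free. A sketch of that selection logic follows, assuming sequential probing and a hypothetical `maxTries` limit. The occupancy check is injected so the logic stays testable; a real check could attempt `net.createServer().listen(port)` from Node's `net` module and treat an `EADDRINUSE` error as occupied.

```typescript
// Pick the first free port starting at `preferred`, trying preferred,
// preferred + 1, ... for up to `maxTries` candidates (sequential probing is assumed).
function pickPort(
  preferred: number,
  isOccupied: (port: number) => boolean,
  maxTries = 20,
): number {
  for (let offset = 0; offset < maxTries; offset++) {
    const port = preferred + offset;
    if (port > 65535) break; // highest valid TCP port
    if (!isOccupied(port)) return port;
  }
  throw new Error(`no free port found from ${preferred} within ${maxTries} tries`);
}
```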

Tabs:

Overview — Premium control cockpit with frontier signals, brain foundation, provider-control truth, semantic backbone, ontology adoption visibility, proof review visibility, pressure counts, topology review visibility, and prepared/applied structural state
Skill Network — Interactive graph, premium inspector, co-evolution routing signals, and topology review/execution handoff links
Co-Evolution — Operator cockpit for routed pressure response, governance mode visibility, promotion queues, transfer evidence, semantic route influence, proof-aware route rationale, and topology handoff
Ontology — Semantic control surface for kernel visibility, frontier concept review, approved ontology extensions, adoption coverage, deprecation risk, and native ontology change events
Topology — Governance steering plus a persistent operator pipeline for review → prepare → apply → rollback across merge / split / promote / rewire / consolidate candidates
Proof — Outcome-attribution, review, and proof-steering cockpit for bounded effectiveness across interventions, transfer, topology execution, semantic adoption, and evolution impact
Projects — Project intake studio, live project analysis, gap routing, per-project pressure hotspots, and promotion feeders
Evolution — Timeline of evolution runs with judge scores, artifact provenance, and activation-aware context
Research — Knowledge buffer plus a live “why research now” handoff from current pressure, governed routing, and recurring gaps
Frontier — Pareto frontier with 4-dimension scores + canary status
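The Frontier tab's Pareto frontier keeps the configurations that no other configuration beats on every dimension. A minimal sketch of non-dominated filtering over four scores follows. The four dimensions are not named in this README, so the score vector is a plain tuple, higher is assumed better on every axis, and the top-k cut used for frontier.json is omitted because its ranking rule is not documented.

```typescript
// Hypothetical 4-dimension score vector; higher is assumed better on every axis.
type Scores = [number, number, number, number];

interface Candidate {
  id: string;
  scores: Scores;
}

// a dominates b if a is at least as good everywhere and strictly better somewhere.
function dominates(a: Scores, b: Scores): boolean {
  let strictlyBetter = false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return false;
    if (a[i] > b[i]) strictlyBetter = true;
  }
  return strictlyBetter;
}

// O(n^2) non-dominated filter; adequate for the small sets a frontier holds.
function paretoFrontier(candidates: Candidate[]): Candidate[] {
  return candidates.filter(
    (c) => !candidates.some((other) => other !== c && dominates(other.scores, c.scores)),
  );
}
```

Ties are kept: two configurations with identical scores do not dominate each other, so both stay on the frontier.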

The dashboard requires Next.js dependencies. On first run:

cd dashboard && npm install

Craft Agent Integration

HelixEvo includes a Craft Agent skill at integrations/craft-agent/:

# Copy to your skills directory
cp -r integrations/craft-agent/skills/skill-evolver ~/.agents/skills/

Then use [skill:skill-evolver] in Craft Agent to trigger evolution.

Architecture

Failures → Cluster → Propose → Replay → Multi-Judge → Regression → Canary → Frontier
              │                              │
              │                     3 independent judges:
              │                     - Task Completion
              │                     - Correction Alignment
              │                     - Side-Effect Check
              │
         Knowledge Buffer
         (discoveries + drafts from rejected proposals)
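The diagram reads as a sequence of gates in which the first rejection stops a proposal and its draft is kept in the knowledge buffer instead of being discarded. A minimal sketch of that control flow follows; the `Stage` signature and the `BufferEntry` shape are hypothetical, and the real stages (replay, judging, regression, canary) would be asynchronous LLM and test calls rather than synchronous predicates.

```typescript
// Hypothetical proposal and stage shapes for illustrating the gate sequence.
interface Proposal {
  id: string;
  draft: string; // proposed SKILL.md content
}

interface Stage {
  name: string;
  accept: (p: Proposal) => boolean;
}

interface BufferEntry {
  proposalId: string;
  rejectedAt: string; // name of the stage that rejected the proposal
  draft: string;
}

// Run a proposal through the stages in order. The first failing stage stops
// the run, and the rejected draft is appended to the knowledge buffer.
function runPipeline(
  p: Proposal,
  stages: Stage[],
  knowledgeBuffer: BufferEntry[],
): "accepted" | "rejected" {
  for (const stage of stages) {
    if (!stage.accept(p)) {
      knowledgeBuffer.push({ proposalId: p.id, rejectedAt: stage.name, draft: p.draft });
      return "rejected";
    }
  }
  return "accepted";
}
```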

Brain foundation:

Ontology defines the stable semantic kernel for skills, projects, tasks, capabilities, artifacts, and mutations.
Ontology frontier and extensions let new semantic concepts emerge as provisional hypotheses, pass explicit review, become approved extensions, and then appear as active semantic consumers in pressure, routing, transfer, and structural interpretation without free-form drift.
Semantic adoption visibility shows which approved concepts are unused, active, deprecation-sensitive, or currently influencing live route rationale.
Activation traces record which skills and gaps were active during capture and project analysis.
Pressure signals turn failures and project gaps into explicit adaptation demand.
Pressure interventions record how HelixEvo responded across research, specialize, evolve, generalize, and manual-review lanes.
Governed routing and transfer evidence let recurring multi-project motifs bias toward promotion and show when reusable knowledge was actually realized.
Governance steering lets the operator pin or release the active adaptation mode rather than relying only on derived routing.
Topology review persists merge / split / promote / rewire / consolidate candidates so manual review is a real workflow.
Reviewed topology execution turns accepted safe candidates into prepared plans, snapshot-backed applies, and rollbackable structural transitions.
Proof control turns bounded outcome attribution into an explicit operator layer where interventions, transfer, topology execution, semantic adoption, and evolution impact can be verified, deferred, or contested, then fed back into future control through bounded proof steering.
Evolution artifacts preserve proposal-level evidence so the dashboard can show what changed, why, and with what provenance.
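Governed routing and governance steering combine into one rule: an operator-pinned mode overrides the derived route. A minimal sketch follows, using the five response lanes named above. Apart from the multi-project bias toward promotion, the `PressureSignal` fields and the derivation heuristics are hypothetical placeholders, not HelixEvo's routing rules.

```typescript
// Response lanes named in the README.
type Lane = "research" | "specialize" | "evolve" | "generalize" | "manual-review";

// Hypothetical pressure signal fields.
interface PressureSignal {
  kind: "failure" | "gap";
  projects: number; // distinct projects where the motif recurs
  hasMatchingSkill: boolean;
}

interface GovernanceState {
  pinnedLane?: Lane; // set when the operator pins a mode; undefined once released
}

// Placeholder heuristics for illustration only.
function deriveLane(s: PressureSignal): Lane {
  if (s.projects >= 2) return "generalize"; // recurring multi-project motif biases toward promotion
  if (s.kind === "gap") return s.hasMatchingSkill ? "specialize" : "research";
  return s.hasMatchingSkill ? "evolve" : "manual-review";
}

// An operator pin wins; otherwise fall back to the derived route.
function routeSignal(s: PressureSignal, gov: GovernanceState): Lane {
  return gov.pinnedLane ?? deriveLane(s);
}
```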

Three-layer hierarchy:

System — Global agent behaviors
Domain — Cross-project patterns (generalized skills)
Project — Project-specific specializations
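The three layers line up with the evolution directions from How it works: Generalize promotes cross-project patterns into Domain skills, and Specialize derives Project skills from Domain skills. A minimal sketch of that mapping follows; leaving System-layer moves unmodeled and keeping Lateral operations within a skill's current layer are assumptions.

```typescript
type Layer = "system" | "domain" | "project";
type Direction = "generalize" | "specialize" | "lateral";

// Target layer for a skill change, or null if the move is not modeled.
// Assumption: generalize = project -> domain, specialize = domain -> project,
// lateral (merge / split / conflict resolution) keeps the skill in its layer.
function targetLayer(from: Layer, direction: Direction): Layer | null {
  switch (direction) {
    case "generalize":
      return from === "project" ? "domain" : null;
    case "specialize":
      return from === "domain" ? "project" : null;
    case "lateral":
      return from;
  }
}
```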

License

MIT