homunculus-code
v0.11.0
A self-evolving AI assistant that grows smarter every night
Language: English | 繁體中文 | 简体中文
Homunculus for Claude Code
Your AI assistant gets better every night — without you lifting a finger.
Stop tuning your AI. Let it tune itself. You spend hours tweaking prompts, writing rules, researching new features, configuring tools. Homunculus flips this: define your goals, and the system evolves itself — skills, agents, hooks, scripts, everything — while you focus on actual work.
npx homunculus-code init
One command. Define your goals. Your assistant starts evolving.
Proof it works: One developer ran this system for 5 weeks. It auto-generated 229 behavioral patterns, routed them into 10 tested skills, created 4 specialized agents, 18 commands, and 37 automation scripts. The nightly agent made 1,500+ autonomous commits — improving the system while the developer slept. See results →
Why
Every Claude Code power user hits the same wall:
Week 1: "Claude is amazing!"
Week 2: "Let me add some rules and hooks..."
Week 3: "I need skills, agents, MCP servers, custom commands..."
Week 4: "I spend more time configuring Claude than using it."
Sound familiar? Even OpenClaw — with 300K+ stars and self-extending capabilities — still needs you to decide what to improve, when to improve it, and whether the improvement worked. Edit SOUL.md, tweak AGENTS.md, break something, open another AI session to fix it. The AI can extend itself, but it can't set its own direction or validate its own quality.
Here's the difference:
| Without Homunculus | With Homunculus |
|--------------------|-----------------|
| You notice tests keep failing | Goal tree detects test_pass_rate dropped |
| → You research testing patterns | → Nightly agent auto-evolves tdd skill |
| → You write a pre-commit skill | → Eval confirms 100% pass |
| → You test it manually | → Morning report: "Fixed. Here's what changed." |
| → It breaks after a Claude update | → Next update? System re-evaluates and adapts |
| → You fix it again... | → You slept through all of this |
The Core Idea: Goal Tree
Most AI tools optimize locally — "you did X, so I'll remember X." Homunculus optimizes globally — toward goals you define in a tree:
🎯 my-project
├── code_quality — Ship fewer bugs
│ ├── testing — Maintain test coverage
│ │ └── realized_by: jest.config.js, tests/ (42 tests) ✓
│ └── linting — Consistent code style
│ └── realized_by: .eslintrc.js, .prettierrc ✓
├── speed — Move faster
│ └── deploy_automation — One-command deploys
│ └── realized_by: # will evolve ○
└── knowledge — Learn and remember
├── research — Discover better approaches
│ └── realized_by: nightly agent ✓
└── memory — Remember what matters
└── realized_by: homunculus/instincts/ ✓
Each node defines why it exists, how to measure it, and what currently implements it. Goals are stable. Implementations evolve. The system automatically routes each behavior to its optimal mechanism — implementations get replaced and upgraded while goals stay intact.
# architecture.yaml — from the reference system
autonomous_action:
  purpose: "Act without waiting for human commands"
  goals:
    nightly_research:
      purpose: "Discover better approaches while developer sleeps"
      realized_by: heartbeat/heartbeat.sh # a shell script
      health_check:
        command: "test $(find logs/ -mtime -1 | wc -l) -gt 0"
        expected: "nightly agent ran in last 24h"
    task_management:
      purpose: "Track and complete tasks autonomously"
      realized_by: quest-board/server.js # a web app
      metrics:
        - name: forge_completion_rate
          healthy: "> 70%"
continuous_evolution:
  purpose: "Improve over time without human intervention"
  goals:
    pattern_extraction:
      purpose: "Learn from every session"
      realized_by: homunculus/scripts/evaluate-session.js
    skill_aggregation:
      purpose: "Converge patterns into tested skills"
      realized_by: homunculus/evolved/skills/
The realized_by field can point to anything:
| Type | Example | When to use |
|------|---------|-------------|
| Skill | skills/tdd-workflow.md | Behavioral knowledge |
| Agent | agents/code-reviewer.md | Specialized AI subagent |
| Hook | hooks/pre-commit.sh | Automated trigger |
| Script | scripts/deploy.sh | Shell automation |
| Cron | launchd/nightly-check.plist | Scheduled task |
| MCP | mcp-servers/github | External integration |
| Rule | rules/security.md | Claude Code behavioral rule |
| Command | commands/quality-gate.md | Slash command workflow |
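To make the idea of stable goals with replaceable implementations concrete, here is a minimal sketch of a goal tree held in memory and a walk that lists goals with nothing attached yet. The object shape mirrors the YAML above, but the `findUnrealized` helper and the sample tree are illustrative assumptions, not code shipped by Homunculus.

```javascript
// Hypothetical in-memory form of a goal tree; field names mirror architecture.yaml.
const goalTree = {
  code_quality: {
    purpose: "Ship fewer bugs",
    goals: {
      testing: { purpose: "Maintain test coverage", realized_by: "tests/" },
      linting: { purpose: "Consistent code style", realized_by: ".eslintrc.js" },
    },
  },
  speed: {
    purpose: "Move faster",
    goals: {
      deploy_automation: { purpose: "One-command deploys" }, // no realized_by yet
    },
  },
};

// Walk the tree depth-first and collect dotted paths of leaf goals
// that have no implementation attached.
function findUnrealized(goals, prefix = "") {
  const missing = [];
  for (const [name, node] of Object.entries(goals)) {
    const path = prefix ? `${prefix}.${name}` : name;
    if (node.goals) {
      missing.push(...findUnrealized(node.goals, path));
    } else if (!node.realized_by) {
      missing.push(path);
    }
  }
  return missing;
}

console.log(findUnrealized(goalTree)); // [ 'speed.deploy_automation' ]
```

Goals returned by a walk like this correspond to the ○ markers in the tree view: they exist as intent, and the evolution cycle is free to pick whichever mechanism from the table fits.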
How It Evolves
You use Claude Code normally
│
┌──────────┼──────────┐
│ │ │
Observe Health Check Research
(hooks) (goal tree) (nightly)
│ │ │
└──────────┼──────────┘
│
▼
┌────────────┐
│ Evolve │ Goals stay the same.
│ ──────── │ Implementations get better.
│ Extract │
│ Converge │
│ Eval │
│ Replace │
└────────────┘
Three inputs, one engine:
- Observe — hooks watch your tool usage, extract recurring patterns into "instincts"
- Health check — the goal tree identifies which goals are unhealthy → focus there
- Research — a nightly agent scans for better approaches and proposes experiments
The evolution engine then:
- Extracts behavioral patterns → instincts (tagged with best mechanism + goal path)
- Routes each instinct to its optimal mechanism → hook, rule, skill, script, agent, or system
- For skills: eval → improve loop until 100% pass rate
- Reviews all goals nightly → is the current mechanism still the best one?
- Archives instincts once implemented → the mechanism is the source of truth now
What's New
v0.11.0 — Cross-Platform Evolution Support (Apr 2026)
- Cursor support — `init` now detects `.cursor/` and installs an `alwaysApply: true` rule (`evolution-system.mdc`) so the agent always has goal awareness, plus a skill file in `.cursor/rules/` for operation routing. Stop hook triggers instinct extraction at session end
- Codex CLI support — `init` injects a Homunculus section into `AGENTS.md` (equivalent to `CLAUDE.md`) with a routing table, and copies a skill file to `.codex/`. Stop hook wired for session-end extraction
- Platform-agnostic skills — New `skills/{claude,cursor,codex,generic}/` directory. Each platform gets the right format; Claude Code users retain fast `/hm-*` slash commands
- `homunculus goal` — New CLI subcommand: shows your `architecture.yaml` goal tree summary or guides first-time setup
- `homunculus status` — New CLI subcommand: evolution dashboard (instincts, skills, agents, evals, last report) — works from any terminal without opening your AI editor
v0.10.1 — Hook Auth Fallback (Apr 2026)
- Queue fallback for hook context — `evaluate-session.js` now gracefully handles the case where `claude --print` fails from a Claude Code hook subprocess (Keychain/OAuth not accessible). Failed extractions are queued to `homunculus/reports/extraction-queue.jsonl` and automatically processed on the next run outside hook context (e.g. manual call or `/hm-night`)
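The queue fallback can be pictured as an append-only JSONL file that is drained on a later run. The sketch below is an assumption about how such a queue might work; the `enqueue` and `drain` helpers and the job fields are hypothetical, and the demo writes to a temporary directory instead of `homunculus/reports/`.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Append a failed extraction as one JSON object per line (JSONL).
function enqueue(queueFile, job) {
  fs.appendFileSync(queueFile, JSON.stringify(job) + "\n");
}

// Retry every queued job; rewrite the file with only the jobs that still fail.
// Returns the number of jobs that were processed successfully.
function drain(queueFile, extract) {
  if (!fs.existsSync(queueFile)) return 0;
  const jobs = fs
    .readFileSync(queueFile, "utf8")
    .split("\n")
    .filter(Boolean)
    .map((line) => JSON.parse(line));
  const stillFailing = jobs.filter((job) => {
    try {
      extract(job);
      return false;
    } catch {
      return true;
    }
  });
  fs.writeFileSync(queueFile, stillFailing.map((j) => JSON.stringify(j) + "\n").join(""));
  return jobs.length - stillFailing.length;
}

// Demo in a temporary directory; the session id is a placeholder.
const demoQueue = path.join(fs.mkdtempSync(path.join(os.tmpdir(), "hm-queue-")), "extraction-queue.jsonl");
enqueue(demoQueue, { session: "example-session" });
console.log(drain(demoQueue, () => {})); // 1
```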
v0.10.0 — Durability Gate & Multi-Run Eval (Apr 2026)
- Durability Gate — Instincts with `durability_score < 0.7` are filtered out; event-specific patterns no longer pollute the system
- Gaming Gate — Score jumps >5pp with ≤3 net new lines are flagged as `gaming_suspected` and discarded
- Multi-run eval — `--runs N` and `--passes N` flags for multiple independent eval runs and majority vote per scenario, reducing LLM-judge variance
- CLAUDE.md coverage check — Already-implemented rules are injected into the extraction prompt to avoid re-extracting existing behaviors
- Bash failure circuit breaker — `observe.sh` tracks last 10 failures for evolution analysis
- Simplified phase pipeline docs — P0/P1A/B/C internal naming removed from public docs; cleaner mental model
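The three gates in this release reduce to small predicates. The following is a sketch under stated assumptions: the thresholds (0.7, 5pp, 3 lines) come from the notes above, while the function names, the input shapes, and the reading of `--passes N` as "required passing runs" are guesses made for illustration.

```javascript
// Durability Gate: keep only instincts at or above the 0.7 threshold.
function passesDurabilityGate(instinct, threshold = 0.7) {
  return instinct.durability_score >= threshold;
}

// Gaming Gate: a jump of more than 5 percentage points achieved with
// 3 or fewer net new lines is treated as suspicious.
function isGamingSuspected({ scoreBefore, scoreAfter, netNewLines }) {
  return scoreAfter - scoreBefore > 5 && netNewLines <= 3;
}

// Multi-run eval: a scenario passes when enough independent runs pass.
// By default "enough" is a strict majority (assumption).
function scenarioPasses(runResults, requiredPasses = Math.floor(runResults.length / 2) + 1) {
  return runResults.filter(Boolean).length >= requiredPasses;
}

console.log(passesDurabilityGate({ durability_score: 0.65 })); // false
console.log(isGamingSuspected({ scoreBefore: 80, scoreAfter: 90, netNewLines: 2 })); // true
console.log(scenarioPasses([true, false, true])); // true
```

Voting over several runs trades extra eval cost for lower variance: a single flaky judge verdict can no longer flip a scenario on its own.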
v0.9.0 — Evolution Engine Upgrade (Mar 2026)
- Smart observation — `observe.sh` now filters noise (skips Read/Glob/Grep, only records post-phase for writes) and tracks reference frequency — which instincts and skills are actually being read
- Three-layer extraction — `evaluate-session.js` now extracts instincts + memory suggestions + research topics in a single pass. Memory and research are written to `homunculus/reports/` for you to review (non-invasive)
- Semantic dedup — New instincts can declare `supersedes` to auto-archive older instincts they replace
- Write Gate — Extraction prompt now includes quality criteria (must change future behavior, capture a commitment, or preserve a decision rationale)
- Dynamic daily cap — Instinct extraction limit scales with session size (1 for short sessions, up to 5 for long ones)
- Smarter pruning — `prune-instincts.js` now uses reference frequency scoring (+25 for frequently used, -15 for never referenced), 3-tier skill coverage detection (direct evolved / high overlap / partial), 14-day grace period before confidence decay, and at-risk warnings
- Idempotency — Extraction tracks scan state to avoid reprocessing the same observations
- `--json` output — `prune-instincts.js` supports `--json` for programmatic use
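Two of the items above, `supersedes` and the dynamic daily cap, lend themselves to a short sketch. Everything below is illustrative: the instinct shape, the helper names, and especially the scaling rule (one instinct per 50 observations) are assumptions; only the 1 to 5 range comes from the release notes.

```javascript
// Semantic dedup: anything named in an incoming instinct's `supersedes`
// list is moved to the archive; the rest stay active alongside the new ones.
function applySupersedes(existing, incoming) {
  const replaced = new Set(incoming.flatMap((i) => i.supersedes ?? []));
  return {
    active: [...existing.filter((i) => !replaced.has(i.id)), ...incoming],
    archived: existing.filter((i) => replaced.has(i.id)),
  };
}

// Dynamic daily cap: scale with session size, clamped to the 1..5 range.
// The divisor of 50 observations is a made-up scaling factor.
function dailyCap(observationCount) {
  return Math.min(5, Math.max(1, Math.ceil(observationCount / 50)));
}

const result = applySupersedes(
  [{ id: "run-tests-manually" }, { id: "prefer-small-commits" }],
  [{ id: "run-tests-in-hook", supersedes: ["run-tests-manually"] }]
);
console.log(result.active.map((i) => i.id)); // [ 'prefer-small-commits', 'run-tests-in-hook' ]
console.log(dailyCap(10), dailyCap(120), dailyCap(1000)); // 1 3 5
```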
v0.8.0 — Upgrade Mechanism (Mar 2026)
- `homunculus-code upgrade` — New command to update managed files when you upgrade the npm package. Core scripts (`observe.sh`, etc.) are auto-replaced with `.bak` backup; commands and rules are only replaced if you haven't customized them — otherwise a `.new` file is written for manual merge
- Install manifest — `init` now generates `homunculus/.manifest.json` tracking SHA256 hashes of all managed files, enabling safe diff-aware upgrades
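A hash manifest makes the upgrade decision mechanical: if a file still hashes to what was installed, it was never customized and can be replaced. The sketch below illustrates that rule with Node's built-in `crypto` module; the `planUpgrade` helper, its argument names, and its return labels are hypothetical and say nothing about the real manifest schema.

```javascript
const crypto = require("node:crypto");

// Hex-encoded SHA256 of a file's content.
function sha256(content) {
  return crypto.createHash("sha256").update(content).digest("hex");
}

// Decide what to do with one managed file during an upgrade.
// - core scripts: always replaced, previous copy kept as .bak
// - unmodified commands/rules: replaced
// - customized commands/rules: left alone, new version written as .new
function planUpgrade({ installedHash, currentContent, isCoreScript }) {
  if (isCoreScript) return "replace-with-bak";
  return sha256(currentContent) === installedHash ? "replace" : "write-new";
}

const original = "# /hm-status\nShow the evolution dashboard.\n";
const installedHash = sha256(original);

console.log(planUpgrade({ installedHash, currentContent: original, isCoreScript: false })); // replace
console.log(planUpgrade({ installedHash, currentContent: original + "my tweak\n", isCoreScript: false })); // write-new
```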
v0.7.0 — Non-Invasive Install & Proactivity Design Guide (Mar 2026)
- Non-invasive install — Scripts now live in `homunculus/scripts/` instead of the project root `scripts/`, avoiding collisions with existing project tooling
- Safe command install — Slash commands are installed file-by-file; existing commands are preserved with a warning instead of being overwritten
- Hook merging — If `PostToolUse` hooks already exist in `.claude/settings.json`, the observation hook is merged into the array rather than skipping setup entirely
- Proactivity design guide — New docs/proactivity.md covering the three core concepts behind proactive AI assistants: Memory Flush, Research Queue, and Periodic Heartbeat
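Hook merging amounts to appending to an existing array instead of overwriting it. Here is a minimal sketch, assuming the Claude Code settings layout in which `hooks.PostToolUse` is an array of `{ matcher, hooks: [{ type, command }] }` entries; the `mergeObservationHook` helper, the `"*"` matcher, and the sample `lint.sh` hook are illustrative choices, not what `init` necessarily writes.

```javascript
// Return a copy of `settings` with the observation hook added to PostToolUse.
// Existing hooks are kept, and adding the same command twice is a no-op.
function mergeObservationHook(settings, command) {
  const merged = structuredClone(settings);
  merged.hooks ??= {};
  merged.hooks.PostToolUse ??= [];
  const alreadyInstalled = merged.hooks.PostToolUse.some((entry) =>
    (entry.hooks ?? []).some((hook) => hook.command === command)
  );
  if (!alreadyInstalled) {
    merged.hooks.PostToolUse.push({
      matcher: "*", // assumption: observe every tool
      hooks: [{ type: "command", command }],
    });
  }
  return merged;
}

const existingSettings = {
  hooks: {
    PostToolUse: [
      { matcher: "Write", hooks: [{ type: "command", command: "scripts/lint.sh" }] },
    ],
  },
};

const updated = mergeObservationHook(existingSettings, "homunculus/scripts/observe.sh");
console.log(updated.hooks.PostToolUse.length); // 2
```

Working on a clone keeps the merge side-effect free, so the caller can diff old and new settings before writing anything to disk.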
v0.6.3 — Project-Aware Goals & Evolution Tiers (Mar 2026)
- Project scanning — `/hm-goal` now detects your tech stack, tests, CI, and directory structure before asking questions. Goal trees start pre-filled with what you already have
- Universal mechanism routing — `/evolve` routes instincts to the best mechanism (hook, rule, skill, script, agent), not just skills
- Evolution intensity tiers — Configure `minimal` / `standard` / `full` in `evolution-config.yaml` to control depth, frequency, and cost
- Nightly agent setup — First `/hm-night` run properly offers to configure automatic overnight evolution
v0.5.0 — Initial Release (Mar 2026)
- Goal tree — Define what you want in `architecture.yaml`
- Observation hook — Automatically captures behavioral patterns from your sessions
- Nightly evolution — `/hm-night` runs health checks, extracts instincts, evolves skills, and generates reports
- 6 slash commands — `/hm-goal`, `/hm-night`, `/hm-status`, `/evolve`, `/eval-skill`, `/improve-skill`
Quick Start
1. Install
Run in your project directory (where your CLAUDE.md, .claude/, .cursor/, or codex.json lives):
npx homunculus-code init
init auto-detects your AI host (Claude Code, Cursor, or Codex CLI) and installs the right configuration:
| Host | Rule format | Skill format | Stop hook |
|------|------------|--------------|-----------|
| Claude Code | .claude/rules/evolution-system.md | .claude/commands/homunculus.md | SessionEnd → instinct extraction |
| Cursor | .cursor/rules/evolution-system.mdc (alwaysApply: true) | .cursor/rules/homunculus.mdc | stop event → instinct extraction |
| Codex CLI | Injected into AGENTS.md | .codex/homunculus-skill.md | stop event → instinct extraction |
Homunculus — Self-evolving AI Assistant
✓ Created homunculus/ directory structure
✓ Added evolution rules (platform: claude-code)
✓ Copied evolution scripts
✓ Added slash commands (/hm-goal, /hm-night, /hm-status)
✓ Configured observation hook
Done! Homunculus is installed.
Next steps:
1. Open your AI editor
2. Type /hm-goal (or "define goals") to define your goals (AI-guided)
3. Type /hm-night (or "run evolution") to run your first evolution cycle
2. Define Your Goals
Open Claude Code and type /hm-goal. Claude scans your project first, then asks targeted questions based on what it finds:
> /hm-goal
Claude: I see a Next.js + TypeScript project with 42 Jest tests
and GitHub Actions CI. What's the main purpose of this project?
You: A SaaS dashboard for analytics
Claude: You have 42 tests already. Is test coverage a pain point,
or is something else slowing you down more?
You: Deployment is manual and error-prone
Claude: If your AI could fix one thing overnight, what would it be?
You: Automated deployment with proper checks
Claude: Here's your goal tree:
🎯 analytics-dashboard
├── code_quality — Ship fewer bugs
│ ├── testing — Maintain test coverage
│ │ └── realized_by: jest.config.js, tests/ (42 tests) ✓
│ └── linting — Consistent code style
│ └── realized_by: .eslintrc.js, .prettierrc ✓
├── deployment — Reliable releases
│ └── ci_pipeline — Automated checks on every PR
│ └── realized_by: .github/workflows/ ✓
└── productivity — Move faster
└── deploy_automation — One-command deploys
└── realized_by: # will evolve ○
4 goals: 3 already have implementations (✓), 1 will evolve (○)
Does this look right?
You: Great!
Claude: ✅ architecture.yaml created with 4 goals!
For new projects with no existing files, Claude falls back to general questions about your project type and goals.
3. Run Your First Evolution Cycle
> /hm-night
Watch Claude check your goals, route instincts to the right mechanisms (hook/rule/skill/script/agent), evaluate and improve implementations, and generate a report — all in one command. This is what the nightly agent does autonomously while you sleep.
4. Upgrade
When a new version is published, update your managed files:
npx homunculus-code@latest upgrade
- Core scripts → auto-replaced (backup saved as `.bak`)
- Commands/rules you haven't modified → auto-replaced
- Commands/rules you customized → skipped, new version saved as `.new` for you to merge
5. Keep Using Your AI Editor
The observation hook watches your usage automatically. As patterns emerge, instincts are extracted and routed to the right mechanism:
/hm-goal Define or refine your goals (Claude Code slash command)
/hm-night Run a full evolution cycle (Claude Code slash command)
/hm-status Check evolution progress (Claude Code slash command)
"define goals" Natural language trigger (all platforms)
"run evolution" Natural language trigger (all platforms)
"show status" Natural language trigger (all platforms)
homunculus goal CLI: view architecture.yaml goal summary (any terminal)
homunculus status CLI: evolution dashboard (any terminal)
homunculus night CLI: run evolution cycle via LLM provider (any terminal)
/hm-night (or homunculus night) performs the complete evolution pipeline: routes instincts to the best mechanism (hook/rule/skill/script/agent), runs eval + improve on skills, reviews goal health, and generates a report. You can run it manually anytime, but the real power is letting it run autonomously every night.
The first time you run `/hm-night`, it will ask if you want to set up the nightly agent for automatic overnight evolution.
Key Concepts
Goal Tree (architecture.yaml)
The central nervous system. Every goal has a purpose, metrics, and health checks. The evolution system reads this to decide what to improve and how to measure success.
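Since a health check in `architecture.yaml` is a shell command whose exit code signals health, evaluating one can be sketched in a few lines. This is an illustration only: `runHealthCheck` is a hypothetical helper, the two sample commands are trivial placeholders, and the real system may interpret results differently.

```javascript
const { spawnSync } = require("node:child_process");

// Run one health_check command through the shell; exit code 0 counts as healthy.
function runHealthCheck(check) {
  const result = spawnSync("sh", ["-c", check.command], { encoding: "utf8" });
  return { expected: check.expected, healthy: result.status === 0 };
}

// Placeholder checks keyed by goal name (not commands shipped by Homunculus).
const checks = {
  always_ok: { command: "test 1 -gt 0", expected: "trivially true" },
  always_bad: { command: "test 0 -gt 1", expected: "trivially false" },
};

for (const [goal, check] of Object.entries(checks)) {
  const { healthy } = runHealthCheck(check);
  console.log(`${goal}: ${healthy ? "healthy" : "needs attention"}`);
}
```

Goals that come back unhealthy are the ones the evolution cycle would prioritize, which is how the tree steers what gets improved.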
Instincts
Small behavioral patterns auto-extracted from your usage. Each instinct is tagged with:
- Confidence score — grows with reinforcement, decays over time (half-life: 90 days)
- Suggested mechanism — which implementation type fits best (hook/rule/skill/script/...)
- Goal path — which goal in the tree this serves
Think of instincts as raw material. They get routed to the right mechanism and archived once implemented.
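A 90-day half-life means an instinct that is never reinforced loses half its confidence every 90 days. A minimal sketch of that decay follows; the function name and the exponential form are assumptions about how the half-life is applied.

```javascript
const HALF_LIFE_DAYS = 90;

// Exponential decay: confidence halves every HALF_LIFE_DAYS without reinforcement.
function decayedConfidence(confidence, daysSinceReinforced) {
  return confidence * Math.pow(0.5, daysSinceReinforced / HALF_LIFE_DAYS);
}

console.log(decayedConfidence(0.8, 0));   // 0.8
console.log(decayedConfidence(0.8, 90));  // 0.4
console.log(decayedConfidence(0.8, 180)); // 0.2
```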
Implementation Routing
The system chooses the best mechanism for each behavior:
| Behavior type | Mechanism |
|--------------|-----------|
| Deterministic, every time | Hook (zero AI judgment) |
| Tied to specific files/dirs | Rule (path-scoped) |
| Reusable knowledge collection | Skill (with eval spec) |
| Periodic automation | Script + scheduler |
| External service connection | MCP |
| Needs isolated context | Agent |
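The routing table can be read as a decision function. The sketch below is one possible encoding: the boolean trait names and the order in which traits are checked are assumptions, and in the real system an LLM makes this judgment.

```javascript
// Map a described behavior to a mechanism, following the routing table.
// The check order (most deterministic first) is an illustrative choice.
function routeMechanism(behavior) {
  if (behavior.deterministic) return "hook";
  if (behavior.pathScoped) return "rule";
  if (behavior.externalService) return "mcp";
  if (behavior.periodic) return "script";
  if (behavior.needsIsolatedContext) return "agent";
  return "skill"; // reusable knowledge is the fallback
}

console.log(routeMechanism({ deterministic: true })); // hook
console.log(routeMechanism({ periodic: true }));      // script
console.log(routeMechanism({}));                      // skill
```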
Skills & Eval Specs
When multiple instincts cover the same area and routing says "skill," they converge into a tested, versioned knowledge module. Every skill has an eval spec with scenario tests. Skills that fail eval get automatically improved.
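The eval and improve cycle is a bounded loop: evaluate, stop at a 100% pass rate, otherwise revise and try again. The sketch below uses toy stand-ins (keyword matching for the eval, string appending for the improvement) where the real system would call an LLM; `evolveSkill`, `maxRounds`, and the skill shape are hypothetical.

```javascript
// Repeat eval -> improve until every scenario passes or the round budget runs out.
function evolveSkill(skill, evaluate, improve, maxRounds = 5) {
  let current = skill;
  for (let round = 0; round < maxRounds; round++) {
    const { passRate, failures } = evaluate(current);
    if (passRate === 1) return { skill: current, rounds: round, passed: true };
    current = improve(current, failures);
  }
  return { skill: current, rounds: maxRounds, passed: evaluate(current).passRate === 1 };
}

// Toy eval spec: each scenario is a phrase the skill text must contain.
const scenarios = ["run tests", "write failing test first"];
const evaluate = (skill) => {
  const failures = scenarios.filter((s) => !skill.text.includes(s));
  return { passRate: (scenarios.length - failures.length) / scenarios.length, failures };
};
// Toy improvement: address the first failing scenario and bump the version.
const improve = (skill, failures) => ({
  version: skill.version + 1,
  text: `${skill.text}\n- ${failures[0]}`,
});

const outcome = evolveSkill({ version: 1, text: "# tdd-workflow" }, evaluate, improve);
console.log(outcome.passed, outcome.rounds, outcome.skill.version); // true 2 3
```

Capping the rounds keeps a skill that cannot reach 100% from consuming the whole nightly budget.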
Replaceable Implementations
The core principle: same goal, evolving implementation.
Goal: "Catch bugs before merge"
v1: instinct → "remember to run tests"
v2: rule → .claude/rules/testing.md (path-scoped guidance)
v3: skill → tdd-workflow.md (with eval spec)
v4: hook → pre-commit.sh (deterministic, automated)
The system reviews these nightly — if a skill should be a hook, it suggests the upgrade.
Nightly Agent
This is what makes the system truly autonomous. The nightly agent runs /hm-night automatically while you sleep — routing instincts to the right mechanism, evaluating skills, reviewing goal health, and researching better approaches.
Setup: The first time you run /hm-night, it asks if you want to enable automatic nightly runs. Say yes, and it configures a scheduler (launchd on macOS, cron on Linux) to run every night.
You can also run /hm-night manually anytime to trigger a cycle on demand.
Evolution Tiers
Control how deeply your assistant evolves each night via evolution-config.yaml (created during init):
| | Minimal | Standard | Full |
|---|---------|----------|------|
| Instinct harvest + routing | ✅ | ✅ | ✅ |
| Skill eval (changed only) | ✅ | ✅ | ✅ |
| Research | — | 2 topics | 3-5 topics |
| Experiments | — | 1/night | 3/night |
| Bonus loop | — | — | Optional |
| Est. cost/night | ~$0.5 | ~$2-3 | ~$5-10 |
Weekly deep mode (configurable day) adds: full skill re-eval, goal tree mechanism review, deep health check.
# Change tier anytime
# Edit evolution-config.yaml → tier: minimal | standard | full
Subscription users (Max/Team) can run full at no extra API cost.
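As a rough picture of how a tier setting could translate into a nightly plan, the sketch below encodes the tier table as data and reads the `tier:` key from the config text with a regular expression. The `TIERS` object and `readTier` helper are hypothetical, and the real `evolution-config.yaml` likely contains more keys than the single one assumed here.

```javascript
// Tier table encoded as data; numbers are taken from the table above.
const TIERS = {
  minimal: { researchTopics: { min: 0, max: 0 }, experimentsPerNight: 0, bonusLoop: false },
  standard: { researchTopics: { min: 2, max: 2 }, experimentsPerNight: 1, bonusLoop: false },
  full: { researchTopics: { min: 3, max: 5 }, experimentsPerNight: 3, bonusLoop: true }, // bonus loop is optional
};

// Extract the `tier:` value from the config text without a YAML parser.
function readTier(configText) {
  const match = configText.match(/^tier:\s*(\w+)/m);
  if (!match) throw new Error("evolution-config.yaml has no tier setting");
  const tier = match[1];
  if (!Object.hasOwn(TIERS, tier)) throw new Error(`Unknown tier: ${tier}`);
  return { tier, ...TIERS[tier] };
}

const plan = readTier("# evolution-config.yaml\ntier: standard\n");
console.log(plan.tier, plan.experimentsPerNight); // standard 1
```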
You go to sleep
│
▼
┌─────────────────────────────────────────────┐
│ Nightly Agent (phase pipeline) │
│ │
│ 1. Health check (goal status) │
│ │
│ 2. Evolution cycle │
│ Route instincts → 8 mechanisms │
│ Eval + improve implementations │
│ Review: best mechanism per goal? │
│ │
│ 3. Research (cross-night dedup) │
│ │
│ 4. Act (experiments + quick fixes) │
│ │
│ 5. Report + sync │
└─────────────────────────────────────────────┘
│
▼
You wake up to a smarter assistant + a report
Here's what a real morning report looks like:
## Morning Report — 2026-03-22
### What Changed Overnight
- Improved skill: claude-code-reference v4.6 → v4.8
(added coverage for CC v2.1.77-80 features)
- Archived 3 outdated instincts (covered by evolved skills)
- New experiment passed: eval noise threshold set to 5pp
### Goal Health
- continuous_evolution: ✅ healthy (10 skills, all 100% eval)
- code_quality: ✅ healthy (144/144 tests passing)
- resource_awareness: ⚠️ attention (context usage trending up)
→ Queued experiment: split large skill into chapters
### Research Findings
- Claude Code v2.1.81: new --bare flag could speed up headless mode
→ Experiment queued for tomorrow night
- New pattern detected in community: writer/reviewer agent separation
→ Instinct created, will converge if reinforced
### Suggested Actions (for you)
- Review 2 knowledge card candidates from overnight research
- Approve experiment: context reduction via skill splitting
In our reference system, the nightly agent produced 1,500+ autonomous commits — routing instincts to the right mechanisms, evolving skills, running experiments, researching better approaches, and archiving outdated patterns. All without any human input.
The nightly agent is what turns Homunculus from "a tool you use" into "a system that grows on its own."
See docs/nightly-agent.md for setup.
Real-World Results
Built and tested on a real personal AI assistant. In 5 weeks (starting from zero):
| What evolved | Count | Details |
|-------------|-------|---------|
| Goal tree | 10 goals / 46+ sub-goals | Each with health checks and metrics |
| Instincts | 229 | 1 active + 228 auto-archived (mature system: most patterns absorbed into skills) |
| Skills | 10 | All 100% eval pass rate |
| Experiments | 20 | Structured A/B tests with pass/fail tracking |
| Subagents | 4 | Auto-extracted from repetitive main-thread patterns |
| Scheduled agents | 4 | Nightly heartbeat, Discord bridge, daily news, trading |
| Hooks | 8 | Observation, compaction, quality gates, session lifecycle |
| Scripts | 37 | Session lifecycle, health checks, evolution reports |
| Slash commands | 18 | Workflow automations (forge-dev, quality-gate, eval...) |
| Rules | 7 | Core patterns, evolution system, knowledge management |
| ADRs | 9 | Architecture decision records |
| Total commits | 1,688+ | Mostly automated by nightly agent |
The nightly agent alone: 1,500+ autonomous commits.
The system even evolved its own task management board:
See the full reference implementation →
What Makes This Different
Three philosophies, three architectures:
OpenClaw treats an agent as a system to be orchestrated — 345K+ stars, 10+ channels, plugin marketplace, human decides what to improve.
Hermes Agent treats an agent as a mind to be developed — 58K+ stars, 18+ LLM providers, auto-extracts skills from workflows, offline RL training.
Homunculus treats an agent as a partner that evolves toward your goals — not a standalone system, but an evolution layer that makes your existing AI coding assistant measurably better over time.
| | Homunculus | OpenClaw | Hermes Agent | Cursor Rules | Claude Memory |
|---|---|---|---|---|---|
| Goal-driven | Goal tree with metrics + health checks | No | No | No | No |
| Learns from usage | Auto-observation → instincts → 8 mechanisms | Self-extending | Auto-extracts skills | Manual | Auto-memory |
| Quality control | Eval specs + discrimination tracking | None | Offline RL only | None | None |
| Autonomous overnight | Nightly agent: eval + improve + research + experiment | No | Cron only | No | No |
| Self-improving | Eval → improve → replace loop | Partial | RL training (offline) | No | No |
| Meta-evolution | Evolution mechanism evolves itself | No | No | No | No |
| Implementation agnostic | Skills, agents, hooks, scripts, MCP, cron... | Skills only | Skills only | Rules only | Memory only |
| Runtime quality gate | 100% pass rate + discrimination tracking | None | None (offline RL) | None | None |
The key difference from Hermes: Hermes improves through offline RL training — the model gets better over time via exported trajectories. Homunculus improves runtime behavior — the skills, hooks, and agents running today get tested, validated, and replaced tonight. No training cycle. No waiting.
The key difference from OpenClaw: OpenClaw can extend itself, but you still decide what to improve and whether it worked. Homunculus closes the loop: goal health detects what needs improving, evals validate whether it worked, and the nightly agent does it without you.
What Gets Generated
After npx homunculus-code init:
your-project/
├── architecture.yaml # Your goal tree (the brain)
├── evolution-config.yaml # Tier + budget settings
├── homunculus/
│ ├── instincts/
│ │ ├── personal/ # Auto-extracted patterns
│ │ └── archived/ # Auto-pruned old patterns
│ ├── evolved/
│ │ ├── skills/ # Converged, tested knowledge
│ │ ├── agents/ # Specialized subagents
│ │ └── evals/ # Skill evaluation specs
│ ├── experiments/ # A/B test tracking
│ ├── reports/ # Evolution cycle reports
│ └── scripts/
│ ├── observe.sh # Observation hook
│ ├── evaluate-session.js # Pattern extraction
│ └── prune-instincts.js # Automatic cleanup
├── .claude/
│ ├── rules/
│ │ └── evolution-system.md # How Claude should evolve
│ └── commands/
│ ├── hm-goal.md # /hm-goal — define or view goals
│ ├── hm-night.md # /hm-night — run evolution cycle
│ ├── hm-status.md # /hm-status — evolution dashboard
│ ├── eval-skill.md # /eval-skill
│ ├── improve-skill.md # /improve-skill
│ └── evolve.md # /evolve
└── .gitignore # Excludes runtime data
Advanced: Meta-Evolution
The evolution mechanism itself evolves:
- Instinct survival rate too low? → Automatically raise extraction thresholds
- Eval discrimination too low? → Add harder boundary scenarios
- Skill convergence too slow? → Adjust aggregation triggers
- Mechanism coverage low? → Flag goals that only rely on prompts for upgrade
- Dispatch compliance off? → Review if agent dispatches follow the token decision tree
Tracked via five metrics:
- `instinct_survival_rate` — % of instincts that survive 14 days
- `skill_convergence` — time from first instinct to evolved skill
- `eval_discrimination` — % of eval scenarios that actually distinguish between versions
- `mechanism_coverage` — % of goals with non-prompt implementations
- `compliance_rate` — % of agent dispatches at appropriate context pressure
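Two of these metrics are simple ratios and can be sketched directly. The record shapes below (`createdAt`, `archivedAt`, `mechanism`) and the rule that a goal counts as covered when its mechanism is anything other than `"prompt"` are assumptions made for illustration.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// % of instincts, among those created at least `windowDays` ago,
// that were still unarchived when they reached that age.
function instinctSurvivalRate(instincts, now, windowDays = 14) {
  const windowMs = windowDays * DAY_MS;
  const oldEnough = instincts.filter((i) => now - i.createdAt >= windowMs);
  if (oldEnough.length === 0) return null; // nothing old enough to judge yet
  const survived = oldEnough.filter(
    (i) => i.archivedAt == null || i.archivedAt - i.createdAt >= windowMs
  );
  return (survived.length / oldEnough.length) * 100;
}

// % of goals realized by something other than a plain prompt.
function mechanismCoverage(goals) {
  if (goals.length === 0) return null;
  const covered = goals.filter((g) => g.mechanism && g.mechanism !== "prompt");
  return (covered.length / goals.length) * 100;
}

const now = 100 * DAY_MS;
const instincts = [
  { createdAt: 10 * DAY_MS, archivedAt: null },
  { createdAt: 20 * DAY_MS, archivedAt: 60 * DAY_MS },
  { createdAt: 30 * DAY_MS, archivedAt: 33 * DAY_MS }, // archived after 3 days
  { createdAt: 40 * DAY_MS, archivedAt: null },
  { createdAt: 95 * DAY_MS, archivedAt: null },         // too young to count
];
console.log(instinctSurvivalRate(instincts, now)); // 75
console.log(mechanismCoverage([
  { mechanism: "hook" }, { mechanism: "skill" }, { mechanism: "prompt" }, { mechanism: "script" },
])); // 75
```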
Requirements
- Claude Code v2.1.70+
- Node.js 18+
- macOS or Linux
Multi-LLM Provider
By default, Homunculus uses claude --print for instinct extraction (requires Claude CLI login). You can switch to any of the following providers:
| Provider | Env Vars | Notes |
|----------|----------|-------|
| claude-cli (default) | HOMUNCULUS_HARVEST_MODEL | Requires Claude CLI login |
| codex-cli | HOMUNCULUS_HARVEST_MODEL | Requires Codex CLI login |
| anthropic-api | ANTHROPIC_API_KEY, HOMUNCULUS_HARVEST_MODEL | Direct API — no CLI needed |
| openai-api | OPENAI_API_KEY, OPENAI_BASE_URL, HOMUNCULUS_HARVEST_MODEL | Also works with Ollama, vLLM, OpenRouter |
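The provider table can be read as a small validation step over environment variables. The sketch below resolves a provider from an env-like object and reports missing keys; the variable names come from the table and from the `HOMUNCULUS_HARVEST_PROVIDER` variable used in this README, while `resolveProvider` and the returned shape are hypothetical.

```javascript
// Which environment variables each provider cannot work without.
const PROVIDERS = {
  "claude-cli": { requires: [] },
  "codex-cli": { requires: [] },
  "anthropic-api": { requires: ["ANTHROPIC_API_KEY"] },
  "openai-api": { requires: ["OPENAI_API_KEY"] },
};

// Resolve provider settings from an env-like object (e.g. process.env).
function resolveProvider(env) {
  const name = env.HOMUNCULUS_HARVEST_PROVIDER || "claude-cli";
  if (!Object.hasOwn(PROVIDERS, name)) throw new Error(`Unknown provider: ${name}`);
  const missing = PROVIDERS[name].requires.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Provider ${name} is missing: ${missing.join(", ")}`);
  }
  return {
    name,
    model: env.HOMUNCULUS_HARVEST_MODEL ?? null,
    baseUrl: env.OPENAI_BASE_URL ?? null,
  };
}

console.log(resolveProvider({}).name); // claude-cli
console.log(
  resolveProvider({
    HOMUNCULUS_HARVEST_PROVIDER: "openai-api",
    OPENAI_API_KEY: "ollama",
    OPENAI_BASE_URL: "http://localhost:11434",
    HOMUNCULUS_HARVEST_MODEL: "llama3.2",
  }).model
); // llama3.2
```

Passing the environment in as an argument, instead of reading `process.env` inside the function, keeps the resolution logic easy to test.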
# Use Anthropic API directly (no Claude CLI required)
export HOMUNCULUS_HARVEST_PROVIDER=anthropic-api
export ANTHROPIC_API_KEY=sk-ant-...
# Use OpenAI
export HOMUNCULUS_HARVEST_PROVIDER=openai-api
export OPENAI_API_KEY=sk-...
export HOMUNCULUS_HARVEST_MODEL=gpt-4o
# Use Ollama (local, free)
export HOMUNCULUS_HARVEST_PROVIDER=openai-api
export OPENAI_API_KEY=ollama
export OPENAI_BASE_URL=http://localhost:11434
export HOMUNCULUS_HARVEST_MODEL=llama3.2
# Use Codex CLI
export HOMUNCULUS_HARVEST_PROVIDER=codex-cli
export HOMUNCULUS_HARVEST_MODEL=o4-mini
FAQ
The concepts (goal tree, eval-driven evolution, replaceable implementations) are tool-agnostic. The current implementation targets Claude Code hooks and commands, but the core pipeline could be adapted to other AI harnesses.
The observation hook is lightweight (no API calls). Instinct extraction uses a short Claude call per session (~$0.01). The nightly agent is optional and budget-configurable.
Yes. npx homunculus-code init adds to your project without overwriting existing files. Your current setup becomes the starting point for evolution.
Claude's memory records facts. Homunculus evolves behavior — tested skills, automated hooks, specialized agents — all driven by goals you define, with quality gates that prevent regression.
OpenClaw is excellent at self-extending. Homunculus solves a different problem: autonomous, goal-directed evolution. It decides what needs improving (via goal health), validates improvements (via evals), and does the work overnight (via the nightly agent). You could use both: OpenClaw for on-demand capability extension, Homunculus for the autonomous evolution layer on top.
Hermes Agent (NousResearch) is a standalone AI assistant that improves through offline RL training — it exports conversation trajectories to the Atropos framework, which trains the model to get better over time. That's powerful, but it's a different tool for a different problem.
Homunculus isn't a standalone assistant — it's an evolution layer for the AI coding assistant you already use (Claude Code, and soon Cursor/Codex). Instead of training a model, it evolves the runtime behaviors: the skills, hooks, and agents active in your sessions right now. Each behavior is eval-tested before it ships and tracked for discrimination (does having this skill actually change outcomes?).
The skill format is agentskills.io compatible, so skills that evolve in Homunculus can be shared with the broader ecosystem.
Those are from your own user-level hooks (~/.claude/settings.json), not from Homunculus. If your hooks use relative paths like node scripts/foo.js, they'll fail in projects that don't have those scripts. Fix by adding guards:
"command": "test -f scripts/foo.js && node scripts/foo.js || true"
Homunculus only adds hooks to the project-level .claude/settings.json.
Blog Posts
- Stop Tuning Your AI. Let It Tune Itself. — The story behind Homunculus and why goal trees beat manual configuration.
Philosophy
"Your AI assistant should be a seed, not a statue."
Stop spending your evenings tuning AI. Plant a seed, define your goals, and let it grow. The more you use it, the better it gets — and it tells you exactly how and why through eval scores, goal health checks, and morning reports.
Acknowledgments
Homunculus builds on ideas from several projects and research:
- everything-claude-code — Continuous Learning pattern and Skill Creator's eval → improve loop. Homunculus adopted and extended these into a goal-tree-driven, autonomous evolution system.
- OpenClaw — Demonstrated that AI assistants can extend their own capabilities. Homunculus adds goal direction, eval quality gates, and autonomous overnight operation.
- Hermes Agent — Validated the "agent that grows with you" positioning and the agentskills.io standard. Homunculus differs by evolving runtime behavior (not model weights) with measurable eval quality gates.
- Karpathy's Autoresearch — Proved AI can run autonomous experiment loops (118 iterations, 12+ hours). Inspired the nightly agent's research cycle.
- Anthropic's Eval Research — Eval methodology, noise tolerance (±6pp), and pass@k / pass^k metrics.
Contributing
See CONTRIBUTING.md.
License
MIT
Built by Javan and his self-evolving AI assistant.
