@galileodev/core
v0.39.2
# Galileo

**AI that learns from every task.** Galileo builds a persistent memory from your codebase and gets measurably better at generating and verifying code over time.
## Install

```sh
npm install -g @galileodev/cli
```

That's it. One command, globally available.
## The Problem
AI coding tools are stateless. They generate code, forget everything, and start from zero on the next task. They don't know what worked last time, what patterns your codebase favors, or which approaches led to failures. Every interaction is isolated — no compounding, no improvement, no learning.
Cursor, Copilot, Aider, Continue — they're all powerful, but they all have amnesia.
## What Galileo Does Differently
Galileo remembers. Every time it generates code, it reflects on the result, extracts what worked and what didn't, and stores those insights in a persistent memory. The next task benefits from everything learned before. Over dozens of tasks, the system measurably improves — higher first-pass verification rates, fewer errors, better code.
This isn't prompt caching or RAG over docs. It's a closed-loop learning system: generate → verify → reflect → remember → improve.
## Quick Start

```sh
# Initialize in your project
cd your-project
galileo init

# Generate, verify, and learn
galileo build "Add input validation to the user endpoint"

# See what it learned
galileo memory

# Prove it's getting better
galileo eval instructions.json
```

**Run 1:** Empty memory. Code generated from scratch. 3 insights extracted. 2 verification errors found and fixed automatically.

**Run 10:** 25 insights in memory. The 8 most relevant are selected for this task. First-pass verification succeeds — no fixes needed.

**Run 50:** 80 insights in memory. Related insights have been distilled into abstract principles. Stale knowledge has decayed away. First-pass rate: 78% (up from 40% at run 1).

**Run 100:** galileo eval confirms — memory-informed generation achieves 85% first-pass rate vs. 45% without. The improvement is empirical, not anecdotal.
## How It Works

```
    PERCEIVE                        ACT
 (select relevant              (generate code
  knowledge for                 from knowledge
  this task)                    + instruction)
        │                            │
        v                            v
┌────────────────┐          ┌────────────────┐
│   ATTENTION    │          │     ACTION     │
│     Smart      │─────────→│      Code      │
│   Selection    │          │   Generation   │
└────────────────┘          └────────────────┘
        ^                            │
        │                            v
┌────────────────┐          ┌────────────────┐
│     MEMORY     │          │ METACOGNITION  │
│   Persistent   │←─────────│   Reflect +    │
│   Knowledge    │          │     Score      │
└────────────────┘          └────────────────┘
        ^                            │
        │                            v
┌────────────────┐          ┌────────────────┐
│ REINFORCEMENT  │          │   GROUNDING    │
│   Attribute    │←─────────│  Verify Code   │
│   Outcomes     │          │   (4 checks)   │
└────────────────┘          └────────────────┘
```

Each cycle through this loop makes the next cycle better. Memory isn't a database — it's the system's evolving understanding of how to write good code in your project.
## Usage

```sh
# The primary workflow — generate, verify, fix, and learn
galileo build "Add rate limiting to the API" --cycles 3

# Learn from existing code without writing files
galileo analyze "Review the authentication module for security patterns"

# Verify and fix
galileo verify
galileo solve --budget 50000 --retries 3

# Stage isolation and checkpoints
galileo build "Add validation" --isolation isolated --checkpoints generator,reflector

# Evaluate the learning loop
galileo eval instructions.json
galileo benchmark instructions.json
galileo auto-refine instructions.json --max-trials 20

# Optimize code for a specific metric
galileo optimize-code \
  --metric "bundle-size" \
  --command "du -sb dist | cut -f1" \
  --target src/index.ts \
  --direction minimize

# Evolve prompt templates
galileo evolve generator --evaluator verification --experiments 10

# Project orchestration
galileo init-project   # interactive Q&A → phased plan
galileo start          # conversational auto-pilot with TUI

# Visual dashboard + chat
galileo dashboard      # opens http://localhost:3141
```

## What Makes It Work
Smart selection — Memories aren't dumped into context. A scoring function ranks them by relevance (embedding similarity), quality (past verification outcomes), and freshness (unused knowledge decays with a 7-day half-life). Only the most useful memories are selected for each task.
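The selection score can be sketched roughly as follows. This is a hypothetical illustration, not Galileo's actual code: the multiplicative combination, the Laplace smoothing, and the field names are assumptions; only the three factors (relevance, quality, freshness) and the 7-day half-life come from the description above.

```typescript
// Hypothetical sketch of memory selection. Relevance, quality, freshness,
// and the 7-day half-life come from the README; everything else is assumed.
const HALF_LIFE_DAYS = 7;

// Unused knowledge decays exponentially: after 7 idle days it scores 0.5.
function freshness(daysSinceLastUse: number): number {
  return Math.pow(0.5, daysSinceLastUse / HALF_LIFE_DAYS);
}

interface Insight {
  similarity: number;       // embedding similarity to the current task, 0..1
  helpful: number;          // times this insight preceded a passing verification
  harmful: number;          // times it preceded a failing one
  daysSinceLastUse: number;
}

function score(i: Insight): number {
  // Quality from past outcomes, Laplace-smoothed so new insights start at 0.5.
  const quality = (i.helpful + 1) / (i.helpful + i.harmful + 2);
  return i.similarity * quality * freshness(i.daysSinceLastUse);
}

// Pick the k most useful memories for the task at hand.
function select(insights: Insight[], k: number): Insight[] {
  return [...insights].sort((a, b) => score(b) - score(a)).slice(0, k);
}
```

Only the top-k survivors enter the prompt, which keeps context small while biasing toward knowledge that has actually paid off.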
Four-layer verification — Every code generation is checked by TypeScript compiler, ESLint, Semgrep security analysis, and your test suite. Failures trigger automated remediation — the system fixes its own mistakes before you see them.
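A minimal sketch of what such a layered gate could look like, assuming a pluggable command runner. The layer names and command strings are illustrative placeholders, not Galileo's actual implementation:

```typescript
// Illustrative four-layer verification gate; commands here are assumptions.
interface Check { name: string; cmd: string; }

const layers: Check[] = [
  { name: "compile",  cmd: "npx tsc --noEmit" },
  { name: "lint",     cmd: "npx eslint ." },
  { name: "security", cmd: "semgrep scan --error" },
  { name: "tests",    cmd: "npm test" },
];

// Run each layer in order with the given runner (which throws on failure);
// return the first failing layer's name, or null if everything passes.
function verify(run: (cmd: string) => void): string | null {
  for (const { name, cmd } of layers) {
    try { run(cmd); } catch { return name; }
  }
  return null;
}

// A real runner would shell out, e.g.:
//   verify((cmd) => execSync(cmd, { stdio: "pipe" }));
// and a non-null result would trigger an automated fix-and-retry cycle.
```

Injecting the runner keeps the gate testable without shelling out, and makes the "fix before the user sees it" loop a simple retry around `verify`.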
Reinforcement from outcomes — When verification passes, the memories that contributed get marked as helpful. When it fails, they get marked as harmful. Over time, good knowledge rises and bad knowledge fades.
Automatic distillation — When clusters of related insights grow large, they're synthesized into abstract principles and the originals are archived. The system compresses its own experience into wisdom.
Prompt evolution — Templates aren't static. The galileo evolve command runs controlled experiments to mutate and improve prompts, keeping only variants that demonstrably improve verification pass rates.
Resilience under stress — Circuit breakers on every external call, graceful degradation when services are slow, health monitoring that detects quality decline in the memory itself.
## Claude Code Plugin
Galileo ships as a Claude Code plugin with 16 skills, 5 pipeline agents, and a session-start hook. The plugin bridges Galileo's learning engine with Claude Code's coding capabilities — Claude Code writes the code, Galileo ensures each generation is informed by accumulated knowledge.
```sh
# If you use Claude Code, Galileo works as a plugin layer
claude plugin install galileo
```

## Configuration

```sh
galileo init   # creates .galileo/ with config, starter knowledge, and SQLite store
```

Configuration lives in .galileo/config.json:
```json
{
  "apiKey": "sk-ant-...",
  "model": "claude-sonnet-4-20250514",
  "provider": "anthropic"
}
```

Providers: "anthropic" (direct API), "pi-ai" (30+ providers including OpenAI, Google, Mistral, local models), or "cli" (Claude CLI). Set baseUrl for local model endpoints.
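For example, pointing the pi-ai provider at a local OpenAI-compatible endpoint might look like the following; the model name, port, and placeholder key are illustrations of your local setup, not required values:

```json
{
  "apiKey": "not-needed-locally",
  "model": "llama-3.1-8b-instruct",
  "provider": "pi-ai",
  "baseUrl": "http://localhost:11434/v1"
}
```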
## Packages
| Package | Description |
|---------|-------------|
| @galileodev/core | Learning engine — pipeline, memory, attention, reinforcement, observability |
| @galileodev/verify | Grounding — 4 verifiers, build-verify-fix orchestration, metric-driven optimization |
| @galileodev/meta | Prompt strategy — templates, evolution, token counting, validation |
| @galileodev/cli | Interface — 21 commands, TUI, dashboard, chat, formatting |
The monorepo uses npm workspaces. Build order: core → meta/verify → cli.
## Architecture (For Contributors)
Under the hood, the pipeline is a middleware chain. Each cognitive capability is a composable middleware function — independently testable, swappable, and extensible. The user-facing commands (build, analyze, memory) map to internal stages (Generator, Reflector, Curator, Selector, Distiller, FeedbackRecorder).
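A stripped-down sketch of that middleware-chain idea, with toy stages standing in for the real Generator/Reflector/etc. The `Ctx` shape and stage signature are assumptions for illustration, not the actual interfaces in @galileodev/core:

```typescript
// Toy middleware chain: each stage does its work, then hands off to next().
interface Ctx {
  instruction: string;
  trace: string[];
}

type Stage = (ctx: Ctx, next: () => void) => void;

// Compose stages into a single pipeline function.
function compose(stages: Stage[]): (ctx: Ctx) => void {
  return (ctx) => {
    const run = (i: number): void => {
      if (i < stages.length) stages[i](ctx, () => run(i + 1));
    };
    run(0);
  };
}

// Each cognitive capability is one swappable, independently testable stage.
const selector: Stage  = (ctx, next) => { ctx.trace.push("select");   next(); };
const generator: Stage = (ctx, next) => { ctx.trace.push("generate"); next(); };
const reflector: Stage = (ctx, next) => { ctx.trace.push("reflect");  next(); };

const pipeline = compose([selector, generator, reflector]);
const ctx: Ctx = { instruction: "Add validation", trace: [] };
pipeline(ctx);
// ctx.trace is now ["select", "generate", "reflect"]
```

Because stages only see the shared context and a `next` callback, swapping one out (or inserting a new one, like a Distiller) never touches the others.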
```
                      galileo CLI
  init · build · analyze · verify · solve · eval · auto-refine
  optimize-code · start · dashboard · evolve · benchmark ·
  prompts · memory · init-project · self-improve · update ·
  migrate-store

┌──────────────────────────────────────────────────────┐
│                  @galileodev/cli                     │
│  Commands · Formatters · TUI · Dashboard · Session   │
└──────────────┬───────────────────┬───────────────────┘
               │                   │
┌──────────────▼──────┐   ┌────────▼──────────────────┐
│  @galileodev/meta   │   │  @galileodev/verify       │
│  Templates          │   │  Verifiers (tsc, eslint,  │
│  Validator          │   │  semgrep, tests)          │
│  RatchetOptimizer   │   │  SolveAgent               │
│                     │   │  ACDCOrchestrator         │
│                     │   │  KarpathyLoop             │
└──────────┬──────────┘   └──────────┬────────────────┘
           │                         │
┌──────────▼─────────────────────────▼────────────────┐
│                  @galileodev/core                   │
│  Pipeline · Generator · Reflector · Curator         │
│  Selector · Distiller · FeedbackRecorder            │
│  PlaybookStore (JSONL / SQLite) · Embeddings (ONNX) │
│  LLM Providers (Anthropic / pi-ai / CLI)            │
│  EventBus · CircuitBreakers · OTEL · Metrics        │
└─────────────────────────────────────────────────────┘
```

## Roadmap
| Pillar | Capability | Status |
|--------|-----------|--------|
| 1. Core Pipeline | Generate → Reflect → Curate loop | ✅ Complete |
| 2. Developer Experience | CLI + Claude Code plugin | ✅ Complete |
| 3. Intelligent Context | Relevance-ranked memory retrieval + distillation | ✅ Complete |
| 4. Closed-Loop Eval | Verification outcomes feed back into memory | ✅ Complete |
| 5. Adaptive Learning | Temporal decay + episodic context | ✅ Complete |
| 6. Self-Improvement | Auto-tuning of pipeline parameters | ✅ Complete |
| 7. Infrastructure | Circuit breakers, health monitoring, resilience | ✅ Complete |
| 8. Production Ops | OpenTelemetry tracing + structured observability | ✅ Complete |
| 9. Multi-Provider | 30+ LLM providers via pi-ai adapter | ✅ Complete |
| 10. Perception | Repo awareness, memory management, generalization | 🔲 Planned |
| 11. Autonomous Agency | Richer evaluation, streaming, interruptible stages | 🔲 Planned |
Full details: docs/ROADMAP.md
## Testing

```sh
npm test                      # all 786 tests
npm test -w packages/core     # core only
npm test -w packages/verify   # verify only
npm test -w packages/cli      # cli only
npm test -w packages/meta     # meta only
```

## Contributing

- Fork the repository
- Create a feature branch: `git checkout -b feat/your-feature`
- Make changes with tests
- Run the full suite: `npm test`
- Build all packages: `npm run build`
- Submit a pull request
## Development Setup

```sh
git clone https://github.com/aut0didakt0s/galileo.git
cd galileo && npm install && npm run build
npm link -w packages/cli
```