sisyphus-engine

v0.3.0

Published

a month ago

Spec-driven artifact engine with adversarial evaluation

hurleysk

claude ai orchestrator adversarial-evaluation spec-driven document-generation

Sisyphus

Spec-driven artifact engine with adversarial evaluation. Define what you want, how to get the data, and how to know it's right. Sisyphus pushes the boulder until it stays at the top.

How It Works

npx sisyphus-engine run specs/my-spec.json

1. Stack -- Gather data from files, globs, and agent-extracted summaries
2. Start -- Spawn a fresh Claude instance (Sisyphus) to produce content from the data
3. Descend -- Run structural checks in code, then spawn an adversarial evaluator (Hades) to judge custom criteria
4. Climb -- If anything fails, format specific feedback and send Sisyphus back up the hill

The loop continues until all criteria pass or max retries are exhausted.
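The loop above can be sketched in a few lines. This is a minimal illustration, not the engine's actual code: `pushBoulder`, `LoopDeps`, and `CheckResult` are hypothetical names, the producer and evaluator are passed in as plain functions, and the sketch is synchronous for brevity where real agent spawning would be asynchronous. It also assumes `maxRetries` counts climbs after the first attempt.

```typescript
// Hypothetical types and names; the real engine's API may differ.
type CheckResult = { description: string; passed: boolean; feedback?: string };

interface LoopDeps {
  // Start: the producer (Sisyphus) turns data plus feedback into content.
  produce: (data: string, feedback: string[]) => string;
  // Descend: structural checks in code plus the evaluator's (Hades) verdicts.
  evaluate: (output: string) => CheckResult[];
}

interface LoopOutcome {
  output: string;
  passed: boolean;
  attempts: number;
}

function pushBoulder(data: string, maxRetries: number, deps: LoopDeps): LoopOutcome {
  let feedback: string[] = [];
  let output = "";
  // One initial attempt plus up to maxRetries climbs (an assumption).
  for (let attempt = 1; attempt <= maxRetries + 1; attempt++) {
    output = deps.produce(data, feedback);
    const failures = deps.evaluate(output).filter((r) => !r.passed);
    if (failures.length === 0) {
      return { output, passed: true, attempts: attempt };
    }
    // Climb: turn each failure into specific feedback for the next attempt.
    feedback = failures.map((f) => `${f.description}: ${f.feedback ?? "did not pass"}`);
  }
  // Retries exhausted: report the last output as failed.
  return { output, passed: false, attempts: maxRetries + 1 };
}
```

Passing the two agents in as functions keeps the loop itself free of any knowledge about how content is produced or judged, which mirrors the "dispatches, never produces" role described for the orchestrator.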

Architecture

Three layers, four agents, strict isolation:

+-----------------------------------+
|  Domain Layer                     |  "documentation" -> markdown checks, assembler
|  (extensible to codegen, etc.)    |
+-----------------------------------+
|  Core Engine                      |  spec parser, stack pipeline, spawn,
|  (domain-agnostic)                |  produce/descend loop, climb, reporting
+-----------------------------------+

| Agent | Role | Isolation |
|-------|------|-----------|
| Zeus | Conversational architect (T1) | Sees user, not the loop |
| Thanatos | Orchestrator (T2, the engine) | Dispatches, never produces |
| Sisyphus | Producer (T3) | Sees data + feedback, never criteria |
| Hades | Evaluator (T3) | Sees output + criteria, never the goal |
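One way to picture the isolation between the producer and the evaluator is as two prompt builders that each receive only part of a boulder. The sketch below is an assumption about how that could look, not the contents of `prompt-builder.ts`: the `Boulder` shape, both function names, and the prompt wording are hypothetical, and it assumes the boulder's description is what the table calls "the goal".

```typescript
// Hypothetical shapes: field and function names are illustrative only.
interface Boulder {
  name: string;
  description: string; // the goal, visible to the producer only
  criteria: string[];  // acceptance criteria, visible to the evaluator only
}

// Producer (Sisyphus) prompt: goal, data, and feedback. Criteria are never included.
function buildProducerPrompt(boulder: Boulder, data: string, feedback: string[]): string {
  const parts = [`Task: ${boulder.name}`, boulder.description, "Data:", data];
  if (feedback.length > 0) {
    parts.push("Feedback from the previous attempt:");
    for (const item of feedback) parts.push(`- ${item}`);
  }
  return parts.join("\n");
}

// Evaluator (Hades) prompt: output and criteria. The goal is never included.
function buildEvaluatorPrompt(boulder: Boulder, output: string): string {
  const parts = ["Judge the output against each criterion. Assume it fails until shown otherwise.", "Criteria:"];
  for (const criterion of boulder.criteria) parts.push(`- ${criterion}`);
  parts.push("Output:", output);
  return parts.join("\n");
}
```

Because the producer never sees the criteria, it cannot write to the test; because the evaluator never sees the goal, it has no reason to give credit for intent.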

Spec Format

A spec defines boulders (units of work) with stack sources (data) and criteria (acceptance):

{
  "title": "Entity Migration Status",
  "layer": "documentation",
  "output": "output/reports/status.md",
  "maxRetries": 3,
  "boulders": [
    {
      "name": "Entity Inventory",
      "description": "Complete list of entities with migration status",
      "stack": [
        { "type": "analysis", "source": "schema/_index.json", "instruction": "Extract all entities" }
      ],
      "criteria": [
        { "check": "contains-table", "description": "Has entity table", "columns": ["Entity", "Status"] },
        { "check": "row-count-gte", "description": "Covers all entities", "min": 10 },
        { "check": "custom", "description": "Each entity has a valid status" }
      ]
    }
  ]
}
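For readers who prefer types to JSON, the example spec could be described roughly as follows. These interfaces are inferred from the example alone and may not match `types.ts` or `lib/spec-schema.json`; `findSpecProblems` is a hypothetical helper that only illustrates the kind of consistency a spec needs, and its rules (such as requiring `columns` for `contains-table`) are assumptions.

```typescript
// Shapes inferred from the example spec; which fields are optional is an assumption.
interface StackSource { type: string; source: string; instruction?: string; }
interface Criterion { check: string; description: string; columns?: string[]; min?: number; }
interface BoulderSpec { name: string; description: string; stack: StackSource[]; criteria: Criterion[]; }
interface Spec { title: string; layer: string; output: string; maxRetries: number; boulders: BoulderSpec[]; }

// Hypothetical sanity check; returns human-readable problems (empty when none are found).
function findSpecProblems(spec: Spec): string[] {
  const problems: string[] = [];
  if (spec.maxRetries < 0) problems.push("maxRetries must not be negative");
  if (spec.boulders.length === 0) problems.push("spec has no boulders");
  for (const boulder of spec.boulders) {
    if (boulder.criteria.length === 0) {
      problems.push(`boulder "${boulder.name}" has no criteria`);
    }
    for (const c of boulder.criteria) {
      if (c.check === "contains-table" && (c.columns === undefined || c.columns.length === 0)) {
        problems.push(`"${c.description}" needs a columns list`);
      }
      if (c.check === "row-count-gte" && c.min === undefined) {
        problems.push(`"${c.description}" needs a min value`);
      }
    }
  }
  return problems;
}
```

In practice `sisyphus validate` is the supported way to check a spec; the sketch only makes the structure of the example explicit.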

CLI

sisyphus validate <spec-file>     # Validate spec against schema
sisyphus run <spec-file>          # Execute the spec
sisyphus run <spec-file> --dry-run    # Show plan without executing
sisyphus run <spec-file> --section <name>  # Run one boulder only

Documentation Layer Checks

Structural checks are deterministic and run in code, so they can't be gamed; only custom criteria are judged by an LLM:

| Check | What it does |
|-------|-------------|
| contains-table | Verify markdown table exists with required columns |
| row-count-gte / row-count-lte | Verify row count bounds |
| contains-heading | Find heading by text and optional level |
| word-count-gte / word-count-lte | Verify word count bounds |
| custom | LLM-evaluated by Hades with adversarial stance |

All structural checks are code-block-aware -- tables and headings inside fenced code blocks are ignored.
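A code-block-aware check can be approximated by dropping fenced lines before looking for structure. The sketch below is a simplified illustration under several assumptions, not the code in `layers/documentation/checks/`: all function names are hypothetical, fences are matched only by their marker character, headings are compared case-insensitively, and a table is recognized as a header row followed by a dash separator row.

```typescript
// Returns "`" or "~" when the line opens or closes a fence, otherwise null.
function fenceMarker(line: string): string | null {
  if (/^\s*`{3,}/.test(line)) return "`";
  if (/^\s*~{3,}/.test(line)) return "~";
  return null;
}

// Keeps only the lines that sit outside fenced code blocks.
function stripFencedBlocks(markdown: string): string[] {
  const kept: string[] = [];
  let open: string | null = null;
  for (const line of markdown.split("\n")) {
    const marker = fenceMarker(line);
    if (open === null) {
      if (marker !== null) open = marker; // opening fence: start skipping
      else kept.push(line);
    } else if (marker === open) {
      open = null; // closing fence: resume keeping lines
    }
  }
  return kept;
}

// Finds an ATX heading ("## Title") by text and, optionally, by level.
function containsHeading(markdown: string, text: string, level?: number): boolean {
  return stripFencedBlocks(markdown).some((line) => {
    const hashes = line.length - line.replace(/^#+/, "").length;
    if (hashes < 1 || hashes > 6) return false;
    const rest = line.slice(hashes);
    if (!/^\s/.test(rest)) return false; // a space must follow the hashes
    if (level !== undefined && level !== hashes) return false;
    return rest.trim().toLowerCase() === text.toLowerCase();
  });
}

function splitRow(line: string): string[] {
  return line.trim().replace(/^\|/, "").replace(/\|$/, "").split("|").map((c) => c.trim());
}

// Looks for a header row with every required column, followed by a separator row.
function containsTable(markdown: string, columns: string[]): boolean {
  let previous = "";
  for (const line of stripFencedBlocks(markdown)) {
    const isSeparator = line.indexOf("|") !== -1 && /^[\s|:-]+$/.test(line) && line.indexOf("-") !== -1;
    if (isSeparator && previous.indexOf("|") !== -1) {
      const cells = splitRow(previous);
      if (columns.every((c) => cells.indexOf(c) !== -1)) return true;
    }
    previous = line;
  }
  return false;
}
```

Stripping fences first means every structural check shares the same notion of "outside code", so an example table that the producer quotes inside a code block cannot satisfy `contains-table` by accident.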

Project Structure

bin/sisyphus.ts              # CLI entry point
src/
  engine.ts                  # Boulder loop (Thanatos)
  spec.ts                    # Spec loading + JSON schema validation
  stack.ts                   # Data stacking pipeline
  start.ts                   # Claude spawning via stdin pipe
  checks.ts                  # Check registry
  prompt-builder.ts          # Prompt assembly with isolation
  lessons.ts                 # Cross-run learning system
  report.ts                  # Run report generation
  types.ts                   # Core interfaces
layers/
  documentation/
    index.ts                 # Layer interface implementation
    assembler.ts             # Markdown document assembly
    checks/                  # Structural check implementations
    prompts/
      sisyphus.md            # Producer prompt
      hades.md               # Evaluator prompt
lib/spec-schema.json         # Base spec JSON schema
lessons/                     # Lesson stores (global + per-layer)
examples/                    # Example specs

Lessons System

Sisyphus learns across runs. When a boulder fails and then succeeds after climbing, the pattern is captured as a lesson. Lessons are:

- Filtered by relevance tags before injection into prompts
- Scored by recency and frequency
- Budget-capped to prevent context bloat (default ~2000 chars)
- Stored per-layer (lessons/documentation.json) and globally (lessons/global.json)
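The selection rules above (filter by tag, score by recency and frequency, cap by budget) could be combined along these lines. This is a sketch under stated assumptions, not the logic in `lessons.ts`: the `Lesson` shape, the `selectLessons` name, and the scoring formula (hit count divided by age in days plus one) are all hypothetical. Only the roughly 2000-character default comes from the description above.

```typescript
// Hypothetical lesson record; the real store format may differ.
interface Lesson {
  text: string;
  tags: string[];
  hits: number;     // how often the pattern has been seen (frequency)
  lastSeen: number; // epoch milliseconds of the latest occurrence (recency)
}

const DAY_MS = 24 * 60 * 60 * 1000;

function selectLessons(lessons: Lesson[], tags: string[], now: number, budgetChars = 2000): Lesson[] {
  // Assumed scoring: frequent and recent lessons rank highest.
  const score = (l: Lesson): number => l.hits / (1 + (now - l.lastSeen) / DAY_MS);

  // 1. Keep only lessons that share at least one tag with the current boulder.
  const relevant = lessons.filter((l) => l.tags.some((t) => tags.indexOf(t) !== -1));

  // 2. Rank by score, highest first (copy first so the input is not mutated).
  const ranked = relevant.slice().sort((a, b) => score(b) - score(a));

  // 3. Take lessons in rank order while they still fit in the character budget.
  const picked: Lesson[] = [];
  let used = 0;
  for (const lesson of ranked) {
    if (used + lesson.text.length > budgetChars) continue;
    picked.push(lesson);
    used += lesson.text.length;
  }
  return picked;
}
```

Skipping an oversized lesson rather than stopping at it lets shorter, lower-ranked lessons still use the remaining budget; a stricter variant could stop at the first lesson that does not fit.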

Install

npm install -g sisyphus-engine
sisyphus run specs/my-spec.json

Or use without installing:

npx sisyphus-engine run specs/my-spec.json

From source

git clone https://github.com/HurleySk/sisyphus.git
cd sisyphus
npm install
npm run build
npm test

Requires Node.js 18+ and the Claude Code CLI.

Why "Sisyphus"?

The boulder always rolls back down. But unlike the myth, this engine keeps climbing with specific feedback until the boulder stays at the top -- or, once retries run out, flags the failure and moves on.

agentic-loop -- Inspiration for the two-terminal architecture
HurleySk/sherlock -- Domain analyst skills, Sisyphus-aware
