oh-my-fable

v0.2.0

Published

13 hours ago

The autonomous-agent harness that actually finishes long tasks, because it plans first, self-corrects every step, and survives crashes. The whole run lives in one serializable RunContext, checkpointed after every step, so it resumes exactly where it died.

oh-my-fable

Fable 5's way of working a long task — plan first, self-correct every step, never lose the thread — as a model-agnostic agent harness.

The fable is Fable 5's way of thinking; the oh-my- is because, like oh-my-zsh, you just want the good defaults. The mindset is the model's — the engine is any provider.

npm i oh-my-fable

The demos are magical. Then you point an agent at a real multi-hour task and it loops on the same step, loses the plan somewhere in a 40-message chat history, and — when your process restarts — forgets everything and starts over.

oh-my-fable encodes the way a strong reasoning model works a long task — the mindset, not the model — into a harness: plan first, self-correct every step, keep the thread, and finish. It's built around two mechanisms and one rule:

The whole run lives in a single RunContext — the only source of truth, and always serializable. It's checkpointed after every step.

From that one rule you get the thing nobody else gives you: a crash is a pause.

The name is about the thinking, not a model lock-in — the mindset is Fable 5's, the engine is whatever Provider you hand it (Anthropic, OpenAI-compatible, local, …).

── run run_mqf… ──
  📋 planned 3 steps: outline → draft → edit
  ▶  outline
     → outlined
     💾 checkpoint saved
  ▶  draft
  💥 the process just died (power outage, OOM, deploy, whatever)

── resuming from the last checkpoint ──
  ▶  draft                ← picks up exactly where it died
     💾 checkpoint saved
  ▶  edit
  ✅ done

  steps: outline [done], draft [done], edit [done]

const result = await run(goal, { provider, store });   // crashes at step 2
// ...process restarts...
await resume(result.runId, { provider, store });        // finishes from step 2

That's examples/scripted-run.mjs — run it with npm run example, no API key needed.

The three things it does that most frameworks don't

1. It survives crashes (resumable by construction)

State doesn't live in memory or in a chat transcript — it lives in RunContext, saved to disk after every step. Kill the process at step 47 of 60 and resume() continues from step 47, plan and progress intact. Swap the FileStore for SQLite/Redis by implementing one interface.
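To make the checkpoint-and-resume idea concrete, here is a minimal sketch of a loop that saves its state after every step and can be restarted purely from the last saved copy. Everything in it (`RunState`, `CheckpointStore`, `InMemoryStore`, `drive`, `resumeRun`) is a hypothetical stand-in written for illustration; it is not oh-my-fable's actual RunContext or store interface.

```typescript
// Hypothetical shapes for illustration; not oh-my-fable's actual types.
type StepStatus = "pending" | "done";
interface Step { id: string; status: StepStatus; result?: string }
interface RunState { runId: string; steps: Step[] }

// The "one interface" idea: anything that can save and load a serialized run.
interface CheckpointStore {
  save(runId: string, json: string): void;
  load(runId: string): string | undefined;
}

class InMemoryStore implements CheckpointStore {
  private data = new Map<string, string>();
  save(runId: string, json: string): void { this.data.set(runId, json); }
  load(runId: string): string | undefined { return this.data.get(runId); }
}

// Execute pending steps in order, checkpointing after each one.
// `crashAt` simulates the process dying before that step completes.
function drive(state: RunState, store: CheckpointStore, crashAt?: string): RunState {
  for (const step of state.steps) {
    if (step.status === "done") continue; // finished before the crash, skip it
    if (step.id === crashAt) throw new Error(`crashed during ${step.id}`);
    step.result = `${step.id} finished`;
    step.status = "done";
    store.save(state.runId, JSON.stringify(state)); // checkpoint after the step
  }
  return state;
}

// Resume rebuilds the state from the last checkpoint and keeps going.
function resumeRun(runId: string, store: CheckpointStore): RunState {
  const json = store.load(runId);
  if (json === undefined) throw new Error(`no checkpoint for ${runId}`);
  return drive(JSON.parse(json) as RunState, store);
}

const store = new InMemoryStore();
const initial: RunState = {
  runId: "run_demo",
  steps: ["outline", "draft", "edit"].map((id) => ({ id, status: "pending" as StepStatus })),
};
store.save(initial.runId, JSON.stringify(initial)); // checkpoint the plan itself

try {
  drive(initial, store, "draft"); // dies at step 2
} catch {
  // The process "restarts" here; only what the store holds survives.
}
const resumed = resumeRun("run_demo", store);
```

Because the state round-trips through `JSON.stringify` and `JSON.parse`, swapping the in-memory map for a file or a database only changes the two store methods, not the loop.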

2. It plans first, then self-corrects (plan ≠ history)

The plan is structured data that lives outside the conversation, so the model never loses track of "where am I" in a wall of text. After every step a reflector checks the result against the goal and routes:

| verdict | meaning | what happens |
| --- | --- | --- |
| on_track | normal progress | next step |
| needs_replan | the result changed the plan's assumptions | replan |
| blocked | same obstacle keeps recurring | replan around it / escalate |
| goal_met | success criteria satisfied | stop (even with steps left — no busywork) |

And replanning accumulates: finished steps are preserved verbatim; only the remaining work is regenerated. Long tasks move forward instead of restarting.
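The routing table and the accumulating replan can be sketched in a few lines. The verdict names come from the table above; the `Action` values, the `PlanStep` shape, and both function names are assumptions made for this illustration, not the harness's real internals.

```typescript
// Verdict names are from the table; everything else here is hypothetical.
type Verdict = "on_track" | "needs_replan" | "blocked" | "goal_met";
type Action = "next_step" | "replan" | "stop";

interface PlanStep { id: string; intent: string; status: "pending" | "done" }

// Map a reflector verdict to what the loop does next.
function route(verdict: Verdict): Action {
  switch (verdict) {
    case "on_track": return "next_step";
    case "needs_replan": return "replan";
    case "blocked": return "replan"; // a fuller version might escalate instead
    case "goal_met": return "stop";  // stop even if steps remain
  }
}

// Accumulating replan: finished steps are kept as they are,
// and only the unfinished remainder is replaced.
function applyReplan(current: PlanStep[], regenerated: PlanStep[]): PlanStep[] {
  const finished = current.filter((s) => s.status === "done");
  return [...finished, ...regenerated];
}

const before: PlanStep[] = [
  { id: "s1", intent: "outline", status: "done" },
  { id: "s2", intent: "draft in one pass", status: "pending" },
];
const after = applyReplan(before, [
  { id: "s2a", intent: "draft section 1", status: "pending" },
  { id: "s2b", intent: "draft section 2", status: "pending" },
]);
// `after` keeps s1 and swaps the unfinished s2 for s2a and s2b.
```

Keeping the plan as plain data is what makes this cheap: a replan is a list operation, not a rewrite of the conversation.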

3. It's deterministically testable (genuinely rare for an agent framework)

Because every model call is stateless, you can script the model and assert the loop's behavior — no network, no flakiness:

import { run, ScriptedProvider, reply, MemoryStore } from "oh-my-fable";

const provider = new ScriptedProvider([
  reply.plan([{ id: "s1", intent: "do the thing" }]),
  reply.text("did it"),
  reply.reflection("goal_met"),
]);

const { status } = await run("do the thing", { provider, store: new MemoryStore() });
// `expect` comes from your test runner (for example Vitest or Jest).
expect(status).toBe("done"); // fully deterministic

The whole harness is tested this way — crash-recovery, replan-accumulation, budget halts, the tool loop — all without a single API call.
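For readers curious how scripting a model makes a loop deterministic, here is a self-contained sketch of the general technique. `Model`, `CannedModel`, and `tinyLoop` are hypothetical names invented for this example; oh-my-fable's own ScriptedProvider and Provider interface have their own shapes, which are not reproduced here.

```typescript
// Hypothetical minimal model interface, for illustration only.
interface Model {
  complete(prompt: string): Promise<string>;
}

// Replays canned replies in order and records every prompt it receives.
class CannedModel implements Model {
  readonly prompts: string[] = [];
  private readonly replies: string[];
  private cursor = 0;

  constructor(replies: string[]) {
    this.replies = replies;
  }

  async complete(prompt: string): Promise<string> {
    this.prompts.push(prompt);
    const reply = this.replies[this.cursor];
    if (reply === undefined) {
      throw new Error("script exhausted: the loop made an unexpected extra call");
    }
    this.cursor += 1;
    return reply;
  }
}

// A toy loop: ask for a plan, execute each step, then ask whether the goal is met.
async function tinyLoop(goal: string, model: Model): Promise<"done" | "halted"> {
  const steps = (await model.complete(`plan: ${goal}`)).split(",");
  for (const step of steps) {
    await model.complete(`execute: ${step}`);
  }
  const verdict = await model.complete(`reflect: ${goal}`);
  return verdict === "goal_met" ? "done" : "halted";
}

// Three scripted replies: one plan with a single step, one step result, one verdict.
const scripted = new CannedModel(["do the thing", "did it", "goal_met"]);
const outcome = tinyLoop("do the thing", scripted);
```

A test can then assert on both the final outcome and the exact sequence of prompts in `scripted.prompts`, which is where loop regressions tend to show up first.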

Quick start

import { run, AnthropicProvider } from "oh-my-fable";

const result = await run(
  {
    description: "Research the top 3 Rust web frameworks and write a comparison table",
    successCriteria: ["a markdown table comparing 3 frameworks exists"],
    constraints: ["only use information you can verify"],
  },
  { provider: new AnthropicProvider() }, // reads ANTHROPIC_API_KEY
);

console.log(result.status); // "done" | "halted" | "failed"
console.log(result.ctx.plan.steps);

npm i oh-my-fable        # zero runtime dependencies

Node ≥ 18. Ships with AnthropicProvider and OpenAICompatProvider (works with OpenAI, Ollama, LM Studio, OpenRouter, Groq… — ollama("llama3.1") for a local model with no key), both over fetch, no SDK. Or bring any model by implementing the Provider interface (three methods).

AnthropicProvider works with the current flagship models (claude-opus-4-8, claude-fable-5) out of the box: it drops the temperature parameter they reject, and it prompt-caches the system+tools prefix by default, so a long durable run pays ~10× less on the context it replays every step. Opt into { thinking: "adaptive", effort: "high" } for harder planning. The claude provider, which drives the Claude Code CLI, can report real cost and usage from --output-format json and run Claude's own tools ({ tools: true, permissionMode: "acceptEdits" }).

Or use it from the terminal

Don't want to write code? It ships a CLI (zero extra deps):

npx oh-my-fable demo                       # watch crash → resume, no API key

# ⭐ already pay for Claude Code? drive it as a DURABLE, TOOL-USING agent — your
#    login, no separate API key, $0 per token. Claude edits files & runs commands:
npx oh-my-fable run "refactor utils.ts and run the tests" --provider claude --cli-tools

# pure-reasoning over the same login (no tools):
npx oh-my-fable run "outline a talk on durable agents" --provider claude

# or a LOCAL model (Ollama / LM Studio), also no key:
npx oh-my-fable run "outline a talk on durable agents" --provider ollama --model llama3.1

# or any hosted model:
export ANTHROPIC_API_KEY=sk-...
npx oh-my-fable run "summarize README.md into SUMMARY.md" --tools fs

npx oh-my-fable list                       # your saved runs
npx oh-my-fable show  run_abc123           # the run's plan, steps & budget as a timeline
npx oh-my-fable resume run_abc123          # continue one from its checkpoint

You don't need an Anthropic API key. Pick how it talks to a model:

| --provider | uses | key? | tools? |
| --- | --- | --- | --- |
| claude | your Claude Code login | none | --cli-tools → Claude runs Read/Write/Edit/Bash itself |
| codex | your Codex CLI login | none | --cli-tools → workspace-write |
| ollama | a local Ollama model | none | --tools fs (harness-run) |
| --base-url <url> | LM Studio / OpenRouter / Groq / any OpenAI-compatible | per that server | --tools fs |
| openai | OpenAI | OPENAI_API_KEY | --tools fs |
| (default) | Anthropic | ANTHROPIC_API_KEY | --tools fs |

Two ways to give an agent hands:

- --cli-tools (claude/codex): the CLI runs its own tools (file edits, shell) on your subscription. oh-my-fable stays the durable planner/reflector around it: it plans, checkpoints every step, and reflects, while Claude does the work. Tune with --permission-mode acceptEdits|dontAsk|plan and --allow "Read,Edit,Bash(npm test)".
- --tools fs (API providers): the harness gives the agent a sandboxed read_file/write_file/list_dir, confined to the working directory.
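Confining file tools to a working directory usually comes down to resolving the requested path and rejecting anything that lands outside the root. The sketch below shows one way to do that with plain string handling. The `confine` function is hypothetical, handles POSIX-style paths only, and ignores symlinks; it is not a description of how oh-my-fable's sandbox is actually implemented.

```typescript
// Resolve `requested` against `root` and refuse anything that escapes it.
// Assumptions: POSIX-style separators, `root` is an absolute directory path,
// and symlinks are out of scope for this sketch.
function confine(root: string, requested: string): string {
  if (requested.startsWith("/")) {
    throw new Error(`absolute paths are not allowed: ${requested}`);
  }
  const parts: string[] = [];
  for (const segment of requested.split("/")) {
    if (segment === "" || segment === ".") continue; // no-op segments
    if (segment === "..") {
      // Going up from the root itself would leave the sandbox.
      if (parts.length === 0) throw new Error(`path escapes the sandbox: ${requested}`);
      parts.pop();
    } else {
      parts.push(segment);
    }
  }
  const base = root.replace(/\/+$/, ""); // drop trailing slashes
  return parts.length === 0 ? (base || "/") : `${base}/${parts.join("/")}`;
}

const safePath = confine("/work", "notes/../a.txt"); // resolves inside /work
```

A tool handler would call something like `confine` before touching the filesystem, so a model-supplied path such as `../etc/passwd` is turned into an error the loop can reflect on instead of a read outside the project.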

You watch the plan form and each step get reflected on, live. Every run is checkpointed, so resume <runId> always works — and show <runId> prints the whole run (plan, steps, budget) from its serialized RunContext.

Tools

import { run, defineTool, AnthropicProvider } from "oh-my-fable";

// fetchResults stands in for your own search function; it is not imported from oh-my-fable.
const search = defineTool(
  "web_search",
  "Search the web and return results.",
  { type: "object", properties: { query: { type: "string" } }, required: ["query"] },
  async ({ query }) => ({ ok: true, output: await fetchResults(query) }),
);

await run(goal, { provider: new AnthropicProvider(), tools: [search] });

A tool that throws becomes an Observation, not a crash — the reflector decides what to do about it.
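The general pattern behind "a throw becomes an observation" is a wrapper that catches the error and returns it as data. The sketch below assumes a failure shape of `{ ok: false, error }` to mirror the `{ ok: true, output }` object in the example above; that failure shape, and the names `ToolObservation`, `ToolHandler`, and `callTool`, are assumptions for illustration rather than the library's documented types.

```typescript
// Hypothetical result shape, modeled on the `{ ok, output }` object shown above.
type ToolObservation =
  | { ok: true; output: unknown }
  | { ok: false; error: string };

type ToolHandler = (input: Record<string, unknown>) => Promise<ToolObservation>;

// Run a tool handler and convert any thrown error into a failed observation,
// so the caller always gets a value back and never an exception.
async function callTool(
  handler: ToolHandler,
  input: Record<string, unknown>,
): Promise<ToolObservation> {
  try {
    return await handler(input);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: message };
  }
}

// A handler that always fails, to show the conversion.
const flaky: ToolHandler = async () => {
  throw new Error("upstream timed out");
};
const observed = callTool(flaky, { query: "rust web frameworks" });
```

Since the failure arrives as ordinary data, a reflection step can treat it like any other result and decide whether to retry, replan, or give up.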

Watch it work

await run(goal, {
  provider,
  onEvent: (e) => console.log(e.type, e),
  // plan_created · step_start · step_done · reflection · replan · compaction · checkpoint · done · halted
});

It can't run away

Three hard ceilings, checked at the top of every loop turn, plus two recovery caps — exceed any and it halts cleanly, preserving all work:

await run(goal, {
  provider,
  maxSteps: 50,            // total step budget
  maxTokens: 2_000_000,    // cumulative token budget
  maxWallClockMs: 1_800_000, // wall-clock budget (30 minutes)
  maxStepAttempts: 3,      // a single step retried this many times → blocked
  maxReplans: 12,          // replan storm → halted
});
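As a rough picture of what "checked at the top of every loop turn" can look like, here is a small budget guard. The three limit names mirror the options above; the `Usage` counters, the `checkBudget` function, and the choice of `>=` comparisons are assumptions for this sketch, not the harness's actual budget code.

```typescript
// Limit names mirror the options above; the usage counters are hypothetical.
interface Limits { maxSteps: number; maxTokens: number; maxWallClockMs: number }
interface Usage { steps: number; tokens: number; startedAtMs: number }

// Returns the name of the exceeded limit, or null when the run may continue.
function checkBudget(limits: Limits, usage: Usage, nowMs: number): string | null {
  if (usage.steps >= limits.maxSteps) return "maxSteps";
  if (usage.tokens >= limits.maxTokens) return "maxTokens";
  if (nowMs - usage.startedAtMs >= limits.maxWallClockMs) return "maxWallClockMs";
  return null;
}

const limits: Limits = { maxSteps: 50, maxTokens: 2_000_000, maxWallClockMs: 1_800_000 };

// 12 steps and 300k tokens, ten minutes in: still within budget.
const midRun = checkBudget(limits, { steps: 12, tokens: 300_000, startedAtMs: 0 }, 600_000);

// Same usage, but 31 minutes in: the wall-clock ceiling trips.
const tooLong = checkBudget(limits, { steps: 12, tokens: 300_000, startedAtMs: 0 }, 1_860_000);
```

Passing the current time in as a parameter, rather than reading the clock inside the function, keeps the check easy to test.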

How it's built

A planner ↔ executor ↔ reflector loop over a serializable RunContext:

plan → [ budget? → next step → compact? → execute → reflect → checkpoint → route ] → done

- planner: goal → ordered steps; replan accumulates instead of resetting.
- executor: runs one step, including a provider-agnostic tool mini-loop.
- reflector: heuristics first (cheap, certain), then the model, with JSON self-repair and a conservative fallback (a wrong early exit is worse than one more loop).
- contextManager: folds old turns into digests so long runs stay inside the window; the plan is never compacted.
- store / budget: checkpoint after every step; guard against runaways.

Every piece is an interface you can replace without touching the core. The full architecture writeup is in ARCHITECTURE.md.
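The compaction idea (fold old turns into a digest, leave the plan alone) can be illustrated with a tiny function. The `Turn` and `Context` shapes and the truncation-based digest are assumptions for this sketch; a real context manager would more plausibly summarize with the model, and ARCHITECTURE.md is the place to check what oh-my-fable actually does.

```typescript
// Hypothetical shapes for illustration only.
interface Turn { role: "user" | "assistant" | "digest"; text: string }
interface Context { plan: string[]; turns: Turn[] }

// Fold all but the most recent `keep` turns into a single digest turn.
// The plan is passed through untouched.
function compact(ctx: Context, keep: number): Context {
  if (ctx.turns.length <= keep) return ctx; // nothing old enough to fold
  const cut = ctx.turns.length - keep;
  const old = ctx.turns.slice(0, cut);
  const recent = ctx.turns.slice(cut);
  const digest: Turn = {
    role: "digest",
    // Crude stand-in for a summary: the first 20 characters of each old turn.
    text: `${old.length} earlier turns: ` + old.map((t) => t.text.slice(0, 20)).join(" | "),
  };
  return { plan: ctx.plan, turns: [digest, ...recent] };
}

const longRun: Context = {
  plan: ["outline", "draft", "edit"],
  turns: [
    { role: "user", text: "outline the talk" },
    { role: "assistant", text: "here is the outline" },
    { role: "user", text: "draft section one" },
    { role: "assistant", text: "section one draft" },
    { role: "user", text: "now edit it" },
  ],
};
const compacted = compact(longRun, 2); // one digest plus the two newest turns
```

Keeping the plan outside the turn list is what lets the history shrink without the run losing track of where it is.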

Roadmap

- A web dashboard that tails a run's events and lets you resume from any checkpoint (show <runId> is the CLI version of this today).
- More providers in-repo (OpenAI-compatible, local), though it's a 3-method interface.
- Parallel step execution for independent branches of the plan DAG.
- Human-in-the-loop: pause for approval as a first-class step status.

💖 Sponsor

Free, MIT, zero-dependency, built in spare time. If it saved your agent from starting over:

⭐ Star the repo — it's how the next person building an agent finds it.
🍋 Sponsor via Lemon Squeezy — one-time or recurring.
