hyperion-delta

v0.1.7

Published

7 days ago

Zero-config local agent state management for dirty-set-scale rollback.

ayushman2010

agent rollback checkpoint filesystem vfs prettiflow hyperion

Hyperion Delta-Bench

Hyperion Delta-Bench proves a simple systems result for local AI agents: rollback should scale with the files the agent changed, not with the size of the whole repository.

In the final audit run, Git reset took 3,478.407 ms per rollback. Hyperion's targeted manifest restore took 0.971 ms. The tmpfs dirty-set path took 0.063 ms, a 54,851.92x speedup over Git.

[Image: Hyperion Delta-Bench benchmark dashboard]

Benchmark Result

The benchmark synthesizes a 50,000-file TypeScript workspace nested 10 directories deep, then measures rollback cycles with process.hrtime.bigint().

| Runner | Total I/O Block Time | Average Rollback Latency | Samples | Speedup vs Git | Reduction vs Git |
| --- | ---: | ---: | ---: | ---: | ---: |
| Legacy Runner (git reset --hard + git clean -fd) | 34,784.070 ms | 3,478.407 ms | 10 | 1.00x | 0.00% |
| Targeted Reversion (manifest file restore) | 9.715 ms | 0.971 ms | 10 | 3,580.50x | 99.97% |
| Targeted Reversion (rsync file-list/link-dest) | 504.942 ms | 50.494 ms | 10 | 68.89x | 98.55% |
| Targeted Reversion (tmpfs dirty-set restore, WSL2) | 0.634 ms | 0.063 ms | 10 | 54,851.92x | 100.00% |
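The two derived columns follow from the total times. The sketch below recomputes them for the rsync row; `deriveVsGit` is a hypothetical helper written for this illustration, not part of the benchmark script. Recomputing the manifest and tmpfs rows from the rounded totals gives values that differ slightly in the last digits from the table, presumably because the table was derived from unrounded timings.

```typescript
// Hypothetical helper (not from benchmark.ts): derives the "Speedup vs Git"
// and "Reduction vs Git" columns from two total times in milliseconds.
interface DerivedColumns {
  speedup: string;
  reductionPct: string;
}

function deriveVsGit(gitTotalMs: number, strategyTotalMs: number): DerivedColumns {
  const speedup = gitTotalMs / strategyTotalMs;
  const reduction = (1 - strategyTotalMs / gitTotalMs) * 100;
  return {
    speedup: `${speedup.toFixed(2)}x`,
    reductionPct: `${reduction.toFixed(2)}%`,
  };
}

// Totals taken from the Git and rsync rows of the table above.
const rsyncRow = deriveVsGit(34784.07, 504.942);
console.log(rsyncRow.speedup, rsyncRow.reductionPct); // speedup is "68.89x", reduction is "98.55%"
```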

Why This Matters For Agents

Local coding agents do not just edit once. They mutate files, run tests, fail, backtrack, and try another branch. If every failed attempt pays a multi-second Git reset or full-tree clone/delete penalty, search quality gets capped by filesystem latency instead of model reasoning.

Hyperion's result is not "copy-on-write always wins." The first full-tree CoW clone/delete design was slower than Git because it still churned through tens of thousands of directory entries and inodes. The winning strategy is targeted state reversion:

Git reset scales with repository-wide filesystem inspection.
Full tree clone/delete scales with repository-wide metadata churn.
Hyperion manifest rollback scales with the dirty set.
tmpfs dirty-set rollback shows the upper bound when rollback metadata and content stay in RAM.
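The scaling argument in the list above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not Hyperion's implementation: `DirtySetStore` and its methods are hypothetical names, and a `Map` stands in for the filesystem. The point it shows is that rollback iterates only over recorded pre-images, so its cost follows the dirty set rather than the workspace size.

```typescript
// Hypothetical in-memory model of dirty-set rollback (not the SDK's code).
class DirtySetStore {
  private files = new Map<string, string>();
  // Pre-images of paths touched since the checkpoint; null means "did not exist".
  private preImages: Map<string, string | null> | null = null;

  constructor(initial: Array<[string, string]>) {
    for (const [path, content] of initial) this.files.set(path, content);
  }

  checkpoint(): void {
    this.preImages = new Map<string, string | null>();
  }

  write(path: string, content: string): void {
    this.remember(path);
    this.files.set(path, content);
  }

  read(path: string): string | undefined {
    return this.files.get(path);
  }

  // Restores only the recorded paths and returns how many were touched.
  rollback(): number {
    if (this.preImages === null) throw new Error("no active checkpoint");
    let touched = 0;
    this.preImages.forEach((before, path) => {
      if (before === null) this.files.delete(path); // created during the attempt
      else this.files.set(path, before); // modified during the attempt
      touched++;
    });
    this.preImages = new Map<string, string | null>();
    return touched;
  }

  // Back up a path once, before its first mutation after the checkpoint.
  private remember(path: string): void {
    if (this.preImages === null || this.preImages.has(path)) return;
    const existing = this.files.get(path);
    this.preImages.set(path, existing === undefined ? null : existing);
  }
}

const seed: Array<[string, string]> = [];
for (let i = 0; i < 50000; i++) seed.push([`src/file${i}.ts`, `export const v${i} = ${i};`]);
const store = new DirtySetStore(seed);

store.checkpoint();
store.write("src/file7.ts", "broken edit");
store.write("scratch/tmp.log", "untracked scratch file");
console.log(store.rollback()); // 2 paths touched, independent of the 50,000 files stored
```

A real implementation also has to handle writes that bypass the tracked `write()` path, which is the role the README assigns to `reconcile()`.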

For Prettiflow-style local MCTS or repair loops, that means an agent can test far more branches without leaving the developer's workspace dirty.

SDK Quickstart

The production SDK surface is exposed as hyperion-delta. Prettiflow-style agent loops can use the adapter wrapper with only the checkpoint lifecycle in their execution path:

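import { HyperionAgentSession } from "hyperion-delta";

// Placeholder for the agent's own edit step; replace with real logic.
async function runAgentAttempt(): Promise<void> {
  // Apply the agent's file edits here.
}

const session = new HyperionAgentSession(process.cwd());

try {
  // runAttempt() creates a checkpoint and rolls back if the callback throws.
  const attempt = await session.runAttempt(async ({ exec }) => {
    await runAgentAttempt();
    // The context exec() reconciles the checkpoint after the process exits.
    await exec("npm", ["test"]);
  });
  // Accept the successful attempt in place; Git staging and commits stay with the caller.
  await session.promote(attempt.checkpointId);
} finally {
  // Unregister interception and clean Hyperion-owned session state.
  await session.dispose();
}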

HyperionAgentSession is a thin wrapper over HyperionWorkspace. It installs Node fs interception by default, exposes the selected strategy, stores the last reconcile result, and records rollback timing in milliseconds. runAttempt() creates a checkpoint, reconciles after explicit child-process execution, and rolls back automatically when the attempt throws. Child-process and native-tool writes are still detected by the mandatory reconcile call inside rollback(); restoring files they modified or deleted additionally requires a pre-mutation backup, as described under Troubleshooting.

Successful attempts are finalized with promote(checkpointId). Promotion accepts the current worktree state in place, marks the checkpoint as promoted, frees Hyperion-owned rollback storage, and leaves git add, git commit, merge, and push to the developer or surrounding agent workflow.

API Reference

The package exports two runtime entry points:

HyperionWorkspace: the core checkpoint, reconcile, rollback, VFS interception, and cleanup API.
HyperionAgentSession: a Prettiflow-oriented wrapper that installs interception by default and records diagnostics.

Core methods:

track(path | paths): manually register paths for future integrations that cannot use interception.
declareToolOutputs(contract): declare exact generated or ignored tool outputs so they can be tracked without broad ignored-root scans.
getDiagnostics(): return a read-only snapshot of strategy, storage, hot-buffer, Windows volume, checkpoint, and ignored-write diagnostics.
snapshot(options?): capture a checkpoint and return a CheckpointId, with optional parentId, branchId, and subagentId lineage tags.
fork(parentCheckpointId, options?): create a child checkpoint from an active parent and inherit lineage tags unless overridden.
runInBranch(branchCheckpointId, callback): execute branch-scoped work and reconcile that branch before returning.
promoteBranch(branchCheckpointId, options?): promote a branch checkpoint with deterministic merge planning; current conflict mode is reject-only.
dropBranch(branchCheckpointId): drop a branch with rollback semantics guarded by overlap conflict checks.
getCheckpointLineage(checkpointId): return oldest-to-newest checkpoint ancestry.
listCheckpointChildren(parentId, options?): list direct children of a checkpoint.
listBranchHeads(filter?): list latest checkpoint heads grouped by branchId.
listSubagentHeads(filter?): list latest checkpoint heads grouped by subagentId.
reconcile(checkpointId?): refresh dirty-set state after child-process or native-tool writes.
rollback(checkpointId): reconcile, restore dirty paths, delete created paths, and clean ghost directories.
recoverAttempts(): inspect durable checkpoint journals and whether they can be rehydrated.
rehydrateAttempt(checkpointId): recreate safe in-memory checkpoint state from durable recovery metadata.
exportPatch(checkpointId): emit a Git-compatible unified diff for an active checkpoint.
promote(checkpointId, options?): finalize a successful attempt in place, optionally returning a patch, without running Git.
dispose(): unregister hooks/interceptors and clean Hyperion-owned session state.
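The lineage helpers in the list above can be pictured as walks over parent links. The following is a minimal sketch assuming each checkpoint records at most one parentId; `CheckpointNode`, `lineageOf`, and `childrenOf` are hypothetical names for illustration and do not reflect the SDK's internal data model.

```typescript
// Hypothetical checkpoint record; the SDK's real metadata may differ.
interface CheckpointNode {
  id: string;
  parentId?: string;
}

// Walks parent links and returns ancestry oldest-to-newest, ending at `id`.
function lineageOf(nodes: Map<string, CheckpointNode>, id: string): string[] {
  const chain: string[] = [];
  const seen = new Set<string>();
  let current: string | undefined = id;
  while (current !== undefined) {
    if (seen.has(current)) throw new Error(`cycle detected at ${current}`);
    const node: CheckpointNode | undefined = nodes.get(current);
    if (node === undefined) throw new Error(`unknown checkpoint ${current}`);
    seen.add(current);
    chain.push(node.id);
    current = node.parentId;
  }
  return chain.reverse();
}

// Lists direct children only, in insertion order.
function childrenOf(nodes: Map<string, CheckpointNode>, parentId: string): string[] {
  return Array.from(nodes.values())
    .filter((n) => n.parentId === parentId)
    .map((n) => n.id);
}

const sample: CheckpointNode[] = [
  { id: "root" },
  { id: "a", parentId: "root" },
  { id: "b", parentId: "a" },
  { id: "c", parentId: "root" },
];
const nodes = new Map<string, CheckpointNode>();
for (const n of sample) nodes.set(n.id, n);

console.log(lineageOf(nodes, "b")); // root, a, b
console.log(childrenOf(nodes, "root")); // a, c
```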

Agent-session helpers:

runAttempt(callback, options?): wrap one agent attempt with automatic snapshot/fork, reconciliation, rollback-on-throw, diagnostics, and fail-fast same-session reentrancy protection.
exec(command, args, options?): run an explicit executable plus argument array without shell-string execution. Inside runAttempt(), the context exec() reconciles the active checkpoint after the process exits.
runInBranch(branchCheckpointId, callback), promoteBranch(...), and dropBranch(...): convenience wrappers over workspace branch lifecycle APIs.

Public types and errors are exported from the package root, including HyperionConfig, ReconcileResult, StorageStrategyKind, HyperionError, HyperionCapacityError, HyperionIntegrityError, HyperionPathError, HyperionRollbackError, and HyperionBranchConflictError.

Small regular-file backups use a bounded in-memory Hot Dirty Buffer by default before spilling to the selected physical strategy. Tune it with useHotBuffer, hotBufferMaxFileBytes, hotBufferMaxTotalBytes, and hotBufferMaxFiles; the exported defaults are 256 KiB per file, 8 MiB total, and 1024 files.
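To make the three bounds concrete, here is a rough admission model. It is a sketch, not the SDK's code: `HotBufferModel` and `admit` are hypothetical, the option names and default values are the ones quoted above, and treating each limit as inclusive is an assumption.

```typescript
const KIB = 1024;
const MIB = 1024 * KIB;

interface HotBufferLimits {
  hotBufferMaxFileBytes: number;
  hotBufferMaxTotalBytes: number;
  hotBufferMaxFiles: number;
}

// Default values as stated in the README paragraph above.
const DEFAULT_LIMITS: HotBufferLimits = {
  hotBufferMaxFileBytes: 256 * KIB,
  hotBufferMaxTotalBytes: 8 * MIB,
  hotBufferMaxFiles: 1024,
};

// Hypothetical model: decides whether a backup stays in memory or spills.
class HotBufferModel {
  private usedBytes = 0;
  private fileCount = 0;
  private limits: HotBufferLimits;

  constructor(limits: HotBufferLimits = DEFAULT_LIMITS) {
    this.limits = limits;
  }

  // Assumption: a backup is admitted only if it fits all three bounds (inclusive).
  admit(sizeBytes: number): "memory" | "spill" {
    const fits =
      sizeBytes <= this.limits.hotBufferMaxFileBytes &&
      this.usedBytes + sizeBytes <= this.limits.hotBufferMaxTotalBytes &&
      this.fileCount + 1 <= this.limits.hotBufferMaxFiles;
    if (!fits) return "spill";
    this.usedBytes += sizeBytes;
    this.fileCount += 1;
    return "memory";
  }
}

const buffer = new HotBufferModel();
console.log(buffer.admit(4 * KIB)); // "memory": small file within every bound
console.log(buffer.admit(300 * KIB)); // "spill": exceeds the per-file bound
```

With the defaults, 32 files of exactly 256 KiB fill the 8 MiB total, so the total-bytes bound is reached long before the 1024-file bound for files near the per-file limit.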

Ignored dependency and generated-output roots are still excluded from broad scans, but VFS-captured writes into ignored paths can be made fail-fast with strictIgnoredWrites: true. Explicit track() calls may name exact ignored paths for future tool-adapter integrations without expanding broad reconciliation walks.

Tool integrations can declare exact generated outputs with declareToolOutputs(). Declared paths under ignored roots such as node_modules/**, .git/**, .hyperion/**, dist/**, or .next/** are allowed under strictIgnoredWrites, backed up by VFS interception, and explicitly statted during reconcile():

const checkpointId = await workspace.snapshot();

workspace.declareToolOutputs({
  toolName: "vite",
  checkpointId,
  outputs: [
    "node_modules/.cache/vite/deps_metadata.json",
    { path: "dist/manifest.json", optional: true },
  ],
});

Contracts are exact-path only. They do not enable recursive scans of dependency or build-output folders.
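The exact-path rule can be sketched as a three-way classification of a write under strictIgnoredWrites. Everything named here (`classifyWrite`, `IGNORED_ROOTS`, the return labels) is hypothetical, and the sketch assumes ignored patterns such as node_modules/** are anchored at the workspace root.

```typescript
// Ignored roots named in this README; assumed to be anchored at the workspace root.
const IGNORED_ROOTS = ["node_modules", ".git", ".hyperion", "dist", ".next"];

type WriteDecision = "tracked" | "declared-output" | "rejected";

// Normalizes separators and strips a leading "./" so comparisons are exact.
function normalizePath(p: string): string {
  return p.replace(/\\/g, "/").replace(/^\.\//, "");
}

function isUnderIgnoredRoot(p: string): boolean {
  return IGNORED_ROOTS.some((root) => p === root || p.startsWith(root + "/"));
}

// Hypothetical model of strictIgnoredWrites: true combined with declared outputs.
function classifyWrite(path: string, declaredOutputs: Set<string>): WriteDecision {
  const p = normalizePath(path);
  if (!isUnderIgnoredRoot(p)) return "tracked";
  // Exact match only: a declared file does not open up its sibling files or folder.
  return declaredOutputs.has(p) ? "declared-output" : "rejected";
}

const declared = new Set<string>([
  "node_modules/.cache/vite/deps_metadata.json",
  "dist/manifest.json",
]);

console.log(classifyWrite("src/index.ts", declared)); // "tracked"
console.log(classifyWrite("dist/manifest.json", declared)); // "declared-output"
console.log(classifyWrite("dist/bundle.js", declared)); // "rejected"
```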

Runtime diagnostics are available with getDiagnostics():

const diagnostics = session.getDiagnostics();

console.log(diagnostics.strategy);
console.log(diagnostics.checkpoints[0]?.storage?.hotBuffer);
console.log(diagnostics.windowsVolume?.fileSystemName);
console.log(diagnostics.ignoredWrites);

Diagnostics are snapshots. Mutating the returned object does not mutate SDK state, and calling diagnostics does not run Git, shell commands, or filesystem scans.

On Windows, Hyperion detects NTFS, Dev Drive, and ReFS signals with fixed fsutil probes. Verified NTFS workspaces can use the ntfs-link tier: Hyperion creates a hard-link backup, then immediately materializes the workspace file so later writes cannot mutate the backup inode. Dev Drive is reported as an environment optimization, not a rollback strategy. ReFS block clone is reported as a future native-helper candidate and is not invoked by the zero-dependency SDK.

Durable attempt journals are enabled by default with durableAttemptJournals: true. Each checkpoint writes metadata to .hyperion/checkpoints/<checkpointId>/journal.json before the ID is returned. The journal records checkpoint metadata, strategy, Git HEAD, ignored patterns, baseline metadata, and dirty-entry summaries, but never file contents. Git still owns permanent history, merging, commits, and pushes.

Recovery rehydration is available with rehydrateAttempt(checkpointId) when Hyperion can prove the checkpoint is still restorable. Created-file-only attempts can rehydrate from journal metadata. Modified or deleted files require durable backup records in .hyperion/checkpoints/<checkpointId>/backups.json; volatile Hot Dirty Buffer memory-only backups intentionally block rehydration after restart.
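The rehydration rule above reduces to a per-entry check. The sketch below models only the backup-related conditions from this paragraph; the other rejection causes listed under Troubleshooting (disposed attempts, corrupt journals, cross-workspace journals) are out of scope. The types and the `canRehydrate` function are hypothetical and do not mirror the journal's real schema.

```typescript
type DirtyKind = "created" | "modified" | "deleted";
type BackupKind = "durable" | "memory-only" | "missing";

// Hypothetical summary of one dirty entry from a checkpoint journal.
interface DirtyEntry {
  path: string;
  kind: DirtyKind;
  backup: BackupKind;
}

interface RehydrateVerdict {
  canRehydrate: boolean;
  reason?: string;
}

function canRehydrate(entries: DirtyEntry[]): RehydrateVerdict {
  for (const entry of entries) {
    // Created files need no backup: rolling back only has to delete them.
    if (entry.kind === "created") continue;
    // Modified or deleted files need a backup that survived the restart.
    if (entry.backup === "durable") continue;
    const why =
      entry.backup === "memory-only" ? "volatile memory-only backup" : "missing backup record";
    return { canRehydrate: false, reason: `${entry.path}: ${why}` };
  }
  return { canRehydrate: true };
}

console.log(canRehydrate([{ path: "src/new.ts", kind: "created", backup: "missing" }]).canRehydrate); // true
console.log(canRehydrate([{ path: "src/a.ts", kind: "modified", backup: "memory-only" }]).canRehydrate); // false
```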

Patch export is available with exportPatch(checkpointId). It reconciles first, then emits a text-only unified diff for created, modified, and deleted regular files. It does not run Git, commit, merge, push, dispose the checkpoint, or mutate the workspace.

Promotion is available with promote(checkpointId). It reconciles first, optionally returns the same text patch with { exportPatch: true }, marks the checkpoint promoted, and cleans Hyperion-owned rollback storage. It does not run Git. Promoted checkpoints are audit records only: they cannot be rolled back, exported again, or rehydrated. Git remains the authority for staging, commits, merges, remotes, signatures, and pushes.

See ARCHITECTURE.md for the full system design, failure model, and strategy router details. The limitations and mitigation roadmap live in LIMITATIONS.md. Release notes are in CHANGELOG.md, with release and security posture notes in RELEASE.md and SECURITY.md.

Release Checks

For local package readiness:

npm run release:check

This runs typecheck, tests, build, npm pack --dry-run, and a temp-project install smoke. The install smoke packs the SDK into an OS temp directory, installs it into a temporary sample project, and imports both HyperionWorkspace and HyperionAgentSession from the installed package.

For final pre-publish confidence:

npm run release:final

This runs the full release check, verifies the zero-runtime-dependency audit path with npm audit --omit=dev, and prints the final dry-run package contents.

For a copy-ready release runbook, use RELEASE_NEXT.md.

For reliability gates (failure injection, fuzz smoke, and stress smoke):

npm run test:reliability:ci

For targeted local reliability runs:

npm run test:reliability:fuzz
npm run test:reliability:stress

For a focused install smoke after an existing build:

npm run package:smoke

The published package is intentionally limited to dist, the README/architecture docs, the benchmark hero image used by the README, and required npm metadata. Benchmark commands are repository-checkout utilities and are not part of the SDK runtime surface.

Publishing uses GitHub Actions trusted publishing with npm provenance (OIDC). Before the first public publish, a maintainer must configure npm trusted publishing for hyperion-delta with repository ayush585/Hyperion-Delta, workflow .github/workflows/publish.yml, and environment npm-publish.

Manual dispatch is tag-only. Trigger Publish Package from main and provide tag as refs/tags/vX.Y.Z.

Troubleshooting

Git unavailable: Hyperion falls back to stat-only manifests. Correctness remains, but large non-Git workspaces may start slower.
tmpfs unavailable: Linux /dev/shm acceleration is skipped and the SDK degrades to POSIX links or pure manifest restore.
rsync unavailable: POSIX-link-style benchmark rows may be skipped, and SDK behavior remains on the safest available strategy.
Windows or NTFS: verified NTFS volumes can use ntfs-link dirty-set backup acceleration. Dev Drive and ReFS signals appear in diagnostics; ReFS block clone is intentionally deferred until a native Windows API helper exists. Small VFS-backed edits are still accelerated by the Hot Dirty Buffer before spilling to disk.
Ignored paths: node_modules/**, .git/**, and .hyperion/** are ignored by default so dependency and internal state folders are not tracked.
Strict ignored writes: set strictIgnoredWrites: true to throw HyperionIgnoredPathError before in-process VFS writes mutate ignored roots.
Tool output contracts: call declareToolOutputs() before running package managers, build systems, formatters, or codegen tools that write exact ignored/generated files. Undeclared ignored writes still follow strictIgnoredWrites.
Diagnostics: call getDiagnostics() to inspect selected strategy, actual storage tier, Hot Dirty Buffer hit/spill counters, Windows volume signals, active checkpoint storage, and recent ignored-write events.
Durable journal recovery: call recoverAttempts() from a new workspace/session to inspect abandoned checkpoint metadata and canRehydrate status.
Rehydration failures: rehydrateAttempt() rejects disposed attempts, corrupt journals, missing backup manifests, missing backup files, cross-workspace journals, and volatile memory-only backups.
Patch export: exportPatch() supports text regular files and requires backup records for modified/deleted paths. Binary, symlink, and backup-missing exports fail loudly with integrity errors.
Promotion: promote() finalizes the current worktree state and does not run Git. If { exportPatch: true } fails because a dirty file is binary, a symlink, or missing backup content, the checkpoint remains active and rollback-capable.
Child-process modified/deleted files: reconcile() detects them, and rollback() always reconciles first. Restoring modified or deleted files still requires a pre-mutation backup from VFS interception or a future explicit tracking integration.
Missing backup record: rollback fails loudly with an integrity error instead of silently corrupting or partially restoring the workspace.

What It Measures

The current benchmark compares:

Legacy Runner: mutates a tracked file, creates an untracked scratch file, then runs git reset --hard HEAD and git clean -fd.
Targeted Reversion: tracks the modified files in a manifest, restores only those files from a read-only base, and deletes only manifest-listed scratch files.
rsync Targeted Reversion: creates a linked working tree with rsync --link-dest, then restores only changed files with an rsync file list.
tmpfs Targeted Reversion: keeps the dirty-set rollback cache in /dev/shm on Linux/WSL2 so the files the agent actually touched restore from RAM.

Lessons from the Metadata Bottleneck

Initial testing revealed that standard directory-cloning strategies trigger inode metadata thrashing on workspaces with 50,000 or more files. They beat Git only on block-level I/O and lose on metadata throughput.

The first implementation used Linux reflinks with cp -a --reflink=always, then deleted and recloned the whole 50,000-file sandbox every turn. On the WSL2 XFS loopback test drive, it produced this result:

Legacy Runner total:   190,694.525 ms
Legacy average:         3,813.890 ms

Hyperion full clone total: 816,614.450 ms
Hyperion full clone avg:    16,332.289 ms

That failure is useful. Reflinks avoid copying file blocks, but they do not eliminate directory traversal, inode allocation, unlink work, or metadata updates. A real local agent should not throw away an entire tree when it knows which files it touched.

Hyperion's practical optimization is therefore targeted state reversion: track the agent's dirty set and revert only those paths. The tmpfs mode demonstrates the upper bound for Prettiflow-style local search when dirty-set content and metadata operations live in RAM.

Running The Benchmark

For a fast local regression check:

npm run benchmark:smoke

Smoke mode uses a small fixture and temporary work root. It validates the benchmark shape and strategy routing, not final performance evidence.

For the full benchmark defaults:

npm run benchmark

The full run preserves the audit-scale defaults in benchmark.ts. For the cleanest filesystem signal, run inside a native Linux filesystem or the XFS loopback mount used during audit testing. The tmpfs row appears automatically when /dev/shm is available.

When launched from WSL under /mnt/c, the script automatically stages generated benchmark workspaces in native Linux /tmp and prints the selected work root. This keeps the requested Windows project path usable while avoiding DrvFS metadata emulation from dominating the benchmark.

The benchmark prints the selected work root, fixture size, iteration count, and runner strategy rows. If optional capabilities are unavailable, such as rsync or Linux /dev/shm, those rows are reported as skipped instead of failing the run.

The script also accepts environment overrides while preserving the audit defaults:

HYPERION_FILE_COUNT=1000 HYPERION_ITERATIONS=3 npm run benchmark
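One way such overrides could be resolved against defaults is sketched below. This is not the logic in benchmark.ts: `resolveConfig` and `readPositiveInt` are hypothetical, the default values are assumed from the 50,000-file, 10-sample final run described in this README, and rejecting non-integer input is a design choice made for the example.

```typescript
interface BenchmarkConfig {
  fileCount: number;
  iterations: number;
}

// Assumed defaults, matching the final audit run described in this README.
const AUDIT_DEFAULTS: BenchmarkConfig = { fileCount: 50000, iterations: 10 };

// Returns the fallback when the variable is unset or blank; rejects bad values loudly.
function readPositiveInt(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  if (!Number.isInteger(parsed) || parsed <= 0) {
    throw new Error(`expected a positive integer, got "${raw}"`);
  }
  return parsed;
}

// Takes the environment as a plain record so it can be exercised without a real process.
function resolveConfig(env: Record<string, string | undefined>): BenchmarkConfig {
  return {
    fileCount: readPositiveInt(env["HYPERION_FILE_COUNT"], AUDIT_DEFAULTS.fileCount),
    iterations: readPositiveInt(env["HYPERION_ITERATIONS"], AUDIT_DEFAULTS.iterations),
  };
}

const overridden = resolveConfig({ HYPERION_FILE_COUNT: "1000", HYPERION_ITERATIONS: "3" });
console.log(overridden.fileCount, overridden.iterations); // 1000 3
```

In a Node script the same function could be called with `process.env`.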

Interpreting Results

The target outcome is not "copy-on-write always wins." The meaningful result is:

Git reset scales with repository-wide filesystem inspection.
Full tree clone/delete scales with repository-wide metadata churn.
Targeted rollback scales with the number of files the agent actually changed.
tmpfs dirty-set rollback shows the best-case latency when the rollback cache avoids disk hardware entirely.

Benchmark Ideas To Run Next

The current final run is intentionally narrow: a 50,000-file fixture, one simulated agent edit cycle, and 10 measured rollback samples. The next useful benchmark work is to map the performance envelope:

Dirty-set size sweep: 1, 10, 100, and 1,000 changed files.
Repository size sweep: 10k, 50k, 100k, and 250k files.
Platform matrix: WSL2, native Linux, macOS APFS, Windows NTFS, Windows Dev Drive, and ReFS.
Tooling matrix: tsc, formatters, generated snapshots, package-manager outputs, esbuild, oxc, and SWC.
Strategy matrix: Git reset, manifest restore, POSIX link storage, and tmpfs dirty-set storage.
Cache matrix: cold-cache and warm-cache runs.
Agent-search stress test: concurrent checkpoints and MCTS-style branch rollback.

Those runs should keep the same rule as this benchmark: measure rollback latency with process.hrtime.bigint(), print the work root, report skipped platform-specific strategies explicitly, and never hide metadata-heavy failures. The full-tree clone/delete miss is part of the engineering evidence.