hyperion-delta
v0.1.7
Zero-config local agent state management for dirty-set-scale rollback.
Hyperion Delta-Bench
Hyperion Delta-Bench proves a simple systems result for local AI agents: rollback should scale with the files the agent changed, not with the size of the whole repository.
In the final audit run, Git reset took 3,478.407 ms per rollback. Hyperion's targeted manifest restore took 0.971 ms. The tmpfs dirty-set path took 0.063 ms, a 54,851.92x speedup over Git.

Benchmark Result
The benchmark synthesizes a 50,000-file TypeScript workspace nested 10 directories deep, then measures rollback cycles with process.hrtime.bigint().
| Runner | Total I/O Block Time | Average Rollback Latency | Samples | Speedup vs Git | Reduction vs Git |
| --- | ---: | ---: | ---: | ---: | ---: |
| Legacy Runner (git reset --hard + git clean -fd) | 34,784.070 ms | 3,478.407 ms | 10 | 1.00x | 0.00% |
| Targeted Reversion (manifest file restore) | 9.715 ms | 0.971 ms | 10 | 3,580.50x | 99.97% |
| Targeted Reversion (rsync file-list/link-dest) | 504.942 ms | 50.494 ms | 10 | 68.89x | 98.55% |
| Targeted Reversion (tmpfs dirty-set restore, WSL2) | 0.634 ms | 0.063 ms | 10 | 54,851.92x | 100.00% |
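
The derived columns follow from the totals: average latency is total time divided by samples, speedup is the Git average divided by the strategy average, and reduction is the share of Git latency removed. The sketch below recomputes them from the rounded figures printed in the table; the type and function names are illustrative, not part of the benchmark script. Results land close to the published ratios but not exactly on them, presumably because the published ratios were computed from unrounded timings.

```typescript
// Recompute the table's derived columns from its published (rounded) totals.
interface RunnerRow {
  name: string;
  totalMs: number; // "Total I/O Block Time" across all samples
  samples: number;
}

function averageMs(row: RunnerRow): number {
  return row.totalMs / row.samples;
}

// Speedup: baseline average divided by candidate average.
function speedupVs(baseline: RunnerRow, candidate: RunnerRow): number {
  return averageMs(baseline) / averageMs(candidate);
}

// Reduction: percentage of the baseline latency that the candidate removes.
function reductionPercent(baseline: RunnerRow, candidate: RunnerRow): number {
  return (1 - averageMs(candidate) / averageMs(baseline)) * 100;
}

const git: RunnerRow = { name: "git reset + clean", totalMs: 34784.07, samples: 10 };
const rows: RunnerRow[] = [
  { name: "manifest restore", totalMs: 9.715, samples: 10 },
  { name: "rsync file-list", totalMs: 504.942, samples: 10 },
  { name: "tmpfs dirty-set", totalMs: 0.634, samples: 10 },
];

rows.forEach((row) => {
  console.log(
    row.name,
    `${speedupVs(git, row).toFixed(2)}x`,
    `${reductionPercent(git, row).toFixed(2)}%`,
  );
});
```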
Why This Matters For Agents
Local coding agents do not just edit once. They mutate files, run tests, fail, backtrack, and try another branch. If every failed attempt pays a multi-second Git reset or full-tree clone/delete penalty, search quality gets capped by filesystem latency instead of model reasoning.
Hyperion's result is not "copy-on-write always wins." The first full-tree CoW clone/delete design was slower than Git because it still churned through tens of thousands of directory entries and inodes. The winning strategy is targeted state reversion:
- Git reset scales with repository-wide filesystem inspection.
- Full tree clone/delete scales with repository-wide metadata churn.
- Hyperion manifest rollback scales with the dirty set.
- tmpfs dirty-set rollback shows the upper bound when rollback metadata and content stay in RAM.
For Prettiflow-style local MCTS or repair loops, that means an agent can test far more branches without leaving the developer's workspace dirty.
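The scaling argument above can be made concrete with a small model of targeted state reversion. This is a minimal sketch under stated assumptions, not the hyperion-delta implementation: `DirtySetManifest` is a hypothetical class that records a backup before the first write to each path and, on rollback, touches only the recorded paths.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical dirty-set manifest (illustrative only, not the SDK's code).
class DirtySetManifest {
  // Original bytes of files that existed before their first tracked write.
  private readonly backups = new Map<string, Buffer>();
  // Files that did not exist at checkpoint time; rollback deletes them.
  private readonly created = new Set<string>();

  // Record the pre-mutation state once per path, then perform the write.
  write(file: string, data: string): void {
    if (!this.backups.has(file) && !this.created.has(file)) {
      if (fs.existsSync(file)) this.backups.set(file, fs.readFileSync(file));
      else this.created.add(file);
    }
    fs.writeFileSync(file, data);
  }

  // Rollback visits manifest entries only, never the rest of the tree.
  rollback(): number {
    this.backups.forEach((original, file) => fs.writeFileSync(file, original));
    this.created.forEach((file) => fs.rmSync(file, { force: true }));
    const touched = this.backups.size + this.created.size;
    this.backups.clear();
    this.created.clear();
    return touched;
  }
}

// Demo: a 200-file workspace where the "agent" edits one file and creates one.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "dirty-set-"));
for (let i = 0; i < 200; i++) {
  fs.writeFileSync(path.join(root, `f${i}.ts`), `export const v = ${i};\n`);
}

const manifest = new DirtySetManifest();
manifest.write(path.join(root, "f7.ts"), "broken edit\n");
manifest.write(path.join(root, "scratch.tmp"), "temporary\n");

const restoredPaths = manifest.rollback();
console.log(`rolled back ${restoredPaths} paths in a 200-file workspace`);
```

The rollback cost here depends on the two dirty paths, not on the 198 files the agent never touched, which is the property the benchmark measures at 50,000 files.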
SDK Quickstart
The production SDK surface is exposed as hyperion-delta. Prettiflow-style agent loops can use the adapter wrapper with only the checkpoint lifecycle in their execution path:

    import { HyperionAgentSession } from "hyperion-delta";

    const session = new HyperionAgentSession(process.cwd());
    try {
      // runAttempt() checkpoints first and rolls back automatically if the callback throws.
      const attempt = await session.runAttempt(async ({ exec }) => {
        await runAgentAttempt(); // placeholder for the caller's own agent step
        await exec("npm", ["test"]);
      });
      // Accept the successful attempt in place.
      await session.promote(attempt.checkpointId);
    } finally {
      await session.dispose();
    }

HyperionAgentSession is a thin wrapper over HyperionWorkspace. It installs Node fs interception by default, exposes the selected strategy, stores the last reconcile result, and records rollback timing in milliseconds. runAttempt() creates a checkpoint, reconciles after explicit child-process execution, and rolls back automatically when the attempt throws. Child-process and native-tool writes are still protected by the mandatory reconcile call inside rollback().
Successful attempts are finalized with promote(checkpointId). Promotion accepts the current worktree state in place, marks the checkpoint as promoted, frees Hyperion-owned rollback storage, and leaves git add, git commit, merge, and push to the developer or surrounding agent workflow.
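The checkpoint lifecycle described above (snapshot, run, roll back on throw, leave promotion to the caller) can be sketched without the SDK. Everything below is an assumption made for illustration: `CheckpointLike`, `runAttemptSketch`, and `FakeWorkspace` are hypothetical names, and the sketch omits the reconcile step and diagnostics that the real runAttempt() performs.

```typescript
// Hypothetical minimal interface; method names mirror the README only.
interface CheckpointLike {
  snapshot(): Promise<string>;
  rollback(checkpointId: string): Promise<void>;
}

// Wrap one attempt: checkpoint first, roll back if the attempt throws.
async function runAttemptSketch<T>(
  workspace: CheckpointLike,
  attempt: () => Promise<T>,
): Promise<{ checkpointId: string; value: T }> {
  const checkpointId = await workspace.snapshot();
  try {
    const value = await attempt();
    return { checkpointId, value }; // caller decides whether to promote
  } catch (error) {
    // Restore the checkpoint before the error propagates to the caller.
    await workspace.rollback(checkpointId);
    throw error;
  }
}

// In-memory stand-in so the control flow is observable without touching disk.
class FakeWorkspace implements CheckpointLike {
  readonly events: string[] = [];
  private next = 0;

  async snapshot(): Promise<string> {
    const id = `cp-${this.next++}`;
    this.events.push(`snapshot:${id}`);
    return id;
  }

  async rollback(checkpointId: string): Promise<void> {
    this.events.push(`rollback:${checkpointId}`);
  }
}

async function demo(): Promise<string[]> {
  const workspace = new FakeWorkspace();
  await runAttemptSketch(workspace, async () => "tests passed");
  await runAttemptSketch(workspace, async () => {
    throw new Error("tests failed");
  }).catch(() => undefined); // swallow the rethrown error for the demo
  return workspace.events;
}

demo().then((events) => console.log(events.join(" -> ")));
```

Only the failing attempt triggers a rollback; the successful one leaves its checkpoint for the caller to promote.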
API Reference
The package exports two runtime entry points:
- HyperionWorkspace: the core checkpoint, reconcile, rollback, VFS interception, and cleanup API.
- HyperionAgentSession: a Prettiflow-oriented wrapper that installs interception by default and records diagnostics.
Core methods:
- track(path | paths): manually register paths for future integrations that cannot use interception.
- declareToolOutputs(contract): declare exact generated or ignored tool outputs so they can be tracked without broad ignored-root scans.
- getDiagnostics(): return a read-only snapshot of strategy, storage, hot-buffer, Windows volume, checkpoint, and ignored-write diagnostics.
- snapshot(options?): capture a checkpoint and return a CheckpointId, with optional parentId, branchId, and subagentId lineage tags.
- fork(parentCheckpointId, options?): create a child checkpoint from an active parent and inherit lineage tags unless overridden.
- runInBranch(branchCheckpointId, callback): execute branch-scoped work and reconcile that branch before returning.
- promoteBranch(branchCheckpointId, options?): promote a branch checkpoint with deterministic merge planning; current conflict mode is reject-only.
- dropBranch(branchCheckpointId): drop a branch with rollback semantics guarded by overlap conflict checks.
- getCheckpointLineage(checkpointId): return oldest-to-newest checkpoint ancestry.
- listCheckpointChildren(parentId, options?): list direct children of a checkpoint.
- listBranchHeads(filter?): list latest checkpoint heads grouped by branchId.
- listSubagentHeads(filter?): list latest checkpoint heads grouped by subagentId.
- reconcile(checkpointId?): refresh dirty-set state after child-process or native-tool writes.
- rollback(checkpointId): reconcile, restore dirty paths, delete created paths, and clean ghost directories.
- recoverAttempts(): inspect durable checkpoint journals and whether they can be rehydrated.
- rehydrateAttempt(checkpointId): recreate safe in-memory checkpoint state from durable recovery metadata.
- exportPatch(checkpointId): emit a Git-compatible unified diff for an active checkpoint.
- promote(checkpointId, options?): finalize a successful attempt in place, optionally returning a patch, without running Git.
- dispose(): unregister hooks/interceptors and clean Hyperion-owned session state.
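The lineage methods listed above describe a tree of checkpoints linked by parentId tags. As a rough model of what "oldest-to-newest ancestry" and "direct children" mean, the sketch below walks parent links over an in-memory map. `CheckpointRecord`, `lineageOf`, and `childrenOf` are hypothetical names for illustration and do not reflect the SDK's internal storage.

```typescript
// Hypothetical checkpoint record carrying the lineage tag described above.
interface CheckpointRecord {
  id: string;
  parentId?: string;
}

// Walk parent links up to the root, then reverse to read oldest-to-newest.
function lineageOf(records: Map<string, CheckpointRecord>, checkpointId: string): string[] {
  const chain: string[] = [];
  const seen = new Set<string>();
  let current: string | undefined = checkpointId;
  while (current !== undefined) {
    if (seen.has(current)) throw new Error(`lineage cycle at ${current}`);
    seen.add(current);
    const record: CheckpointRecord | undefined = records.get(current);
    if (record === undefined) throw new Error(`unknown checkpoint ${current}`);
    chain.push(record.id);
    current = record.parentId;
  }
  return chain.reverse();
}

// Direct children are the records whose parentId names the given checkpoint.
function childrenOf(records: Map<string, CheckpointRecord>, parentId: string): string[] {
  const children: string[] = [];
  records.forEach((record) => {
    if (record.parentId === parentId) children.push(record.id);
  });
  return children;
}

// Demo tree: root -> a -> b, and root -> c.
const records = new Map<string, CheckpointRecord>();
const demoRecords: CheckpointRecord[] = [
  { id: "root" },
  { id: "a", parentId: "root" },
  { id: "b", parentId: "a" },
  { id: "c", parentId: "root" },
];
demoRecords.forEach((record) => records.set(record.id, record));

console.log(lineageOf(records, "b"));
console.log(childrenOf(records, "root"));
```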
Agent-session helpers:
- runAttempt(callback, options?): wrap one agent attempt with automatic snapshot/fork, reconciliation, rollback-on-throw, diagnostics, and fail-fast same-session reentrancy protection.
- exec(command, args, options?): run an explicit executable plus argument array without shell-string execution. Inside runAttempt(), the context exec() reconciles the active checkpoint after the process exits.
- runInBranch(branchCheckpointId, callback), promoteBranch(...), and dropBranch(...): convenience wrappers over workspace branch lifecycle APIs.
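The "executable plus argument array" shape of exec() matters because no shell ever parses the arguments. The sketch below shows the same idea with Node's own `child_process.spawnSync`; `execNoShell` is a hypothetical helper, not the SDK's exec(), and it does not reconcile any checkpoint.

```typescript
import { spawnSync } from "node:child_process";

// Run an executable with an explicit argument array. With shell: false,
// metacharacters inside arguments are passed to the program literally.
function execNoShell(
  command: string,
  args: readonly string[],
): { status: number | null; stdout: string } {
  const result = spawnSync(command, [...args], { shell: false, encoding: "utf8" });
  if (result.error) throw result.error;
  return { status: result.status, stdout: result.stdout };
}

// process.execPath is the running Node binary, so the demo needs no other tools.
// The child script echoes its first extra argument back unchanged.
const hostile = "hello; echo injected";
const out = execNoShell(process.execPath, ["-e", "console.log(process.argv[1])", hostile]);
console.log(out.stdout.trim());
```

Had the same string been interpolated into a shell command line, the `;` would have started a second command; here it is just part of one argument.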
Public types and errors are exported from the package root, including HyperionConfig, ReconcileResult, StorageStrategyKind, HyperionError, HyperionCapacityError, HyperionIntegrityError, HyperionPathError, HyperionRollbackError, and HyperionBranchConflictError.
Small regular-file backups use a bounded in-memory Hot Dirty Buffer by default before spilling to the selected physical strategy. Tune it with useHotBuffer, hotBufferMaxFileBytes, hotBufferMaxTotalBytes, and hotBufferMaxFiles; the exported defaults are 256 KiB per file, 8 MiB total, and 1024 files.
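A bounded buffer of this kind needs an admission rule that checks all three limits before keeping a backup in memory. The sketch below is one plausible reading of that rule, using the option names and default values quoted above; `HotBufferSketch` and its `admit()` method are hypothetical, and the real buffer may order or combine its checks differently.

```typescript
interface HotBufferLimits {
  hotBufferMaxFileBytes: number;
  hotBufferMaxTotalBytes: number;
  hotBufferMaxFiles: number;
}

// Defaults quoted in the README: 256 KiB per file, 8 MiB total, 1024 files.
const DEFAULT_LIMITS: HotBufferLimits = {
  hotBufferMaxFileBytes: 256 * 1024,
  hotBufferMaxTotalBytes: 8 * 1024 * 1024,
  hotBufferMaxFiles: 1024,
};

// Hypothetical admission model, not the SDK's Hot Dirty Buffer.
class HotBufferSketch {
  private readonly entries = new Map<string, Uint8Array>();
  private readonly limits: HotBufferLimits;
  private totalBytes = 0;

  constructor(limits: HotBufferLimits = DEFAULT_LIMITS) {
    this.limits = limits;
  }

  // "hot": the backup is held in memory. "spill": the caller should fall
  // through to the selected on-disk strategy instead.
  admit(path: string, original: Uint8Array): "hot" | "spill" {
    if (this.entries.has(path)) return "hot"; // the first backup of a path wins
    const size = original.byteLength;
    const overLimit =
      size > this.limits.hotBufferMaxFileBytes ||
      this.totalBytes + size > this.limits.hotBufferMaxTotalBytes ||
      this.entries.size + 1 > this.limits.hotBufferMaxFiles;
    if (overLimit) return "spill";
    this.entries.set(path, original.slice()); // copy so later edits cannot alias it
    this.totalBytes += size;
    return "hot";
  }
}

// Demo with tiny limits so each rule is exercised.
const tiny = new HotBufferSketch({
  hotBufferMaxFileBytes: 4,
  hotBufferMaxTotalBytes: 6,
  hotBufferMaxFiles: 2,
});
const decisions = [
  tiny.admit("a", new Uint8Array(3)), // fits
  tiny.admit("b", new Uint8Array(5)), // exceeds the per-file limit
  tiny.admit("c", new Uint8Array(4)), // would exceed the total-bytes limit
  tiny.admit("d", new Uint8Array(3)), // fits, buffer is now full
  tiny.admit("e", new Uint8Array(0)), // would exceed the file-count limit
];
console.log(decisions.join(", "));
```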
Ignored dependency and generated-output roots are still excluded from broad scans, but VFS-captured writes into ignored paths can be made fail-fast with strictIgnoredWrites: true. Explicit track() calls may name exact ignored paths for future tool-adapter integrations without expanding broad reconciliation walks.
Tool integrations can declare exact generated outputs with declareToolOutputs(). Declared paths under ignored roots such as node_modules/**, .git/**, .hyperion/**, dist/**, or .next/** are allowed under strictIgnoredWrites, backed up by VFS interception, and explicitly statted during reconcile():

    const checkpointId = await workspace.snapshot();
    workspace.declareToolOutputs({
      toolName: "vite",
      checkpointId,
      outputs: [
        "node_modules/.cache/vite/deps_metadata.json",
        { path: "dist/manifest.json", optional: true },
      ],
    });

Contracts are exact-path only. They do not enable recursive scans of dependency or build-output folders.
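The interaction between ignored roots, exact-path contracts, and strictIgnoredWrites can be summarized as a small decision function. The sketch below is a simplified model under stated assumptions: it uses plain prefix matching instead of glob handling, the root list is the example list from this section, and `classifyWrite` with its return labels is hypothetical rather than SDK API.

```typescript
// Example ignored roots from this section; prefix matching stands in for globs.
const IGNORED_ROOTS = ["node_modules", ".git", ".hyperion", "dist", ".next"];

function isUnderIgnoredRoot(relPath: string): boolean {
  return IGNORED_ROOTS.some((root) => relPath === root || relPath.startsWith(`${root}/`));
}

type WriteDecision = "track" | "track-declared" | "reject" | "skip";

// Decide how an intercepted write to a workspace-relative path is handled.
function classifyWrite(
  relPath: string,
  declaredOutputs: ReadonlySet<string>,
  strictIgnoredWrites: boolean,
): WriteDecision {
  if (!isUnderIgnoredRoot(relPath)) return "track";
  // Contracts are exact-path only: a declared file never covers its siblings.
  if (declaredOutputs.has(relPath)) return "track-declared";
  return strictIgnoredWrites ? "reject" : "skip";
}

const declared = new Set<string>([
  "node_modules/.cache/vite/deps_metadata.json",
  "dist/manifest.json",
]);

console.log(classifyWrite("src/app.ts", declared, true));
console.log(classifyWrite("dist/manifest.json", declared, true));
console.log(classifyWrite("dist/other.js", declared, true));
console.log(classifyWrite("node_modules/.cache/vite/other.json", declared, false));
```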
Runtime diagnostics are available with getDiagnostics():

    const diagnostics = session.getDiagnostics();
    console.log(diagnostics.strategy);
    console.log(diagnostics.checkpoints[0]?.storage?.hotBuffer);
    console.log(diagnostics.windowsVolume?.fileSystemName);
    console.log(diagnostics.ignoredWrites);

Diagnostics are snapshots. Mutating the returned object does not mutate SDK state, and calling diagnostics does not run Git, shell commands, or filesystem scans.
On Windows, Hyperion detects NTFS, Dev Drive, and ReFS signals with fixed fsutil probes. Verified NTFS workspaces can use the ntfs-link tier: Hyperion creates a hard-link backup, then immediately materializes the workspace file so later writes cannot mutate the backup inode. Dev Drive is reported as an environment optimization, not a rollback strategy. ReFS block clone is reported as a future native-helper candidate and is not invoked by the zero-dependency SDK.
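The link-then-materialize step is easier to follow in code. The sketch below is one way to read it, not the SDK's ntfs-link implementation: `linkBackupAndMaterialize` is a hypothetical helper that hard-links the file into a backup location, then gives the workspace path a fresh inode by copying and renaming. It uses only Node's `fs` module and also works on POSIX filesystems that support hard links.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper illustrating "hard-link backup, then materialize".
function linkBackupAndMaterialize(workspaceFile: string, backupFile: string): void {
  // 1. The hard link shares the original inode, so no file data is copied yet.
  fs.linkSync(workspaceFile, backupFile);
  // 2. Copy the content to a sibling temp file, which allocates a new inode.
  const temp = `${workspaceFile}.materialize-${process.pid}`;
  fs.copyFileSync(backupFile, temp);
  // 3. Rename over the workspace path. The backup now holds the old inode alone.
  fs.renameSync(temp, workspaceFile);
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "link-backup-"));
const source = path.join(dir, "index.ts");
const backup = path.join(dir, "index.ts.bak");

fs.writeFileSync(source, "v1\n");
linkBackupAndMaterialize(source, backup);
fs.writeFileSync(source, "v2\n"); // a later in-place write by the agent

// The backup still holds the original bytes because it no longer shares an inode.
console.log(fs.readFileSync(backup, "utf8").trim());
```

Without step 2 and 3, the in-place write would go through the shared inode and silently change the backup as well.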
Durable attempt journals are enabled by default with durableAttemptJournals: true. Each checkpoint writes metadata to .hyperion/checkpoints/<checkpointId>/journal.json before the ID is returned. The journal records checkpoint metadata, strategy, Git HEAD, ignored patterns, baseline metadata, and dirty-entry summaries, but never file contents. Git still owns permanent history, merging, commits, and pushes.
Recovery rehydration is available with rehydrateAttempt(checkpointId) when Hyperion can prove the checkpoint is still restorable. Created-file-only attempts can rehydrate from journal metadata. Modified or deleted files require durable backup records in .hyperion/checkpoints/<checkpointId>/backups.json; volatile Hot Dirty Buffer memory-only backups intentionally block rehydration after restart.
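The rehydration rule above reduces to a per-entry check: created files need no backup, while modified or deleted files need a durable one. The sketch below encodes that rule only. The types and `assessRehydration` are hypothetical, and the real rehydrateAttempt() also validates journals, workspace identity, and backup files, which this model leaves out.

```typescript
type DirtyKind = "created" | "modified" | "deleted";
type BackupLocation = "durable" | "memory-only" | "none";

// Hypothetical summary of one dirty entry as a journal might describe it.
interface DirtyEntrySummary {
  path: string;
  kind: DirtyKind;
  backup: BackupLocation;
}

interface RehydrateVerdict {
  canRehydrate: boolean;
  blockedBy: string[];
}

// Created files can be rolled back by deletion alone. Modified or deleted
// files can only be restored from a backup that survived the restart.
function assessRehydration(entries: readonly DirtyEntrySummary[]): RehydrateVerdict {
  const blockedBy = entries
    .filter((entry) => entry.kind !== "created" && entry.backup !== "durable")
    .map((entry) => `${entry.path} (${entry.kind}, backup: ${entry.backup})`);
  return { canRehydrate: blockedBy.length === 0, blockedBy };
}

const createdOnly = assessRehydration([
  { path: "src/new.ts", kind: "created", backup: "none" },
]);
const volatileEdit = assessRehydration([
  { path: "src/new.ts", kind: "created", backup: "none" },
  { path: "src/app.ts", kind: "modified", backup: "memory-only" },
]);

console.log(createdOnly.canRehydrate, volatileEdit.canRehydrate, volatileEdit.blockedBy);
```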
Patch export is available with exportPatch(checkpointId). It reconciles first, then emits a text-only unified diff for created, modified, and deleted regular files. It does not run Git, commit, merge, push, dispose the checkpoint, or mutate the workspace.
Git promotion is available with promote(checkpointId). It reconciles first, optionally returns the same text patch with { exportPatch: true }, marks the checkpoint promoted, and cleans Hyperion-owned rollback storage. Promoted checkpoints are audit records only: they cannot be rolled back, exported again, or rehydrated. Git remains the authority for staging, commits, merges, remotes, signatures, and pushes.
See ARCHITECTURE.md for the full system design, failure model, and strategy router details. The limitations and mitigation roadmap live in LIMITATIONS.md. Release notes are in CHANGELOG.md, with release and security posture notes in RELEASE.md and SECURITY.md.
Release Checks
For local package readiness:

    npm run release:check

This runs typecheck, tests, build, npm pack --dry-run, and a temp-project install smoke. The install smoke packs the SDK into an OS temp directory, installs it into a temporary sample project, and imports both HyperionWorkspace and HyperionAgentSession from the installed package.
For final pre-publish confidence:

    npm run release:final

This runs the full release check, verifies the zero-runtime-dependency audit path with npm audit --omit=dev, and prints the final dry-run package contents.
For a copy-ready release runbook, use RELEASE_NEXT.md.
For reliability gates (failure injection, fuzz smoke, and stress smoke):

    npm run test:reliability:ci

For targeted local reliability runs:

    npm run test:reliability:fuzz
    npm run test:reliability:stress

For a focused install smoke after an existing build:

    npm run package:smoke

The published package is intentionally limited to dist, the README/architecture docs, the benchmark hero image used by the README, and required npm metadata. Benchmark commands are repository-checkout utilities and are not part of the SDK runtime surface.
Publishing uses GitHub Actions trusted publishing with npm provenance (OIDC). Before the first public publish, a maintainer must configure npm trusted publishing for hyperion-delta with repository ayush585/Hyperion-Delta, workflow .github/workflows/publish.yml, and environment npm-publish.
Manual dispatch is tag-only. Trigger Publish Package from main and provide tag as refs/tags/vX.Y.Z.
Troubleshooting
- Git unavailable: Hyperion falls back to stat-only manifests. Correctness remains, but large non-Git workspaces may start slower.
- tmpfs unavailable: Linux /dev/shm acceleration is skipped and the SDK degrades to POSIX links or pure manifest restore.
- rsync unavailable: POSIX-link-style benchmark rows may be skipped, and SDK behavior remains on the safest available strategy.
- Windows or NTFS: verified NTFS volumes can use ntfs-link dirty-set backup acceleration. Dev Drive and ReFS signals appear in diagnostics; ReFS block clone is intentionally deferred until a native Windows API helper exists. Small VFS-backed edits are still accelerated by the Hot Dirty Buffer before spilling to disk.
- Ignored paths: node_modules/**, .git/**, and .hyperion/** are ignored by default so dependency and internal state folders are not tracked.
- Strict ignored writes: set strictIgnoredWrites: true to throw HyperionIgnoredPathError before in-process VFS writes mutate ignored roots.
- Tool output contracts: call declareToolOutputs() before running package managers, build systems, formatters, or codegen tools that write exact ignored/generated files. Undeclared ignored writes still follow strictIgnoredWrites.
- Diagnostics: call getDiagnostics() to inspect selected strategy, actual storage tier, Hot Dirty Buffer hit/spill counters, Windows volume signals, active checkpoint storage, and recent ignored-write events.
- Durable journal recovery: call recoverAttempts() from a new workspace/session to inspect abandoned checkpoint metadata and canRehydrate status.
- Rehydration failures: rehydrateAttempt() rejects disposed attempts, corrupt journals, missing backup manifests, missing backup files, cross-workspace journals, and volatile memory-only backups.
- Patch export: exportPatch() supports text regular files and requires backup records for modified/deleted paths. Binary, symlink, and backup-missing exports fail loudly with integrity errors.
- Promotion: promote() finalizes the current worktree state and does not run Git. If { exportPatch: true } fails because a dirty file is binary, a symlink, or missing backup content, the checkpoint remains active and rollback-capable.
- Child-process modified/deleted files: reconcile() detects them, and rollback() always reconciles first. Restoring modified or deleted files still requires a pre-mutation backup from VFS interception or a future explicit tracking integration.
- Missing backup record: rollback fails loudly with an integrity error instead of silently corrupting or partially restoring the workspace.
What It Measures
The current benchmark compares:
- Legacy Runner: mutates a tracked file, creates an untracked scratch file, then runs git reset --hard HEAD and git clean -fd.
- Targeted Reversion: tracks the modified files in a manifest, restores only those files from a read-only base, and deletes only manifest-listed scratch files.
- rsync Targeted Reversion: creates a linked working tree with rsync --link-dest, then restores only changed files with an rsync file list.
- tmpfs Targeted Reversion: keeps the dirty-set rollback cache in /dev/shm on Linux/WSL2 so the files the agent actually touched restore from RAM.
Lessons from the Metadata Bottleneck
Initial testing showed that full-directory cloning triggers inode metadata thrashing on workspaces of 50,000 or more files. It beats Git only on block-level I/O and loses on metadata throughput.
The first implementation used Linux reflinks with cp -a --reflink=always, then deleted and recloned the whole 50,000-file sandbox every turn. On the WSL2 XFS loopback test drive, it produced this result:

    Legacy Runner total:       190,694.525 ms
    Legacy average:              3,813.890 ms
    Hyperion full clone total: 816,614.450 ms
    Hyperion full clone avg:    16,332.289 ms

That failure is useful: the full clone averaged roughly 4.3x the Git baseline per cycle. Reflinks avoid copying file blocks, but they do not eliminate directory traversal, inode allocation, unlink work, or metadata updates. A real local agent should not throw away an entire tree when it knows which files it touched.
Hyperion's practical optimization is therefore targeted state reversion: track the agent's dirty set and revert only those paths. The tmpfs mode demonstrates the upper bound for Prettiflow-style local search when dirty-set content and metadata operations live in RAM.
Running The Benchmark
For a fast local regression check:

    npm run benchmark:smoke

Smoke mode uses a small fixture and temporary work root. It validates the benchmark shape and strategy routing, not final performance evidence.
For the full benchmark defaults:

    npm run benchmark

The full run preserves the audit-scale defaults in benchmark.ts. For the cleanest filesystem signal, run inside a native Linux filesystem or the XFS loopback mount used during audit testing. The tmpfs row appears automatically when /dev/shm is available.
When launched from WSL under /mnt/c, the script automatically stages generated benchmark workspaces in native Linux /tmp and prints the selected work root. This keeps the requested Windows project path usable while avoiding DrvFS metadata emulation from dominating the benchmark.
The benchmark prints the selected work root, fixture size, iteration count, and runner strategy rows. If optional capabilities are unavailable, such as rsync or Linux /dev/shm, those rows are reported as skipped instead of failing the run.
The script also accepts environment overrides while preserving the audit defaults:

    HYPERION_FILE_COUNT=1000 HYPERION_ITERATIONS=3 npm run benchmark

Interpreting Results
The target outcome is not "copy-on-write always wins." The meaningful result is:
- Git reset scales with repository-wide filesystem inspection.
- Full tree clone/delete scales with repository-wide metadata churn.
- Targeted rollback scales with the number of files the agent actually changed.
- tmpfs dirty-set rollback shows the best-case latency when the rollback cache avoids disk hardware entirely.
Benchmark Ideas To Run Next
The current final run is intentionally narrow: a 50,000-file fixture, one simulated agent edit cycle, and 10 measured rollback samples. The next useful benchmark work is to map the performance envelope:
- Dirty-set size sweep: 1, 10, 100, and 1,000 changed files.
- Repository size sweep: 10k, 50k, 100k, and 250k files.
- Platform matrix: WSL2, native Linux, macOS APFS, Windows NTFS, Windows Dev Drive, and ReFS.
- Tooling matrix: tsc, formatters, generated snapshots, package-manager outputs, esbuild, oxc, and SWC.
- Strategy matrix: Git reset, manifest restore, POSIX link storage, and tmpfs dirty-set storage.
- Cache matrix: cold-cache and warm-cache runs.
- Agent-search stress test: concurrent checkpoints and MCTS-style branch rollback.
Those runs should keep the same rule as this benchmark: measure rollback latency with process.hrtime.bigint(), print the work root, report skipped platform-specific strategies explicitly, and never hide metadata-heavy failures. The full-tree clone/delete miss is part of the engineering evidence.
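For follow-up runs, the measurement rule can be captured in a small harness. This is a minimal sketch, not the code in benchmark.ts: `measureRollback` and `TimingResult` are hypothetical, and timing only the rollback while leaving the dirtying step outside the timed region is an assumption about how a rollback sample should be taken.

```typescript
interface TimingResult {
  samples: number;
  totalMs: number;
  averageMs: number;
}

// Time only the rollback step of each cycle with the nanosecond clock.
function measureRollback(
  iterations: number,
  dirty: () => void, // mutate the workspace (not timed)
  rollback: () => void, // restore the workspace (timed)
): TimingResult {
  let totalNs = BigInt(0);
  for (let i = 0; i < iterations; i++) {
    dirty();
    const start = process.hrtime.bigint();
    rollback();
    totalNs += process.hrtime.bigint() - start;
  }
  const totalMs = Number(totalNs) / 1e6; // nanoseconds to milliseconds
  return { samples: iterations, totalMs, averageMs: totalMs / iterations };
}

// Demo against an in-memory stand-in for a workspace.
const state = new Map<string, string>([["a.ts", "original"]]);
const timing = measureRollback(
  10,
  () => state.set("a.ts", "edited"),
  () => state.set("a.ts", "original"),
);

console.log(
  `samples=${timing.samples} total=${timing.totalMs.toFixed(3)} ms avg=${timing.averageMs.toFixed(3)} ms`,
);
```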
