@longrun/sandstorm

v0.1.15

Published

a day ago

Fly.io Machines sandbox management library


Sandstorm

Fly.io Machines sandbox management library. Provides a high-level API for creating, pausing, resuming, and deleting isolated sandboxes backed by Fly.io Machines, with built-in support for coding agents and persona configuration.

Installation

npm install @longrun/sandstorm

Quick Start

import { SandboxManager } from '@longrun/sandstorm';

const manager = new SandboxManager({
  apiToken: process.env.FLY_API_TOKEN!,
  orgSlug: 'my-org',
  instanceImage: 'flyio/claude-code:latest',
  instancePrefix: 'myapp-',
});

// Create a sandbox
const sandbox = await manager.createSandbox({
  name: 'project-alice',
  env: { GITHUB_TOKEN: '...' },
});
console.log(sandbox.url); // https://myapp-project-alice.fly.dev

// Pause (snapshot + delete machine, keep volume)
const { snapshotId } = await manager.pauseSandbox(sandbox.appName);

// Resume (restore from snapshot)
const resumed = await manager.resumeSandbox(sandbox.appName, snapshotId);

// Delete (permanent)
await manager.deleteSandbox(sandbox.appName);

Architecture

Sandstorm has two API layers:

SandboxManager (High-Level)

The primary user-facing interface. Orchestrates complex operations into single calls.

| Method | Description |
|--------|-------------|
| createSandbox(params) | Creates app + volume + machine in one call |
| deleteSandbox(id) | Deletes machine + volume + app |
| pauseSandbox(id) | Snapshots volume, deletes machine (reversible) |
| resumeSandbox(id, snapshotId?, options?) | Restores from snapshot, recreates machine |
| updateSandboxEnv(id, env) | Updates machine environment variables |
| getSandbox(id) | Gets current sandbox status |
| syncSandbox(appName, agent, persona, vars) | Re-deploys persona config to a running sandbox |
| connectInteractive(appName, agent, workingDir?, options?) | Starts an interactive terminal session via SSH |
| execCommand(appName, command) | Executes a command inside a sandbox |
| writeFile(appName, filePath, content) | Writes a file inside a sandbox |
| startAgentSession(appName, agent, id, prompt, env?) | Starts a non-interactive agent session in background |
| isAgentRunning(appName, agent) | Checks if an agent process is running |
| getSessionLogs(appName, agent, id, options?) | Gets session log content |
| watchLogs(appName, agent, id, onLine, options?) | Watches logs with polling-based tail |
| listSessions(appName, agent, id) | Lists all session log files |
| findIdleSandboxes(minutes) | Finds sandboxes with low network activity |
| runLifecycleCheck(config) | Runs lifecycle management with hooks |

Low-Level Components

Available for advanced usage via manager.client, manager.metricsClient, and manager.lifecycleManager.

FlyMachines - Raw Machines API wrapper (apps, machines, volumes, snapshots)
FlyMetrics - Prometheus metrics for activity detection
LifecycleManager - State verification and reconciliation helpers

Agent System

Built-in support for coding agents that run inside sandboxes.

ClaudeCodeAgent - Claude Code agent implementation
claudeCode - Singleton instance of ClaudeCodeAgent
getCodingAgent(id) - Get an agent by ID
getDefaultCodingAgent() - Get the default agent (Claude Code)

Persona Utilities

loadPersonaDir(dirPath) - Load all files from a persona directory into a Map<string, string>
resolvePersona(persona) - Resolve a persona source (directory path or in-memory Map)

API Reference

SandboxManager

new SandboxManager(config: SandstormConfig)

Config

interface SandstormConfig {
  apiToken: string;       // Fly API token
  orgSlug: string;        // Fly organization slug
  instanceImage: string;  // Docker image for machines
  instancePrefix: string; // Prefix for app names (e.g., 'myapp-')
}
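
As a rough illustration of how instancePrefix feeds into app naming, the sketch below derives an app name and public URL from the prefix and a sandbox name. The helpers appNameFor and publicUrlFor are hypothetical and not part of the library's API; the validation pattern (lowercase letters, digits, and hyphens) is an assumption about acceptable app names, and the URL shape follows the Quick Start example.

```typescript
// Hypothetical helpers for illustration; the library's real naming logic may differ.
// Assumption: app names are limited to lowercase letters, digits, and single hyphens.
const APP_NAME_PATTERN = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

function appNameFor(instancePrefix: string, name: string): string {
  // Mirrors the documented scheme: {prefix}{name}
  const appName = `${instancePrefix}${name}`;
  if (!APP_NAME_PATTERN.test(appName)) {
    throw new Error(`Invalid app name: ${appName}`);
  }
  return appName;
}

function publicUrlFor(appName: string): string {
  // URL shape taken from the Quick Start example.
  return `https://${appName}.fly.dev`;
}
```

Validating the combined name before calling createSandbox can surface naming problems earlier than a failed API call would.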

createSandbox

await manager.createSandbox({
  name: string;            // Used in app naming: {prefix}{name}
  region?: string;         // Default: 'ord'
  env: Record<string, string>;
  volumeSizeGb?: number;   // Default: 1
  memoryMb?: number;       // Default: 2048
  cpus?: number;           // Default: 2
  swapSizeMb?: number;     // Default: 1024
  agent?: CodingAgent;     // Agent to install and configure
  persona?: string | Map<string, string>;  // Persona dir path or in-memory files
  personaVars?: Record<string, string>;    // Template variable substitution: {{key}} -> value
  repo?: {                 // Git repo to clone after machine starts
    url: string;
    branch?: string;       // Default: 'main'
    token?: string;        // For private repos (injected into clone URL)
    path?: string;         // Clone target path (default: '/data/project')
  };
}): Promise<Sandbox>

Creates a Fly app, a persistent volume, and a machine, then optionally clones a repo and configures an agent with a persona. Returns a Sandbox with the machine ID, app name, volume ID, region, status, public URL, and creation timestamp.

On failure, the app is automatically cleaned up.
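
To make the personaVars substitution concrete, here is a minimal sketch of {{key}} replacement applied to an in-memory persona Map. The function renderPersona is hypothetical, not the library's implementation; leaving unknown placeholders untouched is an assumption, since the source does not say how missing keys are handled.

```typescript
// Hypothetical sketch of {{key}} -> value substitution over persona files.
// Assumption: placeholders without a matching key are left as-is.
function renderPersona(
  files: Map<string, string>,
  vars: Record<string, string>,
): Map<string, string> {
  const rendered = new Map<string, string>();
  files.forEach((content, path) => {
    const replaced = content.replace(/\{\{(\w+)\}\}/g, (match: string, key: string) => {
      const value = vars[key];
      return value !== undefined ? value : match;
    });
    rendered.set(path, replaced);
  });
  return rendered;
}
```

For example, a file containing `Project: {{PROJECT_NAME}}` rendered with `{ PROJECT_NAME: 'my-project' }` becomes `Project: my-project`.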

pauseSandbox / resumeSandbox

// Pause: snapshot volume, delete machine
const { snapshotId } = await manager.pauseSandbox(appName);

// Resume: restore volume from snapshot, create new machine
const sandbox = await manager.resumeSandbox(appName, snapshotId, {
  env: { ... },       // Optional: override env vars
  memoryMb: 4096,     // Optional: override resources
  cpus: 4,
  swapSizeMb: 2048,
});

Pausing is reversible: the app and volume remain, and only the machine is deleted. While paused, the only cost is volume storage ($0.15/GB/month).

If snapshotId is omitted from resumeSandbox, the latest snapshot is used.
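
The "latest snapshot" fallback could look roughly like the sketch below. The VolumeSnapshot shape and the pickSnapshotId helper are hypothetical; the library's actual snapshot type and selection logic are not shown in this README.

```typescript
// Hypothetical snapshot shape; the real type returned by the Fly API may differ.
interface VolumeSnapshot {
  id: string;
  createdAt: string; // ISO 8601 timestamp
}

// Prefer an explicit snapshot ID; otherwise fall back to the newest snapshot.
function pickSnapshotId(snapshots: VolumeSnapshot[], requestedId?: string): string {
  if (requestedId !== undefined) {
    return requestedId;
  }
  if (snapshots.length === 0) {
    throw new Error('No snapshots available to resume from');
  }
  const latest = snapshots.reduce((newest, candidate) =>
    Date.parse(candidate.createdAt) > Date.parse(newest.createdAt) ? candidate : newest,
  );
  return latest.id;
}
```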

updateSandboxEnv

await manager.updateSandboxEnv(appName, {
  NEW_KEY: 'new-value',
  EXISTING_KEY: 'updated-value',
});

Merges the new env vars into the existing ones; keys that already exist are overwritten. Changes take effect on the next machine restart.
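
The merge semantics can be pictured with a plain object spread. This is a sketch of the described behavior, not the library's code; mergeEnv is a hypothetical name.

```typescript
// Hypothetical illustration of the merge: updates win over existing values,
// and keys that are not mentioned in the update are preserved.
function mergeEnv(
  existing: Record<string, string>,
  updates: Record<string, string>,
): Record<string, string> {
  return { ...existing, ...updates };
}
```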

syncSandbox

await manager.syncSandbox(appName, agent, './persona-dir', {
  PROJECT_NAME: 'my-project',
});

Re-deploys persona configuration to a running sandbox: persona files are re-read from disk, template variables are substituted again, and the result is pushed to the machine. Useful after updating persona files locally.

connectInteractive

await manager.connectInteractive(appName, agent, '/data/project', {
  envFile: '/data/.env',  // Optional: source env file before running agent
});

Starts an interactive terminal session via fly ssh console. Takes over the terminal with full PTY support. Blocks until the session ends.

execCommand / writeFile

const result = await manager.execCommand(appName, 'ls /data/project');
// result: { stdout, stderr, exitCode }

await manager.writeFile(appName, '/data/config.json', '{"key": "value"}');
// Creates parent directories automatically, uses base64 encoding for safe transport
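
Since execCommand returns an exit code rather than throwing, callers may want a small guard around the result. The sketch below assumes the field types (string, string, number) from the `{ stdout, stderr, exitCode }` shape shown above; assertSuccess is a hypothetical helper, not part of the library.

```typescript
// Field names come from the example above; the exact types are an assumption.
interface ExecResult {
  stdout: string;
  stderr: string;
  exitCode: number;
}

// Hypothetical helper: return stdout on success, throw with stderr otherwise.
function assertSuccess(result: ExecResult, command: string): string {
  if (result.exitCode !== 0) {
    throw new Error(
      `Command failed with exit code ${result.exitCode}: ${command}\n${result.stderr.trim()}`,
    );
  }
  return result.stdout;
}
```

A call site might then read `assertSuccess(await manager.execCommand(appName, cmd), cmd)`.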

Agent Sessions

// Start a non-interactive agent session in background
const session = await manager.startAgentSession(
  appName, agent, 'task-42', 'Fix the login bug', { EXTRA_VAR: '...' }
);
// session: { sessionId, logPath, latestLogPath }

// Check if agent is still running
const running = await manager.isAgentRunning(appName, agent);

// Get logs (specific session or latest)
const lines = await manager.getSessionLogs(appName, agent, 'task-42', {
  sessionId: session.sessionId,  // Optional: omit for latest
  lines: 100,                    // Optional: omit for all lines
});

// Watch logs in real-time
const watcher = await manager.watchLogs(appName, agent, 'task-42', (line) => {
  console.log(line);
}, { lines: 50, intervalMs: 500 });
// Later: watcher.stop()

// List all sessions for an identifier
const sessions = await manager.listSessions(appName, agent, 'task-42');
// sessions: [{ sessionId, logPath, startedAt }]
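
A polling-based tail, as watchLogs is described, boils down to remembering how many lines have already been delivered and emitting only the new ones on each poll. The sketch below shows that core step in isolation; createLogTail is hypothetical, and resetting on a shrinking log is an assumption about how truncation might be handled.

```typescript
// Hypothetical core of a polling tail: each poll receives the full current
// log content (as lines) and forwards only lines not seen before.
function createLogTail(onLine: (line: string) => void) {
  let seen = 0;
  return {
    poll(allLines: string[]): number {
      // Assumption: if the log shrank, it was truncated or replaced, so start over.
      if (allLines.length < seen) {
        seen = 0;
      }
      const fresh = allLines.slice(seen);
      fresh.forEach((line) => onLine(line));
      seen = allLines.length;
      return fresh.length;
    },
  };
}
```

In a real watcher, poll would be driven by a timer at the configured interval, with the lines fetched from the sandbox on each tick.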

findIdleSandboxes

const idle = await manager.findIdleSandboxes(15); // 15-minute window
// Returns: IdleSandbox[] with sandboxId, appName, machineId, volumeId, lastActivityAt, idleMinutes

Queries Fly Prometheus metrics for network activity. Returns sandboxes with less than 10KB total traffic in the measurement window.

This method does not pause anything; the caller decides what to do with the results.
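
The idle test itself is a simple threshold comparison on total traffic. The sketch below illustrates it with a hypothetical TrafficSample shape; treating 10KB as 10 * 1024 bytes is an assumption, as the README does not say whether the threshold is 10,000 or 10,240 bytes.

```typescript
// Hypothetical per-sandbox traffic totals for the measurement window.
interface TrafficSample {
  appName: string;
  recvBytes: number;
  sentBytes: number;
}

// Assumption: "10KB" means 10 * 1024 bytes.
const IDLE_THRESHOLD_BYTES = 10 * 1024;

// Return the app names whose combined traffic stayed below the threshold.
function filterIdle(
  samples: TrafficSample[],
  thresholdBytes: number = IDLE_THRESHOLD_BYTES,
): string[] {
  return samples
    .filter((sample) => sample.recvBytes + sample.sentBytes < thresholdBytes)
    .map((sample) => sample.appName);
}
```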

runLifecycleCheck

const result = await manager.runLifecycleCheck({
  // Timeline configuration
  warnAfterMinutes: 60,      // Optional: warn phase
  pauseAfterMinutes: 120,    // Required: pause phase
  deleteAfterDays: 30,       // Optional: delete phase

  // Sandboxes to check (from your database)
  sandboxes: [
    {
      sandboxId: 'mach_123',
      appName: 'myapp-project-alice',
      machineId: 'mach_123',
      volumeId: 'vol_456',
      lastActivityAt: new Date('2024-01-15'),
      lifecycleStatus: 'active',
    },
  ],

  // Hooks - you decide what happens (db and sendEmail stand in for your own application code)
  onWarn: async (sandbox) => {
    await db.update({ status: 'warned' });
    await sendEmail(sandbox.appName, 'Going idle soon');
  },
  onPause: async (sandbox, snapshotId) => {
    await db.update({ status: 'paused', snapshotId });
  },
  onDelete: async (sandboxId, appName) => {
    await db.delete(sandboxId);
  },
});

console.log(result); // { warned: 0, paused: 1, deleted: 0, errors: [] }

Lifecycle States

active ──[warnAfterMinutes]──> warned ──[pauseAfterMinutes]──> paused ──[deleteAfterDays]──> deleted
         (optional)                      (required)                      (optional)

active - Machine running, recently used
warned - Idle for warning threshold (optional phase)
paused - Machine deleted, volume snapshotted (low cost, reversible)
deleted - Permanently removed

All phases except pause are optional. Configure only what you need:

// Minimal: just pause after 2 hours, no warning, no delete
{ pauseAfterMinutes: 120 }

// Full lifecycle: warn, pause, delete
{ warnAfterMinutes: 1440, pauseAfterMinutes: 4320, deleteAfterDays: 30 }
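
One way to read the state diagram is as a function from the current status and idle time to the next action. The sketch below is an interpretation, not the library's logic: nextAction and its types are hypothetical, and it assumes every threshold is measured from lastActivityAt and that a single check performs at most one transition.

```typescript
type LifecycleStatus = 'active' | 'warned' | 'paused';
type LifecycleAction = 'none' | 'warn' | 'pause' | 'delete';

interface LifecycleThresholds {
  warnAfterMinutes?: number;
  pauseAfterMinutes: number;
  deleteAfterDays?: number;
}

// Assumption: all thresholds are measured from the last recorded activity.
function nextAction(
  status: LifecycleStatus,
  lastActivityAt: Date,
  thresholds: LifecycleThresholds,
  now: Date,
): LifecycleAction {
  const idleMinutes = (now.getTime() - lastActivityAt.getTime()) / 60000;

  if (status === 'paused') {
    const deleteAfterDays = thresholds.deleteAfterDays;
    const due = deleteAfterDays !== undefined && idleMinutes >= deleteAfterDays * 24 * 60;
    return due ? 'delete' : 'none';
  }
  if (idleMinutes >= thresholds.pauseAfterMinutes) {
    return 'pause';
  }
  const warnAfterMinutes = thresholds.warnAfterMinutes;
  if (status === 'active' && warnAfterMinutes !== undefined && idleMinutes >= warnAfterMinutes) {
    return 'warn';
  }
  return 'none';
}
```

With the minimal config `{ pauseAfterMinutes: 120 }`, a sandbox idle for 90 minutes yields no action, while the same sandbox with `warnAfterMinutes: 60` would be warned first.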

Activity Detection

Sandstorm detects idle sandboxes via Fly's Prometheus metrics endpoint:

Queries fly_instance_net_recv_bytes and fly_instance_net_sent_bytes
Uses increase() over a configurable time window
Default threshold: 10KB of total traffic in the window; sandboxes below it are considered idle
Uses regex patterns for efficient batch querying

This approach works for non-HTTP workloads (like Claude Code agents) where Fly's built-in auto_stop_machines cannot detect activity.
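
The batch query described above might be assembled along these lines. This is a sketch under assumptions: buildTrafficQuery is hypothetical, the `app` label name and the per-app aggregation are guesses at how the series are grouped, and the exact query the library sends is not shown in this README.

```typescript
// Hypothetical PromQL builder: one query covering every app that shares a prefix.
// Assumptions: series carry an `app` label, and totals are summed per app.
function buildTrafficQuery(instancePrefix: string, windowMinutes: number): string {
  // Restrict the prefix to characters that need no regex escaping.
  if (!/^[a-z0-9-]+$/.test(instancePrefix)) {
    throw new Error(`Unsupported prefix: ${instancePrefix}`);
  }
  if (!Number.isInteger(windowMinutes) || windowMinutes <= 0) {
    throw new Error(`Window must be a positive whole number of minutes: ${windowMinutes}`);
  }
  const selector = `{app=~"${instancePrefix}.*"}`;
  const range = `[${windowMinutes}m]`;
  const recv = `sum by (app) (increase(fly_instance_net_recv_bytes${selector}${range}))`;
  const sent = `sum by (app) (increase(fly_instance_net_sent_bytes${selector}${range}))`;
  return `${recv} + ${sent}`;
}
```

Matching by regex on the shared prefix lets a single request return traffic for all sandboxes instead of one request per app.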

Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| FLY_API_TOKEN | Yes | Fly.io API token |

The orgSlug, instanceImage, and instancePrefix are passed via constructor config.

Development

npm install
npm run build       # Compile TypeScript
npm run test        # Run tests (vitest)
npm run test:watch  # Run tests in watch mode
npm run lint        # Lint source
npm run format      # Format code
npm run typecheck   # Type check

License

MIT