# smart-agent
Autonomous agentic loop with Skills + Objectives for Bun.

```
┌──────────────────────────────────────┐
│          Objectives (WHAT)           │
│  Blackbox validate() functions that  │
│  define success criteria             │
└────────────┬─────────────────────────┘
             │
prompt ──→ LLM ──→ exec tool ──→ check objectives ──→ loop
             ↑                        │
             │                        ↓
     ┌───────┴──────┐          ┌──────────────┐
     │ Skills (CTX) │          │ Met? → done  │
     │ YAML files   │          │ Not? → retry │
     │ teach CLIs   │          └──────────────┘
     └──────────────┘
```

- **Objectives = WHAT to achieve** — blackbox `validate()` functions that return `{ met, reason }`
- **Skills = CONTEXT** — `.md` files with YAML frontmatter the LLM reads to learn available CLIs (`git`, `bun`, `docker`, your project scripts)
- **Tools = HOW to interact** — built-in `exec`, `read_file`, `write_file`, `edit_file`, `search`, `list_dir`
- **`agent.run(prompt)`** — the trigger that kicks off the loop
The agent doesn't "know" git or bun — skills teach it. Validation errors are passed back to the LLM so it knows how to adjust.
## Install

```bash
bun add smart-agent
```

## Quick Start

```ts
import { Agent } from "smart-agent"
const agent = new Agent({
model: "gemini-2.5-flash",
// Skills teach the agent what CLIs are available
skills: ["./skills/bun.md", "./skills/git.md"],
// Objectives define success — blackbox validation
objectives: [{
name: "tests_pass",
description: "All unit tests pass",
validate: (state) => {
const last = state.toolHistory.findLast(t => t.tool === "exec" && t.params.command?.includes("bun test"))
if (!last) return { met: false, reason: "Run 'bun test' first" }
return { met: last.result.success, reason: last.result.success ? "Tests pass" : "Tests fail" }
}
}],
})
// agent.run() is the trigger — skills give it context on how to proceed
for await (const event of agent.run("Fix the failing tests")) {
console.log(event.type, event)
}
```

## Multi-turn Sessions
For chatbot-style interactions, use `Session`. It maintains conversation history and re-plans objectives each turn:

```ts
import { Session } from "smart-agent"
const session = new Session({ model: "gemini-2.5-flash" })
for await (const event of session.send("create a hello world project")) {
if (event.type === "awaiting_confirmation") {
// Objectives are paused — review before proceeding
console.log("Objectives:", event.objectives)
session.confirmObjectives() // or session.rejectObjectives()
}
if (event.type === "complete") {
console.log("Done!")
}
}
// Follow-up — planner adjusts objectives based on context
for await (const event of session.send("now add unit tests")) {
session.confirmObjectives()
}
```

By default, sessions require confirmation before executing (`requireConfirmation: true`). This gives the user a chance to review and approve generated objectives. Disable with `{ requireConfirmation: false }`.
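For a fully autonomous session, a minimal sketch — assuming `requireConfirmation` is passed in the `Session` config alongside `model`:

```ts
import { Session } from "smart-agent"

// Objectives execute as soon as the planner generates them — no
// confirmation step. Use with care outside trusted workflows.
const session = new Session({
  model: "gemini-2.5-flash",
  requireConfirmation: false,
})

for await (const event of session.send("scaffold a hello world project")) {
  if (event.type === "complete") console.log("Done!")
}
```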
## Chatbot Mode — `Agent.plan()`

For one-shot planning without sessions:

```ts
import { Agent } from "smart-agent"
for await (const event of Agent.plan(
"Create a greeting.txt with 'Hello World'",
{ model: "gemini-2.5-flash" }
)) {
if (event.type === "planning") {
console.log("Generated objectives:", event.objectives)
}
if (event.type === "complete") {
console.log("Done!")
}
}
```

The planner analyzes the prompt and creates verifiable objectives from templates (`file_exists`, `file_contains`, `command_succeeds`, `command_output_contains`), then a worker agent executes them.
## Conversation History

Pass a message array instead of a string to provide conversation context:

```ts
for await (const event of agent.run([
{ role: "user", content: "fix the auth tests" },
{ role: "assistant", content: "I'll look at the test files..." },
{ role: "user", content: "focus on login.test.ts" },
])) {
// agent has full conversation context
}
```

## How It Works

```
prompt → LLM → XML response → execute tools → check objectives → loop
```

- Your prompt + system prompt (tools + skills + objectives) go to the LLM
- The LLM responds in XML with tool invocations
- The agent executes the tools and feeds results back
- Objectives are checked — if all `validate()` return `met: true`, the loop ends
- Otherwise, the loop continues until all objectives pass or `maxIterations` is reached
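A sketch of that control flow in plain TypeScript — illustrative only, not the library's internals; `callLLM`, `parseToolCalls`, and `executeTool` are hypothetical stand-ins:

```ts
// Illustrative sketch of the loop above — these helpers and types are
// hypothetical stand-ins, NOT smart-agent exports.
type ToolCall = { name: string; params: Record<string, string> }
type State = { messages: string[]; toolHistory: unknown[]; touchedFiles: Set<string>; iteration: number }
type Objective = { name: string; validate: (s: State) => { met: boolean; reason: string } }

declare function callLLM(messages: string[]): Promise<string>
declare function parseToolCalls(xml: string): ToolCall[]
declare function executeTool(call: ToolCall): Promise<{ success: boolean; output: string }>

async function agentLoop(prompt: string, objectives: Objective[], maxIterations = 20) {
  const state: State = { messages: [prompt], toolHistory: [], touchedFiles: new Set(), iteration: 0 }
  for (; state.iteration < maxIterations; state.iteration++) {
    const xml = await callLLM(state.messages)              // prompt + system prompt → LLM
    for (const call of parseToolCalls(xml)) {              // XML → tool invocations
      const result = await executeTool(call)               // execute the tool
      state.toolHistory.push({ tool: call.name, params: call.params, result })
      state.messages.push(result.output)                   // feed results back
    }
    const checks = objectives.map(o => o.validate(state))  // check objectives
    if (checks.every(c => c.met)) return "complete"        // all met → done
    for (const c of checks) if (!c.met) state.messages.push(c.reason) // retry with reasons
  }
  return "max_iterations"
}
```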
## API

### `new Agent(config)`

```ts
interface AgentConfig {
model: string // LLM model name
objectives?: Objective[] // Goals to achieve (required for run(), optional for plan())
skills?: (string | Skill)[] // .md file paths or inline Skill objects
maxIterations?: number // Default: 20
temperature?: number // Default: 0.3
maxTokens?: number // Default: 8000
cwd?: string // Working directory (default: process.cwd())
toolTimeoutMs?: number // Default: 30000
systemPrompt?: string // Extra system prompt text
tools?: Tool[] // Additional custom tools
signal?: AbortSignal // Cancel the agent loop
noStreaming?: boolean // Disable streaming (returns token usage data)
safeMode?: boolean // Block autonomous exec unless approved via onApproval
onApproval?: (tool, params) => Promise<boolean> | boolean // Interactive approval callback
onToolOutput?: (tool, chunk) => void // Real-time exec output streaming
}
```

## Safe Mode
When `safeMode: true`, the agent cannot run shell commands without approval:

```ts
const agent = new Agent({
model: "gemini-2.5-flash",
safeMode: true,
// Interactive approval — ask the user before running commands
onApproval: async (tool, params) => {
const ok = confirm(`Run "${params.command}"?`)
return ok
},
})
```

Without `onApproval`, safe mode blocks all `exec` calls with an error message.
## Streaming Exec Output

Get real-time command output chunks instead of waiting for completion:

```ts
const agent = new Agent({
model: "gemini-2.5-flash",
onToolOutput: (tool, chunk) => process.stdout.write(chunk),
})
```

### `agent.run(input): AsyncGenerator<AgentEvent>`
Run with predefined objectives. Accepts a string or `Message[]`.
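The `signal` config option cancels a run in flight; a minimal sketch using a standard `AbortController` (the five-second timeout is arbitrary):

```ts
import { Agent } from "smart-agent"

const controller = new AbortController()
setTimeout(() => controller.abort(), 5_000) // cancel the loop after 5s

const agent = new Agent({
  model: "gemini-2.5-flash",
  objectives: [/* ... */],
  signal: controller.signal,
})

for await (const event of agent.run("Fix the failing tests")) {
  if (event.type === "cancelled") console.log("Run aborted")
}
```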
### `Agent.plan(input, config): AsyncGenerator<AgentEvent>`

Dynamic mode — the planner generates objectives from the prompt, then a worker agent executes them.
## Events
| Event | When |
|-------|------|
| `planning` | Planner generated objectives |
| `awaiting_confirmation` | Waiting for user to confirm objectives (Session only) |
| `iteration_start` | Loop iteration begins |
| `thinking` / `thinking_delta` | LLM explains what it's doing (delta = streaming chunks) |
| `tool_start` / `tool_result` | Tool execution |
| `tool_output_delta` | Real-time exec output chunk (when `onToolOutput` is set) |
| `approval_required` | Safe mode asked the user for approval |
| `objective_check` | Objectives validated |
| `usage` | Token usage data (when `noStreaming` is set) |
| `complete` | All objectives met |
| `error` | Something failed (agent recovers) |
| `cancelled` | Aborted via `signal` |
| `max_iterations` | Gave up |
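One way to consume these events — the `type` values come from the table above; payload fields beyond `type` aren't documented here, so this sketch just logs whole events:

```ts
for await (const event of agent.run("Fix the failing tests")) {
  switch (event.type) {
    case "tool_start":
      console.log("tool starting:", event)        // inspect the payload shape
      break
    case "objective_check":
      console.log("objectives validated:", event)
      break
    case "complete":
      console.log("All objectives met")
      break
    case "error":
      console.error("Recoverable error:", event)  // the agent keeps going
      break
    case "max_iterations":
      console.error("Gave up after maxIterations")
      break
  }
}
```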
## Built-in Tools

| Tool | Description |
|------|-------------|
| `read_file` | Read file contents |
| `write_file` | Create/overwrite a file |
| `edit_file` | Find-and-replace in a file |
| `exec` | Run shell commands (streams output via `onToolOutput`) |
| `list_dir` | List directory contents (recursive) |
| `search` | Search for text patterns across files |
**Parallel execution:** read-only tools (`read_file`, `list_dir`, `search`) run concurrently via `Promise.all`. Write tools (`write_file`, `edit_file`, `exec`) run sequentially to preserve ordering.
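An illustrative sketch of that scheduling policy (not the library's internals; `runTool` is a hypothetical executor):

```ts
type Call = { tool: string; params: Record<string, string> }

const READ_ONLY = new Set(["read_file", "list_dir", "search"])

// Hypothetical executor — stands in for the agent's real tool runner.
declare function runTool(call: Call): Promise<unknown>

async function executeBatch(calls: Call[]) {
  // Read-only calls are side-effect free, so they can fan out together.
  const reads = calls.filter(c => READ_ONLY.has(c.tool))
  const writes = calls.filter(c => !READ_ONLY.has(c.tool))

  const results = await Promise.all(reads.map(runTool))

  // Writes (write_file, edit_file, exec) mutate state — run them in order.
  for (const call of writes) {
    results.push(await runTool(call))
  }
  return results
}
```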
## Custom Tools

Add your own tools via the `tools` config:

```ts
const agent = new Agent({
model: "gemini-2.5-flash",
tools: [{
name: "deploy",
description: "Deploy the app to production",
parameters: {
env: { type: "string", description: "Target environment", required: true },
},
execute: async (params) => {
// your deployment logic
return { success: true, output: `Deployed to ${params.env}` }
},
}],
objectives: [/* ... */],
})
```

## Skills
Skills are `.md` files with YAML frontmatter describing CLI tools. They're injected into the system prompt so the LLM knows how to use them via `exec`.

````md
---
name: git
description: Git version control — staging, committing, branching, and history
---
# Git
## Commands
### commit
Create a commit with a message.
```bash
git commit -m "{message}"
```

- message: Commit message
````

```ts
const agent = new Agent({
model: "gemini-3-flash-preview",
skills: ["./skills/git.md", "./skills/docker.md"],
objectives: [/* ... */],
})
```

Built-in skills included: `git`, `docker`, `bun`, `npm`.
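Skills can also be passed inline instead of as file paths (the config accepts `(string | Skill)[]`). A sketch assuming the inline `Skill` shape mirrors the file format's frontmatter plus markdown body — the exact field names aren't documented here:

```ts
import { Agent } from "smart-agent"

// Hypothetical inline Skill — the field names (name/description/content)
// are assumed to mirror the .md frontmatter + body, not confirmed API.
const deploySkill = {
  name: "deploy-cli",
  description: "Project deploy script — build and ship via bun run deploy",
  content: `
## Commands
### deploy
\`\`\`bash
bun run deploy --env {env}
\`\`\`
- env: Target environment (staging | production)
`,
}

const agent = new Agent({
  model: "gemini-2.5-flash",
  skills: ["./skills/git.md", deploySkill],
  objectives: [/* ... */],
})
```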
## Objectives

Each objective has a `validate(state)` function that checks whether the goal is met:

```ts
{
name: "tests_pass",
description: "All unit tests pass",
validate: (state) => {
const lastExec = state.toolHistory.findLast(t => t.tool === "exec")
return {
met: lastExec?.result.success === true,
reason: lastExec ? "Tests passed" : "No tests run yet"
}
}
}
```

The `state` object contains:
- `messages` — full conversation history
- `toolHistory` — all tool calls and results
- `touchedFiles` — set of files modified
- `iteration` — current iteration number
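For example, `touchedFiles` makes it easy to assert that the agent actually modified a specific file — a small sketch built only from the documented state fields:

```ts
// An objective that passes only once the agent has edited src/auth.ts.
const touchedAuth = {
  name: "auth_file_edited",
  description: "src/auth.ts was modified",
  validate: (state) => ({
    met: state.touchedFiles.has("src/auth.ts"),
    reason: state.touchedFiles.has("src/auth.ts")
      ? "src/auth.ts was modified"
      : `Not touched yet (iteration ${state.iteration})`,
  }),
}
```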
## Objective Templates

When using `Agent.plan()`, the planner generates objectives from these templates:

| Template | Params | Checks |
|----------|--------|--------|
| `file_exists` | `path`, `contains?` | File exists (optionally with content) |
| `file_contains` | `path`, `text` | File contains specific text |
| `command_succeeds` | `command` | Command exits with code 0 |
| `command_output_contains` | `command`, `text` | Command output contains text |
| `custom_check` | `check` | Generic fallback |
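For instance, a prompt like "Create greeting.txt with 'Hello World'" might yield a `file_contains` objective. The shape below is an assumption for illustration — the planner's actual output format isn't shown here:

```ts
// Hypothetical planner output using the file_contains template —
// illustrative shape only, not captured from a real run.
const generated = {
  name: "greeting_contains_hello",
  description: "greeting.txt contains 'Hello World'",
  template: "file_contains",
  params: { path: "greeting.txt", text: "Hello World" },
}
```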
## LLM Support

All LLM communication is handled by jsx-ai, which provides provider routing, streaming, and retry logic.

| Provider | Models | Env Var |
|----------|--------|---------|
| Google | `gemini-*` | `GEMINI_API_KEY` or `GOOGLE_API_KEY` |
| Anthropic | `claude-*` | `ANTHROPIC_API_KEY` |
| DeepSeek | `deepseek-*` | `DEEPSEEK_API_KEY` |
| OpenAI | `gpt-*`, `o3-*`, `o4-*` | `OPENAI_API_KEY` |
| Any | Other models | `OPENAI_API_KEY` + `OPENAI_BASE_URL` |

Unknown models fall back to the OpenAI-compatible `/chat/completions` API using `OPENAI_BASE_URL`.
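A sketch of routing an unrecognized model to a local OpenAI-compatible server — the URL and model name are placeholders, and setting the variables in your shell works just as well:

```ts
import { Agent } from "smart-agent"

// Placeholders: any OpenAI-compatible server (llama.cpp, vLLM, Ollama, ...)
process.env.OPENAI_BASE_URL = "http://localhost:8080/v1"
process.env.OPENAI_API_KEY = "sk-placeholder"

const agent = new Agent({
  model: "my-local-model", // unknown prefix → OpenAI-compatible fallback
  objectives: [/* ... */],
})
```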
## Examples

Run any example with `bun run examples/<name>.ts`:
| Example | What it does |
|---------|--------------|
| `skill-driven` | ⭐ The canonical pattern — skills provide CLI context, agent fixes lint/format/type errors |
| `code-review` | Finds and fixes bugs in a deliberately broken file |
| `refactor` | Splits a monolithic file into clean modules |
| `api-gen` | Generates a REST API from a spec, writes tests, verifies them |
| `session` | Multi-turn Session with objective confirmation/rejection |
| `custom-tools` | Extends the agent with `http_get` and `json_transform` tools |
| `scaffold` | Multi-objective — creates a project, writes tests, makes them pass |
| `planner` | `Agent.plan()` — generates objectives from natural language |
| `hello` | Creates a file — simplest possible agent |
```bash
export GEMINI_API_KEY=your-key
bun run examples/code-review.ts
```

## Architecture
### Memory System

Agents can persist structured data via the Memory system — a key-value store backed by SQLite. Keys support dot-notation prefixes for grouping (e.g. `users.john`, `config.theme`).

```ts
// In agent-generated scripts:
const STATE_URL = process.env.STATE_URL // injected by scheduler
// Store data
await fetch(STATE_URL, {
method: 'POST',
body: JSON.stringify({ agentId: 1, key: 'users.john', value: JSON.stringify({ name: 'John', age: 30 }) })
})
// Retrieve data
const res = await fetch(`${STATE_URL}?agentId=1&key=users.john`)
const { value } = await res.json()
```

The Memory tab in the UI groups entries by prefix, renders JSON values with smart previews, and supports inline deletion.
### Plugin System

Plugins extend smart-agent with external integrations. Each plugin:

- **Installs as an npm package** — `bun add geeksy-telegram-plugin`
- **Registers skills** — `.md` files describing new capabilities
- **Requires configuration** — API keys, auth tokens, etc.
- **Runs as a separate process** — managed by BGR in its own process group
- **Can expose its own UI** — plugins may serve their own web interface

```
┌──────────────────────────────────────────────────┐
│ geeksy (main)                :3737               │
│ ├── Agent workspace          /                   │
│ ├── Memory API               /api/agent-state    │
│ └── Plugin registry          /api/plugins        │
├──────────────────────────────────────────────────┤
│ telegram-plugin              :3738 (bgrun group) │
│ ├── MTProto auth             /auth               │
│ ├── Message polling → writes to shared SQLite    │
│ └── Plugin UI                /                   │
├──────────────────────────────────────────────────┤
│ future-plugin                :3739 (bgrun group) │
│ └── ...                                          │
└──────────────────────────────────────────────────┘
```

**First plugin: Telegram** — the user authenticates their Telegram account (MTProto via GramJS), and the plugin polls chosen channels and writes messages into the shared SQLite database. Agents access this data through the Memory tab, and via skills that let generated code send messages during prompt execution or scheduled scripts.
## Why Not just-bash / AgentFS?
These are compelling projects that we evaluated for smart-agent:
**just-bash** — a JavaScript bash interpreter that simulates shell commands as pure functions, without OS access. It would allow sandboxed command execution without Docker.

**AgentFS** — a virtual filesystem backed by SQLite, giving agents file operations with built-in time-travel (rollback to any point).
Why we chose not to adopt them:
**Our agents need real OS access.** Smart-agent's core value is executing real commands (`bun test`, `git commit`, `docker build`) on real files. Simulated bash and virtual filesystems would break this — agents couldn't install packages, run test suites, or interact with actual project code.

**Isolation is solved differently.** Instead of virtualizing the OS layer, we isolate at the process level (BGR groups, separate working directories) and the permission level (per-agent skill scoping). This gives us sandboxing without sacrificing real execution.
**We already have time-travel.** Git versioning plus SQLite-backed state persistence provides rollback. The Memory system stores all agent state in SQLite with timestamps, giving us the audit trail AgentFS provides.

**Edge/serverless is not our deployment target.** Smart-agent is designed for local-first execution on developer machines, not Cloudflare Workers. Our agents need access to the local filesystem, running processes, and system tools.
When they **would** make sense:
- Multi-tenant SaaS where each user needs isolated execution
- Untrusted code execution (like a coding playground)
- Serverless deployment where OS access is impossible
If you're building on top of smart-agent for these use cases, integrating just-bash + AgentFS through a custom tool is straightforward — see the Custom Tools section.
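A sketch of what that integration could look like, using the Custom Tools API from above. The `just-bash` import and its `Bash`/`run` interface are assumptions for illustration — check the project's real API before using this:

```ts
import { Agent } from "smart-agent"
// Hypothetical import — just-bash's actual exports may differ.
import { Bash } from "just-bash"

const sandbox = new Bash() // assumed constructor

const agent = new Agent({
  model: "gemini-2.5-flash",
  safeMode: true, // keep real exec gated; route commands to the sandbox instead
  tools: [{
    name: "sandbox_exec",
    description: "Run a shell command in a simulated bash (no OS access)",
    parameters: {
      command: { type: "string", description: "Command to simulate", required: true },
    },
    execute: async (params) => {
      // Assumed shape: just-bash evaluating a command to stdout-like output.
      const output = await sandbox.run(params.command)
      return { success: true, output: String(output) }
    },
  }],
  objectives: [/* ... */],
})
```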
## Contributing
Before proposing architectural changes, please review the Why Not just-bash / AgentFS? section above. We've intentionally chosen real OS execution over virtualization. PRs adding sandboxed execution layers should include a compelling use case that can't be solved with the existing process-level isolation.
## License
MIT
