# smart-agent
Autonomous agentic loop with Skills + Objectives for Bun.

```
┌──────────────────────────────────────┐
│          Objectives (WHAT)           │
│  Blackbox validate() functions that  │
│  define success criteria             │
└────────────┬─────────────────────────┘
             │
prompt ──→ LLM ──→ exec tool ──→ check objectives ──→ loop
             ↑                        │
             │                        ↓
     ┌───────┴──────┐          ┌──────────────┐
     │ Skills (CTX) │          │ Met? → done  │
     │ YAML files   │          │ Not? → retry │
     │ teach CLIs   │          └──────────────┘
     └──────────────┘
```

- **Objectives = WHAT to achieve** — blackbox `validate()` functions that return `{ met, reason }`
- **Skills = CONTEXT** — `.md` files with YAML frontmatter the LLM reads to learn available CLIs (`git`, `bun`, `docker`, your project scripts)
- **Tools = HOW to interact** — built-in `exec`, `read_file`, `write_file`, `edit_file`, `search`, `list_dir`
- **`agent.run(prompt)`** — the trigger that kicks off the loop
The agent doesn't "know" git or bun — skills teach it. Validation errors are passed back to the LLM so it knows how to adjust.
## Install

```bash
bun add smart-agent
```

## Quick Start

```ts
import { Agent } from "smart-agent"
const agent = new Agent({
model: "gemini-2.5-flash",
// Skills teach the agent what CLIs are available
skills: ["./skills/bun.md", "./skills/git.md"],
// Objectives define success — blackbox validation
objectives: [{
name: "tests_pass",
description: "All unit tests pass",
validate: (state) => {
const last = state.toolHistory.findLast(t => t.tool === "exec" && t.params.command?.includes("bun test"))
if (!last) return { met: false, reason: "Run 'bun test' first" }
return { met: last.result.success, reason: last.result.success ? "Tests pass" : "Tests fail" }
}
}],
})
// agent.run() is the trigger — skills give it context on how to proceed
for await (const event of agent.run("Fix the failing tests")) {
console.log(event.type, event)
}
```

## Multi-turn Sessions
For chatbot-style interactions, use `Session`. It maintains conversation history and re-plans objectives each turn:

```ts
import { Session } from "smart-agent"
const session = new Session({ model: "gemini-2.5-flash" })
for await (const event of session.send("create a hello world project")) {
if (event.type === "awaiting_confirmation") {
// Objectives are paused — review before proceeding
console.log("Objectives:", event.objectives)
session.confirmObjectives() // or session.rejectObjectives()
}
if (event.type === "complete") {
console.log("Done!")
}
}
// Follow-up — planner adjusts objectives based on context
for await (const event of session.send("now add unit tests")) {
session.confirmObjectives()
}
```

By default, sessions require confirmation before executing (`requireConfirmation: true`). This gives the user a chance to review and approve generated objectives. Disable with `{ requireConfirmation: false }`.
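For a fully autonomous session, a minimal sketch — assuming `requireConfirmation` is passed in the `Session` config alongside `model`:

```ts
import { Session } from "smart-agent"

// Objectives execute as soon as the planner generates them — no
// confirmation step. Use with care outside trusted workflows.
const session = new Session({
  model: "gemini-2.5-flash",
  requireConfirmation: false,
})

for await (const event of session.send("scaffold a hello world project")) {
  if (event.type === "complete") console.log("Done!")
}
```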
## Chatbot Mode — `Agent.plan()`

For one-shot planning without sessions:

```ts
import { Agent } from "smart-agent"
for await (const event of Agent.plan(
"Create a greeting.txt with 'Hello World'",
{ model: "gemini-2.5-flash" }
)) {
if (event.type === "planning") {
console.log("Generated objectives:", event.objectives)
}
if (event.type === "complete") {
console.log("Done!")
}
}
```

The planner analyzes the prompt and creates verifiable objectives from templates (`file_exists`, `file_contains`, `command_succeeds`, `command_output_contains`), then a worker agent executes them.
## Conversation History

Pass a message array instead of a string to provide conversation context:

```ts
for await (const event of agent.run([
{ role: "user", content: "fix the auth tests" },
{ role: "assistant", content: "I'll look at the test files..." },
{ role: "user", content: "focus on login.test.ts" },
])) {
// agent has full conversation context
}
```

## How It Works

```
prompt → LLM → XML response → execute tools → check objectives → loop
```

- Your prompt + system prompt (tools + skills + objectives) go to the LLM
- The LLM responds in XML with tool invocations
- The agent executes the tools and feeds results back
- Objectives are checked — if all `validate()` return `met: true`, the loop ends
- Otherwise, the loop continues until all objectives pass or `maxIterations` is reached
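A sketch of that control flow in plain TypeScript — illustrative only, not the library's internals; `callLLM`, `parseToolCalls`, and `executeTool` are hypothetical stand-ins:

```ts
// Illustrative sketch of the loop above — these helpers and types are
// hypothetical stand-ins, NOT smart-agent exports.
type ToolCall = { name: string; params: Record<string, string> }
type State = { messages: string[]; toolHistory: unknown[]; touchedFiles: Set<string>; iteration: number }
type Objective = { name: string; validate: (s: State) => { met: boolean; reason: string } }

declare function callLLM(messages: string[]): Promise<string>
declare function parseToolCalls(xml: string): ToolCall[]
declare function executeTool(call: ToolCall): Promise<{ success: boolean; output: string }>

async function agentLoop(prompt: string, objectives: Objective[], maxIterations = 20) {
  const state: State = { messages: [prompt], toolHistory: [], touchedFiles: new Set(), iteration: 0 }
  for (; state.iteration < maxIterations; state.iteration++) {
    const xml = await callLLM(state.messages)              // prompt + system prompt → LLM
    for (const call of parseToolCalls(xml)) {              // XML → tool invocations
      const result = await executeTool(call)               // execute the tool
      state.toolHistory.push({ tool: call.name, params: call.params, result })
      state.messages.push(result.output)                   // feed results back
    }
    const checks = objectives.map(o => o.validate(state))  // check objectives
    if (checks.every(c => c.met)) return "complete"        // all met → done
    for (const c of checks) if (!c.met) state.messages.push(c.reason) // retry with reasons
  }
  return "max_iterations"
}
```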
## API

### `new Agent(config)`

```ts
interface AgentConfig {
model: string // LLM model name
objectives?: Objective[] // Goals to achieve (required for run(), optional for plan())
skills?: (string | Skill)[] // .md file paths or inline Skill objects
maxIterations?: number // Default: 20
temperature?: number // Default: 0.3
maxTokens?: number // Default: 8000
cwd?: string // Working directory (default: process.cwd())
toolTimeoutMs?: number // Default: 30000
systemPrompt?: string // Extra system prompt text
tools?: Tool[] // Additional custom tools
signal?: AbortSignal // Cancel the agent loop
noStreaming?: boolean // Disable streaming (returns token usage data)
safeMode?: boolean // Block autonomous exec unless approved via onApproval
onApproval?: (tool, params) => Promise<boolean> | boolean // Interactive approval callback
onToolOutput?: (tool, chunk) => void // Real-time exec output streaming
}
```

## Safe Mode
When `safeMode: true`, the agent cannot run shell commands without approval:

```ts
const agent = new Agent({
model: "gemini-2.5-flash",
safeMode: true,
// Interactive approval — ask the user before running commands
onApproval: async (tool, params) => {
const ok = confirm(`Run "${params.command}"?`)
return ok
},
})
```

Without `onApproval`, safe mode blocks all `exec` calls with an error message.
## Streaming Exec Output

Get real-time command output chunks instead of waiting for completion:

```ts
const agent = new Agent({
model: "gemini-2.5-flash",
onToolOutput: (tool, chunk) => process.stdout.write(chunk),
})
```

### `agent.run(input): AsyncGenerator<AgentEvent>`
Run with predefined objectives. Accepts a string or `Message[]`.
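The `signal` config option cancels a run in flight; a minimal sketch using a standard `AbortController` (the five-second timeout is arbitrary):

```ts
import { Agent } from "smart-agent"

const controller = new AbortController()
setTimeout(() => controller.abort(), 5_000) // cancel the loop after 5s

const agent = new Agent({
  model: "gemini-2.5-flash",
  objectives: [/* ... */],
  signal: controller.signal,
})

for await (const event of agent.run("Fix the failing tests")) {
  if (event.type === "cancelled") console.log("Run aborted")
}
```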
### `Agent.plan(input, config): AsyncGenerator<AgentEvent>`

Dynamic mode — the planner generates objectives from the prompt, then a worker agent executes them.
## Events
| Event | When |
|-------|------|
| `planning` | Planner generated objectives |
| `awaiting_confirmation` | Waiting for user to confirm objectives (Session only) |
| `iteration_start` | Loop iteration begins |
| `thinking` / `thinking_delta` | LLM explains what it's doing (delta = streaming chunks) |
| `tool_start` / `tool_result` | Tool execution |
| `tool_output_delta` | Real-time exec output chunk (when `onToolOutput` is set) |
| `approval_required` | Safe mode asked the user for approval |
| `objective_check` | Objectives validated |
| `usage` | Token usage data (when `noStreaming` is set) |
| `complete` | All objectives met |
| `error` | Something failed (agent recovers) |
| `cancelled` | Aborted via `signal` |
| `max_iterations` | Gave up |
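One way to consume these events — the `type` values come from the table above; payload fields beyond `type` aren't documented here, so this sketch just logs whole events:

```ts
for await (const event of agent.run("Fix the failing tests")) {
  switch (event.type) {
    case "tool_start":
      console.log("tool starting:", event)        // inspect the payload shape
      break
    case "objective_check":
      console.log("objectives validated:", event)
      break
    case "complete":
      console.log("All objectives met")
      break
    case "error":
      console.error("Recoverable error:", event)  // the agent keeps going
      break
    case "max_iterations":
      console.error("Gave up after maxIterations")
      break
  }
}
```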
## Built-in Tools

| Tool | Description |
|------|-------------|
| `read_file` | Read file contents |
| `write_file` | Create/overwrite a file |
| `edit_file` | Find-and-replace in a file |
| `exec` | Run shell commands (streams output via `onToolOutput`) |
| `list_dir` | List directory contents (recursive) |
| `search` | Search for text patterns across files |
**Parallel execution:** read-only tools (`read_file`, `list_dir`, `search`) run concurrently via `Promise.all`. Write tools (`write_file`, `edit_file`, `exec`) run sequentially to preserve ordering.
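An illustrative sketch of that scheduling policy (not the library's internals; `runTool` is a hypothetical executor):

```ts
type Call = { tool: string; params: Record<string, string> }

const READ_ONLY = new Set(["read_file", "list_dir", "search"])

// Hypothetical executor — stands in for the agent's real tool runner.
declare function runTool(call: Call): Promise<unknown>

async function executeBatch(calls: Call[]) {
  // Read-only calls are side-effect free, so they can fan out together.
  const reads = calls.filter(c => READ_ONLY.has(c.tool))
  const writes = calls.filter(c => !READ_ONLY.has(c.tool))

  const results = await Promise.all(reads.map(runTool))

  // Writes (write_file, edit_file, exec) mutate state — run them in order.
  for (const call of writes) {
    results.push(await runTool(call))
  }
  return results
}
```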
## Custom Tools

Add your own tools via the `tools` config:

```ts
const agent = new Agent({
model: "gemini-2.5-flash",
tools: [{
name: "deploy",
description: "Deploy the app to production",
parameters: {
env: { type: "string", description: "Target environment", required: true },
},
execute: async (params) => {
// your deployment logic
return { success: true, output: `Deployed to ${params.env}` }
},
}],
objectives: [/* ... */],
})
```

## Skills
Skills are `.md` files with YAML frontmatter describing CLI tools. They're injected into the system prompt so the LLM knows how to use them via `exec`.

````md
---
name: git
description: Git version control — staging, committing, branching, and history
---
# Git
## Commands
### commit
Create a commit with a message.
```bash
git commit -m "{message}"
```

- message: Commit message
````

```ts
const agent = new Agent({
model: "gemini-3-flash-preview",
skills: ["./skills/git.md", "./skills/docker.md"],
objectives: [/* ... */],
})
```

Built-in skills included: `git`, `docker`, `bun`, `npm`.
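Skills can also be passed inline instead of as file paths (the config accepts `(string | Skill)[]`). A sketch assuming the inline `Skill` shape mirrors the file format's frontmatter plus markdown body — the exact field names aren't documented here:

```ts
import { Agent } from "smart-agent"

// Hypothetical inline Skill — the field names (name/description/content)
// are assumed to mirror the .md frontmatter + body, not confirmed API.
const deploySkill = {
  name: "deploy-cli",
  description: "Project deploy script — build and ship via bun run deploy",
  content: `
## Commands
### deploy
\`\`\`bash
bun run deploy --env {env}
\`\`\`
- env: Target environment (staging | production)
`,
}

const agent = new Agent({
  model: "gemini-2.5-flash",
  skills: ["./skills/git.md", deploySkill],
  objectives: [/* ... */],
})
```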
## Objectives

Each objective has a `validate(state)` function that checks whether the goal is met:

```ts
{
name: "tests_pass",
description: "All unit tests pass",
validate: (state) => {
const lastExec = state.toolHistory.findLast(t => t.tool === "exec")
return {
met: lastExec?.result.success === true,
reason: lastExec ? "Tests passed" : "No tests run yet"
}
}
}
```

The `state` object contains:
- `messages` — full conversation history
- `toolHistory` — all tool calls and results
- `touchedFiles` — set of files modified
- `iteration` — current iteration number
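For example, `touchedFiles` makes it easy to assert that the agent actually modified a specific file — a small sketch built only from the documented state fields:

```ts
// An objective that passes only once the agent has edited src/auth.ts.
const touchedAuth = {
  name: "auth_file_edited",
  description: "src/auth.ts was modified",
  validate: (state) => ({
    met: state.touchedFiles.has("src/auth.ts"),
    reason: state.touchedFiles.has("src/auth.ts")
      ? "src/auth.ts was modified"
      : `Not touched yet (iteration ${state.iteration})`,
  }),
}
```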
## Objective Templates

When using `Agent.plan()`, the planner generates objectives from these templates:

| Template | Params | Checks |
|----------|--------|--------|
| `file_exists` | `path`, `contains?` | File exists (optionally with content) |
| `file_contains` | `path`, `text` | File contains specific text |
| `command_succeeds` | `command` | Command exits with code 0 |
| `command_output_contains` | `command`, `text` | Command output contains text |
| `custom_check` | `check` | Generic fallback |
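For instance, a prompt like "Create greeting.txt with 'Hello World'" might yield a `file_contains` objective. The shape below is an assumption for illustration — the planner's actual output format isn't shown here:

```ts
// Hypothetical planner output using the file_contains template —
// illustrative shape only, not captured from a real run.
const generated = {
  name: "greeting_contains_hello",
  description: "greeting.txt contains 'Hello World'",
  template: "file_contains",
  params: { path: "greeting.txt", text: "Hello World" },
}
```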
## LLM Support

All LLM communication is handled by jsx-ai, which provides provider routing, streaming, and retry logic.

| Provider | Models | Env Var |
|----------|--------|---------|
| Google | `gemini-*` | `GEMINI_API_KEY` or `GOOGLE_API_KEY` |
| Anthropic | `claude-*` | `ANTHROPIC_API_KEY` |
| DeepSeek | `deepseek-*` | `DEEPSEEK_API_KEY` |
| OpenAI | `gpt-*`, `o3-*`, `o4-*` | `OPENAI_API_KEY` |
| Any | Other models | `OPENAI_API_KEY` + `OPENAI_BASE_URL` |

Unknown models fall back to the OpenAI-compatible `/chat/completions` API using `OPENAI_BASE_URL`.
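A sketch of routing an unrecognized model to a local OpenAI-compatible server — the URL and model name are placeholders, and setting the variables in your shell works just as well:

```ts
import { Agent } from "smart-agent"

// Placeholders: any OpenAI-compatible server (llama.cpp, vLLM, Ollama, ...)
process.env.OPENAI_BASE_URL = "http://localhost:8080/v1"
process.env.OPENAI_API_KEY = "sk-placeholder"

const agent = new Agent({
  model: "my-local-model", // unknown prefix → OpenAI-compatible fallback
  objectives: [/* ... */],
})
```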
## Examples

Run any example with `bun run examples/<name>.ts`:
| Example | What it does |
|---------|--------------|
| `skill-driven` | ⭐ The canonical pattern — skills provide CLI context, agent fixes lint/format/type errors |
| `code-review` | Finds and fixes bugs in a deliberately broken file |
| `refactor` | Splits a monolithic file into clean modules |
| `api-gen` | Generates a REST API from a spec, writes tests, verifies them |
| `session` | Multi-turn Session with objective confirmation/rejection |
| `custom-tools` | Extends the agent with `http_get` and `json_transform` tools |
| `scaffold` | Multi-objective — creates a project, writes tests, makes them pass |
| `planner` | `Agent.plan()` — generates objectives from natural language |
| `hello` | Creates a file — simplest possible agent |
```bash
export GEMINI_API_KEY=your-key
bun run examples/code-review.ts
```

## Architecture
### Memory System

Agents can persist structured data via the Memory system — a key-value store backed by SQLite. Keys support dot-notation prefixes for grouping (e.g. `users.john`, `config.theme`).

```ts
// In agent-generated scripts:
const STATE_URL = process.env.STATE_URL // injected by scheduler
// Store data
await fetch(STATE_URL, {
method: 'POST',
body: JSON.stringify({ agentId: 1, key: 'users.john', value: JSON.stringify({ name: 'John', age: 30 }) })
})
// Retrieve data
const res = await fetch(`${STATE_URL}?agentId=1&key=users.john`)
const { value } = await res.json()
```

The Memory tab in the UI groups entries by prefix, renders JSON values with smart previews, and supports inline deletion.
### Plugin System

Plugins extend smart-agent with external integrations. Each plugin:

- **Installs as an npm package** — `bun add geeksy-telegram-plugin`
- **Registers skills** — `.md` files describing new capabilities
- **Requires configuration** — API keys, auth tokens, etc.
- **Runs as a separate process** — managed by BGR in its own process group
- **Can expose its own UI** — plugins may serve their own web interface

```
┌──────────────────────────────────────────────────┐
│ geeksy (main)                :3737               │
│ ├── Agent workspace          /                   │
│ ├── Memory API               /api/agent-state    │
│ └── Plugin registry          /api/plugins        │
├──────────────────────────────────────────────────┤
│ telegram-plugin              :3738 (bgrun group) │
│ ├── MTProto auth             /auth               │
│ ├── Message polling → writes to shared SQLite    │
│ └── Plugin UI                /                   │
├──────────────────────────────────────────────────┤
│ future-plugin                :3739 (bgrun group) │
│ └── ...                                          │
└──────────────────────────────────────────────────┘
```

**First plugin: Telegram** — the user authenticates their Telegram account (MTProto via GramJS), and the plugin polls chosen channels and writes messages into the shared SQLite database. Agents access this data through the Memory tab, and via skills that let generated code send messages during prompt execution or scheduled scripts.
## Why Not just-bash / AgentFS?
These are compelling projects that we evaluated for smart-agent:
**just-bash** — a JavaScript bash interpreter that simulates shell commands as pure functions, without OS access. It would allow sandboxed command execution without Docker.

**AgentFS** — a virtual filesystem backed by SQLite, giving agents file operations with built-in time-travel (rollback to any point).
Why we chose not to adopt them:
**Our agents need real OS access.** Smart-agent's core value is executing real commands (`bun test`, `git commit`, `docker build`) on real files. Simulated bash and virtual filesystems would break this — agents couldn't install packages, run test suites, or interact with actual project code.

**Isolation is solved differently.** Instead of virtualizing the OS layer, we isolate at the process level (BGR groups, separate working directories) and the permission level (per-agent skill scoping). This gives us sandboxing without sacrificing real execution.
**We already have time-travel.** Git versioning plus SQLite-backed state persistence provides rollback. The Memory system stores all agent state in SQLite with timestamps, giving us the audit trail AgentFS provides.

**Edge/serverless is not our deployment target.** Smart-agent is designed for local-first execution on developer machines, not Cloudflare Workers. Our agents need access to the local filesystem, running processes, and system tools.
When they **would** make sense:
- Multi-tenant SaaS where each user needs isolated execution
- Untrusted code execution (like a coding playground)
- Serverless deployment where OS access is impossible
If you're building on top of smart-agent for these use cases, integrating just-bash + AgentFS through a custom tool is straightforward — see the Custom Tools section.
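A sketch of what that integration could look like, using the Custom Tools API from above. The `just-bash` import and its `Bash`/`run` interface are assumptions for illustration — check the project's real API before using this:

```ts
import { Agent } from "smart-agent"
// Hypothetical import — just-bash's actual exports may differ.
import { Bash } from "just-bash"

const sandbox = new Bash() // assumed constructor

const agent = new Agent({
  model: "gemini-2.5-flash",
  safeMode: true, // keep real exec gated; route commands to the sandbox instead
  tools: [{
    name: "sandbox_exec",
    description: "Run a shell command in a simulated bash (no OS access)",
    parameters: {
      command: { type: "string", description: "Command to simulate", required: true },
    },
    execute: async (params) => {
      // Assumed shape: just-bash evaluating a command to stdout-like output.
      const output = await sandbox.run(params.command)
      return { success: true, output: String(output) }
    },
  }],
  objectives: [/* ... */],
})
```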
## Contributing
Before proposing architectural changes, please review the Why Not just-bash / AgentFS? section above. We've intentionally chosen real OS execution over virtualization. PRs adding sandboxed execution layers should include a compelling use case that can't be solved with the existing process-level isolation.
## License
MIT
