@elfenlabs/nous v0.8.1
🧠 Nous
A minimal agent SDK for TypeScript. Four primitives, zero opinions on your LLM provider.
npm install @elfenlabs/nous

Quick Start
import { createContext, createTool, createOpenAIProvider, runAgent } from '@elfenlabs/nous'
// Define a tool
const getWeather = createTool({
id: 'get_weather',
description: 'Get the current weather for a city',
schema: {
city: { type: 'string', description: 'The city name' },
},
execute: async (args) => {
const { city } = args as { city: string }
return { city, temp: 22, condition: 'sunny' }
},
})
// Create context and provider
const ctx = createContext()
ctx.push("What's the weather in Tokyo?")
const provider = createOpenAIProvider('https://api.openai.com', 'gpt-4o', {
apiKey: process.env.OPENAI_API_KEY,
})
// Run the agent
const result = await runAgent({
ctx,
provider,
instruction: 'You are a helpful assistant. Use tools when needed.',
tools: [getWeather],
})
console.log(result.response) // "The weather in Tokyo is 22°C and sunny."
console.log(result.steps) // 2
console.log(result.usage) // { promptTokens, completionTokens, totalTokens }

Primitives
| Primitive | What it is |
|---|---|
| Context | Append-only message chain. You push messages in, the agent loop reads them out. |
| Tool | Schema + execute function. The agent calls tools automatically based on model output. |
| Provider | LLM backend interface. A single generate() method — implement it for any API. |
| Agent | The loop. Calls the provider, executes tool calls, repeats until the model responds with text. |
Context
An ordered Message[] chain. Push strings (become user messages) or full Message objects.
import { createContext } from '@elfenlabs/nous'
const ctx = createContext()
// Strings become user messages
ctx.push('What is 2 + 2?')
// Full messages for other roles
ctx.push({ role: 'system', content: 'You are a math tutor.' })
// Read messages
ctx.messages // readonly Message[]
// Serialize / restore
const snapshot = ctx.serialize()
const restored = createContext({ from: snapshot })
// Fork — zero-copy child context (see Sub-Agent Composition)
const child = ctx.fork()
child.push('This message only exists in the child')
child.messages // [...parent messages, child messages]
ctx.messages // unchanged — parent is not affected

Message Shape
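The fork semantics can be sketched with a toy context (an illustrative assumption, not the library's internals): the child holds the parent's messages as a read-only prefix and appends only to its own array.

```typescript
// Toy context illustrating fork semantics (not the real implementation):
// the child keeps the parent's messages as a prefix and appends only to
// its own array, so the parent never sees child pushes.
type Msg = { role: string; content: string }

function createMiniContext(prefix: readonly Msg[] = []) {
  const own: Msg[] = []
  const api = {
    push(content: string) {
      own.push({ role: 'user', content })
    },
    get messages(): readonly Msg[] {
      return [...prefix, ...own]
    },
    fork() {
      // the child takes the parent's combined view as its prefix
      return createMiniContext(api.messages)
    },
  }
  return api
}

const parent = createMiniContext()
parent.push('hello')
const child = parent.fork()
child.push('child-only')
console.log(parent.messages.length) // 1
console.log(child.messages.length) // 2
```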
type Message = {
role: 'system' | 'user' | 'assistant' | 'tool'
content: string
reasoning?: string // chain-of-thought from reasoning models
toolCallId?: string // links tool results back to the call
toolCalls?: ToolCallRequest[] // tool calls requested by the model
pinned?: boolean // protected from eviction (see Context Window Management)
}

Tool
A tool is an id, a description, a schema, and an execute function.
import { createTool } from '@elfenlabs/nous'
const calculator = createTool({
id: 'calculator',
description: 'Evaluate a math expression',
schema: {
expression: { type: 'string', description: 'The expression to evaluate', required: true },
},
execute: async (args) => {
const { expression } = args as { expression: string }
// Caution: eval is unsafe with untrusted input; shown here for brevity only
return { result: eval(expression) }
},
})
// The .spec property gives you the wire format for provider APIs
calculator.spec // { name, description, parameters }

Parameter Types
type ToolParameter = {
type: 'string' | 'number' | 'boolean' | 'object' | 'array'
description: string
required?: boolean // default: true
properties?: Record<string, ToolParameter> // for 'object'
items?: ToolParameter // for 'array'
enum?: string[] // for 'string'
}

Output Truncation
Tool results are truncated to prevent context blowup. Set a default limit on the agent, or override per-tool:
const result = await runAgent({
ctx,
provider,
instruction: '...',
tools: [myTool],
defaultMaxOutputChars: 10_000, // default limit for all tools
})
// Per-tool override
const bigOutputTool = createTool({
id: 'read_file',
description: 'Read a file',
schema: { path: { type: 'string', description: 'File path' } },
maxOutputChars: 50_000, // this tool gets a larger limit
execute: async (args) => fs.readFileSync((args as { path: string }).path, 'utf8'),
})

Priority: per-tool maxOutputChars > agent defaultMaxOutputChars > built-in default (10,000 chars).
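This precedence can be sketched as a small resolver (an assumption about the behavior, not the library's source; names mirror the documented options):

```typescript
// Resolve the effective output limit per the documented precedence:
// per-tool maxOutputChars > agent defaultMaxOutputChars > built-in 10,000.
function resolveMaxOutputChars(toolMax?: number, agentDefault?: number): number {
  return toolMax ?? agentDefault ?? 10_000
}

// Truncation itself could then be a simple slice.
function truncateOutput(output: string, max: number): string {
  return output.length > max ? output.slice(0, max) : output
}

console.log(resolveMaxOutputChars(50_000, 10_000)) // 50000
console.log(resolveMaxOutputChars(undefined, 8_000)) // 8000
console.log(resolveMaxOutputChars()) // 10000
```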
Agent Loop
runAgent calls the provider in a loop, executing tool calls until the model responds with text only.
import { runAgent } from '@elfenlabs/nous'
const result = await runAgent({
ctx, // Context — the conversation so far
provider, // Provider — any LLM backend
instruction: '...', // system prompt (prepended to every call)
tools: [tool1, tool2], // available tools
// Limits
maxSteps: 50, // default: 50
signal: abortController.signal,
// Context window management (see below)
maxContextTokens: 100_000,
evictionStrategy: new SlidingWindowStrategy(),
tokenCounter: (text) => text.length / 4, // default
// Streaming callbacks
onThinkingStart: () => {},
onThinking: (chunk) => {}, // reasoning tokens (dim/hidden)
onThinkingEnd: () => {},
onOutputStart: () => {},
onOutput: (chunk) => {}, // content tokens (visible)
onOutputEnd: () => {},
// Tool lifecycle hooks
onBeforeToolCall: async (tool, args) => {
// return false to block the call
},
onAfterToolCall: (tool, args, result) => {},
})
result.response // final text response
result.steps // number of provider calls made
result.usage // { promptTokens, completionTokens, totalTokens }

Streaming
The streaming callbacks follow a lifecycle: onThinkingStart → onThinking (repeated) → onThinkingEnd → onOutputStart → onOutput (repeated) → onOutputEnd. Transitions are managed automatically — thinking ends when content begins.
const result = await runAgent({
ctx,
provider,
instruction: 'You are a helpful assistant.',
tools,
onThinkingStart: () => process.stdout.write('\x1b[2m'), // dim
onThinking: (chunk) => process.stdout.write(chunk),
onThinkingEnd: () => process.stdout.write('\x1b[0m\n'), // reset
onOutputStart: () => {},
onOutput: (chunk) => process.stdout.write(chunk),
onOutputEnd: () => process.stdout.write('\n'),
})

Streaming callbacks fire during each provider call. When the model makes tool calls, onOutputEnd fires before tool execution, and new onOutputStart/onOutput events fire on the next iteration.
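The automatic lifecycle transitions can be sketched as a tiny state machine (an illustrative assumption, not the library's code): thinking ends as soon as the first content chunk arrives.

```typescript
// Sketch of the streaming lifecycle: drive the six callbacks from raw
// reasoning/content chunks, closing the thinking phase automatically
// when content begins.
type Callbacks = {
  onThinkingStart?: () => void
  onThinking?: (chunk: string) => void
  onThinkingEnd?: () => void
  onOutputStart?: () => void
  onOutput?: (chunk: string) => void
  onOutputEnd?: () => void
}

function makeLifecycle(cb: Callbacks) {
  let phase: 'idle' | 'thinking' | 'output' = 'idle'
  return {
    reasoning(chunk: string) {
      if (phase === 'idle') { cb.onThinkingStart?.(); phase = 'thinking' }
      if (phase === 'thinking') cb.onThinking?.(chunk)
    },
    content(chunk: string) {
      if (phase === 'thinking') cb.onThinkingEnd?.()
      if (phase !== 'output') { cb.onOutputStart?.(); phase = 'output' }
      cb.onOutput?.(chunk)
    },
    end() { if (phase === 'output') cb.onOutputEnd?.(); phase = 'idle' },
  }
}

const events: string[] = []
const lc = makeLifecycle({
  onThinkingStart: () => events.push('TS'),
  onThinking: () => events.push('T'),
  onThinkingEnd: () => events.push('TE'),
  onOutputStart: () => events.push('OS'),
  onOutput: () => events.push('O'),
  onOutputEnd: () => events.push('OE'),
})
lc.reasoning('hmm'); lc.content('Hello'); lc.content(' world'); lc.end()
console.log(events.join(',')) // TS,T,TE,OS,O,O,OE
```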
How the Loop Works
┌─────────────────────────────────────────────┐
│ system prompt + ctx.messages → provider │
│ ↓ │
│ ┌─ tool calls? ──────────────────────────┐ │
│ │ YES → execute tools → push results │ │
│ │ → loop back to provider │ │
│ ├─ text content? ────────────────────────┤ │
│ │ YES → push assistant message → return │ │
│ ├─ reasoning only? ──────────────────────┤ │
│ │ YES → push reasoning → loop │ │
│ └────────────────────────────────────────┘ │
└─────────────────────────────────────────────┘

Each iteration, the provider returns a GenerateResult with optional content, reasoning, and toolCalls. The agent evaluates them in priority order:
Case 1 — Tool Calls: The model returned toolCalls (content and reasoning may also be present). The agent pushes the assistant message to context, then executes each tool call. Within this case, each individual call is handled as one of:
- Parse error — model produced invalid JSON arguments → push an error result asking the model to retry
- Unknown tool — model hallucinated a tool name → push an error result
- Blocked — onBeforeToolCall hook returned false → push a "blocked" result
- Success — run tool.execute(), truncate output to maxOutputChars, push result
- Exception — tool threw an error → catch it, push the error message as result
After all tool calls are processed, the loop continues back to the provider.
Case 2 — Content only: The model returned content with no tool calls. This is the only case that exits the loop. The agent pushes the final assistant message and returns AgentResult.
Case 3 — Reasoning only: The model returned only reasoning (a think block) with no content or tool calls. This happens with reasoning models that sometimes emit a think step before acting. The agent pushes an assistant message with empty content and loops again.
Case 4 — Empty response: No content, reasoning, or tool calls. The agent throws an error.
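The four cases can be sketched as a dispatcher over GenerateResult (illustrative only — the real loop also streams, manages the token budget, and runs hooks):

```typescript
// Classify a provider result into the four documented cases, in priority
// order: tool calls > content > reasoning-only > empty.
type GenerateResult = {
  content?: string
  reasoning?: string
  toolCalls?: { name: string }[]
}

function classify(r: GenerateResult): 'tools' | 'final' | 'reason' | 'empty' {
  if (r.toolCalls && r.toolCalls.length > 0) return 'tools' // Case 1: keep looping
  if (r.content) return 'final'                             // Case 2: exit the loop
  if (r.reasoning) return 'reason'                          // Case 3: push and loop
  return 'empty'                                            // Case 4: throw
}

console.log(classify({ toolCalls: [{ name: 'get_weather' }] })) // "tools"
console.log(classify({ content: 'Done.' })) // "final"
console.log(classify({ reasoning: 'thinking...' })) // "reason"
console.log(classify({})) // "empty"
```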
Provider
The Provider interface is a single method. Implement it for any LLM backend.
interface Provider {
generate(params: {
messages: Message[]
tools?: ToolSpec[]
signal?: AbortSignal
stream?: StreamCallbacks
}): Promise<GenerateResult>
}
type GenerateResult = {
content?: string
reasoning?: string
toolCalls?: ToolCallRequest[]
usage?: Usage
}
type StreamCallbacks = {
onReasoning?: (chunk: string) => void
onContent?: (chunk: string) => void
}

Built-in: OpenAI-Compatible Provider
Works with OpenAI, vLLM, OpenRouter, Ollama, LiteLLM, and any OpenAI-compatible API. Supports streaming (SSE) with reasoning model support (reasoning_content).
import { createOpenAIProvider } from '@elfenlabs/nous'
// OpenAI
const openai = createOpenAIProvider('https://api.openai.com', 'gpt-4o', {
apiKey: process.env.OPENAI_API_KEY,
})
// Local vLLM
const vllm = createOpenAIProvider('http://localhost:8000', 'my-model')
// OpenRouter
const openrouter = createOpenAIProvider('https://openrouter.ai/api', 'anthropic/claude-sonnet-4.5', {
apiKey: process.env.OPENROUTER_API_KEY,
temperature: 0.2,
})

Sub-Agent Composition
Agents are just functions. Wrap runAgent inside a tool to create sub-agents.
Isolated Context
Use createContext() when the sub-agent doesn't need the parent conversation. Only the final answer bubbles up — no internal noise leaks into the parent.
const searchOrders = createTool({
id: 'search_orders',
description: 'Search through paginated orders to find a match',
schema: {
query: { type: 'string', description: 'What to search for' },
},
execute: async (args) => {
const { query } = args as { query: string }
// Sub-agent gets its own isolated context
const subCtx = createContext()
subCtx.push(`Find: ${query}`)
const fetchPage = createTool({
id: 'fetch_page',
description: 'Fetch a page of orders',
schema: { page: { type: 'number', description: 'Page number' } },
execute: async (a) => api.getOrders((a as { page: number }).page),
})
const result = await runAgent({
ctx: subCtx,
provider,
instruction: 'Search through pages until you find the item or exhaust all pages.',
tools: [fetchPage],
maxSteps: 20,
})
return result.response
},
})

Forked Context
Use ctx.fork() when the sub-agent needs the full parent conversation to do its job. The child sees all parent messages as a read-only prefix and appends only to its own array — tool call noise stays in the fork.
const deepAnalysis = createTool({
id: 'deep_analysis',
description: 'Perform deep analysis using the full conversation context',
schema: {
focus: { type: 'string', description: 'What aspect to analyze' },
},
execute: async (args, ctx) => {
const { focus } = args as { focus: string }
// Fork inherits the entire parent conversation (zero-copy)
const forkedCtx = ctx.fork()
forkedCtx.push(`Analyze the conversation so far, focusing on: ${focus}`)
const result = await runAgent({
ctx: forkedCtx,
provider,
instruction: 'You are an analyst. Use the conversation history to provide insights.',
tools: [searchDatabase, runQuery],
maxSteps: 15,
})
// Sub-agent's tool calls and intermediate steps stay in the fork
// Only the final answer returns to the parent
return result.response
},
})

Context Window Management
LLM APIs have context limits. When the conversation exceeds the limit, some providers return a 400 error — others silently truncate from the beginning, evicting your system prompt first. Nous prevents this with automatic compaction.
Automatic Compaction
Pass an evictionStrategy to runAgent and Nous will compact the context before every generate() call:
import { runAgent, SlidingWindowStrategy } from '@elfenlabs/nous'
const result = await runAgent({
ctx,
provider,
instruction: 'You are a helpful assistant.',
tools: [myTool],
maxContextTokens: 100_000, // token budget (default: 100k)
evictionStrategy: new SlidingWindowStrategy(), // enable auto-compaction
})

SlidingWindowStrategy evicts the oldest non-pinned messages first. Tool call groups (assistant message + tool results) are always evicted as a unit to maintain structural integrity.
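The eviction policy can be sketched as follows (an assumption, not the library's source; for simplicity this sketch treats messages individually and skips the tool-call-group rule):

```typescript
// Drop the oldest non-pinned messages until the token estimate fits the
// budget; pinned messages are always skipped.
type M = { content: string; pinned?: boolean }

function slidingWindowCompact(
  msgs: M[],
  budget: number,
  count: (text: string) => number,
): M[] {
  const kept = [...msgs]
  let total = kept.reduce((n, m) => n + count(m.content), 0)
  for (let i = 0; i < kept.length && total > budget; ) {
    if (kept[i].pinned) { i++; continue } // never evict pinned messages
    total -= count(kept[i].content)
    kept.splice(i, 1) // evict oldest non-pinned first
  }
  return kept
}

const counter = (text: string) => text.length / 4
const msgs: M[] = [
  { content: 'x'.repeat(400) },                  // ~100 tokens, evictable
  { content: 'spec'.repeat(100), pinned: true }, // ~100 tokens, pinned
  { content: 'y'.repeat(400) },                  // ~100 tokens, evictable
]
const out = slidingWindowCompact(msgs, 220, counter)
// the oldest evictable message is dropped; the pinned one survives
console.log(out.length) // 2
```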
Pinning Messages
Pin critical messages to protect them from eviction:
const ctx = createContext()
ctx.push({ role: 'user', content: 'Project spec: build a CLI tool that...' })
ctx.pin(-1) // protect from eviction (-1 = last pushed)
ctx.push({ role: 'user', content: 'Also, here are the requirements...' })
ctx.pin(-1)
// Later, if needed:
ctx.unpin(0) // remove protection

Pinned messages are never evicted. The system prompt (passed as instruction) and tool definitions are always protected automatically — they're budgeted as fixed costs.
Custom Token Counter
The default token estimator uses text.length / 4 (~3.5–4 chars per token for English). For precise counting:
import { get_encoding } from 'tiktoken'
const enc = get_encoding('cl100k_base')
const result = await runAgent({
// ...
tokenCounter: (text) => enc.encode(text).length,
})

On-Demand Compaction
Strategies can also be called directly — by the host app, a tool, or any caller:
import { SlidingWindowStrategy } from '@elfenlabs/nous'
const strategy = new SlidingWindowStrategy()
const tokenCounter = (text: string) => text.length / 4
// Proactive compaction at 50% to fight context rot
strategy.compact(ctx, maxTokens * 0.5, tokenCounter)

Custom Strategies
Implement the EvictionStrategy interface for custom behavior:
import type { EvictionStrategy, TokenCounter } from '@elfenlabs/nous'
import type { Context } from '@elfenlabs/nous'
class SummarizingStrategy implements EvictionStrategy {
compact(ctx: Context, budgetTokens: number, tokenCounter: TokenCounter): void {
// Your logic: summarize old messages, evict, push summary, etc.
}
}

Error Handling
Thrown Errors
These errors propagate to the caller and must be caught:
import { MaxStepsError, AgentAbortError, ContextBudgetError } from '@elfenlabs/nous'
try {
await runAgent({ ctx, provider, instruction: '...', tools, maxSteps: 10 })
} catch (err) {
if (err instanceof MaxStepsError) {
// Agent exceeded step limit
}
if (err instanceof AgentAbortError) {
// AbortSignal was triggered
}
if (err instanceof ContextBudgetError) {
// Fixed context (system prompt + tools + pinned) exceeds maxContextTokens
}
}

Auto-Recovery
These errors are handled internally — the agent feeds them back to the model as tool role messages, giving the model a chance to self-correct:
- Unknown tool — model called a tool that doesn't exist (e.g., hallucinated name)
- Malformed arguments — model produced invalid JSON for tool arguments
- Tool exception — tool.execute() threw an error
- Blocked call — onBeforeToolCall hook returned false
The model sees the error in its context and can retry with corrected arguments, use a different tool, or respond with text instead. This keeps the agent loop resilient without requiring manual error handling for common LLM mistakes.
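The recovery pattern itself is simple to picture (a hedged sketch with assumed shapes, not the library's internals): a failing tool call becomes a tool-role message instead of a thrown error, so the result lands in context where the model can read it.

```typescript
// Wrap a tool execution so failures become tool-role messages rather than
// exceptions, letting the model see the error and self-correct.
type ToolMsg = { role: 'tool'; content: string; toolCallId: string }

async function safeExecute(
  toolCallId: string,
  exec: () => Promise<unknown>,
): Promise<ToolMsg> {
  try {
    const value = await exec()
    return { role: 'tool', content: JSON.stringify(value), toolCallId }
  } catch (err) {
    // the error text goes back to the model instead of crashing the loop
    return { role: 'tool', content: `Error: ${(err as Error).message}`, toolCallId }
  }
}

const ok = await safeExecute('call_1', async () => ({ temp: 22 }))
const failed = await safeExecute('call_2', async () => { throw new Error('boom') })
console.log(ok.content) // {"temp":22}
console.log(failed.content) // Error: boom
```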
License
MIT
