
@illuma-ai/agents

v1.0.90 · 912 downloads · Illuma AI Agents Library

Readme

Illuma Agents

Enterprise-grade TypeScript library for building and orchestrating LLM-powered agents.

Built on LangChain and LangGraph, Illuma Agents provides multi-agent orchestration, real-time streaming, tool integration, prompt caching, extended thinking, and structured output — supporting 12+ LLM providers out of the box.


Features

  • Multi-Agent Orchestration — Handoff, sequential, parallel (fan-out/fan-in), conditional, and hybrid agent flows
  • 12+ LLM Providers — OpenAI, Anthropic, AWS Bedrock, Google Gemini, Vertex AI, Azure OpenAI, Mistral, DeepSeek, xAI, OpenRouter, Moonshot
  • Streaming-First — Real-time token streaming with split-stream buffering and content aggregation
  • Built-in Tools — Code execution (12+ languages), calculator, web search, browser automation, programmatic tool calling
  • Prompt Caching — Anthropic and Bedrock cache control for reduced latency and cost
  • Extended Thinking — Anthropic/Bedrock thinking blocks with proper tool-call sequencing
  • Structured Output — JSON schema-constrained responses via tool calling, provider-native, or auto mode
  • Dynamic Tool Discovery — BM25-ranked tool search for large tool registries (MCP servers)
  • Context Management — Automatic message pruning, token counting, and context window optimization
  • Observability — Langfuse + OpenTelemetry tracing
  • Dual Module Output — ESM + CJS with full TypeScript declarations

Installation

npm install @illuma-ai/agents

Peer Dependencies

The library requires @langchain/core as a peer dependency. If not already installed:

npm install @langchain/core

Environment Variables

Set API keys for the providers you plan to use:

# LLM Providers (add the ones you need)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_REGION=us-east-1
GOOGLE_API_KEY=...
AZURE_OPENAI_API_KEY=...
DEEPSEEK_API_KEY=...
XAI_API_KEY=...
MISTRAL_API_KEY=...
OPENROUTER_API_KEY=...

# Code Executor (optional)
CODE_EXECUTOR_BASEURL=http://localhost:8088
CODE_EXECUTOR_API_KEY=your-api-key

# Observability (optional)
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_BASE_URL=https://cloud.langfuse.com

Quick Start

Single Agent

import { HumanMessage } from '@langchain/core/messages';
import {
  Run,
  ChatModelStreamHandler,
  createContentAggregator,
  ToolEndHandler,
  ModelEndHandler,
  GraphEvents,
  Providers,
} from '@illuma-ai/agents';
import type * as t from '@illuma-ai/agents';

const { contentParts, aggregateContent } = createContentAggregator();

const run = await Run.create<t.IState>({
  runId: 'my-run-001',
  graphConfig: {
    type: 'standard',
    llmConfig: {
      provider: Providers.ANTHROPIC,
      model: 'claude-sonnet-4-20250514',
      apiKey: process.env.ANTHROPIC_API_KEY,
    },
    instructions: 'You are a helpful AI assistant.',
  },
  returnContent: true,
  customHandlers: {
    [GraphEvents.TOOL_END]: new ToolEndHandler(),
    [GraphEvents.CHAT_MODEL_END]: new ModelEndHandler(),
    [GraphEvents.CHAT_MODEL_STREAM]: new ChatModelStreamHandler(),
    [GraphEvents.ON_RUN_STEP]: {
      handle: (event: string, data: t.RunStep) => aggregateContent({ event, data }),
    },
    [GraphEvents.ON_RUN_STEP_DELTA]: {
      handle: (event: string, data: t.RunStepDeltaEvent) => aggregateContent({ event, data }),
    },
    [GraphEvents.ON_MESSAGE_DELTA]: {
      handle: (event: string, data: t.MessageDeltaEvent) => aggregateContent({ event, data }),
    },
  },
});

const result = await run.processStream(
  { messages: [new HumanMessage('What is the capital of France?')] },
  { version: 'v2', configurable: { user_id: 'user-123', thread_id: 'conv-1' } }
);

console.log('Response:', contentParts);

Multi-Agent with Handoffs

import { Run, Providers } from '@illuma-ai/agents';
import type * as t from '@illuma-ai/agents';

const run = await Run.create({
  runId: 'multi-agent-001',
  graphConfig: {
    type: 'multi-agent',
    agents: [
      {
        agentId: 'flight_assistant',
        provider: Providers.ANTHROPIC,
        clientOptions: { modelName: 'claude-haiku-4-5' },
        instructions: 'You are a flight booking assistant.',
      },
      {
        agentId: 'hotel_assistant',
        provider: Providers.ANTHROPIC,
        clientOptions: { modelName: 'claude-haiku-4-5' },
        instructions: 'You are a hotel booking assistant.',
      },
    ],
    edges: [
      {
        from: 'flight_assistant',
        to: 'hotel_assistant',
        description: 'Transfer when user needs hotel help',
      },
      {
        from: 'hotel_assistant',
        to: 'flight_assistant',
        description: 'Transfer when user needs flight help',
      },
    ],
  },
  customHandlers: { /* ...event handlers... */ },
  returnContent: true,
});

Parallel Fan-out / Fan-in

const run = await Run.create({
  runId: 'parallel-001',
  graphConfig: {
    type: 'multi-agent',
    agents: [
      { agentId: 'coordinator', provider: Providers.ANTHROPIC, clientOptions: { modelName: 'claude-haiku-4-5' }, instructions: 'Coordinate research tasks.' },
      { agentId: 'analyst_a', provider: Providers.ANTHROPIC, clientOptions: { modelName: 'claude-haiku-4-5' }, instructions: 'Financial analysis.' },
      { agentId: 'analyst_b', provider: Providers.ANTHROPIC, clientOptions: { modelName: 'claude-haiku-4-5' }, instructions: 'Technical analysis.' },
      { agentId: 'summarizer', provider: Providers.ANTHROPIC, clientOptions: { modelName: 'claude-haiku-4-5' }, instructions: 'Synthesize all findings.' },
    ],
    edges: [
      { from: 'coordinator', to: ['analyst_a', 'analyst_b'], edgeType: 'direct' },   // Fan-out (parallel)
      { from: ['analyst_a', 'analyst_b'], to: 'summarizer', edgeType: 'direct' },     // Fan-in
    ],
  },
  customHandlers: { /* ... */ },
});

Using Tools

import { createCodeExecutionTool, Calculator } from '@illuma-ai/agents';

const run = await Run.create<t.IState>({
  runId: 'tools-001',
  graphConfig: {
    type: 'standard',
    llmConfig: { provider: Providers.OPENAI, model: 'gpt-4o' },
    instructions: 'You can execute code and do math.',
    tools: [createCodeExecutionTool(), new Calculator()],
  },
  customHandlers: { /* ... */ },
});

Extended Thinking

const run = await Run.create<t.IState>({
  runId: 'thinking-001',
  graphConfig: {
    type: 'standard',
    llmConfig: {
      provider: Providers.ANTHROPIC,
      model: 'claude-3-7-sonnet-latest',
      thinking: { type: 'enabled', budget_tokens: 5000 },
    },
    instructions: 'Think through problems carefully.',
  },
  customHandlers: {
    // ...standard handlers...
    [GraphEvents.ON_REASONING_DELTA]: {
      handle: (event: string, data: t.ReasoningDeltaEvent) => {
        // Receive thinking/reasoning tokens as they stream
      },
    },
  },
});

Structured Output

const run = await Run.create<t.IState>({
  runId: 'structured-001',
  graphConfig: {
    type: 'standard',
    agents: [{
      agentId: 'analyzer',
      provider: Providers.OPENAI,
      clientOptions: { model: 'gpt-4o' },
      instructions: 'Analyze sentiment.',
      structuredOutput: {
        schema: {
          type: 'object',
          properties: {
            sentiment: { type: 'string', enum: ['positive', 'negative', 'neutral'] },
            confidence: { type: 'number' },
          },
          required: ['sentiment', 'confidence'],
        },
        mode: 'auto',
        strict: true,
      },
    }],
  },
  customHandlers: {
    [GraphEvents.ON_STRUCTURED_OUTPUT]: {
      handle: (_event: string, data: unknown) => console.log('Result:', data),
    },
  },
});

Providers

| Provider | Enum | Notes |
|----------|------|-------|
| OpenAI | Providers.OPENAI | GPT-4o, o1, o3 |
| Anthropic | Providers.ANTHROPIC | Claude 4, Sonnet, Haiku — thinking, caching, web search |
| AWS Bedrock | Providers.BEDROCK | Claude via Bedrock — caching, reasoning |
| Google Gemini | Providers.GOOGLE | Gemini Pro, Flash |
| Vertex AI | Providers.VERTEXAI | Google models via GCP |
| Azure OpenAI | Providers.AZURE | OpenAI models via Azure |
| Mistral | Providers.MISTRALAI | Large, Medium, Small |
| DeepSeek | Providers.DEEPSEEK | Reasoning models |
| xAI | Providers.XAI | Grok |
| OpenRouter | Providers.OPENROUTER | Multi-model routing |
| Moonshot | Providers.MOONSHOT | Moonshot AI |

Provider config examples:

// OpenAI
{ provider: Providers.OPENAI, clientOptions: { model: 'gpt-4o', apiKey: '...' } }

// Anthropic
{ provider: Providers.ANTHROPIC, clientOptions: { modelName: 'claude-sonnet-4-20250514', apiKey: '...' } }

// AWS Bedrock
{ provider: Providers.BEDROCK, clientOptions: { model: 'us.anthropic.claude-sonnet-4-20250514-v1:0', region: 'us-east-1' } }

Multi-Agent Patterns

Edge Types

| Type | Behavior |
|------|----------|
| Handoff (default) | LLM decides when to transfer — auto-generates transfer_to_<agent> tools |
| Direct | Fixed routing — agents run in sequence or parallel |

Handoff (Dynamic)

{ from: 'triage', to: 'billing', description: 'Transfer for billing questions' }
// → triage agent gets a transfer_to_billing tool it can call

Sequential Pipeline

{ from: 'drafter', to: 'reviewer', edgeType: 'direct', prompt: 'Review the draft above.' }

Fan-out / Fan-in (Parallel)

{ from: 'coordinator', to: ['analyst_a', 'analyst_b'], edgeType: 'direct' }          // parallel
{ from: ['analyst_a', 'analyst_b'], to: 'summarizer', edgeType: 'direct', prompt: '{results}' }  // join

Use {results} in prompts to inject collected output from parallel agents.
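The `{results}` substitution can be pictured as a simple template replacement. The sketch below is illustrative only, not the library's implementation — the function name `injectResults` and the double-newline join format are invented for the example:

```typescript
// Illustrative sketch of {results} substitution semantics: the collected
// outputs of the parallel agents replace the placeholder in the join prompt.
function injectResults(prompt: string, results: string[]): string {
  return prompt.replace('{results}', results.join('\n\n'));
}

const joined = injectResults('Summarize the findings:\n{results}', [
  'analyst_a: financial analysis complete',
  'analyst_b: technical analysis complete',
]);
// `joined` now contains both analysts' outputs where {results} was
```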

Conditional Routing

{
  from: 'router',
  to: ['fast_model', 'powerful_model'],
  condition: (state) => state.messages.at(-1)?.content.length > 500 ? 'powerful_model' : 'fast_model',
}

Hybrid

Agents with both handoff and direct edges use exclusive routing: if a handoff fires, only the handoff destination runs; otherwise direct edges execute.
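The exclusive-routing rule can be sketched as a small decision function. This is not the library's code — `Edge`, `nextAgents`, and `firedHandoff` are names invented here to illustrate the semantics described above:

```typescript
// Illustrative sketch of exclusive routing for hybrid agents:
// if a handoff fired, only the handoff destination runs;
// otherwise every direct edge executes.
type Edge = { to: string; kind: 'handoff' | 'direct' };

function nextAgents(edges: Edge[], firedHandoff?: string): string[] {
  if (firedHandoff) return [firedHandoff]; // handoff wins exclusively
  return edges.filter((e) => e.kind === 'direct').map((e) => e.to);
}

const edges: Edge[] = [
  { to: 'billing', kind: 'handoff' },
  { to: 'reviewer', kind: 'direct' },
];
```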


Built-in Tools

| Tool | Import | Description |
|------|--------|-------------|
| Code Executor | createCodeExecutionTool() | Sandboxed execution in 12+ languages (Python, JS, TS, C, C++, Java, PHP, Rust, Go, D, Fortran, R) |
| Calculator | new Calculator() | Math expressions via mathjs |
| Browser Tools | createBrowserTools() | 12 browser actions (navigate, click, type, screenshot, etc.) |
| Tool Search | createToolSearchTool() | BM25-ranked discovery for large tool registries |
| Programmatic Tool Calling | createProgrammaticToolCallingTool() | LLM writes Python to call tools as async functions |
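Several built-in tools can be combined on one graph, following the Using Tools pattern above. A sketch — note that spreading createBrowserTools() assumes it returns an array of tools, which you should verify against your installed version:

```typescript
// Sketch: registering multiple built-in tools together.
// Assumption: createBrowserTools() returns an array (it provides 12 actions).
import {
  createCodeExecutionTool,
  Calculator,
  createBrowserTools,
} from '@illuma-ai/agents';

const tools = [
  createCodeExecutionTool(),
  new Calculator(),
  ...createBrowserTools(),
];
// Pass `tools` into graphConfig.tools as in the Using Tools example.
```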


Event System

Register handlers to receive real-time streaming events:

const customHandlers = {
  [GraphEvents.CHAT_MODEL_STREAM]: new ChatModelStreamHandler(),   // Token-by-token streaming
  [GraphEvents.CHAT_MODEL_END]:    new ModelEndHandler(usageArray), // Usage metadata
  [GraphEvents.TOOL_END]:          new ToolEndHandler(),            // Tool results
  [GraphEvents.ON_RUN_STEP]:       { handle: (e, data) => ... },   // New run step
  [GraphEvents.ON_RUN_STEP_DELTA]: { handle: (e, data) => ... },   // Step delta (tool args)
  [GraphEvents.ON_MESSAGE_DELTA]:  { handle: (e, data) => ... },   // Text delta
  [GraphEvents.ON_REASONING_DELTA]:{ handle: (e, data) => ... },   // Thinking delta
  [GraphEvents.ON_AGENT_UPDATE]:   { handle: (e, data) => ... },   // Agent switch
  [GraphEvents.ON_STRUCTURED_OUTPUT]: { handle: (e, data) => ... },// Structured JSON
};

Use createContentAggregator() to automatically collect deltas into a complete response:

const { contentParts, aggregateContent } = createContentAggregator();
// Pass aggregateContent into your handlers
// After streaming, contentParts has the full structured response

Prompt Caching

Caching is automatic for Anthropic and Bedrock providers:

  • System messages get cache control markers
  • Last 2 conversation messages get cache breakpoints
  • Use dynamicContext for per-request data (keeps system prompt cacheable):
{
  instructions: 'You are a helpful assistant.',           // Cached
  dynamicContext: `Current time: ${new Date().toISOString()}`,  // Not cached
}

Structured Output Modes

| Mode | Description |
|------|-------------|
| 'auto' | Auto-selects best strategy per provider (default) |
| 'tool' | Uses tool calling — universal compatibility |
| 'provider' | Provider-native JSON mode |
| 'native' | Constrained decoding — guaranteed schema compliance |

structuredOutput: {
  schema: { /* JSON Schema */ },
  mode: 'auto',
  strict: true,
  handleErrors: true,  // Auto-retry on validation failure
  maxRetries: 2,
}

Observability

Set these env vars to enable automatic Langfuse tracing:

LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_BASE_URL=https://cloud.langfuse.com

Each trace captures userId, sessionId, messageId, and full LangChain callback spans.


Title Generation

Generate conversation titles from the first exchange:

const { title, language } = await run.generateTitle({
  provider: Providers.ANTHROPIC,
  inputText: userMessage,
  contentParts,
  titleMethod: TitleMethod.COMPLETION,
  clientOptions: { model: 'claude-3-5-haiku-latest' },
});

API Exports

// Core
export { Run } from '@illuma-ai/agents';
export { ChatModelStreamHandler, createContentAggregator, SplitStreamHandler } from '@illuma-ai/agents';
export { HandlerRegistry, ModelEndHandler, ToolEndHandler } from '@illuma-ai/agents';

// Tools
export { createCodeExecutionTool, Calculator, createBrowserTools } from '@illuma-ai/agents';
export { createToolSearchTool, createProgrammaticToolCallingTool } from '@illuma-ai/agents';

// Graphs
export { StandardGraph, MultiAgentGraph } from '@illuma-ai/agents';

// LLM
export { getChatModelClass, llmProviders } from '@illuma-ai/agents';

// Enums & Constants
export { GraphEvents, Providers, ContentTypes, StepTypes, TitleMethod, Constants } from '@illuma-ai/agents';

// Types
export type { IState, RunConfig, AgentInputs, GraphEdge, StructuredOutputConfig } from '@illuma-ai/agents';

License

UNLICENSED — Proprietary software. All rights reserved @TeamIlluma.