agent-orchestra

v0.1.0

Published

2 months ago

Multi-agent AI orchestration with confidence scoring for TypeScript


ameyb

ai agents multi-agent orchestration confidence-scoring llm typescript ai-agents agent-framework

Why agent-orchestra?

Most AI agent frameworks give you building blocks but leave the hard problems to you: when should an agent's output be trusted? What happens when two agents disagree? How do you prevent a confident-but-wrong agent from causing damage?

agent-orchestra is a TypeScript framework for multi-agent orchestration where confidence scoring is the core primitive, not an afterthought. Every agent returns a calibrated confidence score. The orchestrator uses these scores to decide whether to proceed, cross-validate with another agent, or escalate to a human. The result is an AI system that knows what it doesn't know.

npm install agent-orchestra

Quickstart

import {
  Orchestrator,
  defineAgent,
  ConfidenceThresholds,
} from "agent-orchestra";

// 1. Define specialized agents
const reviewer = defineAgent({
  id: "code-reviewer",
  description: "Reviews code changes for correctness",
  execute: async (task, context) => {
    // yourLLM is a placeholder for your own model client
    const analysis = await yourLLM.analyze(task.payload);
    return {
      result: analysis.findings,
      confidence: analysis.confidence,
      rationale: analysis.reasoning,
    };
  },
});

const security = defineAgent({
  id: "security-scanner",
  description: "Checks for security vulnerabilities",
  execute: async (task, context) => {
    const scan = await yourLLM.scan(task.payload);
    return {
      result: scan.vulnerabilities,
      confidence: scan.confidence,
      rationale: scan.reasoning,
    };
  },
});

// 2. Create the orchestrator
const orchestra = new Orchestrator({
  agents: [reviewer, security],
  thresholds: ConfidenceThresholds.DEFAULT,
  onEscalation: async (result) => {
    console.log(`Escalating: ${result.rationale}`);
  },
});

// 3. Run
const result = await orchestra.run({
  type: "code-review",
  payload: { diff: "..." },
});

console.log(result.aggregateConfidence); // e.g. 0.87
console.log(result.escalations);         // e.g. [] when nothing was escalated

That's it: two agents, one orchestrator, and one run call give you a working multi-agent system with confidence gating.

Architecture

graph TB
    subgraph Orchestrator
        TQ[Task Queue] --> DC[Decomposer]
        DC --> RT[Router]
        RT --> AG[Aggregator]
        AG --> CG[Confidence Gate]
        CG -->|≥ 0.85| PR[Proceed]
        CG -->|0.60 – 0.84| CV[Cross-Validate]
        CG -->|< 0.60| ES[Escalate]
    end

    subgraph Agents
        A1[Agent A]
        A2[Agent B]
        A3[Agent C]
    end

    subgraph Context
        CB[Context Bus]
        HS[Score History]
    end

    RT --> A1 & A2 & A3
    A1 & A2 & A3 --> AG
    A1 & A2 & A3 -.-> CB
    ES -.-> HS

    style CG fill:#1a1a2e,color:#fff
    style PR fill:#16a34a,color:#fff
    style CV fill:#ca8a04,color:#fff
    style ES fill:#dc2626,color:#fff

The orchestrator receives a task, decomposes it into sub-tasks, routes each to the appropriate agent, collects results with confidence scores, and passes the aggregate through the confidence gate to proceed, cross-validate, or escalate. It never writes code or produces content itself; its job is coordination and judgment.

Core Concepts

Agents

An agent is a specialized unit that performs one task well. Each agent implements the Agent interface: a single execute method that takes a Task and returns an AgentResult with a confidence score.

import { defineAgent } from "agent-orchestra";

const myAgent = defineAgent({
  id: "my-agent",
  description: "Does one thing well",
  taskTypes: ["analysis"],        // optional: restrict to task types
  schemaVersion: 1,               // optional: reject incompatible tasks

  execute: async (task, context) => {
    // context.bus — read/write to the shared context bus
    // context.history — past results for this task chain
    const priorFindings = context.bus.get("upstream-findings");

    const result = await doWork(task.payload, priorFindings);

    // Write to context bus for downstream agents
    context.bus.set("my-findings", result.findings);

    return {
      result: result.data,
      confidence: result.confidence,  // 0–1
      rationale: "Explanation of confidence level",
      evidencePaths: result.filesExamined,
    };
  },
});

Agents are model-agnostic. Use OpenAI, Anthropic, a local model, or a deterministic function — agent-orchestra doesn't care how you get the result, only that you return a confidence score with it.

Confidence Scoring

Every AgentResult includes a confidence field (0–1). The framework provides rubric helpers to keep scores calibrated:

import { ConfidenceRubric } from "agent-orchestra";

const reviewRubric = new ConfidenceRubric({
  high:   { range: [0.9, 1.0], criteria: "Full context, established patterns, small diff" },
  medium: { range: [0.7, 0.9], criteria: "Well-understood but touches integration boundaries" },
  low:    { range: [0.5, 0.7], criteria: "Multiple plausible interpretations exist" },
  guess:  { range: [0.0, 0.5], criteria: "Insufficient context to make a determination" },
});

// Use in your agent
const score = reviewRubric.score("medium", 0.78);
// Returns 0.78, validated against the range
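
To make the range check concrete, here is a minimal sketch of what a rubric helper along these lines might do. It is not the library's source: `SketchRubric`, `Level`, and `Band` are hypothetical names, and throwing on an out-of-range value is an assumption (the real helper may clamp or warn instead).

```typescript
// Hypothetical re-implementation for illustration; not the library's source.
type Level = "high" | "medium" | "low" | "guess";

interface Band {
  range: [number, number]; // inclusive lower and upper bound
  criteria: string;
}

type Bands = { [K in Level]: Band };

class SketchRubric {
  private bands: Bands;

  constructor(bands: Bands) {
    this.bands = bands;
  }

  // Return the value unchanged when it lies inside the band for `level`.
  // Assumption: out-of-range (or NaN) values throw a RangeError.
  score(level: Level, value: number): number {
    const [lo, hi] = this.bands[level].range;
    if (!(value >= lo && value <= hi)) {
      throw new RangeError(`${value} is outside the ${level} range [${lo}, ${hi}]`);
    }
    return value;
  }
}

const sketchRubric = new SketchRubric({
  high: { range: [0.9, 1.0], criteria: "Full context, established patterns, small diff" },
  medium: { range: [0.7, 0.9], criteria: "Well-understood but touches integration boundaries" },
  low: { range: [0.5, 0.7], criteria: "Multiple plausible interpretations exist" },
  guess: { range: [0.0, 0.5], criteria: "Insufficient context to make a determination" },
});

const mediumScore = sketchRubric.score("medium", 0.78); // passes through unchanged
```

Failing loudly keeps an agent from labeling a score "high" while reporting a number that belongs to a lower band.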

The orchestrator uses thresholds to gate decisions:

| Score | Action | Description |
|-------|--------|-------------|
| ≥ 0.85 | Proceed | Result is trusted. Move to next step. |
| 0.60 – 0.84 | Cross-validate | Route to a different agent for a second opinion. |
| < 0.60 | Escalate | Pause for human review. |

Thresholds are configurable per deployment:

import { ConfidenceThresholds } from "agent-orchestra";

// Built-in presets
ConfidenceThresholds.DEFAULT    // { proceed: 0.85, review: 0.60 }
ConfidenceThresholds.STRICT     // { proceed: 0.92, review: 0.75 }
ConfidenceThresholds.PERMISSIVE // { proceed: 0.75, review: 0.50 }

// Custom
const custom = { proceed: 0.88, review: 0.65 };
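
The gating rule in the table above can be sketched as a pure function. This is an illustration under stated assumptions, not the orchestrator's actual code: `gate` and `GateDecision` are hypothetical names, and the `{ proceed, review }` shape is taken from the preset comments.

```typescript
// Hypothetical names; a sketch of the gating rule, not the library's source.
interface ThresholdConfig {
  proceed: number; // scores at or above this are trusted
  review: number;  // scores at or above this, but below proceed, get a second opinion
}

type GateDecision = "proceed" | "cross-validate" | "escalate";

function gate(confidence: number, t: ThresholdConfig): GateDecision {
  if (confidence >= t.proceed) return "proceed";
  if (confidence >= t.review) return "cross-validate";
  return "escalate"; // also reached for NaN, which fails both comparisons
}

// Values copied from the DEFAULT and STRICT presets
const DEFAULT_THRESHOLDS: ThresholdConfig = { proceed: 0.85, review: 0.6 };
const STRICT_THRESHOLDS: ThresholdConfig = { proceed: 0.92, review: 0.75 };

const atDefault = gate(0.87, DEFAULT_THRESHOLDS); // "proceed"
const atStrict = gate(0.87, STRICT_THRESHOLDS);   // "cross-validate"
```

The same score of 0.87 is trusted under the default preset but sent for a second opinion under the strict one, which is the practical effect of choosing a preset per deployment.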

Cross-Agent Chaining

When Agent A's output becomes Agent B's input, confidence propagates:

import { propagateConfidence } from "agent-orchestra";

const scores = [0.85, 0.78, 0.91];
propagateConfidence(scores); // ≈ 0.70: the floor (0.78 * 0.9) exceeds the product (≈ 0.60)

The formula: max(product(scores), min(scores) * 0.9). Multiplicative attenuation penalizes long uncertain chains while the floor prevents unbounded pessimism.
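
As a worked check, the stated formula can be written out in a few lines. This is a sketch of the formula as described here, not the library's implementation; `propagateSketch` is a hypothetical name, and the empty-array case follows the `propagateConfidence([])` entry in the API reference.

```typescript
// Sketch of the stated formula; the function name is hypothetical.
function propagateSketch(scores: number[]): number {
  // The API reference lists propagateConfidence([]) as 0, so mirror that.
  if (scores.length === 0) return 0;

  const product = scores.reduce((acc, s) => acc * s, 1);
  const floor = Math.min(...scores) * 0.9;
  return Math.max(product, floor);
}

// product = 0.85 * 0.78 * 0.91 ≈ 0.603
// floor   = 0.78 * 0.9         = 0.702
// The floor wins, so the formula as stated yields ≈ 0.702.
const threeStep = propagateSketch([0.85, 0.78, 0.91]);
```

For two-step chains such as `[0.9, 0.85]` the product and the floor coincide at 0.765; the floor only starts to matter once a chain is long enough for the product to fall below 90% of its weakest link.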

The orchestrator handles chaining automatically:

const orchestra = new Orchestrator({
  agents: [analyzer, reviewer, tester],
  chains: [
    { from: "analyzer", to: "reviewer", when: "always" },
    { from: "reviewer", to: "tester", when: "confidence_below", threshold: 0.85 },
  ],
});
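
One way to read the `chains` entries above is as rules evaluated after each agent finishes. The sketch below is an assumption about those semantics, not the library's code: `ChainRule` and `nextAgents` are hypothetical names, and treating `confidence_below` as a strict less-than comparison is a guess.

```typescript
// Hypothetical types mirroring the `chains` entries shown above.
type ChainRule =
  | { from: string; to: string; when: "always" }
  | { from: string; to: string; when: "confidence_below"; threshold: number };

// Return the ids of agents that should run after `agentId` reported `confidence`.
function nextAgents(rules: ChainRule[], agentId: string, confidence: number): string[] {
  const next: string[] = [];
  for (const rule of rules) {
    if (rule.from !== agentId) continue;
    // Assumption: "confidence_below" means strictly below the threshold.
    if (rule.when === "always" || confidence < rule.threshold) {
      next.push(rule.to);
    }
  }
  return next;
}

const rules: ChainRule[] = [
  { from: "analyzer", to: "reviewer", when: "always" },
  { from: "reviewer", to: "tester", when: "confidence_below", threshold: 0.85 },
];

const afterAnalyzer = nextAgents(rules, "analyzer", 0.95);       // ["reviewer"]
const afterConfidentReview = nextAgents(rules, "reviewer", 0.9); // []
const afterUnsureReview = nextAgents(rules, "reviewer", 0.8);    // ["tester"]
```

Under this reading, the tester only runs when the reviewer is not confident enough to clear 0.85 on its own.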

Disagreement Detection

When two agents assess the same artifact and disagree, the orchestrator detects it:

const orchestra = new Orchestrator({
  agents: [reviewer, security],
  onDisagreement: async (disagreement) => {
    // disagreement.agents — ["code-reviewer", "security-scanner"]
    // disagreement.summaryA — reviewer's rationale
    // disagreement.summaryB — security's rationale
    await notifyHuman(disagreement);
  },
});

The orchestrator never resolves disagreements by averaging scores or picking the higher one. Disagreement between confident agents is a signal that human judgment is needed.
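
The detection criteria are not spelled out here, so the following is only a sketch of one plausible rule: flag a disagreement when both agents clear a confidence bar yet reach different verdicts. The `verdict` field, `minConfidence` parameter, and all type and function names are hypothetical; only `agents`, `summaryA`, and `summaryB` come from the handler comments above.

```typescript
// Hypothetical shapes; the library's own detection criteria may differ.
interface Assessment {
  agentId: string;
  confidence: number;
  rationale: string;
  verdict: string; // assumed field, e.g. "approve" or "reject"
}

interface DisagreementSketch {
  agents: [string, string];
  summaryA: string;
  summaryB: string;
}

function detectDisagreement(
  a: Assessment,
  b: Assessment,
  minConfidence: number
): DisagreementSketch | null {
  const bothConfident = a.confidence >= minConfidence && b.confidence >= minConfidence;
  if (!bothConfident || a.verdict === b.verdict) return null;
  // No averaging and no winner: both rationales are handed to the caller as-is.
  return { agents: [a.agentId, b.agentId], summaryA: a.rationale, summaryB: b.rationale };
}

const reviewerView: Assessment = {
  agentId: "code-reviewer",
  confidence: 0.9,
  rationale: "Change is small and well tested",
  verdict: "approve",
};
const securityView: Assessment = {
  agentId: "security-scanner",
  confidence: 0.92,
  rationale: "User input reaches a shell command",
  verdict: "reject",
};

const found = detectDisagreement(reviewerView, securityView, 0.85);
```

A low-confidence dissent is deliberately ignored in this sketch, on the assumption that the confidence gate already routes it to cross-validation or escalation.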

Context Bus

Agents communicate through a shared context bus rather than stuffing full outputs into prompts:

// Agent A writes structured findings
context.bus.set("security-findings", {
  vulnerabilities: [...],
  scannedFiles: [...],
  uncertainties: ["Could not determine if input is sanitized at L42"],
});

// Agent B reads only what it needs
const findings = context.bus.get("security-findings");

This keeps each agent's prompt focused and prevents context pollution.
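
For intuition, a bus with this `get`/`set` surface can be as small as a keyed store. The sketch below is hypothetical (`SketchBus` and `SecurityFindings` are illustrative names) and says nothing about how the real `context.bus` handles scoping, persistence, or concurrency.

```typescript
// Hypothetical minimal bus; the real context.bus may offer more than get/set.
class SketchBus {
  // A prototype-free object, so keys like "constructor" are not found by accident.
  private store: { [key: string]: unknown } = Object.create(null);

  set(key: string, value: unknown): void {
    this.store[key] = value;
  }

  // Returns undefined when no agent has written the key yet.
  // The type parameter is a caller-side assertion; nothing is validated at runtime.
  get<T>(key: string): T | undefined {
    return this.store[key] as T | undefined;
  }
}

interface SecurityFindings {
  vulnerabilities: string[];
  uncertainties: string[];
}

const bus = new SketchBus();
bus.set("security-findings", {
  vulnerabilities: [],
  uncertainties: ["Could not determine if input is sanitized at L42"],
});

const securityFindings = bus.get<SecurityFindings>("security-findings");
const uncertaintyCount = securityFindings ? securityFindings.uncertainties.length : 0;
```

Because `get` can return `undefined`, a downstream agent has to decide explicitly what to do when an upstream agent has not run, rather than silently working from an empty prompt.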

Circuit Breaker

Built-in protection against runaway agents:

const orchestra = new Orchestrator({
  agents: [reviewer],
  circuitBreaker: {
    maxConsecutiveFailures: 3,
    maxActionsPerMinute: 20,
    maxTokenBudget: 500_000,
    onTrip: async (reason) => {
      await alertOps(`Circuit breaker tripped: ${reason}`);
    },
  },
});

Circuit breakers require manual reset by default. An agent that enters a failure loop should not be allowed to retry on its own.
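
The manual-reset behavior can be illustrated with a small state machine. This sketch models only `maxConsecutiveFailures` from the config above; `SketchBreaker` and its method names are hypothetical, and the library's actual reset API is not shown in this README.

```typescript
// Hypothetical class; only maxConsecutiveFailures from the config above is modeled.
class SketchBreaker {
  private readonly maxConsecutiveFailures: number;
  private consecutiveFailures = 0;
  private open = false;

  constructor(maxConsecutiveFailures: number) {
    this.maxConsecutiveFailures = maxConsecutiveFailures;
  }

  // True while the agent may still be invoked.
  canRun(): boolean {
    return !this.open;
  }

  // A success clears the failure streak but does not close a tripped breaker.
  recordSuccess(): void {
    this.consecutiveFailures = 0;
  }

  recordFailure(): void {
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= this.maxConsecutiveFailures) {
      this.open = true;
    }
  }

  // Manual reset: nothing inside the class calls this on its own.
  reset(): void {
    this.consecutiveFailures = 0;
    this.open = false;
  }
}

const breaker = new SketchBreaker(3);
breaker.recordFailure();
breaker.recordFailure();
breaker.recordFailure();
const blocked = !breaker.canRun();      // tripped after three failures in a row

breaker.recordSuccess();
const stillBlocked = !breaker.canRun(); // a later success does not reopen it

breaker.reset();
const reopened = breaker.canRun();      // only an explicit reset does
```

Keeping `reset()` as the only path back to a runnable state is what prevents an agent stuck in a failure loop from retrying itself.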

API Reference

`defineAgent(config)`

Creates a new agent instance.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| id | string | Yes | Unique agent identifier |
| description | string | Yes | Human-readable description |
| taskTypes | string[] | No | Task types this agent handles (all if omitted) |
| schemaVersion | number | No | Reject tasks with incompatible schema versions |
| execute | (task, context) => Promise<AgentResult> | Yes | The agent's work function |

Returns: Agent

`new Orchestrator(config)`

Creates the orchestrator.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| agents | Agent[] | Yes | Array of agents to coordinate |
| thresholds | ThresholdConfig | No | Confidence thresholds (default: ConfidenceThresholds.DEFAULT) |
| chains | ChainConfig[] | No | Cross-agent chaining rules |
| circuitBreaker | CircuitBreakerConfig | No | Circuit breaker settings |
| maxChainDepth | number | No | Max chaining depth before forced escalation (default: 4) |
| onEscalation | (result) => Promise<void> | No | Called when a result is escalated |
| onDisagreement | (disagreement) => Promise<void> | No | Called when agents disagree |

`orchestrator.run(task)`

Executes a task through the orchestration pipeline.

| Parameter | Type | Description |
|-----------|------|-------------|
| task | Task | The task to execute |

Returns: Promise<OrchestratorResult>

interface OrchestratorResult {
  results: AgentResult[];
  escalations: Escalation[];
  disagreements: Disagreement[];
  aggregateConfidence: number;
  tokensUsed: number;
  durationMs: number;
}

`AgentResult`

Returned by every agent execution.

interface AgentResult {
  agentId: string;
  taskId: string;
  result: unknown;
  confidence: number;        // 0–1
  rationale: string;         // why this confidence level
  evidencePaths?: string[];  // files/resources examined
}

`propagateConfidence(scores)`

Computes aggregate confidence across a chain.

propagateConfidence([0.9, 0.85])  // 0.765
propagateConfidence([0.5, 0.9])   // 0.45
propagateConfidence([])            // 0

`ConfidenceRubric`

Helper for anchoring confidence scores to observable criteria.

const rubric = new ConfidenceRubric({ ... });
rubric.score(level, value)  // Validates value is in the level's range
rubric.describe()           // Returns human-readable rubric description

Comparison

| | agent-orchestra | LangGraph | CrewAI | AutoGen/AG2 |
|---|---|---|---|---|
| Core abstraction | Confidence-gated orchestration | State machine graphs | Role-based crews | Multi-party conversation |
| Confidence scoring | First-class primitive with rubrics, calibration, propagation | Not built-in | Not built-in | Not built-in |
| Disagreement detection | Automatic with configurable resolution | Manual | Manual | Emergent (uncontrolled) |
| Circuit breakers | Built-in | Not built-in | Not built-in | Not built-in |
| Cross-agent chaining | Declarative with confidence gating | Graph edges | Sequential/parallel tasks | Conversation turns |
| Human-in-the-loop | Confidence-triggered escalation | Checkpoint-based | Limited | Manual |
| Language | TypeScript | Python, TypeScript | Python | Python |
| Model lock-in | None | LangChain ecosystem | None | None |
| Framework weight | ~12 KB (zero dependencies) | Heavy (LangChain) | Medium | Medium |

When to use agent-orchestra: You need multiple AI agents to coordinate on tasks where reliability matters more than speed, and you want fine-grained control over when the system trusts its own output.

When to use something else: You need a full application framework (LangGraph), rapid prototyping with role-based agents (CrewAI), or conversational multi-agent research (AutoGen).

Features at a Glance

Zero runtime dependencies — ships at roughly 12–13 KB minified
Dual CJS/ESM — works in Node.js, Bun, Deno, and bundlers
Full TypeScript — strict types with exported interfaces for everything
Model-agnostic — use OpenAI, Anthropic, local models, or deterministic functions
38 tests — comprehensive coverage of confidence, agents, circuit breakers, and orchestration

Observability

agent-orchestra emits structured events for every orchestration decision:

const orchestra = new Orchestrator({
  agents: [reviewer, security],
  logger: {
    onAgentStart: (agentId, task) => { /* ... */ },
    onAgentComplete: (agentId, result) => { /* ... */ },
    onConfidenceGate: (result, decision) => { /* ... */ },
    onDisagreement: (disagreement) => { /* ... */ },
    onCircuitBreak: (reason) => { /* ... */ },
  },
});

Every event includes agentId, taskId, timestamp, tokensUsed, and confidence. Pipe these to your observability stack (Datadog, Grafana, LangSmith, or a plain JSON log) for dashboards, alerting, and calibration monitoring.
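
As one possible shape for the plain JSON log option, a handler could serialize each event to a single line. The field names below come from the paragraph above; the `OrchestraEvent` type, the `kind` label, the ISO timestamp format, and the sample values are assumptions for illustration.

```typescript
// Field names come from the paragraph above; the type name is hypothetical.
interface OrchestraEvent {
  agentId: string;
  taskId: string;
  timestamp: string; // assumed to be an ISO 8601 string
  tokensUsed: number;
  confidence: number;
}

// Serialize one event as a single JSON line, the format most log shippers expect.
function toJsonLine(kind: string, event: OrchestraEvent): string {
  return JSON.stringify({ kind, ...event });
}

// Sample values for illustration only
const logLine = toJsonLine("confidence_gate", {
  agentId: "code-reviewer",
  taskId: "task-1",
  timestamp: "2024-01-01T00:00:00.000Z",
  tokensUsed: 1200,
  confidence: 0.87,
});
```

A logger callback such as `onConfidenceGate` could build a line like this and write it to stdout, leaving aggregation and alerting to whichever stack consumes the log.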

Contributing

See CONTRIBUTING.md for development setup, testing, and PR guidelines.
