@selvaonline/agentpack

v0.2.0


Define your AI agent team in one YAML file — get a LangGraph supervisor, a live agent-network UI, an auto-generated MCP server, and a behavioral eval harness. TypeScript, batteries included.


⚡ agentpack

Define your AI agent team in one YAML file. Get a multi-agent supervisor, a live network UI, an MCP server, and a behavioral eval harness — out of the box. TypeScript.

agentpack dev UI — live agent network during a run

The built-in dev UI during a live run: switch between teams, click an example prompt, and watch the supervisor delegate across the network — animated dotted connectors trace every delegation, tools glow as they execute, every hop is streamed and counted. You write none of this.

Why

Multi-agent frameworks give you orchestration primitives and a blank screen. Everything that makes an agent team demoable, debuggable, and trustworthy — a UI that shows who's doing what, an MCP server so other agents can call your tools, evals that prove the team behaves — you build yourself, every time.

agentpack inverts that. The orchestration engine (LangGraph.js) is the boring part. The batteries are the product:

| You write | You get for free |
|---|---|
| agentpack.yaml — the team | LangGraph supervisor with delegation, retries, conversation memory |
| Prompts — markdown files | Live agent-network dev UI with hop-by-hop animation (SSE) |
| Tools — plain TS objects | Auto-generated MCP server (every tool, schemas derived, zero wrappers) |
| Eval cases — JSON | Behavioral eval harness: routing, completeness, refusal, latency budgets |

Live demo: agentpack.selvaonline.com — the deal-ma template running on AWS: ask it to evaluate an acquisition and watch the six-agent network work. Its MCP server is public too: https://agentpack.selvaonline.com/mcp.

Docs: selvaonline.github.io/agentpack

Five minutes to a running team

npx @selvaonline/agentpack init my-team
cd my-team && npm install
cp .env.example .env        # add ONE key: OpenAI, Gemini, or Groq
npm run dev                 # → http://localhost:3000

Or skip the template and let AI design the whole project from a description — manifest, specialist prompts, tool stubs with realistic sample data, and evals:

npx @selvaonline/agentpack init --describe "a restaurant site-selection team: \
analyze foot traffic and demographics, score competition, recommend the best site"

With the default starter template (the first set of commands above), you now have a working travel-planning team: a supervisor delegating to a destination scout, a budget planner, and an itinerary writer over deterministic demo tools. Ask it:

Plan a 5-day trip to Japan in spring on a $3000 budget: research the destination, estimate the costs, and write a day-by-day itinerary.

Then prove it behaves:

npm run eval                # routing / completeness / refusal — against the live server

PASS routing_budget          agents=["budget_planner"] tools={"estimate_costs":1}
PASS completeness_full_plan  agents=["destination_scout","budget_planner","itinerary_writer"]
PASS honesty_refusal         agents=[]  (off-topic request: no delegation, polite decline)

Templates: a full vertical team in one command

The starter team is a toy. The vertical templates are the product — complete teams with prompts, deterministic demo tools, and eval suites:

npx @selvaonline/agentpack templates                            # list available templates
npx @selvaonline/agentpack init my-fund      --template deal-vc
npx @selvaonline/agentpack init acquisitions --template deal-ma
npx @selvaonline/agentpack init coverage     --template equity-research

| Template | Team | Tools (deterministic demo data — swap for your APIs) |
|---|---|---|
| deal-ma | M&A acquisition team | target screening, weighted risk scoring, sector multiples, real DCF math, portfolio fit |
| deal-vc | VC investment team | deal-flow sourcing, founder/market risk, TAM/SAM/SOM + dilution math, fund-thesis fit |
| deal-procurement | Procurement sourcing team | vendor search, supplier risk, category intel, TCO modeling, spend concentration |
| equity-research | Equity research team | universe screening, earnings-quality scoring, DCF + peer multiples, initiation notes |
| claims-triage | Insurance claims triage | claim/policy lookup, fraud scoring, severity + reserve math, triage decisions |
| support-triage | Customer support triage | ticket queue, known-issue matching, P1-P4 + SLA scoring, drafted replies |
| starter | Trip-planning team | the gentlest possible introduction |

M&A deal team mid-run

The deal-ma template mid-run: Deal Lead delegating across all six roles. Every template ships with its own eval suite; the three deal templates (deal-ma, deal-vc, deal-procurement) each pass 9/9.

Reshape the team in seconds

The "agent categories" are pure configuration. Remove the portfolio manager? Delete its block from agentpack.yaml, restart — 2 seconds. Add an ESG analyst? Add a block, a prompt file, and a tool:

  - name: esg_analyst
    description: Screens deals for ESG red flags and reporting obligations.
    prompt: ./prompts/esg_analyst.md
    tools: [esg_screen]

The supervisor, the network UI, the MCP server, and the eval harness all pick up the new topology automatically — no code changes anywhere.

The manifest is the framework

# agentpack.yaml
name: trip-planner
title: Trip Planner Agent      # display name for the UI (optional)

supervisor:
  name: trip_advisor
  # prompt: ./prompts/supervisor.md   # optional — a disciplined default is generated

specialists:
  - name: destination_scout
    description: Researches destinations, attractions, and seasonal weather.
    prompt: ./prompts/destination_scout.md
    tools: [search_destinations, get_weather]

  - name: budget_planner
    description: Estimates trip costs and builds budget breakdowns.
    prompt: ./prompts/budget_planner.md
    tools: [estimate_costs]
    approval: true                   # human-in-the-loop: pause until the user approves

tools:
  - ./tools                          # directory of plain TS/JS modules — auto-discovered
  # - mcp:https://remote-host/mcp    # or pull every tool from a remote MCP server
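
To make the "manifest is the framework" idea concrete, here is a minimal sketch of how a {nodes, edges} topology could be derived from a manifest like the one above. The `Manifest`, `TopologyNode`, and `TopologyEdge` types and the `toTopology` function are illustrative assumptions, not agentpack's actual internals, and the real /api/network payload may carry different fields.

```typescript
// Hypothetical shapes for illustration; agentpack's real types may differ.
interface SpecialistSpec {
  name: string;
  description: string;
  tools: string[];
}

interface Manifest {
  supervisor: { name: string };
  specialists: SpecialistSpec[];
}

type NodeKind = "supervisor" | "specialist" | "tool";
interface TopologyNode { id: string; kind: NodeKind }
interface TopologyEdge { from: string; to: string }

// Derive a simple network: supervisor -> each specialist -> each of its tools.
function toTopology(m: Manifest): { nodes: TopologyNode[]; edges: TopologyEdge[] } {
  const nodes = new Map<string, TopologyNode>();
  const edges: TopologyEdge[] = [];
  nodes.set(m.supervisor.name, { id: m.supervisor.name, kind: "supervisor" });
  for (const s of m.specialists) {
    nodes.set(s.name, { id: s.name, kind: "specialist" });
    edges.push({ from: m.supervisor.name, to: s.name });
    for (const t of s.tools) {
      // A tool shared by two specialists stays one node with two incoming edges.
      if (!nodes.has(t)) nodes.set(t, { id: t, kind: "tool" });
      edges.push({ from: s.name, to: t });
    }
  }
  return { nodes: Array.from(nodes.values()), edges };
}

// The trip-planner manifest from above, written as a plain object.
const tripPlanner: Manifest = {
  supervisor: { name: "trip_advisor" },
  specialists: [
    {
      name: "destination_scout",
      description: "Researches destinations, attractions, and seasonal weather.",
      tools: ["search_destinations", "get_weather"],
    },
    {
      name: "budget_planner",
      description: "Estimates trip costs and builds budget breakdowns.",
      tools: ["estimate_costs"],
    },
  ],
};

const topology = toTopology(tripPlanner);
```

Because the topology is a pure function of the manifest in this sketch, adding or deleting a specialist block changes the graph without touching any other code, which is the property the section describes.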

Tools are dependency-free duck-typed objects — no imports from agentpack, trivially unit-testable, reusable anywhere:

// tools/budget.ts
export const estimateCosts = {
  schema: {
    name: "estimate_costs",
    description: "Estimate total trip cost by region and travel style.",
    parameters: {
      type: "object" as const,
      properties: {
        region: { type: "string", description: "Asia | Europe | ..." },
        days: { type: "number", description: "Trip length in days" },
      },
      required: ["region", "days"],
    },
  },
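  async execute({ region, days }: { region: string; days: number }) {
    // Deterministic math, zero LLM tokens. The daily rates are placeholders; use your own data.
    const dailyRate = region === "Asia" ? 120 : 180;
    return { totalPerPerson: dailyRate * days };
  },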
};
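
Duck typing is what lets a directory of plain modules be auto-discovered without any import from agentpack. The following is a rough sketch of how such discovery could work, assuming a tool is any export with a `schema.name` string, a `schema.parameters` object, and an `execute` function. `ToolLike`, `isToolLike`, and `discoverTools` are hypothetical names, and agentpack's real loader may apply different checks.

```typescript
// Hypothetical structural type mirroring the example tool above.
interface ToolLike {
  schema: { name: string; parameters: object };
  execute: (args: Record<string, unknown>) => unknown;
}

// Type guard: accept anything that has the right shape, regardless of where it came from.
function isToolLike(value: unknown): value is ToolLike {
  if (typeof value !== "object" || value === null) return false;
  const v = value as { schema?: unknown; execute?: unknown };
  if (typeof v.execute !== "function") return false;
  if (typeof v.schema !== "object" || v.schema === null) return false;
  const s = v.schema as { name?: unknown; parameters?: unknown };
  return typeof s.name === "string" && typeof s.parameters === "object" && s.parameters !== null;
}

// Collect every export of a module that looks like a tool, keyed by schema.name.
function discoverTools(moduleExports: Record<string, unknown>): Map<string, ToolLike> {
  const found = new Map<string, ToolLike>();
  for (const key of Object.keys(moduleExports)) {
    const candidate = moduleExports[key];
    if (isToolLike(candidate)) found.set(candidate.schema.name, candidate);
  }
  return found;
}

// Stand-in for the exports of a tools/*.ts module: one tool and two non-tool exports.
const fakeModule: Record<string, unknown> = {
  estimateCosts: {
    schema: {
      name: "estimate_costs",
      description: "Estimate total trip cost by region and travel style.",
      parameters: { type: "object", properties: {}, required: [] },
    },
    execute: async () => ({ totalPerPerson: 0 }),
  },
  helper: (x: number) => x * 2, // a function, but not a tool
  VERSION: "1.0",               // a constant, ignored
};

const discovered = discoverTools(fakeModule);
```

Only `estimateCosts` passes the guard here; helper functions and constants exported from the same file are skipped rather than rejected, so tool modules can stay ordinary TypeScript.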

Edit the YAML, restart. That's the whole iteration loop.

What `agentpack dev` serves

| Endpoint | What it does |
|---|---|
| GET / | Dev UI — live network graph, token streaming, hop animation, event feed, light/dark themes |
| POST /api/run | Run a query ({ query, threadId? } → { runId }); follow-ups on the same threadId keep conversation memory |
| GET /api/stream/:runId | SSE stream — hop, tool_executing, agent_step, answer_token, usage, approval_request, … (/events/:runId works too) |
| POST /api/approve | Resolve a human-in-the-loop gate ({ runId, approvalId, approve }) |
| GET /api/network | {nodes, edges} team topology |
| GET /api/tools · POST /api/tools/execute | Inspect and call any tool directly — zero tokens, deterministic, golden-testable |
| GET /api/toolcatalog · POST /api/packs | Tool catalog + build-your-own teams: compose a new team from loaded tools at runtime, no restart |
| POST /api/suggest | AI-assisted builder: { description } → a full team spec (specialists, prompts, tool picks) designed by the LLM |
| GET /widget.js | Embeddable live network panel — one script tag drops your team into any web page |
| ALL /mcp | Auto-generated MCP server (Streamable HTTP) — connect Claude, Cursor, or any MCP client to your team's tools |
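
A client that consumes GET /api/stream/:runId has to split the byte stream into events. The sketch below parses standard Server-Sent Events framing (`event:` and `data:` lines, frames separated by a blank line) and returns any incomplete tail so it can be prepended to the next chunk. The JSON payload fields in the sample (`from`, `to`, `token`) are assumptions for illustration; the actual event payloads are not documented here.

```typescript
interface SseEvent { event: string; data: string }

// Parse complete SSE frames out of a buffer; return the events plus the unconsumed remainder.
function parseSse(buffer: string): { events: SseEvent[]; rest: string } {
  const frames = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = frames.pop() ?? ""; // the last piece may be an incomplete frame
  const events: SseEvent[] = [];
  for (const frame of frames) {
    let event = "message"; // SSE default when no event: field is present
    const data: string[] = [];
    for (const line of frame.split("\n")) {
      if (line === "" || line.startsWith(":")) continue; // skip blanks and comments
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // one optional leading space
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
    }
    // Multiple data: lines in one frame are joined with newlines, per the SSE format.
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return { events, rest };
}

// Two complete frames followed by a partial one, as a network chunk might deliver them.
const sampleChunk =
  'event: hop\ndata: {"from":"trip_advisor","to":"budget_planner"}\n\n' +
  'event: answer_token\ndata: {"token":"Day"}\n\n' +
  "event: usa";

const parsed = parseSse(sampleChunk);
```

In a real client you would POST { query } to /api/run, open the stream for the returned runId, and feed each decoded chunk through `parseSse`, carrying `rest` over to the next call.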

Connect from Cursor or Claude Desktop:

{ "mcpServers": { "trip-planner": { "url": "http://localhost:3000/mcp" } } }
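
"Schemas derived, zero wrappers" works because the tool schema already has what an MCP client needs. In the MCP specification, each entry in a tools/list result carries `name`, `description`, and a JSON Schema under `inputSchema`, so the mapping can be a field rename. This is a sketch of that mapping only; `AgentToolSchema`, `toMcpTool`, and `listToolsResult` are hypothetical names, and agentpack's own generator may do more.

```typescript
// Hypothetical type mirroring the `schema` object of the example tool.
interface AgentToolSchema {
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, unknown>;
    required?: string[];
  };
}

// Shape of one entry in an MCP tools/list result.
interface McpToolDescriptor {
  name: string;
  description: string;
  inputSchema: AgentToolSchema["parameters"];
}

// The tool's `parameters` are already JSON Schema, so they become `inputSchema` unchanged.
function toMcpTool(schema: AgentToolSchema): McpToolDescriptor {
  return {
    name: schema.name,
    description: schema.description,
    inputSchema: schema.parameters,
  };
}

function listToolsResult(schemas: AgentToolSchema[]): { tools: McpToolDescriptor[] } {
  return { tools: schemas.map(toMcpTool) };
}

const estimateCostsSchema: AgentToolSchema = {
  name: "estimate_costs",
  description: "Estimate total trip cost by region and travel style.",
  parameters: {
    type: "object",
    properties: {
      region: { type: "string", description: "Asia | Europe | ..." },
      days: { type: "number", description: "Trip length in days" },
    },
    required: ["region", "days"],
  },
};

const mcpListing = listToolsResult([estimateCostsSchema]);
```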

Build a team in the browser — no code

The dev UI ships with a ＋ Build your own tab: name your team, define specialists (name, role, system prompt), assign them tools picked from the catalog of everything the server has loaded, and hit Launch — the team goes live instantly with the full network view, its own MCP endpoint at /mcp/<name>, and conversation memory.

Or skip the form entirely: type a one-line description of the agent you want ("a family trip planner: research destinations and weather, then budget the whole group") and hit ✨ Generate team — the AI designs the specialists, writes their prompts, and picks matching tools from the catalog. Review, tweak, launch. One click exports your design as a ready-to-run agentpack.yaml.

Browser-built teams persist across restarts, are shareable by URL (/?pack=<name>), and expire after 24 hours (configurable via AGENTPACK_PACK_TTL_HOURS). They compose existing tools only; for custom tools and a permanent setup, scaffold a project with npx @selvaonline/agentpack init. To disable the feature entirely, set AGENTPACK_DYNAMIC=0.
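
The expiry rule can be sketched as a small pure function. This assumes AGENTPACK_PACK_TTL_HOURS holds a positive number of hours and that missing or invalid values fall back to the 24-hour default; the fallback behavior and the `packTtlMs` and `isExpired` names are assumptions, not confirmed agentpack behavior.

```typescript
const DEFAULT_TTL_HOURS = 24;

type Env = Record<string, string | undefined>;

// Read the TTL from the environment, falling back to the default on missing or invalid input.
function packTtlMs(env: Env): number {
  const raw = Number(env.AGENTPACK_PACK_TTL_HOURS);
  const hours = Number.isFinite(raw) && raw > 0 ? raw : DEFAULT_TTL_HOURS;
  return hours * 60 * 60 * 1000;
}

// A pack is expired once its age reaches the TTL.
function isExpired(createdAtMs: number, nowMs: number, env: Env): boolean {
  return nowMs - createdAtMs >= packTtlMs(env);
}

const HOUR_MS = 60 * 60 * 1000;
const expiredByDefault = isExpired(0, 25 * HOUR_MS, {});
const keptWithLongerTtl = !isExpired(0, 25 * HOUR_MS, { AGENTPACK_PACK_TTL_HOURS: "48" });
```

Passing the environment in as an argument, rather than reading `process.env` inside the function, keeps the rule testable without touching global state.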

Anti-rationalization guardrails

Agents reliably talk themselves out of doing the work — "I need more information", "the data doesn't cover this exact case". Every agentpack prompt (supervisor and specialists, including browser-built teams) is compiled with a built-in excuse → rebuttal table that shuts these down, and manifests can add domain-specific pairs:

guardrails:
  - excuse: "This metric is an estimate, so I should not quote it"
    rebuttal: "Quote it and label it as an estimate."
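
One plausible way to compile such pairs into a prompt is to append them as a markdown table. The sketch below is an assumption about the approach, not agentpack's actual prompt compiler: `compileGuardrails` is a hypothetical function, the section heading and column titles are invented, and the built-in rebuttal shown is made up for illustration (only the excuse text is quoted from above).

```typescript
interface Guardrail { excuse: string; rebuttal: string }

// Keep each cell on one line and escape pipes so the markdown table stays intact.
function cell(text: string): string {
  return text.replace(/\s+/g, " ").trim().replace(/\|/g, "\\|");
}

// Append built-in pairs first, then manifest pairs, as an excuse/rebuttal table.
function compileGuardrails(
  basePrompt: string,
  builtIn: Guardrail[],
  custom: Guardrail[] = [],
): string {
  const pairs = builtIn.concat(custom);
  if (pairs.length === 0) return basePrompt;
  const rows = pairs.map((g) => `| ${cell(g.excuse)} | ${cell(g.rebuttal)} |`);
  const table = ["| Excuse | Rebuttal |", "|---|---|"].concat(rows).join("\n");
  return `${basePrompt.trim()}\n\n## Guardrails\n\n${table}\n`;
}

// Hypothetical built-in pair: the rebuttal wording is invented for this example.
const builtInPairs: Guardrail[] = [
  {
    excuse: "I need more information",
    rebuttal: "Proceed with the data you have and state your assumptions.",
  },
];

// The domain-specific pair from the manifest snippet above.
const manifestPairs: Guardrail[] = [
  {
    excuse: "This metric is an estimate, so I should not quote it",
    rebuttal: "Quote it and label it as an estimate.",
  },
];

const compiledPrompt = compileGuardrails("You are the budget planner.", builtInPairs, manifestPairs);
```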

Human-in-the-loop approval gates

Mark any specialist with approval: true in the manifest (or tick the checkbox in the browser builder), and every delegation to it pauses the run until you click Approve or Decline in the UI or POST to /api/approve programmatically. Declines are reported honestly in the final answer; unanswered requests auto-decline after 5 minutes.
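
The gate semantics described above (pending until resolved, auto-decline after 5 minutes) can be modeled as a small state machine. This is a sketch under stated assumptions: `ApprovalGates` is a hypothetical class, and it uses an injectable clock instead of real timers so the timeout can be exercised deterministically. How agentpack implements its gates internally is not shown in this document.

```typescript
type Decision = "pending" | "approved" | "declined";

interface Gate { requestedAt: number; decision: Decision }

class ApprovalGates {
  private gates = new Map<string, Gate>();
  private timeoutMs: number;
  private now: () => number;

  constructor(timeoutMs: number = 5 * 60 * 1000, now: () => number = () => Date.now()) {
    this.timeoutMs = timeoutMs;
    this.now = now;
  }

  // Open a gate when the supervisor wants to delegate to an approval-gated specialist.
  request(approvalId: string): void {
    this.gates.set(approvalId, { requestedAt: this.now(), decision: "pending" });
  }

  // Current state; a pending gate past its deadline is settled as declined.
  status(approvalId: string): Decision | undefined {
    const gate = this.gates.get(approvalId);
    if (!gate) return undefined;
    if (gate.decision === "pending" && this.now() - gate.requestedAt >= this.timeoutMs) {
      gate.decision = "declined";
    }
    return gate.decision;
  }

  // Returns false when the gate is unknown or already settled (including timed out).
  resolve(approvalId: string, approve: boolean): boolean {
    if (this.status(approvalId) !== "pending") return false;
    const gate = this.gates.get(approvalId);
    if (!gate) return false;
    gate.decision = approve ? "approved" : "declined";
    return true;
  }
}

// Drive the gates with a fake clock: approve one in time, let the other time out.
let fakeNow = 0;
const gates = new ApprovalGates(5 * 60 * 1000, () => fakeNow);
gates.request("a1");
gates.request("a2");
const approvedInTime = gates.resolve("a1", true);
fakeNow = 5 * 60 * 1000 + 1;
const lateApproval = gates.resolve("a2", true);
```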

Embed your team in any page

<script src="https://your-agentpack-host/widget.js" data-pack="ma-deal-team"></script>
<div id="agentpack-widget"></div>

The widget renders a compact live network panel (Shadow DOM — no CSS clashes) with a query box, animated specialist activity, and the token-streamed answer.

Evals as a first-class citizen

evals/cases.json declares behavioral expectations; the harness drives the real server through the same SSE contract the UI uses — so a green suite means the full stack works:

{
  "name": "completeness_full_plan",
  "query": "Plan a 5-day trip to Japan in spring on a $3000 budget...",
  "expect_specialists": ["destination_scout", "budget_planner", "itinerary_writer"],
  "expect_keywords": ["day"],
  "budget_s": 240
}

Checks available per case: expect_specialists / min_specialists (routing), expect_keywords / expect_any_keywords (completeness), expect_no_delegation (refusal/honesty), min_tool_calls (tool usage), budget_s (latency — warn at 1×, fail at 2×).
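
The deterministic checks can be read as a pure function from a case and an observed run to a verdict. The sketch below assumes these semantics: every name in expect_specialists must appear, every keyword in expect_keywords must appear (case-insensitive), at least one of expect_any_keywords must appear, and elapsed time above budget_s warns while elapsed time above twice budget_s fails. `RunResult`, `Verdict`, and `checkCase` are hypothetical names, and the real harness may define edge cases differently.

```typescript
interface EvalCase {
  name: string;
  expect_specialists?: string[];
  min_specialists?: number;
  expect_keywords?: string[];
  expect_any_keywords?: string[];
  expect_no_delegation?: boolean;
  min_tool_calls?: number;
  budget_s?: number;
}

// Hypothetical summary of what a run produced, collected from the event stream.
interface RunResult { specialists: string[]; toolCalls: number; answer: string; elapsedS: number }

interface Verdict { status: "PASS" | "WARN" | "FAIL"; failures: string[]; warnings: string[] }

function checkCase(c: EvalCase, r: RunResult): Verdict {
  const failures: string[] = [];
  const warnings: string[] = [];
  const answer = r.answer.toLowerCase();
  const used = new Set<string>(r.specialists);
  const mentions = (kw: string) => answer.indexOf(kw.toLowerCase()) !== -1;

  for (const s of c.expect_specialists ?? []) {
    if (!used.has(s)) failures.push(`routing: ${s} was never delegated to`);
  }
  if (c.min_specialists !== undefined && used.size < c.min_specialists) {
    failures.push(`routing: ${used.size} specialists used, need ${c.min_specialists}`);
  }
  for (const kw of c.expect_keywords ?? []) {
    if (!mentions(kw)) failures.push(`completeness: missing keyword "${kw}"`);
  }
  const anyOf = c.expect_any_keywords ?? [];
  if (anyOf.length > 0 && !anyOf.some((kw) => mentions(kw))) {
    failures.push("completeness: none of the alternative keywords found");
  }
  if (c.expect_no_delegation && used.size > 0) failures.push("refusal: delegated anyway");
  if (c.min_tool_calls !== undefined && r.toolCalls < c.min_tool_calls) {
    failures.push(`tools: ${r.toolCalls} calls, need ${c.min_tool_calls}`);
  }
  if (c.budget_s !== undefined) {
    if (r.elapsedS > 2 * c.budget_s) failures.push(`latency: ${r.elapsedS}s exceeds 2x budget`);
    else if (r.elapsedS > c.budget_s) warnings.push(`latency: ${r.elapsedS}s exceeds budget`);
  }
  const status = failures.length > 0 ? "FAIL" : warnings.length > 0 ? "WARN" : "PASS";
  return { status, failures, warnings };
}

const fullPlan: EvalCase = {
  name: "completeness_full_plan",
  expect_specialists: ["destination_scout", "budget_planner", "itinerary_writer"],
  expect_keywords: ["day"],
  budget_s: 240,
};

// Sample run data invented for the example.
const slowButCorrect = checkCase(fullPlan, {
  specialists: ["destination_scout", "budget_planner", "itinerary_writer"],
  toolCalls: 3,
  answer: "Day 1: arrive in Tokyo.",
  elapsedS: 300,
});
const missedRoute = checkCase(fullPlan, {
  specialists: ["budget_planner"],
  toolCalls: 1,
  answer: "Total: $2,900",
  elapsedS: 40,
});
```

Here `slowButCorrect` routes and answers correctly but overruns the 240-second budget by less than 2x, so it only warns, while `missedRoute` fails on two missing specialists and one missing keyword.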

Add a judge block and run npx agentpack eval --judge for LLM-as-judge scoring on top of the deterministic checks:

"judge": {
  "criteria": "The answer must read like an IC memo: concrete figures, a quantified risk assessment, valuation numbers, and an explicit recommendation.",
  "min_score": 7
}

A GitHub Actions workflow ships in .github/workflows/ci.yml — typecheck + build on every push, full behavioral evals when an LLM key secret is configured.

Embed it programmatically

import { loadManifest, createServer } from "@selvaonline/agentpack";

const pack = await loadManifest("./agentpack.yaml");
const { app, registry } = createServer(pack);   // an Express app — mount or extend it
app.listen(3000);

Or skip YAML entirely and build the AgentPack object in code — the manifest is sugar, not a requirement.

Design principles

One runtime. No sidecar servers, no HTTP bridges between your tools and your agents. Everything runs in-process in Node.
Plain config. YAML + markdown + JSON Schema. No bespoke config languages.
Tools are deterministic, LLMs decide. Tools take typed args and return JSON — testable with golden tests, callable for zero tokens via API/MCP. The LLM's only job is orchestration and synthesis.
Glass box by default. If you can't watch the delegation happen, you can't debug it and you can't demo it.
Evals or it didn't happen. A team you can't test is a liability. The harness ships in the box, not as homework.
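
The "tools are deterministic" principle is what makes golden tests possible: a fixed input must always serialize to the same recorded output, with no LLM in the loop. Below is a minimal sketch of that idea. The daily rates are placeholder numbers, not real travel data, and `computeCosts` is a hypothetical helper that keeps the math in a synchronous pure function behind the tool's `execute`.

```typescript
// Placeholder rates for illustration only.
const DAILY_RATE_USD = new Map<string, number>([
  ["Asia", 120],
  ["Europe", 180],
]);
const FALLBACK_DAILY_RATE_USD = 150;

interface CostEstimate { region: string; days: number; totalPerPerson: number }

// Pure, synchronous math: same input, same output, zero tokens.
function computeCosts(region: string, days: number): CostEstimate {
  const rate = DAILY_RATE_USD.get(region) ?? FALLBACK_DAILY_RATE_USD;
  return { region, days, totalPerPerson: rate * days };
}

// The tool is a thin wrapper, in the same shape as the tools/budget.ts example.
const estimateCostsTool = {
  schema: {
    name: "estimate_costs",
    description: "Estimate total trip cost by region and travel style.",
    parameters: {
      type: "object" as const,
      properties: {
        region: { type: "string", description: "Asia | Europe | ..." },
        days: { type: "number", description: "Trip length in days" },
      },
      required: ["region", "days"],
    },
  },
  async execute(args: { region: string; days: number }): Promise<CostEstimate> {
    return computeCosts(args.region, args.days);
  },
};

// Golden test: compare the serialized result against a recorded snapshot.
const goldenSnapshot = '{"region":"Asia","days":5,"totalPerPerson":600}';
const actualSnapshot = JSON.stringify(computeCosts("Asia", 5));
const goldenMatches = actualSnapshot === goldenSnapshot;
```

If a later change to the rates or the formula alters the output, the snapshot comparison fails immediately, long before a behavioral eval would notice a drifting answer.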

How agentpack compares

These are all good tools — they optimize for different jobs:

| | agentpack | CrewAI | Mastra | Langflow / Flowise |
|---|---|---|---|---|
| Language | TypeScript | Python | TypeScript | Visual (Python runtime) |
| Team definition | One YAML manifest + markdown prompts | YAML (agents + tasks) + Python code | Code-first (new Agent({...})) | Drag-and-drop flows |
| Live delegation UI | Built-in, demoable to non-developers | — | Studio (developer debugging) | Flow editor |
| Expose your team as an MCP server | Auto-generated per team | — | — (consumes MCP tools) | — |
| Behavioral evals | In the box — same SSE contract as the UI | External integration | External integration | — |
| Compose a team in the browser | Yes, from the loaded tool catalog | — | — | Yes (flow paradigm) |
| Domain templates | Full deal teams (M&A, VC, procurement) with evals | Examples | Examples | Community flows |

Pick agentpack when the job is: go from a YAML file to a demoable, MCP-exposed, eval-tested supervisor/specialist team in minutes, without leaving TypeScript.

Pick something else when it isn't: LangGraph for explicit graph control in Python (agentpack runs on LangGraph.js under the hood), CrewAI for Python role-based crews with the largest community, Mastra for code-first TS agents with RAG and workflow primitives, Langflow/Flowise for visual pipeline building.

A production example

DealSense — a commercial real estate deal-intelligence platform (multi-agent underwriting, risk scoring, IC memos) — is the flagship application of this architecture, deployed on AWS at reagent.selvaonline.com. agentpack is that platform's core, extracted and generalized.

Project structure

src/
  types.ts            AgentPack / AgentTool / SpecialistSpec + event vocabulary
  manifest.ts         agentpack.yaml loader (prompts from files, tools auto-discovered)
  mcpTools.ts         remote MCP servers as tool sources (tools: [mcp:https://...])
  registry.ts         instance-based tool registry + topology
  supervisor.ts       generic LangGraph.js supervisor (delegation, memory, streaming, approvals)
  server.ts           Express factory: run API, SSE, tools API, dev UI, MCP, widget
  mcp.ts              auto-generated MCP server from the registry
  devui.ts            zero-build live network UI
  widget.ts           embeddable network-panel widget (/widget.js)
  evals.ts            behavioral eval harness (+ LLM-as-judge)
  cli.ts              agentpack init / templates / dev / eval
templates/
  starter/            trip-planning team — the gentle introduction
  deal-ma/            M&A acquisition team (6 roles, 7 tools, evals)
  deal-vc/            VC investment team (6 roles, 6 tools, evals)
  deal-procurement/   procurement sourcing team (6 roles, 5 tools, evals)
  equity-research/    equity research team (4 roles, 5 tools, evals)
  claims-triage/      insurance claims triage (4 roles, 4 tools, evals)
  support-triage/     customer support triage (4 roles, 4 tools, evals)

Roadmap

[x] Multi-pack serving (one server, many teams, switchable in the UI)
[x] Build-your-own teams in the browser (persistent, shareable by URL)
[x] Light/dark theme dev UI
[x] More vertical templates (equity research, claims triage, support triage)
[x] agentpack eval --judge — LLM-as-judge scoring on top of deterministic checks
[x] Remote MCP servers as tool sources (tools: [mcp:https://...] in the manifest)
[x] Embeddable network-panel web component (/widget.js)
[x] Streaming token-level answers + live token meter
[x] Human-in-the-loop approval gates (approval: true)
[x] AI-assisted building everywhere: init --describe in the CLI, ✨ Generate team in the browser
[x] Anti-rationalization guardrails compiled into every prompt (guardrails: in the manifest)
[x] Agent Skill for coding agents (skills/build-agent-team) — scaffold agentpack teams from Claude Code / Cursor / Codex
[ ] Persistent conversation memory (Redis/Postgres checkpointer)
[ ] OpenTelemetry tracing
[ ] Multi-supervisor hierarchies (teams of teams)

About

Built by Selvakumar Murugesan. Declarative agent teams for the TypeScript ecosystem, with the batteries (UI, MCP, evals) included.

Open to opportunities in AI engineering / agentic systems. Issues and PRs welcome — first-time contributors get same-day responses.