
@framers/agentos

v0.6.14

Published

AgentOS — open-source TypeScript runtime for autonomous AI agents. Unified graph orchestration, cognitive memory, runtime tool generation, multi-tier guardrails, voice pipeline, and 21 LLM providers.

Readme

AgentOS — Open-Source TypeScript AI Agent Runtime with Cognitive Memory, HEXACO Personality, and Runtime Tool Forging

85.6% on LongMemEval-S at $0.0090/correct, +1.4 above Mastra OM gpt-4o (84.23%) · 70.2% on LongMemEval-M (1.5M-token variant), the only open-source library on the public record above 65% on M with publicly reproducible methodology · 16 LLM providers · 8 neuroscience-backed memory mechanisms · Apache-2.0


Benchmarks · Website · Docs · npm · Discord · Blog


AgentOS is an open-source TypeScript runtime for AI agents that remember, adapt, and write their own tools.

When an agent encounters a sub-task no existing tool covers, it generates a TypeScript function with a Zod-described schema, sends it through an LLM judge, and on approval runs it in a hardened node:vm sandbox. The new tool joins the catalog for the rest of the session. When a multi-agent team hits a capability gap, the manager calls spawn_specialist and the LLM judge reviews the synthesized agent spec before it joins the live roster.

The runtime carries the parts of an agent that should outlive a single chat completion: persistent cognitive memory (Ebbinghaus decay, retrieval-induced forgetting, reconsolidation, source-confidence decay) grounded in published cognitive-science literature, optional HEXACO personality vectors that bias retrieval and routing, six multi-agent orchestration strategies, streaming guardrails, a voice pipeline, and one dispatch interface across 21 LLM providers. Apache-2.0.

100+ first-party extensions (channel adapters, tool packs, guardrail packs) and 88 curated SKILL.md skills auto-discover at startup through their respective registries: a host pulls a curated index and the runtime wires every tool, guardrail, channel, and skill without manual registration. The auto-loader is the same surface that runtime-forged tools join: an agent that invents a function in session N can promote it (with judge approval and SkillExporter) into a SKILL.md that the registry picks up on the next process start. Forging is how the surface grows mid-run; auto-discovery is how it ships as a first-class capability afterward.

On benchmarks: 85.6% on LongMemEval-S at $0.0090 per correct answer (gpt-4o reader, +1.4 points above Mastra's published 84.23%, 0.4 points behind Emergence.ai's 86% closed-source SaaS SOTA); 70.2% on LongMemEval-M (1.5M-token haystacks, 500 sessions per question), the only open-source library on the public record above 65% on M with publicly reproducible methodology. Per-case run JSONs and single-CLI reproduction ship in agentos-bench.


Install

```sh
npm install @framers/agentos
```

```ts
import { agent } from '@framers/agentos';

const tutor = agent({
  provider: 'anthropic',
  instructions: 'You are a patient CS tutor.',
  personality: { openness: 0.9, conscientiousness: 0.95 },
  memory: { types: ['episodic', 'semantic'], working: { enabled: true } },
});

const session = tutor.session('student-1');
await session.send('Explain recursion with an analogy.');
await session.send('Can you expand on that?'); // remembers context
```

Full quickstart · Examples cookbook · API reference


Emergent Design

"So we and our elaborately evolving computers may meet each other halfway."

— Philip K. Dick, The Android and the Human, 1972

Three things accumulate across an AgentOS session and compose into behavior:

  1. Memory. What was said, what was decided, what was retrieved.
  2. Tool surface. Starts at whatever was registered. Can grow when an agent forges a new function mid-decision and the judge approves it.
  3. Personality (optional). A HEXACO trait vector that biases retrieval, specialist routing, and decision-making.

Each is configurable and observable; none crosses into "emergent agent" on its own. The composition is the interesting part.

Runtime Tool Forging

When an agent encounters a sub-task that no available tool covers, it generates a TypeScript function with a Zod-described input and output schema. A separate LLM call evaluates the forged function against the agent's stated intent and either approves or rejects it. Approved functions execute in a hardened node:vm sandbox with strict defaults (5-second wall clock, 128 MB heap-delta budget, eval / require / process banned, fetch / fs / crypto allowlist-empty by default). Approved tools join a discoverable index keyed by name and signature; subsequent turns invoke them via call_forged_tool(name, args). First forge costs full LLM tokens; reuse costs tens of tokens. Sandbox internals, isolation tradeoffs (node:vm vs queued isolated-vm for the hosted multi-tenant tier), and the full safety policy are in the emergent capabilities docs.
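The judge-and-sandbox gate described above can be sketched as a simple policy check before execution. This is purely illustrative: `ForgedToolSpec`, `checkForgePolicy`, and the identifier scan are hypothetical names and logic we invented for the sketch, not AgentOS's actual implementation.

```typescript
// Hypothetical sketch of a forge-time policy gate enforcing the stated
// defaults (5 s wall clock, 128 MB heap delta, eval/require/process banned).
// Not the AgentOS API; names here are illustrative.
interface ForgedToolSpec {
  name: string;
  source: string;      // generated TypeScript function body
  timeoutMs: number;   // requested wall-clock budget
  heapDeltaMb: number; // requested heap-growth budget
}

const BANNED_IDENTIFIERS = ['eval', 'require', 'process'];

function checkForgePolicy(spec: ForgedToolSpec): { ok: boolean; reason?: string } {
  if (spec.timeoutMs > 5_000) return { ok: false, reason: 'exceeds 5 s wall clock' };
  if (spec.heapDeltaMb > 128) return { ok: false, reason: 'exceeds 128 MB heap budget' };
  for (const id of BANNED_IDENTIFIERS) {
    // Crude textual scan standing in for real static analysis.
    if (new RegExp(`\\b${id}\\b`).test(spec.source)) {
      return { ok: false, reason: `banned identifier: ${id}` };
    }
  }
  return { ok: true };
}
```

A real gate would combine static checks like this with the LLM judge's intent review before anything reaches the sandbox.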

The pattern the runtime supports: an agent forges a tool mid-decision, the judge approves it, that turn invokes it, and a few turns later a different specialist agent in the same session invokes the same tool because the index made it findable. Promoted tools can be exported as SKILL.md skills via SkillExporter and join the auto-discovery surface on the next process start.

HEXACO Personality (optional)

Personality is opt-in. The runtime behaves identically with or without a trait vector, and most production deployments do not pass one.

```ts
// Personality-neutral (most production agents)
const support = agent({
  provider: 'openai',
  instructions: 'Resolve customer tickets.',
  memory: { types: ['episodic', 'semantic'] },
});

// Opt-in HEXACO (when persona consistency across sessions matters)
const coach = agent({
  provider: 'openai',
  instructions: "Long-running career coach. Hold the user accountable to their stated goals across weekly check-ins; flag drift, push back on excuses, escalate when goals shift.",
  personality: {
    conscientiousness: 0.9,    // won't let goals drift between sessions
    honestyHumility: 0.85,     // won't tell the user what they want to hear
    emotionality: 0.3,         // stays steady when the user is reactive
  },
  memory: { types: ['episodic', 'semantic'] },
});
```

When a vector is supplied, the kernel weights retrieval, specialist routing, and tool selection by the trait values. Same agent, same prompt, same tools: a high-Openness leader and a high-Conscientiousness leader produce measurably different decision sequences. Personality lives in the kernel, not in the prompt — prompt-only personality dissolves under context pressure while kernel-encoded bias persists. The vector remains editable, inspectable, and removable on consent.
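As a toy illustration of what trait-biased retrieval means in practice, the sketch below tilts a ranking toward novel material as Openness rises. The real weighting lives inside the AgentOS kernel; `Candidate`, `rankWithTraits`, and the linear novelty term are our invention for this example.

```typescript
// Illustrative only: trait-biased retrieval ranking. A neutral (0.5) trait
// leaves pure similarity ordering; higher Openness boosts novel candidates.
interface Candidate {
  text: string;
  similarity: number; // relevance to the query, 0..1
  novelty: number;    // how unfamiliar the material is, 0..1
}

function rankWithTraits(
  candidates: Candidate[],
  traits: { openness?: number } = {},
): Candidate[] {
  const openness = traits.openness ?? 0.5; // 0.5 = neutral, no bias
  const score = (c: Candidate) => c.similarity + (openness - 0.5) * c.novelty;
  return [...candidates].sort((a, b) => score(b) - score(a));
}
```

The point of the sketch: the same candidate set reorders when the trait vector changes, with no change to the prompt.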


Memory Benchmarks

gpt-4o reader, gpt-4o-2024-08-06 judge, full N=500 across every row. Cross-provider numbers are excluded from the tables because their public methodology disclosures don't admit reproduction.

LongMemEval-S (115K tokens, 50 sessions)

| System | Accuracy | $/correct | p50 latency |
|---|---:|---:|---:|
| EmergenceMem Internal | 86.0% | not published | 5,650 ms |
| AgentOS (canonical-hybrid + reader-router) | 85.6% | $0.0090 | 3,558 ms |
| Mastra OM gpt-4o (gemini-flash observer) | 84.23% | not published | not published |
| Supermemory gpt-4o | 81.6% | not published | not published |
| EmergenceMem Simple Fast (rerun in agentos-bench) | 80.6% | $0.0586 | 3,703 ms |
| Zep (self / independent reproduction) | 71.2% / 63.8% | not published | not published |

+1.4 points above Mastra OM. EmergenceMem Internal posts 86.0% (0.4 above) but doesn't publish per-case results or a reproducible CLI; among open-source libraries with single-CLI reproduction at gpt-4o, 85.6% is the highest publicly reproducible figure we found. p50 latency is 3,558 ms vs EmergenceMem's published median of 5,650 ms.

Cross-provider numbers omitted from the table (different reader and/or undisclosed judge): Mastra OM 94.87% (gpt-5-mini + gemini-2.5-flash observer), agentmemory 96.2% (Claude Opus 4.6), MemMachine 93.0% (GPT-5-mini), Hindsight 91.4% (unspecified backbone).

LongMemEval-M (1.5M tokens, 500 sessions)

M's haystacks exceed every production context window; most vendors only publish on S.

| System | Accuracy | License |
|---|---:|---|
| LongMemEval paper, GPT-4o round Top-10 (paper's best) | 72.0% | open repo |
| AgentBrain | 71.7% | closed-source SaaS |
| LongMemEval paper, GPT-4o session Top-5 | 71.4% | open repo |
| AgentOS (sem-embed + reader-router + Top-5) | 70.2% | Apache-2.0 |
| LongMemEval paper, GPT-4o round Top-5 | 65.7% | open repo |
| Mem0 v3, Mastra, Hindsight, Zep, EmergenceMem, Supermemory, Letta | not published | — |

At matched Top-5 retrieval, +4.5 above the round-level paper baseline (65.7%) and 1.2 below the session-level (71.4%); the paper's overall strongest GPT-4o result is 72.0% at Top-10. Of open-source libraries with publicly reproducible runs, AgentOS is the only one above 65% on M.

Full leaderboard → · Run JSONs → · Transparency audit → · LongMemEval paper (Wu et al., ICLR 2025, Table 3)

Methodology stack: bootstrap 95% CIs at 10k Mulberry32 resamples (seed 42), per-benchmark judge-FPR probes (S 1%, M 2%, LOCOMO 0%), per-case run JSONs, single-CLI reproduction. The transparency audit covers what the headline numbers don't: LOCOMO's ~6.4% answer-key error rate, the LongMemEval-S context-window confound, and the Mem0-vs-Zep comparison gaming case study, alongside which vendors disclose which methodology dimensions.
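The bootstrap procedure named above can be sketched directly. Mulberry32 is a standard 32-bit seedable PRNG; this is an illustrative reimplementation of the stated method (10,000 resamples at seed 42, percentile 95% CI over per-case correctness), not the agentos-bench source.

```typescript
// Seedable Mulberry32 PRNG: deterministic resampling given the same seed.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Percentile bootstrap 95% CI over a vector of per-case pass/fail results.
function bootstrapCI(
  correct: boolean[],
  resamples = 10_000,
  seed = 42,
): [number, number] {
  const rand = mulberry32(seed);
  const n = correct.length;
  const means: number[] = [];
  for (let r = 0; r < resamples; r++) {
    let hits = 0;
    for (let i = 0; i < n; i++) {
      if (correct[Math.floor(rand() * n)]) hits++; // resample with replacement
    }
    means.push(hits / n);
  }
  means.sort((x, y) => x - y);
  return [means[Math.floor(0.025 * resamples)], means[Math.floor(0.975 * resamples)]];
}
```

At N=500 and 85.6% accuracy, a CI computed this way spans roughly ±3 points around the headline number, which is why fixed-seed per-case JSONs matter for comparing systems a point or two apart.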


Ecosystem

| Package | Role |
|---|---|
| @framers/agentos | Core runtime: GMI agents, cognitive memory, multi-agent orchestration, guardrails, voice, 21 LLM providers. Apache 2.0. |
| @framers/agentos-extensions | 100+ first-party extensions and templates: channel adapters, tool packs, integrations, guardrail packs. |
| @framers/agentos-extensions-registry | Discovery + auto-loader layer for the extensions catalog. Hosts pull the index without pulling every implementation; the runtime resolves and registers packs at startup. |
| @framers/agentos-skills | 88 curated SKILL.md skills covering common tasks. |
| @framers/agentos-skills-registry | Discovery + auto-loader layer for the skills catalog. Also the surface where promoted forged tools land after SkillExporter. |
| @framers/agentos-bench | Open benchmark harness. Bootstrap 95% CIs at 10k resamples, judge false-positive-rate probes, per-case run JSONs at fixed seed. MIT (the rest of AgentOS is Apache 2.0). |
| @framers/sql-storage-adapter | Cross-platform SQL persistence: SQLite, Postgres, IndexedDB, Capacitor SQLite. |
| paracosm | AI agent swarm simulation engine that uses AgentOS as its substrate. |

Extensions and skills auto-load at startup. The runtime walks each registry plus any user-supplied paths, resolves each pack's createExtensionPack(context) factory or SKILL.md frontmatter, and registers tools, guardrails, channels, and skills without manual wiring. Capability gating and HITL approval gates apply to side-effecting installs. See extensions architecture for the full loading model.
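The loading loop described above can be sketched in miniature. Only the `createExtensionPack(context)` factory name comes from the docs; the `ExtensionPack` shape, `Runtime` class, and registration details here are illustrative stand-ins, not the published types.

```typescript
// Minimal sketch of registry-driven auto-loading: walk a list of pack
// factories, call each createExtensionPack(context), and register what it
// exports without manual wiring. Shapes are hypothetical.
interface ExtensionPack {
  tools: string[];
  guardrails: string[];
}
type PackFactory = (context: { appName: string }) => ExtensionPack;

class Runtime {
  tools = new Set<string>();
  guardrails = new Set<string>();

  loadAll(factories: PackFactory[], context: { appName: string }): void {
    for (const createExtensionPack of factories) {
      const pack = createExtensionPack(context);
      pack.tools.forEach((t) => this.tools.add(t));         // dedupe by name
      pack.guardrails.forEach((g) => this.guardrails.add(g));
    }
  }
}
```

The real loader additionally resolves SKILL.md frontmatter and applies capability gating and HITL approval before side-effecting installs.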


📄 Technical Whitepaper · Coming Soon

The full architecture and benchmark methodology, written for engineers and researchers who want a citable PDF instead of scrolling docs. Cognitive memory pipeline, classifier-driven dispatch, HEXACO personality modulation, runtime tool forging, full LongMemEval-S/M and LOCOMO benchmark methodology with confidence interval math, judge-FPR probes, per-stage retention metrics, and reproducibility recipes.

| Covers | What's inside |
|---|---|
| Architecture | Generalized Mind Instances, IngestRouter / MemoryRouter / ReadRouter, 8 cognitive mechanisms with primary-source citations |
| Benchmarks | LongMemEval-S 85.6%, LongMemEval-M 70.2%, vendor landscape, confidence interval methodology, judge FPR probes, full transparency stack |
| Reproducibility | Per-case run JSONs at --seed 42, single-CLI reproduction, Apache-2.0 bench at github.com/framersai/agentos-bench |

Join Discord for the announcement → · Read the benchmarks now →


Classifier-Driven Memory Pipeline

Most memory libraries retrieve on every query. AgentOS gates memory through three LLM-as-judge classifiers in a single shared pass, so trivial queries skip retrieval entirely and the rest get the right architecture and reader per category.

```text
User query
    │
    ▼ Stage 1: QueryClassifier (gpt-5-mini, ~$0.0001/query)
    │    T0=none ─────► answer from context, skip retrieval
    │    T1+=needs memory
    ▼ Stage 2: MemoryRouter      → canonical-hybrid · OM-v10 · OM-v11
    ▼ Stage 3: ReaderRouter      → gpt-4o (TR/SSU) · gpt-5-mini (SSA/SSP/KU/MS)
    ▼
Grounded answer
```

Stages 2 and 3 reuse the Stage 1 classification, so the full pipeline costs one classifier call per query, not three. The T0 / no-memory gate is the novel piece: removing retrieval entirely for greetings and small talk saves the embedding + rerank + reader cost on a substantial fraction of typical agent traffic.
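The classify-once, gate-early pattern can be sketched as follows. The tier labels T0–T3 come from the pipeline above; the keyword heuristic standing in for the gpt-5-mini classifier, and the `classify`/`answer` function names, are our invention for the sketch.

```typescript
// Toy version of the Stage 1 gate: classify once, skip retrieval for T0,
// and let later stages reuse the same classification.
type Tier = 'T0' | 'T1' | 'T2' | 'T3';

function classify(query: string): Tier {
  // Stand-in heuristic for the LLM classifier: greetings and small talk
  // need no memory at all.
  const smallTalk = /^(hi|hello|hey|thanks|thank you)\b/i;
  if (smallTalk.test(query.trim())) return 'T0';
  return query.length > 120 ? 'T3' : 'T1';
}

function answer(query: string): { tier: Tier; retrieved: boolean } {
  const tier = classify(query);                          // one classifier call
  if (tier === 'T0') return { tier, retrieved: false };  // skip retrieval entirely
  return { tier, retrieved: true };                      // Stages 2–3 reuse `tier`
}
```

The economics follow directly: every T0 query avoids the embedding, rerank, and reader calls, so the gate pays for itself whenever small talk is a nontrivial share of traffic.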

| Primitive | Source | Decision |
|---|---|---|
| QueryClassifier | @framers/agentos/query-router | T0/none vs T1/simple vs T2/moderate vs T3/complex |
| MemoryRouter | @framers/agentos/memory-router | canonical-hybrid vs observational-memory-v10 vs v11 |
| ReaderRouter | @framers/agentos/memory-router | gpt-4o vs gpt-5-mini per category |

Cognitive Pipeline docs → · Architecture deep dive → · Beyond RAG →


Why AgentOS

| vs. | AgentOS differentiator |
|---|---|
| LangChain / LangGraph | Cognitive memory (8 neuroscience-backed mechanisms), HEXACO personality, runtime tool forging |
| Vercel AI SDK | Multi-agent teams (6 strategies), 7 vector backends, guardrails, voice/telephony |
| CrewAI / Mastra | Unified orchestration (DAGs + graphs + missions), personality-driven routing, published reproducible numbers on LongMemEval-S (85.6%) and LongMemEval-M (70.2%) with full methodology disclosure |

Full framework comparison →


Key Features

| Category | Highlights |
|---|---|
| LLM Providers | 16: OpenAI, Anthropic, Gemini, Groq, Ollama, OpenRouter, Together, Mistral, xAI, Claude/Gemini CLI, + 5 image/video |
| Cognitive Memory | 8 mechanisms: reconsolidation, retrieval-induced forgetting, involuntary recall, FOK, gist extraction, schema encoding, source decay, emotion regulation |
| HEXACO Personality | 6 traits modulate memory, retrieval bias, response style |
| RAG Pipeline | 7 vector backends · 4 retrieval strategies · GraphRAG · HyDE · Cohere rerank-v3.5 |
| Multi-Agent Teams | 6 coordination strategies · shared memory · inter-agent messaging · HITL gates |
| Orchestration | workflow() DAGs · AgentGraph cycles · mission() goal-driven planning · checkpointing |
| Guardrails | 5 security tiers · 6 packs (PII, ML classifiers, topicality, code safety, grounding, content policy) |
| Emergent Capabilities | Runtime tool forging · 4 self-improvement tools · tiered promotion · skill export |
| Voice & Telephony | ElevenLabs, Deepgram, Whisper · Twilio, Telnyx, Plivo |
| Channels | 37 platform adapters (Telegram, Discord, Slack, WhatsApp, webchat, ...) |
| Observability | OpenTelemetry · usage ledger · cost guard · circuit breaker |


Multi-Agent in 6 Lines

```ts
import { agency } from '@framers/agentos';

const team = agency({
  strategy: 'graph',
  agents: {
    researcher: { provider: 'anthropic', instructions: 'Find relevant facts.' },
    writer:     { provider: 'openai',    instructions: 'Summarize clearly.',  dependsOn: ['researcher'] },
    reviewer:   { provider: 'gemini',    instructions: 'Check accuracy.',     dependsOn: ['writer'] },
  },
});

const result = await team.generate('Compare TCP vs UDP for game networking.');
```

Strategies: sequential · parallel · debate · review-loop · hierarchical · graph.

With strategy: 'hierarchical' + emergent: { enabled: true }, the manager LLM gets a spawn_specialist tool that mints new sub-agents at runtime when the static roster doesn't cover a sub-task. agency() is for single-request multi-agent coordination — for long-running world simulations or per-turn parallel agent loops, build your own orchestration with agent() + the lower-level primitives.

Multi-agent docs → · Hierarchical + emergent → · Scope guide →
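What the graph strategy's `dependsOn` edges imply can be sketched as a plain topological ordering: each agent runs only after its dependencies finish. `AgentSpec` and `executionOrder` here are illustrative names, not the `agency()` internals.

```typescript
// Hypothetical sketch: derive a valid execution order from dependsOn edges,
// with cycle detection. Not the AgentOS scheduler.
interface AgentSpec {
  instructions: string;
  dependsOn?: string[];
}

function executionOrder(agents: Record<string, AgentSpec>): string[] {
  const order: string[] = [];
  const visiting = new Set<string>();

  const visit = (name: string): void => {
    if (order.includes(name)) return;                    // already scheduled
    if (visiting.has(name)) throw new Error(`cycle at ${name}`);
    visiting.add(name);
    for (const dep of agents[name].dependsOn ?? []) visit(dep); // deps first
    visiting.delete(name);
    order.push(name);
  };

  Object.keys(agents).forEach(visit);
  return order;
}
```

For the researcher → writer → reviewer team above, this yields the researcher-first ordering regardless of declaration order.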


See It In Action

🌀 Paracosm — AI Agent Swarm Simulation

Define any scenario as JSON. Run it with AI commanders that have different HEXACO personalities. Same starting conditions, different decisions, divergent civilizations. Built on AgentOS.

```sh
npm install paracosm
```

Live Demo · GitHub · npm


Configure API Keys

Three layers, highest priority first:

```ts
// 1. Inline on the call (per-tenant, per-test, per-customer)
generateText({ apiKey: 'sk-customer', prompt: '...' });

// 2. Module-level default — set once at boot, no .env needed
import { setDefaultProvider } from '@framers/agentos';
setDefaultProvider({ provider: 'openai', apiKey: process.env.MY_OWN_KEY });

// 2b. Reorder the env-var auto-detect chain instead (when you keep multiple keys)
import { setProviderPriority } from '@framers/agentos';
setProviderPriority(['anthropic', 'openai', 'ollama']);
```

```sh
# 3. Environment variable auto-detect chain (default order)
#    OpenRouter → OpenAI → Anthropic → Gemini → Groq → Together → Mistral
#    → xAI → claude CLI → gemini CLI → Ollama → image providers
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export GEMINI_API_KEY=AIza...

# Comma-separated keys auto-rotate with quota detection
export OPENAI_API_KEY=sk-key1,sk-key2,sk-key3
```

Full credential resolution + default models per provider →


API Surfaces

  • agent(): lightweight stateful agent. Prompts, sessions, personality, hooks, tools, memory.
  • agency(): multi-agent teams + full runtime. Emergent tooling, guardrails, RAG, voice, channels, HITL.
  • generateText() / streamText() / generateObject() / generateImage() / generateVideo() / generateMusic() / performOCR() / embedText(): low-level multi-modal helpers with native tool calling.
  • workflow() / AgentGraph / mission(): three orchestration authoring APIs over one graph runtime.

Provider fallback is an explicit opt-in via agent({ fallbackProviders: [...] }) (or buildFallbackChain() for programmatic chains). Defaults to off — the runtime never silently retries against a different provider unless you configured a chain.
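The contract an explicit fallback chain implies can be sketched in a few lines: try each provider in order, move on only on error, and surface the last error if every provider fails. `Provider`, `withFallback`, and `callProvider` are stand-in names for the sketch, not AgentOS exports.

```typescript
// Illustrative fallback chain: first success wins; errors fall through to
// the next configured provider; nothing is retried unless a chain exists.
type Provider = (prompt: string) => Promise<string>;

async function withFallback(chain: Provider[], prompt: string): Promise<string> {
  let lastError: unknown;
  for (const callProvider of chain) {
    try {
      return await callProvider(prompt);
    } catch (err) {
      lastError = err; // fall through to the next provider in the chain
    }
  }
  throw lastError ?? new Error('empty provider chain');
}
```

The off-by-default posture matters operationally: a silent cross-provider retry can change cost, latency, and data-residency properties without the caller noticing.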

Full API reference → · High-Level API guide →


Documentation & Community


Contributing

```sh
git clone https://github.com/framersai/agentos.git && cd agentos
pnpm install && pnpm build && pnpm test
```

Contributing Guide · We use Conventional Commits.


License

Apache 2.0

Built by Manic Agency LLC · Frame.dev · Wilds.ai