npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

@lloyal-labs/lloyal-agents

v2.1.0

Published

Multi-agent inference inside the decode loop — structured concurrency over shared KV state

Readme

@lloyal-labs/lloyal-agents

Continuous Context agent runtime for the lloyal HDK.

lloyal-agents runs multi-agent inference inside the decode loop. Instead of N independent model calls rebuilding the prompt each step, all agents advance inside one continuous decode process — forked from shared KV cache state, driven through a single GPU forward pass per tick, spawning sub-agents from their own live branches at arbitrary depth.

Built on lloyal.node, which provides forkable decode state and continuous tree batching over llama.cpp. lloyal-agents adds structured concurrency, tool dispatch, and a five-phase tick loop. Orchestration is not a layer above inference — it is inference.

npm i @lloyal-labs/lloyal-agents @lloyal-labs/lloyal.node

lloyal-agents provides the agent runtime. lloyal.node provides the native inference backend — prebuilt binaries for macOS (Metal, CPU), Linux (CPU, CUDA, Vulkan), and Windows (CPU, CUDA, Vulkan). Both are required. GPU selection at runtime.

Public API

import {
  initAgents,        // bootstrap: session, store, event channel
  useAgent, agent,   // single-agent helpers
  agentPool,         // multi-agent pool with a swappable orchestrator
  useAgentPool,      // lower-level Effection resource (advanced)
  diverge,           // multi-branch perplexity selection
  parallel, chain, fanout, dag, reduce,  // orchestrators / combinators
  withSpine,         // scoped spine branch with guaranteed teardown
  Tool, Source,
  DefaultAgentPolicy,
  Ctx, Store, Events,
} from "@lloyal-labs/lloyal-agents";

Bootstrap

import { main, call } from "effection";
import { createContext } from "@lloyal-labs/lloyal.node";
import { initAgents } from "@lloyal-labs/lloyal-agents";

main(function* () {
  const ctx = yield* call(() =>
    createContext({
      modelPath: "model.gguf",
      nCtx: 32768,
      nSeqMax: 8,
      typeK: "q4_0",
      typeV: "q4_0",
    }),
  );

  yield* initAgents(ctx);
  // Ctx, Store, Events now set — useAgent(), agentPool(), diverge()
  // find them automatically. Session + context disposed on scope exit.
});

Shared Frontier

When agents fork from a common branch, they inherit its KV cache — the full attention state up to the fork point. This boundary is the shared frontier: the last position where all agents had identical computational state.

Everything before the frontier is shared context. Everything after is independent reasoning. The model doesn't need to be told what the other agents know — it already attended over the same prefix. Communication happened at prefill time, through the attention mechanism, with zero serialization overhead.

yield* withSpine(
  { systemPrompt: PLAYBOOKS, tools },
  function* (spine) {
    // spine is a prefilled branch — system prompt + tool schemas already in KV.
    // Every agent forked from spine shares that prefix.
    return yield* agentPool({
      orchestrate: parallel(
        questions.map((q) => ({
          content: q,
          systemPrompt: WORKER_PROMPT,
        })),
      ),
      tools: [...sourceTools, reportTool],
      parent: spine,
      terminalTool: "report",
    });
  },
);

withSpine creates the spine branch, passes it to the body, and guarantees cleanup via try/finally — the spine cannot leak out of the block. Effection enforces the lifetime.

Orchestrators

agentPool accepts an orchestrator that determines how agents are spawned and sequenced:

  • parallel(specs[]) — agents run concurrently from the shared spine.
  • chain(specs[], factory) — sequential, with extendSpine writing each task's findings onto the spine before the next forks.
  • fanout(landscapeSpec, domainSpecs[]) — landscape pass that informs N parallel domain agents.
  • dag(nodes[]) — arbitrary acyclic graph with multi-parent edges (Task-as-Future pattern).

Same agentPool call shape; the orchestrator argument changes the topology.

In-Loop Orchestration

All active agents advance together in a five-phase tick loop:

SPAWN+EXTEND. The rendezvous point with the orchestrator fiber. Pending agent spawns and extendSpine calls are queued via Effection action() and drained at the start of each tick — batched into a single store.prefill(). Single-fiber discipline preserved across concurrent orchestrator extends.

PRODUCE. Every generating agent calls produceSync() — synchronous sampling with no async gap between agents. The entire produce phase is a single uninterrupted pass over the active set.

COMMIT. One store.commit() call packs all produced tokens into a single llama_batch and dispatches once. N branches, one GPU call.

SETTLE. Tool results that resolved during the prior DISPATCH are drained from a buffer. Each result is tokenized into a delta, budget-checked against a fresh ContextPressure snapshot, and batch-prefilled into the agent's branch. Grammar state resets. The agent transitions back to generating.

DISPATCH. Tool calls collected during PRODUCE are executed sequentially via scoped() + call(). Each tool runs to completion before the next begins. Tools return Operation<unknown>, so they can yield* into framework primitives like useAgent or agentPool, spawning recursive sub-agents within the calling agent's scope.

When no agent is generating and tools are still pending, the loop yields control until the next tool resolves. No polling. No sleep loops.

Structured Concurrency DAG

Agent lifecycles are managed by Effection, a structured concurrency library for JavaScript. This is not optional sugar — it is load-bearing infrastructure. It is what makes recursive agents possible.

Every branch registers an ensure() callback at fork time:

function* setupAgent(parent, task, ctx) {
  const branch = parent.forkSync();
  yield* ensure(() => {
    if (!branch.disposed) branch.pruneSync();
  });
  // ...
}

If the scope exits — error, cancellation, normal completion — the branch is pruned. Orphaned branches are structurally impossible. Tool dispatch uses scoped() + call(); each tool executes inside a scoped error boundary within the agent pool scope. If the scope tears down, pending tools are cancelled. The DAG is not imposed on the orchestration. It is intrinsic to the Effection task tree.

useAgentPool is an Effection resource() — it suspends via provide() after all agents complete, but keeps their branches alive. The caller can fork sub-agents from any completed agent's branch. Those sub-agents inherit the parent agent's full KV state — every tool result it consumed, every reasoning step it took. No summarization. No context window management. The sub-agent continues from the parent's frontier.

Recursive agents work at two levels. At the harness level, a completed pool's branches can be forked into follow-up pools. At the model level, a tool's execute() returns Operation<unknown>, so it can yield* directly into useAgent or agentPool. An agent that calls such a tool spawns sub-agents mid-generation — inside its own scope, inheriting its KV state, with cleanup guaranteed by structured concurrency. DelegateTool (from @lloyal-labs/rig) is the canonical implementation.

There is nothing in the framework that limits depth. Agents can spawn sub-agents that spawn sub-agents. An agent pool can run inside another agent pool's scope. The structured concurrency guarantees compose at every level.

Hallucination Detection

The framework provides hallucination detection at two levels.

Per-token observables. Every branch exposes runtime-accessible signals on every step: branch.modelEntropy() (Shannon entropy of the full vocabulary distribution), branch.modelSurprisal(token) (surprisal of the chosen token: -log2(p)), branch.perplexity (model-level, from raw logits), and branch.samplingPerplexity (sampling-level, from the filtered distribution). The delta between model and sampling perplexity is itself a hallucination indicator — high sampling perplexity relative to model perplexity means the sampler is working against the model's probability mass.

Enable trace: true on agent pools to capture entropy and surprisal on every agent:produce event.

Multi-branch semantic comparison. diverge() forks N branches from a shared frontier, generates independently, and returns all outputs with their perplexity scores:

const result = yield* diverge({
  parent: root,            // shared frontier
  attempts: 3,             // fork 3 branches
  params: { temperature: 0.7 },
});
// result.best — lowest-perplexity branch, still alive
// result.attempts — all branches with output, ppl, token count
// Losers already pruned. Winner's branch is the caller's responsibility.

The harness decides how to compare. diverge() returns all outputs with their perplexity scores — the harness can apply any equivalence measure: bigram overlap, embedding similarity, or model-based evaluation. Where branches agree, the model is confident; where they diverge, hallucination risk is high.

This directly operationalizes the semantic entropy work from Farquhar et al. (Nature, 2024) — but as a runtime primitive, not a post-hoc metric. The key constraint: divergence from a common computational ancestor is signal. Divergence from independently-constructed contexts is sampling variance. This measurement is only meaningful because agents share a frontier.

Session Accumulation

Session.commitTurn(query, answer) extends the trunk with a new query–answer pair. Future queries fork from this trunk — its KV cache already contains everything the prior turn established.

A cold query starts from position 0. A warm query starts from an existing trunk. Over multiple queries, the session compounds — each turn advances the frontier, and future agents inherit the accumulated state via attention.

The lower-level building blocks (prefillUser, prefillToolResult, promote) are also exposed for harnesses that orchestrate the trunk lifecycle directly.

Context Pressure

KV cache is finite. ContextPressure snapshots the remaining budget on every tick and DefaultAgentPolicy enforces two thresholds:

  • softLimit (default 1024 tokens remaining): SETTLE rejects tool results that would cross this floor. PRODUCE hard-cuts agents requesting non-terminal tool calls. Terminal tools (e.g. report) still pass — agents can always submit findings. INIT drops agents that don't fit above this floor.
  • hardLimit (default 128 tokens remaining): agents killed immediately before produceSync(). No decode call is made below this line — it would crash.

Tool result prefill in the SETTLE phase is budget-gated against a fresh pressure snapshot. If a tool result doesn't fit, the agent is terminated rather than risking a context overflow mid-generation. The softLimit reserves space for downstream work — synthesis passes, verification.

yield* agentPool({
  orchestrate: parallel(tasks),
  tools, terminalTool: "report",
  policy: new DefaultAgentPolicy({
    budget: { context: { softLimit: 2048 } }, // reserve 2K for downstream
  }),
});

Tools

Tools are class-based with OpenAI-compatible function schemas:

import { Tool } from "@lloyal-labs/lloyal-agents";
import { call } from "effection";
import type { Operation } from "effection";
import type { ToolContext } from "@lloyal-labs/lloyal-agents";

class SearchTool extends Tool<{ query: string }> {
  readonly name = "search";
  readonly description = "Semantic search over the corpus";
  readonly parameters = {
    type: "object",
    properties: { query: { type: "string", description: "Search query" } },
    required: ["query"],
  };

  *execute(args: { query: string }, context?: ToolContext): Operation<unknown> {
    const results = yield* call(() => this.reranker.rank(args.query, this.chunks));
    return results.slice(0, 10);
  }
}

Pass the same Tool[] to withSpine({ systemPrompt, tools }) and to agentPool({ tools }): the spine decodes the tool schemas into KV once at setup, and the pool's dispatcher registers the same instances for runtime execute(). Two roles, one input — see the playbooks convention for mixed-role pools. (Advanced: createToolkit(tools) is still exported if you need direct access to the { toolMap, toolsJson } pair, e.g. for the low-level useAgentPool primitive.)

Events

The runtime emits structured events for TUI, logging, or telemetry:

| Event | Payload | | --------------------- | --------------------------------------------------------- | | agent:spawn | agentId, parentAgentId | | agent:produce | agentId, text, tokenCount, entropy?, surprisal? | | agent:tool_call | agentId, tool, args | | agent:tool_result | agentId, tool, result | | agent:tool_progress | agentId, tool, filled, total | | agent:return | agentId, result — voluntary completion via terminal tool | | agent:recovered | agentId, result — recovery extracted findings from killed agent | | agent:done | agentId |

Documentation

Full positioning, mechanics, learn pages, and reference at docs.lloyal.ai.

License

Apache-2.0