
@lloyal-labs/sdk

Backend-agnostic inference primitives for the lloyal HDK.

Composable inference primitives for forkable decode state, shared-prefix KV branching, and continuous tree batching. Branches share a KV prefix while keeping independent machinery (sampler chain, grammar, logits snapshot, perplexity tracker) for controlled divergence at decode time. BranchStore packs tokens from N branches, each at a different position and seq_id and each needing its own logits captured, into a single llama_batch and dispatches once.

npm i @lloyal-labs/sdk

The SDK exports the SessionContext contract and the primitives that operate on it. A backend binding (e.g. @lloyal-labs/lloyal.node for Node) provides createContext() — the SDK takes it from there. Underneath, liblloyal is the C++ core; the Node binding is one front-end on top of it.
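What the contract needs to provide can be sketched as a hypothetical interface (the names and shape below are assumptions for illustration, not the published SessionContext type):

```typescript
// Hypothetical sketch of a SessionContext-style contract (NOT the published
// API). A backend binding would implement this; SDK primitives only consume it.
interface SessionContextLike {
  tokenize(text: string): Promise<number[]>;
  // decode, logits access, and KV-cache sequence ops would also live here
}

// A toy in-memory binding, just to show the SDK side of the dependency.
function createMockContext(): SessionContextLike {
  return {
    async tokenize(text: string): Promise<number[]> {
      // Fake tokenizer: one "token" id per whitespace-separated word.
      return text.split(/\s+/).filter(Boolean).map((_, i) => i);
    },
  };
}

createMockContext()
  .tokenize('Explain quantum entanglement')
  .then(toks => console.log(toks.length)); // 3 words -> 3 toy tokens
```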

The Branch API

import { createContext } from '@lloyal-labs/lloyal.node';
import { Branch, BranchStore } from '@lloyal-labs/sdk';

const ctx = await createContext({ modelPath: './model.gguf', nSeqMax: 6 });
const store = new BranchStore(ctx);

// Shared prompt: "Explain quantum entanglement"
const prompt = await ctx.tokenize('Explain quantum entanglement');

const root = Branch.create(ctx, 0, { temperature: 0.8 });
await root.prefill(prompt);

// Fork 4 branches — each gets a different reasoning prefix
const analogy  = await root.fork();
const formal   = await root.fork();
const socratic = await root.fork();
const visual   = await root.fork();

// Scatter-prefill: inject divergent prefixes in one batched dispatch
// 4 branches × variable lengths → auto bin-packed into minimal GPU calls
await store.prefill([
  [analogy,  await ctx.tokenize('Think of it like two coins...')],    // 12 tokens
  [formal,   await ctx.tokenize('In quantum mechanics, the...')],     // 8 tokens
  [socratic, await ctx.tokenize('What happens when you measure...')], // 10 tokens
  [visual,   await ctx.tokenize('Imagine two particles...')],         // 7 tokens
]);

// Generate — all 4 in lockstep, 1 GPU call per step
const branches = [analogy, formal, socratic, visual];
for (;;) {
  const live = branches.filter(b => !b.disposed);
  if (!live.length) break;

  const entries: [Branch, number][] = [];
  for (const b of live) {
    const { token, isStop } = b.produceSync();
    if (isStop) { b.pruneSync(); continue; }
    entries.push([b, token]);
  }
  if (!entries.length) break;
  await store.commit(entries);
}

// Winner takes all — one seq_keep pass, losers vaporized
const winner = branches
  .filter(b => !b.disposed)
  .reduce((a, b) => (a.perplexity < b.perplexity ? a : b));
await store.retainOnly(winner);

For single-branch generation, Branch is itself an async iterable that generates until EOG:

for await (const { token, text } of branch) {
  process.stdout.write(text);
}
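The async-iterable shape comes down to implementing Symbol.asyncIterator. A minimal stand-in that yields from a fixed token list (illustrative only, not the Branch implementation; the list running out plays the role of EOG):

```typescript
type Piece = { token: number; text: string };

// Minimal Branch-like async iterable: yields each piece in order, then ends.
function toyBranch(pieces: Piece[]): AsyncIterable<Piece> {
  return {
    async *[Symbol.asyncIterator]() {
      for (const p of pieces) yield p;
    },
  };
}

(async () => {
  let out = '';
  for await (const { text } of toyBranch([
    { token: 1, text: 'Hello' },
    { token: 2, text: ', world' },
  ])) {
    out += text;
  }
  console.log(out); // "Hello, world"
})();
```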

Continuous Tree Batching

Tree search with N branches means N calls to llama_decode() — each paying GPU dispatch overhead, memory barriers, and PCIe round-trips. BranchStore eliminates this: tokens from N branches are packed into a single llama_batch and dispatched once. N branches, 1 GPU call.

Two packing strategies for different access patterns:

// commit: 1 token per branch — one GPU dispatch for N branches
await store.commit([[branch1, tok1], [branch2, tok2], [branch3, tok3]]);

// prefill: variable tokens per branch — asymmetric injection
await store.prefill([
  [branchA, systemTokens],  // 200 tokens
  [branchB, queryTokens],   //  12 tokens
  [branchC, docTokens],     // 800 tokens
]);
// Greedy bin-packed into ceil(total / nBatch) dispatches
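The invariant behind that dispatch count is easy to state in isolation: flatten every branch's pending tokens into one stream and slice it into nBatch-sized chunks, so the total is exactly ceil(total / nBatch). A self-contained sketch of the idea (not BranchStore's actual code; note a single branch's tokens may span dispatches, with order preserved within a branch):

```typescript
type Job = { seqId: number; tokens: number[] };
type Slot = { seqId: number; token: number };

// Flatten all (seqId, token) pairs, then chunk into batches of at most nBatch.
// Produces exactly Math.ceil(totalTokens / nBatch) dispatches. Sketch only.
function planDispatches(jobs: Job[], nBatch: number): Slot[][] {
  const flat = jobs.flatMap(j => j.tokens.map(token => ({ seqId: j.seqId, token })));
  const dispatches: Slot[][] = [];
  for (let i = 0; i < flat.length; i += nBatch) {
    dispatches.push(flat.slice(i, i + nBatch));
  }
  return dispatches;
}

// 5 + 3 + 4 = 12 tokens, nBatch = 5 -> chunks of 5, 5, 2 (three dispatches).
const plan = planDispatches(
  [
    { seqId: 0, tokens: Array(5).fill(7) },
    { seqId: 1, tokens: Array(3).fill(8) },
    { seqId: 2, tokens: Array(4).fill(9) },
  ],
  5
);
console.log(plan.length); // 3
```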

Topology

Parent/child edges are always on. Simple chat to best-of-N to deep search is one continuum.

branch.parent;       // handle or null if root
branch.children;     // child handles
branch.isLeaf;       // no children?

| Method | Behavior |
|--------|----------|
| pruneSync() | Throws if children exist |
| pruneSubtreeSync() | Iterative post-order traversal |
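The iterative post-order discipline (every child freed before its parent, no recursion, so deep trees cannot blow the stack) can be sketched on a toy node type; this is illustrative only, since the real Branch handles live in native memory:

```typescript
// Toy node type standing in for a Branch handle.
type Node = { id: number; children: Node[]; disposed: boolean };

// Iterative post-order disposal with an explicit stack: build a preorder
// (parent before children), then walk it in reverse, which is a valid
// post-order. Returns disposal order for inspection. Sketch only.
function pruneSubtree(root: Node): number[] {
  const stack: Node[] = [root];
  const preorder: Node[] = [];
  while (stack.length) {
    const n = stack.pop()!;
    preorder.push(n);
    stack.push(...n.children);
  }
  const order: number[] = [];
  for (const n of preorder.reverse()) {
    n.disposed = true; // children are always reached before their parent
    order.push(n.id);
  }
  return order;
}
```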

Per-Token Metrics

Every branch exposes information-theoretic measures, readable at every decode step:

branch.modelEntropy();        // Shannon entropy of full vocab distribution (bits)
branch.modelSurprisal(token); // -log2(p) for a specific token
branch.perplexity;            // model-level PPL (exp of mean NLL from raw logits)
branch.samplingPerplexity;    // sampling-level PPL (from filtered distribution)
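These are standard definitions over the softmax of the raw logits. A self-contained sketch of the assumed semantics (not the SDK's source):

```typescript
// Softmax with max-subtraction for numerical stability.
function softmax(logits: number[]): number[] {
  const m = Math.max(...logits);
  const exps = logits.map(x => Math.exp(x - m));
  const z = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / z);
}

// Shannon entropy in bits: H = -sum_i p_i * log2(p_i).
function modelEntropy(logits: number[]): number {
  return softmax(logits).reduce((h, p) => (p > 0 ? h - p * Math.log2(p) : h), 0);
}

// Surprisal of one token in bits: -log2 p(token).
function modelSurprisal(logits: number[], token: number): number {
  return -Math.log2(softmax(logits)[token]);
}

// Uniform over 4 tokens: entropy = 2 bits; any token's surprisal = 2 bits.
console.log(modelEntropy([0, 0, 0, 0]), modelSurprisal([0, 0, 0, 0], 2));
```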

Session

Session manages the conversation trunk — the single promoted branch that accumulates verified context across queries.

const session = new Session({ ctx, store });

// High-level: extend the trunk with a new query–answer pair
await session.commitTurn('What is quantum entanglement?', answer);

// Lower-level building blocks (for harnesses that orchestrate trunk lifecycle directly)
await session.prefillUser('What is quantum entanglement?');
await session.promote(verifiedBranch);

// Next query starts from the promoted trunk's KV state
session.trunk;  // the live branch

commitTurn is the recommended high-level helper. Future queries fork from session.trunk and read prior conversation through KV attention — no prompt-history injection.

Rerank

Backend-agnostic reranker. The caller provides a SessionContext — how it was created (local, remote, quantized) is not the SDK's concern.

import { Rerank } from '@lloyal-labs/sdk';

const reranker = await Rerank.create(ctx, { nSeqMax: 8 });
const scores = await reranker.rank(query, documents);
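Assuming rank returns one score per document, parallel to the input array (an assumption about the API, not documented behavior), picking the best candidates is a sort over the pairs:

```typescript
// Pair each document with its score and keep the k highest. Assumes `scores`
// is index-parallel to `documents` (an assumption, not documented behavior).
function topK<T>(documents: T[], scores: number[], k: number): T[] {
  return documents
    .map((doc, i) => ({ doc, score: scores[i] }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map(x => x.doc);
}

console.log(topK(['a', 'b', 'c'], [0.1, 0.9, 0.4], 2)); // ['b', 'c']
```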

Exports

// Classes
export { Branch, BranchStore, Session, Rerank };

// Delta builders (for tool result injection)
export { buildUserDelta, buildToolResultDelta };

// Types
export type { SessionContext, SamplingParams, Produced, ContextOptions, ... };

License

Apache-2.0