shortlist

v0.1.2

Dynamic tool selection for AI SDK agents — index your tools and expose only the most relevant ones to the model on every step.

When an agent has 50, 100, or 200+ tools, handing the model all of them on every step is a problem twice over: it burns context tokens, and it lowers tool-calling accuracy — the right tool gets lost in the noise. shortlist indexes your tools once and, on each step, exposes only the handful that actually match what the user is trying to do.

import { createToolIndex } from "shortlist";
import { generateText } from "ai";

const index = createToolIndex(allTools); // 200 tools, no API key needed

const result = await generateText({
  model,
  tools: allTools,
  prepareStep: index.prepareStep({ maxTools: 5 }), // model sees the best 5 per step
  prompt: "Refund the last charge for customer Acme.",
});
  • Zero-config default — keyword search (BM25 + TF-IDF) that needs no API key and runs in well under a millisecond.
  • Better with embeddings — add an embedding model for semantic search; shortlist fuses it with keyword search automatically.
  • Drops into the AI SDK: prepareStep, wrapLanguageModel middleware, or a select() you call yourself.
  • Built for agent loops — adaptive cutoffs, miss-driven escalation, recently-used tool retention, related-tool expansion, and an in-memory query-embedding cache.

Status: pre-release (0.0.1). The API below is implemented and tested; expect additive changes before 1.0.

Install

npm install shortlist
# peers (you almost certainly already have these):
npm install ai zod

Requires Node 18+, ai >= 4.0, and zod >= 3.25.

How it works

createToolIndex(tools, options?) builds a search index over each tool's name, description, and parameter names. You then pick an integration point:

| You want… | Use | Returns |
| --- | --- | --- |
| The AI SDK to filter tools per step | index.prepareStep(opts) | a prepareStep function |
| Transparent filtering at the provider level | index.middleware(opts) | a LanguageModelMiddleware |
| To select tools yourself | index.select(query, opts) | string[] of tool names |
| To debug why tools ranked as they did | index.selectWithScores(query, opts) | { tools, results } |
| The model to discover tools on demand | index.searchTool() | a callable meta-tool |
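To make the indexing step concrete, here is a minimal sketch of keyword scoring over a tool's name, description, and parameter names. It uses textbook BM25 with the common defaults k1 = 1.2 and b = 0.75. The `ToolDoc` shape and the `tokenize` and `bm25Scores` helpers are hypothetical names for illustration, not shortlist's internals, and the real tokenizer and scoring blend may differ.

```typescript
// Hypothetical shape for illustration; shortlist's internal representation may differ.
interface ToolDoc {
  name: string;
  description: string;
  paramNames: string[];
}

// Split camelCase and non-alphanumeric runs into lowercase tokens.
function tokenize(text: string): string[] {
  return text
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
}

// Score every tool against the query with textbook BM25.
function bm25Scores(tools: ToolDoc[], query: string, k1 = 1.2, b = 0.75): Map<string, number> {
  // One "document" per tool: name + description + parameter names.
  const docs = tools.map((t) => tokenize([t.name, t.description, ...t.paramNames].join(" ")));
  const avgLen = docs.reduce((sum, d) => sum + d.length, 0) / Math.max(docs.length, 1);
  const queryTerms = Array.from(new Set(tokenize(query)));
  const scores = new Map<string, number>();

  tools.forEach((tool, i) => {
    const doc = docs[i] ?? [];
    let score = 0;
    for (const term of queryTerms) {
      const tf = doc.filter((t) => t === term).length;
      if (tf === 0) continue;
      const df = docs.filter((d) => d.includes(term)).length;
      const idf = Math.log(1 + (docs.length - df + 0.5) / (df + 0.5));
      score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * doc.length) / avgLen));
    }
    scores.set(tool.name, score);
  });
  return scores;
}

const demoTools: ToolDoc[] = [
  { name: "refundCharge", description: "Refund a customer's charge", paramNames: ["chargeId"] },
  { name: "sendEmail", description: "Send an email message", paramNames: ["to", "subject"] },
];
const demoScores = bm25Scores(demoTools, "refund the last charge");
```

In this toy corpus only refundCharge shares terms with the query, so it is the only tool with a non-zero score.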

Strategies

import { openai } from "@ai-sdk/openai";

// hybrid (default when no embeddingModel): free, keyword-based, instant
createToolIndex(tools);

// combined (default when an embeddingModel is given): keyword + semantic, fused
const embeddingModel = openai.embeddingModel("text-embedding-3-small");
createToolIndex(tools, { embeddingModel });

// semantic: embeddings only
createToolIndex(tools, { strategy: "semantic", embeddingModel });

combined runs keyword and semantic search in parallel and fuses the normalized scores. The base weighting is roughly 30% keyword / 70% semantic. By default (adaptiveFusion), the keyword weight is scaled per query by how much of the query the keyword index actually matches. A paraphrase or a query in another language, where keyword search would only match noise, fuses as semantic-only; a query that reuses the tools' words keeps keyword's exact-match anchoring. Tune the base split with fusionWeights, or set adaptiveFusion: false for a fixed ratio. If the embedding call fails, whether at warm-up or per query, combined falls back to keyword-only so selection never hard-fails.
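The adaptive weighting described above can be sketched as follows. The exact formula shortlist uses is not documented here, so this is one plausible reading: the keyword weight is multiplied by the fraction of query terms found in the keyword vocabulary, then the two weights are renormalized. `keywordCoverage` and `fuseScores` are hypothetical helper names.

```typescript
interface FusionWeights {
  keyword: number;
  semantic: number;
}

// Fraction of query terms that appear anywhere in the keyword index vocabulary.
function keywordCoverage(queryTerms: string[], vocabulary: Set<string>): number {
  if (queryTerms.length === 0) return 0;
  const hits = queryTerms.filter((t) => vocabulary.has(t)).length;
  return hits / queryTerms.length;
}

// Fuse two normalized scores (each in [0, 1]) for a single tool.
// Assumption: the keyword weight shrinks linearly with coverage, then weights are renormalized.
function fuseScores(
  keywordScore: number,
  semanticScore: number,
  coverage: number,
  base: FusionWeights = { keyword: 0.3, semantic: 0.7 },
): number {
  const keywordWeight = base.keyword * coverage;
  const total = keywordWeight + base.semantic;
  if (total === 0) return 0;
  return (keywordWeight * keywordScore + base.semantic * semanticScore) / total;
}

// Zero coverage (e.g. a query in another language): the result is the semantic score alone.
const semanticOnly = fuseScores(1, 0.4, 0);
// Full coverage: the base 30/70 split applies.
const fullSplit = fuseScores(1, 0, 1);
```

With coverage at 0 the keyword score has no influence, which matches the "fuses as semantic-only" behavior; with coverage at 1 the base split is used unchanged.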

When you use embeddings, call await index.warmUp() once at startup to pre-compute tool embeddings (and persist them with an embedding cache) so the first select() isn't slow.

prepareStep — the main path

import { generateText, stepCountIs } from "ai";

const result = await generateText({
  model,
  tools: allTools,
  prepareStep: index.prepareStep({
    maxTools: 5,
    alwaysActive: ["getCurrentUser"], // always exposed, regardless of the query
    recentToolBoost: 3,               // keep up to 3 recently-used tools available
  }),
  stopWhen: stepCountIs(8),
  prompt,
});

prepareStep re-selects tools on every step from the live conversation, and adds two behaviors that matter inside real agent loops:

  • Miss-driven escalation. If the model produced no tool calls last step, the next step shows the next page of ranked tools instead of repeating the same set. After two consecutive misses, all tools are exposed so the agent is never stuck.
  • Recently-used retention (recentToolBoost, opt-in). A fresh per-step selection can otherwise strip away a tool the agent just used successfully. Set recentToolBoost to keep the most-recently-used tools active across steps — derived entirely from the steps the SDK passes in, so there's no hidden session state. recentToolWindow (default 3) controls how far back it looks.
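Both behaviors can be derived from the step history alone. The sketch below assumes a simplified `StepLike` shape (only tool-call names) rather than the AI SDK's full step type, and the helper names are hypothetical. It illustrates the paging and retention rules as described, not shortlist's actual code.

```typescript
// Simplified stand-in for an agent step; the real SDK step object carries much more.
interface StepLike {
  toolCalls: { toolName: string }[];
}

// Count how many of the most recent steps produced no tool calls.
function countTrailingMisses(steps: StepLike[]): number {
  let misses = 0;
  for (let i = steps.length - 1; i >= 0; i--) {
    const step = steps[i];
    if (!step || step.toolCalls.length > 0) break;
    misses++;
  }
  return misses;
}

// 0 misses: first page. 1 miss: next page. 2 or more: expose everything.
function toolsForStep(ranked: string[], maxTools: number, consecutiveMisses: number): string[] {
  if (consecutiveMisses >= 2) return [...ranked];
  const start = consecutiveMisses * maxTools;
  const page = ranked.slice(start, start + maxTools);
  return page.length > 0 ? page : [...ranked];
}

// Most recently used tool names, newest step first, looking back `window` steps.
function recentlyUsed(steps: StepLike[], window: number, boost: number): string[] {
  if (window <= 0 || boost <= 0) return [];
  const seen: string[] = [];
  for (const step of steps.slice(-window).reverse()) {
    for (const call of step.toolCalls) {
      if (!seen.includes(call.toolName)) seen.push(call.toolName);
    }
  }
  return seen.slice(0, boost);
}

const rankedDemo = ["a", "b", "c", "d", "e", "f"];
const historyDemo: StepLike[] = [
  { toolCalls: [{ toolName: "a" }] },
  { toolCalls: [{ toolName: "b" }, { toolName: "c" }] },
  { toolCalls: [] },
];
```

Because everything is computed from the steps array, calling these helpers twice with the same history gives the same answer, which is the "no hidden session state" property in practice.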

select and selectWithScores

Call the selector directly when you manage activeTools yourself:

const names = await index.select("ship it to prod", { maxTools: 5 });
// → ["deployToVercel", ...]

selectWithScores returns the same names plus per-tool provenance — invaluable for tuning and debugging "why did that tool rank there?":

const { tools, results } = await index.selectWithScores("create an invoice", {
  maxTools: 5,
  relatedTools: { createInvoice: ["queryCustomers"] },
});

for (const r of results) {
  console.log(r.name, r.score, {
    reranked: r.reranked,     // an LLM reranker produced this ordering
    viaRelated: r.viaRelated, // pulled in as a companion of a selected tool
    alwaysActive: r.alwaysActive, // pinned, not matched by search
  });
}

SelectOptions

| Option | Default | Description |
| --- | --- | --- |
| maxTools | 5 | Maximum tools to return. |
| alwaysActive | [] | Tool names always included. |
| threshold | none | Drop results scoring below this. |
| adaptive | true | Return fewer than maxTools when there's a clear score gap (the "elbow"). Automatically skipped when a reranker is active, since the reranker already decides the top-N. |
| relatedTools | none | Per-call override of the index-level related-tools map. |

PrepareStepOptions extends SelectOptions with recentToolBoost and recentToolWindow.
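As an illustration of the adaptive "elbow" cutoff, the sketch below trims the result list at the largest relative score drop, provided that drop is big enough. The `minGapRatio` threshold of 0.5 and the `elbowCut` name are assumptions made for this example; shortlist's actual gap heuristic is not specified here.

```typescript
interface Scored {
  name: string;
  score: number;
}

// Keep at most maxTools results, cutting earlier at the largest relative score drop
// if that drop is at least minGapRatio of the preceding score.
function elbowCut(results: Scored[], maxTools: number, minGapRatio = 0.5): Scored[] {
  const top = [...results].sort((a, b) => b.score - a.score).slice(0, maxTools);
  let cut = top.length;
  let biggestGap = 0;
  for (let i = 1; i < top.length; i++) {
    const prev = top[i - 1];
    const cur = top[i];
    if (!prev || !cur || prev.score <= 0) continue;
    const gap = (prev.score - cur.score) / prev.score;
    if (gap >= minGapRatio && gap > biggestGap) {
      biggestGap = gap;
      cut = i;
    }
  }
  return top.slice(0, cut);
}

// Two strong matches followed by a steep drop: only the first two survive.
const clearGap = elbowCut(
  [
    { name: "createInvoice", score: 0.9 },
    { name: "queryCustomers", score: 0.85 },
    { name: "sendEmail", score: 0.2 },
    { name: "listOrders", score: 0.18 },
    { name: "getWeather", score: 0.1 },
  ],
  5,
);

// Evenly spaced scores have no elbow, so all of them are kept.
const noGap = elbowCut(
  [
    { name: "a", score: 0.9 },
    { name: "b", score: 0.8 },
    { name: "c", score: 0.7 },
  ],
  5,
);
```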

Related tools

Some tools travel together — selecting createInvoice is useless without queryCustomers. Declare those links and shortlist pulls companions in whenever the key tool is selected (expansion is one-directional and can exceed maxTools):

const index = createToolIndex(tools, {
  relatedTools: {
    createInvoice: ["queryCustomers", "sendEmail"],
  },
});
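One-directional expansion can be sketched in a few lines. `expandRelated` is a hypothetical helper, and whether shortlist follows companions of companions is not stated; this sketch assumes it does not.

```typescript
// Append companions of each selected tool. The map is read key -> companions only,
// so selecting a companion never pulls in its key tool.
function expandRelated(selected: string[], related: Record<string, string[]>): string[] {
  const out = [...selected];
  for (const name of selected) {
    for (const companion of related[name] ?? []) {
      if (!out.includes(companion)) out.push(companion);
    }
  }
  return out;
}

const relatedDemo: Record<string, string[]> = {
  createInvoice: ["queryCustomers", "sendEmail"],
};

const fromKeyTool = expandRelated(["createInvoice"], relatedDemo);
const fromCompanion = expandRelated(["queryCustomers"], relatedDemo);
```

Selecting createInvoice brings in both companions, so the result can be longer than maxTools; selecting queryCustomers alone brings in nothing extra.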

LLM reranking & enrichment (optional)

For maximum accuracy on messy, slangy queries, add a cheap model:

const index = createToolIndex(tools, {
  embeddingModel: openai.embeddingModel("text-embedding-3-small"),
  rerankerModel: openai("gpt-4o-mini"),
  enrichDescriptions: true, // expand descriptions with synonyms at warmUp (cached)
});
await index.warmUp();

The reranker re-scores the top candidates with reasoning a pure vector match can't do (when there are ≤50 tools it sees all of them). enrichDescriptions rewrites each tool's description once with synonyms and common phrasings. Both fail soft — set SHORTLIST_DEBUG=1 to log when they fall back.

Caching embeddings

Persist tool embeddings across restarts so you don't re-pay the embedding API:

import { createToolIndex, fileCache } from "shortlist";

const index = createToolIndex(tools, {
  embeddingModel,
  embeddingCache: fileCache(".shortlist-cache.json"),
});
await index.warmUp();

shortlist also keeps an in-memory query-embedding cache (semantic/combined strategies) so repeated queries in an agent loop skip the embedding call. It's on by default (50 entries, exact-match on the normalized query); tune with queryCacheSize or set it to 0 to disable.
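A bounded exact-match cache of this kind might look like the sketch below. The normalization (trim, lowercase, collapse whitespace) and the least-recently-used eviction are assumptions for illustration, and `QueryEmbeddingCache` is a hypothetical class, not an export of shortlist.

```typescript
class QueryEmbeddingCache {
  private entries = new Map<string, number[]>();
  private maxSize: number;

  constructor(maxSize = 50) {
    this.maxSize = maxSize;
  }

  // Assumed normalization: trim, lowercase, collapse runs of whitespace.
  private static normalize(query: string): string {
    return query.trim().toLowerCase().replace(/\s+/g, " ");
  }

  get(query: string): number[] | undefined {
    const key = QueryEmbeddingCache.normalize(query);
    const hit = this.entries.get(key);
    if (hit !== undefined) {
      // Re-insert to mark the entry as most recently used.
      this.entries.delete(key);
      this.entries.set(key, hit);
    }
    return hit;
  }

  set(query: string, embedding: number[]): void {
    if (this.maxSize <= 0) return; // a size of 0 disables caching
    const key = QueryEmbeddingCache.normalize(query);
    this.entries.delete(key);
    this.entries.set(key, embedding);
    if (this.entries.size > this.maxSize) {
      // Map preserves insertion order, so the first key is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}

const cacheDemo = new QueryEmbeddingCache(2);
cacheDemo.set("Ship it", [0.1, 0.2]);
cacheDemo.set("create a ticket", [0.3, 0.4]);
const normalizedHit = cacheDemo.get("  ship   IT ");
cacheDemo.set("refund a charge", [0.5, 0.6]); // evicts "create a ticket"
```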

Letting the model discover tools

Expose a meta-tool the model can call when it needs a capability that isn't in its current selection:

const tools = { ...alwaysOnTools, search_tools: index.searchTool() };

Evaluating accuracy

import { evalToolIndex } from "shortlist/eval";

const report = await evalToolIndex(index, [
  { query: "create a ticket", expected: "createJiraTicket" },
  { query: "ship it", expected: "deployToVercel", alternatives: ["deployToProd"] },
]);
console.log(report); // { top1, top3, top5, avgLatencyMs, misses }
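For readers who want to see what a top-k figure means, here is a minimal sketch of the metric. It assumes a case counts as a hit when the expected tool or any listed alternative appears in the first k results; `topKAccuracy` is a hypothetical helper and the rankings are made-up inputs, not output from shortlist.

```typescript
interface EvalCase {
  query: string;
  expected: string;
  alternatives?: string[];
}

// Fraction of cases whose expected tool (or an accepted alternative) is in the top k.
// rankings[i] is the ranked tool-name list returned for cases[i].
function topKAccuracy(cases: EvalCase[], rankings: string[][], k: number): number {
  if (cases.length === 0) return 0;
  let hits = 0;
  cases.forEach((c, i) => {
    const accepted = [c.expected, ...(c.alternatives ?? [])];
    const topK = (rankings[i] ?? []).slice(0, k);
    if (accepted.some((name) => topK.includes(name))) hits++;
  });
  return hits / cases.length;
}

const evalCases: EvalCase[] = [
  { query: "create a ticket", expected: "createJiraTicket" },
  { query: "ship it", expected: "deployToVercel", alternatives: ["deployToProd"] },
];

// Made-up rankings for illustration only.
const evalRankings = [
  ["createJiraTicket", "sendEmail"],
  ["rollbackDeploy", "deployToProd"],
];

const top1 = topKAccuracy(evalCases, evalRankings, 1);
const top3 = topKAccuracy(evalCases, evalRankings, 3);
```

Here the second query misses at rank 1 but is rescued at rank 2 by its alternative, so top-1 is 0.5 and top-3 is 1.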

Benchmarks

There's a real, reproducible benchmark in bench/ — a ~215-tool corpus and a labeled query set split by query type, so you can see when each strategy wins. Run npm run bench (keyword only, no key) or npm run bench:write (adds the embedding modes when OPENAI_API_KEY is set, and regenerates bench/RESULTS.md).

Top-5 accuracy by query type (215 tools, 99 queries, text-embedding-3-small):

| query type | keyword | combined | semantic |
| --- | --- | --- | --- |
| lexical (reuses the tool's words) | 100% | 100% | 100% |
| exact-name (DynamoDB, Lambda, …) | 100% | 100% | 100% |
| typo / misspelling | 80% | 100% | 100% |
| acronym (k8s, PR, FX) | 67% | 100% | 83% |
| paraphrase (same intent, new words) | 10% | 90% | 90% |
| verbose / noisy phrasing | 25% | 88% | 88% |
| multilingual (sv/fr/es/de) | 25% | 100% | 100% |

Keyword search is perfect when the query reuses the tool's words, and it is effectively free. Embeddings are what carry paraphrases, noisy phrasing, and other languages. combined (with adaptive fusion) is the best all-rounder: it matches semantic recall and keeps keyword's exact-match anchoring where pure semantic slips (acronyms: 100% vs 83%, top-5).

API reference

function createToolIndex<T extends ToolSet>(tools: T, options?: ToolIndexOptions): ToolIndex<T>;

interface ToolIndex<T> {
  toolNames: (keyof T & string)[];
  warmUp(): Promise<void>;
  select(query: string, options?: SelectOptions): Promise<string[]>;
  selectWithScores(query: string, options?: SelectOptions): Promise<ScoredSelection>;
  prepareStep(options?: PrepareStepOptions): PrepareStepFunction<T>;
  middleware(options?: SelectOptions): LanguageModelMiddleware;
  searchTool(): Tool;
}

interface ToolIndexOptions {
  strategy?: "hybrid" | "semantic" | "combined";
  embeddingModel?: EmbeddingModel;
  embeddingCache?: EmbeddingCacheOptions;
  rerankerModel?: LanguageModel;
  enrichDescriptions?: boolean;
  relatedTools?: Record<string, string[]>;
  queryCacheSize?: number;
  fusionWeights?: { keyword?: number; semantic?: number }; // combined base split, default 0.3/0.7
  adaptiveFusion?: boolean; // scale keyword weight by per-query coverage, default true
}

License

MIT © Andreas Enemyr