shortlist
v0.1.2
Dynamic tool selection for AI SDK agents — index your tools and expose only the most relevant ones to the model on every step.
shortlist
Dynamic tool selection for AI SDK agents.
When an agent has 50, 100, or 200+ tools, handing the model all of them on every
step is a problem twice over: it burns context tokens, and it lowers tool-calling
accuracy — the right tool gets lost in the noise. shortlist indexes your tools once
and, on each step, exposes only the handful that actually match what the user is
trying to do.
import { createToolIndex } from "shortlist";
import { generateText } from "ai";
const index = createToolIndex(allTools); // 200 tools, no API key needed
const result = await generateText({
model,
tools: allTools,
prepareStep: index.prepareStep({ maxTools: 5 }), // model sees the best 5 per step
prompt: "Refund the last charge for customer Acme.",
});

- Zero-config default: keyword search (BM25 + TF-IDF) that needs no API key and runs in well under a millisecond.
- Better with embeddings: add an embedding model for semantic search; shortlist fuses it with keyword search automatically.
- Drops into the AI SDK: prepareStep, wrapLanguageModel middleware, or a select() you call yourself.
- Built for agent loops: adaptive cutoffs, miss-driven escalation, recently-used tool retention, related-tool expansion, and an in-memory query-embedding cache.
Status: pre-release (0.0.1). The API below is implemented and tested; expect additive changes before 1.0.
Install
npm install shortlist
# peers (you almost certainly already have these):
npm install ai zod

Requires Node 18+, ai >= 4.0, and zod >= 3.25.
How it works
createToolIndex(tools, options?) builds a search index over each tool's name,
description, and parameter names. You then pick an integration point:
| You want… | Use | Returns |
| --- | --- | --- |
| The AI SDK to filter tools per step | index.prepareStep(opts) | a prepareStep function |
| Transparent filtering at the provider level | index.middleware(opts) | a LanguageModelMiddleware |
| To select tools yourself | index.select(query, opts) | string[] of tool names |
| To debug why tools ranked as they did | index.selectWithScores(query, opts) | { tools, results } |
| The model to discover tools on demand | index.searchTool() | a callable meta-tool |
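To make the keyword side of the index concrete, here is a minimal BM25 sketch over each tool's name, description, and parameter names. It is not shortlist's internal code: the ToolDoc shape, tokenize, bm25Rank, and the k1/b constants are assumptions chosen for illustration, and it leaves out the TF-IDF part mentioned above.

```typescript
// Hypothetical shape for illustration; not one of shortlist's own types.
interface ToolDoc {
  toolName: string;
  description: string;
  paramNames: string[];
}

// Split camelCase and punctuation into lowercase word tokens.
function tokenize(text: string): string[] {
  return text
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
}

// Rank tools against a query with plain BM25 (assumed k1 = 1.2, b = 0.75).
function bm25Rank(docs: ToolDoc[], query: string, k1 = 1.2, b = 0.75): { toolName: string; score: number }[] {
  const tokenized = docs.map((d) => tokenize([d.toolName, d.description, ...d.paramNames].join(" ")));
  const avgLen = tokenized.reduce((sum, t) => sum + t.length, 0) / Math.max(tokenized.length, 1);

  // Document frequency: in how many tools does each term appear?
  const df = new Map<string, number>();
  for (const tokens of tokenized) {
    new Set(tokens).forEach((term) => df.set(term, (df.get(term) ?? 0) + 1));
  }

  const queryTerms = tokenize(query);
  return docs
    .map((d, i) => {
      const tokens = tokenized[i]!;
      let score = 0;
      for (const term of queryTerms) {
        const tf = tokens.filter((t) => t === term).length;
        if (tf === 0) continue; // term absent from this tool
        const n = df.get(term) ?? 0;
        const idf = Math.log(1 + (docs.length - n + 0.5) / (n + 0.5));
        score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * tokens.length) / avgLen));
      }
      return { toolName: d.toolName, score };
    })
    .filter((r) => r.score > 0)
    .sort((x, y) => y.score - x.score);
}

// Made-up tools, only to exercise the sketch.
const exampleTools: ToolDoc[] = [
  { toolName: "refundCharge", description: "Refund a charge for a customer", paramNames: ["chargeId", "amount"] },
  { toolName: "createInvoice", description: "Create an invoice for a customer", paramNames: ["customerId", "lineItems"] },
  { toolName: "deployToVercel", description: "Deploy the current project to production", paramNames: ["projectId"] },
];

const keywordRanking = bm25Rank(exampleTools, "Refund the last charge for customer Acme.");
console.log(keywordRanking.map((r) => r.toolName));
```

Because the query reuses the words "refund" and "charge", refundCharge ranks first here; a paraphrase such as "give the money back" would match nothing, which is the gap the embedding strategies are meant to cover.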
Strategies
// hybrid (default when no embeddingModel) — free, keyword-based, instant
createToolIndex(tools);
// combined (default when an embeddingModel is given) — keyword + semantic, fused
createToolIndex(tools, {
embeddingModel: openai.embeddingModel("text-embedding-3-small"),
});
// semantic — embeddings only
createToolIndex(tools, { strategy: "semantic", embeddingModel });

combined runs keyword and semantic search in parallel and fuses the normalized
scores. The base weighting is ≈30% keyword / 70% semantic, but by default the keyword
weight is scaled per query by how much of it the keyword index actually matches
(adaptiveFusion): a paraphrase or a query in another language — where keyword would
only be matching noise — fuses as semantic-only, while a query that reuses the tools'
words keeps keyword's exact-match anchoring. Tune the base split with fusionWeights,
or set adaptiveFusion: false for a fixed ratio. If the embedding call fails, combined
falls back to keyword-only so selection never hard-fails — at warm-up or per query.
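The adaptive weighting described above can be sketched in a few lines. This is an illustration under stated assumptions, not the library's implementation: it assumes "coverage" means the fraction of query terms the keyword index knows, that scores are normalized by dividing by the best score, and that the two weights are renormalized to sum to 1. The helper names (normalizeScores, keywordCoverage, fuseScores) are hypothetical.

```typescript
type ScoreMap = Record<string, number>;

interface FusionOptions {
  fusionWeights?: { keyword?: number; semantic?: number };
  adaptiveFusion?: boolean;
}

// Scale scores so the best one is 1 (assumed normalization for this sketch).
function normalizeScores(scores: ScoreMap): ScoreMap {
  const keys = Object.keys(scores);
  const max = keys.reduce((m, k) => Math.max(m, scores[k]!), 0);
  const out: ScoreMap = {};
  for (const k of keys) out[k] = max > 0 ? scores[k]! / max : 0;
  return out;
}

// Fraction of query terms the keyword index knows about (assumed definition of coverage).
function keywordCoverage(queryTerms: string[], vocabulary: string[]): number {
  if (queryTerms.length === 0) return 0;
  const matched = queryTerms.filter((t) => vocabulary.indexOf(t) !== -1).length;
  return matched / queryTerms.length;
}

// Weighted sum of normalized scores; the keyword weight shrinks with coverage.
function fuseScores(keyword: ScoreMap, semantic: ScoreMap, coverage: number, options: FusionOptions = {}): ScoreMap {
  const baseKeyword = options.fusionWeights?.keyword ?? 0.3;
  const baseSemantic = options.fusionWeights?.semantic ?? 0.7;
  const adaptive = options.adaptiveFusion ?? true;
  const keywordWeight = adaptive ? baseKeyword * coverage : baseKeyword;
  const total = keywordWeight + baseSemantic;

  const kw = normalizeScores(keyword);
  const sem = normalizeScores(semantic);
  const fused: ScoreMap = {};
  for (const toolName of Object.keys({ ...kw, ...sem })) {
    const weighted = keywordWeight * (kw[toolName] ?? 0) + baseSemantic * (sem[toolName] ?? 0);
    fused[toolName] = total > 0 ? weighted / total : 0;
  }
  return fused;
}

// Made-up raw scores for two retrieval passes.
const keywordScores: ScoreMap = { createInvoice: 4.2, queryCustomers: 1.1 };
const semanticScores: ScoreMap = { createInvoice: 0.4, sendEmail: 0.8 };

const paraphraseFused = fuseScores(keywordScores, semanticScores, 0); // no query term matched: semantic-only
const lexicalFused = fuseScores(keywordScores, semanticScores, 1); // every term matched: full 0.3 / 0.7 split
console.log(paraphraseFused, lexicalFused);
```

With coverage 0 the keyword scores drop out entirely, so a paraphrase is ranked by embeddings alone; with coverage 1 the keyword match lifts createInvoice closer to sendEmail.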
When you use embeddings, call await index.warmUp() once at startup to pre-compute
tool embeddings (and persist them with an embedding cache) so
the first select() isn't slow.
prepareStep — the main path
import { generateText, stepCountIs } from "ai";
const result = await generateText({
model,
tools: allTools,
prepareStep: index.prepareStep({
maxTools: 5,
alwaysActive: ["getCurrentUser"], // always exposed, regardless of the query
recentToolBoost: 3, // keep up to 3 recently-used tools available
}),
stopWhen: stepCountIs(8),
prompt,
});

prepareStep re-selects tools on every step from the live conversation, and adds two
behaviors that matter inside real agent loops:
- Miss-driven escalation. If the model produced no tool calls last step, the next step shows the next page of ranked tools instead of repeating the same set. After two consecutive misses, all tools are exposed so the agent is never stuck.
- Recently-used retention (recentToolBoost, opt-in). A fresh per-step selection can otherwise strip away a tool the agent just used successfully. Set recentToolBoost to keep the most-recently-used tools active across steps. The set is derived entirely from the steps the SDK passes in, so there's no hidden session state. recentToolWindow (default 3) controls how far back it looks.
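The two behaviors above reduce to a small amount of stateless logic. The sketch below is one way to express them, assuming a simplified StepRecord that holds only the tool names a step called; the helper names (countTrailingMisses, pageForStep, recentlyUsed, activeToolsForStep) are hypothetical and the real prepareStep may differ in detail.

```typescript
// Hypothetical, simplified view of a finished step: just the tool names it called.
interface StepRecord {
  toolCalls: string[];
}

// How many steps, counting back from the latest, made no tool call.
function countTrailingMisses(steps: StepRecord[]): number {
  let misses = 0;
  for (let i = steps.length - 1; i >= 0 && steps[i]!.toolCalls.length === 0; i--) misses++;
  return misses;
}

// 0 misses: first page. 1 miss: next page. 2+ misses: every tool.
function pageForStep(ranked: string[], maxTools: number, misses: number): string[] {
  if (misses >= 2) return ranked.slice();
  const start = misses * maxTools;
  const page = ranked.slice(start, start + maxTools);
  return page.length > 0 ? page : ranked.slice(); // ran out of pages: expose everything
}

// Up to `boost` distinct tools called within the last `windowSize` steps, newest first.
function recentlyUsed(steps: StepRecord[], boost: number, windowSize = 3): string[] {
  if (boost <= 0 || windowSize <= 0) return [];
  const seen: string[] = [];
  for (const step of steps.slice(-windowSize).reverse()) {
    for (const tool of step.toolCalls) {
      if (seen.indexOf(tool) === -1) seen.push(tool);
    }
  }
  return seen.slice(0, boost);
}

// Combine the escalation page with the retained tools, without duplicates.
function activeToolsForStep(ranked: string[], steps: StepRecord[], maxTools: number, recentToolBoost = 0): string[] {
  const active = pageForStep(ranked, maxTools, countTrailingMisses(steps));
  for (const tool of recentlyUsed(steps, recentToolBoost)) {
    if (active.indexOf(tool) === -1) active.push(tool);
  }
  return active;
}

// Made-up ranking and history: one successful step, then a step with no tool call.
const rankedTools = ["refundCharge", "getCharge", "listCharges", "sendEmail", "queryCustomers"];
const stepHistory: StepRecord[] = [{ toolCalls: ["queryCustomers"] }, { toolCalls: [] }];

console.log(activeToolsForStep(rankedTools, stepHistory, 2, 1));
```

Everything is computed from the step list passed in, which mirrors the "no hidden session state" property: replaying the same steps yields the same active tools.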
select and selectWithScores
Call the selector directly when you manage activeTools yourself:
const names = await index.select("ship it to prod", { maxTools: 5 });
// → ["deployToVercel", ...]

selectWithScores returns the same names plus per-tool provenance, which is invaluable for
tuning and debugging "why did that tool rank there?":
const { tools, results } = await index.selectWithScores("create an invoice", {
maxTools: 5,
relatedTools: { createInvoice: ["queryCustomers"] },
});
for (const r of results) {
console.log(r.name, r.score, {
reranked: r.reranked, // an LLM reranker produced this ordering
viaRelated: r.viaRelated, // pulled in as a companion of a selected tool
alwaysActive: r.alwaysActive, // pinned, not matched by search
});
}

SelectOptions
| Option | Default | Description |
| --- | --- | --- |
| maxTools | 5 | Maximum tools to return. |
| alwaysActive | [] | Tool names always included. |
| threshold | — | Drop results scoring below this. |
| adaptive | true | Return fewer than maxTools when there's a clear score gap (the "elbow"). Automatically skipped when a reranker is active, since the reranker already decides the top-N. |
| relatedTools | — | Per-call override of the index-level related-tools map. |
PrepareStepOptions extends SelectOptions with recentToolBoost and recentToolWindow.
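The interaction between maxTools, threshold, and adaptive is easiest to see in code. The following is a sketch of one plausible "elbow" heuristic, not shortlist's actual rule: it cuts at the largest drop between neighbouring scores when that drop is at least half the top score. The gapRatio parameter and the ScoredTool and applyCutoff names are assumptions made for this example.

```typescript
// Hypothetical result shape for this sketch.
interface ScoredTool {
  toolName: string;
  score: number;
}

interface CutoffOptions {
  maxTools?: number;
  threshold?: number;
  adaptive?: boolean;
  gapRatio?: number; // assumption: how large the drop must be, relative to the top score
}

function applyCutoff(ranked: ScoredTool[], opts: CutoffOptions = {}): ScoredTool[] {
  const maxTools = opts.maxTools ?? 5;
  const adaptive = opts.adaptive ?? true;
  const gapRatio = opts.gapRatio ?? 0.5;
  const threshold = opts.threshold;

  // Sort best-first, drop anything under the threshold, then cap at maxTools.
  let kept = ranked.slice().sort((x, y) => y.score - x.score);
  if (threshold !== undefined) kept = kept.filter((r) => r.score >= threshold);
  kept = kept.slice(0, maxTools);
  if (!adaptive || kept.length < 2) return kept;

  // Find the largest drop between neighbours.
  let cutAt = kept.length;
  let largestDrop = 0;
  for (let i = 1; i < kept.length; i++) {
    const drop = kept[i - 1]!.score - kept[i]!.score;
    if (drop > largestDrop) {
      largestDrop = drop;
      cutAt = i;
    }
  }

  // Cut at the elbow only when the gap is clear relative to the best score.
  const best = kept[0]!.score;
  return best > 0 && largestDrop / best >= gapRatio ? kept.slice(0, cutAt) : kept;
}

// Made-up scores with an obvious gap after the second tool.
const scoredExample: ScoredTool[] = [
  { toolName: "createInvoice", score: 0.9 },
  { toolName: "queryCustomers", score: 0.85 },
  { toolName: "sendEmail", score: 0.3 },
  { toolName: "listInvoices", score: 0.28 },
  { toolName: "deployToVercel", score: 0.1 },
];

console.log(applyCutoff(scoredExample).map((r) => r.toolName));
console.log(applyCutoff(scoredExample, { adaptive: false }).map((r) => r.toolName));
```

With adaptive on, only the two clearly relevant tools survive; with adaptive off, the call returns the full maxTools, which is the behavior to expect when a reranker decides the top-N instead.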
Related tools
Some tools travel together — selecting createInvoice is useless without
queryCustomers. Declare those links and shortlist pulls companions in whenever the
key tool is selected (expansion is one-directional and can exceed maxTools):
const index = createToolIndex(tools, {
relatedTools: {
createInvoice: ["queryCustomers", "sendEmail"],
},
});

LLM reranking & enrichment (optional)
For maximum accuracy on messy, slangy queries, add a cheap model:
import { openai } from "@ai-sdk/openai";
const index = createToolIndex(tools, {
embeddingModel: openai.embeddingModel("text-embedding-3-small"),
rerankerModel: openai("gpt-4o-mini"),
enrichDescriptions: true, // expand descriptions with synonyms at warmUp (cached)
});
await index.warmUp();

The reranker re-scores the top candidates with reasoning a pure vector match can't do
(when there are ≤50 tools it sees all of them). enrichDescriptions rewrites each
tool's description once with synonyms and common phrasings. Both fail soft — set
SHORTLIST_DEBUG=1 to log when they fall back.
Caching embeddings
Persist tool embeddings across restarts so you don't re-pay the embedding API:
import { createToolIndex, fileCache } from "shortlist";
const index = createToolIndex(tools, {
embeddingModel,
embeddingCache: fileCache(".shortlist-cache.json"),
});
await index.warmUp();

shortlist also keeps an in-memory query-embedding cache (semantic/combined
strategies) so repeated queries in an agent loop skip the embedding call. It's on by
default (50 entries, exact-match on the normalized query); tune with queryCacheSize
or set it to 0 to disable.
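For readers curious what such a query cache involves, here is a minimal sketch. It assumes least-recently-used eviction and a normalization of trim, lowercase, and collapsed whitespace; the README only states the size and that matching is exact on the normalized query, so both of those details, along with the QueryEmbeddingCache and embedWithCache names, are illustrative guesses.

```typescript
// A small LRU cache keyed by the normalized query text.
class QueryEmbeddingCache {
  private readonly entries = new Map<string, number[]>();
  private readonly maxEntries: number;

  constructor(maxEntries = 50) {
    this.maxEntries = maxEntries;
  }

  // Assumed normalization: trim, lowercase, collapse runs of whitespace.
  static normalizeQuery(query: string): string {
    return query.trim().toLowerCase().replace(/\s+/g, " ");
  }

  get(query: string): number[] | undefined {
    const key = QueryEmbeddingCache.normalizeQuery(query);
    const hit = this.entries.get(key);
    if (hit === undefined) return undefined;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, hit);
    return hit;
  }

  set(query: string, embedding: number[]): void {
    if (this.maxEntries <= 0) return; // size 0 disables caching
    const key = QueryEmbeddingCache.normalizeQuery(query);
    this.entries.delete(key);
    this.entries.set(key, embedding);
    if (this.entries.size > this.maxEntries) {
      // A Map iterates in insertion order, so the first key is the oldest.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}

// Wrap any async embed function so repeated queries skip the call.
async function embedWithCache(
  cache: QueryEmbeddingCache,
  query: string,
  embed: (q: string) => Promise<number[]>,
): Promise<number[]> {
  const cached = cache.get(query);
  if (cached !== undefined) return cached;
  const embedding = await embed(query);
  cache.set(query, embedding);
  return embedding;
}
```

In an agent loop the same user request is typically re-embedded on every step, so even a small cache like this removes most embedding calls after the first step.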
Letting the model discover tools
Expose a meta-tool the model can call when it needs a capability that isn't in its current selection:
const tools = { ...alwaysOnTools, search_tools: index.searchTool() };

Evaluating accuracy
import { evalToolIndex } from "shortlist/eval";
const report = await evalToolIndex(index, [
{ query: "create a ticket", expected: "createJiraTicket" },
{ query: "ship it", expected: "deployToVercel", alternatives: ["deployToProd"] },
]);
console.log(report); // { top1, top3, top5, avgLatencyMs, misses }

Benchmarks
There's a real, reproducible benchmark in bench/ — a ~215-tool corpus and a
labeled query set split by query type, so you can see when each strategy wins. Run
npm run bench (keyword only, no key) or npm run bench:write (adds the embedding modes
when OPENAI_API_KEY is set, and regenerates bench/RESULTS.md).
Top-5 accuracy by query type (215 tools, 99 queries, text-embedding-3-small):
| query type | keyword | combined | semantic |
| --- | --- | --- | --- |
| lexical (reuses the tool's words) | 100% | 100% | 100% |
| exact-name (DynamoDB, Lambda, …) | 100% | 100% | 100% |
| typo / misspelling | 80% | 100% | 100% |
| acronym (k8s, PR, FX) | 67% | 100% | 83% |
| paraphrase (same intent, new words) | 10% | 90% | 90% |
| verbose / noisy phrasing | 25% | 88% | 88% |
| multilingual (sv/fr/es/de) | 25% | 100% | 100% |
Keyword search is perfect when the query reuses the tool's words, and it is effectively free;
embeddings are what carry paraphrases, noisy phrasing, and other languages. combined (with
adaptive fusion) is the best all-rounder: it matches semantic recall and keeps keyword's
exact-match anchoring where pure semantic slips (acronyms: 100% vs 83%, top-5).
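Top-k figures like the ones in this table come down to a simple count. The sketch below shows how such a number could be computed for a labeled query set; it is not the code in bench/ or shortlist/eval. It assumes a synchronous rank function for brevity (the real select is async) and that a hit on any listed alternative counts as correct. EvalCase and topKAccuracy are hypothetical names.

```typescript
// Hypothetical labeled case, modeled on the evalToolIndex input shown earlier.
interface EvalCase {
  query: string;
  expected: string;
  alternatives?: string[];
}

// Share of cases whose expected tool (or an accepted alternative) appears in the top k.
function topKAccuracy(cases: EvalCase[], rank: (query: string) => string[], k: number): number {
  if (cases.length === 0) return 0;
  let hits = 0;
  for (const c of cases) {
    const accepted = [c.expected, ...(c.alternatives ?? [])];
    const topK = rank(c.query).slice(0, k);
    if (topK.some((toolName) => accepted.indexOf(toolName) !== -1)) hits++;
  }
  return hits / cases.length;
}

// Made-up rankings standing in for a real selector.
const fakeRankings: Record<string, string[]> = {
  "create a ticket": ["createJiraTicket", "createInvoice"],
  "ship it": ["rollbackDeploy", "deployToProd", "deployToVercel"],
};
const fakeRank = (query: string): string[] => fakeRankings[query] ?? [];

const evalCases: EvalCase[] = [
  { query: "create a ticket", expected: "createJiraTicket" },
  { query: "ship it", expected: "deployToVercel", alternatives: ["deployToProd"] },
];

console.log({
  top1: topKAccuracy(evalCases, fakeRank, 1),
  top3: topKAccuracy(evalCases, fakeRank, 3),
});
```

Splitting the cases by query type before calling a function like this is what produces a per-row breakdown, which makes it visible where keyword search alone is enough and where embeddings are needed.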
API reference
function createToolIndex<T extends ToolSet>(tools: T, options?: ToolIndexOptions): ToolIndex<T>;
interface ToolIndex<T> {
toolNames: (keyof T & string)[];
warmUp(): Promise<void>;
select(query: string, options?: SelectOptions): Promise<string[]>;
selectWithScores(query: string, options?: SelectOptions): Promise<ScoredSelection>;
prepareStep(options?: PrepareStepOptions): PrepareStepFunction<T>;
middleware(options?: SelectOptions): LanguageModelMiddleware;
searchTool(): Tool;
}
interface ToolIndexOptions {
strategy?: "hybrid" | "semantic" | "combined";
embeddingModel?: EmbeddingModel;
embeddingCache?: EmbeddingCacheOptions;
rerankerModel?: LanguageModel;
enrichDescriptions?: boolean;
relatedTools?: Record<string, string[]>;
queryCacheSize?: number;
fusionWeights?: { keyword?: number; semantic?: number }; // combined base split, default 0.3/0.7
adaptiveFusion?: boolean; // scale keyword weight by per-query coverage, default true
}

License
MIT © Andreas Enemyr
