@ongravy/agent-kit
v0.1.0
Production-grade primitives for building agentic AI systems on the Anthropic SDK: hybrid RAG with RRF, eval metrics (recall@k, NDCG, tool-call F1), cost-aware model router, and prompt-cache planning. Extracted from OnGravy.
Production-grade primitives for building agentic AI systems on the Anthropic SDK. Extracted from OnGravy — an AI-native accounting platform shipping June 2026.
What this is
A small, production-tested library of the boring-but-critical pieces every agentic LLM system needs:
- 🔀 Reciprocal Rank Fusion — for hybrid retrieval (combine vector + BM25 + any other ranking)
- 📊 Eval metrics — recall@k, NDCG, MRR, tool-call F1; the metrics you wish you'd built before that 1am production fire
- 💰 Cost-aware model router — Haiku → Sonnet → Opus by complexity, with budget downgrade
- 💾 Prompt-cache planner — decides which blocks should carry cache_control for Anthropic's prompt caching (cache reads are billed at 10% of the normal input-token price)
- 🔌 MCP wire-format converter — turns your Zod-shaped tool registry into Model Context Protocol tool specs, ready for Claude Desktop / Cursor / Zed
Each module is pure (no I/O, no network) and has zero runtime dependencies (zod is the only peer dependency). The bundle adds ~12 KB gzipped to your build.
Why these specific helpers
These are the things you think are simple in your demo project, then realise are subtle when you ship:
| Helper | What goes wrong without it |
|---|---|
| reciprocalRankFusion | Vector retrieval misses queries with rare proper nouns; BM25 misses queries with paraphrases. Single-stage → ~20% recall loss vs hybrid. |
| routeModel | Naive setups always use Sonnet → 5× higher LLM bill. Naive routing puts everything on Haiku → reasoning fails on hard queries. |
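| planPromptCache | Anthropic's prompt cache only engages once the cached prefix reaches a model-specific minimum length (1,024 tokens on many Sonnet and Opus models). Shorter prefixes marked with cache_control are processed without caching, so you pay full price and save nothing. |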
| evalRetrievalSet | "Does this prompt change improve quality?" — without recall@k metrics, your only answer is "vibes." |
| toMCPTool + sanitiseMcpName | MCP names can't contain dots/colons/spaces. Naive registries break on tax.compute_gst. |
Installation
```bash
npm install @ongravy/agent-kit
# or
pnpm add @ongravy/agent-kit
# or
bun add @ongravy/agent-kit
```
Peer dependency:
```bash
npm install zod # ^4.0.0
```
Quick examples
Hybrid retrieval (vector + BM25)
```ts
import { reciprocalRankFusion } from '@ongravy/agent-kit/rrf';

// Fetch ranked results from each source however you like.
// pgvector / pgTsvector stand in for your own search clients;
// queryEmbedding and queryText come from your query pipeline.
const vectorResults = await pgvector.search(queryEmbedding, { k: 30 });
const bm25Results = await pgTsvector.search(queryText, { k: 30 });

// Fuse them
const fused = reciprocalRankFusion({
  rankings: [
    { label: 'vec', items: vectorResults.map(r => ({ id: r.id, score: r.cosine })) },
    { label: 'bm25', items: bm25Results.map(r => ({ id: r.id, score: r.tsRank })) },
  ],
  // Optional: weight one source higher when its precision is better
  weights: { vec: 1.0, bm25: 0.7 },
});

// Top 5 fused results
const topFive = fused.slice(0, 5);
```
Cost-aware model routing
```ts
import Anthropic from '@anthropic-ai/sdk';
import { routeModel } from '@ongravy/agent-kit/router';

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const decision = routeModel({
  inputTokens: 4_000,
  toolCount: 5,
  expectedTurns: 3,
  isHighStakes: true,             // money/compliance answer
  remainingBudgetPaise: 1_50_000, // ₹1,500 left this month
});
// decision.model: 'claude-sonnet-4-6'
// decision.reason: 'Multi-turn / multi-tool / high-stakes'
// decision.budgetDowngraded: false
// decision.estimatedInputCostPaise: 1200

const response = await anthropic.messages.create({
  model: decision.model,
  // …
});
```
Prompt caching
```ts
import { planPromptCache } from '@ongravy/agent-kit/router';

const plan = planPromptCache({
  systemTokens: 2_500,
  toolsTokens: 4_000,
  expectStableForFiveMin: true,
});
// plan.cacheSystem: true
// plan.cacheTools: true
// plan.estimatedSavingsPctOnNextCall: 81

// Then on the API call (anthropic is an Anthropic SDK client;
// SYSTEM_PROMPT and tools are your own prompt text and tool definitions):
const response = await anthropic.messages.create({
  model: 'claude-sonnet-4-6',
  system: [
    { type: 'text', text: SYSTEM_PROMPT,
      ...(plan.cacheSystem && { cache_control: { type: 'ephemeral' } }) },
  ],
  // A breakpoint on the last tool caches the entire tools array
  tools: tools.map((t, i) => ({
    ...t,
    ...(plan.cacheTools && i === tools.length - 1 && { cache_control: { type: 'ephemeral' } }),
  })),
  // …
});
```
Retrieval evaluation
```ts
import { evalRetrievalSet } from '@ongravy/agent-kit/eval';

const summary = evalRetrievalSet([
  {
    queryId: 'q1',
    query: 'GST rate on legal services',
    relevantIds: [{ id: 'doc-A', grade: 3 }, { id: 'doc-B', grade: 1 }],
    retrievedIds: ['doc-A', 'doc-X', 'doc-B', 'doc-Y'],
  },
  // … 30 more cases
]);

console.log(`recall@5 = ${summary.meanRecallAtK[5].toFixed(3)}`);
console.log(`NDCG@5 = ${summary.meanNdcgAtK[5].toFixed(3)}`);
console.log(`MRR = ${summary.meanMrr.toFixed(3)}`);
```
Tool-call accuracy
```ts
import { evalToolCallSet } from '@ongravy/agent-kit/eval';

const summary = evalToolCallSet([
  {
    caseId: 'tc1',
    expectedTool: 'create_invoice',
    expectedArgs: { amount: 1000, party: 'Acme' },
    actualTool: 'create_invoice',
    actualArgs: { amount: 1000, party: 'Acme', date: '2026-04-01' },
  },
  // …
]);

console.log(`tool accuracy: ${(summary.toolAccuracy * 100).toFixed(1)}%`);
console.log(`fully correct: ${(summary.fullyCorrectAccuracy * 100).toFixed(1)}%`);
console.log(`mean arg-match score: ${(summary.meanArgMatchScore * 100).toFixed(1)}%`);
```
MCP tool format conversion
```ts
import { z } from 'zod';
import { buildMCPCatalog, type GenericToolDef } from '@ongravy/agent-kit/mcp';

// Your existing tool registry: any shape with name, description, Zod schema
const myTools: GenericToolDef[] = [
  {
    name: 'tax.compute_gst',
    description: 'Compute GST on a sale.',
    inputSchema: z.object({ amount: z.number(), rate: z.number() }),
    jurisdictions: ['IN'],
  },
  {
    name: 'sa_zatca.compile',
    description: 'Compile a ZATCA VAT return.',
    inputSchema: z.object({ taxPeriod: z.string() }),
    jurisdictions: ['SA'],
  },
];

// Filter by jurisdiction + convert to MCP wire format
const catalog = buildMCPCatalog(myTools, { jurisdiction: 'IN' });
// catalog.tools[0].name === 'tax_compute_gst' (sanitised: dots replaced)
// catalog.tools[0].inputSchema is JSON Schema
// catalog only contains the IN tool (SA filtered out)
```
Why this works
These helpers come from a real production system that:
- Handles real money (Indian SMB accounting, GST compliance, audit reports)
- Runs at multi-tenant scale across 4 jurisdictions (India, UAE, Saudi Arabia, Singapore)
- Has a measured 0.3% hallucination rate post-defences (vs ~12% bare model)
- Has a per-business cost cap of ₹500/month enforced via the model router
If you're building anything in the same ballpark — domain-specific Q&A, agentic workflow automation, multi-tenant LLM systems — these primitives have already paid for themselves once.
Read the longform writeup: Six-layer hallucination defence.
API reference
rrf module
| Export | Purpose |
|---|---|
| reciprocalRankFusion(input) | Fuse N ranked lists into one ranking |
| RankedItem, RrfInput, FusedItem | Type signatures |
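For intuition, here is a minimal standalone sketch of the rule RRF is usually defined by: each document scores the sum of weight / (k + rank) over every list it appears in, with 1-based ranks and the conventional k = 60. This is not the library's implementation; the function name rrfSketch, the k default and the way per-list weights are applied are assumptions.

```typescript
// Hypothetical sketch of weighted Reciprocal Rank Fusion, not the
// @ongravy/agent-kit implementation. Assumes k = 60 and 1-based ranks.
type Ranking = { label: string; ids: string[] }; // ids ordered best-first

function rrfSketch(
  rankings: Ranking[],
  weights: Record<string, number> = {},
  k = 60,
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const { label, ids } of rankings) {
    const w = weights[label] ?? 1.0; // lists without a weight count fully
    ids.forEach((id, index) => {
      const rank = index + 1; // 1-based rank within this list
      scores.set(id, (scores.get(id) ?? 0) + w / (k + rank));
    });
  }
  // Highest fused score first; ties broken by id so output is deterministic
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score || a.id.localeCompare(b.id),
  );
}

const fusedSketch = rrfSketch(
  [
    { label: 'vec', ids: ['a', 'b', 'c'] },
    { label: 'bm25', ids: ['c', 'a', 'd'] },
  ],
  { vec: 1.0, bm25: 0.7 },
);
console.log(fusedSketch.map((f) => f.id).join(', ')); // a, c, b, d
```

Because only ranks enter the formula, raw scores on different scales (cosine similarity vs ts_rank) never need normalising, which is the main reason RRF suits hybrid retrieval.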
eval module
| Export | Purpose |
|---|---|
| evalRetrievalCase(case, ks?) | Single-case recall@k / NDCG / MRR |
| evalRetrievalSet(cases, ks?) | Aggregate across cases |
| evalToolCallCase(case) | Per-case tool-correct + arg-match |
| evalToolCallSet(cases) | Aggregate tool-call summary |
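To make the metric names concrete, here is a sketch of the textbook definitions on the graded-relevance shape used in the retrieval example (id plus grade). It assumes exponential gain (2^grade - 1) for NDCG and, for tool calls, an F1 over exact key=value argument pairs; evalRetrievalCase and evalToolCallCase may define these differently, and every function name below is hypothetical.

```typescript
// Hypothetical reference formulas, not the library's code: its gain function,
// tie handling and argument matching may differ.
type Graded = { id: string; grade: number };

// recall@k: share of relevant docs that appear in the top k results
function recallAtK(relevant: Graded[], retrieved: string[], k: number): number {
  if (relevant.length === 0) return 0;
  const topK = new Set(retrieved.slice(0, k));
  return relevant.filter((r) => topK.has(r.id)).length / relevant.length;
}

// Reciprocal rank: 1 / (1-based position of first relevant doc); MRR averages it
function reciprocalRank(relevant: Graded[], retrieved: string[]): number {
  const ids = new Set(relevant.map((r) => r.id));
  const idx = retrieved.findIndex((id) => ids.has(id));
  return idx === -1 ? 0 : 1 / (idx + 1);
}

// NDCG@k with exponential gain (2^grade - 1) and a log2(position + 1) discount
function ndcgAtK(relevant: Graded[], retrieved: string[], k: number): number {
  const grades = new Map<string, number>(
    relevant.map((r) => [r.id, r.grade] as [string, number]),
  );
  const dcg = (gs: number[]): number =>
    gs.slice(0, k).reduce((sum, g, i) => sum + (2 ** g - 1) / Math.log2(i + 2), 0);
  const ideal = dcg(relevant.map((r) => r.grade).sort((a, b) => b - a));
  return ideal === 0 ? 0 : dcg(retrieved.map((id) => grades.get(id) ?? 0)) / ideal;
}

// Argument F1: expected vs actual args compared as exact key=value pairs
function argF1(expected: Record<string, unknown>, actual: Record<string, unknown>): number {
  const pairs = (o: Record<string, unknown>) =>
    new Set(Object.entries(o).map(([key, v]) => `${key}=${JSON.stringify(v)}`));
  const exp = pairs(expected);
  const act = pairs(actual);
  const hits = Array.from(exp).filter((p) => act.has(p)).length;
  if (hits === 0) return 0;
  const precision = hits / act.size;
  const recall = hits / exp.size;
  return (2 * precision * recall) / (precision + recall);
}

const rel: Graded[] = [{ id: 'doc-A', grade: 3 }, { id: 'doc-B', grade: 1 }];
const ret = ['doc-A', 'doc-X', 'doc-B', 'doc-Y'];
console.log(recallAtK(rel, ret, 5), reciprocalRank(rel, ret), ndcgAtK(rel, ret, 5).toFixed(3)); // 1 1 0.983
console.log(argF1({ amount: 1000, party: 'Acme' }, { amount: 1000, party: 'Acme', date: '2026-04-01' }).toFixed(3)); // 0.800
```

On the q1 case this gives recall@5 = 1 and reciprocal rank = 1, while NDCG@5 drops slightly below 1 because doc-B sits at position 3 instead of 2. For tc1, the extra date argument lowers precision to 2/3 while recall stays at 1, so argument F1 is 0.8.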
router module
| Export | Purpose |
|---|---|
| routeModel(ctx) | Pick Haiku / Sonnet / Opus per workload + budget |
| planPromptCache(input) | Plan cache_control placement for stable prefixes |
| AnthropicModel, RoutingDecision, CachePlan | Type signatures |
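The trade-off a cache planner has to weigh can be sketched with Anthropic's published 5-minute cache pricing: writing a cached prefix costs 1.25× the base input price, reading it costs 0.1×, and prefixes shorter than the model's minimum cacheable length are processed without caching. The sketch below is hypothetical (sketchCachePlan, the 500-token user turn and the 1,024-token minimum are assumptions to adjust per model), covers input-token cost only, and is not how planPromptCache computes its estimate.

```typescript
// Hypothetical cache arithmetic, not the library's planPromptCache.
// Multipliers assume Anthropic's 5-minute cache pricing:
// cache write = 1.25x the base input price, cache read = 0.1x.
const CACHE_WRITE_MULT = 1.25;
const CACHE_READ_MULT = 0.1;

interface CacheSketchInput {
  prefixTokens: number;       // tokens up to and including the cache breakpoint
  uncachedTokens: number;     // tokens after the breakpoint (e.g. the user turn)
  minCacheableTokens: number; // model-specific minimum, e.g. 1024
}

interface CacheSketchResult {
  cacheable: boolean;
  firstCallOverheadPct: number; // extra input cost on the call that writes the cache
  savingsPctOnNextCall: number; // input cost saved on a later call that reads it
}

function sketchCachePlan(input: CacheSketchInput): CacheSketchResult {
  const { prefixTokens, uncachedTokens, minCacheableTokens } = input;
  // Below the minimum the API processes the request without caching
  const cacheable = prefixTokens >= minCacheableTokens;
  // Costs in "base input token" units, so the per-token price cancels out
  const baseline = prefixTokens + uncachedTokens;
  const firstCall = cacheable ? prefixTokens * CACHE_WRITE_MULT + uncachedTokens : baseline;
  const laterCall = cacheable ? prefixTokens * CACHE_READ_MULT + uncachedTokens : baseline;
  const pct = (delta: number): number =>
    baseline === 0 ? 0 : Math.round((delta / baseline) * 100);
  return {
    cacheable,
    firstCallOverheadPct: pct(firstCall - baseline),
    savingsPctOnNextCall: pct(baseline - laterCall),
  };
}

// 2,500 system + 4,000 tool tokens, plus an assumed 500-token user turn
const cachePlan = sketchCachePlan({ prefixTokens: 6_500, uncachedTokens: 500, minCacheableTokens: 1_024 });
console.log(cachePlan); // { cacheable: true, firstCallOverheadPct: 23, savingsPctOnNextCall: 84 }
```

The write premium is recovered as soon as a single later call within the cache TTL reads the prefix, which is why a stability hint such as expectStableForFiveMin matters to the decision.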
mcp module
| Export | Purpose |
|---|---|
| toMCPTool(def) | Convert internal tool def → MCP wire format |
| buildMCPCatalog(defs, opts?) | Build full tools/list response |
| filterToolsForMCP(defs, opts?) | Filter by jurisdiction |
| sanitiseMcpName(name) | Make any name MCP-safe |
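A name sanitiser can be sketched as a character whitelist plus a collision check. The allowed pattern below, ^[a-zA-Z0-9_-]{1,64}$, is the tool-name pattern Anthropic's Messages API accepts; treating it as the MCP-safe target is an assumption, and sanitiseNameSketch / sanitiseAll are hypothetical helpers, not the library's sanitiseMcpName.

```typescript
// Hypothetical sketch; the library's sanitiseMcpName may use different rules.
// Target pattern assumed from Anthropic's tool-name constraint.
const TOOL_NAME_PATTERN = /^[a-zA-Z0-9_-]{1,64}$/;

function sanitiseNameSketch(name: string): string {
  const cleaned = name
    .trim()
    .replace(/[^a-zA-Z0-9_-]+/g, '_') // each run of dots, colons, spaces, etc. -> '_'
    .slice(0, 64);                    // enforce the length cap
  return cleaned.length > 0 ? cleaned : 'tool'; // never emit an empty name
}

// Map original -> safe names, failing loudly if two originals collide
function sanitiseAll(names: string[]): Map<string, string> {
  const out = new Map<string, string>();
  const seen = new Map<string, string>(); // safe name -> first original
  for (const name of names) {
    const safe = sanitiseNameSketch(name);
    const prior = seen.get(safe);
    if (prior !== undefined && prior !== name) {
      throw new Error(`"${name}" and "${prior}" both sanitise to "${safe}"`);
    }
    seen.set(safe, name);
    out.set(name, safe);
  }
  return out;
}

console.log(sanitiseNameSketch('tax.compute_gst'));  // tax_compute_gst
console.log(sanitiseNameSketch('sa_zatca.compile')); // sa_zatca_compile
console.log(TOOL_NAME_PATTERN.test(sanitiseNameSketch('gst: file return'))); // true
```

The collision check matters because sanitising is lossy: two registry entries that differ only in punctuation (say tax.compute and tax:compute) would otherwise surface as a single tool.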
Versioning
This package follows semver. Pre-1.0 releases (0.x.y) may have breaking changes; breaking changes after 1.0.0 will bump the major version.
Contributing
Issues + PRs welcome at github.com/pratikrevankar/ongravy. The pure-test discipline of the parent repo applies — every helper has a corresponding test file. See tests/lib-pure/ for the test pattern.
Author
Pratik Revankar — builder of OnGravy. @pratikrevankar on X.
License
MIT — see LICENSE.
