@ongravy/agent-kit

v0.1.0

Published

15 days ago

Production-grade primitives for building agentic AI systems on the Anthropic SDK: hybrid RAG with RRF, eval metrics (recall@k, NDCG, tool-call F1), cost-aware model router, and prompt-cache planning. Extracted from OnGravy.

`@ongravy/agent-kit`

Production-grade primitives for building agentic AI systems on the Anthropic SDK. Extracted from OnGravy — an AI-native accounting platform shipping June 2026.

What this is

A small, production-tested library of the boring-but-critical pieces every agentic LLM system needs:

🔀 Reciprocal Rank Fusion — for hybrid retrieval (combine vector + BM25 + any other ranking)
📊 Eval metrics — recall@k, NDCG, MRR, tool-call F1; the metrics you wish you'd built before that 1am production fire
💰 Cost-aware model router — Haiku → Sonnet → Opus by complexity, with budget downgrade
💾 Prompt-cache planner — decides which blocks should carry cache_control for Anthropic's 90%-savings caching
🔌 MCP wire-format converter — turns your Zod-shaped tool registry into Model Context Protocol tool spec, ready for Claude Desktop / Cursor / Zed

Each module is pure (no I/O, no network) and has zero runtime dependencies; zod is the only peer dependency. The bundle adds ~12 KB gzipped to your build.

Why these specific helpers

These are the things you think are simple in your demo project, then realise are subtle when you ship:

| Helper | What goes wrong without it |
|---|---|
| `reciprocalRankFusion` | Vector retrieval misses queries with rare proper nouns; BM25 misses queries with paraphrases. Single-stage → ~20% recall loss vs hybrid. |
| `routeModel` | Naive setups always use Sonnet → 5× higher LLM bill. Naive routing puts everything on Haiku → reasoning fails on hard queries. |
| `planPromptCache` | Anthropic's prompt cache requires ≥1024 tokens per cached block. Setups that cache below this waste cache lookup latency. |
| `evalRetrievalSet` | "Does this prompt change improve quality?" Without recall@k metrics, your only answer is "vibes." |
| `toMCPTool` + `sanitiseMcpName` | MCP names can't contain dots/colons/spaces. Naive registries break on `tax.compute_gst`. |

Installation

npm install @ongravy/agent-kit
# or
pnpm add @ongravy/agent-kit
# or
bun add @ongravy/agent-kit

Peer dep:

npm install zod  # ^4.0.0

Quick examples

Hybrid retrieval (vector + BM25)

import { reciprocalRankFusion } from '@ongravy/agent-kit/rrf';

// Fetch ranked results from each source however you like
const vectorResults = await pgvector.search(queryEmbedding, { k: 30 });
const bm25Results   = await pgTsvector.search(queryText,    { k: 30 });

// Fuse them
const fused = reciprocalRankFusion({
  rankings: [
    { label: 'vec',  items: vectorResults.map(r => ({ id: r.id, score: r.cosine })) },
    { label: 'bm25', items: bm25Results.map(r => ({ id: r.id, score: r.tsRank })) },
  ],
  // Optional: weight one source higher when its precision is better
  weights: { vec: 1.0, bm25: 0.7 },
});

// Top 5 fused results
const topFive = fused.slice(0, 5);
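
For readers new to the technique, here is a minimal sketch of weighted Reciprocal Rank Fusion. It is not the library's source: `rrfSketch`, its input shape, and the default `k = 60` are illustrative assumptions. The point it shows is that RRF scores each item from its rank in every list, so the raw cosine and ts_rank values never need to share a scale.

```typescript
// Hypothetical sketch of weighted RRF; names and defaults are assumptions, not the package API.
type RankingSketch = { label: string; ids: string[] }; // ids ordered best-first

function rrfSketch(
  rankings: RankingSketch[],
  weights: Record<string, number> = {},
  k = 60, // smoothing constant; 60 is a commonly used default
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const { label, ids } of rankings) {
    const weight = weights[label] ?? 1; // unweighted sources count fully
    ids.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank));
    });
  }
  // Highest fused score first
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

const fusedSketch = rrfSketch(
  [
    { label: 'vec', ids: ['a', 'b', 'c'] },
    { label: 'bm25', ids: ['b', 'd', 'a'] },
  ],
  { vec: 1.0, bm25: 0.7 },
);
console.log(fusedSketch.map(item => item.id)); // order: b, a, c, d
```

Items that appear in both lists (`a`, `b`) outrank items that appear in only one, even when a single source ranks them lower.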

Cost-aware model routing

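import Anthropic from '@anthropic-ai/sdk';
import { routeModel } from '@ongravy/agent-kit/router';

// The client reads ANTHROPIC_API_KEY from the environment by default
const anthropic = new Anthropic();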

const decision = routeModel({
  inputTokens:           4_000,
  toolCount:             5,
  expectedTurns:         3,
  isHighStakes:          true,        // money/compliance answer
  remainingBudgetPaise:  1_50_000,    // ₹1500 left this month
});

// decision.model:                'claude-sonnet-4-6'
// decision.reason:               'Multi-turn / multi-tool / high-stakes'
// decision.budgetDowngraded:     false
// decision.estimatedInputCostPaise: 1200

const response = await anthropic.messages.create({
  model: decision.model,
  // …
});
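
The routing idea can be sketched in two steps: pick a tier from workload complexity, then step down one tier when the remaining budget is low. The sketch below is an assumption-laden illustration, not how `routeModel` is implemented; the thresholds, the `minBudgetPaise` floor, and the `routeSketch` name are all hypothetical.

```typescript
// Hypothetical tiering heuristic; every threshold here is an illustrative assumption.
type TierSketch = 'haiku' | 'sonnet' | 'opus';

interface RouteContextSketch {
  inputTokens: number;
  toolCount: number;
  expectedTurns: number;
  isHighStakes: boolean;
  remainingBudgetPaise: number;
}

function routeSketch(ctx: RouteContextSketch, minBudgetPaise = 10_000) {
  // Step 1: choose a tier from workload complexity alone.
  let tier: TierSketch = 'haiku';
  if (ctx.toolCount > 2 || ctx.expectedTurns > 1 || ctx.isHighStakes) tier = 'sonnet';
  if (ctx.inputTokens > 50_000 && ctx.isHighStakes) tier = 'opus';

  // Step 2: when the budget is nearly exhausted, step down one tier.
  let budgetDowngraded = false;
  if (ctx.remainingBudgetPaise < minBudgetPaise && tier !== 'haiku') {
    tier = tier === 'opus' ? 'sonnet' : 'haiku';
    budgetDowngraded = true;
  }
  return { tier, budgetDowngraded };
}

const workload = { inputTokens: 4_000, toolCount: 5, expectedTurns: 3, isHighStakes: true };
console.log(routeSketch({ ...workload, remainingBudgetPaise: 150_000 })); // sonnet, no downgrade
console.log(routeSketch({ ...workload, remainingBudgetPaise: 5_000 }));   // haiku, downgraded
```

Keeping the complexity decision separate from the budget decision makes it easy to log why a request landed on a cheaper model than its workload called for.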

Prompt caching

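import Anthropic from '@anthropic-ai/sdk';
import { planPromptCache } from '@ongravy/agent-kit/router';

// The client reads ANTHROPIC_API_KEY from the environment by default
const anthropic = new Anthropic();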

const plan = planPromptCache({
  systemTokens:           2_500,
  toolsTokens:            4_000,
  expectStableForFiveMin: true,
});

// plan.cacheSystem: true
// plan.cacheTools:  true
// plan.estimatedSavingsPctOnNextCall: 81

// Then on the API call (SYSTEM_PROMPT and tools are your own prompt string and tool definitions):
const response = await anthropic.messages.create({
  model: 'claude-sonnet-4-6',
  system: [
    { type: 'text', text: SYSTEM_PROMPT,
      ...(plan.cacheSystem && { cache_control: { type: 'ephemeral' } }) },
  ],
  tools: tools.map((t, i) => ({
    ...t,
    ...(plan.cacheTools && i === tools.length - 1 && { cache_control: { type: 'ephemeral' } }),
  })),
  // …
});
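
To make the savings arithmetic concrete, here is a rough sketch of how a planner of this kind could reason. It is an assumption, not the logic inside `planPromptCache`: the function name, the `dynamicTokens` field, and the per-block threshold check are hypothetical simplifications. The 1,024-token minimum and the 0.1× cache-read price are passed as parameters because they vary by model and may change.

```typescript
// Hypothetical planner sketch; a simplification, not the package's implementation.
interface CacheInputSketch {
  systemTokens: number;
  toolsTokens: number;
  dynamicTokens: number; // per-request tokens that are never cached (assumed field)
  stable: boolean;       // prefix expected to stay unchanged within the cache TTL
}

function planCacheSketch(
  input: CacheInputSketch,
  minCacheableTokens = 1024,
  cacheReadMultiplier = 0.1,
) {
  const cacheSystem = input.stable && input.systemTokens >= minCacheableTokens;
  const cacheTools = input.stable && input.toolsTokens >= minCacheableTokens;
  const cached =
    (cacheSystem ? input.systemTokens : 0) + (cacheTools ? input.toolsTokens : 0);
  const total = input.systemTokens + input.toolsTokens + input.dynamicTokens;
  // On a cache hit, cached tokens are billed at the read multiplier; the rest at full price.
  const costWithCache = cached * cacheReadMultiplier + (total - cached);
  const savingsPct = total === 0 ? 0 : Math.round((1 - costWithCache / total) * 100);
  return { cacheSystem, cacheTools, savingsPct };
}

const planSketch = planCacheSketch({
  systemTokens: 2_500,
  toolsTokens: 4_000,
  dynamicTokens: 1_000,
  stable: true,
});
console.log(planSketch); // both blocks cached; input cost on a hit drops by about 78%
```

The estimate covers input tokens on a cache hit only. The first call that writes the cache is billed at a premium, so caching pays off only when the prefix is reused before it expires.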

Retrieval evaluation

import { evalRetrievalSet } from '@ongravy/agent-kit/eval';

const summary = evalRetrievalSet([
  {
    queryId:      'q1',
    query:        'GST rate on legal services',
    relevantIds:  [{ id: 'doc-A', grade: 3 }, { id: 'doc-B', grade: 1 }],
    retrievedIds: ['doc-A', 'doc-X', 'doc-B', 'doc-Y'],
  },
  // … 30 more cases
]);

console.log(`recall@5 = ${summary.meanRecallAtK[5].toFixed(3)}`);
console.log(`NDCG@5   = ${summary.meanNdcgAtK[5].toFixed(3)}`);
console.log(`MRR      = ${summary.meanMrr.toFixed(3)}`);
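
If the metrics themselves are unfamiliar, the sketch below computes them by hand for the `q1` case above. It uses the textbook definitions with linear gain for NDCG; whether the library uses linear or exponential gain is not stated here, so treat that choice and the helper names as assumptions.

```typescript
// Textbook definitions, written as a sketch; not the package's implementation.
type GradedSketch = { id: string; grade: number };

function recallAtK(relevant: GradedSketch[], retrieved: string[], k: number): number {
  if (relevant.length === 0) return 0;
  const topK = new Set(retrieved.slice(0, k));
  return relevant.filter(r => topK.has(r.id)).length / relevant.length;
}

function reciprocalRank(relevant: GradedSketch[], retrieved: string[]): number {
  const relevantIds = new Set(relevant.map(r => r.id));
  const index = retrieved.findIndex(id => relevantIds.has(id));
  return index === -1 ? 0 : 1 / (index + 1); // 1-based rank of the first relevant hit
}

function dcg(gains: number[]): number {
  // Position i (0-based) is discounted by log2(i + 2), so the top result is undiscounted.
  return gains.reduce((sum, gain, i) => sum + gain / Math.log2(i + 2), 0);
}

function ndcgAtK(relevant: GradedSketch[], retrieved: string[], k: number): number {
  const gradeOf = new Map(relevant.map(r => [r.id, r.grade] as const));
  const actual = retrieved.slice(0, k).map(id => gradeOf.get(id) ?? 0);
  const ideal = relevant.map(r => r.grade).sort((a, b) => b - a).slice(0, k);
  const idealDcg = dcg(ideal);
  return idealDcg === 0 ? 0 : dcg(actual) / idealDcg;
}

const relevantDocs = [{ id: 'doc-A', grade: 3 }, { id: 'doc-B', grade: 1 }];
const retrievedDocs = ['doc-A', 'doc-X', 'doc-B', 'doc-Y'];

console.log(recallAtK(relevantDocs, retrievedDocs, 1));         // 0.5
console.log(recallAtK(relevantDocs, retrievedDocs, 3));         // 1
console.log(reciprocalRank(relevantDocs, retrievedDocs));       // 1
console.log(ndcgAtK(relevantDocs, retrievedDocs, 5).toFixed(3)); // 0.964
```

NDCG falls below 1 here because `doc-B` sits at rank 3 instead of rank 2, while recall@3 cannot see that difference.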

Tool-call accuracy

import { evalToolCallSet } from '@ongravy/agent-kit/eval';

const summary = evalToolCallSet([
  {
    caseId:        'tc1',
    expectedTool:  'create_invoice',
    expectedArgs:  { amount: 1000, party: 'Acme' },
    actualTool:    'create_invoice',
    actualArgs:    { amount: 1000, party: 'Acme', date: '2026-04-01' },
  },
  // …
]);

console.log(`tool accuracy:        ${(summary.toolAccuracy*100).toFixed(1)}%`);
console.log(`fully correct:         ${(summary.fullyCorrectAccuracy*100).toFixed(1)}%`);
console.log(`mean arg-match score: ${(summary.meanArgMatchScore*100).toFixed(1)}%`);
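
One plausible way to score argument matches is precision, recall, and F1 over argument keys. The sketch below is an assumption about how such a score could work, not a description of `evalToolCallSet`; in particular, how the library treats extra arguments such as `date` above is not stated here.

```typescript
// Hypothetical argument-level F1; the matching rule is an illustrative assumption.
type ArgsSketch = Record<string, unknown>;

function argF1(expected: ArgsSketch, actual: ArgsSketch) {
  const expectedKeys = Object.keys(expected);
  const actualKeys = Object.keys(actual);
  if (expectedKeys.length === 0 && actualKeys.length === 0) {
    return { precision: 1, recall: 1, f1: 1 }; // nothing expected, nothing sent
  }
  // A key matches when both sides serialise to the same JSON.
  // Note: this comparison is sensitive to key order inside nested objects.
  const matched = expectedKeys.filter(
    key => key in actual && JSON.stringify(actual[key]) === JSON.stringify(expected[key]),
  ).length;
  const precision = actualKeys.length === 0 ? 0 : matched / actualKeys.length;
  const recall = expectedKeys.length === 0 ? 0 : matched / expectedKeys.length;
  const f1 = precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
  return { precision, recall, f1 };
}

const scoreSketch = argF1(
  { amount: 1000, party: 'Acme' },
  { amount: 1000, party: 'Acme', date: '2026-04-01' },
);
console.log(scoreSketch.recall);        // 1: every expected argument is present and equal
console.log(scoreSketch.f1.toFixed(2)); // 0.80: the extra date argument lowers precision
```

Reporting precision and recall separately shows whether the model is omitting required arguments or inventing extra ones.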

MCP tool format conversion

import { z } from 'zod';
import { buildMCPCatalog, type GenericToolDef } from '@ongravy/agent-kit/mcp';

// Your existing tool registry — any shape with name, description, Zod schema
const myTools: GenericToolDef[] = [
  {
    name:          'tax.compute_gst',
    description:   'Compute GST on a sale.',
    inputSchema:   z.object({ amount: z.number(), rate: z.number() }),
    jurisdictions: ['IN'],
  },
  {
    name:          'sa_zatca.compile',
    description:   'Compile a ZATCA VAT return.',
    inputSchema:   z.object({ taxPeriod: z.string() }),
    jurisdictions: ['SA'],
  },
];

// Filter by jurisdiction + convert to MCP wire format
const catalog = buildMCPCatalog(myTools, { jurisdiction: 'IN' });
// catalog.tools[0].name === 'tax_compute_gst'   (sanitised — dots replaced)
// catalog.tools[0].inputSchema is JSON Schema
// catalog only contains the IN tool (SA filtered out)
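
The name sanitising step can be approximated as below. This is a sketch, not `sanitiseMcpName` itself: it assumes the target pattern `^[a-zA-Z0-9_-]{1,64}$` (the pattern the Anthropic Messages API accepts for tool names), and both helper names are hypothetical. It also shows a collision check, since two distinct registry names can sanitise to the same string.

```typescript
// Hypothetical sanitiser; the allowed character set and length limit are assumptions.
function sanitiseNameSketch(name: string, maxLength = 64): string {
  const cleaned = name.replace(/[^a-zA-Z0-9_-]/g, '_').slice(0, maxLength);
  if (cleaned.length === 0) throw new Error('Tool name is empty after sanitising');
  return cleaned;
}

// Report registry names that become identical once sanitised.
function findCollisions(names: string[]): string[] {
  const seen = new Map<string, string>();
  const collisions: string[] = [];
  for (const name of names) {
    const safe = sanitiseNameSketch(name);
    const previous = seen.get(safe);
    if (previous !== undefined && previous !== name) {
      collisions.push(`${previous} / ${name} -> ${safe}`);
    }
    seen.set(safe, name);
  }
  return collisions;
}

console.log(sanitiseNameSketch('tax.compute_gst'));               // tax_compute_gst
console.log(findCollisions(['tax.compute_gst', 'tax_compute_gst'])); // one collision reported
```

Running a collision check at registry build time is cheaper than debugging a client that silently calls the wrong tool.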

Why this works

These helpers come from a real production system that:

Handles real money (Indian SMB accounting, GST compliance, audit reports)
Runs at multi-tenant scale across 4 jurisdictions (India, UAE, Saudi Arabia, Singapore)
Has a measured 0.3% hallucination rate post-defences (vs ~12% bare model)
Has a per-business cost cap of ₹500/month enforced via the model router

If you're building anything in the same ballpark — domain-specific Q&A, agentic workflow automation, multi-tenant LLM systems — these primitives have already paid for themselves once.

Read the longform writeup: Six-layer hallucination defence.

API reference

`rrf` module

| Export | Purpose |
|---|---|
| `reciprocalRankFusion(input)` | Fuse N ranked lists into one ranking |
| `RankedItem`, `RrfInput`, `FusedItem` | Type signatures |

`eval` module

| Export | Purpose |
|---|---|
| `evalRetrievalCase(case, ks?)` | Single-case recall@k / NDCG / MRR |
| `evalRetrievalSet(cases, ks?)` | Aggregate across cases |
| `evalToolCallCase(case)` | Per-case tool-correct + arg-match |
| `evalToolCallSet(cases)` | Aggregate tool-call summary |

`router` module

| Export | Purpose |
|---|---|
| `routeModel(ctx)` | Pick Haiku / Sonnet / Opus per workload + budget |
| `planPromptCache(input)` | Plan `cache_control` placement for stable prefixes |
| `AnthropicModel`, `RoutingDecision`, `CachePlan` | Type signatures |

`mcp` module

| Export | Purpose |
|---|---|
| `toMCPTool(def)` | Convert internal tool def → MCP wire format |
| `buildMCPCatalog(defs, opts?)` | Build full `tools/list` response |
| `filterToolsForMCP(defs, opts?)` | Filter by jurisdiction |
| `sanitiseMcpName(name)` | Make any name MCP-safe |

Versioning

This package follows semver. Pre-1.0 releases (0.x.y) may have breaking changes; breaking changes after 1.0.0 will bump the major version.

Contributing

Issues + PRs welcome at github.com/pratikrevankar/ongravy. The pure-test discipline of the parent repo applies — every helper has a corresponding test file. See tests/lib-pure/ for the test pattern.

Author

Pratik Revankar — builder of OnGravy. @pratikrevankar on X.

License

MIT — see LICENSE.