npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

stozer-ai

v0.1.6

Published

Deterministic grounding validation for AI agents and RAG pipelines. Detect hallucinations, data fabrication, and grounding failures without LLM-as-judge — <50ms, zero API calls.

Readme

Stozer

Grounding validation for tool-calling AI agents and RAG pipelines.

Detect when an agent's response contradicts its tool outputs or retrieved context — deterministically, without LLM-as-judge overhead.

npm version License


The Problem

Most "hallucinations" in tool-calling agents and RAG pipelines are grounding failures — the agent gets accurate data from tools or retrieved context, and then ignores it, miscalculates, or fabricates from empty results. The source of truth is already in the trace.

Why Not LLM-as-a-Judge?

| | Stozer | LLM-as-a-Judge | |---|---|---| | Latency | <50ms | 2–10s per call | | Cost | $0 compute | $0.01–0.10 per eval | | Determinism | Same input → same result | Varies across runs | | Explainability | Exact failure code + evidence | "I think there might be an issue" | | Hallucination risk | Zero (no LLM) | The judge can hallucinate too | | Scalability | 10K+ evals/sec | Rate-limited by provider |

The Solution

Stozer validates the agent's response against tool outputs and reports grounding failures with standardized codes — like OBD diagnostic codes for AI.

npm install stozer-ai

Zero LLM calls. Deterministic failure detection. Runs in <50ms.


Getting Started

  1. Create a free account at app.stozer.dev — no credit card required
  2. Copy your API key from Settings → API Keys (starts with stozer_live_)
  3. Set the env var or save it with the CLI:
# Option A: environment variable
export STOZER_API_KEY=stozer_live_your_key

# Option B: save once with the CLI (persists across sessions)
npx stozer auth stozer_live_your_key

The free tier includes full access to grounding evaluation, failure detection, and the dashboard — enough to integrate and test with your agent.


Quick Start — 3 Minutes

1. Evaluate a trace

import { StozerClient, TraceBuilder } from 'stozer-ai';

const client = new StozerClient();  // uses STOZER_API_KEY env var

const trace = new TraceBuilder({ traceId: 'run-001' })
  .addUserInput('How many employees are on leave today?')
  .addToolCall('getLeaveRecords', { date: '2024-03-15' })
  .addToolOutput('getLeaveRecords', [
    { employeeId: 'E01', name: 'Sarah Johnson', status: 'on_leave' },
    { employeeId: 'E02', name: 'Michael Chen', status: 'on_leave' },
  ])
  .addFinalResponse('There are 3 employees on leave today.')  // ← Bug: says 3, data shows 2
  .build();

const result = await client.evaluate(trace);

console.log(result.report.groundingScore);       // 0.5
console.log(result.report.detectedFailures[0]);  // { type: 'grounding.data_ignored', severity: 'high' }

2. Monitor in production (proxy mode)

Works with any language — PHP, Python, Go, Java, Ruby, C#.

Change your AI base URL and add your Stozer key using the composite key format (||):

# OpenAI — add Stozer key before your AI key, separated by ||
OPENAI_BASE_URL=http://localhost:3001/proxy/openai/v1
OPENAI_API_KEY=stozer_live_your_key||sk-your-openai-key

# Anthropic
ANTHROPIC_BASE_URL=http://localhost:3001/proxy/anthropic
ANTHROPIC_API_KEY=stozer_live_your_key||sk-ant-your-key

Your app works exactly the same — zero code changes. Stozer automatically splits the key, identifies your account, and forwards only the AI key to the provider.


Detection

Deterministic failure detectors across six categories: grounding, reasoning, orchestration, safety, semantic, and instrumentation.

Examples:

  • Fabrication from empty tool results or missing RAG context
  • Numeric inconsistencies with tool data
  • Ignored or altered data in the response
  • Entity misattribution
  • Prompt leakage and sensitive data exposure
  • Tool budget exhaustion and fabricated tool inputs

Features

Diagnostic Advisor

Every detected failure includes actionable diagnostics — root cause identification, evidence, and remediation guidance.

Policy Engine

Configure per-failure actions on your Stozer dashboard — block, warn, or observe for each failure type. SDK wrappers automatically enforce your configured policy:

import { wrapOpenAI, GroundingError } from 'stozer-ai';
import OpenAI from 'openai';

const openai = wrapOpenAI(new OpenAI(), {
  mode: 'block',
  threshold: 0.85,
  onEvaluate: (report, trace) => {
    console.log('Score:', report.groundingScore);
    console.log('Failures:', report.detectedFailures.length);
  },
});

Also available: wrapAnthropic for Anthropic SDK, wrapGemini for Google Gemini SDK.

import { wrapAnthropic } from 'stozer-ai';
const client = wrapAnthropic(new Anthropic(), { mode: 'observe' });

import { wrapGemini } from 'stozer-ai';
const model = wrapGemini(genAI.getGenerativeModel({ model: 'gemini-2.0-flash' }), { mode: 'observe' });

MCP Server (VS Code, Cursor, Windsurf, Zed, Claude Desktop)

Use Stozer directly from your IDE — no terminal needed.

Setup (one command):

npx stozer auth stozer_live_your_key
npx stozer mcp-setup

Auto-detects installed IDEs and configures MCP. Restart your IDE after running.

  1. In VS Code: Ctrl+Shift+P"MCP: Open User Configuration"
  2. Add this to mcp.json:
{
  "servers": {
    "stozer": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "stozer-ai", "mcp"]
    }
  }
}
  1. Restart VS Code

Usage: In Copilot Chat, say: "Call stozer verify_response with this trace: {...}"

8 tools available: verify_response, quick_check, check_trace_quality, list_rules, get_failure_info, evaluate_with_policy, get_live_traces, get_trace_report

Express Middleware

import express from 'express';
import { groundingMiddleware, FileStore } from 'stozer-ai';

const app = express();
app.post('/api/chat', groundingMiddleware({
  mode: 'warn',
  store: new FileStore('./traces/grounding.jsonl'),
  extractTrace: (req, res, body) => body.trace,
}));

CLI

npx stozer auth stozer_live_your_key      # Save API key (once)
npx stozer mcp-setup                      # Auto-configure MCP for all IDEs
npx stozer evaluate trace.json            # Evaluate one trace
npx stozer traces                         # List recent evaluations
npx stozer trace <trace-id>               # Full report for a specific trace
npx stozer rules [category]               # List detection rules (grounding, reasoning, safety, ...)
npx stozer rule <failure_type>            # Details + remediation for a rule

Auto-discovery: Set STOZER_API_KEY env var and all SDK wrappers automatically send traces — no explicit client setup needed.


How It Works

  1. Analyze the agent's response against tool output data
  2. Detect grounding failures using deterministic rules
  3. Score overall grounding quality
  4. Diagnose root causes with actionable remediation

No LLM calls. Deterministic. Supports 11 human languages.


Benchmarks

Tested on public hallucination datasets — no cherry-picking, full results. Same engine, same rules — accuracy depends on data structure.

Tool-Calling Agents (structured JSON — APIs, databases, functions)

| Benchmark | Samples | Precision | Recall | F1 | Runtime | |---|---|---|---|---|---| | HaluEval QA | 16,662 | 96.4% | 93.3% | 96.5% | 8s |

HaluEval QA (Li et al., 2023): 16,662 question–answer pairs with known hallucinations. Near-zero false positives on structured data — when Stozer flags something, it's real.

RAG Pipelines (free-text documents — PDFs, knowledge bases, policies)

| Benchmark | Samples | Precision | Recall | F1 | Runtime | |---|---|---|---|---|---| | FaithBench | 750 | 61.3% | 78.6% | 68.9% | 4s |

FaithBench (Bai et al., 2024): 750 free-text summaries. Lower precision is expected — paraphrased prose is harder than structured data. Stozer reports a coverage metric showing what fraction of claims it verified deterministically vs. semantically, so you know exactly where the certainty boundary is.

Production Traces

| Samples | Precision | Recall | F1 | |---|---|---|---| | 1,500+ verified | 98.6% | 96.2% | 97.8% |

Predominantly tool-calling agents with structured API/database outputs, across HR, finance, and operations domains.

Why the Gap?

The accuracy difference comes from data structure, not configuration:

  • Tool outputs are structured JSON — field names, typed values, discrete data. Deterministic checks are near-perfect.
  • RAG documents are free text — paraphrases, coreference, entailment. Harder to match deterministically. Stozer transparently reports what it could verify and what it couldn't.

Reproduce locally: npx ts-node _halueval_bench.ts and npx ts-node _faithbench_bench.ts


License

Proprietary. See LICENSE for details.