@syntheticlab/synbad

v0.0.8

LLM inference provider evals

Synbad the legendary sailor

Synbad is a tool for detecting bugs in LLM inference providers, especially open-source ones. Synbad is maintained by Synthetic, as part of our efforts to keep our inference quality as high as possible.

If you find bugs in Synthetic's model hosting, please contribute the bugs here! We will fix them.

Install

Synbad is distributed through npm. Install it with:

npm install -g @syntheticlab/synbad

Results

We keep a running tally of provider+model results for tool calling and reasoning parsing for GLM-4.7, Kimi K2 Thinking, and MiniMax M2. Feel free to add more provider results!

| Provider      | Model            | Success Rate            |
|---------------|------------------|-------------------------|
| Synthetic.new | GLM-4.7          | :white_check_mark: 100% |
| Synthetic.new | Kimi K2 Thinking | :white_check_mark: 100% |
| Synthetic.new | MiniMax M2       | :white_check_mark: 100% |

| Provider  | Model            | Success Rate            |
|-----------|------------------|-------------------------|
| Fireworks | GLM-4.7          | :x: 83%                 |
| Fireworks | Kimi K2 Thinking | :x: 92%                 |
| Fireworks | MiniMax M2       | :white_check_mark: 100% |

| Provider | Model            | Success Rate |
|----------|------------------|--------------|
| Together | Kimi K2 Thinking | :x: 66%      |

| Provider | Model            | Success Rate |
|----------|------------------|--------------|
| Parasail | GLM-4.7          | :x: 83%      |
| Parasail | Kimi K2 Thinking | :x: 75%      |

Note for attempting reproductions: most tests are reproducible with --count 1 and --count 1 --stream, but for the response-in-reasoning eval you will generally need a higher count to reproduce the bug; --count 40 and --count 40 --stream are typically sufficient.

An eval is only considered a pass if it passes both with and without Synbad's --stream parameter (which tests streaming APIs).

How do I contribute inference bugs?

If you already have some problematic JSON, head over to the Contributing section. If you don't, don't worry! Synbad makes it easy to capture the problematic JSON you're encountering.

First, run the Synbad Proxy, specifying the local port you want to use and the inference host you want to target. For example, to forward requests from localhost:3000 to Synthetic's API, you'd do:

synbad proxy -p 3000 -t https://api.synthetic.new/openai/v1

Then configure your coding agent (or whichever local tool you're using) to point to http://localhost:3000 (or whichever port you selected). The Synbad Proxy logs every request body to stdout, so all you need to do is reproduce the bug with your tool or coding agent and copy the JSON it prints.

Now you have reproducible JSON to file a bug via Synbad!

Contributing

First, clone this repo from GitHub. Then cd into it and run:

npm install

All inference evals are stored in the evals/ directory. They're written in TypeScript. You need to export two things from an eval:

  1. The JSON that reproduces the problem, exported as the const json. It doesn't have to reproduce the bug 100% of the time; if it appears even 5% of the time, that's fine.
  2. A test function that runs asserts on the returned assistant message to detect the error.

For example, we can test parallel tool call support very simply (as we do in the evals/tools/parallel-tool.ts file):

import * as assert from "../../source/asserts.ts";
import { ChatMessage } from "../../source/chat-completion.ts";

export function test({ tool_calls }: ChatMessage) {
  // The prompt asks about two cities, so a correct provider should return
  // exactly two parallel tool calls.
  assert.isNotNullish(tool_calls);
  assert.isNotEmptyArray(tool_calls);
  assert.strictEqual(tool_calls.length, 2);
}

export const json = {
  "messages": [
    {"role": "user", "content": "What's the weather in Paris and London?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "City name"
            }
          },
          "required": ["location"]
        }
      }
    }
  ],
  "parallel_tool_calls": true,
  "tool_choice": "auto",
}

The asserts.ts file re-exports all of the built-in Node.js assertion functions, and also adds a few extras, e.g. isNotNullish, which asserts that a value is not null or undefined.
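For example, a test can freely mix the Node.js built-ins with those extra helpers. The sketch below is purely illustrative (the content field and the regex are assumptions based on the standard OpenAI-style assistant message, not taken from an actual eval in this repo):

import * as assert from "../../source/asserts.ts";
import { ChatMessage } from "../../source/chat-completion.ts";

export function test({ content }: ChatMessage) {
  // Extra helper from asserts.ts: fail if content is null or undefined.
  assert.isNotNullish(content);
  // Built-in node:assert re-export: fail if the reply never mentions Paris.
  assert.match(String(content), /paris/i);
}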

To run your new eval, use the synbad.sh script in this repo, which auto-recompiles everything (including your new test!) before running the evals. For example, assuming you're running the evals/reasoning/reasoning-parsing eval against GLM-4.6 on Synthetic, and want to run it 5 times since it doesn't fail consistently:

./synbad.sh eval --env-var SYNTHETIC_API_KEY \
  --base-url "https://api.synthetic.new/openai/v1" \
  --only evals/reasoning/reasoning-parsing \
  --model "hf:zai-org/GLM-4.6" \
  --count 5

Handling reasoning parsing

The OpenAI spec didn't originally include reasoning content parsing, since the original OpenAI models didn't reason. The open-source community added support for reasoning later, but there are two competing specs:

  1. Storing the reasoning content in message.reasoning_content, or
  2. Storing the reasoning content in message.reasoning.
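Concretely, the same assistant message could come back in either of these shapes (field values are illustrative, and other standard OpenAI-style fields are omitted):

// Spec 1: reasoning stored in message.reasoning_content
const withReasoningContent = {
  role: "assistant",
  content: "The capital of France is Paris.",
  reasoning_content: "The user asked for the capital of France...",
};

// Spec 2: reasoning stored in message.reasoning
const withReasoning = {
  role: "assistant",
  content: "The capital of France is Paris.",
  reasoning: "The user asked for the capital of France...",
};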

To make sure your evals work with a wider range of inference providers, use the getReasoning function when testing reasoning parsing like so:

import { getReasoning } from "../../source/chat-completion.ts";

// In your test:

const reasoning = getReasoning(message);

This ensures your test will use the correct reasoning content data regardless of which spec the underlying inference provider is using.
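For instance, a reasoning-parsing test might look roughly like this. It's an illustrative sketch using only the documented helpers (getReasoning, isNotNullish, and the re-exported Node.js ok), not the actual evals/reasoning/reasoning-parsing eval:

import * as assert from "../../source/asserts.ts";
import { ChatMessage, getReasoning } from "../../source/chat-completion.ts";

export function test(message: ChatMessage) {
  // Reads reasoning_content or reasoning, whichever the provider returned.
  const reasoning = getReasoning(message);
  // A reasoning model should always produce some non-empty reasoning text.
  assert.isNotNullish(reasoning);
  assert.ok(String(reasoning).trim().length > 0);
}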

Running Synbad

First, install it:

npm install -g @syntheticlab/synbad

Then run:

synbad eval --env-var SYNTHETIC_API_KEY \
  --base-url "https://api.synthetic.new/openai/v1" \
  --model "hf:zai-org/GLM-4.6"