🔡🤞 Semantic Expect

LLM-based test assertions for Vitest and Jest

test('Joke writer', async () => {
  await expect(writeJoke).toGenerate('Something funny');
});
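
The writeJoke function above stands in for whatever asynchronous generator you want to assert on. A minimal sketch, assuming the OpenAI Node client (the function name, prompt, and model choice are illustrative, not part of this library):

// Hypothetical generator under test: any async function that returns a string works
import { OpenAI } from 'openai';

const client = new OpenAI();

async function writeJoke(topic?: string): Promise<string> {
  const completion = await client.chat.completions.create({
    model: 'gpt-4', // illustrative model choice
    messages: [
      { role: 'user', content: topic ? `Write a short joke ${topic}.` : 'Write a short joke.' },
    ],
  });
  return completion.choices[0].message.content ?? '';
}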

This library is in early development and is seeking contributors!

Philosophy

Developing applications backed by generative artificial intelligence (such as large language models) requires us to redefine the very notion of "reliability". No longer is it possible — or even desirable — to expect our applications to do exactly what we program them to do: Not only are LLMs fundamentally non-deterministic, but exhibiting emergent and unprogrammed behaviors is one of the key things that makes LLMs so powerful in the first place. Any production-grade LLM-powered system will require multiple quality assurance mechanisms, including run-time checks, live service monitoring, offline evaluation, and — ideally — test automation.

Semantic Expect's role is to shift basic validation left and verify essential generative behavior before shipping. It will always be possible to tweak prompts and eke out better responses, but some behaviors may be simply unacceptable to ship at all. Semantic Expect lets you write tests for generative features that can be added to your continuous integration and deployment processes, alongside end-to-end and integration tests. You should err toward defining rules that express acceptable behavior rather than perfect behavior; otherwise your tests may exhibit "flakiness" that impedes development velocity. Finding this balance and refining these techniques is perhaps the new art of "semantic testing".

Setup

To use Semantic Expect, you'll need to register custom matchers with your test runner. Instructions vary slightly by runner, but generally look like this:

// First, import your LLM client and a matcher factory
import { OpenAI } from 'openai';
import { makeOpenAIMatchers } from 'semantic-expect';

const model = new OpenAI();

// Second, build the matchers by submitting the LLM client
const matchers = makeOpenAIMatchers(model);

// Third, register the matchers
expect.extend(matchers);

You can typically combine these steps into a single line if preferred:

expect.extend(makeOpenAIMatchers(new OpenAI()));

See Jest expect.extend() and Vitest Extending Matchers for further details.

To use custom matchers across multiple test files, you can register them in a separate setup file. See Jest setupFilesAfterEnv configuration and Vitest setupFiles configuration for further details.
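
For example, a Vitest setup file might look like the following (the file path is illustrative; for Jest, drop the vitest import and list the file under setupFilesAfterEnv instead):

// tests/semantic-expect.setup.ts (illustrative path): registers the matchers once for all test files
import { expect } from 'vitest';
import { OpenAI } from 'openai';
import { makeOpenAIMatchers } from 'semantic-expect';

expect.extend(makeOpenAIMatchers(new OpenAI()));

You would then reference this file from your runner's configuration, e.g. setupFiles: ['./tests/semantic-expect.setup.ts'] in vitest.config.ts.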

Matching

Because generative AI is fundamentally non-deterministic, it's generally not possible to compare generated output against a static expected value (e.g. using toBe), nor is it typically sufficient to generate only one test value for assessment. Given these dynamics, Semantic Expect provides a toGenerate matcher that accepts a generator function, runs it n times, and checks every generation against a requirement:

it('Should write an on-topic joke', async () => {
  const generator = () => writeJoke('about computers');
  // Be sure to await the assertion
  await expect(generator).toGenerate('A joke about computers', 5);
});

Note: You must await the assertion, since the model call is asynchronous. If you don't, the test will always pass!

If the generated content does not fulfill the requirement, the matcher will provide a message explaining why:

Each generation should be 'A joke about computers' (1 of 3 were not):
  - 'Why was the electricity feeling so powerful? Because it had a high voltage personality!' (Is a joke about electricity, not computers)

By default, toGenerate will run the generator 3 times; a custom count can be specified as the second argument. Of course, it's always possible for a generator to work correctly 10 times and fail on the 11th, but such is the reality of working with LLMs; the best we can do is manage the risk, not eliminate it. Requirements should be kept broad enough that they can be met reliably despite the inherent variability of the content being tested.
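
As an illustration, a requirement phrased around the essential behavior tends to hold up across runs better than one that pins down incidental details (the requirements below are purely illustrative):

it('Should write a computer-related joke', async () => {
  const generator = () => writeJoke('about computers');

  // Broad requirement: describes the behavior that should hold on every run
  await expect(generator).toGenerate('A joke related to computers or technology', 5);

  // By contrast, an overly narrow requirement is likely to fail intermittently
  // even when the feature works, because it pins down incidental details:
  // await expect(generator).toGenerate('A pun about keyboards that mentions QWERTY', 5);
});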

If the generator being tested doesn't require any parameters, it can be submitted on its own, without a wrapping function:

it('Should write something funny', async () => {
  await expect(writeJoke).toGenerate('Something funny');
});

The toGenerate matcher can also be negated using not:

it('Should write a work-appropriate joke', async () => {
  const generator = () => writeJoke('about computers');
  await expect(generator).not.toGenerate('Anything inappropriate for work', 5);
});

Models

Semantic Expect provides multiple options for the models backing the custom matchers.

  • makeOpenAIMatchers: Uses the OpenAI backend and defaults to a chat-based model (alias for makeOpenAIChatMatchers)
  • makeOpenAIChatMatchers: Uses the OpenAI backend and always uses a chat-based model
  • makeOpenAITextMatchers: Uses the OpenAI backend and always uses a text-based (instruct) model

You can also specify a particular model via options if desired:

const textMatchers = makeOpenAITextMatchers(client, {
  model: 'text-davinci-003',
});
const chatMatchers = makeOpenAIChatMatchers(client, { model: 'gpt-4' });

Message formats

Semantic Expect generates an unformatted test result message by default, but this can be customized for your test runner and preferences:

const jestMatchers = makeOpenAIMatchers(client, { format: 'jest' });
const vitestMatchers = makeOpenAIMatchers(client, { format: 'vitest' });

Additional examples

Semantic Expect includes general examples by default, but your particular use case may benefit from additional guidance. Each example includes the following properties:

  • requirement: A description of the desired generated content, such as "A professional greeting"
  • content: The content being submitted for assessment, such as "What's up?? 🤪"
  • assessment: A brief assessment of why the content does or doesn't fulfill the requirement, such as "Uses casual language"
  • pass: true if requirement is fulfilled, false if not

Additional examples are registered when you create your matchers:

const matchers = makeOpenAIMatchers(client, {
  examples: [
    {
      requirement: 'A professional greeting',
      content: "What's up?? 🤪",
      assessment: 'Uses casual language',
      pass: false,
    },
  ],
});

There is no hard limit on the number of custom examples you can provide; however, note that you may eventually run up against the token limits imposed by your model.

To-do

  • Support LLM providers other than OpenAI
  • Message formats for additional test runners, and fully custom format function
  • Test coverage, particularly a suite that directly tests determinations (including their wording) so the prompt content can be trimmed as much as possible
  • Docs