
llm-rehearsal

v0.0.5


Rehearsal

Prompt evaluation and regression testing

Modifying a prompt, even in its smallest details, can have a big impact on the output. Rehearsal makes it easy to run various tests and evaluations against LLM output. Use cases for Rehearsal include:

  • regression testing
  • QA
  • helping with prompt iteration

Installation

yarn add -D llm-rehearsal

Usage

One important aspect of Rehearsal is that it's completely agnostic of what's used to generate the text. Simply provide an async function that returns a {text: "llm response"} object:

import { rehearsal, expectations } from 'llm-rehearsal';

const { includesString } = expectations;

// Provide an LLM function
const { testCase, run } = rehearsal(async (input: { country: string }) => {
  // your custom code to call LLM here
  const textResponse = await callLLM({
    prompt: `What is the capital of ${input.country}?`,
  });
  return { text: textResponse }; // only requirement is to return llm response in `text` property
});

// Define test cases
testCase('France', {
  input: { country: 'France' },
  expect: [includesString('paris')],
});
testCase('Germany', {
  input: { country: 'Germany' },
  expect: [includesString('berlin')],
});

// Start test suite
run();

To run the tests, don't forget to call run() at the end of your file, then execute it (with plain node for JavaScript, or ts-node for TypeScript).
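For example, one possible setup is to expose the test file through a package.json script (the script and file names here are placeholders, not a convention of the library):

```json
{
  "scripts": {
    "test:llm": "ts-node rehearsal.test.ts"
  }
}
```

You can then run the suite with yarn test:llm or npm run test:llm.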

Expectations for all test cases

To run expectations on all test cases, use expectForAll():

const { testCase, run, expectForAll } = rehearsal(
  async (input: { country: string }) => {
    // your custom code to call LLM here
    const textResponse = await callLLM({
      prompt: `What is the capital of ${input.country}?`,
    });
    return { text: textResponse }; // only requirement is to return llm response in `text` property
  },
);

// This expectation will run for every test case
// (`not` comes from the `expectations` import, like `includesString`)
expectForAll([not(includesString('as a large language model'))]);

Mixing expectations

Expectations can be composed with boolean logic:

import { rehearsal, expectations } from 'llm-rehearsal';
const { includesString, not, and, or } = expectations;

const { testCase } = rehearsal(llmFunction);

testCase("don't say yellow", {
  input: {
    /* input variables */
  },
  expect: [not(includesString('yellow'))],
});

testCase('potato/tomato', {
  input: {
    /* input variables */
  },
  expect: [or(includesString('potato'), includesString('tomato'))],
});

testCase('the cake is a lie', {
  input: {
    /* input variables */
  },
  expect: [and(includesString('cake'), includesString('lie'))],
});

Built-in expectations

  • includesString - checks if the LLM response contains a given string
  • matchesRegex - checks if the LLM response matches a given regular expression
  • not - negates an expectation
  • and - compose multiple expectations with AND logic
  • or - compose multiple expectations with OR logic
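To illustrate how these fit together, here is a hypothetical, self-contained stand-in for a few of the built-ins, following the `(output) => { pass, message? }` contract shown in the createExpectation example below. The library's actual implementations may differ (including whether string matching is case-insensitive):

```typescript
// Hypothetical stand-in implementations of the expectation contract
// (factory => (output) => { pass, message? }); illustrative only,
// not the library's real code.
type Output = { text: string };
type Result = { pass: boolean; message?: string };
type Expectation = (output: Output) => Result;

const includesString =
  (s: string): Expectation =>
  (output) =>
    output.text.toLowerCase().includes(s.toLowerCase())
      ? { pass: true }
      : { pass: false, message: `Expected output to include "${s}"` };

const not =
  (e: Expectation): Expectation =>
  (output) =>
    e(output).pass
      ? { pass: false, message: 'Expected expectation to fail, but it passed' }
      : { pass: true };

const or =
  (...es: Expectation[]): Expectation =>
  (output) =>
    es.some((e) => e(output).pass)
      ? { pass: true }
      : { pass: false, message: 'Expected at least one expectation to pass' };

// Evaluating expectations against a sample LLM output:
const sample: Output = { text: 'Paris is the capital of France.' };
console.log(includesString('paris')(sample).pass); // true (this stand-in lowercases both sides)
console.log(not(includesString('berlin'))(sample).pass); // true
console.log(or(includesString('lyon'), includesString('paris'))(sample).pass); // true
```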

Coming soon:

  • includesWord - check for separate words, not just substrings
  • askGPT - perform evaluation through a GPT prompt

Custom expectations

Custom expectations are easy to create:

import { createExpectation } from 'llm-rehearsal';

const { isLongerThan } = createExpectation(
  'isLongerThan',
  (count: number) => (output) => {
    return output.text.length > count
      ? { pass: true }
      : {
          pass: false,
          message: `Expected output text to be > ${count} characters, but instead is ${output.text.length}`,
        };
  },
);

// use it like the built-in expectations
testCase('long output', {
  input: {
    /* input variables */
  },
  expect: [isLongerThan(9000)],
});

// custom expectations can also be composed with boolean logic:
testCase('long output with sandwich in it', {
  input: {
    /* input variables */
  },
  expect: [and(isLongerThan(9000), includesString('sandwich'))],
});

If your function returns more than just a text (such as metadata or results of intermediate steps), you can create type-safe expectations:

import { rehearsal, expectations } from 'llm-rehearsal';

// notice that `createExpectation` is returned by the rehearsal() function,
// and is typed according to the input/output of the LLM function
const { testCase, createExpectation } = rehearsal(
  async (input: { country: string }) => {
    // your custom code to call LLM here
    const { textResponse, documents } = await callLLMChain({
      prompt: `What is the capital of ${input.country}?`,
    });
    return { text: textResponse, documents }; // we return more than just `text`
  },
);

const { usesDocuments } = createExpectation('usesDocuments', () => (output) => {
  return output.documents.length > 0 // output is properly typed
    ? { pass: true }
    : { pass: false, message: 'Expected documents to be returned, found none' };
});

Labels for expectations

To make test results more readable, an expectation can have a label attached:

testCase('my test case', {
  input: {},
  expect: [
    [includesString('banana'), 'include banana'],
    [matchesRegex(/^hello/), 'starts with "hello"'],
    // also works with composed expectations:
    [
      not(
        or(
          includesString('hamburger'),
          includesString('fries'),
          includesString('hotdog'),
          includesString('chicken nuggets'),
          includesString('burritos'),
        ),
      ),
      'no fastfood',
    ],
  ],
});

Describe

Just like in most testing libraries, you can group test cases using describe:

import { rehearsal, expectations, describe } from 'llm-rehearsal';

const { includesString } = expectations;
const { testCase, run } = rehearsal(async (input: { country: string }) => {
  // your custom code to call LLM here
  const textResponse = await callLLM({
    prompt: `What is the capital of ${input.country}?`,
  });
  return { text: textResponse };
});

describe('Countries', () => {
  testCase('France', {
    input: { country: 'France' },
    expect: [includesString('paris')],
  });
  testCase('Germany', {
    input: { country: 'Germany' },
    expect: [includesString('berlin')],
  });
});

Note: describe does not yet support only. This should be supported in the future.

Only

To isolate a test case and run only that one (or a selected few), use testCase.only:

testCase('France', {
  input: { country: 'France' },
  expect: [includesString('paris')],
});
testCase.only('Germany', {
  input: { country: 'Germany' },
  expect: [includesString('berlin')],
});

This will run only the Germany test case. Multiple test cases can be marked "only" to run a selected set.

Local development

To install a local build of Rehearsal, the recommended method is to use Yalc. Make sure yalc is installed globally.

  1. Build the library: yarn build
  2. Publish to the yalc local store (does not leave your computer): yarn publish-local
  3. On the consuming side (the NodeJS project where you want to install Rehearsal): yalc install llm-rehearsal

Note: Keep in mind that Yalc copies the package to its store, and then copies it again when it is installed on the consuming side. After a new build, you'll need to run yarn publish-local in this repository and yalc update on the consuming side.