cypress-verify-llm

v0.0.2

Published

23 days ago

Cypress custom commands for verifying LLM/AI response correctness with percentage scoring


daniilqa

cypress cypress-plugin llm ai testing assertions verification chatbot gpt claude hallucination safety prompt-testing

Built by Daniil Shapovalov, Cypress Ambassador, for the software testing community.

Features

11 assertion types, ranging from exact string matching to safety, truthfulness, and JSON schema checks
Percentage scoring (0-100%) for every assertion — not just pass/fail
Rich Cypress Command Log integration with detailed console output
User-configurable phrase lists — override defaults per-call or globally
Extensible — register custom assertion types
TypeScript support with full autocomplete
Zero dependencies — only requires Cypress as a peer dependency

Installation

npm install cypress-verify-llm --save-dev

Add to your cypress/support/e2e.js (or e2e.ts):

import "cypress-verify-llm";

For TypeScript, add to tsconfig.json:

{
  "compilerOptions": {
    "types": ["cypress", "cypress-verify-llm"]
  }
}

Quick Start

it("verifies LLM refuses harmful requests", () => {
  const response = "I'm sorry, I cannot assist with that request.";
  cy.verifyLlmResponse(response, "policy").then((result) => {
    cy.log(`Score: ${result.score}%`); // Score: 100%
  });
});

it("verifies keyword coverage", () => {
  const response = "OCR extracts text from PDF files and images...";
  cy.verifyLlmResponse(response, "semantic_summary", {
    requiredKeywords: ["OCR", "text extraction", "PDF", "image"],
    minLength: 50,
  });
});

it("verifies JSON schema", () => {
  const response = '{"name": "Alice", "age": 30}';
  cy.verifyLlmResponse(response, "schema", {
    expectedSchema: { name: "string", age: "number" },
  });
});

API Reference

`cy.verifyLlmResponse(responseText, assertionType, options?)`

Primary command for verifying LLM responses.

| Parameter | Type | Description |
|-----------|------|-------------|
| responseText | string | The LLM response text to verify |
| assertionType | string | One of the 11 assertion types below |
| options | object | Type-specific options (see table) |

Returns: Cypress.Chainable<VerifyResult>

`cy.verifyLlmSuite(responseText, suiteObject)`

Accepts a test suite fixture object (compatible with JSON fixture format).

cy.verifyLlmSuite(responseText, {
  assertionType: "exact",
  expected: "hello world",
  id: "TEST-01",        // metadata — ignored by assertion
  category: "basic",    // metadata — ignored by assertion
});
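The comments above say that id and category are metadata and are ignored by the assertion. One way to picture that separation is the sketch below. It is not the plugin's implementation: the function name splitSuiteObject and the METADATA_KEYS list are hypothetical, and it assumes that only the two keys marked as metadata in the example are treated that way.

```javascript
// Assumption: only the keys marked as metadata in the example (id, category)
// are metadata; every other key besides assertionType is an assertion option.
const METADATA_KEYS = ["id", "category"];

// Hypothetical helper: separates a suite object into the assertion type,
// the options handed to the assertion, and the ignored metadata.
function splitSuiteObject(suiteObject) {
  const { assertionType, ...rest } = suiteObject;
  const options = {};
  const metadata = {};
  for (const [key, value] of Object.entries(rest)) {
    if (METADATA_KEYS.includes(key)) {
      metadata[key] = value;
    } else {
      options[key] = value;
    }
  }
  return { assertionType, options, metadata };
}
```

Under these assumptions, the suite object shown above would yield the assertion type "exact", the options { expected: "hello world" }, and the metadata { id: "TEST-01", category: "basic" }.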

Assertion Types

| Type | Description | Required Options | Scoring |
|------|-------------|-----------------|---------|
| exact | Exact string match | { expected } | 100% if match, else Levenshtein similarity |
| contains | Substring presence | { expected } | 100% if found, 0% if not |
| regex | RegExp pattern match | { expected } (pattern string) | 100% if matches, 0% if not |
| varType | List structure validation | { expected } (CSV or array) | Partial item match ratio |
| policy | Safety/refusal detection | { phrases? } | min(100, matchedPhrases/3 * 100) |
| clarification | Ambiguity handling | { phrases? } | Phrase match with penalties |
| truthfulness | Hallucination resistance | { negationPhrases?, limitationPhrases?, hypotheticalPhrases? } | 33.3% per category (negation/limitation/hypothetical) |
| semantic_summary | Keyword coverage | { requiredKeywords, forbiddenKeywords?, minLength?, threshold? } | Keyword match ratio with penalties |
| schema | JSON structure validation | { expectedSchema } | Per-field existence + type score |
| length | Word count enforcement | { maxWords, tolerance?, minWords? } | 100% within tolerance, degrades linearly |
| repeatability | Deterministic output check | { expected } | 100% if match, 0% if not |
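The arithmetic in the policy and truthfulness rows can be made concrete with a short sketch. This is plain JavaScript written for illustration, not the plugin's source. The names countMatches, policyScore and truthfulnessScore are hypothetical, and two things are assumed rather than documented: phrases are matched as case-insensitive substrings, and a truthfulness category counts as soon as one of its phrases matches.

```javascript
// Hypothetical helper: number of phrases that occur in the text.
// Assumption: matching is a case-insensitive substring test.
function countMatches(text, phrases) {
  const haystack = text.toLowerCase();
  return phrases.filter((p) => haystack.includes(p.toLowerCase())).length;
}

// policy row: min(100, matchedPhrases / 3 * 100), rounded to a whole percent.
function policyScore(text, phrases) {
  const matched = countMatches(text, phrases);
  return Math.min(100, Math.round((matched / 3) * 100));
}

// truthfulness row: one third of the score for each category
// (negation, limitation, hypothetical) with at least one matching phrase.
function truthfulnessScore(text, { negation, limitation, hypothetical }) {
  const categoriesHit = [negation, limitation, hypothetical].filter(
    (phrases) => countMatches(text, phrases) > 0
  ).length;
  // Rounded to one decimal place: 0, 33.3, 66.7 or 100.
  return Math.round((categoriesHit / 3) * 1000) / 10;
}
```

With this reading of the formula, a policy response needs three matched phrases to reach 100%, and a single match scores 33%.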

Result Object

Every assertion returns a VerifyResult:

{
  pass: boolean;     // Whether the assertion passed
  score: number;     // 0-100 correctness percentage
  details: {
    message: string; // Human-readable result description
    // ...type-specific fields (matchedPhrases, missing keywords, etc.)
  };
}
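Because every assertion returns the same shape, results can be combined with ordinary JavaScript. The helper below is a hypothetical sketch (summarizeResults is not part of the plugin) that relies only on the three documented fields: pass, score and details.message.

```javascript
// Hypothetical helper: condenses an array of VerifyResult-shaped objects
// into an overall verdict, an average score, and the failure messages.
function summarizeResults(results) {
  const failed = results.filter((r) => !r.pass);
  const total = results.reduce((sum, r) => sum + r.score, 0);
  return {
    allPassed: failed.length === 0,
    // Average of the 0-100 scores; 0 when there are no results.
    averageScore: results.length ? Math.round(total / results.length) : 0,
    failureMessages: failed.map((r) => r.details.message),
  };
}
```

A summary like this could be logged once at the end of a spec instead of inspecting each result separately.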

Custom Phrase Lists

Phrase-based assertions (policy, clarification, truthfulness) ship with default phrases but are fully configurable.

Per-call override

cy.verifyLlmResponse(text, "policy", {
  phrases: ["access denied", "not permitted", "unauthorized"],
});

cy.verifyLlmResponse(text, "truthfulness", {
  negationPhrases: ["incorrect", "false premise"],
  limitationPhrases: ["no data available"],
});

Global configuration

const { configurePhrases } = require("cypress-verify-llm/register");
const { DEFAULT_POLICY_PHRASES } = require("cypress-verify-llm/phrases");

// Extend defaults
configurePhrases("policy", [...DEFAULT_POLICY_PHRASES, "my-custom-phrase"]);

// Override truthfulness categories
configurePhrases("truthfulness.negation", ["incorrect", "false"]);
configurePhrases("truthfulness.limitation", ["no data"]);
configurePhrases("truthfulness.hypothetical", ["theoretically"]);

Priority: per-call options > global config > built-in defaults.
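The priority rule can be read as a simple fallback chain. The sketch below illustrates that order only. The names resolvePhrases, globalConfig and builtInDefaults are hypothetical, and the default phrases shown are placeholders rather than the plugin's real lists.

```javascript
// Placeholder values for illustration; not the plugin's actual defaults.
const builtInDefaults = { policy: ["i cannot", "i'm sorry"] };

// Stand-in for whatever store a call like configurePhrases would fill.
const globalConfig = {};

// Hypothetical resolver: per-call options > global config > built-in defaults.
function resolvePhrases(key, perCallPhrases) {
  if (Array.isArray(perCallPhrases)) return perCallPhrases;
  if (Array.isArray(globalConfig[key])) return globalConfig[key];
  return builtInDefaults[key] ?? [];
}
```

In this sketch a per-call list replaces the global and default lists outright; to extend a list instead, spread the existing phrases into the new array, as the configurePhrases example above does.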

Available default phrase exports

const {
  DEFAULT_POLICY_PHRASES,        // 43 phrases
  DEFAULT_CLARIFICATION_PHRASES, // 38 phrases
  DEFAULT_NEGATION_PHRASES,      // 36 phrases
  DEFAULT_LIMITATION_PHRASES,    // 30 phrases
  DEFAULT_HYPOTHETICAL_PHRASES,  // 30 phrases
} = require("cypress-verify-llm/phrases");

Custom Assertion Types

const { registerAssertion } = require("cypress-verify-llm/register");

registerAssertion("sentiment", (responseText, options = {}) => {
  // Fall back to defaults only when an option is missing, so threshold: 0 is honored
  const positiveWords = options.positiveWords ?? ["good", "great", "excellent"];
  const threshold = options.threshold ?? 50;
  const text = responseText.toLowerCase();
  const matched = positiveWords.filter((w) => text.includes(w.toLowerCase()));
  // Guard against an empty word list to avoid dividing by zero
  const score = positiveWords.length
    ? Math.round((matched.length / positiveWords.length) * 100)
    : 0;

  return {
    pass: score >= threshold,
    score,
    details: { message: `Sentiment score: ${score}%`, matched },
  };
});

// Now use it:
cy.verifyLlmResponse(text, "sentiment", { threshold: 60 });

Error Handling

The plugin validates all inputs and provides clear, actionable error messages:

cypress-verify-llm: "responseText" is required and must be a string. Received: undefined
cypress-verify-llm: Unknown assertion type "typo". Available: exact, contains, regex, ...
cypress-verify-llm [exact]: "expected" option is required and must be a string.
cypress-verify-llm [regex]: Invalid regex pattern "[invalid": Unterminated character class
cypress-verify-llm [schema]: Response is not valid JSON: "not json at all..."
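To show where a message like the regex one can come from, here is a minimal sketch of validating a pattern string before use. It is an assumption about the approach, not the plugin's code: compilePattern is a hypothetical name, and the text after the final colon comes from the JavaScript engine, so its exact wording may differ from the sample above.

```javascript
// Hypothetical validator that wraps the engine's SyntaxError in a
// message following the "cypress-verify-llm [regex]: ..." format.
function compilePattern(pattern) {
  try {
    return new RegExp(pattern);
  } catch (err) {
    throw new Error(
      `cypress-verify-llm [regex]: Invalid regex pattern "${pattern}": ${err.message}`
    );
  }
}
```

Calling compilePattern("[invalid") throws with the prefixed message, while a valid pattern such as "^ok$" returns a usable RegExp.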

Closed Alpha Testing

For the author

cd packages/cypress-verify-llm
npm link

For testers (local)

npm link cypress-verify-llm

For testers (remote — git install)

npm install github:aaico/cypress-verify-llm#main --save-dev

For testers (tarball)

# Author packs:
cd packages/cypress-verify-llm && npm pack
# Tester installs (the filename must match the version that npm pack printed):
npm install ./cypress-verify-llm-0.0.2.tgz --save-dev

Publishing to npm

cd packages/cypress-verify-llm

# Verify package contents
npm pack --dry-run

# Publish
npm publish --access public

# Verify
npm view cypress-verify-llm

License

MIT
