@akhilan-fluxon/llm-testrunner-components

v1.1.4

Published

6 months ago

A Stencil web component library for LLM test runner functionality

akhilan-fluxon

web-components stencil react typescript

LLM TestRunner Web Components

A Stencil web component library for testing LLM responses, with automated evaluation built in.

Overview

The LLM TestRunner is a tool for testing Large Language Model (LLM) responses against expected criteria. It provides a complete interface for:

Question Management: Add, edit, and organize test questions
AI Integration: Direct integration with Google's Gemini AI API
Automated Evaluation: Built-in evaluation engine that checks responses against expected keywords and source links
Batch Testing: Run multiple tests sequentially
Real-time Results: Live evaluation results with pass/fail indicators

Components

`<llm-test-runner>`

The main component that provides a complete LLM testing interface.

Features:

Question input with expected keywords and source links
Real-time AI response generation via Gemini API
Test case management (add, delete, run individual or all tests)
Built-in evaluation engine with keyword and source link matching
Error handling and loading states
Rate limiting for batch operations
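
The README does not say how batch runs are throttled. As a rough sketch of what "run sequentially with rate limiting" could look like, the helper below runs tasks one at a time and waits a fixed delay between them. The names `runSequentially`, `sleep`, and `delayMs` are hypothetical and do not come from the library.

```typescript
// Hypothetical helper, not part of the library's API.
type Task<T> = () => Promise<T>;

// Resolve after the given number of milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run tasks strictly one after another, pausing delayMs between
// consecutive tasks (assumed throttling strategy).
async function runSequentially<T>(
  tasks: Task<T>[],
  delayMs: number
): Promise<T[]> {
  const results: T[] = [];
  let first = true;
  for (const task of tasks) {
    if (!first && delayMs > 0) {
      await sleep(delayMs);
    }
    first = false;
    results.push(await task());
  }
  return results;
}
```

A fixed delay is the simplest option; a real implementation might instead back off when the API reports a quota error.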

Usage:

<llm-test-runner api-key="your-gemini-api-key-here"></llm-test-runner>

🎯 Usage Modes

1. Direct HTML Usage

Load the build scripts and place the element in your HTML:

<!DOCTYPE html>
<html>
<head>
  <script type="module" src="/build/llm-testrunner.esm.js"></script>
  <script nomodule src="/build/llm-testrunner.js"></script>
</head>
<body>
  <llm-test-runner api-key="your-gemini-api-key-here"></llm-test-runner>
</body>
</html>

2. Library Integration

Import as a module in your application:

import { LLMTestRunner } from '@akhilan-fluxon/llm-testrunner-components';

// The component is automatically registered and ready to use

Configuration

API Key Prop

The component takes the Gemini API key through its `api-key` attribute, which maps to the `apiKey` prop.

<llm-test-runner api-key="your-gemini-api-key-here"></llm-test-runner>
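
When the key is only known at runtime, the element can also be created from script. The following is a minimal sketch, assuming only the `llm-test-runner` tag and `api-key` attribute shown above; `createRunner` and the `ElementLike` type are hypothetical names introduced for illustration.

```typescript
// Minimal shape of an element that supports attributes
// (hypothetical type, used so the helper also works with a test double).
interface ElementLike {
  setAttribute(name: string, value: string): void;
}

// Create an <llm-test-runner> element and set its api-key attribute.
// Rejecting an empty key is an assumption, not documented behavior.
function createRunner<E extends ElementLike>(
  doc: { createElement(tagName: string): E },
  apiKey: string
): E {
  if (apiKey.trim() === "") {
    throw new Error("api-key must not be empty");
  }
  const el = doc.createElement("llm-test-runner");
  el.setAttribute("api-key", apiKey);
  return el;
}

// In a browser, usage might look like:
//   document.body.appendChild(createRunner(document, keyFromConfig));
```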

React/JSX Usage

function App() {
  return (
    <div>
      <llm-test-runner api-key="your-gemini-api-key-here" />
    </div>
  );
}

Evaluation Engine

The built-in evaluation engine provides:

Keyword Matching: Case-insensitive matching of expected keywords in AI responses
Source Link Validation: Checks for presence of expected URLs in responses
Pass/Fail Logic: Tests pass only when ALL expected items are found
Detailed Results: Shows which keywords and links were found/missing

Evaluation Criteria

Keywords: Must be present in the AI response (case-insensitive)
Source Links: Must be present as exact URL matches
Pass Condition: ALL expected keywords AND source links must be found
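
The criteria above can be sketched in a few lines. This is not the library's actual evaluation engine: `evaluateResponse`, `ItemMatch`, and `SketchResult` are hypothetical names, and "exact URL match" is assumed here to mean a case-sensitive substring match.

```typescript
// Hypothetical result shapes for this sketch only.
interface ItemMatch {
  expected: string;
  found: boolean;
}

interface SketchResult {
  passed: boolean;
  keywords: ItemMatch[];
  sourceLinks: ItemMatch[];
}

function evaluateResponse(
  response: string,
  expectedKeywords: string[],
  expectedSourceLinks: string[]
): SketchResult {
  const lowered = response.toLowerCase();

  // Keywords: case-insensitive substring match.
  const keywords = expectedKeywords.map((expected) => ({
    expected,
    found: lowered.includes(expected.toLowerCase()),
  }));

  // Source links: the URL must appear exactly as given (assumed case-sensitive).
  const sourceLinks = expectedSourceLinks.map((expected) => ({
    expected,
    found: response.includes(expected),
  }));

  // Pass only when every expected keyword AND every expected link was found.
  const passed = [...keywords, ...sourceLinks].every((m) => m.found);
  return { passed, keywords, sourceLinks };
}
```

Note that in this sketch a test with no expected keywords and no expected links passes vacuously; whether the library behaves the same way is not stated.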

Development

Building

npm run build

Testing

npm test

Development Server

npm start

Project Structure

src/
├── components/
│   └── llm-test-runner/          # Main component
│       ├── llm-test-runner.tsx   # Component logic
│       ├── llm-test-runner.css   # Styling
│       └── readme.md             # Component documentation
├── lib/
│   └── evaluation/               # Evaluation engine
│       ├── evaluation-engine.ts  # Core evaluation logic
│       ├── types.ts              # TypeScript interfaces
│       └── index.ts              # Exports
└── index.ts                      # Main library exports

Using in React Applications

Installation

npm install @akhilan-fluxon/llm-testrunner-components

Integration

import React, { useEffect } from 'react';
import { defineCustomElements } from '@akhilan-fluxon/llm-testrunner-components/loader';

function App() {
  useEffect(() => {
    defineCustomElements();
  }, []);

  return (
    <div>
      <h1>LLM Test Runner</h1>
      <llm-test-runner api-key="your-gemini-api-key-here"></llm-test-runner>
    </div>
  );
}

TypeScript Support

declare global {
  namespace JSX {
    interface IntrinsicElements {
      'llm-test-runner': any;
    }
  }
}

API Reference

Component Props

interface LLMTestRunnerProps {
  apiKey: string; // Required: Your Gemini API key
}

TestCase Interface

interface TestCase {
  id: string;
  question: string;
  expectedKeywords: string[];
  expectedSourceLinks: string[];
  output?: string;
  isRunning?: boolean;
  error?: string;
  evaluationResult?: EvaluationResult;
}
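
To illustrate how a `TestCase` might be built from user input, here is a small factory sketch. The `createTestCase` function, the `tc-N` id scheme, and the trimming of blank entries are assumptions for illustration; the README does not describe how the component creates test cases. Only the required fields and `output`/`error` are reproduced to keep the example short.

```typescript
// Trimmed copy of the TestCase interface above (runtime-only fields omitted).
interface TestCase {
  id: string;
  question: string;
  expectedKeywords: string[];
  expectedSourceLinks: string[];
  output?: string;
  error?: string;
}

// Hypothetical id counter; the real component may generate ids differently.
let nextId = 1;

// Build a TestCase, trimming whitespace and dropping blank entries.
function createTestCase(
  question: string,
  expectedKeywords: string[],
  expectedSourceLinks: string[] = []
): TestCase {
  const clean = (items: string[]): string[] =>
    items.map((s) => s.trim()).filter((s) => s.length > 0);

  return {
    id: `tc-${nextId++}`,
    question: question.trim(),
    expectedKeywords: clean(expectedKeywords),
    expectedSourceLinks: clean(expectedSourceLinks),
  };
}
```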

EvaluationResult Interface

interface EvaluationResult {
  testCaseId: string;
  passed: boolean;
  keywordMatches: KeywordMatch[];
  sourceLinkMatches: SourceLinkMatch[];
  timestamp?: string;
}
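
`KeywordMatch` and `SourceLinkMatch` are referenced above but not defined in this README. The sketch below assumes each carries the expected item plus a `found` flag (the field names `keyword`, `link`, and `found` are guesses) and shows one way a consumer might turn an `EvaluationResult` into a one-line summary. `summarize` is a hypothetical helper.

```typescript
// Assumed shapes: the README does not define these two types.
interface KeywordMatch {
  keyword: string;
  found: boolean;
}

interface SourceLinkMatch {
  link: string;
  found: boolean;
}

// Same fields as the EvaluationResult interface above.
interface EvaluationResult {
  testCaseId: string;
  passed: boolean;
  keywordMatches: KeywordMatch[];
  sourceLinkMatches: SourceLinkMatch[];
  timestamp?: string;
}

// Produce a short human-readable summary of one result.
function summarize(result: EvaluationResult): string {
  if (result.passed) {
    return `${result.testCaseId}: PASS`;
  }
  const missingKeywords = result.keywordMatches
    .filter((m) => !m.found)
    .map((m) => m.keyword);
  const missingLinks = result.sourceLinkMatches
    .filter((m) => !m.found)
    .map((m) => m.link);
  return (
    `${result.testCaseId}: FAIL ` +
    `(missing keywords: ${missingKeywords.join(", ") || "none"}; ` +
    `missing links: ${missingLinks.join(", ") || "none"})`
  );
}
```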