@llmdata/rubric

v0.1.0

Published

6 months ago

TypeScript/Node.js bindings for Rubric - LLM-based evaluation using weighted rubrics. High-performance Rust core with idiomatic TypeScript API.

Vulnerabilities: 0 High, 0 Medium, 0 Low

gbains

llm evaluation rubric grading ai machine-learning rust napi typescript openai anthropic language-models

About

This package provides TypeScript/Node.js bindings to the Rubric Rust core library via napi-rs, enabling high-performance LLM evaluation in JavaScript environments. The core evaluation logic is written in Rust for maximum performance, with idiomatic TypeScript bindings for ease of use.

Installation

npm install @llmdata/rubric

yarn add @llmdata/rubric

pnpm add @llmdata/rubric

bun add @llmdata/rubric

Quick Start

Set up environment variables:

export OPENAI_API_KEY=your_api_key_here
# Or any other model API key used in your generate function

Run the example below:

import { Rubric, PerCriterionGrader } from '@llmdata/rubric';
import OpenAI from 'openai';

// Declare custom generate function with any model and inference provider
async function generateWithOpenAI(systemPrompt: string, userPrompt: string): Promise<string> {
  const client = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY,
  });

  const response = await client.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: userPrompt },
    ],
    max_tokens: 400,
    temperature: 0.0,
  });

  return response.choices[0]?.message?.content || '';
}

async function main() {
  // Build rubric
  const rubric = Rubric.fromDict([
    { weight: 10.0, requirement: "States Q4 2023 base margin as 17.2%" },
    { weight: 8.0, requirement: "Explicitly uses Shapley attribution for decomposition" },
    { weight: -15.0, requirement: "Uses total deliveries instead of cash-only deliveries" }
  ]);

  // Select autograder strategy
  const grader = new PerCriterionGrader(
    generateWithOpenAI,
    "This overrides the default grader system prompt"
  );

  // Grade output
  const result = await rubric.grade(
    "Output to evaluate...",
    grader,
    "Input query..."
  );

  console.log(`Score: ${result.score.toFixed(2)}`);  // Score is 0.0-1.0
  
  if (result.report) {
    for (const criterion of result.report) {
      console.log(`  [${criterion.verdict}] ${criterion.requirement}`);
      console.log(`    → ${criterion.reason}`);
    }
  }
}

main().catch(console.error);

Autograder Strategies

PerCriterionGrader

Evaluates each criterion in its own inference call and runs those calls in parallel.

Scoring Formula:

For each criterion i:

If verdict = MET, contribution = w_i
If verdict = UNMET, contribution = 0

Final score:

score = max(0, min(1, Σ(verdict_i = MET ? w_i : 0) / Σ(max(0, w_i))))

Where:

w_i = weight of criterion i
Denominator = sum of positive weights only
Numerator = sum of weights for MET criteria, negative weights included (a MET criterion with a negative weight subtracts from the score)
Result clamped to [0, 1]
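
As a worked illustration of the formula above, here is a minimal TypeScript sketch that recomputes the score from a list of weights and verdicts. The helper name scoreFromVerdicts and the GradedCriterion type are hypothetical and not part of the package. Returning 0 when there are no positive weights is an assumption; the README does not say how the library handles that case.

```typescript
// Local shapes so the sketch runs without the package installed.
type Verdict = "MET" | "UNMET";
type GradedCriterion = { weight: number; verdict: Verdict };

// Hypothetical helper (not part of @llmdata/rubric): applies the documented formula.
function scoreFromVerdicts(criteria: GradedCriterion[]): number {
  // Denominator: sum of positive weights only.
  const positiveTotal = criteria.reduce((sum, c) => sum + Math.max(0, c.weight), 0);
  // Assumption: with no positive weight the score is taken to be 0.
  if (positiveTotal === 0) return 0;
  // Numerator: weights of MET criteria, negative weights included.
  const earned = criteria.reduce(
    (sum, c) => sum + (c.verdict === "MET" ? c.weight : 0),
    0
  );
  // Clamp the ratio to [0, 1].
  return Math.max(0, Math.min(1, earned / positiveTotal));
}

// Weights taken from the Quick Start rubric: 10, 8 and -15.
const firstOnly = scoreFromVerdicts([
  { weight: 10, verdict: "MET" },
  { weight: 8, verdict: "UNMET" },
  { weight: -15, verdict: "UNMET" },
]); // 10 / 18, about 0.556

const allMet = scoreFromVerdicts([
  { weight: 10, verdict: "MET" },
  { weight: 8, verdict: "MET" },
  { weight: -15, verdict: "MET" },
]); // (10 + 8 - 15) / 18, about 0.167

const penaltyDominates = scoreFromVerdicts([
  { weight: 10, verdict: "MET" },
  { weight: 8, verdict: "UNMET" },
  { weight: -15, verdict: "MET" },
]); // (10 - 15) / 18 is negative, so it clamps to 0

console.log(firstOnly.toFixed(3), allMet.toFixed(3), penaltyDominates);
```

The third case shows why the clamp matters: a single MET penalty criterion can outweigh the positive criteria and drive the score to 0.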

PerCriterionOneShotGrader (Coming Soon)

Makes a single inference call that evaluates all criteria together and returns structured output, unlike PerCriterionGrader, which makes one inference call per criterion (n calls for n criteria).

RubricAsJudgeGrader (Coming Soon)

Holistic evaluation where the model returns a final score directly.

API Reference

Rubric

class Rubric {
  constructor(criteria: CriterionInput[]);
  static fromDict(criteria: CriterionInput[]): Rubric;
  static fromJson(json: string): Rubric;
  static fromYaml(yaml: string): Rubric;
  static fromFile(path: string): Rubric;
  
  len(): number;
  isEmpty(): boolean;
  
  grade(
    toGrade: string,
    grader?: PerCriterionGrader,
    query?: string
  ): Promise<EvaluationReport>;
}

PerCriterionGrader

class PerCriterionGrader {
  constructor(
    generateFn?: GenerateFunction,
    systemPrompt?: string
  );
}

Types

type CriterionInput = {
  weight: number;
  requirement: string;
};

type CriterionReport = {
  weight: number;
  requirement: string;
  verdict: "MET" | "UNMET";
  reason: string;
};

type EvaluationReport = {
  score: number;
  report?: CriterionReport[];
};

type GenerateFunction = (
  systemPrompt: string,
  userPrompt: string
) => Promise<string> | string;
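
To show how these types might be consumed, the sketch below splits an EvaluationReport into the two kinds of findings that cost points: positive-weight criteria that were UNMET, and negative-weight criteria that were MET. The helper findProblems is hypothetical and not part of the package. The types are copied locally so the snippet runs on its own, and the sample report is made up for illustration.

```typescript
// Local copies of the documented types, so this sketch runs without the package.
type CriterionReport = {
  weight: number;
  requirement: string;
  verdict: "MET" | "UNMET";
  reason: string;
};

type EvaluationReport = {
  score: number;
  report?: CriterionReport[];
};

// Hypothetical helper (not part of @llmdata/rubric).
function findProblems(result: EvaluationReport): {
  missed: CriterionReport[];
  penalties: CriterionReport[];
} {
  // `report` is optional, so fall back to an empty list.
  const rows = result.report ?? [];
  return {
    // Positive-weight requirements the output failed to meet.
    missed: rows.filter((r) => r.weight > 0 && r.verdict === "UNMET"),
    // Negative-weight requirements the output triggered.
    penalties: rows.filter((r) => r.weight < 0 && r.verdict === "MET"),
  };
}

// Made-up report for illustration only; not real grader output.
const sample: EvaluationReport = {
  score: 0,
  report: [
    { weight: 10, requirement: "States Q4 2023 base margin as 17.2%", verdict: "MET", reason: "Margin stated." },
    { weight: 8, requirement: "Explicitly uses Shapley attribution for decomposition", verdict: "UNMET", reason: "No attribution method named." },
    { weight: -15, requirement: "Uses total deliveries instead of cash-only deliveries", verdict: "MET", reason: "Total deliveries used." },
  ],
};

const problems = findProblems(sample);
console.log("Missed:", problems.missed.map((r) => r.requirement));
console.log("Penalties:", problems.penalties.map((r) => r.requirement));
```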

Loading Rubrics

import { Rubric } from '@llmdata/rubric';

// Direct construction
const direct = new Rubric([
  { weight: 10.0, requirement: "States Q4 2023 base margin as 17.2%" },
  { weight: 8.0, requirement: "Explicitly uses Shapley attribution for decomposition" },
  { weight: -15.0, requirement: "Uses total deliveries instead of cash-only deliveries" }
]);

// From array of objects
const fromArray = Rubric.fromDict([
  { weight: 10.0, requirement: "States Q4 2023 base margin as 17.2%" },
  { weight: 8.0, requirement: "Explicitly uses Shapley attribution for decomposition" }
]);

// From JSON string (static method, as declared in the API Reference)
const fromJson = Rubric.fromJson('[{"weight": 10.0, "requirement": "Example requirement"}]');

// From YAML string
const yamlData = `
- weight: 10.0
  requirement: "Example requirement"
`;
const fromYaml = Rubric.fromYaml(yamlData);

// From files (JSON or YAML)
const fromJsonFile = Rubric.fromFile('rubric.json');
const fromYamlFile = Rubric.fromFile('rubric.yaml');

JSON Format

[
  {
    "weight": 10.0,
    "requirement": "States Q4 2023 base margin as 17.2%"
  },
  {
    "weight": 8.0,
    "requirement": "Explicitly uses Shapley attribution for decomposition"
  },
  {
    "weight": -15.0,
    "requirement": "Uses total deliveries instead of cash-only deliveries"
  }
]
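
Before handing a JSON string to the library, it can help to check that it matches the shape shown above. The following is a minimal sketch of such a pre-check. The helper parseCriteria is hypothetical, and its rules (finite numeric weight, non-empty requirement string) are assumptions chosen for illustration; the README does not document what validation the library itself performs.

```typescript
// Local copy of the documented input type.
type CriterionInput = {
  weight: number;
  requirement: string;
};

// Hypothetical helper (not part of @llmdata/rubric): parses and shape-checks rubric JSON.
function parseCriteria(json: string): CriterionInput[] {
  const data: unknown = JSON.parse(json);
  if (!Array.isArray(data)) {
    throw new Error("Rubric JSON must be an array of criteria");
  }
  return data.map((item: unknown, index: number) => {
    if (typeof item !== "object" || item === null) {
      throw new Error(`Criterion ${index} is not an object`);
    }
    const { weight, requirement } = item as Record<string, unknown>;
    // Assumption: weights must be finite numbers (negative values are allowed).
    if (typeof weight !== "number" || !Number.isFinite(weight)) {
      throw new Error(`Criterion ${index} needs a numeric "weight"`);
    }
    // Assumption: requirements must be non-empty strings.
    if (typeof requirement !== "string" || requirement.trim() === "") {
      throw new Error(`Criterion ${index} needs a non-empty "requirement"`);
    }
    return { weight, requirement };
  });
}

const checked = parseCriteria(`[
  { "weight": 10.0, "requirement": "States Q4 2023 base margin as 17.2%" },
  { "weight": 8.0, "requirement": "Explicitly uses Shapley attribution for decomposition" },
  { "weight": -15.0, "requirement": "Uses total deliveries instead of cash-only deliveries" }
]`);

console.log(`Parsed ${checked.length} criteria`);
```

The validated array could then be passed to Rubric.fromDict, which accepts CriterionInput[] according to the API Reference.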

YAML Format

- weight: 10.0
  requirement: "States Q4 2023 base margin as 17.2%"
- weight: 8.0
  requirement: "Explicitly uses Shapley attribution for decomposition"
- weight: -15.0
  requirement: "Uses total deliveries instead of cash-only deliveries"

Examples with Different Providers

OpenAI

import OpenAI from 'openai';

async function generateWithOpenAI(systemPrompt: string, userPrompt: string): Promise<string> {
  const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
  const response = await client.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: userPrompt },
    ],
    max_tokens: 400,
    temperature: 0.0,
  });
  return response.choices[0]?.message?.content || '';
}

Anthropic

import Anthropic from '@anthropic-ai/sdk';

async function generateWithAnthropic(systemPrompt: string, userPrompt: string): Promise<string> {
  const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
  const response = await client.messages.create({
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 400,
    system: systemPrompt,
    messages: [{ role: 'user', content: userPrompt }],
  });
  // Guard against an empty content array before reading the first block
  const first = response.content[0];
  return first?.type === 'text' ? first.text : '';
}

OpenRouter

import OpenAI from 'openai';

async function generateWithOpenRouter(systemPrompt: string, userPrompt: string): Promise<string> {
  const client = new OpenAI({
    baseURL: 'https://openrouter.ai/api/v1',
    apiKey: process.env.OPENROUTER_API_KEY,
  });
  const response = await client.chat.completions.create({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: userPrompt },
    ],
  });
  return response.choices[0]?.message?.content || '';
}

Local Models (Ollama)

import { Ollama } from 'ollama';

async function generateWithOllama(systemPrompt: string, userPrompt: string): Promise<string> {
  const ollama = new Ollama();
  const response = await ollama.chat({
    model: 'llama3.1',
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: userPrompt },
    ],
  });
  return response.message.content;
}
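
Because every provider above is wrapped in the same (systemPrompt, userPrompt) signature, cross-cutting behavior can be layered on without touching provider code. As one possible sketch, the hypothetical withRetry helper below (not part of the package) retries a generate function with exponential backoff. The default attempt count and delay are arbitrary illustrative values.

```typescript
// Local copy of the documented function type.
type GenerateFunction = (
  systemPrompt: string,
  userPrompt: string
) => Promise<string> | string;

// Hypothetical helper (not part of @llmdata/rubric): retries on any thrown error.
function withRetry(
  generate: GenerateFunction,
  maxAttempts: number = 3,
  baseDelayMs: number = 500
): GenerateFunction {
  const attempts = Math.max(1, maxAttempts); // always try at least once
  return async (systemPrompt: string, userPrompt: string): Promise<string> => {
    let lastError: unknown;
    for (let attempt = 1; attempt <= attempts; attempt++) {
      try {
        // `await` handles both sync and async generate functions.
        return await generate(systemPrompt, userPrompt);
      } catch (error) {
        lastError = error;
        if (attempt < attempts) {
          // Exponential backoff: baseDelayMs, then 2x, 4x, ...
          const delay = baseDelayMs * Math.pow(2, attempt - 1);
          await new Promise<void>((resolve) => setTimeout(resolve, delay));
        }
      }
    }
    // Every attempt failed: surface the last error to the caller.
    throw lastError;
  };
}
```

A wrapped function has the same type as the original, so something like new PerCriterionGrader(withRetry(generateWithOpenAI)) should fit the constructor shown in the API Reference.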

Requirements

Node.js 16+
TypeScript 5.0+ (optional, for TypeScript users)
An LLM API (e.g., OpenAI, Anthropic, OpenRouter, local models)

Platform Support

Pre-built binaries are available for:

macOS: x64, ARM64 (Apple Silicon)
Linux: x64, ARM64 (glibc and musl)
Windows: x64, ARM64

If a pre-built binary is not available for your platform, the package will compile from source during installation (requires Rust toolchain).

Building from Source

If you need to build from source:

# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Clone the repository
git clone https://github.com/The-LLM-Data-Company/rubric.git
cd rubric/bindings/node

# Install dependencies and build
npm install
npm run build

# Run tests
npm test

For detailed publishing instructions, see NPM_PUBLISHING.md.

Performance

The Rust core and its typed bindings provide the following benefits:

Fast evaluation: rubric scoring runs in native Rust code
Memory efficient: minimal memory overhead compared to pure JavaScript
Concurrent grading: multiple criteria are processed in parallel
Type safety: bundled TypeScript definitions cover the public API

Contributing

Contributions are welcome! Please see the main repository for contribution guidelines.

License

MIT License - see LICENSE file for details.

Related Projects

Python bindings: rubric on PyPI
Rust core: rubric-core

Support

GitHub Issues: Report bugs or request features
Documentation: Full documentation
NPM Package: @llmdata/rubric
