@llmdata/rubric
v0.1.0
TypeScript/Node.js bindings for Rubric - LLM-based evaluation using weighted rubrics. High-performance Rust core with idiomatic TypeScript API.
About
This package provides TypeScript/Node.js bindings to the Rubric Rust core library via napi-rs. The evaluation logic runs in Rust, and the bindings expose it to JavaScript environments through an idiomatic TypeScript API.
Installation
npm install @llmdata/rubric
yarn add @llmdata/rubric
pnpm add @llmdata/rubric
bun add @llmdata/rubric
Quick Start
- Set up environment variables:
export OPENAI_API_KEY=your_api_key_here
# Or any other model API key used in your generate function
- Run the example below:
import { Rubric, PerCriterionGrader } from '@llmdata/rubric';
import OpenAI from 'openai';

// Declare custom generate function with any model and inference provider
async function generateWithOpenAI(systemPrompt: string, userPrompt: string): Promise<string> {
  const client = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY,
  });
  const response = await client.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: userPrompt },
    ],
    max_tokens: 400,
    temperature: 0.0,
  });
  return response.choices[0]?.message?.content || '';
}

async function main() {
  // Build rubric
  const rubric = Rubric.fromDict([
    { weight: 10.0, requirement: "States Q4 2023 base margin as 17.2%" },
    { weight: 8.0, requirement: "Explicitly uses Shapley attribution for decomposition" },
    { weight: -15.0, requirement: "Uses total deliveries instead of cash-only deliveries" }
  ]);

  // Select autograder strategy
  const grader = new PerCriterionGrader(
    generateWithOpenAI,
    "This overrides the default grader system prompt"
  );

  // Grade output
  const result = await rubric.grade(
    "Output to evaluate...",
    grader,
    "Input query..."
  );

  console.log(`Score: ${result.score.toFixed(2)}`); // Score is 0.0-1.0
  if (result.report) {
    for (const criterion of result.report) {
      console.log(` [${criterion.verdict}] ${criterion.requirement}`);
      console.log(` → ${criterion.reason}`);
    }
  }
}

main().catch(console.error);
Autograder Strategies
PerCriterionGrader
Evaluates each criterion with its own inference call; the calls run in parallel.
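To make the fan-out concrete, here is a conceptual sketch of one call per criterion, all started before any is awaited. It is not the Rust core's implementation: judgeInParallel and the JudgeFn callback are hypothetical names introduced for illustration, and how a verdict is extracted from the model's reply is left to the callback.

```typescript
type Verdict = "MET" | "UNMET";
type Criterion = { weight: number; requirement: string };
type JudgedCriterion = Criterion & { verdict: Verdict };

// Hypothetical callback: performs one inference call for one criterion.
type JudgeFn = (criterion: Criterion) => Promise<Verdict>;

async function judgeInParallel(
  criteria: Criterion[],
  judge: JudgeFn
): Promise<JudgedCriterion[]> {
  // Start every call before awaiting any of them, so the requests overlap.
  const pending = criteria.map(async (criterion) => ({
    ...criterion,
    verdict: await judge(criterion),
  }));
  // Promise.all keeps the input order regardless of completion order.
  return Promise.all(pending);
}
```

With n criteria this issues n calls, so wall-clock time is roughly that of the slowest call rather than the sum of all calls.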
Scoring Formula:
For each criterion i:
- If verdict = MET, contribution = w_i
- If verdict = UNMET, contribution = 0
Final score:
score = max(0, min(1, Σ(verdict_i = MET ? w_i : 0) / Σ(max(0, w_i))))
Where:
- w_i = weight of criterion i
- Denominator = sum of positive weights only
- Numerator = sum of weights for MET criteria
- Result clamped to [0, 1]
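The formula can be checked by hand with a short sketch. The function below is an illustration written from the formula above, not the library's code: rubricScore and ScoredCriterion are hypothetical names, and returning 0 when there are no positive weights is an assumption, since the formula leaves that case undefined.

```typescript
type Verdict = "MET" | "UNMET";
type ScoredCriterion = { weight: number; verdict: Verdict };

function rubricScore(criteria: ScoredCriterion[]): number {
  // Denominator: sum of positive weights only.
  const positiveTotal = criteria.reduce((sum, c) => sum + Math.max(0, c.weight), 0);
  // Assumption: avoid dividing by zero by scoring such a rubric as 0.
  if (positiveTotal === 0) return 0;
  // Numerator: weights of MET criteria, including negative ones.
  const earned = criteria.reduce(
    (sum, c) => sum + (c.verdict === "MET" ? c.weight : 0),
    0
  );
  // Clamp to [0, 1].
  return Math.max(0, Math.min(1, earned / positiveTotal));
}

// Weights from the Quick Start rubric: 10, 8, and -15.
const penaltyTriggered = rubricScore([
  { weight: 10, verdict: "MET" },
  { weight: 8, verdict: "MET" },
  { weight: -15, verdict: "MET" },
]);
console.log(penaltyTriggered.toFixed(2)); // (10 + 8 - 15) / 18 → "0.17"
```

If the negative criterion is UNMET instead, the same rubric scores 18 / 18 = 1. If only the negative criterion is MET, the raw ratio is -15 / 18 and the clamp raises it to 0.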
PerCriterionOneShotGrader (Coming Soon)
Makes a single inference call that evaluates all criteria together and returns structured output, unlike PerCriterionGrader, which makes one call per criterion (n calls for n criteria).
RubricAsJudgeGrader (Coming Soon)
Holistic evaluation where the model returns a final score directly.
API Reference
Rubric
class Rubric {
  constructor(criteria: CriterionInput[]);
  static fromDict(criteria: CriterionInput[]): Rubric;
  static fromJson(json: string): Rubric;
  static fromYaml(yaml: string): Rubric;
  static fromFile(path: string): Rubric;
  len(): number;
  isEmpty(): boolean;
  grade(
    toGrade: string,
    grader?: PerCriterionGrader,
    query?: string
  ): Promise<EvaluationReport>;
}
PerCriterionGrader
class PerCriterionGrader {
  constructor(
    generateFn?: GenerateFunction,
    systemPrompt?: string
  );
}
Types
type CriterionInput = {
  weight: number;
  requirement: string;
};

type CriterionReport = {
  weight: number;
  requirement: string;
  verdict: "MET" | "UNMET";
  reason: string;
};

type EvaluationReport = {
  score: number;
  report?: CriterionReport[];
};

type GenerateFunction = (
  systemPrompt: string,
  userPrompt: string
) => Promise<string> | string;
Loading Rubrics
import { Rubric } from '@llmdata/rubric';

// Direct construction
const direct = new Rubric([
  { weight: 10.0, requirement: "States Q4 2023 base margin as 17.2%" },
  { weight: 8.0, requirement: "Explicitly uses Shapley attribution for decomposition" },
  { weight: -15.0, requirement: "Uses total deliveries instead of cash-only deliveries" }
]);

// From array of objects
const fromObjects = Rubric.fromDict([
  { weight: 10.0, requirement: "States Q4 2023 base margin as 17.2%" },
  { weight: 8.0, requirement: "Explicitly uses Shapley attribution for decomposition" }
]);

// From JSON string
const fromJsonString = Rubric.fromJson('[{"weight": 10.0, "requirement": "Example requirement"}]');

// From YAML string
const yamlData = `
- weight: 10.0
  requirement: "Example requirement"
`;
const fromYamlString = Rubric.fromYaml(yamlData);

// From files (static loaders as listed in the API Reference)
const fromJsonFile = Rubric.fromFile('rubric.json');
const fromYamlFile = Rubric.fromFile('rubric.yaml');
JSON Format
[
  {
    "weight": 10.0,
    "requirement": "States Q4 2023 base margin as 17.2%"
  },
  {
    "weight": 8.0,
    "requirement": "Explicitly uses Shapley attribution for decomposition"
  },
  {
    "weight": -15.0,
    "requirement": "Uses total deliveries instead of cash-only deliveries"
  }
]
YAML Format
- weight: 10.0
  requirement: "States Q4 2023 base margin as 17.2%"
- weight: 8.0
  requirement: "Explicitly uses Shapley attribution for decomposition"
- weight: -15.0
  requirement: "Uses total deliveries instead of cash-only deliveries"
Examples with Different Providers
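Every provider function in this section takes (systemPrompt, userPrompt) and returns a string, so cross-cutting behavior such as retries can be layered on once with a wrapper. The sketch below is an illustration and not part of @llmdata/rubric: withRetry, maxAttempts, and baseDelayMs are hypothetical names, and the exponential backoff schedule is an assumption.

```typescript
type GenerateFunction = (
  systemPrompt: string,
  userPrompt: string
) => Promise<string> | string;

// Hypothetical helper: retries a generate function on any thrown error.
// maxAttempts is expected to be at least 1.
function withRetry(
  generate: GenerateFunction,
  maxAttempts = 3,
  baseDelayMs = 500
): GenerateFunction {
  return async (systemPrompt, userPrompt) => {
    let lastError: unknown;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return await generate(systemPrompt, userPrompt);
      } catch (error) {
        lastError = error;
        if (attempt < maxAttempts) {
          // Exponential backoff: baseDelayMs, then 2x, 4x, ...
          const delayMs = baseDelayMs * 2 ** (attempt - 1);
          await new Promise((resolve) => setTimeout(resolve, delayMs));
        }
      }
    }
    // All attempts failed: surface the last error to the caller.
    throw lastError;
  };
}
```

Because the wrapper returns the same function shape, the result can be passed wherever a generate function is accepted, for example as the first argument to the PerCriterionGrader constructor.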
OpenAI
import OpenAI from 'openai';

async function generateWithOpenAI(systemPrompt: string, userPrompt: string): Promise<string> {
  const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
  const response = await client.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: userPrompt },
    ],
    max_tokens: 400,
    temperature: 0.0,
  });
  return response.choices[0]?.message?.content || '';
}
Anthropic
import Anthropic from '@anthropic-ai/sdk';

async function generateWithAnthropic(systemPrompt: string, userPrompt: string): Promise<string> {
  const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
  const response = await client.messages.create({
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 400,
    system: systemPrompt,
    messages: [{ role: 'user', content: userPrompt }],
  });
  // Guard against an empty content array before reading the first block
  const block = response.content[0];
  return block?.type === 'text' ? block.text : '';
}
OpenRouter
import OpenAI from 'openai';

async function generateWithOpenRouter(systemPrompt: string, userPrompt: string): Promise<string> {
  const client = new OpenAI({
    baseURL: 'https://openrouter.ai/api/v1',
    apiKey: process.env.OPENROUTER_API_KEY,
  });
  const response = await client.chat.completions.create({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: userPrompt },
    ],
  });
  return response.choices[0]?.message?.content || '';
}
Local Models (Ollama)
import { Ollama } from 'ollama';

async function generateWithOllama(systemPrompt: string, userPrompt: string): Promise<string> {
  const ollama = new Ollama();
  const response = await ollama.chat({
    model: 'llama3.1',
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: userPrompt },
    ],
  });
  return response.message.content;
}
Requirements
- Node.js 16+
- TypeScript 5.0+ (optional, for TypeScript users)
- An LLM API (e.g., OpenAI, Anthropic, OpenRouter, local models)
Platform Support
Pre-built binaries are available for:
- macOS: x64, ARM64 (Apple Silicon)
- Linux: x64, ARM64 (glibc and musl)
- Windows: x64, ARM64
If a pre-built binary is not available for your platform, the package will compile from source during installation (requires Rust toolchain).
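To check ahead of time whether an install is likely to need the Rust toolchain, the platform list above can be encoded as a small lookup. This is a sketch transcribed from that list, not the package's actual loader logic: hasPrebuiltBinary and PREBUILT_TARGETS are hypothetical names, and the glibc/musl distinction on Linux is not modeled.

```typescript
// Keys use Node.js process.platform values; entries use process.arch values.
const PREBUILT_TARGETS = new Map<string, readonly string[]>([
  ["darwin", ["x64", "arm64"]], // macOS
  ["linux", ["x64", "arm64"]],  // glibc and musl are not distinguished here
  ["win32", ["x64", "arm64"]],  // Windows
]);

function hasPrebuiltBinary(platform: string, arch: string): boolean {
  return PREBUILT_TARGETS.get(platform)?.includes(arch) ?? false;
}

// In Node.js, call it as: hasPrebuiltBinary(process.platform, process.arch)
console.log(hasPrebuiltBinary("linux", "arm64")); // true
console.log(hasPrebuiltBinary("freebsd", "x64")); // false
```

A false result suggests the install will fall back to compiling from source, so the Rust toolchain should be installed first.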
Building from Source
If you need to build from source:
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Clone the repository
git clone https://github.com/The-LLM-Data-Company/rubric.git
cd rubric/bindings/node
# Install dependencies and build
npm install
npm run build
# Run tests
npm test
For detailed publishing instructions, see NPM_PUBLISHING.md.
Performance
The Rust core and the typed bindings provide the following benefits:
- Fast evaluation: Native Rust performance for rubric scoring
- Memory efficient: Minimal memory overhead compared to pure JavaScript
- Concurrent grading: Efficient parallel processing of multiple criteria
- Type safety: TypeScript definitions provide full type safety
Contributing
Contributions are welcome! Please see the main repository for contribution guidelines.
License
MIT License - see LICENSE file for details.
Related Projects
- Python bindings: rubric on PyPI
- Rust core: rubric-core
Support
- GitHub Issues: Report bugs or request features
- Documentation: Full documentation
- NPM Package: @llmdata/rubric
