@vibeatlas/ship-reliability-sdk
v1.0.0
@ship-protocol/reliability-sdk
Domain-agnostic framework for scoring ANY AI agent's output reliability.
SHIP Protocol scores code today. This SDK generalizes the methodology to score any agent output: code agents, customer service agents, legal agents, and more — through pluggable domain adapters.
Install
npm install @ship-protocol/reliability-sdk
Python:
cd python && pip install -e .
Quick Start
3-Line LangChain Integration
import { ShipReliability } from '@ship-protocol/reliability-sdk';
const ship = new ShipReliability({ domain: 'code' });
chain = chain.pipe(ship.asLangChainMiddleware());
Direct Scoring
import { ShipReliability } from '@ship-protocol/reliability-sdk';
const ship = new ShipReliability({ domain: 'code' });
const score = await ship.score({
  input: 'Write a REST API',
  output: 'feat: add REST API with Express.js and error handling',
  metadata: { language: 'typescript', tool: 'claude' },
});
console.log(score);
// => { score: 73, grade: 'B', confidence: 0.82, factors: {...}, domain: 'code', ... }
Python
from ship_reliability import ShipReliability, ScoringInput
ship = ShipReliability(domain="code")
score = ship.score(ScoringInput(
    input="Write a REST API",
    output="feat: add REST API with Express.js",
    metadata={"language": "typescript", "tool": "claude"},
))
print(f"Score: {score.score}, Grade: {score.grade}")
Domains
code — Code Reliability (SHIP API)
Calls the live SHIP API /v2/score endpoint. Requires network access.
const ship = new ShipReliability({ domain: 'code' });
const score = await ship.score({
  input: 'Write authentication',
  output: 'feat: add JWT auth with bcrypt',
  metadata: { language: 'typescript', tool: 'claude', repo: 'owner/repo' },
});
Metadata fields:
- language: Programming language (default: 'typescript')
- tool: AI tool that generated the code (e.g. 'claude', 'copilot')
- repo: Repository in owner/repo format
- owner: Repository owner
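As a rough illustration of the metadata fields above, the following sketch fills in the documented default language and checks that repo follows the owner/repo format. The `CodeMetadata` type and the `normalizeCodeMetadata` helper are hypothetical names, and deriving `owner` from `repo` is an assumption rather than documented SDK behavior.

```typescript
// Hypothetical shape, inferred from the metadata fields listed above.
interface CodeMetadata {
  language?: string;
  tool?: string;
  repo?: string; // expected as "owner/repo"
  owner?: string;
}

interface NormalizedCodeMetadata {
  language: string;
  tool?: string;
  repo?: string;
  owner?: string;
}

// Hypothetical helper: applies the documented default language and
// (as an assumption) derives `owner` from `repo` when `owner` is missing.
function normalizeCodeMetadata(meta: CodeMetadata = {}): NormalizedCodeMetadata {
  const language = meta.language ?? 'typescript';
  let owner = meta.owner;
  if (meta.repo !== undefined) {
    const parts = meta.repo.split('/');
    if (parts.length !== 2 || parts[0] === '' || parts[1] === '') {
      throw new Error(`repo must be in owner/repo format, got "${meta.repo}"`);
    }
    owner = owner ?? parts[0];
  }
  return { language, tool: meta.tool, repo: meta.repo, owner };
}

console.log(normalizeCodeMetadata({ tool: 'claude', repo: 'owner/repo' }));
```

Validating the repo string before a network call keeps malformed input from costing a round trip to the API.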
customer-service — Customer Service Reliability
Heuristic scoring for customer service agent responses. Evaluates empathy, actionability, certainty, and channel fit.
const ship = new ShipReliability({ domain: 'customer-service' });
const score = await ship.score({
  input: 'My order arrived damaged',
  output: "I'm sorry about that. Here's how to get a replacement: go to Orders > Returns.",
  metadata: { channel: 'chat', category: 'returns' },
});
Factors scored: empathy, actionability, certainty, response_length, channel_fit
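The SDK's actual heuristics are not described here. To give a feel for what a heuristic factor can look like, this sketch scores two of the factors above with simple phrase matching. The phrase lists, the scaling, and the function names are all illustrative assumptions, not the SDK's implementation.

```typescript
// Illustrative phrase lists; NOT taken from the SDK.
const EMPATHY_PHRASES = ['sorry', 'apologize', 'understand', 'thank you'];
const ACTION_PHRASES = ['go to', 'click', "here's how", 'follow these steps', 'you can'];

// Returns 0-100: each matched phrase adds 50 points, capped at 100.
function phraseScore(text: string, phrases: string[]): number {
  const lower = text.toLowerCase();
  const hits = phrases.filter((p) => lower.includes(p)).length;
  return Math.min(100, hits * 50);
}

// Hypothetical scorer covering only two of the listed factors.
function scoreCustomerReply(output: string): Record<string, number> {
  return {
    empathy: phraseScore(output, EMPATHY_PHRASES),
    actionability: phraseScore(output, ACTION_PHRASES),
  };
}

const replyFactors = scoreCustomerReply(
  "I'm sorry about that. Here's how to get a replacement: go to Orders > Returns."
);
console.log(replyFactors); // { empathy: 50, actionability: 100 }
```

Phrase matching is cheap and works offline, but it rewards wording rather than substance, which is one reason a score like this is usually reported alongside a confidence value.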
general — General Agent Reliability
Heuristic scoring for any text agent output. Works offline without API calls.
const ship = new ShipReliability({ domain: 'general' });
const score = await ship.score({
  input: 'Explain closures in JavaScript',
  output: 'A closure is a function that...',
});
Factors scored: completeness, coherence, relevance, specificity, consistency
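How the individual factors combine into the overall score is not stated here. One plausible approach, shown purely as a sketch, is a weighted mean clamped to 0-100. The weights and the `aggregateFactors` helper are hypothetical.

```typescript
// Hypothetical weights over the factors listed above (they sum to 1).
const GENERAL_WEIGHTS: Record<string, number> = {
  completeness: 0.25,
  coherence: 0.2,
  relevance: 0.25,
  specificity: 0.15,
  consistency: 0.15,
};

// Weighted mean of factor scores (each 0-100), rounded and clamped to 0-100.
// Factors missing from `factors` are skipped and the weights renormalized.
function aggregateFactors(
  factors: Record<string, number>,
  weights: Record<string, number>
): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const key of Object.keys(weights)) {
    if (!(key in factors)) continue;
    weighted += factors[key] * weights[key];
    totalWeight += weights[key];
  }
  if (totalWeight === 0) return 0;
  return Math.max(0, Math.min(100, Math.round(weighted / totalWeight)));
}

const overall = aggregateFactors(
  { completeness: 80, coherence: 70, relevance: 90, specificity: 60, consistency: 50 },
  GENERAL_WEIGHTS
);
console.log(overall); // 73
```

Renormalizing over the factors that are present keeps the result on the same 0-100 scale when an adapter reports only a subset.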
Custom Domain Adapters
Implement DomainAdapter to add scoring for any agent type:
import { ShipReliability, scoreToGrade } from '@ship-protocol/reliability-sdk';
import type { DomainAdapter, ScoringInput, ScoringConfig, ReliabilityScore } from '@ship-protocol/reliability-sdk';
class LegalDomainAdapter implements DomainAdapter {
  readonly domain = 'legal';
  readonly name = 'Legal Document Reliability';

  validate(input: ScoringInput) {
    return { valid: !!input.output, errors: input.output ? [] : ['output required'] };
  }

  async score(input: ScoringInput, config: ScoringConfig): Promise<ReliabilityScore> {
    // Your domain-specific scoring logic here
    const score = 75;
    return {
      score,
      grade: scoreToGrade(score),
      confidence: 0.6,
      factors: { precision: 80, structure: 70 },
      domain: this.domain,
      timestamp: new Date().toISOString(),
      modelVersion: 'legal-v1',
      recommendations: [],
    };
  }
}
const ship = new ShipReliability({ adapter: new LegalDomainAdapter() });
Integrations
LangChain
import { ShipReliability } from '@ship-protocol/reliability-sdk';
const ship = new ShipReliability({ domain: 'code' });
// Option 1: Pipe middleware
chain = chain.pipe(ship.asLangChainMiddleware());
// Option 2: Standalone middleware
import { createLangChainMiddleware } from '@ship-protocol/reliability-sdk';
const middleware = createLangChainMiddleware({ domain: 'general' });
// Option 3: Callback handler
import { ShipReliabilityCallback } from '@ship-protocol/reliability-sdk';
const callback = new ShipReliabilityCallback({ domain: 'code' });
await callback.onChainEnd(chainOutput);
console.log(callback.getAverageScore());
Python LangChain:
from ship_reliability import ShipReliability
from langchain_core.runnables import RunnableLambda
ship = ShipReliability(domain="code")
chain = my_chain | RunnableLambda(ship.as_langchain_middleware())
OpenAI
import { wrapOpenAIChat } from '@ship-protocol/reliability-sdk';
import OpenAI from 'openai';
const openai = new OpenAI();
const scoredChat = wrapOpenAIChat(
  openai.chat.completions.create.bind(openai.chat.completions),
  { domain: 'general' }
);
const { result, reliability } = await scoredChat({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Write a function' }],
});
console.log(`Score: ${reliability.score}`);
Anthropic
import { wrapAnthropicMessages } from '@ship-protocol/reliability-sdk';
import Anthropic from '@anthropic-ai/sdk';
const anthropic = new Anthropic();
const scoredMessages = wrapAnthropicMessages(
  anthropic.messages.create.bind(anthropic.messages),
  { domain: 'general' }
);
const { result, reliability } = await scoredMessages({
  model: 'claude-sonnet-4-6',
  messages: [{ role: 'user', content: 'Write a function' }],
  max_tokens: 1024,
});
console.log(`Score: ${reliability.score}`);
ShipClient (Low-Level API Client)
Direct access to the SHIP Protocol API:
import { ShipClient } from '@ship-protocol/reliability-sdk';
const client = new ShipClient({
  baseUrl: 'https://ship-protocol.dhruvaapi.workers.dev', // default
  timeout: 10000,
  retries: 2,
});
// Score a commit
const score = await client.score({
  commit_message: 'feat: add auth',
  language: 'typescript',
  tool: 'claude',
});
// Detect AI-generated code
const detection = await client.detect({
  code: 'function foo() { return bar; }',
  language: 'javascript',
});
// Get tool comparison data, reliability rankings, and system health
const tools = await client.tools();
const leaderboard = await client.leaderboard();
const health = await client.health();
API Reference
ShipReliability
Main class for domain-agnostic reliability scoring.
Constructor
new ShipReliability(config?: ShipReliabilityConfig)
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| domain | string | 'general' | Built-in domain: 'code', 'customer-service', 'general' |
| adapter | DomainAdapter | — | Custom domain adapter (overrides domain) |
| apiUrl | string | SHIP API URL | Base URL for API calls |
| timeout | number | 10000 | Request timeout in ms |
| retries | number | 2 | Number of retries on failure |
| cache | boolean | true | Enable LRU score caching |
| cacheSize | number | 100 | Max cached entries |
| cacheTtl | number | 300000 | Cache TTL in ms (5 min) |
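The cache, cacheSize, and cacheTtl options describe an LRU cache with a time-to-live. The SDK's internal cache is not shown here, so the following is only a minimal sketch of the technique. The `LruTtlCache` class, its string keys, and the injectable clock are assumptions made for illustration.

```typescript
// Minimal LRU cache with per-entry TTL. A Map iterates in insertion order,
// so its first key is always the least recently used entry.
class LruTtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(
    private maxSize = 100, // mirrors the documented cacheSize default
    private ttlMs = 300000, // mirrors the documented cacheTtl default
    private now: () => number = Date.now // injectable clock, handy in tests
  ) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired
      return undefined;
    }
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    if (this.entries.size >= this.maxSize) {
      // Evict the least recently used entry.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  clear(): void {
    this.entries.clear();
  }
}
```

A larger cacheSize trades memory for fewer repeated API calls, while a shorter cacheTtl limits how long a stale score can be served.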
Methods
| Method | Returns | Description |
|--------|---------|-------------|
| score(input) | Promise<ReliabilityScore> | Score an agent output |
| asLangChainMiddleware() | Function | Get a LangChain-compatible middleware |
| wrapOpenAI(fn) | Function | Wrap an OpenAI completion call |
| wrapAnthropic(fn) | Function | Wrap an Anthropic message call |
| clearCache() | void | Clear the score cache |
| domain (property) | string | Current domain name |
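The constructor options state that adapter overrides domain and that domain defaults to 'general'. That precedence can be sketched as below. The `AdapterLike` and `ConfigLike` types, the `BUILT_IN` registry, and the error thrown for an unknown domain are hypothetical stand-ins, not the SDK's own code.

```typescript
// Trimmed local stand-ins for the SDK types (only what the sketch needs).
interface AdapterLike {
  readonly domain: string;
  readonly name: string;
}

interface ConfigLike {
  domain?: string;
  adapter?: AdapterLike;
}

// Hypothetical registry keyed by the documented built-in domain names.
const BUILT_IN: Record<string, AdapterLike> = {
  code: { domain: 'code', name: 'Code Reliability' },
  'customer-service': { domain: 'customer-service', name: 'Customer Service Reliability' },
  general: { domain: 'general', name: 'General Agent Reliability' },
};

// A custom adapter wins; otherwise fall back to the named built-in domain.
function resolveAdapter(config: ConfigLike = {}): AdapterLike {
  if (config.adapter) return config.adapter;
  const domain = config.domain ?? 'general';
  const adapter = BUILT_IN[domain];
  // Assumption: unknown domains are rejected rather than silently defaulted.
  if (!adapter) throw new Error(`Unknown domain: ${domain}`);
  return adapter;
}

console.log(resolveAdapter().domain); // general
console.log(resolveAdapter({ domain: 'code' }).domain); // code
```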
ReliabilityScore
Universal score type returned by all adapters.
interface ReliabilityScore {
  score: number; // 0-100
  grade: string; // A+, A, B, C, D, F
  confidence: number; // 0-1
  factors: Record<string, number>;
  domain: string;
  timestamp: string; // ISO 8601
  modelVersion: string;
  recommendations: string[];
}
DomainAdapter
Plugin interface for adding new agent types.
interface DomainAdapter {
  readonly domain: string;
  readonly name: string;
  score(input: ScoringInput, config: ScoringConfig): Promise<ReliabilityScore>;
  validate(input: ScoringInput): { valid: boolean; errors: string[] };
}
ShipClient
Low-level HTTP client for the SHIP Protocol API.
| Method | Returns | Description |
|--------|---------|-------------|
| score(req) | Promise<ScoreResponse> | POST /v2/score |
| detect(req) | Promise<DetectResponse> | POST /v2/detect |
| tools() | Promise<ToolsResponse> | GET /v2/tools |
| leaderboard() | Promise<LeaderboardResponse> | GET /v2/leaderboard |
| health() | Promise<HealthResponse> | GET /v2/crawler-status |
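For readers who want to call the API without the client, this sketch builds the request for POST /v2/score from the base URL and the body fields used in the ShipClient example. The `buildScoreRequest` helper and the `ScoreRequest` type are hypothetical, and the JSON content type is an assumption, since the wire format is not stated here.

```typescript
// Body fields taken from the ShipClient example; the type name is hypothetical.
interface ScoreRequest {
  commit_message: string;
  language?: string;
  tool?: string;
}

const DEFAULT_BASE_URL = 'https://ship-protocol.dhruvaapi.workers.dev';

// Builds the URL and fetch options for POST /v2/score.
// Assumption: the endpoint accepts a JSON body.
function buildScoreRequest(
  body: ScoreRequest,
  baseUrl: string = DEFAULT_BASE_URL
): { url: string; init: { method: string; headers: Record<string, string>; body: string } } {
  return {
    url: new URL('/v2/score', baseUrl).toString(),
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    },
  };
}

const scoreRequest = buildScoreRequest({
  commit_message: 'feat: add auth',
  language: 'typescript',
  tool: 'claude',
});
console.log(scoreRequest.url);
// With native fetch (Node.js 18+):
// const response = await fetch(scoreRequest.url, scoreRequest.init);
```

The response shape is not documented in this section, so the sketch stops at building the request.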
Configuration
Environment
The SDK works with Node.js 18+ (uses native fetch). No external HTTP dependencies.
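Because the SDK builds on native fetch, its timeout and retries options can be pictured with AbortController. The helper below is a sketch under that assumption: `withTimeoutAndRetries` is a hypothetical name, and whether the SDK retries immediately or backs off between attempts is not stated here (this version retries immediately).

```typescript
// Runs `attempt` up to `retries + 1` times. Each attempt receives an
// AbortSignal that fires after `timeoutMs`. Rethrows the last error
// when every attempt fails.
async function withTimeoutAndRetries<T>(
  attempt: (signal: AbortSignal) => Promise<T>,
  timeoutMs = 10000, // mirrors the documented timeout default
  retries = 2 // mirrors the documented retries default
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i <= retries; i++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      return await attempt(controller.signal);
    } catch (err) {
      lastError = err; // remember the failure and try again
    } finally {
      clearTimeout(timer); // never leave a pending timer behind
    }
  }
  throw lastError;
}

// Usage with native fetch (Node.js 18+):
// const res = await withTimeoutAndRetries((signal) =>
//   fetch('https://ship-protocol.dhruvaapi.workers.dev/v2/crawler-status', { signal })
// );
```

Passing the signal into fetch is what makes the timeout effective: aborting rejects the pending request, which the loop then treats like any other failed attempt.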
Caching
Built-in LRU cache prevents redundant API calls. Disable with cache: false or customize:
const ship = new ShipReliability({
  domain: 'code',
  cache: true,
  cacheSize: 200, // Max entries
  cacheTtl: 600000, // 10 min TTL
});
Testing
# TypeScript
npm test # Run all tests
npx tsc --noEmit # Type check
# Python
cd python && python -m pytest tests/ -v
Examples
See the examples/ directory:
- score-code.ts: Score AI-generated code via SHIP API
- langchain-basic.ts: 3-line LangChain integration
- custom-domain.ts: Create a custom domain adapter
- langchain-basic.py: Python LangChain integration
Run with: npx tsx examples/score-code.ts
SHIP Protocol API
The code domain adapter calls /v2/score. ShipClient exposes the full set of endpoints:
| Endpoint | Method | Description |
|----------|--------|-------------|
| /v2/score | POST | Score a commit's reliability |
| /v2/detect | POST | Detect AI-generated code |
| /v2/tools | GET | AI tool comparison data |
| /v2/leaderboard | GET | Tool reliability rankings |
| /v2/crawler-status | GET | System health |
Base URL: https://ship-protocol.dhruvaapi.workers.dev
No auth required. Rate limit: 100 req/min.
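To stay under the 100 req/min limit, a caller can throttle on the client side. The sketch below uses a sliding window. `SlidingWindowLimiter` is a hypothetical helper, and how the server actually counts requests (fixed or sliding window) is not stated here.

```typescript
// Sliding-window limiter: allows at most `limit` calls per `windowMs`.
class SlidingWindowLimiter {
  private stamps: number[] = [];

  constructor(
    private limit = 100, // documented rate limit
    private windowMs = 60000, // one minute
    private now: () => number = Date.now // injectable clock, handy in tests
  ) {}

  // Returns true and records the call if it fits in the window, else false.
  tryAcquire(): boolean {
    const t = this.now();
    // Drop timestamps that have left the window.
    while (this.stamps.length > 0 && this.stamps[0] <= t - this.windowMs) {
      this.stamps.shift();
    }
    if (this.stamps.length >= this.limit) return false;
    this.stamps.push(t);
    return true;
  }
}

const limiter = new SlidingWindowLimiter();
if (limiter.tryAcquire()) {
  // Safe to send one request here.
}
```

When score caching is enabled, repeated inputs are served from the cache and never reach the limiter's budget.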
License
MIT
