@vibeatlas/ship-reliability-sdk

v1.0.0

Published

3 months ago

Domain-agnostic framework for scoring ANY AI agent's output reliability


vibeatlasadmin

ship-protocol reliability ai-agent scoring langchain openai anthropic code-quality

@ship-protocol/reliability-sdk

Domain-agnostic framework for scoring ANY AI agent's output reliability.

SHIP Protocol scores code today. This SDK generalizes that methodology to any agent output through pluggable domain adapters. Built-in adapters cover code, customer service, and general text; other domains, such as legal, can be added with a custom adapter.

Install

npm install @ship-protocol/reliability-sdk

Python (from a checkout of the repository):

cd python && pip install -e .

Quick Start

3-Line LangChain Integration

import { ShipReliability } from '@ship-protocol/reliability-sdk';
const ship = new ShipReliability({ domain: 'code' });
chain = chain.pipe(ship.asLangChainMiddleware());

Direct Scoring

import { ShipReliability } from '@ship-protocol/reliability-sdk';

const ship = new ShipReliability({ domain: 'code' });
const score = await ship.score({
  input: 'Write a REST API',
  output: 'feat: add REST API with Express.js and error handling',
  metadata: { language: 'typescript', tool: 'claude' },
});

console.log(score);
// => { score: 73, grade: 'B', confidence: 0.82, factors: {...}, domain: 'code', ... }

Python

from ship_reliability import ShipReliability, ScoringInput

ship = ShipReliability(domain="code")
score = ship.score(ScoringInput(
    input="Write a REST API",
    output="feat: add REST API with Express.js",
    metadata={"language": "typescript", "tool": "claude"}
))
print(f"Score: {score.score}, Grade: {score.grade}")

Domains

`code` — Code Reliability (SHIP API)

Calls the live SHIP API /v2/score endpoint. Requires network access.

const ship = new ShipReliability({ domain: 'code' });
const score = await ship.score({
  input: 'Write authentication',
  output: 'feat: add JWT auth with bcrypt',
  metadata: { language: 'typescript', tool: 'claude', repo: 'owner/repo' },
});

Metadata fields:

language — Programming language (default: 'typescript')
tool — AI tool that generated the code (e.g. 'claude', 'copilot')
repo — Repository in owner/repo format
owner — Repository owner

`customer-service` — Customer Service Reliability

Heuristic scoring for customer service agent responses. Evaluates empathy, actionability, certainty, response length, and channel fit.

const ship = new ShipReliability({ domain: 'customer-service' });
const score = await ship.score({
  input: 'My order arrived damaged',
  output: "I'm sorry about that. Here's how to get a replacement: go to Orders > Returns.",
  metadata: { channel: 'chat', category: 'returns' },
});

Factors scored: empathy, actionability, certainty, response_length, channel_fit
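
The adapter's actual heuristics are not documented here. As a rough illustration of what phrase-based factor scoring can look like, the sketch below derives `empathy` and `actionability` values from a response. The phrase lists, the 40-points-per-match rule, and the function names are all hypothetical and not taken from the SDK.

```typescript
// Hypothetical phrase lists; the SDK's real heuristics may differ entirely.
const EMPATHY_PHRASES = ["sorry", "apologize", "understand", "thank you"];
const ACTION_PHRASES = ["here's how", "go to", "click", "you can", "please"];

// Count how many phrases from the list occur in the text (case-insensitive).
function countMatches(text: string, phrases: string[]): number {
  const lower = text.toLowerCase();
  return phrases.filter((p) => lower.includes(p)).length;
}

// Map a match count to a 0-100 factor: 40 points per match, capped at 100.
function factorFromMatches(matches: number): number {
  return Math.min(100, matches * 40);
}

function scoreCustomerServiceFactors(output: string): Record<string, number> {
  return {
    empathy: factorFromMatches(countMatches(output, EMPATHY_PHRASES)),
    actionability: factorFromMatches(countMatches(output, ACTION_PHRASES)),
  };
}

const factors = scoreCustomerServiceFactors(
  "I'm sorry about that. Here's how to get a replacement: go to Orders > Returns."
);
console.log(factors); // { empathy: 40, actionability: 80 }
```

Substring matching like this is cheap and works offline, but it is easy to game and blind to tone, so treat it as a baseline only.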

`general` — General Agent Reliability

Heuristic scoring for any text agent output. Works offline without API calls.

const ship = new ShipReliability({ domain: 'general' });
const score = await ship.score({
  input: 'Explain closures in JavaScript',
  output: 'A closure is a function that...',
});

Factors scored: completeness, coherence, relevance, specificity, consistency
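
How the adapter turns these factors into a single score is not stated. One common approach is a weighted average, sketched below. The weights and the `combineFactors` name are assumptions made for illustration, not the SDK's values.

```typescript
// Hypothetical weights for the five factors listed above; they sum to 1.
const WEIGHTS: Record<string, number> = {
  completeness: 0.25,
  coherence: 0.2,
  relevance: 0.25,
  specificity: 0.15,
  consistency: 0.15,
};

// Combine 0-100 factor values into a 0-100 overall score.
// Factors missing from the input are skipped and the remaining weights renormalized.
function combineFactors(factors: Record<string, number>): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    const value = factors[name];
    if (value === undefined) continue;
    weighted += value * weight;
    totalWeight += weight;
  }
  return totalWeight === 0 ? 0 : Math.round(weighted / totalWeight);
}

const overall = combineFactors({
  completeness: 80,
  coherence: 70,
  relevance: 90,
  specificity: 60,
  consistency: 50,
});
console.log(overall); // 73
```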

Custom Domain Adapters

Implement DomainAdapter to add scoring for any agent type:

import { ShipReliability, scoreToGrade } from '@ship-protocol/reliability-sdk';
import type { DomainAdapter, ScoringInput, ScoringConfig, ReliabilityScore } from '@ship-protocol/reliability-sdk';

class LegalDomainAdapter implements DomainAdapter {
  readonly domain = 'legal';
  readonly name = 'Legal Document Reliability';

  validate(input: ScoringInput) {
    return { valid: !!input.output, errors: input.output ? [] : ['output required'] };
  }

  async score(input: ScoringInput, config: ScoringConfig): Promise<ReliabilityScore> {
    // Your domain-specific scoring logic here
    const score = 75;
    return {
      score,
      grade: scoreToGrade(score),
      confidence: 0.6,
      factors: { precision: 80, structure: 70 },
      domain: this.domain,
      timestamp: new Date().toISOString(),
      modelVersion: 'legal-v1',
      recommendations: [],
    };
  }
}

const ship = new ShipReliability({ adapter: new LegalDomainAdapter() });

Integrations

LangChain

import {
  ShipReliability,
  ShipReliabilityCallback,
  createLangChainMiddleware,
} from '@ship-protocol/reliability-sdk';

const ship = new ShipReliability({ domain: 'code' });

// Option 1: Pipe middleware (`chain` is an existing runnable declared with `let`)
chain = chain.pipe(ship.asLangChainMiddleware());

// Option 2: Standalone middleware
const middleware = createLangChainMiddleware({ domain: 'general' });

// Option 3: Callback handler (`chainOutput` is the value produced by your chain)
const callback = new ShipReliabilityCallback({ domain: 'code' });
await callback.onChainEnd(chainOutput);
console.log(callback.getAverageScore());

Python LangChain:

from ship_reliability import ShipReliability
from langchain_core.runnables import RunnableLambda

ship = ShipReliability(domain="code")
chain = my_chain | RunnableLambda(ship.as_langchain_middleware())

OpenAI

import { wrapOpenAIChat } from '@ship-protocol/reliability-sdk';
import OpenAI from 'openai';

const openai = new OpenAI();
const scoredChat = wrapOpenAIChat(
  openai.chat.completions.create.bind(openai.chat.completions),
  { domain: 'general' }
);

const { result, reliability } = await scoredChat({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Write a function' }],
});
console.log(`Score: ${reliability.score}`);

Anthropic

import { wrapAnthropicMessages } from '@ship-protocol/reliability-sdk';
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();
const scoredMessages = wrapAnthropicMessages(
  anthropic.messages.create.bind(anthropic.messages),
  { domain: 'general' }
);

const { result, reliability } = await scoredMessages({
  model: 'claude-sonnet-4-6',
  messages: [{ role: 'user', content: 'Write a function' }],
  max_tokens: 1024,
});
console.log(`Score: ${reliability.score}`);
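
Both wrappers return the provider's response together with a reliability score as `{ result, reliability }`. The sketch below shows the general shape such a wrapper could take. `wrapWithScoring`, `fakeChat`, and `lengthScorer` are hypothetical stand-ins so the example runs offline; this is not the SDK's implementation.

```typescript
// Hypothetical scorer signature: text in, score out.
type Scorer = (text: string) => Promise<{ score: number }>;

// Generic wrapper: call the underlying function, extract its text,
// score that text, and return both values side by side.
function wrapWithScoring<Req, Res>(
  call: (req: Req) => Promise<Res>,
  extractText: (res: Res) => string,
  scorer: Scorer
): (req: Req) => Promise<{ result: Res; reliability: { score: number } }> {
  return async (req) => {
    const result = await call(req);
    const reliability = await scorer(extractText(result));
    return { result, reliability };
  };
}

// Stand-ins for a chat API and a scorer.
const fakeChat = async (req: { prompt: string }) => ({ text: `echo: ${req.prompt}` });
const lengthScorer: Scorer = async (text) => ({ score: Math.min(100, text.length * 5) });

const scoredFakeChat = wrapWithScoring(fakeChat, (res) => res.text, lengthScorer);
scoredFakeChat({ prompt: "hi" }).then(({ result, reliability }) => {
  console.log(result.text, reliability.score); // echo: hi 40
});
```

Passing the text extractor as a parameter keeps the wrapper independent of any one provider's response shape.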

ShipClient (Low-Level API Client)

Direct access to the SHIP Protocol API:

import { ShipClient } from '@ship-protocol/reliability-sdk';

const client = new ShipClient({
  baseUrl: 'https://ship-protocol.dhruvaapi.workers.dev', // default
  timeout: 10000,
  retries: 2,
});

// Score a commit
const score = await client.score({
  commit_message: 'feat: add auth',
  language: 'typescript',
  tool: 'claude',
});

// Detect AI-generated code
const detection = await client.detect({
  code: 'function foo() { return bar; }',
  language: 'javascript',
});

// Tool comparison data, leaderboard rankings, and system health
const tools = await client.tools();
const leaderboard = await client.leaderboard();
const health = await client.health();
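
The `timeout` and `retries` options suggest a request loop along the following lines. This is a sketch under assumptions: `fetchWithRetry` and `FetchLike` are hypothetical names, and whether the real client retries on non-2xx responses, or only on network errors, is not documented here.

```typescript
// Minimal fetch-like signature so a fake can be injected for testing.
type FetchLike = (
  url: string,
  init: { signal: AbortSignal }
) => Promise<{ ok: boolean; status: number }>;

async function fetchWithRetry(
  doFetch: FetchLike,
  url: string,
  timeoutMs = 10000,
  retries = 2
): Promise<{ ok: boolean; status: number }> {
  let lastError: unknown;
  // One initial attempt plus `retries` further attempts.
  for (let attempt = 0; attempt <= retries; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      const res = await doFetch(url, { signal: controller.signal });
      if (res.ok) return res;
      lastError = new Error(`HTTP ${res.status}`);
    } catch (err) {
      lastError = err; // network error, or abort after the timeout fired
    } finally {
      clearTimeout(timer);
    }
  }
  throw lastError;
}

// Fake endpoint that fails twice and then succeeds.
let calls = 0;
const flaky: FetchLike = async () => {
  calls += 1;
  return calls < 3 ? { ok: false, status: 503 } : { ok: true, status: 200 };
};

fetchWithRetry(flaky, "https://example.invalid/health").then((res) => {
  console.log(res.status, calls); // 200 3
});
```

A production client would usually add a backoff delay between attempts; it is omitted here to keep the sketch short.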

API Reference

`ShipReliability`

Main class for domain-agnostic reliability scoring.

Constructor

new ShipReliability(config?: ShipReliabilityConfig)

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| domain | string | 'general' | Built-in domain: 'code', 'customer-service', 'general' |
| adapter | DomainAdapter | — | Custom domain adapter (overrides domain) |
| apiUrl | string | SHIP API URL | Base URL for API calls |
| timeout | number | 10000 | Request timeout in ms |
| retries | number | 2 | Number of retries on failure |
| cache | boolean | true | Enable LRU score caching |
| cacheSize | number | 100 | Max cached entries |
| cacheTtl | number | 300000 | Cache TTL in ms (5 min) |

Methods

| Method | Returns | Description |
|--------|---------|-------------|
| score(input) | `Promise<ReliabilityScore>` | Score an agent output |
| asLangChainMiddleware() | Function | Get a LangChain-compatible middleware |
| wrapOpenAI(fn) | Function | Wrap an OpenAI completion call |
| wrapAnthropic(fn) | Function | Wrap an Anthropic message call |
| clearCache() | void | Clear the score cache |
| domain | string | Current domain name |

`ReliabilityScore`

Universal score type returned by all adapters.

interface ReliabilityScore {
  score: number;          // 0-100
  grade: string;          // A+, A, B, C, D, F
  confidence: number;     // 0-1
  factors: Record<string, number>;
  domain: string;
  timestamp: string;      // ISO 8601
  modelVersion: string;
  recommendations: string[];
}
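
The grade boundaries are not listed in this README. The sketch below shows one way a 0-100 score could map onto the grades above. The cut-offs are hypothetical, chosen only so that a score of 73 yields 'B' as in the Quick Start output, and `toGrade` is an illustrative name rather than the SDK's `scoreToGrade`.

```typescript
// Hypothetical cut-offs, ordered from highest to lowest.
const GRADE_THRESHOLDS: Array<[number, string]> = [
  [95, "A+"],
  [85, "A"],
  [70, "B"],
  [55, "C"],
  [40, "D"],
];

// Return the first grade whose minimum the score reaches; anything lower is 'F'.
function toGrade(score: number): string {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError(`score must be between 0 and 100, got ${score}`);
  }
  for (const [min, grade] of GRADE_THRESHOLDS) {
    if (score >= min) return grade;
  }
  return "F";
}

console.log(toGrade(73)); // B
```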

`DomainAdapter`

Plugin interface for adding new agent types.

interface DomainAdapter {
  readonly domain: string;
  readonly name: string;
  score(input: ScoringInput, config: ScoringConfig): Promise<ReliabilityScore>;
  validate(input: ScoringInput): { valid: boolean; errors: string[] };
}
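
The interface pairs `validate` with `score`, which suggests that a host checks input before scoring it. How `ShipReliability` does this internally is not stated, so the following is a sketch of one plausible arrangement. The types are local, simplified stand-ins for the SDK's, and `scoreWithAdapter` and `lengthAdapter` are hypothetical.

```typescript
// Local stand-ins for the SDK types so the sketch is self-contained.
interface ScoringInput {
  input: string;
  output: string;
  metadata?: Record<string, unknown>;
}
interface MiniScore {
  score: number;
  domain: string;
}
interface MiniAdapter {
  readonly domain: string;
  validate(input: ScoringInput): { valid: boolean; errors: string[] };
  score(input: ScoringInput): Promise<MiniScore>;
}

// Hypothetical host function: reject invalid input before any scoring work runs.
async function scoreWithAdapter(adapter: MiniAdapter, input: ScoringInput): Promise<MiniScore> {
  const check = adapter.validate(input);
  if (!check.valid) {
    throw new Error(`Invalid input for '${adapter.domain}': ${check.errors.join("; ")}`);
  }
  return adapter.score(input);
}

// Toy adapter: the score is the output length, capped at 100.
const lengthAdapter: MiniAdapter = {
  domain: "length-demo",
  validate: (input) => {
    const hasOutput = input.output.trim().length > 0;
    return { valid: hasOutput, errors: hasOutput ? [] : ["output required"] };
  },
  score: async (input) => ({
    score: Math.min(100, input.output.length),
    domain: "length-demo",
  }),
};

scoreWithAdapter(lengthAdapter, { input: "Say hello", output: "hello" }).then((s) => {
  console.log(s); // { score: 5, domain: 'length-demo' }
});
```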

`ShipClient`

Low-level HTTP client for the SHIP Protocol API.

| Method | Returns | Description |
|--------|---------|-------------|
| score(req) | `Promise<ScoreResponse>` | POST /v2/score |
| detect(req) | `Promise<DetectResponse>` | POST /v2/detect |
| tools() | `Promise<ToolsResponse>` | GET /v2/tools |
| leaderboard() | `Promise<LeaderboardResponse>` | GET /v2/leaderboard |
| health() | `Promise<HealthResponse>` | GET /v2/crawler-status |

Configuration

Environment

The SDK requires Node.js 18 or later because it relies on the built-in fetch. It has no external HTTP dependencies.

Caching

The built-in LRU score cache avoids redundant API calls. Disable it with cache: false, or adjust its size and TTL:

const ship = new ShipReliability({
  domain: 'code',
  cache: true,
  cacheSize: 200,    // Max entries
  cacheTtl: 600000,  // 10 min TTL
});
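
For readers unfamiliar with the technique, an LRU cache with a TTL can be sketched in a few lines on top of `Map`, which iterates in insertion order. `LruTtlCache` is a hypothetical class written for illustration; the SDK's internal cache and the way it derives cache keys are not documented here.

```typescript
// Minimal LRU cache with a per-entry TTL, relying on Map's insertion order.
class LruTtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private maxSize: number;
  private ttlMs: number;
  private now: () => number;

  // `now` is injectable so tests can control the clock.
  constructor(maxSize = 100, ttlMs = 300000, now: () => number = Date.now) {
    this.maxSize = maxSize;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired
      return undefined;
    }
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    if (this.entries.size >= this.maxSize) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  clear(): void {
    this.entries.clear();
  }
}

const cache = new LruTtlCache<number>(2, 600000);
cache.set("a", 1);
cache.set("b", 2);
cache.get("a"); // "a" becomes the most recently used entry
cache.set("c", 3); // evicts "b"
console.log(cache.get("b"), cache.get("a")); // undefined 1
```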

Testing

# TypeScript
npm test                    # Run all tests
npx tsc --noEmit            # Type check

# Python
cd python && python -m pytest tests/ -v

Examples

See the examples/ directory:

score-code.ts — Score AI-generated code via SHIP API
langchain-basic.ts — 3-line LangChain integration
custom-domain.ts — Create a custom domain adapter
langchain-basic.py — Python LangChain integration

Run with: npx tsx examples/score-code.ts

SHIP Protocol API

The code domain adapter calls /v2/score; ShipClient exposes all of the endpoints below:

| Endpoint | Method | Description |
|----------|--------|-------------|
| /v2/score | POST | Score a commit's reliability |
| /v2/detect | POST | Detect AI-generated code |
| /v2/tools | GET | AI tool comparison data |
| /v2/leaderboard | GET | Tool reliability rankings |
| /v2/crawler-status | GET | System health |

Base URL: https://ship-protocol.dhruvaapi.workers.dev

No auth required. Rate limit: 100 req/min.
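
To stay under the 100 req/min limit when scoring in bulk, a caller can throttle on the client side. The sketch below is a simple sliding-window limiter. `SlidingWindowLimiter` is a hypothetical helper, and how the server actually measures its window is an assumption, since it is not documented here.

```typescript
// Sliding-window limiter: allows at most `limit` calls in any `windowMs` span.
class SlidingWindowLimiter {
  private timestamps: number[] = [];
  private limit: number;
  private windowMs: number;
  private now: () => number;

  // `now` is injectable so tests can control the clock.
  constructor(limit = 100, windowMs = 60000, now: () => number = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true and records the call if it fits in the window, otherwise false.
  tryAcquire(): boolean {
    const current = this.now();
    const cutoff = current - this.windowMs;
    // Keep only the calls that are still inside the window.
    this.timestamps = this.timestamps.filter((t) => t > cutoff);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(current);
    return true;
  }
}

const limiter = new SlidingWindowLimiter(); // defaults: 100 calls per 60 s
let allowed = 0;
for (let i = 0; i < 150; i++) {
  if (limiter.tryAcquire()) allowed += 1;
}
console.log(allowed); // 100 (the loop finishes well within one window)
```

When `tryAcquire()` returns false, the caller can wait and retry later or drop the request, depending on how urgent the score is.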

License

MIT