@mankinds/sdk

v1.0.1

TypeScript SDK for Mankinds AI Evaluation API

Evaluate AI systems with automated tests.

Register an AI system, optionally attach connectors (logs, databases), import or generate your golden dataset, and run automated evaluations covering privacy, security, performance, fairness, explainability, transparency and accountability.

Features

System Management — Create, update, and configure AI systems with custom API endpoints
Endpoint Configuration — Support for REST, SSE streaming, and multi-turn conversations
Dataset Generation — Auto-generate or provide custom test scenarios
Evaluation — Run evaluations with real-time polling and configurable profiles
Connectors — Attach data sources (log files, Datadog, SQLite, PostgreSQL)
Error Handling — Typed exceptions for all error cases

Documentation

Mankinds Documentation

Requirements

Node.js ≥ 16

Installation

npm install @mankinds/sdk

Usage

The SDK follows a simple 3-step workflow: create a system, generate test data, run an evaluation.

Initialize the Client

import { MankindsClient } from "@mankinds/sdk";

const client = new MankindsClient("mk_...");

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| apiKey | string | Yes | (none) | Your API key |
| baseUrl | string | No | https://app.mankinds.io | Custom API base URL |
| timeout | number | No | 120 | Request timeout in seconds |
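To keep the key out of source control, you can read it from an environment variable before constructing the client. The helper below is a minimal sketch and is not part of @mankinds/sdk; resolveApiKey is hypothetical, and the variable name MANKINDS_API_KEY is an assumption rather than something the SDK reads on its own.

```typescript
// Hypothetical helper (not part of @mankinds/sdk): read the API key from an
// environment map. The variable name MANKINDS_API_KEY is an assumption.
type Env = Record<string, string | undefined>;

function resolveApiKey(env: Env, varName = "MANKINDS_API_KEY"): string {
  const key = env[varName]?.trim();
  if (!key) {
    // Fail early with a message that names the variable to set.
    throw new Error(`Missing API key: set the ${varName} environment variable`);
  }
  return key;
}
```

In Node.js you would then write new MankindsClient(resolveApiKey(process.env)). The SDK itself throws CredentialsError when no key is given; the helper only makes the failure point at the missing variable.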

Create an AI System

Register your AI system by providing its name, description, and API endpoint. The endpoint defines how your AI is called during evaluation.

const system = await client.createSystem(
  "Customer Support Bot",
  "A chatbot that handles order inquiries and returns for an e-commerce platform.",
  {
    url: "https://api.example.com/chat",
    method: "POST",
    headers: { Authorization: "Bearer your-token" },
    body: { message: "{{input}}" },
    response: { answer: "{{output}}" },
  }
);

const systemId = system.id;

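Use {{input}} in the request body and {{output}} in the response mapping: during evaluation, each test input is substituted for {{input}} when your endpoint is called, and the response field marked with {{output}} is read as your system's answer.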

Endpoint Configuration

The endpoint defines how to call your AI system during evaluation.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| url | string | Yes | API endpoint URL |
| method | string | Yes | HTTP method (POST, GET, etc.) |
| body | object | Yes | Request body with {{input}} placeholder |
| response | object | Yes | Response mapping with {{output}} placeholder |
| headers | object | No | HTTP headers |
| streaming | object | No | SSE streaming configuration |
| multiturn | object | No | Multi-turn conversation configuration |
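Because the SDK throws InvalidEndpointError when required fields are missing, it can help to check a configuration locally before calling createSystem. This is a sketch that mirrors the required rows of the table above; missingEndpointFields and missingPlaceholders are hypothetical helpers, not SDK functions, and the placeholder check is a simple search over the serialized JSON.

```typescript
// Hypothetical pre-flight check (not part of @mankinds/sdk) mirroring the table above.
// The SDK reports missing required fields through InvalidEndpointError.missingFields.
const REQUIRED_ENDPOINT_FIELDS = ["url", "method", "body", "response"] as const;

function missingEndpointFields(config: Record<string, unknown>): string[] {
  return REQUIRED_ENDPOINT_FIELDS.filter(
    (field) => config[field] === undefined || config[field] === null,
  );
}

// Check that each placeholder appears somewhere in its section (searches the serialized JSON).
function missingPlaceholders(config: Record<string, unknown>): string[] {
  const problems: string[] = [];
  if (!JSON.stringify(config["body"] ?? {}).includes("{{input}}")) {
    problems.push("body: {{input}}");
  }
  if (!JSON.stringify(config["response"] ?? {}).includes("{{output}}")) {
    problems.push("response: {{output}}");
  }
  return problems;
}
```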

Placeholders:

{{input}} in body: replaced with test inputs during evaluation
{{output}} in response: indicates which field contains the AI response

{
  body: { message: "{{input}}" },
  response: { answer: "{{output}}" }
}
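To make the placeholder semantics concrete, here is a minimal sketch of how a body template could be filled with a test input and how the {{output}} marker could locate the answer in a real response. It is not the Mankinds implementation; fillTemplate, outputPath and extractOutput are hypothetical helpers that walk plain JSON objects.

```typescript
// Hypothetical illustration of the placeholder semantics; not the Mankinds implementation.

// Replace "{{input}}" in every string of a (nested) body template.
function fillTemplate(template: unknown, input: string): unknown {
  if (typeof template === "string") return template.split("{{input}}").join(input);
  if (Array.isArray(template)) return template.map((item) => fillTemplate(item, input));
  if (template !== null && typeof template === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, value] of Object.entries(template)) out[key] = fillTemplate(value, input);
    return out;
  }
  return template; // numbers, booleans and null are copied unchanged
}

// Find the key path of the field marked "{{output}}" in the response mapping.
function outputPath(mapping: unknown, path: string[] = []): string[] | undefined {
  if (mapping === "{{output}}") return path;
  if (mapping !== null && typeof mapping === "object" && !Array.isArray(mapping)) {
    for (const [key, value] of Object.entries(mapping)) {
      const found = outputPath(value, [...path, key]);
      if (found) return found;
    }
  }
  return undefined;
}

// Follow that path through an actual response to get the system's answer.
function extractOutput(response: unknown, path: string[]): unknown {
  let current = response;
  for (const key of path) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}
```

For the mapping above, outputPath({ answer: "{{output}}" }) yields ["answer"], so a reply such as { "answer": "..." } would be read from its answer field.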

Streaming (SSE):

{
  url: "https://api.example.com/chat",
  method: "POST",
  body: { message: "{{input}}" },
  response: { answer: "{{output}}" },
  streaming: {
    enabled: true,
    format: "openai", // "openai" | "anthropic" | "custom"
    content_path: "choices[0].delta.content",
  },
}
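With format "openai", each SSE event carries a JSON chunk and content_path points at the text fragment inside it. The sketch below shows how such a path could be resolved and the fragments concatenated into the final answer. resolvePath and collectStream are hypothetical helpers, the chunk shape follows OpenAI-style chat-completion deltas, and the [DONE] sentinel is an OpenAI convention; this is not the platform's own parser.

```typescript
// Hypothetical sketch: resolve a path like "choices[0].delta.content" and join SSE deltas.
function resolvePath(value: unknown, path: string): unknown {
  // "choices[0].delta.content" -> ["choices", "0", "delta", "content"]
  const keys = path.replace(/\[(\d+)\]/g, ".$1").split(".").filter(Boolean);
  let current: unknown = value;
  for (const key of keys) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Parse raw SSE text ("data: {...}" lines) and concatenate the content fragments.
function collectStream(raw: string, contentPath: string): string {
  let answer = "";
  for (const line of raw.split("\n")) {
    if (!line.startsWith("data:")) continue; // skip blank lines, comments, event names
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") break; // OpenAI-style end-of-stream sentinel
    const fragment = resolvePath(JSON.parse(payload), contentPath);
    if (typeof fragment === "string") answer += fragment;
  }
  return answer;
}
```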

Multi-turn conversations:

{
  url: "https://api.example.com/chat",
  method: "POST",
  body: { message: "{{input}}", session_id: "{{session}}" },
  response: { answer: "{{output}}" },
  multiturn: {
    type: "session_id", // "none" | "session_id" | "history"
    field: "conversation_id",
    location: "body",
  },
}
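Presumably, in session_id mode the conversation state is kept by your server, so each turn only needs to carry the same identifier. The sketch below assumes that {{session}} in the body receives one value per test conversation, reused on every turn; how the platform generates that value and how it uses the field and location settings is not documented here. buildConversation is a hypothetical helper, and the history mode is not covered.

```typescript
// Hypothetical sketch of session_id-style multi-turn calls; not the platform's logic.
// Assumption: {{session}} gets one value per conversation, reused on every turn.
type TurnBody = Record<string, string>;

function buildConversation(template: TurnBody, turns: string[], sessionId: string): TurnBody[] {
  return turns.map((input) => {
    const body: TurnBody = {};
    for (const [key, value] of Object.entries(template)) {
      // Fill the shared session identifier first, then this turn's input.
      body[key] = value.split("{{session}}").join(sessionId).split("{{input}}").join(input);
    }
    return body;
  });
}
```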

Generate Evaluation Dataset

Test scenarios can be auto-generated based on your system description, or you can provide custom scenarios.

Auto-generate scenarios:

const dataset = await client.generateDataset(systemId, 20);

Provide custom scenarios:

const dataset = await client.generateDataset(systemId, 10, [
  { input: "Where is my order?", outputs: ["I can help you track your order."] },
  { input: "I want a refund", outputs: ["I'll process your refund request."] },
]);
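Each custom scenario pairs one input with a list of acceptable outputs. If your reference answers come as flat question/answer pairs (for example, exported from a support tool), a small helper can group them into that shape before calling generateDataset. toScenarios and the local Scenario interface are hypothetical; the sketch assumes the SDK's ScenarioInput matches the { input, outputs } shape used in the example above.

```typescript
// Hypothetical helper: group flat question/answer pairs into { input, outputs } scenarios.
// Assumes the scenario shape used in the generateDataset example ({ input, outputs }).
interface Scenario {
  input: string;
  outputs: string[];
}

function toScenarios(pairs: Array<[string, string]>): Scenario[] {
  const byInput = new Map<string, string[]>();
  for (const [question, answer] of pairs) {
    const input = question.trim();
    const outputs = byInput.get(input) ?? [];
    if (!outputs.includes(answer)) outputs.push(answer); // skip duplicate answers
    byInput.set(input, outputs);
  }
  // Map keeps insertion order, so scenarios follow the order of first appearance.
  return Array.from(byInput, ([input, outputs]) => ({ input, outputs }));
}
```

The result can then be passed as the third argument, e.g. client.generateDataset(systemId, 10, toScenarios(pairs)).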

Refine an existing dataset:

const dataset = await client.updateDataset(systemId, {
  orientation: "Add more edge cases about payment failures",
});

Note: generateDataset requires a validated system description. If validation fails, it throws a DescriptionNotValidatedError whose recommendations property explains how to improve the description.

Run Evaluation

Start an evaluation to test your AI system. By default, the returned promise resolves only after the evaluation has completed.

Wait for completion (default):

const result = await client.evaluate(systemId);
console.log(`Score: ${JSON.stringify(result.summary)}`);

Start without waiting:

const runInfo = await client.evaluate(systemId, { wait: false });
const runId = runInfo.run_id;

// Check status later
const result = await client.getEvaluation(runId);
console.log(`Status: ${result.status}`);
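With wait: false you are responsible for checking back. The sketch below is a generic polling loop you could wrap around a call such as () => client.getEvaluation(runId). pollUntilDone is hypothetical, the callback signature simply mirrors the SDK's onPoll option, and the terminal status names ("completed", "failed", "cancelled") are assumptions; check the actual values returned by getEvaluation.

```typescript
// Hypothetical polling helper for the `wait: false` flow (not part of @mankinds/sdk).
// ASSUMPTION: these terminal status names are illustrative, not documented values.
interface RunStatus {
  status: string;
}

const TERMINAL_STATUSES = new Set(["completed", "failed", "cancelled"]);

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function pollUntilDone<T extends RunStatus>(
  fetchStatus: () => Promise<T>,
  intervalSeconds = 10,
  timeoutSeconds = 1800,
  onPoll?: (status: string, elapsedSeconds: number) => void,
): Promise<T> {
  const start = Date.now();
  for (;;) {
    const result = await fetchStatus();
    const elapsed = Math.round((Date.now() - start) / 1000);
    onPoll?.(result.status, elapsed);
    if (TERMINAL_STATUSES.has(result.status)) return result;
    if (elapsed >= timeoutSeconds) {
      throw new Error(`Evaluation still "${result.status}" after ${elapsed}s`);
    }
    await sleep(intervalSeconds * 1000); // wait before the next status check
  }
}
```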

With specific thematics:

const result = await client.evaluate(systemId, {
  thematicsConfig: {
    explainability: { justification: { nb_tests: 5 } },
    robustness: { prompt_injection: { nb_tests: 10 } },
  },
});
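thematicsConfig maps each thematic to its sub-tests, each with an nb_tests count. The local type below is inferred only from the example above (the SDK exports the authoritative ThematicsConfig type, which may carry more fields), and totalTests is a hypothetical helper for checking how many tests a configuration requests before you launch it.

```typescript
// Local type inferred from the example above; the SDK's ThematicsConfig may differ.
type LocalThematicsConfig = Record<string, Record<string, { nb_tests: number }>>;

// Sum nb_tests across all thematics and their sub-tests.
function totalTests(config: LocalThematicsConfig): number {
  let total = 0;
  for (const subTests of Object.values(config)) {
    for (const { nb_tests } of Object.values(subTests)) total += nb_tests;
  }
  return total;
}
```

For the configuration above this gives 5 + 10 = 15 tests.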

With evaluation profile:

const result = await client.evaluate(systemId, { profile: "extended" });

With progress callback:

const result = await client.evaluate(systemId, {
  pollInterval: 10,
  onPoll: (status, elapsed) => console.log(`  ${status} (${elapsed}s)`),
});

Connectors

Connectors attach external data sources (logs, databases) to your system for richer evaluation context.

File logs:

import { FileConnector } from "@mankinds/sdk";

const connector = new FileConnector({ filePath: "/path/to/logs.json" });
await client.addConnector(systemId, connector);

Datadog logs:

import { DatadogConnector } from "@mankinds/sdk";

const connector = new DatadogConnector({
  apiKey: "dd-api-key",
  appKey: "dd-app-key",
  site: "datadoghq.eu", // default
});
await client.addConnector(systemId, connector);

SQLite database:

import { SqliteConnector } from "@mankinds/sdk";

const connector = new SqliteConnector({ filePath: "/path/to/database.db" });
await client.addConnector(systemId, connector);

PostgreSQL database:

import { PostgresqlConnector } from "@mankinds/sdk";

const connector = new PostgresqlConnector({
  host: "localhost",
  database: "mydb",
  user: "admin",
  password: "secret",
  port: 5432,
});
await client.addConnector(systemId, connector);

Manage connectors:

// List all connectors
const connectors = await client.getConnectors(systemId);

// Update a connector
const connector = new FileConnector({ filePath: "/path/to/new-logs.json" });
await client.updateConnector(systemId, connector);

// Remove a connector
await client.deleteConnector(systemId, connector);

Only one connector per category (logs, database) is allowed per system. Adding a duplicate throws ConnectorAlreadyExistsError.
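The category rule can be checked before calling addConnector. In this sketch the mapping (file and Datadog count as logs, SQLite and PostgreSQL as database) is inferred from the section titles above, and conflictingConnectors is a hypothetical helper, not an SDK function.

```typescript
// Hypothetical sketch of the "one connector per category" rule.
// The kind-to-category mapping is inferred from this README, not from the SDK.
type ConnectorKind = "file" | "datadog" | "sqlite" | "postgresql";
type Category = "logs" | "database";

const CATEGORY: Record<ConnectorKind, Category> = {
  file: "logs",
  datadog: "logs",
  sqlite: "database",
  postgresql: "database",
};

// Return the kinds that would be rejected because their category is already taken.
function conflictingConnectors(existing: ConnectorKind[], toAdd: ConnectorKind[]): ConnectorKind[] {
  const taken = new Set<Category>(existing.map((kind) => CATEGORY[kind]));
  const conflicts: ConnectorKind[] = [];
  for (const kind of toAdd) {
    const category = CATEGORY[kind];
    if (taken.has(category)) conflicts.push(kind);
    else taken.add(category);
  }
  return conflicts;
}
```

Under this rule, switching a system from file logs to Datadog means removing the existing logs connector with deleteConnector before adding the new one.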

Complete Example

import { MankindsClient, FileConnector } from "@mankinds/sdk";

const client = new MankindsClient("mk_...");

// Create system
const system = await client.createSystem(
  "Support Bot",
  "A customer support chatbot for order tracking and returns.",
  {
    url: "https://api.example.com/chat",
    method: "POST",
    body: { message: "{{input}}" },
    response: { answer: "{{output}}" },
  }
);
const systemId = system.id;

// Attach production logs
const connector = new FileConnector({ filePath: "./logs/production.json" });
await client.addConnector(systemId, connector);

// Generate dataset and evaluate
const dataset = await client.generateDataset(systemId, 15);
const result = await client.evaluate(systemId, { profile: "extended" });

console.log(`Status: ${result.status}`);
console.log(`Score: ${JSON.stringify(result.summary)}`);

API Reference

MankindsClient

| Method | Description |
|--------|-------------|
| getSystem(systemId) | Get system details and configuration |
| createSystem(name, description, endpoint) | Create a new AI system |
| updateSystem(systemId, options) | Update an existing system |
| generateDataset(systemId, numScenarios?, scenarios?) | Generate and validate evaluation scenarios |
| updateDataset(systemId, options) | Refine or replace dataset scenarios |
| evaluate(systemId, options?) | Run an evaluation |
| getEvaluation(runId) | Get evaluation status and results |
| addConnector(systemId, connector) | Add a data source connector |
| getConnectors(systemId) | List all connectors for a system |
| updateConnector(systemId, connector) | Update a connector |
| deleteConnector(systemId, connector) | Remove a connector |

Types

The SDK exports all interfaces:

import type {
  EndpointConfig,
  StreamingConfig,
  MultiturnConfig,
  ScenarioInput,
  ThematicsConfig,
  SystemDetails,
  Dataset,
  EvaluationResult,
  ConnectorInfo,
} from "@mankinds/sdk";

Exceptions

| Exception | When Thrown |
|-----------|-------------|
| CredentialsError | Missing API key |
| AuthenticationError | Invalid or expired API key (401) |
| NotFoundError | Resource not found (404) |
| ValidationError | Request validation failed (422) |
| RateLimitError | Too many requests (429) |
| ServerError | Server error (5xx) |
| InvalidEndpointError | Endpoint missing required fields |
| EndpointNotConfiguredError | Evaluation without endpoint |
| DescriptionNotValidatedError | Dataset generation before validation |
| ConnectorAlreadyExistsError | Duplicate connector category |

import {
  AuthenticationError,
  InvalidEndpointError,
  DescriptionNotValidatedError,
} from "@mankinds/sdk";

try {
  const result = await client.evaluate(systemId);
  console.log(`Status: ${result.status}`);
} catch (error) {
  if (error instanceof AuthenticationError) {
    console.error("Invalid API key");
  } else if (error instanceof InvalidEndpointError) {
    console.error("Missing fields:", error.missingFields);
  } else if (error instanceof DescriptionNotValidatedError) {
    console.error("Fix description:", error.recommendations);
  } else {
    // Re-throw anything not handled above so it is not silently swallowed
    throw error;
  }
}
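RateLimitError (429) and ServerError (5xx) are often transient, so retrying with a growing delay can be preferable to failing immediately. withRetry is a hypothetical helper, not part of the SDK; you decide which errors are retryable through the shouldRetry predicate, for example (err) => err instanceof RateLimitError || err instanceof ServerError. Whether RateLimitError exposes a suggested wait time is not documented here, so the sketch uses plain exponential backoff.

```typescript
// Hypothetical retry helper with exponential backoff (not part of @mankinds/sdk).
async function withRetry<T>(
  operation: () => Promise<T>,
  shouldRetry: (error: unknown) => boolean,
  maxAttempts = 4,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      // Give up on non-retryable errors or once the attempt budget is spent.
      if (attempt >= maxAttempts || !shouldRetry(error)) throw error;
      // Wait 1x, 2x, 4x ... the base delay between attempts.
      const delay = baseDelayMs * 2 ** (attempt - 1);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Usage: await withRetry(() => client.evaluate(systemId), (err) => err instanceof RateLimitError).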

License

MIT