mca-sdk

v0.4.2

Published

3 months ago

Model Collector Agent SDK - Node.js OpenTelemetry instrumentation for ML model monitoring

toluweebhsf

opentelemetry observability ml-monitoring healthcare telemetry instrumentation

MCA SDK - Node.js

OpenTelemetry instrumentation library for ML model monitoring in Node.js/TypeScript.

Features

✅ OpenTelemetry-based metrics, logs, and traces
✅ OTLP/HTTP export to collector (port 4318)
✅ Registry client with caching and background refresh
✅ TypeScript support with full type definitions
✅ Async/await patterns with Promise-based APIs
✅ Scoped client wrapper (withMCAClient) for automatic cleanup
✅ Multi-source configuration (constructor args > env > YAML > defaults)
✅ HTTPS enforcement for production endpoints
✅ Exponential backoff retry logic
✅ NEW: Agentic AI tracking (goals, tools, human interventions)
✅ NEW: Mid-lifecycle flush() for batch processing
✅ NEW: Advanced error tracking with recordError()
✅ NEW: Model evaluation metrics (MAE, RMSE)
✅ NEW: Metric filtering to drop system metrics
✅ NEW: Policy violation tracking
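
The retry feature listed above follows the usual exponential backoff pattern. This README does not document the SDK's actual delays or attempt limits, so the sketch below only illustrates the general technique. The names retryWithBackoff and backoffDelayMs, and the default values, are hypothetical.

```typescript
// Delay doubles with each attempt and is capped at maxMs.
// baseMs and maxMs are illustrative defaults, not the SDK's values.
function backoffDelayMs(attempt: number, baseMs = 100, maxMs = 5000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Run `op` until it succeeds or maxAttempts is reached,
// waiting backoffDelayMs(attempt) between failed attempts.
async function retryWithBackoff<T>(
  op: () => Promise<T>,
  maxAttempts = 4,
  baseMs = 100
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt, baseMs));
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}
```

Production implementations often add random jitter to the delay so that many clients do not retry in lockstep.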

Installation

npm install mca-sdk

Quick Start

import { withMCAClient } from 'mca-sdk';

await withMCAClient(
  {
    serviceName: 'my-model',
    modelId: 'mdl-001',
    modelVersion: '1.0',
    teamName: 'data-science',
    modelType: 'internal',
    collectorEndpoint: 'http://localhost:4318',
  },
  async (client) => {
    // Record prediction
    await client.recordPrediction({
      latency: 0.15,
      inputSize: 100,
      outputSize: 1,
    });
  }
);
// Client automatically shuts down

Manual Usage

import { MCAClient } from 'mca-sdk';

const client = await MCAClient.create({
  serviceName: 'my-model',
  modelId: 'mdl-001',
  modelVersion: '1.0',
  teamName: 'data-science',
  collectorEndpoint: 'http://localhost:4318',
});

try {
  await client.recordPrediction({ latency: 0.15 });
} finally {
  // Always shut down, even on error, so buffered telemetry is exported
  await client.shutdown();
}

Configuration

Priority order: Constructor args > Environment variables > YAML config > Defaults
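
To make the priority order concrete, here is a minimal sketch of how such a merge could work. It is not the SDK's implementation: PartialConfig, resolveConfig, fromEnv, and the default endpoint are assumptions for illustration, and only three fields are shown.

```typescript
// Hypothetical subset of the client configuration, for illustration only.
interface PartialConfig {
  serviceName?: string;
  modelId?: string;
  collectorEndpoint?: string;
}

const CONFIG_KEYS = ['serviceName', 'modelId', 'collectorEndpoint'] as const;

// Assumed default; the SDK's real defaults are not listed in this README.
const DEFAULTS: PartialConfig = { collectorEndpoint: 'http://localhost:4318' };

// Map the documented MCA_* variables onto config fields.
function fromEnv(env: Record<string, string | undefined>): PartialConfig {
  return {
    serviceName: env['MCA_SERVICE_NAME'],
    modelId: env['MCA_MODEL_ID'],
    collectorEndpoint: env['MCA_OTEL_ENDPOINT'],
  };
}

// Apply sources from lowest to highest priority.
// An undefined value never overwrites a value set by a lower-priority source.
function resolveConfig(
  args: PartialConfig,
  env: Record<string, string | undefined>,
  yaml: PartialConfig
): PartialConfig {
  const result: PartialConfig = {};
  for (const source of [DEFAULTS, yaml, fromEnv(env), args]) {
    for (const key of CONFIG_KEYS) {
      const value = source[key];
      if (value !== undefined) {
        result[key] = value;
      }
    }
  }
  return result;
}
```

With this merge, a modelId passed to the constructor wins over MCA_MODEL_ID, which in turn wins over a value from the YAML file.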

Environment Variables

export MCA_SERVICE_NAME=my-model
export MCA_MODEL_ID=mdl-001
export MCA_MODEL_VERSION=1.0
export MCA_TEAM_NAME=data-science
export MCA_OTEL_ENDPOINT=http://localhost:4318
export MCA_REGISTRY_URL=https://registry.example.com
export MCA_REGISTRY_TOKEN=secret-token
export MCA_FILTER_SYSTEM_METRICS=true  # Drop process.*, nodejs.* metrics (default: true)

API Reference

MCAClient

class MCAClient {
  static create(config: MCAClientConfig): Promise<MCAClient>

  // Core prediction tracking
  recordPrediction(options: RecordPredictionOptions): Promise<void>
  recordMetric(name: string, value: number, attributes?: object): Promise<void>

  // Telemetry management
  flush(timeoutMs?: number): Promise<boolean>
  shutdown(timeoutMs?: number): Promise<boolean>

  // Agentic AI tracking (NEW in v0.3.0)
  recordGoalStarted(description: string, type?: string, attributes?: Attributes): string
  recordGoalCompleted(goalId: string, status?: 'success'|'failure'|'partial', attributes?: Attributes): void
  trackTool<T>(toolName: string, fn: () => Promise<T>, goalId?: string, attributes?: Attributes): Promise<T>
  recordHumanIntervention(reason: string, waitTime: number, type?: string, timestamp?: number, attributes?: Attributes): void
  recordPolicyViolation(policyType: string): void

  // Advanced tracking
  recordError(error: Error, latency?: number, predictionId?: string, attributes?: Attributes): string
  recordEvaluation(options: { mae?: number, rmse?: number, dataset?: string, attributes?: Attributes }): void
  recordCounter(name: string, value?: number, attributes?: Attributes): void
  recordGauge(name: string, value: number, attributes?: Attributes): void

  // Accessors
  readonly meter: Meter
  readonly tracer: Tracer
  readonly logger: Logger
  readonly thresholdsConfig: Record<string, number>
}

withMCAClient

function withMCAClient<T>(
  config: MCAClientConfig,
  fn: (client: MCAClient) => Promise<T>
): Promise<T>
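
A wrapper with this signature is typically built on try/finally. The sketch below shows that pattern against a generic resource rather than the real client. withScopedClient and Closeable are hypothetical names, and the SDK's own implementation may differ.

```typescript
// Anything with an async shutdown() can be managed by the wrapper.
interface Closeable {
  shutdown(): Promise<unknown>;
}

// Create the resource, run the callback, and always shut down afterwards,
// whether the callback resolves or throws.
async function withScopedClient<C extends Closeable, T>(
  create: () => Promise<C>,
  fn: (client: C) => Promise<T>
): Promise<T> {
  const client = await create();
  try {
    return await fn(client);
  } finally {
    await client.shutdown();
  }
}
```

The callback's return value is passed through to the caller, and an error thrown inside the callback still propagates after shutdown completes.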

Examples

Agentic AI Tracking

Track agent goals with automatic span hierarchy:

import { MCAClient } from 'mca-sdk';

const client = await MCAClient.create({
  serviceName: 'research-agent',
  modelId: 'agt-001',
  teamName: 'clinical-ai',
});

// Start a goal
const goalId = client.recordGoalStarted(
  'Research diabetes treatment options',
  'research',
  { priority: 'high' }
);

try {
  // Track tool executions
  const results = await client.trackTool(
    'search_database',
    async () => {
      return await database.search('diabetes treatments');
    },
    goalId,
    { query: 'diabetes' }
  );

  // Process results with another tool
  const analysis = await client.trackTool(
    'analyze_results',
    async () => {
      return await analyzeData(results);
    },
    goalId
  );

  // Record human intervention if needed
  if (analysis.requiresClarification) {
    client.recordHumanIntervention(
      'Results ambiguous - need expert review',
      45.2,
      'clarification'
    );
  }

  // Complete goal
  client.recordGoalCompleted(goalId, 'success', {
    results_count: results.length
  });
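} catch (error) {
  // `error` is `unknown` under strict TypeScript, so narrow it before reading .message
  client.recordGoalCompleted(goalId, 'failure', {
    error: error instanceof Error ? error.message : String(error)
  });
} finally {
  // Shut down on both paths so buffered telemetry is exported
  await client.shutdown();
}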

Mid-Lifecycle Flush

Flush telemetry after batch processing:

const client = await MCAClient.create({ serviceName: 'batch-processor' });

for (const batch of batches) {
  await processBatch(batch, client);

  // Flush after each batch to ensure visibility
  const success = await client.flush(5000);
  if (!success) {
    console.warn('Flush timed out - telemetry may be delayed');
  }
}

await client.shutdown();
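
The boolean result of flush() suggests a "finish within the deadline or report false" contract. Below is a minimal sketch of that contract for an arbitrary promise. settleWithin is a hypothetical helper, not part of the SDK, and how flush() is implemented internally is an assumption.

```typescript
// Resolve to true if `work` finishes within timeoutMs, false if the timer wins.
// A rejection from `work` is passed through to the caller.
function settleWithin(work: Promise<void>, timeoutMs: number): Promise<boolean> {
  return new Promise<boolean>((resolve, reject) => {
    const timer = setTimeout(() => resolve(false), timeoutMs);
    work.then(
      () => {
        clearTimeout(timer);
        resolve(true);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      }
    );
  });
}
```

Note that a false result only means the deadline passed. The underlying export is not cancelled and may still complete later, which matches the "telemetry may be delayed" warning in the example above.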

Error Tracking

Track errors with full context:

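const startedAt = Date.now();
try {
  const prediction = await model.predict(features);
  // Elapsed seconds, matching the model.latency_seconds metric
  await client.recordPrediction({ latency: (Date.now() - startedAt) / 1000 });
} catch (error) {
  // recordError expects an Error, so wrap anything else that was thrown
  const err = error instanceof Error ? error : new Error(String(error));
  const sanitizedMessage = client.recordError(
    err,
    (Date.now() - startedAt) / 1000,
    'pred-123',
    { model_version: '2.0' }
  );
  // sanitizedMessage has sensitive info masked
  throw error;
}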
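
This README does not say which patterns recordError() masks. As an illustration of the general idea only, the sketch below masks email addresses and long digit runs. The sanitizeMessage function, both patterns, and the placeholder tokens are assumptions, not the SDK's rules.

```typescript
// Hypothetical masking rules, for illustration only.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const LONG_NUMBER_PATTERN = /\d{6,}/g;

// Replace anything that looks like an email address or a long identifier
// before the message is attached to telemetry.
function sanitizeMessage(message: string): string {
  return message
    .replace(EMAIL_PATTERN, '[EMAIL]')
    .replace(LONG_NUMBER_PATTERN, '[NUMBER]');
}
```

Pattern-based masking like this is best-effort. In a healthcare setting it should complement, not replace, keeping identifiers out of error messages in the first place.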

Model Evaluation

Record evaluation metrics:

client.recordEvaluation({
  mae: 0.15,
  rmse: 0.23,
  dataset: 'test',
  attributes: { model_version: '2.0' }
});
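
recordEvaluation() takes precomputed values. If you need to derive them first, MAE is the mean of the absolute errors and RMSE is the square root of the mean squared error. The helpers below are a minimal sketch; Sample, meanAbsoluteError, and rootMeanSquaredError are illustrative names and not part of the SDK.

```typescript
// One evaluation sample: the ground-truth value and the model's prediction.
interface Sample {
  actual: number;
  predicted: number;
}

function requireSamples(samples: Sample[]): void {
  if (samples.length === 0) {
    throw new Error('at least one sample is required');
  }
}

// MAE: average of |actual - predicted|.
function meanAbsoluteError(samples: Sample[]): number {
  requireSamples(samples);
  const total = samples.reduce((sum, s) => sum + Math.abs(s.actual - s.predicted), 0);
  return total / samples.length;
}

// RMSE: square root of the average of (actual - predicted)^2.
function rootMeanSquaredError(samples: Sample[]): number {
  requireSamples(samples);
  const total = samples.reduce((sum, s) => sum + (s.actual - s.predicted) ** 2, 0);
  return Math.sqrt(total / samples.length);
}
```

The results can then be passed as the mae and rmse fields of recordEvaluation(). RMSE penalizes large errors more heavily than MAE, so it is never smaller than MAE on the same samples.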

Metric Filtering

Drop expensive system metrics:

const client = await MCAClient.create({
  serviceName: 'my-model',
  filterSystemMetrics: true,  // Drop process.*, nodejs.*, etc.
});
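
Conceptually, the filter is a name-prefix check applied before export. The sketch below shows that check in isolation. The prefix list covers only the two prefixes named in this README (the "etc." above implies the SDK drops more), and isSystemMetric and keepModelMetrics are hypothetical names.

```typescript
// Only the prefixes this README names; the SDK's full list may be longer.
const SYSTEM_METRIC_PREFIXES = ['process.', 'nodejs.'];

// True when a metric name belongs to one of the system namespaces.
function isSystemMetric(name: string): boolean {
  return SYSTEM_METRIC_PREFIXES.some((prefix) => name.startsWith(prefix));
}

// Keep only the metrics that would survive the filter.
function keepModelMetrics(names: string[]): string[] {
  return names.filter((name) => !isSystemMetric(name));
}
```

For example, model.latency_seconds passes the filter, while process.cpu.time and nodejs.eventloop.delay are dropped.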

Registry Integration

The Node.js SDK supports two patterns for registry integration:

Explicit Fetch (Python-style)

Use when registry data is mandatory:

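import { MCAClient } from 'mca-sdk';
import { RegistryClient } from 'mca-sdk/registry';

// registryUrl, token, and modelId are supplied by your application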

const registryClient = new RegistryClient(registryUrl, token);
const modelConfig = await registryClient.fetchModelConfig(modelId);

const client = await MCAClient.create({
  serviceName: modelConfig.serviceName,
  teamName: modelConfig.teamName,
  modelId: modelId,
  // ... other config
});

Behavior: The fetch blocks startup, and client creation fails immediately if the registry is unavailable.

Implicit Fetch (Node.js Enhancement)

Use when registry provides optional enhancements:

const client = await MCAClient.create({
  serviceName: 'my-service',
  modelId: 'mdl-001',
  teamName: 'my-team',
  registryUrl: 'https://registry.example.com',  // Optional
  strictValidation: false,  // Graceful degradation
  // ... other config
});

Behavior:

Fetches config from registry in background
Logs warning if fetch fails but continues
Automatically refreshes config periodically (if refreshIntervalSecs > 0)
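
The behavior described above (keep working with the last known config, warn on failure, refresh on a timer) can be sketched as a small cache class. This illustrates the pattern and is not the SDK's registry client. RefreshingConfig and its method names are hypothetical.

```typescript
// Holds the last successfully fetched value and refreshes it on a timer.
class RefreshingConfig<T> {
  private value: T;
  private readonly fetcher: () => Promise<T>;
  private readonly warn: (message: string) => void;
  private timer: ReturnType<typeof setInterval> | undefined;

  constructor(
    fetcher: () => Promise<T>,
    fallback: T,
    warn: (message: string) => void = (message) => console.warn(message)
  ) {
    this.fetcher = fetcher;
    this.value = fallback;
    this.warn = warn;
  }

  // Latest known config: the fallback until the first successful fetch.
  get current(): T {
    return this.value;
  }

  // Try one fetch. On failure, log a warning and keep the cached value.
  async refresh(): Promise<boolean> {
    try {
      this.value = await this.fetcher();
      return true;
    } catch (err) {
      this.warn(`registry fetch failed, keeping cached config: ${String(err)}`);
      return false;
    }
  }

  // Fetch once in the background, then periodically if the interval is positive.
  start(refreshIntervalSecs: number): void {
    void this.refresh();
    if (refreshIntervalSecs > 0) {
      this.timer = setInterval(() => void this.refresh(), refreshIntervalSecs * 1000);
    }
  }

  // Stop the periodic refresh, for example during shutdown.
  stop(): void {
    if (this.timer !== undefined) {
      clearInterval(this.timer);
      this.timer = undefined;
    }
  }
}
```

Because refresh() never throws, a registry outage degrades to stale or default configuration instead of failing the caller.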

See TESTING_WORKBENCH.md for detailed examples and a comparison with the Python SDK.

Testing

See TESTING_WORKBENCH.md for complete testing instructions.

# Unit tests
npm test

# Integration test (requires OTel Collector)
npx ts-node test-integration-v0.4.ts

# Registry integration test
npx ts-node test-registry-integration.ts

# Performance test
npx ts-node test-performance.ts

Requirements

Node.js 16+
OpenTelemetry Collector running on port 4318 (for production use)
Peer dependency: @opentelemetry/api@^1.9.0 (installed automatically by npm 7+)

License

MIT

API Parity with Python SDK v0.8.x

As of v0.3.0, this Node.js SDK provides full feature parity with Python MCA SDK v0.8.6:

Core Features:

✅ Prediction tracking (recordPrediction)
✅ Custom metrics (recordMetric, recordCounter, recordGauge)
✅ Mid-lifecycle flush (flush())

Agentic AI Tracking (v0.8.0+):

✅ Goal tracking (recordGoalStarted, recordGoalCompleted)
✅ Tool execution tracking (trackTool)
✅ Human intervention tracking (recordHumanIntervention)
✅ Policy violation tracking (recordPolicyViolation)
✅ Background cleanup for stale goals (1-hour timeout)

Advanced Tracking:

✅ Error tracking with span (recordError)
✅ Model evaluation metrics (recordEvaluation - MAE, RMSE)
✅ Metric filtering (filterSystemMetrics - v0.8.4)

Metric Names (Consistent Across SDKs):

model.predictions_total, model.latency_seconds, model.errors_total
model.mae, model.rmse
agentic.goals_started_total, agentic.goals_completed_total, agentic.goals_failed_total
agentic.goal_duration_seconds, agentic.tool_calls_total, agentic.tool_latency_seconds
agentic.human_interventions_total, agentic.policy_violations_total

Resource Attributes:

service.name, model.id, model.version, team.name, model.type

See the Python SDK documentation for cross-language compatibility details.