@weaveaijs/mcp-observatory
v0.3.0
Observability and APM toolkit for MCP servers
MCP Observatory - Node.js Edition
A modern TypeScript/Node.js implementation of an observability and APM toolkit for MCP (Model Context Protocol) servers, featuring a two-phase execution pattern for high-risk tool calls.
Overview
MCP Observatory provides:
- Tracer & Observability: Track spans, tokens, costs, and latency for every operation
- Proposal/Commit Pattern: Two-phase execution for high-risk operations with deterministic fallbacks
- Risk Assessment: Hallucination scoring, tool risk categorization, and numeric variance detection
- Cost Tracking: Token estimation and model-aware pricing for various AI providers
- Storage Options: In-memory or PostgreSQL-backed proposal/commit storage
Architecture
┌─ Core Layer
│ ├─ Tracer: Span creation and lifecycle management
│ ├─ TraceContext: Span data model with metrics
│ └─ InvocationWrapper: Policy-driven execution wrapper
│
├─ Proposal/Commit Layer
│ ├─ ToolProposer: Risk scoring and proposal generation
│ ├─ TokenManager: HMAC-signed token issue/verify
│ ├─ CommitVerifier: Multi-stage verification with replay protection
│ └─ Storage: In-memory or PostgreSQL backends
│
├─ Assessment Layer
│ ├─ Hallucination Scoring: Output instability, grounding, variance
│ ├─ Risk Scoring: Tool classification by destructiveness/scope
│ └─ Hashing: Stable canonical JSON and prompt normalization
│
└─ Demo Layer
   ├─ MCPServer: Example server with propose/commit handlers
   └─ MCPClient: Example client with dual-measurement support
Quick Start
Installation
npm install
npm run build
Running the Demo
# Without database
npm run demo
# With PostgreSQL (optional)
export MCP_OBSERVATORY_PG_DSN='postgresql://user:pass@localhost:5432/postgres'
psql "$MCP_OBSERVATORY_PG_DSN" -f sql/schema.sql
npm run demo
Running Tests
npm test
Core Concepts
Spans & Tracing
Track execution metrics for any operation:
import { Tracer } from 'mcp-observatory';
const tracer = new Tracer('my-service');
await tracer.withSpan(async (span) => {
  span.inputTokens = 150;
  span.outputTokens = 250;
  span.costUsd = 0.0045;
  // Your async operation here
}, { model: 'gpt-4o' });
Two-Phase Execution
For high-risk operations, use the propose/commit pattern:
import { ToolProposer, CommitVerifier } from 'mcp-observatory';
const proposer = new ToolProposer();
const verifier = new CommitVerifier();
// Phase 1: Propose (no side effects)
const proposal = await proposer.propose({
  toolName: 'transfer_funds',
  toolArgs: { amount: 1000, to: 'account_456' },
  outputInstability: 0.3,
});
// Response includes commit token
console.log(proposal.status); // 'allowed' | 'blocked' | 'review'
// Phase 2: Commit (only if proposal token is valid)
if (proposal.commitToken) {
  const verification = verifier.verify({
    token: proposal.commitToken,
    proposalId: proposal.proposalId,
    toolName: 'transfer_funds',
    toolArgs: { amount: 1000, to: 'account_456' },
  });
  if (verification.canExecute) {
    // Execute the operation
  }
}
Risk Assessment
Compute composite risk scores from multiple signals:
import {
  computeHallucinationRiskScore,
  computeRiskScore,
  categorizeRisk,
} from 'mcp-observatory';
// Hallucination risk
const hallucScore = computeHallucinationRiskScore({
  outputInstability: 0.4,
  groundingScore: 0.7,
  numericVarianceScore: 0.1,
  selfConsistencyScore: 0.9,
  toolClaimMismatch: false,
});
// Tool risk categorization
const signals = categorizeRisk('delete_user', {});
const riskScore = computeRiskScore(signals);
Cost Tracking
Estimate tokens and costs across models:
import { estimateTokens, estimateCost, getPricing } from 'mcp-observatory';
const text = 'Generate a deployment plan';
const tokens = estimateTokens(text, 'gpt-4o');
const cost = estimateCost(tokens, 'gpt-4o', 'model');
console.log(`Tokens: ${tokens}, Cost: $${cost.toFixed(4)}`);
Advanced Features
Dual Measurement (Shadow Runs)
Compare two execution paths:
import { InvocationWrapper } from 'mcp-observatory';
const wrapper = new InvocationWrapper('my-service');
const result = await wrapper.invoke({
  source: 'agent',
  model: 'gpt-4o',
  prompt: 'Analyze this request',
  call: () => primaryCall(),
  dualInvoke: true,
  shadowCall: () => shadowCall(),
});
console.log(`Primary cost: $${result.span.costUsd}`);
console.log(`Shadow cost: $${result.shadowSpan?.costUsd}`);
Storage Options
In-Memory (Default)
import { InMemoryProposalStorage } from 'mcp-observatory';
const storage = new InMemoryProposalStorage();
PostgreSQL
import { PostgresProposalStorage } from 'mcp-observatory';
const storage = new PostgresProposalStorage(
  process.env.MCP_OBSERVATORY_PG_DSN!
);
// Later
await storage.close();
Module Reference
Core (src/core)
- tracer.ts: Span creation and lifecycle
- context.ts: TraceContext data model with metrics
- wrapper.ts: InvocationWrapper with policy decisioning
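To make the span lifecycle concrete, here is a minimal, self-contained sketch of what a span helper could look like. It is not the package's tracer: `SpanSketch` and `withSpanSketch` are hypothetical names, and only the `inputTokens`, `outputTokens`, `costUsd`, and `model` fields come from the example earlier in this README.

```typescript
// Hypothetical span shape; everything beyond the token, cost, and model fields is assumed.
interface SpanSketch {
  name: string;
  model?: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
  startedAtMs: number;
  durationMs?: number;
  error?: string;
}

// Runs `fn` inside a span, recording latency and any error before rethrowing.
async function withSpanSketch<T>(
  name: string,
  fn: (span: SpanSketch) => Promise<T>,
  attrs: { model?: string } = {},
): Promise<{ result: T; span: SpanSketch }> {
  const span: SpanSketch = {
    name,
    model: attrs.model,
    inputTokens: 0,
    outputTokens: 0,
    costUsd: 0,
    startedAtMs: Date.now(),
  };
  try {
    const result = await fn(span);
    return { result, span };
  } catch (err) {
    span.error = err instanceof Error ? err.message : String(err);
    throw err;
  } finally {
    // Runs on success and failure alike, so latency is always recorded.
    span.durationMs = Date.now() - span.startedAtMs;
  }
}
```

The `finally` block is the design point: the span is closed whether the wrapped operation resolves or throws.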
Proposal/Commit (src/proposal)
- token.ts: TokenManager for HMAC-signed token issue/verify
- proposer.ts: ToolProposer with risk-based decisioning
- verifier.ts: CommitVerifier with replay protection
- storage.ts: In-memory and PostgreSQL storage backends
Assessment (src/hallucination, src/risk)
- scoring.ts (hallucination): Risk scoring from instability, grounding, variance
- scoring.ts (risk): Tool risk categorization by destructiveness/scope
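One plausible way to fold the hallucination signals into a single score is a clamped weighted sum. The sketch below is an illustration only: the weights, the mismatch penalty, and the name `hallucinationRiskSketch` are assumptions, and the package's actual formula may differ.

```typescript
interface HallucinationSignalsSketch {
  outputInstability: number;    // 0..1, higher is riskier
  groundingScore: number;       // 0..1, higher is safer
  numericVarianceScore: number; // 0..1, higher is riskier
  selfConsistencyScore: number; // 0..1, higher is safer
  toolClaimMismatch: boolean;
}

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Hypothetical weights chosen for illustration; they sum to 1.
const SKETCH_WEIGHTS = { instability: 0.3, grounding: 0.3, variance: 0.2, consistency: 0.2 };

function hallucinationRiskSketch(s: HallucinationSignalsSketch): number {
  // "Safer" signals are inverted so that every term grows with risk.
  const base =
    SKETCH_WEIGHTS.instability * clamp01(s.outputInstability) +
    SKETCH_WEIGHTS.grounding * (1 - clamp01(s.groundingScore)) +
    SKETCH_WEIGHTS.variance * clamp01(s.numericVarianceScore) +
    SKETCH_WEIGHTS.consistency * (1 - clamp01(s.selfConsistencyScore));
  // Assumed flat penalty when the claimed tool usage does not match what actually ran.
  return clamp01(base + (s.toolClaimMismatch ? 0.25 : 0));
}
```

With these assumed weights, the signal values used in the Risk Assessment example (0.4, 0.7, 0.1, 0.9, no mismatch) come out at roughly 0.25.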
Utilities (src/utils, src/cost)
- hashing.ts: Stable canonical JSON and prompt normalization
- time.ts: Time utilities
- tokenizer.ts: Token estimation with model-specific limits
- pricing.ts: Cost estimation for GPT and Claude models
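As a rough illustration of how token estimation and model-aware pricing fit together, the sketch below uses a characters-per-token heuristic and a small price table. The model names, the prices, and both function names are placeholders invented for this example; they are not the package's tokenizer or its pricing data.

```typescript
// Hypothetical per-1K-token prices in USD; real prices vary by provider and change over time.
const PRICE_PER_1K_SKETCH: Record<string, { input: number; output: number }> = {
  "example-small": { input: 0.0005, output: 0.0015 },
  "example-large": { input: 0.005, output: 0.015 },
};

// Rough heuristic: about four characters per token for English text.
function estimateTokensSketch(text: string): number {
  return Math.ceil(text.length / 4);
}

// Prices input and output tokens separately, since providers usually bill them at different rates.
function estimateCostSketch(inputTokens: number, outputTokens: number, model: string): number {
  const price = PRICE_PER_1K_SKETCH[model];
  if (!price) throw new Error(`No pricing configured for model: ${model}`);
  return (inputTokens / 1000) * price.input + (outputTokens / 1000) * price.output;
}
```

Failing loudly on an unknown model is a deliberate choice in this sketch: silently returning zero would hide missing pricing entries in cost reports.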
Demo (src/demo)
- server.ts: Example MCP server with propose/commit handlers
- client.ts: Example client with dual-measurement support
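The idea behind dual measurement can be sketched without the library: run the primary and shadow calls side by side, time both, and never let a shadow failure affect the primary result. `measureSketch` and `dualInvokeSketch` are hypothetical helpers written for this illustration, not the InvocationWrapper API.

```typescript
interface MeasuredSketch<T> {
  value?: T;
  error?: string;
  durationMs: number;
}

// Times a call and captures its error instead of throwing, so both paths can be compared.
async function measureSketch<T>(call: () => Promise<T>): Promise<MeasuredSketch<T>> {
  const start = Date.now();
  try {
    return { value: await call(), durationMs: Date.now() - start };
  } catch (err) {
    return {
      error: err instanceof Error ? err.message : String(err),
      durationMs: Date.now() - start,
    };
  }
}

// Runs both paths concurrently; only a primary failure is surfaced to the caller.
async function dualInvokeSketch<T>(primary: () => Promise<T>, shadow: () => Promise<T>) {
  const [p, s] = await Promise.all([measureSketch(primary), measureSketch(shadow)]);
  if (p.error !== undefined) throw new Error(p.error);
  return { result: p.value as T, primary: p, shadow: s };
}
```

Because `measureSketch` never rejects, `Promise.all` always settles with both measurements, and the shadow path's error is reported as data rather than thrown.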
Testing
npm test
Tests include:
- Tracer span creation and inheritance
- Token issuance, verification, and expiry
- Proposal scoring and status determination
- Hash stability and prompt normalization
- Risk and hallucination scoring
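Hash stability, as listed above, usually means that two objects with the same contents hash identically regardless of key order. The following is a minimal sketch of that idea using sorted-key serialization and SHA-256 from Node's built-in crypto module. `canonicalJsonSketch` and `stableHashSketch` are hypothetical names, and the package's own canonicalization rules may differ.

```typescript
import { createHash } from "node:crypto";

type JsonSketch = null | boolean | number | string | JsonSketch[] | { [key: string]: JsonSketch };

// Serializes with object keys sorted at every level, so key order never changes the output.
function canonicalJsonSketch(value: JsonSketch): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map((v) => canonicalJsonSketch(v)).join(",")}]`;
  const keys = Object.keys(value).sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalJsonSketch(value[k])}`).join(",")}}`;
}

// Hashes the canonical form; equal contents give equal digests.
function stableHashSketch(value: JsonSketch): string {
  return createHash("sha256").update(canonicalJsonSketch(value)).digest("hex");
}
```

Array order is preserved on purpose: reordering array elements changes meaning, while reordering object keys does not.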
API Endpoints (Demo Server)
The demo server exposes:
- transfer_funds_propose(amount, to) → {proposalId, status, commitToken?}
- transfer_funds_commit(proposalId, commitToken, amount, to) → {success, transactionId?}
Security Rules
Commit verification enforces:
- ✅ Token signature is valid
- ✅ Token not expired (5-minute default)
- ✅ Proposal exists and matches proposal_id
- ✅ Tool name matches token payload
- ✅ Args hash matches token payload (prevents tampering)
- ✅ Nonce not already used (replay protection)
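The six checks above can be illustrated with a small HMAC-signed token built on Node's crypto module. This is a sketch under stated assumptions, not the package's TokenManager or CommitVerifier: the token layout (base64url payload, a dot, a hex signature), the hard-coded secret, the flat-argument hashing, and every name below are invented for illustration.

```typescript
import { createHash, createHmac, randomBytes, timingSafeEqual } from "node:crypto";

interface TokenPayloadSketch {
  proposalId: string;
  toolName: string;
  argsHash: string;
  nonce: string;
  expiresAtMs: number;
}
type FlatArgs = Record<string, string | number | boolean>;
type VerifyResult = { canExecute: boolean; reason?: string };

const SKETCH_SECRET = "demo-secret-change-me"; // hypothetical; load real secrets from configuration
const SKETCH_TTL_MS = 5 * 60 * 1000; // mirrors the 5-minute default listed above
const knownProposals = new Set<string>();
const usedNonces = new Set<string>();

// Flat args only: sorting the keys makes the hash independent of property order.
const hashArgs = (args: FlatArgs): string =>
  createHash("sha256").update(JSON.stringify(args, Object.keys(args).sort())).digest("hex");
const sign = (body: string): string =>
  createHmac("sha256", SKETCH_SECRET).update(body).digest("hex");
const deny = (reason: string): VerifyResult => ({ canExecute: false, reason });

function issueTokenSketch(proposalId: string, toolName: string, args: FlatArgs, nowMs = Date.now()): string {
  knownProposals.add(proposalId);
  const payload: TokenPayloadSketch = {
    proposalId,
    toolName,
    argsHash: hashArgs(args),
    nonce: randomBytes(16).toString("hex"),
    expiresAtMs: nowMs + SKETCH_TTL_MS,
  };
  const body = Buffer.from(JSON.stringify(payload)).toString("base64url");
  return `${body}.${sign(body)}`;
}

function verifyTokenSketch(
  token: string, proposalId: string, toolName: string, args: FlatArgs, nowMs = Date.now(),
): VerifyResult {
  const [body, sig] = token.split(".");
  if (!body || !sig) return deny("malformed token");
  // 1. Signature: constant-time comparison once the lengths are known to match.
  const expected = Buffer.from(sign(body));
  const given = Buffer.from(sig);
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return deny("bad signature");
  const p = JSON.parse(Buffer.from(body, "base64url").toString()) as TokenPayloadSketch;
  if (nowMs > p.expiresAtMs) return deny("expired");                          // 2. expiry
  if (p.proposalId !== proposalId || !knownProposals.has(proposalId)) return deny("unknown proposal"); // 3. proposal
  if (p.toolName !== toolName) return deny("tool mismatch");                  // 4. tool name
  if (p.argsHash !== hashArgs(args)) return deny("args mismatch");            // 5. args hash
  if (usedNonces.has(p.nonce)) return deny("replayed");                       // 6. nonce
  usedNonces.add(p.nonce);
  return { canExecute: true };
}
```

The nonce is consumed only after every other check passes, so a rejected attempt (for example, with tampered arguments) does not burn a token that the legitimate caller can still use.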
Database Setup (Optional)
Apply the schema to an existing PostgreSQL database:
export MCP_OBSERVATORY_PG_DSN='postgresql://user:pass@localhost:5432/postgres'
psql "$MCP_OBSERVATORY_PG_DSN" -f sql/schema.sql
Schema includes:
- proposals: Proposal records with status and expiration
- commits: Execution tracking with timestamps
- nonces: Used nonces for replay prevention
- tool_prompt_baselines: Prompt drift baseline storage
- traces: Full execution traces with span hierarchy
- cleanup_expired_proposals(): Automated cleanup function
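For readers who want to see what the storage abstraction might look like in code, here is a hedged sketch of an interface covering proposals, nonces, and expiry cleanup, with an in-memory implementation. The interface, its method names, and the record shape are assumptions made for this example; only the status values come from this README, and the package's real storage classes may be shaped differently.

```typescript
interface ProposalRecordSketch {
  proposalId: string;
  toolName: string;
  status: "allowed" | "blocked" | "review";
  expiresAtMs: number;
}

interface ProposalStorageSketch {
  saveProposal(record: ProposalRecordSketch): Promise<void>;
  getProposal(proposalId: string): Promise<ProposalRecordSketch | undefined>;
  // Resolves to false when the nonce was already recorded (a replay).
  consumeNonce(nonce: string): Promise<boolean>;
  // Removes expired proposals and resolves to the number removed.
  cleanupExpired(nowMs: number): Promise<number>;
}

class InMemoryStorageSketch implements ProposalStorageSketch {
  private readonly proposals = new Map<string, ProposalRecordSketch>();
  private readonly nonces = new Set<string>();

  async saveProposal(record: ProposalRecordSketch): Promise<void> {
    this.proposals.set(record.proposalId, record);
  }

  async getProposal(proposalId: string): Promise<ProposalRecordSketch | undefined> {
    return this.proposals.get(proposalId);
  }

  async consumeNonce(nonce: string): Promise<boolean> {
    if (this.nonces.has(nonce)) return false;
    this.nonces.add(nonce);
    return true;
  }

  async cleanupExpired(nowMs: number): Promise<number> {
    let removed = 0;
    // Deleting entries during Map.forEach is safe in JavaScript.
    this.proposals.forEach((record, id) => {
      if (record.expiresAtMs <= nowMs) {
        this.proposals.delete(id);
        removed += 1;
      }
    });
    return removed;
  }
}
```

Keeping every method asynchronous, even in memory, means a database-backed implementation could satisfy the same interface without changing callers.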
Environment Variables
- MCP_OBSERVATORY_PG_DSN: PostgreSQL connection string (optional)
- NODE_ENV: Set to production for optimized builds
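Because the DSN is optional, an application typically falls back to in-memory storage when it is unset. The helper below sketches that decision as a pure function. `resolveStorageChoice` and the `StorageChoice` type are hypothetical, and wiring the result to a concrete storage class is left to the caller.

```typescript
type StorageChoice = { kind: "postgres"; dsn: string } | { kind: "memory" };

// Takes the environment as a parameter so the decision is easy to test.
function resolveStorageChoice(env: Record<string, string | undefined>): StorageChoice {
  const raw = env["MCP_OBSERVATORY_PG_DSN"];
  const dsn = raw === undefined ? "" : raw.trim();
  // An unset or blank DSN means "no database": fall back to in-memory storage.
  return dsn.length > 0 ? { kind: "postgres", dsn } : { kind: "memory" };
}
```

In a Node.js process this would be called as `resolveStorageChoice(process.env)`.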
Build & Development
npm run build # Compile TypeScript
npm run dev # Watch mode
npm run lint # Lint with ESLint
npm test # Run tests
npm run demo # Run demo
License
Apache License 2.0 (see LICENSE file)
Architecture Preservation
This Node.js rewrite preserves the original Python architecture:
- ✅ Core tracer and span model
- ✅ Two-phase proposal/commit pattern
- ✅ HMAC-signed token verification
- ✅ Multi-stage risk scoring
- ✅ Storage abstraction (in-memory + PostgreSQL)
- ✅ Hallucination detection signals
- ✅ Cost and token tracking
- ✅ Demo applications
- ✅ Test coverage
The module structure and APIs remain consistent, enabling code reuse and knowledge transfer between the Python and Node.js implementations.
