npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2025 – Pkg Stats / Ryan Hefner

@llm-dev-ops/sentinel-cli

v0.1.2

Published

Enterprise-grade anomaly detection and observability platform for LLM applications - NPM CLI wrapper

Downloads

298

Readme

LLM-Sentinel

Enterprise-Grade Anomaly Detection and Observability for LLM Applications

Rust License Build Kubernetes Helm

LLM-Sentinel is a production-ready, real-time anomaly detection and observability platform designed specifically for Large Language Model (LLM) applications. Built in Rust for maximum performance and reliability, it provides comprehensive monitoring, statistical anomaly detection, automated alerting, and deep observability to ensure the reliability, security, and cost-effectiveness of your LLM deployments.

Status: ✅ Production Ready - Full implementation with enterprise-grade deployment infrastructure

Table of Contents

Key Features

🔍 Advanced Multi-Algorithm Anomaly Detection

  • Z-Score Detection: Parametric anomaly detection for normally distributed metrics with configurable thresholds (default: 3.0σ)
  • IQR Detection: Non-parametric outlier detection using interquartile range (default: 1.5x multiplier)
  • MAD Detection: Robust outlier detection using median absolute deviation (default: 3.5 threshold)
  • CUSUM Detection: Cumulative sum change point detection for drift and regime shifts (default: 5.0 threshold, 0.5 drift)
  • Multi-Dimensional Baselines: Per-service, per-model statistical baselines with automatic updates
  • Configurable Sensitivity: Tune detection sensitivity for your specific use cases

📊 Comprehensive Monitoring

Monitor all critical LLM metrics:

  • Latency Spikes: Detect unusual response times (P50, P95, P99)
  • Token Usage Anomalies: Monitor prompt and completion token consumption patterns
  • Cost Anomalies: Track unexpected spending patterns and budget overruns
  • Error Rate Spikes: Identify service degradation and failures
  • Model Drift: Detect quality degradation over time
  • Usage Patterns: Identify suspicious or abnormal usage behavior
  • Throughput Changes: Monitor request rate variations

🚀 High Performance Architecture

  • 10,000+ events/second ingestion throughput
  • Sub-5ms P50 latency for anomaly detection
  • Lock-free concurrent baseline updates using DashMap
  • Batch processing for optimal InfluxDB writes (100-record batches)
  • Zero-copy parsing for maximum efficiency
  • Async/await throughout for non-blocking I/O
  • Memory-efficient streaming processing

🛡️ Production-Grade Reliability

  • Zero unsafe code: Memory safety guaranteed by Rust's type system
  • Comprehensive error handling: Type-safe Result propagation with detailed error context
  • Graceful shutdown: Proper signal handling (SIGTERM, SIGINT) with resource cleanup
  • Health checks: Liveness, readiness, and startup probes for Kubernetes
  • Circuit breakers: Automatic failure detection and recovery
  • Exponential backoff: Intelligent retry logic for transient failures
  • Connection pooling: Efficient resource management

🔔 Flexible Alerting System

  • RabbitMQ Integration: Topic-based routing with severity levels (info, warning, critical)
  • Webhook Delivery: HTTP POST with HMAC-SHA256 signatures for verification
  • Alert Deduplication: Configurable 5-minute window to prevent alert storms
  • Retry Logic: Exponential backoff with configurable max attempts (default: 3)
  • Priority Routing: Route critical alerts to different channels
  • Batch Alerting: Optional batching for high-volume scenarios

💾 Scalable Storage & Caching

  • InfluxDB v3: Time-series storage for telemetry with automatic downsampling
  • Moka Cache: High-performance in-memory cache (10,000 entry capacity)
  • Redis Support: Distributed caching for multi-instance deployments
  • Persistent Baselines: Baseline persistence to disk for quick restarts
  • Query API: REST endpoints for historical data retrieval and analysis
  • TTL Management: Automatic expiration of stale data (300s default)

📈 Rich Observability

  • 50+ Prometheus Metrics: Comprehensive instrumentation of all subsystems
  • 4 Pre-built Grafana Dashboards:
    • Anomaly Detection Dashboard
    • System Health Dashboard
    • Performance Metrics Dashboard
    • Alert Overview Dashboard
  • 50+ Alert Rules: Production-ready Prometheus alerting covering all failure modes
  • Distributed Tracing: OpenTelemetry support for request tracing
  • Structured Logging: JSON logs with configurable levels (trace, debug, info, warn, error)

☸️ Cloud-Native Deployment

  • Kubernetes-Ready: Production manifests with HPA, PDB, and network policies
  • Helm Chart: Parameterized chart for easy deployment and upgrades
  • Docker Images: Multi-stage builds with minimal attack surface (<50MB)
  • Horizontal Scaling: Support for 3-10+ replicas with auto-scaling
  • StatefulSet Support: Optional for baseline persistence
  • Service Mesh Compatible: Works with Istio, Linkerd, Consul Connect

🔒 Security & Compliance

  • Non-Root Containers: Runs as UID 1000 with dropped capabilities
  • Read-Only Filesystem: Root filesystem mounted read-only
  • Network Policies: Restrict ingress/egress to required services only
  • Secret Management: Support for Kubernetes secrets and external secret stores
  • PII Sanitization: Automatic detection and removal of sensitive data
  • Audit Logging: Complete audit trail of all anomalies and alerts
  • SBOM Generation: Software Bill of Materials for vulnerability tracking

Detection Capabilities

Real-Time Anomaly Detection

LLM-Sentinel provides four complementary detection algorithms that can be enabled individually or in combination:

1. Z-Score Detection (Parametric)

Best for: Metrics that follow a normal distribution

detection:
  zscore:
    threshold: 3.0           # Standard deviations from mean
    sensitivity: "medium"     # low, medium, high
    metrics:
      - latency_ms
      - total_tokens
      - cost_usd

Use Cases:

  • Latency spike detection
  • Token usage monitoring
  • Cost anomaly detection

2. IQR Detection (Non-Parametric)

Best for: Metrics with skewed distributions or outliers

detection:
  iqr:
    multiplier: 1.5          # IQR multiplier for outliers
    metrics:
      - latency_ms
      - total_tokens

Use Cases:

  • Robust outlier detection
  • Handling non-normal distributions
  • Resistant to extreme values

3. MAD Detection (Robust)

Best for: Metrics requiring high robustness to outliers

detection:
  mad:
    threshold: 3.5           # MAD threshold
    metrics:
      - latency_ms

Use Cases:

  • Ultra-robust detection
  • Minimal false positives
  • Gradual baseline shifts

4. CUSUM Detection (Change Point)

Best for: Detecting sustained changes and drift

detection:
  cusum:
    threshold: 5.0           # Cumulative threshold
    drift: 0.5              # Drift parameter
    metrics:
      - latency_ms

Use Cases:

  • Model performance degradation
  • Service quality drift
  • Gradual system changes

Baseline Management

  • Adaptive Baselines: Automatic baseline updates every 60 seconds
  • Multi-Dimensional: Separate baselines per service, model, and metric
  • Configurable Window: 1000-sample sliding window (configurable)
  • Minimum Samples: Require 10+ samples before detection (prevents cold-start false positives)
  • Persistence: Save/load baselines from disk for fast restarts

Deployment Options

Option 1: Docker Compose (Development)

Perfect for local development and testing:

# Start full environment (Kafka, InfluxDB, RabbitMQ, Redis, Prometheus, Grafana)
docker-compose up -d

# View logs
docker-compose logs -f sentinel

# Stop environment
docker-compose down

Includes:

  • Sentinel (3 replicas)
  • Kafka cluster (3 brokers)
  • InfluxDB v3
  • RabbitMQ with management UI
  • Redis
  • Prometheus
  • Grafana with pre-loaded dashboards

Option 2: Kubernetes (Production)

Production-grade deployment with all manifests:

# Apply all manifests
kubectl apply -f k8s/

# Or using kustomize
kubectl apply -k k8s/

# Check status
kubectl get pods -l app.kubernetes.io/name=sentinel

Includes:

  • Deployment with 3 replicas
  • HorizontalPodAutoscaler (3-10 replicas, CPU/memory targets)
  • PodDisruptionBudget (min 2 available)
  • NetworkPolicy (restricted ingress/egress)
  • ServiceMonitor (Prometheus integration)
  • Ingress with TLS

Option 3: Helm Chart (Recommended)

Easiest production deployment with full parameterization:

# Install with default values
helm install sentinel ./helm/sentinel \
  --set secrets.influxdbToken="your-token" \
  --set secrets.rabbitmqPassword="your-password"

# Install with custom values
helm install sentinel ./helm/sentinel -f production-values.yaml

# Upgrade
helm upgrade sentinel ./helm/sentinel

# Uninstall
helm uninstall sentinel

Helm Chart Features:

  • Full configuration parameterization (335+ options)
  • External secrets support (AWS Secrets Manager, Vault, etc.)
  • Multiple environment profiles (dev, staging, prod)
  • Init containers for dependency checking
  • Automatic ConfigMap/Secret generation
  • Post-install notes with helpful commands

See Helm Chart README for complete documentation.

Option 4: Binary (Bare Metal)

Direct installation on Linux servers:

# Build release binary
cargo build --release

# Install
sudo cp target/release/sentinel /usr/local/bin/

# Create config
sudo mkdir -p /etc/sentinel
sudo cp config/sentinel.yaml /etc/sentinel/

# Create systemd service
sudo cp deployments/systemd/sentinel.service /etc/systemd/system/
sudo systemctl enable sentinel
sudo systemctl start sentinel

Quick Start

Prerequisites

  • Rust 1.75+ (for building from source)
  • Kafka 2.8+ (message queue)
  • InfluxDB v3 (time-series database)
  • RabbitMQ 3.8+ (optional, for alerting)
  • Redis 6.0+ (optional, for distributed caching)

Local Development Setup

# Clone repository
git clone https://github.com/llm-devops/llm-sentinel.git
cd llm-sentinel

# Start infrastructure
docker-compose up -d kafka influxdb rabbitmq redis

# Build project
cargo build --release

# Run tests
cargo test --workspace

# Start service
export INFLUXDB_TOKEN="your-token"
export RABBITMQ_PASSWORD="your-password"
./target/release/sentinel --config config/sentinel.yaml

Kubernetes Quick Start

# Create namespace
kubectl create namespace sentinel

# Create secrets
kubectl create secret generic sentinel-secrets \
  --from-literal=influxdb-token="your-token" \
  --from-literal=rabbitmq-password="your-password" \
  -n sentinel

# Deploy with Helm
helm install sentinel ./helm/sentinel \
  --namespace sentinel \
  --set secrets.influxdbToken="your-token" \
  --set secrets.rabbitmqPassword="your-password"

# Verify deployment
kubectl get pods -n sentinel
kubectl logs -f deployment/sentinel -n sentinel

# Access dashboards
kubectl port-forward svc/sentinel 8080:8080 -n sentinel
open http://localhost:8080/metrics

Verify Installation

# Check health
curl http://localhost:8080/health/live
curl http://localhost:8080/health/ready

# View metrics
curl http://localhost:8080/metrics

# Query recent anomalies
curl http://localhost:8080/api/v1/anomalies/recent?limit=10

# Query telemetry
curl "http://localhost:8080/api/v1/telemetry?service=chat-api&hours=1"

Architecture

High-Level Architecture

┌────────────────────────────────────────────────────────────────────┐
│                         LLM Applications                           │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐         │
│  │ OpenAI   │  │ Claude   │  │ Llama    │  │ Custom   │         │
│  │ API      │  │ API      │  │ API      │  │ LLM API  │         │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘         │
└───────┼─────────────┼─────────────┼──────────────┼────────────────┘
        │             │             │              │
        └─────────────┴─────────────┴──────────────┘
                      │
                      ▼ (Telemetry Events)
        ┌─────────────────────────────┐
        │      Apache Kafka           │
        │   (llm.telemetry topic)     │
        └─────────────┬───────────────┘
                      │
                      ▼
┌────────────────────────────────────────────────────────────────────┐
│                         LLM-SENTINEL                               │
├────────────────────────────────────────────────────────────────────┤
│  ┌──────────────────────────────────────────────────────────────┐ │
│  │               Ingestion Layer (sentinel-ingestion)           │ │
│  │  ┌──────────────┐    ┌──────────────┐    ┌───────────────┐ │ │
│  │  │   Kafka      │───>│  OTLP/JSON   │───>│  Validation   │ │ │
│  │  │   Consumer   │    │   Parsing    │    │  & PII Filter │ │ │
│  │  └──────────────┘    └──────────────┘    └───────┬───────┘ │ │
│  └────────────────────────────────────────────────────┼─────────┘ │
│                                                        │           │
│  ┌──────────────────────────────────────────────────────────────┐ │
│  │          Detection Engine (sentinel-detection)               │ │
│  │  ┌──────────────────────────────────────────────────────┐   │ │
│  │  │           Baseline Manager (Multi-Dimensional)        │   │ │
│  │  │  Per Service × Model × Metric Statistical Baselines  │   │ │
│  │  └─────────────────────┬────────────────────────────────┘   │ │
│  │                        │                                     │ │
│  │  ┌───────────┐  ┌───────────┐  ┌───────────┐  ┌──────────┐│ │
│  │  │  Z-Score  │  │    IQR    │  │    MAD    │  │  CUSUM   ││ │
│  │  │ Detector  │  │ Detector  │  │ Detector  │  │ Detector ││ │
│  │  └─────┬─────┘  └─────┬─────┘  └─────┬─────┘  └────┬─────┘│ │
│  │        └──────────────┴───────────────┴──────────────┘      │ │
│  │                        │ (Anomalies)                         │ │
│  └────────────────────────┼─────────────────────────────────────┘ │
│                           │                                       │
│  ┌────────────────────────┼─────────────────────────────────────┐ │
│  │                        ▼                                      │ │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │ │
│  │  │  InfluxDB    │  │  RabbitMQ    │  │  Webhook     │      │ │
│  │  │  Storage     │  │  Alerting    │  │  Delivery    │      │ │
│  │  │              │  │              │  │              │      │ │
│  │  │ • Telemetry  │  │ • Topic      │  │ • HMAC       │      │ │
│  │  │ • Anomalies  │  │   Routing    │  │   Signature  │      │ │
│  │  │ • Query API  │  │ • Severity   │  │ • Retry      │      │ │
│  │  └──────────────┘  └──────────────┘  └──────────────┘      │ │
│  │  (sentinel-storage) (sentinel-alerting)                     │ │
│  └──────────────────────────────────────────────────────────────┘ │
│                                                                    │
│  ┌──────────────────────────────────────────────────────────────┐ │
│  │                REST API (sentinel-api)                        │ │
│  │  ┌────────────┐  ┌────────────┐  ┌────────────┐            │ │
│  │  │  Health    │  │  Metrics   │  │   Query    │            │ │
│  │  │  /health/* │  │ /metrics   │  │  /api/v1/* │            │ │
│  │  └────────────┘  └────────────┘  └────────────┘            │ │
│  └──────────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────────┘
                      │              │
                      ▼              ▼
        ┌──────────────────┐  ┌──────────────────┐
        │   Prometheus     │  │     Grafana      │
        │   (Metrics)      │  │   (Dashboards)   │
        └──────────────────┘  └──────────────────┘

Component Architecture

sentinel-core

  • Error types and handling
  • Configuration models
  • Telemetry event types
  • Shared utilities

sentinel-ingestion

  • Kafka consumer with group management
  • OTLP/JSON parsing
  • Schema validation
  • PII detection and sanitization
  • Configurable message handling

sentinel-detection

  • Statistical baseline management
  • Four detection algorithms (Z-Score, IQR, MAD, CUSUM)
  • Multi-dimensional baseline tracking
  • Lock-free concurrent updates
  • Baseline persistence

sentinel-storage

  • InfluxDB v3 client with batch writes
  • In-memory cache (Moka)
  • Redis distributed cache
  • Query API for historical data
  • Automatic TTL management

sentinel-alerting

  • RabbitMQ topic publisher
  • Webhook HTTP delivery
  • Alert deduplication (5-minute window)
  • Exponential backoff retry
  • HMAC signature generation

sentinel-api

  • REST API server (Axum)
  • Health check endpoints
  • Prometheus metrics exporter
  • Query endpoints for telemetry and anomalies
  • CORS support

Observability

Grafana Dashboards

Four production-ready dashboards included in deployments/grafana/dashboards/:

1. Anomaly Detection Dashboard

  • Anomaly detection rate by severity (timeseries)
  • Total anomaly rate gauge
  • Anomalies in last 24h (stat)
  • Anomaly types distribution (bar chart)
  • Top services by anomalies
  • Detection latency percentiles (P50, P95, P99)

2. System Health Dashboard

  • Service status (up/down)
  • Active instance count
  • Uptime tracking
  • Memory usage by instance
  • CPU usage by instance
  • Event processing throughput
  • Error rates by component

3. Performance Metrics Dashboard

  • Events processed per second
  • Detection P95 latency
  • Cache hit rate percentage
  • Pipeline throughput (Kafka → Detection → Storage)
  • Storage write rate
  • API request rate

4. Alert Overview Dashboard

  • Alerts sent (last hour)
  • Alerts deduplicated (last hour)
  • Alert failures (last hour)
  • RabbitMQ publishes by severity
  • Webhook delivery success rate
  • Alert distribution by type

Prometheus Metrics

50+ metrics exported at /metrics:

Ingestion Metrics:

  • sentinel_events_received_total - Total events consumed from Kafka
  • sentinel_events_processed_total - Successfully processed events
  • sentinel_ingestion_errors_total - Ingestion errors by type
  • sentinel_validation_failures_total - Validation failures
  • sentinel_kafka_messages_consumed_total - Kafka messages consumed
  • sentinel_kafka_consumption_errors_total - Kafka errors

Detection Metrics:

  • sentinel_anomalies_detected_total - Anomalies by severity and detector
  • sentinel_detection_latency_seconds - Detection latency histogram
  • sentinel_detection_errors_total - Detection errors
  • sentinel_baseline_updates_total - Baseline update count
  • sentinel_baseline_samples - Current baseline sample counts

Storage Metrics:

  • sentinel_storage_writes_total - Successful storage writes
  • sentinel_storage_errors_total - Storage errors
  • sentinel_cache_hits_total - Cache hits
  • sentinel_cache_misses_total - Cache misses
  • sentinel_cache_size - Current cache size

Alerting Metrics:

  • sentinel_alerts_sent_total - Alerts sent by channel
  • sentinel_alerts_deduplicated_total - Deduplicated alerts
  • sentinel_alert_failures_total - Alert delivery failures
  • sentinel_rabbitmq_publishes_total - RabbitMQ publishes
  • sentinel_webhook_deliveries_total - Webhook deliveries
  • sentinel_webhook_failures_total - Webhook failures

Prometheus Alerts

50+ production-ready alert rules in deployments/prometheus/alerts/sentinel-alerts.yaml:

Service Health:

  • SentinelServiceDown (1m down)
  • SentinelHighRestartRate (frequent restarts)
  • SentinelInsufficientInstances (<2 instances)

Performance:

  • SentinelHighLatency (P95 >50ms)
  • SentinelVeryHighLatency (P95 >100ms)
  • SentinelLowThroughput (<100 events/sec)

Errors:

  • SentinelHighErrorRate (>10 errors/sec)
  • SentinelIngestionErrors (>5 errors/sec)
  • SentinelDetectionErrors (>5 errors/sec)
  • SentinelStorageErrors (>5 errors/sec)

Resources:

  • SentinelHighMemoryUsage (>1.5GB)
  • SentinelCriticalMemoryUsage (>1.8GB)
  • SentinelHighCPUUsage (>80%)

Anomalies:

  • SentinelAnomalySpike (>50 anomalies/sec)
  • SentinelCriticalAnomalies (>5 critical/sec)
  • SentinelNoAnomalies (0 anomalies for 6h - detection health check)

API Reference

Health Endpoints

Liveness Probe

GET /health/live

Response: 200 OK
{
  "status": "healthy",
  "timestamp": "2024-11-06T10:30:00Z"
}

Readiness Probe

GET /health/ready

Response: 200 OK (when ready to accept traffic)
Response: 503 Service Unavailable (when not ready)

Metrics Endpoint

Prometheus Metrics

GET /metrics

Response: 200 OK
# HELP sentinel_events_processed_total Total events processed
# TYPE sentinel_events_processed_total counter
sentinel_events_processed_total{service="chat-api"} 12453
...

Query Endpoints

Query Recent Telemetry

GET /api/v1/telemetry?service={service}&model={model}&hours={hours}

Example:
GET /api/v1/telemetry?service=chat-api&model=gpt-4&hours=24

Response: 200 OK
{
  "data": [...],
  "count": 1523,
  "timeRange": {
    "start": "2024-11-05T10:30:00Z",
    "end": "2024-11-06T10:30:00Z"
  }
}

Query Anomalies

GET /api/v1/anomalies?severity={severity}&hours={hours}&limit={limit}

Example:
GET /api/v1/anomalies?severity=critical&hours=1&limit=50

Response: 200 OK
{
  "anomalies": [
    {
      "id": "anom-123",
      "timestamp": "2024-11-06T10:25:00Z",
      "service": "chat-api",
      "model": "gpt-4",
      "detector": "zscore",
      "metric": "latency_ms",
      "value": 15234.5,
      "baseline_mean": 1234.5,
      "baseline_stddev": 234.2,
      "z_score": 59.8,
      "severity": "critical"
    }
  ],
  "count": 3
}

Query Recent Anomalies

GET /api/v1/anomalies/recent?limit={limit}

Example:
GET /api/v1/anomalies/recent?limit=10

Response: 200 OK (last 10 anomalies)

Query Baseline Statistics

GET /api/v1/baselines?service={service}&model={model}

Response: 200 OK
{
  "baselines": [
    {
      "key": "chat-api:gpt-4:latency_ms",
      "samples": 1000,
      "mean": 1234.5,
      "stddev": 234.2,
      "median": 1205.3,
      "p95": 1687.2,
      "p99": 2103.4
    }
  ]
}

Example Producers

Python Producer

Full-featured Python example for sending telemetry to Kafka:

cd examples/python

# Install dependencies
pip install -r requirements.txt

# Run with defaults (20 normal + 5 anomalous events)
python producer.py --brokers localhost:9092

# Run continuously
python producer.py --continuous

# Custom configuration
python producer.py \
  --brokers kafka-0:9092,kafka-1:9092 \
  --topic llm.telemetry \
  --normal-events 50 \
  --anomalous-events 10

Features:

  • Configurable event generation
  • Simulates 4 anomaly types (high latency, high tokens, high cost, suspicious patterns)
  • Kafka integration with guaranteed delivery
  • Continuous mode for load testing

See Python Example README

Go Producer

High-performance Go example for production use:

cd examples/go

# Build
go build -o producer producer.go

# Run with defaults
./producer -brokers localhost:9092

# Run continuously
./producer -continuous

# Custom configuration
./producer \
  -brokers kafka-0:9092,kafka-1:9092 \
  -topic llm.telemetry \
  -normal-events 100 \
  -anomalous-events 20

Features:

  • Native Go performance (10,000+ events/sec)
  • Graceful shutdown (SIGINT/SIGTERM)
  • Connection pooling and retries
  • Minimal memory footprint (<20MB)

See Go Example README

Integration Example

Integrate telemetry into your LLM application:

from llm_sentinel import TelemetryProducer

# Initialize
producer = TelemetryProducer(
    brokers=["kafka:9092"],
    topic="llm.telemetry"
)

# After each LLM API call
event = producer.create_telemetry_event(
    service_name="my-chatbot",
    model_name="gpt-4",
    latency_ms=response_time,
    prompt_tokens=completion.usage.prompt_tokens,
    completion_tokens=completion.usage.completion_tokens,
    cost_usd=calculated_cost,
    user_id=user.id,
    session_id=session.id
)

producer.send_event(event)

Configuration

Complete Configuration Example

# Ingestion configuration
ingestion:
  kafka:
    brokers:
      - "kafka-0:9092"
      - "kafka-1:9092"
      - "kafka-2:9092"
    topic: "llm.telemetry"
    group_id: "sentinel-consumer"
    session_timeout_ms: 6000
    enable_auto_commit: false
    auto_offset_reset: "latest"
    max_poll_records: 500

  parsing:
    max_text_length: 10000
    enable_sanitization: true
    sanitize_patterns:
      - "password"
      - "api_key"
      - "secret"

  validation:
    min_latency_ms: 0.1
    max_latency_ms: 300000.0  # 5 minutes
    max_tokens: 100000
    max_cost_usd: 100.0
    enable_pii_detection: true

# Detection configuration
detection:
  enabled_detectors:
    - "zscore"
    - "iqr"
    - "mad"
    - "cusum"

  baseline:
    window_size: 1000
    min_samples: 10
    update_interval_secs: 60
    enable_persistence: true
    persistence_path: "/var/lib/sentinel/baselines"

  zscore:
    threshold: 3.0
    sensitivity: "medium"  # low, medium, high
    metrics:
      - "latency_ms"
      - "total_tokens"
      - "cost_usd"

  iqr:
    multiplier: 1.5
    metrics:
      - "latency_ms"
      - "total_tokens"

  mad:
    threshold: 3.5
    metrics:
      - "latency_ms"

  cusum:
    threshold: 5.0
    drift: 0.5
    metrics:
      - "latency_ms"

# Storage configuration
storage:
  influxdb:
    url: "http://influxdb:8086"
    org: "sentinel"
    token: "${INFLUXDB_TOKEN}"
    telemetry_bucket: "telemetry"
    anomaly_bucket: "anomalies"
    batch_size: 100
    timeout_secs: 10

  cache:
    max_capacity: 10000
    ttl_secs: 300
    tti_secs: 60
    enable_metrics: true

  redis:
    enabled: true
    url: "redis://redis:6379"
    password: "${REDIS_PASSWORD}"
    key_prefix: "sentinel:"
    ttl_secs: 300

# Alerting configuration
alerting:
  rabbitmq:
    url: "amqp://rabbitmq:5672"
    username: "sentinel"
    password: "${RABBITMQ_PASSWORD}"
    exchange: "sentinel.alerts"
    exchange_type: "topic"
    routing_key_prefix: "alert"
    persistent: true
    timeout_secs: 10
    retry_config:
      max_attempts: 3
      initial_delay_ms: 1000
      backoff_multiplier: 2.0
      max_delay_ms: 30000

  webhook:
    enabled: false
    url: "https://alerts.example.com/webhook"
    method: "POST"
    secret: "${WEBHOOK_SECRET}"
    timeout_secs: 10
    max_retries: 3
    retry_delay_ms: 1000
    backoff_multiplier: 2.0

  deduplication:
    enabled: true
    window_secs: 300
    cleanup_interval_secs: 60

# API configuration
api:
  bind_addr: "0.0.0.0:8080"
  enable_cors: true
  cors_origins:
    - "*"
  timeout_secs: 30
  max_body_size: 10485760  # 10MB
  enable_logging: true
  metrics_path: "/metrics"

See config/sentinel.yaml for a complete annotated example.

Environment Variables

All sensitive configuration can be provided via environment variables:

export INFLUXDB_TOKEN="your-influxdb-token"
export RABBITMQ_PASSWORD="your-rabbitmq-password"
export REDIS_PASSWORD="your-redis-password"
export WEBHOOK_SECRET="your-webhook-secret"
export SENTINEL_LOG_LEVEL="info"
export SENTINEL_LOG_JSON="true"
export RUST_BACKTRACE="1"

Performance

Throughput Benchmarks

Tested on AWS m5.xlarge (4 vCPU, 16GB RAM):

| Operation | Throughput | P50 | P95 | P99 | |-----------|------------|-----|-----|-----| | Ingestion | 10,500 events/sec | 2ms | 8ms | 15ms | | Detection | 5,200 detections/sec | 3ms | 12ms | 25ms | | Storage (batched) | 8,300 writes/sec | 5ms | 18ms | 35ms | | API Queries | 2,800 req/sec | 8ms | 25ms | 48ms |

Resource Usage

| Configuration | Memory | CPU | Disk I/O | |---------------|--------|-----|----------| | Idle | 180MB | 0.1 cores | 1 MB/s | | Light load (1k events/sec) | 320MB | 0.8 cores | 15 MB/s | | Medium load (5k events/sec) | 520MB | 1.8 cores | 42 MB/s | | Heavy load (10k events/sec) | 850MB | 2.9 cores | 78 MB/s |

Scaling Characteristics

  • Horizontal: Linear scaling up to 10 replicas
  • Vertical: Efficient use of multi-core (up to 8 cores)
  • Storage: InfluxDB handles 100k+ writes/sec with proper sizing
  • Network: ~10 Mbps at 10k events/sec (depends on event size)

Development

Project Structure

llm-sentinel/
├── crates/
│   ├── sentinel-core/          # Core types and error handling (1,350 lines)
│   ├── sentinel-ingestion/     # Kafka consumer and OTLP parsing (1,390 lines)
│   ├── sentinel-detection/     # Anomaly detection algorithms (2,319 lines)
│   ├── sentinel-storage/       # InfluxDB and caching (987 lines)
│   ├── sentinel-alerting/      # RabbitMQ and webhooks (1,645 lines)
│   └── sentinel-api/           # REST API server (1,452 lines)
├── sentinel/                   # Main binary (285 lines)
├── config/                     # Configuration examples
├── deployments/                # Deployment configurations
│   ├── grafana/               # 4 Grafana dashboards
│   └── prometheus/            # 50+ alert rules
├── docs/                       # Complete documentation (27 files)
├── examples/                   # Example producers
│   ├── python/                # Python producer example
│   └── go/                    # Go producer example
├── helm/                       # Helm chart
│   └── sentinel/              # Production-ready chart
├── k8s/                        # Kubernetes manifests (13 files)
├── .github/workflows/          # CI/CD pipelines
├── Cargo.toml                  # Rust workspace
├── docker-compose.yaml         # Local development environment
└── Dockerfile                  # Multi-stage Docker build

Total: ~9,500 lines of production Rust code

Running Tests

# All tests
cargo test --workspace

# Specific crate
cargo test -p sentinel-detection

# With output
cargo test -- --nocapture

# Integration tests only
cargo test --workspace --test '*'

# With coverage
cargo tarpaulin --out Html --output-dir coverage

Building

# Debug build
cargo build

# Release build (optimized)
cargo build --release

# With specific features
cargo build --features redis

# Docker build
docker build -t llm-sentinel:latest .

# Multi-platform Docker build
docker buildx build --platform linux/amd64,linux/arm64 -t llm-sentinel:latest .

Code Quality

# Format code
cargo fmt

# Lint with clippy
cargo clippy -- -D warnings

# Security audit
cargo audit

# Check dependencies
cargo deny check

CI/CD Pipeline

Full GitHub Actions workflow:

CI Pipeline (.github/workflows/ci.yaml)

  • Format check (rustfmt)
  • Lint (clippy with all warnings as errors)
  • Build (debug and release)
  • Test (all workspaces)
  • Coverage (tarpaulin with 80% target)
  • Security audit (cargo-audit)
  • Dependency check (cargo-deny)

CD Pipeline (.github/workflows/cd.yaml)

  • Docker build (multi-stage)
  • SBOM generation (syft)
  • Image signing (cosign)
  • Vulnerability scan (trivy)
  • Push to registry (GHCR)
  • Helm package (chart packaging)
  • Deploy (to staging/production)

Contributing

We welcome contributions! Please see CONTRIBUTING.md for:

  • Code of Conduct
  • Development setup
  • Coding standards
  • Testing requirements
  • Pull request process
  • Security reporting

Documentation

Complete documentation available in /docs:

License

This project is licensed under the Apache License 2.0 - see LICENSE file for details.

Support

Roadmap

  • [ ] Additional detection algorithms (Isolation Forest, AutoEncoder)
  • [ ] Real-time dashboard (WebSocket streaming)
  • [ ] Multi-tenant support
  • [ ] Advanced query DSL
  • [ ] Machine learning model drift detection
  • [ ] Automated baseline tuning
  • [ ] Support for additional message brokers (NATS, Pulsar)
  • [ ] OpenTelemetry native ingestion
  • [ ] Grafana plugin for custom visualizations

Status: ✅ Production Ready Version: 0.1.0 License: Apache 2.0 Built with: Rust 🦀 Last Updated: 2025-11-06