npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2025 – Pkg Stats / Ryan Hefner

@llm-dev-ops/llm-edge-agent

v0.1.0

Published

Enterprise-grade LLM intercepting proxy with intelligent caching, routing, and observability

Readme

LLM Edge Agent

License Rust Build Status Coverage Production Ready

Enterprise-grade LLM intercepting proxy with intelligent caching, routing, and observability

FeaturesQuick StartDocumentationArchitectureContributing


Overview

LLM Edge Agent is a high-performance, production-ready intercepting proxy for Large Language Model (LLM) APIs. It provides intelligent request routing, multi-tier caching, cost optimization, and comprehensive observability for enterprise LLM deployments.

Key Highlights

  • 🚀 High Performance: 1000+ RPS throughput, <50ms proxy overhead
  • 💰 Cost Optimization: 70%+ cache hit rate, intelligent provider routing
  • 🔄 Multi-Provider Support: OpenAI, Anthropic, with easy extensibility
  • 📊 Enterprise Observability: Prometheus metrics, Grafana dashboards, Jaeger tracing
  • 🛡️ Production Grade: Comprehensive testing, security hardening, chaos engineering validated
  • ☸️ Cloud Native: Docker, Kubernetes, Helm chart ready

Features

Intelligent Caching

  • L1 Cache (Moka): In-memory cache with TinyLFU eviction, <100μs access time
  • L2 Cache (Redis): Distributed cache cluster, 3-node HA configuration
  • L3 Cache (Semantic): Vector similarity-based caching (planned)
  • Smart Key Generation: SHA-256 based, collision-resistant
  • Cache Hit Rate: >70% in production workloads

Advanced Routing

  • Model-Based Routing: Automatic provider selection by model
  • Cost-Optimized Routing: Route to cheapest provider
  • Latency-Optimized Routing: Route to fastest provider
  • Failover Routing: Automatic failover on provider outages
  • Circuit Breaker: Protect against cascading failures (5 failures → 30s timeout)

Observability

  • Prometheus Metrics: 20+ metrics including request rate, latency, cache hits, cost tracking
  • Grafana Dashboards: Pre-built dashboards for monitoring and analytics
  • Distributed Tracing: Jaeger integration with OTLP support
  • Structured Logging: JSON logs with correlation IDs
  • Health Checks: Liveness, readiness, and startup probes

Security

  • Authentication: API key validation with rate limiting
  • Rate Limiting: Per-user request limits
  • Input Validation: Comprehensive request validation
  • Security Headers: X-Content-Type-Options, X-Frame-Options, HSTS
  • Dependency Scanning: Automated vulnerability detection with cargo-audit
  • OWASP Compliance: Baseline and full scans passing

Testing & Quality

  • Unit Test Coverage: 85%
  • Integration Tests: 39 comprehensive scenarios
  • Load Tests: k6 tests for baseline, spike, stress, and soak scenarios
  • Security Tests: OWASP ZAP, penetration testing, dependency scanning
  • Performance Benchmarks: Criterion.rs benchmarks for cache and routing
  • Chaos Engineering: 10 failure scenarios validated

Quick Start

Prerequisites

  • Rust: 1.83 or later
  • Docker: 20.10+ (optional, for infrastructure)
  • Docker Compose: 1.29+ (optional, for infrastructure)

Installation

Option 1: NPM (Recommended - Cross-Platform)

# Install globally
npm install -g @llm-dev-ops/llm-edge-agent

# Or use with npx (no installation)
npx @llm-dev-ops/llm-edge-agent start

Option 2: From Source

# Clone the repository
git clone https://github.com/globalbusinessadvisors/llm-edge-agent.git
cd llm-edge-agent

# Build the project
cargo build --release

# Run tests
cargo test --workspace

Option 3: Docker

# Start with Docker Compose (includes Redis, Prometheus, Grafana)
docker-compose -f docker-compose.production.yml up -d

Configuration

NPM Installation

# Generate configuration template
llm-edge-agent config init

# Edit .env file with your API keys
# Then start the server
llm-edge-agent start

Manual Configuration

Create a .env file:

# LLM Provider API Keys (at least one required)
OPENAI_API_KEY=sk-your-openai-key
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key

# Server Configuration (optional)
HOST=0.0.0.0
PORT=8080
METRICS_PORT=9090

# Cache Configuration (optional)
ENABLE_L2_CACHE=true
REDIS_URL=redis://localhost:6379
L1_CACHE_SIZE=1000
L1_TTL_SECONDS=300
L2_TTL_SECONDS=3600

# Observability (optional)
ENABLE_TRACING=true
OTLP_ENDPOINT=http://localhost:4317
ENABLE_METRICS=true
RUST_LOG=info,llm_edge_agent=debug

Running

NPM Installation

# Start the server (basic)
llm-edge-agent start

# Start with custom configuration
llm-edge-agent start --port 8080 --openai-key sk-... --enable-l2-cache --redis-url redis://localhost:6379

# Run in background (daemon mode)
llm-edge-agent start --daemon

# Check health
llm-edge-agent health

# View metrics
llm-edge-agent metrics

From Source

# Standalone (without infrastructure)
cargo run --release

# With complete infrastructure (Redis, Prometheus, Grafana, Jaeger)
docker-compose -f docker-compose.production.yml up -d

# Check health
curl http://localhost:8080/health

First Request

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "user",
        "content": "Hello, world!"
      }
    ]
  }'

Architecture

System Architecture

┌─────────────┐
│   Client    │
└──────┬──────┘
       │
       ▼
┌─────────────────────────────────────┐
│      LLM Edge Agent (Proxy)         │
│  ┌─────────────────────────────┐   │
│  │   HTTP Server (Axum)        │   │
│  └────────────┬────────────────┘   │
│               │                     │
│  ┌────────────▼────────────────┐   │
│  │   Authentication & Rate     │   │
│  │   Limiting Layer            │   │
│  └────────────┬────────────────┘   │
│               │                     │
│  ┌────────────▼────────────────┐   │
│  │   Cache Manager             │   │
│  │   ┌──────────────────────┐  │   │
│  │   │  L1: Moka (In-Mem)   │  │   │
│  │   │  L2: Redis (3-node)  │  │   │
│  │   └──────────────────────┘  │   │
│  └────────────┬────────────────┘   │
│               │                     │
│  ┌────────────▼────────────────┐   │
│  │   Routing Engine            │   │
│  │   (Model/Cost/Latency/      │   │
│  │    Failover strategies)     │   │
│  └────────────┬────────────────┘   │
│               │                     │
│  ┌────────────▼────────────────┐   │
│  │   Provider Adapters         │   │
│  │   ┌──────────┬──────────┐   │   │
│  │   │ OpenAI   │Anthropic │   │   │
│  │   └──────────┴──────────┘   │   │
│  └────────────┬────────────────┘   │
│               │                     │
│  ┌────────────▼────────────────┐   │
│  │   Observability Layer       │   │
│  │   (Metrics, Tracing, Logs)  │   │
│  └─────────────────────────────┘   │
└─────────────────────────────────────┘
       │            │            │
       ▼            ▼            ▼
 ┌──────────┐ ┌──────────┐ ┌──────────┐
 │Prometheus│ │ Grafana  │ │  Jaeger  │
 └──────────┘ └──────────┘ └──────────┘

Component Overview

| Component | Technology | Purpose | |-----------|------------|---------| | HTTP Server | Axum 0.8 | High-performance async HTTP server | | Runtime | Tokio 1.40 | Async runtime with work-stealing scheduler | | L1 Cache | Moka 0.12 | In-memory cache with TinyLFU eviction | | L2 Cache | Redis 7 | Distributed cache cluster (3 nodes) | | Routing | Custom | Intelligent routing strategies | | Providers | OpenAI, Anthropic | LLM provider integrations | | Metrics | Prometheus | Time-series metrics collection | | Dashboards | Grafana | Visualization and analytics | | Tracing | Jaeger + OTLP | Distributed tracing |


Performance

Benchmarks

| Metric | Value | Notes | |--------|-------|-------| | Throughput | 1000+ RPS | Sustained load | | Proxy Overhead | <50ms | P95 latency | | L1 Cache Hit | <100μs | In-memory access | | L2 Cache Hit | 1-2ms | Redis access | | Cache Miss | Provider latency | Typically 500-2000ms | | Routing Decision | <100ns | All strategies | | Memory Usage | <2GB | Normal operation |

Cache Performance

Cache Hit Rate Distribution:
L1 Cache: 60-70% (hot data)
L2 Cache: 10-15% (warm data)
Cache Miss: 15-25% (cold data)

Overall Cache Hit Rate: >70%
Cost Savings: 70%+ (cached requests free)

Deployment

Docker Compose (Development/Staging)

# Start complete stack
docker-compose -f docker-compose.production.yml up -d

# Services included:
# - llm-edge-agent (main application)
# - redis-1, redis-2, redis-3 (cache cluster)
# - prometheus (metrics)
# - grafana (dashboards)
# - jaeger (tracing)
# - redis-commander (Redis UI)

# Access UIs
open http://localhost:8080     # LLM Edge Agent
open http://localhost:3000     # Grafana (admin/admin)
open http://localhost:9091     # Prometheus
open http://localhost:16686    # Jaeger

Kubernetes (Production)

# Create namespace
kubectl create namespace llm-edge-production

# Create secrets
kubectl create secret generic llm-edge-secrets \
  --from-literal=openai-api-key="sk-..." \
  --from-literal=anthropic-api-key="sk-ant-..." \
  -n llm-edge-production

# Deploy infrastructure
kubectl apply -f deployments/kubernetes/namespace.yaml
kubectl apply -f deployments/kubernetes/redis-cluster.yaml
kubectl apply -f deployments/kubernetes/prometheus.yaml
kubectl apply -f deployments/kubernetes/grafana.yaml
kubectl apply -f deployments/kubernetes/jaeger.yaml
kubectl apply -f deployments/kubernetes/llm-edge-agent.yaml

# Check status
kubectl get all -n llm-edge-production

# Features:
# - HorizontalPodAutoscaler (3-10 replicas)
# - Rolling updates (zero downtime)
# - StatefulSets for Redis
# - PersistentVolumeClaims for data
# - Liveness and readiness probes

Resource Requirements

Development/Staging:

  • CPU: 5.5 cores
  • Memory: 13GB
  • Disk: 110GB

Production:

  • CPU: 20 cores (with 3 app replicas)
  • Memory: 28GB
  • Disk: 110GB

Monitoring

Metrics (Prometheus)

20+ metrics collected:

  • Request Metrics: llm_edge_requests_total, llm_edge_request_duration_seconds
  • Cache Metrics: llm_edge_cache_hits_total, llm_edge_cache_misses_total
  • Provider Metrics: llm_edge_provider_health, llm_edge_provider_latency_seconds
  • Cost Metrics: llm_edge_cost_usd_total, llm_edge_tokens_used_total
  • System Metrics: llm_edge_cpu_usage_percent, llm_edge_memory_bytes

Alerts (12 Alert Rules)

Critical Alerts:

  • Service down >1min
  • Error rate >1% for 5min
  • All providers down for 2min

Warning Alerts:

  • High latency (P95 >2s for 5min)
  • Low cache hit rate (<60% for 15min)
  • High memory usage (>3.5GB for 10min)
  • Circuit breaker open for 2min

Cost Alerts:

  • High daily cost (>$100 for 1hr)
  • Cost spike (50% increase vs. yesterday)

Dashboards (Grafana)

Pre-built dashboards ready for import:

  1. Request Overview: Request rate, latency, errors, cache hits
  2. Cache Performance: Hit rates per tier, latency, memory usage
  3. Cost Analytics: Cost per provider/model, savings, trends
  4. System Health: CPU, memory, connections, provider health
  5. Provider Metrics: Per-provider latency, errors, circuit breakers

Testing

Test Suite

# Run all tests
./run-tests.sh all

# Run specific test suites
./run-tests.sh unit         # Unit tests (2-5 min)
./run-tests.sh integration  # Integration tests (5-10 min)
./run-tests.sh load         # Load tests with k6 (15-20 min)
./run-tests.sh security     # Security tests (10 min)
./run-tests.sh performance  # Benchmarks (5-10 min)
./run-tests.sh chaos        # Chaos engineering (15-20 min)

Coverage

  • Unit Tests: 85% coverage
  • Integration Tests: 75% coverage
  • Load Tests: 5 scenarios (baseline, spike, stress, soak, cache)
  • Security Tests: OWASP ZAP, penetration tests, dependency scanning
  • Performance Tests: Criterion benchmarks, flamegraph profiling
  • Chaos Tests: 10 failure scenarios

Quality Gates (CI/CD)

✅ Unit tests passing (100%) ✅ Integration tests passing (100%) ✅ Security vulnerabilities: 0 critical/high ✅ Code quality: Rustfmt + Clippy clean ✅ Performance: <150% regression threshold ✅ Docker build successful ✅ Kubernetes manifests valid


Documentation


Development

Project Structure

llm-edge-agent/
├── crates/
│   ├── llm-edge-agent/      # Main application
│   ├── llm-edge-server/     # HTTP server layer
│   ├── llm-edge-cache/      # Caching implementation
│   ├── llm-edge-providers/  # Provider adapters
│   ├── llm-edge-routing/    # Routing engine
│   ├── llm-edge-observability/ # Metrics & tracing
│   └── llm-edge-types/      # Shared types
├── tests/                   # Integration tests
│   ├── load/                # k6 load tests
│   ├── security/            # Security tests
│   ├── performance/         # Performance tests
│   └── chaos/               # Chaos engineering
├── benches/                 # Criterion benchmarks
├── deployments/
│   └── kubernetes/          # K8s manifests
├── infrastructure/
│   ├── prometheus/          # Prometheus config
│   └── grafana/             # Grafana config
└── docker-compose.production.yml

Building

# Development build
cargo build

# Release build (optimized)
cargo build --release

# With debug symbols for profiling
RUSTFLAGS="-C force-frame-pointers=yes" cargo build --release

# Run with hot reload (cargo-watch)
cargo install cargo-watch
cargo watch -x run

Code Quality

# Format code
cargo fmt --all

# Run linter
cargo clippy --all-targets --all-features -- -D warnings

# Check for security vulnerabilities
cargo audit

# Generate documentation
cargo doc --no-deps --document-private-items

Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Workflow

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Run tests (./run-tests.sh all)
  5. Commit your changes (git commit -m 'Add amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

Code Standards

  • All code must pass cargo fmt and cargo clippy
  • Unit tests required for new features
  • Integration tests for new flows
  • Documentation for public APIs
  • Security review for authentication/authorization changes

Roadmap

Current (v1.0 - Production Ready)

  • ✅ Core proxy functionality
  • ✅ Multi-tier caching (L1 + L2)
  • ✅ Provider adapters (OpenAI, Anthropic)
  • ✅ Intelligent routing (4 strategies)
  • ✅ Prometheus metrics + Grafana dashboards
  • ✅ Distributed tracing (Jaeger)
  • ✅ Comprehensive testing
  • ✅ Production infrastructure (Docker + K8s)

Planned (v1.1 - Beta Features)

  • 🔄 L3 Semantic caching (vector similarity)
  • 🔄 Streaming response support
  • 🔄 Additional providers (Cohere, AI21, Azure OpenAI)
  • 🔄 Advanced rate limiting (token buckets)
  • 🔄 Request/response transformation
  • 🔄 A/B testing support

Future (v2.0)

  • 🔮 LLM-Shield security integration
  • 🔮 Custom model fine-tuning support
  • 🔮 Multi-region deployment
  • 🔮 GraphQL API
  • 🔮 Admin dashboard UI
  • 🔮 Cost forecasting and budgets

License

This project is licensed under the MIT License - see the LICENSE file for details.


Acknowledgments


Support


Stats

GitHub stars GitHub forks GitHub watchers


Made with ❤️ by the LLM Edge Agent team

⬆ Back to Top