@ruvector/burst-scaling
Adaptive burst scaling system for ruvector - handles 10-50x traffic spikes
Ruvector Adaptive Burst Scaling System
Production-ready auto-scaling infrastructure for handling 10-50x traffic bursts while maintaining <50ms p99 latency
Overview
This burst scaling system enables Ruvector to handle massive traffic spikes (e.g., World Cup events with 25 billion concurrent streams) while maintaining strict latency SLAs and cost controls.
Key Features
- Predictive Scaling: ML-based forecasting pre-warms capacity before known events
- Reactive Scaling: Real-time auto-scaling based on CPU, memory, connections, and latency
- Global Orchestration: Cross-region capacity allocation with budget controls
- Cost Management: Sophisticated budget tracking with graceful degradation
- Infrastructure as Code: Complete Terraform configuration for GCP Cloud Run
- Comprehensive Monitoring: Cloud Monitoring dashboard with 15+ key metrics
Capabilities
| Metric | Baseline | Burst Capacity | Target |
|--------|----------|----------------|--------|
| Concurrent Streams | 500M | 25B (50x) | <50ms p99 |
| Scale-Out Time | N/A | <60 seconds | Full capacity |
| Regions | 3 | 8+ | Global coverage |
| Cost Control | $240k/day | $5M/month | Budget-aware |
| Instances per Region | 10-50 | 1000+ | Auto-scaling |
Architecture
┌─────────────────────────────────────────────────────────────────┐
│                      Global Load Balancer                       │
│                   (CDN + SSL + Health Checks)                   │
└───────────────────┬──────────────┬──────────────┬───────────────┘
                    │              │              │
        ┌───────────▼──────┐  ┌────▼─────────┐   ┌▼──────────────┐
        │   us-central1    │  │ europe-west1 │   │  asia-east1   │
        │    Cloud Run     │  │  Cloud Run   │   │   Cloud Run   │
        │   10-1000 inst   │  │ 10-1000 inst │   │ 10-1000 inst  │
        └───────────┬──────┘  └────┬─────────┘   └┬──────────────┘
                    │              │              │
        ┌───────────▼──────────────▼──────────────▼──────────────┐
        │            Capacity Manager (Orchestration)            │
        │  ┌────────────────┐  ┌──────────────────────────────┐  │
        │  │ Burst Predictor│  │       Reactive Scaler        │  │
        │  │ - Event cal    │  │ - Real-time metrics          │  │
        │  │ - ML forecast  │  │ - Dynamic thresholds         │  │
        │  │ - Pre-warming  │  │ - Rapid scale-out            │  │
        │  └────────────────┘  └──────────────────────────────┘  │
        └────────────────────────────────────────────────────────┘
                    │              │              │
        ┌───────────▼──────┐  ┌────▼─────────┐   ┌▼──────────────┐
        │    Cloud SQL     │  │    Redis     │   │  Monitoring   │
        │ + Read Replicas  │  │   64GB HA    │   │  Dashboards   │
        └──────────────────┘  └──────────────┘   └───────────────┘
Quick Start
Prerequisites
- Node.js 18+
- Terraform 1.0+
- GCP Project with billing enabled
- GCP CLI (gcloud) authenticated
Installation
cd /home/user/ruvector/src/burst-scaling
# Install dependencies
npm install
# Configure GCP
gcloud config set project YOUR_PROJECT_ID
# Initialize Terraform
cd terraform
terraform init
# Create terraform.tfvars (see variables.tf for all options)
cat > terraform.tfvars <<EOF
project_id = "ruvector-prod"
billing_account = "0123AB-CDEF45-67890"
domain = "api.ruvector.io"
alert_email = "[email protected]"
regions = [
  "us-central1",
  "europe-west1",
  "asia-east1"
]
# Scaling configuration
min_instances = 10
max_instances = 1000
burst_multiplier_max = 50
# Budget
hourly_budget = 10000
daily_budget = 200000
monthly_budget = 5000000
# Thresholds
cpu_scale_out_threshold = 0.70
latency_threshold_ms = 50
EOF
Deploy Infrastructure
# Plan deployment
terraform plan -var-file="terraform.tfvars"
# Deploy (creates all infrastructure)
terraform apply -var-file="terraform.tfvars"
# Outputs will show:
# - Load balancer IP address
# - Cloud Run service URLs
# - Database connection strings
# - Redis instance hosts
Configure Monitoring
# Import dashboard to Cloud Monitoring
gcloud monitoring dashboards create \
--config-from-file=../monitoring-dashboard.json
# Set up alerting (already configured via Terraform)
# Alerts will be sent to: [email protected]
Run Scaling Components
# Start Burst Predictor (loads event calendar)
npm run predictor
# Start Reactive Scaler (monitors real-time metrics)
npm run scaler
# Start Capacity Manager (orchestrates everything)
npm run manager
# For production, run as systemd services or Cloud Run jobs
Usage
Predictive Scaling
import { BurstPredictor, EventCalendar } from './burst-predictor';
const predictor = new BurstPredictor();
// Load event calendar
const calendar: EventCalendar = {
  events: [
    {
      id: 'world-cup-final',
      name: 'World Cup Final 2026',
      type: 'sports',
      startTime: new Date('2026-07-19T15:00:00Z'),
      region: ['us-central1', 'europe-west1', 'southamerica-east1'],
      expectedViewers: 2_000_000_000
    }
  ]
};
await predictor.loadEventCalendar(calendar);
// Get predictions for next 24 hours
const bursts = await predictor.predictUpcomingBursts(24);
console.log(`Predicted ${bursts.length} burst events`);
// Get pre-warming schedule
const schedule = await predictor.getPreWarmingSchedule();
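The shape of the schedule entries is not documented in this README. Purely as an illustration (the PreWarmEntry fields and the requestCapacity helper below are assumptions, not part of the burst-predictor API), consuming the schedule might look like this:
// Illustrative only: the PreWarmEntry fields and requestCapacity() are assumed,
// not part of the documented burst-predictor API.
interface PreWarmEntry {
  region: string;
  targetInstances: number;
  warmAt: Date;
}
// Stand-in for whatever actually raises capacity (e.g. a gcloud call or the capacity manager).
async function requestCapacity(region: string, instances: number): Promise<void> {
  console.log(`(stub) pre-warming ${region} to ${instances} instances`);
}
async function applyPreWarming(schedule: PreWarmEntry[]): Promise<void> {
  const now = Date.now();
  for (const entry of schedule) {
    // Only act on entries whose warm-up time has arrived.
    if (entry.warmAt.getTime() <= now) {
      await requestCapacity(entry.region, entry.targetInstances);
    }
  }
}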
Reactive Scaling
import { ReactiveScaler, ScalingMetrics } from './reactive-scaler';
const scaler = new ReactiveScaler();
// Update thresholds
scaler.updateThresholds({
  cpuScaleOut: 0.70,
  cpuScaleIn: 0.30,
  maxP99Latency: 50
});
// Process metrics (called continuously)
const metrics: ScalingMetrics = {
  region: 'us-central1',
  timestamp: new Date(),
  cpuUtilization: 0.75,
  memoryUtilization: 0.68,
  activeConnections: 45_000_000,
  requestRate: 150_000,
  errorRate: 0.005,
  p99Latency: 42,
  currentInstances: 50
};
const action = await scaler.processMetrics(metrics);
if (action.action !== 'none') {
  console.log(`Scaling ${action.region}: ${action.fromInstances} -> ${action.toInstances}`);
}
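processMetrics is meant to be fed continuously. A self-contained polling sketch is shown below; the fetchRegionMetrics helper and the 15-second cadence are assumptions for illustration, not part of the package.
// Polling sketch. fetchRegionMetrics() is a hypothetical stand-in for however
// live metrics are pulled (e.g. the Cloud Monitoring API); replace with a real query.
import { ReactiveScaler, ScalingMetrics } from './reactive-scaler';
async function fetchRegionMetrics(region: string): Promise<ScalingMetrics> {
  // Placeholder values for illustration only.
  return {
    region,
    timestamp: new Date(),
    cpuUtilization: 0.5,
    memoryUtilization: 0.5,
    activeConnections: 1_000_000,
    requestRate: 10_000,
    errorRate: 0.001,
    p99Latency: 30,
    currentInstances: 50
  };
}
const regionScaler = new ReactiveScaler();
const monitoredRegions = ['us-central1', 'europe-west1', 'asia-east1'];
setInterval(async () => {
  for (const region of monitoredRegions) {
    const action = await regionScaler.processMetrics(await fetchRegionMetrics(region));
    if (action.action !== 'none') {
      console.log(`Scaling ${action.region}: ${action.fromInstances} -> ${action.toInstances}`);
    }
  }
}, 15_000); // assumed 15-second cadence; tune to metric freshness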
Capacity Management
import { CapacityManager } from './capacity-manager';
const manager = new CapacityManager();
// Update budget
manager.updateBudget({
  hourlyBudget: 12000,
  warningThreshold: 0.85
});
// Run orchestration (call every 60 seconds)
const plan = await manager.orchestrate();
console.log(`Total instances: ${plan.totalInstances}`);
console.log(`Total cost: $${plan.totalCost}/hour`);
console.log(`Degradation level: ${plan.degradationLevel}`);
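Since orchestrate() is designed to be called every 60 seconds, a minimal self-contained scheduling loop (using only the plan fields shown above; running it as a systemd service or Cloud Run job is up to you) might be:
// 60-second orchestration loop built from the calls shown above.
// Logging every tick is illustrative; wire this into your own scheduler.
import { CapacityManager } from './capacity-manager';
const capacityManager = new CapacityManager();
setInterval(async () => {
  const plan = await capacityManager.orchestrate();
  console.log(
    `${new Date().toISOString()} ` +
    `instances=${plan.totalInstances} cost=$${plan.totalCost}/hour ` +
    `degradation=${plan.degradationLevel}`
  );
}, 60_000);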
Configuration
Scaling Thresholds
Edit terraform/variables.tf:
# CPU thresholds
cpu_scale_out_threshold = 0.70 # Scale out at 70% CPU
cpu_scale_in_threshold = 0.30 # Scale in at 30% CPU
# Memory thresholds
memory_scale_out_threshold = 0.75
memory_scale_in_threshold = 0.35
# Latency
latency_threshold_ms = 50 # p99 latency SLA
# Connections
max_connections_per_instance = 500000
Budget Controls
# Budget limits
hourly_budget = 10000 # $10k/hour
daily_budget = 200000 # $200k/day
monthly_budget = 5000000 # $5M/month
# Enforcement
hard_budget_limit = false # Allow temporary overages during bursts
budget_warning_threshold = 0.80 # Warn at 80%
Region Configuration
regions = [
  "us-central1",     # Primary
  "europe-west1",    # Europe
  "asia-east1",      # Asia
  "us-east1",        # Additional US
  "asia-southeast1"  # SEA
]
# Region priorities (1-10, higher = more important)
region_priorities = {
  "us-central1"  = 10
  "europe-west1" = 9
  "asia-east1"   = 8
}
# Region costs ($/hour per instance)
region_costs = {
  "us-central1"  = 0.50
  "europe-west1" = 0.55
  "asia-east1"   = 0.60
}
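For intuition only, the following standalone sketch (not the capacity-manager implementation) spreads a target instance count across regions in priority order while staying inside an hourly budget, using the illustrative priorities and rates above.
// Standalone illustration of budget-aware, priority-ordered allocation.
// This is NOT the capacity-manager algorithm; it only shows the idea of
// filling higher-priority regions first until the hourly budget is spent.
interface RegionConfig {
  name: string;
  priority: number;     // 1-10, higher = more important
  costPerHour: number;  // $/hour per instance
  maxInstances: number;
}
function allocate(regions: RegionConfig[], targetInstances: number, hourlyBudget: number) {
  const plan: Record<string, number> = {};
  let remaining = targetInstances;
  let budget = hourlyBudget;
  // Fill higher-priority regions first.
  for (const r of [...regions].sort((a, b) => b.priority - a.priority)) {
    const affordable = Math.floor(budget / r.costPerHour);
    const count = Math.min(remaining, r.maxInstances, affordable);
    plan[r.name] = count;
    remaining -= count;
    budget -= count * r.costPerHour;
  }
  return { plan, unallocated: remaining, spendPerHour: hourlyBudget - budget };
}
console.log(allocate(
  [
    { name: 'us-central1',  priority: 10, costPerHour: 0.50, maxInstances: 1000 },
    { name: 'europe-west1', priority: 9,  costPerHour: 0.55, maxInstances: 1000 },
    { name: 'asia-east1',   priority: 8,  costPerHour: 0.60, maxInstances: 1000 },
  ],
  3000,
  10_000 // hourly_budget
));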
Monitoring
Cloud Monitoring Dashboard
Access at: https://console.cloud.google.com/monitoring/dashboards/custom/ruvector-burst
Key Metrics:
- Total connections across all regions
- Connections by region (stacked area)
- P50/P95/P99 latency percentiles
- Instance count by region
- CPU & memory utilization
- Error rates
- Hourly & daily cost estimates
- Burst event timeline
Alerts
Configured alerts (sent to alert_email):
| Alert | Threshold | Action |
|-------|-----------|--------|
| High Latency | p99 > 50ms for 2min | Investigate |
| Critical Latency | p99 > 100ms for 1min | Page on-call |
| High Error Rate | >1% for 5min | Investigate |
| Budget Warning | >80% hourly | Review costs |
| Budget Critical | >100% hourly | Enable degradation |
| Region Down | 0 healthy backends | Page on-call |
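These alerts are evaluated by Cloud Monitoring, not by the Node components. Purely to illustrate the "threshold held for a duration" semantics (for example, p99 > 50ms sustained for 2 minutes), here is a small standalone checker:
// Illustration of duration-based alerting semantics only; the real alerts are
// Cloud Monitoring policies provisioned by Terraform, not this code.
class SustainedBreach {
  private breachStart: number | null = null;
  constructor(private threshold: number, private holdMs: number) {}
  // Feed one sample; returns true once the threshold has been exceeded
  // continuously for the configured hold duration.
  sample(value: number, now: number = Date.now()): boolean {
    if (value <= this.threshold) {
      this.breachStart = null;
      return false;
    }
    this.breachStart = this.breachStart ?? now;
    return now - this.breachStart >= this.holdMs;
  }
}
// "High Latency": p99 > 50ms sustained for 2 minutes.
const highLatency = new SustainedBreach(50, 2 * 60_000);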
Log Queries
# View scaling events
gcloud logging read 'jsonPayload.message =~ "SCALING"' --limit=50
# View high latency requests
gcloud logging read 'jsonPayload.latency > 0.1' --limit=50
# View budget alerts
gcloud logging read 'jsonPayload.message =~ "BUDGET"' --limit=50
Operations
Daily Operations
See RUNBOOK.md for complete operational procedures.
Quick checks:
# Check system status
npm run manager
# View predictions
npm run predictor
# Check current metrics
gcloud run services list --platform=managed
# Review costs
gcloud billing accounts list
Emergency Procedures
Latency spike (p99 > 100ms):
# Force scale-out all regions
for region in us-central1 europe-west1 asia-east1; do
  gcloud run services update ruvector-$region \
    --region=$region \
    --max-instances=1500
done
Budget exceeded:
# Enable minor degradation (shed free tier)
npm run manager -- --degrade=minor
# Enable major degradation (free tier only, limited features)
npm run manager -- --degrade=major
Region failure:
# Scale up remaining regions
gcloud run services update ruvector-europe-west1 \
--region=europe-west1 \
--max-instances=2000
# Activate backup region
terraform apply -var='regions=["us-central1","europe-west1","asia-east1","us-east1"]'
Cost Analysis
Expected Costs
| Scenario | Instances | Hourly | Daily | Monthly |
|----------|-----------|--------|-------|---------|
| Baseline | 30 (10/region) | $45 | $1,080 | $32,400 |
| Normal Load | 150 (50/region) | $225 | $5,400 | $162,000 |
| Medium Burst (10x) | 600 (200/region) | $900 | $21,600 | $648,000 |
| Major Burst (25x) | 1,500 (500/region) | $2,250 | $54,000 | $1,620,000 |
| World Cup (50x) | 3,000 (1000/region) | $4,500 | $108,000 | $3,240,000 |
Cost Breakdown:
- Cloud Run instances: $0.50/hour per instance (varies by region)
- Cloud SQL: $500/month per region
- Redis: $300/month per region
- Load Balancer: $18/month + $0.008/GB
- Networking: ~$0.12/GB egress
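For a quick sanity check before a planned burst, instance-hour costs can be estimated from the per-region rates in the Region Configuration section. The sketch below covers Cloud Run instance-hours only, so it will not reproduce the scenario table above exactly.
// Rough instance-hour cost estimate using the illustrative per-region rates
// from the Region Configuration section. Databases, Redis, load balancing and
// egress are not included, so this will not match the scenario table exactly.
const ratePerHour: Record<string, number> = {
  'us-central1': 0.50,
  'europe-west1': 0.55,
  'asia-east1': 0.60,
};
function estimate(instancesPerRegion: Record<string, number>) {
  const hourly = Object.entries(instancesPerRegion)
    .reduce((sum, [region, count]) => sum + count * (ratePerHour[region] ?? 0.50), 0);
  return { hourly, daily: hourly * 24, monthly: hourly * 24 * 30 };
}
// World Cup scenario: 1000 instances in each of the three primary regions.
console.log(estimate({ 'us-central1': 1000, 'europe-west1': 1000, 'asia-east1': 1000 }));
// -> { hourly: 1650, daily: 39600, monthly: 1188000 }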
Cost Optimization
- Auto-scale down: Gradual scale-in after bursts (5-10 minutes)
- Regional pricing: Prioritize cheaper regions (us-central1 < europe-west1 < asia-east1)
- CDN caching: Reduce backend load by 40-60%
- Connection pooling: Reduce database costs
- Budget controls: Automatic degradation at thresholds
Testing
Load Testing
# Install dependencies
npm install -g artillery
# Run load test
artillery run load-test.yaml
# Expected results:
# - Handle 10x burst: 5B connections
# - Maintain p99 < 50ms
# - Auto-scale to required capacity
Burst Simulation
# Simulate World Cup event
npm run predictor -- --simulate --event-type=world-cup-final
# Monitor dashboard during simulation
# Verify pre-warming occurs 15 minutes before
# Verify scaling to 1000 instances per region
# Verify p99 latency stays < 50ms
Cost Testing
# Simulate costs for different scenarios
npm run manager -- --simulate --multiplier=10 # 10x burst
npm run manager -- --simulate --multiplier=25 # 25x burst
npm run manager -- --simulate --multiplier=50 # 50x burst
# Review estimated costs
# Verify budget controls trigger at thresholds
Troubleshooting
Issue: Auto-scaling not working
Check:
# Verify Cloud Run auto-scaling config
gcloud run services describe ruvector-us-central1 --region=us-central1
# Check quotas
gcloud compute project-info describe --project=ruvector-prod
# Check IAM permissions
gcloud projects get-iam-policy ruvector-prod
Issue: High latency during burst
Check:
- Database connection pool exhaustion
- Redis cache hit rate
- Network bandwidth limits
- CPU/memory saturation
Fix:
# Scale up database
gcloud sql instances patch ruvector-db-us-central1 --cpu=32 --memory=128GB
# Scale up Redis
gcloud redis instances update ruvector-redis-us-central1 --size=128
# Force scale-out
gcloud run services update ruvector-us-central1 --max-instances=2000
Issue: Budget exceeded unexpectedly
Check:
# Review cost breakdown
gcloud billing accounts list
# Check instance counts
gcloud run services list
# Review recent scaling events
gcloud logging read 'jsonPayload.message =~ "SCALING"' --limit=100
Fix:
- Enable hard budget limit
- Adjust scale-in cooldown (faster scale-down)
- Review regional priorities
- Enable aggressive degradation
Development
Build
npm run build
Test
npm test
Lint
npm run lint
Watch Mode
npm run watch
Files
burst-scaling/
├── burst-predictor.ts          # Predictive scaling engine
├── reactive-scaler.ts          # Reactive auto-scaling
├── capacity-manager.ts         # Global orchestration
├── monitoring-dashboard.json   # Cloud Monitoring dashboard
├── package.json                # Dependencies
├── tsconfig.json               # TypeScript config
├── README.md                   # This file
├── RUNBOOK.md                  # Operations runbook
└── terraform/
    ├── main.tf                 # Infrastructure as Code
    └── variables.tf            # Configuration parameters
Support
- Documentation: This README and RUNBOOK.md
- Issues: https://github.com/ruvnet/ruvector/issues
- Slack: #burst-scaling
- On-call: Check PagerDuty rotation
License
MIT License - See LICENSE file in repository root
Author: Ruvector DevOps Team
Last Updated: 2025-01-20
Version: 1.0.0
