@bernierllc/retry-suite
v0.1.7
Complete retry system with admin interface, monitoring dashboard, and comprehensive retry workflows
Complete retry system with admin interface, monitoring dashboard, and comprehensive retry workflows for production environments.
Installation
npm install @bernierllc/retry-suite
Quick Start
import { RetrySuite } from '@bernierllc/retry-suite';

const retrySuite = new RetrySuite({
  retryManager: {
    defaultOptions: {
      maxRetries: 5,
      initialDelayMs: 1000,
      backoffFactor: 2
    }
  },
  retryStorage: {
    type: 'memory',
    options: {}
  },
  retryMonitoring: {
    enabled: true,
    metricsInterval: 60000,
    alertThresholds: {
      failureRate: 0.1,
      averageRetryTime: 5000
    }
  },
  admin: {
    port: 3000,
    host: 'localhost',
    auth: {
      enabled: false
    }
  },
  integrations: [],
  logging: {
    level: 'info',
    format: 'json'
  }
});
// Execute with retry
const result = await retrySuite.executeWithRetry(
  'api-call',
  async () => {
    const response = await fetch('https://api.example.com/data');
    // fetch resolves even on HTTP error statuses, so throw to trigger a retry
    if (!response.ok) {
      throw new Error(`Request failed with status ${response.status}`);
    }
    return response.json();
  },
  { maxRetries: 3 }
);

// Start admin interface
await retrySuite.startAdminServer(3000);
console.log('Admin interface running at: http://localhost:3000');
Features
🔄 Comprehensive Retry System
- Exponential backoff with jitter support
- Configurable retry policies per operation
- State persistence across application restarts
- Retry scheduling for future execution
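To make "exponential backoff with jitter" concrete, here is a minimal sketch of how a delay schedule could be computed. It is not the package's implementation: `BackoffOptions` and `computeDelayMs` are hypothetical names, the fields mirror the option names used in this README's examples, and the "full jitter" strategy is an assumption.

```typescript
// Hypothetical option shape; field names mirror the retry options shown in this README.
interface BackoffOptions {
  initialDelayMs: number;
  backoffFactor: number;
  maxDelayMs?: number;
  jitter?: boolean;
}

// Delay before retry number `attempt` (1-based), capped at maxDelayMs when set.
// With jitter enabled, this sketch uses "full jitter": a random value in [0, cappedDelay).
function computeDelayMs(
  attempt: number,
  options: BackoffOptions,
  random: () => number = Math.random
): number {
  const exponential =
    options.initialDelayMs * Math.pow(options.backoffFactor, attempt - 1);
  const capped = Math.min(exponential, options.maxDelayMs ?? Infinity);
  return options.jitter ? Math.floor(random() * capped) : capped;
}
```

With `initialDelayMs: 1000` and `backoffFactor: 2`, the un-jittered delays grow as 1000, 2000, 4000 ms and so on until the cap applies. Jitter spreads retries from many clients over time so they do not all hit a recovering service at the same instant.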
📊 Real-time Monitoring
- Live metrics dashboard with WebSocket updates
- Performance tracking and trend analysis
- Failure reason analysis and recommendations
- Configurable alerting with multiple severity levels
🌐 Admin Web Interface
- React-based dashboard for monitoring and management
- Real-time retry status with pause/resume/cancel controls
- Metrics visualization with historical data
- Configuration management with validation
🔗 Multi-Platform Integrations
- Slack notifications for alerts and critical failures
- Email alerts with HTML templates
- Webhook integrations for custom workflows
- PagerDuty integration for incident management
🛡️ Production Ready
- Authentication support for admin interface
- Comprehensive logging with structured output
- Error handling with graceful degradation
- Health checks and monitoring endpoints
Usage Examples
Basic Retry Operations
import { RetrySuite } from '@bernierllc/retry-suite';

const retrySuite = new RetrySuite(config);

// Simple retry with backoff
try {
  const result = await retrySuite.retryWithBackoff(async () => {
    // Your async operation here
    return await processData();
  }, {
    maxRetries: 3,
    initialDelayMs: 1000,
    backoffFactor: 2
  });
  console.log('Operation succeeded:', result);
} catch (error) {
  console.error('Operation failed after retries:', error);
}

// Scheduled retry for future execution
const scheduleResult = await retrySuite.scheduleRetry(
  'batch-job',
  async () => await runBatchProcess(),
  new Date(Date.now() + 60000) // Execute in 1 minute
);
if (scheduleResult.success) {
  console.log('Job scheduled with ID:', scheduleResult.scheduleId);
}
Retry Management
// Get retry status
const status = await retrySuite.getRetryStatus('operation-id');
console.log('Retry status:', status);
// Cancel a running retry
const cancelResult = await retrySuite.cancelRetry('operation-id');
if (cancelResult.success) {
  console.log('Retry cancelled successfully');
}
// Pause and resume retries
await retrySuite.pauseRetry('operation-id');
await retrySuite.resumeRetry('operation-id');
Metrics and Reporting
// Get current metrics
const metrics = await retrySuite.getMetrics();
console.log('Success rate:', metrics.retrySuccessRate);
console.log('Average retry time:', metrics.averageRetryTime);
// Get performance report
const report = await retrySuite.getPerformanceReport({
  startDate: new Date('2024-01-01'),
  endDate: new Date('2024-01-31')
});
console.log('Monthly summary:', report.summary);
console.log('Recommendations:', report.recommendations);
Integration Setup
const retrySuite = new RetrySuite({
  // ... other config
  integrations: [
    // Slack integration
    {
      type: 'slack',
      config: {
        token: process.env.SLACK_BOT_TOKEN,
        channel: '#alerts',
        criticalChannel: '#critical-alerts'
      }
    },
    // Email integration
    {
      type: 'email',
      config: {
        provider: 'sendgrid',
        apiKey: process.env.SENDGRID_API_KEY,
        from: '[email protected]',
        to: ['[email protected]', '[email protected]']
      }
    },
    // Webhook integration
    {
      type: 'webhook',
      config: {
        url: 'https://api.company.com/webhooks/retry-alerts',
        headers: {
          'Authorization': `Bearer ${process.env.WEBHOOK_TOKEN}`
        },
        timeout: 10000
      }
    }
  ]
});
React Dashboard Integration
import React from 'react';
import { RetryDashboard } from '@bernierllc/retry-suite';

export const MonitoringPage: React.FC = () => {
  return (
    <div>
      <h1>System Monitoring</h1>
      <RetryDashboard apiBaseUrl="/api/retry-suite" />
    </div>
  );
};
Custom Retry Policies
// High-frequency, low-latency operations
const quickRetryOptions = {
  maxRetries: 3,
  initialDelayMs: 100,
  backoffFactor: 1.5,
  maxDelayMs: 1000,
  jitter: false
};

// Heavy operations with longer delays
const heavyRetryOptions = {
  maxRetries: 5,
  initialDelayMs: 5000,
  backoffFactor: 2,
  maxDelayMs: 60000,
  jitter: true
};

await retrySuite.executeWithRetry('quick-api-call', quickOperation, quickRetryOptions);
await retrySuite.executeWithRetry('heavy-batch-job', heavyOperation, heavyRetryOptions);
Configuration
Complete Configuration Example
import { RetrySuiteConfig } from '@bernierllc/retry-suite';

const config: RetrySuiteConfig = {
  // Retry manager configuration
  retryManager: {
    defaultOptions: {
      maxRetries: 5,
      initialDelayMs: 1000,
      backoffFactor: 2,
      maxDelayMs: 30000,
      jitter: true
    }
  },
  // Storage configuration
  retryStorage: {
    type: 'redis', // 'memory', 'redis', or 'file'
    options: {
      url: process.env.REDIS_URL,
      keyPrefix: 'retry-suite:',
      ttl: 3600 // 1 hour TTL for retry state
    }
  },
  // Monitoring configuration
  retryMonitoring: {
    enabled: true,
    metricsInterval: 60000, // Collect metrics every minute
    alertThresholds: {
      failureRate: 0.1, // Alert if >10% failure rate
      averageRetryTime: 5000 // Alert if average >5 seconds
    }
  },
  // Admin interface configuration
  admin: {
    port: 3000,
    host: '0.0.0.0',
    auth: {
      enabled: true,
      username: 'admin',
      password: process.env.ADMIN_PASSWORD
    }
  },
  // Integration configurations
  integrations: [
    {
      type: 'slack',
      config: {
        token: process.env.SLACK_BOT_TOKEN,
        channel: '#retry-alerts',
        criticalChannel: '#critical'
      }
    }
  ],
  // Logging configuration
  logging: {
    level: 'info', // 'debug', 'info', 'warn', 'error'
    format: 'json' // 'json' or 'text'
  }
};
Storage Options
Memory Storage (Development)
retryStorage: {
  type: 'memory',
  options: {}
}
Redis Storage (Production)
retryStorage: {
  type: 'redis',
  options: {
    url: 'redis://localhost:6379',
    keyPrefix: 'retry-suite:',
    ttl: 3600,
    maxRetries: 3,
    retryDelayOnFailover: 100
  }
}
File Storage (Lightweight Persistence)
retryStorage: {
  type: 'file',
  options: {
    path: './retry-state.json',
    syncInterval: 5000,
    maxFileSize: 10485760 // 10MB
  }
}
API Reference
RetrySuite Class
Constructor
new RetrySuite(config: RetrySuiteConfig)
Core Methods
executeWithRetry<T>(id: string, fn: () => Promise<T>, options?: RetryOptions): Promise<RetryResult<T>>
Execute a function with retry logic and comprehensive tracking.
retryWithBackoff<T>(fn: () => Promise<T>, options?: RetryOptions): Promise<T>
Execute a function with exponential backoff, throwing on final failure.
scheduleRetry(id: string, fn: () => Promise<any>, scheduleTime: Date): Promise<ScheduleResult>
Schedule a retry operation for future execution.
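The contract described for `retryWithBackoff` (retry with growing delays, then throw on final failure) can be sketched as follows. This is an illustration, not the package's source: `retryWithBackoffSketch`, `SketchRetryOptions`, and the injectable `wait` parameter are hypothetical, and treating `maxRetries` as the number of retries after the first attempt is an assumption.

```typescript
// Hypothetical option shape; field names follow the options used in this README.
interface SketchRetryOptions {
  maxRetries: number;
  initialDelayMs: number;
  backoffFactor: number;
}

// Default wait; injectable so callers (and tests) can skip real sleeping.
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function retryWithBackoffSketch<T>(
  fn: () => Promise<T>,
  options: SketchRetryOptions,
  wait: (ms: number) => Promise<void> = sleep
): Promise<T> {
  let lastError: unknown;
  // Assumption: one initial attempt plus up to maxRetries retries.
  for (let attempt = 0; attempt <= options.maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < options.maxRetries) {
        // Delay grows as initialDelayMs * backoffFactor^attempt.
        await wait(options.initialDelayMs * Math.pow(options.backoffFactor, attempt));
      }
    }
  }
  // All attempts failed: surface the last error to the caller.
  throw lastError;
}
```

Because the final error is rethrown, callers handle exhaustion with an ordinary `try`/`catch`, whereas `executeWithRetry` is documented as returning a `RetryResult<T>` wrapper.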
Management Methods
getRetryStatus(id: string): Promise<RetryStatus | null>
Get the current status of a retry operation.
cancelRetry(id: string): Promise<CancelResult>
Cancel a running retry operation.
pauseRetry(id: string): Promise<PauseResult>
Pause a running retry operation.
resumeRetry(id: string): Promise<ResumeResult>
Resume a paused retry operation.
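The management methods above imply a small state machine: a running retry can be paused or cancelled, and a paused retry can be resumed. The sketch below encodes only those transitions. The state names, `applyAction`, and the result shape are hypothetical and are not taken from the package; whether a paused retry can also be cancelled is not stated here, so the sketch leaves that transition out.

```typescript
type RetryState = "running" | "paused" | "cancelled";
type RetryAction = "pause" | "resume" | "cancel";

// Only the transitions described in the method summaries are allowed.
const transitions: Record<RetryState, Partial<Record<RetryAction, RetryState>>> = {
  running: { pause: "paused", cancel: "cancelled" },
  paused: { resume: "running" },
  cancelled: {},
};

// Returns the next state, or success: false when the action is invalid for that state.
function applyAction(
  state: RetryState,
  action: RetryAction
): { success: boolean; state: RetryState } {
  const next = transitions[state][action];
  return next === undefined
    ? { success: false, state }
    : { success: true, state: next };
}
```

Checking the `success` flag on each result, as the Retry Management examples do for `cancelRetry`, guards against acting on an operation that is already in a different state.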
Monitoring Methods
getMetrics(options?: MetricsOptions): Promise<RetryMetrics>
Get current system metrics and statistics.
getAlerts(options?: AlertOptions): Promise<Alert[]>
Get system alerts with optional filtering.
getPerformanceReport(options?: ReportOptions): Promise<PerformanceReport>
Generate comprehensive performance report with recommendations.
Configuration Methods
updateConfiguration(config: Partial<RetrySuiteConfig>): Promise<ConfigResult>
Update system configuration with validation.
getConfiguration(): RetrySuiteConfig
Get current system configuration.
validateConfiguration(config: RetrySuiteConfig): ValidationResult
Validate a configuration object.
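As an illustration of the kind of checks a configuration validator might run, here is a small sketch over the retry defaults only. The rules, the `validateRetryDefaults` name, and the result shape are assumptions for this example; the package's actual `ValidationResult` type and validation rules are not documented in this README.

```typescript
// Hypothetical result shape for this sketch.
interface ValidationSketchResult {
  valid: boolean;
  errors: string[];
}

// Field names follow retryManager.defaultOptions in the configuration examples.
interface RetryDefaultsSketch {
  maxRetries: number;
  initialDelayMs: number;
  backoffFactor: number;
  maxDelayMs?: number;
}

function validateRetryDefaults(options: RetryDefaultsSketch): ValidationSketchResult {
  const errors: string[] = [];
  if (!Number.isInteger(options.maxRetries) || options.maxRetries < 0) {
    errors.push("maxRetries must be a non-negative integer");
  }
  if (!(options.initialDelayMs > 0)) {
    errors.push("initialDelayMs must be greater than 0");
  }
  if (!(options.backoffFactor >= 1)) {
    errors.push("backoffFactor must be at least 1");
  }
  if (options.maxDelayMs !== undefined && options.maxDelayMs < options.initialDelayMs) {
    errors.push("maxDelayMs must not be smaller than initialDelayMs");
  }
  return { valid: errors.length === 0, errors };
}
```

Collecting every error instead of stopping at the first lets an admin UI show all problems in a single pass.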
Admin Server Methods
startAdminServer(port?: number): Promise<void>
Start the admin web interface server.
stopAdminServer(): Promise<void>
Stop the admin web interface server.
getAdminUrl(): string
Get the URL of the admin interface.
Integration Guides
Slack Integration
- Create a Slack Bot in your workspace
- Add the bot to desired channels
- Configure the integration:
{
  type: 'slack',
  config: {
    token: 'xoxb-your-bot-token',
    channel: '#alerts',
    criticalChannel: '#critical-alerts'
  }
}
Email Integration
Supports multiple email providers:
// SendGrid
{
  type: 'email',
  config: {
    provider: 'sendgrid',
    apiKey: process.env.SENDGRID_API_KEY,
    from: '[email protected]',
    to: ['[email protected]']
  }
}
// Amazon SES
{
  type: 'email',
  config: {
    provider: 'ses',
    apiKey: process.env.AWS_ACCESS_KEY_ID,
    apiSecret: process.env.AWS_SECRET_ACCESS_KEY,
    region: 'us-east-1',
    from: '[email protected]',
    to: ['[email protected]']
  }
}
Webhook Integration
{
  type: 'webhook',
  config: {
    url: 'https://api.company.com/webhooks/retry-alerts',
    headers: {
      'Authorization': 'Bearer your-webhook-token',
      'Content-Type': 'application/json'
    },
    timeout: 10000
  }
}
Monitoring & Alerting
Metrics Tracked
- Total retries: Number of retry operations
- Success rate: Percentage of successful operations
- Average retry time: Mean time to completion
- Active retries: Currently running operations
- Failure reasons: Analysis of common failure patterns
- Performance trends: Historical performance data
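To show how the headline metrics could be derived, the sketch below aggregates a list of finished operations. `OperationRecord` and `summarize` are hypothetical. The output field names `retrySuccessRate` and `averageRetryTime` match the properties read in the Metrics and Reporting example, but expressing the success rate as a fraction between 0 and 1 and the time in milliseconds is an assumption.

```typescript
// Hypothetical record of one finished operation.
interface OperationRecord {
  succeeded: boolean;
  attempts: number;   // total attempts, including the first
  durationMs: number; // time from first attempt to completion
}

interface MetricsSketch {
  totalRetries: number;
  retrySuccessRate: number; // fraction in [0, 1]
  averageRetryTime: number; // mean time to completion, in ms
}

function summarize(records: OperationRecord[]): MetricsSketch {
  if (records.length === 0) {
    return { totalRetries: 0, retrySuccessRate: 0, averageRetryTime: 0 };
  }
  // Every attempt beyond the first counts as one retry.
  const totalRetries = records.reduce((sum, r) => sum + Math.max(r.attempts - 1, 0), 0);
  const successes = records.filter((r) => r.succeeded).length;
  const totalDuration = records.reduce((sum, r) => sum + r.durationMs, 0);
  return {
    totalRetries,
    retrySuccessRate: successes / records.length,
    averageRetryTime: totalDuration / records.length,
  };
}
```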
Alert Conditions
- High failure rate: Configurable threshold
- Slow retry times: Performance degradation detection
- Critical failures: System-level errors
- Resource exhaustion: Storage or memory limits
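The first two alert conditions map directly onto the `alertThresholds` settings (`failureRate` and `averageRetryTime`). A minimal sketch of such a check follows. `evaluateAlerts`, the condition labels, and the rule that a failure rate above twice the threshold is "critical" are all assumptions made for illustration, not documented behavior.

```typescript
// Mirrors retryMonitoring.alertThresholds from the configuration examples.
interface AlertThresholds {
  failureRate: number;      // e.g. 0.1 for 10%
  averageRetryTime: number; // in ms
}

interface MetricsSnapshot {
  retrySuccessRate: number; // fraction in [0, 1]
  averageRetryTime: number; // in ms
}

type Severity = "warning" | "critical";

interface AlertSketch {
  condition: string;
  severity: Severity;
}

function evaluateAlerts(metrics: MetricsSnapshot, thresholds: AlertThresholds): AlertSketch[] {
  const alerts: AlertSketch[] = [];
  const failureRate = 1 - metrics.retrySuccessRate;
  if (failureRate > thresholds.failureRate) {
    // Assumed escalation rule: more than double the threshold is critical.
    const severity: Severity = failureRate > 2 * thresholds.failureRate ? "critical" : "warning";
    alerts.push({ condition: "high-failure-rate", severity });
  }
  if (metrics.averageRetryTime > thresholds.averageRetryTime) {
    alerts.push({ condition: "slow-retry-time", severity: "warning" });
  }
  return alerts;
}
```

With the thresholds from the configuration example (`failureRate: 0.1`, `averageRetryTime: 5000`), a success rate of 0.85 would raise a warning and a success rate of 0.5 would raise a critical alert under this assumed rule.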
Performance Recommendations
The system provides automatic recommendations based on metrics:
- Retry policy adjustments
- Resource allocation suggestions
- Integration health checks
- Configuration optimizations
Production Deployment
Environment Variables
# Required
REDIS_URL=redis://localhost:6379
ADMIN_PASSWORD=your-secure-password
# Optional integrations
SLACK_BOT_TOKEN=xoxb-your-token
SENDGRID_API_KEY=your-api-key
WEBHOOK_TOKEN=your-webhook-token
# Optional settings
NODE_ENV=production
LOG_LEVEL=info
METRICS_INTERVAL=60000
Docker Deployment
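FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
# Install all dependencies; the build step typically needs devDependencies
RUN npm ci
COPY . .
# Build, then remove devDependencies from the final image
RUN npm run build && npm prune --omit=dev
EXPOSE 3000
CMD ["node", "dist/index.js"]
Health Checks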
// Health check endpoint
GET /health
// Response
{
  "status": "healthy",
  "timestamp": "2024-01-01T10:00:00Z",
  "version": "0.1.0",
  "metrics": {
    "activeRetries": 5,
    "uptime": 3600
  }
}
Monitoring Integration
// Prometheus metrics endpoint
GET /metrics
// Custom health check
async function healthCheck() {
  const metrics = await retrySuite.getMetrics();
  return {
    healthy: metrics.retrySuccessRate > 0.8,
    metrics
  };
}
Troubleshooting
Common Issues
High Memory Usage
- Use Redis storage instead of memory storage
- Implement retry state cleanup
- Monitor storage metrics
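"Retry state cleanup" can be as simple as periodically dropping entries that have not been updated within a TTL. The sketch below shows the idea for an in-memory map. `StoredRetryState` and `pruneExpired` are hypothetical names, not part of the package, and the TTL is given in seconds to match the `ttl: 3600` setting in the storage examples.

```typescript
// Hypothetical shape of one stored retry entry.
interface StoredRetryState {
  id: string;
  updatedAt: number; // epoch milliseconds of the last update
}

// Removes entries older than ttlSeconds and returns how many were deleted.
function pruneExpired(
  states: Map<string, StoredRetryState>,
  ttlSeconds: number,
  now: number = Date.now()
): number {
  let removed = 0;
  // Deleting the current entry while iterating a Map with forEach is safe.
  states.forEach((state, id) => {
    if (now - state.updatedAt > ttlSeconds * 1000) {
      states.delete(id);
      removed++;
    }
  });
  return removed;
}
```

Running a function like this on a timer keeps memory bounded when the `memory` storage type is used; with Redis, key expiry can serve the same purpose.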
Slow Performance
- Optimize retry policies
- Check network connectivity
- Review failure patterns
Missing Alerts
- Verify integration configurations
- Check alert thresholds
- Test integration connectivity
Debug Mode
const config = {
  // ... other config
  logging: {
    level: 'debug',
    format: 'text'
  }
};
Monitoring Commands
# Check retry suite status
curl http://localhost:3000/health
# View current metrics
curl http://localhost:3000/api/metrics
# List active retries
curl http://localhost:3000/api/retries
Dependencies
Internal Dependencies
- @bernierllc/retry-manager: Core retry orchestration
- @bernierllc/retry-storage: Retry state management (when available)
- @bernierllc/retry-monitoring: Metrics and alerting (when available)
External Dependencies
- express: Web server for admin interface
- react: Frontend framework for dashboard
- cors: Cross-origin resource sharing
- helmet: Security middleware
- compression: Response compression
- ws: WebSocket support
Peer Dependencies
react: ^18.0.0 (for React components)
License
Copyright (c) 2025 Bernier LLC. All rights reserved.
This package is licensed under a limited-use license. See the LICENSE file for details.
For more information and examples, visit the GitHub repository.
