# OpsPilot SDK

AI-powered operations monitoring SDK with intelligent log analysis, infrastructure insights, deployment diagnostics, and natural language querying capabilities.
## Features

- **AI-Powered Log Analysis** - Automatically analyze logs, detect patterns, and explain incidents
- **Infrastructure Health Insights** - Monitor system metrics and detect anomalies in real time
- **Deployment Analysis** - Diagnose deployment failures and compare deployment performance
- **Natural Language Queries** - Ask questions about your systems in plain English
- **Multi-AI Provider Support** - Works with OpenAI, Google Gemini, and xAI (Grok)
- **TypeScript Support** - Full type definitions for an excellent developer experience
## Installation

```bash
npm install @thebadlab/opspilot
```
### AI Provider Dependencies

OpsPilot supports multiple AI providers. Install the one you plan to use.

**For OpenAI:**

```bash
npm install openai
```

**For Gemini and xAI:** no additional packages are required (the SDK uses the native `fetch` API).
## Quick Start

```typescript
import { OpsPilot } from '@thebadlab/opspilot';

// Initialize with your preferred AI provider
const pilot = new OpsPilot({
  provider: 'openai',
  apiKey: 'your-api-key',
  model: 'gpt-4-turbo-preview' // optional
});

// Analyze logs
const logs = [
  {
    timestamp: new Date(),
    level: 'error',
    message: 'Database connection timeout',
    service: 'api-server'
  },
  {
    timestamp: new Date(),
    level: 'error',
    message: 'Failed to process request',
    service: 'api-server'
  }
];

const analysis = await pilot.logs.analyzeLogs(logs);
console.log(analysis.summary);
console.log(analysis.recommendations);
```
## Configuration

### OpsPilot Configuration Options

```typescript
interface OpsPilotConfig {
  provider: 'openai' | 'gemini' | 'xai';
  apiKey: string;
  model?: string;       // Optional: AI model to use
  maxTokens?: number;   // Optional: max tokens per request (default: 2000)
  temperature?: number; // Optional: AI temperature (default: 0.7)
}
```
### Provider-Specific Defaults

**OpenAI:**
- Default model: `gpt-4-turbo-preview`
- Requires the `openai` package

**Gemini:**
- Default model: `gemini-pro`
- Uses Google's Generative AI API

**xAI (Grok):**
- Default model: `grok-beta`
- Uses xAI's API
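These defaults can be spelled out as explicit configuration objects. A minimal sketch, assuming only the `OpsPilotConfig` shape documented in this README (the interface is re-declared locally so the snippet is self-contained; the placeholder API keys are not real):

```typescript
// Local copy of the OpsPilotConfig shape from this README.
interface OpsPilotConfig {
  provider: 'openai' | 'gemini' | 'xai';
  apiKey: string;
  model?: string;
  maxTokens?: number;
  temperature?: number;
}

// Omitting `model` falls back to the per-provider default listed above.
const openaiConfig: OpsPilotConfig = {
  provider: 'openai',
  apiKey: 'your-openai-key'
  // model defaults to 'gpt-4-turbo-preview'
};

const geminiConfig: OpsPilotConfig = {
  provider: 'gemini',
  apiKey: 'your-gemini-key',
  model: 'gemini-pro', // explicit, same as the default
  maxTokens: 1000,     // tighter than the 2000 default
  temperature: 0.3     // lower temperature for more consistent analysis
};
```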
## Usage Examples

### Log Analysis

#### Analyze Logs

```typescript
const analysis = await pilot.logs.analyzeLogs(logs);

console.log(analysis.summary);          // Summary of what's happening
console.log(analysis.severity);         // 'low' | 'medium' | 'high' | 'critical'
console.log(analysis.patterns);         // Array of detected patterns
console.log(analysis.recommendations);  // Array of actionable recommendations
console.log(analysis.rootCause);        // Potential root cause
console.log(analysis.affectedServices); // Services affected
```
#### Explain Incidents

```typescript
const explanation = await pilot.logs.explainIncident(
  logs,
  'Users reporting slow response times'
);

console.log(explanation); // Clear explanation of what caused the incident
```
#### Find Anomalies

```typescript
const anomalies = await pilot.logs.findAnomalies(logs);
anomalies.forEach(anomaly => {
  console.log(anomaly); // Description of each anomaly found
});
```
### Infrastructure Monitoring

#### Analyze Infrastructure Health

```typescript
const metrics = [
  {
    timestamp: new Date(),
    cpu: 85.5,
    memory: 72.3,
    disk: 45.0,
    network: { incoming: 1024, outgoing: 2048 },
    host: 'web-server-1'
  }
];

const health = await pilot.infrastructure.analyzeHealth(metrics);

console.log(health.status);          // 'healthy' | 'degraded' | 'unhealthy' | 'critical'
console.log(health.message);         // Health summary
console.log(health.anomalies);       // Detected anomalies
console.log(health.predictions);     // Predicted issues
console.log(health.recommendations); // Optimization recommendations
```
#### Detect Anomalies

```typescript
const anomalies = await pilot.infrastructure.detectAnomalies(metrics);
anomalies.forEach(anomaly => {
  console.log(anomaly.type);        // 'spike' | 'drop' | 'pattern' | 'threshold'
  console.log(anomaly.metric);      // Which metric is anomalous
  console.log(anomaly.severity);    // Severity level
  console.log(anomaly.description); // What was detected
});
```
#### Predict Issues

```typescript
const predictions = await pilot.infrastructure.predictIssues(
  currentMetrics,
  historicalMetrics
);
predictions.forEach(prediction => {
  console.log(prediction); // Predicted potential issue
});
```
### Deployment Analysis

#### Analyze Deployment

```typescript
const deployment = {
  id: 'deploy-123',
  environment: 'production',
  version: 'v2.1.0',
  timestamp: new Date(),
  status: 'failed',
  changes: ['Updated API endpoints', 'Database migration']
};

const analysis = await pilot.deployments.analyzeDeployment(
  deployment,
  logs,   // optional
  metrics // optional
);

console.log(analysis.status);            // 'success' | 'failed' | 'warning'
console.log(analysis.summary);           // Deployment summary
console.log(analysis.issues);            // Array of issues found
console.log(analysis.recommendations);   // Recommendations
console.log(analysis.rollbackSuggested); // Boolean
console.log(analysis.impactAssessment);  // Impact description
```
#### Diagnose Deployment Failure

```typescript
const diagnosis = await pilot.deployments.diagnoseFailure(
  failedDeployment,
  logs,
  previousSuccessfulDeployment // optional
);

console.log(diagnosis); // Detailed root-cause analysis and fix instructions
```
#### Compare Deployments

```typescript
const comparison = await pilot.deployments.compareDeployments(
  deployment1,
  deployment2,
  metrics1, // optional
  metrics2  // optional
);

console.log(comparison); // Performance comparison and insights
```
### Natural Language Queries

#### Query Your Data

```typescript
const result = await pilot.query.query(
  'What caused the spike in error rates?',
  {
    logs: logs,
    metrics: metrics,
    customData: { /* any additional context */ }
  }
);

console.log(result.answer);          // AI-generated answer
console.log(result.data);            // Relevant data extracts
console.log(result.relatedInsights); // Related insights
```
#### Query Logs Specifically

```typescript
const answer = await pilot.query.queryLogs(
  'How many errors occurred in the last hour?',
  logs
);

console.log(answer); // Answer based on log data
```
#### Query Metrics Specifically

```typescript
const answer = await pilot.query.queryMetrics(
  'Is CPU usage abnormal?',
  metrics
);

console.log(answer); // Answer based on metrics data
```
#### Aggregate Query

```typescript
const result = await pilot.query.aggregateQuery(
  'What is the correlation between high CPU and error rates?',
  logs,
  metrics
);

console.log(result.answer); // Comprehensive answer using all data
```
## Type Definitions

### LogEntry

```typescript
interface LogEntry {
  timestamp: Date | string;
  level: 'debug' | 'info' | 'warn' | 'error' | 'fatal';
  message: string;
  service?: string;
  metadata?: Record<string, any>;
  trace_id?: string;
  span_id?: string;
}
```
### InfrastructureMetrics

```typescript
interface InfrastructureMetrics {
  cpu?: number;
  memory?: number;
  disk?: number;
  network?: {
    incoming: number;
    outgoing: number;
  };
  timestamp: Date | string;
  host?: string;
  service?: string;
}
```
### DeploymentInfo

```typescript
interface DeploymentInfo {
  id: string;
  environment: string;
  version: string;
  timestamp: Date | string;
  status: 'success' | 'failed' | 'in_progress' | 'rolled_back';
  changes?: string[];
  metrics?: Record<string, any>;
}
```
## Advanced Usage

### Custom AI Provider

You can create custom AI providers by extending the `BaseAIProvider` class:

```typescript
import { BaseAIProvider, AIResponse } from '@thebadlab/opspilot';

class CustomProvider extends BaseAIProvider {
  async analyze(prompt: string, context?: string): Promise<AIResponse> {
    // Your implementation
  }

  async queryNaturalLanguage(query: string, data: any): Promise<AIResponse> {
    // Your implementation
  }
}
```
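For a concrete sense of the contract, here is a toy provider that simply echoes its input, which can serve as a stub in tests or dry runs. Note that `BaseAIProvider` and `AIResponse` below are local stand-ins mirroring only the method signatures documented in this README; the real exported types may carry additional fields (in particular, the `content` field on `AIResponse` is an assumption):

```typescript
// Local stand-ins for the SDK types (assumed shapes, for illustration only).
interface AIResponse {
  content: string; // assumed field name; check the SDK's actual type
}

abstract class BaseAIProvider {
  abstract analyze(prompt: string, context?: string): Promise<AIResponse>;
  abstract queryNaturalLanguage(query: string, data: any): Promise<AIResponse>;
}

// A toy provider that echoes its input instead of calling an AI API.
class EchoProvider extends BaseAIProvider {
  async analyze(prompt: string, context?: string): Promise<AIResponse> {
    return { content: `analyze: ${prompt}${context ? ` [${context}]` : ''}` };
  }

  async queryNaturalLanguage(query: string, data: any): Promise<AIResponse> {
    return { content: `query: ${query} over ${JSON.stringify(data)}` };
  }
}
```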
### Direct Provider Access

Access the underlying AI provider for custom queries:

```typescript
const provider = pilot.getProvider();
const response = await provider.analyze('Custom prompt', 'context');
```
### Using Individual Features

You can use features independently:

```typescript
import { LogAnalyzer, OpenAIProvider } from '@thebadlab/opspilot';

const provider = new OpenAIProvider('your-api-key');
const logAnalyzer = new LogAnalyzer(provider);
const analysis = await logAnalyzer.analyzeLogs(logs);
```
## API Reference

### OpsPilot

The main SDK class, providing access to all features.

#### Constructor

```typescript
new OpsPilot(config: OpsPilotConfig)
```

#### Properties

- `logs: LogAnalyzer` - Log analysis features
- `infrastructure: InfrastructureMonitor` - Infrastructure monitoring features
- `deployments: DeploymentAnalyzer` - Deployment analysis features
- `query: NaturalQueryProcessor` - Natural language query features

#### Methods

- `getProvider(): AIProvider` - Get the underlying AI provider
- `analyze(prompt: string, context?: string): Promise<AIResponse>` - Custom analysis
### LogAnalyzer

Methods:

- `analyzeLogs(logs: LogEntry[]): Promise<LogAnalysisResult>`
- `explainIncident(logs: LogEntry[], incidentDescription?: string): Promise<string>`
- `findAnomalies(logs: LogEntry[]): Promise<string[]>`
### InfrastructureMonitor

Methods:

- `analyzeHealth(metrics: InfrastructureMetrics[]): Promise<HealthInsight>`
- `detectAnomalies(metrics: InfrastructureMetrics[]): Promise<Anomaly[]>`
- `predictIssues(currentMetrics: InfrastructureMetrics[], historicalMetrics: InfrastructureMetrics[]): Promise<string[]>`
### DeploymentAnalyzer

Methods:

- `analyzeDeployment(deployment: DeploymentInfo, logs?: LogEntry[], metrics?: InfrastructureMetrics[]): Promise<DeploymentAnalysis>`
- `diagnoseFailure(deployment: DeploymentInfo, logs: LogEntry[], previousDeployment?: DeploymentInfo): Promise<string>`
- `compareDeployments(deployment1: DeploymentInfo, deployment2: DeploymentInfo, metrics1?: InfrastructureMetrics[], metrics2?: InfrastructureMetrics[]): Promise<string>`
### NaturalQueryProcessor

Methods:

- `query(question: string, data: { logs?: LogEntry[], metrics?: InfrastructureMetrics[], customData?: any }): Promise<QueryResult>`
- `queryLogs(question: string, logs: LogEntry[]): Promise<string>`
- `queryMetrics(question: string, metrics: InfrastructureMetrics[]): Promise<string>`
- `aggregateQuery(question: string, logs: LogEntry[], metrics: InfrastructureMetrics[]): Promise<QueryResult>`
## Best Practices

1. **Batch Your Data**: Provide larger batches of logs or metrics rather than single entries for better analysis
2. **Include Context**: Add service names, trace IDs, and metadata to logs for richer analysis
3. **Historical Data**: When predicting issues, include sufficient historical data (at least 24 hours)
4. **Temperature Settings**: Use a lower temperature (0.3-0.5) for more consistent analysis, a higher one (0.7-0.9) for more creative insights
5. **Error Handling**: Always wrap SDK calls in try-catch blocks to handle API errors gracefully
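The error-handling advice can be packaged as a small generic helper. This is not part of the SDK, just a sketch of one reasonable pattern: wrap any async call in try-catch and retry transient API failures with a linear backoff:

```typescript
// Retry an async operation up to `retries` times, backing off linearly
// between attempts. Rethrows the last error if every attempt fails.
async function withRetry<T>(
  fn: () => Promise<T>,
  retries = 3,
  delayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        // Wait delayMs, 2*delayMs, ... before the next attempt.
        await new Promise(resolve => setTimeout(resolve, delayMs * attempt));
      }
    }
  }
  throw lastError;
}
```

Any SDK call can then be wrapped, e.g. `const analysis = await withRetry(() => pilot.logs.analyzeLogs(logs));`.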
## Examples

### Complete Monitoring Pipeline

```typescript
import { OpsPilot } from '@thebadlab/opspilot';

const pilot = new OpsPilot({
  provider: 'openai',
  apiKey: process.env.OPENAI_API_KEY
});

async function monitorSystem(logs, metrics, deployment) {
  // Analyze logs
  const logAnalysis = await pilot.logs.analyzeLogs(logs);
  if (logAnalysis.severity === 'critical') {
    const incident = await pilot.logs.explainIncident(logs);
    console.log('CRITICAL INCIDENT:', incident);
  }

  // Check infrastructure health
  const health = await pilot.infrastructure.analyzeHealth(metrics);
  if (health.status !== 'healthy') {
    console.log('HEALTH ISSUE:', health.message);
    console.log('RECOMMENDATIONS:', health.recommendations);
  }

  // Analyze deployment if provided
  if (deployment) {
    const deployAnalysis = await pilot.deployments.analyzeDeployment(
      deployment,
      logs,
      metrics
    );
    if (deployAnalysis.rollbackSuggested) {
      console.log('ROLLBACK SUGGESTED:', deployAnalysis.summary);
    }
  }

  // Answer custom queries
  const answer = await pilot.query.query(
    'What is the overall system status?',
    { logs, metrics }
  );
  console.log('SYSTEM STATUS:', answer.answer);
}
```
## Contributing

Contributions are welcome! Please feel free to submit a pull request.
## License

MIT License - see the LICENSE file for details.
## Support

For issues, questions, or feature requests, please open an issue on GitHub.
## Changelog

### v1.0.0

- Initial release
- Multi-AI provider support (OpenAI, Gemini, xAI)
- Log analysis and incident explanation
- Infrastructure monitoring and anomaly detection
- Deployment analysis and failure diagnosis
- Natural language query processing
