# OpsPilot SDK

AI-powered operations monitoring SDK with intelligent log analysis, infrastructure insights, deployment diagnostics, and natural language querying capabilities.
## Features

- **AI-Powered Log Analysis** - Automatically analyze logs, detect patterns, and explain incidents
- **Infrastructure Health Insights** - Monitor system metrics and detect anomalies in real time
- **Deployment Analysis** - Diagnose deployment failures and compare deployment performance
- **Natural Language Queries** - Ask questions about your systems in plain English
- **Multi-AI Provider Support** - Works with OpenAI, Google Gemini, and xAI (Grok)
- **TypeScript Support** - Full type definitions for an excellent developer experience
## Installation

```bash
npm install @thebadlab/opspilot
```
### AI Provider Dependencies

OpsPilot supports multiple AI providers. Install the one you plan to use.

**For OpenAI:**

```bash
npm install openai
```

**For Gemini and xAI:** no additional packages are required (the SDK uses the native `fetch` API).
## Quick Start

```typescript
import { OpsPilot } from '@thebadlab/opspilot';

// Initialize with your preferred AI provider
const pilot = new OpsPilot({
  provider: 'openai',
  apiKey: 'your-api-key',
  model: 'gpt-4-turbo-preview' // optional
});

// Analyze logs
const logs = [
  {
    timestamp: new Date(),
    level: 'error',
    message: 'Database connection timeout',
    service: 'api-server'
  },
  {
    timestamp: new Date(),
    level: 'error',
    message: 'Failed to process request',
    service: 'api-server'
  }
];

const analysis = await pilot.logs.analyzeLogs(logs);
console.log(analysis.summary);
console.log(analysis.recommendations);
```
## Configuration

### OpsPilot Configuration Options

```typescript
interface OpsPilotConfig {
  provider: 'openai' | 'gemini' | 'xai';
  apiKey: string;
  model?: string;       // Optional: AI model to use
  maxTokens?: number;   // Optional: max tokens per request (default: 2000)
  temperature?: number; // Optional: AI temperature (default: 0.7)
}
```
### Provider-Specific Defaults

**OpenAI:**
- Default model: `gpt-4-turbo-preview`
- Requires the `openai` package

**Gemini:**
- Default model: `gemini-pro`
- Uses Google's Generative AI API

**xAI (Grok):**
- Default model: `grok-beta`
- Uses xAI's API
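These defaults can be spelled out as explicit configuration objects. A minimal sketch, assuming only the `OpsPilotConfig` shape documented in this README (the interface is re-declared locally so the snippet is self-contained; the placeholder API keys are not real):

```typescript
// Local copy of the OpsPilotConfig shape from this README.
interface OpsPilotConfig {
  provider: 'openai' | 'gemini' | 'xai';
  apiKey: string;
  model?: string;
  maxTokens?: number;
  temperature?: number;
}

// Omitting `model` falls back to the per-provider default listed above.
const openaiConfig: OpsPilotConfig = {
  provider: 'openai',
  apiKey: 'your-openai-key'
  // model defaults to 'gpt-4-turbo-preview'
};

const geminiConfig: OpsPilotConfig = {
  provider: 'gemini',
  apiKey: 'your-gemini-key',
  model: 'gemini-pro', // explicit, same as the default
  maxTokens: 1000,     // tighter than the 2000 default
  temperature: 0.3     // lower temperature for more consistent analysis
};
```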
## Usage Examples

### Log Analysis

#### Analyze Logs

```typescript
const analysis = await pilot.logs.analyzeLogs(logs);

console.log(analysis.summary);          // Summary of what's happening
console.log(analysis.severity);         // 'low' | 'medium' | 'high' | 'critical'
console.log(analysis.patterns);         // Array of detected patterns
console.log(analysis.recommendations);  // Array of actionable recommendations
console.log(analysis.rootCause);        // Potential root cause
console.log(analysis.affectedServices); // Services affected
```
#### Explain Incidents

```typescript
const explanation = await pilot.logs.explainIncident(
  logs,
  'Users reporting slow response times'
);

console.log(explanation); // Clear explanation of what caused the incident
```
#### Find Anomalies

```typescript
const anomalies = await pilot.logs.findAnomalies(logs);
anomalies.forEach(anomaly => {
  console.log(anomaly); // Description of each anomaly found
});
```
### Infrastructure Monitoring

#### Analyze Infrastructure Health

```typescript
const metrics = [
  {
    timestamp: new Date(),
    cpu: 85.5,
    memory: 72.3,
    disk: 45.0,
    network: { incoming: 1024, outgoing: 2048 },
    host: 'web-server-1'
  }
];

const health = await pilot.infrastructure.analyzeHealth(metrics);

console.log(health.status);          // 'healthy' | 'degraded' | 'unhealthy' | 'critical'
console.log(health.message);         // Health summary
console.log(health.anomalies);       // Detected anomalies
console.log(health.predictions);     // Predicted issues
console.log(health.recommendations); // Optimization recommendations
```
#### Detect Anomalies

```typescript
const anomalies = await pilot.infrastructure.detectAnomalies(metrics);
anomalies.forEach(anomaly => {
  console.log(anomaly.type);        // 'spike' | 'drop' | 'pattern' | 'threshold'
  console.log(anomaly.metric);      // Which metric is anomalous
  console.log(anomaly.severity);    // Severity level
  console.log(anomaly.description); // What was detected
});
```
#### Predict Issues

```typescript
const predictions = await pilot.infrastructure.predictIssues(
  currentMetrics,
  historicalMetrics
);
predictions.forEach(prediction => {
  console.log(prediction); // Predicted potential issue
});
```
### Deployment Analysis

#### Analyze Deployment

```typescript
const deployment = {
  id: 'deploy-123',
  environment: 'production',
  version: 'v2.1.0',
  timestamp: new Date(),
  status: 'failed',
  changes: ['Updated API endpoints', 'Database migration']
};

const analysis = await pilot.deployments.analyzeDeployment(
  deployment,
  logs,   // optional
  metrics // optional
);

console.log(analysis.status);            // 'success' | 'failed' | 'warning'
console.log(analysis.summary);           // Deployment summary
console.log(analysis.issues);            // Array of issues found
console.log(analysis.recommendations);   // Recommendations
console.log(analysis.rollbackSuggested); // Boolean
console.log(analysis.impactAssessment);  // Impact description
```
#### Diagnose Deployment Failure

```typescript
const diagnosis = await pilot.deployments.diagnoseFailure(
  failedDeployment,
  logs,
  previousSuccessfulDeployment // optional
);

console.log(diagnosis); // Detailed root-cause analysis and fix instructions
```
#### Compare Deployments

```typescript
const comparison = await pilot.deployments.compareDeployments(
  deployment1,
  deployment2,
  metrics1, // optional
  metrics2  // optional
);

console.log(comparison); // Performance comparison and insights
```
### Natural Language Queries

#### Query Your Data

```typescript
const result = await pilot.query.query(
  'What caused the spike in error rates?',
  {
    logs: logs,
    metrics: metrics,
    customData: { /* any additional context */ }
  }
);

console.log(result.answer);          // AI-generated answer
console.log(result.data);            // Relevant data extracts
console.log(result.relatedInsights); // Related insights
```
#### Query Logs Specifically

```typescript
const answer = await pilot.query.queryLogs(
  'How many errors occurred in the last hour?',
  logs
);

console.log(answer); // Answer based on log data
```
#### Query Metrics Specifically

```typescript
const answer = await pilot.query.queryMetrics(
  'Is CPU usage abnormal?',
  metrics
);

console.log(answer); // Answer based on metrics data
```
#### Aggregate Query

```typescript
const result = await pilot.query.aggregateQuery(
  'What is the correlation between high CPU and error rates?',
  logs,
  metrics
);

console.log(result.answer); // Comprehensive answer using all data
```
## Type Definitions

### LogEntry

```typescript
interface LogEntry {
  timestamp: Date | string;
  level: 'debug' | 'info' | 'warn' | 'error' | 'fatal';
  message: string;
  service?: string;
  metadata?: Record<string, any>;
  trace_id?: string;
  span_id?: string;
}
```
### InfrastructureMetrics

```typescript
interface InfrastructureMetrics {
  cpu?: number;
  memory?: number;
  disk?: number;
  network?: {
    incoming: number;
    outgoing: number;
  };
  timestamp: Date | string;
  host?: string;
  service?: string;
}
```
### DeploymentInfo

```typescript
interface DeploymentInfo {
  id: string;
  environment: string;
  version: string;
  timestamp: Date | string;
  status: 'success' | 'failed' | 'in_progress' | 'rolled_back';
  changes?: string[];
  metrics?: Record<string, any>;
}
```
## Advanced Usage

### Custom AI Provider

You can create custom AI providers by extending the `BaseAIProvider` class:

```typescript
import { BaseAIProvider, AIResponse } from '@thebadlab/opspilot';

class CustomProvider extends BaseAIProvider {
  async analyze(prompt: string, context?: string): Promise<AIResponse> {
    // Your implementation
  }

  async queryNaturalLanguage(query: string, data: any): Promise<AIResponse> {
    // Your implementation
  }
}
```
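For a concrete sense of the contract, here is a toy provider that simply echoes its input, which can serve as a stub in tests or dry runs. Note that `BaseAIProvider` and `AIResponse` below are local stand-ins mirroring only the method signatures documented in this README; the real exported types may carry additional fields (in particular, the `content` field on `AIResponse` is an assumption):

```typescript
// Local stand-ins for the SDK types (assumed shapes, for illustration only).
interface AIResponse {
  content: string; // assumed field name; check the SDK's actual type
}

abstract class BaseAIProvider {
  abstract analyze(prompt: string, context?: string): Promise<AIResponse>;
  abstract queryNaturalLanguage(query: string, data: any): Promise<AIResponse>;
}

// A toy provider that echoes its input instead of calling an AI API.
class EchoProvider extends BaseAIProvider {
  async analyze(prompt: string, context?: string): Promise<AIResponse> {
    return { content: `analyze: ${prompt}${context ? ` [${context}]` : ''}` };
  }

  async queryNaturalLanguage(query: string, data: any): Promise<AIResponse> {
    return { content: `query: ${query} over ${JSON.stringify(data)}` };
  }
}
```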
### Direct Provider Access

Access the underlying AI provider for custom queries:

```typescript
const provider = pilot.getProvider();
const response = await provider.analyze('Custom prompt', 'context');
```
### Using Individual Features

You can use features independently:

```typescript
import { LogAnalyzer, OpenAIProvider } from '@thebadlab/opspilot';

const provider = new OpenAIProvider('your-api-key');
const logAnalyzer = new LogAnalyzer(provider);
const analysis = await logAnalyzer.analyzeLogs(logs);
```
## API Reference

### OpsPilot

The main SDK class, providing access to all features.

#### Constructor

```typescript
new OpsPilot(config: OpsPilotConfig)
```

#### Properties

- `logs: LogAnalyzer` - Log analysis features
- `infrastructure: InfrastructureMonitor` - Infrastructure monitoring features
- `deployments: DeploymentAnalyzer` - Deployment analysis features
- `query: NaturalQueryProcessor` - Natural language query features

#### Methods

- `getProvider(): AIProvider` - Get the underlying AI provider
- `analyze(prompt: string, context?: string): Promise<AIResponse>` - Custom analysis
### LogAnalyzer

Methods:

- `analyzeLogs(logs: LogEntry[]): Promise<LogAnalysisResult>`
- `explainIncident(logs: LogEntry[], incidentDescription?: string): Promise<string>`
- `findAnomalies(logs: LogEntry[]): Promise<string[]>`
### InfrastructureMonitor

Methods:

- `analyzeHealth(metrics: InfrastructureMetrics[]): Promise<HealthInsight>`
- `detectAnomalies(metrics: InfrastructureMetrics[]): Promise<Anomaly[]>`
- `predictIssues(currentMetrics: InfrastructureMetrics[], historicalMetrics: InfrastructureMetrics[]): Promise<string[]>`
### DeploymentAnalyzer

Methods:

- `analyzeDeployment(deployment: DeploymentInfo, logs?: LogEntry[], metrics?: InfrastructureMetrics[]): Promise<DeploymentAnalysis>`
- `diagnoseFailure(deployment: DeploymentInfo, logs: LogEntry[], previousDeployment?: DeploymentInfo): Promise<string>`
- `compareDeployments(deployment1: DeploymentInfo, deployment2: DeploymentInfo, metrics1?: InfrastructureMetrics[], metrics2?: InfrastructureMetrics[]): Promise<string>`
### NaturalQueryProcessor

Methods:

- `query(question: string, data: { logs?: LogEntry[], metrics?: InfrastructureMetrics[], customData?: any }): Promise<QueryResult>`
- `queryLogs(question: string, logs: LogEntry[]): Promise<string>`
- `queryMetrics(question: string, metrics: InfrastructureMetrics[]): Promise<string>`
- `aggregateQuery(question: string, logs: LogEntry[], metrics: InfrastructureMetrics[]): Promise<QueryResult>`
## Best Practices

1. **Batch Your Data**: Provide larger batches of logs or metrics rather than single entries for better analysis
2. **Include Context**: Add service names, trace IDs, and metadata to logs for richer analysis
3. **Historical Data**: When predicting issues, include sufficient historical data (at least 24 hours)
4. **Temperature Settings**: Use a lower temperature (0.3-0.5) for more consistent analysis, a higher one (0.7-0.9) for more creative insights
5. **Error Handling**: Always wrap SDK calls in try-catch blocks to handle API errors gracefully
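The error-handling advice can be packaged as a small generic helper. This is not part of the SDK, just a sketch of one reasonable pattern: wrap any async call in try-catch and retry transient API failures with a linear backoff:

```typescript
// Retry an async operation up to `retries` times, backing off linearly
// between attempts. Rethrows the last error if every attempt fails.
async function withRetry<T>(
  fn: () => Promise<T>,
  retries = 3,
  delayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        // Wait delayMs, 2*delayMs, ... before the next attempt.
        await new Promise(resolve => setTimeout(resolve, delayMs * attempt));
      }
    }
  }
  throw lastError;
}
```

Any SDK call can then be wrapped, e.g. `const analysis = await withRetry(() => pilot.logs.analyzeLogs(logs));`.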
## Examples

### Complete Monitoring Pipeline

```typescript
import { OpsPilot } from '@thebadlab/opspilot';

const pilot = new OpsPilot({
  provider: 'openai',
  apiKey: process.env.OPENAI_API_KEY
});

async function monitorSystem(logs, metrics, deployment) {
  // Analyze logs
  const logAnalysis = await pilot.logs.analyzeLogs(logs);
  if (logAnalysis.severity === 'critical') {
    const incident = await pilot.logs.explainIncident(logs);
    console.log('CRITICAL INCIDENT:', incident);
  }

  // Check infrastructure health
  const health = await pilot.infrastructure.analyzeHealth(metrics);
  if (health.status !== 'healthy') {
    console.log('HEALTH ISSUE:', health.message);
    console.log('RECOMMENDATIONS:', health.recommendations);
  }

  // Analyze deployment if provided
  if (deployment) {
    const deployAnalysis = await pilot.deployments.analyzeDeployment(
      deployment,
      logs,
      metrics
    );
    if (deployAnalysis.rollbackSuggested) {
      console.log('ROLLBACK SUGGESTED:', deployAnalysis.summary);
    }
  }

  // Answer custom queries
  const answer = await pilot.query.query(
    'What is the overall system status?',
    { logs, metrics }
  );
  console.log('SYSTEM STATUS:', answer.answer);
}
```
## Contributing

Contributions are welcome! Please feel free to submit a pull request.
## License

MIT License - see the LICENSE file for details.
## Support

For issues, questions, or feature requests, please open an issue on GitHub.
## Changelog

### v1.0.0

- Initial release
- Multi-AI provider support (OpenAI, Gemini, xAI)
- Log analysis and incident explanation
- Infrastructure monitoring and anomaly detection
- Deployment analysis and failure diagnosis
- Natural language query processing
