kai-k8s

v1.3.1

Published

a month ago

Kubernetes AI - Your intelligent K8s assistant powered by GitHub Copilot SDK


asaf5767

kubernetes k8s ai assistant kubectl helm copilot cli

Overview

kai is a command-line AI assistant that helps you manage, troubleshoot, and understand your Kubernetes clusters using natural language. Instead of memorizing kubectl commands, just ask questions like "why is my pod crashing?" and kai figures out what to do.

Powered by the GitHub Copilot SDK, kai combines the intelligence of large language models with direct access to your cluster through kubectl and helm.

Features

🗣️ Natural Language Interface — Ask questions in plain English, get answers instantly
🤖 Model Selection — Choose your AI model (GPT-5, Claude, etc.) with /model command
🔄 Context Switching — Switch K8s contexts and namespaces interactively
💾 Session Persistence — Save and resume conversations across sessions
📎 File Attachments — Attach YAML manifests directly in your prompts
🔑 BYOK Mode — Use your own OpenAI/Azure API key
⚡ Direct Command Execution — Runs kubectl and helm commands automatically
🔍 Smart Troubleshooting — Follows systematic debugging workflows
📡 Streaming Responses — See answers as they're generated in real-time
🎨 Beautiful CLI — Animated banners, colored output, and styled command boxes
⚡ Quick Actions — AI-suggested follow-up actions you can execute with a single keystroke
👁️ Live Watch — Monitor resources in real-time with /watch command
🎓 Learning Mode — Educational annotations that teach K8s concepts as you work
🧩 Skills System — Extend kai with custom domain expertise and commands

Prerequisites

Before installing kai, ensure you have the following:

| Requirement | Version | Notes |
|-------------|---------|-------|
| Node.js | 18.0+ | Download |
| kubectl | Any | Must be configured and connected to your cluster |
| GitHub CLI | 2.0+ | For authentication (gh auth login) |
| GitHub Copilot | — | Active subscription (Individual, Business, or Enterprise) |
| Copilot CLI | Latest | npm install -g @github/copilot (auto-installed with VS Code Copilot Chat) |
| helm | 3.0+ | Optional, for Helm-related queries |

Authentication

kai uses GitHub Copilot for AI capabilities. Authenticate using one of these methods:

# Recommended: Use GitHub CLI
gh auth login

# Alternative: Set environment variable
export COPILOT_GITHUB_TOKEN=your_token_here

# Or use standard GitHub tokens
export GH_TOKEN=your_token_here
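
The precedence between these options is not documented here. As a minimal sketch, assuming kai checks `COPILOT_GITHUB_TOKEN` before `GH_TOKEN` and otherwise relies on the GitHub CLI login, the selection could look like this. `resolveAuth` and `AuthSource` are hypothetical names, not part of kai:

```typescript
// Hypothetical helper, not kai's actual code. The lookup order is an assumption.
type AuthSource =
  | { kind: "env"; variable: string; token: string }
  | { kind: "gh-cli" };

function resolveAuth(env: Record<string, string | undefined>): AuthSource {
  // Assumed order: the Copilot-specific variable wins over the generic one.
  for (const variable of ["COPILOT_GITHUB_TOKEN", "GH_TOKEN"]) {
    const token = env[variable]?.trim();
    if (token) {
      return { kind: "env", variable, token };
    }
  }
  // No usable token in the environment: fall back to the `gh auth login` session.
  return { kind: "gh-cli" };
}
```

Passing `process.env` as the argument would apply the same check to the current shell environment.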

Installation

From npm (Recommended)

# Install globally
npm install -g kai-k8s

# Run kai
kai

From Source

# Clone the repository
git clone https://github.com/asaf5767/kai.git
cd kai

# Install dependencies
npm install

# Run directly
npm start

# Or build and install globally
npm run build
npm link
kai  # Now available globally

Quick Start

# Via npm (easiest)
npm install -g kai-k8s && kai

# Or from source
git clone https://github.com/asaf5767/kai.git && cd kai && npm install && npm start

Usage

Starting kai

# Default: Start with GPT-5
kai  # use `npm start` instead when running from source

# Choose a specific model
kai --model claude-sonnet-4.5

# Resume a previous session
kai --session my-debug-session

# Use your own API key (BYOK)
export OPENAI_API_KEY=sk-...
kai --byok

CLI Options

| Option | Description |
|--------|-------------|
| -h, --help | Show help message |
| -v, --version | Show version number |
| -m, --model <name> | Use specific AI model (e.g., gpt-5, claude-sonnet-4.5) |
| -s, --session <id> | Resume a specific session |
| -l, --learn | Enable learning mode (educational annotations) |
| --byok | Use your own API key (requires OPENAI_API_KEY env var) |

Commands

Interactive Commands

| Command | Description |
|---------|-------------|
| /model [name] | Switch AI model or list available models |
| /context [name] | Switch Kubernetes context or list available |
| /ns [name] | Switch namespace or list available |
| /watch [resource] | Live resource monitoring (Ctrl+C to stop) |
| /learn [on\|off] | Toggle learning mode (educational annotations) |
| /skills | List installed skills and commands |
| /skills reload | Reload skills from disk |
| /sessions | List all saved sessions |
| /resume <id> | Resume a previous session |
| /forget <id> | Delete a saved session |
| /new | Start a fresh session |
| /history [n] | View command history, re-run by number, or search |
| /help | Show all commands |
| clear | Clear screen |
| exit | Quit kai |

File Attachments

Include files directly in your prompts using bracket syntax:

> apply this [./deployment.yaml]
> compare [./staging.yaml] with [./prod.yaml]
> what's wrong with [./service.yaml]?

Supported formats: .yaml, .yml, .json, .txt, .log
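
How kai parses the bracket syntax internally is not shown here. The following is only a sketch of one way to pull bracketed paths out of a prompt and check them against the supported formats listed above. `extractAttachments` is a hypothetical name:

```typescript
// Hypothetical sketch; kai's real parser may behave differently.
const SUPPORTED_EXTENSIONS = [".yaml", ".yml", ".json", ".txt", ".log"];

interface AttachmentScan {
  attachments: string[]; // bracketed paths with a supported extension
  rejected: string[];    // bracketed paths with any other extension
}

function extractAttachments(prompt: string): AttachmentScan {
  const attachments: string[] = [];
  const rejected: string[] = [];
  // Match text between square brackets that contains no nested bracket.
  const pattern = /\[([^\[\]]+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(prompt)) !== null) {
    const path = match[1].trim();
    const lower = path.toLowerCase();
    const supported = SUPPORTED_EXTENSIONS.some((ext) => lower.endsWith(ext));
    (supported ? attachments : rejected).push(path);
  }
  return { attachments, rejected };
}
```

Separating rejected paths from accepted ones lets a caller warn about an unsupported file instead of silently ignoring it.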

Example Session

  ╭─────────────────────────────────────╮
  │  kai - Kubernetes AI Assistant      │
  │  Model: gpt-5                       │
  ╰─────────────────────────────────────╯

✓ Connected to GitHub Copilot
ℹ Ask me anything about your Kubernetes cluster!

> /context
ℹ Available Kubernetes Contexts:
  ● prod-cluster
    staging-cluster
    dev-cluster
  Use /context <name> to switch

> /context staging-cluster
✅ Switched to context: staging-cluster

> what pods are crashing in the api namespace?

┌─ Step 1 ──────────────────────────────┐
│ kubectl get pods -n api               │
└───────────────────────────────────────┘
┌─ Step 2 ──────────────────────────────┐
│ kubectl describe pod api-server-xyz   │
└───────────────────────────────────────┘

  │ Found 2 pods in CrashLoopBackOff state:
  │ 
  │ 1. **api-server-xyz** - OOMKilled
  │    The container is running out of memory.
  │    Current limit: 256Mi
  │    
  │    Recommendation: Increase memory limit to 512Mi
  │ 
  │ 2. **worker-abc** - ImagePullBackOff  
  │    Cannot pull image `myregistry/worker:v2.1`
  │    
  │    Recommendation: Check registry credentials

> /model claude-sonnet-4.5
✅ Switched to model: claude-sonnet-4.5

> apply this [./fix-memory.yaml]
ℹ Attached 1 file(s): fix-memory.yaml

  │ I'll apply the memory fix...

Quick Actions

After identifying a problem, kai suggests numbered quick actions that you can execute with a single keystroke:

> why is my pod crashing?

  │ Found pod nginx-pod in CrashLoopBackOff state.
  │ The container is running out of memory (OOMKilled).
  │ Current limit: 256Mi

╭─ ⚡ Quick Actions ───────────────────────────╮
│   1) View logs for nginx-pod                  │
│   2) View previous container logs             │
│   3) Describe pod                             │
│   4) Increase memory limit                    │
╰───────────────────────────────────────────────╯

> 1

  │ Executing: View logs for nginx-pod
  │
  │ [pod logs displayed...]

Simply type a number (1-9) to execute the corresponding action. The quick actions are context-aware and suggest relevant next steps based on the problem kai identified.
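
As an illustration of the single-keystroke selection described above, the sketch below maps an input line to one of the suggested actions. It assumes that anything other than a lone digit 1-9 is treated as a normal prompt. `selectQuickAction` and `QuickAction` are hypothetical names, not kai's API:

```typescript
// Hypothetical sketch of quick-action selection.
interface QuickAction {
  label: string;
}

// Returns the chosen action, or null when the input is not a valid selection
// and should be handled as an ordinary prompt instead.
function selectQuickAction(input: string, actions: QuickAction[]): QuickAction | null {
  const trimmed = input.trim();
  // Only a single digit from 1 to 9 counts as a selection.
  if (!/^[1-9]$/.test(trimmed)) {
    return null;
  }
  const index = Number(trimmed) - 1;
  return index < actions.length ? actions[index] : null;
}
```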

Live Watch

Monitor Kubernetes resources in real-time with the /watch command:

> /watch pods -n default

🔄 Watching pods in default
   Press Ctrl+C to stop watching

NAME             READY   STATUS    RESTARTS   AGE
nginx-abc        1/1     Running   0          2d
api-xyz          1/1     Running   0          1d
worker-123       0/1     Pending   0          5m

Watch stopped

Watch Options

| Command | Description |
|---------|-------------|
| /watch pods | Watch pods in current namespace |
| /watch pods -n kube-system | Watch pods in specific namespace |
| /watch deploy | Watch deployments |
| /watch svc | Watch services |
| /watch nodes | Watch cluster nodes |
| /watch pods -A | Watch across all namespaces |
| /watch pods -l app=nginx | Watch pods matching label selector |

Press Ctrl+C to stop watching and return to the normal REPL.
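
The watch options mirror kubectl's own flags. This README does not say how kai implements /watch, so the following sketch only assumes it forwards the resource and flags to `kubectl get ... --watch`. `buildWatchArgs` is a hypothetical name, and rejecting a missing resource is also an assumption:

```typescript
// Hypothetical sketch: turns a /watch line into kubectl arguments.
// Assumes /watch is a thin wrapper around `kubectl get <resource> --watch`.
function buildWatchArgs(input: string): string[] {
  const parts = input.trim().split(/\s+/);
  if (parts[0] !== "/watch" || parts.length < 2) {
    // Behavior without a resource is not documented; this sketch rejects it.
    throw new Error("usage: /watch <resource> [kubectl flags]");
  }
  const [, resource, ...flags] = parts;
  // Flags such as -n, -A, and -l are passed through unchanged.
  return ["get", resource, ...flags, "--watch"];
}
```

Returning an argument array rather than a single string means the result can be handed to a process spawner without going through a shell.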

🎓 Learning Mode

Learning Mode transforms kai from a "do it for me" assistant into a "teach me while helping" mentor. When enabled, kai adds educational annotations after every command, explaining what it did and why.

Why Learning Mode?

A common criticism of AI assistants is that they become "crutches" — users get things done but never learn the underlying skills. Learning mode counters this by teaching K8s concepts as you work.

Enabling Learning Mode

# Start kai with learning mode
kai --learn

# Or toggle during a session
/learn on
/learn off
/learn      # Toggle

Note: Changes made with /learn take effect in the next session (restart kai or use /new).

What You'll Learn

In learning mode, after every kubectl or helm command, kai adds:

📚 Learn: What each part of the command does
🔍 Notice: What to look for in the output
💡 Why: Reasoning behind the approach (when relevant)

Example: Normal vs Learning Mode

Normal Mode:

> why is my pod crashing?

┌─ Step 1 ──────────────────────────────────┐
│ kubectl logs nginx-abc --previous          │
└────────────────────────────────────────────┘

Error: Config file not found at /app/config.yaml

Your pod is crashing because it can't find the config file.

Learning Mode (kai --learn):

> why is my pod crashing?

┌─ Step 1 ──────────────────────────────────┐
│ kubectl logs nginx-abc --previous          │
└────────────────────────────────────────────┘

Error: Config file not found at /app/config.yaml

📚 Learn:
   • logs → View container stdout/stderr
   • --previous → CRITICAL for CrashLoopBackOff! Gets logs from crashed container
     (without it, you'd see empty logs from the fresh restart)

🔍 Notice: The error message shows the exact file path that's missing

💡 Next time you see CrashLoopBackOff, always use --previous to see why it crashed.

Your pod is crashing because it can't find the config file.

Key Concepts You'll Learn

Flag explanations: --previous, -o wide, -l selectors, --dry-run=client
When to use which command: get vs describe vs logs
Patterns to recognize: Error messages, status codes, event types
Pro tips: Advanced techniques and shortcuts

Best Practices

Start with learning mode ON when you're new to K8s
Turn it OFF when you're in a hurry or already know the commands
Toggle as needed — it's designed to be low-friction

🧩 Skills System

Skills are the most powerful feature of kai — they transform kai from a generic Kubernetes assistant into a domain expert for YOUR specific infrastructure.

A skill can provide:

Custom commands — Shortcuts like /myteam:deploy prod
Domain knowledge — AI expertise specific to your architecture
Quick action templates — Context-aware suggestions
Environment variables — Team-specific configurations

Why Skills?

Without skills, kai is a general K8s expert. With skills, kai becomes YOUR team's expert:

| Without Skills | With Skills |
|----------------|-------------|
| Generic K8s knowledge | Deep understanding of YOUR architecture |
| Manual kubectl commands | /myteam:status shortcuts |
| Generic troubleshooting | "I know your pod naming convention..." |
| Generic suggestions | Team-specific quick actions |

Installing Skills

Skills live in two locations (higher priority first):

1. Project-local:  .kai/skills/<skill-name>/
2. User global:    ~/.config/kai/skills/<skill-name>/
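
One reading of "higher priority first" is that a project-local skill shadows a user-global skill with the same name. The sketch below illustrates that interpretation only; it is an assumption, and `mergeSkills` and `SkillRef` are hypothetical names rather than kai's loader:

```typescript
// Hypothetical sketch of skill precedence; the shadowing rule is an assumption.
interface SkillRef {
  name: string;
  source: "project" | "user";
  path: string;
}

function mergeSkills(projectSkills: string[], userSkills: string[]): SkillRef[] {
  const byName = new Map<string, SkillRef>();
  // Register user-global skills first so project-local entries overwrite them.
  for (const name of userSkills) {
    byName.set(name, { name, source: "user", path: `~/.config/kai/skills/${name}/` });
  }
  for (const name of projectSkills) {
    byName.set(name, { name, source: "project", path: `.kai/skills/${name}/` });
  }
  return Array.from(byName.values());
}
```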

Install from GitHub:

# Clone to user skills directory
git clone https://github.com/myteam/kai-skills ~/.config/kai/skills/myteam

# Or as project submodule
git submodule add https://github.com/myteam/kai-skills .kai/skills/myteam

# Reload skills in kai
/skills reload

Managing Skills

> /skills
  🧩 kai Skills
  ──────────────────────────────────────

  📁 Project Skills
    ● myteam v1.0.0
      My Team Kubernetes Expert · 8 commands

  👤 User Skills
    ● helm-ops v1.0.0
      Helm Operations Expert · 8 commands

  ⚡ Available Skill Commands:
    myteam:
      /myteam:status - Show cluster status
      /myteam:deploy - Deploy to environment
      ...

> /skills info myteam
  [Detailed skill information]

> /skills reload
  ✅ Reloaded 2 skill(s)

Using Skill Commands

Skill commands use the format /<skill>:<command>:

> /helm-ops:releases
NAME            NAMESPACE       STATUS
nginx-ingress   ingress         deployed
redis           default         deployed

> /myteam:deploy prod --version=2.0
⚠️  This command requires confirmation:
    helm upgrade myapp ./charts/myapp -n production --set version=2.0

    Proceed? [y/N] y
    
✅ myteam:deploy completed
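
To make the /<skill>:<command> format concrete, here is a small parsing sketch. The allowed characters in skill and command names are an assumption based on the examples above, and `parseSkillCommand` is a hypothetical name:

```typescript
// Hypothetical sketch of parsing /<skill>:<command> [args...].
interface SkillInvocation {
  skill: string;
  command: string;
  args: string[];
}

function parseSkillCommand(input: string): SkillInvocation | null {
  // Assumed name characters: letters, digits, underscore, and hyphen.
  const match = /^\/([A-Za-z0-9_-]+):([A-Za-z0-9_-]+)(?:\s+(.*))?$/.exec(input.trim());
  if (!match) {
    return null; // Not a skill command, e.g. a built-in such as /model.
  }
  const rest = (match[3] ?? "").trim();
  return {
    skill: match[1],
    command: match[2],
    args: rest === "" ? [] : rest.split(/\s+/),
  };
}
```

Returning null for input without a colon keeps built-in slash commands on their own code path.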

Creating Your Own Skill

Create a skill.yaml file:

apiVersion: kai.ms/v1
kind: Skill

metadata:
  name: my-skill
  displayName: My Custom Skill
  description: Custom K8s operations for my team
  version: 1.0.0

spec:
  commands:
    - name: status
      description: Show cluster health
      script: kubectl get pods -A | grep -v Running
      
    - name: restart
      description: Restart a deployment
      args:
        - name: deployment
          description: Deployment name
          required: true
      script: kubectl rollout restart deployment/${deployment}
      confirm: true  # Requires Y/N confirmation

  prompts:
    inline: |
      ## My Team Expertise
      When troubleshooting this cluster:
      - Our apps run in the 'production' namespace
      - Deployments use prefix 'myapp-'
      - Check ConfigMaps first for config issues

See examples/skills/helm-ops/ for a complete example.
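
The manifest above uses `${deployment}` as a placeholder in `script`. As a rough sketch of how such placeholders could be filled in, the code below mirrors the `commands` fields shown in the YAML. The interfaces and `renderScript` are hypothetical (the real definitions in src/skills/types.ts are not shown in this README), and the value check is an added precaution, not a documented kai behavior:

```typescript
// Hypothetical types mirroring the skill.yaml fields shown above.
interface SkillArg {
  name: string;
  description?: string;
  required?: boolean;
}

interface SkillCommand {
  name: string;
  description: string;
  script: string;
  args?: SkillArg[];
  confirm?: boolean;
}

function renderScript(command: SkillCommand, values: Record<string, string>): string {
  const has = (key: string) => Object.prototype.hasOwnProperty.call(values, key);
  for (const arg of command.args ?? []) {
    if (arg.required && !has(arg.name)) {
      throw new Error(`missing required argument: ${arg.name}`);
    }
  }
  // Replace each ${name}; placeholders without a supplied value are left as-is.
  return command.script.replace(/\$\{(\w+)\}/g, (placeholder: string, name: string) => {
    if (!has(name)) {
      return placeholder;
    }
    const value = values[name];
    // Precaution (assumption): refuse values that could alter the shell command.
    if (!/^[A-Za-z0-9._-]+$/.test(value)) {
      throw new Error(`unsafe value for ${name}: ${value}`);
    }
    return value;
  });
}
```

Because the rendered string is ultimately run by a shell, restricting values to a conservative character set is one way to avoid command injection through arguments.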

Skill Capabilities

| Feature | Description |
|---------|-------------|
| commands | Custom /skill:cmd slash commands |
| prompts | Domain knowledge injected into AI context |
| quickActions | Pre-defined action templates |
| env | Environment variables for scripts |
| contextPatterns | Auto-activate for matching K8s contexts |
| confirm | Require Y/N for destructive commands |

Security

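Skills are designed with the following safeguards:

✅ YAML-only: A skill is a declarative skill.yaml file, not a plugin binary. Its script fields are still shell commands that kai runs, so review them before installing a skill
✅ Visible execution: All commands are shown before running
✅ Confirmation gates: Commands marked confirm: true require explicit Y/N approval
✅ User-controlled: You install and enable skills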

How It Works

kai leverages the GitHub Copilot SDK's built-in tools for shell command execution:

┌─────────────────────────────────────────────────────────┐
│                         kai                             │
├─────────────────────────────────────────────────────────┤
│  User Question                                          │
│       │                                                 │
│       ▼                                                 │
│  ┌─────────────────────────────────────────────────┐   │
│  │           GitHub Copilot SDK                     │   │
│  │  ┌─────────────┐    ┌──────────────────────┐   │   │
│  │  │ AI Model    │───▶│ Built-in Shell Tool  │   │   │
│  │  └─────────────┘    └──────────────────────┘   │   │
│  └─────────────────────────────────────────────────┘   │
│       │                        │                        │
│       │                        ▼                        │
│       │              kubectl / helm                     │
│       │                        │                        │
│       │                        ▼                        │
│       │              Kubernetes Cluster                 │
│       │                        │                        │
│       ▼                        ▼                        │
│  Intelligent Response + Command Output                  │
└─────────────────────────────────────────────────────────┘

Project Structure

kai/
├── src/
│   ├── index.ts              # Main entry: CLI args, REPL loop
│   ├── cli/
│   │   ├── commands.ts       # Slash command handlers (incl. skill commands)
│   │   ├── ui.ts             # Colors, spinners, status bar
│   │   ├── frames.ts         # Boxed UI components
│   │   ├── banner.ts         # Animated ASCII banner
│   │   ├── quick-actions.ts  # Quick action parsing and selection
│   │   ├── watch.ts          # Live resource monitoring
│   │   └── autocomplete.ts   # Tab completion
│   ├── config/
│   │   ├── system-prompt.ts  # K8s expert system prompt + skill injection
│   │   └── preferences.ts    # User preferences and history
│   └── skills/               # 🧩 Skills System
│       ├── index.ts          # Skills module exports
│       ├── types.ts          # TypeScript interfaces
│       ├── loader.ts         # Skill discovery and loading
│       └── registry.ts       # Command registration, prompt composition
├── examples/
│   └── skills/               # Example skills
│       └── helm-ops/         # Helm operations skill
├── plans/
│   └── kai-skills-architecture.md  # Skills system design doc
├── package.json
├── tsconfig.json
└── README.md

Development

# Run with hot reload
npm run dev

# Build for production
npm run build

# Type checking
npx tsc --noEmit

Troubleshooting

Protocol Version Mismatch

If you see "SDK protocol version mismatch" error:

# Update the Copilot CLI to the latest version
npm update -g @github/copilot

# Verify the update
copilot --version

This happens when the Copilot CLI is outdated and doesn't support the SDK's protocol version.

403 Error on Startup

You're using a GitHub account without Copilot access.

# Switch to an account with Copilot subscription
gh auth login

kubectl Not Found

Ensure kubectl is installed and in your PATH:

kubectl version --client

BYOK (Bring Your Own Key)

Use --byok to use your own API key instead of GitHub Copilot. The SDK supports:

OpenAI:

export OPENAI_API_KEY=sk-your-key-here
# Optional: custom endpoint (for proxies, local models)
export OPENAI_API_BASE=https://api.openai.com/v1
kai --byok

Azure OpenAI:

export AZURE_OPENAI_API_KEY=your-azure-key
export AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
# Optional: API version (default: 2024-10-21)
export AZURE_OPENAI_API_VERSION=2024-10-21
kai --byok

Anthropic:

export ANTHROPIC_API_KEY=sk-ant-your-key
# Optional: custom endpoint
export ANTHROPIC_API_BASE=https://api.anthropic.com
kai --byok
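
Which provider wins when several keys are set is not stated here. The sketch below assumes the order OpenAI, then Azure OpenAI, then Anthropic, and assumes the Azure endpoint is mandatory. `detectProvider` and `Provider` are hypothetical names, not the SDK's API:

```typescript
// Hypothetical sketch of BYOK provider detection; the ordering is an assumption.
type Provider =
  | { name: "openai"; apiKey: string; baseUrl: string | undefined }
  | { name: "azure"; apiKey: string; endpoint: string; apiVersion: string }
  | { name: "anthropic"; apiKey: string; baseUrl: string | undefined };

function detectProvider(env: Record<string, string | undefined>): Provider {
  const openaiKey = env["OPENAI_API_KEY"];
  if (openaiKey) {
    return { name: "openai", apiKey: openaiKey, baseUrl: env["OPENAI_API_BASE"] };
  }
  const azureKey = env["AZURE_OPENAI_API_KEY"];
  if (azureKey) {
    const endpoint = env["AZURE_OPENAI_ENDPOINT"];
    if (!endpoint) {
      // Assumption: the endpoint is required, since it is not marked optional above.
      throw new Error("AZURE_OPENAI_ENDPOINT must be set with AZURE_OPENAI_API_KEY");
    }
    // The default API version is the one stated in the Azure OpenAI example above.
    const apiVersion = env["AZURE_OPENAI_API_VERSION"] ?? "2024-10-21";
    return { name: "azure", apiKey: azureKey, endpoint, apiVersion };
  }
  const anthropicKey = env["ANTHROPIC_API_KEY"];
  if (anthropicKey) {
    return { name: "anthropic", apiKey: anthropicKey, baseUrl: env["ANTHROPIC_API_BASE"] };
  }
  throw new Error("--byok needs OPENAI_API_KEY, AZURE_OPENAI_API_KEY, or ANTHROPIC_API_KEY");
}
```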

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (git checkout -b feature/amazing-feature)
3. Commit your changes (git commit -m 'Add amazing feature')
4. Push to the branch (git push origin feature/amazing-feature)
5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.