kafuops

v0.3.3

Published

20 days ago

Open-source, self-hostable AI production-debugging agent for backend teams — turns incidents into sandbox-validated, evidence-backed merge requests. Bring your own LLM (OpenAI/Anthropic/local).

KafuOps is an open-source, self-hostable AI production-debugging agent (an AI SRE) for backend and platform teams. It turns real production incidents into sandbox-validated, evidence-backed merge requests with a confidence score. Bring your own LLM: OpenAI, Anthropic, a local Codex, or a local AI CLI. It is built for SREs, on-call engineers, and DevOps engineers who want automated root cause analysis and incident-to-pull-request fixes without streaming their logs to a model.

KafuOps is designed around one important rule:

It does not stream all logs to an LLM. It watches locally, detects meaningful incidents, builds a small sanitized evidence packet, grounds the model with only the relevant files, and opens a reviewable MR or PR.

📋 Looking for the honest picture of what works today vs. what is spec only? See STATUS.md — every doc in docs/ is mapped to ✅ implemented, 🟡 partial, or 🔲 not yet.

See it fix a real bug

A planted bug in a tiny checkout service: KafuOps diagnoses it, writes the patch, self-corrects when the first attempt fails, validates the fix in a sandbox, and opens a reviewable MR. This run is driven by a locally installed AI CLI (no API key).

The fix KafuOps generated, after the sandbox test went green:

- return price - price * percent;
+ return price - price * (percent / 100);
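
The patch treats `percent` as a whole-number percentage (20 means 20%). A minimal TypeScript sketch of the before and after behaviour follows. The function names `applyDiscount` and `applyDiscountBuggy` are assumptions for illustration, since only the changed return expression appears above.

```typescript
// Hypothetical function names; only the two return expressions come from the diff.

// Fixed version: scale the whole-number percentage to a fraction first.
function applyDiscount(price: number, percent: number): number {
  return price - price * (percent / 100);
}

// Buggy version from the "-" line: multiplies by 20 instead of 0.20.
function applyDiscountBuggy(price: number, percent: number): number {
  return price - price * percent;
}

console.log(applyDiscount(100, 20));      // 80
console.log(applyDiscountBuggy(100, 20)); // -1900
```

The second value matches the `-1900 !== 80` assertion failure shown in the demo output.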

$ scripts/demo.sh
### Before — the test fails (red):
AssertionError: 20% off $100 should be $80   (-1900 !== 80)

### KafuOps runs (provider: local AI CLI):
! attempt 1: patch did not apply → revise → retry
✓ attempt 2: patch applied, tests passed        # self-correcting loop
  confidence=80 (high)   risk=low
! MR ready for review — saved mr-body.md

### After — the test passes (green):
all tests passed
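
The retry visible in the run above (attempt 1 fails to apply, attempt 2 passes) can be pictured as a bounded propose-and-validate loop. The sketch below is an illustration only: `selfCorrectingLoop`, `AttemptResult`, and the callback shapes are hypothetical and are not the KafuOps API.

```typescript
// All names here are hypothetical; this is not the KafuOps implementation.
interface AttemptResult {
  applied: boolean;      // did the patch apply cleanly?
  testsPassed: boolean;  // did the configured tests pass in the sandbox?
  detail: string;        // failure description fed back to the next proposal
}

interface LoopOutcome {
  success: boolean;
  attempts: number;
  log: string[];
}

function selfCorrectingLoop(
  propose: (feedback: string | null) => string, // returns a candidate patch
  validate: (patch: string) => AttemptResult,   // applies the patch and runs tests
  maxAttempts: number = 3,
): LoopOutcome {
  const log: string[] = [];
  let feedback: string | null = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = validate(propose(feedback));
    if (result.applied && result.testsPassed) {
      log.push(`attempt ${attempt}: patch applied, tests passed`);
      return { success: true, attempts: attempt, log };
    }
    // Feed the failure back so the next proposal can revise the patch.
    feedback = result.detail;
    log.push(`attempt ${attempt}: ${result.detail}`);
  }
  return { success: false, attempts: maxAttempts, log };
}

// Simulated run: the first patch does not apply, the revised one does.
const demoOutcome = selfCorrectingLoop(
  (feedback) => (feedback === null ? "first-draft" : "revised"),
  (patch) =>
    patch === "revised"
      ? { applied: true, testsPassed: true, detail: "ok" }
      : { applied: false, testsPassed: false, detail: "patch did not apply" },
);
console.log(demoOutcome.log.join("\n"));
```

Capping the number of attempts keeps a bad diagnosis from looping indefinitely; a run that exhausts its attempts simply reports failure instead of opening an MR.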

What KafuOps does

Observes backend logs (wrapper mode + sidecar file tailing), OpenTelemetry traces, runtime errors, and alert webhooks (Sentry/Datadog/Alertmanager).
Builds a living .kafuops/memory/ folder that explains the codebase, architecture, routes, services, database usage, queues, external APIs, and previous incidents.
Detects errors and deduplicates noisy events into incidents.
Selects the relevant source files, tests, configs, traces, and log snippets.
Calls an LLM only after an incident trigger and only with sanitized context.
Generates a failing regression test when possible.
Creates a fix in a sandbox branch.
Runs configured tests.
Opens a GitHub PR or GitLab MR with root cause, evidence, confidence score, blast radius, and validation notes.
Updates project memory after review and merge.
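
As an illustration of the deduplication step in the list above, one common approach is to normalize volatile parts of an error message into a fingerprint and count repeats per fingerprint. This is a sketch under that assumption; `LogEvent`, `Incident`, `fingerprintOf`, and `groupIntoIncidents` are hypothetical names, and the actual incident engine may work differently.

```typescript
// Hypothetical shapes; the actual incident engine may fingerprint events differently.
interface LogEvent {
  timestamp: number; // epoch milliseconds
  message: string;
}

interface Incident {
  fingerprint: string;
  count: number;
  firstSeen: number;
  lastSeen: number;
  sample: string; // first raw message seen for this fingerprint
}

// Replace volatile tokens (hex addresses, numbers) so repeats of one error share a key.
function fingerprintOf(message: string): string {
  return message
    .replace(/0x[0-9a-f]+/gi, "<hex>")
    .replace(/\d+/g, "<n>")
    .trim()
    .toLowerCase();
}

// Fold a list of error events into one incident per fingerprint, in first-seen order.
function groupIntoIncidents(events: LogEvent[]): Incident[] {
  const byKey: { [key: string]: Incident } = Object.create(null);
  const incidents: Incident[] = [];
  for (const entry of events) {
    const key = fingerprintOf(entry.message);
    const existing = byKey[key];
    if (existing) {
      existing.count += 1;
      existing.firstSeen = Math.min(existing.firstSeen, entry.timestamp);
      existing.lastSeen = Math.max(existing.lastSeen, entry.timestamp);
    } else {
      const created: Incident = {
        fingerprint: key,
        count: 1,
        firstSeen: entry.timestamp,
        lastSeen: entry.timestamp,
        sample: entry.message,
      };
      byKey[key] = created;
      incidents.push(created);
    }
  }
  return incidents;
}

const demoIncidents = groupIntoIncidents([
  { timestamp: 1000, message: "Timeout after 3000ms for order 42" },
  { timestamp: 2000, message: "Timeout after 3100ms for order 77" },
  { timestamp: 3000, message: "TypeError: total is undefined" },
]);
console.log(demoIncidents.map((i) => `${i.count}x ${i.fingerprint}`));
```

Here the two timeout lines collapse into a single incident with `count` 2, while the `TypeError` stays separate.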

Recommended architecture

For production, KafuOps should run beside your backend, not inside it.

The default mode is a sidecar/agent + control-plane worker model:

graph LR
  Backend[Backend Runtime] -->|stdout/stderr logs| Agent[KafuOps Agent]
  Backend -->|OpenTelemetry traces/errors| Agent
  Alerts[Sentry/Datadog/Alertmanager/Webhooks] --> Agent
  Agent --> Incident[Incident Engine]
  Incident --> Context[Context Builder]
  Repo[Git Repository] --> Context
  Memory[.kafuops/memory] --> Context
  Context --> LLM[LLM Fix Agent]
  LLM --> Sandbox[Patch Sandbox]
  Sandbox --> Git[GitHub PR / GitLab MR]

This keeps your app in control of its own runtime and makes KafuOps easy to adopt without becoming the process manager for production.

For local development and staging, KafuOps can also wrap the backend command:

kafuops run -- npm run dev
kafuops run -- python -m uvicorn app.main:app
kafuops run -- ./gradlew bootRun

That mode captures stdout/stderr, process exits, stack traces, and runtime metadata directly.
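
To make the stack-trace capture concrete, here is a rough sketch of how captured stderr lines could be grouped into traces. It assumes Node-style frames (indented lines starting with `at`); `CapturedTrace` and `extractTraces` are hypothetical names, and the real wrapper may parse output differently.

```typescript
// Hypothetical helper; assumes Node-style stack frames ("    at fn (file:line:col)").
interface CapturedTrace {
  headline: string; // the error line that precedes the frames
  frames: string[]; // trimmed "at ..." lines
}

function extractTraces(lines: string[]): CapturedTrace[] {
  const traces: CapturedTrace[] = [];
  let current: CapturedTrace | null = null;
  let previous = "";
  for (const line of lines) {
    if (/^\s+at\s/.test(line)) {
      // First frame of a new trace: the previous non-frame line is its headline.
      if (current === null) {
        current = { headline: previous.trim(), frames: [] };
        traces.push(current);
      }
      current.frames.push(line.trim());
    } else {
      // Any non-frame line ends the current trace.
      current = null;
      previous = line;
    }
  }
  return traces;
}

const sampleOutput = [
  "server listening on :3000",
  "TypeError: Cannot read properties of undefined (reading 'total')",
  "    at checkout (/app/cart.js:10:5)",
  "    at handler (/app/server.js:22:3)",
  "request finished",
];
const demoTraces = extractTraces(sampleOutput);
console.log(demoTraces.length, demoTraces[0].headline);
```

Other runtimes (Python, JVM) format traces differently, so a real implementation would need one recognizer per stack.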

Quick start

cd your-backend-repo
npx kafuops quickstart      # discover the stack, set up, build memory — one command
kafuops run -- npm start    # wrap your app and watch for incidents

quickstart auto-detects your framework, start command, git remote, and which AI provider is available: a local Codex or a local AI CLI (no API key needed), or an OpenAI/Anthropic key. If you provide a key, it is stored in .kafuops/.env (gitignored, mode 0600) and loaded automatically, so no manual export is needed. Run kafuops doctor at any time to check the setup.

Or for production-style setup:

kafuops init
kafuops agent start --config .kafuops.yml
kafuops worker start --config .kafuops.yml

Run with Docker

Prebuilt multi-arch images are published on each release to GHCR and Docker Hub:

docker pull ghcr.io/kalmuraee/kafuops:latest        # or: kalmuraee/kafuops:latest
docker run --rm -p 7878:7878 \
  -e KAFUOPS_CONFIG=/workspace/.kafuops.yml \
  -v "$PWD/.kafuops.yml:/workspace/.kafuops.yml" \
  -v "$PWD/.kafuops:/workspace/.kafuops" \
  ghcr.io/kalmuraee/kafuops:latest agent start

See docs/DEPLOYMENT_DOCKER.md for the agent + worker compose setup, and deploy/ for Kubernetes manifests and a Helm chart.

Core design principles

Incident-triggered AI — no continuous log streaming to the model.
Small grounded context — send only the files and evidence needed for this incident.
Human-reviewable output — every MR includes evidence, risk, tests, and confidence.
Memory-first debugging — every incident improves the project memory.
Privacy by default — redaction, file allowlists, audit logs, and local-first processing.
No auto-merge by default — KafuOps opens reviewable MRs, not silent production changes.
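
The redaction principle above can be illustrated with a small pattern-based scrubber applied to each log line before it enters an evidence packet. This is a sketch, not the project's redaction engine: `REDACTION_RULES` and `redact` are hypothetical names, and the three patterns are example rules rather than the rules KafuOps actually applies.

```typescript
// Hypothetical rules for illustration; a real rule set would be broader and configurable.
const REDACTION_RULES: Array<{ label: string; pattern: RegExp }> = [
  { label: "email", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { label: "bearer", pattern: /Bearer\s+[A-Za-z0-9._~+\/-]+=*/g },
  { label: "secret", pattern: /\b(password|api[_-]?key|token)=\S+/gi },
];

// Apply every rule in order, replacing matches with a labelled placeholder.
function redact(line: string): string {
  return REDACTION_RULES.reduce(
    (text, rule) => text.replace(rule.pattern, `[REDACTED:${rule.label}]`),
    line,
  );
}

console.log(redact("login failed user=alice@example.com password=hunter2"));
// login failed user=[REDACTED:email] [REDACTED:secret]
```

Labelled placeholders keep the log line readable for the model (it can still see that a credential was present) without exposing the value.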

FAQ

What does KafuOps do?

KafuOps is an open-source AI production-debugging agent. It takes a backend incident or signal, builds grounded and redacted context, runs a 4-stage LLM pipeline, and opens a sandbox-validated, evidence-backed merge request with a confidence score for a human to review.

Is KafuOps open source and self-hostable?

Yes. KafuOps is licensed under AGPL-3.0-only and is fully self-hostable. You run it yourself and connect your own LLM provider: OpenAI, Anthropic, a local Codex, or a local AI CLI.

How is KafuOps different from Sentry Seer or Datadog Bits AI Dev?

Those are closed, SaaS-hosted, and tied to their own telemetry. KafuOps is open-source and self-hostable, is not locked to a single observability vendor, lets you bring your own LLM, and does not stream continuous logs to the model — it grounds context with redaction and a grounding manifest before generating a patch.

Does KafuOps send my production logs to an AI model?

No logs are streamed continuously to the model. KafuOps grounds each incident with redaction and a grounding manifest, then passes only that scoped context to the LLM, so you keep control of what leaves your environment.

Will KafuOps merge code automatically?

No. KafuOps validates each candidate fix in a sandbox with a self-correcting loop, then opens a reviewable pull/merge request with evidence and a confidence score. A human reviews and merges — the agent never deploys to production on its own.

How do I install KafuOps and what does it need?

Install the CLI with npm install -g kafuops. It requires Node.js >= 20 and ships as an ESM/TypeScript package. Then run the setup wizard to validate your LLM key and connect your repo. Full docs are at https://kalmuraee.github.io/KafuOps/.

Status

The 0.1.0 MVP is implemented as a Node.js/TypeScript package in this repository:

src/         agent, incident engine, scanner, graph, context builder, LLM
             orchestrator, sandbox, MR/PR creator, webhooks, policies, CLI
tests/       vitest unit tests
bin/kafuops  CLI entry point
Dockerfile   multi-stage build
examples/    a tiny sample-app to exercise the wrapper-mode incident flow

Quick local check:

npm install
npm run build
npm test
node bin/kafuops.js --help

The MVP follows the success criteria in PRODUCT_BRIEF.md: GitHub + GitLab MR creation, OpenAI-grounded analysis, sanitized context bundles, grounding manifest per call, redaction at ingest and before model calls, regression-test-first sandbox validation, confidence + blast-radius scoring, and audit logging of every model call. The implementation runs in dry-run mode automatically when either OPENAI_API_KEY or KAFUOPS_GIT_TOKEN is absent, so the full pipeline can be exercised offline.
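
The dry-run fallback described above amounts to a check on two environment variables. A minimal sketch, assuming that a missing or empty value for either variable triggers dry-run; `shouldDryRun` and the `Env` type are hypothetical names, and only the two variable names come from this README.

```typescript
// Hypothetical helper; only the two variable names are taken from the README.
type Env = { [name: string]: string | undefined };

// Dry-run unless both the model key and the git token are present and non-empty.
function shouldDryRun(env: Env): boolean {
  const hasModelKey = Boolean(env["OPENAI_API_KEY"]);
  const hasGitToken = Boolean(env["KAFUOPS_GIT_TOKEN"]);
  return !(hasModelKey && hasGitToken);
}

console.log(shouldDryRun({}));                                                  // true
console.log(shouldDryRun({ OPENAI_API_KEY: "sk-x", KAFUOPS_GIT_TOKEN: "t" }));  // false
```

Passing the environment in as a parameter, instead of reading a global, keeps the check easy to unit-test.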

As of 0.2.0 the agent observes a live system (sidecar log tailing + OpenTelemetry OTLP intake), the worker autonomously drives incidents to MRs, and the review-feedback memory loop is closed. The documentation in docs/ and website/ remains the canonical product specification — anything not implemented yet (first-class Kubernetes operator/CRD, embedded SDKs, similar-incident matching) is on the roadmap in docs/ROADMAP.md. See STATUS.md for the honest doc-by-doc mapping.