rlhf-feedback-loop
v0.6.13
Feedback-Driven Development (FDD) for AI agents — capture preference signals, steer behavior via Thompson Sampling, and export KTO/DPO training pairs for downstream fine-tuning.
MCP Memory Gateway
Local-first memory and feedback pipeline for AI agents. Captures thumbs-up/down signals, promotes reusable memories, generates prevention rules from repeated failures, and exports KTO/DPO pairs for fine-tuning.
Works with any MCP-compatible agent: Claude, Codex, Gemini, Amp, Cursor.
What It Does
thumbs up/down → validate → promote to memory → vector index → prevention rules → DPO export
- Capture — capture_feedback MCP tool accepts signals with context
- Validate — Rubric engine gates promotion (vague feedback is rejected with clarification prompts)
- Remember — Promoted memories stored in JSONL + LanceDB vectors
- Prevent — Repeated failures auto-generate prevention rules
- Export — KTO/DPO pairs for downstream fine-tuning
- Bridge — JSONL file watcher auto-ingests signals from external sources (Amp plugins, hooks, scripts)
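The Validate step above can be sketched as a simple rubric gate. This is an illustrative sketch only; the function names, vague-phrase list, and thresholds are assumptions, not the package's actual implementation:

```python
# Illustrative sketch of the Capture → Validate → Remember gate.
# Names and thresholds are assumptions, not this package's API.

VAGUE_PHRASES = {"good", "bad", "nice", "wrong", "ok"}

def validate(signal: str, context: str) -> tuple[bool, str]:
    """Rubric gate: reject feedback too vague to promote to memory."""
    words = context.lower().split()
    if len(words) < 4:
        return False, "Please describe what specifically went right or wrong."
    if all(w in VAGUE_PHRASES for w in words):
        return False, "Vague feedback rejected; add concrete detail."
    return True, "promoted"

def capture_feedback(signal: str, context: str, memories: list[dict]) -> dict:
    # Only validated feedback is promoted to the memory store.
    ok, note = validate(signal, context)
    if ok:
        memories.append({"signal": signal, "context": context})
    return {"promoted": ok, "note": note}
```

The point of the gate is that a bare "bad" carries no learnable signal, so it is bounced back with a clarification prompt instead of polluting the memory store.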
Quick Start
# Add to any MCP-compatible agent
claude mcp add rlhf -- npx -y rlhf-feedback-loop serve
codex mcp add rlhf -- npx -y rlhf-feedback-loop serve
amp mcp add rlhf -- npx -y rlhf-feedback-loop serve
gemini mcp add rlhf "npx -y rlhf-feedback-loop serve"
# Or auto-detect all installed platforms
npx rlhf-feedback-loop init
MCP Tools
| Tool | Description |
|------|-------------|
| capture_feedback | Accept up/down signal + context, validate, promote to memory |
| recall | Vector-search past feedback and prevention rules for current task |
| feedback_stats | Approval rate, per-skill/tag breakdown, trend analysis |
| feedback_summary | Human-readable recent feedback summary |
| prevention_rules | Generate prevention rules from repeated mistakes |
| export_dpo_pairs | Build DPO preference pairs from promoted memories |
| construct_context_pack | Bounded context pack from contextfs |
| evaluate_context_pack | Record context pack outcome (closes learning loop) |
| list_intents | Available action plan templates |
| plan_intent | Generate execution plan with policy checkpoints |
| context_provenance | Audit trail of context decisions |
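A DPO preference pair couples a chosen and a rejected response to the same prompt. A minimal sketch of how export_dpo_pairs might assemble pairs from promoted memories — field names follow common DPO dataset conventions (prompt/chosen/rejected) and are not necessarily this package's exact schema:

```python
# Sketch: build DPO preference pairs from promoted feedback memories.
# Groups memories by prompt, then pairs each positive (chosen) response
# with each negative (rejected) one. Field names are assumptions based
# on common DPO dataset conventions.
from collections import defaultdict

def export_dpo_pairs(memories: list[dict]) -> list[dict]:
    by_prompt = defaultdict(lambda: {"positive": [], "negative": []})
    for m in memories:
        by_prompt[m["prompt"]][m["signal"]].append(m["response"])
    pairs = []
    for prompt, buckets in by_prompt.items():
        for chosen in buckets["positive"]:
            for rejected in buckets["negative"]:
                pairs.append({"prompt": prompt,
                              "chosen": chosen,
                              "rejected": rejected})
    return pairs
```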
CLI
npx rlhf-feedback-loop init # Scaffold .rlhf/ + configure MCP
npx rlhf-feedback-loop serve # Start MCP server (stdio) + watcher
npx rlhf-feedback-loop status # Learning curve dashboard
npx rlhf-feedback-loop watch # Watch .rlhf/ for external signals
npx rlhf-feedback-loop watch --once # Process pending signals and exit
npx rlhf-feedback-loop capture # Capture feedback via CLI
npx rlhf-feedback-loop stats # Analytics + Revenue-at-Risk
npx rlhf-feedback-loop rules # Generate prevention rules
npx rlhf-feedback-loop export-dpo # Export DPO training pairs
npx rlhf-feedback-loop risk # Train/query boosted risk scorer
npx rlhf-feedback-loop self-heal # Run self-healing diagnostics
JSONL File Watcher
The serve command automatically starts a background watcher that monitors feedback-log.jsonl for entries written by external sources (Amp plugins, shell hooks, CI scripts). These entries are routed through the full captureFeedback() pipeline — validation, memory promotion, vector indexing, and DPO eligibility.
# Standalone watcher
npx rlhf-feedback-loop watch --source amp-plugin-bridge
# Process pending entries once and exit
npx rlhf-feedback-loop watch --once
External sources write entries with a source field:
{"signal":"positive","context":"Agent fixed bug on first try","source":"amp-plugin-bridge","tags":["amp-ui-bridge"]}
The watcher tracks its position via .rlhf/.watcher-offset for crash-safe, idempotent processing.
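The offset mechanic can be sketched as follows. The file names match the README (feedback-log.jsonl, a watcher-offset file), but the processing logic is an illustrative assumption, not the package's actual code:

```python
# Sketch of crash-safe, idempotent JSONL watching: persist the byte
# offset of the last processed entry so a restart never re-ingests
# signals that were already handled.
import json
import os

def process_pending(log_path: str, offset_path: str, handle) -> int:
    """Process entries appended since the saved offset; return count."""
    offset = 0
    if os.path.exists(offset_path):
        with open(offset_path) as f:
            offset = int(f.read().strip() or 0)
    count = 0
    # Binary mode so tell()/seek() give exact byte positions.
    with open(log_path, "rb") as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line:
                break
            if line.strip():
                handle(json.loads(line))
                count += 1
        offset = f.tell()
    with open(offset_path, "w") as f:
        f.write(str(offset))  # persist position for crash-safe resume
    return count
```

Because the offset is only advanced after entries are handled, re-running the watcher over an unchanged log is a no-op, which is what makes external writers (plugins, hooks, CI scripts) safe to fan in concurrently.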
Learning Curve Dashboard
npx rlhf-feedback-loop status
╔══════════════════════════════════════╗
║ RLHF Learning Curve Dashboard ║
╠══════════════════════════════════════╣
║ Total signals: 148 ║
║ Positive: 45 (30%) ║
║ Negative: 103 (70%) ║
║ Recent (last 20): 20% ║
║ Trend: 📉 declining ║
║ Memories: 17 ║
║ Prevention rules: 9 ║
╠══════════════════════════════════════╣
║ Top failure domains: ║
║ execution-gap 4 ║
║ asked-not-doing 2 ║
║ speed 2 ║
╠══════════════════════════════════════╣
║ Learning curve (approval % by window)║
║ [1-10] 10% ██ ║
║ [11-20] 20% ████ ║
║ [21-30] 35% ███████ ║
║ [31-40] 30% ██████ ║
╚══════════════════════════════════════╝
Architecture
Five-phase pipeline: Capture → Validate → Remember → Prevent → Export
Agent (Claude/Codex/Amp/Gemini)
│
├── MCP tool call ──→ captureFeedback()
├── REST API ────────→ captureFeedback()
├── CLI ─────────────→ captureFeedback()
└── External write ──→ JSONL ──→ Watcher ──→ captureFeedback()
│
▼
┌─────────────────┐
│ Full Pipeline │
│ • Schema valid │
│ • Rubric gate │
│ • Memory promo │
│ • Vector index │
│ • Risk scoring │
│ • RLAIF audit │
│ • DPO eligible │
└─────────────────┘
Agent Runner Contract
- WORKFLOW.md: scope, proof-of-work, hard stops, and done criteria for isolated agent runs
- .github/ISSUE_TEMPLATE/ready-for-agent.yml: bounded intake template for "Ready for Agent" tickets
- .github/pull_request_template.md: proof-first handoff format for PRs
License
MIT. See LICENSE.
