@contextpilot/openclaw-contextpilot

v0.1.4

Published

2 months ago

IRL-based dialogue and tool interception with PRM judging

haowen_

openclaw plugin contextpilot irl prm reinforcement-learning agent context-engineering

ContextPilot for OpenClaw

An IRL-based dialogue and tool interception plugin.

Overview

ContextPilot is a plugin for OpenClaw that uses Inverse Reinforcement Learning (IRL) to learn from dialogues and tool calls and optimize them automatically. It learns from recorded interactions to improve the AI agent's response quality over time.

Core Features

1. Dialogue Interception & Learning

Records user-AI dialogue interactions and extracts conversation patterns and keywords as data for later optimization.

Patterns: code_block, question, first_person, multiline
Keyword extraction: filters stop words and counts word frequency
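
The sketch below illustrates how the four pattern labels and the keyword counts could be derived from a message. The detection rules, the stop-word list, and the function names `detectPatterns` and `extractKeywords` are assumptions made for illustration; the plugin's own heuristics may differ.

```typescript
// Sketch only: the detection rules and the stop-word list are assumptions.
type DialoguePattern = "code_block" | "question" | "first_person" | "multiline";

// Three backticks, built indirectly so this snippet stays embeddable in markdown.
const CODE_FENCE = "`".repeat(3);
const STOP_WORDS = new Set(["the", "a", "an", "is", "to", "of", "and", "it", "i", "my"]);

function detectPatterns(text: string): DialoguePattern[] {
  const patterns: DialoguePattern[] = [];
  if (text.includes(CODE_FENCE)) patterns.push("code_block");
  if (/[?？]/.test(text)) patterns.push("question");
  if (/\b(i|me|my|we|our)\b/i.test(text)) patterns.push("first_person");
  if (text.includes("\n")) patterns.push("multiline");
  return patterns;
}

// Counts word frequency after dropping stop words, most frequent first.
function extractKeywords(text: string, topN: number = 5): Array<[string, number]> {
  const counts = new Map<string, number>();
  const words = text.toLowerCase().match(/[a-z0-9_]+/g) ?? [];
  for (const word of words) {
    if (STOP_WORDS.has(word)) continue;
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, topN);
}
```

For example, `extractKeywords("deploy the pod and restart the pod")` ranks `pod` first with a count of 2, because `the` and `and` are filtered out as stop words.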

2. PRM Scoring System

Uses a Process Reward Model (PRM) to score dialogue quality along three dimensions:

| Dimension | Weight | Description |
|-----------|--------|-------------|
| Quality | 50% | Code blocks, reasoning words, length, etc. |
| Efficiency | 30% | Efficiency in dialogue turns |
| Safety | 20% | Dangerous command detection |
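
A minimal sketch of how the three weighted dimensions could combine into a single score. It assumes each dimension has already been scored in the range [0, 1]; how the plugin computes the individual dimension scores is not shown here, and `combinePrmScores` is a hypothetical name.

```typescript
// Sketch only: assumes each dimension has already been scored in [0, 1].
interface PrmScores {
  quality: number;
  efficiency: number;
  safety: number;
}

// Weights taken from the table above.
const PRM_WEIGHTS: PrmScores = { quality: 0.5, efficiency: 0.3, safety: 0.2 };

function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

// Weighted sum of the three dimensions; out-of-range inputs are clamped.
function combinePrmScores(scores: PrmScores): number {
  return (
    PRM_WEIGHTS.quality * clamp01(scores.quality) +
    PRM_WEIGHTS.efficiency * clamp01(scores.efficiency) +
    PRM_WEIGHTS.safety * clamp01(scores.safety)
  );
}

const example = combinePrmScores({ quality: 0.8, efficiency: 0.6, safety: 1.0 });
console.log(example.toFixed(2)); // 0.5*0.8 + 0.3*0.6 + 0.2*1.0 = 0.78
```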

3. Tool Call Interception

Records tool calls (bash, write_file, etc.), distinguishes successes from failures, and uses the results to refine parameter suggestions.
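
One way to picture the recorded data is a per-tool log from which a success rate and previously successful parameters can be read back. The `ToolCallLog` class below is hypothetical; only the record field names are borrowed from the `classifier.record()` example in this README.

```typescript
// Sketch only: field names mirror the classifier.record() example in this README;
// the ToolCallLog class itself is hypothetical.
interface ToolCallRecord {
  tool_name: string;
  parameters: Record<string, unknown>;
  success: boolean;
  execution_time: number;
}

class ToolCallLog {
  private records: ToolCallRecord[] = [];

  record(entry: ToolCallRecord): void {
    this.records.push(entry);
  }

  // Fraction of successful calls for a tool, or undefined if it was never called.
  successRate(toolName: string): number | undefined {
    const calls = this.records.filter((r) => r.tool_name === toolName);
    if (calls.length === 0) return undefined;
    return calls.filter((r) => r.success).length / calls.length;
  }

  // Parameters from successful calls, usable as suggestions for later calls.
  successfulParameters(toolName: string): Array<Record<string, unknown>> {
    return this.records
      .filter((r) => r.tool_name === toolName && r.success)
      .map((r) => r.parameters);
  }
}
```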

4. Async IRL Learning

Accumulates experience data in the background and trains the policy model periodically, without blocking the main flow.
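
The non-blocking behavior can be sketched as a buffer that hands a batch to a training function once enough experiences have accumulated, while the caller returns immediately. The `AsyncCollector` class below is a hypothetical stand-in, not the plugin's own class; the threshold name follows the `minExperiencesBeforeTraining` option.

```typescript
// Sketch only: this AsyncCollector is a hypothetical stand-in, not the plugin's class.
type TrainFn<T> = (batch: T[]) => Promise<void>;

class AsyncCollector<T> {
  private buffer: T[] = [];
  private training: Promise<void> | null = null;
  private readonly train: TrainFn<T>;
  private readonly minExperiencesBeforeTraining: number;

  constructor(train: TrainFn<T>, minExperiencesBeforeTraining: number = 5) {
    this.train = train;
    this.minExperiencesBeforeTraining = minExperiencesBeforeTraining;
  }

  // Returns immediately; any training it starts runs in the background.
  add(experience: T): void {
    this.buffer.push(experience);
    const ready = this.buffer.length >= this.minExperiencesBeforeTraining;
    if (!ready || this.training !== null) return;

    const batch = this.buffer;
    this.buffer = [];
    this.training = this.train(batch)
      .then(undefined, () => {
        // Training failed; a real implementation would log and possibly retry.
      })
      .then(() => {
        this.training = null;
      });
  }

  // Number of experiences waiting for the next training run.
  pending(): number {
    return this.buffer.length;
  }

  // Resolves once any in-flight training run has finished.
  async idle(): Promise<void> {
    if (this.training !== null) await this.training;
  }
}
```

In this sketch, experiences that arrive while a training run is in flight stay in the buffer and are picked up by the next `add` call after the run completes.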

5. Context Enhancement Engine

Dynamically enhances input prompts based on the learned policy. Three strategies are supported:

| Strategy | Description |
|----------|-------------|
| suggest | Suggest mode: offers reference options |
| prune | Prune mode: filters out low-quality options |
| prioritize | Prioritize mode: ranks by quality (default) |
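
The three strategies can be read as three ways of selecting from a list of scored candidates. The sketch below is an interpretation, not the engine's actual code: the `Candidate` shape, the `applyStrategy` function, and the way `threshold` and `topK` interact with each strategy are all assumptions.

```typescript
// Sketch only: how threshold and topK interact with each strategy is an assumption.
type CeStrategy = "suggest" | "prune" | "prioritize";

interface Candidate {
  text: string;
  score: number; // learned quality score, assumed to lie in [0, 1]
}

function applyStrategy(
  candidates: Candidate[],
  strategy: CeStrategy = "prioritize",
  threshold: number = 0.3,
  topK: number = 3
): Candidate[] {
  if (strategy === "suggest") {
    // Offer candidates as reference options, in their original order.
    return candidates.slice(0, topK);
  }
  if (strategy === "prune") {
    // Drop candidates whose score falls below the threshold.
    return candidates.filter((c) => c.score >= threshold).slice(0, topK);
  }
  // "prioritize": rank by score, best first, without mutating the input.
  return candidates
    .slice()
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```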

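6. RLVR Enhancement

Reinforcement Learning from Virtual Rewards (RLVR) computes rewards from system state:

| Module | Function |
|--------|----------|
| Environment | Automatically detects K8s, Docker, or physical-machine environments |
| Classifier | Distinguishes parameter issues from algorithm issues |
| K8sCollector | Collects Pod resource metrics |
| Anomaly | Threshold, trend, and statistical anomaly detection |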
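
To make the three anomaly categories concrete, here is a small sketch that applies a threshold rule, a trend rule, and a statistical (z-score) rule to a series of metric samples such as Pod CPU percentages. The rules, their default parameters, and the `detectAnomalies` function are assumptions for illustration and are not taken from the Anomaly module.

```typescript
// Sketch only: the three rules and their default parameters are assumptions.
type AnomalyKind = "threshold" | "trend" | "statistical";

function detectAnomalies(
  samples: number[],
  limit: number,
  trendWindow: number = 3,
  zLimit: number = 3
): AnomalyKind[] {
  const found: AnomalyKind[] = [];
  const latest = samples[samples.length - 1];
  if (latest === undefined) return found;

  // Threshold: the latest sample exceeds a fixed limit.
  if (latest > limit) found.push("threshold");

  // Trend: the last `trendWindow` samples are strictly increasing.
  const tail = samples.slice(-trendWindow);
  const rising = tail.every((v, i) => i === 0 || v > (tail[i - 1] ?? v));
  if (trendWindow > 1 && tail.length === trendWindow && rising) found.push("trend");

  // Statistical: the latest sample lies more than `zLimit` standard deviations
  // from the mean of the earlier samples.
  const history = samples.slice(0, -1);
  if (history.length >= 2) {
    const mean = history.reduce((sum, v) => sum + v, 0) / history.length;
    const variance =
      history.reduce((sum, v) => sum + (v - mean) ** 2, 0) / history.length;
    const std = Math.sqrt(variance);
    if (std > 0 && Math.abs(latest - mean) / std > zLimit) found.push("statistical");
  }
  return found;
}
```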

Installation

# Using ClawHub
openclaw plugins install clawhub:@contextpilot/openclaw-contextpilot

# Or using npm
openclaw plugins install @contextpilot/openclaw-contextpilot

Quick Start

As an OpenClaw Plugin

import { DTMDPCEPlugin, createPlugin } from '@contextpilot/openclaw-contextpilot';

// Create the plugin instance
const plugin = createPlugin({
  enableDialogueInterception: true,
  enableToolInterception: true,
  enablePRMJudge: true,
  enablePromptEnhancement: true,
  enableAsyncLearning: true,
  learningConfig: {
    mode: 'prm',
    minExperiencesBeforeTraining: 5,
    autoTrain: true
  }
});

await plugin.initialize();

Standalone RLVR Module

import { detectEnvironment, getProblemClassifier, getParamOptimizer } from '@contextpilot/openclaw-contextpilot/rlvr';

// Detect the environment
const env = detectEnvironment(); // "k8s" | "docker" | "physical"

// Classify the problem
const classifier = getProblemClassifier();
classifier.record({
  tool_name: 'kubectl',
  parameters: { replicas: 1 },
  success: false,
  execution_time: 1.0
});
const result = classifier.classify('kubectl');
// result.problem_type: "parameter_issue" | "algorithm_issue"

// Optimize parameters
const optimizer = getParamOptimizer();
const prompt = optimizer.generatePrompt('kubectl');

Configuration

Plugin Configuration

{
  "plugins": {
    "entries": {
      "openclaw-contextpilot": {
        "enabled": true,
        "config": {
          "guideOutputDir": "./contextpilot-guides",
          "maxGuideEntries": 100,
          "learning": {
            "enabled": true,
            "mode": "prm",
            "minExperiencesBeforeTraining": 5,
            "autoTrain": true
          },
          "ce": {
            "enabled": true,
            "strategy": "prioritize",
            "threshold": 0.3,
            "topK": 3
          }
        }
      }
    }
  }
}

Configuration Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| guideOutputDir | string | ./contextpilot-guides | Output directory for learning guides |
| maxGuideEntries | number | 100 | Maximum number of guide entries |
| learning.mode | string | prm | Learning mode: prm, rlvr, hybrid |
| learning.minExperiencesBeforeTraining | number | 5 | Minimum number of experiences before training |
| learning.autoTrain | boolean | true | Train automatically |
| ce.strategy | string | prioritize | CE strategy: suggest, prune, prioritize |
| ce.threshold | number | 0.3 | Filtering threshold |
| ce.topK | number | 3 | Return the top K results |
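
The options and defaults in the table can be captured as a typed configuration with a merge helper, sketched below. The interface names and the `resolveConfig` function are hypothetical, the `topK` check is an assumed validation rule, and the `enabled` flags from the JSON example are left out because the table lists no defaults for them.

```typescript
// Sketch only: the types and resolveConfig helper are hypothetical; the default
// values come from the options table.
interface LearningConfig {
  mode: "prm" | "rlvr" | "hybrid";
  minExperiencesBeforeTraining: number;
  autoTrain: boolean;
}

interface CeConfig {
  strategy: "suggest" | "prune" | "prioritize";
  threshold: number;
  topK: number;
}

interface ContextPilotConfig {
  guideOutputDir: string;
  maxGuideEntries: number;
  learning: LearningConfig;
  ce: CeConfig;
}

interface ContextPilotConfigInput {
  guideOutputDir?: string;
  maxGuideEntries?: number;
  learning?: Partial<LearningConfig>;
  ce?: Partial<CeConfig>;
}

const DEFAULT_CONFIG: ContextPilotConfig = {
  guideOutputDir: "./contextpilot-guides",
  maxGuideEntries: 100,
  learning: { mode: "prm", minExperiencesBeforeTraining: 5, autoTrain: true },
  ce: { strategy: "prioritize", threshold: 0.3, topK: 3 },
};

// Fills in any option the user left out with its documented default.
function resolveConfig(input: ContextPilotConfigInput = {}): ContextPilotConfig {
  const resolved: ContextPilotConfig = {
    guideOutputDir: input.guideOutputDir ?? DEFAULT_CONFIG.guideOutputDir,
    maxGuideEntries: input.maxGuideEntries ?? DEFAULT_CONFIG.maxGuideEntries,
    learning: { ...DEFAULT_CONFIG.learning, ...input.learning },
    ce: { ...DEFAULT_CONFIG.ce, ...input.ce },
  };
  // Assumed validation rule: topK must be a positive integer.
  if (!Number.isInteger(resolved.ce.topK) || resolved.ce.topK < 1) {
    throw new RangeError("ce.topK must be a positive integer");
  }
  return resolved;
}
```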

API Reference

Hook Methods

The plugin implements the following OpenClaw hooks:

| Hook | Trigger | Function |
|------|---------|----------|
| before_prompt_build | Before the prompt is built | Enhances the prompt |
| agent_end | After the agent responds | Records dialogue experience |
| after_tool_call | After a tool call | Records tool experience |
| session_end | When the session ends | Generates the learning guide |
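
The following sketch shows how the four hooks relate to one another over a short session. `HookRegistry` is a hypothetical stand-in for the host's hook mechanism and is not the OpenClaw SDK API; only the hook names come from the table above, and the order in which the session fires them is an assumption.

```typescript
// Sketch only: HookRegistry is a hypothetical stand-in for the host's hook
// mechanism, not the OpenClaw SDK API. Hook names come from the table above.
type HookName = "before_prompt_build" | "agent_end" | "after_tool_call" | "session_end";
type HookHandler = (payload: Record<string, unknown>) => void;

class HookRegistry {
  private handlers = new Map<HookName, HookHandler[]>();

  on(name: HookName, handler: HookHandler): void {
    const list = this.handlers.get(name) ?? [];
    list.push(handler);
    this.handlers.set(name, list);
  }

  emit(name: HookName, payload: Record<string, unknown> = {}): void {
    for (const handler of this.handlers.get(name) ?? []) handler(payload);
  }
}

const events: string[] = [];
const registry = new HookRegistry();
registry.on("before_prompt_build", () => { events.push("enhance prompt"); });
registry.on("agent_end", () => { events.push("record dialogue experience"); });
registry.on("after_tool_call", (p) => {
  events.push(`record tool experience: ${String(p["tool_name"])}`);
});
registry.on("session_end", () => { events.push("generate learning guide"); });

// Simulate one short session (the firing order here is assumed).
registry.emit("before_prompt_build");
registry.emit("after_tool_call", { tool_name: "bash" });
registry.emit("agent_end");
registry.emit("session_end");
```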

Lifecycle Methods

// Initialize
await plugin.initialize();

// Get statistics
const stats = plugin.getStats();
// {
//   experienceStore: { totalExperiences, dialogueExperiences, toolExperiences, ... },
//   learning: { experiencesProcessed, totalTrained, ... },
//   sessions: 5
// }

// Manually trigger learning
await plugin.triggerLearning();

// Shut down
await plugin.shutdown();

Project Structure

extension/
├── src/
│   ├── index.ts              # Main entry
│   ├── types.ts              # Type definitions
│   ├── plugin-entry.ts       # OpenClaw plugin entry
│   ├── hooks-register.ts     # Hook registration
│   │
│   ├── intercept/            # Interceptors
│   │   ├── experience-store.ts      # Experience store
│   │   ├── prompt-interceptor.ts    # Prompt interception and enhancement
│   │   └── dialogue-interceptor.ts  # Dialogue interception
│   │
│   ├── learn/                # Learning
│   │   ├── prm-judge.ts             # PRM scoring
│   │   └── async/
│   │       ├── index.ts
│   │       ├── learning-manager.ts
│   │       └── async-collector.ts
│   │
│   ├── rlvr/                 # RLVR module
│   │   ├── environment.ts           # Environment detection
│   │   ├── classifier.ts            # Problem classification
│   │   ├── k8s-collector.ts         # K8s metrics collection
│   │   ├── param-optimizer.ts       # Parameter optimization
│   │   └── anomaly.ts               # Anomaly detection
│   │
│   ├── ce/                   # Context Enhancement
│   │   └── ce-engine.ts
│   │
│   ├── generate/             # Prompt generation
│   │   └── prompt-generator.ts
│   │
│   ├── config/               # Config management
│   │   └── config-manager.ts
│   │
│   ├── integration/          # Integration
│   │   └── openclaw-integration.ts
│   │
│   └── cli/                  # CLI
│       └── index.ts
│
├── dist/                     # Build output
├── package.json
├── openclaw.plugin.json
└── tsconfig.json

Learning Modes

| Mode | Description | Use Case |
|------|-------------|----------|
| prm | Process Reward Model | Dialogue quality evaluation |
| rlvr | Reinforcement learning from virtual rewards | System-state awareness |
| hybrid | Hybrid mode | Combined optimization |

Development

# Install dependencies
npm install

# Build
npm run build

# Test
npm test

# Type check
npm run typecheck

FAQ

Q: The plugin has no effect after installation. What should I check?
A: Make sure your OpenClaw version is >= 2026.3.24. The plugin requires OpenClaw SDK support.

License

MIT