@keo-ai/axiom

v0.2.2

Published

6 days ago

An LLM-based prediction and inference library with support for switching between multiple providers


windowlu

⚡ AXIOM

The execution foundation for LLM agents.

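Axiom is not an orchestration framework. It is the "chassis" on which large models call tools.

What is Axiom

Axiom is keo's LLM foundation library. It stays out of business orchestration (intent routing, node flows, customer-service scripts) and solves exactly one problem:

When the model says "I want to call a tool", make sure the call really happens, the result really comes back, and the model keeps reasoning from facts.

The upper layer only writes business nodes (Conductor / intent layer / multimodal preprocessing); Axiom keeps the LLM → tool → result → LLM loop running reliably.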

┌────────────────────────────────────────────────────────┐
│             Your project layer (business)              │
│          Intent routing · Node orchestration           │
│            Support scripts · Business tools            │
├────────────────────────────────────────────────────────┤
│                        ⚡ AXIOM                         │
│   ┌───────────┐  ┌───────────────┐  ┌─────────────┐    │
│   │ LLM calls │  │ Function      │  │ RAG basics  │    │
│   │ Failover  │  │ Call          │  │ Embedding   │    │
│   │ Streaming │  │ Loop          │  │ pgvector    │    │
│   └───────────┘  │ State machine │  │ rerank      │    │
│                  │ Side-effect   │  └─────────────┘    │
│                  │ propagation   │                     │
│                  └───────────────┘                     │
└────────────────────────────────────────────────────────┘

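Architecture boundaries

What Axiom does:

✅ Unified LLM invocation + provider failover
✅ Driving, recording, validating, and propagating side effects for the Function Call Loop
✅ RAG basics (Embedding / pgvector vector search / rerank)
✅ Configuration through environment variables

What Axiom does not do:

❌ Intent recognition and routing
❌ Business node orchestration (Conductor / DAG)
❌ Multimodal preprocessing (speech/image to text)
❌ Message queues and load smoothing
❌ The implementation logic of specific business tools
❌ Provider ranking strategy (cost / capability / speed)
❌ Memory systems (Faiss / semantic memory / slots); these are implemented by the upper business layer

The upper layer writes the business logic; Axiom runs the chassis.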

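Installation

npm install @keo-ai/axiom pg

pg is a peer dependency, used for the pgvector connection in the EmbeddingSearch module.

Create a .env file in your project root:

BAILIAN_API_KEY=your-api-key

Then import dotenv at the top of your project's entry file to load the environment variables:

import 'dotenv/config';

// Or specify a path
import { config } from 'dotenv';
config({ path: '.env.local' });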

Predict module (LLM invocation)

Predict is Axiom's LLM invocation layer. It wraps provider connections, request assembly, failover, and response parsing, so the upper layer only has to decide which model to use and which messages to send.

Quick start

Set the environment variable (Bailian):

export BAILIAN_API_KEY="your-api-key"

A one-line call:

import { LLM } from '@keo-ai/axiom';

const { content, usage } = await LLM.predict({ model: 'qwen-max', prompt: 'Hello' });
console.log(content);        // model reply
console.log(usage);          // { promptTokens, completionTokens, totalTokens, cachedPromptTokens? }

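API overview

| Method | Purpose | Return type |
|---|---|---|
| LLM.predict(config) | Quick single-turn call; the user message is built automatically | { content, usage?: TokenUsage }; the type of content depends on responseFormat (see below) |
| LLM.predictWithMessages(messages, config) | Takes a full message list; suited to multi-turn conversations | Same as above |
| LLM.streamPredict(config) | Streaming call that returns content chunk by chunk | AsyncGenerator<StreamChunk> |
| LLM.streamPredictWithMessages(messages, config) | Streaming with a custom message list | AsyncGenerator<StreamChunk> |

When you need a custom provider list or failover strategy, use the Predictor class directly: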

import { Predictor, BailianProvider } from '@keo-ai/axiom';

const predictor = new Predictor({
  providers: [
    new BailianProvider({ name: 'bailian', apiKey, baseUrl, defaultModel: 'qwen-max' }),
  ],
});
const response = await predictor.generateForModel('qwen-max', { messages: [...] });

responseFormat and return types

predict and predictWithMessages return { content, usage?: TokenUsage }, where the type of content is determined by responseFormat:

| responseFormat | content type | Description |
|---|---|---|
| unset / 'text' | string | Returns the model's text output directly |
| 'json' | any | Parsed automatically with JSON.parse. Note: glm-5.1 does not support this mode |

// text → { content: string, usage?: TokenUsage }
const { content, usage } = await LLM.predict({ model: 'qwen-max', prompt: 'Tell me a story', responseFormat: 'text' });

// json → { content: any, usage?: TokenUsage }
// Renamed on destructuring so both calls can live in the same scope
const { content: obj, usage: jsonUsage } = await LLM.predict({
  model: 'qwen-max',
  prompt: 'Generate a JSON object',
  responseFormat: 'json',
});
console.log(obj.name);
console.log(jsonUsage);  // { promptTokens, completionTokens, totalTokens, cachedPromptTokens? }

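Provider configuration

Setting environment variables is all that is needed; no code changes are required.

BAILIAN_API_KEY           # required
BAILIAN_BASE_URL          # optional, defaults to https://dashscope.aliyuncs.com/compatible-mode/v1
BAILIAN_DEFAULT_MODEL     # optional, defaults to qwen-max

Failover

Internally, Predictor looks up the model in MODEL_REGISTRY to obtain a queue of candidate providers, then double-checks each one with provider.supports(model). When a provider fails or does not support the model, the next one is tried automatically, until one succeeds or all have failed.

When all of them fail, a summary error is thrown: All providers failed: bailian: HTTP 500...; bailian: timeout...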
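
The failover behavior described above can be sketched in a few lines. This is only an illustration of the idea, not Axiom's implementation: SketchProvider and generateWithFailover are hypothetical names, and the real Provider interface and MODEL_REGISTRY lookup may look different.

```typescript
// Hypothetical shape for illustration; the real Provider interface may differ.
interface SketchProvider {
  name: string;
  supports(model: string): boolean;
  generate(model: string, prompt: string): Promise<string>;
}

// Try each candidate in order, collect every failure,
// and throw one summary error if none of them succeeds.
async function generateWithFailover(
  providers: SketchProvider[],
  model: string,
  prompt: string,
): Promise<string> {
  const failures: string[] = [];
  for (const provider of providers) {
    if (!provider.supports(model)) {
      failures.push(`${provider.name}: model ${model} not supported`);
      continue;
    }
    try {
      return await provider.generate(model, prompt);
    } catch (e) {
      const reason = e instanceof Error ? e.message : String(e);
      failures.push(`${provider.name}: ${reason}`);
    }
  }
  throw new Error(`All providers failed: ${failures.join('; ')}`);
}
```

Collecting the per-provider reasons into one message is what lets a single try/catch at the call site show why every candidate was rejected.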

Streaming calls

The streaming variants return AsyncGenerator<StreamChunk>; iterate to consume it chunk by chunk:

for await (const chunk of LLM.streamPredict({ model: 'qwen-max', prompt: 'Tell me a story' })) {
  if (chunk.type === 'content') {
    process.stdout.write(chunk.delta);
  }
  if (chunk.type === 'reasoning') {
    process.stdout.write(chunk.delta);  // reasoning process
  }
  if (chunk.type === 'finish' && chunk.usage) {
    console.log('Token usage:', chunk.usage);
    // { promptTokens, completionTokens, totalTokens, cachedPromptTokens? }
  }
}

💡 Streaming calls automatically enable stream_options: { include_usage: true }, so the finish event carries the full token usage statistics. usage may arrive in the same SSE chunk as finish_reason or in a separate chunk (choices: []); both cases are handled.

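Model list

Currently supported models (enumerated by the Model type):

| Model | Reasoning effort | JSON mode | Notes |
|------|----------|-----------|------|
| qwen-max | ❌ Not supported | ✅ | Default model |
| qwen3.7-max | ✅ Supported via mapping | ✅ | low/medium/high are mapped to enable_thinking |
| qwen-plus | ✅ Supported via mapping | ✅ | Same as above |
| qwen-turbo | ✅ Supported via mapping | ✅ | Same as above |
| deepseek-v4-pro | ✅ Native support | ✅ | low/medium/high are passed through directly |
| deepseek-v4-flash | ✅ Native support | ✅ | Same as above |
| kimi-k2.6 | ✅ Native support | ✅ | Same as above |
| qwq-plus | ❌ Not supported | ✅ | Fixed reasoning behavior, not adjustable |
| glm-5.1 | ❌ Not supported | ❌ | No reasoning parameters, no JSON mode |
| qwen-vl-plus | ❌ Not supported | ✅ | Vision model, no reasoning parameters |

How reasoning effort is supported is maintained inside each provider. In the Bailian provider, the Qwen series is handled by mapping to extra_body.enable_thinking (low turns it off, medium/high turn it on), while DeepSeek / Kimi pass reasoning_effort through natively.
All models are currently routed to the Bailian provider. When other vendors are added later, extending the mapping in MODEL_REGISTRY is enough.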
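
As a rough illustration of the two strategies in the table, the mapping could look like the sketch below. The function name reasoningParams and the family labels are assumptions made for this example; only the field names extra_body.enable_thinking and reasoning_effort come from the description above.

```typescript
type ReasoningEffort = 'low' | 'medium' | 'high';

// Hypothetical helper: builds the extra request fields for one model family.
// 'qwen'   -> mapped to enable_thinking
// 'native' -> reasoning_effort passed through (DeepSeek / Kimi in the table)
// 'none'   -> model takes no reasoning parameters
function reasoningParams(
  family: 'qwen' | 'native' | 'none',
  effort: ReasoningEffort | undefined,
): Record<string, unknown> {
  if (effort === undefined || family === 'none') return {};
  if (family === 'qwen') {
    // low turns thinking off; medium and high turn it on
    return { extra_body: { enable_thinking: effort !== 'low' } };
  }
  // Native support: forward the value unchanged
  return { reasoning_effort: effort };
}
```

Keeping this mapping inside the provider means callers always pass the same three values, whichever model they target.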

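Error handling

The Predict module follows Axiom's unified error strategy: throw exceptions directly and let the caller use try/catch. It does not wrap results in Result<T, E> and does not define custom error classes.

try {
  const { content, usage } = await LLM.predict({ model: 'qwen-max', prompt: 'hi' });
} catch (e) {
  // e.message contains the aggregated provider error information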
}

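Function Call Loop module

Function Call Loop is Axiom's low-level function-calling engine. It keeps the LLM → Tool → Result → LLM loop running reliably while exposing full lifecycle events and execution history (the Harness) to the upper layer.

Core design

| Design point | Description |
|---|---|
| Harness and Messages are separate | The Harness is an append-only, complete execution archive; Messages are the conversation history shown to the LLM and may be compressed |
| Prompts are fully externalized | The engine ships no built-in system prompts, warning text, or termination text |
| Safe by default | By default nothing is compressed, no tool is hidden, and no execution is intercepted |
| Plan and Execute | A single LLM turn may return multiple tool calls; these tools run in parallel within that turn |
| Hard backstop alongside soft policy | Turn Policy is the soft policy applied on every turn; maxTurns is the engine-level hard backstop |

Quick start

After setting the environment variable, a single call is enough: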

export BAILIAN_API_KEY="your-api-key"

import { FunctionCallLoop } from '@keo-ai/axiom';

const result = await FunctionCallLoop.runLoop({
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is the weather in Beijing?' },
  ],
  model: 'qwen-max',
  temperature: 0.7,
  maxTokens: 2048,
  tools: [
    {
      name: 'get_weather',
      description: 'Get weather for a city',
      parameters: {
        type: 'object',
        properties: {
          city: { type: 'string', description: 'City name' },
        },
        required: ['city'],
      },
      execute: async (args, context) => {
        return { temperature: 25, condition: 'Sunny' };
      },
    },
  ],
  maxTurns: 5,
  onEvent: (event) => {
    if (event.type === 'execution:end') {
      console.log(`${event.toolName}: ${event.status} (${event.durationMs}ms)`);
    }
  },
});

console.log(result.finalContent);
console.log('Turns:', result.turns);
console.log('Harness:', result.harness);
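console.log('Token usage:', result.totalUsage);        // cumulative token usage
console.log('Usage history:', result.usageHistory);    // per-turn breakdown

💡 Need streaming output (showing LLM-generated content in real time)? Use runLoopStream, described below.

Streaming calls (runLoopStream)

Use runLoopStream when you need to show the LLM's thinking process or output in real time: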

import { FunctionCallLoop } from '@keo-ai/axiom';

const stream = FunctionCallLoop.runLoopStream({
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is the weather in Beijing?' },
  ],
  tools: [
    {
      name: 'get_weather',
      description: 'Get weather for a city',
      parameters: { /* ... */ },
      execute: async (args) => {
        return { temperature: 25, condition: 'Sunny' };
      },
    },
  ],
  maxTurns: 5,
});

// Consume stream events chunk by chunk
for await (const chunk of stream) {
  switch (chunk.type) {
    case 'turn_start':
      console.log(`\n--- Turn ${chunk.turn} ---`);
      break;
    case 'reasoning':
      // Stream the model's reasoning content (thinking process) in real time
      process.stdout.write(chunk.delta);
      break;
    case 'content':
      // Stream the content (the final reply) in real time
      process.stdout.write(chunk.delta);
      break;
    case 'tool_call':
      console.log('\n[Tool Call]', chunk.toolCalls.map(t => t.function.name));
      break;
    case 'tool_result':
      console.log(`[Result] ${chunk.toolName}: ${chunk.content} (${chunk.status})`);
      break;
    case 'turn_end':
      console.log(`\n--- Turn ${chunk.turn} End ---`);
      if (chunk.usage) {
        console.log(`Token usage: prompt=${chunk.usage.promptTokens}, completion=${chunk.usage.completionTokens}`);
      }
      break;
  }
}

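Getting the final return value: for await does not expose a generator's return value, so to obtain the LoopResult after the stream ends you have to drive the iterator manually: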

const stream = FunctionCallLoop.runLoopStream(config);

const chunks: FunctionCallLoop.LoopStreamChunk[] = [];
let result: FunctionCallLoop.LoopResult | undefined;

while (true) {
  const { value, done } = await stream.next();
  if (done) {
    result = value as FunctionCallLoop.LoopResult;
    break;
  }
  chunks.push(value as FunctionCallLoop.LoopStreamChunk);
  // consume in real time...
}

console.log('Final:', result!.finalContent);
console.log('Turns:', result!.turns);
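
The manual loop can be wrapped in a small reusable helper. The sketch below is independent of Axiom: drain is a hypothetical name, and fakeStream is a stand-in generator used only to make the example runnable without calling runLoopStream.

```typescript
// Drive an async generator by hand: yielded values are chunks,
// and the value delivered with done === true is the generator's return value.
async function drain<TChunk, TResult>(
  stream: AsyncGenerator<TChunk, TResult>,
  onChunk: (chunk: TChunk) => void,
): Promise<TResult> {
  while (true) {
    const step = await stream.next();
    if (step.done) return step.value; // narrowed to TResult
    onChunk(step.value);              // narrowed to TChunk
  }
}

// Stand-in generator for demonstration purposes only.
async function* fakeStream(): AsyncGenerator<string, { turns: number }> {
  yield 'Hello';
  yield ' world';
  return { turns: 1 };
}
```

Checking step.done narrows the type of step.value, so the casts used in the manual loop are not needed inside the helper.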

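Stream event types

| Event | Fired when | Fields |
|---|---|---|
| turn_start | A new turn begins | turn |
| reasoning | The LLM emits a reasoning content delta | delta, turn |
| content | The LLM emits a content (final reply) delta | delta, turn |
| tool_call | The LLM decides to call tools (the stream has ended and the complete tool_calls have been parsed) | toolCalls, turn |
| tool_result | A tool finishes executing | callId, toolName, content, status, turn |
| turn_end | A turn ends (all tools have run, or content was returned directly) | turn, usage? |

💡 The usage field of turn_end carries the token usage of that turn's LLM call (promptTokens, completionTokens, totalTokens, cachedPromptTokens?). Streaming requests automatically enable stream_options: { include_usage: true } so that the API returns usage data.

Choosing between streaming and non-streaming

| Scenario | Recommended |
|---|---|
| Showing LLM output in real time (typewriter effect) | runLoopStream |
| Showing the model's reasoning in real time | runLoopStream (surfaced separately through reasoning events) |
| Silent background execution where only the final result matters | runLoop |
| Low-latency, simple scenarios | runLoop |

Note: runLoopStream requires the LLMCaller to support the stream method. The default createLLMCaller supports it automatically. If you inject a custom llmCaller, make sure it implements the stream interface.

Lifecycle events

For every tool call it processes, the Loop emits two events in order: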

const result = await FunctionCallLoop.runLoop({
  // ...
  onEvent: (event) => {
    if (event.type === 'execution:start') {
      console.log(`[${event.turn}] ${event.toolName} started`);
    }
    if (event.type === 'execution:end') {
      console.log(`[${event.turn}] ${event.toolName} done in ${event.durationMs}ms`);
      console.log(`[${event.turn}] ${event.toolName} result:`, event.result ?? event.error);
    }
  },
});

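| Event | Fired when | Fields |
|---|---|---|
| execution:start | Before the execute function is called | callId, toolName, args, turn |
| execution:end | After the execute function returns | callId, toolName, status, durationMs, result / error, turn |

status is of type HarnessRecordStatus and takes the following values:

| Value | Meaning |
|---|---|
| 'success' | execute returned normally |
| 'error' | execute threw an exception |
| 'timeout' | Execution timed out |
| 'cancelled' | Cancelled by the AbortSignal |

Harness (execution history)

The Harness is a read-only execution record accumulated across turns, so tools can see one another's earlier calls: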

console.log(result.harness);
// [
//   { id: 'call-1', turn: 1, toolName: 'get_weather', status: 'success', ... },
//   { id: 'call-2', turn: 2, toolName: 'get_time', status: 'success', ... },
// ]

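Each record contains: a unique ID, the turn it belongs to, a global sequence number, the tool name, arguments, execution status, return value or error, timestamp, and duration.

Parallel isolation: when several tools run in parallel within one turn, each receives a Harness snapshot taken before the parallel batch started, so none of them can see the other tools still running in the same turn.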
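
The parallel isolation rule can be pictured with the following minimal sketch. It is an assumption about how such a snapshot might be taken, not Axiom's code: SketchRecord, SketchTool, and runTurn are hypothetical, and the record is reduced to a few fields.

```typescript
interface SketchRecord {
  id: string;
  turn: number;
  toolName: string;
  status: 'success' | 'error';
}

// A tool only receives a read-only view of the history.
type SketchTool = (harnessSnapshot: readonly SketchRecord[]) => Promise<void>;

async function runTurn(
  harness: SketchRecord[],
  turn: number,
  calls: { toolName: string; run: SketchTool }[],
): Promise<void> {
  // Freeze a copy before any tool starts: every parallel tool sees the same past.
  const snapshot = Object.freeze([...harness]);
  const records = await Promise.all(
    calls.map(async (call, i): Promise<SketchRecord> => {
      const id = `call-${harness.length + i + 1}`;
      try {
        await call.run(snapshot);
        return { id, turn, toolName: call.toolName, status: 'success' };
      } catch {
        // One failing tool does not affect its siblings.
        return { id, turn, toolName: call.toolName, status: 'error' };
      }
    }),
  );
  // Records are appended only after the whole batch has finished.
  harness.push(...records);
}
```

Because the snapshot is taken once per batch, the order in which parallel tools happen to finish cannot change what any of them observes.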

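Splitting tool results (ToolResult)

When a tool returns a large amount of structured data, passing it to the LLM via JSON.stringify tends to make the model repeat it back. In that case, ToolResult lets you split the result in two: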

const recommendTool = {
  name: 'recommend_conferences',
  description: 'Recommend academic conferences',
  parameters: { /* ... */ },
  execute: async (args) => {
    const raw = await recommendConferences(args);
    return {
      forLLM: `Found ${raw.results.length} relevant conferences: ${raw.results.map(r => r.name).join(', ')}`,
      forFrontend: raw,
    };
  },
};

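| Field | Destination | Description |
|---|---|---|
| forLLM | messages[] → LLM context | Concise summary; natural language is recommended |
| forFrontend | tool_result SSE event | Complete raw data for frontend rendering |

When consuming the stream, the frontend can get the complete data:

for await (const chunk of stream) {
  if (chunk.type === 'tool_result' && chunk.frontendData) {
    renderCard(chunk.frontendData);   // the frontend receives the full JSON and renders a card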
  }
}

Backward compatibility: when a tool returns a plain value (not a ToolResult), behavior is unchanged; the value is still passed to the LLM after JSON.stringify.
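
One way to picture the split, including the backward-compatible path, is the sketch below. The type guard and the splitResult function are hypothetical and only mirror the behavior described here; the library's internal detection logic is not shown in this README.

```typescript
interface SketchToolResult {
  forLLM: string;
  forFrontend: unknown;
}

// Hypothetical type guard: an object with a string forLLM and a forFrontend key.
function isToolResult(value: unknown): value is SketchToolResult {
  return (
    typeof value === 'object' &&
    value !== null &&
    typeof (value as { forLLM?: unknown }).forLLM === 'string' &&
    'forFrontend' in value
  );
}

// Route a tool's return value: a summary for the LLM, raw data for the frontend.
function splitResult(value: unknown): { llmContent: string; frontendData?: unknown } {
  if (isToolResult(value)) {
    return { llmContent: value.forLLM, frontendData: value.forFrontend };
  }
  // Plain value: serialized as before, with nothing extra for the frontend.
  return { llmContent: JSON.stringify(value) };
}
```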

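Tool approval interception

For sensitive operations (money transfers, data deletion, and so on), you can configure approval on a tool; the Loop then pauses before execution and returns pendingApproval: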

const result = await FunctionCallLoop.runLoop({
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Transfer 1000 to Alice' },
  ],
  metadata: { userId: 'u-123', orgId: 'o-456' },
  tools: [
    {
      name: 'transfer_money',
      description: 'Transfer money',
      parameters: { /* ... */ },
      approval: {
        createTicket: async (args, context) => {
          const { userId } = context.metadata as { userId: string };
          const ticket = await createApprovalTicket({ ...args, applicant: userId });
          return { ticketId: ticket.id };
        },
      },
      execute: async (args, context) => {
        // Reached only after approval has been granted
        return await doTransfer(args);
      },
    },
  ],
});

if (result.pendingApproval) {
  // Save a checkpoint and wait for the webhook callback or manual approval
  await db.saveCheckpoint({
    messages: result.messages,
    ticketId: result.pendingApproval.ticketId,
    toolName: result.pendingApproval.toolName,
  });
  return { status: 'waiting_approval' };
}

After approval is granted, resume execution (the tool configuration stays the same):

const record = await db.findByTicketId(ticketId);

const result = await FunctionCallLoop.runLoop({
  messages: record.messages,
  metadata: { userId: 'u-123', orgId: 'o-456' },
  tools: [
    {
      name: 'transfer_money',
      description: 'Transfer money',
      parameters: { /* ... */ },
      approval: {
        createTicket: async (args, context) => {
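          // Check whether this ticket has been approved
          const ticket = await db.findTicket(record.ticketId);
          if (ticket?.status === 'approved') {
            return { ticketId: ticket.id, approved: true };
          }
          // Fall back to the stored ID in case the lookup returned nothing
          return { ticketId: ticket?.id ?? record.ticketId };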
        },
      },
      execute: async (args, context) => await doTransfer(args),
    },
  ],
});

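Key points:

- When approval.createTicket is called, execute is not run; if createTicket returns { approved: true }, execute runs immediately
- createTicket can read LoopConfig.metadata through context.metadata, which is useful for carrying business context
- result.messages already contains the full conversation history (the assistant's tool_calls plus a placeholder tool result) and can be used directly to resume
- There is no need to remove the approval configuration when resuming; checking the approval status inside createTicket is enough

Tool discovery (dynamic visibility)

Before each LLM call, the Loop runs the discover function of every registered tool to decide dynamically which tools to expose for that turn: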

const tool = {
  name: 'admin_only',
  description: 'Admin operation',
  parameters: {},
  discover: async (harness, metadata) => {
    // Decide whether to expose the tool based on the Harness or the metadata
    const isAdmin = metadata?.role === 'admin';
    return {
      name: 'admin_only',
      description: 'Admin operation',
      parameters: {},
      visible: isAdmin,
    };
  },
  execute: async (args, context) => { /* ... */ },
};

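| Edge case | Behavior |
|---|---|
| discover throws | Treated as visible: false; the tool is hidden for that turn |
| All tools are invisible | The LLM receives an empty tool list |
| The LLM calls an invisible tool | Handled as an error; an error status is recorded |
| discover is not configured | The static registration is used, with visible: true |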
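
The edge cases in the table can be summarized in a short sketch. It simplifies the discover signature to a single metadata argument (the real one also receives the Harness), and SketchToolDef and visibleToolNames are hypothetical names.

```typescript
interface SketchToolDef {
  name: string;
  // Simplified for illustration: the real discover also receives the Harness.
  discover?: (metadata: unknown) => Promise<{ visible: boolean }>;
}

// Resolve which tools are exposed to the LLM for the current turn.
async function visibleToolNames(tools: SketchToolDef[], metadata: unknown): Promise<string[]> {
  const checks = await Promise.all(
    tools.map(async (tool) => {
      if (!tool.discover) return true; // no discover: statically visible
      try {
        return (await tool.discover(metadata)).visible;
      } catch {
        return false; // discover threw: hide the tool for this turn
      }
    }),
  );
  return tools.filter((_, i) => checks[i]).map((t) => t.name);
}
```

Treating a throwing discover as "hidden" fails closed: a broken visibility check never exposes a tool by accident.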

Turn Policy (global per-turn policy)

At the start of each turn the Loop first runs the Turn Policy, which can inject a system message to steer that turn:

const result = await FunctionCallLoop.runLoop({
  // ...
  turnPolicy: async (harness, turn, metadata) => {
    // Inject a reminder once three or more calls in the Harness have failed
    const failCount = harness.filter((r) => r.status === 'error').length;
    if (failCount >= 3) {
      return {
        injectMessage: 'Previous calls failed, please try a different approach.',
      };
    }
    return {};
  },
});

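Default policy (applies automatically when no custom policy is set):

| Condition | Behavior |
|---|---|
| turn < maxTurns - 1 | Continue |
| turn === maxTurns - 1 | Inject warningMessage and continue |
| turn >= maxTurns | Handed over to the hard backstop |

| Edge case | Behavior |
|---|---|
| turnPolicy throws | Treated as no injection; execution continues |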
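
Read literally, the default policy table corresponds to something like the sketch below. The PolicyDecision type and defaultTurnPolicy function are hypothetical, and two points are assumptions: that turn is compared against maxTurns exactly as the table writes it, and that no message is injected when warningMessage is not configured (in line with prompts being fully externalized).

```typescript
type PolicyDecision =
  | { action: 'continue'; injectMessage?: string }
  | { action: 'hard-stop' };

// Literal transcription of the default policy table.
function defaultTurnPolicy(
  turn: number,
  maxTurns: number,
  warningMessage?: string,
): PolicyDecision {
  if (turn >= maxTurns) {
    return { action: 'hard-stop' }; // left to the engine's hard backstop
  }
  if (turn === maxTurns - 1 && warningMessage !== undefined) {
    return { action: 'continue', injectMessage: warningMessage };
  }
  return { action: 'continue' };
}
```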

Context compact

Controls the length of the message history sent to the LLM; it touches only Messages, never the Harness:

const result = await FunctionCallLoop.runLoop({
  // ...
  compact: {
    keepRounds: 3, // keep the 3 most recent rounds intact
    compress: async (oldRounds) => {
      // oldRounds: the rounds to compress; each round is an array of Message
      // Return the compressed array of Message
      return [
        {
          role: 'system',
          content: `Previous ${oldRounds.length} rounds summarized...`,
        },
      ];
    },
  },
});

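| Rule | Description |
|---|---|
| compact affects only Messages | The Harness is always kept in full |
| System messages and the initial input are not compressed | Always kept |
| compress throws | Treated as no compression; execution continues with the original Messages |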
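
The rules above can be sketched as follows. This is an illustration under assumptions, not the engine's code: SketchMessage, Round, and compactRounds are hypothetical, and how Axiom actually groups messages into rounds is not specified in this README.

```typescript
interface SketchMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
}
type Round = SketchMessage[];

const flatten = (rounds: Round[]): SketchMessage[] =>
  rounds.reduce<SketchMessage[]>((acc, round) => acc.concat(round), []);

async function compactRounds(
  head: SketchMessage[], // system messages and initial input: never compressed
  rounds: Round[],
  keepRounds: number,
  compress: (oldRounds: Round[]) => Promise<SketchMessage[]>,
): Promise<SketchMessage[]> {
  const cut = Math.max(0, rounds.length - keepRounds);
  const oldRounds = rounds.slice(0, cut);
  const recent = rounds.slice(cut);
  const original = [...head, ...flatten(rounds)];
  if (oldRounds.length === 0) return original; // nothing old enough to compress
  try {
    const summary = await compress(oldRounds);
    return [...head, ...summary, ...flatten(recent)];
  } catch {
    return original; // compress threw: continue with the uncompressed messages
  }
}
```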

Cancellation signal

The Loop can be terminated through an AbortSignal:

const controller = new AbortController();

const promise = FunctionCallLoop.runLoop({
  // ...
  signal: controller.signal,
});

// Cancel after 5 seconds
setTimeout(() => controller.abort(), 5000);

const result = await promise;

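Configuration

`runLoop` / `runLoopStream` configuration

Both functions share the same configuration (LoopConfig); runLoopStream additionally requires llmCaller to support the stream method.

Conversation entry

| Option | Type | Required | Description |
|---|---|---|---|
| messages | Message[] | ✅ | Initial message list, used directly as the start of the conversation |

Tools and execution control

| Option | Type | Required | Description |
|---|---|---|---|
| tools | Tool[] | ✅ | List of registered tools |
| llmCaller | LLMCaller | - | Custom LLM caller. Created automatically from environment variables when not set |
| maxTurns | number | - | Maximum number of turns; unlimited by default |
| turnPolicy | TurnPolicy | - | Custom turn policy |
| compact | CompactConfig | - | Context compact configuration |
| metadata | unknown | - | Metadata passed to tools and the Turn Policy |
| warningMessage | string | - | Warning text injected by the default policy at maxTurns - 1 |
| terminateMessage | string | - | Termination text injected by the default policy at maxTurns |
| onEvent | (event) => void | - | Lifecycle event listener |
| signal | AbortSignal | - | Cancellation signal |

LLM call parameters

| Option | Type | Required | Description |
|---|---|---|---|
| model | Model | - | Model enum value. Read from the BAILIAN_DEFAULT_MODEL environment variable by default, otherwise qwen-max |
| maxTokens | number | - | Maximum number of output tokens for a single LLM call |
| temperature | number | - | Sampling temperature, range 0 to 2 |
| topP | number | - | Nucleus sampling probability threshold, range 0 to 1 |
| reasoningEffort | 'low' \| 'medium' \| 'high' | - | Reasoning effort. low turns reasoning off, medium/high turn it on. See "Model list" for per-model support |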

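Return value

interface LoopResult {
  messages: Message[];              // full conversation history
  harness: HarnessRecord[];         // execution history
  finalContent: string | null;      // final reply content
  turns: number;                    // number of turns actually executed
  usageHistory: TokenUsage[];       // token usage of each turn's LLM call
  totalUsage: TokenUsage;           // cumulative token usage (including cachedPromptTokens)
}

interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
  cachedPromptTokens?: number;      // prompt tokens served from cache (Prompt Caching)
}

reasoningContent in Message: when using a model that supports reasoning (such as DeepSeek-R1 or QwQ), entries in messages with role: 'assistant' include a reasoningContent field that records the model's full reasoning for that turn. The field is kept in the conversation context automatically so it can be sent back in multi-turn conversations.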

const result = await FunctionCallLoop.runLoop(config);

for (const msg of result.messages) {
  if (msg.role === 'assistant' && msg.reasoningContent) {
    console.log('[Reasoning]', msg.reasoningContent);
    console.log('[Content]', msg.content);
  }
}
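
If you want to cross-check totalUsage against usageHistory, or aggregate usage across several loop runs, a plain reduction is enough. The sketch below redeclares TokenUsage locally so that it runs on its own; sumUsage is a hypothetical helper, and whether the library computes totalUsage in exactly this way is an assumption.

```typescript
interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
  cachedPromptTokens?: number;
}

// Add up per-turn usage; cachedPromptTokens stays absent unless some turn reports it.
function sumUsage(history: TokenUsage[]): TokenUsage {
  const total: TokenUsage = { promptTokens: 0, completionTokens: 0, totalTokens: 0 };
  for (const usage of history) {
    total.promptTokens += usage.promptTokens;
    total.completionTokens += usage.completionTokens;
    total.totalTokens += usage.totalTokens;
    if (usage.cachedPromptTokens !== undefined) {
      total.cachedPromptTokens = (total.cachedPromptTokens ?? 0) + usage.cachedPromptTokens;
    }
  }
  return total;
}
```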

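Error handling

As in the other modules, exceptions are thrown directly. Common cases:

- execute throws → an error status is recorded and an event is emitted; other parallel tools in the same turn are unaffected
- discover throws → the tool is hidden for that turn
- compress throws → treated as no compression
- turnPolicy throws → treated as continue

EmbeddingSearch module (vector search)

EmbeddingSearch is Axiom's RAG foundation. It wraps the full query → embedding → pgvector search → [optional rerank] pipeline, so semantic search takes a single call.

Required columns

Before using EmbeddingSearch, create a pgvector table. Only the embedding column is required (default name embedding, changeable via embeddingColumn); all other columns are up to you:

| Column | Type | Description |
|---|---|---|
| embedding | vector(1024) | Vector field, required; default column name embedding, customizable via embeddingColumn |
| (any) | (any) | Your own business columns, usable directly for filtering and in results |

Extra requirement when rerank is enabled: set rerankKey to tell the system which column holds the text content (default 'query').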

CREATE EXTENSION IF NOT EXISTS vector;

-- Minimal table: no rerank, only the vector and business columns
CREATE TABLE products (
  product_id SERIAL PRIMARY KEY,
  name VARCHAR(100),
  category VARCHAR(50),
  price NUMERIC,
  embedding VECTOR(1024)
);

-- Table for use with rerank: has a text column for rerank to work on
CREATE TABLE documents (
  doc_id SERIAL PRIMARY KEY,
  title VARCHAR(100),
  body TEXT,           -- rerankKey: 'body'
  embedding VECTOR(1024)
);

CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops);

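Column design recommendations

EmbeddingSearch places no restrictions on your schema, but the following layering is recommended:

- Fields that are filtered frequently or need an index → dedicated columns (such as category, price), filtered at the database layer via filter
- Miscellaneous business data that is not indexed → may go into a JSONB column (such as metadata), fetched whole via select and consumed in the application layer. JSONB is a poor fit for filter: plain equality comparisons cannot use an index and easily trigger full table scans
- Text content for rerank → must be a dedicated text column (TEXT/VARCHAR); rerankKey cannot point to a JSONB column

Quick start

When select is omitted, all columns are returned by default (SELECT *):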

import { EmbeddingSearch } from '@keo-ai/axiom';
import { Pool } from 'pg';

const pool = new Pool({ connectionString: 'postgresql://...' });

// Defaults to SELECT *, returning all columns
const results = await EmbeddingSearch.query('How is the clothing quality?', {
  tableName: 'documents',
}, pool);

for (const r of results) {
  console.log(r.id, r.query, r.embeddingScore, r.rerankScore);
}

Specify select to return only the columns you need:

const results = await EmbeddingSearch.query('How is the clothing quality?', {
  tableName: 'documents',
  select: ['id', 'category', 'price'],
}, pool);

// Returns only the id, category, and price columns, plus embeddingScore and rerankScore
for (const r of results) {
  console.log(r.category, r.price, r.embeddingScore);
}

Specify the text column with rerankKey:

const results = await EmbeddingSearch.query('How is the clothing quality?', {
  tableName: 'documents',
  rerankKey: 'body',
}, pool);

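Return structure

SearchResult has no fixed set of fields; they depend on the columns you query. embeddingScore is always present, and rerankScore is added when rerank is enabled:

interface SearchResult {
  [key: string]: unknown;  // the columns you selected
  embeddingScore: number;   // cosine similarity, range -1 to 1
  rerankScore?: number;     // present only when enableRerank is on
}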
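
For readers who want to sanity-check an embeddingScore by hand, cosine similarity can be computed as below. In pgvector the `<=>` operator returns cosine distance, so similarity is 1 minus that distance; that Axiom derives embeddingScore in exactly this way is an assumption, and cosineSimilarity is a hypothetical helper.

```typescript
// Cosine similarity of two equal-length vectors: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error('vectors must be non-empty and of equal length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error('cosine similarity is undefined for a zero vector');
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0]));  // 1  (same direction)
console.log(cosineSimilarity([1, 0], [0, 1]));  // 0  (orthogonal)
console.log(cosineSimilarity([1, 0], [-1, 0])); // -1 (opposite direction)
```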

EmbeddingSearch.query options

| Option | Type | Default | Description |
|---|---|---|---|
| tableName | string | - | Table to search (required) |
| enableRerank | boolean | true | Whether to enable rerank |
| embeddingTopK | number | 10 | Number of rows returned by the pgvector search (the rerank candidate set) |
| finalTopN | number | 5 | Number of final results |
| embeddingThreshold | number | 0.8 | Embedding similarity threshold (cosine similarity) |
| rerankThreshold | number | 0.1 | Rerank score threshold |
| embeddingColumn | string | 'embedding' | Vector column name |
| rerankKey | string | 'query' | Text column used for reranking when rerank is enabled (must be a text column; JSONB is not supported) |
| dimensions | number | 1024 | Embedding output dimensions (1 to 1024) |
| select | string[] | undefined | Columns to return; SELECT * when omitted |
| filter | Record<string, unknown> | - | Equality/range filters on dedicated table columns (fields inside JSONB are not supported) |
| parseEnum | boolean | true | Whether to parse numeric enum values into readable text automatically |

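Filtering

Equality, range, and IN queries are supported on any dedicated column of the table:

// Equality filter
const results = await EmbeddingSearch.query('query', {
  tableName: 'products',
  filter: { category: 'clothing', shop: 'flagship store' },
}, pool);

// Numeric range
const ranked = await EmbeddingSearch.query('query', {
  tableName: 'products',
  filter: {
    category: 'clothing',
    price: { $gt: 100, $lt: 500 },
    rating: { $gte: 4 },
  },
}, pool);

// Array: IN query
const multi = await EmbeddingSearch.query('query', {
  tableName: 'products',
  filter: {
    category: ['clothing', 'trousers', 'shoes'],
  },
}, pool);

filter only supports dedicated columns; it does not support fields inside JSONB (such as metadata->>'field').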
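
To make the three filter shapes concrete, here is one way such an object could be turned into a parameterized WHERE clause for pg. It is a sketch under assumptions, not the library's query builder: buildWhere is a hypothetical name, $lte is included by analogy although the text above only shows $gt, $lt, and $gte, and column names are restricted to plain identifiers because they cannot be passed as query parameters.

```typescript
type RangeFilter = { $gt?: number; $gte?: number; $lt?: number; $lte?: number };
type FilterValue = string | number | boolean | Array<string | number> | RangeFilter;

const OPS: Record<string, string> = { $gt: '>', $gte: '>=', $lt: '<', $lte: '<=' };

function buildWhere(filter: Record<string, FilterValue>): { sql: string; params: unknown[] } {
  const clauses: string[] = [];
  const params: unknown[] = [];
  for (const [column, value] of Object.entries(filter)) {
    // Identifiers are interpolated into the SQL text, so validate them strictly.
    if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(column)) {
      throw new Error(`Invalid column name: ${column}`);
    }
    if (Array.isArray(value)) {
      params.push(value); // array value: IN-style match
      clauses.push(`${column} = ANY($${params.length})`);
    } else if (typeof value === 'object' && value !== null) {
      for (const [op, operand] of Object.entries(value)) {
        if (operand === undefined) continue;
        const sqlOp = OPS[op];
        if (!sqlOp) throw new Error(`Unsupported operator: ${op}`);
        params.push(operand);
        clauses.push(`${column} ${sqlOp} $${params.length}`);
      }
    } else {
      params.push(value); // scalar value: equality
      clauses.push(`${column} = $${params.length}`);
    }
  }
  return { sql: clauses.join(' AND '), params };
}

const where = buildWhere({ category: 'clothing', price: { $gt: 100, $lt: 500 } });
console.log(where.sql);    // category = $1 AND price > $2 AND price < $3
console.log(where.params); // [ 'clothing', 100, 500 ]
```

Values always travel as parameters and only validated identifiers reach the SQL string, which is also a plausible reason for filter being limited to dedicated columns.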

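Enum value parsing

parseEnum is on by default: specific enum fields in the results are parsed automatically from numeric values into readable text. To turn parsing off (and keep the raw numeric values), set parseEnum: false.

Unknown enum values, non-numeric types, and non-enum fields are left unchanged.

Generating embeddings

If you only need to turn text into a vector, use embed directly:

import { embed } from '@keo-ai/axiom';

const vector = await embed('How is the clothing quality?');
// vector: number[], 1024 dimensions by default

// Specify the dimensions (1 to 1024)
const vector256 = await embed('How is the clothing quality?', 256);

| Parameter | Type | Default | Description |
|---|---|---|---|
| text | string | - | Text to convert into a vector (required) |
| dimensions | number | 1024 | Output dimensions (1 to 1024) |

Vector-only search (existing vectors)

EmbeddingSearch.query automatically converts text to a vector and applies rerank. If you already have a vector (for example an embedding you generated yourself) and only want a raw pgvector nearest-neighbor search, use EmbeddingSearch.vectorSearch: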

import { EmbeddingSearch } from '@keo-ai/axiom';

const results = await EmbeddingSearch.vectorSearch({
  pool,
  tableName: 'documents',
  embeddingColumn: 'vec',
  vector: queryVector,  // a vector you generated yourself
  topK: 10,
  threshold: 0.7,
  filter: { category: 'clothing' },
});

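EmbeddingSearch.vectorSearch performs neither embedding nor rerank; it only takes the vector you provide and searches pgvector for its nearest neighbors.

| Parameter | Type | Default | Description |
|---|---|---|---|
| pool | Pool | - | pg connection pool (required) |
| tableName | string | - | Table to search (required) |
| vector | number[] | - | Query vector (required) |
| topK | number | - | Number of rows to return (required) |
| threshold | number | - | Similarity threshold |
| embeddingColumn | string | 'embedding' | Vector column name |
| filter | Record<string, unknown> | - | Filters on dedicated table columns |
| select | string[] | undefined | Columns to return |

Dependencies

The EmbeddingSearch module calls Bailian's embedding and rerank services internally; no extra configuration is needed:

- Embedding: text-embedding-v4, 1024 dimensions, OpenAI-compatible protocol
- Rerank: qwen3-rerank, Bailian native API

Just make sure BAILIAN_API_KEY is set.

Error handling

As in the Predict module, exceptions are thrown directly. Common errors:

- BAILIAN_API_KEY environment variable is not set: the API key is not configured
- [embedding] HTTP 4xx/5xx: the embedding endpoint returned an error
- [rerank] HTTP 4xx/5xx: the rerank endpoint returned an error
- Invalid table name: the table name contains illegal characters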

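Design philosophy

┌────────────────────────────────────────────────────────────────┐
│  Axiom does not correct the model.                             │
│  Axiom ensures the model never reasons from hallucination.     │
└────────────────────────────────────────────────────────────────┘

- The foundation drives the loop: Axiom ties together the closed loop of LLM → parse → validate → execute → record
- The project layer fills in the content: the upper layer prepares messages, tools, and business logic; Axiom handles reliable execution
- Side effects are state: tool calls change the outside world, and the Harness makes those changes visible across Loop turns
- Single source of truth: the Harness execution record is the sole authoritative source for function calls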
