doubaoime-asr

v0.1.0

Published 7 days ago

Doubao IME (豆包输入法) ASR client for TypeScript


konghayao

asr speech-recognition doubao speech-to-text voice-recognition realtime-asr streaming-asr

doubaoime-asr

A TypeScript client for Doubao IME (豆包输入法) speech recognition.

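Disclaimer

This project was implemented by analyzing the communication protocol of the Android Doubao IME client and by referring to its client code. It is not an officially provided API.

This project is intended for learning and research purposes only.
Future availability and stability are not guaranteed.
Server-side protocol changes may break functionality at any time.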

Installation

```bash
npm install doubaoime-asr
```

Requirements

Node.js >= 18.0.0

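Dependencies

This library uses the following runtime dependencies (installed automatically):

| Dependency | Purpose |
|------|------|
| protobufjs | Protobuf encoding/decoding (ASR WebSocket protocol) |
| ws | WebSocket client (opens the ASR connection to the server) |
| opus-encdec | WASM Opus encoder (PCM → Opus frame encoding, no native bindings) |

All cryptographic operations (ECDH, ChaCha20, HKDF, ECDSA) use the built-in Node.js crypto module, so no extra dependencies are needed.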
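The actual key-exchange and request-signing protocol is not documented here. As a rough illustration of what these four primitives look like with only `node:crypto`, the sketch below performs an ECDH exchange, derives a symmetric key with HKDF, encrypts with ChaCha20-Poly1305, and signs with ECDSA. The curve (P-256), the HKDF salt and info values, and the choice of the Poly1305 AEAD variant of ChaCha20 are assumptions for illustration, not the library's real parameters.

```typescript
import {
  createCipheriv,
  createDecipheriv,
  createECDH,
  generateKeyPairSync,
  hkdfSync,
  randomBytes,
  sign,
  verify,
} from "node:crypto";

// ECDH on P-256 (assumed curve): both sides arrive at the same shared secret.
const client = createECDH("prime256v1");
const server = createECDH("prime256v1");
const clientPub = client.generateKeys();
const serverPub = server.generateKeys();
const sharedA = client.computeSecret(serverPub);
const sharedB = server.computeSecret(clientPub);

// HKDF-SHA256 turns the raw shared secret into a 32-byte key.
// Empty salt and "example-info" are placeholders, not the real protocol values.
const key = Buffer.from(
  hkdfSync("sha256", sharedA, Buffer.alloc(0), Buffer.from("example-info"), 32),
);

// ChaCha20-Poly1305 with a random 12-byte nonce and a 16-byte auth tag.
function encrypt(plaintext: Buffer): { nonce: Buffer; ciphertext: Buffer; tag: Buffer } {
  const nonce = randomBytes(12);
  const cipher = createCipheriv("chacha20-poly1305", key, nonce, { authTagLength: 16 });
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { nonce, ciphertext, tag: cipher.getAuthTag() };
}

function decrypt(nonce: Buffer, ciphertext: Buffer, tag: Buffer): Buffer {
  const decipher = createDecipheriv("chacha20-poly1305", key, nonce, { authTagLength: 16 });
  decipher.setAuthTag(tag); // final() throws if the tag does not match
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}

// ECDSA (P-256, SHA-256) signature over a message body.
const { privateKey, publicKey } = generateKeyPairSync("ec", { namedCurve: "prime256v1" });
const body = Buffer.from("hello");
const signature = sign("sha256", body, privateKey);
const signatureOk = verify("sha256", body, publicKey, signature);

// Encrypt and decrypt the same body to show the round trip.
const { nonce, ciphertext, tag } = encrypt(body);
const roundTrip = decrypt(nonce, ciphertext, tag);
```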

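Running the microphone example

The microphone recording example captures audio with SoX, so SoX must be installed on the system:

```bash
# macOS
brew install sox

# Debian/Ubuntu
sudo apt install sox

# Arch Linux
sudo pacman -S sox
```

Note: on macOS, a microphone permission dialog appears on the first run; recording works normally once access is granted.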
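How `examples/mic-realtime.ts` invokes SoX is not shown in this README. A minimal sketch, assuming the default 16 kHz mono 16-bit format from `ASRConfig`, spawns `sox` reading from the default input device (`-d`) and writing headerless little-endian signed 16-bit PCM to stdout. Node's `Readable` streams are async iterables of `Buffer`, and `Buffer` is a `Uint8Array` subclass, so the child's stdout has the shape `transcribeRealtime` expects for `audioSource`. The helper names `soxArgs` and `micSource` are hypothetical.

```typescript
import { spawn } from "node:child_process";

// Build SoX arguments: record from the default device and write raw
// little-endian signed 16-bit PCM to stdout ("-").
function soxArgs(sampleRate = 16000, channels = 1): string[] {
  return [
    "-q",                 // suppress the progress display
    "-d",                 // input: default audio device
    "-t", "raw",          // output format options start here
    "-r", String(sampleRate),
    "-c", String(channels),
    "-b", "16",
    "-e", "signed-integer",
    "-L",                 // little-endian byte order
    "-",                  // output: stdout
  ];
}

// Hypothetical helper: start SoX and expose its stdout as the audio source.
function micSource(sampleRate = 16000, channels = 1) {
  const child = spawn("sox", soxArgs(sampleRate, channels), {
    stdio: ["ignore", "pipe", "inherit"], // stderr goes to the terminal
  });
  const stream: AsyncIterable<Uint8Array> = child.stdout;
  return { stream, stop: () => child.kill("SIGINT") };
}
```

Calling `stop()` ends SoX, which closes stdout and ends the iteration; whether `transcribeRealtime` then finishes the session cleanly is assumed rather than confirmed here.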

Runtime notes

Running with tsx is recommended: protobufjs and opus-encdec are CommonJS modules and are loaded via createRequire. Run the examples with tsx or node --experimental-strip-types:
```
npx tsx examples/mic-realtime.ts
```

Quick start

Real-time microphone recognition

```typescript
import { transcribeRealtime, ASRConfig, ResponseType } from "doubaoime-asr";

// audioSource is an AsyncIterable<Uint8Array> whose chunks are 16-bit PCM data
// (for example, a microphone stream). It is only declared here.
declare const audioSource: AsyncIterable<Uint8Array>;

const config = new ASRConfig({ credentialPath: "./credentials.json" });

for await (const response of transcribeRealtime(audioSource, { config })) {
  switch (response.type) {
    case ResponseType.INTERIM_RESULT:
      // Overwrite the current terminal line with the latest partial text
      process.stdout.write(`\r[interim] ${response.text}`);
      break;
    case ResponseType.FINAL_RESULT:
      console.log(`\r[final]   ${response.text}`);
      break;
    case ResponseType.ERROR:
      console.error(`[error] ${response.errorMsg}`);
      break;
  }
}
```

Streaming recognition (complete audio Buffer)

```typescript
import { transcribeStream, ASRConfig, ResponseType } from "doubaoime-asr";

const config = new ASRConfig({ credentialPath: "./credentials.json" });
// 16-bit PCM audio data, loaded elsewhere (only declared here)
declare const pcmBuffer: Buffer;

for await (const response of transcribeStream(pcmBuffer, { config })) {
  if (response.type === ResponseType.FINAL_RESULT) {
    console.log(`Result: ${response.text}`);
  }
}
```
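The examples leave `pcmBuffer` as a placeholder. If the audio comes from a WAV file, one way to get raw 16-bit PCM is to walk the RIFF chunks and return the `data` chunk. The sketch below (the `wavToPcm16` name is hypothetical) checks for integer PCM (format tag 1) at 16 bits per sample and for the sample rate and channel count `ASRConfig` uses by default (16000 Hz, mono). It does not resample or downmix, and it rejects `WAVE_FORMAT_EXTENSIBLE` files.

```typescript
// Hypothetical helper: extract the 16-bit PCM payload of a WAV (RIFF) buffer.
function wavToPcm16(wav: Buffer, expectedRate = 16000, expectedChannels = 1): Buffer {
  if (wav.toString("ascii", 0, 4) !== "RIFF" || wav.toString("ascii", 8, 12) !== "WAVE") {
    throw new Error("not a RIFF/WAVE file");
  }
  let offset = 12; // first chunk starts after "RIFF", size, "WAVE"
  let fmtSeen = false;
  while (offset + 8 <= wav.length) {
    const id = wav.toString("ascii", offset, offset + 4);
    const size = wav.readUInt32LE(offset + 4);
    const body = offset + 8;
    if (id === "fmt ") {
      const format = wav.readUInt16LE(body);
      const channels = wav.readUInt16LE(body + 2);
      const rate = wav.readUInt32LE(body + 4);
      const bits = wav.readUInt16LE(body + 14);
      if (format !== 1 || bits !== 16) throw new Error("expected 16-bit integer PCM");
      if (rate !== expectedRate || channels !== expectedChannels) {
        throw new Error(`expected ${expectedRate} Hz / ${expectedChannels} ch, got ${rate} Hz / ${channels} ch`);
      }
      fmtSeen = true;
    } else if (id === "data") {
      if (!fmtSeen) throw new Error("data chunk before fmt chunk");
      return wav.subarray(body, Math.min(body + size, wav.length));
    }
    offset = body + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error("no data chunk found");
}
```

Usage, reading a hypothetical file: `const pcmBuffer = wavToPcm16(readFileSync("speech.wav"));` with `readFileSync` from `node:fs`.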

Non-streaming recognition

```typescript
import { transcribe, ASRConfig } from "doubaoime-asr";

const config = new ASRConfig({ credentialPath: "./credentials.json" });
// 16-bit PCM audio data, loaded elsewhere (only declared here)
declare const pcmBuffer: Buffer;

const result = await transcribe(pcmBuffer, {
  config,
  onInterim: (text) => console.log(`[interim] ${text}`),
});

console.log(`Final result: ${result}`);
```

NER (named entity recognition)

```typescript
import { ASRConfig, ner } from "doubaoime-asr";

const config = new ASRConfig({ credentialPath: "./credentials.json" });
// Input: "Zhang San, Li Si, and Zhang San are using the Chrome browser"
const result = await ner(config, "张三李四以及张三在使用 Chrome 浏览器");

for (const r of result.results) {
  for (const w of r.words) {
    console.log(`${w.word} (frequency: ${w.freq})`);
  }
}
```
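This README shows `NerResponse` only through `results[].words[]` with `word` and `freq` fields. Assuming those fields (the interfaces below are inferred from the example, not taken from the library's type definitions, and `freq` is assumed to be numeric), a small helper can flatten the response and rank entities by frequency. `topEntities` is a hypothetical name.

```typescript
// Shapes inferred from the NER example; the real NerResponse may have more fields.
interface NerWord {
  word: string;
  freq: number;
}
interface NerResult {
  words: NerWord[];
}
interface NerResponseLike {
  results: NerResult[];
}

// Flatten every result's words and sort by descending frequency,
// breaking ties alphabetically.
function topEntities(res: NerResponseLike): NerWord[] {
  return res.results
    .flatMap((r) => r.words)
    .sort((a, b) => b.freq - a.freq || a.word.localeCompare(b.word));
}
```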

API reference

transcribe

Non-streaming speech recognition; resolves to the final result.

```typescript
async function transcribe(
  audio: Buffer,
  options?: {
    config?: ASRConfig;
    realtime?: boolean;
    onInterim?: (text: string) => void;
  },
): Promise<string>
```

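Parameters:

audio: 16-bit PCM audio data (Buffer)
options.config: ASR configuration
options.realtime: whether to simulate real-time sending by inserting a delay between frames; defaults to false (send as fast as possible)
options.onInterim: callback invoked with interim results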
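How `realtime: true` paces frames internally is not documented. The sketch below shows the general idea under that caveat: slice the PCM buffer into fixed-duration frames (20 ms by default, matching `frameDurationMs`) and, in real-time mode, wait one frame duration before each frame after the first. Because it is an async generator of `Uint8Array`, its output also has the type `transcribeRealtime` accepts. The `frames` helper and `FrameOptions` type are hypothetical.

```typescript
import { setTimeout as sleep } from "node:timers/promises";

interface FrameOptions {
  sampleRate?: number;
  channels?: number;
  frameDurationMs?: number;
  realtime?: boolean;
}

// Hypothetical helper: split 16-bit PCM into fixed-duration frames.
async function* frames(pcm: Uint8Array, opts: FrameOptions = {}): AsyncGenerator<Uint8Array> {
  const { sampleRate = 16000, channels = 1, frameDurationMs = 20, realtime = false } = opts;
  // samples per channel in one frame x channels x 2 bytes per 16-bit sample
  const frameBytes = Math.floor((sampleRate * frameDurationMs) / 1000) * channels * 2;
  for (let offset = 0; offset < pcm.length; offset += frameBytes) {
    // In real-time mode, wait one frame duration before every frame but the first
    if (realtime && offset > 0) await sleep(frameDurationMs);
    yield pcm.subarray(offset, offset + frameBytes); // the last frame may be shorter
  }
}
```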

transcribeStream

Streaming speech recognition; returns an async iterator of ASRResponse.

```typescript
async function* transcribeStream(
  audio: Buffer,
  options?: {
    config?: ASRConfig;
    realtime?: boolean;
  },
): AsyncGenerator<ASRResponse>
```

transcribeRealtime

Real-time streaming speech recognition; takes an async iterable of PCM audio data.

```typescript
async function* transcribeRealtime(
  audioSource: AsyncIterable<Uint8Array>,
  options?: {
    config?: ASRConfig;
  },
): AsyncGenerator<ASRResponse>
```

ner

Named entity recognition.

```typescript
async function ner(
  config: ASRConfig,
  text: string,
  appName?: string,
): Promise<NerResponse>
```

ASRConfig

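Configuration class:

| Parameter | Type | Default | Description |
|------|------|--------|------|
| credentialPath | string | none | Path of the credential cache file |
| deviceId | string | none | Device ID (registered automatically if empty) |
| token | string | none | Authentication token (fetched automatically if empty) |
| sampleRate | number | 16000 | Sample rate (Hz) |
| channels | number | 1 | Number of channels |
| enablePunctuation | boolean | true | Whether to insert punctuation |
| frameDurationMs | number | 20 | Duration of each frame (ms) |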
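With the defaults above, the size of one raw PCM frame (before Opus encoding) follows directly, assuming 16-bit samples as stated for the audio input throughout this README:

```latex
\begin{aligned}
\text{bytesPerFrame} &= \text{sampleRate} \times \frac{\text{frameDurationMs}}{1000} \times \text{channels} \times 2 \\
&= 16000 \times \frac{20}{1000} \times 1 \times 2 = 640\ \text{bytes}
\end{aligned}
```

That is 320 samples per frame and 50 frames per second. 20 ms is also one of the frame durations the Opus codec supports (2.5, 5, 10, 20, 40 and 60 ms), which is presumably why it is the default here.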

ResponseType

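Response type enum:

| Type | Description |
|------|------|
| TASK_STARTED | Task started |
| SESSION_STARTED | Session started |
| VAD_START | Start of speech detected |
| INTERIM_RESULT | Interim recognition result |
| FINAL_RESULT | Final recognition result |
| SESSION_FINISHED | Session finished |
| ERROR | Error |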
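A common way to consume these events is to keep a running interim line that each INTERIM_RESULT overwrites and to commit text on FINAL_RESULT. So that the sketch runs on its own, it defines a plain string constant mirroring the names in the table; the library's real enum values and the full ASRResponse shape are assumptions, and `applyResponse`, `Kind` and `Transcript` are hypothetical names.

```typescript
// Mirrors the ResponseType names above; the real enum values may differ.
const Kind = {
  TASK_STARTED: "TASK_STARTED",
  SESSION_STARTED: "SESSION_STARTED",
  VAD_START: "VAD_START",
  INTERIM_RESULT: "INTERIM_RESULT",
  FINAL_RESULT: "FINAL_RESULT",
  SESSION_FINISHED: "SESSION_FINISHED",
  ERROR: "ERROR",
} as const;
type Kind = (typeof Kind)[keyof typeof Kind];

// Minimal response shape, using only fields seen in the quick-start examples.
interface ResponseLike {
  type: Kind;
  text?: string;
  errorMsg?: string;
}

interface Transcript {
  finals: string[]; // committed sentences
  interim: string;  // latest partial text, replaced on every INTERIM_RESULT
  error?: string;
  done: boolean;
}

// Fold one response into the transcript state without mutating it.
function applyResponse(state: Transcript, r: ResponseLike): Transcript {
  switch (r.type) {
    case Kind.INTERIM_RESULT:
      return { ...state, interim: r.text ?? "" };
    case Kind.FINAL_RESULT:
      // Commit the final text and clear the pending partial
      return { ...state, finals: [...state.finals, r.text ?? ""], interim: "" };
    case Kind.ERROR:
      return { ...state, error: r.errorMsg ?? "unknown error", done: true };
    case Kind.SESSION_FINISHED:
      return { ...state, done: true };
    default:
      // TASK_STARTED, SESSION_STARTED and VAD_START carry no text
      return state;
  }
}
```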

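Credential management

On first use, the client automatically registers a virtual device with the server and obtains an authentication token. Setting credentialPath is recommended: the credentials are then cached in a JSON file, so the device does not have to be registered again on every run:

```typescript
const config = new ASRConfig({
  credentialPath: "~/.config/doubaoime-asr/credentials.json",
});
```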
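Node's `fs` APIs do not expand `~`, and this README does not say whether `ASRConfig` expands it in `credentialPath`. If it does not, a leading `~` would be treated as a literal directory name relative to the working directory. A defensive option is to expand the path before passing it in; `expandHome` below is a hypothetical helper, not part of the library.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

// Expand a leading "~" or "~/" to the current user's home directory.
function expandHome(p: string): string {
  if (p === "~") return homedir();
  if (p.startsWith("~/")) return join(homedir(), p.slice(2));
  return p; // absolute or relative paths are returned unchanged
}

// Usage (assuming ASRConfig accepts an absolute path):
// new ASRConfig({ credentialPath: expandHome("~/.config/doubaoime-asr/credentials.json") });
```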

Running the examples

```bash
# Install dependencies
npm install

# Real-time microphone recognition
npx tsx examples/mic-realtime.ts

# NER (named entity recognition)
npx tsx examples/ner.ts
```

Development

```bash
# Type check
npx tsc --noEmit

# Run tests
npm test

# Build
npm run build
```
