speech-asr

v1.1.6

Browser-based real-time speech recognition using sherpa-onnx WebAssembly

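A browser-based real-time speech recognition SDK built on sherpa-onnx WebAssembly. It runs entirely in the front end and needs no back-end service.

✨ Features

  • 🎤 Real-time speech recognition: supports Zipformer/Paraformer models, with inference done entirely in the browser
  • 🤫 Voice activity detection: built-in Silero VAD splits the audio into segments automatically
  • ✏️ Punctuation restoration: an integrated punctuation model makes the text easier to read
  • 🔁 Two-Pass enhancement: optional second decoding pass with SenseVoice for higher accuracy
  • 📦 Works out of the box: one configuration object covers both real-time streaming and file transcription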

📦 Installation

npm install speech-asr@latest

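🚀 Quick Start

1. Download the model ZIP package

Download a model package from Hugging Face:

Option A: 1-Pass (real-time recognition only)

  • File: sherpa-onnx-wasm-asr-1pass.zip
  • Characteristics: fast and real-time, suited to basic transcription
  • Contains: Zipformer + Silero VAD + punctuation model

Option B: 1&2-Pass (real-time + 2-Pass enhancement)

  • File: sherpa-onnx-wasm-asr-1&2pass.zip
  • Characteristics: two recognition passes, higher accuracy
    • Pass 1: fast real-time Zipformer
    • Pass 2: SenseVoice enhancement
  • Contains: full models + VAD + punctuation + Worker

Unzip the package into your project directory, for example ./models/sherpa-onnx-wasm-asr-20251211-120752/

2. Simple configuration (recommended)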

import { SpeechASR } from 'speech-asr';

const asr = new SpeechASR({
  vadMode: 'silero',              // enable VAD-based automatic segmentation
  twoPass: { enabled: true },     // enable second-pass enhancement (1&2-Pass package)
  punctuation: { enabled: true }, // enable punctuation restoration
  modelPaths: {
    m_path: './models/sherpa-onnx-wasm-asr-20251211-120752'  // model directory
  },
  onReady: () => console.log('✅ ASR ready'),
  onPartial: (text) => console.log('🎤 Partial:', text),
  onResult: (text) => console.log('📝 Result:', text),
  onTwoPassResult: ({ offlineText }) => console.log('🚀 Enhanced:', offlineText),
});

await asr.init();
await asr.start();  // start recording and recognition
// asr.stop();      // stop
// asr.destroy();   // release resources

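3. Manual second-pass mode (run 2-Pass once, after stop)

const asr = new SpeechASR({
  vadMode: 'off', // VAD can be turned off in manual mode to avoid automatic segmentation
  twoPass: {
    enabled: true,
    mode: 'manual-stop',          // manual: run the second pass once, only after stop()
    disableOnlineInManual: true,  // optional: turn off the online 1-Pass to save CPU
    autoWarmup: true,             // optional: warm up 2-Pass after init to shorten the first wait
  },
  modelPaths: { m_path: './models/sherpa-onnx-wasm-asr-20251211-120752' },
  onReady: () => console.log('✅ Hold space to start, release to stop'),
});

await asr.init();
// Bind the keyboard: start() on space keydown, stop() on space keyup

Tip: when disableOnlineInManual is true, no real-time 1-Pass results are emitted; only the 2-Pass result produced after stop() is kept. If you need real-time text, set it to false (the default).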
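
The keyboard binding mentioned in the comment above could be sketched as follows. This is a minimal sketch, not part of speech-asr: bindPushToTalk is a hypothetical helper name, and it only assumes that the instance exposes start() and stop() as shown in this README.

```javascript
// Hypothetical helper (not part of speech-asr): push-to-talk on the space bar.
// `asr` is any object exposing start() and stop().
function bindPushToTalk(target, asr) {
  let recording = false;

  const onKeyDown = (event) => {
    // Holding a key fires repeated keydown events, so only start once.
    if (event.code !== 'Space' || recording) return;
    recording = true;
    // start() is awaited in the README, so handle a rejected promise here.
    Promise.resolve(asr.start()).catch((err) => {
      recording = false;
      console.error('Failed to start recognition:', err);
    });
  };

  const onKeyUp = (event) => {
    if (event.code !== 'Space' || !recording) return;
    recording = false;
    asr.stop(); // in 'manual-stop' mode the second pass runs after this call
  };

  target.addEventListener('keydown', onKeyDown);
  target.addEventListener('keyup', onKeyUp);

  // Return a function that removes both listeners.
  return () => {
    target.removeEventListener('keydown', onKeyDown);
    target.removeEventListener('keyup', onKeyUp);
  };
}
```

In a page this would typically be called as bindPushToTalk(window, asr) once asr.init() has resolved; keep the returned function and call it before asr.destroy().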

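📚 API Configuration

Core options

| Option | Type | Default | Description |
|------|------|--------|------|
| modelPaths.m_path | string | - | Recommended. Points to the model directory; the SDK resolves all file paths from it |
| vadMode | 'off' \| 'silero' | 'off' | Voice activity detection mode |
| twoPass.enabled | boolean | false | Enable second-pass enhancement |
| punctuation.enabled | boolean | false | Enable punctuation restoration |
| sampleRate | number | 16000 | Audio sample rate (Hz) |

VAD configuration (when vadMode='silero')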

vad: {
  silero: {
    threshold: 0.5,              // detection threshold (0-1)
    minSilenceDuration: 0.3,     // minimum silence duration (seconds)
    minSpeechDuration: 0.25,     // minimum speech duration (seconds)
    maxSpeechDuration: 20,       // maximum speech segment length (seconds)
  }
}
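
To build intuition for how these four values interact, here is a deliberately simplified segmenter over per-frame speech probabilities. It is an illustration under stated assumptions, not the Silero VAD or sherpa-onnx implementation: segmentSpeech, probs and frameSec are hypothetical names, and real VADs work on sample windows with additional smoothing.

```javascript
// Simplified illustration only: not the Silero VAD or sherpa-onnx implementation.
// `probs` holds one speech probability per frame; `frameSec` is the frame length in seconds.
function segmentSpeech(probs, frameSec, cfg) {
  const segments = [];
  let start = null; // start time of the current speech segment, or null when idle
  let silence = 0;  // seconds of trailing silence inside the current segment

  const close = (end) => {
    // Segments shorter than minSpeechDuration are discarded.
    if (end - start >= cfg.minSpeechDuration) segments.push([start, end]);
    start = null;
    silence = 0;
  };

  probs.forEach((p, i) => {
    const t = i * frameSec;
    const isSpeech = p >= cfg.threshold;
    if (start === null) {
      if (isSpeech) start = t; // a frame above the threshold opens a segment
      return;
    }
    silence = isSpeech ? 0 : silence + frameSec;
    const end = t + frameSec;
    if (silence >= cfg.minSilenceDuration) {
      close(end - silence); // end the segment where the silence began
    } else if (end - start >= cfg.maxSpeechDuration) {
      close(end);           // force a cut on very long speech
    }
  });

  // Flush a segment that is still open when the input ends.
  if (start !== null) close(probs.length * frameSec - silence);
  return segments;
}
```

With the default values above, a pause shorter than 0.3 s stays inside one segment, a burst shorter than 0.25 s is dropped, and continuous speech is cut at 20 s.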

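Callbacks

| Callback | Arguments | Description |
|------|------|------|
| onReady | () | ASR initialization finished |
| onPartial | (text: string) | Real-time partial result (not yet final) |
| onResult | (text: string) | Final result of online recognition |
| onTwoPassResult | ({ offlineText, streamingText }) | Enhanced result from the second pass |
| onTwoPassError | (error) | Second-pass recognition error |
| onError | (error) | General error callback |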
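
One way to combine these callbacks into a running transcript is sketched below. createTranscript is a hypothetical helper, not part of speech-asr, and it rests on an assumption the README does not confirm: that streamingText in onTwoPassResult matches a string delivered earlier to onResult, so the enhanced text can replace it in place.

```javascript
// Hypothetical helper (not part of speech-asr): collect callback output into a transcript.
function createTranscript(separator = ' ') {
  const finals = []; // finalized segments, in arrival order
  let partial = '';  // current unconfirmed text

  return {
    handlers: {
      onPartial: (text) => { partial = text; },
      onResult: (text) => { finals.push(text); partial = ''; },
      // Assumption: streamingText equals a string previously passed to onResult.
      onTwoPassResult: ({ offlineText, streamingText }) => {
        const i = finals.lastIndexOf(streamingText);
        if (i >= 0) finals[i] = offlineText; // replace the 1-Pass text in place
        else finals.push(offlineText);       // no match: append instead
      },
      onTwoPassError: (error) => console.error('2-Pass error:', error),
      onError: (error) => console.error('ASR error:', error),
    },
    // Finalized text followed by the in-progress partial, if any.
    text: () => [...finals, partial].filter(Boolean).join(separator),
  };
}
```

The handlers object can be spread into the constructor options, for example new SpeechASR({ ...options, ...transcript.handlers }); pass an empty separator for languages written without spaces.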

Lifecycle

// 1. Initialize
await asr.init();

// 2. Start recognition
await asr.start();

// 3. Stop recognition
asr.stop();

// 4. Clear the cache
asr.clear();

// 5. Destroy the instance
asr.destroy();
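
The lifecycle calls above can be wrapped so that cleanup always runs, even when recording is interrupted by an error. This is a sketch only: runSession and whileRecording are hypothetical names, and the README does not say whether destroy() is safe to call after a failed init(), so treat that part as an assumption.

```javascript
// Hypothetical helper (not part of speech-asr): run one recording session and always clean up.
// `asr` is any object with the init/start/stop/destroy methods listed above.
async function runSession(asr, whileRecording) {
  try {
    await asr.init();
    await asr.start();
    try {
      // Resolve this promise when recording should end (button click, timeout, ...).
      await whileRecording();
    } finally {
      asr.stop(); // only reached once start() has succeeded
    }
  } finally {
    // Assumption: destroy() tolerates being called after a failed init() or start().
    asr.destroy();
  }
}
```

For example, runSession(asr, () => new Promise((resolve) => setTimeout(resolve, 5000))) records for about five seconds and then releases resources.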

📦 Model Package Layout

1-Pass package

sherpa-onnx-wasm-asr-1pass/
├── sherpa-onnx-wasm-main-asr.data    # model weights
├── sherpa-onnx-wasm-main-asr.js      # WASM wrapper
├── sherpa-onnx-wasm-main-asr.wasm    # WASM runtime
├── sherpa-onnx-asr.js                # ASR helper functions
└── sherpa-onnx-vad.js                # VAD helper functions

1&2-Pass package (all of the files above, plus)

└── offline-worker.js                 # second-pass recognition Worker
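
When deploying the unzipped package, it can help to list the paths that should exist under the model directory. The sketch below mirrors the layout shown above; expectedModelFiles is a hypothetical helper and does not claim to reproduce how the SDK resolves m_path internally.

```javascript
// File names as listed in the package layout above.
const BASE_FILES = [
  'sherpa-onnx-wasm-main-asr.data',
  'sherpa-onnx-wasm-main-asr.js',
  'sherpa-onnx-wasm-main-asr.wasm',
  'sherpa-onnx-asr.js',
  'sherpa-onnx-vad.js',
];
const TWO_PASS_FILES = ['offline-worker.js'];

// Hypothetical helper (not part of speech-asr): paths expected under a model directory.
function expectedModelFiles(mPath, { twoPass = false } = {}) {
  const dir = mPath.replace(/\/+$/, ''); // drop trailing slashes
  const names = twoPass ? [...BASE_FILES, ...TWO_PASS_FILES] : BASE_FILES;
  return names.map((name) => `${dir}/${name}`);
}
```

In a browser, each returned path could be probed with fetch(path, { method: 'HEAD' }) before calling asr.init(), so that a missing file shows up as a clear deployment error.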

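📝 Notes

  • All models run in the browser; no back-end service is required
  • The first load downloads the model files, so hosting them locally alongside your app is recommended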

🛠️ Development

npm run build        # build
npm run dev          # development mode

📄 License

MIT License