@litomore/coli

v0.0.2

Published

2 months ago

A CLI for the Cola

litomore

coli

Core library for Cola. Provides essential capabilities commonly used by agents.

CLI

npm i -g @litomore/coli

Usage
  $ coli <command> [options]

Commands
  asr    Transcribe an audio file using speech recognition

Examples
  $ coli asr recording.m4a
  $ coli asr -j recording.m4a
  $ coli asr --model whisper recording.wav

`coli asr`

Transcribe an audio file using speech recognition.

# Plain text output
coli asr recording.m4a

# JSON output
coli asr -j recording.m4a

# Select model
coli asr --model whisper recording.wav

Options

-j, --json     Output result in JSON format
--model        Model to use: whisper, sensevoice (default: sensevoice)

JSON output example

{
	"text": "The tribal chieftain called for the boy.",
	"model": "sensevoice-small",
	"lang": "<|en|>",
	"emotion": "<|NEUTRAL|>",
	"event": "<|Speech|>",
	"tokens": ["The", " tri", "bal", " chief", "tain", "..."],
	"timestamps": [0.9, 1.26, 1.56, 1.8, 2.16, "..."],
	"duration": 7.152
}
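
A consumer of the JSON output will usually want the tag fields without their `<|...|>` wrappers and each token paired with its timestamp. The following is a minimal sketch of that post-processing, not part of coli itself: `AsrResult`, `stripTag`, and `alignTokens` are hypothetical names, and the sketch assumes that `tokens[i]` corresponds to `timestamps[i]`, which the example suggests but does not state. The sample is truncated to the five entries visible in the example above.

```typescript
// Shape of the JSON printed by `coli asr -j`, inferred from the example above.
interface AsrResult {
	text: string;
	model: string;
	lang: string;
	emotion: string;
	event: string;
	tokens: string[];
	timestamps: number[];
	duration: number;
}

// Strip the "<|...|>" wrapper from a tag field, e.g. "<|en|>" -> "en".
// Values that do not match the pattern are returned unchanged.
function stripTag(value: string): string {
	const match = /^<\|(.*)\|>$/.exec(value);
	return match ? match[1] : value;
}

// Pair each token with its start time (assumption: the arrays are parallel).
// If the lengths differ, only the overlapping prefix is returned.
function alignTokens(result: AsrResult): Array<{token: string; start: number}> {
	const count = Math.min(result.tokens.length, result.timestamps.length);
	const aligned: Array<{token: string; start: number}> = [];
	for (let i = 0; i < count; i++) {
		aligned.push({token: result.tokens[i], start: result.timestamps[i]});
	}
	return aligned;
}

// Sample input: the documented example, truncated to its visible entries.
const sampleJson = `{
	"text": "The tribal chieftain called for the boy.",
	"model": "sensevoice-small",
	"lang": "<|en|>",
	"emotion": "<|NEUTRAL|>",
	"event": "<|Speech|>",
	"tokens": ["The", " tri", "bal", " chief", "tain"],
	"timestamps": [0.9, 1.26, 1.56, 1.8, 2.16],
	"duration": 7.152
}`;

const parsedResult = JSON.parse(sampleJson) as AsrResult;
console.log(stripTag(parsedResult.lang)); // en
console.log(alignTokens(parsedResult));
```

In a real script the JSON string would come from the stdout of `coli asr -j <file>` rather than from a literal.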

API

npm i @litomore/coli

ASR

`ensureModels()`

Check for required models and download any that are missing. Call this before runAsr.

import {ensureModels} from '@litomore/coli';

await ensureModels();

`runAsr(filePath, options)`

Run speech recognition on an audio file. Results are printed to stdout.

import {ensureModels, runAsr} from '@litomore/coli';

await ensureModels();

// Plain text output
await runAsr('recording.m4a', {json: false, model: 'sensevoice'});

// JSON output
await runAsr('recording.m4a', {json: true, model: 'whisper'});

Options

| Property | Type                      | Description                                                                   |
| -------- | ------------------------- | ----------------------------------------------------------------------------- |
| json     | boolean                   | Output JSON (with model name, tokens, timestamps, etc.) instead of plain text |
| model    | 'whisper' \| 'sensevoice' | Which model to use for recognition                                            |
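
When the model name comes from user input (a config file, an environment variable), it can help to validate it before passing it to `runAsr`. The sketch below mirrors the two documented properties. `AsrOptions`, `isModelKey`, and `toAsrOptions` are hypothetical names and are not exported by the package. The fallback to `sensevoice` follows the documented CLI default; whether `runAsr` itself applies a default is not stated here, so the sketch always sets `model` explicitly.

```typescript
// The two model keys documented in the options table.
const modelKeys = ['whisper', 'sensevoice'] as const;
type ModelKey = (typeof modelKeys)[number];

// Hypothetical options type mirroring the documented properties.
interface AsrOptions {
	json: boolean;
	model: ModelKey;
}

// Type guard: narrows an arbitrary string to one of the documented keys.
function isModelKey(value: string): value is ModelKey {
	return (modelKeys as readonly string[]).indexOf(value) !== -1;
}

// Build an options object from untrusted input.
// An undefined model falls back to 'sensevoice' (the documented CLI default);
// an unrecognized model name throws instead of being passed through.
function toAsrOptions(model: string | undefined, json: boolean): AsrOptions {
	if (model === undefined) {
		return {json, model: 'sensevoice'};
	}
	if (!isModelKey(model)) {
		throw new Error(`Unknown model "${model}". Expected one of: ${modelKeys.join(', ')}`);
	}
	return {json, model};
}

console.log(toAsrOptions('whisper', true));
console.log(toAsrOptions(undefined, false));
```

The resulting object has the same shape as the second argument in the `runAsr` examples above.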

`getModelPath(model)`

Returns the local filesystem path for a given model.

import {getModelPath} from '@litomore/coli';

getModelPath('sensevoice');
// => '/Users/you/.coli/models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2024-07-17'

getModelPath('whisper');
// => '/Users/you/.coli/models/sherpa-onnx-whisper-tiny.en'
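
The two return values share a layout: the home directory, then `.coli/models`, then a per-model directory name. The sketch below reproduces that layout for illustration only. It is not the package's implementation; `sketchModelPath` and `modelDirNames` are hypothetical names, the home directory is passed in as a parameter rather than looked up, and POSIX path separators are assumed.

```typescript
// Directory names as shown in the getModelPath examples above.
const modelDirNames = {
	sensevoice: 'sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2024-07-17',
	whisper: 'sherpa-onnx-whisper-tiny.en',
};

// Hypothetical helper: joins <home>/.coli/models/<model directory>.
// Trailing slashes on homeDir are dropped so the result has no "//".
function sketchModelPath(homeDir: string, model: keyof typeof modelDirNames): string {
	const base = homeDir.replace(/\/+$/, '');
	return [base, '.coli', 'models', modelDirNames[model]].join('/');
}

console.log(sketchModelPath('/Users/you', 'whisper'));
// => '/Users/you/.coli/models/sherpa-onnx-whisper-tiny.en'
```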

`modelDisplayNames`

A mapping from model key to its human-readable display name.

import {modelDisplayNames} from '@litomore/coli';

modelDisplayNames.sensevoice; // => 'sensevoice-small'
modelDisplayNames.whisper; // => 'whisper-tiny.en'

Models

On first run, coli automatically downloads ASR models to ~/.coli/models/:

| Name                 | Model                 | Languages                                     |
| -------------------- | --------------------- | --------------------------------------------- |
| sensevoice (default) | SenseVoice Small int8 | Chinese, English, Japanese, Korean, Cantonese |
| whisper              | Whisper tiny.en int8  | English                                       |
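
The language coverage in this table can be encoded as a small lookup for choosing a model by language. This is a sketch built only from the table above; `modelLanguages` and `modelsForLanguage` are hypothetical names, not part of the package.

```typescript
// Language coverage copied from the models table.
const modelLanguages = {
	sensevoice: ['Chinese', 'English', 'Japanese', 'Korean', 'Cantonese'],
	whisper: ['English'],
};
type ModelName = keyof typeof modelLanguages;

// Return every model whose language list contains the given language.
// The comparison is case-insensitive; an unlisted language yields [].
function modelsForLanguage(language: string): ModelName[] {
	const wanted = language.toLowerCase();
	const matches: ModelName[] = [];
	for (const key of Object.keys(modelLanguages) as ModelName[]) {
		const supported = modelLanguages[key].some(
			(entry) => entry.toLowerCase() === wanted,
		);
		if (supported) {
			matches.push(key);
		}
	}
	return matches;
}

console.log(modelsForLanguage('English')); // [ 'sensevoice', 'whisper' ]
console.log(modelsForLanguage('Korean')); // [ 'sensevoice' ]
```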

Supported audio formats

WAV files are passed directly to the recognizer. All other formats (m4a, mp3, ogg, flac, etc.) are automatically converted to 16 kHz mono WAV via ffmpeg.
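
To make the conversion step concrete, the sketch below builds an ffmpeg argument list that produces 16 kHz mono WAV. It is an assumption about what such a call can look like, not the command coli actually runs: the README does not show the exact invocation, nor whether WAV input is detected by file extension or by content. `needsConversion` and `ffmpegArgs` are hypothetical names. The flags are standard ffmpeg options: `-ar` sets the sample rate, `-ac` the channel count, and `-y` overwrites an existing output file.

```typescript
// True when the file would need conversion first.
// Assumption: WAV input is recognized by its .wav extension (case-insensitive).
function needsConversion(filePath: string): boolean {
	return !/\.wav$/i.test(filePath);
}

// Argument list for an ffmpeg call that writes 16 kHz mono WAV.
function ffmpegArgs(inputPath: string, outputPath: string): string[] {
	return ['-y', '-i', inputPath, '-ar', '16000', '-ac', '1', outputPath];
}

console.log(needsConversion('recording.m4a')); // true
console.log(needsConversion('recording.WAV')); // false
console.log(['ffmpeg'].concat(ffmpegArgs('recording.m4a', 'recording.wav')).join(' '));
// ffmpeg -y -i recording.m4a -ar 16000 -ac 1 recording.wav
```

The array form can be passed directly to `execFile('ffmpeg', args)` from `node:child_process`, which avoids shell quoting issues with file names that contain spaces.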

License

MIT
