fnaught
v0.1.3
Lightweight monophonic pitch (f0) detection for Node and the browser. 106k-parameter ONNX model with a dual-resolution STFT + instantaneous-frequency frontend.
fnaught (“f-naught”, as f₀ is read aloud) estimates the fundamental frequency of speech, singing, and monophonic
instruments in real time. It runs a small (106k-parameter, 428 KB) ONNX model
through onnxruntime-web, so the same package works in Node and in the
browser with no native dependencies. Audio never leaves the process.
- 16 kHz analysis, one pitch estimate every 16 ms, range 46.9–2093.8 Hz
- Dual-resolution STFT (1024/4096) + instantaneous-frequency frontend, designed for accuracy on low-pitched voices
- Batch and streaming APIs, plus note-name/cents utilities for tuner UIs
- The TypeScript DSP frontend is verified in CI to match the PyTorch training implementation numerically
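The frame timing implied by the figures above (16 kHz analysis, one estimate every 16 ms) works out to a hop of 256 samples. The sketch below spells out that arithmetic. `approxFrameTimes` is a hypothetical helper, not part of fnaught, and it assumes that frame i is stamped at i × 16 ms with no padding or centering offset, which the library may handle differently.

```typescript
// Figures taken from the feature list: 16 kHz analysis, one estimate per 16 ms.
const ANALYSIS_RATE_HZ = 16000;
const HOP_SECONDS = 0.016;
const HOP_SAMPLES = Math.round(ANALYSIS_RATE_HZ * HOP_SECONDS); // 256 samples

// Hypothetical helper (not a fnaught export): approximate frame times for a clip.
// Assumption: frame i starts at i * HOP_SAMPLES in the resampled signal.
function approxFrameTimes(numSamples: number, sampleRate: number): number[] {
  if (!(sampleRate > 0)) throw new RangeError("sampleRate must be positive");
  // Length of the clip after resampling to the analysis rate.
  const resampledLength = Math.round((numSamples * ANALYSIS_RATE_HZ) / sampleRate);
  const frameCount = Math.floor(resampledLength / HOP_SAMPLES);
  const times: number[] = [];
  for (let i = 0; i < frameCount; i++) {
    times.push((i * HOP_SAMPLES) / ANALYSIS_RATE_HZ);
  }
  return times;
}
```

Under these assumptions, a 2 s clip recorded at 44.1 kHz yields about 125 frames; the exact count returned by the library may differ by a few frames at the edges.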
Install
npm install fnaught
Quick start
import { PitchDetector } from "fnaught";
const detector = await PitchDetector.create();
// audio: Float32Array of mono samples in [-1, 1]
const { pitchHz, confidence, timestamps } = await detector.detect(audio, {
  sampleRate: 44100, // resampled to 16 kHz internally
});
for (let i = 0; i < pitchHz.length; i++) {
  if (confidence[i] > 0.9) {
    console.log(`${timestamps[i].toFixed(2)}s ${pitchHz[i].toFixed(1)} Hz`);
  }
}
pitchHz is defined for every frame; use confidence (0–1) to gate voiced
frames. A threshold around 0.9 works well.
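Because unvoiced frames still carry a pitchHz value, downstream code usually filters by confidence before aggregating. A minimal sketch of that gating follows. `summarizeVoiced` is a hypothetical helper, not a fnaught export, and it assumes only that pitchHz and confidence are equal-length array-likes of numbers, as the loop above suggests.

```typescript
// Shape assumed from the detect() example: parallel per-frame arrays.
interface PitchTrack {
  pitchHz: ArrayLike<number>;
  confidence: ArrayLike<number>;
}

interface VoicedSummary {
  voicedCount: number;
  medianHz: number | null; // null when no frame passes the threshold
}

// Hypothetical helper: keep frames above the threshold and report their median pitch.
function summarizeVoiced(track: PitchTrack, threshold = 0.9): VoicedSummary {
  const voiced: number[] = [];
  for (let i = 0; i < track.pitchHz.length; i++) {
    if (track.confidence[i] > threshold) voiced.push(track.pitchHz[i]);
  }
  if (voiced.length === 0) return { voicedCount: 0, medianHz: null };
  voiced.sort((a, b) => a - b); // numeric sort; the default sort is lexicographic
  const mid = voiced.length >> 1;
  const medianHz =
    voiced.length % 2 === 1 ? voiced[mid] : (voiced[mid - 1] + voiced[mid]) / 2;
  return { voicedCount: voiced.length, medianHz };
}
```

The median is used here rather than the mean because a single octave error in one frame shifts the mean noticeably but rarely moves the median.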
Streaming (microphone)
import { PitchDetector, hzToNote } from "fnaught";
const detector = await PitchDetector.create();
const stream = detector.createStream({
  sampleRate: audioContext.sampleRate,
  onFrame: ({ pitchHz, confidence, time }) => {
    if (confidence > 0.9) {
      const note = hzToNote(pitchHz);
      console.log(`${note.name}${note.octave} ${note.cents.toFixed(0)}¢`);
    }
  },
});
// From an AudioWorklet / ScriptProcessor callback:
stream.push(channelData);
The stream computes features incrementally, so the cost is proportional to
the new audio only. It emits one frame per 16 ms of input, about 300 ms behind
real time at the default hopsPerInference: 8. Frames are finalized only once
the model's full receptive field is available, so streaming output is
numerically identical to batch detect(). Call await stream.flush() after the
last chunk to emit the remaining tail frames. Streaming works for clips of
any length.
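To try the streaming path without a microphone, a recorded buffer can be fed through push() in small chunks. The sketch below shows one way to do that. `pushInChunks` and the `ChunkSink` type are hypothetical and not part of fnaught; the sketch assumes push() accepts a Float32Array of any length and can be called without awaiting, as in the snippet above.

```typescript
// Minimal structural type for the stream; only push() is taken from the README.
interface ChunkSink {
  push(chunk: Float32Array): void;
}

// Hypothetical helper: feed a whole buffer to a stream in fixed-size chunks.
// The default of 128 matches the Web Audio render quantum used by AudioWorklet.
function pushInChunks(stream: ChunkSink, audio: Float32Array, chunkSize = 128): number {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  let chunks = 0;
  for (let start = 0; start < audio.length; start += chunkSize) {
    // subarray() returns a view on the same memory, so nothing is copied here.
    stream.push(audio.subarray(start, Math.min(start + chunkSize, audio.length)));
    chunks++;
  }
  return chunks; // the final chunk may be shorter than chunkSize
}
```

After the helper returns, call await stream.flush() so that the tail frames are emitted. If the stream turns out to retain references to pushed chunks, pass copies (audio.slice) instead of views.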
Model loading
The ONNX model (428 KB) ships inside the package.
- Node: loaded from the package directory; works offline.
- Browser: fetched by default from jsDelivr, which mirrors the exact file
published to npm:
https://cdn.jsdelivr.net/npm/fnaught@<version>/model/fnaught.onnx
To self-host, pass your own source:
const detector = await PitchDetector.create({
  model: "/assets/fnaught.onnx", // or an ArrayBuffer / Uint8Array
});
If your bundler does not resolve the onnxruntime-web WASM binaries, point
wasmPaths at a CDN copy:
const detector = await PitchDetector.create({
  wasmPaths: "https://cdn.jsdelivr.net/npm/onnxruntime-web@<version>/dist/",
});
API
| Export | Description |
| --- | --- |
| PitchDetector.create(options?) | Load the model and create a detector. |
| detector.detect(audio, options?) | Batch pitch detection over a buffer. |
| detector.createStream(options?) | Streaming detector for realtime input. |
| hzToNote(hz, a4?) | Frequency → { name, octave, midi, cents }. |
| resampleLinear(audio, from, to) | Simple linear resampler. |
| computeFeatures / decode | Low-level DSP, exposed for advanced use. |
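For readers unfamiliar with the conversion behind hzToNote, the standard equal-temperament mapping is midi = 69 + 12·log2(hz / a4), with cents measured from the nearest semitone. The sketch below illustrates that formula. `hzToNoteSketch` is not the library's implementation, and details such as sharp-based note names and the handling of invalid input are assumptions.

```typescript
interface NoteInfo {
  name: string;   // pitch class, spelled with sharps in this sketch
  octave: number; // scientific pitch notation, so A4 = 440 Hz by default
  midi: number;   // nearest MIDI note number
  cents: number;  // signed deviation from that note, in (-50, 50]
}

const NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"];

// Illustrative equal-temperament conversion; fnaught's hzToNote may differ in detail.
function hzToNoteSketch(hz: number, a4 = 440): NoteInfo {
  if (!(hz > 0) || !(a4 > 0)) throw new RangeError("hz and a4 must be positive");
  const exactMidi = 69 + 12 * Math.log2(hz / a4); // fractional MIDI number
  const midi = Math.round(exactMidi);
  const cents = (exactMidi - midi) * 100;
  const name = NOTE_NAMES[((midi % 12) + 12) % 12]; // safe for negative MIDI numbers
  const octave = Math.floor(midi / 12) - 1;         // MIDI 60 is C4
  return { name, octave, midi, cents };
}
```

A tuner UI would typically show name and octave as the target note and use cents to drive the needle; passing a different a4 (for example 442) shifts the whole reference grid.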
Model
The bundled model is a 106k-parameter CNN trained on public pitch datasets (MIR-1K, MDB-stem-synth, PTDB-TUG, and synthetic speech) with noise and pitch-shift augmentation. Evaluation details, training code, and benchmark scripts are available in the project repository. The model card is published at https://huggingface.co/mnaoizyyy/fnaught.
Performance notes (CPU, single thread): ~70 ms to process 2 s of audio (≈30× real time), suitable for realtime use on modest hardware.
License
MIT (code). The model weights are released under the same terms; they were trained on publicly available research datasets — see the model card for dataset attributions.
