fnaught

v0.1.3

Published

7 days ago

Lightweight monophonic pitch (f0) detection for Node and the browser. 106k-parameter ONNX model with a dual-resolution STFT + instantaneous-frequency frontend.

Downloads

596


mnaoizy

pitch pitch-detection f0 fundamental-frequency audio music speech onnx tuner

Lightweight monophonic pitch (f0) detection for Node.js and the browser.

fnaught (“f-naught”, as f₀ is read aloud) estimates the fundamental frequency of speech, singing, and monophonic instruments in real time. It runs a small (106k-parameter, 428 KB) ONNX model through onnxruntime-web, so the same package works in Node and in the browser with no native dependencies. Audio never leaves the process.

16 kHz analysis, one pitch estimate every 16 ms, range 46.9–2093.8 Hz
Dual-resolution STFT (1024/4096) + instantaneous-frequency frontend, designed for accuracy on low-pitched voices
Batch and streaming APIs, plus note-name/cents utilities for tuner UIs
The TypeScript DSP frontend is verified in CI to match the PyTorch training implementation numerically
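The timing figures in the list above can be sanity-checked with a little arithmetic. This is a minimal sketch, not the library's code: the names are hypothetical, and it assumes frame i is stamped at i × hop with no edge padding, which may differ from the package's actual alignment.

```typescript
// Frame timing implied by "16 kHz analysis, one pitch estimate every 16 ms".
const ANALYSIS_RATE_HZ = 16000;
const HOP_MS = 16;

// 16 ms at 16 kHz is 256 samples between consecutive estimates.
const hopSamples = (ANALYSIS_RATE_HZ * HOP_MS) / 1000;

// Assumption: frame i is stamped at i * hop. The library's real alignment
// (centering, padding) is not documented here and may differ.
function approxFrameTime(frameIndex: number): number {
  return (frameIndex * hopSamples) / ANALYSIS_RATE_HZ;
}

// Roughly how many estimates to expect for a clip, ignoring edge effects.
function approxFrameCount(durationSeconds: number): number {
  return Math.floor((durationSeconds * 1000) / HOP_MS);
}
```

Under these assumptions a 2 s clip yields about 125 estimates, which is a useful size to keep in mind when preallocating buffers or plotting results.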

Install

npm install fnaught

Quick start

import { PitchDetector } from "fnaught";

const detector = await PitchDetector.create();

// audio: Float32Array of mono samples in [-1, 1]
const { pitchHz, confidence, timestamps } = await detector.detect(audio, {
  sampleRate: 44100, // resampled to 16 kHz internally
});

for (let i = 0; i < pitchHz.length; i++) {
  if (confidence[i] > 0.9) {
    console.log(`${timestamps[i].toFixed(2)}s  ${pitchHz[i].toFixed(1)} Hz`);
  }
}

pitchHz is defined for every frame; use confidence (0–1) to gate voiced frames. A threshold around 0.9 works well.
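Gating by confidence often goes one step further and groups consecutive voiced frames into segments. The helper below is a sketch of that idea; `voicedSegments` and `VoicedSegment` are hypothetical names that are not part of fnaught, and the arithmetic mean of the pitch is just one reasonable summary.

```typescript
interface VoicedSegment {
  start: number;  // timestamp of the first voiced frame (seconds)
  end: number;    // timestamp of the last voiced frame (seconds)
  meanHz: number; // arithmetic mean pitch over the segment
}

// Group consecutive frames whose confidence exceeds the threshold.
function voicedSegments(
  pitchHz: ArrayLike<number>,
  confidence: ArrayLike<number>,
  timestamps: ArrayLike<number>,
  threshold = 0.9,
): VoicedSegment[] {
  const segments: VoicedSegment[] = [];
  let startIdx = -1; // -1 means "not currently inside a voiced run"
  let sum = 0;
  // Iterate one step past the end so a trailing run is closed too.
  for (let i = 0; i <= pitchHz.length; i++) {
    const voiced = i < pitchHz.length && confidence[i] > threshold;
    if (voiced) {
      if (startIdx < 0) {
        startIdx = i;
        sum = 0;
      }
      sum += pitchHz[i];
    } else if (startIdx >= 0) {
      segments.push({
        start: timestamps[startIdx],
        end: timestamps[i - 1],
        meanHz: sum / (i - startIdx),
      });
      startIdx = -1;
    }
  }
  return segments;
}
```

The three arrays returned by detect() can be passed straight in, since typed arrays satisfy `ArrayLike<number>`.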

Streaming (microphone)

import { PitchDetector, hzToNote } from "fnaught";

const detector = await PitchDetector.create();
const stream = detector.createStream({
  sampleRate: audioContext.sampleRate,
  onFrame: ({ pitchHz, confidence, time }) => {
    if (confidence > 0.9) {
      const note = hzToNote(pitchHz);
      console.log(`${note.name}${note.octave} ${note.cents.toFixed(0)}¢`);
    }
  },
});

// From an AudioWorklet / ScriptProcessor callback:
stream.push(channelData);

The stream computes features incrementally, so its cost is proportional to the new audio only, and it emits one frame per 16 ms of input. At the default hopsPerInference: 8, output runs about 300 ms behind real time, because a frame is finalized only once the model's full receptive field is available. As a result, streaming output is numerically identical to batch detect(). After the last chunk, call await stream.flush() to emit the remaining tail frames. Streams work for clips of any length.
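For testing a streaming pipeline without a microphone, a recorded buffer can be fed through in small chunks, followed by a flush. The sketch below assumes only the push() and flush() methods shown above; `PitchStreamLike`, `feedInChunks`, and the 128-sample default chunk size are hypothetical, and push() is called without awaiting, as in the example above.

```typescript
// Minimal shape of the stream as used in this README.
// Assumption: push() needs no awaiting, flush() returns a promise.
interface PitchStreamLike {
  push(chunk: Float32Array): void;
  flush(): Promise<void>;
}

// Feed a whole buffer in fixed-size chunks, then flush the tail frames.
// Returns the number of chunks pushed.
async function feedInChunks(
  stream: PitchStreamLike,
  audio: Float32Array,
  chunkSize = 128,
): Promise<number> {
  let chunks = 0;
  for (let offset = 0; offset < audio.length; offset += chunkSize) {
    // subarray() returns a view on the same memory (no copy) and
    // clamps the end index, so the last chunk may be shorter.
    stream.push(audio.subarray(offset, offset + chunkSize));
    chunks++;
  }
  await stream.flush();
  return chunks;
}
```

Because the chunks are views on the original buffer, this is only safe if the stream does not retain references to pushed chunks after push() returns; copy with slice() if that is in doubt.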

Model loading

The ONNX model (428 KB) ships inside the package.

Node: loaded from the package directory; works offline.
Browser: fetched by default from jsDelivr, which mirrors the exact file published to npm: https://cdn.jsdelivr.net/npm/fnaught@<version>/model/fnaught.onnx

To self-host, pass your own source:

const detector = await PitchDetector.create({
  model: "/assets/fnaught.onnx", // or an ArrayBuffer / Uint8Array
});

If your bundler does not resolve the onnxruntime-web WASM binaries, point to a CDN copy:

const detector = await PitchDetector.create({
  wasmPaths: "https://cdn.jsdelivr.net/npm/onnxruntime-web@<version>/dist/", // use your installed onnxruntime-web version
});

API

| Export | Description |
| --- | --- |
| PitchDetector.create(options?) | Load the model and create a detector. |
| detector.detect(audio, options?) | Batch pitch detection over a buffer. |
| detector.createStream(options?) | Streaming detector for realtime input. |
| hzToNote(hz, a4?) | Frequency → { name, octave, midi, cents }. |
| resampleLinear(audio, from, to) | Simple linear resampler. |
| computeFeatures / decode | Low-level DSP, exposed for advanced use. |
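To make the hzToNote return shape concrete, here is a sketch of the standard 12-tone equal-temperament conversion that such a function typically performs. It is not the package's implementation: `hzToNoteSketch` is a hypothetical name, and the sharp-only note spelling, the rounding to the nearest semitone, and the MIDI octave convention (A4 = 69) are assumptions.

```typescript
const NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"];

interface NoteInfo {
  name: string;   // pitch class, spelled with sharps
  octave: number; // scientific pitch notation, so A4 = 440 Hz by default
  midi: number;   // nearest MIDI note number
  cents: number;  // deviation from that note, in (-50, 50]
}

function hzToNoteSketch(hz: number, a4 = 440): NoteInfo {
  // Fractional MIDI number: 12 semitones per doubling, anchored at A4 = 69.
  const exactMidi = 69 + 12 * Math.log2(hz / a4);
  const midi = Math.round(exactMidi);
  // One semitone is 100 cents.
  const cents = (exactMidi - midi) * 100;
  const name = NOTE_NAMES[((midi % 12) + 12) % 12] as string;
  const octave = Math.floor(midi / 12) - 1;
  return { name, octave, midi, cents };
}
```

For a tuner UI, the sign of `cents` tells the player whether the note is sharp (positive) or flat (negative) relative to the nearest semitone.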

Model

The bundled model is a 106k-parameter CNN trained on public pitch datasets (MIR-1K, MDB-stem-synth, PTDB-TUG, and synthetic speech) with noise and pitch-shift augmentation. Evaluation details, training code, and benchmark scripts are available in the project repository. The model card is published at https://huggingface.co/mnaoizyyy/fnaught.

Performance notes (CPU, single thread): ~70 ms to process 2 s of audio (≈30× real time), suitable for realtime use on modest hardware.
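The real-time factor quoted above follows from simple arithmetic, sketched below with hypothetical helper names. With the stated figures, 2 s of audio in about 70 ms works out to roughly 28.6× real time, or about 0.56 ms of compute per 16 ms frame; actual numbers will depend on the hardware.

```typescript
// How many seconds of audio are processed per second of compute.
function realTimeFactor(audioSeconds: number, processingMs: number): number {
  return (audioSeconds * 1000) / processingMs;
}

// Average compute cost per output frame, given one frame per hopMs of audio.
function perFrameCostMs(audioSeconds: number, processingMs: number, hopMs = 16): number {
  const frames = (audioSeconds * 1000) / hopMs;
  return processingMs / frames;
}
```

A per-frame cost well below the 16 ms hop is what leaves headroom for realtime use.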

License

MIT (code). The model weights are released under the same terms; they were trained on publicly available research datasets — see the model card for dataset attributions.
