pitch-detection
v1.0.0
Pitch, chroma, chord and key detection: YIN, McLeod, pYIN, HPS, cepstrum, SWIPE, autocorrelation, AMDF, NNLS chroma, chord templates, Krumhansl-Schmuckler
Pitch
- YIN — cumulative mean normalized difference
- McLeod — normalized square difference (MPM)
- pYIN — probabilistic YIN with Beta prior
- Autocorrelation — normalized autocorrelation
- AMDF — average magnitude difference
Spectral pitch
- HPS — harmonic product spectrum
- Cepstrum — real cepstrum peak picking
- SWIPE — sawtooth waveform inspired estimator
Harmony
- Chroma — PCP / NNLS pitch-class profiles
- Chord — template matching + Viterbi smoothing
- Key — Krumhansl-Schmuckler key finding
Install
npm install pitch-detection
Usage
import { yin, mcleod, chroma, chord, key } from 'pitch-detection'
let fs = 44100
let frame = new Float32Array(2048) // fill from your audio source
// pitch
let result = yin(frame, { fs })
// → { freq: 440.1, clarity: 0.97 } or null
// chroma → chord → key
let c = chroma(frame, { fs, method: 'nnls' })
let ch = chord(c)
// → { root: 0, quality: 'maj', label: 'C', confidence: 0.92 }
let k = key(c)
// → { tonic: 0, mode: 'major', label: 'C', confidence: 0.85, scores: [...] }
Works in Node.js and browser. No Web Audio API needed — operates on raw Float32Array samples.
Sliding windows — call repeatedly as new samples arrive:
let hop = 512
for (let i = 0; i + 2048 <= samples.length; i += hop) {
  let frame = samples.subarray(i, i + 2048)
  let result = yin(frame, { fs })
  if (result) console.log(i / fs, result.freq.toFixed(1))
}
Full pipeline (chroma → chord → key) on a sequence of frames:
import { chroma, chord, smoothChords, key } from 'pitch-detection'
let frames = []
for (let i = 0; i + 4096 <= samples.length; i += 2048) {
  frames.push(chroma(samples.subarray(i, i + 4096), { fs, method: 'nnls' }))
}
let chords = smoothChords(frames, { selfProb: 0.5 })
// → [{ root: 0, quality: 'maj', label: 'C' }, ...]
let k = key(frames)
// → { tonic: 0, mode: 'major', label: 'C', confidence: 0.85, scores: [...] }
API
All pitch algorithms return { freq, clarity } | null:
- freq — fundamental frequency in Hz
- clarity — algorithm-specific confidence in [0, 1]
- null — no periodic structure found (silence, noise, polyphony)
Time-domain algorithms (YIN, McLeod, pYIN, autocorrelation, AMDF) accept any buffer length. Spectral algorithms (HPS, cepstrum, SWIPE, chroma) require power-of-2 length.
YIN
de Cheveigné & Kawahara, 2002. A widely cited and widely tested reference algorithm for monophonic pitch estimation.
import yin from 'pitch-detection/yin.js'
let result = yin(samples, { fs: 44100 })
Steps:
- Difference function — $d(\tau) = \sum_{j=1}^{W} (x_j - x_{j+\tau})^2$ for lags $\tau = 1 \ldots W/2$
- Cumulative mean normalized difference — $d'(\tau) = d(\tau) \cdot \tau \,/\, \sum_{j=1}^{\tau} d(j)$, with $d'(0) = 1$
- Absolute threshold — find the first $\tau$ where $d'(\tau) < \text{threshold}$, then descend to the local minimum
- Parabolic interpolation — sub-sample period using neighbors of the chosen $\tau$
- Output — $f_0 = f_s / \tau'$, $\text{clarity} = 1 - d'(\tau)$
| Param | Default | Description |
|---|---|---|
| fs | 44100 | Sample rate (Hz) |
| threshold | 0.15 | CMND threshold — lower = stricter, fewer detections |
Use when: General-purpose monophonic pitch tracking: speech, singing, solo instruments. The most reliable choice when in doubt.
Not for: Polyphonic audio (returns the dominant pitch or null), real-time use with hard latency budgets (needs a full window).
Ref: de Cheveigné & Kawahara, "YIN, a fundamental frequency estimator for speech and music", JASA 2002.
Complexity: $O(N^2)$, about $N^2/4$ operations (two nested loops, each over half the window).
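The YIN steps listed above can be sketched in plain JavaScript. This is a minimal illustration under stated assumptions, not the package's source: the name `yinSketch`, the positional arguments, and the choice of an integration window of half the buffer are all made up for the example.

```javascript
// Hypothetical helper illustrating the YIN steps; not the library's code.
function yinSketch(x, fs = 44100, threshold = 0.15) {
  const W = x.length >> 1 // integration window; lags run below W
  // Step 1: difference function d(tau)
  const d = new Float64Array(W)
  for (let tau = 1; tau < W; tau++) {
    let sum = 0
    for (let j = 0; j < W; j++) {
      const diff = x[j] - x[j + tau]
      sum += diff * diff
    }
    d[tau] = sum
  }
  // Step 2: cumulative mean normalized difference d'(tau), with d'(0) = 1
  const cmnd = new Float64Array(W)
  cmnd[0] = 1
  let running = 0
  for (let tau = 1; tau < W; tau++) {
    running += d[tau]
    cmnd[tau] = running > 0 ? (d[tau] * tau) / running : 1
  }
  // Step 3: first lag below the threshold, then descend to the local minimum
  let tau = 1
  while (tau < W - 1 && cmnd[tau] >= threshold) tau++
  if (tau >= W - 1) return null
  while (tau + 1 < W - 1 && cmnd[tau + 1] < cmnd[tau]) tau++
  // Step 4: parabolic interpolation around the chosen lag
  const a = cmnd[tau - 1], b = cmnd[tau], c = cmnd[tau + 1]
  const denom = a - 2 * b + c
  const shift = denom > 0 ? (0.5 * (a - c)) / denom : 0
  // Step 5: output
  return { freq: fs / (tau + shift), clarity: 1 - b }
}
```

On a clean sine the estimate should land very close to the true frequency; an all-zero buffer never dips below the threshold and yields null.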
McLeod
McLeod & Wyvill, 2005. Normalized square difference with smarter peak picking. Handles smaller windows — good for vibrato and fast pitch changes.
import mcleod from 'pitch-detection/mcleod.js'
let result = mcleod(samples, { fs: 44100 })
Steps:
- NSDF — $\text{NSDF}(\tau) = 2 \sum_j x_j x_{j+\tau} \;/\; \bigl(\sum_j x_j^2 + \sum_j x_{j+\tau}^2\bigr)$, ranges $[-1, 1]$
- Positive-region peaks — collect the local maximum in each positive run that follows a negative region (skipping the self-correlation region at $\tau = 0$)
- Threshold — pick the first peak $\geq k \cdot \max(\text{peaks})$ (default $k = 0.9$)
- Parabolic interpolation — sub-sample the peak
- Output — $f_0 = f_s / \tau'$, $\text{clarity} = \text{NSDF}(\tau)$
| Param | Default | Description |
|---|---|---|
| fs | 44100 | Sample rate (Hz) |
| threshold | 0.9 | Peak selection threshold as fraction of global max |
Use when: Vibrato tracking, small hop sizes, singing voice where YIN occasionally double-triggers.
Not for: Highly noisy signals (NSDF is less thresholded than YIN's CMND).
Ref: McLeod & Wyvill, "A smarter way to find pitch", ICMC 2005.
Complexity: $O(N^2)$, about $N^2/4$ operations, the same asymptotic cost as YIN.
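A rough sketch of the McLeod steps, written for illustration only. The function name `mcleodSketch`, its positional arguments, and the way truncated runs at the end of the lag range are skipped are assumptions, not details confirmed by the package.

```javascript
// Hypothetical helper illustrating the McLeod (MPM) steps; not the library's code.
function mcleodSketch(x, fs = 44100, k = 0.9) {
  const N = x.length
  const maxLag = N >> 1
  // Step 1: normalized square difference function, values in [-1, 1]
  const nsdf = new Float64Array(maxLag)
  for (let tau = 0; tau < maxLag; tau++) {
    let acf = 0, energy = 0
    for (let j = 0; j + tau < N; j++) {
      acf += x[j] * x[j + tau]
      energy += x[j] * x[j] + x[j + tau] * x[j + tau]
    }
    nsdf[tau] = energy > 0 ? (2 * acf) / energy : 0
  }
  // Step 2: one peak per positive run, skipping the lobe around tau = 0
  const peaks = []
  let tau = 0
  while (tau < maxLag && nsdf[tau] > 0) tau++
  while (tau < maxLag) {
    while (tau < maxLag && nsdf[tau] <= 0) tau++
    let best = -1
    while (tau < maxLag && nsdf[tau] > 0) {
      if (best < 0 || nsdf[tau] > nsdf[best]) best = tau
      tau++
    }
    // keep only peaks that have both neighbours inside the lag range
    if (best > 0 && best < maxLag - 1) peaks.push(best)
  }
  if (peaks.length === 0) return null
  // Step 3: first peak at or above k times the highest peak
  const top = Math.max(...peaks.map(p => nsdf[p]))
  const pick = peaks.find(p => nsdf[p] >= k * top)
  // Step 4: parabolic interpolation of the maximum
  const a = nsdf[pick - 1], b = nsdf[pick], c = nsdf[pick + 1]
  const denom = a - 2 * b + c
  const shift = denom < 0 ? (0.5 * (a - c)) / denom : 0
  // Step 5: output
  return { freq: fs / (pick + shift), clarity: b }
}
```

Picking the first sufficiently high peak, rather than the global maximum, is what steers the estimate away from multiples of the true period.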
pYIN
Mauch & Dixon, 2014. Probabilistic YIN — runs YIN at multiple thresholds weighted by a Beta(2, 18) prior, producing a distribution over candidate pitches instead of a single hard pick. More robust than YIN on ambiguous frames.
import pyin from 'pitch-detection/pyin.js'
let result = pyin(samples, { fs: 44100 })
// → { freq: 440.1, clarity: 0.92, candidates: [{ freq: 440.1, prob: 0.85 }, ...] }
Steps:
- CMND — same cumulative mean normalized difference as YIN
- Multi-threshold sweep — for thresholds [0.05, 0.10, …, 0.50], find the first τ where CMND dips below the threshold, then descend to local minimum
- Beta weighting — each threshold's contribution is weighted by Beta(2, 18) pdf, concentrating mass on low thresholds (strict picks)
- Aggregation — probability mass is accumulated per τ; voicing probability = $\max(0, \min(1, 1 - \text{CMND}(\tau)))$
- Parabolic interpolation — sub-sample the most probable τ
- Output — { freq, clarity, candidates } where candidates is the full posterior sorted by probability
| Param | Default | Description |
|---|---|---|
| fs | 44100 | Sample rate (Hz) |
| minFreq | 50 | Minimum detectable frequency (Hz) |
| maxFreq | 2000 | Maximum detectable frequency (Hz) |
Use when: Ambiguous pitched content — breathy vocals, noisy recordings, or when you need a pitch posterior for downstream HMM tracking.
Not for: Clean signals where YIN already works well (pYIN is ~10× slower due to multi-threshold sweep).
Ref: Mauch & Dixon, "pYIN: A Fundamental Frequency Estimator Using Probabilistic Threshold Distributions", ICASSP 2014.
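The multi-threshold sweep and Beta weighting can be illustrated on a precomputed CMND array. The sketch below is an assumption-laden simplification: `pyinCandidates` is a hypothetical name, it returns lags rather than frequencies, and it leaves out interpolation and the voicing probability. The Beta(2, 18) density is $342\,t\,(1-t)^{17}$, since $B(2, 18) = 1/342$.

```javascript
// Beta(2, 18) probability density: 342 * t * (1 - t)^17
const betaPdf = t => 342 * t * Math.pow(1 - t, 17)

// Hypothetical helper: distributes probability mass over candidate lags.
// `cmnd` is a cumulative mean normalized difference array (as in YIN).
function pyinCandidates(cmnd) {
  const thresholds = []
  for (let i = 1; i <= 10; i++) thresholds.push(i * 0.05) // 0.05 ... 0.50
  const weights = thresholds.map(betaPdf)
  const total = weights.reduce((a, b) => a + b, 0)
  const mass = new Map()
  thresholds.forEach((th, i) => {
    // same pick rule as YIN: first dip below the threshold, then descend
    let tau = 1
    while (tau < cmnd.length - 1 && cmnd[tau] >= th) tau++
    if (tau >= cmnd.length - 1) return // nothing below this threshold
    while (tau + 1 < cmnd.length - 1 && cmnd[tau + 1] < cmnd[tau]) tau++
    mass.set(tau, (mass.get(tau) || 0) + weights[i] / total)
  })
  // candidate lags sorted by accumulated probability, highest first
  return [...mass]
    .map(([tau, prob]) => ({ tau, prob }))
    .sort((a, b) => b.prob - a.prob)
}
```

Because the Beta(2, 18) density peaks near 0.06, lags that only strict thresholds can reach collect most of the mass, while lags found only by loose thresholds stay as low-probability alternatives.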
Autocorrelation
Normalized autocorrelation — the simplest pitch estimator. Educational baseline.
import autocorrelation from 'pitch-detection/autocorrelation.js'
let result = autocorrelation(samples, { fs: 44100 })
Steps:
- Autocorrelation — $r(\tau) = \sum_{j=0}^{W-\tau-1} x_j x_{j+\tau}$ for $\tau = 0 \ldots W/2$
- Normalize — divide by $r(0)$ so $r(0) = 1$
- Peak pick — descend past the initial region, climb to the first peak above threshold
- Parabolic interpolation — sub-sample the peak using neighbors of $\tau$
- Output — $f_0 = f_s / \tau'$, $\text{clarity} = r(\tau)$
| Param | Default | Description |
|---|---|---|
| fs | 44100 | Sample rate (Hz) |
| threshold | 0.5 | Minimum normalized autocorrelation value to accept |
Use when: Learning, quick prototypes, signals with strong dominant periodicity and low noise.
Not for: Production use, because octave errors are common without additional heuristics.
Ref: Rabiner, "Use of autocorrelation analysis for pitch detection", IEEE TASSP 1977.
Complexity: $O(N^2)$, about $N^2/4$ operations.
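For readers using this estimator as a learning baseline, here is a small sketch of the steps above. The name `autocorrSketch` and its positional arguments are assumptions; the package's actual peak-picking details may differ.

```javascript
// Hypothetical helper illustrating normalized autocorrelation pitch picking.
function autocorrSketch(x, fs = 44100, threshold = 0.5) {
  const W = x.length
  const maxLag = W >> 1
  // Step 1: raw autocorrelation r(tau) for tau = 0 ... W/2
  const r = new Float64Array(maxLag + 1)
  for (let tau = 0; tau <= maxLag; tau++) {
    let sum = 0
    for (let j = 0; j < W - tau; j++) sum += x[j] * x[j + tau]
    r[tau] = sum
  }
  // Step 2: normalize so that r(0) = 1
  const r0 = r[0]
  if (r0 <= 0) return null
  for (let tau = 0; tau <= maxLag; tau++) r[tau] /= r0
  // Step 3: walk down from the tau = 0 lobe, then take the first
  // local maximum that reaches the threshold
  let tau = 1
  while (tau < maxLag && r[tau] < r[tau - 1]) tau++
  for (; tau < maxLag; tau++) {
    if (r[tau] >= threshold && r[tau] >= r[tau - 1] && r[tau] > r[tau + 1]) {
      // Step 4: parabolic interpolation of the maximum
      const a = r[tau - 1], b = r[tau], c = r[tau + 1]
      const denom = a - 2 * b + c
      const shift = denom < 0 ? (0.5 * (a - c)) / denom : 0
      // Step 5: output
      return { freq: fs / (tau + shift), clarity: b }
    }
  }
  return null
}
```

Because the sum shrinks as the lag grows, the peaks of this unbiased-less form are tapered and pulled slightly toward shorter lags, so expect a small upward bias in the frequency estimate.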
AMDF
Ross et al., 1974. Average Magnitude Difference Function — the classical predecessor to YIN. Measures average absolute difference between a signal and its delayed copy; minima indicate periodicity.
import amdf from 'pitch-detection/amdf.js'
let result = amdf(samples, { fs: 44100 })
Steps:
- AMDF — $d(\tau) = \frac{1}{N - \tau} \sum_{i=0}^{N-\tau-1} |x_i - x_{i+\tau}|$ for valid lag range
- Normalize — divide by max so threshold is scale-invariant
- First minimum — find the first local minimum below threshold
- Parabolic interpolation — sub-sample the minimum
- Output — $f_0 = f_s / \tau'$, $\text{clarity} = 1 - d(\tau)$
| Param | Default | Description |
|---|---|---|
| fs | 44100 | Sample rate (Hz) |
| minFreq | 50 | Minimum detectable frequency (Hz) |
| maxFreq | 2000 | Maximum detectable frequency (Hz) |
| threshold | 0.3 | Normalized AMDF dip threshold |
Use when: Low-complexity environments, embedded systems. Simpler and cheaper than YIN (no squaring, no cumulative normalization).
Not for: Noisy signals, since it lacks YIN's cumulative normalization that suppresses octave errors.
Ref: Ross et al., "Average magnitude difference function pitch extractor", IEEE TASSP 1974.
Complexity: $O(N^2)$, about $N^2/4$ operations.
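The AMDF steps above map onto a short loop. The following is only a sketch: `amdfSketch` is a hypothetical name, and the way the lag range is derived from minFreq and maxFreq is an assumption.

```javascript
// Hypothetical helper illustrating the AMDF steps; not the library's code.
function amdfSketch(x, fs = 44100, { minFreq = 50, maxFreq = 2000, threshold = 0.3 } = {}) {
  const N = x.length
  const minLag = Math.max(1, Math.floor(fs / maxFreq))
  const maxLag = Math.min(N - 2, Math.ceil(fs / minFreq))
  // Step 1: average magnitude difference over the valid lag range
  const d = new Float64Array(maxLag + 1)
  let peak = 0
  for (let tau = minLag; tau <= maxLag; tau++) {
    let sum = 0
    for (let i = 0; i < N - tau; i++) sum += Math.abs(x[i] - x[i + tau])
    d[tau] = sum / (N - tau)
    if (d[tau] > peak) peak = d[tau]
  }
  if (peak === 0) return null // silent or constant input
  // Step 2: normalize by the maximum so the threshold is scale-invariant
  for (let tau = minLag; tau <= maxLag; tau++) d[tau] /= peak
  // Step 3: first local minimum below the threshold
  for (let tau = minLag + 1; tau < maxLag; tau++) {
    if (d[tau] < threshold && d[tau] <= d[tau - 1] && d[tau] < d[tau + 1]) {
      // Step 4: parabolic interpolation of the minimum
      const a = d[tau - 1], b = d[tau], c = d[tau + 1]
      const denom = a - 2 * b + c
      const shift = denom > 0 ? (0.5 * (a - c)) / denom : 0
      // Step 5: output
      return { freq: fs / (tau + shift), clarity: 1 - b }
    }
  }
  return null
}
```

AMDF minima are V-shaped rather than parabolic for sinusoidal input, so the interpolation step is only approximate here.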
HPS
Schroeder, 1968. Harmonic Product Spectrum — multiplies the spectrum by its downsampled copies so that harmonic peaks align at the fundamental. Robust to the missing-fundamental problem.
import hps from 'pitch-detection/hps.js'
let result = hps(samples, { fs: 44100 })
Steps:
- FFT — magnitude spectrum via rfft
- Log-spaced candidates — at configurable cent resolution (default 10 cents)
- Interpolated harmonic product — $H(f_0) = \sum_{h=1}^{K} \log \hat{X}(h \cdot f_0)$ with linear interpolation on the magnitude spectrum to avoid bin-alignment bias
- Parabolic interpolation — in log-frequency for sub-cent accuracy
- Clarity — ratio of peak height to second non-adjacent local maximum
| Param | Default | Description |
|---|---|---|
| fs | 44100 | Sample rate (Hz) |
| harmonics | 5 | Number of harmonic products |
| minFreq | 50 | Minimum detectable frequency (Hz) |
| maxFreq | 4000 | Maximum detectable frequency (Hz) |
| cents | 10 | Candidate spacing in cents |
| threshold | 0.1 | Minimum clarity to accept |
Use when: Harmonic-rich signals (guitar, piano, brass). Naturally handles missing fundamentals.
Not for: Pure sinusoids (only one harmonic), very noisy signals.
Ref: Schroeder, "Period histogram and product spectrum", JASA 1968.
Requires: Power-of-2 window length.
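The candidate search at the heart of HPS can be sketched as follows. To stay self-contained, the example computes the magnitude spectrum with a naive $O(N^2)$ DFT standing in for rfft; `magnitudeSpectrum` and `hpsSketch` are hypothetical names, and the parabolic refinement and clarity steps are left out.

```javascript
// Naive DFT magnitude for bins 0 ... N/2 (slow stand-in for a real FFT).
function magnitudeSpectrum(x) {
  const N = x.length, half = N >> 1
  const mag = new Float64Array(half + 1)
  for (let k = 0; k <= half; k++) {
    let re = 0, im = 0
    for (let n = 0; n < N; n++) {
      const ph = (2 * Math.PI * k * n) / N
      re += x[n] * Math.cos(ph)
      im -= x[n] * Math.sin(ph)
    }
    mag[k] = Math.hypot(re, im)
  }
  return mag
}

// Hypothetical helper: returns the best log-spaced f0 candidate in Hz.
function hpsSketch(x, fs = 44100, { harmonics = 5, minFreq = 50, maxFreq = 4000, cents = 10 } = {}) {
  const mag = magnitudeSpectrum(x)
  const binHz = fs / x.length
  // linear interpolation on the magnitude spectrum at an arbitrary frequency
  const at = f => {
    const p = f / binHz
    const i = Math.floor(p)
    if (i + 1 >= mag.length) return 0
    return mag[i] + (p - i) * (mag[i + 1] - mag[i])
  }
  const steps = Math.floor((1200 * Math.log2(maxFreq / minFreq)) / cents)
  let bestF = 0, bestScore = -Infinity
  for (let s = 0; s <= steps; s++) {
    const f0 = minFreq * Math.pow(2, (s * cents) / 1200)
    // sum of log magnitudes equals the log of the harmonic product
    let score = 0
    for (let h = 1; h <= harmonics; h++) score += Math.log(at(h * f0) + 1e-12)
    if (score > bestScore) { bestScore = score; bestF = f0 }
  }
  return bestF
}
```

Summing logs instead of multiplying magnitudes avoids overflow and underflow when many harmonics are combined.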
Cepstrum
Noll, 1967. Real cepstrum — $c(\tau) = \text{IFFT}(\log |\text{FFT}(x)|)$. A peak at quefrency $\tau$ corresponds to period $\tau$ in the time domain.
import cepstrum from 'pitch-detection/cepstrum.js'
let result = cepstrum(samples, { fs: 44100 })
Steps:
- FFT — complex FFT via fft
- Log magnitude — $\log(\sqrt{\text{re}^2 + \text{im}^2} + \epsilon)$
- IFFT — real cepstrum via ifft
- Peak pick — largest peak in valid quefrency range $[f_s / f_{\max}, f_s / f_{\min}]$
- Parabolic interpolation — sub-sample the peak
- Clarity — ratio of peak height to second-highest local maximum
| Param | Default | Description |
|---|---|---|
| fs | 44100 | Sample rate (Hz) |
| minFreq | 50 | Minimum detectable frequency (Hz) |
| maxFreq | 2000 | Maximum detectable frequency (Hz) |
| threshold | 0.3 | Minimum clarity to accept |
Use when: Harmonic signals where you want a clean spectral-domain method. Good pedagogical complement to time-domain algorithms.
Not for: Low-pitched signals (quefrency resolution is limited by window length).
Ref: Noll, "Cepstrum pitch determination", JASA 1967.
Requires: Power-of-2 window length.
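A compact sketch of the cepstrum steps, using a naive DFT in place of fft and ifft so that it runs without dependencies. `cepstrumSketch` is a hypothetical name, an even buffer length is assumed, and the clarity computation is omitted.

```javascript
// Hypothetical helper illustrating real-cepstrum pitch picking.
function cepstrumSketch(x, fs = 44100, { minFreq = 50, maxFreq = 2000 } = {}) {
  const N = x.length
  if (!x.some(v => v !== 0)) return null // silent input
  const half = N >> 1
  // Steps 1 and 2: log magnitude spectrum (naive DFT, mirrored by symmetry)
  const logMag = new Float64Array(N)
  for (let k = 0; k <= half; k++) {
    let re = 0, im = 0
    for (let n = 0; n < N; n++) {
      const ph = (2 * Math.PI * k * n) / N
      re += x[n] * Math.cos(ph)
      im -= x[n] * Math.sin(ph)
    }
    logMag[k] = Math.log(Math.hypot(re, im) + 1e-12)
    if (k > 0 && k < half) logMag[N - k] = logMag[k]
  }
  // Step 3: inverse transform, only for the quefrencies of interest.
  // The log spectrum is real and even, so a cosine sum is sufficient.
  const qMin = Math.max(2, Math.floor(fs / maxFreq))
  const qMax = Math.min(half - 2, Math.ceil(fs / minFreq))
  const cep = new Float64Array(qMax + 2)
  for (let q = qMin - 1; q <= qMax + 1; q++) {
    let sum = 0
    for (let k = 0; k < N; k++) sum += logMag[k] * Math.cos((2 * Math.PI * k * q) / N)
    cep[q] = sum / N
  }
  // Step 4: largest value in the valid quefrency range
  let best = qMin
  for (let q = qMin + 1; q <= qMax; q++) if (cep[q] > cep[best]) best = q
  // Step 5: parabolic interpolation of the peak
  const a = cep[best - 1], b = cep[best], c = cep[best + 1]
  const denom = a - 2 * b + c
  const shift = denom < 0 ? (0.5 * (a - c)) / denom : 0
  return { freq: fs / (best + shift), quefrency: best + shift }
}
```

The method needs several harmonics to work: a pure sine produces no periodic ripple in the log spectrum and therefore no clear cepstral peak.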
SWIPE
Camacho & Harris, 2008. SWIPE' (Sawtooth Waveform Inspired Pitch Estimator, prime harmonics). Measures spectral similarity between the window and a sawtooth template whose lobes sit at the first and prime harmonics (orders 1, 2, 3, 5, 7, 11). More accurate than HPS on clean instrumental signals; robust against octave errors because only the fundamental and prime-numbered harmonics contribute.
Simplified single-window form: uses one FFT instead of the multi-resolution loudness pyramid of the original paper — sufficient for stationary windows.
import swipe from 'pitch-detection/swipe.js'
let result = swipe(samples, { fs: 44100 })
Steps:
- Hann window — shapes the main lobe for accurate parabolic interpolation
- FFT — magnitude spectrum via rfft, then $\sqrt{|X(f)|}$ to emphasize weaker harmonics
- Log-spaced candidates — at configurable cent resolution (default 10 cents)
- Cosine kernel — $K(f; f_k, f_0) = \cos(2\pi(f - f_k) / f_0)$ for $|f - f_k| \leq f_0/2$; positive central lobe rewards harmonic energy, negative sidelobes penalize inter-harmonic energy
- Prime harmonics — only harmonics at orders [1, 2, 3, 5, 7, 11], weighted $1/\sqrt{k}$
- Parabolic interpolation — in log-frequency, then refinement against the nearest spectral peak
- Clarity — normalized peak strength
| Param | Default | Description |
|---|---|---|
| fs | 44100 | Sample rate (Hz) |
| minFreq | 60 | Minimum detectable frequency (Hz) |
| maxFreq | 4000 | Maximum detectable frequency (Hz) |
| cents | 10 | Candidate spacing in cents |
| threshold | 0.15 | Minimum clarity to accept |
Use when: Clean instrumental signals, studio recordings, where sub-Hz accuracy matters.
Not for: Very noisy or reverberant signals (single-window form lacks multi-resolution robustness of the full SWIPE').
Ref: Camacho & Harris, "A sawtooth waveform inspired pitch estimator for speech and music", JASA 2008.
Requires: Power-of-2 window length.
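The cosine kernel and prime-harmonic weighting can be seen in isolation by scoring a single candidate against a square-rooted magnitude spectrum. This is an unnormalized sketch with hypothetical names (`swipeScore`, `HARMONIC_ORDERS`); windowing, candidate search, and refinement are left out.

```javascript
// Harmonic orders used by the kernel: the fundamental plus primes.
const HARMONIC_ORDERS = [1, 2, 3, 5, 7, 11]

// Hypothetical helper: kernel score of one f0 candidate.
// `sqrtMag[b]` is the square-rooted magnitude at frequency b * binHz.
function swipeScore(sqrtMag, binHz, f0) {
  let score = 0
  for (const k of HARMONIC_ORDERS) {
    const fk = k * f0
    // kernel support: |f - fk| <= f0 / 2
    const lo = Math.max(0, Math.ceil((fk - f0 / 2) / binHz))
    const hi = Math.min(sqrtMag.length - 1, Math.floor((fk + f0 / 2) / binHz))
    let acc = 0
    for (let b = lo; b <= hi; b++) {
      // positive lobe near fk, negative values toward the edges of the support
      acc += Math.cos((2 * Math.PI * (b * binHz - fk)) / f0) * sqrtMag[b]
    }
    score += acc / Math.sqrt(k) // 1 / sqrt(k) harmonic weighting
  }
  return score
}
```

With spectral peaks at multiples of 200 Hz, a 200 Hz candidate collects positive weight at every kernel centre, a 100 Hz candidate hits a peak only at its second harmonic, and a 400 Hz candidate is penalized by the peaks that fall on the negative edges of its kernels.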
Chroma
Fujishima, 1999 (PCP) / Mauch & Dixon, 2010 (NNLS). Chroma feature — a 12-D vector where each bin holds the energy attributed to one pitch class (C, C#, ..., B).
import chroma from 'pitch-detection/chroma.js'
// PCP — classical spectral folding
let c = chroma(samples, { fs: 44100 })
// NNLS — nonnegative least squares (cleaner for polyphonic audio)
let c2 = chroma(samples, { fs: 44100, method: 'nnls' })
PCP (default)
Each spectral bin is mapped to its nearest pitch class and squared magnitudes are accumulated. Simple and fast.
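The bin-to-pitch-class folding can be sketched on an existing magnitude spectrum. `pcpSketch` is a hypothetical name, and the assumptions that the input holds N/2 + 1 bins and that pitch classes are derived from equal-tempered MIDI numbers (A4 = 440 Hz, C = 0) are made for the example.

```javascript
// Hypothetical helper: folds a magnitude spectrum (bins 0 ... N/2)
// into a 12-bin, L1-normalized pitch-class profile.
function pcpSketch(mag, fs = 44100, { minFreq = 65, maxFreq = 2093 } = {}) {
  const N = (mag.length - 1) * 2 // original frame length
  const chroma = new Float64Array(12)
  for (let k = 1; k < mag.length; k++) {
    const f = (k * fs) / N
    if (f < minFreq || f > maxFreq) continue
    // nearest equal-tempered MIDI note, then its pitch class
    const midi = Math.round(69 + 12 * Math.log2(f / 440))
    chroma[((midi % 12) + 12) % 12] += mag[k] * mag[k]
  }
  const total = chroma.reduce((a, b) => a + b, 0)
  if (total > 0) for (let i = 0; i < 12; i++) chroma[i] /= total
  return chroma
}
```

Every octave of a note lands in the same bin, which is what makes the profile useful for chord and key matching but blind to register.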
NNLS
Fits the observed $\sqrt{\text{spectrum}}$ as a nonnegative combination of synthetic pitch-tone profiles (fundamental plus geometrically decaying overtones, Gaussian lobes in log-frequency with σ = 0.5 semitones). Uses multiplicative NMF updates: $a \leftarrow a \cdot (D^\top s) / (D^\top D a + \varepsilon)$. Suppresses octave and harmonic confusion on polyphonic audio.
Pitch dictionary covers MIDI 24–96 (C1–C7) with configurable harmonics per tone.
| Param | Default | Description |
|---|---|---|
| fs | 44100 | Sample rate (Hz) |
| method | 'pcp' | 'pcp' or 'nnls' |
| minFreq | 65 | Min frequency for PCP mapping (~C2) |
| maxFreq | 2093 | Max frequency for PCP mapping (~C7) |
| harmonics | 8 | Overtones per pitch (NNLS only) |
| iterations | 30 | NMF iterations (NNLS only) |
Returns: Float64Array(12), L1-normalized.
Use when: Building chord/key detectors, music information retrieval, audio fingerprinting. NNLS for polyphonic; PCP for speed.
Ref (PCP): Fujishima, "Realtime chord recognition of musical sound", ICMC 1999.
Ref (NNLS): Mauch & Dixon, "Approximate Note Transcription for the Improved Identification of Difficult Chords", ISMIR 2010.
Requires: Power-of-2 window length.
Chord
Fujishima, 1999 (templates) / Viterbi smoothing. Classifies chroma frames as one of 24 major/minor triads via cosine similarity with binary templates.
import chord, { TEMPLATES, smooth as smoothChords } from 'pitch-detection/chord.js'
// single frame
let c = chord(chromaVec)
// → { root: 0, quality: 'maj', label: 'C', confidence: 0.92 }
// smoothed sequence
let chords = smoothChords(chromaFrames, { selfProb: 0.5 })
// → [{ root: 0, quality: 'maj', label: 'C' }, ...]
chord(chromaVec, opts)
Cosine similarity against 24 binary templates (12 major + 12 minor triads). Returns the best match with confidence score.
| Param | Default | Description |
|---|---|---|
| minConfidence | 0.3 | Below this, returns quality 'N' (no chord) |
Returns: { root, quality, label, confidence } where quality is 'maj', 'min', or 'N'.
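Template matching of this kind fits in a few lines. The sketch below rebuilds 24 binary triad templates and picks the best cosine match; the names `buildTemplates` and `chordSketch`, the 'Cm'-style minor labels, and the null root for the no-chord case are assumptions rather than the package's documented behaviour.

```javascript
const NOTE_LABELS = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

// 12 major + 12 minor binary triad templates (root, third, fifth set to 1)
function buildTemplates() {
  const out = []
  for (const [quality, third] of [['maj', 4], ['min', 3]]) {
    for (let root = 0; root < 12; root++) {
      const vec = new Float64Array(12)
      vec[root] = vec[(root + third) % 12] = vec[(root + 7) % 12] = 1
      const label = NOTE_LABELS[root] + (quality === 'min' ? 'm' : '') // assumed label style
      out.push({ root, quality, label, vec })
    }
  }
  return out
}

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    na += a[i] * a[i]
    nb += b[i] * b[i]
  }
  const den = Math.sqrt(na * nb)
  return den > 0 ? dot / den : 0
}

// Hypothetical helper: best-matching triad for one chroma vector.
function chordSketch(chroma, minConfidence = 0.3) {
  let best = null, bestSim = -Infinity
  for (const t of buildTemplates()) {
    const sim = cosine(chroma, t.vec)
    if (sim > bestSim) { bestSim = sim; best = t }
  }
  if (bestSim < minConfidence) {
    return { root: null, quality: 'N', label: 'N', confidence: bestSim } // assumed shape
  }
  return { root: best.root, quality: best.quality, label: best.label, confidence: bestSim }
}
```

Cosine similarity ignores overall level, so a quiet and a loud frame with the same pitch-class balance receive the same label.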
smooth(frames, opts)
Viterbi decoding with a sticky self-transition prior. Observation log-likelihood = 8 × cosine similarity (temperature 8 gives reasonably sharp distributions).
| Param | Default | Description |
|---|---|---|
| selfProb | 0.5 | Self-transition probability (higher = smoother) |
Returns: { root, quality, label }[] — one chord per frame.
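The sticky-prior decoding can be shown with a generic Viterbi pass over per-frame log-likelihoods. In this sketch `viterbiSticky` is a hypothetical name, the input is a frames-by-states matrix (for chords it would hold 8 × cosine similarity per template), and spreading the remaining probability evenly over the other states is an assumption.

```javascript
// Hypothetical helper: most likely state path for obs[t][s] log-likelihoods,
// with self-transition probability selfProb and the rest shared equally.
function viterbiSticky(obs, selfProb = 0.5) {
  const T = obs.length
  if (T === 0) return []
  const S = obs[0].length
  const stay = Math.log(selfProb)
  const move = Math.log((1 - selfProb) / (S - 1))
  let score = Array.from(obs[0]) // uniform initial prior
  const back = [] // back[t - 1][s] = best predecessor of state s at time t
  for (let t = 1; t < T; t++) {
    const next = new Array(S), ptr = new Array(S)
    for (let s = 0; s < S; s++) {
      let bestPrev = 0, bestVal = -Infinity
      for (let p = 0; p < S; p++) {
        const v = score[p] + (p === s ? stay : move)
        if (v > bestVal) { bestVal = v; bestPrev = p }
      }
      next[s] = bestVal + obs[t][s]
      ptr[s] = bestPrev
    }
    back.push(ptr)
    score = next
  }
  // backtrack from the best final state
  let state = score.indexOf(Math.max(...score))
  const path = new Array(T)
  path[T - 1] = state
  for (let t = T - 2; t >= 0; t--) {
    state = back[t][state]
    path[t] = state
  }
  return path
}
```

With two states and selfProb = 0.5 the transitions are uniform and the path simply follows each frame's best state; raising selfProb makes a single outlier frame too expensive to follow.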
TEMPLATES
Exported array of 24 chord templates: { root, quality, label, vec } where vec is a Float64Array(12) with 1 on chord tones.
Use when: Quick chord labeling from chroma features. Combine with NNLS chroma for best results.
Ref: Fujishima, "Realtime chord recognition of musical sound", ICMC 1999.
Key
Krumhansl & Schmuckler. Detects musical key from chroma via Pearson correlation against 24 rotated major/minor key profiles (Krumhansl-Kessler probe-tone ratings).
import key, { KK_MAJOR, KK_MINOR } from 'pitch-detection/key.js'
let k = key(chromaVec)
// → { tonic: 0, mode: 'major', label: 'C', confidence: 0.85, scores: [...] }
// from multiple frames (averages internally)
let k2 = key(chromaFrames)
Steps:
- Average — if given an array of frames, compute mean chroma
- Correlate — Pearson correlation of input chroma against each of 24 rotated key profiles
- Rank — sort by correlation; highest wins
| Param | Default | Description |
|---|---|---|
| profile | { major: KK_MAJOR, minor: KK_MINOR } | Custom key profiles |
Returns: { tonic, mode, label, confidence, scores } where scores is all 24 keys sorted descending.
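The correlate-and-rank steps can be sketched for a single chroma vector as below. `pearson` and `keySketch` are hypothetical names, the 'Am'-style minor labels are an assumption, and frame averaging is omitted.

```javascript
const KK_MAJOR_P = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
const KK_MINOR_P = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
const KEY_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

// Pearson correlation of two equal-length vectors
function pearson(a, b) {
  const n = a.length
  let ma = 0, mb = 0
  for (let i = 0; i < n; i++) { ma += a[i]; mb += b[i] }
  ma /= n; mb /= n
  let num = 0, da = 0, db = 0
  for (let i = 0; i < n; i++) {
    const x = a[i] - ma, y = b[i] - mb
    num += x * y; da += x * x; db += y * y
  }
  const den = Math.sqrt(da * db)
  return den > 0 ? num / den : 0
}

// Hypothetical helper: Krumhansl-Schmuckler key estimate for one chroma vector.
function keySketch(chroma) {
  const scores = []
  for (const [mode, profile] of [['major', KK_MAJOR_P], ['minor', KK_MINOR_P]]) {
    for (let tonic = 0; tonic < 12; tonic++) {
      // rotate the profile so that its first entry sits on the tonic
      const rotated = profile.map((_, pc) => profile[(pc - tonic + 12) % 12])
      const label = KEY_NAMES[tonic] + (mode === 'minor' ? 'm' : '') // assumed label style
      scores.push({ tonic, mode, label, score: pearson(chroma, rotated) })
    }
  }
  scores.sort((a, b) => b.score - a.score)
  const best = scores[0]
  return { tonic: best.tonic, mode: best.mode, label: best.label, confidence: best.score, scores }
}
```

Pearson correlation removes both the mean and the scale of the chroma vector, so L1-normalized and raw energy profiles give the same ranking.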
Exported profiles
- KK_MAJOR — Krumhansl-Kessler major profile: [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
- KK_MINOR — Krumhansl-Kessler minor profile: [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
Use when: Key detection for music analysis, automatic transposition, music information retrieval.
Ref: Krumhansl, Cognitive Foundations of Musical Pitch, Oxford 1990.
Ref: Temperley, "What's Key for Key?", Music Perception 1999.
Comparison
Pitch algorithms
| | YIN | McLeod | pYIN | AMDF | HPS | Cepstrum | SWIPE |
|---|---|---|---|---|---|---|---|
| Domain | time | time | time | time | spectral | spectral | spectral |
| Accuracy | ★★★★★ | ★★★★ | ★★★★★ | ★★★ | ★★★★ | ★★★ | ★★★★★ |
| Noise robustness | ★★★★★ | ★★★★ | ★★★★★ | ★★★ | ★★★ | ★★★ | ★★★★ |
| Octave errors | rare | rare | rare | common | rare | occasional | rare |
| Missing fundamental | no | no | no | no | yes | yes | yes |
| Min window | ~4 periods | ~2 periods | ~4 periods | ~4 periods | power of 2 | power of 2 | power of 2 |
| Best for | general | vibrato | ambiguous | embedded | harmonic-rich | pedagogical | studio |
Choosing an algorithm
Use YIN when you need the most reliable result and can afford a full-size window (2048–4096 samples). The threshold directly controls how strict the periodicity requirement is.
Use McLeod when tracking fast pitch changes or vibrato, or when you want a smaller window. The NSDF peak selection naturally avoids sub-octave errors.
Use pYIN when the signal is ambiguous (breathy vocals, noisy recordings) and you want a pitch posterior for downstream smoothing.
Use AMDF in constrained environments. Simpler than YIN — no squaring, no cumulative normalization — but more prone to octave errors.
Use HPS for harmonic-rich timbres (guitar, piano, brass) — naturally handles missing fundamentals by aligning harmonic peaks.
Use Cepstrum as a spectral-domain complement to time-domain methods, or for pedagogical purposes.
Use SWIPE when you need sub-Hz accuracy on clean instrumental signals or studio recordings.
See also
- fourier-transform — FFT used by spectral algorithms
- beat-detection — onset detection, tempo estimation, beat tracking
- digital-filter — filter design and processing
- time-stretch — time stretching and pitch shifting
- pitch-shift — pitch shifting algorithms
