@qgustavor/stream-audio-fingerprint
v2.1.2
Audio landmark fingerprinting as a JavaScript module
Audio Landmark Fingerprinting as a JavaScript module
This module converts a PCM audio signal into a series of audio fingerprints. It works with limited audio tracks (e.g. recorded audio) as well as with unlimited audio streams (e.g. broadcast radio).
It's based on lpolito/stream-audio-fingerprint, which is in turn based on adblockradio/stream-audio-fingerprint, one of the foundations of the Adblock Radio project.
Credits and description
Check the original project for credits and detailed info on the algorithm used. To be fair, even as the maintainer of this fork, I still don’t fully understand it.
Usage
A usage demo is shown below. It requires the ffmpeg executable and Deno to run.
import Fingerprinter from 'npm:@qgustavor/stream-audio-fingerprint'
const decoder = (new Deno.Command('ffmpeg', {
  args: [
    '-i', 'pipe:0',
    '-acodec', 'pcm_s16le',
    '-ar', '22050',
    '-ac', '1',
    '-f', 's16le',
    '-v', 'fatal',
    'pipe:1'
  ],
  stdout: 'piped',
  stdin: 'inherit'
})).spawn()
const fingerprinter = new Fingerprinter()
const { dt } = fingerprinter.options
// The child's piped stdout is itself a ReadableStream of Uint8Array chunks
for await (const audioData of decoder.stdout) {
  const data = fingerprinter.process(audioData)
  for (let i = 0; i < data.tcodes.length; i++) {
    console.log(`time=${data.tcodes[i] * dt} fingerprint=${data.hcodes[i]}`)
  }
}
Then pipe audio data into it, either from a stream or from a file:
curl http://radiofg.impek.com/fg | deno run --allow-run=ffmpeg codegen_demo.mjs
deno run --allow-run=ffmpeg codegen_demo.mjs < awesome_music.mp3
Warning: file paths changed in version 2.1.0, which may affect Deno users who were importing src/codegen_landmark.ts; that file is now codegen_landmark.mts.
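The demo above prints each fingerprint as it arrives. To collect them for later use, the parallel tcodes and hcodes arrays can be zipped into records. The sketch below is only an illustration: toLandmarks is a hypothetical helper, not part of the module, and it assumes nothing beyond what the demo shows, namely that process() returns an object with equal-length tcodes and hcodes arrays and that multiplying a tcode by dt gives seconds. The sample values are made up.

```javascript
// Hypothetical helper (not part of the module): zip the parallel
// tcodes/hcodes arrays returned by fingerprinter.process() into records.
function toLandmarks (data, dt) {
  const landmarks = []
  for (let i = 0; i < data.tcodes.length; i++) {
    landmarks.push({
      time: data.tcodes[i] * dt, // time in seconds
      hash: data.hcodes[i] // fingerprint value
    })
  }
  return landmarks
}

// Made-up values standing in for a real process() result.
const dt = 256 / 22050 // default step divided by default samplingRate
const sample = { tcodes: [0, 86, 172], hcodes: [111, 222, 333] }
const landmarks = toLandmarks(sample, dt)
console.log(landmarks)
```

In the demo loop, the same helper could be called as toLandmarks(data, dt) in place of the inner for loop.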
Fingerprinter options
One improvement from the Lucas Polito fork is the ability to customize the fingerprinter options. These are the available options:
- verbose: whether to print debug information to console.log.
- samplingRate: the sampling rate of the audio input in Hz. Defaults to 22050.
  - If you change this, you must also adapt windowDt and pruningDt to match your needs.
  - For more info, read the comments in the code.
- bps: bytes per sample. Defaults to 2 (16-bit PCM).
  - Do not change this without checking the code first.
- mnlm: maximum number of local maxima detected in each FFT spectrum. Defaults to 5.
  - Higher values increase the number of fingerprints produced.
- mppp: maximum number of hashes (fingerprints) each peak can generate. Defaults to 3.
  - Useful for tuning the density of fingerprints.
- nfft: size of the FFT window. Defaults to 512.
  - Larger values improve spectral precision (frequency resolution) but reduce temporal precision.
  - The FFT spectrum will have nfft / 2 points.
- step: number of samples to advance between successive FFT windows. Defaults to nfft / 2 (50% overlap).
  - With a sampling rate of 22050 Hz, this yields ~86 windows per second (dt ≈ 11.61 ms).
  - Typically you don’t need to change this.
- dt: duration of each time step in seconds.
  - Defaults to 1 / (samplingRate / step).
  - Use it to convert tcodes into seconds: multiply each value in tcodes by fingerprinter.options.dt, as shown in the demos.
- hwin: the Hann window applied to each FFT frame. Defaults to a precomputed array of size nfft.
  - Adjusting this is rare unless experimenting with different window functions.
- maskDecayLog: logarithmic decay factor for the detection threshold between frames. Defaults to Math.log(0.995).
  - Affects how quickly old peaks become irrelevant.
- ifMin: minimum frequency bin (in units of DF = samplingRate / nfft) to consider when generating fingerprints. Defaults to 0.
  - Increase this to ignore very low frequencies.
- ifMax: maximum frequency bin to consider when generating fingerprints. Defaults to nfft / 2.
  - Usually best left unchanged. To reduce processing time, lower samplingRate instead.
- windowDf: maximum allowed frequency difference between paired peaks. Defaults to 60.
  - Limits fingerprint generation to peaks within a certain frequency range of each other.
  - Reducing this reduces fingerprint density.
- windowDt: maximum time window (in units of dt) for generating landmark pairs. Defaults to 96 (~1 second).
  - Controls how far apart peaks can be and still generate fingerprints.
- pruningDt: time window (in units of dt) used to prune older peaks that are overshadowed by newer ones. Defaults to 24 (~250 ms).
  - Also affects system latency: higher values increase latency.
- maskDf: decay scale of the exponential mask on the frequency axis. Defaults to 3.
  - Wider masks reduce sensitivity to small frequency variations.
- eww: precomputed 2D exponential mask (log-domain) matrix of size (nfft / 2) × (nfft / 2). Defaults to a generated Gaussian-like mask based on maskDf.
  - Advanced option for customizing the spectral masking behavior.
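Several of these options are expressed in units of dt or of frequency bins, so it can help to convert them to seconds and Hz before tuning. The sketch below only restates the arithmetic from the list above. describeTiming and rescaleForSamplingRate are hypothetical helpers, not part of the module, and the rescaling rule (keep the window durations roughly constant in seconds) is one possible reading of "adapt windowDt and pruningDt", not something taken from the code.

```javascript
// Default values copied from the option list above.
const defaults = { samplingRate: 22050, nfft: 512, windowDt: 96, pruningDt: 24 }

// Hypothetical helper: derive time and frequency resolution from the options.
function describeTiming ({ samplingRate, nfft, step = nfft / 2, windowDt, pruningDt }) {
  const dt = 1 / (samplingRate / step) // seconds per time step
  return {
    dt,
    windowsPerSecond: 1 / dt,
    binWidthHz: samplingRate / nfft, // DF, the unit of ifMin and ifMax
    windowSeconds: windowDt * dt,
    pruningSeconds: pruningDt * dt
  }
}

// Hypothetical helper: keep windowDt and pruningDt at roughly the same
// duration in seconds when samplingRate changes (nfft and step unchanged).
function rescaleForSamplingRate (options, newRate) {
  const ratio = newRate / options.samplingRate
  return {
    ...options,
    samplingRate: newRate,
    windowDt: Math.round(options.windowDt * ratio),
    pruningDt: Math.round(options.pruningDt * ratio)
  }
}

const timing = describeTiming(defaults)
console.log(timing.dt.toFixed(5)) // 0.01161
console.log(timing.windowsPerSecond.toFixed(2)) // 86.13
console.log(timing.binWidthHz.toFixed(2)) // 43.07

const at44k = rescaleForSamplingRate(defaults, 44100)
console.log(at44k.windowDt, at44k.pruningDt) // 192 48
```

How the resulting values are handed to the fingerprinter is not covered here; check the module's bundled types for the exact constructor signature.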
Node.js Usage
This code also works in Node.js and can be installed from npm with npm install @qgustavor/stream-audio-fingerprint.
The previous demo can be rewritten as follows:
import Fingerprinter from '@qgustavor/stream-audio-fingerprint'
import { spawn } from 'child_process'
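const decoder = spawn('ffmpeg', [
  '-i', 'pipe:0',
  '-acodec', 'pcm_s16le',
  '-ar', '22050',
  '-ac', '1',
  '-f', 's16le',
  '-v', 'fatal',
  'pipe:1'
])
// spawn() pipes the child's stdin by default, so forward this process's
// stdin to ffmpeg (the Deno demo does this with stdin: 'inherit')
process.stdin.pipe(decoder.stdin)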
const fingerprinter = new Fingerprinter()
const { dt } = fingerprinter.options
for await (const audioData of decoder.stdout) {
  const data = fingerprinter.process(audioData)
  for (let i = 0; i < data.tcodes.length; i++) {
    console.log(`time=${data.tcodes[i] * dt} fingerprint=${data.hcodes[i]}`)
  }
}
TypeScript
This module already includes TypeScript types.
License
See LICENSE file.
