npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

@ohnrshyp/metadata

v1.0.2

Published

Unified AI-assisted descriptive audio metadata extraction pipeline (Mood, Genre, Instruments, Vocals, Embeddings)

Downloads

437

Readme

@ohnrshyp/metadata

Unified hybrid neural AI audio metadata extraction pipeline.

This package integrates multiple deep learning and digital signal processing (DSP) models to automatically extract comprehensive structural and semantic metadata tags from raw audio files. It is the intelligence layer powering the auto-tagging and verification capabilities of the ORBIT protocol.


🚀 Key AI Pipelines

  • 🧠 Zero-Shot Mood & Vocal Tagger (LAION-CLAP): Uses contrastive text-audio embeddings to detect acoustic mood, vocal presence, vocal gender, and singing style.
  • 🎹 Primary Instrument Identification (PANNs): Leverages Pretrained Audio Neural Networks (CNN14 architecture) to analyze and score 50+ instruments.
  • 🏷️ Genre Classifier (wav2vec2): Classifies the primary musical genre over 10 structural music classifications.
  • 🧬 2048-dim Audio Embeddings (PANNs): Generates highly descriptive semantic vector representations for similarity indexing in vector databases.
  • ⏱️ Classical DSP Features: Integrates tempo (BPM), key/scale, RMS energy, and integrated loudness (dB) directly.
  • 🔗 Stem-Aware Analysis (Demucs): Optionally splits audio into vocal, bass, drum, and instrumental stems for isolated checking.

🔬 Model & Pipeline Architecture

The extraction orchestrator combines local JavaScript-based inferences and Python subprocess modules.

                    ┌─────────────────────────┐
                    │       Audio Input       │
                    └─────────────────────────┘
                                 │
         ┌───────────────────────┼───────────────────────┐
         ▼                       ▼                       ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   LAION-CLAP    │     │      PANNs      │     │    wav2vec2     │
│  (Transformers) │     │ (PyTorch Python)│     │  (Transformers) │
└─────────────────┘     └─────────────────┘     └─────────────────┘
     (Mood, Vocals)     (Instruments, Tag)          (10 Genres)
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 ▼
                    ┌─────────────────────────┐
                    │   DSP Signal Engine     │
                    │   (BPM, Key, Loudness)  │
                    └─────────────────────────┘
                                 │
                                 ▼
                    ┌─────────────────────────┐
                    │    Unified JSON/DB      │
                    │     Metadata Object     │
                    └─────────────────────────┘

1. LAION-CLAP Inferences

CLAP models contain joint audio and text encoders trained to project audio waveforms and textual descriptions into a shared latent space. Zero-shot tags are extracted by computing the cosine similarity between the audio embedding $E_a$ and candidate text prompt embeddings $E_t$: $$\text{Confidence}(i) = \frac{e^{S(E_a, E_{t, i}) / \tau}}{\sum_k e^{S(E_a, E_{t, k}) / \tau}}$$ This outputs mood keywords and vocal properties without retraining.

2. PANNs (Pretrained Audio Neural Networks)

PANNs maps the audio signal into a log-mel spectrogram and runs a CNN14 architecture trained on AudioSet (527 classes). By filtering AudioSet indices, we isolate specific instrument classes and extract the 2048-dimensional output from the global pooling layer to serve as our dense audio embedding.


📦 Installation

Node.js (NPM Package)

npm install @ohnrshyp/metadata

🛠️ API Reference

extractMetadata(input, [options])

Extracts comprehensive structural and semantic metadata from an audio source.

  • Parameters:

    • input (Buffer | string): Raw binary buffer or absolute path to the target audio file.
    • options (Object, optional):
      • includeEmbedding (boolean): Include the 2048-dim PANNs audio vector. Default is false.
      • verbose (boolean): Enable diagnostic logging. Default is false.
      • config (Object): Advanced configuration overrides (see below).
  • Returns: Promise<Object> containing the following schema:

    {
      "genre": [
        { "label": "Electronic", "confidence": 0.8912 }
      ],
      "mood": [
        { "label": "Energetic", "confidence": 0.8412 },
        { "label": "Bright", "confidence": 0.7612 }
      ],
      "instruments": [
        { "label": "Synthesizer", "confidence": 0.912 },
        { "label": "Drum Kit", "confidence": 0.824 }
      ],
      "vocals": {
        "present": true,
        "confidence": 0.9412,
        "gender": "female"
      },
      "bpm": {
        "value": 128.0,
        "confidence": 0.8912
      },
      "key": {
        "value": "A minor",
        "confidence": 0.7412
      },
      "energy": 0.8124,
      "loudness_db": -12.42,
      "danceability": 0.8942,
      "duration": 210.42,
      "sample_rate": 22050,
      "extractionStatus": {
        "clap": "success",
        "panns": "success",
        "genreClassifier": "success",
        "audioAnalysis": "success",
        "embedding": "success"
      }
    }

extractClapOnly(input, [options])

Performs only CLAP mood/vocals classification. Avoids PyTorch loading overhead, completing in $< 1$ second.

extractAudioAnalysisOnly(input, [options])

Performs only CPU-only classical DSP feature extraction (BPM, key, loudness). No machine learning model is loaded.

checkEnvironment()

Runs validation diagnostics across all sub-modules.

  • Returns: Promise<Object> detailing availability flags.

formatForDatabase(result)

Formats the extraction result into the JSON structure expected by the ai_metadata JSONB column in PostgreSQL.

formatEmbeddingForDatabase(embedding)

Converts the Float32Array embedding vector into the PostgreSQL pgvector string format "[v1,v2,...]" for query parameters.


⚙️ Configuration Flags

Override defaults in the options parameter:

const result = await metadata.extractMetadata(audioBuffer, {
  config: {
    enableClap: true,             // Zero-shot mood/vocals
    enablePanns: true,            // Instrument tagging
    enableGenreClassifier: true,  // wav2vec2 genre classification
    enableEmbedding: true,        // Generate 2048-dim vector
    enableAudioAnalysis: true,    // Run BPM/Key DSP
    enableDemucs: false,          // Run Demucs stem separation
    failOnError: false            // Bypass partial module failures
  }
});

💻 Code Examples

Full Metadata Extraction & Database Formatting

const metadata = require('@ohnrshyp/metadata');
const fs = require('fs');

async function run() {
  const audioBuffer = fs.readFileSync('dance-track.mp3');

  try {
    // Extract metadata including PANNs embedding
    const rawResult = await metadata.extractMetadata(audioBuffer, {
      includeEmbedding: true,
      verbose: true
    });

    console.log(`Primary Genre: ${rawResult.genre[0].label} (${(rawResult.genre[0].confidence * 100).toFixed(1)}%)`);
    console.log(`Mood Tags: ${rawResult.mood.map(m => m.label).join(', ')}`);
    console.log(`BPM: ${rawResult.bpm.value}`);

    // Format for pg/pgvector insertion
    const dbMetadataField = metadata.formatForDatabase(rawResult);
    const dbVectorParam = metadata.formatEmbeddingForDatabase(rawResult.embedding);

    console.log('Formatted pgvector String Prefix:', dbVectorParam.substring(0, 50));
    
  } catch (error) {
    console.error('Metadata extraction failed:', error.message);
  }
}

run();

📄 License

Licensed under the Apache License, Version 2.0 (the "License"). See LICENSE in the project root for details.