authaudio v1.0.0
AuthAudio 🎙️
AI-powered audio authentication package that detects whether audio is fake (AI-generated) or real (human) using deep learning.
Features
- 🤖 AI-Powered Detection: Uses a trained neural network to classify audio
- 🎯 High Accuracy: Trained on MFCC features for robust detection
- 📦 Easy to Use: Simple API for Node.js applications
- ⚡ Fast: Efficient TensorFlow.js implementation
- 🔧 Flexible: Works with pre-extracted audio features
Installation
```bash
npm install authaudio
```
Quick Start
```js
import AuthAudio from 'authaudio';

// Create classifier instance
const classifier = new AuthAudio();

// Load the model (only needed once)
await classifier.loadModel();

// Predict from pre-extracted MFCC features (40 coefficients)
const features = [/* your 40 MFCC coefficients */];
const result = await classifier.predictFromFeatures(features);

console.log(result);
// Output:
// {
//   prediction: 'Human',
//   confidence: '95.23%',
//   probabilities: {
//     human: '95.23%',
//     ai: '4.77%'
//   },
//   raw: {
//     human: 0.9523,
//     ai: 0.0477
//   }
// }
```
Feature Extraction
This package requires pre-extracted MFCC features. Use Python with librosa to extract features:
```python
import librosa
import numpy as np

# Load audio file
audio, sr = librosa.load('audio.wav')

# Extract 40 MFCC coefficients
mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=40)

# Calculate mean across time (this gives you 40 features)
features = np.mean(mfcc.T, axis=0)

# Convert to list for JavaScript
features_list = features.tolist()
print(features_list)  # Use these in your Node.js app
```
API Reference
new AuthAudio()
Creates a new AuthAudio classifier instance.
```js
const classifier = new AuthAudio();
```
await classifier.loadModel()
Loads the TensorFlow.js model. Must be called before making predictions.
```js
await classifier.loadModel();
```
await classifier.predictFromFeatures(features)
Predicts whether audio is fake or real from pre-extracted MFCC features.
Parameters:
- features (Array | Float32Array): 40 MFCC coefficients
Returns: a Promise resolving to an object of the form:
```js
{
  prediction: string,    // 'Human' or 'AI-Generated'
  confidence: string,    // Confidence percentage
  probabilities: {
    human: string,       // Human probability percentage
    ai: string           // AI probability percentage
  },
  raw: {
    human: number,       // Raw human probability (0-1)
    ai: number           // Raw AI probability (0-1)
  }
}
```
Complete Example
Python: Extract Features
```python
# extract_features.py
import librosa
import numpy as np
import json

def extract_features(audio_path):
    # Load audio
    audio, sr = librosa.load(audio_path)
    # Extract MFCC
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=40)
    # Get mean across time frames
    features = np.mean(mfcc.T, axis=0)
    return features.tolist()

# Extract and save
features = extract_features('audio.wav')
with open('features.json', 'w') as f:
    json.dump(features, f)
```
Node.js: Classify Audio
```js
// classify.js
import AuthAudio from 'authaudio';
import { readFileSync } from 'fs';

async function main() {
  // Load pre-extracted features
  const features = JSON.parse(readFileSync('features.json', 'utf-8'));

  // Create and load classifier
  const classifier = new AuthAudio();
  await classifier.loadModel();

  // Predict
  const result = await classifier.predictFromFeatures(features);
  console.log(`Prediction: ${result.prediction}`);
  console.log(`Confidence: ${result.confidence}`);
  console.log(`Probabilities:`, result.probabilities);
}

main();
```
How It Works
AuthAudio uses a deep neural network trained on audio features to detect AI-generated audio:
- Feature Extraction (External): Extract 40 MFCC coefficients using Python/librosa
- Normalization: Features are averaged across time frames
- Classification: Neural network processes the 40 features
- Prediction: Returns probability scores for human vs AI-generated audio
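The last two steps can be sketched in plain Python: a softmax turns the network's two output scores into probabilities, and a small formatting helper shapes them like the result object shown in Quick Start. This is an illustration only (`format_result` is a hypothetical name, not part of the package's API):

```python
import math

def softmax(logits):
    """Convert the network's two output scores into probabilities that sum to 1."""
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def format_result(probs):
    """Hypothetical re-implementation of the result shaping done inside
    predictFromFeatures(): probs is [human, ai]."""
    human, ai = probs
    label = "Human" if human >= ai else "AI-Generated"
    confidence = max(human, ai)
    return {
        "prediction": label,
        "confidence": f"{confidence * 100:.2f}%",
        "probabilities": {"human": f"{human * 100:.2f}%", "ai": f"{ai * 100:.2f}%"},
        "raw": {"human": round(human, 4), "ai": round(ai, 4)},
    }

result = format_result(softmax([3.0, 0.0]))
```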
Model Architecture
- Input: 40 MFCC features
- Hidden Layers:
  - Dense layer (256 units, ReLU activation)
  - Dropout (30%)
  - Dense layer (128 units, ReLU activation)
  - Dropout (30%)
- Output: 2 units (Human, AI-Generated) with softmax activation
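The layer stack above can be sketched as a stdlib-only forward pass. The weights below are random placeholders, not the trained model, and the dropout layers are omitted because dropout is inactive at inference time:

```python
import math
import random

random.seed(0)

def dense(x, w, b, activation=None):
    """y = activation(W.x + b); w has shape [n_out][n_in]."""
    y = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
    if activation == "relu":
        y = [max(0.0, v) for v in y]
    elif activation == "softmax":
        m = max(y)
        e = [math.exp(v - m) for v in y]
        s = sum(e)
        y = [v / s for v in e]
    return y

def rand_layer(n_out, n_in):
    """Random placeholder parameters; the real package ships trained weights."""
    w = [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

w1, b1 = rand_layer(256, 40)    # Dense 256, ReLU
w2, b2 = rand_layer(128, 256)   # Dense 128, ReLU
w3, b3 = rand_layer(2, 128)     # Output: 2 units, softmax

features = [0.0] * 40           # stand-in for 40 MFCC coefficients
h = dense(features, w1, b1, "relu")
h = dense(h, w2, b2, "relu")
probs = dense(h, w3, b3, "softmax")  # [p_human, p_ai]
```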
Requirements
- Node.js >= 14.0.0
- Python with librosa for feature extraction
Performance
- Model Size: ~550 KB
- Inference Time: < 50ms per prediction
- Memory Usage: Low (< 50 MB)
Why Pre-extracted Features?
Audio processing libraries in JavaScript are limited compared to Python. By using Python's librosa for feature extraction, you get:
- ✅ More accurate MFCC extraction
- ✅ Better compatibility with the training pipeline
- ✅ Smaller npm package size
- ✅ Faster inference in Node.js
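A consequence of this split is that the `features.json` file is the contract between the two runtimes. A minimal stdlib-only sanity check of that hand-off (the filename mirrors the Complete Example above; the Node side would read the same file with `JSON.parse(readFileSync(...))`):

```python
import json
import os
import tempfile

# Stand-in for the 40-coefficient vector librosa would produce.
features = [float(i) / 40 for i in range(40)]

# Python side: serialize exactly as in extract_features.py.
path = os.path.join(tempfile.mkdtemp(), "features.json")
with open(path, "w") as f:
    json.dump(features, f)

# Simulate the Node side's read and confirm nothing is lost or reordered.
with open(path) as f:
    restored = json.load(f)

assert restored == features and len(restored) == 40
```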
License
MIT
Contributing
Contributions are welcome! Please open an issue or submit a pull request.
Author
Prajwal
Acknowledgments
- Built with TensorFlow.js
- Feature extraction using librosa
