authaudio v1.0.0
AuthAudio 🎙️
AI-powered audio authentication package that detects whether audio is fake (AI-generated) or real (human) using deep learning.
Features
- 🤖 AI-Powered Detection: Uses a trained neural network to classify audio
- 🎯 High Accuracy: Trained on MFCC features for robust detection
- 📦 Easy to Use: Simple API for Node.js applications
- ⚡ Fast: Efficient TensorFlow.js implementation
- 🔧 Flexible: Works with pre-extracted audio features
Installation
```bash
npm install authaudio
```
Quick Start
```js
import AuthAudio from 'authaudio';

// Create classifier instance
const classifier = new AuthAudio();

// Load the model (only needed once)
await classifier.loadModel();

// Predict from pre-extracted MFCC features (40 coefficients)
const features = [/* your 40 MFCC coefficients */];
const result = await classifier.predictFromFeatures(features);

console.log(result);
// Output:
// {
//   prediction: 'Human',
//   confidence: '95.23%',
//   probabilities: {
//     human: '95.23%',
//     ai: '4.77%'
//   },
//   raw: {
//     human: 0.9523,
//     ai: 0.0477
//   }
// }
```
Feature Extraction
This package requires pre-extracted MFCC features. Use Python with librosa to extract features:
```python
import librosa
import numpy as np

# Load audio file
audio, sr = librosa.load('audio.wav')

# Extract 40 MFCC coefficients
mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=40)

# Calculate mean across time (this gives you 40 features)
features = np.mean(mfcc.T, axis=0)

# Convert to list for JavaScript
features_list = features.tolist()
print(features_list)  # Use these in your Node.js app
```
API Reference
new AuthAudio()
Creates a new AuthAudio classifier instance.
```js
const classifier = new AuthAudio();
```
await classifier.loadModel()
Loads the TensorFlow.js model. Must be called before making predictions.
```js
await classifier.loadModel();
```
await classifier.predictFromFeatures(features)
Predicts whether audio is fake or real from pre-extracted MFCC features.
Parameters:
- features (Array | Float32Array): 40 MFCC coefficients
Returns: a Promise resolving to an object of the form:
```js
{
  prediction: string,    // 'Human' or 'AI-Generated'
  confidence: string,    // Confidence percentage
  probabilities: {
    human: string,       // Human probability percentage
    ai: string           // AI probability percentage
  },
  raw: {
    human: number,       // Raw human probability (0-1)
    ai: number           // Raw AI probability (0-1)
  }
}
```
Complete Example
Python: Extract Features
```python
# extract_features.py
import librosa
import numpy as np
import json

def extract_features(audio_path):
    # Load audio
    audio, sr = librosa.load(audio_path)
    # Extract MFCC
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=40)
    # Get mean across time frames
    features = np.mean(mfcc.T, axis=0)
    return features.tolist()

# Extract and save
features = extract_features('audio.wav')
with open('features.json', 'w') as f:
    json.dump(features, f)
```
Node.js: Classify Audio
```js
// classify.js
import AuthAudio from 'authaudio';
import { readFileSync } from 'fs';

async function main() {
  // Load pre-extracted features
  const features = JSON.parse(readFileSync('features.json', 'utf-8'));

  // Create and load classifier
  const classifier = new AuthAudio();
  await classifier.loadModel();

  // Predict
  const result = await classifier.predictFromFeatures(features);
  console.log(`Prediction: ${result.prediction}`);
  console.log(`Confidence: ${result.confidence}`);
  console.log(`Probabilities:`, result.probabilities);
}

main();
```
How It Works
AuthAudio uses a deep neural network trained on audio features to detect AI-generated audio:
- Feature Extraction (External): Extract 40 MFCC coefficients using Python/librosa
- Normalization: Features are averaged across time frames
- Classification: Neural network processes the 40 features
- Prediction: Returns probability scores for human vs AI-generated audio
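The last two steps can be sketched in plain Python: a softmax turns the network's two output scores into probabilities, and a small formatting helper shapes them like the result object shown in Quick Start. This is an illustration only (`format_result` is a hypothetical name, not part of the package's API):

```python
import math

def softmax(logits):
    """Convert the network's two output scores into probabilities that sum to 1."""
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def format_result(probs):
    """Hypothetical re-implementation of the result shaping done inside
    predictFromFeatures(): probs is [human, ai]."""
    human, ai = probs
    label = "Human" if human >= ai else "AI-Generated"
    confidence = max(human, ai)
    return {
        "prediction": label,
        "confidence": f"{confidence * 100:.2f}%",
        "probabilities": {"human": f"{human * 100:.2f}%", "ai": f"{ai * 100:.2f}%"},
        "raw": {"human": round(human, 4), "ai": round(ai, 4)},
    }

result = format_result(softmax([3.0, 0.0]))
```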
Model Architecture
- Input: 40 MFCC features
- Hidden Layers:
  - Dense layer (256 units, ReLU activation)
  - Dropout (30%)
  - Dense layer (128 units, ReLU activation)
  - Dropout (30%)
- Output: 2 units (Human, AI-Generated) with softmax activation
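The layer stack above can be sketched as a stdlib-only forward pass. The weights below are random placeholders, not the trained model, and the dropout layers are omitted because dropout is inactive at inference time:

```python
import math
import random

random.seed(0)

def dense(x, w, b, activation=None):
    """y = activation(W.x + b); w has shape [n_out][n_in]."""
    y = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
    if activation == "relu":
        y = [max(0.0, v) for v in y]
    elif activation == "softmax":
        m = max(y)
        e = [math.exp(v - m) for v in y]
        s = sum(e)
        y = [v / s for v in e]
    return y

def rand_layer(n_out, n_in):
    """Random placeholder parameters; the real package ships trained weights."""
    w = [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

w1, b1 = rand_layer(256, 40)    # Dense 256, ReLU
w2, b2 = rand_layer(128, 256)   # Dense 128, ReLU
w3, b3 = rand_layer(2, 128)     # Output: 2 units, softmax

features = [0.0] * 40           # stand-in for 40 MFCC coefficients
h = dense(features, w1, b1, "relu")
h = dense(h, w2, b2, "relu")
probs = dense(h, w3, b3, "softmax")  # [p_human, p_ai]
```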
Requirements
- Node.js >= 14.0.0
- Python with librosa for feature extraction
Performance
- Model Size: ~550 KB
- Inference Time: < 50ms per prediction
- Memory Usage: Low (< 50 MB)
Why Pre-extracted Features?
Audio processing libraries in JavaScript are limited compared to Python. By using Python's librosa for feature extraction, you get:
- ✅ More accurate MFCC extraction
- ✅ Better compatibility with the training pipeline
- ✅ Smaller npm package size
- ✅ Faster inference in Node.js
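A consequence of this split is that the `features.json` file is the contract between the two runtimes. A minimal stdlib-only sanity check of that hand-off (the filename mirrors the Complete Example above; the Node side would read the same file with `JSON.parse(readFileSync(...))`):

```python
import json
import os
import tempfile

# Stand-in for the 40-coefficient vector librosa would produce.
features = [float(i) / 40 for i in range(40)]

# Python side: serialize exactly as in extract_features.py.
path = os.path.join(tempfile.mkdtemp(), "features.json")
with open(path, "w") as f:
    json.dump(features, f)

# Simulate the Node side's read and confirm nothing is lost or reordered.
with open(path) as f:
    restored = json.load(f)

assert restored == features and len(restored) == 40
```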
License
MIT
Contributing
Contributions are welcome! Please open an issue or submit a pull request.
Author
Prajwal
Acknowledgments
- Built with TensorFlow.js
- Feature extraction using librosa
