@staffojo/transcribe

v1.0.1

Published

5 months ago

Express middleware and utilities for transcribing audio/video files with OpenAI Whisper, sentiment analysis, and summarization

0High
0Medium
0Low

staffojo

transcription openai whisper sentiment-analysis summarization audio video express middleware speech-to-text

Audio Transcription API

A standalone Express.js API service for transcribing audio and video files from URLs using OpenAI's Whisper model, with built-in sentiment analysis and summarization capabilities.

This package can be used in two ways:

As an npm package - Install and use as middleware in your existing Express apps
As a standalone server - Run your own transcription API server

📦 Using as an npm Package

Installation

npm install @staffojo/transcribe

Quick Start

Option 1: Add Router to Existing Express App

const express = require('express');
const { createTranscriptionRouter } = require('@staffojo/transcribe');

const app = express();

// Set your OpenAI API key (or use process.env.OPENAI_API_KEY)
app.use('/api', createTranscriptionRouter({
  openaiApiKey: process.env.OPENAI_API_KEY,
  basePath: '/api'
}));

app.listen(3000, () => {
  console.log('Server running on port 3000');
  console.log('Transcription endpoint: http://localhost:3000/api/transcribe');
});

Option 2: Create Standalone App

const { createTranscriptionApp } = require('@staffojo/transcribe');

const app = createTranscriptionApp({
  openaiApiKey: process.env.OPENAI_API_KEY,
  basePath: '/api'
});

app.listen(3000, () => {
  console.log('Transcription API running on port 3000');
});

Option 3: Use Individual Utilities

const { utils } = require('@staffojo/transcribe');

// Use individual utilities
const { transcribeAudio, analyzeSentiment } = utils.openai;
const { downloadFile, cleanupFiles } = utils.file;
const { isAudioFile, validateFileSize } = utils.audio;

// Example: Transcribe a local file
const result = await transcribeAudio('/path/to/audio.mp3');
console.log(result.text);

Environment Variables

Make sure to set your OpenAI API key:

export OPENAI_API_KEY=sk-your-key-here

Or in your .env file:

OPENAI_API_KEY=sk-your-key-here
OPENAI_MODEL=gpt-4o-mini

API Usage (Same as Standalone)

Once integrated, use the same API endpoints:

curl -X POST http://localhost:3000/api/transcribe \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/audio.mp3"}'

🚀 Using as Standalone Server

✨ Quick Start

Get up and running in 3 steps:

# 1. Install dependencies
npm install

# 2. Set up your OpenAI API key
cp env.example .env
# Edit .env and add: OPENAI_API_KEY=sk-your-key-here

# 3. Start the server
npm run dev

Note: FFmpeg is only required if you plan to transcribe video files. For audio-only files, FFmpeg is optional!

Features

🎙️ Audio/Video Transcription - Transcribe audio and video files from URLs using OpenAI Whisper
🎭 Sentiment Analysis - Automatically analyze sentiment (positive, negative, neutral, mixed) with confidence scores and emotions
📝 Intelligent Summarization - Generate comprehensive summaries with customizable prompts
📊 Token Usage Tracking - Detailed tracking of OpenAI API usage and costs
🔍 Comprehensive Logging - Full request lifecycle logging for debugging and monitoring
✅ Error Handling - Robust error handling with automatic cleanup
🚀 Smart File Detection - Automatically detects audio vs video files and skips extraction when not needed

Prerequisites

Before you begin, ensure you have the following installed:

Node.js (v18.0.0 or higher)

node --version  # Should be v18.0.0 or higher

FFmpeg - Only required for video files (optional if you only need audio transcription)
- macOS: brew install ffmpeg
- Ubuntu/Debian: sudo apt-get install ffmpeg
- Windows: Download from FFmpeg official website
- You can skip this if you're only transcribing audio files!
OpenAI API Key - Get one from OpenAI Platform

Installation

Step 1: Clone or Download

git clone <repository-url>
cd transcribe

Step 2: Install Dependencies

npm install

Step 3: Configure Environment

Copy the example environment file:

cp env.example .env

Edit .env and add your OpenAI API key:

OPENAI_API_KEY=sk-your-actual-openai-api-key-here
OPENAI_MODEL=gpt-4o-mini
PORT=3000
NODE_ENV=development

Step 4: Verify Installation (Optional)

For video file support, verify FFmpeg is installed:

ffmpeg -version

If you see version information, you're good to go! If not, you can still use the service for audio files.

Usage

Starting the Server

Development mode (with auto-reload):

npm run dev

Production mode:

npm start

The server will start on http://localhost:3000 (or your configured PORT).

You should see:

🚀 Transcription API server running on port 3000
📍 Health check: http://localhost:3000/health
📍 Transcription endpoint: http://localhost:3000/api/transcribe
✓ FFmpeg found at: /usr/local/bin/ffmpeg

API Endpoint

POST `/api/transcribe`

Transcribes an audio or video file from a URL.

Request Body:

{
  "url": "https://example.com/audio.mp3",
  "summarizationPrompt": "Optional custom prompt for summarization"
}

Parameters:

url (required, string) - The URL of the audio/video file to transcribe
summarizationPrompt (optional, string) - Custom prompt for summarization. If not provided, uses a default research-focused prompt.

Response:

{
  "url": "https://example.com/audio.mp3",
  "transcription": "The transcribed text from the audio/video...",
  "sentiment": {
    "sentiment": "positive",
    "confidence": 0.95,
    "emotions": ["satisfied", "enthusiastic"],
    "summary": "The speaker expresses positive sentiment with high confidence.",
    "error": false,
    "processingTimeMs": 345,
    "tokenUsage": {
      "promptTokens": 150,
      "completionTokens": 50,
      "totalTokens": 200
    }
  },
  "summary": "Comprehensive summary of the transcribed content...",
  "metadata": {
    "transcriptionLength": 1234,
    "wordCount": 234,
    "processingTimes": {
      "download": 1234,
      "extraction": 567,
      "transcription": 890,
      "sentiment": 345,
      "summarization": 678,
      "cleanup": 12,
      "total": 4000
    },
    "tokenUsage": {
      "whisper": {
        "note": "Whisper API pricing is per minute of audio processed, not per token",
        "estimatedDurationMinutes": 1.23,
        "audioSizeMB": 12.34
      },
      "sentiment": {
        "promptTokens": 150,
        "completionTokens": 50,
        "totalTokens": 200
      },
      "summarization": {
        "promptTokens": 500,
        "completionTokens": 300,
        "totalTokens": 800
      },
      "total": {
        "promptTokens": 650,
        "completionTokens": 350,
        "totalTokens": 1000
      }
    },
    "fileWasAudio": true
  }
}

Example Requests

Using curl with an audio file:

curl -X POST http://localhost:3000/api/transcribe \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/audio.mp3"
  }'

Using curl with a video file:

curl -X POST http://localhost:3000/api/transcribe \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/video.webm"
  }'

Using JavaScript (fetch):

const response = await fetch('http://localhost:3000/api/transcribe', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://example.com/audio.mp3',
    summarizationPrompt: 'Summarize this as a product feedback review'
  })
});

const data = await response.json();
console.log('URL:', data.url);
console.log('Transcription:', data.transcription);
console.log('Sentiment:', data.sentiment);
console.log('Summary:', data.summary);

Health Check

Check if the server is running:

curl http://localhost:3000/health

Response:

{
  "status": "ok",
  "timestamp": "2024-01-01T00:00:00.000Z"
}

Configuration

Environment Variables

| Variable | Required | Default | Description | |----------|----------|---------|-------------| | OPENAI_API_KEY | Yes | - | Your OpenAI API key | | OPENAI_MODEL | No | gpt-4o-mini | Model to use for sentiment and summarization | | PORT | No | 3000 | Server port | | NODE_ENV | No | development | Environment mode |

Custom Summarization Prompt

You can customize the summarization behavior by:

Passing a custom prompt in the request:

{
  "url": "https://example.com/audio.mp3",
  "summarizationPrompt": "You are a customer service analyst. Summarize this customer feedback focusing on pain points and suggestions."
}

Editing the default prompt in the code: Edit src/utils/config.js and modify the DEFAULT_SUMMARIZATION_PROMPT constant.

Supported File Formats

The service automatically detects file types:

Audio files (direct transcription, no FFmpeg needed):
- MP3, WAV, M4A, OGG, FLAC, WebM (audio), MP4 (audio)
Video files (requires FFmpeg for audio extraction):
- WebM, MP4, AVI, MOV, MKV, etc. (any format supported by FFmpeg)
Max file size: 25MB (OpenAI Whisper API limit)

How It Works

The service intelligently handles both audio and video files:

Download - Downloads the file from the provided URL
Detect File Type - Automatically detects if the file is audio or video
Extract Audio (if video) - Uses FFmpeg to extract audio track from video files
- Skip this step if the file is already audio!
Transcribe - Sends audio to OpenAI Whisper API for transcription
Analyze Sentiment - Uses OpenAI GPT to analyze sentiment of transcribed text
Summarize - Uses OpenAI GPT to generate a comprehensive summary
Cleanup - Automatically removes temporary files

Logging

The service provides comprehensive logging with:

Request tracking with unique identifiers
Timing information for each processing step
Token usage tracking
Error details with stack traces
File size and processing metrics
Audio vs video file detection

Logs are written to stdout with the format:

[TranscribeAudio] INFO Request received { url: '...', method: 'POST' }
[TranscribeAudio] INFO File is already an audio file, skipping extraction { ... }
[TranscribeAudio] INFO Transcription completed successfully { ... }

Error Handling

The API handles various error scenarios:

Invalid or missing URLs
File download failures
FFmpeg extraction errors (only for video files)
OpenAI API errors
File size limit violations (25MB max)
Network timeouts

All errors return appropriate HTTP status codes with descriptive error messages.

API Costs

OpenAI API pricing (as of 2024):

Whisper: $0.006 per minute of audio
GPT-4o-mini: ~$0.15 per 1M input tokens, ~$0.60 per 1M output tokens
GPT-4o: ~$2.50 per 1M input tokens, ~$10 per 1M output tokens

The API tracks and logs all token usage for cost monitoring in the response metadata.

Troubleshooting

FFmpeg Not Found (for Video Files)

If you see "FFmpeg not found" errors when processing video files:

Check if FFmpeg is installed:
```
ffmpeg -version
```
Install FFmpeg:
- macOS: brew install ffmpeg
- Ubuntu/Debian: sudo apt-get install ffmpeg
- Windows: Download from FFmpeg website

Verify it's in your PATH:

which ffmpeg  # macOS/Linux
where ffmpeg # Windows

Note: If you're only transcribing audio files, you can ignore this error - FFmpeg is not required!

OpenAI API Key Error

Make sure your .env file contains a valid OpenAI API key:

OPENAI_API_KEY=sk-your-actual-key-here

File Too Large

OpenAI Whisper has a 25MB file size limit. If your file is larger:

Compress the audio/video file
Use a lower bitrate when extracting audio
Split longer files into smaller segments

Port Already in Use

If port 3000 is already in use, change the PORT in your .env file:

PORT=3001

Audio File Works, Video File Doesn't

If audio files work but video files fail, this means FFmpeg is not installed. Install FFmpeg using the instructions above, or stick to audio-only files if FFmpeg installation is not possible.

Development

Project Structure

transcribe/
├── src/
│   ├── index.js              # Main Express server
│   ├── routes/
│   │   └── transcribe.js     # Transcription route handler
│   └── utils/                # Utility modules
│       ├── logger.js          # Logging helper
│       ├── config.js          # Configuration (FFmpeg, prompts)
│       ├── fileUtils.js       # File operations
│       ├── audioUtils.js      # Audio detection & extraction
│       └── openaiUtils.js     # OpenAI API calls
├── package.json              # Dependencies and scripts
├── .env                      # Environment variables (not in git)
├── env.example              # Example environment file
├── .gitignore               # Git ignore rules
└── README.md                # This file

Adding Features

To extend the functionality:

Add new routes in src/routes/
Import and register routes in src/index.js
Create utility functions in src/utils/ if needed
Update dependencies in package.json

Common Use Cases

Transcribe Audio from URL

curl -X POST http://localhost:3000/api/transcribe \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/interview.mp3"}'

Transcribe Video from URL

curl -X POST http://localhost:3000/api/transcribe \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/meeting.webm"}'

Custom Summary Prompt

curl -X POST http://localhost:3000/api/transcribe \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/feedback.mp3",
    "summarizationPrompt": "Summarize this customer feedback, highlighting the top 3 concerns and suggestions."
  }'

License

MIT License - feel free to use this in your projects!

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Support

For issues and questions:

Check the troubleshooting section above
Review the logs for detailed error messages
Ensure all prerequisites are installed and configured
For audio-only use cases, FFmpeg is not required!

Built with ❤️ using OpenAI Whisper and GPT models

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

Audio Transcription API

📦 Using as an npm Package

Installation

Quick Start

Option 1: Add Router to Existing Express App

Option 2: Create Standalone App

Option 3: Use Individual Utilities

Environment Variables

API Usage (Same as Standalone)

🚀 Using as Standalone Server

✨ Quick Start

Features

Prerequisites

Installation

Step 1: Clone or Download

Step 2: Install Dependencies

Step 3: Configure Environment

Step 4: Verify Installation (Optional)

Usage

Starting the Server

API Endpoint

POST /api/transcribe

Example Requests

Health Check

Configuration

Environment Variables

Custom Summarization Prompt

Supported File Formats

How It Works

Logging

Error Handling

API Costs

Troubleshooting

FFmpeg Not Found (for Video Files)

OpenAI API Key Error

File Too Large

Port Already in Use

Audio File Works, Video File Doesn't

Development

Project Structure

Adding Features

Common Use Cases

Transcribe Audio from URL

Transcribe Video from URL

Custom Summary Prompt

License

Contributing

Support

POST `/api/transcribe`