@allenhutchison/gemini-utils
v0.5.0
Shared utilities for Google Gemini AI projects - file upload, MIME validation, operation tracking, and deep research
Shared utilities for Google Gemini AI projects. Provides file upload, MIME validation, operation tracking for the Gemini File Search API, deep research capabilities, and audio transcription.
Installation
npm install @allenhutchison/gemini-utils @google/genai
Features
- File Upload: Upload files and directories to Gemini File Search stores
- Smart Sync: Skip unchanged files using SHA-256 hash comparison
- MIME Validation: Comprehensive MIME type detection with fallback support
- Progress Tracking: Real-time progress callbacks for upload operations
- Operation Management: Track long-running upload operations with customizable storage
- Deep Research: Run long-running research tasks with Gemini's deep research models
- Report Generation: Convert research outputs to formatted Markdown with citations
- Audio Transcription: Transcribe audio files with timestamps, speaker diarization, and multiple output formats (text, SRT, VTT, JSON)
- CLI: Full-featured command-line interface for all operations
CLI
Run commands directly with npx:
npx @allenhutchison/gemini-utils [command] [options]
Global Options
-j, --json Output in JSON format
-q, --quiet Suppress non-essential output
--api-key <key> API key (overrides GEMINI_API_KEY env var)
-v, --version Show version number
-h, --help Show help
Configuration
The CLI looks for an API key in this order:
- --api-key command line option
- GEMINI_API_KEY environment variable
- Config file at ~/.config/gemini-utils/config.json: { "apiKey": "your-api-key" }
- Env file at ~/.config/gemini-utils/.env: GEMINI_API_KEY=your-api-key
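The lookup order above can be sketched as a small first-match-wins function. This is an illustration of the precedence only, not the CLI's actual implementation: the KeySources shape and the helper names are hypothetical, and the file contents are passed in as strings so the sketch stays free of file-system access.

```typescript
// Hypothetical container for the four places a key can come from.
interface KeySources {
  cliOption?: string;      // value passed via --api-key
  envVar?: string;         // value of the GEMINI_API_KEY environment variable
  configFileJson?: string; // raw contents of ~/.config/gemini-utils/config.json
  envFileText?: string;    // raw contents of ~/.config/gemini-utils/.env
}

// Read the "apiKey" field from the config file contents, if present.
function keyFromConfigJson(raw?: string): string | undefined {
  if (!raw) return undefined;
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed === "object" && parsed !== null) {
      const value = (parsed as Record<string, unknown>)["apiKey"];
      return typeof value === "string" && value !== "" ? value : undefined;
    }
  } catch {
    // Malformed JSON: treat it as "no key here" and fall through.
  }
  return undefined;
}

// Find a GEMINI_API_KEY=... line in the env file contents, if present.
function keyFromEnvFile(raw?: string): string | undefined {
  if (!raw) return undefined;
  for (const line of raw.split(/\r?\n/)) {
    const match = /^\s*GEMINI_API_KEY\s*=\s*(.*)$/.exec(line);
    if (match) {
      const value = (match[1] ?? "").trim().replace(/^["']|["']$/g, "");
      if (value !== "") return value;
    }
  }
  return undefined;
}

// First non-empty source wins, in the documented order.
function resolveApiKey(sources: KeySources): string | undefined {
  return (
    sources.cliOption ||
    sources.envVar ||
    keyFromConfigJson(sources.configFileJson) ||
    keyFromEnvFile(sources.envFileText)
  );
}
```

With this ordering, a key passed on the command line always overrides one set in the environment or in either file.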
Commands
Stores
# List all stores
npx @allenhutchison/gemini-utils stores list
# Create a new store
npx @allenhutchison/gemini-utils stores create "My Documents"
# Get store details
npx @allenhutchison/gemini-utils stores get stores/abc123
# Delete a store
npx @allenhutchison/gemini-utils stores delete stores/abc123
npx @allenhutchison/gemini-utils stores delete stores/abc123 --force
Upload
# Upload a single file
npx @allenhutchison/gemini-utils upload file ./document.pdf stores/abc123
# Upload a directory
npx @allenhutchison/gemini-utils upload directory ./docs stores/abc123
# Upload with smart sync (skip unchanged files)
npx @allenhutchison/gemini-utils upload directory ./docs stores/abc123 --smart-sync
# Control concurrency
npx @allenhutchison/gemini-utils upload directory ./docs stores/abc123 -c 10
Documents
# List documents in a store
npx @allenhutchison/gemini-utils documents list stores/abc123
# Get document details
npx @allenhutchison/gemini-utils documents get stores/abc123/documents/xyz789
# Delete a document
npx @allenhutchison/gemini-utils documents delete stores/abc123/documents/xyz789
Research
# Start research and get ID
npx @allenhutchison/gemini-utils research start "What is quantum computing?"
# Start and wait for completion
npx @allenhutchison/gemini-utils research start "What is quantum computing?" --wait
# Start, wait, and save report to file
npx @allenhutchison/gemini-utils research start "What is quantum computing?" --wait --output report.md
# Use file search stores for grounding
npx @allenhutchison/gemini-utils research start "Summarize the documents" --stores stores/abc123,stores/def456 --wait
# Check status
npx @allenhutchison/gemini-utils research status interactions/abc123
# Poll until complete
npx @allenhutchison/gemini-utils research poll interactions/abc123 --output report.md
# Cancel or delete
npx @allenhutchison/gemini-utils research cancel interactions/abc123
npx @allenhutchison/gemini-utils research delete interactions/abc123
Query
# Query a store
npx @allenhutchison/gemini-utils query stores/abc123 "What does the documentation say about authentication?"
# Use a specific model
npx @allenhutchison/gemini-utils query stores/abc123 "Summarize the main points" --model gemini-2.0-flash
Transcribe
# Basic transcription
npx @allenhutchison/gemini-utils transcribe file recording.mp3
# Save transcript to file
npx @allenhutchison/gemini-utils transcribe file recording.mp3 -o transcript.txt
# Transcribe with timestamps
npx @allenhutchison/gemini-utils transcribe file recording.mp3 -t -o transcript.txt
# Generate SRT subtitles (SRT output works for both audio transcripts and video subtitles)
npx @allenhutchison/gemini-utils transcribe file audio.mp3 -t -f srt -o subtitles.srt
# Generate WebVTT subtitles
npx @allenhutchison/gemini-utils transcribe file audio.mp3 -t -f vtt -o subtitles.vtt
# Enable speaker diarization (identifies different speakers)
npx @allenhutchison/gemini-utils transcribe file meeting.wav -d -t -o meeting.txt
# Specify language hint
npx @allenhutchison/gemini-utils transcribe file spanish-audio.mp3 -l es -o transcript.txt
# Use a different model
npx @allenhutchison/gemini-utils transcribe file recording.mp3 -m gemini-3-pro-preview
# Upload transcript to a file search store
npx @allenhutchison/gemini-utils transcribe file recording.mp3 -s stores/abc123
# Output formats: text (default), timestamped, srt, vtt, json
npx @allenhutchison/gemini-utils transcribe file recording.mp3 -t -f json -o transcript.json
# Manage large audio files (> 20MB)
npx @allenhutchison/gemini-utils transcribe upload large-audio.mp3
npx @allenhutchison/gemini-utils transcribe list
npx @allenhutchison/gemini-utils transcribe delete files/abc123
JSON Output
Use --json for machine-readable output:
npx @allenhutchison/gemini-utils stores list --json | jq '.[] | .name'
Usage
Basic File Upload
import { GoogleGenAI } from '@google/genai';
import { FileUploader, FileSearchManager } from '@allenhutchison/gemini-utils';
const client = new GoogleGenAI({ apiKey: 'your-api-key' });
// Create a file search store
const manager = new FileSearchManager(client);
const store = await manager.createStore('My Documents');
// Upload a directory
const uploader = new FileUploader(client);
await uploader.uploadDirectory('./docs', store.name, {
  smartSync: true,
  onProgress: (event) => {
    console.log(`${event.type}: ${event.currentFile} (${event.percentage}%)`);
  },
});
MIME Type Validation
import {
  getMimeTypeWithFallback,
  isExtensionSupportedWithFallback
} from '@allenhutchison/gemini-utils';
// Check if a file type is supported
if (isExtensionSupportedWithFallback('.ts')) {
  const result = getMimeTypeWithFallback('app.ts');
  console.log(result); // { mimeType: 'text/plain', isFallback: true }
}
Operation Tracking
import { UploadOperationManager } from '@allenhutchison/gemini-utils';
const manager = new UploadOperationManager();
// Create an operation
const op = manager.createOperation('/path/to/files', 'storeName', true);
// Update progress
manager.updateProgress(op.id, 5, 2, 0);
// Mark complete
manager.markCompleted(op.id);
Custom Storage for Operations
import { UploadOperationManager, OperationStorage, UploadOperation } from '@allenhutchison/gemini-utils';
// Implement your own storage
class MyStorage implements OperationStorage {
  private db: Map<string, UploadOperation> = new Map();
  get(id: string) { return this.db.get(id); }
  set(id: string, op: UploadOperation) { this.db.set(id, op); }
  getAll() { return Object.fromEntries(this.db); }
}
const manager = new UploadOperationManager(new MyStorage());
Deep Research
import { GoogleGenAI } from '@google/genai';
import { ResearchManager, ReportGenerator } from '@allenhutchison/gemini-utils';
const client = new GoogleGenAI({ apiKey: 'your-api-key' });
const researcher = new ResearchManager(client);
// Start a research task
const interaction = await researcher.startResearch({
  input: 'What are the latest developments in quantum computing?',
  // Optional: ground with file search stores
  fileSearchStoreNames: ['stores/my-documents'],
});
// Poll until complete
const completed = await researcher.poll(interaction.id);
// Generate a markdown report with citations
const generator = new ReportGenerator();
const report = generator.generateMarkdown(completed.outputs ?? []);
console.log(report);
Research with Custom Polling
import { GoogleGenAI } from '@google/genai';
import { ResearchManager, isTerminalStatus } from '@allenhutchison/gemini-utils';
const client = new GoogleGenAI({ apiKey: 'your-api-key' });
const researcher = new ResearchManager(client);
const interaction = await researcher.startResearch({
  input: 'Analyze the impact of AI on healthcare',
});
// Custom polling with status updates
let status = await researcher.getStatus(interaction.id);
while (!isTerminalStatus(status.status ?? '')) {
  console.log(`Status: ${status.status}`);
  await new Promise(r => setTimeout(r, 10000));
  status = await researcher.getStatus(interaction.id);
}
if (status.status === 'completed') {
  console.log('Research complete!');
} else if (status.status === 'failed') {
  console.log('Research failed');
}
Audio Transcription
import { GoogleGenAI } from '@google/genai';
import { TranscriptionManager, TranscriptFormatter } from '@allenhutchison/gemini-utils';
const client = new GoogleGenAI({ apiKey: 'your-api-key' });
const transcriber = new TranscriptionManager(client);
// Basic transcription
const result = await transcriber.transcribe({
  audioSource: './recording.mp3',
});
console.log(result.text);
// Transcription with timestamps and speaker diarization
const detailed = await transcriber.transcribe({
  audioSource: './meeting.wav',
  timestamps: true,
  diarization: true,
  language: 'en',
  model: 'gemini-3-flash-preview',
});
// Access timestamped segments
for (const segment of detailed.segments ?? []) {
  console.log(`[${segment.startTime}s] ${segment.speaker}: ${segment.text}`);
}
// Format output in different formats
const formatter = new TranscriptFormatter();
const srt = formatter.toSRT(detailed); // SRT subtitles
const vtt = formatter.toVTT(detailed); // WebVTT subtitles
const json = formatter.toJSON(detailed); // Structured JSON
Large Audio File Upload
import { GoogleGenAI } from '@google/genai';
import { TranscriptionManager } from '@allenhutchison/gemini-utils';
const client = new GoogleGenAI({ apiKey: 'your-api-key' });
const transcriber = new TranscriptionManager(client);
// Upload large files (> 20MB) to Gemini File API first
const metadata = await transcriber.uploadAudioFile('./large-audio.mp3');
console.log(`Uploaded: ${metadata.uri}`);
// Then transcribe using the URI
const result = await transcriber.transcribe({
  audioSource: metadata.uri,
  timestamps: true,
});
// List and manage uploaded audio files
const files = await transcriber.listAudioFiles();
await transcriber.deleteAudioFile(files[0].name);
API Reference
FileSearchManager
Manages file search stores in Google's Gemini API.
- createStore(displayName) - Create a new store
- listStores() - List all stores
- getStore(name) - Get a store by name
- deleteStore(name, force?) - Delete a store
- queryStore(storeName, query, model?) - Query a store
- listDocuments(storeName) - List documents in a store
- deleteDocument(documentName) - Delete a document
FileUploader
Handles file uploads to Gemini File Search stores.
- uploadFile(filePath, storeName, config?) - Upload a single file
- uploadDirectory(dirPath, storeName, config?) - Upload a directory
- getExistingFileHashes(storeName) - Get hashes for smart sync
- getFileHash(filePath) - Compute SHA-256 hash of a file
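The idea behind smart sync (hash each local file, then skip any whose hash the store already knows) can be sketched as follows. This is a minimal illustration using Node's built-in crypto module, not the library's code: LocalFile, sha256Hex, and selectFilesToUpload are hypothetical names, and representing the store's known hashes as a set of hex strings is an assumption.

```typescript
import { createHash } from "node:crypto";

// Hypothetical in-memory view of a local file.
interface LocalFile {
  path: string;
  content: Uint8Array | string;
}

// Hex-encoded SHA-256 digest of the file content.
function sha256Hex(content: Uint8Array | string): string {
  return createHash("sha256").update(content).digest("hex");
}

// Keep only the files whose hash is not already present in the store.
// existingHashes is assumed to hold hex digests of previously uploaded files.
function selectFilesToUpload(
  files: LocalFile[],
  existingHashes: ReadonlySet<string>,
): LocalFile[] {
  return files.filter((file) => !existingHashes.has(sha256Hex(file.content)));
}
```

Because the comparison is on content rather than on modification time, a file that was touched but not changed is still skipped, while any edit, however small, produces a new hash and triggers a re-upload.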
MIME Utilities
- getMimeType(filePath) - Get validated MIME type
- getMimeTypeWithFallback(filePath) - Get MIME type with text/plain fallback
- isExtensionSupported(ext) - Check if extension is supported
- isExtensionSupportedWithFallback(ext) - Check including fallback extensions
- getSupportedExtensions() - List all supported extensions
- getFallbackExtensions() - List fallback extensions
ResearchManager
Manages deep research interactions with the Gemini API.
- startResearch(params) - Start a new research interaction
- getStatus(id) - Get current status and outputs of an interaction
- poll(id, intervalMs?) - Poll until research completes (default: 5s interval)
- cancel(id) - Cancel a running research interaction
- delete(id) - Delete a research interaction
ReportGenerator
Converts research outputs to formatted documents.
- generateMarkdown(outputs) - Generate markdown report with citations
TranscriptionManager
Manages audio transcription interactions with the Gemini API.
- transcribe(params, onProgress?) - Transcribe an audio file
- uploadAudioFile(filePath) - Upload large audio files to Gemini File API
- listAudioFiles() - List uploaded audio files
- deleteAudioFile(name) - Delete an uploaded audio file
- getStatus(id) - Get transcription interaction status
- poll(id, intervalMs?) - Poll until transcription completes (default: 2s interval)
TranscriptFormatter
Converts transcription results to various output formats.
- format(result, format) - Format result to specified format
- toPlainText(result) - Plain text output
- toTimestampedText(result) - Text with timestamps ([00:00:05] Hello...)
- toSRT(result) - SRT subtitle format
- toVTT(result) - WebVTT subtitle format
- toJSON(result) - Structured JSON output
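To make the SRT output format concrete, here is a rough sketch of turning timestamped segments into SRT cues. It is not the library's formatter: the Segment shape mirrors the documented TranscriptSegment fields, but treating startTime and endTime as numbers of seconds and prefixing the speaker label are assumptions, and srtTimestamp and segmentsToSrt are hypothetical names.

```typescript
// Assumed shape, modelled on the documented TranscriptSegment fields.
interface Segment {
  startTime: number; // seconds (assumption)
  endTime: number;   // seconds (assumption)
  text: string;
  speaker?: string;
}

// SRT timestamps look like HH:MM:SS,mmm (comma before the milliseconds).
function srtTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number, width: number): string => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Each cue is: a 1-based index, a time range line, the text, then a blank line.
function segmentsToSrt(segments: Segment[]): string {
  const cues = segments.map((seg, i) => {
    const line = seg.speaker ? `${seg.speaker}: ${seg.text}` : seg.text;
    return `${i + 1}\n${srtTimestamp(seg.startTime)} --> ${srtTimestamp(seg.endTime)}\n${line}`;
  });
  return cues.join("\n\n") + "\n";
}
```

WebVTT cues are similar in structure but start with a WEBVTT header line and use a period instead of a comma before the milliseconds.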
Transcription Types
- TranscribeParams - Configuration for transcription (audioSource, language, model, timestamps, diarization)
- TranscriptionResult - Result with text, segments, duration, and metadata
- TranscriptSegment - Timestamped segment with startTime, endTime, text, speaker
- TranscriptFormat - Output formats: text, timestamped, srt, vtt, json
- AudioFormat - Supported formats: mp3, wav, flac, aac, ogg, m4a, webm
Research Types
- StartResearchParams - Configuration for starting research
- Interaction - Research interaction object
- InteractionStatus - Status types: in_progress, requires_action, completed, failed, cancelled
- isTerminalStatus(status) - Check if status indicates completion
- TERMINAL_STATUSES - Array of terminal status values
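A terminal-status check combined with a bounded polling loop might look like the sketch below. It is an illustration only: which statuses count as terminal (completed, failed, cancelled) is an assumption based on their names, the getStatus callback stands in for whatever fetches the interaction, and the names TERMINAL, isTerminal, and pollUntilTerminal are hypothetical rather than the library's exports.

```typescript
type Status = "in_progress" | "requires_action" | "completed" | "failed" | "cancelled";

// Assumption: these are the statuses after which no further change is expected.
const TERMINAL: readonly Status[] = ["completed", "failed", "cancelled"];

function isTerminal(status: string): boolean {
  return (TERMINAL as readonly string[]).includes(status);
}

// Poll a status source until it reports a terminal status or attempts run out.
async function pollUntilTerminal(
  getStatus: () => Promise<{ status?: string }>,
  intervalMs: number,
  maxAttempts: number,
): Promise<string> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const { status } = await getStatus();
    if (status !== undefined && isTerminal(status)) return status;
    // Wait before asking again.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Status still not terminal after ${maxAttempts} attempts`);
}
```

Capping the number of attempts keeps a stuck interaction from blocking the caller forever. Note that a terminal status is not necessarily a successful one, so callers still need to check for failed or cancelled afterwards.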
Error Classes
- UnsupportedFileTypeError - Thrown for unsupported file types
- FileSizeExceededError - Thrown when file exceeds 100MB limit
- FileUploadError - Wrapper for upload failures
- UnsupportedAudioTypeError - Thrown for unsupported audio file types
- AudioFileSizeExceededError - Thrown when audio file exceeds 2GB limit
Supported File Types
The library supports 36 validated MIME types plus 100+ text file extensions via fallback:
Validated types: PDF, XML, HTML, Markdown, C, Java, Python, Go, Kotlin, Perl, Lua, Erlang, TCL, BibTeX, diff
Fallback (as text/plain): JavaScript, TypeScript, JSON, CSS, SCSS, YAML, TOML, Shell scripts, Ruby, PHP, Rust, Swift, Scala, and many more.
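The two-tier lookup described above (validated types first, then a text/plain fallback) can be sketched like this. The tables are deliberately tiny stand-ins for the real lists, the function names are hypothetical, and returning undefined for an unknown extension is an assumption, since the library's behavior in that case is not stated here.

```typescript
interface MimeResult {
  mimeType: string;
  isFallback: boolean;
}

// Illustrative subsets only; the real tables are much longer.
const VALIDATED_TYPES = new Map<string, string>([
  [".pdf", "application/pdf"],
  [".md", "text/markdown"],
  [".html", "text/html"],
]);
const FALLBACK_EXTENSIONS = new Set<string>([".ts", ".js", ".json", ".yaml"]);

// Lower-cased extension including the dot, or "" when there is none.
function extensionOf(filePath: string): string {
  const base = filePath.split(/[\\/]/).pop() ?? "";
  const dot = base.lastIndexOf(".");
  return dot > 0 ? base.slice(dot).toLowerCase() : "";
}

// Validated types win; known text extensions fall back to text/plain.
function lookupMimeWithFallback(filePath: string): MimeResult | undefined {
  const ext = extensionOf(filePath);
  const validated = VALIDATED_TYPES.get(ext);
  if (validated !== undefined) return { mimeType: validated, isFallback: false };
  if (FALLBACK_EXTENSIONS.has(ext)) return { mimeType: "text/plain", isFallback: true };
  return undefined; // assumption: unknown extensions are treated as unsupported
}
```

The isFallback flag lets a caller distinguish a file that is natively recognized from one that is merely being uploaded as plain text.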
Supported Audio Formats
For audio transcription, the following formats are supported:
- MP3 (audio/mpeg)
- WAV (audio/wav)
- FLAC (audio/flac)
- AAC (audio/aac)
- OGG (audio/ogg)
- M4A (audio/mp4)
- WebM (audio/webm)
File size limits:
- Files ≤ 100MB: Uploaded inline (base64)
- Files > 100MB or when the total inline request payload exceeds ~20MB: Uploaded via Gemini File API (max 2GB)
Note: The 20MB limit applies to the total inline request payload (including text, system instructions, and all inline content), not individual files. Prefer using the Files API whenever your total inline request size might approach this limit.
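Read literally, the limits above amount to a small decision rule, sketched below. This is not the library's logic: chooseUploadRoute and its return values are hypothetical, 1 MB is taken to mean 1024 * 1024 bytes, and the caller is assumed to supply its own estimate of the total inline request size.

```typescript
const MB = 1024 * 1024; // assumption: binary megabytes

const INLINE_FILE_LIMIT = 100 * MB;   // per-file inline limit stated above
const INLINE_PAYLOAD_LIMIT = 20 * MB; // approximate total inline request limit
const FILES_API_LIMIT = 2 * 1024 * MB; // 2GB maximum for the Files API

type UploadRoute = "inline" | "files-api" | "too-large";

// fileBytes: size of the audio file.
// totalInlinePayloadBytes: estimated size of the whole request if the file
// were sent inline (file content plus prompt text, system instructions, etc.).
function chooseUploadRoute(fileBytes: number, totalInlinePayloadBytes: number): UploadRoute {
  if (fileBytes > FILES_API_LIMIT) return "too-large";
  if (fileBytes > INLINE_FILE_LIMIT) return "files-api";
  if (totalInlinePayloadBytes > INLINE_PAYLOAD_LIMIT) return "files-api";
  return "inline";
}
```

Since the inline payload includes the file itself, the ~20MB total tends to be the limit that decides the route in practice, which matches the advice to prefer the Files API whenever a request might approach it.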
License
MIT
