gemini-gaio
v1.0.4
Published
A Gemini API service using @google/genai
Readme
Gemini-Gaio: The All-in-One Gemini SDK 🚀
Unleash the full power of Google Gemini with one SDK.
Text, images, audio, video, music, code, streaming, and more—modular, type-safe, and fun.
✨ Why Gemini-Gaio?
- All-in-One: Every Gemini API feature in one package—text, image, audio, video, TTS, music, streaming, function calling, context, files, code execution, and more.
- Type-Safe & Modern: Built with strict TypeScript for maximum safety and autocompletion.
- Production-Ready: Robust error handling, logging, and modular design.
- Real-World Proven: Used in production for multimodal, agentic, and creative AI apps.
- Easy to Use: Simple, consistent APIs for every Gemini capability.
- Fun & Creative: Designed for hackers, artists, educators, and AI dreamers.
👩💻 Who is this for?
- AI engineers & ML researchers
- Creative coders & indie hackers
- Educators & students
- Startups & product teams
- Hackathon warriors
- Anyone who wants to build with Gemini's full multimodal power!
🎨 What can you build?
- AI tutors that reason, explain, and code
- Music bots that jam in real time
- Video narrators that turn text or images into movies
- Multimodal chatbots that see, hear, and speak
- Document analyzers for contracts, PDFs, and more
- Voice assistants with TTS and streaming
- Creative apps: meme generators, art critics, podcast summarizers, and more!
🏗️ How it works
graph TD;
User((User / App))
subgraph SDK[Gemini-Gaio SDK]
Core["Core (gemini-service.ts)"]
Text[TextService]
Image[ImageService]
Audio[AudioService]
Video[VideoService]
Music[LyriaMusicService]
Streaming[LiveApiService]
TTS[TextToSpeechService]
FunctionCalling[FunctionCallingService]
Files[FilesApiService]
Context[ContextCacheService]
Code[CodeExecutionService]
Grounding[GroundingService]
Structured[StructuredOutputService]
UrlContext[UrlContextService]
Veo[VeoVideoService]
Document[DocumentService]
Thinking[ThinkingService]
Token[TokenService]
Utils[Utils/Types]
Logger[Logger/Error Handling]
end
GoogleAPI((Google Gemini API))
User-->|calls|Core
Core-->|uses|Text
Core-->|uses|Image
Core-->|uses|Audio
Core-->|uses|Video
Core-->|uses|Music
Core-->|uses|Streaming
Core-->|uses|TTS
Core-->|uses|FunctionCalling
Core-->|uses|Files
Core-->|uses|Context
Core-->|uses|Code
Core-->|uses|Grounding
Core-->|uses|Structured
Core-->|uses|UrlContext
Core-->|uses|Veo
Core-->|uses|Document
Core-->|uses|Thinking
Core-->|uses|Token
Core-->|uses|Utils
Core-->|uses|Logger
Text-->|API|GoogleAPI
Image-->|API|GoogleAPI
Audio-->|API|GoogleAPI
Video-->|API|GoogleAPI
Music-->|API|GoogleAPI
Streaming-->|API|GoogleAPI
TTS-->|API|GoogleAPI
FunctionCalling-->|API|GoogleAPI
Files-->|API|GoogleAPI
Context-->|API|GoogleAPI
Code-->|API|GoogleAPI
Grounding-->|API|GoogleAPI
Structured-->|API|GoogleAPI
UrlContext-->|API|GoogleAPI
Veo-->|API|GoogleAPI
Document-->|API|GoogleAPI
Thinking-->|API|GoogleAPI
Token-->|API|GoogleAPI
subgraph Internals
Utils
Logger
end
Core-->|uses|Internals🛠️ Features
- Text, Image, Audio, Video, and Document Understanding
- Structured Output (JSON, Enum, Schema)
- Veo Video Generation (Text-to-Video, Image-to-Video)
- Text-to-Speech (TTS, Multi-Speaker)
- Lyria RealTime Music Generation
- Live API (Real-Time Streaming, Bidirectional)
- URL Context & Google Search Grounding
- Context Caching
- File Management (Upload, List, Delete)
- Token Counting & Usage Metadata
- Function Calling (OpenAPI subset, Parallel)
- Chain-of-Thought & Reasoning
- Python Code Execution (with output parsing)
- Strict TypeScript Types & Autocompletion
🚀 Installation
npm install gemini-gaio⚡ Quick Start
import { TextService, GeminiModel } from 'gemini-gaio';
const text = new TextService(process.env.GEMINI_API_KEY!);
const result = await text.generateText({
model: GeminiModel.GEMINI_2_5_PRO_PREVIEW_05_06,
contents: 'Write a poem about AI.',
});
console.log(result.text);🧑🔬 Usage Examples
Multimodal Chaining
import { ImageService, GeminiModel } from 'gemini-gaio';
const image = new ImageService(process.env.GEMINI_API_KEY!);
const multimodal = [
{ text: 'Describe this image and the sound:' },
{ inlineData: { mimeType: 'image/png', data: base64Img } },
{ inlineData: { mimeType: 'audio/mp3', data: base64Audio } },
];
const result = await image.generateImage({
model: GeminiModel.GEMINI_2_0_FLASH_PREVIEW_IMAGE_GENERATION,
contents: multimodal,
});Real-Time Streaming
import { LiveApiService, GeminiModel } from 'gemini-gaio';
const live = new LiveApiService(process.env.GEMINI_API_KEY!);
const session = await live.connectSession({
model: GeminiModel.GEMINI_2_0_FLASH_LIVE_001,
responseModality: 'TEXT',
callbacks: { onmessage: (msg) => console.log(msg) },
});
await live.sendText(session, 'Hello!');
await live.closeSession(session);Function Calling
import { FunctionCallingService, GeminiModel } from 'gemini-gaio';
const fc = new FunctionCallingService(process.env.GEMINI_API_KEY!);
const fnDecl = FunctionCallingService.createFunctionDeclaration({
name: 'getWeather',
description: 'Get weather by city',
parameters: { type: 'object', properties: { city: { type: 'string' } }, required: ['city'] },
});
const resp = await fc.callWithFunctions({
model: GeminiModel.GEMINI_2_5_PRO_PREVIEW_05_06,
contents: 'What is the weather in Paris?',
functionDeclarations: [fnDecl],
});Advanced: Chaining, Streaming, and Error Handling
import { TextService, AudioService, Logger } from 'gemini-gaio';
Logger.enabled = true;
async function narrateAndSpeak(textPrompt: string) {
try {
const textService = new TextService(process.env.GEMINI_API_KEY!);
const audioService = new AudioService(process.env.GEMINI_API_KEY!);
const narration = await textService.generateText({
model: 'gemini-2.5-pro-preview-05-06',
contents: textPrompt,
});
const audio = await audioService.generateSingleSpeakerSpeech({
model: 'gemini-2.5-flash-preview-tts',
text: narration.text,
voiceName: 'Kore',
});
// ...play or save audio Buffer
} catch (err) {
Logger.error('Narration failed:', err);
}
}🧩 API Overview
All services are available from the main entry point:
import {
TextService,
ImageService,
DocumentService,
AudioService,
VideoService,
StructuredOutputService,
VeoVideoService,
TextToSpeechService,
LyriaMusicService,
LiveApiService,
UrlContextService,
ContextCacheService,
FilesApiService,
TokenService,
FunctionCallingService,
GroundingService,
ThinkingService,
CodeExecutionService,
GeminiModel,
Logger,
} from 'gemini-gaio';See the full API documentation for detailed usage of each service.
Service Usage Examples
TextService
import { TextService, GeminiModel } from 'gemini-gaio';
const text = new TextService(process.env.GEMINI_API_KEY!);
// Basic text generation
const result = await text.generateText({
model: GeminiModel.GEMINI_2_5_PRO_PREVIEW_05_06,
contents: 'Write a poem about AI.',
});
console.log(result.text);
// Streaming text
for await (const chunk of await text.generateTextStream({
model: GeminiModel.GEMINI_2_5_PRO_PREVIEW_05_06,
contents: 'Stream a story, one sentence at a time.',
})) {
console.log(chunk.text);
}
// Multi-turn chat
const chat = text.createChat(GeminiModel.GEMINI_2_5_PRO_PREVIEW_05_06);
const reply = await chat.sendMessage({ message: 'Hello, who are you?' });
console.log(reply.text);ImageService
import { ImageService, GeminiModel } from 'gemini-gaio';
const image = new ImageService(process.env.GEMINI_API_KEY!);
const multimodal = [
{ text: 'Describe this image:' },
{ inlineData: { mimeType: 'image/png', data: base64Img } },
];
const result = await image.generateImage({
model: GeminiModel.GEMINI_2_0_FLASH_PREVIEW_IMAGE_GENERATION,
contents: multimodal,
});
result.forEach((part) => {
if (part.type === 'image') {
// Save or display image (base64)
} else {
console.log(part.data);
}
});AudioService
import { AudioService, GeminiModel } from 'gemini-gaio';
const audio = new AudioService(process.env.GEMINI_API_KEY!);
// Text-to-speech (single speaker)
const tts = await audio.generateSingleSpeakerSpeech({
model: GeminiModel.GEMINI_2_5_FLASH_PREVIEW_TTS,
text: 'Hello world',
voiceName: 'Kore',
});
// ...play or save tts Buffer...
// Audio file analysis (transcription)
const transcript = await audio.analyzeAudioFile({
model: GeminiModel.GEMINI_2_5_PRO_PREVIEW_05_06,
filePath: './audio.mp3',
prompt: 'Transcribe this audio.',
});
console.log(transcript);
// Count tokens in audio
const tokens = await audio.countAudioTokens({
model: GeminiModel.GEMINI_2_5_PRO_PREVIEW_05_06,
filePath: './audio.mp3',
});
console.log(tokens);VideoService
import { VideoService, GeminiModel } from 'gemini-gaio';
const video = new VideoService(process.env.GEMINI_API_KEY!);
// Analyze a local video file
const summary = await video.analyzeVideoFile({
model: GeminiModel.GEMINI_2_5_PRO_PREVIEW_05_06,
filePath: './video.mp4',
prompt: 'Summarize this video.',
});
console.log(summary);
// Analyze a YouTube video
const ytSummary = await video.analyzeYoutubeVideo({
model: GeminiModel.GEMINI_2_5_PRO_PREVIEW_05_06,
youtubeUrl: 'https://youtube.com/xyz',
prompt: 'Summarize this YouTube video.',
});
console.log(ytSummary);DocumentService
import { DocumentService, GeminiModel } from 'gemini-gaio';
const doc = new DocumentService(process.env.GEMINI_API_KEY!);
// Summarize a PDF from a URL
const summary = await doc.summarizeFromUrl({
model: GeminiModel.GEMINI_2_5_PRO_PREVIEW_05_06,
url: 'https://example.com/file.pdf',
});
console.log(summary);
// Summarize a large local PDF
const summaryLarge = await doc.summarizeLargeFromFile({
model: GeminiModel.GEMINI_2_5_PRO_PREVIEW_05_06,
filePath: './file.pdf',
});
console.log(summaryLarge);
// Summarize multiple documents
const multi = await doc.summarizeMultiple({
model: GeminiModel.GEMINI_2_5_PRO_PREVIEW_05_06,
docs: [
{ file: './file1.pdf', displayName: 'Doc1' },
{ file: 'https://example.com/file2.pdf', displayName: 'Doc2', isUrl: true },
],
prompt: 'Summarize all docs',
});
console.log(multi);FilesApiService
import { FilesApiService } from 'gemini-gaio';
const files = new FilesApiService(process.env.GEMINI_API_KEY!);
// Upload a file
const uploaded = await files.uploadFile({ file: './file.pdf', mimeType: 'application/pdf' });
console.log(uploaded);
// List files
const listed = await files.listFiles();
console.log(listed);
// Get file metadata
const fileMeta = await files.getFile(uploaded.name);
console.log(fileMeta);
// Delete a file
await files.deleteFile(uploaded.name);FunctionCallingService
import { FunctionCallingService, GeminiModel } from 'gemini-gaio';
const fc = new FunctionCallingService(process.env.GEMINI_API_KEY!);
const fnDecl = FunctionCallingService.createFunctionDeclaration({
name: 'getWeather',
description: 'Get weather by city',
parameters: { type: 'object', properties: { city: { type: 'string' } }, required: ['city'] },
});
const resp = await fc.callWithFunctions({
model: GeminiModel.GEMINI_2_5_PRO_PREVIEW_05_06,
contents: 'What is the weather in Paris?',
functionDeclarations: [fnDecl],
});
console.log(resp);
// Parallel function calling
const respParallel = await fc.callWithParallelFunctions({
model: GeminiModel.GEMINI_2_5_PRO_PREVIEW_05_06,
contents: 'Get weather and news for Paris.',
functionDeclarations: [fnDecl /*, ...other functions */],
});
console.log(respParallel);CodeExecutionService
import { CodeExecutionService, GeminiModel } from 'gemini-gaio';
const codeExec = new CodeExecutionService(process.env.GEMINI_API_KEY!);
// Single-turn code execution (Python)
const parts = await codeExec.executeCode({
model: GeminiModel.GEMINI_2_0_FLASH,
prompt: 'Calculate the sum of the first 50 prime numbers in Python.',
});
console.log(parts);
// Chat-based code execution
const chatHistory = [
{ role: 'user', parts: [{ text: 'I have a math question for you:' }] },
{ role: 'model', parts: [{ text: "Great! I'm ready for your math question." }] },
];
const chatParts = await codeExec.executeCodeChat({
model: GeminiModel.GEMINI_2_0_FLASH,
history: chatHistory,
message: 'What is the sum of the first 50 prime numbers?',
});
console.log(chatParts);StructuredOutputService
import { StructuredOutputService, GeminiModel } from 'gemini-gaio';
const struct = new StructuredOutputService(process.env.GEMINI_API_KEY!);
const schema = { type: 'object', properties: { name: { type: 'string' } }, required: ['name'] };
const result = await struct.generateStructuredOutput({
model: GeminiModel.GEMINI_2_5_PRO_PREVIEW_05_06,
contents: 'Extract the name.',
config: { responseMimeType: 'application/json', responseSchema: schema },
});
console.log(result);VeoVideoService
import { VeoVideoService, GeminiModel } from 'gemini-gaio';
const veo = new VeoVideoService(process.env.GEMINI_API_KEY!);
// Generate video from text
const uris = await veo.generateVideoFromText({
prompt: 'A cat playing piano.',
model: GeminiModel.VEO_2_0_GENERATE_001,
});
console.log(uris);
// Download the video
await veo.downloadVideo(uris[0], 'cat.mp4');TextToSpeechService
import { TextToSpeechService, GeminiModel } from 'gemini-gaio';
const tts = new TextToSpeechService(process.env.GEMINI_API_KEY!);
// Single-speaker TTS
const audioBuffer = await tts.generateSingleSpeakerSpeech({
model: GeminiModel.GEMINI_2_5_FLASH_PREVIEW_TTS,
text: 'Hello world',
voiceName: 'Kore',
});
// Multi-speaker TTS
const audioMulti = await tts.generateMultiSpeakerSpeech({
model: GeminiModel.GEMINI_2_5_PRO_PREVIEW_TTS,
text: 'Amit: नमस्ते! John: Hello!',
speakers: [
{ speaker: 'Amit', voiceName: 'Sadaltager' },
{ speaker: 'John', voiceName: 'Kore' },
],
});LyriaMusicService
import { LyriaMusicService } from 'gemini-gaio';
const lyria = new LyriaMusicService(process.env.GEMINI_API_KEY!);
const session = lyria.connectSession({
onMessage: (msg) => {
/* handle PCM audio */
},
});
await lyria.setWeightedPrompts(session, [{ text: 'jazz', weight: 1 }]);
await lyria.play(session);LiveApiService
import { LiveApiService, GeminiModel } from 'gemini-gaio';
const live = new LiveApiService(process.env.GEMINI_API_KEY!);
const session = await live.connectSession({
model: GeminiModel.GEMINI_2_0_FLASH_LIVE_001,
responseModality: 'TEXT',
callbacks: {
onmessage: (msg) => {
/* handle stream */
},
},
});
await live.sendText(session, 'Hello!');
await live.closeSession(session);UrlContextService
import { UrlContextService, GeminiModel } from 'gemini-gaio';
const urlCtx = new UrlContextService(process.env.GEMINI_API_KEY!);
const result = await urlCtx.generateWithUrlContext({
model: GeminiModel.GEMINI_2_5_PRO_PREVIEW_05_06,
contents: 'Summarize https://en.wikipedia.org/wiki/AI',
});
console.log(result);ContextCacheService
import { ContextCacheService, GeminiModel } from 'gemini-gaio';
const cache = new ContextCacheService(process.env.GEMINI_API_KEY!);
const created = await cache.createCache({
model: GeminiModel.GEMINI_2_5_PRO_PREVIEW_05_06,
fileUris: [{ uri: 'gs://...', mimeType: 'application/pdf' }],
});
const result = await cache.generateWithCache({
model: GeminiModel.GEMINI_2_5_PRO_PREVIEW_05_06,
contents: 'Use the cached doc',
cacheName: created.name,
});
console.log(result);TokenService
import { TokenService, GeminiModel } from 'gemini-gaio';
const tokens = new TokenService(process.env.GEMINI_API_KEY!);
// Count tokens in text
const count = await tokens.countTextTokens(
GeminiModel.GEMINI_2_5_PRO_PREVIEW_05_06,
'How many tokens?',
);
console.log(count);🧑🏫 Best Practices
- API Key Management: Use environment variables, never hardcode keys.
- Streaming: Use streaming APIs for low-latency, real-time UX.
- Error Handling: Catch and log errors with the built-in Logger. Use try/catch in async flows.
- Prompt Design: Be explicit and clear in prompts for best results.
- Large Files: Use the File API for files >20MB.
- TypeScript: Leverage strict types for safety and autocompletion.
- Testing: Mock Gemini services for unit tests.
🧠 Technical Details
- Strict TypeScript: All services and types are fully typed.
- Error Handling: Custom error classes for API, validation, and file errors.
- Extensible: Add your own services or extend existing ones.
- Modular: Import only what you need.
- Performance: Uses async/await, streaming, and efficient batching where possible.
❓ FAQ
Q: Can I use this with Next.js, Vercel, or serverless? A: Yes! All services are pure TypeScript/Node and work in any modern JS runtime (Node 18+, serverless, etc).
Q: How do I handle large files?
A: Use the File API methods for files >20MB. See FilesApiService and DocumentService examples.
Q: How do I stream audio or text?
A: Use LiveApiService for real-time streaming, or the streaming methods on TextService and AudioService.
Q: Is this safe for production? A: Yes! Strict typing, error handling, and logging are built in. See Best Practices above.
Q: Can I chain multiple modalities? A: Absolutely! See the advanced chaining example above.
Q: How do I contribute? A: Fork, branch, code, PR! See the Contributing section below.
🌟 Showcase & Inspiration
- [ ] AI Podcast Summarizer: Summarize and transcribe podcasts with audio + text.
// AI Podcast Summarizer Example
import { AudioService, DocumentService, GeminiModel } from 'gemini-gaio';
const audio = new AudioService(process.env.GEMINI_API_KEY!);
const doc = new DocumentService(process.env.GEMINI_API_KEY!);
// 1. Transcribe podcast audio
const transcript = await audio.analyzeAudioFile({
model: GeminiModel.GEMINI_2_5_PRO_PREVIEW_05_06,
filePath: './podcast.mp3',
prompt: 'Transcribe this podcast episode.',
});
// 2. Summarize the transcript
const summary = await doc.summarizeFromUrl({
model: GeminiModel.GEMINI_2_5_PRO_PREVIEW_05_06,
url: 'data:text/plain;base64,' + Buffer.from(transcript).toString('base64'),
});
console.log('Summary:', summary);- [ ] Real-Time Music Jam Bot: Use LyriaMusicService for collaborative music generation.
// Real-Time Music Jam Bot Example
import { LyriaMusicService } from 'gemini-gaio';
const lyria = new LyriaMusicService(process.env.GEMINI_API_KEY!);
const session = lyria.connectSession({
onMessage: (msg) => {
// Handle PCM audio chunk (play or stream to users)
console.log('Music chunk received', msg);
},
onError: (err) => console.error('Music error:', err),
onClose: () => console.log('Session closed'),
});
await lyria.setWeightedPrompts(session, [
{ text: 'funky jazz', weight: 1 },
{ text: 'saxophone solo', weight: 0.5 },
]);
await lyria.play(session);- [ ] Video-to-Story Generator: Turn videos into illustrated stories with VideoService + ImageService.
// Video-to-Story Generator Example
import { VideoService, ImageService, GeminiModel } from 'gemini-gaio';
const video = new VideoService(process.env.GEMINI_API_KEY!);
const image = new ImageService(process.env.GEMINI_API_KEY!);
// 1. Summarize video into story scenes
const story = await video.analyzeVideoFile({
model: GeminiModel.GEMINI_2_5_PRO_PREVIEW_05_06,
filePath: './movie.mp4',
prompt: 'Break this video into 5 illustrated story scenes.',
});
// 2. Generate an image for each scene
const scenes = story.split('\n').filter(Boolean);
for (const scene of scenes) {
const img = await image.generateImage({
model: GeminiModel.GEMINI_2_0_FLASH_PREVIEW_IMAGE_GENERATION,
contents: [{ text: scene }],
});
// Save or display img[0].data (base64)
console.log('Scene:', scene, 'Image:', img[0].data);
}- [ ] Multimodal Chatbot: Build a bot that sees, hears, and speaks.
// Multimodal Chatbot Example
import { TextService, AudioService, ImageService, GeminiModel } from 'gemini-gaio';
const text = new TextService(process.env.GEMINI_API_KEY!);
const audio = new AudioService(process.env.GEMINI_API_KEY!);
const image = new ImageService(process.env.GEMINI_API_KEY!);
// User sends text, image, and audio
const userInput = [
{ text: 'What is happening in this image and audio?' },
{ inlineData: { mimeType: 'image/png', data: base64Img } },
{ inlineData: { mimeType: 'audio/mp3', data: base64Audio } },
];
const response = await text.generateText({
model: GeminiModel.GEMINI_2_5_PRO_PREVIEW_05_06,
contents: userInput,
});
console.log('Bot says:', response.text);
// Optionally, reply with TTS
const tts = await audio.generateSingleSpeakerSpeech({
model: GeminiModel.GEMINI_2_5_FLASH_PREVIEW_TTS,
text: response.text,
voiceName: 'Kore',
});
// ...play tts Buffer...- [ ] Contract Analyzer: Bulk-analyze legal docs with DocumentService.
// Contract Analyzer Example
import { DocumentService, GeminiModel } from 'gemini-gaio';
const doc = new DocumentService(process.env.GEMINI_API_KEY!);
const contracts = [
{ file: './contract1.pdf', displayName: 'Contract 1' },
{ file: './contract2.pdf', displayName: 'Contract 2' },
];
const analysis = await doc.summarizeMultiple({
model: GeminiModel.GEMINI_2_5_PRO_PREVIEW_05_06,
docs: contracts,
prompt: 'Summarize the key obligations and risks in each contract.',
});
console.log('Analysis:', analysis);📚 Learn More
🤝 Contributing
Contributions, bug reports, and feature requests are welcome! Please open an issue or pull request on GitHub.
- Fork the repo and create your branch from
main. - Add/fix your feature or bug.
- Add/adjust tests if needed.
- Open a PR and describe your changes.
💬 Support & Contact
- For questions, open a GitHub Issue
- For commercial support or consulting, contact Bryan Antoine
📝 License
MIT
