@villutur/gemini-ai-lib
v0.6.6
Published
Reusable TypeScript wrappers and helpers for Google's Gemini API.
gemini-ai-lib
Reusable TypeScript wrappers around the Gemini SDK for applications, tools, and libraries that want a cleaner typed integration surface.
This is my Gemini AI library/helper that I use in my own projects. I am sharing it in case someone else finds it useful.
If you want a very basic place to try the current functionality and see implementation examples, take a look at gemini-ai-lib-playground.
PRs are welcome.
Changes In 0.6.6
- @google/genai is updated to ^2.6.0.
- The package now documents Node.js >=20.0.0, matching the upstream SDK runtime requirement.
Changes In 0.6.5
- gemini-3.5-flash is now included in the text and Interactions catalogs.
- Text and chat helpers now default to gemini-3.5-flash; the previous gemini-3-flash-preview model remains available in the catalogs for compatibility.
- Gemini 3.5 Flash uses thinkingLevel (minimal, low, medium, high) for thinking control. Google recommends leaving Gemini 3.x sampling parameters at their model defaults unless a consumer has a specific reason to override them.
Breaking Changes In 0.6.4
- @google/genai is upgraded to ^2.1.0. Consumers should run their normal typecheck after upgrading because SDK type names and Interactions response shapes changed.
- Interactions responses use the new steps schema. Read interaction.steps and stream step.* events instead of legacy outputs/content-delta shapes.
- The text and Interactions catalogs now use gemini-3.1-flash-lite instead of gemini-3.1-flash-lite-preview.
- The embedding catalog now uses gemini-embedding-2 instead of gemini-embedding-2-preview. Existing indexes still need a full reindex when switching between embedding model families.
- GeminiLiveChatSession now defaults to gemini-3.1-flash-live-preview. The previous gemini-2.5-flash-native-audio-preview-12-2025 model remains in GEMINI_LIVE_MODELS for compatibility.
- Gemini 3.1 Flash Live uses thinkingLevel (minimal, low, medium, high) instead of thinkingBudget; proactive audio and affective dialogue are not advertised for that model.
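For readers migrating from the legacy outputs shape, reading text out of the steps schema might look roughly like the sketch below. The SketchStep and SketchContent types are hypothetical simplifications that only mirror the fields used in this README's Interactions example (step.type, content.type, text); the "thought" step in the sample data is likewise invented for illustration, and the real SDK types are richer and may differ.

```typescript
// Hypothetical, simplified shapes; not the SDK's actual type definitions.
type SketchContent = { type: string; text?: string };
type SketchStep = { type: string; content?: SketchContent[] };

// Concatenate every text fragment found in "model_output" steps, in order.
function collectModelText(steps: SketchStep[]): string {
  let text = "";
  for (const step of steps) {
    if (step.type !== "model_output") continue;
    for (const content of step.content ?? []) {
      if (content.type === "text" && typeof content.text === "string") {
        text += content.text;
      }
    }
  }
  return text;
}

// Invented sample data: one non-output step and two output steps.
const exampleSteps: SketchStep[] = [
  { type: "thought" },
  { type: "model_output", content: [{ type: "text", text: "Risk: " }, { type: "image" }] },
  { type: "model_output", content: [{ type: "text", text: "schema drift." }] },
];

console.log(collectModelText(exampleSteps));
```

Unlike a single find(...) call, this variant gathers text from every model_output step, which may or may not be what a given consumer wants.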
Purpose
- Keep reusable Gemini SDK wiring out of individual projects
- Provide small, composable services for text, embeddings, chat, image, music, audio, video, and live workflows
- Let consuming projects own app-specific validation, model allowlists, route contracts, and user-facing error handling
Services
- GeminiBaseService: shared client setup, API key resolution, and tool configuration
- GeminiTextService: one-shot text generation helpers
- GeminiEmbeddingService: text and multimodal embedding helpers
- GeminiChatService: persistent multi-turn chat wrapper
- GeminiAudioService: text-to-speech generation helpers
- GeminiMusicService: non-realtime Lyria music generation helpers
- GeminiImageService: image generation and SVG generation helpers
- GeminiVideoService: Veo video generation and operation polling helpers
- GeminiInteractionsService: raw Gemini Interactions API wrapper for models and agents
- GeminiLiveChatSession: real-time live-session wrapper, currently client-side only
Helpers, Catalogs, and Metadata Exports
- createGeminiThinkingConfig(...): reusable low-level thinking-config helper
- createGeminiThinkingConfigForModel(...): model-aware helper that keeps thinkingLevel and thinkingBudget aligned with the selected Gemini model
- normalizeGeminiResponseMetadata(...): reusable response-metadata normalizer for latency, finish reason, response ids, and usage payloads
- GeminiAttachmentHelper: browser and server helpers for turning files and buffers into Gemini Part objects
- structured logging contracts and logger adapters for injecting app-owned sinks into Gemini request flows
- createGeminiLiveAudioWorkletModuleUrl(...): browser helper for creating a working AudioWorklet module URL for Gemini Live microphone capture
- GEMINI_LIVE_AUDIO_WORKLET_SOURCE: bundled worklet source for consumers that want to self-host or inspect the processor
- GEMINI_TEXT_MODELS: shared text-model list for consumer model pickers
- GEMINI_TEXT_MODEL_DISPLAY_NAMES: user-facing labels for known text models
- GEMINI_INTERACTION_MODELS: shared Interactions model list for consumer model pickers
- GEMINI_INTERACTION_MODEL_DISPLAY_NAMES: user-facing labels for known Interactions models
- GEMINI_INTERACTION_AGENTS: shared Interactions agent list, including Deep Research preview agents
- GEMINI_INTERACTION_AGENT_DISPLAY_NAMES: user-facing labels for known Interactions agents
- GEMINI_EMBEDDING_MODELS: shared embedding-model list for consumer model pickers
- GEMINI_EMBEDDING_MODEL_DISPLAY_NAMES: user-facing labels for known embedding models
- GEMINI_AUDIO_MODELS: shared audio/TTS model list for consumer model pickers
- GEMINI_AUDIO_MODEL_DISPLAY_NAMES: user-facing labels for known audio/TTS models
- GEMINI_AUDIO_VOICES: curated prebuilt Gemini voice names for audio/TTS pickers
- GEMINI_AUDIO_VOICE_CATALOG: richer voice metadata including sample URLs and descriptive traits
- GEMINI_MUSIC_MODELS: shared music-generation model list for consumer model pickers
- GEMINI_MUSIC_MODEL_DISPLAY_NAMES: user-facing labels for known music models
- GEMINI_VIDEO_MODELS: shared video-generation model list for consumer model pickers
- GEMINI_VIDEO_MODEL_DISPLAY_NAMES: user-facing labels for known video models
- GEMINI_IMAGE_MODELS: shared allowlist covering Gemini image models plus imagen-4.0-generate-001
- GEMINI_LIVE_MODELS: shared live-model list for real-time voice/video flows
- GEMINI_LIVE_MODEL_DISPLAY_NAMES: user-facing labels for known live models
- GEMINI_IMAGE_MODEL_CAPABILITIES: model-aware image limits and supported config options for dynamic UIs
- GEMINI_TEXT_MODEL_CAPABILITIES: model-aware text limits and supported config options for dynamic UIs
- GEMINI_INTERACTION_MODEL_CAPABILITIES: model-aware Interactions support hints and config options for dynamic UIs
- GEMINI_EMBEDDING_MODEL_CAPABILITIES: model-aware embedding limits and supported config options for dynamic UIs
- GEMINI_AUDIO_MODEL_CAPABILITIES: model-aware audio/TTS limits and supported config options for dynamic UIs
- GEMINI_MUSIC_MODEL_CAPABILITIES: model-aware music limits and supported config options for dynamic UIs
- GEMINI_VIDEO_MODEL_CAPABILITIES: model-aware video limits and supported config options for dynamic UIs
- GEMINI_LIVE_MODEL_CAPABILITIES: model-aware live-session limits and supported config options for dynamic UIs
- model-aware image handling that keeps Gemini image-model requests on their native output path while still allowing explicit output format control for Imagen where the API supports it
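To make the idea behind a model-aware thinking helper concrete, here is a rough, self-contained sketch of how thinkingLevel and thinkingBudget could be kept aligned with a model ID. It is not the library's implementation of createGeminiThinkingConfigForModel(...): the sketch* names are hypothetical, and the rule that IDs starting with "gemini-3" take thinkingLevel while other IDs take a numeric thinkingBudget is an assumption made for illustration.

```typescript
// Hypothetical sketch; the real helper's signature and rules may differ.
type SketchThinkingLevel = "minimal" | "low" | "medium" | "high";

interface SketchThinkingOptions {
  includeThoughts?: boolean;
  thinkingLevel?: SketchThinkingLevel;
  thinkingBudget?: number;
}

// Assumption: "gemini-3*" IDs use thinkingLevel; anything else uses thinkingBudget.
function sketchUsesThinkingLevel(model: string): boolean {
  return model.indexOf("gemini-3") === 0;
}

// Keep only the thinking control that matches the selected model family.
function sketchThinkingConfigForModel(model: string, options: SketchThinkingOptions): SketchThinkingOptions {
  const config: SketchThinkingOptions = {};
  if (options.includeThoughts !== undefined) {
    config.includeThoughts = options.includeThoughts;
  }
  if (sketchUsesThinkingLevel(model)) {
    if (options.thinkingLevel !== undefined) config.thinkingLevel = options.thinkingLevel;
  } else if (options.thinkingBudget !== undefined) {
    config.thinkingBudget = options.thinkingBudget;
  }
  return config;
}

const sketchRequested: SketchThinkingOptions = { thinkingLevel: "low", thinkingBudget: 1024 };
const sketchFor35 = sketchThinkingConfigForModel("gemini-3.5-flash", sketchRequested);
const sketchFor25 = sketchThinkingConfigForModel("gemini-2.5-flash", sketchRequested);

console.log(sketchFor35); // carries thinkingLevel only
console.log(sketchFor25); // carries thinkingBudget only
```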
Installation
pnpm add @villutur/gemini-ai-lib
Supported Models
The package exports model catalogs for the currently supported and curated model IDs below.
Text
- gemini-2.5-flash-lite
- gemini-2.5-flash
- gemini-2.5-pro
- gemini-3.5-flash
- gemini-3-flash-preview
- gemini-3.1-flash-lite
- gemini-3.1-pro-preview
Embeddings
- gemini-embedding-001
- gemini-embedding-2
Image
- gemini-2.5-flash-image
- gemini-3.1-flash-image-preview
- gemini-3-pro-image-preview
- imagen-4.0-generate-001
Audio
- gemini-3.1-flash-tts-preview
- gemini-2.5-flash-preview-tts
- gemini-2.5-pro-preview-tts
Music
- lyria-3-clip-preview
- lyria-3-pro-preview
Video
- veo-3.1-generate-preview
- veo-3.1-fast-generate-preview
Live
- gemini-3.1-flash-live-preview
- gemini-2.5-flash-native-audio-preview-12-2025
Interactions
Models:
- gemini-2.5-flash-lite
- gemini-2.5-flash
- gemini-2.5-pro
- gemini-3.1-flash-lite
- gemini-3.5-flash
- gemini-3-flash-preview
- gemini-3.1-pro-preview
- lyria-3-clip-preview
- lyria-3-pro-preview
Agents:
- deep-research-pro-preview-12-2025
- deep-research-preview-04-2026
- deep-research-max-preview-04-2026
Usage
Server-side usage is the default and preferred integration path.
GeminiLiveChatSession is the main exception: it currently depends
on browser APIs such as navigator.mediaDevices, AudioContext, and
AudioWorkletNode, so it only works in client-side runtime contexts.

import { GeminiTextService } from "@villutur/gemini-ai-lib";
const textService = new GeminiTextService({
  apiKey: process.env.GEMINI_API_KEY,
});
const response = await textService.generateTextString("Summarize the current rollout status in three bullets.", {
  model: "gemini-3.5-flash",
  systemInstruction: "Answer like a pragmatic product engineer. Be concise and explicit.",
  temperature: 0.4,
});

generateContent(...) is now the canonical one-shot text method. The older
generateText(...) method is still available as a deprecated backward-compatible
alias.
You can also import the most common Gemini SDK types directly from
@villutur/gemini-ai-lib instead of mixing imports from @google/genai:

import {
  GeminiTextService,
  type ContentListUnion,
  type GenerateContentResponse,
  type Part,
} from "@villutur/gemini-ai-lib";
const parts: Part[] = [
  { text: "Summarize this rollout update in two bullets." },
  { text: "Team A completed the migration, but monitoring still needs follow-up." },
];
const contents: ContentListUnion = [{ role: "user", parts }];
const textService = new GeminiTextService({
  apiKey: process.env.GEMINI_API_KEY,
});
const response: GenerateContentResponse = await textService.generateContent(contents, {
  model: "gemini-3.5-flash",
});

The root package also re-exports a curated image-focused subset of
@google/genai types and values such as ImageConfig,
GenerateImagesConfig, GenerateImagesResponse, GeneratedImage,
GeneratedImageMask, and PersonGeneration.
The root package also re-exports the SDK Interactions type namespace plus
library aliases such as GeminiInteraction, GeminiInteractionCreateParams,
GeminiInteractionSSEEvent, and common Interactions step/content types.
Persistent text chat can layer on top of GeminiChatService while still
letting the consuming project own validation and request shaping.

import { createGeminiTextChatHistory, GeminiChatService } from "@villutur/gemini-ai-lib";
const chatService = new GeminiChatService({
  apiKey: process.env.GEMINI_API_KEY,
  model: "gemini-3.5-flash",
  history: createGeminiTextChatHistory([
    {
      role: "user",
      text: "Give me the safest rollout order for Prompt Workbench v1.",
    },
    {
      role: "model",
      text: "Start with internal dogfooding, then expand to a small feature-flagged cohort.",
    },
  ]),
});
const text = await chatService.sendMessageString("Now add the top two risks and a rollback trigger.");

GeminiInteractionsService is a thin wrapper around the Gemini Interactions
API. It returns SDK responses unchanged and leaves session storage, tool
execution loops, retention UI, and user-facing error mapping to the consuming
app. The Interactions API is beta, so schemas may change.
Interaction objects are stored by default by the Gemini API (store=true) so
they can be retrieved, continued with previous_interaction_id, or run in the
background. Set store: false when a consumer explicitly wants to opt out, but
stateful continuation and background behavior depend on stored interactions.
For Gemini 3.5 Flash, use Interactions generationConfig.thinkingLevel
(minimal, low, medium, or high) for thinking effort. Function results
must match the preceding function-call id, name, and response count.

import { GeminiInteractionsService, type GeminiInteractionCreateParams } from "@villutur/gemini-ai-lib";
const interactions = new GeminiInteractionsService({
  apiKey: process.env.GEMINI_API_KEY,
});
// Start a stored interaction.
const first = await interactions.create({
  model: "gemini-3.5-flash",
  input: "Give me a one-sentence project risk summary.",
} satisfies GeminiInteractionCreateParams);
// Continue it by referencing the previous interaction id.
const followUp = await interactions.create({
  model: "gemini-3.5-flash",
  previous_interaction_id: first.id,
  input: "Now suggest the safest next action.",
});
// Retrieve the stored interaction and read the first text output from its steps.
const retrieved = await interactions.get(followUp.id, { include_input: true });
const finalText = retrieved.steps
  .find((step) => step.type === "model_output")
  ?.content?.find((content) => content.type === "text")?.text;
// Stream events for a new interaction.
const stream = await interactions.createStream({
  model: "gemini-3.5-flash",
  input: "Stream a concise status update.",
  stream: true,
});
for await (const event of stream) {
  console.log(event.event_type);
}
// Run an agent in the background, then cancel it.
const research = await interactions.create({
  agent: "deep-research-pro-preview-12-2025",
  input: "Research current SVG editor automation approaches.",
  background: true,
});
await interactions.cancel(research.id);
console.log(retrieved.status, finalText);

Projects can also inject their own structured logger adapter when they want Gemini request lifecycle events to land in an app-owned sink.

import { GeminiTextService, type LoggerAdapter } from "@villutur/gemini-ai-lib";
const logger: LoggerAdapter = {
  log(event) {
    console.log(event.source, event.level, event.message, event.metadata);
  },
};
const textService = new GeminiTextService({
  apiKey: process.env.GEMINI_API_KEY,
  logger,
});

Projects that own model policy in their own codebase can also reuse the library's thinking-config and response-metadata helpers without giving up control of route contracts or storage.

import {
  createGeminiThinkingConfigForModel,
  normalizeGeminiResponseMetadata,
  GeminiTextService,
} from "@villutur/gemini-ai-lib";
const service = new GeminiTextService({
  apiKey: process.env.GEMINI_API_KEY,
});
const result = await service.generateContent("Compare the rollout risks in three bullets.", {
  model: "gemini-3.1-pro-preview",
  thinkingConfig: createGeminiThinkingConfigForModel("gemini-3.1-pro-preview", {
    includeThoughts: false,
  }),
});
const telemetry = normalizeGeminiResponseMetadata(result);

Consumers can also render text-model selectors directly from shared exports:

import { GEMINI_TEXT_MODELS, GEMINI_TEXT_MODEL_DISPLAY_NAMES } from "@villutur/gemini-ai-lib";
const options = GEMINI_TEXT_MODELS.map((model) => ({
  value: model,
  label: GEMINI_TEXT_MODEL_DISPLAY_NAMES[model],
}));

Interactions model and agent pickers can use separate catalogs:

import {
  GEMINI_INTERACTION_AGENTS,
  GEMINI_INTERACTION_MODELS,
  getInteractionAgentDisplayName,
  getInteractionModelDisplayName,
} from "@villutur/gemini-ai-lib";
const interactionModelOptions = GEMINI_INTERACTION_MODELS.map((model) => ({
  value: model,
  label: getInteractionModelDisplayName(model),
}));
const interactionAgentOptions = GEMINI_INTERACTION_AGENTS.map((agent) => ({
  value: agent,
  label: getInteractionAgentDisplayName(agent),
}));

Embedding model pickers can use the same pattern:

import { GEMINI_EMBEDDING_MODELS, getEmbeddingModelDisplayName } from "@villutur/gemini-ai-lib";
const embeddingOptions = GEMINI_EMBEDDING_MODELS.map((model) => ({
  value: model,
  label: getEmbeddingModelDisplayName(model),
}));

Audio/TTS model pickers can use the same pattern:

import { GEMINI_AUDIO_MODELS, getAudioModelDisplayName } from "@villutur/gemini-ai-lib";
const audioOptions = GEMINI_AUDIO_MODELS.map((model) => ({
  value: model,
  label: getAudioModelDisplayName(model),
}));

Audio/TTS voice pickers can also use the exported curated voice catalog:

import { GEMINI_AUDIO_VOICES, GeminiAudioService } from "@villutur/gemini-ai-lib";
const selectedVoice = GEMINI_AUDIO_VOICES[0] ?? "Kore";
const audioService = new GeminiAudioService();
const audioBuffer = await audioService.generateAudio("Hello from Gemini TTS.", undefined, {
  model: "gemini-3.1-flash-tts-preview",
  voiceName: selectedVoice,
});

Music model pickers can use the same pattern:

import { GEMINI_MUSIC_MODELS, getMusicModelDisplayName } from "@villutur/gemini-ai-lib";
const musicOptions = GEMINI_MUSIC_MODELS.map((model) => ({
  value: model,
  label: getMusicModelDisplayName(model),
}));

Video model pickers can use the same pattern:

import { GEMINI_VIDEO_MODELS, getVideoModelDisplayName } from "@villutur/gemini-ai-lib";
const videoOptions = GEMINI_VIDEO_MODELS.map((model) => ({
  value: model,
  label: getVideoModelDisplayName(model),
}));

Live model pickers can use the same pattern:

import { GEMINI_LIVE_MODELS, getLiveModelDisplayName } from "@villutur/gemini-ai-lib";
const liveOptions = GEMINI_LIVE_MODELS.map((model) => ({
  value: model,
  label: getLiveModelDisplayName(model),
}));

Consumers that need model-aware config dialogs can build controls directly from capability and option exports:

import { getImageModelCapabilities, getImageModelConfigOptions } from "@villutur/gemini-ai-lib";
const modelId = "imagen-4.0-generate-001";
const capabilities = getImageModelCapabilities(modelId);
const optionDescriptors = getImageModelConfigOptions(modelId);
// Example policy in consumer code:
// max attachment slots = model limit - attachments already reserved elsewhere
const reservedAttachmentSlots = 2;
const maxReferenceImages = capabilities.attachmentLimits.maxReferenceImages ?? 0;
const remainingAttachmentBudget = Math.max(0, maxReferenceImages - reservedAttachmentSlots);

Embedding consumers can also drive model-aware retrieval controls from shared capability exports:

import { getEmbeddingModelCapabilities, getEmbeddingModelConfigOptions } from "@villutur/gemini-ai-lib";
const embeddingModel = "gemini-embedding-2";
const embeddingCapabilities = getEmbeddingModelCapabilities(embeddingModel);
const embeddingOptionDescriptors = getEmbeddingModelConfigOptions(embeddingModel);
const supportsMultimodalRetrieval = embeddingCapabilities.inputLimits.supportsMultimodalInput;
const recommendedDimensions = embeddingCapabilities.outputLimits.recommendedOutputDimensions;

Embedding support in v1 uses the stable Gemini API embedContent(...) flow.
Vertex-only options such as mimeType and autoTruncate are intentionally not
part of the typed runtime API.
Audio/TTS consumers can also drive their config controls from model-aware capability exports:

import {
  getAudioModelCapabilities,
  getAudioModelConfigOptions,
  getAudioVoiceOptions,
} from "@villutur/gemini-ai-lib";
const audioModel = "gemini-3.1-flash-tts-preview";
const audioCapabilities = getAudioModelCapabilities(audioModel);
const audioOptionDescriptors = getAudioModelConfigOptions(audioModel);
const audioVoices = getAudioVoiceOptions();
const supportsDialogue = audioCapabilities.speakerLimits.supportsMultiSpeaker;
const maxSpeakers = audioCapabilities.speakerLimits.maxSpeakers ?? 1;
const defaultVoice = audioCapabilities.defaultVoiceName;

GeminiAudioService.generateAudio(...) defaults to
gemini-3.1-flash-tts-preview. The model is documented as text input,
audio output, 8,192 input tokens, 16,384 output tokens, multi-speaker TTS with
up to two speakers, and Batch API capable.
The current @google/genai SDK surface exposes voiceName,
languageCode, responseModalities, and multiSpeakerVoiceConfig for TTS
request shaping. The Gemini docs also mention controls such as speaking rate,
pitch, and volume gain, but those are not exported here until the SDK exposes
a stable typed contract for them.
The exported voice catalog is consumer guidance metadata for building pickers
and defaults. GenerateAudioOptions.voiceName intentionally remains a plain
string so callers can stay forward-compatible with newly available voices.
GEMINI_LIVE_CONFIG_OPTIONS.voiceName uses the same curated voice-name list,
so Live clients can build consistent voice pickers from the same exported
catalog.
Music consumers can drive Lyria config controls from model-aware capability exports:

import { getMusicModelCapabilities, getMusicModelConfigOptions } from "@villutur/gemini-ai-lib";
const musicModel = "lyria-3-pro-preview";
const musicCapabilities = getMusicModelCapabilities(musicModel);
const musicOptionDescriptors = getMusicModelConfigOptions(musicModel);
const supportsImageGuidance = musicCapabilities.attachmentLimits.supportsImageInput;
const bestFor = musicCapabilities.outputLimits.bestFor;

Lyria music generation in v1 is intentionally based on the official stable
generateContent(...) flow. Richer controls such as BPM, intensity,
generate-lyrics, and vocal-type remain prompt-driven and are not part of the
typed runtime API yet.
Video consumers can also drive Veo controls from model-aware capability exports:

import { getVideoModelCapabilities, getVideoModelConfigOptions } from "@villutur/gemini-ai-lib";
const videoModel = "veo-3.1-fast-generate-preview";
const videoCapabilities = getVideoModelCapabilities(videoModel);
const videoOptionDescriptors = getVideoModelConfigOptions(videoModel);
const maxReferenceImages = videoCapabilities.attachmentLimits.maxReferenceImages ?? 0;
const supportsVideoExtension = videoCapabilities.attachmentLimits.supportsVideoInput;

Video generation is long-running and operation-based. The video service normalizes operation polling, while downloading generated files remains under consumer control through the underlying SDK client.
The official Gemini video docs may describe more knobs over time, but this
library only exports video config metadata for the stable @google/genai
2.x contract plus explicitly typed generateVideos(...) config fields.
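For readers unfamiliar with operation-based APIs, the general poll-until-done pattern for long-running work can be sketched as follows. This is a generic illustration, not the video service's actual polling code: SketchOperation, pollUntilDone, and the fake refresh callback are hypothetical names, and the real SDK operation type carries more fields.

```typescript
// Hypothetical minimal operation shape used only for this sketch.
type SketchOperation<T> = { done: boolean; response?: T; error?: { message: string } };

interface SketchPollOptions {
  intervalMs: number;
  maxAttempts: number;
  // Injectable sleep so tests and demos do not have to wait in real time.
  sleep?: (ms: number) => Promise<void>;
}

// Refresh the operation until it reports done, with a bounded number of attempts.
async function pollUntilDone<T>(
  initial: SketchOperation<T>,
  refresh: (current: SketchOperation<T>) => Promise<SketchOperation<T>>,
  options: SketchPollOptions,
): Promise<T> {
  const sleep = options.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  let current = initial;
  for (let attempt = 0; !current.done; attempt++) {
    if (attempt >= options.maxAttempts) {
      throw new Error("Operation did not finish within the polling budget.");
    }
    await sleep(options.intervalMs);
    current = await refresh(current);
  }
  if (current.error) throw new Error(current.error.message);
  if (current.response === undefined) throw new Error("Operation finished without a response.");
  return current.response;
}

// Demo: a fake operation that completes on the third refresh.
let sketchRefreshCount = 0;
const sketchPollDemo = pollUntilDone<string>(
  { done: false },
  async () => {
    sketchRefreshCount += 1;
    return sketchRefreshCount < 3 ? { done: false } : { done: true, response: "video-ready" };
  },
  { intervalMs: 0, maxAttempts: 10, sleep: async () => {} },
);
sketchPollDemo.then((value) => console.log(value, sketchRefreshCount));
```

Bounding the number of attempts keeps a stuck operation from polling forever; a consuming app would pick the interval and budget that fit its own timeout policy.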
Live-session UIs can use the same capability pattern:

import { getLiveModelCapabilities, getLiveModelConfigOptions } from "@villutur/gemini-ai-lib";
const liveModel = "gemini-3.1-flash-live-preview";
const liveCapabilities = getLiveModelCapabilities(liveModel);
const liveOptions = getLiveModelConfigOptions(liveModel);

Live sessions are currently client-side only because they depend on
browser audio and media APIs. GeminiLiveChatSession also ships with a bundled
working AudioWorklet, so you do not need to host your own
/audio-processor.js just to get started:

import { GeminiLiveChatSession } from "@villutur/gemini-ai-lib";
const liveSession = new GeminiLiveChatSession({
  apiKey: process.env.NEXT_PUBLIC_GEMINI_API_KEY,
  model: "gemini-3.1-flash-live-preview",
  systemInstruction: "You are a concise voice assistant.",
  voiceName: "Aoede",
  thinkingLevel: "minimal",
  onSetupComplete() {
    console.log("Live session ready.");
  },
  onOutputTranscription(text, isFinal) {
    console.log("Model said:", text, isFinal ? "(final)" : "(partial)");
  },
  onError(error) {
    console.error("Live session error:", error);
  },
});
await liveSession.connect("Say hello and ask how you can help.");

If you want to manage the worklet module URL explicitly, you can use the exported helper:

import {
  createGeminiLiveAudioWorkletModuleUrl,
  GeminiLiveChatSession,
  revokeGeminiLiveAudioWorkletModuleUrl,
} from "@villutur/gemini-ai-lib";
const audioWorkletModulePath = createGeminiLiveAudioWorkletModuleUrl();
try {
  const liveSession = new GeminiLiveChatSession({
    apiKey: process.env.NEXT_PUBLIC_GEMINI_API_KEY,
    audioWorkletModulePath,
  });
  await liveSession.connect();
} finally {
  revokeGeminiLiveAudioWorkletModuleUrl(audioWorkletModulePath);
}

Embedding generation supports both text-only and multimodal inputs:

import { GeminiEmbeddingService } from "@villutur/gemini-ai-lib";
const embeddingService = new GeminiEmbeddingService({
  apiKey: process.env.GEMINI_API_KEY,
});
const result = await embeddingService.embedText("How do I build a robust semantic search index?", {
  model: "gemini-embedding-001",
  taskType: "RETRIEVAL_QUERY",
  outputDimensionality: 768,
});
const firstVector = result.embedding;

You can also embed multiple text entries in one request:

const batchResult = await embeddingService.embedTexts(
  [
    "What is the meaning of life?",
    "What is the purpose of existence?",
    "How do I bake a cake?",
  ],
  {
    model: "gemini-embedding-001",
    taskType: "SEMANTIC_SIMILARITY",
  },
);
const vectors = batchResult.embeddings;

gemini-embedding-2 also supports multimodal embedding. One content
entry with multiple parts returns one aggregated embedding, while multiple
entries return multiple embeddings:

import { GeminiAttachmentHelper, GeminiEmbeddingService } from "@villutur/gemini-ai-lib";
import { readFile } from "node:fs/promises";
const imageBuffer = await readFile("./dog.png");
const imagePart = GeminiAttachmentHelper.CreateFromBuffer(imageBuffer, "image/png");
const embeddingService = new GeminiEmbeddingService({
  apiKey: process.env.GEMINI_API_KEY,
});
const result = await embeddingService.embedContent(
  {
    parts: [{ text: "An image of a dog" }, imagePart],
  },
  {
    model: "gemini-embedding-2",
    taskType: "RETRIEVAL_DOCUMENT",
    outputDimensionality: 1536,
  },
);
const aggregatedVector = result.embedding;

Reduced dimensions such as 768 and 1536 can be a strong storage/latency
tradeoff, but consumer-side normalization may still be appropriate for
similarity-focused use cases. Switching between gemini-embedding-001 and
gemini-embedding-2 requires reindexing because the vector spaces are
not interchangeable.
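As an illustration of the consumer-side normalization mentioned above, the following sketch L2-normalizes two vectors and compares them with a dot product, which equals cosine similarity once both vectors have unit length. The helper names and sample vectors are hypothetical; real embeddings would come from the embedding service and have hundreds or thousands of dimensions.

```typescript
// Scale a vector to unit length; a zero vector is returned unchanged (as a copy).
function l2Normalize(vector: number[]): number[] {
  let sumOfSquares = 0;
  for (const value of vector) sumOfSquares += value * value;
  const norm = Math.sqrt(sumOfSquares);
  if (norm === 0) return vector.slice();
  return vector.map((value) => value / norm);
}

// Dot product of two equally sized vectors.
function dotProduct(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimensionality.");
  }
  let total = 0;
  for (let i = 0; i < a.length; i++) total += a[i] * b[i];
  return total;
}

// Tiny made-up vectors standing in for reduced-dimension embeddings.
const unitA = l2Normalize([3, 4]);
const unitB = l2Normalize([4, 3]);
const cosineSimilarity = dotProduct(unitA, unitB);

console.log(cosineSimilarity); // approximately 0.96
```

Normalizing once at index time lets a vector store use plain dot products for similarity at query time.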
Music generation uses Lyria through the stable generateContent(...) path:

import { GeminiMusicService } from "@villutur/gemini-ai-lib";
const musicService = new GeminiMusicService({
  apiKey: process.env.GEMINI_API_KEY,
});
const result = await musicService.generateMusicFromPrompt(
  "Create a 30-second cheerful acoustic folk track with guitar and harmonica.",
  {
    model: "lyria-3-clip-preview",
  },
);
const firstClip = result.audioBuffer;
const description = result.text;

You can also guide Lyria with an image reference:

import { readFile } from "node:fs/promises";
import { GeminiAttachmentHelper, GeminiMusicService } from "@villutur/gemini-ai-lib";
const coverArt = await readFile("./reference-cover.jpg");
const imagePart = GeminiAttachmentHelper.CreateFromBuffer(coverArt, "image/jpeg");
const musicService = new GeminiMusicService({
  apiKey: process.env.GEMINI_API_KEY,
});
const result = await musicService.generateMusicFromImage(
  imagePart,
  "Create warm cinematic music that matches the color and mood of the image.",
  {
    model: "lyria-3-pro-preview",
  },
);

Lyria RealTime is out of scope for this package surface today and is tracked in
docs/future-work.md.
For lightweight UI/model-picker usage without importing runtime services, you can also import model catalogs directly from the subpath entry:

import {
  GEMINI_TEXT_MODELS,
  GEMINI_EMBEDDING_MODELS,
  GEMINI_AUDIO_MODELS,
  GEMINI_AUDIO_VOICES,
  GEMINI_AUDIO_VOICE_CATALOG,
  GEMINI_MUSIC_MODELS,
  GEMINI_VIDEO_MODELS,
  GEMINI_IMAGE_MODELS,
  GEMINI_LIVE_MODELS,
  GEMINI_INTERACTION_MODELS,
  GEMINI_INTERACTION_AGENTS,
  getInteractionAgentDisplayName,
  getInteractionModelDisplayName,
  getEmbeddingModelDisplayName,
  getAudioModelDisplayName,
  getAudioVoiceNames,
  getAudioVoiceOptions,
  getMusicModelDisplayName,
  getVideoModelDisplayName,
  getTextModelDisplayName,
  getLiveModelDisplayName,
} from "@villutur/gemini-ai-lib/model-catalogs";

Capability metadata is also available from a dedicated subpath entry:

import {
  GEMINI_EMBEDDING_MODEL_CAPABILITIES,
  GEMINI_EMBEDDING_CONFIG_OPTIONS,
  GEMINI_AUDIO_MODEL_CAPABILITIES,
  GEMINI_AUDIO_CONFIG_OPTIONS,
  GEMINI_AUDIO_VOICES,
  GEMINI_AUDIO_VOICE_CATALOG,
  GEMINI_MUSIC_MODEL_CAPABILITIES,
  GEMINI_MUSIC_CONFIG_OPTIONS,
  GEMINI_VIDEO_MODEL_CAPABILITIES,
  GEMINI_VIDEO_CONFIG_OPTIONS,
  GEMINI_IMAGE_MODEL_CAPABILITIES,
  GEMINI_IMAGE_CONFIG_OPTIONS,
  GEMINI_LIVE_MODEL_CAPABILITIES,
  GEMINI_LIVE_CONFIG_OPTIONS,
  GEMINI_INTERACTION_MODEL_CAPABILITIES,
  GEMINI_INTERACTION_CONFIG_OPTIONS,
  GEMINI_TEXT_MODEL_CAPABILITIES,
  GEMINI_TEXT_CONFIG_OPTIONS,
  getInteractionModelCapabilities,
  getInteractionModelConfigOptions,
  getEmbeddingModelCapabilities,
  getEmbeddingModelConfigOptions,
  getEmbeddingModelInputLimits,
  getAudioModelCapabilities,
  getAudioModelConfigOptions,
  getAudioModelSpeakerLimits,
  getAudioVoiceNames,
  getAudioVoiceOptions,
  getMusicModelAttachmentLimits,
  getMusicModelCapabilities,
  getMusicModelConfigOptions,
  getVideoModelAttachmentLimits,
  getVideoModelCapabilities,
  getVideoModelConfigOptions,
  getLiveModelCapabilities,
  getLiveModelFeatureFlags,
  getImageModelAttachmentLimits,
  getTextModelAttachmentLimits,
  getTextModelCapabilities,
  getTextModelConfigOptions,
} from "@villutur/gemini-ai-lib/model-capabilities";

Image-capable projects can also keep their own policy layer while reusing the shared image service and model list.
For Gemini image models, the Gemini API returns the model's native image
format. Do not assume outputMimeType is supported there. The library keeps
that behavior model-aware and only forwards explicit output-format controls to
Imagen-style routes where they are actually supported.
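The model-aware forwarding described above could be sketched like this. It is an illustration of the idea rather than the library's code: the sketch* names are hypothetical, and detecting Imagen models by an "imagen-" ID prefix is an assumption made for this example.

```typescript
// Hypothetical option and config shapes for this sketch only.
type SketchImageOptions = { model: string; aspectRatio?: string; outputMimeType?: string };
type SketchImageRequestConfig = { aspectRatio?: string; outputMimeType?: string };

// Assumption: Imagen-style models are recognizable by an "imagen-" prefix.
function sketchIsImagenModel(model: string): boolean {
  return model.indexOf("imagen-") === 0;
}

// Forward outputMimeType only for Imagen-style models; Gemini image models
// keep their native output format, so the option is dropped for them.
function sketchBuildImageConfig(options: SketchImageOptions): SketchImageRequestConfig {
  const config: SketchImageRequestConfig = {};
  if (options.aspectRatio !== undefined) {
    config.aspectRatio = options.aspectRatio;
  }
  if (options.outputMimeType !== undefined && sketchIsImagenModel(options.model)) {
    config.outputMimeType = options.outputMimeType;
  }
  return config;
}

const sketchGeminiConfig = sketchBuildImageConfig({
  model: "gemini-3.1-flash-image-preview",
  aspectRatio: "1:1",
  outputMimeType: "image/png",
});
const sketchImagenConfig = sketchBuildImageConfig({
  model: "imagen-4.0-generate-001",
  aspectRatio: "1:1",
  outputMimeType: "image/png",
});

console.log(sketchGeminiConfig); // aspectRatio only
console.log(sketchImagenConfig); // aspectRatio and outputMimeType
```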
GeminiImageService now has two layers:
- generateContent(...): a raw Gemini image-model wrapper around models.generateContent(...) that returns the SDK response unchanged
- generateImage(...) / generateImageFromPrompt(...): ergonomic helpers that normalize generated images into the existing GenerateImageResult structure

import { GEMINI_IMAGE_MODELS, GeminiImageService } from "@villutur/gemini-ai-lib";
const imageService = new GeminiImageService({
  apiKey: process.env.GEMINI_API_KEY,
});
const result = await imageService.generateImageFromPrompt("Create a clean geometric product logo.", {
  model: "gemini-3.1-flash-image-preview",
  aspectRatio: "1:1",
});

If you want the transparent Gemini image-model response instead of the
normalized helper result, use generateContent(...) directly:

import { readFile } from "node:fs/promises";
import {
  GeminiAttachmentHelper,
  GeminiImageService,
  type ContentListUnion,
  type GenerateContentResponse,
} from "@villutur/gemini-ai-lib";
const imageService = new GeminiImageService({
  apiKey: process.env.GEMINI_API_KEY,
});
// Load the reference image used as an attachment below.
const referenceBuffer = await readFile("./reference.png");
const contents: ContentListUnion = [
  {
    role: "user",
    parts: [
      { text: "Create a clean product-shot style render." },
      GeminiAttachmentHelper.CreateFromBuffer(referenceBuffer, "image/png"),
    ],
  },
];
const response: GenerateContentResponse = await imageService.generateContent(contents, {
  model: "gemini-3.1-flash-image-preview",
  config: {
    responseModalities: ["IMAGE", "TEXT"],
    imageConfig: {
      aspectRatio: "1:1",
      imageSize: "1K",
    },
  },
});

If you need explicit output-format control, use an Imagen-capable model:

const result = await imageService.generateImageFromPrompt("Create a clean geometric product logo.", {
  model: "imagen-4.0-generate-001",
  aspectRatio: "1:1",
  outputMimeType: "image/png",
});

Consumer-facing image capability metadata in this package is docs-first and conservative. In practice that means:
- Gemini image-model capability metadata does not advertise Imagen-only output format or compression controls
- gemini-2.5-flash-image is treated as a conservative 1K-tier helper model even though prose docs may describe a broader approximate pixel ceiling
- the raw generateContent(...) wrapper remains SDK-aligned, while getImageModelCapabilities(...) and getImageModelConfigOptions(...) stay optimized for app-facing UI and helper defaults
Video generation follows a long-running-operation flow:

import { GeminiVideoService } from "@villutur/gemini-ai-lib";
const videoService = new GeminiVideoService({
  apiKey: process.env.GEMINI_API_KEY,
});
const result = await videoService.generateVideoFromPrompt("A cinematic drone shot over a snowy forest at sunrise.", {
  model: "veo-3.1-fast-generate-preview",
  aspectRatio: "16:9",
  resolution: "1080p",
  durationSeconds: 8,
  generateAudio: true,
});
const firstVideo = result.generatedVideos[0]?.video;
if (firstVideo) {
  await videoService.getClient().files.download({
    file: firstVideo,
    downloadPath: "./veo-output.mp4",
  });
}

If you want to build multimodal request parts yourself, GeminiAttachmentHelper
can convert buffers or files into Gemini Part objects:

import { readFile } from "node:fs/promises";
import { GeminiAttachmentHelper, GeminiImageService } from "@villutur/gemini-ai-lib";
const imageBuffer = await readFile("./reference.png");
const referencePart = GeminiAttachmentHelper.CreateFromBuffer(imageBuffer, "image/png");
const imageService = new GeminiImageService({
  apiKey: process.env.GEMINI_API_KEY,
});
const result = await imageService.generateImage(
  [{ text: "Create a product-shot style render that matches the reference image lighting." }, referencePart],
  {
    model: "gemini-3.1-flash-image-preview",
    aspectRatio: "1:1",
  },
);

Environment Guidance
- Prefer GEMINI_API_KEY for server-side usage.
- NEXT_PUBLIC_GEMINI_API_KEY is treated as a deliberate browser-oriented fallback, not the default integration path.
- Projects should not depend on a public Gemini key unless the user explicitly wants a browser-side integration and accepts that tradeoff.
The base service now resolves keys in this order:
- explicit apiKey passed in code
- GEMINI_API_KEY
- NEXT_PUBLIC_GEMINI_API_KEY
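The resolution order above can be pictured with a small sketch. This is not the base service's actual code: sketchResolveApiKey and EnvLike are hypothetical names, the environment is passed in as a plain object so the example runs anywhere, and treating blank strings as missing is an assumption.

```typescript
// Minimal stand-in for process.env so the sketch has no runtime dependencies.
type EnvLike = { [name: string]: string | undefined };

// Return the first usable key: explicit value, then GEMINI_API_KEY,
// then NEXT_PUBLIC_GEMINI_API_KEY. Blank values are skipped (assumption).
function sketchResolveApiKey(explicitApiKey: string | undefined, env: EnvLike): string | undefined {
  const candidates = [explicitApiKey, env["GEMINI_API_KEY"], env["NEXT_PUBLIC_GEMINI_API_KEY"]];
  for (const candidate of candidates) {
    if (candidate !== undefined && candidate.trim() !== "") {
      return candidate;
    }
  }
  return undefined;
}

// Made-up environment for demonstration; never hard-code real keys.
const sketchEnv: EnvLike = {
  GEMINI_API_KEY: "server-key",
  NEXT_PUBLIC_GEMINI_API_KEY: "public-key",
};

console.log(sketchResolveApiKey("explicit-key", sketchEnv)); // explicit value wins
console.log(sketchResolveApiKey(undefined, sketchEnv)); // falls back to GEMINI_API_KEY
console.log(sketchResolveApiKey(undefined, { NEXT_PUBLIC_GEMINI_API_KEY: "public-key" })); // last resort
```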
When examples or app-level model catalogs need refreshing, use Google's official Gemini model index as the source of truth:
- https://ai.google.dev/gemini-api/docs/models.md.txt
Public Contract
- Import from the package name, not from src/ or sibling repo paths.
- Keep generic Gemini SDK concerns here.
- This library exports model limits and config-option metadata; consuming projects still own final UI policy, validation, and product-specific constraints.
- Keep reusable history shaping and portable chat-session helpers here when they are app-agnostic.
- Keep logger contracts and lifecycle emission generic here, while letting the consuming project own storage, retention, and log-history UI.
- Keep app-specific model allowlists, request validation, transport contracts, and user-facing error mapping in the consuming project.
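As one possible shape for the consumer-owned policy layer described in this list, an app might keep its own allowlist and validate incoming model IDs before calling any service. The names below (APP_ALLOWED_TEXT_MODELS, isAppTextModel, resolveRequestedModel) are hypothetical, and the two allowed IDs are simply picked from the text catalog for illustration.

```typescript
// App-owned allowlist: a deliberately narrower set than the shared catalog.
const APP_ALLOWED_TEXT_MODELS = ["gemini-3.5-flash", "gemini-3.1-flash-lite"] as const;
type AppTextModel = (typeof APP_ALLOWED_TEXT_MODELS)[number];

// Type guard for untrusted input such as a request body field.
function isAppTextModel(value: unknown): value is AppTextModel {
  return typeof value === "string" && (APP_ALLOWED_TEXT_MODELS as readonly string[]).indexOf(value) !== -1;
}

// Fall back to an app-chosen default when the requested model is not allowed.
function resolveRequestedModel(value: unknown, fallback: AppTextModel = "gemini-3.5-flash"): AppTextModel {
  return isAppTextModel(value) ? value : fallback;
}

console.log(resolveRequestedModel("gemini-3.1-flash-lite")); // allowed, returned as-is
console.log(resolveRequestedModel("gemini-2.5-pro")); // not in the app allowlist, falls back
console.log(resolveRequestedModel(42)); // non-string input, falls back
```

Whether to fall back silently or reject the request with an error is a product decision that stays in the consuming project.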
Development
Install dependencies:
pnpm install
Build the package:
pnpm build
Run watch mode:
pnpm dev
Run typecheck only:
pnpm typecheck
Release Workflow
This package uses a repo-safe manual release flow:
pnpm release:bump --patch
pnpm release:publish
Full release instructions, rollback guidance, and safety checks are documented in docs/release.md.
Repository Notes
- Public exports are defined in src/index.ts.
- Package entrypoints and build outputs are defined in package.json.
- The library currently ships both ESM and CJS output from dist/.
- Deferred library ideas and transport follow-ups live in docs/future-work.md.
- Repository-specific contributor guidance lives in AGENTS.md.
