@villutur/gemini-ai-lib

v0.6.6

Published

2 months ago

Reusable TypeScript wrappers and helpers for Google's Gemini API.

villutur

gemini google-genai ai typescript sdk multimodal

gemini-ai-lib

Reusable TypeScript wrappers around the Gemini SDK for applications, tools, and libraries that want a cleaner typed integration surface.

This is my Gemini AI library/helper that I use in my own projects. I am sharing it in case someone else finds it useful.

If you want a very basic place to try the current functionality and see implementation examples, take a look at gemini-ai-lib-playground.

PRs are welcome.

Changes In 0.6.6

@google/genai is updated to ^2.6.0.
The package now documents Node.js >=20.0.0, matching the upstream SDK runtime requirement.

Changes In 0.6.5

gemini-3.5-flash is now included in the text and Interactions catalogs.
Text and chat helpers now default to gemini-3.5-flash; the previous gemini-3-flash-preview model remains available in the catalogs for compatibility.
Gemini 3.5 Flash uses thinkingLevel (minimal, low, medium, high) for thinking control. Google recommends leaving Gemini 3.x sampling parameters at their model defaults unless a consumer has a specific reason to override them.

Breaking Changes In 0.6.4

@google/genai is upgraded to ^2.1.0. Consumers should run their normal typecheck after upgrading because SDK type names and Interactions response shapes changed.
Interactions responses use the new steps schema. Read interaction.steps and stream step.* events instead of legacy outputs/content-delta shapes.
The text and Interactions catalogs now use gemini-3.1-flash-lite instead of gemini-3.1-flash-lite-preview.
The embedding catalog now uses gemini-embedding-2 instead of gemini-embedding-2-preview. Existing indexes still need a full reindex when switching between embedding model families.
GeminiLiveChatSession now defaults to gemini-3.1-flash-live-preview. The previous gemini-2.5-flash-native-audio-preview-12-2025 model remains in GEMINI_LIVE_MODELS for compatibility.
Gemini 3.1 Flash Live uses thinkingLevel (minimal, low, medium, high) instead of thinkingBudget; proactive audio and affective dialogue are not advertised for that model.
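
The thinkingLevel migration above can be pictured with a small consumer-side sketch. It is not the library's implementation (createGeminiThinkingConfigForModel(...) is the supported helper). The names parseThinkingLevel and pickThinkingControl are hypothetical, the "low" fallback is arbitrary, and the "gemini-3" prefix rule is an assumption that ignores per-model differences in supported levels.

```typescript
// Sketch only: these names are hypothetical and not exported by @villutur/gemini-ai-lib.
const THINKING_LEVELS = ["minimal", "low", "medium", "high"] as const;
type ThinkingLevel = (typeof THINKING_LEVELS)[number];

type ThinkingControl = { thinkingLevel: ThinkingLevel } | { thinkingBudget: number };

// Narrow untrusted input (query params, form values) to one of the four documented levels.
function parseThinkingLevel(value: unknown, fallback: ThinkingLevel = "low"): ThinkingLevel {
  for (const level of THINKING_LEVELS) {
    if (value === level) return level;
  }
  return fallback;
}

// Assumption: model IDs that start with "gemini-3" take thinkingLevel,
// and every other model keeps the older token-based thinkingBudget.
function pickThinkingControl(model: string, level: ThinkingLevel, budgetTokens: number): ThinkingControl {
  if (/^gemini-3[.-]/.test(model)) {
    return { thinkingLevel: level };
  }
  return { thinkingBudget: budgetTokens };
}

const liveControl = pickThinkingControl("gemini-3.1-flash-live-preview", parseThinkingLevel("minimal"), 1024);
const legacyControl = pickThinkingControl("gemini-2.5-flash-native-audio-preview-12-2025", "low", 1024);
```

Keeping the two controls in one union type makes it harder to send both fields in the same request by accident.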

Purpose

Keep reusable Gemini SDK wiring out of individual projects
Provide small, composable services for text, embeddings, chat, image, music, audio, video, and live workflows
Let consuming projects own app-specific validation, model allowlists, route contracts, and user-facing error handling

Services

GeminiBaseService: shared client setup, API key resolution, and tool configuration
GeminiTextService: one-shot text generation helpers
GeminiEmbeddingService: text and multimodal embedding helpers
GeminiChatService: persistent multi-turn chat wrapper
GeminiAudioService: text-to-speech generation helpers
GeminiMusicService: non-realtime Lyria music generation helpers
GeminiImageService: image generation and SVG generation helpers
GeminiVideoService: Veo video generation and operation polling helpers
GeminiInteractionsService: raw Gemini Interactions API wrapper for models and agents
GeminiLiveChatSession: real-time live-session wrapper, currently client-side only

Helpers, Catalogs, and Metadata Exports

createGeminiThinkingConfig(...): reusable low-level thinking-config helper
createGeminiThinkingConfigForModel(...): model-aware helper that keeps thinkingLevel and thinkingBudget aligned with the selected Gemini model
normalizeGeminiResponseMetadata(...): reusable response-metadata normalizer for latency, finish reason, response ids, and usage payloads
GeminiAttachmentHelper: browser and server helpers for turning files and buffers into Gemini Part objects
structured logging contracts and logger adapters for injecting app-owned sinks into Gemini request flows
createGeminiLiveAudioWorkletModuleUrl(...): browser helper for creating a working AudioWorklet module URL for Gemini Live microphone capture
GEMINI_LIVE_AUDIO_WORKLET_SOURCE: bundled worklet source for consumers that want to self-host or inspect the processor
GEMINI_TEXT_MODELS: shared text-model list for consumer model pickers
GEMINI_TEXT_MODEL_DISPLAY_NAMES: user-facing labels for known text models
GEMINI_INTERACTION_MODELS: shared Interactions model list for consumer model pickers
GEMINI_INTERACTION_MODEL_DISPLAY_NAMES: user-facing labels for known Interactions models
GEMINI_INTERACTION_AGENTS: shared Interactions agent list, including Deep Research preview agents
GEMINI_INTERACTION_AGENT_DISPLAY_NAMES: user-facing labels for known Interactions agents
GEMINI_EMBEDDING_MODELS: shared embedding-model list for consumer model pickers
GEMINI_EMBEDDING_MODEL_DISPLAY_NAMES: user-facing labels for known embedding models
GEMINI_AUDIO_MODELS: shared audio/TTS model list for consumer model pickers
GEMINI_AUDIO_MODEL_DISPLAY_NAMES: user-facing labels for known audio/TTS models
GEMINI_AUDIO_VOICES: curated prebuilt Gemini voice names for audio/TTS pickers
GEMINI_AUDIO_VOICE_CATALOG: richer voice metadata including sample URLs and descriptive traits
GEMINI_MUSIC_MODELS: shared music-generation model list for consumer model pickers
GEMINI_MUSIC_MODEL_DISPLAY_NAMES: user-facing labels for known music models
GEMINI_VIDEO_MODELS: shared video-generation model list for consumer model pickers
GEMINI_VIDEO_MODEL_DISPLAY_NAMES: user-facing labels for known video models
GEMINI_IMAGE_MODELS: shared allowlist covering Gemini image models plus imagen-4.0-generate-001
GEMINI_LIVE_MODELS: shared live-model list for real-time voice/video flows
GEMINI_LIVE_MODEL_DISPLAY_NAMES: user-facing labels for known live models
GEMINI_IMAGE_MODEL_CAPABILITIES: model-aware image limits and supported config options for dynamic UIs
GEMINI_TEXT_MODEL_CAPABILITIES: model-aware text limits and supported config options for dynamic UIs
GEMINI_INTERACTION_MODEL_CAPABILITIES: model-aware Interactions support hints and config options for dynamic UIs
GEMINI_EMBEDDING_MODEL_CAPABILITIES: model-aware embedding limits and supported config options for dynamic UIs
GEMINI_AUDIO_MODEL_CAPABILITIES: model-aware audio/TTS limits and supported config options for dynamic UIs
GEMINI_MUSIC_MODEL_CAPABILITIES: model-aware music limits and supported config options for dynamic UIs
GEMINI_VIDEO_MODEL_CAPABILITIES: model-aware video limits and supported config options for dynamic UIs
GEMINI_LIVE_MODEL_CAPABILITIES: model-aware live-session limits and supported config options for dynamic UIs
model-aware image handling that keeps Gemini image-model requests on their native output path while still allowing explicit output format control for Imagen where the API supports it

Installation

pnpm add @villutur/gemini-ai-lib

Supported Models

The package exports model catalogs for the currently supported and curated model IDs below.

Text

gemini-2.5-flash-lite
gemini-2.5-flash
gemini-2.5-pro
gemini-3.5-flash
gemini-3-flash-preview
gemini-3.1-flash-lite
gemini-3.1-pro-preview

Embeddings

gemini-embedding-001
gemini-embedding-2

Image

gemini-2.5-flash-image
gemini-3.1-flash-image-preview
gemini-3-pro-image-preview
imagen-4.0-generate-001

Audio

gemini-3.1-flash-tts-preview
gemini-2.5-flash-preview-tts
gemini-2.5-pro-preview-tts

Music

lyria-3-clip-preview
lyria-3-pro-preview

Video

veo-3.1-generate-preview
veo-3.1-fast-generate-preview

Live

gemini-3.1-flash-live-preview
gemini-2.5-flash-native-audio-preview-12-2025

Interactions

Models:

gemini-2.5-flash-lite
gemini-2.5-flash
gemini-2.5-pro
gemini-3.1-flash-lite
gemini-3.5-flash
gemini-3-flash-preview
gemini-3.1-pro-preview
lyria-3-clip-preview
lyria-3-pro-preview

Agents:

deep-research-pro-preview-12-2025
deep-research-preview-04-2026
deep-research-max-preview-04-2026

Usage

Server-side usage is the default and preferred integration path.

GeminiLiveChatSession is the main exception: it depends on browser APIs such as navigator.mediaDevices, AudioContext, and AudioWorkletNode, so it only works in client-side runtime contexts.

import { GeminiTextService } from "@villutur/gemini-ai-lib";

const textService = new GeminiTextService({
  apiKey: process.env.GEMINI_API_KEY,
});

const response = await textService.generateTextString("Summarize the current rollout status in three bullets.", {
  model: "gemini-3.5-flash",
  systemInstruction: "Answer like a pragmatic product engineer. Be concise and explicit.",
  temperature: 0.4,
});

generateContent(...) is now the canonical one-shot text method. The older generateText(...) method is still available as a deprecated backward-compatible alias.

You can also import the most common Gemini SDK types directly from @villutur/gemini-ai-lib instead of mixing imports from @google/genai:

import {
  GeminiTextService,
  type ContentListUnion,
  type GenerateContentResponse,
  type Part,
} from "@villutur/gemini-ai-lib";

const parts: Part[] = [
  { text: "Summarize this rollout update in two bullets." },
  { text: "Team A completed the migration, but monitoring still needs follow-up." },
];

const contents: ContentListUnion = [{ role: "user", parts }];

const textService = new GeminiTextService({
  apiKey: process.env.GEMINI_API_KEY,
});

const response: GenerateContentResponse = await textService.generateContent(contents, {
  model: "gemini-3.5-flash",
});

The root package also re-exports a curated image-focused subset of @google/genai types and values such as ImageConfig, GenerateImagesConfig, GenerateImagesResponse, GeneratedImage, GeneratedImageMask, and PersonGeneration.

The root package also re-exports the SDK Interactions type namespace plus library aliases such as GeminiInteraction, GeminiInteractionCreateParams, GeminiInteractionSSEEvent, and common Interactions step/content types.

Persistent text chat can layer on top of GeminiChatService while still letting the consuming project own validation and request shaping.

import { createGeminiTextChatHistory, GeminiChatService } from "@villutur/gemini-ai-lib";

const chatService = new GeminiChatService({
  apiKey: process.env.GEMINI_API_KEY,
  model: "gemini-3.5-flash",
  history: createGeminiTextChatHistory([
    {
      role: "user",
      text: "Give me the safest rollout order for Prompt Workbench v1.",
    },
    {
      role: "model",
      text: "Start with internal dogfooding, then expand to a small feature-flagged cohort.",
    },
  ]),
});

const text = await chatService.sendMessageString("Now add the top two risks and a rollback trigger.");

GeminiInteractionsService is a thin wrapper around the Gemini Interactions API. It returns SDK responses unchanged and leaves session storage, tool execution loops, retention UI, and user-facing error mapping to the consuming app. The Interactions API is beta, so schemas may change.

The Gemini API stores Interaction objects by default (store=true) so they can be retrieved, continued with previous_interaction_id, or run in the background. Set store: false only when a consumer explicitly wants to opt out, because stateful continuation and background execution depend on stored interactions. For Gemini 3.5 Flash, use Interactions generationConfig.thinkingLevel (minimal, low, medium, or high) to control thinking effort. Function results must match the preceding function calls in id, name, and response count.
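
Since tool execution loops stay in the consuming app, a pre-flight check for the function-result rule might look like the sketch below. The FunctionCallLike and FunctionResultLike shapes, the callId field, and findFunctionResultProblems are all hypothetical. The real SDK step types carry more fields and may name them differently.

```typescript
// Hypothetical shapes: check the SDK Interactions types for the real field names.
interface FunctionCallLike {
  id: string;
  name: string;
}

interface FunctionResultLike {
  callId: string;
  name: string;
}

// Returns a list of human-readable problems; an empty list means the results line up.
function findFunctionResultProblems(calls: FunctionCallLike[], results: FunctionResultLike[]): string[] {
  const problems: string[] = [];
  if (calls.length !== results.length) {
    problems.push(`Expected ${calls.length} result(s) but got ${results.length}.`);
  }
  for (const call of calls) {
    let match: FunctionResultLike | undefined;
    for (const result of results) {
      if (result.callId === call.id) {
        match = result;
        break;
      }
    }
    if (match === undefined) {
      problems.push(`Missing result for call ${call.id}.`);
    } else if (match.name !== call.name) {
      problems.push(`Result for call ${call.id} is named ${match.name}, expected ${call.name}.`);
    }
  }
  return problems;
}
```

Running a check like this before sending the follow-up request turns a server-side rejection into an app-level error the consumer can map however it wants.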

import { GeminiInteractionsService, type GeminiInteractionCreateParams } from "@villutur/gemini-ai-lib";

const interactions = new GeminiInteractionsService({
  apiKey: process.env.GEMINI_API_KEY,
});

const first = await interactions.create({
  model: "gemini-3.5-flash",
  input: "Give me a one-sentence project risk summary.",
} satisfies GeminiInteractionCreateParams);

const followUp = await interactions.create({
  model: "gemini-3.5-flash",
  previous_interaction_id: first.id,
  input: "Now suggest the safest next action.",
});

const retrieved = await interactions.get(followUp.id, { include_input: true });
const finalText = retrieved.steps
  .find((step) => step.type === "model_output")
  ?.content?.find((content) => content.type === "text")?.text;

const stream = await interactions.createStream({
  model: "gemini-3.5-flash",
  input: "Stream a concise status update.",
  stream: true,
});

for await (const event of stream) {
  console.log(event.event_type);
}

const research = await interactions.create({
  agent: "deep-research-pro-preview-12-2025",
  input: "Research current SVG editor automation approaches.",
  background: true,
});

await interactions.cancel(research.id);
console.log(retrieved.status, finalText);
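
The example above reads only the first text part of the first model_output step. If a response spreads text across several steps or parts, a small collector along these lines may help. It is a sketch with local stand-in types (InteractionStepLike and StepContentLike are assumptions modeled on the fields used above, not the SDK types), and collectModelText is a hypothetical helper.

```typescript
// Local stand-ins for the step shape used in the example above; the SDK types are richer.
interface StepContentLike {
  type: string;
  text?: string;
}

interface InteractionStepLike {
  type: string;
  content?: StepContentLike[];
}

// Concatenate every text part from every model_output step, in order.
function collectModelText(steps: InteractionStepLike[]): string {
  const pieces: string[] = [];
  for (const step of steps) {
    if (step.type !== "model_output") continue;
    for (const content of step.content ?? []) {
      if (content.type === "text" && typeof content.text === "string") {
        pieces.push(content.text);
      }
    }
  }
  return pieces.join("");
}
```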

Projects can also inject their own structured logger adapter when they want Gemini request lifecycle events to land in an app-owned sink.

import { GeminiTextService, type LoggerAdapter } from "@villutur/gemini-ai-lib";

const logger: LoggerAdapter = {
  log(event) {
    console.log(event.source, event.level, event.message, event.metadata);
  },
};

const textService = new GeminiTextService({
  apiKey: process.env.GEMINI_API_KEY,
  logger,
});
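
An app-owned sink does not have to write to the console. The sketch below keeps the most recent events in memory, which can be handy for a debug panel. LogEventLike is a local assumption based on the four fields used above (source, level, message, metadata), and createBufferedLogger is a hypothetical helper, not a library export.

```typescript
// Assumed event shape, taken from the fields the example above reads.
interface LogEventLike {
  source: string;
  level: string;
  message: string;
  metadata?: unknown;
}

// Keeps only the newest maxEntries events; older ones are dropped from the front.
function createBufferedLogger(maxEntries: number) {
  const entries: LogEventLike[] = [];
  return {
    log(event: LogEventLike): void {
      entries.push(event);
      if (entries.length > maxEntries) {
        entries.shift();
      }
    },
    // Return a copy so callers cannot mutate the internal buffer.
    snapshot(): LogEventLike[] {
      return entries.slice();
    },
  };
}
```

Whether the returned object satisfies LoggerAdapter as-is depends on the exact exported type, so treat the log(event) signature as the part to verify.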

Projects that own model policy in their own codebase can also reuse the library's thinking-config and response-metadata helpers without giving up control of route contracts or storage.

import {
  createGeminiThinkingConfigForModel,
  normalizeGeminiResponseMetadata,
  GeminiTextService,
} from "@villutur/gemini-ai-lib";

const service = new GeminiTextService({
  apiKey: process.env.GEMINI_API_KEY,
});

const result = await service.generateContent("Compare the rollout risks in three bullets.", {
  model: "gemini-3.1-pro-preview",
  thinkingConfig: createGeminiThinkingConfigForModel("gemini-3.1-pro-preview", {
    includeThoughts: false,
  }),
});

const telemetry = normalizeGeminiResponseMetadata(result);

Consumers can also render text-model selectors directly from shared exports:

import { GEMINI_TEXT_MODELS, GEMINI_TEXT_MODEL_DISPLAY_NAMES } from "@villutur/gemini-ai-lib";

const options = GEMINI_TEXT_MODELS.map((model) => ({
  value: model,
  label: GEMINI_TEXT_MODEL_DISPLAY_NAMES[model],
}));

Interactions model and agent pickers can use separate catalogs:

import {
  GEMINI_INTERACTION_AGENTS,
  GEMINI_INTERACTION_MODELS,
  getInteractionAgentDisplayName,
  getInteractionModelDisplayName,
} from "@villutur/gemini-ai-lib";

const interactionModelOptions = GEMINI_INTERACTION_MODELS.map((model) => ({
  value: model,
  label: getInteractionModelDisplayName(model),
}));

const interactionAgentOptions = GEMINI_INTERACTION_AGENTS.map((agent) => ({
  value: agent,
  label: getInteractionAgentDisplayName(agent),
}));

Embedding model pickers can use the same pattern:

import { GEMINI_EMBEDDING_MODELS, getEmbeddingModelDisplayName } from "@villutur/gemini-ai-lib";

const embeddingOptions = GEMINI_EMBEDDING_MODELS.map((model) => ({
  value: model,
  label: getEmbeddingModelDisplayName(model),
}));

Audio/TTS model pickers can use the same pattern:

import { GEMINI_AUDIO_MODELS, getAudioModelDisplayName } from "@villutur/gemini-ai-lib";

const audioOptions = GEMINI_AUDIO_MODELS.map((model) => ({
  value: model,
  label: getAudioModelDisplayName(model),
}));

Audio/TTS voice pickers can also use the exported curated voice catalog:

import { GEMINI_AUDIO_VOICES, GeminiAudioService } from "@villutur/gemini-ai-lib";

const selectedVoice = GEMINI_AUDIO_VOICES[0] ?? "Kore";
const audioService = new GeminiAudioService();

const audioBuffer = await audioService.generateAudio("Hello from Gemini TTS.", undefined, {
  model: "gemini-3.1-flash-tts-preview",
  voiceName: selectedVoice,
});

Music model pickers can use the same pattern:

import { GEMINI_MUSIC_MODELS, getMusicModelDisplayName } from "@villutur/gemini-ai-lib";

const musicOptions = GEMINI_MUSIC_MODELS.map((model) => ({
  value: model,
  label: getMusicModelDisplayName(model),
}));

Video model pickers can use the same pattern:

import { GEMINI_VIDEO_MODELS, getVideoModelDisplayName } from "@villutur/gemini-ai-lib";

const videoOptions = GEMINI_VIDEO_MODELS.map((model) => ({
  value: model,
  label: getVideoModelDisplayName(model),
}));

Live model pickers can use the same pattern:

import { GEMINI_LIVE_MODELS, getLiveModelDisplayName } from "@villutur/gemini-ai-lib";

const liveOptions = GEMINI_LIVE_MODELS.map((model) => ({
  value: model,
  label: getLiveModelDisplayName(model),
}));

Consumers that need model-aware config dialogs can build controls directly from capability and option exports:

import { getImageModelCapabilities, getImageModelConfigOptions } from "@villutur/gemini-ai-lib";

const modelId = "imagen-4.0-generate-001";
const capabilities = getImageModelCapabilities(modelId);
const optionDescriptors = getImageModelConfigOptions(modelId);

// Example policy in consumer code:
// max attachment slots = model limit - attachments already reserved elsewhere
const reservedAttachmentSlots = 2;
const maxReferenceImages = capabilities.attachmentLimits.maxReferenceImages ?? 0;
const remainingAttachmentBudget = Math.max(0, maxReferenceImages - reservedAttachmentSlots);

Embedding consumers can also drive model-aware retrieval controls from shared capability exports:

import { getEmbeddingModelCapabilities, getEmbeddingModelConfigOptions } from "@villutur/gemini-ai-lib";

const embeddingModel = "gemini-embedding-2";
const embeddingCapabilities = getEmbeddingModelCapabilities(embeddingModel);
const embeddingOptionDescriptors = getEmbeddingModelConfigOptions(embeddingModel);

const supportsMultimodalRetrieval = embeddingCapabilities.inputLimits.supportsMultimodalInput;
const recommendedDimensions = embeddingCapabilities.outputLimits.recommendedOutputDimensions;

Embedding support in v1 uses the stable Gemini API embedContent(...) flow. Vertex-only options such as mimeType and autoTruncate are intentionally not part of the typed runtime API.

Audio/TTS consumers can also drive their config controls from model-aware capability exports:

import {
  getAudioModelCapabilities,
  getAudioModelConfigOptions,
  getAudioVoiceOptions,
} from "@villutur/gemini-ai-lib";

const audioModel = "gemini-3.1-flash-tts-preview";
const audioCapabilities = getAudioModelCapabilities(audioModel);
const audioOptionDescriptors = getAudioModelConfigOptions(audioModel);
const audioVoices = getAudioVoiceOptions();

const supportsDialogue = audioCapabilities.speakerLimits.supportsMultiSpeaker;
const maxSpeakers = audioCapabilities.speakerLimits.maxSpeakers ?? 1;
const defaultVoice = audioCapabilities.defaultVoiceName;

GeminiAudioService.generateAudio(...) defaults to gemini-3.1-flash-tts-preview. The model is documented as taking text input and producing audio output, with limits of 8,192 input tokens and 16,384 output tokens, multi-speaker TTS with up to two speakers, and Batch API support.

The current @google/genai SDK surface exposes voiceName, languageCode, responseModalities, and multiSpeakerVoiceConfig for TTS request shaping. The Gemini docs also mention controls such as speaking rate, pitch, and volume gain, but those are not exported here until the SDK exposes a stable typed contract for them.

The exported voice catalog is consumer guidance metadata for building pickers and defaults. GenerateAudioOptions.voiceName intentionally remains a plain string so callers can stay forward-compatible with newly available voices.
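
Because voiceName stays a plain string, a picker can show the curated names and still preserve a saved voice that is not in the catalog. The following is a rough consumer-side sketch. VoiceOption and buildVoiceOptions are hypothetical names, and the curated list is passed in as an argument rather than imported so the snippet stands alone.

```typescript
interface VoiceOption {
  value: string;
  label: string;
  curated: boolean;
}

// Build picker options from a curated list, appending the selected voice if it is not curated.
function buildVoiceOptions(curatedVoices: readonly string[], selectedVoice?: string): VoiceOption[] {
  const options: VoiceOption[] = [];
  for (const voice of curatedVoices) {
    options.push({ value: voice, label: voice, curated: true });
  }
  if (selectedVoice !== undefined && curatedVoices.indexOf(selectedVoice) === -1) {
    options.push({ value: selectedVoice, label: `${selectedVoice} (custom)`, curated: false });
  }
  return options;
}

// In an app, the first argument would typically be GEMINI_AUDIO_VOICES.
const voiceOptions = buildVoiceOptions(["Kore", "Aoede"], "Aoede");
```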

GEMINI_LIVE_CONFIG_OPTIONS.voiceName uses the same curated voice-name list, so Live clients can build consistent voice pickers from the same exported catalog.

Music consumers can drive Lyria config controls from model-aware capability exports:

import { getMusicModelCapabilities, getMusicModelConfigOptions } from "@villutur/gemini-ai-lib";

const musicModel = "lyria-3-pro-preview";
const musicCapabilities = getMusicModelCapabilities(musicModel);
const musicOptionDescriptors = getMusicModelConfigOptions(musicModel);

const supportsImageGuidance = musicCapabilities.attachmentLimits.supportsImageInput;
const bestFor = musicCapabilities.outputLimits.bestFor;

Lyria music generation in v1 is intentionally based on the official stable generateContent(...) flow. Richer controls such as BPM, intensity, generate-lyrics, and vocal-type remain prompt-driven and are not part of the typed runtime API yet.

Video consumers can also drive Veo controls from model-aware capability exports:

import { getVideoModelCapabilities, getVideoModelConfigOptions } from "@villutur/gemini-ai-lib";

const videoModel = "veo-3.1-fast-generate-preview";
const videoCapabilities = getVideoModelCapabilities(videoModel);
const videoOptionDescriptors = getVideoModelConfigOptions(videoModel);

const maxReferenceImages = videoCapabilities.attachmentLimits.maxReferenceImages ?? 0;
const supportsVideoExtension = videoCapabilities.attachmentLimits.supportsVideoInput;

Video generation is long-running and operation-based. The video service normalizes operation polling, while downloading generated files remains under consumer control through the underlying SDK client.
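
For readers unfamiliar with operation-based APIs, the general polling pattern looks roughly like the sketch below. This is a generic illustration, not the video service's internals. OperationLike, pollUntilDone, and the refresh callback are assumptions, and a real app would call the SDK inside refresh.

```typescript
// Generic long-running-operation shape; an assumption for illustration only.
interface OperationLike<T> {
  done?: boolean;
  response?: T;
  error?: unknown;
}

// Refresh the operation at a fixed interval until it reports done or attempts run out.
async function pollUntilDone<T>(
  initial: OperationLike<T>,
  refresh: (current: OperationLike<T>) => Promise<OperationLike<T>>,
  options: { intervalMs: number; maxAttempts: number },
): Promise<OperationLike<T>> {
  let current = initial;
  for (let attempt = 0; attempt < options.maxAttempts && !current.done; attempt += 1) {
    await new Promise<void>((resolve) => setTimeout(resolve, options.intervalMs));
    current = await refresh(current);
  }
  if (!current.done) {
    throw new Error(`Operation still pending after ${options.maxAttempts} attempts.`);
  }
  if (current.error !== undefined) {
    throw new Error("Operation finished with an error.");
  }
  return current;
}
```

A bounded attempt count keeps a stuck operation from holding a request handler open indefinitely; the right interval and limit depend on the consumer's timeouts.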

The official Gemini video docs may describe more knobs over time, but this library only exports video config metadata for the stable @google/genai 2.x contract plus explicitly typed generateVideos(...) config fields.

Live-session UIs can use the same capability pattern:

import { getLiveModelCapabilities, getLiveModelConfigOptions } from "@villutur/gemini-ai-lib";

const liveModel = "gemini-3.1-flash-live-preview";
const liveCapabilities = getLiveModelCapabilities(liveModel);
const liveOptions = getLiveModelConfigOptions(liveModel);

Live sessions are currently client-side only because they depend on browser audio and media APIs. GeminiLiveChatSession also ships with a bundled working AudioWorklet, so you do not need to host your own /audio-processor.js just to get started:

import { GeminiLiveChatSession } from "@villutur/gemini-ai-lib";

const liveSession = new GeminiLiveChatSession({
  apiKey: process.env.NEXT_PUBLIC_GEMINI_API_KEY,
  model: "gemini-3.1-flash-live-preview",
  systemInstruction: "You are a concise voice assistant.",
  voiceName: "Aoede",
  thinkingLevel: "minimal",
  onSetupComplete() {
    console.log("Live session ready.");
  },
  onOutputTranscription(text, isFinal) {
    console.log("Model said:", text, isFinal ? "(final)" : "(partial)");
  },
  onError(error) {
    console.error("Live session error:", error);
  },
});

await liveSession.connect("Say hello and ask how you can help.");
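
Because the session needs browser audio and media APIs, an app that renders on both server and client may want a guard before constructing it. The sketch below checks for the three globals this README names. RuntimeGlobalsLike and canUseLiveSession are hypothetical names, and the check is a coarse heuristic: it does not verify microphone permission.

```typescript
// The three browser globals named in this README; the record shape is an assumption for testability.
interface RuntimeGlobalsLike {
  navigator?: { mediaDevices?: unknown };
  AudioContext?: unknown;
  AudioWorkletNode?: unknown;
}

// True only when all three APIs appear to be present in the given global scope.
function canUseLiveSession(globals: RuntimeGlobalsLike): boolean {
  return (
    globals.navigator?.mediaDevices != null &&
    typeof globals.AudioContext === "function" &&
    typeof globals.AudioWorkletNode === "function"
  );
}

// Evaluated against the current runtime; expected to be false outside a browser.
const liveSupported = canUseLiveSession(globalThis as unknown as RuntimeGlobalsLike);
```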

If you want to manage the worklet module URL explicitly, you can use the exported helper:

import {
  createGeminiLiveAudioWorkletModuleUrl,
  GeminiLiveChatSession,
  revokeGeminiLiveAudioWorkletModuleUrl,
} from "@villutur/gemini-ai-lib";

const audioWorkletModulePath = createGeminiLiveAudioWorkletModuleUrl();

try {
  const liveSession = new GeminiLiveChatSession({
    apiKey: process.env.NEXT_PUBLIC_GEMINI_API_KEY,
    audioWorkletModulePath,
  });

  await liveSession.connect();
} finally {
  revokeGeminiLiveAudioWorkletModuleUrl(audioWorkletModulePath);
}

Embedding generation supports both text-only and multimodal inputs:

import { GeminiEmbeddingService } from "@villutur/gemini-ai-lib";

const embeddingService = new GeminiEmbeddingService({
  apiKey: process.env.GEMINI_API_KEY,
});

const result = await embeddingService.embedText("How do I build a robust semantic search index?", {
  model: "gemini-embedding-001",
  taskType: "RETRIEVAL_QUERY",
  outputDimensionality: 768,
});

const firstVector = result.embedding;

You can also embed multiple text entries in one request:

const batchResult = await embeddingService.embedTexts(
  [
    "What is the meaning of life?",
    "What is the purpose of existence?",
    "How do I bake a cake?",
  ],
  {
    model: "gemini-embedding-001",
    taskType: "SEMANTIC_SIMILARITY",
  },
);

const vectors = batchResult.embeddings;

gemini-embedding-2 also supports multimodal embedding. One content entry with multiple parts returns one aggregated embedding, while multiple entries return multiple embeddings:

import { GeminiAttachmentHelper, GeminiEmbeddingService } from "@villutur/gemini-ai-lib";
import { readFile } from "node:fs/promises";

const imageBuffer = await readFile("./dog.png");
const imagePart = GeminiAttachmentHelper.CreateFromBuffer(imageBuffer, "image/png");

const embeddingService = new GeminiEmbeddingService({
  apiKey: process.env.GEMINI_API_KEY,
});

const result = await embeddingService.embedContent(
  {
    parts: [{ text: "An image of a dog" }, imagePart],
  },
  {
    model: "gemini-embedding-2",
    taskType: "RETRIEVAL_DOCUMENT",
    outputDimensionality: 1536,
  },
);

const aggregatedVector = result.embedding;

Reduced dimensions such as 768 and 1536 can be a strong storage/latency tradeoff, but reduced-size vectors may not be unit length, so consumer-side normalization may still be appropriate for similarity-focused use cases. Switching between gemini-embedding-001 and gemini-embedding-2 requires a full reindex because the vector spaces are not interchangeable.
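
Consumer-side normalization is plain L2 scaling. A minimal sketch, assuming vectors arrive as number arrays; l2Normalize and dotProduct are hypothetical helper names, not library exports:

```typescript
// Scale a vector to unit length; a zero vector is returned unchanged (as a copy).
function l2Normalize(vector: number[]): number[] {
  let sumOfSquares = 0;
  for (const value of vector) {
    sumOfSquares += value * value;
  }
  const norm = Math.sqrt(sumOfSquares);
  if (norm === 0) {
    return vector.slice();
  }
  return vector.map((value) => value / norm);
}

// For unit-length vectors, the dot product equals cosine similarity.
function dotProduct(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length.");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i += 1) {
    sum += a[i] * b[i];
  }
  return sum;
}
```

Normalizing once at index time and once per query lets the vector store use a plain dot product for ranking.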

Music generation uses Lyria through the stable generateContent(...) path:

import { GeminiMusicService } from "@villutur/gemini-ai-lib";

const musicService = new GeminiMusicService({
  apiKey: process.env.GEMINI_API_KEY,
});

const result = await musicService.generateMusicFromPrompt(
  "Create a 30-second cheerful acoustic folk track with guitar and harmonica.",
  {
    model: "lyria-3-clip-preview",
  },
);

const firstClip = result.audioBuffer;
const description = result.text;

You can also guide Lyria with an image reference:

import { readFile } from "node:fs/promises";
import { GeminiAttachmentHelper, GeminiMusicService } from "@villutur/gemini-ai-lib";

const coverArt = await readFile("./reference-cover.jpg");
const imagePart = GeminiAttachmentHelper.CreateFromBuffer(coverArt, "image/jpeg");

const musicService = new GeminiMusicService({
  apiKey: process.env.GEMINI_API_KEY,
});

const result = await musicService.generateMusicFromImage(
  imagePart,
  "Create warm cinematic music that matches the color and mood of the image.",
  {
    model: "lyria-3-pro-preview",
  },
);

Lyria RealTime is out of scope for this package surface today and is tracked in docs/future-work.md.

For lightweight UI/model-picker usage without importing runtime services, you can also import model catalogs directly from the subpath entry:

import {
  GEMINI_TEXT_MODELS,
  GEMINI_EMBEDDING_MODELS,
  GEMINI_AUDIO_MODELS,
  GEMINI_AUDIO_VOICES,
  GEMINI_AUDIO_VOICE_CATALOG,
  GEMINI_MUSIC_MODELS,
  GEMINI_VIDEO_MODELS,
  GEMINI_IMAGE_MODELS,
  GEMINI_LIVE_MODELS,
  GEMINI_INTERACTION_MODELS,
  GEMINI_INTERACTION_AGENTS,
  getInteractionAgentDisplayName,
  getInteractionModelDisplayName,
  getEmbeddingModelDisplayName,
  getAudioModelDisplayName,
  getAudioVoiceNames,
  getAudioVoiceOptions,
  getMusicModelDisplayName,
  getVideoModelDisplayName,
  getTextModelDisplayName,
  getLiveModelDisplayName,
} from "@villutur/gemini-ai-lib/model-catalogs";

Capability metadata is also available from a dedicated subpath entry:

import {
  GEMINI_EMBEDDING_MODEL_CAPABILITIES,
  GEMINI_EMBEDDING_CONFIG_OPTIONS,
  GEMINI_AUDIO_MODEL_CAPABILITIES,
  GEMINI_AUDIO_CONFIG_OPTIONS,
  GEMINI_AUDIO_VOICES,
  GEMINI_AUDIO_VOICE_CATALOG,
  GEMINI_MUSIC_MODEL_CAPABILITIES,
  GEMINI_MUSIC_CONFIG_OPTIONS,
  GEMINI_VIDEO_MODEL_CAPABILITIES,
  GEMINI_VIDEO_CONFIG_OPTIONS,
  GEMINI_IMAGE_MODEL_CAPABILITIES,
  GEMINI_IMAGE_CONFIG_OPTIONS,
  GEMINI_LIVE_MODEL_CAPABILITIES,
  GEMINI_LIVE_CONFIG_OPTIONS,
  GEMINI_INTERACTION_MODEL_CAPABILITIES,
  GEMINI_INTERACTION_CONFIG_OPTIONS,
  GEMINI_TEXT_MODEL_CAPABILITIES,
  GEMINI_TEXT_CONFIG_OPTIONS,
  getInteractionModelCapabilities,
  getInteractionModelConfigOptions,
  getEmbeddingModelCapabilities,
  getEmbeddingModelConfigOptions,
  getEmbeddingModelInputLimits,
  getAudioModelCapabilities,
  getAudioModelConfigOptions,
  getAudioModelSpeakerLimits,
  getAudioVoiceNames,
  getAudioVoiceOptions,
  getMusicModelAttachmentLimits,
  getMusicModelCapabilities,
  getMusicModelConfigOptions,
  getVideoModelAttachmentLimits,
  getVideoModelCapabilities,
  getVideoModelConfigOptions,
  getLiveModelCapabilities,
  getLiveModelFeatureFlags,
  getImageModelAttachmentLimits,
  getTextModelAttachmentLimits,
  getTextModelCapabilities,
  getTextModelConfigOptions,
} from "@villutur/gemini-ai-lib/model-capabilities";

Image-capable projects can also keep their own policy layer while reusing the shared image service and model list.

For Gemini image models, the Gemini API returns the model's native image format. Do not assume outputMimeType is supported there. The library keeps that behavior model-aware and only forwards explicit output-format controls to Imagen-style routes where they are actually supported.

GeminiImageService now has two layers:

generateContent(...): a raw Gemini image-model wrapper around models.generateContent(...) that returns the SDK response unchanged
generateImage(...) / generateImageFromPrompt(...): ergonomic helpers that normalize generated images into the existing GenerateImageResult structure

import { GEMINI_IMAGE_MODELS, GeminiImageService } from "@villutur/gemini-ai-lib";

const imageService = new GeminiImageService({
  apiKey: process.env.GEMINI_API_KEY,
});

const result = await imageService.generateImageFromPrompt("Create a clean geometric product logo.", {
  model: "gemini-3.1-flash-image-preview",
  aspectRatio: "1:1",
});

If you want the transparent Gemini image-model response instead of the normalized helper result, use generateContent(...) directly:

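import { readFile } from "node:fs/promises";
import {
  GeminiAttachmentHelper,
  GeminiImageService,
  type ContentListUnion,
  type GenerateContentResponse,
} from "@villutur/gemini-ai-lib";

// Load the reference image that is attached to the request below.
const referenceBuffer = await readFile("./reference.png");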

const imageService = new GeminiImageService({
  apiKey: process.env.GEMINI_API_KEY,
});

const contents: ContentListUnion = [
  {
    role: "user",
    parts: [
      { text: "Create a clean product-shot style render." },
      GeminiAttachmentHelper.CreateFromBuffer(referenceBuffer, "image/png"),
    ],
  },
];

const response: GenerateContentResponse = await imageService.generateContent(contents, {
  model: "gemini-3.1-flash-image-preview",
  config: {
    responseModalities: ["IMAGE", "TEXT"],
    imageConfig: {
      aspectRatio: "1:1",
      imageSize: "1K",
    },
  },
});

If you need explicit output-format control, use an Imagen-capable model:

const result = await imageService.generateImageFromPrompt("Create a clean geometric product logo.", {
  model: "imagen-4.0-generate-001",
  aspectRatio: "1:1",
  outputMimeType: "image/png",
});
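
A consuming app that builds its own request options can mirror the same model-aware rule. The sketch below is not the library's internal logic. ImageRequestOptionsLike, isImagenModel, and shapeImageOptions are hypothetical, and the "imagen-" prefix test is an assumption based on the imagen-4.0-generate-001 ID in the catalog.

```typescript
interface ImageRequestOptionsLike {
  model: string;
  aspectRatio?: string;
  outputMimeType?: string;
}

// Assumption: Imagen model IDs start with "imagen-", as imagen-4.0-generate-001 does.
function isImagenModel(model: string): boolean {
  return model.indexOf("imagen-") === 0;
}

// Drop outputMimeType for Gemini image models, which return their native format.
function shapeImageOptions(options: ImageRequestOptionsLike): ImageRequestOptionsLike {
  const shaped: ImageRequestOptionsLike = { ...options };
  if (!isImagenModel(shaped.model)) {
    delete shaped.outputMimeType;
  }
  return shaped;
}
```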

Consumer-facing image capability metadata in this package is docs-first and conservative. In practice that means:

Gemini image-model capability metadata does not advertise Imagen-only output format or compression controls
gemini-2.5-flash-image is treated as a conservative 1K-tier helper model even though prose docs may describe a broader approximate pixel ceiling
the raw generateContent(...) wrapper remains SDK-aligned, while getImageModelCapabilities(...) and getImageModelConfigOptions(...) stay optimized for app-facing UI and helper defaults

Video generation follows a long-running-operation flow:

import { GeminiVideoService } from "@villutur/gemini-ai-lib";

const videoService = new GeminiVideoService({
  apiKey: process.env.GEMINI_API_KEY,
});

const result = await videoService.generateVideoFromPrompt("A cinematic drone shot over a snowy forest at sunrise.", {
  model: "veo-3.1-fast-generate-preview",
  aspectRatio: "16:9",
  resolution: "1080p",
  durationSeconds: 8,
  generateAudio: true,
});

const firstVideo = result.generatedVideos[0]?.video;
if (firstVideo) {
  await videoService.getClient().files.download({
    file: firstVideo,
    downloadPath: "./veo-output.mp4",
  });
}

If you want to build multimodal request parts yourself, GeminiAttachmentHelper can convert buffers or files into Gemini Part objects:

import { readFile } from "node:fs/promises";
import { GeminiAttachmentHelper, GeminiImageService } from "@villutur/gemini-ai-lib";

const imageBuffer = await readFile("./reference.png");
const referencePart = GeminiAttachmentHelper.CreateFromBuffer(imageBuffer, "image/png");

const imageService = new GeminiImageService({
  apiKey: process.env.GEMINI_API_KEY,
});

const result = await imageService.generateImage(
  [{ text: "Create a product-shot style render that matches the reference image lighting." }, referencePart],
  {
    model: "gemini-3.1-flash-image-preview",
    aspectRatio: "1:1",
  },
);

Environment Guidance

Prefer GEMINI_API_KEY for server-side usage.
NEXT_PUBLIC_GEMINI_API_KEY is treated as a deliberate browser-oriented fallback, not the default integration path.
Projects should not depend on a public Gemini key unless the user explicitly wants a browser-side integration and accepts that tradeoff.

The base service now resolves keys in this order:

explicit apiKey passed in code
GEMINI_API_KEY
NEXT_PUBLIC_GEMINI_API_KEY
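
The resolution order above can be sketched as a small pure function. This illustrates the documented order only, not the base service's actual code. resolveGeminiApiKey and EnvLike are hypothetical names, and treating blank strings as missing is an assumption.

```typescript
type EnvLike = Record<string, string | undefined>;

// Return the first usable key: explicit argument, then GEMINI_API_KEY, then NEXT_PUBLIC_GEMINI_API_KEY.
function resolveGeminiApiKey(explicitApiKey: string | undefined, env: EnvLike): string | undefined {
  const candidates = [explicitApiKey, env["GEMINI_API_KEY"], env["NEXT_PUBLIC_GEMINI_API_KEY"]];
  for (const candidate of candidates) {
    // Assumption: blank values count as missing.
    if (candidate !== undefined && candidate.trim() !== "") {
      return candidate;
    }
  }
  return undefined;
}
```

Passing the environment in as an argument, for example process.env on a server, keeps the function easy to test.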

When examples or app-level model catalogs need refreshing, use Google's official Gemini model index as the source of truth:

https://ai.google.dev/gemini-api/docs/models.md.txt

Public Contract

Import from the package name, not from src/ or sibling repo paths.
Keep generic Gemini SDK concerns here.
This library exports model limits and config-option metadata; consuming projects still own final UI policy, validation, and product-specific constraints.
Keep reusable history shaping and portable chat-session helpers here when they are app-agnostic.
Keep logger contracts and lifecycle emission generic here, while letting the consuming project own storage, retention, and log-history UI.
Keep app-specific model allowlists, request validation, transport contracts, and user-facing error mapping in the consuming project.
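
As an example of the split described above, an app-level allowlist can live entirely in the consuming project. The following sketch uses model IDs from this README; APP_ALLOWED_TEXT_MODELS, isAllowedTextModel, and resolveRequestedModel are hypothetical app-side names, and the chosen subset is arbitrary.

```typescript
// Hypothetical app policy: a subset of the catalog IDs listed in this README.
const APP_ALLOWED_TEXT_MODELS = ["gemini-3.5-flash", "gemini-2.5-flash"] as const;
type AppTextModel = (typeof APP_ALLOWED_TEXT_MODELS)[number];

// Type guard for untrusted request input.
function isAllowedTextModel(value: unknown): value is AppTextModel {
  return typeof value === "string" && (APP_ALLOWED_TEXT_MODELS as readonly string[]).indexOf(value) !== -1;
}

// Fall back to the app default instead of forwarding an unapproved model ID.
function resolveRequestedModel(requested: unknown, fallback: AppTextModel = "gemini-3.5-flash"): AppTextModel {
  return isAllowedTextModel(requested) ? requested : fallback;
}
```

An app could equally reject unknown models with a validation error; silently falling back is just one possible policy.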

Development

Install dependencies:

pnpm install

Build the package:

pnpm build

Run watch mode:

pnpm dev

Run typecheck only:

pnpm typecheck

Release Workflow

This package uses a repo-safe manual release flow:

pnpm release:bump --patch
pnpm release:publish

Full release instructions, rollback guidance, and safety checks are documented in docs/release.md.

Repository Notes

Public exports are defined in src/index.ts.
Package entrypoints and build outputs are defined in package.json.
The library currently ships both ESM and CJS output from dist/.
Deferred library ideas and transport follow-ups live in docs/future-work.md.
Repository-specific contributor guidance lives in AGENTS.md.