@villutur/gemini-ai-lib

v0.6.6

Reusable TypeScript wrappers and helpers for Google's Gemini API.

Downloads

1,155

gemini-ai-lib

Reusable TypeScript wrappers around the Gemini SDK for applications, tools, and libraries that want a cleaner typed integration surface.

This is my Gemini AI library/helper that I use in my own projects. I am sharing it in case someone else finds it useful.

If you want a very basic place to try the current functionality and see implementation examples, take a look at gemini-ai-lib-playground.

PRs are welcome.

Changes In 0.6.6

  • @google/genai is updated to ^2.6.0.
  • The package now documents Node.js >=20.0.0, matching the upstream SDK runtime requirement.

Changes In 0.6.5

  • gemini-3.5-flash is now included in the text and Interactions catalogs.
  • Text and chat helpers now default to gemini-3.5-flash; the previous gemini-3-flash-preview model remains available in the catalogs for compatibility.
  • Gemini 3.5 Flash uses thinkingLevel (minimal, low, medium, high) for thinking control. Google recommends leaving Gemini 3.x sampling parameters at their model defaults unless a consumer has a specific reason to override them.

Breaking Changes In 0.6.4

  • @google/genai is upgraded to ^2.1.0. Consumers should run their normal typecheck after upgrading because SDK type names and Interactions response shapes changed.
  • Interactions responses use the new steps schema. Read interaction.steps and stream step.* events instead of legacy outputs/content-delta shapes.
  • The text and Interactions catalogs now use gemini-3.1-flash-lite instead of gemini-3.1-flash-lite-preview.
  • The embedding catalog now uses gemini-embedding-2 instead of gemini-embedding-2-preview. Existing indexes still need a full reindex when switching between embedding model families.
  • GeminiLiveChatSession now defaults to gemini-3.1-flash-live-preview. The previous gemini-2.5-flash-native-audio-preview-12-2025 model remains in GEMINI_LIVE_MODELS for compatibility.
  • Gemini 3.1 Flash Live uses thinkingLevel (minimal, low, medium, high) instead of thinkingBudget; proactive audio and affective dialogue are not advertised for that model.

Purpose

  • Keep reusable Gemini SDK wiring out of individual projects
  • Provide small, composable services for text, embeddings, chat, image, music, audio, video, and live workflows
  • Let consuming projects own app-specific validation, model allowlists, route contracts, and user-facing error handling
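
As one illustration of the last point, a consuming project might keep its own allowlist on top of the shared catalog. The sketch below is not part of this library: KNOWN_TEXT_MODELS is a local copy of the text model IDs listed under Supported Models (a real project would read them from GEMINI_TEXT_MODELS), and APP_ALLOWED_MODELS, isAllowedTextModel, and resolveTextModel are hypothetical names.

```typescript
// Local copy of the text model IDs from the "Supported Models" section.
// A real project would import GEMINI_TEXT_MODELS from the package instead.
const KNOWN_TEXT_MODELS = [
  "gemini-2.5-flash-lite",
  "gemini-2.5-flash",
  "gemini-2.5-pro",
  "gemini-3.5-flash",
  "gemini-3-flash-preview",
  "gemini-3.1-flash-lite",
  "gemini-3.1-pro-preview",
] as const;

type KnownTextModel = (typeof KNOWN_TEXT_MODELS)[number];

// Hypothetical app policy: only part of the catalog is exposed to users.
const APP_ALLOWED_MODELS: readonly KnownTextModel[] = ["gemini-3.5-flash", "gemini-2.5-flash"];

// Type guard that narrows an untrusted string (for example a request field)
// to a model ID the app has explicitly allowed.
function isAllowedTextModel(value: string): value is KnownTextModel {
  return (APP_ALLOWED_MODELS as readonly string[]).includes(value);
}

// Falls back to the app default when the requested model is missing or not allowed.
function resolveTextModel(requested: string | undefined): KnownTextModel {
  const fallback: KnownTextModel = "gemini-3.5-flash";
  return requested !== undefined && isAllowedTextModel(requested) ? requested : fallback;
}
```

Keeping the guard in app code means a catalog update in the library never widens what the app accepts until the allowlist itself is edited.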

Services

  • GeminiBaseService: shared client setup, API key resolution, and tool configuration
  • GeminiTextService: one-shot text generation helpers
  • GeminiEmbeddingService: text and multimodal embedding helpers
  • GeminiChatService: persistent multi-turn chat wrapper
  • GeminiAudioService: text-to-speech generation helpers
  • GeminiMusicService: non-realtime Lyria music generation helpers
  • GeminiImageService: image generation and SVG generation helpers
  • GeminiVideoService: Veo video generation and operation polling helpers
  • GeminiInteractionsService: raw Gemini Interactions API wrapper for models and agents
  • GeminiLiveChatSession: real-time live-session wrapper, currently client-side only

Helpers, Catalogs, and Metadata Exports

  • createGeminiThinkingConfig(...): reusable low-level thinking-config helper
  • createGeminiThinkingConfigForModel(...): model-aware helper that keeps thinkingLevel and thinkingBudget aligned with the selected Gemini model
  • normalizeGeminiResponseMetadata(...): reusable response-metadata normalizer for latency, finish reason, response ids, and usage payloads
  • GeminiAttachmentHelper: browser and server helpers for turning files and buffers into Gemini Part objects
  • structured logging contracts and logger adapters for injecting app-owned sinks into Gemini request flows
  • createGeminiLiveAudioWorkletModuleUrl(...): browser helper for creating a working AudioWorklet module URL for Gemini Live microphone capture
  • GEMINI_LIVE_AUDIO_WORKLET_SOURCE: bundled worklet source for consumers that want to self-host or inspect the processor
  • GEMINI_TEXT_MODELS: shared text-model list for consumer model pickers
  • GEMINI_TEXT_MODEL_DISPLAY_NAMES: user-facing labels for known text models
  • GEMINI_INTERACTION_MODELS: shared Interactions model list for consumer model pickers
  • GEMINI_INTERACTION_MODEL_DISPLAY_NAMES: user-facing labels for known Interactions models
  • GEMINI_INTERACTION_AGENTS: shared Interactions agent list, including Deep Research preview agents
  • GEMINI_INTERACTION_AGENT_DISPLAY_NAMES: user-facing labels for known Interactions agents
  • GEMINI_EMBEDDING_MODELS: shared embedding-model list for consumer model pickers
  • GEMINI_EMBEDDING_MODEL_DISPLAY_NAMES: user-facing labels for known embedding models
  • GEMINI_AUDIO_MODELS: shared audio/TTS model list for consumer model pickers
  • GEMINI_AUDIO_MODEL_DISPLAY_NAMES: user-facing labels for known audio/TTS models
  • GEMINI_AUDIO_VOICES: curated prebuilt Gemini voice names for audio/TTS pickers
  • GEMINI_AUDIO_VOICE_CATALOG: richer voice metadata including sample URLs and descriptive traits
  • GEMINI_MUSIC_MODELS: shared music-generation model list for consumer model pickers
  • GEMINI_MUSIC_MODEL_DISPLAY_NAMES: user-facing labels for known music models
  • GEMINI_VIDEO_MODELS: shared video-generation model list for consumer model pickers
  • GEMINI_VIDEO_MODEL_DISPLAY_NAMES: user-facing labels for known video models
  • GEMINI_IMAGE_MODELS: shared allowlist covering Gemini image models plus imagen-4.0-generate-001
  • GEMINI_LIVE_MODELS: shared live-model list for real-time voice/video flows
  • GEMINI_LIVE_MODEL_DISPLAY_NAMES: user-facing labels for known live models
  • GEMINI_IMAGE_MODEL_CAPABILITIES: model-aware image limits and supported config options for dynamic UIs
  • GEMINI_TEXT_MODEL_CAPABILITIES: model-aware text limits and supported config options for dynamic UIs
  • GEMINI_INTERACTION_MODEL_CAPABILITIES: model-aware Interactions support hints and config options for dynamic UIs
  • GEMINI_EMBEDDING_MODEL_CAPABILITIES: model-aware embedding limits and supported config options for dynamic UIs
  • GEMINI_AUDIO_MODEL_CAPABILITIES: model-aware audio/TTS limits and supported config options for dynamic UIs
  • GEMINI_MUSIC_MODEL_CAPABILITIES: model-aware music limits and supported config options for dynamic UIs
  • GEMINI_VIDEO_MODEL_CAPABILITIES: model-aware video limits and supported config options for dynamic UIs
  • GEMINI_LIVE_MODEL_CAPABILITIES: model-aware live-session limits and supported config options for dynamic UIs
  • model-aware image handling that keeps Gemini image-model requests on their native output path while still allowing explicit output format control for Imagen where the API supports it
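
To make the idea behind createGeminiThinkingConfigForModel(...) more concrete, here is a rough sketch of what "keeping thinkingLevel and thinkingBudget aligned with the model" could look like. It is not the library's implementation. The prefix rule (IDs starting with "gemini-3" take thinkingLevel, everything else takes thinkingBudget) is an assumption, and all names in the block are hypothetical.

```typescript
type ThinkingLevel = "minimal" | "low" | "medium" | "high";

interface ThinkingConfigSketch {
  includeThoughts?: boolean;
  thinkingLevel?: ThinkingLevel;
  thinkingBudget?: number;
}

interface ThinkingPreferences {
  includeThoughts?: boolean;
  level?: ThinkingLevel;
  budgetTokens?: number;
}

// Assumption: model IDs starting with "gemini-3" are controlled by thinkingLevel,
// while other IDs (for example gemini-2.5-*) are controlled by a token budget.
function usesThinkingLevel(model: string): boolean {
  return model.startsWith("gemini-3");
}

// Emits only the thinking field that matches the model family,
// so a caller can pass both preferences without sending a mismatched one.
function sketchThinkingConfigForModel(model: string, prefs: ThinkingPreferences = {}): ThinkingConfigSketch {
  const config: ThinkingConfigSketch = {};
  if (prefs.includeThoughts !== undefined) {
    config.includeThoughts = prefs.includeThoughts;
  }
  if (usesThinkingLevel(model)) {
    if (prefs.level !== undefined) config.thinkingLevel = prefs.level;
  } else if (prefs.budgetTokens !== undefined) {
    config.thinkingBudget = prefs.budgetTokens;
  }
  return config;
}
```

With this shape, a settings UI can store both a level and a budget and let the helper decide which one reaches the request.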

Installation

pnpm add @villutur/gemini-ai-lib

Supported Models

The package exports model catalogs for the currently supported and curated model IDs below.

Text

  • gemini-2.5-flash-lite
  • gemini-2.5-flash
  • gemini-2.5-pro
  • gemini-3.5-flash
  • gemini-3-flash-preview
  • gemini-3.1-flash-lite
  • gemini-3.1-pro-preview

Embeddings

  • gemini-embedding-001
  • gemini-embedding-2

Image

  • gemini-2.5-flash-image
  • gemini-3.1-flash-image-preview
  • gemini-3-pro-image-preview
  • imagen-4.0-generate-001

Audio

  • gemini-3.1-flash-tts-preview
  • gemini-2.5-flash-preview-tts
  • gemini-2.5-pro-preview-tts

Music

  • lyria-3-clip-preview
  • lyria-3-pro-preview

Video

  • veo-3.1-generate-preview
  • veo-3.1-fast-generate-preview

Live

  • gemini-3.1-flash-live-preview
  • gemini-2.5-flash-native-audio-preview-12-2025

Interactions

Models:

  • gemini-2.5-flash-lite
  • gemini-2.5-flash
  • gemini-2.5-pro
  • gemini-3.1-flash-lite
  • gemini-3.5-flash
  • gemini-3-flash-preview
  • gemini-3.1-pro-preview
  • lyria-3-clip-preview
  • lyria-3-pro-preview

Agents:

  • deep-research-pro-preview-12-2025
  • deep-research-preview-04-2026
  • deep-research-max-preview-04-2026

Usage

Server-side usage is the default and preferred integration path.

GeminiLiveChatSession is the main exception right now: it currently depends on browser APIs such as navigator.mediaDevices, AudioContext, and AudioWorkletNode, so it only works in client-side runtime contexts.

import { GeminiTextService } from "@villutur/gemini-ai-lib";

const textService = new GeminiTextService({
  apiKey: process.env.GEMINI_API_KEY,
});

const response = await textService.generateTextString("Summarize the current rollout status in three bullets.", {
  model: "gemini-3.5-flash",
  systemInstruction: "Answer like a pragmatic product engineer. Be concise and explicit.",
  temperature: 0.4,
});

generateContent(...) is now the canonical one-shot text method. The older generateText(...) method is still available as a deprecated backward-compatible alias.

You can also import the most common Gemini SDK types directly from @villutur/gemini-ai-lib instead of mixing imports from @google/genai:

import {
  GeminiTextService,
  type ContentListUnion,
  type GenerateContentResponse,
  type Part,
} from "@villutur/gemini-ai-lib";

const parts: Part[] = [
  { text: "Summarize this rollout update in two bullets." },
  { text: "Team A completed the migration, but monitoring still needs follow-up." },
];

const contents: ContentListUnion = [{ role: "user", parts }];

const textService = new GeminiTextService({
  apiKey: process.env.GEMINI_API_KEY,
});

const response: GenerateContentResponse = await textService.generateContent(contents, {
  model: "gemini-3.5-flash",
});

The root package also re-exports a curated image-focused subset of @google/genai types and values such as ImageConfig, GenerateImagesConfig, GenerateImagesResponse, GeneratedImage, GeneratedImageMask, and PersonGeneration.

The root package also re-exports the SDK Interactions type namespace plus library aliases such as GeminiInteraction, GeminiInteractionCreateParams, GeminiInteractionSSEEvent, and common Interactions step/content types.

Persistent text chat can layer on top of GeminiChatService while still letting the consuming project own validation and request shaping.

import { createGeminiTextChatHistory, GeminiChatService } from "@villutur/gemini-ai-lib";

const chatService = new GeminiChatService({
  apiKey: process.env.GEMINI_API_KEY,
  model: "gemini-3.5-flash",
  history: createGeminiTextChatHistory([
    {
      role: "user",
      text: "Give me the safest rollout order for Prompt Workbench v1.",
    },
    {
      role: "model",
      text: "Start with internal dogfooding, then expand to a small feature-flagged cohort.",
    },
  ]),
});

const text = await chatService.sendMessageString("Now add the top two risks and a rollback trigger.");

GeminiInteractionsService is a thin wrapper around the Gemini Interactions API. It returns SDK responses unchanged and leaves session storage, tool execution loops, retention UI, and user-facing error mapping to the consuming app. The Interactions API is beta, so schemas may change.

Interaction objects are stored by default by the Gemini API (store=true) so they can be retrieved, continued with previous_interaction_id, or run in the background. Set store: false when a consumer explicitly wants to opt out, but stateful continuation and background behavior depend on stored interactions. For Gemini 3.5 Flash, use Interactions generationConfig.thinkingLevel (minimal, low, medium, or high) for thinking effort. Function results must match the preceding function-call id, name, and response count.

import { GeminiInteractionsService, type GeminiInteractionCreateParams } from "@villutur/gemini-ai-lib";

const interactions = new GeminiInteractionsService({
  apiKey: process.env.GEMINI_API_KEY,
});

const first = await interactions.create({
  model: "gemini-3.5-flash",
  input: "Give me a one-sentence project risk summary.",
} satisfies GeminiInteractionCreateParams);

const followUp = await interactions.create({
  model: "gemini-3.5-flash",
  previous_interaction_id: first.id,
  input: "Now suggest the safest next action.",
});

const retrieved = await interactions.get(followUp.id, { include_input: true });
const finalText = retrieved.steps
  .find((step) => step.type === "model_output")
  ?.content?.find((content) => content.type === "text")?.text;

const stream = await interactions.createStream({
  model: "gemini-3.5-flash",
  input: "Stream a concise status update.",
  stream: true,
});

for await (const event of stream) {
  console.log(event.event_type);
}

const research = await interactions.create({
  agent: "deep-research-pro-preview-12-2025",
  input: "Research current SVG editor automation approaches.",
  background: true,
});

await interactions.cancel(research.id);
console.log(retrieved.status, finalText);
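
The example above reads only the first text part of the first model_output step. When a response carries several output steps or several text parts, a small helper can join them. The sketch below uses stand-in types modelled on the fields used above (step.type, step.content, content.type, content.text); the SDK's real step types are richer, and collectModelText is a hypothetical name.

```typescript
// Stand-in shapes based only on the fields the example above touches.
interface StepContentSketch {
  type: string;
  text?: string;
}

interface InteractionStepSketch {
  type: string;
  content?: StepContentSketch[];
}

// Concatenates every text part from every model_output step, in order.
// Steps of other types and non-text parts are skipped.
function collectModelText(steps: readonly InteractionStepSketch[]): string {
  const chunks: string[] = [];
  for (const step of steps) {
    if (step.type !== "model_output") continue;
    for (const part of step.content ?? []) {
      if (part.type === "text" && typeof part.text === "string") {
        chunks.push(part.text);
      }
    }
  }
  return chunks.join("");
}
```

Since the Interactions API is described above as beta, keeping this kind of traversal in one helper limits how much consumer code has to change if the step schema moves again.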

Projects can also inject their own structured logger adapter when they want Gemini request lifecycle events to land in an app-owned sink.

import { GeminiTextService, type LoggerAdapter } from "@villutur/gemini-ai-lib";

const logger: LoggerAdapter = {
  log(event) {
    console.log(event.source, event.level, event.message, event.metadata);
  },
};

const textService = new GeminiTextService({
  apiKey: process.env.GEMINI_API_KEY,
  logger,
});

Projects that own model policy in their own codebase can also reuse the library's thinking-config and response-metadata helpers without giving up control of route contracts or storage.

import {
  createGeminiThinkingConfigForModel,
  normalizeGeminiResponseMetadata,
  GeminiTextService,
} from "@villutur/gemini-ai-lib";

const service = new GeminiTextService({
  apiKey: process.env.GEMINI_API_KEY,
});

const result = await service.generateContent("Compare the rollout risks in three bullets.", {
  model: "gemini-3.1-pro-preview",
  thinkingConfig: createGeminiThinkingConfigForModel("gemini-3.1-pro-preview", {
    includeThoughts: false,
  }),
});

const telemetry = normalizeGeminiResponseMetadata(result);
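
This README does not spell out the shape that normalizeGeminiResponseMetadata(...) returns, so the following is only a sketch of the general idea: flattening latency, finish reason, response id, and usage counts into one record. ResponseLike is a stand-in for the few response fields read here (responseId, candidates[].finishReason, usageMetadata token counts), and TelemetrySketch and sketchNormalizeMetadata are hypothetical names, not library exports.

```typescript
// Stand-in for the subset of a generateContent response this sketch reads.
interface ResponseLike {
  responseId?: string;
  candidates?: { finishReason?: string }[];
  usageMetadata?: {
    promptTokenCount?: number;
    candidatesTokenCount?: number;
    totalTokenCount?: number;
  };
}

// Hypothetical flat record that is easy to log or store.
interface TelemetrySketch {
  responseId: string | null;
  finishReason: string | null;
  latencyMs: number | null;
  promptTokens: number;
  outputTokens: number;
  totalTokens: number;
}

// Missing fields become null or 0 so downstream code never has to probe optional chains.
function sketchNormalizeMetadata(
  response: ResponseLike,
  startedAtMs?: number,
  finishedAtMs?: number,
): TelemetrySketch {
  const promptTokens = response.usageMetadata?.promptTokenCount ?? 0;
  const outputTokens = response.usageMetadata?.candidatesTokenCount ?? 0;
  const hasTiming = startedAtMs !== undefined && finishedAtMs !== undefined;
  return {
    responseId: response.responseId ?? null,
    finishReason: response.candidates?.[0]?.finishReason ?? null,
    latencyMs: hasTiming ? Math.max(0, finishedAtMs - startedAtMs) : null,
    promptTokens,
    outputTokens,
    totalTokens: response.usageMetadata?.totalTokenCount ?? (promptTokens + outputTokens),
  };
}
```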

Consumers can also render text-model selectors directly from shared exports:

import { GEMINI_TEXT_MODELS, GEMINI_TEXT_MODEL_DISPLAY_NAMES } from "@villutur/gemini-ai-lib";

const options = GEMINI_TEXT_MODELS.map((model) => ({
  value: model,
  label: GEMINI_TEXT_MODEL_DISPLAY_NAMES[model],
}));

Interactions model and agent pickers can use separate catalogs:

import {
  GEMINI_INTERACTION_AGENTS,
  GEMINI_INTERACTION_MODELS,
  getInteractionAgentDisplayName,
  getInteractionModelDisplayName,
} from "@villutur/gemini-ai-lib";

const interactionModelOptions = GEMINI_INTERACTION_MODELS.map((model) => ({
  value: model,
  label: getInteractionModelDisplayName(model),
}));

const interactionAgentOptions = GEMINI_INTERACTION_AGENTS.map((agent) => ({
  value: agent,
  label: getInteractionAgentDisplayName(agent),
}));

Embedding model pickers can use the same pattern:

import { GEMINI_EMBEDDING_MODELS, getEmbeddingModelDisplayName } from "@villutur/gemini-ai-lib";

const embeddingOptions = GEMINI_EMBEDDING_MODELS.map((model) => ({
  value: model,
  label: getEmbeddingModelDisplayName(model),
}));

Audio/TTS model pickers can use the same pattern:

import { GEMINI_AUDIO_MODELS, getAudioModelDisplayName } from "@villutur/gemini-ai-lib";

const audioOptions = GEMINI_AUDIO_MODELS.map((model) => ({
  value: model,
  label: getAudioModelDisplayName(model),
}));

Audio/TTS voice pickers can also use the exported curated voice catalog:

import { GEMINI_AUDIO_VOICES, GeminiAudioService } from "@villutur/gemini-ai-lib";

const selectedVoice = GEMINI_AUDIO_VOICES[0] ?? "Kore";
const audioService = new GeminiAudioService();

const audioBuffer = await audioService.generateAudio("Hello from Gemini TTS.", undefined, {
  model: "gemini-3.1-flash-tts-preview",
  voiceName: selectedVoice,
});

Music model pickers can use the same pattern:

import { GEMINI_MUSIC_MODELS, getMusicModelDisplayName } from "@villutur/gemini-ai-lib";

const musicOptions = GEMINI_MUSIC_MODELS.map((model) => ({
  value: model,
  label: getMusicModelDisplayName(model),
}));

Video model pickers can use the same pattern:

import { GEMINI_VIDEO_MODELS, getVideoModelDisplayName } from "@villutur/gemini-ai-lib";

const videoOptions = GEMINI_VIDEO_MODELS.map((model) => ({
  value: model,
  label: getVideoModelDisplayName(model),
}));

Live model pickers can use the same pattern:

import { GEMINI_LIVE_MODELS, getLiveModelDisplayName } from "@villutur/gemini-ai-lib";

const liveOptions = GEMINI_LIVE_MODELS.map((model) => ({
  value: model,
  label: getLiveModelDisplayName(model),
}));

Consumers that need model-aware config dialogs can build controls directly from capability and option exports:

import { getImageModelCapabilities, getImageModelConfigOptions } from "@villutur/gemini-ai-lib";

const modelId = "imagen-4.0-generate-001";
const capabilities = getImageModelCapabilities(modelId);
const optionDescriptors = getImageModelConfigOptions(modelId);

// Example policy in consumer code:
// max attachment slots = model limit - attachments already reserved elsewhere
const reservedAttachmentSlots = 2;
const maxReferenceImages = capabilities.attachmentLimits.maxReferenceImages ?? 0;
const remainingAttachmentBudget = Math.max(0, maxReferenceImages - reservedAttachmentSlots);

Embedding consumers can also drive model-aware retrieval controls from shared capability exports:

import { getEmbeddingModelCapabilities, getEmbeddingModelConfigOptions } from "@villutur/gemini-ai-lib";

const embeddingModel = "gemini-embedding-2";
const embeddingCapabilities = getEmbeddingModelCapabilities(embeddingModel);
const embeddingOptionDescriptors = getEmbeddingModelConfigOptions(embeddingModel);

const supportsMultimodalRetrieval = embeddingCapabilities.inputLimits.supportsMultimodalInput;
const recommendedDimensions = embeddingCapabilities.outputLimits.recommendedOutputDimensions;

Embedding support in v1 uses the stable Gemini API embedContent(...) flow. Vertex-only options such as mimeType and autoTruncate are intentionally not part of the typed runtime API.

Audio/TTS consumers can also drive their config controls from model-aware capability exports:

import {
  getAudioModelCapabilities,
  getAudioModelConfigOptions,
  getAudioVoiceOptions,
} from "@villutur/gemini-ai-lib";

const audioModel = "gemini-3.1-flash-tts-preview";
const audioCapabilities = getAudioModelCapabilities(audioModel);
const audioOptionDescriptors = getAudioModelConfigOptions(audioModel);
const audioVoices = getAudioVoiceOptions();

const supportsDialogue = audioCapabilities.speakerLimits.supportsMultiSpeaker;
const maxSpeakers = audioCapabilities.speakerLimits.maxSpeakers ?? 1;
const defaultVoice = audioCapabilities.defaultVoiceName;

GeminiAudioService.generateAudio(...) defaults to gemini-3.1-flash-tts-preview. The model is documented as text input, audio output, 8,192 input tokens, 16,384 output tokens, multi-speaker TTS with up to two speakers, and Batch API capable.

The current @google/genai SDK surface exposes voiceName, languageCode, responseModalities, and multiSpeakerVoiceConfig for TTS request shaping. The Gemini docs also mention controls such as speaking rate, pitch, and volume gain, but those are not exported here until the SDK exposes a stable typed contract for them.

The exported voice catalog is consumer guidance metadata for building pickers and defaults. GenerateAudioOptions.voiceName intentionally remains a plain string so callers can stay forward-compatible with newly available voices.

GEMINI_LIVE_CONFIG_OPTIONS.voiceName uses the same curated voice-name list, so Live clients can build consistent voice pickers from the same exported catalog.

Music consumers can drive Lyria config controls from model-aware capability exports:

import { getMusicModelCapabilities, getMusicModelConfigOptions } from "@villutur/gemini-ai-lib";

const musicModel = "lyria-3-pro-preview";
const musicCapabilities = getMusicModelCapabilities(musicModel);
const musicOptionDescriptors = getMusicModelConfigOptions(musicModel);

const supportsImageGuidance = musicCapabilities.attachmentLimits.supportsImageInput;
const bestFor = musicCapabilities.outputLimits.bestFor;

Lyria music generation in v1 is intentionally based on the official stable generateContent(...) flow. Richer controls such as BPM, intensity, generate-lyrics, and vocal-type remain prompt-driven and are not part of the typed runtime API yet.

Video consumers can also drive Veo controls from model-aware capability exports:

import { getVideoModelCapabilities, getVideoModelConfigOptions } from "@villutur/gemini-ai-lib";

const videoModel = "veo-3.1-fast-generate-preview";
const videoCapabilities = getVideoModelCapabilities(videoModel);
const videoOptionDescriptors = getVideoModelConfigOptions(videoModel);

const maxReferenceImages = videoCapabilities.attachmentLimits.maxReferenceImages ?? 0;
const supportsVideoExtension = videoCapabilities.attachmentLimits.supportsVideoInput;

Video generation is long-running and operation-based. The video service normalizes operation polling, while downloading generated files remains under consumer control through the underlying SDK client.
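
For readers unfamiliar with operation-based APIs, the general polling pattern looks roughly like the sketch below. It is a generic illustration, not the video service's actual code: OperationSketch, PollOptions, and pollUntilDone are hypothetical names, and the refresh callback stands in for whatever call fetches the latest operation state.

```typescript
// Hypothetical minimal operation shape: a done flag and an optional payload.
interface OperationSketch<T> {
  done: boolean;
  result?: T;
}

interface PollOptions {
  intervalMs: number;
  maxAttempts: number;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls refresh() at a fixed interval until the operation reports done,
// and throws if it is still pending after maxAttempts refreshes.
async function pollUntilDone<T>(
  initial: OperationSketch<T>,
  refresh: (current: OperationSketch<T>) => Promise<OperationSketch<T>>,
  options: PollOptions,
): Promise<OperationSketch<T>> {
  let current = initial;
  for (let attempt = 0; attempt < options.maxAttempts && !current.done; attempt += 1) {
    await sleep(options.intervalMs);
    current = await refresh(current);
  }
  if (!current.done) {
    throw new Error(`Operation still pending after ${options.maxAttempts} attempts`);
  }
  return current;
}
```

A bounded attempt count keeps a stuck operation from holding a request handler open indefinitely; the right interval and limit depend on the consumer's own timeout policy.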

The official Gemini video docs may describe more knobs over time, but this library only exports video config metadata for the stable @google/genai 2.x contract plus explicitly typed generateVideos(...) config fields.

Live-session UIs can use the same capability pattern:

import { getLiveModelCapabilities, getLiveModelConfigOptions } from "@villutur/gemini-ai-lib";

const liveModel = "gemini-3.1-flash-live-preview";
const liveCapabilities = getLiveModelCapabilities(liveModel);
const liveOptions = getLiveModelConfigOptions(liveModel);

Live sessions are currently client-side only because they depend on browser audio and media APIs. GeminiLiveChatSession also ships with a bundled working AudioWorklet, so you do not need to host your own /audio-processor.js just to get started:

import { GeminiLiveChatSession } from "@villutur/gemini-ai-lib";

const liveSession = new GeminiLiveChatSession({
  apiKey: process.env.NEXT_PUBLIC_GEMINI_API_KEY,
  model: "gemini-3.1-flash-live-preview",
  systemInstruction: "You are a concise voice assistant.",
  voiceName: "Aoede",
  thinkingLevel: "minimal",
  onSetupComplete() {
    console.log("Live session ready.");
  },
  onOutputTranscription(text, isFinal) {
    console.log("Model said:", text, isFinal ? "(final)" : "(partial)");
  },
  onError(error) {
    console.error("Live session error:", error);
  },
});

await liveSession.connect("Say hello and ask how you can help.");

If you want to manage the worklet module URL explicitly, you can use the exported helper:

import {
  createGeminiLiveAudioWorkletModuleUrl,
  GeminiLiveChatSession,
  revokeGeminiLiveAudioWorkletModuleUrl,
} from "@villutur/gemini-ai-lib";

const audioWorkletModulePath = createGeminiLiveAudioWorkletModuleUrl();

try {
  const liveSession = new GeminiLiveChatSession({
    apiKey: process.env.NEXT_PUBLIC_GEMINI_API_KEY,
    audioWorkletModulePath,
  });

  await liveSession.connect();
} finally {
  revokeGeminiLiveAudioWorkletModuleUrl(audioWorkletModulePath);
}
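
Because live sessions depend on browser APIs such as navigator.mediaDevices, AudioContext, and AudioWorkletNode, a consumer may want to feature-detect before constructing a session, for example to hide a voice button during server rendering. The helper names below are hypothetical and not part of this library.

```typescript
// Only the globals named in this README as live-session requirements.
interface LiveRuntimeScope {
  navigator?: { mediaDevices?: unknown };
  AudioContext?: unknown;
  AudioWorkletNode?: unknown;
}

// Returns the names of required browser APIs that are absent from the given scope.
// Defaults to the current global scope, and accepts an explicit scope for testing.
function missingLiveSessionApis(
  scope: LiveRuntimeScope = globalThis as unknown as LiveRuntimeScope,
): string[] {
  const missing: string[] = [];
  if (!scope.navigator?.mediaDevices) missing.push("navigator.mediaDevices");
  if (typeof scope.AudioContext !== "function") missing.push("AudioContext");
  if (typeof scope.AudioWorkletNode !== "function") missing.push("AudioWorkletNode");
  return missing;
}

// True only when every required API is present.
function canStartLiveSession(scope?: LiveRuntimeScope): boolean {
  return missingLiveSessionApis(scope).length === 0;
}
```

Returning the list of missing APIs, rather than a bare boolean, makes it easier to show a specific message when only microphone access is unavailable.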

Embedding generation supports both text-only and multimodal inputs:

import { GeminiEmbeddingService } from "@villutur/gemini-ai-lib";

const embeddingService = new GeminiEmbeddingService({
  apiKey: process.env.GEMINI_API_KEY,
});

const result = await embeddingService.embedText("How do I build a robust semantic search index?", {
  model: "gemini-embedding-001",
  taskType: "RETRIEVAL_QUERY",
  outputDimensionality: 768,
});

const firstVector = result.embedding;

You can also embed multiple text entries in one request:

const batchResult = await embeddingService.embedTexts(
  [
    "What is the meaning of life?",
    "What is the purpose of existence?",
    "How do I bake a cake?",
  ],
  {
    model: "gemini-embedding-001",
    taskType: "SEMANTIC_SIMILARITY",
  },
);

const vectors = batchResult.embeddings;

gemini-embedding-2 also supports multimodal embedding. One content entry with multiple parts returns one aggregated embedding, while multiple entries return multiple embeddings:

import { GeminiAttachmentHelper, GeminiEmbeddingService } from "@villutur/gemini-ai-lib";
import { readFile } from "node:fs/promises";

const imageBuffer = await readFile("./dog.png");
const imagePart = GeminiAttachmentHelper.CreateFromBuffer(imageBuffer, "image/png");

const embeddingService = new GeminiEmbeddingService({
  apiKey: process.env.GEMINI_API_KEY,
});

const result = await embeddingService.embedContent(
  {
    parts: [{ text: "An image of a dog" }, imagePart],
  },
  {
    model: "gemini-embedding-2",
    taskType: "RETRIEVAL_DOCUMENT",
    outputDimensionality: 1536,
  },
);

const aggregatedVector = result.embedding;

Reduced dimensions such as 768 and 1536 can be a strong storage/latency tradeoff, but consumer-side normalization may still be appropriate for similarity-focused use cases. Switching between gemini-embedding-001 and gemini-embedding-2 requires reindexing because the vector spaces are not interchangeable.
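
A minimal sketch of the consumer-side normalization mentioned above: scale each vector to unit length, then compare with cosine similarity. It assumes embeddings arrive as plain number arrays; l2Normalize and cosineSimilarity are hypothetical helper names, not library exports.

```typescript
// Scales a vector to unit (L2) length and returns a new array.
// A zero vector is returned as an unchanged copy to avoid dividing by zero.
function l2Normalize(vector: readonly number[]): number[] {
  const norm = Math.sqrt(vector.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? [...vector] : vector.map((x) => x / norm);
}

// Cosine similarity of two equal-length vectors, computed as the dot product
// of their unit-length versions.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  const unitA = l2Normalize(a);
  const unitB = l2Normalize(b);
  return unitA.reduce((sum, x, i) => sum + x * unitB[i], 0);
}
```

If vectors are normalized once at index time, query-time scoring reduces to a plain dot product, which many vector stores can compute directly.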

Music generation uses Lyria through the stable generateContent(...) path:

import { GeminiMusicService } from "@villutur/gemini-ai-lib";

const musicService = new GeminiMusicService({
  apiKey: process.env.GEMINI_API_KEY,
});

const result = await musicService.generateMusicFromPrompt(
  "Create a 30-second cheerful acoustic folk track with guitar and harmonica.",
  {
    model: "lyria-3-clip-preview",
  },
);

const firstClip = result.audioBuffer;
const description = result.text;

You can also guide Lyria with an image reference:

import { readFile } from "node:fs/promises";
import { GeminiAttachmentHelper, GeminiMusicService } from "@villutur/gemini-ai-lib";

const coverArt = await readFile("./reference-cover.jpg");
const imagePart = GeminiAttachmentHelper.CreateFromBuffer(coverArt, "image/jpeg");

const musicService = new GeminiMusicService({
  apiKey: process.env.GEMINI_API_KEY,
});

const result = await musicService.generateMusicFromImage(
  imagePart,
  "Create warm cinematic music that matches the color and mood of the image.",
  {
    model: "lyria-3-pro-preview",
  },
);

Lyria RealTime is out of scope for this package surface today and is tracked in docs/future-work.md.

For lightweight UI/model-picker usage without importing runtime services, you can also import model catalogs directly from the subpath entry:

import {
  GEMINI_TEXT_MODELS,
  GEMINI_EMBEDDING_MODELS,
  GEMINI_AUDIO_MODELS,
  GEMINI_AUDIO_VOICES,
  GEMINI_AUDIO_VOICE_CATALOG,
  GEMINI_MUSIC_MODELS,
  GEMINI_VIDEO_MODELS,
  GEMINI_IMAGE_MODELS,
  GEMINI_LIVE_MODELS,
  GEMINI_INTERACTION_MODELS,
  GEMINI_INTERACTION_AGENTS,
  getInteractionAgentDisplayName,
  getInteractionModelDisplayName,
  getEmbeddingModelDisplayName,
  getAudioModelDisplayName,
  getAudioVoiceNames,
  getAudioVoiceOptions,
  getMusicModelDisplayName,
  getVideoModelDisplayName,
  getTextModelDisplayName,
  getLiveModelDisplayName,
} from "@villutur/gemini-ai-lib/model-catalogs";

Capability metadata is also available from a dedicated subpath entry:

import {
  GEMINI_EMBEDDING_MODEL_CAPABILITIES,
  GEMINI_EMBEDDING_CONFIG_OPTIONS,
  GEMINI_AUDIO_MODEL_CAPABILITIES,
  GEMINI_AUDIO_CONFIG_OPTIONS,
  GEMINI_AUDIO_VOICES,
  GEMINI_AUDIO_VOICE_CATALOG,
  GEMINI_MUSIC_MODEL_CAPABILITIES,
  GEMINI_MUSIC_CONFIG_OPTIONS,
  GEMINI_VIDEO_MODEL_CAPABILITIES,
  GEMINI_VIDEO_CONFIG_OPTIONS,
  GEMINI_IMAGE_MODEL_CAPABILITIES,
  GEMINI_IMAGE_CONFIG_OPTIONS,
  GEMINI_LIVE_MODEL_CAPABILITIES,
  GEMINI_LIVE_CONFIG_OPTIONS,
  GEMINI_INTERACTION_MODEL_CAPABILITIES,
  GEMINI_INTERACTION_CONFIG_OPTIONS,
  GEMINI_TEXT_MODEL_CAPABILITIES,
  GEMINI_TEXT_CONFIG_OPTIONS,
  getInteractionModelCapabilities,
  getInteractionModelConfigOptions,
  getEmbeddingModelCapabilities,
  getEmbeddingModelConfigOptions,
  getEmbeddingModelInputLimits,
  getAudioModelCapabilities,
  getAudioModelConfigOptions,
  getAudioModelSpeakerLimits,
  getAudioVoiceNames,
  getAudioVoiceOptions,
  getMusicModelAttachmentLimits,
  getMusicModelCapabilities,
  getMusicModelConfigOptions,
  getVideoModelAttachmentLimits,
  getVideoModelCapabilities,
  getVideoModelConfigOptions,
  getLiveModelCapabilities,
  getLiveModelFeatureFlags,
  getImageModelAttachmentLimits,
  getTextModelAttachmentLimits,
  getTextModelCapabilities,
  getTextModelConfigOptions,
} from "@villutur/gemini-ai-lib/model-capabilities";

Image-capable projects can also keep their own policy layer while reusing the shared image service and model list.

For Gemini image models, the Gemini API returns the model's native image format. Do not assume outputMimeType is supported there. The library keeps that behavior model-aware and only forwards explicit output-format controls to Imagen-style routes where they are actually supported.

GeminiImageService now has two layers:

  • generateContent(...): a raw Gemini image-model wrapper around models.generateContent(...) that returns the SDK response unchanged
  • generateImage(...) / generateImageFromPrompt(...): ergonomic helpers that normalize generated images into the existing GenerateImageResult structure

import { GEMINI_IMAGE_MODELS, GeminiImageService } from "@villutur/gemini-ai-lib";

const imageService = new GeminiImageService({
  apiKey: process.env.GEMINI_API_KEY,
});

const result = await imageService.generateImageFromPrompt("Create a clean geometric product logo.", {
  model: "gemini-3.1-flash-image-preview",
  aspectRatio: "1:1",
});

If you want the transparent Gemini image-model response instead of the normalized helper result, use generateContent(...) directly:

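import { readFile } from "node:fs/promises";
import {
  GeminiAttachmentHelper,
  GeminiImageService,
  type ContentListUnion,
  type GenerateContentResponse,
} from "@villutur/gemini-ai-lib";

const imageService = new GeminiImageService({
  apiKey: process.env.GEMINI_API_KEY,
});

// Load the reference image from disk; the path is only an example.
const referenceBuffer = await readFile("./reference.png");

const contents: ContentListUnion = [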
  {
    role: "user",
    parts: [
      { text: "Create a clean product-shot style render." },
      GeminiAttachmentHelper.CreateFromBuffer(referenceBuffer, "image/png"),
    ],
  },
];

const response: GenerateContentResponse = await imageService.generateContent(contents, {
  model: "gemini-3.1-flash-image-preview",
  config: {
    responseModalities: ["IMAGE", "TEXT"],
    imageConfig: {
      aspectRatio: "1:1",
      imageSize: "1K",
    },
  },
});

If you need explicit output-format control, use an Imagen-capable model:

const result = await imageService.generateImageFromPrompt("Create a clean geometric product logo.", {
  model: "imagen-4.0-generate-001",
  aspectRatio: "1:1",
  outputMimeType: "image/png",
});
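
The model-aware forwarding described in this section can be pictured with a small sketch. This is not the library's internal code: isImagenModel, buildForwardedImageConfig, and the "imagen-" prefix check are hypothetical names and assumptions used only to illustrate dropping outputMimeType for Gemini image models while keeping it for Imagen-style routes.

```typescript
// Hypothetical sketch, not the library's implementation.
interface ImageRequestOptions {
  model: string;
  aspectRatio?: string;
  outputMimeType?: string;
}

// Assumption: Imagen-style routes can be recognized by their model id prefix.
function isImagenModel(model: string): boolean {
  return model.startsWith("imagen-");
}

// Builds the config that would be forwarded to the API for a given model.
function buildForwardedImageConfig(options: ImageRequestOptions): Record<string, string> {
  const config: Record<string, string> = {};
  if (options.aspectRatio !== undefined) {
    config["aspectRatio"] = options.aspectRatio;
  }
  // Forward the explicit output format only where the route supports it.
  if (options.outputMimeType !== undefined && isImagenModel(options.model)) {
    config["outputMimeType"] = options.outputMimeType;
  }
  return config;
}
```

With this shape, a request for a Gemini image model silently omits outputMimeType. Whether to drop the option quietly or reject it with an error is a design choice the consuming project may want to make explicitly.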

Consumer-facing image capability metadata in this package is docs-first and conservative. In practice that means:

  • Gemini image-model capability metadata does not advertise Imagen-only output format or compression controls
  • gemini-2.5-flash-image is treated as a conservative 1K-tier helper model even though prose docs may describe a broader approximate pixel ceiling
  • the raw generateContent(...) wrapper remains SDK-aligned, while getImageModelCapabilities(...) and getImageModelConfigOptions(...) stay optimized for app-facing UI and helper defaults

Video generation follows a long-running-operation flow:

import { GeminiVideoService } from "@villutur/gemini-ai-lib";

const videoService = new GeminiVideoService({
  apiKey: process.env.GEMINI_API_KEY,
});

const result = await videoService.generateVideoFromPrompt("A cinematic drone shot over a snowy forest at sunrise.", {
  model: "veo-3.1-fast-generate-preview",
  aspectRatio: "16:9",
  resolution: "1080p",
  durationSeconds: 8,
  generateAudio: true,
});

const firstVideo = result.generatedVideos[0]?.video;
if (firstVideo) {
  await videoService.getClient().files.download({
    file: firstVideo,
    downloadPath: "./veo-output.mp4",
  });
}
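
For readers unfamiliar with long-running operations, the general pattern is: start the job, re-fetch its status on an interval, and read the result once it reports done. The sketch below shows that pattern in isolation. OperationLike and pollUntilDone are hypothetical names, and the README does not state how GeminiVideoService polls internally, so treat this as an illustration of the flow rather than the library's behavior.

```typescript
// Hypothetical sketch of a generic long-running-operation poll loop.
interface OperationLike<T> {
  done: boolean;
  response?: T;
}

async function pollUntilDone<T>(
  start: OperationLike<T>,
  refresh: (current: OperationLike<T>) => Promise<OperationLike<T>>,
  options: { intervalMs: number; maxAttempts: number },
): Promise<T> {
  let operation = start;
  for (let attempt = 0; !operation.done; attempt++) {
    if (attempt >= options.maxAttempts) {
      throw new Error(`Operation still pending after ${options.maxAttempts} polls`);
    }
    // Wait before asking for the latest status again.
    await new Promise<void>((resolve) => setTimeout(resolve, options.intervalMs));
    operation = await refresh(operation);
  }
  if (operation.response === undefined) {
    throw new Error("Operation finished without a response");
  }
  return operation.response;
}
```

Capping the number of polls keeps a stuck job from blocking the caller indefinitely. Video generation can take a while, so interval and attempt limits are worth tuning per application.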

If you want to build multimodal request parts yourself, GeminiAttachmentHelper can convert buffers or files into Gemini Part objects:

import { readFile } from "node:fs/promises";
import { GeminiAttachmentHelper, GeminiImageService } from "@villutur/gemini-ai-lib";

const imageBuffer = await readFile("./reference.png");
const referencePart = GeminiAttachmentHelper.CreateFromBuffer(imageBuffer, "image/png");

const imageService = new GeminiImageService({
  apiKey: process.env.GEMINI_API_KEY,
});

const result = await imageService.generateImage(
  [{ text: "Create a product-shot style render that matches the reference image lighting." }, referencePart],
  {
    model: "gemini-3.1-flash-image-preview",
    aspectRatio: "1:1",
  },
);
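
To make the idea of converting a buffer into a Part more concrete, here is a minimal sketch of an inline-data part as the Gemini SDK represents it: a MIME type plus base64-encoded bytes. toInlineDataPart and InlineDataPart are hypothetical names. The README does not confirm that GeminiAttachmentHelper.CreateFromBuffer returns exactly this shape, so treat that as an assumption.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical stand-in for the inline-data variant of a Gemini Part.
interface InlineDataPart {
  inlineData: {
    mimeType: string;
    data: string; // base64-encoded bytes
  };
}

// Encodes raw bytes as base64 and pairs them with their MIME type.
function toInlineDataPart(bytes: Uint8Array, mimeType: string): InlineDataPart {
  return {
    inlineData: {
      mimeType,
      data: Buffer.from(bytes).toString("base64"),
    },
  };
}
```

In application code, prefer the library helper. The sketch is only meant to show what ends up in the request.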

Environment Guidance

  • Prefer GEMINI_API_KEY for server-side usage.
  • NEXT_PUBLIC_GEMINI_API_KEY is treated as a deliberate browser-oriented fallback, not the default integration path.
  • Projects should not depend on a public Gemini key unless the user explicitly wants a browser-side integration and accepts that tradeoff.

The base service now resolves keys in this order:

  1. explicit apiKey passed in code
  2. GEMINI_API_KEY
  3. NEXT_PUBLIC_GEMINI_API_KEY
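
The resolution order above can be expressed as a small function. This is a sketch, not the base service's actual code: resolveGeminiApiKey and EnvLike are hypothetical names, and treating blank strings as "not set" is an assumption.

```typescript
// Hypothetical helper that mirrors the documented resolution order.
type EnvLike = Record<string, string | undefined>;

function resolveGeminiApiKey(explicitKey: string | undefined, env: EnvLike): string | undefined {
  const candidates = [
    explicitKey, // 1. explicit apiKey passed in code
    env["GEMINI_API_KEY"], // 2. server-side key
    env["NEXT_PUBLIC_GEMINI_API_KEY"], // 3. deliberate browser-oriented fallback
  ];
  // Assumption: empty or whitespace-only values count as unset.
  return candidates.find((value) => typeof value === "string" && value.trim().length > 0);
}
```

In a Node process this would be called as resolveGeminiApiKey(undefined, process.env). Passing the environment as an argument keeps the function easy to test.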

When examples or app-level model catalogs need refreshing, use Google's official Gemini model index as the source of truth:

  • https://ai.google.dev/gemini-api/docs/models.md.txt

Public Contract

  • Import from the package name, not from src/ or sibling repo paths.
  • Keep generic Gemini SDK concerns here.
  • This library exports model limits and config-option metadata; consuming projects still own final UI policy, validation, and product-specific constraints.
  • Keep reusable history shaping and portable chat-session helpers here when they are app-agnostic.
  • Keep logger contracts and lifecycle emission generic here, while letting the consuming project own storage, retention, and log-history UI.
  • Keep app-specific model allowlists, request validation, transport contracts, and user-facing error mapping in the consuming project.
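
As an illustration of the last point, an app-level model allowlist could live in the consuming project along these lines. ALLOWED_IMAGE_MODELS, isAllowedImageModel, and assertAllowedImageModel are hypothetical names, the listed models are just the ones used in the examples above, and the error text stands in for whatever user-facing mapping the app defines.

```typescript
// Hypothetical app-side policy layer; none of this is exported by the library.
const ALLOWED_IMAGE_MODELS = ["gemini-3.1-flash-image-preview", "imagen-4.0-generate-001"] as const;

type AllowedImageModel = (typeof ALLOWED_IMAGE_MODELS)[number];

// Type guard: narrows an arbitrary string to a model the app has enabled.
function isAllowedImageModel(model: string): model is AllowedImageModel {
  return (ALLOWED_IMAGE_MODELS as readonly string[]).includes(model);
}

// Validates a requested model before it reaches the shared image service.
function assertAllowedImageModel(model: string): AllowedImageModel {
  if (!isAllowedImageModel(model)) {
    throw new Error(`Model "${model}" is not enabled in this app.`);
  }
  return model;
}
```

Validating at the app boundary keeps product policy out of the shared library, so the library can stay generic while each project decides which models it exposes.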

Development

Install dependencies:

pnpm install

Build the package:

pnpm build

Run watch mode:

pnpm dev

Run typecheck only:

pnpm typecheck

Release Workflow

This package uses a repo-safe manual release flow:

pnpm release:bump --patch
pnpm release:publish

Full release instructions, rollback guidance, and safety checks are documented in docs/release.md.

Repository Notes

  • Public exports are defined in src/index.ts.
  • Package entrypoints and build outputs are defined in package.json.
  • The library currently ships both ESM and CJS output from dist/.
  • Deferred library ideas and transport follow-ups live in docs/future-work.md.
  • Repository-specific contributor guidance lives in AGENTS.md.