multimodal-msg

v0.1.3

Published

4 months ago

Provider-agnostic multimodal message builder

0High
0Medium
0Low

silupanda

multimodal-msg

Provider-agnostic multimodal message builder for OpenAI, Anthropic, and Gemini APIs.

Description

Every major LLM provider accepts multimodal content in messages, but no two providers use the same format. OpenAI wraps images in image_url blocks with data URL encoding. Anthropic uses source objects with raw base64 and a separate media_type field. Gemini uses inlineData inside a parts array with different field names entirely. These differences extend across every content type: images, audio, documents, text, system messages, and role naming.

multimodal-msg solves this with a fluent builder API. Construct your multimodal message once, then render it for any supported provider. The package handles base64 encoding, data URL construction, MIME type detection, system message placement, and role mapping -- all with zero runtime dependencies and no I/O.

import { msg } from 'multimodal-msg'

const message = msg('user')
  .text('Describe this image.')
  .image(imageBuffer)

message.forOpenAI()    // OpenAI-formatted message object
message.forAnthropic() // Anthropic-formatted message object
message.forGemini()    // Gemini-formatted message object

Installation

npm install multimodal-msg

Requires Node.js 18 or later. Zero runtime dependencies.

Quick Start

Build a message and render for a provider

import { msg } from 'multimodal-msg'
import { readFileSync } from 'fs'

const image = readFileSync('./photo.png')

const message = msg('user')
  .text('What is in this image?')
  .image(image)

// Render for OpenAI
const openaiMsg = message.forOpenAI()
// {
//   role: 'user',
//   content: [
//     { type: 'text', text: 'What is in this image?' },
//     { type: 'image_url', image_url: { url: 'data:image/png;base64,...' } }
//   ]
// }

// Render for Anthropic
const anthropicMsg = message.forAnthropic()
// {
//   role: 'user',
//   content: [
//     { type: 'text', text: 'What is in this image?' },
//     { type: 'image', source: { type: 'base64', media_type: 'image/png', data: '...' } }
//   ]
// }

// Render for Gemini
const geminiMsg = message.forGemini()
// {
//   role: 'user',
//   parts: [
//     { text: 'What is in this image?' },
//     { inlineData: { mimeType: 'image/png', data: '...' } }
//   ]
// }

Build a conversation

import { conversation, msg } from 'multimodal-msg'

const image = readFileSync('./chart.png')

const conv = conversation()
  .system('You are a data analyst.')
  .user(
    msg('user')
      .text('What trend does this chart show?')
      .image(image)
  )
  .assistant('The chart shows a steady upward trend.')
  .user('Can you quantify the growth rate?')

conv.forOpenAI()    // system as first message in array
conv.forAnthropic() // system as top-level field, separate from messages
conv.forGemini()    // system as systemInstruction, assistant mapped to 'model' role

Convert between providers

import { convertMessage, convertConversation } from 'multimodal-msg'

// Convert a single message from OpenAI format to Anthropic format
const anthropicMsg = convertMessage(
  { role: 'user', content: 'Hello' },
  'openai',
  'anthropic'
)

// Convert an entire conversation from OpenAI format to Gemini format
const geminiConv = convertConversation(
  {
    messages: [
      { role: 'system', content: 'You are helpful.' },
      { role: 'user', content: 'Hi' }
    ]
  },
  'openai',
  'gemini'
)
// {
//   systemInstruction: { parts: [{ text: 'You are helpful.' }] },
//   contents: [{ role: 'user', parts: [{ text: 'Hi' }] }]
// }

Features

Three providers, one API -- Build messages once, render for OpenAI, Anthropic, or Gemini with a single method call.
Full multimodal support -- Text, images (Buffer, URL, base64, data URL), audio, and documents in a single fluent chain.
Automatic MIME detection -- Detects MIME types from Buffer magic bytes, file extensions, and data URL prefixes. Override with explicit mimeType when needed.
Automatic encoding -- Handles base64 encoding of Buffers, data URL construction for OpenAI, and raw base64 extraction for Anthropic and Gemini.
Conversation builder -- Constructs multi-turn conversations with correct system message placement per provider (inline message for OpenAI, top-level field for Anthropic, systemInstruction for Gemini).
Cross-provider conversion -- Convert existing provider-specific messages and conversations to any other provider format with convertMessage and convertConversation.
Provider-aware role mapping -- Maps assistant to model for Gemini, handles developer role from OpenAI, and maps system to user for Anthropic message arrays.
Graceful degradation -- Unsupported content types render as text fallbacks (e.g., audio on Anthropic renders as [Audio not supported by Anthropic], documents on OpenAI render as [Document: filename]).
Serializable internal format -- .toJSON() returns a provider-agnostic representation for logging, storage, and debugging.
Zero runtime dependencies -- Uses only built-in Node.js APIs (Buffer). No external packages.
Full TypeScript support -- Written in TypeScript with exported types for all interfaces, options, and provider output formats.

API Reference

`msg(role?): MessageBuilder`

Creates a new message builder.

Parameters:

| Parameter | Type | Default | Description | |-----------|------|---------|-------------| | role | 'user' \| 'assistant' \| 'system' | 'user' | The message role. |

Returns: MessageBuilder

`MessageBuilder.text(text): MessageBuilder`

Adds a text content part to the message.

msg('user').text('Hello, world!')

`MessageBuilder.image(data, options?): MessageBuilder`

Adds an image content part. Accepts a Buffer, Uint8Array, URL string, data URL string, or raw base64 string.

// From a Buffer (MIME auto-detected from magic bytes)
msg('user').image(readFileSync('./photo.png'))

// From a URL
msg('user').image('https://example.com/photo.jpg')

// From a data URL
msg('user').image('data:image/gif;base64,R0lGODlh...')

// From a Buffer with explicit options
msg('user').image(buffer, { mimeType: 'image/webp', detail: 'high', filename: 'photo.webp' })

Options (ImageOptions):

| Option | Type | Description | |--------|------|-------------| | mimeType | string | Override auto-detected MIME type. | | detail | 'low' \| 'high' \| 'auto' | Image detail level (used by OpenAI). | | filename | string | Optional filename metadata. |

`MessageBuilder.audio(data, options?): MessageBuilder`

Adds an audio content part. Accepts a Buffer, Uint8Array, or base64 string.

msg('user').audio(audioBuffer, { mimeType: 'audio/mpeg', format: 'mp3' })

Options (AudioOptions):

| Option | Type | Description | |--------|------|-------------| | mimeType | string | Override auto-detected MIME type. | | format | string | Audio format identifier (e.g., 'mp3', 'wav'). Defaults to the subtype of the MIME type. |

`MessageBuilder.document(data, options?): MessageBuilder`

Adds a document content part. Accepts a Buffer, Uint8Array, URL string, or base64 string.

msg('user').document(pdfBuffer, { mimeType: 'application/pdf', filename: 'report.pdf' })

Options (DocumentOptions):

| Option | Type | Description | |--------|------|-------------| | mimeType | string | Override auto-detected MIME type. | | filename | string | Optional filename metadata. |

`MessageBuilder.forOpenAI(options?): OpenAIMessage`

Renders the message in OpenAI's API format.

const result = msg('user').text('Hello').forOpenAI()
// { role: 'user', content: 'Hello' }

For text-only messages, content is a plain string. For multimodal messages, content is an array of content blocks.

Options:

| Option | Type | Description | |--------|------|-------------| | detail | string | Override detail level for all image parts. |

`MessageBuilder.forAnthropic(): AnthropicMessage`

Renders the message in Anthropic's API format. System role messages are mapped to user role in the output.

const result = msg('user').text('Hello').forAnthropic()
// { role: 'user', content: 'Hello' }

`MessageBuilder.forGemini(): GeminiContent`

Renders the message in Gemini's API format. The assistant role is mapped to model.

const result = msg('assistant').text('Hello').forGemini()
// { role: 'model', parts: [{ text: 'Hello' }] }

`MessageBuilder.for(provider): OpenAIMessage | AnthropicMessage | GeminiContent`

Generic renderer that dispatches to the correct provider-specific method.

const provider = 'anthropic'
const result = msg('user').text('Hello').for(provider)

`MessageBuilder.toJSON(): InternalMessage`

Returns the provider-agnostic internal representation.

const internal = msg('user').text('Hello').toJSON()
// { role: 'user', parts: [{ type: 'text', text: 'Hello' }] }

`conversation(): ConversationBuilder`

Creates a new conversation builder.

`ConversationBuilder.system(text): ConversationBuilder`

Sets the system message for the conversation.

conversation().system('You are a helpful assistant.')

`ConversationBuilder.user(msg): ConversationBuilder`

Adds a user message. Accepts a string or a MessageBuilder instance.

conversation()
  .user('Hello!')
  .user(msg('user').text('Look at this.').image(buffer))

`ConversationBuilder.assistant(msg): ConversationBuilder`

Adds an assistant message. Accepts a string or a MessageBuilder instance.

conversation().assistant('I can help with that.')

`ConversationBuilder.forOpenAI(): OpenAIConversation`

Renders the conversation for OpenAI. The system message is included as the first message with role: 'system'.

const result = conversation()
  .system('You are helpful.')
  .user('Hi')
  .forOpenAI()
// {
//   messages: [
//     { role: 'system', content: 'You are helpful.' },
//     { role: 'user', content: 'Hi' }
//   ]
// }

`ConversationBuilder.forAnthropic(): AnthropicConversation`

Renders the conversation for Anthropic. The system message is extracted to a top-level system field, separate from the messages array.

const result = conversation()
  .system('You are helpful.')
  .user('Hi')
  .forAnthropic()
// {
//   system: 'You are helpful.',
//   messages: [{ role: 'user', content: 'Hi' }]
// }

`ConversationBuilder.forGemini(): GeminiConversation`

Renders the conversation for Gemini. The system message is placed in systemInstruction. The assistant role is mapped to model.

const result = conversation()
  .system('You are helpful.')
  .user('Hi')
  .assistant('Hello!')
  .forGemini()
// {
//   systemInstruction: { parts: [{ text: 'You are helpful.' }] },
//   contents: [
//     { role: 'user', parts: [{ text: 'Hi' }] },
//     { role: 'model', parts: [{ text: 'Hello!' }] }
//   ]
// }

`ConversationBuilder.for(provider): OpenAIConversation | AnthropicConversation | GeminiConversation`

Generic renderer that dispatches to the correct provider-specific method.

`ConversationBuilder.toJSON(): InternalConversation`

Returns the provider-agnostic internal representation.

const internal = conversation().system('sys').user('hi').toJSON()
// { system: 'sys', messages: [{ role: 'user', parts: [{ type: 'text', text: 'hi' }] }] }

`convertMessage(message, fromProvider, toProvider)`

Converts a single provider-specific message to another provider's format.

Parameters:

| Parameter | Type | Description | |-----------|------|-------------| | message | OpenAIMessage \| AnthropicMessage \| GeminiContent | The source message. | | fromProvider | Provider | The provider format of the source message. | | toProvider | Provider | The target provider format. |

Returns: OpenAIMessage | AnthropicMessage | GeminiContent

Handles conversion of all content types including text, images (both base64 and URL), audio, and documents. Parses provider-specific structures (OpenAI's image_url and input_audio, Anthropic's source blocks, Gemini's inlineData and fileData) into an internal representation, then renders for the target provider.

import { convertMessage } from 'multimodal-msg'

// OpenAI image message to Anthropic
const anthropicMsg = convertMessage(
  {
    role: 'user',
    content: [
      { type: 'image_url', image_url: { url: 'data:image/png;base64,abc123' } }
    ]
  },
  'openai',
  'anthropic'
)
// content: [{ type: 'image', source: { type: 'base64', media_type: 'image/png', data: 'abc123' } }]

`convertConversation(conversation, fromProvider, toProvider)`

Converts a full conversation from one provider's format to another.

Parameters:

| Parameter | Type | Description | |-----------|------|-------------| | conversation | OpenAIConversation \| AnthropicConversation \| GeminiConversation | The source conversation. | | fromProvider | Provider | The provider format of the source conversation. | | toProvider | Provider | The target provider format. |

Returns: OpenAIConversation | AnthropicConversation | GeminiConversation

Handles system message extraction and re-placement according to each provider's conventions. Converts all messages including their multimodal content parts.

import { convertConversation } from 'multimodal-msg'

const geminiConv = convertConversation(
  {
    messages: [
      { role: 'system', content: 'You are helpful.' },
      { role: 'user', content: 'Hi' },
      { role: 'assistant', content: 'Hello!' }
    ]
  },
  'openai',
  'gemini'
)
// {
//   systemInstruction: { parts: [{ text: 'You are helpful.' }] },
//   contents: [
//     { role: 'user', parts: [{ text: 'Hi' }] },
//     { role: 'model', parts: [{ text: 'Hello!' }] }
//   ]
// }

MIME Detection Utilities

`detectMimeFromBuffer(buf): string | null`

Detects MIME type from a Buffer's magic bytes. Supports JPEG, PNG, GIF, WebP, and PDF.

import { detectMimeFromBuffer } from 'multimodal-msg'

const buf = readFileSync('./photo.png')
detectMimeFromBuffer(buf) // 'image/png'

`detectMimeFromExtension(filename): string | null`

Detects MIME type from a file extension. Supports: .jpg, .jpeg, .png, .gif, .webp, .mp3, .wav, .ogg, .flac, .pdf, .txt.

import { detectMimeFromExtension } from 'multimodal-msg'

detectMimeFromExtension('photo.jpg')  // 'image/jpeg'
detectMimeFromExtension('clip.wav')   // 'audio/wav'
detectMimeFromExtension('file.xyz')   // null

`detectMimeFromDataUrl(dataUrl): string | null`

Extracts the MIME type from a data URL prefix.

import { detectMimeFromDataUrl } from 'multimodal-msg'

detectMimeFromDataUrl('data:image/gif;base64,R0lGODlh...') // 'image/gif'
detectMimeFromDataUrl('not-a-data-url')                      // null

`resolveSource(source, options?)`

Resolves a ContentSource (Buffer, Uint8Array, or string) into a normalized { data, mimeType, sourceType } object. This is the internal resolution function used by all builder methods.

Buffer/Uint8Array: Base64-encodes the data and detects MIME from magic bytes.
Data URL string: Extracts the base64 payload and parses the MIME type.
HTTP/HTTPS URL string: Passes through as-is with sourceType: 'url'.
Raw base64 string: Passes through as-is; requires mimeType or filename in options.

Throws an Error if MIME type cannot be determined and is not provided via options.

Configuration

Provider Output Format Reference

Each content type renders differently per provider:

| Content Type | OpenAI | Anthropic | Gemini | |-------------|--------|-----------|--------| | Text | { type: 'text', text } | { type: 'text', text } | { text } | | Image (base64) | { type: 'image_url', image_url: { url: 'data:...' } } | { type: 'image', source: { type: 'base64', media_type, data } } | { inlineData: { mimeType, data } } | | Image (URL) | { type: 'image_url', image_url: { url } } | { type: 'image', source: { type: 'url', url } } | { fileData: { mimeType, fileUri } } | | Audio | { type: 'input_audio', input_audio: { data, format } } | [text fallback] | { inlineData: { mimeType, data } } | | Document (base64) | [text fallback] | { type: 'document', source: { type: 'base64', media_type, data } } | { inlineData: { mimeType, data } } | | Document (URL) | [text fallback] | { type: 'document', source: { type: 'url', url } } | { fileData: { mimeType, fileUri } } |

System Message Handling

Each provider handles system messages differently. The ConversationBuilder and convertConversation account for these differences automatically:

| Provider | System Message Placement | |----------|-------------------------| | OpenAI | Inline as first message: { role: 'system', content: '...' } | | Anthropic | Top-level field: { system: '...', messages: [...] } | | Gemini | Separate instruction: { systemInstruction: { parts: [{ text: '...' }] }, contents: [...] } |

Role Mapping

| Internal Role | OpenAI | Anthropic | Gemini | |---------------|--------|-----------|--------| | user | user | user | user | | assistant | assistant | assistant | model | | system | system | user | user |

Error Handling

multimodal-msg throws standard Error instances in the following cases:

MIME Type Detection Failure

When a Buffer is provided without an explicit mimeType and the magic bytes do not match any known format:

const unknownBuffer = Buffer.from([0x00, 0x01, 0x02, 0x03])

// Throws: "Cannot determine MIME type from buffer. Provide options.mimeType."
msg('user').image(unknownBuffer)

// Fix: provide mimeType explicitly
msg('user').image(unknownBuffer, { mimeType: 'image/webp' })

Data URL Parse Failure

When a data URL string cannot be parsed for its MIME type:

// Throws: "Cannot parse MIME type from data URL."

Raw Base64 Without MIME Type

When a raw base64 string is provided without mimeType or filename:

// Throws: "Cannot determine MIME type from base64 string. Provide options.mimeType or options.filename."
msg('user').image('aGVsbG8=')

// Fix: provide mimeType or filename
msg('user').image('aGVsbG8=', { mimeType: 'image/png' })
msg('user').image('aGVsbG8=', { filename: 'photo.png' })

Unsupported Content Type Fallbacks

Rather than throwing, unsupported content types are rendered as text placeholders:

Audio on Anthropic: { type: 'text', text: '[Audio not supported by Anthropic]' }
Documents on OpenAI: { type: 'text', text: '[Document: report.pdf]' } (includes filename when available)

Advanced Usage

Builder Reuse

A single builder instance can render for multiple providers. The internal state is not modified by rendering:

const message = msg('user')
  .text('Analyze this image.')
  .image('https://example.com/chart.png', { detail: 'high' })

const openai = message.forOpenAI()
const anthropic = message.forAnthropic()
const gemini = message.forGemini()

Dynamic Provider Selection

Use the .for(provider) method when the target provider is determined at runtime:

function sendToLLM(provider: Provider, prompt: string, imageUrl: string) {
  const message = msg('user').text(prompt).image(imageUrl)
  return message.for(provider)
}

Multimodal Conversations with Mixed Content

Combine MessageBuilder instances with plain strings in a conversation:

const conv = conversation()
  .system('You are a document analyst.')
  .user(
    msg('user')
      .text('Summarize this PDF.')
      .document(pdfBuffer, { mimeType: 'application/pdf', filename: 'report.pdf' })
  )
  .assistant('The report covers Q4 financial results...')
  .user('What about the charts on page 3?')
  .user(
    msg('user')
      .text('Here is page 3.')
      .image(page3Screenshot)
  )

Cross-Provider Conversion with Multimodal Content

Convert messages containing images between providers. The converter handles format differences in base64 encoding, URL references, and content block structure:

// An OpenAI message with a base64 image
const openaiMsg = {
  role: 'user' as const,
  content: [
    { type: 'text', text: 'Describe this.' },
    { type: 'image_url', image_url: { url: 'data:image/png;base64,iVBOR...' } }
  ]
}

// Convert to Anthropic format
const anthropicMsg = convertMessage(openaiMsg, 'openai', 'anthropic')
// {
//   role: 'user',
//   content: [
//     { type: 'text', text: 'Describe this.' },
//     { type: 'image', source: { type: 'base64', media_type: 'image/png', data: 'iVBOR...' } }
//   ]
// }

Serialization and Logging

Use .toJSON() to capture the provider-agnostic internal representation for logging or storage. Reconstruct and render later for any provider:

const internal = msg('user')
  .text('Hello')
  .image('https://example.com/img.png')
  .toJSON()

// internal is a plain JSON-serializable object:
// {
//   role: 'user',
//   parts: [
//     { type: 'text', text: 'Hello' },
//     { type: 'image', data: 'https://example.com/img.png', mimeType: 'image/png', sourceType: 'url', url: 'https://example.com/img.png' }
//   ]
// }

TypeScript

multimodal-msg is written in TypeScript and ships type declarations alongside the compiled JavaScript. All public interfaces, option types, and provider output types are exported:

import type {
  // Core types
  Provider,
  ContentSource,
  ContentPart,
  TextPart,
  ImagePart,
  AudioPart,
  DocumentPart,

  // Internal representation
  InternalMessage,
  InternalConversation,

  // Option types
  ImageOptions,
  AudioOptions,
  DocumentOptions,

  // Provider output types
  OpenAIMessage,
  AnthropicMessage,
  GeminiContent,

  // Provider conversation types
  OpenAIConversation,
  AnthropicConversation,
  GeminiConversation,

  // Builder interfaces
  MessageBuilder,
  ConversationBuilder,
} from 'multimodal-msg'

The Provider type is a string union ('openai' | 'anthropic' | 'gemini') that can be used for type-safe provider selection:

function renderForProvider(provider: Provider) {
  return msg('user').text('Hello').for(provider)
}

The ContentSource type (Buffer | Uint8Array | string) represents all accepted input formats for binary content methods (.image(), .audio(), .document()).

License

MIT

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

multimodal-msg

Description

Installation

Quick Start

Build a message and render for a provider

Build a conversation

Convert between providers

Features

API Reference

msg(role?): MessageBuilder

MessageBuilder.text(text): MessageBuilder

MessageBuilder.image(data, options?): MessageBuilder

MessageBuilder.audio(data, options?): MessageBuilder

MessageBuilder.document(data, options?): MessageBuilder

MessageBuilder.forOpenAI(options?): OpenAIMessage

MessageBuilder.forAnthropic(): AnthropicMessage

MessageBuilder.forGemini(): GeminiContent

MessageBuilder.for(provider): OpenAIMessage | AnthropicMessage | GeminiContent

MessageBuilder.toJSON(): InternalMessage

conversation(): ConversationBuilder

ConversationBuilder.system(text): ConversationBuilder

ConversationBuilder.user(msg): ConversationBuilder

ConversationBuilder.assistant(msg): ConversationBuilder

ConversationBuilder.forOpenAI(): OpenAIConversation

ConversationBuilder.forAnthropic(): AnthropicConversation

ConversationBuilder.forGemini(): GeminiConversation

ConversationBuilder.for(provider): OpenAIConversation | AnthropicConversation | GeminiConversation

ConversationBuilder.toJSON(): InternalConversation

convertMessage(message, fromProvider, toProvider)

convertConversation(conversation, fromProvider, toProvider)

MIME Detection Utilities

detectMimeFromBuffer(buf): string | null

detectMimeFromExtension(filename): string | null

detectMimeFromDataUrl(dataUrl): string | null

resolveSource(source, options?)

Configuration

Provider Output Format Reference

System Message Handling

Role Mapping

Error Handling

MIME Type Detection Failure

Data URL Parse Failure

Raw Base64 Without MIME Type

Unsupported Content Type Fallbacks

Advanced Usage

Builder Reuse

Dynamic Provider Selection

Multimodal Conversations with Mixed Content

Cross-Provider Conversion with Multimodal Content

Serialization and Logging

TypeScript

License

`msg(role?): MessageBuilder`

`MessageBuilder.text(text): MessageBuilder`

`MessageBuilder.image(data, options?): MessageBuilder`

`MessageBuilder.audio(data, options?): MessageBuilder`

`MessageBuilder.document(data, options?): MessageBuilder`

`MessageBuilder.forOpenAI(options?): OpenAIMessage`

`MessageBuilder.forAnthropic(): AnthropicMessage`

`MessageBuilder.forGemini(): GeminiContent`

`MessageBuilder.for(provider): OpenAIMessage | AnthropicMessage | GeminiContent`

`MessageBuilder.toJSON(): InternalMessage`

`conversation(): ConversationBuilder`

`ConversationBuilder.system(text): ConversationBuilder`

`ConversationBuilder.user(msg): ConversationBuilder`

`ConversationBuilder.assistant(msg): ConversationBuilder`

`ConversationBuilder.forOpenAI(): OpenAIConversation`

`ConversationBuilder.forAnthropic(): AnthropicConversation`

`ConversationBuilder.forGemini(): GeminiConversation`

`ConversationBuilder.for(provider): OpenAIConversation | AnthropicConversation | GeminiConversation`

`ConversationBuilder.toJSON(): InternalConversation`

`convertMessage(message, fromProvider, toProvider)`

`convertConversation(conversation, fromProvider, toProvider)`

`detectMimeFromBuffer(buf): string | null`

`detectMimeFromExtension(filename): string | null`

`detectMimeFromDataUrl(dataUrl): string | null`

`resolveSource(source, options?)`