@reaatech/media-pipeline-mcp-google

v0.4.0 · Published 10 days ago · Maintainer: reaatech

Google Cloud provider: Document AI for OCR/table/field extraction, Vertex AI Gemini for image description

Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.

Google Cloud provider for the media pipeline framework. It uses Document AI for production-grade OCR, table extraction, and structured field extraction, and Vertex AI Gemini for vision-based image description. It also provides per-page cost estimation and hashes document bytes with SHA-256 so cache keys stay compact.

Installation

```bash
npm install @reaatech/media-pipeline-mcp-google
# or
pnpm add @reaatech/media-pipeline-mcp-google
```

Feature Overview

- Document AI OCR with plain text, structured JSON, or markdown output
- Document AI table extraction with structural parsing (headers + body rows)
- Document AI field extraction with configurable JSON schema and type coercion
- Vertex AI Gemini image description at three detail levels
- Page-level confidence scores on OCR output
- Streaming support for Gemini image description (supportsStreaming)
- SHA-256 hashing of document bytes in cache keys
- Service account JSON key file authentication

Quick Start

```typescript
import { readFile } from "node:fs/promises";
import { GoogleProvider } from "@reaatech/media-pipeline-mcp-google";

const provider = new GoogleProvider({
  projectId: "my-gcp-project",
  location: "us",
  documentAiProcessorId: "abc123def456",
  geminiModel: "gemini-1.5-pro",
});

// Load example inputs (placeholder paths). Top-level await requires an ES module.
const docBuffer = await readFile("./scanned-page.png");
const reportBuffer = await readFile("./financial-report.pdf");
const formBuffer = await readFile("./form.png");
const photoBuffer = await readFile("./photo.jpg");

// OCR a scanned document
const text = await provider.execute({
  operation: "document.ocr",
  params: { image_data: docBuffer, output_format: "markdown", mime_type: "image/png" },
  config: {},
});

// Extract tables from a financial report
const tables = await provider.execute({
  operation: "document.extract_tables",
  params: { image_data: reportBuffer, output_format: "json", mime_type: "application/pdf" },
  config: {},
});
console.log(JSON.parse(tables.data.toString())); // Array of { headers, rows }

// Extract structured fields from a form
const fields = await provider.execute({
  operation: "document.extract_fields",
  params: {
    image_data: formBuffer,
    field_schema: { name: "string", date: "date", amount: "number", approved: "boolean" },
    mime_type: "image/png",
  },
  config: {},
});

// Describe an image with Gemini
const description = await provider.execute({
  operation: "image.describe",
  params: { image_data: photoBuffer, detail_level: "structured", mime_type: "image/jpeg" },
  config: {},
});
```

Supported Operations

| Operation | Service | Default Model | Description |
|-----------|---------|---------------|-------------|
| document.ocr | Document AI | Processor ID | Text extraction with page-level confidence |
| document.extract_tables | Document AI | Form Parser | Table extraction as markdown or JSON arrays |
| document.extract_fields | Document AI | Entity Extractor | Schema-driven field extraction with type coercion |
| image.describe | Vertex AI | gemini-1.5-pro | Vision-based image description |
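Per the Key Methods table, `execute()` routes each request to Document AI or Vertex AI by operation name. The minimal sketch below mirrors the operation-to-service mapping in the table above as a lookup. The `serviceFor` function and `ProviderService` type are hypothetical illustrations, not exports of this package.

```typescript
// Hypothetical sketch: mirrors the Supported Operations table.
// Not part of the package's public API.
type ProviderService = "document-ai" | "vertex-ai";

// A Map avoids accidental hits on Object.prototype keys such as "toString".
const OPERATION_SERVICE = new Map<string, ProviderService>([
  ["document.ocr", "document-ai"],
  ["document.extract_tables", "document-ai"],
  ["document.extract_fields", "document-ai"],
  ["image.describe", "vertex-ai"],
]);

function serviceFor(operation: string): ProviderService {
  const service = OPERATION_SERVICE.get(operation);
  if (!service) {
    // Unknown operations are rejected rather than guessed.
    throw new Error(`Unsupported operation: ${operation}`);
  }
  return service;
}
```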

Configuration Parameters

`document.ocr`

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| image_data | Buffer | required | Document image as raw buffer |
| output_format | string | "plain_text" | Output format: plain_text, structured_json, markdown |
| mime_type | string | "image/png" | Document MIME type |

`document.extract_tables`

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| image_data | Buffer | required | Document image as raw buffer |
| output_format | string | "markdown" | Output format: markdown, json |
| mime_type | string | "image/png" | Document MIME type |
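With `output_format: "json"`, the Quick Start parses the result into an array of `{ headers, rows }`. If you store the JSON form and later need markdown, you could convert it along the lines of this sketch. The element types (string headers, rows as arrays of strings) and the `tableToMarkdown` helper are assumptions for illustration. The provider's own markdown output may differ in spacing or escaping.

```typescript
// Assumed shape of one element of the JSON output: { headers, rows }.
interface ExtractedTable {
  headers: string[];
  rows: string[][];
}

// Escape pipes and flatten newlines so cell text cannot break the table layout.
function escapeCell(cell: string): string {
  return cell.replace(/\|/g, "\\|").replace(/\r?\n/g, " ");
}

// Hypothetical helper: renders one table as a GitHub-style markdown table.
function tableToMarkdown(table: ExtractedTable): string {
  const header = `| ${table.headers.map(escapeCell).join(" | ")} |`;
  const divider = `| ${table.headers.map(() => "---").join(" | ")} |`;
  const body = table.rows.map((row) => `| ${row.map(escapeCell).join(" | ")} |`);
  return [header, divider, ...body].join("\n");
}
```

For example, `tableToMarkdown({ headers: ["Item", "Amount"], rows: [["Rent", "1200"]] })` yields three lines: `| Item | Amount |`, `| --- | --- |`, and `| Rent | 1200 |`.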

`document.extract_fields`

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| image_data | Buffer | required | Document image as raw buffer |
| field_schema | Record<string, string> | required | Schema mapping field names to types (string, number, date, boolean) |
| mime_type | string | "image/png" | Document MIME type |

`image.describe`

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| image_data | Buffer | required | Image as raw buffer |
| detail_level | string | "detailed" | Description detail: brief, detailed, structured |
| mime_type | string | "image/png" | Image MIME type |
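The four parameter tables above amount to a small spec: some parameters are required, and the rest fall back to defaults. The sketch below shows one way a caller might apply those defaults and catch a missing required parameter before sending a request. The defaults and required lists are copied from the tables. `resolveParams`, `SPECS`, and `OperationSpec` are hypothetical names, and the sketch does not claim to be how the provider validates input internally.

```typescript
type Params = Record<string, unknown>;

interface OperationSpec {
  required: string[];
  defaults: Params;
}

// Values copied from the Configuration Parameters tables.
const SPECS = new Map<string, OperationSpec>([
  ["document.ocr", { required: ["image_data"], defaults: { output_format: "plain_text", mime_type: "image/png" } }],
  ["document.extract_tables", { required: ["image_data"], defaults: { output_format: "markdown", mime_type: "image/png" } }],
  ["document.extract_fields", { required: ["image_data", "field_schema"], defaults: { mime_type: "image/png" } }],
  ["image.describe", { required: ["image_data"], defaults: { detail_level: "detailed", mime_type: "image/png" } }],
]);

// Hypothetical helper: rejects missing required params, then fills documented defaults.
function resolveParams(operation: string, params: Params): Params {
  const spec = SPECS.get(operation);
  if (!spec) {
    throw new Error(`Unsupported operation: ${operation}`);
  }
  for (const name of spec.required) {
    if (params[name] === undefined) {
      throw new Error(`${operation}: missing required parameter "${name}"`);
    }
  }
  // Drop explicitly-undefined values so they do not mask a default.
  const supplied = Object.fromEntries(
    Object.entries(params).filter(([, value]) => value !== undefined),
  );
  // Caller-supplied values take precedence over defaults.
  return { ...spec.defaults, ...supplied };
}
```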

API Reference

`GoogleProvider`

```typescript
declare class GoogleProvider extends MediaProvider {
  constructor(config: GoogleProviderConfig);

  healthCheck(): Promise<ProviderHealth>;
  estimateCost(input: ProviderInput): Promise<CostEstimate>;
  execute(input: ProviderInput): Promise<ProviderOutput>;
}
```

`GoogleProviderConfig`

```typescript
interface GoogleProviderConfig {
  projectId: string;                  // GCP project ID (required)
  location?: string;                  // Default: "us" for Document AI, "us-central1" for Vertex AI
  documentAiProcessorId?: string;     // Document AI processor ID
  geminiModel?: string;               // Default: "gemini-1.5-pro"
  keyFile?: string;                   // Path to service account JSON key file
  timeout?: number;                   // Request timeout in ms
}
```

Factory Function

```typescript
import { defineGoogleProvider } from "@reaatech/media-pipeline-mcp-google";

const provider = defineGoogleProvider({
  projectId: "my-gcp-project",
  documentAiProcessorId: "abc123",
});
```

Key Methods

| Method | Returns | Description |
|--------|---------|-------------|
| healthCheck() | Promise<ProviderHealth> | Validates connectivity by calling getProcessor on the configured Document AI processor |
| estimateCost(input) | Promise<CostEstimate> | Returns a fixed per-page or per-image cost from the pricing table |
| execute(input) | Promise<ProviderOutput> | Routes to Document AI or Vertex AI based on the operation type |

Non-Retryable Errors

The provider classifies these errors as non-retryable: permission denied, invalid credentials, project not found, processor not found, quota exceeded.
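Because these failures will not go away on retry, a caller wrapping `execute()` in its own retry loop can fail fast on them. The sketch below assumes errors can be recognized by case-insensitive phrases in the error message. The phrase list, `isNonRetryable`, and `withRetry` are hypothetical, and the provider's real classification rules are not documented here.

```typescript
// Hypothetical phrase list derived from the categories above; the provider's
// actual matching rules are not documented in this README.
const NON_RETRYABLE_PATTERNS: RegExp[] = [
  /permission denied/i,
  /invalid credentials/i,
  /project not found/i,
  /processor not found/i,
  /quota exceeded/i,
];

function isNonRetryable(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return NON_RETRYABLE_PATTERNS.some((pattern) => pattern.test(message));
}

// Retries transient failures with exponential backoff (200 ms, 400 ms, ...),
// but rethrows immediately on errors classified as non-retryable.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3, baseDelayMs = 200): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (isNonRetryable(error) || attempt === maxAttempts) break;
      const delay = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

Usage would look like `await withRetry(() => provider.execute(input))`.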

Type Coercion for Field Extraction

Fields extracted via document.extract_fields are coerced to the types specified in the schema:

| Schema Type | Conversion |
|-------------|------------|
| string | Pass-through |
| number | parseFloat() with fallback to 0 |
| boolean | Matches "true" / "yes" (case-insensitive) |
| date | Parsed to ISO 8601 string |
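The coercion rules in the table can be sketched as a single function. This is a minimal illustration under stated assumptions, not the provider's code. `coerceField` is a hypothetical name. Treating booleans as an exact match after trimming, and returning the raw string for unparseable dates, are both assumptions the table does not settle. The sketch also shows a practical consequence of `parseFloat()`: it stops at the first character it cannot parse, so `"1,234.50"` becomes `1` and `"$12"` falls back to `0`.

```typescript
type SchemaType = "string" | "number" | "date" | "boolean";

// Sketch of the documented coercion rules.
function coerceField(raw: string, type: SchemaType): string | number | boolean {
  switch (type) {
    case "string":
      return raw; // pass-through
    case "number": {
      const value = parseFloat(raw);
      return Number.isNaN(value) ? 0 : value; // documented fallback to 0
    }
    case "boolean":
      // Assumption: whole-value match after trimming, case-insensitive.
      return /^(true|yes)$/i.test(raw.trim());
    case "date": {
      const parsed = new Date(raw);
      // Assumption: unparseable dates are returned unchanged.
      return Number.isNaN(parsed.getTime()) ? raw : parsed.toISOString();
    }
    default: {
      const unreachable: never = type;
      throw new Error(`Unknown schema type: ${String(unreachable)}`);
    }
  }
}
```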

Cost Estimation

| Operation | Cost |
|-----------|------|
| document.ocr | $0.001 / page |
| document.extract_tables | $0.01 / page |
| document.extract_fields | $0.01 / page |
| image.describe | $0.0025 / image |

Costs are fixed per-operation rates read from pricing.json. Gemini image descriptions are estimated at a flat per-image rate; the provider does not meter tokens.
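Since every rate is flat, an estimate is just the rate multiplied by the number of pages (or images). For example, a 12-page PDF sent through `document.extract_tables` estimates at 12 × $0.01 = $0.12, and OCR of the same file at 12 × $0.001 = $0.012. The sketch below hard-codes a local copy of the table. `estimateUsd` and `RATES_USD` are hypothetical and not the provider's `estimateCost()` implementation, which reads pricing.json.

```typescript
// Local copy of the Cost Estimation table (USD), for illustration only.
const RATES_USD = new Map<string, number>([
  ["document.ocr", 0.001],
  ["document.extract_tables", 0.01],
  ["document.extract_fields", 0.01],
  ["image.describe", 0.0025],
]);

// Hypothetical helper: `units` is pages for Document AI operations
// and images for image.describe.
function estimateUsd(operation: string, units: number): number {
  const rate = RATES_USD.get(operation);
  if (rate === undefined) {
    throw new Error(`No rate for ${operation}`);
  }
  if (!Number.isInteger(units) || units < 0) {
    throw new Error("units must be a non-negative integer");
  }
  return rate * units;
}
```

The result is a floating-point dollar amount. Round it (for example with `toFixed(4)`) before displaying it.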

Cache Configuration

The provider exposes a static cacheConfig that lists which request parameters are deterministic (used to build cache keys) and which are non-deterministic (ignored for caching).

Deterministic parameters: prompt, model, system, generationConfig, temperature, top_p, top_k, max_output_tokens, seed, document_data (SHA-256 hashed), processor_id, mime_type

Non-deterministic parameters: request_id

Document bytes are hashed with SHA-256 during normalization so that cache keys for Document AI operations remain compact. For Gemini operations, seed is a deterministic parameter, so supplying a fixed seed gives repeated requests identical cache keys; repeats can then be served from the cache with reproducible output.
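The normalization described above can be sketched with Node's built-in `crypto` module. The sketch keeps only the deterministic parameters listed in cacheConfig and drops `request_id`. It replaces `document_data` with its SHA-256 digest and serializes with sorted keys, so logically equal requests produce the same key. Hashing the whole normalized payload into a final hex key is an assumption, and `buildCacheKey`, `stableStringify`, and `sha256Hex` are hypothetical names rather than the provider's internals.

```typescript
import { createHash } from "node:crypto";

// Copied from the cacheConfig lists above; request_id is deliberately absent.
const DETERMINISTIC_PARAMS = [
  "prompt", "model", "system", "generationConfig", "temperature", "top_p",
  "top_k", "max_output_tokens", "seed", "document_data", "processor_id", "mime_type",
];

function sha256Hex(data: Uint8Array | string): string {
  return createHash("sha256").update(data).digest("hex");
}

// Recursively sorts object keys so {a, b} and {b, a} serialize identically.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => stableStringify(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    const entries = Object.keys(record)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(record[key])}`);
    return `{${entries.join(",")}}`;
  }
  return String(JSON.stringify(value));
}

// Hypothetical helper: builds a compact, order-independent cache key.
function buildCacheKey(operation: string, params: Record<string, unknown>): string {
  const normalized: Record<string, unknown> = {};
  for (const name of DETERMINISTIC_PARAMS) {
    const value = params[name];
    if (value === undefined) continue;
    // Replace raw document bytes with their digest so the key stays small.
    const isBytes = value instanceof Uint8Array || typeof value === "string";
    normalized[name] = name === "document_data" && isBytes ? `sha256:${sha256Hex(value)}` : value;
  }
  return sha256Hex(`${operation}|${stableStringify(normalized)}`);
}
```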

Health Check

The health check calls getProcessor on the configured Document AI processor to validate GCP credentials and connectivity. Returns { healthy: true, latency: <ms> } on success, or { healthy: false, error: "<message>" } on failure. If no documentAiProcessorId is configured, the check still passes if client construction succeeds.

Environment Variables

| Variable | Description |
|----------|-------------|
| GOOGLE_PROJECT_ID | GCP project ID |
| GOOGLE_LOCATION | GCP location for Document AI / Vertex AI |
| GOOGLE_DOCUMENT_AI_PROCESSOR_ID | Document AI processor ID |
| GOOGLE_GEMINI_MODEL | Gemini model override |
| GOOGLE_APPLICATION_CREDENTIALS | Service account JSON path |
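This README does not say whether the provider reads these variables itself or expects the caller to pass them in. The sketch below assumes the latter and maps them onto `GoogleProviderConfig` explicitly. `configFromEnv` is hypothetical, and the interface is a local copy of the one documented above so the block stands alone. Google Cloud client libraries also pick up `GOOGLE_APPLICATION_CREDENTIALS` on their own through Application Default Credentials, so setting `keyFile` from it is optional.

```typescript
// Local mirror of GoogleProviderConfig as documented above.
interface GoogleProviderConfig {
  projectId: string;
  location?: string;
  documentAiProcessorId?: string;
  geminiModel?: string;
  keyFile?: string;
  timeout?: number;
}

// Hypothetical helper: builds a provider config from environment variables,
// leaving unset variables undefined so the provider's own defaults apply.
function configFromEnv(env: Record<string, string | undefined>): GoogleProviderConfig {
  const projectId = env.GOOGLE_PROJECT_ID;
  if (!projectId) {
    throw new Error("GOOGLE_PROJECT_ID is required");
  }
  const config: GoogleProviderConfig = { projectId };
  if (env.GOOGLE_LOCATION) config.location = env.GOOGLE_LOCATION;
  if (env.GOOGLE_DOCUMENT_AI_PROCESSOR_ID) config.documentAiProcessorId = env.GOOGLE_DOCUMENT_AI_PROCESSOR_ID;
  if (env.GOOGLE_GEMINI_MODEL) config.geminiModel = env.GOOGLE_GEMINI_MODEL;
  if (env.GOOGLE_APPLICATION_CREDENTIALS) config.keyFile = env.GOOGLE_APPLICATION_CREDENTIALS;
  return config;
}
```

In Node this would be used as `new GoogleProvider(configFromEnv(process.env))`.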

Related Packages

- @reaatech/media-pipeline-mcp-provider-core: Base provider class
- @reaatech/media-pipeline-mcp-server: MCP server
- @reaatech/media-pipeline-mcp-anthropic: Alternative document extraction provider (Claude Sonnet)

License

MIT
