@reaatech/media-pipeline-mcp-google

v0.4.0

Google Cloud provider — Document AI for OCR/table/field extraction, Vertex AI Gemini for image description

Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.

Google Cloud provider for the media pipeline framework. Uses Document AI for production-grade OCR, table extraction, and structured field extraction, plus Vertex AI Gemini for vision-based image description. Supports per-page cost tracking and document byte hashing for cache efficiency.

Installation

npm install @reaatech/media-pipeline-mcp-google
# or
pnpm add @reaatech/media-pipeline-mcp-google

Feature Overview

  • Document AI OCR with plain text, structured JSON, or markdown output
  • Document AI table extraction with structural parsing (headers + body rows)
  • Document AI field extraction with configurable JSON schema and type coercion
  • Vertex AI Gemini image description at three detail levels
  • Page-level confidence scores on OCR output
  • Streaming support for Gemini image description (supportsStreaming)
  • SHA-256 hashing of document bytes in cache keys
  • Service account JSON key file authentication

Quick Start

// Top-level await requires an ES module (for example, "type": "module" in package.json)
import { readFile } from "node:fs/promises";
import { GoogleProvider } from "@reaatech/media-pipeline-mcp-google";

const provider = new GoogleProvider({
  projectId: "my-gcp-project",
  location: "us",
  documentAiProcessorId: "abc123def456",
  geminiModel: "gemini-1.5-pro",
});

// Example input files (placeholder paths; replace with your own)
const docBuffer = await readFile("./scanned-letter.png");
const reportBuffer = await readFile("./financial-report.pdf");
const formBuffer = await readFile("./form.png");
const photoBuffer = await readFile("./photo.jpg");

// OCR a scanned document
const ocr = await provider.execute({
  operation: "document.ocr",
  params: { image_data: docBuffer, output_format: "markdown", mime_type: "image/png" },
  config: {},
});
console.log(ocr.data.toString()); // Markdown text

// Extract tables from a financial report
const tables = await provider.execute({
  operation: "document.extract_tables",
  params: { image_data: reportBuffer, output_format: "json", mime_type: "application/pdf" },
  config: {},
});
console.log(JSON.parse(tables.data.toString())); // Array of { headers, rows }

// Extract structured fields from a form
const fields = await provider.execute({
  operation: "document.extract_fields",
  params: {
    image_data: formBuffer,
    field_schema: { name: "string", date: "date", amount: "number", approved: "boolean" },
    mime_type: "image/png",
  },
  config: {},
});

// Describe an image with Gemini
const description = await provider.execute({
  operation: "image.describe",
  params: { image_data: photoBuffer, detail_level: "structured", mime_type: "image/jpeg" },
  config: {},
});

Supported Operations

| Operation | Service | Default Model / Processor | Description |
|-----------|---------|---------------------------|-------------|
| document.ocr | Document AI | Processor ID | Text extraction with page-level confidence |
| document.extract_tables | Document AI | Form Parser | Table extraction as markdown or JSON arrays |
| document.extract_fields | Document AI | Entity Extractor | Schema-driven field extraction with type coercion |
| image.describe | Vertex AI | gemini-1.5-pro | Vision-based image description |

Configuration Parameters

document.ocr

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| image_data | Buffer | required | Document image as raw buffer |
| output_format | string | "plain_text" | Output format: plain_text, structured_json, markdown |
| mime_type | string | "image/png" | Document MIME type |

document.extract_tables

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| image_data | Buffer | required | Document image as raw buffer |
| output_format | string | "markdown" | Output format: markdown, json |
| mime_type | string | "image/png" | Document MIME type |
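If you request output_format: "json" and want to render the result yourself, the array of { headers, rows } shown in the Quick Start can be turned into a markdown table along these lines. This is a minimal sketch, not the package's own markdown renderer: the ExtractedTable type (with rows as string[][]) and the tableToMarkdown helper are assumptions for illustration.

```typescript
// Assumed shape of one element in the JSON array returned by
// document.extract_tables with output_format: "json" (hypothetical type).
interface ExtractedTable {
  headers: string[];
  rows: string[][];
}

// Escape pipes and flatten newlines so cell text cannot break the table.
function escapeCell(text: string): string {
  return text.replace(/\|/g, "\\|").replace(/\n/g, " ");
}

// Hypothetical helper: render one table as a markdown pipe table.
// Rows are emitted as-is; short rows are not padded to the header width.
function tableToMarkdown(table: ExtractedTable): string {
  const line = (cells: string[]) => `| ${cells.map(escapeCell).join(" | ")} |`;
  const separator = `| ${table.headers.map(() => "---").join(" | ")} |`;
  return [line(table.headers), separator, ...table.rows.map(line)].join("\n");
}

// Illustrative sample data, not real provider output.
const sample: ExtractedTable[] = [
  { headers: ["Quarter", "Revenue"], rows: [["Q1", "1.2M"], ["Q2", "1.5M"]] },
];
console.log(sample.map(tableToMarkdown).join("\n\n"));
```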

document.extract_fields

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| image_data | Buffer | required | Document image as raw buffer |
| field_schema | Record<string, string> | required | Schema mapping field names to types (string, number, date, boolean) |
| mime_type | string | "image/png" | Document MIME type |

image.describe

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| image_data | Buffer | required | Image as raw buffer |
| detail_level | string | "detailed" | Description detail: brief, detailed, structured |
| mime_type | string | "image/png" | Image MIME type |
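The four parameter tables can be summarized as one defaults map plus a list of required parameters per operation. The sketch below is a hypothetical withDefaults helper that mirrors those tables for client-side validation; it is not the provider's internal code, and whether the provider performs the same checks is an assumption.

```typescript
type Operation =
  | "document.ocr"
  | "document.extract_tables"
  | "document.extract_fields"
  | "image.describe";

// Defaults copied from the parameter tables above.
const DEFAULTS: Record<Operation, Record<string, string>> = {
  "document.ocr": { output_format: "plain_text", mime_type: "image/png" },
  "document.extract_tables": { output_format: "markdown", mime_type: "image/png" },
  "document.extract_fields": { mime_type: "image/png" },
  "image.describe": { detail_level: "detailed", mime_type: "image/png" },
};

// Parameters marked "required" in the tables.
const REQUIRED: Record<Operation, string[]> = {
  "document.ocr": ["image_data"],
  "document.extract_tables": ["image_data"],
  "document.extract_fields": ["image_data", "field_schema"],
  "image.describe": ["image_data"],
};

// Hypothetical helper: reject missing required params, then fill in defaults.
function withDefaults(
  operation: Operation,
  params: Record<string, unknown>,
): Record<string, unknown> {
  for (const name of REQUIRED[operation]) {
    if (params[name] === undefined) {
      throw new Error(`${operation}: missing required parameter "${name}"`);
    }
  }
  // Drop explicit undefined values so they do not mask a default.
  const supplied = Object.fromEntries(
    Object.entries(params).filter(([, value]) => value !== undefined),
  );
  // Caller-supplied values take precedence over defaults.
  return { ...DEFAULTS[operation], ...supplied };
}
```

For example, withDefaults("image.describe", { image_data: photoBuffer }) yields detail_level "detailed" and mime_type "image/png", matching the image.describe table.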

API Reference

GoogleProvider

class GoogleProvider extends MediaProvider {
  constructor(config: GoogleProviderConfig)

  healthCheck(): Promise<ProviderHealth>
  estimateCost(input: ProviderInput): Promise<CostEstimate>
  execute(input: ProviderInput): Promise<ProviderOutput>
}

GoogleProviderConfig

interface GoogleProviderConfig {
  projectId: string;                  // GCP project ID (required)
  location?: string;                  // Default: "us" for Document AI, "us-central1" for Vertex AI
  documentAiProcessorId?: string;     // Document AI processor ID
  geminiModel?: string;               // Default: "gemini-1.5-pro"
  keyFile?: string;                   // Path to service account JSON key file
  timeout?: number;                   // Request timeout in ms
}

Factory Function

import { defineGoogleProvider } from "@reaatech/media-pipeline-mcp-google";

const provider = defineGoogleProvider({
  projectId: "my-gcp-project",
  documentAiProcessorId: "abc123",
});

Key Methods

| Method | Returns | Description |
|--------|---------|-------------|
| healthCheck() | ProviderHealth | Validates connectivity by calling getProcessor on the configured Document AI processor |
| estimateCost(input) | CostEstimate | Returns fixed per-page/per-image cost from pricing table |
| execute(input) | ProviderOutput | Routes to Document AI or Vertex AI based on operation type |
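The routing that execute performs can be pictured as a simple lookup on the operation name, following the Supported Operations table: every document.* operation goes to Document AI and image.describe goes to Vertex AI. This is a hypothetical backendFor helper for illustration, assuming the routing is keyed on the operation string; the package's actual dispatch code is not shown here.

```typescript
type Backend = "document-ai" | "vertex-ai";

// Hypothetical routing helper based on the Supported Operations table.
// Assumes all "document." operations are served by Document AI.
function backendFor(operation: string): Backend {
  if (operation.startsWith("document.")) return "document-ai";
  if (operation === "image.describe") return "vertex-ai";
  throw new Error(`Unsupported operation: ${operation}`);
}
```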

Non-Retryable Errors

The provider classifies these errors as non-retryable: permission denied, invalid credentials, project not found, processor not found, quota exceeded.
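A caller that wraps execute in its own retry loop may want the same classification. The sketch below is a hypothetical isRetryable function: it checks the numeric gRPC status code that Google Cloud client libraries typically attach as error.code (NOT_FOUND = 5, PERMISSION_DENIED = 7, RESOURCE_EXHAUSTED = 8, UNAUTHENTICATED = 16), and falls back to matching the message text. The exact error shape the provider surfaces is an assumption.

```typescript
// gRPC status codes corresponding to the non-retryable categories above.
const NON_RETRYABLE_GRPC_CODES = new Set<number>([
  5,  // NOT_FOUND: project or processor not found
  7,  // PERMISSION_DENIED
  8,  // RESOURCE_EXHAUSTED: quota exceeded
  16, // UNAUTHENTICATED: invalid credentials
]);

// Fallback when no numeric code is present: match on the message text.
const NON_RETRYABLE_PATTERNS = [
  /permission denied/i,
  /invalid credentials|unauthenticated/i,
  /project .*not found|processor .*not found/i,
  /quota exceeded|resource exhausted/i,
];

// Hypothetical classifier; returns false for errors that should not be retried.
function isRetryable(error: { code?: unknown; message?: string }): boolean {
  if (typeof error.code === "number" && NON_RETRYABLE_GRPC_CODES.has(error.code)) {
    return false;
  }
  const message = error.message ?? "";
  return !NON_RETRYABLE_PATTERNS.some((pattern) => pattern.test(message));
}
```

Note that many clients treat quota errors as retryable with backoff; this sketch follows the package's stated classification instead.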

Type Coercion for Field Extraction

Fields extracted via document.extract_fields are coerced to the types specified in the schema:

| Schema Type | Conversion |
|-------------|------------|
| string | Pass-through |
| number | parseFloat() with fallback to 0 |
| boolean | Matches "true" / "yes" (case-insensitive) |
| date | Parsed to ISO 8601 string |
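The coercion rules in the table can be expressed as a single switch over the schema type. This is a hypothetical coerceField re-implementation, not the package's source. Two behaviors are assumptions not stated above: boolean matching trims whitespace and requires the whole value to be "true" or "yes", and an unparseable date is returned unchanged.

```typescript
type FieldType = "string" | "number" | "date" | "boolean";
type FieldValue = string | number | boolean;

// Hypothetical re-implementation of the coercion table above.
function coerceField(raw: string, type: FieldType): FieldValue {
  switch (type) {
    case "string":
      return raw; // pass-through
    case "number": {
      const n = parseFloat(raw);
      return Number.isNaN(n) ? 0 : n; // fallback to 0
    }
    case "boolean":
      // Assumption: whole-value match after trimming, case-insensitive.
      return /^(true|yes)$/i.test(raw.trim());
    case "date": {
      const parsed = new Date(raw);
      // Assumption: leave unparseable dates as the raw string.
      return Number.isNaN(parsed.getTime()) ? raw : parsed.toISOString();
    }
    default: {
      const unreachable: never = type;
      throw new Error(`Unknown field type: ${String(unreachable)}`);
    }
  }
}
```

One consequence of plain parseFloat() is worth knowing when designing schemas: a value with thousands separators such as "1,234.50" parses as 1, because parsing stops at the comma.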

Cost Estimation

| Operation | Cost |
|-----------|------|
| document.ocr | $0.001 / page |
| document.extract_tables | $0.01 / page |
| document.extract_fields | $0.01 / page |
| image.describe | $0.0025 / image |

Costs are fixed per-page or per-image rates read from pricing.json. Gemini image descriptions are priced per image; the provider does not meter tokens.
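Because the rates are flat, an estimate is just the rate multiplied by the number of pages or images. The sketch below reproduces that arithmetic with a hypothetical estimateUsd helper; it copies the rates from the table rather than reading pricing.json, and it is not the provider's estimateCost implementation.

```typescript
// Per-unit USD rates copied from the Cost Estimation table above.
const RATES_USD: Record<string, number> = {
  "document.ocr": 0.001,           // per page
  "document.extract_tables": 0.01, // per page
  "document.extract_fields": 0.01, // per page
  "image.describe": 0.0025,        // per image
};

// Hypothetical helper: fixed rate multiplied by page or image count.
function estimateUsd(operation: string, units: number): number {
  const rate = RATES_USD[operation];
  if (rate === undefined) throw new Error(`No rate for ${operation}`);
  if (!Number.isInteger(units) || units < 0) {
    throw new Error("units must be a non-negative integer");
  }
  return rate * units;
}

// A 40-page PDF run through both table extraction and OCR.
const total = estimateUsd("document.extract_tables", 40) + estimateUsd("document.ocr", 40);
console.log(total.toFixed(2)); // prints 0.44 (0.40 + 0.04)
```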

Cache Configuration

The provider exposes a static cacheConfig that splits request parameters into deterministic ones, which form the cache key, and non-deterministic ones, which are ignored when computing it.

Deterministic parameters: prompt, model, system, generationConfig, temperature, top_p, top_k, max_output_tokens, seed, document_data (SHA-256 hashed), processor_id, mime_type

Non-deterministic parameters: request_id

Document bytes are hashed with SHA-256 during normalization, so cache keys for Document AI operations stay compact. Gemini operations also treat seed as a deterministic parameter: supplying a fixed seed makes repeated requests map to the same cache key, which enables cache hits and reproducible outputs.
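The normalization described above can be sketched as follows: keep only the deterministic parameters, replace document_data with its SHA-256 digest, and hash the result. buildCacheKey is a hypothetical helper written for illustration; the key format the package actually produces is not documented here.

```typescript
import { createHash } from "node:crypto";

// Deterministic parameters from the list above; anything else
// (for example request_id) is excluded from the key.
const DETERMINISTIC = [
  "prompt", "model", "system", "generationConfig", "temperature", "top_p",
  "top_k", "max_output_tokens", "seed", "document_data", "processor_id", "mime_type",
];

const sha256 = (data: string | Uint8Array): string =>
  createHash("sha256").update(data).digest("hex");

// Hypothetical cache-key builder, not the package's implementation.
function buildCacheKey(operation: string, params: Record<string, unknown>): string {
  const normalized: Record<string, unknown> = {};
  // Sorted names give a stable top-level key order.
  for (const name of [...DETERMINISTIC].sort()) {
    const value = params[name];
    if (value === undefined) continue;
    // Replace raw document bytes (Buffer is a Uint8Array) with their digest.
    normalized[name] =
      name === "document_data" && value instanceof Uint8Array ? sha256(value) : value;
  }
  return sha256(JSON.stringify([operation, normalized]));
}
```

This sketch sorts only top-level names; nested objects such as generationConfig are serialized in their own key order, so a production version would need a canonical JSON encoding.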

Health Check

The health check calls getProcessor on the configured Document AI processor to validate GCP credentials and connectivity. Returns { healthy: true, latency: <ms> } on success, or { healthy: false, error: "<message>" } on failure. If no documentAiProcessorId is configured, the check still passes if client construction succeeds.
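The result shapes above suggest a small timing wrapper around a probe call. In this hypothetical checkHealth sketch, probe stands in for the Document AI getProcessor request; the function name and the exact latency measurement are assumptions, not the package's code.

```typescript
// Result shapes described in the paragraph above.
type ProviderHealth =
  | { healthy: true; latency: number }
  | { healthy: false; error: string };

// Hypothetical wrapper: `probe` stands in for the getProcessor call.
async function checkHealth(probe: () => Promise<unknown>): Promise<ProviderHealth> {
  const started = Date.now();
  try {
    await probe();
    return { healthy: true, latency: Date.now() - started };
  } catch (err) {
    return { healthy: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```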

Environment Variables

| Variable | Description |
|----------|-------------|
| GOOGLE_PROJECT_ID | GCP project ID |
| GOOGLE_LOCATION | GCP location for Document AI / Vertex AI |
| GOOGLE_DOCUMENT_AI_PROCESSOR_ID | Document AI processor ID |
| GOOGLE_GEMINI_MODEL | Gemini model override |
| GOOGLE_APPLICATION_CREDENTIALS | Service account JSON path |
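If you prefer to build the configuration explicitly, the variables map one-to-one onto GoogleProviderConfig fields. The sketch below is a hypothetical configFromEnv helper using a local copy of the documented interface; whether the provider also reads these variables on its own is not stated above, so treat the mapping as an assumption. Call it as configFromEnv(process.env).

```typescript
// Local copy of the documented GoogleProviderConfig shape.
interface GoogleProviderConfig {
  projectId: string;
  location?: string;
  documentAiProcessorId?: string;
  geminiModel?: string;
  keyFile?: string;
  timeout?: number;
}

type Env = Record<string, string | undefined>;

// Hypothetical helper: map the environment variables above onto the config.
// Unset variables are omitted so the provider's own defaults still apply.
function configFromEnv(env: Env): GoogleProviderConfig {
  const projectId = env.GOOGLE_PROJECT_ID;
  if (!projectId) throw new Error("GOOGLE_PROJECT_ID is required");
  const config: GoogleProviderConfig = { projectId };
  if (env.GOOGLE_LOCATION) config.location = env.GOOGLE_LOCATION;
  if (env.GOOGLE_DOCUMENT_AI_PROCESSOR_ID) {
    config.documentAiProcessorId = env.GOOGLE_DOCUMENT_AI_PROCESSOR_ID;
  }
  if (env.GOOGLE_GEMINI_MODEL) config.geminiModel = env.GOOGLE_GEMINI_MODEL;
  if (env.GOOGLE_APPLICATION_CREDENTIALS) config.keyFile = env.GOOGLE_APPLICATION_CREDENTIALS;
  return config;
}
```

Google's Application Default Credentials also read GOOGLE_APPLICATION_CREDENTIALS directly, so mapping it to keyFile is mainly useful when you want the path to be explicit in your configuration.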

License

MIT