@intentsolutionsio/jeremy-vertex-terraform

v2.0.0

Terraform configurations for Vertex AI platform and Agent Engine

Jeremy Vertex Terraform

🎯 VERTEX AI MODEL GARDEN & AI INFRASTRUCTURE

Terraform infrastructure specialist for broader Vertex AI services including Model Garden, Gemini endpoints, vector search, ML pipelines, and enterprise AI infrastructure (NOT Agent Engine - use jeremy-adk-terraform for that).

⚠️ Important: What This Plugin Is For

✅ THIS PLUGIN IS FOR:

  • Vertex AI Model Garden deployments (foundation models)
  • Gemini API endpoints (gemini-pro, gemini-2.0-flash)
  • Vector Search infrastructure (ScaNN-based similarity search)
  • Vertex AI Pipelines (Kubeflow Pipelines for ML workflows)
  • Endpoint deployment (model serving infrastructure)
  • Batch prediction jobs
  • ML model training infrastructure
  • Feature Store for ML feature management

❌ THIS PLUGIN IS NOT FOR:

  • Agent Engine infrastructure (use jeremy-adk-terraform for ADK agents)
  • Cloud Run deployments (use jeremy-genkit-terraform)
  • Self-managed ML infrastructure

Overview

This plugin provides Terraform modules for deploying Vertex AI services including Model Garden foundation models, Gemini API endpoints, vector search for RAG applications, ML pipelines, and production model serving infrastructure.

Key Infrastructure Components:

  • google_vertex_ai_endpoint for model serving
  • google_vertex_ai_deployed_model for model versions
  • google_vertex_ai_index for vector search
  • google_vertex_ai_index_endpoint for similarity search
  • google_vertex_ai_featurestore for feature management
  • Cloud Storage for model artifacts
  • BigQuery for ML model training
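
The module snippets in this README also reference a few shared resources (a VPC network, CMEK keys, and the project data source) that are assumed to exist rather than defined here. A minimal sketch of those prerequisites, with illustrative names:

# prerequisites.tf (assumed shared resources, not provided by the plugin)

data "google_project" "project" {
  project_id = var.project_id
}

# VPC used for private endpoints
resource "google_compute_network" "vertex_vpc" {
  name                    = "vertex-vpc"
  auto_create_subnetworks = false
}

# CMEK keys referenced by the modules below
resource "google_kms_key_ring" "vertex" {
  name     = "vertex-ai-keyring"
  location = var.region
}

resource "google_kms_crypto_key" "model_key" {
  name     = "model-key"
  key_ring = google_kms_key_ring.vertex.id
}

resource "google_kms_crypto_key" "feature_key" {
  name     = "feature-key"
  key_ring = google_kms_key_ring.vertex.id
}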

Installation

/plugin install jeremy-vertex-terraform@claude-code-plugins-plus

Prerequisites & Dependencies

Required Tools

1. Terraform:

# Install Terraform 1.5+
wget https://releases.hashicorp.com/terraform/1.6.0/terraform_1.6.0_linux_amd64.zip
unzip terraform_1.6.0_linux_amd64.zip
sudo mv terraform /usr/local/bin/

# Verify
terraform version  # Should show 1.5.0+

2. gcloud CLI:

# Install gcloud
curl https://sdk.cloud.google.com | bash
exec -l $SHELL

# Update to latest
gcloud components update

# Authenticate
gcloud auth application-default login

3. Terraform Google Provider:

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
    google-beta = {
      source  = "hashicorp/google-beta"
      version = "~> 5.0"
    }
  }
}

Required Google Cloud APIs

# Enable all required APIs
gcloud services enable \
    aiplatform.googleapis.com \
    compute.googleapis.com \
    storage.googleapis.com \
    bigquery.googleapis.com \
    logging.googleapis.com \
    monitoring.googleapis.com \
    cloudtrace.googleapis.com \
    --project=YOUR_PROJECT_ID

Required IAM Permissions

# Service account for Terraform needs:
- roles/aiplatform.admin              # Deploy Vertex AI resources
- roles/storage.admin                 # Manage model artifacts
- roles/bigquery.admin                # ML training datasets
- roles/compute.networkAdmin          # VPC for private endpoints
- roles/monitoring.admin              # Observability
- roles/iam.serviceAccountAdmin       # Service account management
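
These roles can also be granted with Terraform itself. A sketch, assuming a dedicated deployer service account (the account_id is an assumption, not part of the plugin):

# terraform_sa.tf

resource "google_service_account" "terraform_sa" {
  account_id   = "terraform-vertex-deployer"
  display_name = "Terraform Vertex AI Deployer"
}

resource "google_project_iam_member" "terraform_permissions" {
  for_each = toset([
    "roles/aiplatform.admin",
    "roles/storage.admin",
    "roles/bigquery.admin",
    "roles/compute.networkAdmin",
    "roles/monitoring.admin",
    "roles/iam.serviceAccountAdmin"
  ])

  project = var.project_id
  role    = each.value
  member  = "serviceAccount:${google_service_account.terraform_sa.email}"
}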

Features

✅ Model Garden Deployment: Foundation models (Gemini, PaLM, Claude, Llama)
✅ Gemini API Endpoints: Dedicated endpoints with rate limiting
✅ Vector Search: ScaNN-based similarity search for RAG
✅ ML Pipelines: Kubeflow Pipelines for training workflows
✅ Model Serving: Production endpoints with auto-scaling
✅ Batch Predictions: Large-scale inference jobs
✅ Feature Store: Centralized feature management
✅ Monitoring: Model performance tracking and drift detection

Quick Start

Natural Language Activation

"Create Terraform for Gemini endpoint deployment"
"Deploy vector search for RAG application"
"Set up Vertex AI Pipeline for model training"
"Create Feature Store for ML features"
"Deploy custom model to Vertex AI endpoint"

Terraform Module Structure

1. Gemini API Endpoint

# gemini_endpoint.tf

# Gemini 2.0 Flash endpoint
resource "google_vertex_ai_endpoint" "gemini_endpoint" {
  display_name = "gemini-2-0-flash-endpoint"
  location     = var.region
  project      = var.project_id

  description = "Production Gemini 2.0 Flash endpoint"

  # Network configuration
  network = google_compute_network.vertex_vpc.id

  # Encryption
  encryption_spec {
    kms_key_name = google_kms_crypto_key.model_key.id
  }
}

# Deploy Gemini model
resource "google_vertex_ai_deployed_model" "gemini_flash" {
  endpoint = google_vertex_ai_endpoint.gemini_endpoint.id

  model = "publishers/google/models/gemini-2.0-flash-001"

  display_name = "gemini-2-0-flash-001"

  dedicated_resources {
    machine_spec {
      machine_type = "n1-standard-4"
    }

    min_replica_count = var.min_replicas
    max_replica_count = var.max_replicas

    autoscaling_metric_specs {
      metric_name = "aiplatform.googleapis.com/prediction/online/accelerator/duty_cycle"
      target      = 70
    }
  }

  # Traffic split
  traffic_split = {
    "0" = 100
  }
}

# Service account for endpoint
resource "google_service_account" "vertex_sa" {
  account_id   = "vertex-ai-endpoint-sa"
  display_name = "Vertex AI Endpoint Service Account"
}

resource "google_project_iam_member" "vertex_permissions" {
  for_each = toset([
    "roles/aiplatform.user",
    "roles/storage.objectViewer",
    "roles/logging.logWriter"
  ])

  project = var.project_id
  role    = each.value
  member  = "serviceAccount:${google_service_account.vertex_sa.email}"
}
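
To make the endpoint usable from other tooling or CI, a small outputs block can expose its identifiers. This is an optional addition, not part of the module above:

# outputs.tf

output "gemini_endpoint_id" {
  description = "Fully qualified resource ID of the Gemini endpoint"
  value       = google_vertex_ai_endpoint.gemini_endpoint.id
}

output "gemini_endpoint_name" {
  description = "Endpoint name, usable with gcloud ai endpoints commands"
  value       = google_vertex_ai_endpoint.gemini_endpoint.name
}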

2. Vector Search Infrastructure

# vector_search.tf

# Vector index for embeddings
resource "google_vertex_ai_index" "embeddings_index" {
  display_name = "${var.app_name}-embeddings-index"
  region       = var.region
  project      = var.project_id

  description = "Vector search index for RAG application"

  metadata {
    contents_delta_uri = google_storage_bucket.embeddings.url

    config {
      dimensions                  = var.embedding_dimensions  # 768 for text-embedding-gecko
      approximate_neighbors_count = 150
      distance_measure_type       = "DOT_PRODUCT_DISTANCE"

      algorithm_config {
        tree_ah_config {
          leaf_node_embedding_count    = 1000
          leaf_nodes_to_search_percent = 7
        }
      }

      shard_size = "SHARD_SIZE_MEDIUM"
    }
  }

  index_update_method = "STREAM_UPDATE"
}

# Index endpoint for queries
resource "google_vertex_ai_index_endpoint" "embeddings_endpoint" {
  display_name = "${var.app_name}-embeddings-endpoint"
  region       = var.region
  project      = var.project_id

  description = "Vector search endpoint"

  # Private VPC
  network = "projects/${data.google_project.project.number}/global/networks/${google_compute_network.vertex_vpc.name}"

  public_endpoint_enabled = false
}

# Deploy index to endpoint
resource "google_vertex_ai_index_endpoint_deployed_index" "deployed" {
  index_endpoint = google_vertex_ai_index_endpoint.embeddings_endpoint.id
  index          = google_vertex_ai_index.embeddings_index.id

  deployed_index_id = "deployed_embeddings_index"
  display_name      = "Deployed Embeddings Index"

  dedicated_resources {
    machine_spec {
      machine_type = "n1-standard-16"
    }

    min_replica_count = 2
    max_replica_count = 10

    autoscaling_metric_specs {
      metric_name = "aiplatform.googleapis.com/prediction/online/cpu/utilization"
      target      = 70
    }
  }

  enable_access_logging = true
}

# Storage bucket for embeddings
resource "google_storage_bucket" "embeddings" {
  name     = "${var.project_id}-${var.app_name}-embeddings"
  location = var.region

  uniform_bucket_level_access = true

  versioning {
    enabled = true
  }

  lifecycle_rule {
    condition {
      num_newer_versions = 3
    }
    action {
      type = "Delete"
    }
  }
}
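
Because public_endpoint_enabled is false, queries reach the index endpoint over private services access on the VPC. A hedged sketch of the peering that setup assumes (the range name and size are illustrative):

# private_service_access.tf

resource "google_compute_global_address" "vertex_peering_range" {
  name          = "vertex-ai-peering-range"
  purpose       = "VPC_PEERING"
  address_type  = "INTERNAL"
  prefix_length = 16
  network       = google_compute_network.vertex_vpc.id
}

resource "google_service_networking_connection" "vertex_peering" {
  network                 = google_compute_network.vertex_vpc.id
  service                 = "servicenetworking.googleapis.com"
  reserved_peering_ranges = [google_compute_global_address.vertex_peering_range.name]
}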

3. Custom Model Deployment

# custom_model.tf

# Upload model to Cloud Storage
resource "google_storage_bucket" "model_artifacts" {
  name     = "${var.project_id}-ml-models"
  location = var.region

  uniform_bucket_level_access = true
  versioning {
    enabled = true
  }
}

resource "google_storage_bucket_object" "model_artifact" {
  name   = "models/${var.model_name}/model.pkl"
  bucket = google_storage_bucket.model_artifacts.name
  source = var.model_path
}

# Register model
resource "google_vertex_ai_model" "custom_model" {
  display_name = var.model_name
  region       = var.region
  project      = var.project_id

  description = "Custom ML model"

  version_aliases = ["production"]

  # Model artifact location
  artifact_uri = "gs://${google_storage_bucket.model_artifacts.name}/models/${var.model_name}/"

  # Container spec for serving
  container_spec {
    image_uri = "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest"

    env {
      name  = "MODEL_NAME"
      value = var.model_name
    }

    ports {
      container_port = 8080
    }

    predict_route = "/predict"
    health_route  = "/health"
  }

  # Encryption
  encryption_spec {
    kms_key_name = google_kms_crypto_key.model_key.id
  }
}

# Create serving endpoint
resource "google_vertex_ai_endpoint" "model_endpoint" {
  display_name = "${var.model_name}-endpoint"
  location     = var.region
  project      = var.project_id

  network = google_compute_network.vertex_vpc.id
}

# Deploy model to endpoint
resource "google_vertex_ai_deployed_model" "deployed" {
  endpoint = google_vertex_ai_endpoint.model_endpoint.id
  model    = google_vertex_ai_model.custom_model.id

  display_name = "${var.model_name}-v1"

  dedicated_resources {
    machine_spec {
      machine_type = "n1-standard-4"

      accelerator_type  = "NVIDIA_TESLA_T4"
      accelerator_count = 1
    }

    min_replica_count = 1
    max_replica_count = 5

    autoscaling_metric_specs {
      metric_name = "aiplatform.googleapis.com/prediction/online/cpu/utilization"
      target      = 60
    }
  }

  traffic_split = {
    "0" = 100
  }
}

4. Vertex AI Pipelines

# pipelines.tf

# Pipeline storage bucket
resource "google_storage_bucket" "pipeline_root" {
  name     = "${var.project_id}-pipeline-root"
  location = var.region

  uniform_bucket_level_access = true

  lifecycle_rule {
    condition {
      age = 30
    }
    action {
      type = "Delete"
    }
  }
}

# Artifact Registry for pipeline containers
resource "google_artifact_registry_repository" "pipeline_containers" {
  repository_id = "vertex-pipelines"
  location      = var.region
  format        = "DOCKER"

  description = "Container images for Vertex AI Pipelines"
}

# Service account for pipelines
resource "google_service_account" "pipeline_sa" {
  account_id   = "vertex-pipeline-runner"
  display_name = "Vertex AI Pipeline Runner"
}

resource "google_project_iam_member" "pipeline_permissions" {
  for_each = toset([
    "roles/aiplatform.user",
    "roles/storage.admin",
    "roles/bigquery.dataEditor",
    "roles/logging.logWriter"
  ])

  project = var.project_id
  role    = each.value
  member  = "serviceAccount:${google_service_account.pipeline_sa.email}"
}

# Example: Training pipeline trigger
resource "google_cloudfunctions2_function" "pipeline_trigger" {
  name     = "trigger-training-pipeline"
  location = var.region

  build_config {
    runtime     = "python312"
    entry_point = "trigger_pipeline"

    source {
      storage_source {
        bucket = google_storage_bucket.pipeline_root.name
        object = "functions/trigger.zip"
      }
    }
  }

  service_config {
    available_memory   = "256M"
    timeout_seconds    = 60
    service_account_email = google_service_account.pipeline_sa.email

    environment_variables = {
      PROJECT_ID    = var.project_id
      PIPELINE_ROOT = "gs://${google_storage_bucket.pipeline_root.name}"
      REGION        = var.region
    }
  }
}
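
The function above only defines the trigger; one common pattern (an assumption, not required by the plugin) is to invoke it on a schedule with Cloud Scheduler and an OIDC token:

# scheduler.tf

resource "google_cloud_scheduler_job" "nightly_training" {
  name      = "nightly-training-pipeline"
  region    = var.region
  schedule  = "0 2 * * *"  # 02:00 UTC daily
  time_zone = "Etc/UTC"

  http_target {
    http_method = "POST"
    uri         = google_cloudfunctions2_function.pipeline_trigger.url

    oidc_token {
      service_account_email = google_service_account.pipeline_sa.email
    }
  }
}

Note that the pipeline service account also needs roles/run.invoker on the function's underlying Cloud Run service for the OIDC call to succeed.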

5. Feature Store

# feature_store.tf

resource "google_vertex_ai_featurestore" "main" {
  name   = "${var.app_name}-featurestore"
  region = var.region

  online_serving_config {
    fixed_node_count = 1
  }

  encryption_spec {
    kms_key_name = google_kms_crypto_key.feature_key.id
  }

  force_destroy = false
}

# Entity type (e.g., users, items)
resource "google_vertex_ai_featurestore_entitytype" "users" {
  name         = "users"
  featurestore = google_vertex_ai_featurestore.main.id

  monitoring_config {
    snapshot_analysis {
      disabled = false
    }
  }
}

# Features
resource "google_vertex_ai_featurestore_entitytype_feature" "user_age" {
  name       = "age"
  entitytype = google_vertex_ai_featurestore_entitytype.users.id

  value_type = "INT64"

  description = "User age in years"
}

resource "google_vertex_ai_featurestore_entitytype_feature" "user_ltv" {
  name       = "lifetime_value"
  entitytype = google_vertex_ai_featurestore_entitytype.users.id

  value_type = "DOUBLE"

  description = "User lifetime value"
}

6. Batch Prediction

# batch_prediction.tf

# Batch prediction job
resource "google_vertex_ai_batch_prediction_job" "batch_inference" {
  display_name = "${var.model_name}-batch-prediction"
  location     = var.region

  model = google_vertex_ai_model.custom_model.id

  input_config {
    instances_format = "jsonl"

    gcs_source {
      uris = ["gs://${google_storage_bucket.model_artifacts.name}/batch-input/*.jsonl"]
    }
  }

  output_config {
    predictions_format = "jsonl"

    gcs_destination {
      output_uri_prefix = "gs://${google_storage_bucket.model_artifacts.name}/batch-output/"
    }
  }

  dedicated_resources {
    machine_spec {
      machine_type      = "n1-standard-4"
      accelerator_type  = "NVIDIA_TESLA_T4"
      accelerator_count = 1
    }

    starting_replica_count = 1
    max_replica_count      = 10
  }

  service_account = google_service_account.vertex_sa.email
}
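
The job reads JSONL instances from the batch-input/ prefix. As a small illustration (the field names are placeholders matching the prediction test later in this README), an input object could be seeded like this:

# batch_input.tf

resource "google_storage_bucket_object" "sample_batch_input" {
  name    = "batch-input/sample.jsonl"
  bucket  = google_storage_bucket.model_artifacts.name
  content = "{\"feature1\": 1.0, \"feature2\": \"value\"}\n"
}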

7. Monitoring & Observability

# monitoring.tf

# Dashboard for model endpoints
resource "google_monitoring_dashboard" "vertex_dashboard" {
  dashboard_json = jsonencode({
    displayName = "${var.app_name} Vertex AI Dashboard"

    mosaicLayout = {
      columns = 12

      tiles = [
        # Prediction requests
        {
          width  = 6
          height = 4
          widget = {
            title = "Prediction Requests"
            xyChart = {
              dataSets = [{
                timeSeriesQuery = {
                  timeSeriesFilter = {
                    filter = "metric.type=\"aiplatform.googleapis.com/prediction/online/prediction_count\" resource.type=\"aiplatform.googleapis.com/Endpoint\""
                  }
                }
              }]
            }
          }
        },

        # Prediction latency
        {
          xPos   = 6
          width  = 6
          height = 4
          widget = {
            title = "Prediction Latency (p95)"
            xyChart = {
              dataSets = [{
                timeSeriesQuery = {
                  timeSeriesFilter = {
                    filter = "metric.type=\"aiplatform.googleapis.com/prediction/online/response_latencies\" resource.type=\"aiplatform.googleapis.com/Endpoint\""

                    aggregation = {
                      alignmentPeriod     = "60s"
                      perSeriesAligner    = "ALIGN_DELTA"
                      crossSeriesReducer  = "REDUCE_PERCENTILE_95"
                    }
                  }
                }
              }]
            }
          }
        },

        # Error rate
        {
          yPos   = 4
          width  = 6
          height = 4
          widget = {
            title = "Error Rate"
            xyChart = {
              dataSets = [{
                timeSeriesQuery = {
                  timeSeriesFilter = {
                    filter = "metric.type=\"aiplatform.googleapis.com/prediction/online/error_count\" resource.type=\"aiplatform.googleapis.com/Endpoint\""
                  }
                }
              }]
            }
          }
        },

        # Replica utilization
        {
          xPos   = 6
          yPos   = 4
          width  = 6
          height = 4
          widget = {
            title = "Replica Utilization"
            xyChart = {
              dataSets = [{
                timeSeriesQuery = {
                  timeSeriesFilter = {
                    filter = "metric.type=\"aiplatform.googleapis.com/prediction/online/replicas\" resource.type=\"aiplatform.googleapis.com/Endpoint\""
                  }
                }
              }]
            }
          }
        }
      ]
    }
  })
}

# Alert: High latency
resource "google_monitoring_alert_policy" "high_latency" {
  display_name = "${var.app_name} - High Prediction Latency"
  combiner     = "OR"

  conditions {
    display_name = "P95 latency > 5s"

    condition_threshold {
      filter          = "metric.type=\"aiplatform.googleapis.com/prediction/online/response_latencies\" resource.type=\"aiplatform.googleapis.com/Endpoint\""
      duration        = "300s"
      comparison      = "COMPARISON_GT"
      threshold_value = 5000

      aggregations {
        alignment_period     = "60s"
        per_series_aligner   = "ALIGN_DELTA"
        cross_series_reducer = "REDUCE_PERCENTILE_95"
      }
    }
  }

  notification_channels = [google_monitoring_notification_channel.email.id]
}

# Alert: High error rate
resource "google_monitoring_alert_policy" "high_errors" {
  display_name = "${var.app_name} - High Error Rate"
  combiner     = "OR"

  conditions {
    display_name = "Error rate > 5%"

    condition_threshold {
      filter          = "metric.type=\"aiplatform.googleapis.com/prediction/online/error_count\" resource.type=\"aiplatform.googleapis.com/Endpoint\""
      duration        = "300s"
      comparison      = "COMPARISON_GT"
      threshold_value = 0.05

      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_RATE"
      }
    }
  }

  notification_channels = [google_monitoring_notification_channel.email.id]
}
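
Both alert policies reference google_monitoring_notification_channel.email, which is not defined in the snippets above. A minimal definition using the alert_email variable might look like this:

# notification_channel.tf

resource "google_monitoring_notification_channel" "email" {
  display_name = "${var.app_name} Alerts"
  type         = "email"

  labels = {
    email_address = var.alert_email
  }
}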

Variables

# variables.tf

variable "project_id" {
  description = "Google Cloud project ID"
  type        = string
}

variable "region" {
  description = "Region for Vertex AI resources"
  type        = string
  default     = "us-central1"
}

variable "app_name" {
  description = "Application name prefix"
  type        = string
}

# Endpoint configuration
variable "min_replicas" {
  description = "Minimum number of replicas"
  type        = number
  default     = 1
}

variable "max_replicas" {
  description = "Maximum number of replicas"
  type        = number
  default     = 10
}

# Model configuration
variable "model_name" {
  description = "Custom model name"
  type        = string
}

variable "model_path" {
  description = "Local path to model artifact"
  type        = string
}

# Vector search
variable "embedding_dimensions" {
  description = "Dimensions for embeddings (768 for gecko, 1536 for OpenAI)"
  type        = number
  default     = 768
}

# Alerting
variable "alert_email" {
  description = "Email for monitoring alerts"
  type        = string
}
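
Instead of passing -var flags as in the workflow below, the same values can be kept in a terraform.tfvars file. Sample placeholder values:

# terraform.tfvars

project_id           = "my-project"
region               = "us-central1"
app_name             = "my-app"
model_name           = "custom-model"
model_path           = "./model.pkl"
embedding_dimensions = 768
alert_email          = "alerts@example.com"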

Deployment Workflow

1. Initialize Terraform

terraform init

2. Plan Infrastructure

terraform plan \
  -var="project_id=my-project" \
  -var="app_name=my-app" \
  -var="model_name=custom-model" \
  -var="model_path=./model.pkl" \
  -var="[email protected]"

3. Apply Configuration

terraform apply \
  -var="project_id=my-project" \
  -var="app_name=my-app" \
  -var="model_name=custom-model" \
  -var="model_path=./model.pkl" \
  -var="[email protected]"

4. Verify Deployment

# Check endpoints
gcloud ai endpoints list --region=us-central1

# Check deployed models
gcloud ai models list --region=us-central1

# Check vector search indexes
gcloud ai indexes list --region=us-central1

# Test prediction
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT/locations/us-central1/endpoints/ENDPOINT_ID:predict \
  -d '{"instances": [{"feature1": 1.0, "feature2": "value"}]}'

Integration with Other Plugins

jeremy-adk-terraform

  • jeremy-adk-terraform: Agent Engine (ADK agents)
  • jeremy-vertex-terraform: Model serving & ML infrastructure (this plugin)

jeremy-vertex-engine

  • Terraform provisions endpoints → Engine inspector validates (for Agent Engine only)

jeremy-vertex-validator

  • Terraform provisions infrastructure → Validator checks production readiness

Use Cases

Gemini API Deployment

"Create Terraform for Gemini 2.0 Flash endpoint"
"Deploy Gemini Pro with auto-scaling"

Vector Search for RAG

"Set up vector search infrastructure for RAG application"
"Deploy embeddings index with 768 dimensions"

Custom Model Serving

"Deploy custom scikit-learn model to Vertex AI"
"Create endpoint for TensorFlow model with GPU"

Batch Predictions

"Set up batch prediction job for large dataset"
"Deploy batch inference with T4 GPUs"

Feature Store

"Create Feature Store for user features"
"Deploy feature serving for real-time predictions"

Best Practices

✅ Private Endpoints: Use VPC for production endpoints
✅ Auto-scaling: Configure based on traffic patterns
✅ Monitoring: Deploy dashboards and alerts
✅ Encryption: Use CMEK for sensitive models
✅ Version Control: Tag model versions
✅ Cost Optimization: Use preemptible VMs for batch jobs
✅ Traffic Splitting: Blue/green deployments
✅ Model Registry: Organize models in Vertex AI Model Registry

Requirements

  • Terraform >= 1.5.0
  • Google Cloud Provider >= 5.0
  • Google Cloud Project with billing enabled
  • Appropriate IAM permissions
  • Model artifacts prepared
  • gcloud CLI

License

MIT

Support

  • Issues: https://github.com/jeremylongshore/claude-code-plugins/issues
  • Discussions: https://github.com/jeremylongshore/claude-code-plugins/discussions

Version

1.0.1 (2025) - Comprehensive Vertex AI infrastructure (Model Garden, Vector Search, Pipelines)