@shakudo/shakudo-platform-mcp

v1.1.3

Published

2 days ago

Model Context Protocol server for Shakudo Platform API with enhanced image builder tools


yiran-shakudo

mcp model-context-protocol shakudo hyperplane graphql ai ml data-platform

Hyperplane MCP Server

A Model Context Protocol (MCP) server that provides AI assistants with comprehensive access to the Hyperplane API for managing data science and machine learning workloads on Kubernetes.

Overview

This MCP server currently exposes 3 production-tested microservice management tools to AI assistants such as Claude. It focuses on core microservice operations; the remaining tools stay disabled until they have been tested:

✅ Microservice Management: Search, restart, and monitor microservices (PRODUCTION-TESTED)

🚧 Additional Categories (126+ tools) - Disabled until testing complete:

Pipeline Job Management, Interactive Computing, Distributed Computing, Container Images, Platform Services, Security & Access, Monitoring, Multi-Cluster, and more...

Installation

Prerequisites

Bun runtime (recommended) or Node.js 18 or higher

Install Dependencies

bun install

Build

bun run build

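By default, the server connects to the Hyperplane API at http://api-server.hyperplane-core.svc.cluster.local:80/graphql. The Claude Desktop configurations below pass the same endpoint through the HYPERPLANE_API_ENDPOINT environment variable.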
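
As a rough illustration of how an endpoint setting like this could be resolved, the sketch below reads HYPERPLANE_API_ENDPOINT from an environment map and falls back to the default URL quoted above. The function name `resolveEndpoint`, the fallback behaviour, and the validation rule are assumptions made for this example, not the server's confirmed implementation.

```typescript
// Default endpoint quoted in this README.
const DEFAULT_ENDPOINT =
  "http://api-server.hyperplane-core.svc.cluster.local:80/graphql";

// Hypothetical helper. Assumption: a missing or blank HYPERPLANE_API_ENDPOINT
// falls back to the default endpoint.
function resolveEndpoint(env: Record<string, string | undefined>): string {
  const candidate = env["HYPERPLANE_API_ENDPOINT"]?.trim();
  const endpoint = candidate ? candidate : DEFAULT_ENDPOINT;
  // Reject anything that is not an http(s) URL so typos surface early.
  if (!/^https?:\/\/\S+$/.test(endpoint)) {
    throw new Error(`Invalid HYPERPLANE_API_ENDPOINT: ${endpoint}`);
  }
  return endpoint;
}

// With no override, the default in-cluster address is used.
const endpointInUse = resolveEndpoint({});
```

In a real process the environment map would typically be `process.env`; a plain object is used here so the sketch stays self-contained.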

Usage

With Claude Desktop

Add to your Claude Desktop configuration file:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "shakudo-microservices": {
      "command": "bun",
      "args": ["run", "dev"],
      "cwd": "/absolute/path/to/shakudo-platform-mcp/mcp-server",
      "env": {
        "HYPERPLANE_API_ENDPOINT": "http://api-server.hyperplane-core.svc.cluster.local:80/graphql"
      }
    }
  }
}

Setup Steps:

Build the server:

cd /path/to/shakudo-platform-mcp/mcp-server
bun run build

Update the configuration path:
- Replace /absolute/path/to/shakudo-platform-mcp/mcp-server with your actual path
- Ensure the path is absolute, not relative

Alternative using built version:

{
  "mcpServers": {
    "shakudo-microservices": {
      "command": "node",
      "args": ["/absolute/path/to/shakudo-platform-mcp/mcp-server/build/index.js"],
      "env": {
        "HYPERPLANE_API_ENDPOINT": "http://api-server.hyperplane-core.svc.cluster.local:80/graphql"
      }
    }
  }
}

Restart Claude Desktop after saving the configuration
Verify connection: You should see "shakudo-microservices" in the MCP section of Claude Desktop
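
Because a relative path is an easy mistake in the setup steps above, the following sketch generates the "built version" entry programmatically and rejects non-absolute paths. The helper names `isAbsolutePath` and `buildServerEntry` are hypothetical and not part of this package; only the resulting JSON shape is taken from the configuration shown above.

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

// True for POSIX absolute paths ("/...") and Windows drive paths ("C:\..." or "C:/...").
function isAbsolutePath(p: string): boolean {
  return /^(\/|[A-Za-z]:[\\/])/.test(p);
}

// Builds the entry for the built server (node + build/index.js).
function buildServerEntry(serverDir: string, endpoint: string): McpServerEntry {
  if (!isAbsolutePath(serverDir)) {
    throw new Error(`Server path must be absolute, got: ${serverDir}`);
  }
  const dir = serverDir.replace(/[\\/]+$/, ""); // drop trailing separators
  return {
    command: "node",
    args: [`${dir}/build/index.js`],
    env: { HYPERPLANE_API_ENDPOINT: endpoint },
  };
}

const config = {
  mcpServers: {
    "shakudo-microservices": buildServerEntry(
      "/absolute/path/to/shakudo-platform-mcp/mcp-server",
      "http://api-server.hyperplane-core.svc.cluster.local:80/graphql",
    ),
  },
};

// Text that could be pasted into claude_desktop_config.json.
const configJson = JSON.stringify(config, null, 2);
```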

Available Tools After Setup:

searchMicroservice - Find microservices by name
restartService - Restart microservices/services
getPodEvents - Monitor logs and startup progress

With MCP Inspector

For development and testing:

bun run inspector

Direct Usage

bun run dev

Quick Start

Use the provided run script:

./run.sh

Available Tools

✅ Microservice Management (3 tools) - PRODUCTION-TESTED

searchMicroservice - Search for microservices by name (Key tool for microservice discovery) ✅ TESTED
restartService - Restart microservices/services directly ✅ TESTED
getPodEvents - Get pod events and logs for microservices ✅ TESTED
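
For readers wiring up their own client, the sketch below shows the general shape of an MCP `tools/call` JSON-RPC request for one of these tools. The method name and the `name`/`arguments` fields follow the Model Context Protocol; the argument key `searchTerm` is an assumption, since this README does not document the tools' input schemas.

```typescript
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextRequestId = 1;

// Wraps a tool name and its arguments in a JSON-RPC 2.0 request envelope.
function toolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// "searchTerm" is a hypothetical argument name; check the tool's schema
// (for example in the MCP Inspector) for the real one.
const searchRequest = toolCall("searchMicroservice", {
  searchTerm: "shakbot-service-v2",
});

// Serialized form as it would travel over the transport.
const wireMessage = JSON.stringify(searchRequest);
```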

🚧 Additional Tool Categories (Disabled Until Testing Complete)

Pipeline Job Management (9 untested tools):

createPipelineJob, cancelPipelineJob, triggerJobInstance, getJobStatistics, createScheduledJob, cancelScheduledJob, checkJobYamlEdited, getJobPodSpec, scaleService

Interactive Computing Sessions (7 untested tools):

createHyperHubSession, cancelSession, restartSessionProcess, getSessionStatistics, updateSessionGroup, getSessionPodSpec, checkUserServiceUrl

Distributed Computing Clusters (10 untested tools):

getDaskClusterCount, cancelUserDaskPods, getRayClusterCount, createDaskCluster, createRayCluster, deleteDaskCluster, deleteRayCluster, scaleDaskCluster, scaleRayCluster, cancelUserRayPods

Container Image Management (7 untested tools):

createImageBuilderJob, getImageBuilderStatistics, cancelImageBuilderJob, getImageBuilderJob, listImageBuilderJobs, deleteImageBuilderJob, retryImageBuilderJob

Platform Applications (12 untested tools):

installPlatformApp, getPlatformCatalogue, getPlatformAppStatistics, scaleStackComponentDown, scaleStackComponentToDefault, installStackComponent, getHelmAppVersion, getAllHelmAppVersions, uninstallPlatformApp, updatePlatformApp, countPinnedPlatformApps, restartPlatformApp

Security & Access Control (9 untested tools):

createHyperplaneSecret, createServiceAccount, deleteServiceAccount, toggleAirGapMode, checkAirGapMode, checkNamespaceAccess, getAuthorizationPolicies, updateServiceAccount, deactivateServiceAccount

Monitoring & Observability (8 untested tools):

getPodData, getPVCStatus, getPriorityClasses, getNamespaceEventLogs, getNamespaceServices, checkActiveTraffic, getResourceMetrics, getClusterHealth

Notifications & Alerting (4 untested tools):

sendEmailNotification, addNotification, createJobNotificationTarget, triggerAlert

Namespace Management (3 untested tools):

getNamespaces, scaleDownNamespaceResources, scaleNamespaceResourcesToDefault

Traffic Management (5 untested tools):

createTrafficSplitVsvc, updateTrafficSplitVsvc, deactivateTrafficSplitVsvc, getTrafficSplitVsvcs, countTrafficSplitVsvcs

Multi-Cluster Management (6 untested tools):

createSatelliteCluster, cancelSatellitePipelineJob, cancelScheduledJobOnSatelliteCluster, countJobsOnSatelliteCluster, getChildJobsOnSatelliteCluster, countSatelliteEnvironmentConfig

Environment Configuration (2 untested tools):

createEnvironmentConfig, countEnvironmentConfig

User Management (2 untested tools):

getOrCreateHyperplaneUser, updateHyperplaneUserGroup

Data Management (4 untested tools):

getDatalakeBucketName, getBytebaseSQLCode, createCloudSqlProxy, countCloudSqlProxy

Note: These 126+ additional tools will be gradually enabled as they are tested and verified to work correctly with the Hyperplane API.
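
One simple way to keep untested tools out of reach is an allowlist applied before tools are registered. The sketch below is only an illustration of that idea; the names `ENABLED_TOOLS`, `ToolDefinition`, and `filterEnabled` are hypothetical, and the README does not state how the server actually disables tools.

```typescript
// The three tools this README marks as production-tested.
const ENABLED_TOOLS: readonly string[] = [
  "searchMicroservice",
  "restartService",
  "getPodEvents",
];

interface ToolDefinition {
  name: string;
  description: string;
}

// Keeps only tools whose names appear on the allowlist.
function filterEnabled(all: ToolDefinition[]): ToolDefinition[] {
  return all.filter((tool) => ENABLED_TOOLS.indexOf(tool.name) !== -1);
}

// Small made-up catalogue mixing tested and untested tools.
const catalogue: ToolDefinition[] = [
  { name: "searchMicroservice", description: "Search for microservices by name" },
  { name: "createDaskCluster", description: "Untested" },
  { name: "getPodEvents", description: "Get pod events and logs" },
];

const exposed = filterEnabled(catalogue).map((tool) => tool.name);
```

Enabling a newly verified tool would then be a one-line change to the allowlist.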

Example Usage

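Here are some example interactions with the MCP server. Only the microservice example (Find and Restart Microservices) uses tools that are currently enabled; the other prompts depend on tool categories that are disabled until testing is complete.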

Create a Machine Learning Pipeline

Create a new ML training pipeline job named "customer-churn-model" that:
- Uses the pipeline YAML at "pipelines/ml/churn-model.yaml"
- Has 3 retry attempts
- Sends notifications to "ml-team-slack"
- Requires 4 CPU cores and 16GB RAM

Launch a Development Environment

Create a new Jupyter session called "data-exploration" using the "python-ml-gpu" environment with 2 CPUs, 8GB RAM, and 1 GPU for exploring the customer dataset.

Deploy a Dask Cluster

Create a Dask cluster named "data-processing-cluster" with 5 worker nodes, each having 2 CPUs and 4GB RAM, for parallel data processing.

Monitor System Health

Check the health status of the Kubernetes cluster and show me any pods in the "ml-platform" namespace that are having issues.

Find and Restart Microservices

Search for the "shakbot-service-v2" microservice and restart it.

Example workflow ✅ TESTED & WORKING:

Search: searchMicroservice with term "shakbot-service-v2" ✅
Restart: restartService with the returned microservice ID ✅
Monitor: getPodEvents with the microservice ID to track startup progress ✅
Alternative: scaleService with replicas 0 (stop) then 1 (start). Note that scaleService is listed among the untested, currently disabled tools.

Complete Microservice Restart & Monitoring Pattern

Real-world usage pattern verified with shakbot-service-v2:

# Step 1: Find the microservice
searchMicroservice("shakbot-service-v2")
# Returns: ID "62051482-0ae7-42cd-91b8-919224f8e4fb", status "in progress"

# Step 2: Restart the service
restartService("62051482-0ae7-42cd-91b8-919224f8e4fb") 
# Returns: "Succeeded"

# Step 3: Monitor startup progress with periodic log checks
getPodEvents("62051482-0ae7-42cd-91b8-919224f8e4fb")

Typical Startup Sequence Observed:

Pod Creation (0-10s): Kubernetes containers created and started
Environment Setup (10-20s): Package manager installation, config files
Dependencies (20-40s): npm/pnpm package installation (475 packages)
Database Init (40-50s): Table creation, migrations
Application Start (50-60s): Next.js server ready, service operational

Monitoring Tips:

Check logs immediately after restart to confirm pod creation
Monitor every 10-30 seconds during dependency installation
Look for "Ready in Xms" message for full startup confirmation
Total startup time typically 45-90 seconds for full services
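
The restart-and-monitor pattern above can be sketched as a small polling loop. Everything about the transport is assumed here: `CallTool` is a hypothetical function a client would supply, the argument key `id` is a guess at the tool schema, and the interval and timeout simply reuse the 10-30 second and 45-90 second figures from the tips.

```typescript
// Hypothetical signature: the caller injects the real MCP client call.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<string>;

interface RestartOptions {
  pollIntervalMs: number; // tips suggest checking every 10-30 seconds
  timeoutMs: number; // full startup is reported as 45-90 seconds
  sleep: (ms: number) => Promise<void>;
}

// Matches the "Ready in Xms" line mentioned in the monitoring tips.
const READY_PATTERN = /Ready in \d+(\.\d+)?\s*m?s/;

// Restarts a microservice, then polls its pod events until the ready line
// appears (returns true) or the timeout elapses (returns false).
async function restartAndWait(
  callTool: CallTool,
  microserviceId: string,
  opts: RestartOptions,
): Promise<boolean> {
  // The "id" argument key is an assumption, not the documented schema.
  await callTool("restartService", { id: microserviceId });
  let waitedMs = 0;
  while (waitedMs <= opts.timeoutMs) {
    const events = await callTool("getPodEvents", { id: microserviceId });
    if (READY_PATTERN.test(events)) return true;
    await opts.sleep(opts.pollIntervalMs);
    waitedMs += opts.pollIntervalMs;
  }
  return false;
}
```

Injecting `callTool` and `sleep` keeps the loop independent of any particular MCP client library and makes it easy to exercise with a fake.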

Development

Watch Mode

bun run watch

Testing with Inspector

bun run inspector

Error Handling

The MCP server returns raw GraphQL errors unmodified, so the full error details are available for debugging and troubleshooting.
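
As a sketch of what a client might do with such errors, the helper below summarises the `errors` array of a GraphQL response body. The field names (`message`, `locations`, `path`, `extensions`) follow the GraphQL specification; how this server wraps the raw error inside an MCP tool result is not documented here, so the assumption is that the client has the JSON response body as a string. The sample payload is made up.

```typescript
interface GraphQLError {
  message: string;
  locations?: { line: number; column: number }[];
  path?: (string | number)[];
  extensions?: Record<string, unknown>;
}

// Turns a raw GraphQL response body into one readable line per error.
function describeErrors(rawBody: string): string[] {
  const parsed = JSON.parse(rawBody) as { errors?: GraphQLError[] };
  return (parsed.errors ?? []).map((err) => {
    const where = err.path ? ` (path: ${err.path.join(".")})` : "";
    return `${err.message}${where}`;
  });
}

// Illustrative payload only; not captured from a real Hyperplane response.
const sampleBody = JSON.stringify({
  data: null,
  errors: [{ message: "Microservice not found", path: ["restartService"] }],
});

const errorSummary = describeErrors(sampleBody);
```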

Contributing

Fork the repository
Create a feature branch
Make your changes
Test with the MCP inspector
Submit a pull request

License

MIT License - see the LICENSE file for details.