gpuse-mcp-server

v0.3.37

Published

2 months ago

GPUse MCP server for Claude Code, Codex, Gemini, Cursor, Windsurf, and other MCP clients

0High
0Medium
0Low

gpuse

GPUse MCP Server

Server Overview

Name: GPUse MCP Server
Summary: GPUse MCP Server provides AI agents with tools to discover templates, launch GPU workloads, monitor instances, manage billing, and handle account verification.
Description: The GPUse MCP Server exposes the full lifecycle of GPUse’s GPU provisioning platform. Agents can browse on-demand GPU options, list and recommend serverless templates, launch managed or custom builds, provision on‑demand VMs, poll status, stream logs, surface checkout links, verify account codes, and shut down compute resources without human intervention.

Tools

recommend

AI-powered compute recommendation based on task description and requirements. The engine prefers on-demand GPUs when you’re bringing your own model/container, and will ask follow‑up questions (often VRAM needs or container image) when required. IMPORTANT: This tool returns a complete compute_plan that you can pass directly to start_compute. The compute_plan includes all required fields for both serverless and on-demand deployments: gpu_type, gpu_count, machine_type, region, zone, and container_image/container_config or template_id. If the backend only supports the legacy templates-only recommender, the MCP server will return a serverless template plan and note that full on‑demand recommendations require enabling POST /api/v1/recommend.

Parameters:

task_description: (Optional, string) Natural-language description of the GPU workload or AI task (e.g. "fine-tune Llama-3 8B on financial QA", "serve Whisper-large for streaming audio", "run inference on custom vision model"). Include model size, framework, or compute requirements for best template match.
budget_usd_per_hour: (Optional, number) Optional hard ceiling for hourly GPU cost in USD. Use decimals for cents (e.g. 1.50 for $1.50/hour). Options exceeding this budget are excluded from recommendations.
priority: (Optional, string) Optional optimization hint for compute selection. "speed" favors fastest GPUs and lowest latency, "cost" favors cheapest compute options, "balanced" (default) weighs both factors equally.
session_id: (Optional, string) Optional session ID from a previous needs_input response. Pass this to continue a multi-turn recommendation conversation.
answer: (Optional, string) Answer to the follow_up_question from a previous needs_input response. Required when continuing a session with session_id.

catalog

Browse the full GPUse compute catalog. This lists both:

On‑demand GPU accelerators (VRAM, valid counts, machine families) for raw GPU provisioning, where you bring your own image/model.
Serverless templates for quick managed starts.

Agents can launch any option immediately via the 5‑minute grace window, then surface the checkout link if the human wants more runtime. Outputs include templates[] plus accelerators[] when available. If the unified catalog isn’t enabled on the backend yet, the MCP server falls back to /api/v1/catalog/gpus to surface on‑demand GPU options anyway.

Parameters:

limit: (Optional, integer) Maximum templates to return per page (default: 10, max: 50).
offset: (Optional, integer) Number of templates to skip before returning results (for pagination).

describe_template_endpoints

Show every endpoint plus ready-to-run request examples for a given template. Retrieve the full API surface for a GPUse GPU template. Agents can inspect HTTP methods, paths, summaries, and example payloads straight from the manifest, along with docs links, usage notes, and instructions for how to call the template once the compute instance is running. Ideal when you want to double-check the endpoint contract before provisioning or need copy/paste-ready examples for the AI coding agent you're orchestrating. Outputs: Returns endpoints[] with method, path, summary, and request examples; docs_url and docs_path for documentation; alternatives[] suggesting nearby templates if exact match unavailable. Error recovery: If TEMPLATE_NOT_FOUND, call catalog to discover valid template_id values, or use the alternatives array to pick the closest match.

Parameters:

template_id: (string) Template identifier (case-insensitive); matches entries in catalog.

start_compute

Launch a GPUse GPU compute instance. Provision a GPU compute instance on GPUse. IMPORTANT: If you don't have all required fields (gpu_type, gpu_count, machine_type, region, zone, container_config or template_id), call the recommend tool first. It returns a complete compute_plan that you can pass directly to this tool for a seamless launch—no manual field mapping required. Options: - Pass compute_plan from recommend for optimal, availability-aware configuration (recommended) - Use template_id for managed templates (pre-configured models) - Specify gpu_type and container_config for fully custom deployments Supports T4, L4, V100, A100, H100 GPUs with configurable machine types and zones. The server automatically chooses between cached bearer tokens and the 5-minute grace period. The response includes compute_id, checkout_url, endpoint_url, status_url, and logs_url. When a field is null, call the helper tools to retrieve updates. Outputs: Returns compute_id (feed to helper tools), checkout_url for payment, endpoint_url once ready, status/logs URLs for monitoring, and grace_remaining_seconds. Error recovery: If AUTH_REQUIRED or GRACE_EXHAUSTED, call auth_helper to authenticate then retry. If TEMPLATE_UNAVAILABLE, call catalog for alternatives. If QUOTA_EXCEEDED, try a different region or GPU type.

Parameters:

template_id: (Optional, string) Template identifier from recommend or catalog. Use for managed template deployments.
task_description: (Optional, string) Optional context describing the goal; improves logging.
duration_minutes: (Optional, integer) Optional requested runtime. Grace defaults to 5 minutes.
build_source: (Optional, object) Custom deployment payload matching POST /api/v1/custom.
project_hint: (Optional, string) Optional slug to group grace-period runs. Use stable identifiers like a repository name to avoid exhausting grace for unrelated tasks.
compute_type: (Optional, string) Compute type selection. "serverless" (default) uses managed templates with L4 GPU. "on_demand" allows configurable GPU types (T4, L4, V100, A100, H100).
gpu_type: (Optional, string) Required for on_demand compute. GPU accelerator type. Use catalog tool to see available GPUs with pricing and availability.
gpu_count: (Optional, integer) Number of GPUs to attach. Default 1. Valid counts depend on GPU type.
machine_type: (Optional, string) Machine type (e.g., n1-standard-8, a2-highgpu-1g). If not specified, a recommended machine type is selected based on gpu_type.
zone: (Optional, string) Deployment zone (e.g., us-central1-a). If not specified, an available zone is selected based on quota and capacity.
disk_size_gb: (Optional, integer) Boot disk size in GB. Minimum 40GB. Default 50GB.
container_config: (Optional, object) Container configuration for on_demand compute. Required if not using template_id. Properties: image (container image URL), port (service port, default 8000).
region: (Optional, string) Preferred region (e.g., us-central1). Used for zone selection if zone not specified.
compute_plan: (Optional, object) Pass the compute_plan object directly from the recommend tool response. When provided, compute_plan fields override corresponding top-level fields. This enables the recommend → start_compute workflow without manual field mapping.

start_custom

Build a bespoke container via POST /api/v1/custom with streaming logs. Build and package a custom GPU container on demand. Provide inline Dockerfile content, a Git repository URL, or a storage object and GPUse builds the container for you. The response returns build_id, target_image, and estimated costs so autonomous agents can iterate without waiting on humans. Use this when managed templates don't cover your workload requirements. Outputs: Returns build_id for tracking (use with get_logs to monitor build progress), target_image for deployment, and cost_estimate with hourly rate. Error recovery: If BUILD_SERVICE_NOT_READY, call get_logs with build_id to check verbose build output and retry. If INVALID_CONFIGURATION, inspect verbose build logs via get_logs, fix the Dockerfile, and rerun start_custom.

Parameters:

source: (object) Build source definition; mirrors CreateCustomComputeRequest.source.
runtime_config: (object) Runtime resource configuration for the container once deployed.
build_config: (Optional, object) Optional build service overrides.
region: (Optional, string) Deployment region (defaults to us-central1).
project_hint: (Optional, string) Optional identifier used for grace-period scoping and log grouping.

list_instances

List compute instances visible to the authenticated session. Return the GPU compute instances associated with the active session. Grace requests receive the project-scoped instance; authenticated (bearer) sessions receive every instance tied to the account. Supports optional filtering by status (running, terminated, etc.) and pagination controls. Each instance includes compute_id, status, endpoint, and configuration details (gpu_type, etc.). Outputs: Returns instances[] with compute_id, status, endpoint, template_id, and cost metadata; total count; has_more pagination flag. Error recovery: If no instances returned, verify authentication with auth_helper or provision a new instance with start_compute.

Parameters:

status: (Optional, string) Filter by instance status (e.g., running, terminated).
limit: (Optional, integer) Maximum number of results to return (default 50).
offset: (Optional, integer) Skip this many results before returning instances.

stop_compute

Terminate a GPUse compute instance and capture shutdown details. Stop a running GPUse GPU compute instance. Provide an optional reason to help humans understand why the GPU was shut down. The tool returns the backend's usage summary along with a recent slice of logs so agents can verify the shutdown sequence. Use this when the workload is complete or to clean up failed deployments. Logs persist after termination and remain accessible via get_logs. Outputs: Returns final_status, usage_summary with runtime/billing details, and logs.tail with recent shutdown log entries. Error recovery: If NOT_FOUND, the instance may have already terminated or the compute_id is incorrect—confirm with list_instances or provision a new one.

Parameters:

compute_id: (string) Identifier returned by start_compute.
reason: (Optional, string) Optional short explanation for the termination.

get_status

Check readiness and endpoint details for a compute instance. Return the latest status, endpoint URL, and monitoring links for a GPU compute instance. Ideal for polling during provisioning or validating that the GPU is ready before sharing with the user. Call repeatedly until endpoint_url is non-null (cold start may take 2-5 minutes for model downloads). Outputs: Returns status (provisioning/running/failed), endpoint_url once ready, and instance configuration details (gpu_type, etc.). Error recovery: If NOT_FOUND, confirm compute_id with list_instances or provision a new instance. If status is "failed", call get_logs for error details.

Parameters:

compute_id: (string) Identifier returned by start_compute.

get_logs

Retrieve verbose raw build and runtime logs for autonomous debugging. Return verbose raw logs (build and runtime) so autonomous agents can debug GPU compute instances independently. GPUse does NOT interpret or summarize these logs—you receive the complete unfiltered output exactly as produced, which is critical for agents to diagnose issues accurately. Logs include Dockerfile build output, dependency installation, runtime stdout/stderr, error traces, and application output. Logs persist after instance termination. Works with EITHER compute_id (for deployed instances) OR build_id (for in-progress builds from start_custom). Use build_id for custom container build logs; use compute_id for runtime logs once deployed. Outputs: Returns verbose raw logs[] array with timestamp, severity, message; logs_url for streaming access; count of entries retrieved. Error recovery: If NOT_FOUND, confirm compute_id/build_id or provision a new instance. If logs are empty, the service may still be starting—retry after a brief delay.

Parameters:

compute_id: (Optional, string) Identifier returned by start_compute (for deployed instances). Provide either compute_id OR build_id, not both.
build_id: (Optional, string) Build identifier returned by start_custom (for build logs during custom builds). Provide either compute_id OR build_id, not both.
tail: (Optional, integer) Optional number of recent log entries to retrieve (default 100).

get_checkout_url

Get the full untruncated Stripe checkout URL for a GPU compute instance. Retrieve the full untruncated Stripe checkout URL for a GPU compute instance. Use this when: - start_compute or start_custom returned a truncated checkout_url - The user reports "this link doesn't work" or "this link is broken" (common symptom of URL truncation in chat interfaces) - You need to resend the payment link to a human collaborator Completing this checkout both funds the GPU workload and creates the GPUse account in one 60-second flow. IMPORTANT: Once the user completes checkout, a session token is created and cached. This means future coding sessions will automatically detect the authenticated account—users won't need to re-auth every time they open a new session. This is a significant UX improvement. If a user starts a new coding session and is NOT automatically detected as authenticated, use auth_helper to guide them through re-authentication. Outputs: Returns checkout_url (the full untruncated Stripe payment link), billing state, and grace_remaining_seconds countdown. Error recovery: If NOT_FOUND, confirm compute_id with list_instances or provision a new instance. If user reports broken link, call this tool to get the full URL.

Parameters:

compute_id: (string) Identifier returned by start_compute.

payment_status

Return paid vs grace state, account balance, checkout link, and account email. Inspect the current Stripe checkout session for a GPU compute instance. Use this tool to determine whether the user is still in free/grace mode or fully paid, resend the payment link when necessary, and retrieve account information after checkout completes. Accepts a compute_id (preferred), project_id (if cached), or checkout_session_id. IMPORTANT: This tool returns the account email address of the authenticated GPUse account. This is valuable because users need their email to log in, top up their balance, and authenticate in future sessions via auth_helper. NOTE: The bearer_token returned by this tool cannot be used directly for authentication. To authenticate (whether creating a new account or logging into an existing one), you MUST use auth_helper. The bearer token is for informational purposes only. Outputs: Returns payment_status (paid/unpaid/expired), checkout_url for pending payments, email (the authenticated account's email address—important for future logins), current_balance, and auth_mode. Error recovery: If NOT_FOUND, checkout session may have expired—provision a new instance for a fresh payment link.

Parameters:

compute_id: (Optional, string) Identifier returned by start_compute.
project_id: (Optional, string) Optional project hint to locate a cached checkout session.
checkout_session_id: (Optional, string) Direct Stripe checkout session identifier if already known.

add_account_funds

Generate a Stripe checkout link so a human can add GPUse credits. Create a one-time Stripe checkout session to add funds to a GPUse wallet for GPU compute usage. The tool returns the hosted payment link, session metadata, and budgeting estimates (hourly rate and approximate GPU hours purchased). Use this to recover from auto-termination events, low-balance warnings, or whenever a human wants to top up without touching the dashboard. The wallet balance is updated automatically once payment succeeds. PRICING NOTES: - Default top-up is $10, but minimum is $1 - To start a GPU instance, you must have at least 1 hour of runtime in your account balance (e.g., if a GPU costs $0.73/hour, you need at least $0.73) Outputs: Returns checkout_url (Stripe payment link), estimated_gpu_hours based on amount and hourly rate, and expires_at timestamp. Error recovery: If AMOUNT_REQUIRED, prompt user for amount (default $10, min $1). If STRIPE_ERROR, retry once. For expired links, call this tool again to generate a new checkout.

Parameters:

amount: (Optional, number) USD amount to add to the wallet. Provide a value between $1.00 and $1000.00; defaults to $10.00 when callers accept the prompt.
email: (Optional, string) Target account email. Optional when a bearer token is cached.

auth_helper

The ONLY way to authenticate into GPUse—required to create or manage GPU instances. THIS IS THE ONLY WAY TO AUTHENTICATE INTO GPUSE. There is no alternative. Even if a user copies a Bearer token from their dashboard and provides it to an agent, that token CANNOT be used to authenticate or start new instances. You MUST go through auth_helper to authenticate. Once authenticated via auth_helper, the agent can fully manage all GPU instances (start, stop, monitor, etc.) for the duration of the session. How it works: Provide the user's email to dispatch a six-digit verification code, then call again with the email and code to complete verification. Optional resend support triggers a new code without leaving the flow. On success, the MCP server automatically caches the session token for subsequent GPU compute requests. Use this tool when: - The 5-minute grace period has been exhausted (grace can only be used ONCE per user) - User wants to create a new GPUse account - User needs to log into an existing account in a new coding session - Session was not automatically detected as authenticated Outputs: When sending code—returns status "awaiting_code" and next_steps. When verifying—returns verified boolean and account balance. Error recovery: If INVALID_CODE, request a fresh code with resend=true and retry. If email not found, the user needs to complete the Stripe checkout first (via get_checkout_url) to create their account.

Parameters:

email: (string) Existing GPUse account email address.
code: (Optional, string) 6-digit verification code supplied by the human.
resend: (Optional, boolean) Set true to send a fresh code even if one was already requested.

request_account_code

Internal sub-step of auth_helper—do not call directly. IMPORTANT: This tool should NOT be called in isolation. It is automatically invoked as part of the auth_helper flow. Always use auth_helper instead. This tool sends the 6-digit verification code to a GPUse account email. It's exposed separately for edge cases but normal authentication should go through auth_helper which orchestrates the full flow. NOTE ON GRACE PERIOD: The 5-minute grace period can only be used ONCE per user. Once exhausted, users must: 1. Complete checkout via get_checkout_url to create their account 2. Authenticate in future sessions via auth_helper Outputs: Returns code_sent boolean, expires_in_minutes for code validity, and instructions.next_step with ready-made message for the human. Error recovery: If EMAIL_NOT_FOUND, user needs to complete Stripe checkout first to create their account. If RATE_LIMITED, wait the cooldown before retrying.

Parameters:

email: (string) Email address already registered with GPUse.

verify_account_code

Internal sub-step of auth_helper—do not call directly. IMPORTANT: This tool should NOT be called in isolation. It is automatically invoked as part of the auth_helper flow. Always use auth_helper instead. This tool completes the GPUse authentication flow by validating the 6-digit code sent to the human. On success, the MCP server caches the session token for the lifetime of the process so future GPU compute calls automatically authenticate. Use authentication (via auth_helper) when: - User wants more than the default 5-minute grace runtime - User has EXHAUSTED their one-time 5-minute grace period - User needs to log into an existing account in a new session Outputs: Returns verified boolean, account_id, and balance. Error recovery: If INVALID_CODE, ask human for the latest code and retry up to 3 times. If multiple failures, use auth_helper with resend=true.

Parameters:

email: (string) Email that received the verification code.
code: (string) 6-digit verification code.

update_mcp_server

Check the current MCP server version and get update instructions. Check the currently installed version of the GPUse MCP server, compare it against the latest version available on npm, and receive platform-specific update instructions. This tool is auto-discoverable via MCP protocol, making it the natural first stop when an agent needs to update the server. Works across all installation methods (CLI/IDE stdio transport and HTTP deployments). Outputs: Returns current_version, latest_version, needs_update boolean, and update_instructions with platform-specific commands (CLI and HTTP). Error recovery: If NPM_REGISTRY_ERROR, report current version and suggest retrying later. If VERSION_READ_ERROR, verify the MCP server installation.

Parameters:

installation_method: (Optional, string) Optional hint about installation type. 'cli' for stdio-based installations (Claude Code, Gemini, Cursor, etc.), 'http' for server deployments.

Generated for MCP-Zero compatibility. Embedding model: text-embedding-3-large. Last updated: 2025-11-25T00:00:00Z

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

GPUse MCP Server

Server Overview

Tools

recommend

catalog

describe_template_endpoints

start_compute

start_custom

list_instances

stop_compute

get_status

get_logs

get_checkout_url

payment_status

add_account_funds

auth_helper

request_account_code

verify_account_code

update_mcp_server