@infatoshi/gpu-mcp-server v1.0.2
Pure Node.js Model Context Protocol server for GPU CUDA compilation via Modal. No Python dependency - just install and run CUDA code on Modal GPUs.
gpu-mcp
An MCP (Model Context Protocol) server that lets you compile and run CUDA code on Modal's B200 GPUs directly from your editor (Cursor, VS Code, etc.). No local GPU setup required - just write CUDA code and run it on cloud GPUs instantly.
Repository layout
- modal_app.py: Modal application defining the compile_and_run GPU function and its build cache.
- mcp_modal_cuda_server.py: FastMCP server that exposes Modal workflows as MCP tools.
- .cursor/mcp.json: Example client configuration that wires the MCP server into Cursor via uv run.
- PROVE.md: Required change log capturing rationale, patches, and validation notes.
Prerequisites
- Modal account with GPU access and the Modal CLI authenticated on your machine.
- Python 3.12 with uv installed for environment management.
- modal and mcp[cli] packages installed inside the uv-managed virtual environment.
- Optional: CUDA project sources you want to compile and run through the service.
Quickstart
1. Setup (One-time)
# Install Python dependencies
uv venv
uv pip install modal mcp fastmcp
# Deploy to Modal (already done for this repo)
modal deploy modal_app.py
# Make Node.js wrapper executable
chmod +x index.js
2. Test It
# Test the MCP server works
python test_mcp.py
# Test the NPX wrapper
python test_npx.py
# Test vector addition example (1M elements on B200 GPU)
python test_vector_add.py
3. Use with Cursor
Add to ~/.cursor/mcp.json (replace path with your actual project location):
{
"mcpServers": {
"gpu-cuda": {
"command": "node",
"args": ["/Users/infatoshi/cuda/gpu-mcp/index.js"],
"env": {
"PROJECT_ROOT": "/Users/infatoshi/cuda/gpu-mcp",
"MODAL_APP": "cuda-kernel-runner",
"MODAL_GPU": "B200"
}
}
}
}
Important: Use the absolute path to index.js in the args field.
Test that the config works:
python test_cursor_config.py
Restart Cursor and the GPU tools will be available!
Having issues? See CURSOR_SETUP.md for detailed setup guide and troubleshooting.
Usage Guide
Available MCP Tools
- cuda_compile_run - Compile and run CUDA code on a Modal B200 GPU
  - Parameters: files_glob, entry, arch (default: "sm_100"), extra_nvcc_flags, run_args
  - Example: {"files_glob": "src/**/*.cu", "entry": "src/main.cu", "arch": "sm_100"}
- cuda_template - Generate a starter CUDA template
  - Parameters: name (default: "vadd")
  - Example: {"name": "my_kernel"}
- env_list_images - List available CUDA base images
  - No parameters required
- env_set_default - Set the default CUDA version and GPU type
  - Parameters: cuda_version (default: "12.8.1"), gpu (default: "B200")
  - Example: {"cuda_version": "12.8.1", "gpu": "B200"}
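As an illustration, a tool call like the cuda_compile_run example above can be issued from a small Python MCP client over stdio. This is a sketch, not project code: the path to index.js is a placeholder, and the argument values are the example ones from the list.

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Launch the Node.js wrapper the same way Cursor does (placeholder path)
    params = StdioServerParameters(command="node", args=["/abs/path/to/gpu-mcp/index.js"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("cuda_compile_run", {
                "files_glob": "src/**/*.cu",
                "entry": "src/main.cu",
                "arch": "sm_100",
            })
            print(result.content)

asyncio.run(main())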
Common Operations
# Test vector addition with 1M elements
python test_vector_add.py
# Run direct MCP server test
python test_mcp.py
# Verify Cursor configuration
python test_cursor_config.py
Examples
- test/hello.cu - Simple kernel printing from GPU threads
- examples/vector_add.cu - Optimized vector addition (128 GB/s on B200)
- examples/matrix_mul.cu - Tiled matrix multiplication with shared memory
Architecture Support
- B200 (Blackwell): sm_100 (default)
- H200/H100 (Hopper): sm_90
- A100: sm_80
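When targeting a different GPU type, pass the matching arch value to cuda_compile_run. A minimal sketch of that mapping (an illustrative helper, not part of the server):

import os

# The GPU-to-arch table above, as data (illustrative only)
ARCH_BY_GPU = {"B200": "sm_100", "H200": "sm_90", "H100": "sm_90", "A100": "sm_80"}

gpu = os.environ.get("MODAL_GPU", "B200")
arch = ARCH_BY_GPU.get(gpu, "sm_100")  # fall back to the B200 default
print(f"Use arch={arch} for {gpu}")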
MCP tools
- env_list_images: Returns recommended CUDA base image tags that already include nvcc.
- env_set_default(cuda_version, gpu): Sets process-local defaults for CUDA version and Modal GPU type.
- cuda_template(name): Generates a starter CUDA source file you can write to disk before compiling.
- cuda_compile_run(...): Packages matching project files, uploads them to Modal, builds with nvcc or CMake, and executes the resulting binary.
Each tool returns JSON data, so clients should parse responses instead of treating them as plain text.
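For example, assuming the payload is serialized into the first text content item (a common FastMCP pattern, but an assumption here; check the actual response shape), a client can decode it like this:

import json

def parse_tool_result(result):
    # 'result' is the CallToolResult returned by session.call_tool in the
    # client sketch above; we assume a single JSON text content item.
    return json.loads(result.content[0].text)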
Modal function contract
The compile_and_run Modal function expects a request like:
{
"files": [{"path": "src/main.cu", "content": "__global__ void ..."}],
"entry": "src/main.cu",
"build_system": "nvcc",
"nvcc": {
"arch": "sm_100",
"gencode": ["arch=compute_100,code=sm_100"],
"flags": ["-O3", "--use_fast_math"]
},
"run": {"cmd": "./a.out", "args": ["--n", "1048576"]},
"workdir": ""
}
The server hashes file content plus nvcc flags to seed a persistent Modal volume (cuda-build-cache) so repeated builds are instantaneous. When build_system is set to cmake, the function runs cmake -S . -B build followed by cmake --build build -j before execution.
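A minimal sketch of how such a content-based cache key can be computed (illustrative; the actual logic lives in modal_app.py):

import hashlib, json

def cache_key(files, nvcc_opts):
    # Hash each file's path and content plus the nvcc options in a stable
    # order, so identical inputs map to the same build-cache entry.
    h = hashlib.sha256()
    for f in sorted(files, key=lambda f: f["path"]):
        h.update(f["path"].encode())
        h.update(f["content"].encode())
    h.update(json.dumps(nvcc_opts, sort_keys=True).encode())
    return h.hexdigest()

print(cache_key(
    [{"path": "src/main.cu", "content": "__global__ void k() {}"}],
    {"arch": "sm_100", "flags": ["-O3", "--use_fast_math"]},
)[:16])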
Configuration
Environment variables override defaults for both the MCP server and the Modal function:
- PROJECT_ROOT: Filesystem root the MCP server is allowed to read (defaults to the current working directory).
- MODAL_APP: Name of the deployed Modal application (default cuda-kernel-runner).
- CUDA_VERSION: CUDA toolkit tag used for the underlying Modal image.
- CUDA_FLAVOR: Base image flavor, default devel-ubuntu24.04.
- MODAL_GPU: Requested Modal GPU type (B200 by default, but H200, H100, A100-80GB, etc. are valid).
- TIMEOUT_S, MIN_CONTAINERS, SCALEDOWN_WINDOW: Modal scheduling knobs for long-running kernels and autoscaling.
Set these variables in your shell or via your MCP client configuration to keep runtime behavior hot-swappable without code edits.
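As a sketch of the pattern (assumed, not a verbatim excerpt from the server), the documented defaults would be read like this:

import os

PROJECT_ROOT = os.environ.get("PROJECT_ROOT", os.getcwd())
MODAL_APP = os.environ.get("MODAL_APP", "cuda-kernel-runner")
CUDA_VERSION = os.environ.get("CUDA_VERSION", "12.8.1")
CUDA_FLAVOR = os.environ.get("CUDA_FLAVOR", "devel-ubuntu24.04")
MODAL_GPU = os.environ.get("MODAL_GPU", "B200")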
What Works Now
✅ MCP Server: Fully functional stdio-based server
✅ Modal Integration: Deployed and running on B200 GPUs
✅ NPX Support: Node.js wrapper for easy Cursor integration
✅ CUDA Compilation: nvcc with sm_100 (Blackwell) support
✅ Build Caching: Instant re-runs with content-based caching
✅ Examples: hello.cu, vector_add.cu, matrix_mul.cu all tested
✅ Performance: 128 GB/s memory bandwidth on vector operations
Examples Included
- test/hello.cu: Simple kernel printing from GPU threads
- examples/vector_add.cu: Optimized vector addition with performance metrics
- examples/matrix_mul.cu: Tiled matrix multiplication with shared memory
All examples tested and verified working on Modal B200 GPU.
Development notes
- Keep pure logic in libraries and surface I/O only through MCP tools (mcp_modal_cuda_server.py orchestrates Modal calls).
- Extend configuration with YAML files under config/ if you need additional runtime toggles; avoid hardcoding values into scripts.
- Document every change in PROVE.md, including the diff and validation steps, to maintain provenance across deployments.
Performance Tips
# Use optimization flags for faster code
extra_nvcc_flags=["-O3", "--use_fast_math"]
# Enable PTX for forward compatibility
use_default_gencode=true
Build caching is automatic - repeated runs with the same source are instant.
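Put together, an argument set for cuda_compile_run using these tips might look like the following (a sketch; verify use_default_gencode against the deployed tool's schema):

args = {
    "files_glob": "examples/**/*.cu",
    "entry": "examples/vector_add.cu",
    "arch": "sm_100",
    "extra_nvcc_flags": ["-O3", "--use_fast_math"],  # optimization flags from the tip above
    "use_default_gencode": True,  # emit PTX for forward compatibility
}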
Complete Setup Guide
Initial Setup
# Install dependencies
uv venv
uv pip install modal mcp fastmcp
# Deploy to Modal (one-time, already done)
modal deploy modal_app.py
# Make scripts executable
chmod +x index.js
Verification
Run the complete verification suite:
./verify_all.sh
Expected output:
✓ Python environment exists
✓ All Python dependencies installed
✓ MCP server test passed
✓ NPX wrapper test passed
✓ Vector add test passed (128 GB/s on B200)
✓ All example files present
✓ Cursor config test passed
✓ All verification checks passed!
Cursor Integration
Complete Configuration
Add to ~/.cursor/mcp.json (replace with your actual project path):
{
"mcpServers": {
"gpu-cuda": {
"command": "node",
"args": ["/Users/infatoshi/cuda/gpu-mcp/index.js"],
"env": {
"PROJECT_ROOT": "/Users/infatoshi/cuda/gpu-mcp",
"MODAL_APP": "cuda-kernel-runner",
"MODAL_GPU": "B200"
}
}
}
}
Critical rules:
- Use absolute paths in both args and PROJECT_ROOT
- NEVER use cwd - it causes "Cannot find module" errors
- Replace /Users/infatoshi/cuda/gpu-mcp with your path
Test Configuration
python test_cursor_config.py
Expected:
✓ Server started successfully: modal-cuda
✓ Cursor MCP config is correct!
Troubleshooting
Error: "Cannot find module"
- Cause: Using a relative path or cwd in the config
- Fix: Use the absolute path as shown above

Error: "No server info found"
- Cause: The server failed to start
- Fix: Check the config with python test_cursor_config.py

Error: "Virtual environment not found"
- Fix: Run uv venv && uv pip install modal mcp fastmcp
Live Test Results
Vector Addition (1M elements on B200)
Status: ✅ SUCCESS
GPU: B200 (sm_100)
Elements: 1,048,576
Time: 0.098 ms
Bandwidth: 128 GB/s
Verification: All results correct
(Sanity check: a vector add moves three float arrays, 3 × 4 B × 1,048,576 ≈ 12.6 MB, in 0.098 ms, which is ≈ 128 GB/s.)
Hello World Kernel
Status: ✅ SUCCESS
GPU: B200 (sm_100)
Threads: 8 (2 blocks × 4 threads)
Output: All threads printed correctly
Project Status
Component Status Notes
──────────────────────────────────────────────────
Python Environment ✅ uv venv created
Dependencies ✅ modal, mcp, fastmcp installed
Modal Deployment ✅ cuda-kernel-runner deployed
MCP Server ✅ stdio transport working
Node.js Wrapper ✅ ESM, NPX compatible
CUDA Examples ✅ hello, vector_add, matmul tested
B200 GPU Access ✅ sm_100 compilation working
Build Caching ✅ Content-based hashing active
Cursor Integration ✅ Config provided
Tests               ✅  All passing

Status: ✅ PRODUCTION READY
