@infatoshi/gpu-mcp-server v1.0.2
Pure Node.js Model Context Protocol server for GPU CUDA compilation via Modal. No Python dependency - just install and run CUDA code on Modal GPUs.
gpu-mcp
An MCP (Model Context Protocol) server that lets you compile and run CUDA code on Modal's B200 GPUs directly from your editor (Cursor, VS Code, etc.). No local GPU setup required - just write CUDA code and run it on cloud GPUs instantly.
Repository layout
- modal_app.py: Modal application defining the compile_and_run GPU function and its build cache.
- mcp_modal_cuda_server.py: FastMCP server that exposes Modal workflows as MCP tools.
- .cursor/mcp.json: Example client configuration that wires the MCP server into Cursor via uv run.
- PROVE.md: Required change log capturing rationale, patches, and validation notes.
Prerequisites
- Modal account with GPU access and the Modal CLI authenticated on your machine.
- Python 3.12 with uv installed for environment management.
- modal and mcp[cli] packages installed inside the uv-managed virtual environment.
- Optional: CUDA project sources you want to compile and run through the service.
Quickstart
1. Setup (One-time)
# Install Python dependencies
uv venv
uv pip install modal mcp fastmcp
# Deploy to Modal (already done for this repo)
modal deploy modal_app.py
# Make Node.js wrapper executable
chmod +x index.js
2. Test It
# Test the MCP server works
python test_mcp.py
# Test the NPX wrapper
python test_npx.py
# Test vector addition example (1M elements on B200 GPU)
python test_vector_add.py
3. Use with Cursor
Add to ~/.cursor/mcp.json (replace path with your actual project location):
{
"mcpServers": {
"gpu-cuda": {
"command": "node",
"args": ["/Users/infatoshi/cuda/gpu-mcp/index.js"],
"env": {
"PROJECT_ROOT": "/Users/infatoshi/cuda/gpu-mcp",
"MODAL_APP": "cuda-kernel-runner",
"MODAL_GPU": "B200"
}
}
}
}
Important: Use the absolute path to index.js in the args field.
Test that the config works:
python test_cursor_config.py
Restart Cursor and the GPU tools will be available!
Having issues? See CURSOR_SETUP.md for detailed setup guide and troubleshooting.
Usage Guide
Available MCP Tools
- cuda_compile_run - Compile and run CUDA code on a Modal B200 GPU
  - Parameters: files_glob, entry, arch (default: "sm_100"), extra_nvcc_flags, run_args
  - Example: {"files_glob": "src/**/*.cu", "entry": "src/main.cu", "arch": "sm_100"}
- cuda_template - Generate a starter CUDA template
  - Parameters: name (default: "vadd")
  - Example: {"name": "my_kernel"}
- env_list_images - List available CUDA base images
  - No parameters required
- env_set_default - Set the default CUDA version and GPU type
  - Parameters: cuda_version (default: "12.8.1"), gpu (default: "B200")
  - Example: {"cuda_version": "12.8.1", "gpu": "B200"}
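As an illustration, a tool call like the cuda_compile_run example above can be issued from a small Python MCP client over stdio. This is a sketch, not project code: the path to index.js is a placeholder, and the argument values are the example ones from the list.

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Launch the Node.js wrapper the same way Cursor does (placeholder path)
    params = StdioServerParameters(command="node", args=["/abs/path/to/gpu-mcp/index.js"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("cuda_compile_run", {
                "files_glob": "src/**/*.cu",
                "entry": "src/main.cu",
                "arch": "sm_100",
            })
            print(result.content)

asyncio.run(main())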
Common Operations
# Test vector addition with 1M elements
python test_vector_add.py
# Run direct MCP server test
python test_mcp.py
# Verify Cursor configuration
python test_cursor_config.py
Examples
- test/hello.cu - Simple kernel printing from GPU threads
- examples/vector_add.cu - Optimized vector addition (128 GB/s on B200)
- examples/matrix_mul.cu - Tiled matrix multiplication with shared memory
Architecture Support
- B200 (Blackwell): sm_100 (default)
- H200/H100 (Hopper): sm_90
- A100: sm_80
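When targeting a different GPU type, pass the matching arch value to cuda_compile_run. A minimal sketch of that mapping (an illustrative helper, not part of the server):

import os

# The GPU-to-arch table above, as data (illustrative only)
ARCH_BY_GPU = {"B200": "sm_100", "H200": "sm_90", "H100": "sm_90", "A100": "sm_80"}

gpu = os.environ.get("MODAL_GPU", "B200")
arch = ARCH_BY_GPU.get(gpu, "sm_100")  # fall back to the B200 default
print(f"Use arch={arch} for {gpu}")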
MCP tools
- env_list_images: Returns recommended CUDA base image tags that already include nvcc.
- env_set_default(cuda_version, gpu): Sets process-local defaults for CUDA version and Modal GPU type.
- cuda_template(name): Generates a starter CUDA source file you can write to disk before compiling.
- cuda_compile_run(...): Packages matching project files, uploads them to Modal, builds with nvcc or CMake, and executes the resulting binary.
Each tool returns JSON data, so clients should parse responses instead of treating them as plain text.
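For example, assuming the payload is serialized into the first text content item (a common FastMCP pattern, but an assumption here; check the actual response shape), a client can decode it like this:

import json

def parse_tool_result(result):
    # 'result' is the CallToolResult returned by session.call_tool in the
    # client sketch above; we assume a single JSON text content item.
    return json.loads(result.content[0].text)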
Modal function contract
The compile_and_run Modal function expects a request like:
{
"files": [{"path": "src/main.cu", "content": "__global__ void ..."}],
"entry": "src/main.cu",
"build_system": "nvcc",
"nvcc": {
"arch": "sm_100",
"gencode": ["arch=compute_100,code=sm_100"],
"flags": ["-O3", "--use_fast_math"]
},
"run": {"cmd": "./a.out", "args": ["--n", "1048576"]},
"workdir": ""
}
The server hashes file content plus nvcc flags to seed a persistent Modal volume (cuda-build-cache) so repeated builds are instantaneous. When build_system is set to cmake, the function runs cmake -S . -B build followed by cmake --build build -j before execution.
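A minimal sketch of how such a content-based cache key can be computed (illustrative; the actual logic lives in modal_app.py):

import hashlib, json

def cache_key(files, nvcc_opts):
    # Hash each file's path and content plus the nvcc options in a stable
    # order, so identical inputs map to the same build-cache entry.
    h = hashlib.sha256()
    for f in sorted(files, key=lambda f: f["path"]):
        h.update(f["path"].encode())
        h.update(f["content"].encode())
    h.update(json.dumps(nvcc_opts, sort_keys=True).encode())
    return h.hexdigest()

print(cache_key(
    [{"path": "src/main.cu", "content": "__global__ void k() {}"}],
    {"arch": "sm_100", "flags": ["-O3", "--use_fast_math"]},
)[:16])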
Configuration
Environment variables override defaults for both the MCP server and the Modal function:
- PROJECT_ROOT: Filesystem root the MCP server is allowed to read (defaults to the current working directory).
- MODAL_APP: Name of the deployed Modal application (default cuda-kernel-runner).
- CUDA_VERSION: CUDA toolkit tag used for the underlying Modal image.
- CUDA_FLAVOR: Base image flavor, default devel-ubuntu24.04.
- MODAL_GPU: Requested Modal GPU type (B200 by default, but H200, H100, A100-80GB, etc. are valid).
- TIMEOUT_S, MIN_CONTAINERS, SCALEDOWN_WINDOW: Modal scheduling knobs for long-running kernels and autoscaling.
Set these variables in your shell or via your MCP client configuration to keep runtime behavior hot-swappable without code edits.
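As a sketch of the pattern (assumed, not a verbatim excerpt from the server), the documented defaults would be read like this:

import os

PROJECT_ROOT = os.environ.get("PROJECT_ROOT", os.getcwd())
MODAL_APP = os.environ.get("MODAL_APP", "cuda-kernel-runner")
CUDA_VERSION = os.environ.get("CUDA_VERSION", "12.8.1")
CUDA_FLAVOR = os.environ.get("CUDA_FLAVOR", "devel-ubuntu24.04")
MODAL_GPU = os.environ.get("MODAL_GPU", "B200")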
What Works Now
✅ MCP Server: Fully functional stdio-based server
✅ Modal Integration: Deployed and running on B200 GPUs
✅ NPX Support: Node.js wrapper for easy Cursor integration
✅ CUDA Compilation: nvcc with sm_100 (Blackwell) support
✅ Build Caching: Instant re-runs with content-based caching
✅ Examples: hello.cu, vector_add.cu, matrix_mul.cu all tested
✅ Performance: 128 GB/s memory bandwidth on vector operations
Examples Included
- test/hello.cu: Simple kernel printing from GPU threads
- examples/vector_add.cu: Optimized vector addition with performance metrics
- examples/matrix_mul.cu: Tiled matrix multiplication with shared memory
All examples tested and verified working on Modal B200 GPU.
Development notes
- Keep pure logic in libraries and surface I/O only through MCP tools (mcp_modal_cuda_server.py orchestrates Modal calls).
- Extend configuration with YAML files under config/ if you need additional runtime toggles; avoid hardcoding values into scripts.
- Document every change in PROVE.md, including the diff and validation steps, to maintain provenance across deployments.
Performance Tips
# Use optimization flags for faster code
extra_nvcc_flags=["-O3", "--use_fast_math"]
# Enable PTX for forward compatibility
use_default_gencode=true
Build caching is automatic - repeated runs with the same source are instant.
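Put together, an argument set for cuda_compile_run using these tips might look like the following (a sketch; verify use_default_gencode against the deployed tool's schema):

args = {
    "files_glob": "examples/**/*.cu",
    "entry": "examples/vector_add.cu",
    "arch": "sm_100",
    "extra_nvcc_flags": ["-O3", "--use_fast_math"],  # optimization flags from the tip above
    "use_default_gencode": True,  # emit PTX for forward compatibility
}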
Complete Setup Guide
Initial Setup
# Install dependencies
uv venv
uv pip install modal mcp fastmcp
# Deploy to Modal (one-time, already done)
modal deploy modal_app.py
# Make scripts executable
chmod +x index.js
Verification
Run the complete verification suite:
./verify_all.sh
Expected output:
✓ Python environment exists
✓ All Python dependencies installed
✓ MCP server test passed
✓ NPX wrapper test passed
✓ Vector add test passed (128 GB/s on B200)
✓ All example files present
✓ Cursor config test passed
✓ All verification checks passed!
Cursor Integration
Complete Configuration
Add to ~/.cursor/mcp.json (replace with your actual project path):
{
"mcpServers": {
"gpu-cuda": {
"command": "node",
"args": ["/Users/infatoshi/cuda/gpu-mcp/index.js"],
"env": {
"PROJECT_ROOT": "/Users/infatoshi/cuda/gpu-mcp",
"MODAL_APP": "cuda-kernel-runner",
"MODAL_GPU": "B200"
}
}
}
}
Critical rules:
- Use absolute paths in both args and PROJECT_ROOT
- NEVER use cwd - it causes "Cannot find module" errors
- Replace /Users/infatoshi/cuda/gpu-mcp with your path
Test Configuration
python test_cursor_config.py
Expected:
✓ Server started successfully: modal-cuda
✓ Cursor MCP config is correct!
Troubleshooting
Error: "Cannot find module"
- Cause: Using a relative path or cwd in the config
- Fix: Use the absolute path as shown above

Error: "No server info found"
- Cause: The server failed to start
- Fix: Check the config with python test_cursor_config.py

Error: "Virtual environment not found"
- Fix: Run uv venv && uv pip install modal mcp fastmcp
Live Test Results
Vector Addition (1M elements on B200)
Status: ✅ SUCCESS
GPU: B200 (sm_100)
Elements: 1,048,576
Time: 0.098 ms
Bandwidth: 128 GB/s
Verification: All results correct
(Sanity check: a vector add moves three float arrays, 3 × 4 B × 1,048,576 ≈ 12.6 MB, in 0.098 ms, which is ≈ 128 GB/s.)
Hello World Kernel
Status: ✅ SUCCESS
GPU: B200 (sm_100)
Threads: 8 (2 blocks × 4 threads)
Output: All threads printed correctly
Project Status
Component Status Notes
──────────────────────────────────────────────────
Python Environment ✅ uv venv created
Dependencies ✅ modal, mcp, fastmcp installed
Modal Deployment ✅ cuda-kernel-runner deployed
MCP Server ✅ stdio transport working
Node.js Wrapper ✅ ESM, NPX compatible
CUDA Examples ✅ hello, vector_add, matmul tested
B200 GPU Access ✅ sm_100 compilation working
Build Caching ✅ Content-based hashing active
Cursor Integration ✅ Config provided
Tests               ✅  All passing

Status: ✅ PRODUCTION READY
