npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

@infatoshi/gpu-mcp-server

v1.0.2

Published

Pure Node.js Model Context Protocol server for GPU CUDA compilation via Modal. No Python dependency - just install and run CUDA code on Modal GPUs.

Readme

gpu-mcp

An MCP (Model Context Protocol) server that lets you compile and run CUDA code on Modal's B200 GPUs directly from your editor (Cursor, VS Code, etc.). No local GPU setup required - just write CUDA code and run it on cloud GPUs instantly.

Repository layout

  • modal_app.py: Modal application defining the compile_and_run GPU function and its build cache.
  • mcp_modal_cuda_server.py: FastMCP server that exposes Modal workflows as MCP tools.
  • .cursor/mcp.json: Example client configuration that wires the MCP server into Cursor via uv run.
  • PROVE.md: Required change log capturing rationale, patches, and validation notes.

Prerequisites

  • Modal account with GPU access and the Modal CLI authenticated on your machine.
  • Python 3.12 with uv installed for environment management.
  • modal and mcp[cli] packages installed inside the uv-managed virtual environment.
  • Optional: CUDA project sources you want to compile and run through the service.

Quickstart

1. Setup (One-time)

# Install Python dependencies
uv venv
uv pip install modal mcp fastmcp

# Deploy to Modal (already done for this repo)
modal deploy modal_app.py

# Make Node.js wrapper executable
chmod +x index.js

2. Test It

# Test the MCP server works
python test_mcp.py

# Test the NPX wrapper
python test_npx.py

# Test vector addition example (1M elements on B200 GPU)
python test_vector_add.py

3. Use with Cursor

Add to ~/.cursor/mcp.json (replace path with your actual project location):

{
  "mcpServers": {
    "gpu-cuda": {
      "command": "node",
      "args": ["/Users/infatoshi/cuda/gpu-mcp/index.js"],
      "env": {
        "PROJECT_ROOT": "/Users/infatoshi/cuda/gpu-mcp",
        "MODAL_APP": "cuda-kernel-runner",
        "MODAL_GPU": "B200"
      }
    }
  }
}

Important: Use the absolute path to index.js in the args field.

Test the config works:

python test_cursor_config.py

Restart Cursor and the GPU tools will be available!

Having issues? See CURSOR_SETUP.md for detailed setup guide and troubleshooting.

Usage Guide

Available MCP Tools

  1. cuda_compile_run - Compile and run CUDA code on Modal B200 GPU

    • Parameters: files_glob, entry, arch (default: "sm_100"), extra_nvcc_flags, run_args
    • Example: {"files_glob": "src/**/*.cu", "entry": "src/main.cu", "arch": "sm_100"}
  2. cuda_template - Generate starter CUDA template

    • Parameters: name (default: "vadd")
    • Example: {"name": "my_kernel"}
  3. env_list_images - List available CUDA base images

    • No parameters required
  4. env_set_default - Set default CUDA version and GPU type

    • Parameters: cuda_version (default: "12.8.1"), gpu (default: "B200")
    • Example: {"cuda_version": "12.8.1", "gpu": "B200"}

Common Operations

# Test vector addition with 1M elements
python test_vector_add.py

# Run direct MCP server test
python test_mcp.py

# Verify Cursor configuration
python test_cursor_config.py

Examples

  • test/hello.cu - Simple kernel printing from GPU threads
  • examples/vector_add.cu - Optimized vector addition (128 GB/s on B200)
  • examples/matrix_mul.cu - Tiled matrix multiplication with shared memory

Architecture Support

  • B200 (Blackwell): sm_100 (default)
  • H200/H100 (Hopper): sm_90
  • A100: sm_80

MCP tools

  • env_list_images: Returns recommended CUDA base image tags that already include nvcc.
  • env_set_default(cuda_version, gpu): Sets process-local defaults for CUDA version and Modal GPU type.
  • cuda_template(name): Generates a starter CUDA source file you can write to disk before compiling.
  • cuda_compile_run(...): Packages matching project files, uploads them to Modal, builds with nvcc or CMake, and executes the resulting binary.

Each tool returns JSON data, so clients should parse responses instead of treating them as plain text.

Modal function contract

The compile_and_run Modal function expects a request like:

{
  "files": [{"path": "src/main.cu", "content": "__global__ void ..."}],
  "entry": "src/main.cu",
  "build_system": "nvcc",
  "nvcc": {
    "arch": "sm_100",
    "gencode": ["arch=compute_100,code=sm_100"],
    "flags": ["-O3", "--use_fast_math"]
  },
  "run": {"cmd": "./a.out", "args": ["--n", "1048576"]},
  "workdir": ""
}

The server hashes file content plus nvcc flags to seed a persistent Modal volume (cuda-build-cache) so repeated builds are instantaneous. When build_system is set to cmake, the function runs cmake -S . -B build followed by cmake --build build -j before execution.

Configuration

Environment variables override defaults for both the MCP server and the Modal function:

  • PROJECT_ROOT: Filesystem root the MCP server is allowed to read (defaults to the current working directory).
  • MODAL_APP: Name of the deployed Modal application (default cuda-kernel-runner).
  • CUDA_VERSION: CUDA toolkit tag used for the underlying Modal image.
  • CUDA_FLAVOR: Base image flavor, default devel-ubuntu24.04.
  • MODAL_GPU: Requested Modal GPU type (B200 by default, but H200, H100, A100-80GB, etc. are valid).
  • TIMEOUT_S, MIN_CONTAINERS, SCALEDOWN_WINDOW: Modal scheduling knobs for long-running kernels and autoscaling.

Set these variables in your shell or via your MCP client configuration to keep runtime behavior hot-swappable without code edits.

What Works Now

MCP Server: Fully functional stdio-based server
Modal Integration: Deployed and running on B200 GPUs
NPX Support: Node.js wrapper for easy Cursor integration
CUDA Compilation: nvcc with sm_100 (Blackwell) support
Build Caching: Instant re-runs with content-based caching
Examples: hello.cu, vector_add.cu, matrix_mul.cu all tested
Performance: 128 GB/s memory bandwidth on vector operations

Examples Included

  • test/hello.cu: Simple kernel printing from GPU threads
  • examples/vector_add.cu: Optimized vector addition with performance metrics
  • examples/matrix_mul.cu: Tiled matrix multiplication with shared memory

All examples tested and verified working on Modal B200 GPU.

Development notes

  • Keep pure logic in libraries and surface I/O only through MCP tools (mcp_modal_cuda_server.py orchestrates Modal calls).
  • Extend configuration with YAML files under config/ if you need additional runtime toggles; avoid hardcoding values into scripts.
  • Document every change in PROVE.md, including the diff and validation steps, to maintain provenance across deployments.

Performance Tips

# Use optimization flags for faster code
extra_nvcc_flags=["-O3", "--use_fast_math"]

# Enable PTX for forward compatibility
use_default_gencode=true

Build caching is automatic - repeated runs with the same source are instant.

Complete Setup Guide

Initial Setup

# Install dependencies
uv venv
uv pip install modal mcp fastmcp

# Deploy to Modal (one-time, already done)
modal deploy modal_app.py

# Make scripts executable
chmod +x index.js

Verification

Run the complete verification suite:

./verify_all.sh

Expected output:

✓ Python environment exists
✓ All Python dependencies installed
✓ MCP server test passed
✓ NPX wrapper test passed
✓ Vector add test passed (128 GB/s on B200)
✓ All example files present
✓ Cursor config test passed
✓ All verification checks passed!

Cursor Integration

Complete Configuration

Add to ~/.cursor/mcp.json (replace with your actual project path):

{
  "mcpServers": {
    "gpu-cuda": {
      "command": "node",
      "args": ["/Users/infatoshi/cuda/gpu-mcp/index.js"],
      "env": {
        "PROJECT_ROOT": "/Users/infatoshi/cuda/gpu-mcp",
        "MODAL_APP": "cuda-kernel-runner",
        "MODAL_GPU": "B200"
      }
    }
  }
}

Critical rules:

  • Use absolute paths in both args and PROJECT_ROOT
  • NEVER use cwd - it causes "Cannot find module" errors
  • Replace /Users/infatoshi/cuda/gpu-mcp with your path

Test Configuration

python test_cursor_config.py

Expected:

✓ Server started successfully: modal-cuda
✓ Cursor MCP config is correct!

Troubleshooting

Error: "Cannot find module"

  • Cause: Using relative path or cwd in config
  • Fix: Use absolute path as shown above

Error: "No server info found"

  • Cause: Server failed to start
  • Fix: Check config with python test_cursor_config.py

Error: "Virtual environment not found"

  • Fix: Run uv venv && uv pip install modal mcp fastmcp

Live Test Results

Vector Addition (1M elements on B200)

Status: ✅ SUCCESS
GPU: B200 (sm_100)
Elements: 1,048,576
Time: 0.098 ms
Bandwidth: 128 GB/s
Verification: All results correct

Hello World Kernel

Status: ✅ SUCCESS
GPU: B200 (sm_100)
Threads: 8 (2 blocks × 4 threads)
Output: All threads printed correctly

Project Status

Component              Status    Notes
──────────────────────────────────────────────────
Python Environment     ✅        uv venv created
Dependencies          ✅        modal, mcp, fastmcp installed
Modal Deployment      ✅        cuda-kernel-runner deployed
MCP Server           ✅        stdio transport working
Node.js Wrapper      ✅        ESM, NPX compatible
CUDA Examples        ✅        hello, vector_add, matmul tested
B200 GPU Access      ✅        sm_100 compilation working
Build Caching        ✅        Content-based hashing active
Cursor Integration   ✅        Config provided
Tests                ✅        All passing

Status: ✅ PRODUCTION READY