# llama-mcp-server
MCP server bridging Claude Code to local llama.cpp. Run local LLMs alongside Claude for experimentation, testing, and cost-effective inference.
## Requirements

- Node.js 18+
- llama.cpp with `llama-server` built
- A GGUF model file
## Installation

```bash
npm install llama-mcp-server
```

Or clone and build from source:

```bash
git clone https://github.com/ahays248/llama-mcp-server
cd llama-mcp-server
npm install
npm run build
```

## Configuration
Configure via environment variables:
| Variable | Description | Default |
|----------|-------------|---------|
| `LLAMA_SERVER_URL` | URL of llama-server | `http://localhost:8080` |
| `LLAMA_SERVER_TIMEOUT` | Request timeout in ms | `30000` |
| `LLAMA_MODEL_PATH` | Path to GGUF model file | (none) |
| `LLAMA_SERVER_PATH` | Path to llama-server binary | `llama-server` |
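For example, to point the MCP server at a llama-server instance on a non-default port with a longer timeout (the port, timeout, and paths below are placeholders):

```bash
LLAMA_SERVER_URL=http://localhost:8081 \
LLAMA_SERVER_TIMEOUT=60000 \
LLAMA_MODEL_PATH=/path/to/model.gguf \
npx -y llama-mcp-server
```

When used with Claude Code, these variables instead go in the `env` block of your MCP configuration, as shown below.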
## Usage with Claude Code

### Option 1: Plugin Installation (Recommended)
Due to a known bug in Claude Code, non-plugin MCP servers may connect but not expose their tools. The workaround is to install llama-mcp-server as a plugin via a local marketplace.
**Step 1: Create the marketplace structure**

```
llama-marketplace/
├── .claude-plugin/
│   └── marketplace.json
└── plugins/
    └── llama/
        ├── .claude-plugin/
        │   └── plugin.json
        └── .mcp.json
```

**Step 2: Create `marketplace.json`**
```jsonc
// llama-marketplace/.claude-plugin/marketplace.json
{
  "name": "llama-marketplace",
  "description": "Local marketplace for llama.cpp MCP plugin",
  "owner": {
    "name": "Your Name"
  },
  "plugins": [
    {
      "name": "llama",
      "description": "llama.cpp MCP server for local LLM inference",
      "source": "./plugins/llama"
    }
  ]
}
```

**Step 3: Create `plugin.json`**
```jsonc
// llama-marketplace/plugins/llama/.claude-plugin/plugin.json
{
  "name": "llama",
  "version": "0.1.0",
  "description": "llama.cpp MCP server for local LLM inference"
}
```

**Step 4: Create `.mcp.json`**
```jsonc
// llama-marketplace/plugins/llama/.mcp.json
{
  "mcpServers": {
    "llama": {
      "command": "npx",
      "args": ["-y", "llama-mcp-server"],
      "env": {
        "LLAMA_SERVER_URL": "http://localhost:8080",
        "LLAMA_MODEL_PATH": "/path/to/your/model.gguf",
        "LLAMA_SERVER_PATH": "/path/to/llama-server"
      }
    }
  }
}
```

**Step 5: Install the plugin**
```bash
# Add the local marketplace
claude plugin marketplace add /path/to/llama-marketplace

# Install the plugin
claude plugin install llama@llama-marketplace

# Restart Claude Code
```

After restarting, the tools will appear as `mcp__plugin_llama_llama__*`.
### Option 2: Direct MCP Configuration
Note: This method may not work due to the bug mentioned above. If tools don't appear after adding the server, use Option 1.
Add to your Claude Code MCP configuration:
```bash
claude mcp add llama \
  -e LLAMA_SERVER_URL=http://localhost:8080 \
  -e LLAMA_MODEL_PATH=/path/to/model.gguf \
  -e LLAMA_SERVER_PATH=/path/to/llama-server \
  -- npx -y llama-mcp-server
```

Or add manually to `~/.claude.json`:
```json
{
  "mcpServers": {
    "llama": {
      "command": "npx",
      "args": ["-y", "llama-mcp-server"],
      "env": {
        "LLAMA_SERVER_URL": "http://localhost:8080",
        "LLAMA_MODEL_PATH": "/path/to/your/model.gguf",
        "LLAMA_SERVER_PATH": "/path/to/llama-server"
      }
    }
  }
}
```

## Tools
### Server Tools

| Tool | Description |
|------|-------------|
| `llama_health` | Check if llama-server is running and get status |
| `llama_props` | Get or set server properties |
| `llama_models` | List available/loaded models |
| `llama_slots` | View current slot processing state |
| `llama_metrics` | Get Prometheus-compatible metrics |
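These tools wrap llama-server's HTTP status endpoints, so you can sanity-check them against the server directly. A sketch, assuming the standard llama.cpp server API:

```bash
# Roughly what llama_health and llama_props report
curl http://localhost:8080/health   # → {"status":"ok"} once a model is loaded
curl http://localhost:8080/props    # server properties and default generation settings
```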
### Token Tools

| Tool | Description |
|------|-------------|
| `llama_tokenize` | Convert text to token IDs |
| `llama_detokenize` | Convert token IDs back to text |
| `llama_apply_template` | Format chat messages using the model's template |
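For example, `llama_tokenize` and `llama_detokenize` round-trip text through llama-server's tokenizer. The equivalent raw request looks roughly like this (assuming the standard llama.cpp `/tokenize` endpoint):

```bash
curl http://localhost:8080/tokenize \
  -H "Content-Type: application/json" \
  -d '{"content": "Hello, world!"}'
# → {"tokens": [...]}; pass the IDs to /detokenize to recover the text
```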
### Inference Tools

| Tool | Description |
|------|-------------|
| `llama_complete` | Generate text completion from a prompt |
| `llama_chat` | Chat completion (OpenAI-compatible) |
| `llama_embed` | Generate embeddings for text |
| `llama_infill` | Code completion with prefix and suffix context |
| `llama_rerank` | Rerank documents by relevance to a query |
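Because `llama_chat` is OpenAI-compatible, you can also exercise the underlying endpoint with plain curl or any OpenAI-style client (a sketch, assuming llama-server's standard `/v1/chat/completions` route):

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Write a haiku about coding"}],
    "max_tokens": 64
  }'
```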
### Model Management Tools

| Tool | Description |
|------|-------------|
| `llama_load_model` | Load a model (router mode only) |
| `llama_unload_model` | Unload the current model (router mode only) |
### LoRA Tools

| Tool | Description |
|------|-------------|
| `llama_lora_list` | List loaded LoRA adapters |
| `llama_lora_set` | Set LoRA adapter scales |
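These map to llama-server's LoRA adapter endpoints. A sketch, assuming the standard llama.cpp API and that an adapter was loaded at startup with `--lora`:

```bash
curl http://localhost:8080/lora-adapters    # list adapters and their current scales
curl http://localhost:8080/lora-adapters \
  -H "Content-Type: application/json" \
  -d '[{"id": 0, "scale": 0.5}]'            # apply adapter 0 at half strength
```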
### Process Control Tools

| Tool | Description |
|------|-------------|
| `llama_start` | Start llama-server as a child process |
| `llama_stop` | Stop the llama-server process |
## Example: Starting llama-server and Running Inference

```
User: Start llama-server with my local model

Claude: I'll start llama-server for you.
[Uses llama_start tool with model path]

User: Generate a haiku about coding

Claude: Let me use the local model for that.
[Uses llama_complete tool]

Result:
Lines of code cascade
Through the silent morning hours
Bugs flee from the light
```

## Development
```bash
# Run tests
npm test

# Type check
npm run typecheck

# Build
npm run build

# Watch mode for development
npm run dev
```

## Troubleshooting
### Tools don't appear in Claude Code

Symptom: Server shows "Connected" in `claude mcp list` but no `llama_*` tools are available.

Cause: Known bug in Claude Code where non-plugin MCP servers don't expose tools (#12164).

Solution: Use the plugin installation method (Option 1 above).
### HTTP 501 errors for certain tools

Some tools require specific server configurations:

| Tool | Requirement |
|------|-------------|
| `llama_metrics` | Start llama-server with the `--metrics` flag |
| `llama_embed` | Start llama-server with the `--embedding` flag or use an embedding model |
| `llama_infill` | Use a model with fill-in-the-middle support (e.g., CodeLlama, DeepSeek Coder) |
| `llama_rerank` | Use a reranker model |
| `llama_load_model` / `llama_unload_model` | llama-server must be running in router mode |
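For example, to enable the Prometheus endpoint behind `llama_metrics` (the flag is from the table above; the model path is a placeholder):

```bash
llama-server -m /path/to/model.gguf --metrics
```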
### Connection refused errors

Symptom: Cannot connect to llama-server at `http://localhost:8080`.

Solutions:

- Use `llama_start` to start the server, or
- Start llama-server manually: `llama-server -m /path/to/model.gguf`
- Check that `LLAMA_SERVER_URL` matches where llama-server is running (a quick check is shown below)
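A quick reachability check, assuming the standard llama.cpp `/health` endpoint:

```bash
curl http://localhost:8080/health
# "connection refused" here means llama-server is not listening on that port
```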
### WSL/Windows path issues

When running in WSL, ensure paths use Linux format:

- ✓ `/home/user/models/model.gguf`
- ✗ `C:\Users\user\models\model.gguf`
## License
MIT
