# llama-mcp-server
MCP server bridging Claude Code to local llama.cpp. Run local LLMs alongside Claude for experimentation, testing, and cost-effective inference.
## Requirements

- Node.js 18+
- llama.cpp with `llama-server` built
- A GGUF model file
## Installation

```bash
npm install llama-mcp-server
```

Or clone and build from source:

```bash
git clone https://github.com/ahays248/llama-mcp-server
cd llama-mcp-server
npm install
npm run build
```

## Configuration
Configure via environment variables:
| Variable | Description | Default |
|----------|-------------|---------|
| `LLAMA_SERVER_URL` | URL of llama-server | `http://localhost:8080` |
| `LLAMA_SERVER_TIMEOUT` | Request timeout in ms | `30000` |
| `LLAMA_MODEL_PATH` | Path to GGUF model file | (none) |
| `LLAMA_SERVER_PATH` | Path to llama-server binary | `llama-server` |
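For example, to point the MCP server at a llama-server instance on a non-default port with a longer timeout (the port, timeout, and paths below are placeholders):

```bash
LLAMA_SERVER_URL=http://localhost:8081 \
LLAMA_SERVER_TIMEOUT=60000 \
LLAMA_MODEL_PATH=/path/to/model.gguf \
npx -y llama-mcp-server
```

When used with Claude Code, these variables instead go in the `env` block of your MCP configuration, as shown below.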
## Usage with Claude Code

### Option 1: Plugin Installation (Recommended)
Due to a known bug in Claude Code, non-plugin MCP servers may connect but not expose their tools. The workaround is to install llama-mcp-server as a plugin via a local marketplace.
**Step 1: Create the marketplace structure**

```
llama-marketplace/
├── .claude-plugin/
│   └── marketplace.json
└── plugins/
    └── llama/
        ├── .claude-plugin/
        │   └── plugin.json
        └── .mcp.json
```

**Step 2: Create `marketplace.json`**
```jsonc
// llama-marketplace/.claude-plugin/marketplace.json
{
  "name": "llama-marketplace",
  "description": "Local marketplace for llama.cpp MCP plugin",
  "owner": {
    "name": "Your Name"
  },
  "plugins": [
    {
      "name": "llama",
      "description": "llama.cpp MCP server for local LLM inference",
      "source": "./plugins/llama"
    }
  ]
}
```

**Step 3: Create `plugin.json`**
```jsonc
// llama-marketplace/plugins/llama/.claude-plugin/plugin.json
{
  "name": "llama",
  "version": "0.1.0",
  "description": "llama.cpp MCP server for local LLM inference"
}
```

**Step 4: Create `.mcp.json`**
```jsonc
// llama-marketplace/plugins/llama/.mcp.json
{
  "mcpServers": {
    "llama": {
      "command": "npx",
      "args": ["-y", "llama-mcp-server"],
      "env": {
        "LLAMA_SERVER_URL": "http://localhost:8080",
        "LLAMA_MODEL_PATH": "/path/to/your/model.gguf",
        "LLAMA_SERVER_PATH": "/path/to/llama-server"
      }
    }
  }
}
```

**Step 5: Install the plugin**
```bash
# Add the local marketplace
claude plugin marketplace add /path/to/llama-marketplace

# Install the plugin
claude plugin install llama@llama-marketplace

# Restart Claude Code
```

After restarting, the tools will appear as `mcp__plugin_llama_llama__*`.
### Option 2: Direct MCP Configuration
Note: This method may not work due to the bug mentioned above. If tools don't appear after adding the server, use Option 1.
Add to your Claude Code MCP configuration:
```bash
claude mcp add llama \
  -e LLAMA_SERVER_URL=http://localhost:8080 \
  -e LLAMA_MODEL_PATH=/path/to/model.gguf \
  -e LLAMA_SERVER_PATH=/path/to/llama-server \
  -- npx -y llama-mcp-server
```

Or add manually to `~/.claude.json`:
```json
{
  "mcpServers": {
    "llama": {
      "command": "npx",
      "args": ["-y", "llama-mcp-server"],
      "env": {
        "LLAMA_SERVER_URL": "http://localhost:8080",
        "LLAMA_MODEL_PATH": "/path/to/your/model.gguf",
        "LLAMA_SERVER_PATH": "/path/to/llama-server"
      }
    }
  }
}
```

## Tools
### Server Tools

| Tool | Description |
|------|-------------|
| `llama_health` | Check if llama-server is running and get status |
| `llama_props` | Get or set server properties |
| `llama_models` | List available/loaded models |
| `llama_slots` | View current slot processing state |
| `llama_metrics` | Get Prometheus-compatible metrics |
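These tools wrap llama-server's HTTP status endpoints, so you can sanity-check them against the server directly. A sketch, assuming the standard llama.cpp server API:

```bash
# Roughly what llama_health and llama_props report
curl http://localhost:8080/health   # → {"status":"ok"} once a model is loaded
curl http://localhost:8080/props    # server properties and default generation settings
```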
### Token Tools

| Tool | Description |
|------|-------------|
| `llama_tokenize` | Convert text to token IDs |
| `llama_detokenize` | Convert token IDs back to text |
| `llama_apply_template` | Format chat messages using the model's template |
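For example, `llama_tokenize` and `llama_detokenize` round-trip text through llama-server's tokenizer. The equivalent raw request looks roughly like this (assuming the standard llama.cpp `/tokenize` endpoint):

```bash
curl http://localhost:8080/tokenize \
  -H "Content-Type: application/json" \
  -d '{"content": "Hello, world!"}'
# → {"tokens": [...]}; pass the IDs to /detokenize to recover the text
```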
### Inference Tools

| Tool | Description |
|------|-------------|
| `llama_complete` | Generate text completion from a prompt |
| `llama_chat` | Chat completion (OpenAI-compatible) |
| `llama_embed` | Generate embeddings for text |
| `llama_infill` | Code completion with prefix and suffix context |
| `llama_rerank` | Rerank documents by relevance to a query |
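Because `llama_chat` is OpenAI-compatible, you can also exercise the underlying endpoint with plain curl or any OpenAI-style client (a sketch, assuming llama-server's standard `/v1/chat/completions` route):

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Write a haiku about coding"}],
    "max_tokens": 64
  }'
```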
### Model Management Tools

| Tool | Description |
|------|-------------|
| `llama_load_model` | Load a model (router mode only) |
| `llama_unload_model` | Unload the current model (router mode only) |
### LoRA Tools

| Tool | Description |
|------|-------------|
| `llama_lora_list` | List loaded LoRA adapters |
| `llama_lora_set` | Set LoRA adapter scales |
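These map to llama-server's LoRA adapter endpoints. A sketch, assuming the standard llama.cpp API and that an adapter was loaded at startup with `--lora`:

```bash
curl http://localhost:8080/lora-adapters    # list adapters and their current scales
curl http://localhost:8080/lora-adapters \
  -H "Content-Type: application/json" \
  -d '[{"id": 0, "scale": 0.5}]'            # apply adapter 0 at half strength
```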
### Process Control Tools

| Tool | Description |
|------|-------------|
| `llama_start` | Start llama-server as a child process |
| `llama_stop` | Stop the llama-server process |
## Example: Starting llama-server and Running Inference

```
User: Start llama-server with my local model

Claude: I'll start llama-server for you.
[Uses llama_start tool with model path]

User: Generate a haiku about coding

Claude: Let me use the local model for that.
[Uses llama_complete tool]

Result:
Lines of code cascade
Through the silent morning hours
Bugs flee from the light
```

## Development
```bash
# Run tests
npm test

# Type check
npm run typecheck

# Build
npm run build

# Watch mode for development
npm run dev
```

## Troubleshooting
### Tools don't appear in Claude Code

Symptom: Server shows "Connected" in `claude mcp list` but no `llama_*` tools are available.

Cause: Known bug in Claude Code where non-plugin MCP servers don't expose tools (#12164).

Solution: Use the plugin installation method (Option 1 above).
### HTTP 501 errors for certain tools

Some tools require specific server configurations:

| Tool | Requirement |
|------|-------------|
| `llama_metrics` | Start llama-server with the `--metrics` flag |
| `llama_embed` | Start llama-server with the `--embedding` flag or use an embedding model |
| `llama_infill` | Use a model with fill-in-the-middle support (e.g., CodeLlama, DeepSeek Coder) |
| `llama_rerank` | Use a reranker model |
| `llama_load_model` / `llama_unload_model` | llama-server must be running in router mode |
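For example, to enable the Prometheus endpoint behind `llama_metrics` (the flag is from the table above; the model path is a placeholder):

```bash
llama-server -m /path/to/model.gguf --metrics
```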
### Connection refused errors

Symptom: Cannot connect to llama-server at `http://localhost:8080`.

Solutions:

- Use `llama_start` to start the server, or
- Start llama-server manually: `llama-server -m /path/to/model.gguf`
- Check that `LLAMA_SERVER_URL` matches where llama-server is running (a quick check is shown below)
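A quick reachability check, assuming the standard llama.cpp `/health` endpoint:

```bash
curl http://localhost:8080/health
# "connection refused" here means llama-server is not listening on that port
```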
### WSL/Windows path issues

When running in WSL, ensure paths use Linux format:

- ✓ `/home/user/models/model.gguf`
- ✗ `C:\Users\user\models\model.gguf`
## License
MIT
