# xCOMET MCP Server

> ⚠️ This is an unofficial community project, not affiliated with Unbabel.

Translation quality evaluation MCP server powered by xCOMET (eXplainable COMET).
## 🎯 Overview

xCOMET MCP Server gives AI agents the ability to evaluate machine translation quality. It integrates the xCOMET model from Unbabel to provide:

- **Quality Scoring**: Scores from 0 to 1 indicating translation quality
- **Error Detection**: Identifies error spans with severity levels (minor/major/critical)
- **Batch Processing**: Evaluates multiple translation pairs efficiently (optimized single model load)
- **GPU Support**: Optional GPU acceleration for faster inference
```mermaid
graph LR
    A[AI Agent] --> B[Node.js MCP Server]
    B --> C[Python FastAPI Server]
    C --> D[xCOMET Model<br/>Persistent in Memory]
    D --> C
    C --> B
    B --> A
    style D fill:#9f9
```

## 🔧 Prerequisites
### Python Environment

xCOMET requires Python with the following packages:

```bash
pip install "unbabel-comet>=2.2.0" fastapi uvicorn
```

### Model Download
The first run will download the xCOMET model (~14GB for XL, ~42GB for XXL):

```bash
# Test model availability
python -c "from comet import download_model; download_model('Unbabel/XCOMET-XL')"
```

### Node.js
- Node.js >= 18.0.0
- npm or yarn
## 📦 Installation

```bash
# Clone the repository
git clone https://github.com/shuji-bonji/xcomet-mcp-server.git
cd xcomet-mcp-server

# Install dependencies
npm install

# Build
npm run build
```

## 🚀 Usage
### With Claude Desktop (npx)

Add to your Claude Desktop configuration (`claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "xcomet": {
      "command": "npx",
      "args": ["-y", "xcomet-mcp-server"]
    }
  }
}
```

### With Claude Code
```bash
claude mcp add xcomet -- npx -y xcomet-mcp-server
```

### Local Installation

If you prefer a local installation:

```bash
npm install -g xcomet-mcp-server
```

Then configure:
```json
{
  "mcpServers": {
    "xcomet": {
      "command": "xcomet-mcp-server"
    }
  }
}
```

### HTTP Mode (Remote Access)

```bash
TRANSPORT=http PORT=3000 npm start
```

Then connect to http://localhost:3000/mcp.
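As a quick smoke test (assuming the server implements the MCP Streamable HTTP transport, i.e. JSON-RPC over POST; the payload below follows the MCP spec, not anything specific to this server), you can send an `initialize` request with curl:

```bash
# Hypothetical smoke test: send an MCP initialize request as JSON-RPC over HTTP
curl -X POST http://localhost:3000/mcp \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"curl-test","version":"0.0.0"}}}'
```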
## 🛠️ Available Tools

### xcomet_evaluate
Evaluate translation quality for a single source-translation pair.
Parameters:
| Name | Type | Required | Description |
|------|------|----------|-------------|
| source | string | ✅ | Original source text |
| translation | string | ✅ | Translated text to evaluate |
| reference | string | ❌ | Reference translation |
| source_lang | string | ❌ | Source language code (ISO 639-1) |
| target_lang | string | ❌ | Target language code (ISO 639-1) |
| response_format | "json" \| "markdown" | ❌ | Output format (default: "json") |
| use_gpu | boolean | ❌ | Use GPU for inference (default: false) |
Example:

```json
{
  "source": "The quick brown fox jumps over the lazy dog.",
  "translation": "素早い茶色のキツネが怠惰な犬を飛び越える。",
  "source_lang": "en",
  "target_lang": "ja",
  "use_gpu": true
}
```

Response:

```json
{
  "score": 0.847,
  "errors": [],
  "summary": "Good quality (score: 0.847) with 0 error(s) detected."
}
```

### xcomet_detect_errors
Focus on detecting and categorizing translation errors.
Parameters:
| Name | Type | Required | Description |
|------|------|----------|-------------|
| source | string | ✅ | Original source text |
| translation | string | ✅ | Translated text to analyze |
| reference | string | ❌ | Reference translation |
| min_severity | "minor" \| "major" \| "critical" | ❌ | Minimum severity to report (default: "minor") |
| response_format | "json" \| "markdown" | ❌ | Output format |
| use_gpu | boolean | ❌ | Use GPU for inference (default: false) |
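For illustration, a request might look like this, mirroring the xcomet_evaluate example above (the values are examples, not actual model output):

```json
{
  "source": "The meeting was postponed until Friday.",
  "translation": "会議は金曜日に開催されました。",
  "min_severity": "major",
  "response_format": "markdown"
}
```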
### xcomet_batch_evaluate

Evaluate multiple translation pairs in a single request.

**Performance Note**: With the persistent server architecture (v0.3.0+), the model stays loaded in memory, so batch evaluation processes all pairs efficiently without reloading the model.
Parameters:
| Name | Type | Required | Description |
|------|------|----------|-------------|
| pairs | array | ✅ | Array of {source, translation, reference?} (max 500) |
| source_lang | string | ❌ | Source language code |
| target_lang | string | ❌ | Target language code |
| response_format | "json" \| "markdown" | ❌ | Output format |
| use_gpu | boolean | ❌ | Use GPU for inference (default: false) |
| batch_size | number | ❌ | Batch size, 1-64 (default: 8); larger values are faster but use more memory |
Example:

```json
{
  "pairs": [
    {"source": "Hello", "translation": "こんにちは"},
    {"source": "Goodbye", "translation": "さようなら"}
  ],
  "use_gpu": true,
  "batch_size": 16
}
```

## 🔗 Integration with Other MCP Servers
xCOMET MCP Server is designed to work alongside other MCP servers for complete translation workflows:
```mermaid
sequenceDiagram
    participant Agent as AI Agent
    participant DeepL as DeepL MCP Server
    participant xCOMET as xCOMET MCP Server
    Agent->>DeepL: Translate text
    DeepL-->>Agent: Translation result
    Agent->>xCOMET: Evaluate quality
    xCOMET-->>Agent: Score + Errors
    Agent->>Agent: Decide: Accept or retry?
```

### Recommended Workflow
1. **Translate** using the DeepL MCP Server (official)
2. **Evaluate** using the xCOMET MCP Server
3. **Iterate** if quality is below the threshold (a sketch of this loop follows below)
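A rough sketch of that loop in Python; `translate_with_deepl` and `evaluate_with_xcomet` are hypothetical stand-ins for the corresponding MCP tool calls, and the 0.8 threshold matches the prompt example below:

```python
# Hypothetical stand-ins for the DeepL and xCOMET MCP tool calls.
def translate_and_verify(text: str, target_lang: str = "ja",
                         threshold: float = 0.8, max_retries: int = 2):
    translation, result = None, None
    for _ in range(max_retries + 1):
        translation = translate_with_deepl(text, target_lang)  # hypothetical
        result = evaluate_with_xcomet(text, translation)       # hypothetical
        if result["score"] >= threshold:
            break  # quality is acceptable, stop retrying
        # Otherwise retry; detected error spans could be fed back as hints
    return translation, result
```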
### Example: DeepL + xCOMET Integration
Configure both servers in Claude Desktop:
```json
{
  "mcpServers": {
    "deepl": {
      "command": "npx",
      "args": ["-y", "@anthropic/deepl-mcp-server"],
      "env": {
        "DEEPL_API_KEY": "your-api-key"
      }
    },
    "xcomet": {
      "command": "npx",
      "args": ["-y", "xcomet-mcp-server"]
    }
  }
}
```

Then ask Claude:

> "Translate this text to Japanese using DeepL, then evaluate the translation quality with xCOMET. If the score is below 0.8, suggest improvements."
## ⚙️ Configuration

### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| TRANSPORT | stdio | Transport mode: stdio or http |
| PORT | 3000 | HTTP server port (when TRANSPORT=http) |
| XCOMET_MODEL | Unbabel/XCOMET-XL | xCOMET model to use |
| XCOMET_PYTHON_PATH | (auto-detect) | Python executable path (see below) |
| XCOMET_PRELOAD | false | Pre-load model at startup (v0.3.1+) |
| XCOMET_DEBUG | false | Enable verbose debug logging (v0.3.1+) |
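These variables can be combined; for example, running HTTP mode with the model preloaded at startup (same pattern as the `npm start` example above):

```bash
TRANSPORT=http PORT=3000 XCOMET_PRELOAD=true npm start
```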
### Model Selection
Choose the model based on your quality/performance needs:
| Model | Parameters | Size | Memory | Reference | Quality | Use Case |
|-------|------------|------|--------|-----------|---------|----------|
| Unbabel/XCOMET-XL | 3.5B | ~14GB | ~8-10GB | Optional | ⭐⭐⭐⭐ | Recommended for most use cases |
| Unbabel/XCOMET-XXL | 10.7B | ~42GB | ~20GB | Optional | ⭐⭐⭐⭐⭐ | Highest quality, requires more resources |
| Unbabel/wmt22-comet-da | 580M | ~2GB | ~3GB | Required | ⭐⭐⭐ | Lightweight, faster loading |
> **Important**: wmt22-comet-da requires a reference translation for evaluation. XCOMET models support referenceless evaluation.

> **Tip**: If you experience memory issues or slow model loading, try Unbabel/wmt22-comet-da for faster performance with slightly lower accuracy (but remember to provide reference translations).
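For example, when running with wmt22-comet-da, an xcomet_evaluate request must include the `reference` field (the values below are illustrative):

```json
{
  "source": "The quick brown fox jumps over the lazy dog.",
  "translation": "素早い茶色のキツネが怠惰な犬を飛び越える。",
  "reference": "素早い茶色の狐がのろまな犬を飛び越える。",
  "source_lang": "en",
  "target_lang": "ja"
}
```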
To use a different model, set the `XCOMET_MODEL` environment variable:

```json
{
  "mcpServers": {
    "xcomet": {
      "command": "npx",
      "args": ["-y", "xcomet-mcp-server"],
      "env": {
        "XCOMET_MODEL": "Unbabel/XCOMET-XXL"
      }
    }
  }
}
```

### Python Path Auto-Detection
The server automatically detects a Python environment with unbabel-comet installed, in this order:

1. `XCOMET_PYTHON_PATH` environment variable (if set)
2. pyenv versions (`~/.pyenv/versions/*/bin/python3`), checking for the `comet` module
3. Homebrew Python (`/opt/homebrew/bin/python3`, `/usr/local/bin/python3`)
4. Fallback: the `python3` command
This ensures the server works correctly even when the MCP host (e.g., Claude Desktop) uses a different Python than your terminal.
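To confirm that a given interpreter is usable, check that it can import `comet` (the path below is just an example):

```bash
# Verify a candidate interpreter can import the comet module
~/.pyenv/versions/3.11.0/bin/python3 -c "import comet; print('ok')"
```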
Example: explicit Python path configuration:

```json
{
  "mcpServers": {
    "xcomet": {
      "command": "npx",
      "args": ["-y", "xcomet-mcp-server"],
      "env": {
        "XCOMET_PYTHON_PATH": "/Users/you/.pyenv/versions/3.11.0/bin/python3"
      }
    }
  }
}
```

## ⚡ Performance
### Persistent Server Architecture (v0.3.0+)
The server uses a persistent Python FastAPI server that keeps the xCOMET model loaded in memory:
| Request | Time | Notes |
|---------|------|-------|
| First request | ~25-90s | Model loading (varies by model size) |
| Subsequent requests | ~500ms | Model already loaded |
This provides a 177x speedup for consecutive evaluations compared to reloading the model each time.
### Eager Loading (v0.3.1+)
Enable `XCOMET_PRELOAD=true` to pre-load the model at server startup:

```json
{
  "mcpServers": {
    "xcomet": {
      "command": "npx",
      "args": ["-y", "xcomet-mcp-server"],
      "env": {
        "XCOMET_PRELOAD": "true"
      }
    }
  }
}
```

With preload enabled, all requests are fast (~500ms), including the first one.
```mermaid
graph LR
    A[MCP Request] --> B[Node.js Server]
    B --> C[Python FastAPI Server]
    C --> D[xCOMET Model<br/>in Memory]
    D --> C
    C --> B
    B --> A
    style D fill:#9f9
```

### Batch Processing Optimization
The xcomet_batch_evaluate tool processes all pairs with a single model load:
| Pairs | Estimated Time |
|-------|----------------|
| 10 | ~30-40 sec |
| 50 | ~1-1.5 min |
| 100 | ~2 min |
### GPU vs CPU Performance

| Mode | 100 Pairs (Estimated) |
|------|-----------------------|
| CPU (batch_size=8) | ~2 min |
| GPU (batch_size=16) | ~20-30 sec |
> **Note**: GPU requires CUDA-compatible hardware and PyTorch with CUDA support. If a GPU is not available, leave `use_gpu: false` (the default).
### Best Practices

**1. Let the persistent server do its job**

With v0.3.0+, the model stays in memory, so multiple xcomet_evaluate calls are efficient:

```python
# ✅ Fast: the first call loads the model, subsequent calls reuse it
xcomet_evaluate(pair1)  # ~90s (model loads)
xcomet_evaluate(pair2)  # ~500ms (model cached)
xcomet_evaluate(pair3)  # ~500ms (model cached)
```

**2. For many pairs, use batch evaluation**

```python
# ✅ Even faster: batch all pairs in one call
xcomet_batch_evaluate(allPairs)  # Optimal throughput
```

**3. Memory considerations**

- XCOMET-XL requires ~8-10GB RAM
- For large batches (up to 500 pairs), ensure sufficient memory
- If memory is limited, split into smaller chunks of 100-200 pairs (see the sketch below)
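A minimal sketch of that chunking (a hypothetical helper, not part of the server; each chunk would be passed to a separate xcomet_batch_evaluate call):

```python
def chunked(pairs: list, size: int = 200):
    """Yield successive chunks of at most `size` pairs."""
    for i in range(0, len(pairs), size):
        yield pairs[i:i + size]

# Usage sketch: evaluate each chunk separately to cap peak memory
# for chunk in chunked(all_pairs):
#     xcomet_batch_evaluate(chunk)
```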
### Auto-Restart (v0.3.1+)
The server automatically recovers from failures:
- Monitors health every 30 seconds
- Restarts after 3 consecutive health check failures
- Up to 3 restart attempts before giving up
## 📊 Quality Score Interpretation

| Score Range | Quality | Recommendation |
|-------------|---------|----------------|
| 0.9 - 1.0 | Excellent | Ready for use |
| 0.7 - 0.9 | Good | Minor review recommended |
| 0.5 - 0.7 | Fair | Post-editing needed |
| 0.0 - 0.5 | Poor | Re-translation recommended |
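If you want to apply these bands programmatically, a simple helper might look like this (thresholds taken from the table above; the function itself is just an illustration, not part of the server):

```python
def recommend(score: float) -> str:
    """Map an xCOMET score to the recommendation bands above."""
    if score >= 0.9:
        return "Excellent: ready for use"
    if score >= 0.7:
        return "Good: minor review recommended"
    if score >= 0.5:
        return "Fair: post-editing needed"
    return "Poor: re-translation recommended"
```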
## 🔍 Troubleshooting

### Common Issues
"No module named 'comet'"
Cause: Python environment without unbabel-comet installed.
Solution:
# Check which Python is being used
python3 -c "import sys; print(sys.executable)"
# Install all required packages
pip install "unbabel-comet>=2.2.0" fastapi uvicorn
# Or specify Python path explicitly
export XCOMET_PYTHON_PATH=/path/to/python3Model download fails or times out
**Cause**: Large model files (~14GB for XL) require a stable internet connection.

**Solution**:

```bash
# Pre-download the model manually
python -c "from comet import download_model; download_model('Unbabel/XCOMET-XL')"
```

#### GPU not detected
**Cause**: PyTorch not installed with CUDA support.

**Solution**:

```bash
# Check CUDA availability
python -c "import torch; print(torch.cuda.is_available())"

# If False, reinstall PyTorch with CUDA
pip install torch --index-url https://download.pytorch.org/whl/cu118
```

#### Slow performance on Mac (MPS)
**Cause**: Mac MPS (Metal Performance Shaders) has compatibility issues with some operations.

**Solution**: The server automatically uses `num_workers=1` for Mac MPS compatibility. For best performance on Mac, use CPU mode (`use_gpu: false`).
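To check whether PyTorch can see the MPS backend at all (available in PyTorch 1.12+; this check is independent of this server):

```python
import torch

# True if the Metal (MPS) backend is built and the device is available
print(torch.backends.mps.is_available())
```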
#### High memory usage or crashes

**Cause**: XCOMET-XL requires ~8-10GB RAM.

**Solutions**:

- **Use the persistent server (v0.3.0+)**: The model loads once and stays in memory, avoiding repeated memory spikes
- **Use a lighter model**: Set `XCOMET_MODEL=Unbabel/wmt22-comet-da` for lower memory usage (~3GB)
- **Reduce batch size**: For large batches, process in smaller chunks (100-200 pairs)
- **Close other applications**: Free up RAM before running large evaluations

```bash
# Check available memory
free -h             # Linux
vm_stat | head -5   # macOS
```

#### VS Code or IDE crashes during evaluation
**Cause**: High memory usage from the xCOMET model (~8-10GB for XL).

**Solution**:

- With v0.3.0+, the model loads once and stays in memory (no repeated loading)
- If memory is still an issue, use a lighter model: `XCOMET_MODEL=Unbabel/wmt22-comet-da`
- Close other memory-intensive applications before evaluation
### Getting Help

If you encounter issues:

- Check the GitHub Issues
- Enable debug logging by checking Claude Desktop's Developer Mode logs
- Open a new issue with:
  - Your OS and Python version
  - The error message
  - Your configuration (without sensitive data)
## 🧪 Development

```bash
# Install dependencies
npm install

# Build TypeScript
npm run build

# Watch mode
npm run dev

# Test with MCP Inspector
npm run inspect
```

## 📋 Changelog
See CHANGELOG.md for version history and updates.
## 📝 License

MIT License - see LICENSE for details.

## 🙏 Acknowledgments
- Unbabel for the xCOMET model
- Anthropic for the MCP protocol
- Model Context Protocol community
