@vespermcp/mcp-server
v1.2.4
Vesper MCP Server 🚀
AI-powered dataset discovery, quality analysis, and preparation with multimodal support (text, image, audio, video).
Vesper is a Model Context Protocol (MCP) server that helps you find, analyze, and prepare high-quality datasets for machine learning projects. It works with MCP-compatible AI assistants such as Claude, Cursor, and GitHub Copilot Chat, letting them run dataset workflows (search, quality checks, preparation) autonomously.
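For readers new to MCP: a client such as Claude calls a server tool by sending a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool `name` and its `arguments`. The sketch below only builds that message for the `search_datasets` tool. `build_tool_call` is a hypothetical helper; in practice your assistant constructs and sends these requests for you.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message.

    Hypothetical helper for illustration; MCP clients build these themselves.
    """
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    # Equivalent of search_datasets(query="sentiment analysis", limit=5)
    print(build_tool_call(1, "search_datasets", {"query": "sentiment analysis", "limit": 5}))
```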
✨ Features
🔍 Dataset Discovery
- Search across HuggingFace, Kaggle, UCI ML Repository, and more
- Intelligent ranking based on quality, safety, and relevance
- Automatic metadata extraction and enrichment
📊 Quality Analysis
- Text: Missing data, duplicates, column profiling
- Images: Resolution, corruption, blur detection
- Audio: Sample rate, duration, silence detection
- Video: FPS, frame validation, corruption risk
- Unified Reports: Consolidated quality scores (0-100) with recommendations
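To make the text checks above concrete, here is a minimal sketch of missing-value and duplicate-row profiling over records (one dict per row). It is not Vesper's implementation: the function name `profile_rows` is hypothetical, and it assumes scalar (hashable) cell values and treats `None` and `""` as missing.

```python
from __future__ import annotations

from collections import Counter


def profile_rows(rows: list[dict]) -> dict:
    """Count missing values per column and exact duplicate rows.

    Hypothetical sketch: None and "" count as missing, and so does a key
    that is absent from a row.
    """
    columns = sorted({key for row in rows for key in row})
    missing = {
        col: sum(1 for row in rows if row.get(col) in (None, ""))
        for col in columns
    }
    # Sorted (key, value) tuples make each row hashable so duplicates can be counted.
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    duplicates = sum(n - 1 for n in counts.values() if n > 1)
    return {"rows": len(rows), "missing": missing, "duplicate_rows": duplicates}
```

For example, `profile_rows([{"text": "good", "label": 1}, {"text": "", "label": 0}, {"text": "good", "label": 1}])` reports one missing `text` value and one duplicate row.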
🛠️ Data Preparation
- Automated cleaning pipelines
- Format conversion (CSV, JSON, Parquet)
- Train/test/validation splitting
- Automatic installation to project directories
🎯 Multimodal Support
- Analyze mixed datasets (text + images + audio)
- Media-specific quality metrics
- Intelligent modality detection
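One simple way to approach modality detection is to tally files by extension. The sketch below assumes a hypothetical extension table (`MODALITY_BY_EXTENSION`) and helper (`detect_modalities`); Vesper's real detection may differ, for example by also inspecting file headers.

```python
from __future__ import annotations

from collections import Counter
from pathlib import PurePath

# Hypothetical extension table for illustration only.
MODALITY_BY_EXTENSION = {
    ".csv": "text", ".json": "text", ".jsonl": "text", ".parquet": "text", ".txt": "text",
    ".jpg": "image", ".jpeg": "image", ".png": "image", ".bmp": "image", ".webp": "image",
    ".wav": "audio", ".mp3": "audio", ".flac": "audio", ".ogg": "audio",
    ".mp4": "video", ".avi": "video", ".mov": "video", ".mkv": "video",
}


def detect_modalities(paths: list[str]) -> dict[str, int]:
    """Count files per modality; unknown or missing extensions count as 'other'."""
    counts = Counter(
        MODALITY_BY_EXTENSION.get(PurePath(p).suffix.lower(), "other") for p in paths
    )
    return dict(counts)
```

A dataset whose counts span more than one modality (say `text` and `image`) is a mixed dataset in the sense of the first bullet above.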
📦 Installation
🚀 Quick Start (VS Code + Copilot)
The fastest way to install Vesper and configure it for GitHub Copilot Chat or Cursor is to run the automated setup:
```bash
npx -y @vespermcp/mcp-server@latest --setup
```
- Select Visual Studio Code (settings.json) from the list.
- Restart VS Code.
- Open Copilot Chat and look for the MCP Servers section.
🛠️ Configuration
Vesper supports:
- GitHub Copilot Chat: automated setup via `settings.json`.
- Cursor: automated setup via `mcp.json`.
- Claude Desktop: automated setup via `claude_desktop_config.json`.
Manual Python Setup (if needed)
```bash
pip install opencv-python pillow numpy librosa soundfile
```
⚙️ MCP Configuration
For Cursor
- Go to Settings > Features > MCP.
- Click Add New MCP Server.
- Enter:
  - Name: `vesper`
  - Type: `command`
  - Command: `vesper`
For Claude Desktop
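Vesper attempts to configure Claude Desktop automatically. Restart Claude and check that the `vesper` server is listed. If it is not, add this entry to `claude_desktop_config.json`: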
```json
{
  "mcpServers": {
    "vesper": {
      "command": "vesper",
      "args": [],
      "env": {
        "HF_TOKEN": "your-huggingface-token"
      }
    }
  }
}
```
Note: If the `vesper` command isn't found, set `"command"` to the absolute path of the `vesper` executable instead.
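If you prefer to script the manual step, the sketch below merges a `vesper` entry into an existing Claude Desktop config while keeping other servers and keys intact. `add_vesper_server` is a hypothetical helper, not part of Vesper; the config file's location varies by OS, so the path is passed in explicitly.

```python
from __future__ import annotations

import json
from pathlib import Path


def add_vesper_server(config_path: Path, hf_token: str | None = None) -> dict:
    """Add (or overwrite) the "vesper" entry under "mcpServers" and save the file.

    Hypothetical helper: other servers and top-level keys are preserved.
    """
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    entry: dict = {"command": "vesper", "args": []}
    if hf_token:
        # Only needed for private Hugging Face datasets.
        entry["env"] = {"HF_TOKEN": hf_token}
    config.setdefault("mcpServers", {})["vesper"] = entry
    config_path.write_text(json.dumps(config, indent=2))
    return config
```

Back up the file before running anything like this, and restart Claude afterwards so it picks up the change.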
Environment Variables (Optional)
- `KAGGLE_USERNAME` and `KAGGLE_KEY`: for Kaggle dataset access
- `HF_TOKEN`: for private HuggingFace datasets
Optional Kaggle Setup (Not Required)
Core Vesper works without any API keys. Keys are only needed when you explicitly use Kaggle or gated Hugging Face datasets.
Install the optional Kaggle client only if you need Kaggle as a source, then run the key setup wizard:
```bash
pip install kaggle
vespermcp config keys
```
The wizard lets you skip any key. It stores keys securely in the OS keyring when one is available and falls back to `~/.vesper/config.toml` otherwise.
Alternatively, use Kaggle's native credentials file: `~/.kaggle/kaggle.json`
If credentials are missing when you run a Kaggle command, Vesper shows:
```text
Kaggle support requires API key. Run 'vespermcp config keys' (30 seconds).
```
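The sketch below illustrates a credential check of this kind: it reads `KAGGLE_USERNAME`/`KAGGLE_KEY` from the environment, falls back to `~/.kaggle/kaggle.json` (which stores `username` and `key` fields), and raises the message above if neither is present. The lookup order and the name `resolve_kaggle_credentials` are assumptions, and Vesper's OS-keyring and `~/.vesper/config.toml` storage are not modeled here.

```python
from __future__ import annotations

import json
import os
from collections.abc import Mapping
from pathlib import Path

MISSING_KEY_MESSAGE = "Kaggle support requires API key. Run 'vespermcp config keys' (30 seconds)."


def resolve_kaggle_credentials(env: Mapping[str, str] | None = None,
                               kaggle_json: Path | None = None) -> tuple[str, str]:
    """Return (username, key) from environment variables, else from kaggle.json.

    Hypothetical sketch; the lookup order is an assumption.
    """
    env = os.environ if env is None else env
    username, key = env.get("KAGGLE_USERNAME"), env.get("KAGGLE_KEY")
    if username and key:
        return username, key
    path = kaggle_json or Path.home() / ".kaggle" / "kaggle.json"
    if path.is_file():
        data = json.loads(path.read_text())
        if data.get("username") and data.get("key"):
            return data["username"], data["key"]
    raise RuntimeError(MISSING_KEY_MESSAGE)
```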
CLI Examples
```bash
vespermcp discover --source kaggle "credit risk" --limit 10
vespermcp discover --source huggingface "credit risk" --limit 10
vespermcp download kaggle username/dataset-name
vespermcp download kaggle https://www.kaggle.com/datasets/username/dataset-name --target-dir ./data
```
🚀 Quick Start
After installation and configuration, restart your AI assistant and try:
```
search_datasets(query="sentiment analysis", limit=5)
prepare_dataset(query="image classification cats vs dogs")
generate_quality_report(
    dataset_id="huggingface:imdb",
    dataset_path="/path/to/data"
)
```
📚 Available Tools
Dataset Discovery
search_datasets
Search for datasets across multiple sources.
Parameters:
- `query` (string): Search query
- `limit` (number, optional): Max results (default: 10)
- `min_quality_score` (number, optional): Minimum quality threshold
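Conceptually, `limit` and `min_quality_score` act on a ranked result list. The sketch below shows one plausible reading: drop hits below the threshold, sort, and truncate. The `DatasetHit` shape and the "relevance first, then quality" sort key are assumptions; the README only says ranking considers quality, safety, and relevance.

```python
from dataclasses import dataclass


@dataclass
class DatasetHit:
    # Hypothetical result shape for illustration only.
    dataset_id: str
    relevance: float      # e.g. semantic similarity to the query, 0-1
    quality_score: float  # unified quality score, 0-100


def select_results(hits, limit=10, min_quality_score=None):
    """Filter by the quality threshold, rank by (relevance, quality), keep the top `limit`."""
    if min_quality_score is not None:
        hits = [h for h in hits if h.quality_score >= min_quality_score]
    ranked = sorted(hits, key=lambda h: (h.relevance, h.quality_score), reverse=True)
    return ranked[:limit]
```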
Example:
```
search_datasets(query="medical imaging", limit=5, min_quality_score=70)
```
Data Preparation
prepare_dataset
Download, analyze, and prepare a dataset for use.
Parameters:
- `query` (string): Dataset search query or ID
Example:
```
prepare_dataset(query="squad")
```
export_dataset
Export a prepared dataset to a custom directory with format conversion.
Parameters:
- `dataset_id` (string): Dataset identifier
- `target_dir` (string): Export directory
- `format` (string, optional): Output format (csv, json, parquet)
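As an illustration of the `format` option, the sketch below serializes records to CSV or JSON with the standard library. `export_records` is a hypothetical helper, not Vesper's exporter; Parquet is left out because writing it needs a third-party library such as pyarrow.

```python
from __future__ import annotations

import csv
import io
import json


def export_records(records: list[dict], fmt: str) -> str:
    """Serialize records as CSV or JSON text (hypothetical sketch)."""
    if fmt == "json":
        return json.dumps(records, indent=2)
    if fmt == "csv":
        buffer = io.StringIO()
        # Union of keys in first-seen order; missing cells are written as "".
        fieldnames = list(dict.fromkeys(k for r in records for k in r))
        writer = csv.DictWriter(buffer, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt!r}")
```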
Example:
```
export_dataset(
    dataset_id="huggingface:imdb",
    target_dir="./my-data",
    format="csv"
)
```
Quality Analysis
analyze_image_quality
Analyze image datasets for resolution, corruption, and blur.
Parameters:
- `path` (string): Path to image file or folder
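A common blur heuristic is the variance of the image's Laplacian: sharp edges produce large second-derivative responses, while blurry images produce small, uniform ones. The pure-Python sketch below works on a small grayscale grid (at least 3×3) to show the idea; whether Vesper uses this exact measure is an assumption, and the threshold of 100 is only illustrative.

```python
from __future__ import annotations


def laplacian_variance(gray: list[list[float]]) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels (borders skipped)."""
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1] - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def is_blurry(gray: list[list[float]], threshold: float = 100.0) -> bool:
    # Illustrative threshold; suitable values depend on image size and content.
    return laplacian_variance(gray) < threshold
```

On real images you would typically compute the same quantity with an optimized library rather than Python loops.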
Example:
```
analyze_image_quality(path="/path/to/images")
```
analyze_media_quality
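Analyze audio/video files for quality metrics: sample rate, duration, and silence for audio; FPS, frame validation, and corruption risk for video.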
Parameters:
- `path` (string): Path to media file or folder
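Silence detection is often done by splitting audio into short frames and flagging frames whose RMS amplitude falls below a threshold. The sketch below assumes float samples in [-1, 1]; the 20 ms frame length and 0.01 threshold are illustrative, and `silent_fraction` is a hypothetical name rather than part of Vesper.

```python
from __future__ import annotations

import math


def silent_fraction(samples: list[float], sample_rate: int,
                    frame_ms: int = 20, threshold: float = 0.01) -> float:
    """Fraction of fixed-length frames whose RMS amplitude is below `threshold`."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    if not frames:
        return 0.0
    quiet = sum(
        1 for f in frames if math.sqrt(sum(s * s for s in f) / len(f)) < threshold
    )
    return quiet / len(frames)
```

Duration, the other audio metric listed above, follows directly as `len(samples) / sample_rate` seconds.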
Example:
```
analyze_media_quality(path="/path/to/audio")
```
generate_quality_report
Generate a unified quality report for a multimodal dataset, with a consolidated quality score (0-100) and recommendations.
Parameters:
- `dataset_id` (string): Dataset identifier
- `dataset_path` (string): Path to dataset directory
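How per-modality results roll up into a single 0-100 score is not described in this README. The sketch below shows one hypothetical approach, a file-count-weighted average of per-modality scores, purely to make the idea of a consolidated score concrete.

```python
from __future__ import annotations


def unified_score(per_modality: dict[str, tuple[float, int]]) -> float:
    """Combine per-modality (score 0-100, file count) pairs into one weighted score.

    Hypothetical weighting for illustration only.
    """
    total_files = sum(count for _, count in per_modality.values())
    if total_files == 0:
        return 0.0
    weighted = sum(score * count for score, count in per_modality.values())
    # A weighted average of 0-100 scores stays within 0-100.
    return round(weighted / total_files, 1)
```

For example, one text file scoring 90 and three images scoring 70 give `(90 + 3 * 70) / 4 = 75.0`.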
Example:
```
generate_quality_report(
    dataset_id="my-dataset",
    dataset_path="/path/to/data"
)
```
Data Splitting
split_dataset
Split a dataset into train/test/validation sets.
Parameters:
- `dataset_id` (string): Dataset identifier
- `train_ratio` (number): Training set ratio (0-1)
- `test_ratio` (number): Test set ratio (0-1)
- `val_ratio` (number, optional): Validation set ratio (0-1)
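To show what the ratio parameters mean, here is a minimal sketch of a seeded, shuffled split of row indices. It assumes the ratios must sum to 1 and assigns any rounding remainder to the training set; both choices, the fixed seed, and the name `split_indices` are assumptions rather than documented Vesper behavior.

```python
from __future__ import annotations

import math
import random


def split_indices(n: int, train_ratio: float, test_ratio: float,
                  val_ratio: float = 0.0, seed: int = 42) -> dict[str, list[int]]:
    """Shuffle indices 0..n-1 reproducibly and cut them into train/test/val."""
    if not math.isclose(train_ratio + test_ratio + val_ratio, 1.0):
        raise ValueError("train_ratio + test_ratio + val_ratio must equal 1")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_test = int(n * test_ratio)
    n_val = int(n * val_ratio)
    return {
        "test": indices[:n_test],
        "val": indices[n_test:n_test + n_val],
        "train": indices[n_test + n_val:],  # rounding remainder lands here
    }
```

With `n=10` and the ratios from the example below (0.7 / 0.2 / 0.1), this yields 7 train, 2 test, and 1 validation index.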
Example:
```
split_dataset(
    dataset_id="my-dataset",
    train_ratio=0.7,
    test_ratio=0.2,
    val_ratio=0.1
)
```
🏗️ Architecture
Vesper is built with:
- TypeScript for the MCP server
- Python for image/audio/video processing
- SQLite for metadata storage
- Transformers.js for semantic search
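To illustrate the role of SQLite as a metadata store, here is a small sketch using Python's built-in `sqlite3` module. The `datasets` table, its columns, and the helper names are hypothetical; this README does not document Vesper's actual schema.

```python
import sqlite3

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS datasets (
    id TEXT PRIMARY KEY,       -- e.g. 'huggingface:imdb'
    source TEXT NOT NULL,
    title TEXT,
    quality_score REAL         -- unified score, 0-100
)
"""


def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def upsert_dataset(conn, dataset_id, source, title, quality_score):
    # INSERT OR REPLACE overwrites the row when the id already exists.
    conn.execute(
        "INSERT OR REPLACE INTO datasets (id, source, title, quality_score) VALUES (?, ?, ?, ?)",
        (dataset_id, source, title, quality_score),
    )
    conn.commit()


def datasets_above(conn, min_score):
    rows = conn.execute(
        "SELECT id FROM datasets WHERE quality_score >= ? ORDER BY quality_score DESC",
        (min_score,),
    )
    return [row[0] for row in rows]
```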
🤝 Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
📄 License
MIT License - see LICENSE for details.
🐛 Issues & Support
- Issues: https://github.com/vesper/mcp-server/issues
- Discussions: https://github.com/vesper/mcp-server/discussions
🌟 Acknowledgments
Built with:
Made with ❤️ by the Vesper Team
