matchmaker-analysis-mcp
v1.0.0
MCP server for dating profile analysis
LLM Matchmaker Simulation
An experiment to predict dating app accept/reject decisions using LLM personas. Each user profile becomes an AI agent that evaluates potential matches based on personality, preferences, and writing style.
Note: This project uses OpenRouter, which provides unified API access to hundreds of AI models including GPT-5-mini, GPT-4o, Claude, and more. OpenRouter offers better pricing and model flexibility compared to using OpenAI directly.
🆕 MCP Server Available!
You can now run the photo analysis pipeline through a Model Context Protocol (MCP) server, allowing AI assistants like Claude Desktop to analyze profiles directly!
Quick Start:
./setup-mcp.sh # Unix/Mac
# or
.\setup-mcp.ps1 # Windows
See QUICKSTART_MCP.md for a 5-minute setup guide, or MCP_SERVER_README.md for full documentation.
Overview
This system:
- Reads match pairs from out.json (which users were matched)
- Looks up detailed profile data from profiles.json
- Simulates each user's decision using an LLM persona
- Compares predictions to the actual decisions recorded in out.json
- Reports accuracy metrics
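The loop described above can be sketched as follows. This is a minimal illustration, not the actual index.ts: the field names (userId1, decision1, etc.) are assumptions, and the real schema lives in types.ts.

```typescript
// Minimal sketch of the simulation loop: look up both profiles for each
// match, predict user 1's decision, and compare to the recorded decision.
type Match = { userId1: string; userId2: string; decision1: boolean };
type Profile = { userId: string; age: number; gender: string };

function simulate(
  matches: Match[],
  profiles: Map<string, Profile>,
  predict: (self: Profile, candidate: Profile) => boolean,
): number {
  let correct = 0;
  for (const m of matches) {
    const a = profiles.get(m.userId1);
    const b = profiles.get(m.userId2);
    if (!a || !b) continue; // skip matches with missing profile data
    if (predict(a, b) === m.decision1) correct++;
  }
  return correct / matches.length; // overall accuracy
}
```

In the real pipeline, predict is where the LLM persona call happens; everything else is bookkeeping against out.json ground truth.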
Setup
Prerequisites
- Bun runtime (or Node.js)
- OpenRouter API key (Get one here)
Installation
# Install dependencies
bun install
# Or with npm
npm install
Configuration
Create a .env file with your OpenRouter API key:
# Single API key for all models
OPENROUTER_API_KEY=your_api_key_here
# Model configuration (OpenRouter format: provider/model-name)
TEXT_MODEL=openai/gpt-5-mini # Default: text-only evaluation
VISION_MODEL=google/gemini-pro-1.5 # Default: vision + text evaluation
# Other available models:
# Text models: openai/gpt-4o-mini, anthropic/claude-3.5-sonnet
# Vision models: google/gemini-flash-1.5, openai/gpt-4o, anthropic/claude-3.5-sonnet
# See https://openrouter.ai/models for full list
# Scoring weights (must sum to 1.0, default values shown)
WEIGHT_PHYSICAL=0.50 # Physical attraction (50%)
WEIGHT_LIFESTYLE=0.30 # Lifestyle compatibility (30%)
WEIGHT_PERSONALITY=0.15 # Personality match (15%)
WEIGHT_INTENTIONS=0.05 # Dating intentions alignment (5%)
# Optional: Set sample size (default: 10)
SAMPLE_SIZE=10
# Optional: Set offset for which matches to test
SAMPLE_OFFSET=0
Usage
Run the simulation
bun index.ts
This will:
- Load profiles and matches
- Simulate decisions for the specified sample size
- Display results in the console
- Save detailed results to simulation_results.json
Sample Output
🚀 Starting LLM Matchmaker Simulation
Text Model: openai/gpt-5-mini
Vision Model: google/gemini-pro-1.5
Sample size: 10
Sample offset: 0
📚 Loading profiles.json...
✅ Loaded 8102 profiles
✅ Mapped 8100 profiles by userId
📊 Loading out.json...
✅ Loaded 3495 match interactions
🎯 Running simulation on 10 matches (indices 0 to 9)...
======================================================================
Match 1/10
======================================================================
User 1 ID: 68dff2779005e5f5dacbddd0
Profile: Female, 35 years old
User 2 ID: 680c097e8cf7846e270a19ec
Profile: Male, 31 years old
👤 User 1 (68dff2779005e5f5dacbddd0) evaluating User 2 (680c097e8cf7846e270a19ec)...
Real decision: ❌ REJECT
Predicted: ❌ REJECT (overall: 0.42)
Result: ✅ CORRECT
📊 CATEGORY SCORES:
Physical: 0.35 (Text: 0.30, Vision: 0.40)
Lifestyle: 0.45 (Text: 0.50, Vision: 0.40)
Personality: 0.50 (Text: 0.50, Vision: 0.50)
Intentions: 0.60 (Text: 0.70, Vision: 0.50)
💭 TEXT REASONING:
While they seem nice, our hobbies don't align well. I'm into outdoor
activities but they prefer staying in. Height difference might also
be an issue given my preferences.
👁️ VISUAL ANALYSIS:
Photos show a casual style with indoor settings. They appear to prefer
cozy environments. Not quite my physical type based on my preferences.
...
============================================================
📊 FINAL RESULTS
============================================================
Overall Accuracy: 75.00% (15/20 correct)
User 1 Accuracy: 70.00%
User 2 Accuracy: 80.00%
Real Acceptance Rate: 25.00%
Predicted Acceptance Rate: 30.00%
Score Analysis:
Average Score: 0.452
Average Score for Real Accepts: 0.683
Average Score for Real Rejects: 0.341
Score Separation: 0.342 (higher is better)
Confusion Matrix:
True Positives: 4
True Negatives: 11
False Positives: 1
False Negatives: 4
Project Structure
.
├── types.ts # TypeScript interfaces for profiles and matches
├── prompts.ts # LLM prompt engineering for persona simulation
├── index.ts # Main simulation engine
├── results.ts # Metrics calculation and reporting utilities
├── validate.ts # Data validation script (no API calls)
├── profiles.json # User profile data (dictionary)
├── out.json # Match interactions (input + ground truth)
├── simulation_results.json # Detailed output (generated)
├── README.md # This file
├── SETUP.md # Setup instructions
├── SCORING_SYSTEM.md # Details on the 0-1 scoring system
└── OUTPUT_GUIDE.md # Guide to enhanced debugging output
How It Works
Data Flow
- Input: out.json provides match pairs and actual decisions
- Lookup: profiles.json provides detailed profile data indexed by userId
- Simulate: For each match, create two LLM personas (one per user)
- Compare: Check predicted decisions against the actual decisions
Persona Simulation
Each user becomes an LLM agent with:
- Demographics (age, gender, height, ethnicity)
- Preferences (gender, age range, ethnicity, physical attraction)
- Personality (green flags, red flags, political/religious beliefs)
- Writing style (analyzed from text length and tone)
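Constructing such a persona amounts to turning profile fields into a system prompt. The sketch below is purely illustrative; the field names and wording are assumptions, and the real prompt construction lives in prompts.ts.

```typescript
// Hypothetical sketch of building a persona system prompt from profile
// fields. The actual prompt engineering is in prompts.ts.
interface Persona {
  age: number;
  gender: string;
  heightCm: number;
  greenFlags: string[];
  redFlags: string[];
}

function personaPrompt(p: Persona): string {
  return [
    `You are a ${p.age}-year-old ${p.gender} on a dating app, ${p.heightCm} cm tall.`,
    `Green flags you look for: ${p.greenFlags.join(", ")}.`,
    `Red flags you avoid: ${p.redFlags.join(", ")}.`,
    `Evaluate the following candidate in character, as this person would.`,
  ].join("\n");
}
```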
Multi-Dimensional Scoring
The system evaluates 4 compatibility dimensions:
- Physical Attraction (50% weight): Visual appeal, style, photos
- Lifestyle Compatibility (30% weight): Hobbies, activities, interests
- Personality Match (15% weight): Vibe, energy, communication style
- Intention Alignment (5% weight): Dating goals (casual vs serious)
Each dimension is scored 0.0-1.0, then the scores are combined via a weighted average.
Dual-Track Evaluation
Text Track: Text-only LLM reads profile descriptions and scores all 4 dimensions based on written content.
Vision Track: Vision-capable LLM (Gemini) analyzes actual profile photos and scores dimensions based on visual cues.
Final Score: Average of text and vision scores for each dimension, then weighted sum (≥0.5 = ACCEPT).
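The aggregation can be sketched as follows. WEIGHTS mirrors the .env defaults; finalScore and accept are illustrative names, not the actual index.ts API.

```typescript
// Sketch of the final-score aggregation: average the text and vision scores
// per dimension, then take the weighted sum. Accept when the total is >= 0.5.
type Scores = { physical: number; lifestyle: number; personality: number; intentions: number };

const WEIGHTS: Scores = { physical: 0.5, lifestyle: 0.3, personality: 0.15, intentions: 0.05 };

function finalScore(text: Scores, vision: Scores): number {
  let total = 0;
  for (const k of Object.keys(WEIGHTS) as (keyof Scores)[]) {
    total += WEIGHTS[k] * (text[k] + vision[k]) / 2; // per-dimension average
  }
  return total;
}

const accept = (score: number) => score >= 0.5;
```

Plugging in the category scores from the sample output above (e.g. Physical: text 0.30, vision 0.40) yields ≈0.415, consistent with the displayed overall score of 0.42 and a REJECT prediction.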
Hard Filters
Before LLM evaluation, the system checks:
- Gender preferences (expectedGender)
- Age range preferences (ageRange)
- Ethnicity preferences (expectedEthnicity)
If hard filters fail → instant reject (saves API costs).
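The filter pass can be sketched like this. The field names expectedGender, ageRange, and expectedEthnicity come from the profile schema above, but their exact shapes here are assumptions.

```typescript
// Sketch of the hard-filter pass that runs before any LLM call. Any failed
// check short-circuits to an instant reject, so no API tokens are spent.
interface Prefs {
  expectedGender?: string;
  ageRange?: { min: number; max: number };
  expectedEthnicity?: string[];
}
interface Candidate { gender: string; age: number; ethnicity: string }

function passesHardFilters(prefs: Prefs, c: Candidate): boolean {
  if (prefs.expectedGender && prefs.expectedGender !== c.gender) return false;
  if (prefs.ageRange && (c.age < prefs.ageRange.min || c.age > prefs.ageRange.max)) return false;
  if (prefs.expectedEthnicity?.length && !prefs.expectedEthnicity.includes(c.ethnicity)) return false;
  return true; // passed all filters -> proceed to LLM evaluation
}
```

Each preference is optional: a missing field means "no constraint", so only explicitly stated preferences can trigger the instant reject.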
Key Features
- Multi-dimensional scoring: 4 compatibility categories (Physical, Lifestyle, Personality, Intentions)
- Dual-track evaluation: Text-only LLM + Vision LLM analyze each candidate independently
- Vision analysis: Gemini evaluates actual profile photos, not just text descriptions
- Persona-based prediction: LLM adopts user's personality/preferences
- Weighted aggregation: Configurable weights for each compatibility dimension
- Continuous scoring (0-1): Nuanced confidence scores for each category
- Dual reasoning display: See both text-based reasoning and visual analysis
- Category breakdown: Understand exactly why matches succeed or fail
- Writing style analysis: Matches casual vs serious users
- Score separation metrics: Measures model's ability to distinguish accept/reject
- UserId debugging: Display actual userIds instead of redacted names
- Comprehensive metrics: Accuracy, confusion matrix, acceptance rates, per-category analysis
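The score separation metric listed above is straightforward to compute: the mean predicted score over real accepts minus the mean over real rejects. A minimal sketch (the function name and result shape are illustrative):

```typescript
// Score separation: how far apart the model scores real accepts vs real
// rejects. A larger gap means the model distinguishes the classes better.
function separation(results: { score: number; realAccept: boolean }[]): number {
  const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
  const accepts = results.filter(r => r.realAccept).map(r => r.score);
  const rejects = results.filter(r => !r.realAccept).map(r => r.score);
  return mean(accepts) - mean(rejects);
}
```

In the sample output above, this is 0.683 − 0.341 ≈ 0.342.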
Tuning
To improve accuracy:
1. Adjust scoring weights in .env:
   - Increase WEIGHT_PHYSICAL if physical attraction matters more
   - Increase WEIGHT_LIFESTYLE if hobby alignment is key
   - Weights must sum to 1.0
2. Change models in .env:
   - Text models: openai/gpt-5-mini, anthropic/claude-3.5-sonnet
   - Vision models: google/gemini-pro-1.5, google/gemini-flash-1.5, openai/gpt-4o
   - See OpenRouter models for the full list
3. Adjust prompts in prompts.ts:
   - Make personas more picky or more lenient
   - Emphasize different evaluation criteria
4. Modify the evaluation strategy in index.ts:
   - Change how text/vision scores are merged (currently a simple average)
   - Add minimum thresholds for specific categories
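Since the weights must sum to 1.0, it's worth failing fast on a misconfigured .env. A minimal validation sketch (validateWeights is an illustrative name, not part of the repo):

```typescript
// Sketch of validating the .env scoring weights before a run. The defaults
// (0.50 + 0.30 + 0.15 + 0.05) sum to 1.0; anything else is a config error.
function validateWeights(weights: Record<string, number>): void {
  const sum = Object.values(weights).reduce((a, b) => a + b, 0);
  if (Math.abs(sum - 1.0) > 1e-6) {
    throw new Error(`Scoring weights must sum to 1.0, got ${sum}`);
  }
}

validateWeights({ physical: 0.5, lifestyle: 0.3, personality: 0.15, intentions: 0.05 });
```

The small tolerance absorbs floating-point rounding when the weights are parsed from strings.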
Cost Estimation
Dual-Track System (Text + Vision):
Using openai/gpt-5-mini + google/gemini-pro-1.5:
- Sample of 10 matches: ~$0.60-1.00 (40 API calls: 20 text + 20 vision)
- Full dataset (~3.5K matches): ~$210-350 (14K API calls)
Cost breakdown per match:
- Text track: ~$0.02 (gpt-5-mini)
- Vision track: ~$0.04-0.08 (gemini-pro-1.5 with 3-5 images)
- Total per match: ~$0.06-0.10
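The call counts quoted above follow directly from the dual-track design: each match triggers two evaluations (one per user), and each evaluation makes one text call and one vision call.

```typescript
// Quick check of the API-call arithmetic: 2 users per match, 2 tracks
// (text + vision) per user.
const callsPerMatch = 2 /* users */ * 2 /* tracks */;
const sampleCalls = 10 * callsPerMatch;   // a 10-match sample
const fullCalls = 3495 * callsPerMatch;   // the full dataset (~14K calls)
```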
Budget Options:
- Use google/gemini-flash-1.5 for faster, cheaper vision (~50% cost reduction)
- Reduce to text-only by setting VISION_MODEL="" (50% cost reduction, but loses visual analysis)
💡 Tip: Check OpenRouter pricing for current rates and model availability.
Success Criteria
- ✅ >50% accuracy (baseline)
- 🎯 65-75% accuracy (target)
- ✅ Reasonable persona behavior (qualitative)
License
MIT
