matchmaker-analysis-mcp
v1.0.0
MCP server for dating profile analysis
LLM Matchmaker Simulation
An experiment to predict dating app accept/reject decisions using LLM personas. Each user profile becomes an AI agent that evaluates potential matches based on personality, preferences, and writing style.
Note: This project uses OpenRouter, which provides unified API access to hundreds of AI models including GPT-5-mini, GPT-4o, Claude, and more. OpenRouter offers better pricing and model flexibility compared to using OpenAI directly.
🆕 MCP Server Available!
You can now run the photo analysis pipeline through a Model Context Protocol (MCP) server, allowing AI assistants like Claude Desktop to analyze profiles directly!
Quick Start:
./setup-mcp.sh # Unix/Mac
# or
.\setup-mcp.ps1 # Windows
See QUICKSTART_MCP.md for a 5-minute setup guide, or MCP_SERVER_README.md for full documentation.
Overview
This system:
- Reads match pairs from out.json (which users were matched)
- Looks up detailed profile data from profiles.json
- Simulates each user's decision using an LLM persona
- Compares predictions to the actual decisions recorded in out.json
- Reports accuracy metrics
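The loop described above can be sketched as follows. This is a minimal illustration, not the actual index.ts: the field names (userId1, decision1, etc.) are assumptions, and the real schema lives in types.ts.

```typescript
// Minimal sketch of the simulation loop: look up both profiles for each
// match, predict user 1's decision, and compare to the recorded decision.
type Match = { userId1: string; userId2: string; decision1: boolean };
type Profile = { userId: string; age: number; gender: string };

function simulate(
  matches: Match[],
  profiles: Map<string, Profile>,
  predict: (self: Profile, candidate: Profile) => boolean,
): number {
  let correct = 0;
  for (const m of matches) {
    const a = profiles.get(m.userId1);
    const b = profiles.get(m.userId2);
    if (!a || !b) continue; // skip matches with missing profile data
    if (predict(a, b) === m.decision1) correct++;
  }
  return correct / matches.length; // overall accuracy
}
```

In the real pipeline, predict is where the LLM persona call happens; everything else is bookkeeping against out.json ground truth.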
Setup
Prerequisites
- Bun runtime (or Node.js)
- OpenRouter API key (Get one here)
Installation
# Install dependencies
bun install
# Or with npm
npm install
Configuration
Create a .env file with your OpenRouter API key:
# Single API key for all models
OPENROUTER_API_KEY=your_api_key_here
# Model configuration (OpenRouter format: provider/model-name)
TEXT_MODEL=openai/gpt-5-mini # Default: text-only evaluation
VISION_MODEL=google/gemini-pro-1.5 # Default: vision + text evaluation
# Other available models:
# Text models: openai/gpt-4o-mini, anthropic/claude-3.5-sonnet
# Vision models: google/gemini-flash-1.5, openai/gpt-4o, anthropic/claude-3.5-sonnet
# See https://openrouter.ai/models for full list
# Scoring weights (must sum to 1.0, default values shown)
WEIGHT_PHYSICAL=0.50 # Physical attraction (50%)
WEIGHT_LIFESTYLE=0.30 # Lifestyle compatibility (30%)
WEIGHT_PERSONALITY=0.15 # Personality match (15%)
WEIGHT_INTENTIONS=0.05 # Dating intentions alignment (5%)
# Optional: Set sample size (default: 10)
SAMPLE_SIZE=10
# Optional: Set offset for which matches to test
SAMPLE_OFFSET=0
Usage
Run the simulation
bun index.ts
This will:
- Load profiles and matches
- Simulate decisions for the specified sample size
- Display results in the console
- Save detailed results to simulation_results.json
Sample Output
🚀 Starting LLM Matchmaker Simulation
Text Model: openai/gpt-5-mini
Vision Model: google/gemini-pro-1.5
Sample size: 10
Sample offset: 0
📚 Loading profiles.json...
✅ Loaded 8102 profiles
✅ Mapped 8100 profiles by userId
📊 Loading out.json...
✅ Loaded 3495 match interactions
🎯 Running simulation on 10 matches (indices 0 to 9)...
======================================================================
Match 1/10
======================================================================
User 1 ID: 68dff2779005e5f5dacbddd0
Profile: Female, 35 years old
User 2 ID: 680c097e8cf7846e270a19ec
Profile: Male, 31 years old
👤 User 1 (68dff2779005e5f5dacbddd0) evaluating User 2 (680c097e8cf7846e270a19ec)...
Real decision: ❌ REJECT
Predicted: ❌ REJECT (overall: 0.42)
Result: ✅ CORRECT
📊 CATEGORY SCORES:
Physical: 0.35 (Text: 0.30, Vision: 0.40)
Lifestyle: 0.45 (Text: 0.50, Vision: 0.40)
Personality: 0.50 (Text: 0.50, Vision: 0.50)
Intentions: 0.60 (Text: 0.70, Vision: 0.50)
💭 TEXT REASONING:
While they seem nice, our hobbies don't align well. I'm into outdoor
activities but they prefer staying in. Height difference might also
be an issue given my preferences.
👁️ VISUAL ANALYSIS:
Photos show a casual style with indoor settings. They appear to prefer
cozy environments. Not quite my physical type based on my preferences.
...
============================================================
📊 FINAL RESULTS
============================================================
Overall Accuracy: 75.00% (15/20 correct)
User 1 Accuracy: 70.00%
User 2 Accuracy: 80.00%
Real Acceptance Rate: 25.00%
Predicted Acceptance Rate: 30.00%
Score Analysis:
Average Score: 0.452
Average Score for Real Accepts: 0.683
Average Score for Real Rejects: 0.341
Score Separation: 0.342 (higher is better)
Confusion Matrix:
True Positives: 4
True Negatives: 11
False Positives: 1
False Negatives: 4
Project Structure
.
├── types.ts # TypeScript interfaces for profiles and matches
├── prompts.ts # LLM prompt engineering for persona simulation
├── index.ts # Main simulation engine
├── results.ts # Metrics calculation and reporting utilities
├── validate.ts # Data validation script (no API calls)
├── profiles.json # User profile data (dictionary)
├── out.json # Match interactions (input + ground truth)
├── simulation_results.json # Detailed output (generated)
├── README.md # This file
├── SETUP.md # Setup instructions
├── SCORING_SYSTEM.md # Details on the 0-1 scoring system
└── OUTPUT_GUIDE.md # Guide to enhanced debugging output
How It Works
Data Flow
- Input: out.json provides match pairs and actual decisions
- Lookup: profiles.json provides detailed profile data indexed by userId
- Simulate: For each match, create two LLM personas (one per user)
- Compare: Check predicted decisions against the actual decisions
Persona Simulation
Each user becomes an LLM agent with:
- Demographics (age, gender, height, ethnicity)
- Preferences (gender, age range, ethnicity, physical attraction)
- Personality (green flags, red flags, political/religious beliefs)
- Writing style (analyzed from text length and tone)
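Constructing such a persona amounts to turning profile fields into a system prompt. The sketch below is purely illustrative; the field names and wording are assumptions, and the real prompt construction lives in prompts.ts.

```typescript
// Hypothetical sketch of building a persona system prompt from profile
// fields. The actual prompt engineering is in prompts.ts.
interface Persona {
  age: number;
  gender: string;
  heightCm: number;
  greenFlags: string[];
  redFlags: string[];
}

function personaPrompt(p: Persona): string {
  return [
    `You are a ${p.age}-year-old ${p.gender} on a dating app, ${p.heightCm} cm tall.`,
    `Green flags you look for: ${p.greenFlags.join(", ")}.`,
    `Red flags you avoid: ${p.redFlags.join(", ")}.`,
    `Evaluate the following candidate in character, as this person would.`,
  ].join("\n");
}
```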
Multi-Dimensional Scoring
The system evaluates 4 compatibility dimensions:
- Physical Attraction (50% weight): Visual appeal, style, photos
- Lifestyle Compatibility (30% weight): Hobbies, activities, interests
- Personality Match (15% weight): Vibe, energy, communication style
- Intention Alignment (5% weight): Dating goals (casual vs serious)
Each dimension is scored 0.0-1.0, then the scores are combined via a weighted average.
Dual-Track Evaluation
Text Track: Text-only LLM reads profile descriptions and scores all 4 dimensions based on written content.
Vision Track: Vision-capable LLM (Gemini) analyzes actual profile photos and scores dimensions based on visual cues.
Final Score: Average of text and vision scores for each dimension, then weighted sum (≥0.5 = ACCEPT).
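The aggregation can be sketched as follows. WEIGHTS mirrors the .env defaults; finalScore and accept are illustrative names, not the actual index.ts API.

```typescript
// Sketch of the final-score aggregation: average the text and vision scores
// per dimension, then take the weighted sum. Accept when the total is >= 0.5.
type Scores = { physical: number; lifestyle: number; personality: number; intentions: number };

const WEIGHTS: Scores = { physical: 0.5, lifestyle: 0.3, personality: 0.15, intentions: 0.05 };

function finalScore(text: Scores, vision: Scores): number {
  let total = 0;
  for (const k of Object.keys(WEIGHTS) as (keyof Scores)[]) {
    total += WEIGHTS[k] * (text[k] + vision[k]) / 2; // per-dimension average
  }
  return total;
}

const accept = (score: number) => score >= 0.5;
```

Plugging in the category scores from the sample output above (e.g. Physical: text 0.30, vision 0.40) yields ≈0.415, consistent with the displayed overall score of 0.42 and a REJECT prediction.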
Hard Filters
Before LLM evaluation, the system checks:
- Gender preferences (expectedGender)
- Age range preferences (ageRange)
- Ethnicity preferences (expectedEthnicity)
If hard filters fail → instant reject (saves API costs).
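The filter pass can be sketched like this. The field names expectedGender, ageRange, and expectedEthnicity come from the profile schema above, but their exact shapes here are assumptions.

```typescript
// Sketch of the hard-filter pass that runs before any LLM call. Any failed
// check short-circuits to an instant reject, so no API tokens are spent.
interface Prefs {
  expectedGender?: string;
  ageRange?: { min: number; max: number };
  expectedEthnicity?: string[];
}
interface Candidate { gender: string; age: number; ethnicity: string }

function passesHardFilters(prefs: Prefs, c: Candidate): boolean {
  if (prefs.expectedGender && prefs.expectedGender !== c.gender) return false;
  if (prefs.ageRange && (c.age < prefs.ageRange.min || c.age > prefs.ageRange.max)) return false;
  if (prefs.expectedEthnicity?.length && !prefs.expectedEthnicity.includes(c.ethnicity)) return false;
  return true; // passed all filters -> proceed to LLM evaluation
}
```

Each preference is optional: a missing field means "no constraint", so only explicitly stated preferences can trigger the instant reject.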
Key Features
- Multi-dimensional scoring: 4 compatibility categories (Physical, Lifestyle, Personality, Intentions)
- Dual-track evaluation: Text-only LLM + Vision LLM analyze each candidate independently
- Vision analysis: Gemini evaluates actual profile photos, not just text descriptions
- Persona-based prediction: LLM adopts user's personality/preferences
- Weighted aggregation: Configurable weights for each compatibility dimension
- Continuous scoring (0-1): Nuanced confidence scores for each category
- Dual reasoning display: See both text-based reasoning and visual analysis
- Category breakdown: Understand exactly why matches succeed or fail
- Writing style analysis: Matches casual vs serious users
- Score separation metrics: Measures model's ability to distinguish accept/reject
- UserId debugging: Display actual userIds instead of redacted names
- Comprehensive metrics: Accuracy, confusion matrix, acceptance rates, per-category analysis
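The score separation metric listed above is straightforward to compute: the mean predicted score over real accepts minus the mean over real rejects. A minimal sketch (the function name and result shape are illustrative):

```typescript
// Score separation: how far apart the model scores real accepts vs real
// rejects. A larger gap means the model distinguishes the classes better.
function separation(results: { score: number; realAccept: boolean }[]): number {
  const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
  const accepts = results.filter(r => r.realAccept).map(r => r.score);
  const rejects = results.filter(r => !r.realAccept).map(r => r.score);
  return mean(accepts) - mean(rejects);
}
```

In the sample output above, this is 0.683 − 0.341 ≈ 0.342.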
Tuning
To improve accuracy:
1. Adjust scoring weights in .env:
   - Increase WEIGHT_PHYSICAL if physical attraction matters more
   - Increase WEIGHT_LIFESTYLE if hobby alignment is key
   - Weights must sum to 1.0
2. Change models in .env:
   - Text models: openai/gpt-5-mini, anthropic/claude-3.5-sonnet
   - Vision models: google/gemini-pro-1.5, google/gemini-flash-1.5, openai/gpt-4o
   - See OpenRouter models for the full list
3. Adjust prompts in prompts.ts:
   - Make personas more picky or more lenient
   - Emphasize different evaluation criteria
4. Modify the evaluation strategy in index.ts:
   - Change how text/vision scores are merged (currently a simple average)
   - Add minimum thresholds for specific categories
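Since the weights must sum to 1.0, it's worth failing fast on a misconfigured .env. A minimal validation sketch (validateWeights is an illustrative name, not part of the repo):

```typescript
// Sketch of validating the .env scoring weights before a run. The defaults
// (0.50 + 0.30 + 0.15 + 0.05) sum to 1.0; anything else is a config error.
function validateWeights(weights: Record<string, number>): void {
  const sum = Object.values(weights).reduce((a, b) => a + b, 0);
  if (Math.abs(sum - 1.0) > 1e-6) {
    throw new Error(`Scoring weights must sum to 1.0, got ${sum}`);
  }
}

validateWeights({ physical: 0.5, lifestyle: 0.3, personality: 0.15, intentions: 0.05 });
```

The small tolerance absorbs floating-point rounding when the weights are parsed from strings.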
Cost Estimation
Dual-Track System (Text + Vision):
Using openai/gpt-5-mini + google/gemini-pro-1.5:
- Sample of 10 matches: ~$0.60-1.00 (40 API calls: 20 text + 20 vision)
- Full dataset (~3.5K matches): ~$210-350 (14K API calls)
Cost breakdown per match:
- Text track: ~$0.02 (gpt-5-mini)
- Vision track: ~$0.04-0.08 (gemini-pro-1.5 with 3-5 images)
- Total per match: ~$0.06-0.10
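The call counts quoted above follow directly from the dual-track design: each match triggers two evaluations (one per user), and each evaluation makes one text call and one vision call.

```typescript
// Quick check of the API-call arithmetic: 2 users per match, 2 tracks
// (text + vision) per user.
const callsPerMatch = 2 /* users */ * 2 /* tracks */;
const sampleCalls = 10 * callsPerMatch;   // a 10-match sample
const fullCalls = 3495 * callsPerMatch;   // the full dataset (~14K calls)
```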
Budget Options:
- Use google/gemini-flash-1.5 for faster, cheaper vision (~50% cost reduction)
- Reduce to text-only by setting VISION_MODEL="" (50% cost reduction, but loses visual analysis)
💡 Tip: Check OpenRouter pricing for current rates and model availability.
Success Criteria
- ✅ >50% accuracy (baseline)
- 🎯 65-75% accuracy (target)
- ✅ Reasonable persona behavior (qualitative)
License
MIT
