@dolusoft/hirebase-mcp
v1.1.15
HireBase - AI-powered CV search engine with LanceDB and MCP
HireBase
AI-powered recruitment platform built as an MCP Server. Parses CVs, extracts structured data with GPT, stores them in a LanceDB vector database, and manages the entire hiring pipeline — from candidate matching to interview scheduling.
Architecture
DDD + Clean Architecture with clear separation of concerns:
src/
├── domain/ # Entities, Value Objects, Repository Interfaces
├── application/ # Use Cases, Ports, DTOs
├── infrastructure/ # LanceDB, OpenAI, PDF/DOCX/OCR Parsers
├── interface/ # MCP Server, Tools, Dashboard, CLI
└── shared/ # Errors, Types, Utilities
Design Principle: MCP Server = Data layer (parse, store, search, return). The AI client using the tools handles analysis, ranking strategy, and match explanations.
Tech Stack
| Technology | Purpose |
|---|---|
| TypeScript 5.9 | Language |
| LanceDB | Vector database (embedded, serverless) |
| OpenAI text-embedding-3-small | 1536-dim embeddings |
| OpenAI gpt-5-mini | Structured CV extraction (Responses API) |
| MCP SDK | Model Context Protocol server |
| Nuxt 4 + Nuxt UI | Real-time dashboard |
| unpdf | PDF text extraction |
| mammoth | DOCX text extraction |
| Tesseract.js | OCR for image-based PDFs |
| Google Calendar + Contacts API | Interview scheduling & contact sync (OAuth2, zero dependencies) |
MCP Tools (30)
CV Management
| Tool | Description |
|---|---|
| add_cv | Parse CV file (PDF/DOCX), extract structured data, embed, store |
| update_cv | Update existing CV, archive old version |
| delete_cv | Delete candidate and all related data |
| get_cv_detail | Full candidate data with download URL |
| get_cv_chunks | CV section chunks with optional section filter |
| get_cv_versions | Version history |
| list_cvs | Paginated candidate list with sorting |
| manage_tags | Add/remove/list tags |
Search & Matching
| Tool | Description |
|---|---|
| semantic_search | Vector similarity search with optional section filter |
| filter_candidates | Structured filtering (skills, location, experience, languages, tags) |
| match_candidates | Composite scoring: semantic similarity + skill overlap + loyalty factor |
| export_results | Export as JSON or CSV |
Job Posting Management
| Tool | Description |
|---|---|
| add_job_posting | Create job posting with required/preferred skills and metadata |
| list_job_postings | List all job postings |
| get_job_posting | Retrieve specific job posting details |
| update_job_posting | Update existing job posting |
| delete_job_posting | Delete a job posting |
Recruitment Pipeline
| Tool | Description |
|---|---|
| add_to_pipeline | Add candidate to a job posting pipeline |
| update_pipeline_status | Update application status |
| add_pipeline_note | Add notes, call attempts, or interview records |
| set_pending_action | Track pending actions with due dates and owner (us/candidate) |
| get_pipeline | Full pipeline view for a job posting |
| get_candidate_history | Candidate's recruitment history across all postings |
| get_pending_actions | All overdue and upcoming pending actions |
Pipeline statuses: new → contacted → unreachable · not_interested · interview_scheduled → no_show · interviewed → rejected · offer_sent → hired
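The status names above can be captured in a string-literal union so that tool inputs are checked before they reach the database. This is a minimal sketch, not HireBase's actual code: the constant, type, and guard names are hypothetical, and only the ten status strings come from the list above.

```typescript
// Status strings copied from the list above; every identifier here is illustrative.
const PIPELINE_STATUSES = [
  "new",
  "contacted",
  "unreachable",
  "not_interested",
  "interview_scheduled",
  "no_show",
  "interviewed",
  "rejected",
  "offer_sent",
  "hired",
] as const;

// Union of the literal status strings.
type PipelineStatus = (typeof PIPELINE_STATUSES)[number];

// Runtime guard for untyped input such as MCP tool arguments.
function isPipelineStatus(value: string): value is PipelineStatus {
  return (PIPELINE_STATUSES as readonly string[]).includes(value);
}
```

A handler could reject an `update_pipeline_status` call early when `isPipelineStatus(input)` is false.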
Google Integration
| Tool | Description |
|---|---|
| authorize_google | OAuth2 authorization flow — opens browser for Google consent (one-time, covers Calendar + Contacts) |
| create_calendar_event | Create events for interviews, meetings, and follow-ups |
| add_contact | Save candidate contact info to Google Contacts with optional group/label |
Google tools only appear when GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET are configured. Enable People API and Calendar API in Google Cloud Console.
System
| Tool | Description |
|---|---|
| get_stats | Database statistics (candidates, skills, locations, etc.) |
| get_server_status | Server info, dashboard URL, connected clients |
| reset_database | Drop and recreate all tables |
How It Works
CV Ingestion
CV File (PDF/DOCX/Scanned PDF)
│
▼
[1] Parse text (unpdf / mammoth / Tesseract.js OCR)
│
▼
[2] Extract structured data (GPT-5 mini)
│ name, experience, education, skills, languages...
▼
[3] Chunk into sections
│ summary, each experience, education, skills, projects, certifications, full text
▼
[4] Generate embeddings (text-embedding-3-small, 1536d)
│
▼
[5] Store in LanceDB (candidates + cv_chunks + cv_versions)
Candidate Matching
match_candidates uses a composite scoring algorithm:
compositeScore = (0.5 × vectorScore) + (0.3 × skillOverlap) + (0.2 × loyaltyFactor)
| Component | Weight | Description |
|---|---|---|
| Vector Score | 50% | Semantic similarity between CV and job description |
| Skill Overlap | 30% | Ratio of matched required skills to total required skills |
| Loyalty Factor | 20% | Candidate stability: stable = 1.0, moderate = 0.7, flight_risk = 0.4 |
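The weighting can be worked through in a few lines of TypeScript. This is a sketch under stated assumptions: the function and type names are hypothetical, skill comparison is assumed to be case-insensitive, and an empty required-skills list is assumed to score 0. Only the weights and loyalty values come from the table above.

```typescript
type Loyalty = "stable" | "moderate" | "flight_risk";

// Loyalty values from the table above.
const LOYALTY_FACTOR: Record<Loyalty, number> = {
  stable: 1.0,
  moderate: 0.7,
  flight_risk: 0.4,
};

// Ratio of matched required skills to total required skills.
// Assumption: matching ignores case and surrounding whitespace.
function skillOverlap(required: string[], candidate: string[]): number {
  if (required.length === 0) return 0; // assumption: no requirements means no overlap signal
  const have = new Set(candidate.map((s) => s.trim().toLowerCase()));
  const matched = required.filter((s) => have.has(s.trim().toLowerCase())).length;
  return matched / required.length;
}

// Weighted sum: 50% vector similarity, 30% skill overlap, 20% loyalty.
function compositeScore(vectorScore: number, overlap: number, loyalty: Loyalty): number {
  return 0.5 * vectorScore + 0.3 * overlap + 0.2 * LOYALTY_FACTOR[loyalty];
}
```

For example, a candidate with a vector score of 0.8 who matches 2 of 4 required skills and is rated `moderate` scores 0.5 × 0.8 + 0.3 × 0.5 + 0.2 × 0.7, or about 0.69.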
Recruitment Pipeline
Job Posting
│
▼
[1] match_candidates → ranked candidate list
│
▼
[2] add_to_pipeline → create applications (status: new)
│
▼
[3] update_pipeline_status → track progress through hiring stages
│ add_pipeline_note → log calls, interviews, notes
│ set_pending_action → assign follow-up tasks
▼
[4] get_pending_actions → monitor overdue/upcoming items
Dashboard
HireBase includes a real-time dashboard built with Nuxt 4 + Nuxt UI + Tailwind CSS. The dashboard connects via WebSocket and provides live tracking of MCP tool executions.
Enable via environment variable:
DASHBOARD_ENABLED=true
DASHBOARD_PORT=61496
The dashboard URL is available in get_server_status output.
Database Schema
LanceDB tables:
| Table | Purpose |
|---|---|
| candidates | Candidate profiles (name, email, skills, experience, loyalty score, tags) |
| cv_chunks | Searchable CV sections with 1536-dim vector embeddings |
| cv_versions | Archived CV versions for update history |
| job_postings | Job listings with required/preferred skills and status |
| applications | Pipeline records linking candidates to jobs with status tracking |
| application_events | Audit trail: status changes, call attempts, interview records |
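The Key Design Decisions section notes that complex fields are stored as JSON strings and that tables carry `__seed__` rows that queries filter out. The sketch below shows one way that mapping could look for a candidate row. The interfaces, field selection, and the choice of the `id` column as the seed marker are assumptions for illustration, not the package's actual schema.

```typescript
// Domain shape (illustrative subset of the candidate fields).
interface Candidate {
  id: string;
  name: string;
  email: string;
  skills: string[];
  tags: string[];
}

// Stored shape: array fields serialized into string columns.
interface CandidateRow {
  id: string;
  name: string;
  email: string;
  skills: string; // JSON-encoded string[]
  tags: string; // JSON-encoded string[]
}

// Assumption: the seed row is identified by this value in the id column.
const SEED_ID = "__seed__";

function toRow(c: Candidate): CandidateRow {
  return {
    id: c.id,
    name: c.name,
    email: c.email,
    skills: JSON.stringify(c.skills),
    tags: JSON.stringify(c.tags),
  };
}

function fromRow(r: CandidateRow): Candidate {
  return {
    id: r.id,
    name: r.name,
    email: r.email,
    skills: JSON.parse(r.skills) as string[],
    tags: JSON.parse(r.tags) as string[],
  };
}

// Drop schema-inference rows before returning query results.
function withoutSeedRows(rows: CandidateRow[]): CandidateRow[] {
  return rows.filter((r) => r.id !== SEED_ID);
}
```

Keeping the serialization in one pair of functions means the rest of the code only ever sees the typed `Candidate` shape.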
Installation
npm (recommended)
npm install -g @dolusoft/hirebase-mcp
From source
git clone https://github.com/dolusoft/hirebase.git
cd hirebase
pnpm install
pnpm build
Prerequisites
- Node.js >= 22
- OpenAI API Key
- Google Cloud OAuth2 credentials (optional, for Calendar and Contacts integration)
MCP Configuration
Claude Code (npx)
claude mcp add hirebase -s user \
  -e OPENAI_API_KEY=sk-your-key \
  -e EMBEDDING_MODEL=text-embedding-3-small \
  -e DASHBOARD_ENABLED=true \
  -e GOOGLE_CLIENT_ID=your-client-id \
  -e GOOGLE_CLIENT_SECRET=your-client-secret \
  -- npx -y @dolusoft/hirebase-mcp
Claude Desktop
Add to your Claude Desktop config (claude_desktop_config.json):
{
  "mcpServers": {
    "hirebase": {
      "command": "npx",
      "args": ["-y", "@dolusoft/hirebase-mcp"],
      "env": {
        "OPENAI_API_KEY": "sk-your-key",
        "LANCEDB_PATH": "./data/lancedb",
        "GOOGLE_CLIENT_ID": "your-client-id",
        "GOOGLE_CLIENT_SECRET": "your-client-secret"
      }
    }
  }
}
Environment Variables
| Variable | Default | Description |
|---|---|---|
| OPENAI_API_KEY | required | OpenAI API key |
| LANCEDB_PATH | ./data/lancedb | LanceDB storage path |
| EMBEDDING_MODEL | text-embedding-3-small | Embedding model |
| EXTRACTION_MODEL | gpt-5-mini | CV extraction model |
| DASHBOARD_ENABLED | true | Enable real-time dashboard |
| DASHBOARD_PORT | 0 (random) | Dashboard server port |
| GOOGLE_CLIENT_ID | — | Google OAuth2 client ID (for Calendar and Contacts) |
| GOOGLE_CLIENT_SECRET | — | Google OAuth2 client secret (for Calendar and Contacts) |
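To illustrate how the defaults in the table combine, here is a minimal configuration loader. It is a sketch, not the server's real startup code: `HireBaseConfig`, `loadConfig`, and `googleToolsEnabled` are hypothetical names, and parsing `DASHBOARD_ENABLED` by comparing against the string `"true"` is an assumption.

```typescript
interface HireBaseConfig {
  openaiApiKey: string;
  lancedbPath: string;
  embeddingModel: string;
  extractionModel: string;
  dashboardEnabled: boolean;
  dashboardPort: number; // 0 lets the OS pick a free port
  googleClientId: string | undefined;
  googleClientSecret: string | undefined;
}

// Takes the environment as a parameter so it can be exercised without touching process.env.
function loadConfig(env: Record<string, string | undefined>): HireBaseConfig {
  const apiKey = env.OPENAI_API_KEY;
  if (!apiKey) throw new Error("OPENAI_API_KEY is required");
  return {
    openaiApiKey: apiKey,
    lancedbPath: env.LANCEDB_PATH ?? "./data/lancedb",
    embeddingModel: env.EMBEDDING_MODEL ?? "text-embedding-3-small",
    extractionModel: env.EXTRACTION_MODEL ?? "gpt-5-mini",
    dashboardEnabled: (env.DASHBOARD_ENABLED ?? "true") === "true", // assumption: only "true" enables it
    dashboardPort: Number(env.DASHBOARD_PORT ?? "0"),
    googleClientId: env.GOOGLE_CLIENT_ID,
    googleClientSecret: env.GOOGLE_CLIENT_SECRET,
  };
}

// Google tools are registered only when both credentials are present.
function googleToolsEnabled(config: HireBaseConfig): boolean {
  return Boolean(config.googleClientId && config.googleClientSecret);
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)`.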
Key Design Decisions
- LanceDB has no UPDATE: update is implemented as delete + add at repository level
- Complex fields stored as JSON strings: skills, experience, etc. serialized in Utf8 columns
- Batch embedding: all chunks embedded in a single OpenAI API call
- Seed rows: tables use __seed__ rows for schema inference, filtered out in queries
- Versioning: old CV data archived before updates, full history accessible
- Section-level chunking: each work experience is a separate chunk for granular matching
- OCR fallback: image-based PDFs automatically processed with Tesseract.js
- Composite scoring: matching combines semantic, skill-based, and loyalty signals
- Google Calendar: zero-dependency OAuth2 + Calendar API using Node.js built-in fetch
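Section-level chunking, as described in the list above, can be sketched as a pure function that turns extracted CV data into one chunk per section and one chunk per work experience. The interfaces, section labels, and text layout below are assumptions for illustration, and the sketch omits the projects and certifications sections for brevity. The real extraction schema may differ.

```typescript
interface Experience {
  company: string;
  title: string;
  description: string;
}

// Hypothetical shape of the structured data extracted from a CV.
interface ParsedCv {
  summary: string;
  experience: Experience[];
  education: string[];
  skills: string[];
  fullText: string;
}

interface CvChunk {
  section: string; // assumed labels: summary, experience, education, skills, full_text
  text: string;
}

function chunkCv(cv: ParsedCv): CvChunk[] {
  const chunks: CvChunk[] = [];
  if (cv.summary) chunks.push({ section: "summary", text: cv.summary });
  // Each work experience becomes its own chunk so it can match independently.
  for (const job of cv.experience) {
    chunks.push({
      section: "experience",
      text: `${job.title} at ${job.company}\n${job.description}`,
    });
  }
  if (cv.education.length > 0) chunks.push({ section: "education", text: cv.education.join("\n") });
  if (cv.skills.length > 0) chunks.push({ section: "skills", text: cv.skills.join(", ") });
  chunks.push({ section: "full_text", text: cv.fullText });
  return chunks;
}
```

The resulting `chunks.map((c) => c.text)` array is the kind of input that can be sent in a single embeddings request, which is what makes batch embedding possible.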
License
MIT
