@dolusoft/hirebase-mcp v1.4.0
HireBase
AI-powered recruitment platform built as an MCP Server. Parses CVs, extracts structured data with GPT, stores them in a LanceDB vector database, and manages the entire hiring pipeline — from candidate matching to interview scheduling.
Architecture
DDD + Clean Architecture with clear separation of concerns:
src/
├── domain/ # Entities, Value Objects, Repository Interfaces
├── application/ # Use Cases, Ports, DTOs
├── infrastructure/ # LanceDB, OpenAI, PDF/DOCX/OCR Parsers
├── interface/ # MCP Server, Tools, Dashboard, CLI
└── shared/ # Errors, Types, Utilities

Design Principle: MCP Server = data layer (parse, store, search, return). The AI client using the tools handles analysis, ranking strategy, and match explanations.
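The layering can be sketched as follows — a minimal, hypothetical example (the interface and class names here are illustrative, not the actual source): the application layer depends only on a repository port from the domain, while the infrastructure layer supplies the concrete adapter (in the real project, LanceDB-backed).

```typescript
// domain: entity and repository port
interface Candidate {
  id: string;
  name: string;
  email: string;
}

interface CandidateRepository {
  findByEmail(email: string): Candidate | undefined;
  save(candidate: Candidate): void;
}

// infrastructure: in-memory stand-in for the LanceDB-backed implementation
class InMemoryCandidateRepository implements CandidateRepository {
  private store = new Map<string, Candidate>();

  findByEmail(email: string): Candidate | undefined {
    return Array.from(this.store.values()).find((c) => c.email === email);
  }

  save(candidate: Candidate): void {
    this.store.set(candidate.id, candidate);
  }
}

// application: use case depends on the port, never on the adapter
function checkDuplicateCandidate(repo: CandidateRepository, email: string): boolean {
  return repo.findByEmail(email) !== undefined;
}
```

Because use cases see only the port, the storage backend can be swapped (or mocked in tests) without touching application code.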
Tech Stack
| Technology | Purpose |
|---|---|
| TypeScript 5.9 | Language |
| LanceDB | Vector database (embedded, serverless) |
| OpenAI text-embedding-3-small | 1536-dim embeddings |
| OpenAI gpt-5-mini | Structured CV extraction (Responses API) |
| MCP SDK | Model Context Protocol server |
| Nuxt 4 + Nuxt UI | Real-time dashboard |
| unpdf | PDF text extraction |
| mammoth | DOCX text extraction |
| Tesseract.js | OCR for image-based PDFs |
MCP Tools (38)
CV Management
| Tool | Description |
|---|---|
| add_cv | Parse CV file (PDF/DOCX), extract structured data, embed, store |
| update_cv | Update existing CV, archive old version |
| delete_cv | Delete candidate and all related data |
| get_cv_detail | Full candidate data with download URL |
| get_cv_chunks | CV section chunks with optional section filter |
| get_cv_versions | Version history |
| list_cvs | Paginated candidate list with sorting |
| manage_tags | Add/remove/list tags |
| check_duplicate_candidate | Check if a candidate already exists by name or email before importing |
Search & Matching
| Tool | Description |
|---|---|
| semantic_search | Vector similarity search with optional section filter |
| filter_candidates | Structured filtering (skills, location, experience, languages, tags) |
| match_candidates | Composite scoring: semantic similarity + skill overlap + loyalty factor |
| compare_candidates | Compare 2–5 candidates side by side, optionally with job skill match scores |
| export_results | Export as JSON or CSV |
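As with any MCP server, these tools are invoked via JSON-RPC `tools/call` requests. The sketch below shows the general shape of such a request for semantic_search; the argument names (`query`, `section`, `limit`) are assumptions for illustration — consult the tool's input schema via `tools/list` for the actual fields.

```typescript
// Hypothetical MCP tools/call request for semantic_search.
// Argument names are illustrative; check the published input schema.
const request = {
  jsonrpc: "2.0" as const,
  id: 1,
  method: "tools/call",
  params: {
    name: "semantic_search",
    arguments: {
      query: "senior TypeScript engineer with vector database experience",
      section: "experience", // optional section filter
      limit: 10,
    },
  },
};

// Serialized form as it would travel over the stdio transport
const payload = JSON.stringify(request);
```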
Job Posting Management
| Tool | Description |
|---|---|
| add_job_posting | Create job posting with required/preferred skills and metadata |
| list_job_postings | List all job postings |
| get_job_posting | Retrieve specific job posting details |
| update_job_posting | Update existing job posting |
| delete_job_posting | Delete a job posting |
Reference CVs
| Tool | Description |
|---|---|
| add_reference_cv | Mark a candidate as a reference/benchmark CV for a job posting |
| list_reference_cvs | List all reference CVs for a job posting |
| remove_reference_cv | Remove a reference CV from a job posting |
Recruitment Pipeline
| Tool | Description |
|---|---|
| add_to_pipeline | Add candidate to a job posting pipeline |
| update_pipeline_status | Update application status |
| add_pipeline_note | Add notes, call attempts, or interview records |
| set_pending_action | Track pending actions with due dates and owner (us/candidate) |
| get_pipeline | Full pipeline view for a job posting |
| get_pipeline_candidates | Lightweight list of candidates in a pipeline (ID, name, status only) |
| get_candidate_history | Candidate's recruitment history across all postings |
| get_pending_actions | All overdue and upcoming pending actions |
Pipeline statuses: new → contacted → unreachable · not_interested · interview_scheduled → no_show · interviewed → rejected · offer_sent → hired
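One way to read the status flow above is as a transition map — the sketch below is an interpretation of the diagram, not the server's actual validation logic:

```typescript
// Illustrative encoding of the pipeline status flow as a transition map.
type Status =
  | "new" | "contacted" | "unreachable" | "not_interested"
  | "interview_scheduled" | "no_show" | "interviewed"
  | "rejected" | "offer_sent" | "hired";

const transitions: Record<Status, Status[]> = {
  new: ["contacted"],
  contacted: ["unreachable", "not_interested", "interview_scheduled"],
  interview_scheduled: ["no_show", "interviewed"],
  interviewed: ["rejected", "offer_sent"],
  offer_sent: ["hired"],
  // terminal states
  unreachable: [], not_interested: [], no_show: [], rejected: [], hired: [],
};

function canTransition(from: Status, to: Status): boolean {
  return transitions[from].includes(to);
}
```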
Screening
| Tool | Description |
|---|---|
| get_screening_history | Shortlisted, screened out, and in-pipeline candidates for a job posting |
| generate_screening_report | Summary report with statistics, shortlisted, screened out, and pending candidates |
Call Batches
| Tool | Description |
|---|---|
| create_call_batch | Create a call batch: assign candidates to a person for calling |
| get_call_batch_status | Status of a call batch: candidates, phone numbers, and call outcomes |
| update_call_outcome | Update call outcome: pending, reached, or unreachable |
System
| Tool | Description |
|---|---|
| get_stats | Database statistics (candidates, skills, locations, etc.) |
| get_server_status | Server version, dashboard URL, connected clients, config |
| reset_database | Drop and recreate all tables |
How It Works
CV Ingestion
CV File (PDF/DOCX/Scanned PDF)
│
▼
[1] Parse text (unpdf / mammoth / Tesseract.js OCR)
│
▼
[2] Extract structured data (GPT-5 mini)
│ name, experience, education, skills, languages...
▼
[3] Chunk into sections
│ summary, each experience, education, skills, projects, certifications, full text
▼
[4] Generate embeddings (text-embedding-3-small, 1536d)
│
▼
[5] Store in LanceDB (candidates + cv_chunks + cv_versions)

Candidate Matching
match_candidates uses a composite scoring algorithm:
compositeScore = (0.5 × vectorScore) + (0.3 × skillOverlap) + (0.2 × loyaltyFactor)

| Component | Weight | Description |
|---|---|---|
| Vector Score | 50% | Semantic similarity between CV and job description |
| Skill Overlap | 30% | Ratio of matched required skills to total required skills |
| Loyalty Factor | 20% | Candidate stability: stable = 1.0, moderate = 0.7, flight_risk = 0.4 |
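The formula above transcribes directly into code. This is a sketch of the arithmetic only — the real implementation's signature and input handling may differ:

```typescript
// Loyalty categories map to the factors listed in the table.
const loyaltyFactors = { stable: 1.0, moderate: 0.7, flight_risk: 0.4 } as const;

function compositeScore(
  vectorScore: number,            // semantic similarity, 0..1
  matchedRequiredSkills: number,
  totalRequiredSkills: number,
  loyalty: keyof typeof loyaltyFactors,
): number {
  const skillOverlap =
    totalRequiredSkills > 0 ? matchedRequiredSkills / totalRequiredSkills : 0;
  return 0.5 * vectorScore + 0.3 * skillOverlap + 0.2 * loyaltyFactors[loyalty];
}
```

A perfect semantic match with all required skills and a stable candidate scores 1.0; each component degrades the score proportionally to its weight.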
Recruitment Pipeline
Job Posting
│
▼
[1] match_candidates → ranked candidate list
│
▼
[2] add_to_pipeline → create applications (status: new)
│
▼
[3] update_pipeline_status → track progress through hiring stages
│ add_pipeline_note → log calls, interviews, notes
│ set_pending_action → assign follow-up tasks
▼
[4] get_pending_actions → monitor overdue/upcoming items

Dashboard
HireBase includes a real-time dashboard built with Nuxt 4 + Nuxt UI + Tailwind CSS. The dashboard connects via WebSocket and provides live tracking of MCP tool executions.
Enable via environment variable:
DASHBOARD_ENABLED=true
DASHBOARD_PORT=61496

The dashboard URL is available in get_server_status output.
Database Schema
LanceDB tables:
| Table | Purpose |
|---|---|
| candidates | Candidate profiles (name, email, skills, experience, loyalty score, tags) |
| cv_chunks | Searchable CV sections with 1536-dim vector embeddings |
| cv_versions | Archived CV versions for update history |
| job_postings | Job listings with required/preferred skills and status |
| applications | Pipeline records linking candidates to jobs with status tracking |
| application_events | Audit trail: status changes, call attempts, interview records |
Installation
Prerequisites
- Node.js >= 22
- OpenAI API Key — required for CV extraction and embeddings. Get one at platform.openai.com/api-keys
Important: HireBase will not start without a valid OPENAI_API_KEY. If you see "OPENAI_API_KEY environment variable is required", make sure the key is set in your MCP configuration (see below).
Claude Code (recommended)
No global install needed — npx downloads and runs the latest version automatically:
claude mcp add hirebase -s user \
-e OPENAI_API_KEY=sk-your-key \
-e DASHBOARD_ENABLED=true \
-- npx -y @dolusoft/hirebase-mcp

That's it. Restart Claude Code and HireBase tools will be available.
Claude Desktop
Add to your config file:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
{
"mcpServers": {
"hirebase": {
"command": "npx",
"args": ["-y", "@dolusoft/hirebase-mcp"],
"env": {
"OPENAI_API_KEY": "sk-your-key",
"DASHBOARD_ENABLED": "true"
}
}
}
}

Other MCP Clients
Any MCP-compatible client can use HireBase. The key requirement is passing OPENAI_API_KEY as an environment variable:
# Run directly
OPENAI_API_KEY=sk-your-key npx -y @dolusoft/hirebase-mcp
# Or install globally
npm install -g @dolusoft/hirebase-mcp
OPENAI_API_KEY=sk-your-key hirebase-mcp

From Source
git clone https://github.com/dolusoft/hirebase.git
cd hirebase
pnpm install
pnpm build
cp .env.example .env # edit .env and add your OPENAI_API_KEY
pnpm start

Environment Variables
| Variable | Default | Description |
|---|---|---|
| OPENAI_API_KEY | — | Required. OpenAI API key for embeddings and CV extraction |
| LANCEDB_PATH | ./data/lancedb | LanceDB storage path (relative to CWD) |
| EMBEDDING_MODEL | text-embedding-3-small | OpenAI embedding model |
| EXTRACTION_MODEL | gpt-5-mini | OpenAI model for structured CV extraction |
| DASHBOARD_ENABLED | true | Enable real-time dashboard |
| DASHBOARD_PORT | 0 (random) | Dashboard server port (0 = auto-assign) |
Key Design Decisions
- LanceDB has no UPDATE — update is implemented as delete + add at repository level
- Complex fields stored as JSON strings — skills, experience, etc. serialized in Utf8 columns
- Batch embedding — all chunks embedded in a single OpenAI API call
- Seed rows — tables use __seed__ rows for schema inference, filtered out in queries
- Versioning — old CV data archived before updates, full history accessible
- Section-level chunking — each work experience is a separate chunk for granular matching
- OCR fallback — image-based PDFs automatically processed with Tesseract.js
- Composite scoring — matching combines semantic, skill-based, and loyalty signals
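The first two decisions above can be illustrated together. The sketch below shows the "update = delete + add" pattern against an in-memory stand-in (not the real LanceDB client), including the archive-before-delete step that powers versioning:

```typescript
// In-memory stand-in for a LanceDB table, which supports add and delete
// but has no in-place UPDATE.
interface Row { id: string; payload: string }

class PseudoTable {
  rows: Row[] = [];
  add(row: Row): void { this.rows.push(row); }
  delete(predicate: (r: Row) => boolean): void {
    this.rows = this.rows.filter((r) => !predicate(r));
  }
}

// "Update" a row by archiving the old version, deleting it, and re-adding.
function updateRow(table: PseudoTable, archive: Row[], updated: Row): void {
  const old = table.rows.find((r) => r.id === updated.id);
  if (old) archive.push(old);              // versioning: keep full history
  table.delete((r) => r.id === updated.id); // delete ...
  table.add(updated);                       // ... then add
}
```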
License
MIT
