Job Search Scan
A Claude Code skill that scans job boards, pulls descriptions, and scores them against your CV and preferences. You configure which companies to watch and what you care about, then run /scan. It does the rest.
Three agents run in sequence. A discovery agent hits public ATS APIs (Greenhouse, Lever, Ashby) to find open positions. An extraction agent grabs full job descriptions for anything the APIs missed. An evaluation agent reads each description against your CV and preferences, then writes a pass/fail report. Python hooks handle the data plumbing between stages.
What It Does
- Discovery — Fetches every open listing from your configured company portals. API-first, with Playwright as a fallback for custom career pages.
- Extraction — Fills in missing job descriptions by visiting the posting URL directly. Looks for schema.org JobPosting JSON-LD on the page (sketched just below this list).
- Evaluation — Runs a 4-step checklist per job: dealbreakers, hard preferences, CV-to-JD alignment, soft preferences. Writes `pass` or `fail` and explains why.
The output is a dated Markdown report in reports/.
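
The JSON-LD lookup mentioned under Extraction can be pictured roughly like the stdlib-only sketch below. This is not the skill's actual extraction code, just the general idea: many career pages embed the full posting as structured data in a `<script type="application/ld+json">` tag.

```python
"""Sketch of a schema.org JobPosting lookup (illustrative, not the shipped code)."""
import json
import re

LDJSON_RE = re.compile(
    r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)

def find_job_posting(html: str) -> dict | None:
    """Return the first JSON-LD entity whose @type is JobPosting, if any."""
    for match in LDJSON_RE.finditer(html):
        try:
            data = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue
        # Some pages wrap several entities in a list or an @graph container.
        candidates = data if isinstance(data, list) else data.get("@graph", [data])
        for entity in candidates:
            if isinstance(entity, dict) and entity.get("@type") == "JobPosting":
                return entity  # typically carries "title", "description", etc.
    return None
```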
Quick Start
1. Install
```
npx job-search-scan
```
Run this inside the directory where you want the skill installed. It copies agents, hooks, config templates, and scripts into place.
2. Configure
Four files in config/ need your input before the first run:
| File | What to put in it |
|------|-------------------|
| portals.yml | Companies you want to track. Each entry needs an ATS type and slug. |
| filters.yml | Title keywords (positive and negative) for quick filtering after discovery. |
| preferences.yml | Location, salary floor, company size, industries, dealbreakers, priorities. |
| cv.md | Your CV in Markdown. The evaluation agent reads this to judge fit. |
The config ships with commented-out examples. Uncomment and fill in your own values.
A validation script runs at the start of every scan and will tell you exactly what's missing or malformed.
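
As a rough picture of what a portals.yml entry looks like, see the sketch below. The `ats`, `slug`, and `careers_url` fields come from the docs above; the surrounding structure (the top-level key and the company-name field) is assumed here, so copy the commented examples in the shipped file rather than this snippet.

```yaml
# Illustrative structure only -- follow the commented examples in config/portals.yml.
portals:
  - company: Acme Analytics
    ats: greenhouse            # one of: greenhouse, lever, ashby, custom
    slug: acmeanalytics        # the board slug from the company's job-board URL
  - company: Initech
    ats: custom                # no supported ATS API; Playwright scrapes the page
    careers_url: https://initech.example.com/careers
```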
3. Run
```
/scan
```
That's it. The skill orchestrates all three stages, reports progress to stdout, and writes the final evaluation to reports/YYYY-MM-DD-evaluation.md.
Pipeline
/scan
│
├─ preflight + config validation
│
├─ Stage 1: Discovery Agent
│ reads portals.yml
│ calls ATS APIs (or Playwright for custom portals)
│ writes raw-discoveries.json
│ ── post_discovery hook ──> validates, deduplicates, upserts to DB, applies keyword filters
│
├─ Stage 2: Extraction Agent
│ ── pre_extraction hook ──> builds work file from DB (qualified jobs with null descriptions)
│ visits job URLs via Playwright
│ writes extraction_output.json
│ ── post_extraction hook ──> updates DB with descriptions
│
└─ Stage 3: Evaluation Agent
── pre_evaluation hook ──> builds work file from DB (qualified jobs with descriptions)
reads cv.md + preferences.yml
scores each job on a 4-step checklist
writes evaluation report + evaluation_output.json
   ── post_evaluation hook ──> updates DB with pass/fail verdicts

Stages that have nothing to process (zero qualified jobs, zero missing descriptions) skip cleanly. The scan still completes.
Supported ATS Platforms
| ATS | API Pattern | Companies Using It |
|-----|------------|-------------------|
| Greenhouse | boards-api.greenhouse.io/v1/boards/{slug}/jobs | Airbnb, Stripe, Figma, Coinbase, Vercel, Discord, Databricks, GitLab, Reddit, Squarespace |
| Lever | api.lever.co/v0/postings/{slug} | Spotify, Plaid |
| Ashby | api.ashbyhq.com/posting-api/job-board/{slug} | Notion, Ramp, Linear, Replit, Cursor |
| Custom | Playwright browser automation | Any careers page (you provide the URL) |
All three API-backed platforms use public, unauthenticated GET endpoints. No API keys needed.
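
To give a sense of what a discovery call amounts to, fetching one Greenhouse board is a single unauthenticated GET. The endpoint pattern is taken from the table above; the response field names (`jobs`, `title`, `absolute_url`) reflect Greenhouse's public boards API, but verify against docs/ats-endpoints.md if anything looks off.

```python
"""Sketch: fetching one Greenhouse board (illustrative, not the skill's fetch_ats.py)."""
import json
from urllib.request import urlopen

def fetch_greenhouse_jobs(slug: str) -> list[dict]:
    """Fetch all open listings for a Greenhouse board slug. No API key needed."""
    url = f"https://boards-api.greenhouse.io/v1/boards/{slug}/jobs"
    with urlopen(url, timeout=30) as resp:
        payload = json.load(resp)
    return payload.get("jobs", [])

if __name__ == "__main__":
    for job in fetch_greenhouse_jobs("examplecorp"):  # replace with a real slug
        print(job["title"], "->", job["absolute_url"])
```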
Adding a company means adding 4-5 lines to portals.yml. See docs/ats-endpoints.md for the full endpoint reference if you need to debug or add a new ATS type.
Agents and Hooks
Agents
| Agent | Stage | Tools | What It Produces |
|-------|-------|-------|-----------------|
| Discovery | 1 | Read, Bash (fetch_ats.py), Playwright | data/raw-discoveries.json |
| Extraction | 2 | Read, Write, Playwright | data/_tmp/extraction_output.json |
| Evaluation | 3 | Read, Write | reports/YYYY-MM-DD-evaluation.md + data/_tmp/evaluation_output.json |
Hooks
Hooks fire automatically via SubagentStart and SubagentStop matchers in .claude/settings.json. You don't call them.
| Hook | Fires | What It Does |
|------|-------|-------------|
| pre_extraction.py | Before extraction agent | Queries DB for jobs needing descriptions, writes work file. Blocks the agent if there's nothing to do. |
| pre_evaluation.py | Before evaluation agent | Queries DB for jobs ready to evaluate, writes work file. Blocks if empty. |
| post_discovery.py | After discovery agent | Validates, deduplicates, upserts to DB, applies keyword filters from filters.yml. |
| post_extraction_db.py | After extraction agent | Updates DB with extracted descriptions. |
| post_evaluation.py | After evaluation agent | Updates DB with pass/fail verdicts. |
Data
data/jobs.db.json is a flat JSON file that acts as the database. Hooks read and write it between stages. You don't need to touch it, but it's human-readable if you want to inspect state.
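
To make the plumbing concrete, a pre-stage hook boils down to something like the sketch below. The DB layout, the work-file name, and the blocking mechanism are assumptions made for illustration; the shipped pre_extraction.py may differ in all three.

```python
"""Illustrative pre-extraction-style hook logic (not the shipped script)."""
import json
import sys
from pathlib import Path

DB_PATH = Path("data/jobs.db.json")                   # flat JSON "database"
WORK_FILE = Path("data/_tmp/extraction_work.json")    # hypothetical work-file name

def main() -> int:
    jobs = json.loads(DB_PATH.read_text())            # assumed: a list of job dicts
    pending = [j for j in jobs
               if j.get("qualified") and not j.get("description")]
    if not pending:
        print("pre_extraction: 0 jobs need extraction, skipping stage")
        return 1  # a nonzero exit is one way a hook could block the stage
    WORK_FILE.parent.mkdir(parents=True, exist_ok=True)
    WORK_FILE.write_text(json.dumps(pending, indent=2))
    print(f"pre_extraction: {len(pending)} jobs queued for extraction")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```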
Project Structure
.claude/
agents/ # Agent definitions (discovery, extraction, evaluation)
skills/scan/ # The /scan orchestrator skill
settings.json # Hook wiring (SubagentStart/SubagentStop matchers)
bin/
install.js # npx installer
config/
portals.yml # Which companies to scan
filters.yml # Title keyword filters
preferences.yml # Your job preferences
cv.md # Your CV
data/
jobs.db.json # Job database (auto-managed by hooks)
docs/
ats-endpoints.md # ATS API reference
reports/ # Evaluation reports land here
scripts/             # All Python hooks and utilities

Requirements
- Claude Code with agent and skill support
- Python 3.10+ with `pyyaml` (`pip install -r requirements.txt`)
- Playwright MCP server (only needed if you configure `ats: custom` portals)
Example Output
A typical /scan session prints something like this:
SCAN: PREFLIGHT -> checks passed
SCAN: CONFIG -> check passed
SCAN: start (pipeline: discovery -> extraction -> evaluation)
STAGE 1 DISCOVERY:
DISCOVERY: 247 listings across 15 of 17 portals (14 via API, 1 via playwright fallback)
DISCOVERY: artifact data/raw-discoveries.json
POST-DISCOVERY: hook_report {new=12, known=235, qualified=8, disqualified=4, validation_failed=0}
STAGE 2 EXTRACTION: skipped (0 jobs need extraction)
STAGE 3 EVALUATION:
Total: 8 jobs evaluated (3 pass, 5 fail)
Report written: reports/2026-04-12-evaluation.md
SCAN: ok (3 pass / 5 fail; report at reports/2026-04-12-evaluation.md)

FAQ
Can I add companies that aren't on Greenhouse, Lever, or Ashby?
Yes. Set ats: custom and provide a careers_url in portals.yml. The discovery agent will use Playwright to scrape the page. Results vary depending on how the site is built.
What if I only have a CV but no preferences?
The evaluation agent detects this and switches to CV-only mode. It skips the dealbreaker and preference checks and scores purely on CV-to-JD alignment.
How does deduplication work?
The post-discovery hook deduplicates by (title, company) pair. If a job was already seen in a previous scan, it's marked known and skipped.
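
A minimal sketch of that keying (the casing and whitespace normalization here is an assumption; the real post_discovery.py hook may normalize fields differently):

```python
def split_new_and_known(discovered: list[dict], db_jobs: list[dict]) -> tuple[list[dict], list[dict]]:
    """Partition discovered listings into new vs. already-seen (title, company) pairs."""
    def key(job: dict) -> tuple[str, str]:
        return (job["title"].strip().lower(), job["company"].strip().lower())

    seen = {key(job) for job in db_jobs}
    new = [job for job in discovered if key(job) not in seen]
    known = [job for job in discovered if key(job) in seen]
    return new, known
```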
Where do reports go?
reports/YYYY-MM-DD-evaluation.md. One file per scan date. If you run /scan twice on the same day, the second run overwrites the first report.
License
MIT
