scoutli
v0.0.2
A CLI-based LinkedIn scraper for personal/educational use
A CLI-based LinkedIn scraper that extracts LinkedIn profiles and companies into structured JSON.
Disclaimer: This tool is for personal/educational use only. Use of this tool may violate LinkedIn's Terms of Service. Use at your own risk. Not for commercial use.
Setup
```bash
npm install
npm run build
```
For the Playwright browser fallback, install Chromium:
```bash
npx playwright install chromium
```
Authentication
Log in with your LinkedIn email and password. A browser window opens, signs in with those credentials, and captures your session automatically:
```bash
scoutli auth login --email "you@example.com" --password "your-password"
scoutli auth status # check if the session is valid
scoutli auth logout # remove stored credentials
```
If LinkedIn asks for 2FA or a CAPTCHA, complete it in the browser window; the CLI waits up to 2 minutes for you to finish.
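The 2-minute wait can be pictured as a polling loop with a deadline. The sketch below is hypothetical: waitForCondition is not a scoutli function, and the README does not say how the CLI detects that login has finished.

```typescript
// Hypothetical helper, not scoutli's actual code: poll a condition until it
// holds or the deadline passes. Defaults mirror the README's 2-minute window.
async function waitForCondition(
  check: () => boolean | Promise<boolean>,
  timeoutMs: number = 2 * 60 * 1000,
  intervalMs: number = 1000,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    // e.g. "has the li_at session cookie appeared yet?"
    if (await check()) return true;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  // Timed out: the user did not finish 2FA/CAPTCHA in time.
  return false;
}
```

With Playwright, the check could be `async () => (await context.cookies()).some((c) => c.name === "li_at")`, where `context` is the BrowserContext used for login. `BrowserContext.cookies()` is a real Playwright method, but whether scoutli detects a finished login this way is an assumption.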
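As an alternative to scoutli auth login, you can authenticate with session tokens copied manually from the li_at and JSESSIONID cookies of a logged-in linkedin.com browser session: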
```bash
scoutli auth token --li-at "YOUR_LI_AT_TOKEN" --jsessionid "YOUR_JSESSIONID"
```
Credentials are stored in ~/.scout/config.json with restricted file permissions.
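"Restricted file permissions" on POSIX systems usually means owner-only access (mode 0600). A minimal sketch with Node's fs module follows; only the ~/.scout/config.json path comes from this README. The StoredSession field names and the saveSession, loadSession, and isOwnerOnly helpers are hypothetical and are not taken from scoutli's config.ts.

```typescript
import { chmodSync, mkdirSync, readFileSync, statSync, writeFileSync } from "node:fs";
import { homedir } from "node:os";
import { dirname, join } from "node:path";

// Hypothetical shape; the real layout is defined by src/schemas/config.ts.
interface StoredSession {
  liAt: string;
  jsessionid: string;
}

const DEFAULT_CONFIG_PATH = join(homedir(), ".scout", "config.json");

function saveSession(session: StoredSession, path: string = DEFAULT_CONFIG_PATH): void {
  // Directory accessible only by the owner (rwx------).
  mkdirSync(dirname(path), { recursive: true, mode: 0o700 });
  // File readable/writable only by the owner (rw-------).
  writeFileSync(path, JSON.stringify(session, null, 2), { mode: 0o600 });
  // `mode` only applies when the file is created, so tighten an existing file too.
  chmodSync(path, 0o600);
}

function loadSession(path: string = DEFAULT_CONFIG_PATH): StoredSession {
  return JSON.parse(readFileSync(path, "utf8")) as StoredSession;
}

// True if neither group nor others have any permission bits (meaningful on POSIX only).
function isOwnerOnly(path: string): boolean {
  return (statSync(path).mode & 0o077) === 0;
}
```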
Scraping
Individual Profiles
```bash
scoutli scrape profile <linkedin-username>
scoutli scrape profile johndoe -o profile.json
```
Output includes: name, headline, summary, location, profile picture, followers, connections, experience, education, skills, certifications, projects, languages, and websites.
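As a rough illustration of the -o flag (write JSON to a file instead of stdout), the sketch below pretty-prints a result to either destination. The emitJson name and the 2-space indentation are assumptions; scoutli's actual output formatting may differ.

```typescript
import { writeFileSync } from "node:fs";

// Hypothetical helper: serialize a result and send it to a file (when -o is given)
// or to stdout. Returns the serialized text so callers can reuse or inspect it.
function emitJson(data: unknown, outputPath?: string): string {
  const json = JSON.stringify(data, null, 2) + "\n";
  if (outputPath) {
    writeFileSync(outputPath, json, "utf8");
  } else {
    process.stdout.write(json);
  }
  return json;
}
```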
Companies
```bash
scoutli scrape company <company-slug>
scoutli scrape company google -o google.json
```
Output includes: name, description, website, industry, company size, headquarters, founded year, type, specialties, logo, and follower count.
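Both commands take the identifier from the page URL: the username follows /in/ in a profile URL, and the slug follows /company/ in a company URL. The hypothetical helper below (not part of scoutli, which expects the bare identifier) shows one way to derive it from a pasted URL.

```typescript
type LinkedInKind = "profile" | "company";

// First path segment LinkedIn uses for each page type, e.g. /in/johndoe or /company/google.
const PATH_PREFIX: Record<LinkedInKind, string> = { profile: "in", company: "company" };

// Hypothetical helper: accept a bare username/slug or a full URL and return the identifier.
function toIdentifier(kind: LinkedInKind, input: string): string {
  const trimmed = input.trim();
  if (!/^https?:\/\//i.test(trimmed)) return trimmed; // already a bare identifier
  const { pathname } = new URL(trimmed);
  const [prefix, id] = pathname.split("/").filter(Boolean);
  if (prefix !== PATH_PREFIX[kind] || !id) {
    throw new Error(`Not a LinkedIn ${kind} URL: ${input}`);
  }
  return decodeURIComponent(id);
}
```

For example, `toIdentifier("company", "https://www.linkedin.com/company/microsoft/about/")` yields `microsoft`, which can then be passed to `scoutli scrape company`.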
Options
| Flag | Description |
|------|-------------|
| -o, --output <file> | Write JSON to file instead of stdout |
| --method <auto\|voyager\|dom> | Force scraping method (default: auto) |
| --no-headless | Show the browser window (DOM method only) |
| -v, --verbose | Enable debug logging |
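The --method flag accepts exactly three values and defaults to auto. A small validation sketch in TypeScript follows; the parseMethod name is hypothetical, and the real CLI (built on Commander.js) may validate the flag differently.

```typescript
// The three values the --method flag accepts, per the Options table.
const METHODS = ["auto", "voyager", "dom"] as const;
type ScrapeMethod = (typeof METHODS)[number];

// Hypothetical helper: map the raw flag value to a typed method, applying the default.
function parseMethod(value: string | undefined): ScrapeMethod {
  if (value === undefined) return "auto"; // documented default
  if ((METHODS as readonly string[]).includes(value)) {
    return value as ScrapeMethod;
  }
  throw new Error(`Invalid --method "${value}"; expected one of: ${METHODS.join(", ")}`);
}
```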
How It Works
The scraper uses a hybrid approach:
- Voyager API (primary): LinkedIn's internal REST API returns structured JSON directly — fast and reliable
- Playwright DOM (fallback): Browser automation extracts data from the rendered page when the API fails
In auto mode (default), it tries Voyager first and falls back to Playwright.
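The auto-mode behavior amounts to "try the API, and on failure retry with the browser". The sketch below shows that control flow under the assumption that each method is an async function taking an identifier; scrapeWithFallback and the Scraper type are hypothetical and do not reproduce profile-scraper.ts.

```typescript
type Method = "auto" | "voyager" | "dom";

// Hypothetical signature: each strategy takes a username/slug and resolves to parsed data.
type Scraper<T> = (id: string) => Promise<T>;

async function scrapeWithFallback<T>(
  id: string,
  method: Method,
  voyager: Scraper<T>,
  dom: Scraper<T>,
  log: (msg: string) => void = () => {},
): Promise<T> {
  // Forced methods never fall back.
  if (method === "voyager") return voyager(id);
  if (method === "dom") return dom(id);
  // auto: try the JSON API first, then fall back to browser automation.
  try {
    return await voyager(id);
  } catch (err) {
    log(`Voyager failed (${err instanceof Error ? err.message : String(err)}); falling back to DOM`);
    return dom(id);
  }
}
```

Forcing a method skips the fallback entirely, which matches how --method voyager or --method dom is described in the Options table.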
Development Guide
Prerequisites
- Node.js >= 20
- A LinkedIn account
Getting Started
```bash
# Install dependencies
npm install
# Install Playwright browsers (needed for DOM fallback)
npx playwright install chromium
# Run in dev mode (no build step needed, uses tsx)
npx tsx bin/scout.ts --help
```
Testing the CLI
```bash
# 1. Authenticate (opens browser for login)
npx tsx bin/scout.ts auth login \
  --email "you@example.com" \
  --password "your-password"
# 2. Verify auth works
npx tsx bin/scout.ts auth status
# 3. Scrape a profile (use any public LinkedIn username)
npx tsx bin/scout.ts scrape profile williamhgates
# 4. Save output to a file
npx tsx bin/scout.ts scrape profile williamhgates -o bill.json
# 5. Scrape a company
npx tsx bin/scout.ts scrape company microsoft -o microsoft.json
# 6. Use verbose mode to see what's happening
npx tsx bin/scout.ts -v scrape profile williamhgates
# 7. Force DOM scraping method (uses Playwright browser)
npx tsx bin/scout.ts scrape profile williamhgates --method dom
# 8. Show the browser while DOM scraping (useful for debugging)
npx tsx bin/scout.ts scrape profile williamhgates --method dom --no-headless
```
Building
```bash
# Build for production
npm run build
# Run the built version
node dist/bin/scout.js --help
```
Project Structure
```
bin/scout.ts                 Entry point (#!/usr/bin/env node)
src/
  index.ts                   CLI setup (Commander.js)
  commands/
    auth.ts                  scoutli auth login|token|status|logout
    scrape.ts                scoutli scrape profile|company
  auth/
    config.ts                Read/write ~/.scout/config.json
    login.ts                 Playwright-based email/password login
    validate.ts              Validate tokens via Voyager /me
  scraper/
    voyager/
      client.ts              HTTP client (headers, cookies)
      profile.ts             Profile API endpoint + parser
      company.ts             Company API endpoint + parser
    playwright/
      browser.ts             Launch browser with cookies
      profile-dom.ts         DOM extraction for profiles
      company-dom.ts         DOM extraction for companies
    profile-scraper.ts       Orchestrator (Voyager → DOM fallback)
    company-scraper.ts       Orchestrator for companies
  schemas/
    profile.ts               Zod schema for profile JSON output
    company.ts               Zod schema for company JSON output
    config.ts                Zod schema for stored config
  utils/
    logger.ts                Colored log output
    rate-limit.ts            Random delay between requests
```
Troubleshooting
| Issue | Fix |
|-------|-----|
| Session expired or invalid | Run scoutli auth login again to get a fresh session. |
| Rate limited by LinkedIn | Wait a few minutes. Avoid rapid successive scrapes. |
| Could not find profile data | The username may be wrong, or LinkedIn's API response format changed. Try --method dom. |
| Playwright errors | Run npx playwright install chromium to install the browser. |
| Voyager API returns unexpected data | Use the -v flag to see raw debug output. Try --method dom as a fallback. |
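Avoiding rapid successive scrapes is also what utils/rate-limit.ts ("Random delay between requests") is for. The sketch below shows the general idea of jittered spacing between requests; the helper names and the 2 to 5 second default window are hypothetical, not scoutli's actual values.

```typescript
// Random whole number of milliseconds in [minMs, maxMs); returns minMs when the bounds are equal.
function randomDelayMs(minMs: number, maxMs: number): number {
  if (maxMs < minMs) throw new RangeError("maxMs must be >= minMs");
  return minMs + Math.floor(Math.random() * (maxMs - minMs));
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical helper: scrape identifiers one at a time, pausing a random
// interval between requests so they do not arrive at a fixed, rapid cadence.
async function scrapeSequentially<T>(
  ids: string[],
  scrape: (id: string) => Promise<T>,
  minMs: number = 2000,
  maxMs: number = 5000,
): Promise<T[]> {
  const results: T[] = [];
  for (const [i, id] of ids.entries()) {
    if (i > 0) await sleep(randomDelayMs(minMs, maxMs)); // no wait before the first request
    results.push(await scrape(id));
  }
  return results;
}
```

Randomizing the gap, rather than using a fixed sleep, avoids a perfectly regular request pattern; it does not guarantee that LinkedIn will not rate-limit the session.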
