scoutli

v0.0.2

Published

a day ago

A CLI-based LinkedIn scraper for personal/educational use

Downloads

166

Vulnerabilities: 0 High, 0 Medium, 0 Low

chrisling-dev

scoutli

A CLI-based LinkedIn scraper to scrape LinkedIn profiles and companies into structured JSON.

Disclaimer: This tool is for personal/educational use only. Use of this tool may violate LinkedIn's Terms of Service. Use at your own risk. Not for commercial use.

Setup

npm install
npm run build

For the Playwright browser fallback, install Chromium:

npx playwright install chromium

Authentication

Log in with your LinkedIn email and password. A browser window will open, authenticate, and capture your session automatically:

scoutli auth login --email "you@example.com" --password "your-password"
scoutli auth status    # check if session is valid
scoutli auth logout    # remove stored credentials

If LinkedIn asks for 2FA or a CAPTCHA, complete it in the browser window — the CLI will wait up to 2 minutes for you to finish.
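The two-minute wait can be pictured as a poll-until-deadline loop. The sketch below is only an assumption about the general shape, not scoutli's actual code; `waitUntil` and its parameters are hypothetical names.

```typescript
// Hypothetical helper: poll a check until it passes or a deadline expires.
// Not taken from scoutli's source; names and defaults are illustrative.
async function waitUntil(
  check: () => Promise<boolean> | boolean,
  timeoutMs: number = 2 * 60 * 1000, // 2 minutes, matching the limit described above
  intervalMs: number = 1000,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await check()) return true;
    // Give up if another sleep would run past the deadline.
    if (Date.now() + intervalMs > deadline) return false;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A login flow could pass a check such as "has the session cookie appeared yet?" and treat a `false` result as a timed-out login.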

Alternatively, you can authenticate with manually copied session tokens:

scoutli auth token --li-at "YOUR_LI_AT_TOKEN" --jsessionid "YOUR_JSESSIONID"

Credentials are stored in ~/.scout/config.json with restricted file permissions.
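For readers curious what "restricted file permissions" can look like in Node.js, here is a minimal sketch. It assumes owner-only access (mode 0600); the `StoredSession` shape and the function names are hypothetical, since the real config schema is not shown in this README.

```typescript
import { chmodSync, existsSync, mkdirSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

// Hypothetical shape; the real stored config may contain other fields.
interface StoredSession {
  liAt: string;
  jsessionId: string;
}

// Write the session as JSON that only the owner can read or write (0600).
function saveSession(configPath: string, session: StoredSession): void {
  mkdirSync(dirname(configPath), { recursive: true, mode: 0o700 });
  writeFileSync(configPath, JSON.stringify(session, null, 2), { mode: 0o600 });
  // The mode option only applies when the file is created, so tighten
  // permissions explicitly in case the file already existed.
  chmodSync(configPath, 0o600);
}

// Return the stored session, or null when nothing has been saved.
function loadSession(configPath: string): StoredSession | null {
  if (!existsSync(configPath)) return null;
  return JSON.parse(readFileSync(configPath, "utf8")) as StoredSession;
}

// Remove the stored session; a missing file is not an error.
function deleteSession(configPath: string): void {
  rmSync(configPath, { force: true });
}
```

POSIX permission bits have little effect on Windows, so a sketch like this mainly protects credentials on Linux and macOS.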

Scraping

Individual Profiles

scoutli scrape profile <linkedin-username>
scoutli scrape profile johndoe -o profile.json

Output includes: name, headline, summary, location, profile picture, followers, connections, experience, education, skills, certifications, projects, languages, and websites.
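If you consume the saved JSON from TypeScript, a loose type plus a sanity check may be enough to get started. The field names below are guesses based on the list above, not the actual schema (which lives in src/schemas/profile.ts and may differ); `ProfileSketch` and `parseProfile` are hypothetical.

```typescript
// Hypothetical field names inferred from the documented output list.
interface ProfileSketch {
  name: string;
  headline?: string;
  location?: string;
  experience?: unknown[];
  education?: unknown[];
  skills?: string[];
}

// Parse the contents of a saved output file and do a minimal sanity check.
function parseProfile(raw: string): ProfileSketch {
  const data: unknown = JSON.parse(raw);
  const hasName =
    typeof data === "object" &&
    data !== null &&
    typeof (data as { name?: unknown }).name === "string";
  if (!hasName) {
    throw new Error("not a profile: missing string field 'name'");
  }
  return data as ProfileSketch;
}
```

Passing the text of a file written with `-o profile.json` to `parseProfile` would give a typed object to work with, provided the real output has a top-level `name` field.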

Companies

scoutli scrape company <company-slug>
scoutli scrape company google -o google.json

Output includes: name, description, website, industry, company size, headquarters, founded year, type, specialties, logo, and follower count.

Options

| Flag | Description |
|------|-------------|
| -o, --output <file> | Write JSON to file instead of stdout |
| --method <auto\|voyager\|dom> | Force scraping method (default: auto) |
| --no-headless | Show browser window (DOM method) |
| -v, --verbose | Enable debug logging |

How It Works

The scraper uses a hybrid approach:

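Voyager API (primary): LinkedIn's internal REST API returns structured JSON directly, so no page rendering is needed. It is an internal API, and its response format can change.
Playwright DOM (fallback): Browser automation extracts data from the rendered page when the API call fails or returns unexpected data.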

In auto mode (default), it tries Voyager first and falls back to Playwright.
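The method selection described above can be sketched as a small orchestrator. This is an illustration of the pattern, not the code in profile-scraper.ts; `scrapeWithFallback`, `viaApi`, and `viaDom` are hypothetical names standing in for the Voyager and Playwright scrapers.

```typescript
type Method = "auto" | "voyager" | "dom";

// Hypothetical orchestrator: try the API first in auto mode and fall back
// to the DOM scraper if the API attempt throws.
async function scrapeWithFallback<T>(
  method: Method,
  viaApi: () => Promise<T>,
  viaDom: () => Promise<T>,
): Promise<T> {
  if (method === "voyager") return viaApi(); // forced: errors propagate, no fallback
  if (method === "dom") return viaDom(); // forced: skip the API entirely
  try {
    return await viaApi();
  } catch {
    // Auto mode: any API failure triggers the slower browser path.
    return viaDom();
  }
}
```

Forcing a method with `--method` maps to the first two branches, which is why a forced `voyager` run fails outright instead of falling back.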

Development Guide

Prerequisites

Node.js >= 20
A LinkedIn account

Getting Started

# Install dependencies
npm install

# Install Playwright browsers (needed for DOM fallback)
npx playwright install chromium

# Run in dev mode (no build step needed, uses tsx)
npx tsx bin/scout.ts --help

Testing the CLI

# 1. Authenticate (opens browser for login)
npx tsx bin/scout.ts auth login \
  --email "you@example.com" \
  --password "your-password"

# 2. Verify auth works
npx tsx bin/scout.ts auth status

# 3. Scrape a profile (use any public LinkedIn username)
npx tsx bin/scout.ts scrape profile williamhgates

# 4. Save output to a file
npx tsx bin/scout.ts scrape profile williamhgates -o bill.json

# 5. Scrape a company
npx tsx bin/scout.ts scrape company microsoft -o microsoft.json

# 6. Use verbose mode to see what's happening
npx tsx bin/scout.ts -v scrape profile williamhgates

# 7. Force DOM scraping method (uses Playwright browser)
npx tsx bin/scout.ts scrape profile williamhgates --method dom

# 8. Show the browser while DOM scraping (useful for debugging)
npx tsx bin/scout.ts scrape profile williamhgates --method dom --no-headless

Building

# Build for production
npm run build

# Run the built version
node dist/bin/scout.js --help

Project Structure

bin/scout.ts                          Entry point (#!/usr/bin/env node)
src/
  index.ts                            CLI setup (Commander.js)
  commands/
    auth.ts                           scoutli auth login|token|status|logout
    scrape.ts                         scoutli scrape profile|company
  auth/
    config.ts                         Read/write ~/.scout/config.json
    login.ts                          Playwright-based email/password login
    validate.ts                       Validate tokens via Voyager /me
  scraper/
    voyager/
      client.ts                       HTTP client (headers, cookies)
      profile.ts                      Profile API endpoint + parser
      company.ts                      Company API endpoint + parser
    playwright/
      browser.ts                      Launch browser with cookies
      profile-dom.ts                  DOM extraction for profiles
      company-dom.ts                  DOM extraction for companies
    profile-scraper.ts                Orchestrator (Voyager → DOM fallback)
    company-scraper.ts                Orchestrator for companies
  schemas/
    profile.ts                        Zod schema for profile JSON output
    company.ts                        Zod schema for company JSON output
    config.ts                         Zod schema for stored config
  utils/
    logger.ts                         Colored log output
    rate-limit.ts                     Random delay between requests
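As an illustration of the "random delay between requests" idea behind utils/rate-limit.ts, here is a minimal sketch. The function names and the 1 to 3 second bounds are assumptions, not values taken from the source.

```typescript
// Hypothetical helper: pick a delay in [minMs, maxMs). The rng parameter
// exists only to make the function deterministic in tests.
function randomDelayMs(
  minMs: number,
  maxMs: number,
  rng: () => number = Math.random,
): number {
  if (minMs < 0 || maxMs < minMs) {
    throw new RangeError("expected 0 <= minMs <= maxMs");
  }
  return Math.floor(minMs + rng() * (maxMs - minMs));
}

// Pause for a random interval so consecutive requests are not evenly spaced.
function politePause(minMs: number = 1000, maxMs: number = 3000): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, randomDelayMs(minMs, maxMs)));
}
```

Calling `await politePause()` between scrapes spaces requests unevenly, which fits the troubleshooting advice to avoid rapid successive scrapes.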

Troubleshooting

| Issue | Fix |
|-------|-----|
| Session expired or invalid | Run scoutli auth login again to get a fresh session. |
| Rate limited by LinkedIn | Wait a few minutes. Avoid rapid successive scrapes. |
| Could not find profile data | The username may be wrong, or LinkedIn's API response format changed. Try --method dom. |
| playwright errors | Run npx playwright install chromium to install the browser. |
| Voyager API returns unexpected data | Use -v flag to see raw debug output. Try --method dom as fallback. |
