LinkedIn Job Scraper
Automated LinkedIn job scraper with session persistence and Gmail API verification code fetching.
Features
✅ Session Persistence - Saves login cookies to avoid repeated logins and CAPTCHAs
✅ Gmail API Integration - Automatically fetches verification codes from your Gmail
✅ Company Details - Extracts comprehensive company information
✅ External Apply URLs - Captures direct application links
✅ Flexible Filtering - Search by keywords, location, date, experience level, job type, and remote options
✅ TypeScript - Fully typed for better DX
Installation
npm install linkedin-job-scraper

Prerequisites
- Playwright Browser:
npx playwright install chromium
- LinkedIn Account: Valid LinkedIn credentials
- Gmail API OAuth2 Tokens (for automatic verification):
npm run gmail:token

Quick Start
- Create linkedin-scraper.config.cjs in your project root:
module.exports = {
email: '[email protected]',
password: 'your-linkedin-password',
headless: false,
gmailClientId: 'your-gmail-client-id.apps.googleusercontent.com',
gmailClientSecret: 'your-gmail-client-secret',
gmailRedirectUri: 'http://localhost',
gmailRefreshToken: 'your-gmail-refresh-token',
gmailAccessToken: 'your-gmail-access-token',
sessionFile: './sessions/linkedin-session.json',
tokenFile: './sessions/gmail-token.json',
};
- Use the scraper:
import { LinkedInScraper } from 'linkedin-job-scraper';
const scraper = new LinkedInScraper();
const jobs = await scraper.searchJobs(
{
keywords: 'Software Engineer',
location: 'San Francisco, CA',
datePosted: 'past-week',
experienceLevel: ['entry-level', 'associate'],
jobType: ['full-time'],
remote: ['remote', 'hybrid'],
},
10, // max jobs
);
console.log(`Found ${jobs.length} jobs`);
await scraper.close();

Configuration
Configuration File
Create a linkedin-scraper.config.cjs file in your project root:
module.exports = {
// LinkedIn Credentials
email: '[email protected]',
password: 'your-password',
// Browser options
headless: false, // Set to true to run without visible browser
silent: false, // Set to true to suppress console output
// Gmail API OAuth2 Credentials (for automatic verification code fetching)
gmailClientId: 'your-client-id.apps.googleusercontent.com',
gmailClientSecret: 'your-client-secret',
gmailRedirectUri: 'http://localhost',
gmailRefreshToken: 'your-refresh-token',
gmailAccessToken: 'your-access-token',
// Optional: Custom file paths
sessionFile: './sessions/linkedin-session.json',
tokenFile: './sessions/gmail-token.json',
};

Note: For security, you can load values from environment variables in the config file:
email: process.env.LINKEDIN_EMAIL,
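For example, a config that reads every secret from the environment might look like the sketch below. The variable names are only a suggested convention, not something the package requires; use whatever names you export in your shell or .env file.
// linkedin-scraper.config.cjs (sketch: credentials come from environment variables)
// The env variable names below are illustrative; pick your own.
module.exports = {
    email: process.env.LINKEDIN_EMAIL,
    password: process.env.LINKEDIN_PASSWORD,
    headless: true,
    gmailClientId: process.env.GMAIL_CLIENT_ID,
    gmailClientSecret: process.env.GMAIL_CLIENT_SECRET,
    gmailRedirectUri: 'http://localhost',
    gmailRefreshToken: process.env.GMAIL_REFRESH_TOKEN,
    gmailAccessToken: process.env.GMAIL_ACCESS_TOKEN,
};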
Important: Add linkedin-scraper.config.cjs to .gitignore if it contains sensitive credentials.
Gmail API Setup
- Generate OAuth2 tokens:
npm run gmail:token
- Follow the authorization URL printed in the terminal
- Paste the authorization code when prompted
- Tokens will be saved to token.json
- Copy the tokens to your .env file
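If you prefer not to copy values by hand, the config file can also read the generated token file directly. This is only a sketch: it assumes token.json uses the usual Google OAuth2 field names (access_token, refresh_token), so check the file your generator actually writes.
// linkedin-scraper.config.cjs (sketch: pull tokens straight from token.json)
// Assumes token.json contains access_token and refresh_token fields; verify against your file.
const fs = require('fs');
const tokens = JSON.parse(fs.readFileSync('./token.json', 'utf8'));

module.exports = {
    email: process.env.LINKEDIN_EMAIL,
    password: process.env.LINKEDIN_PASSWORD,
    gmailClientId: process.env.GMAIL_CLIENT_ID,
    gmailClientSecret: process.env.GMAIL_CLIENT_SECRET,
    gmailRedirectUri: 'http://localhost',
    gmailRefreshToken: tokens.refresh_token,
    gmailAccessToken: tokens.access_token,
};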
Usage
Basic Search
After creating your linkedin-scraper.config.cjs file:
import { LinkedInScraper } from 'linkedin-job-scraper';
const scraper = new LinkedInScraper();
try {
const jobs = await scraper.searchJobs(
{
keywords: 'Full Stack Developer',
location: 'Remote',
},
20,
);
jobs.forEach((job) => {
console.log(`${job.title} at ${job.companyName}`);
console.log(`Location: ${job.location}`);
console.log(`Apply: ${job.applyUrl}\n`);
});
} finally {
await scraper.close();
}

Advanced Filtering
const jobs = await scraper.searchJobs(
{
keywords: 'Machine Learning Engineer',
location: 'New York, NY',
datePosted: 'past-24-hours', // 'past-24-hours' | 'past-week' | 'past-month'
experienceLevel: ['mid-senior', 'director'], // 'internship' | 'entry-level' | 'associate' | 'mid-senior' | 'director' | 'executive'
jobType: ['full-time', 'contract'], // 'full-time' | 'part-time' | 'contract' | 'temporary' | 'volunteer' | 'internship'
remote: ['remote'], // 'on-site' | 'remote' | 'hybrid'
},
50,
);

Custom Configuration
All configuration is done via the linkedin-scraper.config.cjs file. Simply edit this file to customize settings:
// linkedin-scraper.config.cjs
module.exports = {
email: '[email protected]',
password: 'your-password',
headless: true, // Run in headless mode
silent: true, // Suppress console output
gmailClientId: 'your-client-id',
gmailClientSecret: 'your-client-secret',
gmailRedirectUri: 'http://localhost',
gmailRefreshToken: 'your-refresh-token',
gmailAccessToken: 'your-access-token',
sessionFile: './custom/path/session.json',
tokenFile: './custom/path/token.json',
};

API Reference
LinkedInScraper
Constructor
new LinkedInScraper();
Configuration is loaded automatically from linkedin-scraper.config.cjs in your project root.
Configuration Options:
interface LinkedInScraperConfig {
email: string; // LinkedIn email (required)
password: string; // LinkedIn password (required)
headless?: boolean; // Run browser in headless mode (default: false)
silent?: boolean; // Suppress all console output (default: false)
gmailClientId: string; // Gmail API client ID (required)
gmailClientSecret: string; // Gmail API client secret (required)
gmailRedirectUri: string; // Gmail OAuth redirect URI (required)
gmailRefreshToken: string; // Gmail OAuth refresh token (required)
gmailAccessToken: string; // Gmail OAuth access token (required)
sessionFile?: string; // Path to save LinkedIn session cookies (optional)
tokenFile?: string; // Path to save Gmail tokens (optional)
}

Methods
searchJobs(filters, maxJobs)
Searches for jobs on LinkedIn.
searchJobs(filters?: SearchFilters, maxJobs: number = 10): Promise<Job[]>
Parameters:
- filters: Search filters (optional)
- maxJobs: Maximum number of jobs to scrape (default: 10)
Returns: Array of Job objects
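A quick sketch of both call styles is shown below; the filter values are placeholders, and writing the results to disk is just one way to consume the returned Job objects, not part of the API.
import { writeFileSync } from 'fs';
import { LinkedInScraper } from 'linkedin-job-scraper';

const scraper = new LinkedInScraper();
try {
    // No arguments: no filters, up to the default of 10 jobs
    const latest = await scraper.searchJobs();

    // Filters plus an explicit cap of 25 jobs
    const remoteJobs = await scraper.searchJobs({ keywords: 'TypeScript Developer', remote: ['remote'] }, 25);

    // Each entry is a Job object; persist them however you like
    writeFileSync('./jobs.json', JSON.stringify([...latest, ...remoteJobs], null, 2));
} finally {
    await scraper.close();
}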
close()
Closes the browser and cleans up resources.
close(): Promise<void>

Types
SearchFilters
interface SearchFilters {
keywords?: string;
location?: string;
datePosted?: 'any-time' | 'past-24-hours' | 'past-week' | 'past-month';
experienceLevel?: Array<'internship' | 'entry-level' | 'associate' | 'mid-senior' | 'director' | 'executive'>;
jobType?: Array<'full-time' | 'part-time' | 'contract' | 'temporary' | 'volunteer' | 'internship'>;
remote?: Array<'on-site' | 'remote' | 'hybrid'>;
}

Job
interface Job {
id: string;
title: string;
link: string;
applyUrl: string;
location: string;
postedAt: string;
companyName: string;
companyLinkedinUrl?: string;
companyWebsite?: string;
companyDescription?: string;
companyAddress?: string;
companyEmployeesCount?: string;
description: string;
industries?: string;
}

Session Persistence
The scraper automatically saves your login session to linkedin-session.json after successful authentication. On subsequent runs:
- ✅ Loads saved cookies
- ✅ Validates session
- ✅ Skips login if session is valid
- ✅ Only logs in again if session expired
This dramatically reduces CAPTCHA challenges and speeds up execution.
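If a saved session ever misbehaves, you can force a fresh login by deleting the session file before constructing the scraper. The path in this sketch assumes the ./sessions/linkedin-session.json location used in the config examples above; match it to your own sessionFile setting.
import { existsSync, rmSync } from 'fs';
import { LinkedInScraper } from 'linkedin-job-scraper';

// Assumed path: keep this in sync with sessionFile in linkedin-scraper.config.cjs
const sessionFile = './sessions/linkedin-session.json';
if (existsSync(sessionFile)) {
    rmSync(sessionFile); // the next run logs in from scratch and saves a new session
}

const scraper = new LinkedInScraper();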
How It Works
- Initialization: Launches Playwright browser
- Session Check: Loads saved cookies if available
- Login: Authenticates with LinkedIn (if needed)
- Verification: Automatically fetches code from Gmail API if challenged
- Search: Applies filters and navigates job listings
- Scraping: Extracts job details and company information
- Company Details: Fetches additional company data in parallel
- Cleanup: Closes browser and saves session
Troubleshooting
CAPTCHA Challenges
- Session persistence reduces CAPTCHAs by ~90%
- If CAPTCHA appears, complete it manually in the browser
- The session will be saved for future runs
Gmail Verification Not Working
# Regenerate tokens
npm run gmail:token
# Check tokens in .env
# Ensure refresh_token is valid

Session Not Loading
# Delete the saved session and log in again
rm linkedin-session.json

Examples
See the examples/ directory for more usage examples.
License
ISC
Contributing
Pull requests are welcome! For major changes, please open an issue first.
Note: This tool is for educational purposes. Please respect LinkedIn's Terms of Service and use responsibly.
