LinkedIn Job Scraper
Automated LinkedIn job scraper with session persistence and Gmail API verification code fetching.
Features
✅ Session Persistence - Saves login cookies to avoid repeated logins and CAPTCHAs
✅ Gmail API Integration - Automatically fetches verification codes from your Gmail
✅ Company Details - Extracts comprehensive company information
✅ External Apply URLs - Captures direct application links
✅ Flexible Filtering - Search by keywords, location, date, experience level, job type, and remote options
✅ TypeScript - Fully typed for better DX
Installation
npm install linkedin-job-scraper

Prerequisites
- Playwright Browser:
npx playwright install chromium
- LinkedIn Account: Valid LinkedIn credentials
- Gmail API OAuth2 Tokens (for automatic verification):
npm run gmail:token

Quick Start
- Create linkedin-scraper.config.cjs in your project root:
module.exports = {
email: '[email protected]',
password: 'your-linkedin-password',
headless: false,
gmailClientId: 'your-gmail-client-id.apps.googleusercontent.com',
gmailClientSecret: 'your-gmail-client-secret',
gmailRedirectUri: 'http://localhost',
gmailRefreshToken: 'your-gmail-refresh-token',
gmailAccessToken: 'your-gmail-access-token',
sessionFile: './sessions/linkedin-session.json',
tokenFile: './sessions/gmail-token.json',
};
- Use the scraper:
import { LinkedInScraper } from 'linkedin-job-scraper';
const scraper = new LinkedInScraper();
const jobs = await scraper.searchJobs(
{
keywords: 'Software Engineer',
location: 'San Francisco, CA',
datePosted: 'past-week',
experienceLevel: ['entry-level', 'associate'],
jobType: ['full-time'],
remote: ['remote', 'hybrid'],
},
10, // max jobs
);
console.log(`Found ${jobs.length} jobs`);
await scraper.close();

Configuration
Configuration File
Create a linkedin-scraper.config.cjs file in your project root:
module.exports = {
// LinkedIn Credentials
email: '[email protected]',
password: 'your-password',
// Browser options
headless: false, // Set to true to run without visible browser
silent: false, // Set to true to suppress console output
// Gmail API OAuth2 Credentials (for automatic verification code fetching)
gmailClientId: 'your-client-id.apps.googleusercontent.com',
gmailClientSecret: 'your-client-secret',
gmailRedirectUri: 'http://localhost',
gmailRefreshToken: 'your-refresh-token',
gmailAccessToken: 'your-access-token',
// Optional: Custom file paths
sessionFile: './sessions/linkedin-session.json',
tokenFile: './sessions/gmail-token.json',
};

Note: For security, you can load values from environment variables in the config file:
email: process.env.LINKEDIN_EMAIL,
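For example, a config that reads every secret from the environment might look like the sketch below. The variable names are only a suggested convention, not something the package requires; use whatever names you export in your shell or .env file.
// linkedin-scraper.config.cjs (sketch: credentials come from environment variables)
// The env variable names below are illustrative; pick your own.
module.exports = {
    email: process.env.LINKEDIN_EMAIL,
    password: process.env.LINKEDIN_PASSWORD,
    headless: true,
    gmailClientId: process.env.GMAIL_CLIENT_ID,
    gmailClientSecret: process.env.GMAIL_CLIENT_SECRET,
    gmailRedirectUri: 'http://localhost',
    gmailRefreshToken: process.env.GMAIL_REFRESH_TOKEN,
    gmailAccessToken: process.env.GMAIL_ACCESS_TOKEN,
};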
Important: Add linkedin-scraper.config.cjs to .gitignore if it contains sensitive credentials.
Gmail API Setup
- Generate OAuth2 tokens:
npm run gmail:token
- Follow the authorization URL printed in the terminal
- Paste the authorization code when prompted
- Tokens will be saved to token.json
- Copy the tokens to your .env file
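If you prefer not to copy values by hand, the config file can also read the generated token file directly. This is only a sketch: it assumes token.json uses the usual Google OAuth2 field names (access_token, refresh_token), so check the file your generator actually writes.
// linkedin-scraper.config.cjs (sketch: pull tokens straight from token.json)
// Assumes token.json contains access_token and refresh_token fields; verify against your file.
const fs = require('fs');
const tokens = JSON.parse(fs.readFileSync('./token.json', 'utf8'));

module.exports = {
    email: process.env.LINKEDIN_EMAIL,
    password: process.env.LINKEDIN_PASSWORD,
    gmailClientId: process.env.GMAIL_CLIENT_ID,
    gmailClientSecret: process.env.GMAIL_CLIENT_SECRET,
    gmailRedirectUri: 'http://localhost',
    gmailRefreshToken: tokens.refresh_token,
    gmailAccessToken: tokens.access_token,
};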
Usage
Basic Search
After creating your linkedin-scraper.config.cjs file:
import { LinkedInScraper } from 'linkedin-job-scraper';
const scraper = new LinkedInScraper();
try {
const jobs = await scraper.searchJobs(
{
keywords: 'Full Stack Developer',
location: 'Remote',
},
20,
);
jobs.forEach((job) => {
console.log(`${job.title} at ${job.companyName}`);
console.log(`Location: ${job.location}`);
console.log(`Apply: ${job.applyUrl}\n`);
});
} finally {
await scraper.close();
}

Advanced Filtering
const jobs = await scraper.searchJobs(
{
keywords: 'Machine Learning Engineer',
location: 'New York, NY',
datePosted: 'past-24-hours', // 'past-24-hours' | 'past-week' | 'past-month'
experienceLevel: ['mid-senior', 'director'], // 'internship' | 'entry-level' | 'associate' | 'mid-senior' | 'director' | 'executive'
jobType: ['full-time', 'contract'], // 'full-time' | 'part-time' | 'contract' | 'temporary' | 'volunteer' | 'internship'
remote: ['remote'], // 'on-site' | 'remote' | 'hybrid'
},
50,
);

Custom Configuration
All configuration is done via the linkedin-scraper.config.cjs file. Simply edit this file to customize settings:
// linkedin-scraper.config.cjs
module.exports = {
email: '[email protected]',
password: 'your-password',
headless: true, // Run in headless mode
silent: true, // Suppress console output
gmailClientId: 'your-client-id',
gmailClientSecret: 'your-client-secret',
gmailRedirectUri: 'http://localhost',
gmailRefreshToken: 'your-refresh-token',
gmailAccessToken: 'your-access-token',
sessionFile: './custom/path/session.json',
tokenFile: './custom/path/token.json',
};

API Reference
LinkedInScraper
Constructor
new LinkedInScraper();
Configuration is loaded automatically from linkedin-scraper.config.cjs in your project root.
Configuration Options:
interface LinkedInScraperConfig {
email: string; // LinkedIn email (required)
password: string; // LinkedIn password (required)
headless?: boolean; // Run browser in headless mode (default: false)
silent?: boolean; // Suppress all console output (default: false)
gmailClientId: string; // Gmail API client ID (required)
gmailClientSecret: string; // Gmail API client secret (required)
gmailRedirectUri: string; // Gmail OAuth redirect URI (required)
gmailRefreshToken: string; // Gmail OAuth refresh token (required)
gmailAccessToken: string; // Gmail OAuth access token (required)
sessionFile?: string; // Path to save LinkedIn session cookies (optional)
tokenFile?: string; // Path to save Gmail tokens (optional)
}

Methods
searchJobs(filters, maxJobs)
Searches for jobs on LinkedIn.
searchJobs(filters?: SearchFilters, maxJobs: number = 10): Promise<Job[]>
Parameters:
- filters: Search filters (optional)
- maxJobs: Maximum number of jobs to scrape (default: 10)
Returns: Array of Job objects
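A quick sketch of both call styles is shown below; the filter values are placeholders, and writing the results to disk is just one way to consume the returned Job objects, not part of the API.
import { writeFileSync } from 'fs';
import { LinkedInScraper } from 'linkedin-job-scraper';

const scraper = new LinkedInScraper();
try {
    // No arguments: no filters, up to the default of 10 jobs
    const latest = await scraper.searchJobs();

    // Filters plus an explicit cap of 25 jobs
    const remoteJobs = await scraper.searchJobs({ keywords: 'TypeScript Developer', remote: ['remote'] }, 25);

    // Each entry is a Job object; persist them however you like
    writeFileSync('./jobs.json', JSON.stringify([...latest, ...remoteJobs], null, 2));
} finally {
    await scraper.close();
}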
close()
Closes the browser and cleans up resources.
close(): Promise<void>

Types
SearchFilters
interface SearchFilters {
keywords?: string;
location?: string;
datePosted?: 'any-time' | 'past-24-hours' | 'past-week' | 'past-month';
experienceLevel?: Array<'internship' | 'entry-level' | 'associate' | 'mid-senior' | 'director' | 'executive'>;
jobType?: Array<'full-time' | 'part-time' | 'contract' | 'temporary' | 'volunteer' | 'internship'>;
remote?: Array<'on-site' | 'remote' | 'hybrid'>;
}

Job
interface Job {
id: string;
title: string;
link: string;
applyUrl: string;
location: string;
postedAt: string;
companyName: string;
companyLinkedinUrl?: string;
companyWebsite?: string;
companyDescription?: string;
companyAddress?: string;
companyEmployeesCount?: string;
description: string;
industries?: string;
}

Session Persistence
The scraper automatically saves your login session to linkedin-session.json after successful authentication. On subsequent runs:
- ✅ Loads saved cookies
- ✅ Validates session
- ✅ Skips login if session is valid
- ✅ Only logs in again if session expired
This dramatically reduces CAPTCHA challenges and speeds up execution.
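If a saved session ever misbehaves, you can force a fresh login by deleting the session file before constructing the scraper. The path in this sketch assumes the ./sessions/linkedin-session.json location used in the config examples above; match it to your own sessionFile setting.
import { existsSync, rmSync } from 'fs';
import { LinkedInScraper } from 'linkedin-job-scraper';

// Assumed path: keep this in sync with sessionFile in linkedin-scraper.config.cjs
const sessionFile = './sessions/linkedin-session.json';
if (existsSync(sessionFile)) {
    rmSync(sessionFile); // the next run logs in from scratch and saves a new session
}

const scraper = new LinkedInScraper();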
How It Works
- Initialization: Launches Playwright browser
- Session Check: Loads saved cookies if available
- Login: Authenticates with LinkedIn (if needed)
- Verification: Automatically fetches code from Gmail API if challenged
- Search: Applies filters and navigates job listings
- Scraping: Extracts job details and company information
- Company Details: Fetches additional company data in parallel
- Cleanup: Closes browser and saves session
Troubleshooting
CAPTCHA Challenges
- Session persistence reduces CAPTCHAs by ~90%
- If CAPTCHA appears, complete it manually in the browser
- The session will be saved for future runs
Gmail Verification Not Working
# Regenerate tokens
npm run gmail:token
# Check tokens in .env
# Ensure refresh_token is valid

Session Not Loading
# Delete the saved session and log in again
rm linkedin-session.json

Examples
See the examples/ directory for more usage examples.
License
ISC
Contributing
Pull requests are welcome! For major changes, please open an issue first.
Note: This tool is for educational purposes. Please respect LinkedIn's Terms of Service and use responsibly.
