marink-studentjobs
v1.0.0
Student Job Scraper
A fully automated system that scrapes IT student job listings from studentski-poslovi.hr and sends email notifications for new job postings. It uses Node.js for web scraping and email handling, and GitHub Actions for scheduled automation.
Features
- Automated Job Scraping: Fetches the latest IT job listings every 6 hours.
- Email Notifications: Sends an email summarizing new job postings.
- GitHub Actions Automation: Ensures the script runs at regular intervals with no manual intervention.
- Job History Tracking: Tracks previously scraped jobs to avoid duplicate notifications.
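The exact content of the summary email isn't documented here. As a minimal sketch, assuming each job is an object with hypothetical `title` and `link` fields, a subject line and plain-text body could be built like this. `buildJobEmail` is an illustrative name, not part of the project, and actually sending the message is left out.

```javascript
// Hypothetical job shape: { title: string, link: string }.
// Builds a plain-text notification; delivery (SMTP etc.) is out of scope.
function buildJobEmail(newJobs) {
  const count = newJobs.length;
  // Pluralize "job" only when there is more than one posting.
  const subject = `${count} new IT student job${count === 1 ? "" : "s"} on studentski-poslovi.hr`;
  // One numbered entry per job: title on the first line, link indented below.
  const lines = newJobs.map((job, i) => `${i + 1}. ${job.title}\n   ${job.link}`);
  return { subject, text: lines.join("\n\n") };
}

// Usage example with placeholder data
const email = buildJobEmail([
  { title: "Junior web developer", link: "https://example.com/job/1" },
  { title: "IT support", link: "https://example.com/job/2" },
]);
console.log(email.subject); // 2 new IT student jobs on studentski-poslovi.hr
console.log(email.text);
```

Plain text keeps the message readable in any mail client; an HTML body could be added alongside it if the mail library in use supports both.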
How It Works
- The script scrapes IT job postings from studentski-poslovi.hr.
- It compares the newly scraped jobs with previously saved jobs (previous_jobs.json).
- If new jobs are found, an email is sent to the configured recipient(s).
- The job history is updated and committed back to the repository.
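The comparison step amounts to a set lookup. This is a minimal sketch, not the project's code: it assumes each job has a `link` field that uniquely identifies the posting (the fields stored in previous_jobs.json aren't documented here), and the helper names `findNewJobs` and `updateHistory` are hypothetical.

```javascript
// Assumption: `link` uniquely identifies a job posting.
// Returns the scraped jobs whose link is not already in the history.
function findNewJobs(scrapedJobs, previousJobs) {
  const seen = new Set(previousJobs.map((job) => job.link));
  const fresh = [];
  for (const job of scrapedJobs) {
    if (!seen.has(job.link)) {
      seen.add(job.link); // also drops duplicates within the same scrape
      fresh.push(job);
    }
  }
  return fresh;
}

// Appends the new jobs so they are not reported again on the next run.
function updateHistory(previousJobs, newJobs) {
  return [...previousJobs, ...newJobs];
}

// Usage example with placeholder data
const previous = [{ title: "IT support", link: "https://example.com/job/2" }];
const scraped = [
  { title: "Junior web developer", link: "https://example.com/job/1" },
  { title: "IT support", link: "https://example.com/job/2" },
];
const fresh = findNewJobs(scraped, previous);
console.log(fresh.map((j) => j.title)); // [ 'Junior web developer' ]
const history = updateHistory(previous, fresh);
```

Using a `Set` keeps each lookup constant-time, so the comparison stays cheap even as the history file grows between runs.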
Prerequisites
- Node.js: Ensure Node.js (v20.16.0 or later) is installed.
- GitHub Secrets: Configure the following secrets in your repository:
  - EMAIL_USER: Your email address (used as the sender).
  - EMAIL_APP_PASSWORD: Your email account's app-specific password.
  - EMAIL_TO: The recipient's email address.
  - GH_TOKEN: A GitHub personal access token for committing updates to the repository.
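A script that depends on these values can fail fast with a clear message when one is missing. The sketch below is illustrative: `loadConfig` is a hypothetical helper, and splitting EMAIL_TO on commas to support multiple recipients is an assumption. GH_TOKEN is only needed for committing back to the repository, so this sketch assumes the script itself does not read it. Locally, the .env file must be loaded into `process.env` first; on Node.js 20.6 and later, `node --env-file=.env` can do this without extra packages.

```javascript
// Variables the email step needs (GH_TOKEN is assumed to be used only for pushing).
const REQUIRED_VARS = ["EMAIL_USER", "EMAIL_APP_PASSWORD", "EMAIL_TO"];

function loadConfig(env = process.env) {
  // Collect every missing name so the error lists them all at once.
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return {
    user: env.EMAIL_USER,
    appPassword: env.EMAIL_APP_PASSWORD,
    // Assumption: several recipients may be given as a comma-separated list.
    recipients: env.EMAIL_TO.split(",").map((s) => s.trim()).filter(Boolean),
  };
}

// Usage example with placeholder values
const config = loadConfig({
  EMAIL_USER: "sender@example.com",
  EMAIL_APP_PASSWORD: "app-password",
  EMAIL_TO: "a@example.com, b@example.com",
});
console.log(config.recipients); // [ 'a@example.com', 'b@example.com' ]
```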
Installation
- Clone the repository:
  git clone https://github.com/yourusername/studentjobs.git
  cd studentjobs
- Install the required dependencies:
  npm install
- Create a .env file with the following content:
  EMAIL_USER=your-email@example.com
  EMAIL_APP_PASSWORD=your-app-password
  EMAIL_TO=recipient-email@example.com
- (Optional) Add a previous_jobs.json file to track job history. An empty array is sufficient:
  []
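Because the history file is optional, the loader has to cope with its absence. A minimal sketch, assuming a missing file should be treated as an empty history (the README doesn't say how the script handles this); `loadPreviousJobs` is a hypothetical name:

```javascript
const fs = require("node:fs");

// Hypothetical helper: reads the job history from disk.
// Assumption: a missing file means "no history yet" and yields [].
function loadPreviousJobs(filePath = "previous_jobs.json") {
  let raw;
  try {
    raw = fs.readFileSync(filePath, "utf8");
  } catch (err) {
    if (err.code === "ENOENT") return []; // file not created yet
    throw err; // other I/O errors (e.g. permissions) should surface
  }
  const data = JSON.parse(raw);
  if (!Array.isArray(data)) {
    throw new Error(`${filePath} must contain a JSON array`);
  }
  return data;
}
```

Only "file not found" is treated as empty history; a corrupt or non-array file raises an error instead, so a bad history file can't silently trigger a flood of duplicate notifications.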
Automation with GitHub Actions
This project uses GitHub Actions to run the script every 6 hours. The workflow is defined in .github/workflows/scrape-jobs.yml.
Key Features of the Workflow:
- Automated Scheduling: The script runs every 6 hours on a cron schedule.
- Push Updates: Updates to previous_jobs.json are committed and pushed back to the repository.
- Manual Triggering: The workflow can also be triggered manually via the GitHub Actions interface.
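The workflow's commit step isn't shown here. One way to avoid empty commits when no new jobs appear is to rewrite previous_jobs.json only when its contents actually change. The sketch below is hypothetical (`saveJobsIfChanged` is not part of the project), and the 2-space JSON formatting is an assumption; if the existing file uses different formatting, the first run would rewrite it once.

```javascript
const fs = require("node:fs");

// Writes the history only if its serialized form differs from what is on disk.
// Returns true when the file was (re)written, false when it was already up to date.
function saveJobsIfChanged(filePath, jobs) {
  const next = JSON.stringify(jobs, null, 2) + "\n";
  let current = null;
  try {
    current = fs.readFileSync(filePath, "utf8");
  } catch (err) {
    if (err.code !== "ENOENT") throw err; // a missing file simply counts as "changed"
  }
  if (current === next) return false;
  fs.writeFileSync(filePath, next);
  return true;
}
```

A later commit-and-push step could then run only when the file differs, which keeps the repository history limited to runs that found new postings.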
License
This project is licensed under the MIT License. See the LICENSE file for details.
