coursera-scraper

v1.0.5, published 6 days ago

Unofficial Coursera scraper/downloader CLI for Node.js.

Vulnerabilities: 0 high, 0 medium, 0 low

sipayisko

coursera scraper downloader nodejs cli typescript

⚠️ Disclaimer: This project is not affiliated with, endorsed by, or sponsored by Coursera.

The CLI uses a locally saved Coursera session, validates coursera.org/learn/... URLs, discovers course, module, and lesson links, extracts video or reading content, and writes the results to local downloads/ folders. Internally, it uses Playwright for browser-driven login, navigation, and response interception.
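The URL validation step mentioned above could look roughly like the sketch below. This is not the package's actual validator: the function name `parseCourseUrl`, the accepted hostnames, and the slug character set are all assumptions made for illustration.

```typescript
// Hypothetical helper; the real validator inside coursera-scraper is not shown in this README.
function parseCourseUrl(input: string): { slug: string } | null {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return null; // not a parseable absolute URL
  }
  if (url.protocol !== "https:") return null;
  // Assumption: only the bare domain and the www subdomain are accepted.
  if (url.hostname !== "coursera.org" && url.hostname !== "www.coursera.org") {
    return null;
  }
  // Expect /learn/<slug> optionally followed by more path segments.
  // Assumption: slugs consist of letters, digits, and hyphens.
  const match = /^\/learn\/([A-Za-z0-9-]+)(?:\/|$)/.exec(url.pathname);
  const slug = match?.[1];
  return slug ? { slug } : null;
}

const parsed = parseCourseUrl("https://www.coursera.org/learn/course-slug/home/welcome");
console.log(parsed?.slug); // prints "course-slug"
```

Parsing with the built-in `URL` class rather than matching the raw string avoids being fooled by inputs such as `https://evil.example/?x=coursera.org/learn/foo`.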

Requirements

Node.js 20+
Google Chrome installed for Playwright's channel: "chrome" launch mode

Install

Install from npm

Install the CLI globally:

npm install -g coursera-scraper

Install the Chrome browser that Playwright will launch:

npx -y playwright install chrome

Start the CLI:

coursera-dl

Install from source

Install project dependencies:

npm install

Install the Chrome browser that Playwright will launch:

npx playwright install chrome

Build the TypeScript sources:

npm run build

Start the CLI:

npm run cli

First run

Run the CLI:

coursera-dl

Choose the authentication flow when prompted, complete the login in the opened Chrome window, and wait for the CLI to confirm that your local session was saved.
After login, run the CLI again and paste a course URL such as:

https://www.coursera.org/learn/course-slug/home/welcome

The downloader will save output under the local downloads/ folder.

Usage

Interactive CLI:

coursera-dl

Interactive CLI from a source checkout:

npm run cli

Direct download entry point from a source checkout:

npm run download "https://www.coursera.org/learn/course-slug/home/welcome"

Queue commands:

coursera-dl queue add "https://www.coursera.org/learn/course-slug/home/welcome" --concurrency 3
coursera-dl queue list
coursera-dl queue run
coursera-dl queue remove QUEUE_ITEM_ID
coursera-dl queue retry-failed

The persistent queue is stored in ~/.coursera-scraper/queue.json, so queued links survive restarts.
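As a rough illustration of how a JSON-backed queue can survive restarts, the sketch below loads and saves a list of items. Only the file path comes from this README; the `QueueItem` shape and the `loadQueue`/`saveQueue` names are hypothetical, and the real queue.json schema may differ.

```typescript
import { existsSync, mkdirSync, mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { homedir, tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Hypothetical item shape; the real queue.json schema is not documented here.
interface QueueItem {
  id: string;
  url: string;
  status: "pending" | "done" | "failed";
}

// Path taken from the README text above.
const DEFAULT_QUEUE_FILE = join(homedir(), ".coursera-scraper", "queue.json");

function loadQueue(file: string = DEFAULT_QUEUE_FILE): QueueItem[] {
  if (!existsSync(file)) return []; // first run: nothing queued yet
  const parsed: unknown = JSON.parse(readFileSync(file, "utf8"));
  return Array.isArray(parsed) ? (parsed as QueueItem[]) : [];
}

function saveQueue(items: QueueItem[], file: string = DEFAULT_QUEUE_FILE): void {
  mkdirSync(dirname(file), { recursive: true });
  // Write to a temporary file first, then rename it over the old one,
  // so an interrupted write does not leave a half-written queue behind.
  const tmp = `${file}.tmp`;
  writeFileSync(tmp, JSON.stringify(items, null, 2), "utf8");
  renameSync(tmp, file);
}

// Demo against a throwaway directory so the real queue is untouched.
const demoFile = join(mkdtempSync(join(tmpdir(), "queue-demo-")), "queue.json");
saveQueue(
  [{ id: "1", url: "https://www.coursera.org/learn/course-slug/home/welcome", status: "pending" }],
  demoFile,
);
console.log(loadQueue(demoFile).length); // prints 1
```

Because every change is flushed to disk, a later process can call `loadQueue` and pick up where the previous one stopped.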

Security posture

Session state is stored outside the repository in ~/.coursera-scraper/auth.json.
Downloaded filenames and folders are sanitized before being written to disk.
Downloads are restricted to https:// URLs; localhost and private-network targets are blocked, redirects are capped, and each file is limited to 2 GB.
Parallel downloads are capped to reduce accidental rate spikes.
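Two of the checks listed above, filename sanitization and the https-only/private-network restriction, can be sketched as follows. The helper names and the exact rules are assumptions for illustration, not the package's actual implementation, and the address ranges covered are not exhaustive.

```typescript
import { isIP } from "node:net";

// Hypothetical helper: replace characters that are unsafe in file names.
function sanitizeFilename(name: string): string {
  const cleaned = name
    .replace(/[\/\\:*?"<>|\x00-\x1f]/g, "_") // path separators, reserved and control characters
    .replace(/^\.+/, "") // drop leading dots so the name cannot start with ".."
    .trim();
  return cleaned.length > 0 ? cleaned.slice(0, 200) : "untitled";
}

// Hypothetical helper: true for loopback, private, and link-local hosts.
function isPrivateHost(hostname: string): boolean {
  const host = hostname.toLowerCase().replace(/^\[|\]$/g, ""); // URL wraps IPv6 in brackets
  if (host === "localhost" || host.endsWith(".localhost")) return true;
  if (isIP(host) === 4) {
    const parts = host.split(".").map(Number);
    const a = parts[0] ?? 0;
    const b = parts[1] ?? 0;
    return (
      a === 0 || a === 10 || a === 127 ||
      (a === 172 && b >= 16 && b <= 31) ||
      (a === 192 && b === 168) ||
      (a === 169 && b === 254)
    );
  }
  if (isIP(host) === 6) {
    // Loopback, unspecified, unique-local (fc00::/7), link-local (fe80::/10).
    // IPv4-mapped IPv6 addresses are not handled in this sketch.
    return host === "::1" || host === "::" || /^f[cd]/.test(host) || /^fe[89ab]/.test(host);
  }
  return false; // DNS names would also need a check after resolution
}

function isAllowedDownloadUrl(input: string): boolean {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return false;
  }
  return url.protocol === "https:" && !isPrivateHost(url.hostname);
}

console.log(sanitizeFilename("Week 1: Intro?.mp4")); // prints "Week 1_ Intro_.mp4"
console.log(isAllowedDownloadUrl("https://127.0.0.1/video.mp4")); // prints false
```

A check like this only helps if it is repeated for every redirect hop, since an allowed public URL can redirect to an internal address.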

Responsible use

Only access content you are enrolled in and allowed to access.
Do not commit auth.json, screenshots, course exports, or debug dumps.
Check SECURITY.md before opening issues.

Open source caveats

Hardening makes the repository technically safer, but publishing a public downloader for a proprietary learning platform can still carry policy, copyright, and trademark risk, and this README does not hide that. Before making the repository public, review:

Coursera Terms of Use
any local copyright exceptions or fair-use assumptions you are relying on
whether the project name and README wording imply affiliation

Development

npm run scan:sensitive
npm run lint
npm run build
npm run audit:prod

License

MIT. See LICENSE.