scrapademic
v1.1.5
Academic profile scraper (Google Scholar, etc.)
📚 Scrapademic
Scrapademic is a zero-setup JavaScript CLI and library for scraping academic profile data from Google Scholar.
Built on Puppeteer with a stealth plugin to reduce the chance of bot detection, Scrapademic extracts full publication lists, including the title, authors, journal, year, and citation count of each paper.
✨ Features
- 🔍 Scrapes Google Scholar profiles
- 🧠 Detects all publications, not just recent ones
- 📊 Sort publications by citations or year
- 🛡️ Uses stealth mode to reduce the chance of bot detection
- 📦 Use as both CLI and Node.js library
- 📃 Output to txt, json, csv, sql, or markdown
🚀 Installation
```bash
npm install -g scrapademic   # for CLI use
# or
npm install scrapademic      # for programmatic use
```
🔪 CLI Usage
```bash
scrapademic <userId> [options]
```
CLI Examples
```bash
# Scrape all publications sorted by citations (default)
scrapademic TESLA1618

# Scrape 5 recent publications sorted by year
scrapademic TESLA1618 -y -r -l 5

# Scrape 3 recent publications and save them as JSON to output.json
scrapademic TESLA1618 -r -l 3 -o json -f output.json

# Scrape all publications and save them as Markdown (scholar_output.md)
scrapademic TESLA1618 -o md
```
CLI Options
| Flag | Description |
| ----------------------- | --------------------------------------------------------- |
| `<userId>`              | Required. Google Scholar user ID (the `user` parameter in the profile URL) |
| -y, --year | Sort publications by year instead of citations |
| -a, --all | Scrape all publications (default behavior) |
| -r, --recent | Scrape only recent publications |
| -l, --limit <number> | Limit results when using --recent |
| --no-stealth | Disable stealth mode (enabled by default) |
| -o, --output <format> | Save to file: txt, json, csv, sql, md |
| -f, --file <filename> | Specify output filename (default: scholar_output.<ext>) |
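When driving the CLI from a script, the flags above can be assembled from an options object. The sketch below is a hypothetical helper (`buildScrapademicArgs` and its option names are illustrative, not part of the package); it emits only the flags documented in the table, and its rules, such as sending `--limit` only alongside `--recent`, are assumptions drawn from the descriptions above.

```javascript
// Hypothetical helper, not part of scrapademic: builds an argv array for the
// CLI using only the flags documented in the options table.
const FORMATS = ["txt", "json", "csv", "sql", "md"];

function buildScrapademicArgs(userId, opts = {}) {
  if (!userId) throw new Error("userId is required");
  const args = [userId];

  if (opts.year) args.push("--year"); // sort by year instead of citations
  if (opts.recent) {
    args.push("--recent");
    // Assumption: --limit is only meaningful together with --recent
    if (opts.limit !== undefined) args.push("--limit", String(opts.limit));
  }
  if (opts.stealth === false) args.push("--no-stealth"); // stealth is on by default

  if (opts.output) {
    if (!FORMATS.includes(opts.output)) {
      throw new Error(`unsupported output format: ${opts.output}`);
    }
    args.push("--output", opts.output);
    // Without --file, the CLI falls back to scholar_output.<ext>
    if (opts.file) args.push("--file", opts.file);
  }
  return args;
}
```

The resulting array can be passed to Node's `child_process.execFile("scrapademic", args, callback)`, which does not spawn a shell by default, so arguments need no shell quoting.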
🧪 Library Usage
```javascript
import { scrapeScholar } from "scrapademic";

// Top-level await requires an ES module (.mjs file or "type": "module" in package.json)
const data = await scrapeScholar("TESLA1618", {
  sortBy: "year",         // or "citations" (default)
  allPublications: true,  // true: fetch all, false: first page only
  limit: 10,              // max results if allPublications is false
  useStealth: true,       // default: true
});

console.log(data);
```
Example Output:
```json
[
  {
    "title": "Deep Learning for Cats",
    "authors": ["Jane Doe", "John Smith", "and others"],
    "journal": "Journal of Feline Studies",
    "year": "2022",
    "citedBy": 54
  }
]
```
scrapeScholar Options
| Option | Type | Default | Description |
| ----------------- | ------- | ----------- | ------------------------------------------------------------- |
| sortBy | String | "citations" | Sort publications by "citations" or "year" |
| allPublications | Boolean | true | If true, loads all papers by clicking "Show more" repeatedly |
| limit | Number | 6 | If allPublications is false, sets max publications to fetch |
| useStealth | Boolean | true | Use Puppeteer stealth plugin to avoid detection |
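The defaults in this table and the two sort orders can be illustrated on data shaped like the example output. This is a sketch under assumptions, not scrapademic's source: `resolveOptions` and `sortPublications` are hypothetical names, and the library may apply `sortBy` while scraping, whereas here it is applied to an already-fetched array.

```javascript
// Hypothetical sketch, not scrapademic's source. Defaults mirror the
// scrapeScholar options table.
const DEFAULTS = {
  sortBy: "citations",
  allPublications: true,
  limit: 6,
  useStealth: true,
};

function resolveOptions(options = {}) {
  const opts = { ...DEFAULTS, ...options };
  if (opts.sortBy !== "citations" && opts.sortBy !== "year") {
    throw new Error(`sortBy must be "citations" or "year", got "${opts.sortBy}"`);
  }
  return opts;
}

// "year" is a string in the example output, so parse it before comparing;
// a missing or non-numeric year becomes 0 and sorts last.
function yearOf(pub) {
  const y = Number.parseInt(pub.year, 10);
  return Number.isNaN(y) ? 0 : y;
}

function sortPublications(pubs, sortBy = "citations") {
  const copy = [...pubs]; // sort() works in place, so sort a copy
  if (sortBy === "year") {
    copy.sort((a, b) => yearOf(b) - yearOf(a)); // newest first
  } else {
    copy.sort((a, b) => (b.citedBy ?? 0) - (a.citedBy ?? 0)); // most cited first
  }
  return copy;
}
```

Sorting a copy keeps the caller's array unchanged, since `Array.prototype.sort` mutates the array it is called on.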
🧰 Output Formats
| Format | Description |
| ------ | ------------------------------------ |
| txt | Title only per line |
| json | Full structured data (default print) |
| csv | Spreadsheet-friendly |
| sql | SQL insert statements |
| md | Markdown list |
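To show what each format involves, here is one hypothetical way to render the publication objects from the example output. These functions, the `publications` table name, and the `cited_by` column are assumptions for illustration; the files scrapademic actually writes may be laid out differently.

```javascript
// Hypothetical formatters, not scrapademic's implementation.

// CSV field: always quoted, embedded double quotes doubled (RFC 4180 style).
function csvField(value) {
  return `"${String(value ?? "").replace(/"/g, '""')}"`;
}

// SQL string literal with single quotes doubled. For a live database,
// parameterized queries are safer than building SQL strings.
function sqlString(value) {
  return `'${String(value ?? "").replace(/'/g, "''")}'`;
}

// txt: one title per line
function toTxt(pubs) {
  return pubs.map((p) => p.title).join("\n");
}

// json: the full structured data, pretty-printed
function toJson(pubs) {
  return JSON.stringify(pubs, null, 2);
}

// csv: header row plus one row per publication; authors joined with "; "
function toCsv(pubs) {
  const header = "title,authors,journal,year,citedBy";
  const rows = pubs.map((p) =>
    [p.title, (p.authors ?? []).join("; "), p.journal, p.year, p.citedBy ?? 0]
      .map(csvField)
      .join(",")
  );
  return [header, ...rows].join("\n");
}

// sql: one INSERT per publication; `table` is interpolated as-is,
// so it must be a trusted identifier
function toSql(pubs, table = "publications") {
  return pubs
    .map(
      (p) =>
        `INSERT INTO ${table} (title, authors, journal, year, cited_by) VALUES (` +
        [p.title, (p.authors ?? []).join(", "), p.journal, p.year]
          .map(sqlString)
          .join(", ") +
        `, ${Number(p.citedBy) || 0});`
    )
    .join("\n");
}

// md: one bullet per publication
function toMarkdown(pubs) {
  return pubs
    .map(
      (p) =>
        `- **${p.title}** (${p.year}). ${(p.authors ?? []).join(", ")}. ` +
        `*${p.journal}*. Cited by ${p.citedBy ?? 0}.`
    )
    .join("\n");
}
```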
🤩 Tech Stack
- Node.js
- Puppeteer, with a stealth plugin to reduce bot detection
📚 Roadmap
- [x] Google Scholar scraper
- [x] CLI support with file export
- [ ] Add support for ResearchGate
- [ ] Add support for Semantic Scholar
- [ ] Automatic citation tracking
🧑‍💻 Author
Made with 🧠 by Rajieb R.
📄 License
GNU General Public License v3.0: free to use, modify, and share under the terms of the GPL v3.0.
⭐️ Star the Repo
If you find this helpful, please consider giving it a ⭐️ on GitHub!
