@blackridder22/google-news-scraper
v0.0.1
Lightweight async scraper for Google News
Google News Scraper 📰
A lightweight, asynchronous scraper for Google News that retrieves articles, resolves redirects to publisher URLs, and can filter out results that still carry Google tracking links or thumbnails.
Features
- 🔍 Search & Topics: Scrape by search term or topic URL.
- 🔗 Smart Redirect Resolution: Automatically resolves Google's "ugly" tracking URLs to the direct publisher links.
- 🖼️ High-Quality Images: Extracts high-resolution images (`og:image`) from the source article, replacing low-quality Google thumbnails.
- 🧹 Auto-Filtering: Optional strict filtering to ensure you only get data with resolved URLs and clean images.
- ⏱️ Timeframe Support: Filter news by hours, days, or years (e.g., `1h`, `7d`, `1y`).
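The image step relies on the Open Graph `og:image` meta tag that most publishers include in their HTML. As a minimal, dependency-free sketch (not the package's actual code), `extractOgImage` below is a hypothetical helper that finds that tag in a page's HTML and resolves a relative URL against the page address. A regex is used only to keep the example short; a real scraper would more likely read the tag from a parsed DOM, for example through Puppeteer.

```javascript
// Hypothetical helper: return the absolute og:image URL from an HTML string,
// or null if the page has no usable og:image tag.
function extractOgImage(html, pageUrl) {
  // Collect every <meta ...> tag; attribute order varies between sites.
  const metaTags = html.match(/<meta\b[^>]*>/gi) || [];
  for (const tag of metaTags) {
    // Some sites use property="og:image", others name="og:image".
    const property = /\b(?:property|name)\s*=\s*(["'])(.*?)\1/i.exec(tag);
    if (!property || property[2].trim().toLowerCase() !== 'og:image') continue;
    const content = /\bcontent\s*=\s*(["'])(.*?)\1/i.exec(tag);
    if (!content || content[2].trim() === '') continue;
    try {
      // Resolve relative paths such as "/images/lead.jpg" against the page URL.
      return new URL(content[2].trim(), pageUrl).href;
    } catch {
      continue; // Malformed URL in this tag; keep looking.
    }
  }
  return null; // No usable og:image tag found.
}

// Usage: returns 'https://www.nytimes.com/images/lead.jpg'
extractOgImage(
  '<meta property="og:image" content="/images/lead.jpg">',
  'https://www.nytimes.com/2025/12/22/story.html'
);
```

Returning `null` when no tag is found leaves the caller free to keep the Google thumbnail or drop the article, which is the choice the `filter` option controls.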
Installation
```bash
npm install @blackridder22/google-news-scraper
```

Quick Start

```javascript
const googleNewsScraper = require('@blackridder22/google-news-scraper');

(async () => {
  const articles = await googleNewsScraper({
    searchTerm: "Artificial Intelligence",
    prettyURLs: true,
    timeframe: "1d",
    filter: true, // Only return articles with resolved links and images
    puppeteerArgs: ['--no-sandbox'] // Often required when Chromium runs as root (e.g., in Docker)
  });
  console.log(articles);
})();
```

Configuration Options
The function accepts a configuration object with the following properties:
| Property | Type | Default | Description |
|----------|------|---------|-------------|
| searchTerm | string | null | The search query (e.g., "Crypto"). |
| baseUrl | string | ... | Alternate base URL (e.g., for specific topic pages). |
| prettyURLs | boolean | true | Resolve Google redirects to actual publisher URLs. |
| filter | boolean | false | New! If true, removes any article whose URL or image could not be resolved (i.e., still points to news.google.com). |
| timeframe | string | 7d | Filter by age: h (hours), d (days), y (years). Example: 12h. |
| puppeteerArgs | array | [] | Additional flags for Puppeteer (e.g., ['--no-sandbox']). |
| limit | number | null | Maximum number of results to return. |
| getArticleContent | boolean | false | Experimental: attempts to fetch the full article text (slow). |
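The `timeframe` value combines a number with a unit letter (`h`, `d`, or `y`). The sketch below is one hypothetical way to interpret it on the client side: `parseTimeframe` converts the string to milliseconds, and `isWithinTimeframe` checks an article's ISO `datetime` against that window. Neither function is part of this package, the package may apply the timeframe differently (for example, inside the Google News query itself), and counting a year as 365 days is a simplifying assumption.

```javascript
// Hypothetical helpers; unit letters follow the table above (h, d, y).
// Assumption: a year is treated as 365 days.
const UNIT_MS = {
  h: 60 * 60 * 1000,
  d: 24 * 60 * 60 * 1000,
  y: 365 * 24 * 60 * 60 * 1000,
};

// Convert a string such as "12h", "7d" or "1y" into milliseconds.
function parseTimeframe(timeframe) {
  const match = /^(\d+)([hdy])$/.exec(String(timeframe).trim());
  if (!match) {
    throw new Error(`Invalid timeframe "${timeframe}"; expected e.g. 1h, 7d, 1y`);
  }
  const [, amount, unit] = match;
  return Number(amount) * UNIT_MS[unit];
}

// True if an ISO-8601 datetime is no older than the timeframe, measured from `now`.
function isWithinTimeframe(datetime, timeframe, now = Date.now()) {
  const published = Date.parse(datetime);
  if (Number.isNaN(published)) return false; // Unparseable dates are rejected.
  return now - published <= parseTimeframe(timeframe);
}

// Usage: 10 hours old against a 1-day window, returns true
isWithinTimeframe('2025-12-22T10:00:00.000Z', '1d', Date.parse('2025-12-22T20:00:00.000Z'));
```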
Output Format
Returns an array of article objects:
```json
[
  {
    "title": "Example News Title",
    "link": "https://www.nytimes.com/...",
    "image": "https://www.nytimes.com/images/...",
    "source": "New York Times",
    "datetime": "2025-12-22T10:00:00.000Z",
    "time": "2 hours ago",
    "articleType": "regular"
  }
]
```

Why use filter: true?
Google News links to articles through "tracking" redirect URLs and serves its own internal thumbnail images (news.google.com/api/attachments/...).
- Without Filter: You get every scraped result, but some may still carry Google tracking URLs or protected Google-hosted images.
- With Filter: The scraper checks each result. If it cannot resolve the redirect or find a high-quality `og:image` on the publisher's site, it drops that result. You get fewer results, but every one that remains has a resolved publisher URL and image.
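The filtering rule can be expressed as a small predicate over the article objects shown in Output Format. This is a minimal sketch under the assumption, taken from the `filter` option's description, that an article counts as unresolved when its `link` or `image` is missing or still on `news.google.com`. `filterResolved` and `pointsToGoogleNews` are hypothetical names, not exports of this package, and the package's own checks may differ in detail.

```javascript
// Hypothetical check: true if a URL is missing, unparseable, or still on news.google.com.
function pointsToGoogleNews(url) {
  if (typeof url !== 'string' || url === '') return true; // Missing counts as unresolved.
  try {
    return new URL(url).hostname === 'news.google.com';
  } catch {
    return true; // Unparseable URL counts as unresolved.
  }
}

// Keep only articles whose link and image both point away from Google News.
function filterResolved(articles) {
  return articles.filter(
    (article) => !pointsToGoogleNews(article.link) && !pointsToGoogleNews(article.image)
  );
}
```

Because `Array.prototype.filter` returns a new array, the unfiltered results stay available if you want to compare how many articles were dropped.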
License
MIT
