@ducksguse/news-scraper-sdk
v1.0.0
Official Goose-Powered News Scraper SDK. Honk!
🦆 @ducksguse/news-scraper-sdk
The Official Goose-Powered News Scraper. Honk-driven intelligence for your application.
This SDK allows you to unleash our digital geese to monitor the web, scrape news, and bring you golden eggs (clusters of data).
Features
- Goose Vision: Track news by RSS feeds or Google News topics.
- Flock Clustering: Automatically groups related articles so you don't hear the same honk twice.
- Noise Filtering: Use negative_keywords to hiss at bad articles.
- Instant Flight (Backfill): The goose flies back in time (3 days) immediately after you give it a task.
- Webhooks: Get a "HONK!" notification when new data arrives.
Installation
npm install @ducksguse/news-scraper-sdk
# or
yarn add @ducksguse/news-scraper-sdk
Quick Start
import { GooseNewsClient } from '@ducksguse/news-scraper-sdk';

// Initialize the Goose
const goose = new GooseNewsClient(
  'https://your-news-service-url.com/api/v1',
  'YOUR_API_KEY'
);

// Optional: Check if goose is awake
goose.honk(); // Output: "HONK! The goose is ready to scrape."

async function main() {
  // 1. Give the Goose a Mission (Create Profile)
  const profile = await goose.createProfile({
    name: "Construction Projects - Texas",
    description: "Tracking new hotel construction and renovations in Texas",
    sources: [
      {
        type: "google_news",
        query: "hotel construction Texas",
        language: "en"
      }
    ],
    // Smart Filters
    filters: {
      green_flags: {
        "intent": ["breaking ground", "permit approved"]
      },
      // Hiss at these words (exclude them)
      negative_keywords: ["website redesign", "digital transformation"],
      min_relevance_score: 0.75
    },
    extraction_schema: {
      "project_name": "string",
      "budget": "string"
    },
    schedule: {
      initial_lookback_hours: 72, // Fly back 3 days immediately
      check_interval_hours: 4     // Check again every 4 hours
    }
  });

  console.log(`Mission accepted! Profile ID: ${profile.id}`);
  console.log("The goose has taken flight... 🪿");
}
main();
Core Concepts
Watch Profiles (Missions)
A Watch Profile is a mission you give to the goose. It tells the goose where to look (Sources), what to ignore (Negative Keywords), what to pull out of each article (Extraction Schema), and how often to check (Schedule).
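To picture what "what to ignore" might mean in practice, here is a minimal sketch of a filter that combines negative_keywords with min_relevance_score. The Article shape, the case-insensitive substring match, and the 0 to 1 score range are all assumptions made for illustration; the service's actual filtering logic is not documented here.

```typescript
// Hypothetical article shape; the SDK's real fields are not documented here.
interface Article {
  title: string;
  summary: string;
  relevance: number; // assumed to be a score between 0 and 1
}

// Assumed semantics: drop an article if any negative keyword appears
// (case-insensitively) in its text, or if it scores below the threshold.
function passesFilters(
  article: Article,
  negativeKeywords: string[],
  minRelevanceScore: number
): boolean {
  const text = `${article.title} ${article.summary}`.toLowerCase();
  const blocked = negativeKeywords.some((kw) => text.includes(kw.toLowerCase()));
  return !blocked && article.relevance >= minRelevanceScore;
}

const sample: Article[] = [
  { title: "Hotel breaks ground in Austin", summary: "Permit approved for a new tower.", relevance: 0.9 },
  { title: "Hotel chain announces website redesign", summary: "A digital refresh.", relevance: 0.8 },
  { title: "Texas weather update", summary: "Rain expected.", relevance: 0.2 },
];

// Same keywords and threshold as the Quick Start profile.
const kept = sample.filter((a) =>
  passesFilters(a, ["website redesign", "digital transformation"], 0.75)
);
console.log(kept.map((a) => a.title)); // only the Austin article survives
```

The second article gets hissed at because of the keyword, the third because of its low score.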
Clusters (Golden Eggs)
The goose doesn't just bring you random sticks. It groups related articles into Clusters. If 10 sources write about the same event, you get 1 Cluster.
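The idea of collapsing many articles into one Cluster can be sketched with a simple word-overlap (Jaccard) comparison of headlines. This is only an illustration of the concept: the Headline shape, the 0.5 threshold, and the greedy grouping are assumptions, and the service may well use a different technique.

```typescript
// Hypothetical headline shape for illustration only.
interface Headline {
  source: string;
  title: string;
}

// Lowercased word set of a title, ignoring punctuation and very short words.
function titleTokens(title: string): Set<string> {
  const words = title
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 2);
  return new Set(words);
}

// Jaccard similarity: shared words divided by total distinct words.
function jaccard(a: Set<string>, b: Set<string>): number {
  let shared = 0;
  a.forEach((w) => {
    if (b.has(w)) shared += 1;
  });
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Greedy grouping: a headline joins the first cluster whose first member
// is similar enough, otherwise it starts a new cluster.
function clusterHeadlines(items: Headline[], threshold = 0.5): Headline[][] {
  const clusters: Headline[][] = [];
  for (const item of items) {
    const home = clusters.find(
      (c) => jaccard(titleTokens(c[0].title), titleTokens(item.title)) >= threshold
    );
    if (home) {
      home.push(item);
    } else {
      clusters.push([item]);
    }
  }
  return clusters;
}

const eggs = clusterHeadlines([
  { source: "A", title: "Marriott breaks ground on Austin hotel" },
  { source: "B", title: "Marriott breaks ground on new Austin hotel" },
  { source: "C", title: "Houston approves permit for convention center" },
]);
console.log(eggs.length); // 2
```

The two Marriott headlines share five of six distinct words, so they land in the same cluster; the Houston story gets its own.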
Backfill (Time Travel)
When you create a profile, the goose immediately performs a Backfill. It scrapes the history (default: 72 hours) so you get data instantly.
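The backfill window itself is simple arithmetic on initial_lookback_hours. The helper names below are hypothetical and the inclusive window boundaries are an assumption; the sketch only shows how a 72-hour lookback maps to a start time.

```typescript
// Start of the backfill window: `now` minus the lookback.
function backfillStart(now: Date, initialLookbackHours = 72): Date {
  const MS_PER_HOUR = 60 * 60 * 1000;
  return new Date(now.getTime() - initialLookbackHours * MS_PER_HOUR);
}

// Assumed behaviour: an article is covered by the backfill if it was
// published inside the window (boundaries included).
function inBackfillWindow(publishedAt: Date, now: Date, hours = 72): boolean {
  const t = publishedAt.getTime();
  return t >= backfillStart(now, hours).getTime() && t <= now.getTime();
}

const createdAt = new Date("2024-05-10T12:00:00Z");
console.log(backfillStart(createdAt).toISOString()); // 2024-05-07T12:00:00.000Z
console.log(inBackfillWindow(new Date("2024-05-08T09:00:00Z"), createdAt)); // true
console.log(inBackfillWindow(new Date("2024-05-01T09:00:00Z"), createdAt)); // false
```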
API Reference
goose.createProfile(data)
Creates a new mission. Triggers immediate backfill.
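For reference, the data argument could be typed roughly as follows. The field names are taken from the Quick Start example; which fields are optional, the allowed value ranges, and the checkProfile helper are assumptions and not part of the SDK.

```typescript
// Field names come from the Quick Start example; optionality and
// value ranges are assumptions.
interface NewsSource {
  type: string; // "google_news" appears in the Quick Start
  query?: string;
  language?: string;
}

interface WatchProfileInput {
  name: string;
  description?: string;
  sources: NewsSource[];
  filters?: {
    green_flags?: Record<string, string[]>;
    negative_keywords?: string[];
    min_relevance_score?: number;
  };
  extraction_schema?: Record<string, string>;
  schedule?: {
    initial_lookback_hours?: number;
    check_interval_hours?: number;
  };
}

// Hypothetical client-side check; returns a list of problems (empty if none).
function checkProfile(p: WatchProfileInput): string[] {
  const problems: string[] = [];
  if (p.name.trim() === "") problems.push("name must not be empty");
  if (p.sources.length === 0) problems.push("at least one source is required");
  const score = p.filters?.min_relevance_score;
  if (score !== undefined && (score < 0 || score > 1)) {
    problems.push("min_relevance_score is assumed to lie between 0 and 1");
  }
  const interval = p.schedule?.check_interval_hours;
  if (interval !== undefined && interval <= 0) {
    problems.push("check_interval_hours must be positive");
  }
  return problems;
}

const texasMission: WatchProfileInput = {
  name: "Construction Projects - Texas",
  sources: [{ type: "google_news", query: "hotel construction Texas", language: "en" }],
  filters: { negative_keywords: ["website redesign"], min_relevance_score: 0.75 },
  schedule: { initial_lookback_hours: 72, check_interval_hours: 4 },
};
console.log(checkProfile(texasMission)); // []
```

Running a check like this before calling goose.createProfile(data) catches obvious mistakes without sending the goose on a doomed mission.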
goose.getProfiles()
Lists all active missions.
goose.getProfileClusters(profileId)
Gets the latest golden eggs (clusters) for a specific mission.
goose.honk()
Verifies the client is initialized. HONK!
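Put together, the read methods above might be used to collect the latest clusters for every mission. The return types are not documented in this README, so the ProfileSummary and ClusterSummary shapes and the promise-returning signatures in GooseLike are assumptions; only profile.id is confirmed by the Quick Start.

```typescript
// Assumed shapes: the README does not document return types.
interface ProfileSummary {
  id: string;
  name: string;
}

interface ClusterSummary {
  id: string;    // hypothetical field
  title: string; // hypothetical field
}

// The subset of the client this sketch relies on; signatures are assumptions.
interface GooseLike {
  getProfiles(): Promise<ProfileSummary[]>;
  getProfileClusters(profileId: string): Promise<ClusterSummary[]>;
}

// Fetches clusters for every active mission, keyed by mission name.
async function collectGoldenEggs(
  goose: GooseLike
): Promise<Record<string, ClusterSummary[]>> {
  const eggsByMission: Record<string, ClusterSummary[]> = {};
  const profiles = await goose.getProfiles();
  for (const profile of profiles) {
    eggsByMission[profile.name] = await goose.getProfileClusters(profile.id);
  }
  return eggsByMission;
}
```

If the real GooseNewsClient matches these signatures, it can be passed in directly; otherwise a thin adapter around it would do.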
License
ISC - Made with ❤️ and 🪿 by DucksGuse.
