@aikeytake/social-automation

v2.1.0

Content research and aggregation tool for AI agents

Social Automation — Content Research Tool

Content aggregation tool that scrapes AI news from multiple sources and stores structured JSON for AI agents to consume.

What It Does

  • Scrapes 17 RSS feeds (TechCrunch, OpenAI, Anthropic, Claude Blog, Google AI, DeepMind, HuggingFace, arXiv, and more)
  • Scrapes Reddit (7 AI subreddits, top posts with 100+ upvotes)
  • Scrapes Hacker News (AI-related stories with 50+ points)
  • Scrapes LinkedIn KOL posts via BrightData SERP (top 20 KOLs from your list)
  • Outputs a trending.json with the top 20 ranked items
  • Everything saved as structured JSON for AI agents

Quick Start

cd /home/vankhoa/projects/social-automation
npm install
npm run scrape

The Only Command You Need

npm run scrape

Output saved to data/YYYY-MM-DD/:

| File | Contents |
|------|----------|
| all.json | All items from all sources combined |
| trending.json | Top 20 items ranked by engagement score |
| rss.json | All RSS feed items |
| reddit.json | All Reddit posts |
| hackernews.json | All Hacker News stories |
| linkedin.json | LinkedIn KOL posts via BrightData |

Project Structure

social-automation/
├── src/
│   ├── fetchers/
│   │   ├── rss.js          # 17 RSS feeds
│   │   ├── reddit.js       # 7 AI subreddits
│   │   ├── hackernews.js   # HN top stories
│   │   └── linkedin.js     # LinkedIn KOL posts via BrightData SERP
│   ├── utils/
│   │   └── logger.js
│   ├── cli.js
│   └── index.js            # Main scraper
├── config/
│   └── sources.json        # All source configuration
├── data/
│   └── YYYY-MM-DD/         # Daily scraped output
├── .env                    # API keys
└── package.json

Configuration

Environment Variables (.env)

Already configured. Key variables:

BRIGHTDATA_API_KEY=...        # Used for LinkedIn KOL scraping
BRIGHTDATA_ZONE=mcp_unlocker  # BrightData zone
ANTHROPIC_API_KEY=...         # Claude API (for future AI processing)

Sources (config/sources.json)

RSS Feeds (17 sources):

  • TechCrunch AI, The Gradient, MIT Technology Review AI
  • OpenAI Blog, Anthropic Blog, Claude Blog
  • Google AI Blog, DeepMind Blog, Hugging Face Blog
  • Meta Engineering, Netflix Tech Blog, AWS ML Blog
  • Microsoft AI Blog, NVIDIA Blog, LinkedIn Engineering
  • arXiv AI (cs.AI), arXiv Machine Learning (cs.LG)

Reddit: MachineLearning, artificial, ArtificialIntelligence, deeplearning, OpenAI, LocalLLaMA, singularity

Hacker News: keyword-filtered (AI, LLM, GPT, Anthropic, etc.), 50+ points

LinkedIn: top 20 KOLs from workspace/marketing/linkedin_kol_clean.json, scraped via BrightData SERP

Adding an RSS Feed

Edit config/sources.json:

{
  "rssFeeds": [
    {
      "name": "My Blog",
      "url": "https://example.com/feed.xml",
      "category": "ai-news",
      "enabled": true
    }
  ]
}
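Before enabling a new feed, it can help to sanity-check that the entry matches the shape shown above. A minimal sketch (the helper name and validation rules are illustrative, not part of the package):

```javascript
// Hypothetical helper: check a feed entry has the shape shown above
// before adding it to config/sources.json.
function validateFeedEntry(entry) {
  const errors = [];
  if (typeof entry.name !== "string" || entry.name.trim() === "") {
    errors.push("name must be a non-empty string");
  }
  try {
    const url = new URL(entry.url);
    if (url.protocol !== "http:" && url.protocol !== "https:") {
      errors.push("url must use http or https");
    }
  } catch {
    errors.push("url must be a valid URL");
  }
  if (typeof entry.enabled !== "boolean") {
    errors.push("enabled must be true or false");
  }
  return errors; // empty array means the entry looks valid
}
```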

Adjusting LinkedIn KOL Limit

Edit config/sources.json:

{
  "linkedin": {
    "limit": 20
  }
}

Reading the Data

# View today's trending items
cat data/$(date +%Y-%m-%d)/trending.json | jq '.items[] | {rank, title, score}'

# View all items from a specific source
cat data/$(date +%Y-%m-%d)/all.json | jq '[.items[] | select(.source == "reddit")]'

# Search by keyword
cat data/$(date +%Y-%m-%d)/all.json | jq '[.items[] | select(.title | contains("GPT"))]'

# View LinkedIn KOL posts
cat data/$(date +%Y-%m-%d)/linkedin.json | jq '.items[]'

Using with AI Agents

Point the agent at today's data folder:

Read data/$(date +%Y-%m-%d)/trending.json and create a LinkedIn post about the top trending AI story.

Or for deeper research:

Read data/$(date +%Y-%m-%d)/all.json and summarize the most important AI developments from the last 24 hours.

Browser-Based Sources (Twitter/X & LinkedIn Browser)

Two sources use a real Chrome browser via Playwright to scrape without an API: Twitter/X and LinkedIn Browser. They share the same browser profile stored at data/playwright-profile/.

One-Time Setup

Run the setup script once to log in and save the browser session:

npm run setup:twitter

This opens a real Chrome window. Log in to both X and LinkedIn in that window (they share the same profile). Once you're logged in to both, close the window — the session is saved automatically.

⚠️ Use a dedicated scraping account, not your personal account. Sessions last several weeks. Re-run npm run setup:twitter when you see auth errors.


Twitter / X

Enable in config/sources.json:

"trendingSources": {
  "twitter": {
    "enabled": true,
    "accounts": ["AndrewYNg", "ylecun", "OpenAI", "AnthropicAI", "karpathy"],
    "minLikes": 100,
    "maxTweetsPerAccount": 5,
    "maxAgeHours": 24,
    "delayBetweenAccountsMs": 3000
  }
}

Config options:

| Key | Description | Default |
|-----|-------------|---------|
| accounts | X handles to scrape (without @) | [] |
| minLikes | Skip tweets below this like count | 0 |
| maxTweetsPerAccount | Max tweets to fetch per account | 10 |
| maxAgeHours | Only include tweets from last N hours | 24 |
| delayBetweenAccountsMs | Base delay between accounts (ms) | 3000 |
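The minLikes and maxAgeHours options amount to a simple predicate over fetched tweets. A sketch of that filtering logic (the field names `likes` and `timestampMs` are assumptions for illustration, not the scraper's actual schema):

```javascript
// Drop tweets below the like threshold or older than maxAgeHours.
// `likes` and `timestampMs` are assumed field names for illustration.
function filterTweets(tweets, { minLikes = 0, maxAgeHours = 24 } = {}, nowMs = Date.now()) {
  const cutoff = nowMs - maxAgeHours * 60 * 60 * 1000;
  return tweets.filter(t => t.likes >= minLikes && t.timestampMs >= cutoff);
}
```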

Run:

npm run test:twitter   # isolated test, prints results, no files written
npm run scrape         # full pipeline

How it works:

  • Visits X home feed first, then searches for each account via the search box
  • Clicks the matching result to navigate to the profile
  • Scrolls the timeline and extracts top N tweets
  • Applies a random 20–30s delay between accounts to avoid rate limiting
  • Account visit order is randomised each run
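The last two behaviors above can be sketched as plain helpers (illustrative only; the package's internals may differ):

```javascript
// Fisher-Yates shuffle: randomise account visit order without mutating input.
function shuffle(accounts) {
  const a = [...accounts];
  for (let i = a.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}

// Random delay in the 20-30s window described above.
function randomDelayMs(minMs = 20000, maxMs = 30000) {
  return minMs + Math.floor(Math.random() * (maxMs - minMs));
}
```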

LinkedIn Browser

Scrapes posts from LinkedIn profiles using direct URL navigation to their recent activity page.

Enable in config/sources.json:

"linkedin_browser": {
  "enabled": true,
  "accounts": ["julienchaumond", "another-slug"],
  "maxPostsPerAccount": 5,
  "maxAgeHours": 48,
  "delayBetweenAccountsMs": 10000
}

The accounts value is the LinkedIn profile slug — the part after linkedin.com/in/.
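A small sketch for extracting the slug from a full profile URL and building the recent-activity page the scraper visits (helper names are illustrative, not the package's API):

```javascript
// Extract the profile slug from a linkedin.com/in/... URL.
function profileSlug(profileUrl) {
  const m = new URL(profileUrl).pathname.match(/^\/in\/([^/]+)/);
  return m ? m[1] : null;
}

// Build the recent-activity URL from a slug.
function activityUrl(slug) {
  return `https://www.linkedin.com/in/${slug}/recent-activity/all/`;
}
```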

Config options:

| Key | Description | Default |
|-----|-------------|---------|
| accounts | LinkedIn profile slugs to scrape | [] |
| maxPostsPerAccount | Max posts to fetch per account | 5 |
| maxAgeHours | Only include posts from last N hours | 48 |
| delayBetweenAccountsMs | Base delay between accounts (ms) | 10000 |

Run:

npm run test:linkedin   # isolated test, prints results, no files written
npm run scrape          # full pipeline

How it works:

  • Navigates directly to linkedin.com/in/{slug}/recent-activity/all/
  • Scrolls to load posts, extracts text, reactions, comments, and time
  • Post URL is constructed from LinkedIn's data-urn attribute
  • Account visit order is randomised each run

Output files

| File | Source |
|------|--------|
| data/YYYY-MM-DD/twitter.json | Twitter/X posts |
| data/YYYY-MM-DD/linkedin_browser.json | LinkedIn browser posts |

Both sources feed into all.json and trending.json automatically.


Troubleshooting

LinkedIn returns 0 items:

  • Check logs for BrightData errors: cat logs/*.log | grep -i linkedin
  • Confirm the KOL file exists: ls /home/vankhoa/projects/aikeytake/workspace/marketing/linkedin_kol_clean.json
  • The BrightData zone mcp_unlocker must exist in your BrightData account

RSS feed fails:

  • Some feeds go down temporarily — the scraper skips them and continues
  • Check logs in logs/ for specific feed errors

No data for today:

# Run the scraper
npm run scrape

# Check if data folder was created
ls data/$(date +%Y-%m-%d)/