ai_news

v1.0.0

AI news aggregation system: uses Gemini to analyze web page content and generate Chinese-language news digests

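AI News Aggregation System

An automated AI news aggregation system that uses Gemini Pro to analyze web page content and produce Chinese-language news digests, with optional delivery by email.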

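Features

  • Multi-source scraping: supports a manually maintained URL list
  • Page link extraction: automatically extracts article links from listing pages (such as HackerNews)
  • URL pre-validation: filters out invalid links to avoid wasting Gemini quota
  • Smart deduplication: URL normalization plus file-based persistence
  • Gemini Web automation: drives a browser with Playwright to submit queries and collect the analysis automatically
  • Concurrent processing: supports multiple Gemini sessions running in parallel
  • Content summarization: groups by category, sorts, and generates Markdown
  • Format conversion: Markdown → HTML (email-friendly)
  • Email delivery: sends to a QQ Mail inbox via SMTP
  • Scheduling: supports cron scheduled tasks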
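
The deduplication step (URL normalization plus a persisted set of seen URLs) could look roughly like the sketch below. This is an illustration, not the package's actual code: `normalizeUrl`, `filterNew`, and the choice to strip `utm_*` parameters are all assumptions. In a real run the `seen` set would be loaded from and written back to a file.

```javascript
// Hypothetical helper: reduce a URL to a canonical form so that trivially
// different spellings of the same address compare equal.
function normalizeUrl(raw) {
  const url = new URL(raw);            // throws on malformed input
  url.hash = "";                       // fragments are never sent to the server
  // Drop common tracking parameters (this list is an assumption).
  for (const key of [...url.searchParams.keys()]) {
    if (key.startsWith("utm_")) url.searchParams.delete(key);
  }
  url.searchParams.sort();             // make parameter order irrelevant
  // Remove a trailing slash, except on the root path.
  if (url.pathname.length > 1 && url.pathname.endsWith("/")) {
    url.pathname = url.pathname.slice(0, -1);
  }
  return url.toString();
}

// Return only the URLs whose normalized form is not yet in `seen`,
// recording each new one in `seen` along the way.
function filterNew(urls, seen) {
  const fresh = [];
  for (const raw of urls) {
    let key;
    try { key = normalizeUrl(raw); } catch { continue; } // skip invalid URLs
    if (!seen.has(key)) {
      seen.add(key);
      fresh.push(raw);
    }
  }
  return fresh;
}
```

With this sketch, `https://example.com/a/` and `https://example.com/a#x` count as the same article, so only the first is kept.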

Installation

# Install dependencies
npm install

# Initialize the project
npm run init

Configuration

  1. Edit the .env file to configure QQ Mail:
[email protected]
QQ_EMAIL_PASS=your_authorization_code

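To get an authorization code: log in to QQ Mail → Settings → Account → enable the SMTP service → generate an authorization code

  2. Configure the data sources in data/sources/urls.json: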
{
  "sources": [
    {
      "url": "https://news.ycombinator.com/",
      "type": "dom",
      "linkSelector": ".titleline > a",
      "titleSelector": ".titleline",
      "excludePatterns": ["from\\?site="]
    },
    {
      "url": "https://www.theverge.com/",
      "type": "sitemap"
    },
    {
      "url": "https://techcrunch.com/",
      "type": "rss",
      "feedUrl": "https://techcrunch.com/feed/"
    },
    {
      "url": "https://openai.com/blog",
      "type": "auto"
    }
  ]
}

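Data source types

  • rss: extract links from an RSS feed
  • sitemap: extract links from a sitemap
  • dom: extract links from the page using CSS selectors
  • auto: auto-detect (tries RSS first, then Sitemap)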
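
The `auto` type's "try RSS, then Sitemap" order can be pictured as a simple fallback chain. The sketch below is only an assumption about how such a chain might be written: `extractAuto` and the `extractors` map are hypothetical names, and the real extractors would fetch and parse remote documents.

```javascript
// Try each extraction strategy in order and return the first one that yields links.
// `extractors` is a hypothetical map of strategy name -> async function(url) -> string[].
async function extractAuto(url, extractors, order = ["rss", "sitemap"]) {
  for (const name of order) {
    try {
      const links = await extractors[name](url);
      if (links.length > 0) return { strategy: name, links };
    } catch (err) {
      // This strategy failed (for example, no feed found); fall through to the next one.
    }
  }
  return { strategy: null, links: [] };  // nothing worked
}
```

A strategy that throws and a strategy that returns an empty list are treated the same way here: both hand over to the next entry in `order`.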

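DOM type configuration

  • linkSelector: CSS selector for the link elements
  • titleSelector: CSS selector for the title elements (optional)
  • excludePatterns: regular-expression patterns for URLs to exclude (optional)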
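
As a rough illustration of how `excludePatterns` might be applied, the sketch below compiles each pattern into a `RegExp` and drops any link that matches. The function name `applyExcludePatterns` and the sample links are assumptions made for this example.

```javascript
// Keep only the links that match none of the excludePatterns regexes.
function applyExcludePatterns(links, excludePatterns = []) {
  const regexes = excludePatterns.map((pattern) => new RegExp(pattern));
  return links.filter((link) => !regexes.some((re) => re.test(link)));
}

const sampleLinks = [
  "https://example.com/article",
  "https://news.ycombinator.com/from?site=example.com",
];
// Same pattern as in the urls.json example; "\\?" escapes the literal "?".
const kept = applyExcludePatterns(sampleLinks, ["from\\?site="]);
console.log(kept);
```

Only the first sample link survives, because the second contains `from?site=`.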

Usage

Command line

# Scrape URLs and analyze them with Gemini
npm run scrape

# Generate the digest and send the email
npm run generate
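
The digest step groups items by category, sorts them, and emits Markdown. A minimal sketch of that idea follows. The item shape (`title`, `url`, `category`, `score`) and the choice to sort by a numeric `score` are assumptions, since the README does not say what the sort key is.

```javascript
// Build a Markdown digest: one "## Category" section per category,
// categories in alphabetical order, items by descending score (assumed key).
function buildDigest(items) {
  const groups = new Map();
  for (const item of items) {
    if (!groups.has(item.category)) groups.set(item.category, []);
    groups.get(item.category).push(item);
  }
  const lines = [];
  for (const category of [...groups.keys()].sort()) {
    lines.push(`## ${category}`, "");
    const sorted = groups.get(category).sort((a, b) => b.score - a.score);
    for (const { title, url } of sorted) lines.push(`- [${title}](${url})`);
    lines.push("");
  }
  return lines.join("\n").trimEnd() + "\n";
}
```

The resulting string can then be passed to a Markdown-to-HTML converter before the email is sent.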

Scheduled tasks

Use the cron skill to configure scheduled tasks:

# Scrape every hour
0 * * * * npm run scrape

# Generate the digest every day at 8:00 AM
0 8 * * * npm run generate

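Project structure

ai_news/
├── src/
│   ├── managers/          # Managers (URLs, deduplication, storage)
│   ├── scrapers/          # Scrapers (URL validation, Gemini control, list extraction)
│   ├── processors/        # Content processing
│   ├── generators/        # Digest generation (Markdown, HTML)
│   ├── email/             # Email delivery
│   └── scheduler/         # Scheduled tasks
├── scripts/               # Command-line scripts
├── data/                  # Data directory
├── output/                # Output directory
└── config/                # Configuration files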

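Configuration options

The following options can be set in config/config.json (the // comments are annotations for this README; strict JSON does not allow comments, so omit them if the file is parsed as plain JSON):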

{
  "pageExtraction": {
    "enabled": true,           // Whether to enable page link extraction
    "headless": true,          // Whether to run the browser in headless mode
    "timeout": 30000,          // Page load timeout (milliseconds)
    "selectors": [             // CSS selectors used to extract links
      "article a[href]",
      ".post a[href]",
      ".item a[href]"
    ],
    "filteredDomains": [       // Domains to filter out
      "facebook.com",
      "twitter.com",
      "x.com"
    ]
  },
  "gemini": {
    "concurrency": 2,          // Number of concurrent Gemini tasks
    "chatUrls": [              // URLs of the Gemini sessions to use
      "https://gemini.google.com/app/xxx"
    ]
  }
}
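
To show what `gemini.concurrency` and `gemini.chatUrls` could mean in practice, here is a small worker-pool sketch that keeps at most `concurrency` tasks in flight and pins each worker to one chat URL. It is an assumption about the design, not the package's implementation. `runPool` and the `task` callback are hypothetical names.

```javascript
// Run task(item, chatUrl) over all items with at most `concurrency` in flight.
// Each worker is pinned to one chat URL, chosen round-robin from `chatUrls`.
async function runPool(items, { concurrency, chatUrls }, task) {
  const results = new Array(items.length);
  let next = 0;                        // index of the next unclaimed item

  async function worker(workerId) {
    const chatUrl = chatUrls[workerId % chatUrls.length];
    while (next < items.length) {
      const i = next++;                // claimed synchronously, so no item is taken twice
      results[i] = await task(items[i], chatUrl);
    }
  }

  const count = Math.min(concurrency, items.length);
  await Promise.all(Array.from({ length: count }, (_, id) => worker(id)));
  return results;                      // same order as `items`
}
```

Results come back in input order regardless of which worker finished first. A task that throws rejects the whole pool in this sketch, so a production version would probably catch and record errors per item.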

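Notes

  1. First use: on the first run you need to log in to Gemini manually in the browser
  2. Session expiry: the Gemini session expires over time, after which you need to log in again
  3. Network stability: Gemini response times are unpredictable, so set a reasonable timeout
  4. Email frequency: QQ Mail may apply anti-spam measures, so watch how often you send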
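
For the timeout advice in note 3, one common pattern is to race the slow operation against a timer. The helper below is a generic sketch. `withTimeout` is a hypothetical name and is not part of this package or of Playwright.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label = "operation") {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; the timer is always cleared afterwards
  // so it cannot keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Note that the wrapper only stops waiting. It does not cancel the underlying work, so a browser page left mid-query would still need to be closed or reset by the caller.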

Tech stack

  • Node.js
  • Playwright (browser automation)
  • marked (Markdown to HTML)
  • juice (CSS inlining)
  • nodemailer (email delivery)

License

ISC