
@talent-scout/data-processor

v0.1.1


GitHub Actions npm: @talent-scout/data-processor Node.js License: MIT

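@talent-scout/data-processor turns raw leads into candidates that can be evaluated. It handles merging, deduplication, identity recognition, and rule-based scoring, and it marks the dividing line between the "lead pool" and the "candidate pool".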

Development prerequisites

  • Node.js 22+
  • pnpm 10+
  • gh installed and logged in

Install dependencies from the repository root:

pnpm install

Common commands

pnpm --filter @talent-scout/data-processor run process
pnpm --filter @talent-scout/data-processor run validate:identity
pnpm --filter @talent-scout/data-processor run build

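process reads the latest raw output and writes its results to workspace-data/output/processed/<timestamp>/

Core modules

  • src/cli.ts: processing entry point
  • src/merge.ts: candidate merging and signal deduplication
  • src/identity.ts: Chinese developer identity recognition
  • src/scoring.ts: rule-based scoring
  • src/query.ts: reads the processed data
  • src/validate-identity.ts: debugging script for identity recognition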

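Design philosophy

1. Chinese identity is a hard filter, not a bonus

The target population of this project is "Chinese developers worth following" in the first place. Identity should therefore be decided before evaluation, rather than folding "is this a Chinese developer" into the final score, which would conflate the target profile with the assessment of ability.

2. Combine weak signals with noisy-or

Identity recognition does not stop at a single rule hit. It follows a noisy-or approach that accumulates weak signals such as location, email domain, Chinese bio, Chinese commits, pinyin name, and UTC+8 activity pattern into an overall confidence, where p_i is the confidence contributed by the i-th signal:

$$ P(\text{Chinese}) = 1 - \prod_i (1 - p_i) $$

The benefits of this approach:

  • A strong signal can raise confidence directly
  • Several medium or weak signals stack naturally
  • With no signals, no high score is fabricated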
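
The noisy-or formula above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the code in src/identity.ts: the IdentitySignal shape, the signal names, and the probability values are all assumptions made for the example.

```typescript
// Hypothetical signal shape; the real identity.ts may model signals differently.
interface IdentitySignal {
  name: string;
  probability: number; // p_i, expected to lie in [0, 1]
}

// Noisy-or: P = 1 - prod_i (1 - p_i).
function noisyOr(signals: IdentitySignal[]): number {
  let missProduct = 1; // probability that every signal "misses"
  for (const s of signals) {
    // Clamp defensively so an out-of-range value cannot push P outside [0, 1].
    const p = Math.min(1, Math.max(0, s.probability));
    missProduct *= 1 - p;
  }
  return 1 - missProduct;
}

// One strong signal dominates on its own (illustrative value).
const strong = noisyOr([{ name: "location", probability: 0.9 }]);

// Three medium or weak signals stack: 1 - 0.6 * 0.7 * 0.5.
const stacked = noisyOr([
  { name: "pinyinName", probability: 0.4 },
  { name: "utc8Activity", probability: 0.3 },
  { name: "emailDomain", probability: 0.5 },
]);

// No signals: the empty product is 1, so the confidence is 0.
const none = noisyOr([]);

console.log({ strong, stacked, none });
```

With these made-up inputs, the three weak signals together reach roughly 0.79, while an empty signal list stays at 0, which matches the three properties listed above.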

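3. Rule scoring covers only the "stable, explainable" part

scoring.ts only computes dimensions that can be derived reliably from profile and repo features, for example:

  • stars, followers, language diversity, number of recently active months
  • fork ratio
  • anti-patterns such as trend chasing and bulk forking

Gray-zone identities and deep technical judgment are left to @talent-scout/ai-evaluator

Implementation flow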
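
To make "stable, explainable" concrete, here is a rough sketch of a rule score that returns a per-dimension breakdown alongside the total. It is not the formula used by scoring.ts: every field name, cap, weight, and the 0.8 fork-ratio threshold below is a hypothetical value picked for illustration.

```typescript
// Hypothetical feature set; the real profile/repo features may differ.
interface ProfileFeatures {
  stars: number;
  followers: number;
  languageCount: number;
  recentActiveMonths: number; // active months within the last 12
  forkRatio: number; // forked repos / all repos, in [0, 1]
}

interface ScorePart {
  dimension: string;
  points: number;
}

interface RuleScore {
  total: number;
  parts: ScorePart[]; // kept so every point can be traced to a rule
}

// Scale a raw value linearly up to `cap`, then multiply by `weight`.
function capped(value: number, cap: number, weight: number): number {
  return Math.min(Math.max(value, 0) / cap, 1) * weight;
}

function ruleScore(f: ProfileFeatures): RuleScore {
  const parts: ScorePart[] = [
    { dimension: "stars", points: capped(f.stars, 100, 40) },
    { dimension: "followers", points: capped(f.followers, 50, 20) },
    { dimension: "languages", points: capped(f.languageCount, 5, 15) },
    { dimension: "activity", points: capped(f.recentActiveMonths, 12, 25) },
    // Anti-pattern: a mostly-forked profile gets a flat penalty (assumed rule).
    { dimension: "forkPenalty", points: f.forkRatio > 0.8 ? -20 : 0 },
  ];
  let total = 0;
  for (const p of parts) total += p.points;
  return { total: Math.max(total, 0), parts };
}

const example = ruleScore({
  stars: 50,
  followers: 25,
  languageCount: 5,
  recentActiveMonths: 6,
  forkRatio: 0.9,
});
console.log(example.total, example.parts);
```

Returning the parts next to the total is the design choice worth noting: when a candidate looks mis-scored, the breakdown shows which rule contributed what, without any model to reverse-engineer.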

flowchart LR
  A[raw signals] --> B[merge.ts]
  B --> C[identity.ts]
  C --> D[scoring.ts]
  D --> E[processed/<timestamp>]

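Algorithm trade-offs

  • The merge stage deduplicates by signal source, so the same event is not scored more than once across multiple collectors
  • Identity recognition keeps a gray-zone interval and hands it to the AI for a second judgment, rather than forcing a binary classification
  • Scoring prioritizes explainability and avoids introducing hard-to-maintain black-box models at the rule layer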
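
The first two trade-offs can be sketched as follows. Both functions are illustrative assumptions rather than the contents of src/merge.ts or src/identity.ts: the RawSignal fields, the choice of dedup key (candidate plus event, ignoring which collector reported it), and the 0.3 / 0.7 thresholds are all hypothetical.

```typescript
// Hypothetical raw signal shape emitted by collectors.
interface RawSignal {
  candidate: string; // e.g. a GitHub login
  collector: string; // which collector produced the signal
  eventId: string; // stable id of the underlying event
  weight: number;
}

// Keep the first signal per (candidate, event) so that an event seen by
// several collectors contributes only once.
function dedupeSignals(signals: RawSignal[]): RawSignal[] {
  const seen: { [key: string]: boolean } = {};
  const kept: RawSignal[] = [];
  for (const s of signals) {
    const key = s.candidate + "\u0000" + s.eventId;
    if (seen[key]) continue;
    seen[key] = true;
    kept.push(s);
  }
  return kept;
}

type IdentityVerdict = "accept" | "gray" | "reject";

// Three-way triage instead of a binary cut; "gray" would go to the AI stage.
function triage(confidence: number, low = 0.3, high = 0.7): IdentityVerdict {
  if (confidence >= high) return "accept";
  if (confidence < low) return "reject";
  return "gray";
}

const deduped = dedupeSignals([
  { candidate: "alice", collector: "trending", eventId: "repo:1", weight: 1 },
  { candidate: "alice", collector: "stargazers", eventId: "repo:1", weight: 1 },
  { candidate: "alice", collector: "stargazers", eventId: "repo:2", weight: 1 },
]);
console.log(deduped.length, triage(0.5));
```

In this example the second signal is dropped because it describes the same event as the first, and a confidence of 0.5 falls into the gray zone rather than being forced to one side.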

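Debugging tips

  • For identity misclassification, first run validate:identity to see which rules hit at the rule layer
  • If a class of candidates is systematically underrated, check merge and feature extraction first, then discuss thresholds
  • When changing the scoring formula, also watch whether the final ranking from ai-evaluator drifts unreasonably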

Related documentation