@1-/proxy_fetch
v0.1.12
Fetch, rank, and store high-anonymity proxies
proxy_fetch: Fetch, verify, and store high-anonymity proxies
Functionality
Fetches elite and anonymous proxies from proxyscrape.com (v4 API) and pubproxy.com, deduplicates by IPv4 address (preserving protocol preference SOCKS5 > SOCKS4 > HTTP and selecting the highest port per IP), validates proxy functionality via three geolocation APIs (ip-api.com, ipapi.co, ipinfo.io), and detects IP exposure. Only verified proxies are stored in TiDB Serverless. The database automatically maintains exactly 3,000,000 highest-ranking entries, pruning excess entries by ascending rank.
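The deduplication rule can be illustrated with a minimal sketch. It is not the package's actual code: the record shape `{ ip, port, protocol }` and the names `PROTOCOL_RANK`, `isBetter`, and `dedupeByIp` are hypothetical, and it assumes that protocol preference is applied first, with the highest port used only as a tie-breaker between entries of the same protocol.

```javascript
// Hypothetical record shape: { ip: "203.0.113.7", port: 1080, protocol: "socks5" }
// Higher number = more preferred protocol (SOCKS5 > SOCKS4 > HTTP).
const PROTOCOL_RANK = { socks5: 3, socks4: 2, http: 1 };

// Returns true when `candidate` should replace `current` for the same IP.
// Assumption: protocol wins first; the port only breaks ties.
function isBetter(candidate, current) {
  const a = PROTOCOL_RANK[candidate.protocol] ?? 0;
  const b = PROTOCOL_RANK[current.protocol] ?? 0;
  return a !== b ? a > b : candidate.port > current.port;
}

// Keeps exactly one entry per IPv4 address.
function dedupeByIp(proxies) {
  const best = new Map();
  for (const proxy of proxies) {
    const current = best.get(proxy.ip);
    if (current === undefined || isBetter(proxy, current)) {
      best.set(proxy.ip, proxy);
    }
  }
  return [...best.values()];
}

// Illustrative input using documentation-range addresses.
const sample = [
  { ip: "203.0.113.7", port: 8080, protocol: "http" },
  { ip: "203.0.113.7", port: 1080, protocol: "socks5" },
  { ip: "203.0.113.7", port: 4145, protocol: "socks5" },
  { ip: "198.51.100.2", port: 3128, protocol: "http" },
];

console.log(dedupeByIp(sample));
```

With this input, the three entries for 203.0.113.7 collapse to the SOCKS5 entry on port 4145, and 198.51.100.2 keeps its only entry.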
Usage demonstration
Install as a dependency:
npm install @1-/proxy_fetch
Use programmatically:
import run from "@1-/proxy_fetch/src/run.js";
// Connect to database and save proxies
await run("your-database-url");
Or run directly:
bun ./src/run.js your-database-url
Design rationale
The system balances proxy reliability and storage efficiency. IPv4-based deduplication ensures compact storage; protocol preference and port selection optimize connection quality; three-source geolocation validation and IP exposure detection jointly identify transparent proxies; all new proxies undergo real-time verification before insertion. Database pruning by rank maintains a fixed-size pool of high-quality proxies.
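As a rough illustration of how exposure detection might combine the three geolocation results, the sketch below classifies a proxy from the IP addresses that each service reports when queried through it. The function name `classifyProxy`, the labels `dead`, `exposed`, and `ok`, and the rule that a single matching report is enough to flag a proxy are all assumptions made for this example; the README does not specify how the package combines the three answers.

```javascript
// realIp:  the caller's own public IP, obtained without a proxy.
// reports: one entry per geolocation service queried through the proxy;
//          a string is the IP that service saw, null means the request failed.
function classifyProxy(realIp, reports) {
  const answered = reports.filter(
    (ip) => typeof ip === "string" && ip.length > 0,
  );
  // No service answered: the proxy is treated as non-functional.
  if (answered.length === 0) return "dead";
  // Any service seeing the caller's real IP means the proxy is transparent.
  if (answered.includes(realIp)) return "exposed";
  return "ok";
}

// Illustrative values using documentation-range addresses.
const realIp = "192.0.2.10";
console.log(classifyProxy(realIp, ["203.0.113.7", "203.0.113.7", null]));
console.log(classifyProxy(realIp, ["203.0.113.7", "192.0.2.10", "203.0.113.7"]));
console.log(classifyProxy(realIp, [null, null, null]));
```

Under these assumptions only proxies classified as `ok` would be candidates for insertion.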
Technology stack
- Runtime: Bun
- Database: TiDB Serverless
- Core dependencies: @1-/ipv4, @3-/int, @3-/req, @3-/split, cli-progress, http-proxy-agent, socks, socks-proxy-agent
Code structure
src/
├── api/
│ ├── proxyscrape.js # proxyscrape.com v4 API wrapper (supports pagination)
│ └── pubproxy.js # pubproxy.com API wrapper (with 50 retry attempts)
├── dump.js # Database schema export utility
├── ipFetch.js # Main proxy fetching logic, integrates multiple APIs and IPv4 deduplication
├── ping.js # Three-source geolocation validation, IP exposure detection, and result parsing
├── request.js # Low-level HTTP/SOCKS request handling with SOCKS4/SOCKS5/HTTP proxy support
├── run.js # Main entry point coordinating fetch and storage workflow
└── save.js # TiDB storage logic with existence check, batch verification, and precise pruning
Historical context
Proxy functionality was integrated into the world's first web server, CERN httpd, which Tim Berners-Lee began developing at CERN in 1990. Version 0.1 was released in June 1991 and announced publicly in August. The server originally ran on a NeXT Computer, and later versions could act as both a web server and a proxy server, which made proxying part of the World Wide Web's architecture from its early years.
About
This library is developed by WebC.site.
WebC.site: A new paradigm of web development for AI
