scholarly-js
v0.1.4
TypeScript library inspired by scholarly for retrieving Google Scholar author and publication metadata.
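scholarly-js is a TypeScript library for retrieving Google Scholar author and publication metadata. Its interface is modeled on the Python scholarly package.
Note: Google Scholar strictly limits automated requests. High-frequency requests may trigger CAPTCHAs or IP bans, so keep your request rate under control.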
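One way to keep the request rate down is to enforce a minimum gap between consecutive calls. The sketch below is not part of scholarly-js: `waitTimeMs`, `createThrottle`, and the interval you pass in are hypothetical names and values chosen for illustration, and a suitable interval for Google Scholar is not documented here.

```typescript
// Hypothetical helper, not part of scholarly-js: runs tasks one at a time and
// keeps at least `minIntervalMs` between the start of consecutive tasks.

/** Milliseconds still to wait before the next request may start. */
function waitTimeMs(
  lastStartMs: number | null,
  nowMs: number,
  minIntervalMs: number,
): number {
  if (lastStartMs === null) return 0; // the first request goes out immediately
  return Math.max(0, lastStartMs + minIntervalMs - nowMs);
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

/** Returns a wrapper that serializes tasks and spaces out their start times. */
function createThrottle(minIntervalMs: number) {
  let lastStartMs: number | null = null;
  let queue: Promise<unknown> = Promise.resolve();

  return function throttled<T>(task: () => Promise<T>): Promise<T> {
    const run = queue.then(async () => {
      await sleep(waitTimeMs(lastStartMs, Date.now(), minIntervalMs));
      lastStartMs = Date.now();
      return task();
    });
    // Keep the chain usable even if this task rejects.
    queue = run.catch(() => undefined);
    return run;
  };
}
```

A caller could create `const throttled = createThrottle(2000)` once and then wrap each library call, for example `await throttled(() => scholarly.fill(author))`.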
Installation
npm install scholarly-js
Quick start
import { scholarly } from "scholarly-js";

// Search for an author and take the first result from the async iterator.
const authorIter = scholarly.searchAuthor("Geoffrey Hinton");
const firstAuthor = (await authorIter.next()).value;
if (firstAuthor) {
  // Fill in only the requested sections of the author record.
  const fullAuthor = await scholarly.fill(firstAuthor, {
    sections: ["basics", "indices", "publications"],
    publication_limit: 5,
  });
  console.log(fullAuthor.name, fullAuthor.hindex);
}

// Search for a publication, then fill it and export it as BibTeX.
const pubIter = scholarly.searchPubs("attention is all you need", {
  sort_by: "relevance",
  year_low: 2017,
});
const firstPub = (await pubIter.next()).value;
if (firstPub) {
  const filledPub = await scholarly.fill(firstPub);
  const bibtex = await scholarly.bibtex(filledPub);
  console.log(bibtex);
}
Proxies (ProxyGenerator)
As with Python scholarly, you can create a ProxyGenerator first and then inject it into scholarly:
import { ProxyGenerator, scholarly } from "scholarly-js";

const pg = new ProxyGenerator();
pg.ScraperAPI(process.env.SCRAPER_API_KEY!);
// Also supported:
// pg.SingleProxy("http://user:pass@host:port");
// await pg.FreeProxies();
// pg.Luminati("user", "pass", 22225);
// pg.Tor_External(9050);
scholarly.use_proxy(pg);
use_proxy(primary, secondary) supports a dual-proxy mode: /citations requests go through secondary first, and all other requests go through primary.
Proxy configuration targets the Node runtime; do not call use_proxy() in a pure browser build.
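The dual-proxy rule can be pictured as a per-request routing decision. The following is a standalone sketch of that rule, not the library's implementation: `ProxyLike`, `pickProxy`, the exact path matching, and the fallback to primary when no secondary is configured are all assumptions made for illustration.

```typescript
// Illustrative only: models the routing rule, not scholarly-js internals.
interface ProxyLike {
  label: string;
}

/** Extracts the path from an absolute or relative URL, without query or fragment. */
function pathOf(requestUrl: string): string {
  const withoutOrigin = requestUrl.replace(/^https?:\/\/[^/]+/, "");
  return withoutOrigin.split(/[?#]/)[0] || "/";
}

/** Picks secondary for /citations requests when it is set, otherwise primary. */
function pickProxy(
  requestUrl: string,
  primary: ProxyLike,
  secondary?: ProxyLike,
): ProxyLike {
  const path = pathOf(requestUrl);
  const isCitations = path === "/citations" || path.startsWith("/citations/");
  // Assumption: with no secondary configured, everything uses primary.
  return isCitations && secondary ? secondary : primary;
}
```

Under this model, a request to `/citations?user=...` would be routed to the secondary proxy, while a `/scholar?q=...` search would stay on the primary one.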
Core API
- searchAuthor(name)
- searchAuthorId(id, filled?, options?)
- searchKeyword(keyword)
- searchKeywords(keywords)
- searchPubs(query, options?)
- searchPubsByCustomUrl(path)
- searchSinglePub(title, filled?)
- searchCitedBy(publicationId, options?)
- searchOrg(name)
- searchAuthorByOrganization(organizationId)
- useProxy(proxyGenerator, secondaryProxyGenerator?)
- use_proxy(proxyGenerator, secondaryProxyGenerator?)
- setRetries / set_retries, setTimeout / set_timeout
- fill(authorOrPublication, options?)
- citedBy(publication)
- getRelatedArticles(publication)
- bibtex(publication)
- pprint(entity)
Python-style aliases are also provided, including search_author, search_pubs, search_single_pub, search_citedby, and citedby.
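Differences from Python scholarly
- The proxy ecosystem (ProxyGenerator) is implemented, but some capabilities are compatibility implementations (for example, internal Tor process management is not implemented).
- fill() supports the following sections for Author: basics / indices / counts / coauthors / publications.
- Search and citation interfaces use AsyncGenerator.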
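Because these interfaces are async generators, results can be consumed lazily with `for await` and the loop can stop early. The sketch below shows one way to collect only the first few results; `take` and `fakeSearch` are hypothetical names, and `fakeSearch` is a stand-in for a real search call so that the example runs without network access.

```typescript
/** Collects at most `limit` items from an async iterable, then stops iterating. */
async function take<T>(source: AsyncIterable<T>, limit: number): Promise<T[]> {
  const items: T[] = [];
  if (limit <= 0) return items;
  for await (const item of source) {
    items.push(item);
    // Breaking out of for-await calls the generator's return(),
    // so no further results are requested from it.
    if (items.length >= limit) break;
  }
  return items;
}

// Stand-in for a search call; a real generator would issue HTTP requests.
async function* fakeSearch(): AsyncGenerator<string> {
  for (let i = 1; i <= 100; i++) {
    yield `result-${i}`;
  }
}
```

With the library, the same helper could presumably be applied to a search iterator, for example `await take(scholarly.searchPubs("attention is all you need"), 3)`, which limits how many results are pulled from Google Scholar.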
Local development
npm run check
Build output is written to dist/.
