
search_paper

v0.1.2


Multi-source academic paper search library (Semantic Scholar, Google Scholar, arXiv)


search-papers

A TypeScript library for searching academic papers across multiple sources, returning structured and normalized results.

Built on ghostfetch for robust HTTP requests with browser fingerprint spoofing and anti-bot bypass.

Features

  • Multi-source search - Query Google Scholar, Semantic Scholar, and arXiv in parallel
  • Unified Paper interface - All sources return the same structured format regardless of origin
  • Deduplication - Automatically merges duplicate papers across sources using DOI, canonical URL, and title matching
  • Impact Factor ranking - Results sorted by journal Impact Factor (static mapping of ~100 journals)
  • Citation & reference lookup - Retrieve citing/referenced papers via Semantic Scholar
  • Anti-bot bypass - ghostfetch handles browser spoofing, JS challenge solving, and redirect tracking
  • Partial failure tolerance - If one source fails, results from other sources are still returned

Requirements

  • Node.js >= 22.0.0

Installation

npm install search-papers

Quick Start

import { searchPapers, getPaper } from 'search-papers';

// Search across all sources
const result = await searchPapers('attention is all you need', {
  limit: 10,
});
console.log(result.papers);

// Search specific sources only
const arxivOnly = await searchPapers('transformer', {
  sources: ['arxiv'],
  limit: 5,
  sort: 'date',
});

// Look up a single paper by DOI
const paper = await getPaper('10.48550/arXiv.1706.03762');
console.log(paper?.title);

API

searchPapers(query, options?)

Search for papers across multiple sources simultaneously.

const result = await searchPapers('deep learning', {
  sources: ['semantic_scholar', 'google_scholar', 'arxiv'], // default: all
  limit: 10,          // default: 10
  offset: 0,
  year: { from: 2020, to: 2024 },
  sort: 'relevance',  // 'relevance' | 'date' | 'citations'
  client: {
    semanticScholarApiKey: 'your-key', // optional
    proxy: 'http://proxy:8080',        // optional
    timeout: 15000,                    // default: 15000ms
  },
});

Returns: SearchResult

interface SearchResult {
  query: string;
  totalResults?: number;
  papers: Paper[];
  nextPageToken?: string;
  source: SourceType;
  errors?: SourceError[];  // errors from failed sources
}

getPaper(doi, options?)

Look up a single paper by DOI using Semantic Scholar.

const paper = await getPaper('10.1038/nature14539');
// Returns Paper | null

Paper Interface

Every paper returned by any source conforms to this interface:

interface Paper {
  title: string;
  authors: Author[];
  abstract?: string;
  year?: number;
  venue?: string;           // journal or conference name
  doi?: string;
  url: string;              // link to the paper
  canonicalUrl?: string;    // final redirect URL
  pdfUrl?: string;
  citationCount?: number;
  impactFactor?: number;    // journal Impact Factor
  source: SourceType;       // 'google_scholar' | 'semantic_scholar' | 'arxiv'
  sourceId?: string;        // source-specific ID
  tags?: string[];          // e.g. arXiv categories
  references?: string[];
}
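Because every source returns the same shape, downstream code can treat papers uniformly. The following is a hypothetical sketch (the `formatCitation` helper is not part of search-papers, and the interface is trimmed to the fields used) showing how the normalized, mostly optional fields compose safely:

```typescript
// Hypothetical helper, not part of search-papers: formats a normalized
// Paper into a short citation string using only optional-safe fields.
interface Author { name: string }

interface Paper {
  title: string;
  authors: Author[];
  year?: number;
  venue?: string;
}

function formatCitation(p: Paper): string {
  const first = p.authors[0]?.name ?? 'Unknown';
  const etAl = p.authors.length > 1 ? ' et al.' : '';
  const year = p.year ? ` (${p.year})` : '';
  const venue = p.venue ? `. ${p.venue}` : '';
  return `${first}${etAl}${year}. ${p.title}${venue}`;
}

const example: Paper = {
  title: 'Attention Is All You Need',
  authors: [{ name: 'Vaswani' }, { name: 'Shazeer' }],
  year: 2017,
  venue: 'NeurIPS',
};

console.log(formatCitation(example));
// Vaswani et al. (2017). Attention Is All You Need. NeurIPS
```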

Using Individual Sources

For more control, use source classes directly:

import { createClient, SemanticScholarSource, GoogleScholarSource, ArxivSource } from 'search-papers';

const client = createClient();

// Semantic Scholar (implements CitationSource)
const s2 = new SemanticScholarSource(client);
const result = await s2.search('transformer', { limit: 5 });
const paper = await s2.getPaper('DOI:10.48550/arXiv.1706.03762');
const citations = await s2.getCitations('204e3073870fae3d05bcbc2f6a8e263d9b72e776');
const references = await s2.getReferences('204e3073870fae3d05bcbc2f6a8e263d9b72e776');

// Google Scholar (implements PaperSource)
const gs = new GoogleScholarSource(client);
const gsResult = await gs.search('deep learning', { limit: 10 });

// arXiv (implements PaperSource)
const arxiv = new ArxivSource(client);
const arxivResult = await arxiv.search('neural network', { limit: 10 });
const arxivPaper = await arxiv.getPaper('1706.03762');

await client.destroy();

Sources

| Source | Type | Search | Get Paper | Citations | References | Notes |
|--------|------|--------|-----------|-----------|------------|-------|
| Semantic Scholar | API (JSON) | Yes | Yes (DOI, paperId, etc.) | Yes | Yes | Optional API key for dedicated rate limit |
| Google Scholar | Scraping (HTML) | Yes | Yes (title search) | No | No | CAPTCHA risk, 2-5s random delay |
| arXiv | API (Atom XML) | Yes | Yes (arXiv ID) | No | No | 3s minimum delay between requests |

Search Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| sources | SourceType[] | All 3 sources | Which sources to query |
| limit | number | 10 | Max results to return |
| offset | number | 0 | Pagination offset |
| year | { from?, to? } | - | Publication year range filter |
| sort | string | 'relevance' | Sort order: 'relevance', 'date', 'citations' |

Client Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| browser | string | 'Chrome_131' | Browser to spoof |
| timeout | number | 15000 | Request timeout in ms |
| proxy | string | - | HTTP proxy URL |
| proxyPool | string[] | - | Proxy pool with round-robin rotation |
| semanticScholarApiKey | string | - | Semantic Scholar API key |
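The proxyPool option's round-robin rotation can be pictured as a cursor that wraps around the pool. This standalone sketch illustrates the behavior the option implies; it is not the library's internal implementation:

```typescript
// Standalone sketch of round-robin proxy rotation, as implied by the
// proxyPool client option; not the library's actual internals.
class ProxyPool {
  private i = 0;
  constructor(private readonly proxies: string[]) {}

  // Return the next proxy, wrapping back to the start of the pool.
  next(): string {
    const proxy = this.proxies[this.i];
    this.i = (this.i + 1) % this.proxies.length;
    return proxy;
  }
}

const pool = new ProxyPool(['http://p1:8080', 'http://p2:8080']);
console.log(pool.next()); // http://p1:8080
console.log(pool.next()); // http://p2:8080
console.log(pool.next()); // http://p1:8080 (wrapped around)
```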

How It Works

  1. Parallel queries - All selected sources are queried simultaneously via Promise.allSettled
  2. Canonical URL resolution - ghostfetch follows redirects to determine the final URL of each paper
  3. Impact Factor lookup - Each paper's venue is matched against a static journal Impact Factor table
  4. Deduplication - Papers are deduplicated using DOI > canonical URL > normalized title (in priority order), merging metadata from multiple sources
  5. Sorting - Results are sorted by Impact Factor (descending), then by citation count
  6. Limit - Final results are trimmed to the requested limit
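The dedup priority in step 4 can be sketched as a keying function: each paper gets one key, preferring DOI, then canonical URL, then normalized title. This is a simplified illustration (the library additionally merges metadata from duplicates rather than just dropping them):

```typescript
// Simplified sketch of the dedup priority described above
// (DOI > canonical URL > normalized title); library internals may differ.
interface PaperStub {
  title: string;
  doi?: string;
  canonicalUrl?: string;
}

// Normalize a title so case and punctuation differences still match.
function normalizeTitle(title: string): string {
  return title.toLowerCase().replace(/[^a-z0-9]+/g, ' ').trim();
}

function dedupKey(p: PaperStub): string {
  if (p.doi) return `doi:${p.doi.toLowerCase()}`;
  if (p.canonicalUrl) return `url:${p.canonicalUrl}`;
  return `title:${normalizeTitle(p.title)}`;
}

// Keep the first paper seen for each key.
function dedupe(papers: PaperStub[]): PaperStub[] {
  const seen = new Map<string, PaperStub>();
  for (const p of papers) {
    const key = dedupKey(p);
    if (!seen.has(key)) seen.set(key, p);
  }
  return [...seen.values()];
}

const merged = dedupe([
  { title: 'Attention Is All You Need', doi: '10.48550/arXiv.1706.03762' },
  { title: 'Attention is all you need!' }, // same title, no DOI: separate key here
  { title: 'Attention Is All You Need', doi: '10.48550/arXiv.1706.03762' },
]);
console.log(merged.length); // 2 (the exact-DOI duplicate is dropped)
```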

Error Handling

The library tolerates partial failures: if one source fails, results from the other sources are still returned.

const result = await searchPapers('query');

if (result.errors) {
  for (const err of result.errors) {
    console.warn(`${err.source}: ${err.message} (${err.code})`);
    // err.code: 'RATE_LIMITED' | 'CAPTCHA' | 'TIMEOUT' | 'NETWORK_ERROR' | 'PARSE_ERROR' | 'UNKNOWN'
  }
}

// result.papers still contains results from successful sources

Development

npm run build      # Build ESM + CJS + .d.ts via tsup
npm run lint       # Type check with tsc --noEmit
npm run test       # Run unit tests
npm run test:live  # Run live tests (requires LIVE_TEST=true)

License

MIT