researchhub-cli

v0.1.0

Published

3 months ago

GitHub for papers, datasets, and experiments — optimized for AI agents

hasnainx42

ResearchHub CLI

GitHub for papers, datasets, and experiments — optimized for AI agents.

A command-line tool that pulls structured research knowledge from arXiv, Semantic Scholar, HuggingFace, and PapersWithCode into your terminal — and into your AI agents.

research paper 2305.18430
research search "drug repurposing"
research dataset protein-folding
research replicate alphafold
research gap crispr
research context crispr > crispr_context.md

Why

Scientific knowledge is fragmented. Papers live on arXiv. Code lives on GitHub. Datasets live on HuggingFace. Benchmarks live on PapersWithCode. Putting it all together takes hours.

ResearchHub CLI aggregates all of it in a single command — structured, terminal-native, and ready to pipe into LLM agents.

Install

npm install -g researchhub-cli

Or run locally:

git clone https://github.com/your-username/researchhub-cli
cd researchhub-cli
npm install
npm link        # makes `research` available globally

Requirements: Node.js ≥ 18

Commands

`research paper <arxiv-id>`

Fetches a fully structured paper card — title, TL;DR, abstract, citations, code repositories, and key references.

$ research paper 2305.18430

Large Language Models for Drug Discovery
arXiv:2305.18430  •  2023-05-28  •  Author A, Author B et al.
────────────────────────────────────────────────────────────

TL;DR
The approach leverages LLM-guided molecule generation…

Abstract
This paper proposes…

────────────────────────────────────────────────────────────
Citations             142
Fields                Computer Science, Biology

PDF                   https://arxiv.org/pdf/2305.18430
PapersWithCode        https://paperswithcode.com/paper/…

Code
  • https://github.com/…  ★1.2k  [official]

Key References
  • Attention Is All You Need (arXiv:1706.03762)
  • …

`research search <query>`

Searches arXiv and Semantic Scholar simultaneously, deduplicates by arXiv ID, and ranks by citation count.

$ research search "protein language models" --limit 6
$ research search "RLHF" --source arxiv

Options

| Flag | Default | Description |
|---|---|---|
| -n, --limit <n> | 8 | Max results |
| -s, --source <s> | both | Source to query: arxiv, semantic, or both |
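
The merge step described above (deduplicate by arXiv ID, then rank by citation count) could look roughly like the sketch below. The result shape (`arxivId`, `title`, `citations`), the helper names, and the version-suffix stripping are assumptions made for illustration, not the CLI's actual internals.

```javascript
// Hypothetical result shape: { arxivId, title, citations }.
// These field names are assumptions, not the CLI's real schema.

// Keep only fields that carry a value, so a sparse record never
// overwrites data already collected from the other source.
function definedOnly(obj) {
  return Object.fromEntries(
    Object.entries(obj).filter(([, v]) => v !== undefined && v !== null)
  );
}

function mergeResults(arxivResults, semanticResults, limit = 8) {
  const byId = new Map();
  for (const paper of [...arxivResults, ...semanticResults]) {
    // arXiv IDs may carry a version suffix such as "v2"; strip it so
    // both sources agree on the key. Fall back to the title if no ID.
    const key = paper.arxivId
      ? paper.arxivId.replace(/v\d+$/, "")
      : `title:${String(paper.title).toLowerCase()}`;
    const existing = byId.get(key);
    byId.set(key, existing ? { ...existing, ...definedOnly(paper) } : paper);
  }
  // Rank by citation count, treating a missing count as zero.
  return [...byId.values()]
    .sort((a, b) => (b.citations ?? 0) - (a.citations ?? 0))
    .slice(0, limit);
}
```

Merging rather than discarding the duplicate lets a record keep the arXiv metadata while picking up the citation count that only the other source provides.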

`research dataset <query>`

Discovers datasets across HuggingFace and PapersWithCode with download stats, tags, and direct links.

$ research dataset genomics
$ research dataset "clinical NLP"

`research context <topic>`

Generates a single structured Markdown file aggregating papers, datasets, benchmarks, and research gaps — designed to be fed directly to an LLM agent.

$ research context crispr
$ research context "drug repurposing" --output drug_context.md

# pipe to any LLM
$ cat crispr_context.md | llm "Plan a novel experiment"
$ cat crispr_context.md | claude -p "What are the biggest open problems?"

Options

| Flag | Description |
|---|---|
| -o, --output <file> | Output file path (default: <topic>_context.md) |

Output format:

# Research Context: CRISPR
> Generated by researchhub-cli on 2026-03-09

## Key Papers
### DeepFM-Crispr: Prediction of CRISPR On-Target Effects...
- arXiv: 2409.05938
- TL;DR: ...
- Abstract: ...

## Datasets
- CRISPR-ML  — https://huggingface.co/datasets/…

## Benchmarks
- CRISPR Off-Target Detection — https://paperswithcode.com/task/…

## Research Gaps
- Low-resource and cross-domain transfer in CRISPR
- …

## Agent Instructions
Use this context to survey the state of the art…

`research replicate <paper>`

Generates a step-by-step reproduction guide. Accepts either an arXiv ID or a keyword search.

$ research replicate alphafold
$ research replicate 2305.18430

Output:

Step 1 — Read the Paper
  https://arxiv.org/pdf/…

Step 2 — Understand the Methods
  • Diffusion model
  • RL optimization

Step 3 — Set Up the Repository
  https://github.com/…  [official]  ★12k  [PyTorch]
  $ git clone https://github.com/…
  $ pip install -r requirements.txt

Step 4 — Download Data
  AlphaFold DB — https://huggingface.co/datasets/…

Step 5 — Run & Evaluate
  Leaderboard & baselines: https://paperswithcode.com/paper/…

`research gap <topic>`

Analyzes recent papers to surface limitations, underexplored directions, and hot sub-topics.

$ research gap "drug repurposing"
$ research gap "vision transformers"

Output:

Detected Limitations in Literature
  • "…limited to single-modal inputs…"
  • "…challenging to scale to clinical settings…"

Suggested Research Directions
  1. Few-shot learning on low-resource variants
  2. Multimodal extensions combining text and structure
  3. Clinical validation and deployment challenges
  …

Active Benchmarks to Target
  • MoleculeNet  (342 papers)
  • DrugBank Benchmark

Hot Sub-topics
  #graph-neural  #clinical-trial  #multi-target

`research benchmark <query>`

Searches task leaderboards on PapersWithCode.

$ research benchmark "question answering"
$ research benchmark "protein structure prediction"

Architecture

CLI (Node.js / Commander.js)
         │
         ├── arXiv API          — paper search + fetch (XML)
         ├── Semantic Scholar   — TL;DR, citations, references
         ├── HuggingFace API    — dataset discovery
         └── PapersWithCode API — code repos, benchmarks, methods

Each command queries its sources in parallel. If a source is unavailable or returns no data, the CLI degrades gracefully and shows results from the sources that did respond.

No API keys required. All sources used are public.
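
One common way to get parallel queries with graceful degradation in Node.js is `Promise.allSettled`, which waits for every request and reports failures individually instead of rejecting as a whole. The sketch below assumes each source is an async function taking a query; the `queryAllSources` name and the `sources` object are hypothetical, not taken from the CLI's code.

```javascript
// Sketch: run every source concurrently and keep whatever succeeds.
// `sources` maps a source name to a (hypothetical) async query function.
async function queryAllSources(sources, query) {
  const names = Object.keys(sources);
  const settled = await Promise.allSettled(
    // Wrapping in a promise chain also captures synchronous throws.
    names.map((name) => Promise.resolve().then(() => sources[name](query)))
  );
  const results = {};
  settled.forEach((outcome, i) => {
    // A rejected or empty source becomes null instead of aborting the command.
    results[names[i]] =
      outcome.status === "fulfilled" ? (outcome.value ?? null) : null;
  });
  return results;
}
```

A caller can then render every non-null entry and skip the rest, so a single source outage does not fail the whole command.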

Use with AI Agents

The context command is built specifically for agent workflows:

# Generate context
research context "CRISPR gene editing" --output crispr.md

# Feed to Claude
cat crispr.md | claude -p "What experiment should I run next?"

# Feed to a local model
cat crispr.md | ollama run llama3 "Summarize the open problems"

# Use in a pipeline
research context "drug repurposing" | python agent.py --plan-experiment

The output is structured Markdown with clear section headers, making it easy for agents to parse and reason over.
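
As an illustration of how an agent-side script might consume that structure, the sketch below splits a context file on its `## ` section headers, matching the output format shown under `research context`. The `splitSections` function is a hypothetical helper, not part of the CLI.

```javascript
// Split a context file into { sectionTitle: bodyText } using "## " headers.
// Deeper headings such as "### Paper title" stay inside their section body.
function splitSections(markdown) {
  const sections = {};
  let current = null;
  for (const line of markdown.split(/\r?\n/)) {
    const header = line.match(/^## (.+)$/);
    if (header) {
      current = header[1].trim();
      sections[current] = [];
    } else if (current !== null) {
      sections[current].push(line);
    }
  }
  // Join the collected lines and trim surrounding blank lines.
  for (const title of Object.keys(sections)) {
    sections[title] = sections[title].join("\n").trim();
  }
  return sections;
}
```

An agent could then pass only the section it needs, for example `splitSections(text)["Research Gaps"]`, to keep prompts short.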

Roadmap

- [ ] research cite <arxiv-id> — export BibTeX / RIS
- [ ] research watch <topic> — poll for new papers on a schedule
- [ ] research export <query> --format json — machine-readable output
- [ ] Semantic Scholar API key support for higher rate limits
- [ ] research author <name> — author profile with paper list

Contributing

Pull requests welcome. To add a new data source, create a module in src/api/ following the existing pattern (each module exports async functions, returns plain objects, and fails silently with null/[] rather than throwing).
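
A new source module following that pattern might look like the sketch below. The file name, the `BASE_URL` endpoint, the response shape (`results`, `name`, `url`, `downloads`), and the injectable `fetchImpl` parameter are all placeholders chosen for illustration; the existing modules in src/api/ remain the reference for the real conventions.

```javascript
// src/api/example-source.js (hypothetical file name)
// Placeholder endpoint, not a real data source.
const BASE_URL = "https://example.org/api/datasets";

// Returns an array of plain objects, or [] on any failure.
// `fetchImpl` defaults to the global fetch available in Node.js 18+;
// passing it in is an assumption that makes the module easy to test.
async function searchDatasets(query, fetchImpl = fetch) {
  try {
    const res = await fetchImpl(`${BASE_URL}?q=${encodeURIComponent(query)}`);
    if (!res.ok) return [];
    const body = await res.json();
    const items = Array.isArray(body.results) ? body.results : [];
    // Map to plain objects so callers never depend on the raw response.
    return items.map((item) => ({
      name: item.name,
      url: item.url,
      downloads: item.downloads ?? 0,
    }));
  } catch {
    // Network error or malformed JSON: fail silently, as the pattern asks.
    return [];
  }
}

// Export with module.exports or `export`, matching the project's module system.
```

Returning `[]` instead of throwing means callers can merge results from several sources without wrapping each call in its own error handling.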

License

MIT
