researchhub-cli
v0.1.0
ResearchHub CLI
GitHub for papers, datasets, and experiments — optimized for AI agents.
A command-line tool that pulls structured research knowledge from arXiv, Semantic Scholar, HuggingFace, and PapersWithCode into your terminal — and into your AI agents.
research paper 2305.18430
research search "drug repurposing"
research dataset protein-folding
research replicate alphafold
research gap crispr
research context crispr > crispr_context.md
Why
Scientific knowledge is fragmented. Papers live on arXiv. Code lives on GitHub. Datasets live on HuggingFace. Benchmarks live on PapersWithCode. Putting it all together takes hours.
ResearchHub CLI aggregates all of it in a single command — structured, terminal-native, and ready to pipe into LLM agents.
Install
npm install -g researchhub-cli
Or run locally:
git clone https://github.com/your-username/researchhub-cli
cd researchhub-cli
npm install
npm link # makes `research` available globally
Requirements: Node.js ≥ 18
Commands
research paper <arxiv-id>
Fetches a fully structured paper card — title, TL;DR, abstract, citations, code repositories, and key references.
$ research paper 2305.18430
Large Language Models for Drug Discovery
arXiv:2305.18430 • 2023-05-28 • Author A, Author B et al.
────────────────────────────────────────────────────────────
TL;DR
The approach leverages LLM-guided molecule generation…
Abstract
This paper proposes…
────────────────────────────────────────────────────────────
Citations 142
Fields Computer Science, Biology
PDF https://arxiv.org/pdf/2305.18430
PapersWithCode https://paperswithcode.com/paper/…
Code
• https://github.com/… ★1.2k [official]
Key References
• Attention Is All You Need (arXiv:1706.03762)
• …
research search <query>
Searches arXiv and Semantic Scholar simultaneously, deduplicates by arXiv ID, and ranks by citation count.
$ research search "protein language models" --limit 6
$ research search "RLHF" --source arxiv
Options
| Flag | Default | Description |
|---|---|---|
| -n, --limit <n> | 8 | Max results |
| -s, --source <s> | both | Source to query: arxiv, semantic, or both |
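The merge step described for this command (combine both result lists, deduplicate by arXiv ID, rank by citation count) could look roughly like the sketch below. The result shape `{ arxivId, title, citations }` and the function names are assumptions made for illustration, not the CLI's actual internals.

```javascript
// Hypothetical result shape: { arxivId, title, citations }.
// Strips an optional "arXiv:" prefix and a trailing version suffix such as "v2".
function normalizeArxivId(id) {
  return id.replace(/^arxiv:/i, "").replace(/v\d+$/, "");
}

function mergeResults(arxivResults, semanticResults, limit = 8) {
  const byKey = new Map();
  for (const paper of [...arxivResults, ...semanticResults]) {
    // Papers without an arXiv ID fall back to a lowercase title key (an assumption).
    const key = paper.arxivId
      ? normalizeArxivId(paper.arxivId)
      : `title:${(paper.title ?? "").toLowerCase()}`;
    const existing = byKey.get(key);
    if (!existing) {
      byKey.set(key, { ...paper, citations: paper.citations ?? 0 });
    } else {
      // Keep the first record seen, but take the higher citation count.
      existing.citations = Math.max(existing.citations, paper.citations ?? 0);
    }
  }
  // Highest citation count first, then truncate to the requested limit.
  return [...byKey.values()]
    .sort((a, b) => b.citations - a.citations)
    .slice(0, limit);
}
```

The default `limit` of 8 mirrors the documented default for `--limit`. Taking the higher citation count on a duplicate reflects the assumption that one source may return no count at all.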
research dataset <query>
Discovers datasets across HuggingFace and PapersWithCode with download stats, tags, and direct links.
$ research dataset genomics
$ research dataset "clinical NLP"
research context <topic>
Generates a single structured Markdown file aggregating papers, datasets, benchmarks, and research gaps — designed to be fed directly to an LLM agent.
$ research context crispr
$ research context "drug repurposing" --output drug_context.md
# pipe to any LLM
$ cat crispr_context.md | llm "Plan a novel experiment"
$ cat crispr_context.md | claude "What are the biggest open problems?"
Options
| Flag | Description |
|---|---|
| -o, --output <file> | Output file path (default: <topic>_context.md) |
Output format:
# Research Context: CRISPR
> Generated by researchhub-cli on 2026-03-09
## Key Papers
### DeepFM-Crispr: Prediction of CRISPR On-Target Effects...
- arXiv: 2409.05938
- TL;DR: ...
- Abstract: ...
## Datasets
- CRISPR-ML — https://huggingface.co/datasets/…
## Benchmarks
- CRISPR Off-Target Detection — https://paperswithcode.com/task/…
## Research Gaps
- Low-resource and cross-domain transfer in CRISPR
- …
## Agent Instructions
Use this context to survey the state of the art…
research replicate <paper>
Generates a step-by-step reproduction guide. Accepts either an arXiv ID or a keyword search.
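One way a command might tell an arXiv ID apart from a keyword is a pattern check on the argument. The sketch below is an assumption about how that could work, not the tool's confirmed logic. The function name is hypothetical. The patterns cover new-style IDs (`YYMM.NNNNN`, four or five digits after the dot) and old-style IDs (`archive/YYMMNNN`), each with an optional version suffix.

```javascript
// New-style IDs, e.g. 2305.18430 or 1706.03762v5.
const NEW_STYLE_ID = /^\d{4}\.\d{4,5}(v\d+)?$/;
// Old-style IDs, e.g. hep-th/9901001 or math.GT/0309136.
const OLD_STYLE_ID = /^[a-z-]+(\.[A-Z]{2})?\/\d{7}(v\d+)?$/;

// Hypothetical helper: decide whether to fetch by ID or fall back to search.
function classifyPaperArgument(arg) {
  const trimmed = arg.trim();
  const candidate = trimmed.replace(/^arxiv:/i, "");
  if (NEW_STYLE_ID.test(candidate) || OLD_STYLE_ID.test(candidate)) {
    return { kind: "arxiv-id", value: candidate };
  }
  return { kind: "keyword", value: trimmed };
}
```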
$ research replicate alphafold
$ research replicate 2305.18430
Output:
Step 1 — Read the Paper
https://arxiv.org/pdf/…
Step 2 — Understand the Methods
• Diffusion model
• RL optimization
Step 3 — Set Up the Repository
https://github.com/… [official] ★12k [PyTorch]
$ git clone https://github.com/…
$ pip install -r requirements.txt
Step 4 — Download Data
AlphaFold DB — https://huggingface.co/datasets/…
Step 5 — Run & Evaluate
Leaderboard & baselines: https://paperswithcode.com/paper/…
research gap <topic>
Analyzes recent papers to surface limitations, underexplored directions, and hot sub-topics.
$ research gap "drug repurposing"
$ research gap "vision transformers"
Output:
Detected Limitations in Literature
• "…limited to single-modal inputs…"
• "…challenging to scale to clinical settings…"
Suggested Research Directions
1. Few-shot learning on low-resource variants
2. Multimodal extensions combining text and structure
3. Clinical validation and deployment challenges
…
Active Benchmarks to Target
• MoleculeNet (342 papers)
• DrugBank Benchmark
Hot Sub-topics
#graph-neural #clinical-trial #multi-target
research benchmark <query>
Searches task leaderboards on PapersWithCode.
$ research benchmark "question answering"
$ research benchmark "protein structure prediction"
Architecture
CLI (Node.js / Commander.js)
│
├── arXiv API - paper search + fetch (XML)
├── Semantic Scholar - TL;DR, citations, references
├── HuggingFace API - dataset discovery
└── PapersWithCode API - code repos, benchmarks, methods
All sources are queried in parallel per command. If a source is unavailable or lacks data, the CLI degrades gracefully and shows what it can.
No API keys required. All sources used are public.
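The parallel-query behavior with graceful degradation can be illustrated with the standard `Promise.allSettled`. This is a minimal sketch under assumptions: `queryAllSources` and the shape of the `sources` object (a map from source name to async function) are hypothetical and not taken from the repository.

```javascript
// Hypothetical helper: run every source in parallel and keep whatever succeeds.
// `sources` maps a source name to a function that takes the query.
async function queryAllSources(sources, query) {
  const names = Object.keys(sources);
  // The async wrapper turns synchronous throws into rejections as well.
  const settled = await Promise.allSettled(
    names.map(async (name) => sources[name](query))
  );
  const results = {};
  const unavailable = [];
  settled.forEach((outcome, i) => {
    if (outcome.status === "fulfilled" && outcome.value != null) {
      results[names[i]] = outcome.value;
    } else {
      // A rejected or empty source is recorded but never aborts the command.
      unavailable.push(names[i]);
    }
  });
  return { results, unavailable };
}
```

Unlike `Promise.all`, `Promise.allSettled` does not reject when one source fails, so the caller can render whatever arrived and skip the rest.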
Use with AI Agents
The context command is built specifically for agent workflows:
# Generate context
research context "CRISPR gene editing" --output crispr.md
# Feed to Claude
cat crispr.md | claude "What experiment should I run next?"
# Feed to a local model
cat crispr.md | ollama run llama3 "Summarize the open problems"
# Use in a pipeline
research context "drug repurposing" | python agent.py --plan-experiment
The output is structured Markdown with clear section headers, making it easy for agents to parse and reason over.
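As an illustration of how an agent might consume that structure, the sketch below splits a context file into its `##` sections. It assumes the section headers shown in the output format (`## Key Papers`, `## Datasets`, and so on). `splitSections` is a hypothetical helper, not part of the CLI.

```javascript
// Hypothetical helper: map each "## Heading" to the text beneath it.
function splitSections(markdown) {
  const sections = {};
  let current = null;
  for (const line of markdown.split(/\r?\n/)) {
    // Match level-2 headings only; "# Title" and "### Paper" lines do not match.
    const heading = line.match(/^## (.+)$/);
    if (heading) {
      current = heading[1].trim();
      sections[current] = [];
    } else if (current !== null) {
      sections[current].push(line);
    }
  }
  for (const name of Object.keys(sections)) {
    sections[name] = sections[name].join("\n").trim();
  }
  return sections;
}
```

An agent could then pass only `sections["Research Gaps"]` to a planning prompt instead of the whole file.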
Roadmap
- [ ] `research cite <arxiv-id>`: export BibTeX / RIS
- [ ] `research watch <topic>`: poll for new papers on a schedule
- [ ] `research export <query> --format json`: machine-readable output
- [ ] Semantic Scholar API key support for higher rate limits
- [ ] `research author <name>`: author profile with paper list
Contributing
Pull requests welcome. To add a new data source, create a module in src/api/ following the existing pattern (each module exports async functions, returns plain objects, and fails silently with null/[] rather than throwing).
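A new source module following that pattern might look like the sketch below. Everything specific in it is an assumption for illustration: the function name, the `api.example.org` endpoint, and the `items` response field are hypothetical. The `fetchImpl` parameter defaults to the global `fetch` available in Node.js 18+ and exists only to make the function easy to test.

```javascript
// Sketch of a module such as src/api/example-source.js (hypothetical file name).
// Export the function with whatever module syntax the repository uses.
async function searchExampleSource(query, { fetchImpl = fetch, limit = 8 } = {}) {
  try {
    // Hypothetical endpoint and query parameters.
    const url =
      "https://api.example.org/search" +
      `?q=${encodeURIComponent(query)}&limit=${limit}`;
    const response = await fetchImpl(url);
    if (!response.ok) return []; // fail silently on HTTP errors
    const body = await response.json();
    // Return plain objects only, never raw response wrappers.
    return (body.items ?? []).map((item) => ({
      title: item.title,
      url: item.url,
    }));
  } catch {
    return []; // network or parse failure: empty result instead of throwing
  }
}
```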
License
MIT
