@brainwav/rsearch
v0.1.6
rSearch CLI for arXiv
rSearch helps developers search, fetch, and download arXiv papers from the terminal
One sentence: This repo provides a Node/TypeScript CLI for arXiv search, metadata fetch, downloads, category browsing, and URL output.
Last updated: 2026-01-07
Table of contents
- Doc requirements
- Prerequisites
- Quickstart
- Common tasks
- Risks and assumptions
- Troubleshooting
- Reference
- Acceptance criteria
- Evidence bundle
Doc requirements
- Audience: Developers and researchers using the CLI to search, fetch, and download arXiv papers.
- Scope: Installation, core commands, verification steps, and usage constraints.
- Non-scope: Contribution workflow, security reporting, and internal architecture (see CONTRIBUTING.md, SECURITY.md, and docs/ADR-001-architecture.md).
- Doc owner: jscraik.
- Review cadence: Each release.
- Required approvals: 1 maintainer.
Prerequisites
- Required: Node.js 20+, npm
- Optional: Git, a POSIX shell
Quickstart
1) Install
npm install -g @brainwav/rsearch
2) Run a search
rsearch search "cat:cs.AI" --max-results 5
3) Verify
Expected output:
- A list of results with IDs and titles.
Common tasks
Search arXiv by query
- What you get: titles and IDs (plus URLs in JSON output).
- Steps:
rsearch search "cat:cs.LG" --max-results 10
- Verify: output shows “Total results” and a list of entries.
Filter search results to entries with license metadata
- What you get: only results that include license metadata in arXiv records.
- Steps:
rsearch search "cat:cs.AI" --require-license --max-results 10
- Verify: the summary mentions filtered results when license metadata is missing.
Fetch metadata by arXiv ID
- What you get: full metadata including abstract, authors, and PDF URL.
- Steps:
rsearch fetch 2002.00762 --json
- Verify: JSON includes absUrl and pdfUrl.
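The verification step above can be scripted. The following is a minimal sketch, not part of rSearch: `findUrlRecords` is a hypothetical helper, and the sample payload is made up for illustration. The real output envelope is defined in schemas/cli-output.schema.json and is not reproduced here, so the helper walks the whole parsed value instead of assuming where the record sits.

```typescript
// Hypothetical helper (not part of rSearch): collect every object in a parsed
// JSON value that carries string `absUrl` and `pdfUrl` fields, at any depth.
type UrlRecord = { absUrl: string; pdfUrl: string };

function findUrlRecords(value: unknown, found: UrlRecord[] = []): UrlRecord[] {
  if (Array.isArray(value)) {
    for (const item of value) findUrlRecords(item, found);
  } else if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const abs = obj["absUrl"];
    const pdf = obj["pdfUrl"];
    if (typeof abs === "string" && typeof pdf === "string") {
      found.push({ absUrl: abs, pdfUrl: pdf });
    }
    // Keep descending so nested envelopes are covered too.
    for (const child of Object.values(obj)) findUrlRecords(child, found);
  }
  return found;
}

// Made-up sample; the real payload comes from `rsearch fetch <id> --json`
// and its envelope (here a hypothetical `data` array) may differ.
const sample: unknown = JSON.parse(
  '{"data":[{"id":"2002.00762",' +
    '"absUrl":"https://arxiv.org/abs/2002.00762",' +
    '"pdfUrl":"https://arxiv.org/pdf/2002.00762"}]}'
);

const records = findUrlRecords(sample);
console.log(records.length > 0 ? "ok" : "absUrl/pdfUrl missing");
```

To use it on real output, capture the stdout of the fetch command, pass it through `JSON.parse`, and hand the result to `findUrlRecords`.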
Download PDFs
- What you get: a PDF per ID in the output directory.
- Steps:
rsearch download 2002.00762 --out-dir ./papers
- Verify: ./papers/2002.00762.pdf exists.
Export Markdown or JSON with extracted text
- What you get: Markdown or JSON output with text extracted from the PDF.
- Steps:
rsearch download 2002.00762 --format md --out-dir ./papers
rsearch download 2002.00762 --format json --out-dir ./papers
- Verify: ./papers/2002.00762.md or ./papers/2002.00762.json exists.
Enforce license metadata on downloads
- What you get: downloads fail if arXiv does not provide license metadata.
- Steps:
rsearch download 2002.00762 --format json --require-license --out-dir ./papers
- Verify: failures are reported with “License metadata missing” when unavailable.
Keep PDF while exporting text formats
- What you get: both the text export and the PDF.
- Steps:
rsearch download 2002.00762 --format md --keep-pdf --out-dir ./papers
- Verify: both 2002.00762.md and 2002.00762.pdf exist.
Return URLs for agents
- What you get: abstract and PDF URLs per result.
- Steps:
rsearch urls "cat:cs.AI"
rsearch urls --ids 2002.00762 2101.00001
rsearch urls "cat:cs.AI" --require-license
- Verify: each line includes an abs URL and a PDF URL.
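An agent consuming this output may want to check each line before using it. The sketch below is an assumption-laden illustration: `splitUrlLine` is a hypothetical helper, and it assumes the two URLs on a line are separated by whitespace or a comma, which this README does not specify. It relies only on the public arXiv URL pattern, where abstract pages live under `/abs/` and PDFs under `/pdf/`.

```typescript
// Hypothetical helper (not part of rSearch): pull the abstract and PDF URLs
// out of one line of `rsearch urls` output.
// Assumption: URLs are separated by whitespace or a comma.
function splitUrlLine(line: string): { abs?: string; pdf?: string } {
  const urls: string[] = line.match(/https?:\/\/[^\s,]+/g) ?? [];
  return {
    abs: urls.find((u) => u.includes("/abs/")),
    pdf: urls.find((u) => u.includes("/pdf/")),
  };
}

// Made-up sample line for illustration only.
const line = "https://arxiv.org/abs/2002.00762 https://arxiv.org/pdf/2002.00762";
const parsed = splitUrlLine(line);
console.log(parsed.abs !== undefined && parsed.pdf !== undefined ? "ok" : "incomplete line");
```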
Browse categories
- What you get: the arXiv category taxonomy.
- Steps:
rsearch categories tree
rsearch categories list --group "Computer Science"
- Verify: group names and category IDs are listed.
Risks and assumptions
- Assumes arXiv API availability and that users respect rate limits.
- Assumes users review license metadata before reuse; the CLI does not grant rights.
- PDF text extraction may fail on scanned or complex layouts.
- Existing output files are overwritten only when --overwrite is used; verify output paths before running batch downloads.
Troubleshooting
Symptom: “Provide a search query” or “Provide arXiv IDs”
Cause:
- Missing positional argument or stdin input.
Fix:
rsearch search "cat:cs.AI"
rsearch fetch 2002.00762
Symptom: “arXiv API request failed (429/5xx)”
Cause:
- Rate limiting or transient server errors.
Fix:
- Re-run; the CLI already retries with backoff. Lower --max-results if needed.
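To get a feel for how long the built-in retries may take before a manual re-run is worthwhile, here is a minimal sketch of a backoff schedule. It uses the retry defaults listed under Reference (max-retries=3, retry-base-delay=500ms). The doubling rule and the absence of jitter are assumptions; this README only says the CLI retries with backoff, so the actual schedule may differ. `backoffDelaysMs` is a hypothetical name.

```typescript
// Hypothetical sketch of an exponential backoff schedule.
// Assumption: the delay doubles on each retry and no jitter is added.
function backoffDelaysMs(maxRetries: number = 3, baseDelayMs: number = 500): number[] {
  return Array.from({ length: maxRetries }, (_, attempt) => baseDelayMs * 2 ** attempt);
}

const delays = backoffDelaysMs();
const totalMs = delays.reduce((sum, d) => sum + d, 0);
console.log(delays, totalMs); // 500, 1000, 2000 ms; 3500 ms in total under these assumptions
```

Under these assumptions, a request that keeps failing gives up after roughly 3.5 seconds of waiting, so a persistent 429 is more likely to need a lower --max-results or a later re-run than more immediate attempts.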
Symptom: “Failed to fetch taxonomy”
Cause:
- arXiv taxonomy endpoint unavailable or network blocked.
Fix:
- Re-run later or use --refresh once connectivity is restored.
Reference
- Repo: https://github.com/jscraik/rSearch.git
- Commands: search, fetch, download, urls, categories, config, help
- Constraints:
  - Default API delay: 3s
  - Retry defaults: max-retries=3, retry-base-delay=500ms (--no-retry to disable)
  - page-size <= 2000
  - max-results <= 30000
- Output schema:
  - schemas/cli-output.schema.json
  - schemas/cli-error.schema.json
- License use:
  - arXiv content is licensed by the authors. The CLI may expose a license URL when provided, but it does not grant rights. Always verify permitted use on the arXiv abstract page.
- Usage policy:
  - Be courteous to arXiv: include contact info (--contact) and keep rate limits conservative (--rate-limit).
- Docs:
  - docs/index.md
  - CHANGELOG.md
  - SECURITY.md
  - SUPPORT.md
  - CODE_OF_CONDUCT.md
  - CONTRIBUTING.md
  - docs/cli-reference.md
  - docs/configuration.md
  - docs/release-policy.md
  - docs/troubleshooting.md
  - docs/faq.md
  - docs/roadmap.md
  - docs/ADR-001-architecture.md
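The constraints listed in this Reference section (page-size <= 2000, max-results <= 30000, default API delay 3s) can be combined into a rough estimate of how many requests a large search needs and how long the pacing alone takes. This is a sketch under stated assumptions, not the CLI's implementation: it assumes results are fetched page by page, that the delay applies between consecutive requests only, and that both values must be at least 1. `estimatePaging` is a hypothetical name.

```typescript
// Limits taken from the Constraints list in this README.
const MAX_PAGE_SIZE = 2000;
const MAX_RESULTS = 30000;
const DEFAULT_API_DELAY_MS = 3000;

// Hypothetical estimator.
// Assumptions: sequential paging, delay applied only between requests,
// and a lower bound of 1 for both values.
function estimatePaging(
  maxResults: number,
  pageSize: number,
  delayMs: number = DEFAULT_API_DELAY_MS
): { requests: number; minWaitMs: number } {
  if (!Number.isInteger(maxResults) || maxResults < 1 || maxResults > MAX_RESULTS) {
    throw new RangeError(`max-results must be an integer in 1..${MAX_RESULTS}`);
  }
  if (!Number.isInteger(pageSize) || pageSize < 1 || pageSize > MAX_PAGE_SIZE) {
    throw new RangeError(`page-size must be an integer in 1..${MAX_PAGE_SIZE}`);
  }
  const requests = Math.ceil(maxResults / pageSize);
  return { requests, minWaitMs: (requests - 1) * delayMs };
}

// Largest allowed search with the largest allowed page size.
console.log(estimatePaging(30000, 2000)); // 15 requests, 42000 ms of pacing
```

Under these assumptions, even the largest permitted search spends at least 42 seconds waiting between requests, which is worth knowing before starting a batch job.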
Acceptance criteria
- [ ] Doc requirements reflect current CLI scope and ownership.
- [ ] Examples match available commands and scripts in this repo.
- [ ] License and usage policy notes are present and accurate.
- [ ] Risks and assumptions are explicit and up to date.
- [ ] Links resolve to existing files or URLs.
Evidence bundle
- Standards mapping: CommonMark structure, accessibility (descriptive links), security/privacy guidance for license usage.
- Brand compliance: Documentation signature added; assets present in brand/.
- Automated checks: vale run on 2026-01-07 (0 errors, 0 warnings).
- Review artifact: Self-review completed on 2026-01-07.
- Deviations: None.
brAInwav
from demo to duty
