pi-arxivist
v0.1.6
Fetch arxiv papers as Markdown (pi extension)
pi-arxivist
Fetch arxiv papers as clean Markdown, right inside pi. Zero config, zero system dependencies.
Arxiv provides LaTeX source tarballs for most papers. fetch_arxiv downloads the source, flattens \input/\include references, and converts the result to Markdown via pandoc. No PDF extraction, no garbled math, no lost structure.
Install
pi install npm:pi-arxivist
Usage
fetch_arxiv 1203.6859
fetch_arxiv https://arxiv.org/abs/1203.6859
fetch_arxiv https://arxiv.org/pdf/1203.6859
Accepts bare IDs, abstract URLs, or PDF URLs.
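To illustrate how the three input forms can reduce to one identifier, here is a minimal sketch. The names `parseArxivId` and `sourceUrl` are hypothetical, and the ID patterns are assumptions about what arXiv IDs look like, not the extension's actual code.

```typescript
// Hypothetical helpers, not part of pi-arxivist's published API.
// Assumed ID shapes: new-style (1203.6859, 2301.01234v2) and
// old-style (hep-th/9901001, math.GT/0309136).
const NEW_STYLE = /^\d{4}\.\d{4,5}(v\d+)?$/;
const OLD_STYLE = /^[a-z-]+(\.[A-Z]{2})?\/\d{7}(v\d+)?$/;

function parseArxivId(input: string): string | null {
  let candidate = input.trim();
  // Strip the host and the /abs/ or /pdf/ prefix from URL inputs.
  const urlMatch = /^https?:\/\/(?:www\.)?arxiv\.org\/(?:abs|pdf)\/(.+)$/i.exec(candidate);
  if (urlMatch !== null) {
    candidate = urlMatch[1] ?? "";
  }
  // Drop any query string or fragment, then a trailing ".pdf".
  candidate = candidate.replace(/[?#].*$/, "").replace(/\.pdf$/i, "");
  return NEW_STYLE.test(candidate) || OLD_STYLE.test(candidate) ? candidate : null;
}

// Builds the source tarball URL for an ID (https scheme assumed).
function sourceUrl(id: string): string {
  return `https://arxiv.org/e-print/${id}`;
}
```

Returning null for unrecognized input lets the caller report a bad ID before any network request is made.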
What it returns
- paper.md: full paper in the cache directory, math preserved as $...$ / $$...$$
- meta.json: full frontmatter as JSON (title, abstract, authors, etc.)
- preamble.tex: macro definitions that pandoc couldn't process, extracted for inspection
The tool truncates output to fit context limits. Use read on the output path for the full paper.
How it works
- Downloads the source tarball from arxiv.org/e-print/<id>
- Extracts with tar
- Builds a dependency graph from \input/\include references across all .tex files, and selects the root by indegree
- Resolves the graph into a single flat document (circular-reference-safe, \includeonly-aware)
- Converts the full source to Markdown via the official pandoc WASM binary
- Extracts metadata from the pandoc-generated YAML frontmatter
- Extracts unprocessed preamble macros to preamble.tex
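The dependency-graph and root-selection steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the extension's implementation: `Sources`, `referencedFiles`, `buildGraph`, and `selectRoot` are hypothetical names, files are passed in as an in-memory map rather than read from the extracted tarball, commented-out references are not filtered, and the \documentclass tie-break is an added assumption.

```typescript
// Hypothetical types: file name -> LaTeX source, and file name -> referenced files.
type Sources = Record<string, string>;
type Graph = Record<string, string[]>;

// Collects \input{...} and \include{...} targets. The lookahead keeps
// \includeonly and \includegraphics from matching.
function referencedFiles(text: string): string[] {
  const pattern = /\\(?:input|include)(?![a-zA-Z])\s*\{([^}]+)\}/g;
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    const name = (match[1] ?? "").trim();
    // LaTeX lets the .tex extension be omitted.
    found.push(/\.tex$/.test(name) ? name : name + ".tex");
  }
  return found;
}

function buildGraph(sources: Sources): Graph {
  const graph: Graph = {};
  for (const file of Object.keys(sources)) {
    // Keep only references that resolve to a file we actually have.
    graph[file] = referencedFiles(sources[file] ?? "").filter((t) => t in sources);
  }
  return graph;
}

// The root is a file that no other file references (indegree 0).
function selectRoot(sources: Sources, graph: Graph): string | null {
  const indegree: Record<string, number> = {};
  for (const file of Object.keys(graph)) indegree[file] = 0;
  for (const file of Object.keys(graph)) {
    for (const target of graph[file] ?? []) {
      indegree[target] = (indegree[target] ?? 0) + 1;
    }
  }
  const roots = Object.keys(indegree).filter((f) => indegree[f] === 0);
  // Tie-break (assumption): prefer a candidate that declares \documentclass.
  const withClass = roots.filter((f) => (sources[f] ?? "").indexOf("\\documentclass") !== -1);
  return withClass[0] ?? roots[0] ?? null;
}
```

Choosing by indegree means the main file does not need a conventional name such as main.tex. The tie-break only matters when stray unreferenced files sit alongside it.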
No system pandoc or LaTeX distribution needed.
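The flattening step (circular-reference-safe, \includeonly-aware) might look something like the sketch below. The names `texName`, `includeOnly`, and `flatten` are hypothetical. It also assumes that cyclic references are dropped silently, that \includeonly is read from the root file only, and that \includeonly filters \include but not \input, as in LaTeX itself.

```typescript
// Hypothetical type: file name -> LaTeX source.
type Sources = Record<string, string>;

// LaTeX lets the .tex extension be omitted in references.
function texName(name: string): string {
  const trimmed = name.trim();
  return /\.tex$/.test(trimmed) ? trimmed : trimmed + ".tex";
}

// Returns the files listed in \includeonly{a,b}, or null if it is absent.
function includeOnly(rootText: string): string[] | null {
  const match = /\\includeonly\s*\{([^}]*)\}/.exec(rootText);
  if (match === null) return null;
  return (match[1] ?? "").split(",").map(texName);
}

function flatten(sources: Sources, root: string): string {
  const allowed = includeOnly(sources[root] ?? "");

  function expand(file: string, stack: string[]): string {
    const text = sources[file];
    if (text === undefined) return "";
    // A fresh regex per call avoids sharing lastIndex state across recursion.
    const pattern = /\\(input|include)(?![a-zA-Z])\s*\{([^}]+)\}/g;
    return text.replace(pattern, (whole: string, command: string, name: string) => {
      const target = texName(name);
      // Leave references to files we do not have untouched.
      if (!(target in sources)) return whole;
      // A file already on the current expansion path would recurse forever.
      if (stack.indexOf(target) !== -1) return "";
      // \includeonly restricts \include, not \input.
      if (command === "include" && allowed !== null && allowed.indexOf(target) === -1) return "";
      return expand(target, stack.concat(target));
    });
  }

  return expand(root, [root]);
}
```

Tracking the current expansion path, rather than a global visited set, still lets the same file be inlined at several places, while any reference back to an ancestor is cut.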
