@productivities/document-sources

v0.1.0

Published a month ago

URL → document-source router. Detects PDF, DOCX, EPUB, Google Docs, arXiv, YouTube, Markdown, LaTeX, reStructuredText, Jupyter, audio, and video from a URL. Pure functions, zero runtime dependencies, runs anywhere (browser, service worker, Node, edge).


zachzwy

url document router pdf epub docx arxiv youtube google-docs markdown latex jupyter

@productivities/document-sources

URL → document-source router. Given an arbitrary URL, tells you what kind of document it points at (PDF, EPUB, DOCX, Google Doc, YouTube video, Markdown, LaTeX, reStructuredText, Jupyter notebook, audio, video) and, where useful, rewrites it to a fetchable canonical URL (e.g. arXiv abstract → ar5iv HTML, GitHub blob → raw, Google Doc → text export).

Pure functions, zero runtime dependencies, runs anywhere — browser, MV3 service worker, Node, edge runtimes.
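As an illustration of what "rewrites it to a fetchable canonical URL" can mean in practice, here is a minimal sketch of the GitHub blob → raw case. The function name sketchGithubRawUrl is hypothetical and is not an export of this package; the package's own rewriting logic may differ. The sketch relies only on the standard URL class, in keeping with the zero-dependency claim.

```javascript
// Hypothetical helper for illustration only; not part of the package.
// Maps https://github.com/<owner>/<repo>/blob/<ref>/<path>
// to   https://raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>
function sketchGithubRawUrl(input) {
  let parsed;
  try {
    parsed = new URL(input);
  } catch {
    return null; // not a valid absolute URL
  }
  if (parsed.hostname !== 'github.com') return null;

  // Split the path into non-empty segments: owner, repo, "blob", ref, path...
  const parts = parsed.pathname.split('/').filter(Boolean);
  if (parts.length < 5 || parts[2] !== 'blob') return null;

  // Drop the "blob" segment and keep everything after it unchanged.
  const [owner, repo, , ...rest] = parts;
  return `https://raw.githubusercontent.com/${owner}/${repo}/${rest.join('/')}`;
}

console.log(sketchGithubRawUrl('https://github.com/octocat/Hello-World/blob/master/README'));
// → https://raw.githubusercontent.com/octocat/Hello-World/master/README
```

Returning null for anything that is not a blob URL keeps the function pure and lets a caller fall back to the original URL.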

Install

npm install @productivities/document-sources

Usage

import {
  documentSourceFromUrl,
  isPdfUrl,
  isYouTubeUrl,
  parseArxivId,
  arxivHtmlCandidates,
  googleDocTextCandidates,
} from '@productivities/document-sources';

documentSourceFromUrl('https://arxiv.org/pdf/1706.03762');
// → { kind: 'pdf', sourceType: 'pdf', label: 'PDF' }

documentSourceFromUrl('https://docs.google.com/document/d/abc123/edit');
// → { kind: 'document', sourceType: 'google-doc', label: 'Google Doc' }

documentSourceFromUrl('https://www.gutenberg.org/ebooks/1342.epub3.images');
// → { kind: 'document', sourceType: 'epub', label: 'EPUB' }

arxivHtmlCandidates('https://arxiv.org/pdf/1706.03762');
// → [{ url: 'https://ar5iv.labs.arxiv.org/html/1706.03762', label: 'ar5iv HTML' }]
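The results above are plain objects, so a caller can branch on them directly. The following sketch shows one way to do that. planFetch and its action names are hypothetical and not part of the package. It also assumes that a null result means "not a recognized document" and that the optional url field, when present, is the rewritten URL to fetch; the README does not spell out either point.

```javascript
// Hypothetical helper, not part of the package. Takes the original URL and a
// result of the documented shape { kind, sourceType, label, url? } (or null)
// and decides what a caller might do next.
function planFetch(originalUrl, source) {
  // documentSourceFromUrl may return null, so handle that case first.
  // Assumption: null means the URL is not a recognized document source.
  if (source === null) {
    return { action: 'fetch-as-html', url: originalUrl };
  }
  // Assumption: the optional `url` field is a rewritten, fetchable URL.
  const url = source.url ?? originalUrl;
  switch (source.kind) {
    case 'pdf':
      return { action: 'parse-pdf', url };
    case 'document':
      return { action: `convert-${source.sourceType}`, url };
    default:
      // Kinds not shown in the usage examples end up here.
      return { action: 'unsupported', url };
  }
}

// Result objects copied from the usage examples above.
const pdfSource = { kind: 'pdf', sourceType: 'pdf', label: 'PDF' };
const docSource = { kind: 'document', sourceType: 'google-doc', label: 'Google Doc' };

console.log(planFetch('https://arxiv.org/pdf/1706.03762', pdfSource));
console.log(planFetch('https://docs.google.com/document/d/abc123/edit', docSource));
console.log(planFetch('https://example.com/', null));
```

In real code the second argument would come from documentSourceFromUrl(originalUrl) rather than from hand-written literals.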

API

documentSourceFromUrl(url) — primary entry point. Returns { kind, sourceType, label, url? } or null.
isPdfUrl(url), isYouTubeUrl(url) — predicates.
parseArxivId(url), parseGoogleDoc(url), parseYouTubeUrl(url) — site-specific extractors.
Candidate generators (URL → fetchable canonical URLs): arxivHtmlCandidates, googleDocTextCandidates, wordDocxCandidates, epubCandidates, markdownCandidates, textCandidates, notebookCandidates, latexCandidates, rstCandidates, mediaCandidates.
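To make the extractor and candidate-generator pattern concrete, here is a rough, self-contained sketch of the arXiv case, written to reproduce the arxivHtmlCandidates output shown under Usage. The names sketchParseArxivId and sketchAr5ivCandidates are hypothetical re-implementations for illustration; the package's real parseArxivId and arxivHtmlCandidates may accept more URL shapes (for example old-style IDs such as hep-th/9901001) and may treat version suffixes differently.

```javascript
// Hypothetical re-implementation for illustration only; not the package's code.
// Matches new-style arXiv IDs such as 1706.03762 or 1706.03762v5.
const NEW_STYLE_ID = /^(\d{4}\.\d{4,5})(?:v\d+)?$/;

function sketchParseArxivId(input) {
  let parsed;
  try {
    parsed = new URL(input);
  } catch {
    return null; // not a valid absolute URL
  }
  if (parsed.hostname !== 'arxiv.org' && parsed.hostname !== 'www.arxiv.org') {
    return null;
  }
  // Expect paths such as /abs/1706.03762 or /pdf/1706.03762v5.pdf
  const pathMatch = parsed.pathname.match(/^\/(?:abs|pdf)\/(.+?)(?:\.pdf)?$/);
  if (!pathMatch) return null;

  // Assumption of this sketch: the version suffix (v5) is dropped.
  const idMatch = pathMatch[1].match(NEW_STYLE_ID);
  return idMatch ? idMatch[1] : null;
}

function sketchAr5ivCandidates(input) {
  const id = sketchParseArxivId(input);
  if (id === null) return []; // no candidates for non-arXiv URLs
  return [{ url: `https://ar5iv.labs.arxiv.org/html/${id}`, label: 'ar5iv HTML' }];
}

console.log(sketchAr5ivCandidates('https://arxiv.org/pdf/1706.03762'));
// → [ { url: 'https://ar5iv.labs.arxiv.org/html/1706.03762', label: 'ar5iv HTML' } ]
```

Returning an array, even for a single rewrite, lets a caller try candidates in order and stop at the first one that fetches successfully.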

License

MIT
