@alloc/sitefetch
v0.1.1
Fetch an entire site and save it as a text file (to be used with AI models).
Install
One-off usage (choose one of the following):
bunx @alloc/sitefetch
npx @alloc/sitefetch
pnpx @alloc/sitefetch
Install globally (choose one of the following):
bun i -g @alloc/sitefetch
npm i -g @alloc/sitefetch
pnpm i -g @alloc/sitefetch
Usage
sitefetch https://egoist.dev -o site.txt
# or with higher concurrency
sitefetch https://egoist.dev -o site.txt --concurrency 10
Multiple starting URLs
Pass multiple URLs as positional arguments to crawl more than one site at once:
sitefetch https://example.com https://other.com -o out.txt
Each URL is crawled independently within its own host, and the results are merged into a single output.
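The "within its own host" rule can be pictured with a small check built on the standard URL API. This is a minimal sketch, not sitefetch's actual code: the helper name `isSameHost` is hypothetical, and comparing `host` (rather than, say, `hostname` or `origin`) is an assumption.

```typescript
// Hypothetical helper (not part of sitefetch): decides whether a link
// discovered during a crawl stays on the host of its starting URL.
function isSameHost(startUrl: string, link: string): boolean {
  // Resolve relative links against the starting URL before comparing.
  const resolved = new URL(link, startUrl)
  // Assumption: hosts are compared exactly, so subdomains count as other hosts.
  return resolved.host === new URL(startUrl).host
}

const start = "https://example.com"
const onSite = isSameHost(start, "https://example.com/docs")
const relative = isSameHost(start, "/about")
const offSite = isSameHost(start, "https://other.com/blog")
const subdomain = isSameHost(start, "https://docs.example.com/")
```

Under these assumptions, a link from example.com to other.com would only be crawled if https://other.com was itself passed as a starting URL.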
Match specific pages
Use the -m, --match flag to specify the pages you want to fetch:
sitefetch https://vite.dev -m "/blog/**" -m "/guide/**"
The match pattern is tested against the pathname of target pages using micromatch. See the micromatch documentation for all supported matching features.
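To illustrate what "tested against the pathname" means, here is a self-contained sketch. It does not call micromatch: `globToRegExp` and `matchesPath` are hypothetical helpers, and the glob support is deliberately reduced to `*` and `**`, so real micromatch behavior may differ in edge cases.

```typescript
// Simplified stand-in for micromatch: supports only "*" (within one path
// segment) and "**" (across segments). Real micromatch supports far more.
function globToRegExp(glob: string): RegExp {
  const source = glob
    .split(/(\*\*|\*)/) // keep the wildcards as separate tokens
    .map((token) => {
      if (token === "**") return ".*"
      if (token === "*") return "[^/]*"
      // Escape regex metacharacters in the literal parts of the pattern.
      return token.replace(/[.+?^${}()|[\]\\]/g, "\\$&")
    })
    .join("")
  return new RegExp(`^${source}$`)
}

// Hypothetical helper: true if the URL's pathname matches any pattern.
// The host and query string play no part in the test.
function matchesPath(url: string, patterns: string[]): boolean {
  const { pathname } = new URL(url)
  return patterns.some((pattern) => globToRegExp(pattern).test(pathname))
}

const patterns = ["/blog/**", "/guide/**"]
const blogPost = matchesPath("https://vite.dev/blog/some-post?ref=home", patterns)
const configPage = matchesPath("https://vite.dev/config/", patterns)
```

Here `blogPost` is true because only `/blog/some-post` is compared with the patterns, while `configPage` is false.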
Exclude pages
Use the -e, --exclude flag to skip pages whose pathname matches a pattern:
sitefetch https://vite.dev -e "/blog/**" -e "/releases/**"
Multiple patterns can be passed. The starting URL is never excluded, regardless of the patterns provided.
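The starting-URL exemption can be sketched as follows. This is an illustration under assumptions, not the tool's implementation: `shouldFetch` and `FilterOptions` are hypothetical names, and plain regular expressions stand in for the glob patterns to keep the example short.

```typescript
// Hypothetical options shape for this sketch only.
interface FilterOptions {
  startUrls: string[]
  exclude: RegExp[] // regular expressions standing in for glob patterns
}

// Hypothetical helper: decides whether a URL should be fetched.
function shouldFetch(url: string, options: FilterOptions): boolean {
  const target = new URL(url)
  // Compare normalized hrefs so "https://vite.dev" equals "https://vite.dev/".
  const isStart = options.startUrls.some((start) => new URL(start).href === target.href)
  // Starting URLs are fetched even when an exclude pattern matches them.
  if (isStart) return true
  return !options.exclude.some((pattern) => pattern.test(target.pathname))
}

const options: FilterOptions = {
  startUrls: ["https://vite.dev/blog/"],
  exclude: [/^\/blog\//, /^\/releases\//],
}
const startPage = shouldFetch("https://vite.dev/blog/", options)
const otherBlogPage = shouldFetch("https://vite.dev/blog/some-post", options)
const guidePage = shouldFetch("https://vite.dev/guide/", options)
```

In this sketch the starting page is kept although `/blog/` is excluded, other blog pages are skipped, and pages outside the excluded paths are fetched as usual.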
Limit crawled pages
Use --limit to cap the number of pages fetched. Pass 0 to disable link-following entirely, so that only the explicitly provided URLs are fetched:
# fetch at most 20 pages
sitefetch https://vite.dev --limit 20
# fetch only the given URLs, no link crawling
sitefetch https://vite.dev/guide/introduction https://vite.dev/guide/getting-started --limit 0
Content selector
We use mozilla/readability to extract readable content from the web page. On some pages it may return irrelevant content; in that case, you can specify a CSS selector so we know where to find the readable content:
sitefetch https://vite.dev --content-selector ".content"
Plug
If you like this, please check out my LLM chat app: https://chatwise.app
API
import { fetchSite } from "@alloc/sitefetch"
await fetchSite("https://egoist.dev", {
//...options
})
// multiple starting URLs
await fetchSite(["https://example.com", "https://other.com"], {
//...options
})
Check out options in types.ts.
License
MIT.
