shibuk
v1.1.0
Capture and localize modern websites into static clones with one CLI command.
shibuk captures a live website, downloads its fetched assets, rewrites paths for local hosting, and runs a local 404 recovery pass. It is designed for modern WebGL, Three.js, Framer, and other asset-heavy marketing sites where a plain wget mirror usually fails.
It supports:
- live browser capture with puppeteer
- optional protected-site capture with cloakbrowser
- HAR-assisted imports for responses that already loaded in your browser
- iterative embedded asset discovery across downloaded JS, HTML, GLTF, manifests, and prerender data
- local request/API mock replay for sites that depend on captured POST or data endpoints
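The iterative embedded asset discovery listed above can be pictured as a queue: each downloaded file is scanned for asset references, and every new reference is scanned in turn. The following is a minimal sketch under stated assumptions, not shibuk's actual implementation. The names `findEmbeddedAssets` and `discoverAll`, the regular expression, and the extension list are all hypothetical.

```typescript
// Hypothetical sketch: shibuk's real discovery logic is not shown in this README.
// Matches quoted absolute URLs or root-relative paths ending in a known asset extension.
const ASSET_PATTERN =
  /["'`]((?:https?:\/\/|\/)[^"'`\s]+\.(?:js|css|json|png|jpg|webp|svg|glb|gltf|bin|woff2?))["'`]/;

function findEmbeddedAssets(source: string): string[] {
  const found: string[] = [];
  const pattern = new RegExp(ASSET_PATTERN.source, "g");
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const url = match[1];
    if (url !== undefined && found.indexOf(url) === -1) found.push(url);
  }
  return found;
}

// Walks from an entry file through every file it references, directly or indirectly.
// `files` stands in for already-downloaded bodies keyed by URL path (an assumption).
function discoverAll(entry: string, files: Record<string, string>): string[] {
  const seen: string[] = [];
  const queue: string[] = [entry];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    if (seen.indexOf(current) !== -1) continue;
    seen.push(current);
    const body = files[current];
    if (body === undefined) continue; // referenced but not downloaded yet
    for (const next of findEmbeddedAssets(body)) queue.push(next);
  }
  return seen;
}
```

A regex scan like this is deliberately loose: it will miss URLs assembled at runtime, which is presumably why a separate local 404 recovery pass exists.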
shibuk is a Bun-first CLI. The runtime and implementation prefer Bun-native APIs such as Bun.file, Bun.write, Bun.Glob, and Bun.serve; Node.js compatibility is not a project goal.
Requires Bun 1.3.11 or newer.
Install
Run without installing:
bunx shibuk https://example.com
Or install globally with Bun:
bun add -g shibuk
Usage
Basic:
bunx shibuk https://example.com
HAR-assisted import:
bunx shibuk import-har ./capture.har https://example.com
Named output folder:
bunx shibuk https://example.com --name example-site
Explicit output path:
bunx shibuk https://example.com/brand/ --out ./brand
Positional output path:
bunx shibuk https://example.com/brand/ ./brand
Useful options:
--url <url> Target URL. Optional if the first positional arg is a URL.
--name <name> Output folder name. Auto-generated from the URL when omitted.
--out <dir> Output folder path. Overrides --name.
--origin <origin> Override the origin used for rebasing and missing fetches.
--headful Run the browser visibly instead of headless.
--no-scroll Skip auto-scroll during capture and local testing.
--scroll-step <px> Scroll step in pixels. Default: 800.
--scroll-delay <ms> Delay between scroll steps. Default: 120.
--max-scrolls <n> Maximum scroll steps per pass. Default: 80.
--idle-wait <ms> Wait after page interaction settles. Default: 4000.
--no-rewrite Skip path rebasing.
--no-local-test Skip the local missing-asset recovery pass.
--rounds <n> Number of local 404 recovery rounds. Default: 2.
--browser <engine> Browser backend: puppeteer | cloakbrowser. Default: puppeteer.
--har <path> Import response bodies from a HAR file instead of live capture.
--concurrency <n> Download concurrency. Default: 8.
--timeout <ms> Network timeout per request. Default: 60000.
--extra <url> Add an extra asset URL to download. Repeatable.
--extra-file <path> Read extra URLs from a file, one per line.
--retries <n> Retry count for each download. Default: 2.
--user-agent <value> Override the browser and fetch user agent.
--verbose Print per-request diagnostics.
Output
Each run creates a target directory with the cloned files and a .clone/ folder containing capture artifacts such as:
- urls.txt
- har-urls.txt
- embedded-urls.txt
- embedded-urls-round-2.txt
- manifest-urls.txt
- sequence-urls.txt
- missing-round-*.txt
- captured-entry.html
These files are the first place to inspect when a clone still has runtime 404s.
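One possible way to act on these artifacts is to collect the URLs from a missing-round-*.txt file and feed them back through the repeatable --extra flag. The sketch below assumes the artifact files hold one URL per line; the README states that format only for --extra-file, so treat it as an assumption. The helpers `parseUrlList` and `toExtraArgs` are hypothetical and not part of shibuk.

```typescript
// Hypothetical helpers; the one-URL-per-line artifact format is an assumption.
function parseUrlList(text: string): string[] {
  const urls: string[] = [];
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "") continue; // skip blank lines
    if (urls.indexOf(line) === -1) urls.push(line); // drop duplicates, keep order
  }
  return urls;
}

// Builds CLI arguments using the documented, repeatable --extra flag.
function toExtraArgs(urls: string[]): string[] {
  const args: string[] = [];
  for (const url of urls) args.push("--extra", url);
  return args;
}
```

The resulting array can be appended to a later `bunx shibuk <url>` invocation, or the deduplicated list can be written to a file and passed with --extra-file instead.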
Successful runs also end with a summary line like:
Clone complete.
Destination folder: /absolute/path/to/output
Local Smoke Test
Serve the cloned folder directly:
cd example-site
bunx serve
Direct-folder hosting is the expected smoke test. Some sites hardcode root-relative fetches and behave differently if served from a parent directory.
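The difference comes from standard URL resolution: a root-relative reference such as /assets/app.js always resolves against the server origin, not against the folder the page lives in. The sketch below illustrates this with the standard `URL` constructor. The host, port, and file names are illustrative assumptions.

```typescript
// Resolves an asset reference the way a browser would for a given page URL.
function resolveAsset(reference: string, pageUrl: string): string {
  return new URL(reference, pageUrl).href;
}

// Illustrative URLs only; the port and folder name are assumptions.
const servedDirectly = "http://localhost:3000/";
const servedFromParent = "http://localhost:3000/example-site/";

// Root-relative fetch, clone served directly: lands inside the clone.
const direct = resolveAsset("/assets/app.js", servedDirectly);
// "http://localhost:3000/assets/app.js"

// Same fetch, clone served from a parent directory: skips the example-site/ folder.
const fromParent = resolveAsset("/assets/app.js", servedFromParent);
// "http://localhost:3000/assets/app.js"

// A document-relative reference would follow the folder, but sites often hardcode the root form.
const relative = resolveAsset("assets/app.js", servedFromParent);
// "http://localhost:3000/example-site/assets/app.js"
```

In the second case the request misses the clone entirely, which shows up as a 404 that does not occur when the folder itself is the server root.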
Development
This repository treats Bun as the primary runtime, package manager, test runner, and local server toolchain. When touching filesystem code, prefer Bun-native APIs and keep any node:fs/promises usage limited to directory primitives Bun does not currently expose directly.
Install dependencies:
bun install
Run checks:
bun run lint
bun run typecheck
bun test
Format the repo:
bun run format
Build the published CLI:
bun run build
The build includes a post-step that rewrites @/ path aliases in dist/ to relative imports so bunx shibuk works from a published package without needing tsconfig.json.
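To make the alias rewrite concrete, here is a minimal sketch of what such a post-step could do to a single file. It is not the repository's actual script. It assumes that @/ maps to the root of dist/, that paths use forward slashes, and that only static `from "..."` specifiers need rewriting. `relativeSpecifier` and `rewriteAliases` are hypothetical names.

```typescript
// Hypothetical sketch of an alias rewrite; the real post-build step is not shown in this README.
// `fromFile` and `aliasTarget` are both paths relative to dist/ (an assumption).
function relativeSpecifier(fromFile: string, aliasTarget: string): string {
  const fromParts = fromFile.split("/").slice(0, -1); // directory of the importing file
  const toParts = aliasTarget.split("/");
  let common = 0;
  while (
    common < fromParts.length &&
    common < toParts.length - 1 &&
    fromParts[common] === toParts[common]
  ) {
    common++;
  }
  const up = fromParts.length - common;
  let prefix = up === 0 ? "./" : "";
  for (let i = 0; i < up; i++) prefix += "../";
  return prefix + toParts.slice(common).join("/");
}

// Rewrites static `from "@/..."` specifiers; dynamic import() and require() are not handled.
function rewriteAliases(source: string, fromFile: string): string {
  return source.replace(
    /(from\s+["'])@\/([^"']+)(["'])/g,
    (_match: string, head: string, target: string, tail: string) =>
      head + relativeSpecifier(fromFile, target) + tail,
  );
}
```

For example, under these assumptions an import of "@/utils/fs.js" inside cli/run.js becomes "../utils/fs.js", which resolves without any tsconfig.json path mapping.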
Notes:
- bun run lint currently runs linting only.
- Tests live beside their implementation files in src/.
- The local recovery pipeline uses an internal Bun server plus the runtime shim to detect and repair missing assets before the final clone is written.
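The recovery pipeline in the last note can be thought of as a bounded loop: detect what is missing, try to download it, and repeat for the configured number of rounds (--rounds, default 2). The sketch below simulates that loop with plain callbacks. It is an illustration under assumptions, not the project's code: `recoverMissing`, `findMissing`, and `download` are hypothetical, and the real pipeline detects 404s through a local Bun server and a runtime shim rather than a callback.

```typescript
interface RecoveryResult {
  files: string[];      // everything present after recovery
  unresolved: string[]; // URLs whose download failed in the last round that ran
  roundsUsed: number;
}

// Hypothetical simulation of a bounded recovery loop.
function recoverMissing(
  initialFiles: string[],
  findMissing: (files: string[]) => string[], // stands in for the 404 detection step
  download: (url: string) => boolean,         // stands in for the fetch step
  rounds: number = 2,                         // mirrors the documented --rounds default
): RecoveryResult {
  const files = initialFiles.slice();
  let unresolved: string[] = [];
  let roundsUsed = 0;
  for (let round = 1; round <= rounds; round++) {
    const missing = findMissing(files);
    if (missing.length === 0) {
      unresolved = [];
      break; // nothing left to repair
    }
    roundsUsed = round;
    unresolved = [];
    for (const url of missing) {
      if (download(url)) files.push(url);
      else unresolved.push(url);
    }
  }
  return { files, unresolved, roundsUsed };
}
```

More than one round is useful because a recovered file can itself reference further assets that only show up as missing once it loads.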
Release
Releases are driven by semantic-release from GitHub Actions. Commits should follow conventional commit format such as:
- fix: handle root-level _next assets
- feat: support positional URL input
- docs: clarify smoke test workflow
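The commit type matters because it decides whether a release happens at all. The sketch below classifies a commit header, assuming semantic-release's default commit analyzer rules (feat gives a minor release, fix and perf give a patch release, other types give none). This repository's actual release configuration is not shown here, and `releaseTypeFor` is a hypothetical helper. Breaking-change detection is left out for brevity.

```typescript
type ReleaseType = "minor" | "patch" | null;

// Matches "type: subject" or "type(scope): subject".
const COMMIT_HEADER = /^([a-z]+)(?:\(([^)]+)\))?: (.+)$/;

// Assumes semantic-release's default rules; a custom config may differ.
// Breaking changes (which would trigger a major release) are not handled here.
function releaseTypeFor(header: string): ReleaseType {
  const match = COMMIT_HEADER.exec(header);
  if (match === null) return null; // not in conventional commit format
  switch (match[1]) {
    case "feat":
      return "minor";
    case "fix":
    case "perf":
      return "patch";
    default:
      return null; // e.g. docs, chore, test: no release on their own
  }
}
```

Under these assumptions, the first example above would produce a patch release, the second a minor release, and the third no release.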
