@duyquangnvx/webnovel-downloader
v0.4.0
webnovel-downloader
A pluggable, type-safe webnovel downloader for Node.js. Part of the webnovel-studio toolkit, but usable standalone.
Status: Pre-alpha. APIs are unstable.
What it does
Given a webnovel URL from a supported site, fetch the full novel as structured data: metadata + ordered chapters. Output is format-agnostic — pair with a separate formatter package (EPUB, TXT, JSON, Markdown) to produce final files.
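To make the "structured data in, formatter out" split concrete, here is a minimal sketch of what a separate formatter could look like. The `Novel`, `NovelMetadata`, and `Chapter` shapes and the `toMarkdown` function are assumptions made for illustration; only `metadata.title` and `chapters` are visible in this README, and the real types are described in docs/data-model.md.

```typescript
// Hypothetical shapes: the real types are defined in docs/data-model.md.
interface NovelMetadata {
  title: string;
  author?: string;
}

interface Chapter {
  index: number; // 0-based position in the novel
  title: string;
  content: string; // plain text in this sketch
}

interface Novel {
  metadata: NovelMetadata;
  chapters: Chapter[];
}

// A toy Markdown formatter: one H1 for the novel, one H2 per chapter.
function toMarkdown(novel: Novel): string {
  const header = novel.metadata.author
    ? `# ${novel.metadata.title}\n\nby ${novel.metadata.author}`
    : `# ${novel.metadata.title}`;
  const body = [...novel.chapters]
    .sort((a, b) => a.index - b.index) // defensive: keep chapters ordered
    .map((ch) => `## ${ch.title}\n\n${ch.content.trim()}`);
  return [header, ...body].join("\n\n") + "\n";
}
```

Because the formatter only reads the data, the same download result could feed an EPUB, TXT, or JSON writer without fetching anything again.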
Supported sites
Sites with Cloudflare/JS challenges are tracked under M5 — transport tier.
| Site | Adapter id | TOC strategy | Notes |
|---|---|---|---|
| truyenfull.today (+ truyenfull.vision, truyenfull.vn) | truyenfull | S1 paginated (/trang-N/) | Auto-rewrites legacy .vision/.vn hosts → truyenfull.today. |
| metruyenchu.com.vn | metruyenchu-com-vn | S2 + S4 hybrid (HTML page-1 + JSON /get/listchap/<bid>) | Distinct from the dead metruyenchu.com brand. |
| wikicv.net (+ truyenwikidich.net) | wikicv | S4 via browser (/book/index XHR intercept; HMAC-signed) | Auto-rewrites truyenwikidich.net host. Requires the browser tier (patchright recommended). |
| tangthuvien.net | tangthuvien | scaffold — M5.1 | Site was unreachable at 2026-05-03 survey; selectors and parsers TBD. |
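The host rewrites mentioned in the Notes column can be pictured as a small normalization step that runs before an adapter sees the URL. This is a sketch under assumptions: `HOST_ALIASES` and `normalizeHost` are hypothetical names, and the assumption that truyenwikidich.net maps to wikicv.net is inferred from the table rather than confirmed by the source.

```typescript
// Assumed alias table, taken from the "Notes" column above; the real
// adapters may store this differently.
const HOST_ALIASES: Record<string, string> = {
  "truyenfull.vision": "truyenfull.today",
  "truyenfull.vn": "truyenfull.today",
  "truyenwikidich.net": "wikicv.net",
};

// Returns the URL with a legacy hostname swapped for its canonical one.
// Path, query, and hash are left untouched.
function normalizeHost(input: string): string {
  const url = new URL(input);
  const host = url.hostname.replace(/^www\./, "");
  const canonical = HOST_ALIASES[host];
  if (canonical !== undefined) {
    url.hostname = canonical;
  }
  return url.toString();
}
```

With this table, `normalizeHost("https://truyenfull.vn/tien-nghich/")` yields `https://truyenfull.today/tien-nghich/`, while a URL that is already on a canonical host passes through unchanged.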
Transport tier
Sites protected by Cloudflare or rendering content via JS need a real browser. Install one of:
pnpm add patchright # recommended for Cloudflare-fronted sites
pnpm add playwright # works for non-protected pages

Both are optional peer dependencies. The Downloader resolves the module at runtime, trying patchright first and then playwright. The shared downloader singleton uses "auto" transport (headless). For a custom transport (e.g. a headed browser to solve a Cloudflare challenge by hand), build your own instance with createDownloader, which wires the built-in adapters for you:
import { createDownloader } from "@duyquangnvx/webnovel-downloader";
const dl = createDownloader({
  transport: { mode: "auto", browserOptions: { headed: true } },
});
try {
  const result = await dl.download(url);
} finally {
  await dl.dispose(); // releases the browser pool
}

Modes:
- "auto": try undici first and escalate to the browser on a Cloudflare challenge.
- "http-only": never use the browser; adapters with preferredTransport: "browser" will throw.
- "browser-required": send every request through the browser.
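One way to read the three modes is as a per-request decision between two tiers. The sketch below is an illustration, not the package's implementation: `pickTier`, `Tier`, and the `challenged` flag are hypothetical, and it assumes that in "auto" mode an adapter that prefers the browser goes straight to it.

```typescript
type TransportMode = "auto" | "http-only" | "browser-required";
type Tier = "http" | "browser";

// Hypothetical helper: picks the tier for one request.
// `preferred` mirrors an adapter's preferredTransport; `challenged` is true
// once an HTTP attempt has hit a Cloudflare challenge.
function pickTier(
  mode: TransportMode,
  preferred: Tier,
  challenged: boolean,
): Tier {
  switch (mode) {
    case "browser-required":
      return "browser";
    case "http-only":
      if (preferred === "browser") {
        throw new Error("adapter needs the browser tier, but mode is http-only");
      }
      return "http";
    case "auto":
      return preferred === "browser" || challenged ? "browser" : "http";
  }
}
```

Keeping the decision in one pure function like this makes the escalation rule easy to test without launching a browser.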
Manual Cloudflare solve: pass browserOptions.headed: true to launch a visible window once and solve the checkbox; the resulting cookies are cached for about 30 minutes.
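The roughly 30-minute cookie lifetime behaves like a time-bounded cache: a solved challenge is reused until it expires, after which a new solve is needed. The following is a minimal sketch of that idea; `TtlCache` is a hypothetical class, and the package's actual cache and key format are not documented in this README.

```typescript
// A tiny time-bounded store. The 30-minute default mirrors the "~30 min"
// figure above; the real cache and its key format are not documented here.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  // `now` is injectable so tests can control the clock.
  constructor(ttlMs: number = 30 * 60 * 1000, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Returns the value, or undefined if it is missing or has expired.
  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}
```

Keying such a cache by hostname would let one manual solve cover every request to that site until the entry expires.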
Run a manual end-to-end check across the active adapters with:
pnpm smoke:live # default — first 20 chapters per adapter (~25s)
pnpm smoke:live --quick # first 5 chapters per adapter (~10s)
pnpm smoke:live --metadata-only # fetchMetadata only, no TOC walk (~3s)

The modes use DownloadOptions.chapterRange so the TOC walk short-circuits early, which keeps the run time independent of novel size.
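The short-circuit can be illustrated with a paginated TOC walk that stops requesting pages once the last wanted index has been collected. This is a simplified, synchronous sketch: `walkToc` and `fetchPage` are hypothetical stand-ins, and a real walk would be asynchronous and go through the transport tier.

```typescript
interface ChapterRange {
  from: number; // 0-based, inclusive
  to: number; // 0-based, inclusive
}

// `fetchPage` is a stand-in for one TOC page request (page numbers start at 1).
// It returns the chapter titles on that page, or an empty array past the end.
function walkToc(
  fetchPage: (page: number) => string[],
  range?: ChapterRange,
): string[] {
  const seen: string[] = [];
  for (let page = 1; ; page++) {
    const titles = fetchPage(page);
    if (titles.length === 0) break; // ran off the end of the TOC
    seen.push(...titles);
    // Stop as soon as the last requested index has been collected.
    if (range !== undefined && seen.length > range.to) break;
  }
  return range === undefined ? seen : seen.slice(range.from, range.to + 1);
}
```

With 50 chapters per page, a range of `{ from: 0, to: 9 }` needs a single page request no matter how many pages the novel has.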
Adding a new site means implementing a single SiteAdapter. See docs/adapter-spec.md.
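To show how one adapter per site can plug into URL dispatch, here is a heavily reduced sketch. `MiniAdapter` and `AdapterRegistry` are hypothetical; the `id`, `displayName`, and `hostnames` fields are borrowed from the shape that `supportedSites()` returns, and the real SiteAdapter contract in docs/adapter-spec.md has more to it than this.

```typescript
// Hypothetical, heavily reduced adapter shape. The real contract lives in
// docs/adapter-spec.md and has more members than this.
interface MiniAdapter {
  id: string;
  displayName: string;
  hostnames: string[];
}

class AdapterRegistry {
  private byHost = new Map<string, MiniAdapter>();

  register(adapter: MiniAdapter): void {
    for (const host of adapter.hostnames) {
      this.byHost.set(host.toLowerCase(), adapter);
    }
  }

  // Returns the adapter for a URL, or undefined for unsupported or invalid URLs.
  find(url: string): MiniAdapter | undefined {
    try {
      return this.byHost.get(new URL(url).hostname.toLowerCase());
    } catch {
      return undefined; // not a parseable URL
    }
  }

  canHandle(url: string): boolean {
    return this.find(url) !== undefined;
  }
}
```

In this picture, supporting a new site amounts to one more `register` call, and the lookup never needs to know site-specific details.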
Install
pnpm add @duyquangnvx/webnovel-downloader
# optional browser tier (Cloudflare / JS-rendered sites; see "Transport tier" above):
pnpm add patchright

Published to npm under public access (@duyquangnvx/webnovel-downloader).
Ships both ESM and CommonJS builds, so import and require both work.
Quick example
import { createDownloader } from "@duyquangnvx/webnovel-downloader";
const downloader = createDownloader({ rateLimit: { requestsPerSecond: 2 } });
const result = await downloader.download("https://truyenfull.vn/tien-nghich/", {
  concurrency: 4,
});
if (result.status === "success") {
  console.log(result.data.metadata.title);
  console.log(`${result.data.chapters.length} chapters`);
}

Resume & partial downloads
download() returns a result envelope rather than throwing for download
failures. It only throws on abort (CancelledError), or when a browser-tier
site (e.g. wikicv) can't start its browser — no patchright/playwright
installed (BrowserModuleNotInstalledError) or transport: "http-only"
(ParseError).
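The envelope side of this contract can be sketched as a pure function that maps a result to the next action. The `DownloadResult` union below is a hypothetical simplification (the real one is described in docs/data-model.md), and `planNextStep` and `NextStep` are illustrative names, not part of the package.

```typescript
// Hypothetical, simplified envelope; the real union is in docs/data-model.md.
type DownloadResult =
  | { status: "success"; data: { chapters: unknown[] } }
  | { status: "partial"; resumable: true; resumeToken: string }
  | { status: "partial"; resumable: false }
  | { status: "error"; error: { code: string; message: string } };

type NextStep =
  | { kind: "done" }
  | { kind: "retry"; token: string }
  | { kind: "give-up"; reason: string };

// Decides what a caller might do next, based only on the envelope.
function planNextStep(result: DownloadResult): NextStep {
  switch (result.status) {
    case "success":
      return { kind: "done" };
    case "partial":
      // The token only exists on the resumable arm, so narrow first.
      return result.resumable
        ? { kind: "retry", token: result.resumeToken }
        : { kind: "give-up", reason: "partial result without a resume token" };
    case "error":
      return {
        kind: "give-up",
        reason: `${result.error.code}: ${result.error.message}`,
      };
  }
}
```

The thrown cases (CancelledError and the browser-start failures) never reach a function like this, so a caller would still wrap `download()` in a try/catch for those.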
const result = await downloader.download(url, { resume: true });
// `partial` is a union on `resumable`; the resume token only exists on the
// resumable arm, so narrow on it before reading the token.
if (result.status === "partial" && result.resumable) {
  // Some chapters failed; retry just those with the issued token.
  const retry = await downloader.download(url, {
    resume: { token: result.resumeToken },
  });
}

Download only part of a novel (ranges are 0-based and inclusive, so chapter 1 is index 0):

await downloader.download(url, { chapterRange: { from: 0, to: 9 } }); // first 10 chapters

Check support without a try/catch:

downloader.canHandle("https://truyenfull.vn/x/"); // boolean
downloader.supportedSites(); // [{ id, displayName, hostnames }, ...]

Errors carry a stable code; the error channel is a discriminated union, so a
switch narrows to the concrete error and its typed fields:
const result = await downloader.download(url);
if (result.status === "error") {
  switch (result.error.code) {
    case "HTTP_ERROR":
      console.error(`HTTP ${result.error.status} for ${result.error.url}`);
      break;
    case "RATE_LIMITED":
      console.error(`rate limited; retry after ${result.error.retryAfterMs}ms`);
      break;
    default:
      console.error(result.error.message);
  }
}

For browser-tier sites (e.g. wikicv) in a long-lived process, build your own
instance and dispose it; the shared downloader singleton is fine for short
scripts:
const dl = createDownloader();
try {
  await dl.download(url);
} finally {
  await dl.dispose();
}

Documentation (Source of Truth)

Read in this order:
1. docs/architecture.md: big picture, layers, why
2. docs/data-model.md: core types and contracts
3. docs/pipeline.md: download flow, events, errors, resume
4. docs/adapter-spec.md: how to add a new site
5. docs/conventions.md: coding standards
6. docs/roadmap.md: build sequence and milestones
Publishing
Maintainers only. pnpm release runs prepublishOnly (builds dist/ via tsup) and
then pnpm publish --access public. Only dist/ ships (see files). Bump version
first and publish from a clean main (pnpm enforces git checks by default).
License
MIT — see LICENSE.
