@duyquangnvx/webnovel-downloader

v0.4.0

Published

4 days ago

A pluggable, type-safe webnovel downloader for Node.js.


duyquangnvx

webnovel-downloader

A pluggable, type-safe webnovel downloader for Node.js. Part of the webnovel-studio toolkit, but usable standalone.

Status: Pre-alpha. APIs are unstable.

What it does

Given a webnovel URL from a supported site, fetch the full novel as structured data: metadata + ordered chapters. Output is format-agnostic — pair with a separate formatter package (EPUB, TXT, JSON, Markdown) to produce final files.

Supported sites

Sites with Cloudflare/JS challenges are tracked under M5 — transport tier.

| Site | Adapter id | TOC strategy | Notes |
|---|---|---|---|
| truyenfull.today (+ truyenfull.vision, truyenfull.vn) | truyenfull | S1 paginated (/trang-N/) | Auto-rewrites legacy .vision/.vn hosts → truyenfull.today. |
| metruyenchu.com.vn | metruyenchu-com-vn | S2 + S4 hybrid (HTML page-1 + JSON /get/listchap/<bid>) | Distinct from the dead metruyenchu.com brand. |
| wikicv.net (+ truyenwikidich.net) | wikicv | S4 via browser (/book/index XHR intercept; HMAC-signed) | Auto-rewrites truyenwikidich.net host. Requires the browser tier (patchright recommended). |
| tangthuvien.net | tangthuvien | scaffold (M5.1) | Site was unreachable at 2026-05-03 survey; selectors and parsers TBD. |
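The host rewriting mentioned in the Notes column can be pictured with a small sketch. `LEGACY_HOSTS` and `normalizeNovelUrl` are hypothetical names used for illustration, not the package's API, and the assumption that truyenwikidich.net maps to wikicv.net is inferred from the table rather than stated outright.

```typescript
// Hypothetical sketch: map legacy hostnames to the canonical host from the table.
// The truyenwikidich.net -> wikicv.net pairing is an assumption.
const LEGACY_HOSTS: Record<string, string> = {
  "truyenfull.vision": "truyenfull.today",
  "truyenfull.vn": "truyenfull.today",
  "truyenwikidich.net": "wikicv.net",
};

function normalizeNovelUrl(input: string): string {
  const url = new URL(input);
  // Ignore a leading "www." so both spellings of a legacy host match.
  const host = url.hostname.replace(/^www\./, "");
  const canonical = LEGACY_HOSTS[host];
  if (canonical !== undefined) {
    url.hostname = canonical; // path, query and fragment are left untouched
  }
  return url.toString();
}

console.log(normalizeNovelUrl("https://truyenfull.vn/tien-nghich/"));
// "https://truyenfull.today/tien-nghich/"
```

URLs on hosts that are not in the map are returned unchanged, so the function is safe to call on every input.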

Transport tier

Sites protected by Cloudflare or rendering content via JS need a real browser. Install one of:

pnpm add patchright   # recommended for Cloudflare-fronted sites
pnpm add playwright   # works for non-protected pages

Both are peer-optional. The Downloader resolves the module at runtime: patchright first, then playwright. The shared downloader singleton uses "auto" transport (headless). For a custom transport — e.g. a headed browser to solve a Cloudflare challenge by hand — build your own instance with createDownloader (it wires the built-in adapters for you):

import { createDownloader } from "@duyquangnvx/webnovel-downloader";

const dl = createDownloader({
  transport: { mode: "auto", browserOptions: { headed: true } },
});
try {
  const result = await dl.download(url);
} finally {
  await dl.dispose(); // releases the browser pool
}
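The resolution order described above (patchright first, then playwright) amounts to a first-match lookup. A minimal sketch, assuming a hypothetical `isInstalled` predicate supplied by the caller; the package's real resolver is not shown in this README and may work differently.

```typescript
// Hypothetical sketch of "patchright first, then playwright".
const BROWSER_MODULES = ["patchright", "playwright"] as const;
type BrowserModuleName = (typeof BROWSER_MODULES)[number];

function pickBrowserModule(
  isInstalled: (name: string) => boolean,
): BrowserModuleName | undefined {
  // Array order encodes the preference; the first installed module wins.
  return BROWSER_MODULES.find((name) => isInstalled(name));
}

// Stubbed environment where only playwright is installed.
const picked = pickBrowserModule((name) => name === "playwright");
console.log(picked); // "playwright"
```

The sketch returns `undefined` when neither module is present; the README states that the real downloader reports that situation with BrowserModuleNotInstalledError.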

Modes:

"auto" — undici first, escalate to browser on CF challenge.
"http-only" — never use the browser; adapters with preferredTransport: "browser" will throw.
"browser-required" — every request through the browser.

Manual CF solve: pass browserOptions.headed: true to launch a visible browser window, solve the challenge checkbox once by hand, and the resulting cookies are cached for ~30 min.
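The three modes listed above can be read as a small decision function. This is a sketch under stated assumptions: the mode strings and `preferredTransport` come from this README, while `RequestContext`, `sawChallenge`, and `chooseTransport` are hypothetical names, and the behaviour of "auto" for browser-preferring adapters is an assumption.

```typescript
// Mode strings are from the README; everything else is illustrative.
type TransportMode = "auto" | "http-only" | "browser-required";
type Transport = "http" | "browser";

interface RequestContext {
  preferredTransport: Transport; // what the adapter asks for
  sawChallenge: boolean; // true once an HTTP attempt hit a Cloudflare challenge
}

function chooseTransport(mode: TransportMode, ctx: RequestContext): Transport {
  switch (mode) {
    case "browser-required":
      return "browser";
    case "http-only":
      if (ctx.preferredTransport === "browser") {
        throw new Error("adapter needs the browser tier, but mode is http-only");
      }
      return "http";
    case "auto":
      // Assumption: use HTTP unless the adapter prefers a browser
      // or a challenge has already been observed.
      return ctx.preferredTransport === "browser" || ctx.sawChallenge
        ? "browser"
        : "http";
  }
}

console.log(chooseTransport("auto", { preferredTransport: "http", sawChallenge: true }));
// "browser"
```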

Run a manual end-to-end check across the active adapters with:

pnpm smoke:live                  # default — first 20 chapters per adapter (~25s)
pnpm smoke:live --quick          # first 5 chapters per adapter (~10s)
pnpm smoke:live --metadata-only  # fetchMetadata only, no TOC walk (~3s)

These smoke modes use DownloadOptions.chapterRange so the TOC walk short-circuits early, which keeps the run time independent of novel size.

Adding a new site means implementing a single SiteAdapter. See docs/adapter-spec.md.

Install

pnpm add @duyquangnvx/webnovel-downloader
# optional browser tier (Cloudflare / JS-rendered sites; see "Transport tier" above):
pnpm add patchright

Published to npm under public access (@duyquangnvx/webnovel-downloader).

Ships both ESM and CommonJS builds, so import and require both work.

Quick example

import { createDownloader } from "@duyquangnvx/webnovel-downloader";

const downloader = createDownloader({ rateLimit: { requestsPerSecond: 2 } });
const result = await downloader.download("https://truyenfull.vn/tien-nghich/", {
  concurrency: 4,
});

if (result.status === "success") {
  console.log(result.data.metadata.title);
  console.log(`${result.data.chapters.length} chapters`);
}

Resume & partial downloads

download() returns a result envelope rather than throwing for download failures. It throws in only two situations: on abort (CancelledError), or when a browser-tier site (e.g. wikicv) cannot start its browser, either because neither patchright nor playwright is installed (BrowserModuleNotInstalledError) or because the transport mode is "http-only" (ParseError).

const result = await downloader.download(url, { resume: true });

// `partial` is a union on `resumable`; the resume token only exists on the
// resumable arm, so narrow on it before reading the token.
if (result.status === "partial" && result.resumable) {
  // Some chapters failed; retry just those with the issued token.
  const retry = await downloader.download(url, {
    resume: { token: result.resumeToken },
  });
}
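The single retry above generalises to a bounded loop. The sketch below is self-contained and does not import the package: `SketchResult`, `SketchDownload`, and `downloadWithRetries` are hypothetical names, and the result shape is reduced to the fields used in the snippet above.

```typescript
// Hypothetical result shape, limited to the fields the README snippet reads.
type SketchResult =
  | { status: "success" }
  | { status: "partial"; resumable: true; resumeToken: string }
  | { status: "partial"; resumable: false }
  | { status: "error" };

type SketchDownload = (
  url: string,
  options?: { resume?: { token: string } },
) => Promise<SketchResult>;

async function downloadWithRetries(
  download: SketchDownload,
  url: string,
  maxAttempts = 3,
): Promise<SketchResult> {
  let result = await download(url);
  for (let attempt = 1; attempt < maxAttempts; attempt++) {
    // Stop unless the last attempt was partial AND issued a resume token.
    if (result.status !== "partial" || !result.resumable) break;
    result = await download(url, { resume: { token: result.resumeToken } });
  }
  return result;
}
```

With the real package, the `download` argument could be a thin wrapper such as `(url, options) => downloader.download(url, options)`, provided the actual option and result types line up with this reduced shape.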

Download only part of a novel (chapter 1 is index 0 — ranges are 0-based and inclusive):

await downloader.download(url, { chapterRange: { from: 0, to: 9 } }); // first 10 chapters
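Because ranges are 0-based and inclusive, off-by-one mistakes are easy to make when starting from human chapter numbers. A minimal sketch of a conversion helper; `toChapterRange` and the `ChapterRange` interface are hypothetical and only mirror the `{ from, to }` shape shown above.

```typescript
// Hypothetical helper: convert 1-based, inclusive chapter numbers
// into the 0-based, inclusive { from, to } shape.
interface ChapterRange {
  from: number;
  to: number;
}

function toChapterRange(firstChapter: number, lastChapter: number): ChapterRange {
  if (!Number.isInteger(firstChapter) || !Number.isInteger(lastChapter)) {
    throw new RangeError("chapter numbers must be integers");
  }
  if (firstChapter < 1 || lastChapter < firstChapter) {
    throw new RangeError("expected 1 <= firstChapter <= lastChapter");
  }
  return { from: firstChapter - 1, to: lastChapter - 1 };
}

console.log(toChapterRange(1, 10)); // { from: 0, to: 9 }
console.log(toChapterRange(11, 20)); // { from: 10, to: 19 }
```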

Check support without a try/catch:

downloader.canHandle("https://truyenfull.vn/x/"); // boolean
downloader.supportedSites(); // [{ id, displayName, hostnames }, ...]
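The `{ id, displayName, hostnames }` shape is enough to find out which adapter would handle a given URL. A sketch under assumptions: `SiteInfo` and `findSite` are hypothetical names, the sample entry is illustrative data, and matching on exact hostname is a guess at how `canHandle` decides.

```typescript
// Hypothetical shape mirroring the supportedSites() comment above.
interface SiteInfo {
  id: string;
  displayName: string;
  hostnames: string[];
}

function findSite(sites: SiteInfo[], input: string): SiteInfo | undefined {
  let hostname: string;
  try {
    hostname = new URL(input).hostname;
  } catch {
    return undefined; // not a parseable URL, so no site can handle it
  }
  return sites.find((site) => site.hostnames.includes(hostname));
}

// Illustrative data only; the real list comes from downloader.supportedSites().
const sampleSites: SiteInfo[] = [
  {
    id: "truyenfull",
    displayName: "TruyenFull",
    hostnames: ["truyenfull.today", "truyenfull.vision", "truyenfull.vn"],
  },
];

console.log(findSite(sampleSites, "https://truyenfull.vn/x/")?.id); // "truyenfull"
```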

Errors carry a stable code; the error channel is a discriminated union, so a switch narrows to the concrete error and its typed fields:

const result = await downloader.download(url);
if (result.status === "error") {
  switch (result.error.code) {
    case "HTTP_ERROR":
      console.error(`HTTP ${result.error.status} for ${result.error.url}`);
      break;
    case "RATE_LIMITED":
      console.error(`rate limited; retry after ${result.error.retryAfterMs}ms`);
      break;
    default:
      console.error(result.error.message);
  }
}
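When every code should be handled explicitly, the `default` branch can double as a compile-time exhaustiveness check. The sketch below models only the two error codes and fields that appear in the snippet above; `SketchError` and `describeError` are hypothetical names, and the real union presumably has more members.

```typescript
// Hypothetical error union limited to the fields used in the README snippet.
type SketchError =
  | { code: "HTTP_ERROR"; message: string; status: number; url: string }
  | { code: "RATE_LIMITED"; message: string; retryAfterMs: number };

function describeError(error: SketchError): string {
  switch (error.code) {
    case "HTTP_ERROR":
      return `HTTP ${error.status} for ${error.url}`;
    case "RATE_LIMITED":
      return `rate limited; retry after ${error.retryAfterMs}ms`;
    default: {
      // If a new member is added to SketchError without a matching case,
      // this assignment stops compiling.
      const unreachable: never = error;
      return unreachable;
    }
  }
}

console.log(
  describeError({
    code: "HTTP_ERROR",
    message: "Not Found",
    status: 404,
    url: "https://example.test/x",
  }),
); // "HTTP 404 for https://example.test/x"
```

The trade-off is that a generic `default` fallback, as in the snippet above, keeps compiling when the library adds codes, whereas the `never` check turns such additions into build errors.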

For browser-tier sites (e.g. wikicv) in a long-lived process, build your own instance and dispose it; the shared downloader singleton is fine for short scripts:

const dl = createDownloader();
try {
  await dl.download(url);
} finally {
  await dl.dispose();
}

Documentation (Source of Truth)

Read in this order:

docs/architecture.md — Big picture, layers, why
docs/data-model.md — Core types and contracts
docs/pipeline.md — Download flow, events, errors, resume
docs/adapter-spec.md — How to add a new site
docs/conventions.md — Coding standards
docs/roadmap.md — Build sequence and milestones

Publishing

Maintainers only. pnpm release runs prepublishOnly (builds dist/ via tsup) and then pnpm publish --access public. Only dist/ ships (see files). Bump version first and publish from a clean main (pnpm enforces git checks by default).

License

MIT — see LICENSE.
