cr-webpage-parser
v1.0.3
Published
Tiny server-side HTML preprocessor for partials, tokens, and conditionals.
Maintainers
Readme
cr-webpage-parser ✂️🧩
A tiny, server-side HTML preprocessor for Node that gives you partials, token replacements, simple conditionals, and routing + link rewriting using a custom tag (default: <crweb>). Keep your HTML DRY, human-friendly, and framework-agnostic—without adopting a full templating engine.
- 🧱 Partials / includes —
<crweb content="Header" params="title:Home" /> - 🔤 Token replacement —
<crweb pnames="TITLE:title"><h1>TITLE</h1></crweb> - 🚦 Conditionals —
iftrue/iffalsewith!negationand optional<then>/<else> - 🧭 Routing & links — declare page routes, auto-route pages, and rewrite links
- 🎯 Per-request data — pass user/session/theme so each user can see a different page
- 🧽 Formatting — optional pretty-print (Prettier) or minify (html-minifier-terser)
- ⚙️ Cheerio-powered — deterministic DOM passes until the document stabilizes
Note: For real HTML pages use
parseMode: "html"to avoid self-closing<script/>and<style/>serialization bugs.
📦 Install
npm i cr-webpage-parser
# Optional (for output formatting)
npm i -D prettier
npm i html-minifier-terserPrettier and html-minifier-terser are optional peer utilities. If present, the engine can pretty-print or minify the final HTML. If not installed, output formatting is a no-op.
⚡ Quick Start
Minimal
const engine = require("cr-webpage-parser")("views", { format: "pretty", parseMode: "html" });
(async () => {
const html = await engine.generate("views/pages/index.html", { title: "Hello World" });
console.log(html);
})();Basic page views/pages/index.html
<!doctype html>
<html>
<head>
<crweb alias="Home" route="/" />
</head>
<body>
<!-- Gets partial from the page with an alias of 'Header' -->
<crweb content="Header" params="title:Welcome!" />
<main>
<!-- Replaces any TITLE keyword with 'title' from engine.generate(...) -->
<crweb pnames="TITLE:title|upper"><h1>TITLE</h1></crweb>
</main>
</body>
</html>Header partial views/partials/header.html
<crweb alias="Header" />
<header class="site-header">
<crweb pnames="TITLE:title|trim"><h2>TITLE</h2></crweb>
</header>Results
<html>
<head></head>
<body>
<header class="site-header">
<h2>Welcome!</h2>
</header>
<main>
<h1>HELLO WORLD</h1>
</main>
</body>
</html>🧭 Routing & Link Rewriting
Point links at aliases (human names), not file paths. The engine rewrites them to real hrefs.
1) Decide how pages get routes
Pick one (or mix them - overrides always win):
- Explicit (in the page file):
<head>
<crweb alias="About" route="/about" />
</head>- Auto (by folder) — recommended for simple sites:
const engine = createEngine("views", {
routesMode: "pages", // auto-route files under /pages
urlStrategy: "pretty", // "/" and "/about" instead of ".html"
});Auto mappings:
views/pages/index.html→/views/pages/about.html→/about
- Override in code (wins over everything):
const engine = createEngine("views", {
routeOverrides: { About: "/about-us" }
});ℹ️ "Pages" vs "Partials": a file becomes a page when it has an alias and either a route="" or matches auto-routing rules. Partials typically have an alias but no route.
2) Write links by alias (the engine rewrites them)
Use any of these in your HTML:
<a crhref="About">About</a> <!-- → <a href="/about">About</a> -->
<crweb link="About">About</crweb> <!-- → <a href="/about">About</a> -->
<crweb url="About" /> <!-- → /about (plain text URL) -->3) Dynamic routes (templates)
Routes can contain placeholders like /posts/${slug}:
<!-- Link with params -->
<a crhref="Post" params="slug:hello-world">Read</a>
<!-- → <a href="/posts/hello-world">Read</a> -->You can also generate the page programmatically:
await engine.generate("Post", { slug: "hello-world" });4) URL strategies
How fallbacks are computed (used by auto-routes and when no explicit route is set):
| urlStrategy | Example file | Result URL |
| ------------- | ------------------------ | ------------------- |
| "file" | views/pages/about.html | /pages/about.html |
| "pretty" | views/pages/about.html | /about |
| "pretty" | views/pages/index.html | / |
🧱 Partials / Includes
Include another HTML fragment by alias:
<crweb content="Header" params="title:Welcome,subtitle:true" />- The included file must declare an alias:
<crweb alias="Header" />. - Per-include params merge into the current data (include wins on conflicts).
- Includes can be nested; circular references are detected and reported clearly.
🔤 Token Replacement (pnames)
Inside any <crweb> wrapper, you can replace token words in text/attributes using a data map:
<crweb pnames="TITLE:title|upper,NAME:user.name|trim">
<h1>TITLE</h1>
<p>Hello, NAME!</p>
</crweb>- Left of
:is the TOKEN (what to replace in the markup). - Right of
:is the data key (dot paths allowed, e.g.,user.name). - Optional
|filters:upper,lower,trim,slug. - Missing variables: emits a visible inline error block in text nodes, and
[MISSING:TOKEN]in attributes.
Tokens match on word boundaries by default (so
TITLEwon’t replace thetitleattribute). You can change this withwordBoundary: falseif needed.
🚦 Conditionals
Show or hide blocks based on the data map. Supports !negation and explicit <then>/<else>.
<crweb iftrue="isLoggedIn">
<then>Welcome back!</then>
<else><a href="/login">Login</a></else>
</crweb>
<crweb iffalse="!user.email">
<p>Please verify your email.</p>
</crweb>iftrue="path"— unwraps the block if the value is truthy.iffalse="path"— unwraps if the value is falsy (or missing).- Prefix with
!ornotto negate. - Missing values can be treated as falsy/true/error via engine option
onMissing.
Truthiness rules:
true/falserespected; numbers:0is false; arrays: empty is false.- Strings:
"false","0","no","off","null","undefined","nan"→ false;"true","1","yes","on"→ true; others → true.
🧠 Personalization (per-request data)
Pass per-request parameters to engine.generate()—e.g., logged-in user, theme, feature flags—so each user sees a tailored page.
app.get("/", async (req, res) => {
const data = {
isLoggedIn: !!req.user,
user: req.user || {},
theme: req.user?.theme || "light",
flags: { showWelcome: !req.session.seenWelcome },
};
const html = await engine.generate("Home", data);
res.type("html").set("Cache-Control", "private, no-store").send("<!doctype html>\n" + html);
});Use in HTML with pnames, iftrue/iffalse, and route params.
🧩 Express integration example
const path = require("path");
const express = require("express");
const session = require("express-session");
const createEngine = require("cr-webpage-parser");
const app = express();
const VIEWS = path.join(__dirname, "views");
const engine = createEngine(VIEWS, {
parseMode: "html", // ✅ important for real HTML pages
urlStrategy: "pretty",
format: process.env.NODE_ENV === "production" ? "minify" : "pretty",
routesMode: "pages", // or "explicit" + route="" in each page
});
app.use(session({ secret: "dev-secret", resave: false, saveUninitialized: false }));
app.use("/static", express.static(path.join(__dirname, "public"), { maxAge: "1h" }));
const buildParams = req => ({ isLoggedIn: !!req.session.user, user: req.session.user || {}, theme: req.session.theme || "light" });
(async function registerRoutes() {
await engine.verify();
for (const [alias, route] of Object.entries(engine.routes)) {
const expr = route.replace(/\$\{([a-zA-Z0-9_]+)\}/g, ":$1");
app.get(expr, async (req, res, next) => {
try {
const params = { ...buildParams(req), ...req.params };
const html = await engine.generate(alias, params);
res.type("html").set("Cache-Control", "private, no-store").send("<!doctype html>\n" + html);
} catch (e) { next(e); }
});
}
app.listen(3000, () => console.log("✅ http://localhost:3000"));
})();⚙️ API
createEngine(folder, options) → engine
folder— root folder containing your HTML files.
Options:
tagName(string, default"crweb") — change the custom tag name.parseMode("html" | "xml", default"xml") — prefer"html"for real pages.urlStrategy("file" | "pretty") — how to compute fallback URLs.routesMode("explicit" | "pages" | "all", default"explicit")- explicit: only aliases with
route=""(or inrouteOverrides) become routes - pages: files under
pagesDirPatternauto-route - all: every alias gets a computed route
- explicit: only aliases with
pagesDirPattern(RegExp, default/[\\/]pages[\\/]/i) — which files are “pages”.routeOverrides(object) — force alias → route mapping.baseUrl(string) — prepend a site origin to generated links.onMissing("false" | "true" | "error", default"false") — conditional behavior for missing vars.wordBoundary(bool, defaulttrue) — token matching behavior forpnames.useCache(bool, defaulttrue) — FS read cache for includes.format("auto" | "pretty" | "minify" | "none", default"auto") — output formatting.prettyOptions(object) — passed to Prettier if installed.minifyOptions(object) — passed to html-minifier-terser if installed.
Why
parseModematters:xmlmode treats unknown tags as XML, which can serialize<script>/<style>as self-closing. For pages, setparseMode: "html". The library auto-normalizes your custom self-closing tags (e.g.,<crweb .../>) so they survive HTML parsing.
await engine.verify()
Scans folder for aliases and routes. Called automatically by generate, but call it explicitly if you need engine.routes up front.
await engine.generate(fileOrAlias, data = {}, opts = {}) → html
Generate final HTML from a file path or alias.
data: map used bypnames, conditionals, and route interpolation.opts: per-call overrides of engine options (e.g.,format,iterations).
engine.href(alias, data = {}) → string | null
Return the final URL for a routable alias (or null if not a page).
🧰 Authoring tips
- Put
<crweb alias="..." route="...">inside<head>(not before<html>). - Use
<crweb content="...">to include partials by alias. - Wrap dynamic text in a
<crweb pnames="..."> ... </crweb>to replace tokens. - Use
iftrue/iffalseand optional<then>/<else>blocks to conditionally render chunks. - Use
<a crhref="Alias">instead of hardcoding/some/path. - Prefer
parseMode: "html"for pages; reserve"xml"for fragment-like inputs.
🧪 Troubleshooting
Body disappears; everything after
<script>is “inside” the script
You’re serializing in XML mode. SetparseMode: "html"when creating the engine.Link to a partial shows an error block
The alias has no route (it’s a partial). Either addroute="", enableroutesMode: "pages", or remove the link.My
<crweb .../>gets dropped in HTML mode
The library normalizes self-closing custom tags before parsing, so<crweb .../>becomes a safe<crweb ...></crweb>pair internally.Same alias in multiple files
You’ll get a warning; the first one found wins.
📜 License
MIT © CloseRange
💡 Why this exists
Sometimes you want “just enough templating” for static-ish sites or server-rendered HTML without buying into a bigger framework. cr-webpage-parser keeps your markup as HTML and adds a few pragmatic building blocks for composition and personalization. 🎉
