@bartolli/vaglio

v0.1.0

Sanitize text crossing the LLM trust boundary: Unicode prompt-injection defense, NFKC normalization, credential redaction, reasoning-tag stripping. Stream-aware, zero deps, ESM-first.

vaglio

Sanitize text crossing the LLM trust boundary.

Text-domain filter between extracted content, tool output, RAG chunks, peer-model output, and the model. Zero runtime dependencies, ESM-only, stream-aware, deterministic.

Pre-1.0. The v0.1 surface is spec-locked. Breaking changes possible at minor-version boundaries until 1.0.

Install

pnpm add @bartolli/vaglio

Requires Node ≥ 22 LTS. ESM-only.

Quick start

import { sanitize } from '@bartolli/vaglio';

const safe = sanitize(untrustedText);
await llm.send(safe);

Pipeline order: NFKC → strip Unicode invisibles → strip reasoning-tag blocks → redact credentials. All four stages run against DEFAULT_POLICY.
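The ordering matters because later stages match against text that earlier stages have already normalized. The sketch below illustrates this with plain string operations. It is not vaglio's implementation: the four helper functions, the short zero-width list, and the AWS-like key shape are simplified assumptions made up for this example.

```typescript
// Illustrative stand-ins for the four stages. NOT vaglio's implementation.
const toNfkc = (s: string): string => s.normalize('NFKC');

// Only four zero-width code points (ZWSP, ZWNJ, word joiner, BOM), for brevity.
const stripZeroWidth = (s: string): string =>
  s.replace(/[\u200B\u200C\u2060\uFEFF]/g, '');

// Drops <internal> blocks; non-greedy so separate blocks stay separate.
const stripInternalBlocks = (s: string): string =>
  s.replace(/<internal>[\s\S]*?<\/internal>/g, '');

// Hypothetical credential shape: "AKIA" followed by 16 uppercase alphanumerics.
const redactAwsLikeKeys = (s: string): string =>
  s.replace(/AKIA[0-9A-Z]{16}/g, '<credential>');

function pipelineSketch(s: string): string {
  return redactAwsLikeKeys(stripInternalBlocks(stripZeroWidth(toNfkc(s))));
}

// Fullwidth angle brackets plus a zero-width space hide the tag from a naive matcher.
const disguised = 'hi \uFF1Cinter\u200Bnal\uFF1Eplan\uFF1C/internal\uFF1E there';

const tagStageAlone = stripInternalBlocks(disguised); // unchanged: nothing matches yet
const fullPipeline = pipelineSketch(disguised);       // the disguised block is removed
```

Running the tag stage on its own leaves the disguised block in place, while the full ordering removes it. That is the reason normalization and invisible-character stripping come first.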

Threat coverage

Deterministic. No ML, no entropy heuristics.

  • Unicode invisibles. Tags block (U+E0001–U+E007F), zero-width (ZWSP, ZWNJ, BOM, word joiner), bidi overrides (U+202A–U+202E, U+2066–U+2069), Mongolian Free Variation Selectors, interlinear annotations, supplementary PUA, supplementary variation selectors, soft hyphen / CGJ / Hangul fillers, invisible math operators, orphaned UTF-16 surrogates.
  • Homoglyph forging via NFKC. Mathematical-alphanumeric (𝐬𝐲𝐬𝐭𝐞𝐦 → system) and fullwidth (＜ｓｙｓｔｅｍ＞ → <system>) collapse to ASCII. Cross-script (Greek↔Latin, Cyrillic↔Latin) preserved by spec.
  • Zalgo. Combining-mark cap per base character (default 4).
  • ANSI escapes. ESC[…] sequences — invisible in terminals, tokenized verbatim by LLMs.
  • C0/C1 controls. U+0000–U+001F and U+007F–U+009F stripped except \t, \n, \r. NBSP (U+00A0) preserved (printable, not control).
  • Reasoning-tag leakage. <internal>…</internal> blocks; tag-name set is configurable.
  • Credentials. Anthropic, AWS (AKIA/ASIA), Bearer JWT, Slack, GitHub PAT, Stripe restricted, PEM private key (RSA / EC / Ed25519 / generic), long hex (≥ 64 chars). Default placeholder <credential>.
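Two of the items above can be reproduced with standard JavaScript alone. The sketch below shows NFKC folding via String.prototype.normalize and a simple combining-mark cap. The capCombiningMarks helper is hypothetical and only covers the Combining Diacritical Marks block (U+0300–U+036F), a much narrower range than a real filter would need.

```typescript
// Mathematical bold "system" (U+1D42C, U+1D432, ...), written as escapes.
const mathBold = '\u{1D42C}\u{1D432}\u{1D42C}\u{1D42D}\u{1D41E}\u{1D426}';
// Fullwidth "<system>" (U+FF1C, U+FF53, ..., U+FF1E).
const fullwidth = '\uFF1C\uFF53\uFF59\uFF53\uFF54\uFF45\uFF4D\uFF1E';
// Cyrillic small a (U+0430) looks like Latin "a" but is a different letter.
const cyrillicA = '\u0430';

const foldedMath = mathBold.normalize('NFKC');       // "system"
const foldedFullwidth = fullwidth.normalize('NFKC'); // "<system>"
const foldedCyrillic = cyrillicA.normalize('NFKC');  // still U+0430, not "a"

// Hypothetical helper: keep at most `cap` marks in each run of combining marks.
// Simplification: only U+0300..U+036F is treated as a combining mark.
function capCombiningMarks(s: string, cap = 4): string {
  return s.replace(/[\u0300-\u036F]+/g, (run) => run.slice(0, cap));
}

const zalgo = 'a' + '\u0301'.repeat(10) + 'b';
const capped = capCombiningMarks(zalgo); // "a" + 4 acute accents + "b"
```

The Cyrillic case shows why cross-script lookalikes need separate handling: NFKC treats them as distinct letters and leaves them untouched.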

Where it plugs in

URL  → fetch → HTML→markdown extractor          → markdown ┐
File → document extractor (PDF, DOCX, …)        → text     │
HTML → HTML→text converter                      → text     │
RAG  → retrieve → chunk                         → text     ├→ sanitize() → LLM
Tool → API call → serialize                     → string   │
LLM  → output (router / planner / loop step)    → text     ┘

The last row is the inter-model trust boundary. The receiving model treats peer-model output as untrusted — same threat surface as user input.

Recipes

Sanitize a string

import { sanitize } from '@bartolli/vaglio';
const safe = sanitize(text);

Telemetry via callback

The non-Detailed variants build a findings array only when onFinding is provided, so the default silent path pays no telemetry cost.

import { sanitize } from '@bartolli/vaglio';

sanitize(text, {
  onFinding: (f) => metrics.emit(f.kind, f.ruleId, f.severity),
});

Detail variant

import { sanitizeDetailed } from '@bartolli/vaglio';

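// The returns need an enclosing function; sanitizeForAudit is an illustrative name.
function sanitizeForAudit(input: string): string {
  const result = sanitizeDetailed(input);
  if (result.text === input) return input; // identity-check fast path (contract)
  audit.write(result.findings);
  return result.text;
}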

Web Streams

Cross-chunk credentials redact via an internal sliding-window buffer (Policy.bufferLimit, auto-derived from the longest active pattern + 64).

import { createSanitizeStream } from '@bartolli/vaglio';

await response.body!
  .pipeThrough(new TextDecoderStream())
  .pipeThrough(createSanitizeStream({ onFinding: emitMetric }))
  .pipeTo(modelInputSink);

Async iterable

import { sanitizeIterable } from '@bartolli/vaglio';

async function* turns() {
  for await (const turn of agentLoop) yield turn.content;
}

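// The yield needs an enclosing async generator; safeTurns is an illustrative name.
async function* safeTurns() {
  for await (const safe of sanitizeIterable(turns(), { onFinding: emitMetric })) {
    yield safe;
  }
}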

Custom credential pattern

import { policy, sanitize } from '@bartolli/vaglio';

const myPolicy = policy()
  .addCredentialPattern(/sot-session-[a-z0-9]{32}/i, {
    ruleId: 'sot-session',
    placeholder: '<session>',
    severity: 'high',
  })
  .build();

const safe = sanitize(text, { policy: myPolicy });

The builder is immutable: each method returns a new instance; build() returns a frozen Policy. Reuse a base to derive variants:

const base     = policy().addReasoningTag('plan');
const stricter = base.disableUnicodeCategory('soft-hyphen-fillers').build();
const lenient  = base.build();

Per-hop policy in a model-to-model chain

import { policy, sanitize } from '@bartolli/vaglio';

const planner = policy().addReasoningTag('scratchpad').addReasoningTag('plan').build();
const router  = policy().build();

const safeRouterOut = sanitize(routerOutput, { policy: router });
await mainModel.generate({ context: safeRouterOut });

const safePlannerOut = sanitize(plannerOutput, { policy: planner });
await worker.run(safePlannerOut);

Low-latency streaming

The default bufferLimit is 4160 characters — driven by the PEM private-key pattern (maxMatchLength: 4096 + 64). At ~4 chars/token, downstream sees ~1k tokens of holdback. Drop the PEM pattern for token-stream agents that don't ingest PEM blocks:

import { policy, createSanitizeStream } from '@bartolli/vaglio';

const lowLatency = policy().removeCredentialPattern('pem-private-key').build();
// bufferLimit = 320 (~80 tokens)

const stream = createSanitizeStream({ policy: lowLatency });
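The numbers above follow from simple arithmetic. The sketch below reproduces them under stated assumptions: deriveBufferLimit and holdbackTokens are hypothetical helpers, not vaglio exports, and the 256-character figure for the longest remaining pattern is inferred from 320 − 64 rather than stated in the README.

```typescript
// The "+ 64" margin quoted for the auto-derived bufferLimit.
const HOLDBACK_MARGIN = 64;

// Hypothetical helper: longest active pattern plus the margin.
function deriveBufferLimit(maxMatchLengths: number[]): number {
  if (maxMatchLengths.length === 0) return 0; // no patterns, nothing to hold back
  return Math.max(...maxMatchLengths) + HOLDBACK_MARGIN;
}

// Rough token estimate at the ~4 chars/token rule of thumb used above.
function holdbackTokens(bufferLimit: number, charsPerToken = 4): number {
  return Math.round(bufferLimit / charsPerToken);
}

// With the PEM pattern (4096) active; 256 is an inferred value for the next-longest pattern.
const withPem = deriveBufferLimit([4096, 256]); // 4160
// After removing the PEM pattern.
const withoutPem = deriveBufferLimit([256]);    // 320

const tokensWithPem = holdbackTokens(withPem);       // 1040, i.e. roughly 1k
const tokensWithoutPem = holdbackTokens(withoutPem); // 80
```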

Tool result and RAG

import { sanitize } from '@bartolli/vaglio';

const apiResponse    = await externalTool(args);
const safeToolResult = sanitize(JSON.stringify(apiResponse));

const chunks  = await retriever.search(query);
const context = chunks.map((c) => sanitize(c.text)).join('\n\n');

Findings

type Finding =
  | { kind: 'unicode-strip';     ruleId: string; ruleVersion: number; action: PolicyAction; offset: number; length: number; charClass: string; count: number; severity: Severity; }
  | { kind: 'credential';        ruleId: string; ruleVersion: number; action: PolicyAction; offset: number; length: number; placeholder: string; severity: Severity; }
  | { kind: 'stream-diagnostic'; ruleId: string; ruleVersion: number; severity: Severity; message: string; };

type PolicyAction = 'stripped' | 'redacted' | 'replaced';
type Severity     = 'low' | 'medium' | 'high' | 'critical';

In v0.1, unicode-strip and reasoning-tag findings emit action: 'stripped'; credential findings emit action: 'redacted'. 'replaced' is reserved.

Findings carry no raw snippets — emitting partial credentials or PII into telemetry is a secondary leakage anti-pattern.
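Because kind is a discriminant, a handler can narrow each variant with a switch. The sketch below redeclares the types from this section so it stands alone; whether the package exports them under these names is not stated here. toLogLine is a hypothetical formatter that only touches the metadata fields, so it logs no raw text.

```typescript
// Types redeclared from this section so the sketch is self-contained.
type PolicyAction = 'stripped' | 'redacted' | 'replaced';
type Severity = 'low' | 'medium' | 'high' | 'critical';

type Finding =
  | { kind: 'unicode-strip'; ruleId: string; ruleVersion: number; action: PolicyAction;
      offset: number; length: number; charClass: string; count: number; severity: Severity }
  | { kind: 'credential'; ruleId: string; ruleVersion: number; action: PolicyAction;
      offset: number; length: number; placeholder: string; severity: Severity }
  | { kind: 'stream-diagnostic'; ruleId: string; ruleVersion: number;
      severity: Severity; message: string };

// Hypothetical formatter: one log line per finding, metadata only.
function toLogLine(f: Finding): string {
  const rule = `${f.ruleId}@v${f.ruleVersion}`;
  switch (f.kind) {
    case 'unicode-strip':
      return `${rule} ${f.action} ${f.count}x ${f.charClass} at ${f.offset}+${f.length}`;
    case 'credential':
      return `${rule} ${f.action} at ${f.offset}+${f.length} as ${f.placeholder}`;
    case 'stream-diagnostic':
      return `${rule} ${f.severity}: ${f.message}`;
  }
}
```

A function like this could be passed as the body of an onFinding callback, for example `onFinding: (f) => logger.info(toLogLine(f))`, where logger is whatever logging facility the application already uses.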

ruleVersion increments per CHANGELOG when a builtin's pattern, severity, or charClass changes. Forensic logs survive pattern updates.

Contiguous identical infractions (a Zalgo run, repeated Tags-block codepoints) aggregate into a single finding via count and length.
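One way to picture that aggregation is a single pass that merges a hit into the previous one when the rule matches and the spans touch. This is a sketch of the idea only: StripHit, AggregatedHit, and aggregateContiguous are hypothetical names, and offsets are assumed to be UTF-16 code-unit indices, which the README does not specify.

```typescript
// Hypothetical per-occurrence record before aggregation.
interface StripHit {
  ruleId: string;
  offset: number; // assumed: UTF-16 code-unit index
  length: number;
}

interface AggregatedHit extends StripHit {
  count: number;
}

// Merge hits that share a ruleId and are directly adjacent in the input.
function aggregateContiguous(hits: StripHit[]): AggregatedHit[] {
  const out: AggregatedHit[] = [];
  for (const hit of hits) {
    const last = out[out.length - 1];
    if (last && last.ruleId === hit.ruleId && last.offset + last.length === hit.offset) {
      last.length += hit.length; // extend the span
      last.count += 1;           // one more occurrence
    } else {
      out.push({ ...hit, count: 1 });
    }
  }
  return out;
}

// Three adjacent Tags-block code points (2 code units each), then a separate hit.
const merged = aggregateContiguous([
  { ruleId: 'tags-block', offset: 5, length: 2 },
  { ruleId: 'tags-block', offset: 7, length: 2 },
  { ruleId: 'tags-block', offset: 9, length: 2 },
  { ruleId: 'tags-block', offset: 20, length: 2 },
]);
```

Here merged holds two entries: one spanning offsets 5 to 10 with count 3, and one at offset 20 with count 1.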

Default policy

DEFAULT_POLICY:

| Slot                 | Default                                 |
| -------------------- | --------------------------------------- |
| Unicode categories   | all 13 enabled (see below)              |
| Combining-mark cap   | 4 per base character                    |
| NFKC                 | enabled                                 |
| Reasoning tags       | internal                                |
| Credential patterns  | 8 builtins                              |
| Placeholder          | <credential>                            |
| bufferLimit          | 4160 (PEM-driven)                       |

Categories: tags-block, zero-width, bidi-override, mongolian-fvs, interlinear-annotations, object-replacement, supplementary-pua, supplementary-variation-selectors, soft-hyphen-fillers, math-invisibles, orphaned-surrogates, ansi-escapes, c0-c1-controls.

Every slot is tunable via the builder.

Streaming contract

  • Identical semantics across batch (sanitize, redact), Web Streams (createSanitizeStream, createRedactStream), and async iterables (sanitizeIterable, redactIterable).
  • The sliding-window buffer holds back the trailing bufferLimit characters on each push, so a credential that straddles a chunk boundary is still matched whole once the next chunk arrives.
  • Overflow: when the held tail can't shrink without losing a match, oldest bytes commit and a stream-diagnostic finding fires (ruleId: 'buffer-overflow-warning'). Redaction trumps fidelity.
  • Cancel: TransformStream.cancel(reason) or async-iter return() releases the buffer and discards partial-credential state. Subsequent operations throw VaglioStreamCanceledError. A final stream-diagnostic finding (ruleId: 'stream-canceled') emits via onFinding if subscribed.
  • flush() is idempotent. Double-flush is a no-op. push() after flush() throws.
  • Errors are fail-fast: catastrophic regex backtracking or corrupted state tears down the stream. Consumers must not retry.
  • Identity preservation does not extend to streaming — use the batch surface for reference-equality fast paths.
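The holdback idea in the contract above can be sketched with a small class that keeps the trailing characters of each push unemitted until more input arrives. This is an illustration under simplifying assumptions, not vaglio's stream implementation: HoldbackRedactor and redactInChunks are hypothetical names, the pattern is assumed to have a fixed match length no larger than the holdback, and overflow and cancel handling are left out.

```typescript
// Hypothetical sliding-window redactor for a single fixed-length pattern.
class HoldbackRedactor {
  private tail = '';
  private readonly pattern: RegExp;
  private readonly holdback: number;
  private readonly placeholder: string;

  constructor(pattern: RegExp, holdback: number, placeholder = '<credential>') {
    if (!pattern.global) throw new Error('pattern needs the g flag to replace every match');
    this.pattern = pattern;
    this.holdback = holdback;
    this.placeholder = placeholder;
  }

  // Redact what is visible so far, emit everything except the trailing holdback.
  push(chunk: string): string {
    const text = (this.tail + chunk).replace(this.pattern, this.placeholder);
    const cut = Math.max(0, text.length - this.holdback);
    this.tail = text.slice(cut);
    return text.slice(0, cut);
  }

  // Emit whatever is still held back; a second call returns an empty string.
  flush(): string {
    const out = this.tail.replace(this.pattern, this.placeholder);
    this.tail = '';
    return out;
  }
}

// Feed `text` through the redactor in chunks of `size` characters.
function redactInChunks(text: string, size: number): string {
  // 44 is the match length of the example pattern; 64 mirrors the margin quoted earlier.
  const redactor = new HoldbackRedactor(/sot-session-[a-z0-9]{32}/gi, 44 + 64);
  let out = '';
  for (let i = 0; i < text.length; i += size) {
    out += redactor.push(text.slice(i, i + size));
  }
  return out + redactor.flush();
}
```

Because the held-back tail is at least as long as the longest possible match, an incomplete credential at the end of a push always starts inside the tail, so it is never emitted before the rest of it arrives.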

Roadmap

Planned for v0.2 and beyond:

  • Cross-script homoglyph folding (Greek↔Latin, Cyrillic↔Latin) as an opt-in Policy flag. Off by default — folding across scripts would destroy legitimate non-Latin text.
  • ReDoS static analysis on user-supplied patterns at Policy.build() time. v0.1 builtins are pre-validated at library-build time; user patterns are accepted unchecked.
  • Detail-variant streams. Streaming exposes onFinding only in v0.1; a createSanitizeStreamDetailed shape is under consideration.
  • Pluggable Unicode rules. addUnicodeRule({ ruleId, range, action }) for codepoint-range stripping outside the closed UnicodeCategory set. Workaround in v0.1: addStripPattern with a regex.
  • Control-token forging detection — model-specific message-role markers (<|im_start|>, Human: / Assistant:, etc.). Requires per-model presets.
  • External-content tagging — auto-wrap RAG output in <untrusted-data>…</untrusted-data>.

Out of scope

These belong in other layers, not in vaglio:

  • HTML parsing. Vaglio operates on text and markdown only. Run an HTML sanitizer upstream.
  • Content extraction. Vaglio sanitizes already-extracted text, not raw documents. Run an extractor upstream.
  • Binary / EXIF stripping. Different domain (image / file).
  • Channel-specific output formatting. App policy — chat-platform rendering, emoji allowlists, length caps, casing normalization.
  • Non-deterministic detection. Repetition / entropy heuristics, ML-based detection, indirect prompt-injection (IPIA) defenses, token-boundary disruption mitigation. Vaglio is deterministic by contract — these belong in a separate detection layer.

Compatibility

| Runtime                 | v0.1                                        |
| ----------------------- | ------------------------------------------- |
| Node ≥ 22 LTS           | CI matrix on 22 + 24                        |
| Browsers                | supported via standard Web APIs (not in CI) |
| Edge runtimes           | supported via standard surfaces (not in CI) |
| Alternative JS runtimes | deferred from CI                            |

License

MIT