@pixagram/sanitizer
v0.2.7
Secure content processing for PIXA blockchain — Markdown/HTML rendering, sanitization, metadata parsing, summarization
pixa-content
Secure content processing engine for the PIXA blockchain platform. Rust → WebAssembly module for browser-side rendering of posts, comments, profiles, and metadata.
Features
| Feature | Description |
|---------|-------------|
| Post Rendering | Markdown/HTML → sanitized HTML with full formatting |
| Comment Rendering | Stricter subset — no headings, tables, iframes |
| @Mentions | @username → <a href="/@username"> with validation |
| #Hashtags | #tag → <a href="/trending/tag"> |
| Image Extraction | Returns image list; render with/without images |
| Base64 Images | Validates data:image/* URIs, blocks dangerous MIME types |
| External Links | Marks external links with data-* attrs for React dialog |
| JSON Sanitizer | safeJson — sanitizes any JSON tree (keys, strings, base64) |
| Biography | HTML → plain text sanitization |
| Username | HIVE-compatible validation (3-16 chars, a-z0-9.-) |
| Metadata | parseMetadata — safeJson + known-field extraction |
| Plain Text | Strip all formatting, return clean text |
| Summarization | TF-IDF extractive summarization |
| XSS Protection | Whitelist-based sanitization via ammonia |
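The @Mentions and #Hashtags rows can be illustrated with a small sketch. It is not the module's implementation (which runs in Rust over the text nodes of parsed HTML); it works on plain text only. The helper names, the leading-letter rule for usernames, and the hashtag pattern are assumptions made for this example; only the link targets and the "3-16 chars, a-z0-9.-" rule come from the table above.

```javascript
// Hypothetical helpers for illustration; these are not pixa-content exports.

// Simplified reading of the username rule (3-16 chars from a-z, 0-9, '.', '-').
// Assumption: a name starts with a letter and ends with a letter or digit.
function isValidUsername(name) {
  return /^[a-z][a-z0-9.-]{1,14}[a-z0-9]$/.test(name);
}

// Assumption: tags are letters, digits and hyphens, at most 32 characters.
function isValidTag(tag) {
  return /^[a-z0-9][a-z0-9-]{0,31}$/i.test(tag);
}

// Escape the characters that are significant in HTML text and attributes.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Turn @mentions and #hashtags in plain text into links, escaping everything else.
function linkifyPlainText(text) {
  // A sigil only counts at the start of the text or after a character that is
  // not alphanumeric, '_' or '/', so e-mail addresses and URL paths are left alone.
  const token = /(^|[^A-Za-z0-9_\/])([@#])([A-Za-z0-9][A-Za-z0-9.-]*)/g;
  let out = '';
  let last = 0;
  for (const match of text.matchAll(token)) {
    const [, prefix, sigil, rawBody] = match;
    // Trailing '.' or '-' belongs to the sentence, not to the name.
    const body = rawBody.replace(/[.-]+$/, '');
    const ok = sigil === '@' ? isValidUsername(body) : isValidTag(body);
    if (!ok) continue; // invalid candidates stay as escaped text
    const start = match.index + prefix.length;
    const href = sigil === '@' ? `/@${body}` : `/trending/${body}`;
    out += escapeHtml(text.slice(last, start));
    out += `<a href="${escapeHtml(href)}">${escapeHtml(sigil + body)}</a>`;
    last = start + 1 + body.length;
  }
  return out + escapeHtml(text.slice(last));
}
```

Validating the name before building the link is what keeps arbitrary characters out of the href; anything that fails validation is emitted as escaped text rather than dropped.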
Security Model
- Whitelist-based HTML sanitization via ammonia: only approved tags and attributes pass through
- No script execution: <script>, event handlers (onclick, onerror, etc.), and javascript: URIs are all blocked
- SVG safety: Base64 SVGs are decoded and checked for embedded scripts
- Link isolation: External links get rel="noopener noreferrer" and target="_blank"
- Input limits: Maximum body length, image size, nesting depth, and URL length enforced
- Username validation: Strict HIVE-compatible rules prevent injection via mention links
- JSON sanitization: Every key validated, every string HTML-stripped, embedded JSON-in-strings rejected, dangerous URI schemes blocked, base64 images validated
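To make the scheme blocking and link isolation points concrete, here is a minimal sketch of how a link could be classified. It is an illustration under stated assumptions, not the crate's logic: the function name, the return shape, the http/https-only allowlist, the subdomain matching, and the default base URL are all hypothetical. Only the rel/target values and the idea of a list of internal domains come from this README.

```javascript
// Hypothetical helper; name, return shape and allowlist are assumptions.
const ALLOWED_SCHEMES = new Set(['http:', 'https:']);

function classifyLink(href, internalDomains, base = 'https://pixa.pics/') {
  // Remove control characters and whitespace before parsing, so that
  // obfuscated schemes such as "java\nscript:" are seen for what they are.
  const cleaned = href.replace(/[\u0000-\u0020\u007f]/g, '');

  let url;
  try {
    // Relative links resolve against the (assumed) site origin.
    url = new URL(cleaned, base);
  } catch {
    return { allowed: false, reason: 'unparseable' };
  }

  // The URL parser lowercases the scheme, so "JaVaScRiPt:" is caught too.
  if (!ALLOWED_SCHEMES.has(url.protocol)) {
    return { allowed: false, reason: 'scheme' };
  }

  const domain = url.hostname.toLowerCase();
  // Assumption: a subdomain of an internal domain also counts as internal.
  const isInternal = internalDomains.some((d) => {
    const internal = d.toLowerCase();
    return domain === internal || domain.endsWith('.' + internal);
  });

  return {
    allowed: true,
    href: url.href,
    domain,
    isExternal: !isInternal,
    // External links get the isolation attributes named in the list above.
    attrs: isInternal ? {} : { rel: 'noopener noreferrer', target: '_blank' },
  };
}
```

Matching on the parsed hostname, rather than searching the raw string for a domain name, is what stops a URL like https://notpixa.pics/ or https://evil.example/?pixa.pics from being treated as internal.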
Build
Prerequisites
# Rust toolchain
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
rustup target add wasm32-unknown-unknown
# wasm-pack
cargo install wasm-pack
# Optional: binaryen for extra ~20-30% size reduction
npm install -g binaryen
Build Commands
# Standard release build (optimized for size)
./scripts/build.sh release
# Smaller build with wee_alloc (trades speed for ~20KB less)
./scripts/build.sh release-small
# Dev build (fast compile, no optimization)
./scripts/build.sh dev
# For bundlers (webpack, vite, rollup)
./scripts/build.sh bundler
# Run all tests
./scripts/build.sh test
Expected Output Size
| Build | Raw WASM | + wasm-opt | Gzip |
|-------|----------|------------|------|
| release | ~800KB-1.2MB | ~600KB-900KB | ~250KB-350KB |
| release-small | ~750KB-1.1MB | ~550KB-850KB | ~220KB-320KB |
Why not smaller? The bulk comes from ammonia (an HTML sanitizer built on the html5ever parser) and regex. These are essential for security. The gzipped transfer size is what matters for users, and 250-350KB gzipped is reasonable for a full content engine replacing multiple JavaScript libraries.
Usage
JavaScript / React
import { PixaContent } from './pixa-content.js';
// Initialize once
const pixa = await PixaContent.init('./pixa_content.js');
// ── Render a post ──────────────────────────────
const result = pixa.sanitizePost(postBody, {
  max_image_count: 0, // 0 = unlimited
  internal_domains: ['pixa.pics', 'custom-domain.com'],
});
console.log(result.html); // Sanitized HTML
console.log(result.images); // [{ src, alt, is_base64, index }]
console.log(result.links); // [{ href, text, domain, is_external }]
// ── Render a comment ───────────────────────────
const comment = pixa.sanitizeComment(commentBody);
// ── Render a memo ──────────────────────────────
const memo = pixa.sanitizeMemo(memoBody);
// ── Sanitize any JSON (the main entry point) ───
const safe = pixa.safeJson(rawJsonString);
// Every key validated, every string HTML-stripped,
// embedded JSON rejected, base64 images preserved.
// ── Sanitize a single string ───────────────────
const title = pixa.safeString(rawTitle); // '' if unsafe
const category = pixa.safeString(rawCategory);
// ── Parse metadata (convenience wrapper) ───────
const meta = pixa.parseMetadata(jsonMetadataString);
console.log(meta.profile); // Sanitized JSON object (whatever was in source)
console.log(meta.tags); // ['tag1', 'tag2']
console.log(meta.image); // ['https://...']
console.log(meta.app); // 'peakd/2024.10.11'
console.log(meta.extra); // Any unknown fields (already sanitized)
// ── Extract plain text ─────────────────────────
const text = pixa.extractPlainText(postBody);
// ── Summarize ──────────────────────────────────
const summary = pixa.summarizeContent(postBody, 3);
console.log(summary.summary); // Top 3 sentences joined
console.log(summary.keywords); // Top keywords with scores
console.log(summary.sentences); // Scored sentences with positions
// ── Sanitize profile data ──────────────────────
const bio = pixa.sanitizeBiography(rawBio, 256); // max 256 chars
const name = pixa.sanitizeUsername(rawUsername); // '' if invalid
External Link Dialog (React)
import { ExternalLinkDialog } from './ExternalLinkDialog';
function PostContent({ html }) {
  return (
    <ExternalLinkDialog
      onNavigate={(href, domain) => {
        console.log('User navigated to:', domain);
      }}
    >
      <div dangerouslySetInnerHTML={{ __html: html }} />
    </ExternalLinkDialog>
  );
}
Or with a custom dialog:
<ExternalLinkDialog
  renderDialog={({ href, domain, onConfirm, onCancel }) => (
    <YourCustomModal
      title={`Leave Pixa?`}
      message={`Navigate to ${domain}?`}
      onYes={onConfirm}
      onNo={onCancel}
    />
  )}
>
  <div dangerouslySetInnerHTML={{ __html: html }} />
</ExternalLinkDialog>
Using from Rust (Non-WASM)
use pixa_content::{extract_plain_text, sanitize_biography, sanitize_username};
use pixa_content::sanitizer::{sanitize_post, safe_json, SanitizeOptions};
// Sanitize a post
let opts = SanitizeOptions::default();
let result = sanitize_post("# Hello @world\n\nCheck #pixelart!", &opts);
println!("HTML: {}", result.html);
println!("Images: {:?}", result.images);
println!("Links: {:?}", result.links);
// Sanitize any JSON metadata
let clean = safe_json(r#"{"tags":["art"],"profile":{"name":"Alice"}}"#).unwrap();
Architecture
pixa-content/
├── Cargo.toml # Dependencies & WASM optimization config
├── src/
│ ├── lib.rs # Public API & WASM bindings
│ ├── types.rs # Shared types (ImageInfo, LinkInfo, ParsedMetadata, etc.)
│ ├── sanitizer.rs # ammonia-based HTML sanitization + safe_json/safe_string/safe_key
│ ├── mentions.rs # @mention and #hashtag processing
│ ├── images.rs # Image extraction, base64 validation
│ ├── links.rs # External link detection & wrapping
│ ├── text.rs # Plain text extraction, sentence splitting
│ ├── summarizer.rs # TF-IDF extractive summarization
│ └── metadata.rs # JSON metadata parsing (thin wrapper over safe_json)
├── tests/
│ └── integration.rs # Full pipeline integration tests
├── js/
│ ├── pixa-content.js # JS wrapper API
│ └── ExternalLinkDialog.jsx # React external link component
├── scripts/
│ └── build.sh # Build & optimization script
└── README.md
Processing Pipeline
Input (Markdown or HTML)
│
├──► Detect format (is_predominantly_html)
│
├──► If Markdown: pulldown-cmark → HTML
│
├──► Extract images (before sanitization)
│
├──► Process @mentions and #hashtags
│ (text nodes only, not inside existing links)
│
├──► Sanitize HTML (ammonia whitelist)
│ ├── Post: full formatting
│ ├── Comment: restricted subset
│ └── Memo: inline only
│
├──► Process links (internal vs external)
│ └── External: add data-* attrs for React
│
├──► Optionally strip/limit images
│
└──► Return { html, images, links }
JSON Sanitization Pipeline
Input (raw JSON string)
│
├──► Parse JSON
│
├──► Walk tree recursively:
│ ├── Keys: safe_key → validate identifier format, drop bad keys
│ ├── Strings: safe_string →
│ │ ├── Reject embedded JSON-in-strings
│ │ ├── data:image/* → validate base64, pass through
│ │ ├── URLs → strip control chars (preserve & in query strings)
│ │ ├── Text → strip ALL HTML via ammonia, reject dangerous schemes
│ │ └── Enforce length limits
│ ├── Numbers: pass through (must be finite)
│ ├── Arrays: recurse (max 100 elements)
│ └── Objects: recurse (max 50 keys, max depth 5)
│
└──► Return sanitized JSON value
Metadata Flexibility
parseMetadata calls safeJson first, then extracts known fields for convenience:
{
"profile": { "name": "...", "about": "...", "profile_image": "..." },
"tags": ["tag1", "tag2"],
"app": "peakd/2024.10.11",
"format": "markdown",
"pixa_nft_id": "preserved-in-extra",
"custom_field": { "also": "preserved" }
}
All fields are sanitized by safeJson before extraction. Unknown fields land in extra.
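The split between known fields and extra could look roughly like the sketch below. It assumes the input has already been sanitized and is a plain object. The function name is hypothetical, and the list of known fields is inferred from the usage example and the JSON above (treating format as a known field is a guess).

```javascript
// Hypothetical helper; the known-field list is inferred from this README's
// examples and may not match the module exactly.
const KNOWN_FIELDS = ['profile', 'tags', 'image', 'app', 'format'];

// Split an already-sanitized metadata object into known fields and `extra`.
function splitMetadata(sanitized) {
  // Null-prototype objects avoid surprises from keys such as "__proto__".
  const known = Object.create(null);
  const extra = Object.create(null);

  const isPlainObject =
    sanitized !== null && typeof sanitized === 'object' && !Array.isArray(sanitized);
  if (!isPlainObject) {
    return { extra };
  }

  for (const [key, value] of Object.entries(sanitized)) {
    if (KNOWN_FIELDS.includes(key)) {
      known[key] = value;
    } else {
      extra[key] = value; // unknown fields are preserved, not discarded
    }
  }
  return { ...known, extra };
}
```

Applied to the example object above, pixa_nft_id and custom_field would end up under extra, while profile, tags, app, and format stay at the top level.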
For full control, use safeJson directly — it handles any JSON structure.
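As a rough illustration of the recursive walk described in the JSON Sanitization Pipeline, the sketch below applies the structural limits quoted there (100 array elements, 50 object keys, depth 5). Several things are assumptions: the function names, the key pattern, the string length limit, and the choice to truncate or drop offending values rather than reject the whole document. HTML stripping, URL handling, and base64 validation are deliberately left out, since those need a real sanitizer and cannot be approximated safely in a few lines.

```javascript
// Limits quoted from the pipeline description in this README.
const MAX_ARRAY_ITEMS = 100;
const MAX_OBJECT_KEYS = 50;
const MAX_DEPTH = 5;
// Assumed values: the README does not state the key format or string limit.
const MAX_STRING_LENGTH = 1024;
const KEY_PATTERN = /^[A-Za-z][A-Za-z0-9_]{0,63}$/;

// Sentinel marking a value that should be removed from its parent.
const DROP = Symbol('drop');

// True when a string is itself a JSON object or array ("JSON-in-strings").
function looksLikeJson(text) {
  const trimmed = text.trim();
  if (!/^[\[{]/.test(trimmed)) return false;
  try {
    JSON.parse(trimmed);
    return true;
  } catch {
    return false;
  }
}

// `depth` is the nesting level of `value`; the root container is level 1.
function walk(value, depth) {
  if (value === null || typeof value === 'boolean') return value;
  if (typeof value === 'number') return Number.isFinite(value) ? value : DROP;
  if (typeof value === 'string') {
    if (looksLikeJson(value)) return DROP;
    // HTML stripping and scheme checks would happen here in a real sanitizer.
    return value.slice(0, MAX_STRING_LENGTH);
  }
  if (typeof value !== 'object' || depth > MAX_DEPTH) return DROP;

  if (Array.isArray(value)) {
    return value
      .slice(0, MAX_ARRAY_ITEMS)
      .map((item) => walk(item, depth + 1))
      .filter((item) => item !== DROP);
  }

  const out = {};
  let kept = 0;
  for (const [key, child] of Object.entries(value)) {
    if (kept >= MAX_OBJECT_KEYS) break;
    if (!KEY_PATTERN.test(key)) continue; // drop keys that are not identifiers
    const clean = walk(child, depth + 1);
    if (clean === DROP) continue;
    out[key] = clean;
    kept += 1;
  }
  return out;
}

// Parse and walk; returns null when the input is not usable JSON.
function safeJsonSketch(raw) {
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null;
  }
  const clean = walk(parsed, 1);
  return clean === DROP ? null : clean;
}
```

Requiring keys to start with a letter also keeps names such as __proto__ out of the result, which is one reason to validate keys as well as values.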
License
Proprietary — Pixagram SA. All rights reserved.
