@officialunofficial/trek
v0.2.1
A web content extraction library that removes clutter from web pages
Trek

A modern web content extraction library written in Rust, compiled to WebAssembly.
Trek removes clutter from web pages and extracts clean, readable content. It's designed as a modern alternative to Mozilla Readability with enhanced features like mobile-aware extraction and consistent HTML standardization.
Features
- 🦀 Written in Rust for performance and safety
- 🌐 Compiles to WebAssembly for browser usage
- 📱 Mobile-aware content extraction
- 🎯 Site-specific extractors for popular platforms
- 🔧 Configurable extraction options
- 📊 Content scoring algorithm
- 🏷️ Metadata extraction (title, author, date, etc.)
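To give a feel for what a content scoring step can look like, here is a minimal, self-contained sketch loosely modeled on Readability-style heuristics. It is not Trek's actual algorithm: the `Block` type, the `score` function, and the weights are all assumptions made for illustration.

```rust
/// A candidate block of page content (hypothetical type for this sketch).
struct Block<'a> {
    text: &'a str,
    /// Number of characters of `text` that sit inside links.
    link_chars: usize,
}

/// Illustrative heuristic: reward long, comma-rich text and discount
/// blocks that consist mostly of links (navigation bars, footers).
fn score(block: &Block) -> f64 {
    let len = block.text.chars().count();
    if len == 0 {
        return 0.0;
    }
    let commas = block.text.matches(',').count() as f64;
    // One point per 100 characters, capped at 3.
    let length_bonus = (len as f64 / 100.0).floor().min(3.0);
    // Fraction of the text that is link text, between 0.0 and 1.0.
    let link_density = block.link_chars as f64 / len as f64;
    (1.0 + commas + length_bonus) * (1.0 - link_density)
}

fn main() {
    let article = Block {
        text: "Trek removes clutter from web pages, keeps the main text, and returns clean content.",
        link_chars: 0,
    };
    let nav = Block {
        text: "Home About Contact",
        link_chars: 18,
    };
    println!("article score: {}", score(&article));
    println!("nav score: {}", score(&nav));
}
```

In this sketch a paragraph of prose outscores a navigation bar because the link-density factor drives link-only blocks toward zero.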
Installation
As a Rust library
[dependencies]
trek-rs = "0.1"
As a WASM/JavaScript module
npm install @officialunofficial/trek
Or with other package managers:
# Yarn
yarn add @officialunofficial/trek
# pnpm
pnpm add @officialunofficial/trek
# Bun
bun add @officialunofficial/trek
Usage
Rust
use trek_rs::{Trek, TrekOptions};
let options = TrekOptions {
    debug: false,
    url: Some("https://example.com".to_string()),
    ..Default::default()
};
let trek = Trek::new(options);
let result = trek.parse(html_content)?;
println!("Title: {}", result.metadata.title);
println!("Content: {}", result.content);
Web Playground
Trek includes an interactive web playground for testing content extraction:
# Build WASM and start the playground server
make playground
# Open http://localhost:8000/playground/ in your browser
The playground provides:
- Live Extraction: Paste HTML and see extracted content instantly
- Multiple Views: Switch between content, metadata, raw JSON, and debug tabs
- Extraction Options: Toggle clutter removal and metadata inclusion
- Example Content: Pre-loaded example to demonstrate Trek's capabilities
Playground Features
- Content Tab: Shows the extracted article content with proper formatting
- Metadata Tab: Displays title, author, word count, and other metadata
- Raw JSON Tab: View the complete extraction response
- Debug Tab: See extraction details and performance metrics
JavaScript/TypeScript
import init, { TrekWasm } from '@officialunofficial/trek';
// Initialize the WASM module
await init();
const trek = new TrekWasm({
  debug: false,
  url: 'https://example.com'
});
const result = await trek.parse(htmlContent);
console.log('Title:', result.title);
console.log('Content:', result.content);
Building
Native library
cargo build --release
WebAssembly
wasm-pack build --target web --out-dir pkg
Development
# Run tests
cargo test
# Run clippy
cargo clippy --all-targets --all-features
# Format code
cargo fmt
# Generate changelog
git cliff -o CHANGELOG.md
Contributing
We welcome contributions! Trek uses conventional commits and automated changelog generation.
Quick Start
# Install development dependencies
make install-dev-deps
# Configure git for conventional commits
make setup-git
# Run pre-commit checks
make pre-commit
Commit Message Format
We follow the Conventional Commits specification:
<type>(<scope>): <subject>

<body>

<footer>
Types: feat, fix, docs, style, refactor, perf, test, build, ci, chore, revert
Examples:
feat(wasm): add support for custom headers
fix(parser): handle empty meta tags correctly
docs: update installation instructions
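The header format can be checked mechanically. The sketch below is a hypothetical helper, not part of Trek or its tooling: `parse_header` and `Header` are names invented here, and the check covers only the `<type>(<scope>): <subject>` line with the types listed in this README. It ignores the body, the footer, and the `!` breaking-change marker that the Conventional Commits specification also allows.

```rust
/// Commit types listed in this README.
const TYPES: [&str; 11] = [
    "feat", "fix", "docs", "style", "refactor", "perf", "test", "build", "ci", "chore", "revert",
];

/// Parsed pieces of a `<type>(<scope>): <subject>` header line.
#[derive(Debug, PartialEq)]
struct Header<'a> {
    kind: &'a str,
    scope: Option<&'a str>,
    subject: &'a str,
}

/// Hypothetical helper: returns `None` when the header does not match the format.
fn parse_header(line: &str) -> Option<Header<'_>> {
    // Split on the first ": " into the prefix (type and optional scope) and the subject.
    let (prefix, subject) = line.split_once(": ")?;
    if subject.trim().is_empty() {
        return None;
    }
    // An optional, non-empty scope sits in parentheses right after the type.
    let (kind, scope) = match prefix.split_once('(') {
        Some((kind, rest)) => {
            let scope = rest.strip_suffix(')')?;
            if scope.is_empty() {
                return None;
            }
            (kind, Some(scope))
        }
        None => (prefix, None),
    };
    // Reject types that are not in the list above.
    if !TYPES.iter().any(|t| *t == kind) {
        return None;
    }
    Some(Header { kind, scope, subject })
}

fn main() {
    let lines = [
        "feat(wasm): add support for custom headers",
        "docs: update installation instructions",
        "update stuff",
    ];
    for line in lines {
        match parse_header(line) {
            Some(h) => println!("ok: type={} scope={:?} subject={}", h.kind, h.scope, h.subject),
            None => println!("rejected: {}", line),
        }
    }
}
```

A check like this could run in a commit-msg hook, though the project's own `make setup-git` target may already configure something equivalent.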
For detailed contribution guidelines, see CONTRIBUTING.md.
Credits
Trek is a fork of Defuddle by @kepano, rewritten in Rust, with added WebAssembly support, site-specific extractors, and additional features.
License
MIT
