epub-wasm
v0.2.0
EPUB utilities compiled to WebAssembly
epub-wasm
A Rust crate that compiles to WebAssembly and parses EPUB files into structured JSON. It provides the core EPUB parsing logic that powers the epub-wasm npm package.
Overview
This crate uses the rbook crate to parse EPUB files and extract their content into a clean JSON structure of chapters, headings, and paragraphs. The parsed data is then exposed through WebAssembly bindings for use in web applications.
Features
- EPUB parsing: Extracts text content from EPUB files
- Structured output: Organizes content into books, chapters, and blocks
- WebAssembly compatible: Designed for compilation to WASM
- Fast: Leverages Rust's performance for efficient parsing (parse times typically range from ~20 ms to ~60 ms)
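To make the book, chapter, and block hierarchy concrete, here is a minimal sketch of Rust types that mirror the JSON shown in the API section. The type names (`Book`, `Chapter`, `Block`, `BlockKind`) and the `headings` helper are hypothetical and exist only for illustration; the crate itself returns a JSON string, and only the field names are taken from the documented output.

```rust
// Hypothetical types mirroring the documented JSON output.
// The crate does not export these; they are illustrative only.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum BlockKind {
    Heading,
    Paragraph,
}

#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Block {
    kind: BlockKind,  // the JSON "type" field
    text: String,     // the JSON "text" field
    position: String, // the JSON "position" field
}

#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Chapter {
    id: String,
    title: String,
    blocks: Vec<Block>,
}

#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Book {
    id: String,
    title: String,
    chapters: Vec<Chapter>,
}

/// Collects the text of every heading block, in chapter order.
fn headings(book: &Book) -> Vec<&str> {
    book.chapters
        .iter()
        .flat_map(|chapter| chapter.blocks.iter())
        .filter(|block| block.kind == BlockKind::Heading)
        .map(|block| block.text.as_str())
        .collect()
}

/// Builds a small book by hand, using the values from the documented example.
fn sample_book() -> Book {
    Book {
        id: "550e8400-e29b-41d4-a716-446655440000".to_string(),
        title: "Book Title".to_string(),
        chapters: vec![Chapter {
            id: "6ba7b810-9dad-11d1-80b4-00c04fd430c8".to_string(),
            title: "Chapter Title".to_string(),
            blocks: vec![
                Block {
                    kind: BlockKind::Heading,
                    text: "Heading Text".to_string(),
                    position: "0".to_string(),
                },
                Block {
                    kind: BlockKind::Paragraph,
                    text: "Paragraph text...".to_string(),
                    position: "1".to_string(),
                },
            ],
        }],
    }
}

fn main() {
    let book = sample_book();
    println!("{:?}", headings(&book));
}
```

In a real project these structs would more likely be filled by deserializing the JSON string (for example with serde) than built by hand.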
Building
Prerequisites
Build for WebAssembly
# Install wasm-pack if not installed
cargo install wasm-pack
# Build for bundler (recommended for Vite/SvelteKit)
wasm-pack build --release --target bundler
# Or build for web (serves WASM from URL)
# use this to test with the index.html
wasm-pack build --release --target web
Build for Native (Testing)
cargo build --release
Optimization Levels
You can adjust the opt-level in Cargo.toml to balance between performance and binary size:
| Value | Meaning |
| ----- | ---------------------------------------------- |
| 0 | No optimization (fast compile, slow binary) |
| 1 | Basic optimizations |
| 2 | Good balance |
| 3 | Maximum performance (default for release) |
| "s" | Optimize for small binary size |
| "z" | Optimize for smallest possible binary size |
For WebAssembly, you might want to use "z" for minimal size or "s" for a balance. The current setting is opt-level = 3 for maximum performance.
Usage
This crate is primarily designed to be compiled to WebAssembly. The main entry point is the parse_epub function:
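use epub_wasm::parse_epub;
// Load the EPUB file into memory ("book.epub" is a placeholder path)
let epub_data: Vec<u8> = std::fs::read("book.epub").expect("failed to read EPUB file");
// Parse EPUB bytes into a JSON string
let json_result = parse_epub(&epub_data);
// Decode the JSON string into a generic value for inspection
let book: serde_json::Value = serde_json::from_str(&json_result).expect("invalid JSON");
API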
parse_epub(data: &[u8]) -> String
Parses an EPUB file from raw bytes and returns a JSON string representation.
Parameters:
data: Raw bytes of the EPUB file
Returns: JSON string with the following structure:
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"title": "Book Title",
"chapters": [
{
"title": "Chapter Title",
"id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
"blocks": [
{
"type": "heading",
"text": "Heading Text",
"position": "0"
},
{
"type": "paragraph",
"text": "Paragraph text...",
"position": "1"
}
]
}
]
}
Each block in the parsed output includes a position field, a lexicographically sortable string that defines its stable reading order within the document. Example sequence:
"0", "1", "2", ..., "9", "A", "B", ..., "z", "10", "11", ...
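The example sequence suggests that positions behave like base-62 counters (digits, then uppercase, then lowercase letters) that grow in length. Under that assumption, a plain byte-wise string comparison would place "10" before "2", so the sketch below compares by length first and only then byte-wise. The function name `compare_positions` is hypothetical and not part of the crate; if the real encoding differs, the comparison would need to change.

```rust
use std::cmp::Ordering;

/// Orders two position strings: shorter strings sort first, and strings of
/// equal length are compared byte-wise.
/// ASSUMPTION: positions are base-62 counters (0-9, A-Z, a-z) with no
/// leading zeros, as the example sequence suggests.
fn compare_positions(a: &str, b: &str) -> Ordering {
    a.len().cmp(&b.len()).then_with(|| a.cmp(b))
}

fn main() {
    let mut positions = vec!["10", "z", "2", "A", "11", "0", "a"];
    positions.sort_by(|a, b| compare_positions(a, b));
    println!("{:?}", positions);
    // prints ["0", "2", "A", "a", "z", "10", "11"]
}
```

Sorting blocks with a comparator like this restores reading order even after they have been stored or transmitted out of order.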
Dependencies
- rbook: EPUB parsing library
- serde: Serialization framework
- wasm-bindgen: WebAssembly bindings
- uuid: ID generation
Development
Running the Web Example
After building with wasm-pack, you can test the generated package:
cd pkg
# Serve locally (requires a web server)
python3 -m http.server 8000
# Then open index.html in browser