shlesha
v0.5.4
High-performance extensible transliteration library with hub-and-spoke architecture
Shlesha - Schema-Driven Transliteration Library
A transliteration library for Sanskrit and Indic scripts built on a schema-driven architecture. Converters are generated and optimized at compile time, and additional schemas can be loaded at runtime.
Quick Start
Setup command:
./scripts/quick-start.sh
This sets up everything (the Rust environment, Python bindings, and WASM support) and then runs all tests.
For detailed setup instructions, see DEVELOPER_SETUP.md.
Documentation: See DOCUMENTATION_INDEX.md for guides and references.
Architecture Features
- Schema-generated converters with compile-time optimization
- Zero runtime overhead from code generation
- Token-based conversion system for memory efficiency
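As a rough illustration of what a schema-generated converter might boil down to, the sketch below hand-writes the kind of lookup a build step could emit. It is not the library's generated code: the function names are hypothetical, and only the two Bengali vowel pairs from the bengali.yaml excerpt in the next section are included.

```rust
/// Hypothetical shape of a generated Bengali -> Devanagari lookup.
/// A `match` on `char` is resolved by the compiler, so no table has
/// to be built or parsed when the program starts.
pub fn bengali_to_deva_char(c: char) -> Option<char> {
    match c {
        'অ' => Some('अ'), // Bengali A -> Devanagari A
        'আ' => Some('आ'), // Bengali AA -> Devanagari AA
        _ => None,        // not covered by this two-entry sketch
    }
}

/// Convert a string character by character, leaving unmapped
/// characters unchanged.
pub fn bengali_to_deva(input: &str) -> String {
    input
        .chars()
        .map(|c| bengali_to_deva_char(c).unwrap_or(c))
        .collect()
}
```

Calling `bengali_to_deva` on a string maps the two known vowels and passes everything else through untouched.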
Schema-Based Architecture
Compile-Time Code Generation
Converters are generated at compile-time from declarative schemas:
# schemas/slp1.yaml - Generates optimized SLP1 converter
metadata:
  name: "slp1"
  script_type: "roman"
  description: "Sanskrit Library Phonetic Basic"
target: "iso15919"
mappings:
  vowels:
    "A": "ā"
    "I": "ī"
    "U": "ū"
    # ... more mappings
# schemas/bengali.yaml - Generates optimized Bengali converter
metadata:
  name: "bengali"
  script_type: "brahmic"
  description: "Bengali/Bangla script"
mappings:
  vowels:
    "অ": "अ" # Bengali A → Devanagari A
    "আ": "आ" # Bengali AA → Devanagari AA
    # ... more mappings
Build-Time Optimization
The build system automatically generates highly optimized converters:
# Build output showing schema processing
warning: Processing YAML schemas...
warning: Generating optimized converters with Handlebars templates...
warning: Created 18 schema-generated converters with O(1) lookups
Hub-and-Spoke Architecture
Multi-Hub Design
- Devanagari Hub: Central format for Indic scripts (तमिल → देवनागरी → गुजराती)
- ISO-15919 Hub: Central format for romanization schemes (ITRANS → ISO → IAST)
- Cross-Hub Conversion: Seamless Indic ↔ Roman via both hubs
- Direct Conversion: Bypass hubs when possible for maximum performance
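The first three cases in the list above can be sketched as a small path-planning function. This is a simplified model, not Shlesha's router: `Hub`, `hub_of`, and `plan_route` are hypothetical names, the script classification is a hand-picked subset, and the direct-conversion shortcut is not modelled.

```rust
/// The two hubs described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Hub {
    Devanagari,
    Iso15919,
}

/// Hypothetical classification of a few scripts by hub.
pub fn hub_of(script: &str) -> Option<Hub> {
    match script {
        "devanagari" | "bengali" | "tamil" | "gujarati" | "telugu" => Some(Hub::Devanagari),
        "iso15919" | "iast" | "itrans" | "slp1" => Some(Hub::Iso15919),
        _ => None,
    }
}

fn hub_name(hub: Hub) -> &'static str {
    match hub {
        Hub::Devanagari => "devanagari",
        Hub::Iso15919 => "iso15919",
    }
}

/// Append `step` unless it repeats the last entry of the path.
fn push_if_new(path: &mut Vec<String>, step: &str) {
    if path.last().map(String::as_str) != Some(step) {
        path.push(step.to_string());
    }
}

/// Return every script visited, source and target included.
/// Returns `None` if either script is unknown to this sketch.
pub fn plan_route(from: &str, to: &str) -> Option<Vec<String>> {
    let from_hub = hub_of(from)?;
    let to_hub = hub_of(to)?;
    let mut path = vec![from.to_string()];
    if from != to {
        push_if_new(&mut path, hub_name(from_hub)); // spoke -> own hub
        push_if_new(&mut path, hub_name(to_hub)); // hub -> other hub, if different
        push_if_new(&mut path, to); // hub -> target spoke
    }
    Some(path)
}
```

Under these assumptions, `plan_route("itrans", "bengali")` visits both hubs, while a source that is itself a hub skips the first hop.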
Routing
The system determines the conversion path:
// Direct passthrough - zero conversion cost
transliterator.transliterate("धर्म", "devanagari", "devanagari")?; // instant
// Single hub - one conversion
transliterator.transliterate("धर्म", "devanagari", "iso")?; // deva→iso
// Cross-hub - optimized path
transliterator.transliterate("dharma", "itrans", "bengali")?; // itrans→iso→deva→bengali
Supported Scripts
Indic Scripts (Schema-Generated)
- Devanagari (devanagari, deva) - Sanskrit, Hindi, Marathi
- Bengali (bengali, bn) - Bengali/Bangla script
- Tamil (tamil, ta) - Tamil script
- Telugu (telugu, te) - Telugu script
- Gujarati (gujarati, gu) - Gujarati script
- Kannada (kannada, kn) - Kannada script
- Malayalam (malayalam, ml) - Malayalam script
- Odia (odia, od) - Odia/Oriya script
- Gurmukhi (gurmukhi, pa) - Punjabi script
- Sinhala (sinhala, si) - Sinhala script
- Sharada (sharada, shrd) - Historical script of Kashmir, crucial for Vedic manuscripts
- Tibetan (tibetan, tibt, bo) - Important for Buddhist Vedic transmission
- Thai (thai, th) - Adapted from Grantha for Buddhist Vedic texts
Romanization Schemes (Schema-Generated)
- ISO-15919 (iso15919, iso) - International standard
- ITRANS (itrans) - Indian languages TRANSliteration
- SLP1 (slp1) - Sanskrit Library Phonetic Basic
- Harvard-Kyoto (harvard_kyoto, hk) - ASCII-based scheme
- Velthuis (velthuis) - TeX-compatible scheme
- WX (wx) - ASCII-based notation
Hand-Coded Scripts
- IAST (iast) - International Alphabet of Sanskrit Transliteration
- Kolkata (kolkata) - Regional romanization scheme
- Grantha (grantha) - Classical Sanskrit script
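Each entry above lists a canonical name and, for some scripts, short aliases. One way to picture alias handling is a lookup that normalizes any accepted identifier to the canonical name. The function below is a hypothetical sketch; the alias table is copied from the lists above, and exact, case-sensitive matching is an assumption.

```rust
/// Resolve a script name or alias to its canonical name.
/// Returns `None` for identifiers not present in the lists above.
pub fn canonical_script_name(name: &str) -> Option<&'static str> {
    let canonical = match name {
        // Indic scripts
        "devanagari" | "deva" => "devanagari",
        "bengali" | "bn" => "bengali",
        "tamil" | "ta" => "tamil",
        "telugu" | "te" => "telugu",
        "gujarati" | "gu" => "gujarati",
        "kannada" | "kn" => "kannada",
        "malayalam" | "ml" => "malayalam",
        "odia" | "od" => "odia",
        "gurmukhi" | "pa" => "gurmukhi",
        "sinhala" | "si" => "sinhala",
        "sharada" | "shrd" => "sharada",
        "tibetan" | "tibt" | "bo" => "tibetan",
        "thai" | "th" => "thai",
        "grantha" => "grantha",
        // Romanization schemes
        "iso15919" | "iso" => "iso15919",
        "harvard_kyoto" | "hk" => "harvard_kyoto",
        "itrans" => "itrans",
        "slp1" => "slp1",
        "velthuis" => "velthuis",
        "wx" => "wx",
        "iast" => "iast",
        "kolkata" => "kolkata",
        _ => return None,
    };
    Some(canonical)
}
```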
Usage Examples
Rust Library
use shlesha::Shlesha;
let transliterator = Shlesha::new();
// High-performance cross-script conversion
let result = transliterator.transliterate("धर्म", "devanagari", "gujarati")?;
println!("{}", result); // "ધર્મ"
// Roman to Indic conversion
let result = transliterator.transliterate("dharmakṣetra", "iast", "tamil")?;
println!("{}", result); // "தர்மக்ஷேத்ர"
// Schema-generated converters in action
let result = transliterator.transliterate("Darmakzetra", "slp1", "iast")?;
println!("{}", result); // "dharmakṣetra"
Python Bindings (PyO3)
import shlesha
# Create transliterator with all schema-generated converters
transliterator = shlesha.Shlesha()
# Fast schema-based conversion
result = transliterator.transliterate("ধর্ম", "bengali", "telugu")
print(result) # "ధర్మ"
# Performance with metadata tracking
result = transliterator.transliterate_with_metadata("धर्मkr", "devanagari", "iast")
print(f"Output: {result.output}") # "dharmakr"
print(f"Unknown tokens: {len(result.metadata.unknown_tokens)}")
# Runtime extensibility
scripts = shlesha.get_supported_scripts()
print(f"Supports {len(scripts)} scripts: {scripts}")
Command Line Interface
# Schema-generated high-performance conversion
shlesha transliterate --from slp1 --to devanagari "Darmakzetra"
# Output: धर्मक्षेत्र
# Cross-script conversion via dual hubs
shlesha transliterate --from itrans --to tamil "dharma"
# Output: தர்ம
# List all schema-generated + hand-coded scripts
shlesha scripts
# Output: bengali, devanagari, gujarati, harvard_kyoto, iast, iso15919, itrans, ...
WebAssembly (Browser/Node.js)
import init, { WasmShlesha } from './pkg/shlesha.js';
async function demo() {
  await init();
  const transliterator = new WasmShlesha();
  // Schema-generated converter performance in browser
  const result = transliterator.transliterate("કર્મ", "gujarati", "devanagari");
  console.log(result); // "कर्म"
  // Runtime script discovery
  const scripts = transliterator.listSupportedScripts();
  console.log(`${scripts.length} scripts available`);
}
Runtime Schema Loading
Shlesha supports runtime schema loading across all APIs to add custom scripts without recompilation.
Rust API
use shlesha::Shlesha;
let mut transliterator = Shlesha::new();
// Load custom schema from YAML content
let custom_schema = r#"
metadata:
  name: "my_custom_script"
  script_type: "roman"
  has_implicit_a: false
  description: "My custom transliteration scheme"
target: "iso15919"
mappings:
  vowels:
    "a": "a"
    "e": "ē"
  consonants:
    "k": "k"
    "t": "ṭ"
"#;
// Load the schema at runtime
transliterator.load_schema_from_string(custom_schema, "my_custom_script")?;
// Use immediately without recompilation
let result = transliterator.transliterate("kate", "my_custom_script", "devanagari")?;
println!("{}", result); // "कटे" (k→k, a→a, t→ṭ, e→ē per the schema above)
// Schema management
let info = transliterator.get_schema_info("my_custom_script").unwrap();
println!("Loaded {} with {} mappings", info.name, info.mapping_count);
Python API
import shlesha
transliterator = shlesha.Shlesha()
# Load schema from YAML string
yaml_content = """
metadata:
  name: "custom_script"
  script_type: "roman"
  has_implicit_a: false
  description: "Custom transliteration"
target: "iso15919"
mappings:
  vowels:
    "a": "a"
  consonants:
    "k": "k"
"""
# Runtime loading
transliterator.load_schema_from_string(yaml_content, "custom_script")
# Immediate usage
result = transliterator.transliterate("ka", "custom_script", "devanagari")
print(result) # "क"
# Schema info
info = transliterator.get_schema_info("custom_script")
print(f"Script: {info['name']}, Mappings: {info['mapping_count']}")
# Schema management
transliterator.remove_schema("custom_script")
transliterator.clear_runtime_schemas()
JavaScript/WASM API
import init, { WasmShlesha } from './pkg/shlesha.js';
async function loadCustomScript() {
  await init();
  const transliterator = new WasmShlesha();
  // Define custom schema
  const yamlContent = `
metadata:
  name: "custom_script"
  script_type: "roman"
  has_implicit_a: false
  description: "Custom script"
target: "iso15919"
mappings:
  vowels:
    "a": "a"
  consonants:
    "k": "k"
`;
  // Load at runtime
  transliterator.loadSchemaFromString(yamlContent, "custom_script");
  // Use immediately
  const result = transliterator.transliterate("ka", "custom_script", "devanagari");
  console.log(result); // "क"
  // Get schema information
  const info = transliterator.getSchemaInfo("custom_script");
  console.log(`Name: ${info.name}, Mappings: ${info.mapping_count}`);
}
Key Runtime Features
- ✅ Load from YAML strings - No file system required
- ✅ Load from file paths - For development workflows
- ✅ Schema validation - Automatic error checking
- ✅ Hot reloading - Add/remove schemas dynamically
- ✅ Schema introspection - Get metadata about loaded schemas
- ✅ Memory management - Clear schemas when done
- ✅ Cross-platform - Identical API across Rust, Python, WASM
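To make the lifecycle in the list above concrete (load, validate, inspect, remove, clear), here is a minimal in-memory sketch. It is not Shlesha's implementation: `RuntimeSchemas` and its methods are hypothetical, YAML parsing is replaced by a slice of already-parsed pairs, and the validation rules are invented for illustration. `SchemaInfo` borrows the `name` and `mapping_count` fields used in the examples above.

```rust
use std::collections::HashMap;

/// Summary of a loaded schema.
#[derive(Debug, Clone, PartialEq)]
pub struct SchemaInfo {
    pub name: String,
    pub mapping_count: usize,
}

/// Hypothetical registry holding schemas added at runtime.
#[derive(Default)]
pub struct RuntimeSchemas {
    schemas: HashMap<String, HashMap<String, String>>,
}

impl RuntimeSchemas {
    /// Validate and store a schema; reloading a name replaces it.
    pub fn load(&mut self, name: &str, mappings: &[(&str, &str)]) -> Result<(), String> {
        if name.is_empty() {
            return Err("schema name must not be empty".to_string());
        }
        if mappings.is_empty() {
            return Err(format!("schema '{}' has no mappings", name));
        }
        let table = mappings
            .iter()
            .map(|(source, target)| (source.to_string(), target.to_string()))
            .collect();
        self.schemas.insert(name.to_string(), table);
        Ok(())
    }

    /// Introspection: metadata about one loaded schema.
    pub fn info(&self, name: &str) -> Option<SchemaInfo> {
        self.schemas.get(name).map(|table| SchemaInfo {
            name: name.to_string(),
            mapping_count: table.len(),
        })
    }

    /// Remove one schema; returns whether it was present.
    pub fn remove(&mut self, name: &str) -> bool {
        self.schemas.remove(name).is_some()
    }

    /// Drop every runtime schema.
    pub fn clear(&mut self) {
        self.schemas.clear();
    }
}
```

Keeping runtime schemas in their own map is one way to let them be removed or cleared without touching the converters compiled into the binary.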
Use Cases
Development & Testing
// Test schema variations quickly
transliterator.load_schema_from_string(variant_a, "test_a")?;
transliterator.load_schema_from_string(variant_b, "test_b")?;
// Compare results immediately
Dynamic Applications
# User uploads custom transliteration scheme
user_schema = request.files['schema'].read().decode('utf-8')
transliterator.load_schema_from_string(user_schema, user_id)
# Use immediately in application
Configuration-Driven Systems
// Load schemas from configuration
config.schemas.forEach(schema => {
  transliterator.loadSchemaFromString(schema.content, schema.name);
});
Performance & Benchmarks
Performance Analysis
Shlesha uses a hub-and-spoke architecture with schema-generated converters, trading some performance for extensibility compared to direct conversion approaches.
Performance Characteristics
- Competitive with other transliteration libraries
- Schema-generated converters match hand-coded performance
- Optimized for both short and long text processing
Architecture Trade-offs
| Aspect | Shlesha | Vidyut |
|--------|---------|--------|
| Performance | Hub-based | Direct conversion |
| Extensibility | Runtime schemas | Compile-time only |
| Script Support | 15+ (easily expandable) | Limited |
| Architecture | Hub-and-spoke | Direct conversion |
| Bindings | Rust/Python/WASM/CLI | Rust only |
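The extensibility side of this trade-off can be put in numbers. The sketch below counts how many one-way converters each design needs for n scripts, under simplifying assumptions that are not taken from the library: a single hub that is itself one of the n scripts, and one dedicated converter per ordered pair in the direct design.

```rust
/// Direct design: one converter per ordered (source, target) pair.
pub fn direct_converters(n: u64) -> u64 {
    n * n.saturating_sub(1)
}

/// Single-hub design: every non-hub script needs one converter
/// to the hub and one back from it.
pub fn hub_converters(n: u64) -> u64 {
    2 * n.saturating_sub(1)
}
```

For the 15 scripts quoted in the table, the direct design needs 15 × 14 = 210 converters and the single-hub model needs 2 × 14 = 28. The price is that a spoke-to-spoke conversion takes two hops instead of one.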
Schema-Driven Development
Adding New Scripts
To add support for a new script, describe it in a schema file:
# schemas/new_script.yaml
metadata:
  name: "NewScript"
  description: "Description of the script"
  unicode_block: "NewScript"
  has_implicit_vowels: true
mappings:
  vowels:
    - source: "𑀅" # New script character
      target: "अ" # Devanagari equivalent
    # ... add more mappings
# Rebuild to include new script
cargo build
# New script automatically available!
Template-Based Generation
Converters are generated using Handlebars templates for consistency:
{{!-- templates/indic_converter.hbs --}}
/// {{metadata.description}} converter generated from schema
pub struct {{pascal_case metadata.name}}Converter {
    {{snake_case metadata.name}}_to_deva_map: HashMap<char, char>,
    deva_to_{{snake_case metadata.name}}_map: HashMap<char, char>,
}
impl {{pascal_case metadata.name}}Converter {
    pub fn new() -> Self {
        // Generated O(1) lookup tables
        let mut {{snake_case metadata.name}}_to_deva = HashMap::new();
        {{#each character_mappings}}
        {{snake_case ../metadata.name}}_to_deva.insert('{{this.source}}', '{{this.target}}');
        {{/each}}
        // ... template continues
    }
}
Quality Assurance
Test Suite
- 127 tests covering all functionality
- Schema-generated converter tests for all 14 generated converters
- Performance regression tests ensuring schema = hand-coded speed
- Cross-script conversion matrix testing all 210+ pairs
- Unknown character handling
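Unknown character handling, the last item above, can be pictured as follows: characters without a mapping are copied to the output unchanged and recorded with their position. This is a hedged sketch rather than the library's code. `convert_with_unknowns` is a hypothetical function, the `token` and `position` field names follow the metadata example elsewhere in this README, and counting positions in characters rather than bytes is an assumption.

```rust
/// One unmapped character and where it occurred (character index).
#[derive(Debug, PartialEq)]
pub struct UnknownToken {
    pub token: char,
    pub position: usize,
}

/// Convert `input` with `lookup`, passing unmapped characters through
/// and collecting them for later inspection.
pub fn convert_with_unknowns(
    input: &str,
    lookup: impl Fn(char) -> Option<&'static str>,
) -> (String, Vec<UnknownToken>) {
    let mut output = String::new();
    let mut unknown_tokens = Vec::new();
    for (position, c) in input.chars().enumerate() {
        match lookup(c) {
            Some(mapped) => output.push_str(mapped),
            None => {
                output.push(c); // keep the original character
                unknown_tokens.push(UnknownToken { token: c, position });
            }
        }
    }
    (output, unknown_tokens)
}
```

A test for this behaviour would typically assert both the passthrough output and the recorded positions, for example with a toy lookup that only knows the SLP1 vowels `A` and `I`.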
Build System Validation
# Test schema-generated converters maintain performance
cargo test --lib
# Verify all conversions work
cargo test comprehensive_bidirectional_tests
# Performance benchmarks
cargo run --example shlesha_vs_vidyut_benchmark
Build Configuration & Features
Schema Processing Features
# Default: Schema-generated + hand-coded converters
cargo build
# Development mode with schema recompilation
cargo build --features "schema-dev"
# Minimal build (hand-coded only)
cargo build --no-default-features --features "hand-coded-only"
# All features (Python + WASM + CLI)
cargo build --features "python,wasm,cli"
Runtime Extensibility
let mut transliterator = Shlesha::new();
// Load additional schemas at runtime from a file path
transliterator.load_schema("path/to/new_script.yaml")?;
// Schema registry access
let scripts = transliterator.list_supported_scripts();
println!("Dynamically loaded: {:?}", scripts);
Advanced Features
Metadata Collection
// Track unknown characters and conversion details
let result = transliterator.transliterate_with_metadata("धर्मkr", "devanagari", "iast")?;
if let Some(metadata) = result.metadata {
    println!("Conversion: {} → {}", metadata.source_script, metadata.target_script);
    for unknown in metadata.unknown_tokens {
        println!("Unknown '{}' at position {}", unknown.token, unknown.position);
    }
}
Script Characteristics
// Schema-aware script properties
let registry = ScriptConverterRegistry::default();
// Indic scripts have implicit vowels
assert!(registry.script_has_implicit_vowels("bengali").unwrap());
assert!(registry.script_has_implicit_vowels("devanagari").unwrap());
// Roman schemes don't
assert!(!registry.script_has_implicit_vowels("itrans").unwrap());
assert!(!registry.script_has_implicit_vowels("slp1").unwrap());
Hub Processing Control
// Fine-grained control over conversion paths
let hub = Hub::new();
// Direct hub operations
let iso_text = hub.deva_to_iso("धर्म")?; // Devanagari → ISO
let deva_text = hub.iso_to_deva("dharma")?; // ISO → Devanagari
// Cross-hub conversion with metadata
let result = hub.deva_to_iso_with_metadata("धर्म")?;
Documentation
- Architecture Guide - Deep dive into hub-and-spoke design
- Schema Reference - Complete schema format documentation
- Performance Guide - Optimization techniques and benchmarks
- API Reference - Complete function and type reference
- Developer Setup - Development environment setup
- Release System - Automated release workflow overview
- Deployment Guide - Complete deployment and environment setup
- crates.io RC Support - Release candidate publishing guide
- Security Setup - Token management and environment security
- Contributing Guide - Guidelines for contributors
Quick Reference
# Generate documentation
cargo doc --open
# Run all examples
cargo run --example shlesha_vs_vidyut_benchmark
cargo run --example roman_allocation_analysis
# Performance testing
cargo bench
Releases
Shlesha uses an automated release system for publishing to package registries:
Quick Release
# Guided release process
./scripts/release.sh
Package Installation
# Python (PyPI)
pip install shlesha
# WASM (npm)
npm install shlesha-wasm
# Rust (crates.io)
cargo add shlesha
See DEPLOYMENT.md for complete release documentation.
Contributing
Contributions are welcome. The schema-driven architecture simplifies adding new scripts:
- Add Schema: Create a TOML/YAML mapping file
- Test: Run the test suite to verify the new script
- Benchmark: Ensure performance is maintained
- Submit: Open a PR with the schema and tests
See CONTRIBUTING.md for detailed guidelines.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Unicode Consortium for Indic script standards
- ISO-15919 for romanization standardization
- Sanskrit Library for SLP1 encoding schemes
- Vidyut Project for performance benchmarking standards
- Rust Community for excellent tools (PyO3, wasm-pack, handlebars)
