@kiroku-solutions/typefix-wasm-core

v1.1.6

Published

2 days ago

Hyper-lightweight, zero-latency typo correction and language detection engine


kirokusolutions

typo correction language-detection trie damerau-levenshtein

TypeFix

Hyper-lightweight, zero-latency typo correction and language detection engine written in Rust.

Open-source project maintained by Kiroku Solutions. Licensed under MIT OR Apache-2.0.

Features

Zero-latency correction: O(1) average-case lookup for known typos, O(m*n) Damerau-Levenshtein comparison per candidate (m and n are the lengths of the two words)
Dynamic language detection: Bayesian inference with rolling window
Memory efficient: < 10MB RAM footprint (target)
Cross-platform: Windows, Linux, macOS support
Fail-safe design: Graceful degradation on any error
No garbage collection: 100% Rust, zero allocations in hot path
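
The O(m*n) figure quoted for Damerau-Levenshtein comes from filling an (m+1) x (n+1) table of prefix distances. The following is a minimal sketch, assuming the restricted "optimal string alignment" variant; the README does not say which variant TypeFix implements, and `osa_distance` is a hypothetical name rather than part of the crate's API. It allocates a fresh table on every call for readability, which a zero-allocation hot path would presumably avoid.

```rust
/// Optimal string alignment (restricted Damerau-Levenshtein) distance.
/// Hypothetical helper for illustration; not the crate's own function.
/// Runs in O(m*n) time and space for inputs of m and n characters.
pub fn osa_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let (m, n) = (a.len(), b.len());

    // d[i][j] = distance between the first i chars of `a` and the first j chars of `b`.
    let mut d = vec![vec![0usize; n + 1]; m + 1];
    for i in 0..=m {
        d[i][0] = i; // delete all i characters
    }
    for j in 0..=n {
        d[0][j] = j; // insert all j characters
    }

    for i in 1..=m {
        for j in 1..=n {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            let mut best = (d[i - 1][j] + 1) // deletion
                .min(d[i][j - 1] + 1) // insertion
                .min(d[i - 1][j - 1] + cost); // substitution or match
            // Transposition of two adjacent characters (e.g. "teh" -> "the").
            if i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1] {
                best = best.min(d[i - 2][j - 2] + 1);
            }
            d[i][j] = best;
        }
    }
    d[m][n]
}
```

With a maximum edit distance of 1, a swap such as "teh" for "the" still counts as a single edit, which is the practical reason to prefer this family of distances over plain Levenshtein for keyboard typos.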

Performance

Stress test results on typical hardware:

| Benchmark | Result |
|-----------|--------|
| 50K word dictionary insert | 354 ms |
| 50K word search | 3.6 ms |
| 10K corrections/second | ✅ |
| Memory (idle) | < 10 MB (target) |
| Latency per correction | < 1 ms (target) |

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                         TypeFixPipeline                          │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌────────────┐   ┌──────────────────┐   ┌───────────────────┐   │
│  │ CharBuffer │──▶│ LanguageDetector │──▶│ CorrectionEngine  │   │
│  │ (64 chars) │   │                  │   │ ┌───────────────┐ │   │
│  └────────────┘   │  • Bayesian      │   │ │ StaticErrorMap│ │   │
│        │          │  • Stopwords     │   │ │ (O(1) lookup) │ │   │
│        │          │  • Window        │   │ └───────────────┘ │   │
│        │          │    (5 words)     │   │ ┌───────────────┐ │   │
│        ▼          └──────────────────┘   │ │ Trie (fuzzy)  │ │   │
│  ┌────────────┐                          │ │ (O(m*n) DL)   │ │   │
│  │ Delimiter  │                          │ └───────────────┘ │   │
│  │ (space, .) │                          └───────────────────┘   │
│  └────────────┘                                                  │
└──────────────────────────────────────────────────────────────────┘

Data Flow

Keystroke Input → CharBuffer accumulates characters
Delimiter detected → Buffer emits word token
Language Detection → Rolling window of 5 words analyzed
Typo Correction → Static map (instant) or Damerau-Levenshtein (fuzzy)
Result → Corrected word is returned only when it differs from the input (otherwise None)
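
The first two steps of this flow can be sketched as a small buffer that accumulates characters and emits a word when a delimiter arrives. This is an illustration under stated assumptions, not the crate's CharBuffer: `WordBuffer` and `tokenize` are hypothetical names, the delimiter set (whitespace, period, comma) is a guess, and dropping characters past the size limit is only one possible overflow policy.

```rust
/// Hypothetical stand-in for a keystroke buffer: collects characters
/// until a delimiter arrives, then emits the accumulated word.
pub struct WordBuffer {
    chars: String,
    max_chars: usize,
}

impl WordBuffer {
    pub fn new(max_chars: usize) -> Self {
        Self { chars: String::with_capacity(max_chars), max_chars }
    }

    /// Returns Some(word) when `ch` is a delimiter and a word is pending.
    pub fn push(&mut self, ch: char) -> Option<String> {
        // Assumption: whitespace, '.' and ',' all act as delimiters.
        if ch.is_whitespace() || matches!(ch, '.' | ',') {
            if self.chars.is_empty() {
                return None; // consecutive delimiters emit nothing
            }
            // Hand the word to the caller and leave the buffer empty.
            return Some(std::mem::take(&mut self.chars));
        }
        // Assumption: characters beyond the size limit are silently dropped.
        if self.chars.chars().count() < self.max_chars {
            self.chars.push(ch);
        }
        None
    }
}

/// Feeds `text` through a 64-character buffer and collects every emitted word.
pub fn tokenize(text: &str) -> Vec<String> {
    let mut buffer = WordBuffer::new(64);
    text.chars().filter_map(|ch| buffer.push(ch)).collect()
}
```

Note that a word is only emitted once its delimiter has been seen, so in this sketch input that ends without one leaves the final word pending in the buffer.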

Quick Start

Basic Usage

use typefix::TypeFixPipeline;

// Create a simple pipeline preloaded with test data
let mut pipeline = TypeFixPipeline::simple();

// Process text character by character. The trailing space is the
// delimiter that flushes the final word out of the buffer.
for ch in "teh world ".chars() {
    if let Some(result) = pipeline.push(ch) {
        println!("Word: {}, Corrected: {:?}",
                 result.original,
                 result.corrected);
    }
}
// Output:
// Word: teh, Corrected: Some("the")
// Word: world, Corrected: None

Using Individual Components

use typefix::{CharBuffer, LanguageDetector, CorrectionEngine};

// Buffer - accumulates keystrokes until delimiter
let mut buffer = CharBuffer::new();
buffer.push('h');
buffer.push('e');
buffer.push('l');
buffer.push('l');
let word = buffer.push(' '); // Some("hell")

// Language Detection - Bayesian inference
let mut detector = LanguageDetector::new(Default::default());
detector.set_language("en");
detector.process_word("the");

// Correction - static map + fuzzy matching
let mut engine = CorrectionEngine::new(Default::default());
let result = engine.correct("qeu"); // Some("que")

Event Subscription

use typefix::{TypeFixPipeline, PipelineConfig, PipelineEvent};

let config = PipelineConfig::default();
let mut pipeline = TypeFixPipeline::new(config);

// Subscribe to pipeline events
pipeline.on_event(|event| {
    match event {
        PipelineEvent::WordExtracted { word } => {
            println!("Extracted: {}", word);
        }
        PipelineEvent::WordCorrected { original, corrected } => {
            println!("{} → {}", original, corrected);
        }
        PipelineEvent::LanguageDetected { language, confidence } => {
            println!("Detected: {} ({:.0}%)", language, confidence * 100.0);
        }
    }
});

Building

# Debug build
cargo build

# Release build (optimized)
cargo build --release

# Run stress tests
cargo run --bin stress-runner

# Run benchmarks
cargo bench

Testing

# Run all tests
cargo test

# Run with output
cargo test -- --nocapture

# Run specific test file
cargo test --test stress_test

# Run specific test
cargo test test_name

Data Files

data/
├── dictionaries/   # Word dictionaries by language
│   ├── es.json     # Spanish
│   ├── en.json     # English
│   └── pt.json     # Portuguese
├── stopwords/      # Stopwords for language detection
│   ├── es.json
│   ├── en.json
│   └── pt.json
└── errors/         # Common typos
    ├── es.json
    └── en.json

Dictionary Format

{
  "language": "es",
  "version": "1.0",
  "words": [
    {"word": "que", "frequency": 1000000},
    {"word": "hola", "frequency": 500000}
  ]
}
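
Entries in this format map naturally onto a trie whose terminal nodes carry the word frequency. The following is a rough sketch of that idea using only the standard library; `Trie`, `TrieNode`, and the `frequency` accessor are hypothetical names, and the crate's own trie.rs may be laid out quite differently (for example, with array-based children to avoid hashing).

```rust
use std::collections::HashMap;

/// One node per character; hypothetical layout for illustration.
#[derive(Default)]
pub struct TrieNode {
    children: HashMap<char, TrieNode>,
    /// Some(frequency) marks the end of a stored word.
    frequency: Option<u64>,
}

#[derive(Default)]
pub struct Trie {
    root: TrieNode,
}

impl Trie {
    /// Inserts `word`, creating intermediate nodes as needed.
    pub fn insert(&mut self, word: &str, frequency: u64) {
        let mut node = &mut self.root;
        for ch in word.chars() {
            node = node.children.entry(ch).or_default();
        }
        node.frequency = Some(frequency);
    }

    /// Returns the stored frequency, or None if `word` is absent
    /// or is only a prefix of a stored word.
    pub fn frequency(&self, word: &str) -> Option<u64> {
        let mut node = &self.root;
        for ch in word.chars() {
            node = node.children.get(&ch)?;
        }
        node.frequency
    }
}
```

Keeping the frequency on the terminal node gives a fuzzy search something to rank by when several dictionary words fall within the same edit distance.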

Stopwords Format

{
  "language": "es",
  "stopwords": ["el", "la", "de", "que", "y"]
}

Common Errors Format

{
  "language": "es",
  "errors": {
    "qeu": "que",
    "qe": "que"
  }
}
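
Once loaded, the "errors" object behaves like a hash map from typo to correction, which is where the O(1) average-case lookup comes from. A minimal sketch follows, assuming the pairs have already been parsed out of the JSON; `ErrorMap`, `from_pairs`, and the lowercase normalization are illustrative assumptions, not the crate's StaticErrorMap API.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory form of one language's "errors" object.
pub struct ErrorMap {
    entries: HashMap<String, String>,
}

impl ErrorMap {
    /// Builds the map from (typo, correction) pairs.
    pub fn from_pairs(pairs: &[(&str, &str)]) -> Self {
        let entries = pairs
            .iter()
            .map(|(typo, fix)| (typo.to_string(), fix.to_string()))
            .collect();
        Self { entries }
    }

    /// Average-case O(1) lookup. Returns None for words that are not known typos.
    /// Assumption: keys are stored lowercase, so the input is lowercased first.
    pub fn correct(&self, word: &str) -> Option<&str> {
        self.entries.get(&word.to_lowercase()).map(String::as_str)
    }
}
```

A miss here is what would send a word on to the slower fuzzy path.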

Configuration

Default Configuration

// Pipeline
PipelineConfig {
    max_buffer_size: 64,
    enable_correction: true,
    enable_language_detection: true,
}

// Engine
EngineConfig {
    max_edit_distance: 1,
    max_corrections: 3,
    min_word_length: 2,
    dictionary: None,
    error_map: None,
}

// Detector
DetectorConfig {
    window_size: 5,
    confidence_threshold: 0.85,
    hysteresis_zone: 0.10,
    min_words_before_switch: 5,
}
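
To make the detector settings more concrete, here is one way window_size, confidence_threshold, and hysteresis_zone could interact. Everything in this sketch is an assumption: the `Detector` type, the naive Bayes model (uniform prior, 0.8 likelihood for a stopword hit, 0.2 otherwise), and the rule that switching away from an active language needs confidence_threshold + hysteresis_zone while first detection needs only confidence_threshold. The README does not specify these semantics, and min_words_before_switch is left out for brevity.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical detector; field names mirror DetectorConfig, but the
/// scoring model and switching rule are assumptions for illustration.
pub struct Detector {
    stopwords: HashMap<&'static str, HashSet<&'static str>>,
    window: VecDeque<String>,
    window_size: usize,
    confidence_threshold: f64,
    hysteresis_zone: f64,
    current: Option<&'static str>,
}

impl Detector {
    pub fn new(stopwords: HashMap<&'static str, HashSet<&'static str>>) -> Self {
        Self {
            stopwords,
            window: VecDeque::new(),
            window_size: 5,
            confidence_threshold: 0.85,
            hysteresis_zone: 0.10,
            current: None,
        }
    }

    /// Posterior probability of each language given the window, with a
    /// uniform prior and P(word | language) = 0.8 on a stopword hit, else 0.2.
    pub fn posteriors(&self) -> Vec<(&'static str, f64)> {
        let mut scores: Vec<(&'static str, f64)> = self
            .stopwords
            .iter()
            .map(|(lang, words)| {
                let likelihood: f64 = self
                    .window
                    .iter()
                    .map(|w| if words.contains(w.as_str()) { 0.8 } else { 0.2 })
                    .product();
                (*lang, likelihood)
            })
            .collect();
        // Normalize the likelihoods so they sum to 1.
        let total: f64 = scores.iter().map(|(_, s)| s).sum();
        for (_, s) in scores.iter_mut() {
            *s /= total;
        }
        scores
    }

    /// Adds a word to the rolling window and returns the active language.
    pub fn process_word(&mut self, word: &str) -> Option<&'static str> {
        if self.window.len() == self.window_size {
            self.window.pop_front(); // oldest word leaves the window
        }
        self.window.push_back(word.to_lowercase());

        let best = self
            .posteriors()
            .into_iter()
            .max_by(|a, b| a.1.total_cmp(&b.1));
        if let Some((lang, confidence)) = best {
            // Assumption: leaving an active language needs extra evidence.
            let required = match self.current {
                Some(active) if active != lang => {
                    self.confidence_threshold + self.hysteresis_zone
                }
                _ => self.confidence_threshold,
            };
            if confidence >= required {
                self.current = Some(lang);
            }
        }
        self.current
    }
}
```

Under these assumptions a single stopword is not enough to pick a language, and a short run of foreign words inside a sentence does not flip the active language until older words have rolled out of the window.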

JSON Configuration

{
  "language_detection": {
    "window_size": 5,
    "confidence_threshold": 0.85,
    "hysteresis_zone": 0.10,
    "min_words_before_switch": 5
  },
  "correction": {
    "max_edit_distance": 1,
    "max_corrections": 3,
    "min_word_length": 2
  },
  "buffer": {
    "max_size": 64
  }
}

Project Structure

typefix/
├── src/
│   ├── core/           # Core data structures
│   │   ├── buffer.rs   # Ring buffer for keystrokes
│   │   ├── trie.rs     # Trie for dictionary lookups
│   │   └── config.rs   # Configuration management
│   ├── language/       # Language detection
│   │   ├── detector.rs # Bayesian language detector
│   │   └── stopwords.rs # Stopwords trie
│   ├── correction/     # Typo correction
│   │   ├── damerau.rs  # Damerau-Levenshtein distance
│   │   ├── engine.rs   # Correction engine
│   │   └── static_map.rs # Static error map
│   ├── pipeline.rs     # Complete processing pipeline
│   ├── hooks/          # Platform hooks
│   │   ├── platform.rs # Platform abstraction
│   │   ├── windows.rs  # Windows implementation
│   │   ├── linux.rs    # Linux implementation
│   │   └── macos.rs    # macOS implementation
│   ├── memory.rs       # Memory profiling utilities
│   ├── benchmark.rs    # Performance benchmarks
│   └── lib.rs          # Library entry point
├── tests/
│   └── stress_test.rs  # Stress tests and benchmarks
├── docs/
│   ├── plan-implementacion.md
│   ├── final-review-2026-06-16.md
│   └── integration-ehr-legal.md  # EHR/Legal integration guide
├── Cargo.toml
└── README.md

EHR and Legal Integration

TypeFix supports integration with Electronic Health Record (EHR) and legal document management systems. See docs/integration-ehr-legal.md for:

EHR Integration: HIPAA-compliant configuration, medical terminology support (ICD-10, SNOMED CT), audit trails
Legal Integration: Citation preservation, redlining support, multi-jurisdiction dictionaries
API Reference: Full API documentation with data structures
Security Considerations: PHI handling, audit logging patterns
Example Code: Production-ready integration handlers for both domains

Implementation Phases

| Phase | Status | Description |
|-------|--------|-------------|
| 1 | ✅ Done | Core engine (Trie, Buffer, Config) |
| 2 | ✅ Done | Language detection |
| 3 | ✅ Done | Correction engine (Damerau-Levenshtein) |
| 4 | ✅ Done | Platform hooks (Windows/Linux/macOS) |
| 5 | ✅ Done | Performance testing (benchmarks, stress tests) |
| 6 | In Progress | Documentation and deployment |
| 7-10 | Pending | Real-world integration testing |

Contributing

Fork the repository
Create a feature branch
Run tests: cargo test
Run stress tests: cargo run --bin stress-runner
Submit a pull request

License

TypeFix is open-source software licensed under the MIT OR Apache-2.0 dual license.

See LICENSE and LICENSE-APACHE for the full text.

About Kiroku Solutions

TypeFix is developed and maintained by Kiroku Solutions, a software consultancy focused on high-performance, low-latency text processing systems. We build open-source tools that make software more accessible to everyone.

Support

Issues: GitHub Issues
Discussions: GitHub Discussions
Security: See SECURITY.md for reporting vulnerabilities

Contributing

We welcome contributions from the community! See CONTRIBUTING.md for guidelines.

Please read our Code of Conduct before participating.
