@willwade/ppmpredictor

v0.0.13

Published

5 hours ago

Word and letter prediction library with configurable error tolerance using PPM (Prediction by Partial Matching)

prediction autocomplete word-prediction letter-prediction ppm language-model error-tolerant fuzzy-matching aac assistive-technology noisy-channel

PPM Predictor

A Node.js library for word and letter prediction with configurable error tolerance, built on PPM (Prediction by Partial Matching) language modeling. Original PPM JS code by Google.

🎮 Live Demo

Try the Interactive Demo - Full-featured demo with WorldAlphabets integration

24+ languages with real training data
Adaptive learning with word management
Keyboard layout selection and visualization
Fuzzy matching and keyboard-aware typos
Real-time statistics

Features (aka your regular emoji-filled bullet list)

🎯 Character-level prediction using PPM language model
📝 Word completion with lexicon support
🔧 Error-tolerant mode for handling typos and noisy input
⌨️ Keyboard-aware matching for proximity-based error correction
🔄 Adaptive learning that updates as users type
🎛️ Configurable tolerance levels for different use cases
🚀 Zero dependencies - pure JavaScript implementation
♿ AAC-focused - designed for assistive technology applications
🌍 Multi-language support - works with WorldAlphabets for 100+ languages
📚 Per-corpus lexicons - NEW in v0.0.7! Each corpus can have its own vocabulary

Installation

npm install @willwade/ppmpredictor

Platform Support

Node.js: Fully supported (v12+)
Browser: Fully supported (direct usage or bundled)
- Direct: <script src="dist/ppmpredictor.min.js"></script>
- CDN: <script src="https://unpkg.com/@willwade/ppmpredictor"></script>
- Bundled: Works with Webpack, Rollup, etc.

Quick Start

const { createPredictor } = require('@willwade/ppmpredictor');

// Create a predictor
const predictor = createPredictor({
  lexicon: ['hello', 'help', 'hero', 'world', 'word']
});

// Train on some text
predictor.train('The quick brown fox jumps over the lazy dog');

// Add context and predict next character
predictor.addToContext('The qui');
const charPredictions = predictor.predictNextCharacter();
console.log(charPredictions);
// [{ text: 'c', probability: 0.85 }, ...]

// Word completion
const wordPredictions = predictor.predictWordCompletion('hel');
console.log(wordPredictions);
// [{ text: 'hello', probability: 0.45 }, { text: 'help', probability: 0.35 }, ...]

// Next word prediction
const nextWord = predictor.predictNextWord('quick');
console.log(nextWord);
// [{ text: 'brown', probability: 1.0 }]

Usage

Training & Adaptive Learning

Train the predictor on text to learn character patterns and word sequences:

const { createPredictor } = require('@willwade/ppmpredictor');
const fs = require('fs');

// Option 1: With lexicon (recommended for word completion)
const lexicon = fs.readFileSync('lexicon.txt', 'utf-8')
  .split('\n')
  .filter(word => word.trim());

const predictor = createPredictor({ lexicon });

// Train from a string
predictor.train('The quick brown fox jumps over the lazy dog');

// Train from a file
const trainingText = fs.readFileSync('training.txt', 'utf-8');
predictor.train(trainingText);

// Option 2: Without lexicon (character-level only)
// Word completion will fall back to character-based prediction (slower)
const charOnlyPredictor = createPredictor();
charOnlyPredictor.train('The quick brown fox');
// Still works, but word completion is less efficient

// Adaptive mode - learns as user types
const adaptivePredictor = createPredictor({
  adaptive: true,
  lexicon: lexicon  // Include lexicon for best results
});
adaptivePredictor.addToContext('hello world');
// Model automatically updates with new patterns

How Training Works: The PPM (Prediction by Partial Matching) model learns character sequences and their probabilities. It also automatically tracks bigrams (word pairs) for next-word prediction. The more text you train on, the better the predictions become.

Lexicon vs No Lexicon:

With lexicon: Word completion uses fast dictionary lookup (recommended)
Without lexicon: Word completion falls back to character-level prediction (slower but still works)
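
To make the idea of learning character sequences and their probabilities concrete, here is a minimal sketch of a context-count model with back-off. It is a deliberate simplification and not the library's PPM implementation: real PPM blends several context lengths using escape probabilities, while this version just uses the longest context it has seen. The class name `ContextModel` and its methods are hypothetical and exist only for this illustration.

```javascript
// Simplified illustration only: counts which character follows each context
// of length 0..maxOrder, then predicts from the longest context it has seen.
class ContextModel {
  constructor(maxOrder = 3) {
    this.maxOrder = maxOrder;
    this.counts = new Map(); // context string -> Map(next character -> count)
  }

  train(text) {
    for (let i = 0; i < text.length; i++) {
      for (let order = 0; order <= this.maxOrder && order <= i; order++) {
        const context = text.slice(i - order, i);
        if (!this.counts.has(context)) this.counts.set(context, new Map());
        const next = this.counts.get(context);
        next.set(text[i], (next.get(text[i]) || 0) + 1);
      }
    }
  }

  predict(context) {
    // Try the longest usable suffix first, then back off to shorter ones.
    for (let order = Math.min(this.maxOrder, context.length); order >= 0; order--) {
      const suffix = context.slice(context.length - order);
      const next = this.counts.get(suffix);
      if (!next) continue;
      const total = [...next.values()].reduce((sum, n) => sum + n, 0);
      return [...next.entries()]
        .map(([text, count]) => ({ text, probability: count / total }))
        .sort((a, b) => b.probability - a.probability);
    }
    return [];
  }
}

const model = new ContextModel(3);
model.train('the quick brown fox');
const predictions = model.predict('the qui');
console.log(predictions); // [{ text: 'c', probability: 1 }]
```

A larger `maxOrder` stores more contexts, which is why memory use grows with both the order and the amount of training text.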

Available training files (in parent project's data/ directory):

sample_training_text.txt - General text for training
sample_conversation.txt - AAC conversation examples
aac_lexicon_en_gb.txt - AAC vocabulary (2,180 words)

See examples/train-from-file.js for complete examples.

Next Character Prediction

Predict the most likely next character based on context:

const predictor = createPredictor();
predictor.train('The quick brown fox');

predictor.addToContext('The qui');
const predictions = predictor.predictNextCharacter();

console.log(predictions);
// [
//   { text: 'c', probability: 0.85 },
//   { text: 'e', probability: 0.10 },
//   { text: 't', probability: 0.05 }
// ]

Word Completion

Suggest word completions based on a partial word:

const predictor = createPredictor({
  lexicon: ['hello', 'help', 'hero', 'world', 'word', 'work']
});

const completions = predictor.predictWordCompletion('hel');

console.log(completions);
// [
//   { text: 'hello', probability: 0.45 },
//   { text: 'help', probability: 0.35 },
//   { text: 'hero', probability: 0.20 }
// ]

Loading lexicons from files:

const fs = require('fs');

// Load lexicon (one word per line)
const lexicon = fs.readFileSync('lexicon.txt', 'utf-8')
  .split('\n')
  .filter(word => word.trim());

const predictor = createPredictor({ lexicon });

Next Word Prediction

Predict the next word based on the previous word (using bigram tracking):

const predictor = createPredictor();

// Train on text - automatically learns word pairs
predictor.train('The quick brown fox. The quick red fox. The quick brown dog.');

// Predict next word after "quick"
const predictions = predictor.predictNextWord('quick');

console.log(predictions);
// [
//   { text: 'brown', probability: 0.67 },
//   { text: 'red', probability: 0.33 }
// ]

How Bigram Tracking Works: Bigrams are automatically learned when you call train() or addTrainingCorpus(). Each word pair's frequency is tracked, and predictions are based on relative frequencies. For example, if "quick brown" appears twice and "quick red" appears once, "brown" gets a 67% probability.
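
The relative-frequency calculation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the library's code: the helper names `countBigrams` and `nextWordProbabilities` are hypothetical, and the tokenization (lowercasing and splitting on non-letter characters) is a guess at reasonable behaviour.

```javascript
// Count how often each word follows another (assumed tokenization: lowercase,
// runs of letters and apostrophes count as words).
function countBigrams(text) {
  const words = text.toLowerCase().match(/[\p{L}']+/gu) || [];
  const counts = new Map(); // first word -> Map(second word -> count)
  for (let i = 0; i + 1 < words.length; i++) {
    if (!counts.has(words[i])) counts.set(words[i], new Map());
    const followers = counts.get(words[i]);
    followers.set(words[i + 1], (followers.get(words[i + 1]) || 0) + 1);
  }
  return counts;
}

// Turn the follower counts for one word into relative frequencies.
function nextWordProbabilities(counts, word) {
  const followers = counts.get(word.toLowerCase());
  if (!followers) return [];
  const total = [...followers.values()].reduce((sum, n) => sum + n, 0);
  return [...followers.entries()]
    .map(([text, count]) => ({ text, probability: count / total }))
    .sort((a, b) => b.probability - a.probability);
}

const bigramCounts = countBigrams(
  'The quick brown fox. The quick red fox. The quick brown dog.'
);
const afterQuick = nextWordProbabilities(bigramCounts, 'quick');
console.log(afterQuick); // 'brown' at 2/3, 'red' at 1/3
```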

Bigram statistics:

const stats = predictor.getBigramStats();
console.log(stats);
// { uniqueBigrams: 150, totalBigrams: 500 }

// Export/import bigrams (plain text, one "word1 word2 count" entry per line)
const fs = require('fs');
const bigramText = predictor.exportBigrams();
fs.writeFileSync('bigrams.txt', bigramText);

const imported = fs.readFileSync('bigrams.txt', 'utf-8');
predictor.importBigrams(imported);

Error-Tolerant Prediction

Handle typos and noisy input with fuzzy matching:

const { createErrorTolerantPredictor } = require('@willwade/ppmpredictor');

const predictor = createErrorTolerantPredictor({
  lexicon: ['hello', 'help', 'world'],
  maxEditDistance: 2,      // Allow up to 2 character edits
  minSimilarity: 0.5       // Require at least 50% similarity
});

// Works even with typos!
const predictions = predictor.predictWordCompletion('helo'); // Missing 'l'

console.log(predictions);
// [
//   { text: 'hello', probability: 0.85, distance: 1, similarity: 0.8 },
//   { text: 'help', probability: 0.15, distance: 1, similarity: 0.75 }
// ]
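
For readers who want to see how `maxEditDistance` and `minSimilarity` could interact, here is a minimal sketch of fuzzy matching with Levenshtein distance. It is not the library's implementation. The helper names `editDistance` and `fuzzyMatch` are hypothetical, and two things are assumptions: that similarity is computed as `1 - distance / longerLength`, and that the input is compared against whole words rather than prefixes.

```javascript
// Classic Levenshtein distance using two rows of the dynamic-programming table.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const substitution = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,                // deletion
        curr[j - 1] + 1,            // insertion
        prev[j - 1] + substitution  // match or substitution
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Keep lexicon words that pass both thresholds, closest matches first.
function fuzzyMatch(input, lexicon, { maxEditDistance = 2, minSimilarity = 0.5 } = {}) {
  return lexicon
    .map(word => {
      const distance = editDistance(input, word);
      const similarity = 1 - distance / Math.max(input.length, word.length);
      return { text: word, distance, similarity };
    })
    .filter(m => m.distance <= maxEditDistance && m.similarity >= minSimilarity)
    .sort((x, y) => x.distance - y.distance || y.similarity - x.similarity);
}

const matches = fuzzyMatch('helo', ['hello', 'help', 'world']);
console.log(matches.map(m => m.text)); // ['hello', 'help']
```

Raising `maxEditDistance` or lowering `minSimilarity` admits more candidates, at the cost of more irrelevant suggestions.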

Keyboard-Aware Matching

Use physical keyboard layout to better handle typos based on key proximity:

// Build a QWERTY adjacency map
const qwertyMap = {
  'q': ['w', 'a', 's'],
  'w': ['q', 'e', 'a', 's', 'd'],
  'e': ['w', 'r', 's', 'd', 'f'],
  'r': ['e', 't', 'd', 'f', 'g'],
  't': ['r', 'y', 'f', 'g', 'h'],
  'y': ['t', 'u', 'g', 'h', 'j'],
  'u': ['y', 'i', 'h', 'j', 'k'],
  'i': ['u', 'o', 'j', 'k', 'l'],
  'o': ['i', 'p', 'k', 'l'],
  'p': ['o', 'l'],
  'a': ['q', 'w', 's', 'z'],
  's': ['a', 'w', 'e', 'd', 'z', 'x'],
  'd': ['s', 'e', 'r', 'f', 'x', 'c'],
  'f': ['d', 'r', 't', 'g', 'c', 'v'],
  'g': ['f', 't', 'y', 'h', 'v', 'b'],
  'h': ['g', 'y', 'u', 'j', 'b', 'n'],
  'j': ['h', 'u', 'i', 'k', 'n', 'm'],
  'k': ['j', 'i', 'o', 'l', 'm'],
  'l': ['k', 'o', 'p'],
  'z': ['a', 's', 'x'],
  'x': ['z', 's', 'd', 'c'],
  'c': ['x', 'd', 'f', 'v'],
  'v': ['c', 'f', 'g', 'b'],
  'b': ['v', 'g', 'h', 'n'],
  'n': ['b', 'h', 'j', 'm'],
  'm': ['n', 'j', 'k']
};

const predictor = createPredictor({
  lexicon: ['hello', 'jello', 'yellow'],
  errorTolerant: true,
  keyboardAware: true,
  keyboardAdjacencyMap: qwertyMap
});

// 'h' and 'j' are adjacent on QWERTY, so substituting one for the other costs less
const predictions = predictor.predictWordCompletion('jelo');
// 'jello' (one missing 'l') scores better than 'yellow', which needs more edits

WorldAlphabets integration (100+ keyboard layouts):

const { loadKeyboardLayout } = require('worldalphabets');

// Helper function to build adjacency map from WorldAlphabets layout
function buildAdjacencyMap(layout) {
  const adjacencyMap = {};

  layout.keys.forEach(key => {
    const char = key.legends.base;
    if (!char) return;

    const adjacent = layout.keys
      .filter(otherKey => {
        if (!otherKey.legends.base || otherKey.legends.base === char) return false;
        const rowDiff = Math.abs(key.row - otherKey.row);
        const colDiff = Math.abs(key.col - otherKey.col);
        // Adjacent if within 1 row and 1 column
        return rowDiff <= 1 && colDiff <= 1;
      })
      .map(k => k.legends.base);

    adjacencyMap[char] = adjacent;
  });

  return adjacencyMap;
}

// Load French AZERTY layout
// (run inside an async function or an ES module, since this uses await)
const layout = await loadKeyboardLayout('fr-azerty');
const adjacencyMap = buildAdjacencyMap(layout);

const predictor = createPredictor({
  lexicon: frenchWords,
  errorTolerant: true,
  keyboardAware: true,
  keyboardAdjacencyMap: adjacencyMap
});

See the demo app for a complete WorldAlphabets integration example.

Advanced Usage

Managing Multiple Corpora

New in v0.0.7: Train and manage multiple domain-specific corpora for context-aware predictions.

const { createPredictor } = require('@willwade/ppmpredictor');

const predictor = createPredictor({
  lexicon: generalWords  // Default corpus
});

// Add domain-specific corpora
predictor.addTrainingCorpus('medical', medicalText, {
  description: 'Medical terminology',
  lexicon: medicalWords
});

predictor.addTrainingCorpus('work', workText, {
  description: 'Work vocabulary',
  lexicon: workWords
});

// Switch context based on user's activity
if (userIsAtWork) {
  predictor.useCorpora(['work', 'default']);
} else if (userIsAtDoctor) {
  predictor.useCorpora(['medical', 'default']);
} else {
  predictor.useAllCorpora();
}

// Manage corpora
const allCorpora = predictor.getCorpora();
const info = predictor.getCorpusInfo('medical');
predictor.removeCorpus('old_vocabulary');

How Predictions are Merged: When multiple corpora are active, PPMPredictor gets character predictions from each active corpus, averages the probabilities, and returns the top N predictions sorted by averaged probability.
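
The merging step described above could look roughly like the following sketch. It is an assumption-laden illustration, not the library's code: the function name `mergePredictions` is hypothetical, and it assumes that a character missing from one corpus counts as probability 0 there (so the sum is divided by the number of active corpora).

```javascript
// Average per-corpus probabilities and keep the top N.
// perCorpus: one array of { text, probability } per active corpus.
function mergePredictions(perCorpus, maxPredictions = 10) {
  const sums = new Map(); // text -> summed probability across corpora
  for (const predictions of perCorpus) {
    for (const { text, probability } of predictions) {
      sums.set(text, (sums.get(text) || 0) + probability);
    }
  }
  return [...sums.entries()]
    .map(([text, sum]) => ({ text, probability: sum / perCorpus.length }))
    .sort((a, b) => b.probability - a.probability)
    .slice(0, maxPredictions);
}

const merged = mergePredictions(
  [
    [{ text: 'a', probability: 0.6 }, { text: 'e', probability: 0.4 }],
    [{ text: 'a', probability: 0.4 }, { text: 'o', probability: 0.6 }]
  ],
  2
);
console.log(merged); // 'a' at about 0.5, then 'o' at about 0.3
```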

Multilingual Support

New in v0.0.7: Each corpus can have its own lexicon, enabling true multilingual support!

const { createPredictor } = require('@willwade/ppmpredictor');
const { loadFrequencyList } = require('worldalphabets');
const fs = require('fs');

// Load frequency lists for different languages
// (run inside an async function or an ES module, since this uses await)
const englishWords = (await loadFrequencyList('en')).tokens.slice(0, 5000);
const frenchWords = (await loadFrequencyList('fr')).tokens.slice(0, 5000);
const spanishWords = (await loadFrequencyList('es')).tokens.slice(0, 5000);

// Create predictor with English as default
const predictor = createPredictor({
  lexicon: englishWords
});

// Add French corpus with French lexicon
const frenchText = fs.readFileSync('data/french_training.txt', 'utf-8');
predictor.addTrainingCorpus('french', frenchText, {
  description: 'French language corpus',
  lexicon: frenchWords  // French-specific vocabulary
});

// Add Spanish corpus with Spanish lexicon
const spanishText = fs.readFileSync('data/spanish_training.txt', 'utf-8');
predictor.addTrainingCorpus('spanish', spanishText, {
  description: 'Spanish language corpus',
  lexicon: spanishWords  // Spanish-specific vocabulary
});

// Switch language based on user's selection
if (currentLanguage === 'french') {
  predictor.useCorpora(['french']);
  // Word completion now uses French lexicon only
} else if (currentLanguage === 'spanish') {
  predictor.useCorpora(['spanish']);
  // Word completion now uses Spanish lexicon only
} else {
  predictor.useCorpora(['default']);
  // Word completion uses English lexicon
}

// Or use multiple languages simultaneously (code-switching)
predictor.useCorpora(['french', 'spanish']);
// Word completion merges both French and Spanish lexicons

Domain-Specific Vocabularies

Different contexts require different vocabularies:

const predictor = createPredictor({
  lexicon: generalWords  // General vocabulary
});

// Medical AAC user
const medicalWords = ['acetaminophen', 'ibuprofen', 'diagnosis', 'prescription'];
predictor.addTrainingCorpus('medical', medicalText, {
  description: 'Medical terminology',
  lexicon: medicalWords
});

// Professional user
const workWords = ['meeting', 'deadline', 'project', 'presentation'];
predictor.addTrainingCorpus('work', workText, {
  description: 'Work-related vocabulary',
  lexicon: workWords
});

// Student
const academicWords = ['assignment', 'lecture', 'exam', 'research'];
predictor.addTrainingCorpus('academic', academicText, {
  description: 'Academic vocabulary',
  lexicon: academicWords
});

// Switch context based on user's activity
if (userIsAtWork) {
  predictor.useCorpora(['work', 'default']);
} else if (userIsAtDoctor) {
  predictor.useCorpora(['medical', 'default']);
} else if (userIsAtSchool) {
  predictor.useCorpora(['academic', 'default']);
}

API Reference

Factory Functions

`createPredictor(config)`

Creates a new predictor instance with the given configuration.

Parameters:

config (Object, optional): Configuration options
- maxOrder (number): Maximum context length for PPM (default: 5)
- errorTolerant (boolean): Enable error-tolerant mode (default: false)
- maxEditDistance (number): Maximum edit distance for fuzzy matching (default: 2)
- minSimilarity (number): Minimum similarity score 0-1 (default: 0.5)
- keyboardAware (boolean): Use keyboard-aware distance (default: false)
- keyboardAdjacencyMap (Object): Custom keyboard adjacency map
- caseSensitive (boolean): Case-sensitive matching (default: false)
- maxPredictions (number): Maximum predictions to return (default: 10)
- adaptive (boolean): Update model as text is entered (default: false)
- lexicon (Array): Optional word list for word prediction (default: [])

Returns: Predictor instance

const predictor = createPredictor({
  errorTolerant: true,
  maxEditDistance: 2,
  keyboardAware: true,
  adaptive: true,
  lexicon: ['hello', 'world']
});

`createStrictPredictor(config)`

Creates a predictor with strict mode (exact matching only).

const predictor = createStrictPredictor({ lexicon: words });

`createErrorTolerantPredictor(config)`

Creates a predictor with error-tolerant mode enabled.

const predictor = createErrorTolerantPredictor({
  lexicon: words,
  maxEditDistance: 2
});

Predictor Class

`train(text)`

Train the default corpus on text. For multi-corpus training, use addTrainingCorpus() instead.

Parameters:

text (string): Training text

predictor.train('The quick brown fox jumps over the lazy dog');

`addTrainingCorpus(corpusKey, text, options)`

Add a new training corpus with a unique identifier and optional corpus-specific lexicon.

Parameters:

corpusKey (string): Unique identifier for this corpus (e.g., 'medical', 'personal', 'french')
text (string): Training text for this corpus
options (object, optional):
- description (string): Human-readable description
- enabled (boolean): Whether corpus should be active (default: true)
- lexicon (string[]): NEW in v0.0.7 - Optional word list specific to this corpus

// Add medical terminology corpus with medical lexicon
predictor.addTrainingCorpus('medical', medicalText, {
  description: 'Medical terminology and phrases',
  lexicon: medicalWords
});

// Add French corpus with French lexicon (multilingual support)
predictor.addTrainingCorpus('french', frenchText, {
  description: 'French language corpus',
  lexicon: frenchWords
});

`useCorpora(corpusKeys)`

Enable specific training corpora for predictions. Disables all other corpora.

Parameters:

corpusKeys (string | string[]): Single corpus key or array of corpus keys

// Use only medical corpus
predictor.useCorpora('medical');

// Use medical and personal corpora
predictor.useCorpora(['medical', 'personal']);

`useAllCorpora()`

Enable all loaded training corpora for predictions.

predictor.useAllCorpora();

`getCorpora(onlyEnabled)`

Get list of available corpus keys.

Parameters:

onlyEnabled (boolean, optional): If true, only return enabled corpora

Returns: Array of corpus keys (strings)

const allCorpora = predictor.getCorpora();
// ['default', 'medical', 'personal']

const activeCorpora = predictor.getCorpora(true);
// ['medical', 'personal']

`getCorpusInfo(corpusKey)`

Get information about a specific corpus.

Parameters:

corpusKey (string): Corpus identifier

Returns: Object with corpus information

const info = predictor.getCorpusInfo('medical');
// {
//   key: 'medical',
//   description: 'Medical terminology',
//   enabled: true
// }

`removeCorpus(corpusKey)`

Remove a training corpus. Cannot remove the 'default' corpus.

Parameters:

corpusKey (string): Corpus identifier to remove

predictor.removeCorpus('old_vocabulary');

`addToContext(text)`

Add text to the prediction context.

Parameters:

text (string): Text to add to context

predictor.addToContext('The quick brown');

`resetContext()`

Reset the prediction context to empty.

predictor.resetContext();

`predictNextCharacter(maxPredictions)`

Predict the next character based on current context.

Parameters:

maxPredictions (number, optional): Maximum predictions to return

Returns: Array of predictions with text and probability

predictor.addToContext('The qui');
const predictions = predictor.predictNextCharacter();
// [{ text: 'c', probability: 0.85 }, ...]

`predictWordCompletion(partialWord, precedingContext, maxPredictions)`

Predict word completions based on partial word.

Parameters:

partialWord (string): Partial word to complete
precedingContext (string, optional): Context before the word
maxPredictions (number, optional): Maximum predictions to return

Returns: Array of predictions with text and probability

const predictions = predictor.predictWordCompletion('hel');
// [{ text: 'hello', probability: 0.45 }, ...]

`predictNextWord(currentWord, maxPredictions)`

Predict next word based on learned bigram frequencies.

Parameters:

currentWord (string): The current/last word typed
maxPredictions (number, optional): Maximum predictions to return (default: 10)

Returns: Array of predictions with text and probability

const predictions = predictor.predictNextWord('quick');
// [{ text: 'brown', probability: 1.0 }]

`exportBigrams()`

Export learned bigrams as text for saving/persistence.

Returns: String with bigrams in format "word1 word2 count" (one per line)

const bigramText = predictor.exportBigrams();
// "quick brown 5\nbrown fox 5\n..."

// Save to file (Node.js)
fs.writeFileSync('bigrams.txt', bigramText);

// Save to localStorage (browser)
localStorage.setItem('bigrams', bigramText);

`importBigrams(bigramText)`

Import bigrams from text. Adds to existing bigrams rather than replacing.

Parameters:

bigramText (string): Bigrams in text format

// Load from file (Node.js)
const bigramText = fs.readFileSync('bigrams.txt', 'utf-8');
predictor.importBigrams(bigramText);

// Load from localStorage (browser)
const saved = localStorage.getItem('bigrams');
if (saved) {
  predictor.importBigrams(saved);
}
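
If you need to inspect or pre-process saved bigrams outside the predictor, the "word1 word2 count" line format is simple to handle by hand. The sketch below is an illustration only: `parseBigrams` and `serializeBigrams` are hypothetical helpers, not part of the library, and skipping malformed lines is an assumed policy. Passing an existing map mimics the additive behaviour described for importBigrams().

```javascript
// Parse "word1 word2 count" lines into a Map keyed by "word1 word2".
// Counts are added to any existing entry rather than replacing it.
function parseBigrams(bigramText, counts = new Map()) {
  for (const line of bigramText.split('\n')) {
    const parts = line.trim().split(/\s+/);
    if (parts.length !== 3) continue; // skip blank or malformed lines
    const [first, second, rawCount] = parts;
    const count = Number.parseInt(rawCount, 10);
    if (Number.isNaN(count)) continue;
    const key = `${first} ${second}`;
    counts.set(key, (counts.get(key) || 0) + count);
  }
  return counts;
}

// Write the Map back out in the same one-entry-per-line format.
function serializeBigrams(counts) {
  return [...counts.entries()]
    .map(([pair, count]) => `${pair} ${count}`)
    .join('\n');
}

const bigramCounts = parseBigrams('quick brown 5\nbrown fox 5\n');
parseBigrams('quick brown 2', bigramCounts); // merges into the same map
console.log(serializeBigrams(bigramCounts));
// quick brown 7
// brown fox 5
```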

`clearBigrams()`

Clear all learned bigrams.

predictor.clearBigrams();

`getBigramStats()`

Get statistics about learned bigrams.

Returns: Object with uniqueBigrams and totalBigrams

const stats = predictor.getBigramStats();
console.log(`Learned ${stats.uniqueBigrams} unique word pairs`);
console.log(`Total occurrences: ${stats.totalBigrams}`);

`updateConfig(newConfig)`

Update predictor configuration at runtime.

Parameters:

newConfig (object): Configuration options to update

predictor.updateConfig({
  errorTolerant: true,
  maxEditDistance: 3,
  lexicon: newWordList
});

Configuration Guide

Strict Mode vs Error-Tolerant Mode

Strict Mode (default):

Exact prefix matching only
Fast and predictable
Best for: Clean input, autocomplete

const predictor = createStrictPredictor({
  lexicon: words
});

Error-Tolerant Mode:

Fuzzy matching with edit distance
Handles typos and misspellings
Best for: Noisy input, AAC, accessibility

const predictor = createErrorTolerantPredictor({
  lexicon: words,
  maxEditDistance: 2,
  minSimilarity: 0.6
});

Tolerance Levels

Adjust tolerance based on your use case:

// Strict - only minor typos
const strict = createPredictor({
  errorTolerant: true,
  maxEditDistance: 1,
  minSimilarity: 0.8
});

// Moderate - common typos (recommended)
const moderate = createPredictor({
  errorTolerant: true,
  maxEditDistance: 2,
  minSimilarity: 0.6
});

// Lenient - significant errors
const lenient = createPredictor({
  errorTolerant: true,
  maxEditDistance: 3,
  minSimilarity: 0.4
});

Keyboard-Aware Mode

Enable for better handling of keyboard proximity errors:

const predictor = createErrorTolerantPredictor({
  keyboardAware: true,  // 'h' and 'j' are adjacent, lower cost
  keyboardAdjacencyMap: {  // Optional: override QWERTY layout
    a: ['q', 's', 'z'],
    b: ['v', 'g', 'h', 'n'],
    // ... define adjacency for all keys
  }
});

Benefits:

"jello" → "hello" scores better than "jello" → "cello" (even though both are 1 edit)
Physical proximity matters: 'h' and 'j' are adjacent, so substituting one for the other has a lower error cost
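
One way to picture keyboard-aware scoring is an edit distance in which substituting an adjacent key costs less than substituting a distant one. The sketch below is an illustration, not the library's algorithm: the function name `keyboardDistance` is hypothetical, and the adjacent-key cost of 0.5 is an arbitrary assumption.

```javascript
// Edit distance where substituting a neighbouring key is cheaper.
// adjacency: { key: [neighbouring keys] }; adjacentCost is an assumed weight.
function keyboardDistance(a, b, adjacency, adjacentCost = 0.5) {
  const isAdjacent = (x, y) =>
    (adjacency[x] || []).includes(y) || (adjacency[y] || []).includes(x);

  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const x = a[i - 1];
      const y = b[j - 1];
      const substitution = x === y ? 0 : isAdjacent(x, y) ? adjacentCost : 1;
      curr[j] = Math.min(
        prev[j] + 1,                // deletion
        curr[j - 1] + 1,            // insertion
        prev[j - 1] + substitution  // match, near-miss, or full substitution
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

const adjacency = {
  h: ['g', 'y', 'u', 'j', 'b', 'n'],
  j: ['h', 'u', 'i', 'k', 'n', 'm']
};

const nearMiss = keyboardDistance('jello', 'hello', adjacency);
const farMiss = keyboardDistance('jello', 'cello', adjacency);
console.log(nearMiss, farMiss); // 0.5 1
```

With an empty adjacency map the function reduces to ordinary Levenshtein distance, so the keyboard layout only changes the ranking among candidates that plain edit distance would treat as ties.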

Adaptive Mode

Let the model learn from user input in real-time:

const predictor = createPredictor({
  adaptive: true  // Model updates as user types
});

// As user types, the model learns their patterns
predictor.addToContext('hello world');
// The model has now seen "hello" followed by "world"

Use cases:

Personalized prediction
Learning user's writing style
Adapting to domain-specific vocabulary

Examples

The library includes several examples:

Run Examples

# Basic character prediction
npm run example:basic

# Error-tolerant prediction with typos
npm run example:error-tolerant

# Word completion with lexicon
npm run example:word-completion

# Training from files
npm run example:train-from-file

# Bigram tracking
npm run example:bigram-tracking

Example Files

examples/basic-prediction.js - Character prediction basics
examples/error-tolerant.js - Handling typos and noisy input
examples/word-completion.js - Word completion with lexicon
examples/train-from-file.js - Loading training data from files
examples/bigram-tracking.js - Next-word prediction with bigrams

Use Cases

AAC (Augmentative and Alternative Communication)

Perfect for users with motor impairments who may have difficulty with precise typing:

const { createErrorTolerantPredictor } = require('@willwade/ppmpredictor');

const predictor = createErrorTolerantPredictor({
  lexicon: aacVocabulary,
  keyboardAware: true,      // Handle proximity errors
  maxEditDistance: 2,       // Allow typos
  adaptive: true,           // Learn user's patterns
  maxPredictions: 5         // Show top 5 suggestions
});

// Train on user's common phrases
predictor.train('I want to go to the park. I need help. Thank you.');

// Predict with error tolerance
const predictions = predictor.predictWordCompletion('hlp');
// Returns: [{ text: 'help', probability: 0.85 }, ...]

Text Input Enhancement

Improve any text input with intelligent prediction:

const predictor = createPredictor({
  adaptive: true,
  maxPredictions: 5
});

// Train on user's writing style
predictor.train(userHistoricalText);

// Provide real-time predictions
inputField.addEventListener('input', (e) => {
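  const text = e.target.value;
  // The field holds the full text, so reset first to avoid adding it to the context twice
  predictor.resetContext();
  predictor.addToContext(text);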

  // Character prediction
  const charPredictions = predictor.predictNextCharacter();

  // Word completion
  const lastWord = text.split(/\s+/).pop();
  const wordPredictions = predictor.predictWordCompletion(lastWord);

  showSuggestions(charPredictions, wordPredictions);
});

Multilingual Communication

Support users who communicate in multiple languages:

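const { createPredictor } = require('@willwade/ppmpredictor');
const { loadFrequencyList } = require('worldalphabets');

// Run inside an async function or an ES module, since these calls use await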
const englishWords = (await loadFrequencyList('en')).tokens.slice(0, 5000);
const spanishWords = (await loadFrequencyList('es')).tokens.slice(0, 5000);

const predictor = createPredictor({ lexicon: englishWords });

predictor.addTrainingCorpus('spanish', spanishText, {
  lexicon: spanishWords
});

// User can switch languages
languageSelector.addEventListener('change', (e) => {
  predictor.useCorpora([e.target.value]);
});

Medical/Professional Terminology

Domain-specific vocabulary for specialized users:

const medicalWords = [
  'acetaminophen', 'ibuprofen', 'prescription',
  'diagnosis', 'symptoms', 'treatment'
];

const predictor = createPredictor({
  lexicon: generalWords
});

predictor.addTrainingCorpus('medical', medicalText, {
  description: 'Medical terminology',
  lexicon: medicalWords
});

// At doctor's office
predictor.useCorpora(['medical', 'default']);

Performance Considerations

Memory: PPM model size grows with training data
- ~1-5 MB for typical AAC vocabulary
- Scales linearly with training text size
Speed: Character prediction is very fast (< 1ms)
Training: One-time cost, can be done at initialization
Lexicon: Larger lexicons increase word completion time
- 1,000 words: < 5ms
- 10,000 words: < 20ms
- 50,000 words: < 100ms

Optimization Tips

Limit lexicon size to relevant words (5,000-10,000 is usually sufficient)
Use appropriate maxOrder (5 is usually sufficient, higher = more memory)
Train once at initialization, not per-prediction
Cache predictions for repeated queries
Use corpora to separate vocabularies instead of one huge lexicon
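
As an illustration of the caching tip, here is a small memoizing wrapper. It is a sketch rather than a library feature: `withCache` and `slowComplete` are hypothetical names, the stand-in prediction function replaces a real predictor so the example runs on its own, and the eviction policy (drop the oldest entry) is an assumption.

```javascript
// Wrap any prediction function so repeated calls with the same arguments
// reuse the earlier result. Oldest entries are evicted once the cache is full.
function withCache(predictFn, maxEntries = 500) {
  const cache = new Map();
  const cached = (...args) => {
    const key = JSON.stringify(args);
    if (cache.has(key)) return cache.get(key);
    const result = predictFn(...args);
    if (cache.size >= maxEntries) cache.delete(cache.keys().next().value);
    cache.set(key, result);
    return result;
  };
  cached.clear = () => cache.clear(); // call after training or config changes
  return cached;
}

// Stand-in for an expensive call such as a word-completion lookup.
let calls = 0;
const slowComplete = prefix => {
  calls++;
  return ['hello', 'help', 'hero'].filter(word => word.startsWith(prefix));
};

const cachedComplete = withCache(slowComplete);
const first = cachedComplete('hel');
const second = cachedComplete('hel'); // served from the cache
console.log(first, calls); // ['hello', 'help'] 1
```

With a real predictor you would wrap a bound method, for example `withCache(predictor.predictWordCompletion.bind(predictor))`. Cached results go stale whenever the model changes, so clear the cache after calling train() and avoid caching entirely when adaptive mode is on.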

Testing

# Run all tests
npm test

# Run with coverage
npm run test:coverage

All 46 tests passing! ✓

License

Apache License 2.0 - see LICENSE file for details.

Credits

PPM Implementation: Based on Google Research's JavaScript PPM implementation
Original Research: Cleary & Witten (1984), Dasher project (Cambridge)
WorldAlphabets Integration: Frequency lists and keyboard layouts for 100+ languages
Author: Will Wade

Contributing

Contributions welcome! Please open an issue or PR on GitHub.

Development

# Install dependencies
npm install

# Run tests
npm test

# Build
npm run build

# Run demo locally
npm run dev

Related Projects

Predictionary - Dictionary-based prediction library from the Asterics project. Note that this library differs in doing character-level prediction and fuzzy matching based on keyboard layouts.
Google JSLM - Original JS language model code by Google team
pylm - Python PPM implementation
Dasher - Original AAC application using PPM
WorldAlphabets - Frequency lists and keyboard layouts for 100+ languages
