@willwade/ppmpredictor
v0.0.13
Published
Word and letter prediction library with configurable error tolerance using PPM (Prediction by Partial Matching)
Maintainers
Readme
PPM Predictor
A Node.js library for word and letter prediction with configurable error tolerance, built on PPM (Prediction by Partial Matching) language modeling. Original PPM JS code by Google.
🎮 Live Demo
Try the Interactive Demo - Full-featured demo with WorldAlphabets integration
- 24+ languages with real training data
- Adaptive learning with word management
- Keyboard layout selection and visualization
- Fuzzy matching and keyboard-aware typos
- Real-time statistics
Features (aka your regular emoji-filled bullet list)
- 🎯 Character-level prediction using PPM language model
- 📝 Word completion with lexicon support
- 🔧 Error-tolerant mode for handling typos and noisy input
- ⌨️ Keyboard-aware matching for proximity-based error correction
- 🔄 Adaptive learning that updates as users type
- 🎛️ Configurable tolerance levels for different use cases
- 🚀 Zero dependencies - pure JavaScript implementation
- ♿ AAC-focused - designed for assistive technology applications
- 🌍 Multi-language support - works with WorldAlphabets for 100+ languages
- 📚 Per-corpus lexicons - NEW in v0.0.7! Each corpus can have its own vocabulary
Table of Contents
- Installation
- Quick Start
- Usage
- Advanced Usage
- API Reference
- Configuration Guide
- Examples
- Use Cases
- Performance Considerations
- Testing
- License
- Credits
- Contributing
Installation
npm install @willwade/ppmpredictorPlatform Support
- Node.js: Fully supported (v12+)
- Browser: Fully supported (direct usage or bundled)
- Direct:
<script src="dist/ppmpredictor.min.js"></script> - CDN:
<script src="https://unpkg.com/@willwade/ppmpredictor"></script> - Bundled: Works with Webpack, Rollup, etc.
- Direct:
Quick Start
const { createPredictor } = require('@willwade/ppmpredictor');
// Create a predictor
const predictor = createPredictor({
lexicon: ['hello', 'help', 'hero', 'world', 'word']
});
// Train on some text
predictor.train('The quick brown fox jumps over the lazy dog');
// Add context and predict next character
predictor.addToContext('The qui');
const charPredictions = predictor.predictNextCharacter();
console.log(charPredictions);
// [{ text: 'c', probability: 0.85 }, ...]
// Word completion
const wordPredictions = predictor.predictWordCompletion('hel');
console.log(wordPredictions);
// [{ text: 'hello', probability: 0.45 }, { text: 'help', probability: 0.35 }, ...]
// Next word prediction
const nextWord = predictor.predictNextWord('quick');
console.log(nextWord);
// [{ text: 'brown', probability: 1.0 }]Usage
Training & Adaptive Learning
Train the predictor on text to learn character patterns and word sequences:
const { createPredictor } = require('@willwade/ppmpredictor');
const fs = require('fs');
// Option 1: With lexicon (recommended for word completion)
const lexicon = fs.readFileSync('lexicon.txt', 'utf-8')
.split('\n')
.filter(word => word.trim());
const predictor = createPredictor({ lexicon });
// Train from a string
predictor.train('The quick brown fox jumps over the lazy dog');
// Train from a file
const trainingText = fs.readFileSync('training.txt', 'utf-8');
predictor.train(trainingText);
// Option 2: Without lexicon (character-level only)
// Word completion will fall back to character-based prediction (slower)
const charOnlyPredictor = createPredictor();
charOnlyPredictor.train('The quick brown fox');
// Still works, but word completion is less efficient
// Adaptive mode - learns as user types
const adaptivePredictor = createPredictor({
adaptive: true,
lexicon: lexicon // Include lexicon for best results
});
adaptivePredictor.addToContext('hello world');
// Model automatically updates with new patternsHow Training Works: The PPM (Prediction by Partial Matching) model learns character sequences and their probabilities. It also automatically tracks bigrams (word pairs) for next-word prediction. The more text you train on, the better the predictions become.
Lexicon vs No Lexicon:
- With lexicon: Word completion uses fast dictionary lookup (recommended)
- Without lexicon: Word completion falls back to character-level prediction (slower but still works)
Available training files (in parent project's data/ directory):
sample_training_text.txt- General text for trainingsample_conversation.txt- AAC conversation examplesaac_lexicon_en_gb.txt- AAC vocabulary (2,180 words)
See examples/train-from-file.js for complete examples.
Next Character Prediction
Predict the most likely next character based on context:
const predictor = createPredictor();
predictor.train('The quick brown fox');
predictor.addToContext('The qui');
const predictions = predictor.predictNextCharacter();
console.log(predictions);
// [
// { text: 'c', probability: 0.85 },
// { text: 'e', probability: 0.10 },
// { text: 't', probability: 0.05 }
// ]Word Completion
Suggest word completions based on a partial word:
const predictor = createPredictor({
lexicon: ['hello', 'help', 'hero', 'world', 'word', 'work']
});
const completions = predictor.predictWordCompletion('hel');
console.log(completions);
// [
// { text: 'hello', probability: 0.45 },
// { text: 'help', probability: 0.35 },
// { text: 'hero', probability: 0.20 }
// ]Loading lexicons from files:
const fs = require('fs');
// Load lexicon (one word per line)
const lexicon = fs.readFileSync('lexicon.txt', 'utf-8')
.split('\n')
.filter(word => word.trim());
const predictor = createPredictor({ lexicon });Next Word Prediction
Predict the next word based on the previous word (using bigram tracking):
const predictor = createPredictor();
// Train on text - automatically learns word pairs
predictor.train('The quick brown fox. The quick red fox. The quick brown dog.');
// Predict next word after "quick"
const predictions = predictor.predictNextWord('quick');
console.log(predictions);
// [
// { text: 'brown', probability: 0.67 },
// { text: 'red', probability: 0.33 }
// ]How Bigram Tracking Works: Bigrams are automatically learned when you call
train()oraddTrainingCorpus(). Each word pair's frequency is tracked, and predictions are based on relative frequencies. For example, if "quick brown" appears twice and "quick red" appears once, "brown" gets a 67% probability.
Bigram statistics:
const stats = predictor.getBigramStats();
console.log(stats);
// { uniqueBigrams: 150, totalBigrams: 500 }
// Export/import bigrams
const bigramData = predictor.exportBigrams();
fs.writeFileSync('bigrams.json', JSON.stringify(bigramData));
const imported = JSON.parse(fs.readFileSync('bigrams.json', 'utf-8'));
predictor.importBigrams(imported);Error-Tolerant Prediction
Handle typos and noisy input with fuzzy matching:
const { createErrorTolerantPredictor } = require('@willwade/ppmpredictor');
const predictor = createErrorTolerantPredictor({
lexicon: ['hello', 'help', 'world'],
maxEditDistance: 2, // Allow up to 2 character edits
minSimilarity: 0.5 // Require at least 50% similarity
});
// Works even with typos!
const predictions = predictor.predictWordCompletion('helo'); // Missing 'l'
console.log(predictions);
// [
// { text: 'hello', probability: 0.85, distance: 1, similarity: 0.8 },
// { text: 'help', probability: 0.15, distance: 2, similarity: 0.5 }
// ]Keyboard-Aware Matching
Use physical keyboard layout to better handle typos based on key proximity:
// Build a QWERTY adjacency map
const qwertyMap = {
'q': ['w', 'a', 's'],
'w': ['q', 'e', 'a', 's', 'd'],
'e': ['w', 'r', 's', 'd', 'f'],
'r': ['e', 't', 'd', 'f', 'g'],
't': ['r', 'y', 'f', 'g', 'h'],
'y': ['t', 'u', 'g', 'h', 'j'],
'u': ['y', 'i', 'h', 'j', 'k'],
'i': ['u', 'o', 'j', 'k', 'l'],
'o': ['i', 'p', 'k', 'l'],
'p': ['o', 'l'],
'a': ['q', 'w', 's', 'z'],
's': ['a', 'w', 'e', 'd', 'z', 'x'],
'd': ['s', 'e', 'r', 'f', 'x', 'c'],
'f': ['d', 'r', 't', 'g', 'c', 'v'],
'g': ['f', 't', 'y', 'h', 'v', 'b'],
'h': ['g', 'y', 'u', 'j', 'b', 'n'],
'j': ['h', 'u', 'i', 'k', 'n', 'm'],
'k': ['j', 'i', 'o', 'l', 'm'],
'l': ['k', 'o', 'p'],
'z': ['a', 's', 'x'],
'x': ['z', 's', 'd', 'c'],
'c': ['x', 'd', 'f', 'v'],
'v': ['c', 'f', 'g', 'b'],
'b': ['v', 'g', 'h', 'n'],
'n': ['b', 'h', 'j', 'm'],
'm': ['n', 'j', 'k']
};
const predictor = createPredictor({
lexicon: ['hello', 'jello', 'yellow'],
errorTolerant: true,
keyboardAware: true,
keyboardAdjacencyMap: qwertyMap
});
// 'h' and 'j' are adjacent on QWERTY, so 'jello' scores higher
const predictions = predictor.predictWordCompletion('jelo');
// 'jello' gets a better score than 'yellow' because 'j' and 'h' are closeWorldAlphabets integration (100+ keyboard layouts):
const { loadKeyboardLayout } = require('worldalphabets');
// Helper function to build adjacency map from WorldAlphabets layout
function buildAdjacencyMap(layout) {
const adjacencyMap = {};
layout.keys.forEach(key => {
const char = key.legends.base;
if (!char) return;
const adjacent = layout.keys
.filter(otherKey => {
if (!otherKey.legends.base || otherKey.legends.base === char) return false;
const rowDiff = Math.abs(key.row - otherKey.row);
const colDiff = Math.abs(key.col - otherKey.col);
// Adjacent if within 1 row and 1 column
return rowDiff <= 1 && colDiff <= 1;
})
.map(k => k.legends.base);
adjacencyMap[char] = adjacent;
});
return adjacencyMap;
}
// Load French AZERTY layout
const layout = await loadKeyboardLayout('fr-azerty');
const adjacencyMap = buildAdjacencyMap(layout);
const predictor = createPredictor({
lexicon: frenchWords,
errorTolerant: true,
keyboardAware: true,
keyboardAdjacencyMap: adjacencyMap
});See the demo app for a complete WorldAlphabets integration example.
Advanced Usage
Managing Multiple Corpora
New in v0.0.7: Train and manage multiple domain-specific corpora for context-aware predictions.
const { createPredictor } = require('@willwade/ppmpredictor');
const predictor = createPredictor({
lexicon: generalWords // Default corpus
});
// Add domain-specific corpora
predictor.addTrainingCorpus('medical', medicalText, {
description: 'Medical terminology',
lexicon: medicalWords
});
predictor.addTrainingCorpus('work', workText, {
description: 'Work vocabulary',
lexicon: workWords
});
// Switch context based on user's activity
if (userIsAtWork) {
predictor.useCorpora(['work', 'default']);
} else if (userIsAtDoctor) {
predictor.useCorpora(['medical', 'default']);
} else {
predictor.useAllCorpora();
}
// Manage corpora
const allCorpora = predictor.getCorpora();
const info = predictor.getCorpusInfo('medical');
predictor.removeCorpus('old_vocabulary');How Predictions are Merged: When multiple corpora are active, PPMPredictor gets character predictions from each active corpus, averages the probabilities, and returns the top N predictions sorted by averaged probability.
Multilingual Support
New in v0.0.7: Each corpus can have its own lexicon, enabling true multilingual support!
const { createPredictor } = require('@willwade/ppmpredictor');
const { loadFrequencyList } = require('worldalphabets');
// Load frequency lists for different languages
const englishWords = (await loadFrequencyList('en')).tokens.slice(0, 5000);
const frenchWords = (await loadFrequencyList('fr')).tokens.slice(0, 5000);
const spanishWords = (await loadFrequencyList('es')).tokens.slice(0, 5000);
// Create predictor with English as default
const predictor = createPredictor({
lexicon: englishWords
});
// Add French corpus with French lexicon
const frenchText = fs.readFileSync('data/french_training.txt', 'utf-8');
predictor.addTrainingCorpus('french', frenchText, {
description: 'French language corpus',
lexicon: frenchWords // French-specific vocabulary
});
// Add Spanish corpus with Spanish lexicon
const spanishText = fs.readFileSync('data/spanish_training.txt', 'utf-8');
predictor.addTrainingCorpus('spanish', spanishText, {
description: 'Spanish language corpus',
lexicon: spanishWords // Spanish-specific vocabulary
});
// Switch language based on user's selection
if (currentLanguage === 'french') {
predictor.useCorpora(['french']);
// Word completion now uses French lexicon only
} else if (currentLanguage === 'spanish') {
predictor.useCorpora(['spanish']);
// Word completion now uses Spanish lexicon only
} else {
predictor.useCorpora(['default']);
// Word completion uses English lexicon
}
// Or use multiple languages simultaneously (code-switching)
predictor.useCorpora(['french', 'spanish']);
// Word completion merges both French and Spanish lexiconsDomain-Specific Vocabularies
Different contexts require different vocabularies:
const predictor = createPredictor({
lexicon: generalWords // General vocabulary
});
// Medical AAC user
const medicalWords = ['acetaminophen', 'ibuprofen', 'diagnosis', 'prescription'];
predictor.addTrainingCorpus('medical', medicalText, {
description: 'Medical terminology',
lexicon: medicalWords
});
// Professional user
const workWords = ['meeting', 'deadline', 'project', 'presentation'];
predictor.addTrainingCorpus('work', workText, {
description: 'Work-related vocabulary',
lexicon: workWords
});
// Student
const academicWords = ['assignment', 'lecture', 'exam', 'research'];
predictor.addTrainingCorpus('academic', academicText, {
description: 'Academic vocabulary',
lexicon: academicWords
});
// Switch context based on user's activity
if (userIsAtWork) {
predictor.useCorpora(['work', 'default']);
} else if (userIsAtDoctor) {
predictor.useCorpora(['medical', 'default']);
} else if (userIsAtSchool) {
predictor.useCorpora(['academic', 'default']);
}API Reference
Factory Functions
createPredictor(config)
Creates a new predictor instance with the given configuration.
Parameters:
config(Object, optional): Configuration optionsmaxOrder(number): Maximum context length for PPM (default: 5)errorTolerant(boolean): Enable error-tolerant mode (default: false)maxEditDistance(number): Maximum edit distance for fuzzy matching (default: 2)minSimilarity(number): Minimum similarity score 0-1 (default: 0.5)keyboardAware(boolean): Use keyboard-aware distance (default: false)keyboardAdjacencyMap(Object): Custom keyboard adjacency mapcaseSensitive(boolean): Case-sensitive matching (default: false)maxPredictions(number): Maximum predictions to return (default: 10)adaptive(boolean): Update model as text is entered (default: false)lexicon(Array): Optional word list for word prediction (default: [])
Returns: Predictor instance
const predictor = createPredictor({
errorTolerant: true,
maxEditDistance: 2,
keyboardAware: true,
adaptive: true,
lexicon: ['hello', 'world']
});createStrictPredictor(config)
Creates a predictor with strict mode (exact matching only).
const predictor = createStrictPredictor({ lexicon: words });createErrorTolerantPredictor(config)
Creates a predictor with error-tolerant mode enabled.
const predictor = createErrorTolerantPredictor({
lexicon: words,
maxEditDistance: 2
});Predictor Class
train(text)
Train the default corpus on text. For multi-corpus training, use addTrainingCorpus() instead.
Parameters:
text(string): Training text
predictor.train('The quick brown fox jumps over the lazy dog');addTrainingCorpus(corpusKey, text, options)
Add a new training corpus with a unique identifier and optional corpus-specific lexicon.
Parameters:
corpusKey(string): Unique identifier for this corpus (e.g., 'medical', 'personal', 'french')text(string): Training text for this corpusoptions(object, optional):description(string): Human-readable descriptionenabled(boolean): Whether corpus should be active (default: true)lexicon(string[]): NEW in v0.0.7 - Optional word list specific to this corpus
// Add medical terminology corpus with medical lexicon
predictor.addTrainingCorpus('medical', medicalText, {
description: 'Medical terminology and phrases',
lexicon: medicalWords
});
// Add French corpus with French lexicon (multilingual support)
predictor.addTrainingCorpus('french', frenchText, {
description: 'French language corpus',
lexicon: frenchWords
});useCorpora(corpusKeys)
Enable specific training corpora for predictions. Disables all other corpora.
Parameters:
corpusKeys(string | string[]): Single corpus key or array of corpus keys
// Use only medical corpus
predictor.useCorpora('medical');
// Use medical and personal corpora
predictor.useCorpora(['medical', 'personal']);useAllCorpora()
Enable all loaded training corpora for predictions.
predictor.useAllCorpora();getCorpora(onlyEnabled)
Get list of available corpus keys.
Parameters:
onlyEnabled(boolean, optional): If true, only return enabled corpora
Returns: Array of corpus keys (strings)
const allCorpora = predictor.getCorpora();
// ['default', 'medical', 'personal']
const activeCorpora = predictor.getCorpora(true);
// ['medical', 'personal']getCorpusInfo(corpusKey)
Get information about a specific corpus.
Parameters:
corpusKey(string): Corpus identifier
Returns: Object with corpus information
const info = predictor.getCorpusInfo('medical');
// {
// key: 'medical',
// description: 'Medical terminology',
// enabled: true
// }removeCorpus(corpusKey)
Remove a training corpus. Cannot remove the 'default' corpus.
Parameters:
corpusKey(string): Corpus identifier to remove
predictor.removeCorpus('old_vocabulary');addToContext(text)
Add text to the prediction context.
Parameters:
text(string): Text to add to context
predictor.addToContext('The quick brown');resetContext()
Reset the prediction context to empty.
predictor.resetContext();predictNextCharacter(maxPredictions)
Predict the next character based on current context.
Parameters:
maxPredictions(number, optional): Maximum predictions to return
Returns: Array of predictions with text and probability
predictor.addToContext('The qui');
const predictions = predictor.predictNextCharacter();
// [{ text: 'c', probability: 0.85 }, ...]predictWordCompletion(partialWord, precedingContext, maxPredictions)
Predict word completions based on partial word.
Parameters:
partialWord(string): Partial word to completeprecedingContext(string, optional): Context before the wordmaxPredictions(number, optional): Maximum predictions to return
Returns: Array of predictions with text and probability
const predictions = predictor.predictWordCompletion('hel');
// [{ text: 'hello', probability: 0.45 }, ...]predictNextWord(currentWord, maxPredictions)
Predict next word based on learned bigram frequencies.
Parameters:
currentWord(string): The current/last word typedmaxPredictions(number, optional): Maximum predictions to return (default: 10)
Returns: Array of predictions with text and probability
const predictions = predictor.predictNextWord('quick');
// [{ text: 'brown', probability: 1.0 }]exportBigrams()
Export learned bigrams as text for saving/persistence.
Returns: String with bigrams in format "word1 word2 count" (one per line)
const bigramText = predictor.exportBigrams();
// "quick brown 5\nbrown fox 5\n..."
// Save to file (Node.js)
fs.writeFileSync('bigrams.txt', bigramText);
// Save to localStorage (browser)
localStorage.setItem('bigrams', bigramText);importBigrams(bigramText)
Import bigrams from text. Adds to existing bigrams rather than replacing.
Parameters:
bigramText(string): Bigrams in text format
// Load from file (Node.js)
const bigramText = fs.readFileSync('bigrams.txt', 'utf-8');
predictor.importBigrams(bigramText);
// Load from localStorage (browser)
const saved = localStorage.getItem('bigrams');
if (saved) {
predictor.importBigrams(saved);
}clearBigrams()
Clear all learned bigrams.
predictor.clearBigrams();getBigramStats()
Get statistics about learned bigrams.
Returns: Object with uniqueBigrams and totalBigrams
const stats = predictor.getBigramStats();
console.log(`Learned ${stats.uniqueBigrams} unique word pairs`);
console.log(`Total occurrences: ${stats.totalBigrams}`);updateConfig(newConfig)
Update predictor configuration at runtime.
Parameters:
newConfig(object): Configuration options to update
predictor.updateConfig({
errorTolerant: true,
maxEditDistance: 3,
lexicon: newWordList
});Configuration Guide
Strict Mode vs Error-Tolerant Mode
Strict Mode (default):
- Exact prefix matching only
- Fast and predictable
- Best for: Clean input, autocomplete
const predictor = createStrictPredictor({
lexicon: words
});Error-Tolerant Mode:
- Fuzzy matching with edit distance
- Handles typos and misspellings
- Best for: Noisy input, AAC, accessibility
const predictor = createErrorTolerantPredictor({
lexicon: words,
maxEditDistance: 2,
minSimilarity: 0.6
});Tolerance Levels
Adjust tolerance based on your use case:
// Strict - only minor typos
const strict = createPredictor({
errorTolerant: true,
maxEditDistance: 1,
minSimilarity: 0.8
});
// Moderate - common typos (recommended)
const moderate = createPredictor({
errorTolerant: true,
maxEditDistance: 2,
minSimilarity: 0.6
});
// Lenient - significant errors
const lenient = createPredictor({
errorTolerant: true,
maxEditDistance: 3,
minSimilarity: 0.4
});Keyboard-Aware Mode
Enable for better handling of keyboard proximity errors:
const predictor = createErrorTolerantPredictor({
keyboardAware: true, // 'h' and 'j' are adjacent, lower cost
keyboardAdjacencyMap: { // Optional: override QWERTY layout
a: ['q', 's', 'z'],
b: ['v', 'g', 'h', 'n'],
// ... define adjacency for all keys
}
});Benefits:
- "helo" → "hello" scores better than "helo" → "jello" (even though both are 1 edit)
- Physical proximity matters: 'h' and 'j' are adjacent, so lower error cost
Adaptive Mode
Let the model learn from user input in real-time:
const predictor = createPredictor({
adaptive: true // Model updates as user types
});
// As user types, the model learns their patterns
predictor.addToContext('hello world');
// Model now knows "hello" is often followed by "world"Use cases:
- Personalized prediction
- Learning user's writing style
- Adapting to domain-specific vocabulary
Examples
The library includes several examples:
Run Examples
# Basic character prediction
npm run example:basic
# Error-tolerant prediction with typos
npm run example:error-tolerant
# Word completion with lexicon
npm run example:word-completion
# Training from files
npm run example:train-from-file
# Bigram tracking
npm run example:bigram-trackingExample Files
examples/basic-prediction.js- Character prediction basicsexamples/error-tolerant.js- Handling typos and noisy inputexamples/word-completion.js- Word completion with lexiconexamples/train-from-file.js- Loading training data from filesexamples/bigram-tracking.js- Next-word prediction with bigrams
Use Cases
AAC (Augmentative and Alternative Communication)
Perfect for users with motor impairments who may have difficulty with precise typing:
const { createErrorTolerantPredictor } = require('@willwade/ppmpredictor');
const predictor = createErrorTolerantPredictor({
lexicon: aacVocabulary,
keyboardAware: true, // Handle proximity errors
maxEditDistance: 2, // Allow typos
adaptive: true, // Learn user's patterns
maxPredictions: 5 // Show top 5 suggestions
});
// Train on user's common phrases
predictor.train('I want to go to the park. I need help. Thank you.');
// Predict with error tolerance
const predictions = predictor.predictWordCompletion('hlp');
// Returns: [{ text: 'help', probability: 0.85 }, ...]Text Input Enhancement
Improve any text input with intelligent prediction:
const predictor = createPredictor({
adaptive: true,
maxPredictions: 5
});
// Train on user's writing style
predictor.train(userHistoricalText);
// Provide real-time predictions
inputField.addEventListener('input', (e) => {
const text = e.target.value;
predictor.addToContext(text);
// Character prediction
const charPredictions = predictor.predictNextCharacter();
// Word completion
const lastWord = text.split(/\s+/).pop();
const wordPredictions = predictor.predictWordCompletion(lastWord);
showSuggestions(charPredictions, wordPredictions);
});Multilingual Communication
Support users who communicate in multiple languages:
const { loadFrequencyList } = require('worldalphabets');
const englishWords = (await loadFrequencyList('en')).tokens.slice(0, 5000);
const spanishWords = (await loadFrequencyList('es')).tokens.slice(0, 5000);
const predictor = createPredictor({ lexicon: englishWords });
predictor.addTrainingCorpus('spanish', spanishText, {
lexicon: spanishWords
});
// User can switch languages
languageSelector.addEventListener('change', (e) => {
predictor.useCorpora([e.target.value]);
});Medical/Professional Terminology
Domain-specific vocabulary for specialized users:
const medicalWords = [
'acetaminophen', 'ibuprofen', 'prescription',
'diagnosis', 'symptoms', 'treatment'
];
const predictor = createPredictor({
lexicon: generalWords
});
predictor.addTrainingCorpus('medical', medicalText, {
description: 'Medical terminology',
lexicon: medicalWords
});
// At doctor's office
predictor.useCorpora(['medical', 'default']);Performance Considerations
- Memory: PPM model size grows with training data
- ~1-5 MB for typical AAC vocabulary
- Scales linearly with training text size
- Speed: Character prediction is very fast (< 1ms)
- Training: One-time cost, can be done at initialization
- Lexicon: Larger lexicons increase word completion time
- 1,000 words: < 5ms
- 10,000 words: < 20ms
- 50,000 words: < 100ms
Optimization Tips
- Limit lexicon size to relevant words (5,000-10,000 is usually sufficient)
- Use appropriate maxOrder (5 is usually sufficient, higher = more memory)
- Train once at initialization, not per-prediction
- Cache predictions for repeated queries
- Use corpora to separate vocabularies instead of one huge lexicon
Testing
# Run all tests
npm test
# Run with coverage
npm run test:coverageAll 46 tests passing! ✓
License
Apache License 2.0 - see LICENSE file for details.
Credits
- PPM Implementation: Based on Google Research's JavaScript PPM implementation
- Original Research: Cleary & Witten (1984), Dasher project (Cambridge)
- WorldAlphabets Integration: Frequency lists and keyboard layouts for 100+ languages
- Author: Will Wade
Contributing
Contributions welcome! Please open an issue or PR on GitHub.
Development
# Install dependencies
npm install
# Run tests
npm test
# Build
npm run build
# Run demo locally
npm run devLinks
Related Projects
- Predictionary - Dictionary-based prediction library from the Asterics project. Just note ours does character level prediction and with fuzzy matching around keyboard layouts.
- Google JSLM - Original JS language model code by Google team
- pylm - Python PPM implementation
- Dasher - Original AAC application using PPM
- WorldAlphabets - Frequency lists and keyboard layouts for 100+ languages
