glost-frequency
v0.4.0
Word frequency extension for GLOST - generates and formats frequency data
Overview
This package provides frequency analysis for GLOST documents. It separates frequency generation (determining word frequencies) from frequency enhancement (formatting for display).
Philosophy: No Data > Bad Data
This package intentionally does NOT include fallback/heuristic providers. Inaccurate frequency data is worse than no data, especially for language learning applications. You must provide a real frequency provider based on:
- Corpus frequency data (e.g., Thai National Corpus, BCCWJ for Japanese)
- Dictionary frequency rankings
- Other validated language resources
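A corpus-backed provider usually starts from a word list with counts. The following is a minimal sketch, assuming a tab-separated `word<TAB>count` file format. The function name `parseFrequencyList` and the file format are illustrative assumptions and are not part of glost-frequency. Malformed rows are skipped rather than guessed at, in keeping with the philosophy above.

```typescript
// Hypothetical helper, not part of glost-frequency.
// Parses "word<TAB>count" lines into a Map that a corpus-backed provider could use.
export function parseFrequencyList(tsv: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of tsv.split(/\r?\n/)) {
    const trimmed = line.trim();
    // Skip blank lines and comment lines.
    if (trimmed === "" || trimmed.startsWith("#")) continue;
    const [word, rawCount] = trimmed.split("\t");
    const count = Number(rawCount); // NaN when the count column is missing or not numeric
    // Skip malformed or non-positive rows instead of inventing a value.
    if (!word || !Number.isFinite(count) || count <= 0) continue;
    counts.set(word, count);
  }
  return counts;
}
```

The resulting `Map<string, number>` has the same shape as the `corpusData` argument used in the custom provider example later in this README.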
Architecture
- Generator: Populates frequency data using a provider (language-specific)
- Enhancer: Formats frequency data with colors, labels, priorities (language-agnostic)
- Provider Pattern: Pluggable frequency sources (corpus data, word lists)
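The split between generator and enhancer can be pictured with a small, self-contained sketch. The types and functions below (`WordNode`, `Lookup`, `generate`, `enhance`, and the priority values) are simplified stand-ins invented for illustration. They are not the package's actual node shape or implementation.

```typescript
// Simplified stand-ins for illustration only; not the real GLOST types.
type FrequencyLevel = "rare" | "uncommon" | "common" | "very-common";

interface WordNode {
  text: string;
  frequency?: FrequencyLevel;                    // set by the generator stage
  display?: { label: string; priority: number }; // set by the enhancer stage
}

// A synchronous lookup keeps the sketch short; a real provider may be async.
type Lookup = (word: string, language: string) => FrequencyLevel | undefined;

// Generator stage (language-specific): asks the lookup about each word.
function generate(words: WordNode[], language: string, lookup: Lookup): WordNode[] {
  return words.map((w) => {
    const frequency = lookup(w.text, language);
    // No data: leave the word untouched rather than guess.
    return frequency === undefined ? w : { ...w, frequency };
  });
}

// Enhancer stage (language-agnostic): only formats data that is already present.
// The priority numbers are arbitrary example values.
const PRIORITY: Record<FrequencyLevel, number> = {
  "very-common": 1,
  common: 2,
  uncommon: 3,
  rare: 4,
};

function enhance(words: WordNode[]): WordNode[] {
  return words.map((w) =>
    w.frequency === undefined
      ? w
      : { ...w, display: { label: w.frequency, priority: PRIORITY[w.frequency] } }
  );
}
```

Because `enhance` never consults a lookup, it can run on documents whose frequency data came from any source, which is the point of keeping the two stages separate.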
Installation
pnpm add glost-frequency
Usage
With Language-Specific Provider (Required)
import { createFrequencyGeneratorExtension, createFrequencyEnhancerExtension } from "glost-frequency";
import { createThaiFrequencyProvider } from "glost-th/extensions";
// Assumed to be exported by glost-extensions alongside processGLOSTWithExtensions
import { processGLOSTWithExtensionsAsync } from "glost-extensions";
// Create language-specific provider
const thaiProvider = createThaiFrequencyProvider(datasource);
// Create extensions
const generator = createFrequencyGeneratorExtension({
  targetLanguage: "th",
  provider: thaiProvider
});
const enhancer = createFrequencyEnhancerExtension({
  normalize: true
});
// Process
const result = await processGLOSTWithExtensionsAsync(doc, [generator, enhancer]);
Enhancer Only (Data Already Present)
If your documents already have frequency data, you can use just the enhancer:
import { FrequencyEnhancerExtension } from "glost-frequency";
import { processGLOSTWithExtensions } from "glost-extensions";
// Synchronous processing (no generator needed)
const result = processGLOSTWithExtensions(document, [FrequencyEnhancerExtension]);
Frequency Levels
The extension uses four standard frequency levels:
- rare: Infrequently used words
- uncommon: Less common words
- common: Commonly used words
- very-common: Very frequently used words
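In TypeScript, the four levels map naturally onto a string-literal union. The sketch below declares a local `FrequencyLevel` type that mirrors the list above (the package exports its own `FrequencyLevel` type, whose exact definition is not shown in this README) and adds two hypothetical helpers, `isFrequencyLevel` and `compareFrequency`, for validating and sorting levels.

```typescript
// Local mirror of the four documented levels; an assumed shape, for illustration.
type FrequencyLevel = "rare" | "uncommon" | "common" | "very-common";

// Ordered from least to most frequent.
const LEVELS: readonly FrequencyLevel[] = ["rare", "uncommon", "common", "very-common"];

// Type guard: narrows an arbitrary string (e.g. parsed from JSON) to a known level.
function isFrequencyLevel(value: string): value is FrequencyLevel {
  return (LEVELS as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical comparator: sorts the most frequent level first.
function compareFrequency(a: FrequencyLevel, b: FrequencyLevel): number {
  return LEVELS.indexOf(b) - LEVELS.indexOf(a);
}

const sorted = (["rare", "very-common", "uncommon"] as FrequencyLevel[]).sort(compareFrequency);
// sorted is ["very-common", "uncommon", "rare"]
```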
Provider Pattern Benefits
- Data Integrity: Only real corpus data, no guessing
- Language-Specific: Each language can have optimized providers
- Data Source Flexibility: Use different corpora or validated resources
- Composability: Mix and match providers and enhancers
- Testability: Mock providers for testing
- Graceful Degradation: Returns undefined when no data is available
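The testability and graceful-degradation points can be illustrated with a mock provider. This sketch declares a local `FrequencyProvider` interface modeled on the `getFrequency(word, language)` method used in the custom provider example in this README. The real interface may have additional members, and `createMockProvider` is a hypothetical test helper rather than something the package ships.

```typescript
type FrequencyLevel = "rare" | "uncommon" | "common" | "very-common";

// Assumed minimal shape, modeled on getFrequency(word, language) as used in this README.
interface FrequencyProvider {
  getFrequency(word: string, language: string): Promise<FrequencyLevel | undefined>;
}

// Hypothetical test helper: serves fixed answers and records every lookup.
function createMockProvider(fixtures: Record<string, FrequencyLevel>) {
  const calls: string[] = [];
  const provider: FrequencyProvider = {
    async getFrequency(word, language) {
      calls.push(`${language}:${word}`);
      // Unknown words yield undefined rather than a guessed level.
      return Object.prototype.hasOwnProperty.call(fixtures, word) ? fixtures[word] : undefined;
    },
  };
  return { provider, calls };
}
```

In a test, `createMockProvider({ "กิน": "very-common" })` returns a provider that resolves to `"very-common"` for that word and to `undefined` for anything else, while `calls` records which lookups were made.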
API
createFrequencyGeneratorExtension(options)
Creates an extension that populates frequency data.
Options:
- targetLanguage: ISO 639-1 language code
- provider: FrequencyProvider instance
- skipExisting: Skip words that already have frequency data (default: true)
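As a rough illustration of what `skipExisting` implies, the sketch below fills in frequencies for a list of words and leaves already-annotated words untouched unless the flag is turned off. The `Word` type, the `Lookup` type, and the `populate` function are hypothetical stand-ins, not the extension's internals.

```typescript
// Hypothetical stand-ins for illustration; not the extension's internals.
type FrequencyLevel = "rare" | "uncommon" | "common" | "very-common";
interface Word { text: string; frequency?: FrequencyLevel }
type Lookup = (word: string) => FrequencyLevel | undefined;

function populate(words: Word[], lookup: Lookup, skipExisting: boolean = true): Word[] {
  return words.map((w) => {
    // With skipExisting on, words that already carry a level are left alone.
    if (skipExisting && w.frequency !== undefined) return w;
    const found = lookup(w.text);
    // No data from the lookup: keep the word unchanged rather than guess.
    return found === undefined ? w : { ...w, frequency: found };
  });
}

const words: Word[] = [{ text: "กิน", frequency: "common" }, { text: "ไป" }];
const lookup: Lookup = (w) => (w === "กิน" || w === "ไป" ? "very-common" : undefined);

const kept = populate(words, lookup);               // "กิน" stays "common"; "ไป" becomes "very-common"
const overwritten = populate(words, lookup, false); // both become "very-common"
```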
createFrequencyEnhancerExtension(options)
Creates an extension that formats frequency data.
Options:
- normalize: Normalize frequency values (default: true)
- customMapping: Word → frequency mappings
Creating Custom Providers
Implement the FrequencyProvider interface with real corpus data:
import type { FrequencyProvider, FrequencyLevel } from "glost-frequency";
export function createMyFrequencyProvider(corpusData: Map<string, number>): FrequencyProvider {
  return {
    async getFrequency(word, language) {
      const count = corpusData.get(word);
      if (!count) return undefined; // No data? Return undefined, don't guess!
      // Map corpus counts to frequency levels based on your data
      if (count > 10000) return "very-common";
      if (count > 1000) return "common";
      if (count > 100) return "uncommon";
      return "rare";
    }
  };
}
createFrequencyExtension(options)
Convenience function that creates both generator and enhancer.
Returns: [generator, enhancer]
Migration from glost-extensions
Before (v0.1.x):
import { FrequencyExtension, processGLOSTWithExtensions } from "glost-extensions";
processGLOSTWithExtensions(doc, [FrequencyExtension]);
After (v0.2.0+):
import { createFrequencyExtension } from "glost-frequency";
import { createThaiFrequencyProvider } from "glost-th/extensions";
// Assumed to be exported by glost-extensions alongside processGLOSTWithExtensions
import { processGLOSTWithExtensionsAsync } from "glost-extensions";
// Use real corpus data provider
const provider = createThaiFrequencyProvider({
  corpusData: thaiNationalCorpusFrequencies
});
const [generator, enhancer] = createFrequencyExtension({
  targetLanguage: "th",
  provider
});
await processGLOSTWithExtensionsAsync(doc, [generator, enhancer]);
License
MIT
