@aid-on/fuzztok
v1.0.0
Fast and lightweight fuzzy token estimation library with CJK support
Features
- 🚀 High Performance: Optimized for speed and low memory usage
- 🌏 CJK Support: Advanced support for Chinese, Japanese, and Korean text
- 🔧 Flexible Architecture: Dependency injection pattern for model configurations
- 📊 Detailed Analysis: Character type breakdown and composition analysis
- ⚡ Batch Processing: Support for batch estimation and streaming text
- 💰 Cost Calculation: Built-in token-to-cost conversion utilities
- 🐛 Debug Tools: Visualization tools for estimation breakdown
Installation
npm install @aid-on/fuzztok
Quick Start
import { createSimpleFuzzyEstimator } from '@aid-on/fuzztok';
// Configure models
const modelConfigs = {
  'gpt-3.5-turbo': {
    charsPerToken: 4,
    overhead: 10,
    cjkTokensPerChar: 1.2,
    mixedTextMultiplier: 1.05,
    numberTokensPerChar: 3.5,
    symbolTokensPerChar: 2.5,
    whitespaceHandling: 'compress'
  }
};
// Create estimator
const estimator = createSimpleFuzzyEstimator(modelConfigs, 'gpt-3.5-turbo');
// Simple estimation
const tokens = estimator.estimate('Hello, world! こんにちは!');
console.log(`Estimated tokens: ${tokens}`);
// Detailed estimation
const detailed = estimator.estimateDetailed('Hello, world! こんにちは!');
console.log(detailed);
API Reference
Core Classes
FuzzyTokenEstimator
Main estimation engine with dependency injection for model configurations.
constructor(
  modelProvider: ModelConfigProvider,
  options?: {
    fallbackConfig?: FuzzyModelConfig;
    defaultModel?: string;
  }
)
Methods:
- estimate(text: string, modelName?: string): number - Simple token count
- estimateDetailed(text: string, modelName?: string): EstimationResult - Detailed analysis
- estimatePayload(payload: TextPayload): number - Estimate from text payload
- estimateBatch(texts: string[], modelName?: string): EstimationResult[] - Batch processing
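To give a feel for how configuration fields such as `charsPerToken`, `cjkTokensPerChar`, `mixedTextMultiplier`, and `overhead` could feed into an estimate, here is a minimal sketch. It is an assumption-laden illustration, not the formula used by `@aid-on/fuzztok`: the function name `sketchEstimate` is hypothetical, CJK detection relies on Unicode script properties as a stand-in for the library's classifier, and the number, symbol, and whitespace settings are ignored.

```javascript
// Illustrative only: NOT the formula used by @aid-on/fuzztok.
// Unicode script properties stand in for the library's CJK detection.
const CJK_CHAR = /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Hangul}]/u;

function sketchEstimate(text, config) {
  let cjkChars = 0;
  let otherChars = 0;
  // for...of walks the string by code point, so astral characters count once.
  for (const ch of text) {
    if (CJK_CHAR.test(ch)) cjkChars += 1;
    else otherChars += 1;
  }
  // Non-CJK text: roughly one token per `charsPerToken` characters.
  let tokens = otherChars / config.charsPerToken;
  // CJK text: `cjkTokensPerChar` tokens for every character.
  tokens += cjkChars * config.cjkTokensPerChar;
  // Assumed rule: apply the adjustment only when both kinds of text appear.
  if (cjkChars > 0 && otherChars > 0) tokens *= config.mixedTextMultiplier;
  // Add the fixed per-request overhead and round up to whole tokens.
  return Math.ceil(tokens + config.overhead);
}

const sketchConfig = {
  charsPerToken: 4,
  overhead: 10,
  cjkTokensPerChar: 1.2,
  mixedTextMultiplier: 1.05,
};
console.log(sketchEstimate('abcdefgh', sketchConfig)); // 12
```

Rounding up keeps the sketch on the conservative side, which tends to be the safer direction when an estimate guards a context-window or budget limit.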
CharacterClassifier
Utility for character type detection and text analysis.
// Static methods
CharacterClassifier.isCJKCharacter(char: string): boolean
CharacterClassifier.getCharacterType(char: string): CharacterType
CharacterClassifier.analyzeTextComposition(text: string): TextComposition
Configuration
FuzzyModelConfig
interface FuzzyModelConfig extends BaseTokenConfig {
  cjkTokensPerChar: number; // Tokens per CJK character
  mixedTextMultiplier: number; // Mixed text adjustment factor
  numberTokensPerChar?: number; // Number tokenization rate
  symbolTokensPerChar?: number; // Symbol tokenization rate
  whitespaceHandling?: 'ignore' | 'count' | 'compress';
}
Factory Functions
// Using ModelConfigProvider
createFuzzyEstimator(
  modelProvider: ModelConfigProvider,
  options?: ConfigOptions
): FuzzyTokenEstimator
// Using simple config object
createSimpleFuzzyEstimator(
  modelConfigs: Record<string, FuzzyModelConfig>,
  defaultModel?: string
): FuzzyTokenEstimator
Advanced Usage
Custom Model Provider
import { FuzzyTokenEstimator } from '@aid-on/fuzztok';
class CustomModelProvider {
  getConfig(modelName) {
    // Fetch from database, API, etc.
    return {
      charsPerToken: 4,
      overhead: 10,
      cjkTokensPerChar: 1.2,
      mixedTextMultiplier: 1.05
    };
  }
  getSupportedModels() {
    return ['custom-model-1', 'custom-model-2'];
  }
}
const estimator = new FuzzyTokenEstimator(new CustomModelProvider());
Cost Calculation
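Prices in this section are expressed per 1K tokens. As a rough, library-independent sketch of that arithmetic (the helper name `estimateCostUsd`, its return shape, and the four-decimal formatting are assumptions made for illustration, not part of `@aid-on/fuzztok`):

```javascript
// Hypothetical helper for illustration; not part of @aid-on/fuzztok.
// `rates` holds USD prices per 1,000 tokens, e.g. { input: 0.0015, output: 0.002 }.
function estimateCostUsd(rates, inputTokens, outputTokens) {
  const inputCost = (inputTokens / 1000) * rates.input;
  const outputCost = (outputTokens / 1000) * rates.output;
  const total = inputCost + outputCost;
  return {
    inputCost,
    outputCost,
    total,
    // Four decimals is an arbitrary choice so that sub-cent totals stay visible.
    formatted: `$${total.toFixed(4)}`,
  };
}

const exampleCost = estimateCostUsd({ input: 0.0015, output: 0.002 }, 1000, 500);
console.log(exampleCost.formatted); // "$0.0025"
```

The package's own `TokenCostCalculator` wraps this kind of calculation behind a cost provider: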
import { TokenCostCalculator } from '@aid-on/fuzztok';
class MyCostProvider {
  getCost(model) {
    return { input: 0.0015, output: 0.002 }; // USD per 1K tokens
  }
}
const calculator = new TokenCostCalculator(new MyCostProvider());
// 1000 input tokens and 500 output tokens
const cost = calculator.calculate('gpt-3.5-turbo', 1000, 500);
console.log(cost.formattedTotal); // formatted total; at these per-1K rates: 0.0015 + 0.0010 = 0.0025 USD
Streaming Support
async function* textStream() {
  yield "Hello ";
  yield "world ";
  yield "こんにちは!";
}
for await (const result of estimator.estimateStream(textStream())) {
  console.log(`Chunk: ${result.chunk}, Tokens: ${result.tokens}, Total: ${result.total}`);
}
CJK Support
This library provides comprehensive support for CJK text:
- Chinese: Simplified and Traditional Chinese characters
- Japanese: Hiragana, Katakana, and Kanji
- Korean: Hangul syllables and compatibility characters
- Extended Unicode: CJK Extension A-G, compatibility forms, and more
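The script families listed above can be told apart with Unicode property escapes. The sketch below is a stand-alone illustration of that idea, not the library's `CharacterClassifier`: the names `scriptOf` and `countByScript` are hypothetical, and the library may use explicit code point ranges that differ from the Unicode `Script` property at the edges.

```javascript
// Illustrative only; not the implementation used by @aid-on/fuzztok.
// Each entry pairs a label with a Unicode script property test.
const SCRIPT_PATTERNS = [
  ['han', /\p{Script=Han}/u],           // Chinese characters and Japanese Kanji
  ['hiragana', /\p{Script=Hiragana}/u],
  ['katakana', /\p{Script=Katakana}/u],
  ['hangul', /\p{Script=Hangul}/u],
];

// Return the label of the first matching script, or 'other'.
function scriptOf(char) {
  for (const [name, pattern] of SCRIPT_PATTERNS) {
    if (pattern.test(char)) return name;
  }
  return 'other';
}

// Count characters per script. for...of iterates by code point, so
// characters outside the Basic Multilingual Plane (such as CJK
// Extension B) are not split into surrogate halves.
function countByScript(text) {
  const counts = { han: 0, hiragana: 0, katakana: 0, hangul: 0, other: 0 };
  for (const ch of text) counts[scriptOf(ch)] += 1;
  return counts;
}

console.log(countByScript('漢字かなカナ한글ab'));
```

Han characters are shared between Chinese and Japanese, so a script test alone cannot tell the two languages apart; it only separates ideographs from kana and Hangul.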
License
MIT
Contributing
Issues and pull requests are welcome on GitHub.
