@sygnl/text-normalizer
v1.0.0
Universal text normalization utilities for NLP, search, and data processing. Built with TypeScript for type safety and developer experience.
Features
- 🔤 Accent Removal - Remove diacritics and accents (café → cafe)
- 🔧 Punctuation Stripping - Remove or selectively preserve special characters
- 📏 Whitespace Normalization - Collapse and trim whitespace intelligently
- 🌍 Unicode Support - Handle multilingual text correctly
- ⚡ Zero Dependencies - Lightweight and fast
- 📦 Tree-Shakeable - Import only what you need
- 🎯 TypeScript First - Full type definitions and IntelliSense support
- 🧪 Well Tested - Comprehensive test coverage
Installation
npm install @sygnl/text-normalizer
yarn add @sygnl/text-normalizer
pnpm add @sygnl/text-normalizer
Quick Start
import { normalize } from '@sygnl/text-normalizer';
// Basic usage - applies all normalizations
const text = " CAFÉ!! Hello, World! 🌍 ";
const result = normalize(text);
console.log(result); // Output: "cafe hello world"
Usage
Full Normalization (Default)
import { normalize } from '@sygnl/text-normalizer';
normalize("Crème Brûlée @ $12.99!!!");
// Output: "creme brulee 1299"
Selective Normalization
import { normalize } from '@sygnl/text-normalizer';
// Keep uppercase and punctuation
normalize("HELLO, World!", {
lowercase: false,
stripPunctuation: false
});
// Output: "HELLO, World!"
// Preserve specific characters
normalize("[email protected]", {
preserveChars: ['@', '.']
});
// Output: "[email protected]"
Individual Functions
import {
removeAccents,
stripPunctuation,
collapseWhitespace,
trim,
lowercase
} from '@sygnl/text-normalizer';
removeAccents("café résumé"); // "cafe resume"
stripPunctuation("Hello, World!"); // "Hello World"
collapseWhitespace("a    b"); // "a b"
trim(" hello "); // "hello"
lowercase("HELLO"); // "hello"
Detailed Normalization (Debug Mode)
import { normalizeDetailed } from '@sygnl/text-normalizer';
const result = normalizeDetailed(" CAFÉ!! ");
console.log(result);
/*
{
original: " CAFÉ!! ",
normalized: "cafe",
steps: {
afterLowercase: " café!! ",
afterAccentRemoval: " cafe!! ",
afterPunctuationStrip: " cafe ",
afterWhitespaceCollapse: " cafe ",
afterTrim: "cafe"
}
}
*/
Custom Replacements
import { normalize } from '@sygnl/text-normalizer';
normalize("hello world", {
customReplacements: {
'hello': 'hi',
'world': 'there'
}
});
// Output: "hi there"
// Use regex patterns
normalize("test123", {
customReplacements: {
'\\d+': 'NUM'
}
});
// Output: "testnum"
API Reference
normalize(text: string, options?: NormalizationOptions): string
Main normalization function with configurable options.
Parameters:
- text - The input text to normalize
- options - Optional configuration object
Options:
interface NormalizationOptions {
lowercase?: boolean; // Convert to lowercase (default: true)
removeAccents?: boolean; // Remove accents/diacritics (default: true)
stripPunctuation?: boolean; // Remove punctuation (default: true)
collapseWhitespace?: boolean; // Collapse whitespace (default: true)
trim?: boolean; // Trim leading/trailing space (default: true)
customReplacements?: Record<string, string>; // Custom find/replace
preserveChars?: string[]; // Characters to keep when stripping
}
normalizeDetailed(text: string, options?: NormalizationOptions): NormalizationResult
Returns normalization result with detailed step-by-step breakdown.
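To make the step-by-step breakdown concrete, here is a minimal sketch of a pipeline that records the string after each stage. It is not the package's implementation: `normalizeDetailedSketch` and `DetailedSketchResult` are hypothetical names, the stage order is assumed from the step names in the debug example, and the punctuation stage is an ASCII-only simplification.

```typescript
// Hypothetical result shape for this sketch; the package's real type is NormalizationResult.
interface DetailedSketchResult {
  original: string;
  normalized: string;
  steps: Record<string, string>;
}

function normalizeDetailedSketch(text: string): DetailedSketchResult {
  // Each stage is a [label, transform] pair, applied in the order listed.
  const stages: Array<[string, (s: string) => string]> = [
    ["afterLowercase", (s) => s.toLowerCase()],
    // NFD splits accented letters into base letter + combining mark, which is then removed.
    ["afterAccentRemoval", (s) => s.normalize("NFD").replace(/[\u0300-\u036f]/g, "")],
    // Assumption: ASCII letters, digits and whitespace only; a real implementation would be Unicode-aware.
    ["afterPunctuationStrip", (s) => s.replace(/[^a-z0-9\s]/g, "")],
    ["afterWhitespaceCollapse", (s) => s.replace(/\s+/g, " ")],
    ["afterTrim", (s) => s.trim()],
  ];

  const steps: Record<string, string> = {};
  let current = text;
  for (const [label, transform] of stages) {
    current = transform(current);
    steps[label] = current; // snapshot after this stage
  }
  return { original: text, normalized: current, steps };
}

const detailed = normalizeDetailedSketch("  CAFÉ!!  ");
console.log(detailed.normalized); // "cafe"
console.log(detailed.steps);
```

Recording a snapshot per stage makes it easy to see which step removed or altered a given character when the final output is surprising.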
Returns:
interface NormalizationResult {
original: string;
normalized: string;
steps: {
afterLowercase?: string;
afterAccentRemoval?: string;
afterPunctuationStrip?: string;
afterWhitespaceCollapse?: string;
afterTrim?: string;
};
}
Individual Functions
removeAccents(text: string): string
Remove accents and diacritical marks from text.
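For readers curious how accent removal is commonly done in JavaScript, the sketch below uses the standard `String.prototype.normalize("NFD")` decomposition followed by a regex that drops combining marks. `removeAccentsSketch` is a hypothetical name, and whether the package uses this technique internally is an assumption.

```typescript
// Hypothetical stand-in; not the package's actual implementation.
function removeAccentsSketch(text: string): string {
  // NFD splits "é" into "e" + U+0301 (combining acute accent).
  // The regex then removes code points in the Combining Diacritical Marks block.
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

console.log(removeAccentsSketch("café résumé"));      // "cafe resume"
console.log(removeAccentsSketch("Señorita Niño"));    // "Senorita Nino"
console.log(removeAccentsSketch("Übermensch Äpfel")); // "Ubermensch Apfel"
```

Letters that have no canonical decomposition, such as "ø", "ß", or "ł", pass through this approach unchanged, so a production implementation may need an explicit mapping table for them.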
stripPunctuation(text: string, options?: StripOptions): string
Remove punctuation and special characters.
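One way to strip punctuation while preserving selected characters is to keep only letters, numbers, whitespace, and an explicit allow-list. The sketch below illustrates that idea; `stripPunctuationSketch` and its `preserve` parameter are hypothetical, and the package's real behavior may differ.

```typescript
// Hypothetical stand-in; not the package's actual implementation.
function stripPunctuationSketch(text: string, preserve: string[] = []): string {
  const keep = new Set(preserve);
  // \p{L} matches any Unicode letter, \p{N} any Unicode number.
  // The pattern is built with the RegExp constructor so the "u" flag is applied at runtime.
  const allowed = new RegExp("^[\\p{L}\\p{N}\\s]$", "u");
  // Array.from iterates by code point, so emoji are handled as single characters.
  return Array.from(text)
    .filter((ch) => keep.has(ch) || allowed.test(ch))
    .join("");
}

console.log(stripPunctuationSketch("Hello, World!"));               // "Hello World"
console.log(stripPunctuationSketch("user@example.com", ["@", "."])); // "user@example.com"
console.log(stripPunctuationSketch("user@example.com"));            // "userexamplecom"
```

Note that stripping can leave runs of spaces behind (for example where a symbol sat between two spaces), which is why punctuation stripping is usually followed by whitespace collapsing.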
Options:
interface StripOptions {
preserve?: string[]; // Characters to preserve
keepAlphanumeric?: boolean; // Keep letters/numbers (default: true)
}
collapseWhitespace(text: string, options?: WhitespaceOptions): string
Collapse multiple whitespace into single spaces.
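The sketch below shows one plausible reading of how the three documented flags could interact. The option names mirror the `WhitespaceOptions` interface documented for this function, but the exact semantics are an assumption, and `collapseWhitespaceSketch` and `WhitespaceSketchOptions` are hypothetical names.

```typescript
// Hypothetical options type mirroring the documented flag names.
interface WhitespaceSketchOptions {
  replaceTabs?: boolean;
  replaceNewlines?: boolean;
  collapseSpaces?: boolean;
}

// Hypothetical stand-in; not the package's actual implementation.
function collapseWhitespaceSketch(
  text: string,
  options: WhitespaceSketchOptions = {}
): string {
  const { replaceTabs = true, replaceNewlines = true, collapseSpaces = true } = options;
  let out = text;
  if (replaceTabs) out = out.replace(/\t/g, " ");             // tab -> space
  if (replaceNewlines) out = out.replace(/\r\n|\r|\n/g, " "); // any line break -> space
  if (collapseSpaces) out = out.replace(/ {2,}/g, " ");       // runs of spaces -> one space
  return out;
}

console.log(collapseWhitespaceSketch("a \t\n b")); // "a b"
console.log(JSON.stringify(
  collapseWhitespaceSketch("a\n\nb  c", { replaceNewlines: false })
)); // "a\n\nb c"
```

Converting tabs and newlines to spaces before collapsing means a mixed run such as space, tab, newline ends up as a single space.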
Options:
interface WhitespaceOptions {
replaceTabs?: boolean; // Replace tabs with spaces (default: true)
replaceNewlines?: boolean; // Replace newlines with spaces (default: true)
collapseSpaces?: boolean; // Collapse multiple spaces (default: true)
}
trim(text: string): string
Remove leading and trailing whitespace.
lowercase(text: string): string
Convert text to lowercase.
applyReplacements(text: string, replacements: Record<string, string>): string
Apply custom find/replace patterns.
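Since the Custom Replacements example uses keys such as '\\d+', the keys appear to be treated as regular-expression sources. A minimal sketch of that idea follows; `applyReplacementsSketch` is a hypothetical name, and global, case-sensitive matching is an assumption.

```typescript
// Hypothetical stand-in; not the package's actual implementation.
function applyReplacementsSketch(
  text: string,
  replacements: Record<string, string>
): string {
  let out = text;
  // Patterns run one after another in key order, so a later pattern
  // can match text produced by an earlier replacement.
  for (const pattern of Object.keys(replacements)) {
    out = out.replace(new RegExp(pattern, "g"), replacements[pattern]);
  }
  return out;
}

console.log(applyReplacementsSketch("hello world", { hello: "hi", world: "there" })); // "hi there"
console.log(applyReplacementsSketch("test123", { "\\d+": "NUM" }));                   // "testNUM"
```

Run on its own, this sketch returns "testNUM"; the lowercase "testnum" shown for `normalize` suggests that lowercasing happens after replacements in the full pipeline. Because keys are compiled as patterns, literal characters such as "." or "+" need escaping.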
Real-World Examples
E-commerce Product Matching
import { normalize } from '@sygnl/text-normalizer';
const titles = [
"Men's Café Racer Leather Jacket - Black",
"mens cafe racer leather jacket black",
"MEN'S CAFÉ RACER LEATHER JACKET (BLACK)"
];
// All normalize to the same string for comparison
const normalized = titles.map(t => normalize(t));
console.log(normalized[0] === normalized[1]); // true
console.log(normalized[0] === normalized[2]); // true
Search Query Normalization
import { normalize } from '@sygnl/text-normalizer';
const userQueries = [
'crème brûlée recipe',
'Creme Brulee Recipe',
'CRÈME BRÛLÉE RECIPE!!!'
];
const searchTerm = normalize(userQueries[0]);
// Use searchTerm for database query
Email Address Normalization
import { normalize } from '@sygnl/text-normalizer';
const emails = [
' [email protected] ',
'[email protected]',
'[email protected]!!!'
];
const normalized = emails.map(e =>
normalize(e, { preserveChars: ['@', '.'] })
);
// All become: "[email protected]"
Multilingual Text Processing
import { normalize } from '@sygnl/text-normalizer';
const translations = {
french: 'Crème Brûlée',
spanish: 'Señorita Niño',
german: 'Übermensch Äpfel'
};
Object.entries(translations).forEach(([lang, text]) => {
console.log(`${lang}: ${normalize(text)}`);
});
// french: creme brulee
// spanish: senorita nino
// german: ubermensch apfel
TypeScript Support
Full TypeScript definitions are included. Import types as needed:
import type {
NormalizationOptions,
NormalizationResult,
WhitespaceOptions,
StripOptions,
CharacterClass
} from '@sygnl/text-normalizer';
Performance
- Zero dependencies - No external packages required
- Lightweight - < 5KB minified
- Fast - Optimized for performance with minimal allocations
- Tree-shakeable - Import only the functions you need
Browser Support
Works in all modern browsers and Node.js environments:
- ✅ Node.js 14+
- ✅ Chrome, Firefox, Safari, Edge (latest versions)
- ✅ ES2020+ environments
Use Cases
- 🔍 Search & Indexing - Normalize text before indexing for better search results
- 🛒 E-commerce - Match product titles across different stores
- 🌐 i18n - Handle multilingual text consistently
- 📊 Data Cleaning - Prepare text data for analysis
- 🤖 NLP Pipelines - First step in text processing workflows
- 🔐 User Input - Sanitize and standardize user-entered data
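As an illustration of the search and e-commerce use cases, the sketch below groups raw titles under a shared normalized key. `toKey` is a hypothetical, ASCII-oriented stand-in for the package's `normalize`, used only to keep the example self-contained; in practice you would call `normalize` instead.

```typescript
// Stand-in for normalize(); an assumption made for this sketch only.
const toKey = (s: string): string =>
  s
    .toLowerCase()
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "") // drop combining accents
    .replace(/[^a-z0-9\s]/g, "")     // drop punctuation (ASCII-only simplification)
    .replace(/\s+/g, " ")
    .trim();

// Group raw titles by their normalized form.
function groupByNormalized(titles: string[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const title of titles) {
    const key = toKey(title);
    const bucket = groups.get(key);
    if (bucket) {
      bucket.push(title);
    } else {
      groups.set(key, [title]);
    }
  }
  return groups;
}

const groups = groupByNormalized([
  "Men's Café Racer Leather Jacket - Black",
  "mens cafe racer leather jacket black",
  "MEN'S CAFÉ RACER LEATHER JACKET (BLACK)",
  "Women's Wool Coat",
]);
console.log(groups.size); // 2
```

Keying a `Map` by the normalized string keeps the original titles available for display while duplicates collapse into one bucket.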
Contributing
Contributions are welcome! This package is part of the UPID ecosystem.
License
Apache License 2.0 - see LICENSE file for details
Author
Edge Foundry Inc. - 2206
Made with ❤️ for the text processing community
