bitaboom
v2.1.0
A string utilities library for formatting Arabic and English translations.
Bitaboom
Bitaboom is a TypeScript-first string utility toolkit focused on Arabic and bilingual (Arabic ↔ English) publishing workflows. It ships a wide surface of helpers for:
- Arabic script awareness (diacritics, tatweel, Urdu glyphs, punctuation harmonisation)
- Formatting and typography clean-up for scanned/OCR'd manuscripts
- Sanitisation pipelines for removing noise such as references, page numbers, markdown artefacts, or escaped spaces
- Parsing helpers (balanced punctuation, JSON normalisation, page range parsing)
- Transliteration cleanup and salutation normalisation for classical Islamic texts
The project targets ESNext and is built/tested with Bun. All exports are tree-shakeable and documented with JSDoc.
Quick start
```bash
npm install bitaboom
# or
yarn add bitaboom
# or
pnpm add bitaboom
# or
bun add bitaboom
```
```ts
import { makeDiacriticInsensitiveRegex, removeMarkdownFormatting } from 'bitaboom';

// Match Arabic text regardless of diacritics and common letter variants.
const rx = makeDiacriticInsensitiveRegex('أنا إلى الآفاق');
rx.test('انا الي الافاق'); // true

// Strip markdown syntax and keep only the visible text.
const plain = removeMarkdownFormatting('**Bold** _italic_ [link](https://example.com)');
console.log(plain); // "Bold italic link"
```
Feature highlights
- Arabic-first matching – build diacritic-insensitive regular expressions, collapse tatweel, score Arabic content density, and replace Urdu glyphs.
- Rich typography normalisers – more than 30 helpers to fix punctuation spacing, quotes, brackets, ellipses, smart quotes, uppercase detection, and whitespace quirks.
- Sanitisation pipelines – strip references, URLs, part markers, markdown decorations, escaped spaces, or numbers in bilingual text.
- Parsing helpers – validate JSON-ish blobs, split search queries by quotes, balance parentheses/quotes, and expand page range strings.
- Transliteration polish – normalise common Arabic prefixes (al-, wa-, bi-), dedupe apostrophes, replace salutations with ﷺ, and extract initials from transliterated names.
- Bun-native toolchain – tests run through bun test, and builds use an in-repo tsdown pipeline powered by bun build + tsc for declarations.
API overview
All modules are exported from src/index.ts. Functions are grouped below by feature area.
Arabic helpers (src/arabic.ts)
| Function | Description |
| --- | --- |
| arabicNumeralToNumber | Convert Arabic-Indic numerals (٠-٩) embedded in a string into a JavaScript number. |
| cleanExtremeArabicUnderscores | Remove decorative tatweel/underscores at line edges without touching Hijri date suffixes. |
| convertUrduSymbolsToArabic | Map Urdu variants such as ھ → ه and ی → ي. |
| getArabicScore | Return the ratio of Arabic letters to total non-space, non-digit characters (a value from 0 to 1). |
| fixTrailingWow | Collapse stray "و" separators in greetings (e.g. عليكم و رحمة → عليكم ورحمة). |
| addSpaceBetweenArabicTextAndNumbers | Insert a space between Arabic text segments and following numbers. |
| removeNonIndexSignatures | Drop single-digit indices and dangling dashes surrounded by Arabic text. |
| removeSingularCodes | Strip single Arabic letters or digits enclosed in (), [], or «». |
| removeSolitaryArabicLetters | Remove isolated Arabic letters (excluding Hijri "ه"). |
| replaceEnglishPunctuationWithArabic | Replace ASCII ? and ; with Arabic equivalents (؟, ؛) and normalise commas. |
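
To make two of the rows above concrete, here is a minimal sketch of how numeral conversion and Arabic-density scoring could work. It is not bitaboom's source: the function names ending in `Sketch` are hypothetical, and details such as ignoring non-digit characters or counting only the range U+0621 to U+064A as Arabic letters are assumptions.

```typescript
// Illustrative sketch only; NOT bitaboom's implementation.
// Names ending in "Sketch" are hypothetical stand-ins for the library functions.

/**
 * Collect Arabic-Indic digits (U+0660..U+0669) found anywhere in `text`
 * and read them as one base-10 number. Other characters are ignored (assumption).
 */
function arabicNumeralToNumberSketch(text: string): number {
    let digits = '';
    for (let i = 0; i < text.length; i++) {
        const code = text.charCodeAt(i);
        if (code >= 0x0660 && code <= 0x0669) {
            digits += String(code - 0x0660);
        }
    }
    return digits.length > 0 ? Number.parseInt(digits, 10) : Number.NaN;
}

/**
 * Ratio of Arabic letters to all characters that are neither whitespace nor digits.
 * Returns 0 for input with no countable characters.
 */
function arabicScoreSketch(text: string): number {
    // Drop whitespace plus ASCII and Arabic-Indic digits before counting.
    const significant = text.replace(/[\s0-9\u0660-\u0669]/g, '');
    if (significant.length === 0) {
        return 0;
    }
    const arabicLetters = significant.match(/[\u0621-\u064A]/g);
    return (arabicLetters ? arabicLetters.length : 0) / significant.length;
}
```

A score like this can serve as a cheap gate for deciding whether a line should go through Arabic-specific clean-up or be treated as Latin-script text.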
Cleaning & tolerant matching (src/cleaning.ts, src/sanitization.ts)
| Function | Description |
| --- | --- |
| escapeRegex | Safely escape special characters for inclusion in regular expression sources. |
| makeDiacriticInsensitiveRegex | Build a RegExp tolerant of Arabic diacritics, tatweel, whitespace variants, and letter equivalences. |
| makeDiacriticInsensitive | Produce a pattern string (no delimiters) for diacritic-insensitive matching of Arabic text. |
| cleanSymbolsAndPartReferences | Remove bracketed part markers, Arabic ornaments, and numeric references. |
| cleanTrailingPageNumbers | Drop -[123]- page markers. |
| replaceLineBreaksWithSpaces | Collapse whitespace and newline runs to single spaces. |
| stripAllDigits | Remove ASCII digits. |
| removeDeathYear | Strip (d. ####H)/[d. ####h] style death-year mentions. |
| removeNumbersAndDashes | Remove digits and dash characters everywhere. |
| removeSingleDigitReferences | Delete single digit markers like (1), [2], «3». |
| removeUrls | Remove http(s) URLs. |
| removeMarkdownFormatting | Drop markdown bold/italic/link/list/header/backtick syntax. |
| truncate | Trim strings to a maximum length with ellipsis (…). |
| truncateMiddle | Preserve start/end segments while truncating the middle with ellipsis. |
| unescapeSpaces | Convert escaped spaces (\ ) back to regular spaces and trim ends. |
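
The idea behind diacritic-insensitive matching can be sketched as follows: strip ignorable marks from the search term, then emit a pattern in which every letter may be followed by any number of such marks and interchangeable letters become character classes. This is an independent illustration, not the library's code; the equivalence table and all names ending in `Sketch` are assumptions.

```typescript
// Illustrative sketch only; NOT bitaboom's implementation.

// Arabic harakat (U+064B..U+0652), superscript alef (U+0670), and tatweel (U+0640).
const IGNORABLE = '\\u064B-\\u0652\\u0670\\u0640';

// Assumed letter equivalences; the library's real table may differ.
const EQUIVALENCE_GROUPS: string[] = [
    '\u0627\u0623\u0625\u0622', // alef variants
    '\u064A\u0649', // ya / alef maqsura
    '\u0629\u0647', // ta marbuta / ha
];

/** Escape characters that have special meaning inside a RegExp source. */
function escapeRegexSketch(text: string): string {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

/** Return the equivalence group containing `ch`, if any. */
function equivalenceGroupFor(ch: string): string | undefined {
    for (const group of EQUIVALENCE_GROUPS) {
        if (group.indexOf(ch) !== -1) {
            return group;
        }
    }
    return undefined;
}

/** Build a RegExp that matches `needle` regardless of diacritics, tatweel, and letter variants. */
function makeDiacriticInsensitiveRegexSketch(needle: string): RegExp {
    // Normalise the needle: drop ignorable marks and collapse whitespace runs.
    const bare = needle
        .replace(new RegExp(`[${IGNORABLE}]`, 'g'), '')
        .replace(/\s+/g, ' ')
        .trim();
    const optionalMarks = `[${IGNORABLE}]*`;
    let source = '';
    for (let i = 0; i < bare.length; i++) {
        const ch = bare.charAt(i);
        if (ch === ' ') {
            source += '\\s+'; // tolerate any whitespace variant between words
            continue;
        }
        const group = equivalenceGroupFor(ch);
        source += (group ? `[${group}]` : escapeRegexSketch(ch)) + optionalMarks;
    }
    return new RegExp(source);
}
```

Because the marks are optional after every letter, the same pattern matches fully vocalised, partially vocalised, and bare text.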
Formatting & typography (src/formatting.ts)
| Function | Description |
| --- | --- |
| insertLineBreaksAfterPunctuation | Add line breaks after ., !, ?, and ؟. |
| addSpaceBeforeAndAfterPunctuation | Normalise spacing around punctuation while respecting quotes and ayah markers. |
| applySmartQuotes | Convert straight quotes to smart quotes and fix opening quotes. |
| cleanLiteralNewLines | Replace literal \n/\r sequences with actual newlines. |
| cleanMultilines | Trim trailing spaces per line. |
| hasWordInSingleLine | Detect whether a line contains a single standalone word. |
| isOnlyPunctuation | Check whether a string consists solely of punctuation/digits. |
| cleanSpacesBeforePeriod | Remove stray spaces before punctuation marks. |
| condenseAsterisks | Collapse multiple * into a single asterisk. |
| condenseColons | Normalise colon clusters like .:. → :. |
| condenseDashes | Reduce consecutive dashes to a single dash. |
| condenseEllipsis | Convert runs of periods to a single ellipsis character. |
| reduceMultilineBreaksToDouble | Limit blank lines to at most two consecutive newlines. |
| reduceMultilineBreaksToSingle | Collapse multiple blank lines to a single newline. |
| condensePeriods | Normalise spaced dot sequences (. . .). |
| condenseUnderscores | Collapse repeated underscores and tatweel runs. |
| doubleToSingleBrackets | Replace doubled parentheses/brackets with single ones. |
| ensureSpaceBeforeBrackets | Guarantee a single space before bracketed notes. |
| ensureSpaceBeforeQuotes | Ensure spacing before Arabic guillemets « ». |
| fixBracketTypos | Repair mismatched bracket pairs (e.g. (« or )3)). |
| fixCurlyBraces | Normalise {} curly brace mismatches. |
| fixMismatchedQuotationMarks | Fix malformed Arabic guillemets and parentheses combos. |
| formatStringBySentence | Reflow paragraphs while keeping numbered footnotes on separate lines. |
| isAllUppercase | Detect text containing only uppercase letters (ignoring non-letters). |
| normalizeSlashInReferences | Remove spaces around the slash in numeric references (e.g. 127 / 11 → 127/11). |
| normalizeSpaces | Collapse spaces/tabs to single spaces. |
| removeRedundantPunctuation | Remove redundant punctuation following Arabic ؟/!. |
| removeSpaceInsideBrackets | Trim internal spaces inside brackets/parentheses. |
| replaceDoubleBracketsWithArrows | Turn ((text)) into «text». |
| stripBoldStyling | Remove bold stylisation by decomposing Unicode. |
| stripItalicsStyling | Replace italic Unicode letters with plain equivalents. |
| stripStyling | Convenience combo of bold + italics stripping. |
| toTitleCase | Convert strings to title case, respecting Unicode letters. |
| trimSpaceInsideQuotes | Remove spaces immediately inside quotes/guillemets. |
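
Most of the helpers in this table are small string-to-string transforms, so they compose naturally into a pipeline. The sketch below shows that pattern with three regex-based stand-ins. The names ending in `Sketch`, the exact regular expressions, and the thresholds (for example, treating two or more periods as an ellipsis) are assumptions, not the library's behaviour.

```typescript
// Illustrative sketch only; NOT bitaboom's implementation.

type Transform = (text: string) => string;

/** Replace runs of two or more periods with a single ellipsis character (threshold is an assumption). */
const condenseEllipsisSketch: Transform = (text) => text.replace(/\.{2,}/g, '\u2026');

/** Remove spaces around a slash that sits between two numbers, e.g. "127 / 11". */
const normalizeSlashSketch: Transform = (text) => text.replace(/(\d+)\s*\/\s*(\d+)/g, '$1/$2');

/** Cap runs of three or more line breaks at exactly two. */
const reduceBreaksToDoubleSketch: Transform = (text) => text.replace(/(\r?\n){3,}/g, '\n\n');

/** Compose transforms left to right into a single function. */
function pipeSketch(...steps: Transform[]): Transform {
    return (text) => steps.reduce((current, step) => step(current), text);
}

const tidySketch = pipeSketch(condenseEllipsisSketch, normalizeSlashSketch, reduceBreaksToDoubleSketch);
```

Order can matter when composing real helpers: a step that condenses periods, for instance, should usually run before one that inserts line breaks after sentence-ending punctuation.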
Parsing helpers (src/parsing.ts)
| Function | Description |
| --- | --- |
| normalizeJsonSyntax | Convert pseudo-JSON with numeric keys/single quotes into valid JSON. |
| isJsonStructureValid | Detect JSON-like key/value blobs that can be normalised. |
| splitByQuotes | Split by spaces while keeping quoted substrings intact. |
| isBalanced | Ensure quotes and brackets are balanced and properly nested. |
| parsePageRanges | Expand range/list strings (1-3,5) into numeric arrays. |
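
As a rough illustration of what page range expansion and balance checking involve, the following sketch implements both from scratch. It does not reproduce bitaboom's code: the names ending in `Sketch`, the choice to throw on malformed tokens, and the set of bracket pairs checked are all assumptions.

```typescript
// Illustrative sketch only; NOT bitaboom's implementation.

/** Expand a string such as "1-3,5" into [1, 2, 3, 5]. Throws on malformed tokens (assumption). */
function parsePageRangesSketch(input: string): number[] {
    const pages: number[] = [];
    for (const part of input.split(',')) {
        const token = part.trim();
        if (token === '') {
            continue; // tolerate stray commas
        }
        const range = /^(\d+)\s*-\s*(\d+)$/.exec(token);
        if (range) {
            const start = Number(range[1]);
            const end = Number(range[2]);
            if (start > end) {
                throw new Error(`Invalid range: ${token}`);
            }
            for (let page = start; page <= end; page++) {
                pages.push(page);
            }
        } else if (/^\d+$/.test(token)) {
            pages.push(Number(token));
        } else {
            throw new Error(`Invalid page token: ${token}`);
        }
    }
    return pages;
}

/** Check that brackets nest correctly and that straight double quotes come in pairs. */
function isBalancedSketch(text: string): boolean {
    const openerFor: Record<string, string> = { ')': '(', ']': '[', '}': '{', '\u00BB': '\u00AB' };
    const openers = '([{\u00AB';
    const stack: string[] = [];
    let quoteCount = 0;
    for (let i = 0; i < text.length; i++) {
        const ch = text.charAt(i);
        if (ch === '"') {
            quoteCount++;
        } else if (openers.indexOf(ch) !== -1) {
            stack.push(ch);
        } else {
            const expected = openerFor[ch];
            // A closer must match the most recently opened bracket.
            if (expected !== undefined && stack.pop() !== expected) {
                return false;
            }
        }
    }
    return stack.length === 0 && quoteCount % 2 === 0;
}
```

The stack makes the check sensitive to nesting order, so `(a [b)]` is rejected even though each bracket type appears an even number of times.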
Transliteration (src/transliteration.ts)
| Function | Description |
| --- | --- |
| normalizeArabicPrefixesToAl | Normalise Arabic definite article prefixes to al-. |
| normalizeDoubleApostrophes | Collapse duplicated Arabic apostrophes (ʿʿ, ʾʾ). |
| replaceSalutationsWithSymbol | Replace salutations like "sallallahu alayhi wasallam" with ﷺ. |
| normalize | Strip diacritics, apostrophes, and dashes from transliterated text. |
| removeArabicPrefixes | Remove prefixes such as al-, wa-, bi-, fī, li-. |
| normalizeTransliteratedEnglish | Combine prefix removal + diacritic stripping. |
| extractInitials | Extract the first letters from up to two words (after normalisation). |
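
The way these helpers build on one another (prefix removal, then diacritic stripping, then initials) can be sketched as below. This is a from-scratch illustration under stated assumptions, not the library's code: the names ending in `Sketch` are hypothetical, only the hyphenated prefixes al-, wa-, bi-, and li- are handled, and diacritics are stripped via Unicode NFD decomposition.

```typescript
// Illustrative sketch only; NOT bitaboom's implementation.

/** Remove a hyphenated prefix (al-, wa-, bi-, li-) at the start of each word. Subset chosen for illustration. */
function removeArabicPrefixesSketch(text: string): string {
    return text.replace(/(^|\s)(?:al|wa|bi|li)-/gi, '$1');
}

/** Strip combining diacritics, transliteration apostrophes (U+02BF, U+02BE, ', U+2019, `), and dashes. */
function normalizeSketch(text: string): string {
    return text
        .normalize('NFD') // split letters such as "ā" into "a" + combining macron
        .replace(/[\u0300-\u036f]/g, '')
        .replace(/[\u02BF\u02BE'\u2019`-]/g, '');
}

/** Prefix removal followed by diacritic stripping. */
function normalizeTransliteratedSketch(text: string): string {
    return normalizeSketch(removeArabicPrefixesSketch(text));
}

/** Upper-cased first letters of the first two words after normalisation. */
function extractInitialsSketch(name: string): string {
    return normalizeTransliteratedSketch(name)
        .split(/\s+/)
        .filter((word) => word.length > 0)
        .slice(0, 2)
        .map((word) => word.charAt(0).toUpperCase())
        .join('');
}
```

Prefixes are removed before dashes are stripped; doing it the other way round would turn `al-Raḥmān` into `alRahman` and leave nothing for the prefix pattern to match.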
Build & development
| Task | Command |
| --- | --- |
| Build library | bun run build (invokes the in-repo scripts/tsdown.ts pipeline, which bundles via bun build then emits declarations through tsc). |
| Run tests | bun test |
| Lint | bun run lint |
| Format | bun run format |
| Continuous lint | bun run lint:ci |
The custom tsdown script ensures reproducible builds without relying on tsup. It cleans the dist/ directory, bundles src/index.ts with Bun's bundler (minified ESM output + sourcemap), and finally emits .d.ts files using tsc --emitDeclarationOnly.
Contributing
- Fork the repository and clone it locally.
- Install Bun (curl -fsSL https://bun.sh/install | bash).
- Run tests with bun test and format with bun run format before opening a pull request.
Issues and PRs are welcome—please include tests whenever you add or change behaviour.
License
MIT © Ragaeeb Haq
