bn-translit
v0.1.1
Bidirectional Bangla ↔ Banglish transliteration with slugify support. Use Banglish→Bangla for search, Bangla→Banglish for SEO and URL slugs.
Bidirectional Bangla ↔ Banglish transliteration with slug & search helpers. Type "kalam" → match "কালাম". Convert "জম জম পর্দা" → slug "jom-jom-porda".
A small, dependency-free TypeScript library that handles two complementary problems Bangla apps run into:
- Search: users type Banglish (Roman) but data is stored in Bangla. Use toBangla() or searchPattern() to make kalam match কালাম.
- SEO & URL slugs: content is in Bangla but URLs need ASCII. Use toBanglish() or slugify() to turn জম জম পর্দা into jom-jom-porda.
Install
npm install bn-translit
# or
bun add bn-translit
# or
pnpm add bn-translit
Works in Node ≥ 18, Bun, Deno, and modern browsers. Zero runtime dependencies; ships with full TypeScript types.
React Native: works with Metro bundler (the package is CommonJS, which Metro handles via interop). Pure-JS code with no native modules — drop in and use. See the RN example below.
Quick start
import { toBangla, toBanglish, slugify, searchPattern } from "bn-translit";
toBangla("kalam"); // "কালাম"
toBangla("abdul khan"); // "আবদুল খান"
toBanglish("কালাম"); // "kalam"
toBanglish("জম জম পর্দা"); // "jom jom porda"
slugify("জম জম পর্দা, ফেনী"); // "jom-jom-porda-phenee"
slugify("Md. Shahjalal Khan"); // "md-shahjalal-khan"
searchPattern("kalam"); // "kalam|কালাম" (regex-ready)
Why this exists
If you've ever shipped an app for a Bangla-speaking audience, you've hit at least one of these problems. They all share the same root cause: Bangla is written in a non-Latin script, but most input methods, URL standards, and search infrastructure assume ASCII.
Problem 1 — Search across input methods
A pharmacy / POS / CRM stores customers in Bangla:
{ name: "কালাম বেডিং, ফেনী", phone: "01700000000" }
{ name: "রহমান এন্টারপ্রাইজ", phone: "01711111111" }
Users in the field on Android phones rarely have a Bangla keyboard, so they type kalam in English and expect to find the customer. A naive WHERE name LIKE '%kalam%' returns zero results because kalam and কালাম share no characters.
The fix is to normalize the search query so it matches both forms:
- Server-side: pass the search term through searchPattern() and feed it into MongoDB's $regex (or any other regex matcher).
- Or pre-compute a Banglish copy of every record at write time (toBanglish(name)) and search the indexed Banglish field directly.
This matters most on mobile, where Bangla typing is slow and many users type Banglish out of habit even when a Bangla keyboard is available.
Problem 2 — URL slugs and SEO
Bangla content needs ASCII URLs:
- Most CDNs, analytics tools, and link-shorteners deal poorly with percent-encoded UTF-8 (জম%20জম%20পর্দা is valid but ugly).
- Search engines extract keywords from URLs, and Romanized slugs let English-speaking crawlers index Bangla pages.
- Users copy-paste links into apps that mangle non-ASCII.
slugify("জম জম পর্দা") gives you jom-jom-porda — readable, sharable,
and stable.
Problem 3 — Cross-script duplicate detection
If you accept user-submitted data, the same store might be entered as both
"কালাম বেডিং" and "Kalam Beding". Running both through toBanglish
gives you a normalized form you can compare or use as a unique key.
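A minimal sketch of that comparison, assuming a normalizer function is passed in. Here groupByNormalizedKey and stubNormalize are hypothetical names; the stub stands in for toBanglish so the example runs on its own, and its output is illustrative rather than the library's guaranteed result.

```typescript
// Hypothetical helper: groups names whose normalized keys collide.
type Normalizer = (text: string) => string;

function groupByNormalizedKey(
  names: string[],
  normalize: Normalizer,
): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const name of names) {
    // Lowercase and collapse whitespace so "Kalam  Beding" and "kalam beding" agree.
    const key = normalize(name).toLowerCase().replace(/\s+/g, " ").trim();
    const bucket = groups.get(key) ?? [];
    bucket.push(name);
    groups.set(key, bucket);
  }
  return groups;
}

// Stand-in for toBanglish (assumption): a fixed lookup, Roman text passes through.
const STUB_TABLE = new Map<string, string>([["কালাম বেডিং", "kalam beding"]]);
const stubNormalize: Normalizer = (text) => STUB_TABLE.get(text) ?? text;

const grouped = groupByNormalizedKey(
  ["কালাম বেডিং", "Kalam Beding", "রহমান"],
  stubNormalize,
);
// Keys that collect more than one name are candidate duplicates.
const duplicates = [...grouped.values()].filter((bucket) => bucket.length > 1);
console.log(duplicates);
```

In a real application the normalizer would be the library's toBanglish, and the key could back a unique index instead of an in-memory map.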
Why not just use a Bangla collation?
Database collations like MongoDB's bn locale handle sorting and case
folding, but they don't transliterate. A kalam query still won't
match কালাম — the collation only kicks in once you're already operating
on the same script. This library bridges that gap.
API
toBangla(input)
Convert Banglish (Roman phonetic) text to Bangla.
| Input | Output |
| -------------- | ------------ |
| kalam | কালাম |
| khan | খান |
| abdul | আবদুল |
| bangla | বাংলা |
| shapla | শাপলা |
| dhan | ধান |
Conventions:
- Single a after a consonant becomes the visible vowel sign া. Type the consonant alone (no a) for the inherent (silent) "অ" sound.
- Two consonants in a row do not auto-insert halant; they sit side-by-side with their inherent vowels. To force a conjunct, type the same consonant twice (kk → ক্ক, tt → ত্ত).
- Digraphs are recognized greedily: kh → খ, gh → ঘ, ch → চ, chh → ছ, jh → ঝ, th → থ, dh → ধ, ph → ফ, bh → ভ, sh → শ, ng → ং.
- Long vowels: aa → আ/া, ee → ঈ/ী, oo → ঊ/ূ, oi → ঐ/ৈ, ou → ঔ/ৌ.
- Digits, spaces, punctuation pass through unchanged.
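The greedy digraph rule can be sketched as a longest-match tokenizer. This is an illustrative sketch, not the library's actual implementation: tokenizeGreedy and MULTI_LETTER are hypothetical names, and the table covers only the digraphs listed in the conventions.

```typescript
// Multi-letter Roman sequences from the conventions list (assumed complete for this sketch).
const MULTI_LETTER = new Map<string, string>([
  ["chh", "ছ"], ["kh", "খ"], ["gh", "ঘ"], ["ch", "চ"], ["jh", "ঝ"],
  ["th", "থ"], ["dh", "ধ"], ["ph", "ফ"], ["bh", "ভ"], ["sh", "শ"], ["ng", "ং"],
]);
const MAX_TOKEN_LENGTH = 3; // length of the longest key, "chh"

// Splits ASCII input into tokens, always preferring the longest known sequence.
function tokenizeGreedy(input: string): string[] {
  const tokens: string[] = [];
  let i = 0;
  while (i < input.length) {
    let matched = "";
    // Try 3 letters first, then 2, so "chh" wins over "ch".
    for (let len = MAX_TOKEN_LENGTH; len >= 2; len--) {
      const candidate = input.slice(i, i + len).toLowerCase();
      if (candidate.length === len && MULTI_LETTER.has(candidate)) {
        matched = candidate;
        break;
      }
    }
    // No multi-letter match: fall back to a single character.
    if (matched === "") matched = input.charAt(i);
    tokens.push(matched);
    i += matched.length;
  }
  return tokens;
}

console.log(tokenizeGreedy("chhata")); // ["chh", "a", "t", "a"]
console.log(tokenizeGreedy("bangla")); // ["b", "a", "ng", "l", "a"]
```

Trying the longest candidate first is what keeps chh from being read as ch followed by h.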
toBanglish(input, options?)
Convert Bangla text to Banglish (Roman).
toBanglish("কালাম"); // "kalam"
toBanglish("রহমান"); // "rohoman"
toBanglish("রহমান", { schwaDeletion: false }); // "rohomano"
toBanglish("কালাম", { inherentVowel: "a" }); // "kalam"
toBanglish("হুমায়ূন আহমেদ"); // "humayoon ahomed"
Options:
| Option | Type | Default | Description |
| ---------------- | --------- | ------- | ---------------------------------------------------------------------------------------------------- |
| schwaDeletion | boolean | true | Drop the inherent vowel from word-final consonants (so রহমান ends as …man rather than …mano). |
| inherentVowel | string | "o" | Roman letter for a bare consonant's vowel. Common alternatives: "a" (more English-feeling output). |
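To make the two options concrete, here is a minimal sketch of word-final schwa deletion over a deliberately tiny letter table. romanizeSketch and its tables are hypothetical and are not the library's implementation; only the option names schwaDeletion and inherentVowel mirror the documented ones.

```typescript
// Tiny illustrative tables (assumption): just enough letters for the examples below.
const CONSONANTS = new Map<string, string>([
  ["ক", "k"], ["ল", "l"], ["ম", "m"], ["র", "r"], ["হ", "h"], ["ন", "n"],
]);
const VOWEL_SIGNS = new Map<string, string>([
  ["া", "a"], ["ি", "i"], ["ু", "u"], ["ে", "e"],
]);
const HASANTA = "\u09CD"; // suppresses the inherent vowel

interface SketchOptions {
  schwaDeletion?: boolean;
  inherentVowel?: string;
}

function romanizeSketch(input: string, options: SketchOptions = {}): string {
  const { schwaDeletion = true, inherentVowel = "o" } = options;
  const chars = Array.from(input);
  let out = "";
  for (let i = 0; i < chars.length; i++) {
    const ch = chars[i];
    const roman = CONSONANTS.get(ch);
    if (roman === undefined) {
      // Vowel signs map directly, hasanta emits nothing, everything else passes through.
      out += VOWEL_SIGNS.get(ch) ?? (ch === HASANTA ? "" : ch);
      continue;
    }
    out += roman;
    const next = chars[i + 1];
    // An explicit vowel sign or hasanta replaces the inherent vowel.
    if (next !== undefined && (VOWEL_SIGNS.has(next) || next === HASANTA)) continue;
    // Word-final means end of input or a following non-consonant (space, punctuation).
    const wordFinal = next === undefined || !CONSONANTS.has(next);
    if (wordFinal && schwaDeletion) continue;
    out += inherentVowel;
  }
  return out;
}

console.log(romanizeSketch("রহমান")); // "rohoman"
console.log(romanizeSketch("রহমান", { schwaDeletion: false })); // "rohomano"
console.log(romanizeSketch("রহমান", { inherentVowel: "a" })); // "rahaman"
```

The sketch drops the inherent vowel only on the last consonant of a word, which matches the word-final behavior described in the table.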
The library auto-composes the Bengali nukta letters (য়, ড়, ঢ়). These are explicitly excluded from Unicode NFC composition (UAX #15 §5.7), so NFC leaves them as base letter plus nukta; the library composes them itself so that input from any source is handled consistently.
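A rough sketch of what composing the nukta letters by hand could look like. composeNukta is a hypothetical name and this is not the library's code; the code points themselves come from the Unicode Bengali block.

```typescript
const NUKTA = "\u09BC"; // BENGALI SIGN NUKTA

// Base letter → precomposed nukta letter.
const NUKTA_COMPOSITES = new Map<string, string>([
  ["\u09A1", "\u09DC"], // ড + nukta → ড়
  ["\u09A2", "\u09DD"], // ঢ + nukta → ঢ়
  ["\u09AF", "\u09DF"], // য + nukta → য়
]);

// Replaces each "base + nukta" pair with its single precomposed code point.
function composeNukta(text: string): string {
  return text.replace(
    /([\u09A1\u09A2\u09AF])\u09BC/g,
    (_match, base: string) => NUKTA_COMPOSITES.get(base) ?? base + NUKTA,
  );
}

// NFC alone does not produce the precomposed letter, so the manual step is needed.
const decomposed = "\u09AF\u09BC";
console.log(decomposed.normalize("NFC").length); // 2
console.log(composeNukta(decomposed).length); // 1
```

Running a step like this before any character-table lookup means the table only needs one entry per nukta letter.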
slugify(input, options?)
Build a URL-safe slug. Internally calls toBanglish then strips non-ASCII.
slugify("জম জম পর্দা"); // "jom-jom-porda"
slugify("Naïve café résumé"); // "naive-cafe-resume"
slugify("hello world", { separator: "_" }); // "hello_world"
slugify("a b c d e f", { maxLength: 5 }); // "a-b-c" (truncated at last separator)
Options (extends ToBanglishOptions):
| Option | Type | Default | Description |
| ----------------- | --------- | ------- | ----------------------------------------------------------------- |
| separator | string | "-" | Joiner between words. |
| lowercase | boolean | true | Lowercase the result. |
| stripDiacritics | boolean | true | Remove combining marks (NFD-normalize and strip). |
| maxLength | number | — | Truncate at the last separator before this length. |
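The diacritic stripping and maxLength truncation described in the table can be sketched for Latin input as follows. asciiSlugSketch is a hypothetical stand-in, not the library's slugify: it skips the toBanglish step entirely, so Bangla text would simply be dropped here.

```typescript
interface SlugSketchOptions {
  separator?: string;
  maxLength?: number;
}

function asciiSlugSketch(input: string, options: SlugSketchOptions = {}): string {
  const { separator = "-", maxLength } = options;
  const words = input
    .normalize("NFD") // split accented letters into base + combining mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .toLowerCase()
    .split(/[^a-z0-9]+/) // anything outside a-z0-9 separates words
    .filter((word) => word.length > 0);
  const slug = words.join(separator);
  if (maxLength === undefined || slug.length <= maxLength) return slug;

  let cut = slug.slice(0, maxLength);
  // If the cut landed mid-word, back off to the last separator inside the cut.
  if (!slug.startsWith(separator, maxLength)) {
    const lastSeparator = cut.lastIndexOf(separator);
    if (lastSeparator > 0) cut = cut.slice(0, lastSeparator);
  }
  // Never leave a dangling separator at the end.
  if (separator.length > 0 && cut.endsWith(separator)) {
    cut = cut.slice(0, -separator.length);
  }
  return cut;
}

console.log(asciiSlugSketch("Naïve café résumé")); // "naive-cafe-resume"
console.log(asciiSlugSketch("hello world", { separator: "_" })); // "hello_world"
console.log(asciiSlugSketch("a b c d e f", { maxLength: 5 })); // "a-b-c"
```

Backing off to the last separator keeps truncated slugs from ending in half a word, at the cost of sometimes returning fewer characters than maxLength allows.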
searchPattern(input)
Returns a regex-pattern string that matches the input AND its
transliteration. Suitable for new RegExp(...) or MongoDB $regex.
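One way such a pattern string could be assembled: escape regex metacharacters in each form, drop duplicates, and join with |. This is a sketch under that assumption, not the library's code; escapeRegExp and buildAlternation are hypothetical names, and the transliterated variants are passed in by the caller rather than computed.

```typescript
// Escapes characters that have special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Joins the original input and its variants into one alternation, without repeats.
function buildAlternation(input: string, variants: string[]): string {
  if (input === "") return "";
  const forms = [input, ...variants].filter((form) => form !== "");
  const unique = Array.from(new Set(forms));
  return unique.map(escapeRegExp).join("|");
}

const pattern = buildAlternation("kalam", ["কালাম"]);
console.log(pattern); // "kalam|কালাম"
console.log(new RegExp(pattern, "i").test("কালাম বেডিং")); // true
console.log(buildAlternation("test.com", [])); // "test\.com"
```

Escaping each form before joining matters because the | added by the join must stay the only unescaped metacharacter in the result.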
searchPattern("kalam"); // "kalam|কালাম"
searchPattern("কালাম"); // "কালাম|kalamo|kalam" (deduped)
searchPattern("0170"); // "0170" (no transliteration for digits)
searchPattern(""); // ""
Special regex characters in the input are escaped, so user input like "test.com" matches the literal dot rather than any character.
Use cases
1. Server-side search (MongoDB)
import { searchPattern } from "bn-translit";
router.get("/customers", async (req, res) => {
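// req.query.q may be missing or an array; fall back to "" so searchPattern returns "".
const q = typeof req.query.q === "string" ? req.query.q : "";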
const pattern = searchPattern(q);
const filter = pattern
? {
$or: [
{ name: { $regex: pattern, $options: "i" } },
{ organizationName: { $regex: pattern, $options: "i" } },
],
}
: {};
res.json(await Customer.find(filter).limit(50));
});
Now both "kalam" (typed by mobile users) and "কালাম" (typed by desktop users with a Bangla keyboard) hit the same record.
2. Storing a Banglish copy for full-text indexing
import { toBanglish } from "bn-translit";
CustomerSchema.pre("save", function () {
this.banglishName = toBanglish(this.name);
});
// then index { banglishName: 1 } and search the user's English query
// against banglishName directly, no regex tricks needed.
3. SEO-friendly URLs
import { slugify } from "bn-translit";
const article = {
title: "জম জম পর্দা — এক বছরের অভিজ্ঞতা",
slug: slugify("জম জম পর্দা — এক বছরের অভিজ্ঞতা"),
// → "jom-jom-porda-ek-bochhorer-obhijnota"
};
4. Client-side search (e.g. React Native)
When you already have data on the device and want instant filtering as the user types — no server roundtrip:
import { useMemo, useState } from "react";
import { TextInput, FlatList, Text } from "react-native";
import { searchPattern } from "bn-translit";
const customers = [
{ id: 1, name: "কালাম বেডিং" },
{ id: 2, name: "রহমান এন্টারপ্রাইজ" },
{ id: 3, name: "জম জম পর্দা" },
];
export function CustomerList() {
const [q, setQ] = useState("");
const filtered = useMemo(() => {
const pattern = searchPattern(q);
if (!pattern) return customers;
const re = new RegExp(pattern, "i");
return customers.filter((c) => re.test(c.name));
}, [q]);
return (
<>
<TextInput
value={q}
onChangeText={setQ}
placeholder="নাম খুঁজুন / Search"
/>
<FlatList
data={filtered}
keyExtractor={(c) => String(c.id)}
renderItem={({ item }) => <Text>{item.name}</Text>}
/>
</>
);
}
Typing kalam filters down to "কালাম বেডিং", even though no character of the query matches the data byte-for-byte.
Conventions & limitations
This is a phonetic, heuristic transliteration — not a linguistically exact one. Bangla orthography has historical irregularities that no rule-based system handles perfectly.
The mapping is lossy in both directions:
| Bangla letters | Shared Banglish form | Why |
|---|---|---|
| স / শ / ষ | s or sh | All voiceless sibilants in modern speech |
| ত / ট | t | Dental vs retroflex distinction is hard for English speakers |
| দ / ড | d | Same dental vs retroflex distinction |
| ন / ণ / ঞ | n | Distinction is likewise hard for English speakers |
| র / ড় | r | Distinction is likewise hard for English speakers |
| জ / য | j | Distinction is likewise hard for English speakers |
The library is biased toward search and slug usefulness, not authentic spelling. If you need precise transliteration (e.g., academic papers, IAST/ITRANS schemes), use a linguistically-aware tool.
Schwa deletion is heuristic — applied only at word ends. Real Bangla applies it more broadly mid-word, which requires a pronunciation dictionary outside the scope of this library.
For best search results, consider feeding both the original query
and a transliterated form into your matcher (the searchPattern helper
does this for you) — that way users typing imperfect Banglish still hit
the right records.
Building & testing
npm install
npm test # node:test runner with tsx loader (28 tests)
npm run build # tsc → dist/
The package builds to CommonJS (dist/index.js) with .d.ts types.
Comparison with similar packages
- bangla-to-banglish: covers only one direction. This library does both and adds slug + search-pattern helpers for the most common application use cases.
- Avro Phonetic keyboard: a typing system, not a programmatic API. The conventions here are inspired by Avro but tuned for search/slug use rather than precise text input.
Contributing
Issues and PRs welcome. Please include test cases that demonstrate the expected behavior — character mappings can be opinionated, so context helps us judge whether a change is a fix or a regression for someone else's setup.
License
MIT — see LICENSE.
