indian-amount-parser
v1.0.2
Parse Indian-currency amounts (digits or words) from free-form text across 24 languages.
Indian Amount Parser
Parse Indian-currency amounts — digits or words — from free-form text, across 24 languages (English + 23 Indian languages). Pure ESM, zero runtime dependencies. Works in Node.js and the browser.
parseAmountFromText("रुपये दो लाख पचास हजार") // => { amount: 250000, text: ..., language: 'hi', currency: 'INR', confidence: 0.7, ... }
parseAmountFromText("Rs 2 lakh") // => { amount: 200000, ... }
parseAmountFromText("₹1,50,000") // => { amount: 150000, ... }
parseAmountFromText("1 driver 5000 rupes") // => { amount: 5000, ... } (typo tolerated, picks the largest)
parseAmountFromText("ढाई लाख") // => { amount: 250000, ... } (Hindi/Urdu fractional forms)
parseAmountFromText("दस रुपये पचास पैसे") // => { amount: 10.5, ... } (paise subunits)
parseAmountFromText("(500)") // => { amount: -500, ... } (accounting-style negative)
parseAmountFromText("5K and 10L") // => { amount: 1000000, ... } (K/L/Cr/M abbreviations)
Features
- 24 dictionaries: English + 20 Indian languages (Hindi, Bengali, Telugu, Marathi, Tamil, Gujarati, Urdu, Kannada, Odia, Malayalam, Punjabi, Assamese, Maithili, Sanskrit, Konkani, Sindhi, Nepali, Kashmiri, Dogri, Bodo) + Santali, Manipuri/Meitei, Bhojpuri.
- Indian numbering system: `hundred`, `thousand`, `lakh`, `crore` (all four are first-class multipliers).
- Full number coverage: 0–10, 11–19, 20, 25, 30, 40, 50, 60, 70, 80, 90 in native script for every language.
- Auto-detect: picks the most-likely language from the input; falls back to English for digit-only input.
- Digit + word mixing: `Rs 2 lakh fifty thousand` parses correctly.
- Indian comma notation: `1,50,000` is treated as a single 150000, not split on the comma.
- Native-digit support: `₹५००`, `৫০০`, `௫௦௦` (Devanagari, Bengali, Tamil, and 8 other scripts).
- K / L / Cr / M abbreviations: `5K` → 5000, `2.5L` → 250000, `1.5Cr` → 15000000, `10M` → 10000000.
- Whitespace-separated digit grouping: `1 50 000` → 150000.
- Negative amounts: `-500`, `−500`, `minus 500`, `(500)` all parse with sign.
- Hindi/Urdu fractional forms: `ढाई लाख` (2.5 lakh = 250000), `सवा सौ` (1.25 × 100 = 125), `डेढ़ लाख` (1.5 lakh = 150000).
- Plural/oblique forms: `लाखों`, `करोड़ों`, `रुपैयों`, etc. across all relevant dicts.
- Paise / paisa subunits: `दस रुपये पचास पैसे` → 10.50, `50 paise` → 0.50.
- Currency connectors skipped: `and`, `और`, `আর`, `আৰু`, `અને`, `ਅਤੇ`, `ଏବଂ`, etc.
- False-positive filter (on by default): drops 4-digit years (1900–2099), 10-digit phone-shaped numbers, and `#`-prefixed ID numbers.
- Result metadata: `language`, `currency`, `confidence`, `rawTokens`, `groups`, `matched`, `text`.
- `parseAllAmounts(text)`: returns every candidate amount, not just the largest.
- Optional caching: `createCachedParser()` wraps the parser with an LRU cache.
- Pure ESM, no runtime dependencies.
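To make the grouping and abbreviation features above more concrete, here is a minimal standalone sketch. It is not the library's internal code: the helper names `parseGroupedDigits` and `expandAbbreviation` and the exact matching rules are assumptions chosen for illustration.

```javascript
// Hypothetical helpers, not part of indian-amount-parser's API.
const ABBREVIATIONS = { k: 1e3, l: 1e5, cr: 1e7, m: 1e6 };

// "1,50,000" or "1 50 000" -> 150000; returns null if the text is not a digit group.
function parseGroupedDigits(text) {
  const compact = text.trim();
  if (!/^\d+(?:[ ,]\d{2,3})*(?:\.\d+)?$/.test(compact)) return null;
  return Number(compact.replace(/[ ,]/g, ''));
}

// "2.5L" -> 250000, "1.5Cr" -> 15000000; returns null when nothing matches.
function expandAbbreviation(text) {
  const match = /^(\d+(?:\.\d+)?)\s*(k|l|cr|m)$/i.exec(text.trim());
  if (!match) return null;
  const value = Number(match[1]) * ABBREVIATIONS[match[2].toLowerCase()];
  return Math.round(value * 100) / 100; // keep at most two decimals (paise)
}
```

Treating the comma or space as part of one number, rather than as a token separator, is what keeps `1,50,000` from being read as three separate amounts.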
Install
Node.js
npm install indian-amount-parser
Browser (CDN)
No bundler needed. Use directly from a CDN:
UMD (works with <script> tag):
<!-- unpkg -->
<script src="https://unpkg.com/indian-amount-parser/dist/indian-amount-parser.min.js"></script>
<!-- jsdelivr -->
<script src="https://cdn.jsdelivr.net/npm/indian-amount-parser/dist/indian-amount-parser.min.js"></script>
<script>
const result = IndianAmountParser.parseAmountFromText('Rs 2 lakh');
console.log(result.amount); // 200000
</script>
ESM (modern browsers):
<script type="module">
import { parseAmountFromText } from 'https://esm.sh/indian-amount-parser';
const result = parseAmountFromText('पाँच लाख');
console.log(result.amount); // 500000
</script>
See examples/ for runnable browser demos.
Usage
import { parseAmountFromText, parseAllAmounts, createCachedParser } from 'indian-amount-parser';
parseAmountFromText('five lakh');
// => { amount: 500000, text: 'five lakh', language: 'en', currency: null, confidence: 0.7, rawTokens: ['five','lakh'], groups: [{tokens:['five','lakh'], amount:500000}], matched: true }
parseAmountFromText('पाँच लाख');
// => { amount: 500000, text: 'पाँच लाख', language: 'hi', currency: 'INR', ... }
parseAmountFromText('₹ 1,50,000');
// => { amount: 150000, text: '₹ 1,50,000', language: 'en', currency: 'INR', ... }
parseAmountFromText('two lakh fifty thousand');
// => { amount: 250000, ... }
parseAmountFromText('one hundred and fifty');
// => { amount: 150, ... }
parseAmountFromText('this string has no number');
// => { amount: null, matched: false, ... }
// Native digits
parseAmountFromText('₹५००'); // => { amount: 500, ... }
parseAmountFromText('पाँच सौ रुपये'); // => 500
parseAmountFromText('খুব ভালো টাকা ৫০০ টাকা'); // => 500
// K/L/Cr/M abbreviations
parseAmountFromText('5K'); // => 5000
parseAmountFromText('2.5L'); // => 250000
parseAmountFromText('1.5Cr'); // => 15000000
// Negative
parseAmountFromText('-500'); // => -500
parseAmountFromText('minus 500'); // => -500
parseAmountFromText('(500)'); // => -500
// Hindi/Urdu fractions
parseAmountFromText('ढाई लाख'); // => 250000
parseAmountFromText('डेढ़ लाख'); // => 150000
parseAmountFromText('सवा सौ'); // => 125
// Paise
parseAmountFromText('दस रुपये पचास पैसे'); // => 10.5
parseAmountFromText('Rs 10 paise'); // => 0.1
// All amounts in a text
parseAllAmounts('I paid 1000, then 2000, then 3000');
// => [{amount: 1000, ...}, {amount: 2000, ...}, {amount: 3000, ...}]
// Force a specific language
parseAmountFromText('पाँच', { language: 'hi' });
API
parseAmountFromText(text, options?) → Result
| Param | Type | Description |
|-------|------|-------------|
| text | string | The input text. May contain digits, number words, currency symbols, punctuation. |
| options.language | string | One of the supported language codes. If omitted, the language is auto-detected. |
| options.filterYears | boolean (default true) | Drop 4-digit numbers in 1900–2099. |
| options.filterPhones | boolean (default true) | Drop 10-digit phone-shaped numbers. |
| options.filterIds | boolean (default true) | Drop pure-number groups when the text contains a # ID marker. |
Returns Result:
- `amount`: parsed number (`number`), or `null` if nothing was detected.
- `text`: original input (preserved for downstream use).
- `language`: detected language code (`'en'`, `'hi'`, …), or `null`.
- `currency`: detected currency code (`'INR'`), or `null`.
- `confidence`: 0–1 score.
- `rawTokens`: cleaned tokens.
- `groups`: candidate groups with their parsed amounts.
- `matched`: `true` if an amount was extracted.
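The three filter options in the parameter table above can be pictured with a small standalone sketch. This is an assumption about how such a filter could work, not the library's actual logic: the function name `isLikelyFalsePositive` is hypothetical, and the ID rule here follows the "`#`-prefixed" wording from the feature list.

```javascript
// Hypothetical sketch: decide whether a bare digit token should be dropped.
function isLikelyFalsePositive(token, fullText, options = {}) {
  const { filterYears = true, filterPhones = true, filterIds = true } = options;
  if (!/^\d+$/.test(token)) return false; // only bare digit runs are suspect
  const value = Number(token);
  // 4-digit numbers in 1900-2099 look like years.
  if (filterYears && token.length === 4 && value >= 1900 && value <= 2099) return true;
  // 10-digit runs look like phone numbers.
  if (filterPhones && token.length === 10) return true;
  // A number directly preceded by '#' looks like an ID.
  if (filterIds && fullText.includes('#' + token)) return true;
  return false;
}
```

Setting an option to `false` disables only that check, which mirrors how the three options are documented as independent switches.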
parseAllAmounts(text, options?) → Array<Result>
Like parseAmountFromText, but returns every candidate group without filtering or picking the largest. Useful for texts with multiple amounts.
normalizeText(text, dictionary?) → string
Lower-level: returns the cleaned, tokenizer-friendly form of the input. Useful for debugging or building custom pipelines.
tokenize(text, dictionary?) → string[]
Splits the normalized input on whitespace. Currency symbols and punctuation are stripped in a language-aware way.
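As a rough picture of what normalization followed by tokenization involves, the sketch below maps native digits to ASCII, joins comma-grouped digits, and splits on whitespace. It is a simplified illustration under stated assumptions, not the library's `normalizeText` or `tokenize`: the names `toAsciiDigits` and `simpleTokenize` are hypothetical, and no per-language dictionary is used.

```javascript
// Code points of digit zero in several Indic scripts; each block holds 0-9 contiguously.
const DIGIT_ZEROS = [
  0x0966, // Devanagari
  0x09e6, // Bengali
  0x0a66, // Gurmukhi
  0x0ae6, // Gujarati
  0x0b66, // Odia
  0x0be6, // Tamil
  0x0c66, // Telugu
  0x0ce6, // Kannada
  0x0d66, // Malayalam
];

// Replace native digits with their ASCII equivalents, leaving other characters alone.
function toAsciiDigits(text) {
  return Array.from(text, (ch) => {
    const cp = ch.codePointAt(0);
    for (const zero of DIGIT_ZEROS) {
      if (cp >= zero && cp <= zero + 9) return String(cp - zero);
    }
    return ch;
  }).join('');
}

// Hypothetical tokenizer: NFC-normalize, fix digits, drop symbols, split on whitespace.
function simpleTokenize(text) {
  return toAsciiDigits(text.normalize('NFC'))
    .replace(/(\d),(?=\d)/g, '$1') // "1,50,000" -> "150000"
    .replace(/[₹,;:!?]/g, ' ')     // strip currency symbol and punctuation
    .toLowerCase()
    .split(/\s+/)
    .filter(Boolean);
}
```

For example, `simpleTokenize('₹ 1,50,000')` yields a single token `'150000'`, because the commas are removed before any splitting happens.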
dictionaries, supportedLanguages
Direct access to all 24 dictionary objects and the list of language codes.
createCachedParser(parseFn, options?) → cachedParse
Wraps any parser function with an LRU cache. options.maxSize (default 1000) controls the cache size.
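For readers unfamiliar with LRU caching, the idea can be sketched with a plain `Map`, which iterates in insertion order. This is an illustration only, not the library's implementation: `lruWrap` is a hypothetical name, and the sketch keys the cache on the input text alone, ignoring any options argument.

```javascript
// Hypothetical LRU wrapper built on Map's insertion-order iteration.
function lruWrap(fn, { maxSize = 1000 } = {}) {
  const cache = new Map(); // oldest entry first, most recently used last
  const wrapped = (key) => {
    if (cache.has(key)) {
      const hit = cache.get(key);
      cache.delete(key); // re-insert to mark the entry as most recently used
      cache.set(key, hit);
      return hit;
    }
    const value = fn(key);
    cache.set(key, value);
    if (cache.size > maxSize) {
      cache.delete(cache.keys().next().value); // evict the least recently used entry
    }
    return value;
  };
  wrapped.size = () => cache.size;
  wrapped.clear = () => cache.clear();
  return wrapped;
}
```

The library's own wrapper is used as follows: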
const cachedParse = createCachedParser(parseAmountFromText, { maxSize: 500 });
cachedParse('Rs 2 lakh'); // first call: hits the parser
cachedParse('Rs 2 lakh'); // second call: returns cached
cachedParse.size(); // 1
cachedParse.clear(); // reset
Supported languages
| # | Code | Language | Script |
|---|------|----------|--------|
| 1 | en | English | Latin |
| 2 | hi | Hindi | Devanagari |
| 3 | bn | Bengali | Bengali |
| 4 | te | Telugu | Telugu |
| 5 | mr | Marathi | Devanagari |
| 6 | ta | Tamil | Tamil |
| 7 | gu | Gujarati | Gujarati |
| 8 | ur | Urdu | Perso-Arabic |
| 9 | kn | Kannada | Kannada |
| 10 | or | Odia | Odia |
| 11 | ml | Malayalam | Malayalam |
| 12 | pa | Punjabi | Gurmukhi |
| 13 | as | Assamese | Bengali/Assamese |
| 14 | mai | Maithili | Devanagari |
| 15 | sa | Sanskrit | Devanagari |
| 16 | kok | Konkani | Devanagari |
| 17 | sd | Sindhi | Perso-Arabic (and Devanagari) |
| 18 | ne | Nepali | Devanagari |
| 19 | ks | Kashmiri | Perso-Arabic |
| 20 | doi | Dogri | Devanagari |
| 21 | brx | Bodo | Devanagari |
| 22 | sat | Santali | Devanagari |
| 23 | mni | Manipuri (Meitei) | Bengali |
| 24 | bho | Bhojpuri | Devanagari |
The Indian numbering system
Indian numbering groups digits differently from the Western system:
| Western | Indian | Value |
|--------:|-------:|------:|
| thousand | thousand | 1,000 |
| (none) | lakh | 100,000 |
| (none) | crore | 10,000,000 |
| million | 10 lakh | 1,000,000 |
| billion | 100 crore / arab | 1,000,000,000 |
The parser correctly handles lakh and crore as first-class multipliers, including composites like two lakh fifty thousand (2 × 100,000 + 50 × 1,000 = 250,000).
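The composite arithmetic above can be sketched as a small accumulator. This is a simplified English-only illustration, not the library's parser: the name `wordsToNumber` and the tiny word tables are assumptions, and the sketch expects large multipliers in descending order (crore, then lakh, then thousand).

```javascript
// Tiny illustrative word tables; a real dictionary would cover far more words.
const NUMBER_WORDS = new Map([
  ['one', 1], ['two', 2], ['five', 5], ['ten', 10], ['fifty', 50],
]);
const LARGE_MULTIPLIERS = new Map([
  ['thousand', 1000], ['lakh', 100000], ['crore', 10000000],
]);

// Returns the numeric value of the tokens, or null if a token is unknown.
function wordsToNumber(tokens) {
  let total = 0;   // sum of completed groups (thousand and above)
  let current = 0; // value accumulated since the last large multiplier
  for (const token of tokens) {
    if (token === 'and') continue; // connector, carries no value
    if (NUMBER_WORDS.has(token)) {
      current += NUMBER_WORDS.get(token);
    } else if (token === 'hundred') {
      current = (current || 1) * 100; // "hundred" scales the running group
    } else if (LARGE_MULTIPLIERS.has(token)) {
      total += (current || 1) * LARGE_MULTIPLIERS.get(token); // close the group
      current = 0;
    } else {
      return null;
    }
  }
  return total + current;
}
```

With these tables, `wordsToNumber(['two', 'lakh', 'fifty', 'thousand'])` evaluates 2 × 100,000 + 50 × 1,000 and returns 250000.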
Roadmap
- [x] Full 0–99 native composites for Hindi, Bengali, Marathi, Tamil, Telugu.
- [x] Half-units in Hindi/Urdu: `सवा` (1.25), `ढाई` (2.5), `डेढ़` (1.5).
- [x] TypeScript types (`.d.ts`).
- [x] GitHub Actions CI running `node --test`.
- [x] `LICENSE` file (MIT).
- [x] Browser builds (UMD + ESM) via esbuild.
- [ ] International currencies: USD/EUR/GBP/JPY/CNY (currently INR-only).
- [ ] Full 0–99 composites for remaining languages.
Tests
npm test
336 tests, zero runtime dependencies. Tests live in test/:
- `normalizer.test.js`: Unicode NFC, currency stripping, comma grouping, native digits.
- `dictionaries.test.js`: shape validation for all 24 dictionaries.
- `parser.test.js`: happy paths, composites, edges per language.
- `auto-detect.test.js`: language detection logic.
- `phase1.test.js`: Phase 1 features (K/L/Cr/M, native digits, negatives, metadata).
- `phase2.test.js`: Phase 2 features (fractions, plurals, paise).
- `phase3.test.js`: Phase 3 features (filters, parseAllAmounts, confidence, currency).
- `phase4.test.js`: Phase 4 features (new languages, caching).
- `phase5.test.js`: Phase 5 features (hyphen handling, full composites, edge cases).
Build browser bundles
npm run build
Outputs to dist/:
- `indian-amount-parser.esm.js`: ESM bundle (~50 KB minified)
- `indian-amount-parser.min.js`: UMD bundle (~50 KB minified)
- `indian-amount-parser.js`: UMD bundle (unminified, for dev)
Contributing
To add a new language:
1. Create `src/languages/<code>.js` exporting an object shaped like:

       export default {
         numbers: { /* word -> integer, at minimum 0-10, 20, 25, 50, 30, 40, 60, 70, 80, 90 */ },
         multipliers: { hundred: 100, thousand: 1000, lakh: 100000, crore: 10000000 },
         connectors: [/* 'and' equivalents in the language */],
         currency: [/* language-appropriate currency symbols/words */],
         subunits: { /* optional, e.g. paise: 0.01 */ }
       };

2. Register it in `src/languages/registry.js` (one import + one entry in `dictionaries`).
3. Add a happy-path test in `test/parser.test.js` and a sample row in `test/auto-detect.test.js`.
4. Run `npm test`.
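Before running the full suite, a new dictionary can be sanity-checked against the shape described above. The checker below is a hypothetical sketch, not the project's `dictionaries.test.js`: the name `validateDictionary` is an assumption, and it checks multiplier values rather than keys on the assumption that non-English dictionaries key them by native words.

```javascript
// Minimum number values and multiplier values a dictionary is expected to cover.
const REQUIRED_NUMBERS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 25, 30, 40, 50, 60, 70, 80, 90];
const REQUIRED_MULTIPLIERS = [100, 1000, 100000, 10000000];

// Returns a list of human-readable problems; an empty list means the shape looks fine.
function validateDictionary(dict) {
  const errors = [];
  const numberValues = new Set(Object.values(dict.numbers ?? {}));
  for (const n of REQUIRED_NUMBERS) {
    if (!numberValues.has(n)) errors.push(`numbers: no word maps to ${n}`);
  }
  const multiplierValues = new Set(Object.values(dict.multipliers ?? {}));
  for (const m of REQUIRED_MULTIPLIERS) {
    if (!multiplierValues.has(m)) errors.push(`multipliers: no word maps to ${m}`);
  }
  for (const key of ['connectors', 'currency']) {
    if (!Array.isArray(dict[key])) errors.push(`${key}: expected an array`);
  }
  // subunits is optional, but must be an object when present.
  if (dict.subunits !== undefined && (typeof dict.subunits !== 'object' || dict.subunits === null)) {
    errors.push('subunits: expected an object when present');
  }
  return errors;
}
```

Returning a list of problems instead of throwing on the first one makes it easier to fix a partially filled dictionary in one pass.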
License
MIT — see LICENSE.
