indian-amount-parser

v1.0.2

Published

16 hours ago

Parse Indian-currency amounts (digits or words) from free-form text across 24 languages.

bapi1994

indian amount parser currency rupees lakh crore hindi tamil telugu bengali marathi i18n nlp

Indian Amount Parser

Parse Indian-currency amounts — digits or words — from free-form text, across 24 languages (English + 23 Indian languages). Pure ESM, zero runtime dependencies. Works in Node.js and the browser.

parseAmountFromText("रुपये दो लाख पचास हजार")   // => { amount: 250000, text: ..., language: 'hi', currency: 'INR', confidence: 0.7, ... }
parseAmountFromText("Rs 2 lakh")                  // => { amount: 200000, ... }
parseAmountFromText("₹1,50,000")                 // => { amount: 150000, ... }
parseAmountFromText("1 driver 5000 rupes")        // => { amount: 5000,   ... }   (typo tolerated, picks the largest)
parseAmountFromText("ढाई लाख")                    // => { amount: 250000, ... }   (Hindi/Urdu fractional forms)
parseAmountFromText("दस रुपये पचास पैसे")         // => { amount: 10.5,   ... }   (paise subunits)
parseAmountFromText("(500)")                      // => { amount: -500,   ... }   (accounting-style negative)
parseAmountFromText("5K and 10L")                 // => { amount: 1000000, ... }  (K/L/Cr/M abbreviations)
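The fractional forms in the examples above pair a fractional quantifier with a multiplier. The arithmetic can be sketched as follows. `FRACTIONS`, `FRACTION_MULTIPLIERS`, and `evalFractional` are hypothetical names used only for illustration; this is not the library's internal code, and it handles only two-word phrases.

```javascript
// Illustrative sketch only: not indian-amount-parser's implementation.
const nfc = (s) => s.normalize('NFC');

// Hindi fractional quantifiers and the factors they stand for.
const FRACTIONS = new Map(
  [['सवा', 1.25], ['डेढ़', 1.5], ['ढाई', 2.5]].map(([w, v]) => [nfc(w), v])
);

// A few Hindi multipliers (assumed subset, enough for the examples).
const FRACTION_MULTIPLIERS = new Map(
  [['सौ', 100], ['हजार', 1000], ['लाख', 100000]].map(([w, v]) => [nfc(w), v])
);

// Evaluates a two-word phrase "<fraction> <multiplier>"; returns null otherwise.
function evalFractional(phrase) {
  const [frac, mult] = nfc(phrase).trim().split(/\s+/);
  if (!FRACTIONS.has(frac) || !FRACTION_MULTIPLIERS.has(mult)) return null;
  return FRACTIONS.get(frac) * FRACTION_MULTIPLIERS.get(mult);
}
```

Both the lookup keys and the input are NFC-normalized so that composed and decomposed spellings of the same word compare equal.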

Features

24 dictionaries: English + 20 Indian languages (Hindi, Bengali, Telugu, Marathi, Tamil, Gujarati, Urdu, Kannada, Odia, Malayalam, Punjabi, Assamese, Maithili, Sanskrit, Konkani, Sindhi, Nepali, Kashmiri, Dogri, Bodo) + Santali, Manipuri/Meitei, Bhojpuri.
Indian numbering system: hundred, thousand, lakh, crore (all four are first-class multipliers).
Full number coverage: 0–10, 11–19, 20, 25, 30, 40, 50, 60, 70, 80, 90 in native script for every language.
Auto-detect: picks the most-likely language from the input; falls back to English for digit-only input.
Digit + word mixing: Rs 2 lakh fifty thousand parses correctly.
Indian comma notation: 1,50,000 is treated as a single 150000, not split on the comma.
Native-digit support: ₹५००, ৫০০, ௫௦௦ (Devanagari, Bengali, Tamil, and 8 other scripts).
K / L / Cr / M abbreviations: 5K → 5000, 2.5L → 250000, 1.5Cr → 15000000, 10M → 10000000.
Whitespace-separated digit grouping: 1 50 000 → 150000.
Negative amounts: -500, −500, minus 500, and (500) all parse as negative 500.
Hindi/Urdu fractional forms: ढाई लाख (2.5 lakh = 250000), सवा सौ (1.25 × 100 = 125), डेढ़ लाख (1.5 lakh = 150000).
Plural/oblique forms: लाखों, करोड़ों, रुपैयों, etc. across all relevant dicts.
Paise / paisa subunits: दस रुपये पचास पैसे → 10.50, 50 paise → 0.50.
Currency connectors skipped: and, और, আর, আৰু, અને, ਅਤੇ, ଏବଂ, etc.
False-positive filter (on by default): drops 4-digit years (1900–2099), 10-digit phone-shaped numbers, and #-prefixed ID numbers.
Result metadata: language, currency, confidence, rawTokens, groups, matched, text.
parseAllAmounts(text): returns every candidate amount, not just the largest.
Optional caching: createCachedParser() wraps the parser with an LRU cache.
Pure ESM, no runtime dependencies.
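As an illustration of the K / L / Cr / M feature listed above, the expansion can be sketched like this. `SUFFIX_VALUES` and `expandAbbreviation` are hypothetical names, and the sketch only covers a bare `<number><suffix>` token; the library's own matching rules may differ.

```javascript
// Illustrative sketch only: a hypothetical helper, not the library's code.
const SUFFIX_VALUES = { k: 1e3, l: 1e5, cr: 1e7, m: 1e6 };

// Expands "<number><suffix>" (for example "2.5L") into a plain number.
// Returns null when the token does not have that shape.
function expandAbbreviation(token) {
  const match = /^(\d+(?:\.\d+)?)(k|l|cr|m)$/i.exec(token.trim());
  if (!match) return null;
  const value = Number(match[1]) * SUFFIX_VALUES[match[2].toLowerCase()];
  // Round to two decimals (paise precision) to hide floating-point noise.
  return Math.round(value * 100) / 100;
}
```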

Install

Node.js

npm install indian-amount-parser

Browser (CDN)

No bundler needed. Use directly from a CDN:

UMD (works with <script> tag):

<!-- unpkg -->
<script src="https://unpkg.com/indian-amount-parser/dist/indian-amount-parser.min.js"></script>

<!-- jsdelivr -->
<script src="https://cdn.jsdelivr.net/npm/indian-amount-parser/dist/indian-amount-parser.min.js"></script>

<script>
  const result = IndianAmountParser.parseAmountFromText('Rs 2 lakh');
  console.log(result.amount); // 200000
</script>

ESM (modern browsers):

<script type="module">
  import { parseAmountFromText } from 'https://esm.sh/indian-amount-parser';
  const result = parseAmountFromText('पाँच लाख');
  console.log(result.amount); // 500000
</script>

See examples/ for runnable browser demos.

Usage

import { parseAmountFromText, parseAllAmounts, createCachedParser } from 'indian-amount-parser';

parseAmountFromText('five lakh');
// => { amount: 500000, text: 'five lakh', language: 'en', currency: null, confidence: 0.7, rawTokens: ['five','lakh'], groups: [{tokens:['five','lakh'], amount:500000}], matched: true }

parseAmountFromText('पाँच लाख');
// => { amount: 500000, text: 'पाँच लाख', language: 'hi', currency: 'INR', ... }

parseAmountFromText('₹ 1,50,000');
// => { amount: 150000, text: '₹ 1,50,000', language: 'en', currency: 'INR', ... }

parseAmountFromText('two lakh fifty thousand');
// => { amount: 250000, ... }

parseAmountFromText('one hundred and fifty');
// => { amount: 150, ... }

parseAmountFromText('this string has no number');
// => { amount: null, matched: false, ... }

// Native digits
parseAmountFromText('₹५००');                          // => { amount: 500, ... }
parseAmountFromText('पाँच सौ रुपये');                   // => 500
parseAmountFromText('খুব ভালো টাকা ৫০০ টাকা');           // => 500

// K/L/Cr/M abbreviations
parseAmountFromText('5K');                              // => 5000
parseAmountFromText('2.5L');                            // => 250000
parseAmountFromText('1.5Cr');                           // => 15000000

// Negative
parseAmountFromText('-500');                            // => -500
parseAmountFromText('minus 500');                       // => -500
parseAmountFromText('(500)');                           // => -500

// Hindi/Urdu fractions
parseAmountFromText('ढाई लाख');                         // => 250000
parseAmountFromText('डेढ़ लाख');                         // => 150000
parseAmountFromText('सवा सौ');                          // => 125

// Paise
parseAmountFromText('दस रुपये पचास पैसे');              // => 10.5
parseAmountFromText('Rs 10 paise');                     // => 0.1

// All amounts in a text
parseAllAmounts('I paid 1000, then 2000, then 3000');
// => [{amount: 1000, ...}, {amount: 2000, ...}, {amount: 3000, ...}]

// Force a specific language
parseAmountFromText('पाँच', { language: 'hi' });

API

`parseAmountFromText(text, options?) → Result`

| Param | Type | Description |
|-------|------|-------------|
| text | string | The input text. May contain digits, number words, currency symbols, punctuation. |
| options.language | string | One of the supported language codes. If omitted, the language is auto-detected. |
| options.filterYears | boolean (default true) | Drop 4-digit numbers in 1900–2099. |
| options.filterPhones | boolean (default true) | Drop 10-digit phone-shaped numbers. |
| options.filterIds | boolean (default true) | Drop pure-number groups when the text contains a # ID marker. |
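To make the year and phone filters concrete, here is a minimal sketch of what such checks could look like. The function names are hypothetical and the regular expressions are assumptions based only on the descriptions in the table; the ID filter is left out, and the library's actual rules may be stricter.

```javascript
// Illustrative sketch only: hypothetical predicates, not the library's filters.

// True for a bare 4-digit number in the range 1900-2099.
function looksLikeYear(token) {
  return /^(19|20)\d{2}$/.test(token);
}

// True for a bare 10-digit number (phone-shaped).
function looksLikePhone(token) {
  return /^\d{10}$/.test(token);
}

// Drops flagged tokens; each filter can be switched off, mirroring the
// filterYears / filterPhones options described in the table.
function filterCandidates(tokens, { filterYears = true, filterPhones = true } = {}) {
  return tokens.filter(
    (t) => !(filterYears && looksLikeYear(t)) && !(filterPhones && looksLikePhone(t))
  );
}
```

Turning a filter off is useful when the text really does contain an amount such as 2024 rupees.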

Returns Result:

amount: parsed number (number), or null if nothing was detected.
text: original input (preserved for downstream use).
language: detected language code ('en', 'hi', …), or null.
currency: detected currency code ('INR'), or null.
confidence: 0–1 score.
rawTokens: cleaned tokens.
groups: candidate groups with their parsed amounts.
matched: true if an amount was extracted.
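A caller would typically check `matched` before using `amount`. The sketch below works on any object with the fields listed above; `describeResult` is a hypothetical helper, not part of the package.

```javascript
// Illustrative sketch only: consumes a Result-shaped object.
function describeResult(result) {
  // amount is null when nothing was detected, so guard before formatting.
  if (!result.matched || result.amount === null) return 'no amount found';
  const currency = result.currency ?? 'unknown currency';
  const pct = Math.round(result.confidence * 100);
  return `${result.amount} (${currency}, language ${result.language}, confidence ${pct}%)`;
}

// Hand-written object following the documented shape.
const sample = {
  amount: 500000,
  text: 'five lakh',
  language: 'en',
  currency: null,
  confidence: 0.7,
  matched: true,
};
console.log(describeResult(sample));
```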

`parseAllAmounts(text, options?) → Array<Result>`

Like parseAmountFromText but returns every candidate group (no filter, no picking of the largest). Useful for texts with multiple amounts.
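If you start from the full candidate list and still want a single answer, one option is to choose the largest amount yourself. The sketch below operates on Result-shaped objects; `pickLargest` is a hypothetical helper, and it is not claimed to match how `parseAmountFromText` selects its result internally.

```javascript
// Illustrative sketch only: picks the Result with the largest amount.
function pickLargest(results) {
  return results
    .filter((r) => r.matched && typeof r.amount === 'number')
    .reduce((best, r) => (best === null || r.amount > best.amount ? r : best), null);
}
```

An empty or fully unmatched list yields `null`, so the caller can treat that case the same way as `matched: false`.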

`normalizeText(text, dictionary?) → string`

Lower-level: returns the cleaned, tokenizer-friendly form of the input. Useful for debugging or building custom pipelines.
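To give a feel for this kind of normalization, here is a rough sketch covering Unicode NFC, native digits for three scripts, and Indian comma grouping. `toAsciiDigits` and `sketchNormalize` are hypothetical names; the real `normalizeText` also deals with currency stripping and more scripts, and its exact output is not reproduced here.

```javascript
// Illustrative sketch only: not the library's normalizer.

// Code points of digit zero in three scripts; each block has 10 consecutive digits.
const DIGIT_ZEROS = [
  0x0966, // Devanagari
  0x09e6, // Bengali
  0x0be6, // Tamil
];

// Replaces native digits with ASCII digits, leaving other characters alone.
function toAsciiDigits(text) {
  return Array.from(text, (ch) => {
    const cp = ch.codePointAt(0);
    for (const zero of DIGIT_ZEROS) {
      if (cp >= zero && cp <= zero + 9) return String(cp - zero);
    }
    return ch;
  }).join('');
}

function sketchNormalize(text) {
  return toAsciiDigits(text.normalize('NFC'))
    .replace(/(\d),(?=\d)/g, '$1') // join comma-grouped digits: 1,50,000 -> 150000
    .replace(/\s+/g, ' ')          // collapse runs of whitespace
    .trim();
}
```

The comma rule is deliberately naive: it would also merge a list like `1,2`, which a production normalizer would need to handle separately.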

`tokenize(text, dictionary?) → string[]`

Splits the normalized input on whitespace. Currency symbols and punctuation are stripped in a language-aware way.

`dictionaries`, `supportedLanguages`

Direct access to all 24 dictionary objects and the list of language codes.

`createCachedParser(parseFn, options?) → cachedParse`

Wraps any parser function with an LRU cache. options.maxSize (default 1000) controls the cache size.

const cachedParse = createCachedParser(parseAmountFromText, { maxSize: 500 });
cachedParse('Rs 2 lakh');   // first call: hits the parser
cachedParse('Rs 2 lakh');   // second call: returns cached
cachedParse.size();         // 1
cachedParse.clear();        // reset
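For readers curious how an LRU wrapper of this kind can work, here is a minimal sketch built on `Map`. `lruWrap` is a hypothetical name and this is not the package's implementation; in particular, the sketch keys the cache on the text argument only and ignores any options object.

```javascript
// Illustrative sketch only: a tiny LRU wrapper for single-argument functions.
function lruWrap(fn, { maxSize = 1000 } = {}) {
  // Map iterates in insertion order, so the first key is the least recently used.
  const cache = new Map();

  const cached = (key) => {
    if (cache.has(key)) {
      const value = cache.get(key);
      cache.delete(key);
      cache.set(key, value); // re-insert to mark as most recently used
      return value;
    }
    const value = fn(key);
    cache.set(key, value);
    if (cache.size > maxSize) {
      cache.delete(cache.keys().next().value); // evict the least recently used entry
    }
    return value;
  };

  cached.size = () => cache.size;
  cached.clear = () => cache.clear();
  return cached;
}
```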

Supported languages

| # | Code | Language | Script |
|---|------|----------|--------|
| 1 | en | English | Latin |
| 2 | hi | Hindi | Devanagari |
| 3 | bn | Bengali | Bengali |
| 4 | te | Telugu | Telugu |
| 5 | mr | Marathi | Devanagari |
| 6 | ta | Tamil | Tamil |
| 7 | gu | Gujarati | Gujarati |
| 8 | ur | Urdu | Perso-Arabic |
| 9 | kn | Kannada | Kannada |
| 10 | or | Odia | Odia |
| 11 | ml | Malayalam | Malayalam |
| 12 | pa | Punjabi | Gurmukhi |
| 13 | as | Assamese | Bengali/Assamese |
| 14 | mai | Maithili | Devanagari |
| 15 | sa | Sanskrit | Devanagari |
| 16 | kok | Konkani | Devanagari |
| 17 | sd | Sindhi | Perso-Arabic (and Devanagari) |
| 18 | ne | Nepali | Devanagari |
| 19 | ks | Kashmiri | Perso-Arabic |
| 20 | doi | Dogri | Devanagari |
| 21 | brx | Bodo | Devanagari |
| 22 | sat | Santali | Devanagari |
| 23 | mni | Manipuri (Meitei) | Bengali |
| 24 | bho | Bhojpuri | Devanagari |

The Indian numbering system

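Indian numbering groups digits differently from the Western system: the rightmost three digits form one group, and every group to the left of it has two digits. So 150000 is written 1,50,000 (one lakh fifty thousand) rather than 150,000, with 1 lakh = 100,000 and 1 crore = 10,000,000.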

The parser correctly handles lakh and crore as first-class multipliers, including composites like two lakh fifty thousand (2 × 100,000 + 50 × 1,000 = 250,000).
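The composite arithmetic above can be sketched with a small accumulator loop. This is an illustration under simplifying assumptions (English words only, a tiny vocabulary, tokens already split); `EN_NUMBERS`, `EN_MULTIPLIERS`, and `evalWords` are hypothetical names, not the library's internals.

```javascript
// Illustrative sketch only: evaluates English number words with Indian multipliers.
const EN_NUMBERS = new Map([
  ['one', 1], ['two', 2], ['five', 5], ['ten', 10], ['fifty', 50],
]);
const EN_MULTIPLIERS = new Map([
  ['thousand', 1000], ['lakh', 100000], ['crore', 10000000],
]);
const EN_CONNECTORS = new Set(['and']);

function evalWords(words) {
  let total = 0;   // sum of completed thousand/lakh/crore groups
  let current = 0; // value being built inside the current group
  for (const word of words) {
    if (EN_CONNECTORS.has(word)) continue; // connectors carry no value
    if (EN_NUMBERS.has(word)) {
      current += EN_NUMBERS.get(word);
    } else if (word === 'hundred') {
      current = (current || 1) * 100; // "hundred" scales within the group
    } else if (EN_MULTIPLIERS.has(word)) {
      total += (current || 1) * EN_MULTIPLIERS.get(word); // close the group
      current = 0;
    } else {
      return null; // unknown word
    }
  }
  return total + current;
}
```

A bare multiplier such as `lakh` is treated as one lakh, which is why the code falls back to 1 when no number precedes it.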

Roadmap

[x] Full 0–99 native composites for Hindi, Bengali, Marathi, Tamil, Telugu.
[x] Fractional units in Hindi/Urdu: सवा (1.25), ढाई (2.5), डेढ़ (1.5).
[x] TypeScript types (.d.ts).
[x] GitHub Actions CI running node --test.
[x] LICENSE file (MIT).
[x] Browser builds (UMD + ESM) via esbuild.
[ ] International currencies: USD/EUR/GBP/JPY/CNY (currently INR-only).
[ ] Full 0–99 composites for remaining languages.

Tests

npm test

336 tests, zero runtime dependencies. Tests live in test/:

normalizer.test.js: Unicode NFC, currency stripping, comma grouping, native digits.
dictionaries.test.js: shape validation for all 24 dictionaries.
parser.test.js: happy paths, composites, edges per language.
auto-detect.test.js: language detection logic.
phase1.test.js: Phase 1 features (K/L/Cr/M, native digits, negatives, metadata).
phase2.test.js: Phase 2 features (fractions, plurals, paise).
phase3.test.js: Phase 3 features (filters, parseAllAmounts, confidence, currency).
phase4.test.js: Phase 4 features (new languages, caching).
phase5.test.js: Phase 5 features (hyphen handling, full composites, edge cases).

Build browser bundles

npm run build

Outputs to dist/:

indian-amount-parser.esm.js: ESM bundle (~50 KB minified)
indian-amount-parser.min.js: UMD bundle (~50 KB minified)
indian-amount-parser.js: UMD bundle (unminified, for dev)

Contributing

To add a new language:

Create src/languages/<code>.js exporting an object shaped like:

export default {
  numbers: { /* word -> integer, at minimum 0-10, 20, 25, 30, 40, 50, 60, 70, 80, 90 */ },
  multipliers: { hundred: 100, thousand: 1000, lakh: 100000, crore: 10000000 },
  connectors: [/* 'and' equivalents in the language */],
  currency: [/* language-appropriate currency symbols/words */],
  subunits: { /* optional, e.g. paise: 0.01 */ }
};

Register it in src/languages/registry.js (one import + one entry in dictionaries).
Add a happy-path test in test/parser.test.js and a sample row in test/auto-detect.test.js.
Run npm test.
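Before running the full suite, a quick self-check of a new dictionary's shape can catch obvious mistakes. The sketch below is based only on the object shape shown above; `checkDictionaryShape` is a hypothetical helper, and the project's own dictionaries.test.js may check different or additional things.

```javascript
// Illustrative sketch only: returns a list of problems, empty when the shape looks fine.
function checkDictionaryShape(dict) {
  const problems = [];
  const isPlainObject = (v) => v !== null && typeof v === 'object' && !Array.isArray(v);

  if (!isPlainObject(dict.numbers)) {
    problems.push('numbers must be an object');
  } else if (!Object.values(dict.numbers).every(Number.isInteger)) {
    problems.push('numbers must map words to integers');
  }

  if (!isPlainObject(dict.multipliers)) {
    problems.push('multipliers must be an object');
  } else {
    // All four Indian-system multipliers should be present as values.
    const values = Object.values(dict.multipliers);
    for (const required of [100, 1000, 100000, 10000000]) {
      if (!values.includes(required)) problems.push(`multipliers is missing ${required}`);
    }
  }

  if (!Array.isArray(dict.connectors)) problems.push('connectors must be an array');
  if (!Array.isArray(dict.currency)) problems.push('currency must be an array');
  // subunits is optional, but must be an object when present.
  if (dict.subunits !== undefined && !isPlainObject(dict.subunits)) {
    problems.push('subunits must be an object when present');
  }
  return problems;
}
```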

License

MIT — see LICENSE.
