ai-bangladesh-address-parser
v1.0.5
Production-grade AI-powered Bangladeshi address parser - Extract house, road, area, district, postal code, and more from full addresses with 99.3% accuracy
📑 Table of Contents
- Overview
- Features
- Installation
- Quick Start
- What Gets Extracted
- Basic Usage Examples
- Configuration
- Advanced Features
- Complete API Reference
- Troubleshooting
- FAQ
- License & Support
📖 Overview
What Problems Does This Solve?
Problem: Unstructured address data in various formats
Before (Raw Input):
"1152/C \"Greenhouse\", House# 45, Road# 08, Shapla Residential Area, Halishahar, Chittagong-4219"
"Flat A-3, Building 7, Bashundhara R/A, Dhaka"
After (Structured Output):
{
"house_number": "45",
"road": "08",
"area": "Shapla Residential Area",
"district": "Chittagong",
"postal_code": "4219",
"overall_confidence": 0.98
}
Real-World Use Cases
- E-commerce Checkout - Auto-fill delivery forms, validate postal codes
- Address Validation - Validate addresses before processing
- Data Normalization - Standardize inconsistent address formats
🌟 Features
Core Features
- ✅ Zero Configuration - Just npm install and it works! Everything is automatic
- ✅ High Accuracy - 99.3% accuracy with ML-powered extraction
- ✅ Fast Performance - 20ms latency (0.1ms cached)
- ✅ Postal Code Prediction - Auto-predict postal codes with 98%+ confidence
- ✅ 100% Offline - No API calls, works completely offline
- ✅ AI-Powered - Uses fine-tuned spaCy NER model trained on 1363 real Bangladeshi addresses
- ✅ Geographic Intelligence - Built-in Bangladesh geographic hierarchy system
Advanced Features
- ✅ Address Validation - Validate completeness and component validity
- ✅ Address Formatting - Standardize addresses in multiple formats
- ✅ Address Comparison - Compare addresses and detect duplicates
- ✅ Address Autocomplete - Get suggestions as users type
- ✅ Address Enrichment - Enrich addresses with additional geographic data
- ✅ Statistics & Analytics - Calculate statistics for multiple addresses
- ✅ Custom Confidence Thresholds - Set minimum confidence levels for each component
📦 Installation
Step 1: Prerequisites
- Node.js >= 14.0.0
- Python >= 3.9.0 (automatically detected)
- npm or yarn or pnpm
Step 2: Install Package
npm install ai-bangladesh-address-parser
That's it! Everything installs automatically:
- ✅ Node.js dependencies
- ✅ Python dependencies (spacy, pygtrie, etc.)
- ✅ spaCy language models
- ✅ All required packages
No manual steps needed! The package handles everything during installation.
Step 3: Verify Installation
# Check Node.js
node --version # Should be >= 14.0.0
# Check Python
python3 --version # Should be >= 3.9.0
# Check Python dependencies
python3 -c "import spacy; print('spaCy OK')"
python3 -c "import pygtrie; print('pygtrie OK')"
🚀 Quick Start
TypeScript/ES6
import { AddressExtractor } from "ai-bangladesh-address-parser";
const extractor = new AddressExtractor();
// Extract from single address
const result = await extractor.extract("House 12, Road 5, Mirpur, Dhaka-1216");
console.log(result.components);
// {
// house_number: '12',
// road: '5',
// area: 'Mirpur',
// district: 'Dhaka',
// postal_code: '1216'
// }
console.log(result.overall_confidence); // 0.98
console.log(result.extraction_time_ms); // 23.45
JavaScript (CommonJS)
const { AddressExtractor } = require("ai-bangladesh-address-parser");
async function main() {
const extractor = new AddressExtractor();
const result = await extractor.extract("Flat A-3, Building 7, Bashundhara R/A, Dhaka");
console.log("Area:", result.components.area);
console.log("Postal Code:", result.components.postal_code);
console.log("Confidence:", result.overall_confidence);
}
main();
🎯 What Gets Extracted
The parser extracts the following components from Bangladeshi addresses:
| Component | Example | Description |
| -------------- | --------------------------- | ------------------------ |
| house_number | 12, 12/A, 105/2 | House or building number |
| road | 5, R-7, Central Road | Road name or number |
| area | Mirpur, Bashundhara R/A | Area or residential area |
| district | Dhaka, Chattogram | District name |
| division | Dhaka, Chattogram | Division name |
| postal_code | 1216, 4219 | 4-digit postal code |
| flat_number | A-3, 5B | Flat or apartment number |
| floor_number | 2nd, 3rd floor | Floor number |
| block_number | Block A, B-5 | Block number |
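The table describes postal_code as a 4-digit value, and the sample addresses in this README write it as a "District-1216" suffix. As a rough illustration of that single pattern, a standalone check could look like the sketch below. The function name splitTrailingPostalCode is hypothetical, and this is not how the package itself parses addresses (it combines regex, NER, and gazetteer stages).

```typescript
// Illustrative only: recognises a trailing "District-1234" suffix.
// Hypothetical helper, not part of the package API.
interface TrailingPostal {
  district: string;
  postal_code: string;
}

function splitTrailingPostalCode(address: string): TrailingPostal | null {
  // Look only at the last comma-separated segment, e.g. "Dhaka-1216".
  const segments = address.split(",");
  const last = segments[segments.length - 1].trim();
  // District text, a hyphen, then exactly four digits at the end.
  const match = /^(.+?)\s*-\s*(\d{4})$/.exec(last);
  if (!match) return null;
  return { district: match[1].trim(), postal_code: match[2] };
}

console.log(splitTrailingPostalCode("House 12, Road 5, Mirpur, Dhaka-1216"));
// { district: 'Dhaka', postal_code: '1216' }
console.log(splitTrailingPostalCode("Banani, Dhaka")); // null
```

A check like this can be handy as a cheap sanity test on input before or after calling the extractor, but it will miss every address that does not end in that exact suffix.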
Result Structure
{
components: {
house_number?: string;
road?: string;
area?: string;
district?: string;
division?: string;
postal_code?: string;
flat_number?: string;
floor_number?: string;
block_number?: string;
};
overall_confidence: number; // 0.0 - 1.0
extraction_time_ms: number; // Processing time
normalized_address: string; // Normalized version
original_address: string; // Original input
}
💡 Basic Usage Examples
📁 Complete Examples: See the examples/ directory for comprehensive TypeScript examples covering all features.
Example 1: Simple Extraction
import { AddressExtractor } from "ai-bangladesh-address-parser";
const extractor = new AddressExtractor();
const address = "House 12, Road 5, Mirpur, Dhaka-1216";
const result = await extractor.extract(address);
console.log("Extracted Components:");
console.log(`House: ${result.components.house_number}`); // "12"
console.log(`Road: ${result.components.road}`); // "5"
console.log(`Area: ${result.components.area}`); // "Mirpur"
console.log(`District: ${result.components.district}`); // "Dhaka"
console.log(`Postal Code: ${result.components.postal_code}`); // "1216"
console.log(`Confidence: ${(result.overall_confidence * 100).toFixed(1)}%`);
Example 2: Complex Address
const complexAddress = '1152/C "Greenhouse", House# 45, Road# 08, Shapla Residential Area, Halishahar, Chittagong-4219';
const result = await extractor.extract(complexAddress, { detailed: true });
console.log("Components:", result.components);
// {
// house_number: '45',
// road: '08',
// area: 'Shapla Residential Area',
// district: 'Chittagong',
// postal_code: '4219'
// }
if (result.metadata) {
console.log("Sources:", result.metadata.component_details);
}
Example 3: Batch Processing
const addresses = [
"House 12, Road 5, Mirpur, Dhaka",
"Flat A-3, Building 7, Bashundhara R/A, Dhaka",
"Banani, Dhaka",
"Gulshan 2, Dhaka",
"Dhanmondi 15, Dhaka",
];
const results = await extractor.batchExtract(addresses);
results.forEach((result, index) => {
const addr = addresses[index];
const comp = result.components;
console.log(`\n${addr}:`);
if (comp.area) console.log(` Area: ${comp.area}`);
if (comp.district) console.log(` District: ${comp.district}`);
if (comp.postal_code) {
console.log(` Postal Code: ${comp.postal_code} (${(result.overall_confidence * 100).toFixed(1)}% confidence)`);
}
});
Example 4: Error Handling
try {
const result = await extractor.extract("House 12, Road 5, Mirpur, Dhaka-1216", {
timeout: 5000,
});
if (result.components.postal_code) {
console.log(`Postal code found: ${result.components.postal_code}`);
} else {
console.log("No postal code detected");
}
} catch (error) {
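// In TypeScript the caught value is typed unknown, so narrow it before reading .message
console.error("Extraction failed:", error instanceof Error ? error.message : error);
}
🔧 Configuration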
Component Confidence Thresholds
You can customize the minimum confidence thresholds for each component. Components with confidence below the threshold will be filtered out from the results.
Default Thresholds:
- house_number: 0.70
- road: 0.70
- area: 0.65
- district: 0.75
- division: 0.80
- postal_code: 0.80
- flat_number: 0.70
- floor_number: 0.70
- block_number: 0.70
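To make the filtering rule concrete, here is a minimal sketch of how per-component thresholds could be applied, using the default values listed above. The Scored shape and the applyThresholds helper are hypothetical, and the assumption that unspecified components fall back to their defaults is ours; the package's internal logic is not documented here.

```typescript
type ComponentName =
  | "house_number" | "road" | "area" | "district" | "division"
  | "postal_code" | "flat_number" | "floor_number" | "block_number";

type Thresholds = Record<ComponentName, number>;

// Default values as listed in this README.
const DEFAULT_THRESHOLDS: Thresholds = {
  house_number: 0.7, road: 0.7, area: 0.65, district: 0.75, division: 0.8,
  postal_code: 0.8, flat_number: 0.7, floor_number: 0.7, block_number: 0.7,
};

// Hypothetical shape: a candidate value plus its confidence score.
interface Scored {
  value: string;
  confidence: number;
}

// Keep a component only when its confidence reaches the threshold.
function applyThresholds(
  candidates: Partial<Record<ComponentName, Scored>>,
  overrides: Partial<Thresholds> = {}
): Partial<Record<ComponentName, string>> {
  const kept: Partial<Record<ComponentName, string>> = {};
  for (const key of Object.keys(candidates) as ComponentName[]) {
    const candidate = candidates[key];
    if (!candidate) continue;
    const override = overrides[key];
    // Assumption: components without an override use the default.
    const threshold = override !== undefined ? override : DEFAULT_THRESHOLDS[key];
    if (threshold < 0 || threshold > 1) {
      throw new RangeError("Threshold for " + key + " must be between 0.0 and 1.0");
    }
    if (candidate.confidence >= threshold) kept[key] = candidate.value;
  }
  return kept;
}

const filtered = applyThresholds({
  area: { value: "Mirpur", confidence: 0.6 },        // below 0.65, dropped
  district: { value: "Dhaka", confidence: 0.9 },     // kept
  postal_code: { value: "1216", confidence: 0.82 },  // kept at the default 0.80
});
console.log(filtered); // { district: 'Dhaka', postal_code: '1216' }
```

Raising a threshold trades recall for precision: with postal_code overridden to 0.85, the same input would keep only the district.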
Usage:
const extractor = new AddressExtractor();
// Set custom confidence thresholds
extractor.setConfidenceThresholds({
house_number: 0.75, // Only accept house numbers with 75%+ confidence
postal_code: 0.85, // Only accept postal codes with 85%+ confidence
area: 0.70, // Only accept areas with 70%+ confidence
district: 0.80 // Only accept districts with 80%+ confidence
});
// Extract with custom thresholds
const result = await extractor.extract("House 12, Road 5, Mirpur, Dhaka-1216");
// Get current thresholds
const currentThresholds = extractor.getConfidenceThresholds();
console.log(currentThresholds);
Example: All Components at 0.90 (High Precision)
// Set all thresholds to 0.90 for maximum precision
extractor.setConfidenceThresholds({
house_number: 0.90,
road: 0.90,
area: 0.90,
district: 0.90,
division: 0.90,
postal_code: 0.90,
flat_number: 0.90,
floor_number: 0.90,
block_number: 0.90
});
// Only components with 90%+ confidence will be included
const result = await extractor.extract("House 12, Road 5, Mirpur, Dhaka-1216");
Example: More Lenient Thresholds
// Set lower thresholds to include more results (may have lower accuracy)
extractor.setConfidenceThresholds({
house_number: 0.60,
postal_code: 0.70,
area: 0.55,
district: 0.65
});
// More components will be included, even with lower confidence
const result = await extractor.extract("House 12, Road 5, Mirpur, Dhaka-1216");
Note: Thresholds must be between 0.0 and 1.0. Setting a threshold to 0.0 means all results are accepted, while 1.0 means only perfect matches are accepted.
🚀 Advanced Features
1. Address Validation & Completeness
Validate addresses and check for missing components:
const validation = await extractor.validate("House 12, Road 5, Mirpur, Dhaka-1216");
console.log(validation.is_valid); // true
console.log(validation.completeness); // 0.89 (89% complete)
console.log(validation.missing); // [] (no missing required components)
console.log(validation.score); // 0.92 (overall validity score)
// With custom required components
const strictValidation = await extractor.validate(
"Mirpur, Dhaka",
['district', 'area', 'postal_code'] // Require these components
);
console.log(strictValidation.missing); // ['postal_code']
Use Cases: E-commerce checkout validation, Form validation, Data quality checks
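The idea of "missing required components" can be sketched in a few lines. The helpers below are hypothetical and work on an already-extracted components object; how the package computes its own completeness and score values is not documented here, so the ratio used is only an assumption.

```typescript
type Components = { [name: string]: string | undefined };

// Names from `required` that are absent or blank in `components`.
function findMissing(components: Components, required: string[]): string[] {
  return required.filter((name) => {
    const value = components[name];
    return value === undefined || value.trim() === "";
  });
}

// Assumed definition: share of required components that are present.
function requiredCompleteness(components: Components, required: string[]): number {
  if (required.length === 0) return 1;
  const missing = findMissing(components, required);
  return (required.length - missing.length) / required.length;
}

const extracted: Components = { area: "Mirpur", district: "Dhaka" };
const needed = ["district", "area", "postal_code"];

console.log(findMissing(extracted, needed)); // [ 'postal_code' ]
console.log(requiredCompleteness(extracted, needed).toFixed(2)); // 0.67
```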
2. Address Formatting & Standardization
Format addresses into standardized strings for different use cases:
const address = "House 12, Road 5, Mirpur, Dhaka-1216";
// Full format (default)
const full = await extractor.format(address);
// "House 12, Road 5, Mirpur, Dhaka, 1216"
// Short format
const short = await extractor.format(address, { style: 'short' });
// "Mirpur, Dhaka, 1216"
// Postal format
const postal = await extractor.format(address, { style: 'postal' });
// "Dhaka-1216"
// Minimal format
const minimal = await extractor.format(address, { style: 'minimal' });
// "Mirpur, Dhaka"
// Custom separator
const custom = await extractor.format(address, {
style: 'full',
separator: ' | ',
includePostal: true
});
// "House 12 | Road 5 | Mirpur | Dhaka | 1216"
Use Cases: Shipping labels, Database storage, Display in UI, Email/SMS notifications
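Judging only from the example outputs shown for each style, the mapping from components to formatted string might look roughly like the sketch below. formatComponents is a hypothetical helper written for illustration; the package's real formatter may order or label components differently.

```typescript
interface AddressComponents {
  house_number?: string;
  road?: string;
  area?: string;
  district?: string;
  postal_code?: string;
}

type Style = "full" | "short" | "postal" | "minimal";

// Type guard used to drop absent or empty parts.
function isPresent(value: string | undefined): value is string {
  return value !== undefined && value !== "";
}

function formatComponents(
  c: AddressComponents,
  style: Style = "full",
  separator: string = ", "
): string {
  if (style === "postal") {
    // District and postal code joined by a hyphen, e.g. "Dhaka-1216".
    return [c.district, c.postal_code].filter(isPresent).join("-");
  }
  const parts: (string | undefined)[] = [];
  if (style === "full") {
    parts.push(c.house_number ? "House " + c.house_number : undefined);
    parts.push(c.road ? "Road " + c.road : undefined);
  }
  parts.push(c.area, c.district);
  if (style !== "minimal") parts.push(c.postal_code);
  return parts.filter(isPresent).join(separator);
}

const sample: AddressComponents = {
  house_number: "12", road: "5", area: "Mirpur", district: "Dhaka", postal_code: "1216",
};
console.log(formatComponents(sample));                // House 12, Road 5, Mirpur, Dhaka, 1216
console.log(formatComponents(sample, "short"));       // Mirpur, Dhaka, 1216
console.log(formatComponents(sample, "postal"));      // Dhaka-1216
console.log(formatComponents(sample, "minimal"));     // Mirpur, Dhaka
console.log(formatComponents(sample, "full", " | ")); // House 12 | Road 5 | Mirpur | Dhaka | 1216
```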
3. Address Comparison & Similarity
Compare two addresses and detect duplicates:
const addr1 = "House 12, Road 5, Mirpur, Dhaka-1216";
const addr2 = "H-12, R-5, Mirpur, Dhaka";
const comparison = await extractor.compare(addr1, addr2);
console.log(comparison.similarity); // 0.92 (92% similar)
console.log(comparison.match); // true (considered a match)
console.log(comparison.score); // 0.95 (weighted similarity)
console.log(comparison.common); // ['area', 'district', 'house_number']
console.log(comparison.differences); // ['road', 'postal_code']
Use Cases: Duplicate address detection, Address matching, Fraud detection, Data deduplication
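A component-by-component comparison is one simple way to think about the common and differences fields. The sketch below is a hypothetical stand-in that uses a plain ratio of matching components; the package's weighted score and its cutoff for match are not documented here, so they are deliberately left out.

```typescript
type Components = { [name: string]: string | undefined };

interface SimpleComparison {
  common: string[];
  differences: string[];
  similarity: number; // matching components / compared components
}

// Case- and whitespace-insensitive form used for comparison.
function normalizeValue(value: string | undefined): string {
  return (value || "").trim().toLowerCase();
}

function compareComponents(a: Components, b: Components): SimpleComparison {
  // Union of component names from both sides, in first-seen order.
  const names: string[] = [];
  for (const name of Object.keys(a).concat(Object.keys(b))) {
    if (names.indexOf(name) === -1) names.push(name);
  }
  const common: string[] = [];
  const differences: string[] = [];
  for (const name of names) {
    const left = normalizeValue(a[name]);
    const right = normalizeValue(b[name]);
    if (left === "" && right === "") continue; // absent on both sides
    if (left === right) common.push(name);
    else differences.push(name);
  }
  const compared = common.length + differences.length;
  return { common, differences, similarity: compared === 0 ? 0 : common.length / compared };
}

const first: Components = {
  house_number: "12", road: "5", area: "Mirpur", district: "Dhaka", postal_code: "1216",
};
const second: Components = { house_number: "12", road: "5", area: "mirpur", district: "Dhaka" };

const outcome = compareComponents(first, second);
console.log(outcome.common);      // [ 'house_number', 'road', 'area', 'district' ]
console.log(outcome.differences); // [ 'postal_code' ]
console.log(outcome.similarity);  // 0.8
```

Here a component present on only one side counts as a difference; treating it as "unknown" instead would be an equally reasonable choice for deduplication.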
4. Address Autocomplete & Suggestions
Get address suggestions as users type:
// Search for areas/districts
const suggestions = await extractor.suggest("Mirpur", 5);
suggestions.forEach(s => {
console.log(`${s.area}, ${s.district} - ${s.postal_code} (${s.confidence})`);
});
// Mirpur, Dhaka - 1216 (0.98)
// Mirpur DOHS, Dhaka - 1216 (0.85)
// ...
// Search for districts
const districtSuggestions = await extractor.suggest("Dhak", 3);
// Returns: Dhaka-related suggestions
Use Cases: Search autocomplete, Address input assistance, Location search, User experience improvement
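Conceptually, autocomplete of this kind is a prefix lookup over a list of known places. The sketch below shows the idea with a linear scan over a tiny sample list built from addresses that appear in this README. suggestByPrefix and SAMPLE_PLACES are hypothetical, the ranking rule is an assumption, and the package ships its own data and lookup structure.

```typescript
interface Place {
  area: string;
  district: string;
  postal_code: string;
}

// Tiny sample taken from addresses in this README, for illustration only.
const SAMPLE_PLACES: Place[] = [
  { area: "Mirpur DOHS", district: "Dhaka", postal_code: "1216" },
  { area: "Mirpur", district: "Dhaka", postal_code: "1216" },
  { area: "Halishahar", district: "Chittagong", postal_code: "4219" },
];

function suggestByPrefix(
  query: string,
  limit: number = 5,
  places: Place[] = SAMPLE_PLACES
): Place[] {
  const q = query.trim().toLowerCase();
  if (q === "") return [];
  // Match the query against the start of either the area or the district.
  const matches = places.filter(
    (p) => p.area.toLowerCase().indexOf(q) === 0 || p.district.toLowerCase().indexOf(q) === 0
  );
  // Assumed ranking: shorter area names first, so an exact match
  // comes before its longer variants.
  matches.sort((x, y) => x.area.length - y.area.length);
  return matches.slice(0, limit);
}

console.log(suggestByPrefix("Mirpur").map((p) => p.area)); // [ 'Mirpur', 'Mirpur DOHS' ]
console.log(suggestByPrefix("chit").map((p) => p.area));   // [ 'Halishahar' ]
```

A linear scan is fine for a handful of entries; for a full gazetteer a trie or sorted index would be the usual choice.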
5. Address Enrichment
Enrich addresses with additional geographic information:
const enriched = await extractor.enrich("Mirpur, Dhaka");
console.log(enriched.components); // Extracted components
console.log(enriched.hierarchy); // Geographic hierarchy
console.log(enriched.suggested_postal_code); // Suggested postal if missing
console.log(enriched.overall_confidence); // Overall confidence score
Use Cases: Complete missing information, Add geographic hierarchy, Add delivery zones, Enhanced data for analytics
7. Address Statistics & Analytics
Calculate statistics for multiple addresses:
const addresses = [
"House 12, Mirpur, Dhaka",
"Banani, Dhaka",
"Gulshan 2, Dhaka",
"Dhanmondi, Dhaka"
];
const stats = await extractor.getStatistics(addresses);
console.log(stats.total); // 4
console.log(stats.completeness); // 0.87 (87% average completeness)
console.log(stats.average_confidence); // 0.92
console.log(stats.distribution.districts); // { "Dhaka": 4 }
console.log(stats.distribution.areas); // { "Mirpur": 1, "Banani": 1, ... }
console.log(stats.common_areas); // Top areas
console.log(stats.missing_components); // { "postal_code": 2, ... }
Use Cases: Data quality analysis, Geographic distribution, Business intelligence, Reporting
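The distribution and missing-component figures are, at heart, simple counts over the extracted components. The helpers below sketch that counting on hand-written sample data (not real extractor output); distribution and missingCounts are hypothetical names, and the package may compute its statistics differently.

```typescript
type Components = { [name: string]: string | undefined };

// Count how often each value of one field appears, e.g. per district.
function distribution(list: Components[], field: string): { [value: string]: number } {
  const counts: { [value: string]: number } = {};
  for (const components of list) {
    const value = components[field];
    if (value === undefined) continue;
    counts[value] = (counts[value] || 0) + 1;
  }
  return counts;
}

// For each field, count how many addresses lack it (zero counts omitted).
function missingCounts(list: Components[], fields: string[]): { [field: string]: number } {
  const counts: { [field: string]: number } = {};
  for (const field of fields) {
    const missing = list.filter((c) => c[field] === undefined).length;
    if (missing > 0) counts[field] = missing;
  }
  return counts;
}

// Hand-written components standing in for extraction results.
const parsed: Components[] = [
  { house_number: "12", area: "Mirpur", district: "Dhaka" },
  { area: "Banani", district: "Dhaka" },
  { area: "Gulshan 2", district: "Dhaka" },
  { area: "Dhanmondi", district: "Dhaka" },
];

console.log(distribution(parsed, "district"));                   // { Dhaka: 4 }
console.log(missingCounts(parsed, ["district", "postal_code"])); // { postal_code: 4 }
```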
8. Enhanced Bulk Processing
Process multiple addresses with progress tracking:
const addresses = [
"House 12, Mirpur, Dhaka",
"Banani, Dhaka",
"Gulshan 2, Dhaka"
];
// Process with progress callback
const results = await extractor.batchExtract(addresses, {
onProgress: (current, total) => {
console.log(`Processing ${current}/${total} (${(current/total*100).toFixed(1)}%)`);
},
onError: (address, error) => {
console.error(`Failed: ${address}`, error);
}
});
results.forEach((result, i) => {
console.log(`${addresses[i]}: ${result.components.postal_code}`);
});
Use Cases: Database migration, Data cleaning, Batch processing, ETL pipelines
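The onProgress and onError callbacks follow a common pattern: report after every item and keep going when one item fails. The sketch below shows that contract with a synchronous, hypothetical processBatch helper to keep it short (the package's batchExtract is asynchronous). Returning null for a failed item is an assumption made here so that results stay aligned with the input.

```typescript
interface BatchCallbacks {
  onProgress?: (current: number, total: number) => void;
  onError?: (item: string, error: unknown) => void;
}

function processBatch<T>(
  items: string[],
  worker: (item: string) => T,
  callbacks: BatchCallbacks = {}
): (T | null)[] {
  const results: (T | null)[] = [];
  items.forEach((item, index) => {
    try {
      results.push(worker(item));
    } catch (error) {
      if (callbacks.onError) callbacks.onError(item, error);
      results.push(null); // assumption: keep positions aligned with the input
    }
    // Progress is reported after every item, failed or not.
    if (callbacks.onProgress) callbacks.onProgress(index + 1, items.length);
  });
  return results;
}

const progressLog: string[] = [];
const failures: string[] = [];

// Stand-in worker: counts comma-separated parts, rejects empty input.
const partCounts = processBatch(
  ["Banani, Dhaka", "", "House 12, Mirpur, Dhaka"],
  (address) => {
    if (address === "") throw new Error("empty address");
    return address.split(",").length;
  },
  {
    onProgress: (current, total) => { progressLog.push(current + "/" + total); },
    onError: (item) => { failures.push(item); },
  }
);

console.log(partCounts);      // [ 2, null, 3 ]
console.log(progressLog);     // [ '1/3', '2/3', '3/3' ]
console.log(failures.length); // 1
```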
📖 Complete API Reference
All Available Methods
| Method | Description | Returns |
|--------|-------------|---------|
| extract(address, options?) | Extract components from address | ExtractionResult |
| batchExtract(addresses, options?) | Extract from multiple addresses | ExtractionResult[] |
| validate(address, required?) | Validate address completeness | ValidationResult |
| format(address, options?) | Format address string | string |
| compare(address1, address2) | Compare two addresses | ComparisonResult |
| suggest(query, limit?) | Get address suggestions | Suggestion[] |
| enrich(address) | Enrich with additional data | EnrichedResult |
| getStatistics(addresses) | Calculate statistics | Statistics |
| setConfidenceThresholds(thresholds) | Set confidence thresholds | void |
| getConfidenceThresholds() | Get current thresholds | ConfidenceThresholds \| null |
| isAvailable() | Check if system is ready | Promise<boolean> |
| getVersion() | Get package version | string |
Method Details
extract(address, options?)
Extract components from a single address.
const result = await extractor.extract("House 12, Road 5, Mirpur, Dhaka-1216", {
detailed: true, // Include detailed metadata
timeout: 30000 // Timeout in milliseconds
});
batchExtract(addresses, options?)
Extract from multiple addresses with optional progress tracking.
const results = await extractor.batchExtract(addresses, {
detailed: true,
timeout: 30000,
onProgress: (current, total) => { /* ... */ },
onError: (address, error) => { /* ... */ }
});
validate(address, required?)
Validate address completeness and component validity.
const validation = await extractor.validate("House 12, Mirpur, Dhaka",
['district', 'area', 'postal_code'] // Required components
);
format(address, options?)
Format address into standardized string.
const formatted = await extractor.format(address, {
style: 'full', // 'full' | 'short' | 'postal' | 'minimal'
separator: ', ', // Separator between components
includePostal: true // Include postal code
});
compare(address1, address2)
Compare two addresses and calculate similarity.
const comparison = await extractor.compare(addr1, addr2);
// Returns: { similarity, match, differences, common, score }
suggest(query, limit?)
Get address suggestions based on query.
const suggestions = await extractor.suggest("Mirpur", 5);
// Returns array of suggestions with confidence scores
enrich(address)
Enrich address with additional geographic information.
const enriched = await extractor.enrich("Mirpur, Dhaka");
// Returns: Enhanced result with hierarchy, suggestions
getStatistics(addresses)
Calculate statistics for multiple addresses.
const stats = await extractor.getStatistics(addresses);
// Returns: { total, completeness, average_confidence, distribution, common_areas, missing_components }
📚 TypeScript Examples
Complete, ready-to-run TypeScript examples are available in the examples/ directory:
Available Examples
- 01-basic-extraction.ts - Basic address extraction
- 02-detailed-extraction.ts - Detailed extraction with metadata
- 03-batch-extraction.ts - Batch processing with progress tracking
- 04-address-validation.ts - Address validation and completeness checking
- 05-address-formatting.ts - Address formatting in multiple styles
- 06-address-comparison.ts - Address comparison and duplicate detection
- 07-address-suggestions.ts - Address autocomplete and suggestions
- 08-address-enrichment.ts - Address enrichment with additional data
- 10-statistics-analytics.ts - Statistics and analytics
- 11-confidence-thresholds.ts - Confidence thresholds configuration
Running Examples
Using ts-node (Recommended):
# Install ts-node globally
npm install -g ts-node
# Run any example
ts-node examples/01-basic-extraction.ts
Using TypeScript Compiler:
# Compile
tsc examples/01-basic-extraction.ts --outDir examples/dist --module commonjs --esModuleInterop
# Run
node examples/dist/01-basic-extraction.js
See examples/README.md for detailed instructions.
🆘 Troubleshooting
"Python not found"
Solution: The package auto-detects Python. If it can't find it:
- Make sure Python 3.9+ is installed: python3 --version
- Make sure Python is in your system PATH
- Try running python3 --version to verify Python is accessible
The package automatically tries python3, python, and py - one of them should work!
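When debugging detection problems, it can help to reproduce the "try python3, then python, then py" order yourself. The sketch below is a hypothetical reimplementation, not the package's code. The version probe is passed in as a function so the logic stays testable; in Node it could wrap child_process.spawnSync(command, ["--version"]) and return the output text, or null when the command cannot be run.

```typescript
// Returns the text printed by `<command> --version`, or null if it cannot run.
type VersionProbe = (command: string) => string | null;

// True when the output reports Python >= minMajor.minMinor.
function meetsMinimum(versionOutput: string, minMajor: number = 3, minMinor: number = 9): boolean {
  const match = /Python\s+(\d+)\.(\d+)/.exec(versionOutput);
  if (!match) return false;
  const major = parseInt(match[1], 10);
  const minor = parseInt(match[2], 10);
  return major > minMajor || (major === minMajor && minor >= minMinor);
}

// First candidate that exists and is new enough, in the documented order.
function findPython(
  probe: VersionProbe,
  candidates: string[] = ["python3", "python", "py"]
): string | null {
  for (const command of candidates) {
    const output = probe(command);
    if (output !== null && meetsMinimum(output)) return command;
  }
  return null;
}

// Fake probe for demonstration: python3 is absent, python is too old.
const fakeProbe: VersionProbe = (command) => {
  if (command === "python") return "Python 2.7.18";
  if (command === "py") return "Python 3.11.4";
  return null;
};

console.log(findPython(fakeProbe)); // py
```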
"Module not found" errors
Solution: Python dependencies should install automatically. If they didn't:
# Re-run the postinstall script
npm run install-python-deps
# Or manually:
python3 -m pip install -r node_modules/ai-bangladesh-address-parser/requirements.txt
"Extraction timed out"
Solution: Increase timeout:
const result = await extractor.extract(address, {
timeout: 60000, // 60 seconds
});
"No results returned"
Solution:
- Check Python script exists: node_modules/ai-bangladesh-address-parser/api/python/extract.py
- Test Python script directly: python3 node_modules/ai-bangladesh-address-parser/api/python/extract.py extract "House 12, Road 5, Mirpur, Dhaka"
Import errors in TypeScript
Solution: Make sure TypeScript is configured:
{
"compilerOptions": {
"module": "commonjs",
"esModuleInterop": true
}
}
❓ FAQ
Q: Do I need to install Python dependencies manually?
A: No! Everything is 100% automatic. When you run npm install ai-bangladesh-address-parser, the package automatically:
- Detects Python (python3, python, or py)
- Installs all Python dependencies (spacy, pygtrie, etc.)
- Downloads required models
- Verifies everything is working
Just run npm install - that's it!
Q: Do I need internet connection?
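A: Not at runtime. Once installed, the package works 100% offline with no API calls. Installation itself needs network access, because npm install fetches the Python dependencies and required models.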
Q: How accurate is postal code prediction?
A: Predicted postal codes come with 98%+ confidence. Prediction draws on 21,810 real addresses and the geographic hierarchy.
Q: Can I use this commercially?
A: Yes! Commercial use is permitted. See License for details.
Q: What if an address doesn't have a postal code?
A: The parser will auto-predict the postal code with 98%+ confidence based on area/district.
Q: How fast is it?
A: First extraction: ~20ms, cached extractions: ~0.1ms (99% cache hit rate).
Q: Does it work with Bangla text?
A: Yes! The parser handles Bangla, English, and mixed scripts.
Q: Can I modify the code?
A: No. This package uses a proprietary license that prohibits modifications.
📄 License & Support
License
PROPRIETARY - All Rights Reserved
Copyright (c) 2026 Md. Tarikul Islam Juel
Permitted:
- ✅ Install and use the package
- ✅ Use in personal or commercial projects
- ✅ Distribute as part of applications
Prohibited:
- ❌ Modify the source code
- ❌ Create derivative works
- ❌ Redistribute modified versions
- ❌ Reverse engineer
See LICENSE for full terms.
Support
- Issues: GitHub Issues
- Package: npm Package
- Repository: GitHub Repository
Performance
- Latency: 20ms (first call), 0.1ms (cached)
- Accuracy: 99.3%
- Cache Hit Rate: 99%
- Postal Code Prediction: 98%+ confidence
- Model Size: ~100MB (before optimization)
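The gap between first-call and cached latency comes from returning a stored result instead of re-running extraction. A minimal sketch of that idea follows. The cached helper and its key normalisation are hypothetical; the package's actual cache key, size limit, and eviction policy are not documented here.

```typescript
// Wraps an extraction function with an in-memory cache keyed by a
// normalised form of the address (trimmed, single-spaced, lower-cased).
function cached<T>(extract: (address: string) => T) {
  const store: { [key: string]: T } = {};
  let hitCount = 0;
  return {
    get(address: string): T {
      const key = address.trim().replace(/\s+/g, " ").toLowerCase();
      if (Object.prototype.hasOwnProperty.call(store, key)) {
        hitCount += 1;
        return store[key]; // served from cache, extractor not called
      }
      const result = extract(address);
      store[key] = result;
      return result;
    },
    hits(): number {
      return hitCount;
    },
  };
}

// Stand-in for a slow extractor that counts how often it really runs.
let realCalls = 0;
const slowExtract = (address: string) => {
  realCalls += 1;
  return { original_address: address };
};

const extractorCache = cached(slowExtract);
extractorCache.get("Banani, Dhaka");
extractorCache.get("  banani,   dhaka "); // same key after normalisation

console.log(realCalls, extractorCache.hits()); // 1 1
```

Note that a cache hit returns the result stored for the first spelling of the address, so fields such as original_address reflect that first input.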
Architecture
The package uses Python for the actual extraction logic (9-stage pipeline) and Node.js as a wrapper:
Node.js Application
↓
AddressExtractor (TypeScript)
↓
python-shell
↓
Python Script (extract.py)
↓
ProductionAddressExtractor (9-stage pipeline)
├── STAGE 1: Script Detection (Bangla/English/Mixed)
├── STAGE 2: Canonical Normalization (Standardize format)
├── STAGE 3: FSM Parsing (Validate structure)
├── STAGE 4: Regex Extraction (Pattern matching)
├── STAGE 5: spaCy NER (ML-based extraction)
├── STAGE 6: Gazetteer Validation (Geographic intelligence)
├── STAGE 7: Geographic Validator (Hierarchy validation)
├── STAGE 8: Conflict Resolution (Evidence-weighted)
└── STAGE 9: Structured Output (JSON generation)
↓
Extracted Components (JSON)
Made with ❤️ for Bangladesh 🇧🇩
Copyright (c) 2026 Md. Tarikul Islam Juel - All Rights Reserved
