contact-deduplication-lib
v0.4.0
Contact Deduplication Library
A TypeScript library for identifying and merging duplicate contacts.
Features
- Find duplicate contacts using various matching algorithms
- Configurable matching thresholds and field weights
- Support for merging duplicate contacts
- Multiple matching strategies (email-based, phone-based, combined)
- Fuzzy string matching for names and other text fields
- Field mapping utilities for integrating with different contact schemas
- Dual module format (CommonJS and ESM) for maximum compatibility
- Webpack-friendly ES5 output for seamless integration
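To make the threshold and field-weight features more concrete, here is a minimal sketch of how fuzzy field similarities could be combined into one weighted score and compared against a threshold. It is not the library's implementation: `levenshtein`, `similarity`, and `weightedScore` are hypothetical helpers written for this illustration, and Levenshtein distance is an assumed choice of fuzzy metric.

```typescript
// Illustrative sketch only: none of these helpers are exports of contact-deduplication-lib.

// Levenshtein edit distance between two strings (two-row dynamic programming).
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in [0, 1]: 1 for identical strings (ignoring case and outer whitespace).
function similarity(a: string, b: string): number {
  const x = a.trim().toLowerCase();
  const y = b.trim().toLowerCase();
  const longest = Math.max(x.length, y.length);
  return longest === 0 ? 1 : 1 - levenshtein(x, y) / longest;
}

// Weighted average of per-field similarities, normalized by the total weight.
function weightedScore(
  a: Record<string, string | undefined>,
  b: Record<string, string | undefined>,
  weights: Record<string, number>
): number {
  let total = 0;
  let weightSum = 0;
  for (const field of Object.keys(weights)) {
    total += weights[field] * similarity(a[field] ?? '', b[field] ?? '');
    weightSum += weights[field];
  }
  return weightSum === 0 ? 0 : total / weightSum;
}

const sketchWeights = { email: 0.5, phone: 0.3, firstName: 0.1, lastName: 0.1 };
const contactA = { firstName: 'Jon', lastName: 'Smith', email: 'jon@example.com', phone: '5551234567' };
const contactB = { firstName: 'John', lastName: 'Smith', email: 'jon@example.com', phone: '5551234567' };

// Only the first names differ, so the score stays close to 1.
const sketchScore = weightedScore(contactA, contactB, sketchWeights);
const sketchIsDuplicate = sketchScore >= 0.8; // 0.8 plays the role of the threshold option
```

In this sketch a low-weight field such as `firstName` can differ slightly without pushing the pair below the threshold, while a mismatch on a heavily weighted field such as `email` would.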
Installation
# Using npm
npm install contact-deduplication-lib
# Using yarn
yarn add contact-deduplication-lib
# Using pnpm
pnpm add contact-deduplication-lib
Usage
Basic Usage
import { ContactDeduplicator } from 'contact-deduplication-lib';
// Create a deduplicator with default options
const deduplicator = new ContactDeduplicator();
// Find duplicates in your contacts
const result = deduplicator.findDuplicates(contacts);
// Access the results
console.log(`Found ${result.duplicateGroups.length} groups of duplicates`);
console.log(`${result.uniqueContacts.length} contacts have no duplicates`);
Auto-Merging Duplicates
// Create a deduplicator with auto-merge enabled
const deduplicator = new ContactDeduplicator({ autoMerge: true });
// Find and merge duplicates
const result = deduplicator.findDuplicates(contacts);
// Access the merged contacts
console.log(`Created ${result.mergedContacts?.length} merged contacts`);
Custom Matching Options
// Create a deduplicator with custom options
const deduplicator = new ContactDeduplicator({
  threshold: 0.8, // Higher threshold for stricter matching
  fieldsToCompare: ['firstName', 'lastName', 'email', 'phone'], // Only compare these fields
  fieldWeights: {
    email: 0.5, // Email matches are more important
    phone: 0.3,
    firstName: 0.1,
    lastName: 0.1,
  },
});
Using Different Matchers
import { ContactDeduplicator, emailMatcher, phoneMatcher, strictMatcher, hybridMatcher } from 'contact-deduplication-lib';
// Create a deduplicator that prioritizes email matches
const emailDeduplicator = new ContactDeduplicator({}, emailMatcher);
// Create a deduplicator that prioritizes phone matches
const phoneDeduplicator = new ContactDeduplicator({}, phoneMatcher);
// Create a deduplicator that requires fields with high weights to be present and not empty
// This is useful when you want to prevent matching contacts with empty required fields
const strictDeduplicator = new ContactDeduplicator({
  fieldWeights: {
    email: 0.8, // Fields with weight > 0.5 are considered required
    phone: 0.7,
    firstName: 0.15,
    lastName: 0.15,
  }
}, strictMatcher);
// Create a deduplicator that uses a hybrid approach
// Uses strict matcher if a contact has both empty email and phone arrays
// Otherwise uses the combined matcher for normal comparison
const hybridDeduplicator = new ContactDeduplicator({
  fieldWeights: {
    email: 0.8,
    phone: 0.7,
    firstName: 0.15,
    lastName: 0.15,
  }
}, hybridMatcher);
Handling Empty Fields
The library provides different ways to handle contacts with missing or empty fields:
- Default behavior: By default, a field that is an empty array in both contacts (e.g., email: []) is treated as a perfect match for that field.
- Using strictMatcher: The strictMatcher treats fields with high weights (> 0.5) as required fields. If any required field is an empty array in either contact, the two contacts will not be matched, regardless of how similar their other fields are.
// Example: Contacts with empty email arrays will not match even if names match perfectly
const strictDeduplicator = new ContactDeduplicator({
  fieldWeights: {
    email: 0.8, // Email is required (weight > 0.5)
    phone: 0.7, // Phone is required (weight > 0.5)
    firstName: 0.15,
    lastName: 0.15,
  }
}, strictMatcher);
- Using hybridMatcher: The hybridMatcher provides a balanced approach by using the strictMatcher only when a contact has both empty email and phone arrays. If at least one of these fields has values, it falls back to the regular combinedMatcher.
// Example: Using the hybrid approach
const hybridDeduplicator = new ContactDeduplicator({
  fieldWeights: {
    email: 0.8,
    phone: 0.7,
    firstName: 0.15,
    lastName: 0.15,
  }
}, hybridMatcher);
This approach prevents matching contacts that have no identifying information (no email or phone) while still allowing matches when at least one of these fields is present.
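The empty-field rules described in this section can be sketched as plain predicates. This is an illustration under stated assumptions, not the library's code: `SketchContact`, `requiredFields`, `passesStrictRule`, and `chooseRule` are hypothetical names, and the sketch assumes the hybrid rule switches to strict handling when either contact in a pair has both arrays empty.

```typescript
// Illustrative sketch only: hypothetical types and helpers, not the library's internals.
type SketchContact = {
  [field: string]: string | string[] | undefined;
  email?: string[];
  phone?: string[];
};
type SketchWeights = Record<string, number>;

const isEmptyArray = (value: unknown): boolean =>
  Array.isArray(value) && value.length === 0;

// Fields whose weight exceeds 0.5 are treated as required.
function requiredFields(weights: SketchWeights): string[] {
  return Object.keys(weights).filter((field) => weights[field] > 0.5);
}

// Strict rule: a pair is rejected if any required field is an empty array in either contact.
function passesStrictRule(a: SketchContact, b: SketchContact, weights: SketchWeights): boolean {
  return requiredFields(weights).every(
    (field) => !isEmptyArray(a[field]) && !isEmptyArray(b[field])
  );
}

// A contact with both arrays empty carries no identifying information.
function hasNoIdentifiers(contact: SketchContact): boolean {
  return isEmptyArray(contact.email) && isEmptyArray(contact.phone);
}

// Hybrid rule (assumption: checked per pair, triggered by either contact).
function chooseRule(a: SketchContact, b: SketchContact): 'strict' | 'combined' {
  return hasNoIdentifiers(a) || hasNoIdentifiers(b) ? 'strict' : 'combined';
}

const ruleWeights: SketchWeights = { email: 0.8, phone: 0.7, firstName: 0.15, lastName: 0.15 };
const noIdentifiers: SketchContact = { firstName: 'Ann', lastName: 'Lee', email: [], phone: [] };
const emailOnly: SketchContact = { firstName: 'Ann', lastName: 'Lee', email: ['ann@example.com'], phone: [] };

const ruleForEmpty = chooseRule(noIdentifiers, noIdentifiers); // strict handling applies
const ruleForEmailOnly = chooseRule(emailOnly, emailOnly); // falls back to combined handling
const strictAllowsEmailOnly = passesStrictRule(emailOnly, emailOnly, ruleWeights); // phone is required but empty
```

The `emailOnly` pair shows the practical difference: the strict rule alone would reject it because the required phone field is empty, whereas the hybrid rule hands it to the regular comparison.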
Field Mapping for Different Contact Schemas
The library includes utilities for mapping contacts from different schemas:
import { ContactDeduplicator, mapExternalContact, Contact } from 'contact-deduplication-lib';
// External contact with different field names
const externalContact = {
  'First Name': 'John',
  'Last Name': 'Smith',
  'Public Email': '[email protected]',
  'Personal Email': '[email protected]',
};
// Map to our internal schema
const mappedContact = mapExternalContact(externalContact, {
  firstName: 'First Name',
  lastName: 'Last Name',
  emails: ['Public Email', 'Personal Email'],
});
// Now you can use the mapped contact with the deduplicator
// (existingContact stands for a Contact you already have)
const deduplicator = new ContactDeduplicator();
const result = deduplicator.findDuplicates([existingContact, mappedContact]);
For more complex mapping scenarios:
// External contact with custom field names
const externalContact = {
  contactId: '123',
  contactFirstName: 'Jane',
  contactLastName: 'Doe',
  primaryEmail: '[email protected]',
  secondaryEmail: '[email protected]',
  workPhone: '555-123-4567',
  mobilePhone: '555-987-6543',
  organization: 'Acme Inc',
  position: 'Software Engineer',
  dateCreated: '2023-01-15T00:00:00Z',
  dateModified: '2023-02-20T00:00:00Z',
};
// Define custom field mapping
const fieldMapping = {
  id: 'contactId',
  firstName: 'contactFirstName',
  lastName: 'contactLastName',
  emails: ['primaryEmail', 'secondaryEmail'],
  phones: ['workPhone', 'mobilePhone'],
  company: 'organization',
  jobTitle: 'position',
  createdAt: 'dateCreated',
  updatedAt: 'dateModified',
};
// Map the contact
const mappedContact = mapExternalContact(externalContact, fieldMapping);
API Reference
ContactDeduplicator
The main class for finding and merging duplicate contacts.
Constructor
constructor(
  options?: Partial<DeduplicationOptions>,
  matcher?: ContactMatcher
)
Methods
- findDuplicates(contacts: Contact[]): DeduplicationResult - Finds duplicate contacts
- mergeDuplicateGroups(duplicateGroups: Contact[][]): Contact[] - Merges groups of duplicate contacts
- setOptions(options: Partial<DeduplicationOptions>): void - Updates the deduplication options
- setMatcher(matcher: ContactMatcher): void - Sets a new matcher function
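To illustrate the shape of what findDuplicates returns, the sketch below groups items into duplicate groups and unique items from a pairwise match predicate. It is a hypothetical, self-contained illustration: `groupDuplicates` and the union-find grouping are assumptions made for the example, and the source does not say how the library groups contacts internally.

```typescript
// Illustrative sketch only: groupDuplicates is hypothetical, not part of the library.
interface GroupingResult<T> {
  duplicateGroups: T[][]; // groups with two or more matching items
  uniqueContacts: T[]; // items that matched nothing else
}

function groupDuplicates<T>(items: T[], isMatch: (a: T, b: T) => boolean): GroupingResult<T> {
  // Union-find: parent[i] points toward the representative of item i's group.
  const parent = items.map((_, index) => index);
  const find = (i: number): number => {
    while (parent[i] !== i) {
      parent[i] = parent[parent[i]]; // path halving keeps the trees shallow
      i = parent[i];
    }
    return i;
  };

  // Compare every pair once and merge the groups of matching items.
  for (let i = 0; i < items.length; i++) {
    for (let j = i + 1; j < items.length; j++) {
      if (isMatch(items[i], items[j])) parent[find(j)] = find(i);
    }
  }

  // Collect items under their group representative.
  const buckets: T[][] = items.map(() => []);
  items.forEach((item, index) => buckets[find(index)].push(item));

  const duplicateGroups: T[][] = [];
  const uniqueContacts: T[] = [];
  buckets.forEach((bucket) => {
    if (bucket.length > 1) duplicateGroups.push(bucket);
    else if (bucket.length === 1) uniqueContacts.push(bucket[0]);
  });
  return { duplicateGroups, uniqueContacts };
}

const people = [
  { id: '1', email: 'a@example.com' },
  { id: '2', email: 'A@example.com' },
  { id: '3', email: 'b@example.com' },
];
// Match on case-insensitive email equality (a stand-in for a real matcher).
const grouping = groupDuplicates(
  people,
  (x, y) => x.email.toLowerCase() === y.email.toLowerCase()
);
```

With a transitive grouping like this, A matching B and B matching C puts all three in one group even if A and C do not match directly. The pairwise loop is quadratic in the number of items, which is fine for a sketch but may matter for large contact lists.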
Field Mapping
mapExternalContact
function mapExternalContact(
  externalContact: ExternalContact,
  fieldMapping: FieldMapping
): Contact
Maps an external contact with any schema to our internal Contact type.
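As a rough picture of what such a mapping step involves, the sketch below reads mapped keys from an arbitrary object and collects several source fields into one array. It is a simplified stand-in, not the library's mapExternalContact: `SketchMapping`, `SketchMapped`, and `mapSketch` are hypothetical, only cover names, emails, and phones, and the de-duplication of repeated values is an assumption.

```typescript
// Illustrative sketch only: a simplified stand-in for the real mapping utility.
interface SketchMapping {
  firstName?: string;
  lastName?: string;
  emails?: string[];
  phones?: string[];
}
interface SketchMapped {
  firstName?: string;
  lastName?: string;
  email: string[];
  phone: string[];
}

// Read a single string value, ignoring unmapped keys and non-string values.
function pickString(source: Record<string, unknown>, key?: string): string | undefined {
  if (key === undefined) return undefined;
  const value = source[key];
  return typeof value === 'string' ? value : undefined;
}

// Collect non-empty string values from several source keys, skipping repeats.
function collectStrings(source: Record<string, unknown>, keys: string[] = []): string[] {
  const values: string[] = [];
  for (const key of keys) {
    const value = source[key];
    if (typeof value === 'string' && value.trim() !== '' && values.indexOf(value) === -1) {
      values.push(value);
    }
  }
  return values;
}

function mapSketch(source: Record<string, unknown>, mapping: SketchMapping): SketchMapped {
  return {
    firstName: pickString(source, mapping.firstName),
    lastName: pickString(source, mapping.lastName),
    email: collectStrings(source, mapping.emails),
    phone: collectStrings(source, mapping.phones),
  };
}

const sketchSource = {
  'First Name': 'John',
  'Last Name': 'Smith',
  'Public Email': 'john@example.com',
  'Personal Email': 'john@example.com',
};
const sketchMapped = mapSketch(sketchSource, {
  firstName: 'First Name',
  lastName: 'Last Name',
  emails: ['Public Email', 'Personal Email'],
});
```

Note that the mapping uses plural keys (emails, phones) while the resulting contact uses the singular array fields (email, phone), mirroring the FieldMapping and Contact types in this reference.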
FieldMapping Interface
interface FieldMapping {
  id?: string;
  firstName?: string;
  lastName?: string;
  emails?: string[];
  phones?: string[];
  company?: string;
  jobTitle?: string;
  addressFields?: {
    street?: string;
    city?: string;
    state?: string;
    postalCode?: string;
    country?: string;
    type?: string;
  };
  createdAt?: string;
  updatedAt?: string;
}
Types
Contact
interface Contact {
  id: string;
  firstName?: string;
  lastName?: string;
  email?: string[];
  phone?: string[];
  address?: Address[];
  company?: string;
  jobTitle?: string;
  notes?: string;
  createdAt: Date;
  updatedAt: Date;
  [key: string]: any; // Additional properties
}
DeduplicationOptions
interface DeduplicationOptions {
  threshold: number; // 0.0 to 1.0
  fieldsToCompare: (keyof Contact)[];
  autoMerge: boolean;
  fieldWeights?: Record<keyof Contact, number>;
}
DeduplicationResult
interface DeduplicationResult {
  duplicateGroups: Contact[][]; // Groups of duplicate contacts
  uniqueContacts: Contact[]; // Contacts with no duplicates
  mergedContacts?: Contact[]; // Merged contacts (if autoMerge is true)
}
Development
Setup
# Clone the repository
git clone https://github.com/yourusername/contact-deduplication-lib.git
cd contact-deduplication-lib
# Install dependencies
pnpm install
# Build the library
pnpm build
# Run tests
pnpm test
Webpack Integration
If you're using webpack, the library should work out of the box with the following configuration:
// webpack.config.js
module.exports = {
  // ... your other webpack configuration
  resolve: {
    extensions: ['.ts', '.tsx', '.js', '.jsx'],
  },
  module: {
    rules: [
      {
        test: /\.tsx?$/,
        use: 'ts-loader',
        exclude: /node_modules/,
      },
      // If you still encounter issues, you can add this rule:
      {
        test: /\.m?js$/,
        include: /node_modules\/contact-deduplication-lib/,
        use: {
          loader: 'babel-loader',
          options: {
            presets: ['@babel/preset-env']
          }
        }
      }
    ],
  },
};
Scripts
- pnpm build - Build the library
- pnpm dev - Build with watch mode
- pnpm test - Run tests
- pnpm test:watch - Run tests in watch mode
- pnpm lint - Type check without emitting files
- pnpm clean - Remove build artifacts
License
ISC
