astrolex

v1.0.0

Published

2 months ago

Fuzzy keyword extraction from a controlled vocabulary

Vulnerabilities: 0 high, 0 medium, 0 low

claquettes

fuzzy keyword extraction levenshtein search matching

astrolex

A small, framework-agnostic Node.js package, written in TypeScript, that extracts keywords from user input by fuzzy-matching it against a predefined dictionary of known terms.

Features

Multi-strategy matching: Exact match, startsWith, contains, and fuzzy Levenshtein distance matching
Normalization: Handles accents/diacritics, case-insensitive matching, and common separators
Multi-language support: Works with French and English keywords (and can be extended)
Factory pattern: Create an engine once with a dictionary, then parse multiple inputs efficiently
Type-safe: Full TypeScript support with strict types
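To make the normalization feature concrete, here is a minimal sketch of how accents, case, and separators could be handled before matching. The helpers `normalizeText` and `tokenize` are hypothetical names, not part of the astrolex API, and the exact separator set the package recognizes is an assumption.

```typescript
// Hypothetical helpers, NOT the astrolex implementation: one plausible way
// to normalize text before matching it against a dictionary.
function normalizeText(input: string): string {
  return input
    .normalize("NFD")                  // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, "")   // drop the combining marks (accents/diacritics)
    .toLowerCase()                     // case-insensitive matching
    .replace(/[-_'.,;:\/]+/g, " ")     // ASSUMPTION: treat these separators as spaces
    .replace(/\s+/g, " ")              // collapse runs of whitespace
    .trim();
}

// Split normalized text into tokens; empty input yields no tokens.
function tokenize(input: string): string[] {
  const normalized = normalizeText(input);
  return normalized === "" ? [] : normalized.split(" ");
}

console.log(normalizeText("Crème-Brûlée"));   // "creme brulee"
console.log(tokenize("  Pâte_à_tartiner "));  // ["pate", "a", "tartiner"]
```

Normalizing both the dictionary and the input the same way is what lets "Crème" and "creme" compare equal under a plain string comparison.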

Installation

npm install astrolex

Usage

Basic Example

import { createSearchEngine } from "astrolex";

const engine = createSearchEngine(["tomate", "salade", "pomme", "tomato", "apple"]);

const result = engine.parse("saladee de pome");
console.log(result.bestKeywords); // ["salade", "pomme"]

// Access detailed match information
result.matches.forEach((match) => {
  if (match.matched) {
    console.log(`${match.token.raw} -> ${match.keyword} (score: ${match.score}, strategy: ${match.strategy})`);
  }
});

With Options

const engine = createSearchEngine(
  ["tomate", "salade", "pomme"],
  {
    maxDistance: 2,           // Maximum Levenshtein distance (default: 2)
    minSimilarityScore: 0.5,  // Minimum similarity score 0..1 (default: 0.5)
    languageHint: "fr"        // Optional hint (currently unused; reserved for future use)
  }
);
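The two numeric options can reject a match independently of each other. The sketch below illustrates this under one loudly stated assumption: that similarity is computed as 1 minus the edit distance divided by the length of the longer string. astrolex may score differently, and `FuzzyThresholds`, `similarityFromDistance`, and `acceptsFuzzyMatch` are hypothetical names used only for illustration.

```typescript
// Hypothetical shape mirroring the two numeric options shown above.
interface FuzzyThresholds {
  maxDistance: number;
  minSimilarityScore: number;
}

// ASSUMPTION: similarity = 1 - distance / length of the longer string.
// This is a common convention, not a confirmed astrolex formula.
function similarityFromDistance(distance: number, tokenLength: number, keywordLength: number): number {
  const longest = Math.max(tokenLength, keywordLength);
  return longest === 0 ? 1 : 1 - distance / longest;
}

// A fuzzy candidate is kept only if it satisfies both thresholds.
function acceptsFuzzyMatch(
  distance: number,
  tokenLength: number,
  keywordLength: number,
  thresholds: FuzzyThresholds
): boolean {
  return (
    distance <= thresholds.maxDistance &&
    similarityFromDistance(distance, tokenLength, keywordLength) >= thresholds.minSimilarityScore
  );
}

const documentedDefaults: FuzzyThresholds = { maxDistance: 2, minSimilarityScore: 0.5 };

// "saladee" vs "salade": one edit, lengths 7 and 6
console.log(acceptsFuzzyMatch(1, 7, 6, documentedDefaults));   // true
// Two edits on a three-letter word: distance is allowed, similarity is too low
console.log(acceptsFuzzyMatch(2, 3, 3, documentedDefaults));   // false
// Three edits exceed maxDistance regardless of word length
console.log(acceptsFuzzyMatch(3, 12, 12, documentedDefaults)); // false
```

Under this assumed formula, `minSimilarityScore` mostly protects short tokens, where even one or two edits change most of the word, while `maxDistance` caps how far long tokens may drift.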

API

`createSearchEngine(keywords: string[], options?: SearchEngineOptions): SearchEngine`

Creates a search engine instance with a predefined dictionary of keywords.

Parameters:

keywords: Array of known keywords (French or English)
options: Optional configuration (see SearchEngineOptions below)

Returns: A SearchEngine instance

`SearchEngine.parse(input: string): ParseOutput`

Parses user input and returns structured match results.

Returns:

{
  input: string;              // Original input
  tokens: ParsedToken[];      // Tokenized input
  matches: MatchResult[];     // Match results for each token
  bestKeywords: string[];     // Unique matched keywords, sorted by score (desc)
}
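As an illustration of working with `matches`, the sketch below declares local types containing only the fields used in the Basic Example (`token.raw`, `matched`, `keyword`, `score`, `strategy`). The real `ParsedToken` and `MatchResult` types may carry more fields or different optionality. `SketchMatch`, `unmatchedTokens`, and `rankKeywords` are hypothetical names, and the sample data is hand-written, not actual astrolex output.

```typescript
// Local stand-ins inferred from the Basic Example; NOT the published types.
interface SketchToken {
  raw: string;
}

interface SketchMatch {
  token: SketchToken;
  matched: boolean;
  keyword?: string;
  score?: number;
  strategy?: string;
}

// Collect raw tokens that matched nothing, e.g. to log unknown vocabulary.
function unmatchedTokens(matches: SketchMatch[]): string[] {
  return matches.filter((m) => !m.matched).map((m) => m.token.raw);
}

// Rebuild a bestKeywords-style list: unique keywords, highest score first.
function rankKeywords(matches: SketchMatch[]): string[] {
  const bestScore: { [keyword: string]: number } = {};
  for (const m of matches) {
    if (!m.matched || m.keyword === undefined) continue;
    const score = m.score ?? 0;
    if (!(m.keyword in bestScore) || score > bestScore[m.keyword]) {
      bestScore[m.keyword] = score;
    }
  }
  return Object.keys(bestScore).sort((a, b) => bestScore[b] - bestScore[a]);
}

// Hand-written sample values for illustration only.
const sampleMatches: SketchMatch[] = [
  { token: { raw: "pome" }, matched: true, keyword: "pomme", score: 0.8, strategy: "fuzzy" },
  { token: { raw: "xyz" }, matched: false },
  { token: { raw: "salade" }, matched: true, keyword: "salade", score: 1, strategy: "exact" },
];

console.log(rankKeywords(sampleMatches));    // ["salade", "pomme"]
console.log(unmatchedTokens(sampleMatches)); // ["xyz"]
```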

`SearchEngine.getKeywords(): string[]`

Returns the list of original keywords provided at construction.

Matching Strategies

The engine tries multiple strategies in order for each token:

Exact match: Normalized token exactly equals a keyword
StartsWith: Keyword starts with the token (useful for partial input such as "tom" -> "tomate")
Contains: Keyword contains the token
Fuzzy/Levenshtein: Uses edit distance to catch typos (e.g., "saladee" -> "salade")

The best matching keyword (highest score) is selected for each token.
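The strategy order described above can be sketched as a small cascade. This is an illustrative reimplementation, not the package's source: the fixed scores for `startsWith` (0.9) and `contains` (0.8), the fuzzy scoring formula, and every function name here are assumptions.

```typescript
type SketchStrategy = "exact" | "startsWith" | "contains" | "fuzzy";

interface SketchCandidate {
  keyword: string;
  strategy: SketchStrategy;
  score: number;
}

// Standard dynamic-programming Levenshtein distance, keeping two rows.
function levenshteinSketch(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Try the strategies in the documented order against a single keyword.
// Both arguments are assumed to be already normalized.
function matchOne(token: string, keyword: string, maxDistance: number): SketchCandidate | null {
  if (token === keyword) return { keyword, strategy: "exact", score: 1 };
  if (keyword.startsWith(token)) return { keyword, strategy: "startsWith", score: 0.9 }; // assumed score
  if (keyword.includes(token)) return { keyword, strategy: "contains", score: 0.8 };     // assumed score
  const distance = levenshteinSketch(token, keyword);
  if (distance > maxDistance) return null;
  // ASSUMPTION: fuzzy score = 1 - distance / length of the longer string.
  return { keyword, strategy: "fuzzy", score: 1 - distance / Math.max(token.length, keyword.length) };
}

// Keep the highest-scoring candidate across the whole dictionary.
function bestMatch(token: string, keywords: string[], maxDistance = 2): SketchCandidate | null {
  if (token === "") return null; // an empty token would trivially prefix every keyword
  let best: SketchCandidate | null = null;
  for (const keyword of keywords) {
    const candidate = matchOne(token, keyword, maxDistance);
    if (candidate !== null && (best === null || candidate.score > best.score)) {
      best = candidate;
    }
  }
  return best;
}

const vocabulary = ["tomate", "salade", "pomme"];
console.log(bestMatch("tom", vocabulary));     // startsWith match on "tomate"
console.log(bestMatch("saladee", vocabulary)); // fuzzy match on "salade"
console.log(bestMatch("xyzxyz", vocabulary));  // null
```

In this sketch, ties keep the first keyword encountered, so dictionary order acts as the tie-breaker. Whether astrolex behaves the same way is not stated in this README.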

Integration in a Dockerized Backend

This package is designed to be used as a dependency in your backend application. It runs inside your backend container, not in its own container.

Example Dockerfile

FROM node:20-alpine
WORKDIR /app

# Copy package files
COPY package*.json ./

# Install dependencies (including this package)
RUN npm install

# Copy your application code
COPY . .

# Build your application
RUN npm run build

# Run your application
CMD ["node", "dist/main.js"]

Usage in NestJS

// app.module.ts
import { Module } from '@nestjs/common';
import { createSearchEngine } from 'astrolex';

@Module({
  providers: [
    {
      provide: 'KEYWORD_ENGINE',
      useFactory: () => createSearchEngine(['tomate', 'salade', 'pomme']),
    },
  ],
})
export class AppModule {}

// some.service.ts
import { Inject, Injectable } from '@nestjs/common';
import { SearchEngine } from 'astrolex';

@Injectable()
export class SomeService {
  constructor(@Inject('KEYWORD_ENGINE') private readonly engine: SearchEngine) {}

  searchUserInput(input: string) {
    return this.engine.parse(input);
  }
}

Development

# Install dependencies
npm install

# Build
npm run build

# Run tests
npm test

# Watch mode for tests
npm run test:watch

License

MIT
