astrolex
v1.0.0
Fuzzy keyword extraction from a controlled vocabulary
A small, framework-agnostic Node.js package, written in TypeScript, that extracts keywords from user input by fuzzy-matching each token against a predefined dictionary of known terms.
Features
- Multi-strategy matching: Exact match, startsWith, contains, and fuzzy Levenshtein distance matching
- Normalization: Handles accents/diacritics, case-insensitive matching, and common separators
- Multi-language support: Works with French and English keywords (and can be extended)
- Factory pattern: Create an engine once with a dictionary, then parse multiple inputs efficiently
- Type-safe: Full TypeScript support with strict types
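The normalization step listed above can be illustrated with a minimal sketch. The helper names `normalizeText` and `tokenize` and the exact separator set are assumptions made for illustration; they are not taken from astrolex's source and the package may normalize differently.

```typescript
// Hypothetical helpers: NOT part of astrolex's documented API.
// They only illustrate accent stripping, case folding, and separator handling.
function normalizeText(input: string): string {
  return input
    .normalize("NFD")                 // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks (accents/diacritics)
    .toLowerCase()                    // case-insensitive matching
    .replace(/[-_'’.,;:\/]+/g, " ")   // assumed separator set, treated as spaces
    .replace(/\s+/g, " ")             // collapse runs of whitespace
    .trim();
}

function tokenize(input: string): string[] {
  const normalized = normalizeText(input);
  return normalized === "" ? [] : normalized.split(" ");
}

console.log(normalizeText("Pâté-Crème"));      // "pate creme"
console.log(tokenize("  Salade_de  TOMATE ")); // ["salade", "de", "tomate"]
```

Normalizing both the dictionary and the input the same way is what would let "Crème" and "creme" compare as equal before any fuzzy matching runs.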
Installation
npm install astrolex
Usage
Basic Example
import { createSearchEngine } from "astrolex";
const engine = createSearchEngine(["tomate", "salade", "pomme", "tomato", "apple"]);
const result = engine.parse("saladee de pome");
console.log(result.bestKeywords); // ["salade", "pomme"]
// Access detailed match information
result.matches.forEach((match) => {
  if (match.matched) {
    console.log(`${match.token.raw} -> ${match.keyword} (score: ${match.score}, strategy: ${match.strategy})`);
  }
});
With Options
const engine = createSearchEngine(
  ["tomate", "salade", "pomme"],
  {
    maxDistance: 2, // Maximum Levenshtein distance (default: 2)
    minSimilarityScore: 0.5, // Minimum similarity score, from 0 to 1 (default: 0.5)
    languageHint: "fr" // Optional hint (currently unused, reserved for future use)
  }
);
API
createSearchEngine(keywords: string[], options?: SearchEngineOptions): SearchEngine
Creates a search engine instance with a predefined dictionary of keywords.
Parameters:
- keywords: Array of known keywords (French or English)
- options: Optional configuration (a SearchEngineOptions object; its fields are shown under With Options above)
Returns: A SearchEngine instance
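As a rough illustration of how the optional configuration and its documented defaults could be combined, here is a small sketch. The `SearchEngineOptions` shape is inferred from the With Options example, and `resolveOptions` with its validation is a hypothetical helper, not something the package is known to expose.

```typescript
// Assumed shape, inferred from the "With Options" example; the real type may differ.
interface SearchEngineOptions {
  maxDistance?: number;
  minSimilarityScore?: number;
  languageHint?: string;
}

interface ResolvedOptions {
  maxDistance: number;
  minSimilarityScore: number;
  languageHint?: string;
}

// Hypothetical helper: fills in the defaults stated in this README (2 and 0.5).
// The range checks are an assumption, added only to make the sketch safer to reuse.
function resolveOptions(options: SearchEngineOptions = {}): ResolvedOptions {
  const maxDistance = options.maxDistance ?? 2;
  const minSimilarityScore = options.minSimilarityScore ?? 0.5;
  if (!Number.isInteger(maxDistance) || maxDistance < 0) {
    throw new RangeError("maxDistance must be a non-negative integer");
  }
  if (minSimilarityScore < 0 || minSimilarityScore > 1) {
    throw new RangeError("minSimilarityScore must be between 0 and 1");
  }
  return { maxDistance, minSimilarityScore, languageHint: options.languageHint };
}

console.log(resolveOptions({ maxDistance: 1 }));
```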
SearchEngine.parse(input: string): ParseOutput
Parses user input and returns structured match results.
Returns:
{
  input: string; // Original input
  tokens: ParsedToken[]; // Tokenized input
  matches: MatchResult[]; // Match results for each token
  bestKeywords: string[]; // Unique matched keywords, sorted by score (desc)
}
SearchEngine.getKeywords(): string[]
Returns the list of original keywords provided at construction.
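To make the `bestKeywords` field of `parse` more concrete, the sketch below derives a unique, score-sorted keyword list from a list of matches. The `MatchResult` shape is assumed from the fields used in the Basic Example, `pickBestKeywords` is a hypothetical name, and the sample scores are invented for illustration.

```typescript
// Assumed shape, based on the fields read in the Basic Example; the real type may differ.
interface MatchResult {
  token: { raw: string };
  matched: boolean;
  keyword?: string;
  score: number;
  strategy?: string;
}

// Hypothetical helper: keeps the highest score per keyword, then sorts descending.
function pickBestKeywords(matches: MatchResult[]): string[] {
  const bestScore = new Map<string, number>();
  for (const m of matches) {
    if (!m.matched || m.keyword === undefined) continue; // skip unmatched tokens
    const previous = bestScore.get(m.keyword);
    if (previous === undefined || m.score > previous) {
      bestScore.set(m.keyword, m.score);
    }
  }
  return Array.from(bestScore.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([keyword]) => keyword);
}

// Invented sample data, not real astrolex output.
const sampleMatches: MatchResult[] = [
  { token: { raw: "pome" }, matched: true, keyword: "pomme", score: 0.8, strategy: "fuzzy" },
  { token: { raw: "de" }, matched: false, score: 0 },
  { token: { raw: "saladee" }, matched: true, keyword: "salade", score: 0.86, strategy: "fuzzy" },
  { token: { raw: "salade" }, matched: true, keyword: "salade", score: 1, strategy: "exact" },
];

console.log(pickBestKeywords(sampleMatches)); // ["salade", "pomme"]
```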
Matching Strategies
The engine tries multiple strategies in order for each token:
- Exact match: Normalized token exactly equals a keyword
- StartsWith: Keyword starts with the token (useful for abbreviations like "tom" -> "tomate")
- Contains: Keyword contains the token
- Fuzzy/Levenshtein: Uses edit distance to catch typos (e.g., "saladee" -> "salade")
The best matching keyword (highest score) is selected for each token.
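The strategy order above can be sketched as follows. This is an illustrative approximation only: the scoring formulas, the `matchToken` and `scoreCandidate` names, and the way thresholds are applied are all assumptions, and astrolex's actual scoring may differ. Tokens and keywords are assumed to be already normalized.

```typescript
type Strategy = "exact" | "startsWith" | "contains" | "fuzzy";

interface Candidate {
  keyword: string;
  score: number;
  strategy: Strategy;
}

// Classic dynamic-programming Levenshtein distance, keeping only two rows.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical scoring: tries the strategies in the documented order for one keyword.
function scoreCandidate(token: string, keyword: string, maxDistance: number): Candidate | null {
  if (token === keyword) return { keyword, score: 1, strategy: "exact" };
  const coverage = token.length / keyword.length; // how much of the keyword the token covers
  if (keyword.startsWith(token)) return { keyword, score: coverage, strategy: "startsWith" };
  if (keyword.includes(token)) return { keyword, score: 0.9 * coverage, strategy: "contains" };
  const distance = levenshtein(token, keyword);
  if (distance > maxDistance) return null;
  const similarity = 1 - distance / Math.max(token.length, keyword.length);
  return { keyword, score: similarity, strategy: "fuzzy" };
}

// Picks the highest-scoring keyword for a token, or null if nothing clears the threshold.
function matchToken(
  token: string,
  keywords: string[],
  maxDistance = 2,
  minSimilarityScore = 0.5,
): Candidate | null {
  let best: Candidate | null = null;
  for (const keyword of keywords) {
    const candidate = scoreCandidate(token, keyword, maxDistance);
    if (candidate === null || candidate.score < minSimilarityScore) continue;
    if (best === null || candidate.score > best.score) best = candidate;
  }
  return best;
}

const dictionary = ["tomate", "salade", "pomme"];
console.log(matchToken("saladee", dictionary)?.strategy); // "fuzzy"
console.log(matchToken("tom", dictionary)?.keyword);      // "tomate"
```

Under these assumed formulas a very short token such as "de" is contained in "salade" but scores below the 0.5 threshold, so it is rejected rather than reported as a weak match.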
Integration in a Dockerized Backend
This package is designed to be used as a dependency in your backend application. It runs inside your backend container, not in its own container.
Example Dockerfile
FROM node:20-alpine
WORKDIR /app
# Copy package files
COPY package*.json ./
# Install dependencies (including this package)
RUN npm install
# Copy your application code
COPY . .
# Build your application
RUN npm run build
# Run your application
CMD ["node", "dist/main.js"]
Usage in NestJS
// app.module.ts
import { Module } from '@nestjs/common';
import { createSearchEngine } from 'astrolex';
@Module({
  providers: [
    {
      provide: 'KEYWORD_ENGINE',
      useFactory: () => createSearchEngine(['tomate', 'salade', 'pomme']),
    },
  ],
})
export class AppModule {}
// some.service.ts
import { Inject, Injectable } from '@nestjs/common';
import { SearchEngine } from 'astrolex';
@Injectable()
export class SomeService {
  constructor(@Inject('KEYWORD_ENGINE') private engine: SearchEngine) {}
  searchUserInput(input: string) {
    return this.engine.parse(input);
  }
}
Development
# Install dependencies
npm install
# Build
npm run build
# Run tests
npm test
# Watch mode for tests
npm run test:watch
License
MIT
