ndx-compat

v1.0.0

Published

2 years ago

Backward-compatible API for a lightweight full-text searching and indexing library ndx.

Downloads

0High
0Medium
0Low

localvoid

ndx

ndx ·

Backward-compatible API for ndx < 1.0.0.

Documentation

Creating a new Document Index

DocumentIndex<I, D>(options?: DocumentIndexOptions)

Document Index is a main object that stores all internal statistics and Inverted Index for documents.

Parametric Types

I is a type of document IDs.
D is a type of documents.

Options

/**
 * BM25 Ranking function constants.
 */
interface BM25Options {
  /**
   * Controls non-linear term frequency normalization (saturation).
   *
   * Default value: 1.2
   */
  k1?: number;

  /**
   * Controls to what degree document length normalizes tf values.
   *
   * Default value: 0.75
   */
  b?: number;
}

interface DocumentIndexOptions {
  /**
   * Tokenizer is a function that breaks a text into words, phrases, symbols, or other meaningful elements called
   * tokens.
   *
   * Default tokenizer breaks words on spaces, tabs, line feeds and assumes that contiguous nonwhitespace characters
   * form a single token.
   */
  tokenizer?: (query: string) => string[];
  /**
   * Filter is a function that processes tokens and returns terms, terms are used in Inverted Index to index documents.
   *
   * Default filter transforms all characters to lower case and removes all non-word characters at the beginning and
   * the end of a term.
   */
  filter?: (term: string) => string;
  /**
   * BM25 Ranking function constants.
   */
  bm25?: BM25Options;
}

Example

/**
 * Creating a simple index with default options.
 */
const default = new DocumentIndex();

/**
 * Creating an index with changed BM25 constants.
 */
const index = new DocumentIndex({
  bm25: {
    k1: 1.3,
    b: 0.8,
  },
});

Adding a text field to an index

addField(fieldName: string, options?: FieldOptions) => void

The first step after creating a document index should be registering all text fields. Document Index is indexing only registered text fields in documents.

Each field can have its own score boosting factor, score boosting factor will be used to boost score when ranking documents.

Options

interface FieldOptions<D> {
  /**
   * Getter is a function that will be used to get value for this field. If getter function isn't specified, field name
   * will be used to get value.
   */
  getter: (doc: D) => string;

  /**
   * Score boosting factor.
   */
  boost: number;
}

Example

const index = new DocumentIndex();

/**
 * Add a "title" field with a score boosting factor "1.4".
 */
index.addField("title", { boost: 1.4 });

/**
 * Add a "descriptionr" field.
 */
index.addField("description");

/**
 * Add a "body" field with a custom getter.
 */
index.addField("body", { getter: (doc) => doc.body });

Adding a document to an index

When all fields are registered, tokenizer and filter is set, Document Index is ready to index documents.

Document Index doesn't store documents internally to reduce memory footprint, all results will contain a docId that is associated with an added document.

add(docId: I, document: D) => void

Example

const index = new DocumentIndex();

/**
 * Add a "content" field.
 */
index.addField("content");

const doc = {
  "id": "12345",
  "content": "Lorem ipsum",
};

/**
 * Add a document with `doc.id` to an index.
 */
index.add(doc.id, doc);

Removing a document from an index

Remove method requires to know just document id to remove it from an index. When document is removed from an index, it is actually marked as removed, and when searching all removed documents will be ignored. vacuum() method is used to completely remove all data from an index.

remove(docId: I) => void

Example

const index = new DocumentIndex();

/**
 * Add a "content" field.
 */
index.addField("content");

const doc = {
  "id": "12345",
  "content": "Lorem ipsum",
};

/**
 * Add a document with `doc.id` to an index.
 */
index.add(doc.id, doc);

/**
 * Remove a document from an index.
 */
index.remove(doc.id);

Search with a free text query

Perform a search query with a free text, query will be preprocessed in the same way as all text fields with a registered tokenizer and filter. Each token separator will work as a disjunction operator. All terms will be expanded to find more documents, documents with expanded terms will have a lower score than documents with exact terms.

search(query: string) => SearchResult[]

interface SearchResult<I> {
  docId: I;
  score: number;
}

Example

const index = new DocumentIndex();

/**
 * Add a "content" field.
 */
index.addField("content");

const doc1 = {
  "id": "1",
  "content": "Lorem ipsum dolor",
};
const doc2 = {
  "id": "2",
  "content": "Lorem ipsum",
};

/**
 * Add two documents to an index.
 */
index.add(doc1.id, doc1);
index.add(doc2.id, doc2);

/**
 * Perform a search query.
 */
index.search("Lorem");
// => [{ docId: "2" , score: ... }, { docId: "1", score: ... } ]
//
// document with an id `"2"` is ranked higher because it has a `"content"` field with a less number of terms than
// document with an id `"1"`.

index.search("dolor");
// => [{ docId: "1" }]

Extending a term

Extend a term with all possible combinations starting from a term that is registered in an index.

extendTerm(term: string) => string[]

Example

const index = new DocumentIndex();

index.addField("content");

const doc1 = {
  "id": "1",
  "content": "abc abcde",
};
const doc2 = {
  "id": "2",
  "content": "ab de",
};

index.add(doc1.id, doc1);
index.add(doc2.id, doc2);

/**
 * Extend a term with all possible combinations starting from `"a"`.
 */
index.extendTerm("a");
// => ["ab", "abc", "abcde"]

index.extendTerm("abc");
// => ["abc", "abcde"]

index.extendTerm("de");
// => ["de"]

Converting query to terms

Convert a query to an array of terms with the same tokenizer and filters that are used in a DocumentIndex.

Converting queries are useful for implementing search highlighting feature.

queryToTerms(query: string) => string[]

Example

const index = new DocumentIndex();

index.addField("content");

const doc1 = {
  "id": "1",
  "content": "abc abcde",
};
const doc2 = {
  "id": "2",
  "content": "ab de",
};

index.add(doc1.id, doc1);
index.add(doc2.id, doc2);

/**
 * Convert a query to an array of terms.
 */
index.queryToTerms("a d");
// => ["ab", "abc", "abcde", "de"]

Custom indexing options

The library doesn't serialize functions. So if you use custom tokenizer, filter, field.getter functions, then you will need to pass these functions to deserialize function call. See example in src/tests/serialization.spec.ts test file.

Vacuuming

Vacuuming is a process that will remove all outdated documents from an inverted index.

When search is performed, all outdated documents will be automatically removed for all terms generated from a search query.

vacuum() => void

Example

const index = new DocumentIndex();

index.addField("content");

const doc = {
  "id": "12345",
  "content": "Lorem ipsum",
};

index.add(doc.id, doc);
index.remove(doc.id);

/**
 * Perform a vacuuming, it will remove all removed documents from an Inverted Index.
 */
index.vacuum();

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

ndx ·

Documentation

Creating a new Document Index

Parametric Types

Options

Example

Adding a text field to an index

Options

Example

Adding a document to an index

Example

Removing a document from an index

Example

Search with a free text query

Example

Extending a term

Example

Converting query to terms

Example

Custom indexing options

Vacuuming

Example