# @ralalabs/embed-js
Fast, tiny, zero-dependency TypeScript client for Text Embeddings Inference (TEI).
@ralalabs/embed-js is model-agnostic by default through `embed()` and `embedOne()`, and also includes optional E5-style helpers for `query:` / `passage:` workflows.
## Features
- zero runtime dependencies
- TypeScript-first API
- single and batch embedding methods
- timeout and retry handling
- E5 helper methods for query/passage pipelines
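The retry handling listed above can be pictured with a small sketch. This is a hypothetical illustration of retry semantics, not the library's actual implementation (and it omits the timeout/abort logic):

```ts
// Hypothetical sketch: retry a failing async call `retries` extra times
// before rethrowing the last error. Not the library's real implementation.
async function withRetries<T>(fn: () => Promise<T>, retries: number): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err; // remember the failure and try the next attempt
    }
  }
  throw lastError;
}
```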
## Install

```sh
npm install @ralalabs/embed-js
# or
pnpm add @ralalabs/embed-js
```

## Quick start

```ts
import { EmbeddingsClient } from '@ralalabs/embed-js';

const client = new EmbeddingsClient('http://localhost:5625');
const vector = await client.embedOne('Sherlock | ...');
```

## API
### Core methods
These methods are model-agnostic and do not transform the input.

```ts
embed(texts: string | string[], opts?: EmbedOptions): Promise<number[][]>
embedOne(text: string, opts?: EmbedOptions): Promise<number[]>
info(): Promise<TeiModelInfo>
health(): Promise<boolean>
```

### E5 helper methods

These helpers preserve the E5 `query:` / `passage:` convention.

```ts
embedQuery(text: string, opts?: EmbedOptions): Promise<number[]>
embedQueries(text: string | string[], opts?: EmbedOptions): Promise<number[] | number[][]>
embedPassage(text: string, opts?: EmbedOptions): Promise<number[]>
embedPassages(text: string | string[], opts?: EmbedOptions): Promise<number[] | number[][]>
```

## Model conventions
### Plain embedding workflow

For models that embed raw text directly, use the core methods:

```ts
const similarityVec = await client.embedOne(similarityText);
const searchVec = await client.embedOne(searchText);
const queryVec = await client.embedOne(userInput);
```

This is the default, model-agnostic way to use the library.
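The returned vectors are plain `number[]` arrays, so downstream similarity math needs no extra dependencies. A self-contained cosine-similarity sketch (not part of the library):

```ts
// Cosine similarity between two embedding vectors, as returned by
// embedOne()/embed(). Assumes equal length and non-zero norms.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```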
### E5 workflow

For E5-style models, the following conventions apply:

| Use case | Prefix |
| --- | --- |
| query / similarity text | `query:` |
| indexed retrieval passage | `passage:` |

The helper methods apply these prefixes automatically:

```ts
const similarityVec = await client.embedQuery(similarityText);
const searchVec = await client.embedPassage(searchText);
const queryVec = await client.embedQuery(userInput);
```

## Examples
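Conceptually, the E5 helpers amount to prepending a prefix before a core embed call. A hypothetical sketch of that transformation (the prefix values below are assumptions for illustration, not read from the library; the real constants are exported as `E5_QUERY_PREFIX` / `E5_PASSAGE_PREFIX`):

```ts
// Illustrative stand-ins for the library's exported prefix constants;
// the exact string values here are an assumption.
const QUERY_PREFIX = 'query: ';
const PASSAGE_PREFIX = 'passage: ';

// embedQuery(text) is conceptually embedOne(QUERY_PREFIX + text), and
// embedPassage(text) is conceptually embedOne(PASSAGE_PREFIX + text).
const toE5Query = (text: string): string => QUERY_PREFIX + text;
const toE5Passage = (text: string): string => PASSAGE_PREFIX + text;
```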
### Batch embedding

```ts
const client = new EmbeddingsClient('http://localhost:5625');

const vectors = await client.embed([
  'Sherlock',
  'House M.D.',
]);
```

### E5 batch helpers
```ts
const queryVectors = await client.embedQueries([
  'crime thriller',
  'medical drama',
]);

const passageVectors = await client.embedPassages([
  'Breaking Bad | crime drama | chemistry teacher becomes meth producer',
  'House M.D. | medical drama | brilliant diagnostician solves unusual cases',
]);
```

### Service endpoints
```ts
const client = new EmbeddingsClient('http://localhost:5625');

await client.health();
const info = await client.info();
console.log(info);
```

## Options
```ts
const client = new EmbeddingsClient('http://localhost:5625', {
  timeout: 30_000,
  retries: 2,
  headers: {
    Authorization: 'Bearer token',
  },
});
```

Each embed method also accepts optional embed settings:

```ts
const vector = await client.embedOne('very long text', {
  truncate: true,
});
```

## Exports
```ts
import {
  EmbeddingsClient,
  E5_QUERY_PREFIX,
  E5_PASSAGE_PREFIX,
} from '@ralalabs/embed-js';

import type {
  EmbeddingsClientOptions,
  EmbedOptions,
  TeiModelInfo,
} from '@ralalabs/embed-js';
```

## Running TEI

Example `docker-compose.yml`:
```yaml
services:
  embeddings:
    image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.9
    container_name: embeddings-api
    ports:
      - '5625:5556'
    command:
      - --model-id
      - BAAI/bge-m3
      - --port
      - '5556'
    healthcheck:
      test: ['CMD', 'curl', '-f', 'http://localhost:5556/health']
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
    volumes:
      - embeddings-cache:/data
    restart: unless-stopped
    mem_limit: 2g
    cpus: 2

volumes:
  embeddings-cache:
```

Start the service:

```sh
docker compose up -d embeddings
```

Verify:

```sh
curl http://localhost:5625/health
curl http://localhost:5625/info
```

## Notes
- Use the same embedding model family for indexing and querying.
- Use `embed()`/`embedOne()` for plain model workflows.
- Use `embedQuery()`/`embedPassage()` only when your model or pipeline expects E5-style prefixes.
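The notes above describe a split pipeline: embed passages once at index time, embed each query at search time, then compare. A self-contained sketch of the comparison step, assuming the vectors have already been computed (not part of the library):

```ts
// Given a query vector and indexed passage vectors (plain number[] arrays,
// like those returned by the client), return passage indices sorted by
// descending cosine similarity.
function rankByCosine(queryVec: number[], passageVecs: number[][]): number[] {
  const cosine = (a: number[], b: number[]): number => {
    let dot = 0;
    let na = 0;
    let nb = 0;
    for (let i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      na += a[i] * a[i];
      nb += b[i] * b[i];
    }
    return dot / (Math.sqrt(na) * Math.sqrt(nb));
  };
  return passageVecs
    .map((vec, index) => ({ index, score: cosine(queryVec, vec) }))
    .sort((x, y) => y.score - x.score)
    .map((entry) => entry.index);
}
```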
## License
MIT
