@hivehub/vectorizer-sdk

v2.2.0

Published

2 months ago

TypeScript SDK for Vectorizer - High-performance vector database

0High
0Medium
0Low

caikpigosso

andrehrf

vector-database semantic-search embeddings hnsw similarity-search typescript hivehub

Vectorizer TypeScript SDK

High-performance TypeScript SDK for Vectorizer vector database.

Package: @hivehub/vectorizer-sdk
Version: 2.2.0

Features

✅ Complete TypeScript Support: Full type safety and IntelliSense
✅ Async/Await: Modern async programming patterns
✅ Multiple Transport Protocols: HTTP/HTTPS and UMICP support
✅ HTTP Client: Native fetch-based HTTP client with robust error handling
✅ UMICP Protocol: High-performance protocol with compression and encryption
✅ Comprehensive Validation: Input validation and error handling
✅ 12 Custom Exceptions: Robust error management
✅ Logging: Configurable logging system
✅ Collection Management: CRUD operations for collections
✅ Vector Operations: Insert, search, update, delete vectors
✅ Semantic Search: Text and vector similarity search
✅ Intelligent Search: AI-powered search with query expansion, MMR diversification, and domain expansion
✅ Semantic Search: Advanced semantic search with reranking and similarity thresholds
✅ Contextual Search: Context-aware search with metadata filtering
✅ Multi-Collection Search: Cross-collection search with intelligent aggregation
✅ Hybrid Search: Combine dense and sparse vectors for improved search quality
✅ Discovery Operations: Collection filtering, query expansion, and intelligent discovery
✅ File Operations: File content retrieval, chunking, project outlines, and related files
✅ Graph Relationships: Automatic relationship discovery, path finding, and edge management
✅ Summarization: Text and context summarization with multiple methods
✅ Workspace Management: Multi-workspace support for project organization
✅ Backup & Restore: Collection backup and restore operations
✅ Batch Operations: Efficient bulk insert, update, delete, and search
✅ Qdrant Compatibility: Full Qdrant 1.14.x REST API compatibility for easy migration
- Snapshots API (create, list, delete, recover)
- Sharding API (create shard keys, distribute data)
- Cluster Management API (status, recovery, peer management, metadata)
- Query API (query, batch query, grouped queries with prefetch)
- Search Groups and Matrix API (grouped results, similarity matrices)
- Named Vectors support (partial)
- Quantization configuration (PQ and Binary)
✅ Embedding Generation: Text embedding support

Installation

npm install @hivehub/vectorizer-sdk

# Or specific version
npm install @hivehub/[email protected]

Quick Start

import { VectorizerClient } from "@hivehub/vectorizer-sdk";

// Create client
const client = new VectorizerClient({
  baseURL: "http://localhost:15001",
  apiKey: "your-api-key-here",
});

// Health check
const health = await client.healthCheck();
console.log("Server status:", health.status);

// Create collection
const collection = await client.createCollection({
  name: "documents",
  dimension: 768,
  similarity_metric: "cosine",
});

// Insert vectors
const vectors = [
  {
    data: [0.1, 0.2, 0.3 /* ... 768 dimensions */],
    metadata: { source: "document1.pdf" },
  },
];

await client.insertVectors("documents", vectors);

// Search vectors
const results = await client.searchVectors("documents", {
  query_vector: [0.1, 0.2, 0.3 /* ... 768 dimensions */],
  limit: 5,
});

// Text search
const textResults = await client.searchText("documents", {
  query: "machine learning algorithms",
  limit: 5,
});

// Generate embeddings
const embedding = await client.embedText({
  text: "machine learning algorithms",
});

// Hybrid search (dense + sparse vectors)
const hybridResults = await client.hybridSearch({
  collection: "documents",
  query: "machine learning",
  query_sparse: {
    indices: [0, 5, 10, 15],
    values: [0.8, 0.6, 0.9, 0.7],
  },
  alpha: 0.7,
  algorithm: "rrf",
  dense_k: 20,
  sparse_k: 20,
  final_k: 10,
});

// Graph Operations (requires graph enabled in collection config)
// List all graph nodes
const nodes = await client.listGraphNodes("documents");
console.log(`Graph has ${nodes.count} nodes`);

// Get neighbors of a node
const neighbors = await client.getGraphNeighbors("documents", "document1");
console.log(`Node has ${neighbors.neighbors.length} neighbors`);

// Find related nodes within 2 hops
const related = await client.findRelatedNodes("documents", "document1", {
  max_hops: 2,
  relationship_type: "SIMILAR_TO",
});
console.log(`Found ${related.related.length} related nodes`);

// Find shortest path between two nodes
const path = await client.findGraphPath({
  collection: "documents",
  source: "document1",
  target: "document2",
});
if (path.found) {
  console.log(`Path found: ${path.path.map(n => n.id).join(" -> ")}`);
}

// Create explicit relationship
const edge = await client.createGraphEdge({
  collection: "documents",
  source: "document1",
  target: "document2",
  relationship_type: "REFERENCES",
  weight: 0.9,
});
console.log(`Created edge: ${edge.edge_id}`);

// Discover SIMILAR_TO edges for entire collection
const discoveryResult = await client.discoverGraphEdges("documents", {
  similarity_threshold: 0.7,
  max_per_node: 10,
});
console.log(`Discovered ${discoveryResult.edges_created} edges`);

// Discover edges for a specific node
const nodeDiscovery = await client.discoverGraphEdgesForNode(
  "documents",
  "document1",
  {
    similarity_threshold: 0.7,
    max_per_node: 10,
  }
);
console.log(`Discovered ${nodeDiscovery.edges_created} edges for node`);

// Get discovery status
const status = await client.getGraphDiscoveryStatus("documents");
console.log(
  `Discovery status: ${status.total_nodes} nodes, ` +
    `${status.total_edges} edges, ` +
    `${status.progress_percentage.toFixed(1)}% complete`
);

// Qdrant-compatible API usage
const qdrantCollections = await client.qdrantListCollections();
const qdrantResults = await client.qdrantSearchPoints(
  "documents",
  embedding.embedding,
  10
);

Configuration

HTTP Configuration (Default)

const client = new VectorizerClient({
  baseURL: "http://localhost:15002", // API base URL
  apiKey: "your-api-key", // API key for authentication
  timeout: 30000, // Request timeout in ms
  headers: {
    // Custom headers
    "User-Agent": "MyApp/1.0",
  },
  logger: {
    // Logger configuration
    level: "info", // debug, info, warn, error
    enabled: true,
  },
});

UMICP Configuration (High Performance)

UMICP (Universal Messaging and Inter-process Communication Protocol) provides significant performance benefits:

Automatic Compression: GZIP, DEFLATE, or LZ4 compression for large payloads
Built-in Encryption: Optional encryption for secure communication
Lower Latency: Optimized binary protocol with checksums
Request Validation: Automatic request/response validation

Using Connection String

const client = new VectorizerClient({
  connectionString: "umicp://localhost:15003",
  apiKey: "your-api-key",
});

Using Explicit Configuration

const client = new VectorizerClient({
  protocol: "umicp",
  apiKey: "your-api-key",
  umicp: {
    host: "localhost",
    port: 15003,
    compression: "gzip", // 'gzip', 'deflate', 'lz4', or 'none'
    encryption: true, // Enable encryption
    priority: "normal", // 'low', 'normal', 'high'
  },
});

When to Use UMICP

Use UMICP when:

Large Payloads: Inserting or searching large batches of vectors
High Throughput: Need maximum performance for production workloads
Secure Communication: Require encryption without TLS overhead
Low Latency: Need minimal protocol overhead

Use HTTP when:

Development: Quick testing and debugging
Firewall Restrictions: Only HTTP/HTTPS allowed
Simple Deployments: No need for custom protocol setup

Protocol Comparison

| Feature | HTTP/HTTPS | UMICP | | ----------- | -------------------- | ---------------------------- | | Compression | Manual (gzip header) | Automatic (GZIP/DEFLATE/LZ4) | | Encryption | TLS required | Built-in optional | | Latency | Standard | Lower | | Firewall | Widely supported | May require configuration | | Debugging | Easy (browser tools) | Requires UMICP tools |

Master/Slave Configuration (Read/Write Separation)

Vectorizer supports Master-Replica replication for high availability and read scaling. The SDK provides automatic routing - writes go to master, reads are distributed across replicas.

Basic Setup

import { VectorizerClient } from "@hivehub/vectorizer-sdk";

// Configure with master and replicas - SDK handles routing automatically
const client = new VectorizerClient({
  hosts: {
    master: "http://master-node:15001",
    replicas: ["http://replica1:15001", "http://replica2:15001"],
  },
  apiKey: "your-api-key",
  readPreference: "replica", // "master" | "replica" | "nearest"
});

// Writes automatically go to master
await client.createCollection({
  name: "documents",
  dimension: 768,
  similarity_metric: "cosine",
});

await client.insertTexts("documents", [
  { id: "doc1", text: "Sample document", metadata: { source: "api" } },
]);

// Reads automatically go to replicas (load balanced)
const results = await client.searchVectors("documents", {
  query: "sample",
  limit: 10,
});

const collections = await client.listCollections();

Read Preferences

| Preference | Description | Use Case | |------------|-------------|----------| | "replica" | Route reads to replicas (round-robin) | Default for high read throughput | | "master" | Route all reads to master | When you need read-your-writes consistency | | "nearest" | Route to the node with lowest latency | Geo-distributed deployments |

Read-Your-Writes Consistency

For operations that need to immediately read what was just written:

// Option 1: Override read preference for specific operation
await client.insertTexts("docs", [newDoc]);
const result = await client.getVector("docs", newDoc.id, { readPreference: "master" });

// Option 2: Use a transaction-like pattern
const result = await client.withMaster(async (masterClient) => {
  await masterClient.insertTexts("docs", [newDoc]);
  return await masterClient.getVector("docs", newDoc.id);
});

Automatic Operation Routing

The SDK automatically classifies operations:

| Operation Type | Routed To | Methods | |---------------|-----------|---------| | Writes | Always Master | insertTexts, insertVectors, updateVector, deleteVector, createCollection, deleteCollection | | Reads | Based on readPreference | searchVectors, getVector, listCollections, intelligentSearch, semanticSearch, hybridSearch |

Standalone Mode (Single Node)

For development or single-node deployments:

// Single node - no replication
const client = new VectorizerClient({
  baseURL: "http://localhost:15001",
  apiKey: "your-api-key",
});

API Reference

Collection Management

// List collections
const collections = await client.listCollections();

// Get collection info
const info = await client.getCollection("documents");

// Create collection
const collection = await client.createCollection({
  name: "documents",
  dimension: 768,
  similarity_metric: "cosine",
  description: "Document embeddings",
});

// Update collection
const updated = await client.updateCollection("documents", {
  description: "Updated description",
});

// Delete collection
await client.deleteCollection("documents");

Vector Operations

// Insert vectors
const vectors = [
  {
    data: [0.1, 0.2, 0.3],
    metadata: { source: "doc1.pdf" },
  },
];
await client.insertVectors("documents", vectors);

// Get vector
const vector = await client.getVector("documents", "vector-id");

// Update vector
const updated = await client.updateVector("documents", "vector-id", {
  metadata: { updated: true },
});

// Delete vector
await client.deleteVector("documents", "vector-id");

// Delete multiple vectors
await client.deleteVectors("documents", ["id1", "id2", "id3"]);

Search Operations

// Vector similarity search
const results = await client.searchVectors("documents", {
  query_vector: [0.1, 0.2, 0.3],
  limit: 10,
  threshold: 0.8,
  include_metadata: true,
});

// Text semantic search
const textResults = await client.searchText("documents", {
  query: "machine learning",
  limit: 10,
  threshold: 0.8,
  include_metadata: true,
  model: "bert-base",
});

Advanced Search Operations

Intelligent Search

AI-powered search with query expansion, MMR diversification, and domain expansion:

const results = await client.intelligentSearch({
  query: "machine learning algorithms",
  collections: ["documents", "research"],
  max_results: 15,
  domain_expansion: true,
  technical_focus: true,
  mmr_enabled: true,
  mmr_lambda: 0.7,
});

Semantic Search

Advanced semantic search with reranking and similarity thresholds:

const results = await client.semanticSearch({
  query: "neural networks",
  collection: "documents",
  max_results: 10,
  semantic_reranking: true,
  similarity_threshold: 0.6,
});

Contextual Search

Context-aware search with metadata filtering:

const results = await client.contextualSearch({
  query: "API documentation",
  collection: "docs",
  context_filters: {
    category: "backend",
    language: "typescript",
  },
  max_results: 10,
});

Multi-Collection Search

Cross-collection search with intelligent aggregation:

const results = await client.multiCollectionSearch({
  query: "authentication",
  collections: ["docs", "code", "tickets"],
  max_total_results: 20,
  max_per_collection: 5,
  cross_collection_reranking: true,
});

Discovery Operations

Filter Collections

Filter collections based on query relevance:

const filtered = await client.filterCollections({
  query: "machine learning",
  min_score: 0.5,
});

Expand Queries

Expand queries with related terms:

const expanded = await client.expandQueries({
  query: "neural networks",
  max_expansions: 5,
});

Discover

Intelligent discovery across collections:

const discovery = await client.discover({
  query: "authentication methods",
  max_results: 10,
});

File Operations

Get File Content

Retrieve file content from collection:

const content = await client.getFileContent({
  collection: "docs",
  file_path: "src/client.ts",
});

List Files

List all files in a collection:

const files = await client.listFilesInCollection({
  collection: "docs",
});

Get File Chunks

Get ordered chunks of a file:

const chunks = await client.getFileChunksOrdered({
  collection: "docs",
  file_path: "README.md",
  chunk_size: 1000,
});

Get Project Outline

Get project structure outline:

const outline = await client.getProjectOutline({
  collection: "codebase",
});

Get Related Files

Find files related to a specific file:

const related = await client.getRelatedFiles({
  collection: "codebase",
  file_path: "src/client.ts",
  max_results: 5,
});

Summarization Operations

Summarize Text

Summarize text using various methods:

const summary = await client.summarizeText({
  text: "Long document text...",
  method: "extractive", // 'extractive', 'abstractive', 'hybrid'
  max_length: 200,
});

Summarize Context

Summarize context with metadata:

const summary = await client.summarizeContext({
  context: "Document context...",
  method: "abstractive",
  focus: "key_points",
});

Workspace Management

Add Workspace

Add a new workspace:

await client.addWorkspace({
  name: "my-project",
  path: "/path/to/project",
});

List Workspaces

List all workspaces:

const workspaces = await client.listWorkspaces();

Remove Workspace

Remove a workspace:

await client.removeWorkspace({
  name: "my-project",
});

Backup Operations

Create Backup

Create a backup of collections:

const backup = await client.createBackup({
  name: "backup-2024-11-24",
});

List Backups

List all available backups:

const backups = await client.listBackups();

Restore Backup

Restore from a backup:

await client.restoreBackup({
  filename: "backup-2024-11-24.vecdb",
});

Embedding Operations

// Generate embeddings
const embedding = await client.embedText({
  text: "machine learning algorithms",
  model: "bert-base",
  parameters: {
    max_length: 512,
    normalize: true,
  },
});

Error Handling

import {
  VectorizerError,
  AuthenticationError,
  CollectionNotFoundError,
  ValidationError,
  NetworkError,
  ServerError,
} from "@hivehub/vectorizer-sdk";

try {
  await client.createCollection({
    name: "documents",
    dimension: 768,
  });
} catch (error) {
  if (error instanceof AuthenticationError) {
    console.error("Authentication failed:", error.message);
  } else if (error instanceof ValidationError) {
    console.error("Validation error:", error.message);
  } else if (error instanceof NetworkError) {
    console.error("Network error:", error.message);
  } else {
    console.error("Unknown error:", error.message);
  }
}

Types

// Vector types
interface Vector {
  id: string;
  data: number[];
  metadata?: Record<string, unknown>;
}

// Collection types
interface Collection {
  name: string;
  dimension: number;
  similarity_metric: "cosine" | "euclidean" | "dot_product";
  description?: string;
  created_at?: Date;
  updated_at?: Date;
}

// Search result types
interface SearchResult {
  id: string;
  score: number;
  data: number[];
  metadata?: Record<string, unknown>;
}

// Client configuration
interface VectorizerClientConfig {
  baseURL?: string;
  wsURL?: string;
  apiKey?: string;
  timeout?: number;
  headers?: Record<string, string>;
  logger?: LoggerConfig;
}

Development

# Install dependencies
npm install

# Build
npm run build

# Watch mode
npm run build:watch

# Test
npm test

# Test with coverage
npm run test:coverage

# Lint
npm run lint

# Lint and fix
npm run lint:fix

License

MIT License - see LICENSE for details.