@hivehub/vectorizer-sdk-js

v2.2.0

Published

2 months ago

JavaScript SDK for Vectorizer - High-performance vector database

caikpigosso

andrehrf

vector-database semantic-search embeddings hnsw similarity-search javascript hivehub

Vectorizer JavaScript SDK

High-performance JavaScript SDK for the Vectorizer vector database.

Package: @hivehub/vectorizer-sdk-js
Version: 2.2.0

Features

✅ Modern JavaScript: ES2020+ support with async/await
✅ Multiple Transport Protocols: HTTP/HTTPS and UMICP support
✅ HTTP Client: Native fetch-based HTTP client with robust error handling
✅ UMICP Protocol: High-performance protocol using @hivellm/umicp SDK
✅ Comprehensive Validation: Input validation with isFinite() checks for Infinity/NaN
✅ 12 Custom Exceptions: Robust error management with consistent error codes
✅ Logging: Configurable logging system
✅ Collection Management: CRUD operations for collections
✅ Vector Operations: Insert, search, update, delete vectors
✅ Semantic Search: Text and vector similarity search
✅ Intelligent Search: AI-powered search with query expansion, MMR diversification, and domain expansion
✅ Advanced Semantic Search: Semantic search with reranking and similarity thresholds
✅ Contextual Search: Context-aware search with metadata filtering
✅ Multi-Collection Search: Cross-collection search with intelligent aggregation
✅ Hybrid Search: Combine dense and sparse vectors for improved search quality
✅ Discovery Operations: Collection filtering, query expansion, and intelligent discovery
✅ File Operations: File content retrieval, chunking, project outlines, and related files
✅ Graph Relationships: Automatic relationship discovery, path finding, and edge management
✅ Summarization: Text and context summarization with multiple methods
✅ Workspace Management: Multi-workspace support for project organization
✅ Backup & Restore: Collection backup and restore operations
✅ Batch Operations: Efficient bulk insert, update, delete, and search
✅ Qdrant Compatibility: Full Qdrant 1.14.x REST API compatibility for easy migration
- Snapshots API (create, list, delete, recover)
- Sharding API (create shard keys, distribute data)
- Cluster Management API (status, recovery, peer management, metadata)
- Query API (query, batch query, grouped queries with prefetch)
- Search Groups and Matrix API (grouped results, similarity matrices)
- Named Vectors support (partial)
- Quantization configuration (PQ and Binary)
✅ Embedding Generation: Text embedding support
✅ Multiple Build Formats: CommonJS, ES Modules, UMD
✅ 100% Test Coverage: Comprehensive test suite with all tests passing
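
The validation feature listed above mentions isFinite() checks for Infinity/NaN. As a rough illustration of what such a check involves, here is a minimal caller-side sketch. `assertFiniteVector` is a hypothetical helper, not an SDK export, and the SDK's own validation may behave differently.

```javascript
// Hypothetical helper (not part of the SDK): reject vectors that contain
// non-numeric, NaN, or infinite components before sending them to the server.
function assertFiniteVector(data, expectedDimension) {
  if (!Array.isArray(data)) {
    throw new TypeError('vector data must be an array');
  }
  if (expectedDimension !== undefined && data.length !== expectedDimension) {
    throw new RangeError(
      `expected ${expectedDimension} dimensions, got ${data.length}`
    );
  }
  data.forEach((value, index) => {
    // Number.isFinite is false for NaN, Infinity, -Infinity, and non-numbers.
    if (!Number.isFinite(value)) {
      throw new RangeError(`component ${index} is not a finite number`);
    }
  });
  return data;
}
```

Calling it before `insertVectors` surfaces bad input locally instead of as a server-side error.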

Installation

npm install @hivehub/vectorizer-sdk-js

Quick Start

import { VectorizerClient } from '@hivehub/vectorizer-sdk-js';

// Create client
const client = new VectorizerClient({
  baseURL: 'http://localhost:15001',
  apiKey: 'your-api-key-here'
});

// Health check
const health = await client.healthCheck();
console.log('Server status:', health.status);

// Create collection
const collection = await client.createCollection({
  name: 'documents',
  dimension: 768,
  similarity_metric: 'cosine'
});

// Insert vectors
const vectors = [{
  data: [0.1, 0.2, 0.3, /* ... 768 dimensions */],
  metadata: { source: 'document1.pdf' }
}];

await client.insertVectors('documents', vectors);

// Search vectors
const results = await client.searchVectors('documents', {
  query_vector: [0.1, 0.2, 0.3, /* ... 768 dimensions */],
  limit: 5
});

// Text search
const textResults = await client.searchText('documents', {
  query: 'machine learning algorithms',
  limit: 5
});

// Intelligent search with multi-query expansion
const intelligentResults = await client.intelligentSearch({
  query: 'machine learning algorithms',
  collections: ['documents', 'research'],
  max_results: 15,
  domain_expansion: true,
  technical_focus: true,
  mmr_enabled: true,
  mmr_lambda: 0.7
});

// Semantic search with reranking
const semanticResults = await client.semanticSearch({
  query: 'neural networks',
  collection: 'documents',
  max_results: 10,
  semantic_reranking: true,
  similarity_threshold: 0.6
});

// Graph Operations (requires graph enabled in collection config)
// List all graph nodes
const nodes = await client.listGraphNodes('documents');
console.log(`Graph has ${nodes.count} nodes`);

// Get neighbors of a node
const neighbors = await client.getGraphNeighbors('documents', 'document1');
console.log(`Node has ${neighbors.neighbors.length} neighbors`);

// Find related nodes within 2 hops
const related = await client.findRelatedNodes('documents', 'document1', {
  max_hops: 2,
  relationship_type: 'SIMILAR_TO'
});
console.log(`Found ${related.related.length} related nodes`);

// Find shortest path between two nodes
const path = await client.findGraphPath({
  collection: 'documents',
  source: 'document1',
  target: 'document2'
});
if (path.found) {
  console.log(`Path found: ${path.path.map(n => n.id).join(' -> ')}`);
}

// Create explicit relationship
const edge = await client.createGraphEdge({
  collection: 'documents',
  source: 'document1',
  target: 'document2',
  relationship_type: 'REFERENCES',
  weight: 0.9
});
console.log(`Created edge: ${edge.edge_id}`);

// Discover SIMILAR_TO edges for entire collection
const discoveryResult = await client.discoverGraphEdges('documents', {
  similarity_threshold: 0.7,
  max_per_node: 10
});
console.log(`Discovered ${discoveryResult.edges_created} edges`);

// Discover edges for a specific node
const nodeDiscovery = await client.discoverGraphEdgesForNode(
  'documents',
  'document1',
  {
    similarity_threshold: 0.7,
    max_per_node: 10
  }
);
console.log(`Discovered ${nodeDiscovery.edges_created} edges for node`);

// Get discovery status
const status = await client.getGraphDiscoveryStatus('documents');
console.log(
  `Discovery status: ${status.total_nodes} nodes, ` +
  `${status.total_edges} edges, ` +
  `${status.progress_percentage.toFixed(1)}% complete`
);

// Contextual search with metadata filtering
const contextualResults = await client.contextualSearch({
  query: 'deep learning',
  collection: 'documents',
  context_filters: {
    category: 'AI',
    language: 'en',
    year: 2023
  },
  max_results: 10,
  context_weight: 0.4
});

// Multi-collection search
const multiResults = await client.multiCollectionSearch({
  query: 'artificial intelligence',
  collections: ['documents', 'research', 'tutorials'],
  max_per_collection: 5,
  max_total_results: 20,
  cross_collection_reranking: true
});

// Generate embeddings
const embedding = await client.embedText({
  text: 'machine learning algorithms'
});

// Hybrid search (dense + sparse vectors)
const hybridResults = await client.hybridSearch({
  collection: 'documents',
  query: 'machine learning',
  query_sparse: {
    indices: [0, 5, 10, 15],
    values: [0.8, 0.6, 0.9, 0.7]
  },
  alpha: 0.7,
  algorithm: 'rrf',
  dense_k: 20,
  sparse_k: 20,
  final_k: 10
});

// Qdrant-compatible API usage
const qdrantCollections = await client.qdrantListCollections();
const qdrantResults = await client.qdrantSearchPoints(
  'documents',
  embedding.embedding,
  10
);
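
The hybrid search call in the Quick Start sets algorithm: 'rrf', which presumably refers to reciprocal rank fusion. The sketch below shows the general technique on two ranked lists of IDs. It is an illustration only: `reciprocalRankFusion` is a hypothetical function, the constant k = 60 is a commonly used default rather than a documented Vectorizer value, and how the server combines `alpha` with RRF is not specified here.

```javascript
// Reciprocal rank fusion: every list contributes 1 / (k + rank) to the
// score of each id it contains, where rank starts at 1.
function reciprocalRankFusion(rankedLists, k = 60, finalK = 10) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first, truncated to the requested result count.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, finalK)
    .map(([id, score]) => ({ id, score }));
}

// Example: fuse a dense ranking with a sparse ranking.
const denseRanking = ['a', 'b', 'c'];
const sparseRanking = ['b', 'd', 'a'];
const fused = reciprocalRankFusion([denseRanking, sparseRanking]);
```

Items that appear near the top of both lists rise to the top of the fused list, which is why RRF needs no score normalization between the dense and sparse retrievers.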

Configuration

HTTP Configuration (Default)

const client = new VectorizerClient({
  baseURL: 'http://localhost:15002',     // API base URL
  apiKey: 'your-api-key',                // API key for authentication
  timeout: 30000,                        // Request timeout in ms
  headers: {                             // Custom headers
    'User-Agent': 'MyApp/1.0'
  },
  logger: {                              // Logger configuration
    level: 'info',                       // debug, info, warn, error
    enabled: true
  }
});

UMICP Configuration (High Performance)

UMICP (Universal Messaging and Inter-process Communication Protocol) is an alternative transport optimized for large payloads. The SDK implements it with the StreamableHTTP transport from the official @hivellm/umicp SDK.

Using Connection String

const client = new VectorizerClient({
  connectionString: 'umicp://localhost:15003',
  apiKey: 'your-api-key'
});

Using Explicit Configuration

const client = new VectorizerClient({
  protocol: 'umicp',
  apiKey: 'your-api-key',
  umicp: {
    host: 'localhost',
    port: 15003,
    timeout: 60000
  }
});

When to Use UMICP

Use UMICP when:

- Large Payloads: Inserting or searching large batches of vectors
- High Throughput: Need maximum performance for production workloads
- Low Latency: Need minimal protocol overhead

Use HTTP when:

- Development: Quick testing and debugging
- Firewall Restrictions: Only HTTP/HTTPS allowed
- Simple Deployments: No need for custom protocol setup

Protocol Comparison

| Feature | HTTP/HTTPS | UMICP |
|---------|-----------|-------|
| Transport | Standard fetch API | StreamableHTTP (from @hivellm/umicp) |
| Performance | Standard | Optimized for large payloads |
| Firewall | Widely supported | May require configuration |
| Debugging | Easy (browser tools) | Requires UMICP tools |
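
One way to act on the trade-offs above is to pick the transport per environment. The sketch below builds constructor options in the two shapes shown earlier (explicit UMICP configuration or an HTTP baseURL). The helper `buildClientOptions` and the `VECTORIZER_*` environment variable names are assumptions made for illustration; they are not defined by the SDK.

```javascript
// Hypothetical helper: derive VectorizerClient options from environment
// variables. The variable names are illustrative, not SDK conventions.
function buildClientOptions(env) {
  const apiKey = env.VECTORIZER_API_KEY;

  if (env.VECTORIZER_PROTOCOL === 'umicp') {
    // Explicit UMICP configuration, as in "Using Explicit Configuration".
    return {
      protocol: 'umicp',
      apiKey,
      umicp: {
        host: env.VECTORIZER_HOST ?? 'localhost',
        port: Number(env.VECTORIZER_UMICP_PORT ?? 15003),
        timeout: 60000
      }
    };
  }

  // Default to plain HTTP, which is easier to debug and firewall-friendly.
  return {
    baseURL: env.VECTORIZER_URL ?? 'http://localhost:15001',
    apiKey
  };
}
```

In Node.js this could be used as `new VectorizerClient(buildClientOptions(process.env))`, so development machines stay on HTTP while production opts into UMICP.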

Master/Replica Configuration (Read/Write Separation)

Vectorizer supports Master-Replica replication for high availability and read scaling. The SDK routes operations automatically: writes go to the master, and reads are distributed across the replicas.

Basic Setup

const { VectorizerClient } = require('@hivehub/vectorizer-sdk-js');

// Configure with master and replicas - SDK handles routing automatically
const client = new VectorizerClient({
  hosts: {
    master: 'http://master-node:15001',
    replicas: ['http://replica1:15001', 'http://replica2:15001']
  },
  apiKey: 'your-api-key',
  readPreference: 'replica' // 'master' | 'replica' | 'nearest'
});

// Writes automatically go to master
await client.createCollection({
  name: 'documents',
  dimension: 768,
  similarity_metric: 'cosine'
});

await client.insertTexts('documents', [
  { id: 'doc1', text: 'Sample document', metadata: { source: 'api' } }
]);

// Reads automatically go to replicas (load balanced)
const results = await client.searchText('documents', {
  query: 'sample',
  limit: 10
});

const collections = await client.listCollections();

Read Preferences

| Preference | Description | Use Case |
|------------|-------------|----------|
| 'replica' | Route reads to replicas (round-robin) | Default for high read throughput |
| 'master' | Route all reads to master | When you need read-your-writes consistency |
| 'nearest' | Route to the node with lowest latency | Geo-distributed deployments |
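
To make the 'replica' and 'master' preferences concrete, here is a minimal sketch of round-robin host selection with a per-call override. It is not the SDK's implementation: `createReadRouter` is a hypothetical function, and 'nearest' is deliberately left out because it would need latency measurements.

```javascript
// Hypothetical sketch of read routing; the SDK's internals may differ.
function createReadRouter({ master, replicas }, readPreference = 'replica') {
  let next = 0; // index of the replica to use for the next read

  return function pickReadHost(override) {
    const preference = override ?? readPreference;

    if (preference === 'master' || replicas.length === 0) {
      return master; // read-your-writes, or no replicas configured
    }
    if (preference === 'replica') {
      const host = replicas[next % replicas.length];
      next += 1; // rotate so consecutive reads hit different replicas
      return host;
    }
    throw new Error(`read preference '${preference}' is not modeled in this sketch`);
  };
}

// Example with the hosts from the basic setup.
const pickReadHost = createReadRouter({
  master: 'http://master-node:15001',
  replicas: ['http://replica1:15001', 'http://replica2:15001']
});
```

Successive calls to `pickReadHost()` alternate between the two replicas, while `pickReadHost('master')` always returns the master host.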

Read-Your-Writes Consistency

For operations that need to immediately read what was just written:

// Option 1: Override read preference for a specific operation
await client.insertTexts('docs', [newDoc]);
const fromMaster = await client.getVector('docs', newDoc.id, { readPreference: 'master' });

// Option 2: Use a transaction-like pattern (an alternative to Option 1)
const result = await client.withMaster(async (masterClient) => {
  await masterClient.insertTexts('docs', [newDoc]);
  return await masterClient.getVector('docs', newDoc.id);
});

Automatic Operation Routing

The SDK automatically classifies operations:

| Operation Type | Routed To | Methods |
|---------------|-----------|---------|
| Writes | Always Master | insertTexts, insertVectors, updateVector, deleteVector, createCollection, deleteCollection |
| Reads | Based on readPreference | searchVectors, getVector, listCollections, intelligentSearch, semanticSearch, hybridSearch |
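
The classification in the table can be pictured as a simple lookup. The sketch below uses only the method names listed there. `routeTargetFor` is a hypothetical function, and sending unlisted methods to the master is an assumption chosen as the conservative default, not documented SDK behavior.

```javascript
// Method names taken from the routing table above.
const WRITE_METHODS = new Set([
  'insertTexts', 'insertVectors', 'updateVector',
  'deleteVector', 'createCollection', 'deleteCollection'
]);
const READ_METHODS = new Set([
  'searchVectors', 'getVector', 'listCollections',
  'intelligentSearch', 'semanticSearch', 'hybridSearch'
]);

// Hypothetical sketch: decide where a call should be sent.
function routeTargetFor(method, readPreference = 'replica') {
  if (WRITE_METHODS.has(method)) {
    return 'master'; // writes always go to the master
  }
  if (READ_METHODS.has(method)) {
    return readPreference; // reads follow the configured preference
  }
  // Assumption: anything unclassified is treated like a write.
  return 'master';
}
```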

Standalone Mode (Single Node)

For development or single-node deployments:

// Single node - no replication
const client = new VectorizerClient({
  baseURL: 'http://localhost:15001',
  apiKey: 'your-api-key'
});

API Reference

Collection Management

// List collections
const collections = await client.listCollections();

// Get collection info
const info = await client.getCollection('documents');

// Create collection
const collection = await client.createCollection({
  name: 'documents',
  dimension: 768,
  similarity_metric: 'cosine',
  description: 'Document embeddings'
});

// Update collection
const updated = await client.updateCollection('documents', {
  description: 'Updated description'
});

// Delete collection
await client.deleteCollection('documents');

Vector Operations

// Insert vectors
const vectors = [{
  data: [0.1, 0.2, 0.3],
  metadata: { source: 'doc1.pdf' }
}];
await client.insertVectors('documents', vectors);

// Get vector
const vector = await client.getVector('documents', 'vector-id');

// Update vector
const updated = await client.updateVector('documents', 'vector-id', {
  metadata: { updated: true }
});

// Delete vector
await client.deleteVector('documents', 'vector-id');

// Delete multiple vectors
await client.deleteVectors('documents', ['id1', 'id2', 'id3']);

Search Operations

// Vector similarity search
const results = await client.searchVectors('documents', {
  query_vector: [0.1, 0.2, 0.3],
  limit: 10,
  threshold: 0.8,
  include_metadata: true
});

// Text semantic search
const textResults = await client.searchText('documents', {
  query: 'machine learning',
  limit: 10,
  threshold: 0.8,
  include_metadata: true,
  model: 'bert-base'
});
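
For collections created with similarity_metric: 'cosine', the threshold option presumably acts as a minimum similarity score. The sketch below shows what that filtering means in plain JavaScript, assuming that interpretation. `cosineSimilarity` and `filterByThreshold` are hypothetical local functions, not SDK calls, and the server's actual scoring may differ.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new RangeError('vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // a zero vector has no direction; treat as dissimilar
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keep only candidates whose score reaches the threshold, best first.
function filterByThreshold(queryVector, candidates, threshold) {
  return candidates
    .map((c) => ({ id: c.id, score: cosineSimilarity(queryVector, c.data) }))
    .filter((r) => r.score >= threshold)
    .sort((x, y) => y.score - x.score);
}
```

With a threshold of 0.8, a candidate pointing in nearly the same direction as the query is kept, while an orthogonal one (score 0) is dropped.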

Embedding Operations

// Generate embeddings
const embedding = await client.embedText({
  text: 'machine learning algorithms',
  model: 'bert-base',
  parameters: {
    max_length: 512,
    normalize: true
  }
});

Error Handling

import {
  VectorizerError,
  AuthenticationError,
  CollectionNotFoundError,
  ValidationError,
  NetworkError,
  ServerError
} from '@hivehub/vectorizer-sdk-js';

try {
  await client.createCollection({
    name: 'documents',
    dimension: 768
  });
} catch (error) {
  if (error instanceof AuthenticationError) {
    console.error('Authentication failed:', error.message);
  } else if (error instanceof ValidationError) {
    console.error('Validation error:', error.message);
  } else if (error instanceof NetworkError) {
    console.error('Network error:', error.message);
  } else {
    console.error('Unknown error:', error.message);
  }
}
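
Because the exception classes distinguish transient failures (such as NetworkError) from permanent ones (such as ValidationError), callers can retry selectively. The following is a minimal sketch of that idea. `withRetry` and its options are hypothetical and not provided by the SDK; the caller supplies the predicate that decides which errors are worth retrying.

```javascript
// Hypothetical helper: retry an async operation with exponential backoff.
async function withRetry(operation, options = {}) {
  const { retries = 3, baseDelayMs = 200, isRetryable = () => true } = options;
  let attempt = 0;

  for (;;) {
    try {
      return await operation();
    } catch (error) {
      // Give up when out of attempts or when the error is not transient.
      if (attempt >= retries || !isRetryable(error)) {
        throw error;
      }
      const delayMs = baseDelayMs * 2 ** attempt; // 200, 400, 800, ...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
      attempt += 1;
    }
  }
}
```

With the SDK imported, a call might look like `withRetry(() => client.searchText('documents', { query: 'ml', limit: 5 }), { isRetryable: (e) => e instanceof NetworkError })`, so validation and authentication errors still fail immediately.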

Build Formats

The SDK is available in multiple formats:

- CommonJS: dist/index.js - For Node.js
- ES Modules: dist/index.esm.js - For modern bundlers
- UMD: dist/index.umd.js - For browsers
- UMD Minified: dist/index.umd.min.js - For production

Node.js (CommonJS)

const { VectorizerClient } = require('@hivehub/vectorizer-sdk-js');

ES Modules

import { VectorizerClient } from '@hivehub/vectorizer-sdk-js';

Browser (UMD)

<script src="https://unpkg.com/@hivehub/vectorizer-sdk-js/dist/index.umd.min.js"></script>
<script>
  const client = new VectorizerClient.VectorizerClient({
    baseURL: 'http://localhost:15001'
  });
</script>

Development

# Install dependencies
npm install

# Build
npm run build

# Watch mode
npm run build:watch

# Test
npm test

# Test with coverage
npm run test:coverage

# Lint
npm run lint

# Lint and fix
npm run lint:fix

License

MIT License - see LICENSE for details.