CortexDB TypeScript SDK
██████╗  ██████╗  ██████╗  ██████╗ ██████╗ 
██╔══██╗██╔═══██╗██╔═══██╗██╔═══██╗██╔══██╗
██║  ██║██║   ██║██║   ██║██║   ██║██████╔╝
██║  ██║██║   ██║██║   ██║██║   ██║██╔══██╗
██████╔╝╚██████╔╝╚██████╔╝╚██████╔╝██║  ██║
╚═════╝  ╚═════╝  ╚═════╝  ╚═════╝ ╚═╝  ╚═╝
Official TypeScript/JavaScript SDK for CortexDB
What is CortexDB?
CortexDB is a multi-modal RAG (Retrieval Augmented Generation) platform that combines traditional database capabilities with vector search and advanced document processing. It enables you to:
- Store structured and unstructured data in a unified database
- Automatically extract text from documents (PDF, DOCX, XLSX) using Docling
- Generate embeddings for semantic search using various providers (OpenAI, Gemini, etc.)
- Perform hybrid search combining filters with vector similarity
- Build RAG applications with automatic chunking and vectorization
CortexDB handles the complex infrastructure of vector databases (Qdrant), object storage (MinIO), and traditional databases (PostgreSQL) behind a simple API.
Features
- Multi-modal document processing: Upload PDFs, DOCX, XLSX files and automatically extract text with OCR fallback
- Semantic search: Vector-based search using embeddings from OpenAI, Gemini, or custom providers
- Automatic chunking: Smart text splitting optimized for RAG applications
- Flexible schema: Define collections with typed fields (string, number, boolean, file, array)
- Hybrid queries: Combine exact filters with semantic search
- Storage control: Choose where each field is stored (PostgreSQL, Qdrant, MinIO)
- Type-safe: Full TypeScript support with comprehensive type definitions
- Modern API: Async/await using native fetch (Node.js 18+)
- Infra management: Database (client.databases) and embedding provider (client.embeddingProviders) APIs built-in
- 🆕 TypeScript Decorators: Define schemas using decorators (like TypeORM) with full IDE support - see Schema Decorators Guide
Installation
npm install @dooor-ai/cortexdb
Or with yarn:
yarn add @dooor-ai/cortexdb
Or with pnpm:
pnpm add @dooor-ai/cortexdb
Quick Start
import { CortexClient, FieldType, StoreLocation } from '@dooor-ai/cortexdb';
async function main() {
// Initialize with database in connection string
const client = new CortexClient('cortexdb://my-api-key@localhost:8000/production');
// Create a collection with vectorization enabled
await client.collections.create(
'documents',
[
{ name: 'title', type: FieldType.STRING },
{ name: 'content', type: FieldType.TEXT, vectorize: true },
{ name: 'published_at', type: FieldType.DATETIME, store_in: [StoreLocation.POSTGRES] }
],
'your-embedding-provider-id' // Required when vectorize=true
// database parameter is optional here since we set 'production' as default
);
// Create a record
const record = await client.records.create('documents', {
title: 'Introduction to AI',
content: 'Artificial intelligence is transforming how we build software...'
});
// Semantic search - finds relevant content by meaning, not just keywords
const results = await client.records.search(
'documents',
'How is AI changing software development?',
undefined, // filters
10 // limit - database parameter optional since we have default
);
results.results.forEach(result => {
console.log(`Score: ${result.score.toFixed(4)}`);
console.log(`Title: ${result.record.data.title}`);
console.log(`Content: ${result.record.data.content}\n`);
});
await client.close();
}
main();
Project-Specific Typing
The SDK becomes fully type-safe once you apply your YAML schema with the Dooor CLI:
npx dooor schema apply # reads dooor/schemas by default and generates types in dooor/generated/
This command creates dooor/generated/cortex-schema.ts and automatically augments the SDK types. After the file exists in your project, you can keep importing CortexClient from @dooor-ai/cortexdb; TypeScript will infer the fields/collections defined in your YAML. Invalid field names or missing required properties inside client.records.create('my_collection', {...}) now trigger compile-time errors, Prisma-style.
If you need an explicit factory, the generated file also exports createCortexClient() and TypedCortexClient helpers.
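For example, a minimal sketch of the factory route, assuming createCortexClient() accepts the same connection string as the CortexClient constructor (check the generated file for the exact signature):
import { createCortexClient } from '../dooor/generated/cortex-schema';
// Hypothetical usage: the factory returns a client already bound to your generated schema
const typedClient = createCortexClient(process.env.CORTEXDB_CONNECTION!);
// Field names come from your own YAML schema (tool_calls is the example used below)
await typedClient.records.create('tool_calls', {
  chatId: 'chat-123',
  description: 'RAG invocation summary',
  createdAt: new Date().toISOString(),
});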
ℹ️ The CLI also drops a lightweight .d.ts shim in node_modules/@dooor-ai/cortexdb/generated/schema.d.ts, so TypeScript picks up your schema automatically; there is no need to tweak tsconfig.json.
Prisma-like Records Delegates
Once the schema is generated, you can call collections with property access instead of passing strings:
// Fully typed
const record = await client.records.tool_calls.create({
chatId: "chat-123",
description: "RAG invocation summary",
createdAt: new Date().toISOString(),
});
// String form still available when you need something dynamic
await client.records.create("tool_calls", {
chatId,
description,
createdAt,
});
Usage
Initialize Client
import { CortexClient } from '@dooor-ai/cortexdb';
// Using connection string with database (recommended)
const client = new CortexClient('cortexdb://my-api-key@localhost:8000/production');
// Without database in connection string (must pass database to each method)
const client = new CortexClient('cortexdb://my-api-key@localhost:8000');
// Production (HTTPS auto-detected)
const client = new CortexClient('cortexdb://[email protected]/production');
// Using options object (alternative)
const client = new CortexClient({
baseUrl: 'http://localhost:8000',
apiKey: 'your-api-key',
database: 'production', // Optional: set default database
timeout: 1800000, // Optional: override timeout (default = 30 min to cover large uploads)
waitUntilComplete: true, // Optional: keep SDK waiting for async ingestion to finish (default = true)
});
Connection String Format:
cortexdb://[api_key@]host[:port][/database]
Benefits:
- Single string configuration
- Easy to store in environment variables
- Familiar pattern (like PostgreSQL, MongoDB, Redis)
- Auto-detects HTTP vs HTTPS
- Optional database specification for multi-tenant isolation
Database Parameter:
- If you specify a database in the connection string or options, it becomes the default for all operations
- You can override the default database on a per-method basis
- If no default database is set, you must pass the database parameter to each method (see the example below)
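For example, using the positional database argument shown throughout this README ('staging' is just a placeholder name):
const client = new CortexClient('cortexdb://my-api-key@localhost:8000/production');
// Uses the default database from the connection string ('production')
const defaultCollections = await client.collections.list();
// Override the default for a single call
const stagingCollections = await client.collections.list('staging');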
Async File Uploads & Processing
Large documents (PDFs, DOCXs, etc.) are ingested asynchronously to avoid timeouts. When you call client.records.create(...) the gateway now responds immediately with a payload like:
{
"id": "rec_123",
"status": "pending",
"processing_state": {
"record_id": "rec_123",
"status": "pending",
"processed_chunks": 0,
"total_chunks": 0
}
}
By default the SDK keeps polling the processing_state endpoint until the background worker finishes and only then resolves with the final CreateRecordResponse. That preserves backward compatibility with existing backends that expect a fully processed record once create() returns.
You can control this behavior:
// Return immediately (HTTP 202) and poll manually later
const pending = await client.records.create(
'documents',
{ title: 'Async', content: '...' },
undefined,
{ waitUntilComplete: false }
);
// Later in your workflow…
const status = await client.records.getStatus('documents', pending.id);
if (status?.status === 'completed') {
const finalRecord = await client.records.waitForCompletion('documents', pending.id);
}
Useful options (see the example below):
- waitUntilComplete (default true): let the SDK poll automatically.
- pollingIntervalMs (default 5000): change how often the SDK checks status.
- timeoutMs (default 30 min): upper bound for the auto-poll loop.
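For example, a sketch that assumes pollingIntervalMs and timeoutMs are accepted alongside waitUntilComplete in the per-call options object shown above:
const record = await client.records.create(
  'documents',
  { title: 'Async', content: '...' },
  undefined,
  {
    waitUntilComplete: true,    // let the SDK poll for you
    pollingIntervalMs: 2000,    // assumed: check status every 2 seconds
    timeoutMs: 10 * 60 * 1000,  // assumed: give up after 10 minutes
  }
);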
Under the hood the SDK calls GET /records/{id}/status until the worker updates the processing_state to completed or failed. You can also call that endpoint directly via client.records.getStatus(...) to drive custom progress indicators.
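For a custom progress indicator, a minimal polling loop might look like this (assuming getStatus returns the processed_chunks / total_chunks fields shown in the processing_state payload above):
const pending = await client.records.create(
  'documents',
  { title: 'Big PDF', content: '...' },
  undefined,
  { waitUntilComplete: false }
);
let status = await client.records.getStatus('documents', pending.id);
while (status && status.status !== 'completed' && status.status !== 'failed') {
  console.log(`Processed ${status.processed_chunks}/${status.total_chunks} chunks`);
  await new Promise(resolve => setTimeout(resolve, 5000)); // same 5s cadence as the SDK default
  status = await client.records.getStatus('documents', pending.id);
}
console.log(`Final status: ${status?.status}`);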
Databases
// Create database
await client.databases.create({ name: 'ai_docs', description: 'Knowledge base' });
// List databases
const databases = await client.databases.list();
// Delete database
await client.databases.delete('ai_docs');
Embedding Providers
await client.embeddingProviders.create({
name: 'Gemini Flash',
provider: 'gemini',
embedding_model: 'models/text-embedding-004',
api_key: process.env.GEMINI_API_KEY!,
});
const providers = await client.embeddingProviders.list();
Collections
Collections define the schema for your data. Each collection can have multiple fields with different types and storage options.
import { FieldType, StoreLocation } from '@dooor-ai/cortexdb';
// Create collection with vectorization (database required)
const collection = await client.collections.create(
'articles',
[
{
name: 'title',
type: FieldType.STRING
},
{
name: 'content',
type: FieldType.TEXT,
vectorize: true // Enable semantic search on this field
},
{
name: 'year',
type: FieldType.INT,
store_in: [StoreLocation.POSTGRES, StoreLocation.QDRANT_PAYLOAD]
}
],
'embedding-provider-id', // Required when any field has vectorize=true
'production' // Database name (or omit if default database is set)
);
// List collections (uses default database if set, or pass specific database)
const collections = await client.collections.list('production');
// Get collection schema
const schema = await client.collections.get('articles', 'production');
// Delete collection and all its records
await client.collections.delete('articles', 'production');
// If you set a default database in the client, you can omit it:
const client = new CortexClient('cortexdb://key@host:8000/production');
const collections = await client.collections.list(); // Uses 'production'
Records
Records are the actual data stored in collections. They must match the collection schema.
import fs from 'node:fs';
// Create record (with optional file upload and database)
const created = await client.records.create(
'articles',
{
title: 'Machine Learning Basics',
content: 'Machine learning is a subset of AI focused on learning from data...',
year: 2024,
},
{
attachment: fs.readFileSync('ml-intro.pdf'),
},
'production' // Database name
);
// Get record by ID
const fetched = await client.records.get('articles', created.id, 'production');
// Update record
const updated = await client.records.update('articles', created.id, {
year: 2025,
}, 'production');
// Delete record
await client.records.delete('articles', created.id, 'production');
// List records with filters/pagination
const results = await client.records.list('articles', {
limit: 10,
offset: 0,
filters: { year: { $gte: 2023 } },
});
Schema CLI (YAML)
Install the CLI (recommended in devDependencies):
npm install --save-dev dooor
Use the unified dooor CLI to synchronize declarative schemas.
Also install the "Dooor Tools" extension in VS Code/Cursor for real-time validation (Open VSX).
# Check differences between local YAML and CortexDB
npx dooor schema diff --dir dooor/schemas
# Create collections that don't exist yet
npx dooor schema apply --dir dooor/schemas
# Apply without generating types (by default apply already generates them)
npx dooor schema apply --no-generate-types
# Generate TypeScript types for use in services
npx dooor schema generate-types --dir dooor/schemas --out src/generated/cortex-schema.ts
Automatic Collection Typing
After synchronizing the schema, the CLI generates dooor/generated/cortex-schema.ts with derived types. Provide this schema to the SDK to get Prisma-like autocomplete and validation:
import { CortexClient } from '@dooor-ai/cortexdb';
import type {
CortexGeneratedSchema,
CollectionCreateInput,
} from '../dooor/generated/cortex-schema';
const client = new CortexClient<CortexGeneratedSchema>(
process.env.CORTEXDB_CONNECTION!,
);
const payload: CollectionCreateInput<'tool_calls'> = {
chatId,
workspaceId,
toolName,
description,
toolOutput,
createdAt: new Date().toISOString(),
};
await client.records.create('tool_calls', payload);
Generics propagate to records.update, records.list, records.get, and records.search. If you prefer the old dynamic mode, instantiate new CortexClient() without the generic parameter.
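For example, with the generic client above, list and search results are typed per collection (field names come from the tool_calls schema used earlier; semantic search assumes at least one tool_calls field is vectorized):
// Typed list with filters - invalid field names fail at compile time
const recentCalls = await client.records.list('tool_calls', {
  limit: 20,
  filters: { chatId: 'chat-123' },
});
// Typed semantic search - record.data is narrowed to the tool_calls fields
const hits = await client.records.search('tool_calls', 'RAG invocation summary', undefined, 5);
hits.results.forEach(hit => {
  console.log(hit.score, hit.record.data.description);
});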
Set CORTEXDB_CONNECTION (e.g., cortexdb://key@host:8000) or the CORTEXDB_BASE_URL + CORTEXDB_API_KEY variables before running commands. If no directory is specified, the CLI automatically looks in dooor/schemas.
To avoid repeating flags, configure dooor/config.yaml at the project root:
cortexdb:
connection: env(CORTEXDB_CONNECTION)
defaultEmbeddingProvider: default-provider
schema:
dir: dooor/schemas
typesOut: dooor/generated/cortex-schema.ts
You can override with dooor/config.local.yaml or point to another path via DOOOR_CONFIG.
Semantic Search
Semantic search finds records by meaning, not just exact keyword matches. It uses vector embeddings to understand context.
// Basic semantic search
const results = await client.records.search(
'articles',
'machine learning fundamentals',
undefined,
10
);
// Search with filters - combine semantic search with exact matches
const filteredResults = await client.records.search(
'articles',
'neural networks',
{
year: 2024,
category: 'AI'
},
5
);
// Process results - ordered by relevance score
filteredResults.results.forEach(result => {
console.log(`Score: ${result.score.toFixed(4)}`); // Higher = more relevant
console.log(`Title: ${result.record.data.title}`);
console.log(`Year: ${result.record.data.year}`);
});
Working with Files
CortexDB can process documents and automatically extract text for vectorization.
// Create collection with file field
await client.collections.create(
'documents',
[
{ name: 'title', type: FieldType.STRING },
{
name: 'document',
type: FieldType.FILE,
vectorize: true // Extract text and create embeddings
}
],
'embedding-provider-id'
);
// Note: File upload support is currently available in the REST API
// TypeScript SDK file upload will be added in a future version
Filter Operators
// Exact match filters
const results = await client.records.list('articles', {
filters: {
category: 'technology',
published: true,
year: 2024
}
});
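// Range filter - $gte as used in the Records listing example above;
// support for other comparison operators may depend on your gateway version
const recent = await client.records.list('articles', {
  filters: {
    category: 'technology',
    year: { $gte: 2023 }
  }
});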
// Combine multiple filters
const filtered = await client.records.list('articles', {
filters: {
year: 2024,
category: 'AI',
author: 'John Doe'
},
limit: 20
});
Error Handling
The SDK provides specific error types for different failure scenarios.
import {
CortexDBError,
CortexDBNotFoundError,
CortexDBValidationError,
CortexDBConnectionError,
CortexDBTimeoutError
} from '@dooor-ai/cortexdb';
try {
const record = await client.records.get('articles', 'invalid-id');
} catch (error) {
if (error instanceof CortexDBNotFoundError) {
console.log('Record not found');
} else if (error instanceof CortexDBValidationError) {
console.log('Invalid data:', error.message);
} else if (error instanceof CortexDBConnectionError) {
console.log('Connection failed:', error.message);
} else if (error instanceof CortexDBTimeoutError) {
console.log('Request timed out:', error.message);
} else if (error instanceof CortexDBError) {
console.log('General error:', error.message);
}
}
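Transient failures such as timeouts or dropped connections are often worth retrying. A minimal retry sketch built on the error classes above (the backoff policy is only an illustration, not part of the SDK):
async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  for (let i = 1; i <= attempts; i++) {
    try {
      return await fn();
    } catch (error) {
      const transient =
        error instanceof CortexDBTimeoutError ||
        error instanceof CortexDBConnectionError;
      if (!transient || i === attempts) throw error;
      await new Promise(resolve => setTimeout(resolve, 1000 * i)); // simple linear backoff
    }
  }
  throw new Error('unreachable');
}
const record = await withRetry(() => client.records.get('articles', 'some-id'));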
Examples
Check the examples/ directory for complete working examples:
- quickstart.ts - Complete walkthrough of SDK features
- search.ts - Semantic search with filters and providers
- basic.ts - Basic CRUD operations
Run examples:
npx ts-node -O '{"module":"commonjs"}' examples/quickstart.ts
Development
Setup
# Clone repository
git clone https://github.com/yourusername/cortexdb
cd cortexdb/clients/typescript
# Install dependencies
npm install
# Build
npm run build
Scripts
# Build TypeScript
npm run build
# Build in watch mode
npm run build:watch
# Clean build artifacts
npm run clean
# Lint code
npm run lint
# Format code
npm run format
Requirements
- Node.js >= 18.0.0 (for native fetch support)
- CortexDB gateway running locally or remotely
- Embedding provider configured (OpenAI, Gemini, etc.) if using vectorization
Architecture
CortexDB integrates multiple technologies:
- PostgreSQL: Stores structured data and metadata
- Qdrant: Vector database for semantic search
- MinIO: Object storage for files
- Docling: Advanced document processing and text extraction
The SDK abstracts this complexity into a simple, unified API.
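To make the storage mapping concrete, here is a sketch of a collection whose fields target different backends. Only the StoreLocation values shown elsewhere in this README are used; the collection and field names are placeholders:
await client.collections.create(
  'reports',
  [
    // Structured metadata in PostgreSQL, mirrored into the Qdrant payload for filtering
    { name: 'department', type: FieldType.STRING, store_in: [StoreLocation.POSTGRES, StoreLocation.QDRANT_PAYLOAD] },
    // Long text vectorized for semantic search in Qdrant
    { name: 'summary', type: FieldType.TEXT, vectorize: true },
    // Uploaded files land in object storage (MinIO) and are processed by Docling
    { name: 'report_file', type: FieldType.FILE, vectorize: true },
  ],
  'embedding-provider-id'
);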
Advanced RAG Strategies (v0.4.0+)
CortexDB now supports multiple RAG strategies to improve search quality and relevance. Choose the strategy that best fits your use case:
Available Strategies
- SIMPLE: Basic vector similarity search (default)
- MULTI_QUERY: Generate multiple query variations and combine results using Reciprocal Rank Fusion
- HYDE: Generate hypothetical documents and use them for improved retrieval
- RERANK: Use LLM to rerank search results by relevance
- FUSION: Combine multi-query expansion with LLM reranking
- CONTEXTUAL_QUERY: Reformulate queries based on conversation context
Setup AI Providers
Before using advanced strategies, configure an AI provider:
// Create an AI provider for query expansion/reranking
const aiProvider = await client.aiProviders.create({
name: "Gemini Flash",
provider: "gemini",
api_key: "your-gemini-api-key",
model: "gemini-1.5-flash",
enabled: true,
});
// List providers
const providers = await client.aiProviders.list();
// Update provider
await client.aiProviders.update(aiProvider.id, {
model: "gemini-2.0-flash",
});
Using Advanced Search
import { RAGStrategy } from '@dooor-ai/cortexdb';
// Simple search (default)
const simpleResults = await client.records.searchAdvanced('documents', {
query: 'What is machine learning?',
limit: 10,
strategy: RAGStrategy.SIMPLE,
});
// Multi-query with automatic query expansion
const multiQueryResults = await client.records.searchAdvanced('documents', {
query: 'What is machine learning?',
limit: 10,
strategy: RAGStrategy.MULTI_QUERY,
strategyConfig: {
num_queries: 5, // Generate 5 query variations
},
aiProviderName: "Gemini Flash", // Use provider by name
});
// HyDE: Generate hypothetical document for better retrieval
const hydeResults = await client.records.searchAdvanced('documents', {
query: 'Explain neural networks',
limit: 10,
strategy: RAGStrategy.HYDE,
strategyConfig: {
document_length: 200, // Length of hypothetical document
},
aiProviderName: "Gemini Flash",
});
// Rerank: Use LLM to reorder results by relevance
const rerankResults = await client.records.searchAdvanced('documents', {
query: 'Benefits of deep learning',
limit: 10,
strategy: RAGStrategy.RERANK,
strategyConfig: {
initial_k: 50, // Fetch 50 results then rerank to top 10
},
aiProviderName: "Gemini Flash",
});
// Fusion: Best of both worlds (multi-query + reranking)
const fusionResults = await client.records.searchAdvanced('documents', {
query: 'How does AI work?',
limit: 10,
strategy: RAGStrategy.FUSION,
strategyConfig: {
num_queries: 5,
initial_k: 50,
},
aiProviderName: "Gemini Flash",
});
// Contextual: Reformulate query based on conversation history
const contextualResults = await client.records.searchAdvanced('documents', {
query: 'What about its applications?',
limit: 10,
strategy: RAGStrategy.CONTEXTUAL_QUERY,
strategyConfig: {
context: [
'Previous: What is machine learning?',
'Answer: Machine learning is a subset of AI...',
],
},
aiProviderName: "Gemini Flash",
});
// Access results
fusionResults.results.forEach(result => {
console.log(`Score: ${result.score}`);
console.log(`Content: ${result.record.content}`);
console.log(`Strategy used: ${fusionResults.strategy_used}`);
});
Collection-Specific Delegates
The advanced search is also available on collection delegates:
// Using the facade pattern
const results = await client.records.documents.searchAdvanced({
query: 'Machine learning applications',
strategy: RAGStrategy.FUSION,
aiProviderName: "Gemini Flash",
});
Performance Tips
- SIMPLE: Fastest, use for basic semantic search
- MULTI_QUERY: 5x slower than simple (generates 5 queries)
- HYDE: Similar to multi-query, good for questions
- RERANK: Moderate cost, great for accuracy improvement
- FUSION: Highest cost and latency, best quality
- CONTEXTUAL_QUERY: Use for conversational interfaces
For more details, see RAG Strategies Documentation.
License
MIT License - see LICENSE for details.
Related
- CortexDB Python SDK - Python client for CortexDB
- CortexDB Documentation - Complete platform documentation
