@ancatag/n-r
Official Node.js/TypeScript SDK for Nova-route AI API
Save 60-80% on AI token costs with intelligent multi-tier caching and an OpenAI-compatible interface.
Nova-route is an AI infrastructure platform that reduces your AI API costs by 60-80% through intelligent caching, semantic similarity matching, and RAG optimization. This SDK provides a drop-in replacement for OpenAI with automatic cost savings and route-config-based AI orchestration.
Features
- 💰 60-80% Token Cost Reduction - Multi-tier caching (hot + semantic) automatically saves on redundant API calls
- 🔄 OpenAI-Compatible API - Drop-in replacement, just change your base URL
- ⚡ Dual Transport - REST (default) and gRPC (lower overhead) support
- 🌊 Streaming Support - Real-time response streaming with cancellation
- 🧠 RAG Integration - Retrieval-Augmented Generation for document-based AI
- 🎯 Smart Routing - Automatic model selection and route configuration
- 📊 Cache Analytics - Track savings, hit rates, and token usage
- 🔒 TypeScript First - Full type safety with exported types
- 🚀 Zero Configuration - Works out of the box with sensible defaults
Installation
npm install @ancatag/n-r
# or
pnpm add @ancatag/n-r
# or
yarn add @ancatag/n-r
Requirements: Node.js 18+ (uses native fetch)
Quick Start
Basic Chat Completion
import { NovaClient } from '@ancatag/n-r';
const client = new NovaClient({
apiKey: process.env.NOVA_API_KEY || 'nova_sk_...',
});
// Recommended: Use route config ID for consistent behavior
const response = await client.chat.create({
messages: [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'Hello!' }
],
temperature: 0.7,
max_tokens: 1000,
nova: {
routeConfigId: 'your-route-config-id' // Use specific route config
}
});
// Legacy: Model field still supported but route config recommended
const responseLegacy = await client.chat.create({
model: 'llama2', // Falls back to project default model
messages: [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'Hello!' }
],
temperature: 0.7,
max_tokens: 1000,
});
console.log(response.choices[0].message.content);
console.log('Tokens used:', response.usage.total_tokens);
console.log('Cache hit:', response.nova?.cacheHit);
console.log('Tokens saved:', response.nova?.tokensSaved);
Streaming
const stream = client.chat.createStream({
messages: [{ role: 'user', content: 'Tell me a story' }],
temperature: 0.7,
nova: {
routeConfigId: 'your-route-config-id' // Recommended: use route config
}
});
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content || '';
if (content) {
process.stdout.write(content); // Print as it arrives
}
if (chunk.choices[0]?.finish_reason) {
console.log('\n\nStream complete!');
break;
}
}
Usage Examples
Non-Streaming Chat Completion
import { NovaClient } from '@ancatag/n-r';
const client = new NovaClient({
apiKey: 'nova_sk_...',
baseUrl: 'https://api.nova.ai', // Optional: defaults to http://localhost:3000
timeoutMs: 60000, // Optional: request timeout (default: 60000)
maxRetries: 3, // Optional: max retries (default: 2)
});
const response = await client.chat.create({
model: 'gpt-4', // Uses project default if not specified
messages: [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'Explain quantum computing in simple terms' }
],
temperature: 0.7,
max_tokens: 1000,
});
// Access response
console.log(response.choices[0].message.content);
// Access Nova-specific metrics
if (response.nova) {
console.log('Cache hit:', response.nova.cacheHit);
console.log('Cache layer:', response.nova.cacheLayer); // 'hot' | 'semantic' | null
console.log('Tokens saved:', response.nova.tokensSaved);
console.log('Response time:', response.nova.responseTimeMs, 'ms');
console.log('Request ID:', response.nova.requestId);
}
Streaming with Cancellation
const controller = new AbortController();
// Cancel after 5 seconds
setTimeout(() => controller.abort(), 5000);
try {
for await (const chunk of client.chat.createStream(
{
model: 'llama2',
messages: [{ role: 'user', content: 'Write a long story' }],
},
controller.signal
)) {
process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
} catch (error) {
if (error instanceof Error && error.name === 'AbortError') {
console.log('Stream cancelled');
}
}
Models API
// List all available models in your project
const models = await client.models.list();
console.log('Available models:', models.map(m => m.id));
// Get specific model details
const model = await client.models.get('llama2');
console.log('Model:', model);
Error Handling
import { NovaClient, NovaError } from '@ancatag/n-r';
const client = new NovaClient({
apiKey: 'nova_sk_...',
});
try {
const response = await client.chat.create({
model: 'invalid-model',
messages: [{ role: 'user', content: 'Hello' }],
});
} catch (error) {
if (error instanceof NovaError) {
console.error('Nova API Error:', error.message);
console.error('Status:', error.status);
console.error('Code:', error.code);
console.error('Type:', error.type);
// Handle specific error codes
switch (error.code) {
case 'invalid_api_key':
console.error('Invalid API key');
break;
case 'model_not_found':
console.error('Model not found or not accessible');
break;
case 'rate_limit_exceeded':
console.error('Rate limit exceeded');
break;
}
} else {
console.error('Unexpected error:', error);
}
}
Nova-Specific Features
Nova extends the OpenAI API with powerful features for cost optimization and advanced routing:
Cache Control
const response = await client.chat.create({
model: 'llama2',
messages: [{ role: 'user', content: 'Hello' }],
// Nova-specific options
nova: {
skipCache: false, // Skip cache lookup for this request (default: false)
},
});
// Response includes cache information
console.log('Cache hit:', response.nova?.cacheHit);
console.log('Cache layer:', response.nova?.cacheLayer); // 'hot' | 'semantic' | null
console.log('Tokens saved:', response.nova?.tokensSaved);
Route Configuration
const response = await client.chat.create({
model: 'llama2',
messages: [{ role: 'user', content: 'Hello' }],
nova: {
routeConfigId: 'route-config-uuid', // Use specific route configuration
},
});
RAG (Retrieval-Augmented Generation)
RAG enables AI models to answer questions using your own documents as context. Instead of sending entire documents with every request, Nova-route automatically retrieves only the relevant chunks that match your query, dramatically reducing token usage (70-90% savings) while improving accuracy.
How RAG Works:
- Upload documents (PDF, TXT, MD) to a route config via REST API
- Documents are automatically parsed, chunked, and embedded
- During chat completions, relevant chunks are automatically retrieved and injected as context
- Only chunks that fit within your token budget are included
Using RAG with the SDK:
RAG works automatically once documents are uploaded and processed. Simply use a route config that has RAG enabled:
// RAG is automatic - no code changes needed!
const response = await client.chat.create({
model: 'your-route-config-id', // Route config with ragEnabled: true
messages: [
{ role: 'user', content: 'What is the vacation policy?' }
],
});
// Response includes context from your uploaded documents
console.log(response.choices[0].message.content);
Document Upload (via REST API):
Document upload is done via the REST API (not SDK methods). Here's a complete example:
const API_BASE_URL = 'https://api.nova.ai';
const JWT_TOKEN = 'your-jwt-token'; // From /auth/login
const ROUTE_CONFIG_ID = 'your-route-config-id';
// 1. Upload document
// Note: File is a global in browsers and Node.js 20+; on Node 18.13+ it can be imported from 'node:buffer'.
async function uploadDocument(file: File) {
const formData = new FormData();
formData.append('file', file);
const response = await fetch(
`${API_BASE_URL}/rag/collections/${ROUTE_CONFIG_ID}/documents`,
{
method: 'POST',
headers: {
'Authorization': `Bearer ${JWT_TOKEN}`,
},
body: formData,
}
);
const document = await response.json();
console.log('Document uploaded:', document.id);
// Poll for processing completion
return pollDocumentStatus(document.id);
}
// 2. Check processing status
async function pollDocumentStatus(documentId: string) {
const maxAttempts = 30;
const delayMs = 2000;
for (let i = 0; i < maxAttempts; i++) {
const response = await fetch(
`${API_BASE_URL}/rag/documents/${documentId}`,
{
headers: {
'Authorization': `Bearer ${JWT_TOKEN}`,
},
}
);
const document = await response.json();
if (document.status === 'completed') {
console.log('Document processed! Chunks:', document.chunkCount);
return document;
}
if (document.status === 'failed') {
throw new Error(`Processing failed: ${document.errorMessage}`);
}
await new Promise(resolve => setTimeout(resolve, delayMs));
}
throw new Error('Document processing timeout');
}
// 3. Use RAG in chat completions (automatic)
const response = await client.chat.create({
model: ROUTE_CONFIG_ID, // Route config with ragEnabled: true
messages: [
{ role: 'user', content: 'What is the vacation policy?' }
],
});
Token Savings with RAG:
- Without RAG: Send entire documents (10,000+ tokens)
- With RAG: Only relevant chunks (500-2,000 tokens)
- Savings: 70-90% reduction in prompt tokens
RAG Configuration:
RAG settings are configured per route config:
- chunkSize: 100-2048 tokens per chunk (default: 512)
- chunkOverlap: 0-200 tokens overlap (default: 50)
- topK: 1-20 chunks to retrieve (default: 5)
- similarityThreshold: 0.5-0.95 minimum similarity (default: 0.7)
See RAG SDK Documentation for complete details.
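For illustration, these settings can be pictured as a plain object. The field names and defaults come from the list above; the object shape itself is an assumption, not a documented payload:
// Illustrative shape only - field names and defaults taken from the list above.
const ragSettings = {
  ragEnabled: true,          // per-route-config RAG flag
  chunkSize: 512,            // tokens per chunk (100-2048)
  chunkOverlap: 50,          // overlapping tokens between chunks (0-200)
  topK: 5,                   // chunks retrieved per query (1-20)
  similarityThreshold: 0.7,  // minimum similarity score (0.5-0.95)
};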
Custom Metadata
const response = await client.chat.create({
model: 'llama2',
messages: [{ role: 'user', content: 'Hello' }],
nova: {
metadata: {
userId: '123',
sessionId: 'abc',
feature: 'chatbot',
},
},
});
System Prompt Override
const response = await client.chat.create({
model: 'llama2',
messages: [{ role: 'user', content: 'Hello' }],
nova: {
systemPromptOverride: 'You are a specialized technical assistant.',
},
});
gRPC Transport
For lower overhead and better performance, use gRPC transport:
import { NovaClient } from '@ancatag/n-r';
const client = new NovaClient({
apiKey: process.env.NOVA_API_KEY || 'nova_sk_...',
transport: 'grpc', // Use gRPC instead of REST
grpcUrl: '0.0.0.0:50051', // Optional: defaults to 0.0.0.0:50051
});
// Same API, lower overhead
const response = await client.chat.create({
model: 'llama2',
messages: [{ role: 'user', content: 'Hello' }],
});
// Streaming also works with gRPC
for await (const chunk of client.chat.createStream({
model: 'llama2',
messages: [{ role: 'user', content: 'Hey' }],
})) {
process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
Transport Options:
- transport: 'rest' (default) - Uses HTTP fetch
- transport: 'grpc' - Uses gRPC over grpc-js with ts-proto stubs
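Because both transports expose the same client API, you can switch between them through configuration alone. A small sketch (NOVA_TRANSPORT is a hypothetical environment variable of your own, not an SDK feature):
import { NovaClient } from '@ancatag/n-r';
// App-level convention: pick the transport from the environment.
const transport = process.env.NOVA_TRANSPORT === 'grpc' ? 'grpc' : 'rest';
const client = new NovaClient({
  apiKey: process.env.NOVA_API_KEY!,
  transport,
});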
API Reference
Client Configuration
interface NovaClientConfig {
/** Nova API key (required, format: nova_sk_...) */
apiKey: string;
/** Base URL for REST API (default: http://localhost:3000) */
baseUrl?: string;
/** gRPC URL (default: 0.0.0.0:50051) */
grpcUrl?: string;
/** Preferred transport: 'rest' | 'grpc' (default: 'rest') */
transport?: 'rest' | 'grpc';
/** Request timeout in milliseconds (default: 60000) */
timeoutMs?: number;
/** Maximum number of retries for failed requests (default: 2) */
maxRetries?: number;
}
Chat Completions
client.chat.create(request)
Create a non-streaming chat completion.
Parameters:
- request: ChatCompletionRequest - Chat completion request (OpenAI-compatible)
Returns: Promise<ChatCompletionResponse>
client.chat.createStream(request, signal?)
Create a streaming chat completion.
Parameters:
- request: ChatCompletionRequest - Chat completion request
- signal?: AbortSignal - Optional abort signal for cancellation
Returns: AsyncIterable<ChatCompletionChunk>
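As a usage sketch, a small helper that drains a stream into a single string (it assumes a configured client in scope, and mirrors the field access used in the streaming examples above):
import type { ChatCompletionRequest } from '@ancatag/n-r';
async function completeAsString(
  request: ChatCompletionRequest,
  signal?: AbortSignal
): Promise<string> {
  let text = '';
  for await (const chunk of client.chat.createStream(request, signal)) {
    text += chunk.choices[0]?.delta?.content || '';
  }
  return text;
}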
Models
client.models.list()
List all available models in your project.
Returns: Promise<Model[]>
client.models.get(modelId)
Get details for a specific model.
Parameters:
- modelId: string - Model identifier
Returns: Promise<Model>
Type Exports
import type {
ChatMessage,
ChatCompletionRequest,
ChatCompletionResponse,
ChatCompletionChunk,
ChatCompletionChoice,
ChatCompletionUsage,
Model,
NovaClientConfig,
NovaTransport,
} from '@ancatag/n-r';
import { NovaError } from '@ancatag/n-r';
Nova-Specific Extensions
Request Extensions
interface NovaRequestExtensions {
nova?: {
/** Skip cache lookup for this request */
skipCache?: boolean;
/** Route config ID - specifies which route configuration to use */
routeConfigId?: string;
/** Enable RAG (Retrieval-Augmented Generation) for this request */
ragEnabled?: boolean;
/** Additional metadata to attach to the request */
metadata?: Record<string, any>;
/** Override the system prompt for this request */
systemPromptOverride?: string;
};
}
Response Extensions
interface NovaResponseExtensions {
nova?: {
/** Whether this response was served from cache */
cacheHit: boolean;
/** Cache layer used: 'hot' (exact match) | 'semantic' (similarity match) | null */
cacheLayer?: 'hot' | 'semantic' | null;
/** Number of tokens saved by cache hit */
tokensSaved: number;
/** Response time in milliseconds */
responseTimeMs: number;
/** Unique request ID for tracking */
requestId: string;
};
}
Advanced Features
Multi-Tier Caching
Nova automatically uses two cache layers:
Hot Cache - Exact match caching (7-30 day TTL)
- Instant responses for identical requests
- SHA-256 hash-based lookup
Semantic Cache - Similarity matching (95% threshold)
- Matches semantically similar prompts
- Vector embedding-based similarity search
- Available on paid plans
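A quick way to see the layers in action is to send the same request twice and compare the metadata (a sketch; whether you actually get a hit depends on your plan and the current cache state):
const request = {
  messages: [{ role: 'user' as const, content: 'What is a vector database?' }],
  nova: { routeConfigId: 'your-route-config-id' },
};
const first = await client.chat.create(request);
const second = await client.chat.create(request); // identical request
console.log(first.nova?.cacheHit, second.nova?.cacheHit); // e.g. false, true
console.log(second.nova?.cacheLayer); // 'hot' for exact match, 'semantic' for similar prompts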
Cost Savings Tracking
Every response includes savings metrics:
const response = await client.chat.create({
model: 'llama2',
messages: [{ role: 'user', content: 'Hello' }],
});
if (response.nova?.cacheHit) {
console.log(`Saved ${response.nova.tokensSaved} tokens`);
console.log(`Cache layer: ${response.nova.cacheLayer}`);
}
Model Routing
Nova automatically routes requests to the correct provider based on:
- Model identifier in request
- Project default model configuration
- Route configuration (if specified)
// Uses project default if model not specified
const response = await client.chat.create({
messages: [{ role: 'user', content: 'Hello' }],
// model is optional - uses project default
});
RAG (Retrieval-Augmented Generation)
RAG provides 70-90% token savings by automatically retrieving only relevant document chunks instead of sending entire documents.
How RAG Works:
Automatic Context Retrieval: When you make a chat completion request to a route config with ragEnabled: true, Nova-route automatically:
- Extracts the query from the last user message
- Generates a vector embedding of the query
- Searches Qdrant for semantically similar chunks
- Selects top-K chunks above similarity threshold
- Manages token budget to include only chunks that fit
- Injects retrieved chunks as context before the user's query
Token Budget Management: Nova-route automatically calculates:
Token Budget = Context Window - Existing Prompt Tokens - Headroom (200 tokens)
Only chunks that fit within this budget are included.
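The same calculation in code (a sketch mirroring the formula above; 200 is the documented headroom):
// budget = context window - existing prompt tokens - headroom
function ragTokenBudget(
  contextWindow: number,
  existingPromptTokens: number,
  headroom = 200
): number {
  return Math.max(0, contextWindow - existingPromptTokens - headroom);
}
ragTokenBudget(8192, 1500); // 6492 tokens left for retrieved chunks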
Prompt Format: The final prompt sent to the AI model includes:
[System Prompt]
[Pre-prompt Items]
Context:
[Relevant chunk 1 from your documents]
[Relevant chunk 2 from your documents]
Query: [User's original question]
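Purely as an illustration of that layout (the assembly happens server-side inside Nova-route; the function below is not part of the SDK):
// Hypothetical reconstruction of the documented prompt layout.
function assemblePrompt(
  systemPrompt: string,
  prePromptItems: string[],
  retrievedChunks: string[],
  userQuestion: string
): string {
  return [
    systemPrompt,
    ...prePromptItems,
    'Context:',
    ...retrievedChunks, // only the top-K chunks that fit the token budget
    `Query: ${userQuestion}`,
  ].join('\n');
}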
RAG Benefits:
- 70-90% Token Savings: Only relevant chunks vs. full documents
- Improved Accuracy: AI responses grounded in your documents
- Automatic: No manual context management needed
- Scalable: Works with large document collections
- Intelligent: Semantic search finds relevant content even with different wording
Example Token Savings:
- Full document: 10,000 tokens
- Relevant chunks: 1,500 tokens
- Savings: 8,500 tokens (85%)
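The same arithmetic as a one-line helper (a sketch that simply restates the numbers above):
// 1 - (chunk tokens / full-document tokens), as a percentage
const savingsPercent = (fullDocTokens: number, chunkTokens: number) =>
  Math.round((1 - chunkTokens / fullDocTokens) * 100);
savingsPercent(10000, 1500); // 85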
See RAG SDK Documentation for complete setup and configuration details.
TypeScript Support
Full TypeScript support with comprehensive type definitions:
import { NovaClient } from '@ancatag/n-r';
import type {
ChatCompletionRequest,
ChatCompletionResponse,
NovaClientConfig,
} from '@ancatag/n-r';
const config: NovaClientConfig = {
apiKey: process.env.NOVA_API_KEY!,
baseUrl: 'https://api.nova.ai',
};
const client = new NovaClient(config);
async function chat(
request: ChatCompletionRequest
): Promise<ChatCompletionResponse> {
return await client.chat.create(request);
}
Requirements
- Node.js: 18.0.0 or higher (uses native fetch)
- TypeScript: 5.0+ (for type definitions, optional)
Getting Started with Nova-route
- Sign up at nova.ai (Free plan available)
- Create a project in the dashboard
- Configure your models (BYOP or use hosted providers)
- Generate an API key (format: nova_sk_...)
- Install the SDK and start saving on token costs!
Migration from OpenAI
Switching from OpenAI to Nova-route is simple:
// Before (direct OpenAI)
import OpenAI from 'openai';
const client = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
});
// After (via Nova-route) - Just change the SDK!
import { NovaClient } from '@ancatag/n-r';
const client = new NovaClient({
apiKey: process.env.NOVA_API_KEY, // Get from Nova-route dashboard
baseUrl: 'https://api.nova.ai', // Nova-route API endpoint
});
// Request and response shapes stay the same; note that Nova exposes client.chat.create rather than OpenAI's client.chat.completions.create.
const response = await client.chat.create({
model: 'gpt-4',
messages: [{ role: 'user', content: 'Hello!' }],
});
Documentation
- Full SDK Documentation - Complete SDK reference
- RAG SDK Documentation - RAG setup, document upload, and configuration
- API Documentation - API endpoints and authentication
- Architecture Guide - System design and components
- Getting Started - Quick start guide
License
ISC
Support
- Documentation: docs/
- Issues: GitHub Issues
- Dashboard: nova.ai
Built with ❤️ by the Nova-route team
