@ancatag/n-r
Official Node.js/TypeScript SDK for Nova-route AI API
Save 60-80% on AI token costs with intelligent multi-tier caching and an OpenAI-compatible interface.
Nova-route is an AI infrastructure platform that reduces your AI API costs by 60-80% through intelligent caching, semantic similarity matching, and RAG optimization. This SDK provides a drop-in replacement for OpenAI with automatic cost savings and route-config-based AI orchestration.
Features
- 💰 60-80% Token Cost Reduction - Multi-tier caching (hot + semantic) automatically saves on redundant API calls
- 🔄 OpenAI-Compatible API - Drop-in replacement, just change your base URL
- ⚡ Dual Transport - REST (default) and gRPC (lower overhead) support
- 🌊 Streaming Support - Real-time response streaming with cancellation
- 🧠 RAG Integration - Retrieval-Augmented Generation for document-based AI
- 🎯 Smart Routing - Automatic model selection and route configuration
- 📊 Cache Analytics - Track savings, hit rates, and token usage
- 🔒 TypeScript First - Full type safety with exported types
- 🚀 Zero Configuration - Works out of the box with sensible defaults
Installation
npm install @ancatag/n-r
# or
pnpm add @ancatag/n-r
# or
yarn add @ancatag/n-r
Requirements: Node.js 18+ (uses native fetch)
Quick Start
Basic Chat Completion
import { NovaClient } from '@ancatag/n-r';
const client = new NovaClient({
apiKey: process.env.NOVA_API_KEY || 'nova_sk_...',
});
// Recommended: Use route config ID for consistent behavior
const response = await client.chat.create({
messages: [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'Hello!' }
],
temperature: 0.7,
max_tokens: 1000,
nova: {
routeConfigId: 'your-route-config-id' // Use specific route config
}
});
// Legacy: Model field still supported but route config recommended
const responseLegacy = await client.chat.create({
model: 'llama2', // Falls back to project default model
messages: [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'Hello!' }
],
temperature: 0.7,
max_tokens: 1000,
});
console.log(response.choices[0].message.content);
console.log('Tokens used:', response.usage.total_tokens);
console.log('Cache hit:', response.nova?.cacheHit);
console.log('Tokens saved:', response.nova?.tokensSaved);
Streaming
const stream = client.chat.createStream({
messages: [{ role: 'user', content: 'Tell me a story' }],
temperature: 0.7,
nova: {
routeConfigId: 'your-route-config-id' // Recommended: use route config
}
});
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content || '';
if (content) {
process.stdout.write(content); // Print as it arrives
}
if (chunk.choices[0]?.finish_reason) {
console.log('\n\nStream complete!');
break;
}
}
Usage Examples
Non-Streaming Chat Completion
import { NovaClient } from '@ancatag/n-r';
const client = new NovaClient({
apiKey: 'nova_sk_...',
baseUrl: 'https://api.nova.ai', // Optional: defaults to http://localhost:3000
timeoutMs: 60000, // Optional: request timeout (default: 60000)
maxRetries: 3, // Optional: max retries (default: 2)
});
const response = await client.chat.create({
model: 'gpt-4', // Uses project default if not specified
messages: [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'Explain quantum computing in simple terms' }
],
temperature: 0.7,
max_tokens: 1000,
});
// Access response
console.log(response.choices[0].message.content);
// Access Nova-specific metrics
if (response.nova) {
console.log('Cache hit:', response.nova.cacheHit);
console.log('Cache layer:', response.nova.cacheLayer); // 'hot' | 'semantic' | null
console.log('Tokens saved:', response.nova.tokensSaved);
console.log('Response time:', response.nova.responseTimeMs, 'ms');
console.log('Request ID:', response.nova.requestId);
}
Streaming with Cancellation
const controller = new AbortController();
// Cancel after 5 seconds
setTimeout(() => controller.abort(), 5000);
try {
for await (const chunk of client.chat.createStream(
{
model: 'llama2',
messages: [{ role: 'user', content: 'Write a long story' }],
},
controller.signal
)) {
process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
} catch (error) {
if (error instanceof Error && error.name === 'AbortError') {
console.log('Stream cancelled');
}
}
Models API
// List all available models in your project
const models = await client.models.list();
console.log('Available models:', models.map(m => m.id));
// Get specific model details
const model = await client.models.get('llama2');
console.log('Model:', model);
Error Handling
import { NovaClient, NovaError } from '@ancatag/n-r';
const client = new NovaClient({
apiKey: 'nova_sk_...',
});
try {
const response = await client.chat.create({
model: 'invalid-model',
messages: [{ role: 'user', content: 'Hello' }],
});
} catch (error) {
if (error instanceof NovaError) {
console.error('Nova API Error:', error.message);
console.error('Status:', error.status);
console.error('Code:', error.code);
console.error('Type:', error.type);
// Handle specific error codes
switch (error.code) {
case 'invalid_api_key':
console.error('Invalid API key');
break;
case 'model_not_found':
console.error('Model not found or not accessible');
break;
case 'rate_limit_exceeded':
console.error('Rate limit exceeded');
break;
}
} else {
console.error('Unexpected error:', error);
}
}
Nova-Specific Features
Nova extends the OpenAI API with powerful features for cost optimization and advanced routing:
Cache Control
const response = await client.chat.create({
model: 'llama2',
messages: [{ role: 'user', content: 'Hello' }],
// Nova-specific options
nova: {
skipCache: false, // Skip cache lookup for this request (default: false)
},
});
// Response includes cache information
console.log('Cache hit:', response.nova?.cacheHit);
console.log('Cache layer:', response.nova?.cacheLayer); // 'hot' | 'semantic' | null
console.log('Tokens saved:', response.nova?.tokensSaved);
Route Configuration
const response = await client.chat.create({
model: 'llama2',
messages: [{ role: 'user', content: 'Hello' }],
nova: {
routeConfigId: 'route-config-uuid', // Use specific route configuration
},
});
RAG (Retrieval-Augmented Generation)
RAG enables AI models to answer questions using your own documents as context. Instead of sending entire documents with every request, Nova-route automatically retrieves only the relevant chunks that match your query, dramatically reducing token usage (70-90% savings) while improving accuracy.
How RAG Works:
- Upload documents (PDF, TXT, MD) to a route config via REST API
- Documents are automatically parsed, chunked, and embedded
- During chat completions, relevant chunks are automatically retrieved and injected as context
- Only chunks that fit within your token budget are included
Using RAG with the SDK:
RAG works automatically once documents are uploaded and processed. Simply use a route config that has RAG enabled:
// RAG is automatic - no code changes needed!
const response = await client.chat.create({
model: 'your-route-config-id', // Route config with ragEnabled: true
messages: [
{ role: 'user', content: 'What is the vacation policy?' }
],
});
// Response includes context from your uploaded documents
console.log(response.choices[0].message.content);
Document Upload (via REST API):
Document upload is done via the REST API (not SDK methods). Here's a complete example:
const API_BASE_URL = 'https://api.nova.ai';
const JWT_TOKEN = 'your-jwt-token'; // From /auth/login
const ROUTE_CONFIG_ID = 'your-route-config-id';
// 1. Upload document
// Note: File is a global in browsers and Node.js 20+; on Node 18.13+ it can be imported from 'node:buffer'.
async function uploadDocument(file: File) {
const formData = new FormData();
formData.append('file', file);
const response = await fetch(
`${API_BASE_URL}/rag/collections/${ROUTE_CONFIG_ID}/documents`,
{
method: 'POST',
headers: {
'Authorization': `Bearer ${JWT_TOKEN}`,
},
body: formData,
}
);
const document = await response.json();
console.log('Document uploaded:', document.id);
// Poll for processing completion
return pollDocumentStatus(document.id);
}
// 2. Check processing status
async function pollDocumentStatus(documentId: string) {
const maxAttempts = 30;
const delayMs = 2000;
for (let i = 0; i < maxAttempts; i++) {
const response = await fetch(
`${API_BASE_URL}/rag/documents/${documentId}`,
{
headers: {
'Authorization': `Bearer ${JWT_TOKEN}`,
},
}
);
const document = await response.json();
if (document.status === 'completed') {
console.log('Document processed! Chunks:', document.chunkCount);
return document;
}
if (document.status === 'failed') {
throw new Error(`Processing failed: ${document.errorMessage}`);
}
await new Promise(resolve => setTimeout(resolve, delayMs));
}
throw new Error('Document processing timeout');
}
// 3. Use RAG in chat completions (automatic)
const response = await client.chat.create({
model: ROUTE_CONFIG_ID, // Route config with ragEnabled: true
messages: [
{ role: 'user', content: 'What is the vacation policy?' }
],
});
Token Savings with RAG:
- Without RAG: Send entire documents (10,000+ tokens)
- With RAG: Only relevant chunks (500-2,000 tokens)
- Savings: 70-90% reduction in prompt tokens
RAG Configuration:
RAG settings are configured per route config:
- chunkSize: 100-2048 tokens per chunk (default: 512)
- chunkOverlap: 0-200 tokens overlap (default: 50)
- topK: 1-20 chunks to retrieve (default: 5)
- similarityThreshold: 0.5-0.95 minimum similarity (default: 0.7)
See RAG SDK Documentation for complete details.
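For illustration, these settings can be pictured as a plain object. The field names and defaults come from the list above; the object shape itself is an assumption, not a documented payload:
// Illustrative shape only - field names and defaults taken from the list above.
const ragSettings = {
  ragEnabled: true,          // per-route-config RAG flag
  chunkSize: 512,            // tokens per chunk (100-2048)
  chunkOverlap: 50,          // overlapping tokens between chunks (0-200)
  topK: 5,                   // chunks retrieved per query (1-20)
  similarityThreshold: 0.7,  // minimum similarity score (0.5-0.95)
};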
Custom Metadata
const response = await client.chat.create({
model: 'llama2',
messages: [{ role: 'user', content: 'Hello' }],
nova: {
metadata: {
userId: '123',
sessionId: 'abc',
feature: 'chatbot',
},
},
});
System Prompt Override
const response = await client.chat.create({
model: 'llama2',
messages: [{ role: 'user', content: 'Hello' }],
nova: {
systemPromptOverride: 'You are a specialized technical assistant.',
},
});
gRPC Transport
For lower overhead and better performance, use gRPC transport:
import { NovaClient } from '@ancatag/n-r';
const client = new NovaClient({
apiKey: process.env.NOVA_API_KEY || 'nova_sk_...',
transport: 'grpc', // Use gRPC instead of REST
grpcUrl: '0.0.0.0:50051', // Optional: defaults to 0.0.0.0:50051
});
// Same API, lower overhead
const response = await client.chat.create({
model: 'llama2',
messages: [{ role: 'user', content: 'Hello' }],
});
// Streaming also works with gRPC
for await (const chunk of client.chat.createStream({
model: 'llama2',
messages: [{ role: 'user', content: 'Hey' }],
})) {
process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
Transport Options:
- transport: 'rest' (default) - Uses HTTP fetch
- transport: 'grpc' - Uses gRPC over grpc-js with ts-proto stubs
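Because both transports expose the same client API, you can switch between them through configuration alone. A small sketch (NOVA_TRANSPORT is a hypothetical environment variable of your own, not an SDK feature):
import { NovaClient } from '@ancatag/n-r';
// App-level convention: pick the transport from the environment.
const transport = process.env.NOVA_TRANSPORT === 'grpc' ? 'grpc' : 'rest';
const client = new NovaClient({
  apiKey: process.env.NOVA_API_KEY!,
  transport,
});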
API Reference
Client Configuration
interface NovaClientConfig {
/** Nova API key (required, format: nova_sk_...) */
apiKey: string;
/** Base URL for REST API (default: http://localhost:3000) */
baseUrl?: string;
/** gRPC URL (default: 0.0.0.0:50051) */
grpcUrl?: string;
/** Preferred transport: 'rest' | 'grpc' (default: 'rest') */
transport?: 'rest' | 'grpc';
/** Request timeout in milliseconds (default: 60000) */
timeoutMs?: number;
/** Maximum number of retries for failed requests (default: 2) */
maxRetries?: number;
}
Chat Completions
client.chat.create(request)
Create a non-streaming chat completion.
Parameters:
- request: ChatCompletionRequest - Chat completion request (OpenAI-compatible)
Returns: Promise<ChatCompletionResponse>
client.chat.createStream(request, signal?)
Create a streaming chat completion.
Parameters:
- request: ChatCompletionRequest - Chat completion request
- signal?: AbortSignal - Optional abort signal for cancellation
Returns: AsyncIterable<ChatCompletionChunk>
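As a usage sketch, a small helper that drains a stream into a single string (it assumes a configured client in scope, and mirrors the field access used in the streaming examples above):
import type { ChatCompletionRequest } from '@ancatag/n-r';
async function completeAsString(
  request: ChatCompletionRequest,
  signal?: AbortSignal
): Promise<string> {
  let text = '';
  for await (const chunk of client.chat.createStream(request, signal)) {
    text += chunk.choices[0]?.delta?.content || '';
  }
  return text;
}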
Models
client.models.list()
List all available models in your project.
Returns: Promise<Model[]>
client.models.get(modelId)
Get details for a specific model.
Parameters:
- modelId: string - Model identifier
Returns: Promise<Model>
Type Exports
import type {
ChatMessage,
ChatCompletionRequest,
ChatCompletionResponse,
ChatCompletionChunk,
ChatCompletionChoice,
ChatCompletionUsage,
Model,
NovaClientConfig,
NovaTransport,
} from '@ancatag/n-r';
import { NovaError } from '@ancatag/n-r';
Nova-Specific Extensions
Request Extensions
interface NovaRequestExtensions {
nova?: {
/** Skip cache lookup for this request */
skipCache?: boolean;
/** Route config ID - specifies which route configuration to use */
routeConfigId?: string;
/** Enable RAG (Retrieval-Augmented Generation) for this request */
ragEnabled?: boolean;
/** Additional metadata to attach to the request */
metadata?: Record<string, any>;
/** Override the system prompt for this request */
systemPromptOverride?: string;
};
}
Response Extensions
interface NovaResponseExtensions {
nova?: {
/** Whether this response was served from cache */
cacheHit: boolean;
/** Cache layer used: 'hot' (exact match) | 'semantic' (similarity match) | null */
cacheLayer?: 'hot' | 'semantic' | null;
/** Number of tokens saved by cache hit */
tokensSaved: number;
/** Response time in milliseconds */
responseTimeMs: number;
/** Unique request ID for tracking */
requestId: string;
};
}
Advanced Features
Multi-Tier Caching
Nova automatically uses two cache layers:
Hot Cache - Exact match caching (7-30 day TTL)
- Instant responses for identical requests
- SHA-256 hash-based lookup
Semantic Cache - Similarity matching (95% threshold)
- Matches semantically similar prompts
- Vector embedding-based similarity search
- Available on paid plans
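A quick way to see the layers in action is to send the same request twice and compare the metadata (a sketch; whether you actually get a hit depends on your plan and the current cache state):
const request = {
  messages: [{ role: 'user' as const, content: 'What is a vector database?' }],
  nova: { routeConfigId: 'your-route-config-id' },
};
const first = await client.chat.create(request);
const second = await client.chat.create(request); // identical request
console.log(first.nova?.cacheHit, second.nova?.cacheHit); // e.g. false, true
console.log(second.nova?.cacheLayer); // 'hot' for exact match, 'semantic' for similar prompts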
Cost Savings Tracking
Every response includes savings metrics:
const response = await client.chat.create({
model: 'llama2',
messages: [{ role: 'user', content: 'Hello' }],
});
if (response.nova?.cacheHit) {
console.log(`Saved ${response.nova.tokensSaved} tokens`);
console.log(`Cache layer: ${response.nova.cacheLayer}`);
}
Model Routing
Nova automatically routes requests to the correct provider based on:
- Model identifier in request
- Project default model configuration
- Route configuration (if specified)
// Uses project default if model not specified
const response = await client.chat.create({
messages: [{ role: 'user', content: 'Hello' }],
// model is optional - uses project default
});
RAG (Retrieval-Augmented Generation)
RAG provides 70-90% token savings by automatically retrieving only relevant document chunks instead of sending entire documents.
How RAG Works:
Automatic Context Retrieval: When you make a chat completion request to a route config with ragEnabled: true, Nova-route automatically:
- Extracts the query from the last user message
- Generates a vector embedding of the query
- Searches Qdrant for semantically similar chunks
- Selects top-K chunks above similarity threshold
- Manages token budget to include only chunks that fit
- Injects retrieved chunks as context before the user's query
Token Budget Management: Nova-route automatically calculates:
Token Budget = Context Window - Existing Prompt Tokens - Headroom (200 tokens)
Only chunks that fit within this budget are included.
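The same calculation in code (a sketch mirroring the formula above; 200 is the documented headroom):
// budget = context window - existing prompt tokens - headroom
function ragTokenBudget(
  contextWindow: number,
  existingPromptTokens: number,
  headroom = 200
): number {
  return Math.max(0, contextWindow - existingPromptTokens - headroom);
}
ragTokenBudget(8192, 1500); // 6492 tokens left for retrieved chunks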
Prompt Format: The final prompt sent to the AI model includes:
[System Prompt]
[Pre-prompt Items]
Context:
[Relevant chunk 1 from your documents]
[Relevant chunk 2 from your documents]
Query: [User's original question]
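Purely as an illustration of that layout (the assembly happens server-side inside Nova-route; the function below is not part of the SDK):
// Hypothetical reconstruction of the documented prompt layout.
function assemblePrompt(
  systemPrompt: string,
  prePromptItems: string[],
  retrievedChunks: string[],
  userQuestion: string
): string {
  return [
    systemPrompt,
    ...prePromptItems,
    'Context:',
    ...retrievedChunks, // only the top-K chunks that fit the token budget
    `Query: ${userQuestion}`,
  ].join('\n');
}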
RAG Benefits:
- 70-90% Token Savings: Only relevant chunks vs. full documents
- Improved Accuracy: AI responses grounded in your documents
- Automatic: No manual context management needed
- Scalable: Works with large document collections
- Intelligent: Semantic search finds relevant content even with different wording
Example Token Savings:
- Full document: 10,000 tokens
- Relevant chunks: 1,500 tokens
- Savings: 8,500 tokens (85%)
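The same arithmetic as a one-line helper (a sketch that simply restates the numbers above):
// 1 - (chunk tokens / full-document tokens), as a percentage
const savingsPercent = (fullDocTokens: number, chunkTokens: number) =>
  Math.round((1 - chunkTokens / fullDocTokens) * 100);
savingsPercent(10000, 1500); // 85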
See RAG SDK Documentation for complete setup and configuration details.
TypeScript Support
Full TypeScript support with comprehensive type definitions:
import { NovaClient } from '@ancatag/n-r';
import type {
ChatCompletionRequest,
ChatCompletionResponse,
NovaClientConfig,
} from '@ancatag/n-r';
const config: NovaClientConfig = {
apiKey: process.env.NOVA_API_KEY!,
baseUrl: 'https://api.nova.ai',
};
const client = new NovaClient(config);
async function chat(
request: ChatCompletionRequest
): Promise<ChatCompletionResponse> {
return await client.chat.create(request);
}
Requirements
- Node.js: 18.0.0 or higher (uses native fetch)
- TypeScript: 5.0+ (for type definitions, optional)
Getting Started with Nova-route
- Sign up at nova.ai (Free plan available)
- Create a project in the dashboard
- Configure your models (BYOP or use hosted providers)
- Generate an API key (format: nova_sk_...)
- Install the SDK and start saving on token costs!
Migration from OpenAI
Switching from OpenAI to Nova-route is simple:
// Before (direct OpenAI)
import OpenAI from 'openai';
const client = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
});
// After (via Nova-route) - Just change the SDK!
import { NovaClient } from '@ancatag/n-r';
const client = new NovaClient({
apiKey: process.env.NOVA_API_KEY, // Get from Nova-route dashboard
baseUrl: 'https://api.nova.ai', // Nova-route API endpoint
});
// Request and response shapes stay the same; note that Nova exposes client.chat.create rather than OpenAI's client.chat.completions.create.
const response = await client.chat.create({
model: 'gpt-4',
messages: [{ role: 'user', content: 'Hello!' }],
});
Documentation
- Full SDK Documentation - Complete SDK reference
- RAG SDK Documentation - RAG setup, document upload, and configuration
- API Documentation - API endpoints and authentication
- Architecture Guide - System design and components
- Getting Started - Quick start guide
License
ISC
Support
- Documentation: docs/
- Issues: GitHub Issues
- Dashboard: nova.ai
Built with ❤️ by the Nova-route team
