rag-api-kit
v1.0.0
Production-grade backend toolkit for building secure, scalable Retrieval-Augmented Generation (RAG) APIs.
- TypeScript — Written in TypeScript with full typings.
- Modular — Embedding, vector store, cache, and security are pluggable.
- DI-friendly — Inject your own providers via interfaces.
- Production-oriented — Error handling, metrics, no demo shortcuts.
Install
npm install rag-api-kit

Peer / runtime: openai and ioredis (only if using the Redis cache). Install them in your app:

npm install openai ioredis

Quick start
import { createRagKit } from "rag-api-kit";
const rag = createRagKit({
  embedding: { provider: "openai", apiKey: process.env.OPENAI_API_KEY! },
  vectorStore: { type: "memory" },
  retrieval: { topK: 5, minScore: 0.7 },
  chunk: { size: 100, overlap: 20 },
  security: { strict: true },
});
await rag.ingest([
  "Your document text here.",
  { text: "Another doc.", metadata: { source: "api" } },
]);
const result = await rag.ask("Your question?");
console.log(result.answer); // Assembled context
console.log(result.context); // Ranked chunks
console.log(result.metrics); // retrievalTimeMs, cacheHits, etc.

Configuration
| Key | Required | Description |
|-----|----------|-------------|
| embedding | Yes* | { provider: "openai", apiKey, model? } or supply embeddingProvider |
| vectorStore | Yes | { type: "memory" } or inject a VectorStore |
| retrieval | Yes | { topK, minScore?, maxContextTokens? } |
| chunk | Yes | { size, overlap } (token-based chunking) |
| cache | No | { redisUrl, ttlSeconds?, similarityThreshold? } for Redis semantic cache |
| security | No | { strict: true } enables prompt-injection filter |
* Either embedding (with provider: "openai") or embeddingProvider must be provided.
Features
Auto chunking
- Token-based chunking with configurable size and overlap.
- Metadata is preserved on each chunk.
import { createChunker } from "rag-api-kit";
const chunker = createChunker({ size: 100, overlap: 20 });
const chunks = chunker.chunk(["Long document...", { text: "Doc with meta.", metadata: { id: 1 } }]);

Embedding wrapper
- Interface: EmbeddingProvider (embed, embedBatch, dimensions).
- Built-in: OpenAIEmbeddingProvider (OpenAI embeddings).
- You can implement the interface for other providers.
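To target another backend, implement the interface yourself. A minimal sketch, assuming the interface shape described above (embed, embedBatch, dimensions); ToyEmbeddingProvider and its character-bucket scheme are purely illustrative, not part of the library:

```typescript
// Interface shape assumed from the docs above; details may differ.
interface EmbeddingProvider {
  embed(text: string): Promise<number[]>;
  embedBatch(texts: string[]): Promise<number[][]>;
  dimensions(): number;
}

// Toy deterministic embedder (character-frequency buckets), useful for
// tests; swap in a real model client for production use.
class ToyEmbeddingProvider implements EmbeddingProvider {
  private readonly dim = 64;

  async embed(text: string): Promise<number[]> {
    const v = new Array<number>(this.dim).fill(0);
    for (const ch of text) v[ch.charCodeAt(0) % this.dim] += 1;
    const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0)) || 1;
    return v.map((x) => x / norm); // unit-normalized vector
  }

  async embedBatch(texts: string[]): Promise<number[][]> {
    return Promise.all(texts.map((t) => this.embed(t)));
  }

  dimensions(): number {
    return this.dim;
  }
}
```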
Vector store
- Interface: VectorStore (add, search, clear, size).
- Built-in: in-memory store with cosine similarity, topK, minScore, and metadata filters.
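A custom store only needs the four methods above. Here is a minimal in-memory sketch with cosine similarity; the StoredDoc, SearchOptions, and SearchHit shapes are assumptions for illustration, not the library's exact types:

```typescript
interface StoredDoc { vector: number[]; text: string; metadata?: Record<string, unknown>; }
interface SearchOptions { topK: number; minScore?: number; }
interface SearchHit { text: string; score: number; metadata?: Record<string, unknown>; }

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

class MemoryVectorStore {
  private docs: StoredDoc[] = [];

  add(docs: StoredDoc[]): void { this.docs.push(...docs); }

  // Score every doc, drop those below minScore, return the topK best.
  search(vector: number[], options: SearchOptions): SearchHit[] {
    return this.docs
      .map((d) => ({ text: d.text, metadata: d.metadata, score: cosine(vector, d.vector) }))
      .filter((h) => h.score >= (options.minScore ?? -Infinity))
      .sort((a, b) => b.score - a.score)
      .slice(0, options.topK);
  }

  clear(): void { this.docs = []; }
  size(): number { return this.docs.length; }
}
```

A brute-force scan like this is fine for small corpora; inject an adapter over a real vector database once the collection grows.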
Redis semantic cache (optional)
- Cache by embedding similarity: similar queries return cached response.
- Configurable TTL and similarity threshold.
- Enable by passing cache: { redisUrl, ttlSeconds?, similarityThreshold? }.
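A configuration sketch wiring the cache into createRagKit; the URL and threshold values below are illustrative, not defaults:

```typescript
import { createRagKit } from "rag-api-kit";

const rag = createRagKit({
  embedding: { provider: "openai", apiKey: process.env.OPENAI_API_KEY! },
  vectorStore: { type: "memory" },
  retrieval: { topK: 5 },
  chunk: { size: 100, overlap: 20 },
  cache: {
    redisUrl: "redis://localhost:6379",
    ttlSeconds: 3600,          // cached answers expire after an hour
    similarityThreshold: 0.95, // how close a query must be to reuse a cached answer
  },
});
```

A higher similarityThreshold trades fewer cache hits for less risk of serving a cached answer to a genuinely different question.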
Prompt injection filter
- Detects patterns such as “ignore previous instructions”, “reveal system prompt”, “act as system”, and prompt override attempts.
- Configurable strict mode; returns a structured SecurityError with reason and pattern.
import { createPromptInjectionFilter } from "rag-api-kit";
const filter = createPromptInjectionFilter({ strict: true });
const result = filter.check(userInput);
if (!result.safe) throw new SecurityError(result.reason!, result.pattern);

Retriever pipeline
- Embeds query → searches vector store → assembles context → enforces maxContextTokens when set.
- Returns ranked chunks (scores from cosine similarity).
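The last step, enforcing maxContextTokens, amounts to greedy selection over the ranked chunks. A sketch under the simplifying assumption of whitespace tokenization (the library's real token counting will differ):

```typescript
interface RankedChunk { text: string; score: number; }

// Greedy context assembly: take chunks in rank order until the token
// budget is exhausted. Tokens are approximated by whitespace words here.
function assembleContext(chunks: RankedChunk[], maxContextTokens: number): string {
  const picked: string[] = [];
  let used = 0;
  for (const chunk of [...chunks].sort((a, b) => b.score - a.score)) {
    const tokens = chunk.text.split(/\s+/).length;
    if (used + tokens > maxContextTokens) break; // budget exhausted
    picked.push(chunk.text);
    used += tokens;
  }
  return picked.join("\n\n");
}
```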
Metrics
- result.metrics: retrievalTimeMs, embeddingTimeMs, totalTimeMs, cacheHits, cacheMisses.
- rag.stats(): documentCount, chunkCount, cacheHits, cacheMisses.
API summary
- new RagKit(config) / createRagKit(config) — Build the RAG pipeline.
- ingest(documents) — Chunk, embed, and store. documents: string[] or { text, metadata? }[].
- ask(query, options?) — Security check → optional cache → retrieve → return { answer, context, fromCache, metrics }. options: { skipCache?, skipSecurityCheck? }.
- stats() — { documentCount, chunkCount, cacheHits, cacheMisses }.
- clearCache() — Clears the semantic cache and resets metrics.
Dependency injection
You can inject your own implementations:
createRagKit({
  embeddingProvider: myEmbeddingProvider, // replaces embedding: { provider: "openai", ... }
  vectorStore: myVectorStore,
  retrieval: { topK: 5 },
  chunk: { size: 50, overlap: 10 },
  security: { strict: true },
});

Implement:
- EmbeddingProvider — embed(text), embedBatch(texts), dimensions().
- VectorStore — add(docs), search(vector, options), clear(), size().
- SecurityFilter — check(input) → { safe, reason?, pattern? }.
- CacheLayer / SemanticCache — for key-based or similarity-based caching.
Errors
- RagError — Base error class (code: RagError).
- SecurityError — Prompt injection or security check failed (SECURITY_VIOLATION).
- EmbeddingError — Embedding call failed.
- VectorStoreError — Vector store operation failed.
- CacheError — Cache operation failed.
- ConfigurationError — Invalid or missing config.
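Because everything extends RagError, one instanceof check distinguishes library failures from unexpected errors. The stand-in classes below mirror the hierarchy for illustration only; in an application you would import them from rag-api-kit, and their constructors may differ:

```typescript
// Stand-ins mirroring the documented hierarchy (illustrative shapes).
class RagError extends Error {
  constructor(message: string, readonly code: string = "RagError") {
    super(message);
  }
}

class SecurityError extends RagError {
  constructor(message: string, readonly pattern?: string) {
    super(message, "SECURITY_VIOLATION");
  }
}

// Check the most specific class first, then fall back to the base.
function describeFailure(err: unknown): string {
  if (err instanceof SecurityError) return `blocked (${err.code}): ${err.message}`;
  if (err instanceof RagError) return `RAG failure (${err.code}): ${err.message}`;
  return "unexpected error";
}
```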
Project structure
src/
core/ # Chunker, Retriever, errors
providers/ # OpenAI embedding
adapters/ # In-memory vector store
security/ # Prompt injection filter
cache/ # In-memory + Redis semantic cache
types/ # Interfaces and config types
utils/ # Cosine similarity, metrics, token utils
rag-kit.ts # Main RagKit class
index.ts # Public API

License
MIT
