npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

naturalsql

v1.0.2

Published

Generate SQL from natural language using DB schema + vector retrieval

Readme

NaturalSQL TypeScript

Lightweight library to convert your SQL database schema into a vector database, so LLMs can use real table/column context to generate more accurate SQL from natural language.

Problem

Sometimes you want to interact with SQL databases through an LLM without pulling in large frameworks with many unrelated features.

Solution

NaturalSQL extracts your database schema, vectorizes it using a configurable backend (chroma or sqlite) and configurable embedding provider (local or gemini), then performs semantic retrieval to return relevant schema context for your LLM.

Installation

# Base package
npm i naturalsql

Install optional dependencies based on your configuration:

# Chroma + local embeddings
npm i chromadb @xenova/transformers

# SQLite + local embeddings
npm i better-sqlite3 @xenova/transformers

# SQLite + Gemini embeddings
npm i better-sqlite3 @google/generative-ai

# Chroma + Gemini embeddings
npm i chromadb @google/generative-ai

# PostgreSQL driver support
npm i pg

# MySQL driver support
npm i mysql2

# SQL Server driver support
npm i mssql

Node.js >=18 is required. For Gemini, this package uses @google/generative-ai. Use environment variables for API keys (for example, GEMINI_API_KEY), never hardcode secrets.

Supported SQL engines

  • PostgreSQL
  • MySQL
  • SQL Server
  • SQLite

Quick start

import { NaturalSQL } from "naturalsql";

// 1) Create an instance
const nsql = new NaturalSQL({
	dbUrl: "postgresql://user:password@localhost:5432/mydb",
	dbType: "postgresql"
});

// 2) Build vector DB from schema
const result = await nsql.buildVectorDb("./metadata_vdb");
console.log(`Indexed tables: ${result.indexedTables}`);
console.log(`From cache: ${result.fromCache}`);

// 3) Retrieve relevant tables for a question
const tables = await nsql.search("Show me sales from last month", {
	storagePath: "./metadata_vdb",
	limit: 3
});

// 4) Use returned tables as LLM context
for (const table of tables) {
	console.log(table);
}

Automatic cache

buildVectorDb() reuses existing vector storage when possible (same storage path + schema cache key), unless you pass forceReset: true.

Also, search() reuses the in-memory VectorManager instance for the same storagePath, reducing repeated initialization cost.

E2E examples

import { NaturalSQL } from "naturalsql";

// A) Chroma + local
const nsql = new NaturalSQL({
	dbUrl: "postgresql://user:pass@localhost:5432/mydb",
	dbType: "postgresql",
	vectorBackend: "chroma",
	embeddingProvider: "local",
	vectorDistanceThreshold: 0.35
});

await nsql.buildVectorDb("./metadata_vdb", { forceReset: false });
const tables = await nsql.search("sales for the last month", {
	storagePath: "./metadata_vdb",
	limit: 3
});
console.log(tables);
import { NaturalSQL } from "naturalsql";

// B) SQLite + Gemini
const nsql = new NaturalSQL({
	dbUrl: "sqlite:///./app.db",
	dbType: "sqlite",
	vectorBackend: "sqlite",
	embeddingProvider: "gemini",
	geminiApiKey: process.env.GEMINI_API_KEY,
	geminiEmbeddingModel: "text-embedding-004"
});

await nsql.buildVectorDb("./metadata_vdb_sqlite", { forceReset: false });
const tables = await nsql.search("users with recent purchases", {
	storagePath: "./metadata_vdb_sqlite",
	limit: 3
});
console.log(tables);

API

new NaturalSQL(options?)

Creates an instance with DB and embedding configuration.

| Parameter | Type | Default | Description | |---|---|---|---| | dbUrl | string \| null | null | Database connection URL | | dbType | string | "sqlite" | Engine: postgresql, mysql, sqlite, sqlserver | | normalizeEmbeddings | boolean | true | Normalize embedding vectors | | device | string | "cpu" | Embedding device: cpu or cuda | | vectorBackend | "chroma" \| "sqlite" | "sqlite" | Vector backend | | embeddingProvider | "local" \| "gemini" | "local" | Embedding provider | | geminiApiKey | string \| null | null | Required when embeddingProvider="gemini" | | geminiEmbeddingModel | string | "text-embedding-004" | Gemini embedding model | | vectorDistanceThreshold | number | 0.35 | Max distance threshold used by search() filtering |

Supported backend/provider combinations:

  • chroma + local
  • sqlite + local
  • sqlite + gemini
  • chroma + gemini

await nsql.buildVectorDb(storagePath, options?) -> BuildVectorDbResult

Connects to the DB, extracts schema, and indexes it in the configured vector backend.

storagePath is required.

| Options | Type | Default | Description | |---|---|---|---| | forceReset | boolean | false | Rebuild vector collection from scratch | | cacheKey | string | schema hash | Optional custom cache key |

Returns:

| Key | Type | Description | |---|---|---| | storagePath | string | Storage path used | | indexedTables | number | Number of indexed tables | | fromCache | boolean | true if existing vector store was reused |

await nsql.search(request, options?) -> string[]

Retrieves semantically relevant tables for a natural-language question.

| Parameter | Type | Default | Description | |---|---|---|---| | request | string | required | Natural-language question | | options.storagePath | string | "./metadata_vdb" | Vector storage path | | options.limit | number | 3 | Maximum number of tables to return |

buildPrompt(relevantTables, userQuestion) -> string

Helper function in naturalsql to create an LLM prompt from relevant schema tables and a user question.

import { buildPrompt } from "naturalsql";

const prompt = buildPrompt(tables, "Show me sales from last month");

Technical strategy

  1. Connection: connects using native drivers (pg, mysql2, mssql) or SQLite support.
  2. Schema extraction: uses metadata queries for PostgreSQL/MySQL/SQL Server or PRAGMA table_info for SQLite.
  3. Vectorization: each table schema is transformed into a semantic document and indexed in configured backend (chroma or sqlite) with local or Gemini embeddings.
  4. Retrieval: semantic nearest-neighbor search + vectorDistanceThreshold filtering.
  5. Caching: vector index metadata cache + in-memory manager reuse to reduce repeated latency.

Development

npm run test
npm run typecheck
npm run build

License

Apache-2.0