@deepagents/text2sql
v3.0.1
Published
AI-powered natural language to SQL. Ask questions in plain English, get executable queries.
Readme
@deepagents/text2sql
AI-powered natural language to SQL. Ask questions in plain English, get executable queries.
Features
- Natural Language to SQL - Convert questions to validated, executable queries
- Multi-Database Support - PostgreSQL, SQLite, SQL Server, MySQL/MariaDB, and BigQuery adapters
- Schema-Aware - Automatic introspection of tables, relationships, indexes, and constraints
- Domain Knowledge - Inject business terms, guardrails, and query patterns via fragments
- Conversational - Multi-turn conversations with context persistence
- Safe by Default - Read-only queries, validation, and configurable guardrails
Installation
npm install @deepagents/text2sqlInstall the database driver or client library that matches your adapter:
npm install pg # PostgreSQL
npm install mssql # SQL Server
npm install mysql2 # MySQL / MariaDB
npm install @google-cloud/bigquery # BigQueryRequires Node.js LTS
Quick Start
import { groq } from '@ai-sdk/groq';
import pg from 'pg';
import { Text2Sql } from '@deepagents/text2sql';
import {
Postgres,
columnValues,
constraints,
indexes,
info,
rowCount,
tables,
views,
} from '@deepagents/text2sql/postgres';
const pool = new pg.Pool({
connectionString: process.env.DATABASE_URL,
});
const adapter = new Postgres({
execute: async (sql) => {
const result = await pool.query(sql);
return result.rows;
},
grounding: [
tables(),
views(),
info(),
indexes(),
constraints(),
rowCount(),
columnValues(),
],
});
const text2sql = new Text2Sql({
model: groq('openai/gpt-oss-20b'),
adapters: { main: adapter },
});
// Generate SQL
const sql = await text2sql.toSql(
'Show me the top 10 customers by revenue',
'main',
);
console.log(sql);The adapter-map key (main here) is the adapter name. Reuse that same key in
text2sql.toSql(..., 'main') and in any sql validate <db> "..." /
sql run <db> "..." calls from a sandbox where the package CLI is installed.
Adapter names must match /^[A-Za-z_][A-Za-z0-9_]*$/. If you build adapter
maps dynamically, use isValidAdapterName(name) to check one key or
validateAdapterNames(names) to fail fast before constructing Text2Sql or
the sandbox-side adapter module used by the sql CLI.
AI Model Providers
Text2SQL works with any model provider supported by the Vercel AI SDK, including OpenAI, Anthropic, Google, Groq, and more.
Building a Conversational Agent
Text2Sql owns SQL generation and schema indexing, but you own the chat loop.
Build the agent by composing a ContextEngine, a sandbox, and agent +
chat from @deepagents/context:
import {
ContextEngine,
type ContextFragment,
InMemoryContextStore,
agent,
chat,
createBashTool,
createDockerSandbox,
errorRecoveryGuardrail,
npm,
user,
} from '@deepagents/context';
import { createSqlCommandHooks, instructions } from '@deepagents/text2sql';
const store = new InMemoryContextStore();
const context = new ContextEngine({
store,
chatId: 'chat-123',
userId: 'user-456',
});
const backend = await createDockerSandbox({
installers: [npm('@deepagents/text2sql', { ensureRuntime: true })],
volumes: [
{
type: 'bind',
hostPath: process.cwd(),
containerPath: '/workspace',
readOnly: true,
},
],
env: {
TEXT2SQL_ADAPTERS: '/workspace/text2sql-adapters.ts',
},
});
const sandbox = await createBashTool({
sandbox: backend,
...createSqlCommandHooks({ adapters: { main: adapter } }),
});
const indexResult = await sandbox.sandbox.executeCommand('sql index');
if (indexResult.exitCode !== 0) throw new Error(indexResult.stderr);
const manifest = JSON.parse(indexResult.stdout) as { fragmentsPath: string };
const fragments = JSON.parse(
await sandbox.sandbox.readFile(manifest.fragmentsPath),
) as ContextFragment[];
context.set(...instructions(), ...fragments);
const ai = agent({
name: 'sql-assistant',
sandbox,
model: groq('openai/gpt-oss-20b'),
context,
guardrails: [errorRecoveryGuardrail],
maxGuardrailRetries: 3,
});
await context.continue(user('Show me the top 10 customers by revenue'));
const stream = await chat(ai);
for await (const chunk of stream) {
// handle streaming response
}The /workspace/text2sql-adapters.ts module must exist in the sandbox and
default-export your adapter map. Mount your project into /workspace (as above)
or upload/write that module before you call sql index or chat().
instructions() returns the SQL-flavored system fragments (policies, workflows,
clarifications, style guides) — spread them into context.set() alongside the
schema fragments returned by sql index. Add or replace them with your
own domain fragments as needed.
Advanced: SQL CLI in Sandboxes
sql validate <db> "..." and sql run <db> "..." are real commands from the
@deepagents/text2sql package. sql index writes schema fragments plus
progress events for chat setup, indexing all configured adapters by default
(same as --all) unless adapter names are provided.
Install the package inside the sandbox and set TEXT2SQL_ADAPTERS to a module
whose default export is Record<string, Adapter>. Missing sql means the
sandbox was not prepared correctly.
sql index output details:
stdout: JSON manifest withfragmentsPath,eventsPath, adapters, and fragment count.--verbose prettyor--verbose json: mirrors progress events tostderrwhile keepingstdoutas the manifest.--out-dir <path>: writes artifacts under that path (default:$TEXT2SQL_OUT_DIRor./sql).
The sql CLI caches introspected schema only when you opt in via env: set
TEXT2SQL_INDEX_CACHE_DIR (where cache files live) and/or
TEXT2SQL_INDEX_VERSION (an invalidation token — bump it when the schema
changes). With neither set, every sql index introspects fresh. See
Schema index caching & coordination for
the underlying injectable primitives.
For in-process or virtual-sandbox usage (without installing the package CLI),
wrap an existing Text2Sql instance as a just-bash custom command:
import { InMemoryFs, createVirtualSandbox } from '@deepagents/context';
import {
type CreateSqlCommandOptions,
type CreateSqlCommandResult,
Text2Sql,
createSqlCommand,
} from '@deepagents/text2sql';
const text2sql = new Text2Sql({ model, adapters: { main: adapter } });
const commandOptions: CreateSqlCommandOptions = {
outputDir: '/sql-artifacts',
};
const sqlCommand: CreateSqlCommandResult = createSqlCommand(
text2sql,
commandOptions,
);
const sandbox = await createVirtualSandbox({
fs: new InMemoryFs(),
customCommands: [sqlCommand.command],
});
await sandbox.executeCommand('sql validate main "SELECT 1"');CreateSqlCommandOptions configures command defaults (currently outputDir).
CreateSqlCommandResult returns the command plus a repair(raw) helper for
normalizing model-generated argv before execution.
Spread createSqlCommandHooks({ adapters }) into createBashTool() for
model-driven bash calls. The before hook preserves
the old virtual-command tolerance for common LLM quote mistakes, rewrites SQL
identifier backticks so bash does not run them as command substitutions, and
blocks raw database access so read-only and scope checks stay behind
sql validate / sql run. The after hook restores hidden formattedSql
metadata from the host adapter map without putting that concern into the real
CLI.
Read-only enforcement accepts a single SELECT/WITH statement even when it
starts with whitespace or SQL comments (-- ..., /* ... */). It still
rejects comment-only input, multi-statement batches, and write operations.
Fragments
Inject domain knowledge by setting fragments on the ContextEngine you build
for the agent. Those fragments affect every chat() turn. If you need direct
SQL generation with extra fragments, use the lower-level toSql({ fragments })
helper exported from @deepagents/text2sql.
import {
ContextEngine,
InMemoryContextStore,
example,
guardrail,
hint,
term,
} from '@deepagents/context';
const store = new InMemoryContextStore();
const context = new ContextEngine({
store,
chatId: 'chat-123',
userId: 'user-456',
});
context.set(
term('MRR', 'monthly recurring revenue'),
hint('Always exclude test accounts with email ending in @test.com'),
guardrail({
rule: 'Never expose individual salaries',
reason: 'Confidential HR data',
action: 'Aggregate by department instead',
}),
example({
question: 'show me churned customers',
answer: `SELECT * FROM customers WHERE status = 'churned' ORDER BY churned_at DESC`,
}),
);Domain fragments (11 types): term, hint, guardrail, example, explain, clarification, workflow, quirk, styleGuide, analogy, glossary.
User fragments (5 types): identity, persona, alias, preference, correction.
See @deepagents/context for full fragment documentation.
Grounding
Control what schema metadata the AI receives:
| Function | Description |
| ---------------- | --------------------------------------------------------- |
| tables() | Tables, columns, and primary keys |
| views() | Database views |
| info() | Database version and info |
| indexes() | Index information for performance hints |
| constraints() | Foreign keys and other constraints |
| rowCount() | Table sizes (tiny, small, medium, large, huge) |
| columnStats() | Min/max/null distribution for columns |
| columnValues() | Enum-like and low-cardinality columns with sampled values |
Conversations
chat() (from @deepagents/context) persists history through the
ContextEngine you build for the agent. Reuse the same store, chatId, and
userId to continue the same thread. Before each turn, append only the new
incoming message to the engine; do not replay earlier turns that are already
stored in the context store:
await context.continue(user('Show me orders from last month'));
const stream = await chat(ai);
for await (const chunk of stream) {
// handle streaming response
}
// Continue the same conversation
await context.continue(user('Now filter to only completed ones'));
const followUp = await chat(ai);
for await (const chunk of followUp) {
// handle streaming response
}Schema index caching & coordination
Schema introspection is the expensive part of indexing. Text2Sql and
AdapterIndexer own no storage and no locking — you inject both as optional
primitives, keyed per adapter:
export interface IndexCache {
read(key: string): Promise<ContextFragment[] | null>;
write(key: string, fragments: ContextFragment[]): Promise<void>;
}
export interface IndexLock {
// Acquire (waiting if held elsewhere), run fn, release — even on throw.
// Reject if the lock can't be acquired (callers fail closed).
run<T>(key: string, fn: () => Promise<T>): Promise<T>;
}- No
cache→ everyindex()introspects fresh (no cache events). cacheonly → warm reads skip introspection within whatever the cache spans.cache+lock→ introspection is single-flight: concurrent callers for the same adapter wait on the lock, then read the warm cache.
A file-backed IndexCache ships as a composable primitive:
import { FileIndexCache, Text2Sql } from '@deepagents/text2sql';
const text2sql = new Text2Sql({
model,
adapters: { main: adapter },
// Point `dir` at a shared volume to share one cache across processes.
cache: new FileIndexCache({ dir: '/var/cache/text2sql', namespace: 'v1' }),
});FileIndexCache writes atomically (temp + rename) and treats an unparseable
file as a miss, so a torn read self-heals into a re-introspect.
Horizontally-scaled deployments
Running many processes/containers against one database (e.g. a fleet of daemons)? Two coordination tiers:
- The lock alone serializes introspection — never two concurrent
introspections of the same database (caps peak DB load). Back it with a
distributed lock you already operate (a Redis lock, a Postgres
pg_advisory_lock, etc.). - The lock plus a shared cache directory gives fleet-wide single-flight: the lock holder writes the cache on the shared volume, and every waiter reads it instead of re-introspecting.
On object-storage-backed volumes (GCS/S3 FUSE)
renameis not atomic and file locks are unreliable — that is exactly why the lock and cache are injectable. Use a real distributed lock; theFileIndexCacheparse-as-miss behavior plus the post-lock recheck tolerate a non-atomic write.
See demo/text2sql-daemon for a daemon that injects a FileIndexCache
(TEXT2SQL_INDEX_CACHE_DIR / TEXT2SQL_INDEX_VERSION) and a pg_advisory_lock
backed IndexLock.
Streaming Index Progress
AdapterIndexer#index({ onProgress }) emits progress events while it warms or reads
the schema cache for host-side indexing. The sandbox CLI writes the same event
shape to the eventsPath file returned by sql index:
import { createUIMessageStream } from 'ai';
import {
AdapterIndexer,
TEXT2SQL_INDEX_PROGRESS_CHUNK,
type Text2SqlIndexProgressEvent,
} from '@deepagents/text2sql';
const indexer = new AdapterIndexer({ adapters });
await context.continue(user('Show me top 10 customers'));
const stream = createUIMessageStream({
execute: async ({ writer }) => {
const head = await context.headMessage();
if (!head || head.name !== 'assistant') {
throw new Error(
'expected head to be an assistant message — call context.continue() before chat()',
);
}
writer.write({ type: 'start', messageId: head.id });
const fragments = await indexer.index({
onProgress: (event) =>
writer.write({
type: TEXT2SQL_INDEX_PROGRESS_CHUNK,
data: event,
}),
});
context.set(...instructions(), ...fragments);
writer.merge(await chat(ai));
},
});Each progress event includes a generic Unix epoch timestampMs so clients can
calculate per-index, per-adapter, or per-phase durations without Text2SQL
imposing one duration policy.
toSql() is stateless. It reads the current schema fragments, but it does not
write messages, titles, or usage into your context store.
Direct SQL Generation with Extra Fragments
This lower-level helper still takes a single adapter because it bypasses the
multi-adapter Text2Sql wrapper entirely.
import { term } from '@deepagents/context';
import { toSql } from '@deepagents/text2sql';
const result = await toSql({
input: 'Show ARR by plan',
adapter,
model: groq('openai/gpt-oss-20b'),
fragments: [
...(await adapter.introspect()),
term('ARR', 'annual recurring revenue'),
],
});
console.log(result.sql);Documentation
Full documentation available at januarylabs.github.io/deepagents:
- Getting Started
- Generate SQL
- Agent + SQLite Store Recipe
- Teach the System
- Build Conversations
- Grounding
- PostgreSQL
- SQLite
- SQL Server
- MySQL / MariaDB
- BigQuery
