@mop9047/db-agent
v1.0.0
Published
Database RAG layer — natural-language queries over relational and vector databases with ranked, cited results.
Maintainers
Readme
db-agent
A database RAG (Retrieval-Augmented Generation) layer that takes natural-language questions plus structured filters, queries one or more external databases (relational or vector), and returns ranked, cited results.
Read-only. Source-cited. Never throws.
Install
npm install db-agentQuick start
import { createDbAgent, createRelationalAdapter } from 'db-agent';
const adapter = createRelationalAdapter({
adapter_id: 'violations_db',
execute: async (sql, params) => db.query(sql, params), // your DB driver
schema: {
trade_events: {
description: 'Records of employee trade activity',
columns: [
{ name: 'id', type: 'uuid' },
{ name: 'employee_id', type: 'varchar' },
{ name: 'symbol', type: 'varchar' },
{ name: 'trade_date', type: 'timestamp' },
{ name: 'notes', type: 'text' },
],
},
},
});
const agent = createDbAgent({ adapter });
const result = await agent.query({
questions: [
'Find all trades for employee emp_123 in August 2023',
'Were there any FINRA rule violations for this employee?',
],
filters: {
employee_id: 'emp_123',
date_range: { from: '2023-08-01', to: '2023-08-31' },
rule_ids: ['FINRA-3110'],
},
context_hints: ['trade_events', 'rule_violations'],
top_k: 5,
});
if (result.ok) {
for (const finding of result.findings) {
console.log(finding.question, finding.status, finding.results);
}
}Adapters
Relational adapter
Generates parameterized SELECT queries against any relational database. You supply an execute function — the library never manages connections.
import { createRelationalAdapter } from 'db-agent/adapters/relational';
const adapter = createRelationalAdapter({
adapter_id: 'my_db',
// Required: caller provides query execution
execute: async (sql, params) => rows,
// Option A: provide schema statically
schema: {
my_table: {
description: 'Human-readable description helps table selection',
columns: [
{ name: 'id', type: 'uuid' },
{ name: 'user_id', type: 'varchar' },
{ name: 'created_at', type: 'timestamp' },
{ name: 'notes', type: 'text' },
],
},
},
// Option B: auto-introspect via INFORMATION_SCHEMA (default when schema absent)
// introspect: true,
// catalog: 'my_database', // optional catalog scoping
});SQL generation:
- Always parameterized (
$1,$2, ...) — filter values are never interpolated - SELECT only — the builder rejects any non-SELECT at runtime
- LIMIT is always enforced (top_k)
- Date filters map to
>=/<=on timestamp/date columns - Keyword search runs ILIKE on the first text column
Vector adapter
Embeds questions and runs nearest-neighbor search. You supply embed and search functions.
import { createVectorAdapter } from 'db-agent/adapters/vector';
const adapter = createVectorAdapter({
adapter_id: 'knowledge_base',
// Required: embed a question into a vector
embed: async (text) => openai.embeddings.create({ input: text }).then(r => r.data[0].embedding),
// Required: search the vector store
search: async ({ embedding, top_k, filter }) => {
return pinecone.query({ vector: embedding, topK: top_k, filter });
},
collection: 'compliance_docs', // used for source citations
// Optional: map DbAgentInput filter keys to your vector DB's metadata keys
filter_mappings: {
employee_id: 'emp_id',
'date_range.from': 'start_date',
'date_range.to': 'end_date',
},
});Composite adapter
Fans queries out to multiple adapters and combines results.
import { createCompositeAdapter } from 'db-agent/adapters/composite';
const adapter = createCompositeAdapter({
adapter_id: 'composite',
adapters: [relationalAdapter, vectorAdapter],
// 'merge': query all adapters concurrently, interleave by score, deduplicate
// 'sequential': try adapters in order, stop when a question is answered
strategy: 'merge',
});Input schema
interface DbAgentInput {
questions: string[]; // Natural-language questions (answered independently)
filters?: {
employee_id?: string;
account_id?: string;
symbol?: string;
date_range?: { from?: string; to?: string }; // ISO8601
rule_ids?: string[];
[key: string]: unknown; // Domain-specific filters — mapped if column name matches
};
context_hints?: string[]; // Table/collection names to prioritize
top_k?: number; // Max results per question (default 5)
}Output schema
interface DbAgentOutput {
ok: boolean;
findings: DbAgentFinding[];
retrieval_metadata: {
db_type: 'relational' | 'vector' | 'composite';
adapter_id: string;
questions_asked: number;
results_returned: number;
retrieved_at: string; // ISO8601
duration_ms: number;
};
error?: {
code: string;
message: string;
details?: string[];
};
}
interface DbAgentFinding {
question: string;
results: DbAgentResult[];
status: 'found' | 'not_found' | 'error';
retrieved_from: 'relational' | 'vector';
}
interface DbAgentResult {
content: string;
source: {
table?: string; // relational: table name
collection?: string; // vector: collection name
row_id?: string; // relational: primary key
chunk_id?: string; // vector: chunk id
score?: number; // similarity/relevance score 0-1
};
metadata: Record<string, unknown>;
}Error handling
The agent never throws. All errors are returned in the output envelope.
// Partial failure — some questions succeeded
{
ok: false,
findings: [
{ question: "Q1", status: "found", results: [...] },
{ question: "Q2", status: "error", results: [] },
],
error: {
code: "partial_failure",
message: "One or more questions could not be retrieved.",
details: ["Question \"Q2\" failed: connection timeout"]
}
}
// Total failure
{
ok: false,
findings: [],
error: {
code: "connection_failed" | "timeout" | "query_failed" | "invalid_input",
message: "...",
details: []
}
}Design constraints
| Constraint | Detail |
|---|---|
| Read-only | The relational adapter never generates INSERT/UPDATE/DELETE. The SQL builder rejects any non-SELECT at construction time. |
| No hardcoded schemas | Works with any schema — introspects via INFORMATION_SCHEMA or uses caller-provided descriptions. |
| Caller owns connections | The library never opens DB connections. All I/O goes through execute/embed/search functions you provide. |
| Source citations required | Every DbAgentResult has a populated source. Results without a traceable source are dropped before return. |
| Partial results preferred | If 3 of 5 questions succeed, those 3 are returned with ok: false and a partial_failure error. |
| No LLM calls | The library does retrieval only. No language model API is called internally. |
| top_k always enforced | Never returns more than top_k results per question (default 5). |
Running tests
npm testTests cover: happy path, not_found, partial failure, empty filters, SQL injection safety, schema introspection caching.
Package structure
src/
index.js # Public exports
agent.js # Core orchestrator
adapters/
relational.js # Relational adapter
vector.js # Vector adapter
composite.js # Composite adapter
sql-builder.js # Parameterized SELECT generation
schema-introspector.js # INFORMATION_SCHEMA queries + table selection
result-formatter.js # Normalizes rows/hits to DbAgentResult[]
test/
agent.test.js
adapters/
relational.test.js
vector.test.js
fixtures/
mock-schema.js