@mop9047/db-agent

v1.0.0

Published

18 days ago

Database RAG layer — natural-language queries over relational and vector databases with ranked, cited results.

0High
0Medium
0Low

mop9047

database rag retrieval agent vector relational

db-agent

A database RAG (Retrieval-Augmented Generation) layer that takes natural-language questions plus structured filters, queries one or more external databases (relational or vector), and returns ranked, cited results.

Read-only. Source-cited. Never throws.

Install

npm install db-agent

Quick start

import { createDbAgent, createRelationalAdapter } from 'db-agent';

const adapter = createRelationalAdapter({
  adapter_id: 'violations_db',
  execute: async (sql, params) => db.query(sql, params),  // your DB driver
  schema: {
    trade_events: {
      description: 'Records of employee trade activity',
      columns: [
        { name: 'id', type: 'uuid' },
        { name: 'employee_id', type: 'varchar' },
        { name: 'symbol', type: 'varchar' },
        { name: 'trade_date', type: 'timestamp' },
        { name: 'notes', type: 'text' },
      ],
    },
  },
});

const agent = createDbAgent({ adapter });

const result = await agent.query({
  questions: [
    'Find all trades for employee emp_123 in August 2023',
    'Were there any FINRA rule violations for this employee?',
  ],
  filters: {
    employee_id: 'emp_123',
    date_range: { from: '2023-08-01', to: '2023-08-31' },
    rule_ids: ['FINRA-3110'],
  },
  context_hints: ['trade_events', 'rule_violations'],
  top_k: 5,
});

if (result.ok) {
  for (const finding of result.findings) {
    console.log(finding.question, finding.status, finding.results);
  }
}

Adapters

Relational adapter

Generates parameterized SELECT queries against any relational database. You supply an execute function — the library never manages connections.

import { createRelationalAdapter } from 'db-agent/adapters/relational';

const adapter = createRelationalAdapter({
  adapter_id: 'my_db',

  // Required: caller provides query execution
  execute: async (sql, params) => rows,

  // Option A: provide schema statically
  schema: {
    my_table: {
      description: 'Human-readable description helps table selection',
      columns: [
        { name: 'id', type: 'uuid' },
        { name: 'user_id', type: 'varchar' },
        { name: 'created_at', type: 'timestamp' },
        { name: 'notes', type: 'text' },
      ],
    },
  },

  // Option B: auto-introspect via INFORMATION_SCHEMA (default when schema absent)
  // introspect: true,
  // catalog: 'my_database',   // optional catalog scoping
});

SQL generation:

Always parameterized ($1, $2, ...) — filter values are never interpolated
SELECT only — the builder rejects any non-SELECT at runtime
LIMIT is always enforced (top_k)
Date filters map to >=/<= on timestamp/date columns
Keyword search runs ILIKE on the first text column

Vector adapter

Embeds questions and runs nearest-neighbor search. You supply embed and search functions.

import { createVectorAdapter } from 'db-agent/adapters/vector';

const adapter = createVectorAdapter({
  adapter_id: 'knowledge_base',

  // Required: embed a question into a vector
  embed: async (text) => openai.embeddings.create({ input: text }).then(r => r.data[0].embedding),

  // Required: search the vector store
  search: async ({ embedding, top_k, filter }) => {
    return pinecone.query({ vector: embedding, topK: top_k, filter });
  },

  collection: 'compliance_docs',  // used for source citations

  // Optional: map DbAgentInput filter keys to your vector DB's metadata keys
  filter_mappings: {
    employee_id: 'emp_id',
    'date_range.from': 'start_date',
    'date_range.to': 'end_date',
  },
});

Composite adapter

Fans queries out to multiple adapters and combines results.

import { createCompositeAdapter } from 'db-agent/adapters/composite';

const adapter = createCompositeAdapter({
  adapter_id: 'composite',
  adapters: [relationalAdapter, vectorAdapter],

  // 'merge': query all adapters concurrently, interleave by score, deduplicate
  // 'sequential': try adapters in order, stop when a question is answered
  strategy: 'merge',
});

Input schema

interface DbAgentInput {
  questions: string[];         // Natural-language questions (answered independently)

  filters?: {
    employee_id?: string;
    account_id?: string;
    symbol?: string;
    date_range?: { from?: string; to?: string };  // ISO8601
    rule_ids?: string[];
    [key: string]: unknown;    // Domain-specific filters — mapped if column name matches
  };

  context_hints?: string[];    // Table/collection names to prioritize
  top_k?: number;              // Max results per question (default 5)
}

Output schema

interface DbAgentOutput {
  ok: boolean;
  findings: DbAgentFinding[];
  retrieval_metadata: {
    db_type: 'relational' | 'vector' | 'composite';
    adapter_id: string;
    questions_asked: number;
    results_returned: number;
    retrieved_at: string;   // ISO8601
    duration_ms: number;
  };
  error?: {
    code: string;
    message: string;
    details?: string[];
  };
}

interface DbAgentFinding {
  question: string;
  results: DbAgentResult[];
  status: 'found' | 'not_found' | 'error';
  retrieved_from: 'relational' | 'vector';
}

interface DbAgentResult {
  content: string;
  source: {
    table?: string;         // relational: table name
    collection?: string;    // vector: collection name
    row_id?: string;        // relational: primary key
    chunk_id?: string;      // vector: chunk id
    score?: number;         // similarity/relevance score 0-1
  };
  metadata: Record<string, unknown>;
}

Error handling

The agent never throws. All errors are returned in the output envelope.

// Partial failure — some questions succeeded
{
  ok: false,
  findings: [
    { question: "Q1", status: "found", results: [...] },
    { question: "Q2", status: "error", results: [] },
  ],
  error: {
    code: "partial_failure",
    message: "One or more questions could not be retrieved.",
    details: ["Question \"Q2\" failed: connection timeout"]
  }
}

// Total failure
{
  ok: false,
  findings: [],
  error: {
    code: "connection_failed" | "timeout" | "query_failed" | "invalid_input",
    message: "...",
    details: []
  }
}

Design constraints

| Constraint | Detail | |---|---| | Read-only | The relational adapter never generates INSERT/UPDATE/DELETE. The SQL builder rejects any non-SELECT at construction time. | | No hardcoded schemas | Works with any schema — introspects via INFORMATION_SCHEMA or uses caller-provided descriptions. | | Caller owns connections | The library never opens DB connections. All I/O goes through execute/embed/search functions you provide. | | Source citations required | Every DbAgentResult has a populated source. Results without a traceable source are dropped before return. | | Partial results preferred | If 3 of 5 questions succeed, those 3 are returned with ok: false and a partial_failure error. | | No LLM calls | The library does retrieval only. No language model API is called internally. | | top_k always enforced | Never returns more than top_k results per question (default 5). |

Running tests

npm test

Tests cover: happy path, not_found, partial failure, empty filters, SQL injection safety, schema introspection caching.

Package structure

src/
  index.js              # Public exports
  agent.js              # Core orchestrator
  adapters/
    relational.js       # Relational adapter
    vector.js           # Vector adapter
    composite.js        # Composite adapter
  sql-builder.js        # Parameterized SELECT generation
  schema-introspector.js # INFORMATION_SCHEMA queries + table selection
  result-formatter.js   # Normalizes rows/hits to DbAgentResult[]
test/
  agent.test.js
  adapters/
    relational.test.js
    vector.test.js
  fixtures/
    mock-schema.js