agent-sql

v0.4.0

Published

2 months ago

A starter for creating a TypeScript package.

0High
0Medium
0Low

chrisarderne

agent ai postgres sql

Sanitise agent-written SQL for multi-tenant DBs.

You provide a tenant ID, and the agent supplies the query.

Apparently this is how Trigger.dev does it. And Cloudflare.

How it works

agent-sql works by fully parsing the supplied SQL query into an AST and transforming it:

Only SELECT: it's impossible to insert, drop or anything else.
Reduced subset: CTEs, subqueries and other tricky things are rejected.
Limited functions: passed through a (configurable) whitelist.
No DoS: a default LIMIT is applied, but can be adjusted.
WHERE guards: insert multiple tenant/ownership conditions to be inserted.
JOINs added: if needed to reach the guard tenant tables (save on tokens).
No sneaky joins: no join secrets on true. We have your back.

What's next

[ ] Support INSERT, UPDATE, even DROP TABLE as user-configurable options
[ ] Support CTEs, subqueries and more, once they can be fully hardened
[ ] Skills for common harnesses

Quickstart

npm install agent-sql

import { agentSql } from "agent-sql";

const sql = agentSql("SELECT * FROM msg", { "msg.tenant_id": 123 });

console.log(sql);
// SELECT *
// FROM msg
// WHERE msg.tenant_id = 123
// LIMIT 10000

Usage

Define a schema

In the simple example above, all JOINs will be blocked. For agent-sql to know what joins and tables to permit, you need to define a schema. Heads up: if you use Drizzle, you can just use your Drizzle schema.

import { agentSql, defineSchema } from "agent-sql";

// Define your schema.
// Only the tables listed will be permitted
// Joins can only use the FKs defined here
const schema = defineSchema({
  tenant: { id: null },
  msg: { tenant_id: { ft: "tenant", fc: "id" } },
});

// Use your schema from above
// Specify 1+ column->value pairs that will be enforced
const result = agentSql("SELECT * FROM msg", { "tenant.id": 123 }, schema);

Output:

SELECT
  msg.*                        -- qualify the *
FROM msg
INNER JOIN tenant              -- add the needed join for the guard
  ON tenant.id = msg.tenant_id -- use the schema to join correctly
WHERE tenant.id = 123          -- apply the guard
LIMIT 10000                    -- limit the rows

Bad stuff is blocked

The following query will be blocked (many times over).

SELECT
    sneaky_func('./bad_file')      -- won't pass whitelist
FROM secret
JOIN random                        -- not an approved table
  ON random.id = secret.id         -- not an approved FK pair
JOIN danger                        -- disconnected from join graph
  ON true                          -- not allowed
WHERE true                         -- won't trick anyone

Make a closure

You'll probably want to do something like the below in production. Mix and match to your taste.

const sanitise = (sql: string) => agentSql(sql, guards, schema);

//later
sanitise("SELECT * FROM foo");

Integration with AI SDK and Drizzle

If you're using Drizzle, you can skip the schema step and use the one you already have!

Just pass it through, and agentSql will respect your schema.

import { tool } from "ai";
import { sql } from "drizzle-orm";
import { z } from "zod";

import { agentSql } from "agent-sql";
import { defineSchemaFromDrizzle } from "agent-sql/drizzle";

import { db } from "@/db";
import * as drizzleSchema from "@/db/schema";

// No need to re-enter your schema, we'll pull it in from Drizzle
const schema = defineSchemaFromDrizzle(drizzleSchema);

// Create your closure
const sanitise = (sql: string, tenantId: string) =>
  agentSql(sql, { "tenant.id": tenantId }, schema);

function makeSqlTool(tenantId: string) {
  return tool({
    description: "Run raw SQL against the DB",
    inputSchema: z.object({ query: z.string() }),
    execute: async ({ query }) => {
      // The LLM can pass any query it likes, we'll sanitise it if possible
      // and return helpful error messages if not
      const safeQuery = sanitise(query, tenantId);
      // Now we can throw that straight at the db and be confident it'll only
      // return data from the specified tenant
      return db.execute(sql.raw(safeQuery));
    },
  });
}

If you don't want your whole Drizzle schema available

You can also exclude tables if you don't want agents to see them:

import { defineSchemaFromDrizzle } from "agent-sql/drizzle";

const schema = defineSchemaFromDrizzle(drizzleSchema, {
  exclude: ["api_keys"],
});

Development

First install Vite+:

curl -fsSL https://vite.plus | bash

Install dependencies:

vp install

Format, lint, typecheck:

vp check --fix

Run the unit tests:

vp test

Build the library:

vp pack

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme