ai-database

v2.4.0

AI-powered database interface primitives with mdxld conventions

Stability: Stable

AI hallucinates. Your database shouldn't.

When AI generates a "Software Developer" for your customer profile, does it match your existing O*NET occupation data? Does "Enterprise SaaS" connect to your NAICS industry codes? Traditional approaches fragment context—AI juggles content creation and referential integrity simultaneously, producing plausible-sounding but disconnected data.

ai-database grounds AI generation against your domain.

import { DB } from 'ai-database'

const { db } = DB({
  IdealCustomerProfile: {
    as: 'Who are they? <~Occupation',      // Ground against O*NET occupations
    at: 'Where do they work? <~Industry',  // Ground against NAICS industries
    are: 'What are they doing? <~Task',    // Ground against O*NET tasks
  },
  Occupation: { title: 'string', description: 'string' },
  Industry: { name: 'string', naicsCode: 'string' },
  Task: { name: 'string' },
})

// Seed reference data from O*NET, NAICS, etc.
await db.Occupation.create({ title: 'Software Developer', description: 'Develops applications' })
await db.Industry.create({ name: 'Technology', naicsCode: '5112' })

// AI generation is grounded against real reference data
const icp = await db.IdealCustomerProfile.create({
  asHint: 'Engineers who build software',  // Matches "Software Developer"
  atHint: 'Tech companies',                 // Matches "Technology"
})

const occupation = await icp.as
// => { title: 'Software Developer', ... } — matched via semantic search, not hallucinated

The Core Insight

Traditional databases require foreign keys at schema time. When generating with AI, this fragments context: the model must juggle content creation and referential integrity simultaneously.

ai-database inverts this paradigm. Relationship operators become workflow instructions, not schema constraints:

  1. Generate the entity with full semantic context intact
  2. Link as a post-processing step via insertion or vector search

This separation eliminates context fragmentation during generation and produces human-readable relationship labels ("Software Developers") instead of opaque IDs (occ_1547).


The Four Operators

ai-database provides four relationship operators that control how entities connect. They combine two dimensions:

| | Create New | Search Existing |
|---|---|---|
| Link TO target | -> Forward Exact | ~> Forward Fuzzy |
| Link FROM target | <- Backward Exact | <~ Backward Fuzzy |

Quick Reference

| Operator | Direction | Match Mode | When to Use |
|----------|-----------|------------|-------------|
| -> | forward | exact | Creating child entities (Blog → Posts) |
| ~> | forward | fuzzy | Reusing existing entities (Campaign → Audience) |
| <- | backward | exact | Aggregation queries (Blog collects Posts) |
| <~ | backward | fuzzy | Grounding against reference data (ICP → Occupation) |

Understanding the Operators

Direction determines who owns the relationship:

  • Forward (->, ~>): Current entity links TO the target
  • Backward (<-, <~): The edge is inverted, so the target entity links back to the current entity

Match Mode determines how the target is resolved:

  • Exact (->, <-): Create a new entity, then link to it
  • Fuzzy (~>, <~): Search existing entities via semantic similarity
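
To make the two dimensions concrete, here is a minimal sketch of how a field string could be split into prompt, operator, direction, match mode, targets, and threshold. The `parseRelation` function, the `ParsedRelation` shape, and the regular expression are hypothetical names written for this illustration; the library's own parser may work differently.

```typescript
// Hypothetical parser for the field strings shown in this README.
// It illustrates the syntax only; it is not ai-database's own parser.
type Operator = '->' | '~>' | '<-' | '<~'

interface ParsedRelation {
  prompt: string                      // text before the operator
  operator: Operator
  direction: 'forward' | 'backward'   // a leading '<' means backward
  matchMode: 'exact' | 'fuzzy'        // a '~' means fuzzy
  targets: string[]                   // union types split on '|'
  optional: boolean                   // trailing '?'
  threshold?: number                  // '(0.9)' style suffix
}

const RELATION_PATTERN =
  /^(.*?)\s*(->|~>|<-|<~)\s*([A-Za-z_][\w.|]*)(\?)?(?:\((\d*\.?\d+)\))?\s*$/

function parseRelation(definition: string): ParsedRelation | null {
  const match = RELATION_PATTERN.exec(definition)
  if (match === null) return null // plain field such as 'string'
  const operator = match[2] as Operator
  const parsed: ParsedRelation = {
    prompt: (match[1] ?? '').trim(),
    operator,
    direction: operator.startsWith('<') ? 'backward' : 'forward',
    matchMode: operator.includes('~') ? 'fuzzy' : 'exact',
    targets: (match[3] ?? '').split('|'),
    optional: match[4] === '?',
  }
  if (match[5] !== undefined) parsed.threshold = Number(match[5])
  return parsed
}
```

For example, `parseRelation('Where is the event? ~>Venue(0.9)')` yields a forward, fuzzy relation to `Venue` with a threshold of 0.9, while a plain type such as `'string'` yields `null`.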

Example 1: Grounding Against Reference Data (<~)

The backward fuzzy operator grounds AI-generated content against authoritative reference data. This is the semantic grounding pattern.

const { db } = DB({
  // Generative entity that grounds against reference data
  IdealCustomerProfile: {
    as: 'Who are they? (e.g. "Developers") <~Occupation',
    at: 'Where do they work? (e.g. "FinTech startups") <~Industry',
    are: 'What are they doing? (e.g. "building APIs") <~Task',
    using: 'What are they using? (e.g. "Node.js") <~Tool',
    to: 'What is their goal? (e.g. "ship faster") <~Outcome',
  },

  // Reference data seeded from O*NET, NAICS, etc.
  Occupation: {
    $seed: 'https://onet.data/occupations.tsv',
    $id: '$.oNETSOCCode',
    title: '$.title',
    description: '$.description',
  },
  Industry: {
    $seed: 'https://naics.data/industries.tsv',
    $id: '$.naicsCode',
    name: '$.title',
  },
  Task: { name: 'string' },
  Tool: { name: 'string' },
  Outcome: { description: 'string' },
})

How it works:

  1. AI generates ICP with as: "Engineers who build software"
  2. Runtime embeds the text and searches the Occupation collection
  3. Best match found: "Software Developer" (via vector similarity)
  4. Link created with human-readable label: "Software Developer"

Key behaviors:

  • Uses embedding similarity to find the best match
  • Returns null if no semantic match found (doesn't hallucinate)
  • Grounds generated content against curated reference data
  • Perfect for taxonomies, categories, and standardized values
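
The matching step above can be sketched as a nearest-neighbour lookup with a cut-off. This is only an illustration: `Candidate`, `cosineSimilarity`, and `bestMatch` are hypothetical names, the three-dimensional vectors are toy data standing in for real embeddings, and the 0.7 default is taken from the threshold table in this README.

```typescript
// Toy illustration of "embed, search, return null when nothing is close enough".
// Real embeddings have hundreds of dimensions; these vectors are made up.
interface Candidate {
  label: string
  embedding: number[]
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0
    const y = b[i] ?? 0
    dot += x * y
    normA += x * x
    normB += y * y
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB)
}

// Returns the closest candidate, or null when even the best score is below the threshold.
function bestMatch(query: number[], candidates: Candidate[], threshold = 0.7): Candidate | null {
  let best: Candidate | null = null
  let bestScore = -Infinity
  for (const candidate of candidates) {
    const score = cosineSimilarity(query, candidate.embedding)
    if (score > bestScore) {
      best = candidate
      bestScore = score
    }
  }
  return best !== null && bestScore >= threshold ? best : null
}

const occupations: Candidate[] = [
  { label: 'Software Developer', embedding: [0.9, 0.1, 0.0] },
  { label: 'Registered Nurse', embedding: [0.0, 0.2, 0.9] },
]

const grounded = bestMatch([0.8, 0.2, 0.1], occupations)   // close to the first vector
const ungrounded = bestMatch([0.0, 1.0, 0.0], occupations) // close to neither
```

Returning `null` instead of the nearest weak candidate is what keeps a grounded field from linking to unrelated reference data.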

Union Types for Fallback Search

When multiple collections could contain the best match:

IdealCustomerProfile: {
  as: '<~Occupation|Role|JobType',      // Search Occupation first, then Role, then JobType
  using: '<~Tool|Technology|Product',   // Search multiple collections in priority order
}
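
One way to read "priority order" is that collections are tried left to right and the first one that produces a match wins. The sketch below assumes exactly that; `searchInPriorityOrder` and the `Searcher` callback are hypothetical, and the library may instead compare scores across collections.

```typescript
// Hypothetical helper: try each collection in the order written in the union type.
// `search` stands in for whatever per-collection lookup the runtime performs.
type Searcher<T> = (collection: string, hint: string) => T | null

function searchInPriorityOrder<T>(
  targets: string,
  hint: string,
  search: Searcher<T>,
): { matchedType: string; entity: T } | null {
  for (const collection of targets.split('|')) {
    const entity = search(collection, hint)
    if (entity !== null) return { matchedType: collection, entity }
  }
  return null // no collection produced a match
}
```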

Example 2: Content Generation with Cascade (->, <-)

The forward and backward exact operators create hierarchical content. This is the cascading generation pattern.

const { db } = DB({
  Blog: {
    title: 'string',
    description: 'string',
    topics: ['List 5 topics covered ->Topic'],  // Creates Topic children
    posts: ['<-Post'],                           // Aggregates Post children
  },
  Topic: {
    name: 'string',
    titles: ['List 3 blog post titles ->Post'], // Creates Post children
  },
  Post: {
    title: 'string',
    synopsis: 'string',
    content: 'markdown',
    blog: '->Blog',     // Links back to parent Blog
    topic: '->Topic',   // Links to Topic
  },
})

// One call generates the entire blog structure
const blog = await db.Blog.create(
  { title: 'AI Engineering', description: 'Building with LLMs' },
  { cascade: true, maxDepth: 3 }
)

// Topics were auto-generated
const topics = await blog.topics
// => [{ name: 'Prompt Engineering' }, { name: 'RAG Systems' }, ...]

// Posts were auto-generated under each topic
const posts = await topics[0].titles
// => [{ title: 'Getting Started with Prompts' }, ...]

// Backward refs enable aggregation queries
const allPosts = await blog.posts
// => All posts that reference this blog

Forward Exact (->)

Creates child entities that belong to the parent:

Startup: {
  founders: ['Who are the founders? ->Founder'],   // Creates Founder entities
  businessModel: 'What is the business model? ->LeanCanvas',
}

Key behaviors:

  • Text before -> is the AI generation prompt
  • If a value is provided, uses it instead of generating
  • Optional fields (->Type?) skip generation when not provided
  • Nested forward fields cascade automatically

Backward Exact (<-)

Creates inverse relationships for aggregation:

Blog: {
  posts: ['<-Post'],        // All posts that reference this blog
},
Post: {
  blog: '->Blog',           // Forward reference to parent
}

Key behaviors:

  • Creates inverted edge direction (Post → Blog)
  • Enables reverse lookups and aggregation queries
  • Works with explicit backrefs: ['<-Post.blog']
  • Handles self-referential trees: children: ['<-Node.parent']

Forward Fuzzy (~>)

Searches existing entities first, creates if not found:

Campaign: {
  audience: 'Target audience ~>Audience',  // Find existing or create new
}

// If "Enterprise" audience exists, reuses it
const campaign = await db.Campaign.create({
  audienceHint: 'Big companies with 1000+ employees'
})
const audience = await campaign.audience
// => { name: 'Enterprise', ... } — reused existing!

Key behaviors:

  • Searches via semantic similarity using ${fieldName}Hint
  • Reuses existing entity if match exceeds threshold
  • Generates new entity if no match found
  • Generated entities marked with $generated: true
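
The behaviors listed above amount to a find-or-create routine. Here is a minimal sketch under stated assumptions: `findOrCreate`, `NamedEntity`, and the `similarity` and `generate` callbacks are hypothetical stand-ins for the runtime's semantic search and AI generation, and 0.7 is the default threshold quoted in this README.

```typescript
// Hypothetical find-or-create flow for a forward fuzzy (~>) field.
interface NamedEntity {
  name: string
  $generated?: boolean
}

function findOrCreate(
  hint: string,
  existing: NamedEntity[],
  similarity: (hint: string, entity: NamedEntity) => number, // stand-in for embedding search
  generate: (hint: string) => NamedEntity,                    // stand-in for AI generation
  threshold = 0.7,
): NamedEntity {
  let best: NamedEntity | undefined
  let bestScore = -Infinity
  for (const entity of existing) {
    const score = similarity(hint, entity)
    if (score > bestScore) {
      best = entity
      bestScore = score
    }
  }
  // Reuse the existing entity when the match is strong enough
  if (best !== undefined && bestScore >= threshold) return best
  // Otherwise generate a new one and flag it
  return { ...generate(hint), $generated: true }
}
```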

Example 3: Startup Generator (Mixed Operators)

A complete example showing all four operators working together:

const { db } = DB({
  Startup: {
    $instructions: 'Generate a B2B SaaS startup',
    name: 'string',
    idea: 'What problem does this solve? <-Idea',           // Idea spawns Startup
    founders: ['Who are the founding team? ->Founder'],      // Create founders
    customer: 'Who is the target customer? ~>CustomerPersona', // Find existing
    industry: 'What industry? <~Industry',                   // Ground to NAICS
  },
  Idea: { problem: 'string', solution: 'string' },
  Founder: { name: 'string', role: 'string' },
  CustomerPersona: { title: 'string', painPoints: 'string' },
  Industry: { name: 'string', naicsCode: 'string' },
})

// Pre-populate reference data
await db.Industry.create({ name: 'Technology', naicsCode: '5112' })
await db.CustomerPersona.create({
  title: 'VP of Engineering',
  painPoints: 'Managing distributed teams',
})

// Generate complete startup with grounded relationships
const startup = await db.Startup.create(
  { name: 'DevFlow' },
  { cascade: true, maxDepth: 2 }
)

// Relationships resolved appropriately:
const idea = await startup.idea        // Linked via backward exact (<-)
const founders = await startup.founders // Created new ([->])
const customer = await startup.customer // Matched existing (~>)
const industry = await startup.industry // Grounded to reference (<~)

Threshold Syntax

For fuzzy operators (~> and <~), configure the similarity threshold:

Field-Level Thresholds

Event: {
  venue: 'Where is the event? ~>Venue(0.9)',     // High threshold - strict match
  sponsor: 'Event sponsor ~>Company(0.5)',       // Low threshold - lenient match
}

Entity-Level Thresholds

Startup: {
  $fuzzyThreshold: 0.85,  // Apply to all ~> and <~ fields
  customer: '~>Customer',
  competitor: '~>Company',
}

Threshold values:

  • 0.9 - Very strict: Only near-exact semantic matches
  • 0.7 - Default: Balanced matching
  • 0.5 - Lenient: Accept loosely related matches

Cascade Generation

Build complex entity graphs from a single create() call:

const company = await db.Company.create(
  { name: 'TechCorp' },
  {
    cascade: true,
    maxDepth: 4,
    onProgress: (p) => console.log(`${p.totalEntitiesCreated} created`),
  }
)

// Entire org chart generated: Company → Departments → Teams → Employees

Cascade Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| cascade | boolean | false | Enable cascade generation |
| maxDepth | number | 0 | Maximum recursion depth |
| cascadeTypes | string[] | - | Only cascade to these types |
| onProgress | function | - | Progress callback |
| onError | function | - | Error handler |
| stopOnError | boolean | false | Stop on first error |


Special Variables

$instructions

Entity-level prompting that guides AI generation:

Character: {
  $instructions: 'This character is from a medieval fantasy setting',
  name: 'string',
  backstory: 'What is their history?',  // Influenced by $instructions
}

Template variables resolve against entity data:

Problem: {
  $instructions: `
    Identify problems for occupation: {task.occupation.title}
    in industry: {task.occupation.industry.name}
  `,
  task: '<-Task',
  description: 'string',
}
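
Template resolution of this kind can be pictured as a dotted-path lookup. The sketch below is an assumption about the mechanics, not the library's code: `resolveTemplate` is a hypothetical function, and leaving unresolved placeholders untouched is a choice made for the example.

```typescript
// Hypothetical resolver for {dotted.path} placeholders against already-loaded entity data.
function resolveTemplate(template: string, context: Record<string, unknown>): string {
  return template.replace(/\{([\w.]+)\}/g, (placeholder: string, path: string) => {
    let value: unknown = context
    for (const key of path.split('.')) {
      if (typeof value !== 'object' || value === null) return placeholder
      value = (value as Record<string, unknown>)[key]
    }
    // Leave the placeholder as written when the path does not resolve
    return value === undefined || value === null ? placeholder : String(value)
  })
}

const prompt = resolveTemplate(
  'Identify problems for occupation: {task.occupation.title}',
  { task: { occupation: { title: 'Software Developer' } } },
)
```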

$context

Explicit context dependencies pre-fetched before generation:

Ad: {
  $context: ['Startup', 'ICP'],
  $instructions: 'Generate ad for {startup.name} targeting {icp.as}',
  headline: 'string (30 chars)',
}

Schema Definition

Define once, get typed operations everywhere:

const { db, events, actions } = DB({
  Post: {
    title: 'string',
    content: 'markdown',
    author: 'Author.posts',  // Creates bidirectional relationship
  },
  Author: {
    name: 'string',
    // posts: Post[] auto-created from backref
  }
})

Field Types

| Type | Description |
|------|-------------|
| string | Text |
| number | Numeric |
| boolean | True/false |
| date | Date only |
| datetime | Date and time |
| markdown | Rich text |
| json | Structured data |
| url | URL string |

Relationships

// One-to-many: Post has one Author, Author has many Posts
Post: { author: 'Author.posts' }

// Many-to-many: Post has many Tags, Tag has many Posts
Post: { tags: ['Tag.posts'] }

Promise Pipelining

Chain database operations without await:

const leads = db.Lead.list()
const topLeads = leads.filter(l => l.score > 80)
const names = topLeads.map(l => l.name)

// Only await when you need the result
const result = await names
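
The idea behind pipelining can be sketched with a lazy, thenable wrapper: each chained call records a step, and nothing runs until the value is awaited. `LazyPipeline` is a hypothetical class written for this illustration and makes no claim about how ai-database implements its chainable results.

```typescript
// Hypothetical lazy pipeline: chaining builds a plan, awaiting executes it.
class LazyPipeline<T> implements PromiseLike<T[]> {
  private readonly run: () => Promise<T[]>

  constructor(run: () => Promise<T[]>) {
    this.run = run
  }

  filter(predicate: (item: T) => boolean): LazyPipeline<T> {
    return new LazyPipeline(async () => (await this.run()).filter(predicate))
  }

  map<U>(transform: (item: T) => U): LazyPipeline<U> {
    return new LazyPipeline(async () => (await this.run()).map(transform))
  }

  // Makes the pipeline awaitable; the source only runs when this is called
  then<R1 = T[], R2 = never>(
    onFulfilled?: ((value: T[]) => R1 | PromiseLike<R1>) | null,
    onRejected?: ((reason: unknown) => R2 | PromiseLike<R2>) | null,
  ): PromiseLike<R1 | R2> {
    return this.run().then(onFulfilled, onRejected)
  }
}

// Usage with an in-memory source standing in for a database query
const sampleLeads = new LazyPipeline(async () => [
  { name: 'Acme', score: 91 },
  { name: 'Globex', score: 42 },
])
const sampleNames = sampleLeads.filter(l => l.score > 80).map(l => l.name)
// Nothing has executed yet; `await sampleNames` would run the whole chain.
```

In this sketch every `await` re-runs the chain from the source; a production design would likely cache the result or push the filter down into the query.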

Batch Relationship Loading

Eliminate N+1 queries automatically:

// All companies loaded in ONE query
const enriched = await db.Lead.list().map(lead => ({
  lead,
  company: lead.company,
}))

CRUD Operations

// Read
const lead = await db.Lead.get('lead-123')
const leads = await db.Lead.list()
const found = await db.Lead.find({ status: 'active' })

// Search
const results = await db.Lead.search('enterprise SaaS')

// Natural language queries
const pending = await db.Order`what's stuck in processing?`

// Write
const created = await db.Lead.create({ name: 'Acme Corp' })
await db.Lead.update(created.$id, { score: 90 })
await db.Lead.delete(created.$id)

Chainable Methods

db.Lead.list()
  .filter(l => l.score > 50)
  .sort((a, b) => b.score - a.score)
  .limit(10)
  .map(l => ({ name: l.name, score: l.score }))

Events

React to changes in real-time:

events.on('Lead.created', event => {
  notifySlack(`New lead: ${event.data.name}`)
})

events.on('*.updated', event => {
  logChange(event)
})
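
Wildcard subscriptions such as `'*.updated'` can be understood as segment-wise pattern matching. The following is a minimal sketch, assuming `*` matches exactly one dot-separated segment; `TinyEvents` and `matchesPattern` are hypothetical and not part of the package.

```typescript
// Hypothetical emitter showing how 'Lead.created' and '*.updated' patterns could match.
type EventHandler = (event: { type: string; data: unknown }) => void

function matchesPattern(pattern: string, type: string): boolean {
  const patternParts = pattern.split('.')
  const typeParts = type.split('.')
  return (
    patternParts.length === typeParts.length &&
    patternParts.every((part, i) => part === '*' || part === typeParts[i])
  )
}

class TinyEvents {
  private handlers: Array<{ pattern: string; handler: EventHandler }> = []

  on(pattern: string, handler: EventHandler): void {
    this.handlers.push({ pattern, handler })
  }

  emit(type: string, data: unknown): void {
    for (const { pattern, handler } of this.handlers) {
      if (matchesPattern(pattern, type)) handler({ type, data })
    }
  }
}
```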

forEach - Large-Scale Processing

Process thousands of items with concurrency and error handling:

const result = await db.Lead.forEach(async lead => {
  const analysis = await ai`analyze ${lead}`
  await db.Lead.update(lead.$id, { analysis })
}, {
  concurrency: 10,
  maxRetries: 3,
  retryDelay: attempt => 1000 * Math.pow(2, attempt),
  onProgress: p => console.log(`${p.completed}/${p.total}`),
  onError: (err, lead) => err.code === 'RATE_LIMIT' ? 'retry' : 'continue',
})

forEach Options

| Option | Type | Description |
|--------|------|-------------|
| concurrency | number | Max parallel operations (default: 1) |
| maxRetries | number | Retries per item (default: 0) |
| retryDelay | number \| fn | Delay between retries |
| onProgress | fn | Progress callback |
| onError | fn | Error handling |
| timeout | number | Timeout per item in ms |
| persist | boolean \| string | Enable durability |
| resume | string | Resume from action ID |
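
To show how concurrency, maxRetries, retryDelay, and onError could interact, here is a simplified worker-pool sketch. `forEachWithRetry`, its result shape, and the rule that an absent `onError` means "retry" are assumptions for illustration; the real `forEach` also supports timeouts, progress callbacks, and persistence, which are left out.

```typescript
// Hypothetical, simplified version of a concurrency-limited forEach with retries.
interface RetryOptions<T> {
  concurrency?: number
  maxRetries?: number
  retryDelay?: number | ((attempt: number) => number)
  onError?: (err: unknown, item: T) => 'retry' | 'continue'
}

interface RetrySummary {
  completed: number
  failed: number
}

const sleep = (ms: number): Promise<void> => new Promise<void>(resolve => setTimeout(resolve, ms))

async function forEachWithRetry<T>(
  items: T[],
  fn: (item: T) => Promise<void>,
  options: RetryOptions<T> = {},
): Promise<RetrySummary> {
  const { concurrency = 1, maxRetries = 0, retryDelay = 0, onError } = options
  const summary: RetrySummary = { completed: 0, failed: 0 }
  let next = 0 // index of the next unclaimed item, shared by all workers

  async function worker(): Promise<void> {
    while (next < items.length) {
      const item = items[next++] as T
      for (let attempt = 0; ; attempt++) {
        try {
          await fn(item)
          summary.completed++
          break
        } catch (err) {
          const decision = onError ? onError(err, item) : 'retry'
          if (decision === 'continue' || attempt >= maxRetries) {
            summary.failed++
            break
          }
          await sleep(typeof retryDelay === 'function' ? retryDelay(attempt) : retryDelay)
        }
      }
    }
  }

  // Start `concurrency` workers that pull from the shared index
  await Promise.all(Array.from({ length: Math.max(1, concurrency) }, () => worker()))
  return summary
}
```

Passing `retryDelay: attempt => 1000 * Math.pow(2, attempt)` gives waits of 1 s, 2 s, 4 s, and so on, which is the exponential backoff used in the README's example.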

Durable forEach

Persist progress to survive crashes:

const result = await db.Lead.forEach(processLead, {
  concurrency: 10,
  persist: 'analyze-leads',
})

// Resume after crash
await db.Lead.forEach(processLead, {
  resume: result.actionId,
})

Actions

Track long-running operations:

const action = await actions.create({
  type: 'import-leads',
  data: { file: 'leads.csv' },
  total: 1000,
})

await actions.update(action.id, { progress: 500 })
await actions.update(action.id, { status: 'completed' })

Installation

pnpm add ai-database

Configuration

DATABASE_URL=./content         # filesystem (default)
DATABASE_URL=sqlite://./data   # SQLite
DATABASE_URL=:memory:          # in-memory

Cloudflare Workers Deployment

ai-database provides dedicated exports for Cloudflare Workers deployment and RPC client consumption.

/worker Export

Use the /worker export when deploying ai-database as a Cloudflare Worker service:

// worker.ts - the ai-database service
import { DatabaseWorker, DatabaseDO } from 'ai-database/worker'

export { DatabaseDO }
export default DatabaseWorker

// wrangler.jsonc
{
  "name": "ai-database",
  "main": "src/worker.ts",
  "compatibility_date": "2024-01-01",
  "durable_objects": {
    "bindings": [
      { "name": "DATABASE_DO", "class_name": "DatabaseDO" }
    ]
  }
}

/client Export

Use the /client export when consuming ai-database from another worker or HTTP client:

With Cloudflare Service Bindings (RPC):

// consumer-worker.ts
import type { DatabaseService } from 'ai-database/worker'

interface Env {
  AI_DATABASE: Service<DatabaseService>
}

export default {
  async fetch(request: Request, env: Env) {
    // Direct RPC via service binding - no HTTP overhead
    const service = env.AI_DATABASE.connect('my-namespace')
    const post = await service.create('Post', { title: 'Hello' })
    return Response.json(post)
  }
}

// consumer wrangler.jsonc
{
  "services": [
    { "binding": "AI_DATABASE", "service": "ai-database" }
  ]
}

With HTTP Client (rpc.do):

import { createDatabaseClient, DB } from 'ai-database/client'

// Connect to production
const client = createDatabaseClient('https://ai-database.workers.dev')
const service = client.connect('my-namespace')

// CRUD operations
const post = await service.create('Post', { title: 'Hello', content: 'World' })
const posts = await service.list('Post', { limit: 10 })
const found = await service.get('Post', post.$id)

// Search
const results = await service.search('Post', 'hello')
const semantic = await service.semanticSearch('Post', 'greeting posts')

// Relationships
await service.relate('Post', post.$id, 'author', 'User', userId)
const authors = await service.related('Post', post.$id, 'author')

// Events
await service.emit({ event: 'Post.published', actor: userId, object: post.$id })
const events = await service.listEvents({ event: 'Post.published' })

TypeScript Setup for Service Bindings

For proper type inference with service bindings, import the worker types:

// types.ts
import type { DatabaseService } from 'ai-database/worker'

export interface Env {
  AI_DATABASE: Service<DatabaseService>
  // ... other bindings
}

Common Patterns

Self-Referential Trees

Node: {
  value: 'string',
  parent: '->Node?',
  children: ['<-Node.parent'],
}

Union Types for Polymorphic References

Comment: {
  content: 'string',
  target: '->Post|Article|Video',
}

const target = await comment.target
console.log(target.$matchedType)  // 'Post', 'Article', or 'Video'

Symmetric Relationships

Team: {
  name: 'string',
  members: ['->Member'],
},
Member: {
  name: 'string',
  team: '<-Team',
}

Document Database Interface

In addition to the schema-first graph model, ai-database exports environment-agnostic types for document-based storage (MDX files with frontmatter):

import type {
  DocumentDatabase,
  Document,
  DocListOptions,
  DocSearchOptions,
} from 'ai-database'

// Same interface regardless of backend
const doc = await db.get('posts/hello-world')
await db.set('posts/new', { data: { title: 'New Post' }, content: '# Hello' })

Usage with @mdxdb adapters

import { createFsDatabase } from '@mdxdb/fs'
import { createSqliteDatabase } from '@mdxdb/sqlite'
import { createApiDatabase } from '@mdxdb/api'

// Pick the adapter that fits your backend
const fsDb = createFsDatabase({ root: './content' })
const sqliteDb = createSqliteDatabase({ path: './data.db' })
const apiDb = createApiDatabase({ baseUrl: 'https://api.example.com' })

Provider Capabilities

Different database providers support different features. Use detectCapabilities() to check what's available at runtime:

import { detectCapabilities, requireCapability, CapabilityNotSupportedError } from 'ai-database'

const capabilities = await detectCapabilities(provider)

// Check capabilities
if (capabilities.hasSemanticSearch) {
  const results = await provider.semanticSearch('Post', 'machine learning')
} else {
  // Fallback to regular search
  const results = await provider.search('Post', 'machine learning')
}

// Require capabilities (throws if unavailable)
requireCapability(capabilities, 'hasEvents')
provider.on('Post.created', handleCreate)

Capability Matrix

| Capability | MemoryProvider | RDB | DigitalObjects |
|------------|----------------|-----|----------------|
| Semantic Search | Yes | No | No |
| Events API | Yes | No | No |
| Actions API | Yes | No | No |
| Artifacts | Yes | No | No |
| Batch Operations | Yes | No | No |

Capabilities

| Capability | Description | Methods Required |
|------------|-------------|------------------|
| hasSemanticSearch | Vector similarity search | semanticSearch(), setEmbeddingsConfig() |
| hasEvents | Event emission and subscription | on(), emit(), listEvents() |
| hasActions | Durable action tracking | createAction(), getAction(), updateAction() |
| hasArtifacts | Artifact/cache storage | getArtifact(), setArtifact() |
| hasBatchOperations | Concurrency-controlled batching | withConcurrency() or mapWithConcurrency() |
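
The "Methods Required" column suggests that a capability can be inferred from which methods a provider exposes. The sketch below assumes that duck-typed reading; `sketchDetectCapabilities` and `REQUIRED_METHODS` are hypothetical and are not the package's `detectCapabilities()`, which may check more than method presence.

```typescript
// Hypothetical duck-typed capability check based on the table above.
type CapabilityName =
  | 'hasSemanticSearch'
  | 'hasEvents'
  | 'hasActions'
  | 'hasArtifacts'
  | 'hasBatchOperations'

// Every group must be satisfied; within a group, any one method is enough.
const REQUIRED_METHODS: Record<CapabilityName, string[][]> = {
  hasSemanticSearch: [['semanticSearch'], ['setEmbeddingsConfig']],
  hasEvents: [['on'], ['emit'], ['listEvents']],
  hasActions: [['createAction'], ['getAction'], ['updateAction']],
  hasArtifacts: [['getArtifact'], ['setArtifact']],
  hasBatchOperations: [['withConcurrency', 'mapWithConcurrency']],
}

function sketchDetectCapabilities(provider: object): Record<CapabilityName, boolean> {
  const hasMethod = (name: string): boolean =>
    typeof (provider as Record<string, unknown>)[name] === 'function'
  const result: Record<CapabilityName, boolean> = {
    hasSemanticSearch: false,
    hasEvents: false,
    hasActions: false,
    hasArtifacts: false,
    hasBatchOperations: false,
  }
  for (const capability of Object.keys(REQUIRED_METHODS) as CapabilityName[]) {
    result[capability] = REQUIRED_METHODS[capability].every(group => group.some(hasMethod))
  }
  return result
}
```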

Graceful Degradation

When a capability isn't available, use fallbacks:

import { detectCapabilities, warnIfUnavailable } from 'ai-database'

const capabilities = await detectCapabilities(provider)

// Log a warning (once) if semantic search unavailable
warnIfUnavailable(capabilities, 'hasSemanticSearch', 'semanticSearch')

// Use capability with fallback
async function searchPosts(query: string) {
  if (capabilities.hasSemanticSearch) {
    return provider.semanticSearch('Post', query)
  }
  return provider.search('Post', query)
}

Features Requiring Semantic Search

When using a provider without semantic search support (e.g., RDB), some features behave differently:

| Feature | With Semantic Search | Without Semantic Search |
|---------|---------------------|------------------------|
| ~> Forward Fuzzy | Matches via vector similarity, falls back to generation | Uses text search fallback, then generates if no match |
| <~ Backward Fuzzy | Matches via vector similarity | Uses text search fallback |
| db.Entity.semanticSearch() | Vector similarity search | Throws CapabilityNotSupportedError |
| db.Entity.hybridSearch() | Combined FTS + vector search | Throws CapabilityNotSupportedError |
| db.semanticSearch() | Global vector search | Throws CapabilityNotSupportedError |

Fuzzy Operator Fallback: When semantic search is unavailable, fuzzy operators (~> and <~) gracefully degrade to basic text search:

// Without semantic search, these operators use text matching instead of embeddings
const { db } = DB({
  Article: {
    category: '~>Category',  // Will use text search fallback
  },
  Category: { name: 'string' }
})

// Forward fuzzy (~>) tries text search first, generates if no match found
await db.Article.create({ categoryHint: 'Tech' })  // Searches for 'Tech' in categories

// Backward fuzzy (<~), e.g. category: '<~Category', uses text search only and never generates
await db.Article.create({ categoryHint: 'Tech' })  // category resolves to null if no text match

Explicit Search Methods: When you need semantic search but it's unavailable, the methods throw with helpful alternatives:

import { CapabilityNotSupportedError, isCapabilityNotSupportedError } from 'ai-database'

try {
  await db.Post.semanticSearch('machine learning')
} catch (error) {
  if (isCapabilityNotSupportedError(error)) {
    console.log(error.capability)   // 'hasSemanticSearch'
    console.log(error.alternative)  // 'Use the regular search() method instead...'
    // Fall back to text search
    const results = await db.Post.search('machine learning')
  }
}

Integration with RDB

RDB provides a simple relational database backend for ai-database. Use it when you want:

  • Edge-native storage via Cloudflare Durable Objects or D1
  • Simple two-table schema (_data and _rels)
  • Graph traversal and relationship queries

Creating an RDB Provider Adapter

import { setProvider, DB } from 'ai-database'
import type { DBProvider, ListOptions, SearchOptions } from 'ai-database'
import { RDB } from '@dotdo/rdb'

// Adapter to bridge RDB and ai-database interfaces
class RDBProviderAdapter implements DBProvider {
  private rdb: RDB

  constructor(sqlStorage: SqlStorage) {
    this.rdb = new RDB(sqlStorage)
  }

  async get(type: string, id: string) {
    const entity = await this.rdb.get(type, id)
    if (!entity) return null
    return { $id: entity.id, $type: entity.type, ...entity }
  }

  async list(type: string, options?: ListOptions) {
    const entities = await this.rdb.list(type, options)
    return entities.map(e => ({ $id: e.id, $type: e.type, ...e }))
  }

  async search(type: string, query: string, options?: SearchOptions) {
    // RDB uses filter-based search; perform text matching
    const all = await this.rdb.list(type, options)
    return all
      .filter(e => JSON.stringify(e).toLowerCase().includes(query.toLowerCase()))
      .map(e => ({ $id: e.id, $type: e.type, ...e }))
  }

  async create(type: string, id: string | undefined, data: Record<string, unknown>) {
    const entity = await this.rdb.create(type, data, id)
    return { $id: entity.id, $type: entity.type, ...entity }
  }

  async update(type: string, id: string, data: Record<string, unknown>) {
    const entity = await this.rdb.update(type, id, data)
    return { $id: entity.id, $type: entity.type, ...entity }
  }

  async delete(type: string, id: string): Promise<boolean> {
    const exists = await this.rdb.get(type, id)
    if (!exists) return false
    await this.rdb.delete(type, id)
    return true
  }

  async related(type: string, id: string, relation: string) {
    const entities = await this.rdb.related(type, id, relation)
    return entities.map(e => ({ $id: e.id, $type: e.type, ...e }))
  }

  async relate(fromType: string, fromId: string, relation: string, toType: string, toId: string, metadata?: object) {
    await this.rdb.relate(fromType, fromId, relation, toType, toId, metadata)
  }

  async unrelate(fromType: string, fromId: string, relation: string, toType: string, toId: string) {
    await this.rdb.unrelate(fromType, fromId, relation, toType, toId)
  }
}

// Usage in a Durable Object
export class MyDO extends DurableObject {
  constructor(ctx: DurableObjectState, env: Env) {
    super(ctx, env)
    setProvider(new RDBProviderAdapter(ctx.storage.sql))
  }
}

// Now use ai-database schema with RDB backend
const { db } = DB({
  Post: { title: 'string', author: '->Author.posts' },
  Author: { name: 'string' },
})

const author = await db.Author.create({ name: 'Alice' })
const post = await db.Post.create({ title: 'Hello', author: author.$id })

Limitations with RDB

When using RDB as a provider:

  • No semantic search: Fuzzy operators (~>, <~) cannot use vector embeddings and fall back to text matching, as described under Features Requiring Semantic Search. Prefer exact operators (->, <-), or use MemoryProvider when you need semantic matching.
  • No events/actions API: RDB focuses on core CRUD and relationships.
  • Text search only: The search() method performs text matching, not semantic similarity.

AI Integration

ai-database integrates with AI providers for two core capabilities:

  1. Entity Generation - AI-powered content generation for schema fields using ai-functions
  2. Semantic Search - Vector embeddings for fuzzy matching (~>, <~ operators) and similarity search

Supported AI Providers

For Entity Generation

Entity generation uses ai-functions which supports:

| Provider | Models | Configuration |
|----------|--------|---------------|
| Anthropic | claude-3-5-sonnet, claude-3-opus, claude-3-haiku | ANTHROPIC_API_KEY |
| OpenAI | gpt-4o, gpt-4-turbo, gpt-3.5-turbo | OPENAI_API_KEY |
| Google | gemini-1.5-pro, gemini-1.5-flash | GOOGLE_API_KEY |
| Local Models | Ollama, LM Studio, llama.cpp | AI_BASE_URL |

import { DB, configureAIGeneration } from 'ai-database'

// Configure the AI model for entity generation
configureAIGeneration({
  model: 'sonnet',        // Model alias (see ai-functions for full list)
  enabled: true,          // Enable AI generation (default: true)
  onGenerate: (details) => {
    // Track generation calls for monitoring
    console.log(`Generated ${details.entityType} in ${details.latencyMs}ms`)
    if (details.error) console.error('Generation failed:', details.error)
  }
})

const { db } = DB({
  BlogPost: {
    title: 'string',
    content: 'Write a detailed blog post about this topic',  // AI generates this
    summary: 'Summarize the content in 2 sentences',
  }
})

For Embeddings/Semantic Search

Embedding generation for semantic search can use any provider that produces vector embeddings:

| Provider | Models | Dimensions | Configuration |
|----------|--------|------------|---------------|
| OpenAI | text-embedding-3-small | 1536 | OPENAI_API_KEY |
| OpenAI | text-embedding-3-large | 3072 | OPENAI_API_KEY |
| Cohere | embed-english-v3.0 | 1024 | COHERE_API_KEY |
| Voyage AI | voyage-large-2 | 1024-4096 | VOYAGE_API_KEY |
| Local | sentence-transformers | 384 | Self-hosted |

import { createMemoryProvider, setProvider } from 'ai-database'

// Configure embedding dimensions to match your provider
const provider = createMemoryProvider({
  embeddingDimensions: 1536,  // Match OpenAI text-embedding-3-small
})
setProvider(provider)

Configuring Embedding Generation

Control which fields are embedded for semantic search:

import { DB } from 'ai-database'

const { db } = DB({
  Article: {
    title: 'string',
    content: 'markdown',
    authorId: 'string',    // Won't be embedded (not text content)
  },
  InternalNote: {
    text: 'string',
  }
}, {
  embeddings: {
    // Specify which fields to embed for Article
    Article: { fields: ['title', 'content'] },

    // Disable embeddings for InternalNote (won't appear in semantic search)
    InternalNote: false,
  }
})

Embedding Configuration Options

| Option | Type | Description |
|--------|------|-------------|
| fields | string[] | Fields to include in embedding (default: auto-detect text fields) |
| false | boolean | Disable embeddings for this entity type |

Auto-Detection

If no embeddings config is provided, ai-database automatically embeds:

  • All string fields (except those ending in Id, At, or starting with $/_)
  • All markdown fields
  • String arrays (concatenated)

Cost and Token Implications

Understanding token usage is critical for production deployments. Here's what triggers AI API calls:

Entity Generation Costs

| Operation | AI Calls | When |
|-----------|----------|------|
| create() with prompt fields | 1 per entity | Fields like 'Write a description' |
| create({ cascade: true }) | 1 per cascaded entity | Each -> forward relation |
| create() with ~> fuzzy | 1 embedding + search | If no semantic match found, may generate |

Example: Cascade Cost Estimation

const { db } = DB({
  Blog: {
    title: 'string',
    topics: ['Generate 5 topics ->Topic'],     // Creates 5 Topic entities
  },
  Topic: {
    name: 'string',
    posts: ['Generate 3 posts ->Post'],        // Creates 3 Post entities per Topic
  },
  Post: {
    title: 'string',
    content: 'Write a 500-word blog post',     // AI generates ~500 words
  }
})

// This single call generates:
// - 1 Blog (1 generation call)
// - 5 Topics (5 generation calls)
// - 15 Posts (15 generation calls, each ~500 words)
// Total: 21 AI generation calls
const blog = await db.Blog.create(
  { title: 'My Tech Blog' },
  { cascade: true, maxDepth: 3 }
)
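The arithmetic behind that total can be checked before running a cascade. The sketch below assumes one generation call per created entity, as in the table above. `estimateCascadeCalls` is a hypothetical helper, not something ai-database exports.

```typescript
// Hypothetical helper: counts entities created by a cascade, given how many
// children each level generates per parent. Assumes 1 generation call per entity.
function estimateCascadeCalls(fanOut: number[], maxDepth: number = fanOut.length): number {
  let total = 1    // the root entity
  let parents = 1  // number of entities at the current level
  for (const perParent of fanOut.slice(0, maxDepth)) {
    parents *= perParent
    total += parents
  }
  return total
}

// Blog -> 5 Topics -> 3 Posts each
console.log(estimateCascadeCalls([5, 3]))     // 21
// Same schema, stopping after the first level of children
console.log(estimateCascadeCalls([5, 3], 1))  // 6
```

Because each level multiplies the one before it, adding a level or raising a fan-out number grows the call count quickly.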

Cost Control Strategies:

import { configureAIGeneration } from 'ai-database'

// 1. Limit cascade depth to control entity count
await db.Blog.create(data, { cascade: true, maxDepth: 1 })  // Only creates immediate children

// 2. Filter which types cascade
await db.Blog.create(data, {
  cascade: true,
  cascadeTypes: ['Topic']  // Only cascade to Topic, not Post
})

// 3. Track costs with onGenerate callback
let totalTokens = 0
configureAIGeneration({
  model: 'sonnet',
  onGenerate: (details) => {
    if (details.result) {
      // Estimate tokens at ~4 characters each (actual count depends on provider)
      const inputTokens = Math.ceil(details.prompt.length / 4)
      const outputTokens = Math.ceil(JSON.stringify(details.result).length / 4)
      totalTokens += inputTokens + outputTokens
      console.log(`Running total: ~${totalTokens} tokens`)
    }
  }
})

// 4. Create a draft to preview without committing
const draft = await db.Blog.draft({ title: 'Test' })
// Review draft before creating
const entity = await draft.resolve()
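The token-tracking strategy can be wrapped in a small reusable tracker with a budget check. This is a sketch under assumptions: `createTokenTracker` and `estimateTokens` are hypothetical names, the `{ prompt, result }` shape is taken from the onGenerate example above, and 4 characters per token is only a rough heuristic.

```typescript
// Hypothetical tracker; not part of ai-database.
interface GenerationDetails {
  prompt: string
  result?: unknown
}

// Rough heuristic: ~4 characters per token, rounded up
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

function createTokenTracker(budget: number) {
  let total = 0
  return {
    // Intended for use as an onGenerate-style callback
    record(details: GenerationDetails): void {
      if (details.result === undefined) return
      total += estimateTokens(details.prompt)
      total += estimateTokens(JSON.stringify(details.result))
    },
    total: () => total,
    overBudget: () => total > budget,
  }
}

const tracker = createTokenTracker(10)
tracker.record({ prompt: 'Write a title', result: { title: 'Hello' } })
console.log(tracker.total(), tracker.overBudget()) // 9 false
```

Checking `overBudget()` between batches gives a simple point at which to stop creating entities. Treat the numbers as estimates rather than billing figures.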

Embedding Costs

| Operation | Embedding Calls | Notes |
|-----------|-----------------|-------|
| create() | 1 per entity | Embeds text fields on creation |
| update() | 1 if text changed | Re-embeds when text fields update |
| semanticSearch() | 1 for query | Embeds query string |
| hybridSearch() | 1 for query | Embeds query string |
| ~> or <~ resolution | 1 per hint | Embeds the hint text for matching |


Rate Limiting Best Practices

ai-database provides built-in concurrency control to prevent API rate limit errors:

Using forEach with Concurrency

// Process entities with controlled concurrency
const result = await db.Lead.forEach(async (lead) => {
  const analysis = await generateAnalysis(lead)
  await db.Lead.update(lead.$id, { analysis })
}, {
  concurrency: 10,         // Max 10 parallel operations
  maxRetries: 3,           // Retry failed items up to 3 times
  retryDelay: (attempt) => 1000 * Math.pow(2, attempt),  // Exponential backoff

  // Handle rate limit errors specifically
  onError: (error, lead) => {
    if (error.message.includes('rate_limit') || error.message.includes('429')) {
      return 'retry'       // Retry with exponential backoff
    }
    return 'continue'      // Skip this item and continue
  },

  onProgress: (progress) => {
    console.log(`${progress.completed}/${progress.total} (${progress.failed} failed)`)
  }
})
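To see what the `retryDelay` formula above implies for a single item, the sketch below lists each delay and the worst-case total wait. `backoffSchedule` is a hypothetical helper, and the 0-based attempt count is an assumption; check how ai-database numbers attempts before relying on exact values.

```typescript
// Same formula as the retryDelay option above.
// Assumption: attempts are counted from 0.
const retryDelay = (attempt: number): number => 1000 * Math.pow(2, attempt)

// Hypothetical helper: lists each delay and the worst-case total wait in ms.
function backoffSchedule(maxRetries: number, delay: (attempt: number) => number) {
  const delays = Array.from({ length: maxRetries }, (_, attempt) => delay(attempt))
  const totalMs = delays.reduce((sum, ms) => sum + ms, 0)
  return { delays, totalMs }
}

const schedule = backoffSchedule(3, retryDelay)
console.log(schedule.delays)   // [1000, 2000, 4000]
console.log(schedule.totalMs)  // 7000
```

With `maxRetries: 3`, an item that keeps failing waits about 7 seconds in total before it is given up on, which is worth weighing against any per-item timeout.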

Provider-Level Concurrency

import { createMemoryProvider } from 'ai-database'

// Configure concurrency at the provider level
const provider = createMemoryProvider({
  concurrency: 10,  // Global limit on parallel operations
})

Execution Queue for Batch Operations

For large-scale operations with different priority levels:

import { ExecutionQueue } from 'ai-database'

const queue = new ExecutionQueue({
  concurrency: {
    priority: 50,    // High-priority operations
    standard: 20,    // Normal operations
    flex: 10,        // Low-priority background operations
    batch: 1000,     // Batch window operations
  }
})

// Submit operations with priority
await queue.submit(
  () => db.Lead.create({ name: 'Important Lead' }),
  { priority: 'priority' }  // Runs with higher concurrency
)

Rate Limit Patterns by Provider

| Provider | Rate Limits | Recommended Concurrency |
|----------|-------------|------------------------|
| OpenAI | 60-10000 RPM (varies by tier) | 5-50 |
| Anthropic | 60-4000 RPM (varies by tier) | 5-40 |
| Cohere | 100 RPM (trial), 10000 RPM (prod) | 5-100 |
| Local (Ollama) | Limited by hardware | 1-4 |
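One way to pick a starting value inside these ranges is Little's law: requests in flight are roughly the request rate times the average latency. The sketch below applies it with some headroom. `maxSafeConcurrency` is a hypothetical helper, and the latency figures are illustrative, not measurements of any provider.

```typescript
// Hypothetical helper based on Little's law; not part of ai-database.
// headroom keeps the target rate below the provider's hard limit.
function maxSafeConcurrency(rpmLimit: number, avgLatencyMs: number, headroom = 0.8): number {
  const requestsPerSecond = (rpmLimit * headroom) / 60
  const inFlight = requestsPerSecond * (avgLatencyMs / 1000)
  return Math.max(1, Math.floor(inFlight))
}

console.log(maxSafeConcurrency(60, 2000))   // 1
console.log(maxSafeConcurrency(600, 1500))  // 12
```

This is a conservative bound. With retries and backoff configured, higher values may still work in practice, so adjust it against the error rate you actually observe.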

Recommended Configuration by Use Case:

// Development/Testing - Low concurrency, fail fast
configureAIGeneration({ model: 'sonnet', enabled: true })
await db.Entity.forEach(fn, { concurrency: 2, maxRetries: 1 })

// Production - Moderate concurrency with retries
await db.Entity.forEach(fn, {
  concurrency: 10,
  maxRetries: 3,
  retryDelay: attempt => 1000 * Math.pow(2, attempt)
})

// Batch Processing - Low concurrency, high retry tolerance
await db.Entity.forEach(fn, {
  concurrency: 5,
  maxRetries: 5,
  retryDelay: attempt => 2000 * Math.pow(2, attempt),
  timeout: 60000,  // 60 second timeout per item
  persist: 'batch-job-123',  // Resume on crash
})

Disabling AI Generation

For testing or when you want placeholder values instead of AI generation:

import { DB, configureAIGeneration } from 'ai-database'

// Disable AI globally - uses deterministic placeholder values
configureAIGeneration({ enabled: false })

// Or per-DB instance
const { db } = DB(schema, {
  aiGeneration: { enabled: false }
})

When AI is disabled:

  • Prompt fields generate deterministic placeholder text
  • Fuzzy operators (~>, <~) fall back to text search
  • No API calls are made to AI providers
  • Tests run faster and don't require API keys

Related