# @db4/ai

v0.1.2

AI capabilities for db4: embeddings, generations, and vector operations.
Your AI-powered features are broken. Embeddings go stale. Summaries don't update when content changes. Tags fall out of sync. You're spending more time maintaining AI plumbing than building features.
Declare once. Stay in sync forever.
## The Problem

AI integration is a maintenance nightmare:

- **Stale embeddings** - content changes, vectors don't
- **Broken cascades** - the summary updates, but tags still reflect the old version
- **Prompt sprawl** - the same logic duplicated across your codebase
- **Manual orchestration** - you're writing cron jobs to fix what should be automatic
## The Solution

Declare AI behavior in your schema. db4 handles the rest.

```ts
const db = DB({
  Article: {
    title: 'string!',
    content: 'text!',

    // AI-generated summary from content
    summary: 'text ~> content',

    // Auto-maintained embedding
    embedding: 'vector[1536] ~> content',

    // Cascading: tags regenerate when summary changes
    tags: '[string] ~> summary',

    $vector: 'embedding',
  },
})
```

The `~>` operator declares generation dependencies. When content changes, the summary regenerates. When the summary regenerates, tags follow. Embeddings stay fresh automatically.
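The ordering this implies can be sketched in a few lines (an illustration of the idea, not the package's internals): collect every field transitively affected by a change, then regenerate each field only after the inputs it depends on.

```ts
// Sketch: the regeneration scheduling implied by `~>` declarations.
// `deps` maps each generated field to the fields it reads from.
// Assumes the dependency graph is acyclic, as in the schema above.
type Deps = Record<string, string[]>

function regenerationOrder(deps: Deps, changed: string): string[] {
  // 1. Collect every field transitively affected by the change.
  const affected = new Set<string>()
  const queue = [changed]
  while (queue.length > 0) {
    const field = queue.shift()!
    for (const dependent of Object.keys(deps)) {
      if (deps[dependent].includes(field) && !affected.has(dependent)) {
        affected.add(dependent)
        queue.push(dependent)
      }
    }
  }

  // 2. Emit each affected field only once its affected inputs are done.
  const order: string[] = []
  const pending = new Set(affected)
  while (pending.size > 0) {
    for (const field of [...pending]) {
      if (deps[field].every((input) => !pending.has(input))) {
        order.push(field)
        pending.delete(field)
      }
    }
  }
  return order
}

// Mirrors the Article schema above:
const articleDeps: Deps = {
  summary: ['content'],
  embedding: ['content'],
  tags: ['summary'],
}

regenerationOrder(articleDeps, 'content')
// summary and embedding come before tags, which waits on summary
```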
## Get Started

### 1. Install

```sh
npm install @db4/ai
```

### 2. Configure

```ts
import { configure } from '@db4/ai'

configure({
  model: 'claude-3-5-sonnet-20241022',
  embeddingModel: '@cf/baai/bge-base-en-v1.5',
})
```

### 3. Use

```ts
const article = await db.Article.create({
  title: 'Edge Computing Explained',
  content: 'Edge computing brings computation closer to data sources...',
})

// Auto-generated:
article.summary   // "Edge computing processes data near its source..."
article.embedding // [0.023, -0.041, 0.089, ...] (1536 dims)
article.tags      // ['edge computing', 'distributed systems', 'latency']
```

No manual embedding calls. No orchestration code. No sync jobs.
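"No sync jobs" usually comes down to change detection. One common approach (assumed here for illustration; not necessarily what @db4/ai does internally) is to store a hash of the source fields next to each generated value and regenerate only on mismatch:

```ts
// Sketch: staleness detection via a source-content hash.
// FNV-1a is used here only to keep the example dependency-free.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash
}

interface Generated { value: string; sourceHash: number }

// Regenerate only when the stored hash no longer matches the source.
function isStale(generated: Generated | undefined, source: string): boolean {
  return generated === undefined || generated.sourceHash !== fnv1a(source)
}

const summary: Generated = {
  value: 'Edge computing processes data near its source...',
  sourceHash: fnv1a('content v1'),
}

isStale(summary, 'content v1') // false: source unchanged, keep the summary
isStale(summary, 'content v2') // true: source changed, regenerate
```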
## Core Features

### Cascading Generations

Build AI pipelines that auto-update:

```ts
const schema = DB({
  Document: {
    source: 'text!',

    // Single-field generation
    summary: 'text ~> source',

    // Multi-field input
    abstract: 'text ~> [title, content]',

    // Chained: each stage feeds the next
    keywords: '[string] ~> summary',
    category: 'string ~> keywords',

    // Vector embedding
    embedding: 'vector[768] ~> content',
  },
})
```

### Automatic Embeddings
```ts
import { createWorkersAIEmbedder } from '@db4/ai'

const embedder = createWorkersAIEmbedder('base')

// Single
const { embedding } = await embedder.embed('Hello, world!')

// Batch with caching
const { embeddings } = await embedder.embedBatch([
  'First document',
  'Second document',
])

// Incremental: only re-embed changed content
const { embedding: updated, changed } = await embedder.updateIfChanged(
  record,
  ['title', 'content'],
  'embedding'
)
```

### Semantic Search
```ts
import { createAI } from '@db4/ai'

const ai = createAI(db, {
  provider: 'workers-ai',
  embeddingModel: '@cf/baai/bge-base-en-v1.5',
  entityEmbeddings: {
    Post: { fields: ['title', 'content'] },
  },
})

await ai.indexEntities('Post', posts)

// Semantic search
const results = await ai.semanticSearch('Post', 'machine learning tutorials', {
  limit: 10,
  minScore: 0.7,
})

// Hybrid: semantic + full-text
const hybrid = await ai.hybridSearch('Post', 'typescript generics guide', {
  semanticWeight: 0.6,
  ftsWeight: 0.4,
})
```

### RAG Pipeline
```ts
import { createRAGPipeline, createVectorIndex, createWorkersAIEmbedder } from '@db4/ai'

const rag = createRAGPipeline({
  embedder: createWorkersAIEmbedder('base'),
  vectorIndex: createVectorIndex({ dimensions: 768, enableTextIndex: true }),
  config: {
    chunking: { method: 'recursive', chunkSize: 500 },
    retrieval: { topK: 5, hybrid: true },
  },
})

await rag.addDocuments([
  { id: 'doc-1', content: '...', title: 'Getting Started' },
  { id: 'doc-2', content: '...', title: 'Configuration' },
])

const result = await rag.query('How do I configure sharding?')
console.log(result.context.formattedContext)
console.log(result.sources)
```

### Workflow Cascades
Complex pipelines with error handling, retries, and parallelism:

```ts
import { Cascade } from '@db4/ai'

const workflow = new Cascade<string>()
  .then(extractEntities, { name: 'extract' })
  .parallel([summarize, classify])
  .aggregate((results) => ({
    summary: results[0],
    category: results[1],
  }))

const result = await workflow.run(articleContent)
```

### Agentic Loops
AI agents with entity tools:

```ts
import { createDB4AgenticLoop, createEntityToolset, createSearchTools } from '@db4/ai'

const loop = createDB4AgenticLoop({
  provider: db,
  entityTypes: ['User', 'Post'],
  tools: [
    ...createEntityToolset(db, 'User'),
    ...createSearchTools(db, 'Post'),
  ],
  maxIterations: 10,
})

await loop.run('Find users interested in TypeScript and list their recent posts')
```

### Structured Extraction
```ts
import { createExtractor, ContactSchema } from '@db4/ai'

const contactExtractor = createExtractor(ContactSchema)
const contact = await contactExtractor.extract(
  'Contact John Smith at [email protected] or call 555-1234'
)
// { name: 'John Smith', email: '[email protected]', phone: '555-1234' }
```

### Evaluation
```ts
import { EvalRunner, createTestSuite, semanticSimilarity } from '@db4/ai'

const suite = createTestSuite({
  name: 'Summarization Tests',
  testCases: [
    { id: 'test-1', input: 'Long article...', expected: 'Expected summary' },
  ],
  defaultComparator: semanticSimilarity,
})

const runner = new EvalRunner({ passThreshold: 0.7 })
runner.registerSuite(suite)
const results = await runner.runAll(summarize)
console.log(runner.generateReport())
```

## Success vs. Failure
### With @db4/ai

- Embeddings update automatically when content changes
- Cascades regenerate in correct dependency order
- Semantic search works out of the box
- RAG is one function call
- Schema changes automatically update AI pipelines

### Without It

- Stale embeddings return irrelevant search results
- Tags reference old summaries
- Cron jobs patch over broken orchestration
- Prompts duplicated across services
- Every schema change = manual AI pipeline update
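The stale-embeddings failure above is mechanical rather than mysterious: semantic search ranks by vector similarity (the `minScore` values earlier are this score), so an outdated vector simply ranks against the wrong point. A minimal cosine-similarity sketch, for orientation rather than as the package's actual implementation:

```ts
// Cosine similarity: 1 means same direction, 0 means orthogonal.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

cosineSimilarity([1, 0], [1, 0]) // 1: identical direction
cosineSimilarity([1, 0], [0, 1]) // 0: unrelated
```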
## API Quick Reference

### Generation

```ts
import { write, list, is, code, extract } from '@db4/ai'

const summary = await write`Summarize: ${content}`
const tags = await list`Extract tags: ${content}`
const spam = await is`Is this spam? ${message}`
const fn = await code`TypeScript function to validate email`
const data = await extract`Extract contact info: ${text}`
```

### Embeddings
```ts
import { embedText, embedTexts, createEmbedder } from '@db4/ai'

const { embedding } = await embedText('Hello, world!')
const { embeddings } = await embedTexts(['First', 'Second'])
const embedder = createEmbedder('bge-base', { cacheEmbeddings: true })
```

### Vector Search
```ts
import { createVectorIndex } from '@db4/ai'

const index = createVectorIndex({ dimensions: 768 })
index.add({ id: 'doc-1', embedding, metadata: doc })
const results = index.search(queryEmbedding, { topK: 10 })
```

### Batch Processing
```ts
import { BatchProcessor, batchProcess } from '@db4/ai'

const processor = new BatchProcessor({
  concurrency: 5,
  batchSize: 10,
  onProgress: (p) => console.log(`${p.percentage}%`),
})

const results = await processor.process(items, processItem)
```

### Scheduling
```ts
import { WorkflowScheduler } from '@db4/ai'

const scheduler = new WorkflowScheduler()
await scheduler.createRecurringWorkflow({
  name: 'Daily embedding update',
  schedule: '0 0 * * *', // cron: every day at midnight
  handler: updateEmbeddingsWorkflow,
})
```

## Supported Providers
### Embedding Models

| Provider | Model | Dimensions |
|----------|-------|------------|
| Workers AI | `@cf/baai/bge-small-en-v1.5` | 384 |
| Workers AI | `@cf/baai/bge-base-en-v1.5` | 768 |
| Workers AI | `@cf/baai/bge-large-en-v1.5` | 1024 |
| OpenAI | `text-embedding-3-small` | 1536 |
| OpenAI | `text-embedding-3-large` | 3072 |
### Generation Models

- Claude (`claude-3-5-sonnet`, `claude-3-opus`)
- OpenAI (`gpt-4`, `gpt-4-turbo`)
- Workers AI (`llama-3`, `mistral`, `deepseek-coder`)
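Whichever embedding model you pick, its output dimensions must match the `vector[N]` declared in your schema (the `vector[1536]` example earlier pairs with `text-embedding-3-small`; the `vector[768]` one with `bge-base`). A small sanity check, with the table's values inlined as an assumption:

```ts
// Dimensions per embedding model, as listed in the table above.
const MODEL_DIMENSIONS: Record<string, number> = {
  '@cf/baai/bge-small-en-v1.5': 384,
  '@cf/baai/bge-base-en-v1.5': 768,
  '@cf/baai/bge-large-en-v1.5': 1024,
  'text-embedding-3-small': 1536,
  'text-embedding-3-large': 3072,
}

// A `vector[N] ~> …` field only works if N matches the model's output.
function matchesSchema(model: string, declaredDims: number): boolean {
  return MODEL_DIMENSIONS[model] === declaredDims
}

matchesSchema('@cf/baai/bge-base-en-v1.5', 768)  // true: fits `vector[768]`
matchesSchema('@cf/baai/bge-base-en-v1.5', 1536) // false: schema expects a 1536-dim model
```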
## Related Packages

- `@db4/schema` - IceType schema with AI directives
- `@db4/search` - Full-text and vector search
- `@db4/vortex` - Columnar storage for embeddings
- `@db4/workflows` - Durable execution for AI pipelines

## License

MIT
