@datazod/zod-pinecone

v0.1.8

Convert Zod schemas to Pinecone vector database definitions

A library that bridges Turso/SQLite databases with Pinecone vector search, enabling type-safe hybrid search capabilities with seamless data synchronization.

Experimental Status

This package is experimental. I aim for stability and maintain a comprehensive test suite, but the API may change in future versions as I add features and respond to community feedback.

Related Packages

This package is part of the @datazod ecosystem, which includes complementary tools for working with Zod schemas and databases:

  • @datazod/zod-sql - Convert Zod schemas to SQL table definitions with multi-dialect support and intelligent type mapping
  • @datazod/zod-turso - A type-safe Turso/SQLite ORM with data flattening, batch operations, and query building capabilities

Together, these packages provide a complete solution for schema-driven database design, data operations, and vector search integration.

Contributing

Contributions are very welcome! As a solo developer, I'm actively looking for help to make this package better. Useful ways to contribute include:

  • Reporting bugs or issues
  • Suggesting new features
  • Improving documentation
  • Adding support for more vector databases
  • Optimizing performance
  • Writing more tests

To get involved:

  • Open issues on GitHub
  • Submit pull requests
  • Share your use cases and feedback

Every contribution, no matter how small, helps make this package better for everyone.

Thanks

Special thanks to all contributors and users who have provided feedback, reported issues, and helped improve this package. Your input is invaluable in making @datazod/zod-pinecone better!

Note: This package is optimized for Bun but works with any JavaScript runtime.

Features

Core Functionality

  • Turso-Pinecone Integration - Seamlessly bridge relational data with vector search capabilities
  • Type-safe Operations - Full TypeScript support with comprehensive type definitions
  • Hybrid Search - Combine traditional filters with semantic vector search
  • Automatic Synchronization - Keep your Turso database and Pinecone index in sync

Advanced Capabilities

  • Flexible Embedding Strategy - Configure which fields to embed and how to combine them
  • Metadata Management - Automatically sync database fields as Pinecone metadata for filtering
  • Batch Processing - Efficient bulk operations for large datasets
  • Custom ID Mapping - Link records between systems with flexible ID strategies
  • Query Optimization - Smart caching and batching for optimal performance

Vector Search Features

  • Semantic Search - Find records by meaning, not just keywords
  • Similarity Search - Discover related content using vector similarity
  • Multi-modal Queries - Combine text, metadata filters, and vector search
  • Real-time Updates - Keep vector embeddings synchronized with database changes

Installation

# With Bun (recommended)
bun add @datazod/zod-pinecone @libsql/client @pinecone-database/pinecone

# With npm
npm install @datazod/zod-pinecone @libsql/client @pinecone-database/pinecone

# With yarn
yarn add @datazod/zod-pinecone @libsql/client @pinecone-database/pinecone

# With pnpm
pnpm add @datazod/zod-pinecone @libsql/client @pinecone-database/pinecone

Quick Start

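import { createClient } from '@libsql/client'
import { Pinecone } from '@pinecone-database/pinecone'
import OpenAI from 'openai' // Only needed for the embedding example below
import { createTursoPineconeAdapter } from '@datazod/zod-pinecone'

// Initialize clients
const turso = createClient({
	url: 'your-turso-database-url',
	authToken: 'your-auth-token'
})

const pinecone = new Pinecone({
	apiKey: 'your-pinecone-api-key'
})

// Client used by generateEmbedding below (swap in any embedding provider)
const openai = new OpenAI({ apiKey: 'your-openai-api-key' })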

// Create the adapter
const adapter = createTursoPineconeAdapter({
	indexName: 'products',
	embeddingFields: ['title', 'description'], // Fields to convert to embeddings
	metadataFields: ['category', 'price', 'brand'], // Fields to sync as metadata
	generateEmbedding: async (text: string) => {
		// Your embedding generation logic (OpenAI, Cohere, etc.)
		const response = await openai.embeddings.create({
			model: 'text-embedding-ada-002',
			input: text
		})
		return response.data[0].embedding
	}
})

// Sync data from Turso to Pinecone
const products = await turso.execute('SELECT * FROM products')
const syncResult = await adapter.syncRecords(pinecone, products.rows)

// Perform semantic search
const results = await adapter.queryByText(pinecone, 'wireless headphones', {
	topK: 10,
	filter: { category: 'electronics', price: { $lte: 200 } }
})

console.log('Found products:', results)

Core Concepts

1. Database-Vector Bridge

The adapter acts as a bridge between your Turso database (structured, relational data) and Pinecone (vector search):

// Your Turso table
// products: id, title, description, category, price, created_at

// Becomes Pinecone vectors with:
// - Vector: embedding of title + description
// - Metadata: { category, price, created_at }
// - ID: linked to your database record
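
To make the mapping above concrete, here is a minimal sketch of how one database row could be turned into a vector record. The names `Row`, `VectorRecord`, `buildEmbeddingText`, `pickMetadata`, and `rowToVectorRecord` are hypothetical and exist only for illustration; the adapter's internal logic may differ.

```typescript
// Hypothetical shapes for illustration; not the package's internal types.
type Row = Record<string, unknown>

interface VectorRecord {
	id: string
	values: number[]
	metadata: Record<string, string | number | boolean>
}

// Join the chosen fields into one text, skipping null, undefined, and empty values.
function buildEmbeddingText(row: Row, fields: string[]): string {
	return fields
		.map((field) => row[field])
		.filter((value) => value !== null && value !== undefined && value !== '')
		.map((value) => String(value))
		.join(' ')
}

// Keep only simple scalar values, which are safe to store as flat metadata.
function pickMetadata(row: Row, fields: string[]): VectorRecord['metadata'] {
	const metadata: VectorRecord['metadata'] = {}
	for (const field of fields) {
		const value = row[field]
		if (
			typeof value === 'string' ||
			typeof value === 'number' ||
			typeof value === 'boolean'
		) {
			metadata[field] = value
		}
	}
	return metadata
}

// Vector = embedding of title + description, metadata = selected columns,
// ID = the database primary key converted to a string.
async function rowToVectorRecord(
	row: Row,
	embed: (text: string) => Promise<number[]>
): Promise<VectorRecord> {
	const text = buildEmbeddingText(row, ['title', 'description'])
	return {
		id: String(row['id']),
		values: await embed(text),
		metadata: pickMetadata(row, ['category', 'price', 'created_at'])
	}
}
```

Keeping the text-building and metadata-picking steps separate makes it easy to see which columns influence similarity and which are only available for filtering.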

2. Embedding Strategy

Configure which fields to embed and how:

const adapter = createTursoPineconeAdapter({
	indexName: 'products',

	// Option A: embed specific fields
	embeddingFields: ['title', 'description'],

	// Option B: embed all fields except excluded ones
	// (use instead of Option A; an object literal cannot repeat a key)
	// embeddingFields: '*',
	// excludeFromEmbedding: ['id', 'created_at', 'price'],

	// Fields to include as searchable metadata
	metadataFields: ['category', 'brand', 'price'],

	generateEmbedding: async (text: string) => {
		// Combine multiple fields into a single embedding
		return await yourEmbeddingService(text)
	}
})
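
As a rough illustration of the wildcard option, the sketch below resolves `'*'` plus an exclusion list into a concrete list of field names for a given row. `resolveEmbeddingFields` is a hypothetical helper written for this example, and it assumes that exclusions only apply in wildcard mode; the package's own resolution logic may differ.

```typescript
// Hypothetical helper; shown only to illustrate the '*' + exclusions idea.
function resolveEmbeddingFields(
	row: Record<string, unknown>,
	embeddingFields: string[] | '*',
	excludeFromEmbedding: string[] = []
): string[] {
	// An explicit list is used as-is.
	if (embeddingFields !== '*') {
		return embeddingFields
	}
	// Wildcard: start from every column on the row, then drop the exclusions.
	const excluded = new Set(excludeFromEmbedding)
	return Object.keys(row).filter((field) => !excluded.has(field))
}

const exampleFields = resolveEmbeddingFields(
	{ id: 1, title: 'Laptop', description: 'Fast', price: 999, created_at: '2024-01-01' },
	'*',
	['id', 'created_at', 'price']
)
console.log(exampleFields)
```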

3. Hybrid Search

Combine semantic search with traditional filters:

// Find "gaming laptops" under $1500 in the electronics category from Dell, HP, or Lenovo
const results = await adapter.queryByText(pinecone, 'gaming laptop', {
	filter: {
		$and: [
			{ category: { $eq: 'electronics' } },
			{ price: { $lte: 1500 } },
			{ brand: { $in: ['Dell', 'HP', 'Lenovo'] } }
		]
	},
	topK: 20
})

API Reference

Creating an Adapter

const adapter = createTursoPineconeAdapter(options: TursoPineconeAdapterOptions)

TursoPineconeAdapterOptions

interface TursoPineconeAdapterOptions {
	// Required
	indexName: string
	generateEmbedding: (text: string) => Promise<number[]>

	// Embedding configuration
	embeddingFields?: string[] | '*' // Fields to embed
	excludeFromEmbedding?: string[] // Fields to exclude when using '*'

	// Metadata configuration
	metadataFields?: string[] // Fields to sync as metadata

	// ID and mapping
	idField?: string // Database ID field (default: 'id')
	mappingFields?: {
		turso_id?: boolean // Include database ID in metadata
		custom_id?: string // Custom ID field name
	}

	// Performance
	batchSize?: number // Batch size for operations (default: 100)
}

Core Methods

syncRecords()

Synchronize records from Turso to Pinecone:

const result = await adapter.syncRecords(
  pineconeClient: Pinecone,
  records: any[],
  options?: {
    onProgress?: (completed: number, total: number) => void
    onError?: (error: Error, record: any, index: number) => void
  }
)

// Returns: { success: boolean, syncedCount: number, errors: string[] }

queryByText()

Semantic search by text:

const results = await adapter.queryByText(
  pineconeClient: Pinecone,
  queryText: string,
  options?: {
    topK?: number
    filter?: Record<string, any>
    namespace?: string
    includeValues?: boolean
  }
)

findSimilar()

Find records similar to a given record:

const results = await adapter.findSimilar(
  pineconeClient: Pinecone,
  referenceId: string,
  options?: {
    topK?: number
    filter?: Record<string, any>
    namespace?: string
  }
)

hybridSearch()

Advanced search combining text and filters:

const results = await adapter.hybridSearch(
  pineconeClient: Pinecone,
  options: {
    queryText: string
    filter?: Record<string, any>
    topK?: number
    namespace?: string
  }
)

Advanced Usage

Custom Embedding Generation

import OpenAI from 'openai'

const openai = new OpenAI({ apiKey: 'your-api-key' })

const adapter = createTursoPineconeAdapter({
	indexName: 'products',
	embeddingFields: ['title', 'description', 'features'],

	generateEmbedding: async (text: string) => {
		const response = await openai.embeddings.create({
			model: 'text-embedding-3-small',
			input: text,
			dimensions: 1536
		})
		return response.data[0].embedding
	}
})

Batch Synchronization with Progress Tracking

const products = await turso.execute('SELECT * FROM products LIMIT 10000')

const result = await adapter.syncRecords(pinecone, products.rows, {
	onProgress: (completed, total) => {
		console.log(
			`Progress: ${completed}/${total} (${Math.round((completed / total) * 100)}%)`
		)
	},
	onError: (error, record, index) => {
		console.error(`Failed to sync record ${index}:`, error.message)
	}
})

console.log(
	`Synced ${result.syncedCount} records, ${result.errors.length} errors`
)

Complex Filter Queries

// Find products with advanced filtering
const results = await adapter.queryByText(pinecone, 'wireless bluetooth', {
	filter: {
		$and: [
			{ category: { $eq: 'electronics' } },
			{
				$or: [
					{ brand: { $in: ['Apple', 'Sony', 'Bose'] } },
					{ rating: { $gte: 4.5 } }
				]
			},
			{ price: { $gte: 50, $lte: 500 } },
			{ inStock: { $eq: true } }
		]
	},
	topK: 25
})
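
To reason about what a nested filter like the one above will match, it can help to evaluate it against sample metadata locally. The sketch below is a small in-memory matcher for a subset of Pinecone-style operators (`$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`, `$in`, `$nin`, `$and`, `$or`). It is an approximation for illustration, not part of this package or of Pinecone, and `matchesFilter`, `matchesCondition`, and `compareNumbers` are hypothetical names.

```typescript
type Scalar = string | number | boolean
type Metadata = Record<string, Scalar>
type Filter = Record<string, unknown>

// Compare two values numerically; non-numbers never match.
function compareNumbers(
	value: unknown,
	operand: unknown,
	test: (a: number, b: number) => boolean
): boolean {
	return typeof value === 'number' && typeof operand === 'number' && test(value, operand)
}

// Check one field value against a condition such as { $gte: 50, $lte: 500 }.
function matchesCondition(value: Scalar | undefined, condition: unknown): boolean {
	// A bare value is treated as shorthand for { $eq: value }.
	if (typeof condition !== 'object' || condition === null) {
		return value === condition
	}
	// Every operator in the condition must hold.
	return Object.entries(condition).every(([operator, operand]) => {
		switch (operator) {
			case '$eq': return value === operand
			case '$ne': return value !== operand
			case '$gt': return compareNumbers(value, operand, (a, b) => a > b)
			case '$gte': return compareNumbers(value, operand, (a, b) => a >= b)
			case '$lt': return compareNumbers(value, operand, (a, b) => a < b)
			case '$lte': return compareNumbers(value, operand, (a, b) => a <= b)
			case '$in': return Array.isArray(operand) && operand.includes(value)
			case '$nin': return Array.isArray(operand) && !operand.includes(value)
			default: throw new Error(`Unsupported operator: ${operator}`)
		}
	})
}

// Top-level keys are combined with AND; $and / $or take arrays of sub-filters.
function matchesFilter(metadata: Metadata, filter: Filter): boolean {
	return Object.entries(filter).every(([key, condition]) => {
		if (key === '$and') {
			return (condition as Filter[]).every((sub) => matchesFilter(metadata, sub))
		}
		if (key === '$or') {
			return (condition as Filter[]).some((sub) => matchesFilter(metadata, sub))
		}
		return matchesCondition(metadata[key], condition)
	})
}

// The same filter as in the example above.
const exampleFilter: Filter = {
	$and: [
		{ category: { $eq: 'electronics' } },
		{ $or: [{ brand: { $in: ['Apple', 'Sony', 'Bose'] } }, { rating: { $gte: 4.5 } }] },
		{ price: { $gte: 50, $lte: 500 } },
		{ inStock: { $eq: true } }
	]
}
```

A matcher like this is handy in unit tests: you can check that a filter selects the records you expect before sending it to the index.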

ID Mapping and Cross-References

const adapter = createTursoPineconeAdapter({
	indexName: 'products',
	embeddingFields: ['title', 'description'],
	idField: 'product_id', // Use custom ID field from database

	mappingFields: {
		turso_id: true, // Include database ID in metadata
		custom_id: 'sku' // Also include SKU for cross-referencing
	},

	generateEmbedding: async (text) => await embedText(text)
})

// Now vectors will have metadata like:
// { title: "Product Name", turso_id: "123", sku: "PROD-456" }

Utility Functions

Standalone Query Functions

import {
	queryByText,
	queryByIds,
	findSimilar,
	hybridSearch,
	deleteByFilter
} from '@datazod/zod-pinecone'

// Query without adapter
const results = await queryByText(pinecone, {
	indexName: 'products',
	queryText: 'gaming mouse',
	generateEmbedding: embedFunction,
	topK: 10,
	filter: { category: 'gaming' }
})

// Delete by filter
const deleteResult = await deleteByFilter(pinecone, {
	indexName: 'products',
	filter: { discontinued: true }
})

Helper Utilities

import {
	EmbeddingHelper,
	FilterHelper,
	MetadataHelper
} from '@datazod/zod-pinecone'

// Embedding utilities
const normalized = EmbeddingHelper.normalize([0.1, 0.2, 0.3])
const similarity = EmbeddingHelper.cosineSimilarity(vectorA, vectorB)

// Filter building
const filter = FilterHelper.createFilter({ category: 'electronics' })
const rangeFilter = FilterHelper.createRangeFilter('price', 10, 100)
const combinedFilter = FilterHelper.combineFilters([filter, rangeFilter], 'and')

// Metadata processing
const cleanedMetadata = MetadataHelper.cleanMetadata(rawObject)
const selectedMetadata = MetadataHelper.extractMetadata(object, [
	'title',
	'price'
])
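
For readers unfamiliar with the vector math behind the embedding utilities, here is a standalone sketch of L2 normalization and cosine similarity. These functions are written from the standard definitions and are not the implementation of `EmbeddingHelper`; the handling of zero vectors and mismatched lengths is an assumption made for this example.

```typescript
// Scale a vector to unit length (L2 norm of 1). Zero vectors are returned unchanged.
function normalize(vector: number[]): number[] {
	const norm = Math.sqrt(vector.reduce((sum, x) => sum + x * x, 0))
	return norm === 0 ? [...vector] : vector.map((x) => x / norm)
}

// Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
function cosineSimilarity(a: number[], b: number[]): number {
	if (a.length !== b.length) {
		throw new Error('Vectors must have the same length')
	}
	const dot = a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0)
	const normA = Math.sqrt(a.reduce((sum, x) => sum + x * x, 0))
	const normB = Math.sqrt(b.reduce((sum, x) => sum + x * x, 0))
	// Similarity with a zero vector is undefined; this sketch returns 0.
	return normA === 0 || normB === 0 ? 0 : dot / (normA * normB)
}
```

Vectors that point in the same direction score close to 1 regardless of their length, which is why cosine similarity is a common choice for comparing text embeddings.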

Performance Tips

1. Optimize Embedding Fields

// Good: Specific, relevant fields
embeddingFields: ['title', 'description']

// Less optimal: Too many fields
embeddingFields: ['title', 'description', 'category', 'brand', 'specs', 'reviews']

// Good alternative: Use wildcard with exclusions
embeddingFields: '*',
excludeFromEmbedding: ['id', 'created_at', 'updated_at', 'internal_notes']

2. Batch Operations

// Process in batches for large datasets
const adapter = createTursoPineconeAdapter({
	indexName: 'products',
	batchSize: 50 // Adjust based on your embedding service limits
	// ...other options
})
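
To picture what a batch size implies, the sketch below splits a list of records into fixed-size groups. `chunk` is a hypothetical helper for illustration only; how the adapter batches records internally is not documented here and may differ.

```typescript
// Split items into consecutive batches of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
	if (!Number.isInteger(size) || size <= 0) {
		throw new Error('Batch size must be a positive integer')
	}
	const batches: T[][] = []
	for (let start = 0; start < items.length; start += size) {
		batches.push(items.slice(start, start + size))
	}
	return batches
}

// 120 records with a batch size of 50 yield batches of 50, 50, and 20.
const exampleBatches = chunk(Array.from({ length: 120 }, (_, i) => i), 50)
console.log(exampleBatches.map((batch) => batch.length))
```

Smaller batches mean more round trips but less work lost when a single request fails or hits a rate limit.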

3. Filter Early

// Better: Filter in Pinecone first, then process
const results = await adapter.queryByText(pinecone, 'query', {
	filter: { category: 'electronics', inStock: true },
	topK: 100
})

// Then post-process if needed
const filtered = results.filter((r) => r.score > 0.8)

Error Handling

try {
	const result = await adapter.syncRecords(pinecone, records, {
		onError: (error, record, index) => {
			// Log individual record errors
			console.error(`Record ${index} failed:`, error.message)
		}
	})

	if (!result.success) {
		console.error('Sync completed with errors:', result.errors)
	}
} catch (error) {
	console.error('Sync failed completely:', error)
}

Best Practices

1. Embedding Strategy

  • Use concise, descriptive fields for embeddings
  • Combine related fields (title + description) for richer context
  • Exclude metadata-only fields (IDs, timestamps) from embeddings

2. Metadata Design

  • Include fields you'll filter on frequently
  • Keep metadata values simple (strings, numbers, booleans)
  • Avoid deeply nested objects in metadata

3. Index Management

  • Use consistent naming conventions for indexes
  • Consider separate indexes for different content types
  • Monitor index size and performance

4. Query Optimization

  • Start with broader queries, then add filters
  • Use appropriate topK values (don't over-fetch)
  • Cache expensive embedding generations
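
One possible way to cache embedding generation is to wrap the `generateEmbedding` function in a small in-memory memoizer, sketched below. `withEmbeddingCache` is a hypothetical helper and not part of this package; the unbounded `Map` is a simplification, so a long-running service would likely want a size limit or eviction policy.

```typescript
type EmbeddingFn = (text: string) => Promise<number[]>

// Wrap an embedding function so repeated texts reuse the earlier result.
// Promises are cached, so concurrent calls for the same text share one request.
function withEmbeddingCache(generateEmbedding: EmbeddingFn): EmbeddingFn {
	const cache = new Map<string, Promise<number[]>>()
	return (text) => {
		const cached = cache.get(text)
		if (cached) {
			return cached
		}
		const pending = generateEmbedding(text).catch((error) => {
			// Do not keep failed requests in the cache, so they can be retried.
			cache.delete(text)
			throw error
		})
		cache.set(text, pending)
		return pending
	}
}
```

The wrapped function has the same signature as the original, so it can be passed wherever a `generateEmbedding` callback is expected.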

TypeScript Support

Full TypeScript support with comprehensive type definitions:

import type {
	TursoPineconeAdapterOptions,
	QueryByTextOptions,
	HybridSearchOptions,
	MappingFields,
	// The two types below are used in the annotations that follow and are
	// assumed to be exported alongside the others
	TursoPineconeAdapter,
	QueryResult
} from '@datazod/zod-pinecone'

// All operations are fully typed
const adapter: TursoPineconeAdapter = createTursoPineconeAdapter(options)
const results: QueryResult[] = await adapter.queryByText(pinecone, 'query')

License

MIT