supabase-semantic-search
v0.1.4
Easy-to-use semantic search plugin for Supabase with automatic embedding generation and vector similarity search
Supabase Semantic Search
A semantic search plugin for Supabase that provides automatic embedding generation and vector similarity search capabilities. Built with OpenAI's text-embedding-3-small model and Supabase's pgvector extension.
Features
- Automatic embedding generation for text columns
- Queue-based processing with Supabase Edge Functions
- Simple semantic search API
- Hybrid search combining semantic and keyword search
- OpenAI text-embedding-3-small integration
- Vector similarity search with pgvector
- PostgreSQL message queue (PGMQ) for reliable job processing
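To make "vector similarity search" concrete, here is a minimal sketch of cosine-similarity ranking in plain TypeScript. It is illustrative only: in the plugin the comparison runs inside Postgres via pgvector, and the `rankBySimilarity` helper and the tiny 3-dimensional vectors are assumptions made for this example (text-embedding-3-small returns 1536-dimensional vectors by default).

```typescript
// Illustrative only: pgvector performs this comparison inside Postgres.
type Doc = { id: number; embedding: number[] };

// Cosine similarity: dot(a, b) / (|a| * |b|), a value in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: score every document against the query, best match first.
function rankBySimilarity(query: number[], docs: Doc[]): { id: number; similarity: number }[] {
  return docs
    .map((d) => ({ id: d.id, similarity: cosineSimilarity(query, d.embedding) }))
    .sort((x, y) => y.similarity - x.similarity);
}

const docs: Doc[] = [
  { id: 1, embedding: [1, 0, 0] },
  { id: 2, embedding: [0.6, 0.8, 0] },
  { id: 3, embedding: [0, 0, 1] },
];
const ranked = rankBySimilarity([1, 0, 0], docs);
console.log(ranked.map((r) => r.id));
```

Scanning every row like this is linear in table size; pgvector can use an index to avoid that, which is one reason the plugin keeps the search in SQL.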
Installation
Install the package via npm:
npm install supabase-semantic-search
Initialization
Use the CLI tool to initialize your project with the necessary SQL migrations and Edge Functions:
npx supabase-semantic-search init
This command will:
- Copy SQL migration files to your project
- Copy the Edge Function worker to your Supabase functions directory
- Create example files and documentation
Setup
1. Run SQL Migrations
After initialization, run the SQL migrations in order:
# Using Supabase CLI (recommended); note that this recreates the local database and reapplies all migrations
supabase db reset
# Or manually with psql
psql -h your-db-host -U your-user -d your-db -f sql/01_extensions.sql
psql -h your-db-host -U your-user -d your-db -f sql/02_register_embedding_function.sql
psql -h your-db-host -U your-user -d your-db -f sql/03_trigger_function.sql
psql -h your-db-host -U your-user -d your-db -f sql/04_search_functions.sql
psql -h your-db-host -U your-user -d your-db -f sql/05_example_table.sql
psql -h your-db-host -U your-user -d your-db -f sql/06_hybrid_search_functions.sql
psql -h your-db-host -U your-user -d your-db -f sql/07_observability_functions.sql
2. Deploy Edge Function
Deploy the embedding worker Edge Function:
supabase functions deploy embed-worker --project-ref your-project-ref
3. Set Environment Variables
Configure the following environment variables for your Edge Function:
# In your Supabase dashboard or via CLI
supabase secrets set OPENAI_API_KEY=your-openai-api-key
supabase secrets set SUPABASE_URL=your-supabase-project-url
supabase secrets set SUPABASE_SERVICE_ROLE_KEY=your-service-role-key
4. Register Tables for Embedding
Register any table and text column for automatic embedding generation:
SELECT register_embedding('your_table_name', 'your_text_column');
Usage
Basic Setup
import { createSemanticSearch } from 'supabase-semantic-search';

const semanticSearch = createSemanticSearch(
  process.env.SUPABASE_URL,
  process.env.SUPABASE_ANON_KEY,
  process.env.OPENAI_API_KEY
);
Document Search
Search the default documents table:
// Basic semantic search
const results = await semanticSearch.searchDocuments('contract renewal', {
  topK: 5,
  threshold: 0.7
});
console.log(results.data);
// Returns: Array of documents with similarity scores
Generic Table Search
Search any registered table:
// Search custom table
const results = await semanticSearch.semanticSearch(
  'articles',
  'content',
  'machine learning trends',
  {
    topK: 10,
    threshold: 0.8
  }
);
Hybrid Search
Combine semantic search with keyword search for better results:
// Hybrid search (semantic + keyword)
const hybridResults = await semanticSearch.hybridSearchDocuments('apple earnings', {
  topK: 5,
  alpha: 0.3,     // Weight for keyword search (BM25)
  beta: 0.7,      // Weight for semantic search
  threshold: 0.6
});
console.log(hybridResults.data);
// Returns: Documents with combined semantic and keyword scores
Advanced Usage
Custom Search Options
const results = await semanticSearch.searchDocuments('query', {
  topK: 20,        // Number of results to return
  threshold: 0.75, // Minimum similarity threshold
  filter: {        // Additional filters
    category: 'legal',
    status: 'active'
  }
});
Error Handling
try {
  const results = await semanticSearch.searchDocuments('query');
  if (results.error) {
    console.error('Search error:', results.error);
  } else {
    console.log('Results:', results.data);
  }
} catch (error) {
  console.error('Network error:', error);
}
API Reference
createSemanticSearch(url, anonKey, openaiKey)
Creates a new semantic search instance.
Parameters:
- url (string): Your Supabase project URL
- anonKey (string): Your Supabase anonymous key
- openaiKey (string): Your OpenAI API key
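Since all three arguments usually come from environment variables, it can help to fail early when one is unset rather than at the first search call. The sketch below shows one way to do that; `readSearchConfig` is a hypothetical helper and not part of this package.

```typescript
// Hypothetical helper, not part of supabase-semantic-search.
type SearchConfig = { url: string; anonKey: string; openaiKey: string };

function readSearchConfig(env: Record<string, string | undefined>): SearchConfig {
  const required = ["SUPABASE_URL", "SUPABASE_ANON_KEY", "OPENAI_API_KEY"] as const;
  // Collect every missing or empty variable so the error lists them all at once.
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    url: env["SUPABASE_URL"] as string,
    anonKey: env["SUPABASE_ANON_KEY"] as string,
    openaiKey: env["OPENAI_API_KEY"] as string,
  };
}

// Example with an in-memory object; an application would pass process.env instead.
const config = readSearchConfig({
  SUPABASE_URL: "https://example.supabase.co",
  SUPABASE_ANON_KEY: "anon-key",
  OPENAI_API_KEY: "openai-key",
});
console.log(config.url);
```

The resulting values can then be passed on as `createSemanticSearch(config.url, config.anonKey, config.openaiKey)`.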
searchDocuments(query, options?)
Search the documents table for similar content.
Parameters:
- query (string): Search query text
- options (object, optional):
  - topK (number): Maximum results to return (default: 10)
  - threshold (number): Minimum similarity score (default: 0.5)
  - filter (object): Additional SQL filters
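As a rough mental model of how these options interact, the sketch below applies them to an in-memory result set. It is an assumption-laden illustration: the real filtering happens in the SQL search functions, and the order of operations, the equality-only `filter` semantics, and the `applyOptions` helper are all hypothetical.

```typescript
// Hypothetical model of the search options; the plugin does this work in SQL.
type Row = { id: number; similarity: number; [key: string]: unknown };
type Options = { topK?: number; threshold?: number; filter?: Record<string, unknown> };

function applyOptions(rows: Row[], options: Options = {}): Row[] {
  // Defaults mirror the documented ones: topK 10, threshold 0.5, no filter.
  const { topK = 10, threshold = 0.5, filter = {} } = options;
  return rows
    .filter((row) => Object.entries(filter).every(([key, value]) => row[key] === value))
    .filter((row) => row.similarity >= threshold)
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK);
}

const rows: Row[] = [
  { id: 1, similarity: 0.91, category: "legal" },
  { id: 2, similarity: 0.42, category: "legal" },
  { id: 3, similarity: 0.88, category: "finance" },
  { id: 4, similarity: 0.77, category: "legal" },
];
const top = applyOptions(rows, { topK: 1, threshold: 0.75, filter: { category: "legal" } });
console.log(top.map((r) => r.id));
```

Under this model a high `threshold` can return fewer than `topK` rows, because `topK` only caps the result count and never pads it.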
hybridSearchDocuments(query, options?)
Perform hybrid search combining semantic and keyword search.
Parameters:
- query (string): Search query text
- options (object, optional):
  - topK (number): Maximum results to return (default: 10)
  - alpha (number): Weight for keyword search (default: 0.5)
  - beta (number): Weight for semantic search (default: 0.5)
  - threshold (number): Minimum combined score (default: 0.5)
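One plausible reading of `alpha` and `beta` is a weighted sum of the two scores. The sketch below works through that reading with made-up numbers. It assumes both scores are already normalized to the range 0 to 1 and that the combination is linear; the SQL in 06_hybrid_search_functions.sql may compute the combined score differently, and `hybridRank` is a hypothetical name.

```typescript
// Hypothetical model: combined = alpha * keywordScore + beta * semanticScore.
type Scored = { id: number; keywordScore: number; semanticScore: number };

function hybridScore(row: Scored, alpha = 0.5, beta = 0.5): number {
  return alpha * row.keywordScore + beta * row.semanticScore;
}

function hybridRank(rows: Scored[], alpha = 0.5, beta = 0.5, threshold = 0.5, topK = 10) {
  return rows
    .map((row) => ({ id: row.id, score: hybridScore(row, alpha, beta) }))
    .filter((row) => row.score >= threshold) // threshold applies to the combined score
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

const candidates: Scored[] = [
  { id: 1, keywordScore: 1.0, semanticScore: 0.5 }, // exact keyword hit, loosely related
  { id: 2, keywordScore: 0.0, semanticScore: 0.9 }, // no keyword overlap, close in meaning
  { id: 3, keywordScore: 0.2, semanticScore: 0.3 }, // weak on both
];

// Weights taken from the Hybrid Search usage example: alpha 0.3, beta 0.7.
const weighted = hybridRank(candidates, 0.3, 0.7, 0.6);
console.log(weighted.map((r) => r.id));

// Semantic-only ranking for comparison: alpha 0, beta 1.
const semanticOnly = hybridRank(candidates, 0, 1, 0.6);
console.log(semanticOnly.map((r) => r.id));
```

With these numbers the keyword match keeps document 1 in the results when both signals are weighted, while a semantic-only weighting drops it below the threshold.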
semanticSearch(table, column, query, options?)
Search any registered table for similar content.
Parameters:
- table (string): Table name to search
- column (string): Text column name
- query (string): Search query text
- options (object, optional): Same as searchDocuments
Development
Build
npm install
npm run build
Test
npm test
Set up environment variables for testing:
# Create .env file
SUPABASE_URL=your-supabase-project-url
SUPABASE_ANON_KEY=your-supabase-anon-key
OPENAI_API_KEY=your-openai-api-key
Development Commands
npm run dev # TypeScript watch mode
npm run build:cjs # Build CommonJS version
npm run build:esm # Build ES modules version
npm run build:cli # Build CLI tool
npm run clean # Remove dist folder
Architecture
The system follows this flow:
Text Insert/Update → Database Trigger → PGMQ Queue → Edge Function → OpenAI API → Update Embedding → Vector Search
Components
- Database Triggers: Automatically queue embedding jobs when text columns are updated
- Edge Function Worker: Processes embedding jobs asynchronously using PGMQ
- Vector Storage: Embeddings stored as pgvector columns for fast similarity search
- Search Functions: SQL functions for semantic and hybrid search operations
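The flow above can be sketched as a small in-memory simulation. Everything here is a stand-in chosen for illustration: an array replaces the PGMQ queue, a `Map` replaces the registered table, `fakeEmbed` replaces the OpenAI call, and the function names are hypothetical rather than taken from the package.

```typescript
type Job = { table: string; id: number; text: string };
type RowRecord = { id: number; content: string; embedding: number[] | null };

const queue: Job[] = [];                    // stands in for the PGMQ queue
const table = new Map<number, RowRecord>(); // stands in for a registered table

// Stand-in for the database trigger: store the row and enqueue an embedding job.
function insertRow(id: number, content: string): void {
  table.set(id, { id, content, embedding: null });
  queue.push({ table: "documents", id, text: content });
}

// Stand-in for the OpenAI call: a deterministic toy "embedding".
async function fakeEmbed(text: string): Promise<number[]> {
  return [text.length, text.split(/\s+/).length];
}

// Stand-in for the Edge Function worker: drain the queue and write embeddings back.
async function runWorker(): Promise<number> {
  let processed = 0;
  while (queue.length > 0) {
    const job = queue.shift()!;
    const row = table.get(job.id);
    if (!row) continue; // the row was deleted before the job ran
    row.embedding = await fakeEmbed(job.text);
    processed++;
  }
  return processed;
}

insertRow(1, "contract renewal terms");
insertRow(2, "quarterly earnings");
const pendingBefore = queue.length; // jobs are queued, embeddings are still null

const finished = runWorker();
finished.then((n) => console.log(`processed ${n} jobs`, table.get(1)?.embedding));
```

The point of the queue is that the insert returns immediately with a null embedding; a row only becomes searchable by similarity once the worker has processed its job.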
Contributing
We welcome contributions! Please follow these steps:
Getting Started
- Fork the repository
- Clone your fork: git clone https://github.com/yourusername/supabase-semantic-search.git
- Install dependencies: npm install
- Create a feature branch: git checkout -b feature/your-feature-name
Development Setup
Set up environment variables in .env:
SUPABASE_URL=your-test-supabase-url
SUPABASE_ANON_KEY=your-test-anon-key
OPENAI_API_KEY=your-openai-key
Run tests to ensure everything works:
npm test
Making Changes
- Make your changes
- Add tests for new functionality
- Ensure all tests pass: npm test
- Build the project: npm run build
- Update documentation if needed
Submitting Changes
- Commit your changes with a clear message
- Push to your fork: git push origin feature/your-feature-name
- Create a Pull Request with:
  - Clear description of changes
  - Any breaking changes noted
  - Test results
Code Style
- Follow existing TypeScript conventions
- Use meaningful variable and function names
- Add JSDoc comments for public APIs
- Keep functions focused and single-purpose
Areas for Contribution
- Additional embedding providers (Azure OpenAI, Anthropic, etc.)
- Performance optimizations
- Enhanced search filtering options
- Better error handling and logging
- Documentation improvements
- Example applications
License
MIT License - see LICENSE file for details.
