strapi-plugin-semantic-search
v1.1.0
Published
Intelligent semantic search plugin for Strapi 5 powered by OpenAI embeddings. Automatically generates embeddings for your content and provides powerful semantic search capabilities.
Maintainers
Readme
Semantic Search Plugin for Strapi 5
Intelligent content discovery powered by OpenAI embeddings
Transform your Strapi CMS into an AI-powered content platform with automatic embedding generation and semantic search capabilities.
🆕 What's New in v1.1.0
- Configurable Field Mapping - Customize which fields to process per content type
- Enhanced Validation - Comprehensive configuration validation with helpful error messages
- Improved Logging - Better visibility into plugin configuration and processing
- Flexible Content Types - Support for any content type with custom field mappings
Features
- Automatic Embedding Generation - Content embeddings created automatically on save
- Semantic Search APIs - RESTful endpoints for intelligent content discovery
- Multi-Content Type Search - Search across different content types simultaneously
- High Performance - Sub-300ms search responses with vector similarity
- Production Ready - Comprehensive error handling and rate limiting
- Analytics - Monitor embedding coverage and search performance
- Zero Dependencies - No external vector databases required
Installation
1. Install Dependencies
cd src/plugins/semantic-search
npm install openai@^5.8.2 axios@^1.10.02. Configure Environment
Add your OpenAI API key to .env:
OPENAI_API_KEY=your_openai_api_key_here3. Enable Plugin
Update config/plugins.ts:
export default ({ env }) => ({
'semantic-search': {
enabled: true,
resolve: './src/plugins/semantic-search'
},
});4. Add Embedding Fields
Add embedding fields to your content types:
{
"embedding": {
"type": "json"
},
"embeddingMetadata": {
"type": "json"
}
}5. Restart Strapi
npm run developConfiguration
Basic Configuration
By default, the plugin processes these content types and fields:
// Default configuration
{
'api::article.article': ['title', 'content', 'summary'],
'api::blog.blog': ['title', 'body', 'excerpt']
}Custom Configuration
Configure which content types and fields to process by updating config/plugins.js:
// config/plugins.js
module.exports = ({ env }) => ({
'semantic-search': {
enabled: true,
resolve: './src/plugins/semantic-search',
config: {
contentTypes: {
'api::article.article': ['title', 'content', 'summary', 'tags'],
'api::blog.blog': ['title', 'body', 'excerpt', 'category'],
'api::product.product': ['name', 'description', 'features', 'benefits'],
'api::course.course': ['title', 'overview', 'learningOutcomes']
}
}
}
});Configuration Options
| Option | Type | Description |
|--------|------|-------------|
| contentTypes | Object | Maps content type UIDs to arrays of field names |
Content Type Format: Use Strapi's UID format: api::collection-name.collection-name
Field Names:
- Must be valid field names from your content type schema
- Can include any text-based fields (text, textarea, richtext, etc.)
- Rich text fields are automatically converted to plain text for embedding
Configuration Validation
The plugin validates your configuration on startup:
- ✅ Valid content type format:
api::article.article - ✅ Valid field arrays:
['title', 'content'] - ✅ Non-empty field names: No empty strings or invalid types
- ❌ Invalid formats: Missing
api::prefix, empty fields, etc.
Invalid configurations will log warnings and fallback to defaults.
Usage
Automatic Embedding Generation
Embeddings are generated automatically when you create or update content. The plugin extracts text from the fields you've configured for each content type (see Configuration section above).
Search API
Single Content Type Search
POST /api/semantic-search/searchRequest:
{
"query": "artificial intelligence and machine learning",
"contentType": "api::article.article",
"limit": 10,
"threshold": 0.1
}Response:
{
"success": true,
"data": {
"query": "artificial intelligence and machine learning",
"contentType": "api::article.article",
"results": [
{
"id": 1,
"title": "Deep Learning Fundamentals",
"similarityScore": 0.8945,
"content": "...",
"createdAt": "2025-01-15T10:30:00.000Z"
}
],
"metadata": {
"totalResults": 5,
"queryProcessing": {
"embeddingDimensions": 1536
}
}
}
}Multi-Content Type Search
POST /api/semantic-search/multi-searchRequest:
{
"query": "productivity and remote work",
"contentTypes": ["api::article.article", "api::blog.blog"],
"limit": 15,
"aggregateResults": true
}Embedding Statistics
GET /api/semantic-search/statsResponse:
{
"success": true,
"data": {
"api::article.article": {
"total": 50,
"withEmbeddings": 50,
"coverage": "100.00%"
},
"api::blog.blog": {
"total": 25,
"withEmbeddings": 23,
"coverage": "92.00%"
}
}
}Configuration
Supported Content Types
The plugin automatically processes these content types:
- Any content type that starts with
api:: - Excludes admin and plugin content types
- Configurable in the lifecycle registration
Text Field Mapping
Default fields processed for embedding generation:
const textFields = [
'title', 'name', 'content', 'body',
'summary', 'description', 'excerpt'
];Search Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| query | string | required | Search query text |
| contentType | string | required | Strapi content type UID |
| limit | number | 10 | Maximum results (max: 50) |
| threshold | number | 0.1 | Minimum similarity score |
| filters | object | {} | Additional database filters |
Architecture
Plugin Structure
semantic-search/
├── package.json # Plugin metadata and dependencies
├── strapi-server.js # Plugin entry point
└── server/
├── index.js # Server exports
└── src/
├── index.js # Main plugin logic
├── controllers/ # API request handlers
│ ├── index.js
│ └── search-controller.js
├── services/ # Business logic
│ ├── index.js
│ ├── embedding-service.js # OpenAI integration
│ ├── vector-service.js # Similarity calculations
│ └── search-service.js # Search orchestration
└── routes/ # API endpoint definitions
└── index.jsData Flow
- Content Creation/Update → Lifecycle hook triggered
- Text Extraction → Combine relevant text fields
- OpenAI API Call → Generate 1536-dimension embedding
- Database Storage → Save embedding in JSON field
- Search Request → Convert query to embedding
- Similarity Search → Calculate cosine similarity
- Result Ranking → Sort by similarity score
Vector Storage
Embeddings are stored as JSON in your existing database:
{
"embedding": [0.1234, -0.5678, 0.9012, ...], // 1536 dimensions
"embeddingMetadata": {
"model": "text-embedding-ada-002",
"generatedAt": "2025-01-15T10:30:00.000Z",
"dimensions": 1536,
"processedText": "Machine learning fundamentals...",
"originalLength": 1250,
"processedLength": 1180
}
}Similarity Scores
Understanding similarity score ranges:
| Score Range | Relevance | Description | |-------------|-----------|-------------| | 0.85 - 1.0 | Highly Relevant | Direct topic match | | 0.75 - 0.85 | Relevant | Related concepts | | 0.65 - 0.75 | Somewhat Relevant | Tangential connection | | 0.1 - 0.65 | Low Relevance | Weak semantic relation |
Performance
Benchmarks
- Embedding Generation: 1-3 seconds per document
- Search Latency: ~250ms end-to-end
- Vector Comparison: ~50ms for 1000 documents
- Memory Usage: ~1.5KB per embedding
- Storage Overhead: ~6KB per document (embedding + metadata)
Optimization Tips
- Batch Processing: Process multiple embeddings in parallel
- Caching: Cache frequently searched embeddings
- Indexing: Add database indexes on frequently filtered fields
- Text Preprocessing: Optimize text extraction for your content types
Development
Local Development
# Install plugin dependencies
cd src/plugins/semantic-search
npm install
# Return to project root
cd ../../../
# Start development server
npm run developTesting
# Test embedding generation
curl -X POST http://localhost:1337/api/semantic-search/search \
-H "Content-Type: application/json" \
-d '{"query": "test query", "contentType": "api::article.article"}'
# Check statistics
curl http://localhost:1337/api/semantic-search/statsDebugging
Enable debug logging in your Strapi configuration:
// config/logger.js
module.exports = {
level: 'debug',
transports: [
{
type: 'console',
options: {
pool: true,
format: 'combined',
level: 'debug'
}
}
]
};Production Deployment
Environment Variables
# Required
OPENAI_API_KEY=your_production_openai_key
# Optional
NODE_ENV=productionSecurity Considerations
- API Authentication: Enable authentication for search endpoints
- Rate Limiting: Configure appropriate rate limits
- Input Validation: Validate search queries and parameters
- API Key Security: Secure OpenAI API key storage
Monitoring
Monitor these key metrics:
- Embedding generation success rate
- Search response times
- OpenAI API usage and costs
- Similarity score distributions
- Search query patterns
Extending the Plugin
Custom Content Type Support
Add support for additional content types:
// In registerEmbeddingLifecycles function
const contentTypes = [
'api::article.article',
'api::blog.blog',
'api::product.product', // Add your content type
'api::course.course' // Add another content type
];Custom Text Extraction
Modify text field extraction:
// In extractTextContent function
const textFields = [
'title', 'content', 'summary',
'customField', // Add custom field
'description' // Add more fields as needed
];Advanced Filtering
Add custom filtering logic:
// In search service
const searchWithCustomFilters = async (query, contentType, customFilters) => {
const filters = {
...customFilters,
publishedAt: { $notNull: true }, // Only published content
featured: true // Only featured content
};
return await semanticSearch(query, contentType, { filters });
};Related Documentation
- Complete Implementation Guide - Detailed technical documentation
- Strapi Plugin Development - Official Strapi plugin docs
- OpenAI Embeddings API - OpenAI documentation
Troubleshooting
Common Issues
1. Plugin not loading:
- Check
config/plugins.tsconfiguration - Verify plugin path resolution
- Check for syntax errors in plugin files
2. Embeddings not generating:
- Verify OpenAI API key is valid
- Check network connectivity to OpenAI
- Review content type field configuration
3. Search returning no results:
- Verify embedding field exists on content type
- Check similarity threshold (try lowering to 0.1)
- Ensure content has embeddings generated
4. Performance issues:
- Monitor OpenAI API rate limits
- Check database query performance
- Consider caching strategies
Debug Commands
# Check plugin loading
tail -f logs/strapi.log | grep "semantic-search"
# Test OpenAI connectivity
curl -X POST https://api.openai.com/v1/embeddings \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{"input": "test", "model": "text-embedding-ada-002"}'
# Validate embedding storage
curl http://localhost:1337/api/semantic-search/statsLicense
MIT License
Contributing
This plugin is part of the semantic search demo project. Contributions and improvements are welcome.
Built with OpenAI Embeddings and Strapi 5
