@thaon/strapi-plugin-semantic-search
v1.0.6
Strapi Plugin Semantic Search
A Strapi plugin that adds semantic search capabilities using OpenRouter embeddings. This plugin enables intelligent content search that understands meaning and context, not just keyword matching.
Features
- Semantic Search: Search content using embeddings for better relevance
- Document Chunking: Automatically chunks large documents for optimal search results
- Similarity Threshold: Configurable similarity threshold for search results
- OpenRouter Integration: Uses OpenRouter's embedding models
- Strapi 4 and 5 Compatible: Works with Strapi v4.x and v5.x
Installation
Prerequisites
- Node.js >= 20
- Strapi v4.x or v5.x
- OpenRouter API key
Install the Plugin
# Using npm
npm install @thaon/strapi-plugin-semantic-search
# Using yarn
yarn add @thaon/strapi-plugin-semantic-search
Configure Environment Variables
Add the following environment variables to your .env file:
OPENROUTER_API_KEY=your_openrouter_api_key_here
OPENROUTER_MODEL=openai/text-embedding-3-small
SITE_URL=http://localhost:1337
SITE_NAME=YourSiteName
Enable the Plugin
Add the plugin to your config/plugins.js file:
module.exports = () => ({
  "semantic-search": {
    enabled: true,
    resolve: "@thaon/strapi-plugin-semantic-search",
  },
});
Usage
The plugin provides two main services: indexing and searching. Both must be called manually through your application code or API endpoints.
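As an illustration of calling the services from your own code, here is a minimal sketch of a controller that forwards a request to the search service. The name createSearchController is hypothetical, and the sketch assumes that the owner is the authenticated user available on ctx.state.user; adapt both to your application.

```javascript
// Hypothetical controller factory (not part of the plugin), following
// Strapi's ({ strapi }) => ({ ...actions }) controller convention.
const createSearchController = ({ strapi }) => ({
  async search(ctx) {
    const { query, limit, threshold, contentType } = ctx.query;

    if (typeof query !== "string" || query.trim() === "") {
      return ctx.badRequest("The 'query' parameter is required");
    }

    // Assumption: ownership is tied to the authenticated user.
    const user = ctx.state.user;
    if (!user) {
      return ctx.unauthorized("Authentication is required");
    }

    // Query-string values arrive as strings, so convert the numeric ones.
    const options = { ownerId: user.id };
    if (limit !== undefined) options.limit = Number(limit);
    if (threshold !== undefined) options.threshold = Number(threshold);
    if (contentType) options.contentType = contentType;

    ctx.body = await strapi
      .plugin("semantic-search")
      .service("search")
      .querySearch(query, options);
  },
});
```

Deriving ownerId from the session rather than from a request parameter keeps one user from searching another user's documents.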
1. Index Documents
Before searching, you must manually index your documents using the indexer service.
Single Field Indexing
Index a specific field from a document:
// In your controller or service
const indexer = strapi.plugin("semantic-search").service("indexer");
await indexer.indexDocument(
  contentType, // e.g., "api::doc.doc"
  documentId, // Document ID to index
  field, // Field name to index (e.g., "content")
  titleField, // Optional: field to use as title (default: "title")
  ownerId // Required: user ID for ownership filtering
);
Parameters:
- contentType (string): The content type UID (e.g., api::doc.doc)
- documentId (string): The document ID to index
- field (string): The field containing text to index (e.g., content)
- titleField (string, optional): Field to use as document title reference (default: title)
- ownerId (number): User ID for ownership-based filtering
Response:
{
  "success": true,
  "documentId": "123",
  "contentType": "api::doc.doc",
  "chunksCreated": 5
}
Multi-Field Indexing
Index multiple fields from a document:
const indexer = strapi.plugin("semantic-search").service("indexer");
await indexer.indexDocumentFields(
  contentType, // e.g., "api::doc.doc"
  documentId, // Document ID to index
  fields, // Array of field names (e.g., ["title", "content"])
  titleField, // Optional: field to use as title
  ownerId // Required: user ID for ownership filtering
);
Parameters:
- contentType (string): The content type UID
- documentId (string): The document ID to index
- fields (string[]): Array of field names to combine and index
- titleField (string, optional): Field to use as document title (default: title)
- ownerId (number): User ID for ownership-based filtering
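Because indexing is manual, you may want to index several documents in one pass. The following is a minimal sketch of such a loop; indexMany is a hypothetical helper, not part of the plugin, and it assumes that indexDocumentFields rejects (throws) when indexing fails.

```javascript
// Hypothetical helper: index documents one at a time and collect failures
// instead of aborting on the first error.
async function indexMany(strapi, contentType, documentIds, fields, ownerId) {
  const indexer = strapi.plugin("semantic-search").service("indexer");
  const indexed = [];
  const failed = [];

  for (const documentId of documentIds) {
    try {
      // "title" is passed explicitly; it is also the documented default.
      await indexer.indexDocumentFields(contentType, documentId, fields, "title", ownerId);
      indexed.push(documentId);
    } catch (error) {
      failed.push({ documentId, message: error.message });
    }
  }

  return { indexed, failed };
}
```

Indexing sequentially keeps the number of concurrent embedding requests at one, which is a conservative choice if your OpenRouter plan is rate limited.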
2. Perform Semantic Search
After indexing, search indexed documents using the search service.
const searchService = strapi.plugin("semantic-search").service("search");
const results = await searchService.querySearch(query, options);
Parameters:
- query (string): The search query
- options (object):
  - ownerId (number, required): User ID to filter results by ownership
  - limit (number, optional): Maximum results to return (default: 5)
  - threshold (number, optional): Similarity threshold 0-1 (default: 0.5)
  - contentType (string, optional): Filter by specific content type
Response:
[
  {
    "documentId": "123",
    "title": "Document Title",
    "textSnippet": "Relevant excerpt from the document...",
    "fullContent": "Complete document content...",
    "contentType": "api::doc.doc",
    "score": 0.85
  }
]
3. API Endpoints Example
If you expose these services via API endpoints:
# Index a document
POST /api/semantic-search/index
Content-Type: application/json
{
  "contentType": "api::doc.doc",
  "documentId": "123",
  "field": "content",
  "ownerId": 1
}
# Search indexed documents
GET /api/semantic-search/search?query=your+search+query&limit=10&threshold=0.5
Configuration
The plugin can be configured through environment variables:
| Variable | Description | Default |
| ---------------------- | ---------------------------- | ------------------------------- |
| OPENROUTER_API_KEY | Your OpenRouter API key | Required |
| OPENROUTER_MODEL | Embedding model to use | openai/text-embedding-3-small |
| SITE_URL | Your Strapi site URL | http://localhost:1337 |
| SITE_NAME | Your site name | StrapiSemanticSearch |
| CHUNK_SIZE | Document chunk size | 1000 |
| CHUNK_OVERLAP | Chunk overlap size | 150 |
| SIMILARITY_THRESHOLD | Default similarity threshold | 0.5 |
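To make the defaults in the table concrete, here is a minimal sketch of how these variables could be resolved into a settings object. loadConfig and its property names are hypothetical; the plugin's own loading code may differ.

```javascript
// Hypothetical helper: resolve settings from an env-like object,
// falling back to the defaults listed in the table above.
function loadConfig(env) {
  if (!env.OPENROUTER_API_KEY) {
    throw new Error("OPENROUTER_API_KEY is required");
  }

  // Environment values are strings; fall back when unset or not numeric.
  const toNumber = (value, fallback) => {
    const parsed = Number(value);
    return value === undefined || value === "" || Number.isNaN(parsed) ? fallback : parsed;
  };

  return {
    apiKey: env.OPENROUTER_API_KEY,
    model: env.OPENROUTER_MODEL || "openai/text-embedding-3-small",
    siteUrl: env.SITE_URL || "http://localhost:1337",
    siteName: env.SITE_NAME || "StrapiSemanticSearch",
    chunkSize: toNumber(env.CHUNK_SIZE, 1000),
    chunkOverlap: toNumber(env.CHUNK_OVERLAP, 150),
    similarityThreshold: toNumber(env.SIMILARITY_THRESHOLD, 0.5),
  };
}

const config = loadConfig({ OPENROUTER_API_KEY: "sk-example", CHUNK_SIZE: "800" });
console.log(config.chunkSize, config.chunkOverlap, config.similarityThreshold);
// prints: 800 150 0.5
```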
How It Works
Indexing Process
The plugin provides two main indexing functions:
Document Indexing (indexDocument):
- Retrieves a specific document by content type and ID
- Extracts text content from the specified field (e.g., 'content')
- Splits text into chunks using configurable chunk size and overlap
- Generates vector embeddings for each chunk using OpenRouter
- Stores chunks with metadata in the plugin::semantic-search.chunk table
- Links chunks to the parent document with ownership filtering
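The chunking step can be pictured with the sketch below. It is an illustration only: chunkText is a hypothetical function, and it assumes that chunk size and overlap are measured in characters, which the plugin may or may not do (it could split on tokens or sentence boundaries instead).

```javascript
// Hypothetical sketch: fixed-size character windows that overlap
// by `overlap` characters with the previous window.
function chunkText(text, chunkSize = 1000, overlap = 150) {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("Expected chunkSize > 0 and 0 <= overlap < chunkSize");
  }

  const chunks = [];
  const step = chunkSize - overlap; // how far each window advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

console.log(chunkText("abcdefghij", 4, 1));
// prints: [ 'abcd', 'defg', 'ghij' ]
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.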
Multi-Field Indexing (indexDocumentFields):
- Combines multiple fields from a document into a single text string
- Processes the combined text through the same chunking and embedding pipeline
- Useful for indexing title, content, description, etc. together
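As a rough picture of the field-combination step, the sketch below joins the requested fields into one string. combineFields is hypothetical, and the blank-line separator and the skipping of empty or non-string fields are assumptions rather than documented plugin behavior.

```javascript
// Hypothetical sketch: concatenate the requested fields in order,
// ignoring fields that are missing, empty, or not strings.
function combineFields(document, fields) {
  return fields
    .map((field) => document[field])
    .filter((value) => typeof value === "string" && value.trim() !== "")
    .map((value) => value.trim())
    .join("\n\n");
}

const combined = combineFields(
  { title: "Intro", summary: null, content: "  Body text  " },
  ["title", "summary", "content"]
);
console.log(JSON.stringify(combined));
// prints: "Intro\n\nBody text"
```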
Search Process
The querySearch function performs semantic search using these steps:
- Query Vectorization: Converts the search query into a vector embedding
- Chunk Retrieval: Fetches stored chunks filtered by owner ID and optionally by content type
- Similarity Calculation: Computes cosine similarity between query vector and all chunk embeddings
- Threshold Filtering: Removes results below the similarity threshold (default: 0.5)
- Deduplication: Groups results by document ID, keeping the highest-scoring chunk per document
- Full Content Retrieval: For each unique document, fetches the full document content from the api::doc.doc table
- Ranking: Returns results sorted by similarity score with configurable limit
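The scoring, filtering, deduplication, and ranking steps above can be sketched as follows. This is not the plugin's source: rankChunks is a hypothetical function, the chunk property names embedding and text are assumptions, and the embedding request and full-content lookup are left out.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical sketch: score every chunk, drop those below the threshold,
// keep the best chunk per document, then sort and apply the limit.
function rankChunks(queryVector, chunks, { threshold = 0.5, limit = 5 } = {}) {
  const bestByDocument = new Map();

  for (const chunk of chunks) {
    const score = cosineSimilarity(queryVector, chunk.embedding);
    if (score < threshold) continue;

    const best = bestByDocument.get(chunk.documentId);
    if (!best || score > best.score) {
      bestByDocument.set(chunk.documentId, {
        documentId: chunk.documentId,
        textSnippet: chunk.text,
        score,
      });
    }
  }

  return [...bestByDocument.values()]
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```

Comparing the query against every stored chunk is linear in the number of chunks for the owner, which is simple and fine for small collections but may become slow as the index grows.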
Data Flow
- Input: User search query + owner ID + optional filters (content type, limit, threshold)
- Processing: Vector embedding → cosine similarity → threshold filtering → deduplication
- Output: Array of documents with full content, titles, snippets, and similarity scores
Example Response
{
  "data": [
    {
      "id": 1,
      "attributes": {
        "title": "Article Title",
        "content": "Article content...",
        "similarity": 0.85
      }
    }
  ],
  "meta": {
    "total": 1,
    "threshold": 0.7
  }
}
Development
Local Development
# Clone the repository
git clone https://github.com/thaon/strapi-plugin-semantic-search.git
# Install dependencies
cd strapi-plugin-semantic-search
npm install
# Link for local development
npm link
Testing
# Run tests
npm test
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
If you encounter any issues or have questions:
- Check the Issues page
- Create a new issue with detailed information
- Include your Strapi version and plugin version
Changelog
v1.0.0
- Initial release
- Basic semantic search functionality
- OpenRouter integration
- Configurable chunking and similarity threshold
