@codeparticle/strapi-plugin-app-search
v2.1.11
App Search integration for strapi
Strapi Plugin App Search
A Strapi plugin for syncing content to Elasticsearch with automatic vector embedding generation.
Features
- 🔍 Elasticsearch Integration: Automatically sync Strapi content to Elasticsearch indices
- 🤖 AI-Powered Embeddings: Generate vector embeddings for semantic search using an external LLM service
- 🌍 Multi-locale Support: Handle localized content with locale-specific indices
- 🔄 Automatic Sync: Lifecycle hooks for real-time synchronization
- 📊 Batch Operations: Efficient bulk indexing and embedding generation
Usage
Install in any Strapi application:
pnpm add @codeparticle/strapi-plugin-app-search
Then add the plugin configuration to config/hook.js.
Basic Configuration
module.exports = ({ env }) => ({
  settings: {
    'strapi-app-search': {
      enabled: true,
      elasticsearchUrl: env('ELASTICSEARCH_URL'),
      elasticsearchApiKey: env('ELASTICSEARCH_API_KEY'),
      // Embedding service for vector search
      embeddingServiceUrl: env('EMBEDDING_SERVICE_URL'),
      embeddingServiceApiKey: env('EMBEDDING_SERVICE_API_KEY'),
      embeddingVectorStoreIdNews: env('EMBEDDING_VECTOR_STORE_ID_NEWS', "1234"),
      embeddingVectorStoreIdReports: env('EMBEDDING_VECTOR_STORE_ID_REPORTS', "2234"),
      embeddingVectorStoreIdMedia: env('EMBEDDING_VECTOR_STORE_ID_MEDIA', "3234"),
      formatIndex: (apiName, env) => `${apiName}-${env}`,
      apisToSync: [
        {
          name: 'api::content-type.content-type',
          indexName: 'es-index-name',
          processEsObj: (obj) => ({ ...obj, newProp: true }),
          populate: ['relation'],
        },
      ]
    }
  }
});
Embedding Service Integration
The plugin automatically generates vector embeddings for documents indexed to Elasticsearch by calling an external LLM embedding service.
How It Works
- When a document is created or updated in Strapi
- The document is indexed to Elasticsearch
- After successful indexing, the plugin automatically calls the embedding service
- The embedding service generates vector embeddings and stores them alongside the document
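The flow above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: `indexThenEmbed`, `indexDocument`, and `generateEmbedding` are hypothetical names, injected as parameters so the ordering (index first, embed afterwards, embedding errors only logged) is easy to see.

```javascript
// Sketch only: `indexDocument` and `generateEmbedding` are hypothetical
// stand-ins for the plugin's internals, passed in so the flow is testable.
async function indexThenEmbed(doc, { indexDocument, generateEmbedding, log = console }) {
  // Steps 1-2: the created/updated document is indexed first.
  await indexDocument(doc);

  // Steps 3-4: only after indexing succeeds is the embedding service called.
  // The promise is returned instead of awaited, so it can run in the background.
  const embedding = Promise.resolve()
    .then(() => generateEmbedding(doc.id))
    .catch((err) => {
      // An embedding failure is logged and never undoes the indexing.
      log.error(`Embedding failed for document ${doc.id}: ${err.message}`);
      return null;
    });

  return { indexed: true, embedding };
}
```

A caller that does not care about the embedding result can simply ignore the returned `embedding` promise.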
Configuration Options
Add the following to your config/hook.js:
- embeddingServiceUrl: Base URL of the embedding service
- embeddingServiceApiKey: API key for authentication with the embedding service
- embeddingVectorStoreIdNews, embeddingVectorStoreIdReports, embeddingVectorStoreIdMedia: The vector store ID for each document type
Environment Variables
Configure via environment variables:
EMBEDDING_SERVICE_URL=<embedding-service-url>
EMBEDDING_SERVICE_API_KEY=<embedding-service-api-key>
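EMBEDDING_VECTOR_STORE_ID_NEWS=<embedding-vector-store-id>
EMBEDDING_VECTOR_STORE_ID_REPORTS=<embedding-vector-store-id>
EMBEDDING_VECTOR_STORE_ID_MEDIA=<embedding-vector-store-id>
API Endpoint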
The plugin calls the following endpoint after indexing:
POST {embeddingServiceUrl}/v1/documents/embed/{embeddingVectorStoreId}
Headers:
Content-Type: application/json
Authorization: Bearer {embeddingServiceApiKey}
Request Body:
{
  "doc_id": "120347",
  "force_recompute": false
}
Response:
{
  "doc_id": "120347",
  "success": true,
  "chunks_created": 5,
  "message": "Successfully processed document with 5 chunks",
  "skipped": false
}
Behavior
- Asynchronous Processing: Embedding generation runs in the background and doesn't block the main indexing operation
- Batch Processing: When multiple documents are indexed, the plugin generates embeddings in batches with a concurrency limit
- Error Handling: Embedding failures are logged but don't affect the main document indexing
- Smart Recomputation: By default, embeddings are only generated if they don't exist (force_recompute: false)
- Optional Service: If embedding configuration is not provided, the plugin skips embedding generation gracefully
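As a rough sketch of the request described in this section, the helper below assembles the URL, headers, and body, and returns `null` when the embedding settings are missing so the caller can skip the call. `buildEmbedRequest` is a hypothetical name, and treating a missing URL, API key, or vector store ID as "not configured" is an assumption; the plugin's own check may differ.

```javascript
// Hypothetical helper: builds the POST request for the embed endpoint.
// Returns null when the embedding config is incomplete (assumed skip rule).
function buildEmbedRequest(config, vectorStoreId, docId, forceRecompute = false) {
  const { embeddingServiceUrl, embeddingServiceApiKey } = config || {};
  if (!embeddingServiceUrl || !embeddingServiceApiKey || !vectorStoreId) {
    return null; // Optional service: nothing configured, nothing to call.
  }

  const base = embeddingServiceUrl.replace(/\/+$/, ''); // avoid a double slash
  return {
    url: `${base}/v1/documents/embed/${encodeURIComponent(vectorStoreId)}`,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${embeddingServiceApiKey}`,
      },
      // force_recompute: false asks the service to skip existing embeddings.
      body: JSON.stringify({ doc_id: String(docId), force_recompute: forceRecompute }),
    },
  };
}
```

With Node 18 or later, the result could be sent with the built-in `fetch(req.url, req.options)` whenever `req` is not `null`.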
Batch Operations
For bulk sync operations (e.g., saveObjects), the plugin:
- Indexes all documents to Elasticsearch first
- Processes embeddings in batches of 5 documents concurrently
- Logs progress for each batch
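One way to process embeddings five at a time is to slice the ID list and wait on each slice with `Promise.allSettled`. The sketch below assumes a caller-supplied `embedOne(id)` function (hypothetical) and is not taken from the plugin source; its log lines only mirror the format shown under Logging.

```javascript
const BATCH_SIZE = 5; // batch size stated above

// `embedOne` is a hypothetical function that requests embeddings for one ID.
async function embedInBatches(docIds, embedOne, log = console) {
  log.info(`Generating embeddings for ${docIds.length} documents`);
  let successful = 0;

  for (let i = 0; i < docIds.length; i += BATCH_SIZE) {
    const batch = docIds.slice(i, i + BATCH_SIZE);
    // allSettled: one failed document is logged without aborting the batch.
    const results = await Promise.allSettled(batch.map(async (id) => embedOne(id)));
    results.forEach((result, j) => {
      if (result.status === 'fulfilled') {
        successful += 1;
      } else {
        log.error(`Embedding failed for document ${batch[j]}: ${result.reason}`);
      }
    });
  }

  log.info(`Embedding batch complete: ${successful}/${docIds.length} successful`);
  return successful;
}
```

Because each batch is awaited before the next one starts, at most five requests are in flight at any time.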
Logging
The plugin provides detailed logging for embedding operations:
[INFO] Generating embeddings for 10 documents
[INFO] Successfully generated embeddings for document 120347: 5 chunks created, skipped: false
[INFO] Embedding batch complete: 10/10 successful
Disabling Embeddings
To disable embedding generation, simply omit the embedding configuration from your hook.js:
// No embedding config = no embedding generation
module.exports = ({ env }) => ({
  settings: {
    'strapi-app-search': {
      elasticsearchUrl: env('ELASTICSEARCH_URL'),
      elasticsearchApiKey: env('ELASTICSEARCH_API_KEY'),
      // ... other config ...
    }
  }
});
Force Recomputation
Regenerating embeddings for existing documents is not exposed as an option. To support it, the plugin would need one of the following changes:
- Modify the generateEmbedding call to pass forceRecompute: true
- Or implement a manual sync endpoint that allows forcing recomputation
processEsObj formats each object before it is saved to the index.
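For illustration, a processEsObj callback might drop fields that search does not need and flatten a populated relation. The field names below (`createdBy`, `updatedBy`, `relation`, `relationIds`) are hypothetical and only show the shape of such a function.

```javascript
// Hypothetical field names, for illustration only.
const processEsObj = (obj) => {
  // Drop admin-user metadata and pull the populated relation out of the object.
  const { createdBy, updatedBy, relation, ...rest } = obj;
  return {
    ...rest,
    // Keep only the IDs of the populated relation.
    relationIds: Array.isArray(relation) ? relation.map((item) => item.id) : [],
  };
};
```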
You can also upload content type entries to App Search individually as they are created, instead of using the Sync button on the App Search plugin home page.
Example: src/api/content-type/content-types/content-type/lifecycles.js
/**
 * Lifecycle callbacks for the `content-type` model.
 */
// NOTE: getConfig() and getIndex() are project helpers that are not shown in
// this README. They are assumed to return the plugin settings and the index
// name for a given locale, respectively.
// MODEL_NAME is assumed to match the `name` used in apisToSync.
const MODEL_NAME = 'api::content-type.content-type';
const APP_SEARCH_ENGINE = `content-type-${strapi.config.environment}`;

// Format the entry with processEsObj (if configured) and save it to the index.
const saveAppSearchObj = (model) => {
  const { apisToSync } = getConfig();
  const apiToSync = (apisToSync || []).find(({ name }) => name === MODEL_NAME);
  const indexName = getIndex(model.locale);
  if (strapi.appSearch) {
    strapi.appSearch.saveObject(apiToSync && apiToSync.processEsObj ? apiToSync.processEsObj(model) : model, indexName);
  }
}

const shouldUseAppSearch = () => {
  const { modelsToIgnore = [] } = getConfig();
  return !modelsToIgnore.includes(MODEL_NAME);
}

// Recursively collect every `id` (or `id: { $in: [...] }`) found in the query params.
const getIds = (params, ids = []) => {
  if (!params || typeof params !== 'object') {
    return ids;
  }
  if (Array.isArray(params)) {
    params.forEach((item) => getIds(item, ids));
    return ids;
  }
  Object.entries(params).forEach(([key, entry]) => {
    if (key === 'id') {
      if (typeof entry !== 'object') {
        ids.push(entry);
      } else if (entry['$in']) {
        ids.push(...entry['$in']);
      }
    } else {
      getIds(entry, ids);
    }
  });
  return ids;
};

module.exports = {
  async afterCreate({ result }) {
    if (shouldUseAppSearch()) {
      saveAppSearchObj({ ...result });
    }
  },
  async afterUpdate({ result }) {
    if (shouldUseAppSearch()) {
      saveAppSearchObj({ ...result });
    }
  },
  async afterUpdateMany({ params }) {
    const ids = getIds(params);
    // Replace with the UID of the content type this file belongs to.
    const service = strapi.service('api::report.report');
    if (shouldUseAppSearch()) {
      service.find({ filters: { id: { $in: ids } }, populate: [] }).then(({ results }) => {
        results.forEach((entry) => {
          saveAppSearchObj({ ...entry });
        });
      });
    }
  },
  async afterDelete({ result }) {
    if (shouldUseAppSearch()) {
      strapi.appSearch.deleteObject(result.id, getIndex(result.locale));
    }
  },
  // Capture ids and locales before deletion; they are gone by afterDeleteMany.
  async beforeDeleteMany({ params, state }) {
    const ids = getIds(params);
    const knex = strapi.db.connection;
    state.ids = ids;
    state.entries = await knex
      .select('id', 'locale')
      .from('report') // replace with this content type's table name
      .where('id', 'in', ids);
  },
  async afterDeleteMany({ state }) {
    const { ids, entries } = state;
    if (shouldUseAppSearch()) {
      ids.forEach((id) => {
        const entry = entries.find(({ id: entryId }) => entryId === id);
        strapi.appSearch.deleteObject(id, getIndex(entry?.locale));
      });
    }
  }
};
Dev
Create a new Strapi project or use an existing example project, then run:
pnpm dev
Publishing
Publishing should already be set up. Follow these steps to publish the project:
- Wait until the code is merged to main/master
- Check out the main/master branch
- Run pnpm version [patch|minor|major]
- Push to remote with git push --tags to trigger the tag pipeline
Migration: Elasticsearch App Search to Regular Elasticsearch
Overview
This migration utility migrates data from Elastic App Search (hidden .ent-search-engine-documents-* indices) to regular Elasticsearch indices with aliases for zero-downtime cutover.
Configuration
Set the following environment variables before running the migration:
- ELASTICSEARCH_URL (required): Elasticsearch cluster endpoint
  - Example: https://your-cluster.es.region.aws.elastic-cloud.com
- ELASTICSEARCH_API_KEY (required): Elasticsearch API key for authentication
- MIGRATION_DRY_RUN (optional): Set to true to preview changes without executing
  - Default: false
- MIGRATION_CONCURRENCY (optional): Number of indices to migrate concurrently
  - Default: 3
- MIGRATION_STATE_FILE (optional): Path to store migration state (for rollback)
  - Default: .migration-state.json in repo root
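A script could read these variables along the following lines. This is a sketch under stated assumptions: `readMigrationConfig` is a hypothetical name, and falling back to the default concurrency on an invalid value is an assumed behavior, not something the migration scripts are documented to do.

```javascript
// Hypothetical helper: reads the migration settings listed above.
function readMigrationConfig(env = process.env) {
  const missing = ['ELASTICSEARCH_URL', 'ELASTICSEARCH_API_KEY'].filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }

  const concurrency = Number.parseInt(env.MIGRATION_CONCURRENCY ?? '3', 10);
  return {
    url: env.ELASTICSEARCH_URL,
    apiKey: env.ELASTICSEARCH_API_KEY,
    dryRun: env.MIGRATION_DRY_RUN === 'true', // default: false
    // Assumed fallback: use the default of 3 when the value is not a positive integer.
    concurrency: Number.isInteger(concurrency) && concurrency > 0 ? concurrency : 3,
    stateFile: env.MIGRATION_STATE_FILE || '.migration-state.json',
  };
}
```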
Running the Migration
1. Dry Run (Preview Only)
export ELASTICSEARCH_URL=https://your-cluster.es.region.aws.elastic-cloud.com
export ELASTICSEARCH_API_KEY=your-api-key
export MIGRATION_DRY_RUN=true
node scripts/migrate-ent-search-production.js
This will list all indices to be migrated and show what alias changes would occur, without making any actual changes.
2. Full Migration
export ELASTICSEARCH_URL=https://your-cluster.es.region.aws.elastic-cloud.com
export ELASTICSEARCH_API_KEY=your-api-key
node scripts/migrate-ent-search-production.js
The migration:
- Creates new destination indices with timestamped names (e.g., media-production__migrated_20231219120000)
- Reindexes documents from source indices using the Elasticsearch _reindex API with automatic slicing
- Verifies document counts match between source and destination
- Atomically flips aliases to point to new indices
- Saves migration state to .migration-state.json for rollback capability
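The requests behind these steps might look like the sketch below. The helper names are hypothetical and the script's real code may differ; the request shapes follow the public Elasticsearch `_reindex` API (`slices=auto`) and `_aliases` API, where all actions in one request are applied atomically.

```javascript
// UTC timestamp as YYYYMMDDHHmmss, matching the example index name above.
function timestamp(date = new Date()) {
  return date.toISOString().replace(/[-:T]/g, '').slice(0, 14);
}

function destinationIndexName(alias, date) {
  return `${alias}__migrated_${timestamp(date)}`;
}

// Reindex with automatic slicing.
function reindexRequest(sourceIndex, destIndex) {
  return {
    method: 'POST',
    path: '/_reindex?slices=auto',
    body: { source: { index: sourceIndex }, dest: { index: destIndex } },
  };
}

// One _aliases call: remove the alias from old targets and add the new one.
function aliasFlipRequest(alias, previousIndices, destIndex) {
  return {
    method: 'POST',
    path: '/_aliases',
    body: {
      actions: [
        ...previousIndices.map((index) => ({ remove: { index, alias } })),
        { add: { index: destIndex, alias } },
      ],
    },
  };
}
```

Putting the remove and add actions in a single `_aliases` request is what makes the cutover atomic: readers never see the alias pointing at nothing.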
For large migrations (hundreds of thousands of documents), the script may take a while. You can monitor progress with the status check script in a separate terminal.
Monitoring Migration Progress
While the migration is running (or to check on in-progress migrations), use the status check script:
export ELASTICSEARCH_URL=https://your-cluster.es.region.aws.elastic-cloud.com
export ELASTICSEARCH_API_KEY=your-api-key
node scripts/check-migration-status.js
Options:
- POLL_INTERVAL: Time in milliseconds between status checks (e.g., 5000 for 5 seconds). If 0 or not set, checks once and exits.
- MAX_WAIT: Maximum time in milliseconds to keep polling (e.g., 3600000 for 1 hour). After this time, polling stops even if migrations are incomplete.
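The polling behavior of these two options can be sketched as a small loop. `pollStatus`, `checkOnce`, and the `complete` flag on its result are hypothetical names chosen for illustration; only the check-once and time-limit semantics come from the option descriptions.

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `checkOnce` is a hypothetical async function returning { complete: boolean }.
async function pollStatus(checkOnce, { pollInterval = 0, maxWait = Infinity, sleep = defaultSleep } = {}) {
  const started = Date.now();
  let status = await checkOnce();

  // POLL_INTERVAL of 0 (or unset): check once and exit.
  if (!pollInterval) return status;

  // Keep polling until everything is complete or MAX_WAIT has elapsed.
  while (!status.complete && Date.now() - started < maxWait) {
    await sleep(pollInterval);
    status = await checkOnce();
  }
  return status;
}
```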
Example - Poll every 5 seconds for up to 1 hour:
export ELASTICSEARCH_URL=https://your-cluster.es.region.aws.elastic-cloud.com
export ELASTICSEARCH_API_KEY=your-api-key
export POLL_INTERVAL=5000
export MAX_WAIT=3600000
node scripts/check-migration-status.js
The status check displays:
- Document counts for source and destination indices
- Reindex task progress (in-progress, pending, or complete)
- Percentage completion for each migration
- Summary of total complete/in-progress/pending migrations
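The percentage and summary figures could be derived from document counts roughly as follows. This is an illustrative sketch: the input shape (`sourceCount`, `destCount`, `taskRunning`) and the rule for telling in-progress from pending are assumptions, not the status script's documented logic.

```javascript
// Hypothetical input: document counts plus whether a reindex task is running.
function migrationProgress({ sourceCount, destCount, taskRunning }) {
  const percent = sourceCount === 0
    ? 100
    : Math.min(100, Math.floor((destCount / sourceCount) * 100));

  let state = 'pending';
  if (destCount >= sourceCount) state = 'complete';
  else if (taskRunning) state = 'in-progress';

  return { percent, state };
}

// Totals per state, as in the summary line of the status output.
function summarize(items) {
  const summary = { complete: 0, 'in-progress': 0, pending: 0 };
  for (const item of items) {
    summary[migrationProgress(item).state] += 1;
  }
  return summary;
}
```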
Retrying Failed Migrations
If some indices have empty destination indices (reindex failed/timed out), use the retry script:
export ELASTICSEARCH_URL=https://your-cluster.es.region.aws.elastic-cloud.com
export ELASTICSEARCH_API_KEY=your-api-key
node scripts/retry-failed-migrations.js
The retry script:
- Scans for failed migrations (destination indices with 0 documents but non-empty source)
- Deletes incomplete destination indices
- Creates new destination indices with fresh timestamps
- Reindexes from scratch with wait_for_completion: true
- Flips aliases to the new indices
- Updates migration state file
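The scan in the first step comes down to a simple filter over document counts. In this sketch the `sourceCount` and `destCount` fields are hypothetical names for counts that would come from the cluster.

```javascript
// A migration counts as failed when the destination is empty but the source is not.
function findFailedMigrations(migrations) {
  return migrations.filter(
    ({ sourceCount, destCount }) => destCount === 0 && sourceCount > 0
  );
}
```

An empty source with an empty destination is not treated as a failure, since there was nothing to copy.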
Dry run:
export ELASTICSEARCH_URL=https://your-cluster.es.region.aws.elastic-cloud.com
export ELASTICSEARCH_API_KEY=your-api-key
export MIGRATION_DRY_RUN=true
node scripts/retry-failed-migrations.js
Rollback
If needed, restore aliases to previous indices:
export ELASTICSEARCH_URL=https://your-cluster.es.region.aws.elastic-cloud.com
export ELASTICSEARCH_API_KEY=your-api-key
node scripts/rollback-migration.js
The rollback script:
- Reads the migration state file
- Atomically flips aliases back to previous targets
- Preserves the state file for audit purposes (delete manually after confirming)
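A rollback of this kind can be expressed as a single Elasticsearch `_aliases` request built from the saved state. The state shape used here (`aliases`, `alias`, `previousIndices`, `newIndex`) is hypothetical, since the format of .migration-state.json is not documented in this README.

```javascript
// Hypothetical state shape: { aliases: [{ alias, previousIndices, newIndex }] }
function rollbackRequest(state) {
  const actions = state.aliases.flatMap(({ alias, previousIndices, newIndex }) => [
    // Detach the alias from the migrated index...
    { remove: { index: newIndex, alias } },
    // ...and point it back at every previous target.
    ...previousIndices.map((index) => ({ add: { index, alias } })),
  ]);
  // All actions go in one request, so the flip back is applied atomically.
  return { method: 'POST', path: '/_aliases', body: { actions } };
}
```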
Supported Indices
The migration only migrates production indices (allowlist):
- media: media-production, media-production-ar-sa, media-production-de-de, media-production-es-es, media-production-fr-fr, media-production-it-it, media-production-ja-jp, media-production-ko-kr, media-production-pt-br, media-production-ru-ru, media-production-zh-cn
- news: news-production, news-production-ar-sa, news-production-de-de, news-production-es-es, news-production-fr-fr, news-production-it-it, news-production-ja-jp, news-production-ko-kr, news-production-pt-br, news-production-ru-ru, news-production-zh-cn
- reports: reports-production, reports-production-ar-sa, reports-production-de-de, reports-production-es-es, reports-production-fr-fr, reports-production-it-it, reports-production-ja-jp, reports-production-ko-kr, reports-production-pt-br, reports-production-ru-ru, reports-production-zh-cn
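The allowlist above follows a regular pattern (three content types, a default index, and ten locale suffixes), so it can be generated rather than typed out. The function names in this sketch are hypothetical.

```javascript
const TYPES = ['media', 'news', 'reports'];
const LOCALES = ['ar-sa', 'de-de', 'es-es', 'fr-fr', 'it-it', 'ja-jp', 'ko-kr', 'pt-br', 'ru-ru', 'zh-cn'];

// Builds the full list: `<type>-production` plus one index per locale.
function productionIndices() {
  return TYPES.flatMap((type) => [
    `${type}-production`,
    ...LOCALES.map((locale) => `${type}-production-${locale}`),
  ]);
}

// True only for names on the allowlist.
function isAllowedIndex(name) {
  return productionIndices().includes(name);
}
```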
Notes
- Never hardcode API keys; always use environment variables
- The migration is safe to re-run; each run creates new timestamped indices instead of modifying existing ones
- Source indices are never modified
- Large migrations may take several hours; monitor logs for progress
- Network interruptions are handled gracefully; the migration can be resumed
