npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

strapi-plugin-semantic-search

v1.1.0

Published

Intelligent semantic search plugin for Strapi 5 powered by OpenAI embeddings. Automatically generates embeddings for your content and provides powerful semantic search capabilities.

Readme

Semantic Search Plugin for Strapi 5

Intelligent content discovery powered by OpenAI embeddings

Transform your Strapi CMS into an AI-powered content platform with automatic embedding generation and semantic search capabilities.

🆕 What's New in v1.1.0

  • Configurable Field Mapping - Customize which fields to process per content type
  • Enhanced Validation - Comprehensive configuration validation with helpful error messages
  • Improved Logging - Better visibility into plugin configuration and processing
  • Flexible Content Types - Support for any content type with custom field mappings

Features

  • Automatic Embedding Generation - Content embeddings created automatically on save
  • Semantic Search APIs - RESTful endpoints for intelligent content discovery
  • Multi-Content Type Search - Search across different content types simultaneously
  • High Performance - Sub-300ms search responses with vector similarity
  • Production Ready - Comprehensive error handling and rate limiting
  • Analytics - Monitor embedding coverage and search performance
  • Zero Dependencies - No external vector databases required

Installation

1. Install Dependencies

cd src/plugins/semantic-search
npm install openai@^5.8.2 axios@^1.10.0

2. Configure Environment

Add your OpenAI API key to .env:

OPENAI_API_KEY=your_openai_api_key_here

3. Enable Plugin

Update config/plugins.ts:

export default ({ env }) => ({
  'semantic-search': {
    enabled: true,
    resolve: './src/plugins/semantic-search'
  },
});

4. Add Embedding Fields

Add embedding fields to your content types:

{
  "embedding": {
    "type": "json"
  },
  "embeddingMetadata": {
    "type": "json"
  }
}

5. Restart Strapi

npm run develop

Configuration

Basic Configuration

By default, the plugin processes these content types and fields:

// Default configuration
{
  'api::article.article': ['title', 'content', 'summary'],
  'api::blog.blog': ['title', 'body', 'excerpt']
}

Custom Configuration

Configure which content types and fields to process by updating config/plugins.js:

// config/plugins.js
module.exports = ({ env }) => ({
  'semantic-search': {
    enabled: true,
    resolve: './src/plugins/semantic-search',
    config: {
      contentTypes: {
        'api::article.article': ['title', 'content', 'summary', 'tags'],
        'api::blog.blog': ['title', 'body', 'excerpt', 'category'],
        'api::product.product': ['name', 'description', 'features', 'benefits'],
        'api::course.course': ['title', 'overview', 'learningOutcomes']
      }
    }
  }
});

Configuration Options

| Option | Type | Description | |--------|------|-------------| | contentTypes | Object | Maps content type UIDs to arrays of field names |

Content Type Format: Use Strapi's UID format: api::collection-name.collection-name

Field Names:

  • Must be valid field names from your content type schema
  • Can include any text-based fields (text, textarea, richtext, etc.)
  • Rich text fields are automatically converted to plain text for embedding

Configuration Validation

The plugin validates your configuration on startup:

  • Valid content type format: api::article.article
  • Valid field arrays: ['title', 'content']
  • Non-empty field names: No empty strings or invalid types
  • Invalid formats: Missing api:: prefix, empty fields, etc.

Invalid configurations will log warnings and fallback to defaults.

Usage

Automatic Embedding Generation

Embeddings are generated automatically when you create or update content. The plugin extracts text from the fields you've configured for each content type (see Configuration section above).

Search API

Single Content Type Search

POST /api/semantic-search/search

Request:

{
  "query": "artificial intelligence and machine learning",
  "contentType": "api::article.article", 
  "limit": 10,
  "threshold": 0.1
}

Response:

{
  "success": true,
  "data": {
    "query": "artificial intelligence and machine learning",
    "contentType": "api::article.article",
    "results": [
      {
        "id": 1,
        "title": "Deep Learning Fundamentals",
        "similarityScore": 0.8945,
        "content": "...",
        "createdAt": "2025-01-15T10:30:00.000Z"
      }
    ],
    "metadata": {
      "totalResults": 5,
      "queryProcessing": {
        "embeddingDimensions": 1536
      }
    }
  }
}

Multi-Content Type Search

POST /api/semantic-search/multi-search

Request:

{
  "query": "productivity and remote work",
  "contentTypes": ["api::article.article", "api::blog.blog"],
  "limit": 15,
  "aggregateResults": true
}

Embedding Statistics

GET /api/semantic-search/stats

Response:

{
  "success": true,
  "data": {
    "api::article.article": {
      "total": 50,
      "withEmbeddings": 50,
      "coverage": "100.00%"
    },
    "api::blog.blog": {
      "total": 25, 
      "withEmbeddings": 23,
      "coverage": "92.00%"
    }
  }
}

Configuration

Supported Content Types

The plugin automatically processes these content types:

  • Any content type that starts with api::
  • Excludes admin and plugin content types
  • Configurable in the lifecycle registration

Text Field Mapping

Default fields processed for embedding generation:

const textFields = [
  'title', 'name', 'content', 'body', 
  'summary', 'description', 'excerpt'
];

Search Parameters

| Parameter | Type | Default | Description | |-----------|------|---------|-------------| | query | string | required | Search query text | | contentType | string | required | Strapi content type UID | | limit | number | 10 | Maximum results (max: 50) | | threshold | number | 0.1 | Minimum similarity score | | filters | object | {} | Additional database filters |

Architecture

Plugin Structure

semantic-search/
├── package.json           # Plugin metadata and dependencies
├── strapi-server.js       # Plugin entry point
└── server/
    ├── index.js           # Server exports
    └── src/
        ├── index.js       # Main plugin logic
        ├── controllers/   # API request handlers
        │   ├── index.js
        │   └── search-controller.js
        ├── services/      # Business logic
        │   ├── index.js
        │   ├── embedding-service.js    # OpenAI integration
        │   ├── vector-service.js       # Similarity calculations
        │   └── search-service.js       # Search orchestration
        └── routes/        # API endpoint definitions
            └── index.js

Data Flow

  1. Content Creation/Update → Lifecycle hook triggered
  2. Text Extraction → Combine relevant text fields
  3. OpenAI API Call → Generate 1536-dimension embedding
  4. Database Storage → Save embedding in JSON field
  5. Search Request → Convert query to embedding
  6. Similarity Search → Calculate cosine similarity
  7. Result Ranking → Sort by similarity score

Vector Storage

Embeddings are stored as JSON in your existing database:

{
  "embedding": [0.1234, -0.5678, 0.9012, ...], // 1536 dimensions
  "embeddingMetadata": {
    "model": "text-embedding-ada-002",
    "generatedAt": "2025-01-15T10:30:00.000Z",
    "dimensions": 1536,
    "processedText": "Machine learning fundamentals...",
    "originalLength": 1250,
    "processedLength": 1180
  }
}

Similarity Scores

Understanding similarity score ranges:

| Score Range | Relevance | Description | |-------------|-----------|-------------| | 0.85 - 1.0 | Highly Relevant | Direct topic match | | 0.75 - 0.85 | Relevant | Related concepts | | 0.65 - 0.75 | Somewhat Relevant | Tangential connection | | 0.1 - 0.65 | Low Relevance | Weak semantic relation |

Performance

Benchmarks

  • Embedding Generation: 1-3 seconds per document
  • Search Latency: ~250ms end-to-end
  • Vector Comparison: ~50ms for 1000 documents
  • Memory Usage: ~1.5KB per embedding
  • Storage Overhead: ~6KB per document (embedding + metadata)

Optimization Tips

  1. Batch Processing: Process multiple embeddings in parallel
  2. Caching: Cache frequently searched embeddings
  3. Indexing: Add database indexes on frequently filtered fields
  4. Text Preprocessing: Optimize text extraction for your content types

Development

Local Development

# Install plugin dependencies
cd src/plugins/semantic-search
npm install

# Return to project root
cd ../../../

# Start development server
npm run develop

Testing

# Test embedding generation
curl -X POST http://localhost:1337/api/semantic-search/search \
-H "Content-Type: application/json" \
-d '{"query": "test query", "contentType": "api::article.article"}'

# Check statistics
curl http://localhost:1337/api/semantic-search/stats

Debugging

Enable debug logging in your Strapi configuration:

// config/logger.js
module.exports = {
  level: 'debug',
  transports: [
    {
      type: 'console',
      options: {
        pool: true,
        format: 'combined',
        level: 'debug'
      }
    }
  ]
};

Production Deployment

Environment Variables

# Required
OPENAI_API_KEY=your_production_openai_key

# Optional
NODE_ENV=production

Security Considerations

  1. API Authentication: Enable authentication for search endpoints
  2. Rate Limiting: Configure appropriate rate limits
  3. Input Validation: Validate search queries and parameters
  4. API Key Security: Secure OpenAI API key storage

Monitoring

Monitor these key metrics:

  • Embedding generation success rate
  • Search response times
  • OpenAI API usage and costs
  • Similarity score distributions
  • Search query patterns

Extending the Plugin

Custom Content Type Support

Add support for additional content types:

// In registerEmbeddingLifecycles function
const contentTypes = [
  'api::article.article',
  'api::blog.blog', 
  'api::product.product',  // Add your content type
  'api::course.course'     // Add another content type
];

Custom Text Extraction

Modify text field extraction:

// In extractTextContent function
const textFields = [
  'title', 'content', 'summary',
  'customField',      // Add custom field
  'description'       // Add more fields as needed
];

Advanced Filtering

Add custom filtering logic:

// In search service
const searchWithCustomFilters = async (query, contentType, customFilters) => {
  const filters = {
    ...customFilters,
    publishedAt: { $notNull: true },  // Only published content
    featured: true                     // Only featured content
  };
  
  return await semanticSearch(query, contentType, { filters });
};

Related Documentation

Troubleshooting

Common Issues

1. Plugin not loading:

  • Check config/plugins.ts configuration
  • Verify plugin path resolution
  • Check for syntax errors in plugin files

2. Embeddings not generating:

  • Verify OpenAI API key is valid
  • Check network connectivity to OpenAI
  • Review content type field configuration

3. Search returning no results:

  • Verify embedding field exists on content type
  • Check similarity threshold (try lowering to 0.1)
  • Ensure content has embeddings generated

4. Performance issues:

  • Monitor OpenAI API rate limits
  • Check database query performance
  • Consider caching strategies

Debug Commands

# Check plugin loading
tail -f logs/strapi.log | grep "semantic-search"

# Test OpenAI connectivity
curl -X POST https://api.openai.com/v1/embeddings \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{"input": "test", "model": "text-embedding-ada-002"}'

# Validate embedding storage
curl http://localhost:1337/api/semantic-search/stats

License

MIT License

Contributing

This plugin is part of the semantic search demo project. Contributions and improvements are welcome.


Built with OpenAI Embeddings and Strapi 5