npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2025 – Pkg Stats / Ryan Hefner

payloadcms-vectorize

v0.4.1

Published

A plugin to vectorize collections for RAG in Payload 3.0

Readme

PayloadCMS Vectorize

A Payload CMS plugin that adds vector search capabilities to your collections using PostgreSQL's pgvector extension. Perfect for building RAG (Retrieval-Augmented Generation) applications and semantic search features.

Features

  • 🔍 Semantic Search: Vectorize any collection for intelligent content discovery
  • 🚀 Automatic: Documents are automatically vectorized when created or updated, and vectors are deleted as soon as the document is deleted.
  • 📊 PostgreSQL Integration: Built on pgvector for high-performance vector operations
  • Background Processing: Uses Payload's job system for non-blocking vectorization
  • 🎯 Flexible Chunking: Drive chunk creation yourself with toKnowledgePool functions so you can combine any fields or content types
  • 🧩 Extensible Schema: Attach custom extensionFields to the embeddings collection and persist values per chunk and use for querying.
  • 🌐 REST API: Built-in vector-search endpoint with Payload-style where filtering and configurable limits
  • 🏊 Multiple Knowledge Pools: Separate knowledge pools with independent configurations (dims, ivfflatLists, embedding functions) and needs.

Prerequisites

  • Only tested on Payload CMS 3.37.0+
  • PostgreSQL with pgvector extension
  • Node.js 18+

Installation

pnpm add payloadcms-vectorize

Quick Start

0. Install pgvector

The plugin automatically creates the vector extension when Payload initializes. However, your PostgreSQL database user must have permission to create extensions. If your user doesn't have these permissions, you may need to manually create the extension once:

CREATE EXTENSION IF NOT EXISTS vector;

Note: Most managed PostgreSQL services (like AWS RDS, Supabase, etc.) require superuser privileges or specific extension permissions. If you encounter permission errors, contact your database administrator or check your service's documentation.

1. Configure the Plugin

import { buildConfig } from 'payload'
import type { Payload } from 'payload'
import { postgresAdapter } from '@payloadcms/db-postgres'
import { createVectorizeIntegration } from 'payloadcms-vectorize'
import type { ToKnowledgePoolFn } from 'payloadcms-vectorize'

// Configure your embedding functions
const embedDocs = async (texts: string[]) => {
  // Your embedding logic here
  return texts.map((text) => /* vector array */)
}

const embedQuery = async (text: string) => {
  // Your query embedding logic here
  return /* vector array */
}

// Optional chunker helpers (see dev/helpers/chunkers.ts for ideas)
const chunkText = async (text: string, payload: Payload) => {
  return /* string array */
}

const chunkRichText = async (richText: any, payload: Payload) => {
  return /* string array */
}

// Convert a document into chunks + extension-field values
const postsToKnowledgePool: ToKnowledgePoolFn = async (doc, payload) => {
  const entries: Array<{ chunk: string; category?: string; priority?: number }> = []

  const titleChunks = await chunkText(doc.title ?? '', payload)
  titleChunks.forEach((chunk) =>
    entries.push({
      chunk,
      category: doc.category ?? 'general',
      priority: Number(doc.priority ?? 0),
    }),
  )

  const contentChunks = await chunkRichText(doc.content, payload)
  contentChunks.forEach((chunk) =>
    entries.push({
      chunk,
      category: doc.category ?? 'general',
      priority: Number(doc.priority ?? 0),
    }),
  )

  return entries
}

// Create the integration with static configs (dims, ivfflatLists)
const { afterSchemaInitHook, payloadcmsVectorize } = createVectorizeIntegration({
  // Note limitation: Changing these values requires a migration.
  main: {
    dims: 1536, // Vector dimensions
    ivfflatLists: 100, // IVFFLAT index parameter
  },
})

export default buildConfig({
  // ... your existing config
  db: postgresAdapter({
    extensions: ['vector'],
    // afterSchemaInitHook adds 'vector' to your schema
    afterSchemaInit: [afterSchemaInitHook],
    // ... your database config
  }),
  plugins: [
    payloadcmsVectorize({
      knowledgePools: {
        main: {
          collections: {
            posts: {
              toKnowledgePool: postsToKnowledgePool,
            },
          },
          extensionFields: [
            { name: 'category', type: 'text' },
            { name: 'priority', type: 'number' },
          ],
          embedDocs,
          embedQuery,
          embeddingVersion: 'v1.0.0',
        },
      },
      // Optional plugin options:
      // queueName: 'custom-queue',
      // endpointOverrides: { path: '/custom-vector-search', enabled: true }, // will be /api/custom-vector-search
      // disabled: false,
    }),
  ],
})

Important: knowledgePools must have different names than your collections—reusing a collection name for a knowledge pool will cause schema conflicts. (In this example, the knowledge pool is named 'main' and a collection named 'main' will be created.)

2. Search Your Content

The plugin automatically creates a /api/vector-search endpoint:

const response = await fetch('/api/vector-search', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: 'What is machine learning?', // Required
    knowledgePool: 'main', // Required
    where: {
      category: { equals: 'guides' }, // Optional Payload-style filter
    },
    limit: 5, // Optional (defaults to 10)
  }),
})

const { results } = await response.json()
// Each result contains: id, similarity, sourceCollection, docId, chunkIndex, chunkText,
// embeddingVersion, and any extensionFields you attached (e.g., category, priority).

Configuration Options

Plugin Options

| Option | Type | Required | Description | | ------------------- | --------------------------------------------------- | -------- | ---------------------------------------- | | knowledgePools | Record<KnowledgePool, KnowledgePoolDynamicConfig> | ✅ | Knowledge pools and their configurations | | queueName | string | ❌ | Custom queue name for background jobs | | endpointOverrides | object | ❌ | Customize the search endpoint | | disabled | boolean | ❌ | Disable plugin while keeping schema |

Knowledge Pool Config

Knowledge pools are configured in two steps. The static configs define the database schema (migration required), while dynamic configs define runtime behavior (no migration required).

1. Static Config (passed to createVectorizeIntegration):

  • dims: number - Vector dimensions for pgvector column
  • ivfflatLists: number - IVFFLAT index parameter

The embeddings collection name will be the same as the knowledge pool name.

2. Dynamic Config (passed to payloadcmsVectorize):

  • collections: Record<string, CollectionVectorizeOption> - Collections and their chunking configs
  • embedDocs: EmbedDocsFn - Function to embed multiple documents
  • embedQuery: EmbedQueryFn - Function to embed search queries
  • embeddingVersion: string - Version string for tracking model changes
  • extensionFields?: Field[] - Optional fields to extend the embeddings collection schema

CollectionVectorizeOption

  • toKnowledgePool (doc, payload) – return an array of { chunk, ...extensionFieldValues }. Each object becomes one embedding row and the index in the array determines chunkIndex.

Reserved column names: sourceCollection, docId, chunkIndex, chunkText, embeddingVersion. Avoid reusing them in extensionFields.

Chunkers

Use chunker helpers (see dev/helpers/chunkers.ts) to keep toKnowledgePool implementations focused on orchestration. A toKnowledgePool can combine multiple chunkers, enrich each chunk with metadata, and return everything the embeddings collection needs.

const postsToKnowledgePool: ToKnowledgePoolFn = async (doc, payload) => {
  const chunks = await chunkText(doc.title ?? '', payload)

  return chunks.map((chunk) => ({
    chunk,
    category: doc.category ?? 'general',
  }))
}

Because you control the output, you can mix different field types, discard empty values, or inject any metadata that aligns with your extensionFields.

PostgreSQL Custom Schema Support

The plugin reads the schemaName configuration from your Postgres adapter within the Payload config.

When you configure a custom schema via postgresAdapter({ schemaName: 'custom' }), all plugin SQL queries (for vector columns, indexes, and embeddings) are qualified with that schema name. This is useful for multi-tenant setups or when content tables live in a dedicated schema.

Where schemaName is not specified within the postgresAdapter in the Payload config, the plugin falls back to public as is default adapter behaviour.

Example

Using with Voyage AI

import { voyageEmbedDocs, voyageEmbedQuery } from 'voyage-ai-provider'

export const embedDocs = async (texts: string[]): Promise<number[][]> => {
  const embedResult = await embedMany({
    model: voyage.textEmbeddingModel('voyage-3.5-lite'),
    values: texts,
    providerOptions: {
      voyage: { inputType: 'document' },
    },
  })
  return embedResult.embeddings
}
export const embedQuery = async (text: string): Promise<number[]> => {
  const embedResult = await embed({
    model: voyage.textEmbeddingModel('voyage-3.5-lite'),
    value: text,
    providerOptions: {
      voyage: { inputType: 'query' },
    },
  })
  return embedResult.embedding
}

API Reference

Search Endpoint

POST /api/vector-search

Search for similar content using vector similarity.

Request Body:

{
  "query": "Your search query",
  "knowledgePool": "main",
  "where": {
    "category": { "equals": "guides" },
    "priority": { "gte": 3 },
  },
  "limit": 5,
}

Parameters

  • query (required): Search query string
  • knowledgePool (required): Knowledge pool identifier to search in
  • where (optional): Payload-style Where clause evaluated against the embeddings collection + any extensionFields
  • limit (optional): Maximum results to return (defaults to 10)

Response:

{
  "results": [
    {
      "id": "embedding_id",
      "similarity": 0.85,
      "sourceCollection": "posts",
      "docId": "post_id",
      "chunkIndex": 0,
      "chunkText": "Relevant text chunk",
      "embeddingVersion": "v1.0.0",
      "category": "guides", // example extension field
      "priority": 4, // example extension field
    },
  ],
}

Changelog

See CHANGELOG.md for release history, migration notes, and upgrade guides.

Requirements

  • Payload CMS ^3.37.0
  • PostgreSQL with pgvector extension
  • Node.js ^18.20.2

License

MIT

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

⭐ Star This Repository

If you find this plugin useful, please give it a star! Stars help us understand how many developers are using this plugin and directly influence our development priorities. More stars = more features, better performance, and faster bug fixes.

🐛 Report Issues & Request Features

Help us prioritize development by opening issues for:

  • Bugs: Something not working as expected
  • Feature requests: New functionality you'd like to see
  • Improvements: Ways to make existing features better
  • Documentation: Missing or unclear information
  • Questions: I'll answer through the issues.

The more detailed your issue, the better I can understand and address your needs. Issues with community engagement (reactions, comments) get higher priority!

🗺️ Roadmap

Thank you for the stars! The following updates have been completed:

  • Multiple Knowledge Pools: You can create separate knowledge pools with independent configurations (dims, ivfflatLists, embedding functions) and needs. Each pool operates independently, allowing you to organize your vectorized content by domain, use case, or any other criteria that makes sense for your application.
  • More expressive queries: Added ability to change query limit, search on certain collections or certain fields

The following features are planned for future releases based on community interest and stars:

  • Migrations for vector dimensions: Easy migration tools for changing vector dimensions and/or ivfflatLists after initial setup
  • MongoDB support: Extend vector search capabilities to MongoDB databases
  • Vercel support: Optimized deployment and configuration for Vercel hosting
  • Batch embedding: More efficient bulk embedding operations for large datasets
  • 'Embed all' button: Admin UI button to re-embed all content after embeddingVersion changes

Want to see these features sooner? Star this repository and open issues for the features you need most!