mikser-io-vector

v1.1.1

OpenAI embeddings + sqlite-vec storage and search for mikser-io

OpenAI embeddings + sqlite-vec storage and search for mikser-io. Indexes entities as they flow through the lifecycle, exposes a findSimilar() runtime helper, and (when a shared Express app is available) mounts a POST /vector/:storeName HTTP search endpoint.

Install

npm install mikser-io-vector

Configure

// mikser.config.js
export default {
  plugins: ['documents', 'layouts', 'render-hbs', 'api', 'vector'],

  vector: {
    // Backend selection. Defaults to local sqlite-vec.
    //   'better-sqlite3' | 'sqlite' | 'sqlite3'     → sqlite-vec
    //   'pg' | 'postgres' | 'postgresql'            → pgvector
    client: 'better-sqlite3',

    // Connection — interpreted per driver:
    //   sqlite: { filename } (defaults to <runtimeFolder>/vectors.db)
    //   pg:     a libpq URL string, or pg.PoolConfig, or omit and use
    //           PGHOST / PGUSER / PGPASSWORD / PGDATABASE / PGSSLMODE.
    // connection: process.env.DATABASE_URL,

    openai: {
      apiKey: process.env.OPENAI_API_KEY,    // or set OPENAI_API_KEY directly
      model: 'text-embedding-3-small',       // default
      dim: 1536,                             // default; must match the model
      // baseURL: 'https://...',             // optional, for Azure / self-hosted
    },

    base: '/vector',                      // HTTP mount path; default '/vector'
    // Parallel OpenAI calls per store; default 4.
    // Per-store override via stores[name].concurrency.
    concurrency: 4,

    // Multiple named stores. Mirrors the data plugin's
    // (query, map, pick) shape so the same mental model applies.
    stores: {
      documents: {
        // Which entities go into this store. Defaults to
        // `entity => entity.type === 'document'` when omitted.
        // query: entity => entity.type === 'document',

        // Either return a plain object from `map`...
        map: async (entity) => ({
          title: entity.meta?.title,
          tags: entity.meta?.tags,
          content: entity.content,
        }),

        // ...OR a `pick` list of paths.
        // pick: ['meta.title', 'meta.tags', 'content'],
      },

      // Add as many stores as you need; each gets its own vec0 table.
      layouts: {
        query: entity => entity.type === 'layout',
        pick: ['name'],

        // Optional: protect this store's HTTP endpoint with a bearer token.
        // Programmatic findSimilar() is unaffected — auth is HTTP-only.
        token: process.env.VECTOR_LAYOUTS_TOKEN,
      },
    },
  },
}

Provide your OpenAI key either inline (vector.openai.apiKey) or as OPENAI_API_KEY in the environment.
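
For comparison, a pgvector setup might look like the sketch below. It uses only option names that appear in the comments above (client, connection, openai, stores). The shortened plugin list, the single store, and the DATABASE_URL variable are illustrative assumptions, not requirements of the plugin.

```javascript
// Sketch of a pgvector configuration. In a real project this object would be
// the default export of mikser.config.js.
const config = {
  // Assumption: a reduced plugin list for brevity.
  plugins: ['documents', 'vector'],

  vector: {
    client: 'pg', // selects the pgvector backend

    // A libpq URL string. Omit this key to fall back to the PG* environment
    // variables listed above.
    connection: process.env.DATABASE_URL,

    openai: {
      apiKey: process.env.OPENAI_API_KEY,
    },

    stores: {
      documents: {
        pick: ['meta.title', 'content'],
      },
    },
  },
}
```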

How it indexes

The plugin hooks onBeforeRender and iterates the journal for CREATE, UPDATE, and DELETE operations. For each store:

  1. Apply query(entity) to filter — defaults to entity => entity.type === 'document' when not provided.
  2. Build a plain object via map(entity) (async, must return an object) or pick (path → value). If neither map nor pick is set, entity.content is embedded as-is.
  3. Serialize the object via TOON — a compact, schema-aware textual format that's lighter on tokens than JSON and gives the embedding model a cleaner signal than ad-hoc string concatenation.
  4. Compute the embedding via OpenAI and upsert into the store's vec0 table.
  5. Deletes remove the vector and its rowid mapping.

In watch mode, only changed entities are re-embedded each cycle. In a one-shot build, every CREATE operation is embedded again, so keep API cost in mind.
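
Steps 1 and 2 above can be sketched as follows. This is not the plugin's source: getPath and buildPayload are hypothetical helpers, and keying picked values by their full path is an assumption about how pick shapes its output.

```javascript
// Resolve a dotted path such as 'meta.title' against an object.
function getPath(obj, path) {
  return path
    .split('.')
    .reduce((value, key) => (value == null ? undefined : value[key]), obj)
}

// Default filter described above: only documents are indexed.
const defaultQuery = (entity) => entity.type === 'document'

// Returns the object to serialize, the raw content when neither map nor pick
// is configured, or null when the entity is filtered out of the store.
async function buildPayload(entity, store = {}) {
  const query = store.query ?? defaultQuery
  if (!query(entity)) return null

  if (typeof store.map === 'function') {
    return await store.map(entity)
  }
  if (Array.isArray(store.pick) && store.pick.length > 0) {
    // Assumption: each picked value is keyed by its full path.
    return Object.fromEntries(
      store.pick.map((path) => [path, getPath(entity, path)])
    )
  }
  return entity.content
}
```

The result would then be serialized and sent to the embedding model (steps 3 and 4), which this sketch leaves out.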

Search — programmatic

import { runtime } from 'mikser-io'
// call after runtime.start(), once the plugin's onLoaded hook has run

const results = await runtime.findSimilar('documents', 'how do I publish a report', { limit: 5 })
// → [
//     {
//       id: '/documents/en/report.md',
//       distance: 0.123,
//       data: { title: 'Mikser Quarterly Report', content: '...' },
//     },
//     ...
//   ]

data is the original object returned by your map(entity) (or built from pick), that is, the object that was TOON-encoded before embedding. Use it to surface human-readable metadata alongside the distance without a second lookup.
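
As an illustration of using data directly, the sketch below turns a result array into display labels. The sample array mimics the shape shown above; toListing and the 0.5 distance cutoff are hypothetical and should be tuned for your own content.

```javascript
// Sample data shaped like the findSimilar() output shown above.
const sampleResults = [
  { id: '/documents/en/report.md', distance: 0.123, data: { title: 'Mikser Quarterly Report' } },
  { id: '/documents/en/other.md', distance: 0.61, data: { title: 'Unrelated Notes' } },
]

// Keep matches at or under a distance cutoff (0.5 is an arbitrary example
// value) and build a label from the stored data, falling back to the id.
function toListing(results, maxDistance = 0.5) {
  return results
    .filter((r) => r.distance <= maxDistance)
    .map((r) => ({
      id: r.id,
      label: `${r.data?.title ?? r.id} (distance ${r.distance.toFixed(3)})`,
    }))
}

const listing = toListing(sampleResults)
```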

Search — HTTP

Requires a shared Express app (--server or setup({ app })). The plugin mounts POST /vector/:storeName:

curl -X POST http://localhost:3001/vector/documents \
  -H 'content-type: application/json' \
  -d '{ "q": "how do I publish a report", "limit": 5 }'

# {
#   "results": [
#     {
#       "id": "/documents/en/report.md",
#       "distance": 0.123,
#       "data": { "title": "Mikser Quarterly Report", "content": "..." }
#     },
#     ...
#   ]
# }

q is required; limit defaults to 5.

Authentication

A store may declare a token — when set, its HTTP endpoint requires Authorization: Bearer <token>. Stores without a token remain open. The programmatic runtime.findSimilar() is never gated by tokens.

curl -X POST http://localhost:3001/vector/layouts \
  -H 'authorization: Bearer s3cr3t' \
  -H 'content-type: application/json' \
  -d '{ "q": "report layout", "limit": 3 }'

# Missing/wrong token → 401 { "error": "Invalid or missing token" }
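
The same requests can be made from JavaScript. The helper below is a hypothetical client sketch, not part of the package: searchStore and its options are made-up names, it assumes the default '/vector' mount path, and it relies on the global fetch available in Node 18+ unless another implementation is passed in.

```javascript
// POST a query to a store's search endpoint and return the results array.
async function searchStore(baseUrl, storeName, q, { limit = 5, token, fetchImpl = fetch } = {}) {
  const headers = { 'content-type': 'application/json' }
  // Only token-protected stores need the Authorization header.
  if (token) headers.authorization = `Bearer ${token}`

  const response = await fetchImpl(
    `${baseUrl}/vector/${encodeURIComponent(storeName)}`,
    { method: 'POST', headers, body: JSON.stringify({ q, limit }) }
  )

  if (response.status === 401) throw new Error('Invalid or missing token')
  if (!response.ok) throw new Error(`Search failed with status ${response.status}`)

  const { results } = await response.json()
  return results
}
```

A call such as searchStore('http://localhost:3001', 'layouts', 'report layout', { limit: 3, token: 's3cr3t' }) mirrors the second curl example above.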

Storage

sqlite-vec (client: 'better-sqlite3') — vectors live in <runtimeFolder>/vectors.db. Each configured store has two tables: mikser_vector_<storeName> (the vec0 virtual table) and mikser_vector_<storeName>_ids (a regular table mapping string entity_id to numeric rowid and holding the JSON data payload). Wipe with --clear to start fresh — every entity will be re-embedded on the next run.

pgvector (client: 'pg') — one table per store: mikser_vector_<storeName> (id TEXT PRIMARY KEY, embedding vector(N), data jsonb), plus an HNSW index using vector_cosine_ops. Requires the vector extension on the database (Neon and Supabase have it pre-installed; vanilla Postgres needs CREATE EXTENSION vector by a superuser). --clear TRUNCATEs every configured store table so the next run re-embeds from scratch.

Both backends use cosine distance, so values are comparable when switching backends with the same embedding model.

Notes

  • sqlite-vec uses FLAT (brute-force) search — plenty fast up to ~100K vectors. Beyond that, use pgvector with its HNSW index.
  • The embedding model and dimensions can be changed, but the vector dimension of an existing table is fixed when the table is created. If you change dim, drop the vector tables so they are re-created.
  • The plugin requires runtime.options.app for HTTP search but not for programmatic search — findSimilar() works either way.
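
To make the first note concrete, here is a minimal sketch of flat (brute-force) search with cosine distance. It is an illustration of the technique only, not how sqlite-vec is implemented; cosineDistance, flatSearch and the toy two-dimensional vectors are assumptions for the example.

```javascript
// Cosine distance = 1 - cosine similarity; 0 means the vectors point the same way.
function cosineDistance(a, b) {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Flat search: score every stored vector, sort by distance, keep the closest.
function flatSearch(queryVector, rows, limit = 5) {
  return rows
    .map(({ id, embedding }) => ({ id, distance: cosineDistance(queryVector, embedding) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, limit)
}

const rows = [
  { id: 'a', embedding: [1, 0] },
  { id: 'b', embedding: [0, 1] },
  { id: 'c', embedding: [1, 1] },
]
const nearest = flatSearch([1, 0], rows, 2)
```

Every query touches every row, so the cost grows linearly with the number of stored vectors and their dimension. An approximate index such as HNSW avoids the full scan, which is why it scales better for large stores.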

License

MIT