@dotdo/pg-search

BM25 full-text search with Orama + PGlite and Cap'n Web RPC for hibernated WebSocket connections


import { OramaBlobStore } from '@dotdo/pg-search/blob'

const store = new OramaBlobStore({
  schema: { title: 'string', content: 'string', tags: 'string[]' }
})

await store.init()
await store.insert({ title: 'Hello World', content: 'Full-text search at the edge', tags: ['search'] })

const results = await store.search({ term: 'edge' })
console.log(results.hits[0].document) // { id: '...', title: 'Hello World', ... }

Installation

npm install @dotdo/pg-search @dotdo/pglite @orama/orama
# or
pnpm add @dotdo/pg-search @dotdo/pglite @orama/orama

Features

  • BM25 Full-Text Search - Industry-standard ranking algorithm via Orama
  • Three Storage Patterns - Blob, Sync, and SQL patterns for different use cases
  • Cap'n Web RPC - Promise pipelining over hibernated WebSockets (95% cost savings)
  • Durable Object Integration - SearchDO class for Cloudflare Workers
  • Type-Safe Client - Full TypeScript support with SearchRpcClient
  • PGlite Persistence - Documents stored in PostgreSQL, index in Orama

Storage Patterns

This package provides three storage patterns, each optimized for different use cases:

1. Blob Pattern (OramaBlobStore)

Stores the entire Orama index as a serialized blob in PGlite. Best for:

  • Cold start scenarios where index needs to be restored
  • Periodic checkpointing for durability
  • Simple persistence without complex sync logic

import { OramaBlobStore } from '@dotdo/pg-search/blob'

const store = new OramaBlobStore({
  schema: {
    title: 'string',
    content: 'string',
    tags: 'string[]',
  },
  indexName: 'articles',
})

await store.init()

// Insert documents
const doc = await store.insert({
  title: 'TypeScript Guide',
  content: 'Learn TypeScript from scratch with practical examples...',
  tags: ['typescript', 'javascript', 'tutorial'],
})

// Search with BM25 ranking
const results = await store.search({
  term: 'TypeScript tutorial',
  limit: 10,
  hydrate: true,
})

console.log(`Found ${results.count} matches in ${results.elapsed.formatted}`)
for (const hit of results.hits) {
  console.log(`${hit.score.toFixed(2)}: ${hit.document.title}`)
}

// Checkpoint index to PGlite for persistence
await store.checkpoint()

// Close when done
await store.close()

Trade-offs:

  • Checkpoint is O(n) - must serialize entire index (sketched below)
  • Space efficient for small-medium indices
  • No partial updates - full serialization on checkpoint
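
Conceptually, a blob checkpoint amounts to serializing the whole index and upserting a single row. The sketch below shows the idea using Orama's persistence plugin and the upstream @electric-sql/pglite; it is not this package's actual internals, and the table layout is made up for illustration:

import { create, insert } from '@orama/orama'
import { persist, restore } from '@orama/plugin-data-persistence'
import { PGlite } from '@electric-sql/pglite'

const pg = new PGlite()
await pg.query(`CREATE TABLE IF NOT EXISTS orama_checkpoints (
  name text PRIMARY KEY,
  payload text NOT NULL
)`)

const db = await create({ schema: { title: 'string' } })
await insert(db, { title: 'Hello World' })

// Checkpoint: O(n) - the whole index is serialized in one pass
const payload = (await persist(db, 'json')) as string
await pg.query(
  `INSERT INTO orama_checkpoints (name, payload) VALUES ($1, $2)
   ON CONFLICT (name) DO UPDATE SET payload = EXCLUDED.payload`,
  ['articles', payload],
)

// Cold start: read the blob back and rebuild the in-memory index from it
const row = await pg.query<{ payload: string }>(
  'SELECT payload FROM orama_checkpoints WHERE name = $1', ['articles'])
const restoredDb = await restore('json', row.rows[0].payload)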

2. Sync Pattern (OramaSyncStore)

Maintains Orama index in sync with PGlite on every write operation. Best for:

  • Real-time search requirements
  • Strong consistency guarantees
  • Applications where search results must reflect latest data

import { OramaSyncStore } from '@dotdo/pg-search/sync'

const store = new OramaSyncStore({
  schema: {
    title: 'string',
    body: 'string',
    author: 'string',
  },
})

await store.init()

// Every write syncs to both PGlite and Orama
await store.insert({
  title: 'Breaking News',
  body: 'Important update about...',
  author: 'John Doe',
})

// Search immediately reflects the new document
const results = await store.search({ term: 'breaking' })

// Verify consistency between index and database
const { consistent, mismatches } = await store.verify()
if (!consistent) {
  console.log('Mismatches found:', mismatches)
  await store.rebuildIndex() // Rebuild from PGlite data
}

await store.close()

Trade-offs:

  • Higher write latency (dual-write to PGlite and Orama - sketched below)
  • More complex error handling with rollback support
  • Built-in consistency verification
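
The dual-write cost is easiest to see in a sketch of the pattern itself (illustrative only, not this package's internals): the database is written first, so a failed index write can be rolled back.

import { insert as oramaInsert, type AnyOrama } from '@orama/orama'
import { PGlite } from '@electric-sql/pglite'

// Hypothetical helper showing the dual write with rollback
async function syncInsert(pg: PGlite, index: AnyOrama, doc: { id: string; title: string }) {
  // 1. Durable write to the database first
  await pg.query('INSERT INTO documents (id, data) VALUES ($1, $2)', [doc.id, JSON.stringify(doc)])
  try {
    // 2. Then make the document searchable
    await oramaInsert(index, doc)
  } catch (err) {
    // 3. Undo the database write so the index and table cannot drift apart
    await pg.query('DELETE FROM documents WHERE id = $1', [doc.id])
    throw err
  }
}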

3. SQL Pattern (OramaSqlStore)

Registers Orama search as a virtual SQL function in PGlite. Best for:

  • Complex queries combining full-text search with SQL joins
  • Existing SQL-based applications
  • Hybrid queries (search + filter + aggregate)

import { OramaSqlStore } from '@dotdo/pg-search/sql'

const store = new OramaSqlStore({
  schema: {
    name: 'string',
    description: 'string',
    price: 'number',
  },
  tableName: 'products',
})

await store.init()

// Insert products
await store.insertMany([
  { name: 'Laptop', description: 'Powerful laptop for developers', price: 1299 },
  { name: 'Keyboard', description: 'Mechanical keyboard with RGB', price: 149 },
  { name: 'Monitor', description: 'Ultra-wide developer monitor', price: 499 },
])

// Execute search and store results for SQL joins
const searchId = await store.executeSearch('developer', { limit: 100 })

// Join search results with other SQL operations
const results = await store.query(`
  SELECT d.data, s.score, s.rank
  FROM products d
  JOIN _orama_search_results s ON d.id = s.doc_id
  WHERE s.search_id = $1
    AND (d.data->>'price')::numeric < 1000
  ORDER BY s.rank
`, [searchId])

// Or use the convenience method
const combined = await store.searchWithSql('developer', {
  additionalSql: `AND (d.data->>'price')::numeric < 1000`,
})

await store.close()

Trade-offs:

  • More complex implementation
  • Function registration overhead
  • Results must be serializable to SQL types

Durable Object Integration

The SearchDO class provides a complete Durable Object implementation with both REST API and Cap'n Web RPC support.

Setting Up SearchDO

// src/index.ts
import { SearchDO } from '@dotdo/pg-search/do'

export { SearchDO }

export default {
  async fetch(request: Request, env: Env) {
    const id = env.SEARCH.idFromName('main')
    const stub = env.SEARCH.get(id)
    return stub.fetch(request)
  }
}

# wrangler.toml
[[durable_objects.bindings]]
name = "SEARCH"
class_name = "SearchDO"

[[migrations]]
tag = "v1"
new_classes = ["SearchDO"]
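
The worker snippet above references an Env type that is never declared in it. Assuming the binding from the wrangler.toml above and the ambient types from @cloudflare/workers-types, a minimal declaration would be:

// Minimal Env for the worker above - the property name must match the
// binding name ("SEARCH") declared in wrangler.toml
interface Env {
  SEARCH: DurableObjectNamespace
}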

REST API

SearchDO exposes the following REST endpoints:

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /search?q=term&limit=10 | Search documents |
| GET | /documents/:id | Get a document by ID |
| POST | /documents | Index a new document |
| POST | /documents/batch | Index multiple documents |
| DELETE | /documents/:id | Delete a document |
| GET | /stats | Get index statistics |
| POST | /indexes | Create a new search index |
| GET | /health | Health check |
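
As a quick illustration, indexing a document and then finding it over HTTP might look like this; the request and response body shapes are assumptions, not documented contracts:

// Hypothetical calls against the endpoints in the table above
const base = 'https://search.example.com'

// Index a new document (body shape is an assumption)
await fetch(`${base}/documents`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ title: 'Hello World', content: 'Full-text search at the edge' }),
})

// Search for it (response shape assumed to mirror store.search() results)
const res = await fetch(`${base}/search?q=edge&limit=10`)
const body = (await res.json()) as { hits: Array<{ score: number; document: unknown }> }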

Cap'n Web RPC Client

For high-performance scenarios, use the typed RPC client with promise pipelining:

import { SearchRpcClient } from '@dotdo/pg-search/rpc'

interface Article {
  id: string
  title: string
  content: string
  author: string
}

const client = new SearchRpcClient<Article>('wss://search.example.com/rpc', {
  autoReconnect: true,
  maxReconnectAttempts: 5,
})

// Individual calls
const results = await client.search({ term: 'typescript' })

// Get a document
const doc = await client.get('article-123')

// Update a document
await client.update('article-123', { title: 'Updated Title' })

// Close connection when done
client.close()

Promise Pipelining (Batched Operations)

Promise pipelining dramatically reduces latency by batching multiple operations into a single round-trip:

// Without pipelining: 3 round-trips (~150ms at 50ms latency)
const doc1 = await client.index({ title: 'First', content: '...' })
const doc2 = await client.index({ title: 'Second', content: '...' })
const results = await client.search({ term: 'first' })

// With pipelining: 1 round-trip (~50ms)
const batch = client.batch()
const doc1Promise = batch.index({ title: 'First', content: '...' })
const doc2Promise = batch.index({ title: 'Second', content: '...' })
const resultsPromise = batch.search({ term: 'first' })
await batch.execute()

// All promises are now resolved
const doc1 = await doc1Promise
const doc2 = await doc2Promise
const results = await resultsPromise

Search Options

All store implementations support the same search options:

interface SearchOptions {
  /** Search term - required */
  term: string

  /** Fields to search in (default: all indexed fields) */
  properties?: string[]

  /** Maximum results to return (default: 10) */
  limit?: number

  /** Offset for pagination (default: 0) */
  offset?: number

  /** Whether to hydrate full documents from PGlite (default: true) */
  hydrate?: boolean

  /** BM25 algorithm parameters */
  bm25?: {
    k?: number   // Term frequency saturation (default: 1.2)
    b?: number   // Length normalization (default: 0.75)
    d?: number   // TF normalization (default: 0.5)
  }
}

Search Examples

// Basic search
const results = await store.search({ term: 'javascript' })

// Search specific fields
const results = await store.search({
  term: 'tutorial',
  properties: ['title', 'tags'],
})

// Pagination
const page2 = await store.search({
  term: 'programming',
  limit: 20,
  offset: 20,
})

// Just get IDs and scores (skip hydration for performance)
const results = await store.search({
  term: 'react',
  hydrate: false,
})

// Custom BM25 parameters
const results = await store.search({
  term: 'typescript',
  bm25: {
    k: 1.5,   // Increase term frequency importance
    b: 0.5,   // Reduce document length normalization
  },
})

Understanding Full-Text Search: BM25 vs tsvector/tsquery

PostgreSQL Native: tsvector/tsquery

PostgreSQL provides built-in full-text search using tsvector (document representation) and tsquery (query representation):

-- Create a tsvector from text
SELECT to_tsvector('english', 'The quick brown fox jumps over the lazy dog');
-- Result: 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2

-- Search using tsquery
SELECT * FROM documents
WHERE to_tsvector('english', content) @@ to_tsquery('english', 'quick & fox');

How tsvector works:

  • Normalizes text (lowercasing, stemming)
  • Removes stop words ("the", "a", "is")
  • Creates position-aware tokens
  • Supports language-specific configurations

tsquery operators:

  • & (AND): 'cat & dog' - both terms required
  • | (OR): 'cat | dog' - either term
  • ! (NOT): 'cat & !dog' - cat but not dog
  • <-> (FOLLOWED BY): 'quick <-> fox' - phrase search

-- Create a GIN index for fast searches
CREATE INDEX idx_content_fts ON documents USING GIN(to_tsvector('english', content));

-- Rank results using ts_rank
SELECT title, ts_rank(to_tsvector('english', content), query) AS rank
FROM documents, to_tsquery('english', 'typescript & tutorial') AS query
WHERE to_tsvector('english', content) @@ query
ORDER BY rank DESC;

Orama BM25 (This Package)

BM25 (Best Matching 25) is a probabilistic ranking algorithm used by search engines like Elasticsearch and Lucene:

// BM25 search with Orama
const results = await store.search({
  term: 'typescript tutorial',
  bm25: {
    k: 1.2,   // Term frequency saturation
    b: 0.75,  // Document length normalization
  },
})

BM25 formula components (the full formula is written out below):

  • TF (Term Frequency): How often the term appears, with saturation (diminishing returns)
  • IDF (Inverse Document Frequency): Rare terms weighted more heavily
  • Document Length: Normalized against average document length
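
For reference, the textbook BM25 score of a document D for a query Q = q1 … qn, summing over the query terms qi, is:

score(D, Q) = Σ IDF(qi) · f(qi, D) · (k + 1) / ( f(qi, D) + k · (1 − b + b · |D| / avgdl) )

where f(qi, D) is the number of times qi occurs in D, |D| is the document length, and avgdl is the average document length across the index. Orama's d parameter acts as a lower bound on the frequency-normalization term, in the style of the BM25+ variant, so long documents still earn some credit for merely containing a term.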

Comparison: tsvector/tsquery vs BM25

| Feature | tsvector/tsquery | BM25 (Orama) |
|---------|-----------------|--------------|
| Ranking | Basic ts_rank | Probabilistic, tunable |
| Setup | Built into PostgreSQL | Requires Orama library |
| Boolean queries | Native (&, \|, !) | Via code logic |
| Phrase search | Native (<->) | Supported |
| Relevance tuning | Weight classes (A-D) | k, b, d parameters |
| Stemming | Language-specific dictionaries | Automatic |
| Index storage | In database (GIN/GiST) | In-memory + checkpoint |
| Query flexibility | SQL integration | JavaScript API |

Why This Package Uses BM25

  1. Better Relevance Ranking: BM25's probabilistic model typically produces more relevant results than ts_rank

  2. Tunable Parameters: Easily adjust ranking behavior for your use case:

    // Favor exact matches over frequency
    bm25: { k: 0.5, b: 0.3 }
    
    // Favor comprehensive documents
    bm25: { k: 1.5, b: 0.9 }

  3. In-Memory Speed: Orama searches are typically sub-millisecond, while SQL queries have overhead

  4. JavaScript Integration: Native TypeScript/JavaScript API without SQL string building

When to Use tsvector/tsquery Instead

Use PostgreSQL's native FTS when you need:

  • SQL Integration: Complex JOINs with search results
  • No Additional Dependencies: Pure PostgreSQL solution
  • Stored Procedures: Database-side search logic
  • Large Datasets: When Orama's in-memory index becomes too large

You can use the SQL pattern (OramaSqlStore) to get both:

// Combine Orama BM25 ranking with PostgreSQL queries
const searchId = await store.executeSearch('typescript', { limit: 100 })

const results = await store.query(`
  SELECT d.data, s.score
  FROM documents d
  JOIN _orama_search_results s ON d.id = s.doc_id
  WHERE s.search_id = $1
    AND d.data @> '{"category": "tutorials"}'
  ORDER BY s.rank
`, [searchId])

BM25 Full-Text Search vs Vector Similarity Search

Understanding when to use full-text search (FTS) versus vector similarity search is crucial for building effective search systems.

Full-Text Search (BM25)

BM25 (Best Matching 25) is a probabilistic ranking function that scores documents based on:

  • Term Frequency (TF): How often search terms appear in the document
  • Inverse Document Frequency (IDF): How rare the term is across all documents
  • Document Length Normalization: Prevents long documents from being unfairly favored

Strengths:

  • Exact keyword matching - "TypeScript" matches "TypeScript"
  • Interpretable results - you can explain why a document matched
  • Fast indexing - no ML model required
  • Low latency - typically <1ms for search
  • Great for structured queries - "error AND production"

Use Cases:

  • Documentation search
  • E-commerce product search with specific terms
  • Log search
  • Code search
  • When users search for exact terms

// BM25 search - exact keyword matching
const results = await store.search({ term: 'TypeScript React' })
// Matches documents containing "TypeScript" and/or "React"

Vector Similarity Search (pgvector)

Vector search uses embeddings (dense numerical representations) to find semantically similar content.

Strengths:

  • Semantic understanding - "automobile" matches "car"
  • Handles typos and variations naturally
  • Multilingual search without translation
  • Great for natural language queries
  • Finds conceptually similar content

Use Cases:

  • Semantic search / Q&A
  • Recommendation systems
  • Image/audio similarity
  • When users describe what they want in natural language

-- Vector search - semantic matching
SELECT * FROM items
ORDER BY embedding <=> query_embedding
LIMIT 10;
-- Finds conceptually similar items even without keyword overlap

Comparison Table

| Aspect | BM25 (FTS) | Vector Search |
|--------|------------|---------------|
| Matching | Exact keywords | Semantic similarity |
| Query type | Keywords, boolean operators | Natural language |
| Speed | <1ms | 1-10ms (depends on index) |
| Index size | Small (~10% of data) | Large (embeddings ~3KB per doc) |
| Interpretability | High (keyword hits) | Low (similarity scores) |
| Setup | Simple | Requires embedding model |
| Typo tolerance | Low (needs stemming) | High |
| Cost | CPU only | CPU + GPU for embeddings |

Hybrid Search

For many applications, combining both approaches yields the best results:

interface HybridSearchOptions {
  term: string
  vector?: number[]
  bm25Weight?: number   // 0-1, default: 0.5
  vectorWeight?: number // 0-1, default: 0.5
}

// Hybrid search combines keyword relevance with semantic understanding
// (embed() below stands in for whatever embedding model you use)
const results = await store.hybridSearch({
  term: 'react state management',
  vector: await embed('react state management'),
  bm25Weight: 0.4,
  vectorWeight: 0.6,
})

When to Use Each

| Scenario | Recommendation |
|----------|----------------|
| User searches "error 404 not found" | BM25 - exact error codes matter |
| User searches "how to fix page not loading" | Vector - semantic intent |
| Product search "blue nike shoes size 10" | BM25 - specific attributes |
| "Something similar to this article" | Vector - conceptual similarity |
| Code search "useEffect cleanup" | BM25 - exact function names |
| Documentation "how does auth work" | Hybrid - both keyword and concept |

Performance Tips

Indexing

// Use insertMany for bulk operations
await store.insertMany(documents) // 10x faster than individual inserts

// Checkpoint periodically, not after every write
let writeCount = 0
for (const doc of documents) {
  await store.insert(doc)
  if (++writeCount % 100 === 0) {
    await store.checkpoint()
  }
}

Schema Design

// Index only searchable fields
const store = new OramaBlobStore({
  schema: {
    title: 'string',      // Searchable
    content: 'string',    // Searchable
    // Don't index: id, timestamps, relations
  }
})

// Store full document data in PGlite, search fields in Orama

Search Optimization

// Skip hydration when you only need IDs/scores
const results = await store.search({
  term: 'query',
  hydrate: false,  // 2-3x faster
})

// Limit fields to search
const results = await store.search({
  term: 'query',
  properties: ['title'],  // Don't search large content field
})

// Use pagination instead of large limits
const page1 = await store.search({ term: 'query', limit: 20 })
const page2 = await store.search({ term: 'query', limit: 20, offset: 20 })

Memory Management

// Close stores when done to free memory
await store.close()

// For Durable Objects, checkpoint before hibernation
export class MySearchDO {
  // Schedule an alarm (e.g. via ctx.storage.setAlarm) so the index is
  // persisted even when the object is about to hibernate
  async alarm() {
    await this.store.checkpoint()
  }
}

API Reference

OramaBlobStore

| Method | Description |
|--------|-------------|
| init() | Initialize database and index |
| insert(doc) | Insert a document |
| insertMany(docs) | Insert multiple documents |
| update(id, partial) | Update a document |
| delete(id) | Delete a document |
| search(options) | Search documents |
| get(id) | Get document by ID |
| getAll() | Get all documents |
| getStats() | Get index statistics |
| checkpoint(options?) | Save index to PGlite |
| restore(options?) | Restore index from PGlite |
| close() | Close connections |

OramaSyncStore

Same as OramaBlobStore plus:

| Method | Description |
|--------|-------------|
| rebuildIndex() | Rebuild index from PGlite data |
| verify() | Verify index/database consistency |

OramaSqlStore

Same as OramaBlobStore plus:

| Method | Description |
|--------|-------------|
| executeSearch(term, options?) | Execute search and return ID for SQL joins |
| searchWithSql(term, options?) | Search and join in one call |
| query(sql, params?) | Execute raw SQL query |
| registerFunction(name, fn) | Register custom SQL function |

SearchRpcClient

| Method | Description |
|--------|-------------|
| search(options) | Search documents |
| index(doc) | Index a document |
| indexMany(docs) | Index multiple documents |
| get(id) | Get document by ID |
| delete(id) | Delete a document |
| update(id, partial) | Update a document |
| stats() | Get index statistics |
| createIndex(config) | Create new index |
| getAll() | Get all documents |
| batch() | Create batch for pipelining |
| close() | Close WebSocket connection |

Troubleshooting

"OramaStore not initialized"

Always call init() before using the store:

const store = new OramaBlobStore({ schema })
await store.init() // Required!
await store.search({ term: 'query' })

"No checkpoint found for index"

The index hasn't been checkpointed yet. Use failSilently option:

await store.restore({ failSilently: true })

Search returns no results

  1. Check if documents are indexed:

const stats = await store.getStats()
console.log('Document count:', stats.documentCount)

  2. Verify schema matches document structure:

// Schema must include all searchable fields
const store = new OramaBlobStore({
  schema: {
    title: 'string',  // Must match document property names
    content: 'string',
  }
})

  3. Check search term:

// Orama uses word tokenization - single characters may not match
await store.search({ term: 'a' }) // May not work
await store.search({ term: 'apple' }) // Works

WebSocket connection fails

const client = new SearchRpcClient('wss://...', {
  autoReconnect: true,
  maxReconnectAttempts: 5,
  reconnectDelay: 1000,
})

Memory issues in Workers

The Orama index lives in memory. For large indices:

  1. Use pagination instead of large result sets
  2. Consider sharding across multiple Durable Objects (see the sketch below)
  3. Use the SQL pattern for very large datasets
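
A minimal sharding sketch, assuming the SEARCH binding from the setup section above; the shard count, routing scheme, and endpoint are all illustrative:

// Route each document to one of N SearchDO instances by hashing its ID
const SHARDS = 8

function shardFor(env: Env, key: string): DurableObjectStub {
  let hash = 0
  for (const ch of key) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0
  return env.SEARCH.get(env.SEARCH.idFromName(`shard-${hash % SHARDS}`))
}

async function indexSharded(env: Env, doc: { id: string; title: string; content: string }) {
  // Writes always land on the same shard for a given ID
  const stub = shardFor(env, doc.id)
  return stub.fetch('https://do/documents', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(doc),
  })
}

// A search would fan out to all shards and merge the hits by score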

License

MIT