npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

@voidwire/corpus

v0.4.0

Published

Personal knowledge pipeline — extracts signals from sources, deduplicates, and serves hybrid (FTS5 + vector) search over a local SQLite store

Downloads

280

Readme

@voidwire/corpus

Personal knowledge pipeline. Extracts signals from sources, deduplicates, and serves hybrid (FTS5 + vector) search over a local SQLite store.

What it is

Corpus collects signals from project artifacts (transcripts, operator logs, session state, READMEs, commits, notes) into a holding tank, runs LLM-based consolidation (dedup + synthesis), and writes the result to a local SQLite database with FTS5 full-text search and sqlite-vec vector embeddings.

The library exposes a single entry point: a CorpusApi factory that opens the corpus DB and provides hybridSearch (FTS5 BM25 + vector cosine fused), plus the universal ingestion gate insertCorpusEntry.

A corpus CLI ships alongside the library for batch indexing, search, health checks, and one-shot migrations.

Install

bun add @voidwire/corpus

Bun >= 1.0.0 required. Corpus uses bun:sqlite and is not Node-compatible without compilation changes.

Public API

The package exports exactly five symbols from @voidwire/corpus:

createCorpusDb(dbPath: string): CorpusApi

Factory that opens (or creates) the corpus SQLite database at dbPath, runs migrations, loads the sqlite-vec extension, and returns a CorpusApi instance. Lazy: the database file is not touched until the first method call on the returned api.

getCorpusPath(): string

Returns the configured corpus DB path from ~/.config/corpus/config.toml. Lazy — getConfig() fires only when this function is called, never at module evaluation. Use as the argument to createCorpusDb for consumers that want to honour the user's configured path.

interface CorpusApi

Methods (selected): getDb(), insertCorpusEntry(entry, embedding, opts?), hybridSearch(query, options?), plus query helpers (queryByProject, queryByAgentType, etc.) and a few internal helpers used by the consolidation engine.

interface HybridSearchOptions

{ limit?, source?, type?, project?, sourceFilter?, sourceExclude?, includeSuperseded? } — passed to hybridSearch. sourceFilter whitelists sources; sourceExclude blacklists them (exclusion wins on conflict).

interface HybridSearchResult

{ rowid, source, title, content, topic, type, timestamp, score, vectorScore, textScore } — one row per match, fused score in [0, 1].

Quick start

import { createCorpusDb, getCorpusPath } from "@voidwire/corpus";

const corpus = createCorpusDb(getCorpusPath());
const results = await corpus.hybridSearch("session state worker hook", {
  limit: 10,
});
for (const r of results) {
  console.log(`${r.score.toFixed(3)}  ${r.source}  ${r.title}`);
}

Configuration

Corpus reads its config from ~/.config/corpus/config.toml. Required keys (validated at first getConfig() call — missing keys throw):

  • custom_sqlite — path to the SQLite library that supports loadable extensions
  • llm_service — service identifier consumed by @voidwire/llm-core
  • transcripts_dir — Claude Code transcript root
  • projects_dir — root of project directories scraped for state-worker signals
  • assistant_sessions_dir — session JSONL root
  • assistant_operators_dir — operator log root
  • pipeline_log_path — absolute path for pipeline append-log
  • corpus_path — absolute path to corpus.db
  • tank_path — absolute path to the holding-tank DB

Optional indexer paths and embed-provider overrides are also read from this file. Run corpus init to write a starter config.

CLI

The package installs a corpus binary:

  • corpus init [--dry-run] [--force] — bootstrap config.toml and XDG dirs
  • corpus index [source] [--rebuild] — run batch indexers. corpus index personal is idempotent: re-runs skip already-indexed entries (matched by a stable input_hash) and retire entries removed from source. Add --refresh-enrichment (personal/all) to clear all personal rows and re-index with fresh LLM enrichment — run this once after upgrading to the idempotent indexer to reset pre-existing personal rows that predate input_hash.
  • corpus search "<query>" [--source X] [--project Y] [--type Z] [--limit N] [--json] — hybrid search
  • corpus doctor — read-only health check (config, DB, sqlite-vec, embed, LLM, plugins)
  • corpus stats [--growth] [--starvation] [--json] — corpus composition + optional views
  • corpus aging [--dry-run] [--verbose] — type-aware nightly aging (transition rows to confidence=stale / kind=archived); pinned rows are protected
  • corpus delete --source <name> [--confirm] — cascade-delete every row of a source across all tables in one transaction; dry-run by default, --confirm mutates and writes an audit JSONL to ~/.local/state/corpus/audit/
  • corpus migrate-from-lore [--dry-run] [--resume] [--lore-db <path>] — one-shot migration from a legacy lore.db

Every subcommand accepts --help / -h for usage; corpus --help lists all subcommands.

launchd schedules (operator install)

Two launchd plists ship in the repo. Install per-user with:

cp info.voidwire.corpus.pipeline.plist ~/Library/LaunchAgents/
cp info.voidwire.corpus.aging.plist    ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/info.voidwire.corpus.pipeline.plist  # daily 03:00
launchctl load ~/Library/LaunchAgents/info.voidwire.corpus.aging.plist     # daily 04:00

Exit codes: 0 success, 1 usage error, 2 runtime error.

Runtime requirement

Bun >= 1.0.0. The package ships TypeScript source (.ts) — Bun resolves it directly. There is no Node build target.

License

MIT