@nzpr/kb

v0.1.30

Published a month ago

Knowledge base CLI for querying and curating agent knowledge.

Maintainers: nzpr

Keywords: knowledge-base, cli, postgres, pgvector, llm, rag

@nzpr/kb is a CLI for working with a GitHub-backed knowledge base that is synced into a vector database.

A knowledge entry is intentionally minimal:

title
text

The repo is the authority for approved knowledge. The vector database is a published search index built from that repo.

Install

npm install -g @nzpr/kb

Node.js 20+ is required.

Commands

kb init-repo [--interactive] [--layout repo-root|nested-kb] [--dir PATH] [--repo OWNER/REPO]
kb create --title TEXT [--text TEXT|--text-file PATH|--text-stdin] [--path RELATIVE_PATH] [--repo OWNER/REPO] [--yes|--create-anyway] [--json]
kb edit --title TEXT [--text TEXT|--text-file PATH|--text-stdin] [--path RELATIVE_PATH] [--repo OWNER/REPO] [--yes] [--json]
kb delete [--path RELATIVE_PATH] [--title TEXT] [--text TEXT] [--reason TEXT] [--repo OWNER/REPO] [--yes] [--json]
kb get <doc_id|path|title> [--json]
kb status <doc_id|path|title> [--repo OWNER/REPO] [--json]
kb search <query> [--json]
kb ask <question>
kb list [--json] [--with-proposals]
kb catalog [--json]
kb publish [--docs-root PATH] [--knowledge-root PATH]
kb doctor

Run kb <command> --help for exact runtime requirements.

Core Model

There are three separate surfaces:

GitHub repo: the source of truth for approved knowledge documents
GitHub issues and workflows: the proposal, review, approval, and materialization pipeline
Vector database: the published retrieval index used by kb search and kb ask

That separation matters:

creators do not write to the default branch directly
reviewers approve knowledge on the issue
GitHub Actions writes the approved Markdown into the repo
kb publish syncs the repo state into the vector DB

Knowledge Format

Knowledge is stored as Markdown in the knowledge repo under docs/ or kb/docs/.

Each document should contain only a title heading followed by the body text:

# Title

Text that should be retrieved by search.

During publish, the title and body are extracted from the Markdown document and written into the database as one searchable document row.

Document paths are always relative to the docs root.

use standards/example.md
do not use docs/standards/example.md
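
As a rough illustration of the extraction and path rule described above, the sketch below splits a Markdown document into title and body and strips a leading docs root from a path. The function names split_title_and_text and docs_relative_path are hypothetical and not part of @nzpr/kb; the real parser may handle more cases.

```python
from pathlib import PurePosixPath


def split_title_and_text(markdown: str) -> tuple[str, str]:
    """Return (title, text) from a '# Title' document (hypothetical helper)."""
    lines = markdown.strip().splitlines()
    if not lines or not lines[0].startswith("# "):
        raise ValueError("document must start with a '# Title' heading")
    title = lines[0][2:].strip()
    text = "\n".join(lines[1:]).strip()
    return title, text


def docs_relative_path(path: str, docs_root: str = "docs") -> str:
    """Strip a leading docs root so the path is relative to it."""
    p = PurePosixPath(path)
    root = PurePosixPath(docs_root)
    if p.parts[: len(root.parts)] == root.parts:
        p = PurePosixPath(*p.parts[len(root.parts):])
    return str(p)
```

With this sketch, docs/standards/example.md and standards/example.md both normalize to standards/example.md, which is the form the CLI expects.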

Read Workflow

Most agents and applications only need read access to the published KB.

export KB_DATABASE_URL=postgresql://USER:PASSWORD@HOST:5432/DB

kb ask "How do we manage knowledge?"
kb search "knowledge approval"
kb get how-we-manage-knowledge
kb status how-we-manage-knowledge
kb list --with-proposals

What the read side does:

kb ask returns the best matching document text when there is a strong match
kb ask prints a short no-match response when nothing relevant is in the DB
kb search shows the closest matching documents and their paths
kb get prints the full live document body from the DB
kb status shows the live document and any open GitHub proposal for the same entry
kb list --with-proposals shows live entries and open KB issues together
readers do not need GitHub access

For automation, prefer --json on search, get, status, and list.
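
For agents that call the CLI from code, a thin wrapper along these lines may be enough. It assumes kb is on PATH and KB_DATABASE_URL is set; the helper name kb_json and the injectable runner are illustrative, and the JSON payload is returned as-is because its shape is not specified here.

```python
import json
import subprocess


def kb_json(args, runner=subprocess.run):
    """Run `kb <args> --json` and return the parsed JSON payload."""
    cmd = ["kb", *args, "--json"]
    result = runner(cmd, capture_output=True, text=True, check=True)
    # The payload shape is not documented here, so it is returned unchanged.
    return json.loads(result.stdout)
```

For example, kb_json(["search", "knowledge approval"]) runs kb search "knowledge approval" --json and raises if the command exits with a non-zero status.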

When to use each command:

kb ask when you want the best answer first and do not care which entry it comes from
kb search when you want to inspect several possible matches
kb get when you already know the entry you want and need the full body
kb status when you need to know whether the live DB entry has an open proposal in GitHub

Proposal Workflow

Use the write commands only when the knowledge itself needs to change:

kb create: propose a new entry
kb edit: revise an existing entry
kb delete: remove an obsolete entry

export KB_GITHUB_REPO=owner/repo
export GITHUB_TOKEN=...

kb create \
  --title "How we manage knowledge" \
  --text-file ./knowledge.txt

What kb create does:

works against the remote GitHub knowledge repo
inspects existing remote documents
computes semantic and lexical similarity against existing entries
shows close matches
lets you choose create, edit, or cancel when run interactively
supports non-interactive runs with --create-anyway or --yes
supports large bodies through --text-file and --text-stdin
returns structured machine-readable output with --json

What this prevents:

duplicate entries for the same topic
creating a second knowledge document when an update to an existing one is the right move
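
To make the duplicate check more concrete, here is a minimal sketch of combining a lexical and a semantic similarity signal. It is not the package's implementation: the Jaccard word overlap, the max() combination, the 0.5 threshold, and all function names are assumptions chosen for illustration.

```python
import math
import re


def lexical_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def close_matches(new_text, new_vec, existing, threshold=0.5):
    """existing: iterable of (path, text, vector). Returns matches, best first."""
    scored = []
    for path, text, vec in existing:
        # Assumption: an entry is "close" if either signal is strong.
        score = max(lexical_similarity(new_text, text), cosine_similarity(new_vec, vec))
        if score >= threshold:
            scored.append((path, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```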

Idempotence:

if the same open proposal already exists, kb create returns that issue instead of opening another one
transient GitHub 5xx and 429 failures are retried automatically with backoff
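
The retry behavior can be pictured as exponential backoff around a request. The sketch below is an assumption about the general technique, not the CLI's code: the attempt count, base delay, and function names are made up for illustration.

```python
import time


def is_transient(status: int) -> bool:
    """429 (rate limited) and 5xx responses are treated as retryable."""
    return status == 429 or 500 <= status <= 599


def call_with_backoff(request, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call request() -> (status, body); retry transient failures with backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = request()
        if not is_transient(status) or attempt == max_attempts - 1:
            return status, body
        sleep(base_delay * 2 ** attempt)  # delay doubles after each failure
```

Passing sleep as a parameter keeps the helper easy to test without real waiting.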

If you already know the target path for the new document, pass --path. That path is relative to the docs root, not repo root.

Edit Workflow

Use kb edit when you are not satisfied with existing knowledge and want to revise it.

export KB_GITHUB_REPO=owner/repo
export GITHUB_TOKEN=...

kb edit \
  --title "Updated knowledge title" \
  --text-stdin < ./revised-knowledge.txt

What kb edit does:

works against the remote GitHub knowledge repo
finds the best existing matching document semantically
lets you confirm which existing document should be edited
supports fully scripted runs with --yes
reopens the prior proposal issue for that exact document path
rewrites the issue body with the new title and text
removes kb-approved so approval must be explicit and fresh
returns JSON with --json

If you already know the exact document to revise, pass --path.

What this prevents:

editing the wrong document by accident
silently creating a second document instead of revising the real one
preserving stale approval after the content has changed

Idempotence:

if the same open edit proposal already exists, kb edit returns it
if a different proposal is already open for that path, kb edit fails instead of creating ambiguity
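
The idempotence rules above amount to a small decision procedure. The following sketch models it under stated assumptions: a proposal counts as "the same" when path, title, and text all match, and the Proposal class, plan_edit, and rewrite_issue are hypothetical names rather than anything exported by the package.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    path: str    # document path relative to the docs root
    title: str
    text: str
    labels: set


def plan_edit(open_proposals, path, title, text):
    """Return ("existing", proposal) or ("reopen", None); raise on conflict."""
    for proposal in open_proposals:
        if proposal.path != path:
            continue
        if (proposal.title, proposal.text) == (title, text):
            return "existing", proposal  # same proposal: reuse it
        raise RuntimeError(f"a different proposal is already open for {path}")
    return "reopen", None  # caller reopens the prior issue for this path


def rewrite_issue(proposal, title, text):
    """Apply new content and drop approval so it must be granted again."""
    proposal.title, proposal.text = title, text
    proposal.labels.discard("kb-approved")
    return proposal
```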

Delete Workflow

Use kb delete when an entry should be removed entirely.

kb delete \
  --title "How we do CI/CD" \
  --text "deploy rollback health checks"

Or, if you already know the exact document:

kb delete --path entries/how-we-do-ci-cd.md --reason "Superseded by service-specific knowledge."

That path is also relative to the docs root.

What kb delete does:

works against the remote GitHub knowledge repo
finds the best existing matching document, or uses --path exactly
opens or reopens a delete proposal issue for that document path
marks the GitHub issue title as kb: [DELETE] ...
keeps approval manual through kb-approved
supports --yes for scripted target selection
returns JSON with --json

Delete is repo-first:

the PR removes the Markdown document from the knowledge repo
the next publish removes that document from the vector DB

If one document needs to be split into two, compose the existing commands:

create the first new entry with kb create
create the second new entry with kb create
remove the original entry with kb delete

That keeps the model simple:

add knowledge with kb create
revise knowledge with kb edit
remove knowledge with kb delete

Duplicate Handling and State Visibility

Duplicate protection is part of the authoring flow.

kb create:

looks for semantically similar existing knowledge
shows likely matches before creating anything
lets you cancel or switch to editing one of those entries
can be forced non-interactively with --create-anyway or --yes

kb edit:

looks for the best existing matching knowledge
asks you to confirm the edit target when more than one plausible match exists
is anchored to the existing document path once selected
fails if a different proposal is already open for that path

kb delete:

can resolve the target semantically when you do not pass --path
asks you to confirm which existing document is being removed

Approval is always path-specific:

the issue body stores ### Relative Path
the workflow materializes exactly that document path
the publish step syncs the repo state that exists after merge
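
Because the issue body carries the path under a ### Relative Path heading, a workflow step could read it back with a small parser like the one below. This is a sketch only: read_issue_section is a hypothetical helper, and any heading other than Relative Path is an assumption about the issue template.

```python
import re


def read_issue_section(body: str, heading: str):
    """Return the text under a '### <heading>' line, or None if absent."""
    pattern = rf"^###\s+{re.escape(heading)}\s*\n(.*?)(?=^###\s|\Z)"
    match = re.search(pattern, body, flags=re.MULTILINE | re.DOTALL)
    return match.group(1).strip() if match else None
```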

State inspection commands:

kb get <ref> prints the full live entry body from the DB
kb status <ref> shows the live entry plus any open proposal issue for the same document path
kb list --with-proposals shows the whole live set together with pending KB issues
each of those commands supports --json for agent use

Review and Approval Workflow

Review happens on GitHub issues.

The standard review loop is:

creator runs kb create, kb edit, or kb delete
KB proposal issue is created or reopened
reviewer edits the issue if needed
reviewer manually adds kb-approved
GitHub Actions materializes the approved file changes as a PR
the PR auto-merges
publish runs and syncs the repo into the vector DB

Important rule:

approval is manual and explicit through the kb-approved label
reopened edit and delete issues do not keep approval; kb-approved must be added again

GitHub Actions Workflows

kb init-repo scaffolds two workflows into the knowledge repo:

kb-issue-to-pr

Triggered on:

issue labeled

Behavior:

runs only when the added label is kb-approved
converts the approved issue into the exact Markdown file changes
creates a PR with those file changes
auto-merges the PR
refreshes main and runs kb publish in the same workflow, so the DB is updated even though the merge was performed by GitHub Actions (a merge made by the workflow itself may not trigger kb-publish on its own)
comments back on the issue with the PR number
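
The label gate can be expressed as a predicate over the GitHub issues event payload. The sketch below assumes the standard webhook shape, in which a labeled event carries the newly added label under label.name; should_materialize is a hypothetical name, and the scaffolded workflow may express the same check differently.

```python
def should_materialize(event: dict) -> bool:
    """True only for a labeled event whose newly added label is kb-approved."""
    return (
        event.get("action") == "labeled"
        and event.get("label", {}).get("name") == "kb-approved"
    )
```

Checking the added label rather than the issue's full label set means that adding an unrelated label to an already approved issue does not materialize it a second time.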

kb-publish

Triggered on:

push to main that changes docs/** or workflow files
manual workflow_dispatch

Behavior:

checks out the repo
installs @nzpr/kb
runs kb publish
writes the live repo state into the vector DB

Issue creation and publishing are intentionally separate:

authoring commands talk to GitHub
publish talks to the DB
merged repo state is what gets indexed

Vector Database and Publish

kb publish is the sync step from repo to retrieval index.

export KB_DATABASE_URL=postgresql://USER:PASSWORD@HOST:5432/DB

kb publish --docs-root ./docs

What publish does:

reads every Markdown doc under the docs root
extracts title and body
computes embeddings for each document
upserts documents into the database
removes database rows for docs that no longer exist in the repo

The result is:

repo state and database state stay aligned
the vector DB reflects what is actually approved and merged
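
The upsert-and-remove behavior is a plain set reconciliation between repo paths and stored paths. A minimal sketch, assuming documents are keyed by their docs-relative path; plan_sync is a hypothetical name and embeddings are left out for brevity.

```python
def plan_sync(repo_docs: dict, db_paths: set):
    """repo_docs maps relative path -> (title, text); db_paths are stored paths."""
    upserts = sorted(repo_docs)                   # every repo doc is written
    deletes = sorted(db_paths - set(repo_docs))   # rows with no repo doc are removed
    return upserts, deletes
```

Because the plan is derived entirely from the repo state, rerunning it after a divergence brings the database back in line.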

Semantic Search

kb search and kb ask query the vector database, not GitHub.

export KB_DATABASE_URL=postgresql://USER:PASSWORD@HOST:5432/DB

kb search "deployment rollback rule"
kb ask "How do we approve a knowledge proposal?"

How retrieval works:

lexical search over document text
semantic search over stored embeddings
score merge of lexical and semantic signals
optional cross-encoder reranking over the top retrieved candidates
weak matches are filtered out

kb ask currently returns the best retrieved guidance with sources. It is a retrieval wrapper, not a generative answer engine.
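
One common way to merge lexical and semantic signals is a weighted sum followed by a cutoff. The sketch below illustrates that idea only; the equal weighting, the 0.3 cutoff, the assumption that both scores are normalized to [0, 1], and the name merge_scores are not taken from the package.

```python
def merge_scores(lexical: dict, semantic: dict, weight=0.5, min_score=0.3):
    """Blend per-document scores in [0, 1] and drop weak matches."""
    merged = {}
    for doc in set(lexical) | set(semantic):
        score = weight * lexical.get(doc, 0.0) + (1 - weight) * semantic.get(doc, 0.0)
        if score >= min_score:
            merged[doc] = score
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)
```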

Embeddings

Authoring commands and publish use the same embedding runtime configuration when semantic matching is needed.

Supported modes:

local-hash
bge-m3-openai

Example remote embedding config:

export KB_EMBEDDING_MODE=bge-m3-openai
export KB_EMBEDDING_API_URL=https://embeddings.example.com/v1/embeddings
export KB_EMBEDDING_MODEL=BAAI/bge-m3
export KB_EMBEDDING_API_KEY=...

Use local-hash for cheap local matching. Use bge-m3-openai for higher quality semantic matching and search.
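
If the remote endpoint follows the OpenAI-style embeddings convention that the mode name suggests, a request built from these variables might look like the sketch below. The payload fields (model, input) and Bearer authentication are assumptions about the endpoint, and build_embedding_request is a hypothetical helper that constructs the request without sending it.

```python
import json
import os
import urllib.request


def build_embedding_request(texts, env=os.environ):
    """Build (not send) a POST request from the KB_EMBEDDING_* variables."""
    payload = {"model": env["KB_EMBEDDING_MODEL"], "input": list(texts)}
    return urllib.request.Request(
        env["KB_EMBEDDING_API_URL"],
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {env['KB_EMBEDDING_API_KEY']}",
        },
        method="POST",
    )
```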

Optional Reranking

If your first-stage retrieval is decent but the ordering is still weak, enable reranking.

Current integration:

stage 1: existing hybrid retrieval from PostgreSQL full-text search plus pgvector
stage 2: optional reranking of the top retrieved candidates through a cross-encoder endpoint
storage and publish flow do not change

Expected runtime configuration:

export KB_RERANKER_MODE=tei-rerank
export KB_RERANKER_API_URL=https://your-reranker-host/rerank
export KB_RERANKER_MODEL=BAAI/bge-reranker-v2-m3
export KB_RERANKER_API_KEY=...
export KB_RERANKER_TOP_K=10

This is intended for kb search and kb ask.
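
Second-stage reranking only reorders the head of the candidate list. The sketch below shows that step under the assumption that the reranker returns a list of {"index", "score"} items referring to positions in the submitted candidates, as TEI-style rerank endpoints typically do; apply_rerank is a hypothetical name.

```python
def apply_rerank(candidates, rerank_response, top_k=10):
    """Reorder the first top_k candidates by reranker score; keep the tail as-is."""
    head, tail = candidates[:top_k], candidates[top_k:]
    ranked = sorted(rerank_response, key=lambda item: item["score"], reverse=True)
    reordered = [head[item["index"]] for item in ranked if item["index"] < len(head)]
    # Candidates the reranker did not score keep their first-stage order.
    seen = {item["index"] for item in ranked}
    reordered += [doc for i, doc in enumerate(head) if i not in seen]
    return reordered + tail
```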

Consumer Workflow

For users or agents consuming knowledge, the common flow is:

export KB_DATABASE_URL=postgresql://USER:PASSWORD@HOST:5432/DB

kb search "deployment rule"
kb ask "How do we deploy safely?"
kb get deployment-safety
kb status deployment-safety
kb list
kb list --with-proposals
kb catalog --json
kb doctor

That is the normal path. Most usage should be kb search, kb ask, kb get, and kb status, not kb create or kb edit.

Repo Bootstrap

Use kb init-repo to scaffold a knowledge repo and configure the GitHub automation.

kb init-repo --interactive

It creates:

issue template for KB proposals
kb-issue-to-pr workflow
kb-publish workflow
docs directory scaffold

For remote bootstrap, kb init-repo --repo ... needs a GitHub token with repository admin access.

Notes

kb publish does not need GitHub credentials.
kb create, kb edit, and kb delete are GitHub-first authoring commands.
The repo is the approved truth; the vector DB is the published index.
If the repo and DB ever diverge, rerun publish from the repo state.
