@harrylabs/llm-knowledge-bases

v0.4.3

Representation-first multimodal Markdown wiki runtime for Obsidian vaults, with standalone CLI, MCP server, and OpenClaw compatibility.

LLM Knowledge Bases

Inspired by a public workflow shared by Andrej Karpathy (@karpathy): turn raw text, PDFs, images, and structured data into a living Markdown wiki that compounds with every question.

@harrylabs/llm-knowledge-bases is the deterministic runtime behind that workflow. It ships as:

  • a standalone CLI for directly running the kb_* workflow
  • a stdio MCP server for Claude Code, Codex, Cursor, Gemini CLI, and other MCP-capable agents
  • a config generator for wiring that MCP server into different clients
  • an OpenClaw-compatible host entry for teams that also use OpenClaw

If you want the workflow-first entry point, start with the companion skill. Use this package when you want the underlying runtime as an installable CLI/MCP toolchain.

What 0.4.1 Implements

This release makes the runtime representation-first and explicitly multimodal:

  • a raw/wiki/schema operating model with runtime-owned structure and agent-owned synthesis
  • supported raw kinds for text (.md, .txt), PDFs, images (.png, .jpg, .jpeg, .webp, .gif, .svg), and structured data (.csv, .tsv, .json, .html)
  • manifest schema version 2, including raw_kind, mime_type, size_bytes, asset_refs, and stored representations
  • source-id repair through kb_repair_source_ids, so stale source doc ids, source note paths, and raw hashes can be repaired without throwing away readable existing ids
  • stable non-ASCII source ids plus deterministic repair workflows, so legacy src-untitled-* records are migrated forward instead of being preserved by stale manifest state
  • safe raw-asset inspection through kb_get_raw_asset, including deterministic metadata plus a safe absolute path for local viewers
  • full compile context through kb_prepare_source_bundle, including asset refs, stored representations, and compile_readiness
  • runtime-managed representation storage under .llm-kb/representations/ through kb_prepare_representation, kb_upsert_representation, and kb_read_representations
  • compile-readiness tracking with ready, partial, and needs_representation
  • source note validation that keeps raw_kind, mime_type, and asset_paths aligned with the actual reviewed assets
  • archived output notes plus first-class concept, entity, and synthesis note support
  • deterministic gap mapping and promotion through kb_map_gaps and kb_promote_gap
  • generated wiki/index.md, wiki/log.md, and collection indexes, now with raw-kind labels on source pages
  • deterministic lint for schema and wiki health, including warnings for missing representation trails, stale representations, inconsistent asset_paths, isolated pages, stale source coverage, unsupported claims, contradiction candidates, and missing high-value pages
  • CLI and MCP wrappers around the same runtime contract
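
The supported raw kinds listed above can be pictured as a lookup from file extension to kind. The following is a minimal Python sketch for illustration only, not the runtime's own code: the kind labels ("text", "pdf", "image", "structured"), the `.pdf` extension, and both function names are assumptions, while the other extensions come from the list above.

```python
from pathlib import PurePosixPath
from typing import Optional

# Extensions are taken from the supported raw kinds listed above.
# The kind labels and the ".pdf" extension are assumptions for illustration.
RAW_KINDS = {
    "text": {".md", ".txt"},
    "pdf": {".pdf"},
    "image": {".png", ".jpg", ".jpeg", ".webp", ".gif", ".svg"},
    "structured": {".csv", ".tsv", ".json", ".html"},
}


def classify_raw_kind(raw_path: str) -> Optional[str]:
    """Return the assumed raw kind for a path, or None if the extension is unsupported."""
    suffix = PurePosixPath(raw_path).suffix.lower()
    for kind, extensions in RAW_KINDS.items():
        if suffix in extensions:
            return kind
    return None


def uses_representation_first_path(raw_path: str) -> bool:
    """PDFs and images are the kinds described as representation-first."""
    return classify_raw_kind(raw_path) in {"pdf", "image"}


if __name__ == "__main__":
    for path in ("raw/notes/example.md", "raw/papers/report.pdf", "raw/data/table.csv"):
        print(path, classify_raw_kind(path), uses_representation_first_path(path))
```

An agent could use a check like this to decide up front whether a raw file can be compiled directly or needs stored representations first.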

Multimodal Ingest Model

The runtime now supports two ingest paths:

  1. Text and structured data can still compile directly from raw/ with kb_prepare_source and kb_read_raw.
  2. PDFs and images use a representation-first path:
    • inspect the asset with kb_get_raw_asset
    • inspect compile readiness with kb_prepare_source_bundle
    • store intermediate OCR, vision, page notes, metadata, or profiles under .llm-kb/representations/
    • compile the final source note only after the representation trail is present

The runtime intentionally does not perform OCR or vision itself. Instead, it gives agents a canonical place to store those intermediate artifacts and then validates that the final wiki pages stay grounded in them.
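
To make the three readiness states concrete, here is a hedged Python sketch of how a readiness value could be derived from the representations stored for an asset. The rule itself is an assumption, since the README does not document how the runtime decides between ready, partial, and needs_representation. The function name and the idea of a "required kinds" set are hypothetical; the kinds `metadata` and `ocr_text` are borrowed from the CLI examples.

```python
from typing import Iterable


def compile_readiness(stored_kinds: Iterable[str], required_kinds: Iterable[str]) -> str:
    """Assumed rule: compare stored representation kinds against a required set.

    - "ready": every required kind is stored (or nothing is required)
    - "partial": some, but not all, required kinds are stored
    - "needs_representation": none of the required kinds are stored
    """
    stored = set(stored_kinds)
    required = set(required_kinds)
    if required <= stored:
        return "ready"
    if stored & required:
        return "partial"
    return "needs_representation"


if __name__ == "__main__":
    required = ["metadata", "ocr_text"]
    print(compile_readiness([], required))
    print(compile_readiness(["metadata"], required))
    print(compile_readiness(["metadata", "ocr_text"], required))
```

In practice the authoritative value is the compile_readiness field returned by kb_prepare_source_bundle; a local rule like this is only useful for reasoning about what an agent still has to produce.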

Default Vault Shape

<vault>/
  raw/
  wiki/
    sources/
    outputs/
    concepts/
    entities/
    syntheses/
    _indexes/
    index.md
    log.md
  .llm-kb/
    manifest.json
    runs.jsonl
    representations/

CLI Commands

The standalone CLI exposes the runtime surface directly:

llm-knowledge-bases kb_status --vault-root /vault
llm-knowledge-bases kb_list_raw --vault-root /vault --changed-only
llm-knowledge-bases kb_read_raw --vault-root /vault --raw-path raw/notes/example.md
llm-knowledge-bases kb_get_raw_asset --vault-root /vault --raw-path raw/papers/report.pdf
llm-knowledge-bases kb_prepare_source --vault-root /vault --raw-path raw/notes/example.md
llm-knowledge-bases kb_prepare_source_bundle --vault-root /vault --raw-path raw/papers/report.pdf
llm-knowledge-bases kb_prepare_representation --vault-root /vault --raw-path raw/papers/report.pdf --kind ocr_text
llm-knowledge-bases kb_upsert_representation --vault-root /vault --raw-path raw/papers/report.pdf --kind ocr_text --content '<markdown>'
llm-knowledge-bases kb_read_representations --vault-root /vault --raw-path raw/papers/report.pdf --kinds metadata,ocr_text
llm-knowledge-bases kb_upsert_source_note --vault-root /vault --raw-path raw/papers/report.pdf --markdown '<full markdown>'
llm-knowledge-bases kb_prepare_output --vault-root /vault --title 'Example Query' --query 'What are the tradeoffs?'
llm-knowledge-bases kb_upsert_output --vault-root /vault --markdown '<full markdown>'
llm-knowledge-bases kb_prepare_derived_note --vault-root /vault --kind concept --title 'Agent Memory'
llm-knowledge-bases kb_upsert_derived_note --vault-root /vault --markdown '<full markdown>'
llm-knowledge-bases kb_map_gaps --vault-root /vault --limit 10
llm-knowledge-bases kb_promote_gap --vault-root /vault --note-id synthesis-retrieval-vs-memory
llm-knowledge-bases kb_repair_source_ids --vault-root /vault
llm-knowledge-bases kb_repair_source_ids --vault-root /vault --apply
llm-knowledge-bases kb_rebuild_indexes --vault-root /vault
llm-knowledge-bases kb_search --vault-root /vault --query 'agent memory' --types source,concept,synthesis
llm-knowledge-bases kb_read_notes --vault-root /vault --paths wiki/index.md,wiki/concepts/concept-agent-memory.md
llm-knowledge-bases kb_lint --vault-root /vault
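
When scripting these commands, it can help to build the argument list programmatically instead of concatenating strings. The Python sketch below assembles argv lists that mirror the invocations shown above. The helper name `kb_argv` and its keyword-to-flag convention are assumptions for illustration; only the binary name, tool names, and flags that appear in the list above are taken from this README.

```python
import shlex
from typing import List


def kb_argv(tool: str, vault_root: str, **flags: object) -> List[str]:
    """Build an argv list such as: llm-knowledge-bases <tool> --vault-root <root> ...

    Keyword names are turned into flags (raw_path -> --raw-path).
    A value of True emits a bare switch (for example --apply);
    False or None omits the flag entirely.
    """
    argv = ["llm-knowledge-bases", tool, "--vault-root", vault_root]
    for name, value in flags.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            argv.append(flag)
        elif value is not False and value is not None:
            argv.extend([flag, str(value)])
    return argv


if __name__ == "__main__":
    dry_run = kb_argv("kb_repair_source_ids", "/vault")
    apply_run = kb_argv("kb_repair_source_ids", "/vault", apply=True)
    ocr = kb_argv(
        "kb_prepare_representation",
        "/vault",
        raw_path="raw/papers/report.pdf",
        kind="ocr_text",
    )
    for argv in (dry_run, apply_run, ocr):
        print(shlex.join(argv))
```

With the package installed, a list built this way can be passed to `subprocess.run(argv, check=True)`. Passing a list rather than a shell string avoids quoting problems when `--content` or `--markdown` carries multi-line Markdown.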

MCP Tools

The MCP server exposes:

  • kb_status
  • kb_list_raw
  • kb_read_raw
  • kb_get_raw_asset
  • kb_prepare_source
  • kb_prepare_source_bundle
  • kb_prepare_representation
  • kb_upsert_representation
  • kb_read_representations
  • kb_upsert_source_note
  • kb_prepare_output
  • kb_upsert_output
  • kb_prepare_derived_note
  • kb_upsert_derived_note
  • kb_map_gaps
  • kb_promote_gap
  • kb_repair_source_ids
  • kb_rebuild_indexes
  • kb_search
  • kb_read_notes
  • kb_lint
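
For readers wiring the server into a custom client, the Python sketch below shows roughly what a tool invocation looks like on the wire. MCP uses JSON-RPC 2.0, and tools are invoked with the `tools/call` method. The argument keys (`vault_root`, `raw_path`) are assumptions modeled on the CLI flags, since this README does not list each tool's input schema, and the helper name is hypothetical.

```python
import json
from typing import Any, Dict


def tools_call_request(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 'tools/call' request for an MCP server."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    # Argument names are assumed to mirror the CLI flags; check the schema
    # the server reports via tools/list before relying on them.
    print(tools_call_request(1, "kb_status", {"vault_root": "/vault"}))
    print(
        tools_call_request(
            2,
            "kb_get_raw_asset",
            {"vault_root": "/vault", "raw_path": "raw/papers/report.pdf"},
        )
    )
```

A real client would first complete the MCP initialization handshake and would normally use an MCP SDK rather than hand-built messages; the sketch only illustrates the request shape.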

Runtime Philosophy

The runtime owns:

  • canonical paths
  • canonical IDs
  • validation
  • deterministic writes
  • manifest-backed representation tracking
  • generated wiki navigation

The agent owns:

  • summarization
  • OCR, vision, or profiling work performed outside the runtime
  • synthesis
  • deciding whether a result belongs in output, concept, entity, or synthesis
  • improving the wiki over time instead of leaving value trapped in chat

kb_prepare_source_bundle is the bridge between those layers for non-text assets: it returns the exact raw metadata, reviewed asset refs, stored representations, and readiness state the agent needs before compiling a source note. kb_map_gaps and kb_promote_gap still cover durable knowledge growth on top of that ingest layer. kb_lint stays deterministic, but now also checks whether multimodal source notes have a believable review trail before the wiki starts depending on them.

Still Out of Scope

This package still does not implement:

  • embeddings or vector search
  • database-backed indexing
  • rename tracking
  • built-in OCR, vision, or PDF parsing inside the runtime itself
  • autonomous background agents inside the package