knowledgebased
v0.7.1
Reusable MCP server for project-local knowledge graphs with semantic search
A reusable Model Context Protocol server that provides semantic search and a tag-based knowledge graph for any project. Auto-discovers a knowledge directory from cwd; silently disables when absent.
Written in TypeScript. Uses local sentence-transformer embeddings (Xenova/multilingual-e5-small) — no API keys, no network calls after the first model download.
Features
- 🔍 Semantic search — embedding-based natural language queries (multilingual)
- 🤖 RAG search — tiered results with automatic LLM summarization via MCP sampling
- 🏷️ Tag search with graph traversal — follow related: links across fragments
- 📝 Markdown fragments with YAML frontmatter — human-readable, git-friendly
- 🚀 Zero overhead when unused — exits silently if no knowledge is present
- 🔧 Flexible auto-discovery — co-located, hidden, sibling, or user-global
Quick Start
Install
npm install -g knowledgebased
# or run on demand:
npx -y knowledgebased setup
The setup command registers the server in ~/.copilot/mcp-config.json (you can also configure any MCP client manually). Once registered, the server will:
- Auto-activate in any project where knowledge is discovered
- Stay disabled (zero overhead) elsewhere
Per-repo install (any MCP client)
Add to your .mcp.json / client config:
{
  "mcpServers": {
    "knowledge": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "knowledgebased"]
    }
  }
}
Knowledge Discovery
The server discovers knowledge in two independent phases, then takes the union of the results.
Given cwd = ~/workspace/my-project/, here is every location the server checks:
~/
├── .knowledgebased.json          ← Phase 2: user-global config (always read)
├── notes/                        ← Phase 2: external KB (declared in bases)
│   └── *.md
│
└── workspace/
    ├── my-project.knowledge/     ← Phase 1 ④: sibling folder
    │   └── *.md
    │
    └── my-project/               ← cwd
        ├── .knowledge.json       ← Phase 1 ①: config pointer (highest pri)
        ├── knowledge/            ← Phase 1 ②: co-located, visible
        │   └── *.md
        ├── .knowledge/           ← Phase 1 ③: co-located, hidden
        │   └── *.md
        └── src/
Phase 1 — project source
Walks up from cwd. At each ancestor directory, tries four patterns in order — first match stops the entire walk:
| Priority | Pattern | Within git root | Beyond git root |
|----------|---------|:-:|:-:|
| ① | .knowledge.json | ✅ | ✅ (explicit intent) |
| ② | knowledge/ | ✅ | ❌ (too generic) |
| ③ | .knowledge/ | ✅ | ❌ (too generic) |
| ④ | ../<project>.knowledge/ | ✅ | ✅ (explicit naming) |
Beyond the git root, only the patterns that signal explicit intent (① config pointer and ④ sibling) are checked. If no git root is found at all, the generic patterns ② and ③ are never used and only ① and ④ apply. This prevents accidental matches with unrelated knowledge/ directories outside a project context.
Result: 0 or 1 project source (alias: repo, refs validated against cwd).
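The walk described above can be sketched roughly as follows. This is an illustrative reading of the rules in the table, not the package's actual code: the names discoverProjectSource, findGitRoot, and ProjectSource are hypothetical, and the sketch assumes a git root is detected by the presence of a .git entry.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical result shape: which pattern matched and where.
type ProjectSource = { pattern: 1 | 2 | 3 | 4; location: string };

const isDir = (p: string): boolean => fs.existsSync(p) && fs.statSync(p).isDirectory();
const isFile = (p: string): boolean => fs.existsSync(p) && fs.statSync(p).isFile();

// Assumption: the git root is the nearest ancestor containing a .git entry.
function findGitRoot(start: string): string | null {
  let dir = path.resolve(start);
  while (true) {
    if (fs.existsSync(path.join(dir, ".git"))) return dir;
    const parent = path.dirname(dir);
    if (parent === dir) return null; // reached the filesystem root
    dir = parent;
  }
}

// Walk up from cwd; the first pattern that matches ends the whole walk.
function discoverProjectSource(cwd: string): ProjectSource | null {
  const gitRoot = findGitRoot(cwd);
  let insideGit = gitRoot !== null; // generic patterns need a git root
  let dir = path.resolve(cwd);
  while (true) {
    // Pattern 1: config pointer, checked everywhere.
    const pointer = path.join(dir, ".knowledge.json");
    if (isFile(pointer)) return { pattern: 1, location: pointer };
    // Patterns 2 and 3: generic names, only at or below the git root.
    if (insideGit) {
      const visible = path.join(dir, "knowledge");
      if (isDir(visible)) return { pattern: 2, location: visible };
      const hidden = path.join(dir, ".knowledge");
      if (isDir(hidden)) return { pattern: 3, location: hidden };
    }
    // Pattern 4: sibling folder named <project>.knowledge, checked everywhere.
    const name = path.basename(dir);
    const sibling = path.join(path.dirname(dir), `${name}.knowledge`);
    if (name !== "" && isDir(sibling)) return { pattern: 4, location: sibling };
    if (dir === gitRoot) insideGit = false; // everything above is beyond the git root
    const parent = path.dirname(dir);
    if (parent === dir) return null;
    dir = parent;
  }
}
```

Because the function returns on the first match, a knowledge/ directory close to cwd shadows a .knowledge.json further up the tree, which matches the "first match stops the entire walk" rule.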
Phase 2 — external knowledge bases
Always runs (even if Phase 1 found a project source). Reads ~/.knowledgebased.json and matches cwd against repos entries.
Result: 0–N external sources (alias: base ID, refs unscoped). Both phases are unioned and deduped by canonical directory hash.
User-global config (~/.knowledgebased.json)
Defines named knowledge bases and binds them to repos:
{
  "bases": {
    "personal": "~/notes",
    "team": { "knowledge": "~/team/conventions", "cacheDir": "~/.cache/team" }
  },
  "repos": {
    "*": ["personal"],
    "~/workspace/my-project": ["team"]
  }
}
| Field | Description |
|-------|-------------|
| bases.<id> | A string path (shorthand) or { "knowledge": "...", "cacheDir": "..." }. Paths support ~ expansion. |
| repos."*" | Wildcard — these bases are active in every project. |
| repos.<path> | Array of base IDs to activate when cwd is inside this path. Longest-prefix match wins (segment-boundary, case-insensitive on Windows). |
In the example above:
- personal is available everywhere (wildcard "*")
- team is only available when working inside ~/workspace/my-project
- Fragments from external sources are prefixed with their alias: personal@notes/foo.md
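A minimal sketch of how the repos lookup might work, assuming paths have already had ~ expanded and that wildcard bases are combined with the single best path match (as the example above suggests). The function name activeBases and the caseInsensitive flag are hypothetical, not part of the package's documented API.

```typescript
type Repos = Record<string, string[]>;

// Split into segments so that "/a/bc" never matches the prefix "/a/b".
const segments = (p: string): string[] => p.split(/[\\/]+/).filter((s) => s.length > 0);

function activeBases(cwd: string, repos: Repos, caseInsensitive = false): string[] {
  const norm = (s: string): string => (caseInsensitive ? s.toLowerCase() : s);
  const cwdSegs = segments(cwd).map(norm);
  let best: { depth: number; bases: string[] } | null = null;

  for (const [prefix, bases] of Object.entries(repos)) {
    if (prefix === "*") continue; // the wildcard is handled separately below
    const preSegs = segments(prefix).map(norm);
    const matches =
      preSegs.length <= cwdSegs.length && preSegs.every((seg, i) => seg === cwdSegs[i]);
    // Longest prefix wins: keep the match with the most segments.
    if (matches && (best === null || preSegs.length > best.depth)) {
      best = { depth: preSegs.length, bases };
    }
  }

  // Wildcard bases first, then the best path match, without duplicates.
  return [...new Set([...(repos["*"] ?? []), ...(best?.bases ?? [])])];
}
```

Comparing whole segments rather than raw string prefixes is what keeps ~/workspace/my-project-2 from picking up bases bound to ~/workspace/my-project.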
Per-project config (.knowledge.json)
Points to a knowledge directory that lives elsewhere:
{ "knowledge": "../shared-kb", "cacheDir": "./.cache/embeddings" }
| Field | Required | Description |
|-------|----------|-------------|
| knowledge | optional | Path to the knowledge directory. Resolved relative to the config file. Defaults to ./knowledge. |
| cacheDir | optional | Override for the embedding cache. Defaults to ~/.cache/knowledgebased/<hash>. |
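The defaults in the table can be illustrated with a small resolver. This is a sketch under stated assumptions: the README does not say how <hash> is computed or whether cacheDir is resolved relative to the config file, so the truncated SHA-256 and the relative resolution below are guesses, and resolveProjectConfig is a hypothetical name.

```typescript
import * as path from "node:path";
import * as os from "node:os";
import { createHash } from "node:crypto";

interface KnowledgeConfig {
  knowledge?: string;
  cacheDir?: string;
}

function resolveProjectConfig(
  configFile: string,
  cfg: KnowledgeConfig,
): { knowledge: string; cacheDir: string } {
  const baseDir = path.dirname(path.resolve(configFile));
  // Documented: resolved relative to the config file, default ./knowledge.
  const knowledge = path.resolve(baseDir, cfg.knowledge ?? "./knowledge");
  // Assumption: <hash> is derived from the knowledge path; the real scheme may differ.
  const hash = createHash("sha256").update(knowledge).digest("hex").slice(0, 16);
  // Assumption: an explicit cacheDir is also resolved relative to the config file.
  const cacheDir =
    cfg.cacheDir !== undefined
      ? path.resolve(baseDir, cfg.cacheDir)
      : path.join(os.homedir(), ".cache", "knowledgebased", hash);
  return { knowledge, cacheDir };
}
```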
Validation rules
These conditions cause a loud startup error:
- repos references a non-existent base ID
- Base ID is "*", or contains @, /, or spaces
- Two bases resolve to the same canonical directory
Knowledge Fragments
Markdown files with YAML frontmatter:
---
tags: [workflow, git]
related: [workflow/branch-naming]
source: session/2026-04-21
verified: false
refs: [src/utils.ts::parseArgs]
---
# Fragment Title
Content goes here...
MCP Tools
| Tool | Description |
|------|-------------|
| search_knowledge | Tag-based search with graph traversal |
| search_semantic | Embedding-based semantic search with similarity scores |
| search_rag | Semantic search with automatic LLM summarization via MCP sampling |
| list_tags | List all tags with counts |
| list_sources | List loaded knowledge sources |
| add_knowledge | Create a new fragment |
| update_knowledge | Update an existing fragment |
| delete_knowledge | Delete a fragment permanently |
| audit_knowledge | Validate refs and related links |
| reload_sources | Re-discover sources from config |
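To illustrate what "tag-based search with graph traversal" could mean for search_knowledge, here is a minimal sketch that matches fragments by tag and then follows related links breadth-first. The Fragment shape, the searchByTags name, the match-any-tag rule, and the maxHops parameter are all assumptions for illustration; the tool's real parameters are not documented here.

```typescript
interface Fragment {
  id: string; // e.g. "workflow/branch-naming"
  tags: string[];
  related: string[]; // IDs of linked fragments
}

function searchByTags(fragments: Fragment[], tags: string[], maxHops = 1): string[] {
  const byId = new Map(fragments.map((f) => [f.id, f] as const));
  // Seed set: fragments carrying at least one requested tag.
  const seeds = fragments.filter((f) => f.tags.some((t) => tags.includes(t)));
  const visited = new Set(seeds.map((f) => f.id));

  // Breadth-first expansion along `related` links, one hop per iteration.
  let frontier = seeds;
  for (let hop = 0; hop < maxHops; hop++) {
    const next: Fragment[] = [];
    for (const frag of frontier) {
      for (const rel of frag.related) {
        const target = byId.get(rel);
        if (target !== undefined && !visited.has(rel)) {
          visited.add(rel);
          next.push(target);
        }
      }
    }
    frontier = next;
  }
  return [...visited]; // seeds first, then neighbors in discovery order
}
```

Links that point at missing fragments are skipped silently in this sketch; reporting them is the kind of check audit_knowledge is described as performing.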
Which search tool to use?
User question
│
├─ "What topics does the KB cover?" → search_semantic (explore)
│ Low threshold, scan fragment titles and scores.
│
├─ "How does X work?" → search_rag (answer)
│ Returns concise summary + references.
│ If key details are missing, follow up with search_knowledge.
│
└─ "Give me everything about Y" → search_knowledge (enumerate)
     tags=["Y"], returns full unabridged content.
search_rag — RAG-style search
search_rag combines semantic search with MCP client sampling to deliver concise, query-aware results. Results are split into tiers:
| Tier | Score | Behavior |
|------|-------|----------|
| direct | ≥ directThreshold (0.85) | Full content returned verbatim |
| related | One-hop graph neighbors of direct hits | Summarized via LLM sampling |
| summarized | ≥ threshold (0.80), < directThreshold | Summarized via LLM sampling |
Every response includes a references table listing all used fragments with their similarity score, tier, and reason for inclusion.
When the MCP client doesn't support sampling, summarized/related fragments fall back to metadata-only output (title, tags, and a content preview).
Parameters:
| Parameter | Default | Description |
|-----------|---------|-------------|
| query | — | Natural language search query |
| threshold | 0.80 | Minimum similarity score for inclusion |
| directThreshold | 0.85 | Score at or above which fragments are returned verbatim |
| maxTokens | 500 | Max tokens for the LLM summary |
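The tiering rules can be sketched as a pure function over scored hits. This is an interpretation of the tier table, not the server's implementation: the Hit shape and assignTiers name are hypothetical, and the sketch assumes that a one-hop neighbor of a direct hit is labeled related even when its own score would otherwise place it in the summarized tier.

```typescript
type Tier = "direct" | "related" | "summarized";

interface Hit {
  id: string;
  score: number; // similarity score from semantic search
  related: string[]; // IDs linked from this fragment
}

function assignTiers(hits: Hit[], threshold = 0.8, directThreshold = 0.85): Map<string, Tier> {
  const tiers = new Map<string, Tier>();

  // Pass 1: direct hits score at or above directThreshold.
  for (const h of hits) {
    if (h.score >= directThreshold) tiers.set(h.id, "direct");
  }
  // Pass 2: one-hop graph neighbors of direct hits.
  for (const h of hits) {
    if (tiers.get(h.id) !== "direct") continue;
    for (const rel of h.related) {
      if (!tiers.has(rel)) tiers.set(rel, "related");
    }
  }
  // Pass 3: remaining hits in [threshold, directThreshold) get summarized.
  for (const h of hits) {
    if (!tiers.has(h.id) && h.score >= threshold) tiers.set(h.id, "summarized");
  }
  return tiers;
}
```

With the default thresholds, a fragment scoring 0.82 lands in the summarized tier, while one scoring 0.70 is dropped unless a direct hit links to it.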
CLI Commands
knowledgebased setup # Register globally in ~/.copilot/mcp-config.json
knowledgebased init # Create knowledge/ in cwd
knowledgebased init --knowledge ../other/kb # Create .knowledge.json pointing elsewhere
Development
npm install
npm run build # compile TS → dist/
npm test # run unit tests via node:test + tsx
npm start # run from compiled output
npm run watch # incremental rebuild
License
MIT
