harper-knowledge
v0.2.0
Knowledge base plugin for Harper with MCP server integration
harper-knowledge
Knowledge base for Harper, built on Harper, with MCP server integration.
A Harper sub-component plugin that provides searchable, scoped knowledge entries with vector embeddings for semantic search. Exposes a REST API, MCP endpoint, and web UI.
Consumers
- Support team — finding solutions, patterns, gotchas, customer edge cases
- DX lab "Harper expert" — backing knowledge for the AI expert role in Gas Town labs
- Claude Code / IDE assistants — Harper context via MCP without per-project CLAUDE.md files
- Any MCP client — Cursor, VS Code + Copilot, JetBrains, ChatGPT, Gemini, etc.
Quick Start
Prerequisites
- Harper >= 4.7.0
- Node.js >= 22
Install
npm install harper-knowledge
Configure
Add to your parent application's config.yaml:
"harper-knowledge":
  package: "harper-knowledge"
  embeddingModel: nomic-embed-text # default
Download the Embedding Model
The plugin uses nomic-embed-text for vector embeddings, run locally via node-llama-cpp. The model is downloaded to ~/hdb/models/ on first startup, but you can pre-download it:
# Download the default model
npm run model:download
# Download and verify with a test embedding
npm run model:test
Run
harperdb dev .
Embedding Models
Two Nomic embedding models are supported. Both run entirely on CPU via node-llama-cpp, with no cloud dependency. On first plugin startup (or when you run npm run model:download), the configured model is downloaded from Hugging Face to ~/hdb/models/. A file lock prevents multiple Harper worker threads from downloading it simultaneously.
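The file lock mentioned above can be pictured with a minimal sketch. The plugin's actual locking code is not shown in this README, so `tryAcquireLock`, `releaseLock`, and the lock file name are hypothetical. The sketch only illustrates the general exclusive-create technique, using Node's `fs.openSync` with the `wx` flag.

```typescript
import { closeSync, mkdtempSync, openSync, unlinkSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: take a lock by creating a lock file exclusively.
// The "wx" flag makes open fail with EEXIST if the file already exists,
// so only one caller can succeed at a time.
function tryAcquireLock(lockPath: string): boolean {
  try {
    closeSync(openSync(lockPath, "wx"));
    return true;
  } catch (err) {
    if ((err as { code?: string }).code === "EEXIST") return false;
    throw err;
  }
}

// Hypothetical helper: release the lock by deleting the lock file.
function releaseLock(lockPath: string): void {
  unlinkSync(lockPath);
}

// Demo in a throwaway directory (an assumption, not ~/hdb/models/).
const demoLock = join(mkdtempSync(join(tmpdir(), "kb-lock-")), "model.lock");
const first = tryAcquireLock(demoLock); // the first "worker" gets the lock
const second = tryAcquireLock(demoLock); // a second "worker" is refused
releaseLock(demoLock);
const third = tryAcquireLock(demoLock); // the lock is free again
releaseLock(demoLock);
```

A worker that fails to get the lock would typically wait and re-check for the finished model file instead of starting its own download.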
nomic-embed-text v1.5 (default)
| Property | Value |
| ---------------- | ------------------------------------------------------------------------------------------------- |
| Model | nomic-ai/nomic-embed-text-v1.5-GGUF |
| Config key | nomic-embed-text |
| Parameters | 137M |
| Dimensions | 768 |
| Quantization | Q4_K_M (~135 MB) |
| Context | 8192 tokens |
| License | Apache 2.0 |
nomic-embed-text v2 MoE
| Property | Value |
| ---------------- | ----------------------------------------------------------------------------------------------------- |
| Model | nomic-ai/nomic-embed-text-v2-moe-GGUF |
| Config key | nomic-embed-text-v2-moe |
| Parameters | 475M (Mixture of Experts) |
| Dimensions | 768 |
| Quantization | Q4_K_M |
| Context | 8192 tokens |
| License | Apache 2.0 |
The v2 MoE model is larger but produces higher-quality embeddings, especially for longer and more nuanced content.
Switching models
"harper-knowledge":
  package: "harper-knowledge"
  embeddingModel: nomic-embed-text # v1.5 (default)
  # embeddingModel: nomic-embed-text-v2-moe # v2 MoE
Architecture
harper-knowledge
├── src/
│ ├── index.ts ← plugin entry: handleApplication()
│ ├── core/ ← shared logic
│ │ ├── embeddings.ts ← model download, init, vector generation
│ │ ├── entries.ts ← CRUD + relationship management
│ │ ├── history.ts ← edit history audit log
│ │ ├── search.ts ← keyword / semantic / hybrid search
│ │ ├── tags.ts ← tag registry with counts
│ │ └── triage.ts ← webhook intake queue
│ ├── resources/ ← REST Resource classes
│ │ ├── KnowledgeEntryResource.ts
│ │ ├── TriageResource.ts
│ │ ├── TagResource.ts
│ │ ├── QueryLogResource.ts
│ │ ├── ServiceKeyResource.ts
│ │ └── HistoryResource.ts
│ ├── mcp/ ← MCP server (Streamable HTTP)
│ │ ├── server.ts
│ │ └── tools.ts
│ ├── oauth/ ← OAuth 2.1 authorization server
│ │ ├── authorize.ts
│ │ ├── keys.ts
│ │ ├── metadata.ts
│ │ ├── middleware.ts
│ │ ├── register.ts
│ │ ├── token.ts
│ │ └── validate.ts
│ ├── webhooks/ ← webhook intake (GitHub, Datadog)
│ │ ├── middleware.ts
│ │ ├── github.ts
│ │ └── datadog.ts
│ └── types.ts
├── schema/
│ └── knowledge.graphql ← table definitions (database: "kb")
├── web/ ← static web UI
├── scripts/
│ └── download-model.js ← standalone model download/test
├── config.yaml
├── package.json
└── test/
Both REST and MCP run in the Harper process and call the same core functions directly, with no additional overhead.
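As a rough illustration of that layout, the sketch below puts two thin adapters over one core function. All names here (`Entry`, `searchEntries`, `handleRestSearch`, `handleMcpSearch`) and the in-memory data are hypothetical stand-ins, not the plugin's actual code.

```typescript
// Hypothetical entry shape; the real types live in src/types.ts.
interface Entry {
  id: string;
  title: string;
}

// In-memory stand-in for the KnowledgeEntry table.
const sampleEntries: Entry[] = [
  { id: "1", title: "MQTT auth config" },
  { id: "2", title: "Replication gotchas" },
];

// Core function: the single place where search logic lives.
function searchEntries(query: string): Entry[] {
  const q = query.toLowerCase();
  return sampleEntries.filter((e) => e.title.toLowerCase().includes(q));
}

// REST adapter: parses a query string, then calls the core function.
function handleRestSearch(queryString: string): Entry[] {
  const params = new URLSearchParams(queryString);
  return searchEntries(params.get("query") ?? "");
}

// MCP adapter: takes tool arguments and wraps the same core result
// in an MCP-style text content block.
function handleMcpSearch(args: { query: string }): {
  content: { type: "text"; text: string }[];
} {
  return {
    content: [{ type: "text", text: JSON.stringify(searchEntries(args.query)) }],
  };
}
```

Because both adapters delegate to `searchEntries`, a change to the search logic reaches both interfaces at once.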
REST API
| Endpoint | Method | Auth | Description |
| ----------------------- | --------------- | ---------- | ------------------------- |
| /Knowledge/<id> | GET | Public | Get entry by ID |
| /Knowledge/?query=... | GET | Public | Search entries |
| /Knowledge/ | POST | Required | Create entry |
| /Knowledge/<id> | PUT | Required | Update entry |
| /Knowledge/<id> | DELETE | Team | Deprecate entry |
| /KnowledgeTag/ | GET | Public | List all tags |
| /Triage/ | GET | Team | List pending triage items |
| /Triage/ | POST | Service/AI | Submit triage item |
| /Triage/<id> | PUT | Team | Process triage item |
| /QueryLog/ | GET | Team | Search analytics |
| /ServiceKey/ | GET/POST/DELETE | Team | API key management |
| /History/<entryId> | GET | Public | Edit history for an entry |
Search Parameters
GET /Knowledge/?query=MQTT+auth&tags=mqtt,config&limit=10&mode=keyword&context={"harper":"5.0","storageEngine":"lmdb"}
- query — search text (required)
- tags — comma-separated tag filter
- limit — max results (default 10)
- mode — keyword, semantic, or hybrid (default)
- context — JSON applicability context for result boosting
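A client can assemble these parameters with the standard `URL` and `URLSearchParams` APIs, which also take care of encoding the JSON `context` value. The helper name `buildSearchUrl`, the `SearchOptions` type, and the `http://localhost:9926` base URL are assumptions made for this sketch.

```typescript
// Hypothetical option bag mirroring the documented query parameters.
interface SearchOptions {
  query: string;
  tags?: string[];
  limit?: number;
  mode?: "keyword" | "semantic" | "hybrid";
  context?: Record<string, string>;
}

// Build a GET /Knowledge/ search URL; optional parameters are only
// added when the caller supplies them.
function buildSearchUrl(baseUrl: string, opts: SearchOptions): string {
  const url = new URL("/Knowledge/", baseUrl);
  url.searchParams.set("query", opts.query);
  if (opts.tags?.length) url.searchParams.set("tags", opts.tags.join(","));
  if (opts.limit !== undefined) url.searchParams.set("limit", String(opts.limit));
  if (opts.mode) url.searchParams.set("mode", opts.mode);
  if (opts.context) url.searchParams.set("context", JSON.stringify(opts.context));
  return url.toString();
}

// Assumed local base URL; replace with your own host and port.
const exampleUrl = buildSearchUrl("http://localhost:9926", {
  query: "MQTT auth",
  tags: ["mqtt", "config"],
  limit: 10,
  mode: "keyword",
  context: { harper: "5.0", storageEngine: "lmdb" },
});
```

The resulting string can be passed to `fetch(exampleUrl)`. The search endpoint is listed as public in the table above, so no auth header should be needed for reads.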
MCP Endpoint
Connect any MCP-compatible client to /mcp:
{
  "mcpServers": {
    "harper-kb": {
      "url": "https://kb.harper.fast:9926/mcp"
    }
  }
}
Tools
| Tool | Description |
| --------------------- | ----------------------------------------------------------------- |
| knowledge_search | Search with keyword/semantic/hybrid modes + applicability context |
| knowledge_add | Add a new entry (auto-tagged ai-generated) |
| knowledge_get | Get entry by ID with full relationship chain |
| knowledge_update | Update an entry with edit history tracking |
| knowledge_related | Find related entries (explicit + semantic similarity) |
| knowledge_list_tags | List all tags with counts |
| knowledge_triage | Submit to triage queue for review |
| knowledge_history | Get edit history for an entry (who changed what, when, why) |
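MCP clients invoke these tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for `knowledge_search`. The argument names (`query`, `mode`, `limit`) are an assumption that mirrors the REST search parameters, since the tool's input schema is not listed here, and `buildToolCall` is a hypothetical helper.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0 envelope).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper that wraps a tool name and its arguments.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Assumed argument names, mirroring the REST query parameters.
const searchCall = buildToolCall(1, "knowledge_search", {
  query: "MQTT auth",
  mode: "hybrid",
  limit: 5,
});
```

In practice an MCP client library sends this envelope for you after the initialization handshake. The sketch only shows what travels over the wire.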
Schema
Tables in the kb database:
- KnowledgeEntry — core entries with HNSW vector index, @relationship directives for supersession/siblings/related, @createdTime/@updatedTime
- KnowledgeEntryEdit — append-only edit history audit log
- TriageItem — webhook intake queue (7-day TTL)
- KnowledgeTag — tag name as primary key with entry counts
- QueryLog — search analytics (30-day TTL)
- ServiceKey — API keys with scrypt-hashed secrets
- OAuthClient — dynamic client registrations (RFC 7591)
- OAuthCode — authorization codes (5-minute TTL)
- OAuthRefreshToken — refresh tokens (30-day TTL)
- OAuthSigningKey — RSA key pair for JWT signing
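To make "scrypt-hashed secrets" concrete, here is a minimal sketch of hashing and verifying an API key secret with Node's built-in `crypto` module. The `salt:hash` hex storage format, the key length, and the helper names are assumptions; the ServiceKey table's actual layout is not documented here.

```typescript
import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

const KEY_LENGTH = 32; // assumed derived-key length in bytes

// Hash a secret with a fresh random salt; returns "saltHex:hashHex".
function hashSecret(secret: string): string {
  const salt = randomBytes(16);
  const hash = scryptSync(secret, salt, KEY_LENGTH);
  return `${salt.toString("hex")}:${hash.toString("hex")}`;
}

// Re-derive the hash from the stored salt and compare in constant time.
function verifySecret(secret: string, stored: string): boolean {
  const [saltHex, hashHex] = stored.split(":");
  if (!saltHex || !hashHex) return false;
  const expected = Buffer.from(hashHex, "hex");
  if (expected.length !== KEY_LENGTH) return false; // malformed record
  const actual = scryptSync(secret, Buffer.from(saltHex, "hex"), KEY_LENGTH);
  return timingSafeEqual(actual, expected);
}
```

Only the salted hash is stored, so a leaked table does not reveal the original secrets.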
Applicability Scoping
Entries carry an appliesTo scope:
{
  "harper": ">=4.0 <5.0",
  "storageEngine": "lmdb",
  "node": ">=22",
  "platform": "linux"
}
Search results are boosted or demoted (never hidden) based on the caller's context.
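One way to picture boosting and demoting is a per-key comparison between an entry's `appliesTo` scope and the caller's context. The sketch below is an illustration only: the plugin's real scoring is not described here, the +1/-1 weights are invented, and the tiny range matcher is a stand-in for a full semver library.

```typescript
type Scope = Record<string, string>;

// Compare dotted numeric versions: negative if a < b, 0 if equal, positive if a > b.
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// True if `value` meets every space-separated comparator, e.g. ">=4.0 <5.0".
// Parts without an operator (e.g. "lmdb") are matched by string equality.
function satisfies(value: string, constraint: string): boolean {
  return constraint.split(/\s+/).filter(Boolean).every((part) => {
    const m = /^(>=|<=|>|<|=)?(.+)$/.exec(part);
    if (!m) return false;
    const op = m[1];
    const target = m[2] ?? "";
    if (!op) return value === target;
    const cmp = compareVersions(value, target);
    switch (op) {
      case ">=": return cmp >= 0;
      case "<=": return cmp <= 0;
      case ">": return cmp > 0;
      case "<": return cmp < 0;
      default: return cmp === 0;
    }
  });
}

// Hypothetical weights: +1 per matching key, -1 per mismatching key.
// Keys missing from the caller's context stay neutral, and nothing is filtered out.
function applicabilityAdjustment(appliesTo: Scope, context: Scope): number {
  let adjustment = 0;
  for (const [key, constraint] of Object.entries(appliesTo)) {
    const value = context[key];
    if (value === undefined) continue;
    adjustment += satisfies(value, constraint) ? 1 : -1;
  }
  return adjustment;
}

const exampleScope: Scope = {
  harper: ">=4.0 <5.0",
  storageEngine: "lmdb",
  node: ">=22",
  platform: "linux",
};
```

The adjustment would be added to a result's base relevance score, so a mismatching entry sinks in the ranking but still appears.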
Entry Relationships
- Supersedes — "This replaces that for newer versions"
- Siblings — "Same topic, different config" (e.g., LMDB vs RocksDB behavior)
- Related — loose "see also" association
Auth Model
| Role | Read | Write | Review | Manage |
| ----------------- | ---- | ---------------------------- | ------ | ------ |
| team | Yes | Yes | Yes | Yes |
| ai_agent | Yes | Yes (flagged ai-generated) | No | No |
| service_account | Yes | Triage queue only | No | No |
MCP uses OAuth 2.1 with PKCE for authentication. MCP clients discover auth requirements via /.well-known/oauth-protected-resource, register dynamically, and authenticate through a browser-based login flow (GitHub OAuth primary, Harper credentials fallback). The web UI uses GitHub OAuth via @harperfast/oauth with Harper credentials as fallback.
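The PKCE part of that flow is small enough to sketch. A client generates a random code verifier, sends its SHA-256 challenge with the authorization request, and later proves possession by sending the verifier with the token request. The function names below are hypothetical; the S256 transform itself follows RFC 7636.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Generate a high-entropy code verifier (32 random bytes, base64url-encoded).
function createCodeVerifier(): string {
  return randomBytes(32).toString("base64url");
}

// Derive the S256 code challenge: BASE64URL(SHA256(ASCII(verifier))).
function codeChallengeS256(verifier: string): string {
  return createHash("sha256").update(verifier).digest("base64url");
}

const verifier = createCodeVerifier();
const challenge = codeChallengeS256(verifier);
// The client sends `challenge` with code_challenge_method=S256 on the
// authorization request and `verifier` on the token request.
```

MCP client libraries normally handle this step automatically. The sketch is mainly useful when debugging a failed token exchange by hand.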
Development
# Build
npm run build
# Run tests (202 tests)
npm test
# Test with coverage
npm run test:coverage
# Download embedding model
npm run model:download
# Download + verify embedding model
npm run model:test
# Watch mode
npm run dev
Testing
Tests use the Node.js built-in test runner (node:test) with mock Harper globals (in-memory tables). They run against the compiled output in dist/.
npm test
License
MIT
