harper-knowledge

v0.2.0

Published

2 months ago

Knowledge base plugin for Harper with MCP server integration

heskew

harper harperdb plugin knowledge-base mcp ai embeddings vector-search

harper-knowledge

Knowledge base for Harper, built on Harper, with MCP server integration.

A Harper sub-component plugin that provides searchable, scoped knowledge entries with vector embeddings for semantic search. Exposes a REST API, MCP endpoint, and web UI.

Consumers

Support team — finding solutions, patterns, gotchas, customer edge cases
DX lab "Harper expert" — backing knowledge for the AI expert role in Gas Town labs
Claude Code / IDE assistants — Harper context via MCP without per-project CLAUDE.md files
Any MCP client — Cursor, VS Code + Copilot, JetBrains, ChatGPT, Gemini, etc.

Quick Start

Prerequisites

Harper >= 4.7.0
Node.js >= 22

Install

npm install harper-knowledge

Configure

Add to your parent application's config.yaml:

"harper-knowledge":
  package: "harper-knowledge"
  embeddingModel: nomic-embed-text # default

Download the Embedding Model

The plugin uses nomic-embed-text for vector embeddings, run locally via node-llama-cpp. The model is downloaded to ~/hdb/models/ on first startup, but you can pre-download it:

# Download the default model
npm run model:download

# Download and verify with a test embedding
npm run model:test

Run

harperdb dev .

Embedding Models

Two Nomic embedding models are supported. Both run entirely on CPU via node-llama-cpp, with no cloud dependency. On first plugin startup (or when you run npm run model:download), the configured model is downloaded from Hugging Face to ~/hdb/models/. A file lock prevents multiple Harper worker threads from downloading the model simultaneously.
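The file lock described above can be illustrated with an exclusive-create lock file. This is a minimal sketch, not the plugin's actual locking code, which this README does not show. The helper names `tryAcquireLock` and `releaseLock` and the lock file name are hypothetical.

```typescript
import { closeSync, mkdtempSync, openSync, unlinkSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: take the lock by creating a lock file.
// The "wx" flag makes the open fail with EEXIST when the file already
// exists, so only one caller can win the race.
function tryAcquireLock(lockPath: string): boolean {
  try {
    closeSync(openSync(lockPath, "wx"));
    return true;
  } catch (err) {
    if ((err as { code?: string }).code === "EEXIST") return false;
    throw err; // unexpected failure (permissions, missing directory, ...)
  }
}

// Hypothetical helper: release the lock by deleting the lock file.
function releaseLock(lockPath: string): void {
  unlinkSync(lockPath);
}

// Demo in a throwaway temp directory (not ~/hdb/models/).
const demoLock = join(mkdtempSync(join(tmpdir(), "kb-lock-")), "model.lock");
const first = tryAcquireLock(demoLock); // lock is free
const second = tryAcquireLock(demoLock); // lock is already held
releaseLock(demoLock);
const third = tryAcquireLock(demoLock); // lock is free again
releaseLock(demoLock);
```

A worker that fails to get the lock would typically wait and re-check for the finished model file rather than start its own download.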

nomic-embed-text v1.5 (default)

| Property     | Value                               |
| ------------ | ----------------------------------- |
| Model        | nomic-ai/nomic-embed-text-v1.5-GGUF |
| Config key   | nomic-embed-text                    |
| Parameters   | 137M                                |
| Dimensions   | 768                                 |
| Quantization | Q4_K_M (~135 MB)                    |
| Context      | 8192 tokens                         |
| License      | Apache 2.0                          |

nomic-embed-text v2 MoE

| Property     | Value                                 |
| ------------ | ------------------------------------- |
| Model        | nomic-ai/nomic-embed-text-v2-moe-GGUF |
| Config key   | nomic-embed-text-v2-moe               |
| Parameters   | 475M (Mixture of Experts)             |
| Dimensions   | 768                                   |
| Quantization | Q4_K_M                                |
| Context      | 8192 tokens                           |
| License      | Apache 2.0                            |

The v2 MoE model is larger but produces higher-quality embeddings, especially for longer and more nuanced content.

Switching models

"harper-knowledge":
  package: "harper-knowledge"
  embeddingModel: nomic-embed-text # v1.5 (default)
  # embeddingModel: nomic-embed-text-v2-moe # v2 MoE

Architecture

harper-knowledge
├── src/
│   ├── index.ts              ← plugin entry: handleApplication()
│   ├── core/                 ← shared logic
│   │   ├── embeddings.ts     ← model download, init, vector generation
│   │   ├── entries.ts        ← CRUD + relationship management
│   │   ├── history.ts        ← edit history audit log
│   │   ├── search.ts         ← keyword / semantic / hybrid search
│   │   ├── tags.ts           ← tag registry with counts
│   │   └── triage.ts         ← webhook intake queue
│   ├── resources/            ← REST Resource classes
│   │   ├── KnowledgeEntryResource.ts
│   │   ├── TriageResource.ts
│   │   ├── TagResource.ts
│   │   ├── QueryLogResource.ts
│   │   ├── ServiceKeyResource.ts
│   │   └── HistoryResource.ts
│   ├── mcp/                  ← MCP server (Streamable HTTP)
│   │   ├── server.ts
│   │   └── tools.ts
│   ├── oauth/                ← OAuth 2.1 authorization server
│   │   ├── authorize.ts
│   │   ├── keys.ts
│   │   ├── metadata.ts
│   │   ├── middleware.ts
│   │   ├── register.ts
│   │   ├── token.ts
│   │   └── validate.ts
│   ├── webhooks/             ← webhook intake (GitHub, Datadog)
│   │   ├── middleware.ts
│   │   ├── github.ts
│   │   └── datadog.ts
│   └── types.ts
├── schema/
│   └── knowledge.graphql     ← table definitions (database: "kb")
├── web/                      ← static web UI
├── scripts/
│   └── download-model.js     ← standalone model download/test
├── config.yaml
├── package.json
└── test/

REST and MCP both run inside the Harper process and call the same core functions directly, so neither adds inter-process overhead.

REST API

| Endpoint              | Method          | Auth       | Description               |
| --------------------- | --------------- | ---------- | ------------------------- |
| /Knowledge/<id>       | GET             | Public     | Get entry by ID           |
| /Knowledge/?query=... | GET             | Public     | Search entries            |
| /Knowledge/           | POST            | Required   | Create entry              |
| /Knowledge/<id>       | PUT             | Required   | Update entry              |
| /Knowledge/<id>       | DELETE          | Team       | Deprecate entry           |
| /KnowledgeTag/        | GET             | Public     | List all tags             |
| /Triage/              | GET             | Team       | List pending triage items |
| /Triage/              | POST            | Service/AI | Submit triage item        |
| /Triage/<id>          | PUT             | Team       | Process triage item       |
| /QueryLog/            | GET             | Team       | Search analytics          |
| /ServiceKey/          | GET/POST/DELETE | Team       | API key management        |
| /History/<entryId>    | GET             | Public     | Edit history for an entry |

Search Parameters

GET /Knowledge/?query=MQTT+auth&tags=mqtt,config&limit=10&mode=keyword&context={"harper":"5.0","storageEngine":"lmdb"}

query — search text (required)
tags — comma-separated tag filter
limit — max results (default 10)
mode — keyword, semantic, or hybrid (default)
context — JSON applicability context for result boosting
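Because context is a JSON value inside a query string, it is safest to let a URL API handle the encoding. The sketch below builds a search URL from the parameters listed above. The helper `buildSearchUrl` and the `http://localhost:9926` base URL are assumptions for illustration, not part of the plugin.

```typescript
type SearchMode = "keyword" | "semantic" | "hybrid";

interface SearchOptions {
  query: string; // required search text
  tags?: string[]; // sent as a comma-separated list
  limit?: number;
  mode?: SearchMode;
  context?: Record<string, string>; // applicability context, sent as JSON
}

// Hypothetical helper: assemble a /Knowledge/ search URL with every
// parameter percent-encoded by URLSearchParams.
function buildSearchUrl(baseUrl: string, opts: SearchOptions): string {
  const params = new URLSearchParams();
  params.set("query", opts.query);
  if (opts.tags?.length) params.set("tags", opts.tags.join(","));
  if (opts.limit !== undefined) params.set("limit", String(opts.limit));
  if (opts.mode) params.set("mode", opts.mode);
  if (opts.context) params.set("context", JSON.stringify(opts.context));
  return `${baseUrl}/Knowledge/?${params.toString()}`;
}

// Assumed local base URL; substitute your own host and port.
const searchUrl = buildSearchUrl("http://localhost:9926", {
  query: "MQTT auth",
  tags: ["mqtt", "config"],
  limit: 10,
  mode: "keyword",
  context: { harper: "5.0", storageEngine: "lmdb" },
});
```

The resulting string can be passed to `fetch`. The braces, quotes, and commas are percent-encoded in transit and decode back to the original values on the server side.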

MCP Endpoint

Connect any MCP-compatible client to /mcp:

{
  "mcpServers": {
    "harper-kb": {
      "url": "https://kb.harper.fast:9926/mcp"
    }
  }
}

Tools

| Tool                | Description                                                       |
| ------------------- | ----------------------------------------------------------------- |
| knowledge_search    | Search with keyword/semantic/hybrid modes + applicability context |
| knowledge_add       | Add a new entry (auto-tagged ai-generated)                        |
| knowledge_get       | Get entry by ID with full relationship chain                      |
| knowledge_update    | Update an entry with edit history tracking                        |
| knowledge_related   | Find related entries (explicit + semantic similarity)             |
| knowledge_list_tags | List all tags with counts                                         |
| knowledge_triage    | Submit to triage queue for review                                 |
| knowledge_history   | Get edit history for an entry (who changed what, when, why)       |
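MCP clients invoke these tools with a JSON-RPC 2.0 `tools/call` request. The sketch below shows the shape of such a request for knowledge_search. The argument names (`query`, `mode`, `limit`) are assumed to mirror the REST search parameters and are not confirmed by this README; the tool's published input schema is the authority. The helper `toolCall` is hypothetical.

```typescript
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextRequestId = 1;

// Hypothetical helper: wrap a tool name and arguments in a JSON-RPC envelope.
function toolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Argument names here are assumptions, not the tool's documented schema.
const searchRequest = toolCall("knowledge_search", {
  query: "MQTT auth",
  mode: "hybrid",
  limit: 5,
});
```

In practice an MCP client library builds this envelope for you and also performs the initialization handshake and authentication first.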

Schema

Tables in the kb database:

KnowledgeEntry — core entries with HNSW vector index, @relationship directives for supersession/siblings/related, @createdTime/@updatedTime
KnowledgeEntryEdit — append-only edit history audit log
TriageItem — webhook intake queue (7-day TTL)
KnowledgeTag — tag name as primary key with entry counts
QueryLog — search analytics (30-day TTL)
ServiceKey — API keys with scrypt-hashed secrets
OAuthClient — dynamic client registrations (RFC 7591)
OAuthCode — authorization codes (5-minute TTL)
OAuthRefreshToken — refresh tokens (30-day TTL)
OAuthSigningKey — RSA key pair for JWT signing
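To make "scrypt-hashed secrets" concrete, here is a minimal sketch of hashing and verifying an API key secret with Node's built-in crypto module. The storage format (`salt:hash` in hex), the key length, and the function names are assumptions for illustration; the README does not describe how ServiceKey records are actually encoded.

```typescript
import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

const KEY_LENGTH = 32; // derived key size in bytes (assumed)

// Hypothetical helper: derive a salted scrypt hash and encode it as "salt:hash".
function hashSecret(secret: string): string {
  const salt = randomBytes(16);
  const hash = scryptSync(secret, salt, KEY_LENGTH);
  return `${salt.toString("hex")}:${hash.toString("hex")}`;
}

// Hypothetical helper: recompute the hash with the stored salt and compare
// in constant time so the check does not leak how many bytes matched.
function verifySecret(secret: string, stored: string): boolean {
  const [saltHex, hashHex] = stored.split(":");
  if (!saltHex || !hashHex) return false;
  const expected = Buffer.from(hashHex, "hex");
  if (expected.length !== KEY_LENGTH) return false; // malformed record
  const actual = scryptSync(secret, Buffer.from(saltHex, "hex"), KEY_LENGTH);
  return timingSafeEqual(actual, expected);
}

const storedHash = hashSecret("example-secret");
const accepted = verifySecret("example-secret", storedHash);
const rejected = verifySecret("wrong-secret", storedHash);
```

Only the salted hash is stored, so a leaked table does not reveal usable secrets.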

Applicability Scoping

Entries carry an appliesTo scope:

{
  "harper": ">=4.0 <5.0",
  "storageEngine": "lmdb",
  "node": ">=22",
  "platform": "linux"
}

Search results are boosted or demoted (never hidden) based on the caller's context.
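One way to picture "boosted or demoted, never hidden" is a multiplier applied to each result's score. The sketch below is an illustration under stated assumptions, not the plugin's scoring code: the 1.25 and 0.5 weights, the simplified range syntax (space-separated comparators on dotted numeric versions), and all function names are hypothetical.

```typescript
type Scope = Record<string, string>;

// Compare dotted numeric versions; returns <0, 0, or >0 like a sort comparator.
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Supports space-separated comparators (">=4.0 <5.0") or plain equality ("lmdb").
function satisfies(value: string, range: string): boolean {
  return range.trim().split(/\s+/).every((part) => {
    const m = /^(>=|<=|>|<|=)?(.+)$/.exec(part);
    if (!m) return false;
    const op = m[1];
    const target = m[2] ?? "";
    if (!op) return value === target;
    const cmp = compareVersions(value, target);
    switch (op) {
      case ">=": return cmp >= 0;
      case "<=": return cmp <= 0;
      case ">": return cmp > 0;
      case "<": return cmp < 0;
      default: return cmp === 0;
    }
  });
}

// Multiplier for a result's score: matches boost, mismatches demote,
// and keys the caller did not mention are ignored. The factor is always
// positive, so a result is never removed outright.
function applicabilityFactor(appliesTo: Scope, context: Scope): number {
  let factor = 1;
  for (const [key, range] of Object.entries(appliesTo)) {
    const value = context[key];
    if (value === undefined) continue;
    factor *= satisfies(value, range) ? 1.25 : 0.5; // assumed weights
  }
  return factor;
}

const entryScope: Scope = { harper: ">=4.0 <5.0", storageEngine: "lmdb", node: ">=22", platform: "linux" };
const demoted = applicabilityFactor(entryScope, { harper: "5.0", storageEngine: "lmdb" });
const boosted = applicabilityFactor(entryScope, { harper: "4.7", storageEngine: "lmdb" });
```

With these assumed weights, a caller on Harper 5.0 still sees the 4.x entry, only ranked lower than entries that match its context.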

Entry Relationships

Supersedes — "This replaces that for newer versions"
Siblings — "Same topic, different config" (e.g., LMDB vs RocksDB behavior)
Related — loose "see also" association

Auth Model

| Role            | Read | Write                      | Review | Manage |
| --------------- | ---- | -------------------------- | ------ | ------ |
| team            | Yes  | Yes                        | Yes    | Yes    |
| ai_agent        | Yes  | Yes (flagged ai-generated) | No     | No     |
| service_account | Yes  | Triage queue only          | No     | No     |
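The role table can be read as a simple permission lookup. The sketch below encodes it as data. The action names and the `can` helper are hypothetical, and treating triage submission as its own action (available to every role) is an assumption drawn from the table and the REST endpoint list.

```typescript
type Role = "team" | "ai_agent" | "service_account";
type Action = "read" | "write" | "submit_triage" | "review" | "manage";

// Permission matrix transcribed from the role table. "write" means creating
// or updating entries; service accounts may only write to the triage queue.
const PERMISSIONS: Record<Role, ReadonlySet<Action>> = {
  team: new Set<Action>(["read", "write", "submit_triage", "review", "manage"]),
  ai_agent: new Set<Action>(["read", "write", "submit_triage"]),
  service_account: new Set<Action>(["read", "submit_triage"]),
};

// Hypothetical helper: check whether a role may perform an action.
function can(role: Role, action: Action): boolean {
  return PERMISSIONS[role].has(action);
}

// Entries written by an AI agent are flagged, per the table above.
function tagsForNewEntry(role: Role, tags: string[]): string[] {
  return role === "ai_agent" && !tags.includes("ai-generated")
    ? [...tags, "ai-generated"]
    : tags;
}
```

Keeping the matrix as data makes it easy to compare against the table when roles change.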

MCP uses OAuth 2.1 with PKCE for authentication. MCP clients discover auth requirements via /.well-known/oauth-protected-resource, register dynamically, and authenticate through a browser-based login flow (GitHub OAuth primary, Harper credentials fallback). The web UI uses GitHub OAuth via @harperfast/oauth with Harper credentials as fallback.
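For readers unfamiliar with PKCE, the client side comes down to generating a random code verifier and sending its SHA-256 hash as the code challenge. The sketch below shows that step with Node's crypto module, assuming the S256 challenge method from RFC 7636. The function names are illustrative, and MCP client libraries normally do this for you.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Generate a high-entropy code verifier: 32 random bytes encode to
// 43 base64url characters, the minimum length RFC 7636 allows.
function createCodeVerifier(): string {
  return randomBytes(32).toString("base64url");
}

// S256 challenge: base64url(SHA-256(verifier)), without padding.
function codeChallengeS256(verifier: string): string {
  return createHash("sha256").update(verifier).digest("base64url");
}

const verifier = createCodeVerifier();
const challenge = codeChallengeS256(verifier);
```

The client sends the challenge with the authorization request and reveals the verifier only when exchanging the authorization code for tokens, so an intercepted code is useless on its own.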

Development

# Build
npm run build

# Run tests (202 tests)
npm test

# Test with coverage
npm run test:coverage

# Download embedding model
npm run model:download

# Download + verify embedding model
npm run model:test

# Watch mode
npm run dev

Testing

Tests use the Node.js built-in test runner (node:test) with mock Harper globals backed by in-memory tables. They run against the compiled output in dist/.

npm test

License

MIT
