# openclaw-knowledge-plugin

`@lacneu/openclaw-knowledge` v3.1.2, published on npm. Multi-source knowledge plugin for OpenClaw: pgvector + LightRAG injection via the `before_prompt_build` hook.

Dual-source knowledge injection plugin for OpenClaw. It automatically enriches agent prompts with relevant context from your document knowledge base, combining pgvector semantic search and a LightRAG knowledge graph in a single hook.
## Overview
`openclaw-knowledge` is an OpenClaw plugin that automatically injects relevant documents and knowledge graph context into every agent turn. It hooks into `before_prompt_build` and queries two complementary sources in parallel:
| Source | Technology | What it provides |
|--------|------------|------------------|
| pgvector | PostgreSQL + pgvector extension | Semantic vector search on document chunks (cosine similarity on 3072-dim embeddings) |
| LightRAG | Neo4j + PostgreSQL | Knowledge graph with entity/relation multi-hop traversal |
Both sources run in parallel via `Promise.allSettled`, so a failure in one source doesn't block the other. Results are merged and injected into the agent's system prompt via `appendSystemContext`.
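The fan-out and merge can be sketched as follows. This is a minimal sketch, not the plugin's actual internals; the function names and result shape are illustrative:

```typescript
// Illustrative sketch of the parallel fan-out: each source is queried
// independently, and a rejection in one never hides the other's results.
type SourceResult = { source: "pgvector" | "lightrag"; context: string };

async function queryBothSources(
  queryPgvector: (q: string) => Promise<string>,
  queryLightrag: (q: string) => Promise<string>,
  query: string
): Promise<SourceResult[]> {
  const settled = await Promise.allSettled([
    queryPgvector(query).then((context): SourceResult => ({ source: "pgvector", context })),
    queryLightrag(query).then((context): SourceResult => ({ source: "lightrag", context })),
  ]);
  // Keep only the sources that succeeded; failures are logged, never re-thrown.
  return settled
    .filter((r): r is PromiseFulfilledResult<SourceResult> => r.status === "fulfilled")
    .map((r) => r.value);
}
```

With `Promise.all` instead, a single failing source would reject the whole lookup; `allSettled` is what makes the degradation graceful.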
## Why two sources?
Vector search and knowledge graphs answer different kinds of questions:
- Vector search finds passages that are semantically similar to the query. Good for "What did the meeting say about pricing?" — matches embeddings.
- Knowledge graph finds entities and their relationships. Good for "Which clients work in the insurance sector?" — traverses entity links.
Running both gives the agent both capabilities simultaneously, without requiring the LLM to decide which to use.
## Architecture

The plugin is the query layer of a larger knowledge pipeline:
- **Ingestion** (background, via n8n): Google Drive documents are polled, OCR'd via Mistral, embedded via Gemini, and stored in PostgreSQL (pgvector) and Neo4j (LightRAG knowledge graph).
- **Query** (real-time, via this plugin): every user message triggers a parallel search in both sources; results are formatted and prepended to the agent's prompt.

The plugin does not handle ingestion — that's the responsibility of the n8n ETL pipeline. This plugin only reads from the existing data stores.
## Query lifecycle

Every user message triggers the following sequence:
1. OpenClaw fires `before_prompt_build` with the user's prompt
2. The plugin checks its cooldown state (pauses 5 min after 3 consecutive errors)
3. Query text is extracted and validated (≥ 3 characters)
4. In parallel (`Promise.allSettled`):
   - pgvector path: embed query via Gemini → SQL search on `knowledge_vectors`
   - LightRAG path: POST `/query` with `mode=hybrid` to the LightRAG server
5. Results are merged and truncated to `maxInjectChars`
6. Formatted blocks (`### Document Search Results` + `### Knowledge Graph Context`) are injected via `appendSystemContext`
7. The agent receives the enriched prompt and generates its response
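Put together, the hook body looks roughly like this. This is a sketch under assumptions: the event shape and field names are hypothetical, since the SDK types are not shown in this README:

```typescript
// Hypothetical shape of the hook handler; the real plugin registers this via
// the OpenClaw plugin SDK. Event and field names are assumptions.
interface PromptBuildEvent {
  prompt: string;
  appendSystemContext(block: string): void;
}

async function onBeforePromptBuild(
  event: PromptBuildEvent,
  search: (query: string) => Promise<string[]>
): Promise<void> {
  const query = event.prompt.trim();
  if (query.length < 3) return; // too short to be a meaningful search
  try {
    for (const block of await search(query)) {
      event.appendSystemContext(block); // enrich the system prompt
    }
  } catch (err) {
    // Errors are logged but never thrown — the agent must not be blocked.
    console.error("knowledge injection failed:", err);
  }
}
```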
## Decision flow

The plugin implements several safeguards to ensure it never blocks the agent:
| Safeguard | Purpose |
|-----------|---------|
| Cooldown (3 errors → 5 min pause) | Avoid log spam and unnecessary API calls during outages |
| Query length check (≥ 3 chars) | Skip meaningless searches |
| `Promise.allSettled` for sources | A failure in one source doesn't affect the other |
| Silent error handling | Errors are logged but never thrown to the agent |
| Graceful degradation | If both sources fail, the agent runs as if the plugin weren't there |
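The cooldown safeguard can be modeled as a small state machine. A minimal sketch; the plugin's internal naming and exact behavior may differ:

```typescript
// Sketch of the cooldown: after `maxFailures` consecutive errors, searches
// are skipped for `pauseMs`. The clock is injectable to keep this testable.
class Cooldown {
  private failures = 0;
  private pausedUntil = 0;
  constructor(
    private maxFailures = 3,
    private pauseMs = 5 * 60 * 1000,
    private now: () => number = Date.now
  ) {}
  active(): boolean {
    return this.now() < this.pausedUntil;
  }
  recordFailure(): void {
    this.failures += 1;
    if (this.failures >= this.maxFailures) {
      this.pausedUntil = this.now() + this.pauseMs; // enter the pause window
      this.failures = 0;
    }
  }
  recordSuccess(): void {
    this.failures = 0; // any success resets the error streak
  }
}
```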
## Installation

### Requirements

- OpenClaw ≥ `v2026.3.7` (for the `before_prompt_build` hook)
- PostgreSQL with the `pgvector` extension
- LightRAG server (optional — the plugin works with pgvector alone)
- Gemini API key (for query embedding)
### Install via OpenClaw CLI (recommended)
The plugin is published on npm as `@lacneu/openclaw-knowledge`. The official `openclaw plugins` commands — install, update, list, inspect — all work out of the box:
```bash
# Install (pulls the latest version from npm)
openclaw plugins install @lacneu/openclaw-knowledge

# Inspect the installed version and manifest
openclaw plugins inspect @lacneu/openclaw-knowledge

# Update to the latest published version
openclaw plugins update @lacneu/openclaw-knowledge

# List everything installed
openclaw plugins list
```

OpenClaw tracks the install source under `plugins.installs` in your configuration, so subsequent `update` calls know where to fetch new versions from.
## Configuration

Add to your `openclaw.json`:

```json
{
  "plugins": {
    "allow": ["openclaw-knowledge", "hindsight-openclaw", "telegram"],
    "entries": {
      "openclaw-knowledge": {
        "enabled": true,
        "config": {
          "geminiApiKey": "${GEMINI_API_KEY}",
          "postgresUrl": "postgresql://user:${POSTGRES_PASSWORD}@postgresql:5432/knowledge",
          "collections": ["knowledge_alice"],
          "topK": 5,
          "scoreThreshold": 0,
          "maxInjectChars": 4000,
          "lightragUrl": "http://lightrag:9621",
          "lightragApiKey": "${LIGHTRAG_API_KEY}",
          "lightragQueryMode": "hybrid",
          "lightragMaxChars": 4000
        }
      }
    }
  }
}
```

Then restart the gateway:

```bash
openclaw gateway restart
```

### Configuration reference
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `enabled` | boolean | `true` | Master switch for the plugin |
| **pgvector source** | | | |
| `geminiApiKey` | string | — | Gemini API key for query embedding (supports `${ENV_VAR}`) |
| `postgresUrl` | string | — | PostgreSQL connection URL (supports `${ENV_VAR}`) |
| `collections` | string[] | `["knowledge_default"]` | Collections to search in the `knowledge_vectors` table |
| `topK` | number | `5` | Max results per collection |
| `scoreThreshold` | number | `0.3` | Minimum cosine similarity (0–1) |
| `maxInjectChars` | number | `4000` | Character budget for pgvector results |
| `pgvectorEnabled` | boolean | `true` if `geminiApiKey` set | Disable pgvector while keeping LightRAG |
| **LightRAG source** | | | |
| `lightragUrl` | string | — | LightRAG server base URL |
| `lightragApiKey` | string | — | LightRAG API key (supports `${ENV_VAR}`) |
| `lightragQueryMode` | string | `"hybrid"` | Query mode: `naive`, `local`, `global`, `hybrid` |
| `lightragMaxChars` | number | `4000` | Character budget for LightRAG context |
| `lightragEnabled` | boolean | `true` if `lightragUrl` set | Disable LightRAG while keeping pgvector |
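The `${ENV_VAR}` placeholders are resolved when the config is loaded. A substitution helper might look like this (a sketch only; the plugin's actual `resolveEnv` in `src/config.ts` may behave differently, e.g. warn on unset variables):

```typescript
// Replaces ${VAR} placeholders in config strings with environment values.
// In this sketch, unset variables resolve to an empty string.
function resolveEnv(
  value: string,
  env: Record<string, string | undefined> = process.env
): string {
  return value.replace(/\$\{([A-Z0-9_]+)\}/g, (_, name) => env[name] ?? "");
}
```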
### LightRAG query modes
| Mode | Description | Best for |
|------|-------------|----------|
| naive | Simple vector similarity on chunks | Fast, basic keyword matching |
| local | Entity neighborhood traversal | Questions about a specific entity |
| global | Community summaries | Broad, overview questions |
| hybrid | Combines local + global | Recommended for most cases |
## Data model

### pgvector: `knowledge_vectors` table

The plugin expects a PostgreSQL table with this structure:
```sql
CREATE TABLE knowledge_vectors (
  id SERIAL PRIMARY KEY,
  collection TEXT NOT NULL,
  file_name TEXT,
  mime_type TEXT,
  text TEXT,
  file_id TEXT,
  source TEXT,
  owner TEXT,
  chunk_index INTEGER,
  total_chunks INTEGER,
  timestamp_start TEXT,
  timestamp_end TEXT,
  embedded_at TIMESTAMPTZ,
  embedding vector(3072) NOT NULL
);

CREATE INDEX idx_knowledge_vectors_hnsw
  ON knowledge_vectors
  USING hnsw ((embedding::halfvec(3072)) halfvec_cosine_ops);
```

**Important:** the HNSW index must use `halfvec(3072)` because pgvector's HNSW index has a 2000-dimension limit for the native `vector` type; `halfvec` supports up to 4000 dimensions. The plugin query casts both the column and the parameter accordingly.
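The cast described above looks roughly like this in a search query. The exact SQL is an assumption based on the schema and index shown, not the plugin's verbatim query:

```typescript
// pgvector expects vector literals as "[v1,v2,...]"; this helper formats a
// query embedding for use as a bind parameter.
function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(",")}]`;
}

// Both the column and the parameter are cast to halfvec(3072) so the HNSW
// index defined above can be used. <=> is pgvector's cosine distance operator,
// so similarity = 1 - distance.
const SEARCH_SQL = `
  SELECT file_name, text, chunk_index, total_chunks,
         1 - ((embedding::halfvec(3072)) <=> ($1::vector)::halfvec(3072)) AS score
  FROM knowledge_vectors
  WHERE collection = ANY($2)
  ORDER BY (embedding::halfvec(3072)) <=> ($1::vector)::halfvec(3072)
  LIMIT $3`;
```

With a `pg` client, this would run as `pool.query(SEARCH_SQL, [toVectorLiteral(queryEmbedding), collections, topK])`.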
### Embeddings

- Model: `gemini-embedding-2-preview` via the native Gemini API
- Dimensions: 3072
- Distance metric: cosine similarity
- Query endpoint: the plugin uses the native `embedContent` endpoint (not the OpenAI-compatible one), because the native endpoint supports multimodal embedding at ingestion time while still working for text queries.
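A query-embedding call against the native endpoint might look like this. A sketch under assumptions: request and response shapes follow the public Generative Language REST API, the model name is taken from the list above, and `fetchFn` is injectable so the sketch stays testable:

```typescript
// Sketch of a native embedContent call; treat endpoint details as assumptions.
async function embedQuery(
  apiKey: string,
  text: string,
  fetchFn: typeof fetch = fetch
): Promise<number[]> {
  const model = "gemini-embedding-2-preview"; // model name from the docs above
  const res = await fetchFn(
    `https://generativelanguage.googleapis.com/v1beta/models/${model}:embedContent`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
      body: JSON.stringify({ content: { parts: [{ text }] } }),
    }
  );
  if (!res.ok) throw new Error(`Gemini embedding failed (${res.status})`);
  const data = (await res.json()) as { embedding: { values: number[] } };
  return data.embedding.values; // 3072 dimensions for this model
}
```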
### LightRAG query

The plugin sends a POST request:

```http
POST /query HTTP/1.1
X-API-Key: <lightragApiKey>
Content-Type: application/json

{
  "query": "<user message>",
  "mode": "hybrid",
  "only_need_context": true
}
```

`only_need_context: true` tells LightRAG to return the retrieved context without running the final LLM synthesis — the plugin only needs the raw context to inject into the agent's prompt.
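A client for this call can be sketched as follows. The payload mirrors the request shown above; the response field name (`response`) is an assumption, and `fetchFn` is injectable for testing:

```typescript
// Sketch of the /query call described above.
async function queryLightrag(
  baseUrl: string,
  apiKey: string,
  query: string,
  mode: "naive" | "local" | "global" | "hybrid" = "hybrid",
  fetchFn: typeof fetch = fetch
): Promise<string> {
  const res = await fetchFn(`${baseUrl}/query`, {
    method: "POST",
    headers: { "Content-Type": "application/json", "X-API-Key": apiKey },
    body: JSON.stringify({ query, mode, only_need_context: true }),
  });
  if (!res.ok) throw new Error(`LightRAG query failed (${res.status})`);
  const data = (await res.json()) as { response?: string };
  return data.response ?? ""; // raw retrieved context, no LLM synthesis
}
```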
## Multi-tenant support

Each OpenClaw instance can configure its own set of collections:

```jsonc
// Alice's instance
"collections": ["knowledge_alice", "knowledge_shared"]

// Bob's instance
"collections": ["knowledge_bob", "knowledge_shared"]
```

All instances can share the same PostgreSQL database — isolation is done at the collection level. LightRAG, however, uses one instance per tenant (workspace isolation is not yet exposed in the plugin).
## Example output

When the agent receives a user message, it sees something like this in its system prompt:

```
<existing system prompt>

### Document Search Results (pgvector)

[knowledge_alice] Contrat_Acme_Corp.pdf (score: 0.92, chunk 2/5)
Service agreement between Alice Consulting and Acme Corp. Duration: 6 months,
daily rate: 1500 EUR, start date: 2026-01-15, deliverables: strategy workshops,
CODIR alignment sessions, monthly follow-ups...

[knowledge_shared] Pricing_Grid_2026.pdf (score: 0.87, chunk 1/1)
Standard pricing grid: senior consulting 1500 EUR/day, junior 900 EUR/day,
workshops 3500 EUR/day flat...

### Knowledge Graph Context (LightRAG)

Entity: Acme Corp (Organization)
Relationships:
- Acme Corp → client_of → Alice Consulting (since 2026-01-15)
- Acme Corp → subject_of → Contrat_Acme_Corp.pdf
- Acme Corp → operates_in → Insurance sector
- Acme Corp → represented_by → Thomas Martin (Contact)

User: What were the terms of the Acme contract?
```

The LLM can now cite both the vector search hits (specific text passages) and the knowledge graph entities (relationships and structure) to produce a grounded answer.
## Relationship with Hindsight
This plugin complements Hindsight (the memory plugin) without conflict:
| | Hindsight | openclaw-knowledge |
|---|-----------|-------------------|
| Purpose | Conversational memory | Document knowledge (RAG) |
| Source | Facts extracted from chats | Documents from Google Drive |
| Storage | PostgreSQL (Hindsight schema) | PostgreSQL (`knowledge_vectors`) + Neo4j |
| Trigger | auto-recall on every message | `before_prompt_build` on every message |
| Injection block | `<relevant-memories>` | `### Document Search Results` + `### Knowledge Graph Context` |
| OpenClaw slot | `memory` (exclusive) | None (coexists freely) |
Both run on every user message. The agent receives both blocks, giving it conversational memory AND document knowledge simultaneously.
## Development

This plugin is written in TypeScript and builds against the official OpenClaw plugin SDK (`openclaw/plugin-sdk/plugin-entry`).
### Project layout

```
openclaw-knowledge-plugin/
├── src/                       # TypeScript source
│   ├── index.ts               # Entry point (definePluginEntry + register)
│   ├── config.ts              # resolveEnv + default resolution
│   ├── embeddings.ts          # Gemini embedContent client
│   ├── pgvector.ts            # PostgreSQL search + result formatter
│   ├── lightrag.ts            # LightRAG client + truncation
│   └── types.ts               # Shared interfaces
├── test/                      # TypeScript test suites (node:test)
├── dist/                      # Compiled JS + .d.ts (gitignored)
├── tsconfig.json              # Strict TS config for src
├── tsconfig.test.json         # Typecheck (src + test)
├── tsconfig.test-build.json   # Compile tests to dist-test/ for node:test
├── openclaw.plugin.json       # Plugin manifest (config schema + uiHints)
└── package.json
```

### Build and test
```bash
# Install dev dependencies (includes the openclaw SDK for types, ~200 MB)
npm install

# Strict type check (src + tests)
npm run typecheck

# Run the full test suite (compiles tests then runs node:test)
npm test

# Compile TS → dist/
npm run build

# Clean build output
npm run clean
```

### Release process
1. Update `CHANGELOG.md` with the new version (add a `## [x.y.z] - YYYY-MM-DD` section)
2. Commit the changelog update
3. Create and push a git tag: `git tag v3.1.0 && git push origin v3.1.0`
4. GitHub Actions will automatically:
   - run `npm run typecheck`, `npm test`, and `npm run build` on Node.js 24
   - stamp the version from the tag into `package.json` and `openclaw.plugin.json`
   - compile TypeScript (`npm run build`)
   - publish `@lacneu/openclaw-knowledge` to npm (public access)
   - create a GitHub Release with changelog notes extracted from `CHANGELOG.md`
### Required GitHub secret

The workflow needs an `NPM_TOKEN` secret. Because the npm account has 2FA enabled with a security key, the token must be an **Automation** token (not a regular Publish token): automation tokens bypass 2FA for CI/CD.

Generate it on npm: Access Tokens → Generate New Token → Classic Token → Automation, then add it under the GitHub repo's Settings → Secrets and variables → Actions as `NPM_TOKEN`.
## Troubleshooting
| Symptom | Cause | Solution |
|---------|-------|----------|
| `Cannot find module 'pg'` | Old release (pre-v3.0.4) without bundled deps | Upgrade to v3.0.4+ |
| `neither pgvector nor LightRAG configured — plugin disabled` | No `geminiApiKey` and no `lightragUrl` | Configure at least one source |
| `pgvector — source failed: Gemini embedding failed (429)` | Gemini quota exceeded | Check Gemini API quotas or back off |
| `LightRAG query failed (401)` | Wrong or missing `lightragApiKey` | Verify the `X-API-Key` header is accepted |
| `LightRAG query failed (503)` | LightRAG server down | Check LightRAG container status |
| Plugin loads but no context injected | `scoreThreshold` too high | Lower it to 0 to see all matches |
| Plugin enters 5-min cooldown | 3 consecutive errors on all sources | Check logs and fix the underlying issue |
## License
MIT — see LICENSE
