# openclaw-knowledge-plugin

`@lacneu/openclaw-knowledge` v3.1.2, published on npm. Multi-source knowledge plugin for OpenClaw: pgvector + LightRAG injection via the `before_prompt_build` hook.

Dual-source knowledge injection plugin for OpenClaw. It automatically enriches agent prompts with relevant context from your document knowledge base, combining pgvector semantic search and a LightRAG knowledge graph in a single hook.
## Overview
`openclaw-knowledge` is an OpenClaw plugin that automatically injects relevant documents and knowledge graph context into every agent turn. It hooks into `before_prompt_build` and queries two complementary sources in parallel:
| Source | Technology | What it provides |
|--------|------------|------------------|
| pgvector | PostgreSQL + pgvector extension | Semantic vector search on document chunks (cosine similarity on 3072-dim embeddings) |
| LightRAG | Neo4j + PostgreSQL | Knowledge graph with entity/relation multi-hop traversal |
Both sources run in parallel via `Promise.allSettled`, so a failure in one source doesn't block the other. Results are merged and injected into the agent's system prompt via `appendSystemContext`.
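The fan-out and merge can be sketched as follows. This is a minimal sketch, not the plugin's actual internals; the function names and result shape are illustrative:

```typescript
// Illustrative sketch of the parallel fan-out: each source is queried
// independently, and a rejection in one never hides the other's results.
type SourceResult = { source: "pgvector" | "lightrag"; context: string };

async function queryBothSources(
  queryPgvector: (q: string) => Promise<string>,
  queryLightrag: (q: string) => Promise<string>,
  query: string
): Promise<SourceResult[]> {
  const settled = await Promise.allSettled([
    queryPgvector(query).then((context): SourceResult => ({ source: "pgvector", context })),
    queryLightrag(query).then((context): SourceResult => ({ source: "lightrag", context })),
  ]);
  // Keep only the sources that succeeded; failures are logged, never re-thrown.
  return settled
    .filter((r): r is PromiseFulfilledResult<SourceResult> => r.status === "fulfilled")
    .map((r) => r.value);
}
```

With `Promise.all` instead, a single failing source would reject the whole lookup; `allSettled` is what makes the degradation graceful.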
## Why two sources?
Vector search and knowledge graphs answer different kinds of questions:
- Vector search finds passages that are semantically similar to the query. Good for "What did the meeting say about pricing?" — matches embeddings.
- Knowledge graph finds entities and their relationships. Good for "Which clients work in the insurance sector?" — traverses entity links.
Running both gives the agent both capabilities simultaneously, without requiring the LLM to decide which to use.
## Architecture

The plugin is the query layer of a larger knowledge pipeline:
- **Ingestion** (background, via n8n): Google Drive documents are polled, OCR'd via Mistral, embedded via Gemini, and stored in PostgreSQL (pgvector) and Neo4j (LightRAG knowledge graph).
- **Query** (real-time, via this plugin): every user message triggers a parallel search in both sources; results are formatted and prepended to the agent's prompt.

The plugin does not handle ingestion — that's the responsibility of the n8n ETL pipeline. This plugin only reads from the existing data stores.
## Query lifecycle

Every user message triggers the following sequence:
1. OpenClaw fires `before_prompt_build` with the user's prompt
2. The plugin checks its cooldown state (pauses 5 min after 3 consecutive errors)
3. Query text is extracted and validated (≥ 3 characters)
4. In parallel (`Promise.allSettled`):
   - pgvector path: embed query via Gemini → SQL search on `knowledge_vectors`
   - LightRAG path: POST `/query` with `mode=hybrid` to the LightRAG server
5. Results are merged and truncated to `maxInjectChars`
6. Formatted blocks (`### Document Search Results` + `### Knowledge Graph Context`) are injected via `appendSystemContext`
7. The agent receives the enriched prompt and generates its response
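Put together, the hook body looks roughly like this. This is a sketch under assumptions: the event shape and field names are hypothetical, since the SDK types are not shown in this README:

```typescript
// Hypothetical shape of the hook handler; the real plugin registers this via
// the OpenClaw plugin SDK. Event and field names are assumptions.
interface PromptBuildEvent {
  prompt: string;
  appendSystemContext(block: string): void;
}

async function onBeforePromptBuild(
  event: PromptBuildEvent,
  search: (query: string) => Promise<string[]>
): Promise<void> {
  const query = event.prompt.trim();
  if (query.length < 3) return; // too short to be a meaningful search
  try {
    for (const block of await search(query)) {
      event.appendSystemContext(block); // enrich the system prompt
    }
  } catch (err) {
    // Errors are logged but never thrown — the agent must not be blocked.
    console.error("knowledge injection failed:", err);
  }
}
```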
## Decision flow

The plugin implements several safeguards to ensure it never blocks the agent:
| Safeguard | Purpose |
|-----------|---------|
| Cooldown (3 errors → 5 min pause) | Avoid log spam and unnecessary API calls during outages |
| Query length check (≥ 3 chars) | Skip meaningless searches |
| `Promise.allSettled` for sources | A failure in one source doesn't affect the other |
| Silent error handling | Errors are logged but never thrown to the agent |
| Graceful degradation | If both sources fail, the agent runs as if the plugin weren't there |
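The cooldown safeguard can be modeled as a small state machine. A minimal sketch; the plugin's internal naming and exact behavior may differ:

```typescript
// Sketch of the cooldown: after `maxFailures` consecutive errors, searches
// are skipped for `pauseMs`. The clock is injectable to keep this testable.
class Cooldown {
  private failures = 0;
  private pausedUntil = 0;
  constructor(
    private maxFailures = 3,
    private pauseMs = 5 * 60 * 1000,
    private now: () => number = Date.now
  ) {}
  active(): boolean {
    return this.now() < this.pausedUntil;
  }
  recordFailure(): void {
    this.failures += 1;
    if (this.failures >= this.maxFailures) {
      this.pausedUntil = this.now() + this.pauseMs; // enter the pause window
      this.failures = 0;
    }
  }
  recordSuccess(): void {
    this.failures = 0; // any success resets the error streak
  }
}
```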
## Installation

### Requirements

- OpenClaw ≥ `v2026.3.7` (for the `before_prompt_build` hook)
- PostgreSQL with the `pgvector` extension
- LightRAG server (optional — the plugin works with pgvector alone)
- Gemini API key (for query embedding)
### Install via OpenClaw CLI (recommended)
The plugin is published on npm as `@lacneu/openclaw-knowledge`. The official `openclaw plugins` commands — install, update, list, inspect — all work out of the box:
```bash
# Install (pulls the latest version from npm)
openclaw plugins install @lacneu/openclaw-knowledge

# Inspect the installed version and manifest
openclaw plugins inspect @lacneu/openclaw-knowledge

# Update to the latest published version
openclaw plugins update @lacneu/openclaw-knowledge

# List everything installed
openclaw plugins list
```

OpenClaw tracks the install source under `plugins.installs` in your configuration, so subsequent `update` calls know where to fetch new versions from.
## Configuration

Add to your `openclaw.json`:

```json
{
  "plugins": {
    "allow": ["openclaw-knowledge", "hindsight-openclaw", "telegram"],
    "entries": {
      "openclaw-knowledge": {
        "enabled": true,
        "config": {
          "geminiApiKey": "${GEMINI_API_KEY}",
          "postgresUrl": "postgresql://user:${POSTGRES_PASSWORD}@postgresql:5432/knowledge",
          "collections": ["knowledge_alice"],
          "topK": 5,
          "scoreThreshold": 0,
          "maxInjectChars": 4000,
          "lightragUrl": "http://lightrag:9621",
          "lightragApiKey": "${LIGHTRAG_API_KEY}",
          "lightragQueryMode": "hybrid",
          "lightragMaxChars": 4000
        }
      }
    }
  }
}
```

Then restart the gateway:

```bash
openclaw gateway restart
```

### Configuration reference
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `enabled` | boolean | `true` | Master switch for the plugin |
| **pgvector source** | | | |
| `geminiApiKey` | string | — | Gemini API key for query embedding (supports `${ENV_VAR}`) |
| `postgresUrl` | string | — | PostgreSQL connection URL (supports `${ENV_VAR}`) |
| `collections` | string[] | `["knowledge_default"]` | Collections to search in the `knowledge_vectors` table |
| `topK` | number | `5` | Max results per collection |
| `scoreThreshold` | number | `0.3` | Minimum cosine similarity (0–1) |
| `maxInjectChars` | number | `4000` | Character budget for pgvector results |
| `pgvectorEnabled` | boolean | `true` if `geminiApiKey` set | Disable pgvector while keeping LightRAG |
| **LightRAG source** | | | |
| `lightragUrl` | string | — | LightRAG server base URL |
| `lightragApiKey` | string | — | LightRAG API key (supports `${ENV_VAR}`) |
| `lightragQueryMode` | string | `"hybrid"` | Query mode: `naive`, `local`, `global`, `hybrid` |
| `lightragMaxChars` | number | `4000` | Character budget for LightRAG context |
| `lightragEnabled` | boolean | `true` if `lightragUrl` set | Disable LightRAG while keeping pgvector |
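The `${ENV_VAR}` placeholders are resolved when the config is loaded. A substitution helper might look like this (a sketch only; the plugin's actual `resolveEnv` in `src/config.ts` may behave differently, e.g. warn on unset variables):

```typescript
// Replaces ${VAR} placeholders in config strings with environment values.
// In this sketch, unset variables resolve to an empty string.
function resolveEnv(
  value: string,
  env: Record<string, string | undefined> = process.env
): string {
  return value.replace(/\$\{([A-Z0-9_]+)\}/g, (_, name) => env[name] ?? "");
}
```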
### LightRAG query modes
| Mode | Description | Best for |
|------|-------------|----------|
| naive | Simple vector similarity on chunks | Fast, basic keyword matching |
| local | Entity neighborhood traversal | Questions about a specific entity |
| global | Community summaries | Broad, overview questions |
| hybrid | Combines local + global | Recommended for most cases |
## Data model

### pgvector: `knowledge_vectors` table

The plugin expects a PostgreSQL table with this structure:
```sql
CREATE TABLE knowledge_vectors (
  id SERIAL PRIMARY KEY,
  collection TEXT NOT NULL,
  file_name TEXT,
  mime_type TEXT,
  text TEXT,
  file_id TEXT,
  source TEXT,
  owner TEXT,
  chunk_index INTEGER,
  total_chunks INTEGER,
  timestamp_start TEXT,
  timestamp_end TEXT,
  embedded_at TIMESTAMPTZ,
  embedding vector(3072) NOT NULL
);

CREATE INDEX idx_knowledge_vectors_hnsw
  ON knowledge_vectors
  USING hnsw ((embedding::halfvec(3072)) halfvec_cosine_ops);
```

**Important:** the HNSW index must use `halfvec(3072)` because pgvector's HNSW index has a 2000-dimension limit for the native `vector` type; `halfvec` supports up to 4000 dimensions. The plugin query casts both the column and the parameter accordingly.
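The cast described above looks roughly like this in a search query. The exact SQL is an assumption based on the schema and index shown, not the plugin's verbatim query:

```typescript
// pgvector expects vector literals as "[v1,v2,...]"; this helper formats a
// query embedding for use as a bind parameter.
function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(",")}]`;
}

// Both the column and the parameter are cast to halfvec(3072) so the HNSW
// index defined above can be used. <=> is pgvector's cosine distance operator,
// so similarity = 1 - distance.
const SEARCH_SQL = `
  SELECT file_name, text, chunk_index, total_chunks,
         1 - ((embedding::halfvec(3072)) <=> ($1::vector)::halfvec(3072)) AS score
  FROM knowledge_vectors
  WHERE collection = ANY($2)
  ORDER BY (embedding::halfvec(3072)) <=> ($1::vector)::halfvec(3072)
  LIMIT $3`;
```

With a `pg` client, this would run as `pool.query(SEARCH_SQL, [toVectorLiteral(queryEmbedding), collections, topK])`.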
### Embeddings

- Model: `gemini-embedding-2-preview` via the native Gemini API
- Dimensions: 3072
- Distance metric: cosine similarity
- Query endpoint: the plugin uses the native `embedContent` endpoint (not the OpenAI-compatible one), because the native endpoint supports multimodal embedding at ingestion time while still working for text queries.
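A query-embedding call against the native endpoint might look like this. A sketch under assumptions: request and response shapes follow the public Generative Language REST API, the model name is taken from the list above, and `fetchFn` is injectable so the sketch stays testable:

```typescript
// Sketch of a native embedContent call; treat endpoint details as assumptions.
async function embedQuery(
  apiKey: string,
  text: string,
  fetchFn: typeof fetch = fetch
): Promise<number[]> {
  const model = "gemini-embedding-2-preview"; // model name from the docs above
  const res = await fetchFn(
    `https://generativelanguage.googleapis.com/v1beta/models/${model}:embedContent`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
      body: JSON.stringify({ content: { parts: [{ text }] } }),
    }
  );
  if (!res.ok) throw new Error(`Gemini embedding failed (${res.status})`);
  const data = (await res.json()) as { embedding: { values: number[] } };
  return data.embedding.values; // 3072 dimensions for this model
}
```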
### LightRAG query

The plugin sends a POST request:

```http
POST /query HTTP/1.1
X-API-Key: <lightragApiKey>
Content-Type: application/json

{
  "query": "<user message>",
  "mode": "hybrid",
  "only_need_context": true
}
```

`only_need_context: true` tells LightRAG to return the retrieved context without running the final LLM synthesis — the plugin only needs the raw context to inject into the agent's prompt.
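A client for this call can be sketched as follows. The payload mirrors the request shown above; the response field name (`response`) is an assumption, and `fetchFn` is injectable for testing:

```typescript
// Sketch of the /query call described above.
async function queryLightrag(
  baseUrl: string,
  apiKey: string,
  query: string,
  mode: "naive" | "local" | "global" | "hybrid" = "hybrid",
  fetchFn: typeof fetch = fetch
): Promise<string> {
  const res = await fetchFn(`${baseUrl}/query`, {
    method: "POST",
    headers: { "Content-Type": "application/json", "X-API-Key": apiKey },
    body: JSON.stringify({ query, mode, only_need_context: true }),
  });
  if (!res.ok) throw new Error(`LightRAG query failed (${res.status})`);
  const data = (await res.json()) as { response?: string };
  return data.response ?? ""; // raw retrieved context, no LLM synthesis
}
```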
## Multi-tenant support

Each OpenClaw instance can configure its own set of collections:

```jsonc
// Alice's instance
"collections": ["knowledge_alice", "knowledge_shared"]

// Bob's instance
"collections": ["knowledge_bob", "knowledge_shared"]
```

All instances can share the same PostgreSQL database — isolation is done at the collection level. LightRAG, however, uses one instance per tenant (workspace isolation is not yet exposed in the plugin).
## Example output

When the agent receives a user message, it sees something like this in its system prompt:

```
<existing system prompt>

### Document Search Results (pgvector)

[knowledge_alice] Contrat_Acme_Corp.pdf (score: 0.92, chunk 2/5)
Service agreement between Alice Consulting and Acme Corp. Duration: 6 months,
daily rate: 1500 EUR, start date: 2026-01-15, deliverables: strategy workshops,
CODIR alignment sessions, monthly follow-ups...

[knowledge_shared] Pricing_Grid_2026.pdf (score: 0.87, chunk 1/1)
Standard pricing grid: senior consulting 1500 EUR/day, junior 900 EUR/day,
workshops 3500 EUR/day flat...

### Knowledge Graph Context (LightRAG)

Entity: Acme Corp (Organization)
Relationships:
- Acme Corp → client_of → Alice Consulting (since 2026-01-15)
- Acme Corp → subject_of → Contrat_Acme_Corp.pdf
- Acme Corp → operates_in → Insurance sector
- Acme Corp → represented_by → Thomas Martin (Contact)

User: What were the terms of the Acme contract?
```

The LLM can now cite both the vector search hits (specific text passages) and the knowledge graph entities (relationships and structure) to produce a grounded answer.
## Relationship with Hindsight
This plugin complements Hindsight (the memory plugin) without conflict:
| | Hindsight | openclaw-knowledge |
|---|-----------|-------------------|
| Purpose | Conversational memory | Document knowledge (RAG) |
| Source | Facts extracted from chats | Documents from Google Drive |
| Storage | PostgreSQL (Hindsight schema) | PostgreSQL (`knowledge_vectors`) + Neo4j |
| Trigger | auto-recall on every message | `before_prompt_build` on every message |
| Injection block | `<relevant-memories>` | `### Document Search Results` + `### Knowledge Graph Context` |
| OpenClaw slot | `memory` (exclusive) | None (coexists freely) |
Both run on every user message. The agent receives both blocks, giving it conversational memory AND document knowledge simultaneously.
## Development

This plugin is written in TypeScript and builds against the official OpenClaw plugin SDK (`openclaw/plugin-sdk/plugin-entry`).
### Project layout

```
openclaw-knowledge-plugin/
├── src/                       # TypeScript source
│   ├── index.ts               # Entry point (definePluginEntry + register)
│   ├── config.ts              # resolveEnv + default resolution
│   ├── embeddings.ts          # Gemini embedContent client
│   ├── pgvector.ts            # PostgreSQL search + result formatter
│   ├── lightrag.ts            # LightRAG client + truncation
│   └── types.ts               # Shared interfaces
├── test/                      # TypeScript test suites (node:test)
├── dist/                      # Compiled JS + .d.ts (gitignored)
├── tsconfig.json              # Strict TS config for src
├── tsconfig.test.json         # Typecheck (src + test)
├── tsconfig.test-build.json   # Compile tests to dist-test/ for node:test
├── openclaw.plugin.json       # Plugin manifest (config schema + uiHints)
└── package.json
```

### Build and test
```bash
# Install dev dependencies (includes the openclaw SDK for types, ~200 MB)
npm install

# Strict type check (src + tests)
npm run typecheck

# Run the full test suite (compiles tests then runs node:test)
npm test

# Compile TS → dist/
npm run build

# Clean build output
npm run clean
```

### Release process
1. Update `CHANGELOG.md` with the new version (add a `## [x.y.z] - YYYY-MM-DD` section)
2. Commit the changelog update
3. Create and push a git tag: `git tag v3.1.0 && git push origin v3.1.0`
4. GitHub Actions will automatically:
   - run `npm run typecheck`, `npm test`, and `npm run build` on Node.js 24
   - stamp the version from the tag into `package.json` and `openclaw.plugin.json`
   - compile TypeScript (`npm run build`)
   - publish `@lacneu/openclaw-knowledge` to npm (public access)
   - create a GitHub Release with changelog notes extracted from `CHANGELOG.md`
### Required GitHub secret

The workflow needs an `NPM_TOKEN` secret. Because the npm account has 2FA enabled with a security key, the token must be an **Automation** token (not a regular Publish token): automation tokens bypass 2FA for CI/CD.

Generate it on npm: Access Tokens → Generate New Token → Classic Token → Automation, then add it under the GitHub repo's Settings → Secrets and variables → Actions as `NPM_TOKEN`.
## Troubleshooting
| Symptom | Cause | Solution |
|---------|-------|----------|
| `Cannot find module 'pg'` | Old release (pre-v3.0.4) without bundled deps | Upgrade to v3.0.4+ |
| `neither pgvector nor LightRAG configured — plugin disabled` | No `geminiApiKey` and no `lightragUrl` | Configure at least one source |
| `pgvector — source failed: Gemini embedding failed (429)` | Gemini quota exceeded | Check Gemini API quotas or back off |
| `LightRAG query failed (401)` | Wrong or missing `lightragApiKey` | Verify the `X-API-Key` header is accepted |
| `LightRAG query failed (503)` | LightRAG server down | Check LightRAG container status |
| Plugin loads but no context injected | `scoreThreshold` too high | Lower it to 0 to see all matches |
| Plugin enters 5-min cooldown | 3 consecutive errors on all sources | Check logs and fix the underlying issue |
## License
MIT — see LICENSE
