seekx-openclaw

v0.3.1

Published

3 months ago

OpenClaw memory backend: hybrid BM25 + vector search with reranking and CJK support

0High
0Medium
0Low

seekx-openclaw

OpenClaw memory backend powered by seekx — local-first hybrid search with BM25, vector kNN, cross-encoder reranking, and CJK support.

What it does

seekx-openclaw replaces OpenClaw's built-in memory-core backend with seekx's hybrid search engine. The agent's memory_search and memory_get tool calls are transparently routed through the seekx pipeline:

memory_search("your query")
  │
  ├─ [query expansion]   → 2–3 rephrased variants (optional, LLM)
  │
  ├─ BM25 full-text ─────┐
  └─ vector kNN ─────────┼─ RRF fusion → [cross-encoder rerank] → results
     └─ HyDE pass ───────┘

No changes to your agent or prompts are needed. The agent keeps using memory_search as before — result quality improves automatically.

Compared to the built-in backend

| Capability | memory-core | seekx-openclaw | |---|---|---| | Full-text search | trigram | BM25 + Jieba (CJK-aware) | | Semantic / vector search | — | optional, OpenAI-compatible API | | Cross-encoder reranking | — | optional | | Query expansion | — | optional (LLM) | | Auto-recall | — | optional proactive pre-search on recall-style prompts | | Citations | — | optional Source: path#line footer on snippets (QMD-compatible) | | Search timeout | — | configurable per search (default 8 s) | | Extra directories to index | — | unlimited |

All optional stages degrade gracefully: if no API key is configured, seekx runs in BM25-only mode and still outperforms the built-in backend for most queries.

Requirements

| Dependency | Version | |---|---| | OpenClaw | ≥ 2026.4.0 | | Node.js | ≥ 22 LTS | | Bun (alternative runtime) | ≥ 1.1.0 |

An OpenAI-compatible embedding API is optional but strongly recommended for semantic search. Supported providers include SiliconFlow, OpenAI, Jina AI, Ollama, or any OpenAI-compatible endpoint.

Installation

Option A — Agent-assisted (recommended)

Tell your agent (Cursor, Claude Desktop, or any capable agent with file-system access):

Fetch and follow this skill:
https://raw.githubusercontent.com/oceanbase/seekx/main/packages/openclaw-plugin/skills/install/SKILL.md

The agent will verify OpenClaw is installed, install the plugin, ask you four targeted questions (provider, API key, query expansion, extra directories), write the config, restart the gateway, and confirm the setup works — all without manual steps.

Option B — Manual (four steps)

Step 1 — Install the plugin package

openclaw plugins install seekx-openclaw

If that command is unavailable (older OpenClaw versions), use the npm fallback:

npm install -g seekx-openclaw
openclaw plugins install -l "$(npm root -g)/seekx-openclaw"

Step 2 — Edit `~/.openclaw/openclaw.json`

Add the plugins block below. If the file does not exist, create it with {} as the starting content.

Minimal config — inherits API credentials from ~/.seekx/config.yml if you already use the seekx CLI:

{
  "plugins": {
    "slots": {
      "memory": "seekx"
    },
    "entries": {
      "seekx": {
        "enabled": true
      }
    }
  }
}

Standalone config — include credentials directly (no seekx CLI required):

{
  "plugins": {
    "slots": {
      "memory": "seekx"
    },
    "entries": {
      "seekx": {
        "enabled": true,
        "config": {
          "apiKey":      "sk-xxx",
          "baseUrl":     "https://api.siliconflow.cn/v1",
          "embedModel":  "BAAI/bge-large-zh-v1.5",
          "rerankModel": "BAAI/bge-reranker-v2-m3",
          "expandModel": "Qwen/Qwen3-8B"
        }
      }
    }
  }
}

Step 3 — Restart the gateway

openclaw gateway restart

Step 4 — Verify

openclaw status

The Memory row should show plugin seekx. If it shows plugin seekx · vector off, the plugin is working correctly in BM25-only mode — vector search is inactive because no embedding model is configured or sqlite-vec could not load.

Provider configuration examples

SiliconFlow (recommended for CJK content)

{
  "apiKey":      "sk-xxx",
  "baseUrl":     "https://api.siliconflow.cn/v1",
  "embedModel":  "BAAI/bge-large-zh-v1.5",
  "rerankModel": "BAAI/bge-reranker-v2-m3",
  "expandModel": "Qwen/Qwen3-8B"
}

Get a free API key at cloud.siliconflow.cn.

OpenAI

{
  "apiKey":      "sk-xxx",
  "baseUrl":     "https://api.openai.com/v1",
  "embedModel":  "text-embedding-3-small",
  "expandModel": "gpt-4o-mini"
}

Omit rerankModel — OpenAI does not expose a reranking endpoint.

Ollama (fully local, no API key)

ollama pull nomic-embed-text   # pull the embedding model first

{
  "apiKey":     "ollama",
  "baseUrl":    "http://localhost:11434/v1",
  "embedModel": "nomic-embed-text"
}

Omit rerankModel and expandModel for a minimal local setup, or set expandModel to any chat model you have pulled (e.g. qwen2.5, llama3.2).

Indexing extra directories

Index your own notes, project docs, or any Markdown/text directory alongside OpenClaw's built-in memory files:

{
  "plugins": {
    "entries": {
      "seekx": {
        "enabled": true,
        "config": {
          "paths": [
            { "name": "notes",   "path": "~/notes" },
            { "name": "docs",    "path": "~/projects/docs", "pattern": "**/*.md" },
            { "name": "company", "path": "~/brain/companies" }
          ]
        }
      }
    }
  }
}

| Field | Required | Description | |---|---|---| | name | yes | Collection identifier; appears in search results and openclaw status. Must be unique. | | path | yes | Absolute path or ~/-prefixed home-relative path. Non-existent directories are silently skipped at startup. | | pattern | no | Glob filter. Default: **/*.{md,txt,markdown}. |

Files are indexed on startup and watched for changes. Edits are re-indexed within 1–2 seconds.

How the agent uses memory

No changes are needed in your agent prompts. The agent automatically calls memory_search before answering queries that may benefit from stored context.

You can also ask explicitly:

memory_search("kubernetes pod restart loop")
memory_search("Alice's contact info")
memory_search("架构设计决策")

To scope a search to a specific collection:

memory_search("API design", { collection: "docs" })

Collection names come from plugins.entries.seekx.config.paths[].name. The built-in OpenClaw memory files are always available as collection openclaw-memory.

To fetch a full document by its path (only paths already indexed by seekx are accessible):

memory_get("/Users/me/notes/people/alice.md")

Configuration reference

All fields under plugins.entries.seekx.config are optional. Fields not provided here inherit from ~/.seekx/config.yml when that file exists, then fall back to built-in defaults.

API credentials

| Field | Type | Description | |---|---|---| | apiKey | string | API key for the embedding/reranking/expansion service. | | baseUrl | string | OpenAI-compatible base URL, e.g. https://api.siliconflow.cn/v1. Must end with /v1. | | embedModel | string | Embedding model name. Required for vector search. | | rerankModel | string | Cross-encoder model name. Omit to disable reranking. | | expandModel | string | LLM for query expansion. Omit to disable expansion. |

Storage

| Field | Type | Default | Description | |---|---|---|---| | dbPath | string | ~/.seekx/openclaw.db | SQLite database path. Kept separate from the seekx CLI database (~/.seekx/index.sqlite). |

Indexed content

| Field | Type | Default | Description | |---|---|---|---| | includeOpenClawMemory | boolean | true | Index ~/.openclaw/workspace/MEMORY.md and ~/.openclaw/workspace/memory/**/*.md. | | paths | array | [] | Extra directories to index. See Indexing extra directories. |

Search behavior

| Field | Type | Default | Description | |---|---|---|---| | searchLimit | number | 6 | Maximum results per memory_search call. Recommended range: 4–12. | | citations | string | "auto" | "auto" / "on" / "off" — append Source: path#line footer to search snippets. | | searchTimeoutMs | number | 8000 | Maximum time (ms) for a single memory_search call. Set to 0 to disable. |

Auto recall

| Field | Type | Default | Description | |---|---|---|---| | autoRecall.enabled | boolean | true | Run a proactive seekx search before prompt build on recall-style user prompts. | | autoRecall.maxResults | number | 3 | Maximum injected matches. | | autoRecall.minScore | number | 0.2 | Minimum score threshold for injection. | | autoRecall.maxChars | number | 1200 | Character budget for injected context. | | autoRecall.minQueryLength | number | 4 | Skip auto-recall when the user prompt is shorter than this length. |

Watcher and refresh

| Field | Type | Default | Description | |---|---|---|---| | refreshIntervalMs | number | 300000 | Periodic full re-index interval (ms). The file watcher handles incremental updates; this is a safety net for missed events. Set to 0 to disable. |

Degraded modes

The plugin is designed to degrade gracefully. If a component is unavailable, the pipeline continues with what is available:

| Condition | Behavior | |---|---| | No apiKey or empty embedModel | BM25-only mode; no vector search | | sqlite-vec unavailable | BM25-only mode; vector columns exist but are unused | | rerankModel not set | RRF-ranked results are returned directly | | expandModel not set | Only the original query runs; no expanded variants | | All of the above | Pure BM25 + Jieba; still outperforms trigram-based built-in |

Credential precedence

When the same field is present in multiple places, the plugin resolves it in this order (highest priority first):

plugins.entries.seekx.config in ~/.openclaw/openclaw.json
~/.seekx/config.yml (seekx CLI config)
Built-in defaults

You can configure credentials once in ~/.seekx/config.yml (via seekx onboard) and reuse them for the plugin without duplication.

Uninstalling / switching backends

To switch back to the built-in backend, update ~/.openclaw/openclaw.json:

{
  "plugins": {
    "slots": {
      "memory": "memory-core"
    }
  }
}

Then restart the gateway:

openclaw gateway restart

The seekx database (~/.seekx/openclaw.db) is not deleted automatically. Remove it manually if you no longer need the indexed content.

To uninstall the plugin package:

openclaw plugins uninstall seekx

Troubleshooting

`openclaw status` does not show `plugin seekx` in the Memory row

Confirm plugins.slots.memory is set to "seekx" (not "memory-core" or absent).
Run openclaw plugins list to verify the plugin is listed as enabled.
Restart the gateway after any config change.

`memory_search` returns no results right after startup

The first search may block briefly while the initial index pass completes. Run openclaw status — if files and chunks are still 0 after 30 seconds, check gateway logs for indexing errors.

`plugin seekx · vector off` in `openclaw status`

Valid BM25-only mode. Vector search is inactive because no embedding model is configured, the API credentials are invalid, or sqlite-vec could not be loaded. Verify:

embedModel is set
baseUrl ends with /v1
apiKey is correct for the chosen provider

Embedding API errors in gateway logs

Confirm baseUrl ends with /v1.
Confirm the API key matches the provider (SiliconFlow keys start with sk-).
Confirm the model name is spelled exactly as the provider expects (case-sensitive).

Files in an extra directory are not indexed

Confirm the directory existed when the gateway started (non-existent paths are silently skipped).
Confirm file extensions match the configured pattern (default: **/*.{md,txt,markdown}).
Restart the gateway after creating a previously missing directory.

Search results are stale after editing a file

The file watcher picks up changes within ~1 second under normal conditions. Network drives and Docker bind mounts may miss chokidar events; the periodic re-index (refreshIntervalMs, default 5 min) handles those cases.

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

seekx-openclaw

What it does

Compared to the built-in backend

Requirements

Installation

Option A — Agent-assisted (recommended)

Option B — Manual (four steps)

Step 1 — Install the plugin package

Step 2 — Edit ~/.openclaw/openclaw.json

Step 3 — Restart the gateway

Step 4 — Verify

Provider configuration examples

SiliconFlow (recommended for CJK content)

OpenAI

Ollama (fully local, no API key)

Indexing extra directories

How the agent uses memory

Configuration reference

API credentials

Storage

Indexed content

Search behavior

Auto recall

Watcher and refresh

Degraded modes

Credential precedence

Uninstalling / switching backends

Troubleshooting

openclaw status does not show plugin seekx in the Memory row

memory_search returns no results right after startup

plugin seekx · vector off in openclaw status

Embedding API errors in gateway logs

Files in an extra directory are not indexed

Search results are stale after editing a file

Further reading

Step 2 — Edit `~/.openclaw/openclaw.json`

`openclaw status` does not show `plugin seekx` in the Memory row

`memory_search` returns no results right after startup

`plugin seekx · vector off` in `openclaw status`