@hzttt/multimodal-rag

v0.5.3

OpenClaw plugin for multimodal RAG - semantic indexing and time-aware search for images and audio using local AI models

Multimodal RAG Plugin

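OpenClaw multimodal RAG plugin: uses local AI models to build a semantic index and run time-aware search over images, audio, and documents. When a source file is deleted, its index entry is removed automatically (the plugin does not delete original files).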

flowchart LR
  WP[(watchPaths)] --> W[chokidar Watcher]
  W --> P{fileType?}
  P -- image --> V[Qwen3-VL description]
  P -- audio --> A[Whisper / GLM-ASR transcription]
  P -- document --> D[pdfjs / officeparser<br/>+ OCR fallback]
  D --> C[Recursive chunk splitting]
  V --> E[Embedding vectorization]
  A --> E
  C --> E
  E --> M[(LanceDB media table)]
  E --> B[(LanceDB doc_chunks table)]
  M --> T1[media_search]
  B --> T1
  M --> T2[media_list]
  B --> T2
  M --> T3[media_describe]
  B --> T3
  M --> T4[media_stats]
  B --> T4
  M --> CLI[openclaw multimodal-rag *]
  B --> CLI
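
The routing step in the diagram can be sketched as a small dispatcher. This is a minimal illustration, not the plugin's code: the extension lists and the names `FileType`, `classifyFile`, and `targetTable` are assumptions. Only the three file types and the two table names (`media`, `doc_chunks`) come from the diagram.

```typescript
// Hypothetical sketch of the fileType routing shown in the flowchart.
// The extension lists are illustrative assumptions, not the plugin's real lists.
type FileType = "image" | "audio" | "document";

const EXTENSIONS: Record<FileType, string[]> = {
  image: [".jpg", ".jpeg", ".png", ".heic"],
  audio: [".mp3", ".wav", ".m4a"],
  document: [".pdf", ".docx", ".xlsx", ".pptx", ".txt", ".md", ".html"],
};

// Return the file type for a path, or null when the extension is not recognized.
function classifyFile(path: string): FileType | null {
  const dot = path.lastIndexOf(".");
  if (dot < 0) return null;
  const ext = path.slice(dot).toLowerCase();
  for (const type of Object.keys(EXTENSIONS) as FileType[]) {
    if (EXTENSIONS[type].includes(ext)) return type;
  }
  return null;
}

// Images and audio are stored in the media table; document chunks go to doc_chunks.
function targetTable(type: FileType): "media" | "doc_chunks" {
  return type === "document" ? "doc_chunks" : "media";
}
```

Under these assumptions, `classifyFile("~/usb_data/IMG_0001.HEIC")` yields `"image"`, and `targetTable("image")` yields `"media"`.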

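Capabilities at a glance

| Capability | Implementation |
| --- | --- |
| Image understanding | Qwen3-VL (default qwen3-vl:2b); HEIC tiles are stitched automatically with ffmpeg |
| Audio transcription | Local Whisper CLI or Zhipu GLM-ASR-2512 (audio longer than 30 s is split into segments automatically) |
| Document parsing | PDF (pdfjs-dist) / docx-xlsx-pptx (officeparser) / txt-md-html as raw text; scanned pages fall back to OCR |
| OCR | Reuses the Ollama VLM (ocrModel, falling back to visionModel); PDF pages are rendered with pdftoppm (poppler) |
| Chunk splitting | Recursive "paragraph → sentence → character count"; default 800 characters + 120 overlap |
| Vectorization | Ollama qwen3-embedding (4096 dimensions) or OpenAI text-embedding-3-* |
| Vector storage | Two LanceDB tables: media (image/audio) + doc_chunks (document); with scalar indexes, auto-optimize, checkoutLatest |
| File watching | chokidar, with debounce, SHA256 deduplication, move-reuse (media only), broken-file isolation |
| Agent tools | media_search / media_list / media_describe / media_stats (covering both media and document) |
| CLI | 9 commands, including doctor / index / search / cleanup-* / reindex |
| Notifications | Batch aggregation → openclaw agent --deliver triggers a proactive agent reply |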
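
The chunk-splitting row above can be illustrated with a short sketch. It is a minimal version under stated assumptions, not the plugin's implementation: the regular expressions, the packing strategy, and the names `splitPieces` and `chunkText` are hypothetical. Only the paragraph → sentence → character-count order and the defaults (800 characters, 120 overlap) come from the table.

```typescript
// Hypothetical sketch of recursive chunk splitting: paragraph -> sentence -> character count.
// Lengths are measured in JavaScript string units (UTF-16 code units).

// Split text into pieces no longer than `limit`, trying coarse boundaries first.
function splitPieces(text: string, limit: number): string[] {
  const pieces: string[] = [];
  // Level 1: paragraphs (each keeps its trailing blank lines).
  const paragraphs = text.match(/[\s\S]+?(?:\n{2,}|$)/g) ?? [];
  for (const para of paragraphs) {
    if (para.length <= limit) {
      pieces.push(para);
      continue;
    }
    // Level 2: sentences, ended by Latin or CJK terminators.
    const sentences = para.match(/[^.!?。!?]*[.!?。!?]+\s*|[^.!?。!?]+$/g) ?? [para];
    for (const sentence of sentences) {
      // Level 3: fixed-size slices for sentences that are still too long.
      for (let i = 0; i < sentence.length; i += limit) {
        pieces.push(sentence.slice(i, i + limit));
      }
    }
  }
  return pieces;
}

// Pack pieces into chunks of at most maxChars; each later chunk starts with
// the last `overlap` characters of the previous chunk.
function chunkText(text: string, maxChars = 800, overlap = 120): string[] {
  if (overlap < 0 || overlap >= maxChars) {
    throw new Error("overlap must be in [0, maxChars)");
  }
  const chunks: string[] = [];
  let current = "";
  // Pieces are capped at maxChars - overlap so that overlap + piece always fits.
  for (const piece of splitPieces(text, maxChars - overlap)) {
    if (current.length > 0 && current.length + piece.length > maxChars) {
      chunks.push(current);
      const tail = overlap > 0 ? current.slice(-overlap) : "";
      current = tail + piece;
    } else {
      current += piece;
    }
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

Capping each piece at `maxChars - overlap` is a design choice of this sketch: it guarantees that no chunk exceeds `maxChars` even after the overlap prefix is prepended.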

Quick start

# Install
openclaw plugins install @hzttt/multimodal-rag@latest
openclaw plugins enable multimodal-rag

# Configure (edit ~/.openclaw/openclaw.json)
# At minimum, set watchPaths; see docs/configuration.md for details

# Self-check
openclaw multimodal-rag doctor

# Verify
openclaw multimodal-rag stats
openclaw multimodal-rag search "东方明珠"

Minimal configuration example (local Ollama + local Whisper):

{
  "plugins": {
    "entries": {
      "multimodal-rag": {
        "enabled": true,
        "config": {
          "watchPaths": ["~/mic-recordings", "~/usb_data"]
        }
      }
    }
  }
}
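
To make the shape of this example concrete, the sketch below reads `watchPaths` out of a parsed `openclaw.json` object and expands a leading `~`. It is an illustration only: the `OpenClawConfig` type and the `getWatchPaths` and `expandHome` helpers are hypothetical names, and the plugin's own validation may differ. Only the nesting `plugins.entries["multimodal-rag"].config.watchPaths` is taken from the example.

```typescript
// Hypothetical sketch: read watchPaths from a parsed openclaw.json object.
// The nesting mirrors the minimal example; type and function names are assumptions.
interface OpenClawConfig {
  plugins?: {
    entries?: Record<string, { enabled?: boolean; config?: { watchPaths?: string[] } }>;
  };
}

// Return the configured watch paths, or throw when none are set.
function getWatchPaths(cfg: OpenClawConfig): string[] {
  const entry = cfg.plugins?.entries?.["multimodal-rag"];
  const paths = entry?.config?.watchPaths ?? [];
  if (paths.length === 0) {
    throw new Error("multimodal-rag: config.watchPaths must list at least one directory");
  }
  return paths;
}

// Expand a leading "~" against a home directory supplied by the caller.
function expandHome(path: string, home: string): string {
  if (path === "~") return home;
  return path.startsWith("~/") ? home + path.slice(1) : path;
}
```

With the example above and a home directory of `/home/me`, the two entries would expand to `/home/me/mic-recordings` and `/home/me/usb_data`.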

For the full configuration reference, provider switching, remote Ollama, Zhipu GLM-ASR, notification targets, and other scenarios, see docs/configuration.md

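Prerequisites

  • Ollama, running locally or behind a gateway, with qwen3-vl:2b and qwen3-embedding:latest already pulled (run ollama pull once per model)
  • ffmpeg on PATH (HEIC tile stitching, audio format conversion, ffprobe metadata)
  • Audio transcription, one of:
    • Local Whisper (pipx install openai-whisper; the binary path can be overridden with OPENCLAW_WHISPER_BIN)
    • Zhipu GLM-ASR (no Whisper installation needed; set whisper.zhipuApiKey)
  • Document indexing (optional, when fileTypes.document is enabled):
    • pdftoppm (from poppler): renders scanned PDF pages to PNG for OCR. brew install poppler / apt-get install poppler-utils
    • OCR reuses ollama.visionModel by default; a custom ollama.ocrModel can also be set (for example gemma3:27b)
    • Text-based PDFs are extracted with pdfjs-dist only; they skip OCR and do not need poppler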
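
The document-indexing notes above imply a per-page decision: use the extracted text when a page has a text layer, otherwise fall back to OCR with `ocrModel` or, if that is unset, `visionModel`. A minimal sketch of that decision follows. The names `OllamaModels`, `PagePlan`, `resolveOcrModel`, and `planPage`, and the `minTextChars` threshold, are assumptions for illustration and not part of the plugin's documented API.

```typescript
// Hypothetical sketch of the text-versus-OCR decision for one PDF page.
// minTextChars is an illustrative threshold, not a documented plugin setting.
interface OllamaModels {
  visionModel: string;
  ocrModel?: string;
}

type PagePlan =
  | { mode: "text" }                // use the pdfjs-dist text as-is; poppler is not needed
  | { mode: "ocr"; model: string }; // render the page with pdftoppm, then send it to the VLM

// OCR uses ocrModel when it is set, otherwise it falls back to visionModel.
function resolveOcrModel(models: OllamaModels): string {
  return models.ocrModel ?? models.visionModel;
}

// Treat a page with no extractable non-whitespace characters as a scanned page.
function planPage(extractedText: string, models: OllamaModels, minTextChars = 1): PagePlan {
  const visible = extractedText.replace(/\s+/g, "");
  if (visible.length >= minTextChars) return { mode: "text" };
  return { mode: "ocr", model: resolveOcrModel(models) };
}
```

In this sketch a page whose extracted text is only whitespace is planned as OCR with `qwen3-vl:2b` when no `ocrModel` is configured, and with `gemma3:27b` when `ocrModel` is set to that value.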

For a detailed dependency checklist, see docs/operations.md

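Documentation guide

The technical documentation is split by module, and the source code is the single source of truth for all of it:

| Document | Topics |
| --- | --- |
| Architecture overview | Component topology, loading flow, runtime execution model, deferred configuration strategy |
| LanceDB storage layer | Table schemas, scalar indexes, checkoutLatest, where→scan fallback, auto-optimize, cleanup |
| Indexing pipeline | watcher queue, indexFile decisions, retries and broken-file, move-reuse, HEIC tile stitching, GLM-ASR slicing |
| Retrieval pipeline | Query vectorization, minScore, deduplication, confidence, fallback for unindexed files, self-healing of stale entries |
| Agent tools | Parameters, return values, error codes, and decision tree for the 4 tools |
| CLI reference | Full parameters and examples for the 9 CLI commands |
| Configuration reference | Complete field table, provider branches, JSON for typical scenarios |
| Notification mechanism | State machine, three-level fallback for target resolution, agent --deliver command |
| Operations and troubleshooting | doctor, broken-file marker, the three cleanup commands, health checks, fault tree |
| HTTP integration interface | Contract of /get_file_info/search_file exposed by the serve command |

Documentation for historical versions is archived in docs/legacy/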

License

MIT