@neuralsea/workspace-indexer (v0.6.4)
A local-first, multi-repo workspace indexer for AI agents (e.g. your custom agent “Damocles”).
This package provides high-fidelity indexing, retrieval, and context expansion across entire workspaces, while remaining safe to run locally (including VS Code extension hosts).
Default index backends
- Catalogue / indexing DB: SQLite via sql.js (WASM). Runs everywhere (Node, VS Code extension host, webview environments); no native binaries required.
- Vector backend: bruteforce (default). Zero-config, in-memory exact search.
- Graph backend: disabled by default.

For enterprise-scale persistence and performance, configure a remote vector backend such as Qdrant and, optionally, a graph backend such as Neo4j.
What this package provides
- Whole-workspace indexing: multiple Git repositories under a single workspace root.
- Meaningful chunking: TypeScript/JavaScript AST-aware chunking with robust fallbacks for other languages.
- Semantic embeddings: pluggable providers:
  - Ollama (local)
  - OpenAI
  - Deterministic offline hash embeddings
- Hybrid retrieval: vector similarity + lexical search (SQLite FTS5) with configurable weights.
- Pluggable vector backends: bruteforce, hnswlib, qdrant, faiss, or a custom provider.
- Enterprise-safe invalidation: repo indices are keyed by (repo_id, head_commit, embedder_id, index_fingerprint). Any change forces a clean rebuild to avoid stale context.
- Incremental updates: file watching + .git/HEAD detection.
- Security controls: Git-native ignore rules, additional ignore files, and redaction hooks.
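As a concept sketch of the hybrid scoring above (the weight names are illustrative, not the package's documented API), vector similarity and lexical rank can be combined as a weighted sum:

```typescript
// Illustrative only: combine a vector-similarity score and a lexical
// (FTS) score into a single ranking score using configurable weights.
interface HybridWeights {
  vector: number;
  lexical: number;
}

function hybridScore(
  vectorSim: number,    // e.g. cosine similarity, normalised to [0, 1]
  lexicalScore: number, // e.g. normalised FTS5 rank in [0, 1]
  w: HybridWeights
): number {
  return w.vector * vectorSim + w.lexical * lexicalScore;
}
```

Tuning the weights per profile is what lets, say, a search profile favour exact lexical matches while an architecture profile leans on semantic similarity.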
This allows the same index to support multiple agent domains:
- Search
- Refactor
- Review
- Architecture understanding
- RCA (root cause analysis)
…by selecting different retrieval profiles.
Index backends (vector & graph)
Workspace‑Indexer separates index infrastructure from agent logic.
Index backends define where and how indexed knowledge is stored and queried:
- Catalogue DB (files, chunks, metadata, FTS)
- Vector backend (similarity search)
- Graph backend (optional dependency / symbol / architecture graph)
Backends are configured via profiles, allowing:
- Local or remote providers
- Safe backend switching (automatic rebuilds)
- Environment‑specific defaults
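Automatic rebuilds follow from the index key described earlier. A minimal sketch (the types here are illustrative, not the package's internals) of how the (repo_id, head_commit, embedder_id, index_fingerprint) tuple can drive that decision:

```typescript
// Sketch: a repo index is valid only while every component of its
// key matches; any change (new commit, different embedder, changed
// backend fingerprint) forces a clean rebuild.
interface IndexKey {
  repoId: string;
  headCommit: string;
  embedderId: string;
  indexFingerprint: string;
}

function needsRebuild(stored: IndexKey | null, current: IndexKey): boolean {
  if (stored === null) return true; // never indexed before
  return (
    stored.repoId !== current.repoId ||
    stored.headCommit !== current.headCommit ||
    stored.embedderId !== current.embedderId ||
    stored.indexFingerprint !== current.indexFingerprint
  );
}
```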
Install
```sh
npm i @neuralsea/workspace-indexer
```

Node 18+ required.
Docs: docs/README.md
Browser / VS Code webview
This package publishes a browser‑safe entrypoint:
```ts
import { chunkSource, OpenAIEmbeddingsProvider } from "@neuralsea/workspace-indexer/browser";
```

The full indexer (WorkspaceIndexer, file watching, git scanning, persistence) is Node-only and should run in the VS Code extension host, communicating with webviews via postMessage.
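One way to wire that up is a host-side handler that translates webview messages into retrieval calls. The message shapes below are hypothetical, not part of the package; the retrieve parameter stands in for the indexer's retrieve method:

```typescript
// Hypothetical message protocol between a webview and the extension host.
type WebviewRequest = { type: "retrieve"; query: string; profile: string };
type WebviewResponse = { type: "retrieveResult"; paths: string[] };

// The host receives a request, runs retrieval, and returns a response
// suitable for postMessage back to the webview.
async function handleMessage(
  msg: WebviewRequest,
  retrieve: (
    q: string,
    opts: { profile: string }
  ) => Promise<{ hits: { chunk: { path: string } }[] }>
): Promise<WebviewResponse> {
  const res = await retrieve(msg.query, { profile: msg.profile });
  return { type: "retrieveResult", paths: res.hits.map(h => h.chunk.path) };
}
```

Keeping the protocol this narrow means the webview never touches Node-only modules.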
Quick start (library)
```ts
import {
  WorkspaceIndexer,
  OllamaEmbeddingsProvider,
  IndexerProgressObservable
} from "@neuralsea/workspace-indexer";

const embedder = new OllamaEmbeddingsProvider({ model: "nomic-embed-text" });

const progress = new IndexerProgressObservable();
progress.subscribe(e => console.log(e.type, e));

const ix = new WorkspaceIndexer("/path/to/workspace", embedder, { progress });
await ix.indexAll();

const search = await ix.retrieve("Where is authentication enforced?", {
  profile: "search"
});
console.log(search.hits.map(h => h.chunk.path));

await ix.closeAsync();
```

Retrieval profiles
The same index can be queried differently depending on the task.
Built‑in profiles:
- search — tight top‑k, precise matches
- refactor — wider k, follows imports and adjacency
- review — biases to changed files, includes file synopsis
- architecture — aggressive expansion across imports
- rca — review + recency bias
Profiles control:
- k (primary hits)
- weights (vector / lexical / recency)
- expansion rules
- candidate pool sizes
Profiles can be overridden at runtime.
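A runtime override can be pictured as a shallow merge over a profile's defaults. The field names below mirror the list above, but the exact shapes and default values are assumptions for illustration:

```typescript
// Illustrative profile shape: primary hit count plus scoring weights.
interface RetrievalProfile {
  k: number;
  weights: { vector: number; lexical: number; recency: number };
}

// Assumed defaults for a "search"-style profile (values are made up).
const searchProfile: RetrievalProfile = {
  k: 8,
  weights: { vector: 0.7, lexical: 0.3, recency: 0.0 }
};

// Merge runtime overrides on top of the profile, keeping any
// weight components the caller did not specify.
function withOverrides(
  base: RetrievalProfile,
  overrides: Partial<RetrievalProfile>
): RetrievalProfile {
  return {
    ...base,
    ...overrides,
    weights: { ...base.weights, ...(overrides.weights ?? {}) }
  };
}
```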
Index backend configuration (profiles)
Index backends are configured using named profiles.
```json
{
  "indexBackends": {
    "vectorProfiles": {
      "local-default": {
        "kind": "local",
        "provider": "bruteforce",
        "metric": "cosine"
      },
      "qdrant-dev": {
        "kind": "qdrant",
        "url": "http://localhost:6333",
        "collectionPrefix": "petri"
      }
    },
    "graphProfiles": {
      "none": { "kind": "none" },
      "neo4j-local": {
        "kind": "neo4j",
        "uri": "neo4j://localhost:7687",
        "user": "neo4j",
        "passwordRef": "NEO4J_PASSWORD",
        "database": "neo4j",
        "labelPrefix": "Petri"
      }
    },
    "defaults": {
      "vectorProfile": "local-default",
      "graphProfile": "none"
    }
  }
}
```

The selected profiles are resolved internally into runtime configuration.
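Conceptually, that resolution is a lookup of defaults.vectorProfile in vectorProfiles. A sketch (the package's internal types are not shown here, and the error behaviour is an assumption):

```typescript
// Sketch: resolve the default vector profile name into its
// configuration object, failing loudly if the name is unknown.
interface BackendConfig {
  vectorProfiles: Record<string, object>;
  defaults: { vectorProfile: string };
}

function resolveVectorProfile(cfg: BackendConfig): object {
  const profile = cfg.vectorProfiles[cfg.defaults.vectorProfile];
  if (!profile) {
    throw new Error(`Unknown vector profile: ${cfg.defaults.vectorProfile}`);
  }
  return profile;
}
```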
Neo4j migration note
Earlier versions accepted Neo4j configuration under workspace.graph.
This version automatically migrates those settings into a graph profile on first run. After migration, legacy settings are ignored.
Persistence semantics
Disabling the graph backend does not disable index persistence.
Persistence of catalogue data, embeddings, and vector indices is controlled independently via storage settings.
Security model
- Git-native ignore (git ls-files)
- Additional .petriignore / .augmentignore files
- Redaction hooks before embedding and storage

For higher assurance:

- Set storage.ftsMode = "tokens"
- Review redaction patterns
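A redaction hook might look like the following sketch. The hook signature and patterns are assumptions, not the package's actual API; real deployments should review and extend the patterns for their own secret formats:

```typescript
// Illustrative redaction pass applied to chunk text before it is
// embedded or stored. Patterns here are examples, not exhaustive.
const REDACTION_PATTERNS: [RegExp, string][] = [
  // Token-like strings with common key prefixes.
  [/\b(?:sk|ghp)_[A-Za-z0-9]{20,}\b/g, "[REDACTED_TOKEN]"],
  // Inline password assignments.
  [/(password\s*[:=]\s*)\S+/gi, "$1[REDACTED]"]
];

function redact(text: string): string {
  return REDACTION_PATTERNS.reduce(
    (t, [pattern, replacement]) => t.replace(pattern, replacement),
    text
  );
}
```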
Licence
MIT
