@neuralsea/workspace-indexer (v0.6.4)
A local-first, multi-repo workspace indexer for AI agents (e.g. your custom agent “Damocles”).
This package provides high-fidelity indexing, retrieval, and context expansion across entire workspaces, while remaining safe to run locally (including VS Code extension hosts).
Default index backends
- Catalogue / indexing DB: SQLite via sql.js (WASM). Runs everywhere (Node, VS Code extension host, webview environments); no native binaries required.
- Vector backend: bruteforce (default). Zero-config, in-memory exact search.
- Graph backend: disabled by default.

For enterprise-scale persistence and performance, configure a remote vector backend such as Qdrant and, optionally, a graph backend such as Neo4j.
What this package provides
- Whole-workspace indexing: multiple Git repositories under a single workspace root.
- Meaningful chunking: TypeScript/JavaScript AST-aware chunking with robust fallbacks for other languages.
- Semantic embeddings: pluggable providers:
  - Ollama (local)
  - OpenAI
  - Deterministic offline hash embeddings
- Hybrid retrieval: vector similarity + lexical search (SQLite FTS5) with configurable weights.
- Pluggable vector backends: bruteforce, hnswlib, qdrant, faiss, or a custom provider.
- Enterprise-safe invalidation: repo indices are keyed by (repo_id, head_commit, embedder_id, index_fingerprint). Any change forces a clean rebuild to avoid stale context.
- Incremental updates: file watching + .git/HEAD detection.
- Security controls: Git-native ignore rules, additional ignore files, and redaction hooks.
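As a concept sketch of the hybrid scoring above (the weight names are illustrative, not the package's documented API), vector similarity and lexical rank can be combined as a weighted sum:

```typescript
// Illustrative only: combine a vector-similarity score and a lexical
// (FTS) score into a single ranking score using configurable weights.
interface HybridWeights {
  vector: number;
  lexical: number;
}

function hybridScore(
  vectorSim: number,    // e.g. cosine similarity, normalised to [0, 1]
  lexicalScore: number, // e.g. normalised FTS5 rank in [0, 1]
  w: HybridWeights
): number {
  return w.vector * vectorSim + w.lexical * lexicalScore;
}
```

Tuning the weights per profile is what lets, say, a search profile favour exact lexical matches while an architecture profile leans on semantic similarity.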
This allows the same index to support multiple agent domains:
- Search
- Refactor
- Review
- Architecture understanding
- RCA (root cause analysis)
…by selecting different retrieval profiles.
Index backends (vector & graph)
Workspace‑Indexer separates index infrastructure from agent logic.
Index backends define where and how indexed knowledge is stored and queried:
- Catalogue DB (files, chunks, metadata, FTS)
- Vector backend (similarity search)
- Graph backend (optional dependency / symbol / architecture graph)
Backends are configured via profiles, allowing:
- Local or remote providers
- Safe backend switching (automatic rebuilds)
- Environment‑specific defaults
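Automatic rebuilds follow from the index key described earlier. A minimal sketch (the types here are illustrative, not the package's internals) of how the (repo_id, head_commit, embedder_id, index_fingerprint) tuple can drive that decision:

```typescript
// Sketch: a repo index is valid only while every component of its
// key matches; any change (new commit, different embedder, changed
// backend fingerprint) forces a clean rebuild.
interface IndexKey {
  repoId: string;
  headCommit: string;
  embedderId: string;
  indexFingerprint: string;
}

function needsRebuild(stored: IndexKey | null, current: IndexKey): boolean {
  if (stored === null) return true; // never indexed before
  return (
    stored.repoId !== current.repoId ||
    stored.headCommit !== current.headCommit ||
    stored.embedderId !== current.embedderId ||
    stored.indexFingerprint !== current.indexFingerprint
  );
}
```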
Install
```sh
npm i @neuralsea/workspace-indexer
```

Node 18+ required.
Docs: docs/README.md
Browser / VS Code webview
This package publishes a browser‑safe entrypoint:
```ts
import { chunkSource, OpenAIEmbeddingsProvider } from "@neuralsea/workspace-indexer/browser";
```

The full indexer (WorkspaceIndexer, file watching, git scanning, persistence) is Node-only and should run in the VS Code extension host, communicating with webviews via postMessage.
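One way to wire that up is a host-side handler that translates webview messages into retrieval calls. The message shapes below are hypothetical, not part of the package; the retrieve parameter stands in for the indexer's retrieve method:

```typescript
// Hypothetical message protocol between a webview and the extension host.
type WebviewRequest = { type: "retrieve"; query: string; profile: string };
type WebviewResponse = { type: "retrieveResult"; paths: string[] };

// The host receives a request, runs retrieval, and returns a response
// suitable for postMessage back to the webview.
async function handleMessage(
  msg: WebviewRequest,
  retrieve: (
    q: string,
    opts: { profile: string }
  ) => Promise<{ hits: { chunk: { path: string } }[] }>
): Promise<WebviewResponse> {
  const res = await retrieve(msg.query, { profile: msg.profile });
  return { type: "retrieveResult", paths: res.hits.map(h => h.chunk.path) };
}
```

Keeping the protocol this narrow means the webview never touches Node-only modules.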
Quick start (library)
```ts
import {
  WorkspaceIndexer,
  OllamaEmbeddingsProvider,
  IndexerProgressObservable
} from "@neuralsea/workspace-indexer";

const embedder = new OllamaEmbeddingsProvider({ model: "nomic-embed-text" });

const progress = new IndexerProgressObservable();
progress.subscribe(e => console.log(e.type, e));

const ix = new WorkspaceIndexer("/path/to/workspace", embedder, { progress });
await ix.indexAll();

const search = await ix.retrieve("Where is authentication enforced?", {
  profile: "search"
});
console.log(search.hits.map(h => h.chunk.path));

await ix.closeAsync();
```

Retrieval profiles
The same index can be queried differently depending on the task.
Built‑in profiles:
- search — tight top‑k, precise matches
- refactor — wider k, follows imports and adjacency
- review — biases to changed files, includes file synopsis
- architecture — aggressive expansion across imports
- rca — review + recency bias
Profiles control:
- k (primary hits)
- weights (vector / lexical / recency)
- expansion rules
- candidate pool sizes
Profiles can be overridden at runtime.
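A runtime override can be pictured as a shallow merge over a profile's defaults. The field names below mirror the list above, but the exact shapes and default values are assumptions for illustration:

```typescript
// Illustrative profile shape: primary hit count plus scoring weights.
interface RetrievalProfile {
  k: number;
  weights: { vector: number; lexical: number; recency: number };
}

// Assumed defaults for a "search"-style profile (values are made up).
const searchProfile: RetrievalProfile = {
  k: 8,
  weights: { vector: 0.7, lexical: 0.3, recency: 0.0 }
};

// Merge runtime overrides on top of the profile, keeping any
// weight components the caller did not specify.
function withOverrides(
  base: RetrievalProfile,
  overrides: Partial<RetrievalProfile>
): RetrievalProfile {
  return {
    ...base,
    ...overrides,
    weights: { ...base.weights, ...(overrides.weights ?? {}) }
  };
}
```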
Index backend configuration (profiles)
Index backends are configured using named profiles.
```json
{
  "indexBackends": {
    "vectorProfiles": {
      "local-default": {
        "kind": "local",
        "provider": "bruteforce",
        "metric": "cosine"
      },
      "qdrant-dev": {
        "kind": "qdrant",
        "url": "http://localhost:6333",
        "collectionPrefix": "petri"
      }
    },
    "graphProfiles": {
      "none": { "kind": "none" },
      "neo4j-local": {
        "kind": "neo4j",
        "uri": "neo4j://localhost:7687",
        "user": "neo4j",
        "passwordRef": "NEO4J_PASSWORD",
        "database": "neo4j",
        "labelPrefix": "Petri"
      }
    },
    "defaults": {
      "vectorProfile": "local-default",
      "graphProfile": "none"
    }
  }
}
```

The selected profiles are resolved internally into runtime configuration.
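Conceptually, that resolution is a lookup of defaults.vectorProfile in vectorProfiles. A sketch (the package's internal types are not shown here, and the error behaviour is an assumption):

```typescript
// Sketch: resolve the default vector profile name into its
// configuration object, failing loudly if the name is unknown.
interface BackendConfig {
  vectorProfiles: Record<string, object>;
  defaults: { vectorProfile: string };
}

function resolveVectorProfile(cfg: BackendConfig): object {
  const profile = cfg.vectorProfiles[cfg.defaults.vectorProfile];
  if (!profile) {
    throw new Error(`Unknown vector profile: ${cfg.defaults.vectorProfile}`);
  }
  return profile;
}
```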
Neo4j migration note
Earlier versions accepted Neo4j configuration under workspace.graph.
This version automatically migrates those settings into a graph profile on first run. After migration, legacy settings are ignored.
Persistence semantics
Disabling the graph backend does not disable index persistence.
Persistence of catalogue data, embeddings, and vector indices is controlled independently via storage settings.
Security model
- Git-native ignore (git ls-files)
- Additional .petriignore / .augmentignore files
- Redaction hooks before embedding and storage

For higher assurance:

- Set storage.ftsMode = "tokens"
- Review redaction patterns
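A redaction hook might look like the following sketch. The hook signature and patterns are assumptions, not the package's actual API; real deployments should review and extend the patterns for their own secret formats:

```typescript
// Illustrative redaction pass applied to chunk text before it is
// embedded or stored. Patterns here are examples, not exhaustive.
const REDACTION_PATTERNS: [RegExp, string][] = [
  // Token-like strings with common key prefixes.
  [/\b(?:sk|ghp)_[A-Za-z0-9]{20,}\b/g, "[REDACTED_TOKEN]"],
  // Inline password assignments.
  [/(password\s*[:=]\s*)\S+/gi, "$1[REDACTED]"]
];

function redact(text: string): string {
  return REDACTION_PATTERNS.reduce(
    (t, [pattern, replacement]) => t.replace(pattern, replacement),
    text
  );
}
```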
Licence
MIT
