# @swarmvaultai/engine

v3.14.1

Core engine for SwarmVault: ingest, compile, query, lint, and provider abstractions.
@swarmvaultai/engine is the runtime library behind SwarmVault.
It exposes the primitives for initializing a workspace, ingesting sources, importing an inbox, compiling a wiki, querying the vault, recording agent tasks, maintaining retrieval, running lint, serving the graph viewer, watching the inbox, and exposing the vault over MCP.
## Who This Is For
Use this package if you want to:
- build your own interface on top of the SwarmVault runtime
- integrate vault operations into another Node application
- embed watch or MCP behavior without shelling out to the CLI
- customize provider loading or orchestration in code
If you only want to use SwarmVault as a tool, install @swarmvaultai/cli instead.
## Core Exports
```js
import {
  addInput,
  addManagedSource,
  benchmarkVault,
  buildContextPack,
  compileVault,
  createMcpServer,
  createWebSearchAdapter,
  deleteContextPack,
  deleteManagedSource,
  defaultVaultConfig,
  defaultVaultSchema,
  doctorRetrieval,
  doctorVault,
  exploreVault,
  exportGraphFormat,
  exportGraphHtml,
  explainGraphVault,
  finishMemoryTask,
  getRetrievalStatus,
  getWatchStatus,
  getGitHookStatus,
  importInbox,
  ingestInput,
  initVault,
  installAgent,
  installGitHooks,
  getWebSearchAdapterForTask,
  lintVault,
  listGodNodes,
  listContextPacks,
  listMemoryTasks,
  listManagedSourceRecords,
  listSchedules,
  loadVaultConfig,
  loadVaultSchema,
  loadVaultSchemas,
  pathGraphVault,
  pushGraphNeo4j,
  queryGraphVault,
  queryVault,
  readContextPack,
  readMemoryTask,
  refreshGraphClusters,
  rebuildRetrievalIndex,
  reloadManagedSources,
  resumeMemoryTask,
  runWatchCycle,
  runSchedule,
  searchVault,
  serveSchedules,
  startGraphServer,
  startMemoryTask,
  startMcpServer,
  syncTrackedRepos,
  uninstallGitHooks,
  updateMemoryTask,
  watchVault,
} from "@swarmvaultai/engine";
```

The engine also exports the main runtime types for providers, graph artifacts, pages, manifests, query results, task records, vault doctor reports with prioritized recommendations, retrieval status, lint findings, and watch records.
## Example
```js
import {
  addInput,
  addManagedSource,
  benchmarkVault,
  buildContextPack,
  compileVault,
  doctorRetrieval,
  doctorVault,
  exploreVault,
  exportGraphHtml,
  exportGraphFormat,
  finishMemoryTask,
  getRetrievalStatus,
  getWatchStatus,
  importInbox,
  initVault,
  installGitHooks,
  listMemoryTasks,
  listManagedSourceRecords,
  loadVaultSchemas,
  pushGraphNeo4j,
  queryGraphVault,
  queryVault,
  readContextPack,
  readMemoryTask,
  refreshGraphClusters,
  rebuildRetrievalIndex,
  resumeMemoryTask,
  reloadManagedSources,
  runWatchCycle,
  startMemoryTask,
  updateMemoryTask,
  watchVault
} from "@swarmvaultai/engine";

const rootDir = process.cwd();

await initVault(rootDir, { obsidian: true });

const schemas = await loadVaultSchemas(rootDir);
console.log(schemas.root.path);

const managed = await addManagedSource(rootDir, "https://github.com/karpathy/micrograd");
console.log(managed.source.id);

await addInput(rootDir, "https://arxiv.org/abs/2401.12345");
await importInbox(rootDir);
await compileVault(rootDir, {});

const benchmark = await benchmarkVault(rootDir);
console.log(benchmark.avgQueryTokens);

const saved = await queryVault(rootDir, { question: "What changed most recently?" });
console.log(saved.savedPath);

console.log(await getRetrievalStatus(rootDir));

const contextPack = await buildContextPack(rootDir, {
  goal: "Implement the auth refactor",
  target: "./src",
  budgetTokens: 8000
});
console.log(contextPack.markdownPath);

const memory = await startMemoryTask(rootDir, {
  goal: "Implement the auth refactor",
  target: "./src",
  agent: "codex"
});
await updateMemoryTask(rootDir, memory.task.id, {
  decision: "Keep the implementation local-first and file-backed.",
  changedPath: "packages/engine/src/memory.ts"
});
await finishMemoryTask(rootDir, memory.task.id, {
  outcome: "Task ledger shipped behind CLI, MCP, graph, and viewer surfaces."
});
console.log((await readMemoryTask(rootDir, memory.task.id)).status);
console.log((await listMemoryTasks(rootDir)).length);
console.log((await resumeMemoryTask(rootDir, memory.task.id)).content);

console.log((await doctorVault(rootDir)).recommendations);
console.log(await doctorRetrieval(rootDir));

const graphQuery = await queryGraphVault(rootDir, "Which nodes bridge the biggest communities?");
console.log(graphQuery.summary);
console.log(await refreshGraphClusters(rootDir, { resolution: 1 }));

const exploration = await exploreVault(rootDir, { question: "What should I investigate next?", steps: 3, format: "report" });
console.log(exploration.hubPath);

await exportGraphHtml(rootDir, "./exports/graph.html");
await exportGraphFormat(rootDir, "graphml", "./exports/graph.graphml");
await pushGraphNeo4j(rootDir, {
  uri: "bolt://127.0.0.1:7687",
  username: "neo4j",
  passwordEnv: "NEO4J_PASSWORD",
  dryRun: true
});

await runWatchCycle(rootDir, { repo: true });
console.log(await getWatchStatus(rootDir));

console.log(await listManagedSourceRecords(rootDir));
await reloadManagedSources(rootDir, { all: true, compile: true });

await installGitHooks(rootDir);

const watcher = await watchVault(rootDir, { lint: true, repo: true });
```

## Schema Layer
Each workspace carries a root markdown file named swarmvault.schema.md.
The engine treats that file as vault-specific operating guidance for compile and query work. Currently:
- `initVault()` creates the default schema file
- `initVault()` also creates a human-only `wiki/insights/` area
- `initVault({ obsidian: true })` can also seed a minimal `.obsidian/workspace`
- `swarmvault.config.json` can define `projects` with root matching and optional per-project schema files
- compile and query prompts include the schema content
- generated pages store `schema_hash`
- generated pages also carry lifecycle metadata such as `status`, `created_at`, `updated_at`, `compiled_from`, `managed_by`, and `project_ids`
- saved visual outputs also carry `output_assets`
- `lintVault()` marks generated pages stale when the schema changes
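Schema-change staleness detection can be pictured as a content-hash comparison: each generated page records the schema hash it was compiled under, and lint flags any page whose recorded hash no longer matches the current schema. A minimal sketch of that idea, assuming SHA-256 over the schema text (the `hashSchema` and `isStale` helpers are illustrative, not engine APIs):

```js
import { createHash } from "node:crypto";

// Hypothetical helpers illustrating hash-based staleness; not engine APIs.
function hashSchema(schemaText) {
  return createHash("sha256").update(schemaText, "utf8").digest("hex");
}

function isStale(page, currentSchemaText) {
  // A page compiled under an older schema hash is considered stale.
  return page.schema_hash !== hashSchema(currentSchemaText);
}

const schemaV1 = "# Vault schema\nPrefer concise concept pages.";
const page = { title: "Example", schema_hash: hashSchema(schemaV1) };

console.log(isStale(page, schemaV1));                     // false
console.log(isStale(page, schemaV1 + "\nNew guidance.")); // true
```

Any change to the schema text changes the hash, so every previously generated page becomes stale at once, which matches the "mark stale on schema change" behavior described above.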
## Provider Model
The engine supports:
- `heuristic`
- `openai`
- `anthropic`
- `gemini`
- `ollama`
- `openrouter`
- `groq`
- `together`
- `xai`
- `cerebras`
- `openai-compatible`
- `custom`
Providers are capability-driven. Each provider declares support for features such as:
- `chat`
- `structured`
- `vision`
- `tools`
- `embeddings`
- `streaming`
- `local`
- `image_generation`
This matters because many "OpenAI-compatible" backends only implement part of the OpenAI surface.
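The capability model can be thought of as a declared-feature set that callers filter on before routing work. A small sketch under that assumption (the registry entries and the `selectProvider` helper are illustrative, not the engine's actual data structures):

```js
// Illustrative provider registry; real engine records carry more fields.
const providers = [
  { id: "heuristic", capabilities: ["chat"] },
  { id: "openai", capabilities: ["chat", "structured", "vision", "tools", "embeddings", "streaming", "image_generation"] },
  { id: "ollama", capabilities: ["chat", "embeddings", "streaming", "local"] },
  { id: "openai-compatible", capabilities: ["chat", "streaming"] },
];

// Pick the first provider that declares every required capability.
function selectProvider(registry, required) {
  return registry.find((p) => required.every((c) => p.capabilities.includes(c))) ?? null;
}

console.log(selectProvider(providers, ["chat", "vision"])?.id); // "openai"
console.log(selectProvider(providers, ["local"])?.id);          // "ollama"
// A partial OpenAI-compatible backend simply fails the capability check
// for features it never declared, instead of erroring at call time:
console.log(selectProvider(providers, ["structured", "tools"])?.id); // "openai"
```

The point of declaring capabilities up front is exactly the last case: a backend that only implements part of the OpenAI surface is skipped during selection rather than failing mid-request.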
## Main Engine Surfaces
### Ingest
- `addManagedSource(rootDir, input, { compile, brief, maxPages, maxDepth, branch, ref, checkoutDir })` registers and syncs a recurring source, then optionally compiles and writes a source brief
- `listManagedSourceRecords(rootDir)` lists registry-backed managed sources from `state/sources.json`
- `reloadManagedSources(rootDir, { id, all, compile, brief, maxPages, maxDepth })` re-syncs one managed source or the full registry
- `deleteManagedSource(rootDir, id)` removes a managed-source registry entry and transient sync state without deleting canonical vault artifacts
- `ingestInput(rootDir, input, { includeAssets, maxAssetSize })` ingests a local file path or URL
- `ingestInputDetailed(rootDir, input, { includeAssets, maxAssetSize })` returns a summary envelope with `created`, `updated`, `unchanged`, and `removed` manifests when one input expands into multiple sources
- `addInput(rootDir, input, { author, contributor })` captures supported URLs into normalized markdown before ingesting them, or falls back to generic URL ingest
- `ingestDirectory(rootDir, inputDir, { repoRoot, include, exclude, maxFiles, gitignore, extractClasses })` recursively ingests a local directory as a repo-aware code/content source tree
- `importInbox(rootDir, inputDir?)` recursively imports supported inbox files plus markdown and HTML browser-clipper style bundles
- managed sources support local directories, public GitHub repo root URLs, and bounded same-domain docs hubs; GitHub repo sources can pin `branch`, `ref`, and `checkoutDir`
- registry data lives in `state/sources.json`, working state lives under `state/sources/<id>/`, and source briefs are written to `wiki/outputs/source-briefs/<id>.md`
- EPUB inputs split into chapter-level manifests with shared group metadata so books stay navigable instead of becoming one giant source
- CSV and TSV inputs produce bounded tabular summaries with delimiter-aware previews and compact column hints
- XLSX inputs extract workbook-level and sheet-level previews, while PPTX inputs extract slide text plus speaker notes when present
- JavaScript, JSX, TypeScript, TSX, Bash/shell script, Python, Go, Rust, Java, Kotlin, Scala, Dart, Lua, Zig, C#, C, C++, PHP, Ruby, PowerShell, Elixir, OCaml, Objective-C, ReScript, Solidity, HTML, CSS, Vue, Svelte, Julia, Verilog/SystemVerilog, R, and SQL inputs are treated as code sources and compiled into both source pages and `wiki/code/` module pages where parser support exists. Julia, Verilog/SystemVerilog, and R currently emit explicit parser-asset diagnostics when no packaged WASM grammar is available.
- `.rst` and `.rest` inputs are treated as first-class text sources with lightweight heading and directive normalization before analysis
- code manifests can carry `repoRelativePath`, and compile writes `state/code-index.json` so local imports can resolve across an ingested repo tree
- repo-aware manifests, graph nodes, and graph pages can also carry `sourceClass` so first-party, third-party, resource, and generated material can be filtered and reported separately
- HTML and markdown URL ingests localize remote image references into `raw/assets/<sourceId>/` by default and rewrite the stored markdown to local relative paths
- PDF, DOCX, EPUB, CSV/TSV, XLSX, and PPTX ingests write extracted-text and metadata sidecars under `state/extracts/`, and image ingest keeps the same sidecar model for vision extraction
- Tree-sitter-backed languages now verify runtime and grammar compatibility per language; failures stay local to the affected source and surface as diagnostics instead of aborting the whole compile
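The `sourceClass` labels make it straightforward to report on first-party versus third-party or generated material. A hypothetical sketch of filtering and counting manifests by class (the manifest shape here is assumed for illustration, not the engine's actual schema):

```js
// Assumed manifest shape for illustration; real manifests carry more fields.
const manifests = [
  { id: "src-app", repoRelativePath: "src/app.ts", sourceClass: "first-party" },
  { id: "dep-lodash", repoRelativePath: "node_modules/lodash/index.js", sourceClass: "third-party" },
  { id: "logo", repoRelativePath: "assets/logo.svg", sourceClass: "resource" },
  { id: "dist-bundle", repoRelativePath: "dist/bundle.js", sourceClass: "generated" },
];

// Tally how much of the ingested tree falls into each class.
function countByClass(items) {
  const counts = {};
  for (const m of items) counts[m.sourceClass] = (counts[m.sourceClass] ?? 0) + 1;
  return counts;
}

const firstParty = manifests.filter((m) => m.sourceClass === "first-party");
console.log(firstParty.map((m) => m.id)); // ids: ["src-app"]
console.log(countByClass(manifests));
```

Filtering like this is what lets reports separate the code you wrote from vendored dependencies and build output in one ingested repo tree.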
### Compile + Query
- `compileVault(rootDir, { approve })` writes wiki pages, graph data, and search state using the vault schema as guidance, or stages a review bundle
- compile also writes graph orientation artifacts such as `wiki/graph/report.md`, `wiki/graph/share-card.md`, `wiki/graph/share-card.svg`, `wiki/graph/share-kit/`, `wiki/graph/report.json`, and `wiki/graph/communities/<community>.md`
- compile propagates semantic tags onto page frontmatter and source-backed graph nodes, and records deterministic `contradicts` edges plus a Contradictions section in the graph report when conflicting claims are found
- `benchmarkVault(rootDir, { questions })` writes `state/benchmark.json` and folds the latest benchmark summary into `wiki/graph/report.md` and `wiki/graph/report.json`
- semantic graph query and embedding-backed similarity enrichment cache vectors under `state/embeddings.json` so graph-semantic refresh stays incremental
- `queryVault(rootDir, { question, save, format, review })` answers against the compiled vault using the same schema layer and saves by default
- `buildContextPack(rootDir, { goal, target, budgetTokens, format })` builds an agent-ready evidence bundle with relevant pages, graph nodes, edges, hyperedges, citations, token-budget accounting, and explicit omitted entries
- `exploreVault(rootDir, { question, steps, format, review })` runs a save-first multi-step exploration loop and writes a hub page plus step outputs
- `searchVault(rootDir, query, limit)` searches compiled pages directly
- `queryGraphVault(rootDir, question, { traversal, budget })` runs deterministic local graph search, preferring semantic seed matches from `tasks.embeddingProvider` when configured and falling back to lexical search plus matching group patterns otherwise
- `pathGraphVault(rootDir, from, to)` returns the shortest graph path between two targets
- `explainGraphVault(rootDir, target)` returns node, community, neighbor, provenance, and group-pattern details
- `listGraphHyperedges(rootDir, target?, limit?)` returns graph hyperedges globally or for a specific node/page target
- `listGodNodes(rootDir, limit)` returns the most connected bridge-heavy graph nodes
- `buildGraphShareArtifact(...)`, `renderGraphShareMarkdown(...)`, `renderGraphShareSvg(...)`, `renderGraphSharePreviewHtml(...)`, and `renderGraphShareBundleFiles(...)` produce the post-ready text, 1200x630 visual card, self-contained HTML preview, and portable share kit used by `wiki/graph/share-card.md`, `wiki/graph/share-card.svg`, `wiki/graph/share-kit/`, and the CLI `graph share` command
- `listContextPacks(rootDir)`, `readContextPack(rootDir, id)`, and `deleteContextPack(rootDir, id)` manage saved context-pack artifacts under `state/context-packs/`
- project-aware compile also builds `wiki/projects/index.md` plus `wiki/projects/<project>/index.md` rollups without duplicating page trees
- human-authored insight pages in `wiki/insights/` are indexed into search and available to query without being rewritten by compile
- `chart` and `image` formats save wrapper markdown pages plus local output assets under `wiki/outputs/assets/<slug>/`
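Token-budget accounting with explicit omissions can be sketched as a greedy pack over scored evidence items. The item shapes, the four-characters-per-token estimate, and the `packEvidence` helper below are illustrative assumptions, not `buildContextPack` internals:

```js
// Rough token estimate; real tokenizers differ, ~4 chars/token is a common heuristic.
const estimateTokens = (text) => Math.ceil(text.length / 4);

// Greedily include the highest-relevance items that fit, and record what was cut.
function packEvidence(items, budgetTokens) {
  const sorted = [...items].sort((a, b) => b.relevance - a.relevance);
  const included = [];
  const omitted = [];
  let used = 0;
  for (const item of sorted) {
    const cost = estimateTokens(item.text);
    if (used + cost <= budgetTokens) {
      included.push(item.id);
      used += cost;
    } else {
      omitted.push({ id: item.id, reason: "over token budget" });
    }
  }
  return { included, omitted, usedTokens: used, budgetTokens };
}

const pack = packEvidence([
  { id: "page:auth-overview", relevance: 0.9, text: "x".repeat(400) }, // ~100 tokens
  { id: "node:session-store", relevance: 0.7, text: "x".repeat(200) }, // ~50 tokens
  { id: "page:legacy-notes", relevance: 0.2, text: "x".repeat(800) },  // ~200 tokens
], 160);

console.log(pack.included);                   // ["page:auth-overview", "node:session-store"]
console.log(pack.omitted.map((o) => o.id));   // ["page:legacy-notes"]
```

Recording the omitted entries alongside the included ones is what makes the budget explicit: a consuming agent can see what was left out and why, instead of silently missing evidence.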
### Automation
- `watchVault(rootDir, options)` watches the inbox and optionally tracked repo roots, then appends run records to `state/jobs.ndjson`
- `runWatchCycle(rootDir, options)` runs the same inbox/repo refresh logic once without starting a watcher
- `getWatchStatus(rootDir)` reads the latest watch-status artifact plus pending semantic refresh entries
- `syncTrackedRepos(rootDir)` refreshes previously ingested repo roots, updates changed manifests, and removes deleted repo manifests
- `syncTrackedReposForWatch(rootDir)` is the repo-watch sync path that defers non-code semantic refresh into `state/watch/`
- large ingest and compile passes emit low-noise progress on TTYs, and report presentation rolls up tiny fragmented communities without mutating the canonical graph artifact
- `installGitHooks(rootDir)`, `uninstallGitHooks(rootDir)`, and `getGitHookStatus(rootDir)` manage local `post-commit` and `post-checkout` hook blocks for the nearest git repository
- `installAgent(rootDir, agent, { hook })` writes agent instructions and returns the primary `target`, all touched `targets`, and optional merge warnings for agents such as Aider
- `lintVault(rootDir, options)` runs structural lint, optional deep lint, optional contradiction-only filtering through `{ conflicts: true }`, and optional web-augmented evidence gathering
- `listSchedules(rootDir)`, `runSchedule(rootDir, jobId)`, and `serveSchedules(rootDir)` manage recurring local jobs from config
- compile, query, explore, lint, and watch also write canonical markdown session artifacts to `state/sessions/`
- scheduled `query` and `explore` jobs stage saved outputs through approvals when they write artifacts
- optional orchestration roles can enrich `lint`, `explore`, and compile post-pass behavior without bypassing the approval flow
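The `state/jobs.ndjson` log follows the usual newline-delimited JSON convention: one self-describing record per line, appended per run, so the file can be tailed or stream-parsed without loading it whole. A minimal sketch of writing and reading such records (the record fields shown are assumptions, not the engine's exact schema):

```js
// Serialize one run record per line (append-friendly, stream-parseable).
function toNdjson(records) {
  return records.map((r) => JSON.stringify(r)).join("\n") + "\n";
}

// Parse back, skipping blank lines so partial trailing writes don't throw.
function parseNdjson(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

// Hypothetical watch-cycle records; field names are illustrative.
const runs = [
  { at: "2025-01-01T00:00:00Z", kind: "inbox", imported: 2 },
  { at: "2025-01-01T00:05:00Z", kind: "repo", synced: 1 },
];

const log = toNdjson(runs);
const parsed = parseNdjson(log);
console.log(parsed.length);  // 2
console.log(parsed[1].kind); // "repo"
```

Appending a complete line per run means a crashed watcher corrupts at most the final line, which the blank-line filter above tolerates.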
### Web Search Adapters
- `createWebSearchAdapter(rootDir, id, config)` constructs a normalized web search adapter
- `getWebSearchAdapterForTask(rootDir, "deepLintProvider")` resolves the configured adapter for `lint --deep --web`
### MCP
- `createMcpServer(rootDir)` creates an MCP server instance
- `startMcpServer(rootDir)` runs the MCP server over stdio
- `exportGraphHtml(rootDir, outputPath)` exports the graph workspace as a standalone HTML file
- `exportGraphFormat(rootDir, "svg" | "graphml" | "cypher", outputPath)` exports the graph into interoperable file formats
- `pushGraphNeo4j(rootDir, options)` upserts the current graph into Neo4j over Bolt/Aura with shared-database-safe `vaultId` namespacing
The MCP surface includes tools for workspace info, page search, page reads, source listing, querying, context-pack build/read/list, task start/update/finish/list/read/resume, compatibility memory task tools, retrieval status/rebuild/doctor, ingestion, compile, lint, graph report reads, hyperedge reads, and graph-native read operations such as graph query, node explain, neighbor lookup, shortest path, and god-node listing. It also exposes resources for config, graph, manifests, schema, page content, context-pack listings, task listings, compatibility memory task listings, and session artifacts.
## Artifacts
Running the engine produces a local workspace with these main areas:
- `swarmvault.schema.md`: vault-specific compile and query instructions
- `inbox/`: capture staging area for markdown bundles and imported files
- `raw/sources/`: immutable source copies
- `raw/assets/`: copied attachments referenced by ingested markdown bundles and remote URL ingests
- `wiki/`: generated markdown pages, the append-only `log.md` activity trail, staged candidates, saved query outputs, exploration hub pages, and a human-only `insights/` area
- `wiki/graph/`: generated graph report pages, markdown/SVG share cards, the portable `share-kit/`, optional `tree.html`, and per-community summaries derived from `state/graph.json`
- `wiki/context/`: markdown companions for saved context packs
- `wiki/memory/`: markdown index and task pages for the agent task ledger
- `wiki/graph/report.json`: machine-readable graph report data used by the viewer and export surfaces
- `wiki/outputs/assets/`: local chart/image artifacts and JSON manifests for saved visual outputs
- `wiki/code/`: generated module pages for ingested code sources
- `wiki/projects/`: generated project rollups over canonical pages
- `wiki/candidates/`: staged concept and entity pages awaiting confirmation on a later compile
- `state/manifests/`: source manifests
- `state/sources.json`: managed-source registry state
- `state/sources/`: managed-source working state such as GitHub checkouts and crawl metadata
- `state/extracts/`: extracted markdown plus JSON sidecars describing extractor kind, warnings, PDF page counts, and image-vision metadata
- `state/analyses/`: model analysis output
- `state/code-index.json`: repo-aware code module aliases and local resolution data
- `state/benchmark.json`: latest benchmark/trust summary for the current vault
- `state/graph.json`: compiled graph, including semantic-similarity edges and hyperedge-style group patterns
- graph helpers include tree export, graph merge for SwarmVault/node-link JSON, read-only status checks, shrink-guarded code refresh, community refresh, graph query/path/explain, blast radius, and file exports
- `state/context-packs/`: JSON context-pack artifacts for agent kickoff, review, and handoff workflows
- `state/memory/tasks/`: JSON task records for the agent task ledger
- `state/retrieval/`: local retrieval index directory, including the SQLite FTS shard and manifest
- `state/sessions/`: canonical session artifacts
- `state/approvals/`: staged review bundles from `compileVault({ approve: true })`
- `state/schedules/`: persisted schedule state and leases
- `state/watch/`: watch-status and pending semantic refresh artifacts for repo automation
- `state/jobs.ndjson`: watch-mode automation logs
Saved outputs are indexed immediately into the graph page registry and search index, then linked back into compiled source, concept, and entity pages through the lightweight artifact sync path. New concept and entity pages stage into `wiki/candidates/` first and promote to active pages on the next matching compile. Insight pages are indexed into search and page reads, but compile does not mutate them. Project-scoped pages receive `project_ids`, project tags, and layered root-plus-project schema hashes when all contributing sources resolve to the same configured project.
Code sources also emit module, symbol, and parser-backed rationale nodes into `state/graph.json`, so local imports, exports, inheritance, same-module call edges, and rationale links are queryable through the same viewer and search pipeline.
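Conceptually, the code graph is just nodes for modules and symbols plus directed edges derived from relationships like imports. A toy sketch of deriving import edges between known local modules (the node and edge shapes here are illustrative, not the `state/graph.json` schema):

```js
// Illustrative module records; real graph nodes carry ids, types, provenance, etc.
const modules = [
  { path: "src/auth.ts", imports: ["src/session.ts", "src/util.ts"] },
  { path: "src/session.ts", imports: ["src/util.ts"] },
  { path: "src/util.ts", imports: [] },
];

// Derive directed "imports" edges, keeping only targets that resolve locally.
function importEdges(mods) {
  const known = new Set(mods.map((m) => m.path));
  return mods.flatMap((m) =>
    m.imports
      .filter((target) => known.has(target))
      .map((target) => ({ from: m.path, to: target, type: "imports" }))
  );
}

const edges = importEdges(modules);
console.log(edges.length); // 3
console.log(edges[0]);     // first edge: src/auth.ts -> src/session.ts
```

Once relationships are flattened into typed edges like this, the same traversal, path, and explain operations that work for concept pages work for code without a separate pipeline.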
Ingest, inbox import, compile, query, lint, review, and candidate operations also append human-readable entries to `wiki/log.md`.
PDF sources now go through a local text-extraction pass before analysis, and image sources use the configured `visionProvider` for structured OCR/diagram extraction when a real multimodal provider is available. When image extraction is unavailable, SwarmVault records an explicit warning in the extraction sidecar and carries that warning forward into analysis instead of silently treating the source as empty.
Compile and repo-refresh runs also keep benchmark artifacts current by default, so graph report consumers can show freshness and stale-state without requiring a separate benchmark command first. The graph report now also carries deterministic “why this is surprising” explanations plus group-pattern sections built from hyperedges.
## Notes
- The engine expects Node `>=24`
- The local search layer uses the built-in `node:sqlite` module on Node `>=24`; current CLI releases suppress the upstream experimental warning during normal runs
- The viewer source lives in the companion `@swarmvaultai/viewer` package, and the built assets are bundled into the engine package for CLI installs
## Links
- Website: https://www.swarmvault.ai
- Docs: https://www.swarmvault.ai/docs
- GitHub: https://github.com/swarmclawai/swarmvault
