@bodhi-ventures/aiocs
v0.6.1
Local-only documentation store, fetcher, and search CLI for AI agents.
aiocs
Local-only documentation fetch, versioning, and search CLI for AI agents.
What it does
- fetches docs from websites with Playwright
- snapshots curated external git repositories as shared local reference sources
- supports authenticated sources via environment-backed headers and cookies
- runs lightweight canaries to detect source drift before full refreshes
- normalizes them into Markdown
- stores immutable local snapshots in a shared catalog
- diffs snapshots to show what changed between fetches
- indexes heading-aware chunks in SQLite FTS5
- adds optional hybrid retrieval with local Ollama embeddings and a dedicated Qdrant vector index
- links docs sources to local projects for scoped search
- exports and imports manifest-backed backups for ~/.aiocs
All state is local. By default, data lives under ~/.aiocs:
- data: ~/.aiocs/data
- config: ~/.aiocs/config
For testing or local overrides, set:
- AIOCS_DATA_DIR
- AIOCS_CONFIG_DIR
Install
npm install -g @bodhi-ventures/aiocs
aiocs --version
aiocs --help
command -v aiocs-mcp
Zero-install fallback:
npx -y -p @bodhi-ventures/aiocs aiocs --version
npx -y -p @bodhi-ventures/aiocs aiocs-mcp
For repository development only:
pnpm install
pnpm build
pnpm dev -- --help
pnpm dev:mcp
For AI agents, prefer the root-level --json flag for one-shot commands:
aiocs --json version
aiocs --json doctor
aiocs --json init --no-fetch
aiocs --json source list
aiocs --json source describe hyperliquid
aiocs --json page list hyperliquid --query "auth"
aiocs --json search "maker flow" --source hyperliquid
aiocs --json retrieve "where is maker flow documented" --source hyperliquid --mode lexical
aiocs --json show 42
--json emits exactly one JSON document to stdout with this envelope:
{
"ok": true,
"command": "search",
"data": {
"total": 0,
"limit": 20,
"offset": 0,
"hasMore": false,
"results": []
}
}
Failures still exit with status 1, but emit a JSON error document instead of human text:
{
"ok": false,
"command": "show",
"error": {
"code": "CHUNK_NOT_FOUND",
"message": "Chunk 42 not found"
}
}
The full stable JSON contract lives in docs/json-contract.md.
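A minimal sketch of how a caller might consume the two envelope shapes shown above. The names AiocsError, unwrap, and next_offset are hypothetical helpers, not part of aiocs, and computing the next page as offset + limit is an assumption; docs/json-contract.md remains the authoritative contract.

```python
import json


class AiocsError(Exception):
    """Hypothetical exception raised when an envelope reports ok=false."""

    def __init__(self, command, code, message):
        super().__init__(f"{command}: {code}: {message}")
        self.command = command
        self.code = code


def unwrap(stdout_text):
    """Return the `data` payload, or raise AiocsError for an error envelope."""
    doc = json.loads(stdout_text)
    if doc.get("ok"):
        return doc["data"]
    err = doc.get("error", {})
    raise AiocsError(doc.get("command"), err.get("code"), err.get("message"))


def next_offset(data):
    """Offset of the next page, or None when hasMore is false.

    Assumption: the next page starts at offset + limit.
    """
    if not data.get("hasMore"):
        return None
    return data["offset"] + data["limit"]
```

Because failures still exit with status 1, a caller would typically parse stdout before treating a non-zero exit code as fatal, so the error code stays available.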
Release
Stable releases are tag-driven. Bump package.json.version, commit the change, then create and push a matching stable tag:
git add package.json
git commit -m "release: vX.Y.Z"
git tag vX.Y.Z
git push origin main
git push origin vX.Y.Z
GitHub Actions publishes @bodhi-ventures/aiocs publicly to npm and creates the GitHub release only from pushed tags matching vX.Y.Z. The workflow validates that the tag exactly matches package.json.version and is safe to rerun after partial success.
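The tag check described above can be approximated locally before pushing. This is a sketch under assumptions, not the workflow's actual implementation: the function name tag_matches_package is hypothetical, and the pattern only accepts plain vX.Y.Z tags with numeric parts.

```python
import json
import re

# Assumption: a stable tag is "v" followed by three dot-separated integers.
STABLE_TAG = re.compile(r"v(\d+\.\d+\.\d+)")


def tag_matches_package(tag, package_json_text):
    """True when `tag` is a stable vX.Y.Z tag equal to package.json's version."""
    match = STABLE_TAG.fullmatch(tag)
    if match is None:
        return False
    version = json.loads(package_json_text)["version"]
    return match.group(1) == version
```

Running such a check before `git tag` avoids pushing a tag that the release workflow would reject.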
Codex integration
For Codex-first setup, automatic-use guidance, MCP recommendations, and agent definitions, see docs/codex-integration.md.
Canonical Codex setup:
- register aiocs-mcp in ~/.codex/config.toml
- link skills/aiocs into ~/.codex/skills/aiocs
- link skills/aiocs-curation into ~/.codex/skills/aiocs-curation
- keep the optional specialist subagent linked only as a fallback for heavier docs workflows
Managed sources
The open-source repo bundles both web and git sources in sources/:
- hyperliquid for the public docs site
- nktkas-hyperliquid for the nktkas/hyperliquid GitHub repository
Additional machine-local source specs belong in ~/.aiocs/sources.
aiocs init bootstraps both managed locations, so source behavior is the same regardless of
whether a spec lives in the repo or in ~/.aiocs/sources.
Bootstrap managed sources in one command:
aiocs init --no-fetch
aiocs --json init --no-fetch
Validate the machine before bootstrapping:
aiocs doctor
aiocs --json doctor
Workflow
Register a source:
mkdir -p ~/.aiocs/sources
cp /path/to/source.yaml ~/.aiocs/sources/my-source.yaml
aiocs source upsert ~/.aiocs/sources/my-source.yaml
aiocs source upsert /path/to/source.yaml
aiocs source list
Fetch and snapshot docs:
aiocs refresh due hyperliquid
aiocs snapshot list hyperliquid
aiocs refresh due
Force fetch remains available for explicit maintenance:
aiocs fetch hyperliquid
aiocs fetch all
Link docs to a local project:
aiocs project link /absolute/path/to/project hyperliquid lighter
aiocs project unlink /absolute/path/to/project lighter
Search and inspect results:
aiocs source describe hyperliquid
aiocs source context show hyperliquid
aiocs page list hyperliquid --query "auth"
aiocs page show hyperliquid --url "https://hyperliquid.gitbook.io/hyperliquid-docs/for-developers/api"
aiocs search "maker flow" --source hyperliquid
aiocs search "WebSocketTransport" --source nktkas-hyperliquid --path "src/**" --language typescript
aiocs search "maker flow" --source hyperliquid --mode lexical
aiocs search "maker flow" --source hyperliquid --mode hybrid
aiocs search "maker flow" --source hyperliquid --mode semantic
aiocs search "maker flow" --all
aiocs search "maker flow" --source hyperliquid --limit 5 --offset 0
aiocs show 42
aiocs canary hyperliquid
aiocs diff hyperliquid
aiocs embeddings status
aiocs embeddings backfill all
aiocs embeddings run
aiocs backup export /absolute/path/to/backup
aiocs verify coverage hyperliquid /absolute/path/to/reference.md
Awareness and learning flow:
aiocs source context upsert hyperliquid ~/.aiocs/source-context/hyperliquid.yaml
aiocs learning save --source hyperliquid --kind discovery --intent "maker flow" --page-url "https://..."
aiocs learning list --source hyperliquid
aiocs retrieve "where is maker flow documented" --source hyperliquid --mode lexical
When aiocs search runs inside a linked project, it automatically scopes to that project's linked sources unless --source or --all is provided.
For agents, the intended decision order is:
- check source list, source describe, or page list first
- if the source exists and is due, run refresh due <source-id>
- use search to shortlist candidates, then retrieve or page show to read the full page before answering
- if the source is missing but worth reusing, add a spec under ~/.aiocs/sources, then upsert and refresh only that source
- save durable discoveries or negative paths with learning save
- avoid fetch all unless the user explicitly asks or the daemon is doing maintenance
Git repo sources
aiocs supports first-class kind: git sources for curated external repositories that should be
reused across multiple local projects.
Example:
kind: git
id: nktkas-hyperliquid
label: nktkas/hyperliquid Repo
repo:
  url: https://github.com/nktkas/hyperliquid.git
  ref: main
  include:
    - README.md
    - docs/**
    - src/**
  exclude:
    - .github/**
    - dist/**
schedule:
  everyHours: 24
Git source snapshots are commit-based, stored under the shared local catalog, and searchable with the same project linking, diffing, canary, and hybrid search flows as website docs.
Hybrid search
aiocs keeps SQLite FTS5/BM25 as the canonical lexical index and adds an optional hybrid layer:
- --mode lexical: lexical search only
- --mode hybrid: BM25 plus vector recall fused with reciprocal-rank fusion
- --mode semantic: vector-only recall over the latest indexed snapshots
- --mode auto: default; uses hybrid only when the vector layer is healthy and current for the requested scope
Vector state is derived from the catalog, not a second source of truth. If Ollama or Qdrant is unavailable, auto degrades back to lexical search.
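To illustrate the hybrid mode, here is a generic sketch of reciprocal-rank fusion over a lexical ranking and a vector ranking, plus the auto fallback rule described above. It is not aiocs's implementation: the function names are hypothetical, and the constant k=60 is the value commonly used for this technique, assumed here rather than taken from the project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of chunk ids; each list contributes 1 / (k + rank).

    Assumption: k=60 is a conventional default, not a documented aiocs setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda cid: (-scores[cid], str(cid)))


def resolve_mode(requested, vector_ready):
    """Mirror the documented rule: auto uses hybrid only when vectors are usable."""
    if requested == "auto":
        return "hybrid" if vector_ready else "lexical"
    return requested
```

For example, with a BM25 ranking [1, 2, 3] and a vector ranking [3, 1, 4], chunks 1 and 3 appear in both lists and therefore outrank chunks 2 and 4, which appear in only one.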
Authenticated sources
Source specs can reference secrets from the environment without storing raw values in YAML:
auth:
  headers:
    - name: authorization
      valueFromEnv: AIOCS_DOCS_TOKEN
      hosts:
        - docs.example.com
      include:
        - https://docs.example.com/private/**
  cookies:
    - name: session
      valueFromEnv: AIOCS_DOCS_SESSION
      domain: docs.example.com
      path: /
Header secrets are scoped per entry. If hosts is omitted, the header applies to the source allowedHosts; include can further narrow it to specific URL patterns.
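The scoping rule for header secrets can be sketched as follows. This is an illustration under assumptions, not aiocs's code: headers_for_url is a hypothetical helper, and matching include patterns with shell-style globbing (where * also crosses / boundaries) is an assumption about pattern semantics.

```python
import os
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def headers_for_url(url, header_specs, allowed_hosts, env=None):
    """Return the secret headers that apply to `url`.

    A header applies when the URL host is in the entry's hosts (or in
    allowed_hosts when hosts is omitted) and, if include is present,
    the URL matches at least one include pattern.
    """
    env = os.environ if env is None else env
    host = urlparse(url).hostname
    headers = {}
    for spec in header_specs:
        hosts = spec.get("hosts") or allowed_hosts
        if host not in hosts:
            continue
        patterns = spec.get("include")
        if patterns and not any(fnmatchcase(url, p) for p in patterns):
            continue
        # A missing environment variable raises KeyError rather than
        # silently sending an empty secret.
        headers[spec["name"]] = env[spec["valueFromEnv"]]
    return headers
```

Resolving the value at request time keeps the raw secret out of both the YAML spec and any stored snapshot metadata.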
Canary checks
Canaries execute the real extraction strategy without creating snapshots. They are intended to catch selector/copy-markdown drift before a full refresh degrades silently.
canary:
  everyHours: 6
  checks:
    - url: https://docs.example.com/start
      expectedTitle: Private Docs Start
      expectedText: Secret market structure docs
      minMarkdownLength: 40
If canary is omitted, aiocs defaults to a lightweight canary against the first startUrl.
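A rough sketch of how one check from the spec above could be evaluated against an extracted page. The helper evaluate_canary is hypothetical, and treating expectedTitle and expectedText as substring matches is an assumption; aiocs may compare them differently.

```python
def evaluate_canary(check, title, markdown):
    """Return a list of failure messages; an empty list means the check passed."""
    failures = []
    expected_title = check.get("expectedTitle")
    if expected_title is not None and expected_title not in title:
        failures.append(f"title does not contain {expected_title!r}")
    expected_text = check.get("expectedText")
    if expected_text is not None and expected_text not in markdown:
        failures.append(f"markdown does not contain {expected_text!r}")
    min_length = check.get("minMarkdownLength", 0)
    if len(markdown) < min_length:
        failures.append(f"markdown length {len(markdown)} is below {min_length}")
    return failures
```

A login wall or a changed selector usually trips the text and length checks together, which is the kind of silent degradation canaries are meant to surface.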
Backups
backup export creates a manifest-backed directory snapshot. The catalog database is exported with SQLite's native backup mechanism so the backup stays consistent even if aiocs is reading or writing the catalog while the export runs.
Backups intentionally include only the canonical ~/.aiocs data/config state. The Qdrant vector index is treated as derived state and is rebuilt from the restored catalog after backup import.
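For readers unfamiliar with SQLite's online backup mechanism, the sketch below shows the same idea using Python's standard sqlite3 module. It is illustrative only: export_catalog and the file paths are hypothetical, and how aiocs invokes the backup internally is not documented here.

```python
import sqlite3


def export_catalog(src_path, dest_path):
    """Copy a live SQLite database with the online backup API.

    Unlike a plain file copy, the backup API produces a consistent
    snapshot even if other connections are using the source database.
    """
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)
    finally:
        dest.close()
        src.close()
```

Derived state such as a vector index does not need this treatment, since it can be rebuilt from the restored catalog.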
JSON command reference
All one-shot commands support --json:
- version
- init
- doctor
- source upsert
- source list
- fetch
- canary
- refresh due
- snapshot list
- diff
- project link
- project unlink
- backup export
- backup import
- embeddings status
- embeddings backfill
- embeddings clear
- embeddings run
- search
- verify coverage
- show
Representative examples:
aiocs --json doctor
aiocs --json init --no-fetch
aiocs --json source list
aiocs --json source upsert sources/hyperliquid.yaml
aiocs --json refresh due hyperliquid
aiocs --json canary hyperliquid
aiocs --json refresh due
aiocs --json diff hyperliquid
aiocs --json embeddings status
aiocs --json embeddings backfill all
aiocs --json embeddings clear hyperliquid
aiocs --json embeddings run
aiocs --json project link /absolute/path/to/project hyperliquid lighter
aiocs --json snapshot list hyperliquid
aiocs --json backup export /absolute/path/to/backup
aiocs --json verify coverage hyperliquid /absolute/path/to/reference.md
For multi-result commands like fetch, refresh due, and search, data contains structured collections rather than line-by-line output:
{
"ok": true,
"command": "search",
"data": {
"query": "maker flow",
"total": 42,
"limit": 20,
"offset": 0,
"hasMore": true,
"modeRequested": "auto",
"modeUsed": "hybrid",
"results": []
}
}
Daemon
aiocs ships a first-class long-running refresh process:
aiocs daemon
The daemon bootstraps source specs from the configured directories, refreshes due sources, sleeps for the configured interval, and repeats. Configured source spec directories are treated as the daemon’s source of truth:
- if a managed source spec changes, the source is made due immediately in the same daemon cycle
- if a managed source spec is removed from disk, the source is removed from the catalog on the next bootstrap
- if AIOCS_SOURCE_SPEC_DIRS is explicitly set but resolves to missing or empty directories, the daemon fails fast instead of silently idling
- due canaries run independently from full fetch schedules so drift is caught earlier than the next full snapshot refresh
- daemon heartbeat state is persisted in the local catalog and surfaced through aiocs doctor
- queued embedding jobs are processed in the same daemon cycle after fetches complete
Environment variables:
- AIOCS_DAEMON_INTERVAL_MINUTES
  - positive integer, defaults to 60
- AIOCS_DAEMON_FETCH_ON_START
  - true by default
  - accepted values: true, false, 1, 0, yes, no, on, off
- AIOCS_SOURCE_SPEC_DIRS
  - comma-separated list of source spec directories
  - defaults to ~/.aiocs/sources, the bundled sources/ path, plus /app/sources inside Docker when present
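The documented value rules for these variables can be sketched as a small parser. This is an illustration, not the daemon's actual code: the function names are hypothetical, case-insensitive matching of boolean values is an assumption, and a return of None for the directory list stands in for "use the documented defaults".

```python
import os

TRUE_VALUES = {"true", "1", "yes", "on"}
FALSE_VALUES = {"false", "0", "no", "off"}


def parse_bool(raw, default):
    """Parse one of the accepted boolean spellings (assumed case-insensitive)."""
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"unsupported boolean value: {raw!r}")


def parse_interval(raw, default=60):
    """Parse a positive integer number of minutes."""
    if raw is None:
        return default
    minutes = int(raw)
    if minutes <= 0:
        raise ValueError("interval must be a positive integer")
    return minutes


def daemon_config(env=None):
    """Collect the three daemon settings from an environment mapping."""
    env = os.environ if env is None else env
    raw_dirs = env.get("AIOCS_SOURCE_SPEC_DIRS")
    dirs = None
    if raw_dirs is not None:
        dirs = [part.strip() for part in raw_dirs.split(",") if part.strip()]
    return {
        "intervalMinutes": parse_interval(env.get("AIOCS_DAEMON_INTERVAL_MINUTES")),
        "fetchOnStart": parse_bool(env.get("AIOCS_DAEMON_FETCH_ON_START"), True),
        "sourceSpecDirs": dirs,  # None means: fall back to the defaults
    }
```

Rejecting unknown values outright matches the fail-fast behavior described for misconfigured source spec directories.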
For local agents, the daemon keeps the shared catalog under ~/.aiocs warm while agents continue to use the normal CLI with --json.
Daemon JSON mode
aiocs --json daemon is intentionally different from one-shot commands. Because it is long-running, it emits one JSON event per line:
aiocs --json daemon
Example event stream:
{"type":"daemon.started","intervalMinutes":60,"fetchOnStart":true,"sourceSpecDirs":["/app/sources"]}
{"type":"daemon.cycle.started","reason":"startup","startedAt":"2026-03-26T00:00:00.000Z"}
{"type":"daemon.cycle.completed","reason":"startup","result":{"canaryDueSourceIds":[],"dueSourceIds":[],"bootstrapped":{"processedSpecCount":5,"sources":[]},"canaried":[],"canaryFailed":[],"refreshed":[],"failed":[],"embedded":[],"embeddingFailed":[]}}
MCP server
aiocs also ships an MCP server binary for tool-native agent integrations:
command -v aiocs-mcp
aiocs-mcp
For repository development only:
pnpm dev:mcp
The MCP server exposes the same shared operations as the CLI without shell parsing:
- version
- doctor
- init
- source_upsert
- source_list
- canary
- fetch
- refresh_due
- snapshot_list
- diff_snapshots
- project_link
- project_unlink
- embeddings_status
- embeddings_backfill
- embeddings_clear
- embeddings_run
- backup_export
- backup_import
- search
- show
- verify_coverage
- batch
Release automation
The repo ships two GitHub Actions workflows:
- ci.yml: validation for lint, tests, build, pack, and Docker smoke coverage
- release.yml: tag-driven stable release flow that validates the tagged package state, publishes to npm, and creates a GitHub release
The release workflow is triggered only by pushed stable tags matching vX.Y.Z and expects an npm publish token in GitHub repository secrets. The release job is retryable: if @bodhi-ventures/aiocs@X.Y.Z already exists on npm or the GitHub release already exists for vX.Y.Z, the workflow skips the completed publication step and finishes the remaining one.
Successful MCP results use an envelope:
{
"ok": true,
"data": {
"name": "@bodhi-ventures/aiocs",
"version": "0.1.1"
}
}
Failed MCP results use the same machine-readable error shape:
{
"ok": false,
"error": {
"code": "CHUNK_NOT_FOUND",
"message": "Chunk 42 not found"
}
}
Docker
The repo ships a long-running Docker service for scheduled refreshes.
Build and start it with:
docker compose up --build -d
The compose file:
- runs aiocs daemon as the container entrypoint
- bind-mounts ${HOME}/.aiocs into /root/.aiocs so the container shares the same local catalog defaults as the host CLI
- bind-mounts ./sources into /app/sources so source spec edits are picked up without rebuilding
- runs a dedicated aiocs-qdrant container for vector search
- points the daemon at host Ollama with AIOCS_OLLAMA_BASE_URL (defaults to http://host.docker.internal:11434 in Compose)
Override cadence with environment variables when starting compose:
AIOCS_DAEMON_INTERVAL_MINUTES=15 docker compose up --build -d
Source spec shape
Each source spec is YAML or JSON and must define:
- id
- label
- startUrls
- allowedHosts
- discovery.include
- discovery.exclude
- discovery.maxPages
- extract
- normalize
- schedule.everyHours
Supported extraction strategies:
- clipboardButton
- selector
- readability
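Before running source upsert, a spec that has already been parsed from YAML or JSON can be checked for the required fields listed above. This is a hedged sketch: missing_fields is a hypothetical helper, the dotted names are read as nested keys, and the check covers only presence, not the value types or the shape of extract and normalize, which this document does not specify.

```python
REQUIRED_FIELDS = [
    "id",
    "label",
    "startUrls",
    "allowedHosts",
    "discovery.include",
    "discovery.exclude",
    "discovery.maxPages",
    "extract",
    "normalize",
    "schedule.everyHours",
]


def missing_fields(spec):
    """Return the required dotted paths that are absent from a parsed spec."""
    missing = []
    for dotted in REQUIRED_FIELDS:
        node = spec
        for key in dotted.split("."):
            if not isinstance(node, dict) or key not in node:
                missing.append(dotted)
                break
            node = node[key]
    return missing
```

An empty result only means the required keys exist; the CLI's own validation remains the authority on whether a spec is accepted.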
Verification
pnpm lint
pnpm test
pnpm build
npm pack --dry-run