@webcrawlerapi/ckb
v0.1.0
ckb — personal knowledge base CLI
A command-line tool for building a searchable knowledge base. Documents are stored in Cloudflare R2 and indexed for keyword, hybrid, and semantic retrieval via Cloudflare AI Search (AutoRAG).
Requirements
- Node.js 18+
- Cloudflare account with:
  - An AI Search (AutoRAG) instance
  - An R2 bucket
  - An API token with AI Search + R2 permissions
- WebCrawlerAPI key (optional, for crawl)
Install
npm i
Setup
ckb auth
Prompts for:
| Field | Where to find it |
|-------|-----------------|
| Cloudflare Account ID | Cloudflare dashboard → right sidebar |
| API Token | My Profile → API Tokens |
| AI Search Instance ID | AI Search dashboard |
| R2 Bucket Name | R2 dashboard |
| R2 Access Key ID | R2 → Manage R2 API Tokens |
| R2 Secret Access Key | R2 → Manage R2 API Tokens |
| WebCrawlerAPI Key | dash.webcrawlerapi.com/access (optional) |
Credentials are saved to ~/.ckb/config.json.
Commands
Typical flow:
ckb add ./notes.md -c personal
ckb crawl -u https://docs.example.com -c docs/example
ckb reindex
ckb search "auth flow"
ckb query "auth flow" --files
ckb get docs/example/page.md
ckb add <file> -c <collection>
Upload a local file to R2 under the given collection prefix.
ckb add ./notes.md -c personal
ckb add ./report.pdf -c work/q1
The R2 key becomes <collection>/<filename>.
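The key rule above can be sketched as a small shell helper, which may be handy for predicting the key to pass to ckb get later. This is an illustration only: the function name r2_key is hypothetical, and it assumes that <filename> means the base name of the local path and that a trailing slash on the collection is ignored. The actual behavior of ckb may differ.

```shell
# Hypothetical helper: predict the R2 key for `ckb add <file> -c <collection>`.
# Assumes <filename> is the base name of the local path (not confirmed by ckb).
r2_key() {
  collection=${1%/}             # drop one trailing slash, if any
  filename=$(basename -- "$2")  # ./notes.md -> notes.md
  printf '%s/%s\n' "$collection" "$filename"
}

r2_key personal ./notes.md      # personal/notes.md
r2_key work/q1 ./report.pdf     # work/q1/report.pdf
```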
ckb crawl -u <url> -c <collection>
Crawl a website, convert pages to markdown, and upload them to R2.
ckb crawl -u https://docs.example.com -c docs/example
ckb crawl -u https://docs.example.com -c docs/example -l 50 --main-content-only
Options:
| Flag | Description |
|------|-------------|
| -u, --url <url> | Starting URL (required) |
| -c, --collection <folder> | Collection name / R2 prefix (required) |
| -l, --items-limit <n> | Max pages to crawl |
| -w, --whitelist-regexp <pattern> | Only crawl URLs matching pattern |
| -b, --blacklist-regexp <pattern> | Skip URLs matching pattern |
| -m, --main-content-only | Extract main content only (skip nav/footer) |
Pages are cached locally at ~/.ckb/cache/<hostname>-<timestamp>/. If a crawl is interrupted, re-running the same command resumes from the saved job ID.
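To see which cached crawls exist for a site, the documented directory layout can be listed with a short helper. This is a sketch under assumptions: the function name ckb_cache_dirs is hypothetical, and because the timestamp format is not documented it is matched with a wildcard.

```shell
# Hypothetical helper: list local cache directories for one hostname.
# Relies only on the documented layout ~/.ckb/cache/<hostname>-<timestamp>/;
# the timestamp format is not documented, so it is matched with a wildcard.
ckb_cache_dirs() {
  for dir in "$HOME/.ckb/cache/$1"-*/; do
    # An unmatched glob stays literal and fails the -d test, so nothing prints.
    [ -d "$dir" ] && printf '%s\n' "${dir%/}"
  done
  return 0
}
```

For example, `ckb_cache_dirs docs.example.com` prints one directory per cached crawl of that host, or nothing if none exist.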
ckb get <key>
Print the full content of a file stored in R2 by object key.
ckb get docs/example/page.md
ckb search "auth flow" --files
Use the key printed by ckb search --files or returned in search JSON as chunk.item.key.
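The two steps above (find a key, then fetch it) can be chained in a small wrapper. This is a minimal sketch, not part of ckb: the function name ckb_first_hit is hypothetical, and it assumes that ckb search --files prints one object key per line on stdout.

```shell
# Hypothetical helper: print the content of the first file matching a query.
# Assumes `ckb search --files` prints one object key per line on stdout.
ckb_first_hit() {
  key=$(ckb search "$1" --files | head -n 1)
  if [ -z "$key" ]; then
    echo "no match for: $1" >&2
    return 1
  fi
  ckb get "$key"
}
```

Usage would look like `ckb_first_hit "auth flow"`. If the --files output turns out to include extra decoration, the key would need to be extracted before it is passed to ckb get.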
ckb search <query>
Keyword search across the knowledge base.
ckb search "how to configure nginx"
ckb search "telegram bot setup" -c docs/n8n
ckb search "auth flow" -n 5
Options:
| Flag | Description |
|------|-------------|
| -c, --collection <folder> | Scope results to a collection |
| -n, --max_num_results <n> | Max results to return |
| --files | Print matching file names only |
| --match-threshold <n> | Min retrieval score threshold (0-1) |
| --debug | Print the curl request before executing |
| --json | Print raw JSON response |
ckb query <query>
Hybrid retrieval with reranking across the knowledge base.
ckb query "how do I configure nginx for websockets"
ckb query "telegram bot setup" -c docs/n8n
ckb query "auth flow" -n 5
Options:
| Flag | Description |
|------|-------------|
| -c, --collection <folder> | Scope results to a collection |
| -n, --max_num_results <n> | Max results to return |
| --files | Print matching file names only |
| --match-threshold <n> | Min retrieval score threshold (0-1) |
| --debug | Print the curl request before executing |
| --json | Print raw JSON response |
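Because ckb search and ckb query accept the same flags, their results for one query can be compared directly. The sketch below lists the keys that both commands return. The function name ckb_common_hits is hypothetical, and it assumes that --files prints one object key per line for both commands.

```shell
# Hypothetical helper: list keys returned by both `ckb search` and `ckb query`.
# Assumes --files prints one object key per line for both commands.
ckb_common_hits() {
  tmp=$(mktemp -d) || return 1
  ckb search "$1" --files | sort -u > "$tmp/keyword"
  ckb query "$1" --files | sort -u > "$tmp/hybrid"
  comm -12 "$tmp/keyword" "$tmp/hybrid"   # keep only lines present in both lists
  rm -rf "$tmp"
}
```

Swapping `comm -12` for `comm -13` would instead show keys that only the hybrid query found.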
ckb reindex
Trigger an AI Search indexing job to pick up newly uploaded documents.
ckb reindex
Indexing runs asynchronously on Cloudflare's side. Run ckb status to check progress.
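Since a reindex picks up all newly uploaded documents, several uploads can share a single indexing job. A minimal sketch of that pattern follows. The function name ckb_add_all is hypothetical, and only the documented ckb add and ckb reindex commands are used.

```shell
# Hypothetical helper: upload several files to one collection, then reindex once.
# Uses only the documented `ckb add` and `ckb reindex` commands.
ckb_add_all() {
  collection=$1
  shift
  for file in "$@"; do
    ckb add "$file" -c "$collection" || return 1   # stop on the first failed upload
  done
  ckb reindex
}
```

For example, `ckb_add_all docs/databases ./postgres.md ./k8s-guide.md` uploads both files and triggers one indexing job, after which ckb status can be used to follow progress.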
ckb status
Show AI Search instance metadata and recent indexing jobs.
ckb status
Collections
Collections are R2 key prefixes (collection/filename). They let you organize documents and scope searches:
# Add documents to different collections
ckb add ./k8s-guide.md -c docs/kubernetes
ckb add ./postgres.md -c docs/databases
# Keyword search within a specific collection
ckb search "connection pooling" -c docs/databases
# Hybrid retrieval with reranking within a specific collection
ckb query "how should I tune connection pooling" -c docs/databases
Config file
~/.ckb/config.json — written by ckb auth, read by all other commands. Edit manually to update credentials without re-running the full auth flow.
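Scripts that wrap ckb may want to confirm that this file exists before calling other commands. The check below is a sketch only. The function name ckb_config_ok is hypothetical, and it relies solely on the documented path, since the JSON field names are not listed here.

```shell
# Hypothetical pre-flight check: confirm the config file written by `ckb auth` exists.
# Only the documented path is assumed; the JSON field names are not documented here.
ckb_config_ok() {
  config="$HOME/.ckb/config.json"
  if [ ! -s "$config" ]; then
    echo "missing or empty $config; run: ckb auth" >&2
    return 1
  fi
}
```

A wrapper script could start with `ckb_config_ok || exit 1` so that it fails early with a clear message.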
