@webcrawlerapi/ckb

v0.1.0

Published

a month ago

A command-line tool for building a searchable knowledge base. Documents are stored in Cloudflare R2 and indexed for keyword, hybrid, and semantic retrieval via Cloudflare AI Search (AutoRAG).


niiotyo

ckb — personal knowledge base CLI

Requirements

- Node.js 18+
- Cloudflare account with:
  - An AI Search (AutoRAG) instance
  - An R2 bucket
  - An API token with AI Search + R2 permissions
- WebCrawlerAPI key (optional, for crawl)

Install

npm i -g @webcrawlerapi/ckb

Setup

ckb auth

How to get your API keys

Prompts for:

| Field | Where to find it |
|-------|-----------------|
| Cloudflare Account ID | Cloudflare dashboard → right sidebar |
| API Token | My Profile → API Tokens |
| AI Search Instance ID | AI Search dashboard |
| R2 Bucket Name | R2 dashboard |
| R2 Access Key ID | R2 → Manage R2 API Tokens |
| R2 Secret Access Key | R2 → Manage R2 API Tokens |
| WebCrawlerAPI Key | dash.webcrawlerapi.com/access (optional) |

Credentials are saved to ~/.ckb/config.json.

Commands

Typical flow:

ckb add ./notes.md -c personal
ckb crawl -u https://docs.example.com -c docs/example
ckb reindex
ckb search "auth flow"
ckb query "auth flow" --files
ckb get docs/example/page.md

`ckb add <file> -c <collection>`

Upload a local file to R2 under the given collection prefix.

ckb add ./notes.md -c personal
ckb add ./report.pdf -c work/q1

The R2 key becomes <collection>/<filename>.
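
The key rule above can be sketched as a small shell helper, which may be handy for predicting the argument to pass to `ckb get` later. `r2_key` is a hypothetical function, not part of ckb, and it assumes that `<filename>` means the base name of the local path with no further normalization.

```shell
# Hypothetical helper (not part of ckb): predict the R2 object key
# that `ckb add <file> -c <collection>` is described as producing.
# Assumption: <filename> is the base name of the local path.
r2_key() {
  # $1 = local file path, $2 = collection
  printf '%s/%s\n' "$2" "$(basename "$1")"
}

# Example: r2_key ./report.pdf work/q1   prints work/q1/report.pdf
```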

`ckb crawl -u <url> -c <collection>`

Crawl a website, convert pages to markdown, and upload them to R2.

ckb crawl -u https://docs.example.com -c docs/example
ckb crawl -u https://docs.example.com -c docs/example -l 50 --main-content-only

Options:

| Flag | Description |
|------|-------------|
| `-u, --url <url>` | Starting URL (required) |
| `-c, --collection <folder>` | Collection name / R2 prefix (required) |
| `-l, --items-limit <n>` | Max pages to crawl |
| `-w, --whitelist-regexp <pattern>` | Only crawl URLs matching pattern |
| `-b, --blacklist-regexp <pattern>` | Skip URLs matching pattern |
| `-m, --main-content-only` | Extract main content only (skip nav/footer) |

Pages are cached locally at ~/.ckb/cache/<hostname>-<timestamp>/. If a crawl is interrupted, re-running the same command resumes from the saved job ID.
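
Since re-running the same command resumes an interrupted crawl, a generic retry wrapper is one way to automate that. The sketch below is not part of ckb: `retry` and `RETRY_DELAY` are hypothetical names, and it assumes `ckb crawl` exits with a non-zero status when it is interrupted.

```shell
# Hypothetical wrapper (not part of ckb): run a command up to N times,
# stopping at the first success.
# The body runs in a subshell, so its variables do not leak to the caller.
retry() (
  max=$1
  shift
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$max" ]; then
      exit 1                      # give up after the last attempt
    fi
    n=$((n + 1))
    sleep "${RETRY_DELAY:-5}"     # seconds between attempts (assumed default)
  done
  exit 0
)

# Usage (same command each time, so an interrupted crawl can resume):
#   retry 3 ckb crawl -u https://docs.example.com -c docs/example
```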

`ckb get <key>`

Print the full content of a file stored in R2 by object key.

ckb get docs/example/page.md
ckb search "auth flow" --files

Use the key printed by ckb search --files or returned in search JSON as chunk.item.key.
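
The two commands can be chained. The sketch below assumes that `ckb search --files` prints one object key per line, which the option description suggests but does not state outright. `get_each_key` is a hypothetical helper name.

```shell
# Hypothetical helper (not part of ckb): read object keys from stdin,
# one per line, and print each file with `ckb get`.
get_each_key() {
  while IFS= read -r key; do
    [ -n "$key" ] || continue     # skip blank lines
    # Redirect stdin so the command cannot consume the remaining keys.
    ckb get "$key" </dev/null
  done
}

# Usage:
#   ckb search "auth flow" --files | get_each_key
```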

`ckb search <query>`

Keyword search across the knowledge base.

ckb search "how to configure nginx"
ckb search "telegram bot setup" -c docs/n8n
ckb search "auth flow" -n 5

Options:

| Flag | Description |
|------|-------------|
| `-c, --collection <folder>` | Scope results to a collection |
| `-n, --max_num_results <n>` | Max results to return |
| `--files` | Print matching file names only |
| `--match-threshold <n>` | Min retrieval score threshold (0-1) |
| `--debug` | Print the curl request before executing |
| `--json` | Print raw JSON response |

`ckb query <query>`

Hybrid retrieval with reranking across the knowledge base.

ckb query "how do I configure nginx for websockets"
ckb query "telegram bot setup" -c docs/n8n
ckb query "auth flow" -n 5

Options: the examples in this README pass `-c <collection>`, `-n <n>`, and `--files` to `ckb query`.

`ckb reindex`

Trigger an AI Search indexing job to pick up newly uploaded documents.

ckb reindex

Indexing runs asynchronously on Cloudflare's side. Run ckb status to check progress.
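
Because uploads are only picked up after a reindex, a batch upload can end with a single `ckb reindex` call. The sketch below is one way to do that. `add_dir` is a hypothetical helper, the restriction to `*.md` files is purely illustrative, and it assumes `ckb add` exits with a non-zero status on failure.

```shell
# Hypothetical helper (not part of ckb): upload every Markdown file in a
# directory to one collection, then trigger a single reindex.
add_dir() {
  # $1 = local directory, $2 = collection
  for f in "$1"/*.md; do
    [ -f "$f" ] || continue          # no matches: the glob stays literal
    ckb add "$f" -c "$2" || return 1 # stop at the first failed upload
  done
  ckb reindex
}

# Usage:
#   add_dir ./notes personal
#   ckb status    # indexing is asynchronous, so check progress afterwards
```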

`ckb status`

Show AI Search instance metadata and recent indexing jobs.

ckb status

Collections

Collections are R2 key prefixes (collection/filename). They let you organize documents and scope searches:

# Add documents to different collections
ckb add ./k8s-guide.md -c docs/kubernetes
ckb add ./postgres.md -c docs/databases

# Keyword search within a specific collection
ckb search "connection pooling" -c docs/databases

# Hybrid retrieval with reranking within a specific collection
ckb query "how should I tune connection pooling" -c docs/databases
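
Since a collection is just a key prefix, membership can be illustrated with a plain prefix test on the object key. This is only a sketch of the naming scheme: `in_collection` is a hypothetical helper, and how AI Search applies the `-c` filter internally is not described here.

```shell
# Hypothetical helper (not part of ckb): succeed when an object key sits
# under the given collection prefix.
in_collection() {
  # $1 = object key, $2 = collection
  case "$1" in
    "$2"/*) return 0 ;;   # key starts with "<collection>/"
    *)      return 1 ;;
  esac
}

# Usage:
#   in_collection docs/databases/postgres.md docs/databases && echo "match"
```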

Config file

~/.ckb/config.json — written by ckb auth, read by all other commands. Edit manually to update credentials without re-running the full auth flow.
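
Before editing the file by hand, it may be worth keeping a backup and restricting its permissions, since it holds API credentials. The sketch below is a suggestion rather than documented ckb behavior. `backup_ckb_config` and the `.bak` suffix are hypothetical choices.

```shell
# Hypothetical helper (not part of ckb): back up the config file and make
# both copies readable by the current user only.
backup_ckb_config() {
  cfg="$HOME/.ckb/config.json"
  if [ ! -f "$cfg" ]; then
    echo "No config at $cfg; run 'ckb auth' first." >&2
    return 1
  fi
  cp -p "$cfg" "$cfg.bak" && chmod 600 "$cfg" "$cfg.bak"
}

# Usage:
#   backup_ckb_config && "${EDITOR:-vi}" ~/.ckb/config.json
```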
