octoflow-cli
v1.0.1
Command-line entrypoint for OctoFlow projects and runtime diagnostics.
Install this package when you want the public octoflow binary; use octoflow-core when you only need the TypeScript runtime.
Install
npm install -D octoflow-cli
# or
pnpm add -D octoflow-cli
yarn add -D octoflow-cli
bun add -D octoflow-cli
Node.js >=20 is required.
Use It When
- You want the octoflow binary for project setup, backend inspection, and diagnostics.
- You want to run octoflow init to scaffold a project or octoflow doctor to check which backends are ready.
- You need a quick one-shot octoflow run "..." without writing any TypeScript.
- You want a coding-agent REPL (octoflow chat) that uses persisted backend, model, host, cache, compaction, and event-log settings.
Use octoflow-core when you need the programmatic TypeScript runtime inside your own app.
Run
From a project that has octoflow-cli installed:
npx octoflow --help
npx octoflow --version
npx octoflow init
npx octoflow doctor
npx octoflow run "Say hello from OctoFlow."
npx octoflow chat
From this monorepo
# launch the chat REPL straight from source (no build needed)
npm run -w octoflow-cli start
# any other subcommand via tsx
npm run -w octoflow-cli start:dev -- doctor
npm run -w octoflow-cli start:dev -- host list
# or build once and run the compiled binary
npm run -w octoflow-cli build
node packages/octoflow-cli/dist/main.js chat
Zero-config quick start (Ollama)
octoflow chat defaults to Ollama on http://localhost:11434 so a fresh install needs no API key — just a local Ollama daemon. The first run looks like:
# 1. install + start Ollama (one-time)
brew install ollama && ollama serve & # macOS; or download from https://ollama.com/download
# 2. pull the default model — small (~2 GB), runs everywhere
ollama pull llama3.2:3b
# 3. chat
npx octoflow chat
If llama3.2:3b isn't pulled but another Ollama model is already on disk, the chat auto-substitutes one on boot and persists the choice to settings. It prefers a coder model if one is available, then a known small model such as qwen2.5 / gemma2 / mistral, then whatever the daemon listed first. A single log line explains what was swapped and how to opt back into the original, so there is no failing first send and no manual /model step.
To use Anthropic / OpenAI / Gemini instead, set the matching env var (ANTHROPIC_API_KEY, OPENAI_API_KEY, GEMINI_API_KEY) and pick the provider's model from /model; the chat restarts in-place against the new backend.
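The substitution order described above can be sketched as a small pure function. This is only an illustration: `pickFallbackModel`, the `KNOWN_SMALL` list, and the "name contains coder" test are assumptions made for the example, not OctoFlow's actual implementation.

```typescript
// Hypothetical sketch: names and matching rules are assumptions.
const KNOWN_SMALL = ["qwen2.5", "gemma2", "mistral"];

function pickFallbackModel(
  installed: string[],
  preferred: string,
): string | undefined {
  // 1. Keep the preferred model when it is already pulled.
  if (installed.includes(preferred)) return preferred;
  // 2. Otherwise prefer a coder model.
  const coder = installed.find((m) => m.toLowerCase().includes("coder"));
  if (coder !== undefined) return coder;
  // 3. Then a known small model family.
  const small = installed.find((m) =>
    KNOWN_SMALL.some((prefix) => m.startsWith(prefix)),
  );
  if (small !== undefined) return small;
  // 4. Finally whatever the daemon listed first (undefined if nothing is installed).
  return installed[0];
}
```

Keeping the choice in a pure function like this makes the priority order easy to unit-test without a running Ollama daemon.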
Interactive coding agent (octoflow chat)
A Claude-Code-style REPL rendered with React (via Ink) — built around what developers actually want from an agentic CLI:
- / command palette: type / and a fuzzy-filtered overlay shows every built-in slash command, every loaded SKILL.md, every MCP tool (as server/tool), and every model the runtime supports, labelled backend / model (e.g. openai-api / gpt-4o, anthropic-api / claude-sonnet-4-6, plus everything pulled locally on ollama). Pick with ↑/↓ + Enter; Esc closes. Skills/MCP picks seed your input with a hint (Use the `octocode-engineer` skill for this task.) you can edit before sending; **model picks flip both the active backend and the active model in one step**, persist to ~/.octoflow/cli/settings.json, and the chat restarts in-place against the new agent, with no shell relaunch needed.
- @ file picker: type @ (at the start of input or after a space) and a fuzzy-filtered overlay shows every file in the workspace, relative to cwd. The list comes from git ls-files when in a repo (so .gitignore is honored automatically) and falls back to a recursive walk that skips node_modules, .git, dist, build, coverage, .next, .turbo, and friends. Pick a file and @path/to/file.ts is inserted into your input; keep typing your message after it.
- Provider · model headline: the banner now leads with provider · model[ @ host] so the active route is obvious at a glance, plus a breakdown of tools · mcp · skills · cache · log.
- Dynamic system prompt: built fresh on every turn from a Claude-Code-grade base + the live workspace (cwd, git branch), the active model, and the actual tool / MCP / skill roster, so the agent can't hallucinate a tool that isn't loaded, and a /model switch is reflected on the very next message.
- Streaming reply: agent text appears as it's generated, not after the turn ends.
- Esc cancels the current turn: the in-flight task is killed via session.cancelTask; the process keeps running so you don't lose state.
- Multi-line input: end a line with \ and press Enter to insert a newline (paste a stack trace, dictate a multi-paragraph spec, then send).
- Up/Down history: recall previous prompts; ref-backed, so no log re-render.
- Compaction-aware queue: type ahead during busy/compaction; messages are queued FIFO and replayed on idle.
- Live activity: tool calls, cache hits, compaction start/end, MCP invocations are streamed into the log with their own colored badges.
- Boot-time backend health check: on startup the chat probes every configured backend via agent.inspectBackends(). If your active backend isn't ready (missing API key, unreachable host, …) you see a loud red banner in the log telling you exactly why and which alternative is ready, with a hint to type /model and switch, so there is no more cryptic "no backend available" on first send.
- Live metrics footer: tasks, tool calls, cache hit rate, cached/written tokens, compactions, errors, elapsed, and a ctx≈Ntok indicator showing the running token cost of the next turn (system prompt + persisted history). Type /context for a breakdown or /clear to drop the whole thread back to the system-prompt baseline.
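The multi-line input rule from the list above (a trailing \ continues the message, a plain Enter sends it) can be sketched as a tiny buffer. `InputBuffer` and `submitLine` are hypothetical names for illustration; the real chat handles keystrokes through Ink, and its internals may differ.

```typescript
// Hypothetical sketch: InputBuffer is an illustrative name, not an OctoFlow API.
class InputBuffer {
  private lines: string[] = [];

  // Feed the text present when Enter is pressed. Returns the full message
  // when the line does not end with "\", or undefined while input continues.
  submitLine(line: string): string | undefined {
    if (line.endsWith("\\")) {
      // Drop the trailing backslash and wait for the next line.
      this.lines.push(line.slice(0, -1));
      return undefined;
    }
    const message = this.lines.concat(line).join("\n");
    this.lines = [];
    return message;
  }
}
```

For example, submitting `first\` and then `second` yields one message with a newline between the two parts.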
Performance is intentional: the log view, footer, banner, and input pane are all React.memo'd. The 80 ms spinner frame and the 250 ms metrics tick live inside their consumers, so they don't invalidate the 500-entry log on every paint.
Workspace tools (read/edit/write/list/search/shell), MCP, SKILL.md bundles, prompt caching, and context compaction are wired automatically from ~/.octoflow/settings.json.
# launch with the persisted active profile
npx octoflow chat
# use a registered custom host for this session
npx octoflow chat --host groq-llama
Inside the REPL:
| Command | Effect |
| ---------- | --------------------------------------- |
| / | Open the command palette (commands · skills · MCP tools · models) |
| @ | Open the file picker — fuzzy-filter workspace files and insert @path |
| /help | List slash commands |
| /metrics | Print token + cache snapshot |
| /backends | Show backend readiness |
| /tools | List actions registered on the agent |
| /skills | List SKILL.md bundles in scope |
| /model | Open the model picker — lists every model from every supported backend (curated from inspectBackendCatalog()), plus locally pulled Ollama models. Picking switches backend + model in one step and restarts the chat in-place. |
| /prompt | Print the dynamic system prompt and its size (chars + approx tokens) |
| /context | Print the current context cost (system + history token estimate) |
| /clear | Drop the entire conversation history and the visible log — starts a fresh thread (new contextId); the model sees only the system prompt on the next turn |
| /exit | Quit cleanly |
| Up / Down | Recall previous prompt (or move palette selection) |
| Enter | Send (or pick palette item, or queue if agent is busy) |
| \ + Enter | Continue input on a new line |
| Esc (idle) | Clear current input (or close palette) |
| Esc (busy) | Cancel current turn — or drop the pending queue first |
| Ctrl+C | Quit immediately |
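As a rough illustration of what /context and /prompt report, the sketch below estimates context cost as system prompt plus history. The 4-characters-per-token ratio and the function names are assumptions chosen for the example; OctoFlow's real estimator is not documented here and may work differently.

```typescript
// Hypothetical sketch: the chars-per-token ratio is a rough heuristic and an
// assumption, not OctoFlow's actual tokenizer.
const CHARS_PER_TOKEN = 4;

function approxTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

interface ContextCost {
  system: number;
  history: number;
  total: number;
}

// Estimate the cost of the next turn: system prompt plus persisted history.
function contextCost(systemPrompt: string, pastMessages: string[]): ContextCost {
  const system = approxTokens(systemPrompt);
  let history = 0;
  for (const message of pastMessages) {
    history += approxTokens(message);
  }
  return { system, history, total: system + history };
}
```

Under this model, /clear corresponds to calling `contextCost(systemPrompt, [])`, where only the system prompt contributes.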
Adding a custom model
Custom models are added once with octoflow host add (so secrets stay out of the chat transcript / event log) and then picked from the /model palette inside the chat. Example flow:
# one-time: register the host
npx octoflow host add \
--name groq-llama \
--backend openai-api \
--host https://api.groq.com/openai/v1 \
--model llama-3.3-70b-versatile \
--api-key-env GROQ_API_KEY
# launch chat using that host
npx octoflow chat --host groq-llama
# inside chat: type "/" → pick "/model" → pick a model → switch in-place
For Ollama, installed models are auto-discovered from /api/tags and shown in the palette without any extra registration.
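Ollama's GET /api/tags endpoint returns a JSON body with a `models` array whose entries carry a `name`. A minimal sketch of turning that body into palette labels follows; the helper name and the `ollama / <model>` label format are assumptions based on the backend / model labelling described earlier, not OctoFlow's code.

```typescript
// Subset of the Ollama /api/tags response used by this sketch.
interface OllamaTagsResponse {
  models?: { name: string }[];
}

// Hypothetical helper: maps installed Ollama models to palette labels.
function ollamaPaletteEntries(body: OllamaTagsResponse): string[] {
  return (body.models ?? []).map((model) => `ollama / ${model.name}`);
}
```

The body would typically come from a request to `http://localhost:11434/api/tags`; the parsing is kept separate here so it can be tested without a running daemon.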
Compaction-aware message queue
When the agent is mid-turn or the runtime starts compacting context (token budget exceeded), the input pane switches into a queueing mode:
- task:compaction:start flips the prompt to "compacting context… (type ahead — messages are queued)" and prints a log line with the trigger, strategy, message count, and pre-compaction tokens.
- task:compaction:end prints a follow-up showing before → after token counts and how many messages were removed (plus + summary when the strategy summarized).
- Anything you type while busy or compacting is enqueued (FIFO). The footer shows queued: N. When the agent returns to idle, queued messages are dequeued and sent automatically, one per turn, so you can type a follow-up the moment a thought hits, even mid-response.
- Press Esc during a busy/compacting state to clear the pending queue without exiting.
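The queueing behaviour above can be sketched as a small state machine: send immediately when idle, enqueue while busy, replay one message per idle transition, and clear on Esc. `MessageQueue` and its method names are hypothetical and only illustrate the FIFO rule; they are not OctoFlow's API.

```typescript
// Hypothetical sketch: class and method names are illustrative assumptions.
class MessageQueue {
  private pending: string[] = [];
  private busy = false;
  private send: (message: string) => void;

  constructor(send: (message: string) => void) {
    this.send = send;
  }

  // Called on Enter: send right away when idle, otherwise enqueue (FIFO).
  submit(message: string): void {
    if (this.busy) {
      this.pending.push(message);
      return;
    }
    this.busy = true;
    this.send(message);
  }

  // Called when a turn or a compaction starts.
  markBusy(): void {
    this.busy = true;
  }

  // Called when the agent returns to idle: replay one queued message per turn.
  markIdle(): void {
    this.busy = false;
    const next = this.pending.shift();
    if (next !== undefined) this.submit(next);
  }

  // Called on Esc while busy or compacting: drop the queue without exiting.
  clear(): void {
    this.pending = [];
  }

  // Value behind the footer's "queued: N" indicator.
  get queued(): number {
    return this.pending.length;
  }
}
```

Replaying only one message per idle transition keeps each queued message in its own turn, which matches the "one per turn" behaviour described above.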
Requires Node >=22 and a real TTY (Ink uses raw stdin for keystrokes).
Run on Ollama (octoflow ollama)
Ollama runs locally and needs no API key. The ollama subcommand sets the active profile to a local Ollama server in one step.
# 1) make sure Ollama is running
ollama serve
# 2) list installed models
npx octoflow ollama models
# 3) one-shot setup — picks the first installed model (or pass --model)
npx octoflow ollama setup
npx octoflow ollama setup --model qwen2.5:7b
# 4) or set explicitly
npx octoflow ollama use --model qwen2.5:7b --host http://localhost:11434
# 5) launch the chat REPL on Ollama
npx octoflow chat
OLLAMA_HOST is honored as a fallback for --host. Default base URL is http://localhost:11434.
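The host precedence stated above (the --host flag, then OLLAMA_HOST, then the default URL) can be written as a one-line resolver. The function name and the way the environment is passed in are assumptions for illustration only.

```typescript
const DEFAULT_OLLAMA_URL = "http://localhost:11434";

// Hypothetical helper: --host flag first, then OLLAMA_HOST, then the default.
function resolveOllamaHost(
  flagHost: string | undefined,
  env: Record<string, string | undefined>,
): string {
  return flagHost || env["OLLAMA_HOST"] || DEFAULT_OLLAMA_URL;
}
```

In a Node.js process the second argument would usually be `process.env`; passing it explicitly keeps the function easy to test.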
Settings (octoflow config)
Persistent settings live at ~/.octoflow/settings.json and cover: active profile (backend + model), prompt cache, context compaction, JSONL event-log path, custom hosts, and MCP servers.
# view everything (api keys are masked)
npx octoflow config list
# read / write a single value
npx octoflow config get active.backend
npx octoflow config set active.backend anthropic-api
npx octoflow config set active.model claude-sonnet-4-6
npx octoflow config set cache.ttl 1h
npx octoflow config set compaction.maxContext 64000
npx octoflow config set compaction.strategy summarize
npx octoflow config set eventLog.path ~/.octoflow/events.jsonl
# reset to defaults (keeps custom hosts unless --all is passed)
npx octoflow config reset
Allowed keys: active.backend, active.model, active.host, active.apiKey, active.apiKeyEnv, cache.enabled, cache.ttl, compaction.maxContext, compaction.maxSectionTokens, compaction.strategy, compaction.keepRecent, eventLog.enabled, eventLog.path.
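A minimal sketch of how a `config set` call might be validated against the allowed keys before writing a dotted path into the settings object. `setConfigValue`, `SettingsTree`, and storing every value as a string are assumptions for the example; the real CLI may coerce types or validate values as well.

```typescript
// Keys copied from the "Allowed keys" list in this README.
const ALLOWED_KEYS = [
  "active.backend", "active.model", "active.host", "active.apiKey",
  "active.apiKeyEnv", "cache.enabled", "cache.ttl", "compaction.maxContext",
  "compaction.maxSectionTokens", "compaction.strategy", "compaction.keepRecent",
  "eventLog.enabled", "eventLog.path",
];

// Hypothetical in-memory shape: section -> field -> raw string value.
type SettingsTree = { [section: string]: { [field: string]: string } };

// Hypothetical helper: rejects unknown keys, then writes section.field.
function setConfigValue(settings: SettingsTree, key: string, value: string): void {
  if (!ALLOWED_KEYS.includes(key)) {
    throw new Error(`Unknown config key: ${key}`);
  }
  const [section, field] = key.split(".");
  settings[section] = settings[section] || {};
  settings[section][field] = value;
}
```

Rejecting unknown keys up front means a typo such as `cache.tll` fails loudly instead of silently adding a dead setting.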
Custom hosts (octoflow host)
Register OpenAI-compatible hosts (Groq, OpenRouter, Together, Fireworks, …), Ollama instances, or any other backend with a custom URL or key. Prefer --api-key-env over --api-key so secrets stay out of the on-disk file.
# add a host
npx octoflow host add \
--name groq-llama \
--backend openai-api \
--host https://api.groq.com/openai/v1 \
--model llama-3.3-70b-versatile \
--api-key-env GROQ_API_KEY
# list / use / remove
npx octoflow host list
npx octoflow host use groq-llama
npx octoflow host remove groq-llama
Caching, compaction, and the event log
octoflow chat automatically wires:
- Prompt caching: configured via cache: { enabled, ttl }, where ttl is 5m or 1h. Anthropic, OpenAI, Gemini, and supported Vercel-AI providers report cache hits via bridge:cache:hit, which the metrics module aggregates into cached-tokens.
- Context compaction: maxContext, maxSectionTokens, and contextCompaction.strategy keep the window healthy across long sessions; agent:compaction events are surfaced in the activity stream.
- Event log: every interesting bridge topic is appended as one JSONL line per event to eventLog.path for postmortem analysis.
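The one-JSON-object-per-line format of the event log can be illustrated with a pair of helpers. The `LoggedEvent` fields (`topic`, `at`, `data`) are hypothetical; the actual schema written to eventLog.path is not specified in this README.

```typescript
// Hypothetical event shape: field names are assumptions for illustration.
interface LoggedEvent {
  topic: string;
  at: string; // ISO-8601 timestamp
  data?: unknown;
}

// One JSON object per line, newline-terminated, so the file can be appended to.
function toJsonlLine(entry: LoggedEvent): string {
  return JSON.stringify(entry) + "\n";
}

// Reads a JSONL file body back into events, skipping blank lines.
function parseJsonl(body: string): LoggedEvent[] {
  return body
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as LoggedEvent);
}
```

Because every line is a complete JSON document, a postmortem script can stream the file line by line and filter by topic without loading the whole log.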
Package-manager equivalents:
pnpm exec octoflow --help
yarn octoflow --help
bunx octoflow --help
Package Boundary
octoflow-cli owns the public octoflow binary and CLI identity. Command behavior is implemented by octoflow-core/cli, so the runtime package remains the single source for command registration, backend inspection, project scaffolding, sessions, tasks, and diagnostics.
The octoflow-core package also exposes an octoflow-core binary for direct runtime-package diagnostics, but user-facing docs should prefer octoflow from this package.
Learn More
- ../../docs/configuration.md - config files, profiles, and env vars used by the CLI.
- ../../docs/backend-integration.md - backend setup for octoflow doctor and octoflow run.
- ../octoflow-core - runtime the CLI wraps.
Validate
npm run -w octoflow-cli lint
npm run -w octoflow-cli typecheck
npm run -w octoflow-cli test
Status
Beta. The octoflow binary and core commands (init, doctor, run, --help, --version) are stable. Additional subcommands are preview surfaces. Pin versions before depending on it in production.
