octoflow-cli

v1.0.1

Published

2 months ago

Command-line entrypoint for OctoFlow projects and runtime diagnostics.

bgauryy

agent ai cli octoflow

octoflow-cli

Command-line entrypoint for OctoFlow projects and runtime diagnostics. Install this package when you want the public octoflow binary; use octoflow-core when you only need the TypeScript runtime.

Install

npm install -D octoflow-cli
# or
pnpm add -D octoflow-cli
yarn add -D octoflow-cli
bun add -D octoflow-cli

Node.js >=20 is required.

Use It When

You want the octoflow binary for project setup, backend inspection, and diagnostics.
You want to run octoflow init to scaffold a project or octoflow doctor to check which backends are ready.
You need a quick one-shot octoflow run "..." without writing any TypeScript.
You want a coding-agent REPL (octoflow chat) that uses persisted backend, model, host, cache, compaction, and event-log settings.

Use octoflow-core when you need the programmatic TypeScript runtime inside your own app.

Run

From a project that has octoflow-cli installed:

npx octoflow --help
npx octoflow --version
npx octoflow init
npx octoflow doctor
npx octoflow run "Say hello from OctoFlow."
npx octoflow chat

From this monorepo

# launch the chat REPL straight from source (no build needed)
npm run -w octoflow-cli start

# any other subcommand via tsx
npm run -w octoflow-cli start:dev -- doctor
npm run -w octoflow-cli start:dev -- host list

# or build once and run the compiled binary
npm run -w octoflow-cli build
node packages/octoflow-cli/dist/main.js chat

Zero-config quick start (Ollama)

octoflow chat defaults to Ollama on http://localhost:11434 so a fresh install needs no API key — just a local Ollama daemon. The first run looks like:

# 1. install + start Ollama (one-time)
brew install ollama && ollama serve &      # macOS; or download from https://ollama.com/download

# 2. pull the default model — small (~2 GB), runs everywhere
ollama pull llama3.2:3b

# 3. chat
npx octoflow chat

If llama3.2:3b isn't pulled but you already have another Ollama model on disk, the chat auto-substitutes one for you on boot (a coder model first if available, otherwise a known small model like qwen2.5 / gemma2 / mistral, otherwise whatever the daemon listed first) and persists the choice to settings. You see a single line in the log explaining what was swapped and how to opt back into the original — no failing first send, no manual /model step. To use Anthropic / OpenAI / Gemini instead, set the matching env var (ANTHROPIC_API_KEY, OPENAI_API_KEY, GEMINI_API_KEY) and pick the provider's model from /model — the chat restarts in-place against the new backend.
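
The substitution order described above can be sketched as a small pure function. This is an illustration only, not octoflow-cli's implementation: the function name, the "coder" substring test, and the prefix match on model families are all assumptions.

```typescript
// Hypothetical helper: names and matching rules are illustrative, not octoflow-cli's API.
const KNOWN_SMALL_MODELS = ["qwen2.5", "gemma2", "mistral"];

function pickFallbackModel(wanted: string, installed: string[]): string | undefined {
  // The requested model is already on disk: nothing to substitute.
  if (installed.indexOf(wanted) !== -1) return wanted;

  // 1. Prefer a coder model (assumed here: any name containing "coder").
  const coder = installed.find((name) => name.toLowerCase().indexOf("coder") !== -1);
  if (coder !== undefined) return coder;

  // 2. Otherwise a known small model family (assumed here: prefix match).
  const small = installed.find((name) =>
    KNOWN_SMALL_MODELS.some((family) => name.indexOf(family) === 0),
  );
  if (small !== undefined) return small;

  // 3. Otherwise whatever the daemon listed first (undefined when nothing is installed).
  return installed.length > 0 ? installed[0] : undefined;
}
```

With `["mistral:7b", "qwen2.5-coder:7b"]` installed, this sketch picks the coder model even though `mistral:7b` is listed first, because step 1 runs before step 2.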

Interactive coding agent (`octoflow chat`)

A Claude-Code-style REPL rendered with React (via Ink) — built around what developers actually want from an agentic CLI:

/ command palette — type / and a fuzzy-filtered overlay shows every built-in slash command, every loaded SKILL.md, every MCP tool (as server/tool), and every model the runtime supports — labelled backend / model (e.g. openai-api / gpt-4o, anthropic-api / claude-sonnet-4-6, plus everything pulled locally on ollama). Pick with ↑/↓ + Enter; Esc closes. Skills/MCP picks seed your input with a hint (Use the `octocode-engineer` skill for this task.) you can edit before sending; model picks flip both the active backend and the active model in one step, persist to ~/.octoflow/cli/settings.json, and the chat restarts in-place against the new agent — no shell relaunch needed.
@ file picker — type @ (at the start of input or after a space) and a fuzzy-filtered overlay shows every file in the workspace, relative to cwd. The list comes from git ls-files when in a repo (so .gitignore is honored automatically) and falls back to a recursive walk that skips node_modules, .git, dist, build, coverage, .next, .turbo, and friends. Pick a file and @path/to/file.ts is inserted into your input — keep typing your message after it.
Provider · model headline — the banner now leads with provider · model[ @ host] so the active route is obvious at a glance, plus a breakdown of tools · mcp · skills · cache · log.
Dynamic system prompt — built fresh on every turn from a Claude-Code-grade base + the live workspace (cwd, git branch), the active model, and the actual tool / MCP / skill roster — so the agent can't hallucinate a tool that isn't loaded, and a /model switch is reflected on the very next message.
Streaming reply — agent text appears as it's generated, not after the turn ends.
Esc cancels the current turn — the in-flight task is killed via session.cancelTask; the process keeps running so you don't lose state.
Multi-line input — end a line with \ and press Enter to insert a newline (paste a stack trace, dictate a multi-paragraph spec, then send).
Up/Down history — recall previous prompts. The history is ref-backed, so stepping through it does not re-render the log.
Compaction-aware queue — type ahead during busy/compaction; messages are queued FIFO and replayed on idle.
Live activity — tool calls, cache hits, compaction start/end, MCP invocations are streamed into the log with their own colored badges.
Boot-time backend health check — on startup the chat probes every configured backend via agent.inspectBackends(). If your active backend isn't ready (missing API key, unreachable host, …) you see a loud red banner in the log telling you exactly why and which alternative is ready, with a hint to type /model and switch — no more cryptic "no backend available" on first send.
Live metrics footer — tasks, tool calls, cache hit rate, cached/written tokens, compactions, errors, elapsed, and a ctx≈Ntok indicator showing the running token cost of the next turn (system prompt + persisted history). Type /context for a breakdown or /clear to drop the whole thread back to the system-prompt baseline.
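
As a rough illustration of the @ file picker's fallback path, the sketch below filters a list of workspace-relative paths by the skipped directory names and then applies a fuzzy query. The subsequence matcher and all function names are assumptions; the source does not say which fuzzy algorithm the picker uses, and the "and friends" entries are not enumerated here.

```typescript
// Directory names the fallback walk skips, taken from the list above.
const SKIPPED_DIRS = ["node_modules", ".git", "dist", "build", "coverage", ".next", ".turbo"];

// True when any directory segment (everything except the file name) is skipped.
function isSkipped(relativePath: string): boolean {
  const segments = relativePath.split("/");
  const dirs = segments.slice(0, segments.length - 1);
  return dirs.some((segment) => SKIPPED_DIRS.indexOf(segment) !== -1);
}

// Assumed matcher: case-insensitive subsequence match (every query char appears in order).
function fuzzyMatch(query: string, candidate: string): boolean {
  const q = query.toLowerCase();
  const c = candidate.toLowerCase();
  let matched = 0;
  for (let i = 0; i < c.length && matched < q.length; i++) {
    if (c.charAt(i) === q.charAt(matched)) matched++;
  }
  return matched === q.length;
}

// Hypothetical entry point: returns the paths the overlay would show for a query.
function pickCandidates(files: string[], query: string): string[] {
  return files.filter((file) => !isSkipped(file) && fuzzyMatch(query, file));
}
```

An empty query matches every non-skipped file, which mirrors an overlay that starts with the full list and narrows as you type.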

Performance is intentional: the log view, footer, banner, and input pane are all React.memo'd. The 80 ms spinner frame and the 250 ms metrics tick live inside their consumers, so they don't invalidate the 500-entry log on every paint.

Workspace tools (read/edit/write/list/search/shell), MCP, SKILL.md bundles, prompt caching, and context compaction are wired automatically from ~/.octoflow/settings.json.

# launch with the persisted active profile
npx octoflow chat

# use a registered custom host for this session
npx octoflow chat --host groq-llama

Inside the REPL:

| Command | Effect |
| --- | --- |
| / | Open the command palette (commands · skills · MCP tools · models) |
| @ | Open the file picker — fuzzy-filter workspace files and insert @path |
| /help | List slash commands |
| /metrics | Print token + cache snapshot |
| /backends | Show backend readiness |
| /tools | List actions registered on the agent |
| /skills | List SKILL.md bundles in scope |
| /model | Open the model picker — lists every model from every supported backend (curated from inspectBackendCatalog()), plus locally pulled Ollama models. Picking switches backend + model in one step and restarts the chat in-place. |
| /prompt | Print the dynamic system prompt and its size (chars + approx tokens) |
| /context | Print the current context cost (system + history token estimate) |
| /clear | Drop the entire conversation history and the visible log — starts a fresh thread (new contextId); the model sees only the system prompt on the next turn |
| /exit | Quit cleanly |
| Up / Down | Recall previous prompt (or move palette selection) |
| Enter | Send (or pick palette item, or queue if agent is busy) |
| \ + Enter | Continue input on a new line |
| Esc (idle) | Clear current input (or close palette) |
| Esc (busy) | Cancel current turn — or drop the pending queue first |
| Ctrl+C | Quit immediately |

Adding a custom model

Custom models are added once with octoflow host add (so secrets stay out of the chat transcript / event log) and then picked from the /model palette inside the chat. Example flow:

# one-time: register the host
npx octoflow host add \
  --name groq-llama \
  --backend openai-api \
  --host https://api.groq.com/openai/v1 \
  --model llama-3.3-70b-versatile \
  --api-key-env GROQ_API_KEY

# launch chat using that host
npx octoflow chat --host groq-llama

# inside chat: type "/" → pick "/model" → pick a model → switch in-place

For Ollama, installed models are auto-discovered from /api/tags and shown in the palette without any extra registration.
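
To make the discovery step concrete, here is a minimal sketch of reading model names from Ollama's /api/tags endpoint. The helper names are hypothetical and this is not how octoflow-cli is known to do it; the only parts taken from Ollama itself are the endpoint path and the `models[].name` shape of its response.

```typescript
// Subset of the /api/tags response this sketch relies on.
interface OllamaTagsResponse {
  models?: Array<{ name: string }>;
}

// Pure step: extract model names from a parsed response body.
function modelNamesFromTags(body: OllamaTagsResponse): string[] {
  return (body.models ?? []).map((model) => model.name);
}

// Hypothetical wrapper: fetch the tags from a running daemon (global fetch, Node 18+).
async function listOllamaModels(baseUrl: string = "http://localhost:11434"): Promise<string[]> {
  const res = await fetch(`${baseUrl}/api/tags`);
  if (!res.ok) throw new Error(`Ollama responded with HTTP ${res.status}`);
  return modelNamesFromTags((await res.json()) as OllamaTagsResponse);
}
```

Keeping the parsing separate from the network call makes the name extraction easy to check without a daemon running.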

Compaction-aware message queue

When the agent is mid-turn or the runtime starts compacting context (token budget exceeded), the input pane switches into a queueing mode:

task:compaction:start flips the prompt to "compacting context… (type ahead — messages are queued)" and prints a log line with the trigger, strategy, message count, and pre-compaction tokens.
task:compaction:end prints a follow-up showing before → after token counts and how many messages were removed (plus + summary when the strategy summarized).
Anything you type while busy or compacting is enqueued (FIFO). The footer shows queued: N. When the agent returns to idle, queued messages are dequeued and sent automatically, one per turn — so you can type a follow-up the moment a thought hits, even mid-response.
Press Esc during a busy/compacting state to clear the pending queue without exiting.
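
The queueing behaviour above amounts to a plain FIFO with three operations. The class below is a sketch under that reading; the class, method, and state names are hypothetical and do not come from octoflow-core.

```typescript
type AgentState = "idle" | "busy" | "compacting";

// Hypothetical queue: holds messages typed while the agent cannot accept them.
class PendingQueue {
  private items: string[] = [];

  // Value the footer would display as "queued: N".
  get size(): number {
    return this.items.length;
  }

  // Returns the message to send immediately, or undefined if it was enqueued instead.
  submit(message: string, state: AgentState): string | undefined {
    if (state === "idle" && this.items.length === 0) return message;
    this.items.push(message);
    return undefined;
  }

  // Called when the agent returns to idle: releases the oldest message, one per turn.
  next(): string | undefined {
    return this.items.shift();
  }

  // Esc during a busy/compacting state: drop everything that is pending.
  clear(): void {
    this.items.length = 0;
  }
}
```

Releasing one message per idle transition, rather than flushing the whole queue, keeps each queued prompt in its own turn, as the text above describes.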

octoflow chat requires Node >=22 and a real TTY (Ink uses raw stdin for keystrokes).

Run on Ollama (`octoflow ollama`)

Ollama runs locally and needs no API key. The ollama subcommand sets the active profile to a local Ollama server in one step.

# 1) make sure Ollama is running
ollama serve

# 2) list installed models
npx octoflow ollama models

# 3) one-shot setup — picks the first installed model (or pass --model)
npx octoflow ollama setup
npx octoflow ollama setup --model qwen2.5:7b

# 4) or set explicitly
npx octoflow ollama use --model qwen2.5:7b --host http://localhost:11434

# 5) launch the chat REPL on Ollama
npx octoflow chat

OLLAMA_HOST is honored as a fallback for --host. Default base URL is http://localhost:11434.
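
The precedence stated above (explicit --host, then OLLAMA_HOST, then the default) can be written as a one-line resolver. The function name is hypothetical, and the environment is passed in as a plain object so the sketch stays self-contained.

```typescript
const DEFAULT_OLLAMA_URL = "http://localhost:11434";

// Hypothetical resolver: flag first, then OLLAMA_HOST, then the default base URL.
// Empty strings are treated as "not set".
function resolveOllamaHost(
  flagHost: string | undefined,
  env: Record<string, string | undefined>,
): string {
  return flagHost || env["OLLAMA_HOST"] || DEFAULT_OLLAMA_URL;
}
```

In a real process you would pass `process.env` as the second argument.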

Settings (`octoflow config`)

Persistent settings live at ~/.octoflow/settings.json and cover: active profile (backend + model), prompt cache, context compaction, JSONL event-log path, custom hosts, and MCP servers.

# view everything (api keys are masked)
npx octoflow config list

# read / write a single value
npx octoflow config get active.backend
npx octoflow config set active.backend anthropic-api
npx octoflow config set active.model claude-sonnet-4-6
npx octoflow config set cache.ttl 1h
npx octoflow config set compaction.maxContext 64000
npx octoflow config set compaction.strategy summarize
npx octoflow config set eventLog.path ~/.octoflow/events.jsonl

# reset to defaults (keeps custom hosts unless --all is passed)
npx octoflow config reset

Allowed keys: active.backend, active.model, active.host, active.apiKey, active.apiKeyEnv, cache.enabled, cache.ttl, compaction.maxContext, compaction.maxSectionTokens, compaction.strategy, compaction.keepRecent, eventLog.enabled, eventLog.path.
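
One way to picture how a dotted key maps onto the settings file is the sketch below, which checks a key against the allowed list and writes it into a nested object. The two-level `section.field` layout and the function names are assumptions for illustration; the actual on-disk schema is not documented here.

```typescript
// Allowed keys, copied from the list above.
const ALLOWED_KEYS = [
  "active.backend", "active.model", "active.host", "active.apiKey", "active.apiKeyEnv",
  "cache.enabled", "cache.ttl",
  "compaction.maxContext", "compaction.maxSectionTokens", "compaction.strategy", "compaction.keepRecent",
  "eventLog.enabled", "eventLog.path",
] as const;

type SettingKey = (typeof ALLOWED_KEYS)[number];

// Type guard: narrows an arbitrary string to one of the allowed keys.
function isAllowedKey(key: string): key is SettingKey {
  return (ALLOWED_KEYS as readonly string[]).indexOf(key) !== -1;
}

// Hypothetical writer: assumes "section.field" maps to settings[section][field].
function setByPath(settings: Record<string, unknown>, key: SettingKey, value: unknown): void {
  const dot = key.indexOf(".");
  const section = key.slice(0, dot);
  const field = key.slice(dot + 1);
  const current = (settings[section] ?? {}) as Record<string, unknown>;
  settings[section] = { ...current, [field]: value };
}
```

Rejecting unknown keys before writing is what lets a command like `config set` fail fast on a typo instead of silently adding a field nothing reads.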

Custom hosts (`octoflow host`)

Register OpenAI-compatible hosts (Groq, OpenRouter, Together, Fireworks, …), Ollama instances, or any other backend with a custom URL or key. Prefer --api-key-env over --api-key so secrets stay out of the on-disk file.

# add a host
npx octoflow host add \
  --name groq-llama \
  --backend openai-api \
  --host https://api.groq.com/openai/v1 \
  --model llama-3.3-70b-versatile \
  --api-key-env GROQ_API_KEY

# list / use / remove
npx octoflow host list
npx octoflow host use groq-llama
npx octoflow host remove groq-llama
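
The reason --api-key-env keeps secrets off disk is that only the variable name is stored and the value is looked up when the host is used. A sketch of that lookup follows; the `HostEntry` shape and the rule that `apiKeyEnv` takes precedence over `apiKey` are assumptions, not documented behaviour.

```typescript
// Hypothetical shape of a registered host, modelled on the `host add` flags above.
interface HostEntry {
  name: string;
  backend: string;
  host: string;
  model: string;
  apiKey?: string;    // literal secret, stored on disk (discouraged)
  apiKeyEnv?: string; // name of an environment variable holding the secret
}

// Resolve the key at use time; the env var name wins when both are present (assumed).
function resolveApiKey(
  entry: HostEntry,
  env: Record<string, string | undefined>,
): string | undefined {
  if (entry.apiKeyEnv) return env[entry.apiKeyEnv];
  return entry.apiKey;
}
```

With this arrangement the settings file would hold the string `GROQ_API_KEY` rather than the key itself.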

Caching, compaction, and the event log

octoflow chat automatically wires:

Prompt caching — cache: { enabled, ttl } (5m or 1h). Anthropic, OpenAI, Gemini, and supported Vercel-AI providers report cache hits via bridge:cache:hit, which the metrics module aggregates into cached-tokens.
Context compaction — maxContext, maxSectionTokens, and contextCompaction.strategy keep the window healthy across long sessions; agent:compaction events are surfaced in the activity stream.
Event log — every interesting bridge topic is appended as one JSONL line per event to eventLog.path for postmortem analysis.
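
For postmortem analysis, a JSONL log can be summarised with a few lines of code. The sketch below counts events per topic; it assumes each line is a JSON object with a string `topic` field, which is a guess at the record shape rather than a documented schema.

```typescript
// Assumption: one JSON object per line, each with a string "topic" field.
function countTopics(jsonl: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // skip blank lines, e.g. a trailing newline

    let topic = "(unparseable)";
    try {
      const event = JSON.parse(trimmed) as { topic?: unknown };
      topic = typeof event.topic === "string" ? event.topic : "(no topic)";
    } catch {
      // Malformed or non-object line: keep the "(unparseable)" bucket.
    }
    counts.set(topic, (counts.get(topic) ?? 0) + 1);
  }
  return counts;
}
```

To run it against a real log, read the file configured as eventLog.path into a string and pass it in.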

Package-manager equivalents of npx octoflow:

pnpm exec octoflow --help
yarn octoflow --help
bunx octoflow --help

Package Boundary

octoflow-cli owns the public octoflow binary and CLI identity. Command behavior is implemented by octoflow-core/cli, so the runtime package remains the single source for command registration, backend inspection, project scaffolding, sessions, tasks, and diagnostics.

The octoflow-core package also exposes an octoflow-core binary for direct runtime-package diagnostics, but user-facing docs should prefer octoflow from this package.

Learn More

../../docs/configuration.md - config files, profiles, and env vars used by the CLI.
../../docs/backend-integration.md - backend setup for octoflow doctor and octoflow run.
../octoflow-core - runtime the CLI wraps.

Validate

npm run -w octoflow-cli lint
npm run -w octoflow-cli typecheck
npm run -w octoflow-cli test

Status

Beta. The octoflow binary and core commands (init, doctor, run, --help, --version) are stable. Additional subcommands are preview surfaces. Pin versions before depending on it in production.
