@biks2013/image-tool
v0.1.0
TypeScript CLI for generating, editing, examining, and varying images via Azure OpenAI and Google Gemini.
image-tool
A TypeScript CLI for generating, editing, examining, and varying images via Azure OpenAI and Google Gemini.
Install
npm install
Build
npm run build
Dev
npm run dev -- generate --provider gemini --prompt "a cat" --out cat.png
Quick start
cp .env.example .env
# edit .env and fill in credentials for the provider(s) you want to use
npm install
npm run build
./dist/bin/image-tool.js generate --provider gemini --prompt "a cat" --out cat.png
For the full functional specification (subcommands, flags, error model, exit codes, configuration contract, capability matrix), see docs/design/refined-request-image-tool-cli.md.
Configuration
All configuration is supplied through environment variables (loaded from .env via dotenv). No defaults are applied except for IMAGE_TOOL_MAX_CONCURRENCY. Missing required values raise MissingConfigurationError (exit code 2).
| Variable | Description |
|---|---|
| IMAGE_TOOL_DEFAULT_PROVIDER | Default provider (openai-azure or gemini) when neither --provider nor a REPL session override is supplied. |
| IMAGE_TOOL_DEFAULT_MODEL | Default model identifier resolved when --model is not supplied. |
| IMAGE_TOOL_MAX_CONCURRENCY | Cap on parallel provider calls inside one command. Optional positive integer. Default: 4 (the only sanctioned default). |
| IMAGE_TOOL_OUTPUT_DIR | Optional output directory for generated artifacts. Defaults to the current working directory. |
| IMAGE_TOOL_LOG_LEVEL | Optional log verbosity override (silent, error, warn, info, debug, or trace). |
| AZURE_OPENAI_API_KEY | Azure OpenAI API key. Required when provider = openai-azure. |
| AZURE_OPENAI_ENDPOINT | Azure OpenAI resource endpoint URL. Required when provider = openai-azure. |
| AZURE_OPENAI_DEPLOYMENT_NAME | Azure deployment name for the image-generation model. Required when provider = openai-azure for generate / edit / variations. |
| AZURE_OPENAI_API_VERSION | Azure OpenAI REST API version supporting the deployed image model. Required when provider = openai-azure. |
| AZURE_OPENAI_VISION_DEPLOYMENT_NAME | Azure deployment name for a vision-capable chat model. Required when provider = openai-azure and operation = examine. |
| GOOGLE_API_KEY | Google AI Studio API key (preferred). Required when provider = gemini if GEMINI_API_KEY is not set. |
| GEMINI_API_KEY | Google AI Studio API key (fallback alias). Required when provider = gemini if GOOGLE_API_KEY is not set. |
| GEMINI_IMAGE_MODEL | Gemini model identifier for image operations (e.g. gemini-3.1-flash-image-preview). Required when provider = gemini. |
See .env.example for a complete annotated template.
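As a rough sketch of the contract in the table above (not the tool's actual implementation), a loader might apply the single default for IMAGE_TOOL_MAX_CONCURRENCY, prefer GOOGLE_API_KEY over GEMINI_API_KEY, and fail on missing values. The class shape, the `exitCode` field, and the function names are assumptions for illustration; only the variable names, the default of 4, and exit code 2 come from this README.

```typescript
// Hypothetical sketch: everything except the environment variable names is illustrative.
type Env = Record<string, string | undefined>;

class MissingConfigurationError extends Error {
  readonly exitCode = 2; // the README maps this error to exit code 2
  constructor(readonly variables: string[]) {
    super(`Missing required configuration: ${variables.join(", ")}`);
    this.name = "MissingConfigurationError";
  }
}

// IMAGE_TOOL_MAX_CONCURRENCY is optional; 4 is the documented default.
function resolveMaxConcurrency(env: Env): number {
  const raw = env.IMAGE_TOOL_MAX_CONCURRENCY;
  if (raw === undefined || raw === "") return 4;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`IMAGE_TOOL_MAX_CONCURRENCY must be a positive integer, got "${raw}"`);
  }
  return value;
}

// GOOGLE_API_KEY is preferred; GEMINI_API_KEY is the fallback alias.
function resolveGeminiConfig(env: Env): { apiKey: string; model: string } {
  const apiKey = env.GOOGLE_API_KEY || env.GEMINI_API_KEY;
  const model = env.GEMINI_IMAGE_MODEL;
  const missing: string[] = [];
  if (!apiKey) missing.push("GOOGLE_API_KEY (or GEMINI_API_KEY)");
  if (!model) missing.push("GEMINI_IMAGE_MODEL");
  if (!apiKey || !model) throw new MissingConfigurationError(missing);
  return { apiKey, model };
}
```

A caller could catch `MissingConfigurationError` at the top level and pass its `exitCode` to `process.exit`.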
Subcommands
The CLI exposes the following subcommands (implementation arrives in subsequent units):
- generate: produce a new image from a text prompt.
- edit: modify an existing image using a prompt and optional mask.
- examine: describe / answer questions about an existing image.
- variations: produce N variations of an existing image.
- config show: print the resolved configuration (with secrets redacted).
- config doctor: diagnose configuration problems and suggest fixes.
- repl: open an interactive session that remembers provider / model selections.
Run image-tool <subcommand> --help for per-command flags once the binary is built.
Agent mode
image-tool agent adds a LangGraph ReAct agent that drives the four image operations
through an LLM. The LLM decides which tools to call, chains them (generate → examine → edit),
and reports results in plain prose.
Quick start
# Set the LLM provider and provider-specific credentials
export IMAGE_TOOL_AGENT_PROVIDER=azure-openai
export AZURE_OPENAI_AGENT_DEPLOYMENT=gpt-4.1 # your chat deployment
# One-shot
image-tool agent "generate a watercolor sunset and describe what you see"
# Interactive REPL
image-tool agent --interactive
# Inspect resolved configuration (no LLM calls)
image-tool agent --doctor
Worked examples
1. Azure OpenAI one-shot
export IMAGE_TOOL_AGENT_PROVIDER=azure-openai
export AZURE_OPENAI_AGENT_DEPLOYMENT=gpt-4.1
image-tool agent "generate a futuristic city at dusk, then tell me the dominant colours"
2. Azure OpenAI interactive REPL
image-tool agent --llm-provider azure-openai --interactive
# > generate a kitten playing with yarn
# > now make it look more like a watercolour painting
# > /exit
3. JSON mode (CI-friendly)
image-tool agent --json --no-stream "generate a minimalist logo for a coffee shop"
4. Local model with Ollama
export OLLAMA_HOST=http://localhost:11434
export LOCAL_LLM_MODEL=llama3.1
image-tool agent --llm-provider local-openai-compatible "what image formats can you generate?"
Provider setup
| Provider id | Required env vars |
|---|---|
| azure-openai | AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_AGENT_DEPLOYMENT |
| openai | OPENAI_API_KEY, OPENAI_AGENT_MODEL |
| anthropic | ANTHROPIC_API_KEY, ANTHROPIC_AGENT_MODEL |
| gemini | GOOGLE_API_KEY (or GEMINI_API_KEY), GEMINI_AGENT_MODEL |
| azure-anthropic | AZURE_ANTHROPIC_API_KEY, AZURE_ANTHROPIC_ENDPOINT, AZURE_ANTHROPIC_DEPLOYMENT, AZURE_ANTHROPIC_API_VERSION |
| local-openai-compatible | LOCAL_LLM_BASE_URL (or OLLAMA_HOST), LOCAL_LLM_MODEL |
See docs/reference/.env.example for a fully annotated template.
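The provider table can also be expressed as data, which makes it easy to report exactly which variables are missing for a chosen provider. This is a minimal sketch under assumptions: the constant and function names are hypothetical, and the "any of" grouping is just one way to model the "(or ...)" alternatives listed in the table.

```typescript
// Hypothetical sketch of the provider table as data; helper names are illustrative.
type Env = Record<string, string | undefined>;

// Each inner array is an "any of" group: at least one name in it must be set.
const AGENT_PROVIDER_ENV: Record<string, string[][]> = {
  "azure-openai": [["AZURE_OPENAI_API_KEY"], ["AZURE_OPENAI_ENDPOINT"], ["AZURE_OPENAI_AGENT_DEPLOYMENT"]],
  "openai": [["OPENAI_API_KEY"], ["OPENAI_AGENT_MODEL"]],
  "anthropic": [["ANTHROPIC_API_KEY"], ["ANTHROPIC_AGENT_MODEL"]],
  "gemini": [["GOOGLE_API_KEY", "GEMINI_API_KEY"], ["GEMINI_AGENT_MODEL"]],
  "azure-anthropic": [
    ["AZURE_ANTHROPIC_API_KEY"],
    ["AZURE_ANTHROPIC_ENDPOINT"],
    ["AZURE_ANTHROPIC_DEPLOYMENT"],
    ["AZURE_ANTHROPIC_API_VERSION"],
  ],
  "local-openai-compatible": [["LOCAL_LLM_BASE_URL", "OLLAMA_HOST"], ["LOCAL_LLM_MODEL"]],
};

// Returns a human-readable entry for every group with no variable set.
function missingAgentEnv(provider: string, env: Env): string[] {
  const groups = AGENT_PROVIDER_ENV[provider];
  if (!groups) throw new Error(`Unknown agent provider: ${provider}`);
  return groups
    .filter((group) => !group.some((name) => env[name]))
    .map((group) => group.join(" or "));
}
```

For example, `missingAgentEnv("gemini", process.env)` returns an empty array when either API key variable and GEMINI_AGENT_MODEL are set.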
Configuration precedence
Policy A (shell-wins):
CLI flag > shell env var > ~/.tool-agents/image-tool/config > project .env > error
Safety note
All four image tools are auto-approved — the agent invokes them without per-call
confirmation. Images land in .image-tool-agent/<timestamp>/ under the current working
directory unless you specify outputDir in your prompt.
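The Policy A chain listed under Configuration precedence can be read as an ordered lookup in which the first source that supplies a value wins and an empty chain is an error. The sketch below illustrates that reading only; the `Layer` type and `resolveSetting` are hypothetical, keying every layer by the same name is a simplification, and treating an empty string as unset is an assumption.

```typescript
// Hypothetical sketch of "Policy A (shell-wins)"; all names here are illustrative.
type Layer = { name: string; values: Record<string, string | undefined> };

function resolveSetting(key: string, layers: Layer[]): { value: string; source: string } {
  // Layers are ordered from highest to lowest precedence; the first defined value wins.
  for (const layer of layers) {
    const value = layer.values[key];
    if (value !== undefined && value !== "") return { value, source: layer.name };
  }
  // End of the chain: no source supplied the key, so resolution fails.
  throw new Error(`No value for ${key} in: ${layers.map((l) => l.name).join(" > ")}`);
}

const layers: Layer[] = [
  { name: "CLI flag", values: {} },
  { name: "shell env var", values: { IMAGE_TOOL_AGENT_PROVIDER: "azure-openai" } },
  { name: "~/.tool-agents/image-tool/config", values: {} },
  { name: "project .env", values: { IMAGE_TOOL_AGENT_PROVIDER: "gemini" } },
];

const resolved = resolveSetting("IMAGE_TOOL_AGENT_PROVIDER", layers);
console.log(`${resolved.value} (from ${resolved.source})`); // azure-openai (from shell env var)
```

Here the shell value beats the project .env value, which is what "shell-wins" describes.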
TUI (interactive mode)
Running image-tool agent with no positional prompt opens a raw-mode terminal UI on top of
the same LangGraph agent. The TUI gives you multi-line editing, token-by-token streaming,
ESC-to-abort, persistent JSONL session transcripts, and a small set of slash commands.
# Launch
image-tool agent
Slash commands: /help, /history [N], /memory, /new, /last, /copy,
/model <id>, /provider <id>, /quit (alias /exit).
Keybindings
| Key | Action |
|---|---|
| Enter | Submit |
| Ctrl-J or Alt-Enter | Insert newline (universal Shift+Enter fallback) |
| Up / Down | Browse user-input history (or move row in multi-line input) |
| Left / Right | Cursor by character |
| Home / End / Ctrl-A / Ctrl-E | Line start / end |
| Alt-←/→ or Ctrl-←/→ | Word motion |
| Ctrl-W / Ctrl-U / Ctrl-K | Delete word back / to start / to end |
| Backspace / Delete | Delete left / delete at cursor |
| ESC or Ctrl-C while streaming | Abort the in-flight agent response |
| Ctrl-D on empty buffer | Quit cleanly |
About Shift+Enter — most terminals (Apple Terminal, default iTerm2, WezTerm) send
plain \r for both Enter and Shift+Enter, so they are indistinguishable. The TUI accepts
the modern keyboard-protocol variants emitted by Kitty, Ghostty, Alacritty, Windows
Terminal, and xterm with modifyOtherKeys=2. The universal portable fallback is
Ctrl-J (literal LF byte 0x0A), which every terminal emits unambiguously.
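To make the Enter versus newline distinction concrete, here is a small sketch of how raw-mode input might be classified. It is not the TUI's actual key parser: the function name is hypothetical, and the escape sequences for Alt-Enter, the CSI u encoding, and xterm's modifyOtherKeys=2 form are the commonly documented encodings, assumed here rather than taken from this project.

```typescript
// Hypothetical sketch: classify one chunk of raw-mode input as "submit" or "newline".
type EnterAction = "submit" | "newline" | "other";

function classifyEnter(chunk: string): EnterAction {
  switch (chunk) {
    case "\r": // plain Enter, and Shift+Enter on terminals that cannot distinguish it
      return "submit";
    case "\n": // Ctrl-J, the literal LF byte 0x0A
    case "\x1b\r": // Alt-Enter as commonly sent: ESC followed by CR (assumption)
    case "\x1b[13;2u": // Shift+Enter in the CSI u keyboard-protocol encoding (assumption)
    case "\x1b[27;2;13~": // Shift+Enter under xterm modifyOtherKeys=2 (assumption)
      return "newline";
    default:
      return "other";
  }
}
```

Because Ctrl-J arrives as a distinct byte, it maps to "newline" even on terminals where Shift+Enter is reported as a bare carriage return.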
Persistence — each session writes a JSONL transcript under
~/.tool-agents/image-tool/history/${ISO8601}-${shortid}.jsonl (directory mode 0700,
files mode 0600). Override the directory with the optional env var
IMAGE_TOOL_TUI_HISTORY_DIR.
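A minimal sketch of this persistence scheme, using only Node's standard library, might look like the following. The entry shape and function names are hypothetical, the timestamp format is inferred from the example file name in the transcript in this README, and the use of UTC is an assumption.

```typescript
import { appendFileSync, mkdirSync, mkdtempSync, statSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical sketch; the entry shape and function names are illustrative.
function transcriptPath(dir: string, startedAt: Date, shortId: string): string {
  // 2026-04-24T20:15:03.000Z becomes 2026-04-24T20-15-03 (colons are awkward in file names)
  const stamp = startedAt.toISOString().slice(0, 19).replace(/:/g, "-");
  return join(dir, `${stamp}-${shortId}.jsonl`);
}

function appendEntry(dir: string, file: string, entry: unknown): void {
  mkdirSync(dir, { recursive: true, mode: 0o700 }); // directory mode 0700
  // The mode applies only when the file is created; each entry is one JSON object per line.
  appendFileSync(file, JSON.stringify(entry) + "\n", { mode: 0o600 });
}

// Usage against a throwaway directory rather than the real history location.
const historyDir = join(mkdtempSync(join(tmpdir(), "image-tool-")), "history");
const transcript = transcriptPath(historyDir, new Date("2026-04-24T20:15:03Z"), "kx7m2a");
appendEntry(historyDir, transcript, { role: "user", text: "generate a watercolor mountain at dusk" });
console.log(transcript, (statSync(transcript).mode & 0o777).toString(8));
```

Appending one line per event keeps the transcript readable even if the session ends abruptly, since every completed line is valid JSON on its own.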
Example transcript
image-tool agent (LangGraph)
LLM: openai / gpt-4.1
Session: kx7m2a → /Users/me/.tool-agents/image-tool/history/2026-04-24T20-15-03-kx7m2a.jsonl
Commands: /help /history /memory /new /last /copy /model /provider /quit
Keys: Enter=submit · Ctrl-J or Alt-Enter=newline · Up/Down=history · ESC=abort streaming · Ctrl-D=quit
❯ generate a watercolor mountain at dusk
Agent
▸ generate_image {"prompt":"a watercolor mountain at dusk"} › ✓
· /Users/me/work/.image-tool-agent/20260424-201510/img-001.png
I have generated the watercolor mountain at dusk for you.
[openai · gpt-4.1 · 1 turns · /help]
❯ /quit
Filesystem operations
image-tool fs <subcommand> exposes 10 first-class filesystem helpers
that are also available as LangChain tools to the agent. The CLI side is
NOT sandboxed (paths resolve relative to the current directory and the
human user is trusted); the agent side enforces a sandbox rooted at
IMAGE_TOOL_FS_ROOT (or process.cwd() when unset) and gates the
destructive ops behind an interactive confirmation prompt.
image-tool fs ls /tmp [-a] [--json]
image-tool fs stat /etc/hosts [--json]
image-tool fs read /etc/hosts [--max-bytes 4096] [--offset 0] [--base64-only] [--json]
image-tool fs write /tmp/x.txt --content "hello" [--overwrite] [--no-mkdir-p]
image-tool fs append /tmp/x.txt --from-stdin
image-tool fs mkdir -p /tmp/a/b/c
image-tool fs rm -rf /tmp/scratch
image-tool fs mv /tmp/old.txt /tmp/new.txt --overwrite
image-tool fs cp -r /tmp/src /tmp/dst
image-tool fs find "**/*.json" --root . --max-results 100
When --json is passed the command emits a single JSON object on stdout
that mirrors the typed result in src/fs/types.ts — useful for scripting.
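For scripting, the --json output can be consumed from Node along these lines. This is a sketch under assumptions: it presumes image-tool is on PATH, the helper names are hypothetical, and because the fields of the result types in src/fs/types.ts are not listed here, the parsed value is kept loosely typed instead of guessing at a shape.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical helper: the result fields are not documented here, so the value stays loosely typed.
function parseJsonResult(stdout: string): Record<string, unknown> {
  const parsed: unknown = JSON.parse(stdout);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("expected a single JSON object on stdout");
  }
  return parsed as Record<string, unknown>;
}

// Runs the CLI (assumes image-tool is on PATH) and returns the parsed object.
function fsStat(target: string): Record<string, unknown> {
  const stdout = execFileSync("image-tool", ["fs", "stat", target, "--json"], { encoding: "utf8" });
  return parseJsonResult(stdout);
}
```

Using `execFileSync` with an argument array avoids shell quoting issues when the path contains spaces.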
The agent catalog grows from 4 image tools to 14 once the fs tools are
included (10 fs + 4 image). Destructive agent ops (fs_rm, fs_mv,
fs_cp) require interactive confirmation; in one-shot mode they
return a typed FsConfirmationRequiredError to the LLM so the agent
sees a clear "the user needs to switch to interactive mode" signal
instead of silently failing.
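The confirmation gate described above could be sketched as follows. Only the error name and the three tool names come from this README; the function, the `Confirm` callback type, and the convention that an absent callback means one-shot mode are assumptions made for illustration.

```typescript
// Hypothetical sketch; only the error name and the tool names come from the README.
class FsConfirmationRequiredError extends Error {
  constructor(readonly tool: string) {
    super(`${tool} needs interactive confirmation; switch to interactive mode`);
    this.name = "FsConfirmationRequiredError";
  }
}

const DESTRUCTIVE_TOOLS = new Set(["fs_rm", "fs_mv", "fs_cp"]);

type Confirm = (tool: string) => Promise<boolean>;
type GateResult = "allowed" | "declined" | FsConfirmationRequiredError;

// confirm is undefined in one-shot mode, where no prompt can be shown.
async function checkDestructive(tool: string, confirm?: Confirm): Promise<GateResult> {
  if (!DESTRUCTIVE_TOOLS.has(tool)) return "allowed";
  // Return the typed error as a value so it can be reported back to the LLM.
  if (!confirm) return new FsConfirmationRequiredError(tool);
  return (await confirm(tool)) ? "allowed" : "declined";
}
```

Returning the error as a value, not throwing it, matches the idea that the agent should see the signal as a tool result and explain it to the user.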
Optional environment
IMAGE_TOOL_FS_ROOT Absolute sandbox root for the agent fs tools.
Defaults to `process.cwd()` at agent launch.
Has no effect on the CLI subcommands.
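One common way to enforce a sandbox root like this is to resolve every requested path against the root and reject anything that climbs out of it. The sketch below shows that technique under assumptions: the function name is hypothetical, the tool's real enforcement may differ, and this version does not resolve symlinks.

```typescript
import * as path from "node:path";

// Hypothetical sketch of a sandbox check; symlinks are not resolved here.
function resolveInSandbox(requested: string, env: Record<string, string | undefined>): string {
  // Fall back to the current working directory when IMAGE_TOOL_FS_ROOT is unset.
  const root = path.resolve(env.IMAGE_TOOL_FS_ROOT ?? process.cwd());
  const target = path.resolve(root, requested);
  const rel = path.relative(root, target);
  // The path escapes the root if its relative form climbs upward or is absolute.
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`Path escapes sandbox root ${root}: ${requested}`);
  }
  return target;
}
```

With IMAGE_TOOL_FS_ROOT set to /tmp/root, a request for `a/b` resolves inside the root, while `../etc` or an absolute path such as `/etc/hosts` is rejected.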