@biks2013/image-tool

v0.1.0

Published

a month ago

TypeScript CLI for generating, editing, examining, and varying images via Azure OpenAI and Google Gemini.


giorgos-marinos

image cli azure-openai gemini image-generation image-editing langgraph langchain agent tui typescript

image-tool

A TypeScript CLI for generating, editing, examining, and varying images via Azure OpenAI and Google Gemini.

Install

npm install

Build

npm run build

Dev

npm run dev -- generate --provider gemini --prompt "a cat" --out cat.png

Quick start

cp .env.example .env
# edit .env and fill in credentials for the provider(s) you want to use
npm install
npm run build
./dist/bin/image-tool.js generate --provider gemini --prompt "a cat" --out cat.png

For the full functional specification (subcommands, flags, error model, exit codes, configuration contract, capability matrix), see docs/design/refined-request-image-tool-cli.md.

Configuration

All configuration is supplied through environment variables (loaded from .env via dotenv). No defaults are applied except for IMAGE_TOOL_MAX_CONCURRENCY. Missing required values raise MissingConfigurationError (exit code 2).

| Variable | Description |
|---|---|
| IMAGE_TOOL_DEFAULT_PROVIDER | Default provider (openai-azure or gemini) when neither --provider nor a REPL session override is supplied. |
| IMAGE_TOOL_DEFAULT_MODEL | Default model identifier resolved when --model is not supplied. |
| IMAGE_TOOL_MAX_CONCURRENCY | Cap on parallel provider calls inside one command. Optional positive integer. Default: 4 (the only sanctioned default). |
| IMAGE_TOOL_OUTPUT_DIR | Optional output directory for generated artifacts. Defaults to the current working directory. |
| IMAGE_TOOL_LOG_LEVEL | Optional log verbosity override (silent, error, warn, info, debug, or trace). |
| AZURE_OPENAI_API_KEY | Azure OpenAI API key. Required when provider = openai-azure. |
| AZURE_OPENAI_ENDPOINT | Azure OpenAI resource endpoint URL. Required when provider = openai-azure. |
| AZURE_OPENAI_DEPLOYMENT_NAME | Azure deployment name for the image-generation model. Required when provider = openai-azure for generate / edit / variations. |
| AZURE_OPENAI_API_VERSION | Azure OpenAI REST API version supporting the deployed image model. Required when provider = openai-azure. |
| AZURE_OPENAI_VISION_DEPLOYMENT_NAME | Azure deployment name for a vision-capable chat model. Required when provider = openai-azure and operation = examine. |
| GOOGLE_API_KEY | Google AI Studio API key (preferred). Required when provider = gemini if GEMINI_API_KEY is not set. |
| GEMINI_API_KEY | Google AI Studio API key (fallback alias). Required when provider = gemini if GOOGLE_API_KEY is not set. |
| GEMINI_IMAGE_MODEL | Gemini model identifier for image operations (e.g. gemini-3.1-flash-image-preview). Required when provider = gemini. |

See .env.example for a complete annotated template.
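The contract in the table can be sketched as a small validation step. This is an illustration only, not the package's source: the helper names `requiredVariables`, `validateEnv`, and `resolveMaxConcurrency` are hypothetical, the shape of `MissingConfigurationError` is assumed, and the environment is passed in as a plain object rather than read from `process.env`. Only the variable names, the default of 4, and exit code 2 come from this README.

```typescript
type Env = Record<string, string | undefined>;
type Provider = "openai-azure" | "gemini";
type Operation = "generate" | "edit" | "examine" | "variations";

// Error name and exit code follow the README; the class shape is assumed.
class MissingConfigurationError extends Error {
  readonly exitCode = 2;
  constructor(variable: string) {
    super("Missing required configuration: " + variable);
    this.name = "MissingConfigurationError";
  }
}

// Each inner array is a group of alternatives: at least one must be set.
function requiredVariables(provider: Provider, operation: Operation): string[][] {
  if (provider === "gemini") {
    return [["GOOGLE_API_KEY", "GEMINI_API_KEY"], ["GEMINI_IMAGE_MODEL"]];
  }
  // examine needs the vision deployment; the other operations need the image deployment.
  const deployment = operation === "examine"
    ? "AZURE_OPENAI_VISION_DEPLOYMENT_NAME"
    : "AZURE_OPENAI_DEPLOYMENT_NAME";
  return [
    ["AZURE_OPENAI_API_KEY"],
    ["AZURE_OPENAI_ENDPOINT"],
    ["AZURE_OPENAI_API_VERSION"],
    [deployment],
  ];
}

function isSet(env: Env, variable: string): boolean {
  const value = env[variable];
  return value !== undefined && value.trim() !== "";
}

// Throws for the first requirement group that has no value at all.
function validateEnv(env: Env, provider: Provider, operation: Operation): void {
  for (const group of requiredVariables(provider, operation)) {
    if (!group.some((variable) => isSet(env, variable))) {
      throw new MissingConfigurationError(group.join(" or "));
    }
  }
}

// Falls back to 4 when unset. Rejecting invalid values with a plain Error
// is an assumption; the README does not say how they are handled.
function resolveMaxConcurrency(env: Env): number {
  const raw = env["IMAGE_TOOL_MAX_CONCURRENCY"];
  if (raw === undefined || raw.trim() === "") return 4;
  const parsed = Number(raw);
  if (parsed % 1 !== 0 || parsed <= 0) {
    throw new Error("IMAGE_TOOL_MAX_CONCURRENCY must be a positive integer, got: " + raw);
  }
  return parsed;
}
```

Under these assumptions, an Azure environment that sets only AZURE_OPENAI_DEPLOYMENT_NAME passes validation for generate but fails for examine, because examine needs AZURE_OPENAI_VISION_DEPLOYMENT_NAME.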

Subcommands

The CLI exposes the following subcommands (implementation arrives in subsequent units):

generate — produce a new image from a text prompt.
edit — modify an existing image using a prompt and optional mask.
examine — describe / answer questions about an existing image.
variations — produce N variations of an existing image.
config show — print the resolved configuration (with secrets redacted).
config doctor — diagnose configuration problems and suggest fixes.
repl — open an interactive session that remembers provider / model selections.

Run image-tool <subcommand> --help for per-command flags once the binary is built.

Agent mode

image-tool agent adds a LangGraph ReAct agent that drives the four image operations through an LLM. The LLM decides which tools to call, chains them (generate → examine → edit), and reports results in plain prose.

Quick start

# Set the LLM provider and provider-specific credentials
export IMAGE_TOOL_AGENT_PROVIDER=azure-openai
export AZURE_OPENAI_AGENT_DEPLOYMENT=gpt-4.1   # your chat deployment

# One-shot
image-tool agent "generate a watercolor sunset and describe what you see"

# Interactive REPL
image-tool agent --interactive

# Inspect resolved configuration (no LLM calls)
image-tool agent --doctor

Worked examples

1. Azure OpenAI one-shot

export IMAGE_TOOL_AGENT_PROVIDER=azure-openai
export AZURE_OPENAI_AGENT_DEPLOYMENT=gpt-4.1
image-tool agent "generate a futuristic city at dusk, then tell me the dominant colours"

2. Azure OpenAI interactive REPL

image-tool agent --llm-provider azure-openai --interactive
# > generate a kitten playing with yarn
# > now make it look more like a watercolour painting
# > /exit

3. JSON mode (CI-friendly)

image-tool agent --json --no-stream "generate a minimalist logo for a coffee shop"

4. Local model with Ollama

export OLLAMA_HOST=http://localhost:11434
export LOCAL_LLM_MODEL=llama3.1
image-tool agent --llm-provider local-openai-compatible "what image formats can you generate?"

Provider setup

| Provider id | Required env vars |
|---|---|
| azure-openai | AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_AGENT_DEPLOYMENT |
| openai | OPENAI_API_KEY, OPENAI_AGENT_MODEL |
| anthropic | ANTHROPIC_API_KEY, ANTHROPIC_AGENT_MODEL |
| gemini | GOOGLE_API_KEY (or GEMINI_API_KEY), GEMINI_AGENT_MODEL |
| azure-anthropic | AZURE_ANTHROPIC_API_KEY, AZURE_ANTHROPIC_ENDPOINT, AZURE_ANTHROPIC_DEPLOYMENT, AZURE_ANTHROPIC_API_VERSION |
| local-openai-compatible | LOCAL_LLM_BASE_URL (or OLLAMA_HOST), LOCAL_LLM_MODEL |

See docs/reference/.env.example for a fully annotated template.

Configuration precedence

Policy A (shell-wins):

CLI flag  >  shell env var  >  ~/.tool-agents/image-tool/config  >  project .env  >  error
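The precedence chain can be read as a first-match lookup over four sources. The sketch below shows that reading; it is not the tool's actual resolver. `resolveSetting` and `ConfigSources` are hypothetical names. It assumes that every source is keyed by the same setting name and that an empty string counts as unset; neither is confirmed by this README.

```typescript
type SettingSource = Record<string, string | undefined>;

interface ConfigSources {
  cliFlags: SettingSource;   // values parsed from command-line flags
  shellEnv: SettingSource;   // variables exported in the shell
  userConfig: SettingSource; // ~/.tool-agents/image-tool/config
  projectEnv: SettingSource; // project .env
}

interface ResolvedSetting {
  value: string;
  source: string;
}

// Walks the sources in precedence order and returns the first value found.
// Reaching the end of the chain corresponds to the "error" step.
function resolveSetting(key: string, sources: ConfigSources): ResolvedSetting {
  const ordered: Array<[string, SettingSource]> = [
    ["CLI flag", sources.cliFlags],
    ["shell env var", sources.shellEnv],
    ["user config", sources.userConfig],
    ["project .env", sources.projectEnv],
  ];
  for (const [label, values] of ordered) {
    const value = values[key];
    if (value !== undefined && value !== "") {
      return { value: value, source: label };
    }
  }
  throw new Error("No value configured for " + key);
}
```

Returning the source label alongside the value makes it easy for a diagnostic command to report where each setting came from. The ordering is also consistent with dotenv's default behaviour of not overwriting variables that are already set in the process environment.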

Safety note

All four image tools are auto-approved — the agent invokes them without per-call confirmation. Images land in .image-tool-agent/<timestamp>/ under the current working directory unless you specify outputDir in your prompt.

TUI (interactive mode)

Running image-tool agent with no positional prompt opens a raw-mode terminal UI on top of the same LangGraph agent. The TUI gives you multi-line editing, token-by-token streaming, ESC-to-abort, persistent JSONL session transcripts, and a small set of slash commands.

# Launch
image-tool agent

Slash commands — /help, /history [N], /memory, /new, /last, /copy, /model <id>, /provider <id>, /quit (alias /exit).

Keybindings

| Key | Action |
|---|---|
| Enter | Submit |
| Ctrl-J or Alt-Enter | Insert newline (universal Shift+Enter fallback) |
| Up / Down | Browse user-input history (or move row in multi-line input) |
| Left / Right | Cursor by character |
| Home / End / Ctrl-A / Ctrl-E | Line start / end |
| Alt-←/→ or Ctrl-←/→ | Word motion |
| Ctrl-W / Ctrl-U / Ctrl-K | Delete word back / to start / to end |
| Backspace / Delete | Delete left / delete at cursor |
| ESC or Ctrl-C while streaming | Abort the in-flight agent response |
| Ctrl-D on empty buffer | Quit cleanly |

About Shift+Enter — most terminals (Apple Terminal, default iTerm2, WezTerm) send plain \r for both Enter and Shift+Enter, so they are indistinguishable. The TUI accepts the modern keyboard-protocol variants emitted by Kitty, Ghostty, Alacritty, Windows Terminal, and xterm with modifyOtherKeys=2. The universal portable fallback is Ctrl-J (literal LF byte 0x0A) which every terminal emits unambiguously.
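To make the distinction concrete, here is a sketch of how a raw-mode input loop might classify the byte sequences involved. `classifyEnterKey` is a hypothetical helper, and which sequences the TUI really matches is an assumption. The sequences themselves are the usual encodings: CR for Enter, LF for Ctrl-J, an ESC prefix for Alt, CSI u for the Kitty keyboard protocol, and the `CSI 27 ; 2 ; 13 ~` form for xterm's modifyOtherKeys.

```typescript
type EnterAction = "submit" | "newline" | "other";

// Classifies one raw input sequence read from a terminal in raw mode.
function classifyEnterKey(sequence: string): EnterAction {
  switch (sequence) {
    case "\r":            // Enter (also Shift+Enter on terminals that cannot tell them apart)
      return "submit";
    case "\n":            // Ctrl-J: literal LF byte 0x0A
    case "\x1b\r":        // Alt-Enter: ESC prefix followed by CR
    case "\x1b[13;2u":    // Shift+Enter, Kitty keyboard protocol (CSI u)
    case "\x1b[27;2;13~": // Shift+Enter, xterm with modifyOtherKeys=2
      return "newline";
    default:
      return "other";
  }
}
```

Because Ctrl-J arrives as a single byte that differs from CR, it needs no protocol negotiation with the terminal, which is why it works as the portable fallback.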

Persistence — each session writes a JSONL transcript under ~/.tool-agents/image-tool/history/${ISO8601}-${shortid}.jsonl (directory mode 0700, files mode 0600). Override the directory with the optional env var IMAGE_TOOL_TUI_HISTORY_DIR.
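The naming scheme and the one-record-per-line format can be sketched as follows. This is illustrative only: `transcriptFileName`, `toJsonLine`, `parseJsonLines`, and the `TranscriptEntry` shape are hypothetical, and the real transcript schema is not documented here. The use of UTC for the timestamp is also an assumption, since the README does not state a time zone.

```typescript
// Builds a name of the form ${ISO8601}-${shortid}.jsonl with ":" replaced
// by "-" so the name is valid on every filesystem. UTC is assumed.
function transcriptFileName(startedAt: Date, shortId: string): string {
  const stamp = startedAt.toISOString().slice(0, 19).replace(/:/g, "-");
  return stamp + "-" + shortId + ".jsonl";
}

// Hypothetical record shape, for illustration only.
interface TranscriptEntry {
  role: "user" | "agent";
  text: string;
  at: string;
}

// JSONL stores exactly one JSON document per line. JSON.stringify escapes
// embedded newlines, so a multi-line message still occupies a single line.
function toJsonLine(entry: TranscriptEntry): string {
  return JSON.stringify(entry) + "\n";
}

// Reads a transcript back, skipping blank lines such as the trailing one.
function parseJsonLines(content: string): TranscriptEntry[] {
  return content
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as TranscriptEntry);
}
```

An append-only format like this means a session can be flushed turn by turn, and a crash mid-session loses at most the line being written. With Node's fs module, the stated permissions would typically be applied by passing `mode: 0o700` when creating the directory and `mode: 0o600` when creating the file.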

Example transcript

image-tool agent (LangGraph)
LLM: openai / gpt-4.1
Session: kx7m2a → /Users/me/.tool-agents/image-tool/history/2026-04-24T20-15-03-kx7m2a.jsonl
Commands: /help /history /memory /new /last /copy /model /provider /quit
Keys: Enter=submit · Ctrl-J or Alt-Enter=newline · Up/Down=history · ESC=abort streaming · Ctrl-D=quit

❯ generate a watercolor mountain at dusk

Agent
  ▸ generate_image  {"prompt":"a watercolor mountain at dusk"} › ✓
  · /Users/me/work/.image-tool-agent/20260424-201510/img-001.png
I have generated the watercolor mountain at dusk for you.
[openai · gpt-4.1 · 1 turns · /help]

❯ /quit

Filesystem operations

image-tool fs <subcommand> exposes 10 first-class filesystem helpers that are also available as LangChain tools to the agent. The CLI side is NOT sandboxed (paths resolve relative to the current directory and the human user is trusted); the agent side enforces a sandbox rooted at IMAGE_TOOL_FS_ROOT (or process.cwd() when unset) and gates the destructive ops behind an interactive confirmation prompt.

image-tool fs ls /tmp [-a] [--json]
image-tool fs stat /etc/hosts [--json]
image-tool fs read /etc/hosts [--max-bytes 4096] [--offset 0] [--base64-only] [--json]
image-tool fs write /tmp/x.txt --content "hello" [--overwrite] [--no-mkdir-p]
image-tool fs append /tmp/x.txt --from-stdin
image-tool fs mkdir -p /tmp/a/b/c
image-tool fs rm -rf /tmp/scratch
image-tool fs mv /tmp/old.txt /tmp/new.txt --overwrite
image-tool fs cp -r /tmp/src /tmp/dst
image-tool fs find "**/*.json" --root . --max-results 100

When --json is passed the command emits a single JSON object on stdout that mirrors the typed result in src/fs/types.ts — useful for scripting.

The agent catalog grows from 4 image tools to 14 once the fs tools are included (10 fs + 4 image). Destructive agent ops (fs_rm, fs_mv, fs_cp) require interactive confirmation; in one-shot mode they return a typed FsConfirmationRequiredError to the LLM so the agent sees a clear "the user needs to switch to interactive mode" signal instead of silently failing.
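The confirmation gate described above might look roughly like this. It is a sketch under stated assumptions, not the package's implementation: `gateToolCall`, `GateResult`, the `askUser` callback, and the `UserDeclined` label are all hypothetical. Only the three destructive tool names and the `FsConfirmationRequiredError` name come from this README. The callback is synchronous to keep the example short; a real prompt would most likely be asynchronous.

```typescript
type AgentMode = "interactive" | "one-shot";

// Tool names listed as destructive in the README.
const DESTRUCTIVE_TOOLS = ["fs_rm", "fs_mv", "fs_cp"];

interface GateResult {
  allowed: boolean;
  errorType?: string;
  message?: string;
}

// Decides whether a tool call may proceed. The result is returned rather
// than thrown so the agent can hand the refusal back to the LLM as data.
function gateToolCall(
  tool: string,
  mode: AgentMode,
  askUser: (question: string) => boolean,
): GateResult {
  if (DESTRUCTIVE_TOOLS.indexOf(tool) === -1) {
    return { allowed: true }; // non-destructive tools are auto-approved
  }
  if (mode === "one-shot") {
    return {
      allowed: false,
      errorType: "FsConfirmationRequiredError",
      message: tool + " requires confirmation; the user needs to switch to interactive mode.",
    };
  }
  if (askUser("Allow " + tool + "?")) {
    return { allowed: true };
  }
  return { allowed: false, errorType: "UserDeclined", message: "The user declined " + tool + "." };
}
```

Returning a typed refusal gives the model something specific to relay to the user, rather than a generic tool failure that it might retry.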

Optional environment

IMAGE_TOOL_FS_ROOT     Absolute sandbox root for the agent fs tools.
                       Defaults to `process.cwd()` at agent launch.
                       Has no effect on the CLI subcommands.
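A sandbox check of this kind usually normalises the requested path and then verifies that it still lies under the root. The sketch below shows the idea with a hand-rolled, POSIX-only normaliser so that it needs no imports; `resolveInSandbox` and `SandboxViolationError` are hypothetical names. A real implementation would more likely use Node's path module, and would also need to resolve symlinks, which this example ignores.

```typescript
// Hypothetical error type for paths that escape the sandbox root.
class SandboxViolationError extends Error {
  constructor(requested: string, root: string) {
    super("Path " + requested + " escapes sandbox root " + root);
    this.name = "SandboxViolationError";
  }
}

// Collapses "." and ".." segments in an absolute POSIX-style path.
function normalizePosix(absolutePath: string): string {
  const kept: string[] = [];
  for (const segment of absolutePath.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      kept.pop();
      continue;
    }
    kept.push(segment);
  }
  return "/" + kept.join("/");
}

// Resolves a requested path against the sandbox root and rejects anything
// that ends up outside it. Relative paths are taken relative to the root.
function resolveInSandbox(sandboxRoot: string, requested: string): string {
  const root = normalizePosix(sandboxRoot);
  const candidate = requested.charAt(0) === "/"
    ? normalizePosix(requested)
    : normalizePosix(root + "/" + requested);
  // Compare against root plus a separator so /work/project-other
  // is not mistaken for a child of /work/project.
  const prefix = root === "/" ? "/" : root + "/";
  if (candidate !== root && candidate.indexOf(prefix) !== 0) {
    throw new SandboxViolationError(requested, root);
  }
  return candidate;
}
```

Checking the normalised result, rather than scanning the input for "..", also catches absolute paths and sibling directories that merely share the root's name as a prefix.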
