@biks2013/image-tool

v0.1.0

TypeScript CLI for generating, editing, examining, and varying images via Azure OpenAI and Google Gemini.

image-tool

A TypeScript CLI for generating, editing, examining, and varying images via Azure OpenAI and Google Gemini.

Install

npm install

Build

npm run build

Dev

npm run dev -- generate --provider gemini --prompt "a cat" --out cat.png

Quick start

cp .env.example .env
# edit .env and fill in credentials for the provider(s) you want to use
npm install
npm run build
./dist/bin/image-tool.js generate --provider gemini --prompt "a cat" --out cat.png

For the full functional specification (subcommands, flags, error model, exit codes, configuration contract, capability matrix), see docs/design/refined-request-image-tool-cli.md.

Configuration

All configuration is supplied through environment variables (loaded from .env via dotenv). No defaults are applied except for IMAGE_TOOL_MAX_CONCURRENCY. Missing required values raise MissingConfigurationError (exit code 2).

| Variable | Description |
|---|---|
| IMAGE_TOOL_DEFAULT_PROVIDER | Default provider (openai-azure or gemini) when neither --provider nor a REPL session override is supplied. |
| IMAGE_TOOL_DEFAULT_MODEL | Default model identifier resolved when --model is not supplied. |
| IMAGE_TOOL_MAX_CONCURRENCY | Cap on parallel provider calls inside one command. Optional positive integer. Default: 4 (the only sanctioned default). |
| IMAGE_TOOL_OUTPUT_DIR | Optional output directory for generated artifacts. Defaults to the current working directory. |
| IMAGE_TOOL_LOG_LEVEL | Optional log verbosity override (silent, error, warn, info, debug, or trace). |
| AZURE_OPENAI_API_KEY | Azure OpenAI API key. Required when provider = openai-azure. |
| AZURE_OPENAI_ENDPOINT | Azure OpenAI resource endpoint URL. Required when provider = openai-azure. |
| AZURE_OPENAI_DEPLOYMENT_NAME | Azure deployment name for the image-generation model. Required when provider = openai-azure for generate / edit / variations. |
| AZURE_OPENAI_API_VERSION | Azure OpenAI REST API version supporting the deployed image model. Required when provider = openai-azure. |
| AZURE_OPENAI_VISION_DEPLOYMENT_NAME | Azure deployment name for a vision-capable chat model. Required when provider = openai-azure and operation = examine. |
| GOOGLE_API_KEY | Google AI Studio API key (preferred). Required when provider = gemini if GEMINI_API_KEY is not set. |
| GEMINI_API_KEY | Google AI Studio API key (fallback alias). Required when provider = gemini if GOOGLE_API_KEY is not set. |
| GEMINI_IMAGE_MODEL | Gemini model identifier for image operations (e.g. gemini-3.1-flash-image-preview). Required when provider = gemini. |
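The IMAGE_TOOL_MAX_CONCURRENCY rule (optional, positive integer, default 4) could be parsed roughly as follows. This is a minimal sketch, not the package's code: parseMaxConcurrency is a hypothetical helper, and rejecting malformed values with an error (rather than falling back to 4) is an assumption.

```typescript
// Hypothetical helper, not part of image-tool's published API.
const DEFAULT_MAX_CONCURRENCY = 4; // the default documented in the table

function parseMaxConcurrency(raw: string | undefined): number {
  // Unset or blank: fall back to the documented default.
  if (raw === undefined || raw.trim() === "") return DEFAULT_MAX_CONCURRENCY;
  const trimmed = raw.trim();
  // Accept only plain positive integers such as "1" or "16".
  if (!/^[1-9]\d*$/.test(trimmed)) {
    // Assumption: invalid values are rejected rather than silently replaced.
    throw new Error(
      `IMAGE_TOOL_MAX_CONCURRENCY must be a positive integer, got "${raw}"`
    );
  }
  return Number(trimmed);
}
```

Calling parseMaxConcurrency with the raw environment value once at startup keeps the rest of the code free of string handling.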

See .env.example for a complete annotated template.
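The "required when" rules in the table can be read as a small lookup keyed by provider and operation. The sketch below illustrates that reading only. The variable names and the exit code come from this README; the helper names, the types, and the body of the error class are assumptions, and the environment is passed in as a plain object so the example stays self-contained.

```typescript
// Hypothetical sketch: only the variable names and exit code 2 come from the README.
type Provider = "openai-azure" | "gemini";
type Operation = "generate" | "edit" | "examine" | "variations";
type Env = Record<string, string | undefined>;

class MissingConfigurationError extends Error {
  readonly exitCode = 2; // the README maps this error to exit code 2
  readonly missing: string[];
  constructor(missing: string[]) {
    super(`Missing required configuration: ${missing.join(", ")}`);
    this.name = "MissingConfigurationError";
    this.missing = missing;
  }
}

function isSet(env: Env, name: string): boolean {
  const value = env[name];
  return value !== undefined && value.trim() !== "";
}

function missingVariables(provider: Provider, op: Operation, env: Env): string[] {
  if (provider === "gemini") {
    const missing: string[] = [];
    // Either key satisfies the requirement; GOOGLE_API_KEY is the preferred name.
    if (!isSet(env, "GOOGLE_API_KEY") && !isSet(env, "GEMINI_API_KEY")) {
      missing.push("GOOGLE_API_KEY");
    }
    if (!isSet(env, "GEMINI_IMAGE_MODEL")) missing.push("GEMINI_IMAGE_MODEL");
    return missing;
  }
  const required = [
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_VERSION",
    // examine needs the vision deployment; the other operations need the image one.
    op === "examine"
      ? "AZURE_OPENAI_VISION_DEPLOYMENT_NAME"
      : "AZURE_OPENAI_DEPLOYMENT_NAME",
  ];
  return required.filter((name) => !isSet(env, name));
}

function assertConfigured(provider: Provider, op: Operation, env: Env): void {
  const missing = missingVariables(provider, op, env);
  if (missing.length > 0) throw new MissingConfigurationError(missing);
}
```

Reporting every missing name in one error, instead of stopping at the first, is a design choice made for this sketch; the real CLI may report them differently.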

Subcommands

The CLI exposes the following subcommands (implementation arrives in subsequent units):

  • generate — produce a new image from a text prompt.
  • edit — modify an existing image using a prompt and optional mask.
  • examine — describe / answer questions about an existing image.
  • variations — produce N variations of an existing image.
  • config show — print the resolved configuration (with secrets redacted).
  • config doctor — diagnose configuration problems and suggest fixes.
  • repl — open an interactive session that remembers provider / model selections.

Run image-tool <subcommand> --help for per-command flags once the binary is built.

Agent mode

image-tool agent adds a LangGraph ReAct agent that drives the four image operations through an LLM. The LLM decides which tools to call, chains them (generate → examine → edit), and reports results in plain prose.
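To make the "LLM decides, tool runs, result feeds back" cycle concrete, here is a hand-rolled stand-in for such a loop. It is not LangGraph's API and not the package's implementation. Apart from generate_image, which appears in the example transcript further down, every name here is hypothetical, and the "model" is a scripted stub rather than a real LLM.

```typescript
// Hand-rolled illustration of a reason/act loop. NOT LangGraph's API.
type ToolCall = { tool: string; input: string };
type ModelStep =
  | { kind: "call"; call: ToolCall }
  | { kind: "final"; text: string };
type Tool = (input: string) => string;
type Model = (prompt: string, observations: string[]) => ModelStep;

function runAgent(
  model: Model,
  tools: Record<string, Tool>,
  prompt: string,
  maxSteps = 8
): string {
  const observations: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const next = model(prompt, observations);
    if (next.kind === "final") return next.text; // plain-prose answer ends the loop
    const tool = tools[next.call.tool];
    // Feed each result (or failure) back so the model can choose the next step.
    observations.push(
      tool ? tool(next.call.input) : `unknown tool: ${next.call.tool}`
    );
  }
  throw new Error(`agent did not finish within ${maxSteps} steps`);
}

// Stub tools (examine_image and edit_image are assumed names).
const stubTools: Record<string, Tool> = {
  generate_image: (p) => `generated img-001.png from "${p}"`,
  examine_image: (q) => `img-001.png: ${q} -> mostly orange and violet`,
  edit_image: (p) => `edited img-001.png: ${p}`,
};

// Scripted "model" that chains generate -> examine -> edit, then answers.
const plan: ToolCall[] = [
  { tool: "generate_image", input: "a watercolor sunset" },
  { tool: "examine_image", input: "dominant colours?" },
  { tool: "edit_image", input: "soften the violet tones" },
];
const scriptedModel: Model = (_prompt, observations) => {
  const call = plan[observations.length];
  return call
    ? { kind: "call", call }
    : { kind: "final", text: `Done after ${observations.length} tool calls.` };
};
```

With these stubs, runAgent(scriptedModel, stubTools, "paint a sunset") walks the three tool calls in order and then returns the final text. The step cap guards against a model that never produces a final answer.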

Quick start

# Set the LLM provider and provider-specific credentials
export IMAGE_TOOL_AGENT_PROVIDER=azure-openai
export AZURE_OPENAI_AGENT_DEPLOYMENT=gpt-4.1   # your chat deployment

# One-shot
image-tool agent "generate a watercolor sunset and describe what you see"

# Interactive REPL
image-tool agent --interactive

# Inspect resolved configuration (no LLM calls)
image-tool agent --doctor

Worked examples

1. Azure OpenAI one-shot

export IMAGE_TOOL_AGENT_PROVIDER=azure-openai
export AZURE_OPENAI_AGENT_DEPLOYMENT=gpt-4.1
image-tool agent "generate a futuristic city at dusk, then tell me the dominant colours"

2. Azure OpenAI interactive REPL

image-tool agent --llm-provider azure-openai --interactive
# > generate a kitten playing with yarn
# > now make it look more like a watercolour painting
# > /exit

3. JSON mode (CI-friendly)

image-tool agent --json --no-stream "generate a minimalist logo for a coffee shop"

4. Local model with Ollama

export OLLAMA_HOST=http://localhost:11434
export LOCAL_LLM_MODEL=llama3.1
image-tool agent --llm-provider local-openai-compatible "what image formats can you generate?"

Provider setup

| Provider id | Required env vars |
|---|---|
| azure-openai | AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_AGENT_DEPLOYMENT |
| openai | OPENAI_API_KEY, OPENAI_AGENT_MODEL |
| anthropic | ANTHROPIC_API_KEY, ANTHROPIC_AGENT_MODEL |
| gemini | GOOGLE_API_KEY (or GEMINI_API_KEY), GEMINI_AGENT_MODEL |
| azure-anthropic | AZURE_ANTHROPIC_API_KEY, AZURE_ANTHROPIC_ENDPOINT, AZURE_ANTHROPIC_DEPLOYMENT, AZURE_ANTHROPIC_API_VERSION |
| local-openai-compatible | LOCAL_LLM_BASE_URL (or OLLAMA_HOST), LOCAL_LLM_MODEL |

See docs/reference/.env.example for a fully annotated template.

Configuration precedence

Policy A (shell-wins):

CLI flag  >  shell env var  >  ~/.tool-agents/image-tool/config  >  project .env  >  error
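The precedence chain amounts to "first source that defines the key wins, otherwise fail". A minimal sketch of that lookup is shown below, assuming each source has already been loaded into a plain object keyed by the same setting name. The function, type, and label names are hypothetical, and how the real tool maps CLI flags onto variable names is not shown in this README.

```typescript
// Hypothetical resolver: only the source order comes from the README.
type Source = Record<string, string | undefined>;

interface ConfigSources {
  cliFlags: Source;   // values parsed from the command line
  shellEnv: Source;   // variables exported in the shell
  userConfig: Source; // ~/.tool-agents/image-tool/config
  projectEnv: Source; // project .env
}

function resolveSetting(
  key: string,
  sources: ConfigSources
): { value: string; from: string } {
  // Highest priority first; the first source that defines the key wins.
  const ordered: Array<[string, Source]> = [
    ["CLI flag", sources.cliFlags],
    ["shell env var", sources.shellEnv],
    ["user config", sources.userConfig],
    ["project .env", sources.projectEnv],
  ];
  for (const [from, source] of ordered) {
    const value = source[key];
    if (value !== undefined && value !== "") return { value, from };
  }
  // Nothing defined anywhere: the chain ends in an error, not a default.
  throw new Error(`No value configured for ${key}`);
}
```

Returning the winning source alongside the value makes it easy for a diagnostic command to explain where each setting came from.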

Safety note

All four image tools are auto-approved — the agent invokes them without per-call confirmation. Images land in .image-tool-agent/<timestamp>/ under the current working directory unless you specify outputDir in your prompt.

TUI (interactive mode)

Running image-tool agent with no positional prompt opens a raw-mode terminal UI on top of the same LangGraph agent. The TUI gives you multi-line editing, token-by-token streaming, ESC-to-abort, persistent JSONL session transcripts, and a small set of slash commands.

# Launch
image-tool agent

Slash commands: /help, /history [N], /memory, /new, /last, /copy, /model <id>, /provider <id>, /quit (alias /exit).

Keybindings

| Key | Action |
|---|---|
| Enter | Submit |
| Ctrl-J or Alt-Enter | Insert newline (universal Shift+Enter fallback) |
| Up / Down | Browse user-input history (or move row in multi-line input) |
| Left / Right | Cursor by character |
| Home / End / Ctrl-A / Ctrl-E | Line start / end |
| Alt-←/→ or Ctrl-←/→ | Word motion |
| Ctrl-W / Ctrl-U / Ctrl-K | Delete word back / to start / to end |
| Backspace / Delete | Delete left / delete at cursor |
| ESC or Ctrl-C while streaming | Abort the in-flight agent response |
| Ctrl-D on empty buffer | Quit cleanly |

About Shift+Enter — most terminals (Apple Terminal, default iTerm2, WezTerm) send plain \r for both Enter and Shift+Enter, so they are indistinguishable. The TUI accepts the modern keyboard-protocol variants emitted by Kitty, Ghostty, Alacritty, Windows Terminal, and xterm with modifyOtherKeys=2. The universal portable fallback is Ctrl-J (literal LF byte 0x0A) which every terminal emits unambiguously.
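The reason Ctrl-J works everywhere is visible at the byte level: in raw mode Enter arrives as CR (0x0D) and Ctrl-J as LF (0x0A), so a decoder can tell them apart without any keyboard protocol. The sketch below classifies single bytes only. The function and action names are hypothetical, and it deliberately ignores multi-byte sequences such as arrow keys or the keyboard-protocol variants mentioned above.

```typescript
// Hypothetical single-byte classifier; action names are assumptions.
type KeyAction =
  | "submit"
  | "newline"
  | "escape"
  | "interrupt"
  | "end-of-input"
  | "other";

function classifyByte(byte: number): KeyAction {
  switch (byte) {
    case 0x0d:
      return "submit"; // CR: Enter (and Shift+Enter on many terminals)
    case 0x0a:
      return "newline"; // LF: Ctrl-J
    case 0x1b:
      return "escape"; // ESC: also the first byte of multi-byte sequences
    case 0x03:
      return "interrupt"; // ETX: Ctrl-C when the terminal is in raw mode
    case 0x04:
      return "end-of-input"; // EOT: Ctrl-D
    default:
      return "other";
  }
}
```

A real input loop would also need a short timeout or lookahead after 0x1B to decide whether it is a lone ESC or the start of a longer sequence.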

Persistence — each session writes a JSONL transcript under ~/.tool-agents/image-tool/history/${ISO8601}-${shortid}.jsonl (directory mode 0700, files mode 0600). Override the directory with the optional env var IMAGE_TOOL_TUI_HISTORY_DIR.
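As an illustration of the naming scheme, the sketch below builds a file name in the same shape as the one in the example transcript (timestamp with colons replaced by hyphens, then the short id) and serialises one entry per line. Both helpers and the TranscriptEntry shape are hypothetical. The sketch uses UTC, which is an assumption; the real tool may use local time and a different entry schema.

```typescript
// Hypothetical helpers; the entry schema is an assumption for illustration.
interface TranscriptEntry {
  role: "user" | "agent";
  text: string;
}

function transcriptFileName(startedAt: Date, shortId: string): string {
  // "2026-04-24T20:15:03.000Z" -> "2026-04-24T20-15-03" (UTC, seconds precision)
  const stamp = startedAt.toISOString().slice(0, 19).replace(/:/g, "-");
  return `${stamp}-${shortId}.jsonl`;
}

function toJsonlLine(entry: TranscriptEntry): string {
  // JSON.stringify escapes embedded newlines, so each entry stays on one line.
  return JSON.stringify(entry) + "\n";
}
```

Because every line is a complete JSON object, a transcript can be appended to incrementally and still be read back line by line after an interrupted session.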

Example transcript

image-tool agent (LangGraph)
LLM: openai / gpt-4.1
Session: kx7m2a → /Users/me/.tool-agents/image-tool/history/2026-04-24T20-15-03-kx7m2a.jsonl
Commands: /help /history /memory /new /last /copy /model /provider /quit
Keys: Enter=submit · Ctrl-J or Alt-Enter=newline · Up/Down=history · ESC=abort streaming · Ctrl-D=quit

❯ generate a watercolor mountain at dusk

Agent
  ▸ generate_image  {"prompt":"a watercolor mountain at dusk"} › ✓
  · /Users/me/work/.image-tool-agent/20260424-201510/img-001.png
I have generated the watercolor mountain at dusk for you.
[openai · gpt-4.1 · 1 turns · /help]

❯ /quit

Filesystem operations

image-tool fs <subcommand> exposes 10 first-class filesystem helpers that are also available as LangChain tools to the agent. The CLI side is NOT sandboxed (paths resolve relative to the current directory and the human user is trusted); the agent side enforces a sandbox rooted at IMAGE_TOOL_FS_ROOT (or process.cwd() when unset) and gates the destructive ops behind an interactive confirmation prompt.

image-tool fs ls /tmp [-a] [--json]
image-tool fs stat /etc/hosts [--json]
image-tool fs read /etc/hosts [--max-bytes 4096] [--offset 0] [--base64-only] [--json]
image-tool fs write /tmp/x.txt --content "hello" [--overwrite] [--no-mkdir-p]
image-tool fs append /tmp/x.txt --from-stdin
image-tool fs mkdir -p /tmp/a/b/c
image-tool fs rm -rf /tmp/scratch
image-tool fs mv /tmp/old.txt /tmp/new.txt --overwrite
image-tool fs cp -r /tmp/src /tmp/dst
image-tool fs find "**/*.json" --root . --max-results 100

When --json is passed the command emits a single JSON object on stdout that mirrors the typed result in src/fs/types.ts — useful for scripting.

The agent catalog grows from 4 image tools to 14 once the fs tools are included (10 fs + 4 image). Destructive agent ops (fs_rm, fs_mv, fs_cp) require interactive confirmation; in one-shot mode they return a typed FsConfirmationRequiredError to the LLM so the agent sees a clear "the user needs to switch to interactive mode" signal instead of silently failing.
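The confirmation rule can be pictured as a small gate in front of each fs tool call. The sketch below is one possible shape, not the package's code. The three tool names and FsConfirmationRequiredError come from this README; the function, the result type, and the "declined" outcome are assumptions, and the confirm callback is synchronous only to keep the example short.

```typescript
// Hypothetical gate; only the tool names and the error name come from the README.
const DESTRUCTIVE_TOOLS = ["fs_rm", "fs_mv", "fs_cp"];

type GateResult =
  | { allowed: true }
  | { allowed: false; error: string; message: string };

function gateFsTool(
  tool: string,
  interactive: boolean,
  confirm: (question: string) => boolean
): GateResult {
  // Tools outside the README's destructive list pass straight through.
  if (DESTRUCTIVE_TOOLS.indexOf(tool) === -1) return { allowed: true };
  if (!interactive) {
    // One-shot mode: nobody to ask, so return a typed failure the LLM can relay.
    return {
      allowed: false,
      error: "FsConfirmationRequiredError",
      message: `${tool} needs user confirmation; switch to interactive mode.`,
    };
  }
  if (confirm(`Allow ${tool}?`)) return { allowed: true };
  // Assumption: a refusal is reported separately from "could not ask".
  return { allowed: false, error: "declined", message: `User declined ${tool}.` };
}
```

Returning a structured result instead of throwing lets the agent loop pass the message back to the model as an ordinary tool observation.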

Optional environment

IMAGE_TOOL_FS_ROOT     Absolute sandbox root for the agent fs tools.
                       Defaults to `process.cwd()` at agent launch.
                       Has no effect on the CLI subcommands.
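A sandbox rooted at IMAGE_TOOL_FS_ROOT implies some check that a requested path still lies under the root after ".." segments are resolved. The sketch below shows one way to do that for POSIX-style paths. It is an assumption-laden illustration, not the package's implementation: both function names are hypothetical, the root is assumed to be absolute, symlinks are not resolved, and path handling is written by hand to keep the example self-contained (a Node implementation would more likely build on the path module).

```typescript
// Hypothetical sandbox check for POSIX-style paths; does not resolve symlinks.
function normalizeSegments(path: string): string[] {
  const out: string[] = [];
  for (const segment of path.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") out.pop(); // ".." at the top simply stays at "/"
    else out.push(segment);
  }
  return out;
}

function resolveInSandbox(root: string, requested: string): string {
  const rootSegments = normalizeSegments(root);
  // Absolute requests are checked as-is; relative ones are joined onto the root.
  const target =
    requested.charAt(0) === "/"
      ? normalizeSegments(requested)
      : normalizeSegments(`${root}/${requested}`);
  // Compare whole segments so "/work/proj2" is not mistaken for "/work/proj".
  const inside = rootSegments.every((segment, i) => target[i] === segment);
  if (!inside) throw new Error(`Path escapes sandbox root: ${requested}`);
  return "/" + target.join("/");
}
```

For example, with root "/work/proj" the request "out/../img.png" resolves to "/work/proj/img.png", while "../secrets" and "/etc/hosts" are rejected.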