imgx-bridge
v0.1.0
imgx
A CLI image bridge for text-only LLMs.
imgx is a CLI tool that connects to any OpenAI-compatible vision model, analyzes local images, persists results in SQLite, and exposes image understanding capabilities to text-only LLM workflows.
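imgx is an image CLI tool built on OpenAI-compatible multimodal models. It gives large models that lack multimodal capabilities external image understanding, OCR, Q&A, batch processing, and proxy forwarding.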
Why
Many cost-effective text LLMs, local models, and coding agents cannot read images directly. In terminal workflows, OpenCode-style agents, automation scripts, and batch jobs, users still need to inspect screenshots, papers, charts, scanned documents, UI mockups, and error images.
imgx bridges that gap by sending local images to a configured vision model and returning text, OCR, or structured JSON that can be consumed by text-only models and other tools.
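As an illustration of the consuming side, here is a minimal sketch of how a script might flatten imgx JSON output into plain text for a text-only model. The field names follow the `ask --json` example later in this README; the helper name `toPromptContext` and the output layout are assumptions made for this sketch, not part of imgx.

```typescript
// Subset of the `ask --json` shape documented in this README.
interface ImgxAskResult {
  ok: boolean;
  result?: {
    answer?: string;
    visible_text?: string[];
    observations?: string[];
  };
  error?: { code: string; message: string };
}

// Hypothetical helper: convert imgx JSON output into plain text context.
function toPromptContext(raw: string): string {
  const parsed = JSON.parse(raw) as ImgxAskResult;
  if (!parsed.ok || !parsed.result) {
    const code = parsed.error?.code ?? "UNKNOWN";
    return `[image analysis failed: ${code}]`;
  }
  const lines: string[] = [];
  if (parsed.result.answer) lines.push(`Answer: ${parsed.result.answer}`);
  for (const text of parsed.result.visible_text ?? []) {
    lines.push(`Visible text: ${text}`);
  }
  for (const note of parsed.result.observations ?? []) {
    lines.push(`Observation: ${note}`);
  }
  return lines.join("\n");
}

// Sample input written by hand for this sketch, not captured from imgx.
const sampleAskOutput = JSON.stringify({
  ok: true,
  result: {
    answer: "A TypeError is shown.",
    visible_text: ["TypeError: x is undefined"],
    observations: [],
  },
});
console.log(toPromptContext(sampleAskOutput));
```

The resulting text can be prepended to the prompt sent to the text-only model.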
Goals
- Configure one OpenAI-compatible vision provider: baseURL, model, and apiKey.
- Read local image files that the user explicitly passes to the CLI.
- Support image Q&A, description, OCR, batch processing, and JSON output.
- Persist history, cache entries, and batch state in SQLite.
- Support npm global installation and npx usage.
- Provide standard CLI help, error codes, exit codes, and logging behavior.
- Later expose an OpenAI-compatible proxy that wraps text-only models as pseudo-multimodal providers.
Install
The npm package is imgx-bridge; the installed command is imgx.
npm install -g imgx-bridge
imgx --help
You can also run it with npx:
npx imgx-bridge --help
Quick Start
npm install -g imgx-bridge
imgx set \
--base-url https://api.example.com/v1 \
--model gpt-4o-mini \
--api-key sk-xxx
imgx doctor
imgx ./screenshot.png "What is the error in this screenshot?"
imgx ocr ./paper.png --json
imgx batch "./images/**/*.png" \
--task ocr \
--jsonl results.jsonl \
--concurrency 3
Commands
Usage: imgx <command> [options]
Commands:
  init                      Initialize imgx config
  config                    Manage configuration
  doctor                    Check environment and provider connectivity
  ask <image> <question>    Ask a question about an image
  describe <image>          Describe an image
  ocr <image>               Extract text from an image
  locate <image> <target>   Locate a target region in an image
  crop <image>              Crop an image by bbox or semantic target
  annotate <image>          Annotate image regions
  batch <pattern>           Run task on multiple images
  history                   Show analysis history
  cache                     Manage cache
  set                       Save provider settings to local SQLite
  unset                     Delete saved local settings
  proxy                     Start OpenAI-compatible proxy server
Global Options:
  --config <path>           Use custom config file
  --json                    Output JSON
  --quiet                   Suppress non-result output
  --verbose                 Show more logs
  --debug                   Show debug logs with secrets redacted
  -h, --help                Show help
  -v, --version             Show version
Highlights
| Feature | Status |
| --- | --- |
| OpenAI-compatible vision model | V0.1 |
| Local image path input | V0.1 |
| Image Q&A, description, OCR | V0.1 |
| SQLite-backed cache and history | V0.1 |
| JSON and JSONL output | V0.1 |
| Batch processing | V0.1 |
| Standard error and exit codes | V0.1 |
| OpenAI-compatible proxy for text-only models | Planned |
MVP Scope
V0.1 focuses on the core loop:
imgx init
imgx doctor
imgx ask <image> <question>
imgx describe <image>
imgx ocr <image>
imgx batch <glob>
imgx history
imgx cache clear
V0.1 must support:
- OpenAI-compatible vision models.
- Local image loading.
- Base64 data URL image upload.
- Text output.
- JSON output.
- SQLite cache and history.
- Batch JSONL output.
- Standard error and exit codes.
- npm global installation.
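To make the "Base64 data URL image upload" item concrete, the following is a minimal sketch of encoding image bytes as a data URL and placing them in a request body of the shape shown in the Provider Request section of this README. The helper names `toDataUrl` and `buildVisionRequest` are hypothetical; this is not imgx's actual implementation.

```typescript
import { Buffer } from "node:buffer";

// Content parts as they appear in the Provider Request example of this README.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface VisionRequest {
  model: string;
  messages: { role: "user"; content: ContentPart[] }[];
  temperature: number;
}

// Encode raw image bytes as a base64 data URL.
function toDataUrl(bytes: Uint8Array, mime: string): string {
  return `data:${mime};base64,${Buffer.from(bytes).toString("base64")}`;
}

// Hypothetical helper: assemble the request body for one image and one prompt.
function buildVisionRequest(
  model: string,
  prompt: string,
  bytes: Uint8Array,
  mime: string,
): VisionRequest {
  return {
    model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          { type: "image_url", image_url: { url: toDataUrl(bytes, mime) } },
        ],
      },
    ],
    temperature: 0,
  };
}

// Only the first four bytes of a PNG file, used as stand-in image data.
const pngSignature = new Uint8Array([0x89, 0x50, 0x4e, 0x47]);
const visionRequest = buildVisionRequest(
  "gpt-4o-mini",
  "Describe this image.",
  pngSignature,
  "image/png",
);
console.log(JSON.stringify(visionRequest, null, 2));
```

A real call would read the bytes from the user-supplied file path and send the body to the configured baseURL with Bearer authentication.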
Configuration
Use imgx set once after installation. Provider settings are stored in local SQLite, including the API key. The JSON config file is kept for non-secret defaults and backward compatibility.
imgx set \
--base-url https://api.example.com/v1 \
--model gpt-4o-mini \
--api-key sk-xxx
Delete saved credentials:
imgx unset api-key
imgx unset provider
Config file locations:
User: ~/.config/imgx/config.json
Project: ./imgx.config.json
Config precedence:
CLI options > local SQLite settings > project config > user config > defaults
Example:
{
  "provider": {
    "type": "openai-compatible",
    "baseURL": "https://api.example.com/v1",
    "model": "gpt-4o-mini"
  },
  "image": {
    "maxSizeMB": 20,
    "autoResize": true,
    "maxWidth": 2000,
    "maxHeight": 2000,
    "stripExif": true,
    "allowedExtensions": [".png", ".jpg", ".jpeg", ".webp"]
  },
  "output": {
    "defaultFormat": "text"
  },
  "cache": {
    "enabled": true,
    "ttlDays": 30
  },
  "database": {
    "path": "~/.imgx/imgx.db"
  }
}
Command Requirements
imgx init
Initializes user configuration, database, and cache directory.
imgx init
imgx init \
--base-url https://api.example.com/v1 \
--model gpt-4o-mini
Acceptance criteria:
- Creates ~/.config/imgx/config.json.
- Does not write plaintext API keys to config.
- Creates the SQLite database.
- Creates the cache directory.
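The file-related criteria above can be sketched as follows. This is an illustrative outline only: the function name `initImgx`, the `homeDir` parameter (used so the sketch runs in a temporary directory), and the cache location under `.imgx/cache` are assumptions, and SQLite database creation is left out.

```typescript
import { existsSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

interface InitOptions {
  baseURL?: string;
  model?: string;
  apiKey?: string;
}

// Hypothetical init routine: creates the config file and the cache directory.
function initImgx(
  homeDir: string,
  options: InitOptions,
): { configPath: string; cacheDir: string } {
  const configDir = join(homeDir, ".config", "imgx");
  const cacheDir = join(homeDir, ".imgx", "cache"); // assumed location
  mkdirSync(configDir, { recursive: true });
  mkdirSync(cacheDir, { recursive: true });

  // Copy only non-secret fields; options.apiKey is deliberately never written.
  const config = {
    provider: {
      type: "openai-compatible",
      baseURL: options.baseURL,
      model: options.model,
    },
  };
  const configPath = join(configDir, "config.json");
  writeFileSync(configPath, JSON.stringify(config, null, 2));
  return { configPath, cacheDir };
}

// Run against a throwaway directory instead of the real home directory.
const demoHome = mkdtempSync(join(tmpdir(), "imgx-init-"));
const initResult = initImgx(demoHome, {
  baseURL: "https://api.example.com/v1",
  model: "gpt-4o-mini",
  apiKey: "sk-secret",
});
console.log(readFileSync(initResult.configPath, "utf8"));
```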
imgx doctor
Checks local environment and provider connectivity.
imgx doctor
imgx doctor --json
Checks:
- Node.js version.
- imgx version.
- Config file path.
- SQLite writability.
- Cache directory writability.
- API key presence.
- baseURL reachability.
- Whether the model can process a test image.
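Two of the local checks can be sketched without any network access. The result shape (`name`, `ok`, `detail`) and the function names are assumptions for this example; the minimum Node.js major version of 18 comes from the Suggested Stack section.

```typescript
// Hypothetical result shape for a single doctor check.
interface CheckResult {
  name: string;
  ok: boolean;
  detail: string;
}

// Compare the major version of a Node.js version string against a minimum.
function checkNodeVersion(version: string, minMajor = 18): CheckResult {
  const majorText = version.replace(/^v/, "").split(".")[0] ?? "";
  const major = Number.parseInt(majorText, 10);
  const ok = Number.isInteger(major) && major >= minMajor;
  return {
    name: "node-version",
    ok,
    detail: ok ? `Node.js ${version}` : `Node.js ${version} is older than ${minMajor}`,
  };
}

// Report only whether a key exists; the key itself never appears in the output.
function checkApiKeyPresent(apiKey: string | undefined): CheckResult {
  const ok = typeof apiKey === "string" && apiKey.length > 0;
  return { name: "api-key", ok, detail: ok ? "API key is set" : "API key is missing" };
}

// The overall status is ok only when every individual check passed.
function summarizeChecks(checks: CheckResult[]): { ok: boolean; checks: CheckResult[] } {
  return { ok: checks.every((check) => check.ok), checks };
}

const doctorReport = summarizeChecks([
  checkNodeVersion("v20.11.0"),
  checkApiKeyPresent(undefined),
]);
console.log(JSON.stringify(doctorReport, null, 2));
```

The remaining checks (SQLite and cache writability, baseURL reachability, the test image) would add further `CheckResult` entries to the same list.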
imgx ask
Asks a question about an image.
imgx ask ./error.png "What is the error in this screenshot?"
imgx ./error.png "What is the error in this screenshot?"
Options:
--json
--task coding | general | ui | document | chart
--no-cache
--save-raw
--timeout <ms>
imgx describe
Generates an image description.
imgx describe ./ui.png
imgx describe ./ui.png --json
imgx describe ./ui.png --for coding
--for values:
general
coding
ui
document
chart
paper
imgx ocr
Extracts visible text from an image.
imgx ocr ./paper.png
imgx ocr ./paper.png --json
OCR should preserve line breaks, table structure, code indentation, and punctuation when possible. Uncertain characters should be marked as [?].
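A downstream script can use the [?] marker to decide whether OCR output needs human review. The sketch below counts markers per line; the function name `ocrUncertainty` and the returned shape are assumptions for illustration, not part of imgx.

```typescript
const UNCERTAIN_MARKER = "[?]";

// Count uncertain-character markers and record the 1-based lines containing them.
function ocrUncertainty(text: string): { uncertainCount: number; uncertainLines: number[] } {
  const uncertainLines: number[] = [];
  let uncertainCount = 0;
  text.split("\n").forEach((line, index) => {
    // Splitting on the marker yields one more piece than there are markers.
    const hits = line.split(UNCERTAIN_MARKER).length - 1;
    if (hits > 0) {
      uncertainCount += hits;
      uncertainLines.push(index + 1);
    }
  });
  return { uncertainCount, uncertainLines };
}

// Hand-written sample text; indentation and line breaks are kept as OCR would return them.
const sampleOcrText = "Total: 1[?]0.00\n  if (x) {\n    return y[?][?]\n  }";
console.log(ocrUncertainty(sampleOcrText));
```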
imgx batch
Runs a task across multiple images.
imgx batch "./screenshots/**/*.png" \
--task ocr \
--jsonl out.jsonl \
--concurrency 3 \
--continue-on-error
Options:
--task describe | ocr | coding | ui | chart | document
--prompt <text>
--jsonl <path>
--concurrency <n>
--resume
--continue-on-error
--no-cache
--limit <n>
Batch acceptance criteria:
- Supports glob patterns and recursive directories.
- Supports concurrency limits.
- Can continue after individual item failures.
- Streams JSONL to stdout or a specified file.
- Supports resume.
- Writes progress to stderr.
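The concurrency limit and continue-on-error behavior can be sketched with a small worker pool that emits one JSONL line per image. This is an illustration under assumptions: `runBatch`, its parameters, and the placeholder error code `TASK_FAILED` are invented for the example, and glob expansion, resume, and stderr progress are omitted.

```typescript
// One JSONL record per image, following the batch output example in this README.
type BatchLine =
  | { ok: true; path: string; result: unknown }
  | { ok: false; path: string; error: { code: string; message: string } };

interface BatchSummary {
  succeeded: number;
  failed: number;
}

// Run `task` over all paths with at most `concurrency` tasks in flight.
async function runBatch(
  paths: string[],
  concurrency: number,
  task: (path: string) => Promise<unknown>,
  emit: (jsonl: string) => void,
): Promise<BatchSummary> {
  const queue = [...paths];
  const summary: BatchSummary = { succeeded: 0, failed: 0 };

  // A failure becomes an error record instead of aborting the batch.
  async function processOne(path: string): Promise<BatchLine> {
    try {
      return { ok: true, path, result: await task(path) };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      // "TASK_FAILED" is a placeholder, not one of the documented imgx codes.
      return { ok: false, path, error: { code: "TASK_FAILED", message } };
    }
  }

  // Each worker pulls the next path until the shared queue is empty.
  async function worker(): Promise<void> {
    for (;;) {
      const path = queue.shift();
      if (path === undefined) return;
      const line = await processOne(path);
      if (line.ok) summary.succeeded += 1;
      else summary.failed += 1;
      emit(JSON.stringify(line));
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, paths.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return summary;
}

// Stand-in task: fails for one file so the error path is visible.
const fakeOcr = async (path: string): Promise<unknown> => {
  if (path === "b.png") throw new Error("Cannot decode image");
  return { summary: `text from ${path}` };
};

runBatch(["a.png", "b.png", "c.png"], 2, fakeOcr, (line) => console.log(line)).then(
  (summary) => console.error(`done: ${summary.succeeded} ok, ${summary.failed} failed`),
);
```

Because JSONL lines go through `emit` and the summary goes to stderr, stdout stays machine-readable, in line with the logging rules in this README.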
JSON Output
Example ask --json result:
{
  "ok": true,
  "analysis_id": "ana_01J...",
  "task": "ask",
  "question": "What is the error in this screenshot?",
  "image": {
    "path": "/abs/error.png",
    "sha256": "8b7c...",
    "mime": "image/png",
    "width": 1920,
    "height": 1080,
    "size_bytes": 345672
  },
  "result": {
    "answer": "The image shows a frontend runtime error: TypeError: Cannot read properties of undefined (reading 'map').",
    "visible_text": [
      "TypeError: Cannot read properties of undefined (reading 'map')"
    ],
    "observations": [
      "There is an error overlay in the center of the page",
      "The error is related to a JavaScript array map call"
    ]
  },
  "provider": {
    "model": "gpt-4o-mini"
  },
  "cached": false,
  "usage": {
    "latency_ms": 1804,
    "input_tokens": null,
    "output_tokens": 228
  }
}
Example JSONL batch output:
{"ok":true,"path":"a.png","analysis_id":"ana_1","result":{"summary":"..."}}
{"ok":false,"path":"b.png","error":{"code":"IMAGE_DECODE_FAILED","message":"Cannot decode image"}}
Provider Request
Vision requests use the OpenAI Chat Completions format:
{
  "model": "gpt-4o-mini",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Please describe this image."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,..."
          }
        }
      ]
    }
  ],
  "temperature": 0
}
Provider requirements:
- Custom baseURL.
- Bearer token authentication.
- Request timeout.
- Usage recording when returned by provider.
- Raw response persistence for non-standard provider fields.
- Provider errors mapped to standard imgx error codes.
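The last requirement can be illustrated with a small mapping from HTTP status codes to the PROVIDER_* codes listed under Errors And Exit Codes. The README does not specify which status maps to which code, so every pairing below is an assumption made for this sketch.

```typescript
// The PROVIDER_* codes listed in this README.
type ProviderErrorCode =
  | "PROVIDER_UNAUTHORIZED"
  | "PROVIDER_RATE_LIMITED"
  | "PROVIDER_TIMEOUT"
  | "PROVIDER_BAD_REQUEST"
  | "PROVIDER_UNSUPPORTED_IMAGE"
  | "PROVIDER_INVALID_RESPONSE";

// Assumed mapping from an HTTP error status to an imgx error code.
function providerErrorCode(status: number): ProviderErrorCode {
  if (status === 401 || status === 403) return "PROVIDER_UNAUTHORIZED";
  if (status === 429) return "PROVIDER_RATE_LIMITED";
  if (status === 408 || status === 504) return "PROVIDER_TIMEOUT";
  if (status === 415) return "PROVIDER_UNSUPPORTED_IMAGE";
  // Any other client error is treated as a malformed request.
  if (status >= 400 && status < 500) return "PROVIDER_BAD_REQUEST";
  // Remaining statuses (for example 5xx) are treated as an unusable response.
  return "PROVIDER_INVALID_RESPONSE";
}

for (const status of [401, 429, 504, 400, 500]) {
  console.log(status, providerErrorCode(status));
}
```

A client-side timeout (no HTTP response at all) would need separate handling that also yields PROVIDER_TIMEOUT.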
Built-In Prompt Tasks
Planned prompt templates:
general
coding
ocr
ui
chart
document
paper
locate
compare
JSON mode should return:
{
  "summary": "string",
  "visible_text": ["string"],
  "observations": ["string"],
  "limitations": ["string"]
}
Persistence
SQLite is used for image metadata, analyses, requests, batches, and migrations. The recommended implementation library is better-sqlite3.
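Cache lookups depend on a deterministic key. As a minimal sketch, assuming the key is a SHA-256 hash over the components this section lists, the derivation could look like the code below. The exact serialization (a JSON array of parts, image options sorted by name) is an assumption, not imgx's documented format.

```typescript
import { createHash } from "node:crypto";

interface CacheKeyInput {
  imageBytes: Uint8Array;
  task: string;
  prompt: string;
  model: string;
  baseURL: string;
  promptTemplateVersion: string;
  imageOptions: Record<string, string | number | boolean>;
}

function sha256Hex(data: string | Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Hypothetical key derivation: hash the image first, then hash all components together.
function cacheKey(input: CacheKeyInput): string {
  // Sort option names so that property order does not change the key.
  const options = Object.keys(input.imageOptions)
    .sort()
    .map((name) => `${name}=${String(input.imageOptions[name])}`)
    .join("&");
  const parts = [
    sha256Hex(input.imageBytes),
    input.task,
    input.prompt,
    input.model,
    input.baseURL,
    input.promptTemplateVersion,
    options,
  ];
  // JSON-encoding the array keeps part boundaries unambiguous.
  return sha256Hex(JSON.stringify(parts));
}

const exampleKey = cacheKey({
  imageBytes: new Uint8Array([1, 2, 3]),
  task: "ocr",
  prompt: "",
  model: "gpt-4o-mini",
  baseURL: "https://api.example.com/v1",
  promptTemplateVersion: "1",
  imageOptions: { maxWidth: 2000, stripExif: true },
});
console.log(exampleKey);
```

Changing any component, including the model or the prompt template version, produces a different key, so stale entries are never reused for a changed setup.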
Cache keys are derived from:
sha256(image_bytes)
+ task
+ prompt
+ model
+ baseURL
+ promptTemplateVersion
+ imageOptions
Errors And Exit Codes
All machine-readable failures use this shape:
{
  "ok": false,
  "error": {
    "code": "IMAGE_FILE_NOT_FOUND",
    "message": "Image file not found: ./a.png",
    "hint": "Use an absolute path or check whether the file exists."
  }
}
Error code groups:
CONFIG_MISSING_API_KEY
CONFIG_INVALID_BASE_URL
CONFIG_MODEL_MISSING
IMAGE_FILE_NOT_FOUND
IMAGE_PERMISSION_DENIED
IMAGE_UNSUPPORTED_FORMAT
IMAGE_TOO_LARGE
IMAGE_DECODE_FAILED
PROVIDER_UNAUTHORIZED
PROVIDER_RATE_LIMITED
PROVIDER_TIMEOUT
PROVIDER_BAD_REQUEST
PROVIDER_UNSUPPORTED_IMAGE
PROVIDER_INVALID_RESPONSE
DB_OPEN_FAILED
DB_MIGRATION_FAILED
CACHE_READ_FAILED
CACHE_WRITE_FAILED
BATCH_NO_MATCH
BATCH_PARTIAL_FAILED
Exit codes:
0 success
1 runtime error
2 argument error
3 config error
4 file error
5 provider error
6 database error
7 batch partial failure
Logging And Security
- stdout is reserved for command results.
- stderr is used for progress, warnings, and logs.
- --quiet outputs only results.
- --verbose includes paths, cache hits, and latency.
- --debug includes debug details with secrets redacted.
- --json ensures stdout contains only JSON.
- API keys, Authorization headers, and image base64 are never logged.
- Saved API keys are stored only in the local SQLite database and can be removed with imgx unset api-key.
- Local files are read only when explicitly passed by the user.
- URL images require explicit --allow-url.
- EXIF is stripped by default.
- Image size and batch size are bounded.
- Proxy mode listens on 127.0.0.1 by default.
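The redaction rule for debug logs can be sketched as a filter applied to every message before it is written. The patterns below are assumptions: they cover Bearer tokens, keys with the `sk-` prefix used in this README's examples, and base64 data URLs, and a real implementation may need more cases.

```typescript
// Hypothetical redaction pass for log messages.
function redact(message: string): string {
  return message
    // Authorization header values: keep the scheme, drop the token.
    .replace(/(Bearer\s+)[^\s"']+/gi, "$1[REDACTED]")
    // Bare API keys; the "sk-" prefix is assumed from this README's examples.
    .replace(/\bsk-[A-Za-z0-9_-]+/g, "[REDACTED]")
    // Base64 image payloads: keep the MIME type, drop the data.
    .replace(/(data:[a-z]+\/[a-z0-9.+-]+;base64,)[A-Za-z0-9+/=]+/gi, "$1[REDACTED]");
}

console.error(redact("POST /chat/completions Authorization: Bearer sk-abc123"));
console.error(redact("image url=data:image/png;base64,iVBORw0KGgo="));
```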
Suggested Stack
- Language: TypeScript
- Runtime: Node.js >= 18, recommended >= 20
- CLI: commander
- Image processing: sharp
- SQLite: better-sqlite3
- Config validation: zod
- HTTP: undici
- Glob: fast-glob
- Logging: pino or a lightweight custom logger
- Tests: vitest
- Bundling: tsup
- Publishing: npm
Roadmap
V0.1:
- init, doctor, ask, describe, ocr, batch, history, cache clear
- OpenAI-compatible vision provider
- SQLite cache and history
- JSON and JSONL output
- npm global install
V0.2:
- locate, crop, and annotate
- Batch resume
- Multi-image input
V0.3:
- OpenAI-compatible proxy
- Main text model plus vision model forwarding
- Streaming passthrough
- OpenCode integration example
V0.4:
- MCP server
- Plugin-style providers
- Local VLM backend
- Generative image editing backend
V0.1 Acceptance
- npm install -g imgx-bridge makes imgx --help available.
- imgx set saves provider settings locally.
- imgx doctor completes provider checks.
- imgx describe ./a.png returns an image description.
- imgx ocr ./a.png returns OCR text.
- imgx ask ./a.png "question" returns an answer.
- --json returns stable machine-readable JSON.
- Repeating the same image and task hits the cache.
- batch handles image directories and writes JSONL.
- Error cases return standard error codes and non-zero exit codes.
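For the last criterion, one way to connect the error codes to the exit codes listed under Errors And Exit Codes is a prefix lookup. The README lists both sets but not how they pair up, so the pairing below is an assumption: IMAGE_* is treated as a file error, CACHE_* as a database error because the cache lives in SQLite, and anything unmatched (including BATCH_NO_MATCH) falls back to the generic runtime error.

```typescript
// Assumed pairing of error-code prefixes with the documented exit codes.
const PREFIX_EXIT_CODES: ReadonlyArray<readonly [string, number]> = [
  ["CONFIG_", 3],
  ["IMAGE_", 4],
  ["PROVIDER_", 5],
  ["DB_", 6],
  ["CACHE_", 6],
];

// Return the process exit code for an error code, or 0 when there is no error.
function exitCodeFor(errorCode: string | null): number {
  if (errorCode === null) return 0;
  if (errorCode === "BATCH_PARTIAL_FAILED") return 7;
  const match = PREFIX_EXIT_CODES.find(([prefix]) => errorCode.startsWith(prefix));
  return match ? match[1] : 1;
}

for (const code of [null, "IMAGE_FILE_NOT_FOUND", "PROVIDER_TIMEOUT", "BATCH_PARTIAL_FAILED"]) {
  console.log(code, exitCodeFor(code));
}
```

Exit code 2 (argument error) is not covered here because argument problems would typically be detected by the CLI parser before any error code is produced.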
