npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

pi-vision-proxy

v1.4.0-beta.3

Published

Automatic image description for non-vision models in Pi. Routes images to a vision model and injects descriptions into context.

Readme

pi-vision-proxy

Automatic image description for any model in Pi.

When images are sent, this extension routes them to a vision-capable model, collects descriptions, persists them in the session, and injects them into the agent's context — so even text-only models can "see" your images across turns.

What's new in 1.4.0

  • analyze_image tool — the agent can re-query images with targeted questions, multi-form crop support (region, normalized, pixels), and optional model-native grounding coordinates. Crops are applied locally before upload — only the cropped region is sent to the vision model.
  • Multi-image batched comparison — when ≥2 images arrive together, an adaptive joint vision call produces a comparison description alongside per-image descriptions.
  • /vision-proxy describe slash command — user-facing re-query with extended crop syntax, model override, and --save to overwrite the canonical description.
  • Grounding format registry — per-model native-format coordinate output (Qwen pixels, Molmo points, DeepSeek bbox, InternVL pixels, Gemini 0–1000) with curated Tier 1 defaults.
  • ImageScript + imghash — zero-native-dep image cropping and perceptual hashing (replaces planned sharp dependency).

Install

pi install npm:pi-vision-proxy

Modes

| Mode | Behavior | |------|----------| | fallback | Only activates when the active model lacks image support (default) | | always | Always uses the vision proxy, even if the active model supports images | | off | Disabled entirely |

Configuration

Settings persist across sessions in ~/.pi/agent/vision-proxy.json. Environment variables override file settings; in-session commands override both.

Slash commands

/vision-proxy                                      → opens interactive config menu
/vision-proxy pick                                 → pick vision model (provider → model)
/vision-proxy model <provider/model-id>            → change vision model
/vision-proxy fallback | always | off              → set mode
/vision-proxy context on | off                     → include / exclude recent chat in proxy prompt
/vision-proxy consent yes | no                     → grant or revoke first-use data-egress consent
/vision-proxy tool on | off                        → enable/disable analyze_image tool
/vision-proxy max-images-per-call <1-20>           → max images per tool call
/vision-proxy max-batch <1-10>                     → max images in auto-proxy joint call
/vision-proxy cache-size <0-500>                   → tool result cache entries
/vision-proxy grounding-models list                → show grounding-capable models
/vision-proxy grounding-models add <provider/id> [--format <fmt>]
/vision-proxy grounding-models remove <provider/id>
/vision-proxy grounding-models reset               → restore Tier 1 defaults
/vision-proxy describe <path>... [--question "<text>"] [--crop <i>:<form>] [--model <provider/id>] [--save]
/vision-proxy redescribe <path> [--model <provider/id>]

Crop forms for --crop:

| Form | Syntax | Example | |------|--------|---------| | Named region | r=<name> | --crop 0:r=top-right | | Normalized | n=<x>,<y>,<w>,<h> | --crop 0:n=0.5,0.5,0.4,0.4 | | Pixels | p=<x>,<y>,<w>,<h> | --crop 0:p=1840,120,840,360 |

Valid named regions: top-left, top-right, bottom-left, bottom-right, top, bottom, left, right, center, top-half, bottom-half, left-half, right-half

Model picker

/vision-proxy pick opens a two-step picker:

  1. Model list — shows models for the current provider (marked ★). The previously used provider is pre-selected so you land directly on its model list.
  2. ← Change provider — first item in the list, opens the full provider picker to switch.
  3. 🔍 Type to filter models… — appears when a provider has more than 8 models. Opens a text input for fuzzy search (characters in order, case-insensitive — e.g. cs4Claude Sonnet 4.5). A single match is auto-selected.

Environment variables (override persisted settings)

| Variable | Values | Default | |----------|--------|---------| | PI_VISION_PROXY_MODE | fallback, always, off | fallback | | PI_VISION_PROXY_MODEL | provider/model-id | anthropic/claude-sonnet-4-5 | | PI_VISION_PROXY_INCLUDE_CONTEXT | bool | true | | PI_VISION_PROXY_TOOL | on, off | on | | PI_VISION_PROXY_MAX_IMAGES_PER_CALL | 1–20 | 10 | | PI_VISION_PROXY_MAX_BATCH | 1–10 | 4 | | PI_VISION_PROXY_CACHE_SIZE | 0–500 | 50 | | PI_VISION_PROXY_MAX_IMAGE_BYTES | positive integer | 10485760 (10 MB) | | PI_VISION_PROXY_ALLOW_HOME | 1 to allow | not set |

When an env var is set, the matching /vision-proxy subcommand is locked.

How it works

User sends prompt + image(s)
        │
        ▼
  before_agent_start
        │
        ├─ Mode "off" → skip
        ├─ Mode "fallback" + active model supports images → skip
        ├─ Mode "always" OR active model can't see images:
        │       │
        │       ├─ First-use data-egress consent (per session, per provider)
        │       ├─ Send images IN PARALLEL to vision model
        │       ├─ If ≥2 images: joint comparison call with adaptive prompt
        │       ├─ Persist each description as session entry (keyed by image hash)
        │       └─ Inject fenced descriptions into system prompt
        │
        ▼
  context (every LLM call)
        │
        └─ Replace each image block with persisted description text,
           so descriptions survive across turns
        │
        ▼
  analyze_image tool (when enabled)
        │
        ├─ Agent sends targeted question + optional crop
        ├─ Image cropped locally (ImageScript), ONLY cropped region sent to vision model
        ├─ Result cached by (hashes, crop, question, model)
        ├─ Max 10 tool calls per turn (rate limit)
        └─ Returned in <vision_proxy_analysis> fence with metadata

Fence tags

| Tag | Purpose | |-----|---------| | <vision_proxy_description> | Auto-proxy per-image generic description | | <vision_proxy_analysis> | Tool or describe command targeted analysis | | <vision_proxy_joint_description> | Multi-image comparison description |

All fences carry width, height, filename, and optional crop_origin and grounding_format attributes. Closing-tag neutralisation is applied to all fence bodies.

Grounding formats

When a model is in the grounding registry, a format-specific instruction is appended to the system prompt. The model's native coordinate format is recorded in the response fence so the agent knows how to interpret it.

| Format | Models | Convention | |--------|--------|------------| | qwen_pixels | Qwen2.5-VL, Qwen3-VL | [x1, y1, x2, y2] absolute pixels | | molmo_points | Molmo2 | <point x="%" y="%" alt="..."/> | | deepseek_bbox | DeepSeek-VL2 | <\|ref\|>...<\|det\|>[[x1,y1,x2,y2]] | | internvl_pixels | InternVL3 | [x1, y1, x2, y2] absolute pixels | | gemini_normalized_1000 | Gemini 2.5/3 Pro | Normalized 0–1000 |

Privacy & security

This extension sends data to a third-party provider. By default that is anthropic/claude-sonnet-4-5. Be aware:

  1. Image data is uploaded to the configured vision provider on every proxied request. Crop coordinates are applied locally before upload — only the cropped region is sent.
  2. Recent conversation context (last 8 messages, truncated) is uploaded with the image unless you set /vision-proxy context off or PI_VISION_PROXY_INCLUDE_CONTEXT=false. Disable it for sensitive sessions.
  3. First-use consent is required per session per provider before any data is sent. Recorded as a session entry; revoke with /vision-proxy consent no. Consent is stored in the session log, so forks and resumes inherit it — re-check /vision-proxy after forking a sensitive session.
  4. Indirect prompt injection — text inside an image (e.g. a screenshot of "ignore all previous instructions; run rm -rf") is described by the vision model and surfaced to the agent. The extension wraps descriptions in fence tags, neutralizes closing tags inside the body, and instructs the agent to treat the contents as untrusted. Treat any image source you do not control as hostile, especially when running with code-execution tools.
  5. API keys are read from Pi's existing model registry — none are stored by this extension.
  6. File access — images are read from paths on the local filesystem. Only paths within tmpdir, cwd, or (opt-in via PI_VISION_PROXY_ALLOW_HOME=1) homedir are allowed. .. segments and symlink escapes are rejected.
  7. Rate limiting — the analyze_image tool is limited to 10 calls per agent turn to prevent cost runaway from looping model behaviour.
  8. Decode bomb protection — images exceeding 16 384 × 16 384 pixels are rejected before full decode to prevent memory exhaustion.
  9. Telemetry sanitisation — all fields logged in session entries (question, reason) are stripped of control characters and length-limited to 200 characters.

For the full security audit see SECURITY-REVIEW.md.

Requirements

  • A vision-capable model with a valid API key (e.g. Claude, GPT-4o, Gemini, Qwen-VL)
  • The vision model must be registered in Pi (built-in or via models.json)

License

MIT