pi-vision-tool

v1.3.6

Pi Agent extension that adds a describe_image tool, letting non-multimodal models delegate image analysis to a vision-capable model (like Qwen VL)

Pi Vision Tool

A Pi Agent extension that adds a describe_image tool, letting non-multimodal models (like DeepSeek V4 Pro, GPT-5 Codex without image support, etc.) delegate image analysis to a vision-capable model.

Features

The calling model has full control over every call, deciding what matters for each image:

| Feature | Parameter | What the model controls |
|---|---|---|
| Compression | compress | true for faster/general use, false for pixel-perfect accuracy |
| Reasoning depth | reasoning | "off" for instant answers, "high"/"xhigh" for complex analysis |
| Prompt | prompt | Free-text instruction: "describe", "extract text", "find the bug", ... |
| Image source | image_path | File path, data URL, or raw base64 |

This means the model itself decides the cost/quality tradeoff per call — no pre-configuration needed. Just like a developer chooses between a quick cat and a deep git bisect, the model picks the right tool settings for the job.
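
As a rough sketch of the call shape in the table above, the four parameters could be typed as follows. Only the parameter names and the "off" reasoning default come from this README; the type names, the `withDefaults` helper, and the `compress: true` default are assumptions made for illustration and may not match the package's own types.

```typescript
// Hypothetical typing of the describe_image arguments. Only the four
// parameter keys are taken from the README; everything else is illustrative.
type ReasoningLevel = "off" | "minimal" | "low" | "medium" | "high" | "xhigh";

interface DescribeImageParams {
  image_path: string;         // file path, data URL, or raw base64
  prompt: string;             // free-text instruction for the vision model
  compress?: boolean;         // false when pixel-perfect accuracy matters
  reasoning?: ReasoningLevel; // omitted: fall back to the configured default
}

// Fill in omitted options. Defaulting compress to true is an assumption.
function withDefaults(
  params: DescribeImageParams,
  defaultReasoning: ReasoningLevel = "off",
): Required<DescribeImageParams> {
  return {
    image_path: params.image_path,
    prompt: params.prompt,
    compress: params.compress ?? true,
    reasoning: params.reasoning ?? defaultReasoning,
  };
}

// A cheap, quick call versus a precise, expensive one.
const quickCall = withDefaults({
  image_path: "/tmp/screenshot.png",
  prompt: "What color is this?",
});
const preciseCall = withDefaults({
  image_path: "/tmp/screenshot.png",
  prompt: "Give [x,y,w,h] bounding boxes for all buttons",
  compress: false,
  reasoning: "high",
});
```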

How it works

┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│  DeepSeek Pro    │────▶│  describe_image  │────▶│  Qwen VL / any   │
│  (no vision)     │     │  (this tool)     │     │  vision model    │
│                  │◀────│                  │◀────│                  │
│  "that's red"    │     │  text response   │     │  "it's red"      │
└──────────────────┘     └──────────────────┘     └──────────────────┘

  1. The calling model decides it needs to understand an image
  2. It calls describe_image with an image path and a specific prompt
  3. The tool sends the image + prompt to your vision model
  4. The vision model's text response is returned to the calling model as a tool result
  5. The calling model integrates the result into its reasoning

Reasoning / extended thinking

For vision models with reasoning: true, the calling model can choose the reasoning effort per call via the reasoning parameter:

| Level | When to use |
|---|---|
| off | Simple queries: "what color is this?" |
| minimal | Quick checks: "is there an error on this screenshot?" |
| low | Basic descriptions, text extraction |
| medium | UI analysis, layout descriptions |
| high | Architecture diagrams, complex screenshots |
| xhigh | Bug hunting, multi-step visual reasoning |

When the parameter is omitted, the tool uses the configured default (off unless you change it). The calling model should choose a level based on task complexity, much as it chooses between compress: true and compress: false. See the thinking level map section of models.md for per-model tuning.

Important: For non-OpenAI vision models (Qwen, llama.cpp, DeepSeek, etc.), you must set compat.thinkingFormat in models.json so the tool sends the correct parameter. Without it, the tool defaults to reasoning_effort (OpenAI format), which your provider may reject.

{
  "id": "qwen3.5",
  "reasoning": true,
  "input": ["text", "image"],
  "compat": {
    "thinkingFormat": "qwen"
  }
}

Supported formats:

| Format | API parameter sent | Use case |
|---|---|---|
| (default, no compat) | reasoning_effort | OpenAI, any OpenAI-compatible proxy |
| qwen | enable_thinking | Qwen via llama.cpp, vLLM, Ollama |
| qwen-chat-template | chat_template_kwargs.enable_thinking | llama-server with Qwen chat template |
| deepseek | reasoning: { effort } | DeepSeek API |
| openrouter | reasoning: { effort } | OpenRouter |
| together | reasoning: { enabled: boolean } + reasoning_effort | Together AI |
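
To make the table concrete, here is a minimal sketch of how a reasoning level might be turned into request fields for each format. The function name `reasoningFields` is hypothetical, and treating every level other than "off" as "thinking enabled" is an assumption; this README does not say how the tool encodes each level for boolean-style providers.

```typescript
type ReasoningLevel = "off" | "minimal" | "low" | "medium" | "high" | "xhigh";
type ThinkingFormat =
  | "qwen"
  | "qwen-chat-template"
  | "deepseek"
  | "openrouter"
  | "together";

// Hypothetical helper: returns the extra request-body fields for one call.
// Assumption: any level except "off" means "thinking enabled".
function reasoningFields(
  level: ReasoningLevel,
  format?: ThinkingFormat,
): Record<string, unknown> {
  const enabled = level !== "off";
  switch (format) {
    case "qwen":
      return { enable_thinking: enabled };
    case "qwen-chat-template":
      return { chat_template_kwargs: { enable_thinking: enabled } };
    case "deepseek":
    case "openrouter":
      return { reasoning: { effort: level } };
    case "together":
      return { reasoning: { enabled }, reasoning_effort: level };
    default:
      // No compat.thinkingFormat configured: OpenAI-style parameter.
      return { reasoning_effort: level };
  }
}

const qwenFields = reasoningFields("high", "qwen");
const defaultFields = reasoningFields("high");
```

The default branch is the case the warning above is about: a provider that does not accept reasoning_effort needs an explicit compat.thinkingFormat.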

Additionally, thinkingLevelMap in models.json maps Pi's level names to provider-specific values. Use it when a provider expects non-standard level strings (e.g., Kimi K2.6 uses "none" instead of "off"):

{
  "id": "Kimi-K2.6",
  "reasoning": true,
  "input": ["text", "image"],
  "thinkingLevelMap": {
    "off": "none",
    "xhigh": null
  }
}
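
A lookup against such a map might look like the sketch below. The helper name `mapLevel` is hypothetical, and two behaviors are assumptions rather than documented facts: levels without an entry pass through unchanged, and a null entry marks a level the model does not support.

```typescript
type ReasoningLevel = "off" | "minimal" | "low" | "medium" | "high" | "xhigh";
type ThinkingLevelMap = Partial<Record<ReasoningLevel, string | null>>;

// Returns the provider-specific string for a level, or null when the map
// marks the level as unavailable. Both the pass-through for missing entries
// and the meaning of null are assumptions.
function mapLevel(level: ReasoningLevel, map: ThinkingLevelMap = {}): string | null {
  const mapped = map[level];
  return mapped === undefined ? level : mapped;
}

const kimiMap: ThinkingLevelMap = { off: "none", xhigh: null };

const offValue = mapLevel("off", kimiMap);     // "none"
const highValue = mapLevel("high", kimiMap);   // "high" (no entry, passed through)
const xhighValue = mapLevel("xhigh", kimiMap); // null
```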

Set the default reasoning level via:

/vision config reasoning-effort medium
# or via env var:
export PI_VISION_REASONING_EFFORT=medium

Installation

Via npm (recommended)

pi install npm:pi-vision-tool

This is the primary installation method and the way it's listed in the Pi package gallery.

Via git

pi install git:github.com/xezpeleta/pi-vision-tool

Via local path

pi install /path/to/pi-vision-tool

Quick test (no install)

pi -e /path/to/pi-vision-tool

Configuration

1. Add a vision model to ~/.pi/agent/models.json

{
  "providers": {
    "my-vision-provider": {
      "baseUrl": "https://your-llm-server/v1",
      "apiKey": "$VISION_API_KEY",
      "api": "openai-completions",
      "compat": {
        "supportsDeveloperRole": false,
        "supportsReasoningEffort": false
      },
      "models": [
        {
          "id": "my-vision-model",
          "reasoning": true,
          "input": ["text", "image"]
        }
      ]
    }
  }
}

The input: ["text", "image"] field is required — it tells Pi the model supports images.

2. Set the API key in ~/.pi/agent/auth.json

{
  "my-vision-provider": {
    "type": "api_key",
    "key": "sk-your-key-here"
  }
}

3. Configure the vision model

Recommended: Use the /vision command (persistent)

In any Pi session with the extension loaded:

/vision config provider my-vision-provider
/vision config model my-vision-model

Settings are saved to ~/.pi/agent/vision-tool.json and persist across all sessions. Changes take effect immediately — no /reload or restart needed.

Run /vision with no arguments to see current configuration.

Enable / disable

/vision on
/vision off

Running /vision off disables the tool entirely: the 👁 indicator disappears from the footer and any describe_image call returns an error. Use /vision on to re-enable it. The toggle is persisted across sessions.

Legacy: Environment variables

export PI_VISION_PROVIDER=my-vision-provider
export PI_VISION_MODEL=my-vision-model

Env vars work but must be set before starting Pi and don't persist between sessions. When a config file exists, it takes priority over env vars.
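
The precedence rule can be sketched as a small resolver. Everything here is illustrative: the function and interface names are hypothetical, the key names inside vision-tool.json are assumed, and the per-key fallback to env vars is one possible reading of "takes priority" (the tool may instead ignore env vars entirely once a config file exists).

```typescript
interface VisionSettings {
  provider: string | undefined;
  model: string | undefined;
  reasoningEffort: string;
}

// Hypothetical resolver. `fileConfig` stands for the parsed contents of
// ~/.pi/agent/vision-tool.json (key names assumed) and `env` for the
// process environment. File values win; env vars fill the gaps.
function resolveSettings(
  fileConfig: Partial<VisionSettings> | undefined,
  env: Record<string, string | undefined>,
): VisionSettings {
  return {
    provider: fileConfig?.provider ?? env["PI_VISION_PROVIDER"],
    model: fileConfig?.model ?? env["PI_VISION_MODEL"],
    reasoningEffort:
      fileConfig?.reasoningEffort ?? env["PI_VISION_REASONING_EFFORT"] ?? "off",
  };
}

const fromEnvOnly = resolveSettings(undefined, {
  PI_VISION_PROVIDER: "env-provider",
  PI_VISION_MODEL: "env-model",
});
const fileWins = resolveSettings(
  { provider: "file-provider" },
  { PI_VISION_PROVIDER: "env-provider", PI_VISION_MODEL: "env-model" },
);
```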

4. (Optional) Install sharp for image compression

npm install sharp

If sharp is available, images are automatically compressed before sending:

  • Downscaled to 1568px max dimension (screenshots, high-res photos)
  • Alpha channel stripped (RGBA → RGB)
  • Lossless PNG converted to JPEG (quality 85)

This reduces payload size ~4x and speeds up responses significantly. Without sharp, images are sent as raw bytes.

Compression controls

| Env var | Default | Description |
|---|---|---|
| PI_VISION_MAX_DIM | 1568 | Max width/height in pixels before downscaling |
| PI_VISION_JPEG_QUALITY | 85 | JPEG quality (1-100) for converted images |

The calling model controls per-call compression via the compress parameter. Set compress: false when pixel-perfect accuracy is needed (e.g., reading coordinates or detecting small UI elements).
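
For a sense of what the max-dimension limit does to an image, the arithmetic can be sketched as below. This shows only the size calculation, not the package's actual sharp call; the function name `fitWithin` and the rounding to the nearest pixel are assumptions.

```typescript
// Scale (width, height) so the longer side is at most maxDim while keeping
// the aspect ratio. Images already within the limit are left unchanged.
// Rounding to the nearest pixel is an assumption.
function fitWithin(
  width: number,
  height: number,
  maxDim = 1568,
): { width: number; height: number } {
  const longest = Math.max(width, height);
  if (longest <= maxDim) return { width, height };
  const scale = maxDim / longest;
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}

const wideShot = fitWithin(3136, 1568); // { width: 1568, height: 784 }
const smallShot = fitWithin(800, 600);  // unchanged: { width: 800, height: 600 }
```

Halving both sides quarters the pixel count, which is why small text or fine UI detail can be lost and why compress: false exists.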

Usage

Once installed, any model in your session will see the describe_image tool. Just reference an image in your prompt and the model will call it automatically.

Example prompts

| What you need | How to ask |
|---|---|
| Description | "Describe everything visible in this screenshot" |
| Pixel coordinates | "Give [x,y,w,h] bounding boxes for all buttons" |
| Text extraction | "Read all visible text, preserving structure" |
| Error analysis | "What error is shown in this terminal screenshot?" |
| UI inspection | "List all interactive elements and their states" |
| Color values | "What hex color is the header bar?" |
| Layout analysis | "Describe the page layout: sidebar, main content, etc." |
| Comparison | "Compare these two screenshots — what changed?" |

For complex analysis, the calling model can set reasoning: "high":

{
  "image_path": "/tmp/architecture.png",
  "prompt": "Analyze this system architecture diagram in detail",
  "compress": true,
  "reasoning": "high"
}

Image formats

  • File path: /tmp/screenshot.png, ~/Desktop/photo.jpg
  • Data URL: data:image/png;base64,iVBORw0KGgo...
  • Raw base64: A base64-encoded string over 100 characters

Supported formats: PNG, JPEG, GIF, WebP, BMP.
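
One way to tell the three input forms apart is sketched below. The function name, the regular expressions, and the order of the checks are assumptions; only the three forms and the 100-character threshold for raw base64 come from this README.

```typescript
type ImageSource =
  | { kind: "data-url"; mime: string; base64: string }
  | { kind: "base64"; base64: string }
  | { kind: "path"; path: string };

// Hypothetical classifier for the image_path argument.
function classifyImageSource(input: string): ImageSource {
  // Data URL: data:image/<subtype>;base64,<payload>
  const dataUrl = /^data:(image\/[a-z0-9.+-]+);base64,(.+)$/i.exec(input);
  if (dataUrl) {
    const [, mime = "", base64 = ""] = dataUrl;
    return { kind: "data-url", mime, base64 };
  }
  // Raw base64: longer than 100 characters, base64 alphabet only.
  if (input.length > 100 && /^[A-Za-z0-9+/]+={0,2}$/.test(input)) {
    return { kind: "base64", base64: input };
  }
  // Anything else is treated as a file path.
  return { kind: "path", path: input };
}

const fromPath = classifyImageSource("/tmp/screenshot.png");
const fromDataUrl = classifyImageSource("data:image/png;base64,iVBORw0KGgo");
```

A real path almost always contains a character outside the base64 alphabet (such as the dot before the extension), which is what keeps long paths from being mistaken for base64 in this sketch.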

How it works (technical)

The tool:

  1. Resolves the vision model from Pi's model registry using ctx.modelRegistry.find()
  2. Resolves the API key via ctx.modelRegistry.getApiKeyAndHeaders()
  3. Decodes the image (file path, data URL, or raw base64)
  4. Optionally compresses the image (resize, strip alpha, convert to JPEG) via sharp
  5. Makes a direct OpenAI-compatible /chat/completions call to the vision model's base URL
  6. Returns the vision model's text response as the tool result
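
Step 5 can be illustrated with the request body such a call would carry. This is a sketch of the standard OpenAI-compatible multimodal message shape, not the package's own code; the function and interface names are hypothetical, and the tool may add further fields (such as the reasoning parameters described earlier).

```typescript
interface ChatRequest {
  model: string;
  messages: Array<{
    role: "user";
    content: Array<
      | { type: "text"; text: string }
      | { type: "image_url"; image_url: { url: string } }
    >;
  }>;
}

// Build a chat-completions body that carries the prompt plus the image
// embedded as a base64 data URL.
function buildVisionRequest(
  model: string,
  prompt: string,
  mime: string,
  base64: string,
): ChatRequest {
  return {
    model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          { type: "image_url", image_url: { url: `data:${mime};base64,${base64}` } },
        ],
      },
    ],
  };
}

const requestBody = buildVisionRequest(
  "my-vision-model",
  "Describe everything visible in this screenshot",
  "image/jpeg",
  "AAAA", // placeholder payload, not a real image
);
```

The serialized body would be POSTed to the provider's baseUrl followed by /chat/completions, and the text of the first choice in the response becomes the tool result.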

License

MIT