# pi-describe-image
A pi extension that provides a `describe_image` tool to analyze and describe images using vision-capable AI models.

**When to use this:** This extension is primarily useful when your main conversation model doesn't have vision capabilities (e.g., older models, text-only APIs, or lightweight local models), but you still need to analyze images. You can keep using your preferred model for text/chat while delegating image descriptions to a dedicated vision model (Claude, GPT-4o, Gemini, etc.).
## Quick Start

```shell
# 1. Install the extension
cd ~/workbench/pi-describe-image
ln -s "$(pwd)" ~/.pi/extensions/pi-describe-image

# 2. Create configuration
cd ~/my-project
mkdir -p .pi
cat > .pi/describe-image.json << 'EOF'
{
  "provider": "anthropic",
  "model": "claude-sonnet-4-20250514"
}
EOF

# 3. Set your API key
export ANTHROPIC_API_KEY="your-api-key"

# 4. Reload pi and test
cd ~/my-project
pi /reload
# Then ask: "Describe this image: https://example.com/photo.jpg"
```

## Installation
### From local directory (development)

```shell
ln -s /path/to/pi-describe-image ~/.pi/extensions/pi-describe-image
```

### Via npm (when published)

```shell
npm install -g pi-describe-image
```

Then reload pi: `pi /reload`
## Configuration

Create a `describe-image.json` configuration file with just two fields: `provider` and `model`.
### Project-level config (recommended)

Create `.pi/describe-image.json` in your project root:

```json
{
  "provider": "anthropic",
  "model": "claude-sonnet-4-20250514"
}
```

### Global config
Create `~/.pi/describe-image.json`:

```json
{
  "provider": "openai",
  "model": "gpt-5.2"
}
```

**Config search order:**

1. `<cwd>/.pi/describe-image.json` (project-specific)
2. `~/.pi/describe-image.json` (global fallback)
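The lookup behaves like a first-match search over those two paths. The sketch below is illustrative only (the `find_config` function is a hypothetical stand-in, not the extension's actual code):

```shell
# Sketch of the config search order: project config wins,
# the global config is the fallback.
find_config() {
  for candidate in "$PWD/.pi/describe-image.json" "$HOME/.pi/describe-image.json"; do
    if [ -f "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  echo "No describe-image.json configuration found" >&2
  return 1
}
```

Run from a project directory containing `.pi/describe-image.json`, it prints the project-level path even when a global config also exists.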
## Usage

Once configured, the `describe_image` tool is available for the LLM to use. This is especially helpful when your main model lacks vision: the LLM can "see" images by calling out to a vision-capable model on demand.
```
User: What's in this image? https://example.com/photo.jpg
User: Read the text from this screenshot: ./screenshot.png
User: What colors are in this image? https://example.com/painting.jpg
```

The LLM can pass a custom `prompt` parameter to control how the image is described (general description, extract text, analyze style, etc.). If no prompt is given, it uses a default: "Describe this image in detail. What do you see?"
## Tool Parameters

- `path` - Local file path to an image
- `url` - URL of an image (either `path` or `url` is required)
- `prompt` - (Optional) Custom instructions for how to describe the image
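As an illustration, a tool call combining `url` with a custom `prompt` might carry arguments shaped like this (the exact wire format depends on pi's tool-calling protocol; this is just the parameter shape):

```json
{
  "url": "https://example.com/painting.jpg",
  "prompt": "List the dominant colors in this image."
}
```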
## Supported Providers & Models
Any model that supports image input can be used. Some popular options:
### Anthropic

- `claude-sonnet-4-20250514` (recommended)
- `claude-opus-4-20250514`
- `claude-sonnet-3-7-20250219`

### OpenAI

- `gpt-5.2`
- `gpt-5.3`
- `gpt-5.4`
- `gpt-4o`
### Google

- `gemini-2.5-pro`
- `gemini-2.0-flash`
### AWS Bedrock

- `anthropic.claude-sonnet-4-20250514-v1:0`
- `amazon.nova-pro-v1:0`
## Configuration Format

```jsonc
{
  "provider": "<provider-name>",  // Required: e.g., "anthropic", "openai"
  "model": "<model-id>"           // Required: specific model ID
}
```

## API Key Setup
The extension uses the same API key resolution as pi's core:

- OAuth credentials (if the provider supports `/login`)
- Environment variables (e.g., `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`)
- Configured API keys in `~/.pi/agent/config.json`
## Error Handling

Common errors and solutions:

- **"No describe-image.json configuration found"** - Create the config file
- **"Model not found"** - Check the provider/model ID in your config
- **"Model does not support image input"** - Use a vision-capable model
- **"No API key available"** - Configure your API key for the selected provider
## License
MIT
