@dtmirizzi/pi-openrouter-multimodal
v1.6.0
Published
OpenRouter multimodal tools for Pi — search, fetch, image gen, vision, video, PDF, TTS, STT
Downloads
475
Readme
@dtmirizzi/pi-openrouter-multimodal
OpenRouter multimodal tool integration for Pi. Provides 8 independently toggleable tools with per-modality model selection and session-persistent settings.
| Tool | What it does |
|------|-------------|
| web_search | Server-side web search with real-time results |
| web_fetch | Fetch page content from a URL (web, docs, PDFs) |
| image_generate | Text-to-image generation via OpenRouter chat completions |
| image_understand | Analyze images via vision models |
| video_understand | Analyze videos (YouTube links work with Gemini) |
| pdf_read | Extract and analyze PDF content |
| tts_speak | Text-to-speech via OpenRouter /audio/speech endpoint |
| stt_transcribe | Speech-to-text via OpenRouter /audio/transcriptions endpoint |
Install
pi install npm:@dtmirizzi/pi-openrouter-multimodalOr from a local checkout:
pi install /path/to/pi-openrouter-multimodalAPI Key
The extension resolves the OpenRouter API key from (priority order):
OPENROUTER_API_KEYenvironment variable- Pi model registry (provider
openrouter) ~/.pi/agent/models.jsonunderproviders.openrouter.apiKey
Commands
| Command | Description |
|---------|-------------|
| /web-tools | Toggle tools on/off and configure search/fetch engines |
| /web-models | Select models per modality (image, vision, video, PDF, TTS voice, STT) |
| /web-search | Toggle web_search and configure search engine |
| /web-fetch | Toggle web_fetch and configure fetch engine |
Each command opens an interactive overlay. Use ↑↓ to navigate, ←→ to
cycle values, Esc to close. Settings persist across sessions and survive
compaction, shutdown, and tree navigation.
/web-tools
Toggle each tool on/off and set search/fetch engine preferences. Also includes a verbose/compact status-bar display toggle.
/web-models
Select the model for each modality from a list fetched live from the OpenRouter API at startup. Falls back to a comprehensive built-in list if the API is unavailable.
Tools
web_search
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| query | string | required | Search query |
| engine | string | auto | auto, native, exa, firecrawl, parallel |
| max_results | integer | 5 | Results per search (1-25) |
| search_context_size | string | — | low (5K), medium (15K), high (30K) |
| allowed_domains | string[] | — | Only return results from these domains |
| excluded_domains | string[] | — | Exclude results from these domains |
web_fetch
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| url | string | required | URL to fetch content from |
| engine | string | auto | auto, native, exa, openrouter, firecrawl, parallel |
| max_content_tokens | integer | — | Max content length (approximate tokens) |
image_generate
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| prompt | string | required | Text prompt describing the image |
| model | string | state | Override the default model from /web-models |
Selected via /web-models. Models are fetched live from OpenRouter at startup;
fallback list includes Gemini Flash Image, GPT-5 Image, FLUX.2, Seedream,
Riverflow, Recraft, Grok Imagine, and more.
image_understand
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| url | string | required | Image URL or base64 data URL |
| prompt | string | Describe this image in detail | Analysis prompt |
| model | string | state | Override default from /web-models |
video_understand
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| url | string | required | Video URL (YouTube links work with Gemini) |
| prompt | string | Describe what happens in this video | Analysis prompt |
| model | string | state | Override default from /web-models |
pdf_read
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| url | string | required | URL of the PDF document |
| prompt | string | Summarize this document | Analysis prompt |
| model | string | state | Override default from /web-models |
| engine | string | cloudflare-ai | cloudflare-ai (free), mistral-ocr (scanned docs), or native |
tts_speak
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| text | string | required | Text to convert to speech |
| model | string | state | Override default from /web-models |
| voice | string | state | Override default from /web-models |
stt_transcribe
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| audio | string | required | Base64-encoded audio data |
| format | string | required | wav, mp3, flac, m4a, ogg, webm, aac |
| model | string | state | Override default from /web-models |
| language | string | — | ISO-639-1 language code (optional) |
How It Works
All tools proxy requests through OpenRouter's API:
- web_search / web_fetch — Chat completions with server tool definitions
(
openrouter:web_search/openrouter:web_fetch) - image_generate — Chat completions with
modalities: ["image", "text"]on the selected image generation model - image_understand / video_understand — Chat completions with
multimodal content blocks (
image_url,video_url) - pdf_read — Chat completions with file content block and
file-parserplugin - tts_speak — Direct call to
/api/v1/audio/speech - stt_transcribe — Direct call to
/api/v1/audio/transcriptions
Model Discovery
On startup, the extension fetches available models from
GET /api/v1/models?output_modalities=... and caches them for use in the
/web-models settings panel. If the API is unreachable, a comprehensive
set of fallback models is used.
Development
# Install dependencies
npm install
# Run tests
npm test # all tests
npm run test:unit # unit tests only
npm run test:integration # requires OPENROUTER_API_KEY
# Format and lint
npm run fmt # format all files
npm run lint # lint + fix all files
npm run check # format + lint + organize imports
npm run check:ci # strict CI check (format + lint, no writes)The repo uses Biome for formatting and linting. CI enforces both on every push and PR.
