gemini-multimodal

v1.0.3

Published

3 months ago

Gemini multimodal skill for Claude Code - video, PDF, image analysis & generation via browser cookies

nicopreme

claude claude-code skill gemini gemini-3-pro ai image-generation video-analysis

Gemini multimodal skill for Claude Code. Video, PDF, image analysis & generation via browser cookies - no API key required.

Installation

npx gemini-multimodal

This installs the skill to ~/.claude/skills/gemini/ and sets up the Python environment automatically.

Prerequisites:

Python 3.8+
Chrome logged into gemini.google.com
On macOS, allow Keychain access when prompted (first run)
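
The prerequisites above can be partly checked before the first run. This is a minimal sketch and not part of the package: the function name `check_prerequisites` is hypothetical, and it only verifies the Python version and the install location stated above. It cannot tell whether Chrome is logged into gemini.google.com.

```python
import sys
from pathlib import Path

# Install location stated in this README; adjust if your setup differs.
DEFAULT_SKILL_DIR = Path.home() / ".claude" / "skills" / "gemini"


def check_prerequisites(skill_dir=DEFAULT_SKILL_DIR, version=None):
    """Return a list of human-readable problems (empty if none were found)."""
    version = tuple(version or sys.version_info[:2])
    problems = []
    if version < (3, 8):
        problems.append("Python 3.8+ is required, found %d.%d" % version[:2])
    skill_dir = Path(skill_dir)
    if not skill_dir.is_dir():
        problems.append("Skill directory not found: %s" % skill_dir)
    elif not (skill_dir / "webapi").exists():
        problems.append("webapi wrapper missing in %s" % skill_dir)
    return problems
```

Calling `check_prerequisites()` with no arguments inspects the running interpreter and the default skill directory. An empty list means neither check failed.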

Features

| Capability | Description |
|------------|-------------|
| Text Queries | Complex reasoning with "Thinking with 3 Pro" mode |
| Video Analysis | Upload MP4 files for summarization, timestamps, insights |
| YouTube Analysis | Analyze videos via URL (uses YouTube extension) |
| Document Analysis | PDF and document Q&A |
| Image Analysis | Describe, OCR, analyze uploaded images |
| Image Generation | Create images from text prompts |
| Image Editing | Modify images with natural language |
| Google Search | Automatic grounding for current information |

How It Works

User Request → webapi CLI → gemini-webapi → Gemini Web (cookies) → Response

Authentication reuses the cookies from your Chrome session, so no API key is needed. You only need to be logged into gemini.google.com in Chrome.
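
The first hop of that pipeline (request to `webapi` CLI) can also be driven from a script. This is a minimal sketch, assuming `webapi` is reachable on your PATH; the helper name `ask` and its `command` parameter are hypothetical and exist only so the executable can be swapped out.

```python
import subprocess


def ask(prompt, *extra_args, command=("webapi",)):
    """Run the CLI with a prompt and return its standard output as text.

    `command` is the executable plus any leading arguments. The default,
    `webapi` on PATH, is an assumption about your installation.
    """
    argv = [*command, prompt, *extra_args]
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the CLI's own error text instead of a bare exit code.
        raise RuntimeError(result.stderr.strip() or "exit code %d" % result.returncode)
    return result.stdout.strip()
```

For example, `ask("Solve step by step: What is 15% of 240?", "--show-thoughts")` would pass the documented `--show-thoughts` flag through unchanged.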

Usage

Text Queries

# Complex reasoning (Thinking with 3 Pro)
webapi "Explain the implications of quantum computing for cryptography"

# Show thinking process
webapi "Solve step by step: What is 15% of 240?" --show-thoughts

File Analysis

# Video analysis
webapi "Summarize this video with timestamps" --file meeting.mp4

# Document analysis
webapi "Extract key findings" --file research.pdf

# Image analysis
webapi "What's in this image?" --file photo.png
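
When analyzing several files, the prompt can be picked by file type. The sketch below builds the argument list only and does not run anything. The helper `file_command` is hypothetical, the default prompts are copied from the examples above, and the extension table is illustrative rather than the CLI's full list of supported types.

```python
from pathlib import Path

# Default prompts taken from the examples above, keyed by file extension.
DEFAULT_PROMPTS = {
    ".mp4": "Summarize this video with timestamps",
    ".pdf": "Extract key findings",
    ".png": "What's in this image?",
    ".jpg": "What's in this image?",
}


def file_command(path, prompt=None):
    """Build the argument list for analyzing one file with --file."""
    suffix = Path(path).suffix.lower()
    if prompt is None:
        if suffix not in DEFAULT_PROMPTS:
            raise ValueError("no default prompt for %r files" % suffix)
        prompt = DEFAULT_PROMPTS[suffix]
    return ["webapi", prompt, "--file", str(path)]
```

The returned list can be handed to `subprocess.run` as is, which avoids shell quoting problems with prompts that contain apostrophes or spaces.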

YouTube Analysis

webapi "What are the main points discussed?" --youtube "https://youtube.com/watch?v=VIDEO_ID"

Requires the YouTube extension to be enabled in gemini.google.com settings.

Image Generation

# Generate image
webapi "A cyberpunk cityscape at night" --generate-image city.png

# With aspect ratio
webapi "Mountain landscape" --generate-image landscape.png --aspect 16:9

# Edit existing image
webapi "Make the sky purple" --edit photo.jpg --output edited.png
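
A typo in `--aspect` is easy to catch before the request is sent. The following sketch builds the argument lists for generation and editing. The helper names are hypothetical, and the set of ratios is the one listed under CLI Options in this README; whether the CLI accepts other ratios is not stated here.

```python
# Ratios listed for --aspect in the CLI Options table of this README.
DOCUMENTED_ASPECTS = ("16:9", "1:1", "4:3", "3:4")


def generate_image_command(prompt, output, aspect=None):
    """Build the argument list for --generate-image, validating --aspect."""
    argv = ["webapi", prompt, "--generate-image", str(output)]
    if aspect is not None:
        if aspect not in DOCUMENTED_ASPECTS:
            raise ValueError(
                "aspect %r is not one of %s" % (aspect, ", ".join(DOCUMENTED_ASPECTS))
            )
        argv += ["--aspect", aspect]
    return argv


def edit_image_command(prompt, image, output):
    """Build the argument list for editing an image with --edit and --output."""
    return ["webapi", prompt, "--edit", str(image), "--output", str(output)]
```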

Current Information (Grounded)

webapi "What is the latest AI news this week? Search the web."

Google Search grounding is automatic when queries need current information.

CLI Options

| Option | Description |
|--------|-------------|
| --file, -f FILE | Input file (MP4, PDF, PNG, JPG, etc.) |
| --youtube URL | YouTube video URL |
| --generate-image FILE | Generate and save image |
| --edit IMAGE | Edit image (with --output) |
| --output, -o FILE | Output path for images |
| --aspect RATIO | Aspect ratio (16:9, 1:1, 4:3, 3:4) |
| --show-thoughts | Display thinking process |
| --model MODEL | Model to use (default: gemini-3.0-pro) |
| --json | JSON output |
| --help, -h | Show help |
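
The `--json` option makes the CLI easier to consume from other programs. The sketch below is an assumption-laden example: it assumes `--json` prints a single JSON document to standard output, and it makes no claim about which fields that document contains. The helper name `ask_json` and its `command` parameter are hypothetical.

```python
import json
import subprocess


def ask_json(prompt, model=None, command=("webapi",)):
    """Run the CLI with --json and return the parsed output.

    Assumes stdout holds exactly one JSON document; the schema is not
    documented in this README, so inspect the result before relying on it.
    """
    argv = [*command, prompt, "--json"]
    if model is not None:
        argv += ["--model", model]
    # check=True raises CalledProcessError on a non-zero exit status.
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

Omitting `model` leaves the choice to the CLI, which the table above documents as defaulting to gemini-3.0-pro.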

File Structure

gemini/
├── SKILL.md           # Claude Code skill definition
├── README.md          # This file
├── requirements.txt   # Python dependencies
├── .venv/             # Virtual environment
├── webapi             # Bash wrapper
└── webapi.py          # Python implementation

Troubleshooting

"Error initializing client"

Log into gemini.google.com in Chrome
On macOS, allow Keychain access when prompted

"No images generated"

Rephrase the prompt; some content is filtered
Be more explicit about what you want

"Module not found"

Activate venv: source .venv/bin/activate
Install deps: pip install -r requirements.txt

"YouTube not working"

Enable YouTube extension in gemini.google.com settings
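
The entries above can be turned into a small lookup for scripts that wrap the CLI. This is a sketch only: `suggest_fixes` is a hypothetical helper, matching is a plain case-insensitive substring test on the symptom headings above, and the exact error text printed by the CLI may differ. `ModuleNotFoundError` is added as a pattern because that is the name of the exception Python raises for a missing module.

```python
# (patterns, hints) pairs; hints are copied from the Troubleshooting entries above.
KNOWN_ISSUES = [
    (("error initializing client",), [
        "Log into gemini.google.com in Chrome",
        "On macOS, allow Keychain access when prompted",
    ]),
    (("no images generated",), [
        "Rephrase the prompt; some content is filtered",
        "Be more explicit about what you want",
    ]),
    (("module not found", "modulenotfounderror"), [
        "Activate venv: source .venv/bin/activate",
        "Install deps: pip install -r requirements.txt",
    ]),
    (("youtube not working",), [
        "Enable YouTube extension in gemini.google.com settings",
    ]),
]


def suggest_fixes(message):
    """Return the hints whose symptom pattern appears in the message."""
    text = message.lower()
    fixes = []
    for patterns, hints in KNOWN_ISSUES:
        if any(pattern in text for pattern in patterns):
            fixes.extend(hints)
    return fixes
```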

License

MIT
