vision-generator-mcp

v0.1.0

Published

a month ago

Local MCP server for image and video generation via OpenAI-compatible providers


rayss-dev

mcp model-context-protocol image-generation video-generation openai-compatible

🎨 Vision Generator MCP

Local-first MCP server for image & video generation through OpenAI-compatible providers

Discover models automatically · generate media locally · save outputs explicitly · avoid context bloat

🚀 Why this exists

Most image/video APIs force clients to understand:

different endpoints
inconsistent payloads
mixed sync/async behavior
provider-specific output handling
confusing model lists with text-only models mixed in

Vision Generator MCP gives you one local MCP layer that:

auto-discovers models from the provider
keeps only image/video-capable models and hides the rest
normalizes generation flows
writes outputs to a folder you choose
keeps chat context clean by returning file paths instead of large base64 image payloads

✨ Highlights

| Feature | What you get |
|---|---|
| Auto model discovery | Uses GET /models at runtime |
| Vision-only filtering | Hides irrelevant text-only models via buildVisionModelRegistry() |
| Required output folder | Every generated asset has a clear final_path |
| Async video flow | Submit + poll video jobs cleanly |
| Local-first workflow | Great for desktop / VS Code / Claude workflows |
| Configurable timeouts | Provider and download timeouts are configurable from MCP settings |
| Modular architecture | Clean separation across src/providers/, src/core/, src/tools/, src/validation/ |

🧭 How it works

┌───────────────────────────────────────────────┐
│ MCP Client / Agent                            │
│ Claude / VS Code / Desktop / Local workflow   │
└───────────────────────┬───────────────────────┘
                        │
                        │ MCP tools
                        ▼
┌───────────────────────────────────────────────┐
│ Vision Generator MCP                          │
│-----------------------------------------------│
│ Tool handlers                                 │
│ Validation                                    │
│ Vision service                                │
│ Model discovery                               │
│ Capability filtering                          │
│ Output publishing                             │
└───────────────────────┬───────────────────────┘
                        │
                        │ Adapter abstraction
                        ▼
┌───────────────────────────────────────────────┐
│ OpenAI-compatible provider adapter            │
└───────────────────────┬───────────────────────┘
                        │
                        │ HTTP
                        ▼
┌───────────────────────────────────────────────┐
│ Provider API                                  │
│ /models                                       │
│ /images/generations                           │
│ /images/edits                                 │
│ /videos/generations                           │
└───────────────────────────────────────────────┘
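
The bottom two layers of the diagram can be pictured as a small routing table from operation names to provider paths. This is a minimal sketch, not the adapter's actual code: the operation names mirror the list_models example output and the paths come from the diagram, but the pairing between them (in particular sending image_to_video to /videos/generations) and the names ENDPOINTS and endpointUrl are assumptions made for illustration.

```typescript
// Hypothetical sketch: operation-to-endpoint routing for an OpenAI-compatible provider.
// The pairing of operations to paths is an assumption, not taken from the project source.
type Operation =
  | "image_generation"
  | "image_editing"
  | "text_to_video"
  | "image_to_video";

const ENDPOINTS: Record<Operation, string> = {
  image_generation: "/images/generations",
  image_editing: "/images/edits",
  text_to_video: "/videos/generations",
  image_to_video: "/videos/generations", // assumed: same endpoint, image sent as input
};

// Join the configured base URL and the endpoint path without doubling slashes.
function endpointUrl(baseUrl: string, operation: Operation): string {
  return baseUrl.replace(/\/+$/, "") + ENDPOINTS[operation];
}
```

Keeping the routing in one table means a provider with different paths would only need a different table, which is roughly what an adapter abstraction is for.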

🧱 Project structure

.
├─ README.md
├─ package.json
├─ tsconfig.json
├─ plans/
│  └─ mcp-image-video-architecture-plan.md
├─ src/
│  ├─ index.ts
│  ├─ config/
│  │  └─ providers.ts
│  ├─ core/
│  │  ├─ errors.ts
│  │  ├─ file-output-publisher.ts
│  │  ├─ model-discovery.ts
│  │  └─ vision-service.ts
│  ├─ providers/
│  │  ├─ base-provider.ts
│  │  ├─ openai-compatible.adapter.ts
│  │  └─ provider-factory.ts
│  ├─ tools/
│  │  ├─ animate-image.ts
│  │  ├─ edit-image.ts
│  │  ├─ generate-image.ts
│  │  ├─ generate-video.ts
│  │  ├─ get-job-status.ts
│  │  ├─ get-model-capabilities.ts
│  │  └─ list-models.ts
│  ├─ types/
│  │  └─ contracts.ts
│  ├─ utils/
│  │  ├─ mime.ts
│  │  └─ path.ts
│  └─ validation/
│     ├─ common.ts
│     ├─ image.ts
│     ├─ job.ts
│     ├─ output.ts
│     ├─ schemas.ts
│     └─ video.ts
└─ outputs/

✅ Currently supported workflows

Image

text-to-image via generateImage()
image editing via editImage()

Video

text-to-video via generateVideo()
image-to-video via animateImage()
async polling via getJobStatus()

Discovery

runtime model discovery via listModels()
vision filtering via buildVisionModelRegistry()
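
The README does not describe how buildVisionModelRegistry() decides which models are vision-capable. As a rough illustration only, the sketch below assumes a keyword heuristic on the model id; classifyModel, filterVisionModels, and the keyword patterns are hypothetical and may not match the real implementation.

```typescript
// Hypothetical sketch of vision-only filtering; not the real buildVisionModelRegistry().
interface ModelOperations {
  image_generation: boolean;
  image_editing: boolean;
  image_variation: boolean;
  text_to_video: boolean;
  image_to_video: boolean;
}

interface VisionModel {
  id: string;
  operations: ModelOperations;
}

// Assumed heuristic: infer capabilities from substrings in the model id.
function classifyModel(id: string): ModelOperations {
  const name = id.toLowerCase();
  const isImage = /image/.test(name);
  const isVideo = /video/.test(name);
  return {
    image_generation: isImage,
    image_editing: isImage,
    image_variation: false,
    text_to_video: isVideo,
    image_to_video: isVideo,
  };
}

// Keep only models that have at least one image or video operation enabled.
function filterVisionModels(ids: string[]): VisionModel[] {
  return ids
    .map((id) => ({ id, operations: classifyModel(id) }))
    .filter((m) => m.operations.image_generation || m.operations.text_to_video);
}
```

With this heuristic, a list such as gpt-image-2, gpt-4o, your-video-model would keep the first and last entries and drop the text-only one.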

📦 Installation

Requirements

Node.js 18+
npm
an OpenAI-compatible provider endpoint

Install from npm

npm install -g vision-generator-mcp

Run installed binary

vision-generator-mcp

Local development install

npm install

Type-check

npm run check

Build

npm run build

Publishable package notes

the CLI entry point is exposed through the bin field in package.json
the published package contents are restricted by the files field
the prepare script runs the build automatically before publishing and when installing from source

⚙️ MCP settings

The server reads its configuration from environment variables supplied through the env block of the MCP settings:

PROVIDER_BASE_URL
PROVIDER_API_KEY
PROVIDER_TIMEOUT_MS
DOWNLOAD_TIMEOUT_MS

Example configuration:

{
  "mcpServers": {
    "vision-generator": {
      "command": "node",
      "args": [
        "d:/All_project/own/AI_Coder/Native Tools/vision-generator/build/index.js"
      ],
      "disabled": false,
      "timeout": 600,
      "alwaysAllow": [],
      "disabledTools": [],
      "env": {
        "PROVIDER_BASE_URL": "https://ai.rayzs.qzz.io/v1",
        "PROVIDER_API_KEY": "your-api-key-1",
        "PROVIDER_TIMEOUT_MS": "300000",
        "DOWNLOAD_TIMEOUT_MS": "300000"
      }
    }
  }
}

Timeout layers

| Timeout | Scope |
|---|---|
| timeout in MCP settings | How long the MCP host waits for the server tool call |
| PROVIDER_TIMEOUT_MS | Timeout for provider API requests |
| DOWNLOAD_TIMEOUT_MS | Timeout for binary asset download |

The provider timeout settings are loaded in loadProviderConfig() and applied in OpenAICompatibleAdapter.
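
To make the two environment-level timeouts concrete, here is a minimal sketch of how such values could be parsed. It is not the project's loadProviderConfig(): the function names, the ProviderConfig shape, the 300000 ms fallback (borrowed from the example settings), and the choice to treat both the base URL and the API key as mandatory are all assumptions.

```typescript
// Hypothetical sketch; the real loadProviderConfig() may behave differently.
interface ProviderConfig {
  baseUrl: string;
  apiKey: string;
  providerTimeoutMs: number;
  downloadTimeoutMs: number;
}

const DEFAULT_TIMEOUT_MS = 300000; // assumed default, matching the example settings

// Parse a positive integer number of milliseconds, falling back when unset or invalid.
function parseTimeout(raw: string | undefined, fallback: number): number {
  const value = Number(raw);
  return raw !== undefined && Number.isInteger(value) && value > 0 ? value : fallback;
}

// Takes the environment as a plain record so it can be exercised without a running host.
function readProviderEnv(env: Record<string, string | undefined>): ProviderConfig {
  const baseUrl = env["PROVIDER_BASE_URL"];
  const apiKey = env["PROVIDER_API_KEY"];
  if (!baseUrl || !apiKey) {
    throw new Error("PROVIDER_BASE_URL and PROVIDER_API_KEY must be set");
  }
  return {
    baseUrl,
    apiKey,
    providerTimeoutMs: parseTimeout(env["PROVIDER_TIMEOUT_MS"], DEFAULT_TIMEOUT_MS),
    downloadTimeoutMs: parseTimeout(env["DOWNLOAD_TIMEOUT_MS"], DEFAULT_TIMEOUT_MS),
  };
}
```

Because MCP hosts pass env values as strings, parsing them defensively avoids a malformed setting silently turning into a zero or NaN timeout.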

📁 Output strategy

output.directory is required for image and video tools.

Recommended folders:

outputs/
d:/All_project/own/AI_Coder/Native Tools/vision-generator/outputs

Why this is better:

every result has a clear location
no hidden temporary-file behavior
no base64 image payloads flooding the chat context
outputs land in an ordinary folder, which suits local-first, repository-based workflows
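
The final_path values shown in the tool reference follow a prefix_timestamp.extension pattern. The sketch below reproduces that pattern under stated assumptions: buildOutputPath and the EXTENSIONS table are hypothetical names, and the real file-output-publisher may pick extensions or separators differently.

```typescript
// Hypothetical sketch of output naming; helper names are illustrative only.
const EXTENSIONS: Record<string, string> = {
  "image/png": "png",
  "image/jpeg": "jpg",
  "image/webp": "webp",
  "video/mp4": "mp4",
};

// Build "<directory>/<prefix>_<ISO timestamp with ':' and '.' replaced by '-'>.<ext>".
function buildOutputPath(directory: string, prefix: string, mimeType: string, now: Date): string {
  const ext = EXTENSIONS[mimeType] ?? "bin"; // assumed fallback for unknown MIME types
  const stamp = now.toISOString().replace(/[:.]/g, "-");
  const dir = directory.replace(/[\\/]+$/, ""); // drop trailing slashes
  return `${dir}/${prefix}_${stamp}.${ext}`;
}
```

Replacing the colons and dots in the timestamp keeps the file name valid on Windows, where a colon is not allowed in file names.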

🛠️ Tool reference

`list_models`

Discover image/video-capable models.

Example output

{
  "provider": "https://ai.rayzs.qzz.io/v1",
  "models": [
    {
      "id": "gpt-image-2",
      "operations": {
        "image_generation": true,
        "image_editing": true,
        "image_variation": false,
        "text_to_video": false,
        "image_to_video": false
      }
    }
  ]
}

`get_model_capabilities`

Inspect a discovered model.

Example input

{
  "model": "gpt-image-2"
}

`generate_image`

Generate an image and write it to your chosen folder.

Example input

{
  "model": "gpt-image-2",
  "prompt": "A futuristic Jakarta skyline at sunset, cinematic lighting",
  "aspect_ratio": "16:9",
  "resolution": "1536x1024",
  "output": {
    "directory": "d:/All_project/own/AI_Coder/Native Tools/vision-generator/outputs",
    "filename_prefix": "jakarta-future-city",
    "create_directory": true
  }
}

Example output

{
  "status": "succeeded",
  "provider": "https://ai.rayzs.qzz.io/v1",
  "model": "gpt-image-2",
  "operation": "image_generation",
  "outputs": [
    {
      "type": "image",
      "mime_type": "image/png",
      "final_path": "d:/All_project/own/AI_Coder/Native Tools/vision-generator/outputs/jakarta-future-city_2026-05-19T01-00-00-000Z.png",
      "width": 1536,
      "height": 1024
    }
  ]
}
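
A generate_image input like the one in this section can be checked before any provider call is made. The following is a minimal validation sketch, not the schemas in src/validation/: the interface names, which fields are optional, and the exact format checks are assumptions.

```typescript
// Hypothetical validation sketch; the real schemas in src/validation/ may differ.
interface OutputOptions {
  directory: string;
  filename_prefix?: string;
  create_directory?: boolean;
}

interface GenerateImageInput {
  model: string;
  prompt: string;
  aspect_ratio?: string; // assumed optional
  resolution?: string; // assumed optional
  output: OutputOptions;
}

// Return a list of human-readable problems; an empty list means the input passed.
function validateGenerateImageInput(input: GenerateImageInput): string[] {
  const problems: string[] = [];
  if (input.model.trim() === "") problems.push("model must not be empty");
  if (input.prompt.trim() === "") problems.push("prompt must not be empty");
  if (input.output.directory.trim() === "") problems.push("output.directory is required");
  if (input.aspect_ratio !== undefined && !/^\d+:\d+$/.test(input.aspect_ratio)) {
    problems.push("aspect_ratio must look like 16:9");
  }
  if (input.resolution !== undefined && !/^\d+x\d+$/.test(input.resolution)) {
    problems.push("resolution must look like 1536x1024");
  }
  return problems;
}
```

Collecting every problem instead of throwing on the first one lets an agent correct all fields in a single retry.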

`edit_image`

Edit a local image and save the output.

Example input

{
  "model": "gpt-image-2",
  "prompt": "Replace the background with a neon cyberpunk street",
  "image_path": "d:/assets/input.png",
  "output": {
    "directory": "d:/All_project/own/AI_Coder/Native Tools/vision-generator/outputs",
    "filename_prefix": "edited-scene",
    "create_directory": true
  }
}

`generate_video`

Submit an async text-to-video job.

Example input

{
  "model": "your-video-model",
  "prompt": "A cinematic aerial shot flying over a futuristic city",
  "duration_seconds": 5,
  "fps": 24,
  "output": {
    "directory": "d:/All_project/own/AI_Coder/Native Tools/vision-generator/outputs",
    "filename_prefix": "future-city-video",
    "create_directory": true
  }
}

Example submit result

{
  "status": "submitted",
  "provider": "https://ai.rayzs.qzz.io/v1",
  "model": "your-video-model",
  "operation": "text_to_video",
  "job_id": "video_...",
  "provider_job_id": "provider_...",
  "outputs": []
}

`animate_image`

Submit an async image-to-video job.

`get_job_status`

Poll video job status until the final file is downloaded and written to your chosen folder.
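
The submit-then-poll flow can be sketched as a loop that keeps checking until the job reaches a terminal state. This is an illustrative sketch only: the README shows the statuses "submitted" and "succeeded", while "running" and "failed", the backoff values, and the names pollJob, nextDelayMs, and isTerminal are assumptions. The check callback stands in for whatever issues the get_job_status call.

```typescript
// Hypothetical polling sketch; "running" and "failed" are assumed status names.
type JobStatus = "submitted" | "running" | "succeeded" | "failed";

interface JobSnapshot {
  status: JobStatus;
  final_path?: string;
}

function isTerminal(status: JobStatus): boolean {
  return status === "succeeded" || status === "failed";
}

// Exponential backoff: baseMs, 2x, 4x, ... capped at maxMs.
function nextDelayMs(attempt: number, baseMs = 2000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Call check() until the job is terminal or maxAttempts checks have been made.
async function pollJob(
  check: () => Promise<JobSnapshot>,
  maxAttempts = 20,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<JobSnapshot> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const snapshot = await check();
    if (isTerminal(snapshot.status)) return snapshot;
    await sleep(nextDelayMs(attempt));
  }
  throw new Error(`job did not finish after ${maxAttempts} checks`);
}
```

The sleep parameter is injectable so the loop can be exercised without real waiting; the backoff cap keeps long video jobs from being polled more often than necessary.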

🧩 Implementation map

| Concern | Entry point |
|---|---|
| Composition root | src/index.ts |
| Main orchestration | src/core/vision-service.ts |
| Provider contract | src/providers/base-provider.ts |
| OpenAI-compatible provider | src/providers/openai-compatible.adapter.ts |
| Adapter selection | src/providers/provider-factory.ts |
| Model discovery | src/core/model-discovery.ts |
| File output | src/core/file-output-publisher.ts |
| Validation layer | src/validation/ |
| Tool handlers | src/tools/ |
| Utilities | src/utils/ |

🔮 Future provider list

Easiest next additions

more OpenAI-compatible gateways
provider-specific quirks layer

Best next adapters

Later / higher-effort adapters

These are roadmap targets, not currently implemented files.

🧪 Development workflow

npm install
npm run check
npm run build

After changing MCP settings or rebuilding:

reload the MCP runtime / extension
start a fresh session if needed

✅ Project status

local MCP server implemented
OpenAI-compatible provider adapter implemented
modular structure aligned with the plan
explicit output directory required
configurable provider/download timeout support added
no image base64 context bloat
build verified
ready for runtime MCP usage after MCP reload
