antigone

v0.2.0

Published

9 days ago

Vision bridge for text-only AI models. Like Antigone guiding the blind Oedipus, it gives sight to models that can't see.


zjpiazza

vision ai mcp image-analysis text-only multimodal gemini llm model-context-protocol cli

About The Project

Many powerful AI models — DeepSeek, big-pickle, MiMo, MiniMax — are text-only. They can write code, reason through problems, and orchestrate complex workflows, but they can't see. When your agentic workflow involves screenshots, UI comparisons, or visual QA, these models hit a wall.

Antigone bridges that gap. It routes images through multimodal models (Gemini, Claude, GPT-4V) and returns structured text that any model can understand.

Why Antigone?

Multi-image comparison — Send a Figma export and a device screenshot in a single call, get back structured diffs
Provider-agnostic — Gemini (free), OpenRouter, Anthropic, or OpenAI — swap via config
Three interfaces — Use as a CLI, a library, or an MCP server
Structured output — JSON with typed fields, not just prose
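
As a rough illustration of why structured output matters to a text-only consumer, the sketch below parses a JSON response and checks that the fields the caller expects are present before passing it on. `parseStructured` is a hypothetical helper, not part of Antigone, and the response schema is not specified here, so the caller supplies the key names.

```typescript
// Hypothetical helper: not part of Antigone's API.
// Parses raw JSON text and verifies that caller-specified keys are present.
type JsonObject = Record<string, unknown>;

function parseStructured(raw: string, requiredKeys: string[]): JsonObject {
  const parsed: unknown = JSON.parse(raw); // throws SyntaxError on invalid JSON
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new TypeError("expected a JSON object");
  }
  const obj = parsed as JsonObject;
  // Collect every missing key so the error message is complete.
  const missing = requiredKeys.filter((key) => !(key in obj));
  if (missing.length > 0) {
    throw new Error(`missing fields: ${missing.join(", ")}`);
  }
  return obj;
}
```

For example, `parseStructured(raw, ["summary", "differences"])` would reject a response that lacks either key. Both key names are made up for illustration.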

Getting Started

Prerequisites

Node.js 18+
A vision provider API key (Gemini free tier is the default)

Installation

Get a free Gemini API key at aistudio.google.com
Install globally or use via npx
```
npm install -g antigone
```
Set your API key
```
export GEMINI_API_KEY=your-key-here
```
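
If you call Antigone from a script, it can help to fail early when the key is missing. The following is a minimal sketch of such a pre-flight check. `requireGeminiKey` is a hypothetical helper, and Antigone may well perform its own validation.

```typescript
// Hypothetical pre-flight check; not part of Antigone.
// `env` is passed in explicitly so the function is easy to test;
// in a Node.js script you would typically pass `process.env`.
function requireGeminiKey(env: Record<string, string | undefined>): string {
  const key = env["GEMINI_API_KEY"]?.trim();
  // Treat the placeholder from the install instructions as "not set".
  if (!key || key === "your-key-here") {
    throw new Error("GEMINI_API_KEY is not set; export it before running antigone");
  }
  return key;
}
```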

Usage

CLI

```
# Describe an image
npx antigone describe screenshot.png

# Compare two images (e.g., Figma design vs device screenshot)
npx antigone compare design.png screenshot.png

# Extract text from an image
npx antigone ocr receipt.png

# Check UI against platform guidelines
npx antigone check-ui app-screen.png --platform ios
```
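
When driving the CLI from another program, assembling the argument list in one place keeps the calls consistent. The sketch below builds an argv array for the four commands shown above. `buildArgs` is a hypothetical helper, and it assumes flags always take the `--name value` form seen in `--platform ios`.

```typescript
// Hypothetical helper that assembles an argv array for the commands shown above.
// It only builds arguments; it does not run anything.
type AntigoneCommand = "describe" | "compare" | "ocr" | "check-ui";

function buildArgs(
  command: AntigoneCommand,
  images: string[],
  flags: Record<string, string> = {},
): string[] {
  if (images.length === 0) {
    throw new Error("at least one image path is required");
  }
  if (command === "compare" && images.length < 2) {
    throw new Error("compare needs at least two images");
  }
  const args: string[] = ["antigone", command, ...images];
  // Each flag becomes `--name value` (assumed form, based on `--platform ios`).
  for (const name of Object.keys(flags)) {
    args.push(`--${name}`, flags[name]);
  }
  return args;
}
```

The resulting array can be passed to a process launcher, for instance as the argument list to Node's `child_process.execFile` with `npx` as the executable.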

Library

```
import { describe, compare, ocr, checkUI } from 'antigone'

const description = await describe('screenshot.png')
const diff = await compare(['design.png', 'device.png'])
const text = await ocr('document.png')
const audit = await checkUI('screen.png', { platform: 'ios' })
```
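
Since each library call returns a promise, a batch of screenshots can be processed concurrently. The sketch below is one way to do that. `analyzeAll` is a hypothetical helper, and the analysis function is injected so the sketch makes no assumption about Antigone's return types.

```typescript
// Hypothetical batch helper; not part of Antigone.
// Pass the imported `describe` (or `ocr`) as `analyze`.
async function analyzeAll<T>(
  paths: string[],
  analyze: (path: string) => Promise<T>,
): Promise<Array<{ path: string; result: T }>> {
  // Promise.all preserves input order and rejects as soon as any call fails.
  return Promise.all(
    paths.map(async (path) => ({ path, result: await analyze(path) })),
  );
}
```

Usage would look like `await analyzeAll(['a.png', 'b.png'], describe)`. Running many calls at once may run into provider rate limits, so a sequential loop could be the safer choice on a free tier.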

MCP Server

Start the server directly:

```
npx antigone serve
```

Or register it in your MCP client configuration:

```
{
  "mcpServers": {
    "antigone": {
      "command": "npx",
      "args": ["antigone", "serve"],
      "env": {
        "GEMINI_API_KEY": "your-key"
      }
    }
  }
}
```
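
To avoid hard-coding the key in a checked-in file, the same configuration can be generated at setup time. The sketch below builds the object shown above. `antigoneMcpConfig` and the `McpServerEntry` type are hypothetical names introduced for illustration.

```typescript
// Hypothetical helper that produces the same structure as the JSON above.
interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

function antigoneMcpConfig(
  geminiApiKey: string,
): { mcpServers: Record<string, McpServerEntry> } {
  return {
    mcpServers: {
      antigone: {
        command: "npx",
        args: ["antigone", "serve"],
        env: { GEMINI_API_KEY: geminiApiKey },
      },
    },
  };
}
```

`JSON.stringify(antigoneMcpConfig(key), null, 2)` yields text you can write into the client's config file. Where that file lives depends on the MCP client.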

Agent Skill

Antigone ships with a skill/SKILL.md that any agent framework can use. No protocol overhead — just CLI calls that return text.

For more examples, please refer to the Documentation.

Roadmap

[x] Core describe and compare tools with Gemini provider
[x] CLI interface
[x] MCP server adapter
[x] ocr tool
[x] check-ui tool with platform guidelines
[x] Agent skill definition
[x] Multi-provider support (OpenRouter, Anthropic, OpenAI)
[ ] Provider fallback chains
[ ] Caching layer

See the open issues for a full list of proposed features (and known issues).

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement".

Fork the Project
Create your Feature Branch (git checkout -b feature/AmazingFeature)
Commit your Changes (git commit -m 'Add some AmazingFeature')
Push to the Branch (git push origin feature/AmazingFeature)
Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Project Link: https://github.com/zjpiazza/antigone