antigone
v0.2.0
Vision bridge for text-only AI models. Like Antigone guiding blind Oedipus — gives sight to models that can't see.
About The Project
Many powerful AI models — DeepSeek, big-pickle, MiMo, MiniMax — are text-only. They can write code, reason through problems, and orchestrate complex workflows, but they can't see. When your agentic workflow involves screenshots, UI comparisons, or visual QA, these models hit a wall.
Antigone bridges that gap. It routes images through multimodal models (Gemini, Claude, GPT-4V) and returns structured text that any model can understand.
Why Antigone?
- Multi-image comparison — Send a Figma export and a device screenshot in a single call, get back structured diffs
- Provider-agnostic — Gemini (free), OpenRouter, Anthropic, or OpenAI — swap via config
- Three interfaces — Use as a CLI, a library, or an MCP server
- Structured output — JSON with typed fields, not just prose
Getting Started
Prerequisites
- Node.js 18+
- A vision provider API key (Gemini free tier is the default)
Installation
Get a free Gemini API key at aistudio.google.com
Install globally or use via npx
npm install -g antigone
Set your API key
export GEMINI_API_KEY=your-key-here
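A script that depends on Antigone can fail fast when the key is missing instead of surfacing a provider error later. The following is a minimal sketch, assuming only that the key is read from the GEMINI_API_KEY environment variable as shown above; the helper name requireGeminiKey is hypothetical and not part of antigone.

```javascript
// Hypothetical helper, not part of antigone: returns the Gemini key from an
// environment object, or throws a readable error when it is missing or blank.
function requireGeminiKey(env = process.env) {
  const key = (env.GEMINI_API_KEY ?? '').trim()
  if (key === '') {
    throw new Error('GEMINI_API_KEY is not set. Run: export GEMINI_API_KEY=your-key-here')
  }
  return key
}
```

Calling requireGeminiKey() at the top of a script keeps the failure close to its cause.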
Usage
CLI
# Describe an image
npx antigone describe screenshot.png
# Compare two images (e.g., Figma design vs device screenshot)
npx antigone compare design.png screenshot.png
# Extract text from an image
npx antigone ocr receipt.png
# Check UI against platform guidelines
npx antigone check-ui app-screen.png --platform ios
Library
import { describe, compare, ocr, checkUI } from 'antigone'
const description = await describe('screenshot.png')
const diff = await compare(['design.png', 'device.png'])
const text = await ocr('document.png')
const audit = await checkUI('screen.png', { platform: 'ios' })
MCP Server
npx antigone serve
{
  "mcpServers": {
    "antigone": {
      "command": "npx",
      "args": ["antigone", "serve"],
      "env": {
        "GEMINI_API_KEY": "your-key"
      }
    }
  }
}
Agent Skill
Antigone ships with a skill/SKILL.md that any agent framework can use. No protocol overhead — just CLI calls that return text.
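Because the skill comes down to CLI calls, an agent written in Node.js could shell out to the same commands shown in the CLI section. This is a minimal sketch, not the skill's actual implementation: buildArgs and runAntigone are hypothetical names, and the only subcommands and flags used are the ones listed above (describe, compare, ocr, check-ui with --platform). The output is treated as opaque text because its exact format is not specified here.

```javascript
const { execFile } = require('node:child_process')

// Subcommands taken from the CLI section above; nothing else is assumed.
const COMMANDS = new Set(['describe', 'compare', 'ocr', 'check-ui'])

// Hypothetical helper: builds the argument list for `npx antigone ...`.
function buildArgs(command, images, { platform } = {}) {
  if (!COMMANDS.has(command)) throw new Error(`Unknown command: ${command}`)
  if (images.length === 0) throw new Error('At least one image path is required')
  const args = ['antigone', command, ...images]
  if (platform !== undefined) args.push('--platform', platform)
  return args
}

// Hypothetical helper: runs the CLI and resolves with whatever it prints to stdout.
// Arguments are passed as an array, so file names with spaces need no shell quoting.
function runAntigone(command, images, options) {
  const args = buildArgs(command, images, options)
  return new Promise((resolve, reject) => {
    execFile('npx', args, (error, stdout) => (error ? reject(error) : resolve(stdout)))
  })
}
```

With these helpers, runAntigone('check-ui', ['app-screen.png'], { platform: 'ios' }) issues the same command as the last CLI example.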
For more examples, please refer to the Documentation.
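For design QA across several screens, the library's compare call can be wrapped in a small batch helper. This is a sketch under stated assumptions: compareAll is a hypothetical name, the compare function is passed in as a parameter so the helper relies on nothing beyond the compare([design, device]) call shown in the Library section, and each result is stored as-is because its exact fields are not documented here.

```javascript
// Hypothetical helper, not part of antigone. `compareFn` is expected to be the
// library's `compare`, called as compare([designPath, devicePath]).
async function compareAll(compareFn, pairs) {
  const results = []
  for (const { name, design, device } of pairs) {
    try {
      // Calls run one at a time to keep the example simple.
      const diff = await compareFn([design, device])
      results.push({ name, ok: true, diff })
    } catch (error) {
      // Record the failure and continue so one bad screen does not abort the batch.
      results.push({ name, ok: false, error: String(error?.message ?? error) })
    }
  }
  return results
}

// Usage (assuming antigone is installed and GEMINI_API_KEY is set):
//   import { compare } from 'antigone'
//   const report = await compareAll(compare, [
//     { name: 'login', design: 'design.png', device: 'device.png' }
//   ])
```

Passing compare in as an argument also makes the helper easy to exercise with a stub, without spending API calls.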
Roadmap
- [x] Core `describe` and `compare` tools with Gemini provider
- [x] CLI interface
- [x] MCP server adapter
- [x] `ocr` tool
- [x] `check-ui` tool with platform guidelines
- [x] Agent skill definition
- [x] Multi-provider support (OpenRouter, Anthropic, OpenAI)
- [ ] Provider fallback chains
- [ ] Caching layer
See the open issues for a full list of proposed features (and known issues).
Contributing
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement".
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
License
Distributed under the MIT License. See LICENSE for more information.
Contact
Project Link: https://github.com/zjpiazza/antigone
