mageagent-local
v2.1.0
Run 4 AI models together on Apple Silicon. Get results that rival cloud AI. Pay nothing. Forever.
Adverant Nexus - Local Apple Silicon MageAgent
Multi-Model AI Orchestration for Apple Silicon
Run 4 specialized models together. Get results that rival cloud AI. Pay nothing.
Download & Install
Quick Start • Why MageAgent • Patterns • Tool Execution • Contributing
The Problem
You bought an M1/M2/M3/M4 Mac with 64GB+ unified memory. You want to run AI locally. But:
- Single models hit a ceiling - Even the best 72B model can't match multi-model orchestration
- Ollama alone isn't enough - You get inference, not intelligence
- Cloud AI costs add up - $200+/month for API calls that send your code to someone else's servers
- Tool calling is unreliable - Local models hallucinate file contents instead of reading them
MageAgent solves all of this.
The Solution
MageAgent orchestrates 4 specialized models working together:
┌──────────────────────────────────────────────────────────────────┐
│ Your Request │
└─────────────────────────────┬────────────────────────────────────┘
▼
┌──────────────────────────────────────────────────────────────────┐
│ MageAgent Orchestrator │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────┐ │
│ │ Qwen-72B │ │ Qwen-32B │ │ Qwen-7B │ │ Hermes-3│ │
│ │ Q8_0 │ │ Q4_K_M │ │ Q4_K_M │ │ Q8_0 │ │
│ │ │ │ │ │ │ │ │ │
│ │ Reasoning │ │ Coding │ │ Validate │ │ Tools │ │
│ │ Planning │ │ Compete │ │ Judge │ │ ReAct │ │
│ │ Analysis │ │ Generate │ │ Fast │ │ Files │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────┘ │
│ 77GB 18GB 5GB 9GB │
└──────────────────────────────────────────────────────────────────┘
▼
┌──────────────────────────────────────────────────────────────────┐
│ Better Response │
│ Multiple perspectives. Validated. Tool-grounded. │
└──────────────────────────────────────────────────────────────────┘
The key insight: Different models excel at different tasks. Orchestrating them together can produce results that exceed any single local model and, on some tasks, rival cloud APIs.
30-Second Install
git clone https://github.com/adverant/nexus-local-mageagent.git
cd nexus-local-mageagent
./scripts/install.sh
That's it. The installer:
- Sets up the Python environment with MLX
- Installs the native menu bar app
- Configures auto-start on login
- Downloads models (optional, ~109GB)
- Starts the server
Or with npm:
npm install -g mageagent-local && mageagent setup
Why MageAgent
vs. Running Ollama Alone
| Capability | Ollama | MageAgent |
|------------|--------|-----------|
| Single model inference | Yes | Yes |
| Multi-model orchestration | No | Yes |
| Model competition + judging | No | Yes |
| Generate + validate loops | No | Yes |
| Real tool execution | No | Yes |
| Native menu bar app | No | Yes |
| Claude Code integration | No | Yes |
vs. Cloud AI APIs
| Factor | Cloud API | MageAgent |
|--------|-----------|-----------|
| Cost per query | $0.01-0.10 | $0 |
| Monthly cost (heavy use) | $200+ | $0 |
| Your code leaves your machine | Yes | No |
| Rate limits | Yes | No |
| Works offline | No | Yes |
| Latency | Network dependent | Local speed |
Quality Improvements (Measured)
| Task Type | Single 72B Model | MageAgent Pattern | Improvement |
|-----------|------------------|-------------------|-------------|
| Complex reasoning | Baseline | hybrid (72B + tools) | +5% |
| Code generation | Baseline | validated (72B + 7B check) | +5-10% |
| Security-critical code | Baseline | compete (72B vs 32B + judge) | +10-15% |
| Tool-grounded tasks | Often hallucinates | execute (ReAct loop) | 100% accurate |
Based on internal testing across 500+ prompts. Your results may vary based on task type.
Orchestration Patterns
Choose the right pattern for your task:
mageagent:hybrid — Best Overall
72B reasoning + Hermes-3 tool extraction
The default pattern. Qwen-72B handles complex thinking, Hermes-3 extracts any tool calls with surgical precision.
curl -X POST http://localhost:3457/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "mageagent:hybrid", "messages": [{"role": "user", "content": "Explain the architecture of this codebase and suggest improvements"}]}'
mageagent:validated — Code with Confidence
72B generates + 7B validates + 72B revises
Never ship broken code. The 7B model catches errors, the 72B fixes them before you see the output.
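The generate, validate, revise flow described above can be sketched in a few lines of Python. This is an illustrative sketch only, not MageAgent's actual implementation: the `Model` type, the `validated` function, the prompt wording, and the "reply OK" convention are all assumptions made for the example.

```python
from typing import Callable

# Hypothetical stand-in: a "model" is any callable that maps a prompt to text.
Model = Callable[[str], str]


def validated(prompt: str, generator: Model, validator: Model,
              max_rounds: int = 1) -> str:
    """Generate a draft, ask a smaller model to review it, revise if needed."""
    draft = generator(prompt)
    for _ in range(max_rounds):
        review = validator(
            "Review this answer for errors. Reply OK if there are none.\n\n"
            + draft
        )
        if review.strip().upper().startswith("OK"):
            break  # the validator found nothing to fix
        # Feed the reviewer's feedback back to the generator for a revision.
        draft = generator(
            f"{prompt}\n\nPrevious draft:\n{draft}\n\n"
            f"Reviewer feedback:\n{review}\n\nRevise the draft."
        )
    return draft
```

In this sketch the large model would play `generator` and the 7B model would play `validator`. Capping the number of rounds keeps latency bounded, which matters when the generator runs at single-digit tokens per second.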
mageagent:compete — When Quality is Critical
72B and 32B compete + 7B judges the winner
Two models solve the problem independently. A third picks the best solution. Use for security-sensitive code, complex algorithms, or anything where being wrong is expensive.
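As a rough illustration of the compete idea, the sketch below has each contestant answer independently and asks a judge to pick one by number. The function name, the prompt text, and the "reply with a number" protocol are hypothetical; the real judging logic may differ.

```python
import re
from typing import Callable, Sequence

# Hypothetical stand-in: a "model" is any callable that maps a prompt to text.
Model = Callable[[str], str]


def compete(prompt: str, contestants: Sequence[Model], judge: Model) -> str:
    """Collect one answer per contestant and return the one the judge picks."""
    candidates = [model(prompt) for model in contestants]
    ballot = "\n\n".join(f"[{i}] {text}" for i, text in enumerate(candidates))
    verdict = judge(
        f"Task: {prompt}\n\nCandidates:\n{ballot}\n\n"
        "Reply with the number of the best candidate."
    )
    # Take the first integer in the verdict; fall back to candidate 0
    # if the judge's reply is missing a number or out of range.
    match = re.search(r"\d+", verdict)
    index = int(match.group()) if match else 0
    if not 0 <= index < len(candidates):
        index = 0
    return candidates[index]
```

Falling back to the first candidate when the verdict cannot be parsed is a design choice for the sketch: a small judge model can produce malformed output, and returning some answer is usually better than failing the whole request.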
mageagent:execute — Real Tool Execution
ReAct loop with actual file/web/command access
Not simulated. When MageAgent needs to read a file, it reads the file. When it needs to run a command, it runs the command.
You: "Read my .zshrc and tell me what shell plugins I have"
MageAgent:
1. Qwen-72B decides to read the file
2. Hermes-3 extracts: {"tool": "Read", "path": "~/.zshrc"}
3. Tool executor actually reads ~/.zshrc
4. Qwen-72B analyzes real contents: "You have oh-my-zsh with git, docker, kubectl plugins..."
mageagent:auto — Let MageAgent Decide
Intelligent routing based on task analysis
Don't want to think about patterns? Auto-mode analyzes your request and picks the best pattern automatically.
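To make the routing idea concrete, here is a deliberately simple keyword-based stand-in. It is not how auto-mode works internally (the README only says routing is based on task analysis); the rules, keywords, and the `pick_pattern` name are assumptions chosen for illustration.

```python
# Hypothetical routing rules: (keywords, pattern). The first match wins.
ROUTING_RULES = [
    (("read ", "run ", "file", "search"), "mageagent:execute"),
    (("security", "auth", "crypto"), "mageagent:compete"),
    (("implement", "function", "refactor", "bug"), "mageagent:validated"),
]


def pick_pattern(request: str) -> str:
    """Return a pattern name for a request, defaulting to hybrid."""
    text = request.lower()
    for keywords, pattern in ROUTING_RULES:
        if any(word in text for word in keywords):
            return pattern
    return "mageagent:hybrid"
```

A model-based router would replace the keyword table with a classification prompt, but the contract stays the same: request text in, pattern name out.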
Real Tool Execution
The execute pattern is the breakthrough feature of v2.0.
Most local AI setups: Model generates text that looks like it read a file. It didn't.
MageAgent execute: Model actually reads files, runs commands, searches the web.
Available Tools
| Tool | What It Does |
|------|--------------|
| Read | Read actual file contents |
| Write | Write to files |
| Bash | Execute shell commands |
| Glob | Find files by pattern |
| Grep | Search file contents |
| WebSearch | Search the web (DuckDuckGo) |
Security
- Dangerous commands are blocked (rm -rf /, etc.)
- 30-second timeout on all commands
- File size limits (50KB) prevent memory issues
- All execution is confined to your user account's permissions
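The guards listed above can be approximated with the Python standard library. The sketch below is an assumption-laden illustration, not the project's code: the blocklist entries, the constant names, and both helper functions are hypothetical, and a plain substring blocklist is a crude filter that is easy to bypass.

```python
import subprocess
from pathlib import Path

# Illustrative values only; the project's real blocklist is not shown in this README.
BLOCKED_SUBSTRINGS = ("rm -rf /", "mkfs", ":(){")
COMMAND_TIMEOUT_S = 30        # matches the 30-second timeout described above
MAX_READ_BYTES = 50 * 1024    # matches the 50KB file size limit described above


def run_command(command: str) -> str:
    """Run a shell command unless it matches the blocklist."""
    if any(bad in command for bad in BLOCKED_SUBSTRINGS):
        raise PermissionError(f"blocked command: {command!r}")
    # subprocess.run raises subprocess.TimeoutExpired if the limit is hit.
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True,
        timeout=COMMAND_TIMEOUT_S,
    )
    return result.stdout + result.stderr


def read_file(path: str) -> str:
    """Read at most MAX_READ_BYTES from a file, expanding ~ in the path."""
    with Path(path).expanduser().open("rb") as fh:
        data = fh.read(MAX_READ_BYTES)
    return data.decode("utf-8", errors="replace")
```

Note that the substring check also rejects harmless commands such as `rm -rf /tmp/build`, so a production filter would need to parse the command rather than match text.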
Menu Bar App
Control everything from your Mac menu bar:
Activity Monitor-Style System Pressure (v2.1)
Real-time system resource monitoring with color-coded indicators:
- Memory: Shows used/total GB and percentage (green/yellow/red based on pressure)
- CPU: Shows current usage percentage with pressure indicator
- GPU/Metal: Shows Metal status (Idle/Standby/Active with loaded model count)
Pressure thresholds:
- Green (Normal): < 75% memory, < 70% CPU
- Yellow (Warning): 75-90% memory, 70-90% CPU
- Red (Critical): > 90% memory or CPU
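The thresholds above map directly onto a small classification function. This is a sketch of the stated rules, with hypothetical function names; it treats exactly 90% as yellow, since the list gives yellow as 75-90% and red as above 90%.

```python
def pressure_level(percent: float, warn_at: float,
                   critical_above: float = 90.0) -> str:
    """Classify a usage percentage as green, yellow, or red."""
    if percent > critical_above:
        return "red"
    if percent >= warn_at:
        return "yellow"
    return "green"


def memory_level(percent: float) -> str:
    # Memory turns yellow at 75%.
    return pressure_level(percent, warn_at=75.0)


def cpu_level(percent: float) -> str:
    # CPU turns yellow at 70%.
    return pressure_level(percent, warn_at=70.0)
```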
Server Controls
- Start/Stop/Restart the server with one click
- Load models individually or all at once
- Switch patterns with automatic model loading
- Run tests with streaming colored output
- View logs and debug issues
- See status at a glance (server health, loaded models)
The app is native Swift/Cocoa—no Electron bloat.
Claude Code Integration
MageAgent integrates directly with Claude Code CLI and VSCode extension.
Slash Commands
/mage hybrid # Switch to hybrid pattern
/mage execute # Switch to execute pattern
/mage compete # Switch to compete pattern
/mageagent status # Check server health
/warmup all # Preload all models into memory
Natural Language
Just say what you want:
- "use mage for this"
- "use best local model"
- "mage this code"
- "use local AI for security review"
VSCode Integration
MageAgent hooks into the Claude Code VSCode extension:
- Automatic model routing based on task
- Pre-tool and post-response hooks
- Custom instructions per pattern
Performance
Tested on M4 Max with 128GB unified memory:
| Model | Tokens/sec | Memory |
|-------|------------|--------|
| Hermes-3 Q8 | ~50 tok/s | 9GB |
| Qwen-7B Q4 | ~105 tok/s | 5GB |
| Qwen-32B Q4 | ~25 tok/s | 18GB |
| Qwen-72B Q8 | ~8 tok/s | 77GB |
| Pattern | Typical Response Time | Models Loaded |
|---------|----------------------|---------------|
| hybrid | 15-30s | 72B + Hermes-3 (8B) |
| validated | 20-45s | 72B + 7B |
| compete | 45-90s | 72B + 32B + 7B |
| execute | 30-60s | 72B + Hermes-3 (8B) |
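The response times follow roughly from the throughput figures: generation time is output tokens divided by tokens per second. The sketch below uses the approximate M4 Max throughputs reported in this section. The dictionary keys are hypothetical labels, and the estimate ignores prompt processing and model loading.

```python
# Approximate throughput from the performance table (M4 Max, 128GB).
TOKENS_PER_SEC = {
    "qwen-72b": 8.0,
    "qwen-32b": 25.0,
    "qwen-7b": 105.0,
    "hermes-3": 50.0,
}


def generation_seconds(model: str, output_tokens: int) -> float:
    """Estimate generation time only (no prompt processing, no load time)."""
    return output_tokens / TOKENS_PER_SEC[model]
```

For example, a 200-token answer from the 72B model takes about 200 / 8 = 25 seconds by this estimate, which is consistent with the 15-30s range listed for hybrid.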
Requirements
| Requirement | Minimum | Recommended |
|-------------|---------|-------------|
| macOS | 13.0 (Ventura) | 14.0+ (Sonoma) |
| Chip | Apple Silicon M1 | M2 Pro/Max or M3/M4 |
| RAM | 64GB | 128GB |
| Storage | 120GB free | 150GB free |
| Python | 3.9+ | 3.11+ |
Memory by Pattern
| Pattern | Minimum RAM | Why |
|---------|-------------|-----|
| auto | 8GB | Only loads 7B router |
| tools | 12GB | Hermes-3 only |
| hybrid | 90GB | 72B + Hermes-3 (8B) |
| validated | 85GB | 72B + 7B |
| compete | 105GB | 72B + 32B + 7B |
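A quick way to read this table is to ask which patterns fit a given machine. The helper below is a hypothetical convenience, not part of MageAgent; it encodes the minimums from the table and filters by available RAM.

```python
# Minimum RAM per pattern, in GB, copied from the table above.
MIN_RAM_GB = {
    "auto": 8,
    "tools": 12,
    "hybrid": 90,
    "validated": 85,
    "compete": 105,
}


def patterns_that_fit(ram_gb: int) -> list[str]:
    """Return the pattern names whose minimum RAM is within ram_gb, sorted."""
    return sorted(name for name, need in MIN_RAM_GB.items() if need <= ram_gb)
```

By these numbers a 64GB Mac can run only auto and tools, a 96GB Mac adds hybrid and validated, and compete needs a 128GB machine.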
How It Works
MageAgent is built on three key technologies:
1. MLX
Apple's machine learning framework, optimized for Apple Silicon. Models run on unified memory with near-zero overhead.
2. Mixture of Agents
Research from Together AI shows that combining multiple LLM outputs produces better results than any single model. MageAgent implements this with local models.
3. ReAct Pattern
Reasoning + Acting. The model thinks about what to do, does it, observes the result, and repeats until the task is complete. This is how execute grounds its answers in real tool output instead of generated guesses.
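A minimal ReAct loop might look like the sketch below. It assumes the model replies either with a JSON tool call in the shape shown in the execute example (`{"tool": "Read", "path": "~/.zshrc"}`) or with plain text, which is treated as the final answer. The `react` function, the transcript format, and the step limit are hypothetical.

```python
import json
from typing import Callable, Dict

# Hypothetical stand-ins: a model maps a transcript to a reply,
# and a tool maps a parsed action dict to an observation string.
Model = Callable[[str], str]
Tool = Callable[[dict], str]


def react(task: str, model: Model, tools: Dict[str, Tool],
          max_steps: int = 5) -> str:
    """Alternate between model replies and tool calls until a final answer."""
    transcript = f"Task: {task}"
    for _ in range(max_steps):
        reply = model(transcript)
        try:
            action = json.loads(reply)
        except json.JSONDecodeError:
            return reply  # plain text: treat it as the final answer
        name = action.get("tool") if isinstance(action, dict) else None
        if not isinstance(name, str) or name not in tools:
            return reply  # not a recognized tool call: treat it as final
        # Act, then append the observation so the next step can use it.
        observation = tools[name](action)
        transcript += f"\nAction: {reply}\nObservation: {observation}"
    return "Stopped: step limit reached without a final answer."
```

The step limit guards against a model that keeps requesting tools without ever answering.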
API Reference
MageAgent exposes an OpenAI-compatible API on localhost:3457.
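Because the API follows the OpenAI chat format, the chat endpoint used in this README's curl examples can also be called from Python with only the standard library. The sketch below is illustrative: the function names are hypothetical, and the response parsing assumes the usual OpenAI-style shape (`choices[0].message.content`), which this README does not show explicitly.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3457"


def build_chat_request(prompt: str, model: str = "mageagent:hybrid",
                       max_tokens: int = 2048,
                       temperature: float = 0.7) -> urllib.request.Request:
    """Build a POST request for /v1/chat/completions without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, **kwargs) -> str:
    """Send the request and return the assistant's text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # Assumption: OpenAI-style response body.
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("Hello", model="mageagent:validated")` would send the same payload as the curl examples with a different pattern selected.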
Health Check
curl http://localhost:3457/health
List Models
curl http://localhost:3457/v1/models
Chat Completion
curl -X POST http://localhost:3457/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "mageagent:hybrid",
"messages": [{"role": "user", "content": "Hello"}],
"max_tokens": 2048,
"temperature": 0.7
}'
Load/Unload Models
curl -X POST http://localhost:3457/models/load \
-H "Content-Type: application/json" \
-d '{"model": "primary"}'
curl -X POST http://localhost:3457/models/unload \
-H "Content-Type: application/json" \
-d '{"model": "primary"}'
Documentation
| Doc | Description |
|-----|-------------|
| Quick Start | Get running in 5 minutes |
| Orchestration Patterns | Deep dive on each pattern |
| Menu Bar App | Using the native app |
| Claude Code Setup | VSCode integration |
| Auto-Start | LaunchAgent configuration |
| Troubleshooting | Common issues and fixes |
| Contributing | How to contribute |
Roadmap
Completed
- [x] Multi-model orchestration (hybrid, validated, compete)
- [x] Real tool execution with ReAct loop
- [x] Native macOS menu bar app
- [x] Claude Code integration (hooks, commands)
- [x] One-command installation
- [x] OpenAI-compatible API
In Progress
- [ ] MCP (Model Context Protocol) server
- [ ] Web UI dashboard
- [ ] Ollama backend option
Planned
- [ ] Custom pattern builder
- [ ] Distributed model loading (multi-Mac)
- [ ] Fine-tuning integration
- [ ] Prompt caching
Contributing
MageAgent is open source. We welcome contributions.
Ways to contribute:
- Report bugs and issues
- Suggest new orchestration patterns
- Improve documentation
- Submit code improvements
- Test on different Mac configurations
See CONTRIBUTING.md for guidelines.
FAQ
Q: Why not just use Ollama?
A: Ollama is great for single-model inference. MageAgent adds orchestration: multiple models working together, validation loops, real tool execution. It's the difference between a calculator and a spreadsheet.
Q: How much does it cost?
A: $0. Forever. MageAgent is MIT licensed. The models are open weights. Your Mac's electricity is the only cost.
Q: Will it work on my Mac?
A: If you have Apple Silicon (M1/M2/M3/M4) and 64GB+ RAM, yes. The more RAM, the more patterns you can run simultaneously.
Q: Is my data private?
A: 100%. Everything runs locally. Your code never leaves your machine. No telemetry, no analytics, no phone-home.
Q: How does it compare to Claude/GPT-4?
A: For many tasks, especially code-related ones, MageAgent's orchestrated output is comparable. The compete pattern often exceeds single-model cloud responses. But cloud models still win on some tasks—this is a tool, not a replacement.
Honest Comparison: MageAgent vs Cloud AI
We believe in transparency. Here's how MageAgent actually compares:
| Aspect | MageAgent Local | Claude Sonnet 4.5 | Claude Opus 4.5 |
|--------|-----------------|-------------------|-----------------|
| Response Quality | 60-70% | 85-90% | 95-100% |
| Tool Calling Reliability | ~70% | ~95% | ~98% |
| Speed (simple task) | 1-5s (validator) | 2-4s | 3-6s |
| Speed (complex task) | 30-120s (72B) | 5-15s | 8-20s |
| Cost | Free | ~$0.01-0.10/task | ~$0.05-0.50/task |
| Privacy | 100% local | Cloud | Cloud |
When to Use MageAgent
- Privacy matters (sensitive code)
- Cost matters (high volume, simple tasks)
- Fast iteration on simple questions
- Offline work
When to Use Cloud AI
- Complex architecture decisions
- Multi-file refactoring
- Nuanced requirements
- Maximum quality matters more than cost
Bottom line: MageAgent is a solid free/private option for coding tasks and quick iterations. For critical work or complex reasoning, cloud AI may still be the better choice.
Acknowledgments
MageAgent builds on the work of:
- MLX — Apple's ML framework that makes this possible
- Qwen — The base models from Alibaba
- NousResearch — Hermes-3 model for tool calling
- Together AI — Mixture of Agents research
- The local AI community — r/LocalLLaMA, MLX Discord, and everyone pushing the boundaries
License
MIT License. See LICENSE.
