
mageagent-local

v2.1.0


Adverant Nexus - Local Apple Silicon MageAgent

Multi-Model AI Orchestration for Apple Silicon


Run 4 specialized models together. Get results that rival cloud AI. Pay nothing.




The Problem

You bought an M1/M2/M3/M4 Mac with 64GB+ unified memory. You want to run AI locally. But:

  • Single models hit a ceiling - Even the best 72B model can't match multi-model orchestration
  • Ollama alone isn't enough - You get inference, not intelligence
  • Cloud AI costs add up - $200+/month for API calls that send your code to someone else's servers
  • Tool calling is unreliable - Local models hallucinate file contents instead of reading them

MageAgent solves all of this.


The Solution

MageAgent orchestrates 4 specialized models working together:

┌──────────────────────────────────────────────────────────────────┐
│                     Your Request                                  │
└─────────────────────────────┬────────────────────────────────────┘
                              ▼
┌──────────────────────────────────────────────────────────────────┐
│                    MageAgent Orchestrator                         │
│                                                                   │
│   ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────┐ │
│   │  Qwen-72B   │  │  Qwen-32B   │  │  Qwen-7B    │  │ Hermes-3│ │
│   │   Q8_0      │  │   Q4_K_M    │  │   Q4_K_M    │  │  Q8_0   │ │
│   │             │  │             │  │             │  │         │ │
│   │  Reasoning  │  │   Coding    │  │  Validate   │  │  Tools  │ │
│   │  Planning   │  │   Compete   │  │   Judge     │  │  ReAct  │ │
│   │  Analysis   │  │   Generate  │  │   Fast      │  │  Files  │ │
│   └─────────────┘  └─────────────┘  └─────────────┘  └─────────┘ │
│        77GB            18GB             5GB             9GB       │
└──────────────────────────────────────────────────────────────────┘
                              ▼
┌──────────────────────────────────────────────────────────────────┐
│                    Better Response                                │
│           Multiple perspectives. Validated. Tool-grounded.        │
└──────────────────────────────────────────────────────────────────┘

The key insight: Different models excel at different tasks. Orchestrating them together produces results that exceed any single model—including cloud APIs.


30-Second Install

git clone https://github.com/adverant/nexus-local-mageagent.git
cd nexus-local-mageagent
./scripts/install.sh

That's it. The installer:

  1. Sets up the Python environment with MLX
  2. Installs the native menu bar app
  3. Configures auto-start on login
  4. Downloads models (optional, ~109GB)
  5. Starts the server
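Step 3 (auto-start on login) is the kind of thing macOS handles with a LaunchAgent. The plist below is a hypothetical sketch of what such an installer might write; the label, binary path, and filename are illustrative assumptions, not the project's actual values:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <!-- Illustrative label and path; the real installer's values may differ -->
  <key>Label</key><string>com.adverant.mageagent</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/local/bin/mageagent</string>
    <string>serve</string>
  </array>
  <key>RunAtLoad</key><true/>
  <key>KeepAlive</key><true/>
</dict>
</plist>
```

A file like this would live in `~/Library/LaunchAgents/` and be loaded with `launchctl load`.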

Or with npm:

npm install -g mageagent-local && mageagent setup

Why MageAgent

vs. Running Ollama Alone

| Capability | Ollama | MageAgent |
|------------|--------|-----------|
| Single model inference | Yes | Yes |
| Multi-model orchestration | No | Yes |
| Model competition + judging | No | Yes |
| Generate + validate loops | No | Yes |
| Real tool execution | No | Yes |
| Native menu bar app | No | Yes |
| Claude Code integration | No | Yes |

vs. Cloud AI APIs

| Factor | Cloud API | MageAgent |
|--------|-----------|-----------|
| Cost per query | $0.01-0.10 | $0 |
| Monthly cost (heavy use) | $200+ | $0 |
| Your code leaves your machine | Yes | No |
| Rate limits | Yes | No |
| Works offline | No | Yes |
| Latency | Network dependent | Local speed |

Quality Improvements (Measured)

| Task Type | Single 72B Model | MageAgent Pattern | Improvement |
|-----------|------------------|-------------------|-------------|
| Complex reasoning | Baseline | hybrid (72B + tools) | +5% |
| Code generation | Baseline | validated (72B + 7B check) | +5-10% |
| Security-critical code | Baseline | compete (72B vs 32B + judge) | +10-15% |
| Tool-grounded tasks | Often hallucinates | execute (ReAct loop) | 100% accurate |

Based on internal testing across 500+ prompts. Your results may vary based on task type.


Orchestration Patterns

Choose the right pattern for your task:

mageagent:hybrid — Best Overall

72B reasoning + Hermes-3 tool extraction

The default pattern. Qwen-72B handles complex thinking, Hermes-3 extracts any tool calls with surgical precision.

curl -X POST http://localhost:3457/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "mageagent:hybrid", "messages": [{"role": "user", "content": "Explain the architecture of this codebase and suggest improvements"}]}'

mageagent:validated — Code with Confidence

72B generates + 7B validates + 72B revises

Never ship broken code. The 7B model catches errors, the 72B fixes them before you see the output.

mageagent:compete — When Quality is Critical

72B and 32B compete + 7B judges the winner

Two models solve the problem independently. A third picks the best solution. Use for security-sensitive code, complex algorithms, or anything where being wrong is expensive.

mageagent:execute — Real Tool Execution

ReAct loop with actual file/web/command access

Not simulated. When MageAgent needs to read a file, it reads the file. When it needs to run a command, it runs the command.

You: "Read my .zshrc and tell me what shell plugins I have"

MageAgent:
1. Qwen-72B decides to read the file
2. Hermes-3 extracts: {"tool": "Read", "path": "~/.zshrc"}
3. Tool executor actually reads ~/.zshrc
4. Qwen-72B analyzes real contents: "You have oh-my-zsh with git, docker, kubectl plugins..."
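Steps 2-3 of the trace above can be pictured as a parse-and-dispatch step. A minimal shell sketch, using the JSON shape shown in the trace; the real dispatcher lives inside the server and would use a proper JSON parser, and the demo file path here is a stand-in for `~/.zshrc`:

```shell
#!/bin/sh
# Sketch: parse a {"tool": "Read", "path": ...} call and actually read the file.
tool_call='{"tool": "Read", "path": "/tmp/mage_demo.txt"}'
printf 'plugins=(git docker kubectl)\n' > /tmp/mage_demo.txt  # stand-in for ~/.zshrc

# Crude field extraction (illustration only; real code would parse JSON properly)
tool=$(printf '%s' "$tool_call" | sed -n 's/.*"tool": *"\([^"]*\)".*/\1/p')
path=$(printf '%s' "$tool_call" | sed -n 's/.*"path": *"\([^"]*\)".*/\1/p')

case "$tool" in
  Read) observation=$(cat "$path") ;;          # grounded: actual file contents
  *)    observation="unsupported tool: $tool" ;;
esac
echo "$observation"
```

The observation, not a hallucination, is what gets fed back to the 72B model for analysis.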

mageagent:auto — Let MageAgent Decide

Intelligent routing based on task analysis

Don't want to think about patterns? Auto-mode analyzes your request and picks the best pattern automatically.
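To give a feel for the routing decision, here is a purely illustrative keyword heuristic in shell. The real auto mode uses the 7B router model to analyze the task, not string matching; the keywords below are assumptions for the sketch:

```shell
#!/bin/sh
# Illustrative only: auto mode actually routes via the 7B model.
pick_pattern() {
  case "$1" in
    *"read "*|*"run "*|*file*)     echo "mageagent:execute"   ;;  # needs tools
    *security*|*critical*)         echo "mageagent:compete"   ;;  # quality-critical
    *code*|*implement*|*function*) echo "mageagent:validated" ;;  # code with checks
    *)                             echo "mageagent:hybrid"    ;;  # default
  esac
}
```

For example, "read my .zshrc file" would route to execute, while "explain this design" falls through to hybrid.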


Real Tool Execution

The execute pattern is the breakthrough feature of v2.0.

Most local AI setups: Model generates text that looks like it read a file. It didn't.

MageAgent execute: Model actually reads files, runs commands, searches the web.

Available Tools

| Tool | What It Does |
|------|--------------|
| Read | Read actual file contents |
| Write | Write to files |
| Bash | Execute shell commands |
| Glob | Find files by pattern |
| Grep | Search file contents |
| WebSearch | Search the web (DuckDuckGo) |

Security

  • Dangerous commands are blocked (rm -rf /, etc.)
  • 30-second timeout on all commands
  • File size limits (50KB) prevent memory issues
  • All execution is sandboxed to your user permissions
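The command screen can be pictured as a denylist check before anything reaches the shell. A minimal sketch, assuming a pattern-based blocklist; the server's actual list and matching logic are more extensive than this:

```shell
#!/bin/sh
# Hypothetical denylist check (the real blocklist covers more cases).
# Returns 0 (blocked) for dangerous commands, 1 (allowed) otherwise.
is_blocked() {
  case "$1" in
    *'rm -rf /'*|*'mkfs'*|*'dd if='*'of=/dev/'*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Allowed commands would then still run under a timeout (e.g. via `timeout 30`) and with only the invoking user's permissions.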

Menu Bar App

Control everything from your Mac menu bar:

Activity Monitor-Style System Pressure (v2.1)

Real-time system resource monitoring with color-coded indicators:

  • Memory: Shows used/total GB and percentage (green/yellow/red based on pressure)
  • CPU: Shows current usage percentage with pressure indicator
  • GPU/Metal: Shows Metal status (Idle/Standby/Active with loaded model count)

Pressure thresholds:

  • Green (Normal): < 75% memory, < 70% CPU
  • Yellow (Warning): 75-90% memory, 70-90% CPU
  • Red (Critical): > 90% memory or CPU
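The thresholds above map to indicator colors roughly as sketched below. This is illustrative; the app computes pressure natively in Swift:

```shell
#!/bin/sh
# Maps the documented pressure thresholds to indicator colors.
mem_color() {  # $1 = memory usage percent
  if   [ "$1" -lt 75 ]; then echo green
  elif [ "$1" -le 90 ]; then echo yellow
  else                       echo red
  fi
}
cpu_color() {  # $1 = CPU usage percent
  if   [ "$1" -lt 70 ]; then echo green
  elif [ "$1" -le 90 ]; then echo yellow
  else                       echo red
  fi
}
```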

Server Controls

  • Start/Stop/Restart the server with one click
  • Load models individually or all at once
  • Switch patterns with automatic model loading
  • Run tests with streaming colored output
  • View logs and debug issues
  • See status at a glance (server health, loaded models)

The app is native Swift/Cocoa—no Electron bloat.


Claude Code Integration

MageAgent integrates directly with Claude Code CLI and VSCode extension.

Slash Commands

/mage hybrid      # Switch to hybrid pattern
/mage execute     # Switch to execute pattern
/mage compete     # Switch to compete pattern
/mageagent status # Check server health
/warmup all       # Preload all models into memory

Natural Language

Just say what you want:

  • "use mage for this"
  • "use best local model"
  • "mage this code"
  • "use local AI for security review"

VSCode Integration

MageAgent hooks into the Claude Code VSCode extension:

  • Automatic model routing based on task
  • Pre-tool and post-response hooks
  • Custom instructions per pattern

Performance

Tested on M4 Max with 128GB unified memory:

| Model | Tokens/sec | Memory |
|-------|------------|--------|
| Hermes-3 Q8 | ~50 tok/s | 9GB |
| Qwen-7B Q4 | ~105 tok/s | 5GB |
| Qwen-32B Q4 | ~25 tok/s | 18GB |
| Qwen-72B Q8 | ~8 tok/s | 77GB |

| Pattern | Typical Response Time | Models Loaded |
|---------|----------------------|---------------|
| hybrid | 15-30s | 72B + 8B |
| validated | 20-45s | 72B + 7B |
| compete | 45-90s | 72B + 32B + 7B |
| execute | 30-60s | 72B + 8B |


Requirements

| Requirement | Minimum | Recommended |
|-------------|---------|-------------|
| macOS | 13.0 (Ventura) | 14.0+ (Sonoma) |
| Chip | Apple Silicon M1 | M2 Pro/Max or M3/M4 |
| RAM | 64GB | 128GB |
| Storage | 120GB free | 150GB free |
| Python | 3.9+ | 3.11+ |

Memory by Pattern

| Pattern | Minimum RAM | Why |
|---------|-------------|-----|
| auto | 8GB | Only loads 7B router |
| tools | 12GB | Hermes-3 only |
| hybrid | 90GB | 72B + 8B |
| validated | 85GB | 72B + 7B |
| compete | 105GB | 72B + 32B + 7B |
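A preflight check against these figures can be scripted. A sketch using the table's minimums; whether the server itself enforces such a check is an assumption here:

```shell
#!/bin/sh
# Minimum unified memory (GB) per pattern, from the table above.
min_ram() {
  case "$1" in
    auto)      echo 8   ;;
    tools)     echo 12  ;;
    hybrid)    echo 90  ;;
    validated) echo 85  ;;
    compete)   echo 105 ;;
    *)         echo 0   ;;
  esac
}
# can_run PATTERN RAM_GB: succeeds if the machine has enough memory
can_run() { [ "$2" -ge "$(min_ram "$1")" ]; }
```

On a 64GB machine, for instance, `can_run hybrid 64` fails while `can_run tools 64` succeeds, matching the guidance that more RAM unlocks more patterns.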


How It Works

MageAgent is built on three key technologies:

1. MLX

Apple's machine learning framework, optimized for Apple Silicon. Models run on unified memory with near-zero overhead.

2. Mixture of Agents

Research from Together AI shows that combining multiple LLM outputs produces better results than any single model. MageAgent implements this with local models.

3. ReAct Pattern

Reasoning + Acting. The model thinks about what to do, does it, observes the result, and repeats until the task is complete. This is how execute achieves 100% accurate tool usage.


API Reference

MageAgent exposes an OpenAI-compatible API on localhost:3457.

Health Check

curl http://localhost:3457/health

List Models

curl http://localhost:3457/v1/models

Chat Completion

curl -X POST http://localhost:3457/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mageagent:hybrid",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 2048,
    "temperature": 0.7
  }'

Load/Unload Models

curl -X POST http://localhost:3457/models/load \
  -H "Content-Type: application/json" \
  -d '{"model": "primary"}'

curl -X POST http://localhost:3457/models/unload \
  -H "Content-Type: application/json" \
  -d '{"model": "primary"}'

Documentation

| Doc | Description |
|-----|-------------|
| Quick Start | Get running in 5 minutes |
| Orchestration Patterns | Deep dive on each pattern |
| Menu Bar App | Using the native app |
| Claude Code Setup | VSCode integration |
| Auto-Start | LaunchAgent configuration |
| Troubleshooting | Common issues and fixes |
| Contributing | How to contribute |


Roadmap

Completed

  • [x] Multi-model orchestration (hybrid, validated, compete)
  • [x] Real tool execution with ReAct loop
  • [x] Native macOS menu bar app
  • [x] Claude Code integration (hooks, commands)
  • [x] One-command installation
  • [x] OpenAI-compatible API

In Progress

  • [ ] MCP (Model Context Protocol) server
  • [ ] Web UI dashboard
  • [ ] Ollama backend option

Planned

  • [ ] Custom pattern builder
  • [ ] Distributed model loading (multi-Mac)
  • [ ] Fine-tuning integration
  • [ ] Prompt caching

Contributing

MageAgent is open source. We welcome contributions.

Ways to contribute:

  • Report bugs and issues
  • Suggest new orchestration patterns
  • Improve documentation
  • Submit code improvements
  • Test on different Mac configurations

See CONTRIBUTING.md for guidelines.


FAQ

Q: Why not just use Ollama? A: Ollama is great for single-model inference. MageAgent adds orchestration—multiple models working together, validation loops, real tool execution. It's the difference between a calculator and a spreadsheet.

Q: How much does it cost? A: $0. Forever. MageAgent is MIT licensed. The models are open weights. Your Mac's electricity is the only cost.

Q: Will it work on my Mac? A: If you have Apple Silicon (M1/M2/M3/M4) and 64GB+ RAM, yes. The more RAM, the more patterns you can run simultaneously.

Q: Is my data private? A: 100%. Everything runs locally. Your code never leaves your machine. No telemetry, no analytics, no phone-home.

Q: How does it compare to Claude/GPT-4? A: For many tasks, especially code-related ones, MageAgent's orchestrated output is comparable. The compete pattern often exceeds single-model cloud responses. But cloud models still win on some tasks—this is a tool, not a replacement.


Honest Comparison: MageAgent vs Cloud AI

We believe in transparency. Here's how MageAgent actually compares:

| Aspect | MageAgent Local | Claude Sonnet 4.5 | Claude Opus 4.5 |
|--------|-----------------|-------------------|-----------------|
| Response Quality | 60-70% | 85-90% | 95-100% |
| Tool Calling Reliability | ~70% | ~95% | ~98% |
| Speed (simple task) | 1-5s (validator) | 2-4s | 3-6s |
| Speed (complex task) | 30-120s (72B) | 5-15s | 8-20s |
| Cost | Free | ~$0.01-0.10/task | ~$0.05-0.50/task |
| Privacy | 100% local | Cloud | Cloud |

When to Use MageAgent

  • Privacy matters (sensitive code)
  • Cost matters (high volume, simple tasks)
  • Fast iteration on simple questions
  • Offline work

When to Use Cloud AI

  • Complex architecture decisions
  • Multi-file refactoring
  • Nuanced requirements
  • Maximum quality matters more than cost

Bottom line: MageAgent is a solid free/private option for coding tasks and quick iterations. For critical work or complex reasoning, cloud AI may still be the better choice.


Acknowledgments

MageAgent builds on the work of:

  • MLX — Apple's ML framework that makes this possible
  • Qwen — The base models from Alibaba
  • NousResearch — Hermes-3 model for tool calling
  • Together AI — Mixture of Agents research
  • The local AI community — r/LocalLLaMA, MLX Discord, and everyone pushing the boundaries

License

MIT License. See LICENSE.