# CodeModeTOON MCP Server
A lightweight Model Context Protocol (MCP) orchestrator designed for efficiency at scale. It features TOON compression (reducing token usage by 30-90%) and Lazy Loading, making it the ideal solution for complex, multi-tool agentic workflows.
The "Context Trap" in Agentic Workflows
Recent articles from Anthropic and Cloudflare (see Here) highlights a critical bottleneck: AI agents struggle with complex, multi-step workflows because they lack state.
While Code Execution (e.g., TypeScript) allows agents to maintain state and structure workflows effectively, it introduces a new problem: Data Bloat. Real-world operations (like SRE log analysis or database dumps) generate massive JSON payloads that explode the context window, making stateful execution prohibitively expensive.
CodeModeTOON bridges this gap. It enables:
- Stateful Execution: Run complex TypeScript workflows to maintain context outside the model.
- Context Efficiency: Use TOON Compression to "zip" the results, allowing agents to process massive datasets without blowing their token budget.
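For example, a single `execute_code` run can hold intermediate results as sandbox variables and return only a compressed digest. A minimal sketch, assuming a hypothetical `kubernetes` server with a `list_pods` tool (the `servers` proxy and `TOON` helper appear in the Quick Start below):

```ts
// Sketch of code meant to run inside execute_code, where these globals are provided:
declare const servers: Record<string, any>;
declare const TOON: { encode(data: unknown): string };

// Hypothetical 'kubernetes' server and 'list_pods' tool - substitute your own config.
const pods = await servers['kubernetes'].list_pods({ namespace: 'prod' }); // big raw JSON stays here
const failing = pods.filter((p: any) => p.status !== 'Running');          // state held in the sandbox
console.log(TOON.encode(failing)); // only this compressed digest reaches the model
```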
## How It Works

```mermaid
graph LR
    A[AI Agent<br/>Claude/Cursor] -->|JSON-RPC| B[CodeModeTOON<br/>Server]
    B -->|Lazy Load| C[Perplexity]
    B -->|Lazy Load| D[Context7]
    B -->|Lazy Load| E[Custom Servers]
    C -->|Raw JSON| B
    D -->|Raw JSON| B
    E -->|Raw JSON| B
    B -->|TOON<br/>Compressed| A
    style B fill:#4f46e5,color:#fff
    style A fill:#10b981,color:#fff
```

*Data Flow: Requests route through CodeModeTOON → servers are lazy-loaded on demand → responses are TOON-compressed before returning to the agent.*
## 🔥 Key Features

### 🗜️ TOON Compression
Reduces token usage by 30-90% for structured data.
- Validated: ~83% savings on Kubernetes audits
- Best for: SRE logs, database dumps, API responses
- How it works: Schema extraction + value compression
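As a rough sketch of that idea (a toy encoder, not the real TOON codec): for a uniform array of objects, the shared keys are emitted once as a header and the values as compact rows.

```ts
// Toy sketch of "schema extraction + value compression" - NOT the real TOON codec.
type Row = Record<string, string | number | boolean>;

function toyEncode(name: string, rows: Row[]): string {
  if (rows.length === 0) return `${name}[0]`;
  const keys = Object.keys(rows[0]);                    // extract the shared schema once
  const header = `${name}[${rows.length}]{${keys.join(',')}}:`;
  const body = rows.map(r => '  ' + keys.map(k => String(r[k])).join(',')).join('\n');
  return `${header}\n${body}`;                          // keys appear once, not per object
}

// In JSON, 50 pods repeat "name"/"status"/"restarts" 50 times; here, once.
const pods = Array.from({ length: 3 }, (_, i) => ({ name: `pod-${i}`, status: 'Running', restarts: 0 }));
console.log(toyEncode('pods', pods));
// pods[3]{name,status,restarts}:
//   pod-0,Running,0
//   pod-1,Running,0
//   pod-2,Running,0
```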
### ⚡ Lazy Loading
Servers only start when needed. Zero overhead for unused tools.
- Best for: Multi-tool workflows, resource-constrained environments
- Performance: Sub-100ms startup for active servers
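Conceptually, lazy loading boils down to caching server handles and spawning a child process only on first use. A simplified sketch, not the actual implementation:

```ts
// Simplified sketch of lazy loading - the real server adds handshakes, cleanup, etc.
import { spawn, type ChildProcess } from 'node:child_process';

const running = new Map<string, ChildProcess>();

function getServer(name: string, command: string, args: string[]): ChildProcess {
  let proc = running.get(name);
  if (!proc) {
    // First use: spawn the underlying MCP server over stdio.
    proc = spawn(command, args, { stdio: ['pipe', 'pipe', 'inherit'] });
    running.set(name, proc);
  }
  return proc; // servers that are never requested are never spawned
}

// e.g. getServer('perplexity', 'npx', ['-y', '<package>'])
```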
### 🔒 Sandboxed Execution
Secure JS execution with auto-proxied MCP tool access.
- Best for: Complex stateful workflows, batch operations
- Security: Uses the Node.js `vm` module (not for multi-tenant use)
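Under the hood this is the standard `node:vm` pattern: snippets run against a controlled context object instead of the host globals. A minimal sketch (the `TOON.encode` stand-in here is just `JSON.stringify`):

```ts
// Minimal sketch of the vm pattern - isolation for convenience, NOT a security boundary.
import vm from 'node:vm';

// Expose only the globals the snippet should see.
const sandbox = vm.createContext({
  console,
  TOON: { encode: (data: unknown) => JSON.stringify(data) },
});

vm.runInContext('console.log(TOON.encode({ ok: true }))', sandbox, { timeout: 5000 });
```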
### 🤖 Agent-Friendly Features
Designed for programmatic discovery and self-correction.
- Approach Suggestions: `suggest_approach` is a meta-tool that recommends the best execution strategy (code vs. workflow vs. direct call).
- Efficiency Metrics: `execute_code` returns operation counts and compression savings to reinforce efficient behavior.
- Recovery Hints: Error messages include actionable next steps for agents (e.g., "Server not found? Try `list_servers`").
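For instance, an agent can ask for a strategy before committing to one. An illustrative call, where the `task` parameter and the response shape are assumptions rather than the exact API:

```ts
// Illustrative only - parameter name and response shape may differ from the real tool.
declare const suggest_approach: (args: { task: string }) => Promise<unknown>;

const hint = await suggest_approach({
  task: 'Audit 50 pods across 3 namespaces and summarize failures',
});
// A plausible answer: { strategy: 'code', reason: 'batchable multi-tool operation' }
console.log(hint);
```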
## Table of Contents
- The Context Trap
- How It Works
- Key Features
- When to Use
- Installation
- Quick Start
- Usage Examples
- Workflows
- Performance Benchmark
- Troubleshooting
- Security
- Contributing
- License
## When to Use CodeModeTOON

**✅ Perfect for:**
- Multi-step AI workflows requiring state management
- Processing large structured datasets (logs, DB dumps, K8s manifests)
- Coordinating multiple MCP servers in parallel
- Token-constrained environments (reducing API costs)
**❌ Not ideal for:**
- Simple single-tool queries
- Unstructured text-heavy responses (compression <10%)
- Multi-tenant production servers (vm module security limitation)
## Installation

### One‑Click (Cursor)

### Manual Setup

Add this to your `~/.cursor/mcp.json`:
```json
{
  "mcpServers": {
    "code-mode-toon": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "code-mode-toon"],
      "env": {
        "CODE_MODE_TOON_CONFIG": "~/.cursor/mcp.json"
      }
    }
  }
}
```

## 🧠 Claude Skills
CodeModeTOON includes a pre-built Claude Skill to make your AI assistant an expert at using this orchestrator.
### code-mode-toon-workflow-expert
A specialized skill that teaches Claude how to:
- Decide when to use a workflow vs ad-hoc code.
- Create new workflows following best practices.
- Orchestrate multiple tools efficiently.
**Installation:**

1. Unzip `claude-skills/code-mode-toon-workflow-expert.skill`.
2. Place the folder in your `.claude/skills/` directory (or import via the Claude desktop app).
## 🤖 AI Assistant Prompts
Copy these prompts into your AI's custom instructions (e.g., .cursorrules or Claude Project instructions) to maximize CodeModeTOON's potential.
### 1. System Identity & Orchestration (Essential)

**Goal:** Teaches the AI to act as an orchestrator and prioritize workflows.

```text
YOU ARE AN AGENTIC ORCHESTRATOR. You have access to "CodeModeTOON", a high-efficiency MCP bridge.
1. PRIORITIZE WORKFLOWS: Before running single tools, check `list_workflows`. If a workflow exists (e.g., `research`, `k8s-detective`), USE IT. It is faster and saves tokens.
2. HANDLE COMPRESSED DATA: Outputs may be "TOON encoded" (highly compressed JSON). This is normal. Do not complain about "unreadable data" - simply parse it or ask for specific fields if needed.
3. BATCH OPERATIONS: Never run 3+ sequential tool calls if they can be batched. Use `execute_code` to run them in a single block.
```

### 2. Tool Discovery (Lazy Loading)
**Goal:** Prevents the AI from giving up if a tool isn't immediately visible.

```text
TOOLS ARE LAZY LOADED. If you need a capability (e.g., "search", "kubernetes", "database") and don't see the tool:
1. DO NOT assume it's missing.
2. RUN `search_tools({ query: "..." })` to find it.
3. RUN `get_tool_api({ serverName: "..." })` to learn how to use it.
4. Only then, execute the tool.
```

### 3. Efficiency & TOON Compression
**Goal:** Enforces token-saving behaviors for large data operations.

```text
OPTIMIZE FOR TOKENS. When fetching large datasets (logs, docs, API responses):
1. ALWAYS wrap the output in `TOON.encode(data)` inside `execute_code`.
2. PREFER structured data (JSON/Objects) over plain text. TOON compresses structure by ~83%, but text by only ~4%.
3. IF synthesizing data, do it server-side (via workflow `synthesize: true`) to avoid pulling raw data into context.
```

## Quick Start
After installation, try this 30-second demo in Claude or Cursor:

```js
// Ask your AI assistant to run this via execute_code
const api = await get_tool_api({ serverName: 'perplexity' });
const result = await servers['perplexity'].perplexity_ask({
  messages: [{ role: 'user', content: "Explain TOON compression" }]
});
console.log(result); // See compression in action! ~40% token savings
```

**What just happened?** The response was automatically TOON-encoded, saving tokens.
## Usage Examples

```js
// Inside execute_code
const api = await get_tool_api({ serverName: 'perplexity' });

// Request large data - automatically compressed!
const result = await servers['perplexity'].perplexity_ask({
  messages: [{ role: 'user', content: "Summarize the history of Rome" }]
});
console.log(result); // Returns TOON-encoded string, saving ~40% tokens
```

```js
// Fetch large documentation from Context7
const api = await get_tool_api({ serverName: 'context7' });
const docs = await servers['context7']['get-library-docs']({
  context7CompatibleLibraryID: 'kubernetes/kubernetes'
});
console.log(TOON.encode(docs)); // Massive compression on structured data
```

```js
// Run a complex research workflow
const result = await workflows.research({
  goal: "Compare xsync vs sync.Map performance",
  queries: ["xsync vs sync.Map benchmarks"],
  synthesize: true,
  outputFile: "/tmp/research.toon"
});
console.log(result.synthesis); // LLM-synthesized findings
```

## Workflows
CodeModeTOON supports Workflows—pre-defined, server-side TypeScript modules that orchestrate multiple MCP tools.
### Research Workflow
A powerful research assistant that:
- Parallelizes data fetching from multiple sources (Context7, Wikipedia, Perplexity).
- Synthesizes findings using LLMs (optional).
- Outputs TOON-encoded files for maximum context efficiency.
- Retries failed requests automatically.
See .workflows/README.md for detailed documentation, usage examples, and AI prompts.
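To give a feel for what such a module looks like, here is a hypothetical shape (the real contract lives in .workflows/README.md; `fetchSource` and `summarize` stand in for actual tool calls and LLM synthesis):

```ts
// Hypothetical shape of a workflow module - see .workflows/README.md for the real contract.
// fetchSource and summarize are stand-ins for real MCP tool calls and LLM synthesis.
const fetchSource = async (query: string) => ({ query, snippets: [] as string[] });
const summarize = (results: unknown[]) => `Synthesized ${results.length} sources`;

export async function research(input: { goal: string; queries: string[]; synthesize?: boolean }) {
  // Fan out to all sources in parallel; tolerate individual failures (retries omitted).
  const settled = await Promise.allSettled(input.queries.map(fetchSource));
  const findings = settled.flatMap(r => (r.status === 'fulfilled' ? [r.value] : []));
  return {
    goal: input.goal,
    findings,
    synthesis: input.synthesize ? summarize(findings) : null,
  };
}
```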
## Performance Benchmark

### Why This Matters

Scenario 2 (~92% savings) demonstrates CodeModeTOON's strength:

| Metric | Original | TOON | Savings |
|--------|----------|------|---------|
| Characters | 37,263 | 2,824 | ~92% |
| Estimated Tokens\* | ~9,315 | ~706 | ~8,600 tokens |
| Cost (Claude Sonnet)\*\* | $0.028 | $0.002 | $0.026 |

\*Assuming 4 chars/token average
\*\*At $3/M input tokens (Claude Sonnet input pricing)
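Those cost figures follow directly from the stated assumptions; a quick sanity check:

```ts
// Reproducing the table's estimates from its stated assumptions.
const estTokens = (chars: number) => chars / 4;                  // ~4 chars per token
const estCost = (chars: number) => (estTokens(chars) * 3) / 1e6; // $3 per million input tokens

console.log(estCost(37_263).toFixed(3)); // "0.028" - original payload
console.log(estCost(2_824).toFixed(3));  // "0.002" - TOON-encoded payload
```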
**Key Insight:** For infrastructure audits, log analysis, or database dumps, TOON compression can reduce token costs by 90%+, making complex agentic workflows feasible within budget.
**Scenario 1: Natural Language Query (History of Rome)**

Unstructured text compresses poorly, as expected.

- Original JSON: 11,651 chars
- TOON Encoded: 11,166 chars
- Compression Ratio: ~4.16% savings
**Scenario 2: Kubernetes Cluster Audit (50 Pods)**

Highly structured, repetitive JSON (infrastructure dumps) compresses extremely well.

- Original JSON: 37,263 chars
- TOON Encoded: 2,824 chars
- Compression Ratio: ~92% savings 📉
## Troubleshooting

### "Server not found" error

**Cause:** CodeModeTOON can't locate your MCP config.

**Solution:** Ensure `CODE_MODE_TOON_CONFIG` points to your config:

```bash
export CODE_MODE_TOON_CONFIG=~/.cursor/mcp.json
```

### TOON encoding not working
**Cause:** Results aren't being encoded.

**Solution:** Use `console.log(TOON.encode(data))`, not `console.log(data)`.
### Lazy server won't load

**Cause:** Server name mismatch.

**Solution:** Verify the server name matches your config. Use `get_tool_api({ serverName: 'name' })` to inspect available servers.
## Security Note

⚠️ The Node.js `vm` module is NOT a security sandbox. CodeModeTOON is suitable for personal AI assistant use (Claude, Cursor) running trusted code, but not for multi-tenant or public services.
## Acknowledgments
- Anthropic: Code execution with MCP
- Cloudflare: Code Mode announcement
## Author
Built by Ziad Hassan (Senior SRE/DevOps) — LinkedIn · GitHub
## Contributing
Contributions are welcome! 🙌
### Ways to Contribute

- **Report bugs** - Open an issue with reproduction steps
- **Suggest features** - Discuss use cases in Issues
- **Add workflows** - See the Workflows section
- **Improve docs** - Documentation PRs are always welcome
Development Setup
git clone https://github.com/ziad-hsn/code-mode-toon.git
cd code-mode-toon
npm install
npm testLicense
MIT License — see LICENSE for details.
