# CodeModeTOON MCP Server
A lightweight Model Context Protocol (MCP) orchestrator designed for efficiency at scale. It features TOON compression (reducing token usage by 30-90%) and Lazy Loading, making it the ideal solution for complex, multi-tool agentic workflows.
The "Context Trap" in Agentic Workflows
Recent articles from Anthropic and Cloudflare (see Here) highlights a critical bottleneck: AI agents struggle with complex, multi-step workflows because they lack state.
While Code Execution (e.g., TypeScript) allows agents to maintain state and structure workflows effectively, it introduces a new problem: Data Bloat. Real-world operations (like SRE log analysis or database dumps) generate massive JSON payloads that explode the context window, making stateful execution prohibitively expensive.
CodeModeTOON bridges this gap. It enables:
- Stateful Execution: Run complex TypeScript workflows to maintain context outside the model.
- Context Efficiency: Use TOON Compression to "zip" the results, allowing agents to process massive datasets without blowing their token budget.
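For example, a single `execute_code` run can hold intermediate results as sandbox variables and return only a compressed digest. A minimal sketch, assuming a hypothetical `kubernetes` server with a `list_pods` tool (the `servers` proxy and `TOON` helper appear in the Quick Start below):

```ts
// Sketch of code meant to run inside execute_code, where these globals are provided:
declare const servers: Record<string, any>;
declare const TOON: { encode(data: unknown): string };

// Hypothetical 'kubernetes' server and 'list_pods' tool - substitute your own config.
const pods = await servers['kubernetes'].list_pods({ namespace: 'prod' }); // big raw JSON stays here
const failing = pods.filter((p: any) => p.status !== 'Running');          // state held in the sandbox
console.log(TOON.encode(failing)); // only this compressed digest reaches the model
```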
## How It Works

```mermaid
graph LR
    A[AI Agent<br/>Claude/Cursor] -->|JSON-RPC| B[CodeModeTOON<br/>Server]
    B -->|Lazy Load| C[Perplexity]
    B -->|Lazy Load| D[Context7]
    B -->|Lazy Load| E[Custom Servers]
    C -->|Raw JSON| B
    D -->|Raw JSON| B
    E -->|Raw JSON| B
    B -->|TOON<br/>Compressed| A
    style B fill:#4f46e5,color:#fff
    style A fill:#10b981,color:#fff
```

*Data Flow: Requests route through CodeModeTOON → servers are lazy-loaded on demand → responses are TOON-compressed before returning to the agent.*
## 🔥 Key Features

### 🗜️ TOON Compression
Reduces token usage by 30-90% for structured data.
- Validated: ~83% savings on Kubernetes audits
- Best for: SRE logs, database dumps, API responses
- How it works: Schema extraction + value compression
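As a rough sketch of that idea (a toy encoder, not the real TOON codec): for a uniform array of objects, the shared keys are emitted once as a header and the values as compact rows.

```ts
// Toy sketch of "schema extraction + value compression" - NOT the real TOON codec.
type Row = Record<string, string | number | boolean>;

function toyEncode(name: string, rows: Row[]): string {
  if (rows.length === 0) return `${name}[0]`;
  const keys = Object.keys(rows[0]);                    // extract the shared schema once
  const header = `${name}[${rows.length}]{${keys.join(',')}}:`;
  const body = rows.map(r => '  ' + keys.map(k => String(r[k])).join(',')).join('\n');
  return `${header}\n${body}`;                          // keys appear once, not per object
}

// In JSON, 50 pods repeat "name"/"status"/"restarts" 50 times; here, once.
const pods = Array.from({ length: 3 }, (_, i) => ({ name: `pod-${i}`, status: 'Running', restarts: 0 }));
console.log(toyEncode('pods', pods));
// pods[3]{name,status,restarts}:
//   pod-0,Running,0
//   pod-1,Running,0
//   pod-2,Running,0
```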
### ⚡ Lazy Loading
Servers only start when needed. Zero overhead for unused tools.
- Best for: Multi-tool workflows, resource-constrained environments
- Performance: Sub-100ms startup for active servers
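Conceptually, lazy loading boils down to caching server handles and spawning a child process only on first use. A simplified sketch, not the actual implementation:

```ts
// Simplified sketch of lazy loading - the real server adds handshakes, cleanup, etc.
import { spawn, type ChildProcess } from 'node:child_process';

const running = new Map<string, ChildProcess>();

function getServer(name: string, command: string, args: string[]): ChildProcess {
  let proc = running.get(name);
  if (!proc) {
    // First use: spawn the underlying MCP server over stdio.
    proc = spawn(command, args, { stdio: ['pipe', 'pipe', 'inherit'] });
    running.set(name, proc);
  }
  return proc; // servers that are never requested are never spawned
}

// e.g. getServer('perplexity', 'npx', ['-y', '<package>'])
```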
### 🔒 Sandboxed Execution
Secure JS execution with auto-proxied MCP tool access.
- Best for: Complex stateful workflows, batch operations
- Security: Uses the Node.js `vm` module (not for multi-tenant use)
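Under the hood this is the standard `node:vm` pattern: snippets run against a controlled context object instead of the host globals. A minimal sketch (the `TOON.encode` stand-in here is just `JSON.stringify`):

```ts
// Minimal sketch of the vm pattern - isolation for convenience, NOT a security boundary.
import vm from 'node:vm';

// Expose only the globals the snippet should see.
const sandbox = vm.createContext({
  console,
  TOON: { encode: (data: unknown) => JSON.stringify(data) },
});

vm.runInContext('console.log(TOON.encode({ ok: true }))', sandbox, { timeout: 5000 });
```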
### 🤖 Agent-Friendly Features
Designed for programmatic discovery and self-correction.
- Approach Suggestions: `suggest_approach` is a meta-tool that recommends the best execution strategy (code vs. workflow vs. direct call).
- Efficiency Metrics: `execute_code` returns operation counts and compression savings to reinforce efficient behavior.
- Recovery Hints: Error messages include actionable next steps for agents (e.g., "Server not found? Try `list_servers`").
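For instance, an agent can ask for a strategy before committing to one. An illustrative call, where the `task` parameter and the response shape are assumptions rather than the exact API:

```ts
// Illustrative only - parameter name and response shape may differ from the real tool.
declare const suggest_approach: (args: { task: string }) => Promise<unknown>;

const hint = await suggest_approach({
  task: 'Audit 50 pods across 3 namespaces and summarize failures',
});
// A plausible answer: { strategy: 'code', reason: 'batchable multi-tool operation' }
console.log(hint);
```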
## Table of Contents
- The Context Trap
- How It Works
- Key Features
- When to Use
- Installation
- Quick Start
- Usage Examples
- Workflows
- Performance Benchmark
- Troubleshooting
- Security
- Contributing
- License
## When to Use CodeModeTOON

**✅ Perfect for:**
- Multi-step AI workflows requiring state management
- Processing large structured datasets (logs, DB dumps, K8s manifests)
- Coordinating multiple MCP servers in parallel
- Token-constrained environments (reducing API costs)
**❌ Not ideal for:**
- Simple single-tool queries
- Unstructured text-heavy responses (compression <10%)
- Multi-tenant production servers (vm module security limitation)
## Installation

### One‑Click (Cursor)

### Manual Setup

Add this to your `~/.cursor/mcp.json`:
```json
{
  "mcpServers": {
    "code-mode-toon": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "code-mode-toon"],
      "env": {
        "CODE_MODE_TOON_CONFIG": "~/.cursor/mcp.json"
      }
    }
  }
}
```

## 🧠 Claude Skills
CodeModeTOON includes a pre-built Claude Skill to make your AI assistant an expert at using this orchestrator.
### code-mode-toon-workflow-expert
A specialized skill that teaches Claude how to:
- Decide when to use a workflow vs ad-hoc code.
- Create new workflows following best practices.
- Orchestrate multiple tools efficiently.
**Installation:**

1. Unzip `claude-skills/code-mode-toon-workflow-expert.skill`.
2. Place the folder in your `.claude/skills/` directory (or import via the Claude desktop app).
## 🤖 AI Assistant Prompts
Copy these prompts into your AI's custom instructions (e.g., .cursorrules or Claude Project instructions) to maximize CodeModeTOON's potential.
### 1. System Identity & Orchestration (Essential)

**Goal:** Teaches the AI to act as an orchestrator and prioritize workflows.

```text
YOU ARE AN AGENTIC ORCHESTRATOR. You have access to "CodeModeTOON", a high-efficiency MCP bridge.
1. PRIORITIZE WORKFLOWS: Before running single tools, check `list_workflows`. If a workflow exists (e.g., `research`, `k8s-detective`), USE IT. It is faster and saves tokens.
2. HANDLE COMPRESSED DATA: Outputs may be "TOON encoded" (highly compressed JSON). This is normal. Do not complain about "unreadable data" - simply parse it or ask for specific fields if needed.
3. BATCH OPERATIONS: Never run 3+ sequential tool calls if they can be batched. Use `execute_code` to run them in a single block.
```

### 2. Tool Discovery (Lazy Loading)
**Goal:** Prevents the AI from giving up if a tool isn't immediately visible.

```text
TOOLS ARE LAZY LOADED. If you need a capability (e.g., "search", "kubernetes", "database") and don't see the tool:
1. DO NOT assume it's missing.
2. RUN `search_tools({ query: "..." })` to find it.
3. RUN `get_tool_api({ serverName: "..." })` to learn how to use it.
4. Only then, execute the tool.
```

### 3. Efficiency & TOON Compression
**Goal:** Enforces token-saving behaviors for large data operations.

```text
OPTIMIZE FOR TOKENS. When fetching large datasets (logs, docs, API responses):
1. ALWAYS wrap the output in `TOON.encode(data)` inside `execute_code`.
2. PREFER structured data (JSON/Objects) over plain text. TOON compresses structure by ~83%, but text by only ~4%.
3. IF synthesizing data, do it server-side (via workflow `synthesize: true`) to avoid pulling raw data into context.
```

## Quick Start
After installation, try this 30-second demo in Claude or Cursor:

```js
// Ask your AI assistant to run this via execute_code
const api = await get_tool_api({ serverName: 'perplexity' });
const result = await servers['perplexity'].perplexity_ask({
  messages: [{ role: 'user', content: "Explain TOON compression" }]
});
console.log(result); // See compression in action! ~40% token savings
```

**What just happened?** The response was automatically TOON-encoded, saving tokens.
## Usage Examples

```js
// Inside execute_code
const api = await get_tool_api({ serverName: 'perplexity' });

// Request large data - automatically compressed!
const result = await servers['perplexity'].perplexity_ask({
  messages: [{ role: 'user', content: "Summarize the history of Rome" }]
});
console.log(result); // Returns TOON-encoded string, saving ~40% tokens
```

```js
// Fetch large documentation from Context7
const api = await get_tool_api({ serverName: 'context7' });
const docs = await servers['context7']['get-library-docs']({
  context7CompatibleLibraryID: 'kubernetes/kubernetes'
});
console.log(TOON.encode(docs)); // Massive compression on structured data
```

```js
// Run a complex research workflow
const result = await workflows.research({
  goal: "Compare xsync vs sync.Map performance",
  queries: ["xsync vs sync.Map benchmarks"],
  synthesize: true,
  outputFile: "/tmp/research.toon"
});
console.log(result.synthesis); // LLM-synthesized findings
```

## Workflows
CodeModeTOON supports Workflows—pre-defined, server-side TypeScript modules that orchestrate multiple MCP tools.
### Research Workflow
A powerful research assistant that:
- Parallelizes data fetching from multiple sources (Context7, Wikipedia, Perplexity).
- Synthesizes findings using LLMs (optional).
- Outputs TOON-encoded files for maximum context efficiency.
- Retries failed requests automatically.
See .workflows/README.md for detailed documentation, usage examples, and AI prompts.
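To give a feel for what such a module looks like, here is a hypothetical shape (the real contract lives in .workflows/README.md; `fetchSource` and `summarize` stand in for actual tool calls and LLM synthesis):

```ts
// Hypothetical shape of a workflow module - see .workflows/README.md for the real contract.
// fetchSource and summarize are stand-ins for real MCP tool calls and LLM synthesis.
const fetchSource = async (query: string) => ({ query, snippets: [] as string[] });
const summarize = (results: unknown[]) => `Synthesized ${results.length} sources`;

export async function research(input: { goal: string; queries: string[]; synthesize?: boolean }) {
  // Fan out to all sources in parallel; tolerate individual failures (retries omitted).
  const settled = await Promise.allSettled(input.queries.map(fetchSource));
  const findings = settled.flatMap(r => (r.status === 'fulfilled' ? [r.value] : []));
  return {
    goal: input.goal,
    findings,
    synthesis: input.synthesize ? summarize(findings) : null,
  };
}
```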
## Performance Benchmark

### Why This Matters

Scenario 2 (~92% savings) demonstrates CodeModeTOON's strength:

| Metric | Original | TOON | Savings |
|--------|----------|------|---------|
| Characters | 37,263 | 2,824 | ~92% |
| Estimated Tokens\* | ~9,315 | ~706 | ~8,600 tokens |
| Cost (Claude Sonnet)\*\* | $0.028 | $0.002 | $0.026 |

\*Assuming 4 chars/token average
\*\*At $3/M input tokens (Claude Sonnet input pricing)
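Those cost figures follow directly from the stated assumptions; a quick sanity check:

```ts
// Reproducing the table's estimates from its stated assumptions.
const estTokens = (chars: number) => chars / 4;                  // ~4 chars per token
const estCost = (chars: number) => (estTokens(chars) * 3) / 1e6; // $3 per million input tokens

console.log(estCost(37_263).toFixed(3)); // "0.028" - original payload
console.log(estCost(2_824).toFixed(3));  // "0.002" - TOON-encoded payload
```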
**Key Insight:** For infrastructure audits, log analysis, or database dumps, TOON compression can reduce token costs by 90%+, making complex agentic workflows feasible within budget.
**Scenario 1: Natural Language Query (History of Rome)**

Unstructured text compresses poorly, as expected.

- Original JSON: 11,651 chars
- TOON Encoded: 11,166 chars
- Compression Ratio: ~4.16% savings
**Scenario 2: Kubernetes Cluster Audit (50 Pods)**

Highly structured, repetitive JSON (infrastructure dumps) compresses extremely well.

- Original JSON: 37,263 chars
- TOON Encoded: 2,824 chars
- Compression Ratio: ~92% savings 📉
## Troubleshooting

### "Server not found" error

**Cause:** CodeModeTOON can't locate your MCP config.

**Solution:** Ensure `CODE_MODE_TOON_CONFIG` points to your config:

```bash
export CODE_MODE_TOON_CONFIG=~/.cursor/mcp.json
```

### TOON encoding not working
**Cause:** Results aren't being encoded.

**Solution:** Use `console.log(TOON.encode(data))`, not `console.log(data)`.
### Lazy server won't load

**Cause:** Server name mismatch.

**Solution:** Verify the server name matches your config. Use `get_tool_api({ serverName: 'name' })` to inspect available servers.
## Security Note

⚠️ The Node.js `vm` module is NOT a security sandbox. CodeModeTOON is suitable for personal AI assistant use (Claude, Cursor) running trusted code, but not for multi-tenant or public services.
## Acknowledgments
- Anthropic: Code execution with MCP
- Cloudflare: Code Mode announcement
## Author
Built by Ziad Hassan (Senior SRE/DevOps) — LinkedIn · GitHub
## Contributing
Contributions are welcome! 🙌
### Ways to Contribute

- **Report bugs** - Open an issue with reproduction steps
- **Suggest features** - Discuss use cases in Issues
- **Add workflows** - See the Workflows section
- **Improve docs** - Documentation PRs are always welcome
Development Setup
git clone https://github.com/ziad-hsn/code-mode-toon.git
cd code-mode-toon
npm install
npm testLicense
MIT License — see LICENSE for details.
