agentfold-mcp-server

v2.0.0

Published

a month ago

MCP Server implementation of AgentFold - Long-Horizon Web Agents with Proactive Context Management. v2.0 adds analytics, validation, quality evaluation, and performance benchmarking.


opakgraham

mcp mcp-server model-context-protocol agent agentfold context-management llm ai analytics validation quality-evaluation benchmarking web-automation playwright

AgentFold MCP Server v2.0

MCP Server implementation of AgentFold: Long-Horizon Web Agents with Proactive Context Management based on arXiv:2510.24699.

Version 2.0 adds objective analytics, robust validation, comprehensive evaluation, and performance benchmarking.

Overview

AgentFold is a novel agent paradigm that treats context as a dynamic cognitive workspace to be actively sculpted, rather than a passive log to be filled. It addresses the fundamental trade-off in context management for long-horizon tasks through proactive "folding" operations.

Key Features

Core Features:

Proactive Context Management: Automatically manages context to prevent saturation
Dual Folding Operations:
- Condensation: Granular condensations to preserve vital, fine-grained details
- Consolidation: Deep consolidations to abstract away entire multi-step sub-tasks
Dynamic Cognitive Workspace: Context is actively managed at multiple scales
MCP Integration: Full Model Context Protocol support for seamless integration
Built for Web Automation: Designed for long-horizon web tasks; pairs well with Playwright MCP

v2.0 Enhanced Features: ⭐

📊 Analytics Engine: 20+ objective metrics (context efficiency, folding performance, information density)
🔍 AI-Powered Insights: Health scores, recommendations, warnings, trends, and predictions
📈 Comprehensive Reports: Executive summaries with detailed analysis
✅ Quality Evaluation: Objective scoring for action, observation, thought quality
🏆 Performance Benchmarking: Measure and compare performance over time
🛡️ Robust Validation: Comprehensive input validation and error handling
📝 Step Analysis: Automatic classification and importance scoring

💡 Recommended: Combine AgentFold with Playwright MCP for powerful web automation with intelligent context management. See INTEGRATION_PLAYWRIGHT.md for details.
📚 New in v2.0: See ENHANCED_FEATURES.md for complete feature documentation.

Installation

cd agentfold-mcp-server
npm install
npm run build

Usage

As MCP Server

Add to your MCP client configuration (e.g., Claude Desktop):

{
  "mcpServers": {
    "agentfold": {
      "command": "node",
      "args": ["/path/to/agentfold-mcp-server/dist/index.js"]
    }
  }
}

Available Tools (17 Total)

Core Tools (4)

1. agentfold_add_step - Add trajectory step with automatic validation and analysis

action (required, max 5000 chars): The action taken
observation (required, max 50000 chars): The observation/result
thought (optional, max 10000 chars): Reasoning process
metadata (optional, max 10KB): Additional metadata

2. agentfold_get_context - Get formatted context for LLM

3. agentfold_get_stats - Get basic statistics

4. agentfold_get_workspace - Get raw workspace state
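
The size limits documented for agentfold_add_step can be sketched as a small validation function. This is an illustration only, not the server's actual code: the `StepInput` type, the `validateStep` name, and measuring metadata size by the length of its JSON serialization are all assumptions.

```typescript
// Hypothetical payload shape, based on the documented agentfold_add_step parameters.
interface StepInput {
  action: string;
  observation: string;
  thought?: string;
  metadata?: Record<string, unknown>;
}

// Limits as documented for agentfold_add_step.
const LIMITS = {
  action: 5000,
  observation: 50000,
  thought: 10000,
  metadataSize: 10 * 1024,
};

// Returns a list of problems; an empty list means the input passes.
function validateStep(input: StepInput): string[] {
  const errors: string[] = [];
  if (input.action.length === 0) errors.push("action is required");
  if (input.action.length > LIMITS.action) errors.push("action exceeds 5000 chars");
  if (input.observation.length === 0) errors.push("observation is required");
  if (input.observation.length > LIMITS.observation) {
    errors.push("observation exceeds 50000 chars");
  }
  if (input.thought !== undefined && input.thought.length > LIMITS.thought) {
    errors.push("thought exceeds 10000 chars");
  }
  // Assumption: metadata size is approximated by its serialized JSON length.
  if (
    input.metadata !== undefined &&
    JSON.stringify(input.metadata).length > LIMITS.metadataSize
  ) {
    errors.push("metadata exceeds 10KB");
  }
  return errors;
}
```

Collecting every violation instead of failing on the first one lets a caller report all problems with a step in a single response.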

Goal Management (3)

5. agentfold_set_goal - Set main goal (validated, max 1000 chars)

6. agentfold_add_subgoal - Add sub-goal (validated, max 500 chars)

7. agentfold_complete_subgoal - Mark sub-goal as completed

Analytics Tools (3) ⭐ NEW

8. agentfold_get_analytics - Get comprehensive metrics

Returns: 20+ objective metrics including context efficiency, folding performance, information density

9. agentfold_get_insights - Get AI-powered insights

Returns: Health scores, recommendations, warnings, trends, predictions

10. agentfold_get_comprehensive_report - Generate full analytics report

Returns: Executive summary with metrics, insights, and analyses

Quality & Evaluation (4) ⭐ NEW

11. agentfold_get_quality_report - Get quality assessment

Returns: Objective scores for action, observation, thought quality, coherence, completeness

12. agentfold_benchmark_performance - Benchmark current performance

Returns: Steps per goal, time per goal, context efficiency, throughput

13. agentfold_set_baseline - Set performance baseline for comparison

14. agentfold_compare_with_baseline - Compare with baseline

Returns: Improvements and regressions report
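
A baseline comparison of the kind described for agentfold_compare_with_baseline might look like the sketch below. The field names mirror the metrics listed for agentfold_benchmark_performance, but the exact names, the `compareWithBaseline` helper, and the direction in which each metric counts as "better" are assumptions.

```typescript
// Hypothetical benchmark shape; field names are assumed, not taken from the server.
interface Benchmark {
  stepsPerGoal: number;
  timePerGoalMs: number;
  contextEfficiency: number;
  throughput: number;
}

// Assumption: fewer steps and less time are better; higher efficiency and throughput are better.
const LOWER_IS_BETTER: Record<keyof Benchmark, boolean> = {
  stepsPerGoal: true,
  timePerGoalMs: true,
  contextEfficiency: false,
  throughput: false,
};

interface Comparison {
  improvements: string[];
  regressions: string[];
}

function compareWithBaseline(baseline: Benchmark, current: Benchmark): Comparison {
  const result: Comparison = { improvements: [], regressions: [] };
  for (const key of Object.keys(LOWER_IS_BETTER) as (keyof Benchmark)[]) {
    const delta = current[key] - baseline[key];
    if (delta === 0) continue; // unchanged metrics appear in neither list
    const improved = LOWER_IS_BETTER[key] ? delta < 0 : delta > 0;
    (improved ? result.improvements : result.regressions).push(key);
  }
  return result;
}
```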

State Management (3)

15. agentfold_export_state - Export workspace as JSON

16. agentfold_import_state - Import workspace (validated, max 10MB)

17. agentfold_reset - Reset workspace (clears all data including analytics)
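
The validation mentioned for agentfold_import_state could be approximated as follows. This is a sketch under stated assumptions: the 10MB cap is measured in characters of the JSON string, the state is required to be a JSON object, and `parseImportedState` is a hypothetical helper name.

```typescript
// Documented 10MB cap, approximated here as a character count (assumption).
const MAX_IMPORT_CHARS = 10 * 1024 * 1024;

function parseImportedState(json: string): Record<string, unknown> {
  if (json.length > MAX_IMPORT_CHARS) {
    throw new RangeError("state exceeds 10MB");
  }
  const parsed: unknown = JSON.parse(json); // throws SyntaxError on malformed JSON
  // Assumption: a workspace export is a JSON object, not an array or primitive.
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new TypeError("state must be a JSON object");
  }
  return parsed as Record<string, unknown>;
}
```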

📚 See ENHANCED_FEATURES.md for detailed documentation of all tools.

How It Works

Folding Operations

AgentFold implements two types of folding operations inspired by human cognitive processes:

Condensation

Purpose: Preserve vital, fine-grained details while reducing context size
Trigger: When context utilization exceeds 70% (configurable)
Method: Selectively condenses less important steps while maintaining key information
Use Case: Preserving specific data points, URLs, or critical observations

Consolidation

Purpose: Abstract away entire multi-step sub-tasks
Trigger: When context utilization exceeds 50% (configurable)
Method: Groups related steps into high-level summaries
Use Case: Completed sub-tasks that don't need granular details
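
The two triggers above can be expressed as a threshold check on context utilization. The sketch below only reports which thresholds are exceeded; how the server prioritizes the two operations when both fire is not documented here, and `exceededTriggers` is a hypothetical helper.

```typescript
type FoldOperation = "condensation" | "consolidation";

// Default thresholds as documented; utilization is a fraction in [0, 1].
const THRESHOLDS: Record<FoldOperation, number> = {
  condensation: 0.7,
  consolidation: 0.5,
};

// Returns every folding operation whose trigger threshold is exceeded.
function exceededTriggers(utilization: number): FoldOperation[] {
  return (Object.keys(THRESHOLDS) as FoldOperation[]).filter(
    (op) => utilization > THRESHOLDS[op],
  );
}
```

With the defaults, a utilization of 0.6 exceeds only the consolidation threshold, while 0.8 exceeds both.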

Context Management Strategy

┌─────────────────────────────────────────┐
│     Cognitive Workspace (8000 tokens)   │
├─────────────────────────────────────────┤
│  Recent Steps (always preserved)        │
│  - Step N                                │
│  - Step N-1                              │
│  - Step N-2                              │
│  - Step N-3                              │
│  - Step N-4                              │
├─────────────────────────────────────────┤
│  Active Steps (may be folded)           │
│  - Step N-5                              │
│  - Step N-6                              │
│  - ...                                   │
├─────────────────────────────────────────┤
│  Folded Segments                         │
│  - [CONSOLIDATED] Steps 1-10: Login     │
│  - [CONDENSED] Steps 11-15: Search      │
│  - [CONSOLIDATED] Steps 16-25: Extract  │
└─────────────────────────────────────────┘
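
The split between recent and active steps in the diagram can be sketched as a simple partition of the trajectory. `partitionSteps` and the `Partition` type are hypothetical names, and the sketch assumes steps are stored oldest first.

```typescript
interface Partition<T> {
  active: T[]; // older steps, eligible for folding
  recent: T[]; // newest steps, always preserved
}

// Splits a chronologically ordered trajectory (oldest first).
function partitionSteps<T>(steps: T[], preserveRecent: number): Partition<T> {
  const cut = Math.max(0, steps.length - preserveRecent);
  return { active: steps.slice(0, cut), recent: steps.slice(cut) };
}
```

With eight steps and `preserveRecent` set to 5, the three oldest steps become candidates for folding and the five newest stay verbatim.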

Configuration

Default configuration (can be customized):

{
  condensationThreshold: 0.7,      // 70% context utilization
  consolidationThreshold: 0.5,     // 50% context utilization
  maxActiveSteps: 20,              // Maximum active steps before folding
  importanceDecayFactor: 0.95,     // Importance decay over time
  preserveRecentSteps: 5,          // Always keep N most recent steps
}
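
To give a feel for `importanceDecayFactor` and `maxActiveSteps`, the sketch below assumes importance decays exponentially per step and that the active-step limit is a simple count check. Neither formula is confirmed by this README, and both helper names are hypothetical.

```typescript
// Documented default; the exponential per-step form below is an assumption.
const IMPORTANCE_DECAY_FACTOR = 0.95;

// Hypothetical helper: importance of a step after `age` newer steps have been added.
function decayedImportance(
  initial: number,
  age: number,
  factor: number = IMPORTANCE_DECAY_FACTOR,
): number {
  return initial * Math.pow(factor, age);
}

// Hypothetical helper: has the active window outgrown maxActiveSteps?
function exceedsActiveLimit(activeSteps: number, maxActiveSteps: number = 20): boolean {
  return activeSteps > maxActiveSteps;
}
```

Under this assumption, a step that starts with importance 1.0 falls to roughly 0.36 once 20 newer steps have been added, since 0.95^20 is about 0.358.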

Example Workflow

// Pseudocode: each call below stands for an MCP tool invocation issued by the client.
// Set the main goal
await agentfold_set_goal({
  goal: "Research and summarize recent advances in quantum computing"
});

// Add sub-goals
await agentfold_add_subgoal({
  subgoal: "Find authoritative sources"
});

await agentfold_add_subgoal({
  subgoal: "Extract key findings"
});

// Execute steps
await agentfold_add_step({
  action: "search for quantum computing papers",
  observation: "Found 15 papers from 2024",
  thought: "Should focus on papers with high citation counts"
});

await agentfold_add_step({
  action: "open first paper",
  observation: "Title: Advances in Quantum Error Correction...",
  thought: "This looks relevant to the goal"
});

// ... more steps ...

// Get formatted context for LLM
const context = await agentfold_get_context();

// Check statistics
const stats = await agentfold_get_stats();
console.log(`Context utilization: ${(stats.contextUtilization * 100).toFixed(1)}%`);
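
The workflow above writes tool calls as plain functions for readability. On the wire, an MCP client sends each one as a JSON-RPC `tools/call` request. The sketch below builds such a request; `buildToolCall` is a hypothetical helper, and in practice an MCP client library assembles and sends these messages for you.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper that wraps a tool name and its arguments in a request envelope.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown> = {},
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const setGoalRequest = buildToolCall(1, "agentfold_set_goal", {
  goal: "Research and summarize recent advances in quantum computing",
});
```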

Architecture

src/
├── types.ts              # Type definitions
├── folding-engine.ts     # Core folding operations
├── workspace-manager.ts  # Cognitive workspace management
└── index.ts             # MCP server implementation

Research Reference

This implementation is based on the paper:

AgentFold: Long-Horizon Web Agents with Proactive Context Management
Rui Ye, Zhongwang Zhang, Kuan Li, et al.
arXiv:2510.24699 (2025)

Key contributions from the paper:

Proactive context management paradigm
Dual-scale folding operations (condensation & consolidation)
Dynamic cognitive workspace concept
Reported performance gains on BrowseComp benchmarks

Integration with Playwright MCP

We highly recommend pairing AgentFold with Playwright MCP for web automation tasks. The combination provides:

✅ Intelligent Context Management: AgentFold manages long browser interaction histories
✅ Automatic Folding: Browser actions automatically folded when context grows
✅ Goal Tracking: Track multi-step web workflows with sub-goals
✅ Scalable: Handle 100+ browser interactions without context overflow

Quick Setup

{
  "mcpServers": {
    "agentfold": {
      "command": "node",
      "args": ["/path/to/agentfold-mcp-server/dist/index.js"]
    },
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}

Example Usage

// Set goal
await agentfold_set_goal({ goal: "Research quantum computing papers on arXiv" });

// Use Playwright to browse, then record the step in AgentFold
await browser_navigate({ url: "https://arxiv.org" });
await agentfold_add_step({
  action: "navigate to arXiv",
  observation: "Successfully loaded homepage"
});

// Continue browsing... AgentFold automatically manages context!

See INTEGRATION_PLAYWRIGHT.md for the complete guide.

License

MIT

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Future Enhancements

[ ] ML-based folding decision model
[ ] Multi-modal context support
[ ] Advanced importance scoring
[ ] Persistent storage backend
[ ] Real-time visualization dashboard
[ ] Integration with web browsing tools