@ezark-publish/agentdesk-mcp

v1.3.0

Published

4 months ago

MCP server for AgentDesk AI-to-AI Service Marketplace. Quality review, service catalog, and marketplace execution — all via MCP tools.

AgentDesk MCP — Adversarial AI Review

Quality control for AI pipelines — one MCP tool. Works with Claude Code, Claude Desktop, and any MCP client.

29.5% of teams do NO evaluation of AI outputs. (LangChain Survey)

Knowledge workers spend 4.3 hours/week fact-checking AI outputs. (Microsoft 2025)

AgentDesk MCP fixes this. Add independent adversarial review to any AI pipeline in 30 seconds.

Quick Start

npm (recommended)

npx @ezark-publish/agentdesk-mcp

Claude Code

claude mcp add agentdesk-mcp -- npx @ezark-publish/agentdesk-mcp

Claude Desktop

{
  "mcpServers": {
    "agentdesk-mcp": {
      "command": "npx",
      "args": ["-y", "@ezark-publish/agentdesk-mcp"],
      "env": { "ANTHROPIC_API_KEY": "sk-ant-..." }
    }
  }
}

HTTP Transport (Streamable HTTP)

Run as an HTTP server for remote access, Smithery hosting, or multi-client setups:

# Start with HTTP transport on port 3100
MCP_HTTP_PORT=3100 npx @ezark-publish/agentdesk-mcp

# Or use the --http flag (defaults to port 3100)
npx @ezark-publish/agentdesk-mcp --http

MCP endpoint: POST http://localhost:3100/mcp

Health check: GET http://localhost:3100/health
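As a rough sketch of what a client sends to the MCP endpoint: MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method with the tool name and its arguments. The helper name `buildToolCall` and the `ReviewArgs` type are hypothetical, introduced only for illustration. A real Streamable HTTP client is also expected to perform the MCP `initialize` handshake first, so in practice an MCP client library is the safer route than hand-built requests.

```typescript
// Hypothetical argument shape, based on the parameters documented under Tools.
interface ReviewArgs {
  output: string;
  criteria?: string;
  review_type?: string;
  model?: string;
}

// Headers a Streamable HTTP client sends with each POST.
const MCP_HEADERS = {
  "Content-Type": "application/json",
  Accept: "application/json, text/event-stream",
};

// Build the JSON-RPC 2.0 body for a tools/call request.
function buildToolCall(
  id: number,
  name: "review_output" | "review_dual",
  args: ReviewArgs
) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name, arguments: args },
  };
}

const body = buildToolCall(1, "review_output", {
  output: "The Eiffel Tower is 500 m tall.",
  review_type: "factual",
});
console.log(JSON.stringify(body));
```

The serialized body would be POSTed to http://localhost:3100/mcp with `MCP_HEADERS`, after the session has been initialized.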

Install from GitHub (alternative)

npm install github:Rih0z/agentdesk-mcp

Requirements

ANTHROPIC_API_KEY environment variable (uses your own key — BYOK)

Tools

`review_output`

Adversarial quality review of any AI-generated output. An independent reviewer assumes the author made mistakes and actively looks for problems.

Input:

| Parameter | Required | Description |
|-----------|----------|-------------|
| output | Yes | The AI-generated output to review |
| criteria | No | Custom review criteria |
| review_type | No | Category: code, content, factual, translation, etc. |
| model | No | Reviewer model (default: claude-sonnet-4-6) |

Output:

{
  "verdict": "PASS | FAIL | CONDITIONAL_PASS",
  "score": 82,
  "issues": [
    {
      "severity": "high",
      "category": "accuracy",
      "description": "Claim about X is unsupported",
      "suggestion": "Add citation or remove claim"
    }
  ],
  "checklist": [
    {
      "item": "Factual accuracy",
      "status": "pass",
      "evidence": "All statistics match cited sources"
    }
  ],
  "summary": "Overall assessment...",
  "reviewer_model": "claude-sonnet-4-6"
}
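A pipeline will usually want to branch on this result. The sketch below types the example output and gates on it; the interfaces mirror the JSON above, while the `shouldProceed` helper, its `minScore` threshold of 70, and the choice to let CONDITIONAL_PASS through when the score is high enough are assumptions made for illustration, not part of the package.

```typescript
type Verdict = "PASS" | "FAIL" | "CONDITIONAL_PASS";

interface ReviewIssue {
  severity: string; // e.g. "high"; the full set of values is not documented here
  category: string;
  description: string;
  suggestion: string;
}

interface ChecklistItem {
  item: string;
  status: string; // e.g. "pass"
  evidence: string;
}

interface ReviewResult {
  verdict: Verdict;
  score: number;
  issues: ReviewIssue[];
  checklist: ChecklistItem[];
  summary: string;
  reviewer_model: string;
}

// Hypothetical gate: parse the review JSON and decide whether the
// pipeline should continue with the reviewed output.
function shouldProceed(rawJson: string, minScore = 70): boolean {
  const result = JSON.parse(rawJson) as ReviewResult;
  if (result.verdict === "FAIL") return false;
  return result.score >= minScore;
}
```

A stricter pipeline could treat CONDITIONAL_PASS as a failure, or route it to a human for a second look.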

`review_dual`

Dual adversarial review — two independent reviewers assess the output from different angles, then a merge agent combines findings.

If either reviewer finds a critical issue, the merged verdict is FAIL
The merged score is the lower of the two reviewer scores
Issues from both reviewers are combined and deduplicated

Use for high-stakes outputs where quality is critical.

Same parameters as review_output.
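The merge rules above can be approximated deterministically. In the package the merge is performed by a merge agent, so the function below is only a sketch of the stated rules, not the actual implementation. Assumptions: a critical issue is one whose `severity` equals `"critical"`, duplicates are detected by category plus normalized description, and when no critical issue exists the stricter of the two verdicts wins.

```typescript
type Verdict = "PASS" | "FAIL" | "CONDITIONAL_PASS";

interface Issue {
  severity: string;
  category: string;
  description: string;
  suggestion: string;
}

interface Review {
  verdict: Verdict;
  score: number;
  issues: Issue[];
}

// Assumed strictness order, used only when no critical issue is present.
const STRICTNESS: Record<Verdict, number> = {
  PASS: 0,
  CONDITIONAL_PASS: 1,
  FAIL: 2,
};

function mergeReviews(a: Review, b: Review): Review {
  // Combine issues, dropping duplicates by category + normalized description.
  const seen = new Set<string>();
  const issues: Issue[] = [];
  for (const issue of [...a.issues, ...b.issues]) {
    const key = `${issue.category}::${issue.description.trim().toLowerCase()}`;
    if (!seen.has(key)) {
      seen.add(key);
      issues.push(issue);
    }
  }

  // Any critical issue from either reviewer forces FAIL.
  const hasCritical = issues.some((i) => i.severity === "critical");
  const stricter =
    STRICTNESS[a.verdict] >= STRICTNESS[b.verdict] ? a.verdict : b.verdict;

  return {
    verdict: hasCritical ? "FAIL" : stricter,
    score: Math.min(a.score, b.score), // take the lower score
    issues,
  };
}
```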

How It Works

Adversarial prompting: The reviewer is instructed to assume mistakes were made. No benefit of the doubt.
Evidence-based checklist: Every PASS item requires specific evidence. Items without evidence are automatically downgraded to FAIL.
Anti-gaming validation: If >30% of checklist items lack evidence, the entire review is forced to FAIL with a capped score of 50.
Structured output: Verdict + numeric score + categorized issues + checklist (not just "looks good").
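The evidence and anti-gaming rules described above can be sketched as a post-processing step on the reviewer's raw result. This is an illustrative reading of those two rules, not the package's code: the `validateReview` name, the treatment of empty or whitespace-only evidence as missing, and the choice to cap the score with `Math.min` are assumptions.

```typescript
interface ChecklistItem {
  item: string;
  status: "pass" | "fail";
  evidence?: string;
}

interface Review {
  verdict: "PASS" | "FAIL" | "CONDITIONAL_PASS";
  score: number;
  checklist: ChecklistItem[];
}

const MAX_MISSING_RATIO = 0.3; // the ">30%" threshold from the text
const CAPPED_SCORE = 50;

function validateReview(review: Review): Review {
  const lacksEvidence = (c: ChecklistItem): boolean =>
    !c.evidence || c.evidence.trim() === "";

  // Rule 1: a "pass" without evidence is downgraded to "fail".
  const checklist = review.checklist.map((c) =>
    c.status === "pass" && lacksEvidence(c)
      ? { ...c, status: "fail" as const }
      : c
  );

  // Rule 2: too many items without evidence fails the whole review.
  const total = review.checklist.length;
  const missing = review.checklist.filter(lacksEvidence).length;
  if (total > 0 && missing / total > MAX_MISSING_RATIO) {
    return {
      ...review,
      checklist,
      verdict: "FAIL",
      score: Math.min(review.score, CAPPED_SCORE),
    };
  }
  return { ...review, checklist };
}
```

With three checklist items and one lacking evidence (33%), the review is forced to FAIL; with four items and one lacking evidence (25%), only that item is downgraded.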

Use Cases

Code review: Check for bugs, security issues, performance problems
Content review: Verify accuracy, readability, SEO, audience fit
Factual verification: Validate claims in AI-generated text
Translation quality: Check accuracy and naturalness
Data extraction: Verify completeness and correctness
Any AI output: Summaries, reports, proposals, emails, etc.

Why Not Just Ask the Same AI to Review?

Self-review has systematic leniency bias. An LLM reviewing its own output shares the same blind spots that created the errors. Research shows models are 34% more likely to use confident language when hallucinating.

AgentDesk uses a separate reviewer invocation with adversarial prompting — fundamentally different from self-review.

Comparison

| Feature | AgentDesk MCP | Manual prompt | Braintrust | DeepEval |
|---------|--------------|---------------|------------|----------|
| One-tool setup | Yes | No | No | No |
| Adversarial review | Yes | DIY | No | No |
| Dual reviewer | Yes | DIY | No | No |
| Anti-gaming validation | Yes | No | No | No |
| No SDK required | Yes | Yes | No | No |
| MCP native | Yes | No | No | No |

Limitations

Prompt injection: Like all LLM-as-judge systems, adversarial inputs could attempt to manipulate reviewer verdicts. The anti-gaming validation layer mitigates superficial gaming, but determined adversarial inputs remain a challenge. For high-stakes use cases, combine with deterministic validation.
BYOK cost: Each review_output call makes 1 LLM API call; each review_dual call makes 3 (two reviewers plus the merge agent). Factor this into your pipeline costs.
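For budgeting, the per-tool call counts above translate into a simple estimate. The helper below is a hypothetical sketch that only counts API calls; turning that count into a dollar figure depends on the model's pricing and the size of each reviewed output, which are not covered here.

```typescript
// LLM API calls per tool invocation, as stated above.
const CALLS_PER_TOOL = { review_output: 1, review_dual: 3 } as const;
type ToolName = keyof typeof CALLS_PER_TOOL;

// Estimate total LLM API calls for a planned number of tool invocations.
function estimateLlmCalls(usage: Partial<Record<ToolName, number>>): number {
  let total = 0;
  for (const tool of Object.keys(CALLS_PER_TOOL) as ToolName[]) {
    total += (usage[tool] ?? 0) * CALLS_PER_TOOL[tool];
  }
  return total;
}

// 100 single reviews plus 20 dual reviews: 100 * 1 + 20 * 3 = 160 calls.
console.log(estimateLlmCalls({ review_output: 100, review_dual: 20 }));
```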

Hosted API (Separate Product)

For teams that prefer HTTP integration, a hosted REST API with additional features (agent marketplace, context learning, workflows) is available at agentdesk.usedevtools.com.

Development

git clone https://github.com/Rih0z/agentdesk-mcp.git
cd agentdesk-mcp
npm install
npm test        # 35 tests
npm run build

License

MIT

Built by EZARK Consulting | Web Version