@jayarrowz/mcp-arsr

v1.0.1

Published

2 months ago

Adaptive Retrieval-Augmented Self-Refinement MCP Server — a closed-loop system that lets LLMs iteratively verify and correct their own claims using uncertainty-guided retrieval.

0High
0Medium
0Low

jayarrowz

ARSR MCP Server

Adaptive Retrieval-Augmented Self-Refinement — a closed-loop MCP server that lets LLMs iteratively verify and correct their own claims using uncertainty-guided retrieval.

What it does

Unlike one-shot RAG (retrieve → generate), ARSR runs a refinement loop:

Generate draft → Decompose claims → Score uncertainty
       ↑                                    ↓
   Decide stop ← Revise with evidence ← Retrieve for low-confidence claims

The key insight: retrieval is guided by uncertainty. Only claims the model is unsure about trigger evidence fetching, and the queries are adversarial — designed to disprove the claim, not just confirm it.

Architecture

The server exposes 6 MCP tools. The outer LLM (Claude, GPT, etc.) orchestrates the loop by calling them in sequence:

| # | Tool | Purpose | |---|------|---------| | 1 | arsr_draft_response | Generate initial candidate answer (returns is_refusal flag) | | 2 | arsr_decompose_claims | Split into atomic verifiable claims | | 3 | arsr_score_uncertainty | Estimate confidence via semantic entropy | | 4 | arsr_retrieve_evidence | Web search for low-confidence claims | | 5 | arsr_revise_response | Rewrite draft with evidence | | 6 | arsr_should_continue | Decide: iterate or finalize |

Inner LLM: Tools 1-5 use Claude Haiku internally for intelligence (query generation, claim extraction, evidence evaluation). This keeps costs low while the outer model handles orchestration.

Refusal detection: arsr_draft_response returns a structured is_refusal flag (classified by the inner LLM) indicating whether the draft is a non-answer. When is_refusal is true, downstream tools (decompose, revise) pivot to extracting claims from the original query and building an answer from retrieved evidence instead of trying to refine a refusal.

Web Search: arsr_retrieve_evidence uses the Anthropic API's built-in web search tool — no external search API keys needed.

Setup

Prerequisites

Node.js 18+
An Anthropic API key

Install & Build

cd arsr-mcp-server
npm install
npm run build

Environment

export ANTHROPIC_API_KEY="sk-ant-..."

Run

stdio mode (for Claude Desktop, Cursor, etc.):

npm start

HTTP mode (for remote access):

TRANSPORT=http PORT=3001 npm start

Claude Desktop Configuration

Add to your claude_desktop_config.json:

Npm:

{
  "mcpServers": {
    "arsr": {
      "command": "npx",
      "args": ["@jayarrowz/mcp-arsr"],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-...",
        "ARSR_MAX_ITERATIONS": "3",
        "ARSR_ENTROPY_SAMPLES": "3",
        "ARSR_RETRIEVAL_STRATEGY": "adversarial",
        "ARSR_INNER_MODEL": "claude-haiku-4-5-20251001"
      }
    }
  }
}

Local build:

{
  "mcpServers": {
    "arsr": {
      "command": "node",
      "args": ["/path/to/arsr-mcp-server/dist/src/index.js"],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-...",
        "ARSR_MAX_ITERATIONS": "3",
        "ARSR_ENTROPY_SAMPLES": "3",
        "ARSR_RETRIEVAL_STRATEGY": "adversarial",
        "ARSR_INNER_MODEL": "claude-haiku-4-5-20251001"
      }
    }
  }
}

How the outer LLM uses it

The orchestrating LLM calls the tools in sequence:

1. draft = arsr_draft_response({ query: "When was Tesla founded?" })
   // draft.is_refusal indicates if the inner LLM refused to answer
2. claims = arsr_decompose_claims({ draft: draft.draft, original_query: "When was Tesla founded?", is_refusal: draft.is_refusal })
3. scored = arsr_score_uncertainty({ claims: claims.claims })
4. low = scored.scored.filter(c => c.confidence < 0.85)
5. evidence = arsr_retrieve_evidence({ claims_to_check: low })
6. revised = arsr_revise_response({ draft: draft.draft, evidence: evidence.evidence, scored: scored.scored, original_query: "When was Tesla founded?", is_refusal: draft.is_refusal })
7. decision = arsr_should_continue({ iteration: 1, scored: revised_scores })
   → if "continue": go to step 2 with revised text
   → if "stop": return revised.revised to user

Configuration

All settings can be overridden via environment variables, falling back to defaults if unset:

| Setting | Env var | Default | Description | |---------|---------|---------|-------------| | max_iterations | ARSR_MAX_ITERATIONS | 3 | Budget limit for refinement loops | | confidence_threshold | ARSR_CONFIDENCE_THRESHOLD | 0.85 | Claims above this skip retrieval | | entropy_samples | ARSR_ENTROPY_SAMPLES | 3 | Rephrasings for semantic entropy | | retrieval_strategy | ARSR_RETRIEVAL_STRATEGY | adversarial | adversarial, confirmatory, or balanced | | inner_model | ARSR_INNER_MODEL | claude-haiku-4-5-20251001 | Model for internal intelligence |

Cost estimate

Per refinement loop iteration (assuming ~5 claims, 3 low-confidence):

Inner LLM calls: ~6-10 Haiku calls ≈ $0.002-0.005
Web searches: 6-9 queries ≈ included in API
Typical total for 2 iterations: < $0.02

Images

Before:

After:

License

MIT