mcp-rca

v0.4.0

Published

7 months ago

Root Cause Analysis MCP server scaffolding.

Vulnerabilities: 0 high, 0 medium, 0 low

mako10k

mcp model context protocol root cause analysis incident response

mcp-rca

Root Cause Analysis MCP server that helps SRE teams structure observations, hypotheses, and test plans while collaborating with an LLM.

Highlights

Prompt-based guidance: MCP prompts guide the LLM through each RCA phase
- rca_start_investigation - Begin investigation with structured initial steps
- rca_next_step - Get context-aware recommendations based on case state
- rca_hypothesis_propose - Generate testable root cause hypotheses
- rca_verification_planning - Create effective test plans
- rca_conclusion_guide - Document conclusions with root causes and follow-ups
LLM-oriented tools: Guidance tools provide best practices and phase-specific checklists
- guidance_best_practices - RCA principles and anti-patterns
- guidance_phase - Phase-specific steps and red flags
- guidance_prompt_scaffold - Structured output formats for tasks
- guidance_followups - Post-conclusion follow-up actions
- guidance_prompts_catalog - Discover available prompts with default templates
- guidance_tools_catalog - Comprehensive tool catalog with workflow guidance
Hypothesis generation returns persisted objects with IDs
- hypothesis_propose persists generated hypotheses and returns each item with id, caseId, createdAt, and updatedAt.
- When the generator's output includes a verification plan, test_plan_create is called automatically and a minimal summary of the plan (method, expected, and optionally metric) is attached to the hypothesis.
Git/deploy metadata on Case / Observation / TestPlan
- Optional fields: gitBranch, gitCommit, deployEnv.
- Set on create and update tools; passing null on update clears the field.

Installation

npm install mcp-rca

To launch the server directly as a CLI:

npx mcp-rca

The server communicates over stdio and can be attached to any MCP-compatible client. CLI flags include --help (-h) for usage and --version (-v) to print the current release.
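As an illustration of attaching the server to a client, the sketch below builds a configuration entry of the shape several MCP clients read from a JSON file. The `mcpServers` key and the `rca` entry name are assumptions about the client, not something mcp-rca defines; check your client's documentation for the exact format.

```typescript
// Shape of a stdio server entry as used by several MCP clients.
// Assumption: your client reads entries like this; mcp-rca itself does not define it.
interface StdioServerEntry {
  command: string;
  args: string[];
}

// Builds a config object that launches mcp-rca through npx.
// "rca" is a hypothetical entry name; any label your client accepts will do.
function buildClientConfig(
  entryName: string = "rca",
): { mcpServers: Record<string, StdioServerEntry> } {
  return {
    mcpServers: {
      [entryName]: { command: "npx", args: ["mcp-rca"] },
    },
  };
}

// Print the JSON you would adapt for the client's config file.
console.log(JSON.stringify(buildClientConfig(), null, 2));
```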

Getting Started (Development)

Clone the repository and install dependencies:

git clone https://github.com/mako10k/mcp-rca.git
cd mcp-rca
npm install

Launch the developer server with hot reloading:
```
npm run dev
```
Produce a production bundle (emits dist/ and copies prompt assets):
```
npm run build
```

Project Layout

src/
  framework/         # Local stub for MCP server lifecycle
  server.ts          # MCP server entrypoint
  schema/            # TypeScript data models
  tools/             # Tool handlers surfaced to MCP clients
  llm/               # Prompt assets and LLM utilities
data/
  .gitkeep           # Runtime storage directory (cases.json generated at runtime)
scripts/
  copy-assets.mjs    # Copies static prompt assets into dist/ post-build

Refer to AGENT.md for the full specification, roadmap, and design guidelines.

Quick Start: Using Prompts

MCP prompts guide your investigation through each phase:

1. Start Investigation

Use prompt: rca_start_investigation
→ Creates a structured plan for case creation and initial observations

2. Track Progress

Use prompt: rca_next_step with caseId
→ Analyzes current state and suggests next actions

3. Generate Hypotheses

Use prompt: rca_hypothesis_propose with caseId
→ Guides hypothesis generation with best practices
→ Then call tool: hypothesis_propose to create and persist hypotheses

4. Plan Verification

Use prompt: rca_verification_planning with caseId, hypothesisId, hypothesisText
→ Provides test plan templates and prioritization guidance
→ Then call tool: test_plan_create to create verification plans

5. Document Conclusion

Use prompt: rca_conclusion_guide with caseId
→ Guides documentation of root causes, fixes, and follow-ups
→ Then call tool: conclusion_finalize to close the case
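The five steps above can be summarized as a small lookup table, which may be handy when scripting a walkthrough or building a checklist. This is only a sketch: the `PhaseStep` type and `describeWorkflow` helper are hypothetical, and the argument lists contain only the arguments named in the steps above (a prompt may accept others).

```typescript
// One row per phase: the prompt to use, the arguments named in this README,
// and the tool (if any) to call afterwards.
interface PhaseStep {
  phase: string;
  prompt: string;
  promptArgs: string[];
  followUpTool?: string;
}

const RCA_WORKFLOW: PhaseStep[] = [
  { phase: "Start Investigation", prompt: "rca_start_investigation", promptArgs: [] },
  { phase: "Track Progress", prompt: "rca_next_step", promptArgs: ["caseId"] },
  {
    phase: "Generate Hypotheses",
    prompt: "rca_hypothesis_propose",
    promptArgs: ["caseId"],
    followUpTool: "hypothesis_propose",
  },
  {
    phase: "Plan Verification",
    prompt: "rca_verification_planning",
    promptArgs: ["caseId", "hypothesisId", "hypothesisText"],
    followUpTool: "test_plan_create",
  },
  {
    phase: "Document Conclusion",
    prompt: "rca_conclusion_guide",
    promptArgs: ["caseId"],
    followUpTool: "conclusion_finalize",
  },
];

// Renders one checklist line per phase.
function describeWorkflow(steps: PhaseStep[]): string[] {
  return steps.map((s, i) => {
    const args = s.promptArgs.length > 0 ? ` (${s.promptArgs.join(", ")})` : "";
    const tool = s.followUpTool ? `, then tool ${s.followUpTool}` : "";
    return `${i + 1}. ${s.phase}: prompt ${s.prompt}${args}${tool}`;
  });
}

console.log(describeWorkflow(RCA_WORKFLOW).join("\n"));
```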

LLM Guidance Tools

Call guidance tools at any time for additional support:

guidance_best_practices - Core RCA principles
guidance_phase - Phase-specific checklists (observation/hypothesis/testing/conclusion)
guidance_prompt_scaffold - Output format templates for specific tasks
guidance_followups - Prevention and follow-up suggestions

MCP Tool Highlights

hypothesis_propose

Input (summary):

{
  "caseId": "case_...",
  "text": "Short incident summary",
  "rationale": "Optional background",
  "context": { "service": "api", "region": "us-east-1" },
  "logs": "... optional log snippets ..."
}

Output (each hypothesis is persisted and includes identifiers; an initial test plan may be present if provided by the generator):

{
  "hypotheses": [
    {
      "id": "hyp_...",
      "caseId": "case_...",
      "text": "Cache node eviction storm caused by oversized payloads",
      "rationale": "Spike correlates with payload growth and cache TTL",
      "createdAt": "2025-10-21T00:00:00.000Z",
      "updatedAt": "2025-10-21T00:00:00.000Z",
      "testPlan": {
        "id": "tp_...",
        "hypothesisId": "hyp_...",
        "method": "Reproduce with oversized payloads and inspect eviction rate",
        "expected": "Evictions rise sharply with payload size > X",
        "metric": "cache.evictions"
      }
    }
  ]
}
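For callers that consume this output in code, the shape can be described with a few types. The sketch below is inferred from the example above rather than taken from the package's schema; which fields are optional (`rationale`, `metric`, `testPlan`) is an assumption, and `partitionByTestPlan` is a hypothetical helper.

```typescript
// Types inferred from the example output; optionality is an assumption.
interface TestPlanSummary {
  id: string;
  hypothesisId: string;
  method: string;
  expected: string;
  metric?: string;
}

interface Hypothesis {
  id: string;
  caseId: string;
  text: string;
  rationale?: string;
  createdAt: string;
  updatedAt: string;
  testPlan?: TestPlanSummary;
}

interface HypothesisProposeOutput {
  hypotheses: Hypothesis[];
}

// Splits hypotheses into those that already carry an initial test plan
// and those that still need a test_plan_create call.
function partitionByTestPlan(
  output: HypothesisProposeOutput,
): { planned: Hypothesis[]; unplanned: Hypothesis[] } {
  const planned = output.hypotheses.filter((h) => h.testPlan !== undefined);
  const unplanned = output.hypotheses.filter((h) => h.testPlan === undefined);
  return { planned, unplanned };
}
```

The `unplanned` list is the set of hypotheses for which the verification-planning step would still be needed.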

Metadata arguments (git/deploy)

The following tools accept optional metadata fields; on update, null clears the field.

Case
- case_create: gitBranch?, gitCommit?, deployEnv?
- case_update: gitBranch?, gitCommit?, deployEnv? (null clears the field)
Observation
- observation_add: gitBranch?, gitCommit?, deployEnv?
- observation_update: gitBranch?, gitCommit?, deployEnv? (null clears the field)
Test Plan
- test_plan_create: gitBranch?, gitCommit?, deployEnv?
- test_plan_update: gitBranch?, gitCommit?, deployEnv? (null clears the field)

Example update payload that clears gitCommit on an observation:

{
  "caseId": "case_...",
  "observationId": "obs_...",
  "gitCommit": null
}

Responses include the persisted metadata when set; fields are omitted when unset.
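The update semantics can be modeled as a three-way distinction between a string, null, and an omitted field. The following is a minimal sketch, not the server's implementation: `applyMetadataPatch` is a hypothetical helper, and treating an omitted field as "leave unchanged" is an assumption (the text above only states that null clears).

```typescript
interface DeployMetadata {
  gitBranch?: string;
  gitCommit?: string;
  deployEnv?: string;
}

// On update, each field may be a string (set), null (clear), or omitted.
type MetadataPatch = { [K in keyof DeployMetadata]?: string | null };

function applyMetadataPatch(current: DeployMetadata, patch: MetadataPatch): DeployMetadata {
  const next: DeployMetadata = { ...current };
  const keys: (keyof DeployMetadata)[] = ["gitBranch", "gitCommit", "deployEnv"];
  for (const key of keys) {
    const value = patch[key];
    if (value === null) {
      delete next[key]; // null clears the field, so it is omitted from the result
    } else if (value !== undefined) {
      next[key] = value; // a string sets or replaces the field
    }
    // Omitted (undefined): assumed to leave the field untouched.
  }
  return next;
}

const updated = applyMetadataPatch(
  { gitBranch: "release", gitCommit: "abc123" },
  { gitCommit: null },
);
console.log(updated);
```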

Observation search & pagination

Use observations_list to query observations without pulling the full case payload:

{
  "caseId": "case_...",
  "query": "DriveNotFoundException",
  "fields": ["what", "context"],
  "pageSize": 10,
  "gitBranch": "release",
  "order": "desc"
}

The response returns observations, nextCursor, total, pageSize, and hasMore. To page through the result set, pass the returned nextCursor as cursor on the next call. Combine this with the case_get summary mode (include: []) to minimize token usage. See docs/CASE_GET_PAGINATION.md for cursor details.
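A paging loop over observations_list might look like the sketch below. It assumes the value returned in nextCursor is what the next call expects as cursor, and it takes the tool call as an injected function so it stays independent of any particular MCP client; `ListObservations` and `collectObservations` are hypothetical names, and the `maxPages` guard is an added safety measure rather than anything the server requires.

```typescript
interface ObservationsPage<T> {
  observations: T[];
  nextCursor?: string | null;
  total: number;
  pageSize: number;
  hasMore: boolean;
}

// Wraps whatever client call invokes the observations_list tool.
type ListObservations<T> = (args: {
  caseId: string;
  pageSize: number;
  cursor?: string;
}) => Promise<ObservationsPage<T>>;

// Collects all observations by following the cursor until hasMore is false.
async function collectObservations<T>(
  list: ListObservations<T>,
  caseId: string,
  pageSize = 10,
  maxPages = 100, // guard against a cursor that never terminates
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined;
  for (let page = 0; page < maxPages; page++) {
    const res = await list(
      cursor === undefined ? { caseId, pageSize } : { caseId, pageSize, cursor },
    );
    all.push(...res.observations);
    if (!res.hasMore || res.nextCursor == null) break;
    cursor = res.nextCursor;
  }
  return all;
}
```

Collecting every page defeats the token savings for large cases, so in an LLM workflow it is usually preferable to stop as soon as the needed observation is found.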

API Response Structure

All mutation tools follow a consistent response structure for predictability and ease of use:

Standard Mutation Response

{
  caseId: string;           // Always at top level
  [resourceName]: Resource; // The created/updated/removed resource
  case: Case;               // Full case object after the mutation
}

Benefits:

✅ Consistent: Same pattern across all mutation tools
✅ Context Access: caseId always at top level
✅ Immediate State: Full case object available without additional queries
✅ Token Optimization: Combine with case_get's include parameter for efficient workflows

Examples:

observation_add → { caseId, observation, case }
hypothesis_propose → { caseId, hypotheses, case }
test_plan_create → { caseId, testPlan, case }
conclusion_finalize → { caseId, conclusion, case }

See docs/RESPONSE_STRUCTURE_STANDARDIZATION.md for complete details.
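The response pattern can be expressed as a generic type, which makes it straightforward to write helpers that work across all mutation tools. The sketch below is an illustration under assumptions: `MutationResponse` and `withoutCase` are hypothetical names, and the observation fields shown are placeholders rather than the real schema.

```typescript
// Generic shape: caseId and the full case at the top level, plus one resource key.
type MutationResponse<K extends string, R, C = unknown> =
  { caseId: string; case: C } & Record<K, R>;

// Example instantiation for observation_add (observation fields are placeholders).
type ObservationAddResponse = MutationResponse<"observation", { id: string; what: string }>;

// Returns the response without the full case object, for callers that only
// need the created or updated resource.
function withoutCase<T extends { caseId: string; case: unknown }>(
  response: T,
): Omit<T, "case"> {
  const { case: _case, ...rest } = response;
  return rest;
}

const response: ObservationAddResponse = {
  caseId: "case_1",
  observation: { id: "obs_1", what: "Error rate spike" },
  case: { id: "case_1" },
};
console.log(withoutCase(response));
```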

Performance & Best Practices

Token Optimization

Mutation tools (e.g., observation_add, hypothesis_update) return the complete case object in their responses, which can consume thousands of tokens per operation.

Recommended pattern:

// Perform mutations without relying on the case field
await observation_add({ caseId, what: "..." });
await observation_add({ caseId, what: "..." });

// Fetch case data selectively when needed
const caseData = await case_get({
  caseId,
  include: ['observations'],  // Only fetch what you need
});

See docs/API_RESPONSE_OPTIMIZATION.md for detailed optimization strategies and token savings examples.

For paging details (limits, cursors, and include semantics) see docs/CASE_GET_PAGINATION.md.

License

This project is released under the MIT License. See the LICENSE file for details.

Publishing

The package is configured for the public npm registry. After bumping the version, run:

npm publish --access public

prepublishOnly rebuilds TypeScript sources and copies required assets before the tarball is generated.
