mcp-rca
v0.4.0
Root Cause Analysis MCP server scaffolding.
mcp-rca
Root Cause Analysis MCP server that helps SRE teams structure observations, hypotheses, and test plans while collaborating with an LLM.
Highlights
- Prompt-based guidance: MCP prompts guide the LLM through each RCA phase
  - rca_start_investigation - Begin investigation with structured initial steps
  - rca_next_step - Get context-aware recommendations based on case state
  - rca_hypothesis_propose - Generate testable root cause hypotheses
  - rca_verification_planning - Create effective test plans
  - rca_conclusion_guide - Document conclusions with root causes and follow-ups
- LLM-oriented tools: Guidance tools provide best practices and phase-specific checklists
  - guidance_best_practices - RCA principles and anti-patterns
  - guidance_phase - Phase-specific steps and red flags
  - guidance_prompt_scaffold - Structured output formats for tasks
  - guidance_followups - Post-conclusion follow-up actions
  - guidance_prompts_catalog - Discover available prompts with default templates
  - guidance_tools_catalog - Comprehensive tool catalog with workflow guidance
- Hypothesis generation returns persisted objects with IDs
  - hypothesis_propose persists generated hypotheses and returns each item with id, caseId, createdAt, and updatedAt.
  - When the generator supplies a verification plan in its output, test_plan_create is called automatically to create an initial test plan, and minimal plan info (method, expected, and optionally metric) is attached to the hypothesis.
- Git/deploy metadata on Case / Observation / TestPlan
  - Optional fields: gitBranch, gitCommit, deployEnv.
  - Set on create and update tools; passing null on update clears the field.
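For client code that consumes hypothesis_propose results, the shapes described above can be written down as TypeScript types. This is a minimal sketch that assumes the field names used in this README; PersistedHypothesis, AttachedTestPlan, and hypothesesWithoutPlan are illustrative names rather than mcp-rca exports, and rationale is assumed to be optional.

```typescript
// Illustrative types based on the fields listed in this README.
// The real mcp-rca schema may differ; treat these names as hypothetical.

// Minimal plan info attached when the generator supplied a verification plan.
interface AttachedTestPlan {
  id: string;
  hypothesisId: string;
  method: string;
  expected: string;
  metric?: string; // optional per the README
}

interface PersistedHypothesis {
  id: string;
  caseId: string;
  text: string;
  rationale?: string; // assumption: optional, as on the input side
  createdAt: string; // ISO 8601 timestamp
  updatedAt: string; // ISO 8601 timestamp
  testPlan?: AttachedTestPlan; // present only if the generator provided a plan
}

// Hypothetical helper: hypotheses that still need test_plan_create called for them.
function hypothesesWithoutPlan(hyps: PersistedHypothesis[]): PersistedHypothesis[] {
  return hyps.filter((h) => h.testPlan === undefined);
}
```

A client could run hypothesesWithoutPlan over the returned list to decide which hypotheses still go through rca_verification_planning.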
Installation
npm install mcp-rca
To launch the server directly as a CLI:
npx mcp-rca
The server communicates over stdio and can be attached to any MCP-compatible client. CLI flags include --help (-h) for usage and --version (-v) to print the current release.
Getting Started (Development)
- Clone the repository and install dependencies:
  git clone https://github.com/mako10k/mcp-rca.git
  cd mcp-rca
  npm install
- Launch the developer server with hot reloading:
  npm run dev
- Produce a production bundle (emits dist/ and copies prompt assets):
  npm run build
Project Layout
src/
  framework/        # Local stub for MCP server lifecycle
  server.ts         # MCP server entrypoint
  schema/           # TypeScript data models
  tools/            # Tool handlers surfaced to MCP clients
  llm/              # Prompt assets and LLM utilities
data/
  .gitkeep          # Runtime storage directory (cases.json generated at runtime)
scripts/
  copy-assets.mjs   # Copies static prompt assets into dist/ post-build
Refer to AGENT.md for the full specification, roadmap, and design guidelines.
Quick Start: Using Prompts
MCP prompts guide your investigation through each phase:
1. Start Investigation
Use prompt: rca_start_investigation
→ Creates a structured plan for case creation and initial observations
2. Track Progress
Use prompt: rca_next_step with caseId
→ Analyzes current state and suggests next actions
3. Generate Hypotheses
Use prompt: rca_hypothesis_propose with caseId
→ Guides hypothesis generation with best practices
→ Then call tool: hypothesis_propose to create and persist hypotheses
4. Plan Verification
Use prompt: rca_verification_planning with caseId, hypothesisId, hypothesisText
→ Provides test plan templates and prioritization guidance
→ Then call tool: test_plan_create to create verification plans
5. Document Conclusion
Use prompt: rca_conclusion_guide with caseId
→ Guides documentation of root causes, fixes, and follow-ups
→ Then call tool: conclusion_finalize to close the case
LLM Guidance Tools
Call guidance tools at any time for additional support:
- guidance_best_practices - Core RCA principles
- guidance_phase - Phase-specific checklists (observation/hypothesis/testing/conclusion)
- guidance_prompt_scaffold - Output format templates for specific tasks
- guidance_followups - Prevention and follow-up suggestions
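guidance_phase covers four phases (observation, hypothesis, testing, conclusion), and the Quick Start pairs a prompt with each step. As a rough illustration of how a client might decide which phase a case is in, here is a simple count-based heuristic. It is an assumption made for illustration, not the logic rca_next_step actually uses; CaseCounts, currentPhase, and promptForPhase are hypothetical names.

```typescript
// Phase names as listed for guidance_phase in this README.
type RcaPhase = "observation" | "hypothesis" | "testing" | "conclusion";

// Hypothetical summary of a case: how many items of each kind it holds.
interface CaseCounts {
  observations: number;
  hypotheses: number;
  testPlans: number;
}

// Simplistic heuristic: the first missing kind of artifact decides the phase.
// It does not check whether test plans have actually been executed.
function currentPhase(c: CaseCounts): RcaPhase {
  if (c.observations === 0) return "observation"; // gather facts first
  if (c.hypotheses === 0) return "hypothesis"; // then propose causes
  if (c.testPlans === 0) return "testing"; // then plan verification
  return "conclusion"; // everything exists: move toward a conclusion
}

// Pairs each phase with the Quick Start prompt that fits it.
const promptForPhase: Record<RcaPhase, string> = {
  observation: "rca_start_investigation",
  hypothesis: "rca_hypothesis_propose",
  testing: "rca_verification_planning",
  conclusion: "rca_conclusion_guide",
};
```

When in doubt, rca_next_step remains the authoritative way to ask the server what to do next; this sketch only shows the ordering of the phases.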
MCP Tool Highlights
hypothesis_propose
Input (summary):
{
"caseId": "case_...",
"text": "Short incident summary",
"rationale": "Optional background",
"context": { "service": "api", "region": "us-east-1" },
"logs": "... optional log snippets ..."
}
Output (each hypothesis is persisted and includes identifiers; an initial test plan is present only if the generator provided one):
{
"hypotheses": [
{
"id": "hyp_...",
"caseId": "case_...",
"text": "Cache node eviction storm caused by oversized payloads",
"rationale": "Spike correlates with payload growth and cache TTL",
"createdAt": "2025-10-21T00:00:00.000Z",
"updatedAt": "2025-10-21T00:00:00.000Z",
"testPlan": {
"id": "tp_...",
"hypothesisId": "hyp_...",
"method": "Reproduce with oversized payloads and inspect eviction rate",
"expected": "Evictions rise sharply with payload size > X",
"metric": "cache.evictions"
}
}
]
}
Metadata arguments (git/deploy)
The following tools accept optional metadata fields; on update, null clears the field.
- Case
  - case_create: gitBranch?, gitCommit?, deployEnv?
  - case_update: gitBranch?, gitCommit?, deployEnv? (null clears)
- Observation
  - observation_add: gitBranch?, gitCommit?, deployEnv?
  - observation_update: gitBranch?, gitCommit?, deployEnv? (null clears)
- Test Plan
  - test_plan_create: gitBranch?, gitCommit?, deployEnv?
  - test_plan_update: gitBranch?, gitCommit?, deployEnv? (null clears)
Example update payload that clears gitCommit on an observation:
{
"caseId": "case_...",
"observationId": "obs_...",
"gitCommit": null
}
Responses include the persisted metadata when set; fields are omitted when unset.
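The update rule (omit a field to keep it, pass null to clear it, pass a string to set it) can be sketched as a small merge function. This assumes the server treats an omitted field as unchanged, which the README implies but does not state outright; applyMetadataUpdate and the type names are hypothetical.

```typescript
// Metadata keys accepted by the create/update tools listed above.
type MetadataKey = "gitBranch" | "gitCommit" | "deployEnv";
type Metadata = Partial<Record<MetadataKey, string>>;
// On update, each key may be omitted, set to a string, or set to null.
type MetadataPatch = Partial<Record<MetadataKey, string | null>>;

const METADATA_KEYS: MetadataKey[] = ["gitBranch", "gitCommit", "deployEnv"];

// Returns a new object; the stored metadata passed in is not mutated.
function applyMetadataUpdate(current: Metadata, patch: MetadataPatch): Metadata {
  const next: Metadata = { ...current };
  for (const key of METADATA_KEYS) {
    if (!(key in patch)) continue; // omitted: keep the existing value
    const value = patch[key];
    if (value === null) {
      delete next[key]; // null: clear, so the field is omitted from responses
    } else if (value !== undefined) {
      next[key] = value; // string: set or overwrite
    }
  }
  return next;
}
```

Applied to the example payload above, only gitCommit disappears; gitBranch and deployEnv keep whatever values they had.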
Observation search & pagination
Use observations_list to query observations without pulling the full case payload:
{
"caseId": "case_...",
"query": "DriveNotFoundException",
"fields": ["what", "context"],
"pageSize": 10,
"gitBranch": "release",
"order": "desc"
}
The response returns observations, nextCursor, total, pageSize, and hasMore. Pass cursor with the next call to page through the set. Combine with the case_get summary mode (include: []) to minimize token usage. See docs/CASE_GET_PAGINATION.md for cursor details.
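A client that needs every matching observation can loop until hasMore is false, feeding nextCursor back in as cursor. The sketch below assumes nextCursor is absent or null on the last page; ListFn stands in for however your MCP client invokes observations_list, and fetchAllObservations is a hypothetical helper.

```typescript
// Response fields as listed above for observations_list.
interface ObservationsPage<T> {
  observations: T[];
  nextCursor?: string | null;
  total: number;
  pageSize: number;
  hasMore: boolean;
}

// Placeholder for the actual MCP tool call made by your client.
type ListFn<T> = (args: {
  caseId: string;
  pageSize: number;
  cursor?: string;
}) => Promise<ObservationsPage<T>>;

// Follows cursors until the server reports no more pages.
async function fetchAllObservations<T>(
  list: ListFn<T>,
  caseId: string,
  pageSize = 10,
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined; // undefined on the first call
  for (;;) {
    const page = await list({ caseId, pageSize, cursor });
    all.push(...page.observations);
    if (!page.hasMore || !page.nextCursor) break; // last page reached
    cursor = page.nextCursor;
  }
  return all;
}
```

Draining every page defeats the token savings, so in practice it is usually better to stop once enough matches have been seen.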
API Response Structure
All mutation tools follow a consistent response structure for predictability and ease of use:
Standard Mutation Response
{
caseId: string; // Always at top level
[resourceName]: Resource; // The created/updated/removed resource
case: Case; // Full case object after the mutation
}
Benefits:
- ✅ Consistent: Same pattern across all mutation tools
- ✅ Context Access: caseId always at top level
- ✅ Immediate State: Full case object available without additional queries
- ✅ Token Optimization: Combine with case_get's include parameter for efficient workflows
Examples:
- observation_add → { caseId, observation, case }
- hypothesis_propose → { caseId, hypotheses, case }
- test_plan_create → { caseId, testPlan, case }
- conclusion_finalize → { caseId, conclusion, case }
See docs/RESPONSE_STRUCTURE_STANDARDIZATION.md for complete details.
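The standard response pattern can be expressed as one generic TypeScript type, with each tool's response as an instance of it. This is an illustrative sketch: CaseSummary is a placeholder assumption for the full Case object, and MutationResponse, ObservationAddResponse, and describe are hypothetical names.

```typescript
// Placeholder for the full Case object; the real type has many more fields.
interface CaseSummary {
  id: string;
  title?: string;
}

// caseId and case are always present; K names the resource field.
type MutationResponse<K extends string, R> = {
  caseId: string;
  case: CaseSummary;
} & { [P in K]: R };

// Minimal observation shape (assumption: at least id and what).
interface Observation {
  id: string;
  what: string;
}

// observation_add → { caseId, observation, case }
type ObservationAddResponse = MutationResponse<"observation", Observation>;

function describe(res: ObservationAddResponse): string {
  // caseId is available at the top level without digging into res.case
  return `${res.caseId}: added ${res.observation.id}`;
}
```

The same type covers the other tools, for example MutationResponse<"testPlan", TestPlan> for test_plan_create, given a suitable TestPlan type.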
Performance & Best Practices
Token Optimization
Many mutation tools (e.g., observation_add, hypothesis_update) return the complete case object in their responses, which can consume thousands of tokens per operation.
Recommended pattern:
// Perform mutations without relying on the case field
await observation_add({ caseId, what: "..." });
await observation_add({ caseId, what: "..." });
// Fetch case data selectively when needed
const caseData = await case_get({
caseId,
include: ['observations'], // Only fetch what you need
});
See docs/API_RESPONSE_OPTIMIZATION.md for detailed optimization strategies and token savings examples.
For paging details (limits, cursors, and include semantics) see docs/CASE_GET_PAGINATION.md.
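When a mutation response has to be handed back to an LLM, a client can drop the case field, keep only caseId and the mutated resource, and fetch case data with case_get when it is actually needed. withoutCase below is a hypothetical client-side helper, not something mcp-rca provides.

```typescript
// Removes the bulky `case` field from any mutation response,
// keeping caseId and the created/updated/removed resource.
function withoutCase<T extends { case?: unknown }>(res: T): Omit<T, "case"> {
  // `case` is a reserved word, so it is renamed while destructuring.
  const { case: _omitted, ...rest } = res;
  return rest;
}
```

For example, withoutCase applied to an observation_add response leaves { caseId, observation }.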
License
This project is released under the MIT License. See the LICENSE file for details.
Publishing
The package is configured for the public npm registry. After bumping the version, run:
npm publish --access public
The prepublishOnly script rebuilds the TypeScript sources and copies required assets before the tarball is generated.
