goodhabitz-evalab-mcp

v3.0.8

Published

a month ago

MCP servers split into 4 specialized services: Studio (content creation), Dev (prompt engineering), Test (QA automation), Admin (configuration)

henols

mcp prompt validation chat-testing template-fetching extensible claude bedrock ai

MCP Tool Schema and Documentation Improvements

Last Updated: 2026-02-19

Current Tool Definitions

The evalab MCP tools expose validation errors through three optional fields in their output:

validationSummary - Error counts and categories
integrityIssues - Array of individual errors
errorReport - Reference to downloadable error report file
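As a rough sketch of how a client might model these three optional fields, the following TypeScript types mirror the field names used in the schema proposed later in this document. The interface names and the `hasBlockingErrors` helper are hypothetical and are not part of the package's published API.

```typescript
// Hypothetical shapes inferred from this document's schema proposal;
// they are not the package's published types.
interface ValidationSummary {
  hasErrors: boolean;
  hasWarnings: boolean;
  errorCount: number;
  warningCount: number;
  categories: string[];
}

interface IntegrityIssue {
  type: string;
  severity: "error" | "warning";
  message: string;
  rule?: string;
  taskId?: string;
}

interface ErrorReportRef {
  filename: string;
  url?: string;
}

interface ChatToolResponse {
  id: string;
  content: string;
  // All three validation fields are optional: absent when nothing was detected.
  validationSummary?: ValidationSummary;
  integrityIssues?: IntegrityIssue[];
  errorReport?: ErrorReportRef;
}

// True only when a summary is present and reports critical errors.
function hasBlockingErrors(res: ChatToolResponse): boolean {
  return res.validationSummary?.hasErrors === true;
}
```

Because every validation field is optional, a response with no `validationSummary` is treated as having no blocking errors.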

Recommended Improvements

1. Tool Descriptions - Be More Explicit About Errors

Current (Generic):

{
  name: "startEvalabChatTool",
  description: "Start a new evalab chat session"
}

Improved (Error-Aware):

{
  name: "startEvalabChatTool",
  description: "Start a new evalab chat session. Returns validation errors in validationSummary field when prompt generates invalid output. Check validationSummary.hasErrors to detect issues."
}

Current (Generic):

{
  name: "continueEvalabChatTool",
  description: "Continue an existing chat session"
}

Improved (Error-Aware):

{
  name: "continueEvalabChatTool",
  description: "Continue an existing chat session. Returns validation errors when detected. Always check validationSummary.hasErrors in response before assuming success."
}
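One way to keep descriptions error-aware over time is a small check that could run in a test suite. The sketch below is an assumption about how such a check might look; `ToolDefinition`, `toolsNotMentioning`, and `sampleTools` are hypothetical names, and only the `name` and `description` values come from the examples above.

```typescript
// Hypothetical minimal view of a tool definition (name and description only).
interface ToolDefinition {
  name: string;
  description: string;
}

// Returns the names of tools whose description never mentions the given field.
function toolsNotMentioning(tools: ToolDefinition[], field: string): string[] {
  return tools
    .filter((tool) => !tool.description.includes(field))
    .map((tool) => tool.name);
}

// One generic and one error-aware description, copied from the examples above.
const sampleTools: ToolDefinition[] = [
  { name: "startEvalabChatTool", description: "Start a new evalab chat session" },
  {
    name: "continueEvalabChatTool",
    description:
      "Continue an existing chat session. Returns validation errors when detected. Always check validationSummary.hasErrors in response before assuming success.",
  },
];

// Only the generic description is reported.
const undocumentedTools = toolsNotMentioning(sampleTools, "validationSummary");
```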

2. Output Schema - Add Field Descriptions

Problem: Field names alone don't explain what the data means or how to use it.

Solution: Add descriptions to each output field.

{
  name: "startEvalabChatTool",
  outputSchema: {
    type: "object",
    properties: {
      id: {
        type: "string",
        description: "Session ID - use this with continueEvalabChatTool"
      },
      content: {
        type: "string",
        description: "AI response content"
      },
      validationSummary: {
        type: "object",
        description: "Quick error overview. Check hasErrors field first. Present only when validation issues detected.",
        properties: {
          hasErrors: {
            type: "boolean",
            description: "True if critical errors found. If true, check integrityIssues for details."
          },
          hasWarnings: {
            type: "boolean",
            description: "True if non-critical warnings found. Safe to continue but review recommended."
          },
          errorCount: {
            type: "number",
            description: "Number of critical errors that should be fixed"
          },
          warningCount: {
            type: "number",
            description: "Number of non-critical warnings. Safe to continue but review recommended."
          },
          categories: {
            type: "array",
            description: "Error types found, for example: parse=JSON errors, schema_validation=field violations, task_integrity=invalid task states",
            items: { type: "string" }
          }
        }
      },
      integrityIssues: {
        type: "array",
        description: "Individual error details. Use this to show specific problems to user. Present only when errors detected.",
        items: {
          type: "object",
          properties: {
            type: {
              type: "string",
              description: "Error category: parse, schema_validation, task_integrity, content_extraction, json_recovery"
            },
            severity: {
              type: "string",
              description: "error=must fix, warning=non-critical, review recommended"
            },
            message: {
              type: "string",
              description: "Human-readable error description. Show this to user."
            },
            rule: {
              type: "string",
              description: "Error code for documentation lookup (e.g. TASK_RESULT_MAX_LENGTH)"
            },
            taskId: {
              type: "string",
              description: "Which task caused this error (if applicable)"
            }
          }
        }
      },
      errorReport: {
        type: "object",
        description: "Reference to full error report file. Use filename with getErrorReportTool for complete diagnostic info. Present only when errors detected.",
        properties: {
          filename: {
            type: "string",
            description: "Pass this to getErrorReportTool to fetch full error report"
          },
          url: {
            type: "string",
            description: "Direct download URL (for non-MCP clients only)"
          }
        }
      }
    }
  }
}
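To verify that every output field actually carries a description, a schema walker along these lines could be used. This is a minimal sketch: `JsonSchemaNode` covers only the keywords used in this document, and `findUndescribed` is a hypothetical helper, not something the package is known to ship.

```typescript
// Minimal subset of JSON Schema keywords used in this document.
interface JsonSchemaNode {
  type?: string;
  description?: string;
  properties?: Record<string, JsonSchemaNode>;
  items?: JsonSchemaNode;
}

// Collects dotted paths of properties that have no description.
// Array item properties are reported with a "[]" marker, e.g. "integrityIssues[].message".
function findUndescribed(node: JsonSchemaNode, prefix = ""): string[] {
  const missing: string[] = [];
  for (const [key, child] of Object.entries(node.properties ?? {})) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (!child.description) missing.push(path);
    // Recurse into nested object properties.
    missing.push(...findUndescribed(child, path));
    // Recurse into array item properties, if any.
    if (child.items) missing.push(...findUndescribed(child.items, `${path}[]`));
  }
  return missing;
}
```

Running this against each tool's `outputSchema` and expecting an empty array would catch fields added later without a description.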

3. Simplify Error Detection - Add Helper Field

Problem: Checking errors requires nested property access:

if (response.validationSummary?.hasErrors) { ... }

Solution: Add top-level boolean flag.

{
  outputSchema: {
    properties: {
      // Add this at top level
      hasValidationErrors: {
        type: "boolean",
        description: "Quick check: true if validation errors present. Shortcut for validationSummary?.hasErrors"
      },
      validationSummary: { ... },
      integrityIssues: { ... }
    }
  }
}

Usage becomes simpler:

// Before
if (response.validationSummary?.hasErrors) { ... }

// After
if (response.hasValidationErrors) { ... }
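On the server side, the flag can be derived from the existing summary so the two never disagree. The following is a sketch under that assumption; `withValidationFlag` and `SummaryLike` are hypothetical names.

```typescript
// Only the part of validationSummary this helper needs.
interface SummaryLike {
  hasErrors: boolean;
}

// Returns a copy of the response with hasValidationErrors derived from
// validationSummary.hasErrors. A missing summary yields false.
function withValidationFlag<T extends { id: string; validationSummary?: SummaryLike }>(
  res: T
): T & { hasValidationErrors: boolean } {
  return {
    ...res,
    hasValidationErrors: res.validationSummary?.hasErrors === true,
  };
}
```

Deriving the flag in one place, instead of setting it independently, avoids responses where `hasValidationErrors` and `validationSummary.hasErrors` contradict each other.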

4. Error Report Tool - Add Usage Examples

Current:

{
  name: "getErrorReportTool",
  description: "Fetch detailed error report by filename"
}

Improved:

{
  name: "getErrorReportTool",
  description: "Fetch detailed error report. Use filename from chat tool's errorReport field. Returns plain text diagnostic report with full validation details, LLM response, and fix suggestions.",
  inputSchema: {
    properties: {
      filename: {
        type: "string",
        description: "Error report filename from chat response errorReport.filename field"
      }
    }
  },
  examples: [
    {
      description: "Fetch error report after detecting validation errors",
      input: {
        filename: "error_session-123_2026-02-19T16-12-07-955Z.txt"
      }
    }
  ]
}

5. Add Error Handling Examples to Tool Docs

Add usage examples showing error checking:

{
  name: "startEvalabChatTool",
  examples: [
    {
      description: "Start session and check for validation errors",
      code: `
const session = await startEvalabChatTool({
  domain: "expert-data-collection",
  message: "Generate expert interview content"
})

// Always check for validation errors
if (session.validationSummary?.hasErrors) {
  console.log(\`Found \${session.validationSummary.errorCount} validation errors\`)

  // Show error details
  session.integrityIssues?.forEach(err => {
    console.log(\`[\${err.severity}] \${err.message}\`)
  })

  // Fetch full error report if needed
  if (session.errorReport) {
    const report = await getErrorReportTool({
      filename: session.errorReport.filename
    })
    console.log('Full diagnostic report:', report.content)
  }
} else {
  console.log('Session created successfully:', session.content)
}
`
    }
  ]
}

6. Group Related Fields in Schema

Problem: Flat structure mixes core data with error data.

Solution: Use nested grouping for clarity.

{
  outputSchema: {
    properties: {
      // Core session data
      session: {
        type: "object",
        properties: {
          id: { type: "string" },
          domain: { type: "string" },
          content: { type: "string" },
          tasks: { type: "array" }
        }
      },

      // Validation data (only present when errors detected)
      validation: {
        type: "object",
        description: "Validation error information. Omitted when no errors.",
        properties: {
          hasErrors: { type: "boolean" },
          errorCount: { type: "number" },
          issues: { type: "array" },
          reportFilename: { type: "string" }
        }
      }
    }
  }
}

Note: This is a breaking change. Only make it if existing MCP users can accept it.
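If the grouped shape is adopted, an adapter can produce it from the current flat response, which would let both shapes be served during a transition. The sketch below assumes the flat response carries `id`, `domain`, `content`, and `tasks` at the top level; the type names and `toGrouped` are hypothetical.

```typescript
// Assumed flat shape (current) and grouped shape (proposed above).
interface FlatChatResponse {
  id: string;
  domain: string;
  content: string;
  tasks: unknown[];
  validationSummary?: { hasErrors: boolean; errorCount: number };
  integrityIssues?: unknown[];
  errorReport?: { filename: string };
}

interface GroupedChatResponse {
  session: { id: string; domain: string; content: string; tasks: unknown[] };
  validation?: {
    hasErrors: boolean;
    errorCount: number;
    issues: unknown[];
    reportFilename?: string;
  };
}

// Maps the flat response to the grouped one. The validation group is
// omitted entirely when the flat response has no validationSummary.
function toGrouped(flat: FlatChatResponse): GroupedChatResponse {
  const session = {
    id: flat.id,
    domain: flat.domain,
    content: flat.content,
    tasks: flat.tasks,
  };
  if (!flat.validationSummary) return { session };
  return {
    session,
    validation: {
      hasErrors: flat.validationSummary.hasErrors,
      errorCount: flat.validationSummary.errorCount,
      issues: flat.integrityIssues ?? [],
      ...(flat.errorReport ? { reportFilename: flat.errorReport.filename } : {}),
    },
  };
}
```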

7. Add Validation Status Enum

Problem: Two separate boolean flags (hasErrors, hasWarnings) must be combined to learn the overall validation state.

Solution: Add status field with clear values.

{
  outputSchema: {
    properties: {
      validationStatus: {
        type: "string",
        enum: ["success", "errors", "warnings", "errors_and_warnings"],
        description: "success=no issues, errors=critical problems found, warnings=minor issues, errors_and_warnings=both types present"
      }
    }
  }
}

Usage:

switch (response.validationStatus) {
  case "success":
    // All good
    break
  case "errors":
    // Must handle errors
    break
  case "warnings":
    // Can continue but review recommended
    break
  case "errors_and_warnings":
    // Mixed severity
    break
}
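The status value can be computed from the two existing booleans, so it needs no new validation logic. A minimal sketch, assuming a missing summary means no issues; `deriveValidationStatus` is a hypothetical helper name.

```typescript
type ValidationStatus = "success" | "errors" | "warnings" | "errors_and_warnings";

// Maps the two boolean flags to a single status value.
// A missing summary is treated as "success".
function deriveValidationStatus(
  summary?: { hasErrors: boolean; hasWarnings: boolean }
): ValidationStatus {
  const errors = summary?.hasErrors === true;
  const warnings = summary?.hasWarnings === true;
  if (errors && warnings) return "errors_and_warnings";
  if (errors) return "errors";
  if (warnings) return "warnings";
  return "success";
}
```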

8. Simplify Error Categories

Problem: Category names are technical (schema_validation, content_extraction).

Solution: Add user-friendly category labels.

{
  integrityIssues: [
    {
      type: "schema_validation",
      typeLabel: "Field Validation Error",  // Add this
      severity: "error",
      message: "Task result is 308 characters but maximum is 300"
    },
    {
      type: "task_integrity",
      typeLabel: "Task State Error",  // Add this
      severity: "warning",
      message: "Invalid status transition from completed to review"
    }
  ]
}
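A label lookup could be as small as the sketch below. Only the two labels shown above come from this document; the title-case fallback for other categories is an assumption, and `typeLabelFor` is a hypothetical helper name.

```typescript
// Labels taken from the example above. Other categories have no
// documented label, so they fall through to a generated one.
const KNOWN_TYPE_LABELS = new Map<string, string>([
  ["schema_validation", "Field Validation Error"],
  ["task_integrity", "Task State Error"],
]);

// Returns the friendly label for a category, or a title-cased version
// of the raw type (assumed fallback), e.g. "json_recovery" becomes "Json Recovery".
function typeLabelFor(type: string): string {
  return (
    KNOWN_TYPE_LABELS.get(type) ??
    type
      .split("_")
      .map((word) => word.charAt(0).toUpperCase() + word.slice(1))
      .join(" ")
  );
}
```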

9. Add Quick Error Summary String

Problem: Clients must read several fields to show an error overview.

Solution: Add pre-formatted summary string.

{
  validationSummary: {
    hasErrors: true,
    errorCount: 2,
    warningCount: 1,
    // Add this
    summaryText: "Found 2 errors and 1 warning in schema validation and task integrity"
  }
}

Usage:

// Simple one-line display
if (response.validationSummary) {
  console.log(response.validationSummary.summaryText)
}
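The summary string can be generated from fields the summary already has. The sketch below reproduces the wording of the example above; the exact phrasing rules (pluralization, joining with "and", the no-issues message) are assumptions, and `buildSummaryText` is a hypothetical helper name.

```typescript
// "1 warning", "2 errors"
function plural(count: number, word: string): string {
  return `${count} ${word}${count === 1 ? "" : "s"}`;
}

// ["a"] gives "a", ["a", "b"] gives "a and b", ["a", "b", "c"] gives "a, b, and c"
function joinWithAnd(items: string[]): string {
  if (items.length <= 1) return items.join("");
  if (items.length === 2) return `${items[0]} and ${items[1]}`;
  return `${items.slice(0, -1).join(", ")}, and ${items[items.length - 1]}`;
}

// Builds a one-line overview from counts and category names.
function buildSummaryText(summary: {
  errorCount: number;
  warningCount: number;
  categories: string[];
}): string {
  const counts: string[] = [];
  if (summary.errorCount > 0) counts.push(plural(summary.errorCount, "error"));
  if (summary.warningCount > 0) counts.push(plural(summary.warningCount, "warning"));
  if (counts.length === 0) return "No validation issues found";
  // "schema_validation" becomes "schema validation"
  const where = summary.categories.map((category) => category.replace(/_/g, " "));
  const suffix = where.length > 0 ? ` in ${joinWithAnd(where)}` : "";
  return `Found ${joinWithAnd(counts)}${suffix}`;
}
```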

10. Link Errors to Tasks

Problem: It is hard to tell which error belongs to which task without matching taskId values by hand.

Solution: Group errors by task in response.

{
  tasks: [
    {
      id: "task-1",
      name: "Gather requirements",
      status: "completed",
      // Add this
      validationErrors: [
        {
          severity: "error",
          message: "Task result exceeds 300 characters",
          rule: "TASK_RESULT_MAX_LENGTH"
        }
      ]
    },
    {
      id: "task-2",
      name: "Define objectives",
      status: "in_progress",
      validationErrors: []  // No errors for this task
    }
  ]
}

Benefit: Immediately see which tasks have problems without cross-referencing.
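Since each entry in integrityIssues can already carry a taskId, the per-task lists could be built by grouping on that field. This is a sketch under that assumption; `TaskIssue`, `EvalabTask`, and `attachIssuesToTasks` are hypothetical names.

```typescript
interface TaskIssue {
  severity: string;
  message: string;
  rule?: string;
  taskId?: string;
}

interface EvalabTask {
  id: string;
  name: string;
  status: string;
}

// Returns a copy of each task with the issues whose taskId matches it.
// Tasks without matching issues get an empty validationErrors array.
function attachIssuesToTasks(tasks: EvalabTask[], issues: TaskIssue[]) {
  return tasks.map((task) => ({
    ...task,
    validationErrors: issues
      .filter((issue) => issue.taskId === task.id)
      .map((issue) => ({
        severity: issue.severity,
        message: issue.message,
        rule: issue.rule,
      })),
  }));
}
```

Issues without a taskId are not attached to any task here, so they would still need to be reported through the top-level integrityIssues array.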

Implementation Priority

Must Do (Critical for UX)

Add field descriptions to output schema
Improve tool descriptions to mention error checking
Add error handling examples

Should Do (Better UX)

Add hasValidationErrors top-level flag
Add validationStatus enum
Add summaryText quick overview

Nice to Have (Polish)

Add typeLabel friendly category names
Link errors to tasks directly
Add usage examples to getErrorReportTool

Breaking Changes (Only if acceptable)

Nest fields into session and validation groups

Example: Complete Improved Tool Definition

{
  name: "continueEvalabChatTool",
  description: "Continue an existing evalab chat session. Returns validation errors when LLM output fails validation rules. Always check hasValidationErrors field in response.",

  inputSchema: {
    type: "object",
    required: ["sessionId", "message"],
    properties: {
      sessionId: {
        type: "string",
        description: "Session ID from previous chat response"
      },
      message: {
        type: "string",
        description: "User message to send"
      }
    }
  },

  outputSchema: {
    type: "object",
    properties: {
      // Quick check
      hasValidationErrors: {
        type: "boolean",
        description: "True if validation errors detected. Check this first."
      },

      // Core data
      id: { type: "string", description: "Session ID" },
      content: { type: "string", description: "AI response" },
      tasks: { type: "array", description: "Task list with status" },

      // Error details (optional - only when errors present)
      validationSummary: {
        type: "object",
        description: "Error overview. Present only when validation issues (errors or warnings) are detected.",
        properties: {
          summaryText: {
            type: "string",
            description: "One-line error summary. Show this to user."
          },
          hasErrors: { type: "boolean", description: "Critical errors found" },
          hasWarnings: { type: "boolean", description: "Warnings found" },
          errorCount: { type: "number", description: "Number of errors" },
          warningCount: { type: "number", description: "Number of warnings" }
        }
      },

      integrityIssues: {
        type: "array",
        description: "Individual error details. Show these to help user fix problems.",
        items: {
          type: "object",
          properties: {
            severity: { type: "string", description: "error or warning" },
            message: { type: "string", description: "What went wrong" },
            rule: { type: "string", description: "Error code" },
            taskId: { type: "string", description: "Related task (if any)" }
          }
        }
      },

      errorReport: {
        type: "object",
        description: "Full diagnostic report reference",
        properties: {
          filename: {
            type: "string",
            description: "Pass to getErrorReportTool for complete diagnostics"
          }
        }
      }
    }
  },

  examples: [
    {
      description: "Continue session with error checking",
      code: `
const response = await continueEvalabChatTool({
  sessionId: "session-123",
  message: "Continue with next step"
})

if (response.hasValidationErrors) {
  console.error(response.validationSummary?.summaryText)
  response.integrityIssues?.forEach(err => {
    console.log(\`- \${err.message}\`)
  })
} else {
  console.log("Success:", response.content)
}
`
    }
  ]
}

Summary of Changes

Non-Breaking (Do These First):

Add descriptions to all schema fields
Update tool descriptions to mention errors
Add hasValidationErrors boolean flag
Add summaryText to validationSummary
Add usage examples with error checking

Optional Enhancements:

Add validationStatus enum
Add typeLabel for categories
Link errors directly to tasks

Breaking Changes (Avoid Unless Major Version):

Restructure into session and validation groups
Remove or rename existing fields