@randalliser/yaml-glossary-server

v3.0.0

Published

11 days ago

Unified MCP Server for YAML Glossary Management

randalliser

mcp glossary yaml model-context-protocol

Unified MCP Glossary Server (Orchestrator Architecture)

Version: 3.0
Architecture: Orchestrator pattern with specialised components
Status: Production-ready with comprehensive best practices

Overview

The unified MCP server acts as an orchestrator that delegates all functionality to specialised component modules. This separation of concerns makes the codebase more maintainable, testable, and extensible.

Best Practices Implemented (v3.0)

✅ Security: Robust path sanitisation prevents directory traversal attacks
✅ Configuration: Environment-based config with sensible defaults
✅ Error Handling: Process-level handlers, standardised responses
✅ Testability: Module guard, dependency injection, no side effects
✅ Scalability: Map-based handler registry, extracted tool definitions
✅ Logging: Audit trails to stderr, never pollutes MCP protocol
✅ Consistency: Explicit defaults, predictable behaviour

Architecture

mcp-glossary-server.js (Orchestrator)
    ↓ delegates to
    ├── entry-manager.js    - CRUD operations for glossary entries
    ├── search.js           - Search and query functionality
    ├── importer.js         - Import from TSV/CSV/JSON
    ├── file-manager.js     - File I/O, backups, schema management
    ├── linter.js           - Validation and quality checks
    ├── sorter.js           - Alphabetical sorting
    └── utils.js            - Shared utilities
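
The delegation shown in the diagram, combined with the "Map-based handler registry" listed under Best Practices, could look roughly like the sketch below. The component objects here are hypothetical stand-ins, and the `dispatch` function name and the return shapes are assumptions for illustration, not the server's actual code.

```javascript
// Hypothetical stand-ins for the real components; names and return shapes are assumptions.
const entryManager = {
  async addEntry(glossaryPath, entry) {
    return { success: true, message: `Added ${entry.id} to ${glossaryPath}` };
  },
};
const sorter = {
  async sortGlossaryFile(glossaryPath, createBackup) {
    return { success: true, message: `Sorted ${glossaryPath} (backup: ${createBackup})` };
  },
};

// Map-based registry: tool name -> function that delegates to a component.
const handlers = new Map([
  ["add_entry", (args) => entryManager.addEntry(args.glossary, args.entry)],
  ["sort_glossary", (args) => sorter.sortGlossaryFile(args.glossary, args.backup ?? true)],
]);

// The orchestrator only looks up and delegates; it holds no domain logic.
async function dispatch(toolName, args) {
  const handler = handlers.get(toolName);
  if (!handler) {
    throw new Error(`Tool not found: ${toolName}`);
  }
  return handler(args);
}
```

With this shape, adding a tool means adding one registry entry rather than extending a switch statement.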

Components

1. Entry Manager (`entry-manager.js`)

Responsibilities:

Add new glossary entries with validation
Update existing entries
Delete entries with reference checking
Alphabetical insertion

Key Methods:

addEntry(glossaryPath, entry) - Add new entry
updateEntry(glossaryPath, entryId, updates) - Update entry
deleteEntry(glossaryPath, entryId) - Delete with safety checks
insertAlphabetically(terms, newEntry) - Maintain sort order
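
As an illustration of alphabetical insertion, here is a minimal sketch of what `insertAlphabetically` might do. It assumes entries are ordered by `id` and compared with `localeCompare`; the real component may use a different key or comparison.

```javascript
// Insert newEntry into a list already sorted by id, keeping the order.
// Sorting by `id` with localeCompare is an assumption, not confirmed behaviour.
function insertAlphabetically(terms, newEntry) {
  const index = terms.findIndex(
    (existing) => existing.id.localeCompare(newEntry.id) > 0
  );
  const position = index === -1 ? terms.length : index;
  // Return a new array rather than mutating the caller's list.
  return [...terms.slice(0, position), newEntry, ...terms.slice(position)];
}

const terms = [{ id: "alpha" }, { id: "campus" }, { id: "zeta" }];
const updated = insertAlphabetically(terms, { id: "beta" });
console.error(updated.map((t) => t.id).join(", ")); // alpha, beta, campus, zeta
```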

2. Search Engine (`search.js`)

Responsibilities:

Flexible text search across terms, IDs, definitions, aliases
Category filtering
Result formatting

Key Methods:

searchEntries(glossaryPath, query, options) - Main search
findById(glossaryPath, id) - Exact ID lookup
getByCategory(glossaryPath, category) - Category filter
formatResults(results, verbose) - Display formatting
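
A search across terms, IDs, definitions and aliases with a category filter can be sketched as follows. This version works on an in-memory array rather than a glossary path, and the case-insensitive substring matching is an assumption; the field names follow the usage examples in this README.

```javascript
// Case-insensitive substring match over id, term, definition and aliases,
// with an optional category filter. The matching rules are assumptions.
function searchEntries(entries, query, options = {}) {
  const needle = query.toLowerCase();
  return entries.filter((entry) => {
    if (options.category && !(entry.categories ?? []).includes(options.category)) {
      return false;
    }
    const haystack = [entry.id, entry.term, entry.definition, ...(entry.aliases ?? [])];
    return haystack.some((value) => String(value ?? "").toLowerCase().includes(needle));
  });
}

const entries = [
  { id: "main-campus", term: "Main Campus", definition: "Primary site", categories: ["education"] },
  { id: "term-dates", term: "Term Dates", definition: "Teaching calendar", aliases: ["campus calendar"], categories: ["admin"] },
];
```

Here `searchEntries(entries, "campus")` matches both entries (one by ID, one by alias), while adding `{ category: "education" }` narrows the result to the first.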

3. Importer (`importer.js`)

Responsibilities:

Import glossaries from external formats
Validate mandatory fields
Normalise and sort imported data

Key Methods:

importFromTSV(tsvPath, outputPath) - Import TSV files
importFromCSV(csvPath, outputPath) - Import CSV files
importFromJSON(jsonPath, outputPath) - Import JSON files

Mandatory Fields: id, term, definition
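
To make the mandatory-field rule concrete, the sketch below parses TSV text and rejects input that lacks `id`, `term` or `definition`. The `parseTSV` name, the error wording and the lack of quoting support are assumptions; the real importer also writes YAML output, which is omitted here.

```javascript
const MANDATORY_FIELDS = ["id", "term", "definition"];

// Parse TSV text (header row plus data rows) into entry objects.
// Assumes no quoted fields or embedded tabs; a real importer may need more.
function parseTSV(text) {
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) {
    throw new Error("Import failed: file is empty");
  }
  const headers = lines[0].split("\t").map((h) => h.trim());
  const missing = MANDATORY_FIELDS.filter((field) => !headers.includes(field));
  if (missing.length > 0) {
    throw new Error(`Mandatory field missing: ${missing.join(", ")}`);
  }
  return lines.slice(1).map((line, rowIndex) => {
    const cells = line.split("\t");
    const entry = Object.fromEntries(headers.map((h, i) => [h, (cells[i] ?? "").trim()]));
    // Every data row must also carry a non-empty value for each mandatory field.
    for (const field of MANDATORY_FIELDS) {
      if (!entry[field]) {
        throw new Error(`Mandatory field missing: ${field} (data row ${rowIndex + 1})`);
      }
    }
    return entry;
  });
}
```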

4. File Manager (`file-manager.js`)

Responsibilities:

Read/write files with backup support
List and filter files
Schema file management
YAML compliance fixes

Key Methods:

readFile(filename) - Read file content
writeFile(filename, content, options) - Write with backup
backupFile(filename) - Create timestamped backup
listFiles(pattern) - List with glob patterns
writeSchemaFile(filename, content) - Write with validation
fixYAMLCompliance(filename, schemaFile, fixes) - Apply fixes
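
A timestamped backup, as offered by `backupFile`, might be implemented along these lines. The naming scheme (`name.<ISO timestamp>.bak.ext`) and the `backupPathFor` helper are assumptions for illustration; the actual component may name and place backups differently.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Build a backup path such as glossary.2024-01-31T12-00-00-000Z.bak.yaml.
// The naming scheme is an assumption for illustration.
function backupPathFor(filePath, now = new Date()) {
  const { dir, name, ext } = path.parse(filePath);
  // Colons and dots are replaced so the name is valid on Windows too.
  const stamp = now.toISOString().replace(/[:.]/g, "-");
  return path.join(dir, `${name}.${stamp}.bak${ext}`);
}

// Copy the file to its timestamped backup and return the backup path.
function backupFile(filePath) {
  const target = backupPathFor(filePath);
  fs.copyFileSync(filePath, target);
  return target;
}
```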

5. Linter (`linter.js`)

Responsibilities:

YAML syntax validation
Schema compliance checking
Duplicate detection (IDs, terms, aliases)
Cross-reference validation
Alias conflict detection

Key Methods:

lintGlossary(glossaryPath, checks) - Run linting checks
formatResults(results) - Format output

Available Checks: all, duplicates, references, aliases, required-fields, sorting
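
The `duplicates` and `aliases` checks can be pictured with a small sketch like the one below. It assumes each entry has an `id` and an optional `aliases` array, and that aliases are compared case-insensitively; the `findDuplicates` name and result shape are hypothetical.

```javascript
// Report ids and aliases that occur more than once across all entries.
// Case-insensitive alias comparison is an assumption.
function findDuplicates(entries) {
  const repeated = (values) => {
    const seen = new Map();
    for (const value of values) {
      seen.set(value, (seen.get(value) ?? 0) + 1);
    }
    return [...seen].filter(([, n]) => n > 1).map(([value]) => value);
  };
  return {
    ids: repeated(entries.map((e) => e.id)),
    aliases: repeated(
      entries.flatMap((e) => (e.aliases ?? []).map((a) => a.toLowerCase()))
    ),
  };
}
```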

6. Sorter (`sorter.js`)

Responsibilities:

Alphabetical sorting by ID
Sort validation
Backup before sorting

Key Methods:

sortGlossaryFile(glossaryPath, createBackup) - Sort with backup
sort(filePath, options) - Full control sorting
isSorted(terms) - Check if already sorted
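
A sort check paired with a non-mutating sort could look like this sketch. It assumes ordering by `id` using `localeCompare`; the `sortById` helper is hypothetical and file handling and backups are left out.

```javascript
// True when every entry's id is >= the id before it.
// Ordering by `id` with localeCompare is an assumption.
function isSorted(terms) {
  return terms.every(
    (entry, i) => i === 0 || terms[i - 1].id.localeCompare(entry.id) <= 0
  );
}

// Return a sorted copy so the caller can decide whether to write it back.
function sortById(terms) {
  return [...terms].sort((a, b) => a.id.localeCompare(b.id));
}
```

Checking `isSorted` first lets a caller skip both the backup and the rewrite when the file is already in order.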

7. Utils (`utils.js`)

Shared Utilities:

YAML parsing/formatting
Field order normalisation
Backup creation
Color output
File I/O helpers

MCP Tools

Entry Management

add_entry - Add new glossary entry
update_entry - Update existing entry
delete_entry - Delete entry (with reference checking)
search_entries - Search entries by query

File Operations

read_file - Read file content
write_file - Write file with backup
backup_file - Create backup
list_files - List files with pattern
write_schema_file - Write schema with validation
fix_yaml_compliance - Fix YAML compliance issues

Import

import_from_tsv - Import from TSV file
import_from_csv - Import from CSV file
import_from_json - Import from JSON file

Validation

lint_glossary - Run validation checks

Sorting

sort_glossary - Sort entries alphabetically

Usage Examples

Add Entry

{
  "tool": "add_entry",
  "arguments": {
    "glossary": "unsw-glossary/glossary-unsw-glossary-data.yaml",
    "entry": {
      "id": "new-term",
      "term": "New Term",
      "definition": "Description of the new term",
      "categories": ["category1"],
      "status": "current"
    }
  }
}

Search Entries

{
  "tool": "search_entries",
  "arguments": {
    "glossary": "unsw-glossary/glossary-unsw-glossary-data.yaml",
    "query": "campus",
    "category": "education"
  }
}

Import from TSV

{
  "tool": "import_from_tsv",
  "arguments": {
    "tsvPath": "/absolute/path/to/glossary.tsv",
    "outputPath": "imported-glossary.yaml"
  }
}

Lint Glossary

{
  "tool": "lint_glossary",
  "arguments": {
    "glossary": "unsw-glossary/glossary-unsw-glossary-data.yaml",
    "checks": ["duplicates", "references", "aliases"]
  }
}

Sort Glossary

{
  "tool": "sort_glossary",
  "arguments": {
    "glossary": "unsw-glossary/glossary-unsw-glossary-data.yaml",
    "backup": true
  }
}

Benefits of Orchestrator Pattern

Separation of Concerns - Each component has a single, clear responsibility
Testability - Components can be tested in isolation
Reusability - Components can be used in other contexts (CLI, other servers)
Maintainability - Changes to one component don't affect others
Extensibility - New functionality can be added without modifying the orchestrator
Clarity - Logic is organised by domain, not scattered across a monolithic file

Migration from Old Servers

The orchestrator replaces these standalone MCP servers:

~~mcp-yaml-glossary-manager.js~~ → Entry management now in entry-manager.js
~~mcp-yaml-glossary-sorter.js~~ → Sorting now in sorter.js
~~mcp-yaml-glossary-file-manager.js~~ → File ops now in file-manager.js
~~lint-glossary.js~~ → Linting now in linter.js
~~lint-duplicate-aliases.js~~ → Alias checks now in linter.js
~~mcp-yaml-glossary-from-tsv.js~~ → Import now in importer.js

All functionality is preserved and enhanced.

Development

Running Tests

npm test                  # Run all tests
npm test entry-manager   # Test specific component

Adding New Functionality

Create new component in components/ directory
Import component in orchestrator
Add the tool definition to the list served by the ListToolsRequestSchema handler
Register the tool's handler with the CallToolRequestSchema handler
Add tests for component
Update this README

Component Interface Pattern

All components follow this pattern:

class ComponentName {
  constructor(glossaryDir) {
    this.glossaryDir = glossaryDir;
  }

  async mainMethod(args) {
    // Implementation
    return {
      success: true,
      message: "Operation completed",
      // ... additional data
    };
  }
}

module.exports = { ComponentName };

Security

Path Sanitisation

Robust directory traversal prevention using path.relative:

sanitisePath(filename) {
  const resolved = path.resolve(this.glossaryDir, filename);
  const relative = path.relative(this.glossaryDir, resolved);

  // Reject paths that escape the base directory
  if (relative.startsWith("..") || path.isAbsolute(relative)) {
    throw new Error(`Path traversal attempt detected: ${filename}`);
  }

  return resolved;
}

Prevents attacks like:

../../../etc/passwd
subdir/../../outside/file.yaml
/etc/passwd
Edge case: ../glossaries2/file.yaml when base is /var/glossaries
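
The listed cases can be exercised with a standalone copy of the check. The only change from the method above is that the base directory is passed as a parameter instead of read from `this.glossaryDir`; the sample filenames are illustrative. Note that the last case would slip past a naive `resolved.startsWith(base)` test, which is why `path.relative` is used.

```javascript
const path = require("node:path");

// Standalone version of the check shown above, taking the base directory as a parameter.
function sanitisePath(baseDir, filename) {
  const resolved = path.resolve(baseDir, filename);
  const relative = path.relative(baseDir, resolved);
  if (relative.startsWith("..") || path.isAbsolute(relative)) {
    throw new Error(`Path traversal attempt detected: ${filename}`);
  }
  return resolved;
}

const base = path.resolve("glossaries");
for (const candidate of [
  "unsw-glossary/data.yaml",
  "../../../etc/passwd",
  "subdir/../../outside/file.yaml",
  "../glossaries2/file.yaml",
]) {
  try {
    sanitisePath(base, candidate);
    console.error(`allowed:  ${candidate}`);
  } catch (err) {
    console.error(`rejected: ${candidate}`);
  }
}
```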

Source Path Resolution

Handles absolute and relative import paths:

resolveSourcePath(sourcePath) {
  if (path.isAbsolute(sourcePath)) {
    return sourcePath;  // Allow absolute paths for imports
  }
  return path.resolve(this.glossaryDir, sourcePath);
}

Error Handling

All components throw descriptive errors that the orchestrator catches and formats:

throw new Error("Descriptive error message");

The orchestrator formats errors consistently:

{
  "content": [{ "type": "text", "text": "Error: Descriptive error message" }],
  "isError": true
}
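
One way the orchestrator could turn thrown errors into that response shape is sketched below. The helper names `formatSuccess`, `formatError` and `callTool` are hypothetical, and the success format is an assumption; only the error shape is taken from this README.

```javascript
// Hypothetical helpers; only the error response shape comes from the README.
function formatSuccess(result) {
  return { content: [{ type: "text", text: result.message ?? JSON.stringify(result) }] };
}

function formatError(err) {
  const message = err instanceof Error ? err.message : String(err);
  return { content: [{ type: "text", text: `Error: ${message}` }], isError: true };
}

// Run a component handler and convert any thrown error into an error response,
// so a failing tool call never crashes the server or leaks a stack trace.
async function callTool(handler, args) {
  try {
    return formatSuccess(await handler(args));
  } catch (err) {
    return formatError(err);
  }
}
```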

Global Error Handlers

process.on("unhandledRejection", (err) => {
  console.error("[MCP] Unhandled rejection:", err);
  process.exit(1);
});

process.on("uncaughtException", (err) => {
  console.error("[MCP] Uncaught exception:", err);
  process.exit(1);
});

Configuration

Environment Variables

GLOSSARY_WORKSPACE_ROOT=/custom/workspace  # Default: path.resolve(__dirname, "..")
GLOSSARY_DIR=/custom/glossaries            # Default: {workspace_root}/glossaries

Constructor Injection

const server = new UnifiedGlossaryMCP({
  workspaceRoot: "/test/workspace",
  glossaryDir: "/test/glossaries"
});

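Paths in tool arguments are resolved relative to the glossaries directory. An absolute path is accepted only if it resolves inside that directory, or when it is used as an import source.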
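
The interplay between constructor options, environment variables and defaults might be resolved as in this sketch. The precedence order (options, then environment, then defaults) and the `loadConfig` name are assumptions; the default values mirror the comments under Environment Variables. The `moduleDir` parameter exists only to make the function testable.

```javascript
const path = require("node:path");

// Resolve configuration with precedence: constructor options, environment, defaults.
// The precedence order is an assumption; defaults mirror the README comments.
function loadConfig(options = {}, env = process.env, moduleDir = __dirname) {
  const workspaceRoot =
    options.workspaceRoot ??
    env.GLOSSARY_WORKSPACE_ROOT ??
    path.resolve(moduleDir, "..");
  const glossaryDir =
    options.glossaryDir ??
    env.GLOSSARY_DIR ??
    path.join(workspaceRoot, "glossaries");
  return { workspaceRoot, glossaryDir };
}
```

Passing `env` explicitly lets tests exercise every branch without touching `process.env`.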

Logging

All logs go to stderr, never stdout, so that they cannot corrupt the MCP JSON-RPC stream on stdout.

Troubleshooting

Tools Not Registering

Symptom: MCP tools don't appear in VS Code Copilot

Primary Cause: STDIO protocol corruption from logging to stdout

🚨 CRITICAL RULE: MCP servers communicate via STDIO JSON-RPC. Never write to stdout except for protocol messages.

// ❌ BAD - corrupts MCP protocol
console.log("Debug message");
console.log(result);

// ✅ GOOD - logs to stderr
console.error("Debug message");
console.error("Result:", JSON.stringify(result));

Note: The linter.js and sorter.js components have console.log() for CLI usage only. These are never called during MCP protocol communication.

Test with MCP Inspector

Interactive testing:

npx @modelcontextprotocol/inspector node mcp-glossary-server.js
# Opens browser at http://localhost:6274

CLI testing (automation):

# List available tools
npx @modelcontextprotocol/inspector --cli node mcp-glossary-server.js --method tools/list

# Test add_entry tool
npx @modelcontextprotocol/inspector --cli node mcp-glossary-server.js \
  --method tools/call \
  --tool-name add_entry \
  --tool-arg glossary="test/glossary.yaml" \
  --tool-arg 'entry={"id":"test","term":"Test","definition":"Test term"}'

# Test search_entries tool
npx @modelcontextprotocol/inspector --cli node mcp-glossary-server.js \
  --method tools/call \
  --tool-name search_entries \
  --tool-arg glossary="test/glossary.yaml" \
  --tool-arg query="campus"

Configuration Issues

Verify .vscode/mcp.json configuration:

{
  "mcpServers": {
    "yaml-glossary": {
      "command": "node",
      "args": [
        "C:\\Users\\...\\mcp\\yaml-glossary-server\\mcp-glossary-server.js"
      ],
      "env": {
        "GLOSSARY_WORKSPACE_ROOT": "C:\\Users\\...\\RAG Glossary",
        "GLOSSARY_DIR": "C:\\Users\\...\\RAG Glossary\\glossaries"
      }
    }
  }
}

Configuration requirements:

✅ Use absolute paths (not relative)
✅ Ensure mcp-glossary-server.js exists at specified path
✅ Ensure GLOSSARY_DIR points to existing directory
✅ No trailing slashes in paths (Windows can be inconsistent)

Path Sanitisation Errors

Symptom: "Path traversal attempt detected"

Cause: Security feature preventing access outside glossary directory

Solutions:

Use relative paths from glossary directory:

{"glossary": "unsw-glossary/glossary-unsw-glossary-data.yaml"}
// Not: "C:\\full\\path\\..." or "../../../etc/passwd"

Check environment variable GLOSSARY_DIR points to correct base:

// All file operations are relative to GLOSSARY_DIR
// So "unsw-glossary/file.yaml" -> "{GLOSSARY_DIR}/unsw-glossary/file.yaml"

Component-Specific Issues

Entry Manager

"Entry already exists": Use update_entry instead of add_entry

"Entry not found": Check id field matches exactly (case-sensitive)

"Cannot delete - referenced by other entries": Remove references first or use force flag

Linter

"YAML syntax error": Run lint_glossary with all checks to see details

"Duplicate aliases found": Each alias must be unique across all entries

"Broken reference": Referenced entry ID doesn't exist

Importer

"Mandatory field missing": TSV/CSV must have columns: id, term, definition

"Invalid JSON": Check JSON file syntax before importing

"Import failed": Check source file encoding (must be UTF-8)

Sorter

"File already sorted": No action needed

"Backup failed": Check write permissions in glossary directory

YAML File Issues

Symptom: "YAML parse error" or malformed output

Solutions:

Use YAML compliance tool:

{
  "tool": "fix_yaml_compliance",
  "arguments": {
    "filename": "glossary.yaml",
    "schemaFile": "glossary-schema.yaml",
    "fixes": ["format", "order"]
  }
}

Check for common issues:
- Mixed tabs and spaces (use spaces only)
- Inconsistent indentation (2 spaces per level)
- Unquoted special characters (:, #, |)
- Missing newline at end of file
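
Some of these issues can be spotted mechanically before parsing. The sketch below checks raw text for tabs in indentation, indentation that is not a multiple of two spaces, and a missing final newline; it does not detect unquoted special characters. The `findFormattingIssues` name and message wording are hypothetical.

```javascript
// Flag a few formatting problems in raw YAML text.
// Covers tabs in indentation, odd-width indentation and a missing final newline only.
function findFormattingIssues(text) {
  const issues = [];
  text.split("\n").forEach((line, i) => {
    const indent = line.match(/^[ \t]*/)[0];
    if (indent.includes("\t")) {
      issues.push(`line ${i + 1}: tab in indentation`);
    } else if (indent.length % 2 !== 0) {
      issues.push(`line ${i + 1}: indentation is not a multiple of 2 spaces`);
    }
  });
  if (text.length > 0 && !text.endsWith("\n")) {
    issues.push("missing newline at end of file");
  }
  return issues;
}
```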

Debugging Logs

Enable verbose logging:

// Temporarily add to mcp-glossary-server.js
console.error("[MCP-YAML] Tool called:", toolName);
console.error("[MCP-YAML] Arguments:", JSON.stringify(args, null, 2));

Run server manually to see logs:

node mcp-glossary-server.js 2> server-debug.log

# In another terminal
npx @modelcontextprotocol/inspector --cli node mcp-glossary-server.js --method tools/list

# Check logs
cat server-debug.log

Common Error Messages

| Error | Cause | Solution |
|-------|-------|----------|
| "Invalid JSON-RPC" | stdout pollution | Remove console.log() from orchestrator |
| "Tool not found" | Typo in tool name | Check tool names: add_entry, not addEntry |
| "Path traversal" | Absolute path used | Use relative path from GLOSSARY_DIR |
| "ENOENT" | File not found | Check glossary file exists |
| "Permission denied" | No write access | Check file/directory permissions |
| "Invalid entry" | Missing required field | Ensure id, term, definition present |

Performance Optimisation

Large glossaries (1000+ entries):

Use category filters:

{"tool": "search_entries", "arguments": {"query": "*", "category": "education"}}

Disable verbose output:

{"tool": "search_entries", "arguments": {"query": "term", "verbose": false}}

Sort glossaries (improves search performance):

{"tool": "sort_glossary", "arguments": {"glossary": "large.yaml", "backup": true}}

Validation Checklist

✅ Before reporting issues:

☑️ Server file exists at configured path
☑️ No stdout logs in orchestrator: grep "console.log" mcp-glossary-server.js returns nothing
☑️ Inspector shows tools: npx @modelcontextprotocol/inspector --cli ...
☑️ Environment variables set: GLOSSARY_DIR, GLOSSARY_WORKSPACE_ROOT
☑️ Paths are absolute in config
☑️ Glossary files are valid YAML: Test with lint_glossary
☑️ File permissions correct: Can read/write glossary directory

Testing Individual Components

Test components outside MCP (for development):

# Test entry manager
node -e "const {EntryManager} = require('./components/entry-manager'); const em = new EntryManager('./glossaries'); em.addEntry('test.yaml', {id:'test',term:'Test',definition:'Test'}).then(console.log);"

# Test linter
node components/linter.js glossaries/test/glossary.yaml all

# Test sorter
node components/sorter.js glossaries/test/glossary.yaml --dry-run

Note: Component CLI modes use console.log() for output. This is fine - they're not called during MCP communication.

Additional Resources

Audit Logging

Every tool invocation is logged:

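logToolInvocation(toolName, args) {
  // sanitiseArgs is a hypothetical helper that returns a log-safe copy of args
  const sanitisedArgs = this.sanitiseArgs(args);
  console.error(`[MCP] Tool invoked: ${toolName}`, JSON.stringify(sanitisedArgs));
}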
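
The README does not show how arguments are sanitised before logging. One plausible approach, sketched here purely as an assumption, is to truncate long string values so that file contents passed to tools such as write_file do not flood stderr. The `sanitiseArgs` name and the 200-character limit are hypothetical.

```javascript
// Produce a log-safe copy of tool arguments: long strings are truncated.
// The helper name and the default limit are assumptions for illustration.
function sanitiseArgs(args, maxLength = 200) {
  return JSON.parse(
    JSON.stringify(args ?? {}, (key, value) =>
      typeof value === "string" && value.length > maxLength
        ? `${value.slice(0, maxLength)}... [${value.length} chars]`
        : value
    )
  );
}

// Logs to stderr only, so the MCP protocol stream on stdout stays clean.
function logToolInvocation(toolName, args) {
  console.error(`[MCP] Tool invoked: ${toolName}`, JSON.stringify(sanitiseArgs(args)));
}
```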

Testing

npm test                  # Run all tests
npm run test:watch        # Watch mode
npm run test:coverage     # With coverage report

Test Coverage Areas

Configuration loading (environment, defaults)
Path sanitisation (security)
Source path resolution
Response formatting (success, error, custom formats)
Tool registration and schema validation
Module exports and testability

Module Guard

if (require.main === module) {
  const server = new UnifiedGlossaryMCP();
  server.run().catch((error) => {
    console.error("[MCP] Fatal error:", error);
    process.exit(1);
  });
}

Allows importing without side effects for testing.

Version History

v3.0.0 - Best practice implementation (security, testability, configuration)
v3.0 - Refactored to orchestrator architecture with component separation
v2.0 - Unified MCP server with monolithic implementation
v1.0 - Multiple separate MCP servers

Migration from v2.0

Breaking Changes

Module requires no longer auto-start server
- Before: require() started server immediately
- After: Use require.main === module guard
Path handling changed
- Components now receive absolute paths
- Orchestrator handles all path resolution/sanitisation
Error responses no longer include ANSI color codes
- Plain text only for protocol compatibility

Non-Breaking Enhancements

Environment variable configuration
Constructor dependency injection
Exported TOOL_DEFINITIONS for external use
Enhanced security with better path validation
