@porast1/mcp-cognitive

v2.1.1

Published

4 months ago

Universal MCP server for AI knowledge persistence with Weaviate hybrid search

porast1

mcp cognitive knowledge-base weaviate ai hybrid-search

@porast1/mcp-cognitive

Universal MCP server that gives AI agents persistent, searchable memory backed by Weaviate hybrid search (BM25 keyword + vector embeddings + cross-encoder reranking).

Store facts, documents, code artifacts, and test artifacts. Query them with semantic search, filter by module/type/tags, reason across knowledge, detect contradictions, and keep the database healthy — all from 5 unified MCP tools.

Why

LLM agents lose context between sessions.
mcp-cognitive makes knowledge survive: conventions, architecture decisions, debugging insights, domain rules — anything worth remembering.

Hybrid search: BM25 keyword matching, transformer vector embeddings, and reranking, not just text search.
Multi-tenancy: one Weaviate instance, many projects. Data is fully isolated via WEAVIATE_TENANT.
4 collections: CognitiveFact, DocumentChunk, CodeArtifact, and TestArtifact, covering facts, docs, code, and tests.
5 unified MCP tools: recall, store, sync, analyze, and admin, consolidated from 18 v1 tools for cleaner agent interaction.
Agent profiles: tune recall per agent role (architect vs. tester vs. debugger).
Pattern detection: auto-detect conventions from git diffs via post-commit hooks.
Low vendor lock-in: a port/adapter architecture keeps storage behind an interface, though Weaviate is the only adapter today.

Quick Start

1. Start Weaviate

git clone https://github.com/porast1/weaviate-dev-stack
cd weaviate-dev-stack
docker-compose up -d
# ~1.5 GB RAM (Weaviate + text2vec-transformers + reranker-transformers)

2. Install the package

npm install @porast1/mcp-cognitive
# or
pnpm add @porast1/mcp-cognitive

3. Configure VS Code / Copilot

Copy the template config to your project:

# Copy config templates (mcp.jsonc, profiles.json, .env.example)
cp -r ./node_modules/@porast1/mcp-cognitive/config/ ./.cognitive/

# Copy .env.example to project root
cp ./.cognitive/.env.example ./.env

Then add the following to .vscode/mcp.json (or copy it from .cognitive/mcp.jsonc):

{
  "servers": {
    "cognitive": {
      "command": "node",
      "args": ["./node_modules/@porast1/mcp-cognitive/dist/server.js"],
      "env": {
        "WEAVIATE_URL": "localhost:8200",
        "WEAVIATE_TENANT": "myproject", // isolates your data
        "WORKSPACE_ROOT": "${workspaceFolder}",
        "COGNITIVE_PROFILES_FILE": "./.cognitive/profiles.json", // optional
      },
    },
  },
}

4. Done

The MCP server registers 5 unified tools automatically. Your AI agent can now store and recall knowledge.

Environment Variables

Full reference in config/.env.example. Key variables:

| Variable | Default | Description |
| ------------------------- | ---------------- | ----------------------------------------------------- |
| WEAVIATE_URL | localhost:8200 | Weaviate HTTP endpoint |
| WEAVIATE_TENANT | (none) | Tenant name for data isolation (strongly recommended) |
| WEAVIATE_GRPC_PORT | (auto) | gRPC port override |
| WORKSPACE_ROOT | process.cwd() | Project root for resolving citation file paths |
| COGNITIVE_PROJECT | default | Project identifier for multi-project tagging |
| COGNITIVE_PROFILES_FILE | (none) | Path to agent profiles JSON |
| COGNITIVE_PATTERNS_FILE | (none) | Path to pattern definitions for post-commit hook |
| COGNITIVE_MAX_ARCHIVES | 20 | Max archive operations per session (safety guard) |
| COGNITIVE_MAX_COMPACTS | 10 | Max compact operations per session (safety guard) |
| COGNITIVE_ALLOW_RESET | false | Enable resetTenant(); test environments only |
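
To make the defaults concrete, here is a minimal TypeScript sketch of resolving a few of these variables. `loadConfig`, `intOrDefault`, and the `CognitiveConfig` shape are hypothetical names for illustration and are not part of the package's API; only the variable names and default values come from the table above.

```typescript
// Hypothetical config shape; only the variable names and defaults come from the table.
interface CognitiveConfig {
  weaviateUrl: string;
  tenant: string | undefined;
  project: string;
  maxArchives: number;
  maxCompacts: number;
  allowReset: boolean;
}

type Env = Record<string, string | undefined>;

// Parse a positive integer, falling back to the documented default when unset or invalid.
function intOrDefault(raw: string | undefined, fallback: number): number {
  const parsed = parseInt(raw ?? "", 10);
  return isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

function loadConfig(env: Env): CognitiveConfig {
  return {
    weaviateUrl: env.WEAVIATE_URL ?? "localhost:8200",
    tenant: env.WEAVIATE_TENANT, // no default: set it to isolate your data
    project: env.COGNITIVE_PROJECT ?? "default",
    maxArchives: intOrDefault(env.COGNITIVE_MAX_ARCHIVES, 20),
    maxCompacts: intOrDefault(env.COGNITIVE_MAX_COMPACTS, 10),
    // Assumption: only the literal string "true" enables reset.
    allowReset: env.COGNITIVE_ALLOW_RESET === "true",
  };
}
```

In a Node process this could be called as `loadConfig(process.env)`. How the server itself parses these values may differ.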

Tools Reference (v2 — 5 unified tools)

v2.0.0 consolidated 18 v1 tools into 5 unified tools. Old v1 tool names are preserved in src/tools/v1/ for reference.

`cognitive_recall` — Unified search across all collections

Search facts, documents, code artifacts, and test artifacts with a single tool. Uses hybrid BM25 + vector search with cross-encoder reranking.

| Param | Description |
|-------|-------------|
| query | Semantic or keyword search query. Optional for scope: "facts" (filter-only browsing). |
| scope | "all" (default) \| "facts" \| "docs" \| "code" \| "tests" |
| module | Filter by bounded context / module |
| modules | Multi-module parallel query (OR logic) |
| types | Filter by fact types (facts scope only) |
| tags | Filter by tags; all must match (facts scope only) |
| agent | Agent name for profile-based filtering |
| docType | "DDD", "ARCHITECTURE", "GUIDE", "README" (docs scope only) |
| layer | "domain", "application", "infrastructure" (code scope only) |
| artifactType | "entity", "usecase", "port", "adapter" (code scope only) |
| testType | "unit", "integration", "e2e" (tests scope only) |
| limit | Max results per scope (default: 10) |

Examples:

cognitive_recall({ query: "cross-BC isolation rules" })
cognitive_recall({ query: "Email value object", scope: "code", module: "identity" })
cognitive_recall({ scope: "facts", tags: ["architecture"], types: ["invariant"] })
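
The filter semantics in the parameter table (modules combine with OR, tags with AND) can be sketched as a plain predicate. This is an illustration only: the `Fact` and `FactFilter` shapes and `matchesFilter` are hypothetical, the OR behavior for `types` is an assumption, and the real filtering presumably happens inside Weaviate rather than in application code.

```typescript
// Hypothetical, simplified fact shape used only for this illustration.
interface Fact {
  module: string;
  type: string;
  tags: string[];
}

interface FactFilter {
  modules?: string[]; // OR logic: any listed module matches
  types?: string[];   // assumed OR logic across listed types
  tags?: string[];    // AND logic: every listed tag must be present
}

function matchesFilter(fact: Fact, filter: FactFilter): boolean {
  const modules = filter.modules ?? [];
  const types = filter.types ?? [];
  const tags = filter.tags ?? [];

  // An empty or missing filter list places no constraint on that field.
  const moduleOk = modules.length === 0 || modules.indexOf(fact.module) !== -1;
  const typeOk = types.length === 0 || types.indexOf(fact.type) !== -1;
  const tagsOk = tags.every((tag) => fact.tags.indexOf(tag) !== -1);

  return moduleOk && typeOk && tagsOk;
}
```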

`cognitive_store` — Create, update, or archive facts

Unified fact management that replaces the v1 store, update, and forget tools.

| Mode | Params | Description |
|------|--------|-------------|
| Create | fact, type, module, citations?, tags?, confidence?, dryRun? | Store a new fact. Use dryRun: true to check for duplicates first. |
| Update | id, + any patch fields (confidence?, addTags?, removeTags?, type?, module?) | Patch existing fact in-place. |
| Archive | id, archive: true, reason | Soft-delete with audit trail. |

Examples:

cognitive_store({ fact: "Domain must be pure TS", type: "invariant", module: "architecture", citations: ["ARCH.md:L97"] })
cognitive_store({ id: "uuid", confidence: 0.95, addTags: ["verified"] })
cognitive_store({ id: "uuid", archive: true, reason: "outdated after refactor" })
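
One way to read the three modes is as a dispatch on which parameters are present. The sketch below is an assumption about that rule (no id means create; id with archive: true means archive; any other id call means update), with hypothetical names `StoreParams` and `detectStoreMode`. The tool's actual validation may differ.

```typescript
// Hypothetical parameter shape assembled from the mode table; not the package's actual types.
interface StoreParams {
  id?: string;
  fact?: string;
  type?: string;
  module?: string;
  archive?: boolean;
  reason?: string;
}

type StoreMode = "create" | "update" | "archive";

// Assumed dispatch rule: no id => create; id + archive: true => archive; otherwise update.
function detectStoreMode(params: StoreParams): StoreMode {
  if (params.id === undefined) {
    if (!params.fact || !params.type || !params.module) {
      throw new Error("create requires fact, type, and module");
    }
    return "create";
  }
  if (params.archive === true) {
    if (!params.reason) {
      throw new Error("archive requires a reason");
    }
    return "archive";
  }
  return "update";
}
```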

`cognitive_sync` — Batch & single-file synchronization

Sync documents, code artifacts, and test artifacts into the knowledge base.

| Mode | Params | Description |
|------|--------|-------------|
| Batch | project, scope? ("all" \| "docs" \| "code" \| "tests") | Sync all files from .cognitive/sync.json config. |
| Single-file | project, filePath | Auto-detects collection from file extension. |

Supports dryRun: true for preview. Uses checksum-based change detection — unchanged files are skipped.
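
As a rough illustration of checksum-based change detection with a dry-run preview, the sketch below compares each file's current checksum against the one recorded at the last sync. Everything here is hypothetical: the djb2 hash is a dependency-free stand-in (the package's actual checksum algorithm is not stated), and `filesToSync` and `ChecksumIndex` are invented names.

```typescript
// djb2 string hash, used here only as a dependency-free stand-in for a real checksum.
function checksum(content: string): string {
  let hash = 5381;
  for (let i = 0; i < content.length; i++) {
    hash = ((hash << 5) + hash + content.charCodeAt(i)) | 0;
  }
  return (hash >>> 0).toString(16);
}

// Hypothetical store of the checksum recorded at the last sync, keyed by file path.
type ChecksumIndex = Record<string, string>;

// Returns paths whose content is new or changed since the last sync.
// With dryRun set, the index is left untouched so the call is a pure preview.
function filesToSync(
  files: Record<string, string>,
  index: ChecksumIndex,
  dryRun: boolean = false
): string[] {
  const changed: string[] = [];
  for (const path of Object.keys(files)) {
    const current = checksum(files[path]);
    if (index[path] !== current) {
      changed.push(path);
      if (!dryRun) {
        index[path] = current;
      }
    }
  }
  return changed;
}
```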

`cognitive_analyze` — Correlate artifacts & reason about knowledge

| Action | Params | Description |
|--------|--------|-------------|
| correlate | sourceCollection, sourceQuery or sourceId, targetCollections?, minScore? | Find semantically related artifacts across docs/code/tests. Scores: >0.65 = strong, 0.5–0.65 = partial. |
| reason | question, modules?, depth? ("shallow" \| "deep") | Synthesize structured answer with contradiction detection. |

Examples:

cognitive_analyze({ action: "correlate", sourceCollection: "DocumentChunk", sourceQuery: "user authentication" })
cognitive_analyze({ action: "reason", question: "What are the cross-BC isolation rules?" })
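
The correlate score bands from the table can be written as a small classifier. `classifyScore` is a hypothetical helper, and the "below-threshold" label for scores under 0.5 is an assumption, since the table names only the strong and partial bands.

```typescript
type CorrelationStrength = "strong" | "partial" | "below-threshold";

// Bands from the table: >0.65 is strong, 0.5 to 0.65 (inclusive) is partial.
function classifyScore(score: number): CorrelationStrength {
  if (score > 0.65) return "strong";
  if (score >= 0.5) return "partial";
  return "below-threshold"; // label is an assumption; the table does not name this band
}
```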

`cognitive_admin` — Maintenance & infrastructure

| Action | Params | Description |
|--------|--------|-------------|
| health | (none) | Weaviate infrastructure check: nodes, collections, tenants. |
| stats | module? | Quick dashboard: counts by type/module/status, top tags, activity. |
| audit | (none) | Deep quality scan: stale facts, broken citations, duplicates, health score. |
| compact | mode ("preview" \| "execute"), idA?, idB? | Find & merge duplicate facts. Preview first, then execute. |
| verify | id? | Check citation integrity (files still exist on disk). Omit id to verify all. |
| timeline | since?, until?, module?, types?, status?, order? | Chronological fact view with date filtering and daily grouping. |
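
For the verify action, a citation has to be split into a file path before it can be checked on disk. The sketch below assumes the `file:L<line>` form seen in the `"ARCH.md:L97"` example; the full citation grammar is not documented here, and `parseCitation` is a hypothetical helper rather than the package's parser.

```typescript
interface Citation {
  file: string;
  line: number | undefined; // undefined when the citation has no :L<n> suffix
}

// Assumed format: "<path>" optionally followed by ":L<line number>".
function parseCitation(raw: string): Citation {
  const match = /^(.+?)(?::L(\d+))?$/.exec(raw.trim());
  if (!match) {
    throw new Error("Unparseable citation: " + raw);
  }
  const lineText: string | undefined = match[2];
  return {
    file: match[1],
    line: lineText !== undefined ? parseInt(lineText, 10) : undefined,
  };
}
```

The resulting `file` would then be resolved against WORKSPACE_ROOT and checked for existence.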

Fact Types & Lifecycle

| Type | TTL | Use for |
| ------------- | ------------- | --------------------------------------------- |
| invariant | ∞ | Permanent architecture rules. Confidence: 1.0 |
| policy | Until changed | Business decisions that may evolve |
| convention | 6 months | Team conventions, re-audit periodically |
| observation | 30 days | Debugging insights, re-verify when recalled |
| ephemeral | 7 days | Temporary notes, auto-expire |
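
The TTL column can be read as a staleness rule per fact type. The following sketch is one possible interpretation: `isStale` and `TTL_DAYS` are hypothetical names, "6 months" is approximated as 180 days, and measuring age from a last-verified timestamp is an assumption about how the lifecycle is tracked.

```typescript
type FactType = "invariant" | "policy" | "convention" | "observation" | "ephemeral";

const DAY_MS = 24 * 60 * 60 * 1000;

// TTL in days; null means no time-based expiry. 180 days approximates "6 months" (assumption).
const TTL_DAYS: Record<FactType, number | null> = {
  invariant: null,
  policy: null, // "until changed": expiry is driven by edits, not by age
  convention: 180,
  observation: 30,
  ephemeral: 7,
};

// True when the fact is older than its type's TTL, measured from lastVerified.
function isStale(type: FactType, lastVerified: Date, now: Date): boolean {
  const ttl = TTL_DAYS[type];
  if (ttl === null) return false;
  return now.getTime() - lastVerified.getTime() > ttl * DAY_MS;
}
```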

Agent Profiles

Profiles tune recall behavior per agent role. See config/profiles.json for a full example.

{
  "architect-agent": {
    "name": "architect-agent",
    "priorityTypes": ["invariant", "policy", "convention"],
    "boostTags": ["architecture", "design"],
    "suppressTags": ["test-only"],
    "maxRecall": 10,
    "alpha": 0.7
  }
}

alpha controls hybrid search balance: 0.0 = keyword-heavy (BM25), 1.0 = semantic-heavy (vector).
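
To build intuition for alpha, the sketch below blends a keyword score and a vector score as a weighted average. It is a conceptual illustration only: `blendScores` is a hypothetical helper, both inputs are assumed to be normalized to the range 0 to 1, and Weaviate's own fusion algorithms may combine results differently.

```typescript
// Convex blend of two scores already normalized to [0, 1].
// alpha = 0 uses only the keyword (BM25) score; alpha = 1 uses only the vector score.
function blendScores(alpha: number, keywordScore: number, vectorScore: number): number {
  if (alpha < 0 || alpha > 1) {
    throw new RangeError("alpha must be between 0 and 1");
  }
  return alpha * vectorScore + (1 - alpha) * keywordScore;
}
```

With the profile's alpha of 0.7, a result that scores high on vector similarity but low on keywords still ranks well, which suits conceptual queries more than exact identifier lookups.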

Use the agent parameter in cognitive_recall to activate a profile:

cognitive_recall({ query: "cross-BC rules", agent: "architect-agent" })

Pattern Detection (Git Hooks)

Auto-detect conventions from git diffs. See examples/patterns/ for pattern templates.

# Set up
export COGNITIVE_PATTERNS_FILE=./.cognitive/patterns.ts

# After each commit
npx tsx ./node_modules/@porast1/mcp-cognitive/dist/hooks/post-commit.js

CLI Tools

npx mcp-cognitive-sync     # Sync markdown docs → knowledge base
npx mcp-cognitive-audit    # Audit stale facts and broken citations
npx mcp-cognitive-verify   # Verify citation file references
npx mcp-cognitive-stale    # List stale facts that need re-verification

Development

Code Quality Tools:

# ESLint — TypeScript linting (181 warnings tracked, 0 errors)
pnpm lint              # Check for issues
pnpm lint:fix          # Auto-fix where possible

# Prettier — Code formatting
pnpm format            # Format all files
pnpm format:check      # Check formatting status

# Tests
pnpm test              # Run all 162 tests
pnpm test:watch        # Watch mode for development

# Build
pnpm build             # Compile TypeScript → dist/

Linting Rules:

✅ Basic: unused vars, explicit any, non-null assertions
✅ Code style: prefer const, no var, template literals
✅ Safety: no debugger, eqeqeq
⚠️ Tests: relaxed rules (any allowed, console.log allowed)

Configuration:

eslint.config.mjs — flat config with TypeScript parser
.prettierrc — single quotes, 100 print width, trailing commas
.prettierignore — excludes dist/, node_modules/, backups

Testing

Test Tenant Isolation: All integration tests use dedicated test tenants to prevent production data loss.

# First-time setup: Create test tenants in Weaviate
cd weaviate-dev-stack
./scripts/create-tenant.sh mcp-cognitive-test
./scripts/create-tenant.sh test-isolation-a
./scripts/create-tenant.sh test-isolation-b

# Run tests (uses TEST_TENANT constants, never touches production data)
pnpm test                  # 162 tests covering all 5 unified tools + collections

Test Coverage:

✅ CRUD operations: store, recall, update, archive (forget), verify
✅ Search: hybrid (BM25 + vector), filters (type, module, tags, confidence)
✅ Indexing: document sync, code artifacts, test artifacts
✅ Analysis: stats, timeline, audit, health, correlate
✅ Safety: archive/compact guards, decay tracking, citation validation
✅ Phase 8: multi-module recall, optional query, dryRun mode
✅ Multi-tenant isolation: resetTenant, store UUID5, recall, archive (4 tests)

Tenant Isolation Guarantees:

resetTenant() operates only on this.tenantId — other tenants remain untouched
COGNITIVE_ALLOW_RESET global flag protects ALL tenants unless explicitly set
Verified: reset tenant A → tenant B data survives (see tenant-isolation.test.ts)
Verified: UUID5 deterministic IDs work correctly across isolated tenants
Verified: recall() queries never cross tenant boundaries
Verified: archive() in one tenant does not affect facts in other tenants

Production Safety: Test utilities in tests/weaviate-test-utils.ts enforce isolation via the TEST_TENANT constant. The production tenant (e.g., WEAVIATE_TENANT=diamondpage) is never accessed during tests.

Architecture

┌──────────────────────────────────────────────┐
│  MCP Protocol (stdio)                        │
├──────────────────────────────────────────────┤
│  server.ts  — auto-registers 5 tools          │
├──────────────────────────────────────────────┤
│  tools/*.tool.ts  — Zod schema + execute fn  │
├──────────────────────────────────────────────┤
│  ports/cognitive-store.port.ts  — interface   │
├──────────────────────────────────────────────┤
│  adapters/weaviate-v3.adapter.ts  — impl      │
│  (BM25 + vector + gRPC + multi-tenancy)      │
└──────────────────────────────────────────────┘

Collections:

CognitiveFact — structured knowledge with confidence, citations, tags
DocumentChunk — markdown docs split by section
CodeArtifact — TypeScript files with AST metadata
TestArtifact — test files with describe/it block text

Weaviate Infrastructure

This package requires a running Weaviate instance with text2vec-transformers and reranker-transformers modules.

Recommended: Use weaviate-dev-stack — a pre-configured Docker Compose stack:

git clone https://github.com/porast1/weaviate-dev-stack
cd weaviate-dev-stack
docker-compose up -d

| Port | Service |
| ------- | ----------------- |
| 8200 | Weaviate HTTP API |
| 50052 | Weaviate gRPC |

Multi-tenancy is handled automatically. Each project gets its own tenant via WEAVIATE_TENANT.

Project Setup Template

Recommended structure in your project:

your-project/
├── .cognitive/
│   ├── profiles.json        # Agent profiles (from config/profiles.json)
│   ├── patterns.ts          # Pattern definitions (optional, see examples/)
│   └── pending-facts.json   # Auto-detected facts by post-commit hook (git-ignored)
├── .vscode/
│   └── mcp.json             # MCP server config (from config/mcp.jsonc)
├── .env                     # Environment variables (from config/.env.example)
└── .gitignore               # Add: .cognitive/pending-facts.json

See examples/ for pattern & profile templates with detailed documentation.

License

MIT
