@harperfast/cortex-mcp-server
v1.0.2
Remote MCP server that exposes Harper Cortex memory as tools to Claude, Cursor, Windsurf, and any MCP-compatible client
@harperfast/cortex-mcp-server
A remote MCP (Model Context Protocol) server that exposes Harper Cortex memory and development context as tools to Claude, Cursor, Windsurf, Copilot, and any MCP-compatible client.
This is the lowest-friction entry point into the Harper ecosystem. Users can add persistent, distributed memory to any AI tool by pointing it at a public URL — no local installation, no CLI, no code required.
Features
- Persistent Memory — Store and retrieve facts, decisions, and context using semantic search
- Multi-Client Support — Works with Claude (web, desktop, mobile), Claude Code, Cursor, Windsurf, Copilot, and any MCP-compatible client
- Server-Side Embeddings — No API keys needed on the client; Cortex handles all embedding with ONNX
- Multi-Agent Sharing — Share memory across agents and users with namespace isolation
- Self-Hosted — Deploy on your own infrastructure or Harper Cloud
- Production-Ready — Real database with ACID guarantees, no local files or SQLite limitations
Quick Start
1. Claude Desktop / Claude.ai
- Open Settings → Connectors → Add custom connector
- Enter the URL: https://my-instance.harpercloud.com/mcp
- (Optional) Authenticate with a Bearer token if required
- You now have persistent memory tools available in Claude
2. Claude Code
# Add it as an MCP server (use port 9926 for Harper Fabric REST endpoints)
claude mcp add cortex -- npx @harperfast/cortex-mcp-server \
--url https://my-instance.harperfabric.com:9926 \
--token "[email protected]:password"
Port note: Harper Fabric exposes custom resource endpoints (MemorySearch, MemoryStore, etc.) on port 9926. Port 9925 is the operations API and will return 404 for MCP requests.
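The port note above can be turned into a small pre-flight check. The following is a minimal sketch, not part of the package: `checkFabricPort` is a hypothetical helper, and the regular expression is a simplification that ignores URLs with embedded credentials or IPv6 hosts.

```typescript
// Hypothetical helper: warn when a Cortex URL points at the operations API port.
// Returns a warning string, or null when no problem is detected.
function checkFabricPort(url: string): string | null {
  // Capture an explicit port after the host, e.g. "https://host:9925/...".
  const match = /^[a-z][a-z0-9+.-]*:\/\/[^/:]+:(\d+)/i.exec(url);
  if (match === null) {
    return null; // no explicit port in the URL, nothing to check
  }
  const port = Number(match[1]);
  if (port === 9925) {
    return "Port 9925 is the operations API and returns 404 for MCP requests; use port 9926.";
  }
  return null;
}

console.log(checkFabricPort("https://my-instance.harperfabric.com:9925"));
```

Running a check like this before registering the server surfaces the most common misconfiguration early instead of as a 404 at request time.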
3. Cursor / Windsurf
Add to your MCP configuration file (.cursor/mcp.json or similar):
{
  "mcpServers": {
    "cortex": {
      "url": "https://my-instance.harpercloud.com/mcp",
      "env": {
        "CORTEX_TOKEN": "your-bearer-token"
      }
    }
  }
}
4. Local Development
# Clone the repo
git clone https://github.com/HarperFast/cortex-mcp-server.git
cd cortex-mcp-server
# Install dependencies
npm install
# Start in HTTP mode
npm run dev
# Or use npx directly
npx @harperfast/cortex-mcp-server --url https://my-cortex.harpercloud.com --port 3000
Configuration
Environment Variables
- CORTEX_URL (required) — URL of your Cortex instance (e.g., https://my-instance.harpercloud.com)
- CORTEX_TOKEN (optional) — Bearer token for authentication
- CORTEX_SCHEMA (optional) — Schema name in Cortex (default: data)
- PORT (optional) — Port to listen on for HTTP server (default: 3000)
- HOST (optional) — Host to bind to (default: 0.0.0.0)
- AUTH_REQUIRED (optional) — Require authentication (default: true)
- HTTP_SERVER (optional) — Use HTTP transport instead of stdio (default: false)
- MULTI_TENANT (optional) — Set to "true" to enable multi-tenant mode (JWT auth, namespace enforcement, rate limiting)
- JWKS_URL (optional) — JWKS endpoint for JWT validation (required in multi-tenant mode)
- ADMIN_TOKEN (optional) — Static token for admin API access
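To make the defaults and the one conditional requirement concrete, here is a minimal sketch of how these variables could be resolved into a typed configuration. The `ServerConfig` shape and `loadConfig` function are hypothetical and are not the package's actual code; in particular, treating any AUTH_REQUIRED value other than "false" as enabled is an assumption.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface ServerConfig {
  cortexUrl: string;
  cortexToken: string | undefined;
  schema: string;
  port: number;
  host: string;
  authRequired: boolean;
  httpServer: boolean;
  multiTenant: boolean;
  jwksUrl: string | undefined;
  adminToken: string | undefined;
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): ServerConfig {
  const cortexUrl = env.CORTEX_URL;
  if (!cortexUrl) {
    throw new Error("CORTEX_URL is required");
  }
  const multiTenant = env.MULTI_TENANT === "true";
  if (multiTenant && !env.JWKS_URL) {
    // JWKS_URL is optional in general but required in multi-tenant mode.
    throw new Error("JWKS_URL is required when MULTI_TENANT is true");
  }
  const port = Number(env.PORT ?? "3000");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error("Invalid PORT: " + String(env.PORT));
  }
  return {
    cortexUrl,
    cortexToken: env.CORTEX_TOKEN,
    schema: env.CORTEX_SCHEMA ?? "data",
    port,
    host: env.HOST ?? "0.0.0.0",
    authRequired: env.AUTH_REQUIRED !== "false", // assumption: on unless explicitly "false"
    httpServer: env.HTTP_SERVER === "true",
    multiTenant,
    jwksUrl: env.JWKS_URL,
    adminToken: env.ADMIN_TOKEN,
  };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)`; passing the environment in as an argument keeps the function easy to test.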
Command-Line Arguments
cortex-mcp-server \
--url https://my-cortex.harpercloud.com \
--token your-bearer-token \
--port 3000 \
--host localhost \
--no-auth \
--multi-tenant \
--jwks-url <url> \
--admin-token <token>
WARNING: Running with --no-auth exposes all memory data without authentication. Only use in isolated development environments.
Available Tools
Standard Tools (always available)
| Tool | Description | Input | Output |
| ---------------- | -------------------------------------- | ------------------------------------------------------------ | ------------------------------ |
| memory_search | Search memories by semantic similarity | query, limit?, filters? | Results with similarity scores |
| memory_store | Store a new memory | text, source?, classification?, metadata? | Memory ID and timestamp |
| memory_recall | Retrieve a specific memory by ID | id | Full memory record |
| memory_forget | Delete a memory | id | Deletion confirmation |
| memory_count | Count stored memories | filters? | Total count |
| synapse_search | Search development context | query, projectId, limit?, filters? | Context entries with scores |
| synapse_ingest | Ingest context from a tool | source, content, projectId, parentId?, references? | Stored entries and count |
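MCP clients invoke these tools through JSON-RPC `tools/call` requests. The sketch below shows what such requests might look like for memory_store and memory_search. The envelope follows the MCP specification; the argument value types (strings, a numeric limit) and the `buildToolCall` helper are assumptions, since the table above lists only parameter names.

```typescript
// JSON-RPC 2.0 envelope for an MCP tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper that wraps a tool name and its arguments.
function buildToolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Argument types are assumed; only the parameter names come from the table.
const storeRequest = buildToolCall(1, "memory_store", {
  text: "We use event-driven architecture for our order service",
  source: "chat",
});
const searchRequest = buildToolCall(2, "memory_search", {
  query: "order service architecture",
  limit: 5,
});

console.log(JSON.stringify(storeRequest, null, 2));
console.log(JSON.stringify(searchRequest, null, 2));
```

Most users never write these requests by hand because the MCP client generates them, but seeing the shape helps when debugging with curl or reading server logs.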
Admin Tools (multi-tenant mode only)
| Tool | Description |
| --------------------- | ------------------------------------------------------ |
| admin_create_tenant | Create a new tenant with namespace and security policy |
| admin_list_tenants | List all tenants, optionally filtered by status |
| admin_get_tenant | Get details for a specific tenant |
| admin_update_tenant | Update tenant name, tier, status, or quotas |
| admin_issue_token | Generate JWT claims for a tenant |
| admin_revoke_token | Revoke a specific JWT token |
Usage Examples
In Claude
You: "Remember that we use event-driven architecture for our order service"
Claude: [Uses memory_store] "Stored. I've saved that your order service uses event-driven architecture."
You: "What's the architecture for the order service?"
Claude: [Uses memory_search] "Based on our notes, your order service uses event-driven architecture."
In Claude Code or Cursor
When ingesting context from your codebase:
[Claude/Cursor detects you're working on authentication]
[Uses synapse_ingest] Stores: "Intent: Implement JWT-based auth"
Later:
[Uses synapse_search] Retrieves: "Previous decision: JWT-based auth with 24h expiry"
Deployment Options
Option 1: Standalone (npx)
npx @harperfast/cortex-mcp-server --url https://my-cortex.harpercloud.com --port 3000
Runs on any Node.js host (VPS, laptop, container orchestration).
Option 2: Docker
# Build
docker build -t cortex-mcp-server:latest .
# Run
docker run \
-e CORTEX_URL=https://my-cortex.harpercloud.com \
-e CORTEX_TOKEN=your-token \
-p 3000:3000 \
cortex-mcp-server:latest
Option 3: Harper Cloud (Custom Functions)
Deploy directly on Harper:
harper deploy cortex-mcp-server
The MCP server runs in the same instance as your Cortex data, with zero additional infrastructure.
Option 4: Docker Compose
version: '3.8'
services:
  cortex-mcp:
    image: harperfast/cortex-mcp-server:latest
    environment:
      CORTEX_URL: https://my-cortex.harpercloud.com
      CORTEX_TOKEN: ${CORTEX_TOKEN}
      PORT: 3000
    ports:
      - "3000:3000"
    restart: unless-stopped
Authentication
The server supports two authentication modes:
Basic Auth (Harper Fabric): Pass credentials as user:password via --token or CORTEX_TOKEN. The server automatically Base64-encodes them for HTTP Basic Auth:
cortex-mcp-server --url https://my-cortex.harperfabric.com:9926 --token "[email protected]:password"
Bearer Auth: Pass a pre-formatted Bearer token:
cortex-mcp-server --url https://my-cortex.harpercloud.com --token "Bearer eyJhbG..."
In multi-tenant setups, include the user ID in the token:
Authorization: Bearer user-123:secret-token
The server extracts user-123 and scopes all memory operations to that user's namespace.
Auth Layering: Cortex relies on Harper/Fabric platform authentication. Ensure authentication.requireAuthentication is enabled in your Harper config to enforce security at the instance level.
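The token handling described in this section can be sketched as follows. This is an illustration under stated assumptions, not the server's actual code: `toAuthorizationHeader`, `extractUserId`, and `base64Ascii` are hypothetical names, the encoder handles ASCII credentials only, and treating a bare token without a colon as a Bearer token is an assumption.

```typescript
const B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Minimal Base64 encoder for ASCII input, written out so the sketch has no dependencies.
function base64Ascii(input: string): string {
  let out = "";
  for (let i = 0; i < input.length; i += 3) {
    const remaining = input.length - i;
    const a = input.charCodeAt(i);
    const b = remaining > 1 ? input.charCodeAt(i + 1) : 0;
    const c = remaining > 2 ? input.charCodeAt(i + 2) : 0;
    const triple = (a << 16) | (b << 8) | c;
    out += B64_ALPHABET.charAt((triple >> 18) & 63);
    out += B64_ALPHABET.charAt((triple >> 12) & 63);
    out += remaining > 1 ? B64_ALPHABET.charAt((triple >> 6) & 63) : "=";
    out += remaining > 2 ? B64_ALPHABET.charAt(triple & 63) : "=";
  }
  return out;
}

// Turn a --token / CORTEX_TOKEN value into an Authorization header value.
function toAuthorizationHeader(token: string): string {
  if (/^(Basic|Bearer) \S/.test(token)) {
    return token; // already a pre-formatted header value
  }
  if (token.indexOf(":") !== -1) {
    return "Basic " + base64Ascii(token); // user:password credentials
  }
  return "Bearer " + token; // assumption: a bare token is a Bearer token
}

// Pull the user ID out of a multi-tenant "Bearer <userId>:<secret>" header value.
function extractUserId(header: string): string | null {
  const match = /^Bearer ([^:\s]+):\S+$/.exec(header);
  return match === null ? null : (match[1] ?? null);
}

console.log(toAuthorizationHeader("user:pass"));
console.log(extractUserId("Bearer user-123:secret-token"));
```

Checking for an existing Basic or Bearer prefix first matters: a pre-formatted multi-tenant token also contains a colon and must not be Base64-encoded a second time.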
Architecture
┌─────────────────────────────────────┐
│ Claude / Cursor / Windsurf / etc. │
│ (MCP-compatible client) │
└────────────┬────────────────────────┘
│
│ Streamable HTTP or Stdio
│ MCP Protocol
│
┌────────────▼────────────────────────┐
│ cortex-mcp-server │
│ │
│ ├─ memory_search │
│ ├─ memory_store │
│ ├─ memory_recall │
│ ├─ memory_forget │
│ ├─ memory_count │
│ ├─ synapse_search │
│ └─ synapse_ingest │
└────────────┬────────────────────────┘
│
│ HTTP + Bearer auth
│ @harperfast/cortex-client
│
┌────────────▼────────────────────────┐
│ Harper Cortex │
│ (Memory + Synapse database) │
│ │
│ ├─ Vector Search (ONNX) │
│ ├─ Metadata Filtering │
│ ├─ Multi-agent Namespaces │
│ └─ ACID Transactions │
└─────────────────────────────────────┘
Development
Build from source
git clone https://github.com/HarperFast/cortex-mcp-server.git
cd cortex-mcp-server
npm install
npm run build
Run tests
npm test
Local development with live reload
npm run dev
This starts the server in HTTP mode with hot reload. By default, it connects to http://localhost:8000 for Cortex.
Troubleshooting
"Connection refused" or 404 errors
- Check that CORTEX_URL is correct and the Cortex instance is running
- Harper Fabric users: Use port 9926 (REST endpoints), not 9925 (operations API)
- Verify network connectivity:
curl https://my-cortex.harperfabric.com:9926/MemoryCount -X POST -H "Content-Type: application/json" -d '{}'
"Authentication failed" or "Invalid character" error
- Ensure CORTEX_TOKEN is set correctly if your Cortex instance requires auth
- For Harper Fabric, use user:password format — the server handles Base64 encoding automatically
- If passing a pre-formatted header, prefix with Basic or Bearer (e.g., Basic dXNlcjpwYXNz)
- Check that the token hasn't expired
Memory not persisting
- Verify Cortex is using a persistent database (not in-memory)
- Check that CORTEX_SCHEMA matches your Cortex configuration
Tool not appearing in client
- Restart your MCP client after deploying a new version
- Check that the server is running and reachable:
curl http://localhost:3000/health
MCP connection fails silently
If the MCP connection fails, Claude may silently fall back to local file-based memory. Verify that the connection is active:
curl http://localhost:3000/mcp/health
Or use Claude's built-in diagnostic:
/mcp
This shows all connected MCP servers and their status.
API Reference
For detailed API specifications, see cortex-client.
Data Handling & Compliance
Operators are responsible for ensuring Prohibited Data (PII, PHI, government IDs) is not stored unless covered by their Harper Order. This is specified in PaaS ToS Section 3.3.
All memory storage operations are protected by content sanitization that detects and blocks or sanitizes injection patterns, control characters, and oversized payloads. However, this protection assumes legitimate data. Do not store sensitive personal information without explicit legal coverage.
Security Model
Single-Tenant (Default)
The MCP server is designed for single-tenant deployment: one Cortex instance per team. Auth is handled by Harper's native HTTP auth layer (Basic auth or Bearer tokens configured at the instance level). The MCP server inherits this — no additional auth is needed beyond what Harper provides.
VectorSearch is intentionally excluded from MCP. The VectorSearch endpoint accepts pre-computed embedding vectors and is available for trusted server-to-server paths (e.g., LangChain running in your backend). It is not exposed through MCP because untrusted clients could craft adversarial vectors to poison the vector space or trick dedup into overwriting legitimate memories.
Multi-Tenant
For multi-tenant deployments where multiple users share a single Cortex instance, the server implements:
- JWT auth with RS256 JWKS validation — Tokens validated against JWKS endpoint for secure, stateless auth
- Server-side namespace enforcement — agentId bound from JWT ns claim, client values overwritten
- Per-tenant rate limiting with 3 tiers:
  - Free: 60 reads / 20 writes per minute
  - Team: 300 reads / 100 writes per minute
  - Enterprise: 1000 reads / 500 writes per minute
- Scope-based access control — memory:read, memory:write, synapse:read, synapse:write
- Token revocation with 60s cache TTL
- Content audit logging — All operations logged for compliance
Important: Ensure all tenants are provisioned as authorized Users under your Harper subscription (per PaaS ToS Section 3.2).
See docs/multi-tenant-design.md for the full architecture proposal.
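Server-side namespace enforcement and scope checks from the list above could look roughly like the sketch below. Only the `ns` claim, the `agentId` field, and the four scope names come from this README; the `scopes` claim name, the tool-to-scope mapping, and the `authorizeCall` function are assumptions made for illustration.

```typescript
// Assumed shape of validated JWT claims; "scopes" is a hypothetical claim name.
interface TenantClaims {
  ns: string;
  scopes: string[];
}

// Assumed mapping from tool name to the scope it requires.
const REQUIRED_SCOPE: Record<string, string> = {
  memory_search: "memory:read",
  memory_recall: "memory:read",
  memory_count: "memory:read",
  memory_store: "memory:write",
  memory_forget: "memory:write",
  synapse_search: "synapse:read",
  synapse_ingest: "synapse:write",
};

// Reject calls that lack the needed scope, then bind agentId to the tenant namespace.
function authorizeCall(
  tool: string,
  args: Record<string, unknown>,
  claims: TenantClaims,
): Record<string, unknown> {
  const scope = REQUIRED_SCOPE[tool];
  if (scope === undefined) {
    throw new Error("Unknown tool: " + tool);
  }
  if (claims.scopes.indexOf(scope) === -1) {
    throw new Error("Missing scope " + scope + " for " + tool);
  }
  // Any client-supplied agentId is overwritten with the namespace from the token.
  return { ...args, agentId: claims.ns };
}

const claims: TenantClaims = { ns: "tenant-a", scopes: ["memory:read"] };
console.log(authorizeCall("memory_search", { query: "auth", agentId: "tenant-b" }, claims));
```

Spreading the client arguments first and setting agentId last is what makes the override unconditional: a client cannot read or write another tenant's namespace by supplying its own value.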
Content Safety
All memory storage operations pass through content sanitization that:
- Detects and strips prompt injection patterns (system markers, instruction overrides, delimiter injection)
- Removes script tags, SQL-like injection, and control characters
- Enforces content length limits (16KB)
- Normalizes Unicode (NFKC)
- Blocks content with detected injection patterns (configurable: block vs. sanitize-and-store)
Retrieval also applies a lighter sanitization pass to prevent stored payloads from reaching LLM clients.
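As a rough illustration of the storage-side pipeline, the sketch below normalizes Unicode, strips control characters, enforces a length limit, and either blocks or sanitizes content that matches an injection pattern. The patterns, the result shape, and the `sanitizeContent` name are hypothetical, and the 16KB limit is approximated here as 16,384 UTF-16 code units rather than bytes.

```typescript
type SanitizeMode = "block" | "sanitize";

interface SanitizeResult {
  ok: boolean;
  text: string;
  reason?: string;
}

// Approximation: counts UTF-16 code units, not bytes.
const MAX_LENGTH = 16 * 1024;

// Control characters except tab, line feed, and carriage return.
const CONTROL_CHARS = /[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F]/g;

// Hypothetical example patterns; a real deployment would maintain a broader set.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore (all )?previous instructions/gi,
  /<\s*script\b[^>]*>[\s\S]*?<\s*\/\s*script\s*>/gi,
];

function sanitizeContent(input: string, mode: SanitizeMode): SanitizeResult {
  let text = input.normalize("NFKC").replace(CONTROL_CHARS, "");
  if (text.length > MAX_LENGTH) {
    return { ok: false, text: "", reason: "too_long" };
  }
  let matched = false;
  for (const pattern of INJECTION_PATTERNS) {
    const stripped = text.replace(pattern, "");
    if (stripped !== text) {
      matched = true;
      text = stripped;
    }
  }
  if (matched && mode === "block") {
    return { ok: false, text: "", reason: "injection_pattern" };
  }
  return { ok: true, text };
}

console.log(sanitizeContent("Please ignore previous instructions", "block"));
```

Normalizing before pattern matching is deliberate: NFKC folds look-alike forms such as fullwidth letters into their plain equivalents, so a pattern written in ASCII still matches them.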
Production Deployment
Rate Limiting
Embedding generation and vector search are compute-intensive. In production:
- Harper deployment: Configure rate limits at the Harper instance level via config.yaml or Fabric policies
- Standalone deployment: Place an HTTP rate limiter (nginx, Cloudflare, express-rate-limit) in front of the MCP server
- Rate limits should be per-tenant for multi-tenant deployments, with separate budgets for reads vs. writes
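A per-tenant limiter with separate read and write budgets might be sketched as a fixed one-minute window counter, as below. The tier numbers are taken from the Multi-Tenant section (reading the Team and Enterprise figures as reads/writes, which is an assumption); the `TenantRateLimiter` class is hypothetical and keeps state in memory, so it would not work across multiple server processes.

```typescript
type Tier = "free" | "team" | "enterprise";
type Op = "read" | "write";

// Requests allowed per minute, per tenant, per operation type.
const LIMITS: Record<Tier, Record<Op, number>> = {
  free: { read: 60, write: 20 },
  team: { read: 300, write: 100 },
  enterprise: { read: 1000, write: 500 },
};

const WINDOW_MS = 60000;

class TenantRateLimiter {
  private windows = new Map<string, { start: number; count: number }>();

  // Returns true if the request is within budget. The caller supplies the clock
  // (for example Date.now()) so the limiter stays deterministic in tests.
  allow(tenantId: string, tier: Tier, op: Op, nowMs: number): boolean {
    const key = tenantId + ":" + op; // reads and writes are counted separately
    const current = this.windows.get(key);
    if (current === undefined || nowMs - current.start >= WINDOW_MS) {
      this.windows.set(key, { start: nowMs, count: 1 });
      return true;
    }
    if (current.count >= LIMITS[tier][op]) {
      return false;
    }
    current.count += 1;
    return true;
  }
}

const limiter = new TenantRateLimiter();
console.log(limiter.allow("tenant-a", "free", "write", Date.now()));
```

A fixed window is the simplest option and allows short bursts at window boundaries; a sliding window or token bucket smooths that out at the cost of more state.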
Network Placement
- Internal/team use: Deploy alongside Cortex in your private network. No DMZ needed.
- Public-facing: Place the MCP server in a DMZ with strict ingress controls. Cortex must NOT be directly accessible from the internet. The MCP server acts as the auth boundary.
- OpenShell/NemoClaw: These environments block internal IPs by default. Cortex must be reachable at a routable HTTPS address. Use the standalone deployment mode with CORTEX_URL pointing to your public Cortex endpoint.
Harper Deployment (Recommended)
When deployed as a Harper component, the MCP endpoint runs inside the Cortex process with direct table access. This eliminates network round-trips and inherits Harper's auth, TLS, and rate limiting automatically. See the harper/ directory.
License
MIT
Contributing
Contributions welcome! Please open issues and PRs on GitHub.
Related Projects
- Harper Cortex — The memory database powering this server
- cortex-client — TypeScript SDK for Cortex
- LangChain Harper Integration — Production RAG with Cortex
- OpenClaw — Multi-agent orchestration with shared memory
Support
- Docs: https://harperdb.io/docs
- Discord: https://discord.gg/harperdb
- Email: [email protected]
