@silkspace/llm-kg-mcp
v0.1.3
MCP server for LLM Knowledge Graph - semantic memory for MCP clients with Graphistry visualization
# Knowledge Graph - Claude's Persistent Memory
A semantic knowledge graph that lets Claude remember everything you discover together.
## Dependencies

Required:

- Claude Code - AI coding assistant that orchestrates the system
- Firecrawl MCP - for scraping web content
- Python 3.10+ with the `uv` package manager
- An Anthropic API key (for Claude analysis)

Python packages (auto-installed):

- `sentence-transformers` - local embeddings (all-MiniLM-L6-v2)
- `usearch` - fast vector similarity search
- `duckdb` - embedded analytics database
- `anthropic` - Claude API client
- `pydantic` - type-safe models
## What This Actually Does
You're vibing with Claude, researching some deep topic. Claude finds amazing content, analyzes it, and... it's gone next session.
Not anymore.
This knowledge graph is Claude's long-term memory. Every document Claude analyzes gets:
- ✨ Embedded into semantic space (384-dim vectors)
- 🧠 Clustered with similar concepts automatically
- 🔍 Searchable with natural language queries
- 💾 Persistent across all your sessions
It's like giving Claude a self-organizing wiki that grows smarter as you work.
## The Flow
During a session:

```
You: "Claude, find me papers on category theory and dynamical systems"

Claude: [scrapes 10 papers with Firecrawl MCP]
        [saves to batch file]
        cat batch.json | ./batch_ingest.sh

→ Each paper analyzed by Claude API
→ Embedded into semantic space
→ Auto-clustered by similarity
→ Stored in knowledge graph

You: "What did we learn about functors?"

Claude: kg query "functors category theory"

→ Returns top 5 semantically similar documents
→ Each with summary, learnings, insights
→ Sorted by cosine similarity

You: "Show me that operad paper again"

Claude: kg query "operads algebraic topology"

→ Instant semantic search
→ Zero API calls, all local
```

Next session (days later):

```
You: "Remember those dynamical systems papers?"

Claude: kg query "dynamical systems chaos"

→ Knowledge graph still there
→ Semantic clusters intact
→ Everything remembered
```

## Quick Start
Prerequisites: Claude Code with Firecrawl MCP configured (see Setup below)
```bash
# 1. Start the embedding service (once per boot)
source .venv/bin/activate
python embedding_service.py &

# 2. Query the knowledge graph (works from any directory)
kg query "quantum mechanics entanglement" 5
kg stats
kg show cluster_14.850
```

To add documents, ask Claude Code in chat:

> "Add https://arxiv.org/abs/2301.12345 to the knowledge graph"

Claude will scrape with Firecrawl MCP and ingest automatically.
## For Claude: How To Use This
When the user asks you to research or remember something:
### Option 1: MCP Tools (Recommended)
If the MCP server is configured, use these tools:
Add to knowledge graph:

```python
# After scraping with Firecrawl
mcp__llm_kg__kg_add(
    content=scraped["markdown"],
    url=scraped["url"],
    title=scraped["metadata"]["title"]
)
```

Search the knowledge graph:

```python
mcp__llm_kg__kg_query(
    query="quantum mechanics entanglement",
    top_k=5
)
```

Get statistics:

```python
mcp__llm_kg__kg_stats()
```

Get a specific document:

```python
mcp__llm_kg__kg_get_document(doc_id="cluster_14.850")
```

List clusters:

```python
mcp__llm_kg__kg_list_clusters()
```

### Option 2: Direct Python API
For more control or when MCP is not available:
#### `kg.add()` - Maximally Flexible

`kg.add()` accepts ANYTHING:
- Firecrawl result dict → ✅ Added
- Any dict with content → ✅ Added
- Raw markdown string → ✅ Added
- Document object → ✅ Added
- Arbitrary dict with no expected keys → ✅ Stringified and added
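The acceptance rules above amount to a normalization step. Here is an illustrative sketch of that dispatch - the helper name and exact precedence are assumptions, and the real logic lives in `kg_production.py`:

```python
from typing import Any

def normalize_input(item: Any) -> dict:
    """Coerce any kg.add() input into {"markdown", "url", "title"}.

    Hypothetical helper mirroring the acceptance rules listed above;
    the actual implementation in kg_production.py may differ.
    """
    if isinstance(item, str):
        # Raw markdown string
        return {"markdown": item, "url": None, "title": None}
    if isinstance(item, dict):
        if "markdown" in item:
            # Firecrawl-style result dict
            meta = item.get("metadata") or {}
            return {
                "markdown": item["markdown"],
                "url": item.get("url"),
                "title": meta.get("title"),
            }
        # Arbitrary dict with no expected keys: stringify and add
        return {"markdown": repr(item), "url": None, "title": None}
    # Anything else (e.g. a Document object): fall back to str()
    return {"markdown": str(item), "url": None, "title": None}
```

Because every branch ends in the same plain dict, the downstream analysis and embedding steps never have to care what shape the caller passed in.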
Primary workflow:

```python
# 1. Scrape with Firecrawl MCP
scraped = mcp__firecrawl_mcp__firecrawl_scrape(
    url="https://example.com/paper.pdf",
    formats=["markdown"]
)

# 2. Add to knowledge graph - just pass the dict!
from kg_production import KnowledgeGraph

kg = KnowledgeGraph()
result = kg.add(scraped)  # That's it!
# → Auto-analyzes with Claude API
# → Auto-embeds semantically
# → Auto-clusters by similarity
# → Auto-stores in DuckDB

# Check result
if result['status'] == 'success':
    print(f"✅ {result['area_code']} - {result['cluster_name']}")
```

Alternative - add raw content:

```python
# Just markdown text
kg.add("# My Research Notes\n\nLots of content here...")

# Any dict
kg.add({'experiment': 'data', 'notes': 'observations'})

# All work - kg.add() figures it out!
```

Searching the knowledge graph:
```bash
kg query "quantum mechanics entanglement" 5
# Returns top 5 semantically similar documents
```

Checking status:

```bash
kg stats               # Total docs, clusters, dimensions
kg show cluster_14.850 # Specific document details
```

Batch processing (for multiple URLs):

```bash
# After scraping many URLs with Firecrawl:
cat scraped_batch.json | ./batch_ingest.sh
```

### Advanced: Direct Python API
For bespoke pipelines when you need more control:
```python
from kg_production import KnowledgeGraph

kg = KnowledgeGraph()

# Add a document - kg.add() accepts dict, ScrapedDocument, or str
result = kg.add({
    "url": "https://example.com/paper.pdf",
    "markdown": content,
    "metadata": {"title": "Paper Title"}
})
# → Automatically analyzed, embedded, clustered, stored

# Query the graph
results = kg.query("quantum mechanics", top_k=5)
for doc in results:
    print(f"{doc['similarity']:.3f} - {doc['title']}")

# Access database directly
stats = kg.get_stats()
df = kg.db.get_all_documents_df()
```

## What Makes This Rad
**🚀 40x faster than cold starts**

- Keeps the sentence-transformers model warm in memory
- HTTP service on localhost:8765
- Sub-second embeddings
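A minimal client for the warm service might look like the sketch below. The `/health` route and its response match the Setup section; the `/embed` route, its `{"text": ...}` payload, and the `"embedding"` response key are assumptions - check `embedding_service.py` for the real API:

```python
import json
import urllib.request

SERVICE_URL = "http://localhost:8765"

def service_healthy() -> bool:
    """Return True if the embedding model is loaded and serving."""
    try:
        with urllib.request.urlopen(SERVICE_URL + "/health", timeout=2) as resp:
            return json.loads(resp.read()).get("status") == "healthy"
    except OSError:
        return False

def embed(text: str) -> list[float]:
    """POST text to the warm service; returns a 384-dim vector.

    Assumed route and payload shape - verify against embedding_service.py.
    """
    req = urllib.request.Request(
        SERVICE_URL + "/embed",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read())["embedding"]
```

Keeping the model resident is what buys the speedup: each request costs one local HTTP round trip instead of a multi-second model load.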
**🎯 Semantic clustering**

- Documents find their natural neighbors
- No manual organization
- Area codes like `cluster_14.529` show the topology
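In miniature, similarity-based cluster assignment looks like nearest-centroid search over cosine similarity. This is a 2-D toy with made-up centroids to show the idea; the real system searches 384-dim vectors with USearch rather than a Python loop:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest_cluster(vec: list[float], centroids: dict[str, list[float]]) -> str:
    """Return the area code whose centroid is most similar to vec."""
    return max(centroids, key=lambda code: cosine(vec, centroids[code]))

# Toy example: two clusters; a new document lands in the closer one
centroids = {"cluster_14.850": [1.0, 0.0], "cluster_14.203": [0.0, 1.0]}
print(nearest_cluster([0.9, 0.1], centroids))  # → cluster_14.850
```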
**🧠 Claude analysis**

- Extracts learnings, not just keywords
- Asks "What does this teach?", not "What words appear?"
- Surfaces the questions the content raises
**💎 Production-quality**

- XDG-compliant paths (`~/.local/share/knowledge_graph/`)
- Type-safe Pydantic models
- USearch for O(log n) similarity search
- DuckDB for fast analytics
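XDG compliance here presumably means the standard `XDG_DATA_HOME` fallback chain. A sketch of that resolution (the actual logic lives inside the package and may differ):

```python
import os
from pathlib import Path

def kg_data_dir() -> Path:
    """Resolve where the graph lives: $XDG_DATA_HOME/knowledge_graph,
    falling back to ~/.local/share/knowledge_graph per the XDG spec."""
    base = os.environ.get("XDG_DATA_HOME") or str(Path.home() / ".local" / "share")
    return Path(base) / "knowledge_graph"
```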
**🏡 Fully local**

- No API costs for embeddings
- Works offline
- Your data stays on your machine
## Under The Hood
```
┌──────────────────────────────────────┐
│ You + Claude (vibing)                │
└──────────────────────────────────────┘
                   ↓
┌──────────────────────────────────────┐
│ Firecrawl MCP                        │
│ → Scrape web content                 │
└──────────────────────────────────────┘
                   ↓
┌──────────────────────────────────────┐
│ Claude Analysis                      │
│ → Extract learnings & insights       │
└──────────────────────────────────────┘
                   ↓
┌──────────────────────────────────────┐
│ Semantic Embedding                   │
│ → 384-dim vectors (local, warm)      │
│ → HTTP: localhost:8765               │
└──────────────────────────────────────┘
                   ↓
┌──────────────────────────────────────┐
│ Clustering                           │
│ → USearch finds similar docs         │
│ → Auto-assigns area codes            │
└──────────────────────────────────────┘
                   ↓
┌──────────────────────────────────────┐
│ Storage                              │
│ → DuckDB: Fast analytics             │
│ → Vector Index: O(log n) search      │
│ → Location: ~/.local/share/...       │
└──────────────────────────────────────┘
```

**Key Files:**

- `kg` - Main CLI (use this for everything!)
- `kg_production.py` - Core KG system
- `kg_models.py` - Type-safe interfaces
- `embedding_service.py` - Warm embedding server
- `batch_ingest.sh` - Batch processing pipeline
## Example: Real Session
```
# Claude just researched the Platonic Representation Hypothesis
$ kg query "platonic representations neural networks"

[1] Similarity: 0.793
    The Platonic Representation Hypothesis - Blog Post
    Area: cluster_14.850

    The Platonic Representation Hypothesis suggests that neural networks
    trained on different modalities converge to similar representations
    of reality, implying a universal structure to learned representations...

    Learnings:
    • Convergence across modalities suggests objective reality structure
    • Different architectures learn similar representations
    • Implications for AGI: universal cognitive primitives

    🔗 https://blog.research.google/platonic-representation.html

[2] Similarity: 0.688
    Representation Learning in Deep Networks
    Area: cluster_14.203
    ...
```

That's it. Claude remembered. Across sessions. Across topics. Forever.
## Setup (First Time Only)
### 1. Install Claude Code
Get Claude Code from claude.ai/code and set up your Anthropic API key.
### 2. Install via npm (Recommended)

Easiest setup - no local paths needed. Add to your Claude Code MCP settings (`~/.claude/mcp_settings.json`, or via Claude Code settings):
```json
{
  "mcpServers": {
    "firecrawl": {
      "command": "npx",
      "args": ["-y", "@mendable/firecrawl-mcp"],
      "env": {
        "FIRECRAWL_API_KEY": "your-firecrawl-api-key"
      }
    },
    "llm-kg": {
      "command": "npx",
      "args": ["-y", "@silkspace/llm-kg-mcp"]
    }
  }
}
```

- Get a free Firecrawl API key at firecrawl.dev
- Python dependencies auto-install via a postinstall hook (using uv or pip)
Requirements:

- Node.js 18+
- Python 3.10+
- `uv` (recommended) or `pip`
That's it! npx will download and run the MCP server automatically.
### 3. Or install from source (For Development)
Clone and install manually:
```bash
git clone https://github.com/silkspace/llm-knowledge-graph.git
cd llm-knowledge-graph

# Install dependencies with uv
uv sync
```

Then configure with the local path:
```json
{
  "mcpServers": {
    "llm-kg": {
      "command": "python",
      "args": ["/path/to/llm-knowledge-graph/kg_mcp_server.py"]
    }
  }
}
```

### 4. Configure Anthropic API key
Either set an environment variable:

```bash
export ANTHROPIC_API_KEY="your-api-key"
```

Or create a key file:

```bash
echo "your-api-key" > ~/.anthropic_api_key
```

### 5. Start the embedding service
```bash
source .venv/bin/activate
python embedding_service.py &

# Verify it's running:
curl http://localhost:8765/health
# → {"status": "healthy", "model": "all-MiniLM-L6-v2", "dimensions": 384}
```

### 6. Add to PATH
```bash
echo '' >> ~/.zshrc
echo '# Knowledge graph CLI' >> ~/.zshrc
echo 'export PATH="$HOME/dev/knowledge-graph:$PATH"' >> ~/.zshrc

# Reload shell
source ~/.zshrc
```

### 7. Test it
```bash
# Check current stats (works from any directory!)
kg stats
```

In Claude Code, ask:

> "Add https://en.wikipedia.org/wiki/Knowledge_graph to the knowledge graph"

Verify it was added:

```bash
kg stats  # Should show one more document
kg query "knowledge graph semantic web" 3
```

Done! Now Claude can remember everything across sessions.
## Philosophy
Most knowledge tools make you organize. Tags, folders, hierarchies.
This knowledge graph doesn't. Documents find their own place in semantic space.
The topology emerges naturally from meaning.
Built for Claude Code. Built for deep research sessions. Built to remember.
