@vinaes/succ

v1.5.42


Semantic Understanding for Code Contexts — persistent memory for AI coding assistants (Claude Code, Cursor, Windsurf, Continue.dev)


dg_cpz

claude claude-code memory rag embeddings ai mcp semantic-search knowledge-graph cursor windsurf

Persistent semantic memory for any MCP-compatible AI editor. Remember decisions, learn from mistakes, never lose context.

Works with

| Editor | Setup |
|--------|-------|
| Claude Code | succ init (auto-configured) |
| Cursor | succ setup cursor |
| Windsurf | succ setup windsurf |
| Continue.dev | succ setup continue |
| Codex | succ setup codex, then always launch via succ codex |

See Editor Guides for detailed setup.

Quick Start

npm install -g @vinaes/succ

cd your-project
succ init
succ index
succ index-code
succ analyze

That's it. Claude Code now has persistent memory for your project.

Features

| Feature | Description |
|---------|-------------|
| Hybrid Search | Semantic embeddings + BM25 keyword matching with cross-encoder reranking and AST symbol boost |
| AST Code Indexing | Tree-sitter parsing for 21 languages: 13 with full symbol extraction, 8 grammar-only |
| Code Scanning | Recursive code discovery and indexing via succ_index action="scan" with .succignore support |
| Brain Vault | Obsidian-compatible markdown knowledge base with hierarchical summaries |
| Persistent Memory | Decisions, learnings, and patterns across sessions with auto-extraction |
| Cross-Project | Global memories shared between all projects; cross-repo search |
| Knowledge Graph | Directed graph with PPR, SCC, articulation points, bridge edges (code↔memory), LLM-enriched relations |
| Graph Algorithms | Personalized PageRank, Louvain communities, Dijkstra shortest path, betweenness centrality, co-change analysis |
| MCP Native | 15 consolidated tools; Claude calls succ tools directly |
| Reranker | ONNX cross-encoder (ms-marco-MiniLM-L6-v2) for search result post-processing |
| HyDE | Hypothetical Document Embeddings: an LLM generates code snippets for natural-language queries to bridge the embedding gap |
| Late Chunking | Long-context embedding with per-AST-chunk pooling for context-aware chunks |
| Web Search | Real-time web search via Perplexity Sonar (quick, quality, deep research) |
| Web Fetch | Fetch any URL as clean Markdown via md.succ.ai (Readability + Playwright) |
| LSP Integration | Language Server Protocol client for definition, references, and hover queries |
| Working Memory | Priority scoring, validity filtering, diversity, pinned memories |
| Dynamic Hook Rules | Save memories that auto-fire as pre-tool rules: inject context, block, ask for confirmation, or auto-approve permissions |
| File-Linked Memories | Link memories to files; auto-recalled when editing those files |
| Dead-End Tracking | Record failed approaches to prevent retrying |
| Debug Sessions | Structured debugging with hypothesis testing, 13-language instrumentation |
| Session Surgeon | Auto compact stats, trim tool content/thinking/images, manual compact with chain integrity |
| Observability | Search latency, embedding times, LLM call metrics; retrieval feedback loop |
| PRD Pipeline | Generate PRDs, parse into tasks, execute with quality gates |
| Team Mode | Parallel task execution with git worktrees |
| Multi-Backend Storage | SQLite, PostgreSQL, Qdrant; scale from laptop to cloud |
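Personalized PageRank (PPR), listed above for knowledge-graph retrieval, ranks nodes by how often a random walk that keeps restarting at a set of seed nodes visits them, so scores concentrate around the seeds. The following is a minimal power-iteration sketch, not succ's implementation; the `Graph` shape, the 0.85 damping factor and the fixed iteration count are assumptions chosen for illustration.

```typescript
// Adjacency list: node id -> ids of nodes it links to (hypothetical shape).
type Graph = Map<string, string[]>;

// Personalized PageRank by power iteration (illustrative sketch).
function personalizedPageRank(
  graph: Graph,
  seeds: string[],
  damping = 0.85,
  iterations = 100,
): Map<string, number> {
  const nodes = [...graph.keys()];
  const seedSet = new Set(seeds.filter((s) => graph.has(s)));
  if (seedSet.size === 0) throw new Error("at least one seed must be in the graph");

  // Restart distribution: uniform over the seed nodes, zero elsewhere.
  const restart = new Map<string, number>();
  for (const n of nodes) restart.set(n, seedSet.has(n) ? 1 / seedSet.size : 0);

  let rank = new Map(restart);
  for (let i = 0; i < iterations; i++) {
    const next = new Map<string, number>();
    for (const n of nodes) next.set(n, (1 - damping) * restart.get(n)!);

    let danglingMass = 0;
    for (const n of nodes) {
      const out = graph.get(n)!.filter((m) => graph.has(m));
      const r = rank.get(n)!;
      if (out.length === 0) {
        danglingMass += r; // nodes without out-links hand their mass back to the seeds
        continue;
      }
      for (const m of out) next.set(m, next.get(m)! + (damping * r) / out.length);
    }
    for (const n of nodes) next.set(n, next.get(n)! + damping * danglingMass * restart.get(n)!);
    rank = next;
  }
  return rank;
}

// Usage: a -> b -> c -> a cycle, plus d -> a; seed the walk at "a".
const g: Graph = new Map([["a", ["b"]], ["b", ["c"]], ["c", ["a"]], ["d", ["a"]]]);
const scores = personalizedPageRank(g, ["a"]);
console.log([...scores.entries()]);
```

Scores fall off with distance from the seed, and `d`, which nothing links to, scores zero: PPR only rewards nodes reachable from the query's seeds.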

AST Code Indexing — Tree-sitter parsing for 21 languages (13 with full symbol extraction + 8 grammar-only); symbol-aware BM25 tokenization boosts function/class names in search results
Web Search — Real-time search via Perplexity Sonar through OpenRouter (quick $1/MTok, quality $3-15/MTok, deep research); search history tracking with cost auditing
PRD Pipeline — Generate PRDs from feature descriptions, parse into executable tasks, run with Claude Code agent, export workflow to Obsidian (Mermaid Gantt + dependency DAG)
Team Mode — Parallel task execution using git worktrees; each worker gets an isolated checkout, results merge via cherry-pick
Quality Gates — Auto-detected (TypeScript, Go, Python, Rust) or custom; run after each task to verify code quality
Graph Algorithms — Personalized PageRank (PPR) retrieval, Tarjan's SCC, articulation points, Dijkstra shortest path, betweenness centrality, Louvain communities with LLM summaries, bridge edges (code↔memory graph)
Cross-encoder Reranker — ONNX ms-marco-MiniLM-L6-v2 rescores (query, document) pairs; configurable weight, topK clamping, graceful degradation
HyDE — Hypothetical Document Embeddings via LLM for natural language → code search; tree-sitter AST code detection
Late Chunking — Long-context embedding (jina 8192 tokens) with per-AST-chunk pooling for context-aware embeddings
Hierarchical Summaries (RAPTOR-style) — bottom-up LLM summarization at file → directory → module → repo zoom levels with query routing
Code Scanning — succ_index action="scan" recursively discovers and indexes code files via git ls-files / directory walk with .succignore, size filtering, symlink rejection
Observability — search latency, embedding times, LLM call metrics; retrieval feedback loop for ranking adjustment
Auto-memory Extraction — session-end fact extraction via LLM with quality gate + periodic dimension-bucketed consolidation
Cross-repo Search — search across multiple succ-indexed repositories
Diff-brain Analysis — LLM-powered diff analysis for brain vault document changes
LSP Integration — language server protocol client, installer, and server registry (including Kotlin and Swift)
MCP Review Tool — succ_review for code review with blast-radius estimation
Co-change Analysis — git log mining to detect files frequently changed together
Brain Vault Export — structured export with metadata
API Versioning — /v1/ route prefix aliases for all daemon endpoints
Graph Enrichment — LLM-classified relations (implements, leads_to, contradicts...), contextual proximity, Label Propagation communities, degree centrality with recall boost
Dead-End Tracking — Record failed approaches; auto-boosted in recall to prevent retrying
AGENTS.md Auto-Export — Auto-generate editor instructions from decisions, patterns, dead-ends
Learning Delta — Track knowledge growth per session (memories added, types, quality)
Confidence Retention — Time-decay scoring with auto-cleanup of low-value memories
Safe Consolidation — Soft-delete with undo support; no data loss on merge
Skill Discovery — Auto-suggest relevant skills based on user prompt (opt-in, disabled by default)
Skyll Integration — Access community skills from Skyll registry (requires skills.enabled = true)
Soul Document — Define AI personality and values
Dynamic Hook Rules — Memories tagged hook-rule fire automatically before matching tool calls; narrow them with tool:{Name} and match:{regex} tags. Memories of type error block the call, type pattern asks for confirmation, type allow auto-approves the permission dialog (requires Claude Code v2.1.63+), and all other types are injected as context
PermissionRequest — Auto-approve or deny Claude Code permission dialogs based on memory rules (requires Claude Code v2.1.63+)
HTTP Hooks — Direct HTTP hooks to daemon (no process spawn) for faster, more reliable hook execution (requires Claude Code v2.1.63+, auto-detected at succ init)
File-Linked Memories — Attach memories to files via files parameter; pre-tool hook auto-recalls related memories when editing those files
Auto-Hooks — Context injection at session start/end
Idle Reflections — AI generates insights during idle time
Session Context — Auto-generated briefings for next session
Security Hardening — 3-tier prompt injection detection (structural + multilingual regex + embedding semantic), content sanitization for 13 entry points, Bell-LaPadula IFC with compartments, file operation guards, exfiltration detection, post-tool secret scanning
LLM Guardrails — Optional Tier 3 LLM classification (Llama Guard, safeguard-20b) for sensitivity, code policy (OWASP SC2-SC7), and injection detection with LRU caching
Sensitive Filter — Detect and redact PII, API keys, secrets
Quality Scoring — Local ONNX classification to filter noise
Token Savings — Track tokens saved by RAG retrieval compared with reading full files
Temporal Awareness — Time decay, validity periods, point-in-time queries
Unified Daemon — Single background process for watch, analyze, idle tracking
Watch Mode — Auto-reindex on file changes via @parcel/watcher
Fast Analyze — --fast mode with fewer agents and smaller context for quick onboarding
Incremental Analyze — Git-based change detection, skip unchanged agents
Local LLM — Ollama, LM Studio, llama.cpp support
Sleep Agent — Offload heavy operations to local LLM
Checkpoints — Backup and restore full succ state
AI-Readiness Score — Measure project readiness for AI collaboration
Multiple LLM Backends — Local (Ollama), OpenRouter, or Claude CLI
Storage Backends — SQLite (default), PostgreSQL + pgvector, Qdrant
Data Migration — Export/import JSON, migrate between backends
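The Dynamic Hook Rules item above amounts to a small dispatch table: select hook-rule memories whose tags match the tool call, then map each memory type to an action. A minimal sketch of that dispatch follows. The `Memory` shape, the `HookAction` union, and how several tool: or match: tags combine (any listed tool, all patterns) are assumptions, not succ's actual code.

```typescript
// Hypothetical memory shape for this sketch.
interface Memory {
  type: string; // e.g. "error", "pattern", "allow", "decision"
  content: string;
  tags: string[];
}

type HookAction =
  | { kind: "block"; reason: string }
  | { kind: "ask"; reason: string }
  | { kind: "allow" }
  | { kind: "context"; text: string };

// A rule applies if it is tagged hook-rule, the tool is one of its tool: tags (if any),
// and every match: regex matches the serialized tool input (assumed semantics).
function ruleMatches(mem: Memory, toolName: string, toolInput: string): boolean {
  if (!mem.tags.includes("hook-rule")) return false;
  const tools = mem.tags.filter((t) => t.startsWith("tool:")).map((t) => t.slice(5));
  if (tools.length > 0 && !tools.includes(toolName)) return false;
  const patterns = mem.tags.filter((t) => t.startsWith("match:")).map((t) => t.slice(6));
  return patterns.every((p) => new RegExp(p).test(toolInput));
}

// Map each matching memory's type to the action described in the feature list.
function evaluateHookRules(memories: Memory[], toolName: string, toolInput: string): HookAction[] {
  return memories
    .filter((m) => ruleMatches(m, toolName, toolInput))
    .map((m): HookAction => {
      switch (m.type) {
        case "error":
          return { kind: "block", reason: m.content };
        case "pattern":
          return { kind: "ask", reason: m.content };
        case "allow":
          return { kind: "allow" };
        default:
          return { kind: "context", text: m.content };
      }
    });
}

const rules: Memory[] = [
  { type: "error", content: "Never force-push to main", tags: ["hook-rule", "tool:Bash", "match:git push --force"] },
  { type: "decision", content: "Use pnpm, not npm", tags: ["hook-rule", "tool:Bash", "match:^npm "] },
];
console.log(evaluateHookRules(rules, "Bash", "git push --force origin main"));
```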

Claude Code Agents

succ ships with 20 specialized agents in .claude/agents/ that run as subagents inside Claude Code:

| Agent | What it does |
|-------|-------------|
| succ-explore | Codebase exploration powered by semantic search |
| succ-plan | TDD-enforced implementation planning with red-green-refactor cycles |
| succ-code-reviewer | Full code review with OWASP Top 10 checklist; works with any language |
| succ-diff-reviewer | Fast pre-commit diff review for security, bugs, and regressions |
| succ-deep-search | Cross-search memories, brain vault, and code |
| succ-memory-curator | Consolidate, deduplicate, and clean up memories |
| succ-memory-health-monitor | Detect decayed, stale, or low-quality memories |
| succ-pattern-detective | Surface recurring patterns and anti-patterns from sessions |
| succ-session-handoff-orchestrator | Extract summary and briefing at session end |
| succ-session-reviewer | Review past sessions, extract missed learnings |
| succ-decision-auditor | Find contradictions and reversals in architectural decisions |
| succ-knowledge-indexer | Index documentation and code into the knowledge base |
| succ-knowledge-mapper | Maintain knowledge graph, find orphaned memories |
| succ-checkpoint-manager | Create and manage state backups |
| succ-context-optimizer | Optimize what gets preloaded at session start |
| succ-quality-improvement-coach | Analyze memory quality, suggest improvements |
| succ-readiness-improver | Actionable steps to improve AI-readiness score |
| succ-general | General-purpose agent with semantic search, web search, and all tools |
| succ-debug | Structured debugging: hypothesize, instrument, reproduce, fix, with dead-end tracking |
| succ-style-tracker | Track communication style changes, update soul.md and brain vault |

Claude Code auto-discovers these agents from .claude/agents/; launch one through the Task tool by setting subagent_type to the agent's name.

Commands

| Command | Description |
|---------|-------------|
| succ init | Interactive setup wizard |
| succ setup <editor> | Configure MCP for any editor |
| succ codex-chat | Launch Codex chat with succ briefing/hooks |
| succ analyze | Generate brain vault with Claude agents |
| succ index [path] | Index files for semantic search |
| succ scan-code [path] | Recursive code discovery and indexing |
| succ search <query> | Semantic search in brain vault |
| succ remember <content> | Save to memory |
| succ memories | List and search memories |
| succ watch | Watch for changes and auto-reindex |
| succ daemon <action> | Manage unified daemon |
| succ prd generate | Generate PRD from feature description |
| succ prd run | Execute PRD tasks with quality gates |
| succ session analyze | Token breakdown by type and tool name |
| succ session trim | Trim tool content from session transcript |
| succ session compact | Manual compact with dialogue summary |
| succ status | Show index statistics |

| Command | Description |
|---------|-------------|
| succ index-code [path] | Index source code (AST chunking via tree-sitter) |
| succ index --memories | Re-embed all memories with current embedding model |
| succ reindex | Detect and fix stale/deleted index entries |
| succ chat <query> | RAG chat with context |
| succ train-bpe | Train BPE vocabulary from indexed code |
| succ forget | Delete memories |
| succ graph <action> | Knowledge graph: stats, auto-link, enrich, proximity, communities, centrality |
| succ consolidate | Merge duplicate memories (soft-delete with undo) |
| succ agents-md | Generate .claude/AGENTS.md from memories |
| succ progress | Show learning delta history |
| succ retention | Memory retention analysis and cleanup |
| succ soul | Generate personalized soul.md |
| succ config | Interactive configuration |
| succ stats | Show token savings statistics |
| succ checkpoint <action> | Create, restore, or list checkpoints |
| succ score | Show AI-readiness score |
| succ prd parse <file> | Parse PRD markdown into tasks |
| succ prd list | List all PRDs |
| succ prd status [id] | Show PRD status and tasks |
| succ prd archive [id] | Archive a PRD |
| succ prd export [id] | Export PRD workflow to Obsidian (Mermaid diagrams) |
| succ session trim | Trim tool content from session (--tools, --only-inputs, --only-results) |
| succ session trim-thinking | Trim thinking blocks only |
| succ session trim-all | Trim all strippable content (tools, thinking, images) |
| succ session compact | Manual compact with dialogue summary and chain integrity |
| succ clear | Clear index and/or memories |
| succ benchmark | Run performance benchmarks |
| succ migrate | Migrate data between storage backends |

succ init

succ init                # Interactive mode
succ init --yes          # Non-interactive (defaults)
succ init --force        # Reinitialize existing project

Creates the .succ/ directory structure, configures the MCP server, and sets up hooks.

succ analyze

succ analyze             # Run via Claude CLI (recommended)
succ analyze --fast      # Fast mode (fewer agents, smaller context)
succ analyze --force     # Force full re-analysis (skip incremental)
succ analyze --local     # Use local LLM (Ollama, LM Studio)
succ analyze --openrouter # Use OpenRouter API
succ analyze --background # Run in background

Generates the brain vault structure:

.succ/brain/
├── CLAUDE.md              # Navigation hub
├── project/               # Project knowledge
│   ├── technical/         # Architecture, API, Conventions
│   ├── systems/           # Core systems/modules
│   ├── strategy/          # Project goals
│   └── features/          # Implemented features
├── knowledge/             # Research notes
└── archive/               # Old/superseded

succ watch

succ watch               # Start watch service (via daemon)
succ watch --ignore-code # Watch only docs
succ watch --status      # Check watch service status
succ watch --stop        # Stop watch service

succ daemon

succ daemon status       # Show daemon status
succ daemon sessions     # List active Claude Code sessions
succ daemon start        # Start daemon manually
succ daemon stop         # Stop daemon
succ daemon logs         # Show recent logs

succ prd

succ prd generate "Add JWT authentication"   # Generate PRD + parse tasks
succ prd run                                  # Execute sequentially (default)
succ prd run --mode team                      # Execute in parallel (git worktrees)
succ prd run --mode team --concurrency 5      # Parallel with 5 workers
succ prd run --resume                         # Resume interrupted run
succ prd run --dry-run                        # Preview execution plan
succ prd status                               # Show latest PRD status
succ prd list                                 # List all PRDs
succ prd export                               # Export latest PRD to Obsidian
succ prd export --all                         # Export all PRDs
succ prd export prd_abc123                    # Export specific PRD

Team mode runs independent tasks in parallel using git worktrees for isolation. Each worker gets its own checkout; results merge via cherry-pick. Quality gates (typecheck, test, lint, build) run automatically after each task.
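Team mode can only start a task once the tasks it depends on have finished. One way to picture the scheduling is to group tasks into "waves" by topological level: every task in a wave depends only on earlier waves, so each could run in its own worktree at the same time. This is an assumption for illustration; the README does not describe succ's actual scheduler, and the `TaskDeps` shape is hypothetical.

```typescript
// Map of task id -> ids of tasks it depends on (hypothetical shape).
type TaskDeps = Record<string, string[]>;

// Group tasks into waves: every task in a wave depends only on earlier waves,
// so the tasks within one wave could run in parallel worktrees.
function planWaves(deps: TaskDeps): string[][] {
  const remaining = new Set(Object.keys(deps));
  const done = new Set<string>();
  const waves: string[][] = [];
  while (remaining.size > 0) {
    // Computed before `done` is updated, so tasks in one wave never depend on each other.
    const wave = [...remaining].filter((t) => deps[t].every((d) => done.has(d)));
    if (wave.length === 0) throw new Error("dependency cycle or unknown dependency");
    wave.sort();
    for (const t of wave) {
      remaining.delete(t);
      done.add(t);
    }
    waves.push(wave);
  }
  return waves;
}

// Usage: B and C both need A; D needs B and C.
const waves = planWaves({ A: [], B: ["A"], C: ["A"], D: ["B", "C"] });
console.log(waves); // [["A"], ["B", "C"], ["D"]]
```

A --concurrency limit would then cap how many tasks of a single wave run at once.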

Export generates Obsidian-compatible markdown with Mermaid diagrams (Gantt timeline, dependency DAG), per-task detail pages with gate results, and wiki-links between pages. Output goes to .succ/brain/prd/.
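The README does not show the generated markdown, so the following is only a guess at how the dependency DAG part could be rendered as a Mermaid flowchart, with edges pointing from prerequisite to dependent task. The `PrdTask` interface and its fields are hypothetical.

```typescript
// Hypothetical task shape for this sketch.
interface PrdTask {
  id: string;
  title: string;
  dependsOn: string[];
}

// Render task dependencies as a Mermaid flowchart.
function toMermaidDag(tasks: PrdTask[]): string {
  const lines = ["graph TD"];
  for (const t of tasks) {
    // Quote labels so titles with spaces stay valid; #quot; is Mermaid's entity code for ".
    lines.push(`  ${t.id}["${t.title.replace(/"/g, "#quot;")}"]`);
  }
  for (const t of tasks) {
    for (const dep of t.dependsOn) lines.push(`  ${dep} --> ${t.id}`);
  }
  return lines.join("\n");
}

const diagram = toMermaidDag([
  { id: "t1", title: "Add JWT middleware", dependsOn: [] },
  { id: "t2", title: "Protect API routes", dependsOn: ["t1"] },
]);
console.log(diagram);
```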

Configuration

No API key required. Uses local embeddings by default.

{
  "llm": {
    "embeddings": {
      "mode": "local",
      "model": "Xenova/all-MiniLM-L6-v2"
    }
  },
  "chunk_size": 500,
  "chunk_overlap": 50
}
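The chunk_size and chunk_overlap settings above describe sliding-window chunking, where consecutive chunks share an overlap so text cut at a boundary still appears intact in one of them. The README does not state the unit; this sketch assumes characters, and `chunkText` is a hypothetical helper, not succ's API (code files are chunked by AST instead, per the index-code command).

```typescript
// Split text into fixed-size windows that overlap by `overlap` units.
// This sketch counts characters; succ may count tokens or lines instead.
function chunkText(text: string, size = 500, overlap = 50): string[] {
  if (overlap >= size) throw new Error("chunk_overlap must be smaller than chunk_size");
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window already reaches the end
  }
  return chunks;
}

const chunks = chunkText("x".repeat(1200));
console.log(chunks.map((c) => c.length)); // [500, 500, 300]
```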

Local (default):

{
  "llm": { "embeddings": { "mode": "local" } }
}

Ollama (unified namespace):

{
  "llm": {
    "embeddings": {
      "mode": "api",
      "model": "nomic-embed-text",
      "api_url": "http://localhost:11434/v1/embeddings"
    }
  }
}

OpenRouter:

{
  "embedding_mode": "openrouter",
  "openrouter_api_key": "sk-or-..."
}

MRL dimension override (Matryoshka models):

{
  "llm": {
    "embeddings": {
      "mode": "api",
      "model": "nomic-embed-text-v1.5",
      "api_url": "http://localhost:11434/v1/embeddings",
      "dimensions": 256
    }
  }
}
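Matryoshka (MRL) models are trained so that the leading dimensions of an embedding carry most of the signal, which is why a dimensions override can shorten vectors. A common client-side way to apply it is to keep the first N components and L2-renormalize. Whether succ truncates locally or passes dimensions through to the API is not stated, so treat `truncateEmbedding` as a hypothetical illustration.

```typescript
// Keep the first `dims` components of a Matryoshka embedding and rescale to unit length,
// so cosine similarity between truncated vectors remains meaningful.
function truncateEmbedding(vec: number[], dims: number): number[] {
  if (dims <= 0 || dims > vec.length) throw new RangeError(`dims must be in 1..${vec.length}`);
  const head = vec.slice(0, dims);
  const norm = Math.sqrt(head.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? head : head.map((x) => x / norm);
}

console.log(truncateEmbedding([3, 4, 12], 2)); // [0.6, 0.8]
```

Vectors truncated to different lengths are not comparable, so changing dimensions generally means re-embedding stored data (compare succ index --memories).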

succ uses native ONNX Runtime for embedding inference with automatic GPU detection:

| Platform | Backend | GPUs |
|----------|---------|------|
| Windows | DirectML | AMD, Intel, NVIDIA |
| Linux | CUDA | NVIDIA |
| macOS | CoreML | Apple Silicon |
| Fallback | CPU | Any |

GPU is enabled by default. No manual configuration needed — the best available backend is auto-detected.

{
  "gpu_enabled": true,
  "gpu_device": "directml"
}

Set gpu_device to override auto-detection: cuda, directml, coreml, or cpu.
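The platform table and the two settings above combine into a simple precedence rule. A minimal sketch, assuming platform strings use Node's `process.platform` values (`win32`, `linux`, `darwin`); `selectBackend` is hypothetical, and unlike real auto-detection it does not probe whether the GPU runtime actually loads, so a real implementation would still need to fall back to CPU on failure.

```typescript
type Backend = "cuda" | "directml" | "coreml" | "cpu";

// Pick an ONNX Runtime backend following the platform table.
// `gpuEnabled` mirrors gpu_enabled; `override` mirrors gpu_device.
function selectBackend(platform: string, gpuEnabled = true, override?: Backend): Backend {
  if (!gpuEnabled) return "cpu";
  if (override) return override;
  switch (platform) {
    case "win32":
      return "directml";
    case "linux":
      return "cuda";
    case "darwin":
      return "coreml";
    default:
      return "cpu";
  }
}

console.log(selectBackend("win32")); // directml
```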

{
  "idle_watcher": {
    "enabled": true,
    "idle_minutes": 2,
    "check_interval": 30,
    "min_conversation_length": 5
  }
}

Automatically run the succ-diff-reviewer agent before every git commit to catch security issues, bugs, and regressions:

{
  "preCommitReview": true
}

When enabled, Claude runs a diff review before each commit. Critical-severity findings block the commit; high-severity findings trigger a warning.

Disabled by default. Set via succ_config(action="set", key="preCommitReview", value="true").
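The commit policy above reduces to a small decision function. This is a sketch only: the `Finding` shape and the medium and low severity levels are assumptions, since the README names only critical and high.

```typescript
type Severity = "critical" | "high" | "medium" | "low";

interface Finding {
  severity: Severity;
  message: string;
}

// Any critical finding blocks the commit; high findings only produce warnings.
function commitDecision(findings: Finding[]): { allow: boolean; warnings: string[] } {
  const blocked = findings.some((f) => f.severity === "critical");
  const warnings = findings.filter((f) => f.severity === "high").map((f) => f.message);
  return { allow: !blocked, warnings };
}

console.log(commitDecision([{ severity: "high", message: "Unvalidated user input" }]));
```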

When running with --dangerously-skip-permissions, succ's security guards can block autonomous operations. Enable trustAgentPermissions to downgrade deny and ask decisions to context warnings:

{
  "security": {
    "trustAgentPermissions": true
  }
}

Injection detection stays on regardless, because it protects the agent itself. See Security Hardening for details.
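The trustAgentPermissions rule described above can be sketched as a filter over guard verdicts. The `GuardResult` shape and its `source` values are hypothetical (named after the guards listed under Security Hardening); only the rule itself, downgrading deny and ask to warnings while leaving injection detection untouched, comes from the README.

```typescript
type GuardDecision = "allow" | "ask" | "deny" | "warn";

// Hypothetical guard result; `source` says which check produced it.
interface GuardResult {
  source: "injection" | "file-guard" | "exfiltration";
  decision: GuardDecision;
}

// With trustAgentPermissions on, deny/ask from ordinary guards become context warnings,
// but injection detection keeps its original verdict.
function applyTrust(result: GuardResult, trustAgentPermissions: boolean): GuardDecision {
  if (!trustAgentPermissions || result.source === "injection") return result.decision;
  return result.decision === "deny" || result.decision === "ask" ? "warn" : result.decision;
}

console.log(applyTrust({ source: "file-guard", decision: "deny" }, true)); // warn
```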

Offload heavy operations to local LLM:

{
  "idle_reflection": {
    "sleep_agent": {
      "enabled": true,
      "mode": "local",
      "model": "qwen2.5-coder:14b",
      "api_url": "http://localhost:11434/v1"
    }
  }
}

succ supports multiple storage backends for different deployment scenarios:

| Setup | Use Case | Requirements |
|-------|----------|--------------|
| SQLite + sqlite-vec | Local development (default) | None |
| PostgreSQL + pgvector | Production/cloud | PostgreSQL 15+ with pgvector |
| SQLite + Qdrant | Local + powerful vector search | Qdrant server |
| PostgreSQL + Qdrant | Full production scale | PostgreSQL + Qdrant |

Example: PostgreSQL + pgvector

{
  "storage": {
    "backend": "postgresql",
    "postgresql": {
      "connection_string": "postgresql://user:pass@localhost:5432/succ"
    }
  }
}

Example: PostgreSQL + Qdrant

{
  "storage": {
    "backend": "postgresql",
    "vector": "qdrant",
    "postgresql": { "connection_string": "postgresql://..." },
    "qdrant": { "url": "http://localhost:6333" }
  }
}

See Storage Configuration for all options.

succ supports multiple LLM backends for operations like analyze, idle reflection, and skill suggestions:

{
  "llm": {
    "type": "local",
    "model": "qwen2.5:7b",
    "local": {
      "endpoint": "http://localhost:11434/v1/chat/completions"
    },
    "openrouter": {
      "model": "anthropic/claude-3-haiku"
    }
  }
}

| Key | Values | Default | Description |
|-----|--------|---------|-------------|
| llm.type | local / openrouter / claude | local | LLM provider |
| llm.model | string | per-type | Model name for the active type |
| llm.transport | process / ws / http | auto | How to talk to the backend |

The transport is auto-selected from the type: claude uses process (or ws for a persistent WebSocket), while local and openrouter use http.
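The selection rule can be written as a few lines of resolution logic. In this sketch, `resolveTransport` is hypothetical, and rejecting process or ws for non-claude types is an assumption inferred from the sentence above rather than documented behavior.

```typescript
type LlmType = "local" | "openrouter" | "claude";
type Transport = "process" | "ws" | "http";

// An explicit llm.transport wins; otherwise claude defaults to process and
// local/openrouter to http. process/ws are assumed to be claude-only.
function resolveTransport(type: LlmType, configured?: Transport): Transport {
  if (configured && configured !== "http" && type !== "claude") {
    throw new Error(`transport "${configured}" is only meaningful for the claude backend`);
  }
  if (configured) return configured;
  return type === "claude" ? "process" : "http";
}

console.log(resolveTransport("claude", "ws")); // ws
```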

WebSocket transport (transport: "ws") keeps a persistent connection to Claude CLI, avoiding process spawn overhead on repeated calls:

{
  "llm": {
    "type": "claude",
    "model": "sonnet",
    "transport": "ws"
  }
}

Per-backend model overrides for the fallback chain:

{
  "llm": {
    "type": "claude",
    "model": "sonnet",
    "transport": "ws",
    "local": { "endpoint": "http://localhost:11434/v1/chat/completions", "model": "qwen2.5:7b" },
    "openrouter": { "model": "anthropic/claude-3-haiku" }
  }
}

Claude backend usage

The claude backend integrates with an existing, locally running Claude Code session and is intended only for in-session developer assistance by the same user, including tasks such as file analysis, documentation, indexing, and session summarization.

It is not supported for unattended background processing, cloud deployments, or multi-user scenarios. For automated, long-running, or cloud workloads, use the local or openrouter backends instead.

{
  "retention": {
    "enabled": true,
    "decay_rate": 0.01,
    "access_weight": 0.1,
    "keep_threshold": 0.3,
    "delete_threshold": 0.15
  }
}
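The retention block above names its knobs but not the formula. As an illustration only, assume a score of exponential time decay plus a bonus per access, capped at 1; memories at or above keep_threshold are kept, those below delete_threshold are deleted, and the band in between is flagged for review. The formula, the review band, and the `MemoryStats` shape are all assumptions.

```typescript
interface RetentionConfig {
  decay_rate: number;
  access_weight: number;
  keep_threshold: number;
  delete_threshold: number;
}

// Hypothetical memory stats for this sketch.
interface MemoryStats {
  ageDays: number;
  accessCount: number;
}

// Assumed scoring: exp(-decay_rate * age) plus access_weight per access, capped at 1.
function retentionScore(m: MemoryStats, cfg: RetentionConfig): number {
  const decayed = Math.exp(-cfg.decay_rate * m.ageDays);
  return Math.min(1, decayed + cfg.access_weight * m.accessCount);
}

function classify(m: MemoryStats, cfg: RetentionConfig): "keep" | "review" | "delete" {
  const s = retentionScore(m, cfg);
  if (s >= cfg.keep_threshold) return "keep";
  if (s < cfg.delete_threshold) return "delete";
  return "review";
}

const cfg: RetentionConfig = { decay_rate: 0.01, access_weight: 0.1, keep_threshold: 0.3, delete_threshold: 0.15 };
console.log(classify({ ageDays: 200, accessCount: 0 }, cfg)); // delete (score ≈ 0.135)
```

Under these assumptions, two accesses are enough to keep that same 200-day-old memory (0.135 + 0.2 ≈ 0.335).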

Hybrid Search

Combines semantic embeddings with BM25 keyword search. Code search includes AST symbol boost, regex post-filtering, and symbol type filtering (function, method, class, interface, type_alias). Three output modes: full (code blocks), lean (file+lines), signatures (symbol names only).
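The README does not say how the semantic and BM25 result lists are merged before reranking. Reciprocal Rank Fusion (RRF) is one standard way to combine ranked lists without calibrating their raw scores, and it is shown here only as a stand-in, not as succ's method. k = 60 is the constant conventionally used with RRF.

```typescript
// Reciprocal Rank Fusion: each list adds 1 / (k + rank) for every document it contains.
function reciprocalRankFusion(lists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .map(([id]) => id);
}

const semantic = ["auth.ts", "session.ts", "jwt.ts"];
const bm25 = ["session.ts", "jwt.ts", "login.ts"];
console.log(reciprocalRankFusion([semantic, bm25])); // session.ts, jwt.ts, auth.ts, login.ts
```

Documents found by both retrievers rise to the top even if neither ranked them first; a cross-encoder reranker would then rescore this fused list.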

| Aspect | Documents | Code |
|--------|-----------|------|
| Tokenizer | Markdown-aware + stemming | Naming convention splitter + AST symbol boost |
| Stemming | Yes | No |
| Stop words | Filtered | Kept |
| Segmentation | Standard | Ronin + BPE |
| Symbol metadata | N/A | function, class, interface names via tree-sitter |

Code tokenizer handles all naming conventions:

| Convention | Example | Tokens |
|------------|---------|--------|
| camelCase | getUserName | get, user, name |
| PascalCase | UserService | user, service |
| snake_case | get_user_name | get, user, name |
| SCREAMING_SNAKE | MAX_RETRY_COUNT | max, retry, count |
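The splits in the table can be reproduced with two regex passes (case boundaries, then acronym boundaries) followed by a split on separators. This sketch covers only the naming-convention step; it omits the Ronin and BPE segmentation succ also applies, and `splitIdentifier` is a hypothetical name.

```typescript
// Split a code identifier into lowercase word tokens across naming conventions.
function splitIdentifier(identifier: string): string[] {
  return identifier
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2") // case boundary: getUser -> get User
    .replace(/([A-Z]+)([A-Z][a-z])/g, "$1 $2") // acronym boundary: HTTPResponse -> HTTP Response
    .split(/[^A-Za-z0-9]+/) // underscores, dashes, dots, spaces
    .filter((part) => part.length > 0)
    .map((part) => part.toLowerCase());
}

console.log(splitIdentifier("parseHTTPResponse")); // parse, http, response
```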

Memory System

Local memory — stored in .succ/succ.db, project-specific.

Global memory — stored in ~/.succ/global.db, shared across projects.

succ remember "User prefers TypeScript" --global
succ memories --global

Architecture

your-project/
├── .claude/
│   └── settings.json      # Claude Code hooks config
└── .succ/
    ├── brain/             # Obsidian-compatible vault
    ├── hooks/             # Hook scripts
    ├── config.json        # Project configuration
    ├── soul.md            # AI personality
    └── succ.db            # Vector database

~/.succ/
├── global.db              # Global memories
└── config.json            # Global configuration

Documentation

Configuration Reference — All config options with examples
PRD Pipeline — Generate, execute, and verify tasks with quality gates
Storage Backends — SQLite, PostgreSQL, Qdrant setup and benchmarks
Benchmarks — Performance and accuracy metrics
Temporal Awareness — Time decay, validity periods
Ollama Setup — Recommended local LLM setup
llama.cpp GPU — GPU-accelerated embeddings
MCP Integration — Claude Code tools and resources
Security Hardening — Injection detection, IFC, guardrails, content sanitization
Troubleshooting — Common issues and fixes
Development — Contributing and testing

License

FSL-1.1-Apache-2.0: free to use, modify, and self-host. Commercial cloud hosting is restricted until each release's Apache 2.0 change date.