hss-ce

v1.0.1

Published

a month ago

Hybrid Semantic-Structural Context Engine. Optimizes codebase context retrieval for AI coding agents using PageRank and structural parsing.

phuonglt

mcp model-context-protocol codebase-indexer pagerank ai-agents context-engine

HSS-CE: Hybrid Semantic-Structural Context Engine

HSS-CE is a lightweight, offline-first codebase indexer and Model Context Protocol (MCP) server. It optimizes how developers and AI coding agents explore, map, and interact with complex codebases.

By scoring file significance with a simplified, reference-count-based PageRank and extracting code structures (classes, functions, endpoints) with fast regex parsers, HSS-CE delivers high-density context without wasting your token budget.
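To make the ranking idea concrete, here is a minimal sketch of a PageRank iteration over an import graph. The graph, the damping factor, and the function name are assumptions for illustration only; HSS-CE's actual scoring is described as "simplified" and may work differently.

```python
def pagerank(imports, damping=0.85, iterations=50):
    """Rank files by how much import weight flows into them.

    `imports` maps each file to the list of files it imports.
    """
    files = set(imports)
    for targets in imports.values():
        files.update(targets)
    n = len(files)
    rank = {f: 1.0 / n for f in files}

    for _ in range(iterations):
        # Every file starts each round with the same baseline share.
        new_rank = {f: (1.0 - damping) / n for f in files}
        for f in files:
            targets = imports.get(f, [])
            if targets:
                # A file passes its rank to the files it imports.
                share = damping * rank[f] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A file that imports nothing spreads its rank evenly.
                share = damping * rank[f] / n
                for t in files:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical import graph: both services import db.js.
graph = {
    "src/server.js": ["src/auth.js", "src/users.js"],
    "src/auth.js": ["src/db.js"],
    "src/users.js": ["src/db.js"],
    "src/db.js": [],
}
ranks = pagerank(graph)
top = max(ranks, key=ranks.get)
```

In this toy graph the most-imported file, src/db.js, ends up with the highest score, which is the kind of signal a rank-ordered map can use to decide what to show first.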

Honest Value Proposition: Why Use HSS-CE?

HSS-CE is not a magical AI assistant that writes code for you. It is a structured context provider that connects your codebase's architecture to you and to your AI tools.

1. For Developers (Onboarding & Navigation)

Interactive Architecture Map: Generates a lightweight, dark-mode interactive HTML diagram (architecture.html). You can visually search files, see directory-grouped modules, and click nodes to highlight immediate dependencies or see local symbols.
Onboarding Tours: Generates step-by-step markdown walkthroughs (hss-ce tour) that order your codebase from critical entrypoints down to services and storage layers. This helps you understand a new codebase in minutes.
100% Offline Summaries: Automatically extracts plain-English JSDoc, block comments, and Python docstrings to construct a codebase overview database, requiring zero API keys, network requests, or costs.
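The offline summary step can be pictured roughly as follows. This is a sketch under stated assumptions: the two regexes and the `extract_summary` helper are hypothetical, and they only show how a first comment line could be pulled from a JSDoc block or a Python docstring without any network call.

```python
import re

# First JSDoc-style block comment and first triple-quoted docstring.
JSDOC = re.compile(r"/\*\*(.*?)\*/", re.DOTALL)
DOCSTRING = re.compile(r'"""(.*?)"""', re.DOTALL)


def extract_summary(source):
    """Return the first descriptive line of a JSDoc block or docstring, or ''."""
    match = JSDOC.search(source) or DOCSTRING.search(source)
    if not match:
        return ""
    for line in match.group(1).splitlines():
        # Drop the leading '*' decoration used inside JSDoc blocks.
        text = line.strip().lstrip("*").strip()
        # Skip blank lines and tag lines such as '@returns'.
        if text and not text.startswith("@"):
            return text
    return ""


js_source = """/**
 * Opens the SQLite connection pool.
 * @returns {Pool}
 */
function openPool() {}
"""
py_source = 'def load():\n    """Load the index from disk."""\n    return None\n'

js_summary = extract_summary(js_source)
py_summary = extract_summary(py_source)
```

A file with no comments yields an empty string here, which mirrors the limitation noted later in this README.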

2. For AI Coding Agents (Context Optimization)

Eliminate Token Waste: Coding agents (like Claude Code, Cursor, Aider) often read entire files or directory trees to understand import flows. HSS-CE exposes an MCP server that lets agents query precise skeletal structures and dependency trees under strict token budgets.
Precise Symbol Navigation: Instead of performing noisy text searches (like raw grep or ripgrep), agents use structured database queries to resolve exact function definitions and caller locations.
Automated Redaction (Secret Guard): Before packing source files into LLM context, HSS-CE automatically redacts credentials, private keys, and API tokens, reducing the risk of leaking secrets.
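As an illustration of the redaction idea, the sketch below masks three kinds of secrets with regular expressions. The patterns and the `redact` function are assumptions made for this example, not HSS-CE's actual rule set, and a real scanner would need far broader coverage.

```python
import re

# Illustrative patterns only (hypothetical, not HSS-CE's real rules).
PRIVATE_KEY = re.compile(
    r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
    re.DOTALL,
)
AWS_KEY = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
# Quoted values assigned to names containing common credential words.
ASSIGNMENT = re.compile(
    r"""((?:api[_-]?key|secret|token|password)\w*\s*[:=]\s*)(["'])[^"'\n]+\2""",
    re.IGNORECASE,
)


def redact(text):
    """Replace recognizable secrets with placeholders, keeping the code shape."""
    text = PRIVATE_KEY.sub("[REDACTED PRIVATE KEY]", text)
    text = AWS_KEY.sub("[REDACTED]", text)
    # Keep the variable name and quotes so the file still reads naturally.
    return ASSIGNMENT.sub(r"\1\2[REDACTED]\2", text)


sample = 'const API_KEY = "sk-live-1234567890";\nconst region = "us-east-1";\n'
cleaned = redact(sample)
```

Keeping the variable name and replacing only the value lets an agent still see that a credential is configured there without ever seeing the credential itself.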

3. Current Limitations & What It Is Not

Regex-based, not AST-based: HSS-CE uses fast, lightweight regex patterns to extract imports and symbols. This keeps it extremely fast and multi-language out of the box, but unlike a full abstract syntax tree (AST) parser, it may occasionally miss highly dynamic, metaprogrammed, or syntactically complex constructs.
Dependency on Code Quality: Local file summaries are parsed from comments and docstrings. If your codebase has no comments, the summaries will be empty unless you explicitly run the remote LLM enrichment command (hss-ce enrich).
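The regex trade-off is easy to see in a small example. The pattern below is a hypothetical stand-in, not the one HSS-CE ships: it catches static imports and literal require() calls, but a require() whose argument is a variable leaves no literal path to match.

```python
import re

# Static ES module imports and CommonJS require() calls with literal paths.
IMPORT_RE = re.compile(
    r"""(?:import\s+(?:[\w*{}\s,]+\s+from\s+)?|require\(\s*)(["'])([^"']+)\1"""
)


def extract_imports(source):
    """Return every module path that appears as a string literal."""
    return [m.group(2) for m in IMPORT_RE.finditer(source)]


source = """
import express from "express";
import { query } from './db.js';
const fs = require("fs");
const plugin = require(pluginName);
"""
found = extract_imports(source)
```

The first three lines are detected; the last one is missed because the module name is only known at runtime, which is exactly the kind of dynamic construct an AST-based or runtime analysis would be better placed to handle.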

Developer & Agent Workflows in Action

A. How a Developer Uses HSS-CE

Imagine you need to modify a database schema file src/db.js in a large codebase, but you don't know what might break.

Open the generated architecture.html dashboard in your browser.
Search for db.js.
Click the db.js node. The graph automatically dims unrelated elements and highlights all service files that directly import it.
Read the symbols and summaries in the details panel to understand exactly what functions interact with the database.

B. How an AI Agent Uses HSS-CE (Under the Hood)

When you ask your agent (e.g. Claude Code or Cursor): "Find all callers of the authenticate function and tell me where it is defined", the agent bypasses slow search loops and makes two fast MCP tool calls:

Agent invokes get_definition(symbol: "authenticate") → HSS-CE queries SQLite and returns the exact file path and signature.
Agent invokes get_callers(symbol: "authenticate") → HSS-CE returns a clean list of files and lines referencing the function.
The agent then reads only those specific files, which the project estimates cuts token usage by about 90% and improves accuracy.
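The two lookups above can be sketched as plain SQLite queries. The table layout, column names, and sample rows here are assumptions for illustration; the real schema of .hss-ce/graph.db is not documented in this README.

```python
import sqlite3


def build_demo_index():
    """Create an in-memory index using a hypothetical two-table schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE symbols (name TEXT, file TEXT, line INTEGER, signature TEXT);
        CREATE TABLE refs (symbol TEXT, file TEXT, line INTEGER);
    """)
    db.execute(
        "INSERT INTO symbols VALUES (?, ?, ?, ?)",
        ("authenticate", "src/auth.js", 12, "function authenticate(user, password)"),
    )
    db.executemany(
        "INSERT INTO refs VALUES (?, ?, ?)",
        [
            ("authenticate", "src/routes/login.js", 8),
            ("authenticate", "src/routes/admin.js", 21),
        ],
    )
    return db


def get_definition(db, symbol):
    """Return (file, line, signature) for a symbol, or None if unknown."""
    return db.execute(
        "SELECT file, line, signature FROM symbols WHERE name = ?", (symbol,)
    ).fetchone()


def get_callers(db, symbol):
    """Return a sorted list of (file, line) pairs that reference the symbol."""
    return db.execute(
        "SELECT file, line FROM refs WHERE symbol = ? ORDER BY file, line", (symbol,)
    ).fetchall()


db = build_demo_index()
definition = get_definition(db, "authenticate")
callers = get_callers(db, "authenticate")
```

Because both answers come from indexed rows rather than a text scan, the agent receives a short, exact list of locations instead of every line that happens to contain the word.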

Quick Start: Set Up a New Project (Single-Command Setup)

HSS-CE is designed to be set up on any target codebase with a single installation step.

1. Prerequisite

Ensure you have Node.js (v18+) and Git installed on your system.

2. Install and Configure

Option A: Run via NPX (Recommended - No Install Needed)

Run the configuration wizard directly, with no global installation:

npx --package hss-ce hss-ce-integrate

Option B: Install via NPM (Global)

Install globally to make the commands permanent:

npm install -g hss-ce

Note: On macOS/Linux, if you encounter EACCES permission errors, either use a Node version manager such as nvm or run sudo npm install -g hss-ce.

Once installed, move into your target project directory and initialize it:

hss-ce-integrate

Option C: Install from Source (Git Clone)

If you prefer to clone and install the source code manually:

git clone https://github.com/phuonglt/hss-ce.git
cd hss-ce
bash install.sh

3. What the Installer Does Automatically

Once launched, the script will:

Install all required dependencies.
Register the global CLI commands (hss-ce and hss-ce-integrate).
Launch the setup wizard to ask for the path of your target codebase.
Auto-Index: Analyze your target codebase, calculate PageRank structural weights, and build the SQLite database (.hss-ce/graph.db).
Auto-Doc: Generate a CODEBASE.md markdown map and an interactive architecture.html dashboard in your target project directory.
Auto-Agent Setup: Write context instructions and rules (.cursorrules, CLAUDE.md, .aider.instructions.md, .agents/rules/hss-ce.md) so your AI agents immediately know how to use HSS-CE.
Agent MCP Integration: Prompt you to automatically add HSS-CE to your favorite coding client (Claude Desktop, Cursor, Claude Code, Aider, or Antigravity).
Git Hooks Setup: Install Git hooks (post-checkout and post-merge) in your target repository that trigger a fast background re-index whenever you switch branches or merge updates, keeping the database in sync with your current branch.

CLI Reference Guide

If you prefer using the terminal manually, HSS-CE provides the following commands:

| Command | Usage | Description |
|---|---|---|
| `hss-ce index <path>` | `hss-ce index .` | Scan the codebase structure and build the local index. Add `-f` to force a re-scan. |
| `hss-ce map <path>` | `hss-ce map . --compact` | Print the PageRank-ordered file structure. Add `--budget=1000` to limit tokens. |
| `hss-ce doc <path>` | `hss-ce doc .` | Regenerate CODEBASE.md and the architecture.html dashboard. |
| `hss-ce tour <path>` | `hss-ce tour .` | Display a step-by-step onboarding tour of the codebase. |
| `hss-ce query <path> <sym>` | `hss-ce query . validateUser` | Look up the definition and callers of a specific symbol. |
| `hss-ce pack <path>` | `hss-ce pack . --budget=2000` | Package source files into structured XML for LLM context, with secret redaction. |
| `hss-ce enrich <path>` | `hss-ce enrich .` | (Optional) Fetch AI-generated summaries via the Gemini API (requires GEMINI_API_KEY). |
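The --budget options suggest a rank-ordered selection under a token cap. The sketch below shows one plausible way to do that; the four-characters-per-token estimate, the greedy strategy, and both function names are assumptions, not a description of how hss-ce map or hss-ce pack is implemented.

```python
def estimate_tokens(text):
    # Rough heuristic (about 4 characters per token); not a real tokenizer.
    return max(1, len(text) // 4)


def select_within_budget(entries, budget):
    """Pick files in descending rank order until the token budget is spent.

    `entries` is a list of (path, rank, text) tuples.
    Returns the chosen paths and the estimated tokens used.
    """
    chosen, used = [], 0
    for path, rank, text in sorted(entries, key=lambda e: e[1], reverse=True):
        cost = estimate_tokens(text)
        if used + cost > budget:
            # Skip files that do not fit; a smaller one may still fit later.
            continue
        chosen.append(path)
        used += cost
    return chosen, used


# Hypothetical ranked files with placeholder contents.
entries = [
    ("src/db.js", 0.5, "x" * 400),
    ("src/server.js", 0.3, "x" * 800),
    ("src/util.js", 0.2, "x" * 100),
]
chosen, used = select_within_budget(entries, budget=150)
```

With a budget of 150, the highest-ranked file is included, the large second file is skipped, and the small third file still fits, so the budget is spent on the most significant content first.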

Inspirations & Credits

HSS-CE draws inspiration and features from several exceptional open-source tools:

Repomix / GitIngest: Inspires our XML context packaging and budget-bounded file packing with token calculations.
Aider: Inspires our signature-only codebase skeleton mapping and token-budgeted structure elision.
Graphify / CodeGraph: Inspires our structural codebase graph modeling, import tracking, and PageRank scoring.
CodeCTX: Inspires our personalized context boost around user active files.
Understand-Anything: Inspires our logical layering (entrypoint/service/storage), interactive visual dashboard highlighting, and guided onboarding tours.

License

Licensed under the MIT License.