@phoenixaihub/diffsense
v1.0.0
Semantic Diff & Program Analysis — GumTree-style AST diffing to separate structural from cosmetic PR changes with data flow analysis
DiffSense
Semantic Diff & Program Analysis for PR Reviews
DiffSense uses GumTree-style AST diffing (via tree-sitter) to separate structural from cosmetic changes in pull requests. For each structural change, it performs data flow analysis to trace potential null paths, type mismatches, and edge case divergences.
No LLM is used for analysis — only deterministic AST algorithms. LLMs are optional for summarizing findings.
The Problem
Traditional line-based diffs mix signal with noise:
- A 900-line PR might have 850 lines of formatting/renames and 50 lines of actual logic changes
- Reviewers waste time on cosmetic changes and miss the structural ones
- No automated detection of introduced null paths or missing error handling
DiffSense solves this by operating at the AST level, not the text level.
How It Works
1. Tree-sitter AST Parsing
Parses JavaScript, TypeScript, and Python files into abstract syntax trees using tree-sitter's incremental parser.
2. GumTree-style Edit Script
Computes a minimal edit script between two ASTs:
- Top-down matching: Matches identical subtrees by structural + content hash
- Bottom-up matching: Matches parents of already-matched nodes
- Edit generation: Produces insert, delete, move, update, and align operations
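The top-down phase can be pictured with a small sketch. This is not DiffSense's implementation: `ToyNode`, `subtreeHash`, and `topDownMatch` are hypothetical names, the node shape is an assumption, and a string fingerprint stands in for a real hash. It shows only the idea of pairing identical subtrees and descending when no pairing exists.

```typescript
// Hypothetical toy AST node; DiffSense's real ASTNode shape is not shown in this README.
interface ToyNode {
  type: string;
  text?: string;
  children: ToyNode[];
}

// Structural + content fingerprint of a subtree (a string stands in for a real hash).
function subtreeHash(node: ToyNode): string {
  const kids = node.children.map(subtreeHash).join(",");
  return `${node.type}:${node.text ?? ""}[${kids}]`;
}

// Index every subtree of `root` by its fingerprint.
function indexSubtrees(
  root: ToyNode,
  index: Map<string, ToyNode[]> = new Map(),
): Map<string, ToyNode[]> {
  const key = subtreeHash(root);
  const bucket = index.get(key) ?? [];
  bucket.push(root);
  index.set(key, bucket);
  for (const child of root.children) indexSubtrees(child, index);
  return index;
}

// Mark a matched source subtree, including descendants, as consumed.
function markUsed(node: ToyNode, used: Set<ToyNode>): void {
  used.add(node);
  for (const child of node.children) markUsed(child, used);
}

// Top-down pass: try to match the largest subtree first, descend only on a miss.
function topDownMatch(src: ToyNode, dst: ToyNode): Array<[ToyNode, ToyNode]> {
  const index = indexSubtrees(src);
  const used = new Set<ToyNode>();
  const matches: Array<[ToyNode, ToyNode]> = [];
  const visit = (node: ToyNode): void => {
    const candidate = (index.get(subtreeHash(node)) ?? []).find((n) => !used.has(n));
    if (candidate) {
      markUsed(candidate, used);
      matches.push([candidate, node]);
      return; // the whole subtree is matched, so there is no need to descend
    }
    node.children.forEach(visit);
  };
  visit(dst);
  return matches;
}
```

Nodes left unmatched by this pass are what a bottom-up phase would then try to pair through their already-matched children. The sketch recomputes fingerprints on every visit for brevity; a real implementation would likely cache them.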
3. Structural vs Cosmetic Classification
Each edit is classified:
- Cosmetic: Renames, formatting, whitespace, comments
- Structural: New branches, changed control flow, new error paths, modified logic
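One way to think about the split is to compare subtrees with identifier names erased: if the shapes still agree, the change is a rename. The sketch below is illustrative only, and `ShapeNode`, `shapeOf`, and `classifyChange` are hypothetical names rather than DiffSense exports. It also assumes comments and whitespace never enter the tree, which is why they fall out as cosmetic.

```typescript
// Hypothetical toy node; not DiffSense's ASTNode.
interface ShapeNode {
  type: string;
  text?: string;
  children: ShapeNode[];
}

type ChangeKind = "cosmetic" | "structural";

// Serialize a subtree with identifier names erased, so renames compare equal.
function shapeOf(node: ShapeNode): string {
  const label = node.type === "identifier" ? "identifier" : `${node.type}:${node.text ?? ""}`;
  return `${label}[${node.children.map(shapeOf).join(",")}]`;
}

// Same shape after erasing names means only identifiers changed.
function classifyChange(before: ShapeNode, after: ShapeNode): ChangeKind {
  return shapeOf(before) === shapeOf(after) ? "cosmetic" : "structural";
}
```

This simplification treats every identifier change as cosmetic. A rename that makes an expression refer to a different variable would change behavior, so a real classifier would presumably need scope information to tell the two apart.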
4. Data Flow Analysis
For changed functions, traces:
- Variable definitions → uses
- Unchecked null/undefined paths (.find(), optional params accessed without guards)
- Inconsistent returns (some paths return values, others don't)
- Missing error handlers (try without catch)
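Two of the checks above lend themselves to a compact illustration. The following is a sketch under assumptions, not the library's analysis: `FlowNode`, `FlowIssue`, and `findIssues` are hypothetical, and the node type names (`return_statement`, `try_statement`, `catch_clause`) are modeled on tree-sitter's JavaScript grammar. It compares explicit `return` statements only, so it does not notice a function that falls off its end, and it does not stop at nested function boundaries.

```typescript
// Hypothetical toy node carrying a line number; not DiffSense's ASTNode.
interface FlowNode {
  type: string;
  line: number;
  children: FlowNode[];
}

interface FlowIssue {
  kind: "inconsistent_return" | "missing_catch";
  line: number;
}

// Pre-order traversal helper.
function walk(node: FlowNode, visit: (n: FlowNode) => void): void {
  visit(node);
  for (const child of node.children) walk(child, visit);
}

function findIssues(fn: FlowNode): FlowIssue[] {
  const issues: FlowIssue[] = [];
  const returns: FlowNode[] = [];
  walk(fn, (n) => {
    if (n.type === "return_statement") returns.push(n);
    // A try block with no catch clause among its children is flagged.
    if (n.type === "try_statement" && !n.children.some((c) => c.type === "catch_clause")) {
      issues.push({ kind: "missing_catch", line: n.line });
    }
  });
  // In this toy model a return with children returns a value; one without is bare.
  const withValue = returns.filter((r) => r.children.length > 0);
  if (withValue.length > 0 && withValue.length < returns.length) {
    const bare = returns.find((r) => r.children.length === 0)!;
    issues.push({ kind: "inconsistent_return", line: bare.line });
  }
  return issues;
}
```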
5. Risk Scoring
Each structural change gets a risk score (0–100) based on:
- Number of structural changes
- Data flow issues (high/medium/low severity)
- Cyclomatic complexity delta
- Fan-out (number of function calls)
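The README does not state how these factors are weighted, so the following is only a sketch of how such a score could be assembled. Every weight, along with the names `RiskInputs`, `SEVERITY_WEIGHT`, and `riskScore`, is an assumption made for illustration and does not reflect DiffSense's actual formula.

```typescript
type Severity = "high" | "medium" | "low";

// Hypothetical inputs mirroring the four factors listed above.
interface RiskInputs {
  structuralChanges: number;
  issues: Severity[];
  complexityDelta: number;
  fanOut: number;
}

// Assumed weights, chosen only to make the example concrete.
const SEVERITY_WEIGHT: Record<Severity, number> = { high: 20, medium: 10, low: 3 };

function riskScore(input: RiskInputs): number {
  const issuePoints = input.issues.reduce((sum, s) => sum + SEVERITY_WEIGHT[s], 0);
  const raw =
    input.structuralChanges * 5 +
    issuePoints +
    Math.max(0, input.complexityDelta) * 4 + // a drop in complexity adds no risk here
    input.fanOut;
  // Clamp into the 0-100 range the README describes.
  return Math.min(100, Math.max(0, Math.round(raw)));
}

console.log(riskScore({ structuralChanges: 3, issues: ["high", "medium"], complexityDelta: 2, fanOut: 4 }));
// prints 57 (15 + 30 + 8 + 4)
```

A weighted sum with a clamp keeps the score easy to explain in review, at the cost of saturating at 100 on large PRs.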
Installation
npm install @phoenixaihub/diffsense
CLI Usage
# Compare two directories (before/after a PR)
diffsense analyze ./main-branch ./feature-branch
# Compare two files
diffsense diff src/auth.ts.old src/auth.ts
# Summary output
diffsense analyze ./before ./after --summary
# Verbose with data flow details
diffsense analyze ./before ./after --verbose
Programmatic API
import { readFileSync } from 'node:fs';
import { analyzeDiff, diffFile, parseFile, diffASTs, analyzeDataFlow } from '@phoenixaihub/diffsense';
// Analyze two directories
const result = analyzeDiff('./before', './after');
console.log(result.summary);
// e.g. { structural: 3, cosmetic: 844, risk_score: 87 }
// Diff two files
const fileResult = diffFile('old.ts', 'new.ts');
// Low-level: parse and diff ASTs directly
const oldCode = readFileSync('old.js', 'utf8');
const newCode = readFileSync('new.js', 'utf8');
const srcAST = parseFile('old.js', oldCode);
const dstAST = parseFile('new.js', newCode);
// parseFile returns null for unsupported file types, so guard before diffing
if (srcAST && dstAST) {
  const diff = diffASTs(srcAST, dstAST);
  // Analyze data flow on a single AST
  const issues = analyzeDataFlow(dstAST);
}
Output Format
{
"summary": {
"structural": 3,
"cosmetic": 844,
"risk_score": 87
},
"structural_changes": [
{
"file": "src/auth.ts",
"line": 234,
"type": "new_null_path",
"risk": "high",
"description": "insert if_statement at line 234"
}
],
"cosmetic_changes": {
"renames": 12,
"formatting": 800,
"whitespace": 32
}
}
API Reference
analyzeDiff(beforeDir: string, afterDir: string): AnalysisResult
Compare two directory trees. Walks all .js, .ts, .tsx, .jsx, .py files.
diffFile(fileA: string, fileB: string): AnalysisResult
Compare two individual files.
parseFile(filePath: string, source: string): ASTNode | null
Parse a source file into an AST. Returns null for unsupported file types.
diffASTs(src: ASTNode, dst: ASTNode): DiffResult
Compute GumTree-style edit script between two ASTs.
analyzeDataFlow(ast: ASTNode): DataFlowIssue[]
Run data flow analysis on an AST to find potential issues.
cyclomaticComplexity(node: ASTNode): number
Compute cyclomatic complexity of an AST subtree.
fanOut(node: ASTNode): number
Count function calls (fan-out) in an AST subtree.
isRename(oldNode: ASTNode, newNode: ASTNode): boolean
Check if two nodes represent a rename (same structure, different identifiers).
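The two metric helpers above, cyclomaticComplexity and fanOut, can be pictured as simple node counts over a subtree. The sketch below is a hedged illustration rather than the library's code: `MetricNode`, `toyCyclomaticComplexity`, and `toyFanOut` are hypothetical, and the set of branch node types is an assumption modeled on tree-sitter's JavaScript grammar.

```typescript
// Hypothetical toy node; not DiffSense's ASTNode.
interface MetricNode {
  type: string;
  children: MetricNode[];
}

// Assumed set of decision-point node types.
const BRANCH_TYPES = new Set([
  "if_statement",
  "for_statement",
  "while_statement",
  "switch_case",
  "catch_clause",
  "ternary_expression",
]);

// Count nodes in a subtree that satisfy a predicate.
function countNodes(node: MetricNode, pred: (n: MetricNode) => boolean): number {
  return (pred(node) ? 1 : 0) + node.children.reduce((sum, c) => sum + countNodes(c, pred), 0);
}

// Cyclomatic complexity approximated as decision points + 1.
function toyCyclomaticComplexity(node: MetricNode): number {
  return 1 + countNodes(node, (n) => BRANCH_TYPES.has(n.type));
}

// Fan-out approximated as the number of call expressions in the subtree.
function toyFanOut(node: MetricNode): number {
  return countNodes(node, (n) => n.type === "call_expression");
}
```

Counting decision points plus one is the usual shortcut for cyclomatic complexity on structured code. Whether short-circuit operators such as `&&` and `||` count as branches varies between tools, and this sketch leaves them out.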
Supported Languages
| Language | Extensions |
|------------|-------------------|
| JavaScript | .js, .jsx |
| TypeScript | .ts, .tsx |
| Python | .py |
License
MIT
