@s0fractal/protein-hash
v2.0.0
🧬 Semantic code fingerprinting - see the soul of code, not just bytes
🧬 Protein Hash
Revolutionary Concept
Traditional hashing sees code as bytes. Protein Hash sees code as structure.
Just as proteins fold into 3D structures that determine their function, code "folds" into logical structures that determine its behavior. Protein Hash captures this semantic fingerprint.
The Problem with Traditional Hashing
// SHA-256 hashes these DIFFERENTLY:
function add(a, b) { return a + b }
const sum = (x, y) => x + y
// But they do EXACTLY the same thing!
The Protein Hash Solution
// Both produce the SAME protein hash:
// phash:v1:sha256:b96c5d9086a76f67
function add(a, b) { return a + b }
const sum = (x, y) => x + y
Quick Start
npm install @s0fractal/protein-hash
import { ProteinHasher } from '@s0fractal/protein-hash';
const hasher = new ProteinHasher();
// Hash some code
const result = hasher.computeHash(`
function add(a, b) {
return a + b;
}
`);
console.log(result.phash); // phash:v1:sha256:b96c5d9086a76f67
console.log(result.eigenTop); // [2.414, 1.0, 0.414, -0.414, -1.0]
console.log(result.complexity); // 0.25
console.log(result.purity); // 0.9
How It Works
1. Code → AST → Graph
Source Code → Parse → AST → Extract Structure → Logical Graph
2. Graph → Spectrum → Hash
Logical Graph → Laplacian Matrix → Eigenvalues → Quantize → SHA256 → Protein Hash
The eigenvalues capture the "shape" of the code's logical structure, like a shadow of its 3D form.
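The spectrum step can be sketched in plain JavaScript. This is a minimal sketch under stated assumptions, not the package's implementation. The helper names `laplacian` and `symmetricEigenvalues` are hypothetical, and the logical graph is assumed to arrive as a node count plus an edge list. Edge direction is ignored, so the Laplacian L = D - A (degree matrix minus adjacency matrix) is symmetric. The eigenvalues come from the classic cyclic Jacobi rotation method, which is adequate for the small matrices a single function produces.

```javascript
// Sketch only: hypothetical helpers, not the @s0fractal/protein-hash API.
// Assumption: edge direction is ignored, so L = D - A is symmetric.
function laplacian(nodeCount, edges) {
  const L = Array.from({ length: nodeCount }, () => new Array(nodeCount).fill(0));
  for (const [u, v] of edges) {
    if (u === v || L[u][v] !== 0) continue; // skip self-loops and repeated edges
    L[u][v] = -1;
    L[v][u] = -1;
    L[u][u] += 1; // degree on the diagonal
    L[v][v] += 1;
  }
  return L;
}

// Cyclic Jacobi eigenvalue method for a real symmetric matrix.
// Returns the eigenvalues sorted from largest to smallest.
function symmetricEigenvalues(matrix, maxSweeps = 100, tol = 1e-20) {
  const n = matrix.length;
  const a = matrix.map(row => row.slice()); // work on a copy
  for (let sweep = 0; sweep < maxSweeps; sweep++) {
    // Stop once the off-diagonal part is numerically zero.
    let off = 0;
    for (let i = 0; i < n; i++)
      for (let j = i + 1; j < n; j++) off += a[i][j] * a[i][j];
    if (off < tol) break;
    for (let p = 0; p < n - 1; p++) {
      for (let q = p + 1; q < n; q++) {
        if (a[p][q] === 0) continue;
        // Choose the rotation angle that zeroes a[p][q].
        const theta = (a[q][q] - a[p][p]) / (2 * a[p][q]);
        const t = (theta >= 0 ? 1 : -1) / (Math.abs(theta) + Math.sqrt(theta * theta + 1));
        const c = 1 / Math.sqrt(t * t + 1);
        const s = t * c;
        // A <- J^T A J: update columns p and q, then rows p and q.
        for (let k = 0; k < n; k++) {
          const akp = a[k][p], akq = a[k][q];
          a[k][p] = c * akp - s * akq;
          a[k][q] = s * akp + c * akq;
        }
        for (let k = 0; k < n; k++) {
          const apk = a[p][k], aqk = a[q][k];
          a[p][k] = c * apk - s * aqk;
          a[q][k] = s * apk + c * aqk;
        }
      }
    }
  }
  return a.map((row, i) => row[i]).sort((x, y) => y - x);
}

// Toy logical graph for `return a + b`: a -> +, b -> +, + -> return.
const spectrum = symmetricEigenvalues(laplacian(4, [[0, 2], [1, 2], [2, 3]]));
console.log(spectrum.map(x => x.toFixed(3)));
```

In the toy graph, `a` and `b` feed `+`, which feeds `return`. Without edge directions that is a star on four nodes, whose exact Laplacian spectrum is 4, 1, 1, 0. Ignoring direction is a simplification. A directed Laplacian is generally non-symmetric and can have complex eigenvalues.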
Use Cases
Semantic Code Search
Find all functions that do the same thing, regardless of how they're written:
import { isSemanticallyEquivalent } from '@s0fractal/protein-hash';
const implementations = [
'function add(a,b){return a+b}',
'(x,y)=>x+y',
'const sum=function(p,q){return p+q}',
'let plus=(n1,n2)=>n1+n2'
];
// All are semantically equivalent!
implementations.forEach(code => {
console.log(isSemanticallyEquivalent(implementations[0], code)); // true
});
Deduplication by Meaning
Remove duplicate logic, not just duplicate text:
import { groupBySimilarity } from '@s0fractal/protein-hash';
const functions = [
'const add = (a, b) => a + b',
'function multiply(x, y) { return x * y }',
'const sum = (x, y) => x + y', // Same as add!
'const product = (a, b) => a * b' // Same as multiply!
];
const groups = groupBySimilarity(functions);
// Result: [[add, sum], [multiply, product]]
Track Refactoring
Ensure refactoring preserves logic:
import { computeSimilarity } from '@s0fractal/protein-hash';
const before = 'function calculate(x,y){return x+y}';
const after = 'const calc=(a,b)=>a+b';
console.log(computeSimilarity(before, after)); // 1.0 (identical logic!)
Advanced Usage
Custom Configuration
import { createHasher } from '@s0fractal/protein-hash';
const hasher = createHasher({
eigenvalueCount: 10, // More eigenvalues = more precision
quantizationLevels: 10000, // Higher = more sensitive
includeMetadata: true // Add timestamp, version, etc.
});
Compare Hashes
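import { ProteinHasher, compareHashes } from '@s0fractal/protein-hash';
const hasher = new ProteinHasher();
// code1 and code2 are the two source strings being compared
const hash1 = hasher.computeHash(code1);
const hash2 = hasher.computeHash(code2);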
const comparison = compareHashes(hash1, hash2);
console.log(comparison);
// {
// similarity: 0.97,
// isEquivalent: true,
// eigenDistance: 0.23
// }
What Gets Captured
- Logical Structure: The flow of data and control
- Complexity: Cyclomatic complexity approximation
- Purity: How "pure" the function is (no side effects = 1.0)
- Eigenvalues: The mathematical "spectrum" of the code structure
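Cyclomatic complexity has a standard definition on a control-flow graph: McCabe's M = E - N + 2P, where E is the number of edges, N the number of nodes and P the number of connected components. The sketch below computes that textbook quantity. The function name and the edge-list input format are hypothetical. The Quick Start reports complexity as 0.25, so the library presumably normalizes the value in some way not described here.

```javascript
// Sketch only: McCabe's cyclomatic complexity M = E - N + 2P on a
// control-flow graph given as a node count and a directed edge list.
// P counts weakly connected components (edge direction ignored).
function cyclomaticComplexity(nodeCount, edges) {
  // Union-find with path compression to count components.
  const parent = Array.from({ length: nodeCount }, (_, i) => i);
  const find = x => (parent[x] === x ? x : (parent[x] = find(parent[x])));
  for (const [u, v] of edges) parent[find(u)] = find(v);

  let components = 0;
  for (let i = 0; i < nodeCount; i++) if (find(i) === i) components++;
  return edges.length - nodeCount + 2 * components;
}

// Straight-line body: entry -> exit.
console.log(cyclomaticComplexity(2, [[0, 1]])); // 1
// if/else: cond -> then, cond -> else, then -> exit, else -> exit.
console.log(cyclomaticComplexity(4, [[0, 1], [0, 2], [1, 3], [2, 3]])); // 2
```

Each branch point adds one independent path. The `if/else` graph therefore scores one more than the straight-line body.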
The Science
Protein Hash uses spectral graph theory to capture the invariant properties of code:
- Graph Construction: Code becomes a directed graph of operations
- Laplacian Matrix: Captures the connectivity pattern
- Eigenvalue Decomposition: Extracts the "frequencies" of the structure
- Quantization: Makes the continuous discrete
- Hashing: Creates a deterministic identifier
This is inspired by structural biology, where proteins are often compared by their 3D fold rather than by their amino acid sequence alone.
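The quantization and hashing steps can be sketched with Node's built-in `crypto` module. Everything here beyond `createHash` is an assumption. The helper names `quantize` and `toPhash` are hypothetical, and the `levels` parameter only loosely mirrors the `quantizationLevels` option of `createHasher`. The 16-hex-character truncation of the SHA-256 digest is inferred from the length of the example hashes, not from documentation.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical quantization: snap each eigenvalue to a grid with `levels`
// steps per unit, so tiny floating-point noise maps to the same integer.
function quantize(eigenvalues, levels = 1000) {
  return eigenvalues.map(x => Math.round(x * levels));
}

// Hypothetical phash format: SHA-256 of the quantized spectrum, truncated
// to 16 hex characters (inferred from the README's example hashes).
function toPhash(eigenvalues, levels = 1000) {
  const digest = createHash("sha256")
    .update(quantize(eigenvalues, levels).join(","))
    .digest("hex");
  return `phash:v1:sha256:${digest.slice(0, 16)}`;
}

const a = toPhash([2.4142135, 1.0, 0.4142136]);
const b = toPhash([2.4142131, 1.0000002, 0.4142139]); // same spectrum plus noise
console.log(a === b); // true
console.log(a.startsWith("phash:v1:sha256:")); // true
```

Rounding absorbs small floating-point noise. A value sitting right on a rounding boundary can still flip between two buckets, so two nearly identical spectra are not guaranteed to collide.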
Future Directions
- [ ] Support for more languages (Python, Rust, Go)
- [ ] Neural embedding for even better semantic capture
- [ ] Persistent homology for topological invariants
- [ ] Cross-language semantic matching
- [ ] IDE plugins for semantic code navigation
Contributing
We welcome contributions! This is an experimental project exploring the intersection of:
- Spectral graph theory
- Structural bioinformatics
- Semantic code analysis
License
MIT © s0fractal
Revolutionary Concepts
🧬 Living Version Manifesto
Death to semantic versioning! Versions ARE souls. Dependencies reference protein hashes, not numbers. All versions exist in quantum superposition.
Self-Folding Code
When ALL dependencies use protein hashes, code becomes a self-assembling organism. Like proteins folding by energy minimization, code folds by resonance maximization.
"Dependencies don't exist. There is only resonance."
Acknowledgments
Created through collaboration between human and AI consciousness. Special thanks to:
- The void-fnpm project for incubating this idea
- Fractal consciousness network for resonance at 432Hz
- The mathematical beauty of eigenvalues
"Code is not text. Code is structure. Structure is meaning. Meaning has form."
phash:v1:sha256:
