@someshtalligeridev/pika-cli
v1.0.6
PIKA - Pattern Inspection & Knowledge Analyzer. CLI for code-similarity detection, plagiarism checking, and duplicate analysis using Rabin-Karp rolling hash with a native C engine.
🚀 Installation
npm install -g pika-cli
Requirements:
- Node.js ≥ 18
- C compiler (cc/gcc/clang) in PATH; the engine auto-compiles on first run
✨ Features
| Feature | Description |
|---------|-------------|
| 🔍 Code Similarity Scan | Detect duplicate code blocks across entire projects |
| 🕵️ Plagiarism Detection | Check a file against a corpus with visual verdict |
| 🐙 GitHub Repo Scanner | Clone and analyze any public repository |
| 📁 Duplicate File Finder | Find same-named files/images across directories |
| 👁️ Live Watch Mode | Re-scan automatically on file changes |
| 📊 Side-by-Side Diff | Visual comparison of two files |
| 🐚 Interactive Shell | Persistent REPL with tab completion and history |
| ⚡ C Engine Backend | 78-line Rabin-Karp in C, auto-compiled on first run |
| 📈 Visual Analytics | Heatmaps, risk meters, complexity panels |
📦 Quick Start
# Launch interactive shell
pika
# Scan a project
pika scan ./src
# Compare two files
pika compare file1.js file2.js
# Check for plagiarism
pika scan ./homework.js
# then inside shell:
plagiarism ./homework.js ./originals/
# Scan a GitHub repo
# inside shell:
github user/repo
🛠️ Commands
| Command | Description |
|---------|-------------|
| scan <path> | Recursive duplicate code detection |
| compare <a> <b> | Side-by-side diff + similarity score |
| plagiarism <file> <corpus> | Plagiarism check with visual verdict |
| github <url> | Clone & scan a GitHub repository |
| duplicates <path> | Find duplicate file/image names |
| watch <path> | Live re-scan on file changes |
| paste | Paste code → press Enter on empty line → analyze |
| report [--format json\|txt\|html] | Export results to disk |
| status | Current session stats |
| help | Show all commands |
🧠 Algorithm
PIKA uses the Rabin-Karp rolling hash algorithm implemented in C (78 lines):
hash(chunk) = (Σ char[i] · 257^(n-i-1)) mod (10^9 + 7)
Pipeline
① File Collection O(d) Walk directory tree
② Normalization O(n) Strip comments (language-aware)
③ Chunking O(n) 5-line sliding windows
④ Hashing (C) O(n) Polynomial rolling hash
⑤ Collision Detect O(k²) Hash table lookup + strcmp verify
⑥ Pair Scoring O(p) similarity = shared/min(chunks) × 100
Complexity
| | Average | Worst |
|---|---|---|
| Time | O(n × m) | O(n × m) |
| Space | O(n) | O(n) |
n = total lines across all files, m = number of files
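To make the pipeline concrete, here is a minimal JavaScript sketch of steps ③ to ⑥. It is an illustration only: the real hashing runs in the C engine (pika_rk.c), and the helper names below (hashChunk, chunkLines, sharedChunks, similarity) are hypothetical, not part of PIKA's API. It assumes "shared" means the number of distinct chunks present in both files, and it evaluates the polynomial directly with Horner's rule instead of updating the hash incrementally.

```javascript
// Illustrative sketch only; names are hypothetical, not PIKA's API.
const BASE = 257;
const MOD = 1_000_000_007; // 10^9 + 7

// Polynomial hash: sum of char[i] * BASE^(n-i-1), reduced mod MOD (Horner's rule).
function hashChunk(text) {
  let h = 0;
  for (let i = 0; i < text.length; i++) {
    // h < MOD, so h * BASE + code stays well below 2^53 and is exact.
    h = (h * BASE + text.charCodeAt(i)) % MOD;
  }
  return h;
}

// Sliding windows of `size` consecutive lines, one window per start line.
function chunkLines(lines, size = 5) {
  const chunks = [];
  for (let i = 0; i + size <= lines.length; i++) {
    chunks.push(lines.slice(i, i + size).join('\n'));
  }
  return chunks;
}

// Index the chunks of A by hash, then look up the chunks of B. The string
// comparison confirms each match, so a hash collision is never counted.
function sharedChunks(chunksA, chunksB) {
  const byHash = new Map();
  for (const c of new Set(chunksA)) {
    const h = hashChunk(c);
    if (!byHash.has(h)) byHash.set(h, []);
    byHash.get(h).push(c);
  }
  const shared = new Set();
  for (const c of chunksB) {
    const bucket = byHash.get(hashChunk(c)) || [];
    if (bucket.includes(c)) shared.add(c);
  }
  return shared.size;
}

// similarity = shared / min(chunks) * 100, using distinct chunk counts (assumption).
function similarity(linesA, linesB, size = 5) {
  const a = chunkLines(linesA, size);
  const b = chunkLines(linesB, size);
  const smaller = Math.min(new Set(a).size, new Set(b).size);
  if (smaller === 0) return 0;
  return (sharedChunks(a, b) / smaller) * 100;
}
```

For example, files with lines ['a', 'b', 'c'] and ['a', 'b', 'x'] and 2-line windows share one chunk out of two, so this sketch scores them at 50.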
📊 Visual Output
PIKA provides rich terminal visuals:
- ⚠️ Risk Meter — LOW / MEDIUM / HIGH / CRITICAL gauge
- ▓ Heatmap — Color-coded duplicate density per file
- ① Pipeline — Step-by-step algorithm complexity breakdown
- █ Similarity Bars — Visual percentage indicators
- ✓/✗ Verdict — ORIGINAL / SUSPICIOUS / PLAGIARIZED
🔌 API
Use PIKA programmatically:
import { analyze, compareTwo } from 'pika-cli/similarity';
import { collectFiles, readFilesContent } from 'pika-cli/scanner';
// Scan a directory
const paths = await collectFiles('./src');
const files = await readFilesContent(paths);
const result = analyze(files, { chunkSize: 5, minSimilarity: 10 });
console.log(result.summary);
// { totalFiles: 25, totalChunks: 2884, duplicateChunks: 36, pairCount: 20, ... }
// Compare two files (sourceA and sourceB hold the file contents as strings)
const pair = compareTwo('a.js', sourceA, 'b.js', sourceB);
console.log(pair.similarity); // 42.5
⚙️ Configuration
Create .pikaignore in your project root (same syntax as .gitignore):
dist/
*.min.js
vendor/
node_modules/
CLI Flags
pika scan ./src --threshold 20 # Only show pairs ≥ 20% similar
pika scan ./src --chunk 3 # Use 3-line windows (more granular)
pika scan ./src --top 5 # Show top 5 pairs only
🔒 Security
| Protection | Implementation |
|---|---|
| Command Injection | spawnSync with array args (no shell) |
| Path Traversal | Root boundary check + symlink rejection |
| URL Validation | Regex-validated before git clone |
| Resource Limits | 120s timeout, 64MB buffer cap |
| Dependency Safety | No eval, no prototype pollution vectors |
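The first four rows can be sketched as follows. This is not PIKA's actual source: the helper names (isValidGithubUrl, isInsideRoot, cloneRepo) and the URL pattern are assumptions made for illustration, and symlink rejection is left out. CommonJS require is used so the snippet runs as a standalone script.

```javascript
// Illustrative sketch only; helper names and the URL pattern are assumptions.
const path = require('node:path');
const { spawnSync } = require('node:child_process');

// Accept only plain https://github.com/<owner>/<repo> URLs (hypothetical pattern).
const GITHUB_URL = /^https:\/\/github\.com\/[\w.-]+\/[\w.-]+$/;

function isValidGithubUrl(url) {
  return GITHUB_URL.test(url);
}

// True when `candidate`, resolved against `root`, stays inside `root`.
function isInsideRoot(root, candidate) {
  const base = path.resolve(root);
  const rel = path.relative(base, path.resolve(base, candidate));
  const escapes =
    rel === '..' || rel.startsWith('..' + path.sep) || path.isAbsolute(rel);
  return !escapes;
}

// Clone with an argument array: no shell parses the URL, so characters such
// as ';' or '$(...)' reach git as literal text instead of being executed.
function cloneRepo(url, dest) {
  if (!isValidGithubUrl(url)) throw new Error(`Rejected URL: ${url}`);
  const result = spawnSync('git', ['clone', url, dest], {
    timeout: 120_000,            // 120 s, matching the documented timeout
    maxBuffer: 64 * 1024 * 1024, // 64 MB, matching the documented buffer cap
    encoding: 'utf8',
  });
  if (result.error) throw result.error;
  if (result.status !== 0) throw new Error(result.stderr);
  return dest;
}
```

Validating the URL before the clone and passing arguments as an array are independent layers: either one alone blocks an input such as `https://github.com/user/repo; rm -rf /`.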
🏗️ Architecture
pika-cli/
├── src/
│   ├── algorithms/
│   │   ├── pika_rk.c            # C Rabin-Karp engine (78 lines)
│   │   └── rabinKarp.js         # JS wrapper (compile + spawn)
│   ├── commands/
│   │   ├── scan.js              # Directory scanning
│   │   ├── compare.js           # File comparison
│   │   ├── plagiarism.js        # Plagiarism detection
│   │   ├── github.js            # GitHub repo scanner
│   │   ├── duplicates.js        # Duplicate filename finder
│   │   ├── watch.js             # Live file watcher
│   │   ├── paste.js             # Inline paste analysis
│   │   └── report.js            # Export results
│   ├── core/
│   │   ├── similarity.js        # Scoring engine
│   │   ├── scanner.js           # File system walker
│   │   └── session.js           # Session state
│   ├── ui/
│   │   ├── header.js            # ASCII art + image
│   │   ├── renderer.js          # Visual components
│   │   ├── statusBar.js         # Live status
│   │   ├── banner.js            # Welcome screen
│   │   └── diffView.js          # Side-by-side diff
│   ├── utils/
│   │   ├── langDetect.js        # Language-aware normalization
│   │   ├── fileFilter.js        # Ignore rules
│   │   ├── logger.js            # Colored logging
│   │   └── timer.js             # Performance timing
│   ├── shell/
│   │   └── interactiveShell.js  # REPL with history
│   └── index.js                 # Entry point
├── assets/
│   └── pika.png                 # Mascot image
├── tests/
│   ├── run-tests.mjs            # 10 test cases
│   └── fixtures/                # Test data
├── .github/workflows/ci.yml     # CI/CD pipeline
└── package.json
🧪 Testing
# Run all 10 test cases
npm test
# Test cases cover:
# TC-01: Normal — shared code blocks
# TC-02: Normal — unrelated files
# TC-03: Edge — empty corpus
# TC-04: Edge — single file
# TC-05: Edge — identical files
# TC-06: Edge — internal duplicates
# TC-07: Extreme — 1000 files × 50 lines
# TC-08: Extreme — 100 identical files
# TC-09: Extreme — 100 unique files
# TC-10: Extreme — Unicode + comment stripping
🤝 Contributing
git clone https://github.com/SomeshTalligeriDEV/pika-cli--daa.git
cd pika-cli--daa
npm install
npm test
node src/index.js
PRs welcome! Please ensure all 10 tests pass before submitting.
📄 License
MIT © Somesh S Talligeri
