@someshtalligeridev/pika-cli

v1.0.6, published 2 months ago

PIKA - Pattern Inspection & Knowledge Analyzer. CLI for code-similarity detection, plagiarism checking, and duplicate analysis using Rabin-Karp rolling hash with a native C engine.


someshtalligeridev

cli pika code-similarity duplicate-detection plagiarism rabin-karp rolling-hash static-analysis code-quality pattern-detection interactive-shell github-scanner developer-tools

🚀 Installation

npm install -g @someshtalligeridev/pika-cli

Requirements:

Node.js ≥ 18
C compiler (cc, gcc, or clang) on PATH; the C engine is compiled automatically on first run

✨ Features

| Feature | Description |
|---------|-------------|
| 🔍 Code Similarity Scan | Detect duplicate code blocks across entire projects |
| 🕵️ Plagiarism Detection | Check a file against a corpus with visual verdict |
| 🐙 GitHub Repo Scanner | Clone and analyze any public repository |
| 📁 Duplicate File Finder | Find same-named files/images across directories |
| 👁️ Live Watch Mode | Re-scan automatically on file changes |
| 📊 Side-by-Side Diff | Visual comparison of two files |
| 🐚 Interactive Shell | Persistent REPL with tab completion and history |
| ⚡ C Engine Backend | 78-line Rabin-Karp in C, auto-compiled on first run |
| 📈 Visual Analytics | Heatmaps, risk meters, complexity panels |

📦 Quick Start

# Launch interactive shell
pika

# Scan a project
pika scan ./src

# Compare two files
pika compare file1.js file2.js

# Check for plagiarism
pika scan ./homework.js
# then, inside the interactive shell:
plagiarism ./homework.js ./originals/

# Scan a GitHub repo (inside the interactive shell)
github user/repo

🛠️ Commands

| Command | Description |
|---------|-------------|
| `scan <path>` | Recursive duplicate code detection |
| `compare <a> <b>` | Side-by-side diff + similarity score |
| `plagiarism <file> <corpus>` | Plagiarism check with visual verdict |
| `github <url>` | Clone & scan a GitHub repository |
| `duplicates <path>` | Find duplicate file/image names |
| `watch <path>` | Live re-scan on file changes |
| `paste` | Paste code → press Enter on empty line → analyze |
| `report [--format json\|txt\|html]` | Export results to disk |
| `status` | Current session stats |
| `help` | Show all commands |

🧠 Algorithm

PIKA uses the Rabin-Karp rolling hash algorithm implemented in C (78 lines):

hash(chunk) = ( Σ_{i=0}^{n-1} char[i] · 257^(n-i-1) ) mod (10^9 + 7)
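The polynomial hash above can be evaluated with Horner's rule and then slid across a text one character at a time. The following is a minimal JavaScript sketch for illustration only; it is not the package's C engine, and the names `polyHash` and `rollingHashes` are assumptions made for this example.

```javascript
// Illustrative sketch only; the real engine is the C file described in this README.
const BASE = 257;
const MOD = 1_000_000_007;

// Hash of a whole string via Horner's rule:
// ((c0 * BASE + c1) * BASE + c2) ... mod MOD
function polyHash(text) {
  let h = 0;
  for (let i = 0; i < text.length; i++) {
    // h < MOD, so h * BASE + code stays far below Number.MAX_SAFE_INTEGER
    h = (h * BASE + text.charCodeAt(i)) % MOD;
  }
  return h;
}

// Hashes of every window of length n, updated in O(1) per step.
function rollingHashes(text, n) {
  if (n <= 0 || text.length < n) return [];

  // BASE^(n-1) mod MOD: the weight of the character leaving the window
  let high = 1;
  for (let i = 1; i < n; i++) high = (high * BASE) % MOD;

  let h = polyHash(text.slice(0, n));
  const out = [h];
  for (let i = n; i < text.length; i++) {
    const leaving = (text.charCodeAt(i - n) * high) % MOD;
    // Remove the oldest character, shift by BASE, append the newest one
    h = (((h - leaving + MOD) % MOD) * BASE + text.charCodeAt(i)) % MOD;
    out.push(h);
  }
  return out;
}
```

Equal hashes only suggest equal windows; a direct string comparison is still needed to rule out collisions, which is the role of the strcmp verification step in the pipeline.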

Pipeline

① File Collection    O(d)    Walk directory tree
② Normalization      O(n)    Strip comments (language-aware)
③ Chunking           O(n)    5-line sliding windows
④ Hashing (C)        O(n)    Polynomial rolling hash
⑤ Collision Detect   O(k²)   Hash table lookup + strcmp verify
⑥ Pair Scoring       O(p)    similarity = shared/min(chunks) × 100
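As a rough illustration of steps ②, ③ and ⑥, the sketch below normalizes two sources, cuts them into 5-line sliding windows, and scores the pair as shared / min(chunks) × 100. It is a simplified assumption of how these steps fit together, not PIKA's actual code: the function names are hypothetical, normalization only drops blank lines and whole-line `//` comments, and a `Set` of chunk text stands in for hashing plus strcmp verification.

```javascript
// Illustrative sketch of steps 2, 3 and 6; all names are hypothetical.

// Step 2 (simplified): trim lines, drop blanks and whole-line // comments.
function normalize(source) {
  return source
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line !== '' && !line.startsWith('//'));
}

// Step 3: overlapping windows of `size` consecutive lines.
function chunk(lines, size = 5) {
  const chunks = [];
  for (let i = 0; i + size <= lines.length; i++) {
    chunks.push(lines.slice(i, i + size).join('\n'));
  }
  return chunks;
}

// Step 6: similarity = shared / min(chunks) × 100, counted over distinct chunks.
function similarity(sourceA, sourceB, size = 5) {
  const a = new Set(chunk(normalize(sourceA), size));
  const b = new Set(chunk(normalize(sourceB), size));
  if (a.size === 0 || b.size === 0) return 0;

  let shared = 0;
  for (const c of a) {
    if (b.has(c)) shared++;
  }
  return (shared / Math.min(a.size, b.size)) * 100;
}
```

Dividing by the smaller chunk count means a short file fully contained in a long one scores 100, which suits plagiarism checks where the copied file may be a fragment.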

Complexity

| | Average | Worst |
|---|---|---|
| Time | O(n × m) | O(n × m) |
| Space | O(n) | O(n) |

n = total lines across all files, m = number of files

📊 Visual Output

PIKA provides rich terminal visuals:

⚠️ Risk Meter — LOW / MEDIUM / HIGH / CRITICAL gauge
▓ Heatmap — Color-coded duplicate density per file
① Pipeline — Step-by-step algorithm complexity breakdown
█ Similarity Bars — Visual percentage indicators
✓/✗ Verdict — ORIGINAL / SUSPICIOUS / PLAGIARIZED
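To give a feel for the similarity bars listed above, here is a small sketch of how a percentage could be turned into a fixed-width bar. The function name `similarityBar`, the block characters, and the default width are assumptions for illustration; PIKA's renderer may look different.

```javascript
// Hypothetical helper: renders a percentage as a fixed-width text bar.
function similarityBar(percent, width = 20) {
  // Clamp so out-of-range input cannot produce a negative repeat count
  const p = Math.min(100, Math.max(0, percent));
  const filled = Math.round((p / 100) * width);
  return '█'.repeat(filled) + '░'.repeat(width - filled) + ' ' + p.toFixed(1) + '%';
}

console.log(similarityBar(50, 10));
```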

🔌 API

Use PIKA programmatically:

import { analyze, compareTwo } from 'pika-cli/similarity';
import { collectFiles, readFilesContent } from 'pika-cli/scanner';

// Scan a directory
const paths = await collectFiles('./src');
const files = await readFilesContent(paths);
const result = analyze(files, { chunkSize: 5, minSimilarity: 10 });

console.log(result.summary);
// { totalFiles: 25, totalChunks: 2884, duplicateChunks: 36, pairCount: 20, ... }

// Compare two files
const pair = compareTwo('a.js', sourceA, 'b.js', sourceB);
console.log(pair.similarity); // 42.5

⚙️ Configuration

Create .pikaignore in your project root (same syntax as .gitignore):

dist/
*.min.js
vendor/
node_modules/
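The sketch below shows how ignore rules like the ones above might be applied to relative file paths. It deliberately covers only three pattern shapes (directory rules ending in `/`, `*` globs on the file name, and plain file names), not the full .gitignore syntax, and `compileIgnore` is a hypothetical helper rather than part of PIKA's API.

```javascript
// Hypothetical helper covering only three pattern shapes:
// "dir/", "*.ext"-style globs, and plain file names.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function compileIgnore(text) {
  const rules = text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line !== '' && !line.startsWith('#'));

  // relPath is a file path relative to the project root, using "/" separators.
  return function isIgnored(relPath) {
    const segments = relPath.split('/').filter(Boolean);
    const base = segments[segments.length - 1] ?? '';

    return rules.some((rule) => {
      if (rule.endsWith('/')) {
        // Directory rule: matches any directory segment above the file
        return segments.slice(0, -1).includes(rule.slice(0, -1));
      }
      // Glob rule: "*" matches any run of characters within the file name
      const pattern = rule.split('*').map(escapeRegex).join('.*');
      return new RegExp('^' + pattern + '$').test(base);
    });
  };
}

const isIgnored = compileIgnore('dist/\n*.min.js\nvendor/\nnode_modules/');
console.log(isIgnored('src/app.min.js'), isIgnored('src/app.js'));
```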

CLI Flags

pika scan ./src --threshold 20    # Only show pairs ≥ 20% similar
pika scan ./src --chunk 3         # Use 3-line windows (more granular)
pika scan ./src --top 5           # Show top 5 pairs only
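Conceptually, `--threshold` and `--top` act as a filter and a cut-off on the scored pairs. The sketch below mirrors that behavior on plain objects; `selectPairs` and the `fileA`/`fileB` fields are hypothetical names, and only the `similarity` field is taken from the API example in this README.

```javascript
// Hypothetical post-processing of scored pairs, mirroring --threshold and --top.
function selectPairs(pairs, { threshold = 0, top = Infinity } = {}) {
  return pairs
    .filter((pair) => pair.similarity >= threshold) // keep pairs at or above the threshold
    .sort((x, y) => y.similarity - x.similarity)    // most similar first
    .slice(0, top);                                 // keep only the top N
}

const pairs = [
  { fileA: 'a.js', fileB: 'b.js', similarity: 42.5 },
  { fileA: 'a.js', fileB: 'c.js', similarity: 12 },
  { fileA: 'b.js', fileB: 'c.js', similarity: 80 },
];

console.log(selectPairs(pairs, { threshold: 20, top: 5 }));
```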

🔒 Security

| Protection | Implementation |
|---|---|
| Command Injection | spawnSync with array args (no shell) |
| Path Traversal | Root boundary check + symlink rejection |
| URL Validation | Regex-validated before git clone |
| Resource Limits | 120s timeout, 64MB buffer cap |
| Dependency Safety | No eval, no prototype pollution vectors |
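The URL validation and command injection rows can be illustrated together. The sketch below validates a repository reference with a regular expression and builds an argument array for `git clone` without ever composing a shell string. The regex, the `--depth 1` choice, and the name `buildCloneArgs` are assumptions for this example, not PIKA's actual implementation.

```javascript
// Hypothetical validator: accepts "user/repo" or "https://github.com/user/repo(.git)".
const GITHUB_REPO =
  /^(?:https:\/\/github\.com\/)?([A-Za-z0-9][A-Za-z0-9-]{0,38})\/([A-Za-z0-9._-]{1,100}?)(?:\.git)?\/?$/;

function buildCloneArgs(input, destDir) {
  const match = GITHUB_REPO.exec(input.trim());
  if (!match) {
    throw new Error(`Not a valid GitHub repository reference: ${input}`);
  }
  const [, owner, repo] = match;
  if (repo === '.' || repo === '..') {
    throw new Error(`Not a valid GitHub repository name: ${repo}`);
  }
  const url = `https://github.com/${owner}/${repo}.git`;

  // Each argument is its own array element, so no shell ever parses the input.
  // "--" stops git from treating the following values as options.
  return ['clone', '--depth', '1', '--', url, destDir];
}

console.log(buildCloneArgs('user/repo', './tmp/repo'));
```

An array like this could then be passed to Node's `spawnSync('git', args, { timeout, maxBuffer })` from `node:child_process`, where the `timeout` and `maxBuffer` options would correspond to the resource limits in the table.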

🏗️ Architecture

pika-cli/
├── src/
│   ├── algorithms/
│   │   ├── pika_rk.c          # C Rabin-Karp engine (78 lines)
│   │   └── rabinKarp.js       # JS wrapper (compile + spawn)
│   ├── commands/
│   │   ├── scan.js            # Directory scanning
│   │   ├── compare.js         # File comparison
│   │   ├── plagiarism.js      # Plagiarism detection
│   │   ├── github.js          # GitHub repo scanner
│   │   ├── duplicates.js      # Duplicate filename finder
│   │   ├── watch.js           # Live file watcher
│   │   ├── paste.js           # Inline paste analysis
│   │   └── report.js          # Export results
│   ├── core/
│   │   ├── similarity.js      # Scoring engine
│   │   ├── scanner.js         # File system walker
│   │   └── session.js         # Session state
│   ├── ui/
│   │   ├── header.js          # ASCII art + image
│   │   ├── renderer.js        # Visual components
│   │   ├── statusBar.js       # Live status
│   │   ├── banner.js          # Welcome screen
│   │   └── diffView.js        # Side-by-side diff
│   ├── utils/
│   │   ├── langDetect.js      # Language-aware normalization
│   │   ├── fileFilter.js      # Ignore rules
│   │   ├── logger.js          # Colored logging
│   │   └── timer.js           # Performance timing
│   ├── shell/
│   │   └── interactiveShell.js # REPL with history
│   └── index.js               # Entry point
├── assets/
│   └── pika.png               # Mascot image
├── tests/
│   ├── run-tests.mjs          # 10 test cases
│   └── fixtures/              # Test data
├── .github/workflows/ci.yml   # CI/CD pipeline
└── package.json

🧪 Testing

# Run all 10 test cases
npm test

# Test cases cover:
# TC-01: Normal — shared code blocks
# TC-02: Normal — unrelated files
# TC-03: Edge — empty corpus
# TC-04: Edge — single file
# TC-05: Edge — identical files
# TC-06: Edge — internal duplicates
# TC-07: Extreme — 1000 files × 50 lines
# TC-08: Extreme — 100 identical files
# TC-09: Extreme — 100 unique files
# TC-10: Extreme — Unicode + comment stripping

🤝 Contributing

git clone https://github.com/SomeshTalligeriDEV/pika-cli--daa.git
cd pika-cli--daa
npm install
npm test
node src/index.js

PRs welcome! Please ensure all 10 tests pass before submitting.