polydup
v0.5.5
Cross-language duplicate code detector - Node.js bindings
@polydup/core
Node.js bindings for PolyDup - a cross-language duplicate code detector powered by Rust.
Features
- 🚀 Fast: Built in Rust with Tree-sitter for efficient parsing
- 🔄 Multi-language: Supports Rust, Python, JavaScript/TypeScript
- ⚡ Non-blocking: Async API runs on background threads
- Type-2 clones: Detects structurally similar code with different variable names
- Detailed reports: Statistics and similarity scores
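To make the "Type-2 clone" idea concrete, here is a toy sketch. It is only an illustration of the concept, not PolyDup's actual algorithm (the library uses Tree-sitter parsing in Rust). The `normalize` function, the keyword list, and both sample snippets are hypothetical. Once every identifier is replaced by a placeholder, two functions that differ only in naming produce the same token sequence.

```javascript
// Toy illustration of a Type-2 clone; NOT how PolyDup works internally.
// A small, deliberately incomplete keyword list (assumption for this sketch).
const KEYWORDS = new Set(['function', 'return', 'let', 'const', 'for', 'if', 'else']);

// Hypothetical helper: splits source into crude tokens and replaces every
// non-keyword identifier with ID and every integer literal with NUM.
function normalize(source) {
  const tokens = source.match(/[A-Za-z_$][\w$]*|\d+|[^\s\w]/g) || [];
  return tokens
    .map(tok => {
      if (/^[A-Za-z_$]/.test(tok)) return KEYWORDS.has(tok) ? tok : 'ID';
      if (/^\d/.test(tok)) return 'NUM';
      return tok; // punctuation and operators are kept as-is
    })
    .join(' ');
}

// Same structure, different names.
const snippetA = 'function sum(xs) { let total = 0; for (const x of xs) { total += x; } return total; }';
const snippetB = 'function add(items) { let acc = 0; for (const item of items) { acc += item; } return acc; }';

// Structurally different code for contrast.
const snippetC = 'function first(xs) { return xs[0]; }';

console.log(normalize(snippetA) === normalize(snippetB)); // true
console.log(normalize(snippetA) === normalize(snippetC)); // false
```

A real detector works on parsed syntax trees and hashed token windows rather than regular expressions, so treat this purely as intuition for why renamed variables do not hide a duplicate.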
Installation
npm install @polydup/core
Quick Start
const { findDuplicates } = require('@polydup/core');

findDuplicates(['./src', './lib'], 50, 0.85)
  .then(report => {
    console.log(`Found ${report.duplicates.length} duplicates`);
    console.log(`Scanned ${report.filesScanned} files in ${report.stats.durationMs}ms`);
    report.duplicates.forEach(dup => {
      console.log(`${dup.file1} ↔️ ${dup.file2} (${(dup.similarity * 100).toFixed(1)}%)`);
    });
  })
  .catch(err => console.error('Scan failed:', err));
API
findDuplicates(paths, minBlockSize?, threshold?)
Asynchronously scans for duplicate code (recommended).
Parameters:
- paths: string[] - File or directory paths to scan
- minBlockSize?: number - Minimum code block size in tokens (default: 50)
- threshold?: number - Similarity threshold 0.0-1.0 (default: 0.85)
Returns: Promise<Report>
Example:
const report = await findDuplicates(['./src'], 30, 0.9);
findDuplicatesSync(paths, minBlockSize?, threshold?)
Synchronously scans for duplicate code. This blocks the event loop until the scan finishes, so use it sparingly.
Parameters: Same as findDuplicates
Returns: Report
Example:
const report = findDuplicatesSync(['./src'], 50, 0.85);
version()
Returns the library version string.
Returns: string
TypeScript
Type definitions are automatically generated:
import { findDuplicates, Report, DuplicateMatch } from '@polydup/core';

const report: Report = await findDuplicates(['./src']);
report.duplicates.forEach((dup: DuplicateMatch) => {
  console.log(`${dup.file1} ↔️ ${dup.file2}`);
});
Report Structure
interface Report {
  filesScanned: number;
  functionsAnalyzed: number;
  duplicates: DuplicateMatch[];
  stats: ScanStats;
}

interface DuplicateMatch {
  file1: string;
  file2: string;
  startLine1: number;
  startLine2: number;
  length: number;     // Block size in tokens
  similarity: number; // 0.0 - 1.0
  hash: string;       // Hash signature
}

interface ScanStats {
  totalLines: number;
  totalTokens: number;
  uniqueHashes: number;
  durationMs: number;
}
Performance Tips
- Use async API: Always prefer findDuplicates() over findDuplicatesSync() to avoid blocking
- Adjust window size: Smaller minBlockSize finds more matches but may include false positives
- Filter results: Apply post-processing to filter duplicates by file patterns or directories
- Parallel scans: Use Promise.all for multiple independent scans
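The "Filter results" tip can be sketched as a small post-processing step. The `excludeMatches` helper below is hypothetical and not part of @polydup/core, and the sample array is hand-written to mimic the shape of `report.duplicates` so that the snippet runs without the library installed. In real use you would pass `report.duplicates` from a scan instead.

```javascript
// Hypothetical helper, not part of @polydup/core: drops every match in which
// either file path matches one of the given exclusion patterns.
function excludeMatches(duplicates, excludePatterns) {
  return duplicates.filter(dup =>
    !excludePatterns.some(re => re.test(dup.file1) || re.test(dup.file2))
  );
}

// Hand-written sample shaped like report.duplicates (illustrative values only).
const sampleDuplicates = [
  { file1: 'src/a.js', file2: 'src/b.js', similarity: 0.97 },
  { file1: 'src/a.js', file2: 'node_modules/x/index.js', similarity: 0.91 },
  { file1: 'src/a.test.js', file2: 'src/b.test.js', similarity: 1.0 },
];

// Ignore vendored code and test files.
const kept = excludeMatches(sampleDuplicates, [/node_modules\//, /\.test\.js$/]);
console.log(kept.map(d => `${d.file1} <-> ${d.file2}`));
```

Use patterns without the `g` flag here, because `RegExp.prototype.test` is stateful on global regular expressions and would give inconsistent results across calls.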
Example: Custom Analysis
const { findDuplicates } = require('@polydup/core');

async function analyzeCrossProject() {
  // Scan each project independently, in parallel
  const [frontend, backend] = await Promise.all([
    findDuplicates(['./frontend/src'], 40, 0.9),
    findDuplicates(['./backend/src'], 40, 0.9),
  ]);
  console.log('Frontend duplicates:', frontend.duplicates.length);
  console.log('Backend duplicates:', backend.duplicates.length);

  // Find cross-project duplicates: one file in each project, in either order
  const allPaths = ['./frontend', './backend'];
  const crossProject = await findDuplicates(allPaths, 50, 0.95);
  const crossDuplicates = crossProject.duplicates.filter(d =>
    (d.file1.includes('frontend') && d.file2.includes('backend')) ||
    (d.file1.includes('backend') && d.file2.includes('frontend'))
  );
  console.log('Cross-project duplicates:', crossDuplicates.length);
}

analyzeCrossProject().catch(err => console.error('Scan failed:', err));
Building from Source
cd crates/polydup-node
npm install
npm run build
npm test
Generating Type Definitions
Type definitions are auto-generated during build:
npm run typegen
This creates index.d.ts with TypeScript definitions for all exported functions.
Supported Platforms
- macOS (Intel & Apple Silicon)
- Linux (x64 & ARM64)
- Windows (x64)
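If you want to fail early on an unsupported machine, the list above can be turned into a small runtime check. This is a sketch only: `isSupported` is a hypothetical helper, not something @polydup/core exports, and the table simply restates the list using the values Node.js reports in `process.platform` and `process.arch`. It does not account for anything the list leaves unspecified, such as libc variants on Linux.

```javascript
// Platform/architecture pairs from the list above, expressed with the
// identifiers Node.js uses for process.platform and process.arch.
const SUPPORTED = new Map([
  ['darwin', ['x64', 'arm64']], // macOS (Intel & Apple Silicon)
  ['linux', ['x64', 'arm64']],  // Linux (x64 & ARM64)
  ['win32', ['x64']],           // Windows (x64)
]);

// Hypothetical helper, not part of @polydup/core.
// Defaults to the machine the script is running on.
function isSupported(platform = process.platform, arch = process.arch) {
  const archs = SUPPORTED.get(platform);
  return archs !== undefined && archs.includes(arch);
}

if (!isSupported()) {
  console.warn(`No prebuilt binding listed for ${process.platform}/${process.arch}`);
}
```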
License
MIT
Repository
https://github.com/wiesnerbernard/polydup
