jscpd
v4.2.2
detector of copy/paste in files
jscpd
Copy/paste detector for programming source code, supports 223 formats. AI-ready with AI skills, MCP server and token-efficient reporter.
Copy/paste is a common source of technical debt in many projects. jscpd finds duplicated blocks in 223 programming languages and digital document formats. It uses the Rabin-Karp algorithm to search for duplications.
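To give a feel for the Rabin-Karp idea, here is a minimal sketch. It is not jscpd's implementation: the function names (`tokenCode`, `sameWindow`, `findRepeatedWindows`), the constants `BASE` and `MOD`, and the use of plain strings as tokens are all assumptions made for illustration. It hashes every window of `size` consecutive tokens with a rolling polynomial hash and reports windows whose content was seen earlier.

```typescript
// Illustrative Rabin-Karp sketch over tokens; not jscpd's actual code.
// BASE and MOD are arbitrary constants chosen for this example.
const BASE = 257;
const MOD = 1000003;

// Reduce a token string to an integer that feeds the rolling hash.
function tokenCode(token: string): number {
  let code = 0;
  for (let i = 0; i < token.length; i++) {
    code = (code * 31 + token.charCodeAt(i)) % MOD;
  }
  return code;
}

// Confirm a hash match by comparing the tokens themselves (guards against collisions).
function sameWindow(tokens: string[], a: number, b: number, size: number): boolean {
  for (let i = 0; i < size; i++) {
    if (tokens[a + i] !== tokens[b + i]) return false;
  }
  return true;
}

// Returns [earlierStart, laterStart] pairs for every repeated window of `size` tokens.
function findRepeatedWindows(tokens: string[], size: number): Array<[number, number]> {
  const pairs: Array<[number, number]> = [];
  if (size <= 0 || tokens.length < size) return pairs;

  const codes = tokens.map(tokenCode);
  let highPow = 1; // BASE^(size-1) mod MOD, the weight of the outgoing token
  for (let i = 1; i < size; i++) highPow = (highPow * BASE) % MOD;

  let hash = 0;
  for (let i = 0; i < size; i++) hash = (hash * BASE + codes[i]) % MOD;

  const seen = new Map<number, number[]>(); // hash -> window start indices
  for (let start = 0; start + size <= tokens.length; start++) {
    const earlier = seen.get(hash) ?? [];
    for (const prev of earlier) {
      if (sameWindow(tokens, prev, start, size)) pairs.push([prev, start]);
    }
    earlier.push(start);
    seen.set(hash, earlier);

    const next = start + size;
    if (next < tokens.length) {
      // Roll the hash: drop tokens[start], then append tokens[next].
      hash = (hash - ((codes[start] * highPow) % MOD) + MOD) % MOD;
      hash = (hash * BASE + codes[next]) % MOD;
    }
  }
  return pairs;
}

// The window "a b c" starts at index 0 and again at index 4.
console.log(findRepeatedWindows("a b c x a b c".split(" "), 3));
```

The rolling update keeps the cost per window constant, which is why this family of algorithms suits scanning large token streams; a real detector would additionally merge adjacent matching windows into one clone.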
Table of contents
- What's New
- Features
- Getting started
- JSCPD Server
- Shebang Detection
- Options
- Config File
- Ignored Blocks
- Reporters
- API
- Changelog
- Who uses jscpd
- Contributors
- Backers
- Sponsors
- License
Features
- Detect duplications in programming source code using the semantics of programming languages; can skip comments, empty lines, etc.
- Detect duplications in embedded blocks of code, like `<script>` or `<style>` sections in HTML
- Detect duplications in executable script files without extensions via shebang detection
- Detect duplications in Svelte (`.svelte`), Astro (`.astro`), Vue SFC (`.vue`), and Markdown, tokenized per-block/per-section with cross-format duplicate detection across file types
- Support for Apex, CFML/ColdFusion, and GDScript (Godot)
- Blame authors of duplications
- Generate XML report in pmd-cpd format, JSON report, HTML report
- Token-efficient `ai` reporter (~79% fewer tokens) for piping to LLM tools
- Integrate with CI systems, use thresholds for level of duplications
What's New
v4.2.x
- Custom tokenizer backend: replaced `prismjs` with a custom backend built on the reprism grammar engine. ~11.5% faster tokenization on real projects (avg 1126ms → 997ms on a 548-file, 223-format scan).
- Cross-format detection: Vue SFC (`.vue`), Svelte (`.svelte`), Astro (`.astro`), and Markdown files are tokenized per-block/per-section, enabling duplicate detection across file types (e.g. a `<script>` block in `.vue` matched against `.ts` files).
- New formats: Apex, CFML/ColdFusion, GDScript, and 70+ additional formats (223 total, up from 152)
- `--skipComments`: shorthand flag for `--mode weak` (strip comments before detection)
- Shebang detection: auto-detect language for extensionless executable scripts
- `--store-path`: configure LevelDB cache directory for parallel runs
- `--formats-names`: map specific filenames (e.g. `Makefile`, `Dockerfile`) to a format
- `--noTips`: suppress tip output in CI environments
Bug Fixes
- Entire-file duplicates silently dropped: RabinKarp flushed the pending clone on a store hit at end-of-file instead of on a miss, causing files that are complete copies of each other to go undetected. Fixed in `@jscpd/core` (#728).
- ReDoS hang on Lisp/Elisp files: the Lisp string regex could catastrophically backtrack (O(2ⁿ)) on unterminated strings. Replaced with a linear alternative in `@jscpd/tokenizer` (#737).
- Process crash on malformed `package.json`: invalid JSON in `package.json` threw an unhandled `SyntaxError` that killed the process. Now emits a warning and continues (#739).
- Vue SFC cross-file detection broken: the detector used the file-level format (`vue`) as the store namespace for all SFC blocks, preventing cross-file matches. Namespace now reflects each block's resolved sub-format.
- Vue SFC incorrect column numbers: tokens on the first line of a block carried block-relative column 1 instead of the file-absolute column.
- 50 dependency security vulnerabilities remediated across the monorepo.
Getting started
Installation
$ npm install -g jscpd
Usage
$ npx jscpd /path/to/source
or
$ jscpd /path/to/code
or
$ jscpd --pattern "src/**/*.js"
JSCPD Server
If you need a standalone application that provides an API for detecting code duplication, you can use jscpd-server. It allows you to integrate duplication detection into your services or tools via HTTP API.
Shebang Detection
jscpd can detect duplications in script files that have no file extension, such as shell scripts, Python scripts, or other executables deployed without an extension (e.g. deploy, build, entrypoint).
How it works
When jscpd encounters a file with no recognized extension, it checks two conditions:
- The file has the executable bit set (`chmod +x`)
- The first line is a shebang (`#!...`)
If both conditions are met, jscpd reads the interpreter from the shebang line and maps it to a supported format.
Supported interpreters
| Interpreter | Detected format |
|-------------|-----------------|
| bash, sh, zsh, fish, dash, ksh | shell |
| python, python3, python2 | python |
| node, nodejs | javascript |
| ruby | ruby |
| perl | perl |
| php | php |
| lua | lua |
| tclsh, wish | tcl |
| Rscript | r |
| groovy | groovy |
| swift | swift |
| kotlin | kotlin |
Both direct (#!/usr/bin/bash) and env-mediated (#!/usr/bin/env python3) shebangs are supported. Version suffixes are stripped automatically (python3.11 → python).
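The interpreter lookup described above can be sketched roughly as follows. This is an illustration only, not jscpd's code: `formatFromShebang`, `baseName`, and `INTERPRETER_FORMATS` are hypothetical names, the handling of `env` options is an assumption, and the executable-bit check is left out because it needs file system access. The table entries are the ones listed in this section.

```typescript
// Interpreter table taken from the README; the parsing logic is a hypothetical sketch.
const INTERPRETER_FORMATS = new Map<string, string>([
  ["bash", "shell"], ["sh", "shell"], ["zsh", "shell"],
  ["fish", "shell"], ["dash", "shell"], ["ksh", "shell"],
  ["python", "python"], ["python3", "python"], ["python2", "python"],
  ["node", "javascript"], ["nodejs", "javascript"],
  ["ruby", "ruby"], ["perl", "perl"], ["php", "php"], ["lua", "lua"],
  ["tclsh", "tcl"], ["wish", "tcl"], ["Rscript", "r"],
  ["groovy", "groovy"], ["swift", "swift"], ["kotlin", "kotlin"],
]);

// Last path segment, e.g. "/usr/bin/bash" -> "bash".
function baseName(path: string): string {
  const segments = path.split("/");
  return segments[segments.length - 1] ?? "";
}

// Maps the first line of a file to a format name, or undefined if no match.
function formatFromShebang(firstLine: string): string | undefined {
  if (!firstLine.startsWith("#!")) return undefined;
  const parts = firstLine.slice(2).trim().split(/\s+/).filter((p) => p !== "");
  const first = parts[0];
  if (first === undefined) return undefined;

  let interpreter = baseName(first);
  if (interpreter === "env") {
    // env-mediated shebang: take the first argument that is neither an option nor VAR=value.
    const target = parts.slice(1).find((p) => !p.startsWith("-") && !p.includes("="));
    if (target === undefined) return undefined;
    interpreter = baseName(target);
  }

  // Try the exact name first, then strip a trailing version suffix (python3.11 -> python).
  return (
    INTERPRETER_FORMATS.get(interpreter) ??
    INTERPRETER_FORMATS.get(interpreter.replace(/[\d.]+$/, ""))
  );
}

console.log(formatFromShebang("#!/usr/bin/bash"));           // shell
console.log(formatFromShebang("#!/usr/bin/env python3.11")); // python
console.log(formatFromShebang("#!/usr/bin/awk -f"));         // undefined (not in the table)
```

A `Map` is used instead of a plain object so that interpreter names such as `constructor` cannot accidentally resolve to inherited properties.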
Limitations
- Files without the executable bit are not inspected for shebangs and are skipped if they have no recognized extension — same behaviour as before.
- Symlinks are always excluded from shebang detection.
- If your interpreter is not in the table above, use `--formats-names` to map specific filenames to a format.
Options
Pattern
Glob pattern for the files to scan.
- Cli options: `--pattern`, `-p`
- Type: string
- Default: "**/*"
Example:
$ jscpd --pattern "**/*.js"
Min Tokens
Minimum size of a code block in tokens. Blocks with fewer tokens than min-tokens are skipped.
- Cli options: `--min-tokens`, `-k`
- Type: number
- Default: 50
This option is called minTokens in the config file.
Min Lines
Minimum size of a code block in lines. Blocks with fewer lines than min-lines are skipped.
- Cli options: `--min-lines`, `-l`
- Type: number
- Default: 5
Max Lines
Maximum file size in lines. Files with more lines than max-lines are skipped.
- Cli options: `--max-lines`, `-x`
- Type: number
- Default: 1000
Max Size
Maximum file size in bytes. Files larger than max-size are skipped.
- Cli options: `--max-size`, `-z`
- Type: string
- Default: 100kb
Threshold
The threshold for the duplication level. If the current level of duplication is bigger than the threshold, jscpd exits with an error.
- Cli options: `--threshold`, `-t`
- Type: number
- Default: null
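If you want to apply a similar gate yourself in a custom CI script, the comparison might look like the sketch below. It assumes the JSON report shape shown in the JSON reporter example of this README (`statistic.total.percentage`) and treats the threshold as a percentage; `exceedsThreshold` and the interfaces are hypothetical names, and this is not how jscpd itself implements the check.

```typescript
// Subset of the JSON report shape shown in this README's JSON reporter example.
interface ReportTotal {
  lines: number;
  clones: number;
  duplicatedLines: number;
  percentage: number;
}
interface JscpdReport {
  statistic: { total: ReportTotal };
}

// Hypothetical helper: true when the duplication percentage is above the threshold.
// A null threshold (the documented default) disables the check.
function exceedsThreshold(report: JscpdReport, threshold: number | null): boolean {
  if (threshold === null) return false;
  return report.statistic.total.percentage > threshold;
}

// Values copied from the "total" section of the sample report.
const sampleReport: JscpdReport = {
  statistic: { total: { lines: 297, clones: 5, duplicatedLines: 26, percentage: 45.33 } },
};

console.log(exceedsThreshold(sampleReport, 10));   // true: 45.33 > 10
console.log(exceedsThreshold(sampleReport, 50));   // false
console.log(exceedsThreshold(sampleReport, null)); // false
```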
Config
The path to the configuration file. The config must be in JSON format. The config file supports the same options as the CLI.
- Cli options: `--config`, `-c`
- Type: path
- Default: null
Ignore
Glob patterns to exclude from analysis. Separate multiple globs with commas. Example:
$ jscpd --ignore "**/*.min.js,**/*.map" /path/to/files
- Cli options: `--ignore`, `-i`
- Type: string
- Default: null
Reporters
The list of reporters. Reporters output information about clones and the detection process.
Available reporters:
- console - report about clones to console;
- ai - compact, token-efficient clone list suited for piping to AI tools;
- consoleFull - report about clones to console with blocks of code;
- json - output `jscpd-report.json` file with clones report in json format;
- xml - output `jscpd-report.xml` file with clones report in xml format;
- csv - output `jscpd-report.csv` file with clones report in csv format;
- markdown - output `jscpd-report.md` file with clones report in markdown format;
- html - generate html report to `html/` folder;
- sarif - generate a report in SARIF format (https://github.com/oasis-tcs/sarif-spec), save it to `jscpd-sarif.json` file;
- verbose - output a lot of debug information to console;
Note: A reporter can be developed manually, see @jscpd/finder package.
- Cli options: `--reporters`, `-r`
- Type: string
- Default: console
Output
The path to directory for reports. JSON and XML reports will be saved there.
- Cli options: `--output`, `-o`
- Type: path
- Default: ./report/
Mode
The mode of detection quality.
- `strict` - use all types of symbols as token, skip only blocks marked as ignored.
- `mild` - skip blocks marked as ignored and new lines and empty symbols.
- `weak` - skip blocks marked as ignored and new lines and empty symbols and comments.
Note: A mode can be developed manually, see API section.
- Cli options: `--mode`, `-m`
- Type: string
- Default: mild
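Conceptually, each mode can be thought of as a predicate that decides which tokens take part in detection. The sketch below illustrates that layering under stated assumptions: the `SketchToken` shape and its `type` values are hypothetical and chosen only to mirror the three descriptions above; jscpd's real token interface and mode signature may differ.

```typescript
// Hypothetical token shape for this sketch; jscpd's real token interface may differ.
interface SketchToken {
  type: "code" | "comment" | "new_line" | "empty" | "ignore";
  value: string;
}
type ModeFilter = (token: SketchToken) => boolean;

// strict: keep everything except blocks marked as ignored.
const strictMode: ModeFilter = (t) => t.type !== "ignore";
// mild: additionally drop new lines and empty symbols.
const mildMode: ModeFilter = (t) => strictMode(t) && t.type !== "new_line" && t.type !== "empty";
// weak: additionally drop comments.
const weakMode: ModeFilter = (t) => mildMode(t) && t.type !== "comment";

const sampleTokens: SketchToken[] = [
  { type: "comment", value: "// sum" },
  { type: "new_line", value: "\n" },
  { type: "code", value: "return" },
  { type: "empty", value: " " },
  { type: "code", value: "a + b" },
  { type: "ignore", value: "/* jscpd:ignore-start */" },
];

console.log(sampleTokens.filter(strictMode).length); // 5
console.log(sampleTokens.filter(mildMode).length);   // 3
console.log(sampleTokens.filter(weakMode).length);   // 2
```

Each mode builds on the previous one, so a weaker mode always sees a subset of the tokens a stricter mode sees.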
Skip Comments
Ignore comments during detection. Shorthand for --mode weak; comments are stripped before the duplicate-detection pass so comment-only blocks are never reported as clones.
If --mode is also provided, --mode takes precedence.
Example:
$ jscpd --skipComments /path/to/source
- Cli options: `--skipComments`
- Type: boolean
- Default: false
Format
The list of formats to scan for duplications. 223 formats are available.
Example:
$ jscpd --format "php,javascript,markup,css" /path/to/files
- Cli options: `--format`, `-f`
- Type: string
- Default: {all formats}
Blame
Get information about authors and dates of duplicated blocks from git.
- Cli options: `--blame`, `-b`
- Type: boolean
- Default: false
Silent
Reduce console output to a short summary.
Example:
$ jscpd /path/to/source --silent
Duplications detection: Found 60 exact clones with 3414(46.81%) duplicated lines in 100 (31 formats) files.
Execution Time: 1381.759ms
- Cli options: `--silent`, `-s`
- Type: boolean
- Default: false
Absolute
Use the absolute path in reports.
- Cli options: `--absolute`, `-a`
- Type: boolean
- Default: false
Ignore Case
Ignore case of symbols in code (experimental).
- Cli options: `--ignoreCase`
- Type: boolean
- Default: false
No Symlinks
Do not follow symlinks.
- Cli options: `--noSymlinks`, `-n`
- Type: boolean
- Default: false
Skip Local
Detect duplications between different folders only. For `--skipLocal` to work correctly, provide more than one path.
Example:
jscpd --skipLocal /path/to/folder1/ /path/to/folder2/
This detects clones between separate folders only; clones within the same folder are skipped.
- Cli options: `--skipLocal`
- Type: boolean
- Default: false
Formats Extensions
Define the list of formats with their file extensions. 223 formats are available.
In the following example jscpd will analyze *.es and *.es6 files as javascript and *.dt files as dart (the value is quoted so the shell does not treat `;` as a command separator):
$ jscpd --formats-exts "javascript:es,es6;dart:dt" /path/to/code
Note: formats defined in this option override the default configuration, so you should define all the formats you need manually or create two configurations for running jscpd.
- Cli options: `--formats-exts`
- Type: string
- Default: null
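The `format:item1,item2;format2:item3` syntax is easy to get wrong, so a small parser can help when validating values before passing them to the CLI. The following is a sketch of how such a string could be split; `parseFormatsMap` is a hypothetical helper, not part of jscpd, and how jscpd itself treats whitespace or malformed entries is not covered here.

```typescript
// Hypothetical helper: parse "format:a,b;format2:c" into a map of format -> items.
function parseFormatsMap(spec: string): Map<string, string[]> {
  const result = new Map<string, string[]>();
  for (const entry of spec.split(";")) {
    const trimmed = entry.trim();
    if (trimmed === "") continue;

    const separator = trimmed.indexOf(":");
    if (separator <= 0) continue; // skip entries without a format name

    const format = trimmed.slice(0, separator).trim();
    const items = trimmed
      .slice(separator + 1)
      .split(",")
      .map((item) => item.trim())
      .filter((item) => item !== "");

    // Merge repeated formats instead of overwriting earlier entries.
    result.set(format, [...(result.get(format) ?? []), ...items]);
  }
  return result;
}

const parsed = parseFormatsMap("javascript:es,es6;dart:dt");
console.log(parsed.get("javascript")); // [ 'es', 'es6' ]
console.log(parsed.get("dart"));       // [ 'dt' ]
```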
Formats Names
Define the list of formats for files matched by exact filename (no extension required). This is independent of --formats-exts and does not affect extension-based detection.
Use this when you have extensionless files that are not covered by shebang detection — for example Makefile, Dockerfile, Jenkinsfile, or any script not starting with #!/.
$ jscpd --formats-names makefile:Makefile,GNUmakefile /path/to/code
$ jscpd --formats-names "docker:Dockerfile;makefile:Makefile" /path/to/code
The syntax mirrors `--formats-exts`: `format:name1,name2;format2:name3`. Quote the value when it contains `;` so the shell does not split the command.
- Cli options: `--formats-names`
- Type: string
- Default: null
Store
Stores are used to collect information about code. By default, all information is kept in memory.
Available stores:
- leveldb - stores all data in files. Recommended for big repositories. Install @jscpd/leveldb-store before using it;
Note: A store can be developed manually, see @jscpd/finder package and @jscpd/leveldb-store as example.
- Cli options: `--store`
- Type: string
- Default: null
Store Path
The directory used by the store for its cache files. By default, --store leveldb creates a .jscpd/ directory in the current working directory. Use --store-path to override this location.
This is especially useful when running multiple jscpd processes in parallel — give each process a unique path to avoid LevelDB file conflicts:
# Two parallel runs, each with its own isolated cache
jscpd /data/files/1 /data/repo/ --store leveldb --store-path /tmp/jscpd-run1 --reporters json
jscpd /data/files/2 /data/repo/ --store leveldb --store-path /tmp/jscpd-run2 --reporters json
Can also be set in the config file:
{
"store": "leveldb",
"storePath": "/tmp/my-jscpd-cache"
}
- Cli options: `--store-path`
- Type: string
- Default: `.jscpd` (relative to current working directory)
Ignore Pattern
Ignore code blocks matching the regexp patterns.
- Cli options: `--ignore-pattern`
- Type: string
- Default: null
Example:
$ jscpd /path/to/source --ignore-pattern "import.*from\s*'.*'"
Excludes import statements from the calculation.
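Before relying on a pattern, it can be worth checking what it actually matches. The sketch below runs the example pattern as a plain JavaScript regular expression against a few sample lines; it only shows regex behaviour and makes no claim about how jscpd applies the pattern internally. The sample lines and variable names are made up for illustration.

```typescript
// The pattern from the example above, written as a JavaScript regular expression.
const importPattern = /import.*from\s*'.*'/;

const sourceLines = [
  "import lodash from 'lodash';",
  "import {User} from './models';",
  'import fs from "fs";',
  "const total = 1;",
];

// Keep only the lines the pattern matches.
const matchedLines = sourceLines.filter((line) => importPattern.test(line));
console.log(matchedLines);
// Only the two single-quoted imports match; the double-quoted import does not.
```

If a code base mixes quote styles, a character class such as `['"]` in place of `'` would cover both.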
No Tips
By default, jscpd prints a few tip lines after the timer output:
time: 1.234s
💡 Auto-refactor with AI: npx skills add kucherenko/jscpd --skill dry-refactoring
🎩 New: Gangsta Agents — discipline your AI coding → gangsta.page
💖 Sponsor jscpd → https://opencollective.com/jscpd
Use `--noTips` to suppress these lines (useful in CI environments or when piping output).
$ jscpd --noTips /path/to/source
Tips are also automatically suppressed when `--silent` is active.
- Cli options: `--noTips`
- Type: boolean
- Default: false
Config File
Put a .jscpd.json file in the root of the project:
{
"threshold": 0,
"reporters": ["html", "console", "badge"],
"ignore": ["**/__snapshots__/**"],
"absolute": true
}
You can also use a jscpd section in package.json:
{
...
"jscpd": {
"threshold": 0.1,
"reporters": ["html", "console", "badge"],
"ignore": ["**/__snapshots__/**"],
"absolute": true,
"gitignore": true
}
...
}
Exit code
By default, the tool exits with code 0 even when code duplications were detected. This behaviour can be changed by specifying a custom exit code for error states.
Example:
jscpd --exitCode 1 .
- Cli options: `--exitCode`
- Type: number
- Default: 0
Ignored Blocks
Mark blocks in code as ignored:
/* jscpd:ignore-start */
import lodash from 'lodash';
import React from 'react';
import {User} from './models';
import {UserService} from './services';
/* jscpd:ignore-end */
<!--
// jscpd:ignore-start
-->
<meta data-react-helmet="true" name="theme-color" content="#cb3837"/>
<link data-react-helmet="true" rel="stylesheet" href="https://static.npmjs.com/103af5b8a2b3c971cba419755f3a67bc.css"/>
<link data-react-helmet="true" rel="stylesheet" href="https://static.npmjs.com/cms/flatpages.css"/>
<link data-react-helmet="true" rel="apple-touch-icon" sizes="120x120" href="https://static.npmjs.com/58a19602036db1daee0d7863c94673a4.png"/>
<link data-react-helmet="true" rel="apple-touch-icon" sizes="144x144" href="https://static.npmjs.com/7a7ffabbd910fc60161bc04f2cee4160.png"/>
<link data-react-helmet="true" rel="apple-touch-icon" sizes="152x152" href="https://static.npmjs.com/34110fd7686e2c90a487ca98e7336e99.png"/>
<link data-react-helmet="true" rel="apple-touch-icon" sizes="180x180" href="https://static.npmjs.com/3dc95981de4241b35cd55fe126ab6b2c.png"/>
<link data-react-helmet="true" rel="icon" type="image/png" href="https://static.npmjs.com/b0f1a8318363185cc2ea6a40ac23eeb2.png" sizes="32x32"/>
<!--
// jscpd:ignore-end
-->
Reporters
HTML
Badge
More info jscpd-badge-reporter
AI
Compact, token-efficient reporter designed for piping jscpd output into AI tools. Outputs one clone pair per line using common-path-prefix compression, followed by a summary. No code fragments, no colors — clean for piping.
Token savings: ~79% fewer tokens compared to the default console reporter.
Benchmarked on the fixtures/ directory (91 clones across 132 files):
| Reporter | Output size | Estimated tokens |
|----------|-------------|------------------|
| default (console) | ~21,800 chars | ~5,400 |
| ai | ~4,500 chars | ~1,100 |
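As a quick sanity check of the ~79% figure, the relative reduction can be recomputed from the rounded numbers in the table above. The helper name `savingsPercent` is made up for this example, and the inputs are the approximate table values, so the result is approximate as well.

```typescript
// Figures from the benchmark table above (approximate values).
const consoleReport = { chars: 21800, tokens: 5400 };
const aiReport = { chars: 4500, tokens: 1100 };

// Relative reduction as a percentage, rounded to one decimal place.
function savingsPercent(before: number, after: number): number {
  return Math.round((1 - after / before) * 1000) / 10;
}

console.log(savingsPercent(consoleReport.tokens, aiReport.tokens)); // 79.6
console.log(savingsPercent(consoleReport.chars, aiReport.chars));   // 79.4
```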
Example output:
src/utils/ auth.ts:10-25 ~ helpers.ts:40-55
src/utils/auth.ts 30-45 ~ 80-95
src/ utils/auth.ts:10-25 ~ api/routes.ts:5-20
---
23 clones · 4.2% duplication
Activate with: jscpd --reporters ai
To use jscpd with an AI coding assistant, install the agent skills:
jscpd: tool reference skill (all CLI options, AI reporter format, config file syntax):
npx skills add kucherenko/jscpd --skill jscpd
dry-refactoring: guided refactoring workflow (read clones, choose strategy, apply refactor, verify):
npx skills add kucherenko/jscpd --skill dry-refactoring
PMD CPD XML
<?xml version="1.0" encoding="utf-8"?>
<pmd-cpd>
<duplication lines="10">
<file path="/path/to/file" line="1">
<codefragment><![CDATA[ ...first code fragment... ]]></codefragment>
</file>
<file path="/path/to/file" line="5">
<codefragment><![CDATA[ ...second code fragment...}]]></codefragment>
</file>
<codefragment><![CDATA[ ...duplicated fragment... ]]></codefragment>
</duplication>
</pmd-cpd>
JSON reporters
{
"duplicates": [{
"format": "javascript",
"lines": 27,
"fragment": "...code fragment... ",
"tokens": 0,
"firstFile": {
"name": "tests/fixtures/javascript/file2.js",
"start": 1,
"end": 27,
"startLoc": {
"line": 1,
"column": 1
},
"endLoc": {
"line": 27,
"column": 2
}
},
"secondFile": {
"name": "tests/fixtures/javascript/file1.js",
"start": 1,
"end": 24,
"startLoc": {
"line": 1,
"column": 1
},
"endLoc": {
"line": 24,
"column": 2
}
}
}],
"statistic": {
"detectionDate": "2018-11-09T15:32:02.397Z",
"formats": {
"javascript": {
"sources": {
"/path/to/file": {
"lines": 24,
"sources": 1,
"clones": 1,
"duplicatedLines": 26,
"percentage": 45.33,
"newDuplicatedLines": 0,
"newClones": 0
}
},
"total": {
"lines": 297,
"sources": 1,
"clones": 1,
"duplicatedLines": 26,
"percentage": 45.33,
"newDuplicatedLines": 0,
"newClones": 0
}
}
},
"total": {
"lines": 297,
"sources": 6,
"clones": 5,
"duplicatedLines": 26,
"percentage": 45.33,
"newDuplicatedLines": 0,
"newClones": 0
}
}
}
API
To integrate copy/paste detection into your application, you can use the programmatic API:
jscpd Promise API
import {IClone} from '@jscpd/core';
import {jscpd} from 'jscpd';
const clones: Promise<IClone[]> = jscpd(process.argv);
jscpd async/await API
import {IClone} from '@jscpd/core';
import {jscpd} from 'jscpd';
(async () => {
const clones: IClone[] = await jscpd(['', '', __dirname + '/../fixtures', '-m', 'weak', '--silent']);
console.log(clones);
})();
detectClones API
import {detectClones} from "jscpd";
(async () => {
const clones = await detectClones({
path: [
__dirname + '/../fixtures'
],
silent: true
});
console.log(clones);
})()
detectClones with a persistent store
import {detectClones} from "jscpd";
import {IMapFrame, MemoryStore} from "@jscpd/core";
(async () => {
const store = new MemoryStore<IMapFrame>();
await detectClones({
path: [
__dirname + '/../fixtures'
],
}, store);
await detectClones({
path: [
__dirname + '/../fixtures'
],
silent: true
}, store);
})()
For deep customisation of the detection process you can build your own tool. To detect clones in the file system, you can use @jscpd/finder to build a powerful detector. To detect clones in the browser or another non-Node.js environment, you can build your own solution based on @jscpd/core.
Changelog
Who uses jscpd
- Code-Inspector is a code analysis and technical debt management service.
- Mega-Linter is a 100% open-source linters aggregator for CI (GitHub Action & other CI tools) or to run locally
- vscode-jscpd VSCode Copy/Paste detector plugin.
Contributors
This project exists thanks to all the people who contribute.
Backers
Thank you to all our backers! 🙏 [Become a backer]
Sponsors
Support this project by becoming a sponsor. Your logo will show up here with a link to your website. [Become a sponsor]
License
MIT © Andrey Kucherenko

