jscpd-server
v4.0.8
Published
jscpd server application
Readme
JSCPD Server API Documentation
Overview
The JSCPD Server provides a RESTful API for detecting code duplications. It scans a codebase on startup and allows clients to check code snippets for duplications against the scanned codebase.
Starting the Server
Command Line
# Start server in current directory
jscpd-server
# Start server in specific directory
jscpd-server /path/to/project
# Start server on specific port
jscpd-server . --port 8080
# Start server with custom host
jscpd-server . --host localhost --port 3000
# Start server with persistent storage (recommended for production)
jscpd-server . --store leveldbServer-Specific Options
-p, --port [number]- Port to run the server on (Default: 3000)-H, --host [string]- Host to bind the server to (Default: 0.0.0.0)
Common Options (Available for Server)
--store [string]- Persistent store for codebase data (e.g.,leveldbfor disk persistence). Without this option, the server uses in-memory storage which is lost on restart-c, --config [string]- Path to config file (Default is .jscpd.json in )-f, --format [string]- Format or formats separated by comma-i, --ignore [string]- Glob pattern for files to exclude--ignore-pattern [string]- Ignore code blocks matching regexp patterns-l, --min-lines [number]- Min size of duplication in code lines-k, --min-tokens [number]- Min size of duplication in code tokens-x, --max-lines [number]- Max size of source in lines-z, --max-size [string]- Max size of source in bytes-m, --mode [string]- Mode of quality of search (strict, mild, weak)-a, --absolute- Use absolute path in reports-n, --noSymlinks- Don't use symlinks for detection--ignoreCase- Ignore case of symbols in code-g, --gitignore- Ignore all files from .gitignore file--skipLocal- Skip duplicates in local folders
Base URL
When running locally: http://localhost:3000/api
Content Type
All API responses return Content-Type: application/json. The server accepts both application/json and application/x-www-form-urlencoded request bodies.
Request Body Size Limit
The server accepts request bodies up to 10MB in size.
Authentication
Currently, no authentication is required. Future versions may add authentication support.
Endpoints
1. Check Code Snippet
Check a code snippet for duplications against the scanned codebase.
Endpoint: POST /api/check
Request Headers:
Content-Type: application/jsonRequest Body:
{
"code": "string (required)",
"format": "string (required)"
}Parameters:
code(required, string): The code snippet to check for duplications. This field cannot be empty or whitespace-only.format(required, string): Programming language/format of the snippet (e.g., "javascript", "python", "java")
Example Request:
curl -X POST http://localhost:3000/api/check \
-H "Content-Type: application/json" \
-d '{
"code": "function hello() {\n console.log(\"Hello, World!\");\n}",
"format": "javascript"
}'Success Response (200 OK):
{
"duplications": [
{
"snippetLocation": {
"startLine": 1,
"endLine": 5,
"startColumn": 0,
"endColumn": 20
},
"codebaseLocation": {
"file": "src/utils/helper.js",
"startLine": 10,
"endLine": 14,
"startColumn": 0,
"endColumn": 20,
"fragment": "function hello() {\n console.log(\"Hello, World!\");\n}"
},
"linesCount": 4
}
],
"statistics": {
"totalDuplications": 1,
"duplicatedLines": 4,
"totalLines": 5,
"percentageDuplicated": 80.0
}
}"message": "Missing required field: code", "statusCode": 400 }
**400 Bad Request** - Validation error (wrong type):
```json
{
"error": "ValidationError",
"message": "Field \"code\" must be a string",
"statusCode": 400
}400 Bad Request - Validation error (empty code):
{
"error": "ValidationError",
"message": "Field \"code\" cannot be empty",
"statusCode": 400
}400 Bad Request - Server not initialized:
{
"error": "Error",
"message": "Server not initialized. Please wait for initial scan to complete.",
"statusCode": 400
}400 Bad Request - Processing error:
{
"error": "CheckError",
"message": "Error details...",
"statusCode": 400
}Snippet Isolation and Memory Management
The /api/check endpoint implements request isolation to ensure:
No Cross-Request Contamination: Snippet tokens are isolated per request and never stored in the shared project store. Each request compares only against the scanned codebase, not against tokens from previous snippet checks.
Automatic Cleanup: Snippet token data is automatically discarded when the request completes (via finally block), preventing unbounded memory growth.
Concurrent Request Safety: Multiple concurrent snippet checks are isolated from each other. The same snippet checked simultaneously will produce identical results without interference.
Consistent Results: Checking the same snippet multiple times will always produce identical duplication reports, as snippet tokens don't persist between requests.
Implementation Details:
The server uses an ephemeral hybrid store for each snippet check:
- Reads are delegated to the shared project store (to detect duplications against the codebase)
- Writes (snippet tokens) go to a temporary in-memory store
- The temporary store is discarded after the request completes
This architecture ensures snippet detection remains stateless and memory-safe while still detecting duplications against the full project.
2. Get Project Statistics
Get overall duplication statistics for the scanned codebase.
Endpoint: GET /api/stats
Request Headers: None required
Example Request:
curl http://localhost:3000/api/statsSuccess Response (200 OK):
The response includes the full statistics object from the initial scan. The structure follows the standard jscpd statistics format.
{
"statistics": {
"detectionDate": "2025-11-17T10:30:00.000Z",
"total": {
"lines": 10000,
"tokens": 50000,
"sources": 50,
"duplicatedLines": 500,
"duplicatedTokens": 2500,
"clones": 10,
"percentage": 5.0,
"percentageTokens": 5.0,
"newDuplicatedLines": 0,
"newClones": 0
},
"formats": {
"javascript": {
"total": {
"lines": 5000,
"tokens": 25000,
"sources": 30,
"duplicatedLines": 300,
"duplicatedTokens": 1500,
"clones": 6,
"percentage": 6.0,
"percentageTokens": 6.0,
"newDuplicatedLines": 0,
"newClones": 0
},
"sources": {
"src/file1.js": {
"lines": 100,
"tokens": 500,
"sources": 1,
"duplicatedLines": 10,
"duplicatedTokens": 50,
"clones": 1,
"percentage": 10.0,
"percentageTokens": 10.0,
"newDuplicatedLines": 0,
"newClones": 0
}
}
}
}
},
"timestamp": "2025-11-17T10:30:00.000Z"
}Error Responses:
503 Service Unavailable - Statistics not ready:
{
"error": "NotReady",
"message": "Statistics not available yet. Server is still initializing.",
"statusCode": 503
}4. Health Check
Check server health and initialization status.
Endpoint: GET /api/health
Request Headers: None required
Example Request:
curl http://localhost:3000/api/healthSuccess Response (200 OK):
{
"status": "ready",
"workingDirectory": "/path/to/project",
"lastScanTime": "2025-11-17T10:30:00.000Z"
}Status Values:
initializing- Server is scanning the codebaseready- Server is ready to accept requests
5. API Information
Get information about the API and available endpoints.
Endpoint: GET /
Request Headers: None required
Example Request:
curl http://localhost:3000/Success Response (200 OK):
{
"name": "jscpd-server",
"version": "1.0.0",
"endpoints": {
"POST /api/check": "Check code snippet for duplications",
"GET /api/stats": "Get overall project statistics",
"GET /api/health": "Server health check"
},
"documentation": "https://github.com/kucherenko/jscpd"
}MCP Server
The server also supports the Model Context Protocol (MCP), allowing it to fit into LLM-based workflows.
MCP Endpoint
Endpoint: POST /mcp
The server handles MCP requests via the /mcp endpoint using the protocol's streamable HTTP transport.
Configuration
To use with Claude Desktop, add the following to your claude_desktop_config.json:
{
"mcpServers": {
"jscpd": {
"type": "streamable-http",
"url": "http://localhost:3000/mcp"
}
}
}Ensure jscpd-server is running (e.g., npx jscpd-server . --port 3000) before starting Claude Desktop.
MCP Tools
check_duplication
Check code snippet for duplications against the codebase.
- Input:
code(string): Source code snippet.format(string): Format/language of the code.
get_statistics
Get overall project duplication statistics.
- Input: None
check_current_directory
Trigger a re-scan of the current working directory for duplications.
- Input: None
Response Schemas
CheckSnippetResponse
{
duplications: Array<{
snippetLocation: {
startLine: number;
endLine: number;
startColumn: number;
endColumn: number;
};
codebaseLocation: {
file: string;
startLine: number;
endLine: number;
startColumn: number;
endColumn: number;
fragment?: string;
};
linesCount: number;
}>;
statistics: {
totalDuplications: number;
duplicatedLines: number;
totalLines: number;
percentageDuplicated: number;
};
}ErrorResponse
{
error: string;
message: string;
statusCode: number;
}Supported Languages
The server supports all languages that jscpd supports. When checking a snippet, specify the language using the format parameter with the language name or file extension.
Common format identifiers:
- JavaScript:
javascript,js - TypeScript:
typescript,ts - Python:
python,py - Java:
java - C/C++:
c,cpp - C#:
csharp,cs - PHP:
php - Ruby:
ruby,rb - Go:
go - Rust:
rust,rs
For a complete list, see the supported formats documentation.
Error Handling
All errors follow a consistent format with an HTTP status code and JSON body:
{
"error": "ErrorType",
"message": "Human-readable error message",
"statusCode": 400
}Common Error Types
ValidationError(400) - Invalid request parameters (missing fields, wrong types, empty values)CheckError(400) - Error processing the check requestError(400) - Server not initialized or general errorsNotReady(503) - Statistics not available yet (server still initializing)NotFound(404) - Endpoint not foundStatsError(500) - Error retrieving statisticsInternalServerError(500) - Unexpected server error
Rate Limiting
Currently, no rate limiting is implemented. Consider implementing rate limiting in production environments.
Persistent Storage with LevelDB
Why Use Persistent Storage?
By default, the server uses in-memory storage (MemoryStore) for the codebase scan results. This means:
- ✅ Fast - All data is in memory
- ❌ Volatile - Data is lost when the server restarts
- ❌ Memory-intensive - Large codebases consume significant RAM
With --store leveldb, the server uses disk-based storage (LevelDB):
- ✅ Persistent - Data survives server restarts
- ✅ Memory-efficient - Data is stored on disk
- ✅ No re-scanning - Server starts immediately with cached data
- ⚠️ Slightly slower - Disk I/O overhead (minimal impact)
Installation
# Install the LevelDB store package
npm install @jscpd/leveldb-storeUsage
# Start server with LevelDB persistence
jscpd server /path/to/project --store leveldb --port 3000
# On first start, the server will:
# 1. Scan the codebase
# 2. Store results in .jscpd/ directory
# 3. Accept requests
# On subsequent restarts, the server will:
# 1. Load cached data from .jscpd/ directory
# 2. Accept requests immediately (no re-scan needed)Storage Location
LevelDB stores data in the .jscpd/ directory relative to where the server is started. This directory contains:
- Token databases for each file format
- Duplication detection data
Important: Add .jscpd/ to your .gitignore file:
echo ".jscpd/" >> .gitignoreCleanup
To clear cached data and force a fresh scan:
# Stop the server, then:
rm -rf .jscpd/Examples
Example 1: Check JavaScript Code
const axios = require('axios');
const checkCode = async () => {
try {
const response = await axios.post('http://localhost:3000/api/check', {
code: `
function calculateSum(a, b) {
return a + b;
}
`,
format: 'javascript'
});
console.log('Duplications found:', response.data.duplications.length);
console.log('Percentage duplicated:', response.data.statistics.percentageDuplicated + '%');
} catch (error) {
console.error('Error:', error.response.data);
}
};
checkCode();Example 2: Check Python Code
import requests
def check_code():
url = 'http://localhost:3000/api/check'
payload = {
'code': '''
def hello_world():
print("Hello, World!")
''',
'format': 'python'
}
response = requests.post(url, json=payload)
if response.status_code == 200:
data = response.json()
print(f"Duplications found: {len(data['duplications'])}")
print(f"Percentage duplicated: {data['statistics']['percentageDuplicated']}%")
else:
print(f"Error: {response.json()}")
check_code()Example 3: Get Project Statistics
#!/bin/bash
# Get statistics
curl -s http://localhost:3000/api/stats | jq '.statistics.total'
# Output:
# {
# "lines": 10000,
# "tokens": 50000,
# "sources": 50,
# "duplicatedLines": 500,
# "duplicatedTokens": 2500,
# "clones": 10,
# "percentage": 5.0,
# ...
# }Integration with CI/CD
GitHub Actions Example
name: Check Code Duplication
on: [push, pull_request]
jobs:
check-duplication:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Setup Node.js
uses: actions/setup-node@v2
with:
node-version: '18'
- name: Install jscpd
run: npm install -g jscpd
- name: Start jscpd server
run: |
jscpd server . --port 3000 &
sleep 10 # Wait for server to initialize
- name: Check new code
run: |
# Check files changed in this PR
for file in $(git diff --name-only HEAD~1); do
if [[ -f "$file" ]]; then
# Extract file extension to determine format
ext="${file##*.}"
curl -X POST http://localhost:3000/api/check \
-H "Content-Type: application/json" \
-d "{\"code\": \"$(cat $file | jq -Rs .)\", \"format\": \"$ext\"}" \
| jq .
fi
doneBest Practices
Initialize Once: The server scans the codebase on startup. For large codebases, this may take time. Check the
/api/healthendpoint to verify the server is ready before sending requests.Format Parameter: Always provide the
formatparameter with a valid language identifier or file extension.Snippet Size: Large snippets take longer to process. Maximum body size is 10MB.
Error Handling: Always handle errors appropriately, especially 400 errors when the server is not initialized or validation fails.
Production Use: For production use, consider:
- Using persistent storage with
--store leveldbto avoid rescanning on restarts - Adding authentication
- Implementing rate limiting
- Using a reverse proxy (nginx, Apache)
- Monitoring and logging
- Running behind HTTPS
- Using persistent storage with
Troubleshooting
Server won't start
- Port already in use: Try a different port with
--port - Permission denied: Use a port above 1024 or run with appropriate permissions
No duplications found
- Ensure
formatis correctly specified with a valid language identifier - Check that the codebase was successfully scanned (check server logs)
- Verify minimum thresholds (
--min-lines,--min-tokens)
400 Bad Request - Server not initialized
- Wait for the initial scan to complete
- Check
/api/healthendpoint - status should bereadynotinitializing
Validation errors
- Ensure
codefield is a non-empty string - Ensure
formatfield is provided and is a string - Check that the request body is valid JSON
Performance Considerations
- Initial Scan Time: Depends on codebase size. Large codebases may take several minutes.
- Memory Usage: The server keeps the scanned codebase in memory by default. Monitor memory usage for large projects. Snippet checks use ephemeral stores that are automatically garbage-collected, preventing unbounded memory growth from snippet tokens.
- Check Response Time: Typically < 1 second for small snippets, longer for larger snippets.
- Concurrent Requests: Snippet checks are isolated and thread-safe. Multiple concurrent requests do not interfere with each other.
- Persistent Storage: Use
--store leveldbfor disk-based persistence. This allows the server to survive restarts without rescanning the codebase. LevelDB is recommended for large repositories. To use LevelDB, ensure@jscpd/leveldb-storeis installed:npm install @jscpd/leveldb-store
Support
For issues, questions, or feature requests, please visit:
- GitHub Issues: https://github.com/kucherenko/jscpd/issues
- Documentation: https://github.com/kucherenko/jscpd
