prsense
v1.0.1
Published
AI-powered PR duplicate detection system using embeddings, bloom filters, and multi-signal scoring
Maintainers
Readme
PRSense
Production-Ready AI-Powered Pull Request Duplicate Detection
Mission: Build trustworthy tools that give repositories memory — reducing duplicate work while preserving contributor credit.
Automatically detect duplicate PRs and save hours of review time. Deploy in 5 minutes.
[!TIP] New Here? Start with START_HERE.md for a beginner-friendly overview, then check the guides below.
Quick Links:
- Installation: docs/install.md
- User Guide: docs/user-guide.md
- CLI Usage: docs/cli-usage.md
- Deployment: docs/deployment.md
Key Features
Core Capabilities
- 95% Accuracy — Real OpenAI embeddings (or free local alternatives)
- 2ms Latency — Bloom filter + ANN for sub-linear scaling
- Production Storage — SQLite (dev) or PostgreSQL + pgvector (prod)
- GitHub Bot — Auto-comments on duplicate PRs
- Deploy in 5min — Vercel, Docker, or manual server
- Cost-Effective — $0-50/month depending on scale
- Attribution Tracking — Preserves original authorship credit
Advanced Features
- Persistent Storage — Exports for SQLite/Postgres
- Score Breakdown — Explainable "Why" for every match
- Batch API — Check 100s of PRs in CI/CD pipelines
- Embedding Cache — Reduces API costs by 90%
- Configurable Weights — Tune scoring (Text vs Diff vs Files)
- Dry-Run Mode — specific testing without indexing
- ONNX Local Embeddings — 100% offline, privacy-first
- Cross-Repo Detection — Find duplicates across your entire org
📖 See Features Documentation for details.
Architecture
┌─────────────┐
│ New PR │
└──────┬──────┘
│
▼
┌─────────────────┐
│ Bloom Filter │ ──── Early rejection
│ (Fast Path) │
└──────┬──────────┘
│ might contain
▼
┌─────────────────┐
│ Embedding │ ──── Text + Diff embeddings
│ Pipeline │
└──────┬──────────┘
│
▼
┌─────────────────┐
│ ANN Search │ ──── Candidate retrieval (k=20)
│ (Retriever) │
└──────┬──────────┘
│
▼
┌─────────────────┐
│ Multi-Signal │ ──── Text + Diff + File Overlap
│ Ranker │ (Configurable weights)
└──────┬──────────┘
│
▼
┌─────────────────┐
│ Decision │ ──── HIGH (≥0.9), MEDIUM (≥0.82), LOW
│ Engine │
└──────┬──────────┘Project Structure
prsenses-labs/
├── package.json # Monorepo Root
├── PRSenses/ # Core Library & API
│ ├── src/ # Detection Logic
│ └── action/ # GitHub Action
├── prsense-vscode/ # VS Code Extension
└── prsense-analytics/ # Analytics DashboardQuick Start
Installation
git clone https://github.com/prsense-labs/prsense
cd prsense
npm install # Installs dependencies
npm run build # Builds all packages[!TIP] Zero setup mode: After building, the CLI works immediately using local ONNX embeddings. No API key required.
6 Ways to Use
1. GitHub Action (CI/CD)
- uses: prsense-labs/prsense@v1
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
embedding-provider: 'onnx'2. VS Code Extension (Local Dev)
- Detects duplicates while you type.
- Extension Guide
3. CLI Tool (Manual Check)
npm run cli check my-pr.json4. GitHub Bot (Webhook)
- Deploy to Vercel in 2 minutes.
- Deployment Guide
5. Library (NodeJS Integration)
import { PRSenseDetector } from 'prsense'
const result = await detector.check(prData)6. Microservice (Docker/API)
- Run as a standalone REST API.
- Production Setup
Usage Example
Simple API (Recommended)
import { PRSenseDetector, createPostgresStorage } from './prsense.js'
import { createOpenAIEmbedder } from './embedders/openai.js'
// 1. Initialize
const storage = await createPostgresStorage().init()
const detector = new PRSenseDetector({
embedder: createOpenAIEmbedder(),
storage,
enableCache: true
})
// 2. Check for duplicates
const result = await detector.checkDetailed({
prId: 123,
title: 'Fix login bug',
description: 'Handle empty passwords',
files: ['auth/login.ts']
})
// 3. Handle Result
if (result.type === 'DUPLICATE') {
console.log(`Duplicate of PR #${result.originalPr}`)
console.log(`Confidence: ${result.confidence}`)
console.log(`Breakdown:`, result.breakdown)
// Breakdown: { textSimilarity: 0.95, diffSimilarity: 0.88, ... }
}Documentation Index
- problem.md — Problem statement
- algorithms.md — How it works (Deep Dive)
- data_structures.md — Bloom filters & Vector Index
- scoring.md — The math behind the match
- complexity.md — Performance analysis
- features.md — Detailed feature guide
- ROADMAP.md — Future plans
Design Principles
- Scalability — O(1) lookups with Bloom filters
- Accuracy — Multi-signal scoring > Single vector search
- Privacy — Local embeddings (ONNX) support
- Modularity — Swap storage, embedders, or rankers easily
What We Don't Do
PRSense tools are designed to assist humans, not replace them.
We do not:
- Block or auto-close pull requests
- Judge code quality
- Replace maintainer decisions or code review
Contributing
Contributions welcome! Please read CONTRIBUTING.md before submitting a PR.
- 💬 Questions? Start a Discussion
- 🐛 Found a bug? Open an Issue
- 📋 What's new? See the CHANGELOG
License
MIT License
PRSense — Because duplicate PRs waste everyone's time. Let's fix that.
