@phoenixaihub/contrib-guard
v1.0.0
AI contribution triage for open source maintainers — detect duplicates, AI-generated content, and quality issues without LLM calls
ContribGuard
AI contribution triage for open source maintainers. Detect duplicate issues, AI-generated content, low-quality submissions, and submission velocity anomalies — without any LLM API calls.
Quick Start
npx @phoenixaihub/contrib-guard scan --repo facebook/react
Features
- 🔁 Duplicate Detection — TF-IDF + cosine similarity clustering
- 🤖 AI Content Fingerprinting — Stylometric heuristics (sentence uniformity, vocabulary diversity, pattern matching)
- 📊 Quality Scoring — Checks for repro steps, tests, code references, environment info
- 🚨 Velocity Anomaly Detection — Flags burst submission patterns
- 🔗 Cross-Reference Engine — Matches open issues against closed/fixed ones
Install
npm install -g @phoenixaihub/contrib-guard
CLI Usage
# Scan a repo
contrib-guard scan --repo owner/repo
# With GitHub token for higher rate limits
contrib-guard scan --repo owner/repo --token ghp_xxx
# JSON output
contrib-guard scan --repo owner/repo --json
# Limit items fetched
contrib-guard scan --repo owner/repo --limit 50
GitHub Action
name: ContribGuard Triage
on:
  issues:
    types: [opened]
  pull_request:
    types: [opened]
jobs:
  triage:
    runs-on: ubuntu-latest
    permissions:
      issues: write
      pull-requests: write
    steps:
      - uses: phoenix-assistant/contrib-guard@main
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          ai-threshold: '40'
          quality-threshold: '40'
Action Inputs
| Input | Default | Description |
|-------|---------|-------------|
| github-token | ${{ github.token }} | GitHub API token |
| ai-threshold | 40 | AI detection score threshold (0-100) |
| quality-threshold | 40 | Quality score threshold (0-100) |
| label-duplicates | possible-duplicate | Label for duplicates |
| label-ai | ai-generated | Label for AI-flagged items |
| label-low-quality | needs-details | Label for low quality items |
How It Works
Duplicate Detection
Uses TF-IDF vectorization of issue/PR text, then computes pairwise cosine similarity. Items with similarity ≥ 0.6 are clustered as duplicates.
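The pipeline above can be sketched in a few functions. This is a minimal illustration, not ContribGuard's actual source: the names `tokenize`, `tfidf`, `cosine`, and `duplicatePairs` are hypothetical, the smoothed IDF formula is one common choice among several, and the sketch returns matching pairs rather than full clusters. Only the 0.6 threshold comes from the description above.

```typescript
// Hypothetical helpers for illustration; not the package's real internals.
type Vector = Map<string, number>;

function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Build one TF-IDF vector per document. The IDF is smoothed (an assumption)
// so that terms shared by every document still keep a small weight.
function tfidf(docs: string[]): Vector[] {
  const tokenized = docs.map(tokenize);
  const docFreq = new Map<string, number>();
  for (const tokens of tokenized) {
    new Set(tokens).forEach((term) => {
      docFreq.set(term, (docFreq.get(term) ?? 0) + 1);
    });
  }
  return tokenized.map((tokens) => {
    const counts: Vector = new Map();
    for (const term of tokens) {
      counts.set(term, (counts.get(term) ?? 0) + 1);
    }
    const weights: Vector = new Map();
    counts.forEach((count, term) => {
      const idf = Math.log((1 + docs.length) / (1 + (docFreq.get(term) ?? 0))) + 1;
      weights.set(term, (count / tokens.length) * idf);
    });
    return weights;
  });
}

// Cosine similarity of two sparse vectors; 0 when either vector is empty.
function cosine(a: Vector, b: Vector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((w, term) => {
    dot += w * (b.get(term) ?? 0);
    normA += w * w;
  });
  b.forEach((w) => {
    normB += w * w;
  });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Return index pairs whose similarity meets the threshold (0.6 per the text above).
function duplicatePairs(docs: string[], threshold = 0.6): Array<[number, number]> {
  const vectors = tfidf(docs);
  const pairs: Array<[number, number]> = [];
  for (let i = 0; i < vectors.length; i++) {
    for (let j = i + 1; j < vectors.length; j++) {
      if (cosine(vectors[i], vectors[j]) >= threshold) pairs.push([i, j]);
    }
  }
  return pairs;
}
```

Pairwise comparison is quadratic in the number of items, which is one reason a fetch limit such as `--limit` is useful on large repositories. Turning pairs into clusters could be done with a union-find pass over the result, which is omitted here.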
AI Fingerprinting
Pure heuristic analysis — no LLM calls:
- Sentence uniformity: AI text often has unusually consistent sentence lengths (a low coefficient of variation, i.e. standard deviation divided by mean)
- Vocabulary diversity: AI text tends toward lower type-token ratios
- Pattern matching: Detects hedging phrases ("it is worth noting"), excessive numbered lists, overly formal language
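The three signals listed above can each be computed with a few lines of string handling. The sketch below is an assumption about how such heuristics might look, not the package's code: the function names are hypothetical, the sentence splitter is deliberately crude, and every entry in `HEDGING_PHRASES` other than "it is worth noting" is an invented example. How the signals are weighted into the 0-100 score is not described here, so the sketch stops at the raw signals.

```typescript
// Word counts per sentence, splitting on ., ! and ? (a crude approximation).
function sentenceLengths(text: string): number[] {
  return text
    .split(/[.!?]+/)
    .map((s) => s.trim().split(/\s+/).filter(Boolean).length)
    .filter((n) => n > 0);
}

// Coefficient of variation = standard deviation / mean of sentence lengths.
// Lower values mean more uniform sentences.
function sentenceLengthCv(text: string): number {
  const lengths = sentenceLengths(text);
  if (lengths.length < 2) return 0;
  const mean = lengths.reduce((a, b) => a + b, 0) / lengths.length;
  const variance = lengths.reduce((a, b) => a + (b - mean) ** 2, 0) / lengths.length;
  return Math.sqrt(variance) / mean;
}

// Type-token ratio = distinct words / total words.
function typeTokenRatio(text: string): number {
  const words: string[] = text.toLowerCase().match(/[a-z']+/g) ?? [];
  return words.length === 0 ? 0 : new Set(words).size / words.length;
}

// Illustrative phrase list; only the first entry appears in the README.
const HEDGING_PHRASES = ["it is worth noting", "it is important to note", "in conclusion"];

// Number of distinct listed phrases that occur in the text.
function countHedgingPhrases(text: string): number {
  const lower = text.toLowerCase();
  return HEDGING_PHRASES.filter((phrase) => lower.includes(phrase)).length;
}
```

Stylometric signals like these are noisy on short texts, so a real scorer would likely require a minimum length before trusting them.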
Quality Scoring
Checks issues for: reproduction steps, expected/actual behavior, environment info, code snippets, descriptive titles. Checks PRs for: description, issue references, test mentions, reasonable size.
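A checklist scorer for issues might look like the sketch below. The regular expressions, the four-word title rule, and the equal weighting of checks are all assumptions made for illustration; the `Issue` shape and the names `ISSUE_CHECKS` and `scoreIssue` are hypothetical and not part of the package's documented API.

```typescript
interface Issue {
  title: string;
  body: string;
}

interface Check {
  name: string;
  test: (issue: Issue) => boolean;
}

// Illustrative rules only; the real rule set is not documented here.
const ISSUE_CHECKS: Check[] = [
  { name: "reproduction steps", test: (i) => /\brepro/i.test(i.body) },
  { name: "expected/actual behavior", test: (i) => /expected/i.test(i.body) && /actual/i.test(i.body) },
  { name: "environment info", test: (i) => /\b(os|version|node|browser)\b/i.test(i.body) },
  { name: "code snippet", test: (i) => /`[^`]+`/.test(i.body) },
  { name: "descriptive title", test: (i) => i.title.trim().split(/\s+/).length >= 4 },
];

// Score = percentage of checks passed, with each check weighted equally (an assumption).
function scoreIssue(issue: Issue): { score: number; missing: string[] } {
  const missing = ISSUE_CHECKS.filter((c) => !c.test(issue)).map((c) => c.name);
  const passed = ISSUE_CHECKS.length - missing.length;
  return { score: Math.round((passed / ISSUE_CHECKS.length) * 100), missing };
}
```

Returning the list of missing elements alongside the score makes it straightforward to tell a submitter what to add, which fits the `needs-details` label used by the Action.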
Velocity Detection
Flags users who submit ≥ 5 items within a 60-minute window.
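One way to implement this rule is a sliding window over each author's sorted timestamps. The sketch assumes a hypothetical `Submission` shape with an ISO 8601 `createdAt` string, and it treats the window boundary as inclusive; neither detail is specified above.

```typescript
interface Submission {
  author: string;
  createdAt: string; // ISO 8601 timestamp (assumed format)
}

// Return authors with at least `minItems` submissions inside any window of `windowMinutes`.
function flagBurstAuthors(items: Submission[], minItems = 5, windowMinutes = 60): string[] {
  const windowMs = windowMinutes * 60 * 1000;

  // Group submission times by author.
  const byAuthor = new Map<string, number[]>();
  for (const item of items) {
    const times = byAuthor.get(item.author) ?? [];
    times.push(Date.parse(item.createdAt));
    byAuthor.set(item.author, times);
  }

  const flagged: string[] = [];
  byAuthor.forEach((times, author) => {
    times.sort((a, b) => a - b);
    // If the i-th and (i + minItems - 1)-th submissions fit in the window,
    // then minItems submissions fall within it.
    for (let i = 0; i + minItems - 1 < times.length; i++) {
      if (times[i + minItems - 1] - times[i] <= windowMs) {
        flagged.push(author);
        break;
      }
    }
  });
  return flagged;
}
```

Because the timestamps are sorted first, the check is a single linear pass per author after the sort.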
Cross-Referencing
Matches open issues against closed ones using the same TF-IDF similarity engine to surface "already fixed" reports.
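The matching step can be sketched independently of the similarity measure. To keep the example self-contained, the code below plugs in a simple Jaccard word-overlap function as a stand-in, which is not the TF-IDF engine the package describes. The `TrackedIssue` shape, the `findAlreadyFixed` name, the title-only comparison, and the reuse of the 0.6 threshold are all assumptions.

```typescript
interface TrackedIssue {
  number: number;
  title: string;
  state: "open" | "closed";
}

type Similarity = (a: string, b: string) => number;

// Stand-in similarity: Jaccard overlap of word sets. NOT TF-IDF cosine.
const jaccard: Similarity = (a, b) => {
  const wordsA: string[] = a.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  const wordsB: string[] = b.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  const setA = new Set(wordsA);
  const setB = new Set(wordsB);
  let shared = 0;
  setA.forEach((word) => {
    if (setB.has(word)) shared++;
  });
  const union = setA.size + setB.size - shared;
  return union === 0 ? 0 : shared / union;
};

// For each open issue, report the most similar closed issue at or above the threshold.
function findAlreadyFixed(
  issues: TrackedIssue[],
  similarity: Similarity,
  threshold = 0.6,
): Array<{ open: number; closed: number; score: number }> {
  const open = issues.filter((i) => i.state === "open");
  const closed = issues.filter((i) => i.state === "closed");
  const matches: Array<{ open: number; closed: number; score: number }> = [];
  for (const o of open) {
    let best: { closed: number; score: number } | null = null;
    for (const c of closed) {
      const score = similarity(o.title, c.title);
      if (score >= threshold && (best === null || score > best.score)) {
        best = { closed: c.number, score };
      }
    }
    if (best !== null) matches.push({ open: o.number, closed: best.closed, score: best.score });
  }
  return matches;
}
```

Passing the similarity function as a parameter means the same matching loop would work unchanged with a TF-IDF cosine implementation.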
Programmatic API
import { triage, fetchItems } from '@phoenixaihub/contrib-guard';
const items = await fetchItems('owner/repo', 'ghp_token');
const report = triage('owner/repo', items);
console.log(report.summary);
License
MIT
