📄 docshield
AI Document Tampering & AI-Generation Detection SDK (Node.js + Browser)
A probabilistic forensic signal aggregation engine for detecting document tampering, AI-generated artifacts, and authenticity anomalies in PDFs and images.
1. 🎯 Purpose
docshield is a forensic SDK designed to analyze documents (PDFs, images) and produce a multi-signal, evidence-backed authenticity assessment.
It does NOT claim deterministic detection. Instead, it provides:
- Probabilistic tampering likelihood
- AI-generation likelihood
- Metadata anomaly reports
- OCR consistency analysis
- Model-based deepfake probability
- Structured forensic breakdown
Accuracy and interpretability are prioritized over sensational detection claims.
2. ⚖️ Design Philosophy
This SDK is built on the following principles:
- Multi-signal aggregation > single detector
- Explainability > black-box scoring
- Probabilistic scoring > binary classification
- Metadata + structural + content + ML analysis
- Forensic defensibility
Every output includes traceable evidence components.
3. 📦 Installation
Option 1: NPM (Recommended)
```bash
npm install docshield
```

Option 2: Yarn

```bash
yarn add docshield
```

Peer Dependencies

docshield uses the following peer dependencies for full functionality. Install them with:

```bash
npm install tesseract.js sharp pdf-lib exifreader
```

Optional ML models:

```bash
npm install onnxruntime-web @tensorflow/tfjs
```

Note: These peer dependencies are optional. The SDK works in degraded mode without them.
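Degraded mode generally means each analyzer probes for its optional dependency and disables itself instead of throwing. A minimal sketch of that pattern, assuming nothing about the SDK's internals (the helper name `loadOptional` is illustrative, not part of the docshield API):

```typescript
// Attempt to load an optional peer dependency; return null if it is not
// installed. Feature modules can then skip their analysis gracefully
// instead of crashing the whole verification run.
async function loadOptional<T = unknown>(name: string): Promise<T | null> {
  try {
    return (await import(name)) as T;
  } catch {
    return null;
  }
}

// Example: only enable OCR analysis when tesseract.js is present.
async function ocrAvailable(): Promise<boolean> {
  return (await loadOptional('tesseract.js')) !== null;
}
```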
4. 🚀 Quick Start
Node.js / TypeScript
```typescript
import fs from 'fs';
import { verifyDocument, quickVerify } from 'docshield';

// Load a PDF file
const pdfBuffer = fs.readFileSync('document.pdf');

// Option 1: Full verification with detailed forensic report
const result = await verifyDocument({
  buffer: pdfBuffer,
  type: 'pdf',
  filename: 'document.pdf'
});

console.log(result);
// Output:
// {
//   confidenceScore: 85,
//   tamperingProbability: 0.05,
//   aiGeneratedProbability: 0.02,
//   forensicSummary: {
//     riskLevel: 'Low',
//     primaryFlags: [],
//     explanation: 'Document appears authentic...'
//   },
//   ...
// }

// Option 2: Quick verification (essential scores only)
const quick = await quickVerify({
  buffer: pdfBuffer,
  type: 'pdf'
});

console.log(quick);
// Output: { confidenceScore: 85, riskLevel: 'Low', ... }
```

Browser / React
```typescript
import { verifyImage } from 'docshield';

async function handleImageUpload(event: React.ChangeEvent<HTMLInputElement>) {
  const file = event.target.files?.[0];
  if (!file) return;

  const buffer = await file.arrayBuffer();
  const result = await verifyImage(new Uint8Array(buffer));

  console.log(`Authenticity: ${result.confidenceScore}%`);
  console.log(`Risk Level: ${result.forensicSummary.riskLevel}`);
}
```

Configuration
```typescript
import { verifyDocument } from 'docshield';

const result = await verifyDocument(
  { buffer: pdfBuffer, type: 'pdf' },
  {
    verbose: true,
    enableDeepfakeDetection: true,
    enableWatermarkAnalysis: true,
    weights: {
      metadataWeight: 0.25,
      ocrWeight: 0.20,
      watermarkWeight: 0.15,
      modelWeight: 0.40
    }
  }
);
```

5. 📚 API Reference
Main Functions
verifyDocument(fileInput, config?)
Full forensic verification with detailed analysis.
Parameters:
- `fileInput` - File buffer and metadata
  - `buffer: Buffer` - File contents
  - `type: 'pdf' | 'image'` - Document type
  - `filename?: string` - Optional filename
- `config?: DocshieldConfig` - Optional configuration
Returns: Promise<VerificationResult>
quickVerify(fileInput, config?)
Fast verification returning only essential scores.
Returns: Promise with { confidenceScore, riskLevel, tamperingProbability, aiGeneratedProbability }
verifyPDF(buffer, config?)
Shorthand for PDF verification.
verifyImage(buffer, config?)
Shorthand for image verification.
6. 📊 Output Schema
```typescript
interface VerificationResult {
  confidenceScore: number        // 0–100 authenticity confidence
  tamperingProbability: number   // 0–1
  aiGeneratedProbability: number // 0–1
  forensicSummary: {
    riskLevel: "Low" | "Moderate" | "High" | "Critical"
    primaryFlags: string[]
    explanation: string
  }
  detectedIssues: string[]
  metadataAnalysis: MetadataReport
  ocrAnalysis: OCRReport
  watermarkAnalysis: WatermarkReport
  deepfakeAnalysis: DeepfakeReport
  technicalBreakdown: {
    metadataScore: number
    ocrConsistencyScore: number
    watermarkScore: number
    modelScore: number
  }
  evidenceHash: string           // SHA-256 fingerprint
  analysisTimestamp: string
}
```

7. 🔍 Detection Modules
7.1 Metadata Analyzer
Detects structural and metadata inconsistencies in PDFs and images.
Checks:
- Missing creation timestamps
- Creation date > modification date anomaly
- Suspicious producer software
- Known AI generation tags
- Editing software traces
- Inconsistent embedded fonts
- Unusual compression artifacts
- Resolution mismatch
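The timestamp check in the list above reduces to a pure comparison once the metadata dates are parsed. A minimal sketch (the function name is illustrative, not part of the SDK API):

```typescript
// Flag the "creation date > modification date" anomaly described above.
// A genuine document is created before (or at the same moment as) it is
// last modified; the reverse ordering suggests rewritten metadata.
function hasTimestampAnomaly(created?: Date, modified?: Date): boolean {
  if (!created || !modified) return false; // missing dates flagged elsewhere
  return created.getTime() > modified.getTime();
}
```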
Output:
```typescript
interface MetadataReport {
  suspiciousFields: string[]
  softwareDetected: string[]
  timestampAnomalies: boolean
  structuralIntegrityScore: number
}
```

7.2 EXIF & Image Integrity Analysis
Flags:
- Missing camera model in real-world claim documents
- AI tool signatures
- Synthetic resolution patterns
- Metadata wiped after editing
- Layering inconsistencies
7.3 OCR Consistency Engine
Detects overlay tampering or digital text injection.
Method:
- Extract embedded PDF text
- Extract OCR text via Tesseract
- Compare similarity
If similarity < threshold → potential overlay tampering
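The comparison step can be sketched with a simple token-overlap (Jaccard) metric. This is illustrative only; the SDK's internal similarity measure is not specified here:

```typescript
// Token-level Jaccard similarity between the embedded PDF text layer and
// the OCR output. 1.0 = identical token sets, 0.0 = no overlap.
function textSimilarity(embedded: string, ocr: string): number {
  const tokens = (s: string) =>
    new Set(s.toLowerCase().split(/\s+/).filter(Boolean));
  const a = tokens(embedded);
  const b = tokens(ocr);
  if (a.size === 0 && b.size === 0) return 1;
  let common = 0;
  for (const t of a) if (b.has(t)) common++;
  return common / (a.size + b.size - common);
}

// Apply the overlay heuristic: low similarity between the text layer and
// what is visually rendered suggests injected or overlaid text.
function overlaySuspected(similarity: number, threshold = 0.75): boolean {
  return similarity < threshold;
}
```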
Output:
```typescript
interface OCRReport {
  extractedTextLength: number
  similarityScore: number
  overlaySuspicion: boolean
}
```

7.4 AI Text Watermark Heuristic
Uses statistical analysis to detect AI-generated text:
- Shannon entropy
- Token distribution uniformity
- Burstiness measurement
- Repetition index
- Stylometric signature
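Two of the signals above, Shannon entropy and the repetition index, are straightforward to compute over a token stream. A minimal sketch under the usual definitions (function names are illustrative):

```typescript
// Shannon entropy (bits per token) of the token frequency distribution.
// Unusually low entropy / high uniformity is one heuristic AI-text signal.
function shannonEntropy(tokens: string[]): number {
  if (tokens.length === 0) return 0;
  const counts = new Map<string, number>();
  for (const t of tokens) counts.set(t, (counts.get(t) ?? 0) + 1);
  let entropy = 0;
  for (const c of counts.values()) {
    const p = c / tokens.length;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}

// Repetition index: fraction of tokens that repeat an earlier token.
function repetitionIndex(tokens: string[]): number {
  if (tokens.length === 0) return 0;
  return 1 - new Set(tokens).size / tokens.length;
}
```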
Output:
```typescript
interface WatermarkReport {
  entropyScore: number
  repetitionIndex: number
  aiLikelihoodScore: number
  heuristicConfidence: number
}
```

7.5 Deepfake / AI Image Model
Uses ONNX or TensorFlow.js models.
Pipeline:
- Convert image to tensor
- Normalize channels
- Run classifier
- Output probability
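The first two pipeline steps above can be sketched as a pure preprocessing function. This assumes a common convention (channel-first layout, values normalized to [-1, 1]); the actual input spec depends on the model used:

```typescript
// Convert interleaved RGBA pixel data into a normalized, channel-first
// (CHW) Float32Array in the [-1, 1] range, a typical classifier input.
function rgbaToNormalizedCHW(
  pixels: Uint8Array, // RGBA bytes, length = width * height * 4
  width: number,
  height: number
): Float32Array {
  const plane = width * height;
  const out = new Float32Array(3 * plane);
  for (let i = 0; i < plane; i++) {
    const r = pixels[i * 4];
    const g = pixels[i * 4 + 1];
    const b = pixels[i * 4 + 2]; // alpha channel is dropped
    out[i] = r / 127.5 - 1;
    out[plane + i] = g / 127.5 - 1;
    out[2 * plane + i] = b / 127.5 - 1;
  }
  return out;
}
```

The resulting tensor would then be fed to the ONNX or TensorFlow.js classifier, which emits the probability reported below.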
Output:
```typescript
interface DeepfakeReport {
  aiImageProbability: number
  ganFingerprintDetected: boolean
  modelConfidence: number
}
```

8. 🧠 Confidence Scoring Engine
The final authenticity score is a weighted aggregation of the module scores:

```
Final Confidence =
  0.25 * metadataScore +
  0.20 * ocrConsistencyScore +
  0.15 * watermarkScore +
  0.40 * modelScore
```

Weights are configurable.
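The aggregation formula above can be sketched directly, using the default weights from the configuration section (the helper name `aggregateConfidence` is illustrative, not the SDK's internal API):

```typescript
interface Weights {
  metadataWeight: number;
  ocrWeight: number;
  watermarkWeight: number;
  modelWeight: number;
}

// Default weights matching the formula above.
const DEFAULT_WEIGHTS: Weights = {
  metadataWeight: 0.25,
  ocrWeight: 0.2,
  watermarkWeight: 0.15,
  modelWeight: 0.4,
};

// Weighted aggregation of the four sub-scores (each 0–100) into the
// final 0–100 confidence score.
function aggregateConfidence(
  scores: { metadata: number; ocr: number; watermark: number; model: number },
  w: Weights = DEFAULT_WEIGHTS
): number {
  return (
    w.metadataWeight * scores.metadata +
    w.ocrWeight * scores.ocr +
    w.watermarkWeight * scores.watermark +
    w.modelWeight * scores.model
  );
}
```

Because the default weights sum to 1, a document scoring 100 on every module aggregates to a confidence of 100.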
9. 🔐 Evidence Integrity
Every analyzed document produces:
```
evidenceHash = SHA256(fileBuffer)
```

This ensures:
- Forensic reproducibility
- Chain-of-custody support
- Audit trace reliability
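In Node.js, the fingerprint above is a one-liner over the raw file bytes (the function name is illustrative; in browsers, `crypto.subtle.digest` would play the same role):

```typescript
import { createHash } from 'node:crypto';

// Compute the evidence fingerprint as stated above: a SHA-256 digest of
// the raw file bytes, hex-encoded for storage alongside the report.
function computeEvidenceHash(fileBuffer: Uint8Array): string {
  return createHash('sha256').update(fileBuffer).digest('hex');
}
```

Re-hashing the same bytes later and comparing against the stored `evidenceHash` verifies that the analyzed file has not been substituted.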
10. 📈 Risk Classification Logic
| Confidence Score | Risk Level |
| --- | --- |
| 85–100 | Low |
| 65–84 | Moderate |
| 40–64 | High |
| 0–39 | Critical |
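The bands above map directly onto a threshold ladder. A minimal sketch (the function name `classifyRisk` is illustrative, not part of the SDK API):

```typescript
type RiskLevel = 'Low' | 'Moderate' | 'High' | 'Critical';

// Map a 0–100 confidence score onto the risk bands in the table above.
function classifyRisk(confidenceScore: number): RiskLevel {
  if (confidenceScore >= 85) return 'Low';
  if (confidenceScore >= 65) return 'Moderate';
  if (confidenceScore >= 40) return 'High';
  return 'Critical';
}
```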
11. 🧩 Multi-Use Cases
Legal Document Verification
- Court filings
- Affidavits
- Evidence submissions
- Contract authenticity
HR / Background Verification
- Resume tampering
- Degree certificate validation
Insurance & Claims
- Image manipulation detection
- Accident report verification
Financial Institutions
- Loan document fraud screening
Digital Forensics Teams
- Chain-of-custody validation
- Evidence screening
SaaS Integration
- API verification endpoint
- Browser upload validation
- KYC pipeline integration
12. 🌐 Node + Browser Compatibility
Dual export structure:
```json
{
  "main": "dist/node/index.js",
  "browser": "dist/browser/index.js",
  "exports": {
    ".": {
      "import": "./dist/node/index.js",
      "browser": "./dist/browser/index.js"
    }
  }
}
```

13. ⚠️ Accuracy Guarantees & Limitations
This SDK:
- ✔ Detects statistical anomalies
- ✔ Flags structural inconsistencies
- ✔ Provides probabilistic scoring
- ✔ Aggregates independent forensic signals
This SDK DOES NOT:
- ✖ Guarantee 100% AI detection
- ✖ Provide legal certification
- ✖ Replace forensic lab analysis
All outputs are probabilistic.
14. 🔄 Future Enhancements
- Blockchain notarization module
- Digital signature verification
- Tamper-evident watermark detection
- Stylometric author fingerprinting
- Large-scale ML ensemble voting
- REST API wrapper
- Enterprise audit logs
15. 🔬 Accuracy Optimization Guidelines
To maximize detection reliability:
- Use ensemble ML models
- Train on diverse datasets
- Regularly update AI detection models
- Calibrate thresholds per industry
- Store anonymized telemetry for model refinement
- Maintain strict test benchmarks
16. 🏁 Project Setup & Getting Started
📋 Project Structure
```
docshield/
├── src/
│   ├── types/            # TypeScript interfaces
│   ├── analyzers/        # Detection modules
│   │   ├── metadataAnalyzer.ts
│   │   ├── ocrAnalyzer.ts
│   │   ├── watermarkAnalyzer.ts
│   │   └── deepfakeAnalyzer.ts
│   ├── core/             # Core engine
│   │   ├── scoringEngine.ts
│   │   └── forensicAggregator.ts
│   ├── utils/            # Utility functions
│   │   ├── hashing.ts
│   │   └── validators.ts
│   ├── __tests__/        # Test suite
│   └── index.ts          # Main export
├── dist/                 # Compiled output
├── package.json          # Dependencies & scripts
├── tsconfig.json         # TypeScript config
├── jest.config.js        # Test config
├── .eslintrc.json        # ESLint config
├── .prettierrc.json      # Code formatting
└── README.md             # Documentation
```

🚀 Development Setup
1. Clone & Install
```bash
git clone https://github.com/yourusername/docshield.git
cd docshield
npm install
```

2. Build Project

```bash
npm run build
```

Output: dist/node/ and dist/browser/

3. Run Tests

```bash
npm test
```

Run in watch mode:

```bash
npm run test:watch
```

4. Development Mode

```bash
npm run dev
```

This starts TypeScript compilation in watch mode.

5. Code Quality

Format code:

```bash
npm run format
```

Lint code:

```bash
npm run lint
```

💡 Usage Examples
Example 1: Verify a Legal Document
```typescript
import fs from 'fs';
import { verifyPDF } from 'docshield';

const contractBuffer = fs.readFileSync('contract.pdf');
const result = await verifyPDF(contractBuffer);

if (result.forensicSummary.riskLevel === 'Low') {
  console.log('✅ Contract appears authentic');
} else {
  console.log(`⚠️ Risk Level: ${result.forensicSummary.riskLevel}`);
  console.log(`Detected Issues: ${result.detectedIssues.join(', ')}`);
}
```

Example 2: Verify an Insurance Claim Image
```typescript
import fs from 'fs';
import { verifyImage } from 'docshield';

const claimImageBuffer = fs.readFileSync('accident-photo.jpg');
const result = await verifyImage(claimImageBuffer);

console.log(`Authenticity Score: ${result.confidenceScore}%`);
console.log(`AI Generation Risk: ${(result.aiGeneratedProbability * 100).toFixed(1)}%`);
```

Example 3: Express.js Backend Integration
```typescript
import express from 'express';
import fileUpload from 'express-fileupload';
import { verifyDocument } from 'docshield';

const app = express();
app.use(fileUpload());

app.post('/api/verify', async (req, res) => {
  try {
    if (!req.files?.document) {
      return res.status(400).json({ error: 'No file uploaded' });
    }

    const file = req.files.document as any;
    const result = await verifyDocument({
      buffer: file.data,
      type: req.body.type || 'pdf'
    });

    res.json({
      authenticity: result.confidenceScore,
      riskLevel: result.forensicSummary.riskLevel,
      issues: result.detectedIssues,
      hash: result.evidenceHash
    });
  } catch (error) {
    res.status(500).json({ error: (error as Error).message });
  }
});

app.listen(3000, () => console.log('Server running on :3000'));
```

Example 4: Advanced Configuration
```typescript
import { verifyDocument } from 'docshield';

const result = await verifyDocument(
  { buffer: pdfBuffer, type: 'pdf' },
  {
    verbose: true,
    enableDeepfakeDetection: true,
    enableWatermarkAnalysis: true,
    ocrThreshold: 0.75,
    weights: {
      metadataWeight: 0.3,  // Increase metadata importance
      ocrWeight: 0.25,
      watermarkWeight: 0.15,
      modelWeight: 0.3      // Reduce ML dependence
    }
  }
);
```

🔗 Resources
- TypeScript Documentation
- Node.js Best Practices
- Digital Forensics
- Document Authentication
- ML-based Analysis
📝 License
MIT License - See LICENSE file for details
🤝 Contributing
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Submit a pull request
📞 Support
For issues & feature requests: GitHub Issues
📊 Summary
docshield provides:
- ✨ Forensic-grade probabilistic detection
- 🧠 Multi-signal analysis (metadata, OCR, watermark, ML)
- 📋 Detailed explainable results
- 🔐 Evidence integrity (SHA-256 hashing)
- 🌍 Node.js + Browser support
- 🛡️ Production-ready TypeScript
Happy Verifying! 🛡️
