webform-privacy-consent-scanner
v0.1.10
Webform Privacy Consent Scanner
Advanced web form scanner detecting Google Forms, HubSpot Forms, Microsoft Forms, Formstack Forms with comprehensive CMP detection including Cookiebot, OneTrust, Efilli, and GDPR compliance auditing. Also detects 3rd party application links and embedded forms on websites.

🚀 Quick Start
# Install
npm install -g webform-privacy-consent-scanner
# Basic scan with CMP detection
webform-scanner --input urls.txt --out results.csv --cmp
# Full scan with dynamic rendering (default wait: 6000ms)
webform-scanner --input urls.txt --dynamic --cmp
✨ Features
- 🔍 Multi-Platform Form Detection: Google Forms, HubSpot Forms, Microsoft Forms, Formstack Forms, Target Website Forms
- 🔗 3rd Party Application Link Detection: Detects links to external applications and forms on websites
- 🍪 Comprehensive CMP Detection: Cookiebot, OneTrust, Efilli, GTM, Generic GDPR
- 🌐 Advanced Scanning: Static HTML + optional Playwright dynamic rendering
- 🔄 Smart Fallback: Automatic curl fallback for blocked requests
- 📊 Multiple Outputs: CSV, JSON, filtered text reports
- ⚡ High Performance: Concurrent scanning with configurable limits
- 🔒 Privacy Focused: Respects robots.txt, ethical scanning practices
- 🎯 CLI First: Powerful command-line interface with extensive options
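The "concurrent scanning with configurable limits" feature can be pictured as a small worker pool. The following is a minimal sketch under stated assumptions, not the scanner's actual implementation: `mapWithLimit` and `fakeScan` are hypothetical names, and the fake task stands in for a real page fetch so the example runs offline.

```javascript
// Hypothetical helper: run `task` over `items` with at most `limit` tasks in flight.
// Results keep the input order; a failing task is recorded instead of aborting the run.
async function mapWithLimit(items, limit, task) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to claim

  async function worker() {
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = { ok: true, value: await task(items[index]) };
      } catch (error) {
        results[index] = { ok: false, error: String(error) };
      }
    }
  }

  // Start no more workers than there are items, and at least one.
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, worker));
  return results;
}

// Stand-in for a real page fetch (assumption, no network access).
const fakeScan = async (url) => ({ url, status: 200 });

mapWithLimit(["https://a.example/", "https://b.example/"], 8, fakeScan)
  .then((rows) => console.log(`scanned ${rows.length} URLs`));
```

Capturing errors per item means one blocked or slow site does not discard the results already collected for the others.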
📦 Installation
Requirements
- Node.js >= 18
- npm
Optional Dependencies
- Playwright (for dynamic scanning):
npm install -D playwright && npx playwright install
Global Installation
npm install -g webform-privacy-consent-scanner
Local Development
git clone https://github.com/c3nk/webform-privacy-consent-scanner.git
cd webform-privacy-consent-scanner
npm install
# Optional: Install Playwright for dynamic scanning
npm install -D playwright
npx playwright install
🎯 Usage
Basic Scanning
# Scan URLs with CMP detection
webform-scanner --input urls.txt --cmp
# Output: results_2025-01-15T10-30-00.csv
#         results_2025-01-15T10-30-00.json
Advanced Options
# Dynamic scanning with custom wait
webform-scanner --input urls.txt --dynamic --wait 10000 --cmp
# High concurrency for large lists
webform-scanner --input large-list.txt --concurrency 16 --timeout 20000
# Custom output location
webform-scanner --input urls.txt --out my-scan-results.csv
# Collector pattern detection (DOM analysis)
webform-scanner --input urls.txt --collectors "forms.example.com/*,yourbrand.formstack.com/*" --cmp
Filtering Results
# Filter Google Forms
node filter.mjs --attr is_google_form --value true
# Filter by CMP vendor
node filter.mjs --attr cmp_vendor --value Cookiebot
# Case-insensitive search
node filter.mjs --attr url --value example.com --ci --contains
🎪 Supported Platforms
Form Types
- Google Forms: Direct URLs, embedded iframes, form actions
- HubSpot Forms: Script detection, API endpoints, inline JavaScript
- Microsoft Forms: Response pages, short URLs, Office UI framework
- Formstack Forms: Hosted forms, embed scripts, iframe integration
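The form types above are recognized by signatures in the page source. Here is a minimal sketch of that idea, assuming simple substring matching: the Google, HubSpot, and Microsoft strings are the `evidence` values shown in the sample output of this README, while the `formstack.com` signature and the names `FORM_SIGNATURES` and `detectForms` are assumptions made for illustration.

```javascript
// Signature substrings per form type (illustrative, not the scanner's full rule set).
const FORM_SIGNATURES = {
  google: ["docs.google.com/forms"],
  hubspot: ["hbspt.forms.create("],
  microsoft: ["forms.office.com"],
  formstack: ["formstack.com"], // assumption: any formstack.com reference counts
};

// Return the detected form types and the first matching signature for each.
function detectForms(html) {
  const haystack = html.toLowerCase();
  const detectedTypes = [];
  const evidence = [];
  for (const [type, needles] of Object.entries(FORM_SIGNATURES)) {
    const hit = needles.find((needle) => haystack.includes(needle.toLowerCase()));
    if (hit) {
      detectedTypes.push(type);
      evidence.push(hit);
    }
  }
  return { detected_types: detectedTypes, evidence: evidence.join("; ") };
}

console.log(detectForms('<script>hbspt.forms.create({ portalId: "1" })</script>'));
```

Substring matching only sees what is in the HTML it is given, which is why forms injected by JavaScript need the `--dynamic` rendering mode.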
CMP Platforms
- Cookiebot: Popular EU CMP solution
- OneTrust: Enterprise-grade consent management
- Efilli: Turkish CMP platform
- Google Tag Manager: GTM-loaded CMP detection
- Generic: Standard GDPR/cookie consent banners
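CMP detection can be sketched the same way, as an ordered list of vendor signatures checked against the page source. This is an illustration under assumptions rather than the scanner's real rule set: the needles below (for example `consent.cookiebot.com` and `cdn.cookielaw.org`) are commonly seen vendor script hosts, the `efilli` and generic banner strings are guesses, and GTM-loaded CMPs are not modeled. `CMP_SIGNATURES` and `detectCmp` are hypothetical names.

```javascript
// Ordered vendor signatures; all needles are lowercase (illustrative assumptions).
const CMP_SIGNATURES = [
  { vendor: "Cookiebot", needles: ["consent.cookiebot.com", "cookiebot"] },
  { vendor: "OneTrust", needles: ["cdn.cookielaw.org", "otsdkstub", "onetrust"] },
  { vendor: "Efilli", needles: ["efilli"] },
  { vendor: "Generic", needles: ["cookie-consent", "cookie-banner"] },
];

// Return the first vendor whose signature appears in the HTML.
function detectCmp(html) {
  const haystack = html.toLowerCase();
  for (const { vendor, needles } of CMP_SIGNATURES) {
    const hit = needles.find((needle) => haystack.includes(needle));
    if (hit) {
      return { has_cmp: true, cmp_vendor: vendor, cmp_evidence: hit };
    }
  }
  return { has_cmp: false, cmp_vendor: "", cmp_evidence: "" };
}

console.log(detectCmp('<script src="https://cdn.cookielaw.org/scripttemplates/otSDKStub.js"></script>'));
```

Listing specific vendors before the generic fallback keeps a branded banner from being reported as "Generic".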
📊 Output Formats
CSV Format
url,method,status,is_target_website_form,is_yourcompany_form,is_example_form,is_google_form,is_hubspot_form,is_microsoft_form,is_formstack_form,detected_types,evidence,has_cmp,cmp_vendor,cmp_evidence,collectors_detected,collector_link_count,collector_embed_count,linked_forms_detected,linked_forms_count,note
https://www.c3nk.com/examples/forms/hubspot.html?cmp=cookiebot&mode=mock,dynamic,200,false,true,false,["hubspot"],"hbspt.forms.create(",true,"Cookiebot","Cookiebot",true,2,1,
https://www.c3nk.com/examples/forms/google.html?cmp=onetrust&mode=mock,dynamic,200,true,false,false,["google"],"docs.google.com/forms",true,"Cookiebot","cookiebot",false,0,0,
https://www.c3nk.com/examples/forms/microsoft.html?cmp=efilli&mode=mock,dynamic,200,false,false,true,["microsoft"],"forms.office.com",true,"Cookiebot","cookiebot",true,1,1,
JSON Format
[
{
"url": "https://www.c3nk.com/examples/forms/hubspot.html?cmp=cookiebot&mode=mock",
"method": "dynamic",
"status": 200,
"is_google_form": false,
"is_hubspot_form": true,
"is_microsoft_form": false,
"detected_types": ["hubspot"],
"evidence": "hbspt.forms.create(",
"has_cmp": true,
"cmp_vendor": "Cookiebot",
"cmp_evidence": "Cookiebot",
"collectors_detected": true,
"collector_link_count": 2,
"collector_embed_count": 1,
"collectors": [
{
"target_pattern": "forms.example.com/*",
"matched_url": "https://forms.example.com/contact",
"relation": "link",
"match_type": "wildcard",
"text_or_context": "Contact us for more information"
}
],
"note": ""
},
{
"url": "https://www.c3nk.com/examples/forms/google.html?cmp=onetrust&mode=mock",
"method": "dynamic",
"status": 200,
"is_google_form": true,
"is_hubspot_form": false,
"is_microsoft_form": false,
"detected_types": ["google"],
"evidence": "docs.google.com/forms",
"has_cmp": true,
"cmp_vendor": "Cookiebot",
"cmp_evidence": "cookiebot",
"collectors_detected": false,
"collector_link_count": 0,
"collector_embed_count": 0,
"collectors": [],
"note": ""
}
]
Filtered Text Report
FILTER REPORT
=============
Input file: test-results.json
Filter: is_google_form = true
Total results: 11
Filtered results: 1
RESULTS:
--------
https://www.c3nk.com/examples/forms/google.html?cmp=onetrust&mode=mock
🔧 CLI Options
| Option | Description | Default |
|--------|-------------|---------|
| --input <file> | Input file with URLs | Required |
| --out <file> | Output file path | results_TIMESTAMP.csv |
| --concurrency <n> | Number of concurrent requests | 8 |
| --timeout <ms> | Request timeout | 15000 |
| --dynamic | Enable dynamic scanning | false |
| --wait <ms> | Wait time for dynamic content, in milliseconds (used with --dynamic) | 6000 |
| --cmp | Enable CMP detection | false |
| --collectors <patterns> | Comma-separated glob patterns for link/iframe/script detection | None |
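To illustrate how a comma-separated `--collectors` value might be applied to the links found on a page, here is a minimal sketch that turns each glob into a regular expression and matches it against the host and path of a URL. The function names `globToRegExp` and `matchCollectors` are hypothetical, and treating `*` as "any run of characters" while ignoring the URL scheme is an assumption about the matching rules, not documented behavior.

```javascript
// Convert a glob such as "forms.example.com/*" into an anchored, case-insensitive RegExp.
function globToRegExp(pattern) {
  const escaped = pattern
    .trim()
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&")) // escape regex metacharacters
    .join(".*"); // each "*" matches any run of characters
  return new RegExp("^" + escaped + "$", "i");
}

// Match every URL against every pattern, comparing host + path without the scheme.
function matchCollectors(patternList, urls) {
  const patterns = patternList.split(",").map((p) => p.trim()).filter(Boolean);
  const matches = [];
  for (const url of urls) {
    const target = url.replace(/^[a-z][a-z0-9+.-]*:\/\//i, "");
    for (const pattern of patterns) {
      if (globToRegExp(pattern).test(target)) {
        matches.push({ target_pattern: pattern, matched_url: url });
      }
    }
  }
  return matches;
}

console.log(
  matchCollectors("forms.example.com/*,yourbrand.formstack.com/*", [
    "https://forms.example.com/contact",
    "https://example.com/about",
  ])
);
```

The `target_pattern` and `matched_url` keys mirror the field names in the `collectors` array of the JSON output.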
📈 Examples
Live test pages (hosted on c3nk.com)
- HubSpot + Cookiebot (mock signatures): https://www.c3nk.com/examples/forms/hubspot.html?cmp=cookiebot&mode=mock
- Google + OneTrust (mock signatures): https://www.c3nk.com/examples/forms/google.html?cmp=onetrust&mode=mock
- Microsoft + Efilli (mock signatures): https://www.c3nk.com/examples/forms/microsoft.html?cmp=efilli&mode=mock
In mock mode, the pages embed only detection signatures (no third‑party requests). Use
&mode=live instead to actually load vendor scripts for visual checks.
Local examples (in this repo)
We ship three mock pages under examples/forms/ with the same signature logic.
1. Basic Website Audit
echo "https://c3nk.com/" > urls.txt
webform-scanner --input urls.txt --cmp
2. Large Scale Scanning
webform-scanner --input company-websites.txt --concurrency 20 --cmp --dynamic
3. GDPR Compliance Audit
webform-scanner --input eu-websites.txt --cmp --out gdpr-audit.csv
4. Collector Pattern Detection
# Detect links to specific forms
webform-scanner --input urls.txt --collectors "forms.example.com/*,yourbrand.formstack.com/*"
# Combine with form detection and CMP
webform-scanner --input urls.txt --collectors "*.formstack.com/forms/*" --cmp --dynamic
# Multiple patterns for comprehensive analysis
webform-scanner --input company-sites.txt --collectors "contact.example.com/*,forms.example.com/*,apply.example.com/*"
5. Development Testing
npm run scan:static # Uses examples/urls.sample.txt
npm run scan:full # Dynamic scanning with examples
🤝 Contributing
We welcome contributions! Please see our Contributing Guide for details.
Development
npm run start # Show help
npm run scan:static # Test static scanning
npm run scan:full # Test dynamic scanning
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🔒 Security
Please see our Security Policy for responsible disclosure practices.
Responsible Use
- Only scan websites you own or have permission to test
- Respect robots.txt and website terms of service
- Use for legitimate privacy compliance auditing only
🗺️ Roadmap
- [ ] Additional CMP platform support
- [ ] Advanced filtering options
- [ ] Web interface
- [ ] API endpoints
- [ ] Docker containerization
- [ ] Integration with popular CI/CD platforms
📞 Support
- 📧 Email: [email protected]
- 🐛 Issues: GitHub Issues
- 📖 Documentation: User Guide | User Guide (TR)
Built with ❤️ by c3nk.com
Webform Privacy Consent Scanner (Turkish version)
An advanced web form scanner that detects Google Forms, HubSpot Forms, and Microsoft Forms, with comprehensive CMP detection (including Cookiebot, OneTrust, and Efilli) and GDPR compliance auditing.
🚀 Quick Start
# Install
npm install -g webform-privacy-consent-scanner
# Basic scan
webform-scanner --input urls.txt --cmp
# Full scan (dynamic, default wait: 6000ms)
webform-scanner --input urls.txt --dynamic --cmp
✨ Features
- 🔍 Multi-Platform Form Detection: Google Forms, HubSpot Forms, Microsoft Forms
- 🍪 Comprehensive CMP Detection: Cookiebot, OneTrust, Efilli, GTM, Generic GDPR
- 🌐 Advanced Scanning: Static HTML + optional Playwright dynamic rendering
- 🔄 Smart Fallback: Automatic curl fallback for blocked requests
- 📊 Multiple Outputs: CSV, JSON, filtered text reports
- ⚡ High Performance: Concurrent scanning with configurable limits
- 🔒 Privacy Focused: Respects robots.txt, ethical scanning practices
📦 Installation
# Global installation
npm install -g webform-privacy-consent-scanner
# Local development
git clone https://github.com/c3nk/webform-privacy-consent-scanner.git
cd webform-privacy-consent-scanner
npm install
🎯 Usage
Basic Scanning
# Scan URLs with CMP detection
webform-scanner --input urls.txt --cmp
Advanced Options
# Dynamic scanning
webform-scanner --input urls.txt --dynamic --wait 10000 --cmp
# High concurrency
webform-scanner --input buyuk-liste.txt --concurrency 16 --timeout 20000
Filtering Results
# Filter Google Forms
node filter.mjs --attr is_google_form --value true
# Filter by CMP vendor
node filter.mjs --attr cmp_vendor --value Cookiebot
🎪 Supported Platforms
Form Types
- Google Forms: Direct URLs, embedded iframes, form actions
- HubSpot Forms: Script detection, API endpoints, inline JavaScript
- Microsoft Forms: Response pages, short URLs, Office UI framework
CMP Platforms
- Cookiebot: Popular EU CMP solution
- OneTrust: Enterprise-grade consent management
- Efilli: Turkish CMP platform
- Google Tag Manager: GTM-loaded CMP detection
- Generic: Standard GDPR/cookie consent banners
🤝 Contributing
We welcome contributions! Please see the Contributing Guide for details.
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🔒 Security
Please see our Security Policy for responsible disclosure practices.
Responsible Use:
- Only scan websites you own or have permission to test
- Respect robots.txt and website terms of service
- Use for legitimate privacy compliance auditing only
Built with ❤️ by c3nk.com
