ai-warden v1.1.1
AI security scanner - Detect prompt injection attacks and PII with user settings
🛡️ AI-Warden
Production-ready AI security scanner for Node.js and Python. Detect prompt injection attacks and PII leaks with dual-mode operation.
🎯 Two Modes, One Package
AI-Warden works in two modes to fit your needs:
🆓 Offline Mode (Free Forever)
Fast local pattern matching. No API key required. Perfect for:
- CI/CD pipelines and pre-commit hooks
- Privacy-sensitive applications (no data leaves your server)
- Quick local validation
- Testing and development
🚀 API Mode (Subscription)
Full Aegis 3-layer cascade protection via our API. Includes:
- Self-learning Vector DB (958+ attack patterns, growing daily)
- ML-powered semantic detection
- LLM validation for zero-day threats
- User-configurable settings
- Real-time pattern updates
- PII masking preferences
Get your API key: ai-warden.io/signup
Free tier: 5,000 validations/month (no credit card required)
📦 Installation
```bash
npm install ai-warden
```

🚀 Quick Start
Offline Mode (Free)
No signup required. Works completely offline with local pattern matching.
```javascript
const AIWarden = require('ai-warden');

// No API key = offline mode
const scanner = new AIWarden();

// Fast local validation (<1ms)
const result = scanner.scan('Ignore all previous instructions');
console.log(result.safe);      // false
console.log(result.riskScore); // 85
console.log(result.patterns);  // ['instruction_override']
```

What you get in offline mode:
- ✅ 100+ prompt injection patterns
- ✅ 34+ PII detection patterns (email, SSN, credit cards, IBAN, IP)
- ✅ Risk scoring (0-1000)
- ✅ Pattern categorization
- ✅ Works completely offline
- ✅ <1ms response time
- ✅ Zero cost
API Mode (Subscription)
Get full Aegis cascade protection with Vector DB, ML, and LLM validation.
```javascript
const AIWarden = require('ai-warden');

// With API key = API mode
const warden = new AIWarden(process.env.AI_WARDEN_API_KEY);

// Full Aegis cascade validation
const result = await warden.validate('Ignore all previous instructions');
console.log(result.blocked);    // true
console.log(result.layer);      // 'vector_db'
console.log(result.confidence); // 0.95
console.log(result.layer_name); // 'PERIMETER DEFENSE'
```

What you get in API mode:
- ✅ All offline features PLUS:
- ✅ Self-learning Vector DB (semantic similarity)
- ✅ ML-powered detection (ProtectAI DeBERTa model)
- ✅ LLM validation (Azure OpenAI gpt-4o-mini)
- ✅ User settings (custom whitelist, masking preferences)
- ✅ Real-time pattern updates
- ✅ Auto-capture of new attack variants
- ✅ 95% of requests resolved by the two fastest layers (Vector DB + patterns)
Pricing:
- FREE: 5,000 validations/month
- STARTER: €19/month (50K validations)
- GROWTH: €89/month (500K validations)
- ENTERPRISE: €599/month (unlimited)
📚 Usage Examples
Offline Mode (scan)
```javascript
const AIWarden = require('ai-warden');
const scanner = new AIWarden();

// Basic scan
const result = scanner.scan('User input text');
if (!result.safe) {
  console.log('⚠️ Threat detected');
  console.log('Risk score:', result.riskScore);
  console.log('Patterns:', result.patterns);
  console.log('Severity:', result.severity); // 'LOW', 'MEDIUM', 'HIGH', 'CRITICAL'
}

// With options
const strictResult = scanner.scan('Text to check', {
  mode: 'strict',  // 'strict' | 'balanced' | 'permissive'
  threshold: 75,   // Custom risk threshold
  verbose: true    // Detailed output
});
```

API Mode (validate)
```javascript
const AIWarden = require('ai-warden');
const warden = new AIWarden(process.env.AI_WARDEN_API_KEY);
const scanner = new AIWarden(); // keyless instance for offline fallback

try {
  // Full Aegis cascade validation
  const result = await warden.validate('User input text');
  if (result.blocked) {
    return res.status(400).json({
      error: 'Input rejected by security scanner',
      reason: result.reason
    });
  }
  // Process safe input (use cleanText if PII masking is enabled)
  processUserInput(result.cleanText || result.text);
} catch (error) {
  if (error.message.includes('API key required')) {
    console.error('Please sign up at https://ai-warden.io/signup');
  } else if (error.message.includes('API unavailable')) {
    // Fall back to offline mode
    const result = scanner.scan('User input text');
  }
}
```

Hybrid Approach (Best Practice)
Combine both modes for optimal performance and cost:
```javascript
const AIWarden = require('ai-warden');
const scanner = new AIWarden();
const warden = new AIWarden(process.env.AI_WARDEN_API_KEY);

async function validateInput(text) {
  // Step 1: Fast local pre-filter (offline, free)
  const quickCheck = scanner.scan(text);
  if (quickCheck.riskScore > 200) {
    // Obviously malicious, reject immediately
    return { blocked: true, reason: 'High-risk patterns detected' };
  }
  if (quickCheck.riskScore < 50) {
    // Obviously safe, accept immediately
    return { blocked: false, text };
  }
  // Step 2: Borderline case - send to the API for deep analysis
  const deepCheck = await warden.validate(text);
  return deepCheck;
}

// This approach saves API calls while maintaining security
const result = await validateInput(userInput);
```

PII Detection & Handling (3 Modes)
AI-Warden v1.0.3+ includes powerful PII detection with 3 handling modes:
- `ignore` - Detect PII but don't modify the text (just report findings)
- `mask` - Replace PII with labeled placeholders (`[EMAIL]`, `[SSN]`, etc.)
- `remove` - Remove PII from the text completely
```javascript
const { PIIDetector, PII_MODES } = require('ai-warden/src/pii');

const text = 'Contact: [email protected], SSN: 123-45-6789, Card: 5425-2334-3010-9903';

// Mode 1: IGNORE (detect only, don't modify)
const detector1 = new PIIDetector({ mode: PII_MODES.IGNORE });
const result1 = detector1.detect(text);
console.log(result1.hasPII);   // true
console.log(result1.count);    // 3
console.log(result1.findings); // Array of detected PII
console.log(result1.modified); // Original text (unchanged)

// Mode 2: MASK (replace with labels)
const detector2 = new PIIDetector({ mode: PII_MODES.MASK });
const result2 = detector2.detect(text);
console.log(result2.modified);
// "Contact: [EMAIL], SSN: [SSN], Card: [CREDIT_CARD]"

// Mode 3: REMOVE (delete PII completely)
const detector3 = new PIIDetector({ mode: PII_MODES.REMOVE });
const result3 = detector3.detect(text);
console.log(result3.modified);
// "Contact: , SSN: , Card: "

// Convenience methods (use the default mode from config)
const detector = new PIIDetector();
console.log(detector.hasPII(text));    // true
console.log(detector.maskPII(text));   // Quick mask
console.log(detector.removePII(text)); // Quick remove
```

Supported PII types:
- Credit Cards (Visa, Mastercard, Amex, Discover) - with Luhn validation
- US SSN (Social Security Numbers)
- Emails (RFC 5322 compliant)
- Phone numbers (US + international formats)
- IPv4/IPv6 addresses
- Swedish Personnummer, Norwegian Fødselsnummer, Danish CPR, Finnish Henkilötunnus
- IBAN (European bank accounts)
- US Passports & Driver Licenses
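The Luhn checksum mentioned above for credit cards (and Swedish personnummer) can be sketched in a few lines. This is the standard algorithm, not ai-warden's internal implementation:

```javascript
// Standard Luhn check: from the rightmost digit, double every second
// digit, subtract 9 from results above 9, and require the sum to be
// divisible by 10.
function luhnCheck(number) {
  const digits = number.replace(/\D/g, '').split('').reverse().map(Number);
  const sum = digits.reduce((acc, d, i) => {
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    return acc + d;
  }, 0);
  return digits.length > 0 && sum % 10 === 0;
}

luhnCheck('5425-2334-3010-9903'); // the example card above passes
luhnCheck('5425-2334-3010-9904'); // a single changed digit fails
```

Rejecting digit strings that fail the checksum is what lets a detector skip look-alike numbers (order IDs, timestamps) and cut false positives.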
🎮 CLI Usage
AI-Warden includes a command-line tool for file, directory, and skill repo scanning.
```bash
# Install globally
npm install -g ai-warden
```

🆓 Offline Commands (Free — no API key needed)
These run entirely on your machine. No data leaves your system.
```bash
# Scan local files and directories
aiwarden scan file.txt
aiwarden scan ./src
aiwarden scan ./src --mode strict --verbose
aiwarden scan ./src --interactive

# Scan remote skill repos (local pattern matching)
aiwarden scan-skill https://github.com/user/skill --offline
aiwarden scan-skill https://github.com/user/skill --offline --json
aiwarden scan-skill https://github.com/user/skill --offline --strict
```

🔑 API Key Setup (required for API-powered features)
```bash
# Login (opens browser, saves key locally)
aiwarden login

# Check your current key and tier
aiwarden whoami

# Or set the key manually
export AI_WARDEN_API_KEY=sk_live_xxx
```

Get your API key: ai-warden.io/signup (free tier: 5,000 validations/month)
🚀 API-Powered Commands (requires API key)
Full Judge Mars ML analysis — near-zero false positives, deeper detection.
```bash
# Validate text input
aiwarden validate "User input text"
aiwarden validate "Text to check" --json

# Scan skill repos with the ML engine
aiwarden scan-skill https://github.com/user/skill
aiwarden scan-skill https://github.com/user/skill --json --strict
```

CLI Options
Scan options:
- `--mode <strict|balanced|permissive>` - Detection sensitivity
- `--verbose` - Detailed output
- `--interactive` - Interactive whitelist mode
- `--ignore-file <path>` - Custom .aiwardenignore file
Scan-skill options:
- `--offline` - Use local scanner only (free, no API key)
- `--json` - Machine-readable JSON output
- `--strict` - Exit code 1 unless verdict is SAFE
🔍 Skill Scanner — Example Output
```
🔍 AI-Warden Skill Scan
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Skill: smart-web-search
Source: github.com/davidme6/smart-web-search
Files: 4 scanned
Mode: offline

LICENSE    ✅ Safe (0.00)
README.md  ❌ CRITICAL (1.00)
├─ P102: Data Forwarding Instructions [CRITICAL] — "Email**: [email protected]"
└─ H003: Excessive External URLs [LOW] — "Found 11 external URLs"
SKILL.md   ✅ Safe (0.19)
└─ H003: Excessive External URLs [LOW] — "Found 20 external URLs"
_meta.json ✅ Safe (0.00)

Verdict: ❌ DANGEROUS
Trust Score: 0/100
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```

Verdicts

| Verdict | Trust Score | Meaning |
|---------|-------------|---------|
| ✅ SAFE | 70-100 | No threats detected |
| ⚠️ WARNING | 25-69 | Suspicious patterns found, review recommended |
| ❌ DANGEROUS | 0-24 | Active threats detected, do not install |
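The trust-score bands can be expressed as a tiny helper. The band edges come from the verdict table; the function itself is ours, not part of the CLI:

```javascript
// Map a 0-100 trust score to the verdict bands from the table:
// 70-100 SAFE, 25-69 WARNING, 0-24 DANGEROUS.
function verdictFromTrustScore(score) {
  if (score >= 70) return 'SAFE';
  if (score >= 25) return 'WARNING';
  return 'DANGEROUS';
}

verdictFromTrustScore(85); // 'SAFE'
verdictFromTrustScore(40); // 'WARNING'
verdictFromTrustScore(0);  // 'DANGEROUS'
```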
Offline vs API Mode
| | Offline (free) | API (metered) |
|---|---|---|
| Detection | Regex patterns | Judge Mars ML + patterns |
| Speed | Instant | ~150ms/file |
| False positives | Higher | Lower |
| Zero-day threats | ❌ | ✅ |
| Requires API key | No | Yes |
🔧 Configuration
Constructor Options
```javascript
const warden = new AIWarden('sk_live_xxx', {
  apiUrl: 'https://api.ai-warden.io', // API endpoint
  mode: 'balanced',                   // Scanner mode
  threshold: 150,                     // Custom risk threshold
  verbose: false,                     // Verbose logging
  context: 'user'                     // Content context
});
```

Scanner Modes
| Mode | Threshold | Use Case |
|------|-----------|----------|
| strict | 75 | High-security apps (financial, healthcare) |
| balanced | 150 | General production use (default) |
| permissive | 250 | Creative AI apps, lower false positives |
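As an illustration of what the thresholds mean (this is our sketch, not the library's internals), a mode's threshold plausibly acts as the cut-off between a safe and a flagged risk score:

```javascript
// Hypothetical decision rule: a scan is "safe" while its risk score
// stays below the active mode's threshold from the table above.
const MODE_THRESHOLDS = { strict: 75, balanced: 150, permissive: 250 };

function isSafeAt(riskScore, mode = 'balanced') {
  return riskScore < MODE_THRESHOLDS[mode];
}

isSafeAt(100, 'strict');     // flagged under strict
isSafeAt(100, 'balanced');   // passes the default mode
isSafeAt(200, 'permissive'); // passes the permissive mode
```

The same input can therefore be blocked in `strict` and allowed in `permissive`, which is why creative AI apps pick the higher threshold.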
API Methods
scan(text, options) - Offline Mode
Local pattern matching. No API key required.
```javascript
scanner.scan(text, {
  mode: 'balanced',
  threshold: 150,
  verbose: false
});
```

Returns:

```javascript
{
  safe: boolean,
  riskScore: number,    // 0-1000
  patterns: string[],   // Matched pattern names
  severity: string,     // 'SAFE', 'LOW', 'MEDIUM', 'HIGH', 'CRITICAL'
  findings: object[],   // Detailed findings
  piiFindings: object[] // Detected PII
}
```

validate(text, options) - API Mode
Full Aegis cascade via API. Requires API key.
```javascript
await warden.validate(text, {
  threatModel: 'prompt_injection',
  context: 'user'
});
```

Returns:

```javascript
{
  safe: boolean,
  blocked: boolean,
  layer: string,          // 'vector_db' | 'pattern' | 'ml' | 'llm'
  layer_name: string,     // Human-readable layer name
  confidence: number,     // 0.0-1.0
  reason: string,         // Block reason
  cleanText: string,      // PII-masked text (if enabled)
  appliedSettings: object // User settings applied
}
```

Throws: Error if no API key provided
detectPII(text, options) - PII Detection
Detect personally identifiable information.
```javascript
scanner.detectPII(text, {
  types: ['email', 'ssn', 'credit_card'] // Optional filter
});
```

Returns:

```javascript
{
  types: string[],    // PII types found
  findings: object[]  // Detailed findings with positions
}
```

maskPII(text, findings, options) - PII Masking
Mask detected PII in text.
```javascript
scanner.maskPII(text, findings, {
  maskChar: '*',
  preserveLength: true
});
```

🎯 Use Cases
1. Production API Input Validation
```javascript
app.post('/api/chat', async (req, res) => {
  const { message } = req.body;

  // Validate with the API
  const result = await warden.validate(message);
  if (result.blocked) {
    return res.status(400).json({
      error: 'Message rejected',
      reason: result.reason
    });
  }

  // Safe to send to the LLM (cleanText is the PII-masked text, if enabled)
  const response = await openai.chat.completions.create({
    messages: [{ role: 'user', content: result.cleanText || message }]
  });
  res.json({ response: response.choices[0].message.content });
});
```

2. CI/CD Pre-commit Hook
```bash
#!/bin/bash
# .git/hooks/pre-commit
npx aiwarden scan ./prompts --mode strict
if [ $? -ne 0 ]; then
  echo "❌ Prompt injection detected in prompts/"
  exit 1
fi
```

3. Privacy-First PII Scrubbing
```javascript
const scanner = new AIWarden();

function sanitizeUserData(data) {
  const pii = scanner.detectPII(data);
  if (pii.findings.length > 0) {
    return scanner.maskPII(data, pii.findings);
  }
  return data;
}

// Logs safe to store
const cleanLog = sanitizeUserData(userMessage);
db.logs.insert({ message: cleanLog });
```

4. Real-time Chat Moderation
```javascript
// Fast pre-filter with offline mode
const quickCheck = scanner.scan(message);
if (quickCheck.riskScore > 200) {
  socket.emit('message_blocked', { reason: 'Security policy' });
  return;
}

// Deep check with the API (async, doesn't block the user)
warden.validate(message).then(result => {
  if (result.blocked) {
    moderationQueue.add({ message, user, result });
  }
});
```

🔐 Supported PII Types
| Type | Examples | Validation |
|------|----------|------------|
| Email | [email protected] | RFC 5322 |
| Phone | +1-555-123-4567 | International formats |
| SSN (US) | 123-45-6789 | Checksum |
| SSN (SE) | 19900101-1234 | Luhn algorithm |
| Credit Card | 4532-1111-2222-3333 | Luhn algorithm |
| IBAN | DE89370400440532013000 | Mod-97 checksum |
| IP Address | 192.168.1.1 | IPv4 & IPv6 |
| API Keys | sk_live_xxx | Common patterns |
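The Mod-97 check used for IBANs is the standard ISO 13616 validation: move the first four characters to the end, replace letters with numbers (A=10 … Z=35), and require the resulting integer mod 97 to equal 1. A minimal sketch (not ai-warden's implementation):

```javascript
// ISO 13616 Mod-97 IBAN validation, using BigInt since the numeric
// form easily exceeds Number.MAX_SAFE_INTEGER.
function ibanCheck(iban) {
  const s = iban.replace(/\s+/g, '').toUpperCase();
  const rearranged = s.slice(4) + s.slice(0, 4);
  const numeric = rearranged.replace(/[A-Z]/g, c => (c.charCodeAt(0) - 55).toString());
  return BigInt(numeric) % 97n === 1n;
}

ibanCheck('DE89 3704 0044 0532 0130 00'); // the example IBAN above validates
ibanCheck('DE89370400440532013001');      // one changed digit fails
```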
📊 Performance
| Mode | Avg Response Time | API Calls | Cost |
|------|-------------------|-----------|------|
| Offline (scan) | <1ms | 0 | FREE |
| API (validate) - Vector DB | 50-80ms | 1 | ~€0.001 |
| API (validate) - Pattern | <1ms | 1 | ~€0.001 |
| API (validate) - ML | ~400ms | 1 | ~€0.002 |
| API (validate) - LLM | ~1200ms | 1 | ~€0.005 |
Aegis Cascade Intelligence:
- 60% of attacks caught by Vector DB (50-80ms)
- 35% caught by Pattern layer (<1ms)
- 4% require ML validation (~400ms)
- 1% require LLM validation (~1200ms)
Result: 95% of requests are resolved by the two fastest layers, without ever touching the ML or LLM stages.
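A back-of-envelope check of the cascade numbers above (using the midpoint of the 50-80ms Vector DB range; this is arithmetic on the published figures, not a measurement):

```javascript
// Expected per-request latency = sum of (layer share × layer latency).
const layers = [
  { name: 'vector_db', share: 0.60, ms: 65 },   // midpoint of 50-80ms
  { name: 'pattern',   share: 0.35, ms: 0.5 },
  { name: 'ml',        share: 0.04, ms: 400 },
  { name: 'llm',       share: 0.01, ms: 1200 },
];

const expectedMs = layers.reduce((acc, l) => acc + l.share * l.ms, 0);
// ≈ 67 ms expected latency, with 95% of requests never reaching ML or LLM
```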
🛡️ Security Best Practices
- Never trust user input - Always validate before sending to LLMs
- Use hybrid approach - Local pre-filter + API for borderline cases
- Mask PII - Enable PII masking in your dashboard settings
- Monitor false positives - Use interactive whitelist mode in dev
- Keep patterns updated - Run `npm update ai-warden` regularly
- Rate limit - Protect your API quota with rate limiting
- Log blocked attempts - Track attack patterns in your logs
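The rate-limiting advice can be sketched as a simple token bucket guarding your validation quota. The class and numbers are illustrative, not part of ai-warden:

```javascript
// Token bucket: holds up to `capacity` tokens, refilled continuously at
// `refillPerSec`; each call that gets a token may hit the API, the rest
// should fall back to the offline scanner.
class TokenBucket {
  constructor(capacity, refillPerSec) {
    this.capacity = capacity;
    this.tokens = capacity;
    this.refillPerSec = refillPerSec;
    this.last = Date.now();
  }

  tryRemove() {
    const now = Date.now();
    this.tokens = Math.min(
      this.capacity,
      this.tokens + ((now - this.last) / 1000) * this.refillPerSec
    );
    this.last = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

const bucket = new TokenBucket(5, 1); // burst of 5, refills 1 token/sec
// if (bucket.tryRemove()) { await warden.validate(text); } else { scanner.scan(text); }
```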
🔗 Links
- Website: ai-warden.io
- Dashboard: ai-warden.io/dashboard
- Pricing: ai-warden.io/pricing
- NPM Package: npmjs.com/package/ai-warden
- GitHub: github.com/ai-warden/scanner
- Support: [email protected]
📝 License
MIT License - see LICENSE file for details
🙏 Credits
Built with ❤️ by the AI-Warden team
Powered by:
- ProtectAI - ML detection model
- Azure OpenAI - LLM validation
- FAISS - Vector similarity search
