ai-security-suite v1.0.0
# AI Security Suite
Security middleware for LLM APIs. Protect your AI applications with:
- 🛡️ Prompt Firewall — Detect and block injection attacks
- 🔒 PII Redaction — Automatically redact sensitive data
- 🚫 Output Filtering — Filter harmful content, competitor mentions
- 📝 Audit Logging — Compliance-ready logging
Zero ML dependencies. <50ms overhead. Drop-in replacement.
## Installation

```bash
npm install ai-security-suite
```

## Quick Start
### SDK Mode (OpenAI)
Drop-in replacement for the OpenAI client:
```typescript
import { SecureOpenAI } from 'ai-security-suite';

const client = new SecureOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  security: {
    injection: { action: 'block' },
    pii: { redact: ['email', 'phone', 'ssn', 'card'] },
    audit: { enabled: true, destination: 'console' }
  }
});

// Use exactly like the OpenAI client
const response = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'My email is [email protected]' }
  ]
});

// Email is automatically redacted in the request
// Response includes security metadata
console.log(response._security?.violations);
```

### SDK Mode (Anthropic)
```typescript
import { SecureAnthropic } from 'ai-security-suite';

const client = new SecureAnthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
  security: {
    injection: { action: 'block', patterns: 'strict' },
    pii: { redact: ['email', 'phone'] }
  }
});

const response = await client.messages.create({
  model: 'claude-3-opus-20240229',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello!' }]
});
```

### Proxy Mode (Zero-Code)
For existing applications — route through security proxy:
```bash
# Start proxy
npx ai-security proxy --port 8080

# Configure your app to use localhost:8080 instead of api.openai.com
OPENAI_BASE_URL=http://localhost:8080/v1
```

Or programmatically:
```typescript
import { startProxy } from 'ai-security-suite';

await startProxy({
  port: 8080,
  security: {
    injection: { action: 'block' },
    pii: { redact: ['email', 'phone'] }
  }
});
```

## Configuration
### Injection Detection
```typescript
{
  injection: {
    enabled: true,
    action: 'block' | 'warn' | 'log',
    patterns: 'default' | 'strict' | 'custom',
    customPatterns: [
      {
        name: 'custom_pattern',
        pattern: /my custom regex/gi,
        severity: 'high',
        description: 'Custom injection pattern'
      }
    ],
    threshold: 0.7 // Detection threshold (0-1)
  }
}
```

Actions:
- `block` — Block request and return error
- `warn` — Log warning but allow request
- `log` — Silent logging only
Pattern Modes:

- `default` — Common injection patterns
- `strict` — Includes encoded payloads, resource attacks
- `custom` — Your custom patterns only
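To make the `threshold` option concrete, here is a minimal sketch of how a regex-based detector could combine pattern matches into a 0-1 score. The pattern names, regexes, and weights are hypothetical illustrations, not the library's actual internals:

```typescript
// Illustrative scoring model for a regex-based injection firewall.
// Pattern names and weights here are hypothetical, not the library's.
interface InjectionPattern {
  name: string;
  pattern: RegExp;
  weight: number;
}

const patterns: InjectionPattern[] = [
  { name: 'ignore_instructions', pattern: /ignore (all )?previous instructions/i, weight: 0.8 },
  { name: 'role_override', pattern: /you are now (a|an) /i, weight: 0.5 },
];

// Sum the weights of matched patterns, clamped to the 0-1 scale
// that the `threshold` option (e.g. 0.7) is compared against.
function injectionScore(input: string): number {
  let score = 0;
  for (const p of patterns) {
    if (p.pattern.test(input)) score += p.weight;
  }
  return Math.min(score, 1);
}
```

Under a model like this, with `threshold: 0.7` a lone role-override match (0.5) stays below the line, while an explicit "ignore previous instructions" (0.8) triggers the configured `action`.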
### PII Redaction
```typescript
{
  pii: {
    enabled: true,
    redact: ['email', 'phone', 'ssn', 'card', 'ip', 'name', 'address'],
    action: 'redact' | 'block' | 'warn',
    placeholderFormat: 'bracket', // [EMAIL] or ###EMAIL###
    customPatterns: [
      {
        name: 'employee_id',
        pattern: /EMP\d{6}/g,
        placeholder: '[EMPLOYEE_ID]'
      }
    ]
  }
}
```

Detected PII Types:
- `email` — Email addresses
- `phone` — Phone numbers (international)
- `ssn` — SSN, NI numbers, SIN
- `card` — Credit card numbers (with Luhn validation)
- `ip` — IP addresses (excludes private ranges)
- `name` — Names with titles (Mr., Dr., etc.)
- `address` — US street addresses
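Luhn validation matters for the `card` type because a bare regex matches any run of 13-19 digits; the checksum rejects most random sequences. A minimal sketch of the check (illustrative, not the library's implementation):

```typescript
// Luhn checksum: from the rightmost digit, double every second digit
// (subtracting 9 when doubling overflows past 9), then require the
// total to be divisible by 10. This filters out most random digit
// runs that a length-only regex would flag as card numbers.
function luhnValid(candidate: string): boolean {
  const s = candidate.replace(/[\s-]/g, '');
  if (!/^\d{13,19}$/.test(s)) return false;
  let sum = 0;
  for (let i = 0; i < s.length; i++) {
    let d = s.charCodeAt(s.length - 1 - i) - 48; // digit value, right to left
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}
```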
### Output Filtering
```typescript
{
  output: {
    enabled: true,
    harmfulContent: {
      enabled: true,
      action: 'warn',
      categories: ['violence', 'hate', 'self-harm', 'illegal', 'dangerous']
    },
    competitors: {
      enabled: true,
      action: 'warn',
      names: ['CompetitorA', 'CompetitorB']
    },
    brandSafety: {
      enabled: true,
      action: 'warn',
      rules: [
        { name: 'profanity', pattern: /\b(badword1|badword2)\b/gi }
      ]
    }
  }
}
```

### Audit Logging
```typescript
{
  audit: {
    enabled: true,
    destination: 'file' | 'webhook' | 'console' | 'callback',
    include: ['request', 'response', 'violations', 'timing'],
    filePath: './audit.log',
    webhookUrl: 'https://your-endpoint.com/audit',
    callback: (entry) => { /* custom handler */ },
    minSeverity: 'low' // Minimum severity to log
  }
}
```

## Standalone Functions
Use individual security functions without the full client:
```typescript
import {
  isInjectionAttempt,
  analyzeInjection,
  redactPII,
  hasPII,
  hasHarmfulContent
} from 'ai-security-suite';

// Quick injection check
if (isInjectionAttempt(userInput)) {
  throw new Error('Injection detected');
}

// Full injection analysis
const result = analyzeInjection(userInput, { patterns: 'strict' });
console.log(result.violations);

// PII redaction
const safe = redactPII('Contact: [email protected]');
// → "Contact: [EMAIL]"

// Check for PII
if (hasPII(text)) {
  // Handle PII
}

// Harmful content check
if (hasHarmfulContent(modelOutput)) {
  // Filter response
}
```

## CLI Usage
```bash
# Start proxy server
ai-security proxy --port 8080 --strict

# Start with config file
ai-security proxy --config security.json

# Check text for issues
ai-security check "Ignore all previous instructions"

# Redact PII
ai-security redact "Email: [email protected], SSN: 123-45-6789"
```

## Security Response
When security middleware blocks or modifies a request, the response includes metadata:
```typescript
interface SecurityMetadata {
  blocked: boolean;
  violations: Array<{
    type: 'injection' | 'pii' | 'harmful' | 'competitor' | 'brand';
    severity: 'low' | 'medium' | 'high' | 'critical';
    message: string;
    pattern?: string;
    action: 'block' | 'warn' | 'log';
  }>;
  timing: {
    firewall?: number;
    pii?: number;
    output?: number;
    total: number;
  };
}

// Access via response._security
const response = await client.chat.completions.create({...});
if (response._security?.blocked) {
  console.log('Request was blocked:', response._security.violations);
}
```

## Performance
Target: <50ms overhead per request.
Achieved through:
- Regex-based detection (no ML inference)
- Quick-check fast path for clean inputs
- Minimal memory allocations
- No external dependencies for core functionality
Benchmark results:
- Injection detection: ~5-15ms
- PII redaction: ~2-8ms
- Output filtering: ~3-10ms
- Total overhead: ~10-35ms typical
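The quick-check fast path mentioned above can be sketched as a cheap substring pre-scan: inputs containing none of a handful of trigger words skip the full regex suite entirely. The trigger list and function name below are illustrative assumptions, not the library's internals:

```typescript
// Hypothetical fast path: a lowercase substring scan is far cheaper
// than running the full pattern suite, and since most real-world
// inputs are clean, the common case returns after a few `includes`
// calls instead of dozens of regex executions.
const TRIGGER_SUBSTRINGS = ['ignore', 'disregard', 'system prompt', 'you are now'];

function needsFullScan(input: string): boolean {
  const lower = input.toLowerCase();
  return TRIGGER_SUBSTRINGS.some(t => lower.includes(t));
}
```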
## Error Handling
The SDK is designed to fail gracefully — errors inside the security middleware don't break your requests:
```typescript
const client = new SecureOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  security: {
    injection: { action: 'block' }
  },
  debug: true // Enable debug logging
});

try {
  const response = await client.chat.completions.create({...});
} catch (error) {
  // Only API errors propagate — security violations return blocked responses
}
```

## License
MIT
