xswarm-ai-sanitize

v2.0.0

Published

9 days ago

Secret detection for AI agents — 600+ patterns, plugins for LangChain, LlamaIndex, Vercel AI, OpenClaw, Nanobot

ai-agent secret-detection langchain llamaindex vercel-ai openclaw nanobot security redaction sanitization credential-detection dlp xswarm

Why This Matters

AI agents are increasingly given access to sensitive data sources: email inboxes, cloud storage, internal documents, and databases. This creates a critical security vulnerability:

User: "Search my emails for 'deployment'"
Agent: *searches Gmail*
Email contains: "Deploy with AWS_KEY=AKIAIOSFODNN7EXAMPLE"
Agent: *stores in memory/logs*
→ API key now persists in agent memory forever

xswarm-ai-sanitize sits between your AI agent and external data sources to automatically detect and redact secrets before they reach your agent's memory.
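That placement can be sketched as a wrapper around a data source. This is a minimal illustration, not the package's implementation. `guardDataSource`, `redactSecrets`, and `searchEmails` are hypothetical names, and the redactor recognizes only one format (AWS access key IDs: `AKIA` plus 16 uppercase letters or digits), whereas the package ships 600+ patterns.

```javascript
// Hypothetical redactor: recognizes only AWS access key IDs (AKIA + 16 chars).
// It illustrates where redaction happens, not how the package detects secrets.
const AWS_ACCESS_KEY_ID = /\bAKIA[0-9A-Z]{16}\b/g;

function redactSecrets(text) {
  // Replace each match with the [REDACTED:type] marker style used by sanitize mode.
  return text.replace(AWS_ACCESS_KEY_ID, "[REDACTED:aws_access_key_id]");
}

// Wrap any async data source so its results are redacted before the agent
// can store them in memory or logs.
function guardDataSource(fetchFn) {
  return async (...args) => redactSecrets(await fetchFn(...args));
}

// Hypothetical email search standing in for a Gmail integration.
async function searchEmails(query) {
  return `Subject: ${query}\nDeploy with AWS_KEY=AKIAIOSFODNN7EXAMPLE`;
}

const safeSearch = guardDataSource(searchEmails);
safeSearch("deployment").then((text) => console.log(text));
// Subject: deployment
// Deploy with AWS_KEY=[REDACTED:aws_access_key_id]
```

The agent only ever receives the wrapped function's output, so the raw key never reaches its context.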

Quick Start

Interactive Setup Wizard

npx xswarm-ai-sanitize init

This launches an interactive wizard that:

Detects which AI frameworks you have installed
Shows integration options for each framework
Provides copy-paste code examples
Auto-installs plugins where supported (OpenClaw)

File/Directory Scanning

# Scan a directory for secrets
npx xswarm-ai-sanitize detect src/

# Scan with JSON output (perfect for CI/CD)
npx xswarm-ai-sanitize detect --json .

# Scan specific files
npx xswarm-ai-sanitize detect config.yml .env

Exits with code 1 if secrets are found (useful for pre-commit hooks and CI/CD pipelines).

Text Sanitization

# Redact secrets from piped input
cat file.txt | npx xswarm-ai-sanitize sanitize

# Or from a file
npx xswarm-ai-sanitize sanitize config.yml

# Block mode (exit 1 once the secret count reaches the --secrets threshold, default 3)
npx xswarm-ai-sanitize sanitize --block .env

Framework Integrations

| Framework | Plugin | Status |
|-----------|--------|--------|
| LangChain | xswarm-ai-sanitize/plugins/langchain | ✅ Ready |
| LlamaIndex | xswarm-ai-sanitize/plugins/llamaindex | ✅ Ready |
| Vercel AI SDK | xswarm-ai-sanitize/plugins/vercel-ai | ✅ Ready |
| OpenClaw | xswarm-ai-sanitize/plugins/openclaw | ✅ Ready |
| Nanobot | xswarm-ai-sanitize/plugins/nanobot | ✅ Ready |
| xSwarm | xswarm-ai-sanitize/plugins/xswarm | 🔜 Coming |

LangChain

import { LLMChain } from 'langchain/chains';
import { createSanitizeCallback, wrapTool } from 'xswarm-ai-sanitize/plugins/langchain';

// Option 1: Use callback handler
const chain = new LLMChain({
  llm,
  prompt,
  callbacks: [createSanitizeCallback({ mode: 'sanitize' })]
});

// Option 2: Wrap individual tools
const safeTool = wrapTool(myTool, { mode: 'sanitize' });

LlamaIndex

import { createSanitizePostprocessor } from 'xswarm-ai-sanitize/plugins/llamaindex';

const queryEngine = index.asQueryEngine({
  nodePostprocessors: [createSanitizePostprocessor({ mode: 'sanitize' })]
});

Vercel AI SDK

import { generateText } from 'ai';
import { sanitizeMiddleware, sanitizeTool } from 'xswarm-ai-sanitize/plugins/vercel-ai';

// Option 1: Use middleware
const result = await generateText({
  model,
  prompt,
  experimental_middleware: sanitizeMiddleware({ mode: 'sanitize' })
});

// Option 2: Wrap tools
const tools = {
  searchEmails: sanitizeTool(emailSearchTool, { mode: 'sanitize' })
};

OpenClaw

import createSanitizePlugin from 'xswarm-ai-sanitize/plugins/openclaw';

export default createSanitizePlugin({ mode: 'sanitize' });

Nanobot (MCP)

import { createSanitizeFilter } from 'xswarm-ai-sanitize/plugins/nanobot';

export default createSanitizeFilter({ mode: 'sanitize' });

CLI Commands

`detect` - Scan Files/Directories

Scan files or directories for secrets with detailed reporting.

# Scan a directory (respects .gitignore)
npx xswarm-ai-sanitize detect src/

# Scan specific files
npx xswarm-ai-sanitize detect config.yml .env secrets.txt

# JSON output for CI/CD pipelines
npx xswarm-ai-sanitize detect --json . > report.json

# Also scan files matched by .gitignore
npx xswarm-ai-sanitize detect --no-gitignore .

Output Format:

src/config.js:23:15 [CRITICAL] aws_access_key_id
  ...const AWS_KEY = "AKIA...EXAMPLE"...

.env:5:12 [HIGH] database_url_postgres
  DATABASE_URL=postgres://user:pass@host/db

2 secret(s) found
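The `file:line:column` prefix comes from converting a match's character offset into 1-based coordinates. A minimal sketch of that conversion follows; `offsetToLineColumn`, `formatFinding`, and the finding object's fields are hypothetical, and whether the CLI counts columns in bytes, code points, or UTF-16 code units is not documented (this sketch uses UTF-16 code units, which is what JavaScript string indices are).

```javascript
// Convert a character offset into the 1-based line:column pair used in the
// `file:line:column` prefix. Columns here count UTF-16 code units.
function offsetToLineColumn(text, offset) {
  let line = 1;
  let lineStart = 0;
  for (let i = 0; i < offset; i++) {
    if (text[i] === "\n") {
      line += 1;
      lineStart = i + 1;
    }
  }
  return { line, column: offset - lineStart + 1 };
}

// Render one finding in the same shape as the sample output above.
// The finding object's fields (offset, severity, type) are hypothetical.
function formatFinding(file, text, finding) {
  const { line, column } = offsetToLineColumn(text, finding.offset);
  return `${file}:${line}:${column} [${finding.severity}] ${finding.type}`;
}

const envText = "# local settings\nDEBUG=1\nDATABASE_URL=postgres://user:pass@host/db\n";
const finding = {
  offset: envText.indexOf("postgres://"),
  severity: "HIGH",
  type: "database_url_postgres",
};
console.log(formatFinding(".env", envText, finding)); // .env:3:14 [HIGH] database_url_postgres
```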

JSON Output:

{
  "version": "1.0.0",
  "timestamp": "2026-02-06T12:00:00.000Z",
  "summary": {
    "totalFiles": 2,
    "totalFindings": 2,
    "criticalCount": 1,
    "highCount": 1
  },
  "results": [...]
}
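Because the JSON report carries per-severity counts, a CI job can apply its own policy instead of failing on every finding. The sketch below is hypothetical: `evaluateReport` and its "fail on CRITICAL, warn on HIGH" rule are illustrations, and it reads only the `summary` fields shown in the sample report above (the `results` entries are not modeled).

```javascript
// Hypothetical CI policy over a parsed `detect --json` report: fail on any
// CRITICAL finding, only warn on HIGH ones. Reads just the `summary` fields.
function evaluateReport(report) {
  const { totalFindings = 0, criticalCount = 0, highCount = 0 } = report.summary ?? {};
  if (criticalCount > 0) {
    return { exitCode: 1, message: `${criticalCount} critical secret(s) found` };
  }
  if (highCount > 0) {
    return { exitCode: 0, message: `warning: ${highCount} high-severity secret(s) found` };
  }
  return { exitCode: 0, message: `${totalFindings} finding(s), none critical or high` };
}

// Stand-in for JSON.parse(fs.readFileSync("report.json", "utf8")).
const report = JSON.parse(
  '{"summary": {"totalFiles": 2, "totalFindings": 2, "criticalCount": 1, "highCount": 1}}'
);
console.log(evaluateReport(report)); // { exitCode: 1, message: '1 critical secret(s) found' }
```

In a real job you would finish with `process.exit(result.exitCode)`. Since `detect` itself exits 1 whenever it finds anything, the scan step would also need to tolerate that status, for example `npx xswarm-ai-sanitize detect --json . > report.json || true`.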

Exit Codes:

0 - No secrets found (safe)
1 - Secrets detected (use in pre-commit hooks)

`sanitize` - Redact Secrets from Text

Process text from stdin or files and redact secrets.

# Pipe text through sanitizer
cat .env | npx xswarm-ai-sanitize sanitize -q

# From file
npx xswarm-ai-sanitize sanitize config.yml

# Block mode (exit 1 if secrets found, don't redact)
npx xswarm-ai-sanitize sanitize --block --secrets 1 .env

Options:

-b, --block - Block mode (exit 1 instead of redacting once the --secrets threshold is reached)
-s, --secrets N - Block threshold for secret count (default: 3)
-q, --quiet - Suppress statistics

`init` - Interactive Setup Wizard

Launch the interactive wizard to integrate with AI frameworks.

npx xswarm-ai-sanitize init

Detects installed frameworks and provides integration instructions.

Node.js API

import sanitize from 'xswarm-ai-sanitize';

// BLOCK Mode - Reject content with too many secrets
const result = sanitize(emailContent, {
  mode: 'block',
  blockThreshold: {
    secrets: 3,        // Block if 3+ secrets found
    highSeverity: 1    // Block if 1+ high-severity secret is found
  }
});

if (result.blocked) {
  throw new Error(`Secrets detected: ${result.reason}`);
}

// SANITIZE Mode - Always clean, never block
const cleaned = sanitize(content, { mode: 'sanitize' });
console.log(cleaned.sanitized); // Secrets replaced with [REDACTED:type]
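The two `blockThreshold` options combine as an "either limit trips the block" rule. The following is a self-contained sketch of that decision, not the package's code: `decideBlock` is hypothetical, counting CRITICAL as high-severity and defaulting `highSeverity` to 1 are assumptions, and the `secrets` default of 3 mirrors the CLI's documented `--secrets` default.

```javascript
// Hypothetical re-implementation of the documented blockThreshold rules:
// block when high-severity findings reach `highSeverity`, or when the total
// number of findings reaches `secrets`. Treating CRITICAL as high-severity,
// and the default of 1 for highSeverity, are assumptions.
const HIGH_SEVERITIES = new Set(["CRITICAL", "HIGH"]);

function decideBlock(findings, { secrets = 3, highSeverity = 1 } = {}) {
  const highCount = findings.filter((f) => HIGH_SEVERITIES.has(f.severity)).length;
  if (highCount >= highSeverity) {
    return { blocked: true, reason: `${highCount} high-severity secret(s)` };
  }
  if (findings.length >= secrets) {
    return { blocked: true, reason: `${findings.length} secret(s) found (threshold ${secrets})` };
  }
  return { blocked: false, reason: null };
}

// Placeholder findings; "example_token" and the "MEDIUM" label are assumptions.
const lowRisk = [
  { type: "example_token", severity: "MEDIUM" },
  { type: "example_token", severity: "MEDIUM" },
];
console.log(decideBlock(lowRisk).blocked); // false: 2 < 3 and nothing high-severity
console.log(decideBlock(lowRisk, { secrets: 2 }).blocked); // true: count threshold reached
console.log(decideBlock([{ type: "aws_access_key_id", severity: "CRITICAL" }]).blocked); // true
```

Checking severity first means a single critical credential blocks even when the total count is far below `secrets`.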

Detection Capabilities

Secret Patterns (600+)

| Category | Examples | Count |
|----------|----------|-------|
| AI/ML Providers | OpenAI, Anthropic, Hugging Face, Groq, Cohere | 25+ |
| Cloud Providers | AWS, Azure, GCP, DigitalOcean, Linode, Vultr | 40+ |
| Version Control | GitHub, GitLab, Bitbucket, Gitea | 25+ |
| CI/CD | CircleCI, Travis, Jenkins, Buildkite, Vercel | 25+ |
| Payment | Stripe, PayPal, Square, Plaid, Coinbase | 25+ |
| Communication | Slack, Discord, Telegram, Twilio, SendGrid | 30+ |
| Databases | MongoDB, PostgreSQL, MySQL, Redis, Supabase | 30+ |
| Auth/Identity | Auth0, Okta, Clerk, Keycloak, Firebase | 20+ |
| And more... | CRM, Analytics, Maps, Blockchain, IoT, etc. | 300+ |
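Prefix-based detection amounts to a table of compiled regexes, each tagged with a type and category, run over the input. The sketch below uses simplified regexes for three well-known token formats (AWS access key IDs, classic GitHub personal access tokens, Stripe live secret keys); they are illustrative and not the package's own pattern definitions, and `PATTERNS`, `scan`, and the type names other than `aws_access_key_id` are hypothetical.

```javascript
// Illustrative pattern registry, compiled once when the module loads.
// Simplified regexes for three well-known formats, not the package's own.
const PATTERNS = [
  { type: "aws_access_key_id", category: "Cloud Providers", regex: /\bAKIA[0-9A-Z]{16}\b/g },
  { type: "github_pat_classic", category: "Version Control", regex: /\bghp_[A-Za-z0-9]{36}\b/g },
  { type: "stripe_live_secret_key", category: "Payment", regex: /\bsk_live_[0-9a-zA-Z]{24,}\b/g },
];

function scan(text) {
  const findings = [];
  for (const { type, category, regex } of PATTERNS) {
    // matchAll iterates over an internal copy of the regex, so the shared
    // /g pattern's lastIndex is not mutated and repeated calls agree.
    for (const m of text.matchAll(regex)) {
      findings.push({ type, category, offset: m.index, length: m[0].length });
    }
  }
  // Report findings in document order regardless of pattern order.
  return findings.sort((a, b) => a.offset - b.offset);
}

const sample = "aws=AKIAIOSFODNN7EXAMPLE stripe=sk_live_" + "x".repeat(24);
console.log(scan(sample).map((f) => f.type)); // [ 'aws_access_key_id', 'stripe_live_secret_key' ]
```

Compiling the table at module scope and reusing it is what keeps per-call cost to the matching itself; reusing `/g` regexes with `test()` or `exec()` instead would carry `lastIndex` state between calls.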

Entropy Analysis

Flags high-randomness strings that may be secrets even when they lack a known prefix:

Shannon entropy calculation (threshold: 4.5)
Minimum length filter (16 chars)
Used as secondary validation for generic patterns
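Shannon entropy measures bits per character as H = -Σ p·log2(p) over the distinct characters of a string, where p is each character's relative frequency. A minimal sketch using the documented parameters (threshold 4.5, minimum length 16) follows. The function names and the `name = value` generic pattern are hypothetical, and whether the package compares with `>` or `>=` is not documented (this sketch uses `>=`).

```javascript
// Shannon entropy in bits per character: H = -sum(p * log2(p)) over each
// distinct character, where p is that character's relative frequency in s.
function shannonEntropy(s) {
  const chars = [...s]; // iterate by code point, not UTF-16 unit
  if (chars.length === 0) return 0;
  const counts = new Map();
  for (const ch of chars) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  let h = 0;
  for (const count of counts.values()) {
    const p = count / chars.length;
    h -= p * Math.log2(p);
  }
  return h;
}

// Candidate filter with the documented parameters (comparison operator assumed).
function passesEntropyCheck(value, { threshold = 4.5, minLength = 16 } = {}) {
  return [...value].length >= minLength && shannonEntropy(value) >= threshold;
}

// Secondary validation: a hypothetical generic `name = value` pattern only
// reports values that also pass the entropy check.
const GENERIC_ASSIGNMENT = /\b(?:api_key|secret|token)\s*=\s*(\S+)/gi;

function genericFindings(text) {
  return [...text.matchAll(GENERIC_ASSIGNMENT)]
    .map((m) => m[1])
    .filter((value) => passesEntropyCheck(value));
}

console.log(passesEntropyCheck("0123456789abcdef")); // false: 16 distinct chars give log2(16) = 4 bits
console.log(genericFindings("token=changeme api_key=abcdefghijklmnopqrstuvwxyz"));
// [ 'abcdefghijklmnopqrstuvwxyz' ]
```

Under this per-character definition, an n-character string can score at most log2(n) bits, so a value needs at least 23 characters (log2(23) ≈ 4.52) to clear 4.5. The package may compute entropy differently, which this sketch does not model.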

CI/CD Integration

GitHub Actions

name: Secret Detection
on: [push, pull_request]

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Scan for secrets
        run: npx xswarm-ai-sanitize detect --json . > report.json

      - name: Upload report
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: secret-scan-report
          path: report.json

Pre-commit Hook

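#!/bin/sh
# .git/hooks/pre-commit (make it executable: chmod +x .git/hooks/pre-commit)

# Scan staged files for secrets. --diff-filter=ACM skips deleted files,
# -z/-0 handle spaces in paths, and -r skips the scan when nothing is staged.
# Note: this scans the working-tree copy of each staged file.
git diff --cached --name-only --diff-filter=ACM -z | xargs -0 -r npx xswarm-ai-sanitize detect

if [ $? -ne 0 ]; then
  echo "❌ Commit blocked: secrets detected"
  exit 1
fi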

GitLab CI

secret_scan:
  stage: test
  image: node:lts
  script:
    - npx xswarm-ai-sanitize detect --json . > report.json
  artifacts:
    when: on_failure
    paths:
      - report.json

Key Features

Zero Dependencies — Uses only Node.js built-ins
Fully Synchronous — No async, no Promises, no network calls
Fast — <5ms for typical documents
Privacy-First — All processing local, zero external API calls

Performance

1KB content: <1ms
10KB content: <5ms
100KB content: <50ms
Pattern compilation: one-time at module load

Installation

npm install xswarm-ai-sanitize

Testing

npm test

Migration from v1.x to v2.0

Breaking Changes:

Default behavior changed: Running npx xswarm-ai-sanitize without arguments now shows help instead of running the wizard
New command required: The wizard is now behind the init command

Migration:

| v1.x | v2.0 |
|------|------|
| npx xswarm-ai-sanitize (wizard) | npx xswarm-ai-sanitize init |
| cat file \| npx xswarm-ai-sanitize (still works!) | cat file \| npx xswarm-ai-sanitize sanitize |
| N/A | npx xswarm-ai-sanitize detect src/ (new!) |

Backward Compatibility:

Piped input without explicit sanitize command still works for compatibility
All Node.js API functions remain unchanged
All framework plugins remain unchanged

New Features in v2.0:

detect command for file/directory scanning
JSON output for CI/CD integration
Structured finding reports with file:line:column
.gitignore support for directory scanning
Binary file filtering

License

MIT
