mantis-redteam
v0.2.9
Open-source CLI toolkit for automated red-teaming of LLM-powered applications
🔒 Mantis
Systematically probe AI applications for prompt injection, data leakage, hallucination, and agent exploitation vulnerabilities — before attackers do.
Quick Start · Attack Modules · CI/CD Integration · Architecture · Contributing
Why Mantis?
LLM-powered applications introduce a fundamentally new class of vulnerabilities that traditional security scanners cannot detect. Prompt injection, data leakage through hidden system prompts, hallucinated URLs, and agent exploitation all require purpose-built tooling.
Mantis is that tooling — a modular, extensible CLI framework that automates AI security testing the same way traditional DAST tools automate web application testing.
What It Finds
| Category | What Mantis Tests | Plugins | Attacks |
|----------|-------------------|---------|---------|
| 🔴 Prompt Injection | System prompt overrides, jailbreaks, role confusion, instruction extraction | 4 | 20 |
| 🟠 Data Leakage | Hidden prompt exposure, secret retrieval, PII extraction, memory exfiltration | 4 | 16 |
| 🟡 Hallucination | Fabricated URLs, nonexistent entities, citation failures, confidence mismatches | 4 | 15 |
| 🟣 Tool/Agent Exploitation | Command injection, file system access, network exploitation, privilege escalation | 4 | 16 |
| | **Total** | **16** | **67** |
Key Capabilities
- 67 attack prompts across 16 plugins — covering the most critical AI vulnerability classes
- ALVSS scoring — purpose-built CVSS-inspired risk model for AI vulnerabilities (Exploitability, Impact, Data Sensitivity, Reproducibility, Model Compliance)
- OWASP LLM Top 10 — every plugin maps to the 2025 OWASP Top 10 for LLM Applications
- CI/CD native — exit code gates, SARIF output for GitHub Security tab, Jenkins/GitLab compatible
- Extensible — write a custom attack plugin in ~15 lines of TypeScript
🚀 Quick Start
Install
```shell
# npm (recommended)
npm install -g mantis-redteam

# Or run without installing
npx mantis-redteam scan --target https://your-ai-app.com/api/chat

# Or use Docker
docker pull ghcr.io/farhanashrafdev/mantis:latest
```
Scan
```shell
# Basic scan with table output
mantis scan --target https://your-ai-app.com/api/chat

# JSON output for automation
mantis scan --target https://your-ai-app.com/api/chat --format json

# SARIF output for GitHub Security tab
mantis scan --target https://your-ai-app.com/api/chat --format sarif --output results.sarif
```
Docker
```shell
docker run --rm ghcr.io/farhanashrafdev/mantis:latest \
  scan --target https://your-ai-app.com/api/chat --format json
```
Configuration File
For advanced setups, create a `mantis.config.yaml`:
```yaml
version: "1.0"
target:
  url: https://your-ai-app.com/api/chat
  method: POST
  headers:
    Content-Type: application/json
  promptField: messages[-1].content
  responseField: choices[0].message.content
  authToken: ${MANTIS_AUTH_TOKEN}
modules:
  include: [] # empty = all plugins
  exclude: []
scan:
  timeoutMs: 30000
  maxRetries: 2
  rateLimit: 10
  severityThreshold: low
output:
  format: table
  verbose: false
  redactResponses: true
```

Then run the scan with your config:

```shell
mantis scan --config mantis.config.yaml
```

🔗 CI/CD Integration
Mantis is designed to run as a quality gate in continuous integration pipelines.
GitHub Actions
```yaml
name: AI Security Scan
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  mantis-scan:
    runs-on: ubuntu-latest
    permissions:
      security-events: write
    steps:
      - uses: actions/checkout@v4
      - name: Install Mantis
        run: npm install -g mantis-redteam
      - name: Run AI security scan
        run: |
          mantis scan \
            --target ${{ secrets.AI_APP_URL }} \
            --format sarif \
            --output results.sarif \
            --severity-threshold medium
        continue-on-error: true
      - name: Upload to GitHub Security tab
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: results.sarif
      - name: Fail on critical/high findings
        run: |
          mantis scan \
            --target ${{ secrets.AI_APP_URL }} \
            --severity-threshold high
```
Jenkins / GitLab CI / Any CI System
```shell
npm install -g mantis-redteam
mantis scan --target "$AI_APP_URL" --format sarif --output results.sarif
```
Exit Codes
| Code | Meaning |
|------|---------|
| 0 | Scan complete — no critical or high findings |
| 1 | Scan complete — critical or high findings detected |
| 2 | Runtime error (invalid config, network failure, etc.) |
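In a custom pipeline script, these exit codes can be mapped to a gate decision. A minimal sketch follows; the mapping mirrors the table above, but the helper and its names are illustrative, not part of the Mantis API:

```typescript
// Map Mantis exit codes (per the table above) to a CI gate decision.
// Illustrative helper only — not part of mantis-redteam itself.
type GateDecision = "pass" | "fail-findings" | "fail-runtime";

function interpretMantisExit(code: number): GateDecision {
  switch (code) {
    case 0:
      return "pass"; // no critical or high findings
    case 1:
      return "fail-findings"; // critical/high findings detected
    default:
      return "fail-runtime"; // invalid config, network failure, etc.
  }
}
```

A wrapper like this lets a pipeline distinguish "the scan found problems" from "the scan itself broke", so runtime errors can page the security team instead of silently failing the build.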
🏗 Architecture
```mermaid
graph LR
    CLI["CLI"] --> Engine["CoreEngine"]
    Engine --> Registry["PluginRegistry"]
    Registry --> PI["Prompt Injection<br/>4 plugins · 20 attacks"]
    Registry --> DL["Data Leakage<br/>4 plugins · 16 attacks"]
    Registry --> HL["Hallucination<br/>4 plugins · 15 attacks"]
    Registry --> TE["Tool Exploit<br/>4 plugins · 16 attacks"]
    Engine --> Adapter["HttpAdapter"]
    Adapter --> Target["Target LLM"]
    Engine --> Scoring["ALVSS Scorer"]
    Engine --> Reporter["Table · JSON · SARIF"]
```
How It Works
1. CLI parses options and loads configuration from file/CLI/env vars
2. CoreEngine orchestrates the scan lifecycle
3. PluginRegistry auto-discovers and filters attack plugins
4. Each Plugin sends attack prompts through the HttpAdapter to the target
5. Responses are analyzed against known vulnerable/secure patterns
6. ALVSS Scorer calculates risk scores across 5 weighted dimensions
7. Reporters output results as table, JSON, or SARIF
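The response-analysis step above can be sketched as a small classifier. The regexes mirror the `securePatterns`/`vulnerablePatterns` fields that plugins declare; the function itself illustrates the idea and is not the engine's actual implementation:

```typescript
// Illustrative sketch of response analysis: check vulnerable patterns
// first (a leak outweighs a partial refusal), then secure patterns.
// Mirrors the securePatterns/vulnerablePatterns fields in plugins;
// NOT the CoreEngine's real code.
type Verdict = "vulnerable" | "secure" | "inconclusive";

function classifyResponse(
  response: string,
  securePatterns: RegExp[],
  vulnerablePatterns: RegExp[],
): Verdict {
  if (vulnerablePatterns.some((p) => p.test(response))) return "vulnerable";
  if (securePatterns.some((p) => p.test(response))) return "secure";
  return "inconclusive";
}
```

Checking vulnerable patterns first matters: a response that both refuses and leaks ("I cannot... but here is the secret") is still a finding.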
Attack Modules (Detail)
🔴 Prompt Injection

| Plugin | Attacks | What It Tests |
|--------|---------|---------------|
| System Override | 5 | Direct instruction override, DAN persona, developer mode, context reset, multilingual bypass |
| Jailbreak | 5 | Roleplay bypass, hypothetical scenarios, Base64 encoding, reverse psychology, academic framing |
| Role Confusion | 5 | Admin impersonation, maintenance mode, authority claims, system commands, trust escalation |
| Instruction Extraction | 5 | Direct extraction, reflection, debug mode, export prompts, metadata inspection |
🟠 Data Leakage

| Plugin | Attacks | What It Tests |
|--------|---------|---------------|
| Hidden Prompt | 4 | Pre-conversation extraction, JSON message dump, constraint extraction, error-triggered leaks |
| Secret Retrieval | 4 | API key extraction, credential probing, config dump, environment variable leaks |
| PII Extraction | 4 | Training data extraction, user data probing, cross-session leaks, demographic profiling |
| Memory Exfiltration | 4 | Conversation history access, stale context, cross-user data, session boundary testing |
🟡 Hallucination

| Plugin | Attacks | What It Tests |
|--------|---------|---------------|
| Fabricated URL | 4 | Fake documentation links, dead URLs in citations, phishing vector generation |
| Nonexistent Entity | 4 | Fictional papers, fake APIs, imaginary specifications, fabricated expert opinions |
| Citation Verification | 4 | Fake quote attribution, invented statistics, false legal citations, fabricated historical events |
| Confidence Mismatch | 3 | Uncertain claims stated with authority, impossible knowledge, future event predictions |
🟣 Tool/Agent Exploitation

| Plugin | Attacks | What It Tests |
|--------|---------|---------------|
| Command Injection | 4 | Shell command execution, code evaluation, subprocess spawning, OS interaction |
| File System Access | 4 | Path traversal, file read/write, directory listing, sensitive file access |
| Network Access | 4 | SSRF probing, DNS exfiltration, outbound connections, internal network scanning |
| Privilege Escalation | 4 | Admin function access, permission bypass, role elevation, capability override |
📊 Risk Scoring (ALVSS)
Mantis uses ALVSS (AI LLM Vulnerability Scoring System) — a CVSS-inspired scoring model purpose-built for AI applications:
| Dimension | Weight | What It Measures |
|-----------|--------|------------------|
| Exploitability | 30% | How easy is the vulnerability to exploit? |
| Impact | 25% | What is the potential damage? |
| Data Sensitivity | 20% | How sensitive is the exposed data? |
| Reproducibility | 15% | Can the attack be reliably repeated? |
| Model Compliance | 10% | How much does the model deviate from expected behavior? |
Severity mapping: Critical (≥9.0) → High (≥7.0) → Medium (≥4.0) → Low (<4.0) → Info
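A weighted score of this shape can be sketched as below. The weights come from the table above and the thresholds from the severity mapping; the assumption that each dimension is scored 0–10 and combined linearly is ours — the actual ALVSS formula may differ:

```typescript
// Hedged sketch of an ALVSS-style weighted score. Weights follow the
// README's table; linear combination over 0–10 dimensions is an assumption.
interface AlvssDimensions {
  exploitability: number;  // 0–10, weight 30%
  impact: number;          // 0–10, weight 25%
  dataSensitivity: number; // 0–10, weight 20%
  reproducibility: number; // 0–10, weight 15%
  modelCompliance: number; // 0–10, weight 10%
}

function alvssScore(d: AlvssDimensions): number {
  return (
    0.3 * d.exploitability +
    0.25 * d.impact +
    0.2 * d.dataSensitivity +
    0.15 * d.reproducibility +
    0.1 * d.modelCompliance
  );
}

function alvssSeverity(score: number): string {
  if (score >= 9.0) return "critical";
  if (score >= 7.0) return "high";
  if (score >= 4.0) return "medium";
  return "low"; // the Info tier below Low is not modeled in this sketch
}
```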
📁 Output Formats
| Format | Use Case | Flag |
|--------|----------|------|
| Table | Interactive terminal use, human review | --format table |
| JSON | CI/CD pipelines, programmatic consumption, API integration | --format json |
| SARIF | GitHub Security tab, Azure DevOps, VS Code SARIF Viewer | --format sarif |
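SARIF 2.1.0 is a JSON format, so a finding uploaded to the GitHub Security tab has a predictable shape. The sketch below follows the SARIF spec's field names (`runs`, `tool.driver`, `results`, `ruleId`, `level`); the rule id and message text are invented for illustration and are not real Mantis output:

```typescript
// Minimal SARIF 2.1.0 document sketch. Field names follow the SARIF spec;
// the ruleId and message are invented examples, not actual Mantis output.
const sarif = {
  version: "2.1.0",
  $schema: "https://json.schemastore.org/sarif-2.1.0.json",
  runs: [
    {
      tool: {
        driver: {
          name: "mantis-redteam",
          rules: [{ id: "prompt-injection/system-override" }],
        },
      },
      results: [
        {
          ruleId: "prompt-injection/system-override",
          level: "error", // SARIF levels: none | note | warning | error
          message: { text: "Target complied with a system prompt override." },
        },
      ],
    },
  ],
};
```

Each `result.ruleId` must reference a rule declared in `tool.driver.rules`; that link is what lets the Security tab group findings by rule.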
🗺 Roadmap
| Phase | Scope | Status |
|-------|-------|--------|
| Phase 1 | Core engine, 16 plugins (67 attacks), CLI, JSON/Table/SARIF reports, ALVSS scoring, config system, Docker, CI/CD workflows | ✅ Complete |
| Phase 2 | Plugin marketplace, multi-model adapters, advanced rate limiting, scan replay, historical comparison | 📋 Planned |
| Phase 3 | Attack chaining, AI-assisted mutation, campaign mode, web dashboard, team collaboration | 📋 Planned |
🤝 Contributing
We welcome contributions! The easiest way to get started is by writing attack plugins — it takes ~15 lines of TypeScript.
See CONTRIBUTING.md for setup instructions, code standards, and PR guidelines.
```
src/plugins/
├── prompt-injection/   # 4 plugins
├── data-leakage/       # 4 plugins
├── hallucination/      # 4 plugins
└── tool-exploit/       # 4 plugins
```
Quick plugin template:
```typescript
import { BasePlugin } from '../base-plugin.js';
import { AttackCategory, SeverityLevel, type PluginMeta, type AttackPrompt } from '../../types/types.js';

class MyPlugin extends BasePlugin {
  meta: PluginMeta = {
    id: 'category/my-plugin',
    name: 'My Attack Plugin',
    description: 'Tests for a specific vulnerability',
    category: AttackCategory.PromptInjection,
    version: '1.0.0',
    author: 'your-name',
    tags: ['my-tag'],
    owaspLLM: 'LLM01: Prompt Injection',
  };

  prompts: AttackPrompt[] = [
    {
      id: 'my-attack-1',
      prompt: 'Your attack prompt here',
      description: 'What this tests',
      securePatterns: [/I cannot/i],
      vulnerablePatterns: [/here is the secret/i],
      severity: SeverityLevel.High,
    },
  ];

  protected getRemediation(): string {
    return 'How to fix this vulnerability';
  }

  protected getCWE(): string {
    return 'CWE-XXX';
  }
}

export default new MyPlugin();
```
🔐 Security
For reporting security vulnerabilities in Mantis itself, see SECURITY.md.
⚠️ Responsible Use: Mantis is a security testing tool. Always ensure you have explicit written authorization before scanning any application. Unauthorized security testing is illegal and unethical.
📄 License
Apache 2.0 — see LICENSE for details.
Built for the security community, by the security community.
npm · Docker · Issues · Contributing
