mantis-redteam
v0.2.9
Open-source CLI toolkit for automated red-teaming of LLM-powered applications
🔒 Mantis
Systematically probe AI applications for prompt injection, data leakage, hallucination, and agent exploitation vulnerabilities — before attackers do.
Quick Start · Attack Modules · CI/CD Integration · Architecture · Contributing
Why Mantis?
LLM-powered applications introduce a fundamentally new class of vulnerabilities that traditional security scanners cannot detect. Prompt injection, data leakage through hidden system prompts, hallucinated URLs, and agent exploitation all require purpose-built tooling.
Mantis is that tooling — a modular, extensible CLI framework that automates AI security testing the same way traditional DAST tools automate web application testing.
What It Finds
| Category | What Mantis Tests | Plugins | Attacks |
|----------|-------------------|---------|---------|
| 🔴 Prompt Injection | System prompt overrides, jailbreaks, role confusion, instruction extraction | 4 | 20 |
| 🟠 Data Leakage | Hidden prompt exposure, secret retrieval, PII extraction, memory exfiltration | 4 | 16 |
| 🟡 Hallucination | Fabricated URLs, nonexistent entities, citation failures, confidence mismatches | 4 | 15 |
| 🟣 Tool/Agent Exploitation | Command injection, file system access, network exploitation, privilege escalation | 4 | 16 |
| | **Total** | **16** | **67** |
Key Capabilities
- 67 attack prompts across 16 plugins — covering the most critical AI vulnerability classes
- ALVSS scoring — purpose-built CVSS-inspired risk model for AI vulnerabilities (Exploitability, Impact, Data Sensitivity, Reproducibility, Model Compliance)
- OWASP LLM Top 10 — every plugin maps to the 2025 OWASP Top 10 for LLM Applications
- CI/CD native — exit code gates, SARIF output for GitHub Security tab, Jenkins/GitLab compatible
- Extensible — write a custom attack plugin in ~15 lines of TypeScript
🚀 Quick Start
Install
```shell
# npm (recommended)
npm install -g mantis-redteam

# Or run without installing
npx mantis-redteam scan --target https://your-ai-app.com/api/chat

# Or use Docker
docker pull ghcr.io/farhanashrafdev/mantis:latest
```
Scan
```shell
# Basic scan with table output
mantis scan --target https://your-ai-app.com/api/chat

# JSON output for automation
mantis scan --target https://your-ai-app.com/api/chat --format json

# SARIF output for GitHub Security tab
mantis scan --target https://your-ai-app.com/api/chat --format sarif --output results.sarif
```
Docker
```shell
docker run --rm ghcr.io/farhanashrafdev/mantis:latest \
  scan --target https://your-ai-app.com/api/chat --format json
```
Configuration File
For advanced setups, create a `mantis.config.yaml`:
```yaml
version: "1.0"
target:
  url: https://your-ai-app.com/api/chat
  method: POST
  headers:
    Content-Type: application/json
  promptField: messages[-1].content
  responseField: choices[0].message.content
  authToken: ${MANTIS_AUTH_TOKEN}
modules:
  include: [] # empty = all plugins
  exclude: []
scan:
  timeoutMs: 30000
  maxRetries: 2
  rateLimit: 10
  severityThreshold: low
output:
  format: table
  verbose: false
  redactResponses: true
```

Then run the scan with your config:

```shell
mantis scan --config mantis.config.yaml
```

🔗 CI/CD Integration
Mantis is designed to run as a quality gate in continuous integration pipelines.
GitHub Actions
```yaml
name: AI Security Scan
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  mantis-scan:
    runs-on: ubuntu-latest
    permissions:
      security-events: write
    steps:
      - uses: actions/checkout@v4
      - name: Install Mantis
        run: npm install -g mantis-redteam
      - name: Run AI security scan
        run: |
          mantis scan \
            --target ${{ secrets.AI_APP_URL }} \
            --format sarif \
            --output results.sarif \
            --severity-threshold medium
        continue-on-error: true
      - name: Upload to GitHub Security tab
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: results.sarif
      - name: Fail on critical/high findings
        run: |
          mantis scan \
            --target ${{ secrets.AI_APP_URL }} \
            --severity-threshold high
```
Jenkins / GitLab CI / Any CI System
```shell
npm install -g mantis-redteam
mantis scan --target "$AI_APP_URL" --format sarif --output results.sarif
```
Exit Codes
| Code | Meaning |
|------|---------|
| 0 | Scan complete — no critical or high findings |
| 1 | Scan complete — critical or high findings detected |
| 2 | Runtime error (invalid config, network failure, etc.) |
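In a custom pipeline script, these exit codes can be mapped to a gate decision. A minimal sketch follows; the mapping mirrors the table above, but the helper and its names are illustrative, not part of the Mantis API:

```typescript
// Map Mantis exit codes (per the table above) to a CI gate decision.
// Illustrative helper only — not part of mantis-redteam itself.
type GateDecision = "pass" | "fail-findings" | "fail-runtime";

function interpretMantisExit(code: number): GateDecision {
  switch (code) {
    case 0:
      return "pass"; // no critical or high findings
    case 1:
      return "fail-findings"; // critical/high findings detected
    default:
      return "fail-runtime"; // invalid config, network failure, etc.
  }
}
```

A wrapper like this lets a pipeline distinguish "the scan found problems" from "the scan itself broke", so runtime errors can page the security team instead of silently failing the build.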
🏗 Architecture
```mermaid
graph LR
    CLI["CLI"] --> Engine["CoreEngine"]
    Engine --> Registry["PluginRegistry"]
    Registry --> PI["Prompt Injection<br/>4 plugins · 20 attacks"]
    Registry --> DL["Data Leakage<br/>4 plugins · 16 attacks"]
    Registry --> HL["Hallucination<br/>4 plugins · 15 attacks"]
    Registry --> TE["Tool Exploit<br/>4 plugins · 16 attacks"]
    Engine --> Adapter["HttpAdapter"]
    Adapter --> Target["Target LLM"]
    Engine --> Scoring["ALVSS Scorer"]
    Engine --> Reporter["Table · JSON · SARIF"]
```
How It Works
1. CLI parses options and loads configuration from file/CLI/env vars
2. CoreEngine orchestrates the scan lifecycle
3. PluginRegistry auto-discovers and filters attack plugins
4. Each Plugin sends attack prompts through the HttpAdapter to the target
5. Responses are analyzed against known vulnerable/secure patterns
6. ALVSS Scorer calculates risk scores across 5 weighted dimensions
7. Reporters output results as table, JSON, or SARIF
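The response-analysis step above can be sketched as a small classifier. The regexes mirror the `securePatterns`/`vulnerablePatterns` fields that plugins declare; the function itself illustrates the idea and is not the engine's actual implementation:

```typescript
// Illustrative sketch of response analysis: check vulnerable patterns
// first (a leak outweighs a partial refusal), then secure patterns.
// Mirrors the securePatterns/vulnerablePatterns fields in plugins;
// NOT the CoreEngine's real code.
type Verdict = "vulnerable" | "secure" | "inconclusive";

function classifyResponse(
  response: string,
  securePatterns: RegExp[],
  vulnerablePatterns: RegExp[],
): Verdict {
  if (vulnerablePatterns.some((p) => p.test(response))) return "vulnerable";
  if (securePatterns.some((p) => p.test(response))) return "secure";
  return "inconclusive";
}
```

Checking vulnerable patterns first matters: a response that both refuses and leaks ("I cannot... but here is the secret") is still a finding.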
Attack Modules (Detail)
🔴 Prompt Injection

| Plugin | Attacks | What It Tests |
|--------|---------|---------------|
| System Override | 5 | Direct instruction override, DAN persona, developer mode, context reset, multilingual bypass |
| Jailbreak | 5 | Roleplay bypass, hypothetical scenarios, Base64 encoding, reverse psychology, academic framing |
| Role Confusion | 5 | Admin impersonation, maintenance mode, authority claims, system commands, trust escalation |
| Instruction Extraction | 5 | Direct extraction, reflection, debug mode, export prompts, metadata inspection |
🟠 Data Leakage

| Plugin | Attacks | What It Tests |
|--------|---------|---------------|
| Hidden Prompt | 4 | Pre-conversation extraction, JSON message dump, constraint extraction, error-triggered leaks |
| Secret Retrieval | 4 | API key extraction, credential probing, config dump, environment variable leaks |
| PII Extraction | 4 | Training data extraction, user data probing, cross-session leaks, demographic profiling |
| Memory Exfiltration | 4 | Conversation history access, stale context, cross-user data, session boundary testing |
🟡 Hallucination

| Plugin | Attacks | What It Tests |
|--------|---------|---------------|
| Fabricated URL | 4 | Fake documentation links, dead URLs in citations, phishing vector generation |
| Nonexistent Entity | 4 | Fictional papers, fake APIs, imaginary specifications, fabricated expert opinions |
| Citation Verification | 4 | Fake quote attribution, invented statistics, false legal citations, fabricated historical events |
| Confidence Mismatch | 3 | Uncertain claims stated with authority, impossible knowledge, future event predictions |
🟣 Tool/Agent Exploitation

| Plugin | Attacks | What It Tests |
|--------|---------|---------------|
| Command Injection | 4 | Shell command execution, code evaluation, subprocess spawning, OS interaction |
| File System Access | 4 | Path traversal, file read/write, directory listing, sensitive file access |
| Network Access | 4 | SSRF probing, DNS exfiltration, outbound connections, internal network scanning |
| Privilege Escalation | 4 | Admin function access, permission bypass, role elevation, capability override |
📊 Risk Scoring (ALVSS)
Mantis uses ALVSS (AI LLM Vulnerability Scoring System) — a CVSS-inspired scoring model purpose-built for AI applications:
| Dimension | Weight | What It Measures |
|-----------|--------|------------------|
| Exploitability | 30% | How easy is the vulnerability to exploit? |
| Impact | 25% | What is the potential damage? |
| Data Sensitivity | 20% | How sensitive is the exposed data? |
| Reproducibility | 15% | Can the attack be reliably repeated? |
| Model Compliance | 10% | How much does the model deviate from expected behavior? |
Severity mapping: Critical (≥9.0) → High (≥7.0) → Medium (≥4.0) → Low (<4.0) → Info
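A weighted score of this shape can be sketched as below. The weights come from the table above and the thresholds from the severity mapping; the assumption that each dimension is scored 0–10 and combined linearly is ours — the actual ALVSS formula may differ:

```typescript
// Hedged sketch of an ALVSS-style weighted score. Weights follow the
// README's table; linear combination over 0–10 dimensions is an assumption.
interface AlvssDimensions {
  exploitability: number;  // 0–10, weight 30%
  impact: number;          // 0–10, weight 25%
  dataSensitivity: number; // 0–10, weight 20%
  reproducibility: number; // 0–10, weight 15%
  modelCompliance: number; // 0–10, weight 10%
}

function alvssScore(d: AlvssDimensions): number {
  return (
    0.3 * d.exploitability +
    0.25 * d.impact +
    0.2 * d.dataSensitivity +
    0.15 * d.reproducibility +
    0.1 * d.modelCompliance
  );
}

function alvssSeverity(score: number): string {
  if (score >= 9.0) return "critical";
  if (score >= 7.0) return "high";
  if (score >= 4.0) return "medium";
  return "low"; // the Info tier below Low is not modeled in this sketch
}
```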
📁 Output Formats
| Format | Use Case | Flag |
|--------|----------|------|
| Table | Interactive terminal use, human review | --format table |
| JSON | CI/CD pipelines, programmatic consumption, API integration | --format json |
| SARIF | GitHub Security tab, Azure DevOps, VS Code SARIF Viewer | --format sarif |
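SARIF 2.1.0 is a JSON format, so a finding uploaded to the GitHub Security tab has a predictable shape. The sketch below follows the SARIF spec's field names (`runs`, `tool.driver`, `results`, `ruleId`, `level`); the rule id and message text are invented for illustration and are not real Mantis output:

```typescript
// Minimal SARIF 2.1.0 document sketch. Field names follow the SARIF spec;
// the ruleId and message are invented examples, not actual Mantis output.
const sarif = {
  version: "2.1.0",
  $schema: "https://json.schemastore.org/sarif-2.1.0.json",
  runs: [
    {
      tool: {
        driver: {
          name: "mantis-redteam",
          rules: [{ id: "prompt-injection/system-override" }],
        },
      },
      results: [
        {
          ruleId: "prompt-injection/system-override",
          level: "error", // SARIF levels: none | note | warning | error
          message: { text: "Target complied with a system prompt override." },
        },
      ],
    },
  ],
};
```

Each `result.ruleId` must reference a rule declared in `tool.driver.rules`; that link is what lets the Security tab group findings by rule.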
🗺 Roadmap
| Phase | Scope | Status |
|-------|-------|--------|
| Phase 1 | Core engine, 16 plugins (67 attacks), CLI, JSON/Table/SARIF reports, ALVSS scoring, config system, Docker, CI/CD workflows | ✅ Complete |
| Phase 2 | Plugin marketplace, multi-model adapters, advanced rate limiting, scan replay, historical comparison | 📋 Planned |
| Phase 3 | Attack chaining, AI-assisted mutation, campaign mode, web dashboard, team collaboration | 📋 Planned |
🤝 Contributing
We welcome contributions! The easiest way to get started is by writing attack plugins — it takes ~15 lines of TypeScript.
See CONTRIBUTING.md for setup instructions, code standards, and PR guidelines.
```
src/plugins/
├── prompt-injection/   # 4 plugins
├── data-leakage/       # 4 plugins
├── hallucination/      # 4 plugins
└── tool-exploit/       # 4 plugins
```
Quick plugin template:
```typescript
import { BasePlugin } from '../base-plugin.js';
import { AttackCategory, SeverityLevel, type PluginMeta, type AttackPrompt } from '../../types/types.js';

class MyPlugin extends BasePlugin {
  meta: PluginMeta = {
    id: 'category/my-plugin',
    name: 'My Attack Plugin',
    description: 'Tests for a specific vulnerability',
    category: AttackCategory.PromptInjection,
    version: '1.0.0',
    author: 'your-name',
    tags: ['my-tag'],
    owaspLLM: 'LLM01: Prompt Injection',
  };

  prompts: AttackPrompt[] = [
    {
      id: 'my-attack-1',
      prompt: 'Your attack prompt here',
      description: 'What this tests',
      securePatterns: [/I cannot/i],
      vulnerablePatterns: [/here is the secret/i],
      severity: SeverityLevel.High,
    },
  ];

  protected getRemediation(): string {
    return 'How to fix this vulnerability';
  }

  protected getCWE(): string {
    return 'CWE-XXX';
  }
}

export default new MyPlugin();
```
🔐 Security
For reporting security vulnerabilities in Mantis itself, see SECURITY.md.
⚠️ Responsible Use: Mantis is a security testing tool. Always ensure you have explicit written authorization before scanning any application. Unauthorized security testing is illegal and unethical.
📄 License
Apache 2.0 — see LICENSE for details.
Built for the security community, by the security community.
npm · Docker · Issues · Contributing
