shield-llm
v0.5.1
AI chatbot security scanner — automated red teaming for LLMs
Automated red teaming for AI chatbots — from your terminal.
shield-llm scans a chatbot (any HTTP endpoint, or an LLM provider directly)
against a catalogue of prompt-injection, jailbreak, data-extraction, excessive-
agency, RAG, supply-chain and multi-turn attacks mapped to the OWASP LLM Top
10 (2025). It produces a security score, a letter grade, and reports in JSON,
Markdown, SARIF, or PDF — ready for CI/CD.
How it works (thin client)
The CLI is a thin client. It authenticates against a Shield LLM backend, pulls the plan-filtered attack catalogue, sends each attack prompt to your target chatbot, and posts the responses back to the backend, which runs an LLM judge and scores them. The CLI itself contains no attack payloads and no scoring logic — so adding new attacks server-side reaches every install instantly, with no upgrade.
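The relay loop described above can be sketched as follows. This is a minimal TypeScript sketch under stated assumptions, not the CLI's source. The `Attack` and `AttackResponse` shapes and the `fetchCatalogue`, `sendToTarget` and `submitResponses` functions are hypothetical, and they are injected as parameters so that no backend endpoint paths or payload formats are assumed. The point it illustrates is that the client only forwards prompts and responses, while payloads and scoring stay server-side.

```typescript
// Hypothetical shapes: the real backend protocol is not documented in this README.
interface Attack {
  id: string;
  prompt: string;
}

interface AttackResponse {
  attackId: string;
  response: string;
}

// The three network steps are injected, so the loop makes no assumptions
// about backend URLs; each returns a Promise like a real HTTP call would.
interface ThinClientDeps {
  fetchCatalogue: () => Promise<Attack[]>;           // plan-filtered catalogue from the backend
  sendToTarget: (prompt: string) => Promise<string>; // one request to the chatbot under test
  submitResponses: (responses: AttackResponse[]) => Promise<void>; // backend judges and scores
}

// Relay only: no attack payloads and no scoring logic live in this function.
async function runThinClientScan(deps: ThinClientDeps): Promise<AttackResponse[]> {
  const catalogue = await deps.fetchCatalogue();
  const responses: AttackResponse[] = [];
  for (const attack of catalogue) {
    // Sequential for clarity; a real client might run attacks concurrently.
    const response = await deps.sendToTarget(attack.prompt);
    responses.push({ attackId: attack.id, response });
  }
  await deps.submitResponses(responses);
  return responses;
}
```

Because the catalogue arrives at runtime, a newly added server-side attack simply shows up as one more element of `catalogue` and needs no client change.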
A backend connection is required. There are two modes, both using the same binary:
- SaaS: https://shield-llm.com (default)
- License / self-hosted: your own instance, selected via --api-url
Install
npm install -g shield-llm
# or run once without installing:
npx shield-llm --help
Requires Node.js >= 22.
PDF reports (--output pdf) need Puppeteer, which is not installed by default
to keep the CLI install small and dependency-light:
npm install -g puppeteer # one-time, only if you use --output pdf
Set PUPPETEER_EXECUTABLE_PATH to use a system Chrome instead of the bundled one.
Quick start
# 1. Authenticate with your API key (create one in your dashboard → Settings)
shield-llm login --key sk_shield_xxxxxxxx
# self-hosted:
shield-llm login --key <license-key> --api-url https://shield.your-corp.com
# 2. Generate a config for your chatbot endpoint (interactive)
shield-llm init
# 3. Scan
shield-llm scan --preset owasp
Configuration
shield-llm init writes a shield.config.json describing your endpoint:
{
"endpoint": {
"url": "https://api.example.com/chat",
"request": { "method": "POST", "body": { "message": "{{prompt}}" } },
"response": { "field": "data.reply" }
},
"preset": "standard"
}
- {{prompt}} is replaced with each attack prompt; {{history}} is available for multi-turn attacks.
- response.field is a dot-path into the JSON response (e.g. choices[0].message.content). SSE streams are auto-detected.
- Auth: bearer, api-key, or oauth2. Use $ENV_VAR to read secrets from the environment instead of hard-coding them.
- CLI flags override config values.
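The three config mechanisms above (placeholder substitution, dot-path extraction and `$ENV_VAR` secrets) can be illustrated with a small sketch. This is one plausible way to implement them in TypeScript, not the CLI's actual code: `fillTemplate`, `getByPath` and `resolveEnv` are hypothetical names, the sketch assumes every placeholder is substituted as a plain string, and it assumes an env reference is a whole value of the form `$NAME`. The environment is passed in as a plain object rather than read from `process.env`, to keep the sketch free of Node-specific types.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Replace {{name}} placeholders inside every string value of a request body.
// Unknown placeholders are left untouched. Assumes string-only substitution.
function fillTemplate(body: Json, vars: Record<string, string>): Json {
  if (typeof body === "string") {
    return body.replace(/\{\{(\w+)\}\}/g, (match: string, name: string) =>
      Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : match,
    );
  }
  if (Array.isArray(body)) return body.map((item) => fillTemplate(item, vars));
  if (body !== null && typeof body === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, value] of Object.entries(body)) out[key] = fillTemplate(value, vars);
    return out;
  }
  return body; // numbers, booleans and null pass through unchanged
}

// Resolve a dot-path such as "data.reply" or "choices[0].message.content".
function getByPath(value: unknown, path: string): unknown {
  // "choices[0].message" -> ["choices", "0", "message"]
  const keys = path.replace(/\[(\d+)\]/g, ".$1").split(".").filter((k) => k !== "");
  let current: unknown = value;
  for (const key of keys) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[key]; // works for arrays too
  }
  return current;
}

// Treat a value of the form "$NAME" as a reference to an environment variable.
function resolveEnv(value: string, env: Record<string, string | undefined>): string {
  if (!value.startsWith("$")) return value;
  const name = value.slice(1);
  const resolved = env[name];
  if (resolved === undefined) throw new Error(`Environment variable ${name} is not set`);
  return resolved;
}
```

For example, `getByPath({ choices: [{ message: { content: "ok" } }] }, "choices[0].message.content")` yields `"ok"`, which is the string that would be sent on to the judge.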
knownSecrets (optional canary): list any secret your chatbot must never reveal, such as a password, an API key, or a unique canary token you plant in its system prompt. The judge deterministically flags any response that leaks one, even when it is base64-, rot13- or otherwise encoded. This catches prompt leaks and data exfiltration that an LLM judge alone can miss:
{
"endpoint": {
"url": "https://api.example.com/chat",
"request": { "method": "POST", "body": { "message": "{{prompt}}" } },
"response": { "field": "data.reply" },
"knownSecrets": ["CANARY-7f3a9b2c1d4e"]
},
"preset": "standard"
}
Use a long, high-entropy value (at least 8 characters). A match can only escalate a result to vulnerable, the check is skipped when knownSecrets is unset, and it never affects other tests.
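A deterministic canary check of this kind might look like the sketch below. It is an illustration only: the real judge runs on the backend and its exact list of encodings is not documented here. The names `decodeBase64`, `rot13` and `leaksSecret` are hypothetical. The sketch checks the raw response, its rot13 transform, and every whole base64-looking token decoded as bytes, which is enough for ASCII canaries; base64 split by whitespace or glued to other alphanumerics would be missed.

```typescript
const BASE64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decode standard base64 into a string of byte values (Latin-1).
// Returns "" if any character falls outside the alphabet.
function decodeBase64(input: string): string {
  const clean = input.replace(/=+$/, ""); // drop trailing padding
  let bits = 0;
  let bitCount = 0;
  let out = "";
  for (let i = 0; i < clean.length; i++) {
    const value = BASE64_ALPHABET.indexOf(clean[i]);
    if (value < 0) return "";
    bits = ((bits << 6) | value) & 0xffff; // keep only the low bits still needed
    bitCount += 6;
    if (bitCount >= 8) {
      bitCount -= 8;
      out += String.fromCharCode((bits >> bitCount) & 0xff); // emit one full byte
    }
  }
  return out;
}

// Rotate ASCII letters by 13 places; rot13 is its own inverse.
function rot13(s: string): string {
  return s.replace(/[a-zA-Z]/g, (c) => {
    const base = c <= "Z" ? 65 : 97;
    return String.fromCharCode(((c.charCodeAt(0) - base + 13) % 26) + base);
  });
}

// True if any known secret appears in the response, directly, rot13-encoded,
// or inside a whole base64 token of 8+ characters.
function leaksSecret(response: string, secrets: string[]): boolean {
  const candidates = [response, rot13(response)];
  for (const token of response.match(/[A-Za-z0-9+/]{8,}={0,2}/g) ?? []) {
    candidates.push(decodeBase64(token));
  }
  return secrets.some((secret) => candidates.some((text) => text.includes(secret)));
}
```

This also shows why a long, high-entropy canary matters: a short or common value would match by accident in ordinary text or in decoded noise.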
Commands
| Command | Description |
| ------------------ | --------------------------------------------------------------------------- |
| scan | Run a security scan against a target chatbot |
| init | Interactive wizard → generates shield.config.json |
| test-connection | Send one benign message to check your config works (no attacks) |
| login / logout | Manage your API key credentials |
| attacks | Show the attack catalogue summary for your plan (counts, from your backend) |
| presets | List available scan presets |
| tests | List custom tests (local config and/or --remote) |
| validate | Validate a shield.config.json without scanning |
| report <scanId> | Fetch a past scan from your dashboard |
Presets
| Preset | Scope |
| ---------- | -------------------------------------------------------- |
| dev | quick smoke test |
| quick | fast triage |
| owasp | one test per OWASP LLM Top 10 category |
| standard | broad coverage |
| full | full catalogue + multi-turn (crescendo / memory) attacks |
Available presets depend on your plan; the backend filters them.
Key scan flags
--preset <name> dev | quick | owasp | standard | full
--endpoint <url> target URL (overrides config)
--system-prompt <text> give the target's system prompt for context
--crescendo include multi-turn attacks
--load-custom-tests also run your dashboard's custom tests
--custom-only run ONLY your custom tests (no preset attacks)
--eu-ai-act print an EU AI Act compliance summary
-o, --output <format> json | markdown | sarif | pdf
--output-file <path> write the report to a file
--min-score <n> fail if score < n
--fail-on-critical fail if any CRITICAL finding
--ci JSON to stdout, no spinner (logs to stderr)
--include-full-payloads keep full chatbot responses in the JSON (default: truncated)
CI/CD
shield-llm scan --ci --min-score 80 --fail-on-critical \
--output sarif --output-file shield.sarif
Exit codes: 0 passed · 1 threshold violation · 2 config/auth error · 3 runtime/network error.
Upload the SARIF to GitHub's Security tab, or use --write-json to emit a parseable copy alongside any other format.
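The pass/fail rule behind exit codes 0 and 1 can be written down as a short function. This is a sketch of the documented semantics of --min-score ("fail if score < n") and --fail-on-critical ("fail if any CRITICAL finding"), not the CLI's implementation. `ScanSummary`, `Thresholds` and `thresholdExitCode` are hypothetical, and the result shape is an assumption because the --ci JSON schema is not documented here. Codes 2 and 3 are left out because config/auth and runtime failures occur before any score exists.

```typescript
// Hypothetical result shape; the real --ci JSON schema may differ.
interface ScanSummary {
  score: number;
  findings: { severity: string }[]; // e.g. "CRITICAL"
}

interface Thresholds {
  minScore?: number;        // mirrors --min-score
  failOnCritical?: boolean; // mirrors --fail-on-critical
}

// 0 = passed, 1 = threshold violation (the documented meanings of those codes).
function thresholdExitCode(summary: ScanSummary, t: Thresholds): 0 | 1 {
  // "fail if score < n": a score exactly equal to the threshold passes.
  if (t.minScore !== undefined && summary.score < t.minScore) return 1;
  if (t.failOnCritical && summary.findings.some((f) => f.severity === "CRITICAL")) return 1;
  return 0;
}
```

With `--min-score 80 --fail-on-critical`, a score of 80 with no CRITICAL findings passes, while a score of 95 with a single CRITICAL finding still fails the job.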
License
MIT
