shield-llm

v0.5.1

Published

11 hours ago

AI chatbot security scanner — automated red teaming for LLMs

Downloads

1,422

0High
0Medium
0Low

sami-essouri

ai security scanner llm red-team owasp chatbot cli

shield-llm

Automated red teaming for AI chatbots — from your terminal.

shield-llm scans a chatbot (any HTTP endpoint, or an LLM provider directly) against a catalogue of prompt-injection, jailbreak, data-extraction, excessive- agency, RAG, supply-chain and multi-turn attacks mapped to the OWASP LLM Top 10 (2025). It produces a security score, a letter grade, and reports in JSON, Markdown, SARIF, or PDF — ready for CI/CD.

How it works (thin client)

The CLI is a thin client. It authenticates against a Shield LLM backend, pulls the plan-filtered attack catalogue, sends each attack prompt to your target chatbot, and posts the responses back to the backend, which runs an LLM judge and scores them. The CLI itself contains no attack payloads and no scoring logic — so adding new attacks server-side reaches every install instantly, with no upgrade.

A backend connection is required. Two modes, same binary:
SaaS — https://shield-llm.com (default)
License / self-hosted — your own instance via --api-url

Install

npm install -g shield-llm
# or run once without installing:
npx shield-llm --help

Requires Node.js >= 22.

PDF reports (--output pdf) need Puppeteer, which is not installed by default to keep the CLI install small and dependency-light:

npm install -g puppeteer   # one-time, only if you use --output pdf

Set PUPPETEER_EXECUTABLE_PATH to use a system Chrome instead of the bundled one.

Quick start

# 1. Authenticate with your API key (create one in your dashboard → Settings)
shield-llm login --key sk_shield_xxxxxxxx
#    self-hosted:
shield-llm login --key <license-key> --api-url https://shield.your-corp.com

# 2. Generate a config for your chatbot endpoint (interactive)
shield-llm init

# 3. Scan
shield-llm scan --preset owasp

Configuration

shield-llm init writes a shield.config.json describing your endpoint:

{
  "endpoint": {
    "url": "https://api.example.com/chat",
    "request": { "method": "POST", "body": { "message": "{{prompt}}" } },
    "response": { "field": "data.reply" }
  },
  "preset": "standard"
}

{{prompt}} is replaced with each attack. {{history}} is available for multi-turn.
response.field is a dot-path into the JSON response (e.g. choices[0].message.content). SSE streams are auto-detected.
Auth: bearer, api-key, or oauth2 (use $ENV_VAR to read secrets from the environment instead of hard-coding them).
CLI flags override config values.
knownSecrets (optional canary) — list any secret your chatbot must never reveal (a password, an API key, or a unique canary token you plant in its system prompt). The judge deterministically flags any response that leaks one, even base64/rot13/encoded, catching prompt-leak / data-exfil that an LLM judge alone can miss:
```
{
  "endpoint": {
    "url": "https://api.example.com/chat",
    "request": { "method": "POST", "body": { "message": "{{prompt}}" } },
    "response": { "field": "data.reply" },
    "knownSecrets": ["CANARY-7f3a9b2c1d4e"]
  },
  "preset": "standard"
}
```
Use a long, high-entropy value (min 8 chars). It only ever raises a finding to vulnerable and is ignored when unset — it never affects other tests.

Commands

| Command | Description | | ------------------ | --------------------------------------------------------------------------- | | scan | Run a security scan against a target chatbot | | init | Interactive wizard → generates shield.config.json | | test-connection | Send one benign message to check your config works (no attacks) | | login / logout | Manage your API key credentials | | attacks | Show the attack catalogue summary for your plan (counts, from your backend) | | presets | List available scan presets | | tests | List custom tests (local config and/or --remote) | | validate | Validate a shield.config.json without scanning | | report <scanId> | Fetch a past scan from your dashboard |

Presets

| Preset | Scope | | ---------- | -------------------------------------------------------- | | dev | quick smoke test | | quick | fast triage | | owasp | one test per OWASP LLM Top 10 category | | standard | broad coverage | | full | full catalogue + multi-turn (crescendo / memory) attacks |

Available presets depend on your plan; the backend filters them.

Key scan flags

--preset <name>            dev | quick | owasp | standard | full
--endpoint <url>           target URL (overrides config)
--system-prompt <text>     give the target's system prompt for context
--crescendo                include multi-turn attacks
--load-custom-tests        also run your dashboard's custom tests
--custom-only              run ONLY your custom tests (no preset attacks)
--eu-ai-act                print an EU AI Act compliance summary
-o, --output <format>      json | markdown | sarif | pdf
--output-file <path>       write the report to a file
--min-score <n>            fail if score < n
--fail-on-critical         fail if any CRITICAL finding
--ci                       JSON to stdout, no spinner (logs to stderr)
--include-full-payloads    keep full chatbot responses in the JSON (default: truncated)

CI/CD

shield-llm scan --ci --min-score 80 --fail-on-critical \
  --output sarif --output-file shield.sarif

Exit codes: 0 passed · 1 threshold violation · 2 config/auth error · 3 runtime/network error. Upload the SARIF to GitHub's Security tab, or use --write-json to emit a parseable copy alongside any other format.

License

MIT

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

shield-llm

How it works (thin client)

Install

Quick start

Configuration

Commands

Presets

Key scan flags

CI/CD

License