# opencode-prompt-guard
OpenCode plugin that scans tool outputs for malicious content using Meta's Llama Prompt Guard 2 (22M parameter model).
## Overview
This plugin acts as a security layer between tool execution and the LLM. Every tool response is analysed for prompt injection, jailbreak attempts, and other malicious content before it reaches the model. Detected threats are blocked and surfaced to the user as warnings.
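As a conceptual sketch, the interception boils down to the decision below. The `classifyOutput` stub and surrounding types are illustrative assumptions, not the plugin's actual source (the real classifier is the locally-run Prompt Guard 2 model):

```typescript
// Conceptual sketch of the security layer; the types and stub
// classifier are illustrative, not the plugin's actual source.
interface ScanResult {
  label: "BENIGN" | "MALICIOUS";
  score: number; // malicious confidence in [0, 1]
}

// Stand-in for the locally-run Prompt Guard 2 classifier.
async function classifyOutput(text: string): Promise<ScanResult> {
  return { label: "BENIGN", score: 0.01 }; // placeholder result
}

// Runs after each tool call (the tool.execute.after hook): benign output
// passes through unchanged; malicious output is replaced with a warning.
async function guardToolOutput(tool: string, output: string): Promise<string> {
  const { label, score } = await classifyOutput(output);
  if (label === "MALICIOUS") {
    return `⚠️ Blocked output from ${tool} (confidence ${score.toFixed(2)})`;
  }
  return output;
}
```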
Key features:
- Local inference - runs entirely on your machine with no external API calls.
- Sliding window analysis - handles outputs of any length using overlapping 512-token chunks.
- Configurable - adjust sensitivity, allow-list safe tools, and control output formatting.
- Fast - 22M parameter DeBERTa model with ~70 MB download and sub-100 ms inference on modern CPUs.
## Installation
Add the plugin to your OpenCode configuration:

```json
{
  "plugin": [
    [
      "opencode-prompt-guard",
      {
        "threshold": 0.5,
        "showConfidence": true
      }
    ]
  ]
}
```

The model will download automatically from Hugging Face on first use.
## Authentication
The model `meta-llama/Llama-Prompt-Guard-2-22M` is gated on Hugging Face. You must authenticate to download it. Three methods are supported:
- Plugin option (recommended): pass `hfToken` directly in `opencode.json`.
- Environment variable: set `HF_TOKEN` in your shell environment.
- CLI login: run `huggingface-cli login` to persist a token in `~/.cache/huggingface/token`.
The plugin checks these sources in order and uses the first token found.
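A minimal sketch of that precedence, assuming a Bun/Node runtime (the function name is ours, not part of the plugin's API):

```typescript
import { existsSync, readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Resolve a Hugging Face token using the documented precedence:
// plugin option, then HF_TOKEN, then the huggingface-cli token file.
function resolveHfToken(pluginToken?: string): string | undefined {
  if (pluginToken) return pluginToken;
  if (process.env.HF_TOKEN) return process.env.HF_TOKEN;
  const tokenFile = join(homedir(), ".cache", "huggingface", "token");
  if (existsSync(tokenFile)) {
    return readFileSync(tokenFile, "utf8").trim();
  }
  return undefined; // no token available; the gated download will fail
}
```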
## Configuration
| Option | Type | Default | Description |
| ---------------- | -------- | ------- | ---------------------------------------------------------- |
| threshold | number | 0.5 | Confidence threshold above which MALICIOUS is triggered. |
| showConfidence | boolean | true | Include confidence score in warning message. |
| allowList | string[] | [] | Tool names to skip scanning (e.g., low-risk tools). |
| hfToken | string | null | Hugging Face access token for gated model download. |
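In TypeScript terms, the options map onto a shape like the following; the interface name is ours, while the fields and defaults come from the table above.

```typescript
interface PromptGuardOptions {
  /** Malicious-confidence threshold above which output is blocked (default 0.5). */
  threshold?: number;
  /** Include the confidence score in the warning message (default true). */
  showConfidence?: boolean;
  /** Tool names whose outputs are never scanned (default []). */
  allowList?: string[];
  /** Hugging Face access token for the gated model download. */
  hfToken?: string;
}
```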
## How it works

```
Tool executes
      |
      v
tool.execute.after hook fires
      |
      v
Lazy-load classifier (first call only)
      |
      v
Tokenise full output (no truncation)
      |
      v
Split into overlapping chunks of 512 tokens (10% overlap)
      |
      v
Classify each chunk in parallel
      |
      v
Take MAXIMUM malicious score across all chunks
      |
      +---> BENIGN -----> Pass through unchanged
      |
      +---> MALICIOUS --> Block output, show warning
```
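The sliding-window step can be sketched as below. `classifyChunk` is a stand-in for the real ONNX inference call; the window arithmetic (512-token chunks, 10% overlap, maximum score wins) follows the diagram above.

```typescript
// Illustrative sketch of the sliding-window scan; classifyChunk is a
// stand-in for the real Prompt Guard inference call.
async function classifyChunk(tokenIds: number[]): Promise<number> {
  return 0.01; // placeholder malicious score
}

const WINDOW = 512;
const STRIDE = Math.floor(WINDOW * 0.9); // ~10% overlap between windows

async function scanTokens(tokenIds: number[]): Promise<number> {
  if (tokenIds.length === 0) return 0;
  // Split the token sequence into overlapping 512-token windows.
  const chunks: number[][] = [];
  for (let start = 0; ; start += STRIDE) {
    chunks.push(tokenIds.slice(start, start + WINDOW));
    if (start + WINDOW >= tokenIds.length) break;
  }
  // Classify all chunks in parallel and keep the worst (maximum) score.
  const scores = await Promise.all(chunks.map(classifyChunk));
  return Math.max(...scores);
}
```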
### Blocked output
When malicious content is detected, the tool output is replaced with:
```
⚠️ SECURITY ALERT: Potentially malicious content detected in tool output.
Tool: <tool_name>
Classification: MALICIOUS
Confidence: <score>
The tool output has been blocked to prevent harmful instructions from
reaching the LLM. If you believe this is a false positive, you can:
- Disable the prompt-guard plugin temporarily
- Adjust the confidence threshold in your opencode.json configuration
```
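For illustration only, the message can be assembled along these lines, with `showConfidence` controlling the confidence line (a sketch, not the plugin's exact source):

```typescript
// Build the replacement text shown in place of a blocked tool output.
function buildSecurityAlert(
  tool: string,
  score: number,
  showConfidence = true,
): string {
  const lines = [
    "⚠️ SECURITY ALERT: Potentially malicious content detected in tool output.",
    `Tool: ${tool}`,
    "Classification: MALICIOUS",
  ];
  if (showConfidence) lines.push(`Confidence: ${score.toFixed(2)}`);
  lines.push(
    "The tool output has been blocked to prevent harmful instructions from",
    "reaching the LLM.",
  );
  return lines.join("\n");
}
```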
## Performance
- First run: ~70 MB model download plus ONNX conversion (~280 MB peak).
- Subsequent runs: cached pipeline loads in under 1 second.
- Inference: typically 20-80 ms per 512-token chunk on a modern CPU.
- Memory: peak footprint stays under 500 MB.
## Development

```bash
# Install dependencies
bun install

# Run unit tests (fast, uses mocks)
bun test

# Run integration tests (slow, downloads real model, requires HF_TOKEN)
HF_TOKEN=hf_xxx bun test:integration

# Run all tests
bun test:all

# Build
bun run build

# Lint
bun run lint
```

## Copyright
Copyright © 2026, John Grimes.
