# @vhra/guard-rails

v0.1.0

SDK-agnostic guardrails for any LLM client. One line to add budget limits, PII redaction, and prompt injection protection.
```ts
import OpenAI from 'openai'
import { wrap } from '@vhra/guard-rails'

const client = wrap(new OpenAI(), {
  budget: { maxTokens: 50_000, window: '1d' },
  pii: { mode: 'redact' },
  injection: true,
  nonsense: true,
  onViolation: (e) => console.warn('[guard]', e),
})

// Use exactly like the normal client — guards run transparently
const response = await client.chat.completions.create({ ... })
```

Works with OpenAI, Anthropic, Google Gemini, and any SDK that follows the same `{ messages }` / `{ contents }` / `{ prompt }` convention.
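The shape-detection claim above can be sketched as follows. This is illustrative only: `extractText` and the simplified request types are assumptions for the sketch, not the library's actual internals.

```ts
// Illustrative: how a wrapper might pull prompt text out of the three
// request shapes named above. Simplified types, not the real SDK types.
type AnyRequest = {
  messages?: { content: string }[]            // OpenAI / Anthropic style
  contents?: { parts: { text: string }[] }[]  // Gemini style
  prompt?: string                             // plain-prompt style
}

function extractText(req: AnyRequest): string {
  if (req.messages) return req.messages.map((m) => m.content).join('\n')
  if (req.contents) {
    return req.contents.flatMap((c) => c.parts.map((p) => p.text)).join('\n')
  }
  return req.prompt ?? ''
}
```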
## Install

```sh
npm install @vhra/guard-rails
```

No runtime dependencies.
## Guards

### Budget — token spend limits

```ts
wrap(client, {
  budget: {
    maxTokens: 100_000, // hard limit
    window: '1d', // 'session' | '1h' | '6h' | '1d' | '7d'
    onWarning: (used, limit) => console.warn(`${used}/${limit} tokens used`),
  },
})
```

- Estimates tokens before the call and blocks if the projected total would exceed the limit.
- Reconciles with the actual token usage reported by the SDK after the response arrives.
- Throws `BudgetExceededError` when the limit is hit.
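The estimate-then-reconcile pattern can be sketched like this. The `TokenBudget` class and the ~4-characters-per-token heuristic are assumptions for illustration, not the library's internal code.

```ts
// Sketch of estimate-then-reconcile budgeting (illustrative, not the
// library's implementation).
class TokenBudget {
  private used = 0
  constructor(private readonly maxTokens: number) {}

  // Rough pre-call estimate: ~4 characters per token (common heuristic).
  estimateTokens(text: string): number {
    return Math.ceil(text.length / 4)
  }

  // Block the call if the projected total would exceed the limit.
  check(promptText: string): void {
    const projected = this.used + this.estimateTokens(promptText)
    if (projected > this.maxTokens) {
      throw new Error(`budget exceeded: ${projected}/${this.maxTokens}`)
    }
  }

  // Reconcile with the actual usage the SDK reports after the response.
  reconcile(actualTokens: number): void {
    this.used += actualTokens
  }
}
```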
### PII — detect and redact personal information

```ts
wrap(client, {
  pii: {
    mode: 'redact', // 'redact' (default) | 'block'
    types: ['email', 'ssn'], // default: all types
    replacement: '[REDACTED]',
  },
})
```

Detects: `email`, `phone`, `ssn`, `creditcard`, `ip`.

In `redact` mode the PII is replaced in the prompt before it reaches the API. In `block` mode the whole call is rejected.
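Conceptually, redact mode is a pattern-replace pass over the prompt before it is sent. The patterns below are illustrative stand-ins for two of the five detectors; the library's actual regexes are not published here.

```ts
// Illustrative redaction pass, not the library's actual detectors.
const PII_PATTERNS: Record<string, RegExp> = {
  email: /[\w.+-]+@[\w-]+\.[\w.]+/g,   // simplified email matcher
  ssn: /\b\d{3}-\d{2}-\d{4}\b/g,       // US SSN format
}

function redactPII(text: string, replacement = '[REDACTED]'): string {
  let out = text
  for (const pattern of Object.values(PII_PATTERNS)) {
    out = out.replace(pattern, replacement)
  }
  return out
}
```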
### Injection — prompt injection detection

```ts
wrap(client, {
  injection: true, // enabled by default
  // or pass extra patterns:
  injection: { patterns: [/my-custom-pattern/i] },
})
```

Detects: "ignore previous instructions", role-switching ("act as", "pretend to be"), system-prompt injection markers (`[SYSTEM]`, `<|im_start|>`, `### instruction`, etc.), and jailbreak preambles (DAN, developer mode).

Text is normalised before matching — homoglyphs (Cyrillic/Greek lookalikes) and zero-width characters are stripped so evasion attempts don't bypass the patterns.

Throws `ViolationError` when an injection attempt is detected.
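The normalisation step can be sketched as below. The homoglyph table here lists only a few Cyrillic lookalikes for illustration; the real confusables set is much larger, and `normalize` / `looksLikeInjection` are names invented for this sketch.

```ts
// Illustrative normalisation pass; the real homoglyph table is far larger.
const HOMOGLYPHS: Record<string, string> = {
  'а': 'a', 'е': 'e', 'о': 'o', 'р': 'p', 'с': 'c', // Cyrillic lookalikes
}

function normalize(text: string): string {
  return text
    .replace(/[\u200B-\u200D\uFEFF]/g, '') // strip zero-width characters
    .replace(/./gu, (ch) => HOMOGLYPHS[ch] ?? ch) // map homoglyphs to ASCII
    .toLowerCase()
}

const INJECTION = /ignore (all )?previous instructions/i

function looksLikeInjection(text: string): boolean {
  return INJECTION.test(normalize(text))
}
```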
### Nonsense — gibberish and encoded-payload detection

```ts
wrap(client, {
  nonsense: true, // enabled by default
  // or tune thresholds:
  nonsense: {
    entropyThreshold: 0.85, // 0–1, higher = more permissive
    minLength: 20,
  },
})
```

Catches: high-entropy random strings, excessive character repetition, repeating multi-character patterns (common in base64-encoded payloads), and invisible Unicode characters.

Throws `ViolationError` when nonsense is detected.
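The entropy check can be sketched with normalised Shannon entropy. How the library actually maps entropy to the 0–1 threshold is an implementation detail; this sketch divides by the maximum possible entropy for the distinct characters seen, and all names here are invented for illustration.

```ts
// Sketch: normalised Shannon entropy in [0, 1].
// 0 = fully repetitive, 1 = every character equally likely.
function normalizedEntropy(text: string): number {
  const counts = new Map<string, number>()
  for (const ch of text) counts.set(ch, (counts.get(ch) ?? 0) + 1)
  if (counts.size <= 1) return 0
  let h = 0
  for (const n of counts.values()) {
    const p = n / text.length
    h -= p * Math.log2(p)
  }
  return h / Math.log2(counts.size)
}

function looksLikeGibberish(
  text: string,
  entropyThreshold = 0.85, // higher = more permissive
  minLength = 20,          // short strings are never flagged
): boolean {
  return text.length >= minLength && normalizedEntropy(text) > entropyThreshold
}
```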
Custom guards
import { wrap } from '@vhra/guard-rails'
import type { Guard } from '@vhra/guard-rails'
const toxicityGuard: Guard = {
name: 'toxicity',
phase: 'input',
run(ctx) {
if (isToxic(ctx.text)) {
throw new ViolationError({
guard: 'toxicity',
phase: 'input',
action: 'block',
reason: 'Toxic content detected',
})
}
return { action: 'pass' }
},
}
const client = wrap(new Anthropic(), { guards: [toxicityGuard] })When you supply guards directly, the built-in guards are not added automatically — you have full control over the pipeline.
## Individual guard factories

```ts
import {
  wrap,
  createBudgetGuard,
  createPIIGuard,
  createInjectionGuard,
  createNonsenseGuard,
} from '@vhra/guard-rails'

const guards = [
  createNonsenseGuard(),
  createInjectionGuard(),
  createPIIGuard({ mode: 'redact' }),
  createBudgetGuard({ maxTokens: 10_000 }),
]

const client = wrap(new OpenAI(), { guards })
```

## Violation events
```ts
wrap(client, {
  onViolation(event) {
    // event.guard — 'budget' | 'pii' | 'injection' | 'nonsense' | ...
    // event.phase — 'input' | 'output'
    // event.action — 'block' | 'redact'
    // event.reason — human-readable description
    logger.warn(event)
  },
})
```

`onViolation` fires for both blocked and redacted events. Blocked calls also throw an error after the hook returns.
Error types
import { GuardRailsError, BudgetExceededError, ViolationError } from '@vhra/guard-rails'
try {
await client.chat.completions.create({ ... })
} catch (err) {
if (err instanceof BudgetExceededError) {
console.error(`Over budget: ${err.used}/${err.limit} in window "${err.window}"`)
} else if (err instanceof ViolationError) {
console.error(`Blocked by guard "${err.event.guard}": ${err.event.reason}`)
}
}Default guard pipeline
| Guard     | Default | Phase |
|-----------|---------|-------|
| nonsense  | ON      | input |
| injection | ON      | input |
| pii       | OFF     | both  |
| budget    | OFF     | both  |
PII and budget are opt-in because they either modify content (PII redaction) or require configuration (budget).
## License
MIT
