@artemiskit/redteam
v0.2.0
Published
Red-team adversarial security testing for ArtemisKit LLM evaluation toolkit
Maintainers
Readme
@artemiskit/redteam
Red team adversarial security testing for ArtemisKit LLM evaluation toolkit.
Installation
npm install @artemiskit/redteam
# or
bun add @artemiskit/redteamOverview
This package provides adversarial testing capabilities to identify vulnerabilities in LLM-powered applications:
- Prompt Injection - Test resistance to instruction override attacks
- Jailbreak Attempts - Test guardrail bypass techniques
- Data Extraction - Probe for system prompt and training data leakage
- Hallucination Triggers - Test factual accuracy under adversarial prompts
- PII Disclosure - Test for unauthorized personal data exposure
Usage
Most users should use the @artemiskit/cli for red team testing:
artemiskit redteam my-scenario.yaml --count 5For programmatic usage:
import { RedTeamGenerator } from '@artemiskit/redteam';
const generator = new RedTeamGenerator();
// Generate mutated versions of a prompt
const mutatedPrompts = generator.generate(basePrompt, 10);
// Each result contains:
// - original: the original prompt
// - mutated: the mutated prompt
// - mutations: array of mutation names applied
// - severity: 'low' | 'medium' | 'high' | 'critical'
// Apply a specific mutation
const mutated = generator.applyMutation(prompt, 'role-spoof');
// List available mutations
const mutations = generator.listMutations();How It Works
The red team module applies mutations to prompts to test LLM robustness:
- Takes a base prompt from your scenario
- Applies one or more mutations to create adversarial variants
- Sends mutated prompts to the LLM
- Analyzes responses for unsafe behavior
- Reports vulnerabilities with severity ratings
Mutations
The package includes mutation strategies to generate attack variants:
import {
CotInjectionMutation,
InstructionFlipMutation,
RoleSpoofMutation,
TypoMutation
} from '@artemiskit/redteam';
const mutation = new CotInjectionMutation();
const mutated = mutation.mutate(originalPrompt);Available mutations:
TypoMutation- Introduces typos to evade filtersRoleSpoofMutation- Role impersonation attacksInstructionFlipMutation- Reverses or contradicts instructionsCotInjectionMutation- Chain-of-thought injection attacks
Severity Ratings
Results are categorized by severity:
| Severity | Description |
|----------|-------------|
| critical | Complete guardrail bypass |
| high | Significant information disclosure |
| medium | Partial bypass or concerning behavior |
| low | Minor issues or edge cases |
Related Packages
@artemiskit/cli- Command-line interface@artemiskit/core- Core runtime and evaluators@artemiskit/reports- HTML report generation
License
Apache-2.0
