sentinelseed

v1.0.0


AI safety guardrails using Sentinel alignment seeds. Add safety to any LLM with one line of code.


Add AI safety to any LLM with one line of code.

Installation

```bash
npm install sentinelseed
```

Quick Start

```typescript
import OpenAI from 'openai';
import { SentinelGuard } from 'sentinelseed';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Create a guard with default settings (v2/standard)
const guard = new SentinelGuard();

// Wrap your messages with the safety seed
const messages = guard.wrapMessages([
  { role: 'user', content: 'Hello, how can you help me?' }
]);

// Use with OpenAI (top-level await requires an ES module)
const response = await openai.chat.completions.create({
  model: 'gpt-4',
  messages
});
```

Features

- Zero dependencies: works with any LLM provider that accepts a system prompt
- TypeScript support: full type definitions included
- Multiple seed versions: choose the right balance of safety vs. latency
- THSP Protocol: four-gate validation (Truth, Harm, Scope, Purpose)
- Heuristic analysis: basic safety checking without API calls

Seed Versions

| Version | Variant | Tokens | Best For |
|---------|---------|--------|----------|
| v2 | minimal | ~350 | Chatbots, low latency |
| v2 | standard | ~1,000 | General use (recommended) |
| v2 | full | ~2,000 | Maximum safety |
| v1 | minimal | ~500 | Legacy support |
| v1 | standard | ~1,200 | Legacy support |
| v1 | full | ~4,700 | Legacy support |
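Because the seed is sent with every request, a larger variant costs more prompt tokens each time. The sketch below shows one way to choose a v2 variant from a per-request token budget. `pickVariant` and `V2_SEED_TOKENS` are hypothetical names, not part of sentinelseed, and the counts are the approximate figures from the table above rather than exact tokenizer output.

```typescript
// Hypothetical helper, not part of sentinelseed.
type Variant = 'minimal' | 'standard' | 'full';

// Approximate v2 seed sizes, copied from the Seed Versions table.
const V2_SEED_TOKENS: Record<Variant, number> = {
  minimal: 350,
  standard: 1000,
  full: 2000,
};

// Returns the largest v2 variant whose approximate size fits the budget,
// or null if even the minimal seed is too large.
function pickVariant(maxSeedTokens: number): Variant | null {
  // Check from largest to smallest so the most thorough seed that fits wins.
  const byDescendingSize: Variant[] = ['full', 'standard', 'minimal'];
  for (const variant of byDescendingSize) {
    if (V2_SEED_TOKENS[variant] <= maxSeedTokens) {
      return variant;
    }
  }
  return null;
}

console.log(pickVariant(1500)); // standard
console.log(pickVariant(200)); // null
```

The result can then be passed as the `variant` option, e.g. `new SentinelGuard({ version: 'v2', variant })`.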

v1 vs v2

- v1 (THS): three gates (Truth, Harm, Scope)
- v2 (THSP): four gates; adds a Purpose gate, which requires actions to serve a legitimate benefit
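To make the difference concrete, here is an illustrative sketch of combining per-gate results for each protocol version. The `GateResult` values are inferred from the `gates` object in the Analyze Content example below; the aggregation rule (and the `strict` flag deciding whether `'unknown'` blocks) is an assumption for illustration, not sentinelseed's documented logic.

```typescript
// Illustrative sketch only; not sentinelseed's implementation.
type GateResult = 'pass' | 'fail' | 'unknown';
type Gate = 'truth' | 'harm' | 'scope' | 'purpose';
type ProtocolVersion = 'v1' | 'v2';

// v1 (THS) evaluates three gates; v2 (THSP) adds the Purpose gate.
const GATES_BY_VERSION: Record<ProtocolVersion, Gate[]> = {
  v1: ['truth', 'harm', 'scope'],
  v2: ['truth', 'harm', 'scope', 'purpose'],
};

// Passes if no applicable gate failed. With strict = true, an 'unknown'
// result on an applicable gate also blocks.
function passesGates(
  gates: Record<Gate, GateResult>,
  version: ProtocolVersion,
  strict = false,
): boolean {
  return GATES_BY_VERSION[version].every((gate) => {
    const result = gates[gate];
    return strict ? result === 'pass' : result !== 'fail';
  });
}

const sample: Record<Gate, GateResult> = {
  truth: 'pass', harm: 'fail', scope: 'pass', purpose: 'unknown',
};
console.log(passesGates(sample, 'v2')); // false (Harm gate failed)

const unclear: Record<Gate, GateResult> = {
  truth: 'pass', harm: 'pass', scope: 'pass', purpose: 'unknown',
};
console.log(passesGates(unclear, 'v2')); // true
console.log(passesGates(unclear, 'v2', true)); // false
console.log(passesGates(unclear, 'v1', true)); // true (Purpose is not a v1 gate)
```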

Usage Examples

Basic Usage

```typescript
import { SentinelGuard, getSeed } from 'sentinelseed';

// Option 1: Use the guard class
const guard = new SentinelGuard({ version: 'v2', variant: 'standard' });
const wrappedMessages = guard.wrapMessages([
  { role: 'user', content: 'Help me with something' }
]);

// Option 2: Get the seed directly and add it as the system message yourself
const seed = getSeed('v2', 'standard');
const manualMessages = [
  { role: 'system', content: seed },
  { role: 'user', content: 'Help me with something' }
];
```

With OpenAI

```typescript
import OpenAI from 'openai';
import { SentinelGuard } from 'sentinelseed';

const openai = new OpenAI();
const guard = new SentinelGuard();

async function chat(userMessage: string) {
  const messages = guard.wrapMessages([
    { role: 'user', content: userMessage }
  ]);

  return openai.chat.completions.create({
    model: 'gpt-4',
    messages
  });
}
```

With Anthropic

```typescript
import Anthropic from '@anthropic-ai/sdk';
import { SentinelGuard } from 'sentinelseed';

const anthropic = new Anthropic();
const guard = new SentinelGuard();

async function chat(userMessage: string) {
  // Anthropic takes the system prompt as a separate `system` parameter
  const seed = guard.getSeed();

  return anthropic.messages.create({
    model: 'claude-3-opus-20240229',
    max_tokens: 1024, // required by the Messages API
    system: seed,
    messages: [{ role: 'user', content: userMessage }]
  });
}
```
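Anthropic's Messages API takes the system prompt as a top-level `system` parameter and does not accept `system`-role entries in `messages`, which is why this example uses `getSeed()` rather than `wrapMessages()`. If your code already builds OpenAI-style message arrays (for example, seed-first arrays like Option 2 in Basic Usage), a small adapter can split them. `toAnthropicFormat` is a hypothetical helper, not part of sentinelseed or the Anthropic SDK.

```typescript
// Hypothetical adapter, not part of sentinelseed or @anthropic-ai/sdk.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface AnthropicRequestParts {
  system: string;
  messages: { role: 'user' | 'assistant'; content: string }[];
}

// Moves every system message into a single `system` string (joined with a
// blank line) and keeps user/assistant turns in their original order.
function toAnthropicFormat(input: ChatMessage[]): AnthropicRequestParts {
  const systemParts: string[] = [];
  const turns: AnthropicRequestParts['messages'] = [];
  for (const m of input) {
    const role = m.role;
    if (role === 'system') {
      systemParts.push(m.content);
    } else {
      turns.push({ role, content: m.content });
    }
  }
  return { system: systemParts.join('\n\n'), messages: turns };
}

const wrapped: ChatMessage[] = [
  { role: 'system', content: 'SEED TEXT' },
  { role: 'user', content: 'Hello' },
];
const parts = toAnthropicFormat(wrapped);
console.log(parts.system); // SEED TEXT
console.log(parts.messages.length); // 1
```

The two fields map directly onto the `system` and `messages` arguments of `anthropic.messages.create`.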

Analyze Content

```typescript
import { SentinelGuard } from 'sentinelseed';

const guard = new SentinelGuard();

// Run the heuristic THSP analysis locally (no API call)
const analysis = guard.analyze('How do I hack a computer?');
console.log(analysis);
// {
//   safe: false,
//   gates: { truth: 'pass', harm: 'fail', scope: 'pass', purpose: 'unknown' },
//   issues: ['Potential harm detected'],
//   confidence: 0.85
// }

// Quick boolean check
const userInput = 'Text received from your user';
if (!guard.isSafe(userInput)) {
  console.log('Potentially unsafe content detected');
}
```
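Since `isSafe()` is a heuristic check that runs without API calls, a natural pattern is to use it as a cheap pre-filter before spending an LLM request, while still sending the seed with requests that pass. Below is a minimal sketch of that pattern; `SafetyChecker`, `screenInput`, and `stubChecker` are hypothetical names, and the stub's keyword rule is for demonstration only and is not sentinelseed's heuristic.

```typescript
// Hypothetical sketch. SafetyChecker mirrors the documented
// isSafe(content: string): boolean method, so a SentinelGuard fits it.
interface SafetyChecker {
  isSafe(content: string): boolean;
}

type ScreenResult =
  | { allowed: true; content: string }
  | { allowed: false; reason: string };

// Screens input before it reaches the model.
function screenInput(checker: SafetyChecker, input: string): ScreenResult {
  if (!checker.isSafe(input)) {
    return { allowed: false, reason: 'Potentially unsafe content detected' };
  }
  return { allowed: true, content: input };
}

// Demonstration stand-in; in an application you would pass a SentinelGuard.
const stubChecker: SafetyChecker = {
  isSafe: (content) => !content.toLowerCase().includes('hack'),
};

console.log(screenInput(stubChecker, 'How do I hack a computer?').allowed); // false
console.log(screenInput(stubChecker, 'Summarize this article').allowed); // true
```

Returning a tagged result instead of throwing lets the caller decide whether to refuse, log, or ask the user to rephrase.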

Custom Seed

```typescript
import { SentinelGuard } from 'sentinelseed';

const guard = new SentinelGuard({
  customSeed: 'Your custom system prompt here...'
});
```

API Reference

`SentinelGuard`

```typescript
class SentinelGuard {
  constructor(config?: SentinelConfig);

  getSeed(): string;
  getMetadata(): { version, variant, tokens, protocol };
  wrapMessages(messages: Message[], options?): Message[];
  analyze(content: string): THSPAnalysis;
  isSafe(content: string): boolean;
}
```

`SentinelConfig`

```typescript
interface SentinelConfig {
  version?: 'v1' | 'v2';                      // Default: 'v2'
  variant?: 'minimal' | 'standard' | 'full';  // Default: 'standard'
  customSeed?: string;                        // Override with custom seed
}
```
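The defaults above mean `new SentinelGuard()` behaves like `{ version: 'v2', variant: 'standard' }`, and `customSeed` replaces the built-in seed. The sketch below illustrates how these options could resolve to a seed; `resolveSeed` is a hypothetical function, not sentinelseed's code, and it assumes the `SEEDS` keys follow the `v2_standard` pattern shown under Helper Functions. Whether the library treats an empty `customSeed` as an override is not documented; this sketch does.

```typescript
// Hypothetical sketch of resolving the documented defaults; not library code.
interface SentinelConfig {
  version?: 'v1' | 'v2';
  variant?: 'minimal' | 'standard' | 'full';
  customSeed?: string;
}

type ResolvedSeed =
  | { kind: 'custom'; text: string }
  | { kind: 'builtin'; key: string }; // assumed SEEDS key, e.g. 'v2_standard'

function resolveSeed(config: SentinelConfig = {}): ResolvedSeed {
  // A custom seed overrides version and variant entirely.
  if (config.customSeed !== undefined) {
    return { kind: 'custom', text: config.customSeed };
  }
  // Fall back to the documented defaults for any missing option.
  const version = config.version ?? 'v2';
  const variant = config.variant ?? 'standard';
  return { kind: 'builtin', key: `${version}_${variant}` };
}

console.log(resolveSeed()); // { kind: 'builtin', key: 'v2_standard' }
console.log(resolveSeed({ variant: 'minimal' })); // { kind: 'builtin', key: 'v2_minimal' }
```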

Helper Functions

```typescript
import { createGuard, getSeed, SEEDS } from 'sentinelseed';

// Create a guard
const guard = createGuard({ version: 'v2', variant: 'standard' });

// Get seed directly
const seed = getSeed('v2', 'standard');

// Access seeds object
console.log(SEEDS.v2_standard);
```

Benchmark Results

Tested across 6 models, with an average safety rate of 97.6%:

| Benchmark | Safety Rate |
|-----------|-------------|
| HarmBench | 96.7% |
| JailbreakBench | 97% |
| SafeAgentBench | 97.3% |
| BadRobot | 99.3% |
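The headline 97.6% is consistent with the unweighted mean of the four benchmark rates in the table; the README does not state whether it was computed this way or by averaging per-model results:

$$
\bar{r} = \frac{96.7 + 97.0 + 97.3 + 99.3}{4} = \frac{390.3}{4} = 97.575 \approx 97.6\%
$$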

License

MIT License - Sentinel Team
