@xshieldai/lakshmanrekha

v0.2.1

Published

2 months ago

LLM endpoint probe suite — 8 deterministic attack probes + replayable refusal classifier + multi-provider runner (OpenAI / Anthropic / Azure / ankr-proxy) + opt-in Agentic Control Center event bus. Extracted from xshieldai-asm-ai-module.

@rocketlang/lakshmanrekha

🔍 Verification status (2026-05-17 IST — v0.2.1)
Tests: ✅ 36/36 passing (tests/lakshmanrekha.test.ts — bun test). Covers §1 registry, §2 classifier (incl. determinism), §3 refusal rate, §4 maskKey, §5 runner (fetch stubbed for openai + anthropic + HTTP errors + network errors), §6 ACC bus + API-key safety regression.
Examples: ⚠️ no runnable quickstart file yet — code blocks below are illustrative
Live demo: ⚠️ planned (Tier 3)
Phase-1 limits: documented in "Phase 1 limits" + "Authorization" sections below (incl. honor-system endpoint ownership)
Test-found behavior worth knowing: runProbe error path can echo the network error message verbatim (sliced to 200 chars). The runner masks the API key in receipts (verified in LR-035 — receipt JSON never contains the secret, and query strings are stripped from endpoint_host), but a hostile network library could theoretically include the key in its own error string. If you log the returned error field directly, also pipe through maskKey().

LLM endpoint probe suite — 8 deterministic attack probes, a replayable refusal classifier, and a multi-provider runner. Extracted from the internal xshieldai-asm-ai-module Fastify service into a standalone SDK.

Probe any LLM endpoint you have authorisation to test. Get a deterministic verdict per probe. Replayable.

What this is

lakshmanrekha (Sanskrit: the line that must not be crossed) is the substrate layer of LakshmanRekha, the LLM-endpoint-posture module inside xShieldAI. The full service has SQLite-backed attestations, Forja STATE/TRUST/SENSE/PROOF endpoints, and an ownership-verification flow — that lives in the closed product. This package is the probe primitives, the part that actually sends the attack to your LLM and classifies the response. Zero service dependencies; install and use.

If you're running an LLM endpoint (your own, or your team's, or a customer's with explicit consent) and want to know "what happens when I send a sockpuppet prefill?" — this is the SDK.

Install

npm install @rocketlang/lakshmanrekha
# or
bun add @rocketlang/lakshmanrekha

Quick start

import { runAllProbes, computeRefusalRate } from '@rocketlang/lakshmanrekha';

const results = await runAllProbes(
  'https://api.openai.com/v1',  // any OpenAI-compatible endpoint
  process.env.OPENAI_API_KEY!,
  'openai',
  { model: 'gpt-4o-mini' }
);

for (const r of results) {
  console.log(`${r.probe_id}: ${r.verdict} (${r.duration_ms}ms)`);
}

const score = computeRefusalRate(results.map((r) => r.verdict));
console.log(`Overall refusal rate: ${score}%`);
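
One way to act on the per-probe results is to tally them and pull out the probes that need attention. The sketch below is self-contained and does not import the package. `ProbeResultLike`, `tallyVerdicts` and `needsReview` are hypothetical names introduced here, and the result objects are assumed to expose only the `probe_id` and `verdict` fields used in the quick start.

```typescript
type Verdict = "refused" | "complied" | "partial" | "inconclusive" | "errored";

// Only the two fields used here; real result objects carry more.
interface ProbeResultLike {
  probe_id: string;
  verdict: Verdict;
}

// Count how many probes ended in each verdict.
function tallyVerdicts(results: ProbeResultLike[]): Record<Verdict, number> {
  const counts: Record<Verdict, number> = {
    refused: 0,
    complied: 0,
    partial: 0,
    inconclusive: 0,
    errored: 0,
  };
  for (const r of results) counts[r.verdict] += 1;
  return counts;
}

// IDs of probes where the model complied fully or partially.
function needsReview(results: ProbeResultLike[]): string[] {
  return results
    .filter((r) => r.verdict === "complied" || r.verdict === "partial")
    .map((r) => r.probe_id);
}

// Made-up sample data, not output from a real scan.
const sample: ProbeResultLike[] = [
  { probe_id: "ASMAI-PROBE-001", verdict: "refused" },
  { probe_id: "ASMAI-PROBE-002", verdict: "complied" },
  { probe_id: "ASMAI-PROBE-005", verdict: "partial" },
  { probe_id: "ASMAI-PROBE-008", verdict: "errored" },
];
console.log(tallyVerdicts(sample));
console.log(needsReview(sample));
```

A CI job could, for instance, fail when `needsReview` returns a non-empty list.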

The 8 probes

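Each probe cites its source in the table below. All are published except ASMAI-PROBE-006, which comes from an ANKR internal finding. None are novel attack research. Every probe maps to NIST AI RMF, EU AI Act, and ISO 42001 clauses.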

| ID | Name | Severity | Category | Source |
|---|---|---|---|---|
| ASMAI-PROBE-001 | CA-006 Sockpuppet (Assistant Prefill) | critical | injection | ANKR CA-006, DOI 10.5281/zenodo.19508513 |
| ASMAI-PROBE-002 | System Prompt Extraction | high | extraction | Perez & Ribeiro 2022, Greshake et al. 2023 |
| ASMAI-PROBE-003 | Multi-Turn Persona Escalation | high | escalation | Perez et al. 2022, NIST AI 600-1 |
| ASMAI-PROBE-004 | Tool-Use Function-Call Smuggling | high | injection | Greshake et al. 2023 |
| ASMAI-PROBE-005 | Encoding Bypass (Base64) | medium | evasion | Kang et al. 2023, Zou et al. 2023 |
| ASMAI-PROBE-006 | JSON-Mode Format Injection | medium | bypass | ANKR internal finding, 2026-04-11 |
| ASMAI-PROBE-007 | Indirect Prompt Injection via RAG | critical | injection | Greshake et al. 2023, Riley et al. 2022 |
| ASMAI-PROBE-008 | Hypothetical Frame Bypass | medium | bypass | Perez & Ribeiro 2022, OpenAI Red Team 2023 |

Verdicts

The classifier returns one of:

| Verdict | Meaning |
|---|---|
| refused | The model declined to help (expected outcome for safe models) |
| complied | The model answered the harmful prompt (failure) |
| partial | Both refusal and compliance signals present (review manually) |
| inconclusive | Response too short or ambiguous |
| errored | HTTP error, timeout, or other runtime failure |

The classifier is deterministic (ASMAI-S-003). Given the same (response, probeId, classifier_version), you get the same verdict. This is what makes the verdicts replayable — security teams can re-run last month's scan against the same response text and get the same answer.
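
To make the determinism claim concrete, here is a minimal sketch of a regex-based verdict function, assuming the verdict meanings listed above. The patterns, the length threshold and the name `classifySketch` are invented for illustration; they are not the package's REFUSAL_PATTERN_SET, COMPLIANCE_PATTERN_SET or classifyResponse, and the `errored` verdict is left out because it describes transport failures rather than response text.

```typescript
type TextVerdict = "refused" | "complied" | "partial" | "inconclusive";

// Illustrative patterns only. No "g" flag is used, so RegExp.test() keeps no
// state between calls and the same input always gives the same result.
const REFUSAL_HINTS: RegExp[] = [
  /\bI can(?:'|no)t help\b/i,
  /\bI won't\b/i,
  /\bunable to assist\b/i,
];
const COMPLIANCE_HINTS: RegExp[] = [/\bhere(?:'s| is) how\b/i, /\bstep 1\b/i];

// Assumed threshold below which a response counts as "too short".
const MIN_LENGTH = 10;

function classifySketch(response: string): TextVerdict {
  const text = response.trim();
  if (text.length < MIN_LENGTH) return "inconclusive";
  const refusal = REFUSAL_HINTS.some((p) => p.test(text));
  const compliance = COMPLIANCE_HINTS.some((p) => p.test(text));
  if (refusal && compliance) return "partial";
  if (refusal) return "refused";
  if (compliance) return "complied";
  return "inconclusive";
}

// Replaying the same stored text yields the same verdict every time.
const stored = "I can't help with that request.";
console.log(classifySketch(stored), classifySketch(stored));
```

Because the function depends only on its input and a fixed pattern list, pinning the pattern list to a version is enough to make old verdicts reproducible.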

Run a single probe

import { runProbe, getProbe } from '@rocketlang/lakshmanrekha';

const probe = getProbe('ASMAI-PROBE-001');  // the CA-006 sockpuppet
if (!probe) throw new Error('probe not found');

const result = await runProbe({
  probe,
  endpoint_url: 'https://api.anthropic.com/v1',
  api_key: process.env.ANTHROPIC_API_KEY!,
  api_type: 'anthropic',
  model: 'claude-3-haiku-20240307',
  timeout_ms: 15_000,
});

console.log(result);
// { probe_id: 'ASMAI-PROBE-001', verdict: 'refused', duration_ms: 412, response_snippet: '...' }

Supported providers

api_type accepts:

openai — standard OpenAI /v1/chat/completions
anthropic — Anthropic /v1/messages
azure — Azure OpenAI (OpenAI-compatible endpoint)
ankr_proxy — ankr-mailer-style AI proxy (OpenAI-compatible)

For self-hosted LLMs that speak OpenAI's chat-completions schema (vLLM, LiteLLM, Together, Groq, etc.), use openai with your endpoint URL.
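
The practical difference between the `openai` and `anthropic` types is the request shape. The sketch below shows the two public wire formats side by side; it is not the package's runner, and `buildRequest`, `HttpRequestSketch` and the `max_tokens` value are assumptions made for this example. Azure and ankr_proxy are omitted.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

interface HttpRequestSketch {
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildRequest(
  apiType: "openai" | "anthropic",
  baseUrl: string,
  apiKey: string,
  model: string,
  messages: ChatMessage[]
): HttpRequestSketch {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  if (apiType === "anthropic") {
    // Anthropic Messages API: key in x-api-key, version header, max_tokens required.
    return {
      url: base + "/messages",
      headers: {
        "content-type": "application/json",
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
      },
      body: JSON.stringify({ model: model, max_tokens: 256, messages: messages }),
    };
  }
  // OpenAI-compatible chat completions: bearer token in the Authorization header.
  return {
    url: base + "/chat/completions",
    headers: {
      "content-type": "application/json",
      authorization: "Bearer " + apiKey,
    },
    body: JSON.stringify({ model: model, messages: messages }),
  };
}

const demoMessages: ChatMessage[] = [{ role: "user", content: "ping" }];
const openaiReq = buildRequest("openai", "https://api.openai.com/v1", "dummy-key", "gpt-4o-mini", demoMessages);
const anthropicReq = buildRequest("anthropic", "https://api.anthropic.com/v1/", "dummy-key", "claude-3-haiku-20240307", demoMessages);
console.log(openaiReq.url, anthropicReq.url);
```

A self-hosted server such as vLLM would take the `openai` branch with its own base URL.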

Use the classifier independently

If you have your own runner (custom transport, batched, async) and just want to classify response text:

import { classifyResponse, computeRefusalRate, REFUSAL_PATTERN_SET, COMPLIANCE_PATTERN_SET } from '@rocketlang/lakshmanrekha';

const verdict = classifyResponse(myLLMResponseText, 'my-probe-id');
// 'refused' | 'complied' | 'partial' | 'inconclusive' | 'errored'

// Or inspect the regex sets directly
console.log(`refusal patterns: ${REFUSAL_PATTERN_SET.length}`);
console.log(`compliance patterns: ${COMPLIANCE_PATTERN_SET.length}`);

Authorization — read this

The runner has no endpoint-ownership enforcement. The user is responsible for ensuring they have authorisation to probe the endpoint_url they pass.

Acceptable use:

Your own LLM endpoints (security testing of your deployment)
Endpoints your team owns or has been hired to test
Endpoints whose operator has given you explicit written consent to probe
Lab / homelab / personal experimentation against your own keys

Not acceptable:

Probing third-party LLM endpoints without authorisation
Using this tool to evaluate competitor products without their consent
Any use that violates the target operator's Terms of Service

This is the same posture as Burp, nuclei, sqlmap, OWASP ZAP — security research tools that assume the user has authorisation. Liability for unauthorised probing falls on the user, not the library.

The full xshieldai-asm-ai-module service (in the closed product) implements ownership verification via DNS-TXT challenge (ASMAI-S-006/ASMAI-S-007). The OSS package is honor-system only — Phase 1 internally, Phase 1 here.
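
Since the package does not enforce ownership, callers may want a guard of their own in front of the runner. The following is a minimal sketch of a host allowlist check; `isAllowedEndpoint` and `assertAllowedEndpoint` are hypothetical helpers, not part of the package, and an allowlist is no substitute for actual authorisation from the endpoint's operator.

```typescript
// Returns true only when the URL parses and its hostname is on the allowlist.
function isAllowedEndpoint(endpointUrl: string, allowedHosts: string[]): boolean {
  let hostname: string;
  try {
    hostname = new URL(endpointUrl).hostname.toLowerCase();
  } catch (_err) {
    return false; // not a parseable absolute URL
  }
  const allowed = allowedHosts.map((h) => h.toLowerCase());
  return allowed.indexOf(hostname) !== -1;
}

// Throws before any probe traffic is sent to an unlisted host.
function assertAllowedEndpoint(endpointUrl: string, allowedHosts: string[]): void {
  if (!isAllowedEndpoint(endpointUrl, allowedHosts)) {
    throw new Error("endpoint is not on the authorised allowlist: " + endpointUrl);
  }
}

// Example allowlist with placeholder hosts.
const authorisedHosts = ["llm.internal.example", "staging-llm.internal.example"];
console.log(isAllowedEndpoint("https://llm.internal.example/v1", authorisedHosts));
console.log(isAllowedEndpoint("https://api.unrelated.example/v1", authorisedHosts));
```

Calling `assertAllowedEndpoint` immediately before the probe run keeps the check next to the traffic it is meant to gate.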

API key safety

Keys are never logged in plaintext. The maskKey() helper returns the key in abcd...wxyz form.
Keys are never persisted by this library. Pass them in via RunProbeOptions.api_key; the runner uses them only within the scan window.
Responses are truncated to 200 characters in response_snippet to avoid accidentally logging sensitive completions.
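
The masking and truncation rules above, together with the advice to scrub error strings before logging them, can be sketched as follows. These are hypothetical re-implementations for illustration: the package's own maskKey() may treat short keys differently, and `redactKey` and `truncate` are names invented here.

```typescript
// Mask a key to the abcd...wxyz form. Hiding short keys entirely is an assumption.
function maskKeySketch(key: string): string {
  if (key.length < 12) return "****";
  return key.slice(0, 4) + "..." + key.slice(-4);
}

// Replace every literal occurrence of the key inside free-form text,
// for example a network error message that echoed the request URL.
function redactKey(text: string, key: string): string {
  if (key.length === 0) return text;
  return text.split(key).join(maskKeySketch(key));
}

// Cap logged response or error text at a fixed length (200 in the rules above).
function truncate(text: string, max: number = 200): string {
  return text.slice(0, max);
}

const demoKey = "abcd1234efgh5678wxyz"; // dummy value, not a real credential
const rawError = "request to https://example.invalid/v1?key=" + demoKey + " failed";
console.log(maskKeySketch(demoKey));
console.log(truncate(redactKey(rawError, demoKey)));
```

Redacting before truncating matters: cutting the text first could split the key in half and leave a fragment that no longer matches.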

Phase 1 limits (deliberate)

Sequential runner. runAllProbes() runs probes one at a time; a full run of the 8 probes takes roughly 5-15 seconds against a fast endpoint. Phase 2 may add a parallel mode with rate-limiting.
Regex classifier. Phase 2 will introduce a fine-tuned classifier with replayable attestations. The deterministic regex is the floor, not the ceiling.
No multi-turn beyond the probe definition. Probes already define their own multi-turn payloads. The runner does not maintain conversation state across probes.

Related packages

@rocketlang/aegis — agent spend governance (kill-switch, DAN gate)
@rocketlang/kavachos — agent behavior governance (seccomp-bpf, Falco)
@rocketlang/chitta-detect — memory poisoning detection primitives
@rocketlang/aegis-guard — Five Locks SDK (approval tokens, nonces, idempotency, SENSE)
xshieldai-asm-ai-module (internal) — the full Fastify service this was extracted from

License

AGPL-3.0-only. See LICENSE. If you run a modified version as a network service, AGPL section 13 requires you to offer its source code to the users of that service.

The full xshieldai-asm-ai-module service is internal (port 4256) and not currently distributed.

For commercial dual-licensing or partnership: [email protected].

v0.2.0 — Opt-in Agentic Control Center (ACC) event bus

Added 2026-05-17. runProbe() (and runAllProbes() which calls it internally) now emits an AccReceipt per probe run, but only when you wire a bus. Without setEventBus, v0.2.0 behaves identically to v0.1.0 — no emission, no state, no side effect.

Wire it in 3 lines

import { setEventBus, type EventBus, type AccReceipt } from '@rocketlang/lakshmanrekha';

const myBus: EventBus = {
  emit: (r: AccReceipt) => console.log(`[ACC] ${r.event_type} ${r.verdict} ${r.summary}`),
};
setEventBus(myBus);
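
A bus does not have to log. As a hedged sketch, the example below collects receipts in memory and counts them per verdict. It declares local stand-ins for `AccReceipt` and `EventBus` with only the fields it needs so that it runs without the package installed; `createCollectingBus` is a hypothetical helper, and the emissions are simulated by hand.

```typescript
// Local stand-ins for the package's exported types, trimmed to the fields used here.
interface AccReceipt {
  event_type: string;
  verdict?: string;
  summary?: string;
}
interface EventBus {
  emit: (r: AccReceipt) => void;
}

// A bus that keeps every receipt in memory and can report counts per verdict.
function createCollectingBus() {
  const receipts: AccReceipt[] = [];
  const bus: EventBus = {
    emit: (r: AccReceipt) => {
      receipts.push(r);
    },
  };
  const countByVerdict = (): Record<string, number> => {
    const counts: Record<string, number> = {};
    for (const r of receipts) {
      const key = r.verdict !== undefined ? r.verdict : "unknown";
      counts[key] = (counts[key] || 0) + 1;
    }
    return counts;
  };
  return { bus, receipts, countByVerdict };
}

// Simulated emissions; in real use, collector.bus would be passed to setEventBus().
const collector = createCollectingBus();
collector.bus.emit({ event_type: "probe.run", verdict: "refused" });
collector.bus.emit({ event_type: "probe.run", verdict: "refused" });
collector.bus.emit({ event_type: "probe.run", verdict: "complied" });
console.log(collector.countByVerdict());
```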

Receipt events emitted

| Primitive | event_type | verdict |
|---|---|---|
| runProbe (each call) | probe.run | refused / complied / partial / inconclusive / errored |
| runAllProbes | emits one probe.run per probe (8 by default) | per-probe |

Receipt shape

interface AccReceipt {
  receipt_id: string;       // primitive-prefixed: 'lakshman-probe-{probeId}-{ts}'
  primitive: string;        // always 'lakshmanrekha'
  event_type: string;       // 'probe.run'
  emitted_at: string;       // ISO 8601
  agent_id?: string;        // reserved — not yet populated by lakshmanrekha
  verdict?: string;         // refused | complied | partial | inconclusive | errored
  rules_fired?: string[];   // e.g. ['ASMAI-S-001', 'ASMAI-S-002', 'ASMAI-S-003']
  summary?: string;         // "{probe-id} ({severity}/{category}) → {verdict} ({duration_ms}ms)"
  payload?: Record<string, unknown>; // probe_name, technique, api_type, duration_ms, endpoint_host
}

AccReceipt is a strict subset of the EE PRAMANA receipt format, so EE consumers can ingest receipts without translation.
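
For testing a bus without calling a real endpoint, it can help to construct receipts by hand. The sketch below follows the field comments in the interface above. It is not the package's emitter: `buildReceiptSketch` and `ProbeRunInfo` are hypothetical, and treating `{ts}` as epoch milliseconds is an assumption.

```typescript
interface AccReceipt {
  receipt_id: string;
  primitive: string;
  event_type: string;
  emitted_at: string;
  agent_id?: string;
  verdict?: string;
  rules_fired?: string[];
  summary?: string;
  payload?: Record<string, unknown>;
}

// Hypothetical input shape for this example.
interface ProbeRunInfo {
  probe_id: string;
  severity: string;
  category: string;
  verdict: string;
  duration_ms: number;
  api_type: string;
  endpoint_url: string;
}

function buildReceiptSketch(run: ProbeRunInfo, now: Date): AccReceipt {
  return {
    // Assumption: {ts} is taken to be epoch milliseconds.
    receipt_id: "lakshman-probe-" + run.probe_id + "-" + now.getTime(),
    primitive: "lakshmanrekha",
    event_type: "probe.run",
    emitted_at: now.toISOString(),
    verdict: run.verdict,
    summary:
      run.probe_id + " (" + run.severity + "/" + run.category + ") → " +
      run.verdict + " (" + run.duration_ms + "ms)",
    payload: {
      api_type: run.api_type,
      duration_ms: run.duration_ms,
      // Host only: the path and query string are dropped.
      endpoint_host: new URL(run.endpoint_url).host,
    },
  };
}

const fakeReceipt = buildReceiptSketch(
  {
    probe_id: "ASMAI-PROBE-001",
    severity: "critical",
    category: "injection",
    verdict: "refused",
    duration_ms: 412,
    api_type: "openai",
    endpoint_url: "https://llm.example.test/v1/chat?token=abc",
  },
  new Date(1700000000000)
);
console.log(fakeReceipt.summary);
```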

Phase-1 limits (v0.2.0)

agent_id is not yet populated, because RunProbeOptions doesn't carry an agent context. Future versions may add an optional agent_id; for now, add agent context in your bus implementation from your own tracking.
classifyResponse does NOT emit independently — it's called many times by runProbe internally. Emission happens at runProbe level with the final verdict.
getProbe / getProbes / PROBE_REGISTRY access do NOT emit — reads only.
maskKey does NOT emit — pure helper.
computeRefusalRate does NOT emit — pure aggregation.
endpoint_url is logged as host only in payload.endpoint_host (not full URL) to avoid leaking query strings or paths that might contain bearer-shaped fragments.
API keys are never in receipts — maskKey continues to apply to any logging; receipts never include api_key field.
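
The agent_id limitation above can be worked around on the consumer side by wrapping the bus. This is a minimal sketch with local stand-in types; `withAgentId` is a hypothetical helper, and the agent identifier comes from your own tracking rather than from the package.

```typescript
// Local stand-ins for the package's exported types, trimmed to the fields used here.
interface AccReceipt {
  receipt_id: string;
  event_type: string;
  agent_id?: string;
  verdict?: string;
}
interface EventBus {
  emit: (r: AccReceipt) => void;
}

// Wrap a bus so every receipt passing through gets an agent_id,
// unless the receipt already carries one.
function withAgentId(inner: EventBus, agentId: string): EventBus {
  return {
    emit: (r: AccReceipt) => {
      const resolved = r.agent_id !== undefined ? r.agent_id : agentId;
      inner.emit({ ...r, agent_id: resolved });
    },
  };
}

// Simulated wiring; in real use, `tagged` would be passed to setEventBus().
const seen: AccReceipt[] = [];
const sink: EventBus = { emit: (r: AccReceipt) => { seen.push(r); } };
const tagged = withAgentId(sink, "scanner-job-42");
tagged.emit({ receipt_id: "r1", event_type: "probe.run", verdict: "refused" });
tagged.emit({ receipt_id: "r2", event_type: "probe.run", agent_id: "preset" });
console.log(seen.map((r) => r.agent_id));
```

The wrapper copies each receipt instead of mutating it, so other consumers of the original object are unaffected.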

Use with `@rocketlang/aegis-suite`

import { wireAllToBus } from '@rocketlang/aegis-suite';  // suite v0.2.0+
wireAllToBus();  // wires aegis-guard + chitta-detect + lakshmanrekha + hanumang-mandate at once