@mukundakatta/kavach
v0.1.0
kavach
कवच — shield, armour.
A small, inspectable threat-scoring library for AI-app security monitoring. Given a set of fired detection signals (prompt injection, tool misuse, PII exfil, credential leaks, etc.) it returns a bounded risk score, a tier, and a recommended action for the SOC view to surface.
The project ships as:
- `assets/threatScore.js` — the library (ES module, zero deps).
- `index.html` — a demo landing page.
What it does
`threatScore(firedSignals)` combines weighted signals with diminishing returns, so stacking many weak signals can't overrule a single strong one. It returns a score in [0, 1] plus the list of contributing signal labels for explainability.
```js
import { threatScore, tier, triageIncident } from "./assets/threatScore.js";

threatScore(["promptInjection", "toolMisuse"]);
// { score: 0.545, contributors: ["Prompt-injection language detected",
//                                "Unusual tool / API call pattern"] }

tier(0.9);
// { tier: "critical", color: "#b00020" }

triageIncident(["credentialLeak", "piiExfil"], { model: "dataExfiltration" });
// {
//   score: 0.642,
//   tier: "high",
//   contributors: [...],
//   model: "dataExfiltration",
//   playbook: ["DLP scanning", "egress allowlist", "secrets redaction"],
//   action: "Strip tool access and alert the on-call."
// }
```
Signals
| Signal | Weight | What fires it |
|---|---|---|
| promptInjection | 0.35 | Prompt-injection language patterns in user input |
| toolMisuse | 0.30 | Unusual tool / API call pattern vs baseline |
| piiExfil | 0.35 | PII detected in model output or egress |
| credentialLeak | 0.45 | Credential-like string in model output |
| jailbreakPattern | 0.30 | Known jailbreak template match |
| rateAnomaly | 0.15 | Rate anomaly vs user baseline |
| geoAnomaly | 0.15 | New geography for this account |
Weights are tunable per deployment by editing the `SIGNALS` object.
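The exact combination rule isn't spelled out above, but a noisy-OR style fold over the table's weights reproduces the `0.545` example score. The sketch below is illustrative only, not the shipped implementation: the `weight`/`label` field names are assumptions, while the weight values and the first two labels come from the table and the example output.

```js
// Illustrative sketch — field names are assumptions; weights come from
// the Signals table, labels from the README (table/example) text.
const SIGNALS = {
  promptInjection:  { weight: 0.35, label: "Prompt-injection language detected" },
  toolMisuse:       { weight: 0.30, label: "Unusual tool / API call pattern" },
  piiExfil:         { weight: 0.35, label: "PII detected in model output or egress" },
  credentialLeak:   { weight: 0.45, label: "Credential-like string in model output" },
  jailbreakPattern: { weight: 0.30, label: "Known jailbreak template match" },
  rateAnomaly:      { weight: 0.15, label: "Rate anomaly vs user baseline" },
  geoAnomaly:       { weight: 0.15, label: "New geography for this account" },
};

// Noisy-OR combination: each fired signal removes a fraction of the
// remaining "safe" mass, so stacking weak signals approaches but never
// reaches 1.0, and no stack of weak signals overtakes a strong one.
function combine(fired) {
  const miss = fired.reduce((p, s) => p * (1 - (SIGNALS[s]?.weight ?? 0)), 1);
  return Math.round((1 - miss) * 1000) / 1000;
}

combine(["promptInjection", "toolMisuse"]); // 0.545, matching the example above
```

Lowering a signal's `weight` in such a table is what "tunable per deployment" amounts to: the signal still contributes, just with a smaller share of the remaining risk mass.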
Threat models
Three coarse classes of AI-app attack, each with attack surfaces and defensive controls:
- `promptAbuse` — chat input, tool arguments, system prompts.
- `dataExfiltration` — model output, file export, network egress.
- `accountTakeover` — auth session, API token, admin console.
`buildPlaybook(model)` returns the surfaces and numbered control steps for a given model.
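As a rough sketch of what that assembly could look like for one model: the surfaces below come from the `dataExfiltration` entry above and the controls from the `triageIncident` example's `playbook`, but the returned field names (`surfaces`, `steps`) and the numbering format are assumptions, not the library's documented output.

```js
// Hypothetical sketch for a single model; the real buildPlaybook lives
// in assets/threatScore.js and may shape its result differently.
const DATA_EXFILTRATION = {
  surfaces: ["model output", "file export", "network egress"],   // from the list above
  controls: ["DLP scanning", "egress allowlist", "secrets redaction"], // from the example
};

function buildPlaybookSketch() {
  return {
    surfaces: DATA_EXFILTRATION.surfaces,
    // Number the controls so responders can follow them in order.
    steps: DATA_EXFILTRATION.controls.map((c, i) => `${i + 1}. ${c}`),
  };
}

buildPlaybookSketch().steps[0]; // "1. DLP scanning"
```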
Getting started
The JS is an ES module you can import directly:
```html
<script type="module">
  import { triageIncident } from "./assets/threatScore.js";
  // ...
</script>
```
Or serve the demo locally:
```sh
python -m http.server 8000
# open http://localhost:8000
```
Tests
```sh
node --test test/threatScore.test.js
```
License
MIT
