@mukundakatta/kavach
v0.1.0
kavach
कवच — shield, armour.
A small, inspectable threat-scoring library for AI-app security monitoring. Given a set of fired detection signals (prompt injection, tool misuse, PII exfil, credential leaks, etc.) it returns a bounded risk score, a tier, and a recommended action for the SOC view to surface.
The project ships as:
- `assets/threatScore.js` — the library (ES module, zero deps).
- `index.html` — a demo landing page.
What it does
`threatScore(firedSignals)` combines weighted signals with diminishing returns, so stacking many weak signals can't overrule a single strong one. It returns a score in [0, 1] plus the list of contributing signal labels for explainability.
```js
import { threatScore, tier, triageIncident } from "./assets/threatScore.js";

threatScore(["promptInjection", "toolMisuse"]);
// { score: 0.545, contributors: ["Prompt-injection language detected",
//                                "Unusual tool / API call pattern"] }

tier(0.9);
// { tier: "critical", color: "#b00020" }

triageIncident(["credentialLeak", "piiExfil"], { model: "dataExfiltration" });
// {
//   score: 0.642,
//   tier: "high",
//   contributors: [...],
//   model: "dataExfiltration",
//   playbook: ["DLP scanning", "egress allowlist", "secrets redaction"],
//   action: "Strip tool access and alert the on-call."
// }
```
Signals
| Signal | Weight | What fires it |
|---|---|---|
| promptInjection | 0.35 | Prompt-injection language patterns in user input |
| toolMisuse | 0.30 | Unusual tool / API call pattern vs baseline |
| piiExfil | 0.35 | PII detected in model output or egress |
| credentialLeak | 0.45 | Credential-like string in model output |
| jailbreakPattern | 0.30 | Known jailbreak template match |
| rateAnomaly | 0.15 | Rate anomaly vs user baseline |
| geoAnomaly | 0.15 | New geography for this account |
Weights are tunable per deployment by editing the `SIGNALS` object.
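The exact combination rule isn't spelled out above, but a noisy-OR style fold over the table's weights reproduces the `0.545` example score. The sketch below is illustrative only, not the shipped implementation: the `weight`/`label` field names are assumptions, while the weight values and the first two labels come from the table and the example output.

```js
// Illustrative sketch — field names are assumptions; weights come from
// the Signals table, labels from the README (table/example) text.
const SIGNALS = {
  promptInjection:  { weight: 0.35, label: "Prompt-injection language detected" },
  toolMisuse:       { weight: 0.30, label: "Unusual tool / API call pattern" },
  piiExfil:         { weight: 0.35, label: "PII detected in model output or egress" },
  credentialLeak:   { weight: 0.45, label: "Credential-like string in model output" },
  jailbreakPattern: { weight: 0.30, label: "Known jailbreak template match" },
  rateAnomaly:      { weight: 0.15, label: "Rate anomaly vs user baseline" },
  geoAnomaly:       { weight: 0.15, label: "New geography for this account" },
};

// Noisy-OR combination: each fired signal removes a fraction of the
// remaining "safe" mass, so stacking weak signals approaches but never
// reaches 1.0, and no stack of weak signals overtakes a strong one.
function combine(fired) {
  const miss = fired.reduce((p, s) => p * (1 - (SIGNALS[s]?.weight ?? 0)), 1);
  return Math.round((1 - miss) * 1000) / 1000;
}

combine(["promptInjection", "toolMisuse"]); // 0.545, matching the example above
```

Lowering a signal's `weight` in such a table is what "tunable per deployment" amounts to: the signal still contributes, just with a smaller share of the remaining risk mass.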
Threat models
Three coarse classes of AI-app attack, each with attack surfaces and defensive controls:
- `promptAbuse` — chat input, tool arguments, system prompts.
- `dataExfiltration` — model output, file export, network egress.
- `accountTakeover` — auth session, API token, admin console.
`buildPlaybook(model)` returns the surfaces and numbered control steps for a given model.
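As a rough sketch of what that assembly could look like for one model: the surfaces below come from the `dataExfiltration` entry above and the controls from the `triageIncident` example's `playbook`, but the returned field names (`surfaces`, `steps`) and the numbering format are assumptions, not the library's documented output.

```js
// Hypothetical sketch for a single model; the real buildPlaybook lives
// in assets/threatScore.js and may shape its result differently.
const DATA_EXFILTRATION = {
  surfaces: ["model output", "file export", "network egress"],   // from the list above
  controls: ["DLP scanning", "egress allowlist", "secrets redaction"], // from the example
};

function buildPlaybookSketch() {
  return {
    surfaces: DATA_EXFILTRATION.surfaces,
    // Number the controls so responders can follow them in order.
    steps: DATA_EXFILTRATION.controls.map((c, i) => `${i + 1}. ${c}`),
  };
}

buildPlaybookSketch().steps[0]; // "1. DLP scanning"
```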
Getting started
The JS is an ES module you can import directly:
```html
<script type="module">
  import { triageIncident } from "./assets/threatScore.js";
  // ...
</script>
```
Or serve the demo locally:
```sh
python -m http.server 8000
# open http://localhost:8000
```
Tests
```sh
node --test test/threatScore.test.js
```
License
MIT
