shield-harness

v0.5.0

Published

2 months ago

Security harness for Claude Code — hooks-driven, zero-hassle defense

sora-bluesky

claude-code security hooks harness ai-safety

Shield Harness

Hook-driven auto-defense security harness for Claude Code

v0.5.0: 22 hooks, 4-layer defense (L1 Permissions + L2 Hooks + L3 Sandbox + L3b OpenShell), 426 tests including 108 OWASP AITG attack simulations + 35 Auto Mode defense tests.

What is Shield Harness

A security harness that governs Claude Code through multi-layered defense: hooks + rules + permissions + settings deployed in the .claude/ directory.

Quick Start

npx shield-harness init [--profile minimal|standard|strict]

Why Shield Harness

- Hooks-driven defense: 22 security hooks monitor every Claude Code operation
- Automated security decisions: hooks handle all security judgments in real time, with no manual approval bottleneck
- Fail-closed principle: execution stops automatically when safety conditions cannot be verified
- Evidence recording: a tamper-evident SHA-256 hash chain records all allow/deny decisions
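
As a rough illustration of the evidence-recording idea, the sketch below chains allow/deny records with SHA-256 so that editing an earlier record invalidates every later hash. The record shape (`decision`, `tool`, `prevHash`, `hash`), the all-zero genesis value, and the function names are assumptions made for this example; they are not taken from the Shield Harness evidence hook.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical starting value for the first record's prevHash.
const GENESIS = "0".repeat(64);

// Hash a record together with the hash of the record before it.
function hashEntry(prevHash, decision, tool) {
  return createHash("sha256")
    .update(JSON.stringify([prevHash, decision, tool]))
    .digest("hex");
}

// Return a new chain with one more record appended.
function appendEntry(chain, decision, tool) {
  const prevHash = chain.length ? chain[chain.length - 1].hash : GENESIS;
  const entry = { decision, tool, prevHash, hash: hashEntry(prevHash, decision, tool) };
  return [...chain, entry];
}

// Recompute every hash and check that each record links to its predecessor.
function verifyChain(chain) {
  let prev = GENESIS;
  for (const e of chain) {
    if (e.prevHash !== prev) return false;
    if (e.hash !== hashEntry(e.prevHash, e.decision, e.tool)) return false;
    prev = e.hash;
  }
  return true;
}

let chain = [];
chain = appendEntry(chain, "allow", "Read");
chain = appendEntry(chain, "deny", "Bash");
console.log(verifyChain(chain)); // true

// Flipping an earlier decision is detected on verification.
const tampered = chain.map((e) => ({ ...e }));
tampered[0].decision = "deny";
console.log(verifyChain(tampered)); // false
```

A chain like this detects modification but does not prevent it, so the latest hash still has to be kept somewhere the agent cannot rewrite.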

Architecture Overview

4-layer defense model:

| Layer | Defense | Implementation |
| -------- | ------------------ | -------------------------------------------------- |
| Layer 1 | Permission control | settings.json deny/allow rules |
| Layer 2 | Hook defense | 22 Node.js hook scripts |
| Layer 3 | Sandbox | Claude Code native sandbox (bubblewrap / Seatbelt) |
| Layer 3b | Container sandbox | NVIDIA OpenShell (optional, Docker environments) |

Profiles

| Profile | Description | Approval-free | Use case |
| ------------ | -------------- | ------------- | ------------------------------------------- |
| minimal | Minimal config | Enabled | Low-risk tasks |
| standard | Recommended | Enabled | Normal development |
| strict | Strict config | Disabled | When security audit requires human approval |

Hook Catalog

| # | Hook | Event | Responsibility |
| --- | ---------------- | --------------------- | ------------------------------------------------------------------------------------ |
| 1 | permission | PreToolUse | 4-category tool usage classification |
| 2 | gate | PreToolUse | 7 attack vector inspection for Bash commands |
| 3 | injection-guard | PreToolUse | 9-category 50+ pattern injection detection |
| 4 | data-boundary | PreToolUse | Production data boundary + jurisdiction tracking |
| 5 | quiet-inject | PreToolUse | Auto-inject quiet flags |
| 6 | evidence | PostToolUse | SHA-256 hash chain evidence |
| 7 | output-control | PostToolUse | Output truncation + token budget |
| 8 | dep-audit | PostToolUse | Package install detection |
| 9 | lint-on-save | PostToolUse | Auto lint execution |
| 10 | session-start | SessionStart | Session init + integrity baseline |
| 11 | session-end | SessionEnd | Cleanup + statistics |
| 12 | circuit-breaker | Stop | Retry limit (3 attempts) |
| 13 | config-guard | ConfigChange | Settings change monitoring + OpenShell policy file protection + Auto Mode protection |
| 14 | user-prompt | UserPromptSubmit | User input injection scanning |
| 15 | permission-learn | PermissionRequest | Permission learning guard |
| 16 | elicitation | Elicitation | Phishing + scope guard |
| 17 | subagent | SubagentStart | Subagent budget constraint (25%) |
| 18 | instructions | InstructionsLoaded | Rule file integrity monitoring |
| 19 | precompact | PreCompact | Pre-compaction backup |
| 20 | postcompact | PostCompact | Post-compaction restore + verify |
| 21 | worktree | WorktreeCreate/Remove | Security propagation + evidence merge |
| 22 | task-gate | TaskCompleted | Test gate |
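
To make the PreToolUse rows more concrete, here is a minimal sketch of a fail-closed decision function for Bash commands. It assumes the Claude Code hook contract as commonly documented: the hook receives a JSON payload on stdin with `tool_name` and `tool_input`, and exit code 2 blocks the tool call. The deny patterns and the `decide` function are hypothetical and far simpler than what the real gate hook would need.

```javascript
// Hypothetical deny patterns; the actual gate hook's rules are not shown in this README.
const DENY_PATTERNS = [/\bcurl\b/, /\bwget\b/, /\brm\s+-rf\s+\//];

// Takes the raw stdin text of a hook call and returns { exitCode, reason }.
// Exit code 2 blocks the tool call; anything that cannot be verified is blocked too.
function decide(rawStdin) {
  let input;
  try {
    input = JSON.parse(rawStdin);
  } catch {
    return { exitCode: 2, reason: "unparseable hook input (fail-closed)" };
  }
  if (!input || typeof input !== "object") {
    return { exitCode: 2, reason: "unexpected hook input (fail-closed)" };
  }
  if (input.tool_name !== "Bash") {
    return { exitCode: 0, reason: "not a Bash call" };
  }
  const command = input.tool_input && input.tool_input.command;
  if (typeof command !== "string") {
    return { exitCode: 2, reason: "missing command (fail-closed)" };
  }
  const hit = DENY_PATTERNS.find((p) => p.test(command));
  return hit
    ? { exitCode: 2, reason: `blocked by ${hit}` }
    : { exitCode: 0, reason: "no deny pattern matched" };
}

const call = (command) => JSON.stringify({ tool_name: "Bash", tool_input: { command } });
console.log(decide(call("curl http://example.com")).exitCode); // 2
console.log(decide(call("ls -la")).exitCode);                  // 0
console.log(decide("not json").exitCode);                      // 2
```

In a real hook script the result would be applied by writing `reason` to stderr and calling `process.exit(exitCode)` after reading stdin to the end.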

Pipeline

STG gate-driven automation pipeline:

| STG0 | STG1 | STG2 | STG3 | STG4 | STG5 | STG6 |
| :--: | :----: | :--: | :----: | :--: | :----: | :------: |
| Reqs | Design | Impl | Verify | CI | Commit | PR/Merge |

Layer 3: Sandbox (OS-Level Isolation)

Layer 3 relies on Claude Code's built-in sandbox. Shield Harness does not implement its own sandbox — it leverages Layers 1 & 2 to compensate when sandboxing is unavailable.

Platform Support

| OS | Sandbox | Technology | Status |
| -------------- | ------------- | ------------------ | --------------------------------------- |
| macOS | Supported | Seatbelt | Auto-enabled |
| Linux | Supported | bubblewrap + socat | sudo apt-get install bubblewrap socat |
| WSL2 | Supported | bubblewrap + socat | Same as Linux |
| WSL1 | Not supported | N/A | Kernel features missing |
| Windows native | Not supported | N/A | Planned by Anthropic |

Windows Native: Security Gap & Mitigation

On Windows native, Claude Code's sandbox features (sandbox.filesystem.*, sandbox.network.*, sandbox.autoAllow) do not function. Shield Harness compensates through:

- Layer 1: permissions.deny includes Windows-specific commands (type, del, format, Invoke-WebRequest)
- Layer 2: All 22 hooks operate normally, so injection detection, evidence recording, and gate checks are fully functional
- Limitation: Child process file access cannot be restricted at the OS level; raw socket communication bypasses command pattern matching

In enterprise environments, add Windows Firewall outbound rules to get process-level network control on top of these layers.

Layer 3b: NVIDIA OpenShell (Optional)

Why Layer 3b?

Layer 1 (permissions) and Layer 2 (hooks) inspect tool call inputs — the command text before execution. Once a command passes these checks, the spawned child process runs freely at the OS level.

Layer 1-2 (in-process):
  Claude Code → [Hook inspects input] → Command execution → [Child process is free]
                 ↑ Only controls this point

Layer 3b (out-of-process = kernel-level):
  Claude Code → Command execution → [Landlock: Filesystem access control]
                                     [Seccomp: Syscall control]
                                     [Network NS: Network isolation]
                ↑ Kernel controls ALL processes including children

| Attack Vector | Layer 1-2 Defense | Why It Bypasses | Layer 3b Defense |
| ------------------------ | --------------------------- | -------------------------------------- | ------------------------------------- |
| Pipe chain file access | Pattern matching | Indirect access via awk, python -c | Landlock LSM denies at kernel level |
| Raw socket communication | curl/wget deny rules | Direct socket via python/node | Seccomp BPF blocks socket syscalls |
| DNS tunneling | sandbox.network (WSL2 only) | Data embedded in DNS queries | Network Namespace isolates all DNS |
| PowerShell sockets | Pattern matching | Encoding/obfuscation | Seccomp BPF + Network NS dual defense |
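
The first row of the table can be illustrated with a small sketch. The rule below is a hypothetical text-level deny pattern, not one of Shield Harness's actual rules. It catches a command that names the file directly, but not one that assembles the path inside an interpreter, which is the gap that kernel-level enforcement is meant to close.

```javascript
// Hypothetical text-level rule: block shell commands that read a .env file by name.
const denyDirectRead = /\b(cat|type|less|more)\s+\S*\.env\b/;

// A direct read is visible in the command text.
const direct = "cat .env";

// An indirect read builds the file name at runtime, so the text never contains ".env".
const indirect = `python -c "print(open('.' + 'env').read())"`;

console.log(denyDirectRead.test(direct));   // true
console.log(denyDirectRead.test(indirect)); // false
```

Adding more patterns narrows this gap but cannot close it, because the command text only describes what the child process is asked to do, not what it ends up doing.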

Structural guarantee: The agent cannot disable its own guardrails — policies exist outside the container and are locked at sandbox creation.

NVIDIA OpenShell (Apache 2.0) provides kernel-level isolation for AI agents via Docker:

| Mechanism | Target | Protection |
| ------------ | ---------- | ------------------------- |
| Landlock LSM | Filesystem | denyWrite / denyRead |
| Seccomp BPF | Syscalls | Socket / process restrict |
| Network NS | Network | Domain-level deny |

Key benefits for Windows users:

- Policies exist outside the agent process, so the agent cannot disable its own guardrails
- Runs on Docker Desktop + WSL2 backend (typical Windows dev setup)
- Significantly reduces residual risk from Layer 1-2 pattern matching limitations
- Freely removable: stop the container and Shield Harness falls back to Layer 1-2

Note: OpenShell is Alpha (v0.0.13) — APIs may change with future releases. Shield Harness GA Phase integration is complete (ADR-037): config guard policy file protection, policy drift check, and full documentation are ready.

Setup

Prerequisites: Docker Desktop (WSL2 backend on Windows)

# 1. Install Docker Desktop and verify it is running
#    https://www.docker.com/products/docker-desktop/
docker --version

# 2. Install OpenShell CLI
pip install openshell

# 3. Generate policy from permissions-spec.json
#    Creates .claude/policies/openshell-generated.yaml
npx shield-harness policy generate

# 4. Start OpenShell container and run Claude Code inside it
#    Docker pulls the sandbox image automatically on first run
#    Kernel-level enforcement (Landlock/Seccomp/Network NS) is active inside the container
openshell run --policy .claude/policies/openshell-generated.yaml

Claude Code running inside the OpenShell container automatically receives Layer 3b kernel enforcement. Shield Harness detects this at session start (sh-session-start.js) — no additional configuration required.

Without OpenShell, Shield Harness falls back to Layer 1-2 defense (no degradation in hook protection).

Policy files are protected by:

- permissions.deny: Edit/Write(.claude/policies/**) blocks agent modification
- sandbox.denyWrite: .claude/policies in filesystem deny list
- sh-config-guard.js: Hash tracking detects policy file tampering or weakening
- sh-session-start.js: Drift check at session start verifies spec-policy alignment
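
The hash-tracking idea can be sketched as a baseline comparison. This is an illustration under assumptions, not the logic of sh-config-guard.js: the helper names are hypothetical, the demo runs against a throwaway temp file, and the YAML lines are placeholders rather than real OpenShell policy syntax.

```javascript
const { createHash } = require("node:crypto");
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// SHA-256 of a file's current contents, as a hex string.
function sha256File(file) {
  return createHash("sha256").update(fs.readFileSync(file)).digest("hex");
}

// Compare a file against a previously recorded baseline hash.
// A missing or unreadable file counts as changed (fail-closed).
function isUnchanged(file, baselineHash) {
  try {
    return sha256File(file) === baselineHash;
  } catch {
    return false;
  }
}

// Demo against a throwaway file, not a real policy.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "policy-demo-"));
const policy = path.join(dir, "openshell-generated.yaml");
fs.writeFileSync(policy, "network: deny\n"); // placeholder content
const baseline = sha256File(policy);

console.log(isUnchanged(policy, baseline)); // true

fs.writeFileSync(policy, "network: allow\n"); // simulated weakening
console.log(isUnchanged(policy, baseline)); // false

fs.rmSync(dir, { recursive: true, force: true });
```

A content hash flags any edit, whether it weakens or tightens the policy. Telling the two apart would require parsing the policy itself.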

Testing

# Run all tests (426 tests including attack simulations)
npm test

# Run attack simulation tests only
node --test tests/attack-sim-*.test.js

| Test Suite | Category | Tests |
| ----------------------------- | -------------------------------------- | ----- |
| attack-sim-prompt-injection | AITG-APP-01: Direct Prompt Injection | 25 |
| attack-sim-indirect-injection | AITG-APP-02: Indirect Prompt Injection | 18 |
| attack-sim-data-leak | AITG-APP-03: Sensitive Data Leak | 20 |
| attack-sim-agentic-limits | AITG-APP-06: Agentic Behavior Limits | 18 |
| attack-sim-sandbox-escape | NVIDIA 3-axis: Sandbox Escape | 15 |
| attack-sim-defense-chain | SAIF: Defense-in-depth Chain | 12 |
| attack-sim-automode-bypass | Auto Mode: soft_deny/soft_allow bypass | 15 |

Auto Mode Awareness (v0.5.0)

Shield Harness detects Claude Code's Auto Mode (Research Preview) configuration at session start and protects against dangerous settings:

| Setting | Risk | Shield Harness Response |
| ---------------------- | ---------------------------------------------------------- | -------------------------------------------------------------------- |
| autoMode.soft_deny | CRITICAL: disables all classifier default protections | Config-guard blocks addition; session-start outputs CRITICAL warning |
| autoMode.soft_allow | WARN: auto-approves specific tools | Config-guard blocks expansion; session-start outputs WARNING |
| autoMode.environment | Safe: informational only | Detected and recorded in session |

All existing hooks (PreToolUse, PostToolUse) fire normally under Auto Mode — permissions.deny rules remain absolute. Auto Mode's classifier cannot override hook denials.
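
A session-start check along the lines of the table might look like the sketch below. The setting names come from the table; the `assessAutoMode` function, the shape of its findings, and the example setting values are assumptions for illustration only.

```javascript
// Map Auto Mode settings to a severity level, following the table above.
// Presence of a key is what matters here; the value format is not assumed.
function assessAutoMode(settings) {
  const auto = (settings && settings.autoMode) || {};
  const findings = [];
  if (auto.soft_deny !== undefined) {
    findings.push({ key: "autoMode.soft_deny", level: "CRITICAL" });
  }
  if (auto.soft_allow !== undefined) {
    findings.push({ key: "autoMode.soft_allow", level: "WARNING" });
  }
  if (auto.environment !== undefined) {
    findings.push({ key: "autoMode.environment", level: "INFO" });
  }
  return findings;
}

// Hypothetical settings objects; real values may be shaped differently.
console.log(assessAutoMode({ autoMode: { soft_allow: ["Read"] } }));
console.log(assessAutoMode({}));
```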

Channel Integration

Supports Claude Code Channels (Telegram/Discord). Channel-sourced messages automatically receive a severity boost for enhanced security.

System Requirements

| Tool | Version | Purpose | Required |
| ------------ | ------------------ | ----------------------------------- | ------------------ |
| Git | 2.x | Version control | Required |
| Git Bash | (bundled with Git) | Hook script runtime | Required (Windows) |
| Node.js | 18+ | Hook execution + NFKC normalization | Required |
| PowerShell 7 | 7.x (pwsh) | Sync scripts | Recommended |
| GitHub CLI | 2.x (gh) | PR creation/merge automation | Optional |

OS: Windows-native first (Git Bash), WSL2/Linux compatible.
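
The requirements table lists NFKC normalization as one of the reasons for Node.js. One plausible use, assumed here rather than confirmed by this README, is folding lookalike characters into their plain forms before pattern matching. The sketch uses the built-in `String.prototype.normalize` and a hypothetical pattern.

```javascript
// "curl" written with fullwidth Latin letters (U+FF43, U+FF55, U+FF52, U+FF4C).
const fullwidth = "\uFF43\uFF55\uFF52\uFF4C http://example.com";

// Hypothetical deny pattern that only knows the ASCII spelling.
const pattern = /\bcurl\b/;

// Without normalization the fullwidth spelling slips past the pattern.
console.log(pattern.test(fullwidth)); // false

// NFKC maps the fullwidth letters to ASCII, so the pattern matches again.
console.log(pattern.test(fullwidth.normalize("NFKC"))); // true
```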

Versioning

Shield Harness follows Semantic Versioning:

| Bump | Condition | Example |
| ------- | -------------------------------------------------------------- | ------------------------------------ |
| patch | Bug fixes, pattern updates, documentation fixes | injection-patterns.json update |
| minor | New features (backward compatible), Phase must-tasks completed | OCSF support, new hook, CLI option |
| major | Breaking changes | Schema incompatible, settings change |

Release trigger: git tag vX.Y.Z && git push origin vX.Y.Z triggers release.yml (automated npm publish + GitHub Release). Security fixes trigger an immediate patch release.

References

Key references:

| Project | Influence |
| ---------------------- | ------------------------------------------------------------------------------------------------------------------ |
| claude-guardrails | npx install pattern, 50+ injection patterns, deny rule catalog |
| claude-warden | 3-tier profiles, token governance (quiet-inject, output-control), ConfigChange self-protection |
| claude-hooks | 5-category injection detection, YAML pattern definitions |
| tobari | 22-hook architecture, SHA-256 hash chain evidence, STG gate pipeline, PermissionRequest adaptive learning |
| OpenClaw | 18 CVE/security issue lessons (gateway auth, credential management, symlink traversal), channel integration design |
| everything-claude-code | AgentShield security integration (1,282 tests, 102 rules), comprehensive skill/agent catalog |
| NVIDIA OpenShell | Layer 3b kernel-level sandbox (Landlock, Seccomp BPF, Network NS), declarative YAML policies |

License

MIT