@velanir/openclaw-egress-guard

v0.1.10

Published

3 days ago

OpenClaw plugin that redacts Velanir internal diagnostics before outbound user-channel delivery.

Velanir Egress Guard

OpenClaw plugin that rewrites outbound user-channel messages through the message_sending and reply_payload_sending hooks before delivery.

It strips Velanir internal diagnostics, raw tool payload JSON, [object Object] leaks, exact internal sentinels such as NO_REPLY, and partial-progress timeout dumps. It defaults to enforce mode for public installs. In enforce mode it returns sanitized content for redaction cases and cancels recoverable blocks with structured recovery metadata. In shadow mode it only logs what would be stripped.

When a message includes safe user-facing text before the unsafe section, the plugin preserves that text. When stripping leaves no answer, the plugin does not send fallback copy. It requests runtime-owned recovery so the runtime can notify the user once, retry internally, and send either a regenerated answer or a final issue notice.

The package never logs message content. Its logs include only counts, risk reasons, lengths, and routing metadata.
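The behavior described above can be sketched roughly as follows. This is an illustrative sketch, not the plugin's source: the marker patterns, the `sanitize` and `toLogRecord` names, and the result shape are all assumptions. It shows safe leading text being preserved, an empty result turning into a recovery request rather than fallback copy, and a log record that carries only counts, reasons, and lengths.

```typescript
// Hypothetical sketch. The patterns and names here are illustrative
// assumptions, not the plugin's real rule set or API.
type SanitizeResult = {
  text: string;           // safe text that survives, possibly empty
  stripped: boolean;      // true when any unsafe marker was found
  riskReasons: string[];  // why content was stripped
  needsRecovery: boolean; // true when nothing safe is left to send
};

const UNSAFE_MARKERS: Array<{ reason: string; pattern: RegExp }> = [
  { reason: "object_leak", pattern: /\[object Object\]/ },
  { reason: "internal_sentinel", pattern: /\bNO_REPLY\b/ },
  { reason: "raw_tool_payload", pattern: /\{\s*"tool(_call|_result)?"\s*:/ },
];

function sanitize(message: string): SanitizeResult {
  let cut = message.length;
  const riskReasons: string[] = [];
  for (const { reason, pattern } of UNSAFE_MARKERS) {
    const match = pattern.exec(message);
    if (match) {
      riskReasons.push(reason);
      // Keep only the text before the earliest unsafe marker.
      cut = Math.min(cut, match.index);
    }
  }
  const text = message.slice(0, cut).trim();
  const stripped = riskReasons.length > 0;
  // No fallback copy: an empty result asks the runtime to recover.
  return { text, stripped, riskReasons, needsRecovery: stripped && text.length === 0 };
}

// The log record holds metadata only, never the message content.
function toLogRecord(original: string, result: SanitizeResult) {
  return {
    originalLength: original.length,
    sanitizedLength: result.text.length,
    strippedCount: result.riskReasons.length,
    riskReasons: result.riskReasons,
  };
}
```

Cutting at the earliest marker is a deliberately conservative simplification; a real sanitizer could remove individual unsafe spans and keep safe text on both sides.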

Default Pipeline

The deterministic sanitizer remains the hard boundary for known leak classes. The default public package pipeline is:

outbound Slack or Teams message
  -> message_sending or reply_payload_sending
  -> deterministic sanitizer
  -> final-output rewrite model
  -> deterministic sanitizer again
  -> model gate classifier
  -> send, redact, or request recovery

message_sending covers generic outbound delivery, cron, message-tool sends, and older supported runtimes. On OpenClaw 2026.6.6 and newer, reply_payload_sending covers normalized reply payloads before Slack and Microsoft Teams channel delivery, including normal inbound reply flows that do not pass through message_sending.

Only final reply payloads use the final-output rewrite model and model gate. Tool/progress/block payloads use deterministic cleanup only, so streaming and progress messages do not create extra model noise.
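As a rough sketch of that routing, assuming hypothetical `Stages` callbacks in place of the real sanitizer and model calls, the pipeline could be arranged like this. The `runPipeline` name, the payload kind labels, and the two-way outcome (send or recover, with redaction folded into send) are illustrative assumptions, not the plugin's API.

```typescript
// Hypothetical sketch. Stage functions are injected so the example
// stays independent of any real OpenClaw or model API.
type PayloadKind = "final" | "tool" | "progress" | "block";

interface Stages {
  sanitize(text: string): string;        // deterministic cleanup
  rewrite(text: string): Promise<string>; // final-output rewrite model
  gate(text: string): Promise<boolean>;   // model gate, true = allowed
}

type Outcome = { action: "send"; text: string } | { action: "recover" };

async function runPipeline(kind: PayloadKind, text: string, s: Stages): Promise<Outcome> {
  // Every payload gets the deterministic sanitizer first.
  let candidate = s.sanitize(text);

  // Only final replies pay for the model stages.
  if (kind === "final" && candidate.length > 0) {
    // Sanitize again, because the rewrite model could reintroduce a leak.
    candidate = s.sanitize(await s.rewrite(candidate));
    if (candidate.length > 0 && !(await s.gate(candidate))) {
      return { action: "recover" };
    }
  }

  // Assumption: an empty candidate is never sent as-is.
  return candidate.length > 0 ? { action: "send", text: candidate } : { action: "recover" };
}
```

Keeping the model stages behind the `kind === "final"` check is what prevents streaming and progress payloads from triggering extra model calls.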

The final-output rewrite prepares the visible answer for a non-technical business user. It removes execution mechanics, simplifies complicated wording, keeps the answer business-framed, uses light emoji only when appropriate, and avoids em dashes in final user-channel output.

The final-output rewrite does not define a plugin maxOutputTokens cap. It uses the runtime/model defaults for the rewritten answer. modelGate.maxOutputTokens only caps the internal classifier JSON response.

Configuration

Normal installs can use the defaults:

{
  "mode": "enforce"
}

A fully explicit config looks like this:

{
  "mode": "enforce",
  "scope": {
    "channels": ["slack", "msteams"]
  },
  "finalOutput": {
    "mode": "enforce",
    "timeoutMs": 5000,
    "audience": "nonTechnical",
    "simplify": true,
    "businessFraming": true,
    "respectResponsibilityOutputRequirements": true,
    "emojiRatio": 0.5,
    "forbidEmDash": true,
    "tone": {
      "style": "friendly_crisp",
      "emojiPolicy": "balanced",
      "maxEmojis": 3
    }
  },
  "modelGate": {
    "mode": "enforce",
    "threshold": 0.85,
    "timeoutMs": 3000,
    "maxOutputTokens": 256
  },
  "logging": {
    "strips": true,
    "includeEvaluation": false
  }
}

Model Selection

When model is omitted, the plugin uses the currently configured OpenClaw coworker model: first agents.defaults.compaction.model, then agents.defaults.model.primary.

Use model and modelFallback only as explicit overrides. Both take a model reference in OpenClaw's provider/model format, for example openai/gpt-5.4-mini:

{
  "finalOutput": {
    "model": "openai/gpt-5.4-mini",
    "modelFallback": "openai/gpt-5.4-mini"
  },
  "modelGate": {
    "model": "openai/gpt-5.4-mini",
    "modelFallback": "openai/gpt-5.4-mini"
  }
}

If the runtime needs a specific OpenClaw auth profile, add it explicitly:

{
  "finalOutput": {
    "authProfileId": "existing-openclaw-auth-profile-id"
  },
  "modelGate": {
    "authProfileId": "existing-openclaw-auth-profile-id"
  }
}

Do not set authProfileId to a placeholder. Omit it unless that exact auth profile exists in the OpenClaw runtime; when omitted, OpenClaw auto-selects the active auth profile for the provider/model.

finalOutput has its own model override fields, but if they are omitted it can reuse the modelGate override before falling back to the current coworker model.
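The resolution order described in this section can be sketched as a simple fallback chain. The function and type names below are hypothetical, the runtime defaults are passed in as plain values rather than read from a real OpenClaw config object, and modelFallback is left out because its exact behavior is not described here.

```typescript
// Hypothetical sketch of the model resolution order described above.
interface ModelOverride {
  model?: string; // provider/model reference, e.g. "openai/gpt-5.4-mini"
}

interface CoworkerDefaults {
  compactionModel?: string; // stands in for agents.defaults.compaction.model
  primaryModel?: string;    // stands in for agents.defaults.model.primary
}

// Model gate: explicit override, then compaction model, then primary model.
function resolveGateModel(gate: ModelOverride, rt: CoworkerDefaults): string | undefined {
  return gate.model ?? rt.compactionModel ?? rt.primaryModel;
}

// Final output: its own override first, then the modelGate override,
// then the same coworker-model defaults.
function resolveFinalOutputModel(
  finalOutput: ModelOverride,
  gate: ModelOverride,
  rt: CoworkerDefaults,
): string | undefined {
  return finalOutput.model ?? resolveGateModel(gate, rt);
}
```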

Tone And Emojis

finalOutput.tone is part of the published plugin schema. OpenClaw accepts it in openclaw.json starting with @velanir/[email protected].

{
  "finalOutput": {
    "emojiRatio": 0.5,
    "tone": {
      "style": "friendly_crisp",
      "emojiPolicy": "balanced",
      "maxEmojis": 3
    }
  }
}

Tone settings:

style: plain, friendly_crisp, or warm.
emojiPolicy: off, conservative, or balanced.
maxEmojis: integer from 0 to 3.

The rewrite model receives the tone settings, and the plugin also applies a deterministic cleanup pass afterward:

emojiPolicy: "off" or style: "plain" removes emojis.
All policies cap the final message at maxEmojis emojis.
emojiPolicy: "balanced" can add one leading success emoji for eligible completed-task messages when the rewrite model omitted one.
Emojis are suppressed for errors, blockers, access/approval issues, security, legal, finance, customer escalations, and reconnection/auth messages.
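A minimal sketch of such a cleanup pass is shown below. It is illustrative only: the `cleanupEmojis` name, the `Context` flags, the choice of success emoji, and the emoji detection via the Unicode `Extended_Pictographic` property are assumptions, and "omitted one" is read here as "the message contains no emoji at all".

```typescript
// Hypothetical sketch of a deterministic emoji cleanup pass.
type Tone = {
  style: "plain" | "friendly_crisp" | "warm";
  emojiPolicy: "off" | "conservative" | "balanced";
  maxEmojis: number; // integer from 0 to 3
};

type Context = {
  sensitive: boolean;     // errors, blockers, security, legal, finance, ...
  completedTask: boolean; // eligible completed-task message
};

// Built with the RegExp constructor so the sketch compiles on older TS targets.
// Simplification: variation selectors and ZWJ sequences are not handled.
const EMOJI = new RegExp("\\p{Extended_Pictographic}", "gu");

function cleanupEmojis(text: string, tone: Tone, ctx: Context): string {
  const removeAll = tone.emojiPolicy === "off" || tone.style === "plain" || ctx.sensitive;
  const limit = removeAll ? 0 : tone.maxEmojis;

  // Keep the first `limit` emojis and drop the rest.
  let seen = 0;
  let out = text.replace(EMOJI, (emoji) => (++seen <= limit ? emoji : ""));
  out = out.replace(/ {2,}/g, " ").trim();

  // Balanced policy may add one leading success emoji when none was present.
  if (!removeAll && tone.emojiPolicy === "balanced" && ctx.completedTask && seen === 0 && limit > 0) {
    out = "✅ " + out;
  }
  return out;
}
```

Running the cap after the rewrite model, rather than trusting the model to respect maxEmojis, keeps the limit deterministic regardless of model behavior.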

Model Gate

The model gate scores the final user-visible candidate before delivery. It defaults to enforce, so below-threshold output cancels the outbound message and requests runtime-owned recovery instead of sending fallback copy.

The gate is category-first, not average-score-first. A classifier response can score highly and still be blocked when it marks any forbidden category as true:

{
  "egressScore": 0.96,
  "safeToSend": false,
  "userResponsiveScore": 0.94,
  "businessFramingScore": 0.91,
  "nonTechnicalScore": 0.93,
  "completionHonestyScore": 0.95,
  "forbiddenCategories": {
    "internalProcessDisclosure": true,
    "sessionMechanics": true,
    "backgroundTaskDisclosure": false,
    "staleMessageMechanics": false,
    "technicalRecoveryExplanation": false,
    "providerArtifact": false,
    "responsibilityContractViolation": false
  },
  "riskReasons": ["internal_process_disclosure", "session_mechanics"],
  "recommendedAction": "block_recover",
  "rewriteGoal": "Explain the user-facing outcome without internal mechanics."
}

Any true forbidden category forces recovery with model_gate_forbidden_category, regardless of the numeric score. The current categories cover internal process disclosure, session/thread mechanics, background task disclosure, stale delayed-task mechanics, technical recovery explanations, provider artifacts, and visible responsibility output-contract violations.
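The category-first rule can be sketched as a small decision function. This is an assumption-laden illustration: the `decideGate` name and result shape are hypothetical, `egressScore` is assumed to be the score compared against the threshold, and the `model_gate_below_threshold` reason string is invented for the example. Only `model_gate_forbidden_category` comes from the description above.

```typescript
// Hypothetical sketch of a category-first gate decision.
interface GateResponse {
  egressScore: number;
  forbiddenCategories: { [category: string]: boolean };
}

type GateDecision =
  | { action: "allow" }
  | { action: "recover"; reason: string; categories: string[] };

function decideGate(response: GateResponse, threshold = 0.85): GateDecision {
  // Categories are checked first: any true flag blocks, whatever the score.
  const hit = Object.keys(response.forbiddenCategories).filter(
    (name) => response.forbiddenCategories[name],
  );
  if (hit.length > 0) {
    return { action: "recover", reason: "model_gate_forbidden_category", categories: hit };
  }
  // Only then does the numeric threshold apply.
  if (response.egressScore < threshold) {
    // Illustrative reason string, not taken from the plugin.
    return { action: "recover", reason: "model_gate_below_threshold", categories: [] };
  }
  return { action: "allow" };
}
```

With the example response above (egressScore 0.96, two categories true), this sketch returns a recovery decision even though the score clears the 0.85 threshold.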

In shadow mode the classifier can score output without altering delivery. To reduce noise, clean allow decisions are not logged unless logging.includeEvaluation is enabled; blocked, unavailable, and error decisions still log route metadata and risk reasons.

Scope

No user scope configuration is required for the normal Platform install. The default protects Slack and Microsoft Teams for every source and delivery path:

{
  "scope": {
    "channels": ["slack", "msteams"]
  }
}

If a package install omits scope, the plugin uses the same default. Advanced operators can still override scope explicitly for narrower or wider rollouts. Add sources or deliveryPaths only when you intentionally want to limit enforcement to a smaller path such as cron.
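One way to picture the scope check, as a sketch under assumptions: channels is always required to match, while sources and deliveryPaths restrict enforcement only when they are set. The `inScope` name, the `Route` shape, and the example source values are hypothetical.

```typescript
// Hypothetical sketch of scope matching with the default applied.
interface Scope {
  channels: string[];
  sources?: string[];       // optional narrowing, omitted by default
  deliveryPaths?: string[]; // optional narrowing, omitted by default
}

interface Route {
  channel: string;
  source?: string;
  deliveryPath?: string;
}

const DEFAULT_SCOPE: Scope = { channels: ["slack", "msteams"] };

// An unset list means "no restriction"; a set list must contain the value.
function allowedBy(list: string[] | undefined, value: string | undefined): boolean {
  if (list === undefined) return true;
  return value !== undefined && list.indexOf(value) !== -1;
}

function inScope(route: Route, scope: Scope = DEFAULT_SCOPE): boolean {
  return (
    scope.channels.indexOf(route.channel) !== -1 &&
    allowedBy(scope.sources, route.source) &&
    allowedBy(scope.deliveryPaths, route.deliveryPath)
  );
}
```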

OpenClaw Compatibility

The npm hook baseline supports [email protected] and newer. That version provides the outbound message_sending hook, so generic outbound delivery, cron, and message-tool sends remain covered on older supported runtimes.

Full Slack and Microsoft Teams reply coverage requires OpenClaw 2026.6.6 or newer. That runtime installs reply_payload_sending on the channel reply dispatcher before delivery, which lets this plugin evaluate normal channel replies that may not pass through message_sending.

Enforced model rewriting and scoring also require the embedded model-run API to be available in the runtime.

For recoverable blocks, the best user experience requires runtime-owned recovery handling for message_sending cancellation metadata. Without recovery handling, the plugin can still block unsafe output, but the runtime may not be able to regenerate a cleaner answer automatically.

For reply_payload_sending, OpenClaw exposes payload rewrite/cancel semantics rather than the richer message_sending recovery metadata. In that path the plugin returns a clean replacement payload when it can, and cancels only when no safe visible payload should be delivered.
