@framers/agentos-ext-ml-classifiers

v0.3.1

Published

a day ago

ML-based content classifiers for AgentOS — toxicity, prompt injection, and NSFW detection via ONNX models or LLM fallback

Vulnerabilities: 0 high, 0 medium, 0 low

Maintainers: manicteam, jdunnfive

ML-based content classifiers for @framers/agentos: toxicity, prompt-injection, and NSFW detection via local ONNX models with optional LLM fallback for low-confidence cases.

What it does

Runs incoming user messages and agent outputs through a chain of small classifiers. Each classifier returns a score, and configurable thresholds gate downstream behavior. Local ONNX inference is fast and offline-friendly; the LLM fallback handles ambiguous cases where the local classifiers report low confidence.

Built-in classifiers:

Toxicity (insults, threats, harassment)
Prompt injection (jailbreak attempts, instruction-override patterns)
NSFW content
Keyword-based prefilter (zero-cost coarse triage)
LLM-as-judge fallback (configurable model)
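
The score-and-threshold gating described above can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: the `Thresholds` and `GateResult` shapes, the `gate` function, and the idea that the fallback threshold is a confidence floor are all assumptions made for the example. The real option types are in src/types.ts.

```typescript
// Hypothetical types for illustration; the package's real option and result
// shapes live in src/types.ts and may differ.
type Action = 'ALLOW' | 'FLAG' | 'BLOCK';

interface Thresholds {
  flag: number;  // a score at or above this is flagged
  block: number; // a score at or above this is blocked
}

interface GateResult {
  action: Action;
  needsLlmFallback: boolean;
}

// Map a classifier score in [0, 1] to an action, and mark the result for LLM
// review when the classifier's confidence is below the fallback threshold.
function gate(
  score: number,
  confidence: number,
  thresholds: Thresholds,
  fallbackThreshold: number,
): GateResult {
  const action: Action =
    score >= thresholds.block ? 'BLOCK' : score >= thresholds.flag ? 'FLAG' : 'ALLOW';
  return { action, needsLlmFallback: confidence < fallbackThreshold };
}

// Example values are made up for the demo.
const thresholds: Thresholds = { flag: 0.5, block: 0.9 };
const confident = gate(0.95, 0.99, thresholds, 0.6);
const unsure = gate(0.55, 0.4, thresholds, 0.6);
```

In this sketch a high-confidence result is acted on directly, while a low-confidence one keeps its provisional action and is also queued for the LLM judge.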

Install

npm install @framers/agentos-ext-ml-classifiers

Peer dependency: @framers/agentos.

Quickstart

import { AgentOS } from '@framers/agentos';
import { createMLClassifierGuardrail } from '@framers/agentos-ext-ml-classifiers';

const agentos = new AgentOS();
await agentos.initialize({
  extensionManifest: {
    packs: [
      {
        // The factory is invoked by AgentOS when it builds the pack.
        factory: () =>
          createMLClassifierGuardrail({
            // Built-in classifiers to enable.
            classifiers: ['toxicity', 'prompt-injection', 'nsfw'],
            // Optional LLM fallback for low-confidence cases.
            llmFallback: { enabled: true, threshold: 0.6 },
          }),
        enabled: true,
      },
    ],
  },
});

Public API

createMLClassifierGuardrail(options?) — factory returning an ExtensionPack
createExtensionPack(context) — auto-discoverable factory used by AgentOS extension auto-pickup
createMLClassifierPack — alias for createMLClassifierGuardrail

See src/types.ts for MLClassifierOptions.

Examples

test/ — fixtures and threshold-tuning tests

Lazy loading and optional install

This package is an optional dependency of @framers/agentos-extensions-registry. The registry ships catalog metadata; createCuratedManifest() calls import.meta.resolve() for each entry and silently skips any package that is not installed. Running npm install @framers/agentos-ext-ml-classifiers is therefore what enables this pack.
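
The resolve-and-skip behavior can be illustrated with a small sketch. It is not the registry's code: `CatalogEntry`, `Resolver`, `filterInstalled`, and the stand-in resolver are hypothetical names for this example. In real code the resolver would presumably be something like `(s) => import.meta.resolve(s)`, which in Node.js throws when a bare package specifier cannot be found.

```typescript
// Hypothetical catalog entry; the registry's real metadata shape may differ.
interface CatalogEntry {
  name: string;        // display name
  packageName: string; // npm specifier to probe
}

// A resolver returns a URL string or throws when the specifier is missing.
type Resolver = (specifier: string) => string;

// Keep only the entries whose package resolves; skip the rest without raising.
function filterInstalled(entries: CatalogEntry[], resolve: Resolver): CatalogEntry[] {
  return entries.filter((entry) => {
    try {
      resolve(entry.packageName);
      return true;
    } catch {
      return false; // not installed: skip silently
    }
  });
}

// Stand-in resolver for the demo so the sketch runs without any packages installed.
const installed = new Set(['@framers/agentos-ext-ml-classifiers']);
const fakeResolve: Resolver = (specifier) => {
  if (!installed.has(specifier)) throw new Error(`Cannot find package '${specifier}'`);
  return `file:///node_modules/${specifier}/index.js`;
};

const available = filterInstalled(
  [
    { name: 'ML classifiers', packageName: '@framers/agentos-ext-ml-classifiers' },
    { name: 'Not installed', packageName: '@example/not-installed' }, // made-up package
  ],
  fakeResolve,
);
```

Passing the resolver in as a parameter keeps the filtering logic testable without touching the real module graph.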

The ONNX BERT classifiers (toxicity, prompt-injection, NSFW) do not load at activation. The pack registers a factory under the ml:classifier-orchestrator key in SharedServiceRegistry, and each model file enters the module graph only on the first classification that needs it. The keyword prefilter runs first at zero cost; the LLM fallback uses a separate factory gated by an optional requiredSecrets entry, so the descriptor is skipped if no provider key is configured.
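
The register-a-factory, load-on-first-use pattern can be sketched as below. `LazyRegistry` is a hypothetical stand-in written for this example; the actual SharedServiceRegistry API may look different, and the fake orchestrator only imitates an expensive model load.

```typescript
// Hypothetical minimal registry; SharedServiceRegistry's real API may differ.
class LazyRegistry {
  private factories = new Map<string, () => unknown>();
  private instances = new Map<string, unknown>();

  // Registering stores only the factory; nothing is constructed yet.
  register<T>(key: string, factory: () => T): void {
    this.factories.set(key, factory);
  }

  // The first get() runs the factory; later calls reuse the cached value.
  get<T>(key: string): T {
    if (!this.instances.has(key)) {
      const factory = this.factories.get(key);
      if (!factory) throw new Error(`No service registered under "${key}"`);
      this.instances.set(key, factory());
    }
    return this.instances.get(key) as T;
  }
}

type Orchestrator = { classify(text: string): number };

let loads = 0;
const registry = new LazyRegistry();
registry.register<Orchestrator>('ml:classifier-orchestrator', () => {
  loads += 1; // stands in for an expensive model load, e.g. a dynamic import()
  return { classify: (text) => (text.includes('ignore previous') ? 0.9 : 0.1) };
});

const loadsAfterRegister = loads; // the factory has not run yet
const first = registry.get<Orchestrator>('ml:classifier-orchestrator');
const second = registry.get<Orchestrator>('ml:classifier-orchestrator');
```

Activation only pays for a map insertion; the cost of building the service is deferred until something actually asks for it, and it is paid once.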

The guardrail registers with config.evaluateStreamingChunks = true and runs in Phase 2 of the two-phase dispatcher (parallel classifiers). Worst-action aggregation (BLOCK > FLAG > ALLOW) resolves conflicts when multiple classifiers fire on the same chunk.
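
Worst-action aggregation can be sketched in a few lines. The `Action` type, the `SEVERITY` table, and `worstAction` are illustrative names rather than the package's exports, and defaulting an empty result list to ALLOW is an assumption of this sketch.

```typescript
type Action = 'ALLOW' | 'FLAG' | 'BLOCK';

// Severity order taken from the text: BLOCK > FLAG > ALLOW.
const SEVERITY: Record<Action, number> = { ALLOW: 0, FLAG: 1, BLOCK: 2 };

// Return the most severe action among the classifier results.
// An empty list falls back to ALLOW (an assumption of this sketch).
function worstAction(actions: Action[]): Action {
  return actions.reduce<Action>(
    (worst, next) => (SEVERITY[next] > SEVERITY[worst] ? next : worst),
    'ALLOW',
  );
}

// Results from several classifiers that fired on the same chunk.
const combined = worstAction(['ALLOW', 'FLAG', 'BLOCK', 'FLAG']);
const mild = worstAction(['ALLOW', 'FLAG']);
```

Because the reduction only compares severities, the outcome does not depend on the order in which the parallel classifiers finish.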

For the full DI model and an end-to-end walkthrough, see "How extensions stay optional and lazy" and the auto-loading guide.

License

Apache 2.0 — see the repo root LICENSE.
