@nanomind/daemon
v0.2.0
Persistent NanoMind inference daemon with HTTP and IPC interfaces
Persistent NanoMind inference server. Runs on localhost:47200 with HTTP and Unix socket interfaces. Lazy-loads the model on first request, unloads after idle timeout.
Install
npm install @nanomind/daemon
Quick Start
# Start the daemon
nanomind-daemon start
# Check status
nanomind-daemon status
# Stop
nanomind-daemon stop
HTTP API
POST /v1/infer
Classify input text using the loaded NanoMind model.
curl -X POST http://127.0.0.1:47200/v1/infer \
-H "Content-Type: application/json" \
-d '{
"intent": "SCAN_SKILL",
"input": "This skill forwards tokens to an external endpoint",
"context": { "artifactType": "skill" },
"priority": "high"
}'
Response (malicious classification):
{
"intent": "SCAN_SKILL",
"result": "exfiltration",
"confidence": 0.94,
"attackClass": "exfiltration_pattern",
"evidence": "exfiltration",
"latencyMs": 2,
"modelVersion": "nanomind-tme-v0.5.0"
}
Response (benign):
{
"intent": "INTENT_CHECK",
"result": "benign",
"confidence": 0.91,
"attackClass": "",
"evidence": "benign",
"latencyMs": 1,
"modelVersion": "nanomind-tme-v0.5.0"
}
Response schema
| Field | Type | Required | Description |
|---|---|---|---|
| intent | string | yes | Echoes the request intent (or routed intent if not provided). |
| result | string | yes | Raw model label (e.g. "injection", "benign") — convenience for human-readable logs. |
| confidence | number | yes | Softmax probability of the predicted class, in [0, 1]. |
| attackClass | string | yes | Canonical attack-class label, or empty string. See enum + mapping below. |
| evidence | string | no | Raw 10-way model label. Carries audit-trail granularity beyond the canonical bucket. |
| remediation | string | no | Suggested remediation text (reserved for future use). |
| latencyMs | number | yes | End-to-end inference latency in milliseconds. |
| modelVersion | string | yes | Loaded model identifier. |
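For consumers that parse the response in TypeScript, the schema above can be transcribed into a type plus a runtime check. This is a minimal sketch: `InferResponse` and `isInferResponse` are hypothetical names chosen for illustration and are not confirmed exports of @nanomind/daemon.

```typescript
// Shape of a /v1/infer response, transcribed from the schema table.
// Optional fields follow the "Required" column.
interface InferResponse {
  intent: string;
  result: string;
  confidence: number;
  attackClass: string;
  evidence?: string;
  remediation?: string;
  latencyMs: number;
  modelVersion: string;
}

// Hypothetical helper (not part of the package): verifies that the required
// fields are present with the documented types and that confidence is in [0, 1].
function isInferResponse(value: unknown): value is InferResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;

  const requiredStrings = ["intent", "result", "attackClass", "modelVersion"];
  if (!requiredStrings.every((key) => typeof v[key] === "string")) return false;

  const confidence = v["confidence"];
  if (typeof confidence !== "number" || !(confidence >= 0 && confidence <= 1)) return false;

  if (typeof v["latencyMs"] !== "number") return false;

  const optionalStrings = ["evidence", "remediation"];
  return optionalStrings.every((key) => v[key] === undefined || typeof v[key] === "string");
}
```

Validating at the boundary keeps a malformed or truncated body from being treated as a benign verdict by accident.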
attackClass enum
The field is always emitted. An empty string means "no malicious intent detected"; non-empty values are produced by the v0.5.0 production classifier:
| Value | Meaning |
|---|---|
| "" | No malicious intent detected (model classified as benign). |
| "exfiltration_pattern" | Output or tool call appears to forward sensitive data to an external destination. |
| "prompt_injection" | Input contains instructions that attempt to override the agent's policy. |
| "tool_misuse" | Capability or tool used outside its declared purpose. |
| "data_extraction" | Sequence of reads consistent with bulk data extraction. |
attackClass mapping
The model emits 10 raw labels (matching the 10-class training corpus). The daemon maps them to the 5-value canonical attackClass enum above for the FGA decision contract, while preserving the raw label in evidence so audit and telemetry retain full granularity.
| Raw model label | attackClass |
|---|---|
| benign | "" |
| injection | prompt_injection |
| social_engineering | prompt_injection |
| exfiltration | exfiltration_pattern |
| steganography | exfiltration_pattern |
| credential_abuse | data_extraction |
| privilege_escalation | tool_misuse |
| persistence | tool_misuse |
| lateral_movement | tool_misuse |
| policy_violation | tool_misuse |
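The mapping table can be expressed as a lookup, which is handy when a consumer wants to re-derive or cross-check `attackClass` from the raw label in `evidence`. This is a sketch, not the daemon's own code: `RAW_TO_ATTACK_CLASS` and `toAttackClass` are hypothetical names, and returning `undefined` for an unrecognized label is an assumption (the source does not say how the daemon treats one).

```typescript
// The 5-value canonical enum, including the empty string for benign.
type AttackClass =
  | ""
  | "prompt_injection"
  | "exfiltration_pattern"
  | "data_extraction"
  | "tool_misuse";

// Raw 10-way model label -> canonical attackClass, copied from the table.
const RAW_TO_ATTACK_CLASS = new Map<string, AttackClass>([
  ["benign", ""],
  ["injection", "prompt_injection"],
  ["social_engineering", "prompt_injection"],
  ["exfiltration", "exfiltration_pattern"],
  ["steganography", "exfiltration_pattern"],
  ["credential_abuse", "data_extraction"],
  ["privilege_escalation", "tool_misuse"],
  ["persistence", "tool_misuse"],
  ["lateral_movement", "tool_misuse"],
  ["policy_violation", "tool_misuse"],
]);

// Assumption: labels outside the table yield undefined so the caller decides
// how to handle them.
function toAttackClass(rawLabel: string): AttackClass | undefined {
  return RAW_TO_ATTACK_CLASS.get(rawLabel);
}
```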
FGA contract
AIM's FGA Step 5 (fga_engine.go::checkIntentSync) reads this response and blocks when:
attackClass != "" && confidence > 0.8
The attackClass field is required on the wire and is always present. Its value is empty when the model classifies the request as benign and non-empty otherwise. Consumers needing 10-way granularity (e.g. dashboards, runtime correlation) read the raw label from evidence.
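The blocking rule above can be restated in TypeScript for consumers that want to mirror the decision locally. This is only an illustration: the real check lives in AIM's Go code, and `shouldBlock` is a hypothetical name.

```typescript
// Mirrors the documented rule: block when attackClass is non-empty AND
// confidence is strictly greater than the threshold (0.8 in the FGA contract).
function shouldBlock(
  response: { attackClass: string; confidence: number },
  threshold: number = 0.8,
): boolean {
  return response.attackClass !== "" && response.confidence > threshold;
}
```

Note that the comparison is strict, so a confidence of exactly 0.8 does not block.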
GET /health
curl http://127.0.0.1:47200/health
# {"running":true,"port":47200,"activeTasks":0,"modelLoaded":false,"startedAt":"2026-04-28T21:16:18.748Z","uptime":2069}
modelLoaded is false until the first /v1/infer call: the model lazy-loads on demand and unloads after idleUnloadSeconds of inactivity.
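A client can use the health body to tell whether the next inference call is likely to pay the lazy-load cost. The sketch below types the fields shown in the sample response; `HealthStatus`, `parseHealth`, and `nextInferWillLoadModel` are hypothetical names, and only the fields visible in the sample are assumed to exist.

```typescript
// Fields as they appear in the sample /health body.
interface HealthStatus {
  running: boolean;
  port: number;
  activeTasks: number;
  modelLoaded: boolean;
  startedAt: string;
  uptime: number;
}

// Parses a /health response body and checks the two flags used below.
function parseHealth(body: string): HealthStatus {
  const parsed = JSON.parse(body) as HealthStatus;
  if (typeof parsed.running !== "boolean" || typeof parsed.modelLoaded !== "boolean") {
    throw new Error("unexpected /health payload");
  }
  return parsed;
}

// True when the daemon is up but the model is not resident, i.e. the next
// /v1/infer call triggers the lazy load.
function nextInferWillLoadModel(health: HealthStatus): boolean {
  return health.running && !health.modelLoaded;
}
```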
GET /v1/status
curl http://127.0.0.1:47200/v1/status
# {"running":true,"port":47200,"activeTasks":0,"modelLoaded":false,"startedAt":"2026-04-28T21:16:18.748Z","uptime":2069}
Known model-quality limitations (v0.5.0 classifier)
The v0.5.0 Mamba-TME classifier ships with documented limitations on inputs outside its training corpus distribution. These are model properties, not daemon bugs, and the wire contract (attackClass always emitted, canonical 5-value enum) holds in every case.
- Confidence saturation. The softmax output frequently saturates at exactly 1.0 rather than expressing graduated probability. Threshold-based decisions (e.g. AIM FGA Step 5's confidence > 0.8) become effectively binary in the saturated zone.
- False positives on natural-language benign inputs. Generic operational queries (e.g. "What is the weather today?", "list users") may classify as non-benign with high confidence because the training corpus underweights conversational benign prose.
- False negatives on natural-language attack paraphrases. Exfiltration intents expressed in natural language outside the training corpus's phrasing (e.g. "transfer the contents of <file> to <url> via webhook") may classify as benign.
Recommended consumer mitigations until v0.6.0:
- Set confidence > 0.95 (not 0.8) before treating a non-empty attackClass as actionable.
- Combine the classifier signal with a corroborating signal (rule-based detectors, telemetry, capability checks) before blocking. Do not rely on the classifier alone for high-stakes decisions.
- Log every (input, attackClass, confidence) triple. Out-of-distribution behavior is the input source for the v0.6.0 retrain corpus.
A retrain to v0.6.0 with broader corpus coverage is tracked separately. The 5-value canonical attackClass enum will not change between v0.5.0 and v0.6.0; only model accuracy improves.
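The recommended mitigations can be combined into a small consumer-side gate. This is a sketch under stated assumptions: `Verdict`, `isActionable`, and `toLogLine` are hypothetical names, and the `corroborated` flag stands in for whatever rule-based or telemetry signal the consumer already has.

```typescript
// The (input, attackClass, confidence) triple the mitigations ask consumers to log.
interface Verdict {
  input: string;
  attackClass: string;
  confidence: number;
}

// Actionable only when the classifier flags the input above the stricter
// 0.95 threshold AND an independent signal agrees.
function isActionable(
  verdict: Verdict,
  corroborated: boolean,
  threshold: number = 0.95,
): boolean {
  return verdict.attackClass !== "" && verdict.confidence > threshold && corroborated;
}

// Serializes the triple as one JSON line, suitable for an append-only log.
function toLogLine(verdict: Verdict): string {
  return JSON.stringify({
    input: verdict.input,
    attackClass: verdict.attackClass,
    confidence: verdict.confidence,
  });
}
```

Because confidence often saturates at 1.0, the threshold alone does little to separate true from false positives. The corroborating signal is what carries the decision in that zone.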
Configuration
| Option | Default | Description |
|--------|---------|-------------|
| httpPort | 47200 | HTTP server port |
| ipcPath | /tmp/nanomind.sock | Unix socket path |
| maxConcurrent | 4 | Max concurrent inference requests |
| idleUnloadSeconds | 300 | Unload model after N seconds idle |
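As a reading aid, the table's defaults can be written as a typed options object. This is a sketch: `DaemonOptions`, `DEFAULT_OPTIONS`, and `resolveOptions` are hypothetical names, and it is an assumption that the `NanoMindDaemon` constructor accepts all four keys (the source only shows `httpPort` being passed).

```typescript
// Option names and defaults copied from the configuration table.
interface DaemonOptions {
  httpPort: number;
  ipcPath: string;
  maxConcurrent: number;
  idleUnloadSeconds: number;
}

const DEFAULT_OPTIONS: DaemonOptions = {
  httpPort: 47200,
  ipcPath: "/tmp/nanomind.sock",
  maxConcurrent: 4,
  idleUnloadSeconds: 300,
};

// Hypothetical helper: overlays caller-supplied overrides on the defaults.
function resolveOptions(overrides: Partial<DaemonOptions> = {}): DaemonOptions {
  return { ...DEFAULT_OPTIONS, ...overrides };
}
```

For example, `resolveOptions({ idleUnloadSeconds: 60 })` keeps the default port and socket path while shortening the idle window.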
Programmatic Usage
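import { NanoMindDaemon } from '@nanomind/daemon';
// Placeholder input for illustration; replace with the text of the skill to scan.
const skillContent = 'This skill forwards tokens to an external endpoint';
const daemon = new NanoMindDaemon({ httpPort: 47200 });
await daemon.start();
// Direct inference (bypasses HTTP)
const result = await daemon.infer({
intent: 'COMPILE_AST',
input: skillContent,
priority: 'high',
});
Model files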
The daemon loads the v0.5.0 production NanoMind classifier (Mamba-TME, 8 blocks, 6000-token vocab) from ~/.nanomind/models/. Three files are required:
| File | Purpose |
|---|---|
| nanomind-tme.onnx | ONNX graph (architecture only — small). |
| nanomind-tme.onnx.data | External weights data file (~8MB). |
| tokenizer.json | Word-level vocabulary (6000 entries). |
Download from HuggingFace (opena2a/nanomind-security-classifier):
mkdir -p ~/.nanomind/models
cd ~/.nanomind/models
BASE=https://huggingface.co/opena2a/nanomind-security-classifier/resolve/main
curl -sSL -o nanomind-tme.onnx "$BASE/nanomind-tme.onnx"
curl -sSL -o nanomind-tme.onnx.data "$BASE/nanomind-tme.onnx.data"
curl -sSL -o tokenizer.json "$BASE/tokenizer.json"
Security
- Binds to 127.0.0.1 only (no external access).
- Model files are SHA-256 verified on load against canonical hashes recorded in nanomind-models.json (v0.5.0). A mismatch fails the daemon hard rather than silently running a tampered or stale model.
- Request size capped at 1MB.
- Rate limited to 100 requests/second.
- No credentials in memory after model load.
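The hash check described above can be reproduced by hand, for instance to verify a download before starting the daemon. This is a minimal sketch using Node's built-in `crypto` module. `sha256Hex` and `assertDigest` are hypothetical helpers, and the expected digest has to come from wherever the canonical hashes are published, since the layout of nanomind-models.json is not documented here.

```typescript
import { createHash } from "node:crypto";

// Returns the lowercase hex SHA-256 digest of the given bytes or string.
function sha256Hex(data: Uint8Array | string): string {
  return createHash("sha256").update(data).digest("hex");
}

// Throws on mismatch, mirroring the "fail hard" behavior described above.
function assertDigest(name: string, data: Uint8Array | string, expectedHex: string): void {
  const actual = sha256Hex(data);
  if (actual !== expectedHex.toLowerCase()) {
    throw new Error(`SHA-256 mismatch for ${name}: expected ${expectedHex}, got ${actual}`);
  }
}
```

In practice the `data` argument would be the file contents, for example from `fs.readFileSync`. For the larger weights file, a streaming hash would avoid holding the whole file in memory.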
License
MIT
