@x12i/optimixer
v3.5.2
Resilient scenario-based runtime optimization for AI workloads; ai.max_tokens.v1 with bundled ai-profiles model resolution, warmup tolerance, and Activix learning
Version 3.3.0 — resilience release
Part of the Activitix monorepo.
Predict-time normalization for modelProfile: concrete vendor models, gateway wire ids, and @x12i/ai-profiles profile+choice keys — without host pre-resolution. Warmup tolerates messy historical Activix rows; predict and init stay up.
Dependencies: @x12i/ai-profiles ^3.0.0 · @x12i/activix-contracts · optional peer @x12i/activix ^8.3.1 for learning mode.
Resilience guarantees (3.3.0)
| Guarantee | Behavior |
|-----------|----------|
| Single modelProfile entry point | Pass concrete { provider, model }, wire id, or profile+choice (cheap/default) — Optimixer resolves via bundled resolveBundledInput |
| No host pre-resolve | xynthesis / ai-tasks PRE / studio pass keys as stored; Optimixer normalizes before caps + reasoning checks |
| CR-14 effort reconcile | Historical reasoningEffort: not-applicable + reasoning-capable catalog model → promotes to low (or modelProfile.reasoning.effort) |
| Warmup never fails init on one bad row | Invalid rows skipped with optimixer.warmup.row.skipped debug logs; valid rows still load |
| Cold-start never throws | Missing history → conservative budget + labeled confidence |
| Unknown models | UNKNOWN_MODEL_DEFAULTS fallback; caller overrides still apply |
| Complete is best-effort | Missing requestId or persistence errors do not throw |
| Gateway boundary | MAIN HTTP invoke stays concrete-only — Optimixer is predict-only |
Package boundaries
@x12i/ai-profiles profile+choice + catalog caps (sync bundled, text lane)
@x12i/optimixer predict budget, reasoning reconcile, warmup, learning buckets
gateway / ai-skills MAIN HTTP concrete provider + model only
See .docs/ai-profiles-boundary.md for ownership detail.
What Optimixer is
Optimixer is a scenario-based runtime optimization engine for AI workloads.
It predicts execution decisions before the call, observes what really happened after the call, and improves future executions across templates, models, reasoning modes, and usage profiles.
It is not a “max tokens utility.” Max-token optimization is the first implemented scenario.
Every scenario follows the same lifecycle:
Before execution: predict the best runtime decision
During execution: apply the decision
After execution: observe actual outcome
Over time: learn better decisions for similar future executions
Optimixer decides. Activix remembers. The AI wrapper applies.
Optimixer owns prediction and learning. Activix stores durable evidence. Your gateway or task runner orchestrates the flow.
First scenario: ai.max_tokens.v1
Today's shipped pipeline is ai.max_tokens.v1 (algorithm 3.0.0): learn the right max completion budget for each AI call — from historical evidence, model/reasoning profile, declared output intent, and acceptable risk.
Future scenarios may include model selection, reasoning effort, retry policy, context compression, provider fallback, cost/latency prediction, and output repair. They reuse the same predict → execute → observe → learn pattern.
Why max tokens is a good first scenario
Max tokens is a strong first use case because it is frequent, measurable, easy to observe from provider usage, easy to learn from, and directly tied to failures (truncation) and waste (over-allocation). Every LLM call has a completion budget; every response reports usage.
Why not “just set max tokens very high”?
A huge static budget feels safe but at enterprise scale it is the wrong operating model:
- Hides execution design — you stop learning that classification needs ~120 tokens, extraction ~900, a reasoning finalizer ~5,000. Everything becomes “give it a lot and hope.”
- Increases waste — models ramble, JSON bloats, latency grows, and retry behavior gets harder to reason about across millions of calls.
- Does not solve context-window pressure — output budget competes with system prompt, retrieval, tools, and memory. Oversized output leaves less safe room for input and context.
- Makes failures harder to diagnose — when everything uses 16k, you cannot tell whether failure was input size, model change, reasoning burn, or template drift. Optimixer records what was predicted, why, what was used, and whether the call was under- or over-allocated.
- Reasoning models need a split budget — completion may cover hidden reasoning plus visible output; one large number does not express that.
The right budget is the smallest safe budget for the specific task — not the largest number that fits in a form field.
Max-token optimization is how Optimixer removes manual runtime guessing from enterprise AI execution:
Manual guess: max_tokens = 4000 everywhere
Optimixer: template A → 300
template B → 900
template C → 1,800
template D + reasoning → 5,000
sparse history → related evidence + labeled confidence
At enterprise scale, execution settings should be learned from historical evidence, not copied from a spreadsheet.
Technical overview
Uses @x12i/activix-contracts for record shapes and client interfaces. Persists prediction and learning evidence through an Activix-compatible client (embedded or standalone).
Deeper architecture: explained.md · Consumer migration: ../../.docs/MIGRATION-CONSUMERS.md
Install
Embedded mode (recommended — you already run Activix):
npm install @x12i/optimixer@^3.3.0 @x12i/activix@^8.0.0
@x12i/activix-contracts is installed automatically as a dependency of Optimixer.
Standalone mode (Optimixer owns Activix wiring):
npm install @x12i/optimixer@^3.3.0 @x12i/activix@^8.0.0
Peer: @x12i/activix ^8.0.0 (required at runtime for embedded and standalone).
Peer dependency (optional at runtime): @x12i/activix ^8.3.1 — required when using embedded or standalone persistence, not for predict-only.
Migrating to 3.0.0 (breaking)
Every predictAiMaxTokens call must include:
- reasoningEffort: not-applicable | none | low | medium | high
- outputIntent: { mode: 'fixed' | 'relative', expectedVisibleTokens: number, outputToInputRatio?: number }
Optimixer no longer guesses output size from input alone. Input tokens affect context-window fit and cost; output budget is driven by your declared intent plus reasoning reserve.
See Token budget model and ../../.docs/MIGRATION-CONSUMERS.md.
3.3.0 — modelProfile resilience
Optimixer is the only place that should resolve profile+choice for prediction. Uses @x12i/ai-profiles resolveBundledInput (3.0.0+, sync bundled, catalogLane: 'text' — same as ai-tasks PRE).
Accepted modelProfile shapes
| Shape | Example | Resolved by |
|-------|---------|-------------|
| Concrete vendor + model | { provider: 'openrouter', model: 'google/gemini-2.5-flash-lite' } | Bundled models catalog |
| Gateway wire id | { model: 'openrouter/openai/gpt-5.5' } | Catalog (gateway prefix split) |
| Profile+choice | { profileChoice: 'cheap/default' } or { model: 'cyber/deep_forensics' } | @x12i/ai-profiles registry (strict profile/choice) → catalog caps |
Legacy field modelProfile.alias is accepted as an alias for profileChoice (Optimixer field name, not ai-profiles registry alias). Bare profile keys (cheap) and registry shortcuts are not supported — use explicit profile/choice (ai-profiles 3.0.0).
Reasoning + historical rows
When the bundled catalog marks the resolved model as reasoning-capable, Optimixer sets modelProfile.reasoning.enabled: true. If the caller (or Activix warmup row) still has reasoningEffort: 'not-applicable', normalize promotes effort to low or modelProfile.reasoning.effort — so xynthesis PRE init succeeds on old rows recorded before reasoning metadata existed.
Live predict with an active effort on a non-reasoning model still throws OPTIMIXER_REASONING_PROFILE_INCONSISTENT (CR-12 inverse case).
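The two reconcile rules above (CR-14 promotion and the CR-12 inverse case) can be sketched as a small pure function. This is an illustrative sketch only: `reconcileEffortSketch` and its parameters are hypothetical names, not the package's internals, and the real normalizer may consider more inputs.

```typescript
// Hypothetical sketch of the effort reconcile rules described above.
type ReasoningEffort = 'not-applicable' | 'none' | 'low' | 'medium' | 'high';
type ActiveEffort = 'low' | 'medium' | 'high';

function reconcileEffortSketch(
  requested: ReasoningEffort,
  modelIsReasoningCapable: boolean,
  profileEffort?: ActiveEffort, // stands in for modelProfile.reasoning.effort
): ReasoningEffort {
  const active =
    requested === 'low' || requested === 'medium' || requested === 'high';

  if (!modelIsReasoningCapable) {
    if (active) {
      // CR-12 inverse case: an active effort on a non-reasoning model is rejected.
      throw Object.assign(
        new Error('active reasoning effort on a non-reasoning model'),
        { code: 'OPTIMIXER_REASONING_PROFILE_INCONSISTENT' },
      );
    }
    return requested;
  }

  // CR-14: legacy 'not-applicable' on a reasoning-capable model is promoted.
  if (requested === 'not-applicable') {
    return profileEffort ?? 'low';
  }
  return requested;
}
```

Under these assumptions, a legacy row with `'not-applicable'` on a reasoning-capable model resolves to `'low'` unless the profile names an effort, while `'high'` on a non-reasoning model throws with the documented error code.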
Example — profile+choice (PRE / xynthesis)
const prediction = await optimixer.predictAiMaxTokens({
templateId: 'pre-synthesis',
inputSize: 1200,
reasoningEffort: 'low', // legacy callers may still pass 'not-applicable'; CR-14 reconcile promotes it
outputIntent: { mode: 'fixed', expectedVisibleTokens: 512 },
modelProfile: { profileChoice: 'cheap/default' },
});
// modelProfile normalized to openrouter + concrete slug + catalog caps before budget
Do not pre-resolve profile+choice in hosts before Optimixer. Gateway / funcx / ai-skills MAIN HTTP must still receive concrete provider + model.
Mismatch on live predict throws OptimixerError — use isOptimixerError(err) / err.code / err.remediationHint.
Functional requirements (FR-OPT)
| FR | Behavior |
|----|----------|
| FR-OPT-1 | Predict works without DB; cold-start never throws; init failures throw Optimixer initialization failed: … |
| FR-OPT-2 | Min 256 tokens, +256 safety margin, structured JSON floor 6144 (outputMode: json + schemaComplexity != none) |
| FR-OPT-3 | Optimixer.create({ persistence: 'predict-only' }) — no Activix/Mongo required |
| FR-OPT-4 | Predict returns recommendedMaxTokens, requestId, confidence, bucketKey |
| FR-OPT-5 | completeAiMaxTokensPrediction is best-effort; no-op without requestId; persistence errors do not throw |
| FR-OPT-6 | Model caps from modelProfile; unknown models use UNKNOWN_MODEL_DEFAULTS (see exports) |
| FR-OPT-7 | 3.3.0 — modelProfile accepts concrete + profile+choice; bundled @x12i/ai-profiles resolution |
| FR-OPT-8 | 3.3.0 — Warmup skips unrecoverable historical rows; init loads all valid evidence |
| FR-OPT-9 | 3.3.0 — Reasoning effort reconciled after catalog resolve (CR-14); no warmup init throw on legacy rows |
Token budget model
input tokens = prompt + messages + tools + context (context fit, cost, latency)
visible output = caller-declared expected visible tokens (fixed or relative)
reasoning budget = reserve by reasoningEffort (low/medium/high)
max_tokens = min(generation budget, model.maxOutputTokens, contextWindow - input - margin)
Non-reasoning (reasoningEffort: 'not-applicable'): 5k-word input + yes/no answer → ~10 visible tokens → ~512 max (not thousands).
Reasoning (reasoningEffort: 'medium'): same task → visible ~10 + reasoning reserve ~3000 + margins → ~3500+ max.
Context overflow: 100k input on 128k window clamps output to available headroom regardless of requested budget.
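The budget formula above can be worked through with a short sketch. The names here (`sketchMaxTokens`, `BudgetInput`, `contextMarginSketch`) are hypothetical, and two things are assumptions rather than documented behavior: the order in which the 256 minimum and 256 safety margin from FR-OPT-2 are applied, and the reuse of the `contextReserve` defaults as the context margin. The shipped algorithm may combine these differently.

```typescript
// Hypothetical sketch: max_tokens = min(generation, maxOutputTokens, headroom).
interface BudgetInput {
  inputTokens: number;      // prompt + messages + tools + context
  visibleTokens: number;    // caller-declared visible output
  reasoningReserve: number; // 0 for non-reasoning calls
  contextWindow: number;
  maxOutputTokens: number;
}

const MIN_TOKENS = 256;    // FR-OPT-2 minimum
const SAFETY_MARGIN = 256; // FR-OPT-2 safety margin

// Assumption: margin follows the contextReserve defaults
// { minTokens: 1000, percentOfContext: 0.02, maxTokens: 8000 }.
function contextMarginSketch(contextWindow: number): number {
  return Math.min(8000, Math.max(1000, Math.round(contextWindow * 0.02)));
}

function sketchMaxTokens(b: BudgetInput): number {
  const generation =
    Math.max(MIN_TOKENS, b.visibleTokens + b.reasoningReserve) + SAFETY_MARGIN;
  const headroom =
    b.contextWindow - b.inputTokens - contextMarginSketch(b.contextWindow);
  // Never return a negative budget when the input already overflows the window.
  return Math.max(0, Math.min(generation, b.maxOutputTokens, headroom));
}
```

With a 7,500-token input, 10 visible tokens, and no reasoning reserve on a 128k window, this sketch yields 512. With a 120,000-token input on the same window, a 7,256-token generation budget is clamped to the remaining headroom.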
Quick start
import { Activix } from '@x12i/activix';
import { Optimixer } from '@x12i/optimixer';
const activix = await Activix.create({
collection: 'ai-gateway-activities',
mongoUri: process.env.MONGO_URI,
});
const optimixer = await Optimixer.create({
activixClient: activix,
activixCollection: 'ai-gateway-activities', // must match Activix collection name
pipelines: { aiMaxTokens: { enabled: true } },
});
const prediction = await optimixer.predictAiMaxTokens({
templateId: 'summarize',
inputSize: 1200,
contextSize: 800,
acceptableRisk: 'medium',
reasoningEffort: 'not-applicable',
outputIntent: {
mode: 'relative',
expectedVisibleTokens: 900,
outputToInputRatio: 0.05,
},
taskType: 'summarization',
outputMode: 'markdown',
modelProfile: {
provider: 'openai',
model: 'gpt-4o-mini',
outputTokenParam: 'max_tokens',
contextWindow: 128000,
maxOutputTokens: 16384,
},
runContext: { sessionId: 'sess-1', jobId: 'job-1' },
});
await aiClient.call({ ...request, ...prediction.providerParams });
const result = await optimixer.completeAiMaxTokensPrediction({
requestId: prediction.requestId,
actual: {
promptTokens: 900,
completionTokens: 420,
totalTokens: 1320,
finishReason: 'stop',
latencyMs: 800,
},
});
if (result.retryPrediction) {
await aiClient.call({ ...request, ...result.retryPrediction.providerParams });
}
Lifecycle
predictAiMaxTokens
→ Activix startRecord (prediction evidence)
→ caller uses providerParams on LLM request
completeAiMaxTokensPrediction
→ update Activix record + in-memory buckets (+ optional profile summaries)
requestId is the Activix activityId from startRecord.
Do not call Activix separately for each predict/complete — Optimixer owns that lifecycle.
Request fields
| Field | Required | Role |
|-------|----------|------|
| templateId | yes | Stable learning identity (action/template) |
| inputSize | yes | Estimated input tokens (context fit — not the generation ceiling) |
| reasoningEffort | yes | not-applicable \| none \| low \| medium \| high; may be promoted from not-applicable when catalog infers reasoning (CR-14) |
| outputIntent | yes | Fixed or relative visible output guess (see below) |
| contextSize | no | Estimated context tokens (default 0) |
| acceptableRisk | no | very-low \| low \| medium \| high (default medium) → percentile |
| modelProfile | no | provider, model, profileChoice (ai-profiles profile/choice), legacy alias, caps, reasoning — see 3.3.0 resilience |
| taskType | no | e.g. summarization, extraction, code-generation |
| outputMode | no | text, json, markdown, code, schema |
| schemaComplexity | no | none, small, medium, large — with outputMode: json triggers structured floor |
| expectedVerbosity | no | Metadata only (does not drive budget in 3.0) |
| plannedTools | no | Tool names for bucket identity |
| constraints | no | Caller min/max/preferred caps, retry flags |
| runContext | no | Same envelope as Activix (sessionId, jobId, …) |
| dryRun | no | Predict without Activix write (requestId is '') |
outputIntent
| Mode | Fields | Cold-start behavior |
|------|--------|-------------------|
| fixed | expectedVisibleTokens | Uses caller guess; input size does not inflate visible output |
| relative | expectedVisibleTokens, optional outputToInputRatio | max(guess, inputTokens × ratio) until history learns |
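The cold-start column of the table above reduces to a few lines. This is a sketch with hypothetical names (`coldStartVisibleTokens`, `OutputIntentSketch`); treating a missing `outputToInputRatio` as "use the guess alone" and rounding the scaled value are assumptions, not documented behavior.

```typescript
// Hypothetical sketch of cold-start visible-output estimation per outputIntent mode.
type OutputIntentSketch =
  | { mode: 'fixed'; expectedVisibleTokens: number }
  | { mode: 'relative'; expectedVisibleTokens: number; outputToInputRatio?: number };

function coldStartVisibleTokens(
  intent: OutputIntentSketch,
  inputTokens: number,
): number {
  if (intent.mode === 'fixed') {
    // Input size does not inflate a fixed guess.
    return intent.expectedVisibleTokens;
  }
  // Assumption: without a ratio, the caller's guess stands on its own.
  const ratio = intent.outputToInputRatio ?? 0;
  return Math.max(intent.expectedVisibleTokens, Math.round(inputTokens * ratio));
}
```

For the Quick start request (guess 900, input 1,200, ratio 0.05) the scaled value is 60, so the guess of 900 wins. A larger input with a higher ratio would raise the estimate above the guess.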
Use modelProfile for provider/model or @x12i/ai-profiles profile+choice — see accepted shapes above.
When the bundled catalog marks the resolved model as reasoning-capable, Optimixer sets modelProfile.reasoning.enabled: true automatically.
Stable error codes (throw OptimixerError; check error.code):
| Code | When |
|------|------|
| OPTIMIXER_REASONING_PROFILE_INCONSISTENT | reasoningEffort is low/medium/high but the resolved model is not reasoning-capable |
| OPTIMIXER_PREDICTION_INPUT_INVALID | Missing or invalid required predict inputs |
Both codes include remediationHint where applicable.
Prediction output (required consumer contract)
| Field | Meaning |
|-------|---------|
| recommendedMaxTokens | Final clamped budget (primary field) |
| requestId | Activix activityId; '' when not persisted (predict-only / dry-run / record failure) |
| confidence | none \| low \| medium \| high |
| bucketKey | Bucket used for evidence / learning identity |
Additional fields:
| Field | Meaning |
|-------|---------|
| recommendedMaxCompletionTokens | Completion budget after reasoning split |
| providerParams | Spread into LLM request (max_tokens, max_completion_tokens, …) |
| dataState | exact-history, fallback-history, task-shape-history, global-history, cold-start |
| learningPhase | bootstrap \| early \| mature; explains why the budget is conservative or tight |
| templateSampleCount | Completed samples for this templateId at predict time |
| bootstrapMultiplierApplied | Set during bootstrap when a high safety multiplier was applied to outputIntent |
| evidenceConfidence | exact \| related \| prior \| cold-start |
| signature / signatureKey | Canonical prediction identity |
| evidence, explanation, riskPolicy | Debug / Studio explainability |
| reasoningBudget, contextBudget, clamping | Budget breakdown when applicable |
Predict-only mode (no Activix)
const optimixer = await Optimixer.create({ persistence: 'predict-only' });
const prediction = await optimixer.predictAiMaxTokens({
templateId: 'classify',
inputSize: 7500,
reasoningEffort: 'not-applicable',
outputIntent: { mode: 'fixed', expectedVisibleTokens: 10 },
});
// prediction.requestId === ''
Or use the pure function with an empty bucket store:
import { predictAiMaxTokensV1, BucketStore } from '@x12i/optimixer';
Unknown model defaults (FR-OPT-6)
import { UNKNOWN_MODEL_DEFAULTS } from '@x12i/optimixer';
// { contextWindow: 128_000, maxOutputTokens: 16_384, outputTokenParam: 'max_tokens' }
Pass explicit modelProfile.contextWindow / maxOutputTokens to override catalog defaults. Known models resolve from bundled @x12i/ai-profiles ^3.0.0 via a single resolveBundledInput call; reasoning capability uses isReasoningModel for CR-14 effort reconciliation when reasoning.enabled is unset.
Configuration
Optimixer.create({
activixClient,
activixCollection: 'ai-gateway-activities',
pipelines: {
aiMaxTokens: {
enabled: true,
defaultMaxTokens: 4096,
stats: { minSamplesExact: 20, minSamplesFallback: 10 },
bootstrap: {
minSamplesBeforePredict: 2, // predict from history on 3rd call per templateId
safetyMultiplier: 4, // high budget during bootstrap (tune up e.g. 6)
earlyPercentile: 'p99', // conservative until exact bucket reaches minSamplesExact
},
contextReserve: { minTokens: 1000, percentOfContext: 0.02, maxTokens: 8000 },
evidenceStrategy: { mode: 'best_available', minSamplesExact: 20 },
warmup: { enabled: true, profileFirst: true, rawRows: { maxRecords: 500 } },
summaryPersistence: { enabled: true, mode: 'batched', batchIntervalMs: 30000 },
templates: [/* local task/output budgets */],
policies: [/* risk → percentile overrides */],
},
},
});
Dry-run
const preview = await optimixer.predictAiMaxTokens({
templateId: 'summarize',
inputSize: 1200,
contextSize: 800,
acceptableRisk: 'medium',
reasoningEffort: 'not-applicable',
outputIntent: { mode: 'relative', expectedVisibleTokens: 900 },
modelProfile: { provider: 'openai', model: 'gpt-4o-mini' },
dryRun: true,
});
// preview.requestId === ''
// complete without requestId is a no-op (does not throw)
Stats API
const stats = optimixer.getAiMaxTokensStats({ templateId: 'summarize', model: 'gpt-4o-mini' });
// stats.buckets[].sampleCount, completionTokens percentiles, rates
Standalone mode
const optimixer = await Optimixer.create({
activix: {
collection: 'optimixer-activities',
mongoUri: process.env.MONGO_URI,
storageMode: 'automatic',
},
});
Requires @x12i/activix as a peer dependency.
Activix records
Optimixer has no dedicated Mongo collection. In learning mode it writes standard Activix activity rows (default collection ai-actions, database activitix).
Find prediction rows:
db.getCollection('ai-actions').find({
'outer.metadata.optimizer': 'optimixer',
'outer.metadata.kind': 'optimixer:prediction',
'outer.metadata.pipelineId': 'ai.max_tokens.v1',
status: 'completed',
}).sort({ startTime: -1 }).limit(20)
Find profile summary rows:
db.getCollection('ai-actions').find({
'outer.metadata.kind': 'optimixer:profile',
'outer.metadata.pipelineId': 'ai.max_tokens.v1',
})
| Location | Content |
|----------|---------|
| outer.input | Normalized prediction request (outputIntent, sizes, model profile, …) |
| outer.output | Full decision; on complete adds actual, learning, optional event |
| outer.optimixer | Telemetry: phase, bootstrap multiplier, prediction vs actual fit, token breakdown |
| outer.metadata.kind | optimixer:prediction (ACTIVIX_METADATA_KIND_OPTIMIXER_PREDICTION) |
| outer.metadata.pipelineId | ai.max_tokens.v1 |
outer.optimixer schema
Written at predict; enriched at complete:
| Field | Meaning |
|-------|---------|
| phase | bootstrap \| early \| mature |
| templateSampleCount | Template-level samples at predict time |
| bootstrap.applied / bootstrap.multiplier | Whether high-safety bootstrap multiplier was used |
| prediction | Predicted visible/completion tokens, recommendedMaxTokens, outputIntent |
| actual | Observed prompt, reasoning, visible, completion, total tokens |
| fit.status | within_budget \| exceeded_prediction \| truncated |
| fit.exceededByTokens | How far completion exceeded recommendedMaxTokens when truncated |
| fit.predictedVsActualDelta | Predicted visible minus actual visible |
| fit.intentVsActualRatio | Actual visible / input tokens (relative mode) |
| fit.tokenBreakdown | inputShare, reasoningShare, visibleShare of total tokens |
Profile aggregate rows use outer.metadata.kind = optimixer:profile.
Bootstrap learning phases
| Phase | Trigger | Budget strategy |
|-------|---------|-----------------|
| bootstrap | Fewer than bootstrap.minSamplesBeforePredict (default 2) completed samples for this templateId | outputIntent estimate × bootstrap.safetyMultiplier (default 4); global/fallback history ignored |
| early | Template bootstrapped but exact bucket < minSamplesExact (20) | History at bootstrap.earlyPercentile (default p99) + headroom |
| mature | Exact bucket ≥ 20 samples | Normal risk → percentile mapping |
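The phase triggers in the table above can be sketched as a simple decision function. The names (`learningPhaseSketch`, `bootstrapBudgetSketch`, `PhaseConfigSketch`) are hypothetical; the defaults are the ones quoted in the table, and the real pipeline may apply additional clamps after the multiplier.

```typescript
// Hypothetical sketch of phase selection and the bootstrap multiplier.
type LearningPhase = 'bootstrap' | 'early' | 'mature';

interface PhaseConfigSketch {
  minSamplesBeforePredict: number; // bootstrap.minSamplesBeforePredict
  minSamplesExact: number;         // stats.minSamplesExact
  safetyMultiplier: number;        // bootstrap.safetyMultiplier
}

const PHASE_DEFAULTS: PhaseConfigSketch = {
  minSamplesBeforePredict: 2,
  minSamplesExact: 20,
  safetyMultiplier: 4,
};

function learningPhaseSketch(
  templateSamples: number,
  exactBucketSamples: number,
  cfg: PhaseConfigSketch = PHASE_DEFAULTS,
): LearningPhase {
  if (templateSamples < cfg.minSamplesBeforePredict) return 'bootstrap';
  if (exactBucketSamples < cfg.minSamplesExact) return 'early';
  return 'mature';
}

// During bootstrap the outputIntent estimate is scaled by the safety multiplier.
function bootstrapBudgetSketch(
  intentEstimate: number,
  cfg: PhaseConfigSketch = PHASE_DEFAULTS,
): number {
  return intentEstimate * cfg.safetyMultiplier;
}
```

With the defaults, the first two calls for a templateId fall in bootstrap, so an estimate of 900 visible tokens becomes a 3,600-token starting point before any other clamps.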
Max-token-limit failures
Always call completeAiMaxTokensPrediction when the provider stops for token limit (finishReason such as length). Optimixer:
- Records outer.output.event.type = 'max_tokens_too_low'
- Learns a lower bound (not “required = truncated count”)
- May return retryPrediction with bumped providerParams
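The report-then-retry flow above can be wrapped in one helper. This sketch uses minimal structural types instead of importing the package, so every name in it (`callWithTokenLimitRetry`, `CallLlm`, `CompleteFn`, and the `*Like` interfaces) is hypothetical; in a real host, `complete` would delegate to `optimixer.completeAiMaxTokensPrediction` and `callLlm` to your AI client.

```typescript
// Hypothetical sketch: always report the outcome, retry once if a bumped budget is offered.
type ProviderParamsLike = Record<string, number>;

interface PredictionLike {
  requestId: string;
  providerParams: ProviderParamsLike;
}

interface ActualUsageLike {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
  finishReason: string; // 'length' signals a token-limit stop
}

interface LlmResponseLike {
  text: string;
  usage: ActualUsageLike;
}

type CallLlm = (params: ProviderParamsLike) => Promise<LlmResponseLike>;
type CompleteFn = (args: {
  requestId: string;
  actual: ActualUsageLike;
}) => Promise<{ retryPrediction?: PredictionLike }>;

async function callWithTokenLimitRetry(
  prediction: PredictionLike,
  callLlm: CallLlm,
  complete: CompleteFn,
): Promise<LlmResponseLike> {
  const first = await callLlm(prediction.providerParams);

  // Report every outcome, including truncated ones, so the lower bound is learned.
  const result = await complete({
    requestId: prediction.requestId,
    actual: first.usage,
  });

  if (first.usage.finishReason !== 'length' || !result.retryPrediction) {
    return first;
  }

  // Single retry with the bumped params; reporting the retry's own outcome is
  // left to the caller in this sketch.
  return callLlm(result.retryPrediction.providerParams);
}
```

Limiting the helper to one retry keeps a persistently truncating template from looping; a host that wants more attempts could loop on `retryPrediction` with its own cap.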
Warmup
On Optimixer.create() in learning mode, Optimixer:
- Loads optimixer:profile summaries when present (warmup.profileFirst)
- Replays recent completed optimixer:prediction rows via findRecords
- Rebuilds multi-level in-memory bucket stats
3.3.0 resilience: each raw row is restored and normalized through the same path as live predict (resolveModelProfile + CR-14 effort reconcile). Rows that cannot be normalized (missing required fields, unrecoverable input) are skipped — logged at optimixer.warmup.row.skipped — without failing init. Mixed alias/concrete historical modelProfile values (e.g. cheap/default, openrouter + gemini-2.5-flash-lite) are supported.
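The skip-on-failure behavior described above follows a common pattern: normalize each row inside its own try/catch so one bad row cannot abort the whole load. The sketch below uses hypothetical row shapes and names (`RawRowSketch`, `normalizeRowSketch`, `warmupRowsSketch`); the real warmup normalizes full prediction records and logs through logxer rather than a callback.

```typescript
// Hypothetical sketch of warmup that skips rows it cannot normalize.
interface RawRowSketch {
  templateId?: unknown;
  completionTokens?: unknown;
}

interface NormalizedRowSketch {
  templateId: string;
  completionTokens: number;
}

function normalizeRowSketch(row: RawRowSketch): NormalizedRowSketch {
  const { templateId, completionTokens } = row;
  if (typeof templateId !== 'string' || templateId === '') {
    throw new Error('missing templateId');
  }
  if (typeof completionTokens !== 'number' || !isFinite(completionTokens)) {
    throw new Error('invalid completionTokens');
  }
  return { templateId, completionTokens };
}

function warmupRowsSketch(
  rows: RawRowSketch[],
  onSkip: (reason: string) => void = () => {},
): { loaded: NormalizedRowSketch[]; skipped: number } {
  const loaded: NormalizedRowSketch[] = [];
  let skipped = 0;
  for (const row of rows) {
    try {
      loaded.push(normalizeRowSketch(row));
    } catch (err) {
      // One bad row is counted and reported, never rethrown.
      skipped += 1;
      onSkip(err instanceof Error ? err.message : String(err));
    }
  }
  return { loaded, skipped };
}
```

Valid rows still load when messy rows sit between them, which is the property the resilience guarantee depends on.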
Diagnostics (@x12i/logxer)
Optimixer is the only Activitix monorepo package that depends on @x12i/logxer (pinned 4.5.0). Activix itself uses lightweight console diagnostics.
- ENABLE_OPTIMIXER_LOGXER=true: enable full logxer output (errors-only by default)
- OPTIMIXER_LOGS_LEVEL: debug, info, warn, error, verbose, or off/none/silent (default warn when enabled and unset)
- LOGXER_PACKAGE_LEVELS: bulk levels, e.g. OPTIMIXER:debug,OTHER_PKG:off (see logxer docs/package-log-levels-stack.md)
- logging?: StackLoggingOptions on Optimixer.create and logger helpers; pass host levels in code
import {
createOptimixerLogxer,
isOptimixerDiagnosticLoggingEnabled,
resolveOptimixerInternalLogger,
type StackLoggingOptions,
} from '@x12i/optimixer';
const log = createOptimixerLogxer({
logging: { packageLevels: { OPTIMIXER: 'debug' } },
});
Environment configuration
Operators configure Activix and Optimixer using a unified settings paradigm:
# Master enable switch (Default: true)
OPTIMIXER_ENABLED=true
# MongoDB URI ladder (Resolves OPTIMIXER_MONGO_URI ?? MONGO_LOGS_URI ?? MONGO_URI)
OPTIMIXER_MONGO_URI=mongodb://localhost:27017
# MongoDB Database Name (Resolves ACTIVIX_DB_NAME ?? MONGO_AI_LOGS_DB ?? MONGO_LOGS_DB ?? MONGO_DB ?? 'activitix')
MONGO_LOGS_DB=activitix
# MongoDB Collection Target (Resolves OPTIMIXER_ACTIVIX_COLLECTION ?? ACTIVIX_EXTRA_ACTIVITY_COLLECTIONS with 'ai-actions' ?? 'ai-actions')
OPTIMIXER_ACTIVIX_COLLECTION=ai-actions
# Dev warmup sample cap limit (Default: 500)
OPTIMIXER_WARMUP_MAX_RECORDS=500
# Optional cold-start default max tokens pipeline fallback
OPTIMIXER_DEFAULT_MAX_TOKENS=2000
# Optional fixed buffer added to every recommended max tokens result
OPTIMIXER_MAX_TOKENS_BUFFER=256
Load and resolve these values in Node runtimes or server scripts using the built-in helper:
import { resolveOptimixerActivixConfigFromEnv } from '@x12i/optimixer';
const config = resolveOptimixerActivixConfigFromEnv(process.env);
console.log(config.activixCollection); // e.g. "ai-actions"
console.log(config.dbName); // e.g. "activitix"
Dashboards, Vite BFF routes, and React UI for operator tooling live in the client app (Studio), not in this package. The client calls Optimixer.create, predictAiMaxTokens, getAiMaxTokensStats, etc. over its own HTTP layer.
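The URI and database-name ladders listed in the environment comments above can be read as plain nullish-coalescing chains. The function below is a hypothetical illustration (`resolveMongoSketch` is not part of the package) and leaves out the collection ladder. Note that `??` only skips `undefined` and `null`, so an empty string would still win in this sketch; whether the built-in helper treats empty values differently is not stated here.

```typescript
// Hypothetical sketch of the documented fallback ladders using ??.
type EnvSketch = Record<string, string | undefined>;

function resolveMongoSketch(env: EnvSketch): {
  mongoUri: string | undefined;
  dbName: string;
} {
  return {
    // OPTIMIXER_MONGO_URI ?? MONGO_LOGS_URI ?? MONGO_URI
    mongoUri:
      env['OPTIMIXER_MONGO_URI'] ?? env['MONGO_LOGS_URI'] ?? env['MONGO_URI'],
    // ACTIVIX_DB_NAME ?? MONGO_AI_LOGS_DB ?? MONGO_LOGS_DB ?? MONGO_DB ?? 'activitix'
    dbName:
      env['ACTIVIX_DB_NAME'] ??
      env['MONGO_AI_LOGS_DB'] ??
      env['MONGO_LOGS_DB'] ??
      env['MONGO_DB'] ??
      'activitix',
  };
}
```

A host that only sets MONGO_URI would get that URI together with the default database name, while a more specific variable takes precedence when both are present.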
Monorepo
| Topic | Doc |
|-------|-----|
| Architecture deep-dive | explained.md |
| Consumer migration | ../../.docs/MIGRATION-CONSUMERS.md |
| Publishing | ../../.docs/PUBLISHING.md |
| Changelog | CHANGELOG.md |
Scripts
npm run build
npm run test:all
From repo root: npm run build:optimixer · npm run test:optimixer
License
Athenix License
