@safetnsr/autotune v0.1.0
GPU-poor autoresearch — calibrate any scoring tool against ground truth
# autotune
GPU-poor autoresearch. calibrate any scoring tool by iterating parameters against ground truth.
inspired by karpathy/autoresearch — the same pattern (iterate, measure, keep or discard) applied without a GPU.
## what it does
you have a scoring tool with tunable parameters. you have a dataset with expected results. autotune:
- mutates your parameters
- runs your evaluation
- measures the metric
- keeps improvements, discards failures
- repeats
no machine learning. no GPU. just systematic parameter search with a clear metric.
## quickstart
```
npx @safetnsr/autotune init
```

this creates three files:

- `autotune.json` — defines your parameters and their ranges
- `params.json` — your current parameter values
- `eval.js` — your evaluation script (you implement this)

edit eval.js to run your tool against a dataset and output a metric. then:

```
npx @safetnsr/autotune run --iterations 100
```

autotune will iterate 100 times, logging each attempt and keeping only improvements.
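for a concrete picture of what eval.js might look like, here is a minimal sketch. everything in it is a hypothetical stand-in, not autotune's actual interface: the dataset shape, the scoring function, and the "print one number to stdout" output contract are all assumptions, so start from the file `autotune init` scaffolds.

```javascript
// eval.js (hypothetical sketch). a real eval.js would read params.json and
// your dataset from disk; inline data here just keeps the sketch runnable.
const params = { weight: 0.5, bias: 1 }; // normally read from params.json

// hypothetical dataset of { loc, expected } pairs with known-good scores
const dataset = [
  { loc: 10, expected: 7 },
  { loc: 40, expected: 18 },
  { loc: 80, expected: 45 },
];

// hypothetical scoring function driven by the tunable params
const score = (item) => params.weight * item.loc + params.bias;

// pearson correlation between predicted and expected scores
function pearson(xs, ys) {
  const n = xs.length;
  const mx = xs.reduce((a, b) => a + b, 0) / n;
  const my = ys.reduce((a, b) => a + b, 0) / n;
  let num = 0, dx = 0, dy = 0;
  for (let i = 0; i < n; i++) {
    num += (xs[i] - mx) * (ys[i] - my);
    dx += (xs[i] - mx) ** 2;
    dy += (ys[i] - my) ** 2;
  }
  return num / Math.sqrt(dx * dy);
}

// output contract assumed here: the metric as a single number on stdout
console.log(pearson(dataset.map(score), dataset.map((d) => d.expected)).toFixed(4));
```

the only hard requirement is that your script ends up producing the metric named in `autotune.json`; how you score the dataset in between is entirely up to you.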
## example: calibrating a code quality scorer
we used autotune to calibrate vet against 43 public repos. correlation went from -0.32 to +0.83 across 13 iterations.
```json
{
  "name": "vet-calibration",
  "params": "thresholds.json",
  "eval": "bash run-calibration.sh",
  "metric": "correlation",
  "higherIsBetter": true,
  "maxIterations": 50,
  "strategy": "hill-climb",
  "paramDefs": [
    {
      "path": "category_weights",
      "type": "weight-group",
      "step": 0.05,
      "members": [
        "category_weights.security",
        "category_weights.integrity",
        "category_weights.debt",
        "category_weights.deps"
      ]
    },
    {
      "path": "integrity.empty_catch_error_penalty",
      "type": "number",
      "min": 1,
      "max": 15,
      "step": 1
    }
  ]
}
```

full writeup: from -0.32 to +0.83
## use cases
- scoring algorithms — calibrate weights and thresholds against labeled data
- prompt optimization — iterate on prompt templates, measure output quality
- config tuning — optimize build configs, deploy settings, rate limits
- any parameter search — if you can measure it, you can autotune it
## how it works
autotune uses hill-climbing with random restarts:
- pick 1-3 random parameters
- mutate each by 1-3 steps in a random direction
- run eval, extract metric
- if metric improved → keep new params
- if metric worsened → revert to previous best
- repeat
results are logged as JSONL in .autotune/ — every iteration, every change, every metric.
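the loop above can be sketched in a few lines. this is an illustration of the strategy only: the mutation sizes and the JSONL record fields are assumptions, not autotune's actual source.

```javascript
// hill-climb sketch: mutate params, keep improvements, revert failures.
// illustration only, not autotune's actual implementation.
function hillClimb(params, evaluate, { iterations = 100, step = 0.05 } = {}) {
  let best = { ...params };
  let bestMetric = evaluate(best);
  const log = [];
  for (let i = 0; i < iterations; i++) {
    const candidate = { ...best };
    // pick 1-3 random parameters and nudge each 1-3 steps either way
    const keys = Object.keys(candidate);
    const nPicks = 1 + Math.floor(Math.random() * Math.min(3, keys.length));
    for (let p = 0; p < nPicks; p++) {
      const key = keys[Math.floor(Math.random() * keys.length)];
      const nSteps = 1 + Math.floor(Math.random() * 3);
      const dir = Math.random() < 0.5 ? -1 : 1;
      candidate[key] += dir * nSteps * step;
    }
    const metric = evaluate(candidate);
    const kept = metric > bestMetric; // assumes higherIsBetter: true
    if (kept) { best = candidate; bestMetric = metric; }
    // one JSONL record per iteration (hypothetical field names)
    log.push(JSON.stringify({ iteration: i, metric, kept }));
  }
  return { best, bestMetric, log };
}
```

for example, `hillClimb({ x: 0 }, (p) => -((p.x - 3) ** 2))` walks `x` toward 3: any mutation that moves `x` closer raises the metric and is kept, and any that moves it away is reverted.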
## cli
```
autotune init                    # scaffold project
autotune run [--iterations N]    # run optimization loop
autotune status                  # show latest run summary
autotune help                    # show help
```

## programmatic
```js
import fs from 'node:fs';
import { run } from '@safetnsr/autotune';

const config = JSON.parse(fs.readFileSync('autotune.json', 'utf-8'));
const summary = run(config, process.cwd());
console.log(`improved from ${summary.startMetric} to ${summary.bestMetric}`);
```

## license
MIT
