@safetnsr/autotune v0.1.0
GPU-poor autoresearch — calibrate any scoring tool against ground truth
# autotune
GPU-poor autoresearch. calibrate any scoring tool by iterating parameters against ground truth.
inspired by karpathy/autoresearch — the same pattern (iterate, measure, keep or discard) applied without a GPU.
## what it does
you have a scoring tool with tunable parameters. you have a dataset with expected results. autotune:
- mutates your parameters
- runs your evaluation
- measures the metric
- keeps improvements, discards failures
- repeats
no machine learning. no GPU. just systematic parameter search with a clear metric.
## quickstart
```
npx @safetnsr/autotune init
```

this creates three files:

- `autotune.json` — defines your parameters and their ranges
- `params.json` — your current parameter values
- `eval.js` — your evaluation script (you implement this)

edit eval.js to run your tool against a dataset and output a metric. then:

```
npx @safetnsr/autotune run --iterations 100
```

autotune will iterate 100 times, logging each attempt and keeping only improvements.
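for a concrete picture of what eval.js might look like, here is a minimal sketch. everything in it is a hypothetical stand-in, not autotune's actual interface: the dataset shape, the scoring function, and the "print one number to stdout" output contract are all assumptions, so start from the file `autotune init` scaffolds.

```javascript
// eval.js (hypothetical sketch). a real eval.js would read params.json and
// your dataset from disk; inline data here just keeps the sketch runnable.
const params = { weight: 0.5, bias: 1 }; // normally read from params.json

// hypothetical dataset of { loc, expected } pairs with known-good scores
const dataset = [
  { loc: 10, expected: 7 },
  { loc: 40, expected: 18 },
  { loc: 80, expected: 45 },
];

// hypothetical scoring function driven by the tunable params
const score = (item) => params.weight * item.loc + params.bias;

// pearson correlation between predicted and expected scores
function pearson(xs, ys) {
  const n = xs.length;
  const mx = xs.reduce((a, b) => a + b, 0) / n;
  const my = ys.reduce((a, b) => a + b, 0) / n;
  let num = 0, dx = 0, dy = 0;
  for (let i = 0; i < n; i++) {
    num += (xs[i] - mx) * (ys[i] - my);
    dx += (xs[i] - mx) ** 2;
    dy += (ys[i] - my) ** 2;
  }
  return num / Math.sqrt(dx * dy);
}

// output contract assumed here: the metric as a single number on stdout
console.log(pearson(dataset.map(score), dataset.map((d) => d.expected)).toFixed(4));
```

the only hard requirement is that your script ends up producing the metric named in `autotune.json`; how you score the dataset in between is entirely up to you.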
## example: calibrating a code quality scorer
we used autotune to calibrate vet against 43 public repos. correlation went from -0.32 to +0.83 across 13 iterations.
```json
{
  "name": "vet-calibration",
  "params": "thresholds.json",
  "eval": "bash run-calibration.sh",
  "metric": "correlation",
  "higherIsBetter": true,
  "maxIterations": 50,
  "strategy": "hill-climb",
  "paramDefs": [
    {
      "path": "category_weights",
      "type": "weight-group",
      "step": 0.05,
      "members": [
        "category_weights.security",
        "category_weights.integrity",
        "category_weights.debt",
        "category_weights.deps"
      ]
    },
    {
      "path": "integrity.empty_catch_error_penalty",
      "type": "number",
      "min": 1,
      "max": 15,
      "step": 1
    }
  ]
}
```

full writeup: from -0.32 to +0.83
## use cases
- scoring algorithms — calibrate weights and thresholds against labeled data
- prompt optimization — iterate on prompt templates, measure output quality
- config tuning — optimize build configs, deploy settings, rate limits
- any parameter search — if you can measure it, you can autotune it
## how it works
autotune uses hill-climbing with random restarts:
- pick 1-3 random parameters
- mutate each by 1-3 steps in a random direction
- run eval, extract metric
- if metric improved → keep new params
- if metric worsened → revert to previous best
- repeat
results are logged as JSONL in .autotune/ — every iteration, every change, every metric.
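the loop above can be sketched in a few lines. this is an illustration of the strategy only: the mutation sizes and the JSONL record fields are assumptions, not autotune's actual source.

```javascript
// hill-climb sketch: mutate params, keep improvements, revert failures.
// illustration only, not autotune's actual implementation.
function hillClimb(params, evaluate, { iterations = 100, step = 0.05 } = {}) {
  let best = { ...params };
  let bestMetric = evaluate(best);
  const log = [];
  for (let i = 0; i < iterations; i++) {
    const candidate = { ...best };
    // pick 1-3 random parameters and nudge each 1-3 steps either way
    const keys = Object.keys(candidate);
    const nPicks = 1 + Math.floor(Math.random() * Math.min(3, keys.length));
    for (let p = 0; p < nPicks; p++) {
      const key = keys[Math.floor(Math.random() * keys.length)];
      const nSteps = 1 + Math.floor(Math.random() * 3);
      const dir = Math.random() < 0.5 ? -1 : 1;
      candidate[key] += dir * nSteps * step;
    }
    const metric = evaluate(candidate);
    const kept = metric > bestMetric; // assumes higherIsBetter: true
    if (kept) { best = candidate; bestMetric = metric; }
    // one JSONL record per iteration (hypothetical field names)
    log.push(JSON.stringify({ iteration: i, metric, kept }));
  }
  return { best, bestMetric, log };
}
```

for example, `hillClimb({ x: 0 }, (p) => -((p.x - 3) ** 2))` walks `x` toward 3: any mutation that moves `x` closer raises the metric and is kept, and any that moves it away is reverted.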
## cli
```
autotune init                    # scaffold project
autotune run [--iterations N]    # run optimization loop
autotune status                  # show latest run summary
autotune help                    # show help
```

## programmatic
```js
import fs from 'node:fs';
import { run } from '@safetnsr/autotune';

const config = JSON.parse(fs.readFileSync('autotune.json', 'utf-8'));
const summary = run(config, process.cwd());
console.log(`improved from ${summary.startMetric} to ${summary.bestMetric}`);
```

## license
MIT
