# claude-skill-autoresearch

A Claude Code skill that applies the autoresearch methodology to any LLM-powered project — API routes, prompt iteration, agents, evaluation pipelines, or full model training.
## Install

```sh
npm install -g claude-skill-autoresearch
```

The skill is automatically copied to `~/.claude/skills/autoresearch.md` on install.
## Usage

In any Claude Code session:

```
/autoresearch
```

Claude will apply the autoresearch discipline to whatever AI work you're doing.
## What it does
The autoresearch methodology was originally designed for autonomous LLM training experiments (run overnight, keep/discard based on a single metric). This skill generalizes those principles to any AI work.
### The universal structure

Every AI task maps to the same three parts:
| Role | LLM Application | Model Training |
|---|---|---|
| Modifiable | Prompt, context, schema, model params | Architecture, optimizer, hyperparameters |
| Locked | Evaluation function, test set, metric | Dataloader, tokenizer, time budget |
| Findings log | `findings.json` | `results.tsv` |
### Core principles

These apply to everything:

- Single metric, chosen upfront, never changed mid-experiment
- One change at a time — isolate what caused the improvement
- Keep/discard via `git reset` — no exceptions
- Simplicity criterion — equal metric with less code is a win
- Autonomous loop — never stop to ask, run until manually halted
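The principles above compose into a single loop. A minimal sketch, assuming hypothetical `run_experiment`, `propose_change`, `commit`, and `revert` hooks (these names are illustrative, not part of the skill):

```python
import time

def autoresearch_loop(run_experiment, propose_change, revert, commit,
                      baseline, budget_s=8 * 3600, clock=time.time):
    """Autonomous keep/discard loop: one change per iteration, judged by a
    single pre-chosen metric; worse results are reverted, never argued with."""
    best = baseline
    findings = []
    deadline = clock() + budget_s
    while clock() < deadline:          # run until the time budget, never pause to ask
        change = propose_change(findings)  # exactly one modification at a time
        if change is None:                 # proposer has nothing left to try
            break
        metric = run_experiment()
        kept = metric >= best              # simplicity criterion: ties count as wins
        if kept:
            best = metric
            commit(change)                 # e.g. git commit -am "<change>"
        else:
            revert()                       # e.g. git reset --hard, no exceptions
        findings.append({"change": change, "metric": metric, "kept": kept})
    return findings
```

In practice `commit` and `revert` would shell out to git; they are injected here so the loop itself stays testable.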
### Mode A: LLM Application

For API routes, prompt chains, and agents:

- `findings.json` pattern: log every run's change, metric, status, and observations
- History injection: inject prior runs into the prompt so the LLM tracks its own progress
- Context accumulation: failures → system prompt constraints; successes → few-shot examples
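A sketch of the `findings.json` pattern and history injection, assuming an illustrative record shape (the field names here are an assumption, not a schema the skill mandates):

```python
import json
from pathlib import Path

def log_run(path, change, metric, status, observations):
    """Append one run to findings.json so every experiment leaves a record."""
    runs = json.loads(path.read_text()) if path.exists() else []
    runs.append({"change": change, "metric": metric,
                 "status": status, "observations": observations})
    path.write_text(json.dumps(runs, indent=2))
    return runs

def inject_history(system_prompt, runs, limit=5):
    """History injection: prepend recent runs so the LLM sees its own progress."""
    lines = [f"- {r['change']}: {r['metric']} ({r['status']})" for r in runs[-limit:]]
    return system_prompt + "\n\nPrior runs:\n" + "\n".join(lines)
```

Discarded runs stay in the log on purpose: a failure recorded as an observation can later become a system prompt constraint.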
### Mode B: Model Training

For PyTorch/JAX training loops:

- Modern transformer architecture defaults (RoPE, Flash Attention, GQA, RMSNorm, softcap)
- Heterogeneous optimizer (Muon for matrices, AdamW for embeddings/scalars)
- Training loop hygiene (GC management, loss-explosion fast-fail, time budget)
## Origin
Based on Andrej Karpathy's autoresearch project — a single-GPU LLM training setup designed for autonomous overnight experimentation by AI agents.
## License
MIT
