create-autoresearch v0.1.0
create-autoresearch
Scaffold autonomous AI research loops for any web application. Adapts karpathy/autoresearch — where an AI agent autonomously optimizes ML training code — for webapp optimization.
The agent modifies your code, runs evaluation, keeps improvements, reverts failures, and repeats forever. You wake up to a log of experiments and a better codebase.
Quick Start
# Run in your webapp project directory
npx create-autoresearch
# Test the evaluation harness
bash autoresearch/evaluate.sh --mode quality > run.log 2>&1
grep "^GUARDRAILS:\|^SCORE:" run.log
# Start an autonomous session
# Open Claude Code and say:
# "Read autoresearch/program.md and kick off clean mode"

What It Does
create-autoresearch scaffolds an autoresearch/ directory into your project with:
- program.md — Agent instructions (the "skill" the AI reads)
- 9 mode docs — Strategy guides for each optimization area
- Evaluation harness — Shell scripts + TypeScript analysis tools that measure a 0-100 score
- Guardrails — Hard constraints that prevent the agent from breaking your build, tests, or compliance
The agent runs the loop autonomously:
modify code → commit → evaluate → score improved? → keep : revert → repeat

Nine Research Modes
| Mode | What It Optimizes | Metric |
|------|-------------------|--------|
| perf | Page load, bundle size, Core Web Vitals | Lighthouse + LCP + bundle KB |
| quality | Tests, types, lint | Coverage + TS errors + lint issues |
| feature | Build features against specs | Acceptance test pass rate |
| clean | Remove dead code, organize files, fix stale docs | Unused exports + dead files + duplicates |
| security | Vulnerabilities, auth gaps, injection risks | npm audit + unauth'd routes + secrets |
| a11y | WCAG compliance | Lighthouse a11y + axe violations + alt text |
| wiring | Frontend↔backend integration | Orphan endpoints + broken refs + missing error states |
| styling | Theme/design system consistency | Hardcoded values + dark mode gaps + raw HTML elements |
| marketing | SEO, meta tags, structured data | Lighthouse SEO + meta coverage + schema.org |
Auto-Detection
The CLI automatically detects:
- Framework: Next.js (App/Pages Router), Nuxt, SvelteKit, React+Vite, Remix
- Directories: app/, src/, components/, lib/, hooks/, __tests__/
- Design system: shadcn/ui, Headless UI, Mantine, Tailwind
- Integrations: Supabase, Prisma, Stripe, Clerk, Auth0, Firebase
- npm scripts: build, type-check/typecheck, test, lint
Then it asks 4-6 questions about what it can't detect (marketing pages directory, compliance profile, domain description).
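The detection logic is internal to the CLI, but the idea can be sketched as a lookup against package.json dependencies. This is an illustrative assumption, not the package's actual implementation; the function name and priority order are made up:

```typescript
// Simplified sketch of framework detection from package.json dependencies.
// The real CLI's logic is internal; names and check order here are assumptions.
type Framework =
  | "nextjs" | "nuxt" | "sveltekit" | "remix" | "react-vite" | "unknown";

function detectFramework(deps: Record<string, string>): Framework {
  if ("next" in deps) return "nextjs";
  if ("nuxt" in deps) return "nuxt";
  if ("@sveltejs/kit" in deps) return "sveltekit";
  if ("@remix-run/react" in deps) return "remix";
  // React without a meta-framework, bundled by Vite
  if ("react" in deps && "vite" in deps) return "react-vite";
  return "unknown";
}
```

For example, detectFramework({ next: "14.2.0", react: "18.3.1" }) resolves to "nextjs" because the meta-framework check wins over the plain React check.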
Configuration
After init, autoresearch.config.ts in your project root contains all settings:
{
framework: 'nextjs-app',
dirs: { app: 'app', components: 'components', api: 'app/api', ... },
scripts: { build: 'build', typecheck: 'type-check', test: 'test', lint: 'lint' },
designSystem: { componentLibrary: 'shadcn', cssFramework: 'tailwind', fonts: ['Inter'], ... },
integrations: { auth: 'supabase', database: 'supabase', payments: 'stripe' },
compliance: { profile: 'standard', banConsoleLog: false },
api: { publicRoutes: ['health', 'webhooks', 'cron'], authPatterns: [...] },
domain: { description: 'E-commerce platform', notes: { perf: [...], ... } },
}

Edit the config, then run npx create-autoresearch --regenerate to update all generated files.
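The shape of that config can be expressed as a TypeScript type. The sketch below is inferred from the example above only; field names not shown in the example are assumptions, and the package may export its own (different) type:

```typescript
// Illustrative type for autoresearch.config.ts, inferred from the README
// example. Not the package's actual exported type.
interface AutoresearchConfig {
  framework: string;
  dirs: Record<string, string>;
  scripts: { build: string; typecheck: string; test: string; lint: string };
  designSystem: { componentLibrary: string; cssFramework: string; fonts: string[] };
  integrations: Partial<Record<"auth" | "database" | "payments", string>>;
  compliance: { profile: string; banConsoleLog: boolean };
  api: { publicRoutes: string[]; authPatterns: string[] };
  domain: { description: string; notes: Record<string, string[]> };
}

// The example from above, typed:
const config: AutoresearchConfig = {
  framework: "nextjs-app",
  dirs: { app: "app", components: "components", api: "app/api" },
  scripts: { build: "build", typecheck: "type-check", test: "test", lint: "lint" },
  designSystem: { componentLibrary: "shadcn", cssFramework: "tailwind", fonts: ["Inter"] },
  integrations: { auth: "supabase", database: "supabase", payments: "stripe" },
  compliance: { profile: "standard", banConsoleLog: false },
  api: { publicRoutes: ["health", "webhooks", "cron"], authPatterns: [] },
  domain: { description: "E-commerce platform", notes: { perf: [] } },
};
```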
How the Agent Loop Works
- Agent creates a branch: autoresearch/&lt;tag&gt;-&lt;mode&gt;
- Runs baseline evaluation, records score in results.tsv
- Makes a small code change, commits
- Runs bash autoresearch/evaluate.sh --mode &lt;mode&gt;
- Guardrails pass? Build, type-check, tests must all succeed
- Score improved? Keep the commit. Otherwise, revert.
- Logs result to results.tsv
- Repeats forever until you stop it
Each experiment takes 2-8 minutes depending on the mode. Overnight (~8 hours), expect ~60-100 experiments.
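The keep-or-revert decision at the heart of the loop can be sketched from the evaluate.sh output. The exact "GUARDRAILS:" and "SCORE:" line formats below are assumptions based on the grep pattern in Quick Start; check your generated evaluate.sh for the real format:

```typescript
// Sketch of the keep-or-revert decision from an evaluate.sh run log.
// Line formats ("GUARDRAILS: PASS", "SCORE: 74") are assumptions.

// Extract the 0-100 score from a run log, or null if absent.
function parseScore(log: string): number | null {
  const m = log.match(/^SCORE:\s*([\d.]+)/m);
  return m ? Number(m[1]) : null;
}

// Guardrails line must report PASS (build, type-check, tests all succeeded).
function guardrailsPassed(log: string): boolean {
  return /^GUARDRAILS:\s*PASS/m.test(log);
}

// Keep the commit only if guardrails pass AND the score beat the baseline.
function shouldKeep(log: string, baseline: number): boolean {
  const score = parseScore(log);
  return guardrailsPassed(log) && score !== null && score > baseline;
}
```

So shouldKeep("GUARDRAILS: PASS\nSCORE: 74", 71) keeps the commit, while a failed guardrail reverts it regardless of score — guardrails are a hard gate, not part of the score.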
Requirements
- Node.js 18+
- A webapp with package.json and a working npm run build
- An AI coding agent (Claude Code, Codex, etc.)
Optional but recommended:
- TypeScript with a type-check or typecheck script (guardrails skip this check if missing)
- Test suite with a test or test:unit script (test:unit is preferred, to avoid integration tests that need env vars)
- For perf/a11y/marketing modes: the lighthouse CLI (npm i -D lighthouse)
Flags
| Flag | Description |
|------|-------------|
| --dry-run | Show detected config without writing files |
| --regenerate | Re-render all generated files from config |
Inspired By
karpathy/autoresearch — autonomous ML research where an AI agent modifies training code, runs 5-minute experiments, and iterates overnight. This project adapts the same pattern for web applications.
License
MIT
