create-autoresearch v0.1.0
create-autoresearch
Scaffold autonomous AI research loops for any web application. Adapts karpathy/autoresearch — where an AI agent autonomously optimizes ML training code — for webapp optimization.
The agent modifies your code, runs evaluation, keeps improvements, reverts failures, and repeats forever. You wake up to a log of experiments and a better codebase.
Quick Start
# Run in your webapp project directory
npx create-autoresearch
# Test the evaluation harness
bash autoresearch/evaluate.sh --mode quality > run.log 2>&1
grep "^GUARDRAILS:\|^SCORE:" run.log
# Start an autonomous session
# Open Claude Code and say:
# "Read autoresearch/program.md and kick off clean mode"

What It Does
create-autoresearch scaffolds an autoresearch/ directory into your project with:
- program.md — Agent instructions (the "skill" the AI reads)
- 9 mode docs — Strategy guides for each optimization area
- Evaluation harness — Shell scripts + TypeScript analysis tools that measure a 0-100 score
- Guardrails — Hard constraints that prevent the agent from breaking your build, tests, or compliance
The agent runs the loop autonomously:
modify code → commit → evaluate → score improved? → keep : revert → repeat

Nine Research Modes
| Mode | What It Optimizes | Metric |
|------|-------------------|--------|
| perf | Page load, bundle size, Core Web Vitals | Lighthouse + LCP + bundle KB |
| quality | Tests, types, lint | Coverage + TS errors + lint issues |
| feature | Build features against specs | Acceptance test pass rate |
| clean | Remove dead code, organize files, fix stale docs | Unused exports + dead files + duplicates |
| security | Vulnerabilities, auth gaps, injection risks | npm audit + unauth'd routes + secrets |
| a11y | WCAG compliance | Lighthouse a11y + axe violations + alt text |
| wiring | Frontend↔backend integration | Orphan endpoints + broken refs + missing error states |
| styling | Theme/design system consistency | Hardcoded values + dark mode gaps + raw HTML elements |
| marketing | SEO, meta tags, structured data | Lighthouse SEO + meta coverage + schema.org |
Auto-Detection
The CLI automatically detects:
- Framework: Next.js (App/Pages Router), Nuxt, SvelteKit, React+Vite, Remix
- Directories: app/, src/, components/, lib/, hooks/, __tests__/
- Design system: shadcn/ui, Headless UI, Mantine, Tailwind
- Integrations: Supabase, Prisma, Stripe, Clerk, Auth0, Firebase
- npm scripts: build, type-check/typecheck, test, lint
Then it asks 4-6 questions about what it can't detect (marketing pages directory, compliance profile, domain description).
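The detection logic is internal to the CLI, but the idea can be sketched as a lookup against package.json dependencies. This is an illustrative assumption, not the package's actual implementation; the function name and priority order are made up:

```typescript
// Simplified sketch of framework detection from package.json dependencies.
// The real CLI's logic is internal; names and check order here are assumptions.
type Framework =
  | "nextjs" | "nuxt" | "sveltekit" | "remix" | "react-vite" | "unknown";

function detectFramework(deps: Record<string, string>): Framework {
  if ("next" in deps) return "nextjs";
  if ("nuxt" in deps) return "nuxt";
  if ("@sveltejs/kit" in deps) return "sveltekit";
  if ("@remix-run/react" in deps) return "remix";
  // React without a meta-framework, bundled by Vite
  if ("react" in deps && "vite" in deps) return "react-vite";
  return "unknown";
}
```

For example, detectFramework({ next: "14.2.0", react: "18.3.1" }) resolves to "nextjs" because the meta-framework check wins over the plain React check.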
Configuration
After init, autoresearch.config.ts in your project root contains all settings:
{
framework: 'nextjs-app',
dirs: { app: 'app', components: 'components', api: 'app/api', ... },
scripts: { build: 'build', typecheck: 'type-check', test: 'test', lint: 'lint' },
designSystem: { componentLibrary: 'shadcn', cssFramework: 'tailwind', fonts: ['Inter'], ... },
integrations: { auth: 'supabase', database: 'supabase', payments: 'stripe' },
compliance: { profile: 'standard', banConsoleLog: false },
api: { publicRoutes: ['health', 'webhooks', 'cron'], authPatterns: [...] },
domain: { description: 'E-commerce platform', notes: { perf: [...], ... } },
}

Edit the config, then run npx create-autoresearch --regenerate to update all generated files.
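The shape of that config can be expressed as a TypeScript type. The sketch below is inferred from the example above only; field names not shown in the example are assumptions, and the package may export its own (different) type:

```typescript
// Illustrative type for autoresearch.config.ts, inferred from the README
// example. Not the package's actual exported type.
interface AutoresearchConfig {
  framework: string;
  dirs: Record<string, string>;
  scripts: { build: string; typecheck: string; test: string; lint: string };
  designSystem: { componentLibrary: string; cssFramework: string; fonts: string[] };
  integrations: Partial<Record<"auth" | "database" | "payments", string>>;
  compliance: { profile: string; banConsoleLog: boolean };
  api: { publicRoutes: string[]; authPatterns: string[] };
  domain: { description: string; notes: Record<string, string[]> };
}

// The example from above, typed:
const config: AutoresearchConfig = {
  framework: "nextjs-app",
  dirs: { app: "app", components: "components", api: "app/api" },
  scripts: { build: "build", typecheck: "type-check", test: "test", lint: "lint" },
  designSystem: { componentLibrary: "shadcn", cssFramework: "tailwind", fonts: ["Inter"] },
  integrations: { auth: "supabase", database: "supabase", payments: "stripe" },
  compliance: { profile: "standard", banConsoleLog: false },
  api: { publicRoutes: ["health", "webhooks", "cron"], authPatterns: [] },
  domain: { description: "E-commerce platform", notes: { perf: [] } },
};
```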
How the Agent Loop Works
- Agent creates a branch: autoresearch/&lt;tag&gt;-&lt;mode&gt;
- Runs baseline evaluation, records score in results.tsv
- Makes a small code change, commits
- Runs bash autoresearch/evaluate.sh --mode &lt;mode&gt;
- Guardrails pass? Build, type-check, tests must all succeed
- Score improved? Keep the commit. Otherwise, revert.
- Logs result to results.tsv
- Repeats forever until you stop it
Each experiment takes 2-8 minutes depending on the mode. Overnight (~8 hours), expect ~60-100 experiments.
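The keep-or-revert decision at the heart of the loop can be sketched from the evaluate.sh output. The exact "GUARDRAILS:" and "SCORE:" line formats below are assumptions based on the grep pattern in Quick Start; check your generated evaluate.sh for the real format:

```typescript
// Sketch of the keep-or-revert decision from an evaluate.sh run log.
// Line formats ("GUARDRAILS: PASS", "SCORE: 74") are assumptions.

// Extract the 0-100 score from a run log, or null if absent.
function parseScore(log: string): number | null {
  const m = log.match(/^SCORE:\s*([\d.]+)/m);
  return m ? Number(m[1]) : null;
}

// Guardrails line must report PASS (build, type-check, tests all succeeded).
function guardrailsPassed(log: string): boolean {
  return /^GUARDRAILS:\s*PASS/m.test(log);
}

// Keep the commit only if guardrails pass AND the score beat the baseline.
function shouldKeep(log: string, baseline: number): boolean {
  const score = parseScore(log);
  return guardrailsPassed(log) && score !== null && score > baseline;
}
```

So shouldKeep("GUARDRAILS: PASS\nSCORE: 74", 71) keeps the commit, while a failed guardrail reverts it regardless of score — guardrails are a hard gate, not part of the score.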
Requirements
- Node.js 18+
- A webapp with package.json and a working npm run build
- An AI coding agent (Claude Code, Codex, etc.)
Optional but recommended:
- TypeScript with a type-check or typecheck script (guardrails skip this check if missing)
- Test suite with a test or test:unit script (test:unit is preferred, to avoid integration tests that need env vars)
- For perf/a11y/marketing modes: the lighthouse CLI (npm i -D lighthouse)
Flags
| Flag | Description |
|------|-------------|
| --dry-run | Show detected config without writing files |
| --regenerate | Re-render all generated files from config |
Inspired By
karpathy/autoresearch — autonomous ML research where an AI agent modifies training code, runs 5-minute experiments, and iterates overnight. This project adapts the same pattern for web applications.
License
MIT
