get-shit-tested

v1.0.0, published a month ago

Your AI Quality Champion — context engineering for tests that actually prove correct behavior.

mental1on

claude-code ai testing quality-champion context-engineering test-generation claude anthropic quality-assurance

GET SHIT TESTED

Your AI Quality Champion — the context engineering layer that makes quality a practice, not an afterthought.

AI-generated tests have a bad reputation.

You ask Claude to write tests, it generates 50 expect(true).toBe(true) assertions, mocks everything into meaninglessness, and calls it done. You get green CI and zero confidence.

GST fixes that.

GST is a Quality Champion for your codebase. It doesn't just generate tests — it reads your tickets, researches your domain, understands your architecture, enforces quality standards, finds real bugs in your source, and gets smarter with every run. It advocates for quality the way a great senior engineer would: with context, with standards, and with memory.

The complexity is in the system, not in your workflow. Behind the scenes, GST handles ticket integration, domain research, coverage analysis, architecture-aware strategy, type-aware generation, parallel execution, multi-layer quality review, mutation testing, and self-learning. All you do is describe what you want tested.

What is a Quality Champion?

A Quality Champion is not a linter. It's not a coverage badge. It's not a test-count metric.

A Quality Champion is the force in your team that:

Reads intent — understands what a feature is supposed to do before deciding how to test it
Enforces standards — refuses to ship tests that can't fail, tests that test the wrong thing, or tests that lie with false confidence
Finds real bugs — not just exercises code paths, but proves correct behavior and catches incorrect behavior
Remembers context — knows what went wrong last time in this area and doesn't let it happen again
Speaks fluent domain — researches what needs to be tested in payments, auth, notifications, AI pipelines — not just what the code does
Improves with use — gets better at understanding this specific codebase with every ticket it works on

GST is that champion. Automated, always on, and permanently learning.

Install

npx get-shit-tested@latest

Works with Claude Code, OpenCode, and other AI coding runtimes. Installs to ~/.claude/skills/gst/.

Usage — just describe what you want

GST installs as a skill that triggers when you describe a testing need. The most powerful form: reference a ticket.

"I need to test LIVE-1111"
"Write tests for PROJ-42"
"I need to test AB#234"

Or without a ticket:

"I need to test my auth module"
"Write tests for src/payments/stripe.ts"
"I have no tests in this project"
"My CI is failing"

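To make the two request styles above concrete, here is a minimal sketch of how a request could be checked for a ticket reference. This is not GST's actual detection logic, which this README does not document. The helper `findTicketRef`, the `TicketRef` type, and both patterns are hypothetical and only cover the formats used in the examples (`LIVE-1111`, `PROJ-42`, `AB#234`).

```typescript
// Hypothetical helper, not part of GST: shows one way the example
// requests could be split into "ticket" and "free-form" cases.
type TicketRef = { kind: "key" | "azure"; id: string };

// Project-key style references such as LIVE-1111 or PROJ-42.
const KEY_PATTERN = /\b[A-Z][A-Z0-9]*-\d+\b/;
// Work-item mentions written as AB#234.
const AZURE_PATTERN = /\bAB#\d+\b/;

function findTicketRef(request: string): TicketRef | null {
  const azure = request.match(AZURE_PATTERN);
  if (azure) return { kind: "azure", id: azure[0] };

  const key = request.match(KEY_PATTERN);
  if (key) return { kind: "key", id: key[0] };

  // No reference found: treat the request as a free-form description.
  return null;
}
```

With these patterns, "I need to test my auth module" yields `null`, so a request like that would take the no-ticket path.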
What happens when you reference a ticket

GST runs three things in parallel before asking you a single question:

Reads the ticket — from Jira, Azure DevOps, Linear, or GitHub Issues (whichever you have configured via MCP or CLI). Extracts title, description, acceptance criteria, linked tickets.
Researches the domain — searches the web for testing patterns, edge cases, and security considerations specific to the feature type (auth, payments, file uploads, etc.).
Explores your repo — finds source files related to the ticket via filename search, directory structure, and git history. Reads existing tests. Extracts your exact test conventions: import style, mock patterns, naming, assertion style.

Then it produces a plan. You review and confirm it. Only then does it write tests — matching your existing architecture exactly.

## Test Plan: LIVE-1111 — User Login

### Missing — Critical
1. loginUser — invalid credentials → returns 401 (not 403)
2. loginUser — account locked after 5 failures → returns 423
3. validateToken — expired token → throws TokenExpiredError

### Missing — Edge cases (from domain research)
4. loginUser — empty string password → returns 400
5. loginUser — timing attack surface → uses constant-time comparison

Reply "confirm" to generate, "edit N" to change, "skip N" to remove.
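The three replies at the end of the plan ("confirm", "edit N", "skip N") form a tiny command grammar. The sketch below shows one way to parse them. It is an illustration only: `parsePlanReply` and `PlanReply` are hypothetical names, and treating replies as case-insensitive is an assumption.

```typescript
// Hypothetical parser for the plan replies shown above; not GST's own code.
type PlanReply =
  | { action: "confirm" }
  | { action: "edit"; item: number }
  | { action: "skip"; item: number };

function parsePlanReply(reply: string, itemCount: number): PlanReply | null {
  // Assumption: replies are matched case-insensitively, ignoring outer whitespace.
  const text = reply.trim().toLowerCase();
  if (text === "confirm") return { action: "confirm" };

  const match = text.match(/^(edit|skip)\s+(\d+)$/);
  if (!match) return null;

  // Reject item numbers that do not exist in the plan.
  const item = Number(match[2]);
  if (item < 1 || item > itemCount) return null;

  return { action: match[1] as "edit" | "skip", item };
}
```

For the five-item plan above, `parsePlanReply("skip 5", 5)` returns a skip action for item 5, while `parsePlanReply("edit 9", 5)` returns `null` because the plan has no item 9.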

Under the hood: The Six-Command Loop

GST routes your request to the appropriate combination of these commands:

/gst-analyze        →  Map what exists and what's missing
/gst-plan           →  Strategy: what to test, how, in what order
/gst-generate [N]   →  Write tests in parallel waves with fresh context per file
/gst-review [N]     →  Validate quality — no trivial assertions allowed
/gst-run            →  Run the suite, diagnose failures, spawn fix agents
/gst-ship [N]       →  PR with coverage delta

You can also invoke these directly for more control. Each command picks up where the last left off. State persists across sessions in .testing/.
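As a rough illustration of "each command picks up where the last left off", the sketch below works out the next step of the loop from the last completed one. It assumes the last completed command is recorded somewhere in `.testing/`; the README does not show the real state format, and `nextCommand` is a hypothetical helper.

```typescript
// The six commands in the order listed above, without the /gst- prefix.
const LOOP = ["analyze", "plan", "generate", "review", "run", "ship"] as const;
type Command = (typeof LOOP)[number];

// Hypothetical helper: given the last completed command (or null for a
// fresh project), return the command that would run next, or null when
// the loop has finished.
function nextCommand(lastCompleted: Command | null): Command | null {
  if (lastCompleted === null) return LOOP[0];
  const index = LOOP.indexOf(lastCompleted);
  return LOOP[index + 1] ?? null;
}
```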

Why It Works

Context rot is the enemy of good tests

When you ask an AI to "write tests for my whole codebase", it fills its context with source code and produces shallow coverage — the first files get decent tests, the last files get it('should work', () => {}).

GST solves this by spawning a fresh subagent for each module, each with its own 200K-token context window. Each agent knows exactly what it's testing, sees the relevant source, and starts clean.
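The idea of one fresh agent per module, run in parallel waves, can be sketched as follows. This is a simplified model, not GST's implementation: `ModuleJob`, `AgentResult`, `SpawnAgent`, `planWaves`, and `runInWaves` are all hypothetical names, and the fixed wave size is an assumption.

```typescript
// Hypothetical types: GST's real subagent interface is not documented here.
type ModuleJob = { path: string; source: string };
type AgentResult = { path: string; testFile: string };
type SpawnAgent = (job: ModuleJob) => Promise<AgentResult>;

// Split the modules into fixed-size waves, preserving their order.
function planWaves(jobs: ModuleJob[], waveSize: number): ModuleJob[][] {
  if (Math.floor(waveSize) !== waveSize || waveSize < 1) {
    throw new RangeError("waveSize must be a positive integer");
  }
  const waves: ModuleJob[][] = [];
  for (let start = 0; start < jobs.length; start += waveSize) {
    waves.push(jobs.slice(start, start + waveSize));
  }
  return waves;
}

// Run one wave at a time; inside a wave, all agents run concurrently.
async function runInWaves(
  jobs: ModuleJob[],
  waveSize: number,
  spawn: SpawnAgent
): Promise<AgentResult[]> {
  const results: AgentResult[] = [];
  for (const wave of planWaves(jobs, waveSize)) {
    // Each call receives a single module and nothing from earlier waves,
    // which is what keeps every agent's context clean.
    const waveResults = await Promise.all(wave.map((job) => spawn(job)));
    results.push(...waveResults);
  }
  return results;
}
```

The point of the design is in the `spawn(job)` call: no agent ever sees the accumulated source of the modules handled before it, so the last module gets the same attention as the first.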

Tests need strategy, not just code

Bad tests test implementation. Good tests test behavior. GST's planner analyzes your code and produces a structured test plan that prioritizes critical paths, error boundaries, integration seams, and untested business logic.
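One way to picture that prioritization is as a stable sort over plan items by category. The sketch below assumes the four categories are ranked in the order they are listed, which the README does not actually state. `PlanItem`, `CATEGORY_ORDER`, and `prioritize` are hypothetical names.

```typescript
// Assumption: categories are ranked in the order the README lists them.
const CATEGORY_ORDER = [
  "critical-path",
  "error-boundary",
  "integration-seam",
  "business-logic",
] as const;
type Category = (typeof CATEGORY_ORDER)[number];
type PlanItem = { name: string; category: Category };

// Return a new array ordered by category rank. Array.prototype.sort is
// stable, so items within one category keep their original order.
function prioritize(items: PlanItem[]): PlanItem[] {
  return [...items].sort(
    (a, b) =>
      CATEGORY_ORDER.indexOf(a.category) - CATEGORY_ORDER.indexOf(b.category)
  );
}
```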

Quality gates before commit

A dedicated review agent checks every generated test file for meaningful assertions, proper mock hygiene, boundary conditions, and negative cases. Shallow tests get flagged before they land.
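To show what "shallow" can mean in practice, here is a deliberately simple line-based heuristic that flags the two anti-patterns mentioned earlier: an assertion that compares a literal with itself, and a test with an empty body. GST's review agent is described as an AI reviewer, not a regex pass, so treat this as an illustration of the target, not the mechanism. `findShallowTests` and both patterns are hypothetical.

```typescript
type Finding = { line: number; reason: string };

// expect(true).toBe(true), expect(1).toEqual(1), and similar tautologies.
const TAUTOLOGY =
  /expect\(\s*(true|false|null|\d+)\s*\)\s*\.(?:toBe|toEqual|toStrictEqual)\(\s*\1\s*\)/;
// it('should work', () => {}) and test("...", async () => {}) with no body.
const EMPTY_TEST =
  /\b(?:it|test)\(\s*(['"]).*?\1\s*,\s*(?:async\s*)?\(\)\s*=>\s*\{\s*\}\s*\)/;

// Scan a test file line by line and report 1-based line numbers.
function findShallowTests(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    if (TAUTOLOGY.test(text)) {
      findings.push({ line: index + 1, reason: "assertion compares a literal with itself" });
    }
    if (EMPTY_TEST.test(text)) {
      findings.push({ line: index + 1, reason: "test body is empty" });
    }
  });
  return findings;
}
```

A heuristic like this only catches the most obvious cases. Over-mocking, missing boundary conditions, and absent negative cases need an actual read of the code under test.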

State Files

GST maintains human-readable artifacts in .testing/:

| File | Purpose |
|------|---------|
| COVERAGE.md | Live map of tested vs. untested code |
| PLAN.md | Test phases with scope and status |
| STATE.md | Current position, decisions, blockers |
| config.json | Framework config, model profiles, feature flags |

Configuration

/gst-settings

Or edit .testing/config.json directly:

{
  "mode": "interactive",
  "models": { "profile": "balanced" },
  "framework": "auto",
  "coverage": { "target": 80, "strict": false },
  "workflow": {
    "review": true,
    "parallel": true
  }
}

Model profiles: quality (Opus everywhere), balanced (Sonnet for generation, Opus for review), budget (Haiku for generation).

Framework detection: "framework": "auto" detects Jest, Vitest, pytest, xUnit, RSpec, etc. Override it with an explicit value such as "framework": "jest".
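If you edit the file by hand, a typed view of it can catch mistakes early. The sketch below mirrors only the fields shown in the example above. It is not an official schema: `GstConfig` and `validateConfig` are hypothetical, `mode` is typed as a plain string because only "interactive" appears in this README, and reading `coverage.target` as a percentage is an assumption.

```typescript
type ModelProfile = "quality" | "balanced" | "budget";

// Hypothetical shape, inferred from the example config in this README.
interface GstConfig {
  mode: string; // only "interactive" is shown; other values are unknown
  models: { profile: ModelProfile };
  framework: string; // "auto" or an explicit framework name such as "jest"
  coverage: { target: number; strict: boolean };
  workflow: { review: boolean; parallel: boolean };
}

const EXAMPLE_CONFIG: GstConfig = {
  mode: "interactive",
  models: { profile: "balanced" },
  framework: "auto",
  coverage: { target: 80, strict: false },
  workflow: { review: true, parallel: true },
};

const KNOWN_PROFILES: readonly string[] = ["quality", "balanced", "budget"];

// Return a list of human-readable problems; an empty list means the
// config passed these (deliberately minimal) checks.
function validateConfig(config: GstConfig): string[] {
  const problems: string[] = [];
  if (KNOWN_PROFILES.indexOf(config.models.profile) === -1) {
    problems.push("models.profile must be quality, balanced, or budget");
  }
  // Assumption: the target is a percentage between 0 and 100.
  if (config.coverage.target < 0 || config.coverage.target > 100) {
    problems.push("coverage.target must be between 0 and 100");
  }
  if (config.framework.trim() === "") {
    problems.push("framework must be \"auto\" or a framework name");
  }
  return problems;
}
```

Running `validateConfig` on a parsed `.testing/config.json` before invoking a command is one cheap way to surface typos such as a misspelled profile name.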

Docs

User Guide — full walkthrough, all flags
Commands — complete reference with examples
Architecture — subagent design, state management, quality gates

Comparison

Made to pair with GSD. Build with GSD, test with GST.