get-shit-tested
v1.0.0
Your AI Quality Champion — context engineering for tests that actually prove correct behavior.
GET SHIT TESTED
Your AI Quality Champion — the context engineering layer that makes quality a practice, not an afterthought.
AI-generated tests have a bad reputation.
You ask Claude to write tests, it generates 50 expect(true).toBe(true) assertions, mocks everything into meaninglessness, and calls it done. You get green CI and zero confidence.
GST fixes that.
GST is a Quality Champion for your codebase. It doesn't just generate tests — it reads your tickets, researches your domain, understands your architecture, enforces quality standards, finds real bugs in your source, and gets smarter with every run. It advocates for quality the way a great senior engineer would: with context, with standards, and with memory.
The complexity is in the system, not in your workflow. Behind the scenes: ticket integration, domain research, coverage analysis, architecture-aware strategy, type-aware generation, parallel execution, multi-layer quality review, mutation testing, self-learning. What you see: just describe what you want tested.
What is a Quality Champion?
A Quality Champion is not a linter. It's not a coverage badge. It's not a test-count metric.
A Quality Champion is the force in your team that:
- Reads intent — understands what a feature is supposed to do before deciding how to test it
- Enforces standards — refuses to ship tests that can't fail, tests that test the wrong thing, or tests that lie with false confidence
- Finds real bugs — not just exercises code paths, but proves correct behavior and catches incorrect behavior
- Remembers context — knows what went wrong last time in this area and doesn't let it happen again
- Speaks fluent domain — researches what needs to be tested in payments, auth, notifications, AI pipelines — not just what the code does
- Improves with use — gets better at understanding this specific codebase with every ticket it works on
GST is that champion. Automated, always on, and permanently learning.
Install
npx get-shit-tested@latest
Works with Claude Code, OpenCode, and other AI coding runtimes. Installs to ~/.claude/skills/gst/.
Usage — just describe what you want
GST installs as a skill that triggers when you describe a testing need. The most powerful form: reference a ticket.
"I need to test LIVE-1111"
"Write tests for PROJ-42"
"I need to test AB#234"
Or without a ticket:
"I need to test my auth module"
"Write tests for src/payments/stripe.ts"
"I have no tests in this project"
"My CI is failing"
What happens when you reference a ticket
GST runs three things in parallel before asking you a single question:
Reads the ticket — from Jira, Azure DevOps, Linear, or GitHub Issues (whichever you have configured via MCP or CLI). Extracts title, description, acceptance criteria, linked tickets.
Researches the domain — searches the web for testing patterns, edge cases, and security considerations specific to the feature type (auth, payments, file uploads, etc.).
Explores your repo — finds source files related to the ticket via filename search, directory structure, and git history. Reads existing tests. Extracts your exact test conventions: import style, mock patterns, naming, assertion style.
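Before a ticket can be read, its reference has to be picked out of the prompt. As a rough sketch (not GST's actual parser), the formats shown in the usage examples could be matched like this. `findTicketRef` and `TicketRef` are hypothetical names, and the patterns cover only the project-key style (LIVE-1111, PROJ-42) and the AB# style (AB#234).

```typescript
// Hypothetical helper, not part of GST: recognizes the ticket shapes
// that appear in the usage examples.
type TicketRef =
  | { kind: "key"; project: string; id: number } // e.g. LIVE-1111, PROJ-42
  | { kind: "ab"; id: number }; // e.g. AB#234

function findTicketRef(prompt: string): TicketRef | null {
  // "AB#" followed by digits.
  const ab = /\bAB#(\d+)\b/.exec(prompt);
  if (ab) {
    const [, id = ""] = ab;
    return { kind: "ab", id: Number(id) };
  }

  // Uppercase project key, a hyphen, then digits.
  const key = /\b([A-Z][A-Z0-9]+)-(\d+)\b/.exec(prompt);
  if (key) {
    const [, project = "", id = ""] = key;
    return { kind: "key", project, id: Number(id) };
  }

  // No ticket reference: the prompt describes a module, file, or problem.
  return null;
}
```

A prompt such as "I need to test my auth module" yields `null`, which corresponds to the no-ticket usage path.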
Then it produces a plan. You review and confirm it. Only then does it write tests — matching your existing architecture exactly.
## Test Plan: LIVE-1111 — User Login
### Missing — Critical
1. loginUser — invalid credentials → returns 401 (not 403)
2. loginUser — account locked after 5 failures → returns 423
3. validateToken — expired token → throws TokenExpiredError
### Missing — Edge cases (from domain research)
4. loginUser — empty string password → returns 400
5. loginUser — timing attack surface → uses constant-time comparison
Reply "confirm" to generate, "edit N" to change, "skip N" to remove.
Under the hood: The Six-Command Loop
GST routes your request to the appropriate combination of these commands:
/gst-analyze → Map what exists and what's missing
/gst-plan → Strategy: what to test, how, in what order
/gst-generate [N] → Write tests in parallel waves with fresh context per file
/gst-review [N] → Validate quality — no trivial assertions allowed
/gst-run → Run the suite, diagnose failures, spawn fix agents
/gst-ship [N] → PR with coverage delta
You can also invoke these directly for more control. Each command picks up where the last left off. State persists across sessions in .testing/.
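To make "picks up where the last left off" concrete, here is a minimal sketch of the loop as an ordered list with a resume function. The README does not describe how GST records its position, so `nextCommand` and the idea of tracking a single "last completed" command are assumptions for illustration only.

```typescript
// The six commands, in the order the README lists them.
const LOOP = ["analyze", "plan", "generate", "review", "run", "ship"] as const;
type Command = (typeof LOOP)[number];

// Hypothetical: given the last completed command (or null for a fresh
// project), return the command to run next, or null when the loop is done.
function nextCommand(lastCompleted: Command | null): Command | null {
  if (lastCompleted === null) return LOOP[0];
  return LOOP[LOOP.indexOf(lastCompleted) + 1] ?? null;
}
```

With this shape, a session that stopped after `review` would resume at `run`.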
Why It Works
Context rot is the enemy of good tests
When you ask an AI to "write tests for my whole codebase", it fills its context with source code and produces shallow coverage — the first files get decent tests, the last files get it('should work', () => {}).
GST solves this by spawning one subagent per module, each with a fresh 200K-token context. Each agent knows exactly what it's testing, sees only the relevant source, and starts clean.
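One way to picture the per-module split is to group source files into work units, so each subagent receives only its own module's files. The grouping rule below (first directory under `src/`) and the name `groupByModule` are assumptions; the README does not say how GST defines a module.

```typescript
// Hypothetical sketch: bucket files by the first directory under a source root.
function groupByModule(files: string[], root = "src/"): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const file of files) {
    if (!file.startsWith(root)) continue; // ignore files outside the source root
    const rest = file.slice(root.length);
    const slash = rest.indexOf("/");
    // Files directly under the root share one "(root)" bucket.
    const moduleName = slash === -1 ? "(root)" : rest.slice(0, slash);
    const bucket = groups.get(moduleName) ?? [];
    bucket.push(file);
    groups.set(moduleName, bucket);
  }
  return groups;
}

// Each entry would become the input for one fresh subagent.
const workUnits = groupByModule([
  "src/payments/stripe.ts",
  "src/payments/refund.ts",
  "src/auth/login.ts",
  "README.md",
]);
```

Here `workUnits` holds two modules, `payments` with two files and `auth` with one, and `README.md` is skipped.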
Tests need strategy, not just code
Bad tests test implementation. Good tests test behavior. GST's planner analyzes your code and produces a structured test plan that prioritizes: critical paths, error boundaries, integration seams, and untested business logic.
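The difference between testing behavior and testing implementation is easiest to see on a small case. The sketch below is loosely modeled on the "invalid credentials returns 401" item from the sample plan. `loginUser`, `UserStore`, and `expectEqual` are hypothetical stand-ins written for this example (`expectEqual` takes the place of a test framework's `expect`), not GST output.

```typescript
// Hypothetical system under test.
type LoginResult = { status: 200 | 401 };
type UserStore = { verify(email: string, password: string): boolean };

function loginUser(store: UserStore, email: string, password: string): LoginResult {
  return store.verify(email, password) ? { status: 200 } : { status: 401 };
}

// Minimal stand-in for a framework assertion so the example runs on its own.
function expectEqual<T>(actual: T, expected: T, label: string): void {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${String(expected)}, got ${String(actual)}`);
  }
}

// A small in-memory fake rather than a mock that only records calls.
const store: UserStore = {
  verify: (email, password) =>
    email === "ada@example.com" && password === "correct-horse",
};

// Behavior tests: they assert on the observable result, so they fail
// if loginUser ever returns the wrong status.
const rejected = loginUser(store, "ada@example.com", "wrong-password");
const accepted = loginUser(store, "ada@example.com", "correct-horse");
expectEqual(rejected.status, 401, "invalid credentials");
expectEqual(accepted.status, 200, "valid credentials");
```

An implementation-coupled version would instead assert that `store.verify` was called once. That check keeps passing even if `loginUser` ignores the result and returns 200 for everyone.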
Quality gates before commit
A dedicated review agent checks every generated test file for: meaningful assertions, proper mock hygiene, boundary conditions, and negative cases. Shallow tests get flagged before they land.
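As an illustration of what a "meaningful assertions" gate might catch, the sketch below flags the two shallow patterns this README calls out: `expect(true).toBe(true)` and an empty test body. It is a simplified, line-based check with hypothetical names (`findTrivialTests`, `Finding`). The README does not describe how the review agent works internally, and a real reviewer would likely need more than regular expressions.

```typescript
// Hypothetical review check, not GST's actual implementation.
interface Finding {
  line: number; // 1-based line number in the test file
  reason: string;
}

const TRIVIAL_PATTERNS: Array<{ pattern: RegExp; reason: string }> = [
  {
    pattern: /expect\(\s*true\s*\)\.toBe\(\s*true\s*\)/,
    reason: "assertion can never fail",
  },
  {
    // it('name', () => {}) or test("name", async () => {})
    pattern: /\b(?:it|test)\(\s*(['"`]).*?\1\s*,\s*(?:async\s*)?\(\s*\)\s*=>\s*\{\s*\}\s*\)/,
    reason: "empty test body",
  },
];

function findTrivialTests(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    for (const { pattern, reason } of TRIVIAL_PATTERNS) {
      if (pattern.test(text)) findings.push({ line: index + 1, reason });
    }
  });
  return findings;
}
```

A file with no findings is not thereby a good test file; a check like this only rules out the most obvious cases before deeper review.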
State Files
GST maintains human-readable artifacts in .testing/:
| File | Purpose |
|------|---------|
| COVERAGE.md | Live map of tested vs. untested code |
| PLAN.md | Test phases with scope and status |
| STATE.md | Current position, decisions, blockers |
| config.json | Framework config, model profiles, feature flags |
Configuration
/gst-settings
Or edit .testing/config.json directly:
{
  "mode": "interactive",
  "models": { "profile": "balanced" },
  "framework": "auto",
  "coverage": { "target": 80, "strict": false },
  "workflow": {
    "review": true,
    "parallel": true
  }
}
Model profiles: quality (Opus everywhere), balanced (Sonnet for generation, Opus for review), budget (Haiku for generation).
Framework detection: auto detects Jest, Vitest, pytest, xUnit, RSpec, etc. Override with "framework": "jest".
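For anyone editing the config by hand, a typed view of the file can help catch typos. The sketch below infers its shape from the example config and the model profile descriptions. The names `GstConfig`, `MODELS`, and `parseConfig` are hypothetical, and anything the README leaves unstated (such as the review model for the budget profile, or what `strict` does) is left unspecified rather than guessed.

```typescript
type Profile = "quality" | "balanced" | "budget";

// Shape inferred from the example config.json; not an official schema.
interface GstConfig {
  mode: string;
  models: { profile: Profile };
  framework: string; // "auto" or an explicit name such as "jest"
  coverage: { target: number; strict: boolean };
  workflow: { review: boolean; parallel: boolean };
}

// Model per role, as described under "Model profiles". The README names
// no review model for the budget profile, so it is omitted here.
const MODELS: Record<Profile, { generation: string; review?: string }> = {
  quality: { generation: "opus", review: "opus" },
  balanced: { generation: "sonnet", review: "opus" },
  budget: { generation: "haiku" },
};

// Parse the file contents and reject an unknown profile early.
function parseConfig(json: string): GstConfig {
  const config = JSON.parse(json) as GstConfig;
  const profile: string = config.models.profile;
  if (!Object.prototype.hasOwnProperty.call(MODELS, profile)) {
    throw new Error(`Unknown model profile: ${profile}`);
  }
  return config;
}
```

Looking up `MODELS[config.models.profile]` for the example config gives Sonnet for generation and Opus for review, matching the balanced profile described above.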
Docs
- User Guide — full walkthrough, all flags
- Commands — complete reference with examples
- Architecture — subagent design, state management, quality gates
Comparison
| | Traditional AI testing | GST |
|--|--|--|
| Context | Bloated single session | Fresh per-module subagents |
| Strategy | None | Planner agent with priority ordering |
| Quality | Whatever gets generated | Review agent with quality gates |
| Coverage | Random | Systematic gap analysis |
| Failures | Your problem | Debug agents with fix plans |
Made to pair with GSD. Build with GSD, test with GST.
