Ignore and untrack BMad directories
# Test Quality Review - Validation Checklist

Use this checklist to validate that the test quality review workflow completed successfully and all quality criteria were properly evaluated.

---

## Prerequisites

Note: `test-review` is optional and only audits existing tests; it does not generate tests.

### Test File Discovery

- [ ] Test file(s) identified for review (single/directory/suite scope)
- [ ] Test files exist and are readable
- [ ] Test framework detected (Playwright, Jest, Cypress, Vitest, etc.)
- [ ] Test framework configuration found (playwright.config.ts, jest.config.js, etc.)

### Knowledge Base Loading

- [ ] tea-index.csv loaded successfully
- [ ] `test-quality.md` loaded (Definition of Done)
- [ ] `fixture-architecture.md` loaded (Pure function → Fixture patterns)
- [ ] `network-first.md` loaded (Route intercept before navigate)
- [ ] `data-factories.md` loaded (Factory patterns)
- [ ] `test-levels-framework.md` loaded (E2E vs API vs Component vs Unit)
- [ ] All other enabled fragments loaded successfully

### Context Gathering

- [ ] Story file discovered or explicitly provided (if available)
- [ ] Test design document discovered or explicitly provided (if available)
- [ ] Acceptance criteria extracted from story (if available)
- [ ] Priority context (P0/P1/P2/P3) extracted from test-design (if available)

---

## Process Steps

### Step 1: Context Loading

- [ ] Review scope determined (single/directory/suite)
- [ ] Test file paths collected
- [ ] Related artifacts discovered (story, test-design)
- [ ] Knowledge base fragments loaded successfully
- [ ] Quality criteria flags read from workflow variables

### Step 2: Test File Parsing

**For Each Test File:**

- [ ] File read successfully
- [ ] File size measured (lines, KB)
- [ ] File structure parsed (describe blocks, it blocks)
- [ ] Test IDs extracted (if present)
- [ ] Priority markers extracted (if present)
- [ ] Imports analyzed
- [ ] Dependencies identified

**Test Structure Analysis:**

- [ ] Describe block count calculated
- [ ] It/test block count calculated
- [ ] BDD structure identified (Given-When-Then)
- [ ] Fixture usage detected
- [ ] Data factory usage detected
- [ ] Network interception patterns identified
- [ ] Assertions counted
- [ ] Waits and timeouts cataloged
- [ ] Conditionals (if/else) detected
- [ ] Try/catch blocks detected
- [ ] Shared state or globals detected

### Step 3: Quality Criteria Validation

**For Each Enabled Criterion:**

#### BDD Format (if `check_given_when_then: true`)

- [ ] Given-When-Then structure evaluated
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with line numbers
- [ ] Examples of good/bad patterns noted

#### Test IDs (if `check_test_ids: true`)

- [ ] Test ID presence validated
- [ ] Test ID format checked (e.g., 1.3-E2E-001)
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Missing IDs cataloged

#### Priority Markers (if `check_priority_markers: true`)

- [ ] P0/P1/P2/P3 classification validated
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Missing priorities cataloged

#### Hard Waits (if `check_hard_waits: true`)

- [ ] sleep(), waitForTimeout(), hardcoded delays detected
- [ ] Justification comments checked
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with line numbers and recommended fixes

#### Determinism (if `check_determinism: true`)

- [ ] Conditionals (if/else/switch) detected
- [ ] Try/catch abuse detected
- [ ] Random values (Math.random, Date.now) detected
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes

#### Isolation (if `check_isolation: true`)

- [ ] Cleanup hooks (afterEach/afterAll) validated
- [ ] Shared state detected
- [ ] Global variable mutations detected
- [ ] Resource cleanup verified
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes

#### Fixture Patterns (if `check_fixture_patterns: true`)

- [ ] Fixtures detected (test.extend)
- [ ] Pure functions validated
- [ ] mergeTests usage checked
- [ ] beforeEach complexity analyzed
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes

#### Data Factories (if `check_data_factories: true`)

- [ ] Factory functions detected
- [ ] Hardcoded data (magic strings/numbers) detected
- [ ] Faker.js or similar usage validated
- [ ] API-first setup pattern checked
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes

#### Network-First (if `check_network_first: true`)

- [ ] page.route() before page.goto() validated
- [ ] Race conditions detected (route after navigate)
- [ ] waitForResponse patterns checked
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes

#### Assertions (if `check_assertions: true`)

- [ ] Explicit assertions counted
- [ ] Implicit waits without assertions detected
- [ ] Assertion specificity validated
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes

#### Test Length (if `check_test_length: true`)

- [ ] File line count calculated
- [ ] Threshold comparison (≤300 lines ideal)
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Splitting recommendations generated (if >300 lines)

#### Test Duration (if `check_test_duration: true`)

- [ ] Test complexity analyzed (as proxy for duration if no execution data)
- [ ] Threshold comparison (≤1.5 min target)
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Optimization recommendations generated

#### Flakiness Patterns (if `check_flakiness_patterns: true`)

- [ ] Tight timeouts detected (e.g., { timeout: 1000 })
- [ ] Race conditions detected
- [ ] Timing-dependent assertions detected
- [ ] Retry logic detected
- [ ] Environment-dependent assumptions detected
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes

---

### Step 4: Quality Score Calculation

**Violation Counting:**

- [ ] Critical (P0) violations counted
- [ ] High (P1) violations counted
- [ ] Medium (P2) violations counted
- [ ] Low (P3) violations counted
- [ ] Violation breakdown by criterion recorded

**Score Calculation:**

- [ ] Starting score: 100
- [ ] Critical violations deducted (-10 each)
- [ ] High violations deducted (-5 each)
- [ ] Medium violations deducted (-2 each)
- [ ] Low violations deducted (-1 each)
- [ ] Bonus points added (max +30):
  - [ ] Excellent BDD structure (+5 if applicable)
  - [ ] Comprehensive fixtures (+5 if applicable)
  - [ ] Comprehensive data factories (+5 if applicable)
  - [ ] Network-first pattern (+5 if applicable)
  - [ ] Perfect isolation (+5 if applicable)
  - [ ] All test IDs present (+5 if applicable)
- [ ] Final score calculated: max(0, min(100, Starting - Violations + Bonus))

**Quality Grade:**

- [ ] Grade assigned based on score:
  - 90-100: A+ (Excellent)
  - 80-89: A (Good)
  - 70-79: B (Acceptable)
  - 60-69: C (Needs Improvement)
  - <60: F (Critical Issues)

---

### Step 5: Review Report Generation

**Report Sections Created:**

- [ ] **Header Section**:
  - [ ] Test file(s) reviewed listed
  - [ ] Review date recorded
  - [ ] Review scope noted (single/directory/suite)
  - [ ] Quality score and grade displayed

- [ ] **Executive Summary**:
  - [ ] Overall assessment (Excellent/Good/Needs Improvement/Critical)
  - [ ] Key strengths listed (3-5 bullet points)
  - [ ] Key weaknesses listed (3-5 bullet points)
  - [ ] Recommendation stated (Approve/Approve with comments/Request changes/Block)

- [ ] **Quality Criteria Assessment**:
  - [ ] Table with all criteria evaluated
  - [ ] Status for each criterion (PASS/WARN/FAIL)
  - [ ] Violation count per criterion

- [ ] **Critical Issues (Must Fix)**:
  - [ ] P0/P1 violations listed
  - [ ] Code location provided for each (file:line)
  - [ ] Issue explanation clear
  - [ ] Recommended fix provided with code example
  - [ ] Knowledge base reference provided

- [ ] **Recommendations (Should Fix)**:
  - [ ] P2/P3 violations listed
  - [ ] Code location provided for each (file:line)
  - [ ] Issue explanation clear
  - [ ] Recommended improvement provided with code example
  - [ ] Knowledge base reference provided

- [ ] **Best Practices Examples** (if good patterns found):
  - [ ] Good patterns highlighted from tests
  - [ ] Knowledge base fragments referenced
  - [ ] Examples provided for others to follow

- [ ] **Knowledge Base References**:
  - [ ] All fragments consulted listed
  - [ ] Links to detailed guidance provided

---

### Step 6: Optional Outputs Generation

**Inline Comments** (if `generate_inline_comments: true`):

- [ ] Inline comments generated at violation locations
- [ ] Comment format: `// TODO (TEA Review): [Issue] - See test-review-{filename}.md`
- [ ] Comments added to test files (no logic changes)
- [ ] Test files remain valid and executable

**Quality Badge** (if `generate_quality_badge: true`):

- [ ] Badge created with quality score (e.g., "Test Quality: 87/100 (A)")
- [ ] Badge format suitable for README or documentation
- [ ] Badge saved to output folder

**Story Update** (if `append_to_story: true` and story file exists):

- [ ] "Test Quality Review" section created
- [ ] Quality score included
- [ ] Critical issues summarized
- [ ] Link to full review report provided
- [ ] Story file updated successfully

---

### Step 7: Save and Notify

**Outputs Saved:**

- [ ] Review report saved to `{output_file}`
- [ ] Inline comments written to test files (if enabled)
- [ ] Quality badge saved (if enabled)
- [ ] Story file updated (if enabled)
- [ ] All outputs are valid and readable

**Summary Message Generated:**

- [ ] Quality score and grade included
- [ ] Critical issue count stated
- [ ] Recommendation provided (Approve/Request changes/Block)
- [ ] Next steps clarified
- [ ] Message displayed to user

---

## Output Validation

### Review Report Completeness

- [ ] All required sections present
- [ ] No placeholder text or TODOs in report
- [ ] All code locations are accurate (file:line)
- [ ] All code examples are valid and demonstrate the fix
- [ ] All knowledge base references are correct

### Review Report Accuracy

- [ ] Quality score matches violation breakdown
- [ ] Grade matches score range
- [ ] Violations correctly categorized by severity (P0/P1/P2/P3)
- [ ] Violations correctly attributed to quality criteria
- [ ] No false positives (violations are legitimate issues)
- [ ] No false negatives (critical issues not missed)

### Review Report Clarity

- [ ] Executive summary is clear and actionable
- [ ] Issue explanations are understandable
- [ ] Recommended fixes are implementable
- [ ] Code examples are correct and runnable
- [ ] Recommendation (Approve/Request changes) is clear

---

## Quality Checks

### Knowledge-Based Validation

- [ ] All feedback grounded in knowledge base fragments
- [ ] Recommendations follow proven patterns
- [ ] No arbitrary or opinion-based feedback
- [ ] Knowledge fragment references accurate and relevant

### Actionable Feedback

- [ ] Every issue includes a recommended fix
- [ ] Every fix includes a code example
- [ ] Code examples demonstrate the correct pattern
- [ ] Fixes reference the knowledge base for more detail

### Severity Classification

- [ ] Critical (P0) issues are genuinely critical (hard waits, race conditions, no assertions)
- [ ] High (P1) issues impact maintainability/reliability (missing IDs, hardcoded data)
- [ ] Medium (P2) issues are nice-to-have improvements (long files, missing priorities)
- [ ] Low (P3) issues are minor style/preference (verbose tests)

### Context Awareness

- [ ] Review considers project context (some patterns may be justified)
- [ ] Violations with justification comments noted as acceptable
- [ ] Edge cases acknowledged
- [ ] Recommendations are pragmatic, not dogmatic

---

## Integration Points

### Story File Integration

- [ ] Story file discovered correctly (if available)
- [ ] Acceptance criteria extracted and used for context
- [ ] Test quality section appended to story (if enabled)
- [ ] Link to review report added to story

### Test Design Integration

- [ ] Test design document discovered correctly (if available)
- [ ] Priority context (P0/P1/P2/P3) extracted and used
- [ ] Review validates tests align with prioritization
- [ ] Misalignment flagged (e.g., P0 scenario missing tests)

### Knowledge Base Integration

- [ ] tea-index.csv loaded successfully
- [ ] All required fragments loaded
- [ ] Fragments applied correctly to validation
- [ ] Fragment references in report are accurate

---

## Edge Cases and Special Situations

### Empty or Minimal Tests

- [ ] If test file is empty, report notes "No tests found"
- [ ] If test file has only boilerplate, report notes "No meaningful tests"
- [ ] Score reflects lack of content appropriately

### Legacy Tests

- [ ] Legacy tests acknowledged in context
- [ ] Review provides practical recommendations for improvement
- [ ] Recognizes that a complete refactor may not be feasible
- [ ] Prioritizes critical issues (flakiness) over style

### Test Framework Variations

- [ ] Review adapts to test framework (Playwright vs Jest vs Cypress)
- [ ] Framework-specific patterns recognized (e.g., Playwright fixtures)
- [ ] Framework-specific violations detected (e.g., Cypress anti-patterns)
- [ ] Knowledge fragments applied appropriately for framework

### Justified Violations

- [ ] Violations with justification comments in code noted as acceptable
- [ ] Justifications evaluated for legitimacy
- [ ] Report acknowledges justified patterns
- [ ] Score not penalized for justified violations

---

## Final Validation

### Review Completeness

- [ ] All enabled quality criteria evaluated
- [ ] All test files in scope reviewed
- [ ] All violations cataloged
- [ ] All recommendations provided
- [ ] Review report is comprehensive

### Review Accuracy

- [ ] Quality score is accurate
- [ ] Violations are correct (no false positives)
- [ ] Critical issues not missed (no false negatives)
- [ ] Code locations are correct
- [ ] Knowledge base references are accurate

### Review Usefulness

- [ ] Feedback is actionable
- [ ] Recommendations are implementable
- [ ] Code examples are correct
- [ ] Review helps the developer improve tests
- [ ] Review educates on best practices

### Workflow Complete

- [ ] All checklist items completed
- [ ] All outputs validated and saved
- [ ] User notified with summary
- [ ] Review ready for developer consumption
- [ ] Follow-up actions identified (if any)

---

## Notes

Record any issues, observations, or important context during workflow execution:

- **Test Framework**: [Playwright, Jest, Cypress, etc.]
- **Review Scope**: [single file, directory, full suite]
- **Quality Score**: [0-100 score, letter grade]
- **Critical Issues**: [Count of P0/P1 violations]
- **Recommendation**: [Approve / Approve with comments / Request changes / Block]
- **Special Considerations**: [Legacy code, justified patterns, edge cases]
- **Follow-up Actions**: [Re-review after fixes, pair programming, etc.]

# Test Quality Review - Instructions v4.0

**Workflow:** `testarch-test-review`
**Purpose:** Review test quality using TEA's comprehensive knowledge base and validate against best practices for maintainability, determinism, isolation, and flakiness prevention
**Agent:** Test Architect (TEA)
**Format:** Pure Markdown v4.0 (no XML blocks)

---

## Overview

This workflow performs comprehensive test quality reviews using TEA's knowledge base of best practices. It validates tests against proven patterns for fixture architecture, network-first safeguards, data factories, determinism, isolation, and flakiness prevention. The review generates actionable feedback with quality scoring.

**Key Capabilities:**

- **Knowledge-Based Review**: Applies patterns from tea-index.csv fragments
- **Quality Scoring**: 0-100 score based on violations and best practices
- **Multi-Scope**: Review single file, directory, or entire test suite
- **Pattern Detection**: Identifies flaky patterns, hard waits, race conditions
- **Best Practice Validation**: BDD format, test IDs, priorities, assertions
- **Actionable Feedback**: Critical issues (must fix) vs recommendations (should fix)
- **Integration**: Works with story files, test-design, acceptance criteria

---

## Prerequisites

**Required:**

- Test file(s) to review (auto-discovered or explicitly provided)
- Test framework configuration (playwright.config.ts, jest.config.js, etc.)

**Recommended:**

- Story file with acceptance criteria (for context)
- Test design document (for priority context)
- Knowledge base fragments available in tea-index.csv

**Halt Conditions:**

- If the test file path is invalid or the file doesn't exist, halt and request correction
- If test_dir is empty (no tests found), halt and notify the user

---

## Workflow Steps

### Step 1: Load Context and Knowledge Base

**Actions:**

1. Check the playwright-utils flag:
   - Read `{config_source}` and check `config.tea_use_playwright_utils`

2. Load relevant knowledge fragments from `{project-root}/_bmad/bmm/testarch/tea-index.csv`:

   **Core Patterns (Always load):**
   - `test-quality.md` - Definition of Done: deterministic tests, isolated with cleanup, explicit assertions, <300 lines, <1.5 min (658 lines, 5 examples)
   - `data-factories.md` - Factory functions with faker: overrides, nested factories, API-first setup (498 lines, 5 examples)
   - `test-levels-framework.md` - E2E vs API vs Component vs Unit appropriateness with decision matrix (467 lines, 4 examples)
   - `selective-testing.md` - Duplicate coverage detection with tag-based, spec filter, diff-based selection (727 lines, 4 examples)
   - `test-healing-patterns.md` - Common failure patterns: stale selectors, race conditions, dynamic data, network errors, hard waits (648 lines, 5 examples)
   - `selector-resilience.md` - Selector best practices: data-testid > ARIA > text > CSS hierarchy, anti-patterns (541 lines, 4 examples)
   - `timing-debugging.md` - Race condition prevention and async debugging techniques (370 lines, 3 examples)

   **If `config.tea_use_playwright_utils: true` (All Utilities):**
   - `overview.md` - Playwright utils best practices
   - `api-request.md` - Validate apiRequest usage patterns
   - `network-recorder.md` - Review HAR record/playback implementation
   - `auth-session.md` - Check auth token management
   - `intercept-network-call.md` - Validate network interception
   - `recurse.md` - Review polling patterns
   - `log.md` - Check logging best practices
   - `file-utils.md` - Validate file operation patterns
   - `burn-in.md` - Review burn-in configuration
   - `network-error-monitor.md` - Check error monitoring setup
   - `fixtures-composition.md` - Validate mergeTests usage

   **If `config.tea_use_playwright_utils: false`:**
   - `fixture-architecture.md` - Pure function → Fixture → mergeTests composition with auto-cleanup (406 lines, 5 examples)
   - `network-first.md` - Route intercept before navigate to prevent race conditions (489 lines, 5 examples)
   - `playwright-config.md` - Environment-based configuration with fail-fast validation (722 lines, 5 examples)
   - `component-tdd.md` - Red-Green-Refactor patterns with provider isolation (480 lines, 4 examples)
   - `ci-burn-in.md` - Flaky test detection with 10-iteration burn-in loop (678 lines, 4 examples)

3. Determine review scope:
   - **single**: Review one test file (`test_file_path` provided)
   - **directory**: Review all tests in a directory (`test_dir` provided)
   - **suite**: Review the entire test suite (discover all test files)

4. Auto-discover related artifacts (if `auto_discover_story: true`):
   - Extract the test ID from the filename (e.g., `1.3-E2E-001.spec.ts` → story 1.3)
   - Search for the story file (`story-1.3.md`)
   - Search for the test design (`test-design-story-1.3.md` or `test-design-epic-1.md`)

5. Read the story file for context (if available):
   - Extract acceptance criteria
   - Extract priority classification
   - Extract expected test IDs

**Output:** Complete knowledge base loaded, review scope determined, context gathered

---

### Step 2: Discover and Parse Test Files

**Actions:**

1. **Discover test files** based on scope:
   - **single**: Use the `test_file_path` variable
   - **directory**: Use `glob` to find all test files in `test_dir` (e.g., `*.spec.ts`, `*.test.js`)
   - **suite**: Use `glob` to find all test files recursively from the project root

2. **Parse test file metadata**:
   - File path and name
   - File size (warn if >15 KB or >300 lines)
   - Test framework detected (Playwright, Jest, Cypress, Vitest, etc.)
   - Imports and dependencies
   - Test structure (describe/context/it blocks)

3. **Extract test structure**:
   - Count of describe blocks (test suites)
   - Count of it/test blocks (individual tests)
   - Test IDs (if present, e.g., `test.describe('1.3-E2E-001')`)
   - Priority markers (if present, e.g., `test.describe.only` for P0)
   - BDD structure (Given-When-Then comments or steps)

4. **Identify test patterns** (a heuristic scanner sketch follows this list):
   - Fixtures used
   - Data factories used
   - Network interception patterns
   - Assertions used (expect, assert, toHaveText, etc.)
   - Waits and timeouts (page.waitFor, sleep, hardcoded delays)
   - Conditionals (if/else, switch, ternary)
   - Try/catch blocks
   - Shared state or globals

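As an illustration of steps 2-4, here is a minimal scanner sketch in TypeScript. The regexes are assumptions: they approximate describe/it counting and hard-wait detection rather than truly parsing the file, which is usually enough for a first-pass inventory.

```typescript
// Minimal sketch: heuristic spec-file scanner (regex-based, not a real parser).
import { readFileSync } from "node:fs";

interface SpecSummary {
  file: string;
  lines: number;
  describeBlocks: number;
  testBlocks: number;
  hardWaitLines: number[];
}

export function scanSpec(file: string): SpecSummary {
  const source = readFileSync(file, "utf8");
  const lines = source.split("\n");
  const hardWait = /\b(waitForTimeout|sleep|setTimeout)\s*\(/;

  return {
    file,
    lines: lines.length,
    // Heuristic counts: may over-count chained forms like test.describe().
    describeBlocks: (source.match(/\bdescribe\s*(\.\w+)?\s*\(/g) ?? []).length,
    testBlocks: (source.match(/\b(it|test)\s*(\.\w+)?\s*\(/g) ?? []).length,
    // 1-based line numbers of suspected hard waits, for the report.
    hardWaitLines: lines.flatMap((l, i) => (hardWait.test(l) ? [i + 1] : [])),
  };
}
```
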
**Output:** Complete test file inventory with structure and pattern analysis

---

### Step 3: Validate Against Quality Criteria

**Actions:**

For each test file, validate against quality criteria (configurable via workflow variables):

#### 1. BDD Format Validation (if `check_given_when_then: true`)

- ✅ **PASS**: Tests use Given-When-Then structure (comments or step organization)
- ⚠️ **WARN**: Tests have some structure but not explicit GWT
- ❌ **FAIL**: Tests lack clear structure, hard to understand intent

**Knowledge Fragment**: test-quality.md, tdd-cycles.md

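For illustration, a minimal Playwright test with explicit Given-When-Then structure might look like this (the route, labels, and test ID are hypothetical):

```typescript
import { test, expect } from "@playwright/test";

test("1.3-E2E-001: user can log in with valid credentials", async ({ page }) => {
  // Given: the user is on the login page
  await page.goto("/login");

  // When: they submit valid credentials
  await page.getByLabel("Email").fill("user@example.com");
  await page.getByLabel("Password").fill("correct-horse");
  await page.getByRole("button", { name: "Sign in" }).click();

  // Then: they land on the dashboard
  await expect(page.getByTestId("user-menu")).toBeVisible();
});
```
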
---

#### 2. Test ID Conventions (if `check_test_ids: true`)

- ✅ **PASS**: Test IDs present and follow convention (e.g., `1.3-E2E-001`, `2.1-API-005`)
- ⚠️ **WARN**: Some test IDs missing or inconsistent
- ❌ **FAIL**: No test IDs, can't trace tests to requirements

**Knowledge Fragment**: traceability.md, test-quality.md

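A sketch of the convention carried in test titles, so reports can trace results back to requirements (the IDs and titles are illustrative):

```typescript
import { test } from "@playwright/test";

test.describe("1.3-E2E: login flow", () => {
  // Each title starts with its traceable ID from the test design.
  test("1.3-E2E-001: valid credentials reach dashboard", async () => {
    // ...
  });
  test("1.3-E2E-002: invalid password shows an error", async () => {
    // ...
  });
});
```
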
---

#### 3. Priority Markers (if `check_priority_markers: true`)

- ✅ **PASS**: Tests classified as P0/P1/P2/P3 (via markers or test-design reference)
- ⚠️ **WARN**: Some priority classifications missing
- ❌ **FAIL**: No priority classification, can't determine criticality

**Knowledge Fragment**: test-priorities.md, risk-governance.md

---

#### 4. Hard Waits Detection (if `check_hard_waits: true`)

- ✅ **PASS**: No hard waits detected (no `sleep()`, `wait(5000)`, hardcoded delays)
- ⚠️ **WARN**: Some hard waits used but with justification comments
- ❌ **FAIL**: Hard waits detected without justification (flakiness risk)

**Patterns to detect:**

- `sleep(1000)`, `setTimeout()`, `delay()`
- `page.waitForTimeout(5000)` without explicit reason
- `await new Promise(resolve => setTimeout(resolve, 3000))`

**Knowledge Fragment**: test-quality.md, network-first.md

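A sketch of the usual fix: replace the hard wait with a signal the test can actually wait on (the `/api/profile` endpoint and UI copy are assumptions):

```typescript
import { test, expect } from "@playwright/test";

test("profile saves without hard waits", async ({ page }) => {
  await page.goto("/profile");

  // ❌ Flaky: await page.waitForTimeout(2000);

  // ✅ Deterministic: register the wait, trigger the action, then await it.
  const saved = page.waitForResponse((r) => r.url().includes("/api/profile") && r.ok());
  await page.getByRole("button", { name: "Save" }).click();
  await saved;
  await expect(page.getByText("Profile updated")).toBeVisible();
});
```
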
---

#### 5. Determinism Check (if `check_determinism: true`)

- ✅ **PASS**: Tests are deterministic (no conditionals, no try/catch abuse, no random values)
- ⚠️ **WARN**: Some conditionals but with clear justification
- ❌ **FAIL**: Tests use if/else, switch, or try/catch to control flow (flakiness risk)

**Patterns to detect:**

- `if (condition) { test logic }` - tests should work deterministically
- `try { test } catch { fallback }` - tests shouldn't swallow errors
- `Math.random()`, `Date.now()` without factory abstraction

**Knowledge Fragment**: test-quality.md, data-factories.md

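A small sketch of the distinction (the `?returning=1` query parameter is a hypothetical way for the app under test to make state explicit):

```typescript
import { test, expect } from "@playwright/test";

// ❌ Non-deterministic: branching on runtime state lets failures hide.
// if (await page.getByText("Welcome back").isVisible()) { ... } else { ... }

// ✅ Deterministic: control the state up front, then assert one known outcome.
test("returning user sees the welcome banner", async ({ page }) => {
  await page.goto("/?returning=1"); // state is explicit, not discovered
  await expect(page.getByText("Welcome back")).toBeVisible();
});
```
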
---

#### 6. Isolation Validation (if `check_isolation: true`)

- ✅ **PASS**: Tests clean up resources, no shared state, can run in any order
- ⚠️ **WARN**: Some cleanup missing but isolated enough
- ❌ **FAIL**: Tests share state, depend on execution order, leave resources behind

**Patterns to check:**

- afterEach/afterAll cleanup hooks present
- No global variables mutated
- Database/API state cleaned up after tests
- Test data deleted or marked inactive

**Knowledge Fragment**: test-quality.md, data-factories.md

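A minimal self-cleaning sketch, assuming a REST-style `/api/users` endpoint:

```typescript
import { test } from "@playwright/test";

// IDs of records this worker created, so afterEach can delete them.
const createdUserIds: string[] = [];

test.afterEach(async ({ request }) => {
  // Self-cleaning: remove API-side state so order and parallelism never matter.
  for (const id of createdUserIds.splice(0)) {
    await request.delete(`/api/users/${id}`);
  }
});

test("admin can deactivate a user", async ({ page, request }) => {
  const res = await request.post("/api/users", { data: { role: "member" } });
  createdUserIds.push((await res.json()).id);
  await page.goto("/admin/users");
  // ... drive the UI against the freshly created user ...
});
```
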
---

#### 7. Fixture Patterns (if `check_fixture_patterns: true`)

- ✅ **PASS**: Uses pure function → Fixture → mergeTests pattern
- ⚠️ **WARN**: Some fixtures used but not consistently
- ❌ **FAIL**: No fixtures, tests repeat setup code (maintainability risk)

**Patterns to check:**

- Fixtures defined (e.g., `test.extend({ customFixture: async ({}, use) => { ... }})`)
- Pure functions used for fixture logic
- mergeTests used to combine fixtures
- No beforeEach with complex setup (should be in fixtures)

**Knowledge Fragment**: fixture-architecture.md

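A compact sketch of the pure function → fixture → mergeTests chain (the cart endpoints are assumptions):

```typescript
import { test as base, mergeTests } from "@playwright/test";
import type { APIRequestContext } from "@playwright/test";

// Pure function: easy to unit test, no framework coupling.
async function seedCart(request: APIRequestContext): Promise<string> {
  const res = await request.post("/api/cart", { data: { sku: "demo-sku" } });
  return (await res.json()).cartId;
}

// Fixture: wraps the pure function and owns setup plus cleanup.
const cartTest = base.extend<{ cartId: string }>({
  cartId: async ({ request }, use) => {
    const id = await seedCart(request);
    await use(id);
    await request.delete(`/api/cart/${id}`); // auto-cleanup
  },
});

// mergeTests composes this with other fixture sets as the suite grows.
export const test = mergeTests(cartTest);
```
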
---

#### 8. Data Factories (if `check_data_factories: true`)

- ✅ **PASS**: Uses factory functions with overrides, API-first setup
- ⚠️ **WARN**: Some factories used but also hardcoded data
- ❌ **FAIL**: Hardcoded test data, magic strings/numbers (maintainability risk)

**Patterns to check:**

- Factory functions defined (e.g., `createUser()`, `generateInvoice()`)
- Factories use faker.js or similar for realistic data
- Factories accept overrides (e.g., `createUser({ email: 'custom@example.com' })`)
- API-first setup (create via API, test via UI)

**Knowledge Fragment**: data-factories.md

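A minimal factory sketch using @faker-js/faker (the field names and roles are illustrative):

```typescript
import { faker } from "@faker-js/faker";

export interface TestUser {
  email: string;
  password: string;
  role: "admin" | "member";
}

// Factory: realistic defaults, explicit overrides, no magic strings in tests.
export function createTestUser(overrides: Partial<TestUser> = {}): TestUser {
  return {
    email: faker.internet.email(),
    password: faker.internet.password({ length: 16 }),
    role: "member",
    ...overrides,
  };
}

// Usage: const admin = createTestUser({ role: "admin" });
```
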
---

#### 9. Network-First Pattern (if `check_network_first: true`)

- ✅ **PASS**: Route interception set up BEFORE navigation (race condition prevention)
- ⚠️ **WARN**: Some routes intercepted correctly, others after navigation
- ❌ **FAIL**: Route interception after navigation (race condition risk)

**Patterns to check:**

- `page.route()` called before `page.goto()`
- `page.waitForResponse()` used with explicit URL pattern
- No navigation followed immediately by route setup

**Knowledge Fragment**: network-first.md

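A short sketch of the pattern (the `/api/stats` route and payload are assumptions):

```typescript
import { test, expect } from "@playwright/test";

test("dashboard renders mocked stats", async ({ page }) => {
  // ✅ Intercept FIRST, so the request fired by navigation is caught.
  await page.route("**/api/stats", (route) =>
    route.fulfill({ json: { visits: 42 } }),
  );

  // Navigate AFTER the route exists - no race condition.
  await page.goto("/dashboard");
  await expect(page.getByText("42")).toBeVisible();
});
```
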
---

#### 10. Assertions (if `check_assertions: true`)

- ✅ **PASS**: Explicit assertions present (expect, assert, toHaveText)
- ⚠️ **WARN**: Some tests rely on implicit waits instead of assertions
- ❌ **FAIL**: Missing assertions, tests don't verify behavior

**Patterns to check:**

- Each test has at least one assertion
- Assertions are specific (not just truthy checks)
- Assertions use framework-provided matchers (toHaveText, toBeVisible)

**Knowledge Fragment**: test-quality.md

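A sketch contrasting a weak truthy check with specific, web-first assertions (the page, roles, and expected count are illustrative):

```typescript
import { test, expect } from "@playwright/test";

test("search returns the expected product", async ({ page }) => {
  await page.goto("/search?q=widget");

  // ⚠️ Weak: a truthy check says nothing about what the user actually sees.
  // expect(await page.locator(".result").count()).toBeTruthy();

  // ✅ Specific, auto-retrying matchers verify visible behavior.
  await expect(page.getByRole("listitem")).toHaveCount(3);
  await expect(page.getByRole("listitem").first()).toContainText("Widget");
});
```
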
---

#### 11. Test Length (if `check_test_length: true`)

- ✅ **PASS**: Test file ≤200 lines (ideal), ≤300 lines (acceptable)
- ⚠️ **WARN**: Test file 301-500 lines (consider splitting)
- ❌ **FAIL**: Test file >500 lines (too large, maintainability risk)

**Knowledge Fragment**: test-quality.md

---

#### 12. Test Duration (if `check_test_duration: true`)

- ✅ **PASS**: Individual tests ≤1.5 minutes (target: <30 seconds)
- ⚠️ **WARN**: Some tests 1.5-3 minutes (consider optimization)
- ❌ **FAIL**: Tests >3 minutes (too slow, impacts CI/CD)

**Note:** Duration estimation based on complexity analysis if execution data unavailable

**Knowledge Fragment**: test-quality.md, selective-testing.md

---

#### 13. Flakiness Patterns (if `check_flakiness_patterns: true`)

- ✅ **PASS**: No known flaky patterns detected
- ⚠️ **WARN**: Some potential flaky patterns (e.g., tight timeouts, race conditions)
- ❌ **FAIL**: Multiple flaky patterns detected (high flakiness risk)

**Patterns to detect:**

- Tight timeouts (e.g., `{ timeout: 1000 }`)
- Race conditions (navigation before route interception)
- Timing-dependent assertions (e.g., checking timestamps)
- Retry logic in tests (hides flakiness)
- Environment-dependent assumptions (hardcoded URLs, ports)

**Knowledge Fragment**: test-quality.md, network-first.md, ci-burn-in.md

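One common instance, sketched (the page and test ID are hypothetical):

```typescript
import { test, expect } from "@playwright/test";

test("report eventually shows totals", async ({ page }) => {
  await page.goto("/reports");

  // ❌ Tight custom timeout - passes locally, fails on a slow CI runner:
  // await expect(page.getByTestId("total")).toBeVisible({ timeout: 1000 });

  // ✅ Rely on the configured default (or widen deliberately, with a comment).
  await expect(page.getByTestId("total")).toBeVisible();
});
```
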
---

### Step 4: Calculate Quality Score

**Actions:**

1. **Count violations** by severity:
   - **Critical (P0)**: Hard waits without justification, no assertions, race conditions, shared state
   - **High (P1)**: Missing test IDs, no BDD structure, hardcoded data, missing fixtures
   - **Medium (P2)**: Long test files (>300 lines), missing priorities, some conditionals
   - **Low (P3)**: Minor style issues, incomplete cleanup, verbose tests

2. **Calculate quality score** (if `quality_score_enabled: true`):

   ```
   Starting Score: 100

   Critical Violations: -10 points each
   High Violations:     -5 points each
   Medium Violations:   -2 points each
   Low Violations:      -1 point each

   Bonus Points:
   + Excellent BDD structure:      +5
   + Comprehensive fixtures:       +5
   + Comprehensive data factories: +5
   + Network-first pattern:        +5
   + Perfect isolation:            +5
   + All test IDs present:         +5

   Quality Score: max(0, min(100, Starting Score - Violations + Bonus))
   ```

3. **Quality Grade** (a runnable sketch of the formula follows this list):
   - **90-100**: Excellent (A+)
   - **80-89**: Good (A)
   - **70-79**: Acceptable (B)
   - **60-69**: Needs Improvement (C)
   - **<60**: Critical Issues (F)

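A minimal sketch of the scoring formula above; the six +5 bonuses cap at +30, matching the checklist:

```typescript
interface Violations { critical: number; high: number; medium: number; low: number; }

export function qualityScore(v: Violations, bonusesEarned: number): number {
  const deductions = v.critical * 10 + v.high * 5 + v.medium * 2 + v.low * 1;
  const bonus = Math.min(bonusesEarned, 6) * 5; // six possible +5 bonuses, max +30
  return Math.max(0, Math.min(100, 100 - deductions + bonus));
}

export function grade(score: number): string {
  if (score >= 90) return "A+";
  if (score >= 80) return "A";
  if (score >= 70) return "B";
  if (score >= 60) return "C";
  return "F";
}

// Example: 1 critical + 2 high violations with 2 bonuses earned:
// qualityScore({ critical: 1, high: 2, medium: 0, low: 0 }, 2) === 90 -> "A+"
```
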
**Output:** Quality score calculated with violation breakdown

---

### Step 5: Generate Review Report

**Actions:**

1. **Create review report** using `test-review-template.md`:

   **Header Section:**
   - Test file(s) reviewed
   - Review date
   - Review scope (single/directory/suite)
   - Quality score and grade

   **Executive Summary:**
   - Overall assessment (Excellent/Good/Needs Improvement/Critical)
   - Key strengths
   - Key weaknesses
   - Recommendation (Approve/Approve with comments/Request changes)

   **Quality Criteria Assessment:**
   - Table with all criteria evaluated
   - Status for each (PASS/WARN/FAIL)
   - Violation count per criterion

   **Critical Issues (Must Fix):**
   - Priority P0/P1 violations
   - Code location (file:line)
   - Explanation of issue
   - Recommended fix
   - Knowledge base reference

   **Recommendations (Should Fix):**
   - Priority P2/P3 violations
   - Code location (file:line)
   - Explanation of issue
   - Recommended improvement
   - Knowledge base reference

   **Best Practices Examples:**
   - Highlight good patterns found in tests
   - Reference knowledge base fragments
   - Provide examples for others to follow

   **Knowledge Base References:**
   - List all fragments consulted
   - Provide links to detailed guidance

2. **Generate inline comments** (if `generate_inline_comments: true`):
   - Add TODO comments in test files at violation locations
   - Format: `// TODO (TEA Review): [Issue description] - See test-review-{filename}.md`
   - Never modify test logic, only add comments

3. **Generate quality badge** (if `generate_quality_badge: true`):
   - Create badge with quality score (e.g., "Test Quality: 87/100 (A)")
   - Format for inclusion in README or documentation

4. **Append to story file** (if `append_to_story: true` and story file exists):
   - Add "Test Quality Review" section to story
   - Include quality score and critical issues
   - Link to full review report

**Output:** Comprehensive review report with actionable feedback

---

### Step 6: Save Outputs and Notify

**Actions:**

1. **Save review report** to `{output_file}`
2. **Save inline comments** to test files (if enabled)
3. **Save quality badge** to output folder (if enabled)
4. **Update story file** (if enabled)
5. **Generate summary message** for user:
   - Quality score and grade
   - Critical issue count
   - Recommendation

**Output:** All review artifacts saved and user notified

---

## Quality Criteria Decision Matrix

| Criterion          | PASS                      | WARN           | FAIL                | Knowledge Fragment      |
| ------------------ | ------------------------- | -------------- | ------------------- | ----------------------- |
| BDD Format         | Given-When-Then present   | Some structure | No structure        | test-quality.md         |
| Test IDs           | All tests have IDs        | Some missing   | No IDs              | traceability.md         |
| Priority Markers   | All classified            | Some missing   | No classification   | test-priorities.md      |
| Hard Waits         | No hard waits             | Some justified | Hard waits present  | test-quality.md         |
| Determinism        | No conditionals/random    | Some justified | Conditionals/random | test-quality.md         |
| Isolation          | Clean up, no shared state | Some gaps      | Shared state        | test-quality.md         |
| Fixture Patterns   | Pure fn → Fixture         | Some fixtures  | No fixtures         | fixture-architecture.md |
| Data Factories     | Factory functions         | Some factories | Hardcoded data      | data-factories.md       |
| Network-First      | Intercept before navigate | Some correct   | Race conditions     | network-first.md        |
| Assertions         | Explicit assertions       | Some implicit  | Missing assertions  | test-quality.md         |
| Test Length        | ≤300 lines                | 301-500 lines  | >500 lines          | test-quality.md         |
| Test Duration      | ≤1.5 min                  | 1.5-3 min      | >3 min              | test-quality.md         |
| Flakiness Patterns | No flaky patterns         | Some potential | Multiple patterns   | ci-burn-in.md           |

---

## Example Review Summary

````markdown
# Test Quality Review: auth-login.spec.ts

**Quality Score**: 78/100 (B - Acceptable)
**Review Date**: 2025-10-14
**Recommendation**: Approve with Comments

## Executive Summary

Overall, the test demonstrates good structure and coverage of the login flow. However, there are several areas for improvement to enhance maintainability and prevent flakiness.

**Strengths:**

- Excellent BDD structure with clear Given-When-Then comments
- Good use of test IDs (1.3-E2E-001, 1.3-E2E-002)
- Comprehensive assertions on authentication state

**Weaknesses:**

- Hard wait detected (page.waitForTimeout(2000)) - flakiness risk
- Hardcoded test data (email: 'test@example.com') - use factories instead
- Missing fixture for common login setup - DRY violation

**Recommendation**: Address the critical issue (hard wait) before merging. Other improvements can be addressed in a follow-up PR.

## Critical Issues (Must Fix)

### 1. Hard Wait Detected (Line 45)

**Severity**: P0 (Critical)
**Issue**: `await page.waitForTimeout(2000)` introduces flakiness
**Fix**: Use an explicit wait for an element or network request instead
**Knowledge**: See test-quality.md, network-first.md

```typescript
// ❌ Bad (current)
await page.waitForTimeout(2000);
await expect(page.locator('[data-testid="user-menu"]')).toBeVisible();

// ✅ Good (recommended)
await expect(page.locator('[data-testid="user-menu"]')).toBeVisible({ timeout: 10000 });
```

## Recommendations (Should Fix)

### 1. Use a Data Factory for the Test User (Lines 23, 32, 41)

**Severity**: P1 (High)
**Issue**: Hardcoded email `test@example.com` - maintainability risk
**Fix**: Create a factory function for test users
**Knowledge**: See data-factories.md

```typescript
// ✅ Good (recommended)
import { createTestUser } from './factories/user-factory';

const testUser = createTestUser({ role: 'admin' });
await loginPage.login(testUser.email, testUser.password);
```

### 2. Extract Login Setup to a Fixture (Lines 18-28)

**Severity**: P1 (High)
**Issue**: Login setup repeated across tests - DRY violation
**Fix**: Create a fixture for the authenticated state
**Knowledge**: See fixture-architecture.md

```typescript
// ✅ Good (recommended)
const test = base.extend({
  authenticatedPage: async ({ page }, use) => {
    const user = createTestUser();
    await loginPage.login(user.email, user.password);
    await use(page);
  },
});

test('user can access dashboard', async ({ authenticatedPage }) => {
  // Test starts already logged in
});
```

## Quality Score Breakdown

- Starting Score: 100
- Critical Violations (1 × -10): -10
- High Violations (2 × -5): -10
- Medium Violations (3 × -2): -6
- Low Violations (1 × -1): -1
- Bonus (BDD +5): +5
- **Final Score**: 78/100 (B)
````

---

## Integration with Other Workflows

### Before Test Review

- **atdd**: Generate acceptance tests (TEA reviews them for quality)
- **automate**: Expand regression suite (TEA reviews new tests)
- **dev story**: Developer writes implementation tests (TEA reviews them)

### After Test Review

- **Developer**: Addresses critical issues, improves based on recommendations
- **gate**: Test quality review feeds into gate decision (high-quality tests increase confidence)

### Coordinates With

- **Story File**: Review links to acceptance criteria context
- **Test Design**: Review validates tests align with prioritization
- **Knowledge Base**: Review references fragments for detailed guidance

---

## Important Notes

1. **Non-Prescriptive**: Review provides guidance, not rigid rules
2. **Context Matters**: Some violations may be justified for specific scenarios
3. **Knowledge-Based**: All feedback grounded in proven patterns from tea-index.csv
4. **Actionable**: Every issue includes recommended fix with code examples
5. **Quality Score**: Use as indicator, not absolute measure
6. **Continuous Improvement**: Review same tests periodically as patterns evolve

---

## Troubleshooting

**Problem: No test files found**

- Verify the `test_dir` path is correct
- Check that test file extensions match the glob pattern
- Ensure test files exist in the expected location

**Problem: Quality score seems too low/high**

- Review violation counts - thresholds may need adjusting
- Consider context - some projects have different standards
- Focus on critical issues first, not just the score

**Problem: Inline comments not generated**

- Check `generate_inline_comments: true` in variables
- Verify write permissions on test files
- Review `append_to_file: false` (separate report mode)

**Problem: Knowledge fragments not loading**

- Verify tea-index.csv exists in the testarch/ directory
- Check that fragment file paths are correct
- Ensure `auto_load_knowledge: true` in variables

# Test Quality Review: {test_filename}

**Quality Score**: {score}/100 ({grade} - {assessment})
**Review Date**: {YYYY-MM-DD}
**Review Scope**: {single | directory | suite}
**Reviewer**: {user_name or TEA Agent}

---

Note: This review audits existing tests; it does not generate tests.

## Executive Summary

**Overall Assessment**: {Excellent | Good | Acceptable | Needs Improvement | Critical Issues}

**Recommendation**: {Approve | Approve with Comments | Request Changes | Block}

### Key Strengths

✅ {strength_1}
✅ {strength_2}
✅ {strength_3}

### Key Weaknesses

❌ {weakness_1}
❌ {weakness_2}
❌ {weakness_3}

### Summary

{1-2 paragraph summary of overall test quality, highlighting major findings and recommendation rationale}

---

## Quality Criteria Assessment

| Criterion                            | Status                          | Violations | Notes        |
| ------------------------------------ | ------------------------------- | ---------- | ------------ |
| BDD Format (Given-When-Then)         | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count}    | {brief_note} |
| Test IDs                             | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count}    | {brief_note} |
| Priority Markers (P0/P1/P2/P3)       | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count}    | {brief_note} |
| Hard Waits (sleep, waitForTimeout)   | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count}    | {brief_note} |
| Determinism (no conditionals)        | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count}    | {brief_note} |
| Isolation (cleanup, no shared state) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count}    | {brief_note} |
| Fixture Patterns                     | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count}    | {brief_note} |
| Data Factories                       | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count}    | {brief_note} |
| Network-First Pattern                | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count}    | {brief_note} |
| Explicit Assertions                  | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count}    | {brief_note} |
| Test Length (≤300 lines)             | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {lines}    | {brief_note} |
| Test Duration (≤1.5 min)             | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {duration} | {brief_note} |
| Flakiness Patterns                   | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count}    | {brief_note} |

**Total Violations**: {critical_count} Critical, {high_count} High, {medium_count} Medium, {low_count} Low

---

## Quality Score Breakdown

```
Starting Score: 100
Critical Violations: -{critical_count} × 10 = -{critical_deduction}
High Violations:     -{high_count} × 5    = -{high_deduction}
Medium Violations:   -{medium_count} × 2  = -{medium_deduction}
Low Violations:      -{low_count} × 1     = -{low_deduction}

Bonus Points:
  Excellent BDD:          +{0|5}
  Comprehensive Fixtures: +{0|5}
  Data Factories:         +{0|5}
  Network-First:          +{0|5}
  Perfect Isolation:      +{0|5}
  All Test IDs:           +{0|5}
  --------
  Total Bonus: +{bonus_total}

Final Score: {final_score}/100
Grade: {grade}
```

---

## Critical Issues (Must Fix)

{If no critical issues: "No critical issues detected. ✅"}

{For each critical issue:}

### {issue_number}. {Issue Title}

**Severity**: P0 (Critical)
**Location**: `{filename}:{line_number}`
**Criterion**: {criterion_name}
**Knowledge Base**: [{fragment_name}]({fragment_path})

**Issue Description**:
{Detailed explanation of what the problem is and why it's critical}

**Current Code**:

```typescript
// ❌ Bad (current implementation)
{
  code_snippet_showing_problem;
}
```

**Recommended Fix**:

```typescript
// ✅ Good (recommended approach)
{
  code_snippet_showing_solution;
}
```

**Why This Matters**:
{Explanation of impact - flakiness risk, maintainability, reliability}

**Related Violations**:
{If similar issue appears elsewhere, note line numbers}

---

## Recommendations (Should Fix)

{If no recommendations: "No additional recommendations. Test quality is excellent. ✅"}

{For each recommendation:}

### {rec_number}. {Recommendation Title}

**Severity**: {P1 (High) | P2 (Medium) | P3 (Low)}
**Location**: `{filename}:{line_number}`
**Criterion**: {criterion_name}
**Knowledge Base**: [{fragment_name}]({fragment_path})

**Issue Description**:
{Detailed explanation of what could be improved and why}

**Current Code**:

```typescript
// ⚠️ Could be improved (current implementation)
{
  code_snippet_showing_current_approach;
}
```

**Recommended Improvement**:

```typescript
// ✅ Better approach (recommended)
{
  code_snippet_showing_improvement;
}
```

**Benefits**:
{Explanation of benefits - maintainability, readability, reusability}

**Priority**:
{Why this is P1/P2/P3 - urgency and impact}

---

## Best Practices Found

{If good patterns found, highlight them}

{For each best practice:}

### {practice_number}. {Best Practice Title}

**Location**: `{filename}:{line_number}`
**Pattern**: {pattern_name}
**Knowledge Base**: [{fragment_name}]({fragment_path})

**Why This Is Good**:
{Explanation of why this pattern is excellent}

**Code Example**:

```typescript
// ✅ Excellent pattern demonstrated in this test
{
  code_snippet_showing_best_practice;
}
```

**Use as Reference**:
{Encourage using this pattern in other tests}

---

## Test File Analysis

### File Metadata

- **File Path**: `{relative_path_from_project_root}`
- **File Size**: {line_count} lines, {kb_size} KB
- **Test Framework**: {Playwright | Jest | Cypress | Vitest | Other}
- **Language**: {TypeScript | JavaScript}

### Test Structure

- **Describe Blocks**: {describe_count}
- **Test Cases (it/test)**: {test_count}
- **Average Test Length**: {avg_lines_per_test} lines per test
- **Fixtures Used**: {fixture_count} ({fixture_names})
- **Data Factories Used**: {factory_count} ({factory_names})

### Test Coverage Scope

- **Test IDs**: {test_id_list}
- **Priority Distribution**:
  - P0 (Critical): {p0_count} tests
  - P1 (High): {p1_count} tests
  - P2 (Medium): {p2_count} tests
  - P3 (Low): {p3_count} tests
  - Unknown: {unknown_count} tests

### Assertions Analysis

- **Total Assertions**: {assertion_count}
- **Assertions per Test**: {avg_assertions_per_test} (avg)
- **Assertion Types**: {assertion_types_used}

---

## Context and Integration

### Related Artifacts

{If story file found:}

- **Story File**: [{story_filename}]({story_path})
- **Acceptance Criteria Mapped**: {ac_mapped}/{ac_total} ({ac_coverage}%)

{If test-design found:}

- **Test Design**: [{test_design_filename}]({test_design_path})
- **Risk Assessment**: {risk_level}
- **Priority Framework**: P0-P3 applied

### Acceptance Criteria Validation

{If story file available, map tests to ACs:}

| Acceptance Criterion | Test ID   | Status                     | Notes   |
| -------------------- | --------- | -------------------------- | ------- |
| {AC_1}               | {test_id} | {✅ Covered \| ❌ Missing} | {notes} |
| {AC_2}               | {test_id} | {✅ Covered \| ❌ Missing} | {notes} |
| {AC_3}               | {test_id} | {✅ Covered \| ❌ Missing} | {notes} |

**Coverage**: {covered_count}/{total_count} criteria covered ({coverage_percentage}%)

---

## Knowledge Base References

This review consulted the following knowledge base fragments:

- **[test-quality.md](../../../testarch/knowledge/test-quality.md)** - Definition of Done for tests (no hard waits, <300 lines, <1.5 min, self-cleaning)
- **[fixture-architecture.md](../../../testarch/knowledge/fixture-architecture.md)** - Pure function → Fixture → mergeTests pattern
- **[network-first.md](../../../testarch/knowledge/network-first.md)** - Route intercept before navigate (race condition prevention)
- **[data-factories.md](../../../testarch/knowledge/data-factories.md)** - Factory functions with overrides, API-first setup
- **[test-levels-framework.md](../../../testarch/knowledge/test-levels-framework.md)** - E2E vs API vs Component vs Unit appropriateness
- **[tdd-cycles.md](../../../testarch/knowledge/tdd-cycles.md)** - Red-Green-Refactor patterns
- **[selective-testing.md](../../../testarch/knowledge/selective-testing.md)** - Duplicate coverage detection
- **[ci-burn-in.md](../../../testarch/knowledge/ci-burn-in.md)** - Flakiness detection patterns (10-iteration loop)
- **[test-priorities.md](../../../testarch/knowledge/test-priorities.md)** - P0/P1/P2/P3 classification framework
- **[traceability.md](../../../testarch/knowledge/traceability.md)** - Requirements-to-tests mapping

See [tea-index.csv](../../../testarch/tea-index.csv) for the complete knowledge base.

---

## Next Steps

### Immediate Actions (Before Merge)

1. **{action_1}** - {description}
   - Priority: {P0 | P1 | P2}
   - Owner: {team_or_person}
   - Estimated Effort: {time_estimate}

2. **{action_2}** - {description}
   - Priority: {P0 | P1 | P2}
   - Owner: {team_or_person}
   - Estimated Effort: {time_estimate}

### Follow-up Actions (Future PRs)

1. **{action_1}** - {description}
   - Priority: {P2 | P3}
   - Target: {next_sprint | backlog}

2. **{action_2}** - {description}
   - Priority: {P2 | P3}
   - Target: {next_sprint | backlog}

### Re-Review Needed?

{✅ No re-review needed - approve as-is}
{⚠️ Re-review after critical fixes - request changes, then re-review}
{❌ Major refactor required - block merge, pair programming recommended}

---

## Decision

**Recommendation**: {Approve | Approve with Comments | Request Changes | Block}

**Rationale**:
{1-2 paragraph explanation of recommendation based on findings}

**For Approve**:

> Test quality is excellent/good with {score}/100 score. {Minor issues noted can be addressed in follow-up PRs.} Tests are production-ready and follow best practices.

**For Approve with Comments**:

> Test quality is acceptable with {score}/100 score. {High-priority recommendations should be addressed but don't block merge.} Critical issues resolved, but improvements would enhance maintainability.

**For Request Changes**:

> Test quality needs improvement with {score}/100 score. {Critical issues must be fixed before merge.} {X} critical violations detected that pose flakiness/maintainability risks.

**For Block**:

> Test quality is insufficient with {score}/100 score. {Multiple critical issues make tests unsuitable for production.} Recommend a pairing session with a QA engineer to apply patterns from the knowledge base.

---

## Appendix

### Violation Summary by Location

{Table of all violations sorted by line number:}

| Line   | Severity      | Criterion   | Issue         | Fix         |
| ------ | ------------- | ----------- | ------------- | ----------- |
| {line} | {P0/P1/P2/P3} | {criterion} | {brief_issue} | {brief_fix} |
| {line} | {P0/P1/P2/P3} | {criterion} | {brief_issue} | {brief_fix} |

### Quality Trends

{If reviewing the same file multiple times, show the trend:}

| Review Date  | Score         | Grade     | Critical Issues | Trend       |
| ------------ | ------------- | --------- | --------------- | ----------- |
| {YYYY-MM-DD} | {score_1}/100 | {grade_1} | {count_1}       | ⬆️ Improved |
| {YYYY-MM-DD} | {score_2}/100 | {grade_2} | {count_2}       | ⬇️ Declined |
| {YYYY-MM-DD} | {score_3}/100 | {grade_3} | {count_3}       | ➡️ Stable   |

### Related Reviews

{If reviewing multiple files in a directory/suite:}

| File     | Score       | Grade   | Critical | Status             |
| -------- | ----------- | ------- | -------- | ------------------ |
| {file_1} | {score}/100 | {grade} | {count}  | {Approved/Blocked} |
| {file_2} | {score}/100 | {grade} | {count}  | {Approved/Blocked} |
| {file_3} | {score}/100 | {grade} | {count}  | {Approved/Blocked} |

**Suite Average**: {avg_score}/100 ({avg_grade})

---

## Review Metadata

**Generated By**: BMad TEA Agent (Test Architect)
**Workflow**: testarch-test-review v4.0
**Review ID**: test-review-{filename}-{YYYYMMDD}
**Timestamp**: {YYYY-MM-DD HH:MM:SS}
**Version**: 1.0

---

## Feedback on This Review

If you have questions or feedback on this review:

1. Review patterns in the knowledge base: `testarch/knowledge/`
2. Consult tea-index.csv for detailed guidance
3. Request clarification on specific violations
4. Pair with a QA engineer to apply patterns

This review is guidance, not rigid rules. Context matters - if a pattern is justified, document it with a comment.

# Test Architect workflow: test-review
name: testarch-test-review
description: "Review test quality using comprehensive knowledge base and best practices validation"
author: "BMad"

# Critical variables from config
config_source: "{project-root}/_bmad/bmm/config.yaml"
output_folder: "{config_source}:output_folder"
user_name: "{config_source}:user_name"
communication_language: "{config_source}:communication_language"
document_output_language: "{config_source}:document_output_language"
date: system-generated

# Workflow components
installed_path: "{project-root}/_bmad/bmm/workflows/testarch/test-review"
instructions: "{installed_path}/instructions.md"
validation: "{installed_path}/checklist.md"
template: "{installed_path}/test-review-template.md"

# Variables and inputs
variables:
  test_dir: "{project-root}/tests" # Root test directory
  review_scope: "single" # single (one file), directory (folder), suite (all tests)

# Output configuration
default_output_file: "{output_folder}/test-review.md"

# Required tools
required_tools:
  - read_file # Read test files, story, test-design
  - write_file # Create review report
  - list_files # Discover test files in directory
  - search_repo # Find tests by patterns
  - glob # Find test files matching patterns

tags:
  - qa
  - test-architect
  - code-review
  - quality
  - best-practices

execution_hints:
  interactive: false # Minimize prompts
  autonomous: true # Proceed without user input unless blocked
  iterative: true # Can review multiple files