fix: ChatBubble crash and DeepSeek API compatibility

- Fix ChatBubble to handle non-string content with String() wrapper
- Fix API route to use generateText for non-streaming requests
- Add @ai-sdk/openai-compatible for non-OpenAI providers (DeepSeek, etc.)
- Use Chat Completions API instead of Responses API for compatible providers
- Update ChatBubble tests and fix component exports to kebab-case
- Remove stale PascalCase ChatBubble.tsx file
Author: Max
Date: 2026-01-26 16:55:05 +07:00
Parent: 6b113e0392
Commit: e9e6fadb1d
544 changed files with 113077 additions and 427 deletions


@@ -0,0 +1,472 @@
# Test Quality Review - Validation Checklist
Use this checklist to validate that the test quality review workflow completed successfully and all quality criteria were properly evaluated.
---
## Prerequisites
Note: `test-review` is optional and only audits existing tests; it does not generate tests.
### Test File Discovery
- [ ] Test file(s) identified for review (single/directory/suite scope)
- [ ] Test files exist and are readable
- [ ] Test framework detected (Playwright, Jest, Cypress, Vitest, etc.)
- [ ] Test framework configuration found (playwright.config.ts, jest.config.js, etc.)
### Knowledge Base Loading
- [ ] tea-index.csv loaded successfully
- [ ] `test-quality.md` loaded (Definition of Done)
- [ ] `fixture-architecture.md` loaded (Pure function → Fixture patterns)
- [ ] `network-first.md` loaded (Route intercept before navigate)
- [ ] `data-factories.md` loaded (Factory patterns)
- [ ] `test-levels-framework.md` loaded (E2E vs API vs Component vs Unit)
- [ ] All other enabled fragments loaded successfully
### Context Gathering
- [ ] Story file discovered or explicitly provided (if available)
- [ ] Test design document discovered or explicitly provided (if available)
- [ ] Acceptance criteria extracted from story (if available)
- [ ] Priority context (P0/P1/P2/P3) extracted from test-design (if available)
---
## Process Steps
### Step 1: Context Loading
- [ ] Review scope determined (single/directory/suite)
- [ ] Test file paths collected
- [ ] Related artifacts discovered (story, test-design)
- [ ] Knowledge base fragments loaded successfully
- [ ] Quality criteria flags read from workflow variables
### Step 2: Test File Parsing
**For Each Test File:**
- [ ] File read successfully
- [ ] File size measured (lines, KB)
- [ ] File structure parsed (describe blocks, it blocks)
- [ ] Test IDs extracted (if present)
- [ ] Priority markers extracted (if present)
- [ ] Imports analyzed
- [ ] Dependencies identified
**Test Structure Analysis:**
- [ ] Describe block count calculated
- [ ] It/test block count calculated
- [ ] BDD structure identified (Given-When-Then)
- [ ] Fixture usage detected
- [ ] Data factory usage detected
- [ ] Network interception patterns identified
- [ ] Assertions counted
- [ ] Waits and timeouts cataloged
- [ ] Conditionals (if/else) detected
- [ ] Try/catch blocks detected
- [ ] Shared state or globals detected
### Step 3: Quality Criteria Validation
**For Each Enabled Criterion:**
#### BDD Format (if `check_given_when_then: true`)
- [ ] Given-When-Then structure evaluated
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with line numbers
- [ ] Examples of good/bad patterns noted
#### Test IDs (if `check_test_ids: true`)
- [ ] Test ID presence validated
- [ ] Test ID format checked (e.g., 1.3-E2E-001)
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Missing IDs cataloged
#### Priority Markers (if `check_priority_markers: true`)
- [ ] P0/P1/P2/P3 classification validated
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Missing priorities cataloged
#### Hard Waits (if `check_hard_waits: true`)
- [ ] sleep(), waitForTimeout(), hardcoded delays detected
- [ ] Justification comments checked
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with line numbers and recommended fixes
#### Determinism (if `check_determinism: true`)
- [ ] Conditionals (if/else/switch) detected
- [ ] Try/catch abuse detected
- [ ] Random values (Math.random, Date.now) detected
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes
#### Isolation (if `check_isolation: true`)
- [ ] Cleanup hooks (afterEach/afterAll) validated
- [ ] Shared state detected
- [ ] Global variable mutations detected
- [ ] Resource cleanup verified
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes
#### Fixture Patterns (if `check_fixture_patterns: true`)
- [ ] Fixtures detected (test.extend)
- [ ] Pure functions validated
- [ ] mergeTests usage checked
- [ ] beforeEach complexity analyzed
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes
#### Data Factories (if `check_data_factories: true`)
- [ ] Factory functions detected
- [ ] Hardcoded data (magic strings/numbers) detected
- [ ] Faker.js or similar usage validated
- [ ] API-first setup pattern checked
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes
#### Network-First (if `check_network_first: true`)
- [ ] page.route() before page.goto() validated
- [ ] Race conditions detected (route after navigate)
- [ ] waitForResponse patterns checked
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes
#### Assertions (if `check_assertions: true`)
- [ ] Explicit assertions counted
- [ ] Implicit waits without assertions detected
- [ ] Assertion specificity validated
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes
#### Test Length (if `check_test_length: true`)
- [ ] File line count calculated
- [ ] Threshold comparison (≤200 lines ideal, ≤300 acceptable)
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Splitting recommendations generated (if >300 lines)
#### Test Duration (if `check_test_duration: true`)
- [ ] Test complexity analyzed (as proxy for duration if no execution data)
- [ ] Threshold comparison (≤1.5 min target)
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Optimization recommendations generated
#### Flakiness Patterns (if `check_flakiness_patterns: true`)
- [ ] Tight timeouts detected (e.g., { timeout: 1000 })
- [ ] Race conditions detected
- [ ] Timing-dependent assertions detected
- [ ] Retry logic detected
- [ ] Environment-dependent assumptions detected
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes
---
### Step 4: Quality Score Calculation
**Violation Counting:**
- [ ] Critical (P0) violations counted
- [ ] High (P1) violations counted
- [ ] Medium (P2) violations counted
- [ ] Low (P3) violations counted
- [ ] Violation breakdown by criterion recorded
**Score Calculation:**
- [ ] Starting score: 100
- [ ] Critical violations deducted (-10 each)
- [ ] High violations deducted (-5 each)
- [ ] Medium violations deducted (-2 each)
- [ ] Low violations deducted (-1 each)
- [ ] Bonus points added (max +30):
- [ ] Excellent BDD structure (+5 if applicable)
- [ ] Comprehensive fixtures (+5 if applicable)
- [ ] Comprehensive data factories (+5 if applicable)
- [ ] Network-first pattern (+5 if applicable)
- [ ] Perfect isolation (+5 if applicable)
- [ ] All test IDs present (+5 if applicable)
- [ ] Final score calculated: max(0, min(100, Starting - Violations + Bonus))
**Quality Grade:**
- [ ] Grade assigned based on score:
- 90-100: A+ (Excellent)
- 80-89: A (Good)
- 70-79: B (Acceptable)
- 60-69: C (Needs Improvement)
- <60: F (Critical Issues)
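Example: one critical violation (-10), two high violations (-10), and a +10 bonus give max(0, min(100, 100 - 10 - 10 + 10)) = 90, which maps to A+ (Excellent).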
---
### Step 5: Review Report Generation
**Report Sections Created:**
- [ ] **Header Section**:
- [ ] Test file(s) reviewed listed
- [ ] Review date recorded
- [ ] Review scope noted (single/directory/suite)
- [ ] Quality score and grade displayed
- [ ] **Executive Summary**:
- [ ] Overall assessment (Excellent/Good/Needs Improvement/Critical)
- [ ] Key strengths listed (3-5 bullet points)
- [ ] Key weaknesses listed (3-5 bullet points)
- [ ] Recommendation stated (Approve/Approve with comments/Request changes/Block)
- [ ] **Quality Criteria Assessment**:
- [ ] Table with all criteria evaluated
- [ ] Status for each criterion (PASS/WARN/FAIL)
- [ ] Violation count per criterion
- [ ] **Critical Issues (Must Fix)**:
- [ ] P0/P1 violations listed
- [ ] Code location provided for each (file:line)
- [ ] Issue explanation clear
- [ ] Recommended fix provided with code example
- [ ] Knowledge base reference provided
- [ ] **Recommendations (Should Fix)**:
- [ ] P2/P3 violations listed
- [ ] Code location provided for each (file:line)
- [ ] Issue explanation clear
- [ ] Recommended improvement provided with code example
- [ ] Knowledge base reference provided
- [ ] **Best Practices Examples** (if good patterns found):
- [ ] Good patterns highlighted from tests
- [ ] Knowledge base fragments referenced
- [ ] Examples provided for others to follow
- [ ] **Knowledge Base References**:
- [ ] All fragments consulted listed
- [ ] Links to detailed guidance provided
---
### Step 6: Optional Outputs Generation
**Inline Comments** (if `generate_inline_comments: true`):
- [ ] Inline comments generated at violation locations
- [ ] Comment format: `// TODO (TEA Review): [Issue] - See test-review-{filename}.md`
- [ ] Comments added to test files (no logic changes)
- [ ] Test files remain valid and executable
**Quality Badge** (if `generate_quality_badge: true`):
- [ ] Badge created with quality score (e.g., "Test Quality: 87/100 (A)")
- [ ] Badge format suitable for README or documentation
- [ ] Badge saved to output folder
**Story Update** (if `append_to_story: true` and story file exists):
- [ ] "Test Quality Review" section created
- [ ] Quality score included
- [ ] Critical issues summarized
- [ ] Link to full review report provided
- [ ] Story file updated successfully
---
### Step 7: Save and Notify
**Outputs Saved:**
- [ ] Review report saved to `{output_file}`
- [ ] Inline comments written to test files (if enabled)
- [ ] Quality badge saved (if enabled)
- [ ] Story file updated (if enabled)
- [ ] All outputs are valid and readable
**Summary Message Generated:**
- [ ] Quality score and grade included
- [ ] Critical issue count stated
- [ ] Recommendation provided (Approve/Request changes/Block)
- [ ] Next steps clarified
- [ ] Message displayed to user
---
## Output Validation
### Review Report Completeness
- [ ] All required sections present
- [ ] No placeholder text or TODOs in report
- [ ] All code locations are accurate (file:line)
- [ ] All code examples are valid and demonstrate fix
- [ ] All knowledge base references are correct
### Review Report Accuracy
- [ ] Quality score matches violation breakdown
- [ ] Grade matches score range
- [ ] Violations correctly categorized by severity (P0/P1/P2/P3)
- [ ] Violations correctly attributed to quality criteria
- [ ] No false positives (violations are legitimate issues)
- [ ] No false negatives (critical issues not missed)
### Review Report Clarity
- [ ] Executive summary is clear and actionable
- [ ] Issue explanations are understandable
- [ ] Recommended fixes are implementable
- [ ] Code examples are correct and runnable
- [ ] Recommendation (Approve/Request changes) is clear
---
## Quality Checks
### Knowledge-Based Validation
- [ ] All feedback grounded in knowledge base fragments
- [ ] Recommendations follow proven patterns
- [ ] No arbitrary or opinion-based feedback
- [ ] Knowledge fragment references accurate and relevant
### Actionable Feedback
- [ ] Every issue includes recommended fix
- [ ] Every fix includes code example
- [ ] Code examples demonstrate correct pattern
- [ ] Fixes reference knowledge base for more detail
### Severity Classification
- [ ] Critical (P0) issues are genuinely critical (hard waits, race conditions, no assertions)
- [ ] High (P1) issues impact maintainability/reliability (missing IDs, hardcoded data)
- [ ] Medium (P2) issues are nice-to-have improvements (long files, missing priorities)
- [ ] Low (P3) issues are minor style/preference (verbose tests)
### Context Awareness
- [ ] Review considers project context (some patterns may be justified)
- [ ] Violations with justification comments noted as acceptable
- [ ] Edge cases acknowledged
- [ ] Recommendations are pragmatic, not dogmatic
---
## Integration Points
### Story File Integration
- [ ] Story file discovered correctly (if available)
- [ ] Acceptance criteria extracted and used for context
- [ ] Test quality section appended to story (if enabled)
- [ ] Link to review report added to story
### Test Design Integration
- [ ] Test design document discovered correctly (if available)
- [ ] Priority context (P0/P1/P2/P3) extracted and used
- [ ] Review validates tests align with prioritization
- [ ] Misalignment flagged (e.g., P0 scenario missing tests)
### Knowledge Base Integration
- [ ] tea-index.csv loaded successfully
- [ ] All required fragments loaded
- [ ] Fragments applied correctly to validation
- [ ] Fragment references in report are accurate
---
## Edge Cases and Special Situations
### Empty or Minimal Tests
- [ ] If test file is empty, report notes "No tests found"
- [ ] If test file has only boilerplate, report notes "No meaningful tests"
- [ ] Score reflects lack of content appropriately
### Legacy Tests
- [ ] Legacy tests acknowledged in context
- [ ] Review provides practical recommendations for improvement
- [ ] Recognizes that complete refactor may not be feasible
- [ ] Prioritizes critical issues (flakiness) over style
### Test Framework Variations
- [ ] Review adapts to test framework (Playwright vs Jest vs Cypress)
- [ ] Framework-specific patterns recognized (e.g., Playwright fixtures)
- [ ] Framework-specific violations detected (e.g., Cypress anti-patterns)
- [ ] Knowledge fragments applied appropriately for framework
### Justified Violations
- [ ] Violations with justification comments in code noted as acceptable
- [ ] Justifications evaluated for legitimacy
- [ ] Report acknowledges justified patterns
- [ ] Score not penalized for justified violations
---
## Final Validation
### Review Completeness
- [ ] All enabled quality criteria evaluated
- [ ] All test files in scope reviewed
- [ ] All violations cataloged
- [ ] All recommendations provided
- [ ] Review report is comprehensive
### Review Accuracy
- [ ] Quality score is accurate
- [ ] Violations are correct (no false positives)
- [ ] Critical issues not missed (no false negatives)
- [ ] Code locations are correct
- [ ] Knowledge base references are accurate
### Review Usefulness
- [ ] Feedback is actionable
- [ ] Recommendations are implementable
- [ ] Code examples are correct
- [ ] Review helps developer improve tests
- [ ] Review educates on best practices
### Workflow Complete
- [ ] All checklist items completed
- [ ] All outputs validated and saved
- [ ] User notified with summary
- [ ] Review ready for developer consumption
- [ ] Follow-up actions identified (if any)
---
## Notes
Record any issues, observations, or important context during workflow execution:
- **Test Framework**: [Playwright, Jest, Cypress, etc.]
- **Review Scope**: [single file, directory, full suite]
- **Quality Score**: [0-100 score, letter grade]
- **Critical Issues**: [Count of P0/P1 violations]
- **Recommendation**: [Approve / Approve with comments / Request changes / Block]
- **Special Considerations**: [Legacy code, justified patterns, edge cases]
- **Follow-up Actions**: [Re-review after fixes, pair programming, etc.]


@@ -0,0 +1,628 @@
# Test Quality Review - Instructions v4.0
**Workflow:** `testarch-test-review`
**Purpose:** Review test quality using TEA's comprehensive knowledge base and validate against best practices for maintainability, determinism, isolation, and flakiness prevention
**Agent:** Test Architect (TEA)
**Format:** Pure Markdown v4.0 (no XML blocks)
---
## Overview
This workflow performs comprehensive test quality reviews using TEA's knowledge base of best practices. It validates tests against proven patterns for fixture architecture, network-first safeguards, data factories, determinism, isolation, and flakiness prevention. The review generates actionable feedback with quality scoring.
**Key Capabilities:**
- **Knowledge-Based Review**: Applies patterns from tea-index.csv fragments
- **Quality Scoring**: 0-100 score based on violations and best practices
- **Multi-Scope**: Review single file, directory, or entire test suite
- **Pattern Detection**: Identifies flaky patterns, hard waits, race conditions
- **Best Practice Validation**: BDD format, test IDs, priorities, assertions
- **Actionable Feedback**: Critical issues (must fix) vs recommendations (should fix)
- **Integration**: Works with story files, test-design, acceptance criteria
---
## Prerequisites
**Required:**
- Test file(s) to review (auto-discovered or explicitly provided)
- Test framework configuration (playwright.config.ts, jest.config.js, etc.)
**Recommended:**
- Story file with acceptance criteria (for context)
- Test design document (for priority context)
- Knowledge base fragments available in tea-index.csv
**Halt Conditions:**
- If test file path is invalid or file doesn't exist, halt and request correction
- If test_dir is empty (no tests found), halt and notify user
---
## Workflow Steps
### Step 1: Load Context and Knowledge Base
**Actions:**
1. Check playwright-utils flag:
- Read `{config_source}` and check `config.tea_use_playwright_utils`
2. Load relevant knowledge fragments from `{project-root}/_bmad/bmm/testarch/tea-index.csv`:
**Core Patterns (Always load):**
- `test-quality.md` - Definition of Done: deterministic tests, isolated with cleanup, explicit assertions, <300 lines, <1.5 min (658 lines, 5 examples)
- `data-factories.md` - Factory functions with faker: overrides, nested factories, API-first setup (498 lines, 5 examples)
- `test-levels-framework.md` - E2E vs API vs Component vs Unit appropriateness with decision matrix (467 lines, 4 examples)
- `selective-testing.md` - Duplicate coverage detection with tag-based, spec filter, diff-based selection (727 lines, 4 examples)
- `test-healing-patterns.md` - Common failure patterns: stale selectors, race conditions, dynamic data, network errors, hard waits (648 lines, 5 examples)
- `selector-resilience.md` - Selector best practices (data-testid > ARIA > text > CSS hierarchy, anti-patterns, 541 lines, 4 examples)
- `timing-debugging.md` - Race condition prevention and async debugging techniques (370 lines, 3 examples)
**If `config.tea_use_playwright_utils: true` (All Utilities):**
- `overview.md` - Playwright utils best practices
- `api-request.md` - Validate apiRequest usage patterns
- `network-recorder.md` - Review HAR record/playback implementation
- `auth-session.md` - Check auth token management
- `intercept-network-call.md` - Validate network interception
- `recurse.md` - Review polling patterns
- `log.md` - Check logging best practices
- `file-utils.md` - Validate file operation patterns
- `burn-in.md` - Review burn-in configuration
- `network-error-monitor.md` - Check error monitoring setup
- `fixtures-composition.md` - Validate mergeTests usage
**If `config.tea_use_playwright_utils: false`:**
- `fixture-architecture.md` - Pure function → Fixture → mergeTests composition with auto-cleanup (406 lines, 5 examples)
- `network-first.md` - Route intercept before navigate to prevent race conditions (489 lines, 5 examples)
- `playwright-config.md` - Environment-based configuration with fail-fast validation (722 lines, 5 examples)
- `component-tdd.md` - Red-Green-Refactor patterns with provider isolation (480 lines, 4 examples)
- `ci-burn-in.md` - Flaky test detection with 10-iteration burn-in loop (678 lines, 4 examples)
3. Determine review scope:
- **single**: Review one test file (`test_file_path` provided)
- **directory**: Review all tests in directory (`test_dir` provided)
- **suite**: Review entire test suite (discover all test files)
4. Auto-discover related artifacts (if `auto_discover_story: true`):
- Extract test ID from filename (e.g., `1.3-E2E-001.spec.ts` → story 1.3)
- Search for story file (`story-1.3.md`)
- Search for test design (`test-design-story-1.3.md` or `test-design-epic-1.md`)
5. Read story file for context (if available):
- Extract acceptance criteria
- Extract priority classification
- Extract expected test IDs
**Output:** Complete knowledge base loaded, review scope determined, context gathered
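A minimal sketch of the filename-based auto-discovery in step 4, assuming the `1.3-E2E-001.spec.ts` style convention shown above; the regex and derived file names are illustrative, not fixed rules:

```typescript
// Hypothetical helper: derive related artifact names from a spec filename
// such as "1.3-E2E-001.spec.ts". Naming conventions here are assumptions.
function discoverArtifacts(specFile: string) {
  const match = specFile.match(/^(\d+\.\d+)-([A-Z]+)-(\d+)/);
  if (!match) return null;
  const [, storyId] = match;
  return {
    storyId, // "1.3"
    storyFile: `story-${storyId}.md`, // "story-1.3.md"
    testDesignFile: `test-design-story-${storyId}.md`,
  };
}
```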
---
### Step 2: Discover and Parse Test Files
**Actions:**
1. **Discover test files** based on scope:
- **single**: Use `test_file_path` variable
- **directory**: Use `glob` to find all test files in `test_dir` (e.g., `*.spec.ts`, `*.test.js`)
- **suite**: Use `glob` to find all test files recursively from project root
2. **Parse test file metadata**:
- File path and name
- File size (warn if >15 KB or >300 lines)
- Test framework detected (Playwright, Jest, Cypress, Vitest, etc.)
- Imports and dependencies
- Test structure (describe/context/it blocks)
3. **Extract test structure**:
- Count of describe blocks (test suites)
- Count of it/test blocks (individual tests)
- Test IDs (if present, e.g., `test.describe('1.3-E2E-001')`)
- Priority markers (if present, e.g., `test.describe.only` for P0)
- BDD structure (Given-When-Then comments or steps)
4. **Identify test patterns**:
- Fixtures used
- Data factories used
- Network interception patterns
- Assertions used (expect, assert, toHaveText, etc.)
- Waits and timeouts (page.waitFor, sleep, hardcoded delays)
- Conditionals (if/else, switch, ternary)
- Try/catch blocks
- Shared state or globals
**Output:** Complete test file inventory with structure and pattern analysis
---
### Step 3: Validate Against Quality Criteria
**Actions:**
For each test file, validate against quality criteria (configurable via workflow variables):
#### 1. BDD Format Validation (if `check_given_when_then: true`)
- ✅ **PASS**: Tests use Given-When-Then structure (comments or step organization)
- ⚠️ **WARN**: Tests have some structure but not explicit GWT
- ❌ **FAIL**: Tests lack clear structure, hard to understand intent
**Knowledge Fragment**: test-quality.md, tdd-cycles.md
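A minimal sketch of the Given-When-Then structure this check looks for; the selectors, URL, and credentials are illustrative placeholders, not project values:

```typescript
import { test, expect } from '@playwright/test';

test('1.3-E2E-001: user can log in', async ({ page }) => {
  // Given: the user starts on the login page
  await page.goto('/login');

  // When: they submit valid credentials
  await page.getByTestId('email').fill('user@example.com');
  await page.getByTestId('password').fill('correct-password');
  await page.getByTestId('login-submit').click();

  // Then: the dashboard shows the signed-in user
  await expect(page.getByTestId('user-menu')).toBeVisible();
});
```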
---
#### 2. Test ID Conventions (if `check_test_ids: true`)
- ✅ **PASS**: Test IDs present and follow convention (e.g., `1.3-E2E-001`, `2.1-API-005`)
- ⚠️ **WARN**: Some test IDs missing or inconsistent
- ❌ **FAIL**: No test IDs, can't trace tests to requirements
**Knowledge Fragment**: traceability.md, test-quality.md
---
#### 3. Priority Markers (if `check_priority_markers: true`)
- ✅ **PASS**: Tests classified as P0/P1/P2/P3 (via markers or test-design reference)
- ⚠️ **WARN**: Some priority classifications missing
- ❌ **FAIL**: No priority classification, can't determine criticality
**Knowledge Fragment**: test-priorities.md, risk-governance.md
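One naming sketch that satisfies both the test-ID and priority-marker checks; the `[P0]` suffix in the title is an illustrative convention, not a framework feature:

```typescript
import { test, expect } from '@playwright/test';

// The test ID ties the spec back to story 1.3; [P0] records the priority classification
test.describe('1.3-E2E-001 [P0]: login happy path', () => {
  test('user can sign in with valid credentials', async ({ page }) => {
    await page.goto('/login');
    await expect(page.getByTestId('login-form')).toBeVisible();
  });
});
```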
---
#### 4. Hard Waits Detection (if `check_hard_waits: true`)
- ✅ **PASS**: No hard waits detected (no `sleep()`, `wait(5000)`, hardcoded delays)
- ⚠️ **WARN**: Some hard waits used but with justification comments
- ❌ **FAIL**: Hard waits detected without justification (flakiness risk)
**Patterns to detect:**
- `sleep(1000)`, `setTimeout()`, `delay()`
- `page.waitForTimeout(5000)` without explicit reason
- `await new Promise(resolve => setTimeout(resolve, 3000))`
**Knowledge Fragment**: test-quality.md, network-first.md
---
#### 5. Determinism Check (if `check_determinism: true`)
- ✅ **PASS**: Tests are deterministic (no conditionals, no try/catch abuse, no random values)
- ⚠️ **WARN**: Some conditionals but with clear justification
- ❌ **FAIL**: Tests use if/else, switch, or try/catch to control flow (flakiness risk)
**Patterns to detect:**
- `if (condition) { test logic }` - tests should work deterministically
- `try { test } catch { fallback }` - tests shouldn't swallow errors
- `Math.random()`, `Date.now()` without factory abstraction
**Knowledge Fragment**: test-quality.md, data-factories.md
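A sketch of the conditional anti-pattern next to a deterministic alternative; the route and query parameter are illustrative:

```typescript
import { test, expect } from '@playwright/test';

// ❌ Non-deterministic: the test branches on runtime state, so failures can hide
test('shows welcome banner (flaky)', async ({ page }) => {
  await page.goto('/home');
  if (await page.getByTestId('welcome-banner').isVisible()) {
    await expect(page.getByTestId('welcome-banner')).toContainText('Welcome');
  }
});

// ✅ Deterministic: arrange a known state up front, then assert unconditionally
test('shows welcome banner for first-time visitors', async ({ page }) => {
  await page.goto('/home?firstVisit=true'); // illustrative way to force the state
  await expect(page.getByTestId('welcome-banner')).toContainText('Welcome');
});
```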
---
#### 6. Isolation Validation (if `check_isolation: true`)
- ✅ **PASS**: Tests clean up resources, no shared state, can run in any order
- ⚠️ **WARN**: Some cleanup missing but isolated enough
- ❌ **FAIL**: Tests share state, depend on execution order, leave resources behind
**Patterns to check:**
- afterEach/afterAll cleanup hooks present
- No global variables mutated
- Database/API state cleaned up after tests
- Test data deleted or marked inactive
**Knowledge Fragment**: test-quality.md, data-factories.md
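A minimal cleanup sketch, assuming a REST endpoint exists for deleting the records a test creates (the endpoint and payload are illustrative); the fixture-architecture fragment describes a cleaner auto-cleanup fixture for the same job:

```typescript
import { test, expect } from '@playwright/test';

const createdUserIds: string[] = [];

test.afterEach(async ({ request }) => {
  // Delete everything this test created so later tests start from a clean state
  for (const id of createdUserIds.splice(0)) {
    await request.delete(`/api/users/${id}`);
  }
});

test('new user appears in the admin list', async ({ page, request }) => {
  const res = await request.post('/api/users', { data: { name: 'Temp User' } });
  createdUserIds.push((await res.json()).id);

  await page.goto('/admin/users');
  await expect(page.getByText('Temp User')).toBeVisible();
});
```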
---
#### 7. Fixture Patterns (if `check_fixture_patterns: true`)
- ✅ **PASS**: Uses pure function → Fixture → mergeTests pattern
- ⚠️ **WARN**: Some fixtures used but not consistently
- ❌ **FAIL**: No fixtures, tests repeat setup code (maintainability risk)
**Patterns to check:**
- Fixtures defined (e.g., `test.extend({ customFixture: async ({}, use) => { ... }})`)
- Pure functions used for fixture logic
- mergeTests used to combine fixtures
- No beforeEach with complex setup (should be in fixtures)
**Knowledge Fragment**: fixture-architecture.md
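A composition sketch of the pure function → fixture → mergeTests pattern; the fixture names and token value are illustrative:

```typescript
import { test as base, mergeTests, expect } from '@playwright/test';

// Pure function: trivially unit-testable, no Playwright types involved
export function buildAuthHeader(token: string) {
  return { Authorization: `Bearer ${token}` };
}

// Fixture wrapping the pure function
const authTest = base.extend<{ authHeader: Record<string, string> }>({
  authHeader: async ({}, use) => {
    await use(buildAuthHeader('illustrative-token'));
  },
});

// A second small fixture, composed with the first via mergeTests
const loggingTest = base.extend<{ log: (msg: string) => void }>({
  log: async ({}, use) => {
    await use((msg) => console.log(`[test] ${msg}`));
  },
});

export const test = mergeTests(authTest, loggingTest);
export { expect };
```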
---
#### 8. Data Factories (if `check_data_factories: true`)
- ✅ **PASS**: Uses factory functions with overrides, API-first setup
- ⚠️ **WARN**: Some factories used but also hardcoded data
- ❌ **FAIL**: Hardcoded test data, magic strings/numbers (maintainability risk)
**Patterns to check:**
- Factory functions defined (e.g., `createUser()`, `generateInvoice()`)
- Factories use faker.js or similar for realistic data
- Factories accept overrides (e.g., `createUser({ email: 'custom@example.com' })`)
- API-first setup (create via API, test via UI)
**Knowledge Fragment**: data-factories.md
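A factory sketch with overrides, assuming `@faker-js/faker` is available; the field names are illustrative:

```typescript
import { faker } from '@faker-js/faker';

export interface TestUser {
  email: string;
  password: string;
  role: 'admin' | 'member';
}

// Realistic defaults from faker, with per-test overrides for the fields that matter
export function createTestUser(overrides: Partial<TestUser> = {}): TestUser {
  return {
    email: faker.internet.email(),
    password: faker.internet.password(),
    role: 'member',
    ...overrides,
  };
}

// Usage: const admin = createTestUser({ role: 'admin' });
```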
---
#### 9. Network-First Pattern (if `check_network_first: true`)
- ✅ **PASS**: Route interception set up BEFORE navigation (race condition prevention)
- ⚠️ **WARN**: Some routes intercepted correctly, others after navigation
- ❌ **FAIL**: Route interception after navigation (race condition risk)
**Patterns to check:**
- `page.route()` called before `page.goto()`
- `page.waitForResponse()` used with explicit URL pattern
- No navigation followed immediately by route setup
**Knowledge Fragment**: network-first.md
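A minimal sketch of registering the intercept before navigating; the endpoint and payload are illustrative:

```typescript
import { test, expect } from '@playwright/test';

test('dashboard renders stubbed orders', async ({ page }) => {
  // Register the route BEFORE navigation so the app's first request is intercepted
  await page.route('**/api/orders', (route) =>
    route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify([{ id: 1, total: 42 }]),
    }),
  );

  await page.goto('/dashboard');
  await expect(page.getByTestId('order-row')).toHaveCount(1);
});
```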
---
#### 10. Assertions (if `check_assertions: true`)
- ✅ **PASS**: Explicit assertions present (expect, assert, toHaveText)
- ⚠️ **WARN**: Some tests rely on implicit waits instead of assertions
- ❌ **FAIL**: Missing assertions, tests don't verify behavior
**Patterns to check:**
- Each test has at least one assertion
- Assertions are specific (not just truthy checks)
- Assertions use framework-provided matchers (toHaveText, toBeVisible)
**Knowledge Fragment**: test-quality.md
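A sketch contrasting a vague truthy check with a specific framework matcher; the selector and expected text are illustrative:

```typescript
import { test, expect } from '@playwright/test';

test('profile shows the signed-in user', async ({ page }) => {
  await page.goto('/profile');

  // ⚠️ Weak: passes as long as the page returned any markup at all
  expect(await page.content()).toBeTruthy();

  // ✅ Specific: asserts the exact element and its visible text
  await expect(page.getByTestId('profile-name')).toHaveText('Ada Lovelace');
});
```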
---
#### 11. Test Length (if `check_test_length: true`)
- ✅ **PASS**: Test file ≤200 lines (ideal), ≤300 lines (acceptable)
- ⚠️ **WARN**: Test file 301-500 lines (consider splitting)
- ❌ **FAIL**: Test file >500 lines (too large, maintainability risk)
**Knowledge Fragment**: test-quality.md
---
#### 12. Test Duration (if `check_test_duration: true`)
- ✅ **PASS**: Individual tests ≤1.5 minutes (target: <30 seconds)
- ⚠️ **WARN**: Some tests 1.5-3 minutes (consider optimization)
- ❌ **FAIL**: Tests >3 minutes (too slow, impacts CI/CD)
**Note:** Duration is estimated from complexity analysis when execution data is unavailable
**Knowledge Fragment**: test-quality.md, selective-testing.md
---
#### 13. Flakiness Patterns (if `check_flakiness_patterns: true`)
- ✅ **PASS**: No known flaky patterns detected
- ⚠️ **WARN**: Some potential flaky patterns (e.g., tight timeouts, race conditions)
- ❌ **FAIL**: Multiple flaky patterns detected (high flakiness risk)
**Patterns to detect:**
- Tight timeouts (e.g., `{ timeout: 1000 }`)
- Race conditions (navigation before route interception)
- Timing-dependent assertions (e.g., checking timestamps)
- Retry logic in tests (hides flakiness)
- Environment-dependent assumptions (hardcoded URLs, ports)
**Knowledge Fragment**: test-quality.md, network-first.md, ci-burn-in.md
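A sketch of the tight-timeout and race-condition patterns above, with a more stable alternative; the endpoint is illustrative:

```typescript
import { test, expect } from '@playwright/test';

test('report finishes generating', async ({ page }) => {
  // ❌ Flaky: tight timeout plus a race — the response can arrive before the wait begins
  // await page.getByTestId('generate').click();
  // await page.waitForResponse('**/api/report', { timeout: 1000 });

  // ✅ More stable: start waiting before triggering, with the default timeout
  const reportResponse = page.waitForResponse('**/api/report');
  await page.getByTestId('generate').click();
  await reportResponse;

  await expect(page.getByTestId('report-status')).toHaveText('Done');
});
```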
---
### Step 4: Calculate Quality Score
**Actions:**
1. **Count violations** by severity:
- **Critical (P0)**: Hard waits without justification, no assertions, race conditions, shared state
- **High (P1)**: Missing test IDs, no BDD structure, hardcoded data, missing fixtures
- **Medium (P2)**: Long test files (>300 lines), missing priorities, some conditionals
- **Low (P3)**: Minor style issues, incomplete cleanup, verbose tests
2. **Calculate quality score** (if `quality_score_enabled: true`):
```
Starting Score: 100
Critical Violations: -10 points each
High Violations: -5 points each
Medium Violations: -2 points each
Low Violations: -1 point each
Bonus Points:
+ Excellent BDD structure: +5
+ Comprehensive fixtures: +5
+ Comprehensive data factories: +5
+ Network-first pattern: +5
+ Perfect isolation: +5
+ All test IDs present: +5
Quality Score: max(0, min(100, Starting Score - Violations + Bonus))
```
3. **Quality Grade**:
- **90-100**: Excellent (A+)
- **80-89**: Good (A)
- **70-79**: Acceptable (B)
- **60-69**: Needs Improvement (C)
- **<60**: Critical Issues (F)
**Output:** Quality score calculated with violation breakdown
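A small sketch of the scoring arithmetic above; the function and type names are illustrative:

```typescript
interface ViolationCounts {
  critical: number;
  high: number;
  medium: number;
  low: number;
}

function qualityScore(v: ViolationCounts, bonus: number): number {
  const deductions = v.critical * 10 + v.high * 5 + v.medium * 2 + v.low;
  const cappedBonus = Math.min(bonus, 30); // bonus is capped at +30
  return Math.max(0, Math.min(100, 100 - deductions + cappedBonus));
}

// Matches the worked example later in this document (1 critical, 2 high, 1 low, +10 bonus):
// qualityScore({ critical: 1, high: 2, medium: 0, low: 1 }, 10) === 89
```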
---
### Step 5: Generate Review Report
**Actions:**
1. **Create review report** using `test-review-template.md`:
**Header Section:**
- Test file(s) reviewed
- Review date
- Review scope (single/directory/suite)
- Quality score and grade
**Executive Summary:**
- Overall assessment (Excellent/Good/Needs Improvement/Critical)
- Key strengths
- Key weaknesses
- Recommendation (Approve/Approve with comments/Request changes)
**Quality Criteria Assessment:**
- Table with all criteria evaluated
- Status for each (PASS/WARN/FAIL)
- Violation count per criterion
**Critical Issues (Must Fix):**
- Priority P0/P1 violations
- Code location (file:line)
- Explanation of issue
- Recommended fix
- Knowledge base reference
**Recommendations (Should Fix):**
- Priority P2/P3 violations
- Code location (file:line)
- Explanation of issue
- Recommended improvement
- Knowledge base reference
**Best Practices Examples:**
- Highlight good patterns found in tests
- Reference knowledge base fragments
- Provide examples for others to follow
**Knowledge Base References:**
- List all fragments consulted
- Provide links to detailed guidance
2. **Generate inline comments** (if `generate_inline_comments: true`):
- Add TODO comments in test files at violation locations
- Format: `// TODO (TEA Review): [Issue description] - See test-review-{filename}.md`
- Never modify test logic, only add comments
3. **Generate quality badge** (if `generate_quality_badge: true`):
- Create badge with quality score (e.g., "Test Quality: 87/100 (A)")
- Format for inclusion in README or documentation
4. **Append to story file** (if `append_to_story: true` and story file exists):
- Add "Test Quality Review" section to story
- Include quality score and critical issues
- Link to full review report
**Output:** Comprehensive review report with actionable feedback
---
### Step 6: Save Outputs and Notify
**Actions:**
1. **Save review report** to `{output_file}`
2. **Save inline comments** to test files (if enabled)
3. **Save quality badge** to output folder (if enabled)
4. **Update story file** (if enabled)
5. **Generate summary message** for user:
- Quality score and grade
- Critical issue count
- Recommendation
**Output:** All review artifacts saved and user notified
---
## Quality Criteria Decision Matrix
| Criterion | PASS | WARN | FAIL | Knowledge Fragment |
| ------------------ | ------------------------- | -------------- | ------------------- | ----------------------- |
| BDD Format | Given-When-Then present | Some structure | No structure | test-quality.md |
| Test IDs | All tests have IDs | Some missing | No IDs | traceability.md |
| Priority Markers | All classified | Some missing | No classification | test-priorities.md |
| Hard Waits | No hard waits | Some justified | Hard waits present | test-quality.md |
| Determinism | No conditionals/random | Some justified | Conditionals/random | test-quality.md |
| Isolation | Clean up, no shared state | Some gaps | Shared state | test-quality.md |
| Fixture Patterns | Pure fn → Fixture | Some fixtures | No fixtures | fixture-architecture.md |
| Data Factories | Factory functions | Some factories | Hardcoded data | data-factories.md |
| Network-First | Intercept before navigate | Some correct | Race conditions | network-first.md |
| Assertions | Explicit assertions | Some implicit | Missing assertions | test-quality.md |
| Test Length | ≤300 lines | 301-500 lines | >500 lines | test-quality.md |
| Test Duration | ≤1.5 min | 1.5-3 min | >3 min | test-quality.md |
| Flakiness Patterns | No flaky patterns | Some potential | Multiple patterns | ci-burn-in.md |
---
## Example Review Summary
````markdown
# Test Quality Review: auth-login.spec.ts
**Quality Score**: 89/100 (A - Good)
**Review Date**: 2025-10-14
**Recommendation**: Approve with Comments
## Executive Summary
Overall, the test demonstrates good structure and coverage of the login flow. However, there are several areas for improvement to enhance maintainability and prevent flakiness.
**Strengths:**
- Excellent BDD structure with clear Given-When-Then comments
- Good use of test IDs (1.3-E2E-001, 1.3-E2E-002)
- Comprehensive assertions on authentication state
**Weaknesses:**
- Hard wait detected (page.waitForTimeout(2000)) - flakiness risk
- Hardcoded test data (email: 'test@example.com') - use factories instead
- Missing fixture for common login setup - DRY violation
**Recommendation**: Address critical issue (hard wait) before merging. Other improvements can be addressed in follow-up PR.
## Critical Issues (Must Fix)
### 1. Hard Wait Detected (Line 45)
**Severity**: P0 (Critical)
**Issue**: `await page.waitForTimeout(2000)` introduces flakiness
**Fix**: Use explicit wait for element or network request instead
**Knowledge**: See test-quality.md, network-first.md
```typescript
// ❌ Bad (current)
await page.waitForTimeout(2000);
await expect(page.locator('[data-testid="user-menu"]')).toBeVisible();
// ✅ Good (recommended)
await expect(page.locator('[data-testid="user-menu"]')).toBeVisible({ timeout: 10000 });
```
## Recommendations (Should Fix)
### 1. Use Data Factory for Test User (Lines 23, 32, 41)
**Severity**: P1 (High)
**Issue**: Hardcoded email `test@example.com` - maintainability risk
**Fix**: Create factory function for test users
**Knowledge**: See data-factories.md
```typescript
// ✅ Good (recommended)
import { createTestUser } from './factories/user-factory';
const testUser = createTestUser({ role: 'admin' });
await loginPage.login(testUser.email, testUser.password);
```
### 2. Extract Login Setup to Fixture (Lines 18-28)
**Severity**: P1 (High)
**Issue**: Login setup repeated across tests - DRY violation
**Fix**: Create fixture for authenticated state
**Knowledge**: See fixture-architecture.md
```typescript
// ✅ Good (recommended)
const test = base.extend({
authenticatedPage: async ({ page }, use) => {
const user = createTestUser();
await loginPage.login(user.email, user.password);
await use(page);
},
});
test('user can access dashboard', async ({ authenticatedPage }) => {
// Test starts already logged in
});
```
## Quality Score Breakdown
- Starting Score: 100
- Critical Violations (1 × -10): -10
- High Violations (2 × -5): -10
- Medium Violations (0 × -2): 0
- Low Violations (1 × -1): -1
- Bonus (BDD +5, Test IDs +5): +10
- **Final Score**: 89/100 (A)
````
---
## Integration with Other Workflows
### Before Test Review
- **atdd**: Generate acceptance tests (TEA reviews them for quality)
- **automate**: Expand regression suite (TEA reviews new tests)
- **dev story**: Developer writes implementation tests (TEA reviews them)
### After Test Review
- **Developer**: Addresses critical issues, improves based on recommendations
- **gate**: Test quality review feeds into gate decision (high-quality tests increase confidence)
### Coordinates With
- **Story File**: Review links to acceptance criteria context
- **Test Design**: Review validates tests align with prioritization
- **Knowledge Base**: Review references fragments for detailed guidance
---
## Important Notes
1. **Non-Prescriptive**: Review provides guidance, not rigid rules
2. **Context Matters**: Some violations may be justified for specific scenarios
3. **Knowledge-Based**: All feedback grounded in proven patterns from tea-index.csv
4. **Actionable**: Every issue includes recommended fix with code examples
5. **Quality Score**: Use as indicator, not absolute measure
6. **Continuous Improvement**: Review same tests periodically as patterns evolve
---
## Troubleshooting
**Problem: No test files found**
- Verify test_dir path is correct
- Check test file extensions match glob pattern
- Ensure test files exist in expected location
**Problem: Quality score seems too low/high**
- Review violation counts - may need to adjust thresholds
- Consider context - some projects have different standards
- Focus on critical issues first, not just score
**Problem: Inline comments not generated**
- Check generate_inline_comments: true in variables
- Verify write permissions on test files
- Review append_to_file: false (separate report mode)
**Problem: Knowledge fragments not loading**
- Verify tea-index.csv exists in testarch/ directory
- Check fragment file paths are correct
- Ensure auto_load_knowledge: true in variables


@@ -0,0 +1,390 @@
# Test Quality Review: {test_filename}
**Quality Score**: {score}/100 ({grade} - {assessment})
**Review Date**: {YYYY-MM-DD}
**Review Scope**: {single | directory | suite}
**Reviewer**: {user_name or TEA Agent}
---
Note: This review audits existing tests; it does not generate tests.
## Executive Summary
**Overall Assessment**: {Excellent | Good | Acceptable | Needs Improvement | Critical Issues}
**Recommendation**: {Approve | Approve with Comments | Request Changes | Block}
### Key Strengths
✅ {strength_1}
✅ {strength_2}
✅ {strength_3}
### Key Weaknesses
❌ {weakness_1}
❌ {weakness_2}
❌ {weakness_3}
### Summary
{1-2 paragraph summary of overall test quality, highlighting major findings and recommendation rationale}
---
## Quality Criteria Assessment
| Criterion | Status | Violations | Notes |
| ------------------------------------ | ------------------------------- | ---------- | ------------ |
| BDD Format (Given-When-Then) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Test IDs | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Priority Markers (P0/P1/P2/P3) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Hard Waits (sleep, waitForTimeout) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Determinism (no conditionals) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Isolation (cleanup, no shared state) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Fixture Patterns | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Data Factories | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Network-First Pattern | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Explicit Assertions | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Test Length (≤300 lines) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {lines} | {brief_note} |
| Test Duration (≤1.5 min) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {duration} | {brief_note} |
| Flakiness Patterns | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
**Total Violations**: {critical_count} Critical, {high_count} High, {medium_count} Medium, {low_count} Low
---
## Quality Score Breakdown
```
Starting Score: 100
Critical Violations: -{critical_count} × 10 = -{critical_deduction}
High Violations: -{high_count} × 5 = -{high_deduction}
Medium Violations: -{medium_count} × 2 = -{medium_deduction}
Low Violations: -{low_count} × 1 = -{low_deduction}
Bonus Points:
Excellent BDD: +{0|5}
Comprehensive Fixtures: +{0|5}
Data Factories: +{0|5}
Network-First: +{0|5}
Perfect Isolation: +{0|5}
All Test IDs: +{0|5}
--------
Total Bonus: +{bonus_total}
Final Score: {final_score}/100
Grade: {grade}
```
---
## Critical Issues (Must Fix)
{If no critical issues: "No critical issues detected. ✅"}
{For each critical issue:}
### {issue_number}. {Issue Title}
**Severity**: P0 (Critical)
**Location**: `{filename}:{line_number}`
**Criterion**: {criterion_name}
**Knowledge Base**: [{fragment_name}]({fragment_path})
**Issue Description**:
{Detailed explanation of what the problem is and why it's critical}
**Current Code**:
```typescript
// ❌ Bad (current implementation)
{code_snippet_showing_problem}
```
**Recommended Fix**:
```typescript
// ✅ Good (recommended approach)
{code_snippet_showing_solution}
```
**Why This Matters**:
{Explanation of impact - flakiness risk, maintainability, reliability}
**Related Violations**:
{If similar issue appears elsewhere, note line numbers}
---
## Recommendations (Should Fix)
{If no recommendations: "No additional recommendations. Test quality is excellent. ✅"}
{For each recommendation:}
### {rec_number}. {Recommendation Title}
**Severity**: {P1 (High) | P2 (Medium) | P3 (Low)}
**Location**: `{filename}:{line_number}`
**Criterion**: {criterion_name}
**Knowledge Base**: [{fragment_name}]({fragment_path})
**Issue Description**:
{Detailed explanation of what could be improved and why}
**Current Code**:
```typescript
// ⚠️ Could be improved (current implementation)
{code_snippet_showing_current_approach}
```
**Recommended Improvement**:
```typescript
// ✅ Better approach (recommended)
{code_snippet_showing_improvement}
```
**Benefits**:
{Explanation of benefits - maintainability, readability, reusability}
**Priority**:
{Why this is P1/P2/P3 - urgency and impact}
---
## Best Practices Found
{If good patterns found, highlight them}
{For each best practice:}
### {practice_number}. {Best Practice Title}
**Location**: `{filename}:{line_number}`
**Pattern**: {pattern_name}
**Knowledge Base**: [{fragment_name}]({fragment_path})
**Why This Is Good**:
{Explanation of why this pattern is excellent}
**Code Example**:
```typescript
// ✅ Excellent pattern demonstrated in this test
{code_snippet_showing_best_practice}
```
**Use as Reference**:
{Encourage using this pattern in other tests}
---
## Test File Analysis
### File Metadata
- **File Path**: `{relative_path_from_project_root}`
- **File Size**: {line_count} lines, {kb_size} KB
- **Test Framework**: {Playwright | Jest | Cypress | Vitest | Other}
- **Language**: {TypeScript | JavaScript}
### Test Structure
- **Describe Blocks**: {describe_count}
- **Test Cases (it/test)**: {test_count}
- **Average Test Length**: {avg_lines_per_test} lines per test
- **Fixtures Used**: {fixture_count} ({fixture_names})
- **Data Factories Used**: {factory_count} ({factory_names})
### Test Coverage Scope
- **Test IDs**: {test_id_list}
- **Priority Distribution**:
- P0 (Critical): {p0_count} tests
- P1 (High): {p1_count} tests
- P2 (Medium): {p2_count} tests
- P3 (Low): {p3_count} tests
- Unknown: {unknown_count} tests
### Assertions Analysis
- **Total Assertions**: {assertion_count}
- **Assertions per Test**: {avg_assertions_per_test} (avg)
- **Assertion Types**: {assertion_types_used}
---
## Context and Integration
### Related Artifacts
{If story file found:}
- **Story File**: [{story_filename}]({story_path})
- **Acceptance Criteria Mapped**: {ac_mapped}/{ac_total} ({ac_coverage}%)
{If test-design found:}
- **Test Design**: [{test_design_filename}]({test_design_path})
- **Risk Assessment**: {risk_level}
- **Priority Framework**: P0-P3 applied
### Acceptance Criteria Validation
{If story file available, map tests to ACs:}
| Acceptance Criterion | Test ID | Status | Notes |
| -------------------- | --------- | -------------------------- | ------- |
| {AC_1} | {test_id} | {✅ Covered \| ❌ Missing} | {notes} |
| {AC_2} | {test_id} | {✅ Covered \| ❌ Missing} | {notes} |
| {AC_3} | {test_id} | {✅ Covered \| ❌ Missing} | {notes} |
**Coverage**: {covered_count}/{total_count} criteria covered ({coverage_percentage}%)
---
## Knowledge Base References
This review consulted the following knowledge base fragments:
- **[test-quality.md](../../../testarch/knowledge/test-quality.md)** - Definition of Done for tests (no hard waits, <300 lines, <1.5 min, self-cleaning)
- **[fixture-architecture.md](../../../testarch/knowledge/fixture-architecture.md)** - Pure function → Fixture → mergeTests pattern
- **[network-first.md](../../../testarch/knowledge/network-first.md)** - Route intercept before navigate (race condition prevention)
- **[data-factories.md](../../../testarch/knowledge/data-factories.md)** - Factory functions with overrides, API-first setup
- **[test-levels-framework.md](../../../testarch/knowledge/test-levels-framework.md)** - E2E vs API vs Component vs Unit appropriateness
- **[tdd-cycles.md](../../../testarch/knowledge/tdd-cycles.md)** - Red-Green-Refactor patterns
- **[selective-testing.md](../../../testarch/knowledge/selective-testing.md)** - Duplicate coverage detection
- **[ci-burn-in.md](../../../testarch/knowledge/ci-burn-in.md)** - Flakiness detection patterns (10-iteration loop)
- **[test-priorities.md](../../../testarch/knowledge/test-priorities.md)** - P0/P1/P2/P3 classification framework
- **[traceability.md](../../../testarch/knowledge/traceability.md)** - Requirements-to-tests mapping
See [tea-index.csv](../../../testarch/tea-index.csv) for complete knowledge base.
---
## Next Steps
### Immediate Actions (Before Merge)
1. **{action_1}** - {description}
- Priority: {P0 | P1 | P2}
- Owner: {team_or_person}
- Estimated Effort: {time_estimate}
2. **{action_2}** - {description}
- Priority: {P0 | P1 | P2}
- Owner: {team_or_person}
- Estimated Effort: {time_estimate}
### Follow-up Actions (Future PRs)
1. **{action_1}** - {description}
- Priority: {P2 | P3}
- Target: {next_sprint | backlog}
2. **{action_2}** - {description}
- Priority: {P2 | P3}
- Target: {next_sprint | backlog}
### Re-Review Needed?
{✅ No re-review needed - approve as-is}
{⚠ Re-review after critical fixes - request changes, then re-review}
{❌ Major refactor required - block merge, pair programming recommended}
---
## Decision
**Recommendation**: {Approve | Approve with Comments | Request Changes | Block}
**Rationale**:
{1-2 paragraph explanation of recommendation based on findings}
**For Approve**:
> Test quality is excellent/good with {score}/100 score. {Minor issues noted can be addressed in follow-up PRs.} Tests are production-ready and follow best practices.
**For Approve with Comments**:
> Test quality is acceptable with {score}/100 score. {High-priority recommendations should be addressed but don't block merge.} Critical issues resolved, but improvements would enhance maintainability.
**For Request Changes**:
> Test quality needs improvement with {score}/100 score. {Critical issues must be fixed before merge.} {X} critical violations detected that pose flakiness/maintainability risks.
**For Block**:
> Test quality is insufficient with {score}/100 score. {Multiple critical issues make tests unsuitable for production.} Recommend pairing session with QA engineer to apply patterns from knowledge base.
---
## Appendix
### Violation Summary by Location
{Table of all violations sorted by line number:}
| Line | Severity | Criterion | Issue | Fix |
| ------ | ------------- | ----------- | ------------- | ----------- |
| {line} | {P0/P1/P2/P3} | {criterion} | {brief_issue} | {brief_fix} |
| {line} | {P0/P1/P2/P3} | {criterion} | {brief_issue} | {brief_fix} |
### Quality Trends
{If reviewing same file multiple times, show trend:}
| Review Date | Score | Grade | Critical Issues | Trend |
| ------------ | ------------- | --------- | --------------- | ----------- |
| {YYYY-MM-DD} | {score_1}/100 | {grade_1} | {count_1} | Improved |
| {YYYY-MM-DD} | {score_2}/100 | {grade_2} | {count_2} | Declined |
| {YYYY-MM-DD} | {score_3}/100 | {grade_3} | {count_3} | Stable |
### Related Reviews
{If reviewing multiple files in directory/suite:}
| File | Score | Grade | Critical | Status |
| -------- | ----------- | ------- | -------- | ------------------ |
| {file_1} | {score}/100 | {grade} | {count} | {Approved/Blocked} |
| {file_2} | {score}/100 | {grade} | {count} | {Approved/Blocked} |
| {file_3} | {score}/100 | {grade} | {count} | {Approved/Blocked} |
**Suite Average**: {avg_score}/100 ({avg_grade})
---
## Review Metadata
**Generated By**: BMad TEA Agent (Test Architect)
**Workflow**: testarch-test-review v4.0
**Review ID**: test-review-{filename}-{YYYYMMDD}
**Timestamp**: {YYYY-MM-DD HH:MM:SS}
**Version**: 1.0
---
## Feedback on This Review
If you have questions or feedback on this review:
1. Review patterns in knowledge base: `testarch/knowledge/`
2. Consult tea-index.csv for detailed guidance
3. Request clarification on specific violations
4. Pair with QA engineer to apply patterns
This review is guidance, not rigid rules. Context matters - if a pattern is justified, document it with a comment.


@@ -0,0 +1,46 @@
# Test Architect workflow: test-review
name: testarch-test-review
description: "Review test quality using comprehensive knowledge base and best practices validation"
author: "BMad"
# Critical variables from config
config_source: "{project-root}/_bmad/bmm/config.yaml"
output_folder: "{config_source}:output_folder"
user_name: "{config_source}:user_name"
communication_language: "{config_source}:communication_language"
document_output_language: "{config_source}:document_output_language"
date: system-generated
# Workflow components
installed_path: "{project-root}/_bmad/bmm/workflows/testarch/test-review"
instructions: "{installed_path}/instructions.md"
validation: "{installed_path}/checklist.md"
template: "{installed_path}/test-review-template.md"
# Variables and inputs
variables:
test_dir: "{project-root}/tests" # Root test directory
review_scope: "single" # single (one file), directory (folder), suite (all tests)
# Output configuration
default_output_file: "{output_folder}/test-review.md"
# Required tools
required_tools:
- read_file # Read test files, story, test-design
- write_file # Create review report
- list_files # Discover test files in directory
- search_repo # Find tests by patterns
- glob # Find test files matching patterns
tags:
- qa
- test-architect
- code-review
- quality
- best-practices
execution_hints:
interactive: false # Minimize prompts
autonomous: true # Proceed without user input unless blocked
iterative: true # Can review multiple files