Ignore and untrack BMad directories

Max
2026-01-26 15:49:36 +07:00
parent 7b732372e3
commit 6b113e0392
525 changed files with 2 additions and 112645 deletions

View File

@@ -1,364 +0,0 @@
# ATDD Checklist - Epic {epic_num}, Story {story_num}: {story_title}
**Date:** {date}
**Author:** {user_name}
**Primary Test Level:** {primary_level}
---
## Story Summary
{Brief 2-3 sentence summary of the user story}
**As a** {user_role}
**I want** {feature_description}
**So that** {business_value}
---
## Acceptance Criteria
{List all testable acceptance criteria from the story}
1. {Acceptance criterion 1}
2. {Acceptance criterion 2}
3. {Acceptance criterion 3}
---
## Failing Tests Created (RED Phase)
### E2E Tests ({e2e_test_count} tests)
**File:** `{e2e_test_file_path}` ({line_count} lines)
{List each E2E test with its current status and expected failure reason}
- **Test:** {test_name}
  - **Status:** RED - {failure_reason}
  - **Verifies:** {what_this_test_validates}
### API Tests ({api_test_count} tests)
**File:** `{api_test_file_path}` ({line_count} lines)
{List each API test with its current status and expected failure reason}
- **Test:** {test_name}
  - **Status:** RED - {failure_reason}
  - **Verifies:** {what_this_test_validates}
### Component Tests ({component_test_count} tests)
**File:** `{component_test_file_path}` ({line_count} lines)
{List each component test with its current status and expected failure reason}
- **Test:** {test_name}
  - **Status:** RED - {failure_reason}
  - **Verifies:** {what_this_test_validates}
---
## Data Factories Created
{List all data factory files created with their exports}
### {Entity} Factory
**File:** `tests/support/factories/{entity}.factory.ts`
**Exports:**
- `create{Entity}(overrides?)` - Create single entity with optional overrides
- `create{Entity}s(count)` - Create array of entities
**Example Usage:**
```typescript
const user = createUser({ email: 'specific@example.com' });
const users = createUsers(5); // Generate 5 random users
```
---
## Fixtures Created
{List all test fixture files created with their fixture names and descriptions}
### {Feature} Fixtures
**File:** `tests/support/fixtures/{feature}.fixture.ts`
**Fixtures:**
- `{fixtureName}` - {description_of_what_fixture_provides}
- **Setup:** {what_setup_does}
- **Provides:** {what_test_receives}
- **Cleanup:** {what_cleanup_does}
**Example Usage:**
```typescript
import { test } from './fixtures/{feature}.fixture';
test('should do something', async ({ {fixtureName} }) => {
// {fixtureName} is ready to use with auto-cleanup
});
```
---
## Mock Requirements
{Document external services that need mocking and their requirements}
### {Service Name} Mock
**Endpoint:** `{HTTP_METHOD} {endpoint_url}`
**Success Response:**
```json
{
{success_response_example}
}
```
**Failure Response:**
```json
{
{failure_response_example}
}
```
**Notes:** {any_special_mock_requirements}
---
## Required data-testid Attributes
{List all data-testid attributes required in UI implementation for test stability}
### {Page or Component Name}
- `{data-testid-name}` - {description_of_element}
- `{data-testid-name}` - {description_of_element}
**Implementation Example:**
```tsx
<button data-testid="login-button">Log In</button>
<input data-testid="email-input" type="email" />
<div data-testid="error-message">{errorText}</div>
```
---
## Implementation Checklist
{Map each failing test to concrete implementation tasks that will make it pass}
### Test: {test_name_1}
**File:** `{test_file_path}`
**Tasks to make this test pass:**
- [ ] {Implementation task 1}
- [ ] {Implementation task 2}
- [ ] {Implementation task 3}
- [ ] Add required data-testid attributes: {list_of_testids}
- [ ] Run test: `{test_execution_command}`
- [ ] ✅ Test passes (green phase)
**Estimated Effort:** {effort_estimate} hours
---
### Test: {test_name_2}
**File:** `{test_file_path}`
**Tasks to make this test pass:**
- [ ] {Implementation task 1}
- [ ] {Implementation task 2}
- [ ] {Implementation task 3}
- [ ] Add required data-testid attributes: {list_of_testids}
- [ ] Run test: `{test_execution_command}`
- [ ] ✅ Test passes (green phase)
**Estimated Effort:** {effort_estimate} hours
---
## Running Tests
```bash
# Run all failing tests for this story
{test_command_all}
# Run specific test file
{test_command_specific_file}
# Run tests in headed mode (see browser)
{test_command_headed}
# Debug specific test
{test_command_debug}
# Run tests with coverage
{test_command_coverage}
```
---
## Red-Green-Refactor Workflow
### RED Phase (Complete) ✅
**TEA Agent Responsibilities:**
- ✅ All tests written and failing
- ✅ Fixtures and factories created with auto-cleanup
- ✅ Mock requirements documented
- ✅ data-testid requirements listed
- ✅ Implementation checklist created
**Verification:**
- All tests run and fail as expected
- Failure messages are clear and actionable
- Tests fail due to missing implementation, not test bugs
---
### GREEN Phase (DEV Team - Next Steps)
**DEV Agent Responsibilities:**
1. **Pick one failing test** from implementation checklist (start with highest priority)
2. **Read the test** to understand expected behavior
3. **Implement minimal code** to make that specific test pass
4. **Run the test** to verify it now passes (green)
5. **Check off the task** in implementation checklist
6. **Move to next test** and repeat
**Key Principles:**
- One test at a time (don't try to fix all at once)
- Minimal implementation (don't over-engineer)
- Run tests frequently (immediate feedback)
- Use implementation checklist as roadmap
**Progress Tracking:**
- Check off tasks as you complete them
- Share progress in daily standup
- Mark story as IN PROGRESS in `bmm-workflow-status.md`
---
### REFACTOR Phase (DEV Team - After All Tests Pass)
**DEV Agent Responsibilities:**
1. **Verify all tests pass** (green phase complete)
2. **Review code for quality** (readability, maintainability, performance)
3. **Extract duplications** (DRY principle)
4. **Optimize performance** (if needed)
5. **Ensure tests still pass** after each refactor
6. **Update documentation** (if API contracts change)
**Key Principles:**
- Tests provide safety net (refactor with confidence)
- Make small refactors (easier to debug if tests fail)
- Run tests after each change
- Don't change test behavior (only implementation)
**Completion:**
- All tests pass
- Code quality meets team standards
- No duplications or code smells
- Ready for code review and story approval
---
## Next Steps
1. **Share this checklist and failing tests** with the dev workflow (manual handoff)
2. **Review this checklist** with team in standup or planning
3. **Run failing tests** to confirm RED phase: `{test_command_all}`
4. **Begin implementation** using implementation checklist as guide
5. **Work one test at a time** (red → green for each)
6. **Share progress** in daily standup
7. **When all tests pass**, refactor code for quality
8. **When refactoring complete**, manually update story status to 'done' in sprint-status.yaml
---
## Knowledge Base References Applied
This ATDD workflow consulted the following knowledge fragments:
- **fixture-architecture.md** - Test fixture patterns with setup/teardown and auto-cleanup using Playwright's `test.extend()`
- **data-factories.md** - Factory patterns using `@faker-js/faker` for random test data generation with overrides support
- **component-tdd.md** - Component test strategies using Playwright Component Testing
- **network-first.md** - Route interception patterns (intercept BEFORE navigation to prevent race conditions)
- **test-quality.md** - Test design principles (Given-When-Then, one assertion per test, determinism, isolation)
- **test-levels-framework.md** - Test level selection framework (E2E vs API vs Component vs Unit)
See `tea-index.csv` for complete knowledge fragment mapping.
---
## Test Execution Evidence
### Initial Test Run (RED Phase Verification)
**Command:** `{test_command_all}`
**Results:**
```
{paste_test_run_output_showing_all_tests_failing}
```
**Summary:**
- Total tests: {total_test_count}
- Passing: 0 (expected)
- Failing: {total_test_count} (expected)
- Status: ✅ RED phase verified
**Expected Failure Messages:**
{list_expected_failure_messages_for_each_test}
---
## Notes
{Any additional notes, context, or special considerations for this story}
- {Note 1}
- {Note 2}
- {Note 3}
---
## Contact
**Questions or Issues?**
- Ask in team standup
- Tag @{tea_agent_username} in Slack/Discord
- Refer to `./bmm/docs/tea-README.md` for workflow documentation
- Consult `./bmm/testarch/knowledge` for testing best practices
---
**Generated by BMad TEA Agent** - {date}

View File

@@ -1,374 +0,0 @@
# ATDD Workflow Validation Checklist
Use this checklist to validate that the ATDD workflow has been executed correctly and all deliverables meet quality standards.
## Prerequisites
Before starting this workflow, verify:
- [ ] Story approved with clear acceptance criteria (AC must be testable)
- [ ] Development sandbox/environment ready
- [ ] Framework scaffolding exists (run `framework` workflow if missing)
- [ ] Test framework configuration available (playwright.config.ts or cypress.config.ts)
- [ ] Package.json has test dependencies installed (Playwright or Cypress)
**Halt if missing:** Framework scaffolding or story acceptance criteria
---
## Step 1: Story Context and Requirements
- [ ] Story markdown file loaded and parsed successfully
- [ ] All acceptance criteria identified and extracted
- [ ] Affected systems and components identified
- [ ] Technical constraints documented
- [ ] Framework configuration loaded (playwright.config.ts or cypress.config.ts)
- [ ] Test directory structure identified from config
- [ ] Existing fixture patterns reviewed for consistency
- [ ] Similar test patterns searched and found in `{test_dir}`
- [ ] Knowledge base fragments loaded:
- [ ] `fixture-architecture.md`
- [ ] `data-factories.md`
- [ ] `component-tdd.md`
- [ ] `network-first.md`
- [ ] `test-quality.md`
---
## Step 2: Test Level Selection and Strategy
- [ ] Each acceptance criterion analyzed for appropriate test level
- [ ] Test level selection framework applied (E2E vs API vs Component vs Unit)
- [ ] E2E tests: Critical user journeys and multi-system integration identified
- [ ] API tests: Business logic and service contracts identified
- [ ] Component tests: UI component behavior and interactions identified
- [ ] Unit tests: Pure logic and edge cases identified (if applicable)
- [ ] Duplicate coverage avoided (same behavior not tested at multiple levels unnecessarily)
- [ ] Tests prioritized using P0-P3 framework (if test-design document exists)
- [ ] Primary test level set in `primary_level` variable (typically E2E or API)
- [ ] Test levels documented in ATDD checklist
---
## Step 3: Failing Tests Generated
### Test File Structure Created
- [ ] Test files organized in appropriate directories:
- [ ] `tests/e2e/` for end-to-end tests
- [ ] `tests/api/` for API tests
- [ ] `tests/component/` for component tests
- [ ] `tests/support/` for infrastructure (fixtures, factories, helpers)
### E2E Tests (If Applicable)
- [ ] E2E test files created in `tests/e2e/`
- [ ] All tests follow Given-When-Then format
- [ ] Tests use `data-testid` selectors (not CSS classes or fragile selectors)
- [ ] One assertion per test (atomic test design)
- [ ] No hard waits or sleeps (explicit waits only)
- [ ] Network-first pattern applied (route interception BEFORE navigation)
- [ ] Tests fail initially (RED phase verified by local test run)
- [ ] Failure messages are clear and actionable
### API Tests (If Applicable)
- [ ] API test files created in `tests/api/`
- [ ] Tests follow Given-When-Then format
- [ ] API contracts validated (request/response structure)
- [ ] HTTP status codes verified
- [ ] Response body validation includes all required fields
- [ ] Error cases tested (400, 401, 403, 404, 500)
- [ ] Tests fail initially (RED phase verified)
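For reference, a minimal error-case sketch in the expected Given-When-Then style, reusing the `/api/users` endpoint from the workflow examples (the exact contract is an assumption):
```typescript
import { test, expect } from '@playwright/test';

test.describe('User API - error cases', () => {
  test('POST /api/users - should return 400 when email is missing', async ({ request }) => {
    // GIVEN: A payload that violates the contract (email is assumed to be required)
    const invalidUser = { name: 'No Email' };
    // WHEN: Creating the user via the API
    const response = await request.post('/api/users', { data: invalidUser });
    // THEN: The request is rejected with a 400 (single, atomic assertion)
    expect(response.status()).toBe(400);
  });
});
```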
### Component Tests (If Applicable)
- [ ] Component test files created in `tests/component/`
- [ ] Tests follow Given-When-Then format
- [ ] Component mounting works correctly
- [ ] Interaction testing covers user actions (click, hover, keyboard)
- [ ] State management within component validated
- [ ] Props and events tested
- [ ] Tests fail initially (RED phase verified)
### Test Quality Validation
- [ ] All tests use Given-When-Then structure with clear comments
- [ ] All tests have descriptive names explaining what they test
- [ ] No duplicate tests (same behavior tested multiple times)
- [ ] No flaky patterns (race conditions, timing issues)
- [ ] No test interdependencies (tests can run in any order)
- [ ] Tests are deterministic (same input always produces same result)
---
## Step 4: Data Infrastructure Built
### Data Factories Created
- [ ] Factory files created in `tests/support/factories/`
- [ ] All factories use `@faker-js/faker` for random data generation (no hardcoded values)
- [ ] Factories support overrides for specific test scenarios
- [ ] Factories generate complete valid objects matching API contracts
- [ ] Helper functions for bulk creation provided (e.g., `createUsers(count)`)
- [ ] Factory exports are properly typed (TypeScript)
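For reference, one possible shape for a typed factory; the `User` interface here is an assumption standing in for the real API contract:
```typescript
// tests/support/factories/user.factory.ts (typed sketch)
import { faker } from '@faker-js/faker';

export interface User {
  id: number;
  email: string;
  name: string;
  createdAt: string;
}

// Overrides let individual tests pin specific fields while the rest stays random
export const createUser = (overrides: Partial<User> = {}): User => ({
  id: faker.number.int(),
  email: faker.internet.email(),
  name: faker.person.fullName(),
  createdAt: faker.date.recent().toISOString(),
  ...overrides,
});

// Bulk helper with a typed return value
export const createUsers = (count: number): User[] =>
  Array.from({ length: count }, () => createUser());
```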
### Test Fixtures Created
- [ ] Fixture files created in `tests/support/fixtures/`
- [ ] All fixtures use Playwright's `test.extend()` pattern
- [ ] Fixtures have setup phase (arrange test preconditions)
- [ ] Fixtures provide data to tests via `await use(data)`
- [ ] Fixtures have teardown phase with auto-cleanup (delete created data)
- [ ] Fixtures are composable (can use other fixtures if needed)
- [ ] Fixtures are isolated (each test gets fresh data)
- [ ] Fixtures are type-safe (TypeScript types defined)
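For the composability item, a sketch using Playwright's `mergeTests` to combine independent fixture files (the second fixture file is hypothetical):
```typescript
// tests/support/fixtures/merged.fixture.ts (illustrative composition sketch)
import { mergeTests } from '@playwright/test';
import { test as authTest } from './auth.fixture';
import { test as dataTest } from './data.fixture'; // hypothetical second fixture file

// Tests that import this combined `test` receive fixtures from both files,
// each keeping its own setup and auto-cleanup.
export const test = mergeTests(authTest, dataTest);
export { expect } from '@playwright/test';
```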
### Mock Requirements Documented
- [ ] External service mocking requirements identified
- [ ] Mock endpoints documented with URLs and methods
- [ ] Success response examples provided
- [ ] Failure response examples provided
- [ ] Mock requirements documented in ATDD checklist for DEV team
### data-testid Requirements Listed
- [ ] All required data-testid attributes identified from E2E tests
- [ ] data-testid list organized by page or component
- [ ] Each data-testid has clear description of element it targets
- [ ] data-testid list included in ATDD checklist for DEV team
---
## Step 5: Implementation Checklist Created
- [ ] Implementation checklist created with clear structure
- [ ] Each failing test mapped to concrete implementation tasks
- [ ] Tasks include:
- [ ] Route/component creation
- [ ] Business logic implementation
- [ ] API integration
- [ ] data-testid attribute additions
- [ ] Error handling
- [ ] Test execution command
- [ ] Completion checkbox
- [ ] Red-Green-Refactor workflow documented in checklist
- [ ] RED phase marked as complete (TEA responsibility)
- [ ] GREEN phase tasks listed for DEV team
- [ ] REFACTOR phase guidance provided
- [ ] Execution commands provided:
- [ ] Run all tests: `npm run test:e2e`
- [ ] Run specific test file
- [ ] Run in headed mode
- [ ] Debug specific test
- [ ] Estimated effort included (hours or story points)
---
## Step 6: Deliverables Generated
### ATDD Checklist Document Created
- [ ] Output file created at `{output_folder}/atdd-checklist-{story_id}.md`
- [ ] Document follows template structure from `atdd-checklist-template.md`
- [ ] Document includes all required sections:
- [ ] Story summary
- [ ] Acceptance criteria breakdown
- [ ] Failing tests created (paths and line counts)
- [ ] Data factories created
- [ ] Fixtures created
- [ ] Mock requirements
- [ ] Required data-testid attributes
- [ ] Implementation checklist
- [ ] Red-green-refactor workflow
- [ ] Execution commands
- [ ] Next steps for DEV team
- [ ] Output shared with DEV workflow (manual handoff; not auto-consumed)
### All Tests Verified to Fail (RED Phase)
- [ ] Full test suite run locally before finalizing
- [ ] All tests fail as expected (RED phase confirmed)
- [ ] No tests passing before implementation (if passing, test is invalid)
- [ ] Failure messages documented in ATDD checklist
- [ ] Failures are due to missing implementation, not test bugs
- [ ] Test run output captured for reference
### Summary Provided
- [ ] Summary includes:
- [ ] Story ID
- [ ] Primary test level
- [ ] Test counts (E2E, API, Component)
- [ ] Test file paths
- [ ] Factory count
- [ ] Fixture count
- [ ] Mock requirements count
- [ ] data-testid count
- [ ] Implementation task count
- [ ] Estimated effort
- [ ] Next steps for DEV team
- [ ] Output file path
- [ ] Knowledge base references applied
---
## Quality Checks
### Test Design Quality
- [ ] Tests are readable (clear Given-When-Then structure)
- [ ] Tests are maintainable (use factories and fixtures, not hardcoded data)
- [ ] Tests are isolated (no shared state between tests)
- [ ] Tests are deterministic (no race conditions or flaky patterns)
- [ ] Tests are atomic (one assertion per test)
- [ ] Tests are fast (no unnecessary waits or delays)
### Knowledge Base Integration
- [ ] fixture-architecture.md patterns applied to all fixtures
- [ ] data-factories.md patterns applied to all factories
- [ ] network-first.md patterns applied to E2E tests with network requests
- [ ] component-tdd.md patterns applied to component tests
- [ ] test-quality.md principles applied to all test design
### Code Quality
- [ ] All TypeScript types are correct and complete
- [ ] No linting errors in generated test files
- [ ] Consistent naming conventions followed
- [ ] Imports are organized and correct
- [ ] Code follows project style guide
---
## Integration Points
### With DEV Agent
- [ ] ATDD checklist provides clear implementation guidance
- [ ] Implementation tasks are granular and actionable
- [ ] data-testid requirements are complete and clear
- [ ] Mock requirements include all necessary details
- [ ] Execution commands work correctly
### With Story Workflow
- [ ] Story ID correctly referenced in output files
- [ ] Acceptance criteria from story accurately reflected in tests
- [ ] Technical constraints from story considered in test design
### With Framework Workflow
- [ ] Test framework configuration correctly detected and used
- [ ] Directory structure matches framework setup
- [ ] Fixtures and helpers follow established patterns
- [ ] Naming conventions consistent with framework standards
### With test-design Workflow (If Available)
- [ ] P0 scenarios from test-design prioritized in ATDD
- [ ] Risk assessment from test-design considered in test coverage
- [ ] Coverage strategy from test-design aligned with ATDD tests
---
## Completion Criteria
All of the following must be true before marking this workflow as complete:
- [ ] **Story acceptance criteria analyzed** and mapped to appropriate test levels
- [ ] **Failing tests created** at all appropriate levels (E2E, API, Component)
- [ ] **Given-When-Then format** used consistently across all tests
- [ ] **RED phase verified** by local test run (all tests failing as expected)
- [ ] **Network-first pattern** applied to E2E tests with network requests
- [ ] **Data factories created** using faker (no hardcoded test data)
- [ ] **Fixtures created** with auto-cleanup in teardown
- [ ] **Mock requirements documented** for external services
- [ ] **data-testid attributes listed** for DEV team
- [ ] **Implementation checklist created** mapping tests to code tasks
- [ ] **Red-green-refactor workflow documented** in ATDD checklist
- [ ] **Execution commands provided** and verified to work
- [ ] **ATDD checklist document created** and saved to correct location
- [ ] **Output file formatted correctly** using template structure
- [ ] **Knowledge base references applied** and documented in summary
- [ ] **No test quality issues** (flaky patterns, race conditions, hardcoded data)
---
## Common Issues and Resolutions
### Issue: Tests pass before implementation
**Problem:** A test passes even though no implementation code exists yet.
**Resolution:**
- Review test to ensure it's testing actual behavior, not mocked/stubbed behavior
- Check if test is accidentally using existing functionality
- Verify test assertions are correct and meaningful
- Rewrite test to fail until implementation is complete
### Issue: Network-first pattern not applied
**Problem:** Route interception happens after navigation, causing race conditions.
**Resolution:**
- Move `await page.route()` calls BEFORE `await page.goto()`
- Review `network-first.md` knowledge fragment
- Update all E2E tests to follow network-first pattern
### Issue: Hardcoded test data in tests
**Problem:** Tests use hardcoded strings/numbers instead of factories.
**Resolution:**
- Replace all hardcoded data with factory function calls
- Use `faker` for all random data generation
- Update data-factories to support all required test scenarios
### Issue: Fixtures missing auto-cleanup
**Problem:** Fixtures create data but don't clean it up in teardown.
**Resolution:**
- Add cleanup logic after `await use(data)` in fixture
- Call deletion/cleanup functions in teardown
- Verify cleanup works by checking database/storage after test run
### Issue: Tests have multiple assertions
**Problem:** Tests verify multiple behaviors in single test (not atomic).
**Resolution:**
- Split into separate tests (one assertion per test)
- Each test should verify exactly one behavior
- Use descriptive test names to clarify what each test verifies
### Issue: Tests depend on execution order
**Problem:** Tests fail when run in isolation or different order.
**Resolution:**
- Remove shared state between tests
- Each test should create its own test data
- Use fixtures for consistent setup across tests
- Verify tests can run with `.only` flag
---
## Notes for TEA Agent
- **Preflight halt is critical:** Do not proceed if story has no acceptance criteria or framework is missing
- **RED phase verification is mandatory:** Tests must fail before sharing with DEV team
- **Network-first pattern:** Route interception BEFORE navigation prevents race conditions
- **One assertion per test:** Atomic tests provide clear failure diagnosis
- **Auto-cleanup is non-negotiable:** Every fixture must clean up data in teardown
- **Use knowledge base:** Load relevant fragments (fixture-architecture, data-factories, network-first, component-tdd, test-quality) for guidance
- **Share with DEV agent:** ATDD checklist provides implementation roadmap from red to green

View File

@@ -1,806 +0,0 @@
<!-- Powered by BMAD-CORE™ -->
# Acceptance Test-Driven Development (ATDD)
**Workflow ID**: `_bmad/bmm/testarch/atdd`
**Version**: 4.0 (BMad v6)
---
## Overview
Generates failing acceptance tests BEFORE implementation following TDD's red-green-refactor cycle. This workflow creates comprehensive test coverage at appropriate levels (E2E, API, Component) with supporting infrastructure (fixtures, factories, mocks) and provides an implementation checklist to guide development.
**Core Principle**: Tests fail first (red phase), then guide development to green, then enable confident refactoring.
---
## Preflight Requirements
**Critical:** Verify these requirements before proceeding. If any fail, HALT and notify the user.
- ✅ Story approved with clear acceptance criteria
- ✅ Development sandbox/environment ready
- ✅ Framework scaffolding exists (run `framework` workflow if missing)
- ✅ Test framework configuration available (playwright.config.ts or cypress.config.ts)
---
## Step 1: Load Story Context and Requirements
### Actions
1. **Read Story Markdown**
- Load story file from `{story_file}` variable
- Extract acceptance criteria (all testable requirements)
- Identify affected systems and components
- Note any technical constraints or dependencies
2. **Load Framework Configuration**
- Read framework config (playwright.config.ts or cypress.config.ts)
- Identify test directory structure
- Check existing fixture patterns
- Note test runner capabilities
3. **Load Existing Test Patterns**
- Search `{test_dir}` for similar tests
- Identify reusable fixtures and helpers
- Check data factory patterns
- Note naming conventions
4. **Check Playwright Utils Flag**
Read `{config_source}` and check `config.tea_use_playwright_utils`.
5. **Load Knowledge Base Fragments**
**Critical:** Consult `{project-root}/_bmad/bmm/testarch/tea-index.csv` to load:
**Core Patterns (Always load):**
- `data-factories.md` - Factory patterns using faker (override patterns, nested factories, API seeding, 498 lines, 5 examples)
- `component-tdd.md` - Component test strategies (red-green-refactor, provider isolation, accessibility, visual regression, 480 lines, 4 examples)
- `test-quality.md` - Test design principles (deterministic tests, isolated with cleanup, explicit assertions, length limits, execution time optimization, 658 lines, 5 examples)
- `test-healing-patterns.md` - Common failure patterns and healing strategies (stale selectors, race conditions, dynamic data, network errors, hard waits, 648 lines, 5 examples)
- `selector-resilience.md` - Selector best practices (data-testid > ARIA > text > CSS hierarchy, dynamic patterns, anti-patterns, 541 lines, 4 examples)
- `timing-debugging.md` - Race condition prevention and async debugging (network-first, deterministic waiting, anti-patterns, 370 lines, 3 examples)
**If `config.tea_use_playwright_utils: true` (All Utilities):**
- `overview.md` - Playwright utils for ATDD patterns
- `api-request.md` - API test examples with schema validation
- `network-recorder.md` - HAR record/playback for UI acceptance tests
- `auth-session.md` - Auth setup for acceptance tests
- `intercept-network-call.md` - Network interception in ATDD scenarios
- `recurse.md` - Polling for async acceptance criteria
- `log.md` - Logging in ATDD tests
- `file-utils.md` - File download validation in acceptance tests
- `network-error-monitor.md` - Catch silent failures in ATDD
- `fixtures-composition.md` - Composing utilities for ATDD
**If `config.tea_use_playwright_utils: false`:**
- `fixture-architecture.md` - Test fixture patterns with auto-cleanup (pure function → fixture → mergeTests composition, 406 lines, 5 examples)
- `network-first.md` - Route interception patterns (intercept before navigate, HAR capture, deterministic waiting, 489 lines, 5 examples)
**Halt Condition:** If story has no acceptance criteria or framework is missing, HALT with message: "ATDD requires clear acceptance criteria and test framework setup"
---
## Step 1.5: Generation Mode Selection (NEW - Phase 2.5)
### Actions
1. **Detect Generation Mode**
Determine mode based on scenario complexity:
**AI Generation Mode (DEFAULT)**:
- Clear acceptance criteria with standard patterns
- Uses: AI-generated tests from requirements
- Appropriate for: CRUD, auth, navigation, API tests
- Fastest approach
**Recording Mode (OPTIONAL - Complex UI)**:
- Complex UI interactions (drag-drop, wizards, multi-page flows)
- Uses: Interactive test recording with Playwright MCP
- Appropriate for: Visual workflows, unclear requirements
- Only if config.tea_use_mcp_enhancements is true AND MCP available
2. **AI Generation Mode (DEFAULT - Continue to Step 2)**
For standard scenarios:
- Continue with existing workflow (Step 2: Select Test Levels and Strategy)
- AI generates tests based on acceptance criteria from Step 1
- Use knowledge base patterns for test structure
3. **Recording Mode (OPTIONAL - Complex UI Only)**
For complex UI scenarios AND config.tea_use_mcp_enhancements is true:
**A. Check MCP Availability**
If Playwright MCP tools are available in your IDE:
- Use MCP recording mode (Step 3.B)
If MCP unavailable:
- Fall back to AI generation mode (silent, automatic)
- Continue to Step 2
**B. Interactive Test Recording (MCP-Based)**
Use Playwright MCP test-generator tools:
**Setup:**
```
1. Use generator_setup_page to initialize recording session
2. Navigate to application starting URL (from story context)
3. Ready to record user interactions
```
**Recording Process (Per Acceptance Criterion):**
```
4. Read acceptance criterion from story
5. Manually execute test scenario using browser_* tools:
- browser_navigate: Navigate to pages
- browser_click: Click buttons, links, elements
- browser_type: Fill form fields
- browser_select: Select dropdown options
- browser_check: Check/uncheck checkboxes
6. Add verification steps using browser_verify_* tools:
- browser_verify_text: Verify text content
- browser_verify_visible: Verify element visibility
- browser_verify_url: Verify URL navigation
7. Capture interaction log with generator_read_log
8. Generate test file with generator_write_test
9. Repeat for next acceptance criterion
```
**Post-Recording Enhancement:**
```
10. Review generated test code
11. Enhance with knowledge base patterns:
- Add Given-When-Then comments
- Replace recorded selectors with data-testid (if needed)
- Add network-first interception (from network-first.md)
- Add fixtures for auth/data setup (from fixture-architecture.md)
- Use factories for test data (from data-factories.md)
12. Verify tests fail (missing implementation)
13. Continue to Step 4 (Build Data Infrastructure)
```
**When to Use Recording Mode:**
- ✅ Complex UI interactions (drag-drop, multi-step forms, wizards)
- ✅ Visual workflows (modals, dialogs, animations)
- ✅ Unclear requirements (exploratory, discovering expected behavior)
- ✅ Multi-page flows (checkout, registration, onboarding)
- ❌ NOT for simple CRUD (AI generation faster)
- ❌ NOT for API-only tests (no UI to record)
**When to Use AI Generation (Default):**
- ✅ Clear acceptance criteria available
- ✅ Standard patterns (login, CRUD, navigation)
- ✅ Need many tests quickly
- ✅ API/backend tests (no UI interaction)
4. **Proceed to Test Level Selection**
After mode selection:
- AI Generation: Continue to Step 2 (Select Test Levels and Strategy)
- Recording: Skip to Step 4 (Build Data Infrastructure) - tests already generated
---
## Step 2: Select Test Levels and Strategy
### Actions
1. **Analyze Acceptance Criteria**
For each acceptance criterion, determine:
- Does it require full user journey? → E2E test
- Does it test business logic/API contract? → API test
- Does it validate UI component behavior? → Component test
- Can it be unit tested? → Unit test
2. **Apply Test Level Selection Framework**
**Knowledge Base Reference**: `test-levels-framework.md`
**E2E (End-to-End)**:
- Critical user journeys (login, checkout, core workflow)
- Multi-system integration
- User-facing acceptance criteria
- **Characteristics**: High confidence, slow execution, brittle
**API (Integration)**:
- Business logic validation
- Service contracts
- Data transformations
- **Characteristics**: Fast feedback, good balance, stable
**Component**:
- UI component behavior (buttons, forms, modals)
- Interaction testing
- Visual regression
- **Characteristics**: Fast, isolated, granular
**Unit**:
- Pure business logic
- Edge cases
- Error handling
- **Characteristics**: Fastest, most granular
3. **Avoid Duplicate Coverage**
Don't test same behavior at multiple levels unless necessary:
- Use E2E for critical happy path only
- Use API tests for complex business logic variations
- Use component tests for UI interaction edge cases
- Use unit tests for pure logic edge cases
4. **Prioritize Tests**
If test-design document exists, align with priority levels:
- P0 scenarios → Must cover in failing tests
- P1 scenarios → Should cover if time permits
- P2/P3 scenarios → Optional for this iteration
**Decision Point:** Set `primary_level` variable to main test level for this story (typically E2E or API)
---
## Step 3: Generate Failing Tests
### Actions
1. **Create Test File Structure**
```
tests/
├── e2e/
│ └── {feature-name}.spec.ts # E2E acceptance tests
├── api/
│ └── {feature-name}.api.spec.ts # API contract tests
├── component/
│ └── {ComponentName}.test.tsx # Component tests
└── support/
├── fixtures/ # Test fixtures
├── factories/ # Data factories
└── helpers/ # Utility functions
```
2. **Write Failing E2E Tests (If Applicable)**
**Use Given-When-Then format:**
```typescript
import { test, expect } from '@playwright/test';
test.describe('User Login', () => {
test('should display error for invalid credentials', async ({ page }) => {
// GIVEN: User is on login page
await page.goto('/login');
// WHEN: User submits invalid credentials
await page.fill('[data-testid="email-input"]', 'invalid@example.com');
await page.fill('[data-testid="password-input"]', 'wrongpassword');
await page.click('[data-testid="login-button"]');
// THEN: Error message is displayed
await expect(page.locator('[data-testid="error-message"]')).toHaveText('Invalid email or password');
});
});
```
**Critical patterns:**
- One assertion per test (atomic tests)
- Explicit waits (no hard waits/sleeps)
- Network-first approach (route interception before navigation)
- data-testid selectors for stability
- Clear Given-When-Then structure
3. **Apply Network-First Pattern**
**Knowledge Base Reference**: `network-first.md`
```typescript
test('should load user dashboard after login', async ({ page }) => {
// CRITICAL: Intercept routes BEFORE navigation
await page.route('**/api/user', (route) =>
route.fulfill({
status: 200,
contentType: 'application/json',
body: JSON.stringify({ id: 1, name: 'Test User' }),
}),
);
// NOW navigate
await page.goto('/dashboard');
await expect(page.locator('[data-testid="user-name"]')).toHaveText('Test User');
});
```
4. **Write Failing API Tests (If Applicable)**
```typescript
import { test, expect } from '@playwright/test';
test.describe('User API', () => {
test('POST /api/users - should create new user', async ({ request }) => {
// GIVEN: Valid user data
const userData = {
email: 'newuser@example.com',
name: 'New User',
};
// WHEN: Creating user via API
const response = await request.post('/api/users', {
data: userData,
});
// THEN: User is created successfully
expect(response.status()).toBe(201);
const body = await response.json();
expect(body).toMatchObject({
email: userData.email,
name: userData.name,
id: expect.any(Number),
});
});
});
```
5. **Write Failing Component Tests (If Applicable)**
**Knowledge Base Reference**: `component-tdd.md`
```typescript
import { test, expect } from '@playwright/experimental-ct-react';
import { LoginForm } from './LoginForm';
test.describe('LoginForm Component', () => {
test('should disable submit button when fields are empty', async ({ mount }) => {
// GIVEN: LoginForm is mounted
const component = await mount(<LoginForm />);
// WHEN: Form is initially rendered
const submitButton = component.locator('button[type="submit"]');
// THEN: Submit button is disabled
await expect(submitButton).toBeDisabled();
});
});
```
6. **Verify Tests Fail Initially**
**Critical verification:**
- Run tests locally to confirm they fail
- Failure should be due to missing implementation, not test errors
- Failure messages should be clear and actionable
- All tests must be in RED phase before sharing with DEV
**Important:** Tests MUST fail initially. If a test passes before implementation, it's not a valid acceptance test.
---
## Step 4: Build Data Infrastructure
### Actions
1. **Create Data Factories**
**Knowledge Base Reference**: `data-factories.md`
```typescript
// tests/support/factories/user.factory.ts
import { faker } from '@faker-js/faker';
export const createUser = (overrides = {}) => ({
id: faker.number.int(),
email: faker.internet.email(),
name: faker.person.fullName(),
createdAt: faker.date.recent().toISOString(),
...overrides,
});
export const createUsers = (count: number) => Array.from({ length: count }, () => createUser());
```
**Factory principles:**
- Use faker for random data (no hardcoded values)
- Support overrides for specific scenarios
- Generate complete valid objects
- Include helper functions for bulk creation
2. **Create Test Fixtures**
**Knowledge Base Reference**: `fixture-architecture.md`
```typescript
// tests/support/fixtures/auth.fixture.ts
import { test as base } from '@playwright/test';
import { createUser } from '../factories/user.factory';
// deleteUser: cleanup helper expected alongside the factories (not shown here)
export const test = base.extend({
authenticatedUser: async ({ page }, use) => {
// Setup: Create and authenticate user
const user = await createUser();
await page.goto('/login');
await page.fill('[data-testid="email"]', user.email);
await page.fill('[data-testid="password"]', 'password123');
await page.click('[data-testid="login-button"]');
await page.waitForURL('/dashboard');
// Provide to test
await use(user);
// Cleanup: Delete user
await deleteUser(user.id);
},
});
```
**Fixture principles:**
- Auto-cleanup (always delete created data)
- Composable (fixtures can use other fixtures)
- Isolated (each test gets fresh data)
- Type-safe
3. **Document Mock Requirements**
If external services need mocking, document requirements:
```markdown
### Mock Requirements for DEV Team
**Payment Gateway Mock**:
- Endpoint: `POST /api/payments`
- Success response: `{ status: 'success', transactionId: '123' }`
- Failure response: `{ status: 'failed', error: 'Insufficient funds' }`
**Email Service Mock**:
- Should not send real emails in test environment
- Log email contents for verification
```
4. **List Required data-testid Attributes**
```markdown
### Required data-testid Attributes
**Login Page**:
- `email-input` - Email input field
- `password-input` - Password input field
- `login-button` - Submit button
- `error-message` - Error message container
**Dashboard Page**:
- `user-name` - User name display
- `logout-button` - Logout button
```
---
## Step 5: Create Implementation Checklist
### Actions
1. **Map Tests to Implementation Tasks**
For each failing test, create corresponding implementation task:
```markdown
## Implementation Checklist
### Epic X - User Authentication
#### Test: User Login with Valid Credentials
- [ ] Create `/login` route
- [ ] Implement login form component
- [ ] Add email/password validation
- [ ] Integrate authentication API
- [ ] Add `data-testid` attributes: `email-input`, `password-input`, `login-button`
- [ ] Implement error handling
- [ ] Run test: `npm run test:e2e -- login.spec.ts`
- [ ] ✅ Test passes (green phase)
#### Test: Display Error for Invalid Credentials
- [ ] Add error state management
- [ ] Display error message UI
- [ ] Add `data-testid="error-message"`
- [ ] Run test: `npm run test:e2e -- login.spec.ts`
- [ ] ✅ Test passes (green phase)
```
2. **Include Red-Green-Refactor Guidance**
```markdown
## Red-Green-Refactor Workflow
**RED Phase** (Complete):
- ✅ All tests written and failing
- ✅ Fixtures and factories created
- ✅ Mock requirements documented
**GREEN Phase** (DEV Team):
1. Pick one failing test
2. Implement minimal code to make it pass
3. Run test to verify green
4. Move to next test
5. Repeat until all tests pass
**REFACTOR Phase** (DEV Team):
1. All tests passing (green)
2. Improve code quality
3. Extract duplications
4. Optimize performance
5. Ensure tests still pass
```
3. **Add Execution Commands**
````markdown
## Running Tests
```bash
# Run all failing tests
npm run test:e2e
# Run specific test file
npm run test:e2e -- login.spec.ts
# Run tests in headed mode (see browser)
npm run test:e2e -- --headed
# Debug specific test
npm run test:e2e -- login.spec.ts --debug
```
````
---
## Step 6: Generate Deliverables
### Actions
1. **Create ATDD Checklist Document**
Use template structure at `{installed_path}/atdd-checklist-template.md`:
- Story summary
- Acceptance criteria breakdown
- Test files created (with paths)
- Data factories created
- Fixtures created
- Mock requirements
- Required data-testid attributes
- Implementation checklist
- Red-green-refactor workflow
- Execution commands
2. **Verify All Tests Fail**
Before finalizing:
- Run full test suite locally
- Confirm all tests in RED phase
- Document expected failure messages
- Ensure failures are due to missing implementation, not test bugs
3. **Write to Output File**
Save to `{output_folder}/atdd-checklist-{story_id}.md`
---
## Important Notes
### Red-Green-Refactor Cycle
**RED Phase** (TEA responsibility):
- Write failing tests first
- Tests define expected behavior
- Tests must fail for the right reason (missing implementation)
**GREEN Phase** (DEV responsibility):
- Implement minimal code to pass tests
- One test at a time
- Don't over-engineer
**REFACTOR Phase** (DEV responsibility):
- Improve code quality with confidence
- Tests provide safety net
- Extract duplications, optimize
### Given-When-Then Structure
**GIVEN** (Setup):
- Arrange test preconditions
- Create necessary data
- Navigate to starting point
**WHEN** (Action):
- Execute the behavior being tested
- Single action per test
**THEN** (Assertion):
- Verify expected outcome
- One assertion per test (atomic)
### Network-First Testing
**Critical pattern:**
```typescript
// ✅ CORRECT: Intercept BEFORE navigation
await page.route('**/api/data', handler);
await page.goto('/page');
// ❌ WRONG: Navigate then intercept (race condition)
await page.goto('/page');
await page.route('**/api/data', handler); // Too late!
```
### Data Factory Best Practices
**Use faker for all test data:**
```typescript
// ✅ CORRECT: Random data
email: faker.internet.email();
// ❌ WRONG: Hardcoded data (collisions, maintenance burden)
email: 'test@example.com';
```
**Auto-cleanup principle:**
- Every factory that creates data must provide cleanup
- Fixtures automatically cleanup in teardown
- No manual cleanup in test code
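The fixture example above calls `deleteUser(user.id)`; a minimal sketch of such a cleanup helper, assuming a `DELETE /api/users/:id` endpoint and a base URL from the environment (both assumptions - adapt to the real contract):
```typescript
// tests/support/helpers/cleanup.ts (illustrative sketch)
import { request } from '@playwright/test';

// Assumed base URL - in a real project this comes from config or environment
const BASE_URL = process.env.BASE_URL ?? 'http://localhost:3000';

export async function deleteUser(userId: number): Promise<void> {
  const api = await request.newContext({ baseURL: BASE_URL });
  try {
    // Assumed endpoint - replace with the project's real delete route
    await api.delete(`/api/users/${userId}`);
  } finally {
    await api.dispose();
  }
}
```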
### One Assertion Per Test
**Atomic test design:**
```typescript
// ✅ CORRECT: One assertion
test('should display user name', async ({ page }) => {
await expect(page.locator('[data-testid="user-name"]')).toHaveText('John');
});
// ❌ WRONG: Multiple assertions (not atomic)
test('should display user info', async ({ page }) => {
await expect(page.locator('[data-testid="user-name"]')).toHaveText('John');
await expect(page.locator('[data-testid="user-email"]')).toHaveText('john@example.com');
});
```
**Why?** If the first assertion fails, the test stops there and you never learn whether the remaining behavior works; single-assertion tests give unambiguous failure diagnosis.
### Component Test Strategy
**When to use component tests:**
- Complex UI interactions (drag-drop, keyboard nav)
- Form validation logic
- State management within component
- Visual edge cases
**When NOT to use:**
- Simple rendering (snapshot tests are sufficient)
- Integration with backend (use E2E or API tests)
- Full user journeys (use E2E tests)
### Knowledge Base Integration
**Core Fragments (Auto-loaded in Step 1):**
- `fixture-architecture.md` - Pure function → fixture → mergeTests patterns (406 lines, 5 examples)
- `data-factories.md` - Factory patterns with faker, overrides, API seeding (498 lines, 5 examples)
- `component-tdd.md` - Red-green-refactor, provider isolation, accessibility, visual regression (480 lines, 4 examples)
- `network-first.md` - Intercept before navigate, HAR capture, deterministic waiting (489 lines, 5 examples)
- `test-quality.md` - Deterministic tests, cleanup, explicit assertions, length/time limits (658 lines, 5 examples)
- `test-healing-patterns.md` - Common failure patterns: stale selectors, race conditions, dynamic data, network errors, hard waits (648 lines, 5 examples)
- `selector-resilience.md` - Selector hierarchy (data-testid > ARIA > text > CSS), dynamic patterns, anti-patterns (541 lines, 4 examples)
- `timing-debugging.md` - Race condition prevention, deterministic waiting, async debugging (370 lines, 3 examples)
**Reference for Test Level Selection:**
- `test-levels-framework.md` - E2E vs API vs Component vs Unit decision framework (467 lines, 4 examples)
**Manual Reference (Optional):**
- Use `tea-index.csv` to find additional specialized fragments as needed
---
## Output Summary
After completing this workflow, provide a summary:
```markdown
## ATDD Complete - Tests in RED Phase
**Story**: {story_id}
**Primary Test Level**: {primary_level}
**Failing Tests Created**:
- E2E tests: {e2e_count} tests in {e2e_files}
- API tests: {api_count} tests in {api_files}
- Component tests: {component_count} tests in {component_files}
**Supporting Infrastructure**:
- Data factories: {factory_count} factories created
- Fixtures: {fixture_count} fixtures with auto-cleanup
- Mock requirements: {mock_count} services documented
**Implementation Checklist**:
- Total tasks: {task_count}
- Estimated effort: {effort_estimate} hours
**Required data-testid Attributes**: {data_testid_count} attributes documented
**Next Steps for DEV Team**:
1. Run failing tests: `npm run test:e2e`
2. Review implementation checklist
3. Implement one test at a time (RED → GREEN)
4. Refactor with confidence (tests provide safety net)
5. Share progress in daily standup
**Output File**: {output_file}
**Manual Handoff**: Share `{output_file}` and failing tests with the dev workflow (not auto-consumed).
**Knowledge Base References Applied**:
- Fixture architecture patterns
- Data factory patterns with faker
- Network-first route interception
- Component TDD strategies
- Test quality principles
```
---
## Validation
After completing all steps, verify:
- [ ] Story acceptance criteria analyzed and mapped to tests
- [ ] Appropriate test levels selected (E2E, API, Component)
- [ ] All tests written in Given-When-Then format
- [ ] All tests fail initially (RED phase verified)
- [ ] Network-first pattern applied (route interception before navigation)
- [ ] Data factories created with faker
- [ ] Fixtures created with auto-cleanup
- [ ] Mock requirements documented for DEV team
- [ ] Required data-testid attributes listed
- [ ] Implementation checklist created with clear tasks
- [ ] Red-green-refactor workflow documented
- [ ] Execution commands provided
- [ ] Output file created and formatted correctly
Refer to `checklist.md` for comprehensive validation criteria.

View File

@@ -1,45 +0,0 @@
# Test Architect workflow: atdd
name: testarch-atdd
description: "Generate failing acceptance tests before implementation using TDD red-green-refactor cycle"
author: "BMad"
# Critical variables from config
config_source: "{project-root}/_bmad/bmm/config.yaml"
output_folder: "{config_source}:output_folder"
user_name: "{config_source}:user_name"
communication_language: "{config_source}:communication_language"
document_output_language: "{config_source}:document_output_language"
date: system-generated
# Workflow components
installed_path: "{project-root}/_bmad/bmm/workflows/testarch/atdd"
instructions: "{installed_path}/instructions.md"
validation: "{installed_path}/checklist.md"
template: "{installed_path}/atdd-checklist-template.md"
# Variables and inputs
variables:
test_dir: "{project-root}/tests" # Root test directory
# Output configuration
default_output_file: "{output_folder}/atdd-checklist-{story_id}.md"
# Required tools
required_tools:
- read_file # Read story markdown, framework config
- write_file # Create test files, checklist, factory stubs
- create_directory # Create test directories
- list_files # Find existing fixtures and helpers
- search_repo # Search for similar test patterns
tags:
- qa
- atdd
- test-architect
- tdd
- red-green-refactor
execution_hints:
interactive: false # Minimize prompts
autonomous: true # Proceed without user input unless blocked
iterative: true

View File

@@ -1,582 +0,0 @@
# Automate Workflow Validation Checklist
Use this checklist to validate that the automate workflow has been executed correctly and all deliverables meet quality standards.
## Prerequisites
Before starting this workflow, verify:
- [ ] Framework scaffolding configured (playwright.config.ts or cypress.config.ts exists)
- [ ] Test directory structure exists (tests/ folder with subdirectories)
- [ ] Package.json has test framework dependencies installed
**Halt only if:** Framework scaffolding is completely missing (run `framework` workflow first)
**Note:** BMad artifacts (story, tech-spec, PRD) are OPTIONAL - workflow can run without them
**Note:** `automate` generates tests; it does not run `*atdd` or `*test-review`. If ATDD outputs exist, use them as input and avoid duplicate coverage.
---
## Step 1: Execution Mode Determination and Context Loading
### Mode Detection
- [ ] Execution mode correctly determined:
- [ ] BMad-Integrated Mode (story_file variable set) OR
- [ ] Standalone Mode (target_feature or target_files set) OR
- [ ] Auto-discover Mode (no targets specified)
### BMad Artifacts (If Available - OPTIONAL)
- [ ] Story markdown loaded (if `{story_file}` provided)
- [ ] Acceptance criteria extracted from story (if available)
- [ ] Tech-spec.md loaded (if `{use_tech_spec}` true and file exists)
- [ ] Test-design.md loaded (if `{use_test_design}` true and file exists)
- [ ] PRD.md loaded (if `{use_prd}` true and file exists)
- [ ] **Note**: Absence of BMad artifacts does NOT halt workflow
### Framework Configuration
- [ ] Test framework config loaded (playwright.config.ts or cypress.config.ts)
- [ ] Test directory structure identified from `{test_dir}`
- [ ] Existing test patterns reviewed
- [ ] Test runner capabilities noted (parallel execution, fixtures, etc.)
### Coverage Analysis
- [ ] Existing test files searched in `{test_dir}` (if `{analyze_coverage}` true)
- [ ] Tested features vs untested features identified
- [ ] Coverage gaps mapped (tests to source files)
- [ ] Existing fixture and factory patterns checked
### Knowledge Base Fragments Loaded
- [ ] `test-levels-framework.md` - Test level selection
- [ ] `test-priorities.md` - Priority classification (P0-P3)
- [ ] `fixture-architecture.md` - Fixture patterns with auto-cleanup
- [ ] `data-factories.md` - Factory patterns using faker
- [ ] `selective-testing.md` - Targeted test execution strategies
- [ ] `ci-burn-in.md` - Flaky test detection patterns
- [ ] `test-quality.md` - Test design principles
---
## Step 2: Automation Targets Identification
### Target Determination
**BMad-Integrated Mode (if story available):**
- [ ] Acceptance criteria mapped to test scenarios
- [ ] Features implemented in story identified
- [ ] Existing ATDD tests checked (if any)
- [ ] Expansion beyond ATDD planned (edge cases, negative paths)
**Standalone Mode (if no story):**
- [ ] Specific feature analyzed (if `{target_feature}` specified)
- [ ] Specific files analyzed (if `{target_files}` specified)
- [ ] Features auto-discovered (if `{auto_discover_features}` true)
- [ ] Features prioritized by:
- [ ] No test coverage (highest priority)
- [ ] Complex business logic
- [ ] External integrations (API, database, auth)
- [ ] Critical user paths (login, checkout, etc.)
### Test Level Selection
- [ ] Test level selection framework applied (from `test-levels-framework.md`)
- [ ] E2E tests identified: Critical user journeys, multi-system integration
- [ ] API tests identified: Business logic, service contracts, data transformations
- [ ] Component tests identified: UI behavior, interactions, state management
- [ ] Unit tests identified: Pure logic, edge cases, error handling
### Duplicate Coverage Avoidance
- [ ] Same behavior NOT tested at multiple levels unnecessarily
- [ ] E2E used for critical happy path only
- [ ] API tests used for business logic variations
- [ ] Component tests used for UI interaction edge cases
- [ ] Unit tests used for pure logic edge cases
### Priority Assignment
- [ ] Test priorities assigned using `test-priorities.md` framework
- [ ] P0 tests: Critical paths, security-critical, data integrity
- [ ] P1 tests: Important features, integration points, error handling
- [ ] P2 tests: Edge cases, less-critical variations, performance
- [ ] P3 tests: Nice-to-have, rarely-used features, exploratory
- [ ] Priority variables respected:
- [ ] `{include_p0}` = true (always include)
- [ ] `{include_p1}` = true (high priority)
- [ ] `{include_p2}` = true (medium priority)
- [ ] `{include_p3}` = false (low priority, skip by default)
### Coverage Plan Created
- [ ] Test coverage plan documented
- [ ] What will be tested at each level listed
- [ ] Priorities assigned to each test
- [ ] Coverage strategy clear (critical-paths, comprehensive, or selective)
---
## Step 3: Test Infrastructure Generated
### Fixture Architecture
- [ ] Existing fixtures checked in `tests/support/fixtures/`
- [ ] Fixture architecture created/enhanced (if `{generate_fixtures}` true)
- [ ] All fixtures use Playwright's `test.extend()` pattern
- [ ] All fixtures have auto-cleanup in teardown
- [ ] Common fixtures created/enhanced:
- [ ] authenticatedUser (with auto-delete)
- [ ] apiRequest (authenticated client)
- [ ] mockNetwork (external service mocking)
- [ ] testDatabase (with auto-cleanup)
### Data Factories
- [ ] Existing factories checked in `tests/support/factories/`
- [ ] Factory architecture created/enhanced (if `{generate_factories}` true)
- [ ] All factories use `@faker-js/faker` for random data (no hardcoded values)
- [ ] All factories support overrides for specific scenarios
- [ ] Common factories created/enhanced:
- [ ] User factory (email, password, name, role)
- [ ] Product factory (name, price, SKU)
- [ ] Order factory (items, total, status)
- [ ] Cleanup helpers provided (e.g., deleteUser(), deleteProduct())
### Helper Utilities
- [ ] Existing helpers checked in `tests/support/helpers/` (if `{update_helpers}` true)
- [ ] Common utilities created/enhanced:
- [ ] waitFor (polling for complex conditions)
- [ ] retry (retry helper for flaky operations)
- [ ] testData (test data generation)
- [ ] assertions (custom assertion helpers)
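A sketch of what the `waitFor` and `retry` helpers above might look like; the names and signatures are assumptions, not an established API:
```typescript
// tests/support/helpers/async.ts (illustrative sketch)
export async function waitFor(
  condition: () => Promise<boolean> | boolean,
  { timeoutMs = 10_000, intervalMs = 250 } = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await condition()) return;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`waitFor: condition not met within ${timeoutMs}ms`);
}

export async function retry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error; // keep the most recent failure for the final throw
    }
  }
  throw lastError;
}
```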
---
## Step 4: Test Files Generated
### Test File Structure
- [ ] Test files organized correctly:
- [ ] `tests/e2e/` for E2E tests
- [ ] `tests/api/` for API tests
- [ ] `tests/component/` for component tests
- [ ] `tests/unit/` for unit tests
- [ ] `tests/support/` for fixtures/factories/helpers
### E2E Tests (If Applicable)
- [ ] E2E test files created in `tests/e2e/`
- [ ] All tests follow Given-When-Then format
- [ ] All tests have priority tags ([P0], [P1], [P2], [P3]) in test name
- [ ] All tests use data-testid selectors (not CSS classes)
- [ ] One assertion per test (atomic design)
- [ ] No hard waits or sleeps (explicit waits only)
- [ ] Network-first pattern applied (route interception BEFORE navigation)
- [ ] Clear Given-When-Then comments in test code
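For reference, a sketch of the priority-tag naming convention and the matching selective run (selector names are assumptions):
```typescript
import { test, expect } from '@playwright/test';

// Selective execution example: npx playwright test --grep "\[P0\]"
test('[P0] should show the login form on the login page', async ({ page }) => {
  // GIVEN/WHEN: User navigates to the login page
  await page.goto('/login');
  // THEN: The login button is visible (single, atomic assertion)
  await expect(page.getByTestId('login-button')).toBeVisible();
});
```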
### API Tests (If Applicable)
- [ ] API test files created in `tests/api/`
- [ ] All tests follow Given-When-Then format
- [ ] All tests have priority tags in test name
- [ ] API contracts validated (request/response structure)
- [ ] HTTP status codes verified
- [ ] Response body validation includes required fields
- [ ] Error cases tested (400, 401, 403, 404, 500)
- [ ] JWT token format validated (if auth tests)
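For the JWT item, a shape-only check on the token returned by a hypothetical login endpoint (endpoint, credentials, and response field are assumptions):
```typescript
import { test, expect } from '@playwright/test';

test('[P1] POST /api/login - should return a well-formed JWT', async ({ request }) => {
  // GIVEN: Valid credentials (in a real suite these would come from a factory/fixture)
  const credentials = { email: 'user@example.com', password: 'correct-password' };
  // WHEN: Logging in via the API (assumed endpoint)
  const response = await request.post('/api/login', { data: credentials });
  // THEN: The token has the three dot-separated segments of a JWT
  const { token } = await response.json();
  expect(token).toMatch(/^[\w-]+\.[\w-]+\.[\w-]+$/);
});
```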
### Component Tests (If Applicable)
- [ ] Component test files created in `tests/component/`
- [ ] All tests follow Given-When-Then format
- [ ] All tests have priority tags in test name
- [ ] Component mounting works correctly
- [ ] Interaction testing covers user actions (click, hover, keyboard)
- [ ] State management validated
- [ ] Props and events tested
### Unit Tests (If Applicable)
- [ ] Unit test files created in `tests/unit/`
- [ ] All tests follow Given-When-Then format
- [ ] All tests have priority tags in test name
- [ ] Pure logic tested (no dependencies)
- [ ] Edge cases covered
- [ ] Error handling tested
### Quality Standards Enforced
- [ ] All tests use Given-When-Then format with clear comments
- [ ] All tests have descriptive names with priority tags
- [ ] No duplicate tests (same behavior tested multiple times)
- [ ] No flaky patterns (race conditions, timing issues)
- [ ] No test interdependencies (tests can run in any order)
- [ ] Tests are deterministic (same input always produces same result)
- [ ] All tests use data-testid selectors (E2E tests)
- [ ] No hard waits: `await page.waitForTimeout()` (forbidden)
- [ ] No conditional flow: `if (await element.isVisible())` (forbidden)
- [ ] No try-catch for test logic (only for cleanup)
- [ ] No hardcoded test data (use factories with faker)
- [ ] No page object classes (tests are direct and simple)
- [ ] No shared state between tests
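A sketch contrasting the forbidden patterns above with deterministic replacements, assuming data-testid attributes and an `/api/search` endpoint (both illustrative):
```typescript
import { test, expect } from '@playwright/test';

test('should show results after a search', async ({ page }) => {
  await page.goto('/search');

  // ❌ Forbidden: hard wait and conditional flow
  // await page.waitForTimeout(3000);
  // if (await page.getByTestId('results').isVisible()) { ... }

  // ✅ Deterministic: wait for the response that drives the UI, then assert
  const resultsResponse = page.waitForResponse('**/api/search**');
  await page.fill('[data-testid="search-input"]', 'playwright');
  await page.click('[data-testid="search-button"]');
  await resultsResponse;

  // Web-first assertion retries until the element appears or the timeout is hit
  await expect(page.getByTestId('results')).toBeVisible();
});
```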
### Network-First Pattern Applied
- [ ] Route interception set up BEFORE navigation (E2E tests with network requests)
- [ ] `page.route()` called before `page.goto()` to prevent race conditions
- [ ] Network-first pattern verified in all E2E tests that make API calls
---
## Step 5: Test Validation and Healing (NEW - Phase 2.5)
### Healing Configuration
- [ ] Healing configuration checked:
- [ ] `{auto_validate}` setting noted (default: true)
- [ ] `{auto_heal_failures}` setting noted (default: false)
- [ ] `{max_healing_iterations}` setting noted (default: 3)
- [ ] `{use_mcp_healing}` setting noted (default: true)
### Healing Knowledge Fragments Loaded (If Healing Enabled)
- [ ] `test-healing-patterns.md` loaded (common failure patterns and fixes)
- [ ] `selector-resilience.md` loaded (selector refactoring guide)
- [ ] `timing-debugging.md` loaded (race condition fixes)
### Test Execution and Validation
- [ ] Generated tests executed (if `{auto_validate}` true)
- [ ] Test results captured:
- [ ] Total tests run
- [ ] Passing tests count
- [ ] Failing tests count
- [ ] Error messages and stack traces captured
### Healing Loop (If Enabled and Tests Failed)
- [ ] Healing loop entered (if `{auto_heal_failures}` true AND tests failed)
- [ ] For each failing test:
- [ ] Failure pattern identified (selector, timing, data, network, hard wait)
- [ ] Appropriate healing strategy applied:
- [ ] Stale selector → Replaced with data-testid or ARIA role
- [ ] Race condition → Added network-first interception or state waits
- [ ] Dynamic data → Replaced hardcoded values with regex/dynamic generation
- [ ] Network error → Added route mocking
- [ ] Hard wait → Replaced with event-based wait
- [ ] Healed test re-run to validate fix
- [ ] Iteration count tracked (max 3 attempts)
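A sketch of what a healed test might look like when the failure was a position-based CSS selector plus a hard wait; the route and test IDs are illustrative:

```typescript
import { test, expect } from '@playwright/test';

test('[P1] submits the checkout form', async ({ page }) => {
  await page.goto('/checkout');

  // Before healing (flaky): hard wait plus a position-based CSS selector
  //   await page.waitForTimeout(3000);
  //   await page.click('.btn.btn-primary:nth-child(2)');

  // After healing: data-testid selector (ARIA role as fallback) and an event-based wait
  await page.getByTestId('submit-order').click();
  await expect(page.getByRole('heading', { name: /order confirmed/i })).toBeVisible();
});
```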
### Unfixable Tests Handling
- [ ] Tests that couldn't be healed after 3 iterations marked with `test.fixme()` (if `{mark_unhealable_as_fixme}` true)
- [ ] Detailed comment added to test.fixme() tests:
- [ ] What failure occurred
- [ ] What healing was attempted (3 iterations)
- [ ] Why healing failed
- [ ] Manual investigation steps needed
- [ ] Original test logic preserved in comments
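A sketch of the expected `test.fixme()` annotation for an unhealable test; the scenario and comment contents are illustrative:

```typescript
import { test } from '@playwright/test';

/**
 * UNHEALABLE after 3 healing iterations (illustrative example):
 * - Failure: intermittent 502 from the payments sandbox during checkout
 * - Healing attempted: route mocking, event-based waits, selector refactor (3 iterations)
 * - Why healing failed: upstream sandbox instability, not test code
 * - Manual investigation: confirm sandbox availability/SLA, then re-enable
 */
test.fixme('[P2] completes checkout with a saved card', async ({ page }) => {
  // Original test logic preserved for manual investigation
  await page.goto('/checkout');
  await page.getByTestId('saved-card-option').click();
  await page.getByTestId('pay-button').click();
});
```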
### Healing Report Generated
- [ ] Healing report generated (if healing attempted)
- [ ] Report includes:
- [ ] Auto-heal enabled status
- [ ] Healing mode (MCP-assisted or Pattern-based)
- [ ] Iterations allowed (max_healing_iterations)
- [ ] Validation results (total, passing, failing)
- [ ] Successfully healed tests (count, file:line, fix applied)
- [ ] Unable to heal tests (count, file:line, reason)
- [ ] Healing patterns applied (selector fixes, timing fixes, data fixes)
- [ ] Knowledge base references used
---
## Step 6: Documentation and Scripts Updated
### Test README Updated
- [ ] `tests/README.md` created or updated (if `{update_readme}` true)
- [ ] Test suite structure overview included
- [ ] Test execution instructions provided (all, specific files, by priority)
- [ ] Fixture usage examples provided
- [ ] Factory usage examples provided
- [ ] Priority tagging convention explained ([P0], [P1], [P2], [P3])
- [ ] How to write new tests documented
- [ ] Common patterns documented
- [ ] Anti-patterns documented (what to avoid)
### package.json Scripts Updated
- [ ] package.json scripts added/updated (if `{update_package_scripts}` true)
- [ ] `test:e2e` script for all E2E tests
- [ ] `test:e2e:p0` script for P0 tests only
- [ ] `test:e2e:p1` script for P0 + P1 tests
- [ ] `test:api` script for API tests
- [ ] `test:component` script for component tests
- [ ] `test:unit` script for unit tests (if applicable)
### Test Suite Executed
- [ ] Test suite run locally (if `{run_tests_after_generation}` true)
- [ ] Test results captured (passing/failing counts)
- [ ] No flaky patterns detected (tests are deterministic)
- [ ] Setup requirements documented (if any)
- [ ] Known issues documented (if any)
---
## Step 7: Automation Summary Generated
### Automation Summary Document
- [ ] Output file created at `{output_summary}`
- [ ] Document includes execution mode (BMad-Integrated, Standalone, Auto-discover)
- [ ] Feature analysis included (source files, coverage gaps) - Standalone mode
- [ ] Tests created listed (E2E, API, Component, Unit) with counts and paths
- [ ] Infrastructure created listed (fixtures, factories, helpers)
- [ ] Test execution instructions provided
- [ ] Coverage analysis included:
- [ ] Total test count
- [ ] Priority breakdown (P0, P1, P2, P3 counts)
- [ ] Test level breakdown (E2E, API, Component, Unit counts)
- [ ] Coverage percentage (if calculated)
- [ ] Coverage status (acceptance criteria covered, gaps identified)
- [ ] Definition of Done checklist included
- [ ] Next steps provided
- [ ] Recommendations included (if Standalone mode)
### Summary Provided to User
- [ ] Concise summary output provided
- [ ] Total tests created across test levels
- [ ] Priority breakdown (P0, P1, P2, P3 counts)
- [ ] Infrastructure counts (fixtures, factories, helpers)
- [ ] Test execution command provided
- [ ] Output file path provided
- [ ] Next steps listed
---
## Quality Checks
### Test Design Quality
- [ ] Tests are readable (clear Given-When-Then structure)
- [ ] Tests are maintainable (use factories/fixtures, not hardcoded data)
- [ ] Tests are isolated (no shared state between tests)
- [ ] Tests are deterministic (no race conditions or flaky patterns)
- [ ] Tests are atomic (one assertion per test)
- [ ] Tests are fast (no unnecessary waits or delays)
- [ ] Tests are lean (files under {max_file_lines} lines)
### Knowledge Base Integration
- [ ] Test level selection framework applied (from `test-levels-framework.md`)
- [ ] Priority classification applied (from `test-priorities.md`)
- [ ] Fixture architecture patterns applied (from `fixture-architecture.md`)
- [ ] Data factory patterns applied (from `data-factories.md`)
- [ ] Selective testing strategies considered (from `selective-testing.md`)
- [ ] Flaky test detection patterns considered (from `ci-burn-in.md`)
- [ ] Test quality principles applied (from `test-quality.md`)
### Code Quality
- [ ] All TypeScript types are correct and complete
- [ ] No linting errors in generated test files
- [ ] Consistent naming conventions followed
- [ ] Imports are organized and correct
- [ ] Code follows project style guide
- [ ] No console.log or debug statements in test code
---
## Integration Points
### With Framework Workflow
- [ ] Test framework configuration detected and used
- [ ] Directory structure matches framework setup
- [ ] Fixtures and helpers follow established patterns
- [ ] Naming conventions consistent with framework standards
### With BMad Workflows (If Available - OPTIONAL)
**With Story Workflow:**
- [ ] Story ID correctly referenced in output (if story available)
- [ ] Acceptance criteria from story reflected in tests (if story available)
- [ ] Technical constraints from story considered (if story available)
**With test-design Workflow:**
- [ ] P0 scenarios from test-design prioritized (if test-design available)
- [ ] Risk assessment from test-design considered (if test-design available)
- [ ] Coverage strategy aligned with test-design (if test-design available)
**With atdd Workflow:**
- [ ] ATDD artifacts provided or located (manual handoff; `atdd` not auto-run)
- [ ] Existing ATDD tests checked (if story had ATDD workflow run)
- [ ] Expansion beyond ATDD planned (edge cases, negative paths)
- [ ] No duplicate coverage with ATDD tests
### With CI Pipeline
- [ ] Tests can run in CI environment
- [ ] Tests are parallelizable (no shared state)
- [ ] Tests have appropriate timeouts
- [ ] Tests clean up their data (no CI environment pollution)
---
## Completion Criteria
All of the following must be true before marking this workflow as complete:
- [ ] **Execution mode determined** (BMad-Integrated, Standalone, or Auto-discover)
- [ ] **Framework configuration loaded** and validated
- [ ] **Coverage analysis completed** (gaps identified if analyze_coverage true)
- [ ] **Automation targets identified** (what needs testing)
- [ ] **Test levels selected** appropriately (E2E, API, Component, Unit)
- [ ] **Duplicate coverage avoided** (same behavior not tested at multiple levels)
- [ ] **Test priorities assigned** (P0, P1, P2, P3)
- [ ] **Fixture architecture created/enhanced** with auto-cleanup
- [ ] **Data factories created/enhanced** using faker (no hardcoded data)
- [ ] **Helper utilities created/enhanced** (if needed)
- [ ] **Test files generated** at appropriate levels (E2E, API, Component, Unit)
- [ ] **Given-When-Then format used** consistently across all tests
- [ ] **Priority tags added** to all test names ([P0], [P1], [P2], [P3])
- [ ] **data-testid selectors used** in E2E tests (not CSS classes)
- [ ] **Network-first pattern applied** (route interception before navigation)
- [ ] **Quality standards enforced** (no hard waits, no flaky patterns, self-cleaning, deterministic)
- [ ] **Test README updated** with execution instructions and patterns
- [ ] **package.json scripts updated** with test execution commands
- [ ] **Test suite run locally** (if run_tests_after_generation true)
- [ ] **Tests validated** (if auto_validate enabled)
- [ ] **Failures healed** (if auto_heal_failures enabled and tests failed)
- [ ] **Healing report generated** (if healing attempted)
- [ ] **Unfixable tests marked** with test.fixme() and detailed comments (if any)
- [ ] **Automation summary created** and saved to correct location
- [ ] **Output file formatted correctly**
- [ ] **Knowledge base references applied** and documented (including healing fragments if used)
- [ ] **No test quality issues** (flaky patterns, race conditions, hardcoded data, page objects)
---
## Common Issues and Resolutions
### Issue: BMad artifacts not found
**Problem:** Story, tech-spec, or PRD files not found when variables are set.
**Resolution:**
- **automate does NOT require BMad artifacts** - they are OPTIONAL enhancements
- If files not found, switch to Standalone Mode automatically
- Analyze source code directly without BMad context
- Continue workflow without halting
### Issue: Framework configuration not found
**Problem:** No playwright.config.ts or cypress.config.ts found.
**Resolution:**
- **HALT workflow** - framework is required
- Message: "Framework scaffolding required. Run `bmad tea *framework` first."
- User must run framework workflow before automate
### Issue: No automation targets identified
**Problem:** Neither story, target_feature, nor target_files specified, and auto-discover finds nothing.
**Resolution:**
- Check if source_dir variable is correct
- Verify source code exists in project
- Ask user to specify target_feature or target_files explicitly
- Provide examples: `target_feature: "src/auth/"` or `target_files: "src/auth/login.ts,src/auth/session.ts"`
### Issue: Duplicate coverage detected
**Problem:** Same behavior tested at multiple levels (E2E + API + Component).
**Resolution:**
- Review test level selection framework (test-levels-framework.md)
- Use E2E for critical happy path ONLY
- Use API for business logic variations
- Use Component for UI edge cases
- Remove redundant tests that duplicate coverage
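As a rough illustration, keep the UI happy path as a single [P0] E2E test and push input-validation variations down to the API level (the `/api/auth/login` endpoint shown here is an assumption):

```typescript
import { test, expect } from '@playwright/test';

// The [P0] E2E test covers the happy path once (see tests/e2e/).
// Validation variations are cheaper and more stable at the API level:
test('[P1] rejects login with a malformed email', async ({ request }) => {
  const response = await request.post('/api/auth/login', {
    data: { email: 'not-an-email', password: 'irrelevant' },
  });

  expect(response.status()).toBe(400);
});
```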
### Issue: Tests have hardcoded data
**Problem:** Tests use hardcoded email addresses, passwords, or other data.
**Resolution:**
- Replace all hardcoded data with factory function calls
- Use faker for all random data generation
- Update data-factories to support all required test scenarios
- Example: `createUser({ email: faker.internet.email() })`
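A minimal factory sketch using faker; the field names and `TestUser` shape are assumptions to adapt to the project's real model:

```typescript
import { faker } from '@faker-js/faker';

export interface TestUser {
  email: string;
  password: string;
  name: string;
}

// Every field is generated; overrides stay explicit at the call site
export function createUser(overrides: Partial<TestUser> = {}): TestUser {
  return {
    email: faker.internet.email(),
    password: faker.internet.password({ length: 16 }),
    name: faker.person.fullName(),
    ...overrides,
  };
}
```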
### Issue: Tests are flaky
**Problem:** Tests fail intermittently, pass on retry.
**Resolution:**
- Remove all hard waits (`page.waitForTimeout()`)
- Use explicit waits (`page.waitForSelector()`)
- Apply network-first pattern (route interception before navigation)
- Remove conditional flow (`if (await element.isVisible())`)
- Ensure tests are deterministic (no race conditions)
- Run burn-in loop (10 iterations) to detect flakiness
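A sketch of the deterministic replacement, assuming a hypothetical `/api/search` endpoint: register the response wait before the action that triggers it, then await it instead of sleeping:

```typescript
import { test, expect } from '@playwright/test';

test('[P1] shows search results without hard waits', async ({ page }) => {
  await page.goto('/search');

  // Flaky (forbidden): await page.waitForTimeout(5000);

  // Deterministic: wait on the real signal — the search response
  const searchResponse = page.waitForResponse('**/api/search**');
  await page.getByTestId('search-input').fill('playwright');
  await page.getByTestId('search-button').click();
  await searchResponse;

  await expect(page.getByTestId('result-list')).toBeVisible();
});
```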
### Issue: Fixtures don't clean up data
**Problem:** Test data persists after test run, causing test pollution.
**Resolution:**
- Ensure all fixtures have cleanup in teardown phase
- Cleanup happens AFTER `await use(data)`
- Call deletion/cleanup functions (deleteUser, deleteProduct, etc.)
- Verify cleanup works by checking database/storage after test run
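A fixture sketch with cleanup placed after `await use()`; the seed/delete helpers are placeholders for the project's real API calls:

```typescript
import { test as base, expect } from '@playwright/test';
import { createUser } from '../factories/user.factory'; // hypothetical factory

// Placeholder helpers — swap in the project's real seed/delete calls
async function seedUser(user: ReturnType<typeof createUser>): Promise<string> {
  return `seeded-${user.email}`;
}
async function deleteUser(id: string): Promise<void> {
  // delete the seeded record here
}

export const test = base.extend<{ seededUser: ReturnType<typeof createUser> }>({
  seededUser: async ({}, use) => {
    const user = createUser();
    const id = await seedUser(user); // setup

    await use(user); // the test body runs here

    await deleteUser(id); // cleanup runs AFTER use(), so no data is left behind
  },
});
export { expect };
```

Because cleanup lives in the fixture, every test that uses `seededUser` is self-cleaning without extra teardown code.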
### Issue: Tests too slow
**Problem:** Tests take longer than 90 seconds (max_test_duration).
**Resolution:**
- Remove unnecessary waits and delays
- Use parallel execution where possible
- Mock external services (don't make real API calls)
- Use API tests instead of E2E for business logic
- Optimize test data creation (use in-memory database, etc.)
---
## Notes for TEA Agent
- **automate is flexible:** Can work with or without BMad artifacts (story, tech-spec, PRD are OPTIONAL)
- **Standalone mode is powerful:** Analyze any codebase and generate tests independently
- **Auto-discover mode:** Scan codebase for features needing tests when no targets specified
- **Framework is the ONLY hard requirement:** HALT if framework config missing, otherwise proceed
- **Avoid duplicate coverage:** E2E for critical paths only, API/Component for variations
- **Priority tagging enables selective execution:** P0 tests run on every commit, P1 on PR, P2 nightly
- **Network-first pattern prevents race conditions:** Route interception BEFORE navigation
- **No page objects:** Keep tests simple, direct, and maintainable
- **Use knowledge base:** Load relevant fragments (test-levels, test-priorities, fixture-architecture, data-factories, healing patterns) for guidance
- **Deterministic tests only:** No hard waits, no conditional flow, no flaky patterns allowed
- **Optional healing:** auto_heal_failures disabled by default (opt-in for automatic test healing)
- **Graceful degradation:** Healing works without Playwright MCP (pattern-based fallback)
- **Unfixable tests handled:** Mark with test.fixme() and detailed comments (not silently broken)

File diff suppressed because it is too large.


@@ -1,52 +0,0 @@
# Test Architect workflow: automate
name: testarch-automate
description: "Expand test automation coverage after implementation or analyze existing codebase to generate comprehensive test suite"
author: "BMad"
# Critical variables from config
config_source: "{project-root}/_bmad/bmm/config.yaml"
output_folder: "{config_source}:output_folder"
user_name: "{config_source}:user_name"
communication_language: "{config_source}:communication_language"
document_output_language: "{config_source}:document_output_language"
date: system-generated
# Workflow components
installed_path: "{project-root}/_bmad/bmm/workflows/testarch/automate"
instructions: "{installed_path}/instructions.md"
validation: "{installed_path}/checklist.md"
template: false
# Variables and inputs
variables:
# Execution mode and targeting
standalone_mode: true # Can work without BMad artifacts (true) or integrate with BMad (false)
coverage_target: "critical-paths" # critical-paths, comprehensive, selective
# Directory paths
test_dir: "{project-root}/tests" # Root test directory
source_dir: "{project-root}/src" # Source code directory
# Output configuration
default_output_file: "{output_folder}/automation-summary.md"
# Required tools
required_tools:
- read_file # Read source code, existing tests, BMad artifacts
- write_file # Create test files, fixtures, factories, summaries
- create_directory # Create test directories
- list_files # Discover features and existing tests
- search_repo # Find coverage gaps and patterns
- glob # Find test files and source files
tags:
- qa
- automation
- test-architect
- regression
- coverage
execution_hints:
interactive: false # Minimize prompts
autonomous: true # Proceed without user input unless blocked
iterative: true


@@ -1,248 +0,0 @@
# CI/CD Pipeline Setup - Validation Checklist
## Prerequisites
- [ ] Git repository initialized (`.git/` exists)
- [ ] Git remote configured (`git remote -v` shows origin)
- [ ] Test framework configured (`playwright.config.*` or `cypress.config.*`)
- [ ] Local tests pass (`npm run test:e2e` succeeds)
- [ ] Team agrees on CI platform
- [ ] Access to CI platform settings (if updating)
Note: CI setup is typically a one-time task per repo and can be run any time after the test framework is configured.
## Process Steps
### Step 1: Preflight Checks
- [ ] Git repository validated
- [ ] Framework configuration detected
- [ ] Local test execution successful
- [ ] CI platform detected or selected
- [ ] Node version identified (.nvmrc or default)
- [ ] No blocking issues found
### Step 2: CI Pipeline Configuration
- [ ] CI configuration file created (`.github/workflows/test.yml` or `.gitlab-ci.yml`)
- [ ] File is syntactically valid (no YAML errors)
- [ ] Correct framework commands configured
- [ ] Node version matches project
- [ ] Test directory paths correct
### Step 3: Parallel Sharding
- [ ] Matrix strategy configured (4 shards default)
- [ ] Shard syntax correct for framework
- [ ] fail-fast set to false
- [ ] Shard count appropriate for test suite size
### Step 4: Burn-In Loop
- [ ] Burn-in job created
- [ ] 10 iterations configured
- [ ] Proper exit on failure (`|| exit 1`)
- [ ] Runs on appropriate triggers (PR, cron)
- [ ] Failure artifacts uploaded
### Step 5: Caching Configuration
- [ ] Dependency cache configured (npm/yarn)
- [ ] Cache key uses lockfile hash
- [ ] Browser cache configured (Playwright/Cypress)
- [ ] Restore-keys defined for fallback
- [ ] Cache paths correct for platform
### Step 6: Artifact Collection
- [ ] Artifacts upload on failure only
- [ ] Correct artifact paths (test-results/, traces/, etc.)
- [ ] Retention days set (30 default)
- [ ] Artifact names unique per shard
- [ ] No sensitive data in artifacts
### Step 7: Retry Logic
- [ ] Retry action/strategy configured
- [ ] Max attempts: 2-3
- [ ] Timeout appropriate (30 min)
- [ ] Retry only on transient errors
### Step 8: Helper Scripts
- [ ] `scripts/test-changed.sh` created
- [ ] `scripts/ci-local.sh` created
- [ ] `scripts/burn-in.sh` created (optional)
- [ ] Scripts are executable (`chmod +x`)
- [ ] Scripts use correct test commands
- [ ] Shebang present (`#!/bin/bash`)
### Step 9: Documentation
- [ ] `docs/ci.md` created with pipeline guide
- [ ] `docs/ci-secrets-checklist.md` created
- [ ] Required secrets documented
- [ ] Setup instructions clear
- [ ] Troubleshooting section included
- [ ] Badge URLs provided (optional)
## Output Validation
### Configuration Validation
- [ ] CI file loads without errors
- [ ] All paths resolve correctly
- [ ] No hardcoded values (use env vars)
- [ ] Triggers configured (push, pull_request, schedule)
- [ ] Platform-specific syntax correct
### Execution Validation
- [ ] First CI run triggered (push to remote)
- [ ] Pipeline starts without errors
- [ ] All jobs appear in CI dashboard
- [ ] Caching works (check logs for cache hit)
- [ ] Tests execute in parallel
- [ ] Artifacts collected on failure
### Performance Validation
- [ ] Lint stage: <2 minutes
- [ ] Test stage (per shard): <10 minutes
- [ ] Burn-in stage: <30 minutes
- [ ] Total pipeline: <45 minutes
- [ ] Cache reduces install time by 2-5 minutes
## Quality Checks
### Best Practices Compliance
- [ ] Burn-in loop follows production patterns
- [ ] Parallel sharding configured optimally
- [ ] Failure-only artifact collection
- [ ] Selective testing enabled (optional)
- [ ] Retry logic handles transient failures only
- [ ] No secrets in configuration files
### Knowledge Base Alignment
- [ ] Burn-in pattern matches `ci-burn-in.md`
- [ ] Selective testing matches `selective-testing.md`
- [ ] Artifact collection matches `visual-debugging.md`
- [ ] Test quality matches `test-quality.md`
### Security Checks
- [ ] No credentials in CI configuration
- [ ] Secrets use platform secret management
- [ ] Environment variables for sensitive data
- [ ] Artifact retention appropriate (not too long)
- [ ] No debug output exposing secrets
## Integration Points
### Status File Integration
- [ ] `bmm-workflow-status.md` exists
- [ ] CI setup logged in Quality & Testing Progress section
- [ ] Status updated with completion timestamp
- [ ] Platform and configuration noted
### Knowledge Base Integration
- [ ] Relevant knowledge fragments loaded
- [ ] Patterns applied from knowledge base
- [ ] Documentation references knowledge base
- [ ] Knowledge base references in README
### Workflow Dependencies
- [ ] `framework` workflow completed first
- [ ] Can proceed to `atdd` workflow after CI setup
- [ ] Can proceed to `automate` workflow
- [ ] CI integrates with `gate` workflow
## Completion Criteria
**All must be true:**
- [ ] All prerequisites met
- [ ] All process steps completed
- [ ] All output validations passed
- [ ] All quality checks passed
- [ ] All integration points verified
- [ ] First CI run successful
- [ ] Performance targets met
- [ ] Documentation complete
## Post-Workflow Actions
**User must complete:**
1. [ ] Commit CI configuration
2. [ ] Push to remote repository
3. [ ] Configure required secrets in CI platform
4. [ ] Open PR to trigger first CI run
5. [ ] Monitor and verify pipeline execution
6. [ ] Adjust parallelism if needed (based on actual run times)
7. [ ] Set up notifications (optional)
**Recommended next workflows:**
1. [ ] Run `atdd` workflow for test generation
2. [ ] Run `automate` workflow for coverage expansion
3. [ ] Run `gate` workflow for quality gates
## Rollback Procedure
If workflow fails:
1. [ ] Delete CI configuration file
2. [ ] Remove helper scripts directory
3. [ ] Remove documentation (docs/ci.md, etc.)
4. [ ] Clear CI platform secrets (if added)
5. [ ] Review error logs
6. [ ] Fix issues and retry workflow
## Notes
### Common Issues
**Issue**: CI file syntax errors
- **Solution**: Validate YAML syntax online or with linter
**Issue**: Tests fail in CI but pass locally
- **Solution**: Use `scripts/ci-local.sh` to mirror CI environment
**Issue**: Caching not working
- **Solution**: Check cache key formula, verify paths
**Issue**: Burn-in too slow
- **Solution**: Reduce iterations or run on cron only
### Platform-Specific
**GitHub Actions:**
- Secrets: Repository Settings → Secrets and variables → Actions
- Runners: Ubuntu latest recommended
- Concurrency limits: 20 jobs for free tier
**GitLab CI:**
- Variables: Project Settings → CI/CD → Variables
- Runners: Shared or project-specific
- Pipeline quota: 400 minutes/month free tier
---
**Checklist Complete**: Sign off when all items validated.
**Completed by:** {name}
**Date:** {date}
**Platform:** {GitHub Actions, GitLab CI, Other}
**Notes:** {notes}


@@ -1,198 +0,0 @@
# GitHub Actions CI/CD Pipeline for Test Execution
# Generated by BMad TEA Agent - Test Architect Module
# Optimized for: Playwright/Cypress, Parallel Sharding, Burn-In Loop
name: Test Pipeline
on:
push:
branches: [main, develop]
pull_request:
branches: [main, develop]
schedule:
# Weekly burn-in on Sundays at 2 AM UTC
- cron: "0 2 * * 0"
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
# Lint stage - Code quality checks
lint:
name: Lint
runs-on: ubuntu-latest
timeout-minutes: 5
steps:
- uses: actions/checkout@v4
- name: Determine Node version
id: node-version
run: |
if [ -f .nvmrc ]; then
echo "value=$(cat .nvmrc)" >> "$GITHUB_OUTPUT"
echo "Using Node from .nvmrc"
else
echo "value=24" >> "$GITHUB_OUTPUT"
echo "Using default Node 24 (current LTS)"
fi
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: ${{ steps.node-version.outputs.value }}
cache: "npm"
- name: Install dependencies
run: npm ci
- name: Run linter
run: npm run lint
# Test stage - Parallel execution with sharding
test:
name: Test (Shard ${{ matrix.shard }})
runs-on: ubuntu-latest
timeout-minutes: 30
needs: lint
strategy:
fail-fast: false
matrix:
shard: [1, 2, 3, 4]
steps:
- uses: actions/checkout@v4
- name: Determine Node version
id: node-version
run: |
if [ -f .nvmrc ]; then
echo "value=$(cat .nvmrc)" >> "$GITHUB_OUTPUT"
echo "Using Node from .nvmrc"
else
echo "value=22" >> "$GITHUB_OUTPUT"
echo "Using default Node 22 (current LTS)"
fi
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: ${{ steps.node-version.outputs.value }}
cache: "npm"
- name: Cache Playwright browsers
uses: actions/cache@v4
with:
path: ~/.cache/ms-playwright
key: ${{ runner.os }}-playwright-${{ hashFiles('**/package-lock.json') }}
restore-keys: |
${{ runner.os }}-playwright-
- name: Install dependencies
run: npm ci
- name: Install Playwright browsers
run: npx playwright install --with-deps chromium
- name: Run tests (shard ${{ matrix.shard }}/4)
run: npm run test:e2e -- --shard=${{ matrix.shard }}/4
- name: Upload test results
if: failure()
uses: actions/upload-artifact@v4
with:
name: test-results-${{ matrix.shard }}
path: |
test-results/
playwright-report/
retention-days: 30
# Burn-in stage - Flaky test detection
burn-in:
name: Burn-In (Flaky Detection)
runs-on: ubuntu-latest
timeout-minutes: 60
needs: test
# Only run burn-in on PRs to main/develop or on schedule
if: github.event_name == 'pull_request' || github.event_name == 'schedule'
steps:
- uses: actions/checkout@v4
- name: Determine Node version
id: node-version
run: |
if [ -f .nvmrc ]; then
echo "value=$(cat .nvmrc)" >> "$GITHUB_OUTPUT"
echo "Using Node from .nvmrc"
else
echo "value=22" >> "$GITHUB_OUTPUT"
echo "Using default Node 22 (current LTS)"
fi
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: ${{ steps.node-version.outputs.value }}
cache: "npm"
- name: Cache Playwright browsers
uses: actions/cache@v4
with:
path: ~/.cache/ms-playwright
key: ${{ runner.os }}-playwright-${{ hashFiles('**/package-lock.json') }}
- name: Install dependencies
run: npm ci
- name: Install Playwright browsers
run: npx playwright install --with-deps chromium
- name: Run burn-in loop (10 iterations)
run: |
echo "🔥 Starting burn-in loop - detecting flaky tests"
for i in {1..10}; do
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "🔥 Burn-in iteration $i/10"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
npm run test:e2e || exit 1
done
echo "✅ Burn-in complete - no flaky tests detected"
- name: Upload burn-in failure artifacts
if: failure()
uses: actions/upload-artifact@v4
with:
name: burn-in-failures
path: |
test-results/
playwright-report/
retention-days: 30
# Report stage - Aggregate and publish results
report:
name: Test Report
runs-on: ubuntu-latest
needs: [test, burn-in]
if: always()
steps:
- name: Download all artifacts
uses: actions/download-artifact@v4
with:
path: artifacts
- name: Generate summary
run: |
echo "## Test Execution Summary" >> $GITHUB_STEP_SUMMARY
echo "" >> $GITHUB_STEP_SUMMARY
echo "- **Status**: ${{ needs.test.result }}" >> $GITHUB_STEP_SUMMARY
echo "- **Burn-in**: ${{ needs.burn-in.result }}" >> $GITHUB_STEP_SUMMARY
echo "- **Shards**: 4" >> $GITHUB_STEP_SUMMARY
echo "" >> $GITHUB_STEP_SUMMARY
if [ "${{ needs.burn-in.result }}" == "failure" ]; then
echo "⚠️ **Flaky tests detected** - Review burn-in artifacts" >> $GITHUB_STEP_SUMMARY
fi


@@ -1,149 +0,0 @@
# GitLab CI/CD Pipeline for Test Execution
# Generated by BMad TEA Agent - Test Architect Module
# Optimized for: Playwright/Cypress, Parallel Sharding, Burn-In Loop
stages:
- lint
- test
- burn-in
- report
variables:
# Disable git depth for accurate change detection
GIT_DEPTH: 0
# Use npm ci for faster, deterministic installs
npm_config_cache: "$CI_PROJECT_DIR/.npm"
# Playwright browser cache
PLAYWRIGHT_BROWSERS_PATH: "$CI_PROJECT_DIR/.cache/ms-playwright"
# Default Node version when .nvmrc is missing
DEFAULT_NODE_VERSION: "24"
# Caching configuration
cache:
key:
files:
- package-lock.json
paths:
- .npm/
- .cache/ms-playwright/
- node_modules/
# Lint stage - Code quality checks
lint:
stage: lint
image: node:$DEFAULT_NODE_VERSION
before_script:
- |
NODE_VERSION=$(cat .nvmrc 2>/dev/null || echo "$DEFAULT_NODE_VERSION")
echo "Using Node $NODE_VERSION"
npm install -g n
n "$NODE_VERSION"
node -v
- npm ci
script:
- npm run lint
timeout: 5 minutes
# Test stage - Parallel execution with sharding
.test-template: &test-template
stage: test
image: node:$DEFAULT_NODE_VERSION
needs:
- lint
before_script:
- |
NODE_VERSION=$(cat .nvmrc 2>/dev/null || echo "$DEFAULT_NODE_VERSION")
echo "Using Node $NODE_VERSION"
npm install -g n
n "$NODE_VERSION"
node -v
- npm ci
- npx playwright install --with-deps chromium
artifacts:
when: on_failure
paths:
- test-results/
- playwright-report/
expire_in: 30 days
timeout: 30 minutes
test:shard-1:
<<: *test-template
script:
- npm run test:e2e -- --shard=1/4
test:shard-2:
<<: *test-template
script:
- npm run test:e2e -- --shard=2/4
test:shard-3:
<<: *test-template
script:
- npm run test:e2e -- --shard=3/4
test:shard-4:
<<: *test-template
script:
- npm run test:e2e -- --shard=4/4
# Burn-in stage - Flaky test detection
burn-in:
stage: burn-in
image: node:$DEFAULT_NODE_VERSION
needs:
- test:shard-1
- test:shard-2
- test:shard-3
- test:shard-4
# Only run burn-in on merge requests to main/develop or on schedule
rules:
- if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
- if: '$CI_PIPELINE_SOURCE == "schedule"'
before_script:
- |
NODE_VERSION=$(cat .nvmrc 2>/dev/null || echo "$DEFAULT_NODE_VERSION")
echo "Using Node $NODE_VERSION"
npm install -g n
n "$NODE_VERSION"
node -v
- npm ci
- npx playwright install --with-deps chromium
script:
- |
echo "🔥 Starting burn-in loop - detecting flaky tests"
for i in {1..10}; do
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "🔥 Burn-in iteration $i/10"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
npm run test:e2e || exit 1
done
echo "✅ Burn-in complete - no flaky tests detected"
artifacts:
when: on_failure
paths:
- test-results/
- playwright-report/
expire_in: 30 days
timeout: 60 minutes
# Report stage - Aggregate results
report:
stage: report
image: alpine:latest
needs:
- test:shard-1
- test:shard-2
- test:shard-3
- test:shard-4
- burn-in
when: always
script:
- |
echo "## Test Execution Summary"
echo ""
echo "- Pipeline: $CI_PIPELINE_ID"
echo "- Shards: 4"
echo "- Branch: $CI_COMMIT_REF_NAME"
echo ""
echo "View detailed results in job artifacts"


@@ -1,536 +0,0 @@
<!-- Powered by BMAD-CORE™ -->
# CI/CD Pipeline Setup
**Workflow ID**: `_bmad/bmm/testarch/ci`
**Version**: 4.0 (BMad v6)
---
## Overview
Scaffolds a production-ready CI/CD quality pipeline with test execution, burn-in loops for flaky test detection, parallel sharding, artifact collection, and notification configuration. This workflow creates platform-specific CI configuration optimized for fast feedback and reliable test execution.
Note: This is typically a one-time setup per repo; run it any time after the test framework exists, ideally before feature work starts.
---
## Preflight Requirements
**Critical:** Verify these requirements before proceeding. If any fail, HALT and notify the user.
- ✅ Git repository is initialized (`.git/` directory exists)
- ✅ Local test suite passes (`npm run test:e2e` succeeds)
- ✅ Test framework is configured (from `framework` workflow)
- ✅ Team agrees on target CI platform (GitHub Actions, GitLab CI, Circle CI, etc.)
- ✅ Access to CI platform settings/secrets available (if updating existing pipeline)
---
## Step 1: Run Preflight Checks
### Actions
1. **Verify Git Repository**
- Check for `.git/` directory
- Confirm remote repository configured (`git remote -v`)
- If not initialized, HALT with message: "Git repository required for CI/CD setup"
2. **Validate Test Framework**
- Look for `playwright.config.*` or `cypress.config.*`
- Read framework configuration to extract:
- Test directory location
- Test command
- Reporter configuration
- Timeout settings
- If not found, HALT with message: "Run `framework` workflow first to set up test infrastructure"
3. **Run Local Tests**
- Execute `npm run test:e2e` (or equivalent from package.json)
- Ensure tests pass before CI setup
- If tests fail, HALT with message: "Fix failing tests before setting up CI/CD"
4. **Detect CI Platform**
- Check for existing CI configuration:
- `.github/workflows/*.yml` (GitHub Actions)
- `.gitlab-ci.yml` (GitLab CI)
- `.circleci/config.yml` (Circle CI)
- `Jenkinsfile` (Jenkins)
- If found, ask user: "Update existing CI configuration or create new?"
- If not found, detect platform from git remote:
- `github.com` → GitHub Actions (default)
- `gitlab.com` → GitLab CI
- Ask user if unable to auto-detect
5. **Read Environment Configuration**
- Use `.nvmrc` for Node version if present
- If missing, default to a current LTS (Node 24) or newer instead of a fixed old version
- Read `package.json` to identify dependencies (affects caching strategy)
**Halt Condition:** If preflight checks fail, stop immediately and report which requirement failed.
---
## Step 2: Scaffold CI Pipeline
### Actions
1. **Select CI Platform Template**
Based on detection or user preference, use the appropriate template:
**GitHub Actions** (`.github/workflows/test.yml`):
- Most common platform
- Excellent caching and matrix support
- Free for public repos, generous free tier for private
**GitLab CI** (`.gitlab-ci.yml`):
- Integrated with GitLab
- Built-in registry and runners
- Powerful pipeline features
**Circle CI** (`.circleci/config.yml`):
- Fast execution with parallelism
- Docker-first approach
- Enterprise features
**Jenkins** (`Jenkinsfile`):
- Self-hosted option
- Maximum customization
- Requires infrastructure management
2. **Generate Pipeline Configuration**
Use templates from `{installed_path}/` directory:
- `github-actions-template.yml`
- `gitlab-ci-template.yml`
**Key pipeline stages:**
```yaml
stages:
- lint # Code quality checks
- test # Test execution (parallel shards)
- burn-in # Flaky test detection
- report # Aggregate results and publish
```
3. **Configure Test Execution**
**Parallel Sharding:**
```yaml
strategy:
fail-fast: false
matrix:
shard: [1, 2, 3, 4]
steps:
- name: Run tests
run: npm run test:e2e -- --shard=${{ matrix.shard }}/${{ strategy.job-total }}
```
**Purpose:** Splits tests into N parallel jobs for faster execution (target: <10 min per shard)
4. **Add Burn-In Loop**
**Critical pattern from production systems:**
```yaml
burn-in:
name: Flaky Test Detection
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Setup Node
uses: actions/setup-node@v4
with:
node-version-file: '.nvmrc'
- name: Install dependencies
run: npm ci
- name: Run burn-in loop (10 iterations)
run: |
for i in {1..10}; do
echo "🔥 Burn-in iteration $i/10"
npm run test:e2e || exit 1
done
- name: Upload failure artifacts
if: failure()
uses: actions/upload-artifact@v4
with:
name: burn-in-failures
path: test-results/
retention-days: 30
```
**Purpose:** Runs tests multiple times to catch non-deterministic failures before they reach main branch.
**When to run:**
- On pull requests to main/develop
- Weekly on cron schedule
- After significant test infrastructure changes
5. **Configure Caching**
**Node modules cache:**
```yaml
- name: Cache dependencies
uses: actions/cache@v4
with:
path: ~/.npm
key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
restore-keys: |
${{ runner.os }}-node-
```
**Browser binaries cache (Playwright):**
```yaml
- name: Cache Playwright browsers
uses: actions/cache@v4
with:
path: ~/.cache/ms-playwright
key: ${{ runner.os }}-playwright-${{ hashFiles('**/package-lock.json') }}
```
**Purpose:** Reduces CI execution time by 2-5 minutes per run.
6. **Configure Artifact Collection**
**Failure artifacts only:**
```yaml
- name: Upload test results
if: failure()
uses: actions/upload-artifact@v4
with:
name: test-results-${{ matrix.shard }}
path: |
test-results/
playwright-report/
retention-days: 30
```
**Artifacts to collect:**
- Traces (Playwright) - full debugging context
- Screenshots - visual evidence of failures
- Videos - interaction playback
- HTML reports - detailed test results
- Console logs - error messages and warnings
7. **Add Retry Logic**
```yaml
- name: Run tests with retries
uses: nick-invision/retry@v2
with:
timeout_minutes: 30
max_attempts: 3
retry_on: error
command: npm run test:e2e
```
**Purpose:** Handles transient failures (network issues, race conditions)
8. **Configure Notifications** (Optional)
If `notify_on_failure` is enabled:
```yaml
- name: Notify on failure
if: failure()
uses: 8398a7/action-slack@v3
with:
status: ${{ job.status }}
text: 'Test failures detected in PR #${{ github.event.pull_request.number }}'
webhook_url: ${{ secrets.SLACK_WEBHOOK }}
```
9. **Generate Helper Scripts**
**Selective testing script** (`scripts/test-changed.sh`):
```bash
#!/bin/bash
# Run only tests for changed files
CHANGED_FILES=$(git diff --name-only HEAD~1)
if echo "$CHANGED_FILES" | grep -q "src/.*\.ts$"; then
echo "Running affected tests..."
npm run test:e2e -- --grep="$(echo $CHANGED_FILES | sed 's/src\///g' | sed 's/\.ts//g')"
else
echo "No test-affecting changes detected"
fi
```
**Local mirror script** (`scripts/ci-local.sh`):
```bash
#!/bin/bash
# Mirror CI execution locally for debugging
echo "🔍 Running CI pipeline locally..."
# Lint
npm run lint || exit 1
# Tests
npm run test:e2e || exit 1
# Burn-in (reduced iterations)
for i in {1..3}; do
echo "🔥 Burn-in $i/3"
npm run test:e2e || exit 1
done
echo "✅ Local CI pipeline passed"
```
10. **Generate Documentation**
**CI README** (`docs/ci.md`):
- Pipeline stages and purpose
- How to run locally
- Debugging failed CI runs
- Secrets and environment variables needed
- Notification setup
- Badge URLs for README
**Secrets checklist** (`docs/ci-secrets-checklist.md`):
- Required secrets list (SLACK_WEBHOOK, etc.)
- Where to configure in CI platform
- Security best practices
---
## Step 3: Deliverables
### Primary Artifacts Created
1. **CI Configuration File**
- `.github/workflows/test.yml` (GitHub Actions)
- `.gitlab-ci.yml` (GitLab CI)
- `.circleci/config.yml` (Circle CI)
2. **Pipeline Stages**
- **Lint**: Code quality checks (ESLint, Prettier)
- **Test**: Parallel test execution (4 shards)
- **Burn-in**: Flaky test detection (10 iterations)
- **Report**: Result aggregation and publishing
3. **Helper Scripts**
- `scripts/test-changed.sh` - Selective testing
- `scripts/ci-local.sh` - Local CI mirror
- `scripts/burn-in.sh` - Standalone burn-in execution
4. **Documentation**
- `docs/ci.md` - CI pipeline guide
- `docs/ci-secrets-checklist.md` - Required secrets
- Inline comments in CI configuration
5. **Optimization Features**
- Dependency caching (npm, browser binaries)
- Parallel sharding (4 jobs default)
- Retry logic (2 retries on failure)
- Failure-only artifact upload
### Performance Targets
- **Lint stage**: <2 minutes
- **Test stage** (per shard): <10 minutes
- **Burn-in stage**: <30 minutes (10 iterations)
- **Total pipeline**: <45 minutes
**Speedup:** 20× faster than sequential execution through parallelism and caching.
---
## Important Notes
### Knowledge Base Integration
**Critical:** Check configuration and load appropriate fragments.
Read `{config_source}` and check `config.tea_use_playwright_utils`.
**Core CI Patterns (Always load):**
- `ci-burn-in.md` - Burn-in loop patterns: 10-iteration detection, GitHub Actions workflow, shard orchestration, selective execution (678 lines, 4 examples)
- `selective-testing.md` - Changed test detection strategies: tag-based, spec filters, diff-based selection, promotion rules (727 lines, 4 examples)
- `visual-debugging.md` - Artifact collection best practices: trace viewer, HAR recording, custom artifacts, accessibility integration (522 lines, 5 examples)
- `test-quality.md` - CI-specific test quality criteria: deterministic tests, isolated with cleanup, explicit assertions, length/time optimization (658 lines, 5 examples)
- `playwright-config.md` - CI-optimized configuration: parallelization, artifact output, project dependencies, sharding (722 lines, 5 examples)
**If `config.tea_use_playwright_utils: true`:**
Load playwright-utils CI-relevant fragments:
- `burn-in.md` - Smart test selection with git diff analysis (very important for CI optimization)
- `network-error-monitor.md` - Automatic HTTP 4xx/5xx detection (recommend in CI pipelines)
Recommend:
- Add burn-in script for pull request validation
- Enable network-error-monitor in merged fixtures for catching silent failures
- Reference full docs in `*framework` and `*automate` workflows
### CI Platform-Specific Guidance
**GitHub Actions:**
- Use `actions/cache` for caching
- Matrix strategy for parallelism
- Secrets in repository settings
- Free 2000 minutes/month for private repos
**GitLab CI:**
- Use `.gitlab-ci.yml` in root
- `cache:` directive for caching
- Parallel execution with `parallel: 4`
- Variables in project CI/CD settings
**Circle CI:**
- Use `.circleci/config.yml`
- Docker executors recommended
- Parallelism with `parallelism: 4`
- Context for shared secrets
### Burn-In Loop Strategy
**When to run:**
- ✅ On PRs to main/develop branches
- ✅ Weekly on schedule (cron)
- ✅ After test infrastructure changes
- ❌ Not on every commit (too slow)
**Iterations:**
- **10 iterations** for thorough detection
- **3 iterations** for quick feedback
- **100 iterations** for high-confidence stability
**Failure threshold:**
- Even ONE failure in burn-in → tests are flaky
- Must fix before merging
### Artifact Retention
**Failure artifacts only:**
- Saves storage costs
- Maintains debugging capability
- 30-day retention default
**Artifact types:**
- Traces (Playwright) - 5-10 MB per test
- Screenshots - 100-500 KB per screenshot
- Videos - 2-5 MB per test
- HTML reports - 1-2 MB per run
### Selective Testing
**Detect changed files:**
```bash
git diff --name-only HEAD~1
```
**Run affected tests only:**
- Faster feedback for small changes
- Full suite still runs on main branch
- Reduces CI time by 50-80% for focused PRs
**Trade-off:**
- May miss integration issues
- Run full suite at least on merge
### Local CI Mirror
**Purpose:** Debug CI failures locally
**Usage:**
```bash
./scripts/ci-local.sh
```
**Mirrors CI environment:**
- Same Node version
- Same test command
- Same stages (lint → test → burn-in)
- Reduced burn-in iterations (3 vs 10)
---
## Output Summary
After completing this workflow, provide a summary:
```markdown
## CI/CD Pipeline Complete
**Platform**: GitHub Actions (or GitLab CI, etc.)
**Artifacts Created**:
- ✅ Pipeline configuration: .github/workflows/test.yml
- ✅ Burn-in loop: 10 iterations for flaky detection
- ✅ Parallel sharding: 4 jobs for fast execution
- ✅ Caching: Dependencies + browser binaries
- ✅ Artifact collection: Failure-only traces/screenshots/videos
- ✅ Helper scripts: test-changed.sh, ci-local.sh, burn-in.sh
- ✅ Documentation: docs/ci.md, docs/ci-secrets-checklist.md
**Performance:**
- Lint: <2 min
- Test (per shard): <10 min
- Burn-in: <30 min
- Total: <45 min (20× speedup vs sequential)
**Next Steps**:
1. Commit CI configuration: `git add .github/workflows/test.yml && git commit -m "ci: add test pipeline"`
2. Push to remote: `git push`
3. Configure required secrets in CI platform settings (see docs/ci-secrets-checklist.md)
4. Open a PR to trigger first CI run
5. Monitor pipeline execution and adjust parallelism if needed
**Knowledge Base References Applied**:
- Burn-in loop pattern (ci-burn-in.md)
- Selective testing strategy (selective-testing.md)
- Artifact collection (visual-debugging.md)
- Test quality criteria (test-quality.md)
```
---
## Validation
After completing all steps, verify:
- [ ] CI configuration file created and syntactically valid
- [ ] Burn-in loop configured (10 iterations)
- [ ] Parallel sharding enabled (4 jobs)
- [ ] Caching configured (dependencies + browsers)
- [ ] Artifact collection on failure only
- [ ] Helper scripts created and executable (`chmod +x`)
- [ ] Documentation complete (ci.md, secrets checklist)
- [ ] No errors or warnings during scaffold
Refer to `checklist.md` for comprehensive validation criteria.


@@ -1,45 +0,0 @@
# Test Architect workflow: ci
name: testarch-ci
description: "Scaffold CI/CD quality pipeline with test execution, burn-in loops, and artifact collection"
author: "BMad"
# Critical variables from config
config_source: "{project-root}/_bmad/bmm/config.yaml"
output_folder: "{config_source}:output_folder"
user_name: "{config_source}:user_name"
communication_language: "{config_source}:communication_language"
document_output_language: "{config_source}:document_output_language"
date: system-generated
# Workflow components
installed_path: "{project-root}/_bmad/bmm/workflows/testarch/ci"
instructions: "{installed_path}/instructions.md"
validation: "{installed_path}/checklist.md"
# Variables and inputs
variables:
ci_platform: "auto" # auto, github-actions, gitlab-ci, circle-ci, jenkins - user can override
test_dir: "{project-root}/tests" # Root test directory
# Output configuration
default_output_file: "{project-root}/.github/workflows/test.yml" # GitHub Actions default
# Required tools
required_tools:
- read_file # Read .nvmrc, package.json, framework config
- write_file # Create CI config, scripts, documentation
- create_directory # Create .github/workflows/ or .gitlab-ci/ directories
- list_files # Detect existing CI configuration
- search_repo # Find test files for selective testing
tags:
- qa
- ci-cd
- test-architect
- pipeline
- automation
execution_hints:
interactive: false # Minimize prompts, auto-detect when possible
autonomous: true # Proceed without user input unless blocked
iterative: true


@@ -1,321 +0,0 @@
# Test Framework Setup - Validation Checklist
This checklist ensures the framework workflow completes successfully and all deliverables meet quality standards.
---
## Prerequisites
Before starting the workflow:
- [ ] Project root contains valid `package.json`
- [ ] No existing modern E2E framework detected (`playwright.config.*`, `cypress.config.*`)
- [ ] Project type identifiable (React, Vue, Angular, Next.js, Node, etc.)
- [ ] Bundler identifiable (Vite, Webpack, Rollup, esbuild) or not applicable
- [ ] User has write permissions to create directories and files
---
## Process Steps
### Step 1: Preflight Checks
- [ ] package.json successfully read and parsed
- [ ] Project type extracted correctly
- [ ] Bundler identified (or marked as N/A for backend projects)
- [ ] No framework conflicts detected
- [ ] Architecture documents located (if available)
### Step 2: Framework Selection
- [ ] Framework auto-detection logic executed
- [ ] Framework choice justified (Playwright vs Cypress)
- [ ] Framework preference respected (if explicitly set)
- [ ] User notified of framework selection and rationale
### Step 3: Directory Structure
- [ ] `tests/` root directory created
- [ ] `tests/e2e/` directory created (or user's preferred structure)
- [ ] `tests/support/` directory created (critical pattern)
- [ ] `tests/support/fixtures/` directory created
- [ ] `tests/support/fixtures/factories/` directory created
- [ ] `tests/support/helpers/` directory created
- [ ] `tests/support/page-objects/` directory created (if applicable)
- [ ] All directories have correct permissions
**Note**: Test organization is flexible (e2e/, api/, integration/). The **support/** folder is the key pattern.
### Step 4: Configuration Files
- [ ] Framework config file created (`playwright.config.ts` or `cypress.config.ts`)
- [ ] Config file uses TypeScript (if `use_typescript: true`)
- [ ] Timeouts configured correctly (action: 15s, navigation: 30s, test: 60s)
- [ ] Base URL configured with environment variable fallback
- [ ] Trace/screenshot/video set to retain-on-failure
- [ ] Multiple reporters configured (HTML + JUnit + console)
- [ ] Parallel execution enabled
- [ ] CI-specific settings configured (retries, workers)
- [ ] Config file is syntactically valid (no compilation errors)
### Step 5: Environment Configuration
- [ ] `.env.example` created in project root
- [ ] `TEST_ENV` variable defined
- [ ] `BASE_URL` variable defined with default
- [ ] `API_URL` variable defined (if applicable)
- [ ] Authentication variables defined (if applicable)
- [ ] Feature flag variables defined (if applicable)
- [ ] `.nvmrc` created with appropriate Node version
### Step 6: Fixture Architecture
- [ ] `tests/support/fixtures/index.ts` created
- [ ] Base fixture extended from Playwright/Cypress
- [ ] Type definitions for fixtures created
- [ ] mergeTests pattern implemented (if multiple fixtures)
- [ ] Auto-cleanup logic included in fixtures
- [ ] Fixture architecture follows knowledge base patterns
### Step 7: Data Factories
- [ ] At least one factory created (e.g., UserFactory)
- [ ] Factories use @faker-js/faker for realistic data
- [ ] Factories track created entities (for cleanup)
- [ ] Factories implement `cleanup()` method
- [ ] Factories integrate with fixtures
- [ ] Factories follow knowledge base patterns
### Step 8: Sample Tests
- [ ] Example test file created (`tests/e2e/example.spec.ts`)
- [ ] Test uses fixture architecture
- [ ] Test demonstrates data factory usage
- [ ] Test uses proper selector strategy (data-testid)
- [ ] Test follows Given-When-Then structure
- [ ] Test includes proper assertions
- [ ] Network interception demonstrated (if applicable)
### Step 9: Helper Utilities
- [ ] API helper created (if API testing needed)
- [ ] Network helper created (if network mocking needed)
- [ ] Auth helper created (if authentication needed)
- [ ] Helpers follow functional patterns
- [ ] Helpers have proper error handling
### Step 10: Documentation
- [ ] `tests/README.md` created
- [ ] Setup instructions included
- [ ] Running tests section included
- [ ] Architecture overview section included
- [ ] Best practices section included
- [ ] CI integration section included
- [ ] Knowledge base references included
- [ ] Troubleshooting section included
### Step 11: Package.json Updates
- [ ] Minimal test script added to package.json: `test:e2e`
- [ ] Test framework dependency added (if not already present)
- [ ] Type definitions added (if TypeScript)
- [ ] Users can extend with additional scripts as needed
---
## Output Validation
### Configuration Validation
- [ ] Config file loads without errors
- [ ] Config file passes linting (if linter configured)
- [ ] Config file uses correct syntax for chosen framework
- [ ] All paths in config resolve correctly
- [ ] Reporter output directories exist or are created on test run
### Test Execution Validation
- [ ] Sample test runs successfully
- [ ] Test execution produces expected output (pass/fail)
- [ ] Test artifacts generated correctly (traces, screenshots, videos)
- [ ] Test report generated successfully
- [ ] No console errors or warnings during test run
### Directory Structure Validation
- [ ] All required directories exist
- [ ] Directory structure matches framework conventions
- [ ] No duplicate or conflicting directories
- [ ] Directories accessible with correct permissions
### File Integrity Validation
- [ ] All generated files are syntactically correct
- [ ] No placeholder text left in files (e.g., "TODO", "FIXME")
- [ ] All imports resolve correctly
- [ ] No hardcoded credentials or secrets in files
- [ ] All file paths use correct separators for OS
---
## Quality Checks
### Code Quality
- [ ] Generated code follows project coding standards
- [ ] TypeScript types are complete and accurate (no `any` unless necessary)
- [ ] No unused imports or variables
- [ ] Consistent code formatting (matches project style)
- [ ] No linting errors in generated files
### Best Practices Compliance
- [ ] Fixture architecture follows pure function → fixture → mergeTests pattern
- [ ] Data factories implement auto-cleanup
- [ ] Network interception occurs before navigation
- [ ] Selectors use data-testid strategy
- [ ] Artifacts only captured on failure
- [ ] Tests follow Given-When-Then structure
- [ ] No hard-coded waits or sleeps
### Knowledge Base Alignment
- [ ] Fixture pattern matches `fixture-architecture.md`
- [ ] Data factories match `data-factories.md`
- [ ] Network handling matches `network-first.md`
- [ ] Config follows `playwright-config.md` or `test-config.md`
- [ ] Test quality matches `test-quality.md`
### Security Checks
- [ ] No credentials in configuration files
- [ ] .env.example contains placeholders, not real values
- [ ] Sensitive test data handled securely
- [ ] API keys and tokens use environment variables
- [ ] No secrets committed to version control
---
## Integration Points
### Status File Integration
- [ ] `bmm-workflow-status.md` exists
- [ ] Framework initialization logged in Quality & Testing Progress section
- [ ] Status file updated with completion timestamp
- [ ] Status file shows framework: Playwright or Cypress
### Knowledge Base Integration
- [ ] Relevant knowledge fragments identified from tea-index.csv
- [ ] Knowledge fragments successfully loaded
- [ ] Patterns from knowledge base applied correctly
- [ ] Knowledge base references included in documentation
### Workflow Dependencies
- [ ] Can proceed to `ci` workflow after completion
- [ ] Can proceed to `test-design` workflow after completion
- [ ] Can proceed to `atdd` workflow after completion
- [ ] Framework setup compatible with downstream workflows
---
## Completion Criteria
**All of the following must be true:**
- [ ] All prerequisite checks passed
- [ ] All process steps completed without errors
- [ ] All output validations passed
- [ ] All quality checks passed
- [ ] All integration points verified
- [ ] Sample test executes successfully
- [ ] User can run `npm run test:e2e` without errors
- [ ] Documentation is complete and accurate
- [ ] No critical issues or blockers identified
---
## Post-Workflow Actions
**User must complete:**
1. [ ] Copy `.env.example` to `.env`
2. [ ] Fill in environment-specific values in `.env`
3. [ ] Run `npm install` to install test dependencies
4. [ ] Run `npm run test:e2e` to verify setup
5. [ ] Review `tests/README.md` for project-specific guidance
**Recommended next workflows:**
1. [ ] Run `ci` workflow to set up CI/CD pipeline
2. [ ] Run `test-design` workflow to plan test coverage
3. [ ] Run `atdd` workflow when ready to develop stories
---
## Rollback Procedure
If workflow fails and needs to be rolled back:
1. [ ] Delete `tests/` directory
2. [ ] Remove test scripts from package.json
3. [ ] Delete `.env.example` (if created)
4. [ ] Delete `.nvmrc` (if created)
5. [ ] Delete framework config file
6. [ ] Remove test dependencies from package.json (if added)
7. [ ] Run `npm install` to clean up node_modules
---
## Notes
### Common Issues
**Issue**: Config file has TypeScript errors
- **Solution**: Ensure `@playwright/test` or `cypress` types are installed
**Issue**: Sample test fails to run
- **Solution**: Check BASE_URL in .env, ensure app is running
**Issue**: Fixture cleanup not working
- **Solution**: Verify cleanup() is called in fixture teardown
**Issue**: Network interception not working
- **Solution**: Ensure route setup occurs before page.goto()
### Framework-Specific Considerations
**Playwright:**
- Requires Node.js 18+
- Browser binaries auto-installed on first run
- Trace viewer requires running `npx playwright show-trace`
**Cypress:**
- Requires Node.js 18+
- Cypress app opens on first run
- Component testing requires additional setup
### Version Compatibility
- [ ] Node.js version matches .nvmrc
- [ ] Framework version compatible with Node.js version
- [ ] TypeScript version compatible with framework
- [ ] All peer dependencies satisfied
---
**Checklist Complete**: Sign off when all items checked and validated.
**Completed by:** {name}
**Date:** {date}
**Framework:** {Playwright, Cypress, Other}
**Notes:** {notes}


@@ -1,481 +0,0 @@
<!-- Powered by BMAD-CORE™ -->
# Test Framework Setup
**Workflow ID**: `_bmad/bmm/testarch/framework`
**Version**: 4.0 (BMad v6)
---
## Overview
Initialize a production-ready test framework architecture (Playwright or Cypress) with fixtures, helpers, configuration, and best practices. This workflow scaffolds the complete testing infrastructure for modern web applications.
---
## Preflight Requirements
**Critical:** Verify these requirements before proceeding. If any fail, HALT and notify the user.
- ✅ `package.json` exists in project root
- ✅ No modern E2E test harness is already configured (check for existing `playwright.config.*` or `cypress.config.*`)
- ✅ Architectural/stack context available (project type, bundler, dependencies)
---
## Step 1: Run Preflight Checks
### Actions
1. **Validate package.json**
- Read `{project-root}/package.json`
- Extract project type (React, Vue, Angular, Next.js, Node, etc.)
- Identify bundler (Vite, Webpack, Rollup, esbuild)
- Note existing test dependencies
2. **Check for Existing Framework**
- Search for `playwright.config.*`, `cypress.config.*`, `cypress.json`
- Check `package.json` for `@playwright/test` or `cypress` dependencies
- If found, HALT with message: "Existing test framework detected. Use workflow `upgrade-framework` instead."
3. **Gather Context**
- Look for architecture documents (`architecture.md`, `tech-spec*.md`)
- Check for API documentation or endpoint lists
- Identify authentication requirements
**Halt Condition:** If preflight checks fail, stop immediately and report which requirement failed.
---
## Step 2: Scaffold Framework
### Actions
1. **Framework Selection**
**Default Logic:**
- **Playwright** (recommended for):
- Large repositories (100+ files)
- Performance-critical applications
- Multi-browser support needed
- Complex user flows requiring video/trace debugging
- Projects requiring worker parallelism
- **Cypress** (recommended for):
- Small teams prioritizing developer experience
- Component testing focus
- Real-time reloading during test development
- Simpler setup requirements
**Detection Strategy:**
- Check `package.json` for existing preference
- Consider `project_size` variable from workflow config
- Use `framework_preference` variable if set
- Default to **Playwright** if uncertain
2. **Create Directory Structure**
```
{project-root}/
├── tests/ # Root test directory
│ ├── e2e/ # Test files (users organize as needed)
│ ├── support/ # Framework infrastructure (key pattern)
│ │ ├── fixtures/ # Test fixtures (data, mocks)
│ │ ├── helpers/ # Utility functions
│ │ └── page-objects/ # Page object models (optional)
│ └── README.md # Test suite documentation
```
**Note**: Users organize test files (e2e/, api/, integration/, component/) as needed. The **support/** folder is the critical pattern for fixtures and helpers used across tests.
3. **Generate Configuration File**
**For Playwright** (`playwright.config.ts` or `playwright.config.js`):
```typescript
import { defineConfig, devices } from '@playwright/test';
export default defineConfig({
testDir: './tests/e2e',
fullyParallel: true,
forbidOnly: !!process.env.CI,
retries: process.env.CI ? 2 : 0,
workers: process.env.CI ? 1 : undefined,
timeout: 60 * 1000, // Test timeout: 60s
expect: {
timeout: 15 * 1000, // Assertion timeout: 15s
},
use: {
baseURL: process.env.BASE_URL || 'http://localhost:3000',
trace: 'retain-on-failure',
screenshot: 'only-on-failure',
video: 'retain-on-failure',
actionTimeout: 15 * 1000, // Action timeout: 15s
navigationTimeout: 30 * 1000, // Navigation timeout: 30s
},
reporter: [['html', { outputFolder: 'test-results/html' }], ['junit', { outputFile: 'test-results/junit.xml' }], ['list']],
projects: [
{ name: 'chromium', use: { ...devices['Desktop Chrome'] } },
{ name: 'firefox', use: { ...devices['Desktop Firefox'] } },
{ name: 'webkit', use: { ...devices['Desktop Safari'] } },
],
});
```
**For Cypress** (`cypress.config.ts` or `cypress.config.js`):
```typescript
import { defineConfig } from 'cypress';
export default defineConfig({
e2e: {
baseUrl: process.env.BASE_URL || 'http://localhost:3000',
specPattern: 'tests/e2e/**/*.cy.{js,jsx,ts,tsx}',
supportFile: 'tests/support/e2e.ts',
video: false,
screenshotOnRunFailure: true,
setupNodeEvents(on, config) {
// implement node event listeners here
},
},
retries: {
runMode: 2,
openMode: 0,
},
defaultCommandTimeout: 15000,
requestTimeout: 30000,
responseTimeout: 30000,
pageLoadTimeout: 60000,
});
```
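The Cypress config above points `supportFile` at `tests/support/e2e.ts`, which the scaffold should also create; a minimal sketch (contents are an assumption, extend per project) could be:
```typescript
// tests/support/e2e.ts - loaded before every spec file (see supportFile in cypress.config.ts).
// Register custom commands and global hooks here.

Cypress.on('uncaught:exception', () => {
  // Returning false keeps unrelated application errors from failing every spec.
  // Keep this narrowly scoped in real suites so genuine defects are not hidden.
  return false;
});
```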
4. **Generate Environment Configuration**
Create `.env.example`:
```bash
# Test Environment Configuration
TEST_ENV=local
BASE_URL=http://localhost:3000
API_URL=http://localhost:3001/api
# Authentication (if applicable)
TEST_USER_EMAIL=test@example.com
TEST_USER_PASSWORD=
# Feature Flags (if applicable)
FEATURE_FLAG_NEW_UI=true
# API Keys (if applicable)
TEST_API_KEY=
```
5. **Generate Node Version File**
Create `.nvmrc`:
```
20.11.0
```
(Use Node version from existing `.nvmrc` or default to current LTS)
6. **Implement Fixture Architecture**
**Knowledge Base Reference**: `testarch/knowledge/fixture-architecture.md`
Create `tests/support/fixtures/index.ts`:
```typescript
import { test as base } from '@playwright/test';
import { UserFactory } from './factories/user-factory';
type TestFixtures = {
userFactory: UserFactory;
};
export const test = base.extend<TestFixtures>({
userFactory: async ({}, use) => {
const factory = new UserFactory();
await use(factory);
await factory.cleanup(); // Auto-cleanup
},
});
export { expect } from '@playwright/test';
```
7. **Implement Data Factories**
**Knowledge Base Reference**: `testarch/knowledge/data-factories.md`
Create `tests/support/fixtures/factories/user-factory.ts`:
```typescript
import { faker } from '@faker-js/faker';
export class UserFactory {
private createdUsers: string[] = [];
async createUser(overrides = {}) {
const user = {
email: faker.internet.email(),
name: faker.person.fullName(),
password: faker.internet.password({ length: 12 }),
...overrides,
};
// API call to create user
const response = await fetch(`${process.env.API_URL}/users`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(user),
});
const created = await response.json();
this.createdUsers.push(created.id);
return created;
}
async cleanup() {
// Delete all created users
for (const userId of this.createdUsers) {
await fetch(`${process.env.API_URL}/users/${userId}`, {
method: 'DELETE',
});
}
this.createdUsers = [];
}
}
```
8. **Generate Sample Tests**
Create `tests/e2e/example.spec.ts`:
```typescript
import { test, expect } from '../support/fixtures';
test.describe('Example Test Suite', () => {
test('should load homepage', async ({ page }) => {
await page.goto('/');
await expect(page).toHaveTitle(/Home/i);
});
test('should create user and login', async ({ page, userFactory }) => {
// Create test user
const user = await userFactory.createUser();
// Login
await page.goto('/login');
await page.fill('[data-testid="email-input"]', user.email);
await page.fill('[data-testid="password-input"]', user.password);
await page.click('[data-testid="login-button"]');
// Assert login success
await expect(page.locator('[data-testid="user-menu"]')).toBeVisible();
});
});
```
9. **Update package.json Scripts**
Add minimal test script to `package.json`:
```json
{
"scripts": {
"test:e2e": "playwright test"
}
}
```
**Note**: Users can add additional scripts as needed (e.g., `--ui`, `--headed`, `--debug`, `show-report`).
10. **Generate Documentation**
Create `tests/README.md` with setup instructions (see Step 3 deliverables).
---
## Step 3: Deliverables
### Primary Artifacts Created
1. **Configuration File**
- `playwright.config.ts` or `cypress.config.ts`
- Timeouts: action 15s, navigation 30s, test 60s
- Reporters: HTML + JUnit XML
2. **Directory Structure**
- `tests/` with `e2e/` and `support/` subdirectories (users add `api/`, `integration/`, etc. as needed)
- `support/fixtures/` for test fixtures
- `support/helpers/` for utility functions
3. **Environment Configuration**
- `.env.example` with `TEST_ENV`, `BASE_URL`, `API_URL`
- `.nvmrc` with Node version
4. **Test Infrastructure**
- Fixture architecture (`mergeTests` pattern)
- Data factories (faker-based, with auto-cleanup)
- Sample tests demonstrating patterns
5. **Documentation**
- `tests/README.md` with setup instructions
- Comments in config files explaining options
### README Contents
The generated `tests/README.md` should include:
- **Setup Instructions**: How to install dependencies, configure environment
- **Running Tests**: Commands for local execution, headed mode, debug mode
- **Architecture Overview**: Fixture pattern, data factories, page objects
- **Best Practices**: Selector strategy (data-testid), test isolation, cleanup
- **CI Integration**: How tests run in CI/CD pipeline
- **Knowledge Base References**: Links to relevant TEA knowledge fragments
---
## Important Notes
### Knowledge Base Integration
**Critical:** Check configuration and load appropriate fragments.
Read `{config_source}` and check `config.tea_use_playwright_utils`.
**If `config.tea_use_playwright_utils: true` (Playwright Utils Integration):**
Consult `{project-root}/_bmad/bmm/testarch/tea-index.csv` and load:
- `overview.md` - Playwright utils installation and design principles
- `fixtures-composition.md` - mergeTests composition with playwright-utils
- `auth-session.md` - Token persistence setup (if auth needed)
- `api-request.md` - API testing utilities (if API tests planned)
- `burn-in.md` - Smart test selection for CI (recommend during framework setup)
- `network-error-monitor.md` - Automatic HTTP error detection (recommend in merged fixtures)
- `data-factories.md` - Factory patterns with faker (498 lines, 5 examples)
Recommend installing playwright-utils:
```bash
npm install -D @seontechnologies/playwright-utils
```
Recommend adding burn-in and network-error-monitor to merged fixtures for enhanced reliability.
**If `config.tea_use_playwright_utils: false` (Traditional Patterns):**
Consult `{project-root}/_bmad/bmm/testarch/tea-index.csv` and load:
- `fixture-architecture.md` - Pure function → fixture → `mergeTests` composition with auto-cleanup (406 lines, 5 examples)
- `data-factories.md` - Faker-based factories with overrides, nested factories, API seeding, auto-cleanup (498 lines, 5 examples)
- `network-first.md` - Network-first testing safeguards: intercept before navigate, HAR capture, deterministic waiting (489 lines, 5 examples)
- `playwright-config.md` - Playwright-specific configuration: environment-based, timeout standards, artifact output, parallelization, project config (722 lines, 5 examples)
- `test-quality.md` - Test design principles: deterministic, isolated with cleanup, explicit assertions, length/time limits (658 lines, 5 examples)
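A minimal sketch of the `mergeTests` composition those fragments describe (the `api-client` fixture file is an assumed sibling of the user-factory fixture):
```typescript
// tests/support/fixtures/merged.ts
import { mergeTests } from '@playwright/test';
import { test as userFactoryTest } from './index';
import { test as apiClientTest } from './api-client'; // assumed additional fixture file

// Compose independent fixture files into a single test object; specs import from here.
export const test = mergeTests(userFactoryTest, apiClientTest);
export { expect } from '@playwright/test';
```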
### Framework-Specific Guidance
**Playwright Advantages:**
- Worker parallelism (significantly faster for large suites)
- Trace viewer (powerful debugging with screenshots, network, console)
- Multi-language support (TypeScript, JavaScript, Python, C#, Java)
- Built-in API testing capabilities
- Better handling of multiple browser contexts
**Cypress Advantages:**
- Superior developer experience (real-time reloading)
- Excellent for component testing (Cypress CT or use Vitest)
- Simpler setup for small teams
- Better suited for watch mode during development
**Avoid Cypress when:**
- API chains are heavy and complex
- Multi-tab/window scenarios are common
- Worker parallelism is critical for CI performance
### Selector Strategy
**Always recommend**:
- `data-testid` attributes for UI elements
- `data-cy` attributes if Cypress is chosen
- Avoid brittle CSS selectors or XPath
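In Playwright specs this typically looks like the sketch below (selectors are illustrative); `getByTestId` resolves `data-testid` by default and can be pointed at `data-cy` via `testIdAttribute` in the config:
```typescript
import { test, expect } from '@playwright/test';

test('logs in via stable test ids', async ({ page }) => {
  await page.goto('/login');
  // Prefer getByTestId / getByRole over brittle CSS or XPath selectors.
  await page.getByTestId('email-input').fill('user@example.com');
  await page.getByTestId('password-input').fill('secret');
  await page.getByTestId('login-button').click();
  await expect(page.getByTestId('user-menu')).toBeVisible();
});
```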
### Contract Testing
For microservices architectures, **recommend Pact** for consumer-driven contract testing alongside E2E tests.
### Failure Artifacts
Configure **failure-only** capture:
- Screenshots: only on failure
- Videos: retain on failure (delete on success)
- Traces: retain on failure (Playwright)
This reduces storage overhead while maintaining debugging capability.
---
## Output Summary
After completing this workflow, provide a summary:
```markdown
## Framework Scaffold Complete
**Framework Selected**: Playwright (or Cypress)
**Artifacts Created**:
- ✅ Configuration file: `playwright.config.ts`
- ✅ Directory structure: `tests/e2e/`, `tests/support/`
- ✅ Environment config: `.env.example`
- ✅ Node version: `.nvmrc`
- ✅ Fixture architecture: `tests/support/fixtures/`
- ✅ Data factories: `tests/support/fixtures/factories/`
- ✅ Sample tests: `tests/e2e/example.spec.ts`
- ✅ Documentation: `tests/README.md`
**Next Steps**:
1. Copy `.env.example` to `.env` and fill in environment variables
2. Run `npm install` to install test dependencies
3. Run `npm run test:e2e` to execute sample tests
4. Review `tests/README.md` for detailed setup instructions
**Knowledge Base References Applied**:
- Fixture architecture pattern (pure functions + mergeTests)
- Data factories with auto-cleanup (faker-based)
- Network-first testing safeguards
- Failure-only artifact capture
```
---
## Validation
After completing all steps, verify:
- [ ] Configuration file created and valid
- [ ] Directory structure exists
- [ ] Environment configuration generated
- [ ] Sample tests run successfully
- [ ] Documentation complete and accurate
- [ ] No errors or warnings during scaffold
Refer to `checklist.md` for comprehensive validation criteria.

View File

@@ -1,47 +0,0 @@
# Test Architect workflow: framework
name: testarch-framework
description: "Initialize production-ready test framework architecture (Playwright or Cypress) with fixtures, helpers, and configuration"
author: "BMad"
# Critical variables from config
config_source: "{project-root}/_bmad/bmm/config.yaml"
output_folder: "{config_source}:output_folder"
user_name: "{config_source}:user_name"
communication_language: "{config_source}:communication_language"
document_output_language: "{config_source}:document_output_language"
date: system-generated
# Workflow components
installed_path: "{project-root}/_bmad/bmm/workflows/testarch/framework"
instructions: "{installed_path}/instructions.md"
validation: "{installed_path}/checklist.md"
# Variables and inputs
variables:
test_dir: "{project-root}/tests" # Root test directory
use_typescript: true # Prefer TypeScript configuration
framework_preference: "auto" # auto, playwright, cypress - user can override auto-detection
project_size: "auto" # auto, small, large - influences framework recommendation
# Output configuration
default_output_file: "{test_dir}/README.md" # Main deliverable is test setup README
# Required tools
required_tools:
- read_file # Read package.json, existing configs
- write_file # Create config files, helpers, fixtures, tests
- create_directory # Create test directory structure
- list_files # Check for existing framework
- search_repo # Find architecture docs
tags:
- qa
- setup
- test-architect
- framework
- initialization
execution_hints:
interactive: false # Minimize prompts; auto-detect when possible
autonomous: true # Proceed without user input unless blocked
iterative: true

View File

@@ -1,407 +0,0 @@
# Non-Functional Requirements Assessment - Validation Checklist
**Workflow:** `testarch-nfr`
**Purpose:** Ensure comprehensive and evidence-based NFR assessment with actionable recommendations
---
Note: `nfr-assess` evaluates existing evidence; it does not run tests or CI workflows.
## Prerequisites Validation
- [ ] Implementation is deployed and accessible for evaluation
- [ ] Evidence sources are available (test results, metrics, logs, CI results)
- [ ] NFR categories are determined (performance, security, reliability, maintainability, custom)
- [ ] Evidence directories exist and are accessible (`test_results_dir`, `metrics_dir`, `logs_dir`)
- [ ] Knowledge base is loaded (nfr-criteria, ci-burn-in, test-quality)
---
## Context Loading
- [ ] Tech-spec.md loaded successfully (if available)
- [ ] PRD.md loaded (if available)
- [ ] Story file loaded (if applicable)
- [ ] Relevant knowledge fragments loaded from `tea-index.csv`:
- [ ] `nfr-criteria.md`
- [ ] `ci-burn-in.md`
- [ ] `test-quality.md`
- [ ] `playwright-config.md` (if using Playwright)
---
## NFR Categories and Thresholds
### Performance
- [ ] Response time threshold defined or marked as UNKNOWN
- [ ] Throughput threshold defined or marked as UNKNOWN
- [ ] Resource usage thresholds defined or marked as UNKNOWN
- [ ] Scalability requirements defined or marked as UNKNOWN
### Security
- [ ] Authentication requirements defined or marked as UNKNOWN
- [ ] Authorization requirements defined or marked as UNKNOWN
- [ ] Data protection requirements defined or marked as UNKNOWN
- [ ] Vulnerability management thresholds defined or marked as UNKNOWN
- [ ] Compliance requirements identified (GDPR, HIPAA, PCI-DSS, etc.)
### Reliability
- [ ] Availability (uptime) threshold defined or marked as UNKNOWN
- [ ] Error rate threshold defined or marked as UNKNOWN
- [ ] MTTR (Mean Time To Recovery) threshold defined or marked as UNKNOWN
- [ ] Fault tolerance requirements defined or marked as UNKNOWN
- [ ] Disaster recovery requirements defined (RTO, RPO) or marked as UNKNOWN
### Maintainability
- [ ] Test coverage threshold defined or marked as UNKNOWN
- [ ] Code quality threshold defined or marked as UNKNOWN
- [ ] Technical debt threshold defined or marked as UNKNOWN
- [ ] Documentation completeness threshold defined or marked as UNKNOWN
### Custom NFR Categories (if applicable)
- [ ] Custom NFR category 1: Thresholds defined or marked as UNKNOWN
- [ ] Custom NFR category 2: Thresholds defined or marked as UNKNOWN
- [ ] Custom NFR category 3: Thresholds defined or marked as UNKNOWN
---
## Evidence Gathering
### Performance Evidence
- [ ] Load test results collected (JMeter, k6, Gatling, etc.)
- [ ] Application metrics collected (response times, throughput, resource usage)
- [ ] APM data collected (New Relic, Datadog, Dynatrace, etc.)
- [ ] Lighthouse reports collected (if web app)
- [ ] Playwright performance traces collected (if applicable)
### Security Evidence
- [ ] SAST results collected (SonarQube, Checkmarx, Veracode, etc.)
- [ ] DAST results collected (OWASP ZAP, Burp Suite, etc.)
- [ ] Dependency scanning results collected (Snyk, Dependabot, npm audit)
- [ ] Penetration test reports collected (if available)
- [ ] Security audit logs collected
- [ ] Compliance audit results collected (if applicable)
### Reliability Evidence
- [ ] Uptime monitoring data collected (Pingdom, UptimeRobot, StatusCake)
- [ ] Error logs collected
- [ ] Error rate metrics collected
- [ ] CI burn-in results collected (stability over time)
- [ ] Chaos engineering test results collected (if available)
- [ ] Failover/recovery test results collected (if available)
- [ ] Incident reports and postmortems collected (if applicable)
### Maintainability Evidence
- [ ] Code coverage reports collected (Istanbul, NYC, c8, JaCoCo)
- [ ] Static analysis results collected (ESLint, SonarQube, CodeClimate)
- [ ] Technical debt metrics collected
- [ ] Documentation audit results collected
- [ ] Test review report collected (from test-review workflow, if available)
- [ ] Git metrics collected (code churn, commit frequency, etc.)
---
## NFR Assessment with Deterministic Rules
### Performance Assessment
- [ ] Response time assessed against threshold
- [ ] Throughput assessed against threshold
- [ ] Resource usage assessed against threshold
- [ ] Scalability assessed against requirements
- [ ] Status classified (PASS/CONCERNS/FAIL) with justification
- [ ] Evidence source documented (file path, metric name)
### Security Assessment
- [ ] Authentication strength assessed against requirements
- [ ] Authorization controls assessed against requirements
- [ ] Data protection assessed against requirements
- [ ] Vulnerability management assessed against thresholds
- [ ] Compliance assessed against requirements
- [ ] Status classified (PASS/CONCERNS/FAIL) with justification
- [ ] Evidence source documented (file path, scan result)
### Reliability Assessment
- [ ] Availability (uptime) assessed against threshold
- [ ] Error rate assessed against threshold
- [ ] MTTR assessed against threshold
- [ ] Fault tolerance assessed against requirements
- [ ] Disaster recovery assessed against requirements (RTO, RPO)
- [ ] CI burn-in assessed (stability over time)
- [ ] Status classified (PASS/CONCERNS/FAIL) with justification
- [ ] Evidence source documented (file path, monitoring data)
### Maintainability Assessment
- [ ] Test coverage assessed against threshold
- [ ] Code quality assessed against threshold
- [ ] Technical debt assessed against threshold
- [ ] Documentation completeness assessed against threshold
- [ ] Test quality assessed (from test-review, if available)
- [ ] Status classified (PASS/CONCERNS/FAIL) with justification
- [ ] Evidence source documented (file path, coverage report)
### Custom NFR Assessment (if applicable)
- [ ] Custom NFR 1 assessed against threshold with justification
- [ ] Custom NFR 2 assessed against threshold with justification
- [ ] Custom NFR 3 assessed against threshold with justification
---
## Status Classification Validation
### PASS Criteria Verified
- [ ] Evidence exists for PASS status
- [ ] Evidence meets or exceeds threshold
- [ ] No concerns flagged in evidence
- [ ] Quality is acceptable
### CONCERNS Criteria Verified
- [ ] Threshold is UNKNOWN (documented) OR
- [ ] Evidence is MISSING or INCOMPLETE (documented) OR
- [ ] Evidence is close to threshold (within 10%, documented) OR
- [ ] Evidence shows intermittent issues (documented)
### FAIL Criteria Verified
- [ ] Evidence exists BUT does not meet threshold (documented) OR
- [ ] Critical evidence is MISSING (documented) OR
- [ ] Evidence shows consistent failures (documented) OR
- [ ] Quality is unacceptable (documented)
### No Threshold Guessing
- [ ] All thresholds are either defined or marked as UNKNOWN
- [ ] No thresholds were guessed or inferred
- [ ] All UNKNOWN thresholds result in CONCERNS status
---
## Quick Wins and Recommended Actions
### Quick Wins Identified
- [ ] Low-effort, high-impact improvements identified for CONCERNS/FAIL
- [ ] Configuration changes (no code changes) identified
- [ ] Optimization opportunities identified (caching, indexing, compression)
- [ ] Monitoring additions identified (detect issues before failures)
### Recommended Actions
- [ ] Specific remediation steps provided (not generic advice)
- [ ] Priority assigned (CRITICAL, HIGH, MEDIUM, LOW)
- [ ] Estimated effort provided (hours, days)
- [ ] Owner suggestions provided (dev, ops, security)
### Monitoring Hooks
- [ ] Performance monitoring suggested (APM, synthetic monitoring)
- [ ] Error tracking suggested (Sentry, Rollbar, error logs)
- [ ] Security monitoring suggested (intrusion detection, audit logs)
- [ ] Alerting thresholds suggested (notify before breach)
### Fail-Fast Mechanisms
- [ ] Circuit breakers suggested for reliability
- [ ] Rate limiting suggested for performance
- [ ] Validation gates suggested for security
- [ ] Smoke tests suggested for maintainability
---
## Deliverables Generated
### NFR Assessment Report
- [ ] File created at `{output_folder}/nfr-assessment.md`
- [ ] Template from `nfr-report-template.md` used
- [ ] Executive summary included (overall status, critical issues)
- [ ] Assessment by category included (performance, security, reliability, maintainability)
- [ ] Evidence for each NFR documented
- [ ] Status classifications documented (PASS/CONCERNS/FAIL)
- [ ] Findings summary included (PASS count, CONCERNS count, FAIL count)
- [ ] Quick wins section included
- [ ] Recommended actions section included
- [ ] Evidence gaps checklist included
### Gate YAML Snippet (if enabled)
- [ ] YAML snippet generated
- [ ] Date included
- [ ] Categories status included (performance, security, reliability, maintainability)
- [ ] Overall status included (PASS/CONCERNS/FAIL)
- [ ] Issue counts included (critical, high, medium, concerns)
- [ ] Blockers flag included (true/false)
- [ ] Recommendations included
### Evidence Checklist (if enabled)
- [ ] All NFRs with MISSING or INCOMPLETE evidence listed
- [ ] Owners assigned for evidence collection
- [ ] Suggested evidence sources provided
- [ ] Deadlines set for evidence collection
### Updated Story File (if enabled and requested)
- [ ] "NFR Assessment" section added to story markdown
- [ ] Link to NFR assessment report included
- [ ] Overall status and critical issues included
- [ ] Gate status included
---
## Quality Assurance
### Accuracy Checks
- [ ] All NFR categories assessed (none skipped)
- [ ] All thresholds documented (defined or UNKNOWN)
- [ ] All evidence sources documented (file paths, metric names)
- [ ] Status classifications are deterministic and consistent
- [ ] No false positives (status correctly assigned)
- [ ] No false negatives (all issues identified)
### Completeness Checks
- [ ] All NFR categories covered (performance, security, reliability, maintainability, custom)
- [ ] All evidence sources checked (test results, metrics, logs, CI results)
- [ ] All status types used appropriately (PASS, CONCERNS, FAIL)
- [ ] All NFRs with CONCERNS/FAIL have recommendations
- [ ] All evidence gaps have owners and deadlines
### Actionability Checks
- [ ] Recommendations are specific (not generic)
- [ ] Remediation steps are clear and actionable
- [ ] Priorities are assigned (CRITICAL, HIGH, MEDIUM, LOW)
- [ ] Effort estimates are provided (hours, days)
- [ ] Owners are suggested (dev, ops, security)
---
## Integration with BMad Artifacts
### With tech-spec.md
- [ ] Tech spec loaded for NFR requirements and thresholds
- [ ] Performance targets extracted
- [ ] Security requirements extracted
- [ ] Reliability SLAs extracted
- [ ] Architectural decisions considered
### With test-design.md
- [ ] Test design loaded for NFR test plan
- [ ] Test priorities referenced (P0/P1/P2/P3)
- [ ] Assessment aligned with planned NFR validation
### With PRD.md
- [ ] PRD loaded for product-level NFR context
- [ ] User experience goals considered
- [ ] Unstated requirements checked
- [ ] Product-level SLAs referenced
---
## Quality Gates Validation
### Release Blocker (FAIL)
- [ ] Critical NFR status checked (security, reliability)
- [ ] Performance failures assessed for user impact
- [ ] Release blocker flagged if critical NFR has FAIL status
### PR Blocker (HIGH CONCERNS)
- [ ] High-priority NFR status checked
- [ ] Multiple CONCERNS assessed
- [ ] PR blocker flagged if HIGH priority issues exist
### Warning (CONCERNS)
- [ ] Any NFR with CONCERNS status flagged
- [ ] Missing or incomplete evidence documented
- [ ] Warning issued to address before next release
### Pass (PASS)
- [ ] All NFRs have PASS status
- [ ] No blockers or concerns exist
- [ ] Ready for release confirmed
---
## Non-Prescriptive Validation
- [ ] NFR categories adapted to team needs
- [ ] Thresholds appropriate for project context
- [ ] Assessment criteria customized as needed
- [ ] Teams can extend with custom NFR categories
- [ ] Integration with external tools supported (New Relic, Datadog, SonarQube, JIRA)
---
## Documentation and Communication
- [ ] NFR assessment report is readable and well-formatted
- [ ] Tables render correctly in markdown
- [ ] Code blocks have proper syntax highlighting
- [ ] Links are valid and accessible
- [ ] Recommendations are clear and prioritized
- [ ] Overall status is prominent and unambiguous
- [ ] Executive summary provides quick understanding
---
## Final Validation
- [ ] All prerequisites met
- [ ] All NFR categories assessed with evidence (or gaps documented)
- [ ] No thresholds were guessed (all defined or UNKNOWN)
- [ ] Status classifications are deterministic and justified
- [ ] Quick wins identified for all CONCERNS/FAIL
- [ ] Recommended actions are specific and actionable
- [ ] Evidence gaps documented with owners and deadlines
- [ ] NFR assessment report generated and saved
- [ ] Gate YAML snippet generated (if enabled)
- [ ] Evidence checklist generated (if enabled)
- [ ] Workflow completed successfully
---
## Sign-Off
**NFR Assessment Status:**
- [ ] ✅ PASS - All NFRs meet requirements, ready for release
- [ ] ⚠️ CONCERNS - Some NFRs have concerns, address before next release
- [ ] ❌ FAIL - Critical NFRs not met, BLOCKER for release
**Next Actions:**
- If PASS ✅: Proceed to `*gate` workflow or release
- If CONCERNS ⚠️: Address HIGH/CRITICAL issues, re-run `*nfr-assess`
- If FAIL ❌: Resolve FAIL status NFRs, re-run `*nfr-assess`
**Critical Issues:** {COUNT}
**High Priority Issues:** {COUNT}
**Concerns:** {COUNT}
---
<!-- Powered by BMAD-CORE™ -->

View File

@@ -1,722 +0,0 @@
# Non-Functional Requirements Assessment - Instructions v4.0
**Workflow:** `testarch-nfr`
**Purpose:** Assess non-functional requirements (performance, security, reliability, maintainability) before release with evidence-based validation
**Agent:** Test Architect (TEA)
**Format:** Pure Markdown v4.0 (no XML blocks)
---
## Overview
This workflow performs a comprehensive assessment of non-functional requirements (NFRs) to validate that the implementation meets performance, security, reliability, and maintainability standards before release. It uses evidence-based validation with deterministic PASS/CONCERNS/FAIL rules and provides actionable recommendations for remediation.
**Key Capabilities:**
- Assess multiple NFR categories (performance, security, reliability, maintainability, custom)
- Validate NFRs against defined thresholds from tech specs, PRD, or defaults
- Classify status deterministically (PASS/CONCERNS/FAIL) based on evidence
- Never guess thresholds - mark as CONCERNS if unknown
- Generate gate-ready YAML snippets for CI/CD integration
- Provide quick wins and recommended actions for remediation
- Create evidence checklists for gaps
---
## Prerequisites
**Required:**
- Implementation deployed locally or accessible for evaluation
- Evidence sources available (test results, metrics, logs, CI results)
**Recommended:**
- NFR requirements defined in tech-spec.md, PRD.md, or story
- Test results from performance, security, reliability tests
- Application metrics (response times, error rates, throughput)
- CI/CD pipeline results for burn-in validation
**Halt Conditions:**
- If NFR targets are undefined and cannot be obtained, halt and request definition
- If implementation is not accessible for evaluation, halt and request deployment
---
## Workflow Steps
### Step 1: Load Context and Knowledge Base
**Actions:**
1. Load relevant knowledge fragments from `{project-root}/_bmad/bmm/testarch/tea-index.csv`:
- `nfr-criteria.md` - Non-functional requirements criteria and thresholds (security, performance, reliability, maintainability with code examples, 658 lines, 4 examples)
- `ci-burn-in.md` - CI/CD burn-in patterns for reliability validation (10-iteration detection, sharding, selective execution, 678 lines, 4 examples)
- `test-quality.md` - Test quality expectations for maintainability (deterministic, isolated, explicit assertions, length/time limits, 658 lines, 5 examples)
- `playwright-config.md` - Performance configuration patterns: parallelization, timeout standards, artifact output (722 lines, 5 examples)
- `error-handling.md` - Reliability validation patterns: scoped exceptions, retry validation, telemetry logging, graceful degradation (736 lines, 4 examples)
2. Read story file (if provided):
- Extract NFR requirements
- Identify specific thresholds or SLAs
- Note any custom NFR categories
3. Read related BMad artifacts (if available):
- `tech-spec.md` - Technical NFR requirements and targets
- `PRD.md` - Product-level NFR context (user expectations)
- `test-design.md` - NFR test plan and priorities
**Output:** Complete understanding of NFR targets, evidence sources, and validation criteria
---
### Step 2: Identify NFR Categories and Thresholds
**Actions:**
1. Determine which NFR categories to assess (default: performance, security, reliability, maintainability):
- **Performance**: Response time, throughput, resource usage
- **Security**: Authentication, authorization, data protection, vulnerability scanning
- **Reliability**: Error handling, recovery, availability, fault tolerance
- **Maintainability**: Code quality, test coverage, documentation, technical debt
2. Add custom NFR categories if specified (e.g., accessibility, internationalization, compliance)
3. Gather thresholds for each NFR:
- From tech-spec.md (primary source)
- From PRD.md (product-level SLAs)
- From story file (feature-specific requirements)
- From workflow variables (default thresholds)
- Mark thresholds as UNKNOWN if not defined
4. Never guess thresholds - if a threshold is unknown, mark the NFR as CONCERNS
**Output:** Complete list of NFRs to assess with defined (or UNKNOWN) thresholds
---
### Step 3: Gather Evidence
**Actions:**
1. For each NFR category, discover evidence sources:
**Performance Evidence:**
- Load test results (JMeter, k6, Lighthouse)
- Application metrics (response times, throughput, resource usage)
- Performance monitoring data (New Relic, Datadog, APM)
- Playwright performance traces (if applicable)
**Security Evidence:**
- Security scan results (SAST, DAST, dependency scanning)
- Authentication/authorization test results
- Penetration test reports
- Vulnerability assessment reports
- Compliance audit results
**Reliability Evidence:**
- Error logs and error rates
- Uptime monitoring data
- Chaos engineering test results
- Failover/recovery test results
- CI burn-in results (stability over time)
**Maintainability Evidence:**
- Code coverage reports (Istanbul, NYC, c8)
- Static analysis results (ESLint, SonarQube)
- Technical debt metrics
- Documentation completeness
- Test quality assessment (from test-review workflow)
2. Read relevant files from evidence directories:
- `{test_results_dir}` for test execution results
- `{metrics_dir}` for application metrics
- `{logs_dir}` for application logs
- CI/CD pipeline results (if `include_ci_results` is true)
3. Mark NFRs without evidence as "NO EVIDENCE" - never infer or assume
**Output:** Comprehensive evidence inventory for each NFR
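As an illustration of the "never infer" rule (directory layout and category names are assumptions), the evidence inventory can record missing sources explicitly instead of guessing:
```typescript
import { existsSync, readdirSync } from 'node:fs';

type EvidenceStatus = 'FOUND' | 'NO EVIDENCE';

interface EvidenceEntry {
  category: 'performance' | 'security' | 'reliability' | 'maintainability';
  source: string; // directory or file expected to hold the evidence
  status: EvidenceStatus;
  files: string[];
}

// Record only what actually exists; anything absent is marked NO EVIDENCE, never inferred.
export function inventoryEvidence(category: EvidenceEntry['category'], dir: string): EvidenceEntry {
  const files = existsSync(dir) ? readdirSync(dir) : [];
  return { category, source: dir, status: files.length > 0 ? 'FOUND' : 'NO EVIDENCE', files };
}
```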
---
### Step 4: Assess NFRs with Deterministic Rules
**Actions:**
1. For each NFR, apply deterministic PASS/CONCERNS/FAIL rules:
**PASS Criteria:**
- Evidence exists AND meets defined threshold
- No concerns flagged in evidence
- Example: Response time is 350ms (threshold: 500ms) → PASS
**CONCERNS Criteria:**
- Threshold is UNKNOWN (not defined)
- Evidence is MISSING or INCOMPLETE
- Evidence is close to threshold (within 10%)
- Evidence shows intermittent issues
- Example: Response time is 480ms (threshold: 500ms, 96% of threshold) → CONCERNS
**FAIL Criteria:**
- Evidence exists BUT does not meet threshold
- Critical evidence is MISSING
- Evidence shows consistent failures
- Example: Response time is 750ms (threshold: 500ms) → FAIL
2. Document findings for each NFR:
- Status (PASS/CONCERNS/FAIL)
- Evidence source (file path, test name, metric name)
- Actual value vs threshold
- Justification for status classification
3. Classify severity based on category:
- **CRITICAL**: Security failures, reliability failures (affect users immediately)
- **HIGH**: Performance failures, maintainability failures (affect users soon)
- **MEDIUM**: Concerns without failures (may affect users eventually)
- **LOW**: Missing evidence for non-critical NFRs
**Output:** Complete NFR assessment with deterministic status classifications
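A sketch of how these rules can be applied mechanically, assuming a "lower is better" metric such as p95 response time (adapt the comparison for metrics where higher is better):
```typescript
type NfrStatus = 'PASS' | 'CONCERNS' | 'FAIL';

interface NfrCheck {
  threshold?: number; // undefined means UNKNOWN - never guessed
  actual?: number; // undefined means evidence is MISSING
}

// Deterministic classification for "lower is better" metrics (e.g. p95 response time in ms).
export function classifyNfr({ threshold, actual }: NfrCheck): NfrStatus {
  if (threshold === undefined) return 'CONCERNS'; // threshold UNKNOWN
  if (actual === undefined) return 'CONCERNS'; // evidence MISSING
  if (actual > threshold) return 'FAIL'; // does not meet threshold
  if (actual >= threshold * 0.9) return 'CONCERNS'; // within 10% of threshold
  return 'PASS';
}

// From the examples above: 350ms vs 500ms -> PASS, 480ms -> CONCERNS, 750ms -> FAIL.
```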
---
### Step 5: Identify Quick Wins and Recommended Actions
**Actions:**
1. For each NFR with CONCERNS or FAIL status, identify quick wins:
- Low-effort, high-impact improvements
- Configuration changes (no code changes needed)
- Optimization opportunities (caching, indexing, compression)
- Monitoring additions (detect issues before they become failures)
2. Provide recommended actions for each issue:
- Specific steps to remediate (not generic advice)
- Priority (CRITICAL, HIGH, MEDIUM, LOW)
- Estimated effort (hours, days)
- Owner suggestion (dev, ops, security)
3. Suggest monitoring hooks for gaps:
- Add performance monitoring (APM, synthetic monitoring)
- Add error tracking (Sentry, Rollbar, error logs)
- Add security monitoring (intrusion detection, audit logs)
- Add alerting thresholds (notify before thresholds are breached)
4. Suggest fail-fast mechanisms:
- Add circuit breakers for reliability
- Add rate limiting for performance
- Add validation gates for security
- Add smoke tests for maintainability
**Output:** Actionable remediation plan with prioritized recommendations
---
### Step 6: Generate Deliverables
**Actions:**
1. Create NFR assessment markdown file:
- Use template from `nfr-report-template.md`
- Include executive summary (overall status, critical issues)
- Add NFR-by-NFR assessment (status, evidence, thresholds)
- Add findings summary (PASS count, CONCERNS count, FAIL count)
- Add quick wins section
- Add recommended actions section
- Add evidence gaps checklist
- Save to `{output_folder}/nfr-assessment.md`
2. Generate gate YAML snippet (if enabled):
```yaml
nfr_assessment:
date: '2025-10-14'
categories:
performance: 'PASS'
security: 'CONCERNS'
reliability: 'PASS'
maintainability: 'PASS'
overall_status: 'CONCERNS'
critical_issues: 0
high_priority_issues: 1
concerns: 2
blockers: false
```
3. Generate evidence checklist (if enabled):
- List all NFRs with MISSING or INCOMPLETE evidence
- Assign owners for evidence collection
- Suggest evidence sources (tests, metrics, logs)
- Set deadlines for evidence collection
4. Update story file (if enabled and requested):
- Add "NFR Assessment" section to story markdown
- Link to NFR assessment report
- Include overall status and critical issues
- Add gate status
**Output:** Complete NFR assessment documentation ready for review and CI/CD integration
---
## Non-Prescriptive Approach
**Minimal Examples:** This workflow provides principles and patterns, not rigid templates. Teams should adapt NFR categories, thresholds, and assessment criteria to their needs.
**Key Patterns to Follow:**
- Use evidence-based validation (no guessing or inference)
- Apply deterministic rules (consistent PASS/CONCERNS/FAIL classification)
- Never guess thresholds (mark as CONCERNS if unknown)
- Provide actionable recommendations (specific steps, not generic advice)
- Generate gate-ready artifacts (YAML snippets for CI/CD)
**Extend as Needed:**
- Add custom NFR categories (accessibility, internationalization, compliance)
- Integrate with external tools (New Relic, Datadog, SonarQube, JIRA)
- Add custom thresholds and rules
- Link to external assessment systems
---
## NFR Categories and Criteria
### Performance
**Criteria:**
- Response time (p50, p95, p99 percentiles)
- Throughput (requests per second, transactions per second)
- Resource usage (CPU, memory, disk, network)
- Scalability (horizontal, vertical)
**Thresholds (Default):**
- Response time p95: 500ms
- Throughput: 100 RPS
- CPU usage: < 70% average
- Memory usage: < 80% max
**Evidence Sources:**
- Load test results (JMeter, k6, Gatling)
- APM data (New Relic, Datadog, Dynatrace)
- Lighthouse reports (for web apps)
- Playwright performance traces
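For reference, a percentile such as p95 is simply the value below which 95% of samples fall; a quick nearest-rank sketch (load-test tools report this directly, so this is purely illustrative):
```typescript
// Nearest-rank percentile: sort samples, take the value at ceil(p/100 * n), 1-indexed.
export function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// e.g. percentile(responseTimesMs, 95) yields the p95 latency to compare against the 500ms default threshold.
```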
---
### Security
**Criteria:**
- Authentication (login security, session management)
- Authorization (access control, permissions)
- Data protection (encryption, PII handling)
- Vulnerability management (SAST, DAST, dependency scanning)
- Compliance (GDPR, HIPAA, PCI-DSS)
**Thresholds (Default):**
- Security score: >= 85/100
- Critical vulnerabilities: 0
- High vulnerabilities: < 3
- Authentication strength: MFA enabled
**Evidence Sources:**
- SAST results (SonarQube, Checkmarx, Veracode)
- DAST results (OWASP ZAP, Burp Suite)
- Dependency scanning (Snyk, Dependabot, npm audit)
- Penetration test reports
- Security audit logs
---
### Reliability
**Criteria:**
- Availability (uptime percentage)
- Error handling (graceful degradation, error recovery)
- Fault tolerance (redundancy, failover)
- Disaster recovery (backup, restore, RTO/RPO)
- Stability (CI burn-in, chaos engineering)
**Thresholds (Default):**
- Uptime: >= 99.9% (three nines)
- Error rate: < 0.1% (1 in 1000 requests)
- MTTR (Mean Time To Recovery): < 15 minutes
- CI burn-in: 100 consecutive successful runs
**Evidence Sources:**
- Uptime monitoring (Pingdom, UptimeRobot, StatusCake)
- Error logs and error rates
- CI burn-in results (see `ci-burn-in.md`)
- Chaos engineering test results (Chaos Monkey, Gremlin)
- Incident reports and postmortems
---
### Maintainability
**Criteria:**
- Code quality (complexity, duplication, code smells)
- Test coverage (unit, integration, E2E)
- Documentation (code comments, README, architecture docs)
- Technical debt (debt ratio, code churn)
- Test quality (from test-review workflow)
**Thresholds (Default):**
- Test coverage: >= 80%
- Code quality score: >= 85/100
- Technical debt ratio: < 5%
- Documentation completeness: >= 90%
**Evidence Sources:**
- Coverage reports (Istanbul, NYC, c8, JaCoCo)
- Static analysis (ESLint, SonarQube, CodeClimate)
- Documentation audit (manual or automated)
- Test review report (from test-review workflow)
- Git metrics (code churn, commit frequency)
---
## Deterministic Assessment Rules
### PASS Rules
- Evidence exists
- Evidence meets or exceeds threshold
- No concerns flagged
- Quality is acceptable
**Example:**
```markdown
NFR: Response Time p95
Threshold: 500ms
Evidence: Load test result shows 350ms p95
Status: PASS ✅
```
---
### CONCERNS Rules
- Threshold is UNKNOWN
- Evidence is MISSING or INCOMPLETE
- Evidence is close to threshold (within 10%)
- Evidence shows intermittent issues
- Quality is marginal
**Example:**
```markdown
NFR: Response Time p95
Threshold: 500ms
Evidence: Load test result shows 480ms p95 (96% of threshold)
Status: CONCERNS ⚠️
Recommendation: Optimize before production - very close to threshold
```
---
### FAIL Rules
- Evidence exists BUT does not meet threshold
- Critical evidence is MISSING
- Evidence shows consistent failures
- Quality is unacceptable
**Example:**
```markdown
NFR: Response Time p95
Threshold: 500ms
Evidence: Load test result shows 750ms p95 (150% of threshold)
Status: FAIL ❌
Recommendation: BLOCKER - optimize performance before release
```
---
## Integration with BMad Artifacts
### With tech-spec.md
- Primary source for NFR requirements and thresholds
- Load performance targets, security requirements, reliability SLAs
- Use architectural decisions to understand NFR trade-offs
### With test-design.md
- Understand NFR test plan and priorities
- Reference test priorities (P0/P1/P2/P3) for severity classification
- Align assessment with planned NFR validation
### With PRD.md
- Understand product-level NFR expectations
- Verify NFRs align with user experience goals
- Check for unstated NFR requirements (implied by product goals)
---
## Quality Gates
### Release Blocker (FAIL)
- Critical NFR has FAIL status (security, reliability)
- Performance failure affects user experience severely
- Do not release until FAIL is resolved
### PR Blocker (HIGH CONCERNS)
- High-priority NFR has FAIL status
- Multiple CONCERNS exist
- Block PR merge until addressed
### Warning (CONCERNS)
- Any NFR has CONCERNS status
- Evidence is missing or incomplete
- Address before next release
### Pass (PASS)
- All NFRs have PASS status
- No blockers or concerns
- Ready for release
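A sketch of how these gate rules can roll category statuses up to a single decision (treating security and reliability as the critical categories per the rules above; everything else is an assumption):
```typescript
type NfrStatus = 'PASS' | 'CONCERNS' | 'FAIL';

interface GateDecision {
  overall_status: NfrStatus;
  blockers: boolean;
}

const CRITICAL_CATEGORIES = ['security', 'reliability'];

export function decideGate(categories: Record<string, NfrStatus>): GateDecision {
  const entries = Object.entries(categories);
  // Release blocker: any critical category with FAIL status.
  const blockers = entries.some(([name, status]) => status === 'FAIL' && CRITICAL_CATEGORIES.includes(name));
  const anyFail = entries.some(([, status]) => status === 'FAIL');
  const anyConcerns = entries.some(([, status]) => status === 'CONCERNS');

  const overall_status: NfrStatus = anyFail ? 'FAIL' : anyConcerns ? 'CONCERNS' : 'PASS';
  return { overall_status, blockers };
}
```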
---
## Example NFR Assessment
````markdown
# NFR Assessment - Story 1.3
**Feature:** User Authentication
**Date:** 2025-10-14
**Overall Status:** CONCERNS ⚠️ (1 HIGH issue)
## Executive Summary
**Assessment:** 3 PASS, 1 CONCERNS, 0 FAIL
**Blockers:** None
**High Priority Issues:** 1 (Security - MFA not enforced)
**Recommendation:** Address security concern before release
## Performance Assessment
### Response Time (p95)
- **Status:** PASS ✅
- **Threshold:** 500ms
- **Actual:** 320ms (64% of threshold)
- **Evidence:** Load test results (test-results/load-2025-10-14.json)
- **Findings:** Response time well below threshold across all percentiles
### Throughput
- **Status:** PASS ✅
- **Threshold:** 100 RPS
- **Actual:** 250 RPS (250% of threshold)
- **Evidence:** Load test results (test-results/load-2025-10-14.json)
- **Findings:** System handles 2.5x target load without degradation
## Security Assessment
### Authentication Strength
- **Status:** CONCERNS ⚠️
- **Threshold:** MFA enabled for all users
- **Actual:** MFA optional (not enforced)
- **Evidence:** Security audit (security-audit-2025-10-14.md)
- **Findings:** MFA is implemented but not enforced by default
- **Recommendation:** HIGH - Enforce MFA for all new accounts, provide migration path for existing users
### Data Protection
- **Status:** PASS ✅
- **Threshold:** PII encrypted at rest and in transit
- **Actual:** AES-256 at rest, TLS 1.3 in transit
- **Evidence:** Security scan (security-scan-2025-10-14.json)
- **Findings:** All PII properly encrypted
## Reliability Assessment
### Uptime
- **Status:** PASS ✅
- **Threshold:** 99.9% (three nines)
- **Actual:** 99.95% over 30 days
- **Evidence:** Uptime monitoring (uptime-report-2025-10-14.csv)
- **Findings:** Exceeds target with margin
### Error Rate
- **Status:** PASS ✅
- **Threshold:** < 0.1% (1 in 1000)
- **Actual:** 0.05% (1 in 2000)
- **Evidence:** Error logs (logs/errors-2025-10.log)
- **Findings:** Error rate well below threshold
## Maintainability Assessment
### Test Coverage
- **Status:** PASS ✅
- **Threshold:** >= 80%
- **Actual:** 87%
- **Evidence:** Coverage report (coverage/lcov-report/index.html)
- **Findings:** Coverage exceeds threshold with good distribution
### Code Quality
- **Status:** PASS ✅
- **Threshold:** >= 85/100
- **Actual:** 92/100
- **Evidence:** SonarQube analysis (sonarqube-report-2025-10-14.pdf)
- **Findings:** High code quality score with low technical debt
## Quick Wins
1. **Enforce MFA (Security)** - HIGH - 4 hours
- Add configuration flag to enforce MFA for new accounts
- No code changes needed, only config adjustment
## Recommended Actions
### Immediate (Before Release)
1. **Enforce MFA for all new accounts** - HIGH - 4 hours - Security Team
- Add `ENFORCE_MFA=true` to production config
- Update user onboarding flow to require MFA setup
- Test MFA enforcement in staging environment
### Short-term (Next Sprint)
1. **Migrate existing users to MFA** - MEDIUM - 3 days - Product + Engineering
- Design migration UX (prompt, incentives, deadline)
- Implement migration flow with grace period
- Communicate migration to existing users
## Evidence Gaps
- [ ] Chaos engineering test results (reliability)
- Owner: DevOps Team
- Deadline: 2025-10-21
- Suggested evidence: Run chaos monkey tests in staging
- [ ] Penetration test report (security)
- Owner: Security Team
- Deadline: 2025-10-28
- Suggested evidence: Schedule third-party pentest
## Gate YAML Snippet
```yaml
nfr_assessment:
date: '2025-10-14'
story_id: '1.3'
categories:
performance: 'PASS'
security: 'CONCERNS'
reliability: 'PASS'
maintainability: 'PASS'
overall_status: 'CONCERNS'
critical_issues: 0
high_priority_issues: 1
medium_priority_issues: 0
concerns: 1
blockers: false
recommendations:
- 'Enforce MFA for all new accounts (HIGH - 4 hours)'
evidence_gaps: 2
```
## Recommendations Summary
- **Release Blocker:** None ✅
- **High Priority:** 1 (Enforce MFA before release)
- **Medium Priority:** 1 (Migrate existing users to MFA)
- **Next Steps:** Address HIGH priority item, then proceed to gate workflow
````
---
## Validation Checklist
Before completing this workflow, verify:
- ✅ All NFR categories assessed (performance, security, reliability, maintainability, custom)
- ✅ Thresholds defined or marked as UNKNOWN
- ✅ Evidence gathered for each NFR (or marked as MISSING)
- ✅ Status classified deterministically (PASS/CONCERNS/FAIL)
- ✅ No thresholds were guessed (marked as CONCERNS if unknown)
- ✅ Quick wins identified for CONCERNS/FAIL
- ✅ Recommended actions are specific and actionable
- ✅ Evidence gaps documented with owners and deadlines
- ✅ NFR assessment report generated and saved
- ✅ Gate YAML snippet generated (if enabled)
- ✅ Evidence checklist generated (if enabled)
---
## Notes
- **Never Guess Thresholds:** If a threshold is unknown, mark as CONCERNS and recommend defining it
- **Evidence-Based:** Every assessment must be backed by evidence (tests, metrics, logs, CI results)
- **Deterministic Rules:** Use consistent PASS/CONCERNS/FAIL classification based on evidence
- **Actionable Recommendations:** Provide specific steps, not generic advice
- **Gate Integration:** Generate YAML snippets that can be consumed by CI/CD pipelines
---
## Troubleshooting
### "NFR thresholds not defined"
- Check tech-spec.md for NFR requirements
- Check PRD.md for product-level SLAs
- Check story file for feature-specific requirements
- If thresholds truly unknown, mark as CONCERNS and recommend defining them
### "No evidence found"
- Check evidence directories (test-results, metrics, logs)
- Check CI/CD pipeline for test results
- If evidence truly missing, mark NFR as "NO EVIDENCE" and recommend generating it
### "CONCERNS status but no threshold exceeded"
- CONCERNS is correct when threshold is UNKNOWN or evidence is MISSING/INCOMPLETE
- CONCERNS is also correct when evidence is close to threshold (within 10%)
- Document why CONCERNS was assigned
### "FAIL status blocks release"
- This is intentional - FAIL means critical NFR not met
- Recommend remediation actions with specific steps
- Re-run assessment after remediation
---
## Related Workflows
- **testarch-test-design** - Define NFR requirements and test plan
- **testarch-framework** - Set up performance/security testing frameworks
- **testarch-ci** - Configure CI/CD for NFR validation
- **testarch-gate** - Use NFR assessment as input for quality gate decisions
- **testarch-test-review** - Review test quality (maintainability NFR)
---
<!-- Powered by BMAD-CORE™ -->

View File

@@ -1,445 +0,0 @@
# NFR Assessment - {FEATURE_NAME}
**Date:** {DATE}
**Story:** {STORY_ID} (if applicable)
**Overall Status:** {OVERALL_STATUS} {STATUS_ICON}
---
Note: This assessment summarizes existing evidence; it does not run tests or CI workflows.
## Executive Summary
**Assessment:** {PASS_COUNT} PASS, {CONCERNS_COUNT} CONCERNS, {FAIL_COUNT} FAIL
**Blockers:** {BLOCKER_COUNT} {BLOCKER_DESCRIPTION}
**High Priority Issues:** {HIGH_PRIORITY_COUNT} {HIGH_PRIORITY_DESCRIPTION}
**Recommendation:** {OVERALL_RECOMMENDATION}
---
## Performance Assessment
### Response Time (p95)
- **Status:** {STATUS} {STATUS_ICON}
- **Threshold:** {THRESHOLD_VALUE}
- **Actual:** {ACTUAL_VALUE}
- **Evidence:** {EVIDENCE_SOURCE}
- **Findings:** {FINDINGS_DESCRIPTION}
### Throughput
- **Status:** {STATUS} {STATUS_ICON}
- **Threshold:** {THRESHOLD_VALUE}
- **Actual:** {ACTUAL_VALUE}
- **Evidence:** {EVIDENCE_SOURCE}
- **Findings:** {FINDINGS_DESCRIPTION}
### Resource Usage
- **CPU Usage**
- **Status:** {STATUS} {STATUS_ICON}
- **Threshold:** {THRESHOLD_VALUE}
- **Actual:** {ACTUAL_VALUE}
- **Evidence:** {EVIDENCE_SOURCE}
- **Memory Usage**
- **Status:** {STATUS} {STATUS_ICON}
- **Threshold:** {THRESHOLD_VALUE}
- **Actual:** {ACTUAL_VALUE}
- **Evidence:** {EVIDENCE_SOURCE}
### Scalability
- **Status:** {STATUS} {STATUS_ICON}
- **Threshold:** {THRESHOLD_DESCRIPTION}
- **Actual:** {ACTUAL_DESCRIPTION}
- **Evidence:** {EVIDENCE_SOURCE}
- **Findings:** {FINDINGS_DESCRIPTION}
---
## Security Assessment
### Authentication Strength
- **Status:** {STATUS} {STATUS_ICON}
- **Threshold:** {THRESHOLD_DESCRIPTION}
- **Actual:** {ACTUAL_DESCRIPTION}
- **Evidence:** {EVIDENCE_SOURCE}
- **Findings:** {FINDINGS_DESCRIPTION}
- **Recommendation:** {RECOMMENDATION} (if CONCERNS or FAIL)
### Authorization Controls
- **Status:** {STATUS} {STATUS_ICON}
- **Threshold:** {THRESHOLD_DESCRIPTION}
- **Actual:** {ACTUAL_DESCRIPTION}
- **Evidence:** {EVIDENCE_SOURCE}
- **Findings:** {FINDINGS_DESCRIPTION}
### Data Protection
- **Status:** {STATUS} {STATUS_ICON}
- **Threshold:** {THRESHOLD_DESCRIPTION}
- **Actual:** {ACTUAL_DESCRIPTION}
- **Evidence:** {EVIDENCE_SOURCE}
- **Findings:** {FINDINGS_DESCRIPTION}
### Vulnerability Management
- **Status:** {STATUS} {STATUS_ICON}
- **Threshold:** {THRESHOLD_DESCRIPTION} (e.g., "0 critical, <3 high vulnerabilities")
- **Actual:** {ACTUAL_DESCRIPTION} (e.g., "0 critical, 1 high, 5 medium vulnerabilities")
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "Snyk scan results - scan-2025-10-14.json")
- **Findings:** {FINDINGS_DESCRIPTION}
### Compliance (if applicable)
- **Status:** {STATUS} {STATUS_ICON}
- **Standards:** {COMPLIANCE_STANDARDS} (e.g., "GDPR, HIPAA, PCI-DSS")
- **Actual:** {ACTUAL_COMPLIANCE_STATUS}
- **Evidence:** {EVIDENCE_SOURCE}
- **Findings:** {FINDINGS_DESCRIPTION}
---
## Reliability Assessment
### Availability (Uptime)
- **Status:** {STATUS} {STATUS_ICON}
- **Threshold:** {THRESHOLD_VALUE} (e.g., "99.9%")
- **Actual:** {ACTUAL_VALUE} (e.g., "99.95%")
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "Uptime monitoring - uptime-report-2025-10-14.csv")
- **Findings:** {FINDINGS_DESCRIPTION}
### Error Rate
- **Status:** {STATUS} {STATUS_ICON}
- **Threshold:** {THRESHOLD_VALUE} (e.g., "<0.1%")
- **Actual:** {ACTUAL_VALUE} (e.g., "0.05%")
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "Error logs - logs/errors-2025-10.log")
- **Findings:** {FINDINGS_DESCRIPTION}
### MTTR (Mean Time To Recovery)
- **Status:** {STATUS} {STATUS_ICON}
- **Threshold:** {THRESHOLD_VALUE} (e.g., "<15 minutes")
- **Actual:** {ACTUAL_VALUE} (e.g., "12 minutes")
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "Incident reports - incidents/")
- **Findings:** {FINDINGS_DESCRIPTION}
### Fault Tolerance
- **Status:** {STATUS} {STATUS_ICON}
- **Threshold:** {THRESHOLD_DESCRIPTION}
- **Actual:** {ACTUAL_DESCRIPTION}
- **Evidence:** {EVIDENCE_SOURCE}
- **Findings:** {FINDINGS_DESCRIPTION}
### CI Burn-In (Stability)
- **Status:** {STATUS} {STATUS_ICON}
- **Threshold:** {THRESHOLD_VALUE} (e.g., "100 consecutive successful runs")
- **Actual:** {ACTUAL_VALUE} (e.g., "150 consecutive successful runs")
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "CI burn-in results - ci-burn-in-2025-10-14.log")
- **Findings:** {FINDINGS_DESCRIPTION}
### Disaster Recovery (if applicable)
- **RTO (Recovery Time Objective)**
- **Status:** {STATUS} {STATUS_ICON}
- **Threshold:** {THRESHOLD_VALUE}
- **Actual:** {ACTUAL_VALUE}
- **Evidence:** {EVIDENCE_SOURCE}
- **RPO (Recovery Point Objective)**
- **Status:** {STATUS} {STATUS_ICON}
- **Threshold:** {THRESHOLD_VALUE}
- **Actual:** {ACTUAL_VALUE}
- **Evidence:** {EVIDENCE_SOURCE}
---
## Maintainability Assessment
### Test Coverage
- **Status:** {STATUS} {STATUS_ICON}
- **Threshold:** {THRESHOLD_VALUE} (e.g., ">=80%")
- **Actual:** {ACTUAL_VALUE} (e.g., "87%")
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "Coverage report - coverage/lcov-report/index.html")
- **Findings:** {FINDINGS_DESCRIPTION}
### Code Quality
- **Status:** {STATUS} {STATUS_ICON}
- **Threshold:** {THRESHOLD_VALUE} (e.g., ">=85/100")
- **Actual:** {ACTUAL_VALUE} (e.g., "92/100")
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "SonarQube analysis - sonarqube-report-2025-10-14.pdf")
- **Findings:** {FINDINGS_DESCRIPTION}
### Technical Debt
- **Status:** {STATUS} {STATUS_ICON}
- **Threshold:** {THRESHOLD_VALUE} (e.g., "<5% debt ratio")
- **Actual:** {ACTUAL_VALUE} (e.g., "3.2% debt ratio")
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "CodeClimate analysis - codeclimate-2025-10-14.json")
- **Findings:** {FINDINGS_DESCRIPTION}
### Documentation Completeness
- **Status:** {STATUS} {STATUS_ICON}
- **Threshold:** {THRESHOLD_VALUE} (e.g., ">=90%")
- **Actual:** {ACTUAL_VALUE} (e.g., "95%")
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "Documentation audit - docs-audit-2025-10-14.md")
- **Findings:** {FINDINGS_DESCRIPTION}
### Test Quality (from test-review, if available)
- **Status:** {STATUS} {STATUS_ICON}
- **Threshold:** {THRESHOLD_DESCRIPTION}
- **Actual:** {ACTUAL_DESCRIPTION}
- **Evidence:** {EVIDENCE_SOURCE} (e.g., "Test review report - test-review-2025-10-14.md")
- **Findings:** {FINDINGS_DESCRIPTION}
---
## Custom NFR Assessments (if applicable)
### {CUSTOM_NFR_NAME_1}
- **Status:** {STATUS} {STATUS_ICON}
- **Threshold:** {THRESHOLD_DESCRIPTION}
- **Actual:** {ACTUAL_DESCRIPTION}
- **Evidence:** {EVIDENCE_SOURCE}
- **Findings:** {FINDINGS_DESCRIPTION}
### {CUSTOM_NFR_NAME_2}
- **Status:** {STATUS} {STATUS_ICON}
- **Threshold:** {THRESHOLD_DESCRIPTION}
- **Actual:** {ACTUAL_DESCRIPTION}
- **Evidence:** {EVIDENCE_SOURCE}
- **Findings:** {FINDINGS_DESCRIPTION}
---
## Quick Wins
{QUICK_WIN_COUNT} quick wins identified for immediate implementation:
1. **{QUICK_WIN_TITLE_1}** ({NFR_CATEGORY}) - {PRIORITY} - {ESTIMATED_EFFORT}
- {QUICK_WIN_DESCRIPTION}
- No code changes needed / Minimal code changes
2. **{QUICK_WIN_TITLE_2}** ({NFR_CATEGORY}) - {PRIORITY} - {ESTIMATED_EFFORT}
- {QUICK_WIN_DESCRIPTION}
---
## Recommended Actions
### Immediate (Before Release) - CRITICAL/HIGH Priority
1. **{ACTION_TITLE_1}** - {PRIORITY} - {ESTIMATED_EFFORT} - {OWNER}
- {ACTION_DESCRIPTION}
- {SPECIFIC_STEPS}
- {VALIDATION_CRITERIA}
2. **{ACTION_TITLE_2}** - {PRIORITY} - {ESTIMATED_EFFORT} - {OWNER}
- {ACTION_DESCRIPTION}
- {SPECIFIC_STEPS}
- {VALIDATION_CRITERIA}
### Short-term (Next Sprint) - MEDIUM Priority
1. **{ACTION_TITLE_3}** - {PRIORITY} - {ESTIMATED_EFFORT} - {OWNER}
- {ACTION_DESCRIPTION}
2. **{ACTION_TITLE_4}** - {PRIORITY} - {ESTIMATED_EFFORT} - {OWNER}
- {ACTION_DESCRIPTION}
### Long-term (Backlog) - LOW Priority
1. **{ACTION_TITLE_5}** - {PRIORITY} - {ESTIMATED_EFFORT} - {OWNER}
- {ACTION_DESCRIPTION}
---
## Monitoring Hooks
{MONITORING_HOOK_COUNT} monitoring hooks recommended to detect issues before failures:
### Performance Monitoring
- [ ] {MONITORING_TOOL_1} - {MONITORING_DESCRIPTION}
- **Owner:** {OWNER}
- **Deadline:** {DEADLINE}
- [ ] {MONITORING_TOOL_2} - {MONITORING_DESCRIPTION}
- **Owner:** {OWNER}
- **Deadline:** {DEADLINE}
### Security Monitoring
- [ ] {MONITORING_TOOL_3} - {MONITORING_DESCRIPTION}
- **Owner:** {OWNER}
- **Deadline:** {DEADLINE}
### Reliability Monitoring
- [ ] {MONITORING_TOOL_4} - {MONITORING_DESCRIPTION}
- **Owner:** {OWNER}
- **Deadline:** {DEADLINE}
### Alerting Thresholds
- [ ] {ALERT_DESCRIPTION} - Notify when {THRESHOLD_CONDITION}
- **Owner:** {OWNER}
- **Deadline:** {DEADLINE}
---
## Fail-Fast Mechanisms
{FAIL_FAST_COUNT} fail-fast mechanisms recommended to prevent failures:
### Circuit Breakers (Reliability)
- [ ] {CIRCUIT_BREAKER_DESCRIPTION}
- **Owner:** {OWNER}
- **Estimated Effort:** {EFFORT}
### Rate Limiting (Performance)
- [ ] {RATE_LIMITING_DESCRIPTION}
- **Owner:** {OWNER}
- **Estimated Effort:** {EFFORT}
### Validation Gates (Security)
- [ ] {VALIDATION_GATE_DESCRIPTION}
- **Owner:** {OWNER}
- **Estimated Effort:** {EFFORT}
### Smoke Tests (Maintainability)
- [ ] {SMOKE_TEST_DESCRIPTION}
- **Owner:** {OWNER}
- **Estimated Effort:** {EFFORT}
---
## Evidence Gaps
{EVIDENCE_GAP_COUNT} evidence gaps identified - action required:
- [ ] **{NFR_NAME_1}** ({NFR_CATEGORY})
- **Owner:** {OWNER}
- **Deadline:** {DEADLINE}
- **Suggested Evidence:** {SUGGESTED_EVIDENCE_SOURCE}
- **Impact:** {IMPACT_DESCRIPTION}
- [ ] **{NFR_NAME_2}** ({NFR_CATEGORY})
- **Owner:** {OWNER}
- **Deadline:** {DEADLINE}
- **Suggested Evidence:** {SUGGESTED_EVIDENCE_SOURCE}
- **Impact:** {IMPACT_DESCRIPTION}
---
## Findings Summary
| Category | PASS | CONCERNS | FAIL | Overall Status |
| --------------- | ---------------- | -------------------- | ---------------- | ----------------------------------- |
| Performance | {P_PASS_COUNT} | {P_CONCERNS_COUNT} | {P_FAIL_COUNT} | {P_STATUS} {P_ICON} |
| Security | {S_PASS_COUNT} | {S_CONCERNS_COUNT} | {S_FAIL_COUNT} | {S_STATUS} {S_ICON} |
| Reliability | {R_PASS_COUNT} | {R_CONCERNS_COUNT} | {R_FAIL_COUNT} | {R_STATUS} {R_ICON} |
| Maintainability | {M_PASS_COUNT} | {M_CONCERNS_COUNT} | {M_FAIL_COUNT} | {M_STATUS} {M_ICON} |
| **Total** | **{TOTAL_PASS}** | **{TOTAL_CONCERNS}** | **{TOTAL_FAIL}** | **{OVERALL_STATUS} {OVERALL_ICON}** |
---
## Gate YAML Snippet
```yaml
nfr_assessment:
date: '{DATE}'
story_id: '{STORY_ID}'
feature_name: '{FEATURE_NAME}'
categories:
performance: '{PERFORMANCE_STATUS}'
security: '{SECURITY_STATUS}'
reliability: '{RELIABILITY_STATUS}'
maintainability: '{MAINTAINABILITY_STATUS}'
overall_status: '{OVERALL_STATUS}'
critical_issues: {CRITICAL_COUNT}
high_priority_issues: {HIGH_COUNT}
medium_priority_issues: {MEDIUM_COUNT}
concerns: {CONCERNS_COUNT}
blockers: {BLOCKER_BOOLEAN} # true/false
quick_wins: {QUICK_WIN_COUNT}
evidence_gaps: {EVIDENCE_GAP_COUNT}
recommendations:
- '{RECOMMENDATION_1}'
- '{RECOMMENDATION_2}'
- '{RECOMMENDATION_3}'
```
---
## Related Artifacts
- **Story File:** {STORY_FILE_PATH} (if applicable)
- **Tech Spec:** {TECH_SPEC_PATH} (if available)
- **PRD:** {PRD_PATH} (if available)
- **Test Design:** {TEST_DESIGN_PATH} (if available)
- **Evidence Sources:**
- Test Results: {TEST_RESULTS_DIR}
- Metrics: {METRICS_DIR}
- Logs: {LOGS_DIR}
- CI Results: {CI_RESULTS_PATH}
---
## Recommendations Summary
**Release Blocker:** {RELEASE_BLOCKER_SUMMARY}
**High Priority:** {HIGH_PRIORITY_SUMMARY}
**Medium Priority:** {MEDIUM_PRIORITY_SUMMARY}
**Next Steps:** {NEXT_STEPS_DESCRIPTION}
---
## Sign-Off
**NFR Assessment:**
- Overall Status: {OVERALL_STATUS} {OVERALL_ICON}
- Critical Issues: {CRITICAL_COUNT}
- High Priority Issues: {HIGH_COUNT}
- Concerns: {CONCERNS_COUNT}
- Evidence Gaps: {EVIDENCE_GAP_COUNT}
**Gate Status:** {GATE_STATUS} {GATE_ICON}
**Next Actions:**
- If PASS ✅: Proceed to `*gate` workflow or release
- If CONCERNS ⚠️: Address HIGH/CRITICAL issues, re-run `*nfr-assess`
- If FAIL ❌: Resolve FAIL status NFRs, re-run `*nfr-assess`
**Generated:** {DATE}
**Workflow:** testarch-nfr v4.0
---
<!-- Powered by BMAD-CORE™ -->

View File

@@ -1,47 +0,0 @@
# Test Architect workflow: nfr-assess
name: testarch-nfr
description: "Assess non-functional requirements (performance, security, reliability, maintainability) before release with evidence-based validation"
author: "BMad"
# Critical variables from config
config_source: "{project-root}/_bmad/bmm/config.yaml"
output_folder: "{config_source}:output_folder"
user_name: "{config_source}:user_name"
communication_language: "{config_source}:communication_language"
document_output_language: "{config_source}:document_output_language"
date: system-generated
# Workflow components
installed_path: "{project-root}/_bmad/bmm/workflows/testarch/nfr-assess"
instructions: "{installed_path}/instructions.md"
validation: "{installed_path}/checklist.md"
template: "{installed_path}/nfr-report-template.md"
# Variables and inputs
variables:
# NFR category assessment (defaults to all categories)
custom_nfr_categories: "" # Optional additional categories beyond standard (security, performance, reliability, maintainability)
# Output configuration
default_output_file: "{output_folder}/nfr-assessment.md"
# Required tools
required_tools:
- read_file # Read story, test results, metrics, logs, BMad artifacts
- write_file # Create NFR assessment, gate YAML, evidence checklist
- list_files # Discover test results, metrics, logs
- search_repo # Find NFR-related tests and evidence
- glob # Find result files matching patterns
tags:
- qa
- nfr
- test-architect
- performance
- security
- reliability
execution_hints:
interactive: false # Minimize prompts
autonomous: true # Proceed without user input unless blocked
iterative: true

View File

@@ -1,235 +0,0 @@
# Test Design and Risk Assessment - Validation Checklist
## Prerequisites
- [ ] Story markdown with clear acceptance criteria exists
- [ ] PRD or epic documentation available
- [ ] Architecture documents available (optional)
- [ ] Requirements are testable and unambiguous
## Process Steps
### Step 1: Context Loading
- [ ] PRD.md read and requirements extracted
- [ ] Epics.md or specific epic documentation loaded
- [ ] Story markdown with acceptance criteria analyzed
- [ ] Architecture documents reviewed (if available)
- [ ] Existing test coverage analyzed
- [ ] Knowledge base fragments loaded (risk-governance, probability-impact, test-levels, test-priorities)
### Step 2: Risk Assessment
- [ ] Genuine risks identified (not just features)
- [ ] Risks classified by category (TECH/SEC/PERF/DATA/BUS/OPS)
- [ ] Probability scored (1-3 for each risk)
- [ ] Impact scored (1-3 for each risk)
- [ ] Risk scores calculated (probability × impact)
- [ ] High-priority risks (score ≥6) flagged
- [ ] Mitigation plans defined for high-priority risks
- [ ] Owners assigned for each mitigation
- [ ] Timelines set for mitigations
- [ ] Residual risk documented
### Step 3: Coverage Design
- [ ] Acceptance criteria broken into atomic scenarios
- [ ] Test levels selected (E2E/API/Component/Unit)
- [ ] No duplicate coverage across levels
- [ ] Priority levels assigned (P0/P1/P2/P3)
- [ ] P0 scenarios meet strict criteria (blocks core + high risk + no workaround)
- [ ] Data prerequisites identified
- [ ] Tooling requirements documented
- [ ] Execution order defined (smoke → P0 → P1 → P2/P3)
### Step 4: Deliverables Generation
- [ ] Risk assessment matrix created
- [ ] Coverage matrix created
- [ ] Execution order documented
- [ ] Resource estimates calculated
- [ ] Quality gate criteria defined
- [ ] Output file written to correct location
- [ ] Output file uses template structure
## Output Validation
### Risk Assessment Matrix
- [ ] All risks have unique IDs (R-001, R-002, etc.)
- [ ] Each risk has category assigned
- [ ] Probability values are 1, 2, or 3
- [ ] Impact values are 1, 2, or 3
- [ ] Scores calculated correctly (P × I)
- [ ] High-priority risks (≥6) clearly marked
- [ ] Mitigation strategies specific and actionable
### Coverage Matrix
- [ ] All requirements mapped to test levels
- [ ] Priorities assigned to all scenarios
- [ ] Risk linkage documented
- [ ] Test counts realistic
- [ ] Owners assigned where applicable
- [ ] No duplicate coverage (same behavior at multiple levels)
### Execution Order
- [ ] Smoke tests defined (<5 min target)
- [ ] P0 tests listed (<10 min target)
- [ ] P1 tests listed (<30 min target)
- [ ] P2/P3 tests listed (<60 min target)
- [ ] Order optimizes for fast feedback
### Resource Estimates
- [ ] P0 hours calculated (count × 2 hours)
- [ ] P1 hours calculated (count × 1 hour)
- [ ] P2 hours calculated (count × 0.5 hours)
- [ ] P3 hours calculated (count × 0.25 hours)
- [ ] Total hours summed
- [ ] Days estimate provided (hours / 8)
- [ ] Estimates include setup time
### Quality Gate Criteria
- [ ] P0 pass rate threshold defined (should be 100%)
- [ ] P1 pass rate threshold defined (typically 95%)
- [ ] High-risk mitigation completion required
- [ ] Coverage targets specified (≥80% recommended)
## Quality Checks
### Evidence-Based Assessment
- [ ] Risk assessment based on documented evidence
- [ ] No speculation on business impact
- [ ] Assumptions clearly documented
- [ ] Clarifications requested where needed
- [ ] Historical data referenced where available
### Risk Classification Accuracy
- [ ] TECH risks are architecture/integration issues
- [ ] SEC risks are security vulnerabilities
- [ ] PERF risks are performance/scalability concerns
- [ ] DATA risks are data integrity issues
- [ ] BUS risks are business/revenue impacts
- [ ] OPS risks are deployment/operational issues
### Priority Assignment Accuracy
- [ ] P0: Truly blocks core functionality
- [ ] P0: High-risk (score ≥6)
- [ ] P0: No workaround exists
- [ ] P1: Important but not blocking
- [ ] P2/P3: Nice-to-have or edge cases
### Test Level Selection
- [ ] E2E used only for critical paths
- [ ] API tests cover complex business logic
- [ ] Component tests for UI interactions
- [ ] Unit tests for edge cases and algorithms
- [ ] No redundant coverage
## Integration Points
### Knowledge Base Integration
- [ ] risk-governance.md consulted
- [ ] probability-impact.md applied
- [ ] test-levels-framework.md referenced
- [ ] test-priorities-matrix.md used
- [ ] Additional fragments loaded as needed
### Status File Integration
- [ ] bmm-workflow-status.md exists
- [ ] Test design logged in Quality & Testing Progress
- [ ] Epic number and scope documented
- [ ] Completion timestamp recorded
### Workflow Dependencies
- [ ] Can proceed to `*atdd` workflow with P0 scenarios
- [ ] `*atdd` is a separate workflow and must be run explicitly (not auto-run)
- [ ] Can proceed to `*automate` workflow with full coverage plan
- [ ] Risk assessment informs `*gate` workflow criteria
- [ ] Integrates with `*ci` workflow execution order
## Completion Criteria
**All must be true:**
- [ ] All prerequisites met
- [ ] All process steps completed
- [ ] All output validations passed
- [ ] All quality checks passed
- [ ] All integration points verified
- [ ] Output file complete and well-formatted
- [ ] Team review scheduled (if required)
## Post-Workflow Actions
**User must complete:**
1. [ ] Review risk assessment with team
2. [ ] Prioritize mitigation for high-priority risks (score ≥6)
3. [ ] Allocate resources per estimates
4. [ ] Run `*atdd` workflow to generate P0 tests (separate workflow; not auto-run)
5. [ ] Set up test data factories and fixtures
6. [ ] Schedule team review of test design document
**Recommended next workflows:**
1. [ ] Run `*atdd` workflow for P0 test generation
2. [ ] Run `*framework` workflow if not already done
3. [ ] Run `*ci` workflow to configure pipeline stages
## Rollback Procedure
If workflow fails:
1. [ ] Delete output file
2. [ ] Review error logs
3. [ ] Fix missing context (PRD, architecture docs)
4. [ ] Clarify ambiguous requirements
5. [ ] Retry workflow
## Notes
### Common Issues
**Issue**: Too many P0 tests
- **Solution**: Apply strict P0 criteria - must block core AND high risk AND no workaround
**Issue**: Risk scores all high
- **Solution**: Differentiate between critical (3) and degraded (2) impact ratings
**Issue**: Duplicate coverage across levels
- **Solution**: Use test pyramid - E2E for critical paths only
**Issue**: Resource estimates too high
- **Solution**: Invest in fixtures/factories to reduce per-test setup time
### Best Practices
- Base risk assessment on evidence, not assumptions
- High-priority risks (≥6) require immediate mitigation
- P0 tests should cover <10% of total scenarios
- Avoid testing same behavior at multiple levels
- Include smoke tests (P0 subset) for fast feedback
---
**Checklist Complete**: Sign off when all items validated.
**Completed by:** {name}
**Date:** {date}
**Epic:** {epic title}
**Notes:** {additional notes}

View File

@@ -1,788 +0,0 @@
<!-- Powered by BMAD-CORE™ -->
# Test Design and Risk Assessment
**Workflow ID**: `_bmad/bmm/testarch/test-design`
**Version**: 4.0 (BMad v6)
---
## Overview
Plans comprehensive test coverage strategy with risk assessment, priority classification, and execution ordering. This workflow operates in **two modes**:
- **System-Level Mode (Phase 3)**: Testability review of architecture before solutioning gate check
- **Epic-Level Mode (Phase 4)**: Per-epic test planning with risk assessment (current behavior)
The workflow auto-detects which mode to use based on project phase.
---
## Preflight: Detect Mode and Load Context
**Critical:** Determine mode before proceeding.
### Mode Detection
1. **Check for sprint-status.yaml**
- If `{implementation_artifacts}/sprint-status.yaml` exists → **Epic-Level Mode** (Phase 4)
- If NOT exists → Check workflow status
2. **Check workflow-status.yaml**
- Read `{planning_artifacts}/bmm-workflow-status.yaml`
- If `implementation-readiness: required` or `implementation-readiness: recommended` → **System-Level Mode** (Phase 3)
- Otherwise → **Epic-Level Mode** (Phase 4 without sprint status yet)
3. **Mode-Specific Requirements**
**System-Level Mode (Phase 3 - Testability Review):**
- ✅ Architecture document exists (architecture.md or tech-spec)
- ✅ PRD exists with functional and non-functional requirements
- ✅ Epics documented (epics.md)
- ⚠️ Output: `{output_folder}/test-design-system.md`
**Epic-Level Mode (Phase 4 - Per-Epic Planning):**
- ✅ Story markdown with acceptance criteria available
- ✅ PRD or epic documentation exists for context
- ✅ Architecture documents available (optional but recommended)
- ✅ Requirements are clear and testable
- ⚠️ Output: `{output_folder}/test-design-epic-{epic_num}.md`
**Halt Condition:** If mode cannot be determined or required files missing, HALT and notify user with missing prerequisites.
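A minimal sketch of the detection logic in steps 1-2 above, assuming Node's `fs` and the `js-yaml` package (the file paths and YAML shape are illustrative, not prescribed by this workflow):
```typescript
import * as fs from 'fs';
import * as yaml from 'js-yaml';

type Mode = 'epic-level' | 'system-level';

// Paths stand in for {implementation_artifacts}/sprint-status.yaml and
// {planning_artifacts}/bmm-workflow-status.yaml.
function detectMode(sprintStatusPath: string, workflowStatusPath: string): Mode {
  // 1. sprint-status.yaml present → Phase 4, epic-level planning
  if (fs.existsSync(sprintStatusPath)) return 'epic-level';
  // 2. Otherwise check the implementation-readiness flag in workflow status
  if (fs.existsSync(workflowStatusPath)) {
    const status = yaml.load(fs.readFileSync(workflowStatusPath, 'utf8')) as {
      'implementation-readiness'?: string;
    };
    const readiness = status?.['implementation-readiness'];
    if (readiness === 'required' || readiness === 'recommended') return 'system-level';
  }
  // 3. Default: epic-level without sprint status yet
  return 'epic-level';
}
```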
---
## Step 1: Load Context (Mode-Aware)
**Mode-Specific Loading:**
### System-Level Mode (Phase 3)
1. **Read Architecture Documentation**
- Load architecture.md or tech-spec (REQUIRED)
- Load PRD.md for functional and non-functional requirements
- Load epics.md for feature scope
- Identify technology stack decisions (frameworks, databases, deployment targets)
- Note integration points and external system dependencies
- Extract NFR requirements (performance SLOs, security requirements, etc.)
2. **Check Playwright Utils Flag**
Read `{config_source}` and check `config.tea_use_playwright_utils`.
If true, note that `@seontechnologies/playwright-utils` provides utilities for test implementation. Reference in test design where relevant.
3. **Load Knowledge Base Fragments (System-Level)**
**Critical:** Consult `{project-root}/_bmad/bmm/testarch/tea-index.csv` to load:
- `nfr-criteria.md` - NFR validation approach (security, performance, reliability, maintainability)
- `test-levels-framework.md` - Test levels strategy guidance
- `risk-governance.md` - Testability risk identification
- `test-quality.md` - Quality standards and Definition of Done
4. **Analyze Existing Test Setup (if brownfield)**
- Search for existing test directories
- Identify current test framework (if any)
- Note testability concerns in existing codebase
### Epic-Level Mode (Phase 4)
1. **Read Requirements Documentation**
- Load PRD.md for high-level product requirements
- Read epics.md or specific epic for feature scope
- Read story markdown for detailed acceptance criteria
- Identify all testable requirements
2. **Load Architecture Context**
- Read architecture.md for system design
- Read tech-spec for implementation details
- Read test-design-system.md (if exists from Phase 3)
- Identify technical constraints and dependencies
- Note integration points and external systems
3. **Analyze Existing Test Coverage**
- Search for existing test files in `{test_dir}`
- Identify coverage gaps
- Note areas with insufficient testing
- Check for flaky or outdated tests
4. **Load Knowledge Base Fragments (Epic-Level)**
**Critical:** Consult `{project-root}/_bmad/bmm/testarch/tea-index.csv` to load:
- `risk-governance.md` - Risk classification framework (6 categories: TECH, SEC, PERF, DATA, BUS, OPS), automated scoring, gate decision engine, owner tracking (625 lines, 4 examples)
- `probability-impact.md` - Risk scoring methodology (probability × impact matrix, automated classification, dynamic re-assessment, gate integration, 604 lines, 4 examples)
- `test-levels-framework.md` - Test level selection guidance (E2E vs API vs Component vs Unit with decision matrix, characteristics, when to use each, 467 lines, 4 examples)
- `test-priorities-matrix.md` - P0-P3 prioritization criteria (automated priority calculation, risk-based mapping, tagging strategy, time budgets, 389 lines, 2 examples)
**Halt Condition (Epic-Level only):** If story data or acceptance criteria are missing, check whether brownfield exploration is possible. If neither requirements nor exploration is available, HALT with message: "Epic-level test design requires clear requirements, acceptance criteria, or brownfield app URL for exploration"
---
## Step 1.5: System-Level Testability Review (Phase 3 Only)
**Skip this step if Epic-Level Mode.** This step only executes in System-Level Mode.
### Actions
1. **Review Architecture for Testability**
Evaluate architecture against these criteria:
**Controllability:**
- Can we control system state for testing? (API seeding, factories, database reset)
- Are external dependencies mockable? (interfaces, dependency injection)
- Can we trigger error conditions? (chaos engineering, fault injection)
**Observability:**
- Can we inspect system state? (logging, metrics, traces)
- Are test results deterministic? (no race conditions, clear success/failure)
- Can we validate NFRs? (performance metrics, security audit logs)
**Reliability:**
- Are tests isolated? (parallel-safe, stateless, cleanup discipline)
- Can we reproduce failures? (deterministic waits, HAR capture, seed data)
- Are components loosely coupled? (mockable, testable boundaries)
2. **Identify Architecturally Significant Requirements (ASRs)**
From PRD NFRs and architecture decisions, identify quality requirements that:
- Drive architecture decisions (e.g., "Must handle 10K concurrent users" → caching architecture)
- Pose testability challenges (e.g., "Sub-second response time" → performance test infrastructure)
- Require special test environments (e.g., "Multi-region deployment" → regional test instances)
Score each ASR using risk matrix (probability × impact).
3. **Define Test Levels Strategy**
Based on architecture (mobile, web, API, microservices, monolith):
- Recommend unit/integration/E2E split (e.g., 70/20/10 for API-heavy, 40/30/30 for UI-heavy)
- Identify test environment needs (local, staging, ephemeral, production-like)
- Define testing approach per technology (Playwright for web, Maestro for mobile, k6 for performance)
4. **Assess NFR Testing Approach**
For each NFR category:
- **Security**: Auth/authz tests, OWASP validation, secret handling (Playwright E2E + security tools)
- **Performance**: Load/stress/spike testing with k6, SLO/SLA thresholds
- **Reliability**: Error handling, retries, circuit breakers, health checks (Playwright + API tests)
- **Maintainability**: Coverage targets, code quality gates, observability validation
5. **Flag Testability Concerns**
Identify architecture decisions that harm testability:
- ❌ Tight coupling (no interfaces, hard dependencies)
- ❌ No dependency injection (can't mock external services)
- ❌ Hardcoded configurations (can't test different envs)
- ❌ Missing observability (can't validate NFRs)
- ❌ Stateful designs (can't parallelize tests)
**Critical:** If testability concerns are blockers (e.g., "Architecture makes performance testing impossible"), document as CONCERNS or FAIL recommendation for gate check.
6. **Output System-Level Test Design**
Write to `{output_folder}/test-design-system.md` containing:
```markdown
# System-Level Test Design
## Testability Assessment
- Controllability: [PASS/CONCERNS/FAIL with details]
- Observability: [PASS/CONCERNS/FAIL with details]
- Reliability: [PASS/CONCERNS/FAIL with details]
## Architecturally Significant Requirements (ASRs)
[Risk-scored quality requirements]
## Test Levels Strategy
- Unit: [X%] - [Rationale]
- Integration: [Y%] - [Rationale]
- E2E: [Z%] - [Rationale]
## NFR Testing Approach
- Security: [Approach with tools]
- Performance: [Approach with tools]
- Reliability: [Approach with tools]
- Maintainability: [Approach with tools]
## Test Environment Requirements
[Infrastructure needs based on deployment architecture]
## Testability Concerns (if any)
[Blockers or concerns that should inform solutioning gate check]
## Recommendations for Sprint 0
[Specific actions for *framework and *ci workflows]
```
**After System-Level Mode:** Skip to Step 4 (Generate Deliverables) - Steps 2-3 are epic-level only.
---
## Step 1.6: Exploratory Mode Selection (Epic-Level Only)
### Actions
1. **Detect Planning Mode**
Determine mode based on context:
**Requirements-Based Mode (DEFAULT)**:
- Have clear story/PRD with acceptance criteria
- Uses: Existing workflow (Steps 2-4)
- Appropriate for: Documented features, greenfield projects
**Exploratory Mode (OPTIONAL - Brownfield)**:
- Missing/incomplete requirements AND brownfield application exists
- Uses: UI exploration to discover functionality
- Appropriate for: Undocumented brownfield apps, legacy systems
2. **Requirements-Based Mode (DEFAULT - Skip to Step 2)**
If requirements are clear:
- Continue with existing workflow (Step 2: Assess and Classify Risks)
- Use loaded requirements from Step 1
- Proceed with risk assessment based on documented requirements
3. **Exploratory Mode (OPTIONAL - Brownfield Apps)**
If exploring brownfield application:
**A. Check MCP Availability**
If config.tea_use_mcp_enhancements is true AND Playwright MCP tools available:
- Use MCP-assisted exploration (Step 3.B)
If MCP unavailable OR config.tea_use_mcp_enhancements is false:
- Use manual exploration fallback (Step 3.C)
**B. MCP-Assisted Exploration (If MCP Tools Available)**
Use Playwright MCP browser tools to explore UI:
**Setup:**
```
1. Use planner_setup_page to initialize browser
2. Navigate to {exploration_url}
3. Capture initial state with browser_snapshot
```
**Exploration Process:**
```
4. Use browser_navigate to explore different pages
5. Use browser_click to interact with buttons, links, forms
6. Use browser_hover to reveal hidden menus/tooltips
7. Capture browser_snapshot at each significant state
8. Take browser_screenshot for documentation
9. Monitor browser_console_messages for JavaScript errors
10. Track browser_network_requests to identify API calls
11. Map user flows and interactive elements
12. Document discovered functionality
```
**Discovery Documentation:**
- Create list of discovered features (pages, workflows, forms)
- Identify user journeys (navigation paths)
- Map API endpoints (from network requests)
- Note error states (from console messages)
- Capture screenshots for visual reference
**Convert to Test Scenarios:**
- Transform discoveries into testable requirements
- Prioritize based on user flow criticality
- Identify risks from discovered functionality
- Continue with Step 2 (Assess and Classify Risks) using discovered requirements
**C. Manual Exploration Fallback (If MCP Unavailable)**
If Playwright MCP is not available:
**Notify User:**
```markdown
Exploratory mode enabled but Playwright MCP unavailable.
**Manual exploration required:**
1. Open application at: {exploration_url}
2. Explore all pages, workflows, and features
3. Document findings in markdown:
- List of pages/features discovered
- User journeys identified
- API endpoints observed (DevTools Network tab)
- JavaScript errors noted (DevTools Console)
- Critical workflows mapped
4. Provide exploration findings to continue workflow
**Alternative:** Disable exploratory_mode and provide requirements documentation
```
Wait for user to provide exploration findings, then:
- Parse user-provided discovery documentation
- Convert to testable requirements
- Continue with Step 2 (risk assessment)
4. **Proceed to Risk Assessment**
After mode selection (Requirements-Based OR Exploratory):
- Continue to Step 2: Assess and Classify Risks
- Use requirements from documentation (Requirements-Based) OR discoveries (Exploratory)
---
## Step 2: Assess and Classify Risks
### Actions
1. **Identify Genuine Risks**
Filter requirements to isolate actual risks (not just features):
- Unresolved technical gaps
- Security vulnerabilities
- Performance bottlenecks
- Data loss or corruption potential
- Business impact failures
- Operational deployment issues
2. **Classify Risks by Category**
Use these standard risk categories:
**TECH** (Technical/Architecture):
- Architecture flaws
- Integration failures
- Scalability issues
- Technical debt
**SEC** (Security):
- Missing access controls
- Authentication bypass
- Data exposure
- Injection vulnerabilities
**PERF** (Performance):
- SLA violations
- Response time degradation
- Resource exhaustion
- Scalability limits
**DATA** (Data Integrity):
- Data loss
- Data corruption
- Inconsistent state
- Migration failures
**BUS** (Business Impact):
- User experience degradation
- Business logic errors
- Revenue impact
- Compliance violations
**OPS** (Operations):
- Deployment failures
- Configuration errors
- Monitoring gaps
- Rollback issues
3. **Score Risk Probability**
Rate likelihood (1-3):
- **1 (Unlikely)**: <10% chance, edge case
- **2 (Possible)**: 10-50% chance, known scenario
- **3 (Likely)**: >50% chance, common occurrence
4. **Score Risk Impact**
Rate severity (1-3):
- **1 (Minor)**: Cosmetic, workaround exists, limited users
- **2 (Degraded)**: Feature impaired, workaround difficult, affects many users
- **3 (Critical)**: System failure, data loss, no workaround, blocks usage
5. **Calculate Risk Score**
```
Risk Score = Probability × Impact
Scores:
1-2: Low risk (monitor)
3-4: Medium risk (plan mitigation)
6-9: High risk (immediate mitigation required)
```
6. **Highlight High-Priority Risks**
Flag all risks with score ≥6 for immediate attention.
7. **Request Clarification**
If evidence is missing or assumptions required:
- Document assumptions clearly
- Request user clarification
- Do NOT speculate on business impact
8. **Plan Mitigations**
For each high-priority risk:
- Define mitigation strategy
- Assign owner (dev, QA, ops)
- Set timeline
- Update residual risk expectation
---
## Step 3: Design Test Coverage
### Actions
1. **Break Down Acceptance Criteria**
Convert each acceptance criterion into atomic test scenarios:
- One scenario per testable behavior
- Scenarios are independent
- Scenarios are repeatable
- Scenarios tie back to risk mitigations
2. **Select Appropriate Test Levels**
**Knowledge Base Reference**: `test-levels-framework.md`
Map requirements to optimal test levels (avoid duplication):
**E2E (End-to-End)**:
- Critical user journeys
- Multi-system integration
- Production-like environment
- Highest confidence, slowest execution
**API (Integration)**:
- Service contracts
- Business logic validation
- Fast feedback
- Good for complex scenarios
**Component**:
- UI component behavior
- Interaction testing
- Visual regression
- Fast, isolated
**Unit**:
- Business logic
- Edge cases
- Error handling
- Fastest, most granular
**Avoid duplicate coverage**: Don't test same behavior at multiple levels unless necessary.
3. **Assign Priority Levels**
**Knowledge Base Reference**: `test-priorities-matrix.md`
**P0 (Critical)**:
- Blocks core user journey
- High-risk areas (score ≥6)
- Revenue-impacting
- Security-critical
- **Run on every commit**
**P1 (High)**:
- Important user features
- Medium-risk areas (score 3-4)
- Common workflows
- **Run on PR to main**
**P2 (Medium)**:
- Secondary features
- Low-risk areas (score 1-2)
- Edge cases
- **Run nightly or weekly**
**P3 (Low)**:
- Nice-to-have
- Exploratory
- Performance benchmarks
- **Run on-demand**
4. **Outline Data and Tooling Prerequisites**
For each test scenario, identify:
- Test data requirements (factories, fixtures)
- External services (mocks, stubs)
- Environment setup
- Tools and dependencies
5. **Define Execution Order**
Recommend test execution sequence:
1. **Smoke tests** (P0 subset, <5 min)
2. **P0 tests** (critical paths, <10 min)
3. **P1 tests** (important features, <30 min)
4. **P2/P3 tests** (full regression, <60 min)
---
## Step 4: Generate Deliverables
### Actions
1. **Create Risk Assessment Matrix**
Use template structure:
```markdown
| Risk ID | Category | Description | Probability | Impact | Score | Mitigation |
| ------- | -------- | ----------- | ----------- | ------ | ----- | --------------- |
| R-001 | SEC | Auth bypass | 2 | 3 | 6 | Add authz check |
```
2. **Create Coverage Matrix**
```markdown
| Requirement | Test Level | Priority | Risk Link | Test Count | Owner |
| ----------- | ---------- | -------- | --------- | ---------- | ----- |
| Login flow | E2E | P0 | R-001 | 3 | QA |
```
3. **Document Execution Order**
```markdown
### Smoke Tests (<5 min)
- Login successful
- Dashboard loads
### P0 Tests (<10 min)
- [Full P0 list]
### P1 Tests (<30 min)
- [Full P1 list]
```
4. **Include Resource Estimates**
```markdown
### Test Effort Estimates
- P0 scenarios: 15 tests × 2 hours = 30 hours
- P1 scenarios: 25 tests × 1 hour = 25 hours
- P2 scenarios: 40 tests × 0.5 hour = 20 hours
- **Total:** 75 hours (~10 days)
```
5. **Add Gate Criteria**
```markdown
### Quality Gate Criteria
- All P0 tests pass (100%)
- P1 tests pass rate ≥95%
- No high-risk (score ≥6) items unmitigated
- Test coverage ≥80% for critical paths
```
6. **Write to Output File**
Save to `{output_folder}/test-design-epic-{epic_num}.md` using template structure.
---
## Important Notes
### Risk Category Definitions
**TECH** (Technical/Architecture):
- Architecture flaws or technical debt
- Integration complexity
- Scalability concerns
**SEC** (Security):
- Missing security controls
- Authentication/authorization gaps
- Data exposure risks
**PERF** (Performance):
- SLA risk or performance degradation
- Resource constraints
- Scalability bottlenecks
**DATA** (Data Integrity):
- Data loss or corruption potential
- State consistency issues
- Migration risks
**BUS** (Business Impact):
- User experience harm
- Business logic errors
- Revenue or compliance impact
**OPS** (Operations):
- Deployment or runtime failures
- Configuration issues
- Monitoring/observability gaps
### Risk Scoring Methodology
**Probability × Impact = Risk Score**
Examples:
- High likelihood (3) × Critical impact (3) = **Score 9** (highest priority)
- Possible (2) × Critical (3) = **Score 6** (high priority threshold)
- Unlikely (1) × Minor (1) = **Score 1** (low priority)
**Threshold**: Scores ≥6 require immediate mitigation.
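The arithmetic and thresholds above, expressed as a small sketch (the types and names are illustrative):
```typescript
type Probability = 1 | 2 | 3; // unlikely, possible, likely
type Impact = 1 | 2 | 3; // minor, degraded, critical

function riskScore(probability: Probability, impact: Impact): number {
  return probability * impact; // possible values: 1, 2, 3, 4, 6, 9
}

function classify(score: number): 'low' | 'medium' | 'high' {
  if (score >= 6) return 'high'; // immediate mitigation required
  if (score >= 3) return 'medium'; // plan mitigation
  return 'low'; // monitor
}

classify(riskScore(2, 3)); // possible × critical = 6 → 'high'
```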
### Test Level Selection Strategy
**Avoid duplication:**
- Don't test same behavior at E2E and API level
- Use E2E for critical paths only
- Use API tests for complex business logic
- Use unit tests for edge cases
**Tradeoffs:**
- E2E: High confidence, slow execution, brittle
- API: Good balance, fast, stable
- Unit: Fastest feedback, narrow scope
### Priority Assignment Guidelines
**P0 criteria** (all must be true):
- Blocks core functionality
- High-risk (score ≥6)
- No workaround exists
- Affects majority of users
**P1 criteria**:
- Important feature
- Medium risk (score 3-5)
- Workaround exists but difficult
**P2/P3**: Everything else, prioritized by value
### Knowledge Base Integration
**Core Fragments (Auto-loaded in Step 1):**
- `risk-governance.md` - Risk classification (6 categories), automated scoring, gate decision engine, coverage traceability, owner tracking (625 lines, 4 examples)
- `probability-impact.md` - Probability × impact matrix, automated classification thresholds, dynamic re-assessment, gate integration (604 lines, 4 examples)
- `test-levels-framework.md` - E2E vs API vs Component vs Unit decision framework with characteristics matrix (467 lines, 4 examples)
- `test-priorities-matrix.md` - P0-P3 automated priority calculation, risk-based mapping, tagging strategy, time budgets (389 lines, 2 examples)
**Reference for Test Planning:**
- `selective-testing.md` - Execution strategy: tag-based, spec filters, diff-based selection, promotion rules (727 lines, 4 examples)
- `fixture-architecture.md` - Data setup patterns: pure function → fixture → mergeTests, auto-cleanup (406 lines, 5 examples)
**Manual Reference (Optional):**
- Use `tea-index.csv` to find additional specialized fragments as needed
### Evidence-Based Assessment
**Critical principle:** Base risk assessment on evidence, not speculation.
**Evidence sources:**
- PRD and user research
- Architecture documentation
- Historical bug data
- User feedback
- Security audit results
**Avoid:**
- Guessing business impact
- Assuming user behavior
- Inventing requirements
**When uncertain:** Document assumptions and request clarification from user.
---
## Output Summary
After completing this workflow, provide a summary:
```markdown
## Test Design Complete
**Epic**: {epic_num}
**Scope**: {design_level}
**Risk Assessment**:
- Total risks identified: {count}
- High-priority risks (≥6): {high_count}
- Categories: {categories}
**Coverage Plan**:
- P0 scenarios: {p0_count} ({p0_hours} hours)
- P1 scenarios: {p1_count} ({p1_hours} hours)
- P2/P3 scenarios: {p2p3_count} ({p2p3_hours} hours)
- **Total effort**: {total_hours} hours (~{total_days} days)
**Test Levels**:
- E2E: {e2e_count}
- API: {api_count}
- Component: {component_count}
- Unit: {unit_count}
**Quality Gate Criteria**:
- P0 pass rate: 100%
- P1 pass rate: ≥95%
- High-risk mitigations: 100%
- Coverage: ≥80%
**Output File**: {output_file}
**Next Steps**:
1. Review risk assessment with team
2. Prioritize mitigation for high-risk items (score ≥6)
3. Run `*atdd` to generate failing tests for P0 scenarios (separate workflow; not auto-run by `*test-design`)
4. Allocate resources per effort estimates
5. Set up test data factories and fixtures
```
---
## Validation
After completing all steps, verify:
- [ ] Risk assessment complete with all categories
- [ ] All risks scored (probability × impact)
- [ ] High-priority risks (≥6) flagged
- [ ] Coverage matrix maps requirements to test levels
- [ ] Priority levels assigned (P0-P3)
- [ ] Execution order defined
- [ ] Resource estimates provided
- [ ] Quality gate criteria defined
- [ ] Output file created and formatted correctly
Refer to `checklist.md` for comprehensive validation criteria.

View File

@@ -1,294 +0,0 @@
# Test Design: Epic {epic_num} - {epic_title}
**Date:** {date}
**Author:** {user_name}
**Status:** Draft / Approved
---
## Executive Summary
**Scope:** {design_level} test design for Epic {epic_num}
**Risk Summary:**
- Total risks identified: {total_risks}
- High-priority risks (≥6): {high_priority_count}
- Critical categories: {top_categories}
**Coverage Summary:**
- P0 scenarios: {p0_count} ({p0_hours} hours)
- P1 scenarios: {p1_count} ({p1_hours} hours)
- P2/P3 scenarios: {p2p3_count} ({p2p3_hours} hours)
- **Total effort**: {total_hours} hours (~{total_days} days)
---
## Risk Assessment
### High-Priority Risks (Score ≥6)
| Risk ID | Category | Description | Probability | Impact | Score | Mitigation | Owner | Timeline |
| ------- | -------- | ------------- | ----------- | ------ | ----- | ------------ | ------- | -------- |
| R-001 | SEC | {description} | 2 | 3 | 6 | {mitigation} | {owner} | {date} |
| R-002 | PERF | {description} | 3 | 2 | 6 | {mitigation} | {owner} | {date} |
### Medium-Priority Risks (Score 3-4)
| Risk ID | Category | Description | Probability | Impact | Score | Mitigation | Owner |
| ------- | -------- | ------------- | ----------- | ------ | ----- | ------------ | ------- |
| R-003 | TECH | {description} | 2 | 2 | 4 | {mitigation} | {owner} |
| R-004 | DATA | {description} | 1 | 3 | 3 | {mitigation} | {owner} |
### Low-Priority Risks (Score 1-2)
| Risk ID | Category | Description | Probability | Impact | Score | Action |
| ------- | -------- | ------------- | ----------- | ------ | ----- | ------- |
| R-005 | OPS | {description} | 1 | 2 | 2 | Monitor |
| R-006 | BUS | {description} | 1 | 1 | 1 | Monitor |
### Risk Category Legend
- **TECH**: Technical/Architecture (flaws, integration, scalability)
- **SEC**: Security (access controls, auth, data exposure)
- **PERF**: Performance (SLA violations, degradation, resource limits)
- **DATA**: Data Integrity (loss, corruption, inconsistency)
- **BUS**: Business Impact (UX harm, logic errors, revenue)
- **OPS**: Operations (deployment, config, monitoring)
---
## Test Coverage Plan
### P0 (Critical) - Run on every commit
**Criteria**: Blocks core journey + High risk (≥6) + No workaround
| Requirement | Test Level | Risk Link | Test Count | Owner | Notes |
| ------------- | ---------- | --------- | ---------- | ----- | ------- |
| {requirement} | E2E | R-001 | 3 | QA | {notes} |
| {requirement} | API | R-002 | 5 | QA | {notes} |
**Total P0**: {p0_count} tests, {p0_hours} hours
### P1 (High) - Run on PR to main
**Criteria**: Important features + Medium risk (3-4) + Common workflows
| Requirement | Test Level | Risk Link | Test Count | Owner | Notes |
| ------------- | ---------- | --------- | ---------- | ----- | ------- |
| {requirement} | API | R-003 | 4 | QA | {notes} |
| {requirement} | Component | - | 6 | DEV | {notes} |
**Total P1**: {p1_count} tests, {p1_hours} hours
### P2 (Medium) - Run nightly/weekly
**Criteria**: Secondary features + Low risk (1-2) + Edge cases
| Requirement | Test Level | Risk Link | Test Count | Owner | Notes |
| ------------- | ---------- | --------- | ---------- | ----- | ------- |
| {requirement} | API | R-004 | 8 | QA | {notes} |
| {requirement} | Unit | - | 15 | DEV | {notes} |
**Total P2**: {p2_count} tests, {p2_hours} hours
### P3 (Low) - Run on-demand
**Criteria**: Nice-to-have + Exploratory + Performance benchmarks
| Requirement | Test Level | Test Count | Owner | Notes |
| ------------- | ---------- | ---------- | ----- | ------- |
| {requirement} | E2E | 2 | QA | {notes} |
| {requirement} | Unit | 8 | DEV | {notes} |
**Total P3**: {p3_count} tests, {p3_hours} hours
---
## Execution Order
### Smoke Tests (<5 min)
**Purpose**: Fast feedback, catch build-breaking issues
- [ ] {scenario} (30s)
- [ ] {scenario} (45s)
- [ ] {scenario} (1min)
**Total**: {smoke_count} scenarios
### P0 Tests (<10 min)
**Purpose**: Critical path validation
- [ ] {scenario} (E2E)
- [ ] {scenario} (API)
- [ ] {scenario} (API)
**Total**: {p0_count} scenarios
### P1 Tests (<30 min)
**Purpose**: Important feature coverage
- [ ] {scenario} (API)
- [ ] {scenario} (Component)
**Total**: {p1_count} scenarios
### P2/P3 Tests (<60 min)
**Purpose**: Full regression coverage
- [ ] {scenario} (Unit)
- [ ] {scenario} (API)
**Total**: {p2p3_count} scenarios
---
## Resource Estimates
### Test Development Effort
| Priority | Count | Hours/Test | Total Hours | Notes |
| --------- | ----------------- | ---------- | ----------------- | ----------------------- |
| P0 | {p0_count} | 2.0 | {p0_hours} | Complex setup, security |
| P1 | {p1_count} | 1.0 | {p1_hours} | Standard coverage |
| P2 | {p2_count} | 0.5 | {p2_hours} | Simple scenarios |
| P3 | {p3_count} | 0.25 | {p3_hours} | Exploratory |
| **Total** | **{total_count}** | **-** | **{total_hours}** | **~{total_days} days** |
### Prerequisites
**Test Data:**
- {factory_name} factory (faker-based, auto-cleanup)
- {fixture_name} fixture (setup/teardown)
**Tooling:**
- {tool} for {purpose}
- {tool} for {purpose}
**Environment:**
- {env_requirement}
- {env_requirement}
---
## Quality Gate Criteria
### Pass/Fail Thresholds
- **P0 pass rate**: 100% (no exceptions)
- **P1 pass rate**: ≥95% (waivers required for failures)
- **P2/P3 pass rate**: ≥90% (informational)
- **High-risk mitigations**: 100% complete or approved waivers
### Coverage Targets
- **Critical paths**: ≥80%
- **Security scenarios**: 100%
- **Business logic**: ≥70%
- **Edge cases**: ≥50%
### Non-Negotiable Requirements
- [ ] All P0 tests pass
- [ ] No high-risk (≥6) items unmitigated
- [ ] Security tests (SEC category) pass 100%
- [ ] Performance targets met (PERF category)
---
## Mitigation Plans
### R-001: {Risk Description} (Score: 6)
**Mitigation Strategy:** {detailed_mitigation}
**Owner:** {owner}
**Timeline:** {date}
**Status:** Planned / In Progress / Complete
**Verification:** {how_to_verify}
### R-002: {Risk Description} (Score: 6)
**Mitigation Strategy:** {detailed_mitigation}
**Owner:** {owner}
**Timeline:** {date}
**Status:** Planned / In Progress / Complete
**Verification:** {how_to_verify}
---
## Assumptions and Dependencies
### Assumptions
1. {assumption}
2. {assumption}
3. {assumption}
### Dependencies
1. {dependency} - Required by {date}
2. {dependency} - Required by {date}
### Risks to Plan
- **Risk**: {risk_to_plan}
- **Impact**: {impact}
- **Contingency**: {contingency}
---
## Follow-on Workflows (Manual)
- Run `*atdd` to generate failing P0 tests (separate workflow; not auto-run).
- Run `*automate` for broader coverage once implementation exists.
---
## Approval
**Test Design Approved By:**
- [ ] Product Manager: {name} Date: {date}
- [ ] Tech Lead: {name} Date: {date}
- [ ] QA Lead: {name} Date: {date}
**Comments:**
---
## Appendix
### Knowledge Base References
- `risk-governance.md` - Risk classification framework
- `probability-impact.md` - Risk scoring methodology
- `test-levels-framework.md` - Test level selection
- `test-priorities-matrix.md` - P0-P3 prioritization
### Related Documents
- PRD: {prd_link}
- Epic: {epic_link}
- Architecture: {arch_link}
- Tech Spec: {tech_spec_link}
---
**Generated by**: BMad TEA Agent - Test Architect Module
**Workflow**: `_bmad/bmm/testarch/test-design`
**Version**: 4.0 (BMad v6)

View File

@@ -1,54 +0,0 @@
# Test Architect workflow: test-design
name: testarch-test-design
description: "Dual-mode workflow: (1) System-level testability review in Solutioning phase, or (2) Epic-level test planning in Implementation phase. Auto-detects mode based on project phase."
author: "BMad"
# Critical variables from config
config_source: "{project-root}/_bmad/bmm/config.yaml"
output_folder: "{config_source}:output_folder"
user_name: "{config_source}:user_name"
communication_language: "{config_source}:communication_language"
document_output_language: "{config_source}:document_output_language"
date: system-generated
# Workflow components
installed_path: "{project-root}/_bmad/bmm/workflows/testarch/test-design"
instructions: "{installed_path}/instructions.md"
validation: "{installed_path}/checklist.md"
template: "{installed_path}/test-design-template.md"
# Variables and inputs
variables:
design_level: "full" # full, targeted, minimal - scope of design effort
mode: "auto-detect" # auto-detect (default), system-level, epic-level
# Output configuration
# Note: Actual output file determined dynamically based on mode detection
# Declared outputs for new workflow format
outputs:
- id: system-level
description: "System-level testability review (Phase 3)"
path: "{output_folder}/test-design-system.md"
- id: epic-level
description: "Epic-level test plan (Phase 4)"
path: "{output_folder}/test-design-epic-{epic_num}.md"
default_output_file: "{output_folder}/test-design-epic-{epic_num}.md"
# Required tools
required_tools:
- read_file # Read PRD, epics, stories, architecture docs
- write_file # Create test design document
- list_files # Find related documentation
- search_repo # Search for existing tests and patterns
tags:
- qa
- planning
- test-architect
- risk-assessment
- coverage
execution_hints:
interactive: false # Minimize prompts
autonomous: true # Proceed without user input unless blocked
iterative: true

View File

@@ -1,472 +0,0 @@
# Test Quality Review - Validation Checklist
Use this checklist to validate that the test quality review workflow completed successfully and all quality criteria were properly evaluated.
---
## Prerequisites
Note: `test-review` is optional and only audits existing tests; it does not generate tests.
### Test File Discovery
- [ ] Test file(s) identified for review (single/directory/suite scope)
- [ ] Test files exist and are readable
- [ ] Test framework detected (Playwright, Jest, Cypress, Vitest, etc.)
- [ ] Test framework configuration found (playwright.config.ts, jest.config.js, etc.)
### Knowledge Base Loading
- [ ] tea-index.csv loaded successfully
- [ ] `test-quality.md` loaded (Definition of Done)
- [ ] `fixture-architecture.md` loaded (Pure function → Fixture patterns)
- [ ] `network-first.md` loaded (Route intercept before navigate)
- [ ] `data-factories.md` loaded (Factory patterns)
- [ ] `test-levels-framework.md` loaded (E2E vs API vs Component vs Unit)
- [ ] All other enabled fragments loaded successfully
### Context Gathering
- [ ] Story file discovered or explicitly provided (if available)
- [ ] Test design document discovered or explicitly provided (if available)
- [ ] Acceptance criteria extracted from story (if available)
- [ ] Priority context (P0/P1/P2/P3) extracted from test-design (if available)
---
## Process Steps
### Step 1: Context Loading
- [ ] Review scope determined (single/directory/suite)
- [ ] Test file paths collected
- [ ] Related artifacts discovered (story, test-design)
- [ ] Knowledge base fragments loaded successfully
- [ ] Quality criteria flags read from workflow variables
### Step 2: Test File Parsing
**For Each Test File:**
- [ ] File read successfully
- [ ] File size measured (lines, KB)
- [ ] File structure parsed (describe blocks, it blocks)
- [ ] Test IDs extracted (if present)
- [ ] Priority markers extracted (if present)
- [ ] Imports analyzed
- [ ] Dependencies identified
**Test Structure Analysis:**
- [ ] Describe block count calculated
- [ ] It/test block count calculated
- [ ] BDD structure identified (Given-When-Then)
- [ ] Fixture usage detected
- [ ] Data factory usage detected
- [ ] Network interception patterns identified
- [ ] Assertions counted
- [ ] Waits and timeouts cataloged
- [ ] Conditionals (if/else) detected
- [ ] Try/catch blocks detected
- [ ] Shared state or globals detected
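One simplified way these structural counts could be gathered, shown as a regex-based sketch (heuristics only, not the workflow's actual parser):
```typescript
import { readFileSync } from 'fs';

// Rough structural metrics for a single spec file.
function analyzeSpec(path: string) {
  const source = readFileSync(path, 'utf8');
  return {
    lines: source.split('\n').length,
    describeBlocks: (source.match(/\bdescribe(\.\w+)?\(/g) ?? []).length,
    testBlocks: (source.match(/\b(it|test)(\.\w+)?\(/g) ?? []).length,
    hardWaits: (source.match(/waitForTimeout\(|\bsleep\(/g) ?? []).length,
    conditionals: (source.match(/\bif\s*\(|\bswitch\s*\(/g) ?? []).length,
    tryCatchBlocks: (source.match(/\btry\s*\{/g) ?? []).length,
  };
}
```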
### Step 3: Quality Criteria Validation
**For Each Enabled Criterion:**
#### BDD Format (if `check_given_when_then: true`)
- [ ] Given-When-Then structure evaluated
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with line numbers
- [ ] Examples of good/bad patterns noted
#### Test IDs (if `check_test_ids: true`)
- [ ] Test ID presence validated
- [ ] Test ID format checked (e.g., 1.3-E2E-001)
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Missing IDs cataloged
#### Priority Markers (if `check_priority_markers: true`)
- [ ] P0/P1/P2/P3 classification validated
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Missing priorities cataloged
#### Hard Waits (if `check_hard_waits: true`)
- [ ] sleep(), waitForTimeout(), hardcoded delays detected
- [ ] Justification comments checked
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with line numbers and recommended fixes
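For reference, the fix usually recommended for a flagged hard wait is a web-first assertion (Playwright sketch; the selector and URL are illustrative):
```typescript
import { test, expect } from '@playwright/test';

test('submits without hard waits', async ({ page }) => {
  await page.goto('https://example.test/form');
  // Anti-pattern this check flags: await page.waitForTimeout(3000);
  // Preferred: assert on observable state, then act.
  await expect(page.getByTestId('submit-button')).toBeEnabled();
  await page.getByTestId('submit-button').click();
});
```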
#### Determinism (if `check_determinism: true`)
- [ ] Conditionals (if/else/switch) detected
- [ ] Try/catch abuse detected
- [ ] Random values (Math.random, Date.now) detected
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes
#### Isolation (if `check_isolation: true`)
- [ ] Cleanup hooks (afterEach/afterAll) validated
- [ ] Shared state detected
- [ ] Global variable mutations detected
- [ ] Resource cleanup verified
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes
#### Fixture Patterns (if `check_fixture_patterns: true`)
- [ ] Fixtures detected (test.extend)
- [ ] Pure functions validated
- [ ] mergeTests usage checked
- [ ] beforeEach complexity analyzed
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes
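The fixture shape these checks look for: `test.extend` wiring a pure setup function with cleanup after `use` (a sketch; the endpoint and entity are illustrative):
```typescript
import { test as base } from '@playwright/test';

// Pure setup function kept separate from fixture wiring.
async function seedTodo(apiURL: string): Promise<{ id: string }> {
  const res = await fetch(`${apiURL}/todos`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ title: 'example' }),
  });
  return res.json();
}

export const test = base.extend<{ todo: { id: string } }>({
  todo: async ({ baseURL }, use) => {
    const todo = await seedTodo(baseURL!); // setup via API
    await use(todo); // hand the entity to the test
    await fetch(`${baseURL}/todos/${todo.id}`, { method: 'DELETE' }); // auto-cleanup
  },
});
```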
#### Data Factories (if `check_data_factories: true`)
- [ ] Factory functions detected
- [ ] Hardcoded data (magic strings/numbers) detected
- [ ] Faker.js or similar usage validated
- [ ] API-first setup pattern checked
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes
#### Network-First (if `check_network_first: true`)
- [ ] page.route() before page.goto() validated
- [ ] Race conditions detected (route after navigate)
- [ ] waitForResponse patterns checked
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes
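A sketch of the route-before-navigate ordering this criterion validates (Playwright; the endpoint and payload are illustrative):
```typescript
import { test, expect } from '@playwright/test';

test('renders profile from stubbed API', async ({ page }) => {
  // Register the intercept BEFORE navigation so the first request cannot race past it.
  await page.route('**/api/profile', (route) =>
    route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify({ name: 'Ada' }),
    }),
  );
  await page.goto('/profile');
  await expect(page.getByTestId('profile-name')).toHaveText('Ada');
});
```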
#### Assertions (if `check_assertions: true`)
- [ ] Explicit assertions counted
- [ ] Implicit waits without assertions detected
- [ ] Assertion specificity validated
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes
#### Test Length (if `check_test_length: true`)
- [ ] File line count calculated
- [ ] Threshold comparison (≤300 lines ideal)
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Splitting recommendations generated (if >300 lines)
#### Test Duration (if `check_test_duration: true`)
- [ ] Test complexity analyzed (as proxy for duration if no execution data)
- [ ] Threshold comparison (≤1.5 min target)
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Optimization recommendations generated
#### Flakiness Patterns (if `check_flakiness_patterns: true`)
- [ ] Tight timeouts detected (e.g., { timeout: 1000 })
- [ ] Race conditions detected
- [ ] Timing-dependent assertions detected
- [ ] Retry logic detected
- [ ] Environment-dependent assumptions detected
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes
---
### Step 4: Quality Score Calculation
**Violation Counting:**
- [ ] Critical (P0) violations counted
- [ ] High (P1) violations counted
- [ ] Medium (P2) violations counted
- [ ] Low (P3) violations counted
- [ ] Violation breakdown by criterion recorded
**Score Calculation:**
- [ ] Starting score: 100
- [ ] Critical violations deducted (-10 each)
- [ ] High violations deducted (-5 each)
- [ ] Medium violations deducted (-2 each)
- [ ] Low violations deducted (-1 each)
- [ ] Bonus points added (max +30):
- [ ] Excellent BDD structure (+5 if applicable)
- [ ] Comprehensive fixtures (+5 if applicable)
- [ ] Comprehensive data factories (+5 if applicable)
- [ ] Network-first pattern (+5 if applicable)
- [ ] Perfect isolation (+5 if applicable)
- [ ] All test IDs present (+5 if applicable)
- [ ] Final score calculated: max(0, min(100, Starting - Violations + Bonus))
**Quality Grade:**
- [ ] Grade assigned based on score:
- 90-100: A+ (Excellent)
- 80-89: A (Good)
- 70-79: B (Acceptable)
- 60-69: C (Needs Improvement)
- <60: F (Critical Issues)
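The scoring and grading rules above as a small helper (a sketch mirroring this checklist's arithmetic; names are illustrative):
```typescript
interface ViolationCounts {
  critical: number;
  high: number;
  medium: number;
  low: number;
}

function qualityScore(v: ViolationCounts, bonus: number): { score: number; grade: string } {
  const deductions = v.critical * 10 + v.high * 5 + v.medium * 2 + v.low * 1;
  const score = Math.max(0, Math.min(100, 100 - deductions + Math.min(bonus, 30)));
  const grade = score >= 90 ? 'A+' : score >= 80 ? 'A' : score >= 70 ? 'B' : score >= 60 ? 'C' : 'F';
  return { score, grade };
}

// Example: 1 critical + 2 high violations with +10 bonus → 100 - 20 + 10 = 90 → 'A+'
qualityScore({ critical: 1, high: 2, medium: 0, low: 0 }, 10);
```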
---
### Step 5: Review Report Generation
**Report Sections Created:**
- [ ] **Header Section**:
- [ ] Test file(s) reviewed listed
- [ ] Review date recorded
- [ ] Review scope noted (single/directory/suite)
- [ ] Quality score and grade displayed
- [ ] **Executive Summary**:
- [ ] Overall assessment (Excellent/Good/Needs Improvement/Critical)
- [ ] Key strengths listed (3-5 bullet points)
- [ ] Key weaknesses listed (3-5 bullet points)
- [ ] Recommendation stated (Approve/Approve with comments/Request changes/Block)
- [ ] **Quality Criteria Assessment**:
- [ ] Table with all criteria evaluated
- [ ] Status for each criterion (PASS/WARN/FAIL)
- [ ] Violation count per criterion
- [ ] **Critical Issues (Must Fix)**:
- [ ] P0/P1 violations listed
- [ ] Code location provided for each (file:line)
- [ ] Issue explanation clear
- [ ] Recommended fix provided with code example
- [ ] Knowledge base reference provided
- [ ] **Recommendations (Should Fix)**:
- [ ] P2/P3 violations listed
- [ ] Code location provided for each (file:line)
- [ ] Issue explanation clear
- [ ] Recommended improvement provided with code example
- [ ] Knowledge base reference provided
- [ ] **Best Practices Examples** (if good patterns found):
- [ ] Good patterns highlighted from tests
- [ ] Knowledge base fragments referenced
- [ ] Examples provided for others to follow
- [ ] **Knowledge Base References**:
- [ ] All fragments consulted listed
- [ ] Links to detailed guidance provided
---
### Step 6: Optional Outputs Generation
**Inline Comments** (if `generate_inline_comments: true`):
- [ ] Inline comments generated at violation locations
- [ ] Comment format: `// TODO (TEA Review): [Issue] - See test-review-{filename}.md`
- [ ] Comments added to test files (no logic changes)
- [ ] Test files remain valid and executable
**Quality Badge** (if `generate_quality_badge: true`):
- [ ] Badge created with quality score (e.g., "Test Quality: 87/100 (A)")
- [ ] Badge format suitable for README or documentation
- [ ] Badge saved to output folder
**Story Update** (if `append_to_story: true` and story file exists):
- [ ] "Test Quality Review" section created
- [ ] Quality score included
- [ ] Critical issues summarized
- [ ] Link to full review report provided
- [ ] Story file updated successfully
---
### Step 7: Save and Notify
**Outputs Saved:**
- [ ] Review report saved to `{output_file}`
- [ ] Inline comments written to test files (if enabled)
- [ ] Quality badge saved (if enabled)
- [ ] Story file updated (if enabled)
- [ ] All outputs are valid and readable
**Summary Message Generated:**
- [ ] Quality score and grade included
- [ ] Critical issue count stated
- [ ] Recommendation provided (Approve/Request changes/Block)
- [ ] Next steps clarified
- [ ] Message displayed to user
---
## Output Validation
### Review Report Completeness
- [ ] All required sections present
- [ ] No placeholder text or TODOs in report
- [ ] All code locations are accurate (file:line)
- [ ] All code examples are valid and demonstrate fix
- [ ] All knowledge base references are correct
### Review Report Accuracy
- [ ] Quality score matches violation breakdown
- [ ] Grade matches score range
- [ ] Violations correctly categorized by severity (P0/P1/P2/P3)
- [ ] Violations correctly attributed to quality criteria
- [ ] No false positives (violations are legitimate issues)
- [ ] No false negatives (critical issues not missed)
### Review Report Clarity
- [ ] Executive summary is clear and actionable
- [ ] Issue explanations are understandable
- [ ] Recommended fixes are implementable
- [ ] Code examples are correct and runnable
- [ ] Recommendation (Approve/Request changes) is clear
---
## Quality Checks
### Knowledge-Based Validation
- [ ] All feedback grounded in knowledge base fragments
- [ ] Recommendations follow proven patterns
- [ ] No arbitrary or opinion-based feedback
- [ ] Knowledge fragment references accurate and relevant
### Actionable Feedback
- [ ] Every issue includes recommended fix
- [ ] Every fix includes code example
- [ ] Code examples demonstrate correct pattern
- [ ] Fixes reference knowledge base for more detail
### Severity Classification
- [ ] Critical (P0) issues are genuinely critical (hard waits, race conditions, no assertions)
- [ ] High (P1) issues impact maintainability/reliability (missing IDs, hardcoded data)
- [ ] Medium (P2) issues are nice-to-have improvements (long files, missing priorities)
- [ ] Low (P3) issues are minor style/preference (verbose tests)
### Context Awareness
- [ ] Review considers project context (some patterns may be justified)
- [ ] Violations with justification comments noted as acceptable
- [ ] Edge cases acknowledged
- [ ] Recommendations are pragmatic, not dogmatic
---
## Integration Points
### Story File Integration
- [ ] Story file discovered correctly (if available)
- [ ] Acceptance criteria extracted and used for context
- [ ] Test quality section appended to story (if enabled)
- [ ] Link to review report added to story
### Test Design Integration
- [ ] Test design document discovered correctly (if available)
- [ ] Priority context (P0/P1/P2/P3) extracted and used
- [ ] Review validates tests align with prioritization
- [ ] Misalignment flagged (e.g., P0 scenario missing tests)
### Knowledge Base Integration
- [ ] tea-index.csv loaded successfully
- [ ] All required fragments loaded
- [ ] Fragments applied correctly to validation
- [ ] Fragment references in report are accurate
---
## Edge Cases and Special Situations
### Empty or Minimal Tests
- [ ] If test file is empty, report notes "No tests found"
- [ ] If test file has only boilerplate, report notes "No meaningful tests"
- [ ] Score reflects lack of content appropriately
### Legacy Tests
- [ ] Legacy tests acknowledged in context
- [ ] Review provides practical recommendations for improvement
- [ ] Recognizes that complete refactor may not be feasible
- [ ] Prioritizes critical issues (flakiness) over style
### Test Framework Variations
- [ ] Review adapts to test framework (Playwright vs Jest vs Cypress)
- [ ] Framework-specific patterns recognized (e.g., Playwright fixtures)
- [ ] Framework-specific violations detected (e.g., Cypress anti-patterns)
- [ ] Knowledge fragments applied appropriately for framework
### Justified Violations
- [ ] Violations with justification comments in code noted as acceptable
- [ ] Justifications evaluated for legitimacy
- [ ] Report acknowledges justified patterns
- [ ] Score not penalized for justified violations
---
## Final Validation
### Review Completeness
- [ ] All enabled quality criteria evaluated
- [ ] All test files in scope reviewed
- [ ] All violations cataloged
- [ ] All recommendations provided
- [ ] Review report is comprehensive
### Review Accuracy
- [ ] Quality score is accurate
- [ ] Violations are correct (no false positives)
- [ ] Critical issues not missed (no false negatives)
- [ ] Code locations are correct
- [ ] Knowledge base references are accurate
### Review Usefulness
- [ ] Feedback is actionable
- [ ] Recommendations are implementable
- [ ] Code examples are correct
- [ ] Review helps developer improve tests
- [ ] Review educates on best practices
### Workflow Complete
- [ ] All checklist items completed
- [ ] All outputs validated and saved
- [ ] User notified with summary
- [ ] Review ready for developer consumption
- [ ] Follow-up actions identified (if any)
---
## Notes
Record any issues, observations, or important context during workflow execution:
- **Test Framework**: [Playwright, Jest, Cypress, etc.]
- **Review Scope**: [single file, directory, full suite]
- **Quality Score**: [0-100 score, letter grade]
- **Critical Issues**: [Count of P0/P1 violations]
- **Recommendation**: [Approve / Approve with comments / Request changes / Block]
- **Special Considerations**: [Legacy code, justified patterns, edge cases]
- **Follow-up Actions**: [Re-review after fixes, pair programming, etc.]

View File

@@ -1,628 +0,0 @@
# Test Quality Review - Instructions v4.0
**Workflow:** `testarch-test-review`
**Purpose:** Review test quality using TEA's comprehensive knowledge base and validate against best practices for maintainability, determinism, isolation, and flakiness prevention
**Agent:** Test Architect (TEA)
**Format:** Pure Markdown v4.0 (no XML blocks)
---
## Overview
This workflow performs comprehensive test quality reviews using TEA's knowledge base of best practices. It validates tests against proven patterns for fixture architecture, network-first safeguards, data factories, determinism, isolation, and flakiness prevention. The review generates actionable feedback with quality scoring.
**Key Capabilities:**
- **Knowledge-Based Review**: Applies patterns from tea-index.csv fragments
- **Quality Scoring**: 0-100 score based on violations and best practices
- **Multi-Scope**: Review single file, directory, or entire test suite
- **Pattern Detection**: Identifies flaky patterns, hard waits, race conditions
- **Best Practice Validation**: BDD format, test IDs, priorities, assertions
- **Actionable Feedback**: Critical issues (must fix) vs recommendations (should fix)
- **Integration**: Works with story files, test-design, acceptance criteria
---
## Prerequisites
**Required:**
- Test file(s) to review (auto-discovered or explicitly provided)
- Test framework configuration (playwright.config.ts, jest.config.js, etc.)
**Recommended:**
- Story file with acceptance criteria (for context)
- Test design document (for priority context)
- Knowledge base fragments available in tea-index.csv
**Halt Conditions:**
- If test file path is invalid or file doesn't exist, halt and request correction
- If test_dir is empty (no tests found), halt and notify user
---
## Workflow Steps
### Step 1: Load Context and Knowledge Base
**Actions:**
1. Check playwright-utils flag:
- Read `{config_source}` and check `config.tea_use_playwright_utils`
2. Load relevant knowledge fragments from `{project-root}/_bmad/bmm/testarch/tea-index.csv`:
**Core Patterns (Always load):**
- `test-quality.md` - Definition of Done: deterministic tests, isolated with cleanup, explicit assertions, <300 lines, <1.5 min (658 lines, 5 examples)
- `data-factories.md` - Factory functions with faker: overrides, nested factories, API-first setup (498 lines, 5 examples)
- `test-levels-framework.md` - E2E vs API vs Component vs Unit appropriateness with decision matrix (467 lines, 4 examples)
- `selective-testing.md` - Duplicate coverage detection with tag-based, spec filter, diff-based selection (727 lines, 4 examples)
- `test-healing-patterns.md` - Common failure patterns: stale selectors, race conditions, dynamic data, network errors, hard waits (648 lines, 5 examples)
- `selector-resilience.md` - Selector best practices (data-testid > ARIA > text > CSS hierarchy, anti-patterns, 541 lines, 4 examples)
- `timing-debugging.md` - Race condition prevention and async debugging techniques (370 lines, 3 examples)
**If `config.tea_use_playwright_utils: true` (All Utilities):**
- `overview.md` - Playwright utils best practices
- `api-request.md` - Validate apiRequest usage patterns
- `network-recorder.md` - Review HAR record/playback implementation
- `auth-session.md` - Check auth token management
- `intercept-network-call.md` - Validate network interception
- `recurse.md` - Review polling patterns
- `log.md` - Check logging best practices
- `file-utils.md` - Validate file operation patterns
- `burn-in.md` - Review burn-in configuration
- `network-error-monitor.md` - Check error monitoring setup
- `fixtures-composition.md` - Validate mergeTests usage
**If `config.tea_use_playwright_utils: false`:**
- `fixture-architecture.md` - Pure function → Fixture → mergeTests composition with auto-cleanup (406 lines, 5 examples)
- `network-first.md` - Route intercept before navigate to prevent race conditions (489 lines, 5 examples)
- `playwright-config.md` - Environment-based configuration with fail-fast validation (722 lines, 5 examples)
- `component-tdd.md` - Red-Green-Refactor patterns with provider isolation (480 lines, 4 examples)
- `ci-burn-in.md` - Flaky test detection with 10-iteration burn-in loop (678 lines, 4 examples)
3. Determine review scope:
- **single**: Review one test file (`test_file_path` provided)
- **directory**: Review all tests in directory (`test_dir` provided)
- **suite**: Review entire test suite (discover all test files)
4. Auto-discover related artifacts (if `auto_discover_story: true`):
- Extract test ID from filename (e.g., `1.3-E2E-001.spec.ts` → story 1.3)
- Search for story file (`story-1.3.md`)
- Search for test design (`test-design-story-1.3.md` or `test-design-epic-1.md`)
5. Read story file for context (if available):
- Extract acceptance criteria
- Extract priority classification
- Extract expected test IDs
**Output:** Complete knowledge base loaded, review scope determined, context gathered
---
### Step 2: Discover and Parse Test Files
**Actions:**
1. **Discover test files** based on scope:
- **single**: Use `test_file_path` variable
- **directory**: Use `glob` to find all test files in `test_dir` (e.g., `*.spec.ts`, `*.test.js`)
- **suite**: Use `glob` to find all test files recursively from project root
2. **Parse test file metadata**:
- File path and name
- File size (warn if >15 KB or >300 lines)
- Test framework detected (Playwright, Jest, Cypress, Vitest, etc.)
- Imports and dependencies
- Test structure (describe/context/it blocks)
3. **Extract test structure**:
- Count of describe blocks (test suites)
- Count of it/test blocks (individual tests)
- Test IDs (if present, e.g., `test.describe('1.3-E2E-001')`)
- Priority markers (if present, e.g., `test.describe.only` for P0)
- BDD structure (Given-When-Then comments or steps)
4. **Identify test patterns**:
- Fixtures used
- Data factories used
- Network interception patterns
- Assertions used (expect, assert, toHaveText, etc.)
- Waits and timeouts (page.waitFor, sleep, hardcoded delays)
- Conditionals (if/else, switch, ternary)
- Try/catch blocks
- Shared state or globals
**Output:** Complete test file inventory with structure and pattern analysis
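As a rough illustration of this discovery and parsing step, a scanner along the following lines could produce that inventory. The glob pattern, the describe/it counting regexes, and the test-ID regex are assumptions made for the sketch, not part of the workflow contract.

```typescript
import { glob } from 'glob';
import { readFileSync } from 'node:fs';

interface TestFileSummary {
  path: string;
  lines: number;
  describeBlocks: number;
  testCases: number;
  hasTestIds: boolean;
}

// Assumed conventions: *.spec.ts / *.test.ts files with Playwright/Jest-style describe/it/test blocks
export async function discoverTests(testDir: string): Promise<TestFileSummary[]> {
  const files = await glob(`${testDir}/**/*.{spec,test}.ts`);
  return files.map((path) => {
    const source = readFileSync(path, 'utf8');
    return {
      path,
      lines: source.split('\n').length,
      describeBlocks: (source.match(/\bdescribe(\.\w+)?\s*\(/g) ?? []).length,
      testCases: (source.match(/\b(?:it|test)(?:\.(?:only|skip|fixme))?\s*\(/g) ?? []).length,
      // Test ID convention used by this workflow, e.g. "1.3-E2E-001"
      hasTestIds: /\d+\.\d+-(?:E2E|API|COMPONENT|UNIT)-\d{3}/.test(source),
    };
  });
}
```

The counts are heuristics; a production implementation would parse the AST rather than match regexes.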
---
### Step 3: Validate Against Quality Criteria
**Actions:**
For each test file, validate against quality criteria (configurable via workflow variables):
#### 1. BDD Format Validation (if `check_given_when_then: true`)
- ✅ **PASS**: Tests use Given-When-Then structure (comments or step organization)
- ⚠️ **WARN**: Tests have some structure but not explicit GWT
- ❌ **FAIL**: Tests lack clear structure, hard to understand intent
**Knowledge Fragment**: test-quality.md, tdd-cycles.md
---
#### 2. Test ID Conventions (if `check_test_ids: true`)
- ✅ **PASS**: Test IDs present and follow convention (e.g., `1.3-E2E-001`, `2.1-API-005`)
- ⚠️ **WARN**: Some test IDs missing or inconsistent
- ❌ **FAIL**: No test IDs, can't trace tests to requirements
**Knowledge Fragment**: traceability.md, test-quality.md
---
#### 3. Priority Markers (if `check_priority_markers: true`)
- ✅ **PASS**: Tests classified as P0/P1/P2/P3 (via markers or test-design reference)
- ⚠️ **WARN**: Some priority classifications missing
- ❌ **FAIL**: No priority classification, can't determine criticality
**Knowledge Fragment**: test-priorities.md, risk-governance.md
---
#### 4. Hard Waits Detection (if `check_hard_waits: true`)
- ✅ **PASS**: No hard waits detected (no `sleep()`, `wait(5000)`, hardcoded delays)
- ⚠️ **WARN**: Some hard waits used but with justification comments
- ❌ **FAIL**: Hard waits detected without justification (flakiness risk)
**Patterns to detect:**
- `sleep(1000)`, `setTimeout()`, `delay()`
- `page.waitForTimeout(5000)` without explicit reason
- `await new Promise(resolve => setTimeout(resolve, 3000))`
**Knowledge Fragment**: test-quality.md, network-first.md
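For illustration, a simple line scan for these forms might look like the sketch below. The regexes and the `justified`/`TEA-allow` comment markers are assumptions made for the example and will not catch every variant.

```typescript
// Heuristic patterns for the hard-wait forms listed above (illustrative, not exhaustive)
const HARD_WAIT_PATTERNS: Array<{ name: string; pattern: RegExp }> = [
  { name: 'sleep()/delay() with a literal duration', pattern: /\b(?:sleep|delay)\s*\(\s*\d+\s*\)/ },
  { name: 'page.waitForTimeout()', pattern: /page\.waitForTimeout\s*\(\s*\d+\s*\)/ },
  { name: 'setTimeout wrapped in a Promise', pattern: /new Promise\s*\(\s*\(?\s*resolve\s*\)?\s*=>\s*setTimeout\s*\(/ },
];

export function findHardWaits(source: string): Array<{ line: number; issue: string }> {
  const findings: Array<{ line: number; issue: string }> = [];
  source.split('\n').forEach((text, index) => {
    // Hypothetical justification marker: a commented reason on the same line downgrades the finding
    if (/\/\/.*(?:justified|TEA-allow)/i.test(text)) return;
    for (const { name, pattern } of HARD_WAIT_PATTERNS) {
      if (pattern.test(text)) findings.push({ line: index + 1, issue: name });
    }
  });
  return findings;
}
```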
---
#### 5. Determinism Check (if `check_determinism: true`)
- ✅ **PASS**: Tests are deterministic (no conditionals, no try/catch abuse, no random values)
- ⚠️ **WARN**: Some conditionals but with clear justification
- ❌ **FAIL**: Tests use if/else, switch, or try/catch to control flow (flakiness risk)
**Patterns to detect:**
- `if (condition) { test logic }` - tests should work deterministically
- `try { test } catch { fallback }` - tests shouldn't swallow errors
- `Math.random()`, `Date.now()` without factory abstraction
**Knowledge Fragment**: test-quality.md, data-factories.md
---
#### 6. Isolation Validation (if `check_isolation: true`)
- ✅ **PASS**: Tests clean up resources, no shared state, can run in any order
- ⚠️ **WARN**: Some cleanup missing but isolated enough
- ❌ **FAIL**: Tests share state, depend on execution order, leave resources behind
**Patterns to check:**
- afterEach/afterAll cleanup hooks present
- No global variables mutated
- Database/API state cleaned up after tests
- Test data deleted or marked inactive
**Knowledge Fragment**: test-quality.md, data-factories.md
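A minimal sketch of a self-cleaning Playwright test that would satisfy these checks. The `/api/users` endpoints and data-testid values are hypothetical placeholders, and a configured `baseURL` is assumed.

```typescript
import { test, expect } from '@playwright/test';

test.describe('profile update', () => {
  let createdUserId: string | undefined;

  test.afterEach(async ({ request }) => {
    // Clean up any data this test created so the suite can run in any order
    if (createdUserId) {
      await request.delete(`/api/users/${createdUserId}`); // hypothetical endpoint
      createdUserId = undefined;
    }
  });

  test('user can update display name', async ({ page, request }) => {
    // API-first setup: create the record via API, exercise it via UI
    const response = await request.post('/api/users', { data: { name: 'Temp User' } }); // hypothetical endpoint
    createdUserId = (await response.json()).id;

    await page.goto(`/users/${createdUserId}/edit`);
    await page.getByTestId('display-name-input').fill('Renamed User');
    await page.getByTestId('save-button').click();
    await expect(page.getByTestId('profile-name')).toHaveText('Renamed User');
  });
});
```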
---
#### 7. Fixture Patterns (if `check_fixture_patterns: true`)
- ✅ **PASS**: Uses pure function → Fixture → mergeTests pattern
- ⚠️ **WARN**: Some fixtures used but not consistently
- ❌ **FAIL**: No fixtures, tests repeat setup code (maintainability risk)
**Patterns to check:**
- Fixtures defined (e.g., `test.extend({ customFixture: async ({}, use) => { ... }})`)
- Pure functions used for fixture logic
- mergeTests used to combine fixtures
- No beforeEach with complex setup (should be in fixtures)
**Knowledge Fragment**: fixture-architecture.md
---
#### 8. Data Factories (if `check_data_factories: true`)
- ✅ **PASS**: Uses factory functions with overrides, API-first setup
- ⚠️ **WARN**: Some factories used but also hardcoded data
- ❌ **FAIL**: Hardcoded test data, magic strings/numbers (maintainability risk)
**Patterns to check:**
- Factory functions defined (e.g., `createUser()`, `generateInvoice()`)
- Factories use faker.js or similar for realistic data
- Factories accept overrides (e.g., `createUser({ email: 'custom@example.com' })`)
- API-first setup (create via API, test via UI)
**Knowledge Fragment**: data-factories.md
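A sketch of what a factory meeting these checks could look like, and of how a helper such as the `createTestUser` used in the example review later in this document might be defined. The field names and faker calls are illustrative assumptions.

```typescript
import { faker } from '@faker-js/faker';

interface TestUser {
  email: string;
  password: string;
  role: 'admin' | 'member';
}

// Factory with realistic random defaults plus caller overrides
export function createTestUser(overrides: Partial<TestUser> = {}): TestUser {
  return {
    email: faker.internet.email(),
    password: faker.internet.password({ length: 16 }),
    role: 'member',
    ...overrides,
  };
}

// Usage: const admin = createTestUser({ role: 'admin' });
```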
---
#### 9. Network-First Pattern (if `check_network_first: true`)
- ✅ **PASS**: Route interception set up BEFORE navigation (race condition prevention)
- ⚠️ **WARN**: Some routes intercepted correctly, others after navigation
- ❌ **FAIL**: Route interception after navigation (race condition risk)
**Patterns to check:**
- `page.route()` called before `page.goto()`
- `page.waitForResponse()` used with explicit URL pattern
- No navigation followed immediately by route setup
**Knowledge Fragment**: network-first.md
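A hedged Playwright sketch of the ordering this check enforces; the route pattern, response body, and data-testid are placeholders.

```typescript
import { test, expect } from '@playwright/test';

test('dashboard renders stubbed metrics', async ({ page }) => {
  // Network-first: register the intercept BEFORE navigation so the first request can never race past the stub
  await page.route('**/api/metrics', (route) =>
    route.fulfill({ status: 200, contentType: 'application/json', body: JSON.stringify({ visits: 42 }) }),
  );

  const metricsResponse = page.waitForResponse('**/api/metrics');
  await page.goto('/dashboard');
  await metricsResponse;

  await expect(page.getByTestId('visits-count')).toHaveText('42');
});
```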
---
#### 10. Assertions (if `check_assertions: true`)
- ✅ **PASS**: Explicit assertions present (expect, assert, toHaveText)
- ⚠️ **WARN**: Some tests rely on implicit waits instead of assertions
- ❌ **FAIL**: Missing assertions, tests don't verify behavior
**Patterns to check:**
- Each test has at least one assertion
- Assertions are specific (not just truthy checks)
- Assertions use framework-provided matchers (toHaveText, toBeVisible)
**Knowledge Fragment**: test-quality.md
---
#### 11. Test Length (if `check_test_length: true`)
- ✅ **PASS**: Test file ≤200 lines (ideal), ≤300 lines (acceptable)
- ⚠️ **WARN**: Test file 301-500 lines (consider splitting)
- ❌ **FAIL**: Test file >500 lines (too large, maintainability risk)
**Knowledge Fragment**: test-quality.md
---
#### 12. Test Duration (if `check_test_duration: true`)
- ✅ **PASS**: Individual tests ≤1.5 minutes (target: <30 seconds)
- ⚠️ **WARN**: Some tests 1.5-3 minutes (consider optimization)
- ❌ **FAIL**: Tests >3 minutes (too slow, impacts CI/CD)
**Note:** Duration estimation based on complexity analysis if execution data unavailable
**Knowledge Fragment**: test-quality.md, selective-testing.md
---
#### 13. Flakiness Patterns (if `check_flakiness_patterns: true`)
- ✅ **PASS**: No known flaky patterns detected
- ⚠️ **WARN**: Some potential flaky patterns (e.g., tight timeouts, race conditions)
- ❌ **FAIL**: Multiple flaky patterns detected (high flakiness risk)
**Patterns to detect:**
- Tight timeouts (e.g., `{ timeout: 1000 }`)
- Race conditions (navigation before route interception)
- Timing-dependent assertions (e.g., checking timestamps)
- Retry logic in tests (hides flakiness)
- Environment-dependent assumptions (hardcoded URLs, ports)
**Knowledge Fragment**: test-quality.md, network-first.md, ci-burn-in.md
---
### Step 4: Calculate Quality Score
**Actions:**
1. **Count violations** by severity:
- **Critical (P0)**: Hard waits without justification, no assertions, race conditions, shared state
- **High (P1)**: Missing test IDs, no BDD structure, hardcoded data, missing fixtures
- **Medium (P2)**: Long test files (>300 lines), missing priorities, some conditionals
- **Low (P3)**: Minor style issues, incomplete cleanup, verbose tests
2. **Calculate quality score** (if `quality_score_enabled: true`):
```
Starting Score: 100
Critical Violations: -10 points each
High Violations: -5 points each
Medium Violations: -2 points each
Low Violations: -1 point each
Bonus Points:
+ Excellent BDD structure: +5
+ Comprehensive fixtures: +5
+ Comprehensive data factories: +5
+ Network-first pattern: +5
+ Perfect isolation: +5
+ All test IDs present: +5
Quality Score: max(0, min(100, Starting Score - Violations + Bonus))
```
3. **Quality Grade**:
- **90-100**: Excellent (A+)
- **80-89**: Good (A)
- **70-79**: Acceptable (B)
- **60-69**: Needs Improvement (C)
- **<60**: Critical Issues (F)
**Output:** Quality score calculated with violation breakdown
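The formula above reduces to a small pure function, sketched here under the assumption that violation counts and earned bonuses have already been tallied.

```typescript
type Severity = 'critical' | 'high' | 'medium' | 'low';

// Point deductions per violation, mirroring the formula above
const DEDUCTIONS: Record<Severity, number> = { critical: 10, high: 5, medium: 2, low: 1 };

export function calculateQualityScore(violations: Partial<Record<Severity, number>>, bonus: number): number {
  const deducted = (Object.keys(DEDUCTIONS) as Severity[]).reduce(
    (total, severity) => total + (violations[severity] ?? 0) * DEDUCTIONS[severity],
    0,
  );
  return Math.max(0, Math.min(100, 100 - deducted + bonus));
}

export function gradeFor(score: number): string {
  if (score >= 90) return 'Excellent (A+)';
  if (score >= 80) return 'Good (A)';
  if (score >= 70) return 'Acceptable (B)';
  if (score >= 60) return 'Needs Improvement (C)';
  return 'Critical Issues (F)';
}

// e.g. 1 critical, 2 high, 1 low with a +10 bonus: 100 - 21 + 10 = 89 -> "Good (A)"
```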
---
### Step 5: Generate Review Report
**Actions:**
1. **Create review report** using `test-review-template.md`:
**Header Section:**
- Test file(s) reviewed
- Review date
- Review scope (single/directory/suite)
- Quality score and grade
**Executive Summary:**
- Overall assessment (Excellent/Good/Needs Improvement/Critical)
- Key strengths
- Key weaknesses
- Recommendation (Approve/Approve with comments/Request changes)
**Quality Criteria Assessment:**
- Table with all criteria evaluated
- Status for each (PASS/WARN/FAIL)
- Violation count per criterion
**Critical Issues (Must Fix):**
- Priority P0/P1 violations
- Code location (file:line)
- Explanation of issue
- Recommended fix
- Knowledge base reference
**Recommendations (Should Fix):**
- Priority P2/P3 violations
- Code location (file:line)
- Explanation of issue
- Recommended improvement
- Knowledge base reference
**Best Practices Examples:**
- Highlight good patterns found in tests
- Reference knowledge base fragments
- Provide examples for others to follow
**Knowledge Base References:**
- List all fragments consulted
- Provide links to detailed guidance
2. **Generate inline comments** (if `generate_inline_comments: true`):
- Add TODO comments in test files at violation locations
- Format: `// TODO (TEA Review): [Issue description] - See test-review-{filename}.md`
- Never modify test logic, only add comments
3. **Generate quality badge** (if `generate_quality_badge: true`):
- Create badge with quality score (e.g., "Test Quality: 87/100 (A)")
- Format for inclusion in README or documentation
4. **Append to story file** (if `append_to_story: true` and story file exists):
- Add "Test Quality Review" section to story
- Include quality score and critical issues
- Link to full review report
**Output:** Comprehensive review report with actionable feedback
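The inline-comment option described in item 2 above could be implemented roughly as follows. The `Violation` shape is an assumption, and insertions are applied bottom-up so earlier line numbers stay valid while the test logic is left untouched.

```typescript
import { readFileSync, writeFileSync } from 'node:fs';

interface Violation {
  file: string;
  line: number; // 1-based line of the offending code
  description: string;
}

export function addReviewComments(violations: Violation[], reportName: string): void {
  const byFile = new Map<string, Violation[]>();
  for (const violation of violations) {
    byFile.set(violation.file, [...(byFile.get(violation.file) ?? []), violation]);
  }

  for (const [file, fileViolations] of byFile) {
    const lines = readFileSync(file, 'utf8').split('\n');
    // Insert from the bottom up so recorded line numbers remain correct
    for (const v of [...fileViolations].sort((a, b) => b.line - a.line)) {
      lines.splice(v.line - 1, 0, `// TODO (TEA Review): ${v.description} - See ${reportName}`);
    }
    writeFileSync(file, lines.join('\n'));
  }
}
```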
---
### Step 6: Save Outputs and Notify
**Actions:**
1. **Save review report** to `{output_file}`
2. **Save inline comments** to test files (if enabled)
3. **Save quality badge** to output folder (if enabled)
4. **Update story file** (if enabled)
5. **Generate summary message** for user:
- Quality score and grade
- Critical issue count
- Recommendation
**Output:** All review artifacts saved and user notified
---
## Quality Criteria Decision Matrix
| Criterion | PASS | WARN | FAIL | Knowledge Fragment |
| ------------------ | ------------------------- | -------------- | ------------------- | ----------------------- |
| BDD Format | Given-When-Then present | Some structure | No structure | test-quality.md |
| Test IDs | All tests have IDs | Some missing | No IDs | traceability.md |
| Priority Markers | All classified | Some missing | No classification | test-priorities.md |
| Hard Waits | No hard waits | Some justified | Hard waits present | test-quality.md |
| Determinism | No conditionals/random | Some justified | Conditionals/random | test-quality.md |
| Isolation | Clean up, no shared state | Some gaps | Shared state | test-quality.md |
| Fixture Patterns | Pure fn → Fixture | Some fixtures | No fixtures | fixture-architecture.md |
| Data Factories | Factory functions | Some factories | Hardcoded data | data-factories.md |
| Network-First | Intercept before navigate | Some correct | Race conditions | network-first.md |
| Assertions | Explicit assertions | Some implicit | Missing assertions | test-quality.md |
| Test Length | ≤300 lines | 301-500 lines | >500 lines | test-quality.md |
| Test Duration | ≤1.5 min | 1.5-3 min | >3 min | test-quality.md |
| Flakiness Patterns | No flaky patterns | Some potential | Multiple patterns | ci-burn-in.md |
---
## Example Review Summary
````markdown
# Test Quality Review: auth-login.spec.ts
**Quality Score**: 89/100 (A - Good)
**Review Date**: 2025-10-14
**Recommendation**: Approve with Comments
## Executive Summary
Overall, the test demonstrates good structure and coverage of the login flow. However, there are several areas for improvement to enhance maintainability and prevent flakiness.
**Strengths:**
- Excellent BDD structure with clear Given-When-Then comments
- Good use of test IDs (1.3-E2E-001, 1.3-E2E-002)
- Comprehensive assertions on authentication state
**Weaknesses:**
- Hard wait detected (page.waitForTimeout(2000)) - flakiness risk
- Hardcoded test data (email: 'test@example.com') - use factories instead
- Missing fixture for common login setup - DRY violation
**Recommendation**: Address critical issue (hard wait) before merging. Other improvements can be addressed in follow-up PR.
## Critical Issues (Must Fix)
### 1. Hard Wait Detected (Line 45)
**Severity**: P0 (Critical)
**Issue**: `await page.waitForTimeout(2000)` introduces flakiness
**Fix**: Use explicit wait for element or network request instead
**Knowledge**: See test-quality.md, network-first.md
```typescript
// ❌ Bad (current)
await page.waitForTimeout(2000);
await expect(page.locator('[data-testid="user-menu"]')).toBeVisible();
// ✅ Good (recommended)
await expect(page.locator('[data-testid="user-menu"]')).toBeVisible({ timeout: 10000 });
```
## Recommendations (Should Fix)
### 1. Use Data Factory for Test User (Lines 23, 32, 41)
**Severity**: P1 (High)
**Issue**: Hardcoded email `test@example.com` - maintainability risk
**Fix**: Create factory function for test users
**Knowledge**: See data-factories.md
```typescript
// ✅ Good (recommended)
import { createTestUser } from './factories/user-factory';
const testUser = createTestUser({ role: 'admin' });
await loginPage.login(testUser.email, testUser.password);
```
### 2. Extract Login Setup to Fixture (Lines 18-28)
**Severity**: P1 (High)
**Issue**: Login setup repeated across tests - DRY violation
**Fix**: Create fixture for authenticated state
**Knowledge**: See fixture-architecture.md
```typescript
// ✅ Good (recommended)
const test = base.extend({
authenticatedPage: async ({ page }, use) => {
const user = createTestUser();
await loginPage.login(user.email, user.password);
await use(page);
},
});
test('user can access dashboard', async ({ authenticatedPage }) => {
// Test starts already logged in
});
```
## Quality Score Breakdown
- Starting Score: 100
- Critical Violations (1 × -10): -10
- High Violations (2 × -5): -10
- Medium Violations (0 × -2): 0
- Low Violations (1 × -1): -1
- Bonus (BDD +5, Test IDs +5): +10
- **Final Score**: 89/100 (A)
````
---
## Integration with Other Workflows
### Before Test Review
- **atdd**: Generate acceptance tests (TEA reviews them for quality)
- **automate**: Expand regression suite (TEA reviews new tests)
- **dev story**: Developer writes implementation tests (TEA reviews them)
### After Test Review
- **Developer**: Addresses critical issues, improves based on recommendations
- **gate**: Test quality review feeds into gate decision (high-quality tests increase confidence)
### Coordinates With
- **Story File**: Review links to acceptance criteria context
- **Test Design**: Review validates tests align with prioritization
- **Knowledge Base**: Review references fragments for detailed guidance
---
## Important Notes
1. **Non-Prescriptive**: Review provides guidance, not rigid rules
2. **Context Matters**: Some violations may be justified for specific scenarios
3. **Knowledge-Based**: All feedback grounded in proven patterns from tea-index.csv
4. **Actionable**: Every issue includes recommended fix with code examples
5. **Quality Score**: Use as indicator, not absolute measure
6. **Continuous Improvement**: Review same tests periodically as patterns evolve
---
## Troubleshooting
**Problem: No test files found**
- Verify test_dir path is correct
- Check test file extensions match glob pattern
- Ensure test files exist in expected location
**Problem: Quality score seems too low/high**
- Review violation counts - may need to adjust thresholds
- Consider context - some projects have different standards
- Focus on critical issues first, not just score
**Problem: Inline comments not generated**
- Check generate_inline_comments: true in variables
- Verify write permissions on test files
- Review append_to_file: false (separate report mode)
**Problem: Knowledge fragments not loading**
- Verify tea-index.csv exists in testarch/ directory
- Check fragment file paths are correct
- Ensure auto_load_knowledge: true in variables

View File

@@ -1,390 +0,0 @@
# Test Quality Review: {test_filename}
**Quality Score**: {score}/100 ({grade} - {assessment})
**Review Date**: {YYYY-MM-DD}
**Review Scope**: {single | directory | suite}
**Reviewer**: {user_name or TEA Agent}
---
Note: This review audits existing tests; it does not generate tests.
## Executive Summary
**Overall Assessment**: {Excellent | Good | Acceptable | Needs Improvement | Critical Issues}
**Recommendation**: {Approve | Approve with Comments | Request Changes | Block}
### Key Strengths
✅ {strength_1}
✅ {strength_2}
✅ {strength_3}
### Key Weaknesses
❌ {weakness_1}
❌ {weakness_2}
❌ {weakness_3}
### Summary
{1-2 paragraph summary of overall test quality, highlighting major findings and recommendation rationale}
---
## Quality Criteria Assessment
| Criterion | Status | Violations | Notes |
| ------------------------------------ | ------------------------------- | ---------- | ------------ |
| BDD Format (Given-When-Then) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Test IDs | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Priority Markers (P0/P1/P2/P3) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Hard Waits (sleep, waitForTimeout) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Determinism (no conditionals) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Isolation (cleanup, no shared state) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Fixture Patterns | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Data Factories | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Network-First Pattern | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Explicit Assertions | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Test Length (≤300 lines) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {lines} | {brief_note} |
| Test Duration (≤1.5 min) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {duration} | {brief_note} |
| Flakiness Patterns | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
**Total Violations**: {critical_count} Critical, {high_count} High, {medium_count} Medium, {low_count} Low
---
## Quality Score Breakdown
```
Starting Score: 100
Critical Violations: -{critical_count} × 10 = -{critical_deduction}
High Violations: -{high_count} × 5 = -{high_deduction}
Medium Violations: -{medium_count} × 2 = -{medium_deduction}
Low Violations: -{low_count} × 1 = -{low_deduction}
Bonus Points:
Excellent BDD: +{0|5}
Comprehensive Fixtures: +{0|5}
Data Factories: +{0|5}
Network-First: +{0|5}
Perfect Isolation: +{0|5}
All Test IDs: +{0|5}
--------
Total Bonus: +{bonus_total}
Final Score: {final_score}/100
Grade: {grade}
```
---
## Critical Issues (Must Fix)
{If no critical issues: "No critical issues detected. ✅"}
{For each critical issue:}
### {issue_number}. {Issue Title}
**Severity**: P0 (Critical)
**Location**: `{filename}:{line_number}`
**Criterion**: {criterion_name}
**Knowledge Base**: [{fragment_name}]({fragment_path})
**Issue Description**:
{Detailed explanation of what the problem is and why it's critical}
**Current Code**:
```typescript
// ❌ Bad (current implementation)
{
code_snippet_showing_problem;
}
```
**Recommended Fix**:
```typescript
// ✅ Good (recommended approach)
{
code_snippet_showing_solution;
}
```
**Why This Matters**:
{Explanation of impact - flakiness risk, maintainability, reliability}
**Related Violations**:
{If similar issue appears elsewhere, note line numbers}
---
## Recommendations (Should Fix)
{If no recommendations: "No additional recommendations. Test quality is excellent. ✅"}
{For each recommendation:}
### {rec_number}. {Recommendation Title}
**Severity**: {P1 (High) | P2 (Medium) | P3 (Low)}
**Location**: `{filename}:{line_number}`
**Criterion**: {criterion_name}
**Knowledge Base**: [{fragment_name}]({fragment_path})
**Issue Description**:
{Detailed explanation of what could be improved and why}
**Current Code**:
```typescript
// ⚠️ Could be improved (current implementation)
{
code_snippet_showing_current_approach;
}
```
**Recommended Improvement**:
```typescript
// ✅ Better approach (recommended)
{
code_snippet_showing_improvement;
}
```
**Benefits**:
{Explanation of benefits - maintainability, readability, reusability}
**Priority**:
{Why this is P1/P2/P3 - urgency and impact}
---
## Best Practices Found
{If good patterns found, highlight them}
{For each best practice:}
### {practice_number}. {Best Practice Title}
**Location**: `{filename}:{line_number}`
**Pattern**: {pattern_name}
**Knowledge Base**: [{fragment_name}]({fragment_path})
**Why This Is Good**:
{Explanation of why this pattern is excellent}
**Code Example**:
```typescript
// ✅ Excellent pattern demonstrated in this test
{
code_snippet_showing_best_practice;
}
```
**Use as Reference**:
{Encourage using this pattern in other tests}
---
## Test File Analysis
### File Metadata
- **File Path**: `{relative_path_from_project_root}`
- **File Size**: {line_count} lines, {kb_size} KB
- **Test Framework**: {Playwright | Jest | Cypress | Vitest | Other}
- **Language**: {TypeScript | JavaScript}
### Test Structure
- **Describe Blocks**: {describe_count}
- **Test Cases (it/test)**: {test_count}
- **Average Test Length**: {avg_lines_per_test} lines per test
- **Fixtures Used**: {fixture_count} ({fixture_names})
- **Data Factories Used**: {factory_count} ({factory_names})
### Test Coverage Scope
- **Test IDs**: {test_id_list}
- **Priority Distribution**:
- P0 (Critical): {p0_count} tests
- P1 (High): {p1_count} tests
- P2 (Medium): {p2_count} tests
- P3 (Low): {p3_count} tests
- Unknown: {unknown_count} tests
### Assertions Analysis
- **Total Assertions**: {assertion_count}
- **Assertions per Test**: {avg_assertions_per_test} (avg)
- **Assertion Types**: {assertion_types_used}
---
## Context and Integration
### Related Artifacts
{If story file found:}
- **Story File**: [{story_filename}]({story_path})
- **Acceptance Criteria Mapped**: {ac_mapped}/{ac_total} ({ac_coverage}%)
{If test-design found:}
- **Test Design**: [{test_design_filename}]({test_design_path})
- **Risk Assessment**: {risk_level}
- **Priority Framework**: P0-P3 applied
### Acceptance Criteria Validation
{If story file available, map tests to ACs:}
| Acceptance Criterion | Test ID | Status | Notes |
| -------------------- | --------- | -------------------------- | ------- |
| {AC_1} | {test_id} | {✅ Covered \| ❌ Missing} | {notes} |
| {AC_2} | {test_id} | {✅ Covered \| ❌ Missing} | {notes} |
| {AC_3} | {test_id} | {✅ Covered \| ❌ Missing} | {notes} |
**Coverage**: {covered_count}/{total_count} criteria covered ({coverage_percentage}%)
---
## Knowledge Base References
This review consulted the following knowledge base fragments:
- **[test-quality.md](../../../testarch/knowledge/test-quality.md)** - Definition of Done for tests (no hard waits, <300 lines, <1.5 min, self-cleaning)
- **[fixture-architecture.md](../../../testarch/knowledge/fixture-architecture.md)** - Pure function → Fixture → mergeTests pattern
- **[network-first.md](../../../testarch/knowledge/network-first.md)** - Route intercept before navigate (race condition prevention)
- **[data-factories.md](../../../testarch/knowledge/data-factories.md)** - Factory functions with overrides, API-first setup
- **[test-levels-framework.md](../../../testarch/knowledge/test-levels-framework.md)** - E2E vs API vs Component vs Unit appropriateness
- **[tdd-cycles.md](../../../testarch/knowledge/tdd-cycles.md)** - Red-Green-Refactor patterns
- **[selective-testing.md](../../../testarch/knowledge/selective-testing.md)** - Duplicate coverage detection
- **[ci-burn-in.md](../../../testarch/knowledge/ci-burn-in.md)** - Flakiness detection patterns (10-iteration loop)
- **[test-priorities.md](../../../testarch/knowledge/test-priorities.md)** - P0/P1/P2/P3 classification framework
- **[traceability.md](../../../testarch/knowledge/traceability.md)** - Requirements-to-tests mapping
See [tea-index.csv](../../../testarch/tea-index.csv) for complete knowledge base.
---
## Next Steps
### Immediate Actions (Before Merge)
1. **{action_1}** - {description}
- Priority: {P0 | P1 | P2}
- Owner: {team_or_person}
- Estimated Effort: {time_estimate}
2. **{action_2}** - {description}
- Priority: {P0 | P1 | P2}
- Owner: {team_or_person}
- Estimated Effort: {time_estimate}
### Follow-up Actions (Future PRs)
1. **{action_1}** - {description}
- Priority: {P2 | P3}
- Target: {next_sprint | backlog}
2. **{action_2}** - {description}
- Priority: {P2 | P3}
- Target: {next_sprint | backlog}
### Re-Review Needed?
{✅ No re-review needed - approve as-is}
{⚠ Re-review after critical fixes - request changes, then re-review}
{❌ Major refactor required - block merge, pair programming recommended}
---
## Decision
**Recommendation**: {Approve | Approve with Comments | Request Changes | Block}
**Rationale**:
{1-2 paragraph explanation of recommendation based on findings}
**For Approve**:
> Test quality is excellent/good with {score}/100 score. {Minor issues noted can be addressed in follow-up PRs.} Tests are production-ready and follow best practices.
**For Approve with Comments**:
> Test quality is acceptable with {score}/100 score. {High-priority recommendations should be addressed but don't block merge.} Critical issues resolved, but improvements would enhance maintainability.
**For Request Changes**:
> Test quality needs improvement with {score}/100 score. {Critical issues must be fixed before merge.} {X} critical violations detected that pose flakiness/maintainability risks.
**For Block**:
> Test quality is insufficient with {score}/100 score. {Multiple critical issues make tests unsuitable for production.} Recommend pairing session with QA engineer to apply patterns from knowledge base.
---
## Appendix
### Violation Summary by Location
{Table of all violations sorted by line number:}
| Line | Severity | Criterion | Issue | Fix |
| ------ | ------------- | ----------- | ------------- | ----------- |
| {line} | {P0/P1/P2/P3} | {criterion} | {brief_issue} | {brief_fix} |
| {line} | {P0/P1/P2/P3} | {criterion} | {brief_issue} | {brief_fix} |
### Quality Trends
{If reviewing same file multiple times, show trend:}
| Review Date | Score | Grade | Critical Issues | Trend |
| ------------ | ------------- | --------- | --------------- | ----------- |
| {YYYY-MM-DD} | {score_1}/100 | {grade_1} | {count_1} | Improved |
| {YYYY-MM-DD} | {score_2}/100 | {grade_2} | {count_2} | Declined |
| {YYYY-MM-DD} | {score_3}/100 | {grade_3} | {count_3} | Stable |
### Related Reviews
{If reviewing multiple files in directory/suite:}
| File | Score | Grade | Critical | Status |
| -------- | ----------- | ------- | -------- | ------------------ |
| {file_1} | {score}/100 | {grade} | {count} | {Approved/Blocked} |
| {file_2} | {score}/100 | {grade} | {count} | {Approved/Blocked} |
| {file_3} | {score}/100 | {grade} | {count} | {Approved/Blocked} |
**Suite Average**: {avg_score}/100 ({avg_grade})
---
## Review Metadata
**Generated By**: BMad TEA Agent (Test Architect)
**Workflow**: testarch-test-review v4.0
**Review ID**: test-review-{filename}-{YYYYMMDD}
**Timestamp**: {YYYY-MM-DD HH:MM:SS}
**Version**: 1.0
---
## Feedback on This Review
If you have questions or feedback on this review:
1. Review patterns in knowledge base: `testarch/knowledge/`
2. Consult tea-index.csv for detailed guidance
3. Request clarification on specific violations
4. Pair with QA engineer to apply patterns
This review is guidance, not rigid rules. Context matters - if a pattern is justified, document it with a comment.

View File

@@ -1,46 +0,0 @@
# Test Architect workflow: test-review
name: testarch-test-review
description: "Review test quality using comprehensive knowledge base and best practices validation"
author: "BMad"
# Critical variables from config
config_source: "{project-root}/_bmad/bmm/config.yaml"
output_folder: "{config_source}:output_folder"
user_name: "{config_source}:user_name"
communication_language: "{config_source}:communication_language"
document_output_language: "{config_source}:document_output_language"
date: system-generated
# Workflow components
installed_path: "{project-root}/_bmad/bmm/workflows/testarch/test-review"
instructions: "{installed_path}/instructions.md"
validation: "{installed_path}/checklist.md"
template: "{installed_path}/test-review-template.md"
# Variables and inputs
variables:
test_dir: "{project-root}/tests" # Root test directory
review_scope: "single" # single (one file), directory (folder), suite (all tests)
# Output configuration
default_output_file: "{output_folder}/test-review.md"
# Required tools
required_tools:
- read_file # Read test files, story, test-design
- write_file # Create review report
- list_files # Discover test files in directory
- search_repo # Find tests by patterns
- glob # Find test files matching patterns
tags:
- qa
- test-architect
- code-review
- quality
- best-practices
execution_hints:
interactive: false # Minimize prompts
autonomous: true # Proceed without user input unless blocked
iterative: true # Can review multiple files

View File

@@ -1,655 +0,0 @@
# Requirements Traceability & Gate Decision - Validation Checklist
**Workflow:** `testarch-trace`
**Purpose:** Ensure complete traceability matrix with actionable gap analysis AND make deployment readiness decision (PASS/CONCERNS/FAIL/WAIVED)
This checklist covers **two sequential phases**:
- **PHASE 1**: Requirements Traceability (always executed)
- **PHASE 2**: Quality Gate Decision (executed if `enable_gate_decision: true`)
---
# PHASE 1: REQUIREMENTS TRACEABILITY
## Prerequisites Validation
- [ ] Acceptance criteria are available (from story file OR inline)
- [ ] Test suite exists (or gaps are acknowledged and documented)
- [ ] If tests are missing, recommend `*atdd` (trace does not run it automatically)
- [ ] Test directory path is correct (`test_dir` variable)
- [ ] Story file is accessible (if using BMad mode)
- [ ] Knowledge base is loaded (test-priorities, traceability, risk-governance)
---
## Context Loading
- [ ] Story file read successfully (if applicable)
- [ ] Acceptance criteria extracted correctly
- [ ] Story ID identified (e.g., 1.3)
- [ ] `test-design.md` loaded (if available)
- [ ] `tech-spec.md` loaded (if available)
- [ ] `PRD.md` loaded (if available)
- [ ] Relevant knowledge fragments loaded from `tea-index.csv`
---
## Test Discovery and Cataloging
- [ ] Tests auto-discovered using multiple strategies (test IDs, describe blocks, file paths)
- [ ] Tests categorized by level (E2E, API, Component, Unit)
- [ ] Test metadata extracted:
- [ ] Test IDs (e.g., 1.3-E2E-001)
- [ ] Describe/context blocks
- [ ] It blocks (individual test cases)
- [ ] Given-When-Then structure (if BDD)
- [ ] Priority markers (P0/P1/P2/P3)
- [ ] All relevant test files found (no tests missed due to naming conventions)
---
## Criteria-to-Test Mapping
- [ ] Each acceptance criterion mapped to tests (or marked as NONE)
- [ ] Explicit references found (test IDs, describe blocks mentioning criterion)
- [ ] Test level documented (E2E, API, Component, Unit)
- [ ] Given-When-Then narrative verified for alignment
- [ ] Traceability matrix table generated:
- [ ] Criterion ID
- [ ] Description
- [ ] Test ID
- [ ] Test File
- [ ] Test Level
- [ ] Coverage Status
---
## Coverage Classification
- [ ] Coverage status classified for each criterion:
- [ ] **FULL** - All scenarios validated at appropriate level(s)
- [ ] **PARTIAL** - Some coverage but missing edge cases or levels
- [ ] **NONE** - No test coverage at any level
- [ ] **UNIT-ONLY** - Only unit tests (missing integration/E2E validation)
- [ ] **INTEGRATION-ONLY** - Only API/Component tests (missing unit confidence)
- [ ] Classification justifications provided
- [ ] Edge cases considered in FULL vs PARTIAL determination
---
## Duplicate Coverage Detection
- [ ] Duplicate coverage checked across test levels
- [ ] Acceptable overlap identified (defense in depth for critical paths)
- [ ] Unacceptable duplication flagged (same validation at multiple levels)
- [ ] Recommendations provided for consolidation
- [ ] Selective testing principles applied
---
## Gap Analysis
- [ ] Coverage gaps identified:
- [ ] Criteria with NONE status
- [ ] Criteria with PARTIAL status
- [ ] Criteria with UNIT-ONLY status
- [ ] Criteria with INTEGRATION-ONLY status
- [ ] Gaps prioritized by risk level using test-priorities framework:
- [ ] **CRITICAL** - P0 criteria without FULL coverage (BLOCKER)
- [ ] **HIGH** - P1 criteria without FULL coverage (PR blocker)
- [ ] **MEDIUM** - P2 criteria without FULL coverage (nightly gap)
- [ ] **LOW** - P3 criteria without FULL coverage (acceptable)
- [ ] Specific test recommendations provided for each gap:
- [ ] Suggested test level (E2E, API, Component, Unit)
- [ ] Test description (Given-When-Then)
- [ ] Recommended test ID (e.g., 1.3-E2E-004)
- [ ] Explanation of why test is needed
---
## Coverage Metrics
- [ ] Overall coverage percentage calculated (FULL coverage / total criteria)
- [ ] P0 coverage percentage calculated
- [ ] P1 coverage percentage calculated
- [ ] P2 coverage percentage calculated (if applicable)
- [ ] Coverage by level calculated:
- [ ] E2E coverage %
- [ ] API coverage %
- [ ] Component coverage %
- [ ] Unit coverage %
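For example, the priority-scoped percentages above can be derived with a helper like this sketch; the criterion shape is an assumption made for illustration.

```typescript
type Coverage = 'FULL' | 'PARTIAL' | 'NONE' | 'UNIT-ONLY' | 'INTEGRATION-ONLY';
type Priority = 'P0' | 'P1' | 'P2' | 'P3';

interface Criterion {
  id: string;
  priority: Priority;
  coverage: Coverage;
}

// FULL coverage count divided by total criteria, optionally scoped to one priority
export function coveragePercent(criteria: Criterion[], priority?: Priority): number {
  const scoped = priority ? criteria.filter((c) => c.priority === priority) : criteria;
  if (scoped.length === 0) return 100; // nothing to cover at this priority
  const full = scoped.filter((c) => c.coverage === 'FULL').length;
  return Math.round((full / scoped.length) * 100);
}

// e.g. overall: coveragePercent(criteria); P0 only: coveragePercent(criteria, 'P0')
```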
---
## Test Quality Verification
For each mapped test, verify:
- [ ] Explicit assertions are present (not hidden in helpers)
- [ ] Test follows Given-When-Then structure
- [ ] No hard waits or sleeps (deterministic waiting only)
- [ ] Self-cleaning (test cleans up its data)
- [ ] File size < 300 lines
- [ ] Test duration < 90 seconds
Quality issues flagged:
- [ ] **BLOCKER** issues identified (missing assertions, hard waits, flaky patterns)
- [ ] **WARNING** issues identified (large files, slow tests, unclear structure)
- [ ] **INFO** issues identified (style inconsistencies, missing documentation)
Knowledge fragments referenced:
- [ ] `test-quality.md` for Definition of Done
- [ ] `fixture-architecture.md` for self-cleaning patterns
- [ ] `network-first.md` for Playwright best practices
- [ ] `data-factories.md` for test data patterns
---
## Phase 1 Deliverables Generated
### Traceability Matrix Markdown
- [ ] File created at `{output_folder}/traceability-matrix.md`
- [ ] Template from `trace-template.md` used
- [ ] Full mapping table included
- [ ] Coverage status section included
- [ ] Gap analysis section included
- [ ] Quality assessment section included
- [ ] Recommendations section included
### Coverage Badge/Metric (if enabled)
- [ ] Badge markdown generated
- [ ] Metrics exported to JSON for CI/CD integration
### Updated Story File (if enabled)
- [ ] "Traceability" section added to story markdown
- [ ] Link to traceability matrix included
- [ ] Coverage summary included
---
## Phase 1 Quality Assurance
### Accuracy Checks
- [ ] All acceptance criteria accounted for (none skipped)
- [ ] Test IDs correctly formatted (e.g., 1.3-E2E-001)
- [ ] File paths are correct and accessible
- [ ] Coverage percentages calculated correctly
- [ ] No false positives (tests incorrectly mapped to criteria)
- [ ] No false negatives (existing tests missed in mapping)
### Completeness Checks
- [ ] All test levels considered (E2E, API, Component, Unit)
- [ ] All priorities considered (P0, P1, P2, P3)
- [ ] All coverage statuses used appropriately (FULL, PARTIAL, NONE, UNIT-ONLY, INTEGRATION-ONLY)
- [ ] All gaps have recommendations
- [ ] All quality issues have severity and remediation guidance
### Actionability Checks
- [ ] Recommendations are specific (not generic)
- [ ] Test IDs suggested for new tests
- [ ] Given-When-Then provided for recommended tests
- [ ] Impact explained for each gap
- [ ] Priorities clear (CRITICAL, HIGH, MEDIUM, LOW)
---
## Phase 1 Documentation
- [ ] Traceability matrix is readable and well-formatted
- [ ] Tables render correctly in markdown
- [ ] Code blocks have proper syntax highlighting
- [ ] Links are valid and accessible
- [ ] Recommendations are clear and prioritized
---
# PHASE 2: QUALITY GATE DECISION
**Note**: Phase 2 executes only if `enable_gate_decision: true` in workflow.yaml
---
## Prerequisites
### Evidence Gathering
- [ ] Test execution results obtained (CI/CD pipeline, test framework reports)
- [ ] Story/epic/release file identified and read
- [ ] Test design document discovered or explicitly provided (if available)
- [ ] Traceability matrix discovered or explicitly provided (available from Phase 1)
- [ ] NFR assessment discovered or explicitly provided (if available)
- [ ] Code coverage report discovered or explicitly provided (if available)
- [ ] Burn-in results discovered or explicitly provided (if available)
### Evidence Validation
- [ ] Evidence freshness validated (warn if >7 days old, recommend re-running workflows)
- [ ] All required assessments available or user acknowledged gaps
- [ ] Test results are complete (not partial or interrupted runs)
- [ ] Test results match current codebase (not from outdated branch)
### Knowledge Base Loading
- [ ] `risk-governance.md` loaded successfully
- [ ] `probability-impact.md` loaded successfully
- [ ] `test-quality.md` loaded successfully
- [ ] `test-priorities.md` loaded successfully
- [ ] `ci-burn-in.md` loaded (if burn-in results available)
---
## Process Steps
### Step 1: Context Loading
- [ ] Gate type identified (story/epic/release/hotfix)
- [ ] Target ID extracted (story_id, epic_num, or release_version)
- [ ] Decision thresholds loaded from workflow variables
- [ ] Risk tolerance configuration loaded
- [ ] Waiver policy loaded
### Step 2: Evidence Parsing
**Test Results:**
- [ ] Total test count extracted
- [ ] Passed test count extracted
- [ ] Failed test count extracted
- [ ] Skipped test count extracted
- [ ] Test duration extracted
- [ ] P0 test pass rate calculated
- [ ] P1 test pass rate calculated
- [ ] Overall test pass rate calculated
**Quality Assessments:**
- [ ] P0/P1/P2/P3 scenarios extracted from test-design.md (if available)
- [ ] Risk scores extracted from test-design.md (if available)
- [ ] Coverage percentages extracted from traceability-matrix.md (available from Phase 1)
- [ ] Coverage gaps extracted from traceability-matrix.md (available from Phase 1)
- [ ] NFR status extracted from nfr-assessment.md (if available)
- [ ] Security issues count extracted from nfr-assessment.md (if available)
**Code Coverage:**
- [ ] Line coverage percentage extracted (if available)
- [ ] Branch coverage percentage extracted (if available)
- [ ] Function coverage percentage extracted (if available)
- [ ] Critical path coverage validated (if available)
**Burn-in Results:**
- [ ] Burn-in iterations count extracted (if available)
- [ ] Flaky tests count extracted (if available)
- [ ] Stability score calculated (if available)
### Step 3: Decision Rules Application
**P0 Criteria Evaluation:**
- [ ] P0 test pass rate evaluated (must be 100%)
- [ ] P0 acceptance criteria coverage evaluated (must be 100%)
- [ ] Security issues count evaluated (must be 0)
- [ ] Critical NFR failures evaluated (must be 0)
- [ ] Flaky tests evaluated (must be 0 if burn-in enabled)
- [ ] P0 decision recorded: PASS or FAIL
**P1 Criteria Evaluation:**
- [ ] P1 test pass rate evaluated (threshold: min_p1_pass_rate)
- [ ] P1 acceptance criteria coverage evaluated (threshold: 95%)
- [ ] Overall test pass rate evaluated (threshold: min_overall_pass_rate)
- [ ] Code coverage evaluated (threshold: min_coverage)
- [ ] P1 decision recorded: PASS or CONCERNS
**P2/P3 Criteria Evaluation:**
- [ ] P2 failures tracked (informational, don't block if allow_p2_failures: true)
- [ ] P3 failures tracked (informational, don't block if allow_p3_failures: true)
- [ ] Residual risks documented
**Final Decision:**
- [ ] Decision determined: PASS / CONCERNS / FAIL / WAIVED
- [ ] Decision rationale documented
- [ ] Decision is deterministic (follows rules, not arbitrary)
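As a sketch, the P0/P1 rules above collapse into a deterministic function like the one below. The evidence and threshold shapes are assumptions that mirror the criteria names; WAIVED is applied afterwards by a human approver rather than computed.

```typescript
type GateDecision = 'PASS' | 'CONCERNS' | 'FAIL';

interface GateEvidence {
  p0PassRate: number;         // 0-100
  p0CriteriaCoverage: number; // 0-100
  securityIssues: number;
  criticalNfrFailures: number;
  flakyTests: number;
  p1PassRate: number;
  p1CriteriaCoverage: number;
  overallPassRate: number;
  codeCoverage: number;
}

interface GateThresholds {
  minP1PassRate: number;
  minOverallPassRate: number;
  minCoverage: number;
}

export function decideGate(e: GateEvidence, t: GateThresholds): GateDecision {
  // P0 criteria are absolute: any miss fails the gate (unless explicitly waived)
  const p0Pass =
    e.p0PassRate === 100 &&
    e.p0CriteriaCoverage === 100 &&
    e.securityIssues === 0 &&
    e.criticalNfrFailures === 0 &&
    e.flakyTests === 0;
  if (!p0Pass) return 'FAIL';

  // P1 criteria downgrade PASS to CONCERNS rather than blocking outright
  const p1Pass =
    e.p1PassRate >= t.minP1PassRate &&
    e.p1CriteriaCoverage >= 95 &&
    e.overallPassRate >= t.minOverallPassRate &&
    e.codeCoverage >= t.minCoverage;
  return p1Pass ? 'PASS' : 'CONCERNS';
}
```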
### Step 4: Documentation
**Gate Decision Document Created:**
- [ ] Story/epic/release info section complete (ID, title, description, links)
- [ ] Decision clearly stated (PASS / CONCERNS / FAIL / WAIVED)
- [ ] Decision date recorded
- [ ] Evaluator recorded (user or agent name)
**Evidence Summary Documented:**
- [ ] Test results summary complete (total, passed, failed, pass rates)
- [ ] Coverage summary complete (P0/P1 criteria, code coverage)
- [ ] NFR validation summary complete (security, performance, reliability, maintainability)
- [ ] Flakiness summary complete (burn-in iterations, flaky test count)
**Rationale Documented:**
- [ ] Decision rationale clearly explained
- [ ] Key evidence highlighted
- [ ] Assumptions and caveats noted (if any)
**Residual Risks Documented (if CONCERNS or WAIVED):**
- [ ] Unresolved P1/P2 issues listed
- [ ] Probability × impact estimated for each risk
- [ ] Mitigations or workarounds described
**Waivers Documented (if WAIVED):**
- [ ] Waiver reason documented (business justification)
- [ ] Waiver approver documented (name, role)
- [ ] Waiver expiry date documented
- [ ] Remediation plan documented (fix in next release, due date)
- [ ] Monitoring plan documented
**Critical Issues Documented (if FAIL or CONCERNS):**
- [ ] Top 5-10 critical issues listed
- [ ] Priority assigned to each issue (P0/P1/P2)
- [ ] Owner assigned to each issue
- [ ] Due date assigned to each issue
**Recommendations Documented:**
- [ ] Next steps clearly stated for decision type
- [ ] Deployment recommendation provided
- [ ] Monitoring recommendations provided (if applicable)
- [ ] Remediation recommendations provided (if applicable)
### Step 5: Status Updates and Notifications
**Status File Updated:**
- [ ] Gate decision appended to bmm-workflow-status.md (if append_to_history: true)
- [ ] Format correct: `[DATE] Gate Decision: DECISION - Target {ID} - {rationale}`
- [ ] Status file committed or staged for commit
**Gate YAML Created:**
- [ ] Gate YAML snippet generated with decision and criteria
- [ ] Evidence references included in YAML
- [ ] Next steps included in YAML
- [ ] YAML file saved to output folder
**Stakeholder Notification Generated:**
- [ ] Notification subject line created
- [ ] Notification body created with summary
- [ ] Recipients identified (PM, SM, DEV lead, stakeholders)
- [ ] Notification ready for delivery (if notify_stakeholders: true)
**Outputs Saved:**
- [ ] Gate decision document saved to `{output_file}`
- [ ] Gate YAML saved to `{output_folder}/gate-decision-{target}.yaml`
- [ ] All outputs are valid and readable
---
## Phase 2 Output Validation
### Gate Decision Document
**Completeness:**
- [ ] All required sections present (info, decision, evidence, rationale, next steps)
- [ ] No placeholder text or TODOs left in document
- [ ] All evidence references are accurate and complete
- [ ] All links to artifacts are valid
**Accuracy:**
- [ ] Decision matches applied criteria rules
- [ ] Test results match CI/CD pipeline output
- [ ] Coverage percentages match reports
- [ ] NFR status matches assessment document
- [ ] No contradictions or inconsistencies
**Clarity:**
- [ ] Decision rationale is clear and unambiguous
- [ ] Technical jargon is explained or avoided
- [ ] Stakeholders can understand next steps
- [ ] Recommendations are actionable
### Gate YAML
**Format:**
- [ ] YAML is valid (no syntax errors)
- [ ] All required fields present (target, decision, date, evaluator, criteria, evidence)
- [ ] Field values are correct data types (numbers, strings, dates)
**Content:**
- [ ] Criteria values match decision document
- [ ] Evidence references are accurate
- [ ] Next steps align with decision type
---
## Phase 2 Quality Checks
### Decision Integrity
- [ ] Decision is deterministic (follows rules, not arbitrary)
- [ ] P0 failures result in FAIL decision (unless waived)
- [ ] Security issues result in FAIL decision (unless waived - but should never be waived)
- [ ] Waivers have business justification and approver (if WAIVED)
- [ ] Residual risks are documented (if CONCERNS or WAIVED)
### Evidence-Based
- [ ] Decision is based on actual test results (not guesses)
- [ ] All claims are supported by evidence
- [ ] No assumptions without documentation
- [ ] Evidence sources are cited (CI run IDs, report URLs)
### Transparency
- [ ] Decision rationale is transparent and auditable
- [ ] Criteria evaluation is documented step-by-step
- [ ] Any deviations from standard process are explained
- [ ] Waiver justifications are clear (if applicable)
### Consistency
- [ ] Decision aligns with risk-governance knowledge fragment
- [ ] Priority framework (P0/P1/P2/P3) applied consistently
- [ ] Terminology consistent with test-quality knowledge fragment
- [ ] Decision matrix followed correctly
---
## Phase 2 Integration Points
### BMad Workflow Status
- [ ] Gate decision added to `bmm-workflow-status.md`
- [ ] Format matches existing gate history entries
- [ ] Timestamp is accurate
- [ ] Decision summary is concise (<80 chars)
### CI/CD Pipeline
- [ ] Gate YAML is CI/CD-compatible
- [ ] YAML can be parsed by pipeline automation
- [ ] Decision can be used to block/allow deployments
- [ ] Evidence references are accessible to pipeline
### Stakeholders
- [ ] Notification message is clear and actionable
- [ ] Decision is explained in non-technical terms
- [ ] Next steps are specific and time-bound
- [ ] Recipients are appropriate for decision type
---
## Phase 2 Compliance and Audit
### Audit Trail
- [ ] Decision date and time recorded
- [ ] Evaluator identified (user or agent)
- [ ] All evidence sources cited
- [ ] Decision criteria documented
- [ ] Rationale clearly explained
### Traceability
- [ ] Gate decision traceable to story/epic/release
- [ ] Evidence traceable to specific test runs
- [ ] Assessments traceable to workflows that created them
- [ ] Waiver traceable to approver (if applicable)
### Compliance
- [ ] Security requirements validated (no unresolved vulnerabilities)
- [ ] Quality standards met or waived with justification
- [ ] Regulatory requirements addressed (if applicable)
- [ ] Documentation sufficient for external audit
---
## Phase 2 Edge Cases and Exceptions
### Missing Evidence
- [ ] If test-design.md missing, decision still possible with test results + trace
- [ ] If traceability-matrix.md missing, decision still possible with test results (but Phase 1 should provide it)
- [ ] If nfr-assessment.md missing, NFR validation marked as NOT ASSESSED
- [ ] If code coverage missing, coverage criterion marked as NOT ASSESSED
- [ ] User acknowledged gaps in evidence or provided alternative proof
### Stale Evidence
- [ ] Evidence freshness checked (if validate_evidence_freshness: true)
- [ ] Warnings issued for assessments >7 days old
- [ ] User acknowledged stale evidence or re-ran workflows
- [ ] Decision document notes any stale evidence used
### Conflicting Evidence
- [ ] Conflicts between test results and assessments resolved
- [ ] Most recent/authoritative source identified
- [ ] Conflict resolution documented in decision rationale
- [ ] User consulted if conflict cannot be resolved
### Waiver Scenarios
- [ ] Waiver only used for FAIL decision (not PASS or CONCERNS)
- [ ] Waiver has business justification (not technical convenience)
- [ ] Waiver has named approver with authority (VP/CTO/PO)
- [ ] Waiver has expiry date (does NOT apply to future releases)
- [ ] Waiver has remediation plan with concrete due date
- [ ] Security vulnerabilities are NOT waived (enforced)
---
# FINAL VALIDATION (Both Phases)
## Non-Prescriptive Validation
- [ ] Traceability format adapted to team needs (not rigid template)
- [ ] Examples are minimal and focused on patterns
- [ ] Teams can extend with custom classifications
- [ ] Integration with external systems supported (JIRA, Azure DevOps)
- [ ] Compliance requirements considered (if applicable)
---
## Documentation and Communication
- [ ] All documents are readable and well-formatted
- [ ] Tables render correctly in markdown
- [ ] Code blocks have proper syntax highlighting
- [ ] Links are valid and accessible
- [ ] Recommendations are clear and prioritized
- [ ] Gate decision is prominent and unambiguous (Phase 2)
---
## Final Validation
**Phase 1 (Traceability):**
- [ ] All prerequisites met
- [ ] All acceptance criteria mapped or gaps documented
- [ ] P0 coverage is 100% OR documented as BLOCKER
- [ ] Gap analysis is complete and prioritized
- [ ] Test quality issues identified and flagged
- [ ] Deliverables generated and saved
**Phase 2 (Gate Decision):**
- [ ] All quality evidence gathered
- [ ] Decision criteria applied correctly
- [ ] Decision rationale documented
- [ ] Gate YAML ready for CI/CD integration
- [ ] Status file updated (if enabled)
- [ ] Stakeholders notified (if enabled)
**Workflow Complete:**
- [ ] Phase 1 completed successfully
- [ ] Phase 2 completed successfully (if enabled)
- [ ] All outputs validated and saved
- [ ] Ready to proceed based on gate decision
---
## Sign-Off
**Phase 1 - Traceability Status:**
- [ ] ✅ PASS - All quality gates met, no critical gaps
- [ ] ⚠️ WARN - P1 gaps exist, address before PR merge
- [ ] ❌ FAIL - P0 gaps exist, BLOCKER for release
**Phase 2 - Gate Decision Status (if enabled):**
- [ ] ✅ PASS - Deploy to production
- [ ] ⚠️ CONCERNS - Deploy with monitoring
- [ ] ❌ FAIL - Block deployment, fix issues
- [ ] 🔓 WAIVED - Deploy with business approval and remediation plan
**Next Actions:**
- If PASS (both phases): Proceed to deployment
- If WARN/CONCERNS: Address gaps/issues, proceed with monitoring
- If FAIL (either phase): Run `*atdd` for missing tests, fix issues, re-run `*trace`
- If WAIVED: Deploy with approved waiver, schedule remediation
---
## Notes
Record any issues, deviations, or important observations during workflow execution:
- **Phase 1 Issues**: [Note any traceability mapping challenges, missing tests, quality concerns]
- **Phase 2 Issues**: [Note any missing, stale, or conflicting evidence]
- **Decision Rationale**: [Document any nuanced reasoning or edge cases]
- **Waiver Details**: [Document waiver negotiations or approvals]
- **Follow-up Actions**: [List any actions required after gate decision]
---
<!-- Powered by BMAD-CORE™ -->

File diff suppressed because it is too large

View File

@@ -1,675 +0,0 @@
# Traceability Matrix & Gate Decision - Story {STORY_ID}
**Story:** {STORY_TITLE}
**Date:** {DATE}
**Evaluator:** {user_name or TEA Agent}
---
Note: This workflow does not generate tests. If gaps exist, run `*atdd` or `*automate` to create coverage.
## PHASE 1: REQUIREMENTS TRACEABILITY
### Coverage Summary
| Priority | Total Criteria | FULL Coverage | Coverage % | Status |
| --------- | -------------- | ------------- | ---------- | ------------ |
| P0 | {P0_TOTAL} | {P0_FULL} | {P0_PCT}% | {P0_STATUS} |
| P1 | {P1_TOTAL} | {P1_FULL} | {P1_PCT}% | {P1_STATUS} |
| P2 | {P2_TOTAL} | {P2_FULL} | {P2_PCT}% | {P2_STATUS} |
| P3 | {P3_TOTAL} | {P3_FULL} | {P3_PCT}% | {P3_STATUS} |
| **Total** | **{TOTAL}** | **{FULL}** | **{PCT}%** | **{STATUS}** |
**Legend:**
- ✅ PASS - Coverage meets quality gate threshold
- ⚠️ WARN - Coverage below threshold but not critical
- ❌ FAIL - Coverage below minimum threshold (blocker)
---
### Detailed Mapping
#### {CRITERION_ID}: {CRITERION_DESCRIPTION} ({PRIORITY})
- **Coverage:** {COVERAGE_STATUS} {STATUS_ICON}
- **Tests:**
- `{TEST_ID}` - {TEST_FILE}:{LINE}
- **Given:** {GIVEN}
- **When:** {WHEN}
- **Then:** {THEN}
- `{TEST_ID_2}` - {TEST_FILE_2}:{LINE}
- **Given:** {GIVEN_2}
- **When:** {WHEN_2}
- **Then:** {THEN_2}
- **Gaps:** (if PARTIAL or UNIT-ONLY or INTEGRATION-ONLY)
- Missing: {MISSING_SCENARIO_1}
- Missing: {MISSING_SCENARIO_2}
- **Recommendation:** {RECOMMENDATION_TEXT}
---
#### Example: AC-1: User can login with email and password (P0)
- **Coverage:** FULL ✅
- **Tests:**
- `1.3-E2E-001` - tests/e2e/auth.spec.ts:12
- **Given:** User has valid credentials
- **When:** User submits login form
- **Then:** User is redirected to dashboard
- `1.3-UNIT-001` - tests/unit/auth-service.spec.ts:8
- **Given:** Valid email and password hash
- **When:** validateCredentials is called
- **Then:** Returns user object
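**Example (illustrative test sketch):** The `1.3-E2E-001` entry above could map to a spec shaped like the sketch below. This is a minimal illustration assuming Playwright; the file path, labels, route, and credentials are hypothetical. Embedding the test ID in the title and the Given-When-Then phases as comments is what lets the trace workflow locate the test and quote its phases.
```typescript
// tests/e2e/auth.spec.ts (sketch) - test ID in the title enables traceability lookup
import { test, expect } from '@playwright/test';

test('1.3-E2E-001: user can login with email and password', async ({ page }) => {
  // Given: user has valid credentials
  await page.goto('/login');

  // When: user submits the login form
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('a-valid-password');
  await page.getByRole('button', { name: 'Log in' }).click();

  // Then: user is redirected to the dashboard
  await expect(page).toHaveURL(/\/dashboard/);
});
```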
---
#### Example: AC-3: User can reset password via email (P1)
- **Coverage:** PARTIAL ⚠️
- **Tests:**
- `1.3-E2E-003` - tests/e2e/auth.spec.ts:44
- **Given:** User requests password reset
- **When:** User clicks reset link in email
- **Then:** User can set new password
- **Gaps:**
- Missing: Email delivery validation
- Missing: Expired token handling (error path)
- Missing: Invalid token handling (security test)
- Missing: Unit test for token generation logic
- **Recommendation:** Add `1.3-API-001` for email service integration testing and `1.3-UNIT-003` for token generation logic. Add `1.3-E2E-004` for error path validation (expired/invalid tokens).
---
### Gap Analysis
#### Critical Gaps (BLOCKER) ❌
{CRITICAL_GAP_COUNT} gaps found. **Do not release until resolved.**
1. **{CRITERION_ID}: {CRITERION_DESCRIPTION}** (P0)
- Current Coverage: {COVERAGE_STATUS}
- Missing Tests: {MISSING_TEST_DESCRIPTION}
- Recommend: {RECOMMENDED_TEST_ID} ({RECOMMENDED_TEST_LEVEL})
- Impact: {IMPACT_DESCRIPTION}
---
#### High Priority Gaps (PR BLOCKER) ⚠️
{HIGH_GAP_COUNT} gaps found. **Address before PR merge.**
1. **{CRITERION_ID}: {CRITERION_DESCRIPTION}** (P1)
- Current Coverage: {COVERAGE_STATUS}
- Missing Tests: {MISSING_TEST_DESCRIPTION}
- Recommend: {RECOMMENDED_TEST_ID} ({RECOMMENDED_TEST_LEVEL})
- Impact: {IMPACT_DESCRIPTION}
---
#### Medium Priority Gaps (Nightly) ⚠️
{MEDIUM_GAP_COUNT} gaps found. **Address in nightly test improvements.**
1. **{CRITERION_ID}: {CRITERION_DESCRIPTION}** (P2)
- Current Coverage: {COVERAGE_STATUS}
- Recommend: {RECOMMENDED_TEST_ID} ({RECOMMENDED_TEST_LEVEL})
---
#### Low Priority Gaps (Optional)
{LOW_GAP_COUNT} gaps found. **Optional - add if time permits.**
1. **{CRITERION_ID}: {CRITERION_DESCRIPTION}** (P3)
- Current Coverage: {COVERAGE_STATUS}
---
### Quality Assessment
#### Tests with Issues
**BLOCKER Issues** ❌
- `{TEST_ID}` - {ISSUE_DESCRIPTION} - {REMEDIATION}
**WARNING Issues** ⚠️
- `{TEST_ID}` - {ISSUE_DESCRIPTION} - {REMEDIATION}
**INFO Issues**
- `{TEST_ID}` - {ISSUE_DESCRIPTION} - {REMEDIATION}
---
#### Example Quality Issues
**WARNING Issues** ⚠️
- `1.3-E2E-001` - 145 seconds (exceeds 90s target) - Optimize fixture setup to reduce test duration
- `1.3-UNIT-005` - 320 lines (exceeds 300 line limit) - Split into multiple focused test files
**INFO Issues**
- `1.3-E2E-002` - Missing Given-When-Then structure - Refactor describe block to use BDD format
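**Example (illustrative refactor sketch):** For the `1.3-E2E-002` note above, "BDD format" means making the Given-When-Then phases explicit inside the describe block so the Phase 1 mapping can quote them directly. A minimal sketch, assuming Playwright; the scenario details are hypothetical:
```typescript
// Refactor sketch: annotate the phases instead of leaving a flat, untraceable spec.
import { test, expect } from '@playwright/test';

test.describe('1.3-E2E-002: session persists across page reload', () => {
  test('keeps the user signed in after reload', async ({ page }) => {
    // Given: an authenticated session (setup omitted - hypothetical fixture or login step)
    await page.goto('/dashboard');

    // When: the page is reloaded
    await page.reload();

    // Then: the user remains signed in
    await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
  });
});
```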
---
#### Tests Passing Quality Gates
**{PASSING_TEST_COUNT}/{TOTAL_TEST_COUNT} tests ({PASSING_PCT}%) meet all quality criteria** ✅
---
### Duplicate Coverage Analysis
#### Acceptable Overlap (Defense in Depth)
- {CRITERION_ID}: Tested at unit (business logic) and E2E (user journey) ✅
#### Unacceptable Duplication ⚠️
- {CRITERION_ID}: Same validation at E2E and Component level
- Recommendation: Remove {TEST_ID} or consolidate with {OTHER_TEST_ID}
---
### Coverage by Test Level
| Test Level | Tests | Criteria Covered | Coverage % |
| ---------- | ----------------- | -------------------- | ---------------- |
| E2E | {E2E_COUNT} | {E2E_CRITERIA} | {E2E_PCT}% |
| API | {API_COUNT} | {API_CRITERIA} | {API_PCT}% |
| Component | {COMP_COUNT} | {COMP_CRITERIA} | {COMP_PCT}% |
| Unit | {UNIT_COUNT} | {UNIT_CRITERIA} | {UNIT_PCT}% |
| **Total** | **{TOTAL_TESTS}** | **{TOTAL_CRITERIA}** | **{TOTAL_PCT}%** |
---
### Traceability Recommendations
#### Immediate Actions (Before PR Merge)
1. **{ACTION_1}** - {DESCRIPTION}
2. **{ACTION_2}** - {DESCRIPTION}
#### Short-term Actions (This Sprint)
1. **{ACTION_1}** - {DESCRIPTION}
2. **{ACTION_2}** - {DESCRIPTION}
#### Long-term Actions (Backlog)
1. **{ACTION_1}** - {DESCRIPTION}
---
#### Example Recommendations
**Immediate Actions (Before PR Merge)**
1. **Add P1 Password Reset Tests** - Implement `1.3-API-001` for email service integration and `1.3-E2E-004` for error path validation. P1 coverage currently at 80%, target is 90%.
2. **Optimize Slow E2E Test** - Refactor `1.3-E2E-001` to use faster fixture setup. Currently 145s, target is <90s.
**Short-term Actions (This Sprint)**
1. **Enhance P2 Coverage** - Add E2E validation for session timeout (`1.3-E2E-005`). Currently UNIT-ONLY coverage.
2. **Split Large Test File** - Break `1.3-UNIT-005` (320 lines) into multiple focused test files (<300 lines each).
**Long-term Actions (Backlog)**
1. **Enrich P3 Coverage** - Add tests for edge cases in P3 criteria if time permits.
---
## PHASE 2: QUALITY GATE DECISION
**Gate Type:** {story | epic | release | hotfix}
**Decision Mode:** {deterministic | manual}
---
### Evidence Summary
#### Test Execution Results
- **Total Tests**: {total_count}
- **Passed**: {passed_count} ({pass_percentage}%)
- **Failed**: {failed_count} ({fail_percentage}%)
- **Skipped**: {skipped_count} ({skip_percentage}%)
- **Duration**: {total_duration}
**Priority Breakdown:**
- **P0 Tests**: {p0_passed}/{p0_total} passed ({p0_pass_rate}%) {✅ | ❌}
- **P1 Tests**: {p1_passed}/{p1_total} passed ({p1_pass_rate}%) {✅ | ⚠️ | ❌}
- **P2 Tests**: {p2_passed}/{p2_total} passed ({p2_pass_rate}%) {informational}
- **P3 Tests**: {p3_passed}/{p3_total} passed ({p3_pass_rate}%) {informational}
**Overall Pass Rate**: {overall_pass_rate}% {✅ | ⚠️ | ❌}
**Test Results Source**: {CI_run_id | test_report_url | local_run}
---
#### Coverage Summary (from Phase 1)
**Requirements Coverage:**
- **P0 Acceptance Criteria**: {p0_covered}/{p0_total} covered ({p0_coverage}%) {✅ | ❌}
- **P1 Acceptance Criteria**: {p1_covered}/{p1_total} covered ({p1_coverage}%) {✅ | ⚠️ | ❌}
- **P2 Acceptance Criteria**: {p2_covered}/{p2_total} covered ({p2_coverage}%) {informational}
- **Overall Coverage**: {overall_coverage}%
**Code Coverage** (if available):
- **Line Coverage**: {line_coverage}% {✅ | ⚠️ | ❌}
- **Branch Coverage**: {branch_coverage}% {✅ | ⚠️ | ❌}
- **Function Coverage**: {function_coverage}% {✅ | ⚠️ | ❌}
**Coverage Source**: {coverage_report_url | coverage_file_path}
---
#### Non-Functional Requirements (NFRs)
**Security**: {PASS | CONCERNS | FAIL | NOT_ASSESSED} {✅ | ⚠️ | ❌}
- Security Issues: {security_issue_count}
- {details_if_issues}
**Performance**: {PASS | CONCERNS | FAIL | NOT_ASSESSED} {✅ | ⚠️ | ❌}
- {performance_metrics_summary}
**Reliability**: {PASS | CONCERNS | FAIL | NOT_ASSESSED} {✅ | ⚠️ | ❌}
- {reliability_metrics_summary}
**Maintainability**: {PASS | CONCERNS | FAIL | NOT_ASSESSED} {✅ | ⚠️ | ❌}
- {maintainability_metrics_summary}
**NFR Source**: {nfr_assessment_file_path | not_assessed}
---
#### Flakiness Validation
**Burn-in Results** (if available):
- **Burn-in Iterations**: {iteration_count} (e.g., 10)
- **Flaky Tests Detected**: {flaky_test_count} {✅ if 0 | ❌ if >0}
- **Stability Score**: {stability_percentage}%
**Flaky Tests List** (if any):
- {flaky_test_1_name} - {failure_rate}
- {flaky_test_2_name} - {failure_rate}
**Burn-in Source**: {CI_burn_in_run_id | not_available}
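**Example (illustrative burn-in sketch):** When no burn-in run is available, one way to produce this evidence is to repeat the suite before gating. A minimal sketch assuming Playwright; the base config path, iteration count, and tag filter are placeholders:
```typescript
// playwright.burn-in.config.ts (sketch) - repeat the suite to surface flaky tests before gating
import { defineConfig } from '@playwright/test';
import baseConfig from './playwright.config';

export default defineConfig({
  ...baseConfig,
  repeatEach: 10, // corresponds to {iteration_count} above
  retries: 0,     // retries would hide flakiness rather than expose it
});
// Run with: npx playwright test --config=playwright.burn-in.config.ts --grep @p0
```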
---
### Decision Criteria Evaluation
#### P0 Criteria (Must ALL Pass)
| Criterion              | Threshold | Actual                    | Status               |
| ---------------------- | --------- | ------------------------- | -------------------- |
| P0 Coverage            | 100%      | {p0_coverage}%            | {✅ PASS \| ❌ FAIL} |
| P0 Test Pass Rate      | 100%      | {p0_pass_rate}%           | {✅ PASS \| ❌ FAIL} |
| Security Issues        | 0         | {security_issue_count}    | {✅ PASS \| ❌ FAIL} |
| Critical NFR Failures  | 0         | {critical_nfr_fail_count} | {✅ PASS \| ❌ FAIL} |
| Flaky Tests            | 0         | {flaky_test_count}        | {✅ PASS \| ❌ FAIL} |
**P0 Evaluation**: {✅ ALL PASS | ❌ ONE OR MORE FAILED}
---
#### P1 Criteria (Required for PASS, May Accept for CONCERNS)
| Criterion              | Threshold                 | Actual               | Status                               |
| ---------------------- | ------------------------- | -------------------- | ------------------------------------ |
| P1 Coverage            | ≥{min_p1_coverage}%       | {p1_coverage}%       | {✅ PASS \| ⚠️ CONCERNS \| ❌ FAIL} |
| P1 Test Pass Rate      | ≥{min_p1_pass_rate}%      | {p1_pass_rate}%      | {✅ PASS \| ⚠️ CONCERNS \| ❌ FAIL} |
| Overall Test Pass Rate | ≥{min_overall_pass_rate}% | {overall_pass_rate}% | {✅ PASS \| ⚠️ CONCERNS \| ❌ FAIL} |
| Overall Coverage       | ≥{min_coverage}%          | {overall_coverage}%  | {✅ PASS \| ⚠️ CONCERNS \| ❌ FAIL} |
**P1 Evaluation**: {✅ ALL PASS | ⚠️ SOME CONCERNS | ❌ FAILED}
---
#### P2/P3 Criteria (Informational, Don't Block)
| Criterion | Actual | Notes |
| ----------------- | --------------- | ------------------------------------------------------------ |
| P2 Test Pass Rate | {p2_pass_rate}% | {allow_p2_failures ? "Tracked, doesn't block" : "Evaluated"} |
| P3 Test Pass Rate | {p3_pass_rate}% | {allow_p3_failures ? "Tracked, doesn't block" : "Evaluated"} |
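**Example (illustrative decision rule):** In deterministic mode, the tables above collapse into a simple precedence rule. The sketch below is one reading of that rule, not code shipped with the workflow; the helper names are hypothetical.
```typescript
type GateDecision = 'PASS' | 'CONCERNS' | 'FAIL' | 'WAIVED';

interface CriteriaEvaluation {
  p0AllPass: boolean;                     // every row in the P0 table is PASS
  p1Status: 'PASS' | 'CONCERNS' | 'FAIL'; // outcome of the P1 table
  waiverApproved: boolean;                // business-approved waiver (FAIL only)
}

function decideGate(e: CriteriaEvaluation): GateDecision {
  if (!e.p0AllPass || e.p1Status === 'FAIL') {
    // A waiver can only convert a FAIL; it never upgrades PASS or CONCERNS.
    return e.waiverApproved ? 'WAIVED' : 'FAIL';
  }
  // P2/P3 results are informational and do not change the decision.
  return e.p1Status === 'CONCERNS' ? 'CONCERNS' : 'PASS';
}
```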
---
### GATE DECISION: {PASS | CONCERNS | FAIL | WAIVED}
---
### Rationale
{Explain decision based on criteria evaluation}
{Highlight key evidence that drove decision}
{Note any assumptions or caveats}
**Example (PASS):**
> All P0 criteria met with 100% coverage and pass rates across critical tests. All P1 criteria exceeded thresholds with 98% overall pass rate and 92% coverage. No security issues detected. No flaky tests in validation. Feature is ready for production deployment with standard monitoring.
**Example (CONCERNS):**
> All P0 criteria met, ensuring critical user journeys are protected. However, P1 coverage (88%) falls below threshold (90%) due to missing E2E test for AC-5 edge case. Overall pass rate (96%) is excellent. Issues are non-critical and have acceptable workarounds. Risk is low enough to deploy with enhanced monitoring.
**Example (FAIL):**
> CRITICAL BLOCKERS DETECTED:
>
> 1. P0 coverage incomplete (80%) - AC-2 security validation missing
> 2. P0 test failures (75% pass rate) in core search functionality
> 3. Unresolved SQL injection vulnerability in search filter (CRITICAL)
>
> Release MUST BE BLOCKED until P0 issues are resolved. Security vulnerability cannot be waived.
**Example (WAIVED):**
> Original decision was FAIL due to P0 test failure in legacy Excel 2007 export module (affects <1% of users). However, release contains critical GDPR compliance features required by regulatory deadline (Oct 15). Business has approved waiver given:
>
> - Regulatory priority overrides legacy module risk
> - Workaround available (use Excel 2010+)
> - Issue will be fixed in v2.4.1 hotfix (due Oct 20)
> - Enhanced monitoring in place
---
### {Section: Delete if not applicable}
#### Residual Risks (For CONCERNS or WAIVED)
List unresolved P1/P2 issues that don't block release but should be tracked:
1. **{Risk Description}**
- **Priority**: P1 | P2
- **Probability**: Low | Medium | High
- **Impact**: Low | Medium | High
- **Risk Score**: {probability × impact}
- **Mitigation**: {workaround or monitoring plan}
- **Remediation**: {fix in next sprint/release}
**Overall Residual Risk**: {LOW | MEDIUM | HIGH}
---
#### Waiver Details (For WAIVED only)
**Original Decision**: ❌ FAIL
**Reason for Failure**:
- {list_of_blocking_issues}
**Waiver Information**:
- **Waiver Reason**: {business_justification}
- **Waiver Approver**: {name}, {role} (e.g., Jane Doe, VP Engineering)
- **Approval Date**: {YYYY-MM-DD}
- **Waiver Expiry**: {YYYY-MM-DD} (**NOTE**: Does NOT apply to next release)
**Monitoring Plan**:
- {enhanced_monitoring_1}
- {enhanced_monitoring_2}
- {escalation_criteria}
**Remediation Plan**:
- **Fix Target**: {next_release_version} (e.g., v2.4.1 hotfix)
- **Due Date**: {YYYY-MM-DD}
- **Owner**: {team_or_person}
- **Verification**: {how_fix_will_be_verified}
**Business Justification**:
{detailed_explanation_of_why_waiver_is_acceptable}
---
#### Critical Issues (For FAIL or CONCERNS)
Top blockers requiring immediate attention:
| Priority | Issue | Description | Owner | Due Date | Status |
| -------- | ------------- | ------------------- | ------------ | ------------ | ------------------ |
| P0 | {issue_title} | {brief_description} | {owner_name} | {YYYY-MM-DD} | {OPEN/IN_PROGRESS} |
| P0 | {issue_title} | {brief_description} | {owner_name} | {YYYY-MM-DD} | {OPEN/IN_PROGRESS} |
| P1 | {issue_title} | {brief_description} | {owner_name} | {YYYY-MM-DD} | {OPEN/IN_PROGRESS} |
**Blocking Issues Count**: {p0_blocker_count} P0 blockers, {p1_blocker_count} P1 issues
---
### Gate Recommendations
#### For PASS Decision ✅
1. **Proceed to deployment**
- Deploy to staging environment
- Validate with smoke tests
- Monitor key metrics for 24-48 hours
- Deploy to production with standard monitoring
2. **Post-Deployment Monitoring**
- {metric_1_to_monitor}
- {metric_2_to_monitor}
- {alert_thresholds}
3. **Success Criteria**
- {success_criterion_1}
- {success_criterion_2}
---
#### For CONCERNS Decision ⚠️
1. **Deploy with Enhanced Monitoring**
- Deploy to staging with extended validation period
- Enable enhanced logging/monitoring for known risk areas:
- {risk_area_1}
- {risk_area_2}
- Set aggressive alerts for potential issues
- Deploy to production with caution
2. **Create Remediation Backlog**
- Create story: "{fix_title_1}" (Priority: {priority})
- Create story: "{fix_title_2}" (Priority: {priority})
- Target sprint: {next_sprint}
3. **Post-Deployment Actions**
- Monitor {specific_areas} closely for {time_period}
- Weekly status updates on remediation progress
- Re-assess after fixes deployed
---
#### For FAIL Decision ❌
1. **Block Deployment Immediately**
- Do NOT deploy to any environment
- Notify stakeholders of blocking issues
- Escalate to tech lead and PM
2. **Fix Critical Issues**
- Address P0 blockers listed in Critical Issues section
- Owner assignments confirmed
- Due dates agreed upon
- Daily standup on blocker resolution
3. **Re-Run Gate After Fixes**
- Re-run full test suite after fixes
- Re-run `bmad tea *trace` workflow
- Verify decision is PASS before deploying
---
#### For WAIVED Decision 🔓
1. **Deploy with Business Approval**
- Confirm waiver approver has signed off
- Document waiver in release notes
- Notify all stakeholders of waived risks
2. **Aggressive Monitoring**
- {enhanced_monitoring_plan}
- {escalation_procedures}
- Daily checks on waived risk areas
3. **Mandatory Remediation**
- Fix MUST be completed by {due_date}
- Issue CANNOT be waived in next release
- Track remediation progress weekly
- Verify fix in next gate
---
### Next Steps
**Immediate Actions** (next 24-48 hours):
1. {action_1}
2. {action_2}
3. {action_3}
**Follow-up Actions** (next sprint/release):
1. {action_1}
2. {action_2}
3. {action_3}
**Stakeholder Communication**:
- Notify PM: {decision_summary}
- Notify SM: {decision_summary}
- Notify DEV lead: {decision_summary}
---
## Integrated YAML Snippet (CI/CD)
```yaml
traceability_and_gate:
  # Phase 1: Traceability
  traceability:
    story_id: "{STORY_ID}"
    date: "{DATE}"
    coverage:
      overall: {OVERALL_PCT}%
      p0: {P0_PCT}%
      p1: {P1_PCT}%
      p2: {P2_PCT}%
      p3: {P3_PCT}%
    gaps:
      critical: {CRITICAL_COUNT}
      high: {HIGH_COUNT}
      medium: {MEDIUM_COUNT}
      low: {LOW_COUNT}
    quality:
      passing_tests: {PASSING_COUNT}
      total_tests: {TOTAL_TESTS}
      blocker_issues: {BLOCKER_COUNT}
      warning_issues: {WARNING_COUNT}
    recommendations:
      - "{RECOMMENDATION_1}"
      - "{RECOMMENDATION_2}"
  # Phase 2: Gate Decision
  gate_decision:
    decision: "{PASS | CONCERNS | FAIL | WAIVED}"
    gate_type: "{story | epic | release | hotfix}"
    decision_mode: "{deterministic | manual}"
    criteria:
      p0_coverage: {p0_coverage}%
      p0_pass_rate: {p0_pass_rate}%
      p1_coverage: {p1_coverage}%
      p1_pass_rate: {p1_pass_rate}%
      overall_pass_rate: {overall_pass_rate}%
      overall_coverage: {overall_coverage}%
      security_issues: {security_issue_count}
      critical_nfrs_fail: {critical_nfr_fail_count}
      flaky_tests: {flaky_test_count}
    thresholds:
      min_p0_coverage: 100
      min_p0_pass_rate: 100
      min_p1_coverage: {min_p1_coverage}
      min_p1_pass_rate: {min_p1_pass_rate}
      min_overall_pass_rate: {min_overall_pass_rate}
      min_coverage: {min_coverage}
    evidence:
      test_results: "{CI_run_id | test_report_url}"
      traceability: "{trace_file_path}"
      nfr_assessment: "{nfr_file_path}"
      code_coverage: "{coverage_report_url}"
    next_steps: "{brief_summary_of_recommendations}"
    waiver: # Only if WAIVED
      reason: "{business_justification}"
      approver: "{name}, {role}"
      expiry: "{YYYY-MM-DD}"
      remediation_due: "{YYYY-MM-DD}"
```
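**Example (illustrative CI consumption sketch):** A pipeline step can parse `gate_decision.decision` and block deployment on FAIL while letting WAIVED through with a warning. This is a minimal sketch assuming the snippet above is emitted as `gate-decision.yaml` and the `js-yaml` package is available; the script path and file name are placeholders.
```typescript
// scripts/check-gate.ts (sketch) - fail the pipeline when the gate decision blocks deployment
import { readFileSync } from 'node:fs';
import { load } from 'js-yaml';

type Decision = 'PASS' | 'CONCERNS' | 'FAIL' | 'WAIVED';

interface GateFile {
  traceability_and_gate: {
    gate_decision: { decision: Decision; next_steps?: string };
  };
}

const gate = load(readFileSync('gate-decision.yaml', 'utf8')) as GateFile;
const { decision, next_steps } = gate.traceability_and_gate.gate_decision;

if (decision === 'FAIL') {
  console.error(`Gate decision is FAIL - blocking deployment. ${next_steps ?? ''}`);
  process.exit(1);
}
if (decision === 'CONCERNS' || decision === 'WAIVED') {
  console.warn(`Gate decision is ${decision} - deploy with enhanced monitoring.`);
}
console.log(`Gate decision: ${decision} - proceeding.`);
```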
---
## Related Artifacts
- **Story File:** {STORY_FILE_PATH}
- **Test Design:** {TEST_DESIGN_PATH} (if available)
- **Tech Spec:** {TECH_SPEC_PATH} (if available)
- **Test Results:** {TEST_RESULTS_PATH}
- **NFR Assessment:** {NFR_FILE_PATH} (if available)
- **Test Files:** {TEST_DIR_PATH}
---
## Sign-Off
**Phase 1 - Traceability Assessment:**
- Overall Coverage: {OVERALL_PCT}%
- P0 Coverage: {P0_PCT}% {P0_STATUS}
- P1 Coverage: {P1_PCT}% {P1_STATUS}
- Critical Gaps: {CRITICAL_COUNT}
- High Priority Gaps: {HIGH_COUNT}
**Phase 2 - Gate Decision:**
- **Decision**: {PASS | CONCERNS | FAIL | WAIVED} {STATUS_ICON}
- **P0 Evaluation**: {✅ ALL PASS | ❌ ONE OR MORE FAILED}
- **P1 Evaluation**: {✅ ALL PASS | ⚠️ SOME CONCERNS | ❌ FAILED}
**Overall Status:** {STATUS} {STATUS_ICON}
**Next Steps:**
- If PASS ✅: Proceed to deployment
- If CONCERNS ⚠️: Deploy with monitoring, create remediation backlog
- If FAIL ❌: Block deployment, fix critical issues, re-run workflow
- If WAIVED 🔓: Deploy with business approval and aggressive monitoring
**Generated:** {DATE}
**Workflow:** testarch-trace v4.0 (Enhanced with Gate Decision)
---
<!-- Powered by BMAD-CORE™ -->

View File

@@ -1,55 +0,0 @@
# Test Architect workflow: trace (enhanced with gate decision)
name: testarch-trace
description: "Generate requirements-to-tests traceability matrix, analyze coverage, and make quality gate decision (PASS/CONCERNS/FAIL/WAIVED)"
author: "BMad"
# Critical variables from config
config_source: "{project-root}/_bmad/bmm/config.yaml"
output_folder: "{config_source}:output_folder"
user_name: "{config_source}:user_name"
communication_language: "{config_source}:communication_language"
document_output_language: "{config_source}:document_output_language"
date: system-generated
# Workflow components
installed_path: "{project-root}/_bmad/bmm/workflows/testarch/trace"
instructions: "{installed_path}/instructions.md"
validation: "{installed_path}/checklist.md"
template: "{installed_path}/trace-template.md"
# Variables and inputs
variables:
  # Directory paths
  test_dir: "{project-root}/tests" # Root test directory
  source_dir: "{project-root}/src" # Source code directory
  # Workflow behavior
  coverage_levels: "e2e,api,component,unit" # Which test levels to trace
  gate_type: "story" # story | epic | release | hotfix - determines gate scope
  decision_mode: "deterministic" # deterministic (rule-based) | manual (team decision)
# Output configuration
default_output_file: "{output_folder}/traceability-matrix.md"
# Required tools
required_tools:
  - read_file # Read story, test files, BMad artifacts
  - write_file # Create traceability matrix, gate YAML
  - list_files # Discover test files
  - search_repo # Find tests by test ID, describe blocks
  - glob # Find test files matching patterns
tags:
  - qa
  - traceability
  - test-architect
  - coverage
  - requirements
  - gate
  - decision
  - release
execution_hints:
  interactive: false # Minimize prompts
  autonomous: true # Proceed without user input unless blocked
  iterative: true