Test Design: Epic 2 - The Magic Mirror

Epic: 2 (Ghostwriter & Draft Refinement)
Scope: Epic-Level
Date: 2026-01-25
Author: QA Architect (AI)

1. Risk Assessment

Identified Risks

| Risk ID | Category | Title | Description | Probability (1-3) | Impact (1-3) | Score | Action |
|---|---|---|---|---|---|---|---|
| R-2.1 | BUS | Hallucination / Poor Quality | Ghostwriter generates content unrelated to the user's insight or creates fictional details. | 2 (Possible) | 3 (Critical) | 6 | MITIGATE |
| R-2.2 | TECH | Context Window Overflow | Long chat sessions exceed the token limit for the generation prompt, causing truncation or errors. | 2 (Possible) | 3 (Critical) | 6 | MITIGATE |
| R-2.3 | TECH | State Desynchronization | UI gets stuck in "Drafting" state if the LLM request hangs or fails silently. | 2 (Possible) | 2 (Degraded) | 4 | MONITOR |
| R-2.4 | TECH | Clipboard API Failures | "One-Click Copy" fails on certain mobile browsers due to permission policies. | 2 (Possible) | 2 (Degraded) | 4 | MONITOR |
| R-2.5 | UI | Markdown Rendering Issues | Generated artifacts break layout (e.g., extremely long code blocks, tables on mobile). | 1 (Unlikely) | 1 (Minor) | 1 | DOCUMENT |
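Each Score above is Probability × Impact on the two 1-3 scales. The sketch below reproduces the table's scores; the score-to-action thresholds (6 and above: MITIGATE, 3 to 5: MONITOR, 2 and below: DOCUMENT) are an assumption inferred from these five rows, not a rule stated in this plan.

```typescript
// Risk scoring sketch: score = probability × impact, both on a 1-3 scale.
// The action thresholds in riskAction are ASSUMED, inferred from the risk table.
type Level = 1 | 2 | 3;
type Action = "MITIGATE" | "MONITOR" | "DOCUMENT";

interface Risk {
  id: string;
  probability: Level;
  impact: Level;
}

function riskScore(probability: Level, impact: Level): number {
  return probability * impact;
}

// Hypothetical mapping from score to action.
function riskAction(score: number): Action {
  if (score >= 6) return "MITIGATE";
  if (score >= 3) return "MONITOR";
  return "DOCUMENT";
}

const risks: Risk[] = [
  { id: "R-2.1", probability: 2, impact: 3 },
  { id: "R-2.2", probability: 2, impact: 3 },
  { id: "R-2.3", probability: 2, impact: 2 },
  { id: "R-2.4", probability: 2, impact: 2 },
  { id: "R-2.5", probability: 1, impact: 1 },
];

for (const r of risks) {
  const score = riskScore(r.probability, r.impact);
  console.log(`${r.id}: score ${score} -> ${riskAction(score)}`);
}
```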

Mitigation Strategies (High Risks)

R-2.1: Hallucination / Poor Quality (Score 6)

  • Mitigation: Implement dedicated "grounding" prompts. Use automated evaluations (evals) to check whether output tokens overlap with the input "Insight" tokens.
  • Owner: Prompt Engineer / Dev
  • Validation: Automated Prompt Tests (checking recall of key facts).
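One way to implement the token-overlap eval for R-2.1 is a recall score: the fraction of key terms from the user's insight that reappear in the generated draft. This is a minimal sketch; the function names, the naive word tokenizer, the stop-word list, and the 0.6 pass threshold are hypothetical choices, not part of the plan.

```typescript
// Grounding eval sketch for R-2.1: recall of the user's insight terms in the draft.
// The tokenizer, stop-word list, and threshold are HYPOTHETICAL choices.
const STOP_WORDS = new Set(["the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "for", "our", "my"]);

// Lowercase word split; keeps words longer than two characters that are not stop words.
function keyTerms(text: string): Set<string> {
  const words: string[] = text.toLowerCase().match(/[a-z0-9']+/g) ?? [];
  return new Set(words.filter((w) => w.length > 2 && !STOP_WORDS.has(w)));
}

// Recall = (insight terms present in the output) / (all insight terms).
function insightRecall(insight: string, output: string): number {
  const expected = keyTerms(insight);
  if (expected.size === 0) return 1; // nothing to ground against
  const produced = keyTerms(output);
  let hits = 0;
  expected.forEach((term) => {
    if (produced.has(term)) hits++;
  });
  return hits / expected.size;
}

// Hypothetical pass threshold for an automated prompt test.
const MIN_RECALL = 0.6;

const insight = "Shipping small pull requests reduced review delays for our team";
const draft = "Our team learned that small pull requests cut review delays dramatically.";
const recall = insightRecall(insight, draft);
console.log(recall.toFixed(2), recall >= MIN_RECALL ? "PASS" : "FAIL"); // 0.75 PASS
```

Recall only flags facts the draft dropped; catching invented details would need a complementary check, such as the share of output terms that appear in the chat context.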

R-2.2: Context Window Overflow (Score 6)

  • Mitigation: Implement a strict token-counting utility. Summarize or truncate the chat history so the prompt stays within the token limit before sending it to the Ghostwriter.
  • Owner: Dev Team
  • Validation: Unit tests for PromptEngine with large mock inputs.
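The truncation half of the R-2.2 mitigation can be sketched as a budgeted walk from the newest message backwards. This assumes a rough 4-characters-per-token estimate in place of a real tokenizer, and `ChatMessage`, `estimateTokens`, and `truncateHistory` are hypothetical names, not the project's PromptEngine API. The usage lines at the end double as a large-mock-input unit test of the kind the validation step calls for.

```typescript
// Context-budget sketch for R-2.2. Token counts use a rough ~4 characters
// per token heuristic (an ASSUMPTION); a real limiter should use the target
// model's tokenizer. All names here are hypothetical.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keeps the newest messages that fit within `budget` tokens, in original order.
// Older messages are dropped first; a summary of them could be prepended instead.
function truncateHistory(messages: ChatMessage[], budget: number): ChatMessage[] {
  const kept: ChatMessage[] = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (used + cost > budget) break; // stop at the first message that would overflow
    kept.unshift(messages[i]);
    used += cost;
  }
  return kept;
}

// Large mock input: 1,000 messages of 400 characters (~100 tokens each).
const history: ChatMessage[] = Array.from({ length: 1000 }, (_, i): ChatMessage => ({
  role: i % 2 === 0 ? "user" : "assistant",
  content: "x".repeat(400),
}));
const trimmed = truncateHistory(history, 2000);
console.log(trimmed.length); // 20
```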

2. Test Coverage Plan

Acceptance Criteria Mapping

| Story ID | Scenario | Level | Priority | Risk Link |
|---|---|---|---|---|
| 2.1 | 2.1.1 Ghostwriter receives correct chat context (prompt construction) | Unit | P0 | R-2.1 |
| 2.1 | 2.1.2 Token limit enforcement (truncation/error) | Unit | P0 | R-2.2 |
| 2.1 | 2.1.3 Generated draft is valid Markdown | Unit | P1 | R-2.5 |
| 2.2 | 2.2.1 Draft Sheet slides up upon completion | Component | P1 | - |
| 2.2 | 2.2.2 Draft view renders Markdown correctly (headers, lists) | Component | P2 | R-2.5 |
| 2.3 | 2.3.1 "Thumbs Down" triggers feedback prompt | Integration | P1 | - |
| 2.3 | 2.3.2 Regeneration respects user critique | E2E | P0 | R-2.1 |
| 2.4 | 2.4.1 "Copy" button places text in clipboard | E2E | P0 | R-2.4 |
| 2.4 | 2.4.2 "Save" marks session as completed in DB | Integration | P0 | - |

Test Levels Strategy

  • Unit Tests:
    • PromptEngine: Verify context insertion and token limits.
    • MarkdownParser: Verify safe rendering logic.
  • Component Tests:
    • DraftSheet: Verify open/close animations and state binding (Zustand).
    • MarkdownRenderer: Visual regression tests for styles.
  • Integration Tests:
    • GhostwriterService: Mock LLM response -> Verify State Update -> Verify DB Update.
  • E2E Tests:
    • Full Flow (P0): Chat -> Generate -> Copy to Clipboard.
    • Refinement Flow (P0): Generate -> Critique -> Regenerate.
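The GhostwriterService integration test (mock LLM response, then verify the state update, then the DB update) could look like the sketch below. Every interface here (`LlmClient`, `DraftState`, `DraftRepository`) is a hypothetical stand-in for the real service, Zustand store, and persistence layer; a plain object replaces the store and a `Map` replaces the database. The failing-LLM scenario also checks that the state leaves "drafting" on error, which bears on R-2.3.

```typescript
// Integration-test sketch: mock LLM response -> verify state update -> verify DB update.
// All interfaces and names are HYPOTHETICAL stand-ins for the real service,
// Zustand store, and database layer.
type DraftStatus = "idle" | "drafting" | "ready" | "error";

interface LlmClient {
  generate(prompt: string): Promise<string>;
}

interface DraftState {
  status: DraftStatus;
  draft: string | null;
}

interface DraftRepository {
  saveDraft(sessionId: string, draft: string): Promise<void>;
}

class GhostwriterService {
  private readonly llm: LlmClient;
  private readonly state: DraftState;
  private readonly repo: DraftRepository;

  constructor(llm: LlmClient, state: DraftState, repo: DraftRepository) {
    this.llm = llm;
    this.state = state;
    this.repo = repo;
  }

  async generateDraft(sessionId: string, prompt: string): Promise<void> {
    this.state.status = "drafting";
    try {
      const draft = await this.llm.generate(prompt);
      await this.repo.saveDraft(sessionId, draft);
      this.state.draft = draft;
      this.state.status = "ready";
    } catch {
      // Never leave the UI stuck in "drafting" when the LLM call fails (R-2.3).
      this.state.status = "error";
    }
  }
}

// Runs one scenario against in-memory fakes and reports the resulting state and DB row.
async function runScenario(llm: LlmClient) {
  const state: DraftState = { status: "idle", draft: null };
  const db = new Map<string, string>();
  const repo: DraftRepository = {
    saveDraft: async (id, draft) => {
      db.set(id, draft);
    },
  };
  await new GhostwriterService(llm, state, repo).generateDraft("session-1", "insight");
  return { status: state.status, draft: state.draft, saved: db.get("session-1") ?? null };
}

const mockLlm: LlmClient = { generate: async (prompt) => `# Draft\n\nFrom: ${prompt}` };
const failingLlm: LlmClient = {
  generate: async () => {
    throw new Error("LLM unavailable");
  },
};

runScenario(mockLlm).then((r) => console.log(r.status, r.saved === r.draft));
runScenario(failingLlm).then((r) => console.log(r.status, r.saved));
```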

3. Execution Plan

Smoke Tests (Pre-Merge)

  1. Unit: PromptEngine sanity checks.
  2. E2E: Basic Generation Flow (Mocked LLM).

Regression Suite (Nightly)

  1. Unit: Token limit edge cases.
  2. E2E: Clipboard functionality under mobile viewport emulation.
  3. Prompt Evals: Quality checks on sample inputs.

Resource Estimates

  • P0 Scenarios: 5 tests (approx. 5 hours implementation).
  • P1 Scenarios: 3 tests (approx. 2 hours implementation).
  • P2 Scenarios: 1 test (approx. 0.5 hours implementation).
  • Total Effort: ~1 day.
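As a check on the estimate: the scenario counts match the coverage table (five P0: 2.1.1, 2.1.2, 2.3.2, 2.4.1, 2.4.2; three P1: 2.1.3, 2.2.1, 2.3.1; one P2: 2.2.2), and the hours sum as follows. The conversion to days assumes an 8-hour working day, which the plan does not state.

```latex
T_{\text{total}} = 5\,\text{h} + 2\,\text{h} + 0.5\,\text{h} = 7.5\,\text{h},
\qquad
\frac{7.5\,\text{h}}{8\,\text{h/day}} \approx 0.94\ \text{day} \approx 1\ \text{day}
```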

4. Quality Gate Criteria

  • Pass Rate: 100% on P0 tests.
  • Performance: Generation starts within 5 s (measured with mocked LLM latency).
  • Mitigation: Token limiter unit tests must pass.
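The three gate criteria can be encoded as a single check over test results, as in this sketch. The `TestResult` shape, the `token-limiter` tag, and the choice to fail the gate when no P0 tests ran are assumptions for illustration; map them onto whatever the test runner actually reports.

```typescript
// Quality-gate sketch for the criteria above. The result shape and the
// "token-limiter" tag are HYPOTHETICAL; adapt them to the real test reporter.
interface TestResult {
  name: string;
  priority: "P0" | "P1" | "P2";
  passed: boolean;
  tags?: string[];
}

interface GateInput {
  results: TestResult[];
  generationStartMs: number; // time to first output, measured with mocked LLM latency
}

interface GateReport {
  passed: boolean;
  failures: string[];
}

const MAX_GENERATION_START_MS = 5000;

function evaluateGate(input: GateInput): GateReport {
  const failures: string[] = [];

  // Pass rate: 100% on P0 tests (an empty P0 set is treated as a failure).
  const p0 = input.results.filter((r) => r.priority === "P0");
  const p0Passed = p0.filter((r) => r.passed).length;
  if (p0.length === 0 || p0Passed < p0.length) {
    failures.push(`P0 pass rate below 100% (${p0Passed}/${p0.length})`);
  }

  // Performance: generation starts within 5 s.
  if (input.generationStartMs > MAX_GENERATION_START_MS) {
    failures.push(`Generation started after ${input.generationStartMs} ms (limit ${MAX_GENERATION_START_MS} ms)`);
  }

  // Mitigation: token limiter unit tests must exist and pass.
  const limiter = input.results.filter((r) => r.tags?.includes("token-limiter") ?? false);
  if (limiter.length === 0 || limiter.some((r) => !r.passed)) {
    failures.push("Token limiter unit tests missing or failing");
  }

  return { passed: failures.length === 0, failures };
}

const report = evaluateGate({
  results: [
    { name: "2.1.1 prompt construction", priority: "P0", passed: true },
    { name: "2.1.2 token limit enforcement", priority: "P0", passed: true, tags: ["token-limiter"] },
    { name: "2.2.2 Markdown rendering", priority: "P2", passed: false },
  ],
  generationStartMs: 1200,
});
console.log(report.passed); // true: a failing P2 test does not block the gate
```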