# Test Design: Epic 2 - The Magic Mirror

**Epic:** 2 (Ghostwriter & Draft Refinement)
**Scope:** Epic-Level
**Date:** 2026-01-25
**Author:** QA Architect (AI)

## 1. Risk Assessment

### Identified Risks

| Risk ID   | Category | Title                            | Description                                                                                        | Probability (1-3) | Impact (1-3) | Score | Action       |
| :-------- | :------- | :------------------------------- | :------------------------------------------------------------------------------------------------- | :---------------- | :----------- | :---- | :----------- |
| **R-2.1** | BUS      | **Hallucination / Poor Quality** | Ghostwriter generates content unrelated to the user's insight or invents fictional details.        | 2 (Possible)      | 3 (Critical) | **6** | **MITIGATE** |
| **R-2.2** | TECH     | **Context Window Overflow**      | Long chat sessions exceed the token limit for the generation prompt, causing truncation or errors. | 2 (Possible)      | 3 (Critical) | **6** | **MITIGATE** |
| **R-2.3** | TECH     | **State Desynchronization**      | UI gets stuck in the "Drafting" state if the LLM request hangs or fails silently.                  | 2 (Possible)      | 2 (Degraded) | 4     | MONITOR      |
| **R-2.4** | TECH     | **Clipboard API Failures**       | "One-Click Copy" fails on certain mobile browsers due to permission policies.                      | 2 (Possible)      | 2 (Degraded) | 4     | MONITOR      |
| **R-2.5** | UI       | **Markdown Rendering Issues**    | Generated artifacts break layout (e.g., extremely long code blocks, tables on mobile).             | 1 (Unlikely)      | 1 (Minor)    | 1     | DOCUMENT     |

### Mitigation Strategies (High Risks)

**R-2.1: Hallucination / Poor Quality (Score 6)**

* **Mitigation:** Implement specific "Grounding" prompts. Use `evals` (automated evaluation) to check that output tokens overlap with the input "Insight" tokens.
* **Owner:** Prompt Engineer / Dev
* **Validation:** Automated Prompt Tests (checking recall of key facts).

**R-2.2: Context Window Overflow (Score 6)**

* **Mitigation:** Implement a strict token-counting utility. Summarize or truncate chat history intelligently before sending it to Ghostwriter.
* **Owner:** Dev Team
* **Validation:** Unit tests for `PromptEngine` with large mock inputs.

---

## 2. Test Coverage Plan

### Acceptance Criteria Mapping

| Story   | ID    | Scenario                                                        | Level       | Priority | Risk Link |
| :------ | :---- | :-------------------------------------------------------------- | :---------- | :------- | :-------- |
| **2.1** | 2.1.1 | Ghostwriter receives correct chat context (Prompt Construction) | Unit        | **P0**   | R-2.1     |
| **2.1** | 2.1.2 | Token limit enforcement (Truncation/Error)                      | Unit        | **P0**   | R-2.2     |
| **2.1** | 2.1.3 | Generated output is valid Markdown                              | Unit        | P1       | R-2.5     |
| **2.2** | 2.2.1 | Draft Sheet slides up upon completion                           | Component   | P1       | -         |
| **2.2** | 2.2.2 | Draft view renders Markdown correctly (Headers, lists)          | Component   | P2       | R-2.5     |
| **2.3** | 2.3.1 | "Thumbs Down" triggers feedback prompt                          | Integration | P1       | -         |
| **2.3** | 2.3.2 | Regeneration respects user critique                             | E2E         | **P0**   | R-2.1     |
| **2.4** | 2.4.1 | "Copy" button places text in clipboard                          | E2E         | **P0**   | R-2.4     |
| **2.4** | 2.4.2 | "Save" marks session as completed in DB                         | Integration | **P0**   | -         |

### Test Levels Strategy

* **Unit Tests:**
    * `PromptEngine`: Verify context insertion and token limits.
    * `MarkdownParser`: Verify safe rendering logic.
* **Component Tests:**
    * `DraftSheet`: Verify open/close animations and state binding (Zustand).
    * `MarkdownRenderer`: Visual regression tests for styles.
* **Integration Tests:**
    * `GhostwriterService`: Mock LLM response -> Verify State Update -> Verify DB Update.
* **E2E Tests:**
    * **Full Flow (P0):** Chat -> Generate -> Copy to Clipboard.
    * **Refinement Flow (P0):** Generate -> Critique -> Regenerate.

---

## 3. Execution Plan

### Smoke Tests (Pre-Merge)

1. **Unit:** `PromptEngine` sanity checks.
2. **E2E:** Basic Generation Flow (Mocked LLM).

### Regression Suite (Nightly)

1. **Unit:** Token limit edge cases.
2. **E2E:** Clipboard functionality on mobile viewport emulation.
3. **Prompt Evals:** Quality checks on sample inputs.

### Resource Estimates

* **P0 Scenarios:** 5 tests (approx. 5 hours implementation).
* **P1 Scenarios:** 3 tests (approx. 2 hours implementation).
* **P2 Scenarios:** 1 test (approx. 0.5 hours implementation).
* **Total Effort:** ~1 day.

---

## 4. Quality Gate Criteria

* **Pass Rate:** 100% on P0 tests.
* **Performance:** Generation starts within 5s (mocked latency).
* **Mitigation:** Token limiter unit tests must pass.
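
### Illustrative Sketch: Token Limiter

The token limiter gate (scenario 2.1.2, risk R-2.2) might be backed by a utility along the following lines. This is a minimal sketch under stated assumptions, not the project's implementation: `ChatMessage`, `estimateTokens`, `truncateHistory`, and `totalTokens` are hypothetical names, the whitespace word count is only a stand-in for the model's real tokenizer, and the actual `PromptEngine` API is not specified in this plan.

```typescript
// Hypothetical sketch: all names here are assumptions for illustration,
// not the real PromptEngine API.

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Crude estimate: one token per whitespace-separated word.
// A real implementation would call the model's own tokenizer.
function estimateTokens(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Keep the most recent messages whose combined estimate fits within
// maxTokens, preserving chronological order in the result.
function truncateHistory(
  messages: ChatMessage[],
  maxTokens: number
): ChatMessage[] {
  const kept: ChatMessage[] = [];
  let used = 0;
  // Walk from newest to oldest and stop at the first message that does not fit.
  for (const msg of [...messages].reverse()) {
    const cost = estimateTokens(msg.content);
    if (used + cost > maxTokens) break;
    kept.unshift(msg);
    used += cost;
  }
  return kept;
}

// Sum of estimated tokens across a list of messages.
function totalTokens(messages: ChatMessage[]): number {
  return messages.reduce((sum, m) => sum + estimateTokens(m.content), 0);
}
```

A unit test for the gate could then assert that `totalTokens(truncateHistory(history, limit))` never exceeds `limit`, that the newest messages survive truncation, and that an oversized latest message yields an empty history rather than an error. Dropping whole messages from the oldest end is only one possible policy; the summarization option named under R-2.2 would need a separate test.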