# Test Design: Epic 2 - The Magic Mirror

**Epic:** 2 (Ghostwriter & Draft Refinement)
**Scope:** Epic-Level
**Date:** 2026-01-25
**Author:** QA Architect (AI)

## 1. Risk Assessment

### Identified Risks

| Risk ID | Category | Title | Description | Probability (1-3) | Impact (1-3) | Score | Action |
| :-------- | :------- | :------------------------------- | :------------------------------------------------------------------------------------------------- | :---------------- | :----------- | :---- | :----------- |
| **R-2.1** | BUS | **Hallucination / Poor Quality** | Ghostwriter generates content unrelated to the user's insight or creates fictional details. | 2 (Possible) | 3 (Critical) | **6** | **MITIGATE** |
| **R-2.2** | TECH | **Context Window Overflow** | Long chat sessions exceed the token limit for the generation prompt, causing truncation or errors. | 2 (Possible) | 3 (Critical) | **6** | **MITIGATE** |
| **R-2.3** | TECH | **State Desynchronization** | UI gets stuck in "Drafting" state if the LLM request hangs or fails silently. | 2 (Possible) | 2 (Degraded) | 4 | MONITOR |
| **R-2.4** | TECH | **Clipboard API Failures** | "One-Click Copy" fails on certain mobile browsers due to permission policies. | 2 (Possible) | 2 (Degraded) | 4 | MONITOR |
| **R-2.5** | UI | **Markdown Rendering Issues** | Generated artifacts break layout (e.g., extremely long code blocks, tables on mobile). | 1 (Unlikely) | 1 (Minor) | 1 | DOCUMENT |

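
The Score column in the table above is the product of Probability and Impact. A minimal sketch of that arithmetic follows. The names `riskScore` and `riskAction` are hypothetical, and the thresholds are an assumption chosen only to reproduce the Action column shown here; the table does not define how intermediate scores such as 2 or 3 should be handled.

```typescript
type Action = "MITIGATE" | "MONITOR" | "DOCUMENT";

// Score = probability x impact, each on the 1-3 scale used in the risk table.
function riskScore(probability: number, impact: number): number {
  return probability * impact;
}

// Assumed thresholds: they match the five rows in the table, nothing more.
function riskAction(score: number): Action {
  if (score >= 6) return "MITIGATE";
  if (score >= 4) return "MONITOR";
  return "DOCUMENT";
}

// R-2.1: Possible (2) x Critical (3)
const r21 = riskScore(2, 3);
console.log(r21, riskAction(r21));
```
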
### Mitigation Strategies (High Risks)

**R-2.1: Hallucination / Poor Quality (Score 6)**

* **Mitigation:** Implement specific "Grounding" prompts. Use `evals` (automated evaluation) to check if output tokens overlap with input "Insight" tokens.
* **Owner:** Prompt Engineer / Dev
* **Validation:** Automated Prompt Tests (checking recall of key facts).

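
The token-overlap check described for R-2.1 could look roughly like the sketch below. It measures what fraction of the distinct words in the insight also appear in the generated draft. The helper names `tokenize` and `insightRecall` are hypothetical, the word-level tokenization is a simplification (it is not the model's tokenizer), and any pass threshold would need to be tuned against real samples.

```typescript
// Lowercase, split on non-alphanumeric characters, and drop very short words.
function tokenize(text: string): Set<string> {
  const words = text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word.length > 2);
  return new Set(words);
}

// Fraction (0..1) of distinct insight tokens that also appear in the draft.
// An empty insight is treated as fully recalled.
function insightRecall(insight: string, draft: string): number {
  const insightTokens = Array.from(tokenize(insight));
  if (insightTokens.length === 0) return 1;
  const draftTokens = tokenize(draft);
  const hits = insightTokens.filter((token) => draftTokens.has(token)).length;
  return hits / insightTokens.length;
}

// Example: an eval might flag drafts whose recall falls below an assumed threshold.
const recall = insightRecall(
  "Remote teams need async rituals",
  "Async rituals help remote teams stay aligned"
);
console.log(recall >= 0.6 ? "grounded" : "needs review");
```
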
**R-2.2: Context Window Overflow (Score 6)**

* **Mitigation:** Implement strict token counting utility. Summarize or truncate chat history intelligently before sending to Ghostwriter.
* **Owner:** Dev Team
* **Validation:** Unit tests for `PromptEngine` with large mock inputs.

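
One way the truncation step for R-2.2 might be approached is to keep only the most recent messages that fit a token budget. The sketch below is an illustration under assumptions: `ChatMessage`, `estimateTokens`, and `truncateHistory` are hypothetical names, and the four-characters-per-token estimate is a rough heuristic that a real implementation would replace with the provider's tokenizer. Summarization, the other option mentioned above, is not shown.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Rough heuristic (assumption): about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Walk from newest to oldest and keep messages until the budget is exhausted.
// The returned array preserves the original chronological order.
function truncateHistory(messages: ChatMessage[], maxTokens: number): ChatMessage[] {
  const kept: ChatMessage[] = [];
  let used = 0;
  for (const message of [...messages].reverse()) {
    const cost = estimateTokens(message.content);
    if (used + cost > maxTokens) break;
    kept.unshift(message);
    used += cost;
  }
  return kept;
}

// A "large mock input" style check: 500 long messages against a small budget.
const longHistory: ChatMessage[] = Array.from({ length: 500 }, (_, i) => ({
  role: i % 2 === 0 ? "user" : "assistant",
  content: "x".repeat(400),
}));
console.log(truncateHistory(longHistory, 1000).length);
```
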
---

## 2. Test Coverage Plan

### Acceptance Criteria Mapping

| Story | ID | Scenario | Level | Priority | Risk Link |
| :------ | :---- | :-------------------------------------------------------------- | :---------- | :------- | :-------- |
| **2.1** | 2.1.1 | Ghostwriter receives correct chat context (Prompt Construction) | Unit | **P0** | R-2.1 |
| **2.1** | 2.1.2 | Token limit enforcement (Truncation/Error) | Unit | **P0** | R-2.2 |
| **2.1** | 2.1.3 | Generated draft is valid Markdown | Unit | P1 | R-2.5 |
| **2.2** | 2.2.1 | Draft Sheet slides up upon completion | Component | P1 | - |
| **2.2** | 2.2.2 | Draft view renders Markdown correctly (Headers, lists) | Component | P2 | R-2.5 |
| **2.3** | 2.3.1 | "Thumbs Down" triggers feedback prompt | Integration | P1 | - |
| **2.3** | 2.3.2 | Regeneration respects user critique | E2E | **P0** | R-2.1 |
| **2.4** | 2.4.1 | "Copy" button places text in clipboard | E2E | **P0** | R-2.4 |
| **2.4** | 2.4.2 | "Save" marks session as completed in DB | Integration | **P0** | - |

### Test Levels Strategy

* **Unit Tests:**
    * `PromptEngine`: Verify context insertion and token limits.
    * `MarkdownParser`: Verify safe rendering logic.
* **Component Tests:**
    * `DraftSheet`: Verify open/close animations and state binding (Zustand).
    * `MarkdownRenderer`: Visual regression tests for styles.
* **Integration Tests:**
    * `GhostwriterService`: Mock LLM response -> Verify State Update -> Verify DB Update.
* **E2E Tests:**
    * **Full Flow (P0):** Chat -> Generate -> Copy to Clipboard.
    * **Refinement Flow (P1):** Generate -> Critique -> Regenerate.

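
The "Verify State Update" step in the integration flow is easier to assert on when the draft state changes through a pure transition function. The sketch below is one possible shape, not the project's actual store: `DraftState`, `DraftEvent`, and `nextDraftState` are hypothetical names, and the explicit timeout event is an assumed way to keep the UI from staying in the "Drafting" state (risk R-2.3). A Zustand store could call a function like this from its actions.

```typescript
type DraftStatus = "idle" | "drafting" | "ready" | "error";

interface DraftState {
  status: DraftStatus;
  draft: string | null;
  error: string | null;
}

type DraftEvent =
  | { type: "GENERATE_REQUESTED" }
  | { type: "LLM_SUCCEEDED"; markdown: string }
  | { type: "LLM_FAILED"; reason: string }
  | { type: "LLM_TIMED_OUT" };

const initialDraftState: DraftState = { status: "idle", draft: null, error: null };

// Pure transition: LLM results are only accepted while a draft is in progress,
// and every failure path leaves "drafting" so the UI cannot hang there.
function nextDraftState(state: DraftState, event: DraftEvent): DraftState {
  switch (event.type) {
    case "GENERATE_REQUESTED":
      return { status: "drafting", draft: null, error: null };
    case "LLM_SUCCEEDED":
      return state.status === "drafting"
        ? { status: "ready", draft: event.markdown, error: null }
        : state;
    case "LLM_FAILED":
      return state.status === "drafting"
        ? { status: "error", draft: null, error: event.reason }
        : state;
    case "LLM_TIMED_OUT":
      return state.status === "drafting"
        ? { status: "error", draft: null, error: "timeout" }
        : state;
  }
}

// Mocked LLM response -> state update, as an integration test might drive it.
let state = nextDraftState(initialDraftState, { type: "GENERATE_REQUESTED" });
state = nextDraftState(state, { type: "LLM_SUCCEEDED", markdown: "# Draft" });
console.log(state.status);
```
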
---

## 3. Execution Plan

### Smoke Tests (Pre-Merge)

1. **Unit:** `PromptEngine` sanity checks.
2. **E2E:** Basic Generation Flow (Mocked LLM).

### Regression Suite (Nightly)

1. **Unit:** Token limit edge cases.
2. **E2E:** Clipboard functionality on mobile viewport emulation.
3. **Prompt Evals:** Quality checks on sample inputs.

### Resource Estimates

* **P0 Scenarios:** 5 tests (approx. 5 hours implementation).
* **P1 Scenarios:** 3 tests (approx. 2 hours implementation).
* **P2 Scenarios:** 1 test (approx. 0.5 hours implementation).
* **Total Effort:** ~1 day.

---

## 4. Quality Gate Criteria

* **Pass Rate:** 100% on P0 tests.
* **Performance:** Generation starts within 5s (mocked latency).
* **Mitigation:** Token limiter unit tests must pass.
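
The three gate criteria above can be checked mechanically once test results are collected. The following is a minimal sketch under assumptions: `TestResult`, `GateInput`, and `evaluateQualityGate` are hypothetical names, and the token limiter tests are identified by scenario ID `2.1.2` from the acceptance criteria table, which is an assumed convention rather than something the plan specifies.

```typescript
interface TestResult {
  id: string;
  priority: "P0" | "P1" | "P2";
  passed: boolean;
}

interface GateInput {
  results: TestResult[];
  generationStartMs: number; // measured against mocked latency
  tokenLimiterIds?: string[]; // assumed default: scenario 2.1.2
}

function evaluateQualityGate(input: GateInput): { passed: boolean; failures: string[] } {
  const failures: string[] = [];
  const limiterIds = input.tokenLimiterIds ?? ["2.1.2"];

  // Pass Rate: every P0 test must pass.
  const failedP0 = input.results.filter((r) => r.priority === "P0" && !r.passed);
  if (failedP0.length > 0) {
    failures.push("P0 failures: " + failedP0.map((r) => r.id).join(", "));
  }

  // Performance: generation must start within 5 seconds.
  if (input.generationStartMs > 5000) {
    failures.push("Generation start took " + input.generationStartMs + " ms");
  }

  // Mitigation: token limiter tests must be present and passing.
  for (const id of limiterIds) {
    const result = input.results.find((r) => r.id === id);
    if (!result || !result.passed) failures.push("Token limiter test " + id + " did not pass");
  }

  return { passed: failures.length === 0, failures };
}
```

A CI step could call `evaluateQualityGate` with the collected results and fail the build when `passed` is `false`, printing `failures` as the reason.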