--- title: "Phase 2.7: Comprehensive Testing & Quality Assurance" labels: ["testing", "quality", "phase-2", "high-priority"] assignees: [] milestone: "Phase 2 - Intelligence & Usability" --- ## Overview Establish comprehensive testing infrastructure and quality assurance processes to ensure WizardMerge is reliable, performant, and correct. This includes unit tests, integration tests, performance benchmarks, and fuzzing. ## Related Roadmap Section Phase 2.7 - Testing & Quality ## Motivation As WizardMerge grows more complex with semantic merging, SDG analysis, and multi-platform support, we need: - Confidence that changes don't break existing functionality - Performance metrics to prevent regressions - Edge case coverage to handle real-world scenarios - Quality documentation and examples ## Testing Strategy ### 1. Unit Tests **Coverage Target**: >90% code coverage #### Backend (C++) - [ ] **Three-way merge algorithm** - Test all merge cases (clean merge, conflicts, auto-resolution) - Test edge cases (empty files, binary files, large files) - Test different line endings (LF, CRLF) - [ ] **Semantic mergers** - JSON merger tests (objects, arrays, nested structures) - YAML merger tests (comments, anchors, multi-document) - XML merger tests (namespaces, attributes, DTD) - Package file merger tests (version conflicts, dependencies) - [ ] **AST mergers** - Python: imports, functions, classes - JavaScript: ES6 modules, React components - Java: classes, methods, annotations - C++: includes, namespaces, templates - [ ] **SDG analysis** - Dependency graph construction - Edge classification - Conflict analysis - Suggestion generation - [ ] **Git integration** - Git CLI operations - Repository detection - Branch operations - PR/MR fetching **Framework**: Google Test (gtest) ```cpp // Example unit test TEST(ThreeWayMergeTest, NonOverlappingChanges) { ThreeWayMerge merger; std::string base = "line1\nline2\nline3\n"; std::string ours = "line1\nline2_modified\nline3\n"; std::string theirs = "line1\nline2\nline3_modified\n"; auto result = merger.merge(base, ours, theirs); ASSERT_TRUE(result.success); ASSERT_FALSE(result.has_conflicts); EXPECT_EQ(result.merged_content, "line1\nline2_modified\nline3_modified\n"); } ``` #### Frontends **Qt6 (C++)**: - [ ] UI component tests - [ ] QML integration tests - [ ] Model-view tests **Framework**: Qt Test **Next.js (TypeScript)**: - [ ] Component tests (React Testing Library) - [ ] API client tests - [ ] Integration tests - [ ] E2E tests (Playwright or Cypress) **Framework**: Jest, React Testing Library, Playwright ```typescript // Example component test import { render, screen, fireEvent } from '@testing-library/react'; import ConflictPanel from './ConflictPanel'; test('renders conflict and resolves with "ours"', () => { const conflict = { id: 1, ours: 'code A', theirs: 'code B' }; const onResolve = jest.fn(); render(); const oursButton = screen.getByText('Keep Ours'); fireEvent.click(oursButton); expect(onResolve).toHaveBeenCalledWith(1, 'ours'); }); ``` ### 2. Integration Tests Test interactions between components: - [ ] **Backend + Git** - Clone repo, create branch, commit changes - Fetch PR/MR data, apply merge, create branch - [ ] **Backend + Frontend** - API calls from UI - WebSocket updates - File upload/download - [ ] **End-to-end scenarios** - User resolves conflict via UI - CLI resolves PR conflicts - Batch resolution of multiple files **Framework**: - C++: Integration test suite with real Git repos - Next.js: Playwright for E2E testing ```typescript // Example E2E test (Playwright) test('resolve conflict via web UI', async ({ page }) => { await page.goto('http://localhost:3000'); // Upload conflicted file await page.setInputFiles('input[type=file]', 'test_conflict.txt'); // Wait for merge analysis await page.waitForSelector('.conflict-panel'); // Click "Keep Ours" await page.click('button:has-text("Keep Ours")'); // Verify resolution const resolved = await page.textContent('.merged-content'); expect(resolved).toContain('code A'); expect(resolved).not.toContain('<<<<<<<'); }); ``` ### 3. Performance Benchmarks **Goals**: - Merge time: <100ms for files up to 10MB - API response: <500ms for typical PRs - UI rendering: <50ms for typical conflicts - SDG analysis: <500ms for files up to 2000 lines **Benchmark Suite**: ```cpp // Benchmark framework: Google Benchmark static void BM_ThreeWayMerge_SmallFile(benchmark::State& state) { std::string base = generate_file(100); // 100 lines std::string ours = modify_lines(base, 10); std::string theirs = modify_lines(base, 10); ThreeWayMerge merger; for (auto _ : state) { auto result = merger.merge(base, ours, theirs); benchmark::DoNotOptimize(result); } } BENCHMARK(BM_ThreeWayMerge_SmallFile); static void BM_ThreeWayMerge_LargeFile(benchmark::State& state) { std::string base = generate_file(10000); // 10k lines std::string ours = modify_lines(base, 100); std::string theirs = modify_lines(base, 100); ThreeWayMerge merger; for (auto _ : state) { auto result = merger.merge(base, ours, theirs); benchmark::DoNotOptimize(result); } } BENCHMARK(BM_ThreeWayMerge_LargeFile); ``` **Metrics to Track**: - Execution time (median, p95, p99) - Memory usage - CPU usage - Throughput (files/second) **Regression Detection**: - Run benchmarks on every commit - Alert if performance degrades >10% - Track performance over time ### 4. Fuzzing Find edge cases and bugs with fuzz testing: **Targets**: - [ ] Three-way merge algorithm - [ ] JSON/YAML/XML parsers - [ ] Git URL parsing - [ ] API input validation **Framework**: libFuzzer, AFL++, or OSS-Fuzz ```cpp // Example fuzz target extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) { std::string input(reinterpret_cast(data), size); ThreeWayMerge merger; try { // Try to crash the merger with random input auto result = merger.merge(input, input, input); } catch (...) { // Catch exceptions to continue fuzzing } return 0; } ``` **Goals**: - Find crashes and hangs - Discover edge cases not covered by unit tests - Improve input validation - Run continuously in CI ### 5. Test Data & Fixtures **Real-World Test Cases**: - [ ] Collect conflicts from popular open-source projects - [ ] Build test dataset with various conflict types - [ ] Include edge cases (large files, binary files, unusual encodings) - [ ] Categorize by difficulty (simple, medium, complex) **Test Repositories**: ``` tests/ ├── fixtures/ │ ├── conflicts/ │ │ ├── simple/ │ │ │ ├── 01-non-overlapping.txt │ │ │ ├── 02-identical-changes.txt │ │ │ └── ... │ │ ├── medium/ │ │ │ ├── 01-json-merge.json │ │ │ ├── 02-python-imports.py │ │ │ └── ... │ │ └── complex/ │ │ ├── 01-sdg-analysis-needed.cpp │ │ ├── 02-multi-file-dependencies.zip │ │ └── ... │ ├── repositories/ │ │ ├── test-repo-1/ # Git repo for integration tests │ │ ├── test-repo-2/ │ │ └── ... │ └── api-responses/ │ ├── github-pr-123.json │ ├── gitlab-mr-456.json │ └── ... ``` ### 6. Continuous Integration **CI Pipeline**: ```yaml # .github/workflows/ci.yml name: CI on: [push, pull_request] jobs: test-backend: runs-on: ubuntu-latest steps: - uses: actions/checkout@v3 - name: Build backend run: cd backend && ./build.sh - name: Run unit tests run: cd backend/build && ctest --output-on-failure - name: Upload coverage uses: codecov/codecov-action@v3 test-frontend-nextjs: runs-on: ubuntu-latest steps: - uses: actions/checkout@v3 - uses: oven-sh/setup-bun@v1 - name: Install dependencies run: cd frontends/nextjs && bun install - name: Run tests run: cd frontends/nextjs && bun test - name: E2E tests run: cd frontends/nextjs && bun run test:e2e benchmark: runs-on: ubuntu-latest steps: - uses: actions/checkout@v3 - name: Build backend run: cd backend && ./build.sh - name: Run benchmarks run: cd backend/build && ./benchmarks - name: Check for regressions run: python scripts/check_benchmark_regression.py fuzz: runs-on: ubuntu-latest steps: - uses: actions/checkout@v3 - name: Build fuzzer run: cd backend && cmake -DFUZZING=ON . && make - name: Run fuzzer (5 minutes) run: ./backend/build/fuzzer -max_total_time=300 ``` ### 7. Code Quality Tools - [ ] **Static Analysis**: clang-tidy, cppcheck - [ ] **Code Coverage**: gcov, lcov (C++), Istanbul (JS) - [ ] **Linting**: cpplint (C++), ESLint (JS), Prettier - [ ] **Memory Safety**: Valgrind, AddressSanitizer - [ ] **Security Scanning**: CodeQL (already in use ✅) ```bash # Run all quality checks ./scripts/quality-check.sh ``` ### 8. Documentation & Examples - [ ] **API Documentation**: Doxygen (C++), JSDoc (JS) - [ ] **User Guide**: Step-by-step examples - [ ] **Developer Guide**: Architecture, contributing - [ ] **Example Conflicts**: Tutorials for common scenarios - [ ] **Video Demos**: Screen recordings of key features ## Implementation Steps ### Phase 1: Unit Tests (3 weeks) - [ ] Set up test frameworks - [ ] Write unit tests for core algorithms - [ ] Achieve 80% code coverage - [ ] CI integration ### Phase 2: Integration Tests (2 weeks) - [ ] Set up test repositories - [ ] Write integration tests - [ ] E2E tests for frontends - [ ] CI integration ### Phase 3: Performance Benchmarks (1 week) - [ ] Set up benchmark framework - [ ] Write benchmark suite - [ ] Baseline measurements - [ ] Regression detection ### Phase 4: Fuzzing (1 week) - [ ] Set up fuzzing infrastructure - [ ] Write fuzz targets - [ ] Run continuous fuzzing - [ ] Fix discovered issues ### Phase 5: Quality Tools (1 week) - [ ] Integrate static analysis - [ ] Set up code coverage - [ ] Memory safety checks - [ ] CI integration ### Phase 6: Documentation (2 weeks) - [ ] Generate API docs - [ ] Write user guide - [ ] Create examples - [ ] Video demos ## Acceptance Criteria - [ ] >90% code coverage for backend - [ ] >80% code coverage for frontends - [ ] All unit tests pass - [ ] All integration tests pass - [ ] Performance benchmarks meet targets - [ ] Zero crashes from fuzzing (after fixes) - [ ] Documentation complete and accurate - [ ] CI pipeline green on all commits ## Priority **HIGH** - Quality and reliability are essential for user trust ## Estimated Effort 10 weeks (can be done in parallel with feature development) ## Dependencies - Core features implemented (Phase 1 and 2) ## Related Issues - #TBD (Phase 2 completion) - #TBD (All feature implementation issues) ## Success Metrics - 0 critical bugs in production - <1% test failure rate - 95% user satisfaction with stability - Performance targets met consistently ## Test Coverage Goals | Component | Coverage Target | |-----------|-----------------| | Three-way merge | 95% | | Semantic mergers | 90% | | AST mergers | 90% | | SDG analysis | 85% | | Git integration | 90% | | API endpoints | 95% | | UI components | 80% | | Overall | 90% |