Files
WizardMerge/.github/issues/05-sdg-analysis.md
2025-12-27 03:11:55 +00:00

11 KiB

title, labels, assignees, milestone
title labels assignees milestone
Phase 2.1: System Dependence Graph (SDG) Analysis for Intelligent Conflict Resolution
enhancement
phase-2
sdg-analysis
research
high-priority
Phase 2 - Intelligence & Usability

Overview

Implement System Dependence Graph (SDG) analysis based on the research paper from The University of Hong Kong. This is the core innovation of WizardMerge that achieves 28.85% reduction in conflict resolution time and provides merge suggestions for >70% of conflicted blocks.

Phase 2.1 - Smart Conflict Resolution (SDG Analysis)

Research Foundation

Based on the paper in docs/PAPER.md, which demonstrates:

  • 28.85% reduction in conflict resolution time
  • Merge suggestions for >70% of code blocks affected by conflicts
  • Tested on 227 conflicts across five large-scale projects
  • Uses dependency analysis at text and LLVM-IR levels

What is SDG Analysis?

A System Dependence Graph (SDG) captures dependencies between code blocks:

  • Nodes: Code blocks (statements, expressions, definitions)
  • Edges: Dependencies (data flow, control flow)
  • Types:
    • Data dependencies (def-use relationships)
    • Control dependencies (branching, loops)
    • Call dependencies (function calls)

Why SDG for Merging?

Traditional text-based merging ignores code semantics:

# Base
x = 1
y = x + 1

# Ours: Change x
x = 2
y = x + 1

# Theirs: Use y
x = 1
y = x + 1
print(y)  # Depends on x!

# Text merge: May miss that x change affects print(y)
# SDG merge: Detects dependency and suggests keeping x = 2

Architecture

Multi-Level Dependency Analysis

┌─────────────────────────────────────────┐
│     Text-Level Dependencies             │
│  - Line-to-line dependencies            │
│  - Block-to-block dependencies          │
│  - Variable def-use (regex-based)       │
└─────────────────────────────────────────┘
                ↓
┌─────────────────────────────────────────┐
│     AST-Level Dependencies              │
│  - Function calls                       │
│  - Variable references                  │
│  - Class/method relationships           │
│  - Import dependencies                  │
└─────────────────────────────────────────┘
                ↓
┌─────────────────────────────────────────┐
│   LLVM-IR Level Dependencies (C/C++)    │
│  - Precise data flow                    │
│  - Control flow                         │
│  - Pointer aliasing                     │
│  - Memory dependencies                  │
└─────────────────────────────────────────┘

SDG Construction

class SystemDependenceGraph {
public:
  // Build SDG for a code file
  void build_graph(const std::string& code, const std::string& language);
  
  // Add nodes (code blocks)
  NodeID add_node(const CodeBlock& block);
  
  // Add edges (dependencies)
  void add_edge(NodeID from, NodeID to, DependencyType type);
  
  // Query dependencies
  std::vector<NodeID> get_dependencies(NodeID node);
  std::vector<NodeID> get_dependents(NodeID node);
  
  // Transitive closure
  std::set<NodeID> get_transitive_dependencies(NodeID node);
  
  // Conflict analysis
  ConflictAnalysis analyze_conflict(
    const SDG& base_sdg,
    const SDG& ours_sdg,
    const SDG& theirs_sdg
  );
};

Dependency Types

enum class DependencyType {
  // Data dependencies
  DATA_FLOW,         // x = 1; y = x;
  DEF_USE,           // Definition-use chain
  
  // Control dependencies
  CONTROL_FLOW,      // if (x) { y = 1; }
  CALL,              // foo(); (inside foo's body)
  
  // Structural dependencies
  IMPORT,            // import X; use X.method()
  INHERITANCE,       // class B extends A
  FIELD_ACCESS,      // obj.field
  
  // LLVM-IR dependencies (C/C++)
  MEMORY_ALIAS,      // Pointer aliasing
  LOAD_STORE,        // Memory load/store dependencies
};

Features to Implement

1. Text-Level Dependency Analysis

  • Line-to-line dependencies

    • Variable definition tracking (regex-based)
    • Variable use tracking
    • Build def-use chains
  • Block-level dependencies

    • Identify code blocks (functions, loops, conditionals)
    • Track block boundaries
    • Build block dependency graph

Algorithm:

For each line:
  1. Extract variable definitions (x = ..., def foo():)
  2. Extract variable uses (y = x + 1)
  3. Create dependency edge: definition → use

2. AST-Level Dependency Analysis

  • Parse code into AST (using tree-sitter)
  • Extract semantic elements:
    • Function definitions and calls
    • Variable declarations and references
    • Class definitions and instantiations
    • Import/include statements
  • Build dependency graph:
    • Function call graph
    • Variable reference graph
    • Import dependency graph
  • Detect conflicts:
    • Modified functions with dependent code
    • Changed variables still in use
    • Removed imports still referenced

Example:

# Base
def compute(x):
    return x * 2

result = compute(5)

# Ours: Change compute
def compute(x):
    return x * 3  # Changed!

result = compute(5)

# Theirs: Use result
def compute(x):
    return x * 2

result = compute(5)
print(result)  # Depends on compute!

# SDG Analysis: Detects that print(result) depends on compute()
# Suggests: Keep changed compute, preserve print

3. LLVM-IR Level Analysis (C/C++)

  • Compile to LLVM IR

    • Use Clang to generate LLVM IR
    • Parse IR and build control flow graph (CFG)
    • Build data flow graph (DFG)
  • Analyze dependencies:

    • Data flow (load/store, SSA form)
    • Control flow (branches, loops)
    • Memory dependencies (aliasing)
    • Function calls
  • Conflict detection:

    • Detect violated dependencies
    • Find affected code blocks
    • Compute conflict impact radius

Tools: LLVM libraries (libclang, LLVM IR parser)

4. Conflict Analysis with SDG

struct ConflictAnalysis {
  // Classification
  bool is_true_conflict;        // Semantic conflict
  bool is_false_conflict;       // Text conflict only
  
  // Impact
  std::set<NodeID> affected_blocks;  // Blocks affected by conflict
  int impact_radius;            // Distance in dependency graph
  
  // Dependencies
  std::vector<Dependency> violated_edges;  // Dependencies broken by merge
  std::vector<Dependency> safe_edges;      // Dependencies preserved
  
  // Suggestions
  std::vector<Resolution> suggestions;     // Merge suggestions
  double confidence;            // Confidence score (0-1)
};

class ConflictAnalyzer {
  ConflictAnalysis analyze(
    const SDG& base_sdg,
    const SDG& ours_sdg,
    const SDG& theirs_sdg,
    const ConflictRegion& conflict
  );
  
  // Classify edges
  void classify_edges();
  
  // Compute impact
  void compute_impact();
  
  // Generate suggestions
  void generate_suggestions();
};

5. Visualization

  • Dependency graph viewer:

    • Interactive graph visualization
    • Show nodes (code blocks)
    • Show edges (dependencies)
    • Highlight conflicts and affected blocks
  • Impact visualization:

    • Color-coded nodes (safe, conflicted, affected)
    • Show dependency paths
    • Display conflict impact radius
  • Suggestion UI:

    • Show suggested resolutions
    • Display confidence scores
    • Explain reasoning (why this suggestion?)

Library: D3.js (web) or Qt Graphics (desktop)

Implementation Steps

  1. Phase 1: Text-Level Analysis (2 weeks)

    • Implement variable tracking
    • Build line-level dependency graph
    • Test on simple examples
  2. Phase 2: AST-Level Analysis (3 weeks)

    • Integrate tree-sitter
    • Parse AST for Python, JS, Java, C++
    • Build semantic dependency graph
    • Test on real-world code
  3. Phase 3: LLVM-IR Analysis (3 weeks)

    • Integrate LLVM/Clang
    • Generate and parse LLVM IR
    • Build precise dependency graph
    • Test on C/C++ projects
  4. Phase 4: Conflict Analysis (2 weeks)

    • Implement edge classification
    • Compute conflict impact
    • Generate merge suggestions
    • Test on conflict datasets
  5. Phase 5: Visualization (2 weeks)

    • Build dependency graph viewer
    • Integrate into UI
    • User testing
  6. Phase 6: Integration & Optimization (2 weeks)

    • Integrate with merge pipeline
    • Optimize performance
    • Cache dependency graphs
    • Final testing

Libraries and Tools

  • tree-sitter: AST parsing
  • LLVM/Clang: IR generation and analysis
  • Boost Graph Library: Graph algorithms
  • D3.js: Visualization (web)
  • Qt Graphics: Visualization (desktop)

Acceptance Criteria

  • Text-level dependency analysis works
  • AST-level dependency analysis works for Python, JS, Java, C++
  • LLVM-IR analysis works for C/C++
  • Conflict analyzer detects true vs. false conflicts
  • Generates merge suggestions with confidence scores
  • Achieves >70% suggestion rate on test dataset
  • Reduces resolution time by >25% (user study)
  • Visualization is clear and helpful
  • Performance: <500ms for files up to 2000 lines
  • Documentation complete

Test Cases

Text-Level

  1. Simple def-use chain
  2. Multiple definitions
  3. Variable shadowing
  4. Cross-block dependencies

AST-Level

  1. Function call dependencies
  2. Import dependencies
  3. Class inheritance
  4. Variable references across scopes

LLVM-IR (C/C++)

  1. Pointer aliasing
  2. Memory dependencies
  3. Control flow dependencies
  4. Function call dependencies

Conflict Analysis

  1. True conflict (semantic)
  2. False conflict (text-only)
  3. Transitive dependencies
  4. Conflict impact radius
  5. Suggestion generation

Priority

HIGH - This is the core innovation of WizardMerge and the main research contribution.

Estimated Effort

10-14 weeks (full implementation)

Incremental approach:

  • Milestone 1 (4 weeks): Text and AST analysis
  • Milestone 2 (4 weeks): LLVM-IR analysis
  • Milestone 3 (4 weeks): Conflict analysis and visualization

Dependencies

  • AST-based merging (Issue #TBD)
  • Three-way merge algorithm
  • #TBD (Phase 2 tracking)
  • #TBD (AST-based merging)
  • #TBD (Visualization enhancements)

Success Metrics

  • 28.85% reduction in conflict resolution time (match research)
  • >70% suggestion rate for conflicted blocks
  • 90% user satisfaction with SDG-based suggestions
  • High precision (>80%) for conflict detection

References

  • Research paper: docs/PAPER.md
  • Formal specification: spec/WizardMergeSpec.tla
  • ROADMAP.md Phase 2.1

Future Enhancements

  • Machine learning for confidence scoring
  • User feedback loop (learn from resolutions)
  • Cross-file dependency analysis
  • Language-specific semantic analysis
  • Integration with LSP for real-time analysis