mirror of
https://github.com/johndoe6345789/WizardMerge.git
synced 2026-04-24 13:44:55 +00:00
455 lines
15 KiB
Plaintext
455 lines
15 KiB
Plaintext
------------------------------- MODULE WizardMergeSpec -------------------------------
|
|
EXTENDS Naturals, FiniteSets
|
|
|
|
(*
|
|
Implementation Status (as of December 2024):
|
|
|
|
This formal specification describes the dependency-aware merge algorithm that
|
|
WizardMerge aims to implement. The current implementation status is:
|
|
|
|
IMPLEMENTED (Phase 1.1):
|
|
- Basic three-way merge algorithm (C++ backend)
|
|
- Line-level conflict detection
|
|
- Auto-resolution for common patterns:
|
|
* Non-overlapping changes
|
|
* Identical changes from both sides
|
|
* Whitespace-only differences
|
|
- Command-line interface (wizardmerge-cli)
|
|
- Pull request/merge request URL processing and conflict resolution:
|
|
* Parse GitHub PR URLs and GitLab MR URLs
|
|
* Fetch PR/MR data via GitHub and GitLab APIs
|
|
* Apply merge algorithm to PR/MR files
|
|
* HTTP API endpoint for PR/MR resolution
|
|
* Support for multiple git platforms (GitHub and GitLab)
|
|
|
|
NOT YET IMPLEMENTED (Future phases):
|
|
- Dependency graph construction (SDG analysis)
|
|
- LLVM-IR level analysis
|
|
- Edge classification (safe vs. violated)
|
|
- Fine-grained DCB (Definition-Code Block) tracking
|
|
- Mirror mapping and matching
|
|
- Git branch creation for resolved PRs/MRs
|
|
- Support for additional platforms (Bitbucket, etc.)
|
|
|
|
The current implementation in backend/src/merge/three_way_merge.cpp provides
|
|
a foundation for the full dependency-aware algorithm specified here. Future
|
|
phases will enhance it with the SDG analysis, edge classification, and
|
|
dependency-aware conflict resolution described in this specification.
|
|
|
|
PR/MR Resolution Workflow (Phase 1.2):
|
|
The PR/MR resolution feature extends the core merge algorithm to work with
|
|
both GitHub pull requests and GitLab merge requests. The workflow is:
|
|
1. Accept PR/MR URL: Parse URL to detect platform and extract owner, repo, and number
|
|
2. Fetch PR/MR metadata: Use platform-specific API to retrieve information
|
|
3. Fetch file versions: Retrieve base and head versions of modified files
|
|
4. Apply merge algorithm: For each file, perform three-way merge
|
|
5. Auto-resolve conflicts: Apply heuristic resolution where possible
|
|
6. Return results: Provide merged content and conflict status
|
|
|
|
Platform Support:
|
|
- GitHub: Uses GitHub API v3 with "Authorization: token" header
|
|
- GitLab: Uses GitLab API v4 with "PRIVATE-TOKEN" header
|
|
- Both platforms support public and private repositories with proper authentication
|
|
|
|
This workflow enables batch processing of PR/MR conflicts using the same
|
|
dependency-aware merge principles, with future integration planned for
|
|
automatic branch creation and PR/MR updates.
|
|
*)
|
|
|
|
(*
|
|
High-level intent
|
|
|
|
This module formalizes the core of WizardMerge's merge-handling logic at the
|
|
dependency-graph level. It abstracts away concrete source code and LLVM IR and
|
|
instead reasons about:
|
|
|
|
- Vertices: fine-grained DCB-bearing definition nodes in the Shrunk
|
|
Dependency Graphs (SDGs) for VA and VB.
|
|
|
|
- Edges: directed dependency edges v -> u in those SDGs.
|
|
|
|
- Per-vertex status in the merged result:
|
|
* applied (vertex chosen by Git's textual merge)
|
|
* conflict (vertex is part of a conflict in Git's textual merge)
|
|
* not-applied (vertex is discarded in the merge)
|
|
|
|
- Mirror mapping Mi(v): the "mirror" vertex of v in the other SDG.
|
|
|
|
- Matching information: whether a mirror node Mi(v) is a "matching" node
|
|
(same definition name and compatible DCB metadata).
|
|
|
|
The key objective is to classify edges as safe vs. violated according to the
|
|
edge-sensitive rules (1)-(6) in the paper, and to derive the set of violated
|
|
vertices/DCBs. A "properly handled" merge is one where there are no violated
|
|
edges (GoodMerge = TRUE).
|
|
*)
|
|
|
|
CONSTANTS
|
|
(*
|
|
Three versions used by the textual three-way merge:
|
|
- Base : common ancestor commit
|
|
- VA : Variant A
|
|
- VB : Variant B
|
|
|
|
These are only used as tags; they are assumed to be distinct.
|
|
*)
|
|
Base, VA, VB,
|
|
|
|
(*
|
|
VERTICES: the set of all vertices (fine-grained definition/DCB nodes) that
|
|
appear in either SDG (for VA or VB). Each vertex belongs to exactly one
|
|
SDG (either VA's or VB's).
|
|
*)
|
|
VERTICES,
|
|
|
|
(*
|
|
EDGES: the set of dependency edges in both SDGs. Each edge is a record
|
|
[from |-> v, to |-> u], which corresponds to v -> u (v depends on u).
|
|
*)
|
|
EDGES,
|
|
|
|
(*
|
|
VersionTag: a function that maps each vertex to the SDG it belongs to,
|
|
i.e. either VA or VB. Base is never used as a VersionTag value.
|
|
*)
|
|
VersionTag,
|
|
|
|
(*
|
|
Mirror: Mi(v) in the paper. For a vertex v in one SDG, Mirror[v] is the
|
|
unique corresponding vertex in the other SDG. Mirror is a bijection and
|
|
an involution (Mirror[Mirror[v]] = v).
|
|
*)
|
|
Mirror,
|
|
|
|
(*
|
|
MatchSet: the set of vertices whose mirror node is a "matching" node
|
|
according to the graph-text alignment phase (same definition and aligned
|
|
DCB metadata). In the paper this is match(Mi(v)) = True/False.
|
|
Here we encode it as a unary predicate: v ∈ MatchSet.
|
|
*)
|
|
MatchSet,
|
|
|
|
(*
|
|
AppliedSet: the set V_A of vertices whose DCBs are applied in the merged
|
|
textual result from Git (non-conflicting choice).
|
|
*)
|
|
AppliedSet,
|
|
|
|
(*
|
|
ConflictSet: the set V_C of vertices that are part of a textual conflict
|
|
produced by Git.
|
|
*)
|
|
ConflictSet
|
|
|
|
(***************************************************************************)
|
|
(* Basic derived sets and assumptions *)
|
|
(***************************************************************************)
|
|
|
|
(*
|
|
NotAppliedSet: the set V_N of vertices that are not applied in the textual
|
|
merge result (they are discarded or overshadowed).
|
|
*)
|
|
NotAppliedSet ==
|
|
VERTICES \ (AppliedSet \cup ConflictSet)
|
|
|
|
(*
|
|
Sanity assumptions about the configuration. These are required to model the
|
|
SDG properties stated in the paper.
|
|
*)
|
|
ASSUME
|
|
/\ AppliedSet \subseteq VERTICES
|
|
/\ ConflictSet \subseteq VERTICES
|
|
/\ AppliedSet \cap ConflictSet = {} \* no vertex is both applied and conflicted
|
|
/\ EDGES \subseteq [from : VERTICES, to : VERTICES]
|
|
/\ MatchSet \subseteq VERTICES
|
|
/\ VersionTag \in [VERTICES -> {VA, VB}]
|
|
/\ Mirror \in [VERTICES -> VERTICES]
|
|
/\ \A v \in VERTICES : Mirror[Mirror[v]] = v \* Mirror is an involution
|
|
/\ \A v \in VERTICES : VersionTag[Mirror[v]] # VersionTag[v]
|
|
\* Mirror always points into the "other" SDG
|
|
/\ \A v \in VERTICES :
|
|
(*
|
|
SDG facts from the paper:
|
|
|
|
- If v ∈ V_A (applied) then Mi(v) ∈ V_N (not applied), and vice versa.
|
|
- If v ∈ V_C (conflict) then Mi(v) ∈ V_C as well.
|
|
*)
|
|
/\ (v \in AppliedSet) <=> (Mirror[v] \in NotAppliedSet)
|
|
/\ (v \in ConflictSet) <=> (Mirror[v] \in ConflictSet)
|
|
|
|
(***************************************************************************)
|
|
(* Status predicates *)
|
|
(***************************************************************************)
|
|
|
|
IsApplied(v) == v \in AppliedSet
|
|
IsConflict(v) == v \in ConflictSet
|
|
IsNotApplied(v) == v \in NotAppliedSet
|
|
|
|
(*
|
|
MirrorMatch(v) corresponds to match(Mi(v)) = True for the mirror node of v.
|
|
Since MatchSet is defined over the original vertex set, we test membership
|
|
for Mirror[v].
|
|
*)
|
|
MirrorMatch(v) == Mirror[v] \in MatchSet
|
|
|
|
(***************************************************************************)
|
|
(* Edge classification: Safe vs Violated *)
|
|
(***************************************************************************)
|
|
|
|
(*
|
|
The six scenarios in the paper are phrased over edges e = (v, u) with v
|
|
depending on u. We encode them as predicates over (v, u).
|
|
|
|
Safe scenarios (Equations (1)-(4) in the paper):
|
|
|
|
(1) v ∈ V_A, u ∈ V_A
|
|
(2) v ∈ V_A, u ∉ V_A, match(Mi(u)) = True
|
|
(3) v ∈ V_C, u ∈ V_A
|
|
(4) v ∈ V_C, u ∉ V_A, match(Mi(u)) = True
|
|
|
|
Violated scenarios (Equations (5)-(6)):
|
|
|
|
(5) v ∈ V_A, u ∉ V_A, match(Mi(u)) = False
|
|
(6) v ∈ V_C, u ∉ V_A, match(Mi(u)) = False
|
|
|
|
All scenarios not detected as safe are treated as violated in the algorithm.
|
|
*)
|
|
|
|
SafeEdge(v, u) ==
|
|
\/ (IsApplied(v) /\ IsApplied(u)) \* (1)
|
|
\/ (IsApplied(v) /\ ~IsApplied(u) /\ MirrorMatch(u)) \* (2)
|
|
\/ (IsConflict(v) /\ IsApplied(u)) \* (3)
|
|
\/ (IsConflict(v) /\ ~IsApplied(u) /\ MirrorMatch(u)) \* (4)
|
|
|
|
ViolatedEdge(v, u) ==
|
|
\/ (IsApplied(v) /\ ~IsApplied(u) /\ ~MirrorMatch(u)) \* (5)
|
|
\/ (IsConflict(v) /\ ~IsApplied(u) /\ ~MirrorMatch(u)) \* (6)
|
|
|
|
(*
|
|
Edge-level classification, as sets of EDGES.
|
|
*)
|
|
|
|
SafeEdges ==
|
|
{ e \in EDGES :
|
|
LET v == e.from
|
|
u == e.to
|
|
IN SafeEdge(v, u)
|
|
}
|
|
|
|
ViolatedEdges ==
|
|
{ e \in EDGES :
|
|
LET v == e.from
|
|
u == e.to
|
|
IN ViolatedEdge(v, u)
|
|
}
|
|
|
|
(*
|
|
In the intended algorithm, every edge encountered during SDG traversal is
|
|
either safe or violated (never both).
|
|
*)
|
|
EdgePartitionInvariant ==
|
|
/\ SafeEdges \cap ViolatedEdges = {}
|
|
/\ \A e \in EDGES :
|
|
LET v == e.from
|
|
u == e.to
|
|
IN SafeEdge(v, u) \/ ViolatedEdge(v, u)
|
|
|
|
(***************************************************************************)
|
|
(* Violated vertices and DCBs *)
|
|
(***************************************************************************)
|
|
|
|
(*
|
|
ViolatedVertices: the set of vertices whose DCBs are considered violated by
|
|
WizardMerge. The paper's algorithm also marks:
|
|
|
|
- v, u, Mi(v), Mi(u) whenever an edge (v, u) is in a violated scenario, and
|
|
- all conflict vertices as violated, regardless of edge classification.
|
|
|
|
We encode that directly.
|
|
*)
|
|
|
|
ViolatedVertices ==
|
|
(*
|
|
Vertices that participate in at least one violated edge, either directly
|
|
or through their mirror. This matches "mark v, u, Mi(v), Mi(u) as violated".
|
|
*)
|
|
{ w \in VERTICES :
|
|
\E e \in ViolatedEdges :
|
|
LET v == e.from
|
|
u == e.to
|
|
IN (w = v) \/ (w = u) \/ (w = Mirror[v]) \/ (w = Mirror[u])
|
|
}
|
|
\cup ConflictSet
|
|
|
|
(*
|
|
SafeVertices: vertices that are not violated. This is a useful derived set
|
|
for reasoning about which DCBs can be safely auto-applied.
|
|
*)
|
|
SafeVertices ==
|
|
VERTICES \ ViolatedVertices
|
|
|
|
(***************************************************************************)
|
|
(* "Properly handled" merges *)
|
|
(***************************************************************************)
|
|
|
|
(*
|
|
GoodMerge: a merge whose dependency structure is fully consistent and does
|
|
not contain any violated edges. In such a configuration, WizardMerge would
|
|
detect no violated DCBs beyond conflicts (which we treat as violations to
|
|
be resolved manually).
|
|
*)
|
|
GoodMerge ==
|
|
ViolatedEdges = {}
|
|
|
|
(*
|
|
Stronger variant: in a good merge, no applied vertex depends (transitively)
|
|
on a violated or non-applied unmatched definition. This is not enforced as
|
|
an invariant here, but can be used as a property to check on concrete
|
|
instances.
|
|
*)
|
|
|
|
(***************************************************************************)
|
|
(* Trivial temporal structure for TLC *)
|
|
(***************************************************************************)
|
|
|
|
(*
|
|
We model a single-shot computation: there is no dynamic evolution of the
|
|
graph in this abstraction. All relevant structure is encoded in CONSTANTS,
|
|
and all operators above are purely definitional.
|
|
|
|
To allow TLC to run, we introduce a dummy state variable and a trivial
|
|
transition relation. In practice, you instantiate the CONSTANTS via a
|
|
config file, and then check invariants like EdgePartitionInvariant,
|
|
GoodMerge, and properties over ViolatedVertices/SafeVertices.
|
|
*)
|
|
|
|
VARIABLE dummy
|
|
|
|
Init ==
|
|
TRUE
|
|
|
|
Next ==
|
|
UNCHANGED dummy
|
|
|
|
Spec ==
|
|
Init /\ [][Next]_<<dummy>>
|
|
|
|
(***************************************************************************)
|
|
(* Example invariants to be checked by TLC *)
|
|
(***************************************************************************)
|
|
|
|
Inv ==
|
|
EdgePartitionInvariant
|
|
|
|
THEOREM Spec => []Inv
|
|
|
|
(***************************************************************************)
|
|
(* Pull Request/Merge Request Resolution Specification (Phase 1.2) *)
|
|
(***************************************************************************)
|
|
|
|
(*
|
|
This section extends the core merge specification to model the PR/MR resolution
|
|
workflow. It describes how WizardMerge processes GitHub pull requests and
|
|
GitLab merge requests to identify and resolve conflicts across multiple files.
|
|
|
|
Supported Platforms:
|
|
- GitHub: Uses "pull request" terminology with "/pull/" URL path
|
|
- GitLab: Uses "merge request" terminology with "/-/merge_requests/" URL path
|
|
*)
|
|
|
|
CONSTANTS
|
|
(*
|
|
GitPlatform: the platform type - GitHub or GitLab
|
|
*)
|
|
GitPlatform,
|
|
|
|
(*
|
|
PR_FILES: the set of all files in the pull/merge request
|
|
*)
|
|
PR_FILES,
|
|
|
|
(*
|
|
FileStatus: maps each file to its modification status in the PR/MR
|
|
Possible values: "modified", "added", "removed", "renamed"
|
|
*)
|
|
FileStatus,
|
|
|
|
(*
|
|
BaseSHA, HeadSHA: commit identifiers for base and head of the PR/MR
|
|
*)
|
|
BaseSHA, HeadSHA
|
|
|
|
(*
|
|
Platform types - GitHub uses pull requests, GitLab uses merge requests
|
|
*)
|
|
ASSUME GitPlatform \in {"GitHub", "GitLab"}
|
|
|
|
(*
|
|
A file is resolvable if it was modified (not removed) and we can fetch
|
|
both its base and head versions.
|
|
*)
|
|
Resolvable(f) ==
|
|
FileStatus[f] \in {"modified", "added"}
|
|
|
|
(*
|
|
PR_MergeResult: for each file f in PR_FILES, we compute a merge result
|
|
using the three-way merge algorithm. This is a function from PR_FILES
|
|
to merge outcomes.
|
|
|
|
Possible outcomes:
|
|
- "success": file merged without conflicts
|
|
- "conflict": file has unresolved conflicts
|
|
- "error": failed to fetch or process file
|
|
- "skipped": file was removed or not applicable
|
|
*)
|
|
|
|
VARIABLE PR_MergeResults
|
|
|
|
PR_Init ==
|
|
PR_MergeResults = [f \in PR_FILES |-> "pending"]
|
|
|
|
(*
|
|
Process a single file by applying the three-way merge algorithm.
|
|
This abstracts the actual merge computation but captures the key decision:
|
|
whether the file can be auto-resolved or requires manual intervention.
|
|
*)
|
|
ProcessFile(f) ==
|
|
/\ PR_MergeResults[f] = "pending"
|
|
/\ IF ~Resolvable(f)
|
|
THEN PR_MergeResults' = [PR_MergeResults EXCEPT ![f] = "skipped"]
|
|
ELSE \/ PR_MergeResults' = [PR_MergeResults EXCEPT ![f] = "success"]
|
|
\/ PR_MergeResults' = [PR_MergeResults EXCEPT ![f] = "conflict"]
|
|
\/ PR_MergeResults' = [PR_MergeResults EXCEPT ![f] = "error"]
|
|
|
|
(*
|
|
PR completion: all files have been processed
|
|
*)
|
|
PR_Complete ==
|
|
\A f \in PR_FILES : PR_MergeResults[f] # "pending"
|
|
|
|
(*
|
|
PR success metric: percentage of files successfully merged
|
|
*)
|
|
PR_SuccessRate ==
|
|
LET successful == {f \in PR_FILES : PR_MergeResults[f] = "success"}
|
|
IN (Cardinality(successful) * 100) \div Cardinality(PR_FILES)
|
|
|
|
(*
|
|
PR resolution quality property: a "good" PR resolution is one where
|
|
all resolvable files are either successfully merged or marked as conflicts
|
|
(no errors in fetching or processing).
|
|
*)
|
|
GoodPRResolution ==
|
|
\A f \in PR_FILES :
|
|
Resolvable(f) => PR_MergeResults[f] \in {"success", "conflict"}
|
|
|
|
PR_Spec ==
|
|
PR_Init /\ [][(\E f \in PR_FILES : ProcessFile(f))]_<<PR_MergeResults>>
|
|
|
|
PR_Invariant ==
|
|
PR_Complete => GoodPRResolution
|
|
|
|
THEOREM PR_Spec => []PR_Invariant
|
|
|
|
=============================================================================
|