Merge pull request #196 from johndoe6345789/copilot/improve-duplication-detection-script

Auto-detect duplicate issues without manual configuration
This commit is contained in:
2025-12-29 18:29:09 +00:00
committed by GitHub
3 changed files with 615 additions and 139 deletions

scripts/README-triage.md (new file, 247 lines)

@@ -0,0 +1,247 @@
# Duplicate Issue Triage Script
## Overview
The `triage-duplicate-issues.sh` script is a **smart** automated tool that finds and closes duplicate issues in the repository. Unlike manual triage, this script:
- **Auto-detects** all duplicate issue titles without manual configuration
- **Handles multiple groups** of duplicates in a single run
- **Keeps the most recent** issue open for each duplicate group
- **Adds explanatory comments** before closing duplicates
- **Supports dry-run mode** for safe testing
## Problem It Solves
When automated systems create multiple issues with the same title (e.g., deployment failures), you end up with many duplicate issues that clutter the issue tracker. This script automatically detects and closes them, keeping only the most recent one.
### Before
```
Issues:
#199 ⚠️ Pre-Deployment Validation Failed (most recent)
#195 ⚠️ Pre-Deployment Validation Failed (duplicate)
#194 ⚠️ Pre-Deployment Validation Failed (duplicate)
... 26 more duplicates
```
### After
```
Issues:
#199 ⚠️ Pre-Deployment Validation Failed (open)
#195 ⚠️ Pre-Deployment Validation Failed (closed - duplicate)
#194 ⚠️ Pre-Deployment Validation Failed (closed - duplicate)
... 26 more closed with explanation
```
## Usage
### Basic Usage (Auto-detect all duplicates)
```bash
export GITHUB_TOKEN="ghp_your_token_here"
./scripts/triage-duplicate-issues.sh
```
This will:
1. Fetch all open issues in the repository
2. Group them by exact title match
3. For each group with 2+ issues, close all except the most recent
4. Add a comment explaining why each duplicate was closed
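
Steps 2 and 3 can be sketched with standard Unix tools. Assuming issues have been flattened to `number|created_at|title` lines (the format the script uses internally), titles occurring more than once fall out of `uniq -c`:

```bash
#!/usr/bin/env bash
# Sketch of step 2: detect titles that occur more than once.
# Input mirrors the script's internal "number|date|title" lines.
issues='199|2025-12-27T18:12:06Z|Validation Failed
195|2025-12-27T18:09:38Z|Validation Failed
50|2025-12-26T12:00:00Z|Unique issue'

# Count each title; keep those seen 2+ times, stripping the count column.
dupes=$(echo "$issues" | cut -d'|' -f3- | sort | uniq -c \
  | awk '$1 > 1 {$1=""; print substr($0,2)}')
echo "$dupes"
```

This is the same `cut | sort | uniq -c | awk` pipeline the script itself uses, shown here on a three-issue fixture.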
### Dry Run (Preview without closing)
**Always test with dry-run first!**
```bash
export GITHUB_TOKEN="ghp_your_token_here"
./scripts/triage-duplicate-issues.sh --dry-run
```
This shows exactly what would be closed without actually closing anything.
### Filter by Specific Title
If you only want to close duplicates of a specific title:
```bash
export GITHUB_TOKEN="ghp_your_token_here"
export SEARCH_TITLE="⚠️ Pre-Deployment Validation Failed"
./scripts/triage-duplicate-issues.sh
```
### Get Help
```bash
./scripts/triage-duplicate-issues.sh --help
```
## How It Works
### 1. Fetch All Open Issues
The script fetches all open issues using the GitHub API, handling pagination automatically.
### 2. Group by Title
Issues are grouped by exact title match. Only groups with 2+ issues are considered duplicates.
### 3. Sort by Date
Within each group, issues are sorted by creation date (newest first).
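
Because ISO-8601 timestamps sort lexically in chronological order, a plain reverse text sort on the date field is enough to put the newest issue first:

```bash
#!/usr/bin/env bash
# Sketch of step 3: newest-first ordering via reverse sort on field 2 (the date).
group='194|2025-12-27T18:01:57Z|Validation Failed
199|2025-12-27T18:12:06Z|Validation Failed
195|2025-12-27T18:09:38Z|Validation Failed'

sorted=$(echo "$group" | sort -t'|' -k2 -r)
newest=$(echo "$sorted" | head -1 | cut -d'|' -f1)
echo "$newest"
```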
### 4. Close Duplicates
For each group:
- Keep the most recent issue open (canonical issue)
- Close all older duplicates
- Add an explanatory comment with a link to the canonical issue
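
With the group sorted newest-first, the keep/close split reduces to taking everything after the first line, which is what the script's `get_issues_to_close` helper does:

```bash
#!/usr/bin/env bash
# Sketch of step 4: keep the first (newest) issue, close the rest.
group='199|2025-12-27T18:12:06Z|Validation Failed
195|2025-12-27T18:09:38Z|Validation Failed
194|2025-12-27T18:01:57Z|Validation Failed'

keep=$(echo "$group" | head -1 | cut -d'|' -f1)       # canonical issue
close=$(echo "$group" | tail -n +2 | cut -d'|' -f1)   # everything older
echo "keep=$keep"
echo "close=$(echo $close | tr ' ' ',')"
```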
### 5. Summary Report
At the end, the script shows:
- Number of duplicate groups processed
- Total number of issues closed
- Summary of what was done
## Example Output
```
🤖 Smart Duplicate Issue Triage
===============================
🔍 Fetching all open issues from repository...
📊 Found 31 total open issues
🔎 Automatically detecting duplicate titles...
🎯 Found 1 title(s) with duplicates
🔧 Starting bulk issue triage...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📋 Processing duplicate group 1/1
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Title: "⚠️ Pre-Deployment Validation Failed"
📊 Found 29 issues with this title
📌 Most recent: Issue #199 (created: 2025-12-27T18:12:06Z)
🎯 Planning to close 28 duplicate issues
📝 Adding comment to issue #195...
✅ Added comment to issue #195
🔒 Closing issue #195...
✅ Closed issue #195
[... continues for all duplicates ...]
✅ Completed processing this duplicate group
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✨ Triage complete!
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📊 Summary:
• Processed 1 duplicate title group(s)
• Closed 28 duplicate issue(s)
• Kept the most recent issue open for each title
```
## Requirements
- `bash` 4.0+
- `curl` (for GitHub API calls)
- `jq` (for JSON parsing)
- GitHub Personal Access Token with `repo` scope
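
A quick preflight check for these requirements might look like the following (a convenience sketch, not part of the script):

```bash
#!/usr/bin/env bash
# Preflight sketch: verify the tools and token the triage script depends on.
missing=0
for tool in curl jq; do
  if ! command -v "$tool" > /dev/null 2>&1; then
    echo "missing tool: $tool"
    missing=1
  fi
done
if [ -z "${GITHUB_TOKEN:-}" ]; then
  echo "GITHUB_TOKEN is not set"
  missing=1
fi
echo "missing=$missing"
```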
## Testing
The script includes comprehensive tests:
```bash
./scripts/test-triage-logic.sh
```
This runs 8 test cases covering:
- Smart duplicate detection
- Multiple duplicate groups
- Title filtering
- Edge cases (single issue, empty input, no duplicates)
## Safety Features
1. **Dry-run mode**: Test before closing anything
2. **API error handling**: Graceful failure on API errors
3. **Pagination**: Handles repositories with 100+ issues
4. **Explanatory comments**: Each closed issue gets a comment explaining why
5. **Rate limiting**: 1-second delay between closures to avoid API limits
6. **Most recent preserved**: Always keeps the newest issue open
## Common Use Cases
### Automated Deployment Failure Issues
When CI/CD creates multiple issues for deployment failures:
```bash
export GITHUB_TOKEN="ghp_xxxx"
export SEARCH_TITLE="🚨 Production Deployment Failed"
./scripts/triage-duplicate-issues.sh --dry-run # Preview first
./scripts/triage-duplicate-issues.sh # Then execute
```
### Clean Up All Duplicates
If your repository has multiple types of duplicate issues:
```bash
export GITHUB_TOKEN="ghp_xxxx"
./scripts/triage-duplicate-issues.sh --dry-run # Preview all
./scripts/triage-duplicate-issues.sh # Close all
```
### Scheduled Cleanup
Add to cron or GitHub Actions:
```yaml
# .github/workflows/triage-duplicates.yml
name: Triage Duplicate Issues
on:
  schedule:
    - cron: '0 0 * * 0'  # Weekly on Sunday
  workflow_dispatch:     # Manual trigger
jobs:
  triage:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Triage duplicates
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: ./scripts/triage-duplicate-issues.sh
```
## Troubleshooting
### "Bad credentials" error
Make sure your `GITHUB_TOKEN` has the `repo` scope and is not expired.
### "jq: command not found"
Install jq:
```bash
# macOS
brew install jq
# Ubuntu/Debian
sudo apt-get install jq
# RHEL/CentOS
sudo yum install jq
```
### No duplicates found
The script only detects issues with **exact** title matches. Similar but not identical titles won't be grouped together.
### Rate limiting
If you hit rate limits, the script includes a 1-second delay between API calls. For large batches, you may need to wait or increase the delay.
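
Remaining quota can also be checked against GitHub's `GET /rate_limit` endpoint before a large batch. The snippet below parses a mock of that endpoint's response shape with `jq`; the live call is shown in the comment:

```bash
#!/usr/bin/env bash
# Sketch: inspect remaining core-API quota. The live call would be:
#   curl -s -H "Authorization: token $GITHUB_TOKEN" https://api.github.com/rate_limit
# A mock response with the same shape stands in here:
mock='{"resources":{"core":{"limit":5000,"remaining":4971,"reset":1735344000}}}'
remaining=$(echo "$mock" | jq -r '.resources.core.remaining')
echo "remaining=$remaining"
```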
## Contributing
Improvements welcome! Some ideas:
- [ ] Support fuzzy title matching (similar but not exact)
- [ ] Add interactive mode to confirm each closure
- [ ] Support closing by label or other criteria
- [ ] Add GitHub Actions integration
## License
Same as the repository (see root LICENSE file).


@@ -12,6 +12,35 @@ echo ""
# Source the functions we need to test (extract them from the main script)
# For testing, we'll recreate them here
get_issues_by_title() {
local issues_data="$1"
local title="$2"
# Filter issues matching the exact title and sort by date (newest first)
echo "$issues_data" | grep -F "|$title" | sort -t'|' -k2 -r
}
find_duplicate_titles() {
local issues_data="$1"
local search_filter="$2"
if [ -z "$issues_data" ]; then
return 0
fi
# Extract unique titles and count occurrences
local title_counts
if [ -n "$search_filter" ]; then
# Filter by specific title if provided
title_counts=$(echo "$issues_data" | cut -d'|' -f3- | grep -F "$search_filter" | sort | uniq -c | awk '$1 > 1 {$1=""; print substr($0,2)}')
else
# Find all duplicate titles
title_counts=$(echo "$issues_data" | cut -d'|' -f3- | sort | uniq -c | awk '$1 > 1 {$1=""; print substr($0,2)}')
fi
echo "$title_counts"
}
get_issues_to_close() {
local issues_data="$1"
@@ -31,18 +60,80 @@ get_issues_to_close() {
echo "$issues_data" | tail -n +2 | cut -d'|' -f1
}
# Test 1: Multiple duplicate issues
echo "Test 1: Multiple duplicate issues (should close all except most recent)"
# Test 1: Finding duplicate titles across multiple groups
echo "Test 1: Finding duplicate titles from mixed issues"
echo "---------------------------------------------------"
TEST_DATA='199|2025-12-27T18:12:06Z|⚠️ Pre-Deployment Validation Failed
195|2025-12-27T18:09:38Z|⚠️ Pre-Deployment Validation Failed
194|2025-12-27T18:01:57Z|⚠️ Pre-Deployment Validation Failed
100|2025-12-27T10:00:00Z|🚨 Production Deployment Failed
99|2025-12-27T09:00:00Z|🚨 Production Deployment Failed
50|2025-12-26T12:00:00Z|Unique issue without duplicates'
DUPLICATES=$(find_duplicate_titles "$TEST_DATA" "")
DUP_COUNT=$(echo "$DUPLICATES" | wc -l)
echo " Found duplicate title groups: $DUP_COUNT"
echo " Titles with duplicates:"
while IFS= read -r dup_title; do
echo " - \"$dup_title\""
done <<< "$DUPLICATES"
if [ "$DUP_COUNT" = "2" ]; then
echo " ✅ PASS: Correctly found 2 groups of duplicates"
else
echo " ❌ FAIL: Expected 2 duplicate groups, got $DUP_COUNT"
exit 1
fi
echo ""
# Test 2: Filtering for specific title
echo "Test 2: Filtering for specific duplicate title"
echo "----------------------------------------------"
FILTERED=$(find_duplicate_titles "$TEST_DATA" "Pre-Deployment")
FILTERED_COUNT=$(echo "$FILTERED" | wc -l)
echo " Filtered to titles containing 'Pre-Deployment': $FILTERED_COUNT group(s)"
if [ "$FILTERED_COUNT" = "1" ]; then
echo " ✅ PASS: Correctly filtered to 1 specific title"
else
echo " ❌ FAIL: Expected 1 filtered title, got $FILTERED_COUNT"
exit 1
fi
echo ""
# Test 3: Getting issues by specific title
echo "Test 3: Getting issues by specific title"
echo "----------------------------------------"
TITLE="⚠️ Pre-Deployment Validation Failed"
TITLE_ISSUES=$(get_issues_by_title "$TEST_DATA" "$TITLE")
TITLE_COUNT=$(echo "$TITLE_ISSUES" | wc -l)
MOST_RECENT=$(echo "$TITLE_ISSUES" | head -1 | cut -d'|' -f1)
echo " Title: \"$TITLE\""
echo " Issues found: $TITLE_COUNT"
echo " Most recent: #$MOST_RECENT"
if [ "$TITLE_COUNT" = "3" ] && [ "$MOST_RECENT" = "199" ]; then
echo " ✅ PASS: Correctly found 3 issues, most recent is #199"
else
echo " ❌ FAIL: Expected 3 issues with most recent #199"
exit 1
fi
echo ""
# Test 4: Multiple duplicate issues
echo "Test 4: Multiple duplicate issues (should close all except most recent)"
echo "-----------------------------------------------------------------------"
TEST_DATA_1='124|2025-12-27T10:30:00Z|🚨 Production Deployment Failed
TEST_DATA_4='124|2025-12-27T10:30:00Z|🚨 Production Deployment Failed
122|2025-12-27T10:25:00Z|🚨 Production Deployment Failed
121|2025-12-27T10:20:00Z|🚨 Production Deployment Failed
119|2025-12-27T10:15:00Z|🚨 Production Deployment Failed
117|2025-12-27T10:10:00Z|🚨 Production Deployment Failed'
TOTAL=$(echo "$TEST_DATA_1" | wc -l)
MOST_RECENT=$(echo "$TEST_DATA_1" | head -1 | cut -d'|' -f1)
TO_CLOSE=$(get_issues_to_close "$TEST_DATA_1")
TOTAL=$(echo "$TEST_DATA_4" | wc -l)
MOST_RECENT=$(echo "$TEST_DATA_4" | head -1 | cut -d'|' -f1)
TO_CLOSE=$(get_issues_to_close "$TEST_DATA_4")
TO_CLOSE_COUNT=$(echo "$TO_CLOSE" | wc -l)
echo " Total issues found: $TOTAL"
@@ -58,15 +149,15 @@ else
fi
echo ""
# Test 2: Two duplicate issues
echo "Test 2: Two duplicate issues (should close oldest, keep newest)"
# Test 5: Two duplicate issues
echo "Test 5: Two duplicate issues (should close oldest, keep newest)"
echo "----------------------------------------------------------------"
TEST_DATA_2='150|2025-12-27T11:00:00Z|Bug in login
TEST_DATA_5='150|2025-12-27T11:00:00Z|Bug in login
148|2025-12-27T10:55:00Z|Bug in login'
TOTAL=$(echo "$TEST_DATA_2" | wc -l)
MOST_RECENT=$(echo "$TEST_DATA_2" | head -1 | cut -d'|' -f1)
TO_CLOSE=$(get_issues_to_close "$TEST_DATA_2")
TOTAL=$(echo "$TEST_DATA_5" | wc -l)
MOST_RECENT=$(echo "$TEST_DATA_5" | head -1 | cut -d'|' -f1)
TO_CLOSE=$(get_issues_to_close "$TEST_DATA_5")
TO_CLOSE_COUNT=$(echo "$TO_CLOSE" | wc -l)
echo " Total issues found: $TOTAL"
@@ -82,14 +173,14 @@ else
fi
echo ""
# Test 3: Single issue
echo "Test 3: Single issue (should not close anything)"
# Test 6: Single issue
echo "Test 6: Single issue (should not close anything)"
echo "-------------------------------------------------"
TEST_DATA_3='200|2025-12-27T12:00:00Z|Unique issue'
TEST_DATA_6='200|2025-12-27T12:00:00Z|Unique issue'
TOTAL=$(echo "$TEST_DATA_3" | wc -l)
MOST_RECENT=$(echo "$TEST_DATA_3" | head -1 | cut -d'|' -f1)
TO_CLOSE=$(get_issues_to_close "$TEST_DATA_3" 2>&1)
TOTAL=$(echo "$TEST_DATA_6" | wc -l)
MOST_RECENT=$(echo "$TEST_DATA_6" | head -1 | cut -d'|' -f1)
TO_CLOSE=$(get_issues_to_close "$TEST_DATA_6" 2>&1)
echo " Total issues found: $TOTAL"
echo " Most recent issue: #$MOST_RECENT"
@@ -102,8 +193,8 @@ else
fi
echo ""
# Test 4: Empty input
echo "Test 4: Empty input (should handle gracefully)"
# Test 7: Empty input
echo "Test 7: Empty input (should handle gracefully)"
echo "----------------------------------------------"
TO_CLOSE=$(get_issues_to_close "" 2>&1)
@@ -115,44 +206,19 @@ else
fi
echo ""
# Test 5: Date parsing and sorting verification
echo "Test 5: Verify sorting by creation date (newest first)"
echo "-------------------------------------------------------"
TEST_DATA_5='300|2025-12-27T15:00:00Z|Issue C
# Test 8: No duplicates in repository
echo "Test 8: No duplicates (all unique titles)"
echo "-----------------------------------------"
TEST_DATA_8='300|2025-12-27T15:00:00Z|Issue C
299|2025-12-27T14:00:00Z|Issue B
298|2025-12-27T13:00:00Z|Issue A'
MOST_RECENT=$(echo "$TEST_DATA_5" | head -1 | cut -d'|' -f1)
MOST_RECENT_DATE=$(echo "$TEST_DATA_5" | head -1 | cut -d'|' -f2)
OLDEST=$(echo "$TEST_DATA_5" | tail -1 | cut -d'|' -f1)
DUPLICATES=$(find_duplicate_titles "$TEST_DATA_8" "")
echo " Most recent: #$MOST_RECENT at $MOST_RECENT_DATE"
echo " Oldest: #$OLDEST"
if [ "$MOST_RECENT" = "300" ] && [ "$OLDEST" = "298" ]; then
echo " ✅ PASS: Correctly sorted by date (newest first)"
if [ -z "$DUPLICATES" ]; then
echo " ✅ PASS: Correctly found no duplicates"
else
echo " ❌ FAIL: Sorting is incorrect"
exit 1
fi
echo ""
# Test 6: jq parsing simulation (test data format)
echo "Test 6: Verify data format compatibility with jq"
echo "-------------------------------------------------"
MOCK_JSON='{"items": [
{"number": 124, "created_at": "2025-12-27T10:30:00Z", "title": "Test"},
{"number": 122, "created_at": "2025-12-27T10:25:00Z", "title": "Test"}
]}'
# Test that jq can parse and format the data correctly
PARSED=$(echo "$MOCK_JSON" | jq -r '.items | sort_by(.created_at) | reverse | .[] | "\(.number)|\(.created_at)|\(.title)"')
FIRST_ISSUE=$(echo "$PARSED" | head -1 | cut -d'|' -f1)
if [ "$FIRST_ISSUE" = "124" ]; then
echo " ✅ PASS: jq parsing and formatting works correctly"
else
echo " ❌ FAIL: jq parsing failed"
echo " ❌ FAIL: Should find no duplicates with all unique titles"
exit 1
fi
echo ""
@@ -161,8 +227,10 @@ echo "============================================="
echo "✅ All tests passed!"
echo ""
echo "Summary:"
echo " - Smart duplicate detection works correctly"
echo " - Multiple duplicate groups are identified"
echo " - Title filtering works as expected"
echo " - Correctly identifies most recent issue"
echo " - Closes all duplicates except the most recent"
echo " - Handles edge cases (single issue, empty input)"
echo " - Date sorting works correctly"
echo " - Data format compatible with GitHub API response"
echo " - Properly detects when no duplicates exist"


@@ -1,40 +1,60 @@
#!/bin/bash
# Script to bulk-close duplicate issues found via GitHub API
# Dynamically finds issues with duplicate titles and closes all except the most recent one
# Automatically finds all duplicate issue titles and closes all except the most recent one
#
# Usage:
# export GITHUB_TOKEN="ghp_your_token_here"
# ./triage-duplicate-issues.sh
#
# Or with custom search pattern:
# Or with custom search pattern (optional):
# export GITHUB_TOKEN="ghp_your_token_here"
# export SEARCH_TITLE="Custom Issue Title"
# ./triage-duplicate-issues.sh
#
# The script will:
# 1. Search for all open issues matching the SEARCH_TITLE pattern
# 2. Sort them by creation date (newest first)
# 3. Keep the most recent issue open
# 4. Close all other duplicates with an explanatory comment
# 1. Fetch all open issues in the repository
# 2. Group issues by exact title match
# 3. For each group with 2+ issues, keep the most recent and close the rest
# 4. Close all duplicates with an explanatory comment
set -e
usage() {
echo "Usage: $0"
echo "Usage: $0 [--dry-run]"
echo ""
echo "Arguments:"
echo " --dry-run Show what would be closed without actually closing issues"
echo ""
echo "Environment variables:"
echo " GITHUB_TOKEN (required) GitHub personal access token with repo access"
echo " SEARCH_TITLE (optional) Issue title pattern to search for"
echo " Default: '🚨 Production Deployment Failed - Rollback Required'"
echo " SEARCH_TITLE (optional) If set, only process duplicates matching this specific title"
echo " If not set, automatically detects and processes ALL duplicate titles"
echo ""
echo "Example:"
echo "Examples:"
echo " # Auto-detect and close all duplicates"
echo " export GITHUB_TOKEN='ghp_xxxxxxxxxxxx'"
echo " export SEARCH_TITLE='Duplicate bug report'"
echo " $0"
echo ""
echo " # Dry run to see what would be closed"
echo " export GITHUB_TOKEN='ghp_xxxxxxxxxxxx'"
echo " $0 --dry-run"
echo ""
echo " # Only process specific title"
echo " export GITHUB_TOKEN='ghp_xxxxxxxxxxxx'"
echo " export SEARCH_TITLE='⚠️ Pre-Deployment Validation Failed'"
echo " $0"
exit 1
}
# Parse command line arguments
DRY_RUN=false
if [ "$1" = "--dry-run" ]; then
DRY_RUN=true
echo "🔍 DRY RUN MODE: No issues will be closed"
echo ""
fi
# Check for help flag
if [ "$1" = "-h" ] || [ "$1" = "--help" ]; then
usage
@@ -49,34 +69,91 @@ fi
OWNER="johndoe6345789"
REPO="metabuilder"
# Search pattern for duplicate issues (can be customized)
SEARCH_TITLE="${SEARCH_TITLE:-🚨 Production Deployment Failed - Rollback Required}"
# Optional: Search pattern for specific title (if not set, processes all duplicates)
SEARCH_TITLE="${SEARCH_TITLE:-}"
# Function to fetch issues by title pattern
fetch_duplicate_issues() {
local search_query="$1"
echo "🔍 Searching for issues with title: \"$search_query\"" >&2
# Function to fetch ALL open issues in the repository
fetch_all_open_issues() {
echo "🔍 Fetching all open issues from repository..." >&2
# Use GitHub API to search for issues by title
# Filter by: is:issue, is:open, repo, and title match
local encoded_query
encoded_query=$(echo "is:issue is:open repo:$OWNER/$REPO in:title $search_query" | jq -sRr @uri)
local all_issues=""
local page=1
local per_page=100
local response
response=$(curl -s -H "Authorization: token $GITHUB_TOKEN" \
-H "Accept: application/vnd.github.v3+json" \
"https://api.github.com/search/issues?q=$encoded_query&sort=created&order=desc&per_page=100")
while true; do
local response
response=$(curl -s -H "Authorization: token $GITHUB_TOKEN" \
-H "Accept: application/vnd.github.v3+json" \
"https://api.github.com/repos/$OWNER/$REPO/issues?state=open&per_page=$per_page&page=$page&sort=created&direction=desc")
# Check for API errors (errors return object with .message, not array)
if echo "$response" | jq -e 'select(.message != null) | .message' > /dev/null 2>&1; then
local error_msg
error_msg=$(echo "$response" | jq -r '.message')
echo "❌ GitHub API error: $error_msg" >&2
return 1
fi
# Check if response is empty (no more pages)
local item_count
item_count=$(echo "$response" | jq 'length')
if [ "$item_count" -eq 0 ]; then
break
fi
# Extract issue numbers, creation dates, and titles
local page_data
page_data=$(echo "$response" | jq -r '.[] | select(.pull_request == null) | "\(.number)|\(.created_at)|\(.title)"')
if [ -n "$page_data" ]; then
if [ -z "$all_issues" ]; then
all_issues="$page_data"
else
all_issues="$all_issues"$'\n'"$page_data"
fi
fi
# If we got fewer items than per_page, we're on the last page
if [ "$item_count" -lt "$per_page" ]; then
break
fi
page=$((page + 1))
done
# Check for API errors
if echo "$response" | jq -e '.message' > /dev/null 2>&1; then
local error_msg
error_msg=$(echo "$response" | jq -r '.message')
echo "❌ GitHub API error: $error_msg" >&2
return 1
echo "$all_issues"
}
# Function to find duplicate titles and return them grouped
find_duplicate_titles() {
local issues_data="$1"
local search_filter="$2"
if [ -z "$issues_data" ]; then
return 0
fi
# Extract issue numbers and creation dates, sorted by creation date (newest first)
echo "$response" | jq -r '.items | sort_by(.created_at) | reverse | .[] | "\(.number)|\(.created_at)|\(.title)"'
# Extract unique titles and count occurrences
# Format: title|count
local title_counts
if [ -n "$search_filter" ]; then
# Filter by specific title if provided
title_counts=$(echo "$issues_data" | cut -d'|' -f3- | grep -F "$search_filter" | sort | uniq -c | awk '$1 > 1 {$1=""; print substr($0,2)}')
else
# Find all duplicate titles
title_counts=$(echo "$issues_data" | cut -d'|' -f3- | sort | uniq -c | awk '$1 > 1 {$1=""; print substr($0,2)}')
fi
echo "$title_counts"
}
# Function to get issues for a specific title, sorted by creation date (newest first)
get_issues_by_title() {
local issues_data="$1"
local title="$2"
# Filter issues matching the exact title and sort by date (newest first)
echo "$issues_data" | grep -F "|$title" | sort -t'|' -k2 -r
}
# Function to determine which issues to close (all except the most recent)
@@ -100,81 +177,97 @@ get_issues_to_close() {
echo "$issues_data" | tail -n +2 | cut -d'|' -f1
}
# Fetch all duplicate issues
ISSUES_DATA=$(fetch_duplicate_issues "$SEARCH_TITLE")
if [ -z "$ISSUES_DATA" ]; then
echo "✨ No duplicate issues found. Nothing to do!"
exit 0
fi
# Parse the data
TOTAL_ISSUES=$(echo "$ISSUES_DATA" | wc -l)
MOST_RECENT=$(echo "$ISSUES_DATA" | head -1 | cut -d'|' -f1)
MOST_RECENT_DATE=$(echo "$ISSUES_DATA" | head -1 | cut -d'|' -f2)
echo "📊 Found $TOTAL_ISSUES duplicate issues"
echo "📌 Most recent issue: #$MOST_RECENT (created: $MOST_RECENT_DATE)"
# Fetch all open issues
echo "🤖 Smart Duplicate Issue Triage"
echo "==============================="
echo ""
# Get list of issues to close
ISSUES_TO_CLOSE_DATA=$(get_issues_to_close "$ISSUES_DATA")
ALL_ISSUES=$(fetch_all_open_issues)
if [ -z "$ISSUES_TO_CLOSE_DATA" ]; then
echo "✨ No issues need to be closed!"
if [ -z "$ALL_ISSUES" ]; then
echo "✨ No open issues found in repository!"
exit 0
fi
# Convert to array
ISSUES_TO_CLOSE=()
while IFS= read -r issue_num; do
ISSUES_TO_CLOSE+=("$issue_num")
done <<< "$ISSUES_TO_CLOSE_DATA"
TOTAL_ISSUES=$(echo "$ALL_ISSUES" | wc -l)
echo "📊 Found $TOTAL_ISSUES total open issues"
echo ""
CLOSE_COMMENT='🤖 **Automated Triage: Closing Duplicate Issue**
# Find duplicate titles
if [ -n "$SEARCH_TITLE" ]; then
echo "🔎 Filtering for specific title: \"$SEARCH_TITLE\""
DUPLICATE_TITLES=$(find_duplicate_titles "$ALL_ISSUES" "$SEARCH_TITLE")
else
echo "🔎 Automatically detecting duplicate titles..."
DUPLICATE_TITLES=$(find_duplicate_titles "$ALL_ISSUES" "")
fi
if [ -z "$DUPLICATE_TITLES" ]; then
echo "✨ No duplicate issues found. Repository is clean!"
exit 0
fi
# Count how many unique titles have duplicates
DUPLICATE_TITLE_COUNT=$(echo "$DUPLICATE_TITLES" | wc -l)
echo "🎯 Found $DUPLICATE_TITLE_COUNT title(s) with duplicates"
echo ""
close_issue() {
local issue_number=$1
local most_recent=$2
local most_recent_date=$3
local title=$4
local total_with_title=$5
if [ "$DRY_RUN" = true ]; then
echo " [DRY RUN] Would close issue #${issue_number}"
echo " [DRY RUN] Would add comment explaining closure"
echo " ✅ Dry run complete for issue #${issue_number}"
echo ""
return 0
fi
local close_comment='🤖 **Automated Triage: Closing Duplicate Issue**
This issue has been identified as a duplicate. Multiple issues with the same title were found, and this script automatically closes all duplicates except the most recent one.
**Resolution:**
- ✅ Keeping the most recent issue (#'"$MOST_RECENT"') as the canonical tracking issue
- ✅ Keeping the most recent issue (#'"$most_recent"') as the canonical tracking issue
- ✅ Closing this and other duplicate issues to maintain a clean issue tracker
**How duplicates were identified:**
- Search pattern: "'"$SEARCH_TITLE"'"
- Total duplicates found: '"$TOTAL_ISSUES"'
- Keeping most recent: Issue #'"$MOST_RECENT"' (created '"$MOST_RECENT_DATE"')
- Title: "'"$title"'"
- Total duplicates found: '"$total_with_title"'
- Keeping most recent: Issue #'"$most_recent"' (created '"$most_recent_date"')
**No Action Required** - Please refer to issue #'"$MOST_RECENT"' for continued discussion.
**No Action Required** - Please refer to issue #'"$most_recent"' for continued discussion.
---
*This closure was performed by an automated triage script. For questions, see `scripts/triage-duplicate-issues.sh`*'
close_issue() {
local issue_number=$1
# Add comment explaining closure
echo "📝 Adding comment to issue #${issue_number}..."
echo " 📝 Adding comment to issue #${issue_number}..."
if curl -s -X POST \
-H "Authorization: token $GITHUB_TOKEN" \
-H "Accept: application/vnd.github.v3+json" \
"https://api.github.com/repos/$OWNER/$REPO/issues/$issue_number/comments" \
-d "{\"body\": $(echo "$CLOSE_COMMENT" | jq -Rs .)}" > /dev/null; then
echo "✅ Added comment to issue #${issue_number}"
-d "{\"body\": $(echo "$close_comment" | jq -Rs .)}" > /dev/null; then
echo " ✅ Added comment to issue #${issue_number}"
else
echo "❌ Failed to add comment to issue #${issue_number}"
echo " ❌ Failed to add comment to issue #${issue_number}"
return 1
fi
# Close the issue
echo "🔒 Closing issue #${issue_number}..."
echo " 🔒 Closing issue #${issue_number}..."
if curl -s -X PATCH \
-H "Authorization: token $GITHUB_TOKEN" \
-H "Accept: application/vnd.github.v3+json" \
"https://api.github.com/repos/$OWNER/$REPO/issues/$issue_number" \
-d '{"state": "closed", "state_reason": "not_planned"}' > /dev/null; then
echo "✅ Closed issue #${issue_number}"
echo " ✅ Closed issue #${issue_number}"
else
echo "❌ Failed to close issue #${issue_number}"
echo " ❌ Failed to close issue #${issue_number}"
return 1
fi
@@ -184,20 +277,88 @@ close_issue() {
main() {
echo "🔧 Starting bulk issue triage..."
echo ""
echo "📋 Planning to close ${#ISSUES_TO_CLOSE[@]} duplicate issues"
echo "📌 Keeping issue #$MOST_RECENT open (most recent)"
echo ""
for issue_number in "${ISSUES_TO_CLOSE[@]}"; do
close_issue "$issue_number"
# Add a small delay to avoid rate limiting
sleep 1
done
local total_closed=0
local title_index=0
echo "✨ Triage complete!"
# Process each duplicate title
while IFS= read -r duplicate_title; do
title_index=$((title_index + 1))
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "📋 Processing duplicate group $title_index/$DUPLICATE_TITLE_COUNT"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "Title: \"$duplicate_title\""
echo ""
# Get all issues with this title
TITLE_ISSUES=$(get_issues_by_title "$ALL_ISSUES" "$duplicate_title")
TITLE_ISSUE_COUNT=$(echo "$TITLE_ISSUES" | wc -l)
# Get the most recent issue
MOST_RECENT=$(echo "$TITLE_ISSUES" | head -1 | cut -d'|' -f1)
MOST_RECENT_DATE=$(echo "$TITLE_ISSUES" | head -1 | cut -d'|' -f2)
echo " 📊 Found $TITLE_ISSUE_COUNT issues with this title"
echo " 📌 Most recent: Issue #$MOST_RECENT (created: $MOST_RECENT_DATE)"
echo ""
# Get list of issues to close
ISSUES_TO_CLOSE_DATA=$(get_issues_to_close "$TITLE_ISSUES")
if [ -z "$ISSUES_TO_CLOSE_DATA" ]; then
echo " No duplicates to close for this title"
echo ""
continue
fi
# Convert to array
ISSUES_TO_CLOSE=()
while IFS= read -r issue_num; do
ISSUES_TO_CLOSE+=("$issue_num")
done <<< "$ISSUES_TO_CLOSE_DATA"
if [ "$DRY_RUN" = true ]; then
echo " 🎯 [DRY RUN] Would close ${#ISSUES_TO_CLOSE[@]} duplicate issues:"
echo " Issues: $(echo "${ISSUES_TO_CLOSE[@]}" | tr ' ' ',')"
else
echo " 🎯 Planning to close ${#ISSUES_TO_CLOSE[@]} duplicate issues"
fi
echo ""
for issue_number in "${ISSUES_TO_CLOSE[@]}"; do
close_issue "$issue_number" "$MOST_RECENT" "$MOST_RECENT_DATE" "$duplicate_title" "$TITLE_ISSUE_COUNT"
total_closed=$((total_closed + 1))
# Add a small delay to avoid rate limiting (skip in dry-run)
if [ "$DRY_RUN" = false ]; then
sleep 1
fi
done
echo " ✅ Completed processing this duplicate group"
echo ""
done <<< "$DUPLICATE_TITLES"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
if [ "$DRY_RUN" = true ]; then
echo "✨ Dry run complete!"
else
echo "✨ Triage complete!"
fi
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo ""
echo "📊 Summary:"
echo " • Processed $DUPLICATE_TITLE_COUNT duplicate title group(s)"
if [ "$DRY_RUN" = true ]; then
echo " • Would close $total_closed duplicate issue(s)"
echo " • Would keep the most recent issue open for each title"
echo ""
echo "💡 To actually close these issues, run without --dry-run flag"
else
echo " • Closed $total_closed duplicate issue(s)"
echo " • Kept the most recent issue open for each title"
fi
echo ""
echo "📌 Kept open: Issue #$MOST_RECENT (most recent, created $MOST_RECENT_DATE)"
echo "🔒 Closed: ${#ISSUES_TO_CLOSE[@]} duplicate issues"
}
main