Reorganize documentation: move all docs to /docs subdirectories
Co-authored-by: johndoe6345789 <224850594+johndoe6345789@users.noreply.github.com>
docs/deployment/BAD_GATEWAY_FIX.md
Normal file
117
docs/deployment/BAD_GATEWAY_FIX.md
Normal file
@@ -0,0 +1,117 @@
|
||||
# Bad Gateway Errors - Fixed

## Problem

The application was experiencing a flood of "Bad Gateway" (502) errors caused by excessive LLM API calls.

## Root Causes Identified

1. **Auto-scanning running every 2 seconds** - The `useAutoRepair` hook was automatically scanning all files for errors every 2 seconds, making continuous LLM calls
2. **No rate limiting** - Multiple AI features (component generation, code improvement, error repair, etc.) were making unlimited concurrent LLM requests
3. **No error circuit breaker** - Failed requests would retry immediately without backing off
4. **No request throttling** - All AI operations competed for the same gateway resources

## Solutions Implemented

### 1. Rate Limiting System (`src/lib/rate-limiter.ts`)

- **Per-category rate limiting**: Different limits for different AI operations
- **Time windows**: Tracks requests over rolling 60-second windows
- **Automatic cleanup**: Removes stale tracking data
- **Priority queue support**: High-priority requests can retry with backoff
- **Status tracking**: Monitor remaining capacity and reset times

Configuration (a sliding-window sketch follows this list):

- **AI Operations**: Max 3 requests per minute
- **Error Scanning**: Max 1 request per 30 seconds

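The limiter's source isn't reproduced in this document, so the following is a minimal sketch of the sliding-window behavior described above. The `RateLimiter` class, its method names, and the category keys are illustrative assumptions, not the actual exports of `src/lib/rate-limiter.ts`:

```typescript
// Minimal sliding-window limiter sketch; names here are hypothetical.
interface CategoryConfig {
  maxRequests: number; // calls allowed inside one rolling window
  windowMs: number;    // window length in milliseconds
}

class RateLimiter {
  private timestamps = new Map<string, number[]>();

  constructor(private configs: Record<string, CategoryConfig>) {}

  /** Record and allow the call if the category still has capacity. */
  isAllowed(category: string): boolean {
    const config = this.configs[category];
    if (!config) return true; // unconfigured categories are unlimited

    const now = Date.now();
    // Automatic cleanup: drop timestamps that fell out of the window.
    const recent = (this.timestamps.get(category) ?? []).filter(
      (t) => now - t < config.windowMs
    );

    if (recent.length >= config.maxRequests) {
      this.timestamps.set(category, recent);
      return false;
    }
    recent.push(now);
    this.timestamps.set(category, recent);
    return true;
  }

  /** Status tracking: remaining capacity and time until the window frees up. */
  getStatus(category: string): { remaining: number; resetInMs: number } {
    const config = this.configs[category];
    if (!config) return { remaining: Infinity, resetInMs: 0 };
    const now = Date.now();
    const recent = (this.timestamps.get(category) ?? []).filter(
      (t) => now - t < config.windowMs
    );
    return {
      remaining: Math.max(0, config.maxRequests - recent.length),
      resetInMs: recent.length > 0 ? config.windowMs - (now - recent[0]) : 0,
    };
  }
}

// The limits listed above, expressed in this sketch's terms:
const limiter = new RateLimiter({
  "llm-ai-operations": { maxRequests: 3, windowMs: 60_000 },
  "llm-error-scanning": { maxRequests: 1, windowMs: 30_000 },
});
```
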
### 2. Protected LLM Service (`src/lib/protected-llm-service.ts`)

- **Error tracking**: Monitors consecutive failures
- **Circuit breaker**: Pauses all requests after 5 consecutive errors
- **User-friendly error messages**: Converts technical errors to actionable messages
- **Automatic recovery**: Error count decreases on successful calls
- **Request categorization**: Groups related operations for better rate limiting (see the sketch below)

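The service's source isn't shown here either, so this sketch only mirrors the behavior the list describes. Of its names, only `getStats()`, `reset()`, and the `{ totalCalls, errorCount, isPaused }` shape appear later in this document; `callLLM`, `llm`, and the `limiter` wiring are assumptions:

```typescript
declare function llm(prompt: string): Promise<string>; // underlying LLM call (assumed)
declare const limiter: { isAllowed(category: string): boolean }; // rate-limiter sketch above

const ERROR_THRESHOLD = 5; // consecutive failures before the breaker trips

export class ProtectedLLMService {
  private static totalCalls = 0;
  private static errorCount = 0;

  static async callLLM(category: string, prompt: string): Promise<string> {
    // Circuit breaker: pause everything after too many consecutive errors.
    if (this.errorCount >= ERROR_THRESHOLD) {
      throw new Error("AI service temporarily unavailable due to repeated errors");
    }
    // Request categorization: each category gets its own rate-limit bucket.
    if (!limiter.isAllowed(`llm-${category}`)) {
      console.warn(`Rate limit exceeded for llm-${category}`);
      throw new Error("Too many requests - please wait a moment and try again");
    }
    this.totalCalls++;
    try {
      const result = await llm(prompt);
      // Automatic recovery: each success works the error count back down.
      this.errorCount = Math.max(0, this.errorCount - 1);
      return result;
    } catch (err) {
      this.errorCount++;
      console.error(`LLM call failed (${category}):`, err);
      // User-friendly message instead of a raw 502.
      throw new Error("Service temporarily unavailable - please wait a moment");
    }
  }

  static getStats() {
    return {
      totalCalls: this.totalCalls,
      errorCount: this.errorCount,
      isPaused: this.errorCount >= ERROR_THRESHOLD,
    };
  }

  static reset(): void {
    // A full version would also clear the limiter's tracked windows.
    this.totalCalls = 0;
    this.errorCount = 0;
  }
}
```
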
### 3. Disabled Automatic Scanning

- **Removed the automatic `useEffect` trigger** in `useAutoRepair`
- **Manual scanning only**: Users must explicitly click the "Scan" button
- **Rate-limited when triggered**: Even manual scans respect rate limits (illustrated below)

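As an illustration of the change (not the actual `use-auto-repair.ts` code), the removed polling effect gives way to an explicit, still rate-limited handler:

```typescript
declare function scanAllFilesForErrors(): Promise<void>; // the scan itself (assumed)
declare const limiter: { isAllowed(category: string): boolean }; // rate-limiter sketch above

// Before (removed): a useEffect polled every 2 seconds, firing LLM calls:
//   useEffect(() => {
//     const id = setInterval(scanAllFilesForErrors, 2000);
//     return () => clearInterval(id);
//   }, []);

// After: scanning runs only when the user clicks "Scan", and still honors the limit.
export async function handleScanClick(): Promise<void> {
  if (!limiter.isAllowed("llm-error-scanning")) {
    console.warn("Rate limit exceeded for llm-error-scanning");
    return; // at most one scan per 30 seconds
  }
  await scanAllFilesForErrors();
}
```
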
### 4. Updated All AI Services

- **ai-service.ts**: All methods now use `ProtectedLLMService`
- **error-repair-service.ts**: Code repair uses rate limiting
- **Consistent error handling**: All services handle 502/429 errors gracefully (example below)

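As an illustration (not the actual `ai-service.ts` code), each service method would now delegate to the protected wrapper rather than calling the LLM directly; the import path and method shape are assumptions:

```typescript
// Hypothetical shape of an updated ai-service.ts method.
import { ProtectedLLMService } from "./protected-llm-service";

export async function improveCode(source: string): Promise<string> {
  // Every AI feature funnels through the wrapper, so rate limits and
  // the circuit breaker apply uniformly, and 502/429 failures surface
  // as the friendly messages described above.
  return ProtectedLLMService.callLLM(
    "ai-operations",
    `Improve the following code:\n${source}`
  );
}
```
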
## Benefits

1. **No more cascading failures**: Rate limiting prevents overwhelming the gateway
2. **Better user experience**: Clear error messages explain what went wrong
3. **Automatic recovery**: Circuit breaker allows the system to recover from issues
4. **Resource efficiency**: Prevents wasted requests that would fail anyway
5. **Predictable behavior**: Users understand when operations might be delayed

## How It Works Now

### Normal Operation

1. User triggers an AI feature (generate component, improve code, etc.)
2. The request goes through `ProtectedLLMService`
3. The rate limiter checks whether the request is allowed
4. If allowed, the request proceeds
5. If rate-limited, the user sees a friendly message about slowing down (a usage sketch follows this list)

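Put together, a feature call under this flow might look like the following; `applyGeneratedCode` and `showToast` are hypothetical stand-ins for the app's real handlers, and `ProtectedLLMService.callLLM` is from the sketch above:

```typescript
import { ProtectedLLMService } from "./protected-llm-service";

declare function applyGeneratedCode(code: string): void; // downstream handler (assumed)
declare function showToast(message: string): void;       // UI notification (assumed)

async function onGenerateComponent(): Promise<void> {
  try {
    // Steps 2-4: the wrapper consults the rate limiter before calling the LLM.
    const code = await ProtectedLLMService.callLLM(
      "ai-operations",
      "Generate a login form component"
    );
    applyGeneratedCode(code);
  } catch (err) {
    // Step 5: rate-limited or failed calls carry the friendly messages above.
    showToast((err as Error).message);
  }
}
```
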
### Error Handling

1. If an LLM call fails with 502/Bad Gateway:
   - User sees: "Service temporarily unavailable - please wait a moment"
   - Error count increases
   - Further requests in that category are blocked by the rate limiter

2. If there are too many consecutive errors (5+):
   - Circuit breaker trips
   - All AI operations pause
   - User sees: "AI service temporarily unavailable due to repeated errors"

3. Recovery:
   - Successful requests decrease the error count
   - Once the error count drops, the circuit breaker resets
   - Normal operation resumes

### Manual Controls

Users can check AI service status:

```javascript
const stats = ProtectedLLMService.getStats()
// Returns: { totalCalls, errorCount, isPaused }
```

Users can manually reset if needed:

```javascript
ProtectedLLMService.reset()
// Clears all rate limits and error counts
```

## Testing the Fix

1. **Verify no automatic scanning**: Open the app - no LLM calls should fire automatically
2. **Test rate limiting**: Try generating 5 components quickly - you should see a rate-limit message
3. **Test error recovery**: After hitting an error, the next successful call should work normally
4. **Check manual scan**: The error panel's Scan button should work, subject to rate limiting

## Monitoring

Watch the browser console for:

- `LLM call failed (category): error` - Individual failures
- `Rate limit exceeded for llm-category` - Rate limiting in action
- `Too many LLM errors detected` - Circuit breaker activation

## Future Improvements

1. **Retry queue**: Queue rate-limited requests and auto-retry them
2. **Progressive backoff**: Increase delays after repeated failures
3. **Request deduplication**: Prevent identical simultaneous requests
4. **Usage analytics**: Track which features use the most AI calls
5. **User quotas**: Per-user rate limiting for multi-tenant deployments

## Files Modified

- `/src/lib/rate-limiter.ts` (NEW)
- `/src/lib/protected-llm-service.ts` (NEW)
- `/src/lib/ai-service.ts` (UPDATED - now uses rate limiting)
- `/src/lib/error-repair-service.ts` (UPDATED - now uses rate limiting)
- `/src/hooks/use-auto-repair.ts` (UPDATED - disabled automatic scanning)

docs/deployment/CI_FIX_SUMMARY.md (new file, 45 lines)

# CI/CD Fix Summary

## Problem

The CI/CD pipeline was failing during the `npm ci` step with the following error:

```
npm error Invalid: lock file's @github/spark@0.0.1 does not satisfy @github/spark@0.44.15
npm error Missing: octokit@5.0.5 from lock file
npm error Missing: @octokit/app@16.1.2 from lock file
... (and many more missing dependencies)
```

## Root Cause

The `package-lock.json` file was out of sync with `package.json`. This happened because:

1. Dependencies were updated in `package.json` but the lock file wasn't regenerated
2. The `@github/spark` workspace dependency version changed
3. New octokit dependencies were added but not reflected in the lock file

## Solution Applied

Ran `npm install` to regenerate the `package-lock.json` file. This:

- Updated the lock file to match all dependencies in `package.json`
- Resolved all missing octokit dependencies
- Synced the `@github/spark` workspace reference
- Ensured `npm ci` will work correctly in CI/CD

## Next Steps

1. **Commit the updated `package-lock.json`** to your repository
2. **Push the changes** to trigger the CI/CD pipeline again
3. The `npm ci` command should now succeed

## Prevention

To avoid this issue in the future:

- Always run `npm install` after updating `package.json`
- Commit both `package.json` AND `package-lock.json` together
- Use `npm ci` locally to test that the lock file is valid
- Consider adding a pre-commit hook to validate lock file sync (a sketch follows this list)

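One way to implement that hook check is sketched below, assuming it is wired up through a tool like husky; the script is an illustration, not something this repository is stated to contain. It leans on `npm install --package-lock-only`, which rewrites only the lock file without touching `node_modules`:

```typescript
// Hypothetical pre-commit check: fail the commit if package-lock.json
// no longer matches package.json. Run with e.g. `npx tsx check-lockfile.ts`.
import { execSync } from "node:child_process";

// Regenerate the lock file in place; a no-op when everything is in sync.
execSync("npm install --package-lock-only --ignore-scripts", { stdio: "pipe" });

// If git now sees the lock file as modified, the committed copy was stale.
const changed = execSync("git diff --name-only -- package-lock.json")
  .toString()
  .trim();

if (changed) {
  console.error("package-lock.json is out of sync with package.json.");
  console.error("Run `npm install` and commit the updated lock file.");
  process.exit(1);
}
console.log("package-lock.json is in sync.");
```
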
## CI/CD Command Explanation

- `npm ci` (clean install) requires an exact lock-file match - used in CI/CD for reproducible builds
- `npm install` updates the lock file if needed - used in development

The CI/CD pipeline uses `npm ci` because it's faster and ensures consistent builds across environments.