# Bad Gateway Errors - Fixed

## Problem
The application was returning a flood of "Bad Gateway" (502) errors caused by excessive LLM API calls.

## Root Causes Identified

1. **Auto-scanning running every 2 seconds** - The `useAutoRepair` hook was automatically scanning all files for errors every 2 seconds, making continuous LLM calls
2. **No rate limiting** - Multiple AI features (component generation, code improvement, error repair, etc.) were making unlimited concurrent LLM requests
3. **No error circuit breaker** - Failed requests would retry immediately without backing off
4. **No request throttling** - All AI operations competed for the same gateway resources

## Solutions Implemented

### 1. Rate Limiting System (`src/lib/rate-limiter.ts`)
- **Per-category rate limiting**: Different limits for different AI operations
- **Time windows**: Tracks requests over rolling 60-second windows
- **Automatic cleanup**: Removes stale tracking data
- **Priority queue support**: High-priority requests can retry with backoff
- **Status tracking**: Monitor remaining capacity and reset times

Configuration:
- **AI Operations**: Max 3 requests per minute
- **Error Scanning**: Max 1 request per 30 seconds

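The per-category rolling window described above can be sketched roughly as follows. This is a minimal illustration, not the contents of `src/lib/rate-limiter.ts`: the class name `RollingWindowLimiter`, its methods, and the category keys `ai-operation` and `error-scan` are all assumptions chosen for the example. Only the two limits come from the configuration listed above. The clock is passed in as a parameter so the behavior can be checked without waiting.

```typescript
// Illustrative sketch only: class, method, and category names are assumptions,
// not the actual contents of src/lib/rate-limiter.ts.
interface RateLimitConfig {
  maxRequests: number; // requests allowed per window
  windowMs: number; // rolling window length in milliseconds
}

class RollingWindowLimiter {
  private readonly configs: Record<string, RateLimitConfig>;
  // Timestamps (ms) of accepted requests, keyed by category.
  private readonly hits = new Map<string, number[]>();

  constructor(configs: Record<string, RateLimitConfig>) {
    this.configs = configs;
  }

  // Looks up the limits for a category and rejects unknown categories.
  private configFor(category: string): RateLimitConfig {
    const config = this.configs[category];
    if (config === undefined) {
      throw new Error(`Unknown rate-limit category: ${category}`);
    }
    return config;
  }

  // Drops timestamps that have aged out of the window (the cleanup step).
  private prune(category: string, config: RateLimitConfig, now: number): number[] {
    const previous = this.hits.get(category) ?? [];
    const kept = previous.filter((t) => now - t < config.windowMs);
    this.hits.set(category, kept);
    return kept;
  }

  // Records the request and returns true if the category still has capacity.
  tryAcquire(category: string, now: number = Date.now()): boolean {
    const config = this.configFor(category);
    const kept = this.prune(category, config, now);
    if (kept.length >= config.maxRequests) {
      return false;
    }
    kept.push(now);
    return true;
  }

  // Number of requests still allowed in the current window.
  remaining(category: string, now: number = Date.now()): number {
    const config = this.configFor(category);
    return config.maxRequests - this.prune(category, config, now).length;
  }
}

// Limits taken from the configuration above; the keys are hypothetical.
const limiter = new RollingWindowLimiter({
  "ai-operation": { maxRequests: 3, windowMs: 60_000 },
  "error-scan": { maxRequests: 1, windowMs: 30_000 },
});
```

Storing timestamps rather than a counter means capacity comes back gradually as old requests leave the window, instead of all at once at a fixed reset time.
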
### 2. Protected LLM Service (`src/lib/protected-llm-service.ts`)
- **Error tracking**: Monitors consecutive failures
- **Circuit breaker**: Pauses all requests after 5 consecutive errors
- **User-friendly error messages**: Converts technical errors to actionable messages
- **Automatic recovery**: Error count decreases on successful calls
- **Request categorization**: Groups related operations for better rate limiting

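As a rough picture of the error tracking and circuit breaker behavior, the sketch below keeps a single error counter that rises on failure and falls on success. The class name `ErrorCircuitBreaker` and its methods are hypothetical and not taken from `src/lib/protected-llm-service.ts`. Only the threshold of 5 and the "decreases on success" rule come from the list above.

```typescript
// Illustrative sketch only: names are assumptions, not the actual
// contents of src/lib/protected-llm-service.ts.
class ErrorCircuitBreaker {
  private errorCount = 0;
  private readonly threshold: number;

  constructor(threshold: number = 5) {
    this.threshold = threshold;
  }

  // True while the breaker is tripped and requests should be refused.
  get isPaused(): boolean {
    return this.errorCount >= this.threshold;
  }

  // Call after a failed LLM request.
  recordFailure(): void {
    this.errorCount += 1;
  }

  // Call after a successful LLM request; the count decays but never goes negative.
  recordSuccess(): void {
    this.errorCount = Math.max(0, this.errorCount - 1);
  }

  // Manual escape hatch that clears the error count.
  reset(): void {
    this.errorCount = 0;
  }
}

const breaker = new ErrorCircuitBreaker();
for (let i = 0; i < 5; i++) {
  breaker.recordFailure();
}
console.log(breaker.isPaused); // true: five failures reached the threshold
```

Decrementing on success, rather than resetting to zero, means one lucky call during an outage does not immediately clear the breaker's memory of recent failures.
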
### 3. Disabled Automatic Scanning
- **Removed automatic useEffect trigger** in `useAutoRepair`
- **Manual scanning only**: Users must explicitly click the "Scan" button
- **Rate-limited when triggered**: Even manual scans respect rate limits

### 4. Updated All AI Services
- **ai-service.ts**: All methods now use `ProtectedLLMService`
- **error-repair-service.ts**: Code repair uses rate limiting
- **Consistent error handling**: All services handle 502/429 errors gracefully

## Benefits

1. **No more cascading failures**: Rate limiting prevents overwhelming the gateway
2. **Better user experience**: Clear error messages explain what went wrong
3. **Automatic recovery**: Circuit breaker allows the system to recover from issues
4. **Resource efficiency**: Prevents wasted requests that would fail anyway
5. **Predictable behavior**: Users understand when operations might be delayed

## How It Works Now

### Normal Operation
1. User triggers an AI feature (generate component, improve code, etc.)
2. Request goes through `ProtectedLLMService`
3. Rate limiter checks if the request is allowed
4. If allowed, the request proceeds
5. If rate-limited, the user sees a friendly message asking them to slow down

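The steps above can be sketched as a small wrapper around the actual LLM call. Everything named here is hypothetical: the `Gate` interface, the `protectedCall` function, the `CallOutcome` type, and the wording of the rate-limit message are assumptions for illustration, not the real `ProtectedLLMService` API.

```typescript
// Illustrative sketch only: the Gate interface, function name, and the
// rate-limit message wording are assumptions, not the project's real API.
interface Gate {
  tryAcquire(category: string): boolean;
}

type CallOutcome<T> =
  | { ok: true; value: T }
  | { ok: false; message: string };

async function protectedCall<T>(
  gate: Gate,
  category: string,
  task: () => Promise<T>
): Promise<CallOutcome<T>> {
  // Step 3: ask the rate limiter whether this category has capacity.
  if (!gate.tryAcquire(category)) {
    // Step 5: refuse without ever touching the gateway.
    return {
      ok: false,
      message: "Too many requests. Please slow down and try again shortly.",
    };
  }
  // Step 4: the request proceeds.
  return { ok: true, value: await task() };
}
```

Returning a tagged outcome instead of throwing lets the UI show the message directly, and a rate-limited request never reaches the gateway at all.
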
### Error Handling
1. If an LLM call fails with 502/Bad Gateway:
   - User sees: "Service temporarily unavailable - please wait a moment"
   - Error count increases
   - Request is blocked by rate limiter for the category

2. If too many consecutive errors occur (5+):
   - Circuit breaker trips
   - All AI operations pause
   - User sees: "AI service temporarily unavailable due to repeated errors"

3. Recovery:
   - Successful requests decrease error count
   - After error count drops, circuit breaker resets
   - Normal operation resumes

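One way to picture the translation from technical failures to the messages quoted above is a small mapping function. The function name `toUserMessage`, its parameters, and the 429 and fallback wordings are assumptions made for this sketch. Only the 502 message and the circuit breaker message are taken from this document.

```typescript
// Illustrative sketch only: the function name, parameters, and the 429 and
// fallback wordings are assumptions; the other two messages are quoted from
// the error handling steps above.
function toUserMessage(status: number | undefined, breakerTripped: boolean): string {
  // A tripped breaker takes precedence: all AI operations are paused.
  if (breakerTripped) {
    return "AI service temporarily unavailable due to repeated errors";
  }
  if (status === 502) {
    return "Service temporarily unavailable - please wait a moment";
  }
  if (status === 429) {
    return "Too many requests - please slow down"; // hypothetical wording
  }
  return "Unexpected AI service error"; // hypothetical fallback
}

console.log(toUserMessage(502, false));
```
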
### Manual Controls
Users can check AI service status:
```javascript
const stats = ProtectedLLMService.getStats()
// Returns: { totalCalls, errorCount, isPaused }
```

Users can manually reset if needed:
```javascript
ProtectedLLMService.reset()
// Clears all rate limits and error counts
```

## Testing the Fix

1. **Verify no automatic scanning**: Open the app - no LLM calls should fire automatically
2. **Test rate limiting**: Try generating 5 components in quick succession - you should see the rate limit message
3. **Test error recovery**: After an error, a subsequent successful call should lower the error count and restore normal operation
4. **Check manual scan**: The error panel's scan button should work and respect rate limiting

## Monitoring

Watch the browser console for:
- `LLM call failed (category): error` - Individual failures
- `Rate limit exceeded for llm-category` - Rate limiting in action
- `Too many LLM errors detected` - Circuit breaker activation

## Future Improvements

1. **Retry queue**: Queue rate-limited requests and auto-retry
2. **Progressive backoff**: Increase delays after repeated failures
3. **Request deduplication**: Prevent identical simultaneous requests
4. **Usage analytics**: Track which features use the most AI calls
5. **User quotas**: Per-user rate limiting for multi-tenant deployments

## Files Modified

- `/src/lib/rate-limiter.ts` (NEW)
- `/src/lib/protected-llm-service.ts` (NEW)
- `/src/lib/ai-service.ts` (UPDATED - now uses rate limiting)
- `/src/lib/error-repair-service.ts` (UPDATED - now uses rate limiting)
- `/src/hooks/use-auto-repair.ts` (UPDATED - disabled automatic scanning)