All proxy errors return JSON with a single error field:
Check the X-RateLimit-Remaining and X-RateLimit-Reset response headers to manage your request rate.
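As a rough client-side illustration of using these two headers, the sketch below computes how long to pause before the next request. The helper name `throttle_delay` is hypothetical, and the sketch assumes X-RateLimit-Reset holds a Unix timestamp in seconds; the text above does not state the format, so adjust if the gateway sends a relative value instead.

```python
import time
from typing import Mapping, Optional


def throttle_delay(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return the number of seconds to wait before sending another request.

    Assumption (not confirmed by the docs): X-RateLimit-Reset is a Unix
    timestamp in seconds. Header lookup here is case-sensitive; most HTTP
    client libraries expose a case-insensitive mapping instead.
    """
    remaining = headers.get("X-RateLimit-Remaining")
    reset = headers.get("X-RateLimit-Reset")
    if remaining is None or reset is None:
        return 0.0  # headers absent: nothing to act on
    if int(remaining) > 0:
        return 0.0  # quota left in the current window, no need to wait
    current = time.time() if now is None else now
    # Quota exhausted: wait until the reset time, never a negative duration.
    return max(0.0, float(reset) - current)
```

A caller would typically pass the response headers from its HTTP client and sleep for the returned duration before the next call.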
For non-streaming requests, the gateway retries failed attempts with exponential backoff: up to 2 retries, with delays between 250 ms and 4 s. A 502 response means every attempt failed. Streaming requests are not retried because partial data may already have been sent to the client.
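The retry behavior can be approximated with the sketch below. It is not the gateway's implementation: the doubling factor and the absence of jitter are assumptions, and `forward`, `send`, and `sleep` are hypothetical names. Only the retry count and the 250 ms to 4 s bounds come from the description above.

```python
from typing import Callable

BASE_DELAY_S = 0.25  # lower bound of the backoff range, from the text
MAX_DELAY_S = 4.0    # upper bound of the backoff range, from the text
MAX_RETRIES = 2      # retry limit, from the text


def backoff_delay(retry_index: int) -> float:
    """Delay before retry number retry_index (0-based).

    Assumption: the delay doubles on each retry and is capped at MAX_DELAY_S.
    """
    return min(MAX_DELAY_S, BASE_DELAY_S * (2 ** retry_index))


def forward(send: Callable[[], bool], streaming: bool,
            sleep: Callable[[float], None]) -> bool:
    """Try a request; return True on success, False once all attempts fail.

    For non-streaming requests, a False result corresponds to the 502 case.
    Streaming requests get exactly one attempt.
    """
    attempts = 1 if streaming else 1 + MAX_RETRIES
    for attempt in range(attempts):
        if send():
            return True
        if attempt < attempts - 1:
            sleep(backoff_delay(attempt))  # wait only if another try follows
    return False
```

Passing `sleep` in as a parameter keeps the sketch testable; in real use it would be `time.sleep`.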
Each provider has its own circuit breaker. It opens when, within a 60-second window, the provider either accumulates 10 failures or shows a failure rate of 50% or more across at least 20 requests. Once open, all requests to that provider are rejected for a 30-second cooldown. After the cooldown, a single probe request is allowed through; if it succeeds, the circuit closes and traffic resumes normally.
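The state machine described above might be modeled roughly as follows. This is a minimal sketch, not the gateway's code: the class and method names are hypothetical, failures are counted across the whole window rather than consecutively (the text does not say which), and a failed probe is assumed to restart the cooldown, which the text also leaves unspecified.

```python
from collections import deque
from typing import Deque, Optional, Tuple

WINDOW_S = 60.0         # sliding window length
COOLDOWN_S = 30.0       # time the circuit stays open
FAILURE_THRESHOLD = 10  # absolute failure count that opens the circuit
MIN_REQUESTS = 20       # minimum sample size for the rate rule
FAILURE_RATE = 0.5      # failure rate that opens the circuit


class CircuitBreaker:
    def __init__(self) -> None:
        self.events: Deque[Tuple[float, bool]] = deque()  # (timestamp, ok)
        self.opened_at: Optional[float] = None  # None means closed
        self.probe_in_flight = False

    def allow(self, now: float) -> bool:
        """Decide whether a request may be sent to the provider."""
        if self.opened_at is None:
            return True
        if now - self.opened_at < COOLDOWN_S:
            return False  # still cooling down: reject
        if self.probe_in_flight:
            return False  # only one probe at a time
        self.probe_in_flight = True
        return True

    def record(self, now: float, ok: bool) -> None:
        """Record the outcome of a request that was allowed through."""
        if self.opened_at is not None:
            # This outcome belongs to the probe request.
            self.probe_in_flight = False
            if ok:
                self.opened_at = None  # probe succeeded: close the circuit
                self.events.clear()
            else:
                self.opened_at = now  # assumption: failed probe restarts cooldown
            return
        self.events.append((now, ok))
        while self.events and now - self.events[0][0] > WINDOW_S:
            self.events.popleft()  # drop outcomes older than the window
        total = len(self.events)
        failures = sum(1 for _, good in self.events if not good)
        rate_tripped = total >= MIN_REQUESTS and failures / total >= FAILURE_RATE
        if failures >= FAILURE_THRESHOLD or rate_tripped:
            self.opened_at = now
```

Timestamps are passed in explicitly so the behavior can be checked without waiting on a real clock; a caller would normally supply `time.monotonic()`.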
On rate-limited responses, these headers are included:
When your organization is approaching its budget limit but requests are still allowed: