When a model service for a Model API fails due to errors or high load, AI Fallback routes requests to a backup model to maintain availability.
What is AI Fallback
AI Fallback switches a Model API to a backup model when the primary model is unavailable, improving availability and preventing request failures.
Proper configuration significantly increases the success rate of AI requests. The following diagram shows a typical use case:

You can configure one or more backup models. If the primary model is unavailable, the gateway calls backup models in sequence and returns the first successful response.
Each backup model has the following parameters:
- Service name: The backup model service. Select from the services available on the instance.
- Model name: Use pass-through to forward the model name from the original request, or specify a model, such as Qwen-plus.
Trigger conditions
AI Fallback triggers when a model service returns an HTTP 4xx or 5xx status code.
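The call sequence and trigger rule described above can be sketched in Python. This is a minimal illustration, not the gateway's implementation: `Target`, `route`, `is_failure`, the service names, and the stand-in backends are all hypothetical, and returning the last failed response when every target fails is an assumption that the text above does not state.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in for a model service: takes (model name, prompt)
# and returns (HTTP status code, response body).
Backend = Callable[[str, str], Tuple[int, str]]


@dataclass
class Target:
    service_name: str                 # service to call
    call: Backend                     # hypothetical transport function
    model_name: Optional[str] = None  # None means pass-through


def is_failure(status: int) -> bool:
    """Treat any HTTP 4xx or 5xx status code as a fallback trigger."""
    return 400 <= status <= 599


def route(requested_model: str, prompt: str,
          primary: Target, backups: List[Target]) -> Tuple[str, int, str]:
    """Try the primary, then each backup in order; return the first success.

    Assumption: if every target fails, the last failed response is returned.
    """
    last = None
    for target in [primary, *backups]:
        # Pass-through forwards the model name from the original request.
        model = target.model_name or requested_model
        status, body = target.call(model, prompt)
        last = (target.service_name, status, body)
        if not is_failure(status):
            return last
    return last


# Stand-in backends for the demonstration.
def down(model: str, prompt: str) -> Tuple[int, str]:
    return 503, "unavailable"


def up(model: str, prompt: str) -> Tuple[int, str]:
    return 200, f"{model}: ok"


primary = Target("primary-svc", down)
backups = [Target("backup-a", down),
           Target("backup-b", up, model_name="qwen-plus")]
print(route("my-model", "hi", primary, backups))
# ('backup-b', 200, 'qwen-plus: ok')
```

In this run the primary and the first backup both return 503, so the second backup answers, using its explicitly configured model name instead of the one in the request.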
Prerequisites
- A gateway instance is created.
- A service is created.
Configure AI Fallback
- Log on to the AI Gateway console and choose Instance. In the top menu bar, select a region, then click the target instance ID.
- In the left-side navigation pane, click Model API. Enable Fallback when you create or edit a Model API.
  - Create a Model API: Click Create API and enable Fallback on the Model API configuration page.
  - Edit a Model API: In the Actions column for the target API, click Edit. On the Model API configuration page, enable Fallback.
| Parameter | Description |
| --- | --- |
| Fallback | Enables fallback services. The gateway calls these services in descending order of priority. Note: You can reuse the same service for multiple fallback policies. |
| Fallback on backend service errors only | When enabled, fallback triggers only on backend service errors. When disabled, fallback also triggers on rate-limited or intercepted requests. |
| Fallback List > Service Name | Select a fallback service. |
| Fallback List > Model Name | Defaults to pass-through, which forwards the model name from the original request to the backup service. |
| First packet timeout | Timeout in milliseconds for the first packet of a streaming response. Applies only to streaming responses. Set to 0 to disable. When set to a non-zero value, the gateway falls back if no response arrives within this period. |
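The two behavioral settings above, the backend-errors-only switch and the first packet timeout, can be illustrated with a short Python sketch. It is an approximation under stated assumptions, not the gateway's code: `should_fallback`, `stream_with_fallback`, and the `"backend"` / `"gateway"` origin labels are hypothetical names, and each streaming service is modeled as an async generator.

```python
import asyncio
from typing import AsyncGenerator, Callable, List

# Hypothetical: each target is a function that opens a streaming response.
StreamFactory = Callable[[], AsyncGenerator[str, None]]


def should_fallback(origin: str, backend_errors_only: bool) -> bool:
    """Decide whether a failed request falls back.

    origin is "backend" for backend service errors, or "gateway" for
    requests that were rate-limited or intercepted (hypothetical labels).
    """
    if origin == "backend":
        return True
    return not backend_errors_only


async def stream_with_fallback(targets: List[StreamFactory],
                               first_packet_timeout_ms: int) -> List[str]:
    """Return all chunks from the first target whose first packet is on time."""
    # A value of 0 disables the timeout (wait indefinitely).
    timeout = first_packet_timeout_ms / 1000 if first_packet_timeout_ms > 0 else None
    for open_stream in targets:
        stream = open_stream()
        try:
            # The timeout covers only the first packet.
            first = await asyncio.wait_for(stream.__anext__(), timeout)
        except asyncio.TimeoutError:
            await stream.aclose()
            continue  # fall back to the next target
        chunks = [first]
        async for chunk in stream:
            chunks.append(chunk)
        return chunks
    raise RuntimeError("no target produced a first packet in time")


# Stand-in streaming services for the demonstration.
async def slow() -> AsyncGenerator[str, None]:
    await asyncio.sleep(0.2)
    yield "late"


async def fast() -> AsyncGenerator[str, None]:
    yield "hello"
    yield " world"


print(asyncio.run(stream_with_fallback([slow, fast], 50)))
# ['hello', ' world']
```

With a 50 ms limit the slow service misses its first packet and the next service answers. With the timeout set to 0, the same call waits for the slow service and returns its output.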