AI Fallback

When a model service for a Model API fails due to errors or high load, AI Fallback routes requests to a backup model to maintain availability.

What is AI Fallback

AI Fallback switches a Model API to a backup model when the primary model is unavailable, improving availability and preventing request failures.

A well-configured fallback list raises the success rate of AI requests, because a request fails only when the primary model and every backup model fail. The following diagram shows a typical use case:

You can configure one or more backup models. If the primary model is unavailable, the gateway calls backup models in sequence and returns the first successful response.

Each backup model has the following parameters:

Service name: The backup model service. Select from the services available on the instance.
Model name: Use pass-through to forward the model name from the original request, or specify a model, such as Qwen-plus.
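
The sequential behavior described above can be sketched in a few lines of Python. This is only an illustration, not the gateway's implementation: ModelTarget, call_with_fallback, and the send callback are hypothetical names, and returning the last error when every target fails is an assumption the document does not confirm.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical types for illustration; not the gateway's actual API.
Response = Tuple[int, str]              # (HTTP status code, body)
Send = Callable[[str, str], Response]   # (service name, model name) -> response


@dataclass
class ModelTarget:
    service_name: str
    model_name: Optional[str] = None    # None stands for pass-through


def call_with_fallback(request_model: str, primary: ModelTarget,
                       backups: List[ModelTarget],
                       send: Send) -> Tuple[str, Response]:
    """Try the primary, then each backup in order; return the first success."""
    last: Optional[Tuple[str, Response]] = None
    for target in [primary, *backups]:
        # Pass-through forwards the model name from the original request.
        model = target.model_name or request_model
        response = send(target.service_name, model)
        last = (target.service_name, response)
        if response[0] < 400:           # not a 4xx/5xx status: treat as success
            return last
    # Assumption: when every target fails, surface the last error response.
    assert last is not None
    return last
```

With a primary and three backups where only the second backup succeeds, the third backup is never called, and a backup without a model name receives the model name from the original request.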

Trigger conditions

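AI Fallback triggers when a model service returns an HTTP 4xx or 5xx status code. For streaming responses, it also triggers when a non-zero first packet timeout is configured and no response arrives within that period.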
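
As a rough sketch, the status-code condition can be written as a small predicate. It also models the "Fallback on backend service errors only" option from the configuration table. The function name and the from_backend flag are hypothetical: the flag assumes the gateway can tell an error returned by the model service apart from one it produced itself (for example, by rate limiting or intercepting the request).

```python
def should_fallback(status: int, from_backend: bool,
                    backend_errors_only: bool) -> bool:
    """Decide whether a response should trigger fallback (illustrative only)."""
    if not 400 <= status <= 599:
        return False                 # only 4xx and 5xx responses are candidates
    if from_backend:
        return True                  # backend service errors always trigger fallback
    # Gateway-generated errors (rate-limited or intercepted requests)
    # trigger fallback only when the option is disabled.
    return not backend_errors_only
```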

Prerequisites

A gateway instance is created.
A service is created.

Configure AI Fallback

Log on to the AI Gateway console and choose Instance. In the top menu bar, select a region, then click the target instance ID.

In the left-side navigation pane, click Model API. Enable Fallback when you create or edit a Model API.

Create a Model API: Click Create API and enable Fallback on the Model API configuration page.
Edit a Model API: In the Actions column for the target API, click Edit. On the Model API configuration page, enable Fallback.

Parameter		Description
Fallback		Enables fallback services. The gateway calls these services in descending order of priority. Note: You can reuse the same service for multiple fallback policies.
Fallback on backend service errors only		When enabled, fallback triggers only on backend service errors. When disabled, fallback also triggers on rate-limited or intercepted requests.
Fallback List	Service Name	Select a fallback service.
	Model Name	Defaults to pass-through, which forwards the model name from the original request to the backup service.
First packet timeout		Timeout in milliseconds for the first packet of a streaming response. Applies only to streaming responses. Set to 0 to disable. When set to a non-zero value, the gateway falls back if no response arrives within this period.
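
The first packet timeout can be illustrated with a minimal asyncio sketch. It assumes each target is represented by a hypothetical callable that opens an async stream of chunks. The names stream_with_fallback, slow_model, and fast_model are illustrative, and raising an error when no target answers in time is an assumption rather than documented gateway behavior.

```python
import asyncio
from typing import AsyncIterator, Callable, List, Sequence

# Hypothetical: a callable that opens a streaming response from one target.
StreamOpener = Callable[[], AsyncIterator[str]]


async def stream_with_fallback(openers: Sequence[StreamOpener],
                               first_packet_timeout_ms: int) -> List[str]:
    """Return the chunks of the first stream whose first packet arrives in time."""
    for open_stream in openers:
        stream = open_stream().__aiter__()
        try:
            if first_packet_timeout_ms > 0:
                first = await asyncio.wait_for(
                    stream.__anext__(), first_packet_timeout_ms / 1000)
            else:
                first = await stream.__anext__()   # 0 disables the timeout
        except asyncio.TimeoutError:
            continue                 # no first packet in time: try the next target
        except StopAsyncIteration:
            return []                # the stream ended without sending any chunk
        chunks = [first]
        async for chunk in stream:   # the timeout applies to the first packet only
            chunks.append(chunk)
        return chunks
    raise TimeoutError("no target produced a first packet in time")


async def slow_model() -> AsyncIterator[str]:
    await asyncio.sleep(0.3)         # first packet arrives after 300 ms
    yield "late"


async def fast_model() -> AsyncIterator[str]:
    yield "Hello"
    yield " world"
```

Calling asyncio.run(stream_with_fallback([slow_model, fast_model], 50)) skips the slow target and collects the chunks from the fast one. With a timeout of 0, the sketch waits for the slow target instead. Errors that occur after the first packet are not handled here.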