Model APIs let AI application teams configure and debug the AI Gateway and pre-configure plugins for AI proxy, AI observability, consumer authorization, and content moderation.
Create Model API
Log on to the AI Gateway console and choose Instance. In the top menu bar, select a region, then click the target instance ID.
In the navigation pane on the left, choose Model API, then click Create Model API.
-
Select a use case and click Create.
The use case determines the available Protocol options and default routes. Supported use cases:
-
Text Generation: Supports OpenAI-compatible and Anthropic protocols.
-
Image Generation
-
Video Generation
-
Speech Synthesis
-
Embedding
-
Rerank
-
Others
-
-
Configure basic information.
In the dialog, complete the Select a scenario. step, then configure the Create Model API form:
-
Protocol: Each protocol provides a set of built-in default routes for the selected use case, which allows you to quickly generate compatible interfaces for services such as OpenAI, DashScope, and vLLM.
NoteProtocol conversion may change the Token statistics structure. For example, the input token statistics for the Alibaba Cloud Model Studio (DashScope) protocol include cache tokens, while the input token statistics for the Anthropic protocol do not include cache tokens. Pay attention to the differences in statistical metrics across protocols when viewing observability data.
-
API Name: A unique name within your account. Up to 64 characters; supports letters, digits, underscores (_), and hyphens (-).
-
Domain Name: Select one or more domain names for API access. Each domain name +
BasePathcombination must be unique.If you do not have a domain name, click the Add Domain Name button to create one.
-
base path: The base request path of the API. Defaults to
/. Optionally enable Remove during backend forwarding.NoteIf you enable Remove during backend forwarding, the system strips the base path from the request before forwarding to the backend. For example:
-
The base path is set to /api.
-
The original request path is /api/users.
-
The path forwarded to the backend service becomes /users.
-
-
AI Request Monitoring: Enables metrics, logs, and traces. Logging and tracing require SLS. Select Record request content and Log response to record model requests and responses.
ImportantWhen enabled, all AI request content including the request body is recorded to the access log. Ensure SLS is properly configured and data security safeguards are in place.
-
Model Service: Supports Single-model Service, Multi-model Service (by model name), Multi-model Service (by proportion), Multiple Services (by observability metrics), and Multiple Services (intelligent routing).
-
Single-model Service: Select one AI service and set the Model Name. The model name can be passed through or rewritten.
-
Multi-model Service (by model name): Routes requests to different services by matching the model name in the request body with a rule. The rule supports wildcards such as
?and*. For example,qwen-*can matchqwen-maxandqwen-long. -
Multi-model Service (by proportion): Select multiple AI services and set their weights. The model name can be passed through or rewritten.
-
Multi-Service (by Metrics): Automatically routes requests to the optimal service based on observability metrics such as response time and success rate, without manual weight configuration.
-
Multi-Model Service (Smart Routing): The system automatically selects the most suitable model based on model characteristics. Intelligent Routing.
NoteTo use the Multiple Services (by observability metrics) and Multiple Services (intelligent routing) features, you must upgrade AI Gateway to version
2.1.15or later.
-
-
fallback: You can Enable this feature to configure a sequence of fallback policies. The same service can be used in multiple policies.
-
first packet timeout: Maximum wait time (ms) for the first data packet in a streaming response. Set to 0 to disable.
-
resource: Select the default resource group, an existing one, or create a new one for grouping, authorizing, and monitoring resources.
To create a new Resource Group, click Create Resource Group.
-
-
Review your configuration and click OK.
Default route
Each protocol and use case combination generates a set of default routes.
Text generation
Protocol: OpenAI compatible (OpenAI/v1)
|
Route name |
Path |
Method |
Description |
|
|
|
POST |
Creates a model response for the given chat conversation. |
|
|
|
POST |
Creates a completion for the provided prompt and parameters. |
Protocol: Anthropic (Anthropic)
The Anthropic protocol provides native message formats for Anthropic models such as Claude. Ideal for applications that use the native Anthropic API format.
Providers supporting this protocol include Alibaba Cloud Model Studio (Qwen), Claude, Moonshot AI, and Zhipu AI. Their AI services natively support the Anthropic protocol with no additional configuration.
|
Route name |
Path |
Method |
Description |
|
|
|
POST |
Creates a message for the given chat conversation using Anthropic's native message format. |
Image generation
Protocol: Alibaba Cloud Model Studio
|
Route name |
Path |
Method |
Description |
|
|
|
POST |
Generates an image using text-to-image synthesis. |
|
|
|
POST |
Generates an image using image-to-image synthesis. |
|
|
|
POST |
Performs image-to-image outpainting. |
|
|
|
POST |
Generates a virtual model image. |
|
|
|
POST |
Generates a background image. |
|
|
|
GET/POST/PUT/PATCH/DELETE |
Manages asynchronous tasks. |
Protocol: OpenAI compatibility
|
Route name |
Path |
Method |
Description |
|
|
|
POST |
Generates an image. |
|
|
|
POST |
Edits an image. |
|
|
|
POST |
Creates a variation of a given image. |
Protocol: ComfyUI
|
Route name |
Path |
Method |
Description |
|
|
|
GET |
Provides a WebSocket endpoint for real-time communication with the server. |
|
|
|
GET |
Lists available embeddings. |
|
|
|
GET |
Lists extensions that register a web directory. |
|
|
|
GET |
Gets server features and capabilities. |
|
|
|
GET |
Lists available model types. |
|
|
|
GET |
Gets models from a specific folder. |
|
|
|
GET |
Gets a map of custom node modules and their associated template workflows. |
|
|
|
POST |
Uploads an image. |
|
|
|
POST |
Uploads a mask. |
|
|
|
GET |
Views an image with multiple options. |
|
|
|
GET |
Gets metadata for a model. |
|
|
|
GET |
Gets system information, such as Python version, devices, and VRAM. |
|
|
|
GET/POST |
Gets the current queue status and execution information, or submits a prompt. |
|
|
|
GET |
Gets details of all node types. |
|
|
|
GET |
Gets details for a specific node type. |
|
|
|
GET/POST |
Gets the queue history. |
|
|
|
GET |
Gets the queue history for a specific prompt. |
|
|
|
GET/POST |
Gets the queue status or manages queue operations. |
|
|
|
POST |
Stops the current workflow execution. |
|
|
|
POST |
Frees memory by unloading specified models. |
|
|
|
GET |
Lists user data files in a specified directory. |
|
|
|
GET |
Lists files and directories in a structured format. |
|
|
|
GET/POST/DELETE |
Gets, uploads, updates, or deletes a specific user data file. |
|
|
|
POST |
Moves or renames a user data file. |
|
|
|
GET/POST |
Gets user information or creates a new user. |
Video generation
Alibaba Cloud Model Studio protocol
|
Route name |
Path |
Method |
Description |
|
|
|
POST |
Generates a video. |
|
|
|
POST |
Generates a video from an image. |
|
|
|
GET/POST/PUT/PATCH/DELETE |
Manages asynchronous tasks. |
Speech synthesis
Alibaba Cloud Model Studio
|
Route name |
Path |
Method |
Description |
|
|
|
GET |
Synthesizes audio from text. |
OpenAI compatible (OpenAI/v1)
|
Route name |
Path |
Method |
Description |
|
|
|
POST |
Synthesizes audio from text. |
Embedding
Protocol: OpenAI compatible (OpenAI/v1)
|
Route name |
Path |
Method |
Description |
|
|
|
POST |
Generates an embedding vector representing the input text. |
Text reranking (rerank)
Protocol: Alibaba Cloud Model Studio text reranking
|
Route name |
Path |
Method |
Description |
|
|
|
POST |
Reranks the given documents based on query relevance. |
Protocol: vLLM
|
Route name |
Path |
Method |
Description |
|
|
|
POST |
Reranks the given documents based on query relevance. |
Others
Protocol: OpenAI-compatible (OpenAI/v1)
|
Route name |
Path |
Method |
Description |
|
|
|
GET, POST, PUT, PATCH, DELETE |
Manage models. |
|
|
|
GET, POST, PUT, PATCH, DELETE |
Manage files. |
|
|
|
GET, POST, PUT, PATCH, DELETE |
Manage batches. |
|
|
|
GET, POST, PUT, PATCH, DELETE |
Manage fine-tuning jobs. |
AI services from providers supporting the Anthropic protocol (Alibaba Cloud Model Studio, Claude, Moonshot AI, Zhipu AI) automatically support multiple protocols, including OpenAI-compatible and Anthropic. Select the appropriate protocol when creating a Model API.
Intelligent routing
Different LLMs excel in specific domains:
-
Code generation: The Qwen-Coder series excels in code understanding and generation.
-
Mathematical reasoning: The Qwen-Math series excels at solving complex mathematical problems.
-
Translation: The Qwen-MT series is optimized for multilingual translation.
-
Rapid response: The Qwen-Flash series offers ultra-low latency for time-sensitive scenarios.
-
Complex reasoning: Models such as Qwen-Max and DeepSeek-R1 have an edge in complex reasoning.
Manually routing requests to these models presents challenges:
-
Fragmented user experience: Users must manually select models without guidance on which fits best.
-
Inefficient resource utilization: High-cost models handle simple tasks that cheaper models could serve.
-
High development complexity: Application-layer routing logic increases development and maintenance costs.
-
No unified endpoint: Multiple model deployments lead to scattered APIs that are hard to manage.
The AI Gateway intelligent routing feature uses semantic analysis to automatically route requests to the best-fit model based on these intent classifications:
|
Intent code |
Description |
Scenarios |
|
|
Code writing and debugging |
Programming questions, code generation, bug fixes |
|
|
Mathematical computation and reasoning |
Mathematical proofs, formula derivation, statistical analysis |
|
|
Multilingual translation |
Document translation, real-time translation, localization |
|
|
Fast and simple responses |
Simple Q&A, information lookups, everyday conversations |
|
|
Complex reasoning |
Deep analysis, complex decision-making, long-context understanding |
Edit Model API
Log on to the AI Gateway console and choose Instance. In the top menu bar, select a region, then click the target instance ID.
-
In the navigation pane, click Model API, then click Edit in the Actions column of the target API. Modify the parameters in the Edit Model API panel. Parameter descriptions: Create a Model API.
-
Click OK.
Debug Model API
Currently, you can only debug text generation using the /v1/chat/completions endpoint.
Log on to the AI Gateway console and choose Instance. In the top menu bar, select a region, then click the target instance ID.
-
In the left navigation pane, select Model API and click Debug in the Actions column for the target API.
-
In the Debug panel, select a domain name and model. If needed, enable the Streaming switch and configure parameters and custom parameters. On the Model Returned tab, enter your content and click Send.
Parameters: system prompt (system instruction, up to 100 characters), max_tokens (0–8192, default: 1024), top_p (0–1, default: 0.95), and temperature (0–2, default: 1; adjust with caution as it significantly impacts results). The cURL command and raw output tabs are available on the right.
Delete a model API
Log on to the AI Gateway console and choose Instance. In the top menu bar, select a region, then click the target instance ID.
-
In the navigation pane, select Model API. In the row for the target API, click Delete in the Actions column. In the confirmation dialog box that appears, enter the API name and click Delete.