Model APIs

更新时间:
复制 MD 格式

Model APIs let AI application teams configure and debug the AI Gateway and pre-configure plugins for AI proxy, AI observability, consumer authorization, and content moderation.

Create Model API

  1. Log on to the AI Gateway console and choose Instance. In the top menu bar, select a region, then click the target instance ID.

  2. In the navigation pane on the left, choose Model API, then click Create Model API.

  3. Select a use case and click Create.

    The use case determines the available Protocol options and default routes. Supported use cases:

    • Text Generation: Supports OpenAI-compatible and Anthropic protocols.

    • Image Generation

    • Video Generation

    • Speech Synthesis

    • Embedding

    • Rerank

    • Others

  4. Configure basic information.

    In the dialog, complete the Select a scenario. step, then configure the Create Model API form:

    • Protocol: Each protocol provides a set of built-in default routes for the selected use case, which allows you to quickly generate compatible interfaces for services such as OpenAI, DashScope, and vLLM.

      Note

      Protocol conversion may change the Token statistics structure. For example, the input token statistics for the Alibaba Cloud Model Studio (DashScope) protocol include cache tokens, while the input token statistics for the Anthropic protocol do not include cache tokens. Pay attention to the differences in statistical metrics across protocols when viewing observability data.

    • API Name: A unique name within your account. Up to 64 characters; supports letters, digits, underscores (_), and hyphens (-).

    • Domain Name: Select one or more domain names for API access. Each domain name + BasePath combination must be unique.

      If you do not have a domain name, click the Add Domain Name button to create one.
    • base path: The base request path of the API. Defaults to /. Optionally enable Remove during backend forwarding.

      Note

      If you enable Remove during backend forwarding, the system strips the base path from the request before forwarding to the backend. For example:

      • The base path is set to /api.

      • The original request path is /api/users.

      • The path forwarded to the backend service becomes /users.

    • AI Request Monitoring: Enables metrics, logs, and traces. Logging and tracing require SLS. Select Record request content and Log response to record model requests and responses.

      Important

      When enabled, all AI request content including the request body is recorded to the access log. Ensure SLS is properly configured and data security safeguards are in place.

    • Model Service: Supports Single-model Service, Multi-model Service (by model name), Multi-model Service (by proportion), Multiple Services (by observability metrics), and Multiple Services (intelligent routing).

      • Single-model Service: Select one AI service and set the Model Name. The model name can be passed through or rewritten.

      • Multi-model Service (by model name): Routes requests to different services by matching the model name in the request body with a rule. The rule supports wildcards such as ? and *. For example, qwen-* can match qwen-max and qwen-long.

      • Multi-model Service (by proportion): Select multiple AI services and set their weights. The model name can be passed through or rewritten.

      • Multi-Service (by Metrics): Automatically routes requests to the optimal service based on observability metrics such as response time and success rate, without manual weight configuration.

      • Multi-Model Service (Smart Routing): The system automatically selects the most suitable model based on model characteristics. Intelligent Routing.

        Note

        To use the Multiple Services (by observability metrics) and Multiple Services (intelligent routing) features, you must upgrade AI Gateway to version 2.1.15 or later.

    • fallback: You can Enable this feature to configure a sequence of fallback policies. The same service can be used in multiple policies.

    • first packet timeout: Maximum wait time (ms) for the first data packet in a streaming response. Set to 0 to disable.

    • resource: Select the default resource group, an existing one, or create a new one for grouping, authorizing, and monitoring resources.

      To create a new Resource Group, click Create Resource Group.
  5. Review your configuration and click OK.

Default route

Each protocol and use case combination generates a set of default routes.

Text generation

Protocol: OpenAI compatible (OpenAI/v1)

Route name

Path

Method

Description

create-chat-completion

/v1/chat/completions

POST

Creates a model response for the given chat conversation.

create-completion

/v1/completions

POST

Creates a completion for the provided prompt and parameters.

Protocol: Anthropic (Anthropic)

The Anthropic protocol provides native message formats for Anthropic models such as Claude. Ideal for applications that use the native Anthropic API format.

Note

Providers supporting this protocol include Alibaba Cloud Model Studio (Qwen), Claude, Moonshot AI, and Zhipu AI. Their AI services natively support the Anthropic protocol with no additional configuration.

Route name

Path

Method

Description

create-message

/v1/messages

POST

Creates a message for the given chat conversation using Anthropic's native message format.

Image generation

Protocol: Alibaba Cloud Model Studio

Route name

Path

Method

Description

dashscope-text-to-image-synthesis

/api/v1/services/aigc/text2image/image-synthesis

POST

Generates an image using text-to-image synthesis.

dashscope-image-to-image-synthesis

/api/v1/services/aigc/image2image/image-synthesis

POST

Generates an image using image-to-image synthesis.

dashscope-image-to-image-outpainting

/api/v1/services/aigc/image2image/out-painting

POST

Performs image-to-image outpainting.

dashscope-virtual-model-generation

/api/v1/services/aigc/virtualmodel/generation

POST

Generates a virtual model image.

dashscope-background-generation

/api/v1/services/aigc/background-generation/generation

POST

Generates a background image.

tasks

/api/v1/tasks

GET/POST/PUT/PATCH/DELETE

Manages asynchronous tasks.

Protocol: OpenAI compatibility

Route name

Path

Method

Description

openai-image-generation

/api/v1/images/generations

POST

Generates an image.

openai-image-edit

/api/v1/images/edits

POST

Edits an image.

openai-image-variation

/api/v1/images/variations

POST

Creates a variation of a given image.

Protocol: ComfyUI

Route name

Path

Method

Description

comfyui-websocket

/ws

GET

Provides a WebSocket endpoint for real-time communication with the server.

comfyui-embeddings

/embeddings

GET

Lists available embeddings.

comfyui-extensions

/extensions

GET

Lists extensions that register a web directory.

comfyui-features

/features

GET

Gets server features and capabilities.

comfyui-models

/models

GET

Lists available model types.

comfyui-models-folder

/models/{folder}

GET

Gets models from a specific folder.

comfyui-workflow-templates

/workflow_templates

GET

Gets a map of custom node modules and their associated template workflows.

comfyui-upload-image

/upload/image

POST

Uploads an image.

comfyui-upload-mask

/upload/mask

POST

Uploads a mask.

comfyui-view

/view

GET

Views an image with multiple options.

comfyui-view-metadata

/view_metadata/

GET

Gets metadata for a model.

comfyui-system-stats

/system_stats

GET

Gets system information, such as Python version, devices, and VRAM.

comfyui-prompt

/prompt

GET/POST

Gets the current queue status and execution information, or submits a prompt.

comfyui-object-info

/object_info

GET

Gets details of all node types.

comfyui-object-info-class

/object_info/{node_class}

GET

Gets details for a specific node type.

comfyui-history

/history

GET/POST

Gets the queue history.

comfyui-history-prompt-id

/history/{prompt_id}

GET

Gets the queue history for a specific prompt.

comfyui-queue

/queue

GET/POST

Gets the queue status or manages queue operations.

comfyui-interrupt

/interrupt

POST

Stops the current workflow execution.

comfyui-free

/free

POST

Frees memory by unloading specified models.

comfyui-userdata

/userdata

GET

Lists user data files in a specified directory.

comfyui-userdata-v2

/v2/userdata

GET

Lists files and directories in a structured format.

comfyui-userdata-file

/userdata/{file}

GET/POST/DELETE

Gets, uploads, updates, or deletes a specific user data file.

comfyui-userdata-file-move

/userdata/{file}/move/{dest}

POST

Moves or renames a user data file.

comfyui-users

/users

GET/POST

Gets user information or creates a new user.

Video generation

Alibaba Cloud Model Studio protocol

Route name

Path

Method

Description

dashscope-video-generation-synthesis

/api/v1/services/aigc/video-generation/video-synthesis

POST

Generates a video.

dashscope-image-to-video-synthesis

/api/v1/services/aigc/image2video/video-synthesis

POST

Generates a video from an image.

tasks

/api/v1/tasks

GET/POST/PUT/PATCH/DELETE

Manages asynchronous tasks.

Speech synthesis

Alibaba Cloud Model Studio

Route name

Path

Method

Description

dashscope-text-to-audio-synthesis

/api-ws/v1/inference

GET

Synthesizes audio from text.

OpenAI compatible (OpenAI/v1)

Route name

Path

Method

Description

openai-audio-speech

/api/v1/audio/speech

POST

Synthesizes audio from text.

Embedding

Protocol: OpenAI compatible (OpenAI/v1)

Route name

Path

Method

Description

create-embedding

/v1/embeddings

POST

Generates an embedding vector representing the input text.

Text reranking (rerank)

Protocol: Alibaba Cloud Model Studio text reranking

Route name

Path

Method

Description

rerank

/api/v1/services/rerank/text-rerank/text-rerank

POST

Reranks the given documents based on query relevance.

Protocol: vLLM

Route name

Path

Method

Description

rerank

/v1/rerank

POST

Reranks the given documents based on query relevance.

Others

Protocol: OpenAI-compatible (OpenAI/v1)

Route name

Path

Method

Description

models

/v1/models

GET, POST, PUT, PATCH, DELETE

Manage models.

files

/v1/files

GET, POST, PUT, PATCH, DELETE

Manage files.

batches

/v1/batches

GET, POST, PUT, PATCH, DELETE

Manage batches.

fine-tuning

/v1/fine_tuning

GET, POST, PUT, PATCH, DELETE

Manage fine-tuning jobs.

Note

AI services from providers supporting the Anthropic protocol (Alibaba Cloud Model Studio, Claude, Moonshot AI, Zhipu AI) automatically support multiple protocols, including OpenAI-compatible and Anthropic. Select the appropriate protocol when creating a Model API.

Intelligent routing

Different LLMs excel in specific domains:

  • Code generation: The Qwen-Coder series excels in code understanding and generation.

  • Mathematical reasoning: The Qwen-Math series excels at solving complex mathematical problems.

  • Translation: The Qwen-MT series is optimized for multilingual translation.

  • Rapid response: The Qwen-Flash series offers ultra-low latency for time-sensitive scenarios.

  • Complex reasoning: Models such as Qwen-Max and DeepSeek-R1 have an edge in complex reasoning.

Manually routing requests to these models presents challenges:

  1. Fragmented user experience: Users must manually select models without guidance on which fits best.

  2. Inefficient resource utilization: High-cost models handle simple tasks that cheaper models could serve.

  3. High development complexity: Application-layer routing logic increases development and maintenance costs.

  4. No unified endpoint: Multiple model deployments lead to scattered APIs that are hard to manage.

The AI Gateway intelligent routing feature uses semantic analysis to automatically route requests to the best-fit model based on these intent classifications:

Intent code

Description

Scenarios

Coder

Code writing and debugging

Programming questions, code generation, bug fixes

Math

Mathematical computation and reasoning

Mathematical proofs, formula derivation, statistical analysis

Translation

Multilingual translation

Document translation, real-time translation, localization

Flash

Fast and simple responses

Simple Q&A, information lookups, everyday conversations

Complex

Complex reasoning

Deep analysis, complex decision-making, long-context understanding

Edit Model API

  1. Log on to the AI Gateway console and choose Instance. In the top menu bar, select a region, then click the target instance ID.

  2. In the navigation pane, click Model API, then click Edit in the Actions column of the target API. Modify the parameters in the Edit Model API panel. Parameter descriptions: Create a Model API.

  3. Click OK.

Debug Model API

Note

Currently, you can only debug text generation using the /v1/chat/completions endpoint.

  1. Log on to the AI Gateway console and choose Instance. In the top menu bar, select a region, then click the target instance ID.

  2. In the left navigation pane, select Model API and click Debug in the Actions column for the target API.

  3. In the Debug panel, select a domain name and model. If needed, enable the Streaming switch and configure parameters and custom parameters. On the Model Returned tab, enter your content and click Send.

    Parameters: system prompt (system instruction, up to 100 characters), max_tokens (0–8192, default: 1024), top_p (0–1, default: 0.95), and temperature (0–2, default: 1; adjust with caution as it significantly impacts results). The cURL command and raw output tabs are available on the right.

Delete a model API

  1. Log on to the AI Gateway console and choose Instance. In the top menu bar, select a region, then click the target instance ID.

  2. In the navigation pane, select Model API. In the row for the target API, click Delete in the Actions column. In the confirmation dialog box that appears, enter the API name and click Delete.