Create an AI Gateway instance

Create an AI Gateway instance by selecting the product type, region, network access type, and VPC.

Procedure

Log on to the AI Gateway console.
In the navigation pane on the left, choose Instance. In the top menu bar, select a region.

Click Create Instance. On the AI Gateway purchase page, configure the required settings, then click Buy Now.

Configuration Item	Description
Product Type	Supports Dedicated Instance (Pay-as-you-go), Dedicated Instance (Subscription), and Serverless (Pay-as-you-go). For billing details, see Billing Overview.
Region	Select the region for your gateway. Important: You cannot change the region after the resource is created.
GatewayName	Enter a custom name for your gateway, such as an environment or business domain name like test or order-prod. Maximum length: 64 characters.
GatewaySpec	Select the node specifications for your gateway. For capacity details, see Gateway Types. Serverless instances do not require gateway specifications.
Resource Group	Select the default resource group or an existing one. To create a new resource group, click Create Resource Group. Note: Resource groups let you categorize resources under your Alibaba Cloud account and manage permissions, deployments, and monitoring by group.
Network Type	Supports three access types: Internet, Private Network, and Internet + Private Network. Internet: Accessing the gateway over the Internet incurs data transfer costs. Internet traffic is billed through Cloud Data Transfer (CDT) in Border Gateway Protocol (BGP) (multi-line) mode. For more information, see Internet Traffic. Private Network: No data transfer costs apply. Internet + Private Network: Access over the Internet incurs data transfer costs, billed through CDT in BGP (multi-line) mode. Access over the private network incurs no data transfer costs.
VPC	Select the VPC where the gateway instance runs. To create a new VPC, go to the VPC Management Console. Note: The VPC of the gateway must match the VPC of your backend services. When you select a VPC, the system shows whether containers or Nacos clusters exist in it, which helps you avoid an incorrect selection.
Zone Selection	Select Automatic Allocation or Manual Selection. Automatic Allocation: Select one vSwitch. The system automatically allocates two zones to deploy gateway nodes. Manual Selection: Manually select the zones and vSwitches for deploying gateway nodes.
vSwitch	Select the vSwitch where the gateway instance runs. To create a new vSwitch, go to the VPC Management Console.
Simple Log Service	Select Use Simple Log Service (SLS) to enable log delivery for log analysis and dashboards. For more information, see Enable Gateway Log Delivery.
Service-linked Role	Automatically created. This role allows AI Gateway to access other Alibaba Cloud services.
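
The constraints in the table above can be checked before you open the purchase page. The following is a minimal sketch, not an official API: the `GatewayPlan` class, the `check_plan` and `incurs_data_transfer_cost` functions, and the placeholder region and VPC values are all hypothetical names introduced for illustration. It only encodes rules stated in the table (the 64-character name limit, the three network types, and the fact that Serverless instances need no gateway specification).

```python
from dataclasses import dataclass
from typing import List, Optional

# Labels copied from the table above; this is an illustration, not an official API.
NETWORK_TYPES = {"Internet", "Private Network", "Internet + Private Network"}


@dataclass
class GatewayPlan:
    """Hypothetical container for the choices made on the purchase page."""
    product_type: str   # e.g. "Serverless (Pay-as-you-go)"
    region: str         # cannot be changed after creation
    gateway_name: str
    network_type: str
    vpc_id: str
    gateway_spec: Optional[str] = None  # not required for Serverless


def check_plan(plan: GatewayPlan) -> List[str]:
    """Return a list of problems; an empty list means none were found."""
    problems = []
    if not plan.gateway_name:
        problems.append("gateway name is empty")
    elif len(plan.gateway_name) > 64:
        problems.append("gateway name exceeds 64 characters")
    if plan.network_type not in NETWORK_TYPES:
        problems.append(f"unknown network type: {plan.network_type}")
    serverless = plan.product_type.startswith("Serverless")
    if not serverless and not plan.gateway_spec:
        problems.append("dedicated instances need a gateway specification")
    return problems


def incurs_data_transfer_cost(network_type: str) -> bool:
    """Internet access is billed through CDT; private-network access is not."""
    return "Internet" in network_type
```

For example, `check_plan(GatewayPlan("Serverless (Pay-as-you-go)", "region-placeholder", "order-prod", "Private Network", "vpc-placeholder"))` returns an empty list, and `incurs_data_transfer_cost("Private Network")` returns `False`.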

On the Confirm Order page, review your AI Gateway configuration and click Buy Now.

Note
Creating a gateway instance takes 1 to 5 minutes.
Return to the AI Gateway Instance page. Verify that the gateway information is correct and the Status is Running. This indicates that the gateway was created successfully.
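
If you automate this check instead of watching the console, the wait can be sketched as a simple polling loop. This is a minimal sketch under stated assumptions: `get_status` is a hypothetical callable that you supply (this page does not describe a status API), and the 10-minute timeout and 15-second interval are arbitrary choices that leave headroom over the 1 to 5 minutes mentioned above.

```python
import time
from typing import Callable


def wait_until_running(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,
    interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_status() until it returns "Running" or the timeout elapses.

    get_status is a hypothetical, caller-supplied function that returns the
    Status value shown on the Instance page.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_status() == "Running":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real delays.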

Advanced Features

Two optional features are available: log delivery for monitoring, and Gzip hardware acceleration for compressing request and response payloads. Gzip hardware acceleration can be enabled only during gateway creation and cannot be added afterward. Log delivery has no such restriction.

Enable Gzip Hardware Acceleration

Gzip hardware acceleration offloads compression and decompression from the CPU to dedicated hardware, improving processing efficiency and reducing CPU load.

Note

Serverless instances do not support Gzip hardware acceleration.

Procedure

On the AI Gateway purchase page, complete the following configurations before clicking Buy Now:
- Region: Gzip hardware acceleration is available in Hangzhou, Beijing, Shanghai, Shenzhen, Ulanqab, China (Hong Kong), and Singapore.
  
  Some zones within supported regions may not support this feature. Refer to the product purchase page for the latest availability.
- GatewaySpec: Select a specification of aigw.medium.x1 or higher.
- Gzip Hardware Acceleration: Check to enable Gzip hardware acceleration.
- Available Zone: Select a zone that supports Gzip hardware acceleration and choose a vSwitch.
After the instance is created, click the instance ID or name. In the navigation pane on the left, click Parameters. In the Gateway Engine Parameters section, edit the EnableGzipHardwareAccelerate parameter.

Note
If you did not select Enable Gzip Hardware Acceleration during purchase, you cannot enable this setting later.
After you enable this feature, make sure that your clients can handle Gzip-compressed data. Clients that support Gzip must include Accept-Encoding: gzip in the request headers.
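
The client-side requirement can be illustrated with the Python standard library, which does not decompress responses automatically. This is a minimal sketch: the function names are illustrative, any URL you pass is your own gateway endpoint, and the assumption that the response carries a `Content-Encoding: gzip` header when compressed follows standard HTTP behavior rather than anything stated on this page.

```python
import gzip
import urllib.request


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request that advertises Gzip support to the gateway."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})


def decode_body(body: bytes, content_encoding: str) -> bytes:
    """Decompress the body only when the response says it is Gzip-encoded."""
    if content_encoding.strip().lower() == "gzip":
        return gzip.decompress(body)
    return body


def fetch(url: str) -> bytes:
    """Send the request and return the decoded response body."""
    with urllib.request.urlopen(build_request(url), timeout=10) as resp:
        encoding = resp.headers.get("Content-Encoding", "")
        return decode_body(resp.read(), encoding)
```

Many higher-level HTTP clients send this header and decompress responses on their own, so check your client's documentation before adding manual handling.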

Performance Reference

How much traffic can Gzip compression save compared to uncompressed traffic?

The compression ratio (compressed data size divided by original data size) depends heavily on the data itself. A lower ratio means better compression. A higher ratio means less effective compression.

Gzip works best on data with repetitive patterns, such as text, where it yields low compression ratios. Highly random or high-entropy data, such as images, videos, or already compressed files, compresses poorly.

Compression ratios vary by workload. Based on statistics from core regions, most instances achieve ratios between 10% and 50%, which corresponds to saving 50% to 90% of traffic.
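
To estimate the ratio for your own payloads, you can measure it locally with software Gzip. The sketch below uses Python's standard `gzip` module on a made-up repetitive JSON sample and on random bytes. The sample data and function names are illustrative assumptions, and the results say nothing about the ratios the gateway achieves for a real workload.

```python
import gzip
import json
import os


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size; lower means better compression."""
    return len(gzip.compress(data)) / len(data)


def traffic_saved(ratio: float) -> float:
    """Fraction of traffic saved for a given compression ratio."""
    return 1.0 - ratio


# Repetitive JSON text (made-up sample) versus high-entropy random bytes.
text = json.dumps(
    [{"id": i, "status": "ok", "region": "example"} for i in range(2000)]
).encode()
noise = os.urandom(len(text))

print(f"JSON text:    ratio={compression_ratio(text):.2f}")
print(f"random bytes: ratio={compression_ratio(noise):.2f}")
print(f"a ratio of 0.30 saves {traffic_saved(0.30):.0%} of traffic")
```

Replacing `text` with a captured response body from your own service gives a more realistic estimate than the synthetic sample.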

With Gzip already enabled, how many instance resources can hardware acceleration save?

Gzip hardware acceleration reduces CPU usage by offloading compression to dedicated hardware. The following stress test compares CPU consumption between a single-node instance with hardware-accelerated Gzip and a four-node instance with software Gzip at the same QPS.

In this test, the data being compressed is a JSON payload of approximately 120 KB:

QPS	Hardware-accelerated Gzip / aigw.medium.x1 / Single-node CPU Usage	Software Gzip / aigw.medium.x1 / 4-node CPU Usage
2000	9%	11%
5000	26%	28%
10000	56%	56%
13000	69%	72%

At each QPS level, CPU usage for hardware-accelerated Gzip on a single node is nearly equal to that of software Gzip on four nodes. One node therefore handles the load of four, saving approximately 75% of instance resources.

Enable Gateway Log Delivery

Enable Simple Log Service (SLS) during gateway creation to collect and analyze operational logs with dashboards.

During gateway creation, select Use Simple Log Service (SLS) to activate log delivery.

After enabling log delivery, go to Observability > Log Center to view gateway logs.

Log Field Descriptions

Field	Type	Description
__time__	long	The time when the log was generated.
cluster_id	string	The ID of the AI Gateway instance.
ai_log	json	A JSON object that contains log fields for Model API, Agent API, and MCP API. This field is empty for other API types. Subfields: `api`: the name of the AI API; `cache_status`: whether the request hit the cache when content caching is enabled for a Model API; `consumer`: the identity of the consumer, populated when consumer authentication is enabled; `fallback_from`: the route from which the request fell back, populated when a fallback policy is enabled for a Model API; `input_token`: the number of input tokens in the LLM request; `llm_first_token_duration`: the time to first token (TTFT) for the LLM request; `llm_service_duration`: the end-to-end response time for the LLM request; `model`: the name of the model used in the LLM request; `output_token`: the number of output tokens in the LLM response; `response_type`: the response type of the LLM request, such as streaming or non-streaming; `safecheck_status`: the Content Moderation result for the LLM request; `token_ratelimit_status`: whether the request was blocked by token-based rate limiting.
authority	string	The value of the Host header in the request.
bytes_received	long	The size of the request body in bytes, excluding the header.
bytes_sent	long	The size of the response body in bytes, excluding the header.
downstream_local_address	string	The address of the gateway pod.
downstream_remote_address	string	The address of the client that connects to the gateway.
duration	long	The total request processing time in milliseconds, measured from when the gateway receives the first byte from the client until it sends the last byte of the response.
method	string	The HTTP method.
path	string	The path in the HTTP request.
protocol	string	The HTTP protocol version.
request_duration	long	The time in milliseconds from when the gateway receives the first byte of the request from the client until it receives the last byte.
request_id	string	A unique ID that the gateway generates for each request. This ID is included in the `x-request-id` header. You can use this field to log and troubleshoot requests.
requested_server_name	string	The server name used for the SSL connection.
response_code_details	string	Additional context for the response code. For example, `via_upstream` indicates the backend service returned the response code, and `route_not_found` indicates that the gateway could not find a matching route.
response_tx_duration	long	The time in milliseconds from when the gateway receives the first byte from the upstream service to when it sends the last byte to the client.
route_name	string	The route name.
start_time	string	The start time of the request. The time is in UTC.
trace_id	string	The trace ID.
upstream_cluster	string	The upstream cluster.
upstream_host	string	The IP address of the upstream host.
upstream_local_address	string	The local address used to connect to the upstream service.
upstream_service_time	long	The request processing time in milliseconds for the upstream service. This duration includes network latency and the service's own processing time.
upstream_transport_failure_reason	string	The reason for the upstream connection failure.
user_agent	string	The value of the User-Agent header in the request.
x_forwarded_for	string	The value of the `x-forwarded-for` header, which typically contains the client's real IP address.
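
As an illustration of how these fields combine, the following sketch summarizes one log record after you export it as a dictionary. It is a sketch under assumptions: the `summarize` function and the sample values are made up, the record is assumed to carry numeric fields as numbers or numeric strings, and `ai_log` is assumed to arrive either as a JSON string or as an already parsed object. The difference between `duration` and `upstream_service_time` is only a rough estimate of the time not attributed to the upstream service.

```python
import json
from typing import Any, Dict


def summarize(record: Dict[str, Any]) -> Dict[str, Any]:
    """Derive a few troubleshooting values from one gateway log record."""
    raw_ai = record.get("ai_log") or "{}"  # empty for non-AI API types
    ai = json.loads(raw_ai) if isinstance(raw_ai, str) else raw_ai
    duration = int(record["duration"])               # milliseconds
    upstream = int(record["upstream_service_time"])  # milliseconds
    return {
        "request_id": record.get("request_id"),
        # Rough estimate of time spent outside the upstream service.
        "non_upstream_ms": duration - upstream,
        "model": ai.get("model"),
        "total_tokens": int(ai.get("input_token", 0)) + int(ai.get("output_token", 0)),
        # via_upstream means the backend service returned the response code.
        "from_upstream": record.get("response_code_details") == "via_upstream",
    }


# Made-up record for illustration only; not real gateway output.
sample = {
    "request_id": "req-example",
    "duration": "850",
    "upstream_service_time": "800",
    "response_code_details": "via_upstream",
    "ai_log": json.dumps(
        {"model": "example-model", "input_token": 120, "output_token": 480}
    ),
}
print(summarize(sample))
```

The sketch does not handle records in which `duration` or `upstream_service_time` is missing or non-numeric, which may occur for requests that never reach an upstream service.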