Create a gateway instance

This topic describes how to create an AI Gateway instance.

Procedure

  1. Log on to the AI Gateway console.

  2. In the navigation pane on the left, choose Instance. In the top menu bar, select a region.

  3. Click Create Instance. On the AI Gateway purchase page, configure the required settings, then click Buy Now.

    Configuration Item

    Description

    Product Type

    Supports Dedicated Instance (Pay-as-you-go), Dedicated Instance (Subscription), and Serverless (Pay-as-you-go). For billing details of these types, see Billing Overview.

    Region

    Select the region for your gateway.

    Important

    You cannot change the region after the resource is created.

    GatewayName

    Enter a custom name for your gateway. We recommend using environment names or environment plus business domain names, such as test or order-prod. The name can be up to 64 characters long.

    GatewaySpec

    Select node specifications based on your needs. For capacity details of different gateway specifications, see Gateway Types. Serverless instances do not have gateway specifications.

    Resource Group

    Select the default resource group or an existing one. To create a new resource group, click Create Resource Group.

    Note

    Use resource groups to categorize and manage resources under your Alibaba Cloud account. This lets you manage permissions, deploy resources, and monitor resources by group instead of handling each resource individually.

    Network Type

    Supports three access types: Internet, Private Network, and Internet + Private Network.

    • Internet: Accessing the gateway over the Internet incurs data transfer costs. Internet traffic is billed through Cloud Data Transfer (CDT) using Border Gateway Protocol (BGP) (multi-line) mode. For more information, see Internet Traffic.

    • Private Network: No data transfer costs apply.

    • Internet + Private Network:

      Accessing the gateway over the Internet incurs data transfer costs billed through CDT using BGP (multi-line) mode. Accessing over the private network incurs no data transfer costs.

    VPC

    Select the VPC where the gateway instance runs. To create a new VPC, go to the VPC Management Console.

    Note
    • The VPC of the gateway must match the VPC of your backend services.

    • When selecting a VPC, the system shows whether containers or Nacos clusters exist in it to help avoid incorrect selections.

    Zone Selection

    Select Automatic Allocation or Manual Selection.

    • Automatic Allocation: Select one vSwitch. The system automatically allocates two zones to deploy gateway nodes.

    • Manual Selection: Manually select the zones and vSwitches for deploying gateway nodes.

    vSwitch

    Select the vSwitch where the gateway instance runs. To create a new vSwitch, go to the VPC Management Console.

    Simple Log Service

    Select Use Simple Log Service (SLS) to enable SLS and activate log delivery for log analysis and dashboards. For more information, see Enable Gateway Log Delivery.

    Service-linked Role

    Automatically created. This role allows AI Gateway to access other Alibaba Cloud services.

  4. On the Confirm Order page, review your AI Gateway configuration and click Buy Now.

    Note

    Creating a gateway instance takes 1 to 5 minutes.

  5. Return to the AI Gateway Instance page. Verify that the gateway information is correct and the Status is Running. This indicates that the gateway was created successfully.

Advanced Features

When creating a gateway instance, you can enable log delivery to use log data for monitoring and analysis, and Gzip hardware acceleration to compress request and response payloads and reduce traffic. Gzip hardware acceleration can be enabled only during gateway creation; it cannot be enabled afterward. Log delivery has no such restriction.

Enable Gzip Hardware Acceleration

Gzip hardware acceleration uses dedicated hardware to compress and decompress data quickly. By offloading Gzip compression and decompression tasks from the CPU to specialized hardware, this feature significantly improves processing efficiency and reduces CPU load.

Note

Serverless instances do not support Gzip hardware acceleration.

Procedure

  1. On the AI Gateway purchase page, complete the following configurations before clicking Buy Now:

    • Region: Gzip hardware acceleration is available in Hangzhou, Beijing, Shanghai, Shenzhen, Ulanqab, China (Hong Kong), and Singapore.

      Some zones within supported regions may not support this feature. Refer to the product purchase page for the latest availability.
    • GatewaySpec: Select a specification of aigw.medium.x1 or higher.

    • Gzip Hardware Acceleration: Check to enable Gzip hardware acceleration.

    • Available Zone: Select a zone that supports Gzip hardware acceleration and choose a vSwitch.

  2. After the instance is created, click the instance ID or name. In the navigation pane on the left, click Parameters. In the Gateway Engine Parameters section, edit the EnableGzipHardwareAccelerate parameter.

    Note

    If you did not select Enable Gzip Hardware Acceleration during purchase, you cannot enable this setting later.

  3. After enabling this feature, make sure your clients can handle Gzip-compressed data. A client signals Gzip support by including Accept-Encoding: gzip in the request header.
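The client-side requirement in step 3 can be sketched as follows. This is a minimal illustration using only the Python standard library, not an official client: the function names and the URL passed to them are hypothetical, and the sketch assumes the gateway marks compressed responses with a Content-Encoding: gzip response header.

```python
import gzip
import urllib.request
from typing import Optional


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request that advertises Gzip support via Accept-Encoding."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Return the plain body, decompressing only when the response is Gzip-encoded."""
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return gzip.decompress(body)
    return body


def fetch(url: str) -> bytes:
    """Send the request and return the decoded response body."""
    with urllib.request.urlopen(build_request(url), timeout=10) as resp:
        return decode_body(resp.read(), resp.headers.get("Content-Encoding"))
```

Many HTTP client libraries send this header and decompress responses automatically, so explicit handling like this is mainly needed for low-level clients.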

Performance Reference

How much traffic can Gzip compression save compared to uncompressed traffic?

The compression ratio, defined as the compressed data size divided by the original data size, depends heavily on the data itself. A lower ratio means better compression. A higher ratio means less effective compression.

Generally, Gzip works best on data with repetitive patterns or structures, such as text containing letters, words, and punctuation, which yields lower compression ratios. Conversely, highly random or high-entropy data, such as images, videos, or already compressed files, has little internal redundancy, which leads to higher compression ratios and limited savings.

Because business characteristics differ, customers see varying compression ratios with Gzip. Based on statistics from core regions where Gzip is enabled, most instances achieve compression ratios between 10% and 50%. This means users typically save 50% to 90% of their traffic after enabling Gzip.
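To estimate the compression ratio for your own payloads, a small local measurement can help. The following sketch uses made-up sample data and software Gzip from the Python standard library, so the numbers it prints only illustrate the difference between repetitive and high-entropy data and are not gateway measurements.

```python
import gzip
import json
import os


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size; lower means better compression."""
    return len(gzip.compress(data)) / len(data)


# Repetitive, structured text: a JSON array of similar records (made-up data).
records = [{"id": i, "status": "ok", "region": "cn-hangzhou"} for i in range(1000)]
json_payload = json.dumps(records).encode("utf-8")

# High-entropy data: random bytes stand in for images or already compressed files.
random_payload = os.urandom(len(json_payload))

json_ratio = compression_ratio(json_payload)
random_ratio = compression_ratio(random_payload)

print(f"JSON ratio:   {json_ratio:.2%} (traffic saved: {1 - json_ratio:.2%})")
print(f"Random ratio: {random_ratio:.2%} (traffic saved: {1 - random_ratio:.2%})")
```

The traffic saved is one minus the compression ratio, so a ratio near or above 100% for random data means Gzip brings no benefit for that kind of payload.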

With Gzip already enabled, how much instance resources can hardware acceleration save?

Enabling Gzip hardware acceleration uses dedicated hardware for compression, reducing CPU usage. The stress test data below compares CPU consumption between a single-node instance with Gzip hardware acceleration and a four-node instance using software-based Gzip, both handling the same QPS.

In this test, the data being compressed is a JSON payload of approximately 120 KB:

| QPS | Hardware-accelerated Gzip / aigw.medium.x1 / Single-node CPU Usage | Software Gzip / aigw.medium.x1 / 4-node CPU Usage |
| --- | --- | --- |
| 2000 | 9% | 11% |
| 5000 | 26% | 28% |
| 10000 | 56% | 56% |
| 13000 | 69% | 72% |

The table shows that CPU usage for hardware-accelerated Gzip on a single node is nearly equal to that of software Gzip on four nodes. This means you can handle the same workload with one node instead of four, saving approximately 75% of instance resources.
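The arithmetic behind the 75% figure can be checked with a short calculation. The CPU values below are copied from the stress test table; the variable names are illustrative only.

```python
# CPU usage (%) per QPS level: (hardware Gzip on 1 node, software Gzip on 4 nodes)
results = {2000: (9, 11), 5000: (26, 28), 10000: (56, 56), 13000: (69, 72)}

hardware_nodes, software_nodes = 1, 4

# Fraction of nodes no longer needed to handle the same workload.
node_saving = 1 - hardware_nodes / software_nodes

# Largest gap in CPU usage between the two setups across all QPS levels.
max_cpu_gap = max(abs(hw - sw) for hw, sw in results.values())

print(f"Node saving: {node_saving:.0%}")
print(f"Largest CPU usage gap: {max_cpu_gap} percentage points")
```

Because the CPU usage of the two setups never differs by more than a few percentage points, comparing node counts alone (one versus four) is a reasonable basis for the saving estimate.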

Enable Gateway Log Delivery

To collect, store, and analyze operational logs from your gateway, enable Simple Log Service (SLS) during gateway instance creation for log analysis and dashboard monitoring.

While creating the gateway instance, select Use Simple Log Service (SLS). The system will activate SLS and enable gateway log delivery.

After enabling log delivery, go to Observability > Log Center to view gateway logs.

Log Field Descriptions

| Field | Type | Description |
| --- | --- | --- |
| __time__ | long | The time when the log was generated. |
| cluster_id | string | The ID of the AI Gateway instance. |
| ai_log | json | A JSON object that contains log fields for Model API, Agent API, and MCP API. This field is empty for other API types. Its sub-fields are listed after this table. |
| authority | string | The value of the Host header in the request. |
| bytes_received | long | The size of the request body in bytes, excluding the header. |
| bytes_sent | long | The size of the response body in bytes, excluding the header. |
| downstream_local_address | string | The address of the gateway pod. |
| downstream_remote_address | string | The address of the client that connects to the gateway. |
| duration | long | The total request processing time in milliseconds, measured from when the gateway receives the first byte from the client until it sends the last byte of the response. |
| method | string | The HTTP method. |
| path | string | The path in the HTTP request. |
| protocol | string | The HTTP protocol version. |
| request_duration | long | The time in milliseconds from when the gateway receives the first byte of the request from the client until it receives the last byte. |
| request_id | string | A unique ID that the gateway generates for each request. This ID is included in the x-request-id header. You can use this field to log and troubleshoot requests. |
| requested_server_name | string | The server name used for the SSL connection. |
| response_code_details | string | Additional context for the response code. For example, via_upstream indicates the backend service returned the response code, and route_not_found indicates that the gateway could not find a matching route. |
| response_tx_duration | long | The time in milliseconds from when the gateway receives the first byte from the upstream service to when it sends the last byte to the client. |
| route_name | string | The route name. |
| start_time | string | The start time of the request. The time is in UTC. |
| trace_id | string | The trace ID. |
| upstream_cluster | string | The upstream cluster. |
| upstream_host | string | The IP address of the upstream host. |
| upstream_local_address | string | The local address used to connect to the upstream service. |
| upstream_service_time | long | The request processing time in milliseconds for the upstream service. This duration includes network latency and the service's own processing time. |
| upstream_transport_failure_reason | string | The reason for the upstream connection failure. |
| user_agent | string | The value of the User-Agent header in the request. |
| x_forwarded_for | string | The value of the x-forwarded-for header, which typically contains the client's real IP address. |

Sub-fields of ai_log:

  • api: The name of the AI API.

  • cache_status: Indicates whether a request hit the cache when content caching is enabled for a Model API.

  • consumer: The identity of the consumer. This field is populated when consumer authentication is enabled.

  • fallback_from: The route from which the request fell back. This field is populated when a fallback policy is enabled for a Model API.

  • input_token: The number of input tokens in the LLM request.

  • llm_first_token_duration: The time to first token (TTFT) for the LLM request.

  • llm_service_duration: The end-to-end response time for the LLM request.

  • model: The name of the model used in the LLM request.

  • output_token: The number of output tokens in the LLM response.

  • response_type: The response type of the LLM request, such as streaming or non-streaming.

  • safecheck_status: The Content Moderation result for the LLM request.

  • token_ratelimit_status: Indicates whether the request was blocked by token-based rate limiting.
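As an illustration of how these fields can be combined during troubleshooting, the sketch below derives a few metrics from a single log record. The record is hypothetical: the field names come from the descriptions above, but every value is made up, and the sketch assumes ai_log arrives either as a JSON string or as an already parsed object.

```python
import json

# Hypothetical log record; values are invented for illustration.
record = {
    "request_id": "example-request-id",
    "duration": 1250,
    "upstream_service_time": 1200,
    "x_forwarded_for": "203.0.113.7, 10.0.0.12",
    "response_code_details": "via_upstream",
    "ai_log": json.dumps({
        "model": "example-model",
        "input_token": 120,
        "output_token": 480,
    }),
}


def summarize(rec: dict) -> dict:
    """Derive a few troubleshooting metrics from one gateway log record."""
    # ai_log is empty for non-AI APIs; accept a JSON string or a parsed dict.
    raw = rec.get("ai_log")
    ai = json.loads(raw) if isinstance(raw, str) and raw else (raw or {})
    # By convention, the first x-forwarded-for entry is the original client.
    client_ip = rec.get("x_forwarded_for", "").split(",")[0].strip() or None
    return {
        # Time not attributed to the upstream service (client I/O plus gateway work).
        "non_upstream_ms": int(rec["duration"]) - int(rec["upstream_service_time"]),
        "total_tokens": int(ai.get("input_token", 0)) + int(ai.get("output_token", 0)),
        "client_ip": client_ip,
        "from_backend": rec.get("response_code_details") == "via_upstream",
    }


summary = summarize(record)
print(summary)
```

A large non_upstream_ms value suggests that time is being spent on the client connection or in the gateway rather than in the backend service, which narrows down where to investigate.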