Knative FAQ - Container Service for Kubernetes (ACK) - Alibaba Cloud Help Center

This topic provides answers to frequently asked questions (FAQs) about using Knative in Container Service for Kubernetes (ACK) clusters.

What are the differences between Alibaba Cloud Knative and open-source Knative?
Which Ingress should I choose when I install Knative?
What permissions are required to use Knative with a RAM user or role?
How long does it take for a pod in Knative to scale to zero?
How do I use GPU resources in Knative?
How do I use GPU sharing in Knative?
How can I reduce cold start latency when Knative scales instances to zero?
Is the Activator component of ACK Knative billed?
How do I configure the listening port for a Knative service?

Differences between Alibaba Cloud and open-source Knative

Alibaba Cloud Knative is compatible with open-source Knative and provides a range of enhanced capabilities for O&M, usability, elasticity, gateways, event-driven architecture, and monitoring and alerting. For more information, see Comparison between Alibaba Cloud Knative and open-source Knative.

Choosing an Ingress for Knative

Alibaba Cloud Knative supports three types of Ingresses: Application Load Balancer (ALB), Alibaba Cloud Service Mesh (ASM), and Kourier. ALB focuses on application-layer load balancing scenarios. ASM delivers service mesh (Istio) capabilities. Choose Kourier if you only need basic Ingress features. For more information, see Choose an Ingress for Knative.

Required permissions for RAM users and roles

The RAM user or role needs permissions on all namespaces in the cluster. To grant these permissions, perform the following steps.

Log on to the ACK console. In the left navigation pane, click Authorizations.
Click the RAM Users tab. In the list of RAM users, find the target RAM user and click Modify Permissions in the Actions column.
In the Add Permissions section, select the target cluster, select All namespaces for Namespace, and then complete the authorization.

Pod scale-to-zero duration

The time it takes for a pod to scale to zero depends on the following three parameters:

stable-window: A time window during which metrics are observed and evaluated before a scale-in operation is performed. The system does not take immediate action during this window.
scale-to-zero-grace-period: A grace period before scaling to zero. During this period, the system does not immediately stop or delete the last pod even if no new requests are received. This helps handle sudden traffic bursts.
scale-to-zero-pod-retention-period: The retention period for the last pod before it scales to zero. This allows the system to quickly respond to sudden traffic bursts without a cold start.

A pod scales to zero only if the following three conditions are met:

No requests are received within the stable-window.
The retention period specified by scale-to-zero-pod-retention-period has elapsed.
The service has been in proxy mode, with requests routed through the Activator, for longer than the duration specified by scale-to-zero-grace-period. Only then does the last pod start to scale down.

After requests stop, the last pod is retained for at most stable-window + max(scale-to-zero-grace-period, scale-to-zero-pod-retention-period). To enforce a specific retention period for the last pod, set the scale-to-zero-pod-retention-period parameter.
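As a worked example of the upper bound above, the following minimal sketch computes it in seconds. The function name is hypothetical, and the sample values assume the open-source Knative defaults (stable-window 60s, scale-to-zero-grace-period 30s, scale-to-zero-pod-retention-period 0s); your cluster may be configured differently.

```python
def max_retention_seconds(stable_window, grace_period, pod_retention_period):
    """Upper bound, in seconds, on how long the last pod is kept after requests stop.

    Implements: stable-window + max(scale-to-zero-grace-period,
                                    scale-to-zero-pod-retention-period)
    """
    return stable_window + max(grace_period, pod_retention_period)


# Assumed open-source defaults: 60s window, 30s grace period, 0s retention.
default_bound = max_retention_seconds(60, 30, 0)

# Raising the retention period above the grace period makes it the dominant term.
longer_bound = max_retention_seconds(60, 30, 120)

print(default_bound, longer_bound)
```

With the assumed defaults, the grace period dominates and the bound is 90 seconds. With a 120-second retention period, the bound becomes 180 seconds.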

Using GPU resources in Knative

In the Knative Service configuration, add the k8s.aliyun.com/eci-use-specs annotation under spec.template.metadata.annotations to specify a GPU-accelerated instance type. Then declare the required GPU resources by using the nvidia.com/gpu field under spec.template.spec.containers[].resources.limits.

For more information, see Use GPU resources.
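The following sketch shows where the annotation and the GPU limit sit in a Knative Service manifest. It builds the manifest as a Python dictionary and prints it as JSON, which kubectl apply -f also accepts. The function name, service name, image, and instance type are placeholders chosen for illustration, not values from this topic. Replace them with your own.

```python
import json


def gpu_knative_service(name, image, eci_spec, gpu_count=1):
    """Build a Knative Service manifest that requests GPUs on a given instance type."""
    return {
        "apiVersion": "serving.knative.dev/v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "metadata": {
                    # GPU-accelerated instance type for the pod.
                    "annotations": {"k8s.aliyun.com/eci-use-specs": eci_spec}
                },
                "spec": {
                    "containers": [
                        {
                            "image": image,
                            # Number of GPUs the container requires.
                            "resources": {"limits": {"nvidia.com/gpu": str(gpu_count)}},
                        }
                    ]
                },
            }
        },
    }


# Placeholder values (assumptions): substitute your own image and instance type.
manifest = gpu_knative_service(
    "gpu-demo", "registry.example.com/demo/gpu-app:latest", "ecs.gn5i-c4g1.xlarge"
)
print(json.dumps(manifest, indent=2))
```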

Using GPU sharing in Knative

First, enable GPU sharing scheduling for the nodes as described in Run a GPU sharing scheduling example. Then, in the Knative Service, set the GPU memory limit by using the aliyun.com/gpu-mem field under the container's resources.limits. For more information, see Enable GPU sharing scheduling.
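A minimal sketch of the resulting container configuration is shown below, again as a Python dictionary printed as JSON. The function name, service name, and image are hypothetical, and the value 3 is only an example. The unit of aliyun.com/gpu-mem is assumed here to be GiB, so confirm it against your cluster's GPU sharing configuration.

```python
import json


def shared_gpu_knative_service(name, image, gpu_mem):
    """Build a Knative Service manifest that requests a slice of GPU memory."""
    return {
        "apiVersion": "serving.knative.dev/v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "image": image,
                            # GPU memory requested from a shared GPU (unit assumed: GiB).
                            "resources": {"limits": {"aliyun.com/gpu-mem": str(gpu_mem)}},
                        }
                    ]
                }
            }
        },
    }


# Placeholder values (assumptions) for illustration only.
shared_manifest = shared_gpu_knative_service(
    "gpu-share-demo", "registry.example.com/demo/gpu-app:latest", 3
)
print(json.dumps(shared_manifest, indent=2))
```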

Reducing cold start latency

By default, open-source Knative scales the number of application instances to zero when there are no incoming requests. This helps reduce the cost of running idle instances. When a new request arrives, a new instance is assigned to the application. This process involves IaaS resource allocation and scheduling, application image pulling, and application startup, causing a long latency known as a cold start.

If your business is sensitive to cold start latency, you can use one of the following two solutions:

Configure a reserved instance: Keep a low-cost, low-specification burstable performance instance running to balance cost and cold start latency. When the first request arrives, the reserved instance serves the request while a default-specification instance is scaled up. After the default instance is ready, all new requests are routed to it. The reserved instance is automatically released after it processes its requests. For more information, see Configure a reserved instance.
Use the image cache feature of Elastic Container Instance (ECI): You can create a cache snapshot for the image in advance, and then create ECI instances (pods) based on the snapshot to avoid or reduce image layer downloads. This accelerates instance creation. For more information, see Use image acceleration to speed up instance creation.

Activator component billing

Yes. The Activator component is a data plane component that runs as pods and consumes your instance resources.

Configuring the listening port

The listening port of your application must match the containerPort in the Knative Service, which is 8080 by default. To configure a custom listening port, see Configure a custom listening port.
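On the application side, one way to keep the two values in sync is to read the port from the PORT environment variable, which Knative sets to the containerPort of the Service, and to fall back to 8080. The sketch below uses only the Python standard library. The handler and function names are hypothetical.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def listen_port(env=os.environ):
    """Return the port to listen on: the PORT variable if set, else 8080."""
    return int(env.get("PORT", "8080"))


class HelloHandler(BaseHTTPRequestHandler):
    """Minimal handler that answers every GET request with a short text body."""

    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve():
    """Start the server on all interfaces; blocks until the process is stopped."""
    HTTPServer(("0.0.0.0", listen_port()), HelloHandler).serve_forever()
```

Call serve() from the container entry point. If the Service declares a custom containerPort, the application picks it up through PORT without a code change.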
