ACK release notes for 2024


This document covers the 2024 feature updates and technical changes for Alibaba Cloud Container Service for Kubernetes (ACK) and its sub-products.

Note: For the full release notes history, see Release notes.

December 2024

| Product | Feature | Description | Region | References |
| --- | --- | --- | --- | --- |
| Container Service for Kubernetes | OCI artifact signing and signature verification based on Notation and Ratify | The notation-alibabacloud-secret-manager component signs Open Container Initiative (OCI) artifacts stored in Container Registry using keys managed by Key Management Service (KMS). Install Ratify in your cluster to verify image signatures and block images with invalid signatures. | All regions | Use Notation and Ratify for OCI artifact signing and signature verification |
| | Storage monitoring | Monitor storage resources across your cluster, nodes, pods, and externally mounted volumes using Managed Service for Prometheus. After enabling Managed Service for Prometheus, out-of-the-box dashboards display real-time storage usage. | All regions | View storage monitoring information |
| | Workload stability and performance analysis in cost insights | Cost insights now identifies stability, performance, and cost risks in your workloads. It sorts pods by resource utilization and provides detailed resource configuration views for pods with Burstable and BestEffort Quality of Service (QoS) classes. | All regions | Use cost insights to identify risks for cluster workloads |
| | Multi-dimensional cost aggregation and idle cost policies in the cost API | The cost API supports new parameters for filtering and aggregating cost data by pod label or node name. Customize idle cost allocation policies and dimensions to manage and optimize costs more flexibly. | All regions | Call the Cost V2 API |
| | GPU fault alerting and solutions | ACK provides end-to-end GPU fault management: monitoring, diagnostics, alerting, and recovery mechanisms to resolve GPU faults in ACK clusters. | All regions | Configure GPU fault alerting and solutions |
| | Batch task orchestration with Argo Workflows | Argo Workflows is a Kubernetes-native workflow engine for orchestrating concurrent jobs using YAML or Python. It supports CI/CD pipelines, data processing, and machine learning workloads. Install the Argo Workflows component and use the Argo CLI or console to create and manage workflows. | All regions | Enable batch task orchestration |
| ACK One | Geo-disaster recovery based on ALB multi-cluster gateways | ACK One supports geo-disaster recovery using Application Load Balancer (ALB) multi-cluster gateways to protect against region-level disasters such as floods and earthquakes. Note that this may increase response latency, resource costs, and maintenance costs. | All regions | Use ALB multi-cluster gateways of ACK One to implement geo-disaster recovery |
| ACK Edge | Virtual nodes | ACK Edge clusters now support virtual nodes. Schedule pods directly to elastic container instances that act as virtual nodes, with no need to reserve or maintain node pools. This improves elasticity and reduces resource costs compared to pre-provisioning Elastic Compute Service (ECS) instances. | All regions | |
| | P2P acceleration | ACK Edge clusters support P2P acceleration to speed up image pulls and reduce application deployment time. | All regions | Install a P2P acceleration agent in an ACK cluster |
| | Kubernetes 1.30 support | ACK Edge clusters now support Kubernetes 1.30. | All regions | Release notes for ACK Edge of Kubernetes 1.30 |
| ACK Lingjun | Image acceleration | The aliyun-acr-acceleration-suite component enables on-demand image loading in Lingjun clusters. It automatically converts source images to accelerated images and decompresses data on demand, so pods start without downloading or decompressing the full image. | All regions | aliyun-acr-acceleration-suite |
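
The "Batch task orchestration with Argo Workflows" entry says workflows can be defined declaratively. As a rough illustration only, the sketch below builds a single-step Argo `Workflow` manifest as a Python dict and prints it as JSON, which Kubernetes tooling accepts in place of YAML. The `generateName` prefix, template name, container image, and message are hypothetical placeholders, not values taken from the ACK documentation.

```python
import json


def build_hello_workflow(message: str = "hello from a workflow") -> dict:
    """Return a minimal single-step Argo Workflow manifest.

    All concrete values (name prefix, image, message) are illustrative
    assumptions; adjust them for a real cluster.
    """
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        # generateName lets the API server append a random suffix.
        "metadata": {"generateName": "hello-"},
        "spec": {
            # entrypoint must match the name of one template below.
            "entrypoint": "main",
            "templates": [
                {
                    "name": "main",
                    "container": {
                        "image": "busybox:1.36",  # hypothetical image
                        "command": ["echo"],
                        "args": [message],
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_hello_workflow(), indent=2))
```

The printed manifest could be saved to a file and submitted with the Argo CLI or console mentioned in the entry; the exact submission steps depend on how the component is installed.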

November 2024

| Product | Feature | Description | Region | References |
| --- | --- | --- | --- | --- |
| Container Service for Kubernetes | eRDMA support | ACK eRDMA Controller enables elastic Remote Direct Memory Access (eRDMA) in your clusters. It manages eRDMA interface (ERI) assignments and lets you specify eRDMA settings in pod configurations. | All regions | ACK eRDMA Controller |
| | New releases of ack-secret-manager and secrets-store-csi-driver-provider-alibaba-cloud | New versions of ack-secret-manager and secrets-store-csi-driver-provider-alibaba-cloud are available. Install them from the Marketplace page in the ACK console. | All regions | Use ack-secret-manager to import OOS encryption parameters and Use csi-secrets-store-provider-alibabacloud to import OOS encryption parameters |
| | Remote Shuffle Service for Spark jobs via Apache Celeborn | Apache Celeborn enables Remote Shuffle Service (RSS) for Spark jobs by processing intermediate data (shuffle data and spilled data) for big data compute engines, improving performance, stability, and flexibility. | All regions | Use Celeborn to enable RSS for Spark jobs |
| | Log management for Spark jobs | Use Simple Log Service to collect and manage logs from Spark jobs running in ACK clusters. | All regions | Use Simple Log Service to collect the logs of Spark jobs |
| | ossfs troubleshooting | Object Storage Service (OSS) volumes are Filesystem in Userspace (FUSE) file systems mounted via ossfs. Analyze debug logs or pod logs to troubleshoot ossfs exceptions based on the mode in which ossfs runs. | All regions | Troubleshoot OSSFS exceptions |
| ACK Serverless | Custom CoreDNS configurations | Customize managed CoreDNS settings for ACK Serverless clusters: specify an external DNS server to improve resolution speed, or add static IP mappings to the local hosts file for domain names with fixed addresses. | All regions | Configure custom parameters for managed CoreDNS |
| ACK One | Network policies for Elastic Container Instance-based pods in registered clusters | Kubernetes network policies are now supported for Elastic Container Instance (ECI)-based pods in registered clusters. Use network policies to control traffic to specific applications by IP address or port. | All regions | Use network policies on elastic container instances |
| | Migration from self-managed Argo CD to ACK One GitOps | Use onectl to migrate clusters, repositories, and applications from a self-managed Argo CD instance to ACK One GitOps in bulk, instead of migrating resources one by one. | All regions | Migrate data from self-managed Argo CD to ACK One GitOps |
| | Preemptible ECI creation in registered clusters | Registered clusters now support preemptible elastic container instances. Use them to run short-term jobs or stateless, fault-tolerant applications at reduced cost. | All regions | Create a preemptible elastic container instance |
| | Hybrid disaster recovery based on ALB multi-cluster gateways | ACK One supports hybrid disaster recovery using ALB multi-cluster gateways. Route traffic across clusters deployed in data centers or third-party platforms and perform seamless failovers for active zone-redundancy. | All regions | Use MSE multi-cluster gateways to implement hybrid disaster recovery in ACK One |
| | Zone-disaster recovery based on ALB multi-cluster gateways | ACK One ALB multi-cluster gateways work with ACK One GitOps or the multi-cluster application distribution feature to implement zone-disaster recovery and automatically switch traffic when a fault occurs. | All regions | Zone-disaster recovery based on ALB multi-cluster gateways of ACK One |
| Cloud-native AI suite | FUSE client monitoring for Fluid JindoRuntime | Fluid now collects metrics from multiple JindoRuntime caching engines and FUSE clients, and displays them in out-of-the-box JindoRuntime monitoring dashboards. | All regions | Enable and use the Fluid JindoRuntime FUSE client for monitoring |
| | Performance analysis and troubleshooting for large models using PyTorch Profiler | This practice describes how to use PyTorch Profiler with TensorBoard to analyze model performance and optimize training across data loading, data transfer, GPU computing, and model compilation. | All regions | Performance analysis and troubleshooting for large models using PyTorch Profiler |
| | Performance analysis and optimization for AI applications using Nsight Systems | In deep learning, Nsight Systems and Nsight Compute are commonly used for AI application performance analysis and optimization. This practice describes how to use Nsight Systems for these purposes. | All regions | Performance analysis and optimization for AI applications using Nsight Systems |
| ACK Edge | High-performance container networks | ACK Edge clusters now support Terway Edge as a Container Network Interface (CNI) plug-in to create high-performance underlay networks for intra-cluster communication. | All regions | Terway Edge |
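
The ACK One entry on network policies describes controlling traffic to an application by IP address or port. The sketch below shows one way such a rule might look as a standard Kubernetes `NetworkPolicy` (API group `networking.k8s.io/v1`), built as a Python dict. The policy name, namespace, `app` label, CIDR block, and port are hypothetical, and any ECI-specific prerequisites described in the referenced topic are not covered here.

```python
import ipaddress
import json


def build_ingress_policy(name: str, namespace: str, app_label: str,
                         allowed_cidr: str, port: int) -> dict:
    """Allow ingress to pods labeled app=<app_label> only from
    <allowed_cidr> on TCP <port>. All argument values are illustrative."""
    # Fail early on a malformed CIDR (raises ValueError).
    ipaddress.ip_network(allowed_cidr)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Select the pods the policy applies to.
            "podSelector": {"matchLabels": {"app": app_label}},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [{"ipBlock": {"cidr": allowed_cidr}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


if __name__ == "__main__":
    policy = build_ingress_policy(
        "allow-office", "default", "web", "10.0.0.0/8", 8080)
    print(json.dumps(policy, indent=2))
```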

October 2024

| Product | Feature | Description | Region | References |
| --- | --- | --- | --- | --- |
| Container Service for Kubernetes | Cloud Controller Manager v2.10.0 | Cloud Controller Manager (CCM) v2.10.0 adds readiness gates support and allows the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-additional-resource-tags annotation to modify tags on existing load balancer instances. | All regions | Cloud Controller Manager |
| | Elastic container instances for Spark jobs | Run Spark jobs on Elastic Container Instance (ECI)-based pods by configuring scheduling policies to target elastic container instances. Pay only for the resources the pods consume, reducing idle resource waste. | All regions | Use elastic container instances to run Spark jobs |
| ACK Serverless | Custom parameter configurations for managed CoreDNS | Configure DNS settings for managed CoreDNS by defining a CustomDNSConfig custom resource (CR). | All regions | Configure custom parameters for managed CoreDNS |
| ACK One | Serverless computing in self-managed Kubernetes clusters | ACK Virtual Node lets you create serverless pods in self-managed Kubernetes clusters and access elastic cloud compute resources, including both CPUs and GPUs. | All regions | Use ACK Virtual Node for serverless computing in self-managed Kubernetes clusters |
| | ALB multi-cluster gateways | ACK One ALB multi-cluster gateways extend ALB Ingress to multi-cluster mode. They work like single-cluster ALB Ingress with a few differences. | All regions | Overview of ALB multi-cluster gateways |
| Cloud-native AI suite | Model inference optimization with TensorRT | Compile PyTorch or TensorFlow models to TensorRT format and run them in the TensorRT inference engine to improve inference speed on NVIDIA GPUs. | All regions | |
| ACK Edge | RAM Roles for Service Accounts (RRSA) | Use RAM Roles for Service Accounts (RRSA) to enforce fine-grained API permission control at the pod level, reducing security risks from shared node permissions. | All regions | Configure RRSA for service accounts to isolate permissions among pods |
| | Managed node pools | ACK Edge managed node pools automate O&M tasks including OS CVE patching, kubelet updates, and node restarts, with custom O&M capabilities beyond what standard node pools offer. | All regions | Overview of managed node pools |
| | Alert configurations | Centrally manage alerts for ACK Edge clusters across multiple scenarios. Configure alert rules for key cluster resource metrics, core component metrics, and application metrics. | All regions | |
| | Kubernetes 1.28 | ACK Edge clusters now support Kubernetes 1.28. To upgrade from version 1.26 to 1.28, submit a ticket to contact the ACK technical team. Upgrades between other versions are not supported. | All regions | Update an ACK Edge cluster |
| ACK Lingjun | Network topology-aware scheduling | Topology-aware scheduling in Lingjun clusters assigns pods to the same Layer 1 or Layer 2 forwarding domain, reducing network latency and accelerating job completion. | All regions | Work with network topology-aware scheduling |
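
The Cloud Controller Manager v2.10.0 entry names the `service.beta.kubernetes.io/alibaba-cloud-loadbalancer-additional-resource-tags` annotation. The sketch below shows where such an annotation would sit on a `LoadBalancer` Service. The comma-separated `key=value` encoding of the tag string is an assumption made for this example and should be checked against the Cloud Controller Manager reference; the Service name, selector, ports, and tag values are hypothetical.

```python
import json

# Annotation key quoted from the release note above.
TAGS_ANNOTATION = (
    "service.beta.kubernetes.io/"
    "alibaba-cloud-loadbalancer-additional-resource-tags"
)


def encode_tags(tags: dict) -> str:
    """Encode tags as 'k1=v1,k2=v2'.

    ASSUMPTION: this encoding is illustrative; confirm the expected
    value format in the CCM documentation before relying on it.
    """
    for key, value in tags.items():
        if "," in key or "=" in key or "," in value:
            raise ValueError(f"unsupported character in tag {key!r}")
    return ",".join(f"{k}={v}" for k, v in sorted(tags.items()))


def build_service(name: str, app_label: str, tags: dict) -> dict:
    """Return a LoadBalancer Service manifest carrying the tag annotation."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "annotations": {TAGS_ANNOTATION: encode_tags(tags)},
        },
        "spec": {
            "type": "LoadBalancer",
            "selector": {"app": app_label},
            "ports": [{"port": 80, "targetPort": 8080, "protocol": "TCP"}],
        },
    }


if __name__ == "__main__":
    svc = build_service("web", "web", {"team": "payments", "env": "prod"})
    print(json.dumps(svc, indent=2))
```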

September 2024

| Product | Feature | Description | Region | References |
| --- | --- | --- | --- | --- |
| Container Service for Kubernetes | Kubernetes 1.31 support | Create new ACK clusters on Kubernetes 1.31 or upgrade existing clusters from earlier versions. | All regions | Kubernetes 1.31 |
| | Deletion protection for namespaces and Services | After you enable policy governance, turn on deletion protection for business-critical namespaces or Services to prevent accidental deletion. | All regions | Enable deletion protection for a namespace or a Service |
| | Tracing for the NGINX Ingress controller | Report NGINX Ingress controller trace data to Managed Service for OpenTelemetry for real-time trace details and topology visualization. Use the monitoring data to troubleshoot and diagnose issues. | All regions | Enable tracing for the NGINX Ingress controller |
| | Cost insights for Knative Services | Enable cost insights for a Knative Service to view its estimated cost in real time and support multi-dimensional cost analysis and allocation. | All regions | Enable the cost insights feature in Knative Service |
| | Risk identification with cost insights for cluster workloads | Use cost insights to quickly surface stability, performance, and cost risks in cluster workloads. The feature tracks resource utilization, provides detailed configuration data for Burstable pods, and identifies risks in BestEffort pods. | All regions | Use cost insights to identify risks for cluster workloads |
| | Spark Operator for Spark jobs | Run and manage Spark jobs in ACK clusters using Spark Operator, giving data engineers an efficient way to handle large-scale data processing workloads. | All regions | Use Spark Operator to run Spark jobs |
| ACK One | Argo CD alerting | Configure custom alert rules for Fleet instances using Managed Service for Prometheus metrics. Dashboards display monitoring information about Fleet instances and the GitOps system. | All regions | Configure ACK One Argo CD alerts |
| | Application distribution | Distribute applications from a Fleet instance to multiple associated clusters using configurable distribution policies. Unlike GitOps, this method requires no Git repositories. Use differentiated policies to meet varying deployment requirements across clusters and applications. | All regions | Application distribution overview |
| | Access to Alibaba Cloud DNS PrivateZone | On-premises networks connected via virtual border router (VBR), IPsec-VPN, or Cloud Connect Network (CCN) can access Alibaba Cloud DNS PrivateZone through a transit router for VPC-based private domain name resolution. | All regions | Manage access to Alibaba Cloud DNS PrivateZone |
| | Statically provisioned NAS volumes in registered clusters | Mount statically provisioned NAS (Network Attached Storage) volumes to registered clusters for persistent, shared data storage across pods. | All regions | Mount a statically provisioned NAS volume |
| Cloud-native AI suite | Auto recovery for FUSE mount targets | When a Filesystem in Userspace (FUSE) daemon crashes during a pod's lifecycle, auto recovery restores data access without restarting the application pod. | All regions | Enable the auto recovery feature for FUSE mount targets |
| | Cross-namespace dataset sharing | Fluid supports data access and cache sharing across namespaces. Cache a dataset once and share it across multiple teams, improving data utilization and enabling collaboration between R&D teams. | All regions | Share datasets across namespaces |
| ACK Edge | Edge Node Service (ENS) management | Manage Edge Node Service (ENS) instances deployed across multiple regions and ISPs in a unified, containerized manner. Create ENS disks and Edge Load Balancer instances for cloud-native storage and networking at the edge. | All regions | ENS management |
| | Service topology management for node pools | Expose an application on an edge node only to the current node or nodes within the same edge node pool, preventing cross-node-group routing failures and improving response reliability. | All regions | Configure a Service topology |

August 2024

| Product | Feature | Description | Region | References |
| --- | --- | --- | --- | --- |
| Container Service for Kubernetes | Inventory health status monitoring for node instant scaling | Node instant scaling now monitors ECS instance inventory health. Check the ConfigMap for inventory health status to assess the health of instance types configured for a node pool and proactively adjust instance type selections. | All regions | View the health status of node instant scaling |
| | Multiple update frequencies for auto cluster update | Auto cluster update now supports three update frequency options: Latest Patch Version (patch), Second-Latest Minor Version (stable), and Latest Minor Version (rapid). | All regions | Automatically update a cluster |
| | GPU sharing and memory isolation with MPS | Multi-Process Service (MPS) enables GPU sharing and memory isolation for AI applications running Compute Unified Device Architecture (CUDA) workloads. Add specific labels to node pools in the ACK console to enable MPS mode. | All regions | Use MPS for GPU sharing and memory isolation |
| | Knative 1.12.5 | Knative 1.12.5-aliyun.7 is now supported. This version is compatible with Kourier 1.12 and adds support for Container Registry Enterprise Edition and the dashboard for preemptible ECS instances. | All regions | Knative release notes |
| ACK One | Multi-cluster applications | Use Argo CD ApplicationSets to automatically generate one or more applications from a single orchestration template and deploy them across multiple clusters. | All regions | Create a multi-cluster application |
| | Elastic node pools with custom images in registered clusters | Use custom images pre-installed with required software packages to reduce the time for on-cloud nodes to reach the Ready state and accelerate system startup. | All regions | Build an elastic node pool with a custom image |
| | Large-scale workflow creation with Argo Workflows SDK for Python (Hera) | Hera is a Python SDK for Argo Workflows that provides an alternative to YAML. Use Hera to orchestrate and test complex workflows in Python, leveraging seamless integration with the Python ecosystem. | All regions | Use Argo Workflows SDK for Python to create large-scale workflows |
| | Event-driven CI pipelines based on EventBridge | Build event-driven continuous integration (CI) pipelines using EventBridge and distributed Argo Workflows. This approach simplifies and accelerates application delivery with high elasticity and low cost. | All regions | Event-driven CI pipelines based on EventBridge |
| Cloud-native AI suite | AI-powered Q&A assistants with Dify | Dify integrates enterprise or individual knowledge bases with large language model (LLM) applications. Use it to design customized AI-assisted Q&A solutions for your business. | All regions | Use Dify to create a customized AI-powered Q&A assistant for a website |
| | Flowise installation and management | Install the Flowise component in ACK clusters. Flowise provides a drag-and-drop UI for building LLM applications in a low-code manner, enabling rapid iteration from testing to production. | All regions | |
| | Qwen2 model inference deployment with TensorRT-LLM | Deploy Qwen2 models as inference services using Triton and TensorRT-LLM. Fluid Dataflow handles data preparation during deployment, and Fluid accelerates model loading. The documented example uses the Qwen2-1.5B-Instruct model on A10 GPUs. | All regions | Use TensorRT-LLM to deploy a Qwen2 model as an inference service |
| ACK Edge | Cloud-native AI suite support | ACK Edge clusters now support the cloud-native AI suite, including AI Dashboard and AI Developer Console for monitoring cluster status and submitting training jobs. | All regions | Deploy the cloud-native AI suite |
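
The auto cluster update entry lists three frequency options: Latest Patch Version (patch), Second-Latest Minor Version (stable), and Latest Minor Version (rapid). The sketch below is one literal reading of those names, picking a target version from a list of available versions. It is not ACK's actual selection logic: the function name, the version list, and the handling of edge cases are all assumptions made for illustration, and real upgrades may need to pass through intermediate minor versions.

```python
from typing import Iterable, Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Parse 'major.minor.patch' into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def pick_target(current: str, available: Iterable[str], frequency: str) -> str:
    """Pick the version a frequency option would point at (illustrative).

    patch  -> newest patch within the current minor version
    stable -> newest patch of the second-latest available minor version
    rapid  -> newest patch of the latest available minor version
    Never returns a version older than the current one.
    """
    cur = parse(current)
    versions = sorted({parse(v) for v in available})
    if not versions:
        return current
    minors = sorted({(v[0], v[1]) for v in versions})
    if frequency == "patch":
        wanted = (cur[0], cur[1])
    elif frequency == "stable":
        wanted = minors[-2] if len(minors) >= 2 else minors[-1]
    elif frequency == "rapid":
        wanted = minors[-1]
    else:
        raise ValueError(f"unknown frequency: {frequency}")
    candidates = [v for v in versions if (v[0], v[1]) == wanted and v >= cur]
    target = max(candidates, default=cur)
    return ".".join(str(part) for part in target)


if __name__ == "__main__":
    # Hypothetical version list, not real ACK release data.
    releases = ["1.29.3", "1.29.5", "1.30.1", "1.30.7", "1.31.1"]
    for option in ("patch", "stable", "rapid"):
        print(option, "->", pick_target("1.29.3", releases, option))
```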

July 2024

| Product | Feature | Description | Region | References |
| --- | --- | --- | --- | --- |
| Container Service for Kubernetes | Tracing support in NGINX Ingress controller v1.10.2-aliyun.1 | NGINX Ingress controller v1.10.2-aliyun.1 adds tracing support via Managed Service for OpenTelemetry. | All regions | Enable tracing for the NGINX Ingress controller |
| | Global network policies with Poseidon v0.5.0 | Poseidon v0.5.0 introduces cluster-level global network policies, enabling network connectivity management across namespaces in ACK clusters. | All regions | Use ACK GlobalNetworkPolicy |
| | ContainerOS 3.3 | ContainerOS 3.3 updates the kernel to version 5.10.134-17.0.2.lifsea8, enables cgroup v2 by default for container resource isolation, and fixes vulnerabilities and defects. | All regions | ContainerOS image release record |
| | Custom worker RAM roles for node pools | Assign a custom Resource Access Management (RAM) worker role to a node pool at creation time. This isolates permissions per node pool and avoids all nodes in the cluster sharing the same default RAM role. | All regions | Use custom worker RAM roles |
| | ACKBlockVolumeTypes security policy | The ACKBlockVolumeTypes policy is added to the security policy library. Use it to restrict which volume types pods in specified namespaces can use. | All regions | ACKBlockVolumeTypes |
| | NVIDIA GPU driver 550.90.07 | NVIDIA GPU driver version 550.90.07 is now supported in ACK clusters. | All regions | NVIDIA driver versions supported by ACK |
| | Qwen model inference deployment with LMDeploy | Deploy Qwen models as inference services using the LMDeploy framework. The documented example uses the Qwen1.5-4B-Chat model on A10 GPUs. | All regions | Use LMDeploy to deploy the Qwen model inference service |
| | GPU-sharing inference services with KServe | Deploy inference services that share a GPU using KServe to improve GPU utilization. The documented example uses the Qwen1.5-0.5B-Chat model on a V100 GPU. | All regions | Deploy inference services that share a GPU |
| ACK One | Event-driven CI pipelines based on EventBridge | Build event-driven CI pipelines by combining EventBridge with distributed Argo Workflows to accelerate application delivery with minimal overhead. | All regions | Event-driven CI pipelines based on EventBridge |
| | Multi-cluster application orchestration through GitOps | Orchestrate multi-cluster applications in the GitOps console using Git repositories as application sources. Supports YAML manifests, Helm charts, and Kustomize for version management, multi-cluster distribution, and continuous deployment (CD). | All regions | Use an ApplicationSet to create multiple applications |
| | Elastic node pools with custom images in registered clusters | Use custom images pre-installed with required software packages to reduce node startup time and accelerate the path to Ready state. | All regions | Build an elastic node pool with a custom image |
| Cloud-native AI suite | FUSE mount target auto repair | Fluid performs periodic polling checks and automatic repairs of FUSE mount targets, improving data access stability for business workloads. | All regions | |
| ACK Edge | Kubernetes 1.28 support | Create ACK Edge clusters running Kubernetes 1.28.9-aliyun.1. | All regions | Release notes for ACK Edge of Kubernetes 1.28 |
| | Container Storage Interface (CSI) plug-in support | ACK Edge clusters support the Container Storage Interface (CSI) plug-in. Storage medium types and limitations vary by node type and integration method. | All regions | Storage overview |
| | Cloud-native AI suite support | ACK Edge clusters support all cloud-native AI suite features in on-cloud environments. Feature availability and limits in on-premises environments vary by node type and network type. | All regions | Cloud-native AI suite |
| | Ingress best practices for edge node pools | Deploy Ingress controllers in edge node pools. Note the differences in behavior compared to Ingress controllers deployed in on-cloud node pools. | All regions | Ingress overview and Use the NGINX Ingress |
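
The GitOps entry for ACK One references ApplicationSets, which generate one application per target cluster from a single template. As a hedged sketch of that idea, the code below builds an Argo CD `ApplicationSet` that uses the cluster generator, so each registered cluster yields one `Application`. The repository URL, path, project, and namespace are hypothetical placeholders; `{{name}}` and `{{server}}` are the parameters the cluster generator fills in.

```python
import json


def build_applicationset(name: str, repo_url: str, path: str,
                         namespace: str) -> dict:
    """Return an ApplicationSet that stamps out one Application per
    cluster known to Argo CD. All argument values are illustrative."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "ApplicationSet",
        "metadata": {"name": name},
        "spec": {
            # An empty cluster generator matches every registered cluster.
            "generators": [{"clusters": {}}],
            "template": {
                # {{name}} / {{server}} are substituted per cluster.
                "metadata": {"name": "{{name}}-" + name},
                "spec": {
                    "project": "default",
                    "source": {
                        "repoURL": repo_url,
                        "targetRevision": "HEAD",
                        "path": path,
                    },
                    "destination": {
                        "server": "{{server}}",
                        "namespace": namespace,
                    },
                },
            },
        },
    }


if __name__ == "__main__":
    appset = build_applicationset(
        "guestbook",
        "https://example.com/org/repo.git",  # hypothetical repository
        "manifests/guestbook",
        "guestbook",
    )
    print(json.dumps(appset, indent=2))
```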

June 2024

| Product | Feature | Description | Region | References |
| --- | --- | --- | --- | --- |
| Container Service for Kubernetes | Kubernetes 1.30 support | Create new ACK clusters on Kubernetes 1.30 or upgrade existing clusters from earlier versions. | All regions | Kubernetes 1.30 and Manually update ACK clusters |
| | Node pool OS parameter customization | Customize Linux OS parameters for node pools to improve OS performance when the defaults don't meet your business requirements. | All regions | Customize the OS parameters of a node pool |
| | Ubuntu 22.04 support | Use Ubuntu 22.04 as the node OS for ACK clusters running Kubernetes 1.30 or later. | All regions | OS images |
| | Enhanced descheduling | The Koordinator Descheduler module in the ack-koordinator component now has enhanced descheduling policies, pod eviction methods, and eviction traffic control to address imbalanced node utilization, overloaded nodes, and changing scheduling requirements. | All regions | Descheduling and Enable descheduling |
| | Network Load Balancer (NLB) configuration via Services in the ACK console | Create and manage Services in the ACK console to configure Network Load Balancer (NLB) instances. NLB is a Layer 4 load balancing service supporting up to 100 million concurrent connections with auto-scaling. | All regions | Use an existing SLB instance to expose an application and Use an automatically created SLB instance to expose an application |
| | New release of csi-provisioner | The updated csi-provisioner includes a managed version that consumes no node resources, TLS-based NAS mounting on Alibaba Cloud Linux 3, and Ubuntu node support. | All regions | csi-provisioner |
| ACK One | Enhanced Fleet monitoring | ACK One Fleet monitoring now provides global monitoring across all associated clusters. A unified dashboard displays key component metrics, GitOps system metrics, and cost insights data for Fleet instances. | All regions | Fleet monitoring |
| Cloud-native AI suite | Cloud-native AI suite now free of charge | All cloud-native AI suite features are now free. Use them to build customized AI production systems on ACK with full-stack optimizations for AI and machine learning (ML) applications. | All regions | [Free component notice] Cloud-native AI suite is free of charge |
| ACK Edge | Disk storage for on-cloud node pools | On-cloud node pools in ACK Edge clusters now use the same Container Storage Interface (CSI) as ACK managed clusters. Mount disks using persistent volumes (PVs) and persistent volume claims (PVCs). | All regions | |
| | Access to data center workloads via Express Connect circuits | The API server of an ACK Edge cluster can access pods and Services deployed at the edge using Express Connect circuits. The edge controller manager (ECM) automates routing configuration from VPCs to edge pods. | All regions | Network management |

May 2024

| Product | Feature | Description | Region | References |
| --- | --- | --- | --- | --- |
| Container Service for Kubernetes | cloud-controller-manager v2.9.1 | cloud-controller-manager v2.9.1 supports cross-VPC NLB instance reuse, NLB server group weights, and mixed ECS-plus-pod server groups. This version also improves NLB IPv6 support. | All regions | Cloud Controller Manager |
| | Custom routing rules for ALB Ingresses | Create custom routing rules for ALB Ingresses using a visual interface. Route requests based on paths, domain names, or request headers, and configure actions to forward to specific Services or return fixed responses. | All regions | Customize the routing rules of an ALB Ingress |
| | NVMe disk multi-attach and reservation | Attach an NVMe disk to up to 16 instances simultaneously with the multi-attach feature, and use the NVMe reservation feature to ensure data consistency for applications such as databases and to enable faster failovers. | All regions | Use the multi-attach and NVMe reservation features of NVMe disks |
| | ossfs version switching via feature gate | In CSI 1.30.1 and later, enable a feature gate to switch to ossfs 1.91 or later for higher file system performance. | All regions | ossfs versions and Features of ossfs 1.91 and later and ossfs performance benchmarking |
| ACK One | CI pipelines for Golang projects in workflow clusters | ACK One workflow clusters, built on hosted Argo Workflows, provide high elasticity, auto-scaling, and zero O&M overhead. Use them to create CI pipelines for Golang projects at low cost. | All regions | Create CI pipelines for Golang projects in workflow clusters |
| Cloud-native AI suite | Dynamic dataset mount targets with Fluid | Fluid now dynamically mounts and updates dataset mount targets, including the corresponding PVs and PVCs, inside running containers without requiring a pod restart. | All regions | |

April 2024

| Product | Feature | Description | Region | References |
| --- | --- | --- | --- | --- |
| Container Service for Kubernetes | Anomaly diagnostics with ACK AI Assistant | ACK AI Assistant can now analyze and diagnose failed tasks, error logs, and component update failures in ACK clusters, reducing manual O&M effort. | All regions | Use ACK AI Assistant to help troubleshoot issues and find answers to your questions |
| | RRSA authentication for OSS volumes | Configure RAM Roles for Service Accounts (RRSA) authentication on persistent volumes to restrict API access to specific OSS volumes, enabling fine-grained access control and improving cluster security. | All regions | Use RRSA authentication to mount a statically provisioned OSS volume |
| | EIPs with Anti-DDoS (Enhanced Edition) for pods | ACK Extend Network Controller v0.9.0 creates and manages NAT gateways and elastic IP addresses (EIPs), and can bind EIPs with Anti-DDoS (Enhanced Edition) to pods exposed to the internet. | All regions | Associate an exclusive EIP with a pod |
| | New predefined security policies | Three new predefined security policies are added to the policy governance module: ACKServicesDeleteProtection, ACKPVSizeConstraint, and ACKPVCConstraint. | All regions | Predefined security policies of ACK |
| ACK Edge | Offline O&M tool for edge nodes | Use the ACK Edge offline O&M tool to perform O&M operations, such as business updates and configuration changes, on edge nodes that are offline due to network instability. | All regions | Offline O&M tool for edge nodes |
| ACK One | Multi-cluster gateway management | Microservices Engine (MSE) cloud-native gateways serve as multi-cluster gateways via the MSE Ingress controller hosted in ACK One. Manage north-south traffic visually, and implement active zone-redundancy, multi-cluster load balancing, and header-based traffic routing. | All regions | Manage gateways |
| | OSS access optimization for distributed Argo Workflows | ACK One Argo Workflows now supports multipart upload for large files, artifact auto garbage collection, and streaming artifact transmission for more efficient and secure OSS access. | All regions | Configure artifacts |
| Cloud-native AI suite | MLflow deployment in ACK clusters | Deploy MLflow in ACK clusters with a few clicks to track model training and manage the full ML model lifecycle, including models in MLflow Model Registry. | All regions | Configure MLflow Model Registry and Manage models in MLflow Model Registry |

March 2024

| Product | Feature | Description | Region | References |
| --- | --- | --- | --- | --- |
| Container Service for Kubernetes | Kubeconfig file management and recycle bin | View and manage issued kubeconfig files using Alibaba Cloud accounts, RAM users, or RAM roles with the required permissions. Delete or revoke permissions for kubeconfig files that pose security risks, and restore deleted files from the recycle bin within 30 days. | All regions | Use the kubeconfig recycle bin, Delete kubeconfig files, and Use ack-ram-tool to revoke the permissions of specified users on ACK clusters |
| | GPU device isolation | In exclusive GPU scheduling scenarios, isolate a faulty GPU device on a node to prevent new workloads from being scheduled to it. | All regions | GPU Device Plugin-related operations |
| | Metrics collection for a specific virtual node | In clusters with multiple virtual nodes, specify a single virtual node for metrics collection. This reduces the volume of data collected at once and lowers monitoring system load when many containers run on virtual nodes. | All regions | Collect the metrics of the specified virtual node |

February 2024

| Product | Feature | Description | Region | References |
| --- | --- | --- | --- | --- |
| Container Service for Kubernetes | ACK Virtual Node 2.11.0 | ACK Virtual Node 2.11.0 adds Windows instance support and Windows node scheduling semantics. It also enables the System Operations & Maintenance (SysOM) feature for kernel-level monitoring of ECI-based pods and improves certificate generation speed during pod creation. | All regions | ACK Virtual Node and Deploy the virtual node controller and use it to create Elastic Container Instance-based pods |
| ACK One | Knative support for registered clusters | Registered clusters now support Knative, the Kubernetes-based serverless framework. Knative integrates container creation, workload management, and event models to help you build enterprise-grade serverless platforms. | All regions | Knative overview |
| | Zone-disaster recovery in hybrid cloud environments | Use ACK One to implement zone-disaster recovery for Kubernetes clusters running in data centers or third-party public clouds. ACK One manages traffic, applications, and clusters centrally, routes traffic across clusters, and supports millisecond-level failovers with Layer 7 routing via the managed MSE Ingress controller. | All regions | Use MSE multi-cluster gateways to implement hybrid disaster recovery in ACK One |
| | OSS object access acceleration with Fluid in registered clusters | Use Fluid, an open source, Kubernetes-native distributed dataset orchestrator, to accelerate access to OSS files in registered clusters. | All regions | Use Fluid to accelerate access to OSS objects |
| | DingTalk chatbot notifications for GitOps application updates | Configure a DingTalk chatbot to receive notifications about GitOps application updates in multi-cluster continuous delivery scenarios. | All regions | Use a DingTalk chatbot to receive notifications about GitOps application updates |
| Cloud-native AI suite | Ray cluster best practices | Create a Ray cluster in ACK and integrate it with Simple Log Service, Managed Service for Prometheus, and ApsaraDB for Redis for optimized logging, observability, and availability. The Ray autoscaler works with the ACK cluster autoscaler for efficient compute scaling. | All regions | Best practices for Ray clusters |

January 2024

| Product | Feature | Description | Region | References |
| --- | --- | --- | --- | --- |
| Container Service for Kubernetes | ACK AI Assistant | ACK AI Assistant is built on a large language model (LLM) developed by the ACK team. It uses ACK team expertise, O&M system observability, and diagnostic experience to help you find answers and diagnose ACK and Kubernetes issues. | All regions | Use ACK AI Assistant to help troubleshoot issues and find answers to your questions |
| | OS kernel-level container monitoring | Install the ALB Ingress controller and enable the Xtrace feature to collect tracing data. The Tracing Analysis service then provides trace mapping, request statistics, and trace topology for distributed applications. | All regions | Use AlbConfigs to enable Tracing Analysis based on Xtrace |
| ACK Edge | Kubernetes 1.26 support | ACK Edge clusters now support Kubernetes 1.26, with improvements to edge node autonomy and edge node access. | All regions | Release notes for ACK Edge of Kubernetes 1.26 |
| | Updated cloud-edge communication solution | ACK Edge clusters running Kubernetes 1.26 and later support network communication between on-cloud and edge node pools via Raven, which provides two modes: proxy mode for cross-domain HTTP communication between hosts, and tunnel mode for cross-domain container-to-container communication. | All regions | Cross-region O&M communication component Raven and raven-agent-ds |
| ACK One | GitOps console access via custom domain name | Access the ACK One GitOps console through a custom domain name. Create a CNAME record mapping your custom domain to the default GitOps domain name, configure an SSL certificate, and log in with a CloudSSO account at https://<your-domain>. | All regions | Access the GitOps console through a custom domain name |
| | Disaster recovery architectures for Kubernetes clusters | Design disaster recovery architectures combining ACK clusters, including third-party cloud clusters and on-premises clusters, with Alibaba Cloud networking, database, middleware, and observability services to build resilient business systems. | All regions | Disaster recovery architectures and solutions based on Kubernetes container clusters |