Specialized inspection (Lens)


A lens extends the Well-Architected Framework to a specific industry or technology domain. Select a lens, which is a specialized check model, to assess your cloud resources against domain-specific requirements, identify risks, and get targeted optimization recommendations.

Supported lenses

The following lenses are supported:

  • Container Build

    Checks container protection in deployment, monitoring, and O&M to identify risks and ensure that security and reliability baselines are met.

  • Machine Learning

    Checks infrastructure architecture for AI model training, including whether core resources (ECS, NAS, OSS) match training requirements.

  • Network Services

    Inspects network resource health, including capacity levels, disaster recovery architecture, and idle resources across multiple network products.

    Note

    Before using the Network Services lens, enable Network Intelligence Service.

  • Data Protection

    Checks audit log retention, sensitive data protection, SQL performance anomalies, and disaster recovery and security controls.

    Note

    To use this lens, you must enable Database Autonomy Service (DAS) and Data Transmission Service (DTS).

Supported check items

The following table lists check items for each lens.

Lens

Check item

Description

Container Build

ACK cluster deployed in a single availability zone

Regional ACK clusters provide high availability by distributing nodes across multiple availability zones. A cluster is compliant if its nodes are in three or more availability zones.

Container Build

Cost management suite not enabled for an ACK cluster

The cost management suite provides resource waste detection and cost prediction. An ACK cluster is non-compliant if the cost management suite is not enabled.

Container Build

ACK cluster not using a stable version

If an ACK cluster is not upgraded to the latest version, the evaluation result is Non-compliant.

Container Build

Deletion protection not enabled for an ACK cluster

If an ACK cluster does not have deletion protection enabled, the evaluation result is Non-compliant.

Container Build

Secret at-rest encryption not configured for an ACK cluster

Secret at-rest encryption uses a key from Key Management Service (KMS) to encrypt Kubernetes Secrets, which enhances the security of sensitive information. An ACK Pro cluster is non-compliant if it does not use KMS for Secret at-rest encryption.

Container Build

ack-ram-authenticator not used for RAM authentication

The ack-ram-authenticator component authenticates API server requests through RAM using Kubernetes Webhook Token authentication. In SSO role mapping scenarios, it enables secure auditing when different users assume the same role. An ACK cluster is non-compliant if ack-ram-authenticator is not enabled.

Container Build

Policy governance not used to restrict privileged container configurations

Policy governance helps enterprise security O&M teams enforce container security policies, such as restrictions on privileged containers. An ACK cluster is non-compliant if no policy governance is enabled.

Container Build

RRSA for pod-level permission isolation not implemented

RAM Roles for Service Accounts (RRSA) isolates OpenAPI permissions at the pod level, enabling fine-grained access control for cloud resources. An ACK cluster is non-compliant if RRSA is not enabled.

Container Build

API Server audit logging not enabled

API Server audit logs track operations performed by different users, supporting cluster security and O&M. An ACK cluster is non-compliant if API Server audit logging is not enabled.

Container Build

Control plane component logging not enabled

Control plane component logs are collected to Simple Log Service for auditing and troubleshooting. An ACK cluster is non-compliant if control plane component logging is not enabled.

Container Build

Container Intelligence Service (CIS) cluster configuration inspection not enabled

Container Intelligence Service (CIS) discovers potential cluster risks such as resource quota margins and key resource usage, and provides recommended fixes. An ACK cluster is non-compliant if CIS cluster configuration inspection is not enabled.

Container Build

Cluster security configuration inspection not enabled

Configuration inspection scans for security vulnerabilities in workload configurations and generates reports. An ACK cluster is non-compliant if security configuration inspection is not enabled.

Container Build

Container internal operation audit logging not enabled

Container auditing records commands and operations performed by different users inside containers. An ACK cluster is non-compliant if internal operation audit logging is not enabled.

Container Build

ACK cluster not using managed node pools

Managed node pools automate node O&M, including fixing high-risk CVEs and recovering from node faults. An ACK cluster is non-compliant if managed node pools are not used.

Container Build

Auto Scaling not enabled for a node pool

Auto Scaling provisions pay-as-you-go instances on demand to elastically adjust computing resources. An ACK cluster is non-compliant if node pool Auto Scaling is not enabled.

Container Service for Kubernetes (ACK)

ACK managed cluster is a Basic edition

ACK managed clusters are available in Basic and Pro editions. Compared to the Basic edition, the Pro edition provides enhanced reliability, security, and scheduling, making it more suitable for large-scale production workloads. A managed cluster is non-compliant if it is not a Pro edition.

Container Service for Kubernetes (ACK)

Zero backend servers for the CoreDNS service

If an ACK cluster has zero backend servers for CoreDNS, service discovery fails completely. This interrupts intra-cluster communication (such as microservice calls and database access) and prevents applications from resolving addresses by service name, directly affecting service availability and cluster stability. An ACK cluster is non-compliant if it has zero backend servers for the CoreDNS service.

Container Service for Kubernetes (ACK)

Abnormal backend status for the API Server's CLB instance

An abnormal backend status for the API Server's Classic Load Balancer (CLB) instance can interrupt control plane communication and disable cluster management. This prevents clients like kubectl from accessing the API Server. An ACK cluster is non-compliant if the backend status of its API Server's CLB instance is abnormal.

Container Service for Kubernetes (ACK)

Abnormal listener port configuration for the API Server's CLB instance

An abnormal listener port configuration for the CLB instance bound to the API Server will disrupt API service access, preventing clients like kubectl from connecting to the cluster and halting all management operations. An ACK cluster is non-compliant if the listener configuration for the CLB instance bound to the API Server is abnormal.

Container Service for Kubernetes (ACK)

The CLB instance bound to the API Server does not exist

If an ACK cluster's API Server is not bound to a CLB instance, it lacks a traffic entry point. External clients like kubectl cannot access the API Server through load balancing, leading to a complete interruption of cluster management. An ACK cluster is non-compliant if the CLB instance bound to the API Server does not exist.

Container Service for Kubernetes (ACK)

Abnormal status for the CLB instance bound to the API Server

An abnormal status for the CLB instance bound to the API Server will cause API service traffic forwarding to fail, preventing clients like kubectl from establishing stable connections and completely blocking cluster management. An ACK cluster is non-compliant if the status of the CLB instance bound to the API Server is abnormal.

Container Service for Kubernetes (ACK)

Node Kubelet component version is older than the control plane version

If a node's Kubelet version is older than the control plane version, compatibility issues can arise. The control plane (e.g., API Server) may fail to communicate with the outdated Kubelet after a feature or protocol upgrade, leading to abnormal node status, pod scheduling failures, or nodes being marked as unavailable. An ACK cluster is non-compliant if a node's Kubelet version is older than the control plane version.

Container Service for Kubernetes (ACK)

Unavailable node pool scaling configuration

An unavailable node pool scaling configuration prevents the cluster from automatically adjusting its node count. During high-load periods, the inability to scale out can lead to resource exhaustion, pod scheduling failures, or service interruptions. An ACK cluster is non-compliant if a node pool's scaling configuration is unavailable.

Container Service for Kubernetes (ACK)

Unavailable node pool scaling group

An unavailable node pool scaling group disables the cluster's Auto Scaling capabilities. During high-load periods, this can lead to resource depletion, pod scheduling failures, or increased service latency. An ACK cluster is non-compliant if a node pool's scaling group is unavailable.

Container Service for Kubernetes (ACK)

Unavailable node pool security group

An unavailable security group for a node pool will cause network access rules to fail. Communication between cluster components, such as between Kubelet and the API Server or for service discovery between pods, may be interrupted due to blocked ports or missing rules. An ACK cluster is non-compliant if a node pool's security group is unavailable.

Container Service for Kubernetes (ACK)

Unavailable node pool vSwitch

An unavailable vSwitch for a node pool will interrupt network communication between nodes, preventing pods and services from interacting across nodes. This can cause service discovery failures or data transmission stalls. An ACK cluster is non-compliant if a node pool's vSwitch is unavailable.

Container Service for Kubernetes (ACK)

Unavailable APIService

An unavailable APIService causes requests to the extended APIs that it registers to fail. Components that rely on these extended APIs, such as operators and service meshes, may then be unable to interact with the control plane and behave abnormally. An ACK cluster is non-compliant if an APIService is unavailable.

Container Service for Kubernetes (ACK)

Abnormal CoreDNS pods

Abnormal CoreDNS pods in an ACK cluster can lead to unstable DNS resolution. Communication between services that use domain names may time out or fail, causing application call interruptions. An ACK cluster is non-compliant if it has abnormal CoreDNS pods.

Container Service for Kubernetes (ACK)

Abnormal status for an elasticity component

An abnormal status in a cluster's elasticity components can cause auto-scaling and self-healing mechanisms to fail. This can lead to resource bottlenecks, service latency, or interruptions during high-load periods. An ACK cluster is non-compliant if an elasticity component has an abnormal status.

Container Service for Kubernetes (ACK)

Inconsistent billing method between a LoadBalancer Service and its instance

A mismatch between the billing method of a LoadBalancer Service and its actual instance can lead to billing anomalies, such as unexpected pay-as-you-go charges for a subscription resource, or unexpected resource releases. An ACK cluster is non-compliant if such an inconsistency exists.

Container Service for Kubernetes (ACK)

Inconsistent certificate instance ID between a LoadBalancer Service and its instance

A mismatch between the certificate instance ID of a LoadBalancer Service and the actual bound certificate will cause the TLS configuration to fail. This can lead to connection rejections or security warnings for HTTPS services, interrupting user access. An ACK cluster is non-compliant if such an inconsistency exists.

Container Service for Kubernetes (ACK)

Only one CoreDNS replica

Running only a single replica of CoreDNS eliminates high availability. If the pod fails, the DNS service will be completely interrupted, causing DNS resolution failures and blocking communication between applications. An ACK cluster is non-compliant if it has only one CoreDNS replica.

Machine Learning

ECS instances not prohibited from binding public addresses exist

ECS instances should not be directly exposed to the public network. Use NAT Gateway or SLB for public access instead. An ECS instance is non-compliant if it has a public IP address bound.

Machine Learning

Security group inbound rules set to 0.0.0.0/0 and any port exist

Inbound rules that allow traffic from all IP addresses (0.0.0.0/0) on any port are prohibited. A security group is non-compliant if an inbound rule allows 0.0.0.0/0 without restricting access to specific ports.

Machine Learning

Security groups with high-risk ports (22/3389/...) open to the public network exist

Public access to high-risk ports such as SSH (22) and RDP (3389) is prohibited. A security group is non-compliant if these ports are open to the public network.

Machine Learning

ACK clusters not using stable versions exist

If an ACK cluster is not upgraded to the latest version, the evaluation result is Non-compliant.

Machine Learning

OSS resources not using multi-zone architecture exist

If an OSS bucket does not have zone-redundant storage enabled, the evaluation result is Non-compliant.

Machine Learning

ECS resources without release protection enabled exist

If an ECS instance does not have release protection enabled, the evaluation result is Non-compliant.

Machine Learning

OSS buckets without versioning enabled exist

Without versioning, objects in an OSS bucket cannot be recovered after they are overwritten or deleted. If an OSS bucket does not have versioning enabled, the evaluation result is Non-compliant.

Machine Learning

NAS file systems without backup plans created exist

Use Cloud Backup to regularly back up all directories and files in your General-purpose NAS file system. A NAS file system is compliant if a backup plan is created.

Machine Learning

ACK clusters without Secret at-rest encryption configured exist

Secret at-rest encryption uses a key from Key Management Service (KMS) to encrypt Kubernetes Secrets, which enhances the security of sensitive information. An ACK Pro cluster is non-compliant if it does not use KMS for Secret at-rest encryption.

Machine Learning

VPCs without flow logs enabled exist

VPC flow logs record inbound and outbound traffic of ENIs for access control verification, traffic monitoring, and troubleshooting. A VPC is compliant if flow logging is enabled.

Machine Learning

OSS buckets without server-side encryption enabled exist

OSS server-side encryption protects data at rest for high security or compliance requirements. An OSS bucket is compliant if KMS or OSS-managed encryption is enabled.

Machine Learning

VPC custom CIDR blocks without routes configured exist

You can create custom route tables in a VPC, add custom route entries, and then bind the route table to a vSwitch to control its traffic for more flexible network management. A VPC custom CIDR block is compliant if at least one route entry exists for an IP address within that CIDR block in the associated route table.

Machine Learning

ECS instances using images that are not regularly updated and hardened exist

Regularly updated images ensure systems include the latest security patches and perform optimally. An ECS instance is compliant if its image was created within the specified number of days (default: 180).

Machine Learning

OSS buckets without secure access configured in permission policies exist

HTTPS provides higher security than HTTP. An OSS bucket is compliant if its bucket policy allows read and write access over HTTPS and denies HTTP access. Buckets without a bucket policy are Not Applicable.

Machine Learning

NAS file system access points without RAM policies enabled exist

RAM policies for NAS access points grant mount, read, and write permissions to different RAM users or roles, enabling fine-grained permission management. A NAS file system is compliant if RAM policies are enabled for its access points.

Machine Learning

ECS instances without instance RAM roles assigned exist

Instance RAM roles let applications on an ECS instance obtain STS temporary credentials, eliminating the need to embed AccessKey pairs. This improves security and enables fine-grained access control. An ECS instance is compliant if a RAM role is assigned to it.

Machine Learning

Running ECS instances without CloudMonitor agents installed exist

CloudMonitor agents collect OS-level metrics and enable real-time monitoring with alert rules. A running ECS instance is compliant if the CloudMonitor agent is installed and running. Non-running instances are not applicable.

Machine Learning

NAS file systems without encryption configured exist

Server-side encryption protects data at rest in NAS file systems and automatically decrypts data on access. A NAS file system is compliant if server-side encryption is enabled.

Machine Learning

Running ECS instances without Security Center protection enabled exist

Security Center agents provide asset information collection, risk discovery, intrusion detection, and compliance baseline checks to protect ECS instances. A running ECS instance is compliant if a Security Center agent is installed. Non-running instances are not applicable.

Machine Learning

OSS buckets without logging enabled exist

OSS generates hourly access logs with predefined naming conventions and stores them in a specified bucket for analysis. An OSS bucket is compliant if logging is enabled.

Machine Learning

ACK versions that are not maintained are being used

Kubernetes releases a new minor version approximately every 4 months. Clusters running outdated versions miss the latest features, bug fixes, and security patches. An ACK cluster is compliant if its Kubernetes version is still supported.

Machine Learning

ACK clusters should not have public endpoints configured

Public API server endpoints increase attack surface and may violate compliance requirements. An ACK cluster is compliant if no public endpoint is configured for its API server.

Network Services

Idle EIP resources exist

If an EIP is not bound to a resource instance and was created more than 7 days ago, the evaluation result is Non-compliant.

Network Services

VPN instances not using multi-zone architecture exist

For existing single-tunnel VPN gateway instances, we strongly recommend enabling multi-AZ high availability and configuring dual tunnels for the connection. A VPN instance is non-compliant if it uses a single-tunnel configuration.

Network Services

NLB instances not using multi-zone architecture exist

For Network Load Balancer instances, we strongly recommend configuring multiple zones to meet multi-zone disaster recovery requirements. If a Network Load Balancer instance uses a single zone, the evaluation result is Non-compliant.

Network Services

EIP resources with abnormal running status exist

Checks whether EIPs are running as expected. If an EIP is in the disabled or inactive state, the evaluation result is Non-compliant.

Network Services

NAT gateways with abnormal processing levels exist

This check inspects the processing level of NAT gateways, including concurrent connections, new connection rate, and traffic throughput, to identify network risks. A NAT gateway is non-compliant if alerts for "NAT session limit exceeded," "NAT new session limit exceeded," or "SNAT source port allocation failure" were triggered during the last inspection period, or if the traffic processing rate is too high.

Network Services

VPN services with abnormal load levels exist

Checks VPN gateway loads, bandwidth usage risks, and BGP route advertisement overage frequency. A VPN instance is non-compliant if SSL connection count is too high, client network segment addresses are insufficient, BGP route count exceeds limits, or bandwidth exceeds limits during the last inspection interval.

Network Services

ALB virtual IPs with abnormal processing levels exist

Checks ALB VIP loads including sessions, connections, QPS, and bandwidth. An ALB instance is non-compliant if session limit, connection failure surge, QPS limit, or bandwidth limit alerts were triggered during the last inspection interval.

Network Services

NLB virtual IPs with abnormal processing levels exist

Checks NLB VIP loads including new and concurrent connections. An NLB instance is non-compliant if failed connection surge, new connection drop, new connection limit exceeded, or concurrent connection limit exceeded alerts were triggered during the last inspection interval.

Network Services

VBR resources with abnormal BGP connection status exist

Checks the status of BGP connections established over Express Connect circuits and the frequency of Express Connect circuit failures within an inspection cycle. This helps you monitor the quality of leased lines and identify stability risks early. If a BGP connection failure alert was triggered during the most recent inspection interval, the evaluation result is Non-compliant.

Network Services

CLB instances with abnormal processing levels exist

Checks CLB instance loads including sessions, connections, and bandwidth. A CLB instance is non-compliant if bandwidth limit packet loss, session limit connection loss, or connection failure surge alerts were triggered during the last inspection interval.

Network Services

TR route configuration risks exist

Checks whether the number of routes in the route table of a Basic Edition transit router (TR) is approaching the quota. When the quota is reached, no more routes can be added to the route table, which may cause network failures. A Basic Edition TR is non-compliant if the number of routes in its route table has reached 80% of the quota.

Network Services

VBRs without health checks configured exist

This check covers VBRs that have static routes pointing to on-premises resources but no health checks configured. Without health checks, traffic cannot be switched over automatically when an Express Connect circuit fails. A VBR is non-compliant if health checks are not configured for it on CEN or on the VBR upstream connection.

Network Services

VBRs with missing redundancy exist

This check inspects the integrity of VBR redundancy to identify stability risks. A VBR is non-compliant if redundant connections are not configured for some or all network segments between the VPC and VBR, or between the Transit Router (TR) on Cloud Enterprise Network (CEN) and the VBR.

Network Services

Physical Express Connect circuits with port abnormalities exist

Checks the status of Express Connect circuits and the frequency of BGP connection failures within an inspection cycle. This helps you monitor the quality of leased lines and identify stability risks early. If an Express Connect circuit port or link failure alert was triggered during the most recent inspection interval, the evaluation result is Non-compliant.

Network Services

EIPs with abnormal bandwidth levels exist

Checks EIP bandwidth usage and packet loss frequency. An EIP is non-compliant if bandwidth limit warnings or packet loss alerts were triggered during the last inspection interval.

Network Services

Cross-region bandwidth with abnormal levels exists

Checks CEN inter-region bandwidth usage and packet loss frequency. A cross-region connection is non-compliant if, during the last inspection interval, packet loss alerts caused by exceeding the bandwidth limit were triggered or traffic scheduling queues exceeded the bandwidth limit.

Data Protection

SQL audit logging not enabled for high-spec database instances

Audit logs provide a comprehensive record of database operations, which can be used for diagnosing operational failures and meeting regulatory compliance requirements. This is considered a risk if SQL audit logging is not enabled for a high-specification database instance (defined as an instance with ≥ 4 vCPU/8 GiB, or an instance belonging to an account in the finance industry).

Data Protection

Security auditing not enabled for a high-spec database instance

Security auditing detects risks such as data exfiltration, SQL injection, and abnormal access to protect data assets. This is considered a risk if security auditing is not enabled for a high-specification database instance (defined as an instance with ≥ 4 vCPU/8 GiB, or with total SQL logs > 100 GB/day).

Data Protection

Unified security collaboration not enabled for a high-spec database instance

Enabling security collaboration helps prevent non-standard operations during database changes from affecting stability. This check applies to high-specification instances (≥ 4 vCPU/8 GiB).

Data Protection

Cross-region disaster recovery not enabled for a database instance

Databases may become unavailable during a regional outage or failure. To check if disaster recovery is enabled, verify whether Data Transmission Service (DTS) synchronization is configured for the database instance in the DTS console.

Data Protection

Excessive slow SQL queries on a database instance

Analyzing slow SQL queries is an effective method for identifying database performance issues. Slow SQL queries can consume excessive CPU, I/O, or execution time, and can also lock resources needed by other queries, potentially causing service instability. An instance is considered at risk if it has more than 100 slow SQL queries within the last 24 hours. Instances with no account password or with a QPS below 50 are not applicable.

Data Protection

Sensitive data protection not enabled for a high-spec database instance

Enabling sensitive data protection helps implement dynamic security for sensitive data, reducing the risk of data leaks and non-compliance.

Data Protection

Cross-region backup not enabled for a database instance

Databases may face the risk of data loss during a regional outage or failure. Verify the status of the cross-region backup feature on the Backup and Restoration page of the instance details in the database console.
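Several check items above reduce to simple threshold rules. As a rough illustration only, the following Python sketch expresses a few of them as predicates. It is not how Agentic Cloud Governance Center implements its checks: the `Eip` and `DbInstance` classes, their fields, and all function names are hypothetical, and "≥ 4 vCPU/8 GiB" is assumed to mean at least 4 vCPUs and at least 8 GiB of memory. Only the thresholds (7 days, 180 days, three zones, 80%, 100 slow queries, QPS 50) come from the table.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical, simplified records. Field names are illustrative
# assumptions, not the data model used by the console.


@dataclass
class Eip:
    bound_instance_id: Optional[str]  # None if the EIP is not bound to any instance
    created_at: datetime


@dataclass
class DbInstance:
    vcpus: int
    memory_gib: float
    finance_industry: bool      # the owning account belongs to the finance industry
    has_account_password: bool  # an account password is configured for the instance
    qps: float
    slow_sql_last_24h: int      # slow SQL queries in the last 24 hours


def is_idle_eip(eip: Eip, now: datetime) -> bool:
    """Idle EIP item: not bound to an instance and created more than 7 days ago."""
    return eip.bound_instance_id is None and now - eip.created_at > timedelta(days=7)


def image_is_recent(image_created_at: datetime, now: datetime, max_age_days: int = 180) -> bool:
    """ECS image item: compliant if the image was created within max_age_days."""
    return now - image_created_at <= timedelta(days=max_age_days)


def ack_spans_enough_zones(node_zones: list[str]) -> bool:
    """ACK zone item: compliant if nodes are spread across three or more zones."""
    return len(set(node_zones)) >= 3


def tr_route_quota_at_risk(route_count: int, route_quota: int) -> bool:
    """TR route item: at risk once routes reach 80% of the quota.

    Integer arithmetic (count * 10 >= quota * 8) avoids floating-point rounding.
    """
    return route_count * 10 >= route_quota * 8


def is_high_spec_for_sql_audit(db: DbInstance) -> bool:
    """SQL audit item: >= 4 vCPUs and >= 8 GiB memory, or a finance-industry account."""
    return (db.vcpus >= 4 and db.memory_gib >= 8) or db.finance_industry


def slow_sql_result(db: DbInstance) -> str:
    """Slow SQL item: more than 100 slow queries in 24 hours is a risk."""
    if not db.has_account_password or db.qps < 50:
        return "Not Applicable"
    return "At risk" if db.slow_sql_last_24h > 100 else "No risk"


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    print(is_idle_eip(Eip(None, now - timedelta(days=10)), now))  # True
    print(tr_route_quota_at_risk(route_count=80, route_quota=100))  # True
    db = DbInstance(vcpus=8, memory_gib=32, finance_industry=False,
                    has_account_password=True, qps=200, slow_sql_last_24h=150)
    print(is_high_spec_for_sql_audit(db), slow_sql_result(db))  # True At risk
```

Keeping each rule as a pure function of resource attributes makes the thresholds easy to test in isolation. How the service actually collects those attributes is not covered by this sketch.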

Lens check results

Agentic Cloud Governance Center runs daily checks across all lenses. View results and follow the remediation guidance to address risks.

  1. Log in to the Agentic Cloud Governance Center console.

  2. In the left-side navigation pane, choose Excellent Architecture > Governance Maturity Check.

  3. In the top navigation bar, switch to any lens to view its check results.

    The following example uses the Machine Learning lens.

    The results page displays a summary of total check items, high-risk items, medium-risk items, and recommendations, along with a donut chart. You can switch to the standard view, filter by category (Security, Stability, or All), and add filter tags such as risk status. The main section lists check items with their name, scenario, risk level, affected resources, compliance rate, and actions.

    Note

    Click Re-detect to manually trigger a new check for the lens and obtain the latest results.

  4. Click a risk item to view check details and remediation guidance in the detection details panel.