Get alerts for issue events

The Network Intelligence Service (NIS) event center sends proactive alerts so you can identify risks, view potentially affected resources, and prevent business interruptions.

Use cases

NIS events record and notify you about cloud network resources, including O&M task execution, resource issues, and status changes.

Risk and issue notifications

When an event impairs the availability or performance of an instance, Alibaba Cloud pushes the event to the event center in the NIS console. Examples include performance degradation from usage beyond specifications, service unavailability from ISP link packet loss, and an expiring instance alert. Respond promptly to prevent business interruptions.

Automated O&M

Each event in the NIS console has a defined status for tracking related system O&M tasks. When an event is generated or its status changes, the system reports it to CloudMonitor, enabling you to build an event-driven automated O&M system.
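As a rough illustration of event-driven automation, the sketch below routes an incoming event to a follow-up action by its event code. The payload shape (the `code` and `status` keys), the `route_event` function, and the rule that only In Progress events need handling are assumptions made for illustration, not the CloudMonitor schema. The event codes and action texts are copied from the Event summary tables in this topic.

```python
from typing import Optional

# Recommended actions copied from the Event summary tables for a few event codes.
RECOMMENDED_ACTIONS = {
    "problem-internetBandwidthOverlimit": "Upgrade the instance to increase the peak bandwidth.",
    "problem-clb-bandwidthOverLimit": "Upgrade the instance specification.",
    "risk-ec-bgpRouterFail": "Contact your business manager for assistance.",
}

# Fallback for event codes that have no automated mapping in this sketch.
MANUAL_REVIEW = "Review the event in the NIS console."


def route_event(event: dict) -> Optional[str]:
    """Return a follow-up action for an event, or None if nothing should be done.

    `event` is a hypothetical payload with "code" and "status" keys; the real
    payload delivered by CloudMonitor may use different field names.
    """
    # Assumption: only events that are still in progress need handling.
    if event.get("status") != "In Progress":
        return None
    return RECOMMENDED_ACTIONS.get(event.get("code"), MANUAL_REVIEW)
```

In a real pipeline the returned string would more likely be replaced by a callable that opens a ticket or triggers a scaling job. The lookup table keeps the mapping easy to audit against the Event summary.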

Limitations

NIS does not support the event feature for instance families that are no longer available for purchase. For more information, see the end-of-sale announcements for each cloud service.

Basics

Event types

Events record information about cloud network resources and are categorized by cause as follows:

Category	Description	Example
issue event	An exceptional event that has already caused business impact and has remained in the In Progress state for seven days.	Packet loss due to excess bandwidth usage; instance shutdown due to overdue payments
risk event	An exceptional event that may cause business impact and has remained in the In Progress state for seven days.	Risk of business impact due to packet loss on a physical link; risk of failure due to sudden spikes or drops in bandwidth usage; risk of instance shutdown due to overdue payments
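The event codes listed in the Event summary follow a visible naming pattern: issue event codes begin with `problem-` and risk event codes begin with `risk-`. The helper below is a minimal sketch that derives the category from that prefix. Treating the prefix as a reliable signal is an assumption based only on the codes listed in this topic, and `event_category` is a hypothetical name.

```python
def event_category(event_code: str) -> str:
    """Map an NIS event code to its category based on the code prefix."""
    if event_code.startswith("problem-"):
        return "issue event"
    if event_code.startswith("risk-"):
        return "risk event"
    # Codes outside the two documented prefixes are left unclassified.
    return "unknown"
```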

Event levels

Events are classified into the following levels based on their impact on instance operations:

Critical: Significant impact requiring immediate action to prevent the instance from becoming unavailable.
Warn: Moderate impact. Monitor the event while it persists or handle it at an appropriate time.
Info: No immediate action required.
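If you process many events at once, the three levels give a natural triage order. The sketch below sorts events so that Critical ones come first. The `level` key and the `triage` function are hypothetical; only the level names come from this topic.

```python
# Lower number means higher urgency, following the order of the list above.
LEVEL_PRIORITY = {"Critical": 0, "Warn": 1, "Info": 2}


def triage(events: list) -> list:
    """Return events ordered from most to least urgent.

    Events with an unrecognized level are placed last. The sort is stable,
    so events of the same level keep their original relative order.
    """
    return sorted(
        events,
        key=lambda event: LEVEL_PRIORITY.get(event.get("level"), len(LEVEL_PRIORITY)),
    )
```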

Note

For more information about event codes, names, descriptions, and recommended actions, see Event summary.

Event summary

The following tables list the events supported by NIS and the recommended actions for each event.

Note

Issue events do not support monitoring for shared-resource CLB instances.

Issue events

Event code	Event name	Event level	CloudMonitor event name	CloudMonitor metric name	Description and impact	Alert rules	Recommended action
Internet-facing instance
problem-internetBandwidthOverlimit	Packet loss due to excess bandwidth usage	Critical	Packet loss due to excess instance bandwidth usage	`net_out.rate_percentage` (outbound bandwidth utilization), `out_ratelimit_drop_speed` (outbound rate-limiting packet drop rate), `net_tx.rate` (outbound bandwidth)	The bandwidth usage of an Internet-facing instance has exceeded its specification, causing packet loss. Internet-facing instances include elastic IP address (EIP) instances, bandwidth plans, and Classic Load Balancer (CLB) instances.	Critical: Bandwidth usage frequently exceeds the limit over the last 10 minutes, causing packet loss.	Upgrade the instance to increase the peak bandwidth.
NAT gateway
problem-nat-sessionOverLimit	Connection drop caused by excess NAT sessions	Critical	Connection drop caused by excess NAT sessions	`EniSessionLimitDropConnection` (interface concurrent connection drop rate), `EniSessionActiveConnection` (interface concurrent connections)	The session count on the NAT gateway exceeds its specification, causing new sessions to fail with a packet loss rate above 100 packets/s.	Critical: The number of concurrent sessions frequently exceeds the limit over the last 10 minutes, and the packet loss rate is greater than 100 packets/s.	Upgrade the specification or use multiple NAT gateway instances. For more information, see Manage NAT Gateway quotas and Internet NAT gateway and Create and manage a VPC NAT Gateway instance.
problem-nat-sessionNewOverLimit	Connection drop caused by excess new NAT sessions	Critical	Connection drop caused by excess new NAT sessions	`EniSessionNewLimitDropConnection` (interface new connection drop rate), `EniSessionNewConnection` (interface new connection rate)	The new session rate on the NAT gateway exceeds its specification, causing new sessions to fail with a packet loss rate above 100 packets/s.	Critical: The number of new sessions frequently exceeds the limit over the last 10 minutes, and the packet loss rate is greater than 100 packets/s.
problem-nat-portAllocationError	Allocation failure of SNAT source ports	Critical	Allocation failure of SNAT source ports	`ErrorPortAllocationRate` (rate of port allocation failures)	Too few EIPs or IP addresses are configured for the NAT gateway, causing source port allocation to fail with a packet loss rate above 10 packets/s. Note: You cannot create a subscription for this event.	Critical: Source port allocation frequently fails over the last 10 minutes, and the packet loss rate is greater than 10 packets/s.	Associate more EIPs or IP addresses with the NAT gateway instance. For more information, see Create and manage a VPC NAT Gateway instance.
problem-nat-datapathUnavailable	NAT gateway data path unavailable	Critical	NAT gateway data path unavailable	Not applicable (system availability event)	The NAT gateway data path is unavailable. Availability was 0% in the past 10 minutes, meaning all traffic is affected. This may be due to a platform-level event. Alibaba Cloud engineers are working to resolve the issue.	Critical: The availability of the NAT gateway was 0% in the last 10 minutes.	If you have deployed multiple NAT gateways for high availability, we recommend that you switch to another NAT gateway. For more information, see Deploy multiple NAT gateways to implement high availability. Otherwise, contact Alibaba Cloud engineers to get the latest recovery progress.
problem-nat-datapathDegraded	NAT gateway data path degraded	Critical	NAT gateway data path degraded	Not applicable (system availability event)	The NAT gateway data path is degraded. Availability was below 80% in the past 10 minutes, meaning more than 20% of traffic is affected. This may be due to a platform-level event causing packet drops. Alibaba Cloud engineers are working to resolve the issue.	Critical: The availability of the NAT gateway was less than 80% in the last 10 minutes, causing packet loss.
Classic Load Balancer (CLB)
problem-clb-connectionOverLimit	Dropped new connections due to excess CLB sessions	Critical	Dropped new connections due to excess CLB sessions	`InstanceDropConnection` (dropped connections per second for an instance)	The new or concurrent connection count on a CLB instance exceeds its specification, causing new sessions to fail with a high rate of dropped connections.	Critical: The number of concurrent sessions frequently exceeds the limit over the last 10 minutes, causing packet loss.	Upgrade the instance or switch to a Network Load Balancer (NLB) or Application Load Balancer (ALB) instance. For more information, see Manage CLB quotas. For product details about NLB and ALB, see What is Network Load Balancer (NLB)? and What is Application Load Balancer (ALB)?.
problem-clb-bandwidthOverLimit	Packet loss due to excess CLB bandwidth usage	Critical	Packet loss due to excess CLB bandwidth usage	`InstanceDropTrafficRX` (inbound bits dropped per second for an instance)	The traffic of a CLB instance exceeds its bandwidth specification, causing packet loss.	Critical: Bandwidth usage frequently exceeds the specification over the last 10 minutes, and the drop rate is greater than 100 bps.	Upgrade the instance specification. For more information, see Adjust the specifications of performance-guaranteed instances.
problem-clb-connectionFail	Sharp increase in failed CLB connections	Critical	Sharp increase in failed CLB connections	Not supported by CloudMonitor	The failed connection count on the CLB instance has sharply increased. Possible causes include backend server specification being exceeded, high load, or a service exception.	Critical: The number of failed new connections for the CLB instance has sharply increased over the last 10 minutes. An alert is triggered if all of the following conditions are met: Condition 1: The number of failed connections is greater than 100/s. Condition 2: The number of failed connections increases by 30% compared with the previous 10-minute window. Condition 3: Based on an intelligent baseline learned from historical data, the number of failed connections continuously exceeds the upper baseline limit by more than 30% within a 10-minute period.	Depending on the cause, upgrade the backend server specification, upgrade the CLB specification, or check the backend service status. For more information, see Manage CLB quotas and Diagnose a CLB instance.
NLB
problem-nlb-connectionFail	Sharp increase in failed NLB connections	Critical	Sharp increase in failed NLB connections	Not supported by CloudMonitor	The failed connection count on a virtual IP address (VIP) of the NLB instance has sharply increased for 10 consecutive minutes. Possible causes: network link jitter or insufficient backend server performance.	Critical: An alert is triggered if the number of failed connections on the NLB instance meets all of the following conditions: Condition 1: Within a 610-second monitoring window, the number of failed connections exceeds the intelligent forecast baseline by more than 100% for 3 consecutive minutes. Condition 2: Within a 610-second monitoring window, the number of failed connections increases by 50% or more compared with the previous hour for 7 consecutive minutes. Condition 3: Within a 610-second monitoring window, the number of failed connections is 1,000 or more for 8 consecutive minutes.	Check if the backend server resources or service status are normal. For more information, see Diagnose an NLB instance.
problem-nlb-newConnectionSurge	Dropped new NLB connections	Critical	Dropped new NLB connections	`VipDropConnection` (dropped connections per second for a VIP), `VipNewConnection` (new connections per second for a VIP)	Due to a surge in new connections, the VIP of the NLB instance continuously drops new connection requests at millisecond or second intervals.	Critical: An alert is triggered if the number of connections on the NLB instance meets all of the following conditions: Condition 1: Within 10 minutes, there are more than 8 data points where the number of connections dropped by the VIP per second is greater than 0. Condition 2: Within 10 minutes, there are more than 8 data points where the number of new connections established by the VIP per second is less than 200,000.	Distribute traffic across multiple NLB instances or contact your account manager to apply for a quota increase.
problem-nlb-newConnectionOverLimit	Excess new NLB connections	Critical	Excess new NLB connections	`VipDropConnection` (dropped connections per second for a VIP), `VipNewConnection` (new connections per second for a VIP)	The new connection count on the VIP of the NLB instance has exceeded the automatic scaling limit for a single VIP, causing new connection requests to be continuously dropped.	Critical: An alert is triggered if the number of connections on the NLB instance meets all of the following conditions: Condition 1: Within 10 minutes, there are more than 8 data points where the number of connections dropped by the VIP per second is greater than 0. Condition 2: Within 10 minutes, there are more than 8 data points where the number of new connections established by the VIP per second is 200,000 or more.
problem-nlb-concurrentConnectionOverLimit	Excess concurrent NLB connections	Critical	Excess concurrent NLB connections	`VipDropConnection` (dropped connections per second for a VIP), `VipMaxConnection` (maximum concurrent connections for a VIP)	The concurrent connection count on the VIP of the NLB instance has exceeded the automatic scaling limit for a single VIP, causing new connection requests to be continuously dropped.	Critical: An alert is triggered if the number of connections on the NLB instance meets all of the following conditions: Condition 1: Within 10 minutes, there are more than 8 data points where the number of connections dropped by the VIP per second is greater than 0. Condition 2: Within 10 minutes, there are more than 8 data points where the maximum number of concurrent connections on the VIP is greater than 5,000,000.
ALB
problem-alb-intranetBandwidthOverLimit	Packet loss due to excess private bandwidth usage of ALB instances	Critical	Packet loss due to excess private bandwidth usage of ALB instances	Not supported by CloudMonitor	The outbound or inbound bandwidth on the VIP of the ALB instance has reached its limit. Each VIP resolved from an ALB domain name has a bandwidth limit.	Critical: Within 10 minutes, there are more than 8 data points where the traffic dropped by the ALB instance is greater than 100 bps.	Add a CNAME record for the ALB instance. For more information, see Add a CNAME record for an ALB instance.
problem-alb-sessionOverLimit	Dropped new connections due to excess ALB sessions	Critical	Dropped new connections due to excess ALB sessions	`LoadBalancerRejectedConnection` (dropped connections per second for a load balancer instance)	The new or concurrent connection count on the VIP of the ALB instance exceeds the limit, causing new sessions to fail. Each VIP resolved from an ALB domain name has a new connection limit.	Critical: Within 10 minutes, there are more than 8 data points where the number of connections dropped by the ALB instance per second is greater than 0.
problem-alb-qpsOverLimit	503 error code returned because QPS exceeds the limit	Critical	503 error code returned because QPS exceeds the limit	Not supported by CloudMonitor	The queries per second (QPS) on the VIP of the ALB instance has reached the VIP limit. Each VIP resolved from an ALB domain name has a QPS limit.	Critical: Within 10 minutes, there are more than 8 data points where the number of requests dropped per second is greater than 200 qps, and for 10 consecutive minutes, the number of dropped requests per second increases by 30% or more compared with 7 minutes earlier.
Cloud Enterprise Network (CEN)
problem-cen-routeOverLimit	Excess CEN routes	Critical	Excess CEN routes	Not applicable (event-based metric)	The CEN route quota is exceeded, potentially causing network issues.	Critical: The CEN route quota is exceeded, causing network issues.	Upgrade the Transit Router (TR). For more information, see Upgrade a basic transit router to an enterprise transit router.
TR
problem-cen-vpcAttachBandwidthOverLimit	Packet loss due to excess VPC connection bandwidth	Critical	Packet loss due to excess VPC connection bandwidth	Not supported by CloudMonitor	The traffic of a CEN transit router exceeds the bandwidth specification, causing packet loss.	Critical: Within 10 minutes, there are more than 5 data points where the inbound packet loss rate is greater than 0.	Increase the bandwidth limit. For more information, see Manage CEN quotas.
problem-cen-peerAttachBandwidthOverLimit	Packet loss due to excess inter-region connection bandwidth	Critical	Packet loss due to excess inter-region connection bandwidth	`InterRegionRateLimitDropPackets` (outbound rate-limiting packet drop rate for inter-region connections), `InterRegionPeakBandwidthUtilization` (peak outbound bandwidth utilization for inter-region connections)	The traffic of a CEN transit router exceeds the bandwidth specification, causing packet loss.	Critical: An alert is triggered if the actual traffic of the TR instance meets all of the following conditions: Condition 1: Within 10 minutes, there are more than 8 data points where the peak outbound bandwidth utilization is 90% or higher. Condition 2: Within 10 minutes, there are more than 8 data points where the outbound rate-limiting packet drop rate is greater than 100 pps.	Increase the bandwidth limit. For more information, see Manage CEN quotas.
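Several alert rules in the table above use the same pattern: within 10 minutes, more than 8 data points must satisfy a threshold. The sketch below applies that pattern to the two NLB new-connection events, which share Condition 1 and differ only in whether the new connection rate is below or at least 200,000 per second. It assumes one data point per minute (10 points per window), which this topic does not state. The function names are hypothetical, and this is not how NIS itself evaluates the rules.

```python
from typing import Callable, Optional, Sequence

NEW_CONNECTION_LIMIT = 200_000  # new connections per second, from the alert rules
MIN_MATCHING_POINTS = 8         # rules require "more than 8 data points"


def count_where(values: Sequence[float], predicate: Callable[[float], bool]) -> int:
    """Count the data points that satisfy the predicate."""
    return sum(1 for value in values if predicate(value))


def classify_nlb_new_connection_event(
    dropped_per_sec: Sequence[float],
    new_conn_per_sec: Sequence[float],
) -> Optional[str]:
    """Return the matching event code for one 10-minute window, or None.

    Assumption: each sequence holds one sample per minute for the same window.
    """
    # Condition 1 (shared by both events): the VIP drops connections
    # in more than 8 data points.
    if count_where(dropped_per_sec, lambda v: v > 0) <= MIN_MATCHING_POINTS:
        return None
    # Condition 2 separates the two events by the new connection rate.
    if count_where(new_conn_per_sec, lambda v: v >= NEW_CONNECTION_LIMIT) > MIN_MATCHING_POINTS:
        return "problem-nlb-newConnectionOverLimit"
    if count_where(new_conn_per_sec, lambda v: v < NEW_CONNECTION_LIMIT) > MIN_MATCHING_POINTS:
        return "problem-nlb-newConnectionSurge"
    return None
```

With 10 samples per window the two Condition 2 checks cannot both pass, so the order of the checks only matters if you feed in finer-grained data.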

Risk events

Event code	Event name	Event level	CloudMonitor event name	CloudMonitor metric name	Description and impact	Alert rules	Recommended action
Internet-facing instance
risk-internetPacketLoss	Risk of Internet link packet loss	Warn	Risk of Internet link packet loss	Not applicable (Internet link probe event)	Probing has detected packet loss on the physical link from the Alibaba Cloud {Region} to {Country} - {Area} - {ISP}. Traffic on this link in your account may experience jitter.	Critical: An alert is triggered if either of the following conditions is met: Condition 1: The detected packet loss rate of a regional-level ISP link is greater than 50%. Condition 2: Packet loss is detected on a national-level ISP link, and the average bandwidth of the traffic on this link within your account is 0.05 Mbps or higher in the last 10 minutes. Note: A regional-level link is a physical link to {Country}-{Area}-{ISP}; a national-level link is a physical link to {Country}-{ISP}. Warn: The Internet link packet loss rate is less than 50%, and the average bandwidth is greater than 0.5 Mbps in the last 10 minutes.	Check whether the instance bandwidth on this link meets your business requirements (you can refer to the 5-tuple data in traffic analysis). If there is an issue, consider migrating critical services to another region. If not, you can ignore this alert.
risk-internetBandwidthOverlimit	Packet loss risk due to excess bandwidth usage	Warn	Packet loss risk due to excess bandwidth usage	`net_out.rate_percentage` (outbound bandwidth utilization)	Historical data indicates a >90% probability that the instance bandwidth usage will exceed its specification.	Warn: There is a greater than 90% probability that traffic will exceed the specification at a certain time, causing packet loss.	Monitor the usage. If the specification is exceeded, consider upgrading the instance specification.
VPN Gateway
risk-vpn-bpsOverLimit	Risk of excess VPN bandwidth usage	Warn	Risk of excess VPN bandwidth usage	`in_bandwidth_utilization` (inbound bandwidth utilization of the VPN gateway), `out_bandwidth_utilization` (outbound bandwidth utilization of the VPN gateway)	The bandwidth utilization of the VPN instance exceeded 90% three times in the last 10 minutes.	Warn: Within 10 minutes, there are more than 3 data points where the bandwidth utilization is greater than 90%.
risk-vpn-bgpRouteLimit	Risk of excess BGP routes	Warn	Risk of excess BGP routes	Not supported by CloudMonitor	The BGP dynamic route count learned by the VPN instance exceeded 90% of its BGP route quota in the last 10 minutes.	Warn: Within 10 minutes, there is more than 1 data point where the route utilization is greater than 90%.	Monitor BGP route usage. If the quota is nearly full, consider route aggregation on the peer VPN Gateway based on your network plan.
Express Connect
risk-ec-physicalConnectionFail	Express Connect circuit or port failure	Warn	Express Connect circuit or port failure	Not applicable (link status event)	A failure in the ISP physical Express Connect circuit or a device port failure caused a service interruption.	Warn: The inbound traffic rate (from the data center to the VPC) of the VBR instance is monitored at a minute-level granularity. An alert is triggered if all of the following conditions are met: Condition 1: 3 ≤ number of Express Connect port down events < 20. Condition 2: The Express Connect port is down for more than 2 consecutive data points. Condition 3: Not all Express Connect ports are in a down state.	Contact your business manager for assistance.
risk-ec-bgpRouterFail	BGP connection failure	Warn	BGP connection failure	Not applicable (BGP connection status event)	A network connectivity failure on the physical Express Connect circuit or an abnormal BGP configuration caused a BGP connection failure and route loss.	Warn: An alert is triggered if the BGP connection status changes from Connected to any other state.	Contact your business manager for assistance.
risk-ec-inTrafficDroppedToZero	Sharp drop in inbound VBR traffic	Warn	Sharp drop in inbound VBR traffic	`RateInFromIDCToVpc` (inbound traffic rate from data center to VPC)	A failure in the ISP physical Express Connect circuit or a device port failure caused the inbound traffic of the virtual border router (VBR) to drop sharply.	Warn: The inbound traffic rate (from the data center to the VPC) of the VBR instance is monitored at a minute-level granularity. An alert is triggered if all of the following conditions are met: Condition 1: For 3 consecutive minutes, the rate per minute drops by ≥ 99% compared to the average rate of the previous 7 minutes. Condition 2: For 3 consecutive minutes, the absolute value of the rate drop per minute is ≥ 1 Mbps compared to the average rate of the previous 7 minutes. Condition 3: For 3 consecutive minutes, the absolute value of the rate drop per minute is ≥ 0.5 Mbps compared to the average rates of the previous 15, 30, and 60 minutes. Condition 4 (Intelligent baseline alert): An intelligent baseline learns the historical patterns of the VBR instance's inbound traffic rate to predict a stable range for the next cycle. If the rate drops below the predicted lower bound by ≥ 99% for 2 consecutive minutes within a 3-minute period when the cycle begins, it is considered an abnormal drop.	Check whether this is normal business traffic behavior or if a health check switchover has occurred. If your business is impacted, contact your business manager for assistance.
risk-ec-outTrafficDroppedToZero	Sharp drop in outbound VBR traffic	Warn	Sharp drop in outbound VBR traffic	`RateOutFromVpcToIDC` (outbound traffic rate from VPC to data center)	A failure in the ISP physical Express Connect circuit or a device port failure caused the outbound traffic of the VBR to drop sharply.	Warn: The outbound traffic rate (from the VPC to the data center) of the VBR instance is monitored at a minute-level granularity. An alert is triggered if all of the following conditions are met: Condition 1: For 3 consecutive minutes, the rate per minute drops by ≥ 99% compared to the average rate of the previous 7 minutes. Condition 2: For 3 consecutive minutes, the absolute value of the rate drop per minute is ≥ 1 Mbps compared to the average rate of the previous 7 minutes. Condition 3: For 3 consecutive minutes, the absolute value of the rate drop per minute is ≥ 0.5 Mbps compared to the average rates of the previous 15, 30, and 60 minutes. Condition 4 (Intelligent baseline alert): An intelligent baseline learns the historical patterns of the VBR instance's outbound traffic rate to predict a stable range for the next cycle. If the rate drops below the predicted lower bound by ≥ 99% for 2 consecutive minutes within a 3-minute period when the cycle begins, it is considered an abnormal drop.	Check whether this is normal business traffic behavior or if a health check switchover has occurred. If your business is impacted, contact your business manager for assistance.
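Conditions 1 to 3 of the two VBR traffic-drop rules above can be approximated with a short check over per-minute rate samples. The sketch below is one possible reading, not the NIS implementation. It assumes that the baselines are averages over the minutes immediately before the 3-minute window, and that rates are given in Mbps with the oldest sample first. It omits Condition 4 (the intelligent baseline), and `sharp_drop` is a hypothetical name.

```python
from statistics import mean
from typing import Sequence

WINDOW_MINUTES = 3             # "for 3 consecutive minutes"
LONG_BASELINES = (15, 30, 60)  # baselines used by Condition 3, in minutes


def baseline(rates_mbps: Sequence[float], minutes: int) -> float:
    """Average rate over the `minutes` samples that precede the 3-minute window."""
    history = rates_mbps[:-WINDOW_MINUTES]
    return mean(history[-minutes:])


def sharp_drop(rates_mbps: Sequence[float]) -> bool:
    """Check Conditions 1-3 on per-minute rates (oldest first, newest last)."""
    if len(rates_mbps) < WINDOW_MINUTES + max(LONG_BASELINES):
        raise ValueError("need at least 63 per-minute samples")

    avg_7 = baseline(rates_mbps, 7)
    long_averages = [baseline(rates_mbps, m) for m in LONG_BASELINES]

    for rate in rates_mbps[-WINDOW_MINUTES:]:
        drop = avg_7 - rate
        # Condition 1: the rate drops by at least 99% against the 7-minute average.
        if avg_7 <= 0 or drop / avg_7 < 0.99:
            return False
        # Condition 2: the absolute drop against the 7-minute average is at least 1 Mbps.
        if drop < 1.0:
            return False
        # Condition 3: the absolute drop is at least 0.5 Mbps against each longer average.
        if any(avg - rate < 0.5 for avg in long_averages):
            return False
    return True
```

The absolute thresholds keep low-traffic links from alerting: a link that averages 0.5 Mbps and falls to zero meets the 99% condition but not the 1 Mbps condition.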

Related operations

Actions	Description and references
View events	View events in the NIS console (see View NIS issue events) or in the CloudMonitor console (see View system events).
Subscribe to events	Subscribe to events in CloudMonitor to receive notifications about new events and status updates by phone, text message, or email. For more information, see Configure NIS event subscriptions.
Handle events	After viewing an event, resolve the issue based on the provided recommendations. For more information, see Event summary.