Use MTR to analyze network links


If packet loss or network connectivity issues occur during a ping test, you can use a link testing tool to identify the cause. This topic describes how to use the MTR tool for link tests and analyze the results.

Test process

The typical link test process is shown in the following figure.

Note
  • You can visit a website such as IP Address Query - IPLark to obtain the public IP address of your on-premises network.

  • The client refers to the public egress IP address of the local client.

  • The destination server refers to the domain name or public IP address of the destination service.

Tool overview

MTR is a network diagnostic tool that combines the features of ping and traceroute. Unlike traceroute, which traces a link only once, mtr continuously probes each node on the link and collects statistics. Because the results are aggregated over many probes, transient fluctuations at individual nodes have less effect on them, which makes mtr results more accurate.

On Linux systems, you can use mtr by installing the mtr package. On Windows systems, you can use WinMTR. The following sections describe how to install and use these two tools.

mtr (Linux)

Installation

Alibaba Cloud Linux 3/2

sudo yum install mtr

CentOS 6/7/8

sudo yum install mtr

Ubuntu/Debian

sudo apt install mtr
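The per-distribution commands above can be wrapped in a small helper that picks whichever package manager it finds. The following is a minimal sketch that assumes the host uses either yum (Alibaba Cloud Linux, CentOS) or apt (Ubuntu, Debian). The function name mtr_install_cmd is hypothetical, and the helper only prints the command so that you can review it before running it.

```shell
# Hypothetical helper: print the mtr installation command for this host.
# Assumption: the host is yum-based (Alibaba Cloud Linux, CentOS) or apt-based
# (Ubuntu, Debian). The command is printed, not executed.
mtr_install_cmd() {
  if command -v yum >/dev/null 2>&1; then
    echo "sudo yum install -y mtr"
  elif command -v apt >/dev/null 2>&1; then
    echo "sudo apt install -y mtr"
  else
    # Neither package manager was found in PATH.
    echo "no supported package manager (yum or apt) found" >&2
    return 1
  fi
}

# Usage: review the printed command, then run it, for example:
#   eval "$(mtr_install_cmd)"
```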

Usage

Command format

The mtr command uses the following format, where hostname is the domain name of the service and ip is the public IP address of the service:

mtr [options] hostname/ip

Parameters

The following list describes common optional parameters. To view more parameter descriptions, run the man mtr command.

  • -r or --report: Displays the output in report mode.

  • -p or --split: Lists the results of each link trace separately.

  • -s or --psize: Specifies the size of probe packets, in bytes.

  • -n or --no-dns: Does not resolve IP addresses to domain names.

  • -a or --address: Sets the source IP address from which to send packets. Use this parameter when the host has multiple IP addresses.

  • -4: Uses only the IPv4 protocol.

  • -6: Uses only the IPv6 protocol.

Note

After you run the mtr command without the -r parameter, mtr enters interactive mode by default. In this mode, press ? or h to display the help menu, and follow the instructions in the help menu to control mtr or switch display views.

Usage example

Diagnose the network using the IPv4 protocol.

sudo mtr -4 www.aliyun.com
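Because mtr opens an interactive view by default, report mode can be more convenient when you want to save the results or share them with technical support. The following sketch wraps the parameters described above in a hypothetical helper named mtr_report. The default of 100 probes per hop and the output file name are illustrative assumptions, not requirements.

```shell
# Hypothetical helper: run a non-interactive mtr test and save the report.
#   -4  use IPv4 only
#   -r  report mode: probe, print the statistics, and exit
#   -n  do not resolve IP addresses to domain names
#   -c  number of probes sent to each hop (assumed default here: 100)
mtr_report() {
  local target="$1"
  local cycles="${2:-100}"
  local outfile="${3:-mtr_$target.txt}"
  sudo mtr -4 -r -n -c "$cycles" "$target" > "$outfile"
}

# Usage:
#   mtr_report www.aliyun.com            # writes mtr_www.aliyun.com.txt
#   mtr_report www.aliyun.com 200 a.txt  # 200 probes per hop, saved to a.txt
```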

Sample output description

The following sample output is returned after you run the mtr <Destination IP address> command:


The following list describes the data items that are returned with the default configurations.

  • Host: The IP address or domain name of the node. In interactive mode, press n to switch between the two.

  • Loss%: The packet loss rate of the node.

  • Snt: The number of packets sent. In report mode, the default value is 10. Use the -c parameter to specify a different number.

  • Last: The latency of the last probe, in milliseconds.

  • Avg: The average latency of all probes, in milliseconds.

  • Best: The minimum latency of all probes, in milliseconds.

  • Wrst: The maximum latency of all probes, in milliseconds.

  • StDev: The standard deviation of the latency. A higher value indicates that the latency of the node is less stable.
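To work with these columns in scripts, you can reduce a report to the fields that the analysis in this topic relies on. The following sketch assumes mtr -r -n output in the default column order (Host, Loss%, Snt, Last, Avg, Best, Wrst, StDev). Depending on the mtr version, hop lines start with 1.|-- or 1., and both forms are accepted. Header lines and extra lines that list additional addresses for the same hop are skipped. The function name mtr_columns is hypothetical.

```shell
# Hypothetical helper: reduce an "mtr -r -n" report (stdin) to tab-separated rows:
#   hop  host  Loss% (without the % sign)  Snt  Avg  StDev
# Assumption: hop lines use the default report layout
#   N.|-- HOST LOSS% SNT LAST AVG BEST WRST STDEV   (older versions: "N. HOST ...")
mtr_columns() {
  awk '
    $1 ~ /^[0-9]+[.]([|]--)?$/ {
      hop = $1;  sub(/[.].*/, "", hop)    # "3.|--" -> "3"
      loss = $3; sub(/%/, "", loss)       # "20.0%" -> "20.0"
      printf "%s\t%s\t%s\t%s\t%s\t%s\n", hop, $2, loss, $4, $6, $9
    }'
}

# Usage:
#   mtr -r -n -c 100 www.aliyun.com | mtr_columns
```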

WinMTR (Windows)

Installation

WinMTR does not require installation. After you download the package, decompress the package and run the file. The steps are as follows:

  1. Go to the official WinMTR website to download WinMTR.

  2. Decompress the WinMTR package and double-click WinMTR to run it.

Usage

  1. In the Host field, enter the domain name or IP address of the destination server.

    Important

    The domain name or IP address of the destination server cannot contain spaces.

    You can also use the following features and parameters:

    • Copy Text to clipboard: Copies the test results to the clipboard in text format.

    • Copy HTML to clipboard: Copies the test results to the clipboard in HTML format.

    • Export TEXT: Exports the test results to a specified file in text format.

    • Export HTML: Exports the test results to a specified file in HTML format.

    • Options: Optional parameters, including the following settings:

      • Interval (sec): The interval between probes, in seconds. The default value is 1.

      • Ping size (bytes): The size of the packet used for each ping probe, in bytes. The default value is 64.

      • Max. hosts in LRU list: The maximum number of hosts kept in the LRU list. The default value is 128.

      • Resolve names: Displays nodes by domain name by performing a reverse DNS lookup of their IP addresses.

  2. Click Start to begin the test.

    After the test starts, Start automatically changes to Stop, and WinMTR displays the test results.

  3. After the test runs for a period of time, click Stop to end the test.

Sample output description

The following sample output is returned for a test on a destination server domain name:


The following list describes the data items in the output that is returned with the default configurations:

  • Hostname: The IP address or domain name of the node.

  • Nr: The hop number of the node.

  • Loss%: The packet loss rate of the node.

  • Sent: The number of packets sent.

  • Recv: The number of packets successfully received.

  • Best: The minimum latency of the node, in milliseconds.

  • Avg: The average latency of the node, in milliseconds.

  • Worst: The maximum latency of the node, in milliseconds.

  • Last: The latency of the last probe to the node, in milliseconds.

  • StDev: The standard deviation of the latency. A higher value indicates that the latency of the node is less stable.

Result analysis guide

Because the mtr command provides higher accuracy, this topic uses the test results of the mtr command as an example to describe how to analyze link test results. The following descriptions are based on the sample link test results shown in the following figure.


Network area description

Typically, the link from the client to the destination server passes through the following network areas. Each area is described below, together with suggestions for handling exceptions in it.

  • Client's on-premises network

    This includes the local area network (LAN) and the on-premises network provider's network, such as Area A in the preceding figure. Exceptions in this area can be classified into two types.

    • If an exception occurs on a node in the client's on-premises network, you should troubleshoot the on-premises network.

    • If an exception occurs on a node in the on-premises network provider's network, you should report the issue to the local Internet Service Provider (ISP).

  • Carrier network

    The ISP network, such as Area B in the preceding figure, may pass through multiple backbone networks. If an exception occurs in this area, you can query the IP address of the abnormal node to identify its ISP. Then, you can report the issue to the ISP directly or through Alibaba Cloud technical support.

  • Destination server's on-premises network

    This is the network of the destination host's provider, such as Area C in the preceding figure. If an exception occurs in this area, you should report the issue to the destination host's network provider.
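To identify which ISP owns an abnormal node, you can query the IP address of the node with the whois client. The following sketch assumes that the whois command is installed. hop_owner is a hypothetical helper, and because field names vary between regional Internet registries, the set of fields it keeps is an assumption rather than a complete list.

```shell
# Hypothetical helper: show registry fields that identify who operates an IP address.
# Assumption: the whois client is installed. Field names differ between regional
# registries, so this selection (OrgName, org-name, netname, descr, owner, country)
# is best-effort, not complete.
hop_owner() {
  whois "$1" | grep -iE '^(orgname|org-name|netname|descr|owner|country):' | sort -u
}

# Usage: pass the IP address of the abnormal hop from the mtr report, for example
#   hop_owner 203.0.113.1
```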

Note

If link load balancing is enabled for some parts of the intermediate link, the mtr command numbers and probes only the first and last nodes. For intermediate nodes, only the corresponding IP addresses or domain names are displayed.

Metric analysis guide

To analyze network link connectivity or performance, you can perform a comprehensive analysis based on the Loss% (packet loss rate), Avg (average value), StDev (standard deviation), and latency metrics. The following sections describe how to analyze link connectivity based on these metrics.

Loss% (packet loss rate)

If the Loss% of any node is not zero, a problem may exist on that network hop. Packet loss at a node is typically caused by one of the following:

  • The ISP has manually limited the ICMP sending rate of the node for security or performance reasons, which causes packet loss.

  • The node is faulty, which causes packet loss. To determine the cause, you can check the packet loss status of the abnormal node and its subsequent nodes:

    • If no packet loss occurs on subsequent nodes, the packet loss on the abnormal node is typically caused by ISP policy restrictions. You can ignore this packet loss, as shown at the second hop in the preceding figure.

    • If packet loss also occurs on subsequent nodes, the abnormal node has a network exception that is causing the packet loss, as shown at the sixth hop in the preceding figure.

    • If subsequent nodes include nodes with and without packet loss, the abnormal node may have both policy-based rate limiting and a network exception. In this case, if packet loss occurs continuously on the abnormal node and its subsequent nodes, and the packet loss rates are different, use the packet loss rate of the last few hops as the reference. As shown in the preceding figure, packet loss occurs at hops 6, 7, 8, and 9. Therefore, the final packet loss rate is based on the 30.3% at hop 9.
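The three rules above can be applied mechanically to a saved report. The following sketch, a hypothetical helper named mtr_loss_check, reads mtr -r -n output from stdin, classifies each hop with a nonzero Loss% by looking at the hops after it, and prints the Loss% of the last hop as the reference value. It assumes the default report column order and is only a starting point: it cannot tell whether your service traffic is actually affected.

```shell
# Hypothetical helper: classify lossy hops in an "mtr -r -n" report read from stdin.
# Assumption: hop lines use the default report layout
#   N.|-- HOST LOSS% SNT LAST AVG BEST WRST STDEV   (older versions: "N. HOST ...")
# Each hop with nonzero Loss% is classified from the hops after it:
#   no later hop loses packets     -> likely ICMP rate limiting, can be ignored
#   every later hop loses packets  -> loss persists downstream, possible fault
#   some later hops lose packets   -> mixed
# The Loss% of the last hop is always printed as the reference value.
mtr_loss_check() {
  awk '
    $1 ~ /^[0-9]+[.]([|]--)?$/ {
      n++
      hop[n] = $1; sub(/[.].*/, "", hop[n])
      host[n] = $2
      l = $3; sub(/%/, "", l)
      loss[n] = l + 0
    }
    END {
      if (n == 0) { print "no hops found"; exit 1 }
      for (i = 1; i < n; i++) {
        if (loss[i] == 0) continue
        lossy = 0; clean = 0
        for (j = i + 1; j <= n; j++) {
          if (loss[j] > 0) lossy++
          else clean++
        }
        if (lossy == 0)
          verdict = "likely ICMP rate limiting, can be ignored"
        else if (clean == 0)
          verdict = "loss persists downstream, possible fault at this hop"
        else
          verdict = "mixed, rate limiting and a possible fault"
        printf "hop %s %s loss=%.1f%%: %s\n", hop[i], host[i], loss[i], verdict
      }
      printf "reference loss (last hop %s): %.1f%%\n", host[n], loss[n]
    }'
}

# Usage:
#   mtr -r -n -c 100 www.aliyun.com | mtr_loss_check
```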

Avg (average value) and StDev (standard deviation)

Due to link jitter or other factors, the Best and Wrst values of a node can differ greatly. Avg is the average of all probes since the test began, so it better reflects the network quality of the node. A higher StDev means that the latency values measured at that node are more spread out, so StDev helps you judge whether Avg accurately reflects the network quality of the node. For example, when StDev is large, some packets might have low latency (such as 25 ms) while others have high latency (such as 350 ms). The resulting average latency might appear normal, but in this case Avg does not accurately reflect the actual network quality.

In summary, the recommended analysis criteria are as follows:

  • If the StDev value is high, you should check the Best and Wrst values of the corresponding node to determine whether an exception exists.

  • If the StDev value is not high, you can use the Avg value to determine whether an exception exists on the corresponding node.

    Note

    There is no fixed threshold for a high StDev value. Evaluate it relative to the other latency values of the same node. For example, if Avg is 30 ms, a StDev of 25 ms indicates very high variation. However, if Avg is 325 ms, the same StDev of 25 ms indicates low variation.
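The relative judgment described in the note can be approximated by comparing StDev with Avg. In the following sketch (a hypothetical helper named mtr_jitter_check), StDev is treated as high when StDev/Avg exceeds a ratio that defaults to 0.5. This cutoff is an assumption chosen so that the two examples in the note are classified as described (25/30 is high, 25/325 is low); it is not a standard value.

```shell
# Hypothetical helper: compare StDev with Avg for each responding hop in an
# "mtr -r -n" report read from stdin (default report column layout assumed).
# StDev counts as high when StDev/Avg is greater than the ratio passed as $1.
# The default ratio of 0.5 is an illustrative assumption, not a standard.
mtr_jitter_check() {
  awk -v ratio="${1:-0.5}" '
    BEGIN { r = ratio + 0 }
    $1 ~ /^[0-9]+[.]([|]--)?$/ && $2 != "???" {
      hop = $1; sub(/[.].*/, "", hop)
      avg = $6 + 0; best = $7 + 0; wrst = $8 + 0; sd = $9 + 0
      if (avg <= 0) next
      if (sd / avg > r)
        printf "hop %s %s: StDev %.1f ms vs Avg %.1f ms is high; check Best %.1f ms and Wrst %.1f ms\n", hop, $2, sd, avg, best, wrst
      else
        printf "hop %s %s: StDev is low; Avg %.1f ms is representative\n", hop, $2, avg
    }'
}

# Usage:
#   mtr -r -n -c 100 www.aliyun.com | mtr_jitter_check        # ratio 0.5
#   mtr -r -n -c 100 www.aliyun.com | mtr_jitter_check 0.3    # stricter ratio
```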

Latency

  • Latency spike

    If latency increases sharply after a certain hop, a network exception likely exists at that node. As shown in the preceding figure, the latency of subsequent nodes increases sharply after the sixth hop, which suggests a network exception at the sixth hop. However, high latency does not always mean that an exception exists at that node. In the figure, although latency increases after the sixth hop, the test data still reaches the destination host, so the high latency might occur on the return path. Analyze this together with a reverse link test.

  • Latency increase due to ICMP rate limiting

    ICMP policy rate limiting can also cause a sharp increase in a node's latency, but subsequent nodes typically return to normal. As shown in the preceding figure, the ninth hop has a 30% packet loss rate and a significant latency increase. However, the latency of subsequent nodes immediately returns to normal. Therefore, the latency increase and packet loss at this node are caused by policy rate limiting.
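The two latency patterns can be told apart by checking whether an increase persists at the next hop. The following sketch (a hypothetical helper named mtr_latency_check) compares the Avg of each responding hop with the previous responding hop and labels any increase larger than a threshold as persistent or isolated. The default threshold of 50 ms is an illustrative assumption. As described above, a persistent increase still needs to be confirmed with a reverse link test.

```shell
# Hypothetical helper: flag sharp Avg increases in an "mtr -r -n" report read
# from stdin (default report column layout assumed; ??? hops are skipped).
# An increase counts as sharp when Avg rises by more than $1 ms (assumed 50)
# over the previous responding hop. If the next hop stays high, the increase
# is reported as persistent; otherwise it is reported as isolated.
mtr_latency_check() {
  awk -v jump="${1:-50}" '
    BEGIN { j = jump + 0 }
    $1 ~ /^[0-9]+[.]([|]--)?$/ && $2 != "???" {
      n++
      hop[n] = $1; sub(/[.].*/, "", hop[n])
      host[n] = $2
      avg[n] = $6 + 0
    }
    END {
      for (i = 2; i <= n; i++) {
        base = avg[i - 1]
        if (avg[i] - base <= j) continue
        if (i == n)
          verdict = "increase at the last responding hop"
        else if (avg[i + 1] - base > j)
          verdict = "increase persists downstream (possible exception; confirm with a reverse test)"
        else
          verdict = "isolated increase (likely ICMP rate limiting)"
        printf "hop %s %s: Avg %.1f ms vs %.1f ms at previous hop: %s\n", hop[i], host[i], avg[i], base, verdict
      }
    }'
}

# Usage:
#   mtr -r -n -c 100 www.aliyun.com | mtr_latency_check       # 50 ms threshold
#   mtr -r -n -c 100 www.aliyun.com | mtr_latency_check 20    # 20 ms threshold
```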

Sample analysis and conclusion


Based on the sample link test results and the analysis guide, the following conclusions can be drawn.

  • In the client's on-premises network, packet loss occurs at hops 2, 6, 7, 8, and 9. However, no significant packet loss occurs at subsequent hops 3, 4, 5, 10, 11, and 15. If the corresponding service network requests are normal, the packet loss at hops 2, 6, 7, 8, and 9 is likely caused by ICMP rate limiting.

  • The Wrst value at the fourth hop is high, but the Avg value is not. This is likely caused by a transient fluctuation in network or device performance during one of the probes.

  • The average latency of the nodes on the entire link is between 1.8 ms and 17.6 ms, which indicates that the network latency of the entire link is low.

Based on these conclusions, the network link in the example is normal. If network fluctuations occur in your actual service network, you should analyze them together with the reverse link test results.

Note

The approach to analyzing network results is flexible. The preceding section describes common methods for analyzing metrics. In practice, you should make a comprehensive assessment based on your specific situation to draw accurate conclusions. If a one-way network link test does not provide a clear conclusion, you can combine it with a reverse link test for more in-depth analysis and problem identification.

Common link exception scenarios

Note

The common link exception scenarios are described using the example of running the mtr command on a Linux operating system. The actual results may vary depending on your operating system and tool.

Improper network configuration of the destination host

As shown in the following figure, 100% packet loss occurs at the destination address. This may suggest that the packets did not arrive. However, it is more likely that security policies on the destination server, such as a firewall or iptables rules, block ICMP, so the destination host does not send any replies. In this scenario, check the security policy configuration of the destination server.
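On a Linux destination server, you can start by checking the local firewall rules and the kernel setting that controls replies to ping requests. The following sketch assumes that the server uses iptables. Rules that are managed only through nftables may not appear in the iptables -S output, and cloud security group rules are not visible from the server. check_icmp_policy is a hypothetical helper name.

```shell
# Hypothetical helper: run on a Linux destination server to see whether ICMP
# (ping) replies may be blocked locally.
# Assumption: the firewall is managed with iptables. Rules managed only through
# nftables, and cloud security group rules, do not appear here.
check_icmp_policy() {
  echo "== iptables rules that mention ICMP =="
  sudo iptables -S | grep -i icmp || echo "(none found)"
  echo "== net.ipv4.icmp_echo_ignore_all (1 means echo requests are ignored) =="
  sysctl -n net.ipv4.icmp_echo_ignore_all
}
```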

ICMP rate limiting

As shown in the following figure, the packet loss rate at the destination address is high. This may suggest that many packets did not arrive. However, it is more likely that ICMP is rate-limited by security policies on the destination server, such as a firewall or iptables rules, or by ISP policies, so the destination host replies to only some of the probes. In this scenario, check the security policy configuration of the destination server or perform a comprehensive analysis together with a reverse MTR link test.


Loop in the link

As shown in the following figure, packets enter a loop after the fifth hop and cannot reach the destination server. This is typically caused by an abnormal routing configuration on an ISP node, which creates a loop in the link. In this scenario, you should contact the ISP that owns the node to resolve the issue.
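A loop usually shows up in the report as the same IP addresses repeating at increasing hop numbers. The following sketch (a hypothetical helper named mtr_loop_check) reads mtr -r -n output from stdin and reports any address that appears at more than one hop. Non-responding hops (???) are ignored, and the default report column layout is assumed.

```shell
# Hypothetical helper: list IP addresses that appear at more than one hop in an
# "mtr -r -n" report read from stdin (default report column layout assumed).
# Non-responding hops shown as ??? are ignored.
mtr_loop_check() {
  awk '
    $1 ~ /^[0-9]+[.]([|]--)?$/ && $2 != "???" {
      hop = $1; sub(/[.].*/, "", hop)
      if ($2 in seen) {
        printf "%s seen at hop %s and again at hop %s\n", $2, seen[$2], hop
        looped = 1
      } else {
        seen[$2] = hop
      }
    }
    END { if (!looped) print "no repeated hops" }'
}

# Usage:
#   mtr -r -n -c 100 www.aliyun.com | mtr_loop_check
```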

Link interruption

As shown in the following figure, no responses are received after the fourth hop, and the Loss%, Last, Avg, Best, and other metrics show no statistics. This is typically caused by an interruption at that node. You can use a reverse link test for further confirmation. In this scenario, contact the ISP that owns the node to resolve the issue.
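To locate where responses stop, you can find the last hop that returned any replies. The following sketch (a hypothetical helper named mtr_break_check) reads mtr -r -n output from stdin and reports the last responding hop and how many silent hops follow it. If only the final hop is silent, the pattern may instead match a destination host that does not reply to ICMP, as described in the first scenario of this section.

```shell
# Hypothetical helper: find the last hop that returned any replies in an
# "mtr -r -n" report read from stdin (default report column layout assumed).
# A hop counts as silent when it is shown as ??? or has 100% loss.
mtr_break_check() {
  awk '
    $1 ~ /^[0-9]+[.]([|]--)?$/ {
      hop = $1; sub(/[.].*/, "", hop)
      loss = $3; sub(/%/, "", loss)
      last = hop
      if ($2 != "???" && loss + 0 < 100) { ok = hop; okhost = $2 }
    }
    END {
      if (last == "") { print "no hops found"; exit 1 }
      if (ok == "") print "no hop responded"
      else if (ok == last) print "last hop responded"
      else printf "no response after hop %s (%s); %d silent hop(s) follow\n", ok, okhost, last - ok
    }'
}

# Usage:
#   mtr -r -n -c 100 www.aliyun.com | mtr_break_check
```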

References

For more information about how to diagnose network connectivity in the console, see Diagnose network connectivity.