Troubleshoot slow SQL execution on an ApsaraDB RDS for MySQL instance

更新时间:
复制 MD 格式

The performance of SQL execution on an ApsaraDB RDS for MySQL instance depends on many factors, such as the instance load and SQL execution plans. When SQL performance fluctuates, the number of scanned rows may increase or remain the same. This topic describes the possible causes and solutions for each scenario.

The number of scanned rows increases

Two factors can cause an increase in the number of scanned rows:

Factor 1: The execution plan changes

An SQL execution plan may change for the following reasons:

Reason 1: A high availability (HA) switchover occurs, causing a difference in statistics between the primary and secondary instances.

  • Troubleshooting: Check if an HA switchover occurred before the SQL performance degraded.

    Go to the Service Availability > Primary/Secondary Switchover Log page of the instance in the ApsaraDB RDS console to check the instance switchover logs.

  • Solution: Manually collect statistics on the tables involved in the slow SQL statement, or run an `OPTIMIZE TABLE` statement to reorganize the tables. If the issue persists, increase the number of sampling pages for table statistics and then re-collect the statistics. Alternatively, you can perform a manual HA switchover. For more information, see Manage a primary/secondary switchover.

    Note

    A manual HA switchover does not guarantee that the previous node is used. If the secondary node is rebuilt after the switchover, the new node is used. This can cause differences in statistics, which can affect SQL performance.

Reason 2: Data is migrated from another instance, such as from instance A to instance B using DTS. This can cause differences in table fragmentation and statistics.

  • Troubleshooting: Confirm if the data was recently migrated from another instance.

  • Solution: Manually collect statistics on the tables involved in the slow SQL statement, or run an `OPTIMIZE TABLE` statement to reorganize the tables. If the issue persists, increase the number of sampling pages for table statistics and then re-collect the statistics.

Reason 3: Table statistics are updated manually or automatically, and the new statistics differ significantly from the old ones.

  • Troubleshooting: Check the statistics of the tables involved in the slow SQL statement to determine if they were updated.

  • Solution: Manually collect statistics on the tables involved in the slow SQL statement, or run an `OPTIMIZE TABLE` statement to reorganize the tables. If the issue persists, increase the number of sampling pages for table statistics and then re-collect the statistics.

Reason 4: The minor engine version or major version changes.

  • Troubleshooting: Go to the Task List page in the ApsaraDB RDS console to check the instance management tasks. For more information, see Task Hub.

  • Solution: If a change in the minor or major engine version causes the SQL execution plan to change, manually collect statistics on the tables involved in the slow SQL statement, or run an `OPTIMIZE TABLE` statement to reorganize the tables. If the issue persists, increase the number of sampling pages for table statistics and then re-collect the statistics.

Reason 5: In typical sorting and paging scenarios that use select ... group by or order by limit statements, the optimizer uses prefer_ordering_index.

  • Troubleshooting: Check if the slow SQL statement uses the select ... group by or order by limit format.

  • Solution:

    • For MySQL 5.6 and versions earlier than 5.7.33: Add an empty character after the field in the group by or order by clause. This prevents the optimizer from using this index and improves performance. For example, use group by field+'' or order by field+''.

    • For MySQL 5.7.33 and later, and MySQL 8.0: Set the prefer_ordering_index option in the optimizer_switch variable to off. This disables this optimizer behavior and improves performance. For more information, see Low efficiency of ORDER BY LIMIT execution.

Factor 2: The number of rows that meet the conditions increases, but the execution plan remains the same

  • Cause: This scenario can occur if new data is imported, which increases the number of rows that meet the query conditions. It can also occur if the query conditions change to match more rows.

  • Troubleshooting: Analyze SQL Explorer logs or binary logs. For more information, see SQL Explorer and Audit.

The number of scanned rows remains almost the same

If the number of scanned rows does not change, slow SQL execution may be caused by the following factors:

Reason 1: CPU, IOPS, or I/O bandwidth limitations.

  • Troubleshooting: Go to the Monitoring and Alerts > Standard Monitoring page of the instance in the ApsaraDB RDS console. Check the MySQL CPU Utilization, MySQL IOPS Utilization, and MySQL Read/Write Throughput metrics.

  • Solution: Scale out the instance to increase its resources, or optimize the relevant SQL statements to reduce resource usage. For information about how to scale out an instance, see Change instance configurations.

Reason 2: The semi-synchronous commit mechanism affects write operations.

  • Troubleshooting: Go to the Log Management > Error Log page of the instance in the ApsaraDB RDS console. Check for semi-synchronous timeout logs. The keyword is Timeout waiting for reply of binlog.

  • Solution: Evaluate if you can change the replication mode to asynchronous mode. To make this change, see Query and modify the data replication mode.

Reason 3: The client application retrieves a large amount of data from the server, and the client processes it slowly.

  • Troubleshooting: When the issue recurs, check the client session state to determine if it is Sending to client or Writing to net.

  • Solution: Modify the client application to avoid retrieving a large amount of data at once.