The traffic playback and stress testing report helps you identify compatibility issues and performance bottlenecks by comparing performance metrics between the source and destination database instances. It offers a comprehensive assessment of your database instance's performance.
Core concepts and limitations
To interpret the report accurately, you must first understand how traffic playback works and its inherent limitations.
|
Key consideration |
Description |
|
Playback dependency |
The traffic playback feature relies heavily on the source instance's audit log. The traffic playback feature derives the content, concurrency, and execution order of the replayed traffic from this log. |
|
Imperfect replication |
Because of the limited time precision of audit logs (typically to the second) and differences in the playback environment (such as network latency and kernel scheduling), traffic playback cannot perfectly reproduce the instantaneous pressure and transaction sequences of an online production environment. As a result, the playback duration may be slightly longer than the selected time range, affecting the evaluation results. |
|
Risk of SQL truncation |
The source instance's audit log has a length limit, which can cause long SQL statements to be truncated. This can lead to the following issues:
|
|
Comprehensive assessment |
Do not draw conclusions based on a single metric. Perform a comprehensive assessment by combining multi-dimensional performance metrics from the report, including CPU utilization, QPS, latency quantiles, and SQL performance. |
Report modules
The report consists of five core modules: Overview, Performance Trend, SQL Distribution, Related SQL, and Parameter Comparison.
Overview
This module provides an executive summary of the report, comparing core playback metrics between the source and destination instances for a quick understanding of the playback results.
The top of the page displays an Important notice: Traffic playback depends on audit logs and cannot 100% reproduce real-world online scenarios. The table below displays a side-by-side comparison of metrics for the source and destination instances.
|
Key metric |
Interpretation and focus |
|
Time Range |
Confirm that the playback time range and instance type are correct. Focus on instance type differences, which are a common root cause of performance gaps. |
|
Instance Type |
|
|
CPU Utilization |
Use the core metrics CPU utilization and QPS to verify that traffic playback ran smoothly and the results are trustworthy. |
|
QPS |
|
|
Execution Time Distribution |
Measure the performance levels of the source and destination instances based on the overall execution time distribution. |
|
SQL Template Performance Statistics |
Evaluate SQL template performance by analyzing scanned rows and execution time to quickly identify compatibility issues and optimizable SQL templates. |
|
Parameter Differences |
Database parameter configurations significantly impact playback results. See the parameter comparison module for details. |
|
Slow SQL Statements |
The number of slow SQL statements on the source and destination instances. |
Performance trend
This module uses time-series charts to show the trend of various database performance metrics for both the source and destination instances during the playback period.
SQL distribution
This module provides a high-level view of the overall SQL execution time distribution and error situations, helping you assess the "SQL health" of the playback results.
|
Chart/List |
Interpretation and focus |
|
Execution Time Distribution |
Shows the percentage of SQL statements in each execution time bucket. Evaluate overall SQL quality and focus on SQL statements with an execution time greater than 1s, as these statements can cause instance instability. |
|
Execution Time by Quantile |
Shows the distribution of execution time by quantile. Use this to assess and monitor long-tail latency and abnormal requests. |
|
Top Failed SQL Templates |
Lists the SQL templates that failed most frequently. Prioritize addressing these failed SQL templates to prevent compatibility risks. |
|
Top Failed SQL Error Codes |
Counts error codes by failure frequency. Use these error codes to identify the cause of anomalies and confirm compatibility issues quickly. |
Note: If the playback time is earlier than the creation time of the audit log index on the source instance, some statistics for the source instance in this module may be missing.
Related SQL
This module provides detailed comparison data for specific SQL templates in the report. It displays the performance comparison data for the top SQL templates during playback and serves as the primary basis for SQL optimization.
-
Core features:
-
Categorical statistics: All SQL templates are classified into three categories: Performance-Improved SQL, Performance-Degraded SQL, and Failed SQL. A count and a downloadable list are provided for each category.
-
Detailed data: The table lists key information for each SQL template, such as the average response time and average number of scanned rows on both the source and destination instances.
-
Failure attribution: For failed SQL statements, the error code is displayed directly to help you quickly locate the reason for the failure.
-
-
Optimization workflow:
-
Start with Failed SQL: Check the error codes to resolve issues such as syntax errors, permission problems, or data inconsistencies.
-
Focus on Performance-Degraded SQL: This is the core of performance optimization. Comparing execution times and the number of scanned rows between the instances can help you quickly locate the problem (for example, a sharp increase in scanned rows due to SQL truncation).
-
Review Performance-Improved SQL: Understand why these SQL templates perform better on the destination instance. Common reasons include a newer optimizer, better parameter settings, or hardware advantages.
-
Parameter Comparison
If you find a performance gap, use this module to check for differences in database parameters.
-
Display method: The report displays all database parameters for both instances side-by-side and highlights any parameters with different values.
-
Common performance-sensitive parameters:
-
Memory-related:
innodb_buffer_pool_size,join_buffer_size -
I/O-related:
innodb_flush_log_at_trx_commit,sync_binlog,innodb_read_io_threads -
Concurrency-related:
innodb_thread_concurrency,thread_handling -
Database version:
version.
-
Analysis and optimization practices
-
Move from macro to micro: Start with Overview and Performance Trend to get a high-level overview. Then use SQL Distribution, Related SQL, and Parameter Comparison to drill down into specific SQL and configuration details.
-
Correlate across modules: Match the time points where you see performance fluctuations in Performance Trend with slow or failed SQL in SQL Distribution and Related SQL to find the root cause.
-
Verify parameter impact: If Parameter Comparison shows differences in key parameters, align the destination instance's parameters with the source instance, then run the traffic playback again to confirm whether the parameter difference is the root cause.
-
Focus on degradation, not absolute values: For example, a SQL template that takes 50 ms on the destination instance, up from 5 ms on the source instance, is a serious regression. Treat this as a high-priority optimization target.