Java GC log analysis: Long GC pause times
This document explains how to use Java garbage collection (GC) log analysis to facilitate an upgrade from Concurrent Mark Sweep (CMS) GC to Garbage-First (G1) GC and resolve any issues encountered during the process.
Customer requirements
A business team member provided the following feedback:
The application frequently experiences long pause times during runtime. This causes health checks to fail and leads to other problems.
The application performs offline data processing. Latency is not a critical requirement. The main goal is to keep most pause times under 1200 ms and avoid extremely long pauses. Application throughput is a higher priority.
The application currently uses CMS GC. The customer wants to upgrade to G1 GC to adopt modern technology and resolve the previously mentioned issues.
GC log analysis
First, examine the severity of the problem when using CMS. After uploading the file, adjust the analysis configuration:
The problem diagnosis shows that many Full GC, Final Remark, and Young GC pause times exceed 1200 ms.
The pause view shows that timeouts are frequent.
The customer then modified the JVM startup parameters to upgrade from CMS GC to G1 GC. After the application ran for a period of time, a new GC log was uploaded for analysis. The problem diagnosis, pause information, and GC phase sections are shown below:
After the upgrade to G1 GC, the application's pause times mostly meet the requirements. However, G1 GC by default attempts to keep application pauses under 200 ms. Because the application's pauses are generally long, G1 controls pause times by making the young generation space very small. This results in more frequent GC, which reduces overall application throughput. The "GC Pause Time Percentage" metric, which reflects application throughput, increased from 2.38% to 3.87%.
To reduce the frequency of Young GC and increase the size of the young generation space, the application was tuned by adding the following JVM parameters:
-XX:MaxGCPauseMillis=400 -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=40
After the application ran for a period, this change resolved the frequent GC issue. However, the minimum size of the young generation space was set too high. This setting impaired the auto-tuning capability of G1 GC and caused new Full GC and To-space Exhausted events, which in turn led to long pauses.
The JVM parameters were tuned again as follows:
-XX:G1NewSizePercent=20 -XX:-XX:InitiatingHeapOccupancyPercent=40
After the application ran for a period with the new parameters, no new issues occurred, and the G1 GC upgrade was considered complete. Using the Java GC log analysis tool, the customer was able to resolve the latency problem through JVM parameter tuning while maintaining application throughput. The overall result was satisfactory.