Diagnose slow Java calls with code hotspot

ARMS code hotspot is a monitoring and diagnostic tool that uses continuous profiling to capture snapshots of request thread stacks. This provides an accurate, real-time view of your code's execution to help you pinpoint performance issues.

Use cases

  • Quickly locate problematic code when slow calls occur during high-traffic events, such as sales promotions.

  • Automatically save the execution context when your system encounters a high volume of slow calls.

  • Reconstruct the exact method-level execution path for complex or intermittent slow calls that are difficult to reproduce.

  • When a trace lacks instrumentation for non-framework methods, the code hotspot feature helps you determine the actual execution time of these method calls.

Prerequisites

  • The code hotspot feature requires agent version 3.1.4 or later.

  • The code hotspot feature relies on continuous profiling, which has specific requirements for the operating system kernel and JDK version. For more information, see Limitations. Ensure you use a compatible operating system and JDK.

  • Agent versions earlier than 4.2.1 support only synchronous calls, and data may be lost for asynchronous calls. For example, if you use Spring Cloud Gateway, Undertow, or Lettuce, asynchronous thread switching can make the collected data inaccurate. Agent versions 4.2.1 and later support asynchronous scenarios.
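The following minimal sketch illustrates why asynchronous thread switching is hard for any per-thread collection mechanism. It is not the ARMS agent's implementation. The class name, the TRACE_ID marker, and the "trace-123" value are hypothetical and used only for illustration. A marker stored for the request thread is not visible to a task that runs on a worker thread, so work done there cannot be attributed to the request unless the context is propagated explicitly.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: shows that per-thread request context does not follow
// work that is handed to another thread. This is not the ARMS agent's code.
final class AsyncContextDemo {
    // Hypothetical per-thread marker for the request being handled.
    static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    // Returns the trace ID seen on the calling thread [0] and on a worker thread [1].
    static String[] observeTraceIds(String traceId) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        TRACE_ID.set(traceId);
        try {
            String onRequestThread = TRACE_ID.get();
            // The task runs on a different thread, which has its own ThreadLocal slot.
            Callable<String> readOnWorker = () -> TRACE_ID.get();
            String onWorkerThread = worker.submit(readOnWorker).get();
            return new String[] {onRequestThread, onWorkerThread};
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Worker task failed", e);
        } finally {
            TRACE_ID.remove();
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] seen = observeTraceIds("trace-123");
        System.out.println("request thread sees: " + seen[0]);
        System.out.println("worker thread sees:  " + seen[1]);
    }
}
```

In this sketch the worker thread sees null instead of the trace ID, which is the kind of gap that the asynchronous support in later agent versions is meant to close.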

Enable code hotspot

  1. Log on to the ARMS console. In the left-side navigation pane, choose Application Monitoring > Applications.

  2. On the Applications page, select a region at the top and click the name of your application.

    Note

    The icons in the Language column indicate the following:

    Java icon: a Java application connected to Application Monitoring.

    Go icon: a Go application connected to Application Monitoring.

    Python icon: a Python application connected to Application Monitoring.

    -: an application connected to Trace Explorer (OpenTelemetry).

  3. In the left-side navigation pane, click Application Settings and then click the Custom Configuration tab.

  4. In the Continuous profiling section, turn on the Main switch, and then turn on the Code Hotspot switch. Configure the IP addresses of the application instances or the CIDR block of the instance group where you want to enable this feature.

  5. At the bottom of the page, click Save.

    The changes take effect immediately without an application restart.
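Step 4 accepts either individual IP addresses or a CIDR block. As a rough illustration of which instances a CIDR block covers, the following IPv4-only sketch checks whether an address falls inside a block. The class name and the sample addresses are hypothetical, and the console's own matching rules may differ from this simplified check.

```java
// Illustrative IPv4-only check of whether an instance IP falls inside a CIDR block.
// This is a simplified sketch, not the matching logic used by the ARMS console.
final class CidrMatchDemo {
    // Converts dotted-decimal IPv4 text such as "192.168.1.10" to a 32-bit value.
    static int toInt(String ipv4) {
        String[] parts = ipv4.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ipv4);
        }
        int value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Octet out of range: " + part);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    // Returns true if ip is inside the block written as "address/prefixLength".
    static boolean contains(String cidr, String ip) {
        String[] pieces = cidr.split("/");
        if (pieces.length != 2) {
            throw new IllegalArgumentException("Not a CIDR block: " + cidr);
        }
        int prefix = Integer.parseInt(pieces[1]);
        if (prefix < 0 || prefix > 32) {
            throw new IllegalArgumentException("Prefix length out of range: " + prefix);
        }
        // Shifting an int by 32 is a no-op in Java, so a /0 prefix is handled explicitly.
        int mask = prefix == 0 ? 0 : -1 << (32 - prefix);
        return (toInt(pieces[0]) & mask) == (toInt(ip) & mask);
    }

    public static void main(String[] args) {
        // Hypothetical instance group and instance addresses.
        System.out.println(contains("192.168.1.0/24", "192.168.1.42"));
        System.out.println(contains("192.168.1.0/24", "192.168.2.42"));
    }
}
```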

Analyze code hotspot data

This example parses and iterates over JSON data, and then calls a downstream HTTP API.

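import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;
import org.springframework.web.client.RestTemplate;

import java.io.InputStreamReader;
import java.util.LinkedList;

// AbsAction and Movie belong to the sample application and are not shown here.
public class HotSpotAction extends AbsAction {
  // Declared here so that the class is self-contained; the sample may share one elsewhere.
  private static final Gson GSON = new Gson();
  private final RestTemplate restTemplate = new RestTemplate();

  // Request entry method.
  @Override
  public void runBusiness() {
    readFile();
    invokeAPI();
  }

  // Make an HTTP call to a downstream API.
  private void invokeAPI() {
    String url = "https://httpbin.org/get";
    String response = restTemplate.getForObject(url, String.class);
  }

  // Read and parse file data, then sum the ratings.
  private double readFile() {
    InputStreamReader reader = new InputStreamReader(
        ClassLoader.getSystemResourceAsStream("data/xxx.json"));
    LinkedList<Movie> movieList = GSON.fromJson(reader,
        new TypeToken<LinkedList<Movie>>() {}.getType());
    double totalCount = 0;
    // LinkedList.get(i) walks the list on every call, so this loop is the
    // deliberately slow part of the example.
    for (int i = 0; i < movieList.size(); i++) {
      totalCount += movieList.get(i).rating();
    }
    return totalCount;
  }
}

To view the code hotspot data for a request to this application:

  1. Log on to the ARMS console. In the left-side navigation pane, choose Application Monitoring > Applications.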

  2. On the Applications page, select a region at the top and click the name of your application.

    Note

    The icons in the Language column indicate the following:

    Java icon: a Java application connected to Application Monitoring.

    Go icon: a Go application connected to Application Monitoring.

    Python icon: a Python application connected to Application Monitoring.

    -: an application connected to Managed Service for OpenTelemetry.

  3. In the left-side navigation pane, click Interface Invocation. On the right side of the page, select the target interface and then click the trace query tab.

  4. On the trace query tab, click the target TraceId link.

  5. In the Details column, click the magnifying glass icon and then click the Code Hotspot tab.

    The left panel lists all methods involved in this call and their execution times. The right panel displays a flame graph that visualizes the stack traces for the selected method.

    • The Self column shows the time or resources consumed by a method within its own execution, excluding the time or resources consumed by its child methods. This helps you identify methods that spend a significant amount of time in their own logic.

    • The Total column shows the time or resources consumed by a method and all its child methods. This helps you understand which methods contribute the most to the overall execution time of the call stack.
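The relationship between the two columns can be sketched as follows, assuming a simple call tree in which each frame records its total time. The class names and the millisecond values are hypothetical and are not taken from a real profile. Self time is the frame's total time minus the total time of its direct children.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative call tree: Self = Total minus the Total of the direct children.
final class SelfTotalDemo {
    static final class Frame {
        final String method;
        final long totalMillis;
        final List<Frame> children = new ArrayList<>();

        Frame(String method, long totalMillis) {
            this.method = method;
            this.totalMillis = totalMillis;
        }

        // Adds a callee and returns this frame so that calls can be chained.
        Frame child(Frame callee) {
            children.add(callee);
            return this;
        }

        // Time spent in this method's own code, excluding its callees.
        long selfMillis() {
            long childTotal = 0;
            for (Frame callee : children) {
                childTotal += callee.totalMillis;
            }
            return totalMillis - childTotal;
        }
    }

    // Builds a small tree with made-up timings.
    static Frame sampleTree() {
        Frame node = new Frame("LinkedList.node", 650);
        Frame get = new Frame("LinkedList.get", 650).child(node);
        Frame readFile = new Frame("readFile", 700).child(get);
        Frame invokeAPI = new Frame("invokeAPI", 250);
        return new Frame("runBusiness", 1000).child(readFile).child(invokeAPI);
    }

    static void print(Frame frame, String indent) {
        System.out.println(indent + frame.method
            + " total=" + frame.totalMillis + "ms self=" + frame.selfMillis() + "ms");
        for (Frame callee : frame.children) {
            print(callee, indent + "  ");
        }
    }

    public static void main(String[] args) {
        print(sampleTree(), "");
    }
}
```

In this made-up tree, readFile has a large Total but a small Self, while LinkedList.node has the largest Self, so sorting by Self surfaces the method where the time is actually spent.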

    To identify specific code hotspots, focus on the Self column or look for wide bars at the bottom of the flame graph. A wide bar there marks a method that spends a large share of the request time in its own code and is often the performance bottleneck, for example the java.lang.Thread.sleep() method.

    Analyze the data as follows:

    1. Sort the Self column in descending order. Click the method with the highest Self value, such as java.util.LinkedList.node(int). The flame graph on the right focuses on the relevant methods.

    2. The focused view shows that java.util.LinkedList.node(int) is the widest bar at the top of the stack in the flame graph.

    3. This method is a library function from the Java Development Kit (JDK), not part of your application's business logic. To find the source in your code, trace the call stack up from java.util.LinkedList.node(int): it is called by java.util.LinkedList.get(int), which is in turn called by com.alibaba.cloud.pressure.memory.HotSpotAction.readFile(), a method in your application. In this example, readFile() consumes 3.75 s, which accounts for 69.88% of the total time in the flame graph. This indicates that com.alibaba.cloud.pressure.memory.HotSpotAction.readFile() is a significant performance bottleneck. Analyze this method's logic to identify optimization opportunities.
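One possible optimization for this example, offered as a sketch and not as the only fix: java.util.LinkedList.get(int) has to walk the list to reach the requested index, so calling it inside an indexed loop makes the loop cost grow quadratically with the list size. Iterating over the list visits each element once. The class and method names below are hypothetical, and the list holds plain Double values in place of the sample's Movie objects.

```java
import java.util.LinkedList;
import java.util.List;

// Compares the indexed loop from the sample with an iterator-based loop.
final class RatingSumDemo {
    // Pattern from the sample: each get(i) walks the linked list to index i.
    static double sumByIndex(LinkedList<Double> ratings) {
        double total = 0;
        for (int i = 0; i < ratings.size(); i++) {
            total += ratings.get(i);
        }
        return total;
    }

    // Alternative: the for-each loop uses the list's iterator, one step per element.
    static double sumByIteration(List<Double> ratings) {
        double total = 0;
        for (double rating : ratings) {
            total += rating;
        }
        return total;
    }

    public static void main(String[] args) {
        LinkedList<Double> ratings = new LinkedList<>();
        for (int i = 0; i < 20_000; i++) {
            ratings.add((double) (i % 10));
        }

        long start = System.nanoTime();
        double byIndex = sumByIndex(ratings);
        long indexNanos = System.nanoTime() - start;

        start = System.nanoTime();
        double byIteration = sumByIteration(ratings);
        long iterationNanos = System.nanoTime() - start;

        System.out.println("indexed loop:  " + byIndex + " in " + indexNanos / 1_000_000 + " ms");
        System.out.println("for-each loop: " + byIteration + " in " + iterationNanos / 1_000_000 + " ms");
    }
}
```

Both methods return the same sum. Another option is to deserialize the JSON into an ArrayList, whose get(int) runs in constant time. After a change like this, check the Code Hotspot tab again to confirm that the Self time of java.util.LinkedList.node(int) no longer dominates.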

FAQ

  • Why is the duration shown in the code hotspot less than the total request duration?

    To minimize the performance impact on your application, the code hotspot feature uses an optimized collection mechanism. This can cause the recorded duration to be slightly less than the actual request duration. The deviation is typically within 20 ms. Disregard this minor difference and focus on the methods with the highest relative duration.

  • Are there any limitations on the data collection range for code hotspot?

    • For requests that last longer than 15 minutes, the code hotspot feature provides analysis data for only the first 15 minutes.

    • To reduce system overhead, ARMS does not collect code hotspot data for low-latency requests. These are typically requests that complete in under 500 ms, though the exact threshold is determined dynamically based on system load. As a result, the tab may not display any data for these requests.

Related topics

You can use the continuous profiling feature to troubleshoot high CPU and memory utilization issues. For common issues with continuous profiling, see FAQ.