Trace Explorer


Trace Explorer lets you query and analyze full stored trace data in real time. Combine filters and aggregation dimensions to diagnose performance bottlenecks, identify failing spans, and understand request flows across microservices.

Prerequisites

Before you begin, make sure that you have:

Open Trace Explorer

  1. Log on to the Managed Service for OpenTelemetry console.

  2. In the left-side navigation pane, click Trace Explorer.

  3. In the top navigation bar, select a region.

  4. In the upper-right corner, select a time range.

Filter and query traces

Trace Explorer provides three ways to filter trace data:

  • Quick Filter: Filter by status, duration, application name, span name, or host address. Selected conditions appear in the search bar.

  • Filter panel: Click the search bar to open the drop-down filter panel. Add or modify filter conditions from the panel.


  • Query syntax: Enter a query statement directly in the search bar. For syntax details, see Usage methods of Trace Explorer.

Note
  • Click the Save icon next to the Aggregation Dimension drop-down list to save your current filter conditions as a view.

  • Click Saved View to access all saved views, or click a specific view to load its filter conditions.

Aggregate trace data

Select an aggregation dimension from the drop-down list to group queried data. Aggregation helps you identify patterns, such as which spans consume the most time or which services produce the most errors.

View the trace list

After you apply filters, the Trace Explorer page displays column charts of call counts and HTTP errors, a time series curve of duration, and a trace list.


The following table describes the actions available in the trace list.

Action | How to use
View trace details | Click a trace ID or click Details in the Actions column to view trace details and topology. See Trace details.
View trace logs | Click Logs in the Actions column. See Use the log analysis feature.
Customize columns | Click the Settings icon in the upper-right corner.
Add a filter from a value | Hover over a span value and click the Filter icon to add that value as a filter condition.

View the scatter chart

The Scatter plot tab plots each trace as a point, with time on the X axis and duration on the Y axis. Use this view to spot outliers and duration trends at a glance.

  • Hover over a point to view basic trace information.

  • Click a point to open trace details. See Trace details.
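The scatter view is meant to be read by eye, but the same outlier check can be sketched in code. The following is a minimal illustration, assuming a hypothetical export of (timestamp, duration) pairs; the function name and the percentile cutoff are assumptions, not part of Trace Explorer.

```python
def slow_outliers(points, percentile=95):
    """Return points whose duration exceeds the given percentile of all durations.

    points: list of (timestamp_s, duration_ms) tuples (a hypothetical export shape).
    percentile: integer from 1 to 100.
    """
    if not points:
        return []
    durations = sorted(duration for _, duration in points)
    # Nearest-rank percentile, computed with integer ceiling division.
    rank = max(1, (percentile * len(durations) + 99) // 100)
    cutoff = durations[rank - 1]
    return [p for p in points if p[1] > cutoff]


points = [(0, 12), (1, 15), (2, 11), (3, 14), (4, 13),
          (5, 16), (6, 12), (7, 15), (8, 14), (9, 900)]
print(slow_outliers(points, percentile=90))  # [(9, 900)]
```

A percentile cutoff is only one way to define an outlier; a fixed duration threshold would work equally well for this purpose.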


Analyze trace aggregation

Trace Explorer allows you to analyze queried spans based on various dimensions. For traces that consist of a large number of spans, the trace aggregation feature queries up to 5,000 distributed traces, retrieves their spans by trace ID, and aggregates the results. Trace integrity is preserved throughout this process.

Note

When multiple query conditions are specified, aggregation may take longer to complete. Wait for the calculation to finish before reviewing results.


Aggregation metrics

Metric | Description
spanName | Name of the span.
serviceName | Name of the application that the span belongs to.
Request count / request ratio | Number of requests that call this span, and its percentage of total requests. Formula: requests calling this span / total requests x 100%.
Span count / request multiple | Average number of times each request calls this span. Formula: total span count / request count.
Average self-time / proportion | Average time spent in the span itself, excluding child spans. Formula: total span time - time in all child spans. For asynchronous calls, self-time equals total span time.
Average duration | Average end-to-end duration of the span.
Exception count / exception ratio | Number of requests with exceptions and their percentage of the requests that call this span. Formula: requests with exceptions / requests calling this span x 100%. When the request multiple exceeds 1, a single request may produce multiple exceptions.
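The count-based formulas above can be sketched as a small aggregation over raw spans. This is an illustration only, not the service's implementation: the record shape (trace_id, span_name, service_name, error) is hypothetical, one trace is treated as one request, and the exception ratio is divided by the span's own request count because the sample table in this section reports 2 / 50.00% for a span reached by 4 requests.

```python
from collections import defaultdict


def aggregate(spans):
    """Group spans by (span_name, service_name) and compute the count metrics.

    spans: list of dicts with keys trace_id, span_name, service_name, error.
    This record shape is an assumption made for the sketch.
    """
    total_requests = len({s["trace_id"] for s in spans})
    groups = defaultdict(list)
    for s in spans:
        groups[(s["span_name"], s["service_name"])].append(s)

    rows = {}
    for key, members in groups.items():
        requests = {s["trace_id"] for s in members}
        failed = {s["trace_id"] for s in members if s["error"]}
        rows[key] = {
            "request_count": len(requests),
            "request_ratio": len(requests) / total_requests * 100,
            "span_count": len(members),
            "request_multiple": len(members) / len(requests),
            "exception_count": len(failed),
            "exception_ratio": len(failed) / len(requests) * 100,
        }
    return rows


spans = [
    {"trace_id": "t1", "span_name": "A", "service_name": "demo", "error": False},
    {"trace_id": "t1", "span_name": "B", "service_name": "demo", "error": True},
    {"trace_id": "t1", "span_name": "B", "service_name": "demo", "error": False},
    {"trace_id": "t2", "span_name": "A", "service_name": "demo", "error": False},
]
rows = aggregate(spans)
print(rows[("B", "demo")])
```

In this sample, span B is reached by one of two requests and called twice within it, so its request ratio is 50% and its request multiple is 2.0.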

Aggregation example

Consider a trace where Span A calls Span B and Span C:

spanName | serviceName | Request count / request ratio | Span count / request multiple | Average self-time / proportion | Average duration | Exception count / exception ratio
A | demo | 10 / 100.00% | 10 / 1.00 | 5.00 ms / 25.00% | 20 ms | 2 / 20.00%
- B | demo | 4 / 40.00% | 8 / 2.00 | 16.00 ms / 100.00% | 16 ms | 2 / 50.00%
- C | demo | 1 / 10.00% | 1 / 1.00 | 4.00 ms / 100.00% | 4 ms | 1 / 100.00%

How to read this data:

  • Request distribution: Span A is called by all 10 requests (100%). Only 4 requests reach Span B (40%), and only 1 reaches Span C (10%). The remaining requests skip Spans B and C due to conditional logic or exceptions.

  • Span-per-request distribution: Span A has a request multiple of 1.00, meaning each request calls it once. Span B has a multiple of 2.00: each of the 4 requests that reach Span B calls it twice on average.

  • Self-time distribution: Span A's average self-time is 5.00 ms, which accounts for only 25% of its total 20 ms duration. The remaining 75% is spent in child spans (B and C). Both Span B and Span C show 100% self-time because they have no child spans.

  • Exception distribution: Span A has 2 exceptions across 10 requests (20% exception ratio). Span B also has 2 exceptions, but across only 4 requests (50% exception ratio). Because each request that reaches Span B calls it twice on average, one plausible distribution is that 2 of the 4 requests encounter an exception on the first call, while the second call succeeds.
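The percentages in the example can be rechecked from the raw counts. The sketch below hard-codes the three rows of the example table and recomputes each derived column; the variable and function names are illustrative.

```python
TOTAL_REQUESTS = 10

# (request_count, span_count, avg_self_ms, avg_duration_ms, exception_count)
# copied from the example table.
rows = {
    "A": (10, 10, 5.0, 20.0, 2),
    "B": (4, 8, 16.0, 16.0, 2),
    "C": (1, 1, 4.0, 4.0, 1),
}


def derived(request_count, span_count, avg_self_ms, avg_duration_ms, exception_count):
    """Recompute the ratio columns from the count and time columns."""
    return {
        "request_ratio": round(request_count / TOTAL_REQUESTS * 100, 2),
        "request_multiple": round(span_count / request_count, 2),
        "self_time_proportion": round(avg_self_ms / avg_duration_ms * 100, 2),
        "exception_ratio": round(exception_count / request_count * 100, 2),
    }


for name, values in rows.items():
    print(name, derived(*values))
```

Each recomputed value matches the table: for Span B, 4 / 10 gives 40%, 8 / 4 gives a multiple of 2.00, and 2 / 4 gives a 50% exception ratio.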

Note

To view a specific trace, hover over a blue span name and click the recommended trace ID.

View trace topology

The Full Link Topology tab shows inter-application call relationships for the aggregated traces. Each application node displays the request count, error count, and response time.


Diagnose slow and failed traces

Trace Explorer analyzes slow and failed traces to help you identify root causes. Instead of manually inspecting individual traces, use the analysis features to find common patterns across problematic traces, such as whether failures concentrate on a specific host or interface.

Query traces by host, interface, or a combination of conditions. Example: serviceName="arms-demo" AND ip="192.168.1.1".
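If you generate such queries from a script, a small helper can assemble them. This is a minimal sketch that produces only the equality-plus-AND form shown in the example; the helper name is hypothetical, and because this page does not describe how to escape quotation marks, the sketch rejects values that contain them rather than guessing a rule.

```python
def build_query(conditions):
    """Join field/value pairs into: field="value" AND field="value".

    conditions: dict mapping field names to values, in the desired order.
    """
    parts = []
    for field, value in conditions.items():
        text = str(value)
        if '"' in text:
            # Escaping rules are not documented here, so refuse instead of guessing.
            raise ValueError(f"value for {field} contains a double quote")
        parts.append(f'{field}="{text}"')
    return " AND ".join(parts)


print(build_query({"serviceName": "arms-demo", "ip": "192.168.1.1"}))
# serviceName="arms-demo" AND ip="192.168.1.1"
```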

Slow trace analysis

ARMS selects the 1,000 traces with the longest duration and identifies the five dimensions most correlated with slowness.


Slow trace details

ARMS selects the 1,000 longest traces above the configured threshold, samples 1,000 traces below the threshold, compares the two groups, and surfaces the three characteristics most correlated with slow calls.

Note

Set the threshold based on your business requirements. For example, to analyze traces that take longer than 1 minute, set the threshold to 60,000 milliseconds.
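The comparison of a slow group against a sampled normal group can be pictured with a simple frequency difference. This page does not describe how ARMS measures correlation, so the sketch below is an assumed stand-in: it ranks attribute values by how much more often they appear in the slow group than in the baseline group. The function name and the input shape are hypothetical.

```python
from collections import Counter


def top_characteristics(slow, baseline, k=3):
    """Rank (attribute, value) pairs by their excess frequency in slow traces.

    slow, baseline: lists of dicts mapping attribute names to string values.
    """
    def share(traces):
        if not traces:
            return {}
        counts = Counter((attr, val) for t in traces for attr, val in t.items())
        return {pair: n / len(traces) for pair, n in counts.items()}

    slow_share = share(slow)
    base_share = share(baseline)
    lift = {pair: s - base_share.get(pair, 0.0) for pair, s in slow_share.items()}
    # Highest difference first; ties are broken alphabetically for stable output.
    return sorted(lift.items(), key=lambda item: (-item[1], item[0]))[:k]


slow = [{"host": "10.0.0.1", "rpc": "/pay"}, {"host": "10.0.0.1", "rpc": "/cart"}]
baseline = [{"host": "10.0.0.1", "rpc": "/cart"}, {"host": "10.0.0.2", "rpc": "/cart"}]
print(top_characteristics(slow, baseline, k=2))
```

In this sample, host 10.0.0.1 and the /pay interface both appear 50 percentage points more often in the slow group, so they rank first.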


Failed trace analysis

ARMS randomly samples 1,000 failed traces and identifies the five dimensions most correlated with failures.


Failed trace details

ARMS compares failed traces with normal traces and surfaces the three characteristics most correlated with failures.


Trace details

Click a trace ID from the trace list or scatter chart to open the trace details view. The trace details view contains four sections: component tags, trace bar chart, trace waterfall, and span details.


Component tags

The tags at the top of the trace details view show call types and span counts. Each tag corresponds to a value of the attributes.component.name field. Click a tag to show or hide spans of that type.

Trace bar chart

The horizontal bar chart visualizes the entire trace and the distribution of spans within it.

Element | Description
Bars | Each bar represents a span. Only spans with a duration greater than 1% of the total trace duration are displayed.
Colors | Different colors represent different applications. For example, blue represents the opentelemetry-demo-adservice application.
Black lines | Black lines within bars indicate self-time: the span's own processing time, excluding time spent in child spans. For example, if Span A takes 10 ms and its child Span B takes 8 ms, Span A's self-time is 2 ms.
Timeline | The timeline at the top shows the full time range of the trace.
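The two numeric rules in this table, the 1% display cutoff and the self-time calculation, can be expressed as short functions. This is a sketch under stated assumptions: child spans are taken to run one after another without overlapping, and the floor at zero is an added safeguard rather than documented behavior.

```python
def self_time_ms(span_ms, child_ms):
    """Span duration minus the time spent in its child spans, floored at zero.

    Assumes the child spans do not overlap in time.
    """
    return max(0.0, float(span_ms) - sum(child_ms))


def visible_in_bar_chart(span_ms, trace_ms):
    """True when the span lasts longer than 1% of the whole trace."""
    return span_ms * 100 > trace_ms


print(self_time_ms(10, [8]))         # 2.0
print(visible_in_bar_chart(1, 100))  # False
print(visible_in_bar_chart(2, 100))  # True
```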

Trace focus and filtering

Each row represents a span and shows the parent-child hierarchy. A number before each parent span indicates how many child spans it contains.

Action | How to use
Collapse / expand | Click the Collapse icon to collapse or expand a span's children.
Focus | Select a span and click the Focus icon to display only that span and its downstream spans.
Defocus | Click the Defocus icon to restore the full trace view.
Filter | Enter a span name, application name, or attribute value in the search box to filter the trace down to matching spans and their path to the entry span. Clear the search box and click the Search icon to remove the filter.
Zoom | Click the Zoom in icon to zoom in and hide the bar chart. Click the Zoom out icon to restore the bar chart.
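The Filter action keeps each matching span together with its path to the entry span. The sketch below shows that idea on a hypothetical in-memory trace keyed by span ID; it matches on span name only, whereas the search box also accepts application names and attribute values.

```python
def filter_with_ancestors(spans, keyword):
    """Return IDs of spans whose name contains keyword, plus all their ancestors.

    spans: dict mapping span_id to {"name": str, "parent": span_id or None}.
    This structure is an assumption made for the sketch.
    """
    keep = set()
    needle = keyword.lower()
    for span_id, span in spans.items():
        if needle not in span["name"].lower():
            continue
        current = span_id
        # Walk up the parent chain until the entry span or an already kept span.
        while current is not None and current not in keep:
            keep.add(current)
            parent = spans[current]["parent"]
            current = parent if parent in spans else None
    return keep


trace = {
    "1": {"name": "GET /checkout", "parent": None},
    "2": {"name": "cart.get", "parent": "1"},
    "3": {"name": "db.query", "parent": "2"},
    "4": {"name": "ads.fetch", "parent": "1"},
}
print(sorted(filter_with_ancestors(trace, "db")))  # ['1', '2', '3']
```

Span 4 is dropped because it neither matches the keyword nor lies on the path from the matching span to the entry span.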

Span details

Select a span to view its details, related metrics, logs, and exceptions.

Tab | Description | When to use
Additional information | Displays the span's attributes, resources, details, and events, grouped by type. For field descriptions, see Trace Explorer parameters. | Inspect specific span attributes and event data.
Metrics | For Java applications monitored by ARMS: JVM and host metrics. For applications using an open-source agent: RED Method metrics (rate, errors, duration). | Correlate span performance with infrastructure metrics.
Logs | Displays business logs associated with the trace. If a Simple Log Service (SLS) Logstore is configured for the application, you can go to the Logstore and query the business logs based on the trace ID. | Find application-level log entries related to a trace.
Exceptions | Lists exception information for the selected span, if any. | Identify the root cause of span errors.
Event Config | Configure custom interaction events for one or more trace attributes. Use interaction events to query more details about the trace or view related logs and metrics. For configuration steps, see Configure a custom interaction event for a trace. | Create shortcuts to related details from trace data.

What to do next

Set up alert rules to get notified when specific errors occur. Automated alerts help your operations team respond before issues affect users. For details, see Application monitoring alert rules.