This document demonstrates how to configure observability for reinforcement learning training using the agentic-rl-example project: add OpenTelemetry dependencies, set up tracing, and view traces and metrics on the console.
Overview
Reinforcement learning training uses OpenTelemetry for tracing. By adding a few decorators and wrapper calls to your function code, every LLM call, tool invocation, and scoring detail is automatically recorded and then exported to Application Real-Time Monitoring Service (ARMS) for visualization in the Model Studio console.
This topic walks you through the end-to-end process from initial setup to viewing results in the console, using the agentic-rl-example project, a CalcX calculator use case. It also provides a metric reference dictionary, troubleshooting guidelines, and a workflow for investigating FAILED statuses. For training submission and hyperparameter configuration, see Reinforcement learning training configuration — submission and configuration.
Integrate tracing
This guide demonstrates how to integrate tracing in three steps using the CalcXRolloutProcessor from the agentic-rl-example project.
Step 1: Add dependencies
Add the following OpenTelemetry dependencies to the requirements.txt file in your project's root directory:
opentelemetry-api==1.41.1
opentelemetry-sdk==1.41.1
opentelemetry-exporter-otlp-proto-http==1.41.1
opentelemetry-processor-baggage==0.62b1
loongsuite-util-genai==0.4.0
The dashscope, fastapi, uvicorn, and pyyaml packages are pre-installed in the runtime and do not need to be listed in requirements.txt.
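Before submitting, the dependency list above can be checked locally with a short script. This is a minimal sketch: `check_requirements` is a hypothetical helper, not part of the DashScope SDK, and it only understands simple `name==version` lines.

```python
# Hypothetical helper; not part of the DashScope SDK.
REQUIRED = {
    "opentelemetry-api": "1.41.1",
    "opentelemetry-sdk": "1.41.1",
    "opentelemetry-exporter-otlp-proto-http": "1.41.1",
    "opentelemetry-processor-baggage": "0.62b1",
    "loongsuite-util-genai": "0.4.0",
}
# Packages the text says are pre-installed in the runtime.
PREINSTALLED = {"dashscope", "fastapi", "uvicorn", "pyyaml"}


def check_requirements(text: str) -> dict:
    """Report missing, mismatched, and redundant entries in requirements.txt content."""
    pinned = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, _, version = line.partition("==")
        pinned[name.strip().lower()] = version.strip()
    missing = sorted(n for n in REQUIRED if n not in pinned)
    mismatched = sorted(
        n for n, v in REQUIRED.items() if n in pinned and pinned[n] != v
    )
    redundant = sorted(n for n in pinned if n in PREINSTALLED)
    return {"missing": missing, "mismatched": mismatched, "redundant": redundant}


sample = """
opentelemetry-api==1.41.1
opentelemetry-sdk==1.41.0
fastapi
"""
report = check_requirements(sample)
print(report)
```

In this sample, three packages are reported as missing, `opentelemetry-sdk` as mismatched, and `fastapi` as redundant.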
Step 2: Add instrumentation code
The code changes involve five decorators/functions, which automatically nest to form a complete trace:
[ENTRY: ROLLOUT] @observe_processor        ← Rollout processor entry point
├── [LLM] trace_client / @observe_llm      ← LLM call
│   └── (OpenAI / LangChain / DashScope API)
├── [TOOL] trace_tool / @observe_tool      ← Tool call
│   ├── tool: calculator (MCP)
│   └── tool: response_scorer (custom)
└── [custom] rollout_metrics               ← Rollout custom metrics
[ENTRY: REWARD] @observe_processor         ← Reward processor entry point
├── [LLM] trace_client / @observe_llm      ← LLM call (for scoring)
├── [TOOL] trace_tool / @observe_tool      ← Tool call
└── [custom] reward_metrics                ← Reward custom metrics
@observe_processor: Trace the processor entry point
Apply this decorator to the process() method to create the top-level ENTRY span. The SDK automatically identifies the type based on the parent class. Inheriting from AbstractRolloutProcessor creates a ROLLOUT-type span, and inheriting from AbstractRewardProcessor creates a REWARD-type span. The decorator automatically records the input, output, duration, and success/failure status of each call.
For Rollout (functions/rollout/rollout.py):
from dashscope.finetune.reinforcement.component.observability import (
    observe_processor
)

class CalcXRolloutProcessor(AbstractRolloutProcessor):
    @observe_processor  # Span type = ROLLOUT
    async def process(self, input: RolloutInput) -> RolloutOutput:
        await self._async_setup()
        return await self._async_process(input)
For Reward (functions/reward/reward.py):
from dashscope.finetune.reinforcement.component.observability import (
    observe_processor
)

class DemoRewardProcessor(AbstractRewardProcessor):
    @observe_processor  # Span type = REWARD
    async def process(self, input: RewardInput) -> RewardOutput:
        # Note: `content` and `evaluate` are not defined in this excerpt.
        score = await evaluate(content, input.ground_truth)
        return RewardOutput(
            reward=Reward(
                reward_score=score,
                reward_metrics={"test1": 0.5, "test2": 0.3}
            ),
            status=TaskStatus.SUCCESS,
        )
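To make the decorator's described behavior concrete, the following sketch mimics it in plain Python: it infers a span type from the parent class and records input, output, duration, and status. It is an illustration only. `observe_processor_sketch`, the `SPANS` list, and the two local base classes are stand-ins invented for this example, and the real SDK may work differently.

```python
import asyncio
import functools
import time

SPANS = []  # stand-in for an exporter; a real setup would export spans instead


class AbstractRolloutProcessor:  # local stand-in for the SDK base class
    pass


class AbstractRewardProcessor:  # local stand-in for the SDK base class
    pass


def observe_processor_sketch(method):
    """Record input, output, duration, and status of an async process() call."""

    @functools.wraps(method)
    async def wrapper(self, input):
        # Pick the span type from the parent class of the processor.
        if isinstance(self, AbstractRolloutProcessor):
            span_type = "ROLLOUT"
        elif isinstance(self, AbstractRewardProcessor):
            span_type = "REWARD"
        else:
            span_type = "UNKNOWN"
        span = {"type": span_type, "input": input, "status": "SUCCESS"}
        start = time.perf_counter()
        try:
            span["output"] = await method(self, input)
            return span["output"]
        except Exception as exc:
            span["status"] = "FAILED"
            span["error"] = repr(exc)
            raise
        finally:
            span["duration_s"] = time.perf_counter() - start
            SPANS.append(span)

    return wrapper


class EchoRollout(AbstractRolloutProcessor):
    @observe_processor_sketch
    async def process(self, input):
        return {"echo": input}


result = asyncio.run(EchoRollout().process("2+2"))
print(SPANS[0]["type"], SPANS[0]["status"])
```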
trace_client(): Trace an LLM client
Call this function in your initialization method to wrap an LLM client instance. All subsequent LLM requests from this client will automatically generate an LLM span that records the model name, request content, token usage, latency, and more.
Supported client types (auto-detected via duck typing):
- OpenAI clients (AsyncOpenAI/OpenAI)
- OpenAI completions resources (.chat.completions)
- LangChain classes like ChatOpenAI (via .client/.async_client)
- DashScope Generation class (pass the class itself, not an instance)
Example (functions/rollout/rollout.py):
from langchain_openai import ChatOpenAI
from dashscope.finetune.reinforcement.component.observability import (
    trace_client
)

class CalcXRolloutProcessor(AbstractRolloutProcessor):
    def _build_llm(self, input: RolloutInput) -> ChatOpenAI:
        llm = ChatOpenAI(
            model=input.model_resource.model_name,
            openai_api_key=api_key,
            openai_api_base=input.model_resource.base_url,
            ...
        )
        trace_client(llm)  # Wrap the client to automatically trace all LLM calls
        return llm
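Duck typing here means the client is classified by the attributes it exposes rather than by its class. The sketch below shows one way such detection could look. It is not the SDK's implementation: `detect_client_kind` and the attribute checks are assumptions based only on the supported-client list above.

```python
from types import SimpleNamespace


def detect_client_kind(client) -> str:
    """Classify an LLM client by the attributes it exposes (duck typing)."""
    if hasattr(client, "chat") and hasattr(client.chat, "completions"):
        return "openai-client"  # shaped like OpenAI / AsyncOpenAI
    if hasattr(client, "client") or hasattr(client, "async_client"):
        return "langchain-chat-model"  # shaped like ChatOpenAI
    if isinstance(client, type) and hasattr(client, "call"):
        return "dashscope-generation-class"  # a class, not an instance
    if hasattr(client, "create"):
        return "openai-completions-resource"  # shaped like .chat.completions
    return "unknown"


# Fake objects that only imitate the shapes of the real clients.
fake_openai = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=lambda **kw: None))
)
fake_langchain = SimpleNamespace(client=object(), async_client=object())


class FakeGeneration:
    @classmethod
    def call(cls, **kwargs):
        return {}


print(detect_client_kind(fake_openai))
print(detect_client_kind(FakeGeneration))
```

If detection fails for your client, the @observe_llm decorator is the documented fallback.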
trace_tool(): Trace tool calls
After retrieving tool instances, call this function to wrap them. Each subsequent tool call will generate a TOOL span, recording the tool name, parameters, return value, and latency.
Supported input formats:
- A single LangChain BaseTool
- A list, tuple, or dictionary (iterated over automatically)
- A LangGraph ToolNode (automatically expands .tools_by_name)
- MCP tools (auto-detected, with provider set to "mcp")
Example (functions/rollout/rollout.py):
from dashscope.finetune.reinforcement.component.observability import (
    trace_tool
)

class CalcXRolloutProcessor(AbstractRolloutProcessor):
    async def _init_resources_async(self):
        client = MultiServerMCPClient({
            "calculator": {"url": "http://localhost:10086/sse"}
        })
        tools = await client.get_tools()
        trace_tool(tools)  # Must be called after get_tools()
Special consideration for MCP: The MCP server and client run in different processes. The @observe_tool decorator on the server has no effect on the client. You must call trace_tool(tools) on the client with the list of tools returned by get_tools().
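The supported input formats all reduce to a flat collection of tool objects. The following sketch shows one way to normalize them. `flatten_tools` is a hypothetical helper, not an SDK function, and treating a dictionary as a name-to-tool mapping is an assumption.

```python
from types import SimpleNamespace


def flatten_tools(obj) -> list:
    """Return a flat list of tool objects from the supported container shapes."""
    if hasattr(obj, "tools_by_name"):  # ToolNode-like object
        return list(obj.tools_by_name.values())
    if isinstance(obj, dict):  # assumed to map tool names to tools
        return list(obj.values())
    if isinstance(obj, (list, tuple)):
        return list(obj)
    return [obj]  # a single tool


calculator = SimpleNamespace(name="calculator")
scorer = SimpleNamespace(name="response_scorer")
tool_node = SimpleNamespace(
    tools_by_name={"calculator": calculator, "response_scorer": scorer}
)

print([t.name for t in flatten_tools(tool_node)])
```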
@observe_llm: Custom LLM function
If trace_client() cannot automatically detect your LLM client, use this decorator to manually mark a function as an LLM call.
Signature requirement: The function must declare model and messages as keyword-only arguments, that is, they must appear after a bare * in the signature.
from dashscope.finetune.reinforcement.component.observability import (
    observe_llm
)

@observe_llm  # Mark as an LLM span
async def call_custom_llm(*, model: str, messages: list, **kwargs):
    # Custom LLM call logic
    ...
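The signature requirement can be verified locally with the standard library before decorating a function. This is a small sketch: `has_llm_signature` is a hypothetical helper and is not provided by the SDK.

```python
import inspect


def has_llm_signature(func) -> bool:
    """Return True if func declares model and messages as keyword-only parameters."""
    params = inspect.signature(func).parameters
    return all(
        name in params and params[name].kind is inspect.Parameter.KEYWORD_ONLY
        for name in ("model", "messages")
    )


async def keyword_only_llm(*, model: str, messages: list, **kwargs):
    ...


def positional_llm(model, messages):
    ...


print(has_llm_signature(keyword_only_llm))  # True
print(has_llm_signature(positional_llm))    # False
```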
@observe_tool: Custom tool function
If trace_tool() cannot automatically detect your tool (for example, a standard Python function), use this decorator to mark it manually. You can customize the span name using the name parameter.
from dashscope.finetune.reinforcement.component.observability import (
    observe_tool
)

@observe_tool(name="response_scorer")  # Mark as a TOOL span
def score_response(*, messages: list) -> float:
    # Custom scoring logic
    ...
Step 3: Submit the job
When submitting a job, you can leave the env field in the runtime empty. Tracing is enabled by default.
from dashscope.finetune.agentic_rl import AgenticRL
from dashscope.finetune.reinforcement import (
    RolloutFunctionComponent, RewardFunctionComponent,
    FunctionComponentModel, FunctionComponentRuntime
)

client = AgenticRL()

rollout_runtime = FunctionComponentRuntime(
    cpu=2, memory_size=4096, disk_size=512,
    concurrency=30, capacity=30,
    min_capacity=30, max_capacity=60,
    env={}  # Leave empty to enable tracing by default
)
reward_runtime = FunctionComponentRuntime(
    cpu=2, memory_size=4096, disk_size=512,
    concurrency=30, capacity=30,
    min_capacity=30, max_capacity=60,
    env={}
)

result = await client.run(
    model="qwen3.5-9b",
    functions=[
        RolloutFunctionComponent(
            name="rollout-1",
            fcmodel=FunctionComponentModel(
                classpath="functions.rollout.rollout.CalcXRolloutProcessor"),
            runtime=rollout_runtime,
        ),
        RewardFunctionComponent(
            name="reward-1",
            weight=1.0,
            fcmodel=FunctionComponentModel(
                classpath="functions.reward.reward.DemoRewardProcessor"),
            runtime=reward_runtime,
        ),
    ],
    ...
)
To disable tracing (and save costs), set {"ENABLE_TRAJECTORY": "false"} in the env field. Disabling tracing stops the collection of trace data but does not affect system metrics such as Actor, Critic, and Perf.
Tracing: Performance and cost trade-offs
Data flow and cost drivers
- Data flow: Function code (core decorators) → OpenTelemetry SDK → ARMS → Trace/Metrics tabs in the console
- Cost drivers: ARMS span storage (pay-as-you-go), minor CPU and network overhead on the function side, and a slight increase in training latency.
Development vs. large-scale training strategy
| Stage | Tracing status | Data collected | Data not collected | Cost impact |
| --- | --- | --- | --- | --- |
| Development / Small-batch debugging | Fully enabled | All data (trajectories, reward analysis, tool calls, system metrics) | None | Low |
| Canary release / Pre-release validation | Enabled | All data | None | Medium |
| Large-scale production training (e.g., 9B model, tens of thousands of samples, multiple epochs) | Recommended to disable | System metrics: actor, critic, trajectory, perf, timing | Trajectory replay, tool call details, and custom metric curves under | Significantly reduced |
To disable tracing, set runtime.env = {"ENABLE_TRAJECTORY": "false"}.
Cost management for custom metrics
- The number and cardinality of metrics in reward_metrics and rollout_metrics affect ARMS storage costs.
- Avoid using high-cardinality fields, such as user_id or request_id, as metric keys.
- Limit key metrics to 10 or fewer, and remove temporary debugging metrics.
- Sub-dimensional metrics from @sub_reward_func are aggregated into the reward_metrics of the corresponding Reward function. This aggregation helps keep the data volume manageable.
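The guidelines above can be enforced with a small lint step in unit tests. The sketch below is illustrative only: `lint_metrics` is a hypothetical helper, and the list of identifier-like key patterns is an assumption that you would adapt to your own fields.

```python
import re

MAX_KEYS = 10  # recommended upper bound from the guidelines
# Assumed patterns for identifier-like keys; extend as needed.
HIGH_CARDINALITY = re.compile(r"(user_id|request_id|session_id|uuid)", re.IGNORECASE)


def lint_metrics(metrics: dict) -> list:
    """Return a list of human-readable problems found in a metrics dict."""
    problems = []
    if len(metrics) > MAX_KEYS:
        problems.append(f"{len(metrics)} keys exceeds the recommended {MAX_KEYS}")
    for key in metrics:
        if HIGH_CARDINALITY.search(key):
            problems.append(f"key {key!r} looks high-cardinality")
    return problems


good = {"accuracy": 0.9, "format_score": 1.0, "tool_success_rate": 0.75}
bad = {"score_for_request_id_8f3a": 1.0}
too_many = {f"metric_{i}": 0.0 for i in range(11)}

print(lint_metrics(good))
print(lint_metrics(bad))
```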
Custom metrics
Key-value metrics you define in your code using the following entry points automatically appear on the Metrics tab under the trace/ group and on the Reward Analysis page in the console:
| Entry point | Code location | Console path |
| --- | --- | --- |
| reward_metrics | | |
| rollout_metrics | | |
| sub-dimension score | The | Merged into the |
Multiple reward functions: Set a unique name for each reward function using RewardFunctionComponent(name="reward-1"). This name distinguishes them in the metric path (e.g., trace/reward_metrics/reward-1/...). You can also use reward_metric_weight to set the weight of each sub-metric in the overall score.
Console overview
After the training job starts, go to the model fine-tuning page in the Model Studio console and click the job name to open its details. This section maps the actions in your code to the data displayed in the console.
The job details page has the following five tabs. We recommend using them in this order: first check the progress, then analyze the behavior, and finally drill down if you encounter issues.
| Tab | Purpose | Use case |
| --- | --- | --- |
| Details | Job progress and status | Always check this tab first |
| Trajectory | What the model did and why it received its score | Verify model behavior and diagnose low-scoring samples |
| Metrics | Quantitative training trends | Analyze curves to determine convergence and identify inflection points |
| Outputs | Checkpoint list and publishing | Select a model after training is complete |
| Logs | stdout, stderr, and error stack traces | Troubleshoot FAILED jobs |
Details tab
View basic job information, such as the job ID, training model, training method (RL full-parameter training), and data configuration. Pay close attention to the job status:
- PENDING: The job is queued and waiting for resource allocation.
- RUNNING: Training is in progress. You can switch to other tabs to view real-time data.
- SUCCESS: Training is complete. Switch to the Outputs tab to publish the model.
- FAILED: Training failed. Check the Logs tab or retrieve logs using the SDK or CLI: AgenticRL.logs(job_id="ft-xxx") or dashscope rl logs "ft-xxx".
Trajectory Details — maps to @observe_processor
On the Trajectory tab, open the Trajectory Details subpage to view the complete interaction process for each rollout:
- Trajectory List: Displays all sampled trajectories. You can filter them by sample ID, trajectory ID, epoch, or step.
- Conversation Process: Shows the complete multi-turn interaction (user → assistant → tool_call → tool_result → assistant), which exposes the model's reasoning chain.
- Reward Score: Displays the reward score and status (SUCCESS/FAILED) for each step.
Key questions to consider: Is the tool calling correct? Is the number of conversation turns reasonable? Is the model repeating ineffective actions?
Tool Call Analysis — maps to trace_tool / @observe_tool
On the Trajectory tab, open the Tool Call Analysis subpage to view:
- Tool Call Records: Tool name, call parameters, result, and latency.
- Tracing subtab: The span tree for each trajectory. You can expand it to see full details of each tool call and LLM request.
Typical use case: Troubleshoot an agent's tool calling failures. Which tool returned an error? Were the parameters passed correctly? Is the latency too high?
Reward Analysis — maps to reward_metrics
Key concepts:
- sample: An original sample from the training data, such as a question, an instruction, or a prompt.
- trajectory: One interaction trajectory generated from a sample. Each sample is sampled n_rollouts times.
- Relationship: One sample generates N trajectories, where N equals n_rollouts.
On the Trajectory tab, open the Reward Analysis subpage to evaluate training performance from three perspectives:
- Step dimension: Select a training step to view the aggregated reward metrics (average score, success rate, and trend chart) for all samples at that step. This helps you assess the overall training trend.
- Sample dimension: Select a sample ID to compare the reward scores for the same sample across trajectories. This helps you identify problematic samples.
- Trajectory dimension: View the raw scores for each scoring dimension of a single trajectory. Use this for attribution analysis.
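The step and sample perspectives amount to grouping the same reward records by different keys. The sketch below illustrates this on made-up data. The record layout, the helper names, and the definition of success rate as the share of SUCCESS statuses are assumptions for illustration, not the console's actual computation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (step, sample_id, trajectory_id, reward_score, status)
records = [
    (1, "s1", "t1", 0.0, "SUCCESS"),
    (1, "s1", "t2", 1.0, "SUCCESS"),
    (1, "s2", "t3", 0.5, "FAILED"),
    (2, "s1", "t4", 1.0, "SUCCESS"),
]


def by_step(rows) -> dict:
    """Step dimension: average score and success rate per training step."""
    groups = defaultdict(list)
    for step, _sample, _traj, score, status in rows:
        groups[step].append((score, status))
    return {
        step: {
            "avg_score": mean([score for score, _ in items]),
            "success_rate": sum(st == "SUCCESS" for _, st in items) / len(items),
        }
        for step, items in groups.items()
    }


def by_sample(rows, step) -> dict:
    """Sample dimension: scores of each trajectory of a sample at one step."""
    groups = defaultdict(dict)
    for row_step, sample, traj, score, _status in rows:
        if row_step == step:
            groups[sample][traj] = score
    return dict(groups)


print(by_step(records)[1])
print(by_sample(records, step=1))
```

A large spread between trajectories of the same sample, such as `s1` at step 1 here, is the kind of signal the Sample dimension is meant to surface.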
Metrics tab — maps to rollout_metrics / reward_metrics
On the Metrics tab, under the trace/ group, you can view aggregated curves (avg / sum) for all the custom metrics you defined in your code.
For a complete list of all 13 groups and 121 metrics, see §Training Metrics Reference. To learn how to diagnose anomalies, see §Troubleshooting Decision Tree (P1–P9).
Outputs tab
After training is complete, the Outputs tab displays a list of checkpoints. Each row includes a checkpoint ID, publishing status, and remaining retention time.
- Select the target checkpoint and click the Publish button.
- Wait for the publishing process to complete (the status changes from "To Be Published" to "Published").
- After the model is published, you can call it via API by its model name.
Training completion does not mean training success. You must check the validation/data/reward/mean@1 metric to select the best checkpoint, which is not always the last one. For information on checkpoint retention policies, resuming training, and the SFT→DPO→RL transition path, see §Resume Training, Checkpoints, and Progressive Training in "Reinforcement Learning Training Configuration — Submission and Configuration."
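Selecting the best checkpoint by validation reward can be expressed as a simple argmax. The sketch below assumes you have copied validation/data/reward/mean@1 values per checkpoint step into a dictionary. The numbers and the `best_checkpoint` helper are hypothetical.

```python
def best_checkpoint(validation_by_step: dict) -> int:
    """Return the step whose validation reward is highest."""
    return max(validation_by_step, key=validation_by_step.get)


# Hypothetical validation/data/reward/mean@1 values, keyed by checkpoint step.
history = {100: 0.41, 200: 0.58, 300: 0.63, 400: 0.55}

print(best_checkpoint(history))  # 300, not the last step
```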
Logs tab
On the Logs tab, you can view the training run logs. You can also retrieve them using the SDK or CLI: AgenticRL.logs(job_id="ft-xxx", lines=100) or dashscope rl logs "ft-xxx" --lines 100.
For instructions on troubleshooting a FAILED job, see §Troubleshooting workflow for FAILED training jobs in the FAQ section.
Training metrics
Key metrics
The following metrics are organized by monitoring dimension to help you quickly assess the health of your training job:
| Monitoring dimension | Key metric | Description | Health criteria |
| --- | --- | --- | --- |
| Task performance and generalization | | The North Star metric. It measures the mean reward for the current training batch, showing whether the model is learning an effective policy. | Healthy: Increases steadily until convergence. |
| Task performance and generalization | | Measures the quality of the model's first response to new prompts, reflecting its generalization ability. | Healthy: Increases steadily, consistent with the training reward trend. |
| Training stability | | Measures the uncertainty in the policy's output, which is crucial for balancing exploration and exploitation. | Healthy: High at the beginning of training, then gradually decreases while remaining at a non-zero level. |
| Training stability | | Measures how much the current policy has diverged from the initial policy. | Healthy: Stays within the range of 0.01 to 0.05. |
| Training stability | | The fraction of updates clipped by importance sampling, indicating the rate of policy drift. | Healthy: Typically remains below 1%. |
| System efficiency and boundaries | | The mean token count for non-truncated responses. | Healthy: Aligns with your task requirements. |
| System efficiency and boundaries | | The fraction of prompts or responses forcibly truncated for exceeding the maximum length. | Healthy: Extremely low, close to 0%. |
| System efficiency and boundaries | | The total time for a complete reinforcement learning (RL) step, which includes generation, evaluation, and update. | Healthy: Remains relatively stable. |
Troubleshooting framework
Interpreting metrics in layers
- Essentials (8-10 metrics for daily health checks): The north star metric, three core metrics, truncation rate, and single-step latency (see the Key metrics table above for health criteria).
- Scenario-specific (by use case): For agent scenarios, monitor trace/num_llm_calls and trajectory/num_turns. For long-text scenarios, review the length-related metrics.
- Troubleshooting drill-down: timing sub-items, the fully_async queue, and the critic distribution (max/min).
To learn when to use each of the five tabs in the console, see the tab table in the "Console overview" section above.
Troubleshooting decision tree (P1–P9)
When you observe an abnormal curve on the Metrics tab, use this section to pinpoint the problem and tune parameters as recommended (see "Reinforcement Learning Training Configuration — Submission and Configuration" § Parameter tuning decision table).
P1: Non-convergence and P2: oscillation
- P1 primary signal: A long plateau in critic/rewards/mean; actor/ppo_kl ≈ 0; and actor/grad_norm is extremely small.
- P1 root cause: The learning rate is too small, the reward function consistently returns the same value, or there is insufficient data.
- P2 primary signal: Major oscillation in rewards/mean, a spike in grad_norm, and pg_clipfrac > 5%.
- P2 root cause: The learning rate is too high, the batch size is too small, or the advantage variance is not normalized.
P3: KL explosion and P4: sudden entropy drop (mode collapse)
- P3 primary signal: actor/ppo_kl is consistently > 0.1, pg_clipfrac increases concurrently, and the trajectory shows garbled or repetitive output.
- P3 recommended action: Increase kl_loss_coef (e.g., from 0.001 to 0.01), decrease the learning rate, and decrease ppo_mini_batch_size.
- P4 primary signal: actor/entropy drops to nearly 0 within a few dozen steps, rewards/mean shows a false increase, and sample outputs are repetitive and template-like.
- P4 root cause: A single mode is excessively rewarded, the KL constraint is too weak, or the temperature is too low.
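The numeric signals for P2, P3, and P4 can be turned into a rough automated check over exported metric values. This sketch is a simplification: it uses the 0.1 KL and 5% clip-fraction thresholds stated above, while the three-step window for "consistently", the entropy floor of 0.01, and the `diagnose` helper itself are assumptions. It also ignores the qualitative signals, such as garbled output.

```python
def diagnose(history: list, window: int = 3, entropy_floor: float = 0.01) -> list:
    """Flag P2/P3/P4 signals from a list of per-step metric dicts (oldest first)."""
    findings = []
    recent = history[-window:]
    # P3: KL above 0.1 on every one of the last `window` steps.
    if len(recent) == window and all(m["actor/ppo_kl"] > 0.1 for m in recent):
        findings.append("P3: KL explosion")
    latest = history[-1]
    # P2: clip fraction above 5% on the latest step.
    if latest["pg_clipfrac"] > 0.05:
        findings.append("P2: oscillation (clip fraction above 5%)")
    # P4: entropy close to zero on the latest step.
    if latest["actor/entropy"] < entropy_floor:
        findings.append("P4: entropy collapse")
    return findings


healthy = [{"actor/ppo_kl": 0.02, "pg_clipfrac": 0.005, "actor/entropy": 0.8}] * 3
exploding = [{"actor/ppo_kl": 0.2, "pg_clipfrac": 0.08, "actor/entropy": 0.5}] * 3
collapsed = [{"actor/ppo_kl": 0.02, "pg_clipfrac": 0.005, "actor/entropy": 0.001}]

print(diagnose(healthy))
print(diagnose(exploding))
```

A check like this is only a first filter; confirm any finding against the trajectory samples before changing hyperparameters.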
P5: Reward hacking and P6: validation set collapse
- P5 primary signal: The training reward increases while the validation reward stagnates or decreases; response_length_non_aborted/mean increases linearly and rapidly; and a specific sub-dimension of reward_metrics shows exclusive gains.
- P5 attribution path: Analyze the abnormal metric, then sample the trajectory. Check if the assistant is padding its output, copying the prompt, or using tools to exploit loopholes in the reward function (for details, see "Reinforcement Learning Development Guide" § Identifying and preventing reward hacking).
- P6 primary signal: validation/data/reward/mean@1 shows a downward inflection point while the training reward continues to increase.
- P6 recommended action: Apply early stopping (select an earlier checkpoint from the Outputs tab), increase the KL constraint, and expand the validation set.
P7: Slow training, P8: garbled output, and P9: high truncation rate
- P7 primary signal: A sudden increase in timing/s/step.
- P7 drill-down: Investigate rollout/agent_loop_latency and update_actor. Possible root causes include an increase in response length, delays in FC auto-scaling (check the fully_async queue), or high tool latency.
- P8 primary signal: The trajectory contains garbled or non-natural language output. This is often caused by a KL explosion (P3). Address this issue using the P3 solution.
- P9 primary signal: A high value for trajectory/response_length/clip_ratio or prompt_length/clip_ratio.
- P9 recommended action: Increase max_length or shorten the prompt. Also, check if the reward function inadvertently encourages longer outputs.
Three-dimensional console attribution
Using Step, Sample, and Trajectory
Standard workflow: Spot an inflection point on the Metrics tab → Use Step to pinpoint the failing training segment (trend inflection) → Use Sample to find which samples are skewing the mean → Use Trajectory to examine the reasoning chain or tool calling details → Return to the code to modify the reward function or data.
Tool calling analysis and tracing span tree
- Tool calling analysis: Tool name, parameters, return value, and duration, which are essential for troubleshooting agent tool failures.
- The Tracing subtab provides a span tree for each trajectory (ENTRY:ROLLOUT → LLM → TOOL → REWARD).
- Typical use case: If the reward suddenly drops, check if a tool has failed or the MCP server is unavailable.
- Note: The server-side @observe_tool decorator does not affect the client. You must use trace_tool(tools) on the client side.
Custom metrics for attribution (rollout_metrics and reward_metrics)
- System metrics tell you that an error occurred, but custom metrics pinpoint which dimension is failing.
- Recommended naming convention: Name your metrics according to business sub-dimensions, such as accuracy, format_score, tool_success_rate, or answer_length.
- Handling multiple reward functions: Distinguish them using RewardFunctionComponent(name=...) and adjust their weights using reward_metric_weight.
- Anti-pattern: Avoid stuffing log or debugging information into metrics, as this can cause a cardinality explosion.
Troubleshooting with logs, metrics, traces, and tracing
Each of the four observation sources answers a different question: logs show when a failure occurred and provide stack traces; metrics indicate whether an issue is occurring and its severity; traces reveal why an issue occurred and pinpoint the specific sample; and tracing identifies slow code segments and failing external dependencies. Select an observation source based on the issue type:
| Issue type | Primary source | Secondary source | Not needed |
| --- | --- | --- | --- |
| Training divergence | metric | trace | log |
| Task failure | log | metric | trace |
| Stagnant reward | metric → trace | | log |
| Tool call error | tracing | log | None |
| Training slowdown | timing metrics | tracing | None |
| Poor output quality | trace | | None |
When to look beyond the console
- If you suspect an issue with the reward implementation, reproduce it locally using test_functions.
- If you suspect data issues, sample the JSONL file and inspect rollout_extra.
- If you suspect incorrect hyperparameters, review the submission script and check it against the "Reinforcement Learning Training Configuration — Submission and Configuration" guide.
Metric groups
Metrics generated during training are grouped by prefix. The following table summarizes each group.
| Group prefix | Metric count | Type | Description |
| --- | --- | --- | --- |
| actor/ | 8 | System | PPO policy network metrics: loss, entropy, KL divergence, clip ratio, gradient norm, and learning rate |
| critic/ | 12 | System | Reward and value metrics: mean, max, and min of score, rewards, advantages, and returns |
| trajectory/ | 16 | System | Trajectory statistics: response length, prompt length, truncation rate, abort rate, and conversation turns |
| trace/ | 40+ | Hybrid | Observability metrics: epoch, LLM calls, success rate, and custom reward_metrics and rollout_metrics |
| timing/ | 17 | System | Timing analysis: duration of each Trainer phase (in seconds), rollout duration, and per-token processing time (in milliseconds) |
| perf/ | 3 | System | Performance overview: total tokens, time per step, and throughput per GPU |
| fully_async/ | 24 | System | Asynchronous training scheduling: queue status, parameter versions, staleness statistics, and processing latency distribution |
Custom metrics
See the custom metrics section for a complete definition.
FAQ
Troubleshooting quick reference
| Issue keyword | Primary signal | Recommended action | See also |
| --- | --- | --- | --- |
| Stagnant reward | | Check the learning rate and reward function | P1 in this topic |
| KL explosion / Garbled output | | Increase | P3 / P8 in this topic |
| Reward hacking | Training reward increases while validation reward decreases | Switch signals / Apply reward shaping | P5 in this topic + "Reinforcement Learning Development Guide" |
| High truncation rate | High | Increase | P9 in this topic |
| FAILED · Function registration failed | Incorrect classpath / Missing dependency | Check | "Reinforcement Learning Training Configuration — Submission and Configuration" |
| FAILED · Rollout timeout | Individual request timeout | Increase the | See "Troubleshooting FAILED jobs" below |
| FAILED · Insufficient resources | MTU or FC container issues | Check the | "Reinforcement Learning Training Configuration — Submission and Configuration" |
| Cannot see trace data | Console is blank | Check ARMS authorization / | See "Troubleshooting missing trace data" below |
Troubleshooting FAILED jobs
This section covers troubleshooting FAILED jobs caused by metric abnormalities during training. For infrastructure-related failures, like FC function registration failures, upload timeouts, or insufficient resources, refer to the FAQ section in "Reinforcement Learning Training Configuration — Submission and Configuration".
Standard troubleshooting workflow
- Step 1: Details tab. Confirm the job status and the time of failure.
- Step 2: Logs tab. Inspect the last 100-500 lines of the log (SDK: AgenticRL.logs(job_id, lines=100) / CLI: dashscope rl logs --lines 100).
- Step 3: Identify the error layer:
  - User function error (exceptions in rollout/reward functions) → Reproduce locally with test_functions → Fix the code → Re-register and run.
  - Framework error (OOM, insufficient resources, network issues) → Check the fully_async queue and adjust concurrency/capacity.
  - Data error (JSONL parsing failure) → Validate the format of individual records and the rollout_extra field.
Common error patterns
| Error pattern | Primary symptom | Recommended action |
| --- | --- | --- |
| Rollout timeout | | Increase the timeout value and use tracing to identify the bottleneck. |
| Intermittent reward FAILED | Empty messages are unhandled or there is an encoding error. | Add a |
| Insufficient resources | Insufficient MTU or FC auto-scaling cannot keep up. | Check the fully_async queue and adjust |
| Incorrect data format | JSONL parsing fails. | Validate line-by-line JSON format, the roles in |
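For the data format case, a line-by-line check with the standard library is usually enough to locate the bad record. The sketch below is a minimal example under stated assumptions: `validate_jsonl` is a hypothetical helper, and requiring rollout_extra to be a JSON object when present is an assumption about the data schema.

```python
import json


def validate_jsonl(text: str) -> list:
    """Return (line_number, message) pairs for records that fail basic checks."""
    errors = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            errors.append((lineno, "record is not a JSON object"))
            continue
        extra = record.get("rollout_extra")
        # Assumption: rollout_extra, when present, should be a JSON object.
        if extra is not None and not isinstance(extra, dict):
            errors.append((lineno, "rollout_extra is not an object"))
    return errors


sample = (
    '{"messages": [], "rollout_extra": {"answer": "4"}}\n'
    '{"messages": [}\n'
    "[1, 2]\n"
    '{"messages": [], "rollout_extra": "4"}\n'
)
errors = validate_jsonl(sample)
for lineno, message in errors:
    print(lineno, message)
```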
Viewing trace data
After authorizing ARMS in the Model Studio console:
- Trajectory tab → View Trajectory Details, Reward Analysis, and Tool Call Analysis.
- Metrics tab → Check the trace/ group.
Troubleshooting missing trace data
Follow these steps to troubleshoot the issue:
- Ensure that ENABLE_TRAJECTORY=false is not set in the runtime env. (Tracing is enabled by default and requires no extra configuration.)
- Ensure you have authorized the ARMS service in the Model Studio console.
- Check that requirements.txt includes the required OpenTelemetry dependencies.
- Check whether the process() method is decorated with @observe_processor.
Performance impact of tracing
Tracing adds minor storage and latency overhead. We recommend enabling it during development and debugging. For large-scale production training, if you only need to view training metrics (such as Actor, Critic, and Perf) and do not need trace details, you can disable tracing by setting {"ENABLE_TRAJECTORY": "false"} in the runtime env. Disabling tracing does not affect system metrics.
Distinguishing reward functions
Set a unique name for each reward function by using RewardFunctionComponent(name="reward-1"). This name appears in the metric path (e.g., trace/reward_metrics/reward-1/...), and the console automatically groups and displays the metrics by name.
Disabling tracing to reduce costs
Set {"ENABLE_TRAJECTORY": "false"} in the runtime env, either by using FunctionComponentRuntime(env={"ENABLE_TRAJECTORY": "false"}) or by setting env: {ENABLE_TRAJECTORY: false} in a YAML file.
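One detail to watch when moving between the two forms: the Python example passes the string "false", whereas a YAML loader typically parses an unquoted false as a boolean. Whether the platform accepts both is not stated here, so the following sketch normalizes env values to strings before they are passed on. `normalize_env` is a hypothetical helper, not part of the SDK.

```python
def normalize_env(env: dict) -> dict:
    """Convert env values to strings; booleans become 'true' or 'false'."""
    normalized = {}
    for key, value in env.items():
        if isinstance(value, bool):
            normalized[str(key)] = "true" if value else "false"
        else:
            normalized[str(key)] = str(value)
    return normalized


# What a YAML loader would typically produce for `ENABLE_TRAJECTORY: false`.
from_yaml = {"ENABLE_TRAJECTORY": False}

print(normalize_env(from_yaml))  # {'ENABLE_TRAJECTORY': 'false'}
```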