Resource observation

Resource Observation lets you monitor resources such as Data Transmission Service, computing resources, and storage resources over time. You can analyze the metric charts to optimize job execution plans and resource configurations, which improves job efficiency and performance.

Supported regions

The following table lists the regions where Resource Observation is available for different resource types.

Resource type

Supported region

Computing Resource

China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), China (Ulanqab), China (Chengdu), China (Hong Kong), US (Silicon Valley), US (Virginia), Malaysia (Kuala Lumpur), Japan (Tokyo), Germany (Frankfurt), Indonesia (Jakarta), UK (London), and Singapore

Storage Resource

China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Chengdu), China (Zhangjiakou), China (Ulanqab), China (Hong Kong), Malaysia (Kuala Lumpur), Japan (Tokyo), Germany (Frankfurt), Indonesia (Jakarta), and Singapore

Data Transmission Service

China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Chengdu), China East 1 Finance, China (Hong Kong), Singapore, Japan (Tokyo), Malaysia (Kuala Lumpur), Indonesia (Jakarta), Germany (Frankfurt), UK (London), US (Silicon Valley), US (Virginia), and SAU (Riyadh - Partner Region)

Job Performance Observation

China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Ulanqab), China (Shenzhen), China (Chengdu), China (Hong Kong), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Japan (Tokyo), US (Silicon Valley), US (Virginia), Germany (Frankfurt), UK (London), and SAU (Riyadh - Partner Region)

Open Storage

China (Shanghai) and China (Shenzhen)

Permissions

  • An Alibaba Cloud account has all permissions to view and manage Resource Observation.

  • A RAM user requires specific RAM permissions. For more information, see RAM permissions.

Computing resource

You can view the CU consumption of subscription and pay-as-you-go quotas.

Procedure

  1. Log on to the MaxCompute console and select a region in the upper-left corner.

  2. In the left-side navigation pane, click Resource Observation.

  3. On the Resource Observation page, click the Computing Resource tab.

  4. Select a level-1 quota, a time range, and a time interval.

    The time interval specifies the data point frequency. You can select an adaptive interval, or a 1-minute, 5-minute, or 15-minute interval. To ensure performance, the system automatically sets the interval to adaptive if the time range exceeds 72 hours.

  5. Click the icon to the left of a target level-2 quota to expand its resource consumption trend chart. You can expand multiple charts at the same time.

  6. View the list of projects associated with each level-2 quota.
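The interval rule in step 4 can be sketched as a small function. This is only an illustration of the documented behavior, not console code: the interval labels (`"1m"`, `"5m"`, `"15m"`, `"adaptive"`) and the function name are hypothetical.

```python
from datetime import timedelta

# Hypothetical labels for the intervals offered on the Computing Resource tab.
ADAPTIVE = "adaptive"
ALLOWED_INTERVALS = {ADAPTIVE, "1m", "5m", "15m"}

# Above this time range the interval is forced to adaptive (per the text above).
MAX_FIXED_RANGE = timedelta(hours=72)


def effective_interval(time_range: timedelta, requested: str) -> str:
    """Return the interval that would actually apply for the selected range."""
    if requested not in ALLOWED_INTERVALS:
        raise ValueError(f"unsupported interval: {requested}")
    if time_range > MAX_FIXED_RANGE:
        return ADAPTIVE
    return requested
```

For example, `effective_interval(timedelta(hours=24), "5m")` keeps the 5-minute interval, while a 96-hour range falls back to the adaptive interval regardless of the requested value.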

Metrics

Metric

Description

CPU Resources

The CPU usage trend for the current quota group. You can click a point on the timeline to view a list of job snapshots at that time.

Memory Resources

The memory usage trend for the current quota group.

Important

Pay-as-you-go resources are shared. Computing jobs preempt these resources on demand, so you cannot specify a fixed amount of usage. If a single user continuously requests a large amount of resources, MaxCompute limits the user's resource usage to ensure fair access for other users.

The list of quotas and associated projects shows which projects have set the corresponding level-2 quota as their default computing quota.

Storage resource

You can view the total storage usage and the proportion of different storage types in the current region. You can also monitor usage trends for various storage types and view detailed table or partition storage information based on the selected project and time range.

Procedure

  1. Log on to the MaxCompute console and select a region in the upper-left corner.

  2. In the left-side navigation pane, click Resource Observation.

  3. On the Resource Observation page, click the Storage Resource tab to view the total storage usage and storage distribution for the current day.

  4. (Optional) Select a time range (default is 7d) and projects (default is all projects; you can select up to eight projects) to view the Storage Trend.

  5. (Optional) In the Storage Details area, on the Project Details tab, select a date (default is the current day) to view the storage usage of each project.

  6. (Optional) In the Storage Details area, on the Table/Partition Details tab, select a date (default is the current day) and a project to view the detailed table and partition storage usage in the project.

Metrics

Metric

Description

Storage Usage on the Current Day

The total storage usage and the storage usage percentage of each storage type in the current region. Data is updated approximately every hour.

Storage Distribution

The number of projects, tables, and partitions in the current region. Data is updated daily.

Storage Trend

  • Group by storage type: The storage usage of all or selected projects in the current region, and the usage trend of each storage type over time.

  • Group by project: The usage trends of different storage types over time, grouped by the top 8 projects with the highest total storage usage (default) or by the selected projects.

Project Details

The detailed storage usage of various storage types for projects with total storage greater than 0 on a specified date (within the last year) in the current region. It also provides a comparison of the total storage with data from N (1, 7, or 30) days ago.

Table/Partition Details

The storage type, storage size, and a comparison with data from N (1, 7, or 30) days ago for all tables and partitions in a specified project on a specified date (within the last year).
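The comparison with data from N days ago can be reproduced offline if you keep daily storage figures. The following is a minimal sketch under assumptions: the `history` mapping (date to storage size in GB) and the function name are hypothetical, and the console may present the comparison differently.

```python
from datetime import date, timedelta
from typing import Dict, Optional, Tuple


def storage_change(
    history: Dict[date, float], day: date, n: int
) -> Optional[Tuple[float, float]]:
    """Compare storage on `day` with storage N days earlier.

    Returns (absolute change in GB, change in percent), or None when either
    data point is missing or the earlier value is zero.
    """
    if n not in (1, 7, 30):
        raise ValueError("N must be 1, 7, or 30")
    current = history.get(day)
    previous = history.get(day - timedelta(days=n))
    if current is None or previous is None or previous == 0:
        return None
    delta = current - previous
    return delta, delta * 100 / previous
```

With 120 GB on May 6, 2024 and 100 GB on May 5, 2024, the 1-day comparison yields a change of 20 GB, or 20%.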

MaxCompute does not support viewing storage usage by column. To estimate column-level storage, export the table data and calculate the usage for each column.

Data transmission service

You can view the resource usage of a specific Data Transmission Service resource group or project. You can also use filters to observe and analyze the usage of different tables or request types in more detail.

Procedure

  1. Log on to the MaxCompute console and select a region in the upper-left corner.

  2. In the left-side navigation pane, choose Workspace > Resource Observation.

  3. On the Resource Observation page, click the Data Transmission Service tab.

  4. Select a quota, project, time range, and aggregation algorithm to query the usage of each metric.

Usage notes

Data aggregation mechanism

Data Transmission Service monitoring uses an adaptive mechanism for metric intervals. It automatically optimizes the density of monitoring data based on your selected time range:

  • Short periods (within 3 hours) use the raw data granularity (1 minute per point).

  • Longer periods use a larger metric step: 5 minutes per point for ranges up to 12 hours, 30 minutes per point for ranges up to 72 hours, and 60 minutes per point for ranges up to 7 days.
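The step selection above can be sketched as a lookup. The tier values come from this topic; treating each upper bound as inclusive is an assumption, and the function names are hypothetical.

```python
from datetime import timedelta

# (upper bound of the selected time range, step in minutes).
# Assumption: each bound is inclusive.
STEP_TIERS = [
    (timedelta(hours=3), 1),
    (timedelta(hours=12), 5),
    (timedelta(hours=72), 30),
    (timedelta(days=7), 60),
]


def step_minutes(time_range: timedelta) -> int:
    """Return the metric step, in minutes per point, for a time range."""
    if time_range <= timedelta(0):
        raise ValueError("time range must be positive")
    for upper_bound, step in STEP_TIERS:
        if time_range <= upper_bound:
            return step
    raise ValueError("time range exceeds the 7-day maximum")


def point_count(time_range: timedelta) -> int:
    """Return the approximate number of points drawn for a time range."""
    total_minutes = int(time_range.total_seconds() // 60)
    return total_minutes // step_minutes(time_range)
```

Under these assumptions, a 3-hour range is drawn with 180 points and a 7-day range with 168 points, which shows why shorter ranges give more precise data.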

You can choose from several aggregation strategies:

  • Average: Reflects the overall data trend.

  • Max: Captures abnormal fluctuations.

When the step exceeds the base granularity (1 minute), the system first aggregates the raw 1-minute data points within each step according to the selected aggregation strategy. Therefore, the trend in the monitoring chart varies with the chosen aggregation algorithm. The longer the time period, the larger the potential difference between the two algorithms. This is expected behavior. Select the aggregation algorithm that fits your analysis scenario:

  • Performance analysis: We recommend using the Average aggregation algorithm.

  • Troubleshooting: We recommend using the Max aggregation algorithm.
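To see why the two algorithms diverge, consider a short spike inside an otherwise flat series. The following is an illustrative sketch only: the sample values are made up, and the function is not the service's implementation.

```python
from typing import List


def aggregate(points: List[float], step: int, how: str) -> List[float]:
    """Collapse 1-minute samples into buckets of `step` samples each."""
    if how not in ("avg", "max"):
        raise ValueError("how must be 'avg' or 'max'")
    buckets = [points[i:i + step] for i in range(0, len(points), step)]
    if how == "avg":
        return [sum(bucket) / len(bucket) for bucket in buckets]
    return [max(bucket) for bucket in buckets]


# Ten hypothetical 1-minute samples with a single spike to 60 slots.
samples = [10, 10, 10, 10, 60, 10, 10, 10, 10, 10]

print(aggregate(samples, 5, "avg"))  # [20.0, 10.0]
print(aggregate(samples, 5, "max"))  # [60, 10]
```

With a 5-minute step, Average smooths the spike to 20 while Max preserves the peak of 60, which is why Max is better suited to troubleshooting.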

Filters and limits

  • When filtering by time range, you can select up to 7 days at a time. Due to the adaptive interval mechanism, the shorter the selected time range, the more precise the monitoring data.

  • You must select at least one quota or project. You can also use a combination of a quota and project for filtering:

    • To view usage monitoring only by quota: You can specify an exclusive or shared resource group when selecting a quota. Since shared resource groups are project-level quotas, you must also specify a project when monitoring a shared resource group.

    • To view usage monitoring only by project: Leave Select Quota empty and specify the project in Choose Project. The total usage for that project is displayed.

  • After changing filter conditions, you must click Query to refresh the monitoring data.

  • For some table-level monitoring dashboards, you must select a project before you can view data or filter by table name.
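The filter rules above can be summarized as a validation routine. This is a hedged sketch: the `DtsFilter` class and its field names are hypothetical and do not correspond to any console or API object.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List, Optional


@dataclass
class DtsFilter:
    """Hypothetical container for the filter selections on the tab."""
    time_range: timedelta
    quota: Optional[str] = None
    quota_is_shared: bool = False
    project: Optional[str] = None


def validate(f: DtsFilter) -> List[str]:
    """Return the rules from this section that the filter violates."""
    problems = []
    if f.time_range > timedelta(days=7):
        problems.append("time range exceeds 7 days")
    if f.quota is None and f.project is None:
        problems.append("select at least one quota or project")
    if f.quota is not None and f.quota_is_shared and f.project is None:
        problems.append("a shared resource group also requires a project")
    return problems
```

An empty list means the selection satisfies the documented limits.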

Metrics

Metric

Description

Request Parallelism

Displays slot usage based on the selected filters, including current usage and the quota limit. The unit is slots.

Throughput

Displays throughput based on the selected filters. The unit (such as B/min or MB/min) is shown on the vertical axis.

Table-level Request Parallelism

Select a Usage Type (for example, Tunnel Batch upload) and a Table Name (for example, testtable). The chart shows the request parallelism for uploading data to the testtable table by using the Tunnel Batch method based on the filter conditions. The unit is slots.

Table-level IP Throughput

Select a Usage Type (for example, Tunnel Batch upload) and a Table Name (for example, testtable). The chart shows the throughput from each source IP address when data is uploaded to the testtable table by using the Tunnel Batch method based on the filter conditions.

Total Requests and Error Requests

The total number of requests and the count of various error requests based on the filter conditions:

  • Total requests: the sum of all successful and failed requests.

  • Error requests: all requests with a 4xx or 5xx status code. For more information about status codes, see Data Transmission Service overview.

Total Throughput

A summary of the data volume for different usage types within the selected time range based on the filter conditions. A pie chart shows the proportion of each usage type.

Slot Average Transfer Rate

Select a Usage Type (for example, Tunnel Batch upload). The chart shows the average transmission rate per slot for requests that use the Tunnel Batch upload method based on the filter conditions.
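The Total Requests and Error Requests definitions can be illustrated with a short tally over status codes. The input list is hypothetical sample data, and the function is a sketch rather than the service's own accounting.

```python
from collections import Counter
from typing import Dict, List


def summarize_requests(status_codes: List[int]) -> Dict[str, object]:
    """Count total requests and error requests (4xx or 5xx status codes)."""
    errors = Counter(code for code in status_codes if 400 <= code <= 599)
    return {
        "total": len(status_codes),       # successful and failed requests
        "errors": sum(errors.values()),   # requests with a 4xx or 5xx code
        "by_code": dict(errors),          # error count per status code
    }


print(summarize_requests([200, 200, 404, 500, 500, 302]))
# {'total': 6, 'errors': 3, 'by_code': {404: 1, 500: 2}}
```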

Open storage

Important

The Open Storage feature is in public preview. For more information, see Open Storage overview.

You can view the resource usage of Open Storage.

Procedure

  1. Log on to the MaxCompute console and select a region in the upper-left corner.

  2. In the left-side navigation pane, click Resource Observation.

  3. On the Resource Observation page, click the Open Storage tab.

  4. Select the project, table, task initiator, and time range to query the usage of each metric.

Metrics

Metric

Description

StorageAPIRead

The total amount of data read through Open Storage (Storage API).

StorageAPIWrite

The total amount of data written through Open Storage (Storage API).

Job performance observation

You can view the number of jobs, CU usage, and runtime of computing jobs to determine whether their performance meets expectations.

Procedure

  1. Log on to the MaxCompute console and select a region in the upper-left corner.

  2. In the left-side navigation pane, click Resource Observation.

  3. On the Resource Observation page, click the Job Performance Observation tab.

  4. Select the following parameters to filter and group jobs. This allows you to view specific jobs and group metric data in charts by different dimensions.

    Parameter

    Description

    Time range

    Required. Filters completed jobs based on the specified time range (start and end times).

    You can select a preset time range or configure a custom one:

    • 1d: The last 1 day.

    • 3d: The last 3 days.

    • 7d: The last 7 days.

    • Select a specific period: Click the time range drop-down list, select the desired date, and then click Select Time to choose the target time period.

    Note

    The default time range is the last day. The maximum time range is 7 days, and the minimum is 1 hour. You can search for jobs from the last 45 days.

    Choose Project

    Filters by MaxCompute project name.

    Note

    By default, all projects are selected. You can select up to 8 projects.

    Select Quota

    Filters by computing quota.

    Note

    By default, all computing quotas are selected. You can select up to 8 level-2 quotas. For more information about computing quotas, see Manage computing quotas.

    Group By

    Required. Depending on the chart type, you can group data in the chart view by multiple dimensions.

    Valid values for Group By:

    • No Group (Default): Displays the trends of various metrics for all jobs within the filtered range over time.

    • Project: Displays various metrics for all jobs within the filtered range, grouped by project.

      Note

      When you group by project, you must specify the projects in the filter parameters, up to a maximum of 8 projects.

    • Quotas: Displays various metrics for all jobs within the filtered range, grouped by level-2 quota.

      Note

      When you group by quota, you must specify the quotas in the filter parameters, up to a maximum of 8 level-2 quotas.

    • Job type: Displays various metrics for all jobs within the filtered range, grouped by job type.

      • SQL: SQL job.

      • MCQA: MaxCompute Query Acceleration (MCQA) SQL job.

      • LOT: MapReduce job.

      • CUPID: Spark or Mars job.

      • Algo_Task: machine learning job.

      • GRAPH: graph computing job.

    • Job End Status: Displays various metrics for all jobs within the filtered range, grouped by their final status.

      • Success: The job ran successfully.

      • Failed: The job failed.

      • Cancelled: The job was canceled.

  5. Click Query to view the statistics for each metric.

  6. (Optional) Select a Data Summary granularity to view the statistics for each metric based on the chosen time dimension.

    Parameter

    Description

    By Hour

    Summarizes data in one-hour intervals. Each data point shows statistics for jobs that finished running within that hour. This is the default setting.

    For example, the data point for 14:00 on May 6, 2024 shows statistics for jobs that finished between 14:00 and 15:00 on May 6, 2024.

    By Day

    Summarizes data in one-day intervals. Each data point shows statistics for jobs that finished running on that day.

    For example, the data point for May 6, 2024 shows statistics for jobs that finished between 00:00 on May 6, 2024, and 00:00 on May 7, 2024.

  7. (Optional) Select a Comparison Period to view historical statistics from a previous date or hour.

    The default is No Comparison. Options include Previous 30 Days, Previous 7 Days, and Previous 1 Day. For example, for 14:00 on May 6, 2024, the 30-day comparison data is the statistics from 14:00 on April 6, 2024.
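The time range limits, the hourly summary, and the comparison period can be expressed with plain date arithmetic. This is a minimal sketch: the function names are hypothetical, and measuring the 45-day window from the current time to the start of the range is an assumption.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

MIN_RANGE = timedelta(hours=1)
MAX_RANGE = timedelta(days=7)
LOOKBACK = timedelta(days=45)  # assumption: measured from `now` to the range start


def check_time_range(start: datetime, end: datetime, now: datetime) -> List[str]:
    """Return the documented limits that a custom time range violates."""
    if end <= start:
        return ["end must be after start"]
    problems = []
    if end - start < MIN_RANGE:
        problems.append("range is shorter than 1 hour")
    if end - start > MAX_RANGE:
        problems.append("range is longer than 7 days")
    if start < now - LOOKBACK:
        problems.append("range starts more than 45 days ago")
    return problems


def hour_bucket(finished_at: datetime) -> Tuple[datetime, datetime]:
    """Return the [start, end) hour that a finished job is counted in."""
    start = finished_at.replace(minute=0, second=0, microsecond=0)
    return start, start + timedelta(hours=1)


def comparison_point(point: datetime, days_back: int) -> datetime:
    """Return the timestamp used for a 1-, 7-, or 30-day comparison."""
    if days_back not in (1, 7, 30):
        raise ValueError("days_back must be 1, 7, or 30")
    return point - timedelta(days=days_back)
```

For example, `comparison_point(datetime(2024, 5, 6, 14), 30)` returns 14:00 on April 6, 2024, which matches the example in step 7.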

Metrics

  • CU Usage Trend (Unit: Core*H)

    Metric

    Description

    CPU-hour (Unit: Core*H)

    The number of CPU-hours consumed by completed jobs within the selected filter range.

    One CPU-hour is defined as one CPU core consumed for one hour. CPU-hours = Number of CPU cores × Duration.

    Memory-hour (Unit: GB*H)

    The amount of memory-hours consumed by completed jobs within the selected filter range.

    One memory-hour is defined as 1 GB of memory consumed for one hour. Memory-hours = Memory size × Duration.

    Top 10 CPU-hour/Memory-hour Consumption Analysis

    Ranks the top 10 jobs by CPU-hour or memory-hour consumption. It also lists the top 10 Signatures and ExtNodeIds with the highest total or average consumption within the selected filter range.

  • Job Runtime Period (Unit: seconds)

    Metric

    Description

    Average Value

    The average runtime of completed jobs within the selected filter range.

    Maximum Value

    The longest runtime of completed jobs within the selected filter range.

    Minimum Value

    The shortest runtime of completed jobs within the selected filter range.

    Quantile

    The job runtime for a specific quantile (including the 1st, 5th, 10th, 50th, 90th, 95th, and 99th percentiles) of completed jobs within the selected filter range.

    For example, the 99th percentile indicates that 99% of jobs finished within this runtime.

    Top 10 Job Runtime Analysis

    Provides the top 10 jobs with the longest total runtime, and the top 10 Signatures and ExtNodeIds with the longest total or average runtime within the selected filter range.

  • Job Count Trend (Unit: count): The number of completed jobs within the selected filter range.

  • Job Input Size Trend (Unit: GB, automatically adjusted based on the chart's scale): The amount of data scanned by completed jobs within the selected filter range.

  • Trend of Job Input Size per CU-Hour (Unit: GB, automatically adjusted based on the chart's scale): The average amount of data scanned by jobs per CU-hour within the selected filter range. One CU consists of 1 CPU core and 4 GB of memory, so CU-hours are calculated as MAX(CPU-hour, CEILING(memory-hour/4)).
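The CPU-hour, memory-hour, and CU-hour formulas above can be worked through with a small example. The job figures are hypothetical and the function names are illustrative only.

```python
import math


def cpu_hours(cores: float, hours: float) -> float:
    """CPU-hours = number of CPU cores x duration in hours."""
    return cores * hours


def memory_hours(memory_gb: float, hours: float) -> float:
    """Memory-hours = memory size in GB x duration in hours."""
    return memory_gb * hours


def cu_hours(cpu_h: float, mem_h: float) -> float:
    """CU-hours = MAX(CPU-hour, CEILING(memory-hour / 4))."""
    return max(cpu_h, math.ceil(mem_h / 4))


def input_per_cu_hour(input_gb: float, cpu_h: float, mem_h: float) -> float:
    """Average data scanned, in GB, per CU-hour."""
    cu_h = cu_hours(cpu_h, mem_h)
    if cu_h == 0:
        raise ValueError("job consumed no CU-hours")
    return input_gb / cu_h


# Hypothetical job: 2 cores and 12 GB of memory for half an hour, scanning 100 GB.
cpu_h = cpu_hours(2, 0.5)        # 1.0 Core*H
mem_h = memory_hours(12, 0.5)    # 6.0 GB*H
print(cu_hours(cpu_h, mem_h))                  # 2
print(input_per_cu_hour(100, cpu_h, mem_h))    # 50.0
```

In this example memory is the limiting dimension: 6 GB*H divided by 4 rounds up to 2 CU-hours, which exceeds the 1 CPU-hour consumed.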

You can also compute the preceding metrics by querying the tenant-level Information Schema. Note that the Information Schema tasks_history table records all task instances, whereas the Job Performance Observation tab in the console counts only jobs that consume computing resources. Therefore, the results may differ.

The following is a sample query:

SET odps.namespace.schema=TRUE;
SELECT to_char(end_time, 'yyyy-mm-dd hh') AS end_hour, -- The hour when the job finished.
       -- to_char(end_time, 'yyyy-mm-dd') AS end_day, -- For daily granularity, use this line instead of the preceding one.
       -- status, -- Optional grouping column: status (job status), task_catalog (project), or task_type (job type).
       sum(cast(cost_cpu/100/3600 AS DECIMAL(18,5))) AS cost_cpuh, -- CPU-hour
       sum(cast(cost_mem/1024/3600 AS DECIMAL(18,5))) AS cost_memh, -- Memory-hour
       avg(datediff(end_time, start_time, 'ss')) AS avg_runtime_s, -- Average job runtime, in seconds
       min(datediff(end_time, start_time, 'ss')) AS min_runtime_s, -- Minimum job runtime, in seconds
       max(datediff(end_time, start_time, 'ss')) AS max_runtime_s  -- Maximum job runtime, in seconds
FROM SYSTEM_CATALOG.INFORMATION_SCHEMA.tasks_history
WHERE ds >= to_char(date_add(getdate(), -7), 'yyyymmdd') -- Modify or add other filter conditions as needed.
  AND task_type IN ('SQL','SQLRT','LOT','CUPID','ALgoTask')
GROUP BY to_char(end_time, 'yyyy-mm-dd hh')
         -- to_char(end_time, 'yyyy-mm-dd') -- For daily granularity, use this line instead of the preceding one.
         -- , status -- Add the same optional grouping column that you selected above.
ORDER BY to_char(end_time, 'yyyy-mm-dd hh') ASC;
         -- to_char(end_time, 'yyyy-mm-dd') ASC; -- For daily granularity, use this line instead of the preceding one.
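The divisors in the sample query suggest that cost_cpu is recorded in units of 1/100 core-second and cost_mem in MB-seconds. Treating that reading as an assumption, the same conversion can be sketched outside SQL as follows. The function names are hypothetical.

```python
def cost_cpu_to_cpu_hours(cost_cpu: float) -> float:
    """Convert cost_cpu to CPU-hours (Core*H).

    Assumption: 100 units of cost_cpu equal 1 core-second, as implied by the
    /100/3600 divisors in the sample query.
    """
    return cost_cpu / 100 / 3600


def cost_mem_to_memory_hours(cost_mem: float) -> float:
    """Convert cost_mem to memory-hours (GB*H).

    Assumption: cost_mem is in MB-seconds, as implied by the /1024/3600
    divisors in the sample query.
    """
    return cost_mem / 1024 / 3600


# Hypothetical job: 10 cores and 4096 MB of memory held for one hour.
print(cost_cpu_to_cpu_hours(10 * 100 * 3600))   # 10.0
print(cost_mem_to_memory_hours(4096 * 3600))    # 4.0
```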

FAQ

  • Question 1:

    • Q: Why are some projects or quotas missing from the chart legend after grouping?

    • A: This happens when the job count for a project or quota is zero within the selected time range.

  • Question 2:

    • Q: Why is comparison data missing for a previous period?

    • A: Data may be missing if the project or quota did not exist or had no jobs during the comparison period.

Related topics

After observing the resource usage, you can optimize the job execution plan and resource configuration as needed:

  • To adjust resources, see Configure quotas to modify the quota and time plans for a quota group.

  • To adjust job execution priority, see Job priority.