DataWorks open data provides tables and views that expose collected metadata across multiple dimensions. This topic lists the available tables and views and describes their schemas.
Metadata
DataWorks generates metadata tables and sample metric tables from the metadata of resources in your tenant, such as tables, tasks, instances, workspaces, members, and projects. The schemas of these tables are dynamic. If a schema described in this topic differs from the schema displayed in the UI, the UI is authoritative.
Asset table issues (asset_table_issues)
Partition field: dt
Description: Details of data governance issues in the table.
| Parameter | Type | Description |
| --- | --- | --- |
| tenant_id | string | The ID of the DataWorks tenant. |
| meta_entity_id | string | The ID of the corresponding metadata entity. |
| uuid | string | The unique key of the table. |
| meta_entity_type | string | The type of the corresponding metadata entity. For example, maxcompute-table. |
| entity_type | string | The entity type. For example, table, view, and materialized_view. |
| account_id | string | The main account that owns the asset. |
| datasource_type | string | The data source type. For example, E-MapReduce and MaxCompute. |
| datasource_id | string | The name of the engine, which is projectName for MaxCompute, clusterId for E-MapReduce, or databaseName for Hologres. |
| catalog_name | string | The name of the Data Lake Formation catalog, used when Data Lake Formation is the metadata source. |
| database_name | string | The name of the database. For E-MapReduce, this is the dbName. |
| schema_name | string | The name of the schema. |
| rule_id | string | The ID of the governance rule. |
| rule_name_zh | string | The Chinese name of the governance rule. |
| rule_name_en | string | The English name of the governance rule. |
| category | string | The dimension of the governance rule. |
| deduct_score_tenant | string | The points deducted at the tenant level. The value is accurate to four decimal places. |
| deduct_score_owner | string | The points deducted at the owner level. The value is accurate to four decimal places. |
| cost | string | The amount of wasted resources. |
| project_id | string | The ID of the DataWorks project. |
| dt | string | The date partition, a logical partition field, in YYYYMMDD format. |
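Because deduct_score_tenant and deduct_score_owner are stored as strings with four decimal places, parsing them as decimals rather than floats keeps aggregated totals exact. The following Python sketch sums tenant-level deductions per rule_id. It assumes the rows have already been read from asset_table_issues into dictionaries keyed by column name; the sample rows are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal


def total_deductions_by_rule(rows):
    """Sum deduct_score_tenant per rule_id using exact decimal arithmetic."""
    totals = defaultdict(lambda: Decimal("0.0000"))
    for row in rows:
        # The score column is a string such as "0.1250"; Decimal parses it without rounding.
        totals[row["rule_id"]] += Decimal(row["deduct_score_tenant"])
    return dict(totals)


# Hypothetical rows; real values come from the asset_table_issues table.
sample = [
    {"rule_id": "r1", "deduct_score_tenant": "0.1250"},
    {"rule_id": "r1", "deduct_score_tenant": "0.0750"},
    {"rule_id": "r2", "deduct_score_tenant": "0.5000"},
]
print(total_deductions_by_rule(sample))
```

The same approach applies to deduct_score_owner if you aggregate per owner instead of per rule.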
Asset table profiles (asset_table_profiles)
Partition field: dt
Description: Detailed metrics for table assets.
| Parameter | Type | Description |
| --- | --- | --- |
| tenant_id | bigint | The ID of the source tenant. |
| meta_entity_id | string | The ID of the corresponding metadata entity. |
| meta_entity_type | string | The type of the corresponding metadata entity. For example, maxcompute-table. |
| entity_type | string | The entity type, such as table, view, or materialized_view. |
| account_id | string | The main account that owns the asset. |
| datasource_type | string | The data source type, such as E-MapReduce or MaxCompute. |
| datasource_id | string | The engine name. Examples include projectName for MaxCompute, clusterId for E-MapReduce, and databaseName for Hologres. |
| catalog_name | string | The Data Lake Formation (DLF) catalog name. This field is populated only when DLF is the metadata source. |
| database_name | string | The name of the database. For E-MapReduce, this corresponds to dbName. |
| schema_name | string | The name of the schema. |
| uuid | string | The unique key of the table. |
| name | string | The name of the table. |
| owner | string | The owner of the asset. |
| last_access_timestamp | bigint | The time when the table was last accessed. |
| meta_modified_timestamp | bigint | The 13-digit UNIX timestamp indicating when the table's metadata was last modified. |
| data_modified_timestamp | bigint | The 13-digit UNIX timestamp indicating when the table's data was last modified. |
| create_timestamp | bigint | The time when the table was created. |
| comment | string | The comment of the table. |
| partition_keys | string | The partition keys of the table. |
| tags | string | The tags of the asset. |
| governance_rule_finding_count | bigint | The number of issues identified by governance rules. |
| governance_rule_finding_history_count | string | The historical count of governance findings for the asset. |
| governance_health_score | string | The governance health score of the asset. |
| governance_health_level | string | The governance health level of the asset, derived from its score. |
| is_partitioned | bigint | Indicates whether the table is partitioned. |
| content_size | bigint | The logical size of the table. |
| record_num | bigint | The number of records in the table. |
| life_cycle | string | The lifecycle of the table. |
| partition_count | bigint | The number of partitions in the table. |
| view_count_monthly | bigint | The number of times the table was viewed over the last month. |
| access_count | bigint | The total number of times the table was accessed. |
| upstream_table_count | bigint | The number of upstream tables. |
| upstream_table_detail | string | Details about the upstream tables. |
| downstream_table_count | bigint | The number of downstream tables. |
| downstream_table_detail | string | Details about the downstream tables. |
| producing_project_ids | string | The IDs of the workspaces that produce the table. |
| producing_tasks_count | bigint | The number of tasks that produce this table. |
| producing_tasks_detail | string | Details about the tasks that produce this table. |
| using_tasks_count | bigint | The number of tasks that use this table. |
| using_tasks_detail | string | Details about the tasks that use this table. |
| quality_rule_count | bigint | The number of quality rules for the table. |
| quality_monitor_count | bigint | The number of quality monitoring metrics for the table. |
| quality_rule_7_days_failed_count | bigint | The number of failed quality rule checks in the last 7 days. |
| quality_monitor_7_days_failed_count | bigint | The number of failed quality monitoring metric checks in the last 7 days. |
| dt | string | The date partition, which serves as a logical partition field. The format is YYYYMMDD. |
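The columns described as 13-digit UNIX timestamps hold milliseconds since the epoch. A minimal Python sketch for converting them to timezone-aware UTC datetimes follows. It assumes that the other bigint timestamp columns in this table, such as create_timestamp and last_access_timestamp, use the same millisecond convention, which this topic does not state explicitly.

```python
from datetime import datetime, timezone


def from_millis(ts_ms):
    """Convert a 13-digit millisecond UNIX timestamp to an aware UTC datetime (None stays None)."""
    if ts_ms is None:
        return None
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)


# Hypothetical data_modified_timestamp value.
print(from_millis(1700000000000).isoformat())
```

Returning an aware datetime avoids silently interpreting the value in the local time zone of the machine that runs the code.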
Asset task issues (asset_task_issues)
Partition field: dt
Description: Details of data governance issues identified in tasks.
| Parameter | Type | Description |
| --- | --- | --- |
| tenant_id | string | The ID of the DataWorks tenant. |
| node_id | string | The ID of the scheduling node. |
| node_name | string | The name of the node. |
| node_type | string | The type of the node. Valid values: SQL, SQLCost, LOT, and CUPID. |
| node_owner | string | The base ID of the node owner. |
| priority | string | The priority of the node. |
| rule_id | string | The ID of the governance rule. |
| rule_name_zh | string | The Chinese name of the governance rule. |
| rule_name_en | string | The English name of the governance rule. |
| category | string | The governance domain of the rule. |
| deduct_score_tenant | string | The score deduction for the tenant, accurate to four decimal places. |
| deduct_score_owner | string | The score deduction for the owner, accurate to four decimal places. |
| cost | string | The benefit gained by resolving the issue, typically measured as a cost saving. |
| project_id | string | The ID of the DataWorks project. |
| dt | string | The logical date partition, in YYYYMMDD format. |
Asset task profiles (asset_task_profiles)
Partition field: dt
Description: Detailed metrics for asset tasks.
| Parameter | Type | Description |
| --- | --- | --- |
| tenant_id | bigint | The ID of the tenant. |
| data_asset_id | string | The ID of the asset within the module, corresponding to |
| name | string | The asset name, corresponding to |
| project_id | bigint | The ID of the workspace where the asset is located. |
| project_env | string | The environment. Valid values: |
| owner | string | The owner of the asset. |
| create_user | string | The user who created the asset. |
| create_time | bigint | The time when the asset was created. |
| modify_user | string | The user who last modified the asset. |
| modify_time | bigint | The time when the asset was last modified. |
| trigger_type | string | The trigger type. Valid values: |
| trigger_recurrence_type | string | The scheduling state. Valid values: |
| trigger_cron | string | The cron expression. |
| type | bigint | The type of code executed by the node. For a list of node type codes, see the DataStudio documentation at https://help.aliyun.com/zh/dataworks/user-guide/node-development-of-data-studio. |
| script_parameters | string | The script parameters. |
| priority | bigint | The priority of the task. Valid values range from 1 (lowest) to 8 (highest). The default is 1. |
| trigger_start_time | bigint | The start date for scheduling. |
| trigger_end_time | bigint | The end date for scheduling. |
| runtime_resource_group_id | bigint | The ID of the resource group to which the node belongs. |
| runtime_cu | string | The compute units (CUs). |
| baseline_id | bigint | The ID of the baseline to which the node belongs. |
| rerun_times | bigint | The maximum number of times the task can be rerun. |
| rerun_interval | bigint | The rerun interval, in milliseconds. |
| rerun_mode_type | string | Specifies when the task can be rerun. Valid values: |
| tags | string | The tags associated with the asset. |
| tags_count | bigint | The number of tags associated with the asset. |
| input_table_count | bigint | The number of input tables. |
| output_table_count | bigint | The number of output tables. |
| input_table_detail | string | Details of the input tables. |
| output_table_detail | string | Details of the output tables. |
| upstream_node_count | bigint | The number of upstream nodes. |
| downstream_node_count | bigint | The number of downstream nodes. |
| governance_rule_finding_count | bigint | The number of issues identified by governance rules. |
| governance_rule_finding_history_count | string | The historical count of governance issues for the asset. |
| governance_health_score | string | The health score of the asset. |
| governance_health_level | string | The health level of the asset, based on its score. |
| engine_datasource_id | string | The ID of the compute engine. |
| engine_instance_count | bigint | The number of compute engine instances. |
| engine_instance_run_time | bigint | The total runtime of compute engine instances. |
| engine_instance_comput_volume_cost | string | The volume of computation. |
| engine_instance_cu_cost | string | The compute units (CUs) consumed. |
| engine_instance_cpu_cost | string | The CPU consumption. |
| engine_instance_mem_cost | string | The memory consumption. |
| engine_instance_exist_data_skew | bigint | Indicates whether data skew exists. |
| engine_instance_suggestions | string | Suggestions for addressing data skew. |
| engine_instance_data_skew_ids | string | The IDs of instances with data skew. |
| engine_instance_ids | string | The IDs of the engine instances. |
| task_instance_wait_time_cost_sum | bigint | The total instance wait time, in milliseconds. |
| task_instance_wait_time_cost_max | bigint | The maximum instance wait time, in milliseconds. |
| task_instance_run_time_cost_sum | bigint | The total instance runtime, in milliseconds. |
| task_instance_run_time_cost_max | bigint | The maximum runtime of a single instance, in milliseconds. |
| task_instance_7_days_wait_time_cost_max | bigint | The maximum instance wait time over the last seven days, in milliseconds. |
| task_instance_7_days_run_time_cost_max | bigint | The maximum instance runtime over the last seven days, in milliseconds. |
| task_instance_count | bigint | The number of instances. |
| task_instance_7_days_failed_count | bigint | The number of failed instances over the last seven days. |
| task_instance_7_days_failed_day_count | bigint | The number of days with failures over the last seven days. |
| task_instance_7_days_frezeed_day_count | bigint | The number of days on which the task was frozen over the last seven days. |
| task_instance_7_days_dry_run_day_count | bigint | The number of days on which the task was skipped (dry run) over the last seven days. |
| quality_monitor_count | bigint | The number of data quality monitoring metrics. |
| quality_monitor_7_days_failed_count | bigint | The number of failed data quality monitoring metrics over the last seven days. |
| di_task_resource_group_id | string | The ID of the data integration resource group to which the node belongs. |
| di_task_is_public_network | bigint | Indicates whether the data integration task uses public network traffic. |
| di_task_concurrency | bigint | The concurrency level of the data integration task. |
| di_task_total_records | bigint | The total number of synchronized records. |
| di_task_total_bytes | bigint | The total volume of synchronized data, in bytes. |
| di_task_source_type | string | The type of the data source. |
| di_task_target_type | string | The type of the data target. |
| di_task_run_time_cost | bigint | The runtime of the data integration task, in milliseconds. |
| di_task_wait_time_cost | bigint | The wait time of the data integration task, in milliseconds. |
| dt | string | The date partition for the record, in YYYYMMDD format. |
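The duration and count columns in asset_task_profiles can be combined into derived metrics. For example, the following Python sketch divides task_instance_run_time_cost_sum by task_instance_count to estimate the average runtime per instance. It assumes both columns cover the same set of instances, which this topic does not state; verify this against your data before relying on the result. The sample row is hypothetical.

```python
def average_run_time_ms(profile):
    """Average runtime per instance in milliseconds, or None if there are no instances."""
    count = profile.get("task_instance_count") or 0
    if count == 0:
        return None
    # task_instance_run_time_cost_sum is documented as a total in milliseconds.
    return profile["task_instance_run_time_cost_sum"] / count


# Hypothetical row: 4 instances with 120,000 ms of total runtime.
print(average_run_time_ms({"task_instance_count": 4, "task_instance_run_time_cost_sum": 120000}))
```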
Data catalogs (catalogs)
| Parameter | Type | Description |
| --- | --- | --- |
| datasource_type | string | The data source type, such as DLF and StarRocks. |
| datasource_id | string | The data source ID, such as a StarRocks cluster ID or the main account ID for DLF. |
| name | string | The data catalog name. |
| type | string | The data catalog type, such as Hive or JDBC. |
| comment | string | The data catalog comment. |
| location | string | The location of the data catalog. |
| properties | string | Configuration properties, specified as a JSON string. |
| owner | string | The owner of the data catalog. This value can be an Alibaba Cloud account UID or a database account, depending on the data source type. |
| create_timestamp | bigint | The time the data catalog was created, represented as a 13-digit timestamp (milliseconds). |
| update_timestamp | bigint | The time the data catalog was last updated, represented as a 13-digit timestamp (milliseconds). |
| meta_entity_id | string | A unique, API-friendly identifier for the data catalog that complies with the metadata entity ID specification. |
| dt | string | The date partition, a logical partition field, in YYYYMMDD format. Valid values: [TODAY-31D, TODAY-1D]. |
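Most metadata tables in this topic keep dt partitions in the range [TODAY-31D, TODAY-1D], that is, the 31 calendar days ending yesterday. The following Python sketch lists the valid dt values for a given date, newest first. It assumes TODAY is the calendar date you pass in; the time zone the service uses to determine TODAY is not specified here.

```python
from datetime import date, timedelta


def valid_dt_partitions(today):
    """Return dt values in [TODAY-31D, TODAY-1D] as YYYYMMDD strings, newest first."""
    return [(today - timedelta(days=n)).strftime("%Y%m%d") for n in range(1, 32)]


parts = valid_dt_partitions(date(2024, 3, 1))
print(parts[0], parts[-1], len(parts))  # 20240229 20240130 31
```

Checking a requested dt against this list before querying avoids empty results for partitions that have already expired or have not been generated yet.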
Columns
| Parameter | Type | Description |
| --- | --- | --- |
| datasource_type | string | The type of the data source, such as DLF and StarRocks. |
| datasource_id | string | The ID of the data source, such as a StarRocks cluster ID, the main account ID for DLF or MaxCompute, or an RDS instance ID. |
| catalog_name | string | The name of the data catalog. This field is populated only if the data source type supports data catalogs. |
| database_name | string | The name of the database. |
| schema_name | string | The name of the schema. This field is populated only if the data source type supports schemas. |
| table_name | string | The name of the table. |
| name | string | The name of the column. |
| type | string | The data type of the column. |
| comment | string | The comment for the column. |
| ordinal_position | bigint | The 1-based ordinal position of the column in the table. |
| is_primary_key | boolean | Indicates whether the column is part of the primary key. |
| is_nullable | boolean | Indicates whether the column allows NULL values. |
| is_partition_key | boolean | Indicates whether the column is a partition key. |
| properties | string | A JSON string of properties and parameters. |
| business_description | string | The business description of the column. |
| meta_entity_id | string | The unique identifier for the column. It is API-friendly and complies with the metadata entity ID specification. |
| dt | string | The date partition (a logical partition column) in YYYYMMDD format. Valid range: [TODAY-31D, TODAY-1D]. |
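Because ordinal_position is 1-based, you can rebuild each table's column order by grouping column rows by their parent table and sorting. The following Python sketch assumes rows are dictionaries keyed by the column names above; the sample rows are hypothetical.

```python
from collections import defaultdict


def column_order(rows):
    """Map each table to its column names, ordered by the 1-based ordinal_position."""
    by_table = defaultdict(list)
    for row in rows:
        # catalog_name and schema_name may be absent for engines without catalogs or schemas.
        key = (
            row["datasource_id"],
            row.get("catalog_name"),
            row["database_name"],
            row.get("schema_name"),
            row["table_name"],
        )
        by_table[key].append(row)
    return {
        key: [col["name"] for col in sorted(cols, key=lambda col: col["ordinal_position"])]
        for key, cols in by_table.items()
    }


rows = [
    {"datasource_id": "ds1", "database_name": "db", "table_name": "t", "name": "amount", "ordinal_position": 2},
    {"datasource_id": "ds1", "database_name": "db", "table_name": "t", "name": "id", "ordinal_position": 1},
]
print(column_order(rows))
```

The grouping key includes the data source, catalog, and schema so that tables with the same name in different databases or engines are not merged.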
Databases
| Parameter | Type | Description |
| --- | --- | --- |
| datasource_type | string | The type of the data source. Examples include |
| datasource_id | string | The data source ID. For example, a StarRocks cluster ID, the primary account ID for Data Lake Formation (DLF) or MaxCompute, or an RDS instance ID. |
| catalog_name | string | The catalog name. This field is populated only if the data source type supports catalogs. |
| name | string | The database name. |
| type | string | The database type. |
| comment | string | The database comment. |
| location | string | The database path. |
| properties | string | Properties and parameters (JSON string). |
| owner | string | The owner of the database. The value is an Alibaba Cloud account UID or a database system account, depending on the data source type. |
| is_external | boolean | Indicates whether the database is an external database. |
| create_timestamp | bigint | A 13-digit timestamp indicating the creation time. |
| update_timestamp | bigint | A 13-digit timestamp indicating the last update time. |
| meta_entity_id | string | The unique identifier for the database. This ID conforms to the metadata entity ID specification. |
| dt | string | The date partition (a logical partition field) in YYYYMMDD format. Valid values: [TODAY-31D, TODAY-1D]. |
Table and column-level data lineage (lineages)
| Parameter | Type | Description |
| --- | --- | --- |
| source_meta_entity_id | string | The unique identifier for the source. This identifier is API-friendly and conforms to the metadata entity ID specification. |
| source_raw_entity_type | string | The entity type of the source. This field is used for identification if the entity's metadata is unmanaged and the |
| source_uuid | string | A unique, UI-friendly identifier for the source used for page access. |
| target_meta_entity_id | string | The unique identifier for the target. This identifier is API-friendly and conforms to the metadata entity ID specification. |
| target_raw_entity_type | string | The entity type of the target. This field is used for identification if the entity's metadata is unmanaged and the |
| target_uuid | string | A unique, UI-friendly identifier for the target used for page access. |
| compute_engine | string | The compute engine, such as |
| transform_type | string | The type of transformation task performed by the compute engine. Examples: |
| task_id | bigint | The ID of the DataWorks scheduled task. Refer to the |
| task_instance_id | bigint | The ID of the DataWorks scheduled task instance. Refer to the |
| lineage_time | bigint | The timestamp, in milliseconds, when the data lineage was generated. |
| granularity | string | The level of the data lineage, such as |
| dt | string | The date partition (a logical partition field), in YYYYMMDD format. Value range: [TODAY-31D, TODAY-1D]. |
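Each lineage row describes one directed edge from a source to a target. To find every entity downstream of a given entity, you can treat the rows as a graph and run a breadth-first search. The following Python sketch assumes you have already filtered the rows to a single granularity and extracted (source_meta_entity_id, target_meta_entity_id) pairs; the entity IDs in the example are hypothetical placeholders, not real metadata entity IDs.

```python
from collections import defaultdict, deque


def downstream(edges, start):
    """Return all entity IDs reachable from start by following source -> target edges."""
    graph = defaultdict(set)
    for source, target in edges:
        graph[source].add(target)

    reachable = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for target in graph.get(current, ()):
            # The visited set also guarantees termination if the lineage contains a cycle.
            if target not in reachable:
                reachable.add(target)
                queue.append(target)
    return reachable


# Hypothetical (source_meta_entity_id, target_meta_entity_id) pairs.
edges = [("ods_a", "dwd_b"), ("dwd_b", "ads_c"), ("ods_x", "dwd_y")]
print(sorted(downstream(edges, "ods_a")))  # ['ads_c', 'dwd_b']
```

Swapping source and target when building the graph gives the upstream entities instead.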
Partitions
| Parameter | Type | Description |
| --- | --- | --- |
| datasource_type | string | The data source type, such as MaxCompute, DLF, and StarRocks. |
| datasource_id | string | The data source ID, such as a StarRocks cluster ID, the main account ID for DLF or MaxCompute, or an RDS instance ID. |
| catalog_name | string | The name of the data catalog. This field is populated when the data source type supports data catalogs. |
| database_name | string | The name of the database. |
| schema_name | string | The name of the schema. This field is populated when the data source type supports schemas. |
| table_name | string | The name of the table. |
| name | string | The partition name (partition specification). |
| create_timestamp | bigint | The 13-digit creation timestamp. |
| update_timestamp | bigint | The 13-digit update timestamp. |
| content_size | bigint | The partition size, in bytes. |
| properties | string | A JSON string of properties and parameters. |
| dt | string | The date partition (a logical partition field) in YYYYMMDD format. The valid value range is [TODAY-31D, TODAY-1D]. |
Resource groups
| Parameter | Type | Description |
| --- | --- | --- |
| resource_group_id | bigint | The ID of the resource group. |
| resource_group_identifier | string | The identifier of the resource group. |
| resource_group_type | bigint | The type of the resource group. Valid values: 1 (scheduling resource group), 2 (MaxCompute resource group), and 4 (data integration resource group). |
| resource_group_mode | bigint | The mode of the resource group. Valid values: 1 (prepaid), 2 (pay-as-you-go), and 3 (developer edition, MaxCompute only). |
| resource_group_status | bigint | The status of the resource group. Valid values: 0 (Normal), 1 (Frozen), 2 (Deleted), 3 (Creating), 4 (Creation Failed), 5 (Updating), 6 (Update Failed), 7 (Deleting), and 8 (Deletion Failed). |
| is_exclusive_resource_group | boolean | Specifies whether this is an exclusive resource group. |
| dt | string | The date partition, a logical partition field. Format: YYYYMMDD. Value range: [TODAY-31D, TODAY-1D]. |
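The type, mode, and status columns of the resource groups table are numeric codes. The following Python sketch translates them into the labels listed above; the lookup tables are copied from the column descriptions, and the sample row is hypothetical. Codes not listed in this topic are reported as "unknown".

```python
# Code tables copied from the resource group column descriptions.
RESOURCE_GROUP_TYPE = {1: "scheduling", 2: "MaxCompute", 4: "data integration"}
RESOURCE_GROUP_MODE = {1: "prepaid", 2: "pay-as-you-go", 3: "developer edition (MaxCompute only)"}
RESOURCE_GROUP_STATUS = {
    0: "Normal",
    1: "Frozen",
    2: "Deleted",
    3: "Creating",
    4: "Creation Failed",
    5: "Updating",
    6: "Update Failed",
    7: "Deleting",
    8: "Deletion Failed",
}


def describe_resource_group(row):
    """Replace the numeric code columns of a resource group row with readable labels."""
    return {
        "resource_group_id": row["resource_group_id"],
        "type": RESOURCE_GROUP_TYPE.get(row["resource_group_type"], "unknown"),
        "mode": RESOURCE_GROUP_MODE.get(row["resource_group_mode"], "unknown"),
        "status": RESOURCE_GROUP_STATUS.get(row["resource_group_status"], "unknown"),
        "exclusive": row["is_exclusive_resource_group"],
    }


# Hypothetical row.
row = {
    "resource_group_id": 1001,
    "resource_group_type": 1,
    "resource_group_mode": 2,
    "resource_group_status": 0,
    "is_exclusive_resource_group": True,
}
print(describe_resource_group(row))
```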
Schemas
| Parameter | Type | Description |
| --- | --- | --- |
| datasource_type | string | The data source type. Examples: holodb, MaxCompute, and PostgreSQL. |
| datasource_id | string | The data source ID, such as an RDS instance ID or the account ID for MaxCompute. |
| catalog_name | string | The data catalog name. This field is populated only if the data source type supports data catalogs. |
| database_name | string | The database name. |
| name | string | The schema name. |
| type | string | The schema type. |
| comment | string | A comment about the schema. |
| properties | string | Properties and parameters, in JSON string format. |
| owner | string | The schema owner. The value can be an Alibaba Cloud account UID or a database account, depending on the data source type. |
| create_timestamp | bigint | The creation time, represented as a 13-digit UNIX timestamp. |
| update_timestamp | bigint | The last update time, represented as a 13-digit UNIX timestamp. |
| meta_entity_id | string | A unique identifier for the schema, which is API-friendly and compliant with the meta entity ID specification. |
| dt | string | The date partition (a logical partition field) in YYYYMMDD format. The value range is [TODAY-31D, TODAY-1D]. |
Tables
| Parameter | Type | Description |
| --- | --- | --- |
| datasource_type | string | The data source type, such as Data Lake Formation, StarRocks, MaxCompute, Hologres, or MySQL. |
| datasource_id | string | The data source ID. This value is the cluster ID for a StarRocks cluster, the main account ID for Data Lake Formation or MaxCompute, or the instance ID for an RDS instance. |
| catalog_name | string | The name of the data catalog. This field applies only to data source types that support data catalogs. |
| database_name | string | The name of the database. |
| schema_name | string | The name of the schema. This field applies only to data source types that support schemas. |
| name | string | The name of the table. |
| type | string | The type of the table. |
| comment | string | The comment for the table. |
| partition_keys | string | The partition keys. For multi-level partitioning, keys are separated by commas (,). |
| location | string | The storage path for the table. |
| properties | string | A JSON string of properties and parameters. For a view, this field contains the view's DDL definition. |
| owner | string | The table owner. The value can be an Alibaba Cloud account ID or a database system account, depending on the data source type. |
| content_size | bigint | The storage size, in bytes. |
| data_retention | map<string,string> | The data retention period or lifecycle. The value varies by table type. For MaxCompute tables, the key is |
| is_compressed | boolean | Indicates whether the table is compressed. |
| is_temporary | boolean | Indicates whether the table is a temporary table. |
| entity_type | string | The type of the entity, such as |
| input_format | string | The input format. |
| output_format | string | The output format. |
| serde_parameters | string | The SerDe parameters. |
| serialization_lib | string | The serialization library. |
| create_timestamp | bigint | A 13-digit UNIX timestamp indicating when the table was created. |
| meta_modified_timestamp | bigint | A 13-digit UNIX timestamp indicating when the table metadata was last modified. |
| data_modified_timestamp | bigint | A 13-digit UNIX timestamp indicating when the table data was last modified. |
| last_access_timestamp | bigint | A 13-digit UNIX timestamp indicating when the table was last accessed. |
| business_description | string | The business description or Chinese name. |
| meta_entity_id | string | The unique identifier for the table. This ID is designed for API use and conforms to the metadata entity ID specification. Examples: |
| uuid | string | The UUID of the table, used to link to the table details page in the DataWorks data map. |
| business_tags | array<string> | Business tags. This field contains tags set on the data map page. |
| wikis | array<struct< | The table wiki. The struct contains the following fields: |
| producing_tasks | array<bigint> | A list of scheduling task IDs that produce data for this table. For more information, see the |
| dt | string | The date partition (a logical partition field) in YYYYMMDD format. Valid values: |
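In the Tables table, partition_keys is a comma-separated list for multi-level partitioning, and properties is a JSON string. The following Python sketch parses both. It assumes properties decodes to a JSON object; the partition key names and the property key in the example are hypothetical.

```python
import json


def parse_partition_keys(partition_keys):
    """Split the comma-separated partition_keys value into key names (empty list if unset)."""
    if not partition_keys:
        return []
    return [key.strip() for key in partition_keys.split(",") if key.strip()]


def parse_properties(properties):
    """Decode the properties JSON string into a dict (empty dict if unset)."""
    return json.loads(properties) if properties else {}


# Hypothetical values from a row in the Tables table.
print(parse_partition_keys("dt,region"))  # ['dt', 'region']
print(parse_properties('{"transactional": "true"}'))
```

Stripping whitespace around each key tolerates values written as "dt, region" as well as "dt,region".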
Task and workflow run instances (task_instances)
| Parameter | Type | Description |
| --- | --- | --- |
| id | bigint | The task instance ID. |
| node_id | bigint | The task ID. References the |
| node_type | bigint | The task type. For a list of node code values, see Node Development. |
| node_name | string | The name of the task. |
| description | string | The description of the task. |
| workflow_id | bigint | The ID of the workflow. References the |
| workflow_name | string | The name of the workflow. |
| workflow_instance_id | bigint | The ID of the workflow instance. |
| workflow_instance_type | bigint | The type of the workflow instance. Valid values: 0 (daily scheduling), 1 (manual task), 2 (smoke test), 3 (backfill), 4 (one-time workflow), 5 (manual workflow). |
| trigger_type | string | The trigger type (Scheduler/Manual). |
| trigger_recurrence | string | The run mode. Valid values: 0 (normal), 1 (manual), 2 (paused), 3 (dry run), 4 (referenced). |
| timeout | bigint | The task execution timeout, in hours. |
| rerun_mode | string | The rerun configuration. Valid values: 0 (rerunnable on failure), 1 (rerunnable on failure or success), 2 (not rerunnable). |
| run_number | bigint | The number of runs. |
| period_number | bigint | The period number. |
| baseline_id | bigint | The ID of the baseline. |
| priority | bigint | The task priority (1-8). |
| script_parameters | string | A list of script parameters for the run. |
| runtime_resource_group_id | bigint | The resource group ID for the task run. |
| runtime_resource_group_identifier | string | The resource group identifier for the task run. |
| runtime_image | string | The runtime image ID. |
| runtime_cu | string | The CUs consumed at runtime. |
| runtime_process_id | string | The process ID at runtime. |
| runtime_gateway | string | The gateway used at runtime. |
| datasource_name | string | The name of the data source. |
| inputs_variables | array<struct< | A list of input variables. |
| outputs | array<struct< | A list of output identifiers. |
| outputs_variables | array<struct< | A list of output variables. |
| tags | array<struct< | A list of task tags. |
| status | bigint | The task status. Valid values: 1 (not run), 2 (waiting for schedule), 3 (waiting for resources), 4 (running), 5 (failed), 6 (succeeded), 7 (verifying), 8 (pending condition), 9 (waiting for a trigger). |
| trigger_time | string | The time the task was triggered. |
| bizdate | string | The business date. |
| started_time | string | The time the task started. |
| finished_time | string | The time the task finished. |
| project_id | bigint | The project ID. References the |
| project_env | string | The environment type (PROD/DEV). |
| owner | string | The owner's account ID. References the |
| create_time | string | The creation time. |
| modify_time | string | The last modification time. |
| create_user | string | The creator's user ID. References the |
| modify_user | string | The last modifier's user ID. References the |
| waiting_resource_time | string | The time spent waiting for resources. |
| waiting_trigger_time | string | The time spent waiting for a trigger. |
| dt | string | The logical date partition, in YYYYMMDD format. Value range: [TODAY-31D, TODAY-1D]. |
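The status column of task_instances is a numeric code. The following Python sketch maps the codes to the labels listed above and counts instances per status, for example to see how many instances in one dt partition failed. The sample rows are hypothetical, and codes not listed in this topic are counted as "unknown" rather than raising an error.

```python
from collections import Counter

# Status codes as listed for the task_instances status column.
TASK_INSTANCE_STATUS = {
    1: "not run",
    2: "waiting for schedule",
    3: "waiting for resources",
    4: "running",
    5: "failed",
    6: "succeeded",
    7: "verifying",
    8: "pending condition",
    9: "waiting for a trigger",
}


def count_by_status(rows):
    """Count task instances per status label; unrecognized codes become 'unknown'."""
    return Counter(TASK_INSTANCE_STATUS.get(row["status"], "unknown") for row in rows)


# Hypothetical rows from one dt partition of task_instances.
rows = [{"status": 6}, {"status": 5}, {"status": 6}, {"status": 42}]
print(count_by_status(rows))
```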
Task and workflow definitions (tasks)
| Parameter | Type | Description |
| --- | --- | --- |
| id | bigint | The task ID. |
| name | string | The task name. |
| description | string | The task description. |
| type | bigint | The task type. See node development for node code values. |
| workflow_id | bigint | The workflow ID. |
| instance_mode | string | The instance generation mode. |
| baseline_id | bigint | The baseline ID. |
| priority | bigint | The task priority, ranging from |
| timeout | bigint | The task execution timeout, in hours. |
| rerun_mode | bigint | The rerun policy for the task. Valid values: |
| rerun_times | bigint | The number of rerun attempts. This setting applies only when the task is configured to allow reruns. |
| rerun_interval | bigint | The interval between rerun attempts, in seconds. |
| script_parameters | string | The script parameters for the runtime. |
| trigger_type | string | The trigger type. Valid values: |
| trigger_recurrence | bigint | The run mode when the task is triggered. Valid values: |
| trigger_cron | string | The Cron expression. Applies when |
| trigger_start_time | string | The start time for the scheduled trigger. Applies when |
| trigger_end_time | string | The expiration time for the scheduled trigger. Applies when |
| runtime_resource_group_id | bigint | The ID of the resource group for the task runtime. |
| runtime_image | string | The image ID for the task runtime. |
| runtime_cu | string | The CU consumption for the task runtime. |
| datasource_name | string | The data source name. |
| inputs_variables | array<struct< | The input variables. |
| outputs | array<struct< | The task output identifiers. |
| outputs_variables | array<struct< | The output variables. |
| dependencies | array<struct< | The dependencies. |
| related_workflow_id | bigint | The ID of the related workflow. |
| tags | array<struct< | The task tags. |
| project_id | bigint | The project ID. See the |
| project_env | string | The environment type. Valid values: |
| owner | string | The account ID of the task owner. See the |
| create_time | string | The creation time. |
| modify_time | string | The last modification time. |
| create_user | string | The account ID of the user who created the task. See the |
| modify_user | string | The account ID of the user who last modified the task. See the |
| dt | string | The date partition in YYYYMMDD format. |
Users
| Parameter | Type | Description |
| --- | --- | --- |
| user_id | string | The unique identifier for the user. |
| user_nick | string | The user's account alias or display name. |
| dt | string | The logical partition field, representing the date partition in YYYYMMDD format. Valid values: [TODAY-31D, TODAY-1D]. |
Workspace members
| Parameter | Type | Description |
| --- | --- | --- |
| workspace_id | bigint | The workspace ID. See the |
| user_id | string | The user ID. See the |
| user_status | bigint | The user status. Valid values: |
| gmt_create_ts | bigint | The creation time, a 13-digit timestamp. |
| gmt_modified_ts | bigint | The modification time, a 13-digit timestamp. |
| dt | string | The date partition (a logical partition) in YYYYMMDD format. |
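To list the members of a workspace by display name, you can join Workspace members rows to Users rows on user_id. The following Python sketch assumes both row sets come from the same dt partition and that user_id values match across the two tables. Because the valid user_status values are not listed here, the sketch does not filter on status. All sample rows are hypothetical.

```python
def member_display_names(members, users, workspace_id):
    """Return sorted display names of one workspace's members.

    Falls back to the raw user_id when no Users row matches.
    """
    nick_by_id = {user["user_id"]: user["user_nick"] for user in users}
    return sorted(
        nick_by_id.get(member["user_id"], member["user_id"])
        for member in members
        if member["workspace_id"] == workspace_id
    )


# Hypothetical rows.
users = [{"user_id": "u1", "user_nick": "alice"}, {"user_id": "u2", "user_nick": "bob"}]
members = [
    {"workspace_id": 10, "user_id": "u2"},
    {"workspace_id": 10, "user_id": "u9"},
    {"workspace_id": 20, "user_id": "u1"},
]
print(member_display_names(members, users, 10))  # ['bob', 'u9']
```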
Workspaces
| Parameter | Type | Description |
| --- | --- | --- |
| workspace_id | bigint | The workspace ID. |
| workspace_name | string | The workspace name. |
| workspace_identifier | string | The workspace identifier. |
| workspace_description | string | The workspace description. |
| workspace_owner | string | The workspace owner ID. See the |
| workspace_status | bigint | The workspace status. Valid values: |
| dt | string | The date partition (a logical partition field). Format: YYYYMMDD. |
Data quality rule instances (quality_rule_results)
Partition field: dt
Description: Describes data quality rule instances.
|
Parameter |
Type |
Description |
|
id |
bigint |
The primary key. |
|
scan_run_id |
bigint |
The ID of the quality monitoring instance. |
|
rule_id |
bigint |
The ID of the rule. |
|
rule_name |
string |
The name of the rule. |
|
status |
string |
The validation result of the rule. Possible values: Pass, Error, Warn, Fail, or Running. |
|
severity |
string |
The severity of the rule. Possible values: High (strong rule) or Normal (weak rule). |
|
create_time |
bigint |
The creation time of the instance. |
|
modify_time |
bigint |
The last modification time of the instance. |
|
spec |
string |
The specification of the rule instance. |
|
tags |
array<string> |
The tags for the rule instance. |
|
tenant_id |
bigint |
The ID of the DataWorks tenant. |
|
project_id |
bigint |
The ID of the DataWorks workspace. |
|
meta_entity_id |
string |
The unique identifier for the meta table entity. |
|
dt |
string |
The date partition, in YYYYMMDD format. Valid values are in the range [TODAY-31D, TODAY-1D]. |
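Because each row of quality_rule_results carries a status and a severity, a common first step is to count outcomes per severity. The following Python sketch does that on rows already read from one dt partition. The function name and the sample rows are hypothetical; the set of statuses comes from the status field description above.

```python
from collections import Counter

# Status values listed in the status field description.
VALID_STATUSES = {"Pass", "Error", "Warn", "Fail", "Running"}


def tally_status(rows):
    """Count (severity, status) pairs across quality_rule_results rows."""
    counts = Counter()
    for row in rows:
        if row["status"] not in VALID_STATUSES:
            raise ValueError(f"unexpected status: {row['status']!r}")
        counts[(row["severity"], row["status"])] += 1
    return counts


# Hypothetical rows for illustration.
rows = [
    {"rule_id": 1, "severity": "High", "status": "Pass"},
    {"rule_id": 2, "severity": "High", "status": "Fail"},
    {"rule_id": 3, "severity": "Normal", "status": "Pass"},
    {"rule_id": 4, "severity": "High", "status": "Pass"},
]
counts = tally_status(rows)
print(counts[("High", "Pass")])  # 2
```

Raising on an unknown status makes schema drift visible instead of silently miscounting, which matters because the table schema is dynamic.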
Data quality rule metrics (quality_rules)
Partition field: dt
Description: Detailed metrics for each data quality rule.
|
Parameter |
Type |
Description |
|
id |
bigint |
The primary key. |
|
scan_id |
bigint |
The ID of the quality monitoring instance. |
|
rule_name |
string |
The name of the rule. |
|
enabled |
boolean |
Indicates whether the rule is enabled. |
|
severity |
string |
The strength of the rule. Possible values: |
|
create_time |
bigint |
The time when the rule was created. |
|
modify_time |
bigint |
The time when the rule was last modified. |
|
spec |
string |
The specification of the rule. |
|
tags |
array<string> |
The rule's tags. |
|
tenant_id |
bigint |
The ID of the DataWorks tenant. |
|
project_id |
bigint |
The ID of the DataWorks workspace. |
|
meta_entity_id |
string |
The unique identifier of the meta entity in the data map. |
|
pass_count |
int |
The number of times the rule passed. |
|
warn_count |
int |
The number of times the rule triggered a warning. |
|
error_count |
int |
The number of times the rule triggered an error. |
|
fail_count |
int |
The number of times the rule failed. |
|
dt |
string |
The date partition, in |
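The pass_count, warn_count, error_count, and fail_count fields can be combined into a pass rate per rule, which helps surface the enabled rules that fail most often. The following Python sketch assumes that these four counters cover every finished check; if a rule can end in other states, adjust the denominator. The function names and the sample rule names are hypothetical.

```python
def pass_rate(rule):
    """Share of passing checks for one quality_rules row, or None if the rule never ran."""
    total = (rule["pass_count"] + rule["warn_count"]
             + rule["error_count"] + rule["fail_count"])
    if total == 0:
        return None
    return rule["pass_count"] / total


def worst_enabled_rules(rules, limit=5):
    """Return (rule_name, pass_rate) for enabled rules, lowest pass rate first."""
    rated = [
        (r["rule_name"], pass_rate(r))
        for r in rules
        if r["enabled"] and pass_rate(r) is not None
    ]
    return sorted(rated, key=lambda item: item[1])[:limit]


# Hypothetical quality_rules rows.
rules = [
    {"rule_name": "row_count_not_zero", "enabled": True,
     "pass_count": 8, "warn_count": 1, "error_count": 1, "fail_count": 0},
    {"rule_name": "id_unique", "enabled": True,
     "pass_count": 3, "warn_count": 0, "error_count": 0, "fail_count": 1},
    {"rule_name": "legacy_check", "enabled": False,
     "pass_count": 0, "warn_count": 0, "error_count": 0, "fail_count": 5},
]
print(worst_enabled_rules(rules))  # [('id_unique', 0.75), ('row_count_not_zero', 0.8)]
```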
Data quality scan runs (quality_scan_runs)
Partition field: dt
Description: Stores information about each data quality scan run.
|
Parameter |
Type |
Description |
|
id |
bigint |
The primary key. |
|
scan_id |
bigint |
The data quality scan ID. |
|
name |
string |
The name of the scan. |
|
status |
string |
The status of the scan run. Valid values: |
|
post_action_type |
string |
The post-check action. Valid values: |
|
data_filter |
string |
The data range used for sampling. |
|
trigger_time |
bigint |
The scheduled time of the task. |
|
trigger_type |
string |
The trigger type of the data quality scan. Valid values: |
|
create_time |
bigint |
The creation time of the scan run. |
|
modify_time |
bigint |
The time when the scan run was last modified. |
|
datasource_id |
bigint |
The ID of the data source to which the table belongs. |
|
datasource_type |
string |
The type of the data source. |
|
computing_resource_id |
bigint |
The ID of the compute resource. |
|
compute_resource_option |
string |
The compute resource used for the scan run. |
|
spec |
string |
The data quality scan specification. |
|
tenant_id |
bigint |
The ID of the DataWorks tenant. |
|
project_id |
bigint |
The ID of the DataWorks workspace. |
|
owner |
string |
The owner of the data quality scan. |
|
task_id |
bigint |
The ID of the scheduling task. |
|
task_instance_id |
bigint |
The ID of the scheduling task instance. |
|
meta_entity_id |
string |
The unique identifier of the metadata entity. |
|
table_name |
string |
The name of the table. |
|
catalog_name |
string |
The name of the table's data catalog. |
|
schema_name |
string |
The name of the table's schema. |
|
database_name |
string |
The name of the table's database. |
|
cluster_id |
string |
The ID of the table's cluster. |
|
dt |
string |
The date partition, in |
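A table can have many scan runs in one partition. To see only the most recent run for each table, you can keep the run with the greatest trigger_time per meta_entity_id. The following Python sketch does this on rows already loaded into memory. It relies only on trigger_time values being comparable, not on their unit. The function name and the sample rows, including the placeholder entity IDs, are hypothetical.

```python
def latest_run_per_table(runs):
    """Return {meta_entity_id: run}, keeping the run with the greatest trigger_time."""
    latest = {}
    for run in runs:
        key = run["meta_entity_id"]
        if key not in latest or run["trigger_time"] > latest[key]["trigger_time"]:
            latest[key] = run
    return latest


# Hypothetical quality_scan_runs rows with placeholder entity IDs.
runs = [
    {"id": 1, "meta_entity_id": "entity-a", "trigger_time": 100},
    {"id": 2, "meta_entity_id": "entity-a", "trigger_time": 300},
    {"id": 3, "meta_entity_id": "entity-b", "trigger_time": 200},
]
latest = latest_run_per_table(runs)
print({k: v["id"] for k, v in latest.items()})  # {'entity-a': 2, 'entity-b': 3}
```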
Data quality scan metrics (quality_scans)
Partition field: dt
Description: Detailed metrics for data quality monitoring tasks.
|
Parameter |
Type |
Description |
|
id |
bigint |
Unique identifier for the quality scan. |
|
name |
string |
Name of the quality scan. |
|
data_filter_type |
string |
Type of the data filter. Valid values: |
|
data_filter |
string |
Data filter expression. |
|
trigger_type |
string |
Trigger type for data quality monitoring. Valid values: |
|
create_time |
bigint |
Time when the quality scan was created. |
|
modify_time |
bigint |
Time when the quality scan was last updated. |
|
computing_resource_id |
bigint |
ID of the compute engine. |
|
compute_resource_option |
string |
Compute resource for the data quality monitoring task. |
|
spec |
string |
Specification for the data quality monitoring. |
|
related_tasks |
array<bigint> |
Associated scheduling tasks. |
|
tenant_id |
bigint |
ID of the DataWorks tenant. |
|
project_id |
bigint |
ID of the DataWorks workspace. |
|
owner |
string |
Owner of the quality scan. |
|
datasource_id |
string |
ID of the table's data source. |
|
datasource_type |
string |
Type of the data source. |
|
meta_entity_id |
string |
Unique identifier for the meta-entity in the data catalog. |
|
table_name |
string |
Name of the table. |
|
catalog_name |
string |
Name of the table's data catalog. |
|
schema_name |
string |
Name of the table's schema. |
|
database_name |
string |
Name of the table's database. |
|
cluster_id |
string |
ID of the table's cluster. |
|
related_scheduler_task_count |
int |
Number of associated scheduling tasks. |
|
rule_count |
int |
Number of associated rules. |
|
high_severity_rule_count |
int |
Number of associated high-severity rules. |
|
normal_severity_rule_count |
int |
Number of associated normal-severity rules. |
|
enabled_rule_count |
int |
Number of enabled rules. |
|
enabled_high_severity_rule_count |
int |
Number of enabled high-severity rules. |
|
enabled_normal_severity_rule_count |
int |
Number of enabled normal-severity rules. |
|
rule_instance_count |
int |
Number of rule instances run today. |
|
high_severity_rule_instance_count |
int |
Number of high-severity rule instances run today. |
|
normal_severity_rule_instance_count |
int |
Number of normal-severity rule instances run today. |
|
high_severity_rule_instance_pass_count |
int |
Number of high-severity rule instances that passed today. |
|
high_severity_rule_instance_warn_count |
int |
Number of high-severity rule instances with warnings (orange alerts) today. |
|
high_severity_rule_instance_error_count |
int |
Number of high-severity rule instances with errors (red alerts) today. |
|
high_severity_rule_instance_fail_count |
int |
Number of high-severity rule instances that failed today. |
|
normal_severity_rule_instance_pass_count |
int |
Number of normal-severity rule instances that passed today. |
|
normal_severity_rule_instance_warn_count |
int |
Number of normal-severity rule instances with warnings (orange alerts) today. |
|
normal_severity_rule_instance_error_count |
int |
Number of normal-severity rule instances with errors (red alerts) today. |
|
normal_severity_rule_instance_fail_count |
int |
Number of normal-severity rule instances that failed today. |
|
block_task_instance_count |
int |
Number of scheduling tasks blocked today. |
|
alert_rule_count |
int |
Number of configured alert subscriptions. |
|
sms_alert_rule_count |
int |
Number of configured SMS alert subscriptions. |
|
mail_alert_rule_count |
int |
Number of configured email alert subscriptions. |
|
phone_alert_rule_count |
int |
Number of configured phone alert subscriptions. |
|
ding_alert_rule_count |
int |
Number of configured DingTalk alert subscriptions. |
|
feishu_alert_rule_count |
int |
Number of configured Lark alert subscriptions. |
|
weixin_alert_rule_count |
int |
Number of configured WeChat alert subscriptions. |
|
webhook_alert_rule_count |
int |
Number of configured custom webhook alert subscriptions. |
|
alert_times |
int |
Number of alerts triggered today. |
|
sms_alert_times |
int |
Number of SMS alerts triggered today. |
|
mail_alert_times |
int |
Number of email alerts triggered today. |
|
phone_alert_times |
int |
Number of phone alerts triggered today. |
|
ding_alert_times |
int |
Number of DingTalk alerts triggered today. |
|
feishu_alert_times |
int |
Number of Lark alerts triggered today. |
|
weixin_alert_times |
int |
Number of WeChat alerts triggered today. |
|
webhook_alert_times |
int |
Number of custom webhook alerts triggered today. |
|
dt |
string |
Date partition in |
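The per-channel *_alert_times fields above break down the alerts that a quality scan triggered. The following Python sketch ranks the channels for one quality_scans row and drops channels with no alerts. It does not assume that the channel counts add up to alert_times, because this topic does not state that relationship. The function name and the sample row are hypothetical.

```python
# Channel prefixes taken from the *_alert_times field names above.
CHANNELS = ["sms", "mail", "phone", "ding", "feishu", "weixin", "webhook"]


def alert_breakdown(scan):
    """Return {channel: alerts triggered} for one row, largest first, zero counts omitted."""
    counts = [(c, scan[f"{c}_alert_times"]) for c in CHANNELS]
    nonzero = [(c, n) for c, n in counts if n]
    return dict(sorted(nonzero, key=lambda kv: kv[1], reverse=True))


# Hypothetical quality_scans row (only the fields this sketch reads).
scan = {
    "sms_alert_times": 0, "mail_alert_times": 3, "phone_alert_times": 0,
    "ding_alert_times": 5, "feishu_alert_times": 0, "weixin_alert_times": 0,
    "webhook_alert_times": 1,
}
print(alert_breakdown(scan))  # {'ding': 5, 'mail': 3, 'webhook': 1}
```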
Data quality summary (table_quality_summary)
Partition field: dt
Description: Contains data quality metrics for the table.
|
Parameter |
Type |
Description |
|
meta_entity_id |
string |
The unique identifier for the table's meta entity. |
|
project_id |
bigint |
The ID of the DataWorks workspace. |
|
table_name |
string |
The name of the table. |
|
schema_name |
string |
The name of the table's schema. |
|
database_name |
string |
The name of the table's database. |
|
catalog_name |
string |
The name of the table's data catalog. |
|
datasource_id |
bigint |
The ID of the table's data source. This field is NULL if data quality is not configured. |
|
tenant_id |
bigint |
The ID of the DataWorks tenant. |
|
owner |
string |
The owner of the table. |
|
scan_count |
int |
The number of configured quality monitors. |
|
scheduler_related_scan_count |
int |
The number of quality monitors linked to scheduling. |
|
scan_run_count |
int |
The number of quality monitoring task instances today. |
|
alert_scan_run_count |
int |
The number of quality monitoring task instances that triggered an alert today. |
|
block_task_instance_scan_run_count |
int |
The number of quality monitoring task instances that blocked scheduling tasks today. |
|
rule_count |
int |
The number of configured rules. |
|
enabled_rule_count |
int |
The number of enabled rules. |
|
high_severity_rule_count |
int |
The number of configured high-severity rules. |
|
normal_severity_rule_count |
int |
The number of configured normal-severity rules. |
|
rule_instance_count |
int |
The number of rule instances today. |
|
high_severity_rule_instance_count |
int |
The number of high-severity rule instances today. |
|
normal_severity_rule_instance_count |
int |
The number of normal-severity rule instances today. |
|
high_severity_rule_instance_pass_count |
int |
The number of successful high-severity rule checks today. |
|
high_severity_rule_instance_warn_count |
int |
The number of high-severity rule checks that triggered a warning today. |
|
high_severity_rule_instance_error_count |
int |
The number of high-severity rule checks that triggered an error today. |
|
high_severity_rule_instance_fail_count |
int |
The number of failed high-severity rule checks today. |
|
normal_severity_rule_instance_pass_count |
int |
The number of successful normal-severity rule checks today. |
|
normal_severity_rule_instance_warn_count |
int |
The number of normal-severity rule checks that triggered a warning today. |
|
normal_severity_rule_instance_error_count |
int |
The number of normal-severity rule checks that triggered an error today. |
|
normal_severity_rule_instance_fail_count |
int |
The number of failed normal-severity rule checks today. |
|
dt |
string |
The date partition, in YYYYMMDD format. Valid values are in the range [TODAY-31D, TODAY-1D], that is, from 31 days before the current date to 1 day before the current date. |
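The table_quality_summary fields support two common coverage checks: finding tables with no quality monitoring, and finding tables whose high-severity checks reported errors or failures. The following Python sketch runs both checks on rows from one dt partition. It treats a NULL datasource_id or a scan_count of 0 as "not configured", based on the datasource_id description above, and treats NULL counters as 0. The function names, table names, and sample values are hypothetical.

```python
def uncovered_tables(rows):
    """Tables with no configured quality monitor (datasource_id is NULL or scan_count is 0)."""
    return sorted(
        r["table_name"] for r in rows
        if r["datasource_id"] is None or not r["scan_count"]
    )


def tables_with_high_severity_problems(rows):
    """Tables whose high-severity rule checks reported an error or a failure."""
    return sorted(
        r["table_name"] for r in rows
        if (r["high_severity_rule_instance_error_count"] or 0)
        + (r["high_severity_rule_instance_fail_count"] or 0) > 0
    )


# Hypothetical rows (only the fields these helpers read).
rows = [
    {"table_name": "ods_orders", "datasource_id": None, "scan_count": 0,
     "high_severity_rule_instance_error_count": None,
     "high_severity_rule_instance_fail_count": None},
    {"table_name": "dwd_orders", "datasource_id": 42, "scan_count": 2,
     "high_severity_rule_instance_error_count": 0,
     "high_severity_rule_instance_fail_count": 1},
    {"table_name": "ads_report", "datasource_id": 42, "scan_count": 1,
     "high_severity_rule_instance_error_count": 0,
     "high_severity_rule_instance_fail_count": 0},
]
print(uncovered_tables(rows))                    # ['ods_orders']
print(tables_with_high_severity_problems(rows))  # ['dwd_orders']
```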
Sample metrics
Table metric details (table_metrics_detail)
|
Parameter |
Type |
Description |
|
datasource_type |
string |
The data source type. |
|
datasource_id |
string |
The data source ID. |
|
catalog_name |
string |
The data catalog name. |
|
database_name |
string |
The database name. |
|
schema_name |
string |
The schema name. |
|
table_name |
string |
The table name. |
|
table_uuid |
string |
The UUID of the table, used to access its details page. |
|
meta_entity_id |
string |
The human-readable ID of the table. |
|
content_size |
bigint |
The collected storage size. This value is |
|
daily_rate_cs |
decimal(16,6) |
The day-over-day change rate of the storage size. |
|
avg_content_size_7d |
bigint |
The 7-day average storage size. |
|
daily_rate_acs_7d |
decimal(16,6) |
The day-over-day change rate of the 7-day average storage size. |
|
latest_data_update_time_31d |
bigint |
The timestamp of the most recent data update within the last 31 days. This time is derived from the end time of the corresponding downstream instance in the data lineage and represents the maximum |
|
latest_data_update_task_id |
bigint |
The ID of the scheduling task that most recently updated the table within the last 31 days. |
|
latest_data_update_instance_id |
bigint |
The ID of the scheduling task instance that most recently updated the table within the last 31 days. |
|
latest_data_update_time_by_task |
bigint |
The end time of the scheduling task instance that most recently updated the table within the last 31 days. |
|
writing_task_ids |
array<bigint> |
A deduplicated list of the IDs of scheduling tasks that wrote to the table on the current business date. |
|
writing_task_ids_31d |
array<bigint> |
A deduplicated list of the IDs of scheduling tasks that wrote to the table within the last 31 days. |
|
latest_data_access_time_31d |
bigint |
The timestamp of the most recent data access within the last 31 days. This time is derived from the end time of the corresponding upstream instance in the data lineage and represents the maximum |
|
latest_data_access_task_id |
bigint |
The ID of the scheduling task that most recently read from the table within the last 31 days. |
|
latest_data_access_instance_id |
bigint |
The ID of the scheduling task instance that most recently read from the table within the last 31 days. |
|
latest_data_access_time_by_task |
bigint |
The end time of the scheduling task instance that most recently read from the table within the last 31 days. |
|
reading_task_ids |
array<string> |
A deduplicated list of the IDs of scheduling tasks that read from the table on the current business date. |
|
reading_task_ids_31d |
array<string> |
A deduplicated list of the IDs of scheduling tasks that read from the table within the last 31 days. |
|
direct_downstream_tables |
array<string> |
A list of direct downstream table UUIDs. |
|
direct_upstream_tables |
array<string> |
A list of direct upstream table UUIDs. |
|
dt |
string |
The date partition, in YYYYMMDD format. Valid values are in the range [TODAY-31D, TODAY-1D]. |
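The direct_downstream_tables field only lists one hop of lineage. To find every table that is affected, directly or indirectly, by a change to one table, you can follow these lists transitively. The following Python sketch does a breadth-first traversal over rows from one dt partition and tolerates cycles and UUIDs that have no row of their own. The function name and the placeholder UUIDs are hypothetical.

```python
from collections import deque


def downstream_closure(rows, start_uuid):
    """Return all table UUIDs reachable downstream from start_uuid (excluding itself)."""
    edges = {r["table_uuid"]: r["direct_downstream_tables"] or [] for r in rows}
    seen = set()
    queue = deque([start_uuid])
    while queue:
        current = queue.popleft()
        for nxt in edges.get(current, []):  # tables without a row have no known children
            if nxt != start_uuid and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical rows with placeholder UUIDs; t4 -> t1 forms a cycle.
rows = [
    {"table_uuid": "t1", "direct_downstream_tables": ["t2", "t3"]},
    {"table_uuid": "t2", "direct_downstream_tables": ["t4"]},
    {"table_uuid": "t3", "direct_downstream_tables": None},
    {"table_uuid": "t4", "direct_downstream_tables": ["t1"]},
]
print(sorted(downstream_closure(rows, "t1")))  # ['t2', 't3', 't4']
```

Swapping direct_downstream_tables for direct_upstream_tables gives the upstream closure instead.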
Table metric summary (table_metrics_summary)
|
Parameter |
Type |
Description |
|
table_count |
bigint |
The number of tables. |
|
daily_rate_tc |
decimal(16,6) |
The day-over-day change rate of the table count. |
|
avg_table_count_7d |
bigint |
The 7-day average table count. |
|
daily_rate_atc_7d |
decimal(16,6) |
The day-over-day change rate of the 7-day average table count. |
|
content_size |
bigint |
The collected storage size. This value is |
|
daily_rate_cs |
decimal(16,6) |
The day-over-day change rate of the storage size. |
|
avg_content_size_7d |
bigint |
The 7-day average storage size. |
|
daily_rate_acs_7d |
decimal(16,6) |
The day-over-day change rate of the 7-day average storage size. |
|
updated_table_count |
bigint |
The number of tables updated within the last 31 days. |
|
daily_rate_utc |
decimal(16,6) |
The day-over-day change rate of the number of tables updated within the last 31 days. |
|
avg_updated_table_count_7d |
bigint |
The 7-day average number of tables updated within the last 31 days. |
|
daily_rate_autc_7d |
decimal(16,6) |
The day-over-day change rate of the 7-day average number of tables updated within the last 31 days. |
|
accessed_table_count |
bigint |
The number of tables read within the last 31 days. |
|
daily_rate_atc |
decimal(16,6) |
The day-over-day change rate of the number of tables read within the last 31 days. |
|
avg_accessed_table_count_7d |
bigint |
The 7-day average number of tables read within the last 31 days. |
|
daily_rate_aatc_7d |
decimal(16,6) |
The day-over-day change rate of the 7-day average number of tables read within the last 31 days. |
|
dt |
string |
The date partition, in YYYYMMDD format. Valid values are in the range [TODAY-31D, TODAY-1D]. |
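This topic does not state the formulas behind the daily_rate_* and avg_*_7d fields. The following Python sketch uses the conventional definitions as an assumption: the day-over-day rate is (current - previous) / previous, rounded to six decimal places to match decimal(16,6) (the rounding mode is also assumed), and the 7-day average is the mean of the last seven daily values. Compare the sketch's output with actual rows before you rely on it. The function names and sample values are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import List, Optional


def day_over_day_rate(current: int, previous: int) -> Optional[Decimal]:
    """(current - previous) / previous, rounded to 6 decimal places like decimal(16,6)."""
    if previous == 0:
        return None  # the rate is undefined when the previous value is zero
    rate = (Decimal(current) - Decimal(previous)) / Decimal(previous)
    return rate.quantize(Decimal("0.000001"), rounding=ROUND_HALF_UP)


def seven_day_average(daily_values: List[int]) -> Optional[float]:
    """Mean of the last 7 daily values (oldest first); None if fewer than 7 are available."""
    window = daily_values[-7:]
    if len(window) < 7:
        return None
    return sum(window) / 7


print(day_over_day_rate(1050, 1000))  # 0.050000
print(seven_day_average([100, 110, 120, 130, 140, 150, 160]))  # 130.0
```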
Task metric details (task_metrics_detail)
|
Parameter |
Type |
Description |
|
task_id |
bigint |
The scheduling task ID. |
|
workflow_id |
bigint |
The workflow ID. |
|
node_type |
bigint |
The node type. |
|
project_id |
bigint |
The workspace ID. |
|
week_number |
bigint |
The week of the year in which the business date falls. |
|
task_owner |
string |
The owner ID. |
|
compute_resource_type |
string |
The compute resource type. |
|
compute_resource_id |
string |
The ID of the compute resource, such as a MaxCompute project name, an E-MapReduce (EMR) cluster ID, or a Hologres instance ID. |
|
datasource_name |
string |
The data source name. |
|
inst_success_count |
bigint |
The number of successful instances. |
|
inst_failed_count |
bigint |
The number of failed instances. |
|
inst_running_count |
bigint |
The number of running instances. |
|
inst_abnormal_count |
bigint |
The number of abnormal instances. |
|
inst_not_started_count |
bigint |
The number of instances that have not started. |
|
inst_runtime_cu |
double |
The total CUs consumed by the task's instances on the business date. |
|
task_avg_cu_31d |
double |
The average daily CU consumption of the task over the last 31 days. |
|
dt |
string |
The date partition, in YYYYMMDD format. Valid values are in the range [TODAY-31D, TODAY-1D]. |
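Comparing inst_runtime_cu with task_avg_cu_31d is a simple way to spot tasks whose compute consumption jumped on the business date. The following Python sketch flags tasks whose CU use exceeds a multiple of their 31-day daily average. The 2.0 threshold, the function name, and the sample rows are hypothetical; tune the threshold to your workload.

```python
def cu_spikes(rows, ratio=2.0):
    """Return task_ids whose CU use on the business date exceeds ratio x their 31-day average."""
    spikes = []
    for r in rows:
        avg = r["task_avg_cu_31d"]
        # Skip tasks with no history (NULL or zero average) to avoid meaningless comparisons.
        if avg and r["inst_runtime_cu"] > ratio * avg:
            spikes.append(r["task_id"])
    return sorted(spikes)


# Hypothetical task_metrics_detail rows.
rows = [
    {"task_id": 1, "inst_runtime_cu": 50.0, "task_avg_cu_31d": 20.0},
    {"task_id": 2, "inst_runtime_cu": 30.0, "task_avg_cu_31d": 20.0},
    {"task_id": 3, "inst_runtime_cu": 10.0, "task_avg_cu_31d": 0.0},
    {"task_id": 4, "inst_runtime_cu": 10.0, "task_avg_cu_31d": None},
]
print(cu_spikes(rows))  # [1]
```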
Task metric summary (task_metrics_summary)
|
Parameter |
Type |
Description |
|
node_type |
bigint |
The node type. |
|
inst_status |
string |
The instance status. |
|
inst_count |
bigint |
The number of instances. |
|
avg_inst_count_7d |
double |
The 7-day average instance count. |
|
granularity |
string |
The statistical granularity. Valid values: |
|
dt |
string |
The date partition, in YYYYMMDD format. Valid values are in the range [TODAY-31D, TODAY-1D]. |
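Rows in task_metrics_summary are keyed by granularity, node_type, and inst_status, so a pivot makes them easier to read. The following Python sketch nests inst_count by those three keys and sums duplicates. Because the valid granularity and inst_status values are not listed here, the sample rows use placeholder values; the function name and the sample data are hypothetical.

```python
from collections import defaultdict


def pivot_counts(rows):
    """Return {granularity: {node_type: {inst_status: total inst_count}}}."""
    out = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for r in rows:
        out[r["granularity"]][r["node_type"]][r["inst_status"]] += r["inst_count"]
    # Convert nested defaultdicts to plain dicts for printing or serialization.
    return {g: {t: dict(s) for t, s in by_type.items()} for g, by_type in out.items()}


# Placeholder values: "G", 10, "status_a", and "status_b" are not real DataWorks values.
rows = [
    {"granularity": "G", "node_type": 10, "inst_status": "status_a", "inst_count": 3},
    {"granularity": "G", "node_type": 10, "inst_status": "status_b", "inst_count": 1},
    {"granularity": "G", "node_type": 10, "inst_status": "status_a", "inst_count": 2},
]
print(pivot_counts(rows))  # {'G': {10: {'status_a': 5, 'status_b': 1}}}
```

Keeping granularity as the outer key prevents counts computed at different granularities from being mixed.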