Details of open data table structures (DataWorks)

DataWorks open data provides multi-dimensional tables and views for collecting metadata. This topic lists the available tables and views and describes their schemas.

Metadata

DataWorks generates metadata tables and sample metric tables from the metadata of resources in your tenant, such as tables, tasks, instances, workspaces, members, and projects. The schemas of these tables can change, so treat the schema displayed in the UI as authoritative if it differs from this topic.

Asset table issues (asset_table_issues)

Partition field: dt

Description: Details of data governance issues identified in tables.

Parameter	Type	Description
tenant_id	string	The ID of the DataWorks tenant.
meta_entity_id	string	The ID of the corresponding metadata entity.
uuid	string	The unique key of the table.
meta_entity_type	string	The type of the corresponding metadata entity. For example, maxcompute-table.
entity_type	string	The entity type. For example, table, view, and materialized_view.
account_id	string	The main account that owns the asset.
datasource_type	string	The data source type. For example, E-MapReduce and MaxCompute.
datasource_id	string	The name of the engine, which is projectName for MaxCompute, clusterId for E-MapReduce, or databaseName for Hologres.
catalog_name	string	The name of the Data Lake Formation catalog, used when Data Lake Formation is the metadata source.
database_name	string	The name of the database. For E-MapReduce, this is the dbName.
schema_name	string	The name of the schema.
rule_id	string	The ID of the governance rule.
rule_name_zh	string	The Chinese name of the governance rule.
rule_name_en	string	The English name of the governance rule.
category	string	The dimension of the governance rule.
deduct_score_tenant	string	The points deducted at the tenant level. The value is accurate to four decimal places.
deduct_score_owner	string	The points deducted at the owner level. The value is accurate to four decimal places.
cost	string	The amount of wasted resources.
project_id	string	The ID of the DataWorks project.
dt	string	The date partition, a logical partition field, in YYYYMMDD format.
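
Because `deduct_score_tenant` and `deduct_score_owner` are stored as strings with four decimal places, aggregating them as floats can introduce rounding noise. The following minimal sketch sums tenant-level deductions per rule with `decimal.Decimal`. It assumes the rows have already been fetched as dictionaries keyed by column name and that the score strings are plain decimal literals such as `0.1250`; the function name and sample values are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal


def total_deductions_by_rule(rows):
    """Sum deduct_score_tenant per rule_id, parsing the string scores exactly."""
    totals = defaultdict(Decimal)  # Decimal() is an exact zero
    for row in rows:
        totals[row["rule_id"]] += Decimal(row["deduct_score_tenant"])
    return dict(totals)


# Hypothetical rows; the IDs and scores are made up for illustration.
sample_rows = [
    {"rule_id": "r1", "deduct_score_tenant": "0.1250"},
    {"rule_id": "r1", "deduct_score_tenant": "0.0375"},
    {"rule_id": "r2", "deduct_score_tenant": "1.0000"},
]
print(total_deductions_by_rule(sample_rows))
```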

Asset table profiles (asset_table_profiles)

Partition field: dt

Description: Detailed metrics for table assets.

Parameter	Type	Description
tenant_id	bigint	The ID of the source tenant.
meta_entity_id	string	The ID of the corresponding metadata entity.
meta_entity_type	string	The type of the corresponding metadata entity. For example, `maxcompute-table`.
entity_type	string	The entity type, such as `table`, `view`, and `materialized_view`.
account_id	string	The main account that owns the asset.
datasource_type	string	The data source type, such as `E-MapReduce` or `MaxCompute`.
datasource_id	string	The engine name. Examples include `projectName` for MaxCompute, `clusterId` for E-MapReduce, and `databaseName` for Hologres.
catalog_name	string	The Data Lake Formation (DLF) catalog name. This field is populated only when DLF is the metadata source.
database_name	string	The name of the database. For E-MapReduce, this corresponds to `dbName`.
schema_name	string	The name of the schema.
uuid	string	The table's unique key.
name	string	The table's name.
owner	string	The asset's owner.
last_access_timestamp	bigint	The table's last access timestamp.
meta_modified_timestamp	bigint	The 13-digit UNIX timestamp indicating when the table's metadata was last modified.
data_modified_timestamp	bigint	The 13-digit UNIX timestamp indicating when the table's data was last modified.
create_timestamp	bigint	The table's creation timestamp.
comment	string	The table's comment.
partition_keys	string	The partition keys for the table.
tags	string	The asset's tags.
governance_rule_finding_count	bigint	The number of issues identified by governance rules.
governance_rule_finding_history_count	string	The asset's historical count of governance findings.
governance_health_score	string	The asset's governance health score.
governance_health_level	string	The asset's governance health level, derived from its score.
is_partitioned	bigint	Indicates whether the table is partitioned.
content_size	bigint	The table's logical size.
record_num	bigint	The number of records in the table.
life_cycle	string	The table's lifecycle.
partition_count	bigint	The table's partition count.
view_count_monthly	bigint	The table's view count over the last month.
access_count	bigint	The table's total access count.
upstream_table_count	bigint	The number of upstream tables.
upstream_table_detail	string	Details about the upstream tables.
downstream_table_count	bigint	The number of downstream tables.
downstream_table_detail	string	Details about the downstream tables.
producing_project_ids	string	A list of workspaces that produce the table.
producing_tasks_count	bigint	The number of tasks that produce this table.
producing_tasks_detail	string	Details about the tasks that produce this table.
using_tasks_count	bigint	The number of tasks that use this table.
using_tasks_detail	string	Details about the tasks that use this table.
quality_rule_count	bigint	The number of quality rules for the table.
quality_monitor_count	bigint	The number of quality monitoring metrics for the table.
quality_rule_7_days_failed_count	bigint	The number of failed quality rule checks in the last 7 days.
quality_monitor_7_days_failed_count	bigint	The number of failed quality monitoring metric checks in the last 7 days.
dt	string	The date partition, which serves as a logical partition field. The format is `YYYYMMDD`.
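
Several columns in this table, such as `meta_modified_timestamp` and `data_modified_timestamp`, hold 13-digit UNIX timestamps, that is, milliseconds since the epoch. The sketch below converts such a value to a timezone-aware UTC datetime and flags tables whose data has not changed for a given number of days. The helper names and the 90-day default are assumptions chosen for illustration, not values defined by DataWorks.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MS_PER_DAY = 86_400_000


def ms_to_utc(ts_ms):
    """Convert a 13-digit (millisecond) UNIX timestamp to an aware UTC datetime."""
    # Integer millisecond arithmetic avoids float rounding of the fraction.
    return EPOCH + timedelta(milliseconds=ts_ms)


def is_stale(data_modified_ms, now_ms, max_age_days=90):
    """Return True if the data was last modified more than max_age_days ago."""
    return now_ms - data_modified_ms > max_age_days * MS_PER_DAY


print(ms_to_utc(1700000000000).isoformat())
```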

Asset task issues (asset_task_issues)

Partition field: dt

Description: Details of data governance issues identified in tasks.

Parameter	Type	Description
tenant_id	string	The ID of the DataWorks tenant.
node_id	string	The ID of the scheduling node.
node_name	string	The name of the node.
node_type	string	The type of the node. Valid values: SQL, SQLCost, LOT, and CUPID.
node_owner	string	The base ID of the node owner.
priority	string	The priority of the node.
rule_id	string	The ID of the governance rule.
rule_name_zh	string	The Chinese name of the governance rule.
rule_name_en	string	The English name of the governance rule.
category	string	The rule's governance domain.
deduct_score_tenant	string	The score deduction for the tenant, accurate to four decimal places.
deduct_score_owner	string	The score deduction for the owner, accurate to four decimal places.
cost	string	The benefit gained by resolving the issue, typically measured as a cost saving.
project_id	string	The ID of the DataWorks project.
dt	string	The logical date partition, in YYYYMMDD format.

Asset task profiles (asset_task_profiles)

Partition field: dt

Description: Detailed metrics for asset tasks.

Parameter	Type	Description
tenant_id	bigint	The ID of the tenant.
data_asset_id	string	The ID of the asset within the module, corresponding to `task.id`.
name	string	The asset name, corresponding to `task.name`.
project_id	bigint	The ID of the workspace where the asset is located.
project_env	string	The environment. Valid values: `PROD` for production and `DEV` for development.
owner	string	The owner of the asset.
create_user	string	The user who created the asset.
create_time	bigint	The time when the asset was created.
modify_user	string	The user who last modified the asset.
modify_time	bigint	The time when the asset was last modified.
trigger_type	string	The trigger type. Valid values: `Scheduler` for a scheduled trigger and `Manual` for a manual trigger.
trigger_recurrence_type	string	The scheduling state. Valid values: `Normal` (runs as scheduled), `Manual` (manual task), `Pause` (paused), and `Skip` (skipped).
trigger_cron	string	The cron expression.
type	bigint	The type of code executed by the node. For a list of node type codes, see the DataStudio documentation at https://help.aliyun.com/zh/dataworks/user-guide/node-development-of-data-studio.
script_parameters	string	The script parameters.
priority	bigint	The priority of the task. Valid values range from 1 (lowest) to 8 (highest). The default is 1.
trigger_start_time	bigint	The start date for scheduling.
trigger_end_time	bigint	The end date for scheduling.
runtime_resource_group_id	bigint	The ID of the resource group to which the node belongs.
runtime_cu	string	The compute units (CUs).
baseline_id	bigint	The ID of the baseline to which the node belongs.
rerun_times	bigint	The maximum number of times the task can be rerun.
rerun_interval	bigint	The rerun interval, in milliseconds.
rerun_mode_type	string	Specifies when the task can be rerun. Valid values: `AllAllowed` (can be rerun on success or failure), `FailureAllowed` (can be rerun only on failure), and `AllDenied` (cannot be rerun).
tags	string	Tags associated with the asset.
tags_count	bigint	The number of tags associated with the asset.
input_table_count	bigint	The number of input tables.
output_table_count	bigint	The number of output tables.
input_table_detail	string	Details of the input tables.
output_table_detail	string	Details of the output tables.
upstream_node_count	bigint	The number of upstream nodes.
downstream_node_count	bigint	The number of downstream nodes.
governance_rule_finding_count	bigint	The number of issues identified by governance rules.
governance_rule_finding_history_count	string	Historical count of governance issues for the asset.
governance_health_score	string	The health score of the asset.
governance_health_level	string	The health level of the asset, based on its score.
engine_datasource_id	string	The ID of the compute engine.
engine_instance_count	bigint	The number of compute engine instances.
engine_instance_run_time	bigint	The total runtime of compute engine instances.
engine_instance_comput_volume_cost	string	The volume of computation.
engine_instance_cu_cost	string	The compute units (CUs) consumed.
engine_instance_cpu_cost	string	The CPU consumption.
engine_instance_mem_cost	string	The memory consumption.
engine_instance_exist_data_skew	bigint	Indicates whether data skew exists.
engine_instance_suggestions	string	Suggestions for addressing data skew.
engine_instance_data_skew_ids	string	The IDs of instances with data skew.
engine_instance_ids	string	The IDs of the engine instances.
task_instance_wait_time_cost_sum	bigint	Total instance wait time, in milliseconds.
task_instance_wait_time_cost_max	bigint	Maximum instance wait time, in milliseconds.
task_instance_run_time_cost_sum	bigint	Total instance runtime, in milliseconds.
task_instance_run_time_cost_max	bigint	Maximum runtime for a single instance, in milliseconds.
task_instance_7_days_wait_time_cost_max	bigint	Maximum instance wait time over the last seven days, in milliseconds.
task_instance_7_days_run_time_cost_max	bigint	Maximum instance runtime over the last seven days, in milliseconds.
task_instance_count	bigint	The number of instances.
task_instance_7_days_failed_count	bigint	The number of failed instances over the last seven days.
task_instance_7_days_failed_day_count	bigint	The count of days with failures over the past seven days.
task_instance_7_days_frezeed_day_count	bigint	The count of days the task was frozen over the past seven days.
task_instance_7_days_dry_run_day_count	bigint	The number of days over the past seven days on which the task was a dry run (skipped).
quality_monitor_count	bigint	The number of data quality monitoring metrics.
quality_monitor_7_days_failed_count	bigint	The number of failed data quality monitoring metrics over the past seven days.
di_task_resource_group_id	string	The ID of the data integration resource group to which the node belongs.
di_task_is_public_network	bigint	Indicates whether the data integration task uses public network traffic.
di_task_concurrency	bigint	The concurrency level for the data integration task.
di_task_total_records	bigint	The total number of synchronized records.
di_task_total_bytes	bigint	The total volume of synchronized data, in bytes.
di_task_source_type	string	The type of the data source.
di_task_target_type	string	The type of the data target.
di_task_run_time_cost	bigint	Runtime of the data integration task, in milliseconds.
di_task_wait_time_cost	bigint	Wait time of the data integration task, in milliseconds.
dt	string	The date partition for the record, in `YYYYMMDD` format.
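
The runtime and wait-time columns above are reported in milliseconds, so derived metrics usually need a unit conversion. As a minimal sketch, the function below estimates the average runtime per instance in seconds from `task_instance_run_time_cost_sum` and `task_instance_count`. It assumes both columns cover the same set of instances, which this topic does not state explicitly; the function name is hypothetical.

```python
def average_run_seconds(profile):
    """Average instance runtime in seconds, or None if no instances were counted."""
    count = profile["task_instance_count"]
    if not count:  # covers 0 and NULL (None)
        return None
    total_ms = profile["task_instance_run_time_cost_sum"]
    return total_ms / count / 1000


# Hypothetical row with made-up values.
print(average_run_seconds(
    {"task_instance_count": 4, "task_instance_run_time_cost_sum": 120000}
))
```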

Data catalogs (catalogs)

Parameter	Type	Description
datasource_type	string	The data source type, such as DLF and StarRocks.
datasource_id	string	The data source ID, such as a StarRocks cluster ID or the main account ID for DLF.
name	string	The data catalog name.
type	string	The data catalog type, such as Hive or JDBC.
comment	string	The data catalog comment.
location	string	The location of the data catalog.
properties	string	Configuration properties, specified as a JSON string.
owner	string	The owner of the data catalog. This value can be an Alibaba Cloud account UID or a database account, depending on the data source type.
create_timestamp	bigint	The time the data catalog was created, represented as a 13-digit timestamp (milliseconds).
update_timestamp	bigint	The time the data catalog was last updated, represented as a 13-digit timestamp (milliseconds).
meta_entity_id	string	A unique, API-friendly identifier for the data catalog that complies with the metadata entity ID specification.
dt	string	The date partition, a logical partition field, in YYYYMMDD format. Valid values: [TODAY-31D, TODAY-1D].
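
The `dt` range `[TODAY-31D, TODAY-1D]` recurs in most tables in this topic. The sketch below computes the corresponding `YYYYMMDD` bounds for a given date and checks whether a partition value falls inside them. It assumes both bounds are inclusive, as the square brackets suggest, and that "today" is the caller's local date; the function names are hypothetical.

```python
from datetime import date, timedelta


def valid_dt_range(today):
    """Return (earliest, latest) dt strings for [TODAY-31D, TODAY-1D]."""
    earliest = today - timedelta(days=31)
    latest = today - timedelta(days=1)
    return earliest.strftime("%Y%m%d"), latest.strftime("%Y%m%d")


def is_valid_dt(dt, today):
    """Check that dt is an 8-digit YYYYMMDD string inside the valid range."""
    earliest, latest = valid_dt_range(today)
    # Fixed-width YYYYMMDD strings compare correctly as plain strings.
    return len(dt) == 8 and dt.isdigit() and earliest <= dt <= latest


print(valid_dt_range(date.today()))
```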

Columns

Parameter	Type	Description
datasource_type	string	The type of the data source, such as DLF and StarRocks.
datasource_id	string	The ID of the data source, such as a StarRocks cluster ID, the main account ID for DLF or MaxCompute, or an RDS instance ID.
catalog_name	string	The name of the data catalog. This field is populated only if the data source type supports data catalogs.
database_name	string	The name of the database.
schema_name	string	The name of the schema. This field is populated only if the data source type supports schemas.
table_name	string	The name of the table.
name	string	The name of the column.
type	string	The data type of the column.
comment	string	The comment for the column.
ordinal_position	bigint	The 1-based ordinal position of the column in the table.
is_primary_key	boolean	Indicates whether the column is part of the primary key.
is_nullable	boolean	Indicates whether the column allows NULL values.
is_partition_key	boolean	Indicates whether the column is a partition key.
properties	string	A JSON string of properties and parameters.
business_description	string	The business description of the column.
meta_entity_id	string	The unique identifier for the column. It is API-friendly and complies with the metadata entity ID specification.
dt	string	The date partition (a logical partition column) in YYYYMMDD format. Valid range: [TODAY-31D, TODAY-1D].

Databases

Parameter	Type	Description
datasource_type	string	The type of the data source. Examples include `dlf`, `starrocks`, `maxcompute`, `holodb`, and `mysql`.
datasource_id	string	The data source ID. For example, a StarRocks cluster ID, the primary account ID for Data Lake Formation (DLF) or MaxCompute, or an RDS instance ID.
catalog_name	string	The catalog name. This field is populated only if the data source type supports catalogs.
name	string	The database name.
type	string	The database type.
comment	string	The database comment.
location	string	The database path.
properties	string	Properties and parameters (JSON string).
owner	string	The owner of the database. The value is an Alibaba Cloud account UID or a database system account, depending on the data source type.
is_external	boolean	Indicates whether the database is an external database.
create_timestamp	bigint	A 13-digit timestamp indicating the creation time.
update_timestamp	bigint	A 13-digit timestamp indicating the last update time.
meta_entity_id	string	The unique identifier for the database. This ID conforms to the metadata entity ID specification.
dt	string	The date partition (a logical partition field) in YYYYMMDD format. Valid values: [TODAY-31D, TODAY-1D].

Table and column-level data lineage (lineages)

Parameter	Type	Description
source_meta_entity_id	string	The unique identifier for the source. This identifier is API-friendly and conforms to the metadata entity ID specification.
source_raw_entity_type	string	The entity type of the source. This field is used for identification if the entity's metadata is unmanaged and the `source_meta_entity_type` field is empty.
source_uuid	string	A unique, UI-friendly identifier for the source used for page access.
target_meta_entity_id	string	The unique identifier for the target. This identifier is API-friendly and conforms to the metadata entity ID specification.
target_raw_entity_type	string	The entity type of the target. This field is used for identification if the entity's metadata is unmanaged and the `target_meta_entity_type` field is empty.
target_uuid	string	A unique, UI-friendly identifier for the target used for page access.
compute_engine	string	The compute engine, such as `MaxCompute`, `DataX`, or `Hologres`.
transform_type	string	The type of transformation task performed by the compute engine. Examples: `SQL`, `DATAX`, `DATAX_STREAM`, `EXTERNAL_TABLE_MAPPING`, `STORAGE_MAPPING`, and `API_MAPPING`.
task_id	bigint	The ID of the DataWorks scheduled task. Refer to the `tasks` table. This field is empty if a DataWorks scheduled task did not generate the data lineage.
task_instance_id	bigint	The ID of the DataWorks scheduled task instance. Refer to the `task_instances` table. This field is empty if a DataWorks scheduled task did not generate the data lineage.
lineage_time	bigint	The timestamp, in milliseconds, when the data lineage was generated.
granularity	string	The level of the data lineage, such as `TABLE` and `COLUMN`.
dt	string	The date partition (a logical partition field), in YYYYMMDD format. Value range: [TODAY-31D, TODAY-1D].
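
Each row of the `lineages` table describes one edge from a source entity to a target entity, so a set of rows can be treated as a directed graph. The following sketch collects every entity reachable downstream of a starting entity with a breadth-first traversal. It assumes the rows come from a single `dt` partition and are available as dictionaries keyed by column name; the function name and the sample IDs are hypothetical.

```python
from collections import defaultdict, deque


def downstream_entities(lineage_rows, start_id, granularity="TABLE"):
    """Return all meta entity IDs reachable downstream of start_id."""
    edges = defaultdict(set)
    for row in lineage_rows:
        if row["granularity"] == granularity:
            edges[row["source_meta_entity_id"]].add(row["target_meta_entity_id"])

    seen = set()
    queue = deque([start_id])
    while queue:
        current = queue.popleft()
        for target in edges.get(current, ()):
            # Skip already-visited nodes so cyclic lineage terminates.
            if target != start_id and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical edges; real IDs follow the metadata entity ID specification.
rows = [
    {"source_meta_entity_id": "a", "target_meta_entity_id": "b", "granularity": "TABLE"},
    {"source_meta_entity_id": "b", "target_meta_entity_id": "c", "granularity": "TABLE"},
    {"source_meta_entity_id": "c", "target_meta_entity_id": "a", "granularity": "TABLE"},
    {"source_meta_entity_id": "a", "target_meta_entity_id": "x", "granularity": "COLUMN"},
]
print(sorted(downstream_entities(rows, "a")))
```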

Partitions

Parameter	Type	Description
datasource_type	string	The data source type, such as MaxCompute, DLF, and StarRocks.
datasource_id	string	The data source ID, such as a StarRocks cluster ID, the main account ID for DLF or MaxCompute, or an RDS instance ID.
catalog_name	string	The name of the data catalog. This field is populated when the data source type supports data catalogs.
database_name	string	The name of the database.
schema_name	string	The name of the schema. This field is populated when the data source type supports schemas.
table_name	string	The name of the table.
name	string	The partition name (partition specification).
create_timestamp	bigint	The 13-digit creation timestamp.
update_timestamp	bigint	The 13-digit update timestamp.
content_size	bigint	The partition size, in bytes.
properties	string	A JSON string of properties and parameters.
dt	string	The date partition (a logical partition field) in YYYYMMDD format. The valid value range is [TODAY-31D, TODAY-1D].

Resource groups

Parameter	Type	Description
resource_group_id	bigint	The ID of the resource group.
resource_group_identifier	string	The identifier of the resource group.
resource_group_type	bigint	The type of the resource group. Valid values: 1 (scheduling resource group), 2 (MaxCompute resource group), and 4 (data integration resource group).
resource_group_mode	bigint	The mode of the resource group. Valid values: 1 (prepaid), 2 (pay-as-you-go), and 3 (developer edition, MaxCompute only).
resource_group_status	bigint	The status of the resource group. Valid values: 0 (Normal), 1 (Frozen), 2 (Deleted), 3 (Creating), 4 (Creation Failed), 5 (Updating), 6 (Update Failed), 7 (Deleting), and 8 (Deletion Failed).
is_exclusive_resource_group	boolean	Specifies whether this is an exclusive resource group.
dt	string	The date partition, a logical partition field. Format: YYYYMMDD. Value range: [TODAY-31D, TODAY-1D].
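
The type, mode, and status columns of this table store numeric codes. A small lookup layer, sketched below, makes query results easier to read. The code-to-label mappings are copied from the table above; the function names and the `unknown (...)` fallback for unlisted codes are assumptions.

```python
RESOURCE_GROUP_TYPE = {
    1: "scheduling resource group",
    2: "MaxCompute resource group",
    4: "data integration resource group",
}
RESOURCE_GROUP_MODE = {
    1: "prepaid",
    2: "pay-as-you-go",
    3: "developer edition (MaxCompute only)",
}
RESOURCE_GROUP_STATUS = {
    0: "Normal", 1: "Frozen", 2: "Deleted", 3: "Creating",
    4: "Creation Failed", 5: "Updating", 6: "Update Failed",
    7: "Deleting", 8: "Deletion Failed",
}


def _label(mapping, code):
    # Fall back to a visible marker instead of raising on unlisted codes.
    return mapping.get(code, f"unknown ({code})")


def describe_resource_group(row):
    """Replace the numeric codes in a resource group row with labels."""
    return {
        "resource_group_id": row["resource_group_id"],
        "type": _label(RESOURCE_GROUP_TYPE, row["resource_group_type"]),
        "mode": _label(RESOURCE_GROUP_MODE, row["resource_group_mode"]),
        "status": _label(RESOURCE_GROUP_STATUS, row["resource_group_status"]),
    }
```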

Schemas

Parameter	Type	Description
datasource_type	string	The data source type. Examples: holodb, MaxCompute, and PostgreSQL.
datasource_id	string	The data source ID, such as an RDS instance ID or the account ID for MaxCompute.
catalog_name	string	The data catalog name. This field is populated only if the data source type supports data catalogs.
database_name	string	The database name.
name	string	The schema name.
type	string	The schema type.
comment	string	A comment about the schema.
properties	string	Properties and parameters, in JSON string format.
owner	string	The schema owner. The value can be an Alibaba Cloud account UID or a database account, depending on the data source type.
create_timestamp	bigint	The creation time, represented as a 13-digit UNIX timestamp.
update_timestamp	bigint	The last update time, represented as a 13-digit UNIX timestamp.
meta_entity_id	string	A unique identifier for the schema, which is API-friendly and compliant with the meta entity ID specification.
dt	string	The date partition (a logical partition field) in YYYYMMDD format. The value range is [TODAY-31D, TODAY-1D].

Tables

Parameter	Type	Description
datasource_type	string	The data source type, such as Data Lake Formation, StarRocks, MaxCompute, Hologres, or MySQL.
datasource_id	string	The data source ID. This value is the cluster ID for a StarRocks cluster, the main account ID for Data Lake Formation or MaxCompute, or the instance ID for an RDS instance.
catalog_name	string	The name of the data catalog. This field applies only to data source types that support data catalogs.
database_name	string	The name of the database.
schema_name	string	The name of the schema. This field applies only to data source types that support schemas.
name	string	The name of the table.
type	string	The type of the table.
comment	string	The comment for the table.
partition_keys	string	The partition keys. For multi-level partitioning, keys are separated by commas (,).
location	string	The storage path for the table.
properties	string	A JSON string of properties and parameters. For a view, this field contains the view's DDL definition.
owner	string	The table owner. The value can be an Alibaba Cloud account ID or a database system account, depending on the data source type.
content_size	bigint	The storage size, in bytes.
data_retention	map<string,string>	The data retention period or lifecycle. The value varies by table type. For MaxCompute tables, the key is `lifecycle` and the value is the table's lifecycle, such as `365`. For Data Lake Formation tables, the key is `retention` and the value is the table's lifecycle, such as `91`. This field is not supported for other table types.
is_compressed	boolean	Indicates whether the table is compressed.
is_temporary	boolean	Indicates whether the table is a temporary table.
entity_type	string	The type of the entity, such as `table`, `view`, or `materialized_view`.
input_format	string	The input format.
output_format	string	The output format.
serde_parameters	string	The SerDe parameters.
serialization_lib	string	The serialization library.
create_timestamp	bigint	A 13-digit UNIX timestamp indicating when the table was created.
meta_modified_timestamp	bigint	A 13-digit UNIX timestamp indicating when the table metadata was last modified.
data_modified_timestamp	bigint	A 13-digit UNIX timestamp indicating when the table data was last modified.
last_access_timestamp	bigint	A 13-digit UNIX timestamp indicating when the table was last accessed.
business_description	string	The business description or Chinese name.
meta_entity_id	string	The unique identifier for the table. This ID is designed for API use and conforms to the metadata entity ID specification. Examples: `maxcompute-table`: `[main account ID]::[project_name]:[schema_name]:[table_name]`; `holo-table`: `[Hologres instance ID]::[sample_database]:[public_schema]:[table_name]`; `starrocks-table`: `[cluster ID]:[default_catalog]:[sample_database]::[sample_table]`.
uuid	string	The UUID of the table, used to link to the table details page in the DataWorks data map.
business_tags	array<string>	Business tags. This field contains tags set on the data map page.
wikis	array<struct<`version`:bigint,`operator`:string,`update_timestamp`:bigint,`content`:string>>	The table wiki. The struct contains the following fields: `version` (the version number), `operator` (the user who submitted the entry), `update_timestamp` (a 13-digit UNIX timestamp indicating when the entry was updated), and `content` (the content).
producing_tasks	array<bigint>	A list of scheduling task IDs that produce data for this table. For more information, see the `tasks` table.
dt	string	The date partition (a logical partition field) in YYYYMMDD format. Valid values: `[TODAY-31D, TODAY-1D]`.
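
The three `meta_entity_id` examples above all consist of five colon-separated segments, some of which are empty. The sketch below splits such a value into named parts. The segment names (`instance`, `catalog`, `database`, `schema`, `table`) are inferred from those examples and are not taken from the metadata entity ID specification, so treat them as assumptions. The sketch also assumes that no segment contains a colon and that any entity-type prefix has already been removed.

```python
from typing import NamedTuple, Optional


class TableEntityId(NamedTuple):
    # Segment names are inferred from the documented examples (assumption).
    instance: str
    catalog: Optional[str]
    database: Optional[str]
    schema: Optional[str]
    table: str


def parse_table_entity_id(entity_id):
    """Split a five-segment table entity ID; empty segments become None."""
    parts = entity_id.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated segments, got {len(parts)}")
    instance, catalog, database, schema, table = (part or None for part in parts)
    if instance is None or table is None:
        raise ValueError("the first and last segments must not be empty")
    return TableEntityId(instance, catalog, database, schema, table)


# Hypothetical IDs shaped like the MaxCompute and StarRocks examples.
print(parse_table_entity_id("123456::sales_project:default:orders"))
print(parse_table_entity_id("c-001:default_catalog:sales_db::orders"))
```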

Task and workflow run instances (task_instances)

Parameter	Type	Description
id	bigint	The task instance ID.
node_id	bigint	The task ID. References the `tasks` table.
node_type	bigint	The task type. For a list of node code values, see Node Development.
node_name	string	The name of the task.
description	string	The description of the task.
workflow_id	bigint	The ID of the workflow. References the `workflows` table.
workflow_name	string	The name of the workflow.
workflow_instance_id	bigint	The ID of the workflow instance.
workflow_instance_type	bigint	The type of the workflow instance. Valid values: 0 (daily scheduling), 1 (manual task), 2 (smoke test), 3 (backfill), 4 (one-time workflow), 5 (manual workflow).
trigger_type	string	The trigger type (Scheduler/Manual).
trigger_recurrence	string	The run mode. Valid values: 0 (normal), 1 (manual), 2 (paused), 3 (dry run), 4 (referenced).
timeout	bigint	The task execution timeout, in hours.
rerun_mode	string	The rerun configuration. Valid values: 0 (rerunnable on failure), 1 (rerunnable on failure or success), 2 (not rerunnable).
run_number	bigint	The number of runs.
period_number	bigint	The period number.
baseline_id	bigint	The ID of the baseline.
priority	bigint	The task priority (1-8).
script_parameters	string	A list of script parameters for the run.
runtime_resource_group_id	bigint	The resource group ID for the task run.
runtime_resource_group_identifier	string	The resource group identifier for the task run.
runtime_image	string	The runtime image ID.
runtime_cu	string	CUs consumed at runtime.
runtime_process_id	string	The process ID at runtime.
runtime_gateway	string	The gateway used at runtime.
datasource_name	string	The name of the data source.
inputs_variables	array<struct<`name`:string,`type`:string,`value`:string>>	A list of input variables.
outputs	array<struct<`output`:string,`type`:string>>	A list of output identifiers.
outputs_variables	array<struct<`name`:string,`type`:string,`value`:string>>	A list of output variables.
tags	array<struct<`key`:string,`value`:string>>	A list of task tags.
status	bigint	The task status. Valid values: 1 (not run), 2 (waiting for schedule), 3 (waiting for resources), 4 (running), 5 (failed), 6 (succeeded), 7 (verifying), 8 (pending condition), 9 (waiting for a trigger).
trigger_time	string	The time the task was triggered.
bizdate	string	The business date.
started_time	string	The time the task started.
finished_time	string	The time the task finished.
project_id	bigint	The project ID. References the `workspace_id` field in the `workspaces` table.
project_env	string	The environment type (PROD/DEV).
owner	string	The owner's account ID. References the `users` table.
create_time	string	The creation time.
modify_time	string	The last modification time.
create_user	string	The creator's user ID. References the `users` table.
modify_user	string	The last modifier's user ID. References the `users` table.
waiting_resource_time	string	The time spent waiting for resources.
waiting_trigger_time	string	The time spent waiting for a trigger.
dt	string	The logical date partition, in YYYYMMDD format. Value range: [TODAY-31D, TODAY-1D].
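
The `status` column stores a numeric code, which makes ad hoc analysis hard to read. The sketch below models the documented codes as an `IntEnum`, counts instances per status, and lists the tasks that have at least one failed instance. The enum member names are paraphrased from the table above, and the function names are hypothetical. Note that `TaskInstanceStatus(code)` raises `ValueError` for a code that is not listed here.

```python
from collections import Counter
from enum import IntEnum


class TaskInstanceStatus(IntEnum):
    NOT_RUN = 1
    WAITING_FOR_SCHEDULE = 2
    WAITING_FOR_RESOURCES = 3
    RUNNING = 4
    FAILED = 5
    SUCCEEDED = 6
    VERIFYING = 7
    PENDING_CONDITION = 8
    WAITING_FOR_TRIGGER = 9


def status_counts(instances):
    """Count task instances per status name."""
    return Counter(TaskInstanceStatus(row["status"]).name for row in instances)


def failed_node_ids(instances):
    """Return the sorted, de-duplicated node_id values with a failed instance."""
    return sorted(
        {row["node_id"] for row in instances
         if row["status"] == TaskInstanceStatus.FAILED}
    )


# Hypothetical rows with made-up IDs.
instances = [
    {"id": 1, "node_id": 10, "status": 6},
    {"id": 2, "node_id": 11, "status": 5},
    {"id": 3, "node_id": 11, "status": 5},
    {"id": 4, "node_id": 12, "status": 4},
]
print(status_counts(instances))
print(failed_node_ids(instances))
```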

Task and workflow definitions (tasks)

Parameter	Type	Description
id	bigint	The task ID.
name	string	The task name.
description	string	The task description.
type	bigint	The task type. For a list of node code values, see Node Development.
workflow_id	bigint	The workflow ID.
instance_mode	string	The instance generation mode. Valid values: `T+1` (the instance is generated the next day) and `Immediately` (the instance is generated immediately).
baseline_id	bigint	The baseline ID.
priority	bigint	The task priority, ranging from `1` (lowest) to `8` (highest). A higher value indicates a higher priority. The default is `1`.
timeout	bigint	The task execution timeout, in hours.
rerun_mode	bigint	The rerun policy for the task. Valid values: `0` (Rerun only on failure), `1` (Rerun on failure or success), and `2` (Never rerun).
rerun_times	bigint	The number of rerun attempts. This setting applies only when the task is configured to allow reruns.
rerun_interval	bigint	The interval between rerun attempts, in seconds.
script_parameters	string	The script parameters for the runtime.
trigger_type	string	The trigger type. Valid values: `Scheduler` (schedule-based trigger) and `Manual` (manually triggered).
trigger_recurrence	bigint	The run mode when the task is triggered. Valid values: `0` (Normal run), `1` (Manual task), `2` (Paused), `3` (Dry run), and `4` (Referenced task).
trigger_cron	string	The Cron expression. Applies when `trigger_type` is `Scheduler`.
trigger_start_time	string	The start time for the scheduled trigger. Applies when `trigger_type` is `Scheduler`.
trigger_end_time	string	The expiration time for the scheduled trigger. Applies when `trigger_type` is `Scheduler`.
runtime_resource_group_id	bigint	The ID of the resource group for the task runtime.
runtime_image	string	The image ID for the task runtime.
runtime_cu	string	The CU consumption for the task runtime.
datasource_name	string	The data source name.
inputs_variables	array<struct<`name`:string,`type`:string,`value`:string>>	The input variables.
outputs	array<struct<`output`:string,`type`:string>>	The task output identifiers.
outputs_variables	array<struct<`name`:string,`type`:string,`value`:string>>	The output variables.
dependencies	array<struct<`type`:string,`upstream_output`:string,`upstream_node_id`:bigint>>	The dependencies.
related_workflow_id	bigint	The ID of the related workflow.
tags	array<struct<`key`:string,`value`:string>>	The task tags.
project_id	bigint	The project ID. See the `workspace_id` field in the `workspaces` table.
project_env	string	The environment type. Valid values: `PROD` (production) and `DEV` (development).
owner	string	The account ID of the task owner. See the `users` table.
create_time	string	The creation time.
modify_time	string	The last modification time.
create_user	string	The account ID of the user who created the task. See the `users` table.
modify_user	string	The account ID of the user who last modified the task. See the `users` table.
dt	string	The date partition in `YYYYMMDD` format. Valid range: [`TODAY-31D`, `TODAY-1D`].
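
The `dependencies` column lists the upstream nodes of each task, so the rows of one partition describe a dependency graph. As a minimal sketch, the function below derives an order in which every task appears after its upstream tasks, using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). It assumes each dependency struct is available as a dictionary, ignores upstream IDs that are missing from the input, and uses a hypothetical function name; `static_order()` raises `graphlib.CycleError` if the dependencies form a cycle.

```python
from graphlib import TopologicalSorter


def execution_order(tasks):
    """Return task IDs ordered so that upstream tasks come first."""
    known_ids = {task["id"] for task in tasks}
    sorter = TopologicalSorter()
    for task in tasks:
        upstream = [
            dep["upstream_node_id"]
            for dep in task.get("dependencies") or []
            # Skip dependencies without a node ID or outside this task set.
            if dep.get("upstream_node_id") in known_ids
        ]
        sorter.add(task["id"], *upstream)
    return list(sorter.static_order())


# Hypothetical tasks forming a simple chain: 1 -> 2 -> 3.
tasks = [
    {"id": 3, "dependencies": [{"upstream_output": "out_2", "upstream_node_id": 2}]},
    {"id": 2, "dependencies": [{"upstream_output": "out_1", "upstream_node_id": 1}]},
    {"id": 1, "dependencies": None},
]
print(execution_order(tasks))
```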

Users

Parameter	Type	Description
user_id	string	The unique identifier for the user.
user_nick	string	The user's account alias or display name.
dt	string	The logical partition field, representing the date partition in YYYYMMDD format. Valid values: [TODAY-31D, TODAY-1D].

Workspace members

Parameter	Type	Description
workspace_id	bigint	The workspace ID. See the `workspaces` table.
user_id	string	The user ID. See the `users` table.
user_status	bigint	The user status. Valid values: `0` (Normal), `1` (Disabled), and `2` (Deleted).
gmt_create_ts	bigint	The creation time, a 13-digit timestamp.
gmt_modified_ts	bigint	The modification time, a 13-digit timestamp.
dt	string	The date partition (a logical partition) in `YYYYMMDD` format. Value range: `[TODAY-31D, TODAY-1D]`.

Workspaces

Parameter	Type	Description
workspace_id	bigint	The workspace ID.
workspace_name	string	The workspace name.
workspace_identifier	string	The workspace identifier.
workspace_description	string	The workspace description.
workspace_owner	string	The workspace owner ID. See the `users` table.
workspace_status	bigint	The workspace status. Valid values: `0` (Normal), `1` (Deleted), `2` (Initializing), `3` (Initialization Failed), `4` (Manually Disabled), `5` (Deleting), `6` (Deletion Failed), and `7` (Frozen due to Overdue Payment).
dt	string	The date partition (a logical partition field). Format: `YYYYMMDD`. Value range: `[TODAY-31D, TODAY-1D]`.
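
The numeric `workspace_status` codes listed above can be mapped to readable names on the client side. This is an illustrative sketch only: the enum member names are chosen here for readability and are not identifiers defined by DataWorks. Only the numeric codes and their meanings come from the table.

```python
from enum import IntEnum


class WorkspaceStatus(IntEnum):
    """workspace_status codes as listed in the Workspaces table."""
    NORMAL = 0
    DELETED = 1
    INITIALIZING = 2
    INITIALIZATION_FAILED = 3
    MANUALLY_DISABLED = 4
    DELETING = 5
    DELETION_FAILED = 6
    FROZEN_OVERDUE_PAYMENT = 7


def describe_workspace_status(code: int) -> str:
    """Return the member name for a code, or a marker for unlisted codes."""
    try:
        return WorkspaceStatus(code).name
    except ValueError:
        # Codes not documented in the table are reported, not guessed.
        return f"UNKNOWN({code})"
```

The same pattern applies to `user_status` in the workspace members table (`0`, `1`, and `2`).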

Data quality rule instances (quality_rule_results)

Partition field: dt

Description: Describes data quality rule instances.

Parameter	Type	Description
id	bigint	The primary key.
scan_run_id	bigint	The ID of the quality monitoring instance.
rule_id	bigint	The ID of the rule.
rule_name	string	The name of the rule.
status	string	The validation result of the rule. Possible values: Pass, Error, Warn, Fail, or Running.
severity	string	The strength of the rule. Possible values: High (strong rule) or Normal (weak rule).
create_time	bigint	The creation time of the instance.
modify_time	bigint	The last modification time of the instance.
spec	string	The specification of the rule instance.
tags	array<string>	The tags for the rule instance.
tenant_id	bigint	The ID of the DataWorks tenant.
project_id	bigint	The ID of the DataWorks workspace.
meta_entity_id	string	The unique identifier for the meta table entity.
dt	string	The date partition, in YYYYMMDD format. Value range: [TODAY-D, TODAY-1D].

Data quality rule metrics (quality_rules)

Partition field: dt

Description: Detailed metrics for each data quality rule.

Parameter	Type	Description
id	bigint	The primary key.
scan_id	bigint	The ID of the quality monitoring instance.
rule_name	string	The name of the rule.
enabled	boolean	Indicates whether the rule is enabled.
severity	string	The strength of the rule. Possible values: `High` (strong rule) and `Normal` (weak rule).
create_time	bigint	The time when the rule was created.
modify_time	bigint	The time when the rule was last modified.
spec	string	The specification of the rule.
tags	array<string>	The rule's tags.
tenant_id	bigint	The ID of the DataWorks tenant.
project_id	bigint	The ID of the DataWorks workspace.
meta_entity_id	string	The unique identifier of the meta entity in the data map.
pass_count	int	The number of times the rule passed.
warn_count	int	The number of times the rule triggered a warning.
error_count	int	The number of times the rule triggered an error.
fail_count	int	The number of times the rule failed.
dt	string	The date partition, in `YYYYMMDD` format. The value range is `[TODAY-D, TODAY-1D]`.
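
The four counters in this table (`pass_count`, `warn_count`, `error_count`, and `fail_count`) can be combined into a pass rate per rule. The sketch below assumes that the counters are mutually exclusive outcomes of completed checks, which the source does not confirm. `RuleCounts` and `pass_rate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuleCounts:
    """One row's outcome counters from quality_rules."""
    pass_count: int
    warn_count: int
    error_count: int
    fail_count: int


def pass_rate(counts: RuleCounts) -> Optional[float]:
    """Share of checks that passed, or None if the rule never ran."""
    total = (counts.pass_count + counts.warn_count
             + counts.error_count + counts.fail_count)
    if total == 0:
        return None  # avoid division by zero for rules with no runs
    return counts.pass_count / total
```

Returning `None` for rules with no recorded runs keeps them distinguishable from rules that ran and never passed.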

Data quality scan runs (quality_scan_runs)

Partition field: dt

Description: Stores information about each data quality scan run.

Parameter	Type	Description
id	bigint	The primary key.
scan_id	bigint	The data quality scan ID.
name	string	The name of the scan.
status	string	The status of the scan run. Valid values: `Pass`, `Warn`, `Error`, `Fail`, and `Running`.
post_action_type	string	The post-check action. Valid values: `Alert` and `BlockTaskInstance`.
data_filter	string	The data range used for sampling.
trigger_time	bigint	The scheduled time of the task.
trigger_type	string	The trigger type of the data quality scan. Valid values: `ByManual`, `BySchedule`, and `ByQualityNode`.
create_time	bigint	The creation time of the scan run.
modify_time	bigint	The time when the scan run was last modified.
datasource_id	bigint	The ID of the data source to which the table belongs.
datasource_type	string	The type of the data source.
computing_resource_id	bigint	The ID of the compute resource.
compute_resource_option	string	The compute resource used for the scan run.
spec	string	The data quality scan specification.
tenant_id	bigint	The ID of the DataWorks tenant.
project_id	bigint	The ID of the DataWorks workspace.
owner	string	The owner of the data quality scan.
task_id	bigint	The ID of the scheduling task.
task_instance_id	bigint	The ID of the scheduling task instance.
meta_entity_id	string	The unique identifier of the metadata entity.
table_name	string	The name of the table.
catalog_name	string	The name of the table's data catalog.
schema_name	string	The name of the table's schema.
database_name	string	The name of the table's database.
cluster_id	string	The ID of the table's cluster.
dt	string	The date partition, in `YYYYMMDD` format. The value range is [`TODAY-D`, `TODAY-1D`].

Data quality scan metrics (quality_scans)

Partition field: dt

Description: Detailed metrics for data quality monitoring tasks.

Parameter	Type	Description
id	bigint	Unique identifier for the quality scan.
name	string	Name of the quality scan.
data_filter_type	string	Type of the data filter. Valid values: `ByPartition` and `ByWhere`.
data_filter	string	Data filter expression.
trigger_type	string	Trigger type for data quality monitoring. Valid values: `ByManual`, `BySchedule`, and `ByQualityNode`.
create_time	bigint	Time when the quality scan was created.
modify_time	bigint	Time when the quality scan was last updated.
computing_resource_id	bigint	ID of the compute engine.
compute_resource_option	string	Compute resource for the data quality monitoring task.
spec	string	Specification for the data quality monitoring.
related_tasks	array<bigint>	Associated scheduling tasks.
tenant_id	bigint	ID of the DataWorks tenant.
project_id	bigint	ID of the DataWorks workspace.
owner	string	Owner of the quality scan.
datasource_id	string	ID of the table's data source.
datasource_type	string	Type of the data source.
meta_entity_id	string	Unique identifier for the meta-entity in the data catalog.
table_name	string	Name of the table.
catalog_name	string	Name of the table's data catalog.
schema_name	string	Name of the table's schema.
database_name	string	Name of the table's database.
cluster_id	string	ID of the table's cluster.
related_scheduler_task_count	int	Number of associated scheduling tasks.
rule_count	int	Number of associated rules.
high_severity_rule_count	int	Number of associated high-severity rules.
normal_severity_rule_count	int	Number of associated normal-severity rules.
enabled_rule_count	int	Number of enabled rules.
enabled_high_severity_rule_count	int	Number of enabled high-severity rules.
enabled_normal_severity_rule_count	int	Number of enabled normal-severity rules.
rule_instance_count	int	Number of rule instances run today.
high_severity_rule_instance_count	int	Number of high-severity rule instances run today.
normal_severity_rule_instance_count	int	Number of normal-severity rule instances run today.
high_severity_rule_instance_pass_count	int	Number of high-severity rule instances that passed today.
high_severity_rule_instance_warn_count	int	Number of high-severity rule instances with warnings (orange alerts) today.
high_severity_rule_instance_error_count	int	Number of high-severity rule instances with errors (red alerts) today.
high_severity_rule_instance_fail_count	int	Number of high-severity rule instances that failed today.
normal_severity_rule_instance_pass_count	int	Number of normal-severity rule instances that passed today.
normal_severity_rule_instance_warn_count	int	Number of normal-severity rule instances with warnings (orange alerts) today.
normal_severity_rule_instance_error_count	int	Number of normal-severity rule instances with errors (red alerts) today.
normal_severity_rule_instance_fail_count	int	Number of normal-severity rule instances that failed today.
block_task_instance_count	int	Number of scheduling tasks blocked today.
alert_rule_count	int	Number of configured alert subscriptions.
sms_alert_rule_count	int	Number of configured SMS alert subscriptions.
mail_alert_rule_count	int	Number of configured email alert subscriptions.
phone_alert_rule_count	int	Number of configured phone alert subscriptions.
ding_alert_rule_count	int	Number of configured DingTalk alert subscriptions.
feishu_alert_rule_count	int	Number of configured Lark alert subscriptions.
weixin_alert_rule_count	int	Number of configured WeChat alert subscriptions.
webhook_alert_rule_count	int	Number of configured custom webhook alert subscriptions.
alert_times	int	Number of alerts triggered today.
sms_alert_times	int	Number of SMS alerts triggered today.
mail_alert_times	int	Number of email alerts triggered today.
phone_alert_times	int	Number of phone alerts triggered today.
ding_alert_times	int	Number of DingTalk alerts triggered today.
feishu_alert_times	int	Number of Lark alerts triggered today.
weixin_alert_times	int	Number of WeChat alerts triggered today.
webhook_alert_times	int	Number of custom webhook alerts triggered today.
dt	string	Date partition in `YYYYMMDD` format, with a value range of `[TODAY-D, TODAY-1D]`.

Data quality summary (table_quality_summary)

Partition field: dt

Description: Contains data quality metrics for the table.

Parameter	Type	Description
meta_entity_id	string	The unique identifier for the table's meta entity.
project_id	bigint	The ID of the DataWorks workspace.
table_name	string	The name of the table.
schema_name	string	The name of the table's schema.
database_name	string	The name of the table's database.
catalog_name	string	The name of the table's data catalog.
datasource_id	bigint	The ID of the table's data source. This field is NULL if data quality is not configured.
tenant_id	bigint	The ID of the DataWorks tenant.
owner	string	The owner of the table.
scan_count	int	The number of configured quality monitors.
scheduler_related_scan_count	int	The number of quality monitors linked to scheduling.
scan_run_count	int	The number of quality monitoring task instances today.
alert_scan_run_count	int	The number of quality monitoring task instances that triggered an alert today.
block_task_instance_scan_run_count	int	The number of quality monitoring task instances that blocked scheduling tasks today.
rule_count	int	The number of configured rules.
enabled_rule_count	int	The number of enabled rules.
high_severity_rule_count	int	The number of configured high-severity rules.
normal_severity_rule_count	int	The number of configured normal-severity rules.
rule_instance_count	int	The number of rule instances today.
high_severity_rule_instance_count	int	The number of high-severity rule instances today.
normal_severity_rule_instance_count	int	The number of normal-severity rule instances today.
high_severity_rule_instance_pass_count	int	The number of successful high-severity rule checks today.
high_severity_rule_instance_warn_count	int	The number of high-severity rule checks that triggered a warning today.
high_severity_rule_instance_error_count	int	The number of high-severity rule checks that triggered an error today.
high_severity_rule_instance_fail_count	int	The number of failed high-severity rule checks today.
normal_severity_rule_instance_pass_count	int	The number of successful normal-severity rule checks today.
normal_severity_rule_instance_warn_count	int	The number of normal-severity rule checks that triggered a warning today.
normal_severity_rule_instance_error_count	int	The number of normal-severity rule checks that triggered an error today.
normal_severity_rule_instance_fail_count	int	The number of failed normal-severity rule checks today.
dt	string	The date partition in `YYYYMMDD` format. Value range: [`TODAY-31D`, `TODAY-1D`].

Sample metadata

Table metric details (table_metrics_detail)

Parameter	Type	Description
datasource_type	string	The data source type.
datasource_id	string	The data source ID.
catalog_name	string	The data catalog name.
database_name	string	The database name.
schema_name	string	The schema name.
table_name	string	The table name.
table_uuid	string	The UUID of the table, used to access its details page.
meta_entity_id	string	The human-readable ID of the table.
content_size	bigint	The collected storage size. This value is `NULL` if storage size collection is not supported.
daily_rate_cs	decimal(16,6)	The day-over-day change rate of the storage size.
avg_content_size_7d	bigint	The 7-day average storage size.
daily_rate_acs_7d	decimal(16,6)	The day-over-day change rate of the 7-day average storage size.
latest_data_update_time_31d	bigint	The timestamp of the most recent data update within the last 31 days. This time is derived from the end time of the corresponding downstream instance in the data lineage and represents the maximum `data_modified_timestamp`. Returns `NULL` if no updates occurred during this period.
latest_data_update_task_id	bigint	The ID of the scheduling task that most recently updated the table within the last 31 days.
latest_data_update_instance_id	bigint	The ID of the scheduling task instance that most recently updated the table within the last 31 days.
latest_data_update_time_by_task	bigint	The end time of the scheduling task instance that most recently updated the table within the last 31 days.
writing_task_ids	array<bigint>	A unique list of scheduling task IDs that wrote to the table on the current business date.
writing_task_ids_31d	array<bigint>	A unique list of scheduling task IDs that wrote to the table within the last 31 days.
latest_data_access_time_31d	bigint	The timestamp of the most recent data access within the last 31 days. This time is derived from the end time of the corresponding upstream instance in the data lineage and represents the maximum `last_access_timestamp`. Returns `NULL` if no access occurred during this period.
latest_data_access_task_id	bigint	The ID of the scheduling task that most recently read from the table within the last 31 days.
latest_data_access_instance_id	bigint	The ID of the scheduling task instance that most recently read from the table within the last 31 days.
latest_data_access_time_by_task	bigint	The end time of the scheduling task instance that most recently read from the table within the last 31 days.
reading_task_ids	array<string>	A unique list of scheduling task IDs that read from the table on the current business date.
reading_task_ids_31d	array<string>	A unique list of scheduling task IDs that read from the table within the last 31 days.
direct_downstream_tables	array<string>	A list of direct downstream table UUIDs.
direct_upstream_tables	array<string>	A list of direct upstream table UUIDs.
dt	string	The date partition, in YYYYMMDD format. Valid values are in the range [TODAY-31D, TODAY-1D].

Table metric summary (table_metrics_summary)

Parameter	Type	Description
table_count	bigint	The number of tables.
daily_rate_tc	decimal(16,6)	The day-over-day change rate of the table count.
avg_table_count_7d	bigint	The 7-day average table count.
daily_rate_atc_7d	decimal(16,6)	The day-over-day change rate of the 7-day average table count.
content_size	bigint	The collected storage size. This value is `NULL` if storage size collection is not supported.
daily_rate_cs	decimal(16,6)	The day-over-day change rate of the storage size.
avg_content_size_7d	bigint	The 7-day average storage size.
daily_rate_acs_7d	decimal(16,6)	The day-over-day change rate of the 7-day average storage size.
updated_table_count	bigint	The number of tables updated within the last 31 days.
daily_rate_utc	decimal(16,6)	The day-over-day change rate of the number of tables updated within the last 31 days.
avg_updated_table_count_7d	bigint	The 7-day average number of tables updated within the last 31 days.
daily_rate_autc_7d	decimal(16,6)	The day-over-day change rate of the 7-day average number of tables updated within the last 31 days.
accessed_table_count	bigint	The number of tables read from within the last 31 days.
daily_rate_atc	decimal(16,6)	The day-over-day change rate of the number of tables read from within the last 31 days.
avg_accessed_table_count_7d	bigint	The 7-day average number of tables read from within the last 31 days.
daily_rate_aatc_7d	decimal(16,6)	The day-over-day change rate of the 7-day average number of tables read from within the last 31 days.
dt	string	The date partition, in YYYYMMDD format. Valid values are in the range [TODAY-31D, TODAY-1D].
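
The `daily_rate_*` fields are day-over-day change rates typed as `decimal(16,6)`. The source does not give the formula, so the sketch below assumes the common definition (current value minus previous value, divided by previous value) and rounds to six decimal places to match the column scale. The function name `daily_rate` and the half-up rounding mode are assumptions.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional


def daily_rate(current: int, previous: int) -> Optional[Decimal]:
    """Assumed day-over-day change rate, rounded to 6 decimal places."""
    if previous == 0:
        return None  # rate is undefined when the previous value is zero
    rate = (Decimal(current) - Decimal(previous)) / Decimal(previous)
    # Scale 6 mirrors the decimal(16,6) column type.
    return rate.quantize(Decimal("0.000001"), rounding=ROUND_HALF_UP)
```

For example, a table count that grows from 100 to 110 gives a rate of `0.100000` under this definition.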

Task metric details (task_metrics_detail)

Parameter	Type	Description
task_id	bigint	The scheduling task ID.
workflow_id	bigint	The workflow ID.
node_type	bigint	The node type.
project_id	bigint	The workspace ID.
week_number	bigint	The week number of the business date in the year.
task_owner	string	The owner ID.
compute_resource_type	string	The compute resource type.
compute_resource_id	string	The ID of the compute resource, such as a MaxCompute project name, an E-MapReduce (EMR) cluster ID, or a Hologres instance ID.
datasource_name	string	The data source name.
inst_success_count	bigint	The number of successful instances.
inst_failed_count	bigint	The number of failed instances.
inst_running_count	bigint	The number of running instances.
inst_abnormal_count	bigint	The number of abnormal instances.
inst_not_started_count	bigint	The number of instances that have not started.
inst_runtime_cu	double	The total CUs consumed by the task's instances on the business date.
task_avg_cu_31d	double	The average daily CU consumption of the task over the last 31 days.
dt	string	The date partition, in YYYYMMDD format. Valid values are in the range [TODAY-31D, TODAY-1D].

Task metric summary (task_metrics_summary)

Parameter	Type	Description
node_type	bigint	The node type.
inst_status	string	The instance status.
inst_count	bigint	The number of instances.
avg_inst_count_7d	double	The 7-day average instance count.
granularity	string	The statistical granularity. Valid values: `DAILY` and `WEEKLY`.
dt	string	The date partition, in YYYYMMDD format. Valid values are in the range [TODAY-31D, TODAY-1D].