Open data table schema


DataWorks open data provides multi-dimensional tables and views for collecting metadata. This topic lists the available tables and views and describes their schemas.

Metadata

DataWorks generates metadata tables and sample metric tables from the metadata of resources in your tenant, such as tables, tasks, instances, workspaces, members, and projects. The schemas of these tables can change over time. If a schema in this topic differs from the one displayed in the UI, the schema displayed in the UI prevails.

Asset table issues (asset_table_issues)

Partition field: dt

Description: Details of data governance issues identified in tables.

Parameter

Type

Description

tenant_id

string

The ID of the DataWorks tenant.

meta_entity_id

string

The ID of the corresponding metadata entity.

uuid

string

The unique key of the table.

meta_entity_type

string

The type of the corresponding metadata entity, such as maxcompute-table.

entity_type

string

The entity type, such as table, view, or materialized_view.

account_id

string

The main account that owns the asset.

datasource_type

string

The data source type, such as E-MapReduce or MaxCompute.

datasource_id

string

The name of the engine, which is projectName for MaxCompute, clusterId for E-MapReduce, or databaseName for Hologres.

catalog_name

string

The name of the Data Lake Formation catalog, used when Data Lake Formation is the metadata source.

database_name

string

The name of the database. For E-MapReduce, this is the dbName.

schema_name

string

The name of the schema.

rule_id

string

The ID of the governance rule.

rule_name_zh

string

The Chinese name of the governance rule.

rule_name_en

string

The English name of the governance rule.

category

string

The dimension of the governance rule.

deduct_score_tenant

string

The points deducted at the tenant level. The value is accurate to four decimal places.

deduct_score_owner

string

The points deducted at the owner level. The value is accurate to four decimal places.

cost

string

The amount of wasted resources.

project_id

string

The ID of the DataWorks project.

dt

string

The date partition, a logical partition field, in YYYYMMDD format.

Asset table profiles (asset_table_profiles)

Partition field: dt

Description: Detailed metrics for table assets.

Parameter

Type

Description

tenant_id

bigint

The ID of the source tenant.

meta_entity_id

string

The ID of the corresponding metadata entity.

meta_entity_type

string

The type of the corresponding metadata entity, such as maxcompute-table.

entity_type

string

The entity type, such as table, view, and materialized_view.

account_id

string

The main account that owns the asset.

datasource_type

string

The data source type, such as E-MapReduce or MaxCompute.

datasource_id

string

The engine name. Examples include projectName for MaxCompute, clusterId for E-MapReduce, and databaseName for Hologres.

catalog_name

string

The Data Lake Formation (DLF) catalog name. This field is populated only when DLF is the metadata source.

database_name

string

The name of the database. For E-MapReduce, this corresponds to dbName.

schema_name

string

The name of the schema.

uuid

string

The table's unique key.

name

string

The table's name.

owner

string

The asset's owner.

last_access_timestamp

bigint

The table's last access timestamp.

meta_modified_timestamp

bigint

The 13-digit UNIX timestamp indicating when the table's metadata was last modified.

data_modified_timestamp

bigint

The 13-digit UNIX timestamp indicating when the table's data was last modified.

create_timestamp

bigint

The table's creation timestamp.

comment

string

The table's comment.

partition_keys

string

The partition keys for the table.

tags

string

The asset's tags.

governance_rule_finding_count

bigint

The number of issues identified by governance rules.

governance_rule_finding_history_count

string

The asset's historical count of governance findings.

governance_health_score

string

The asset's governance health score.

governance_health_level

string

The asset's governance health level, derived from its score.

is_partitioned

bigint

Indicates whether the table is partitioned.

content_size

bigint

The table's logical size.

record_num

bigint

The number of records in the table.

life_cycle

string

The table's lifecycle.

partition_count

bigint

The table's partition count.

view_count_monthly

bigint

The table's view count over the last month.

access_count

bigint

The table's total access count.

upstream_table_count

bigint

The number of upstream tables.

upstream_table_detail

string

Details about the upstream tables.

downstream_table_count

bigint

The number of downstream tables.

downstream_table_detail

string

Details about the downstream tables.

producing_project_ids

string

A list of workspaces that produce the table.

producing_tasks_count

bigint

The number of tasks that produce this table.

producing_tasks_detail

string

Details about the tasks that produce this table.

using_tasks_count

bigint

The number of tasks that use this table.

using_tasks_detail

string

Details about the tasks that use this table.

quality_rule_count

bigint

The number of quality rules for the table.

quality_monitor_count

bigint

The number of quality monitoring metrics for the table.

quality_rule_7_days_failed_count

bigint

The number of failed quality rule checks in the last 7 days.

quality_monitor_7_days_failed_count

bigint

The number of failed quality monitoring metric checks in the last 7 days.

dt

string

The date partition, which serves as a logical partition field. The format is YYYYMMDD.
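The timestamp fields above, such as meta_modified_timestamp and data_modified_timestamp, are 13-digit UNIX timestamps, that is, milliseconds since the epoch. The following Python sketch shows one way to convert such a value into a readable UTC time and to measure how long ago it was. The function names ms_to_utc and days_since are hypothetical helpers for illustration and are not part of DataWorks.

```python
from datetime import datetime, timezone

MS_PER_DAY = 86_400_000  # 24 * 60 * 60 * 1000


def ms_to_utc(ts_ms: int) -> datetime:
    """Convert a 13-digit (millisecond) UNIX timestamp to an aware UTC datetime."""
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)


def days_since(ts_ms: int, now_ms: int) -> float:
    """Return the number of days between a millisecond timestamp and now_ms."""
    return (now_ms - ts_ms) / MS_PER_DAY


# Illustrative value only; it is not taken from a real table.
print(ms_to_utc(1700000000000).isoformat())
```

A value computed with days_since on data_modified_timestamp could, for example, help flag tables whose data has not changed for a long time.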

Asset task issues (asset_task_issues)

Partition field: dt

Description: Details of data governance issues identified in tasks.

Parameter

Type

Description

tenant_id

string

The ID of the DataWorks tenant.

node_id

string

The ID of the scheduling node.

node_name

string

The name of the node.

node_type

string

The type of the node. Valid values: SQL, SQLCost, LOT, and CUPID.

node_owner

string

The base ID of the node owner.

priority

string

The priority of the node.

rule_id

string

The ID of the governance rule.

rule_name_zh

string

The Chinese name of the governance rule.

rule_name_en

string

The English name of the governance rule.

category

string

The rule's governance domain.

deduct_score_tenant

string

The score deduction for the tenant, accurate to four decimal places.

deduct_score_owner

string

The score deduction for the owner, accurate to four decimal places.

cost

string

The benefit gained by resolving the issue, typically measured as a cost saving.

project_id

string

The ID of the DataWorks project.

dt

string

The logical date partition, in YYYYMMDD format.
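Because deduct_score_tenant and deduct_score_owner are stored as strings that are accurate to four decimal places, summing them as floating-point numbers can introduce rounding drift. The following Python sketch aggregates deductions per rule with decimal arithmetic. It assumes that each score is a plain decimal string such as "0.1250"; the sample rows and the function name total_deduction_by_rule are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal


def total_deduction_by_rule(rows):
    """Sum deduct_score_tenant per rule_id using exact decimal arithmetic."""
    totals = defaultdict(Decimal)  # Decimal() is 0, so missing keys start at zero
    for row in rows:
        totals[row["rule_id"]] += Decimal(row["deduct_score_tenant"])
    return dict(totals)


# Hypothetical rows shaped like asset_task_issues records.
sample_rows = [
    {"rule_id": "r1", "deduct_score_tenant": "0.1250"},
    {"rule_id": "r1", "deduct_score_tenant": "0.3750"},
    {"rule_id": "r2", "deduct_score_tenant": "1.0001"},
]
print(total_deduction_by_rule(sample_rows))
```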

Asset task profiles (asset_task_profiles)

Partition field: dt

Description: Detailed metrics for asset tasks.

Parameter

Type

Description

tenant_id

bigint

The ID of the tenant.

data_asset_id

string

The ID of the asset within the module, corresponding to task.id.

name

string

The asset name, corresponding to task.name.

project_id

bigint

The ID of the workspace where the asset is located.

project_env

string

The environment. Valid values: PROD for production and DEV for development.

owner

string

The owner of the asset.

create_user

string

The user who created the asset.

create_time

bigint

The time when the asset was created.

modify_user

string

The user who last modified the asset.

modify_time

bigint

The time when the asset was last modified.

trigger_type

string

The trigger type. Valid values: Scheduler for a scheduled trigger and Manual for a manual trigger.

trigger_recurrence_type

string

The scheduling state. Valid values: Normal (runs as scheduled), Manual (manual task), Pause (paused), and Skip (skipped).

trigger_cron

string

The cron expression.

type

bigint

The type of code executed by the node. For a list of node type codes, see the DataStudio documentation at https://help.aliyun.com/zh/dataworks/user-guide/node-development-of-data-studio.

script_parameters

string

The script parameters.

priority

bigint

The priority of the task. Valid values range from 1 (lowest) to 8 (highest). The default is 1.

trigger_start_time

bigint

The start date for scheduling.

trigger_end_time

bigint

The end date for scheduling.

runtime_resource_group_id

bigint

The ID of the resource group to which the node belongs.

runtime_cu

string

The compute units (CUs).

baseline_id

bigint

The ID of the baseline to which the node belongs.

rerun_times

bigint

The maximum number of times the task can be rerun.

rerun_interval

bigint

The rerun interval, in milliseconds.

rerun_mode_type

string

Specifies when the task can be rerun. Valid values: AllAllowed (can be rerun on success or failure), FailureAllowed (can be rerun only on failure), and AllDenied (cannot be rerun).

tags

string

Tags associated with the asset.

tags_count

bigint

The number of tags associated with the asset.

input_table_count

bigint

The number of input tables.

output_table_count

bigint

The number of output tables.

input_table_detail

string

Details of the input tables.

output_table_detail

string

Details of the output tables.

upstream_node_count

bigint

The number of upstream nodes.

downstream_node_count

bigint

The number of downstream nodes.

governance_rule_finding_count

bigint

The number of issues identified by governance rules.

governance_rule_finding_history_count

string

Historical count of governance issues for the asset.

governance_health_score

string

The health score of the asset.

governance_health_level

string

The health level of the asset, based on its score.

engine_datasource_id

string

The ID of the compute engine.

engine_instance_count

bigint

The number of compute engine instances.

engine_instance_run_time

bigint

The total runtime of compute engine instances.

engine_instance_comput_volume_cost

string

The volume of computation.

engine_instance_cu_cost

string

The compute units (CUs) consumed.

engine_instance_cpu_cost

string

The CPU consumption.

engine_instance_mem_cost

string

The memory consumption.

engine_instance_exist_data_skew

bigint

Indicates whether data skew exists.

engine_instance_suggestions

string

Suggestions for addressing data skew.

engine_instance_data_skew_ids

string

The IDs of instances with data skew.

engine_instance_ids

string

The IDs of the engine instances.

task_instance_wait_time_cost_sum

bigint

Total instance wait time, in milliseconds.

task_instance_wait_time_cost_max

bigint

Maximum instance wait time, in milliseconds.

task_instance_run_time_cost_sum

bigint

Total instance runtime, in milliseconds.

task_instance_run_time_cost_max

bigint

Maximum runtime for a single instance, in milliseconds.

task_instance_7_days_wait_time_cost_max

bigint

Maximum instance wait time over the last seven days, in milliseconds.

task_instance_7_days_run_time_cost_max

bigint

Maximum instance runtime over the last seven days, in milliseconds.

task_instance_count

bigint

The number of instances.

task_instance_7_days_failed_count

bigint

The number of failed instances over the last seven days.

task_instance_7_days_failed_day_count

bigint

The count of days with failures over the past seven days.

task_instance_7_days_frezeed_day_count

bigint

The count of days the task was frozen over the past seven days.

task_instance_7_days_dry_run_day_count

bigint

The number of days on which the task ran as a dry run (was skipped) over the past seven days.

quality_monitor_count

bigint

The number of data quality monitoring metrics.

quality_monitor_7_days_failed_count

bigint

The number of failed data quality monitoring metrics over the past seven days.

di_task_resource_group_id

string

The ID of the data integration resource group to which the node belongs.

di_task_is_public_network

bigint

Indicates whether the data integration task uses public network traffic.

di_task_concurrency

bigint

The concurrency level for the data integration task.

di_task_total_records

bigint

The total number of synchronized records.

di_task_total_bytes

bigint

The total volume of synchronized data, in bytes.

di_task_source_type

string

The type of the data source.

di_task_target_type

string

The type of the data target.

di_task_run_time_cost

bigint

Runtime of the data integration task, in milliseconds.

di_task_wait_time_cost

bigint

Wait time of the data integration task, in milliseconds.

dt

string

The date partition for the record, in YYYYMMDD format.
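The runtime fields in this table are totals in milliseconds, so a per-instance average has to be derived. The following Python sketch divides task_instance_run_time_cost_sum by task_instance_count and converts the result to seconds. It assumes that both fields cover the same set of instances, which this topic does not state explicitly; the function name average_run_seconds is hypothetical.

```python
from typing import Optional


def average_run_seconds(profile: dict) -> Optional[float]:
    """Return the mean instance runtime in seconds, or None if there are no instances."""
    count = profile.get("task_instance_count") or 0
    if count == 0:
        return None  # avoid division by zero for tasks that have not run
    total_ms = profile.get("task_instance_run_time_cost_sum") or 0
    return total_ms / count / 1000  # milliseconds -> seconds


# Hypothetical record shaped like an asset_task_profiles row.
print(average_run_seconds(
    {"task_instance_count": 3, "task_instance_run_time_cost_sum": 90000}
))
```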

Data catalogs (catalogs)

Parameter

Type

Description

datasource_type

string

The data source type, such as DLF and StarRocks.

datasource_id

string

The data source ID, such as a StarRocks cluster ID or the main account ID for DLF.

name

string

The data catalog name.

type

string

The data catalog type, such as Hive or JDBC.

comment

string

The data catalog comment.

location

string

The location of the data catalog.

properties

string

Configuration properties, specified as a JSON string.

owner

string

The owner of the data catalog. This value can be an Alibaba Cloud account UID or a database account, depending on the data source type.

create_timestamp

bigint

The time the data catalog was created, represented as a 13-digit timestamp (milliseconds).

update_timestamp

bigint

The time the data catalog was last updated, represented as a 13-digit timestamp (milliseconds).

meta_entity_id

string

A unique, API-friendly identifier for the data catalog that complies with the metadata entity ID specification.

dt

string

The date partition, a logical partition field, in YYYYMMDD format. Valid values: [TODAY-31D, TODAY-1D].
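The [TODAY-31D, TODAY-1D] window for dt can be computed on the client side before a query is issued. The following Python sketch is illustrative only. The function names valid_dt_range and is_queryable are hypothetical and not part of DataWorks, and the sketch assumes that TODAY is the current calendar date in the time zone that applies to your tenant.

```python
from datetime import date, timedelta
from typing import Tuple


def valid_dt_range(today: date) -> Tuple[str, str]:
    """Return the (oldest, newest) dt values of the [TODAY-31D, TODAY-1D] window."""
    oldest = today - timedelta(days=31)
    newest = today - timedelta(days=1)
    return oldest.strftime("%Y%m%d"), newest.strftime("%Y%m%d")


def is_queryable(dt: str, today: date) -> bool:
    """Check whether a YYYYMMDD partition value falls inside the window."""
    oldest, newest = valid_dt_range(today)
    # Fixed-width YYYYMMDD strings sort in date order, so string comparison works.
    return len(dt) == 8 and dt.isdigit() and oldest <= dt <= newest


print(valid_dt_range(date.today()))
```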

Columns

Parameter

Type

Description

datasource_type

string

The type of the data source, such as DLF and StarRocks.

datasource_id

string

The ID of the data source, such as a StarRocks cluster ID, the main account ID for DLF or MaxCompute, or an RDS instance ID.

catalog_name

string

The name of the data catalog. This field is populated only if the data source type supports data catalogs.

database_name

string

The name of the database.

schema_name

string

The name of the schema. This field is populated only if the data source type supports schemas.

table_name

string

The name of the table.

name

string

The name of the column.

type

string

The data type of the column.

comment

string

The comment for the column.

ordinal_position

bigint

The 1-based ordinal position of the column in the table.

is_primary_key

boolean

Indicates whether the column is part of the primary key.

is_nullable

boolean

Indicates whether the column allows NULL values.

is_partition_key

boolean

Indicates whether the column is a partition key.

properties

string

A JSON string of properties and parameters.

business_description

string

The business description of the column.

meta_entity_id

string

The unique identifier for the column. It is API-friendly and complies with the metadata entity ID specification.

dt

string

The date partition (a logical partition column) in YYYYMMDD format. Valid range: [TODAY-31D, TODAY-1D].

Databases

Parameter

Type

Description

datasource_type

string

The type of the data source. Examples include dlf, starrocks, maxcompute, holodb, and mysql.

datasource_id

string

The data source ID. For example, a StarRocks cluster ID, the primary account ID for Data Lake Formation (DLF) or MaxCompute, or an RDS instance ID.

catalog_name

string

The catalog name. This field is populated only if the data source type supports catalogs.

name

string

The database name.

type

string

The database type.

comment

string

The database comment.

location

string

The database path.

properties

string

Properties and parameters (JSON string).

owner

string

The owner of the database. The value is an Alibaba Cloud account UID or a database system account, depending on the data source type.

is_external

boolean

Indicates whether the database is an external database.

create_timestamp

bigint

A 13-digit timestamp indicating the creation time.

update_timestamp

bigint

A 13-digit timestamp indicating the last update time.

meta_entity_id

string

The unique identifier for the database. This ID conforms to the metadata entity ID specification.

dt

string

The date partition (a logical partition field) in YYYYMMDD format. Valid values: [TODAY-31D, TODAY-1D].

Table and column-level data lineage (lineages)

Parameter

Type

Description

source_meta_entity_id

string

The unique identifier for the source. This identifier is API-friendly and conforms to the metadata entity ID specification.

source_raw_entity_type

string

The entity type of the source. This field is used for identification if the entity's metadata is unmanaged and the source_meta_entity_type field is empty.

source_uuid

string

A unique, UI-friendly identifier for the source used for page access.

target_meta_entity_id

string

The unique identifier for the target. This identifier is API-friendly and conforms to the metadata entity ID specification.

target_raw_entity_type

string

The entity type of the target. This field is used for identification if the entity's metadata is unmanaged and the target_meta_entity_type field is empty.

target_uuid

string

A unique, UI-friendly identifier for the target used for page access.

compute_engine

string

The compute engine, such as MaxCompute, DataX, or Hologres.

transform_type

string

The type of transformation task performed by the compute engine. Examples: SQL, DATAX, DATAX_STREAM, EXTERNAL_TABLE_MAPPING, STORAGE_MAPPING, and API_MAPPING.

task_id

bigint

The ID of the DataWorks scheduled task. Refer to the tasks table. This field is empty if a DataWorks scheduled task did not generate the data lineage.

task_instance_id

bigint

The ID of the DataWorks scheduled task instance. Refer to the task_instances table. This field is empty if a DataWorks scheduled task did not generate the data lineage.

lineage_time

bigint

The timestamp, in milliseconds, when the data lineage was generated.

granularity

string

The level of the data lineage, such as TABLE and COLUMN.

dt

string

The date partition (a logical partition field), in YYYYMMDD format. Value range: [TODAY-31D, TODAY-1D].
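Each row of the lineages table describes one edge from a source entity to a target entity, so downstream impact can be found by walking those edges. The following Python sketch builds a graph from rows whose granularity is TABLE and collects everything reachable from a starting entity. The entity IDs in the sample are placeholders and the function name downstream_tables is hypothetical.

```python
from collections import defaultdict, deque


def downstream_tables(rows, start_id):
    """Return all entities reachable from start_id through table-level lineage."""
    graph = defaultdict(set)
    for row in rows:
        if row["granularity"] == "TABLE":  # ignore column-level edges
            graph[row["source_meta_entity_id"]].add(row["target_meta_entity_id"])

    seen = set()
    queue = deque([start_id])
    while queue:  # breadth-first traversal; the seen set guards against cycles
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen and nxt != start_id:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Placeholder IDs; real values follow the metadata entity ID specification.
sample_lineage = [
    {"source_meta_entity_id": "a", "target_meta_entity_id": "b", "granularity": "TABLE"},
    {"source_meta_entity_id": "b", "target_meta_entity_id": "c", "granularity": "TABLE"},
    {"source_meta_entity_id": "c", "target_meta_entity_id": "a", "granularity": "TABLE"},
    {"source_meta_entity_id": "a.x", "target_meta_entity_id": "d.y", "granularity": "COLUMN"},
]
print(sorted(downstream_tables(sample_lineage, "a")))
```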

Partitions

Parameter

Type

Description

datasource_type

string

The data source type, such as MaxCompute, DLF, and StarRocks.

datasource_id

string

The data source ID, such as a StarRocks cluster ID, the main account ID for DLF or MaxCompute, or an RDS instance ID.

catalog_name

string

The name of the data catalog. This field is populated when the data source type supports data catalogs.

database_name

string

The name of the database.

schema_name

string

The name of the schema. This field is populated when the data source type supports schemas.

table_name

string

The name of the table.

name

string

The partition name (partition specification).

create_timestamp

bigint

The 13-digit creation timestamp.

update_timestamp

bigint

The 13-digit update timestamp.

content_size

bigint

The partition size, in bytes.

properties

string

A JSON string of properties and parameters.

dt

string

The date partition (a logical partition field) in YYYYMMDD format. The valid value range is [TODAY-31D, TODAY-1D].

Resource groups

Parameter

Type

Description

resource_group_id

bigint

The ID of the resource group.

resource_group_identifier

string

The identifier of the resource group.

resource_group_type

bigint

The type of the resource group. Valid values: 1 (scheduling resource group), 2 (MaxCompute resource group), and 4 (data integration resource group).

resource_group_mode

bigint

The mode of the resource group. Valid values: 1 (prepaid), 2 (pay-as-you-go), and 3 (developer edition, MaxCompute only).

resource_group_status

bigint

The status of the resource group. Valid values: 0 (Normal), 1 (Frozen), 2 (Deleted), 3 (Creating), 4 (Creation Failed), 5 (Updating), 6 (Update Failed), 7 (Deleting), and 8 (Deletion Failed).

is_exclusive_resource_group

boolean

Specifies whether this is an exclusive resource group.

dt

string

The date partition, a logical partition field. Format: YYYYMMDD. Value range: [TODAY-31D, TODAY-1D].
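The resource group type, mode, and status are stored as numeric codes. The following Python sketch maps those codes to the labels listed in the table above, which may be convenient in reports. The dictionaries copy the values from this topic; the function name describe_resource_group and the sample row are hypothetical.

```python
RESOURCE_GROUP_TYPE = {
    1: "scheduling resource group",
    2: "MaxCompute resource group",
    4: "data integration resource group",
}
RESOURCE_GROUP_MODE = {
    1: "prepaid",
    2: "pay-as-you-go",
    3: "developer edition (MaxCompute only)",
}
RESOURCE_GROUP_STATUS = {
    0: "Normal", 1: "Frozen", 2: "Deleted", 3: "Creating", 4: "Creation Failed",
    5: "Updating", 6: "Update Failed", 7: "Deleting", 8: "Deletion Failed",
}


def describe_resource_group(row: dict) -> dict:
    """Replace numeric codes with labels; unlisted codes become 'unknown'."""
    return {
        "resource_group_id": row["resource_group_id"],
        "type": RESOURCE_GROUP_TYPE.get(row["resource_group_type"], "unknown"),
        "mode": RESOURCE_GROUP_MODE.get(row["resource_group_mode"], "unknown"),
        "status": RESOURCE_GROUP_STATUS.get(row["resource_group_status"], "unknown"),
    }


# Hypothetical row for illustration.
print(describe_resource_group({
    "resource_group_id": 1001,
    "resource_group_type": 4,
    "resource_group_mode": 2,
    "resource_group_status": 0,
}))
```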

Schemas

Parameter

Type

Description

datasource_type

string

The data source type. Examples: holodb, MaxCompute, and PostgreSQL.

datasource_id

string

The data source ID, such as an RDS instance ID or the account ID for MaxCompute.

catalog_name

string

The data catalog name. This field is populated only if the data source type supports data catalogs.

database_name

string

The database name.

name

string

The schema name.

type

string

The schema type.

comment

string

A comment about the schema.

properties

string

Properties and parameters, in JSON string format.

owner

string

The schema owner. The value can be an Alibaba Cloud account UID or a database account, depending on the data source type.

create_timestamp

bigint

The creation time, represented as a 13-digit UNIX timestamp.

update_timestamp

bigint

The last update time, represented as a 13-digit UNIX timestamp.

meta_entity_id

string

A unique identifier for the schema, which is API-friendly and compliant with the metadata entity ID specification.

dt

string

The date partition (a logical partition field) in YYYYMMDD format. The value range is [TODAY-31D, TODAY-1D].

Tables

Parameter

Type

Description

datasource_type

string

The data source type, such as Data Lake Formation, StarRocks, MaxCompute, Hologres, or MySQL.

datasource_id

string

The data source ID. This value is the cluster ID for a StarRocks cluster, the main account ID for Data Lake Formation or MaxCompute, or the instance ID for an RDS instance.

catalog_name

string

The name of the data catalog. This field applies only to data source types that support data catalogs.

database_name

string

The name of the database.

schema_name

string

The name of the schema. This field applies only to data source types that support schemas.

name

string

The name of the table.

type

string

The type of the table.

comment

string

The comment for the table.

partition_keys

string

The partition keys. For multi-level partitioning, keys are separated by commas (,).

location

string

The storage path for the table.

properties

string

A JSON string of properties and parameters. For a view, this field contains the view's DDL definition.

owner

string

The table owner. The value can be an Alibaba Cloud account ID or a database system account, depending on the data source type.

content_size

bigint

The storage size, in bytes.

data_retention

map<string,string>

The data retention period or lifecycle. The value varies by table type. For MaxCompute tables, the key is lifecycle and the value is the table's lifecycle, such as 365. For Data Lake Formation tables, the key is retention and the value is the table's lifecycle, such as 91. This field is not supported for other table types.

is_compressed

boolean

Indicates whether the table is compressed.

is_temporary

boolean

Indicates whether the table is a temporary table.

entity_type

string

The type of the entity, such as table, view, or materialized_view.

input_format

string

The input format.

output_format

string

The output format.

serde_parameters

string

The SerDe parameters.

serialization_lib

string

The serialization library.

create_timestamp

bigint

A 13-digit UNIX timestamp indicating when the table was created.

meta_modified_timestamp

bigint

A 13-digit UNIX timestamp indicating when the table metadata was last modified.

data_modified_timestamp

bigint

A 13-digit UNIX timestamp indicating when the table data was last modified.

last_access_timestamp

bigint

A 13-digit UNIX timestamp indicating when the table was last accessed.

business_description

string

The business description or Chinese name.

meta_entity_id

string

The unique identifier for the table. This ID is designed for API use and conforms to the metadata entity ID specification.

Examples:

  • maxcompute-table: [main account ID]::[project_name]:[schema_name]:[table_name]

  • holo-table: [Hologres instance ID]::[sample_database]:[public_schema]:[table_name]

  • starrocks-table: [cluster ID]:[default_catalog]:[sample_database]::[sample_table]

uuid

string

The UUID of the table, used to link to the table details page in the DataWorks data map.

business_tags

array<string>

Business tags. This field contains tags set on the data map page.

wikis

array<struct<version:bigint,operator:string,update_timestamp:bigint,content:string>>

The table wiki. The struct contains the following fields: version (the version number), operator (the user who submitted the entry), update_timestamp (a 13-digit UNIX timestamp indicating when the entry was updated), and content (the content).

producing_tasks

array<bigint>

A list of scheduling task IDs that produce data for this table. For more information, see the tasks table.

dt

string

The date partition (a logical partition field) in YYYYMMDD format. Valid values: [TODAY-31D, TODAY-1D].
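The three meta_entity_id examples above share a colon-separated layout in which an unused level, such as the catalog for MaxCompute or the schema for StarRocks, is left empty. The following Python sketch splits a table ID into its parts. It rests on two assumptions inferred from those examples and not confirmed by this topic: the entity type prefix (for example, maxcompute-table) is part of the ID, and no name contains a colon. The class and function names and the sample IDs are hypothetical.

```python
from typing import NamedTuple


class TableEntityId(NamedTuple):
    entity_type: str  # for example, maxcompute-table
    instance: str     # account ID, Hologres instance ID, or cluster ID
    catalog: str      # empty when the data source has no catalog level
    database: str     # the project name for MaxCompute
    schema: str       # empty when the data source has no schema level
    table: str


def parse_table_entity_id(meta_entity_id: str) -> TableEntityId:
    """Split a table-level meta_entity_id into its six assumed segments."""
    parts = meta_entity_id.split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 colon-separated segments, got {len(parts)}")
    return TableEntityId(*parts)


# Sample IDs are made up for illustration.
print(parse_table_entity_id("maxcompute-table:123456::my_project:default:orders"))
```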

Task and workflow run instances (task_instances)

Parameter

Type

Description

id

bigint

The task instance ID.

node_id

bigint

The task ID. References the tasks table.

node_type

bigint

The task type. For a list of node code values, see Node Development.

node_name

string

The name of the task.

description

string

The description of the task.

workflow_id

bigint

The ID of the workflow. References the workflows table.

workflow_name

string

The name of the workflow.

workflow_instance_id

bigint

The ID of the workflow instance.

workflow_instance_type

bigint

The type of the workflow instance. Valid values: 0 (daily scheduling), 1 (manual task), 2 (smoke test), 3 (backfill), 4 (one-time workflow), 5 (manual workflow).

trigger_type

string

The trigger type. Valid values: Scheduler and Manual.

trigger_recurrence

string

The run mode. Valid values: 0 (normal), 1 (manual), 2 (paused), 3 (dry run), 4 (referenced).

timeout

bigint

The task execution timeout, in hours.

rerun_mode

string

The rerun configuration. Valid values: 0 (rerunnable on failure), 1 (rerunnable on failure or success), 2 (not rerunnable).

run_number

bigint

The number of runs.

period_number

bigint

The period number.

baseline_id

bigint

The ID of the baseline.

priority

bigint

The task priority, ranging from 1 (lowest) to 8 (highest).

script_parameters

string

A list of script parameters for the run.

runtime_resource_group_id

bigint

The resource group ID for the task run.

runtime_resource_group_identifier

string

The resource group identifier for the task run.

runtime_image

string

The runtime image ID.

runtime_cu

string

CUs consumed at runtime.

runtime_process_id

string

The process ID at runtime.

runtime_gateway

string

The gateway used at runtime.

datasource_name

string

The name of the data source.

inputs_variables

array<struct<name:string,type:string,value:string>>

A list of input variables.

outputs

array<struct<output:string,type:string>>

A list of output identifiers.

outputs_variables

array<struct<name:string,type:string,value:string>>

A list of output variables.

tags

array<struct<key:string,value:string>>

A list of task tags.

status

bigint

The task status. Valid values: 1 (not run), 2 (waiting for schedule), 3 (waiting for resources), 4 (running), 5 (failed), 6 (succeeded), 7 (verifying), 8 (pending condition), 9 (waiting for a trigger).

trigger_time

string

The time the task was triggered.

bizdate

string

The business date.

started_time

string

The time the task started.

finished_time

string

The time the task finished.

project_id

bigint

The project ID. References the workspace_id field in the workspaces table.

project_env

string

The environment type. Valid values: PROD (production) and DEV (development).

owner

string

The owner's account ID. References the users table.

create_time

string

The creation time.

modify_time

string

The last modification time.

create_user

string

The creator's user ID. References the users table.

modify_user

string

The last modifier's user ID. References the users table.

waiting_resource_time

string

The time spent waiting for resources.

waiting_trigger_time

string

The time spent waiting for a trigger.

dt

string

The logical date partition, in YYYYMMDD format. Value range: [TODAY-31D, TODAY-1D].
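The status field of task_instances holds a numeric code. The following Python sketch translates the codes into the labels listed above, counts instances per status, and picks out failed instances (status 5). The mapping copies the values from this topic; the function names and sample rows are hypothetical.

```python
from collections import Counter

TASK_INSTANCE_STATUS = {
    1: "not run",
    2: "waiting for schedule",
    3: "waiting for resources",
    4: "running",
    5: "failed",
    6: "succeeded",
    7: "verifying",
    8: "pending condition",
    9: "waiting for a trigger",
}


def status_summary(rows):
    """Count instances per status label; unlisted codes are counted as 'unknown'."""
    return Counter(TASK_INSTANCE_STATUS.get(r["status"], "unknown") for r in rows)


def failed_instance_ids(rows):
    """Return the IDs of instances whose status code is 5 (failed)."""
    return [r["id"] for r in rows if r["status"] == 5]


# Hypothetical rows for illustration.
sample_instances = [
    {"id": 11, "status": 6},
    {"id": 12, "status": 5},
    {"id": 13, "status": 6},
    {"id": 14, "status": 42},
]
print(status_summary(sample_instances), failed_instance_ids(sample_instances))
```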

Task and workflow definitions (tasks)

Parameter

Type

Description

id

bigint

The task ID.

name

string

The task name.

description

string

The task description.

type

bigint

The task type. For a list of node code values, see Node Development.

workflow_id

bigint

The workflow ID.

instance_mode

string

The instance generation mode.

  • T+1: The instance is generated the next day.

  • Immediately: The instance is generated immediately.

baseline_id

bigint

The baseline ID.

priority

bigint

The task priority, ranging from 1 (lowest) to 8 (highest). A higher value indicates a higher priority. The default is 1.

timeout

bigint

The task execution timeout, in hours.

rerun_mode

bigint

The rerun policy for the task. Valid values: 0 (Rerun only on failure), 1 (Rerun on failure or success), and 2 (Never rerun).

rerun_times

bigint

The number of rerun attempts. This setting applies only when the task is configured to allow reruns.

rerun_interval

bigint

The interval between rerun attempts, in seconds.

script_parameters

string

The script parameters for the runtime.

trigger_type

string

The trigger type. Valid values: Scheduler (schedule-based trigger) and Manual (manually triggered).

trigger_recurrence

bigint

The run mode when the task is triggered. Valid values: 0 (Normal run), 1 (Manual task), 2 (Paused), 3 (Dry run), and 4 (Referenced task).

trigger_cron

string

The Cron expression. Applies when trigger_type is Scheduler.

trigger_start_time

string

The start time for the scheduled trigger. Applies when trigger_type is Scheduler.

trigger_end_time

string

The expiration time for the scheduled trigger. Applies when trigger_type is Scheduler.

runtime_resource_group_id

bigint

The ID of the resource group for the task runtime.

runtime_image

string

The image ID for the task runtime.

runtime_cu

string

The CU consumption for the task runtime.

datasource_name

string

The data source name.

inputs_variables

array<struct<name:string,type:string,value:string>>

The input variables.

outputs

array<struct<output:string,type:string>>

The task output identifiers.

outputs_variables

array<struct<name:string,type:string,value:string>>

The output variables.

dependencies

array<struct<type:string,upstream_output:string,upstream_node_id:bigint>>

The dependencies.

related_workflow_id

bigint

The ID of the related workflow.

tags

array<struct<key:string,value:string>>

The task tags.

project_id

bigint

The project ID. See the workspace_id field in the workspaces table.

project_env

string

The environment type. Valid values: PROD (production) and DEV (development).

owner

string

The account ID of the task owner. See the users table.

create_time

string

The creation time.

modify_time

string

The last modification time.

create_user

string

The account ID of the user who created the task. See the users table.

modify_user

string

The account ID of the user who last modified the task. See the users table.

dt

string

The date partition in YYYYMMDD format. Valid range: [TODAY-31D, TODAY-1D].
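The dependencies field is an array of structs, each of which carries an upstream_node_id. The following Python sketch collects the upstream task IDs for every task and lists the tasks that have no upstream at all. It assumes that rows have already been loaded into Python dictionaries, with the array represented as a list of dictionaries; the function names and sample rows are hypothetical.

```python
def upstream_ids_by_task(tasks):
    """Map each task ID to the sorted list of its distinct upstream task IDs."""
    result = {}
    for task in tasks:
        deps = task.get("dependencies") or []  # the column may be NULL or empty
        ids = {
            d["upstream_node_id"]
            for d in deps
            if d.get("upstream_node_id") is not None
        }
        result[task["id"]] = sorted(ids)
    return result


def root_tasks(tasks):
    """Return the IDs of tasks that declare no upstream task."""
    return sorted(tid for tid, ups in upstream_ids_by_task(tasks).items() if not ups)


# Hypothetical rows; the type and upstream_output values are placeholders.
sample_tasks = [
    {"id": 1, "dependencies": None},
    {"id": 2, "dependencies": [
        {"type": "Normal", "upstream_output": "out_1", "upstream_node_id": 1},
    ]},
    {"id": 3, "dependencies": [
        {"type": "Normal", "upstream_output": "out_1", "upstream_node_id": 1},
        {"type": "Normal", "upstream_output": "out_2", "upstream_node_id": 2},
        {"type": "Normal", "upstream_output": "out_1b", "upstream_node_id": 1},
    ]},
]
print(upstream_ids_by_task(sample_tasks), root_tasks(sample_tasks))
```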

Users

Parameter

Type

Description

user_id

string

The unique identifier for the user.

user_nick

string

The user's account alias or display name.

dt

string

The logical partition field, representing the date partition in YYYYMMDD format. Valid values: [TODAY-31D, TODAY-1D].

Workspace members

Parameter

Type

Description

workspace_id

bigint

The workspace ID. See the workspaces table.

user_id

string

The user ID. See the users table.

user_status

bigint

The user status. Valid values: 0 (Normal), 1 (Disabled), and 2 (Deleted).

gmt_create_ts

bigint

The creation time, a 13-digit timestamp.

gmt_modified_ts

bigint

The modification time, a 13-digit timestamp.

dt

string

The date partition (a logical partition) in YYYYMMDD format. Value range: [TODAY-31D, TODAY-1D].
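The gmt_create_ts and gmt_modified_ts fields and the user_status codes above usually need decoding before they are shown to a reader. The following is a minimal sketch, assuming that a 13-digit timestamp means milliseconds since the Unix epoch. The names ms_to_datetime and USER_STATUS are hypothetical.

```python
from datetime import datetime, timezone

# Status codes as listed for the user_status field.
USER_STATUS = {0: "Normal", 1: "Disabled", 2: "Deleted"}


def ms_to_datetime(ts_ms: int) -> datetime:
    """Convert a 13-digit timestamp to a timezone-aware UTC datetime.

    Assumption: the value is milliseconds since 1970-01-01T00:00:00Z.
    """
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)


def describe_user_status(code: int) -> str:
    """Map a user_status code to its label, keeping unknown codes visible."""
    return USER_STATUS.get(code, f"Unknown({code})")
```

Returning a UTC datetime keeps the conversion independent of the time zone of the machine that runs the query.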

Workspaces

Parameter

Type

Description

workspace_id

bigint

The workspace ID.

workspace_name

string

The workspace name.

workspace_identifier

string

The workspace identifier.

workspace_description

string

The workspace description.

workspace_owner

string

The workspace owner ID. See the users table.

workspace_status

bigint

The workspace status. Valid values: 0 (Normal), 1 (Deleted), 2 (Initializing), 3 (Initialization Failed), 4 (Manually Disabled), 5 (Deleting), 6 (Deletion Failed), and 7 (Frozen due to Overdue Payment).

dt

string

The date partition (a logical partition field). Format: YYYYMMDD. Value range: [TODAY-31D, TODAY-1D].
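The workspace_status codes above can be modeled as an enumeration so that downstream code does not depend on bare integers. This is only a sketch: the member names are hypothetical labels derived from the descriptions in this table, and describe_status is an illustrative helper.

```python
from enum import IntEnum


class WorkspaceStatus(IntEnum):
    """workspace_status codes as listed in the workspaces table."""
    NORMAL = 0
    DELETED = 1
    INITIALIZING = 2
    INITIALIZATION_FAILED = 3
    MANUALLY_DISABLED = 4
    DELETING = 5
    DELETION_FAILED = 6
    FROZEN_OVERDUE_PAYMENT = 7


def describe_status(code: int) -> str:
    """Return the enum member name, or a placeholder for an undocumented code."""
    try:
        return WorkspaceStatus(code).name
    except ValueError:
        return f"UNKNOWN({code})"
```

Falling back to a placeholder instead of raising keeps a report usable if a new status code appears, because the schema of these tables is dynamic.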

Data quality rule instances (quality_rule_results)

Partition field: dt

Description: Details of data quality rule instances.

Parameter

Type

Description

id

bigint

The primary key.

scan_run_id

bigint

The ID of the quality monitoring instance.

rule_id

bigint

The ID of the rule.

rule_name

string

The name of the rule.

status

string

The validation result of the rule. Possible values: Pass, Error, Warn, Fail, or Running.

severity

string

The strength of the rule. Possible values: High (strong rule) or Normal (weak rule).

create_time

bigint

The creation time of the instance.

modify_time

bigint

The last modification time of the instance.

spec

string

The specification of the rule instance.

tags

array<string>

The tags for the rule instance.

tenant_id

bigint

The ID of the DataWorks tenant.

project_id

bigint

The ID of the DataWorks workspace.

meta_entity_id

string

The unique identifier of the table's metadata entity.

dt

string

The date partition, in YYYYMMDD format. Value range: [TODAY-D, TODAY-1D].

Data quality rule metrics (quality_rules)

Partition field: dt

Description: Detailed metrics for each data quality rule.

Parameter

Type

Description

id

bigint

The primary key.

scan_id

bigint

The ID of the quality monitoring instance.

rule_name

string

The name of the rule.

enabled

boolean

Indicates whether the rule is enabled.

severity

string

The strength of the rule. Possible values: High (strong rule) and Normal (weak rule).

create_time

bigint

The time when the rule was created.

modify_time

bigint

The time when the rule was last modified.

spec

string

The specification of the rule.

tags

array<string>

The rule's tags.

tenant_id

bigint

The ID of the DataWorks tenant.

project_id

bigint

The ID of the DataWorks workspace.

meta_entity_id

string

The unique identifier of the meta entity in the data map.

pass_count

int

The number of times the rule passed.

warn_count

int

The number of times the rule triggered a warning.

error_count

int

The number of times the rule triggered an error.

fail_count

int

The number of times the rule failed.

dt

string

The date partition, in YYYYMMDD format. The value range is [TODAY-D, TODAY-1D].
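The pass_count, warn_count, error_count, and fail_count fields above can be combined into a pass rate for each rule. The following is a minimal sketch, assuming that the four counters are mutually exclusive and together cover every completed check of the rule in the partition. The source does not state this, and rule_pass_rate is a hypothetical helper.

```python
from typing import Optional


def rule_pass_rate(pass_count: int, warn_count: int,
                   error_count: int, fail_count: int) -> Optional[float]:
    """Return the share of checks that passed, or None if nothing ran.

    Assumption: the four counters do not overlap.
    """
    total = pass_count + warn_count + error_count + fail_count
    if total == 0:
        return None  # avoid dividing by zero for rules with no runs
    return pass_count / total
```

Returning None rather than 0.0 for a rule without runs keeps "never checked" distinct from "always failed".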

Data quality scan runs (quality_scan_runs)

Partition field: dt

Description: Stores information about each data quality scan run.

Parameter

Type

Description

id

bigint

The primary key.

scan_id

bigint

The data quality scan ID.

name

string

The name of the scan.

status

string

The status of the scan run. Valid values: Pass, Warn, Error, Fail, and Running.

post_action_type

string

The post-check action. Valid values: Alert and BlockTaskInstance.

data_filter

string

The data range used for sampling.

trigger_time

bigint

The scheduled time of the task.

trigger_type

string

The trigger type of the data quality scan. Valid values: ByManual, BySchedule, and ByQualityNode.

create_time

bigint

The creation time of the scan run.

modify_time

bigint

The time when the scan run was last modified.

datasource_id

bigint

The ID of the data source to which the table belongs.

datasource_type

string

The type of the data source.

computing_resource_id

bigint

The ID of the compute resource.

compute_resource_option

string

The compute resource used for the scan run.

spec

string

The data quality scan specification.

tenant_id

bigint

The ID of the DataWorks tenant.

project_id

bigint

The ID of the DataWorks workspace.

owner

string

The owner of the data quality scan.

task_id

bigint

The ID of the scheduling task.

task_instance_id

bigint

The ID of the scheduling task instance.

meta_entity_id

string

The unique identifier of the metadata entity.

table_name

string

The name of the table.

catalog_name

string

The name of the table's data catalog.

schema_name

string

The name of the table's schema.

database_name

string

The name of the table's database.

cluster_id

string

The ID of the table's cluster.

dt

string

The date partition, in YYYYMMDD format. The value range is [TODAY-D, TODAY-1D].

Data quality scan metrics (quality_scans)

Partition field: dt

Description: Detailed metrics for data quality monitoring tasks.

Parameter

Type

Description

id

bigint

Unique identifier for the quality scan.

name

string

Name of the quality scan.

data_filter_type

string

Type of the data filter. Valid values: ByPartition and ByWhere.

data_filter

string

Data filter expression.

trigger_type

string

Trigger type for data quality monitoring. Valid values: ByManual, BySchedule, and ByQualityNode.

create_time

bigint

Time when the quality scan was created.

modify_time

bigint

Time when the quality scan was last updated.

computing_resource_id

bigint

ID of the compute engine.

compute_resource_option

string

Compute resource for the data quality monitoring task.

spec

string

Specification for the data quality monitoring.

related_tasks

array<bigint>

Associated scheduling tasks.

tenant_id

bigint

ID of the DataWorks tenant.

project_id

bigint

ID of the DataWorks workspace.

owner

string

Owner of the quality scan.

datasource_id

string

ID of the table's data source.

datasource_type

string

Type of the data source.

meta_entity_id

string

Unique identifier for the meta-entity in the data catalog.

table_name

string

Name of the table.

catalog_name

string

Name of the table's data catalog.

schema_name

string

Name of the table's schema.

database_name

string

Name of the table's database.

cluster_id

string

ID of the table's cluster.

related_scheduler_task_count

int

Number of associated scheduling tasks.

rule_count

int

Number of associated rules.

high_severity_rule_count

int

Number of associated high-severity rules.

normal_severity_rule_count

int

Number of associated normal-severity rules.

enabled_rule_count

int

Number of enabled rules.

enabled_high_severity_rule_count

int

Number of enabled high-severity rules.

enabled_normal_severity_rule_count

int

Number of enabled normal-severity rules.

rule_instance_count

int

Number of rule instances run today.

high_severity_rule_instance_count

int

Number of high-severity rule instances run today.

normal_severity_rule_instance_count

int

Number of normal-severity rule instances run today.

high_severity_rule_instance_pass_count

int

Number of high-severity rule instances that passed today.

high_severity_rule_instance_warn_count

int

Number of high-severity rule instances with warnings (orange alerts) today.

high_severity_rule_instance_error_count

int

Number of high-severity rule instances with errors (red alerts) today.

high_severity_rule_instance_fail_count

int

Number of high-severity rule instances that failed today.

normal_severity_rule_instance_pass_count

int

Number of normal-severity rule instances that passed today.

normal_severity_rule_instance_warn_count

int

Number of normal-severity rule instances with warnings (orange alerts) today.

normal_severity_rule_instance_error_count

int

Number of normal-severity rule instances with errors (red alerts) today.

normal_severity_rule_instance_fail_count

int

Number of normal-severity rule instances that failed today.

block_task_instance_count

int

Number of scheduling tasks blocked today.

alert_rule_count

int

Number of configured alert subscriptions.

sms_alert_rule_count

int

Number of configured SMS alert subscriptions.

mail_alert_rule_count

int

Number of configured email alert subscriptions.

phone_alert_rule_count

int

Number of configured phone alert subscriptions.

ding_alert_rule_count

int

Number of configured DingTalk alert subscriptions.

feishu_alert_rule_count

int

Number of configured Lark alert subscriptions.

weixin_alert_rule_count

int

Number of configured WeChat alert subscriptions.

webhook_alert_rule_count

int

Number of configured custom webhook alert subscriptions.

alert_times

int

Number of alerts triggered today.

sms_alert_times

int

Number of SMS alerts triggered today.

mail_alert_times

int

Number of email alerts triggered today.

phone_alert_times

int

Number of phone alerts triggered today.

ding_alert_times

int

Number of DingTalk alerts triggered today.

feishu_alert_times

int

Number of Lark alerts triggered today.

weixin_alert_times

int

Number of WeChat alerts triggered today.

webhook_alert_times

int

Number of custom webhook alerts triggered today.

dt

string

Date partition in YYYYMMDD format, with a value range of [TODAY-D, TODAY-1D].

Data quality summary (table_quality_summary)

Partition field: dt

Description: Contains data quality metrics for the table.

Parameter

Type

Description

meta_entity_id

string

The unique identifier for the table's meta entity.

project_id

bigint

The ID of the DataWorks workspace.

table_name

string

The name of the table.

schema_name

string

The name of the table's schema.

database_name

string

The name of the table's database.

catalog_name

string

The name of the table's data catalog.

datasource_id

bigint

The ID of the table's data source. This field is NULL if data quality is not configured.

tenant_id

bigint

The ID of the DataWorks tenant.

owner

string

The owner of the table.

scan_count

int

The number of configured quality monitors.

scheduler_related_scan_count

int

The number of quality monitors linked to scheduling.

scan_run_count

int

The number of quality monitoring task instances today.

alert_scan_run_count

int

The number of quality monitoring task instances that triggered an alert today.

block_task_instance_scan_run_count

int

The number of quality monitoring task instances that blocked scheduling tasks today.

rule_count

int

The number of configured rules.

enabled_rule_count

int

The number of enabled rules.

high_severity_rule_count

int

The number of configured high-severity rules.

normal_severity_rule_count

int

The number of configured normal-severity rules.

rule_instance_count

int

The number of rule instances today.

high_severity_rule_instance_count

int

The number of high-severity rule instances today.

normal_severity_rule_instance_count

int

The number of normal-severity rule instances today.

high_severity_rule_instance_pass_count

int

The number of successful high-severity rule checks today.

high_severity_rule_instance_warn_count

int

The number of high-severity rule checks that triggered a warning today.

high_severity_rule_instance_error_count

int

The number of high-severity rule checks that triggered an error today.

high_severity_rule_instance_fail_count

int

The number of failed high-severity rule checks today.

normal_severity_rule_instance_pass_count

int

The number of successful normal-severity rule checks today.

normal_severity_rule_instance_warn_count

int

The number of normal-severity rule checks that triggered a warning today.

normal_severity_rule_instance_error_count

int

The number of normal-severity rule checks that triggered an error today.

normal_severity_rule_instance_fail_count

int

The number of failed normal-severity rule checks today.

dt

string

The date partition in YYYYMMDD format. Valid range: [TODAY-31D, TODAY-1D], that is, from 31 days before the current date to 1 day before it.

Example metadata

Table metric details (table_metrics_detail)

Parameter

Type

Description

datasource_type

string

The data source type.

datasource_id

string

The data source ID.

catalog_name

string

The data catalog name.

database_name

string

The database name.

schema_name

string

The schema name.

table_name

string

The table name.

table_uuid

string

The UUID of the table, used to access its details page.

meta_entity_id

string

The human-readable ID of the table.

content_size

bigint

The collected storage size. This value is NULL if storage size collection is not supported.

daily_rate_cs

decimal(16,6)

The day-over-day change rate of the storage size.

avg_content_size_7d

bigint

The 7-day average storage size.

daily_rate_acs_7d

decimal(16,6)

The day-over-day change rate of the 7-day average storage size.

latest_data_update_time_31d

bigint

The timestamp of the most recent data update within the last 31 days. This time is derived from the end time of the corresponding downstream instance in the data lineage and represents the maximum data_modified_timestamp. Returns NULL if no updates occurred during this period.

latest_data_update_task_id

bigint

The ID of the scheduling task that most recently updated the table within the last 31 days.

latest_data_update_instance_id

bigint

The ID of the scheduling task instance that most recently updated the table within the last 31 days.

latest_data_update_time_by_task

bigint

The end time of the scheduling task instance that most recently updated the table within the last 31 days.

writing_task_ids

array<bigint>

A unique list of scheduling task IDs that wrote to the table on the current business date.

writing_task_ids_31d

array<bigint>

A unique list of scheduling task IDs that wrote to the table within the last 31 days.

latest_data_access_time_31d

bigint

The timestamp of the most recent data access within the last 31 days. This time is derived from the end time of the corresponding upstream instance in the data lineage and represents the maximum last_access_timestamp. Returns NULL if no access occurred during this period.

latest_data_access_task_id

bigint

The ID of the scheduling task that most recently read from the table within the last 31 days.

latest_data_access_instance_id

bigint

The ID of the scheduling task instance that most recently read from the table within the last 31 days.

latest_data_access_time_by_task

bigint

The end time of the scheduling task instance that most recently read from the table within the last 31 days.

reading_task_ids

array<string>

A unique list of scheduling task IDs that read from the table on the current business date.

reading_task_ids_31d

array<string>

A unique list of scheduling task IDs that read from the table within the last 31 days.

direct_downstream_tables

array<string>

A list of direct downstream table UUIDs.

direct_upstream_tables

array<string>

A list of direct upstream table UUIDs.

dt

string

The date partition, in YYYYMMDD format. Valid values are in the range [TODAY-31D, TODAY-1D].
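The day-over-day fields above, such as daily_rate_cs, are described only as change rates. The source does not give the formula. The following sketch assumes the common definition (current - previous) / previous, rounded to six decimal places to match the decimal(16,6) type. The function name day_over_day_rate is hypothetical.

```python
from typing import Optional


def day_over_day_rate(current: Optional[float],
                      previous: Optional[float]) -> Optional[float]:
    """Relative change between two consecutive daily values.

    Assumption: rate = (current - previous) / previous. Returns None
    when either value is missing (for example, a NULL content_size)
    or when the previous value is zero.
    """
    if current is None or previous is None or previous == 0:
        return None
    return round((current - previous) / previous, 6)
```

Under this assumption, a table that grows from 100 to 110 bytes between two partitions has a rate of 0.1.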

Table metric summary (table_metrics_summary)

Parameter

Type

Description

table_count

bigint

The number of tables.

daily_rate_tc

decimal(16,6)

The day-over-day change rate of the table count.

avg_table_count_7d

bigint

The 7-day average table count.

daily_rate_atc_7d

decimal(16,6)

The day-over-day change rate of the 7-day average table count.

content_size

bigint

The collected storage size. This value is NULL if storage size collection is not supported.

daily_rate_cs

decimal(16,6)

The day-over-day change rate of the storage size.

avg_content_size_7d

bigint

The 7-day average storage size.

daily_rate_acs_7d

decimal(16,6)

The day-over-day change rate of the 7-day average storage size.

updated_table_count

bigint

The number of tables updated within the last 31 days.

daily_rate_utc

decimal(16,6)

The day-over-day change rate of the number of tables updated within the last 31 days.

avg_updated_table_count_7d

bigint

The 7-day average number of tables updated within the last 31 days.

daily_rate_autc_7d

decimal(16,6)

The day-over-day change rate of the 7-day average number of tables updated within the last 31 days.

accessed_table_count

bigint

The number of tables read from within the last 31 days.

daily_rate_atc

decimal(16,6)

The day-over-day change rate of the number of tables read from within the last 31 days.

avg_accessed_table_count_7d

bigint

The 7-day average number of tables read from within the last 31 days.

daily_rate_aatc_7d

decimal(16,6)

The day-over-day change rate of the 7-day average number of tables read from within the last 31 days.

dt

string

The date partition, in YYYYMMDD format. Valid values are in the range [TODAY-31D, TODAY-1D].

Task metric details (task_metrics_detail)

Parameter

Type

Description

task_id

bigint

The scheduling task ID.

workflow_id

bigint

The workflow ID.

node_type

bigint

The node type.

project_id

bigint

The workspace ID.

week_number

bigint

The week number of the business date in the year.

task_owner

string

The owner ID.

compute_resource_type

string

The compute resource type.

compute_resource_id

string

The ID of the compute resource, such as a MaxCompute project name, an E-MapReduce (EMR) cluster ID, or a Hologres instance ID.

datasource_name

string

The data source name.

inst_success_count

bigint

The number of successful instances.

inst_failed_count

bigint

The number of failed instances.

inst_running_count

bigint

The number of running instances.

inst_abnormal_count

bigint

The number of abnormal instances.

inst_not_started_count

bigint

The number of instances that have not started.

inst_runtime_cu

double

The total CUs consumed by the task's instances on the business date.

task_avg_cu_31d

double

The average daily CU consumption of the task over the last 31 days.

dt

string

The date partition, in YYYYMMDD format. Valid values are in the range [TODAY-31D, TODAY-1D].
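One possible use of the inst_runtime_cu and task_avg_cu_31d fields above is to flag tasks whose CU consumption on the business date is far from their 31-day daily average. The following is a minimal sketch. The helper names and the default threshold of 2.0 are assumptions for illustration and are not DataWorks defaults.

```python
from typing import Optional


def cu_ratio(inst_runtime_cu: Optional[float],
             task_avg_cu_31d: Optional[float]) -> Optional[float]:
    """Ratio of the business-date CU total to the 31-day daily average."""
    if inst_runtime_cu is None or not task_avg_cu_31d:
        return None  # missing data or a zero average
    return inst_runtime_cu / task_avg_cu_31d


def is_cu_spike(inst_runtime_cu: Optional[float],
                task_avg_cu_31d: Optional[float],
                threshold: float = 2.0) -> bool:
    """True when the ratio is known and at least `threshold` (hypothetical)."""
    ratio = cu_ratio(inst_runtime_cu, task_avg_cu_31d)
    return ratio is not None and ratio >= threshold
```

A task with inst_runtime_cu of 3.0 and task_avg_cu_31d of 2.0 has a ratio of 1.5 and is not flagged at the default threshold.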

Task metric summary (task_metrics_summary)

Parameter

Type

Description

node_type

bigint

The node type.

inst_status

string

The instance status.

inst_count

bigint

The number of instances.

avg_inst_count_7d

double

The 7-day average instance count.

granularity

string

The statistical granularity. Valid values: DAILY and WEEKLY.

dt

string

The date partition, in YYYYMMDD format. Valid values are in the range [TODAY-31D, TODAY-1D].