Metadata warehouse sharing model (MaxCompute)
Dataphin provides a metadata sharing model for the MaxCompute computing engine, exposing system metadata for data analysis and secondary development.
Overview
Dataphin exposes system metadata through the metadata sharing model for flexible data analysis and secondary development. For usage instructions, see .
Usage
Each partition of a shared model table stores a full copy of the data. Always use the latest partition to avoid issues with downstream services. Historical partitions may contain incomplete data after major version upgrades.
Changelog
|
Version |
Changes |
|
V6.1 |
Added the |
|
V6.0 |
Added the |
|
V5.5 |
|
|
V5.4 |
|
|
V5.3 |
|
|
V5.2 |
Added the fields |
|
V5.1 |
|
|
V5.0 |
|
|
V4.5 |
|
|
V4.3.1 |
|
|
V4.3 |
|
|
V4.2 |
|
|
V4.1 |
|
|
V4.0 |
Added the |
|
V3.14 |
|
|
V3.13 |
|
|
V3.12 |
|
|
V3.11 |
|
|
V3.10 |
|
|
V3.9 |
|
|
V3.8 |
|
|
V3.7 |
|
|
V3.5.4 |
|
|
V3.3.1-hf3 V2.9.7-hf10 |
Added the |
|
V3.2.4 |
The |
|
V3.2.4 |
All fields related to |
|
V2.9.7 |
Added detailed comments for the |
|
V2.9.5.3 |
Created the document. |
Metadata warehouse catalog
|
Category |
Subcategory |
Table name |
Description |
Associated table |
|
Standardized modeling |
Planning |
Business unit table |
||
|
Data domain table |
||||
|
Statistical period table |
||||
|
Modeling |
Dimension table |
|||
|
Business process table |
||||
|
Atomic metric table |
||||
|
Business qualifier table |
||||
|
Derived metric table |
||||
|
Logical table metadata |
||||
|
Logical table field metadata |
||||
|
Development & O&M |
Physical O&M |
Physical scheduling node |
||
|
DataX physical scheduling node |
||||
|
DLink physical scheduling node |
||||
|
Fact table for physical node scheduling instances |
||||
|
Physical scheduling node dependency table |
||||
|
Physical scheduling instance dependency table |
||||
|
O&M monitoring configuration table |
||||
|
Statistics for integration task node instances |
||||
|
Alert event table |
||||
|
Alert notification table |
||||
|
Logical O&M |
Logical table scheduling node |
|||
|
Logical table scheduling instance |
||||
|
Engine |
MaxCompute job table |
|||
|
OS |
Transactional fact table for table access via SQL (Granularity: execution task ID, input table ID, input environment, output table ID, and output environment) |
|||
|
Development |
Submission record table |
|||
|
Publish record table |
||||
|
Permission |
Permission table |
|||
|
Baseline |
Baseline monitoring object table |
N/A |
||
|
Real-time development |
Real-time task statistics |
N/A |
||
|
Real-time task relationship table |
||||
|
Real-time task submission record table |
||||
|
Common |
User |
Tenant user table |
||
|
Project user table |
||||
|
User binding table |
||||
|
Project |
Project space table |
|||
|
Project space to compute engine binding table |
||||
|
Data source |
Data source table |
|||
|
Data source configuration table |
||||
|
Data assets |
Physical table |
Physical table |
||
|
Field table for physical tables |
||||
|
Physical table partition table |
||||
|
Data service |
Published data service unit table |
|||
|
Published data service API table |
||||
|
Published direct-connection data source API table |
||||
|
App table |
||||
|
Published service unit field table |
||||
|
App-API mapping table |
||||
|
Minute-level summary of data service API calls |
||||
|
Published API parameter table |
||||
|
Data lineage |
Fine-grained data lineage (Granularity: node, input table, input field, output table, output field) |
|||
|
Fine-grained data lineage (Granularity: node, input table, output table) |
||||
|
Fine-grained data lineage (Granularity: input table, output table) |
||||
|
Security |
Security label table |
|||
|
Security data classification table |
||||
|
Data security level table |
||||
|
Security identification rule table |
||||
|
Data masking rule table |
||||
|
Data masking rule whitelist table |
||||
|
Data quality |
Data quality monitoring object table |
|||
|
Data quality rule table |
||||
|
Data quality rule task table |
||||
|
Owner table for data quality monitoring objects |
||||
|
Alert configuration table for data quality monitoring objects |
||||
|
Data quality rule scoring table |
||||
|
Data quality scheduling configuration table |
||||
|
Data quality issue table |
||||
|
Data quality remediation workflow table |
||||
|
Operation record table for data quality remediation |
||||
|
Relationship table for data quality remediation workflows and issues |
||||
|
Data standard |
Data standard table |
|||
|
Data standard mapping rule table |
||||
|
Mapping table for data standard rules and assets |
||||
|
Mapping result table for data standards and assets |
||||
|
Data standard lookup table |
||||
|
Real-time |
Real-time table metadata |
N/A |
||
|
Real-time table field metadata |
||||
|
Label |
Label market table |
|||
|
Label data lineage table |
||||
|
Asset inventory |
Global table metadata |
N/A |
||
|
Global field metadata |
||||
|
Asset catalog |
Listed asset table |
N/A |
||
|
Asset catalog topic table |
Communication tables
A communication table is the last table generated for each module, confirming that all other tables in that module have been successfully generated for the day.
|
Table name |
Description |
|
Communication table for the basic shared model |
|
|
Communication table for the data security module |
|
|
Communication table for the data service module |
|
|
Communication table for the data quality module |
|
|
Communication table for the data standard module |
|
|
Communication table for the tag module |
Standardized modeling
Plan
dim_dataphin_biz_unit
create table if not exists dim_dataphin_biz_unit(
tenant_id bigint comment 'Tenant ID.',
biz_unit_id bigint comment 'Business unit ID.',
biz_unit_name string comment 'Business unit name.',
biz_unit_name_cn string comment 'Business unit name in Chinese.',
biz_unit_desc string comment 'Business unit description.',
is_basic_biz string comment 'Specifies whether the business unit is a basic unit. Valid values: Y/N.',
biz_unit_spec string comment 'Business unit type. `datadev`: data development, `distill`: data extraction.',
env string comment 'Business unit environment. `PROD`: production environment.',
owner_id string comment 'The ID of the business unit owner.',
owner_name string comment 'Name of the business unit owner.',
modifier_id string comment 'The ID of the last modifier.',
modifier_name string comment 'Name of the last modifier.',
gmt_create string comment 'Creation time.',
gmt_modified string comment 'Modification time.'
) comment 'Business unit table.'
partitioned by (ds string comment 'Partition field in yyyymmdd format.')
lifecycle 30;
dim_dataphin_data_domain
create table if not exists dim_dataphin_data_domain(
tenant_id bigint comment 'tenant ID',
data_domain_id bigint comment 'data domain ID',
data_domain_name string comment 'data domain name',
data_domain_abbreviation string comment 'data domain abbreviation',
data_domain_name_cn string comment 'data domain name in Chinese',
data_domain_desc string comment 'data domain description',
biz_unit_id bigint comment 'business unit ID',
biz_unit_name string comment 'business unit name',
owner_id string comment 'owner ID',
owner_name string comment 'owner name',
modifier_id string comment 'last modifier ID',
modifier_name string comment 'last modifier name',
gmt_create string comment 'creation time',
gmt_modified string comment 'last modified time',
data_domain_level bigint comment 'data domain level',
parent_data_domain_id bigint comment 'parent data domain ID',
parent_data_domain_name string comment 'parent data domain name',
parent_data_domain_level1_abbreviation string comment 'level 1 parent data domain abbreviation',
parent_data_domain_level1_name_cn string comment 'level 1 parent data domain name in Chinese',
data_domain_level1_id bigint comment 'level 1 data domain ID',
data_domain_level1_name string comment 'level 1 data domain name',
data_domain_level1_abbreviation string comment 'level 1 data domain abbreviation',
data_domain_level1_name_cn string comment 'level 1 data domain name in Chinese',
data_domain_level2_id bigint comment 'level 2 data domain ID',
data_domain_level2_name string comment 'level 2 data domain name',
data_domain_level2_abbreviation string comment 'level 2 data domain abbreviation',
data_domain_level2_name_cn string comment 'level 2 data domain name in Chinese',
data_domain_level3_id bigint comment 'level 3 data domain ID',
data_domain_level3_name string comment 'level 3 data domain name',
data_domain_level3_abbreviation string comment 'level 3 data domain abbreviation',
data_domain_level3_name_cn string comment 'level 3 data domain name in Chinese',
data_domain_level4_id bigint comment 'level 4 data domain ID',
data_domain_level4_name string comment 'level 4 data domain name',
data_domain_level4_abbreviation string comment 'level 4 data domain abbreviation',
data_domain_level4_name_cn string comment 'level 4 data domain name in Chinese',
data_domain_level5_id bigint comment 'level 5 data domain ID',
data_domain_level5_name string comment 'level 5 data domain name',
data_domain_level5_abbreviation string comment 'level 5 data domain abbreviation',
data_domain_level5_name_cn string comment 'level 5 data domain name in Chinese'
) comment 'data domain table'
partitioned by (ds string comment 'partition column, in yyyymmdd format')
lifecycle 30;
dim_dataphin_time_period
create table if not exists dim_dataphin_time_period(
tenant_id bigint comment 'Tenant ID.',
time_period_id bigint comment 'Time period ID.',
time_period_name string comment 'Time period name.',
time_period_name_cn string comment 'Chinese name of the time period.',
time_period_desc string comment 'Time period description.',
period_start string comment 'Period start date.',
period_end string comment 'Period end date.',
--env string comment 'The environment. PROD indicates the production environment.',
owner_id string comment 'Owner ID.',
owner_name string comment 'Owner name.',
modifier_id string comment 'ID of the last modifier.',
modifier_name string comment 'Name of the last modifier.',
gmt_create string comment 'Creation time in GMT.',
gmt_modified string comment 'Modification time in GMT.'
) comment 'Time period table.'
partitioned by (ds string comment 'Partition field, in yyyymmdd format.')
lifecycle 30;
Modeling
dim_dataphin_dimension
create table if not exists dim_dataphin_dimension(
tenant_id bigint comment 'The ID of the tenant.',
dimension_id bigint comment 'The ID of the dimension.',
dimension_name string comment 'The name of the dimension.',
dimension_name_cn string comment 'The Chinese name of the dimension.',
dimension_desc string comment 'The description of the dimension.',
dimension_type string comment 'The type of the dimension. Valid values: NORMAL, ENUM, VIRTUAL, and LEVEL.',
dimension_pk string comment 'The primary key of the dimension.',
dimension_pk_cn string comment 'The Chinese name of the primary key.',
dimension_pk_datatype string comment 'The data type of the primary key.',
dimension_pk_logic string comment 'The primary key logic of the dimension.',
dimension_status string comment 'The status of the dimension. 0: draft, 2: submitted, 100: developing, 200: publishing, 201: published.',
parent_dimension_id string comment 'The ID of the parent dimension.',
parent_dimension_name string comment 'The name of the parent dimension.',
parent_dimension_name_cn string comment 'The Chinese name of the parent dimension.',
dimension_lifecycle string comment 'The lifecycle of the dimension.',
data_domain_id bigint comment 'The ID of the data domain.',
data_domain_name string comment 'The name of the data domain.',
data_domain_abbreviation string comment 'The abbreviation of the data domain.',
data_domain_name_cn string comment 'The Chinese name of the data domain.',
project_id bigint comment 'The ID of the project.',
project_name string comment 'The name of the project.',
biz_unit_id bigint comment 'The ID of the business unit.',
biz_unit_name string comment 'The name of the business unit.',
env string comment 'The environment. Valid values: DEV and PROD. DEV: development environment. PROD: production environment.',
owner_id string comment 'The ID of the owner.',
owner_name string comment 'The name of the owner.',
modifier_id string comment 'The ID of the last modifier.',
modifier_name string comment 'The name of the last modifier.',
gmt_create string comment 'The time when the dimension was created.',
gmt_modified string comment 'The time when the dimension was last modified.'
) comment 'Dimension table.'
partitioned by (ds string comment 'The partition field in the yyyymmdd format.')
lifecycle 30;
dim_dataphin_biz_process
create table if not exists dim_dataphin_biz_process(
tenant_id bigint comment 'The tenant ID.',
biz_process_id bigint comment 'The business process ID.',
biz_process_name string comment 'The business process name.',
biz_process_name_cn string comment 'The Chinese name of the business process.',
biz_process_desc string comment 'The business process description.',
data_domain_id bigint comment 'The data domain ID.',
data_domain_name string comment 'The data domain name.',
data_domain_abbreviation string comment 'The data domain abbreviation.',
data_domain_name_cn string comment 'The Chinese name of the data domain.',
project_id bigint comment 'The project ID.',
project_name string comment 'The project name.',
biz_unit_id bigint comment 'The business unit ID.',
biz_unit_name string comment 'The business unit name.',
env string comment 'The environment. Valid values: DEV (development) and PROD (production).',
owner_id string comment 'The owner ID.',
owner_name string comment 'The owner name.',
modifier_id string comment 'The last modifier ID.',
modifier_name string comment 'The last modifier name.',
gmt_create string comment 'The creation time.',
gmt_modified string comment 'The modification time.'
) comment 'Business process table'
partitioned by (ds string comment 'The partition key, in yyyymmdd format.')
lifecycle 30;
dim_dataphin_atom_index
For derived atomic metrics, the calculation logic (atom_index_logic) uses the format "@atomic metric ID" to reference other atomic metrics. To restore the initial calculation logic, run a query using the atomic metric ID.
create table if not exists dim_dataphin_atom_index(
tenant_id bigint comment 'Tenant ID.',
atom_index_id bigint comment 'Atomic metric ID.',
atom_index_name string comment 'Atomic metric name.',
atom_index_name_cn string comment 'Chinese name of the atomic metric.',
atom_index_desc string comment 'Description of the atomic metric.',
atom_index_type string comment 'Atomic metric type. Valid values: NORMAL (normal atomic metric) and COMPOSITE (derived atomic metric).',
atom_index_data_type string comment 'Data type of the atomic metric.',
atom_index_logic string comment 'Calculation logic for the atomic metric.',
atom_index_status string comment 'Atomic metric status. Valid values: 0 (Draft), 2 (Submitted), 100 (Developing), 200 (Publishing), and 201 (Published).',
is_acc_sum string comment 'Specifies whether the metric can be summed. Valid values: Y and N.',
source_path_attribute string comment 'Primary source field.',
time_col_attribute_id bigint comment 'ID of the time period field.',
time_col_attribute_name string comment 'Name of the time period field.',
time_col_format string comment 'Format of the time period.',
is_comp_atom_index string comment 'Specifies whether the metric is a derived atomic metric. Valid values: Y and N.',
comp_atom_index_ids string comment 'IDs of component atomic metrics, separated by semicolons.',
comp_atom_index_names string comment 'Names of component atomic metrics, separated by semicolons.',
model_id bigint comment 'Logical table ID.',
model_type string comment 'Logical table type. Valid values: FCT (fact logical table) and DIM (dimension logical table).',
model_name string comment 'Logical table name.',
model_name_cn string comment 'Chinese name of the logical table.',
biz_process_id bigint comment 'Business process ID.',
biz_process_name string comment 'Business process name.',
biz_process_name_cn string comment 'Chinese name of the business process.',
data_domain_id bigint comment 'Data domain ID.',
data_domain_name string comment 'Data domain name.',
data_domain_abbreviation string comment 'Abbreviation of the data domain.',
data_domain_name_cn string comment 'Chinese name of the data domain.',
project_id bigint comment 'Project space ID.',
project_name string comment 'Project space name.',
biz_unit_id bigint comment 'Business unit ID.',
biz_unit_name string comment 'Business unit name.',
env string comment 'Environment. Valid values: DEV (development) and PROD (production).',
owner_id string comment 'Owner ID.',
owner_name string comment 'Owner name.',
modifier_id string comment 'ID of the last modifier.',
modifier_name string comment 'Name of the last modifier.',
gmt_create string comment 'Creation time.',
gmt_modified string comment 'Modification time.',
period_type string comment 'Data granularity. Valid values: DAY, HOUR, MINUTE, and STREAMING (real-time).'
) comment 'Atomic metric table.'
partitioned by (ds string comment 'Partition key, in yyyymmdd format.')
lifecycle 30;
dim_dataphin_adjunct_word
create table if not exists dim_dataphin_adjunct_word(
tenant_id bigint comment 'Tenant ID.',
adjunct_word_id bigint comment 'Business qualifier ID.',
adjunct_word_name string comment 'Business qualifier name.',
adjunct_word_name_cn string comment 'Business qualifier Chinese name.',
adjunct_word_desc string comment 'Business qualifier description.',
adjunct_word_logic string comment 'Business qualifier logic.',
adjunct_word_status string comment 'Business qualifier status. 0: draft, 2: submitted, 100: developing, 200: publishing, 201: published.',
source_path_attribute string comment 'Primary source field.',
model_id bigint comment 'Logical model ID.',
model_type string comment 'Logical model type. FCT: fact table, DIM: dimension table.',
model_name string comment 'Logical model name.',
model_name_cn string comment 'Logical model Chinese name.',
biz_process_id bigint comment 'Business process ID.',
biz_process_name string comment 'Business process name.',
biz_process_name_cn string comment 'Business process Chinese name.',
data_domain_id bigint comment 'Data domain ID.',
data_domain_name string comment 'Data domain name.',
data_domain_abbreviation string comment 'Data domain abbreviation.',
data_domain_name_cn string comment 'Data domain Chinese name.',
project_id bigint comment 'Project ID.',
project_name string comment 'Project name.',
biz_unit_id bigint comment 'Business unit ID.',
biz_unit_name string comment 'Business unit name.',
env string comment 'Environment. DEV: development, PROD: production.',
owner_id string comment 'Owner ID.',
owner_name string comment 'Owner name.',
modifier_id string comment 'Last modifier ID.',
modifier_name string comment 'Last modifier name.',
gmt_create string comment 'Creation time.',
gmt_modified string comment 'Modification time.'
) comment 'Business qualifier table.'
partitioned by (ds string comment 'Partition field. Format: yyyymmdd.')
lifecycle 30;
dim_dataphin_derived_index
create table if not exists dim_dataphin_derived_index(
tenant_id bigint comment 'Tenant ID.',
derived_index_id bigint comment 'Derived index ID.',
derived_index_name string comment 'Derived index name.',
derived_index_name_cn string comment 'Chinese name of the derived index.',
derived_index_desc string comment 'Description of the derived index.',
derived_index_data_type string comment 'Data type of the derived index.',
derived_index_status string comment 'Status of the derived index. Valid values: `0` (Draft), `2` (Submitted), `100` (Developing), `200` (Publishing), and `201` (Published).',
derived_index_logic string comment 'Calculation logic of the derived index.',
atom_index_id bigint comment 'Atomic index ID.',
atom_index_name string comment 'Atomic index name.',
atom_index_name_cn string comment 'Chinese name of the atomic index.',
atom_index_type string comment 'Type of the atomic index. Valid values: `NORMAL` (Normal atomic index) and `COMPOSITE` (Composite atomic index).',
atom_index_data_type string comment 'Data type of the atomic index.',
atom_index_logic string comment 'Calculation logic of the atomic index.',
atom_index_status string comment 'Status of the atomic index. Valid values: `0` (Draft), `2` (Submitted), `100` (Developing), `200` (Publishing), and `201` (Published).',
atom_source_path_attribute string comment 'Primary source field.',
adjunct_word_id bigint comment 'Business qualifier ID.',
adjunct_word_name string comment 'Business qualifier name.',
adjunct_word_name_cn string comment 'Chinese name of the business qualifier.',
adjunct_word_desc string comment 'Description of the business qualifier.',
adjunct_word_logic string comment 'Definition of the business qualifier.',
adjunct_word_status string comment 'Status of the business qualifier. Valid values: `0` (Draft), `2` (Submitted), `100` (Developing), `200` (Publishing), and `201` (Published).',
adjunct_word_source_path_attribute string comment 'Primary source field.',
time_period_id bigint comment 'Time period ID.',
time_period_name string comment 'Time period name.',
time_period_name_cn string comment 'Chinese name of the time period.',
granularity_id bigint comment 'Granularity ID.',
granularity_name string comment 'Granularity name.',
granularity_name_cn string comment 'Chinese name of the granularity.',
dimension_ids string comment 'Dimension IDs, separated by semicolons.',
dimension_names string comment 'Dimension names, separated by semicolons.',
dimension_name_cns string comment 'Chinese names of the dimensions, separated by semicolons.',
summary_id bigint comment 'ID of the summary table for the granularity.',
summary_name string comment 'Name of the summary table for the granularity.',
summary_name_cn string comment 'Chinese name of the summary table for the granularity.',
biz_process_id bigint comment 'Business process ID.',
biz_process_name string comment 'Business process name.',
biz_process_name_cn string comment 'Chinese name of the business process.',
data_domain_id bigint comment 'Data domain ID.',
data_domain_name string comment 'Data domain name.',
data_domain_abbreviation string comment 'Abbreviation of the data domain.',
data_domain_name_cn string comment 'Chinese name of the data domain.',
project_id bigint comment 'Project ID.',
project_name string comment 'Project name.',
biz_unit_id bigint comment 'Business unit ID.',
biz_unit_name string comment 'Business unit name.',
env string comment 'Environment. Valid values: `DEV` (Development) and `PROD` (Production).',
owner_id string comment 'Owner ID.',
owner_name string comment 'Owner name.',
modifier_id string comment 'ID of the last modifier.',
modifier_name string comment 'Name of the last modifier.',
gmt_create string comment 'Creation time.',
gmt_modified string comment 'Last modification time.',
period_type string comment 'Time granularity. Valid values: `DAY`, `HOUR`, `MINUTE`, and `STREAMING` (real-time).',
compute_type string comment 'Compute type. Valid values: `AUTO` (derived index), `COMBINE` (combined index), `CUSTOM` (expert-coded), and `MOUNT` (externally registered).'
) comment 'Derived indexes.'
partitioned by (ds string comment 'Partition key, in `yyyymmdd` format.')
lifecycle 30;
dim_dataphin_model
create table if not exists dim_dataphin_model(
tenant_id bigint comment 'The tenant ID.',
model_id bigint comment 'The model ID.',
model_name string comment 'The model name.',
model_name_cn string comment 'The Chinese model name.',
model_desc string comment 'The model description.',
model_status string comment 'The model status. Valid values: 0 (Draft), 2 (Submitted), 100 (Developing), 200 (Publishing), and 201 (Published).',
model_type string comment 'The model type. Valid values: DIM (dimension logical tables), FACT (fact logical tables), SUM (summary tables), and VIEW (logical views).',
env string comment 'The environment. Valid values: DEV (development) and PROD (production).',
project_id bigint comment 'The project ID.',
project_name string comment 'The project name.',
biz_unit_id bigint comment 'The business unit ID.',
biz_unit_name string comment 'The business unit name.',
owner_id string comment 'The owner ID.',
owner_name string comment 'The owner name.',
modifier_id string comment 'The last modifier ID.',
modifier_name string comment 'The last modifier name.',
gmt_create string comment 'The creation time.',
gmt_modified string comment 'The last modification time.',
customize_config string comment 'Custom configuration.',
table_lifecycle string comment 'The model lifecycle.',
granularity_id bigint comment 'The granularity ID.',
granularity_name string comment 'The granularity name.',
granularity_name_cn string comment 'The Chinese granularity name.',
biz_process_id bigint comment 'The business process ID.',
biz_process_name string comment 'The business process name.',
biz_process_name_cn string comment 'The Chinese business process name.',
data_domain_id bigint comment 'The data domain ID.',
data_domain_name string comment 'The data domain name.',
data_domain_abbreviation string comment 'The data domain abbreviation.',
data_domain_name_cn string comment 'The Chinese data domain name.',
biz_detail_time_column string comment 'The time field for the fact logical table.',
biz_detail_main_table string comment 'The main table for the fact logical table.',
biz_detail_main_table_condition string comment 'The condition for the main table of the fact logical table.',
is_summary_support_cube string comment 'Indicates whether the summary table supports cubes.',
view_content string comment 'The SQL content of the logical view.',
view_ref_table string comment 'The tables referenced by the logical view.',
ops_owner_id string comment 'The operations owner ID.',
ops_owner_name string comment 'The operations owner name.',
dev_owner_id string comment 'The development owner ID.',
dev_owner_name string comment 'The development owner name.',
physical_data_size bigint comment 'The total size, in bytes, of the physical tables for this model.'
) comment 'The model metadata.'
partitioned by (ds string comment 'The partition field, in yyyymmdd format.')
lifecycle 30;
dim_dataphin_field
create table if not exists dim_dataphin_field(
tenant_id bigint comment 'The tenant ID.',
field_id bigint comment 'The logical field ID.',
field_name string comment 'The name of the logical field.',
field_name_cn string comment 'The Chinese name of the logical field.',
field_seq bigint comment 'The sequence of the logical field.',
field_desc string comment 'The description of the logical field.',
field_data_type string comment 'The data type of the logical field.',
field_logic string comment 'The definition or calculation logic of the logical field.',
field_type string comment 'The field type. Valid values: SYS for system field, DIM for dimension attribute, REF for fact attribute, MEAS for measure, and SUM for summary metric.',
field_status string comment 'The status of the logical field. Valid values: 0 for Draft, 2 for Submitted, 100 for In Development, 200 for Publishing, and 201 for Published.',
field_default_value string comment 'The default value of the logical field.',
env string comment 'The environment. Valid values: DEV for the development environment and PROD for the production environment.',
model_type string comment 'The type of the logical table. Valid values: DIM for dimension table, FACT for fact table, VIEW for logical view, and SUM for summary table.',
model_id bigint comment 'The ID of the logical table.',
model_name string comment 'The name of the logical table.',
model_name_cn string comment 'The Chinese name of the logical table.',
is_partition string comment 'Whether the field is a partition field. Valid values: Y or N.',
is_pk string comment 'Whether the field is a primary key. Valid values: Y or N.',
is_not_null string comment 'Whether the field is non-nullable. Valid values: Y or N.',
is_unique string comment 'Whether the field value must be unique. Valid values: Y or N.',
is_physical string comment 'Whether the field is a physical field. Valid values: Y or N.',
ref_dimension_id string comment 'The ID of the referenced dimension.',
ref_dimension_role string comment 'The role of the referenced dimension.',
ref_dimension_role_cn string comment 'The Chinese name of the referenced dimension role.',
ref_field_id bigint comment 'The ID of the referenced logical field (for logical views).',
attribute_scope string comment 'The scope of the attribute.',
biz_unit_id bigint comment 'The ID of the business unit.',
biz_unit_name string comment 'The name of the business unit.',
owner_id string comment 'The ID of the owner.',
owner_name string comment 'The name of the owner.',
modifier_id string comment 'The ID of the last modifier.',
modifier_name string comment 'The name of the last modifier.',
gmt_create string comment 'The creation time.',
gmt_modified string comment 'The modification time.',
classify string comment 'The data classification. Valid values: S for company data, B for business data, and C for personal data. Other values are custom-defined.',
security_level string comment 'The security level. Valid values: L1 for Public, L2 for Internal, L3 for Confidential, and L4 for Top Secret. Other values are custom-defined.'
) comment 'Metadata for logical table fields.'
partitioned by (ds string comment 'The partition field, in yyyymmdd format.')
lifecycle 30;
DevOps
Physical operations
dim_dataphin_node
-
This table describes offline computing tasks, offline integration tasks, and logical table tasks.
-
Query the
dim_dataphin_stream_taskanddim_dataphin_stream_task_related_tabletables for information about real-time computing tasks. Real-time integration tasks cannot be queried yet.
create table if not exists dim_dataphin_node(
tenant_id bigint comment 'The tenant ID.',
node_id string comment 'The node ID.',
node_name string comment 'The node name.',
node_desc string comment 'The node description.',
node_type string comment 'The scheduling type. Valid values: NORMAL for periodic nodes and MANUAL for manual nodes.',
operator_type string comment 'The node type.',
cron_expression string comment 'The cron expression.',
cron_type string comment 'The scheduling cycle. Valid values: MINUTELY, HOURLY, DAILY, MONTHLY, and WEEKLY.',
priority bigint comment 'The priority.',
env string comment 'The environment. Valid values: DEV for the development environment and PROD for the production environment.',
node_status string comment 'The node status. Valid values: NORMAL for normal scheduling, PAUSED for paused scheduling, and IDLE for a dry run.',
param string comment 'The node parameters.',
output_name_list string comment 'The output list.',
compute_engine_config string comment 'The compute source configuration.',
compute_engine_id bigint comment 'The compute source ID.',
compute_engine_name string comment 'The compute source name.',
node_content string comment 'The node content.',
is_logical_node string comment 'Indicates whether this is a scheduling node for a logical table. Valid values: Y and N.',
resource_group string comment 'The resource group.',
node_from string comment 'The node source.',
project_id bigint comment 'The workspace ID.',
project_name string comment 'The workspace name.',
biz_unit_id bigint comment 'The business unit ID.',
biz_unit_name string comment 'The business unit name.',
owner_id string comment 'The owner ID.',
owner_name string comment 'The owner name.',
modifier_id string comment 'The ID of the last modifier.',
modifier_name string comment 'The name of the last modifier.',
gmt_create string comment 'The creation time.',
gmt_modified string comment 'The modification time.',
directory_path string comment 'The directory path.'
) comment 'Physical scheduling node'
partitioned by (ds string comment 'The partition field, in yyyymmdd format.')
lifecycle 30;
fct_dataphin_node_taskrun_di
create table if not exists fct_dataphin_node_taskrun_di (
tenant_id bigint comment 'The tenant ID.'
,taskrun_id string comment 'The ID of the instance run.'
,ready_time string comment 'The time when the instance met its scheduling conditions.'
,dispatch_time string comment 'The time when the instance was dispatched.'
,start_execute_time string comment 'The start time of the instance run.'
,finish_execute_time string comment 'The finish time of the instance run.'
,taskrun_status string comment 'The status of the instance run. Possible values: SUCCESS, FAILED, WAIT_RESOURCE, KILLED, DROPPED, RUNNING, PAUSED, and READY.'
,with_downstreams string comment 'Whether to schedule downstream tasks.'
,schedule_type string comment 'The type of schedule that triggered the run. Possible values: AUTO_SCHEDULED, RERUN, and REDISPATCH.'
,task_id string comment 'The task ID.'
,task_name string comment 'The task name.'
,task_type string comment 'The task type. Possible values: NORMAL for a normal task, VIRTUAL for a virtual task, TEMP for a temporary task, PAUSED for a paused task, and IDLE for an idle run.'
,task_due_time string comment 'The scheduled execution time for the instance, calculated based on the cron expression.'
,task_upstream_ready_time string comment 'The time when all upstream tasks for this instance completed.'
,task_last_execute_time string comment 'The last execution time of the task.'
,task_run_times string comment 'The number of runs for this instance, including retries.'
,dagrun_type string comment 'The trigger type for the task run. Possible values: NORMAL for a periodic run, MANUAL for a manual run, SUPPLEMENT for a backfill, and TEMP for a temporary run.'
,operator_type string comment 'The operator type of the task. Possible values: ONE_SERVICE_SQL, MAX_COMPUTE_SQL, DATAX, DLINK, FLINK_STREAMING, MAX_COMPUTE_MR, PYTHON, SHELL, and VIRTUAL.'
,node_id string comment 'The node ID.'
,node_name string comment 'The node name.'
,biz_date string comment 'The business date.'
,env string comment 'The environment. Possible values: DEV for the development environment and PROD for the production environment.'
,index_col string comment 'The sequence number.'
,is_logical_node string comment 'Indicates whether this is a scheduling node for a logical table. Valid values: Y and N.'
--node_from string comment 'The node source.',
,node_content_version string comment 'The version of the task code.'
,node_content string comment 'The task code.'
,project_id bigint comment 'The project ID.'
,project_name string comment 'The project name.'
,owner_id string comment 'The owner ID.'
,owner_name string comment 'The owner name.'
,gmt_create string comment 'The creation time.'
,gmt_modified string comment 'The modification time.'
,submitter_id string comment 'The ID of the user who submitted the instance run.'
,submitter_name string comment 'The name of the user who submitted the instance run.'
,is_system_submitter string comment 'Whether the run was submitted by the system.'
)
comment 'Stores instance runs of physical nodes.'
partitioned by (
ds string comment 'The partition key, in yyyymmdd format.'
)
lifecycle 90;
dim_dataphin_datax_node
create table if not exists dim_dataphin_datax_node(
tenant_id bigint comment 'Tenant ID.',
node_id string comment 'Node ID.',
node_name string comment 'Node name.',
node_desc string comment 'Node description.',
node_type string comment 'Scheduling type. Valid values: NORMAL for a periodic node, and MANUAL for a manual node.',
operator_type string comment 'Node type.',
cron_expression string comment 'Cron expression.',
cron_type string comment 'Scheduling period. Valid values: MINUTELY, HOURLY, DAILY, MONTHLY, and WEEKLY.',
priority bigint comment 'Execution priority.',
env string comment 'Environment. Valid values: DEV (development) and PROD (production).',
node_status string comment 'Node status. Valid values: NORMAL (normal scheduling), PAUSED (paused scheduling), and IDLE (dry run).',
param string comment 'Node parameters.',
output_name_list string comment 'Output names.',
compute_engine_config string comment 'Compute engine configuration.',
compute_engine_id bigint comment 'Compute engine ID.',
compute_engine_name string comment 'Compute engine name.',
node_content string comment 'Node content.',
is_logical_node string comment 'Indicates if the node is for a logical table. Valid values: Y and N.',
resource_group string comment 'Resource group.',
project_id bigint comment 'Project space ID.',
project_name string comment 'Project space name.',
biz_unit_id bigint comment 'Business unit ID.',
biz_unit_name string comment 'Business unit name.',
reader_type string comment 'Source data type.',
reader_ds_id string comment 'Source data source ID.',
reader_ds_name string comment 'Source data source name.',
reader_db_name string comment 'Source database name.',
reader_table_name string comment 'Source table name.',
reader_where string comment 'Source data filter condition.',
writer_type string comment 'Destination data type.',
writer_ds_id string comment 'Destination data source ID.',
writer_ds_name string comment 'Destination data source name.',
writer_db_name string comment 'Destination database name.',
writer_table_name string comment 'Destination table name.',
owner_id string comment 'Owner ID.',
owner_name string comment 'Owner name.',
modifier_id string comment 'ID of the last modifier.',
modifier_name string comment 'Name of the last modifier.',
gmt_create string comment 'Creation time.',
gmt_modified string comment 'Modification time.'
) comment 'DataX-type physical scheduling node.'
partitioned by (ds string comment 'Partition field, in yyyymmdd format.')
lifecycle 30;
dim_dataphin_dlink_node
In script mode, DLink tasks cannot parse table names.
create table if not exists dim_dataphin_dlink_node (
tenant_id bigint comment 'The tenant ID.'
,node_id string comment 'The node ID.'
,node_name string comment 'The node name.'
,node_desc string comment 'A description of the node.'
,node_type string comment 'The scheduling type. Valid values: NORMAL (periodic node) or MANUAL (manual node).'
,operator_type string comment 'The node type.'
,cron_expression string comment 'The cron expression.'
,cron_type string comment 'The scheduling period. Valid values: MINUTELY, HOURLY, DAILY, MONTHLY, or WEEKLY.'
,priority bigint comment 'The priority.'
,env string comment 'The environment. Valid values: DEV (development environment) or PROD (production environment).'
,node_status string comment 'The node status. Valid values: NORMAL (scheduled), PAUSED (paused), or IDLE (dry run).'
,param string comment 'The parameters for the node.'
,output_name_list string comment 'The list of output names.'
,compute_engine_config string comment 'The compute engine configuration.'
,compute_engine_id bigint comment 'The compute engine ID.'
,compute_engine_name string comment 'The compute engine name.'
,node_content string comment 'The node content.'
,is_logical_node string comment 'Indicates whether the node schedules a logical table. Valid values: Y or N.'
,resource_group string comment 'The resource group.'
,project_id bigint comment 'The project ID.'
,project_name string comment 'The project name.'
,biz_unit_id bigint comment 'The business unit ID.'
,biz_unit_name string comment 'The business unit name.'
,reader_type string comment 'The type of the reader data source.'
,reader_ds_id string comment 'The ID of the reader data source.'
,reader_ds_name string comment 'The name of the reader data source.'
,reader_db_name string comment 'The name of the reader database.'
,reader_table_name string comment 'The name of the source table.'
,writer_type string comment 'The type of the writer data source.'
,writer_ds_id string comment 'The ID of the writer data source.'
,writer_ds_name string comment 'The name of the writer data source.'
,writer_db_name string comment 'The name of the writer database.'
,writer_table_name string comment 'The name of the destination table.'
,owner_id bigint comment 'The owner ID.'
,owner_name string comment 'The owner name.'
,modifier_id bigint comment 'The ID of the last modifier.'
,modifier_name string comment 'The name of the last modifier.'
,gmt_create string comment 'The creation time.'
,gmt_modified string comment 'The modification time.'
,reader_schema_name string comment 'The source schema name.'
,writer_schema_name string comment 'The destination schema name.'
)
comment 'Stores dlink-type physical scheduling nodes. Note: The value of node_id in this table may not be unique.'
partitioned by (
ds string comment 'The partition key, in yyyymmdd format.'
)
lifecycle 30;
dim_dataphin_node_edge
-
The combination of
source_node_id,source_node_output_name, andtarget_node_idis unique for each record in this table. -
In the
source_node_namefield, nodes with names starting withvirtual_root_nodeare system-built-in root nodes. -
A logical table displays dependencies for its corresponding logical node and its internal materialized nodes (one or more per logical node). To view only the dependencies between logical nodes, filter by
target_operator_type = 'LOGICAL' and source_operator_type = 'LOGICAL'. -
Version 6.0 introduces support for data lineage in cross-tenant dependencies. If an upstream or downstream task has the same
tenant_id,source_tenant_id, andtarget_tenant_idas the task in the current tenant, then thetenant_idof that task's cross-tenant dependency record matches itstarget_tenant_id.
create table if not exists dim_dataphin_node_edge (
tenant_id bigint comment 'The tenant ID.'
,source_node_id string comment 'The ID of the upstream node.'
,source_node_name string comment 'The name of the upstream node.'
,source_node_status string comment 'The status of the upstream node. Valid values: NORMAL for normal scheduling, PAUSED for paused scheduling, and IDLE for a dry run.'
,source_node_type string comment 'The scheduling type of the upstream node. Valid values: NORMAL for a periodic node and MANUAL for a manual node.'
,source_cron_type string comment 'The scheduling period of the upstream node. Valid values: MINUTELY, HOURLY, DAILY, MONTHLY, and WEEKLY.'
,source_operator_type string comment 'The task type of the upstream node. Valid values: ONE_SERVICE_SQL, MAX_COMPUTE_SQL, DATAX, DLINK, FLINK_STREAMING, MAX_COMPUTE_MR, PYTHON, SHELL, VIRTUAL, ONE_SERVICE_SQL_HIVE_CDP, and VVP_BATCH.'
,source_is_logical_node string comment 'Whether the upstream node is a scheduling node for a logical table. Valid values: Y and N.'
,source_project_id bigint comment 'The workspace ID of the upstream node.'
,source_project_name string comment 'The workspace name of the upstream node.'
,source_node_output_name string comment 'The output name of the upstream node.'
,target_node_id string comment 'The ID of the downstream node.'
,target_node_name string comment 'The name of the downstream node.'
,target_node_status string comment 'The status of the downstream node. Valid values: NORMAL for normal scheduling, PAUSED for paused scheduling, and IDLE for a dry run.'
,target_node_type string comment 'The scheduling type of the downstream node. Valid values: NORMAL for a periodic node and MANUAL for a manual node.'
,target_cron_type string comment 'The scheduling period of the downstream node. Valid values: MINUTELY, HOURLY, DAILY, MONTHLY, and WEEKLY.'
,target_operator_type string comment 'The task type of the downstream node. Valid values: ONE_SERVICE_SQL, MAX_COMPUTE_SQL, DATAX, DLINK, FLINK_STREAMING, MAX_COMPUTE_MR, PYTHON, SHELL, VIRTUAL, ONE_SERVICE_SQL_HIVE_CDP, and VVP_BATCH.'
,target_is_logical_node string comment 'Whether the downstream node is a scheduling node for a logical table. Valid values: Y and N.'
,target_project_id bigint comment 'The workspace ID of the downstream node.'
,target_project_name string comment 'The workspace name of the downstream node.'
,period_diff bigint comment 'The difference in scheduling periods. Must be a non-negative integer. Defaults to 1 for a self-dependency (0 is not allowed) and 0 for all other dependencies. A positive value means the target node depends on the source node instance from `period_diff` cycles ago.'
,day_diff bigint comment 'The difference in days. When set, the target node depends on all source node instances that ran `day_diff` days ago, and `period_diff` is ignored. Defaults to null.'
,is_self_depend string comment 'Whether the dependency is a self-dependency. Valid values: Y and N.'
,env string comment 'The environment. Valid values: PROD and DEV.'
,gmt_create string comment 'The creation time.'
,gmt_modified string comment 'The modification time.'
,is_cross_tenant string comment 'Whether a cross-tenant dependency is enabled. Valid values: Y and N.'
,source_tenant_id bigint comment 'The tenant ID of the upstream node.'
,target_tenant_id bigint comment 'The tenant ID of the downstream node.'
)
comment 'The physical scheduling node dependency table.'
partitioned by (
ds string comment 'The partition field.'
) stored as orc;
fct_dataphin_node_task_link_di
-
Retains only system-generated scheduled instance data in the production environment, with incremental updates based on scheduled tasks in vdm_link after 11:00 on T-2.
-
V6.0 introduces lineage support for cross-tenant dependent tasks. When an upstream or downstream task shares the same tenant_id, source_tenant_id, and target_tenant_id as a task in the current tenant, the system sets the tenant_id for the associated cross-tenant dependent task to the target_tenant_id.
create table if not exists fct_dataphin_node_task_link_di (
tenant_id bigint comment 'Tenant ID.'
,source_task_id string comment 'ID of the upstream task instance.'
,source_task_name string comment 'Name of the upstream task instance.'
,source_task_type string comment 'Instance type. Values: `NORMAL`, `VIRTUAL`, `TEMP`, `PAUSED`, `IDLE` (idle scheduling).'
,source_operator_type string comment 'Task type. Values: `ONE_SERVICE_SQL`, `MAX_COMPUTE_SQL`, `DATAX`, `DLINK`, `FLINK_STREAMING`, `MAX_COMPUTE_MR`, `PYTHON`, `SHELL`, `VIRTUAL`, `ONE_SERVICE_SQL_HIVE_CDP`, `VVP_BATCH`, etc.'
,source_task_status string comment 'Upstream instance status. Values: `INIT` (Not started), `WAIT_SCHEDULE`, `DISPATCH_TO_AGENT_BLOCKED` (Dispatch blocked), `WAIT_SUBMISSION`, `WAIT_RESOURCE`, `FAILED`, `RUNNING`, `KILLED`, `PAUSED`, `SUCCESS`.'
,source_biz_date string comment 'Business date.'
,source_node_id string comment 'ID of the upstream node.'
,source_node_name string comment 'Name of the upstream node.'
,source_node_status string comment 'Upstream node status. Values: `NORMAL` (normal), `PAUSED` (paused), `IDLE` (idle).'
,source_node_type string comment 'Upstream node scheduling type. `NORMAL`: periodic node.'
,source_cron_type string comment 'Upstream node scheduling cycle. Values: `MINUTELY`, `HOURLY`, `DAILY`, `MONTHLY`, `WEEKLY`.'
,source_project_id bigint comment 'Upstream instance project ID.'
,source_project_name string comment 'Upstream instance project name.'
,target_task_id string comment 'ID of the downstream task instance.'
,target_task_name string comment 'Name of the downstream task instance.'
,target_task_type string comment 'Instance type. Values: `NORMAL`, `VIRTUAL`, `TEMP`, `PAUSED`, `IDLE` (idle scheduling).'
,target_operator_type string comment 'Task type. Values: `ONE_SERVICE_SQL`, `MAX_COMPUTE_SQL`, `DATAX`, `DLINK`, `FLINK_STREAMING`, `MAX_COMPUTE_MR`, `PYTHON`, `SHELL`, `VIRTUAL`, `ONE_SERVICE_SQL_HIVE_CDP`, `VVP_BATCH`, etc.'
,target_task_status string comment 'Downstream instance status. Values: `INIT` (Not started), `WAIT_SCHEDULE`, `DISPATCH_TO_AGENT_BLOCKED` (Dispatch blocked), `WAIT_SUBMISSION`, `WAIT_RESOURCE`, `FAILED`, `RUNNING`, `KILLED`, `PAUSED`, `SUCCESS`.'
,target_biz_date string comment 'Business date.'
,target_node_id string comment 'ID of the downstream node.'
,target_node_name string comment 'Name of the downstream node.'
,target_node_status string comment 'Downstream node status. Values: `NORMAL` (normal), `PAUSED` (paused), `IDLE` (idle).'
,target_node_type string comment 'Downstream node scheduling type. `NORMAL`: periodic node.'
,target_cron_type string comment 'Downstream node scheduling cycle. Values: `MINUTELY`, `HOURLY`, `DAILY`, `MONTHLY`, `WEEKLY`.'
,target_project_id bigint comment 'Downstream instance project ID.'
,target_project_name string comment 'Downstream instance project name.'
,dagrun_type string comment 'Task run type. Values: `NORMAL` (periodic), `MANUAL`, `SUPPLEMENT` (backfill), `TEMP` (temporary).'
,is_self_depend string comment 'Self-dependency flag.'
,env string comment 'Environment. Values: `PROD`, `DEV`.'
,gmt_create string comment 'Creation timestamp.'
,gmt_modified string comment 'Last modification timestamp.'
,is_cross_tenant string comment 'Cross-tenant dependency flag (`Y`/`N`).'
,source_tenant_id bigint comment 'Upstream node tenant ID.'
,target_tenant_id bigint comment 'Downstream node tenant ID.'
)
comment 'Physical scheduling dependencies between instances.'
partitioned by (
ds string comment 'Partition key, typically a date string (e.g., `YYYYMMDD`).'
) stored as orc;
dim_dataphin_monitored_rule
create table if not exists dim_dataphin_monitored_rule(
tenant_id bigint comment 'The tenant ID.',
monitored_rule_id bigint comment 'The monitoring configuration ID.',
monitored_rule_type string comment 'The monitoring configuration type. Valid values: LOGIC_FIELD: logical table field, SCHEDULE_PHYSICAL_NODE: scheduled physical task, MANUAL_PHYSICAL_NODE: manual physical task, LOGICAL_TABLE_NODE: logical table task, and PROJECT: project-level.',
switch_status string comment 'The switch status. Valid values: TRUE and FALSE.',
external_node_id string comment 'The ID of the monitored object. The object type is determined by the monitored_rule_type value.',
external_node_name string comment 'The name of the monitored object.',
project_id bigint comment 'The project ID.',
project_name string comment 'The project name.',
model_id string comment 'The ID of the logical table. Populated only when monitored_rule_type is LOGIC_FIELD.',
model_name string comment 'The name of the logical table. Populated only when monitored_rule_type is LOGIC_FIELD.',
biz_unit_id string comment 'The business unit ID of the logical table. Populated only when monitored_rule_type is LOGIC_FIELD.',
biz_unit_name string comment 'The business unit name of the logical table. Populated only when monitored_rule_type is LOGIC_FIELD.',
alert_reason_type string comment 'The alert reason. Valid values: DATA_DELAY: data delay, FAILED: failed run, SUCCESS: successful run, TIME_OUT: timeout, and UNFINISHED: unfinished task.',
alert_reason_config string comment 'The alert reason configuration.',
alert_strategy string comment 'The alert notification policy.',
owner_id string comment 'The owner ID.',
owner_name string comment 'The owner name.',
last_modifier_id string comment 'The last modifier ID.',
last_modifier_name string comment 'The last modifier name.',
gmt_create string comment 'The creation time.',
gmt_modified string comment 'The modification time.'
) comment 'The O&M monitoring configuration table.'
partitioned by (ds string comment 'The partition field, in yyyymmdd format.')
lifecycle 30
;
fct_dataphin_dlink_node_taskrun_step_metric_di
create table if not exists fct_dataphin_dlink_node_taskrun_step_metric_di(
tenant_id bigint comment 'The tenant ID.',
project_id bigint comment 'The project ID.',
project_name string comment 'The project name.',
taskrun_id string comment 'The ID of the task run.',
engine_type string comment 'Engine type: `dlink`.',
env string comment 'Environment. Valid values: `DEV` and `PROD`.',
gmt_create string comment 'The creation time.',
dagrun_type string comment 'Task run type. `NORMAL`: scheduled run, `MANUAL`: manual run, `SUPPLEMENT`: backfill, `TEMP`: temporary run.',
taskrun_status string comment 'Task run status. Valid values include: `INIT`, `WAIT_SCHEDULE`, `DISPATCH_TO_AGENT_BLOCKED`, `WAIT_SUBMISSION`, `WAIT_RESOURCE`, `RUNNING`, `FAILED`, `KILLING`, `KILLED`, `PAUSED`, `SUCCESS`, and `TASK_NOT_EXISTS`.',
exception_info string comment 'Exception details if the task run failed.',
total_time bigint comment 'The total duration of the task run, in seconds.',
start_time string comment 'The start time of the task run.',
end_time string comment 'The end time of the task run.',
node_id string comment 'The node ID.',
schedule_interval_type string comment 'Node scheduling interval. Valid values: `MINUTELY`, `HOURLY`, `DAILY`, `MONTHLY`, `WEEKLY`, and `YEARLY`.',
node_config_memory bigint comment 'The memory configured for the node, in MB.',
node_config_cpu double comment 'The number of CPU cores configured for the node.',
node_config_paralle bigint comment 'The global parallelism configured for the node.',
owner_id string comment 'The ID of the node owner.',
owner_name string comment 'The name of the node owner.',
task_id string comment 'The task ID.',
task_name string comment 'The task name.',
biz_date string comment 'The business date.',
execute_ip string comment 'IP address of the machine that ran the task run.',
total_lines_reader bigint comment 'Total records successfully read by the task run.',
total_lines_written bigint comment 'Total records successfully written by the task run.',
total_lines_rejected bigint comment 'Total number of dirty data records in the task run.',
total_bytes_reader bigint comment 'Total bytes successfully read by the task run.',
total_bytes_written bigint comment 'Total bytes successfully written by the task run.',
total_wait_reader_time bigint comment 'The total wait time for read operations in the task run, in milliseconds.',
total_wait_writer_time bigint comment 'The total wait time for write operations during the task run, in milliseconds.',
avg_reader_record bigint comment 'Average records read per second in the task run.',
avg_writer_record bigint comment 'Average records written per second in the task run.',
avg_reader_bytes double comment 'Average bytes read per second in the task run.',
avg_writer_bytes double comment 'Average bytes written per second in the task run.',
taskrun_content string comment 'The JSON content used for the task run.',
memory_info string comment 'Memory usage details from the task run.',
average_cpu double comment 'The average CPU load during the task run, as a percentage.',
gc_time bigint comment 'The total garbage collection (GC) time during the task run, in milliseconds.',
step_name string comment 'The name of the step.',
category string comment 'Step category. Valid values: `reader`, `writer`, `transform`, `filter`, and `map`.',
step_type string comment 'The type of the dlink step.',
step_ds_type string comment 'Step source type. `COMPUTE`: compute source, `DATASOURCE`: data source, `LOGICTABLE`: logical table, `OTHER`: other.',
ds_id bigint comment 'The ID of the data source or compute source.',
ds_name string comment 'The name of the data source or compute source.',
ds_project_id bigint comment 'ID of the project bound to the compute source. Populated only when the compute source is a project.',
ds_project_name string comment 'Name of the project bound to the compute source. Populated only when the compute source is a project.',
ds_biz_unit_id string comment 'ID of the business unit for the logical table. Populated only when the step reads from a logical table.',
ds_biz_unit_name string comment 'Name of the business unit for the logical table. Populated only when the step reads from a logical table.',
ds_type string comment 'The type of the data source or compute source.',
table_name string comment 'The table read from or written to by the step. If multiple tables are involved, only one is shown.',
tables_list string comment 'A comma-separated list of all tables read from or written to by the step.',
step_lines_reader bigint comment 'Total records read by the step.',
step_lines_written bigint comment 'Total records written by the step.',
step_lines_rejected bigint comment 'Total number of dirty data records for the step.',
step_bytes_reader bigint comment 'Total bytes read by the step.',
step_bytes_written bigint comment 'Total bytes written by the step.',
step_wait_reader_time bigint comment 'The wait time for read operations in the step, in milliseconds.',
step_wait_writer_time bigint comment 'The wait time for write operations in the step, in milliseconds.'
) comment 'Step-level runtime statistics for integration task nodes. Each record is uniquely identified by `taskrun_id` and `step_name`.'
partitioned by (ds string comment 'The partition key, formatted as `yyyymmdd`.')
lifecycle 180
;
fct_dataphin_monitor_alert_event_di
create table if not exists fct_dataphin_monitor_alert_event_di
(
tenant_id bigint comment 'The tenant ID.',
alert_event_id bigint comment 'The ID of the alert event.',
source_system string comment 'The source system. Valid values: DQE for data quality, KGB for baseline, VDM_BATCH for scheduling and O&M, OS for data service, STREAM for real-time computing, REAL_TIME_PIPELINE for real-time integration, any value prefixed with QD_ for QD, and CUSTOM for custom.',
alert_object_name string comment 'The name of the object that triggered the alert.',
alert_object_type string comment 'The type of the object that triggered the alert. Valid values: REAL_TIME_PIPELINE_TASK for a real-time computing task, OS_API for an API interface, OS_APPLICATION_SERVICE for a data service application, KGB for a baseline, QD_DECISION_INVOKE for a tag system, CUSTOM for a custom object, any value prefixed with VDM_ for an offline task, and any value prefixed with DQE_ for a data quality rule.',
project_id bigint comment 'The project ID.',
project_name string comment 'The project name.',
biz_unit_id bigint comment 'The business unit ID.',
biz_unit_name string comment 'The business unit name.',
biz_date string comment 'The business date.',
alert_reason string comment 'The reason for the alert. For a list of possible values, see the product documentation.',
alert_reason_params string comment 'Parameters for the alert reason.',
latest_alert_time string comment 'Time of the most recent alert.',
first_alert_time string comment 'Time of the first alert.',
url_config string comment 'The link information.',
alert_sending_config string comment 'Alert frequency settings.',
status string comment 'The alert status. Valid values: ALERTING, DO_NOT_DISTURB, SILENCING (alerting during a silence period), and FINISH (completed).',
do_not_disturb_end_time string comment 'The end time of the do not disturb period.',
total_alert_times bigint comment 'The total number of times the alert has triggered.',
alert_channel_list string comment 'A list of alert channels. Valid values: VOICE (phone call), SMS (text message), MAIL (email), DINGTALK_ROBOT (DingTalk robot), DINGDING (DingTalk work notification), CUSTOM (custom channel), and SILENCE (do not send).',
alert_user_list string comment 'A list of alert recipients.',
alert_advanced_config string comment 'Advanced configuration settings.',
owner_id string comment 'The ID of the owner.',
owner_name string comment 'The name of the owner.',
on_call_table_list string comment 'A list of on-call schedules.',
gmt_create string comment 'The creation time of the alert event.',
gmt_modified string comment 'The modification time of the alert event.',
source_system_id string comment 'The value depends on `source_system`. For `KGB`, this is `dim_dataphin_baseline_object.baseline_id`. For `VDM_BATCH`, this is `dim_dataphin_monitored_rule.monitored_rule_id`.',
source_system_biz_id string comment 'The value depends on `source_system`. For `DQE`, this is `dim_dataphin_dp_quality_rule.rule_id`. For `VDM_BATCH`, this is `dim_dataphin_node.node_id` or `dim_dataphin_model.model_id`.'
)comment 'The alert event table, incrementally updated daily based on `first_alert_time`.'
partitioned by (ds string comment 'The partition column, in yyyymmdd format.')
lifecycle 90;
The alert_reason enum values are:
-
Data quality:
-
DQE_COLUMN: Column rule exception. -
DQE_DATA_SOURCE: Data source rule exception. -
DQE_TABLE: Table rule exception. -
DQE_REALTIME_TABLE: Real-time table rule exception. -
DQE_INDEX: Metric rule exception.
-
-
Data service:
-
OS_AVG_RESPONSE: Average response time exception. -
OS_CALL_TIMES: Number of calls exception. -
OS_ERROR_RATE: Error rate exception. -
OS_OFFLINE: Offline percentage exception.
-
-
Batch task:
-
VDM_BATCH_ERROR: Task error. -
VDM_BATCH_FINISH: Task finished successfully. -
VDM_BATCH_TIME_OUT: Execution timeout. -
VDM_BATCH_UNDONE: Task incomplete. -
VDM_BATCH_LOGIC_DATA_DELAY: Logical table data delay.
-
-
Baseline:
-
KGB_TASK_ERROR: Baseline task error. -
KGB_TASK_SLOW_DOWN: Baseline task slowdown. -
KGB_EARLY_WARNING: Baseline early warning. -
KGB_BROKEN_LINE: Baseline breach.
-
-
Real-time computing and real-time integration:
-
STREAM_BIZ_DELAY: High business latency. -
STREAM_DATA_RETENTION: Data backlog exceeds the configured limit. -
STREAM_MORE_THAN_FAILURE: Failure frequency exceeds the configured limit. -
STREAM_TPS_OUT_RANGE: TPS is outside the configured range. -
STREAM_CHECKPOINT_FAILURE: Checkpoint failures exceed the configured limit. -
STREAM_BACKPRESSURE: Backpressure duration exceeds the configured limit. -
STREAM_JOB_FAILURE: Job failure.
-
-
Custom:
-
LOGICAL_INSTANCE_GENERATION: Logical instance generation alert.
-
fct_dataphin_monitor_alert_notification_di
The enumeration values for alert_reason are defined in the fct_dataphin_monitor_alert_event_di table.
create table if not exists fct_dataphin_monitor_alert_notification_di(
tenant_id bigint comment 'The tenant ID.',
alert_notification_id bigint comment 'The ID of the notification record.',
alert_event_id bigint comment 'The ID of the alert event. This ID corresponds to `alert_event_id` in the `fct_dataphin_monitor_alert_event_di` table. A value of -1 indicates consolidated alert events.',
source_system string comment 'The source system. Valid values: `DQE` for data quality, `KGB` for baseline, `VDM_BATCH` for scheduling and O&M, `OS` for data service, `STREAM` for real-time computing, `REAL_TIME_PIPELINE` for real-time integration, any value prefixed with `QD_` for QD, and `CUSTOM` for custom.',
alert_object_name string comment 'The name of the object that triggered the alert.',
alert_object_type string comment 'The type of object that triggered the alert. Valid values: `REAL_TIME_PIPELINE_TASK` for a real-time computing task, `OS_API` for an API, `OS_APPLICATION_SERVICE` for a data service application, `KGB` for a baseline, `QD_DECISION_INVOKE` for a tagging system, `CUSTOM` for a custom object, any value prefixed with `VDM_` for an offline task, and any value prefixed with `DQE_` for a data quality rule.',
notify_time string comment 'The notification time.',
biz_date string comment 'The business date.',
alert_reason string comment 'The reason for the alert. Refer to the relevant documentation for a list of enumerated values.',
alert_reason_params string comment 'The parameters for the alert reason.',
message_content string comment 'The content of the alert notification.',
receiver_type string comment 'The receiver type. Valid values: `ON_CALL_TABLE` for an on-call schedule, `USER_DEFINED` for a custom user, and `OWNER` for the owner.',
alert_channel string comment 'The notification channel.',
alert_user string comment 'The notification receiver.',
on_call_table_id bigint comment 'The ID of the on-call schedule.',
on_call_table_name string comment 'The name of the on-call schedule.',
notify_status string comment 'The notification status. Valid values: `SUCCESS`, `FAILED`, `SENDING`, and `UNKNOWN`.',
fail_reason string comment 'The reason for a `FAILED` notification status.',
gmt_create string comment 'The creation time.',
gmt_modified string comment 'The modification time.'
)comment 'The alert notification record table. Daily increments are determined by the `notify_time` field.'
partitioned by (ds string comment 'The partition field, using the `yyyymmdd` format.')
lifecycle 90;
Service operations
dim_dataphin_model_node
create table if not exists dim_dataphin_model_node(
tenant_id bigint comment 'Tenant ID.',
model_node_id string comment 'Logical table node ID.',
model_node_name string comment 'Logical table node name.',
model_node_name_cn string comment 'Chinese name of the logical table node.',
model_id bigint comment 'Logical table ID.',
model_name string comment 'Logical table name.',
model_name_cn string comment 'Chinese name of the logical table.',
model_type string comment 'Logical table type.',
control_node_id string comment 'Control node ID. Corresponds to the `node_id` field in the `dim_dataphin_vdm_node` table.',
cron_type string comment 'Scheduling cycle. Valid values: MINUTELY, HOURLY, DAILY, MONTHLY, and WEEKLY.',
fix_time string comment 'Scheduled time.',
is_paused string comment 'Indicates whether the node is paused. Valid values: Y, N.',
field_cnt bigint comment 'Number of fields.',
field_group_cnt bigint comment 'Number of field groups.',
priority string comment 'Priority.',
env string comment 'Environment. Valid values: DEV (development) and PROD (production).',
is_level string comment 'Indicates whether this is a hierarchical dimension.',
level_mid_node_id string comment 'ID of the intermediate node for the hierarchical dimension.',
level_mid_node_name string comment 'Name of the intermediate node for the hierarchical dimension.',
resource_group string comment 'Resource group.',
task_group string comment 'Task group.',
project_id bigint comment 'Project ID.',
project_name string comment 'Project name.',
biz_unit_id bigint comment 'Business unit ID.',
biz_unit_name string comment 'Business unit name.',
owner_id string comment 'Owner ID.',
owner_name string comment 'Owner name.',
gmt_create string comment 'Creation time.',
gmt_modified string comment 'Modification time.'
) comment 'Logical table scheduling node.'
partitioned by (ds string comment 'Partition field. Format: yyyymmdd.')
lifecycle 30;
fct_dataphin_model_node_task_di
create table if not exists fct_dataphin_model_node_task_di(
tenant_id bigint comment 'The tenant ID.',
model_task_id string comment 'The ID of the logical table scheduling instance.',
model_node_id string comment 'The ID of the logical table node.',
model_node_name string comment 'The name of the logical table node.',
model_node_name_cn string comment 'The Chinese name of the logical table node.',
model_task_type string comment 'The task run type. Valid values: NORMAL for a periodic run, MANUAL for a manual run, SUPPLEMENT for a backfill, and TEMP for a temporary run.',
start_time string comment 'The instance start time.',
end_time string comment 'The instance end time.',
model_id bigint comment 'The ID of the logical table.',
model_name string comment 'The name of the logical table.',
model_name_cn string comment 'The Chinese name of the logical table.',
model_type string comment 'The type of the logical table.',
control_node_id string comment 'The ID of the control node. This corresponds to the node_id in the dim_dataphin_vdm_node table.',
control_node_task_id string comment 'The ID of the control node instance. This corresponds to the task_id in the dwd_dataphin_vdm_node_taskrun_di table.',
cron_type string comment 'The scheduling period. Valid values: MINUTE, HOUR, DAY, MONTH, and WEEK.',
fix_time string comment 'The scheduled time.',
biz_date string comment 'The business date.',
is_paused string comment 'Specifies if the task is paused. Valid values: Y and N.',
paused_type string comment 'The type of pause.',
initialized string comment 'Specifies if the task is initialized. Valid values: Y and N.',
field_cnt bigint comment 'The number of fields.',
field_group_cnt bigint comment 'The number of field groups.',
priority string comment 'The priority of the task.',
internal_status string comment 'An internal status field. You can ignore this field.',
env string comment 'The environment. Valid values: DEV for the development environment and PROD for the production environment.',
is_level string comment 'Specifies if this is a hierarchical dimension.',
level_mid_node_id string comment 'The ID of the intermediate node for the hierarchical dimension.',
level_mid_node_name string comment 'The name of the intermediate node for the hierarchical dimension.',
resource_group string comment 'The resource group.',
task_group string comment 'The task group.',
project_id bigint comment 'The ID of the project space.',
project_name string comment 'The name of the project space.',
biz_unit_id bigint comment 'The ID of the business unit.',
biz_unit_name string comment 'The name of the business unit.',
owner_id string comment 'The ID of the owner.',
owner_name string comment 'The name of the owner.',
gmt_create string comment 'The creation time.',
gmt_modified string comment 'The modification time.'
) comment 'Contains a record for each logical table scheduling instance.'
partitioned by (ds string comment 'The partition field, in yyyymmdd format.')
lifecycle 30;
Engine
Dataphin MaxCompute data integration
-
The data in this table is for reference only. For the most accurate MaxCompute job information, refer to the source data in MaxCompute.
-
This table contains job data only for MaxCompute projects linked to both a Dataphin project and a computing engine.
-
This table is archived by business date based on the
start_time. As a result, information like status, end time, and cost for cross-day jobs may be inaccurate and is not backfilled. -
Dataphin-related information is available only for tasks submitted through Dataphin. This information is not available for tasks that connect directly to MaxCompute, such as meta-warehouse tasks, Dlink tasks for creating tables or adding partitions, or other tasks executed directly on MaxCompute.
-
The table uses the MaxCompute project name to associate records. For tasks not submitted through Dataphin, the record is associated with the Dataphin project that shares the computing engine of the MaxCompute project. Note that if a MaxCompute project is linked to multiple Dataphin tenants, you may encounter duplicate
inst_idrecords with differentdataphin_tenant_idvalues.
create table if not exists fct_dataphin_maxcompute_task_di(
maxcompute_project_name string comment 'The name of the MaxCompute project.',
task_name string comment 'The name of the MaxCompute task.',
task_type string comment 'The type of the task. Valid values: SQL for an SQL task, CUPID for a Spark or Mars task, SQLCost for an SQL cost estimation task, SQLRT for a query acceleration SQL task, LOT for a MapReduce task, and PS for a PAI Parameter Server task.',
inst_id string comment 'The unique ID of the MaxCompute instance.',
inst_status string comment 'The task''s execution status at the time of data collection (not a real-time value). Valid values: `Terminated` (the task finished), `Failed` (the task failed), `Canceled` (the task was canceled), or `Other` (any other state).',
inst_owner_name string comment 'The name of the Alibaba Cloud account that submitted the task.',
`result` string comment 'The error message if the task failed. This field is null for successful tasks.',
start_time string comment 'The time the task started, in `yyyy-mm-dd hh:mi:ss` format.',
end_time string comment 'The time the task ended, in `yyyy-mm-dd hh:mi:ss` format.',
input_records bigint comment 'The number of input records.',
output_records bigint comment 'The number of output records.',
input_bytes bigint comment 'The volume of input data, in bytes.',
output_bytes bigint comment 'The volume of output data, in bytes.',
input_tables string comment 'A list of input tables. Applies to SQL tasks only.',
output_tables string comment 'A list of output tables. Applies to SQL tasks only.',
operation_text string comment 'The statement executed by the task. Applies to SQL tasks only.',
cost_cpu double comment 'The CPU consumption of the task, where 100 represents 1 core-second (1 core × 1 s). For example, a task using 10 cores for 5 seconds has a cost_cpu of (10 × 5) × 100 = 5000.',
cost_mem double comment 'The memory consumption of the task, measured in MB-seconds (MB × seconds).',
settings string comment 'Information from an upstream scheduler or a user, stored in JSON format.',
dataphin_tenant_id string comment 'The ID of the DataPhin tenant.',
dataphin_project_id bigint comment 'The ID of the DataPhin project.',
dataphin_project_name string comment 'The name of the DataPhin project.',
dataphin_node_id string comment 'The ID of the DataPhin node.',
dataphin_node_name string comment 'The name of the DataPhin node.',
dataphin_node_owner_id string comment 'The ID of the DataPhin account that owns the node.',
dataphin_node_owner_name string comment 'The name of the DataPhin account that owns the node.',
dataphin_task_id string comment 'The ID of the DataPhin task.',
dataphin_taskrun_id string comment 'The ID of the DataPhin task run.',
dataphin_taskrun_submitter_id string comment 'The ID of the account that submitted the DataPhin task run.',
dataphin_taskrun_submitter_name string comment 'The name of the account that submitted the DataPhin task run.',
dataphin_is_system_submitter string comment 'Indicates whether the DataPhin system submitted the task run.',
dataphin_dagrun_type string comment 'The type of the DataPhin DAG run. Valid values: `NORMAL` (a scheduled run), `MANUAL` (a manual run or ad-hoc query), and `SUPPLEMENT` (a backfill run).',
dataphin_biz_date string comment 'The business date of the DataPhin task run, in `yyyymmdd` format.',
platform_name string comment 'The name of the source platform. Fixed value: DATAPHIN'
) comment 'MaxCompute task incremental table'
partitioned by (ds string comment 'The partition field, in `yyyymmdd` format. Partitions are based on the `start_time` field.')
lifecycle 15;
OS
fct_dataphin_os_task_table_inputoutput_di
Granularity is defined by the task execution ID, input table ID, input environment, output table ID, and output environment.
create table if not exists fct_dataphin_os_task_table_inputoutput_di (
tenant_id bigint comment 'Tenant ID.',
os_task_id string comment 'Execution task ID.',
os_instance_id string comment 'Execution instance ID.',
os_job_id string comment 'Execution schedule ID.',
engine_job_id string comment 'The job ID from the execution engine. For MaxCompute, this is the instance ID.',
vdm_task_id string comment 'VDM task ID.',
vdm_taskrun_id string comment 'VDM task run ID.',
os_task_source string comment 'Execution task source.',
biz_id string comment 'Business ID.',
dagrun_type string comment 'The run type of the task. Valid values: NORMAL for a scheduled run, MANUAL for a manual run, SUPPLEMENT for a backfill run, and TEMP for a temporary run.',
operator_type string comment 'The task type. Valid values: ONE_SERVICE_SQL, MAX_COMPUTE_SQL, DATAX, DLINK, FLINK_STREAMING, MAX_COMPUTE_MR, PYTHON, SHELL, and VIRTUAL.',
node_id string comment 'Node ID.',
sql_type string comment 'SQL task type, such as DML.',
gmt_create string comment 'Creation time.',
run_time string comment 'Task run start time.',
env string comment 'The environment. Valid values: DEV for the development environment and PROD for the production environment.',
input_project_id bigint comment 'Input project ID.',
input_project_name string comment 'Input project name.',
input_biz_unit_id bigint comment 'Input business unit ID.',
input_biz_unit_name string comment 'Input business unit name.',
input_table_id string comment 'Input table ID.',
input_table_name string comment 'Input table name.',
input_env string comment 'The input environment. Valid values: DEV for the development environment and PROD for the production environment.',
input_is_logical_table string comment 'Specifies whether the input is a logical table. Valid values: Y and N.',
output_project_id bigint comment 'Output project ID.',
output_project_name string comment 'Output project name.',
output_biz_unit_id bigint comment 'Output business unit ID.',
output_biz_unit_name string comment 'Output business unit name.',
output_table_id string comment 'Output table ID.',
output_table_name string comment 'Output table name.',
output_env string comment 'The output environment. Valid values: DEV for the development environment and PROD for the production environment.',
output_is_logical_table string comment 'Specifies whether the output is a logical table. Valid values: Y and N.'
)comment 'Input and output tables parsed from OS SQL tasks.'
partitioned by (ds string comment 'The partition key, in yyyymmdd format.')
lifecycle 90;
Development
fct_dataphin_submit_record_df
-
This table lists submission records for computation tasks, intelligent modeling, physical table management, and data integration.
-
The combination of
submit_idandsubmit_typeuniquely identifies each record. -
Notes for the
object_typefield:-
If
submit_typeisNODE,object_typecan beMANUALfor a manual task orNORMALfor a scheduled task. -
If
submit_typeisDLINK,object_typecan beOFFLINE_PIPELINEfor offline data integration orREAL_TIME_PIPELINEfor real-time data integration. -
If
submit_typeisLOGIC,object_typecan be one of the following:FACT_LOGIC_TABLEfor a fact logic table,SUMMARY_LOGIC_TABLEfor a summary logic table,DIM_LOGIC_TABLEfor a dimension logic table,BIZ_PROCESSfor a business process,BIZ_OBJECTfor a business object,BIZ_CONDITIONfor a business condition,ATOM_INDEXfor an atomic indicator, orDERIVED_INDEXfor a derived indicator. -
If
submit_typeisTABLE,object_typecan beALTERfor an alter table operation,CREATEfor a create table operation, orDELETEfor a drop table operation.
-
-
The
project_id/project_namefields are null whensubmit_type='LOGIC' and object_type in ('BIZ_OBJECT','BIZ_PROCESS'). For historical records wheresubmit_type='LOGIC' and object_type = 'SUMMARY_LOGIC_TABLE', these fields are also null. -
In Basic mode (
project_mode=BASIC), submissions are published immediately, so allis_publishedflags are set to 'N'.
create table if not exists fct_dataphin_submit_record_df(
submit_id string comment 'The submission ID.',
submit_type string comment 'The submission type. Valid values: NODE for a code task, LOGIC for intelligent modeling, TABLE for physical table management, and DLINK for data integration.',
tenant_id bigint comment 'The tenant ID.',
object_id string comment 'The object ID. The specific type of this ID depends on the submit_type and object_type fields.',
object_type string comment 'The type of the submitted object. See the documentation for details.',
object_name string comment 'The object name.',
object_version string comment 'The object version.',
object_content string comment 'The content of the submitted object.',
object_owner_id string comment 'The object owner ID.',
object_owner_name string comment 'The object owner name.',
submit_comment string comment 'The submission comment.',
submit_user_id string comment 'The submitting user ID.',
submit_user_name string comment 'The submitting user''s name.',
is_published string comment 'The published status. Valid values: Y or N.',
project_id bigint comment 'The project ID.',
project_name string comment 'The project name.',
project_mode string comment 'The project mode. Valid values: BASIC for basic mode and DEV_PROD for development/production mode.',
biz_unit_id string comment 'The business unit ID.',
biz_unit_name string comment 'The business unit name.',
submit_time string comment 'The submission time.'
)comment 'Contains all submission records in Dataphin.'
partitioned by (ds string comment 'The partition field, in yyyymmdd format.')
lifecycle 7;
fct_dataphin_publish_record_df
The object_type field has the following values:
-
Development:
-
VIRTUAL: virtual task;SHELL: shell script;PYTHON: Python task;UDF: user-defined function;IDE_RESOURCE: custom resource;MAX_COMPUTE_SQL: MaxCompute SQL task;HIVE_SQL: Hive SQL task;DATABASE_SQL: database SQL task;SPARK_JAR_ON_HIVE: Spark on Hive task;SPARK_JAR_ON_MAX_COMPUTE: Spark on MaxCompute task;STARROCKS_SQL: StarRocks SQL task;ADB_PG_SQL: ADB for PostgreSQL SQL task;DATA_PROCESS_PHYSICAL_TABLE: physical table;TASK_TEMPLATE: computing template;ARGODB_SQL: ArgoDB SQL task;HOLOGRES_SQL: Hologres SQL task;IMPALA_SQL: Impala SQL task.
-
-
Integration:
-
OFFLINE_PIPELINE: offline data integration;REAL_TIME_PIPELINE: real-time data integration.
-
-
Intelligent Modeling:
-
FACT_LOGIC_TABLE: fact logical table;SUMMARY_LOGIC_TABLE: summary logical table;DIM_LOGIC_TABLE: dimension logical table;BIZ_PROCESS: business process;BIZ_OBJECT: business object;BIZ_CONDITION: business qualifier;ATOM_INDEX: atomic metric;DERIVED_INDEX: derived metric;LOGICAL_TABLE_COLUMN: logical table column;DATA_DOMAIN: data domain;BIZ_UNIT: business unit.
-
-
Real-time:
-
FLINK_SQL: real-time SQL task;META_TABLE: real-time metatable;MIRROR_TABLE: real-time mirror table.
-
-
Any value prefixed with
QUALITY_: Pertains to data quality. -
Any value prefixed with
DATA_SECURITY_: Pertains to data security. -
Any value prefixed with
DATA_STANDARD_: Pertains to data standard. -
Any value prefixed with
QD_: Pertains to the intelligent decision engine.
create table if not exists fct_dataphin_publish_record_df(
tenant_id bigint comment 'Tenant ID.',
object_id string comment 'Object ID.',
object_type string comment 'Object type.',
object_name string comment 'Object name.',
object_version string comment 'Object version.',
change_type string comment 'Change type. Valid values: CREATE, UPDATE, DELETE.',
publish_id string comment 'Publish ID.',
publish_name string comment 'Publish name.',
publish_status string comment 'Publish status. Valid values: NOT_VERIFIED, APPROVING, APPROVE_FAILED, APPROVE_SUCCESS, APPROVE_REVOKED, CHECK_FAILED, CHECK_SUCCESS, PUBLISHING, FAILED, SUCCESS.',
publish_error string comment 'Publish error message.',
publisher_id string comment 'Publisher ID.',
publisher_name string comment 'Publisher name.',
group_id string comment 'Publish group ID.',
group_name string comment 'Publish group name.',
finish_time string comment 'Completion time.',
data_domain_id bigint comment 'Data domain ID.',
data_domain_name string comment 'Data domain name.',
node_id string comment 'Node ID. Corresponds to the node_id in the dim_dataphin_node table.',
gmt_create string comment 'Creation time.',
gmt_modify string comment 'Modification time.',
project_id bigint comment 'Project ID.',
project_name string comment 'Project name.',
biz_unit_id bigint comment 'Business unit ID.',
biz_unit_name string comment 'Business unit name.',
extend_properties string comment 'Extended object properties.',
tag string comment 'Tag for data migration and marking. Valid values: NORMAL_SUBMIT (user-initiated, UI-visible), SYSTEM_SUBMIT (system-initiated, not UI-visible), UPGRADE (system upgrades).',
public_status_info string comment 'Extended publish status information in JSON format.',
sub_item_list string comment 'JSON string of sub-items for display only.',
sync_publish string comment 'Indicates if the publish is synchronous. Valid values: SYNC, ASYNC.',
biz_tag string comment 'Business tag. Example: BATCH_PUBLISH.'
)comment 'Dataphin publish records.'
partitioned by (ds string comment 'Partition key in yyyymmdd format.')
lifecycle 15;
Permissions
fct_dataphin_security_auth_permission_record_df
-
Data service permissions are not currently included.
-
Description of the
resource_typefield:-
Planning:
BIZ_UNIT: business unit;DATA_DOMAIN: data domain;PROJECT: project;GLOBAL_PARAM: global parameter. -
Data source:
DATA_SOURCE: data source. -
Physical table:
PHYSICAL_TABLE: physical table. -
Function:
FUNCTION: function. -
Logical table:
LOGICAL_FIELD: logical table field;PHYSICAL_FIELD: physical table field. -
Data service:
OS_API: data service API;OS_API_FIELD: data service API field;OS_APP: data service application;OS_LOGIC_UNIT: data service unit;OS_DS: data service data source;OS_DS_TABLE: data service data source table;OS_DS_FIELD: data service data source table field;OS_DS_LOGICAL_TABLE: data service logical table;OS_DS_LOGICAL_FIELD: data service logical table field. -
Resource group:
RS_RESOURCE_GROUP: RS resource group. -
Meta table:
REALTIME_LOGICAL_TABLE: real-time meta table;REALTIME_LOGICAL_FIELD: real-time meta table field. -
Mirror table:
REALTIME_MIRROR_TABLE: real-time mirror table;REALTIME_MIRROR_FIELD: real-time mirror table field. -
View:
FEATURE: physical view;PHYSICAL_VIEW: physical view;LOGICAL_VIEW: logical view;PHYSICAL_MATERIALIZED_VIEW: materialized view. -
Secret key:
SECRET_KEY: secret key. -
Tag:
QD_SERVICE_FEATURE: service tag;QD_OFFLINE_FEATURE: offline tag;QD_REALTIME_FEATURE: real-time tag;QD_ADVANCED_FEATURE: advanced tag;QD_CLUSTER: cluster;QD_EVENT: event;QD_OFFLINE_SERVICE: offline tag service task.
-
-
We recommend that you do not rely on the
resource_ownersfield.-
Before V4.3, this field listed the approvers from the default
system approval template(the initial, unedited version). Modifying the template did not update this field. Fordata sourcetables, this field was empty. -
Starting in V4.3:
-
For table resources, the field now contains the
table ownerconfigured on the asset inventory page in Dataphin. Fordata sourcetables, this field remains empty. -
For
functionandsecret keyresources, the logic is unchanged. The field still lists approvers from thesystem approval template.
-
-
create table if not exists fct_dataphin_security_auth_permission_record_df (
record_id bigint comment 'The unique ID of the permission record.',
tenant_id bigint comment 'The ID of the tenant.',
permission_source string comment 'The permission source. Valid values: `BY_APPLY` (from a permission application) and `BY_GRANT` (from a direct permission grant).',
account_type string comment 'The account type. Valid values: `PERSONAL` (personal account), `USER_GROUP` (user group), and `PRODUCE` (production account).',
account_user_id string comment 'The ID of the account.',
account_user_name string comment 'The name of the account.',
account_project_id string comment 'The ID of the project associated with the production account.',
account_project_name string comment 'The name of the project associated with the production account.',
resource_id string comment 'The ID of the resource associated with the permission.',
resource_type string comment 'The type of the resource.',
resource_env string comment 'The resource environment. Valid values: `PROD` and `DEV`.',
resource_name string comment 'The name of the resource.',
resource_cn_name string comment 'The Chinese name of the resource.',
resource_owners string comment 'The owners of the resource.',
resource_project_id string comment 'The ID of the project that contains the resource.',
resource_project_name string comment 'The name of the project that contains the resource.',
resource_project_env string comment 'The environment of the project. Valid values: `PROD` and `DEV`.',
resource_project_cn_name string comment 'The Chinese name of the project that contains the resource.',
resource_biz_unit_id string comment 'The ID of the business unit that contains the resource.',
resource_biz_unit_env string comment 'The environment of the business unit. Valid values: `PROD` and `DEV`.',
resource_biz_unit_name string comment 'The name of the business unit that contains the resource.',
resource_biz_unit_cn_name string comment 'The Chinese name of the business unit that contains the resource.',
resource_properties string comment 'Resource properties in JSON format.',
operate_periods string comment 'Detailed permission information, in JSON format.',
nearest_period_type string comment 'The expiration type for the permission closest to expiry. Valid values: `DAYS_30` (30 days), `DAYS_90` (90 days), `DAYS_180` (180 days), `DAYS_365` (365 days), `LONG_TERM`, and `CUSTOM`.',
nearest_period_end string comment 'The expiration date for the permission closest to expiry.',
is_invalid string comment 'Indicates if the permission is invalid. Valid values: `Y` (invalid) and `N` (valid).',
invalid_date string comment 'The timestamp when the permission became invalid.',
auth_scope string comment 'The authorization scope. Valid values: `SELECT_TABLE` (a specific table), `PROJECT_ALL_TABLE` (all tables in a project), and `BIZ_UNIT_ALL_LOGIC_TABLE` (all logical tables in a business unit).',
creator_id string comment 'The ID of the user who created the record.',
creator_name string comment 'The name of the user who created the record.',
modifier_id string comment 'The ID of the user who last modified the record.',
modifier_name string comment 'The name of the user who last modified the record.',
gmt_create string comment 'The timestamp when the record was created.',
gmt_modified string comment 'The timestamp when the record was last modified.',
column_level_auth string comment 'Column-level permission in JSON format. This field applies only to table-level permission applications.'
)comment 'Stores Dataphin permission records.'
partitioned by (ds string comment 'The date partition key (format: `yyyymmdd`).')
stored as orc;
Baseline
dim_dataphin_baseline_object
create table if not exists dim_dataphin_baseline_object(
tenant_id bigint comment 'The tenant ID.',
baseline_id bigint comment 'The baseline ID.',
baseline_name string comment 'The baseline name.',
baseline_type string comment 'The baseline type. Valid values: DAY for a daily baseline, HOUR for an hourly baseline, and EMPTY for an empty baseline.',
baseline_description string comment 'The baseline description.',
promise_times string comment 'The promised completion time, in JSON format.',
warning_times string comment 'The alert time, in JSON format.',
baseline_priority bigint comment 'The baseline priority. Valid values: 1 (Lowest), 2 (Low), 3 (Medium), 4 (High), and 5 (Highest).',
warning_setting string comment 'The baseline alert configuration, in JSON format.',
baseline_warning_switch string comment 'Indicates whether baseline alerts are enabled. Valid values: `TRUE` and `FALSE`.',
baseline_owner_id string comment 'The ID of the baseline owner.',
baseline_owner_name string comment 'The name of the baseline owner.',
baseline_modifier_id string comment 'The ID of the last modifier.',
baseline_modifier_name string comment 'The name of the last modifier.',
object_type string comment 'The object type. Valid values: PHYSICAL_NODE for a physical node and LOGIC_FIELD for a logical table task.',
object_id string comment 'The object ID. Its value depends on `object_type`.',
object_name string comment 'The object name.',
object_catalog string comment 'The object''s catalog, which can be a project or business unit name.',
object_owner_id string comment 'The ID of the object owner.',
object_owner_name string comment 'The name of the object owner.',
object_project_id string comment 'The ID of the project that contains the physical node.',
object_project_name string comment 'The name of the project that contains the physical node.',
object_biz_unit_id string comment 'The ID of the business unit that contains the logical table task.',
object_biz_unit_name string comment 'The name of the business unit that contains the logical table task.',
object_field_names string comment 'A comma-separated list of field names for the logical table.',
object_fix_produce_time string comment 'The object''s fixed production time.',
gmt_create string comment 'The creation time.',
gmt_modified string comment 'The modification time.'
) comment 'Stores baseline monitoring objects.'
partitioned by (ds string comment 'The partition field, in yyyymmdd format.')
lifecycle 30;
Real-time development
dim_dataphin_stream_task
-
For tasks submitted in version 5.1.0 or earlier, the
task_resource_configurationfield may be NULL. We recommend saving and resubmitting them. -
Versions 5.3.0 and earlier do not include real-time integration tasks.
create table if not exists dim_dataphin_stream_task (
tenant_id bigint comment 'Tenant ID.'
,task_id bigint comment 'Task ID.'
,task_name string comment 'Task name.'
,env string comment 'Environment mode. Valid values: DEV, PROD.'
,task_type string comment 'Task type. Valid values: FLINK_SQL, FLINK_DATASTREAM.'
,task_content string comment 'Task code.'
,task_resource_configuration string comment 'Task resource configuration.'
,resource_rellocation_type string comment 'Resource configuration type. Valid values: System Recommended, Custom.'
,customed_resource_context string comment 'Custom resource configuration content.'
,task_state string comment 'Task status. 0: Draft, 1: Submitting, 2: Submitted, 3: Published, 100: In development.'
,task_description string comment 'Task description.'
,project_id bigint comment 'Project ID.'
,project_name string comment 'Project name.'
,biz_unit_id string comment 'Business unit ID.'
,biz_unit_name string comment 'Business unit name.'
,cluster_queue_info string comment 'Cluster information.'
,owner_id string comment 'Owner ID.'
,owner_name string comment 'Owner name.'
,modifier_id string comment 'Modifier ID.'
,modifier_name string comment 'Last modifier name.'
,gmt_create string comment 'Creation time.'
,gmt_modified string comment 'Last modified time.'
)
comment 'Statistics table for real-time tasks. Granularity: tenant_id + task_id.'
partitioned by (
ds string comment 'Partition field, in `yyyymmdd` format.'
)
lifecycle 30;
dim_dataphin_stream_task_related_table
-
table_type: The table type. Valid values:REAL_TIME_LOGIC_TABLE(a logic table),REAL_TIME_MIRROR_TABLE(a mirror table),PHYSICAL_TABLE(a physical table), andDATA_SOURCE_PHYSICAL_TABLE(a data source table). -
table_name: For data sources with schemas, such as Hologres or Oracle, the returned table name includes the schema if it is referenced in the task code. Otherwise, the schema is omitted. -
task_mode: The task mode. Valid values:STREAM(real-time mode) andBATCH(batch mode).A real-time task is created on the O&M page. If you also enable batch mode, an additional scheduled task is also created.
-
Granularity:
-
tenant_id+task_id: Uniquely identifies a real-time task. -
tenant_id+task_id+input_table_id: Uniquely identifies an input table for a real-time task. A single task can have multiple input tables but only one output table. -
tenant_id+task_id+input_table_id+node_id: Uniquely identifies an O&M node. The node can belong to either the real-time or batch mode of the task, as indicated by thetask_modefield.
-
-
Real-time integration tasks are not available in versions V5.3.0 and earlier.
create table if not exists dim_dataphin_stream_task_related_table(
tenant_id bigint comment'Tenant ID.',
task_id string comment'Task ID.',
task_name string comment'Task name.',
task_env string comment'Task environment. Valid values: DEV, PROD.',
output_table_name string comment'Output table name. If the table has a schema (e.g., a Hologres table), the format is schema.table_name.',
output_table_type string comment'Output table type. Valid values: REAL_TIME_LOGIC_TABLE, REAL_TIME_MIRROR_TABLE, PHYSICAL_TABLE, DATA_SOURCE_PHYSICAL_TABLE.',
output_project_id bigint comment'Output project ID.',
output_project_name string comment'Output project name.',
output_biz_unit_id bigint comment'Output business unit ID.',
output_biz_unit_name string comment'Output business unit name.',
output_env string comment'Output environment. Valid values: DEV, PROD.',
output_data_source_id bigint comment'ID of the output data source or compute source.',
output_data_source_name string comment'Name of the output data source or compute source. Specifies the compute source for Dataphin physical tables and the data source for data source tables. Populated for meta tables but empty for mirror tables.',
output_data_source_type string comment'Type of the output data source or compute source.',
input_table_name string comment'Input table name. If the table has a schema (e.g., a Hologres table), the format is schema.table_name.',
input_table_type string comment'Input table type. Valid values: REAL_TIME_LOGIC_TABLE, REAL_TIME_MIRROR_TABLE, PHYSICAL_TABLE, DATA_SOURCE_PHYSICAL_TABLE.',
input_project_id bigint comment'Input project ID.',
input_project_name string comment'Input project name.',
input_biz_unit_id bigint comment'Input business unit ID.',
input_biz_unit_name string comment'Input business unit name.',
input_env string comment'Input environment. Valid values: DEV, PROD.',
input_data_source_id bigint comment'ID of the input data source or compute source.',
input_data_source_name string comment'Name of the input data source or compute source. Specifies the compute source for Dataphin physical tables and the data source for data source tables. Populated for meta tables but empty for mirror tables.',
input_data_source_type string comment'Type of the input data source or compute source.',
task_mode string comment'Task mode. Valid values: STREAM, BATCH.',
node_id string comment'O&M node ID. Corresponds to node_id in the dim_dataphin_node table.',
project_id bigint comment'Project ID.',
project_name string comment'Project name.',
biz_unit_id bigint comment'Business unit ID.',
biz_unit_name string comment'Business unit name.',
owner_id string comment'Owner ID.',
owner_name string comment'Owner name.'
)comment'Association table for real-time tasks and related tables.'
partitioned by (ds string comment 'Partition key in yyyymmdd format.')
lifecycle 30;
fct_dataphin_stream_task_submit_record_df
Versions V5.3.0 and earlier do not support real-time integration tasks.
create table if not exists fct_dataphin_stream_task_submit_record_df(
tenant_id bigint comment 'The tenant ID.',
submit_id string comment 'The submission ID.',
task_id string comment 'The task ID.',
task_name string comment 'The task name.',
task_type string comment 'The task type. Valid values: `FLINK_SQL` for an SQL task and `FLINK_DATASTREAM` for a streaming task.',
submit_version string comment 'The submission version.',
submit_content string comment 'The submission content.',
submit_comment string comment 'The submission comment.',
owner_id string comment 'The owner ID.',
owner_name string comment 'The owner name.',
modifier_id string comment 'The modifier ID.',
modifier_name string comment 'The name of the last modifier.',
is_published string comment 'Indicates whether the task is published. Valid values: Y and N.',
project_id bigint comment 'The project ID.',
project_name string comment 'The project name.',
biz_unit_id bigint comment 'The business unit ID.',
biz_unit_name string comment 'The business unit name.',
gmt_create string comment 'The creation time.',
gmt_modified string comment 'The last modified time.'
)comment 'Contains all records of real-time task submissions in Dataphin. Each record is uniquely identified by the combination of `tenant_id` and `submit_id`.'
partitioned by (ds string comment 'The partition field, formatted as `yyyymmdd`.')
lifecycle 7;
Public
User
dim_dataphin_tenant_user
create table if not exists dim_dataphin_tenant_user(
tenant_id bigint comment 'Tenant ID.',
user_id string comment 'Dataphin user ID.',
account_name string comment 'Account name.',
account_type string comment 'Account type.',
display_name string comment 'Display name.',
nickname string comment 'Nickname.',
real_name string comment 'Real name.',
source_user_id string comment 'User ID in the source system.',
source_type string comment 'Source system type.',
gmt_create string comment 'Creation time.',
gmt_modified string comment 'Modification time.',
mail string comment 'User''s email.',
creator_id string comment 'Creator''s Dataphin user ID.',
creator_name string comment 'Creator''s Dataphin account name.',
modifier_id string comment 'Last modifier''s Dataphin user ID.',
modifier_name string comment 'Last modifier''s Dataphin account name.',
user_status string comment 'User status. Valid values: `NORMAL` (default), `DELETE` (deleted), and `DEACTIVATE` (inactive).',
system_role_list string comment 'System roles (comma-separated). Valid values: `DATASOURCE_MANAGER` (data source administrator), `DATA_STANDARD_MANAGER` (data standard administrator), `EXPORT_ADMIN` (import/export administrator), `LABELS_BUSINESS_PLANNER` (label planner), `QUALITY_MANAGER` (quality administrator), `SECURITY_ADMIN` (security administrator), `SUPER_ADMIN` (super administrator), and `SYSTEM_ADMIN` (system administrator).',
system_role_list_name string comment 'Display names of system roles (comma-separated).'
) comment 'Tenant user table.'
partitioned by (ds string comment 'Partition column. Format: yyyymmdd.')
lifecycle 30;
dim_dataphin_project_user
create table if not exists dim_dataphin_project_user(
tenant_id bigint comment 'Tenant ID.',
user_id string comment 'Dataphin user ID.',
account_name string comment 'Account name. For a production account, this is the project name.',
display_name string comment 'Display name.',
nickname string comment 'Nickname.',
real_name string comment 'Real name.',
project_id bigint comment 'Project ID.',
project_name string comment 'Project name.',
env string comment 'Environment. Valid values: DEV and PROD.',
role_name string comment 'Project role. Valid values: PROJECT_OWNER, PROJECT_ADMIN, PROJECT_DEVELOPER, PROJECT_GUEST, PROJECT_ANALYST, PROJECT_PRODUCT, PROJECT_VIEWER, PROJECT_OPS, and PROJECT_BUSINESS_MEMBER.',
role_display_name string comment 'Display name of the project role.',
gmt_create string comment 'Creation time.',
gmt_modified string comment 'Modification time.'
) comment 'Project user table.'
partitioned by (ds string comment 'Partition field in yyyymmdd format.')
lifecycle 30;
dim_dataphin_security_user_bind
This currently only includes user and user group relationships.
create table if not exists dim_dataphin_security_user_bind(
tenant_id bigint comment 'The ID of the tenant.',
account_id string comment 'The ID of the account.',
account_name string comment 'The name of the account.',
account_type string comment 'The type of the account.',
bind_type string comment 'The binding type, e.g., USER_GROUP for a user group.',
bind_id string comment 'The ID of the bound object, whose type is specified by bind_type.',
bind_env string comment 'The environment. Valid values: PROD, DEV.',
bind_name string comment 'The name of the bound object.',
bind_is_active string comment 'Indicates whether the bound object is active. Valid values: Y, N.',
bind_desc string comment 'The description of the bound object.',
bind_role_name string comment 'The name of the bound role.',
creator_id string comment 'The ID of the creator.',
creator_name string comment 'The name of the creator.',
modifier_id string comment 'The ID of the last modifier.',
modifier_name string comment 'The name of the last modifier.',
gmt_create string comment 'The creation time.',
gmt_modified string comment 'The modification time.'
)comment 'Stores the binding relationships for Dataphin users.'
partitioned by (ds string comment 'The partition field, in yyyymmdd format.')
lifecycle 15;
Project
dim_dataphin_project
create table if not exists dim_dataphin_project(
tenant_id bigint comment 'The tenant ID.',
project_id bigint comment 'The project ID.',
project_name string comment 'The project''s English name.',
project_name_cn string comment 'The project''s Chinese name.',
project_desc string comment 'The project description.',
biz_unit_id bigint comment 'The business unit ID.',
biz_unit_name string comment 'The business unit name.',
owner_id string comment 'The project owner ID.',
owner_name string comment 'The project owner name.',
project_mode string comment 'The project mode. Valid values: BASIC for basic mode and DEV_PROD for development/production mode.',
env string comment 'The project environment. Valid values: DEV for the development environment and PROD for the production environment.',
ns_tag string comment 'The space type. Valid values: PUBLIC for the intermediate layer, APPLICATION for the application layer, BASE for the source layer, and GENERAL for the common layer.',
gmt_create string comment 'The creation time.',
gmt_modified string comment 'The modification time.'
) comment 'The project space table.'
partitioned by (ds string comment 'The partition field, in yyyymmdd format.')
lifecycle 30;
dim_dataphin_project_compute_engine
create table if not exists dim_dataphin_project_compute_engine(
tenant_id bigint comment 'The ID of the tenant.',
project_id bigint comment 'The ID of the project.',
project_name string comment 'The English name of the project.',
compute_engine_id bigint comment 'The ID of the compute engine.',
compute_engine_name string comment 'The name of the compute engine.',
env string comment 'The project environment. `DEV` represents the development environment; `PROD` represents the production environment.',
compute_engine_support_type string comment 'The supported type of the compute engine. `OFFLINE` indicates an offline engine, `REALTIME` a real-time engine, and `ANALYTICAL` an analytical engine.',
compute_engine_desc string comment 'The description of the compute engine.',
db_type string comment 'The type of the compute engine. Possible values include MAX_COMPUTE, HADOOP, FLINK, ANALYTICDB_FOR_PG, HOLOGRES, and VVP.',
db_name string comment 'The name of the underlying project or database for the compute engine. For example, for MaxCompute or Hive, this is the project name; for AnalyticDB for PostgreSQL, this is the database name.',
shema_name string comment 'The schema name.',
maxcompute_project_name string comment 'The name of the MaxCompute project.',
end_point string comment 'The access endpoint.',
config_plaintext string comment 'The compute engine configuration in JSON format.',
owner_id string comment 'The ID of the compute engine owner.',
owner_name string comment 'The name of the compute engine owner.',
deploy_type string comment 'The deployment mode of the data source. Possible values include public deployment, RDS deployment, and self-hosted on ECS.',
access_type string comment 'The access type of the data source. Possible values include public access and VPC access.',
gmt_create string comment 'The creation timestamp of the binding.',
gmt_modified string comment 'The last modification timestamp of the binding.'
) comment 'This table stores the binding relationship between projects and compute engines.'
partitioned by (ds string comment 'The partition column, in yyyymmdd format.')
lifecycle 30;
Data sources
dim_dataphin_data_source
create table if not exists dim_dataphin_data_source(
tenant_id bigint comment 'The ID of the tenant.',
data_source_id bigint comment 'The ID of the data source.',
data_source_name string comment 'The name of the data source.',
data_source_desc string comment 'The description of the data source.',
data_source_type string comment 'The type of the data source. Examples: MaxCompute, MySQL, SQL Server, Hive, PostgreSQL, E-MapReduce, Oracle, HDFS, DRDS, AnalyticDB for MySQL, MongoDB, Flink, AnalyticDB for PostgreSQL, Hologres, and HBase.',
scope string comment 'The scope of the data source. Valid values: OFFLINE (offline), STREAMING (real-time), ALL (offline and real-time), DATA_DISTILL (data extraction), and DIP_META_SYNC (metadata warehouse).',
jdbc_url string comment 'The connection URL. This is the endpoint for MaxCompute or the JDBC URL for sources like Hive, MySQL, Oracle, and PostgreSQL.',
db_name string comment 'The project or database name within the compute engine. This corresponds to the project name for MaxCompute and Hive, or the database name for MySQL and Oracle.',
shema_name string comment 'The schema name.',
customized_type string comment 'A user-defined type.',
env string comment 'The project environment. Valid values: DEV for the development environment and PROD for the production environment.',
owner_id string comment 'The ID of the project owner.',
owner_name string comment 'The name of the project owner.',
gmt_create string comment 'The timestamp when the record was created.',
gmt_modified string comment 'The timestamp of the last modification.',
data_source_catalog string comment 'A unique identifier for the data source, prefixed with ds_.',
tag_name_list string comment 'A comma-separated list of data source tag names.'
) comment 'The data source table.'
partitioned by (ds string comment 'The partition key, in yyyymmdd format.')
lifecycle 30;
dim_dataphin_data_source_config
create table if not exists dim_dataphin_data_source_config(
tenant_id bigint comment 'ID of the tenant.',
data_source_id bigint comment 'ID of the data source.',
data_source_name string comment 'Name of the data source.',
data_source_desc string comment 'Description of the data source.',
data_source_type string comment 'Specifies the data source type. Examples: MAX_COMPUTE, MYSQL, SQL_SERVER, HIVE, ONE_SERVICE, POSTGRE_SQL, FTP, EMR_HIVE, ORACLE, HDFS, DRDS, ANALYTICDB, MONGODB, FLINK, ANALYTICDB_FOR_PG, HOLOGRES, and HBASE_1_1_X.',
env string comment 'Specifies the project environment. Valid values: DEV (development environment) and PROD (production environment).',
key_name string comment 'Specifies the scope of the data source. Valid values: OFFLINE (offline data source), STREAMING (streaming data source), ALL (both offline and streaming), DATA_DISTILL (data distillation source), and DIP_META_SYNC (metadata warehouse source).',
value string comment 'The connection value: an endpoint for MaxCompute or a JDBC URL for sources like HIVE, MYSQL, ORACLE, and POSTGRE_SQL.',
gmt_create string comment 'The time the record was created.',
gmt_modified string comment 'The time the record was last modified.'
) comment 'Data source configuration table.'
partitioned by (ds string comment 'Specifies the partition field, formatted as `yyyymmdd`.')
lifecycle 30;
Data asset
Physical table
dim_dataphin_table
-
If a physical table is the source table for a registered metric, the
is_from_logicalfield is set to 'Y'. -
The table access count is the number of times a table is accessed via SQL queries in Dataphin. To calculate this value, use the
countfunction on thefct_dataphin_os_task_table_inputoutput_ditable for a specified time range.
create table if not exists dim_dataphin_table(
tenant_id bigint comment 'The ID of the tenant.',
project_id bigint comment 'The ID of the Dataphin project.',
project_name string comment 'The name of the Dataphin project.',
table_name string comment 'The name of the table.',
table_comment string comment 'Description of the table.',
table_type string comment 'Object type. Valid values: TABLE and VIEW.',
is_from_logical string comment 'Specifies if the physical table is derived from a logical model. Valid values: Y and N.',
is_partitioned string comment 'Specifies if this is a partitioned table. Valid values: Y and N.',
last_ddl_time string comment 'Time of the last DDL modification.',
last_dml_time string comment 'Time of the last data modification.',
view_text string comment 'The SQL definition of the view.',
physical_data_size bigint comment 'The physical storage size of the table.',
data_size bigint comment 'The logical storage size of the table.',
create_time string comment 'The creation time of the table.',
modify_time string comment 'The last modification time.',
record_num bigint comment 'Number of rows.',
life_cycle bigint comment 'Table lifecycle in days. This field applies only to MaxCompute tables.',
compute_engine_id bigint comment 'The ID of the compute engine.',
compute_engine_name string comment 'The name of the compute engine.',
env string comment 'The project environment. Valid values: DEV for the development environment and PROD for the production environment.',
compute_engine_use_type string comment 'The usage type of the compute engine. Valid values: OFFLINE for an offline engine, REALTIME for a real-time engine, and ANALYTICAL for an acceleration engine.',
compute_engine_desc string comment 'Description of the compute engine.',
db_type string comment 'The type of the compute engine. Valid values include max_compute, hadoop, flink, analyticdb_for_pg, hologres, and vvp.',
db_name string comment 'Physical namespace of the compute engine. For MaxCompute and Hive, this refers to the project name; for AnalyticDB for PostgreSQL, it is the database name.',
table_id string comment 'The Dataphin ID for the table. This field is optional and used for system integration.',
owner_id string comment 'The ID of the table owner.',
owner_name string comment 'The name of the table owner.',
is_external_table string comment 'Specifies if this is an external table. Valid values: Y and N.',
schema_name string comment 'The schema name within the compute engine, used for sources such as Hologres and AnalyticDB for PostgreSQL.',
last_access_time string comment 'The last access time.',
tags_list string comment 'Comma-separated list of tags.',
storage_format string comment 'The storage format. For Hive tables, this specifies the file format; for other tables, it specifies the data source type.',
favorites_count bigint comment 'Number of favorites.',
pv_count bigint comment 'The number of page views.',
table_location string comment 'The storage path of the table. This field is populated for Hive and other external tables.',
last_dml_time_dataphin string comment 'Last data modification time, tracked internally by Dataphin and supported by all engines.'
) comment 'Metadata for the physical table.'
partitioned by (ds string comment 'The partition key, in yyyymmdd format.')
lifecycle 30;
dim_dataphin_table_partition
If a physical table is the source table for a registered metric, the is_from_logical field is set to 'Y'.
create table if not exists dim_dataphin_table_partition(
tenant_id bigint comment 'The ID of the tenant.',
project_id bigint comment 'The ID of the Dataphin project.',
project_name string comment 'The name of the Dataphin project.',
table_name string comment 'The name of the table.',
partition_name string comment 'The name of the partition.',
is_from_logical string comment 'Indicates whether the physical table is generated from a standardized modeling process. Valid values: Y and N.',
last_ddl_time string comment 'The last DDL modification time of the table.',
last_dml_time string comment 'The last data modification time of the table.',
physical_data_size bigint comment 'The physical storage size of the table.',
data_size bigint comment 'The logical storage size of the table.',
create_time string comment 'The creation time.',
modify_time string comment 'The last modification time.',
record_num bigint comment 'The record count.',
compute_engine_id bigint comment 'The ID of the compute engine.',
compute_engine_name string comment 'The name of the compute engine.',
env string comment 'The project environment. Valid values: DEV for the development environment and PROD for the production environment.',
compute_engine_support_type string comment 'The type of the compute engine. Valid values: OFFLINE for an offline engine, REALTIME for a real-time engine, and ANALYTICAL for an acceleration engine.',
compute_engine_desc string comment 'The description of the compute engine.',
db_type string comment 'The type of the compute engine, such as MaxCompute, Hadoop, Flink, AnalyticDB for PostgreSQL, Hologres, and Ververica Platform.',
db_name string comment 'The name of the physical space for the compute engine. For example, this is the project name for MaxCompute and Hive, and the database name for AnalyticDB for PostgreSQL.',
table_id string comment 'The optional ID of the table in Dataphin, used for integration with other systems.',
owner_id string comment 'The ID of the table owner.',
owner_name string comment 'The name of the table owner.',
schema_name string comment 'The schema name of the compute engine, applicable to engines like Hologres and AnalyticDB for PostgreSQL.'
) comment 'Contains metadata for physical table partitions.'
partitioned by (ds string comment 'The partition key, in yyyymmdd format.')
lifecycle 30;
dim_dataphin_column
When a physical table is the source table for a registered mounted metric, the is_from_logical field is set to 'Y'.
create table if not exists dim_dataphin_column(
tenant_id bigint comment 'The ID of the tenant.',
project_id bigint comment 'The ID of the Dataphin project.',
project_name string comment 'The name of the Dataphin project.',
table_name string comment 'The name of the table.',
column_name string comment 'The name of the column.',
column_type string comment 'The data type of the column.',
column_comment string comment 'The comment for the column or partition key.',
column_seq bigint comment 'The sequence number of the column.',
is_pk string comment 'Specifies if the column is a primary key. Valid values: Y and N.',
is_from_logical string comment 'Specifies if the physical table is generated from a standardized modeling process. Valid values: Y and N.',
is_partitioned string comment 'Specifies if this is a partitioned table. Valid values: Y and N.',
compute_engine_id bigint comment 'The ID of the compute engine.',
compute_engine_name string comment 'The name of the compute engine.',
env string comment 'The project environment. Valid values: DEV (development environment) and PROD (production environment).',
compute_engine_support_type string comment 'The usage type of the compute engine. Valid values: OFFLINE (offline engine), REALTIME (real-time engine), and ANALYTICAL (acceleration engine).',
compute_engine_desc string comment 'The description of the compute engine.',
db_type string comment 'The type of the compute engine. Possible values: MaxCompute, Hadoop, Flink, AnalyticDB for PostgreSQL, Hologres, and Ververica Platform.',
db_name string comment 'The name of the physical space for the compute engine. For example, this refers to the project name for MaxCompute and Hive, or the database name for AnalyticDB for PostgreSQL.',
table_id string comment 'Optional. The ID of the Dataphin table, used for integration with other systems.',
owner_id string comment 'The ID of the table owner.',
owner_name string comment 'The name of the table owner.',
classify string comment 'The data classification. Valid values: S (company data), B (business data), and C (personal data). Custom values are also allowed.',
security_level string comment 'The security level. Valid values: L1 (public), L2 (internal), L3 (confidential data), and L4 (top-secret data). Custom values are also allowed.'
) comment 'Physical table column metadata'
partitioned by (ds string comment 'The partition field, in yyyymmdd format.')
lifecycle 30;
Data service
dim_dataphin_service_logic_unit
create table if not exists dim_dataphin_service_logic_unit(
tenant_id bigint comment 'Tenant ID',
logic_unit_id bigint comment 'Logic unit ID',
logic_unit_no string comment 'Logic unit identifier',
logic_unit_name string comment 'Logic unit name',
logic_unit_type string comment 'Logic unit type. Valid values: `SINGLE_PHYSICAL` for a single physical table, `MULTI_PHYSICAL` for multiple physical tables, and `DATAPHIN_LOGIC` for a Dataphin logical table.',
logic_unit_desc string comment 'Logic unit description',
model_type string comment 'Mode. Valid values: `basic` (basic mode) and `prodDev` (development/production mode).',
status string comment 'Status. Valid values: `1` (online) and `2` (offline).',
delete_flag string comment 'Delete flag. `0`: not deleted, `1`: deleted.',
logic_unit_version bigint comment 'Logic unit version',
lock_version bigint comment 'Lock version. Used for optimistic locking.',
group_id bigint comment 'Group ID',
group_name string comment 'Group name',
record_num bigint comment 'Maximum number of records to return.',
source_tables string comment 'Source tables. Use a semicolon (;) to separate multiple tables.',
creator_id string comment 'Creator ID',
creator_name string comment 'Creator name',
create_time string comment 'Creation time',
modifier_id string comment 'Last modifier ID',
modifier_name string comment 'Last modifier name',
modified_time string comment 'Last modified time',
proj_id bigint comment 'Data service project ID',
proj_name string comment 'Data service project name'
) comment 'Data service logic unit table for the production environment'
partitioned by (ds string comment 'Partition key, in yyyymmdd format.')
lifecycle 30;
dim_dataphin_service_logic_unit_column
create table if not exists dim_dataphin_service_logic_unit_column(
tenant_id bigint comment 'Tenant ID.',
data_source_id bigint comment 'Data source ID.',
data_source_type string comment 'Data source type. Valid values: MYSQL, ADS, ES, HBASE, Oracle, SQL Server, PgSQL, MongoDB, Impala, and adb3.0.',
data_source_name string comment 'Data source name.',
data_source_url string comment 'Data source URL.',
table_name string comment 'Physical table name.',
original_column string comment 'Source column name.',
original_column_type string comment 'Data type of the source column.',
mapping_column string comment 'Mapped column name.',
column_type string comment 'Data type of the logic unit column.',
datetime_format string comment 'Date and time format.',
delete_flag string comment 'Logical delete flag. 0: not deleted, 1: deleted.',
family_name string comment 'Column family name for HBase data sources.',
rowkey_sort string comment 'Sort order for the HBase rowkey column.',
rowkey_separator string comment 'Separator for the HBase rowkey.',
is_master_table string comment 'Whether it is a master table. Valid values: Y, N.',
is_relate_column string comment 'Whether it is a related column. Valid values: Y, N.',
logic_unit_id bigint comment 'Logic unit ID.',
logic_unit_no string comment 'String identifier of the logic unit.',
logic_unit_name string comment 'Logic unit name.',
logic_unit_type string comment 'Logic unit type. Valid values: SINGLE_PHYSICAL (single physical table), MULTI_PHYSICAL (multiple physical tables), and DATAPHIN_LOGIC (Dataphin logical table).',
logic_unit_status string comment 'Logic unit status. 1: online, 2: offline.',
creator_id string comment 'Creator ID.',
creator_name string comment 'Creator name.',
create_time string comment 'Creation time.',
modifier_id string comment 'Last modifier ID.',
modifier_name string comment 'Last modifier name.',
modified_time string comment 'Last modification time.',
proj_id bigint comment 'Data service project ID.',
proj_name string comment 'Data service project name.'
) comment 'Column information for logic units in the production environment.'
partitioned by (ds string comment 'Partition column, in yyyymmdd format.')
lifecycle 30;
dim_dataphin_service_api
create table if not exists dim_dataphin_service_api (
tenant_id bigint comment 'The ID of the tenant.'
,api_id bigint comment 'The ID of the API.'
,api_no string comment 'The identifier of the API.'
,api_name string comment 'The name of the API.'
,request_method string comment 'The request method. Valid values: GET and LIST.'
,update_rate string comment 'The update frequency. Valid values: DAY, HOUR, and MIN.'
,api_desc string comment 'The description of the API.'
,return_type string comment 'The return type. The value is JSON.'
,protocol string comment 'The protocol. Valid values: HTTPS, HTTP, and HTTP&HTTPS.'
,sql_statement string comment 'The SQL statement.'
,api_version bigint comment 'The version of the API.'
,lock_version bigint comment 'The version lock, used for optimistic locking.'
,document_url string comment 'The URL of the API documentation.'
,create_type string comment 'The API creation method. Valid values: SELFDEFINE and GUIDE.'
,call_time bigint comment 'The total number of API calls.'
,api_gw_group_id bigint comment 'The group ID returned by the API gateway.'
,api_gw_id string comment 'The unique API identifier returned by the API gateway.'
,status string comment 'The status of the API. Valid values: 1 (published) and 2 (unpublished).'
,delete_flag string comment 'The logical delete flag. 0 = not deleted, 1 = deleted.'
,max_return_num bigint comment 'The maximum number of records to return.'
,api_timeout bigint comment 'The API timeout, in seconds.'
,logic_unit_no string comment 'The ID of the service unit.'
,logic_unit_name string comment 'The name of the service unit.'
,logic_unit_type string comment 'The type of the service unit. Valid values: SINGLE_PHYSICAL (single physical table), MULTI_PHYSICAL (multiple physical tables), and DATAPHIN_LOGIC (Dataphin logical table).'
,logic_unit_desc string comment 'The description of the service unit.'
,model_type string comment 'The mode. Valid values: basic (basic mode) and prodDev (development and production mode).'
,logic_unit_version bigint comment 'The version of the service unit.'
,group_id bigint comment 'The ID of the group to which the API belongs.'
,group_name string comment 'The name of the group to which the API belongs.'
,source_tables string comment 'The source tables. Multiple names are separated by a semicolon.'
,api_type string comment 'The type of the API. Valid values: NORMAL, DIRECTAPI, LOGICTABLE, and COMPOSITEAPI.'
,data_source_id bigint comment 'The ID of the data source for a DIRECTAPI.'
,data_source_type string comment 'The type of the data source for a DIRECTAPI. Examples: impala and HoloLens.'
,data_source_name string comment 'The name of the data source for a DIRECTAPI.'
,biz_unit_name string comment 'The business unit name for a LOGICTABLE API.'
,table_name string comment 'The logical table name for a LOGICTABLE API.'
,creator_id string comment 'The ID of the user who created the API.'
,creator_name string comment 'The name of the user who created the API.'
,modifier_id string comment 'The ID of the user who last modified the API.'
,modifier_name string comment 'The name of the user who last modified the API.'
,create_time string comment 'The time the API was created.'
,modified_time string comment 'The time the API was last modified.'
,proj_id bigint comment 'The ID of the data service project.'
,proj_name string comment 'The name of the data service project.'
,order_priority string comment 'The sort priority. `sql_script`: Uses the sorting specified in the SQL script. `ORDER_BY_LIST`: Uses the sorting specified in the `order_by_list` field.'
,is_paged_query bigint comment 'Indicates whether pagination is enabled for query results. 0 = enabled, 1 = disabled.'
,order_by_list string comment 'The fields to use for sorting.'
,update_rate_detail string comment 'The custom update frequency. This field is used when `update_rate` is set to CUSTOM.'
,call_mode string comment 'The API call mode. Valid values: `synchronous call` and `asynchronous call`.'
,execution_timeout bigint comment 'The execution timeout for an asynchronous call, in seconds.'
)
comment 'The data service API table.'
partitioned by (
ds string comment 'The partition field, in yyyymmdd format.'
)
lifecycle 30;
dim_dataphin_service_api_param
create table if not exists dim_dataphin_service_api_param
(
id bigint comment 'The unique, auto-incrementing ID.',
tenant_id bigint comment 'The tenant ID.',
api_id bigint comment 'The ID of the associated API, referencing `api_id` in the `dim_dataphin_service_api` table.',
api_no string comment 'The API identifier.',
api_name string comment 'The name of the API.',
param_name string comment 'The name of the parameter.',
param_type string comment 'The data type of the parameter.',
type bigint comment 'The parameter category. Valid values: 1 for a response parameter, 2 for a request parameter, and 3 for a common parameter (system-default input parameters, e.g., pagination parameters or sorting parameters).',
sample string comment 'An example value.',
param_desc string comment 'A description of the parameter.',
must bigint comment 'Specifies if the parameter is required. Valid values: 1 for required, 0 for optional.',
date_format string comment 'The date format.',
operator string comment 'The operator.',
mapping_column string comment 'The source field, which is the name of a service unit field (not a database column). In API wizard mode, this field stores the alias.',
api_request_method string comment 'The request method (e.g., GET, LIST), referencing `request_method` in the `dim_dataphin_service_api` table.',
api_desc string comment 'A description of the API.',
api_create_type string comment 'The creation type (e.g., SELFDEFINE, GUIDE, DIRECTAPI), referencing `create_type` in the `dim_dataphin_service_api` table.',
api_status string comment 'The API status (1: online, 2: offline), referencing `status` in the `dim_dataphin_service_api` table.',
api_delete_flag string comment 'The logical delete flag for the API (0: not deleted, 1: deleted), referencing `delete_flag` in the `dim_dataphin_service_api` table.',
create_person_id string comment 'The creator ID.',
create_person string comment 'The name of the creator.',
create_time string comment 'The creation time.',
modifier_id string comment 'The ID of the last modifier.',
modifier_name string comment 'The name of the last modifier.',
modified_time string comment 'The last modification time.',
delete_flag string comment 'The logical delete flag for the parameter (0: not deleted, 1: deleted).',
proj_id bigint comment 'The ID of the data service project.',
proj_name string comment 'The name of the data service project.',
update_rate string comment 'The update frequency. Valid values: DAY, HOUR, MIN, CUSTOM, and REALTIME.',
update_rate_detail string comment 'Specifies the custom update frequency, used only when `update_rate` is `CUSTOM`.',
call_mode string comment 'The API invocation mode. Valid values: synchronous invocation, asynchronous invocation.',
execution_timeout bigint comment 'The execution timeout for an asynchronous invocation, in seconds.'
)comment 'Stores parameters for data service APIs.'
partitioned by (ds string comment 'The business date, in yyyymmdd format.')
lifecycle 30;
dim_dataphin_service_app
create table if not exists dim_dataphin_service_app(
tenant_id bigint comment 'The tenant ID.',
app_id bigint comment 'The application ID.',
app_name string comment 'The application name.',
app_key bigint comment 'The app key.',
app_gw_id string comment 'The unique API identifier returned by the API Gateway.',
env string comment 'Specifies the environment. Valid values: `1` for the staging environment and `2` for the production environment.',
group_id bigint comment 'The group ID.',
group_name string comment 'The group name.',
app_secret string comment 'The app secret. The management service generates this value if an API Gateway is not used.',
enable_white_list string comment 'Specifies if the whitelist is enabled. Valid values: `1` (disabled) and `2` (enabled).',
white_list string comment 'The whitelisted IP addresses.',
delete_flag string comment 'The logical delete flag. A value of `0` indicates the record is active, while `1` indicates it is deleted.',
creator_id string comment 'The ID of the creator.',
creator_name string comment 'The name of the creator.',
modifier_id string comment 'The ID of the last modifier.',
modifier_name string comment 'The name of the last modifier.',
create_time string comment 'The creation time.',
modified_time string comment 'The last modification time.',
proj_id bigint comment 'The ID of the data service project.',
proj_name string comment 'The name of the data service project.'
) comment 'The data service application table.'
partitioned by (ds string comment 'The partition field, in `yyyymmdd` format.')
lifecycle 30;
dim_dataphin_service_app_api
create table if not exists dim_dataphin_service_app_api(
tenant_id bigint comment 'The tenant ID.',
api_id bigint comment 'The API ID.',
api_no string comment 'The API identifier.',
api_name string comment 'The API name.',
api_version bigint comment 'The API version.',
app_id bigint comment 'The app ID.',
app_name string comment 'The app name.',
app_key bigint comment 'The AppKey.',
app_gw_id string comment 'The unique API identifier returned by the API gateway.',
delete_flag string comment 'The logical delete flag. 0 means not deleted, and 1 means deleted.',
creator_id string comment 'The creator ID.',
creator_name string comment 'The creator.',
modifier_id string comment 'The last modifier ID.',
modifier_name string comment 'The last modifier name.',
create_time string comment 'The creation time.',
modified_time string comment 'The last modified time.',
proj_id bigint comment 'The data service project ID.',
proj_name string comment 'The data service project name.'
) comment 'This table stores the relationship between data service apps and APIs.'
partitioned by (ds string comment 'Partition field in yyyymmdd format.')
lifecycle 30;
DWS Dataphin service API MI
create table if not exists dws_dataphin_service_api_mi(
tenant_id bigint comment 'Tenant ID.',
api_no string comment 'API identifier.',
api_name string comment 'API name.',
minute string comment 'Statistics minute (yyyy_MMdd_HHmm).',
app_key string comment 'Application key.',
client_ip string comment 'Client IP address.',
total_count bigint comment 'Total calls.',
offline_count bigint comment 'Offline calls.',
client_fail_count bigint comment 'Client-side failures.',
total_time_cost bigint comment 'Total time cost for all calls (including failures).',
total_succ_time_cost bigint comment 'Total time cost for successful calls.',
api_ex_count bigint comment 'API exceptions.'
)comment 'Minute-level statistics for data service API calls.'
partitioned by (ds string comment 'Partition field (yyyymmdd).')
lifecycle 90;
dim_dataphin_service_api_direct
create table if not exists dim_dataphin_service_api_direct(
tenant_id bigint comment 'The tenant ID.',
api_id bigint comment 'The API ID.',
api_no string comment 'The API identifier.',
api_name string comment 'The API name.',
model_type string comment 'The mode. Valid values: basic: basic mode, prodDev: development and production mode.',
data_source_id bigint comment 'The data source ID.',
data_source_type string comment 'The data source type. Example values: Impala and Hologres.',
data_source_name string comment 'The data source name.',
delete_flag string comment 'The logical delete flag. 0: not deleted, 1: deleted.',
creator_id string comment 'The creator ID.',
creator_name string comment 'The creator name.',
create_time string comment 'The time when the record was created.',
modifier_id string comment 'The ID of the user who last modified the record.',
modifier_name string comment 'The name of the user who last modified the record.',
modified_time string comment 'The time when the record was last modified.',
proj_id bigint comment 'The data service project ID.',
proj_name string comment 'The data service project name.'
) comment 'Stores information about direct-connection data source APIs.'
partitioned by (ds string comment 'The partition column. The format is yyyymmdd.')
lifecycle 30;
Data lineage
-
The
dim_dataphin_lineage_node_tableanddim_dataphin_lineage_node_columntables contain both structured and unstructured data. The enumeration values for theinput_table_typeandoutput_table_typefields are as follows:-
Structured data:
TABLE(table),VIEW(view),DIM_LOGIC_TABLE(dimension logic table),FACT_LOGIC_TABLE(fact logic table),SUM_LOGIC_TABLE(summary logic table),REAL_TIME_TABLE(real-time table),LOGIC_VIEW(logical view),DATASOURCE_TABLE(data source table),DATASOURCE_VIEW(data source view), andDATASOURCE_MATERIALIZED_VIEW(data source materialized view). -
Unstructured data:
DATASOURCE(unstructured data source) andDP_DATASET(unstructured dataset).
-
-
The
dim_dataphin_lineage_nodetable does not contain unstructured data.
dim_dataphin_lineage_node_column
create table if not exists dim_dataphin_lineage_node_column (
tenant_id bigint comment 'The ID of the tenant.'
,node_id string comment 'The ID of the node.'
,node_name string comment 'The name of the node.'
,node_env string comment 'The environment of the node. Valid values: PROD and DEV.'
,node_operator_type string comment 'The type of the task. Possible values include DATAX, DLINK, ONE_SERVICE_SQL, VIRTUAL, PYTHON, and SHELL.'
,lineage_type string comment 'The type of data lineage. Valid values: SELECT and OTHER.'
,input_column_id string comment 'This field is deprecated.'
,input_column_name string comment 'The name of the input column.'
,output_column_id string comment 'This field is deprecated.'
,output_column_name string comment 'The name of the output column.'
,input_table_id string comment 'This field is deprecated.'
,input_table_type string comment 'The type of the input table. Possible values include TABLE, VIEW, DIM_LOGIC_TABLE, FACT_LOGIC_TABLE, SUM_LOGIC_TABLE, REAL_TIME_TABLE, LOGIC_VIEW, DATASOURCE_TABLE, DATASOURCE_VIEW, and DATASOURCE_MATERIALIZED_VIEW.'
,input_table_name string comment 'The name of the input table.'
,input_data_domain_id bigint comment 'The ID of the data domain for the input table.'
,input_data_domain_name string comment 'The name of the data domain for the input table.'
,input_data_domain_abbreviation string comment 'The abbreviation of the data domain for the input table.'
,input_data_domain_name_cn string comment 'The Chinese name of the data domain for the input table.'
,output_table_id string comment 'This field is deprecated.'
,output_table_type string comment 'The type of the output table. Possible values include TABLE, VIEW, DIM_LOGIC_TABLE, FACT_LOGIC_TABLE, SUM_LOGIC_TABLE, REAL_TIME_TABLE, LOGIC_VIEW, DATASOURCE_TABLE, DATASOURCE_VIEW, and DATASOURCE_MATERIALIZED_VIEW.'
,output_table_name string comment 'The name of the output table.'
,output_data_domain_id bigint comment 'The ID of the data domain for the output table.'
,output_data_domain_name string comment 'The name of the data domain for the output table.'
,output_data_domain_abbreviation string comment 'The abbreviation of the data domain for the output table.'
,output_data_domain_name_cn string comment 'The Chinese name of the data domain for the output table.'
,input_project_id bigint comment 'The ID of the input project.'
,input_project_name string comment 'The English name of the input project.'
,input_project_cn_name string comment 'The Chinese name of the input project.'
,input_project_ns_tag string comment 'The type of the input project. Valid values: BASE, PUBLIC, and APPLICATION.'
,output_project_id bigint comment 'The ID of the output project.'
,output_project_name string comment 'The English name of the output project.'
,output_project_cn_name string comment 'The Chinese name of the output project.'
,output_project_ns_tag string comment 'The type of the output project. Valid values: BASE, PUBLIC, and APPLICATION.'
,input_biz_unit_id bigint comment 'The ID of the input business unit.'
,input_biz_unit_name string comment 'The name of the input business unit.'
,output_biz_unit_id bigint comment 'The ID of the output business unit.'
,output_biz_unit_name string comment 'The name of the output business unit.'
,input_env string comment 'The input environment. Valid values: PROD and DEV.'
,output_env string comment 'The output environment. Valid values: PROD and DEV.'
,gmt_create string comment 'The creation time.'
,gmt_modified string comment 'The last modification time.'
,input_data_source_id bigint comment 'The ID of the input data source.'
,input_data_source_name string comment 'The name of the input data source.'
,input_data_source_type string comment 'The type of the input data source. For valid values, see the data_source_type field in the dim_dataphin_data_source table.'
,input_db_name string comment 'The name of the input database.'
,input_schema_name string comment 'The name of the input schema. For data sources with a schema (such as PostgreSQL, Oracle, SQL Server, and Hologres), this field contains the schema name. For those without a schema (such as MySQL, MaxCompute, DM, CK, and SR), it contains the database or project name.'
,output_data_source_id bigint comment 'The ID of the output data source.'
,output_data_source_name string comment 'The name of the output data source.'
,output_data_source_type string comment 'The type of the output data source. For valid values, see the data_source_type field in the dim_dataphin_data_source table.'
,output_db_name string comment 'The name of the output database.'
,output_schema_name string comment 'The name of the output schema. For data sources with a schema (such as PostgreSQL, Oracle, SQL Server, and Hologres), this field contains the schema name. For those without a schema (such as MySQL, MaxCompute, DM, CK, and SR), it contains the database or project name.'
,input_is_structured_data string comment 'Specifies whether the input is structured data. Valid values: Y and N.'
,output_is_structured_data string comment 'Specifies whether the output is structured data. Valid values: Y and N.'
,input_dataset_version string comment 'The version of the input dataset.'
,output_dataset_version string comment 'The version of the output dataset.'
)
comment 'Stores column-level data lineage, broken down by node, input table, input column, output table, and output column.'
partitioned by (
ds string comment 'The partition column, in yyyymmdd format.'
)
lifecycle 30;
dim_dataphin_lineage_node_table
create table if not exists dim_dataphin_lineage_node_column (
tenant_id bigint comment 'The tenant ID.'
,node_id string comment 'The node ID.'
,node_name string comment 'The node name.'
,node_env string comment 'The node''s environment. Valid values: PROD and DEV.'
,node_operator_type string comment 'The task''s operator type. Valid values: DATAX, DLINK, ONE_SERVICE_SQL, VIRTUAL, PYTHON, and SHELL.'
,lineage_type string comment 'The data lineage type. Valid values: SELECT and OTHER.'
,input_column_id string comment 'Deprecated field.'
,input_column_name string comment 'The input column name.'
,output_column_id string comment 'Deprecated field.'
,output_column_name string comment 'The output column name.'
,input_table_id string comment 'Deprecated field.'
,input_table_type string comment 'The input table type. Valid values: TABLE (table), VIEW (view), DIM_LOGIC_TABLE (dimension table), FACT_LOGIC_TABLE (fact table), SUM_LOGIC_TABLE (summary table), REAL_TIME_TABLE (real-time table), LOGIC_VIEW (logical view), DATASOURCE_TABLE (data source table), DATASOURCE_VIEW (data source view), and DATASOURCE_MATERIALIZED_VIEW (data source materialized view).'
,input_table_name string comment 'The input table name.'
,input_data_domain_id bigint comment 'The data domain ID for the input table.'
,input_data_domain_name string comment 'The data domain name for the input table.'
,input_data_domain_abbreviation string comment 'The data domain abbreviation for the input table.'
,input_data_domain_name_cn string comment 'The Chinese name of the input table''s data domain.'
,output_table_id string comment 'Deprecated field.'
,output_table_type string comment 'The output table type. Valid values: TABLE (table), VIEW (view), DIM_LOGIC_TABLE (dimension table), FACT_LOGIC_TABLE (fact table), SUM_LOGIC_TABLE (summary table), REAL_TIME_TABLE (real-time table), LOGIC_VIEW (logical view), DATASOURCE_TABLE (data source table), DATASOURCE_VIEW (data source view), and DATASOURCE_MATERIALIZED_VIEW (data source materialized view).'
,output_table_name string comment 'The output table name.'
,output_data_domain_id bigint comment 'The data domain ID for the output table.'
,output_data_domain_name string comment 'The data domain name for the output table.'
,output_data_domain_abbreviation string comment 'The data domain abbreviation for the output table.'
,output_data_domain_name_cn string comment 'The Chinese name of the output table''s data domain.'
,input_project_id bigint comment 'The input project ID.'
,input_project_name string comment 'The input project''s English name.'
,input_project_cn_name string comment 'The input project''s Chinese name.'
,input_project_ns_tag string comment 'The input project type. Valid values: BASE, PUBLIC, and APPLICATION.'
,output_project_id bigint comment 'The output project ID.'
,output_project_name string comment 'The output project''s English name.'
,output_project_cn_name string comment 'The output project''s Chinese name.'
,output_project_ns_tag string comment 'The output project type. Valid values: BASE, PUBLIC, and APPLICATION.'
,input_biz_unit_id bigint comment 'The input business unit ID.'
,input_biz_unit_name string comment 'The input business unit name.'
,output_biz_unit_id bigint comment 'The output business unit ID.'
,output_biz_unit_name string comment 'The output business unit name.'
,input_env string comment 'The input environment. Valid values: PROD and DEV.'
,output_env string comment 'The output environment. Valid values: PROD and DEV.'
,gmt_create string comment 'The creation time.'
,gmt_modified string comment 'The modification time.'
,input_data_source_id bigint comment 'The input data source ID.'
,input_data_source_name string comment 'The input data source name.'
,input_data_source_type string comment 'The input data source type. For valid values, see the `data_source_type` column in the `dim_dataphin_data_source` table.'
,input_db_name string comment 'The input database name.'
,input_schema_name string comment 'The input schema name. For data sources that use a schema (e.g., PostgreSQL, Oracle, SQL Server, and Hologres), this field contains the schema name. For those that do not (e.g., MySQL, MaxCompute, DM, CK, and SR), it contains the database name (`db_name`) or project name (`project_name`).'
,output_data_source_id bigint comment 'The output data source ID.'
,output_data_source_name string comment 'The output data source name.'
,output_data_source_type string comment 'The output data source type. For valid values, see the `data_source_type` column in the `dim_dataphin_data_source` table.'
,output_db_name string comment 'The output database name.'
,output_schema_name string comment 'The output schema name. For data sources that use a schema (e.g., PostgreSQL, Oracle, SQL Server, and Hologres), this field contains the schema name. For those that do not (e.g., MySQL, MaxCompute, DM, CK, and SR), it contains the database name (`db_name`) or project name (`project_name`).'
,input_is_structured_data string comment 'Indicates whether the input is structured data. Valid values: Y and N.'
,output_is_structured_data string comment 'Indicates whether the output is structured data. Valid values: Y and N.'
,input_dataset_version string comment 'The input dataset version.'
,output_dataset_version string comment 'The output dataset version.'
)
comment 'Stores data lineage linking input and output columns for each node and table.'
partitioned by (
ds string comment 'The partition column, in yyyymmdd format.'
)
lifecycle 30;
dim_dataphin_lineage_table
create table if not exists dim_dataphin_lineage_table (
tenant_id bigint comment 'The ID of the tenant.'
,input_table_id string comment 'Deprecated.'
,input_table_type string comment 'The type of the input table. Valid values: TABLE, VIEW, DIM_LOGIC_TABLE, FACT_LOGIC_TABLE, SUM_LOGIC_TABLE, REAL_TIME_LOGIC_TABLE, LOGIC_VIEW, DATASOURCE_TABLE, DATASOURCE_VIEW, and DATASOURCE_MATERIALIZED_VIEW.'
,input_table_name string comment 'The name of the input table.'
,input_data_domain_id bigint comment 'The ID of the data domain for the input table.'
,input_data_domain_name string comment 'The name of the data domain for the input table.'
,input_data_domain_abbreviation string comment 'The abbreviation of the input data domain name.'
,input_data_domain_name_cn string comment 'The Chinese name of the input data domain.'
,output_table_id string comment 'Deprecated.'
,output_table_type string comment 'The type of the output table. Valid values: TABLE, VIEW, DIM_LOGIC_TABLE, FACT_LOGIC_TABLE, SUM_LOGIC_TABLE, REAL_TIME_LOGIC_TABLE, LOGIC_VIEW, DATASOURCE_TABLE, DATASOURCE_VIEW, and DATASOURCE_MATERIALIZED_VIEW.'
,output_table_name string comment 'The name of the output table.'
,output_data_domain_id bigint comment 'The ID of the data domain for the output table.'
,output_data_domain_name string comment 'The name of the data domain for the output table.'
,output_data_domain_abbreviation string comment 'The abbreviation of the output data domain name.'
,output_data_domain_name_cn string comment 'The Chinese name of the output data domain.'
,input_project_id bigint comment 'The ID of the input project.'
,input_project_name string comment 'The English name of the input project.'
,input_project_cn_name string comment 'The Chinese name of the input project.'
,input_project_ns_tag string comment 'The type of the input project. Valid values: BASE, PUBLIC, and APPLICATION.'
,output_project_id bigint comment 'The ID of the output project.'
,output_project_name string comment 'The English name of the output project.'
,output_project_cn_name string comment 'The Chinese name of the output project.'
,output_project_ns_tag string comment 'The type of the output project. Valid values: BASE, PUBLIC, and APPLICATION.'
,input_biz_unit_id bigint comment 'The ID of the input business unit.'
,input_biz_unit_name string comment 'The name of the input business unit.'
,output_biz_unit_id bigint comment 'The ID of the output business unit.'
,output_biz_unit_name string comment 'The name of the output business unit.'
,input_env string comment 'The input environment. Valid values: PROD and DEV.'
,output_env string comment 'The output environment. Valid values: PROD and DEV.'
,gmt_create string comment 'The time the record was created.'
,gmt_modified string comment 'The time the record was last modified.'
,input_data_source_id bigint comment 'The ID of the input data source.'
,input_data_source_name string comment 'The name of the input data source.'
,input_data_source_type string comment 'The type of the input data source. For a list of valid values, see the data_source_type column in the dim_dataphin_data_source table.'
,input_db_name string comment 'The name of the input database.'
,input_schema_name string comment 'The name of the input schema. For schema-based data sources (e.g., PostgreSQL, Oracle, SQL Server, and Hologres), this field contains the schema name. For other data sources (e.g., MySQL, MaxCompute, DM, ClickHouse, SR), it contains the database or project name.'
,output_data_source_id bigint comment 'The ID of the output data source.'
,output_data_source_name string comment 'The name of the output data source.'
,output_data_source_type string comment 'The type of the output data source. For a list of valid values, see the data_source_type column in the dim_dataphin_data_source table.'
,output_db_name string comment 'The name of the output database.'
,output_schema_name string comment 'The name of the output schema. For schema-based data sources (e.g., PostgreSQL, Oracle, SQL Server, and Hologres), this field contains the schema name. For other data sources (e.g., MySQL, MaxCompute, DM, ClickHouse, SR), it contains the database or project name.'
)
comment 'Defines the data lineage between input and output tables.'
partitioned by (
ds string comment 'The partition key, formatted as yyyymmdd.'
)
lifecycle 30;
Security
dim_dataphin_security_classify
create table if not exists dim_dataphin_security_classify(
tenant_id bigint comment 'Tenant ID.',
classify_id bigint comment 'Classification ID.',
classify_name string comment 'Classification name.',
classify_short_name string comment 'Classification short name.',
classify_desc string comment 'Classification description.',
creator_id string comment 'Creator ID.',
creator_name string comment 'Creator name.',
modifier_id string comment 'Last modifier ID.',
modifier_name string comment 'Last modifier name.',
gmt_create string comment 'Creation time.',
gmt_modified string comment 'Modification time.',
status string comment 'Status. Valid values: ENABLE, DISABLE.',
level_id bigint comment 'Sensitivity level ID.',
level_name string comment 'Sensitivity level name.',
level_index bigint comment 'Sensitivity level index.',
parent_path string comment 'Parent classification path.'
)comment 'Security data classification table.'
partitioned by (ds string comment 'Partition field. Format: yyyymmdd.')
lifecycle 30;
dim_dataphin_security_level
create table if not exists dim_dataphin_security_level(
tenant_id bigint comment 'tenant id',
level_id bigint comment 'level id',
level_name string comment 'level name',
level_short_name string comment 'level short name',
level_desc string comment 'level description',
level_index bigint comment 'level index',
creator_id string comment 'creator id',
creator_name string comment 'creator name',
modifier_id string comment 'last modifier id',
modifier_name string comment 'last modifier name',
gmt_create string comment 'creation time',
gmt_modified string comment 'modification time'
)comment 'Stores data security levels'
partitioned by (ds string comment 'Partition key in yyyymmdd format.')
lifecycle 30;
dim_dataphin_security_identify_rule
create table if not exists dim_dataphin_security_identify_rule(
tenant_id bigint comment 'The tenant ID.',
identify_rule_id bigint comment 'The ID of the identification rule.',
identify_rule_name string comment 'The name of the identification rule.',
biz_unit_scope_type string comment 'Deprecated in v3.11. The scope of the business unit. Valid values: ALL, ENUMS, and EXPRESSION.',
biz_unit_ids string comment 'Deprecated in v3.11. The IDs of the enumerated business units.',
custom_biz_unit_expression string comment 'Deprecated in v3.11. The regular expression for matching business unit names.',
project_scope_type string comment 'Deprecated in v3.11. The scope of the project. Valid values: ALL, ENUMS, and EXPRESSION.',
project_ids string comment 'Deprecated in v3.11. The IDs of the enumerated projects.',
custom_project_expression string comment 'Deprecated in v3.11. The regular expression for matching project names.',
table_scope_type string comment 'Deprecated in v3.11. The scope of the table. Valid values: ALL, ENUMS, and EXPRESSION.',
custom_table_expression string comment 'Deprecated in v3.11. The regular expression for matching table names.',
enable_by_content string comment 'Deprecated in v3.11. Whether content identification is enabled.',
content_method_name string comment 'Deprecated in v3.11. The name of the content identification method.',
custom_content_expression string comment 'Deprecated in v3.11. The custom regular expression for content identification.',
enable_by_field_name string comment 'Deprecated in v3.11. Whether field name identification is enabled.',
custom_field_name_expression string comment 'Deprecated in v3.11. The custom regular expression for field name identification.',
priority_level bigint comment 'The priority.',
status string comment 'The status. Valid values: ENABLE and DISABLE.',
level_id bigint comment 'Deprecated in v3.11. The ID of the data level.',
level_name string comment 'Deprecated in v3.11. The name of the data level.',
level_index bigint comment 'Deprecated in v3.11. The index of the data level.',
classify_id bigint comment 'The ID of the data classification.',
classify_name string comment 'The name of the data classification.',
owner_id string comment 'The ID of the owner.',
owner_name string comment 'The name of the owner.',
creator_id string comment 'The ID of the creator.',
creator_name string comment 'The name of the creator.',
modifier_id string comment 'The ID of the modifier.',
modifier_name string comment 'The name of the modifier.',
gmt_create string comment 'The creation time.',
gmt_modified string comment 'The modification time.',
classify_ids string comment 'A comma-separated list of data classification IDs.'
)comment 'This table stores security identification rules.'
partitioned by (ds string comment 'The partition field in yyyymmdd format.')
lifecycle 30;
dim_dataphin_security_label
create table if not exists dim_dataphin_security_label (
tenant_id bigint comment 'The tenant ID.'
,label_id string comment 'The security label ID.'
,identify_rule_id string comment 'The identification rule ID.'
,identify_rule_name string comment 'The identification rule name.'
,identify_rate double comment 'The confidence rate of the identification.'
,identify_is_custom string comment 'Indicates whether the identification rule is custom. Valid values: `TRUE`, `FALSE`.'
,biz_unit_id bigint comment 'The business unit ID.'
,biz_unit_name string comment 'The business unit name.'
,project_id bigint comment 'The project ID.'
,project_name string comment 'The project name.'
,table_id string comment 'The table ID, which corresponds to the `table_id` in the `dim_dataphin_table` table.'
,table_catalog string comment 'The table catalog, such as the project or business unit name.'
,table_name string comment 'The table name.'
,table_desc string comment 'The table description.'
,table_partitioned string comment 'Indicates whether the table is partitioned. Valid values: `TRUE`, `FALSE`.'
,table_type string comment 'The table type. Valid values: `LOGIC_TABLE` (logical table), `LOGIC_DIM_TABLE` (dimension logical table), `LOGIC_FACT_TABLE` (fact logical table), `LOGIC_SUM_TABLE` (summary logical table), `PHYSICAL_TABLE` (physical table), and `REALTIME_LOGICAL_TABLE` (real-time logical table).'
,table_env string comment 'The table environment. Valid values: `PROD`, `DEV`.'
,field_id string comment 'The field ID, which corresponds to the `column_id` in the `dim_dataphin_table` table.'
,field_name string comment 'The field name.'
,field_desc string comment 'The field description.'
,biz_date string comment 'The business date.'
,status string comment 'The status. Valid values: `ENABLE` (enabled), `DISABLE` (disabled).'
,level_id bigint comment 'The security level ID.'
,level_name string comment 'The security level name.'
,level_index bigint comment 'The index of the security level.'
,classify_id bigint comment 'The data category ID.'
,classify_name string comment 'The data category name.'
,creator_id string comment 'The creator ID.'
,creator_name string comment 'The creator name.'
,modifier_id string comment 'The ID of the last modifier.'
,modifier_name string comment 'The name of the last modifier.'
,gmt_create string comment 'The creation time.'
,gmt_modified string comment 'The last modification time.'
,classfy_parent_path string comment 'The full path of the data category.'
,classfy_level1_path string comment 'The name of the level-1 category.'
,record_from string comment 'The source of the security label. Valid values: `CUSTOM` (manually specified), `AUTO_SCAN` (automatically identified), `AUTO_INHERIT` (automatically inherited).'
,field_from string comment 'A comma-separated list of source field names. This field is populated only when `record_from` is `AUTO_INHERIT`.'
,data_source_id bigint comment 'The data source ID.'
,datasource_name string comment 'The data source name.'
,data_source_type string comment 'The data source type. For valid values, see the `data_source_type` field in the `dim_dataphin_data_source` table.'
)
comment 'Stores the security labels and categories applied to tables and fields by identification scans.'
partitioned by (
ds string comment 'The partition field, in `yyyymmdd` format.'
)
lifecycle 30;
dim_dataphin_security_desensitize_rule
create table if not exists dim_dataphin_security_desensitize_rule(
tenant_id bigint comment 'Tenant ID',
desensitize_rule_id bigint comment 'Desensitization rule ID',
desensitize_rule_name string comment 'Desensitization rule name',
identify_rule_id bigint comment '[Deprecated in v3.11] Identification rule ID',
identify_rule_name string comment '[Deprecated in v3.11] Identification rule name',
enable_by_temp_query string comment 'Enables desensitization for ad-hoc queries.',
enable_by_read_write string comment 'Enables desensitization when writing to development tables.',
algorithm_type string comment 'Algorithm type',
algorithm_code string comment 'Algorithm code',
rule_method_text_param string comment 'Text parameters for the algorithm.',
rule_method_selected_param string comment 'Secret key parameters for the algorithm.',
status string comment 'Status. Valid values: ENABLE, DISABLE.',
owner_id string comment 'Owner ID',
owner_name string comment 'Owner name',
creator_id string comment 'Creator ID',
creator_name string comment 'Creator name',
modifier_id string comment 'Modifier ID',
modifier_name string comment 'Modifier name',
gmt_create string comment 'Creation time',
gmt_modified string comment 'Modification time',
classify_id bigint comment 'Classification ID',
classify_name string comment 'Classification name'
)comment 'Security desensitization rules.'
partitioned by (ds string comment 'Partition field, yyyymmdd format.')
lifecycle 30;
dim_dataphin_security_desensitize_rule_allowlist
create table if not exists dim_dataphin_security_desensitize_rule_allowlist(
tenant_id bigint comment 'The ID of the tenant.',
desensitize_allowlist_rule_id bigint comment 'The ID of the data desensitization allowlist rule.',
identify_rule_id bigint comment 'Deprecated in v3.11. The ID of the identification rule.',
identify_rule_name string comment 'Deprecated in v3.11. The name of the identification rule.',
filter_account string comment 'Accounts exempt from data desensitization.',
filter_by_temp_query string comment 'Whether the rule applies to ad-hoc queries.',
filter_by_read_write string comment 'Whether the rule applies to data development (read/write operations).',
biz_unit_scope_type string comment 'The scope type for business units. Valid values: ALL, ENUMS, and EXPRESSION.',
biz_unit_ids string comment 'Deprecated in v3.11. The IDs of the enumerated business units.',
custom_biz_unit_expression string comment 'The regular expression to match business unit names.',
project_scope_type string comment 'The scope type for projects. Valid values: ALL, ENUMS, and EXPRESSION.',
project_ids string comment 'Deprecated in v3.11. The IDs of the enumerated projects.',
custom_project_expression string comment 'The regular expression to match project names.',
table_scope_type string comment 'The scope type for tables. Valid values: ALL, ENUMS, and EXPRESSION.',
custom_table_expression string comment 'Deprecated in v3.11. The regular expression to match table names.',
start_time string comment 'Effective start time.',
end_time string comment 'Effective end time.',
identify_rule_exists string comment 'Whether the identification rule exists.',
status string comment 'The status of the rule. Valid values: ENABLE (enabled) and DISABLE (disabled).',
owner_id string comment 'The ID of the owner.',
owner_name string comment 'The name of the owner.',
creator_id string comment 'The ID of the creator.',
creator_name string comment 'The name of the creator.',
modifier_id string comment 'The ID of the last modifier.',
modifier_name string comment 'The name of the last modifier.',
gmt_create string comment 'The creation time.',
gmt_modified string comment 'The last modification time.',
classify_id bigint comment 'The ID of the classification.',
classify_name string comment 'The name of the classification.'
)comment 'Stores the allowlist rules for data desensitization.'
partitioned by (ds string comment 'The partition key, in yyyymmdd format.')
lifecycle 30;
Quality
dim_dataphin_dp_quality_watch
Field descriptions:
-
table_id-
If
watch_type=TABLE and table_type in (PHYSICAL_TABLE,LOGIC_FACT_TABLE,LOGIC_DIM_TABLE,LOGIC_SUM_TABLE, REALTIME_LOGICAL_TABLE)orwatch_type =REALTIME_LOGICAL_TABLE, this is the GUID from the asset inventory. -
If
watch_type=DATASOURCE_TABLE, this is an internal identifier with no specific meaning. -
If
watch_type in (DATASOURCE,INDEX), you can ignore this field.
-
-
index_idIf
watch_type=INDEX, this is thederived_index_idfrom the dim_dataphin_derived_index table. -
data_source_idIf
watch_type=DATASOURCE, this is thedata_source_idfrom the dim_dataphin_data_source table.
create table if not exists dim_dataphin_dp_quality_watch
(
tenant_id bigint comment 'The tenant ID.',
watch_id bigint comment 'The watch object ID.',
watch_type string comment 'The watch object type. Valid values: `INDEX` (index), `TABLE` (Dataphin table), `REALTIME_LOGICAL_TABLE` (real-time logical table), `DATASOURCE` (data source), and `DATASOURCE_TABLE` (global table).',
table_id string comment 'The table ID. If `watch_type` is `TABLE` and `table_type` is `PHYSICAL_TABLE`, this ID references the `dim_dataphin_table` table. If `watch_type` is `TABLE` or `INDEX` and `is_logical_table` is `Y`, this ID references the `dim_dataphin_model` table.',
table_type string comment 'The table type. Valid values include `LOGIC_DIM_TABLE` (dimension logical table), `LOGIC_FACT_TABLE` (fact logical table), `LOGIC_SUM_TABLE` (summary logical table), `LOGIC_LABEL_TABLE` (label logical table), `PHYSICAL_TABLE` (physical table), and `REALTIME_LOGICAL_TABLE` (real-time logical table).',
is_logical_table string comment 'Indicates whether the table is a logical table. Valid values: `Y` and `N`.',
table_env string comment 'The table environment.',
table_name string comment 'The table name.',
table_catalog string comment 'The table catalog. If `watch_type` is `TABLE` and `table_type` is `PHYSICAL_TABLE`, this contains the project name. If `watch_type` is `TABLE` or `INDEX` and `is_logical_table` is `Y`, this contains the business unit name.',
table_description string comment 'The table description.',
table_is_partitioned string comment 'Indicates whether the table is a partitioned table. Valid values: `Y` and `N`.',
table_owner_id string comment 'The table owner''s ID.',
table_owner_name string comment 'The table owner''s name.',
biz_unit_id bigint comment 'The ID of the table''s business unit.',
biz_unit_name string comment 'The name of the table''s business unit.',
project_id bigint comment 'The ID of the table''s project.',
project_name string comment 'The name of the table''s project.',
data_source_id bigint comment 'The data source ID. References the `data_source_id` column in the `dim_dataphin_data_source` table.',
data_source_type string comment 'The data source type. For valid values, see the `data_source_type` column in the `dim_dataphin_data_source` table.',
data_source_env string comment 'The data source environment.',
data_source_name string comment 'The data source name.',
data_source_scope string comment 'The data source scope.',
data_source_owner string comment 'The data source owner.',
data_source_from string comment 'The data source origin.',
index_id string comment 'The GUID of the derived index from the `dim_dataphin_derived_index` table when `watch_type` is `INDEX`.',
index_name string comment 'The index name.',
index_name_cn string comment 'The Chinese name of the index.',
index_env string comment 'The index environment.',
index_owner string comment 'The index owner.',
index_sum_logic_table_name string comment 'The name of the index''s summary logical table.',
index_desc string comment 'The index description.',
index_compute_type string comment 'The index computation type.',
index_granularity_id bigint comment 'The index granularity ID.',
index_granularity_cn string comment 'The Chinese name of the index granularity.',
index_period_attribute string comment 'The index period attribute.',
index_catalog string comment 'The index catalog, which is typically the business unit name.',
quality_owner_id string comment 'The quality owner''s ID.',
quality_owner_name string comment 'The quality owner''s name.',
watch_status string comment 'The watch object status. Valid values: `ENABLE` and `DISABLE`.',
creator_id string comment 'The creator''s ID.',
creator_name string comment 'The creator''s name.',
gmt_create string comment 'The creation time.',
modifier_id string comment 'The modifier''s ID.',
modifier_name string comment 'The modifier''s name.',
gmt_modified string comment 'The modification time.'
) comment 'Stores metadata for quality watch objects.'
partitioned by (ds string comment 'The partition column.')
lifecycle 30;
Dataphin DP quality rule
To retrieve details about monitored objects such as data sources, data source tables, or metrics, join the dim_dataphin_dp_quality_watch table on tenant_id and watch_id.
create table if not exists dim_dataphin_dp_quality_rule (
tenant_id bigint comment 'The tenant ID.'
,rule_id bigint comment 'The rule ID.'
,rule_name string comment 'The rule name.'
,strength string comment 'Rule strength. Valid values: STRONG, WEAK.'
,rule_desc string comment 'Rule description.'
,rule_catalogs string comment 'A comma-separated list of rule category tags. Valid values: CONSISTENT, EFFECTIVE, TIMELINESS, ACCURATE, UNIQUENESS, COMPLETENESS, STABILITY, CUSTOM.'
,template_id bigint comment 'The template ID.'
,template_type string comment 'Template type.'
,template_name string comment 'Template name.'
,watch_id bigint comment 'Monitoring object ID.'
,watch_type string comment 'Type of the monitoring object. Valid values: INDEX (metric), TABLE (Dataphin table), REALTIME_LOGICAL_TABLE (real-time logical table), DATASOURCE (data source), DATASOURCE_TABLE (table in a data source).'
,table_id string comment 'The table ID. If watch_type is TABLE and table_type is PHYSICAL_TABLE, this ID refers to the dim_dataphin_table. If watch_type is TABLE or INDEX and is_logical_table is ''Y'', this ID refers to the dim_dataphin_model.'
,table_type string comment 'Table type. Valid values: LOGIC_DIM_TABLE (logical dimension table), LOGIC_FACT_TABLE (logical fact table), LOGIC_SUM_TABLE (logical summary table), LOGIC_LABEL_TABLE (logical label table), PHYSICAL_TABLE (physical table), REALTIME_LOGICAL_TABLE (real-time logical table).'
,is_logical_table string comment 'Indicates whether the table is a logical table. Valid values: Y, N.'
,table_env string comment 'Table environment. Valid values: PROD, DEV.'
,table_name string comment 'Table name.'
,table_catalog string comment 'Table group. If watch_type is TABLE and table_type is PHYSICAL_TABLE, this is the project_name. If watch_type is TABLE or INDEX and is_logical_table is ''Y'', this is the biz_unit_name.'
,table_description string comment 'Table description.'
,table_is_partitioned string comment 'Indicates whether the table is partitioned. Valid values: Y, N.'
,biz_unit_id bigint comment 'ID of the business unit.'
,biz_unit_name string comment 'Name of the business unit.'
,project_id bigint comment 'ID of the project.'
,project_name string comment 'Name of the project.'
,validate_object_type string comment 'Type of the validation object. Valid values: TABLE (table-level rule, or rule for a table in a data source), COLUMN (column-level rule), DATASOURCE (data source-level rule), REALTIME (real-time rule), INDEX (metric-level rule), CHAIN (offline pipeline), UNKNOWN.'
,validate_object_name string comment 'Name of the validation object.'
,rule_properties string comment 'Key-value pairs for the dynamic template rule configuration.'
,enable_error_archive string comment 'Indicates whether to enable error archiving. Valid values: Y, N.'
,validate_condition string comment 'Validation condition.'
,rule_status string comment 'Rule status. Valid values: ENABLE, DISABLE.'
,schedule_ids string comment 'Comma-separated list of bound scheduling IDs.'
,alert_id bigint comment 'Alert setting ID.'
,alert_scope string comment 'Alert scope. Valid values: WATCH, RULE.'
,alert_users string comment 'IDs of users to be alerted, in JSON format.'
,alert_users_channels string comment 'Channels for sending alerts to users, in JSON format. Valid values: VOICE (voice call), SMS (text message), MAIL (email), DINGTALK_ROBOT (DingTalk chatbot), DingTalk (DingTalk app notification).'
,alert_duties string comment 'On-call schedule for alerts.'
,alert_duties_channels string comment 'Channels for sending alerts based on the on-call schedule. The values are the same as for alert_users_channels.'
,alert_quality_owner string comment 'Indicates whether to alert the quality owner. Valid values: Y, N.'
,alert_quality_owner_channels string comment 'Channels for alerting the quality owner. The values are the same as for alert_users_channels.'
,creator_id string comment 'Creator ID.'
,creator_name string comment 'Creator name.'
,gmt_create string comment 'Creation time.'
,modifier_id string comment 'ID of the last modifier.'
,modifier_name string comment 'Name of the last modifier.'
,gmt_modified string comment 'Time of last modification.'
,tag string comment 'Business tag. Valid values: DATA_STANDARD_MANUAL, DATA_STANDARD_AUTO, PIPELINE.'
,tag_unique_key string comment 'Unique identifier for the business tag.'
,tag_properties string comment 'Additional properties for the business tag, in JSON format.'
,archive_mode string comment 'Archiving mode. The only valid value is ERROR.'
,archive_store_type string comment 'Archive storage type. The only valid value is FILE_SYSTEM.'
,archive_table_name string comment 'Name of the archive table.',
data_source_id bigint comment 'Data source ID.',
data_source_type string comment 'Data source type.',
data_source_name string comment 'Data source name.',
attribute_config string comment 'Key-value pairs for business attributes. Pairs are separated by semicolons (;), and keys are separated from values by equal signs (=). Allowed values include custom text, single-select, and multi-select options, but not ranges.'
)
comment 'Stores data quality rules defined in Dataphin.'
partitioned by (
ds string comment 'Partition key.'
)
lifecycle 30;
fct_dataphin_dp_quality_rule_task_di
Starting with version 4.3, all validation summary metrics are stored in the execute_context field. A single rule execution record can now contain multiple validation summary metrics.
Summary Statistics
TOTAL_NUMBER: Total row count
NORMAL_NUMBER: Normal row count
NORMAL_RATE: Normal rate (%)
ERROR_NUMBER: Error row count
ERROR_RATE: Error rate (%)
ERROR_DATA: Error data
ERROR_DATA_FULL: Full error record
ERROR_DATA_ARCHIVE_TABLE: Error data archive table
STATISTICAL_VALUE: Statistic
STATISTICAL_DIFF_VALUE: Statistical difference
STATISTICAL_DIFF_RATE: Statistical difference rate (%)
STATISTICAL_VALIDATE_VALUE: Verification field statistic
STATISTICAL_COMPARE_VALUE: Comparison field statistic
ONE_DAY_FLUCTUATE: 1-day fluctuation rate (%)
SEVEN_DAY_FLUCTUATE: 7-day fluctuation rate (%)
THIRTY_DAY_FLUCTUATE: 30-day fluctuation rate (%)
SEVEN_DAYS_AVG_FLUCTUATE: 7-day average fluctuation rate (%)
THIRTY_DAYS_AVG_FLUCTUATE: 30-day average fluctuation rate (%)
FIRST_DAY_OF_MONTH_FLUCTUATE: Fluctuation rate vs. first day of the month (%)
LAST_MONTH_FLUCTUATE: Month-over-month fluctuation rate (%)
LAST_YEAR_FLUCTUATE: Year-over-year fluctuation rate (%)
DATASOURCE_CONNECTIVE: Data source connectivity
TABLE_SCHEMA_CHANGED: Table schema change check
REALTIME_COMPARE_CHAIN_1_DIFF: Chain 1 deviation rate (%)
REALTIME_COMPARE_CHAIN_2_DIFF: Chain 2 deviation rate (%)
REALTIME_COMPARE_CHAIN_1: Real-time chain 1 metric
REALTIME_COMPARE_CHAIN_2: Real-time chain 2 metric
REALTIME_OFFLINE_CHAIN_DIFF: Offline chain deviation rate (%)
REALTIME_STATISTICAL_VALUE: Real-time statistic
REALTIME_ONE_DAY_FLUCTUATE: Real-time 1-day fluctuation rate (%)
REALTIME_SEVEN_DAY_FLUCTUATE: Real-time 7-day fluctuation rate (%)
REALTIME_THIRTY_DAY_FLUCTUATE: Real-time 30-day fluctuation rate (%)
CUSTOM_STATISTICAL_SQL: Custom statistical SQL
CUSTOM_STATISTICAL_SQL_TODAY: Custom metric SQL (today)
CUSTOM_STATISTICAL_SQL_YESTERDAY: Custom metric SQL (yesterday)
CUSTOM_STATISTICAL_SQL_SEVEN_DAY: Custom metric SQL (7-day)
CUSTOM_STATISTICAL_SQL_SEVEN_DAYS_AVG: Custom metric SQL (7-day average)
CUSTOM_STATISTICAL_SQL_THIRTY_DAY: Custom metric SQL (30-day)
CUSTOM_STATISTICAL_SQL_THIRTY_DAYS_AVG: Custom metric SQL (30-day average)
CUSTOM_STATISTICAL_SQL_LAST_MONTH: Custom metric SQL (last month)
CUSTOM_STATISTICAL_SQL_LAST_YEAR: Custom metric SQL (last year)
TABLE_SIZE: Table size (bytes)
TABLE_SIZE_ONE_DAY: Table size (yesterday) (bytes)
TABLE_SIZE_SEVEN_DAY: Table size (7 days ago) (bytes)
TABLE_SIZE_SEVEN_DAYS_AVG: 7-day average table size (bytes)
TABLE_SIZE_THIRTY_DAY: Table size (30 days ago) (bytes)
TABLE_SIZE_THIRTY_DAYS_AVG: 30-day average table size (bytes)
TABLE_SIZE_LAST_MONTH: Last month's table size (bytes)
TABLE_SIZE_LAST_YEAR: Last year's table size (bytes)
TABLE_SIZE_FIRST_DAY_OF_MONTH: Table size (first day of month) (bytes)
TABLE_LINE_COUNT: Table row count
TABLE_LINE_COUNT_ONE_DAY: Table row count (yesterday)
TABLE_LINE_COUNT_SEVEN_DAY: Table row count (7 days ago)
TABLE_LINE_COUNT_SEVEN_DAYS_AVG: 7-day average table row count
TABLE_LINE_COUNT_THIRTY_DAY: Table row count (30 days ago)
TABLE_LINE_COUNT_THIRTY_DAYS_AVG: 30-day average table row count
TABLE_LINE_COUNT_LAST_MONTH: Last month's table row count
TABLE_LINE_COUNT_LAST_YEAR: Last year's table row count
TABLE_LINE_COUNT_FIRST_DAY_OF_MONTH: Table row count (first day of month)
TABLE_PARTITION_COUNT: Table partition count
TABLE_PARTITION_COUNT_ONE_DAY: Table partition count (yesterday)
TABLE_PARTITION_COUNT_SEVEN_DAY: Table partition count (7 days ago)
TABLE_PARTITION_COUNT_SEVEN_DAYS_AVG: 7-day average table partition count
TABLE_PARTITION_COUNT_THIRTY_DAY: Table partition count (30 days ago)
TABLE_PARTITION_COUNT_THIRTY_DAYS_AVG: 30-day average table partition count
TABLE_PARTITION_COUNT_LAST_MONTH: Last month's table partition count
TABLE_PARTITION_COUNT_LAST_YEAR: Last year's table partition count
TABLE_PARTITION_COUNT_FIRST_DAY_OF_MONTH: Table partition count (first day of month)
PARTITION_SIZE: Partition size (bytes)
PARTITION_SIZE_ONE_DAY: Partition size (yesterday) (bytes)
PARTITION_SIZE_SEVEN_DAY: Partition size (7 days ago) (bytes)
PARTITION_SIZE_SEVEN_DAYS_AVG: 7-day average partition size (bytes)
PARTITION_SIZE_THIRTY_DAY: Partition size (30 days ago) (bytes)
PARTITION_SIZE_THIRTY_DAYS_AVG: 30-day average partition size (bytes)
PARTITION_SIZE_LAST_MONTH: Last month's partition size (bytes)
PARTITION_SIZE_LAST_YEAR: Last year's partition size (bytes)
PARTITION_SIZE_FIRST_DAY_OF_MONTH: Partition size (first day of month) (bytes)
PARTITION_LINE_COUNT: Partition row count
PARTITION_LINE_COUNT_ONE_DAY: Partition row count (yesterday)
PARTITION_LINE_COUNT_SEVEN_DAY: Partition row count (7 days ago)
PARTITION_LINE_COUNT_SEVEN_DAYS_AVG: 7-day average partition row count
PARTITION_LINE_COUNT_THIRTY_DAY: Partition row count (30 days ago)
PARTITION_LINE_COUNT_THIRTY_DAYS_AVG: 30-day average partition row count
PARTITION_LINE_COUNT_LAST_MONTH: Last month's partition row count
PARTITION_LINE_COUNT_LAST_YEAR: Last year's partition row count
PARTITION_LINE_COUNT_FIRST_DAY_OF_MONTH: Partition row count (first day of month)
FIELD_AVG: Field average
FIELD_AVG_ONE_DAY: Field average (yesterday)
FIELD_AVG_SEVEN_DAY: Field average (7 days ago)
FIELD_AVG_SEVEN_DAYS_AVG: 7-day average of field average
FIELD_AVG_THIRTY_DAY: Field average (30 days ago)
FIELD_AVG_THIRTY_DAYS_AVG: 30-day average of field average
FIELD_AVG_LAST_MONTH: Last month's field average
FIELD_AVG_LAST_YEAR: Last year's field average
FIELD_AVG_FIRST_DAY_OF_MONTH: Field average (first day of month)
FIELD_MAX: Field maximum
FIELD_MAX_ONE_DAY: Field maximum (yesterday)
FIELD_MAX_SEVEN_DAY: Field maximum (7 days ago)
FIELD_MAX_SEVEN_DAYS_AVG: 7-day average of field maximum
FIELD_MAX_THIRTY_DAY: Field maximum (30 days ago)
FIELD_MAX_THIRTY_DAYS_AVG: 30-day average of field maximum
FIELD_MAX_LAST_MONTH: Last month's field maximum
FIELD_MAX_LAST_YEAR: Last year's field maximum
FIELD_MAX_FIRST_DAY_OF_MONTH: Field maximum (first day of month)
FIELD_MIN: Field minimum
FIELD_MIN_ONE_DAY: Field minimum (yesterday)
FIELD_MIN_SEVEN_DAY: Field minimum (7 days ago)
FIELD_MIN_SEVEN_DAYS_AVG: 7-day average of field minimum
FIELD_MIN_THIRTY_DAY: Field minimum (30 days ago)
FIELD_MIN_THIRTY_DAYS_AVG: 30-day average of field minimum
FIELD_MIN_LAST_MONTH: Last month's field minimum
FIELD_MIN_LAST_YEAR: Last year's field minimum
FIELD_MIN_FIRST_DAY_OF_MONTH: Field minimum (first day of month)
FIELD_SUM: Field sum
FIELD_SUM_ONE_DAY: Field sum (yesterday)
FIELD_SUM_SEVEN_DAY: Field sum (7 days ago)
FIELD_SUM_SEVEN_DAYS_AVG: 7-day average of field sum
FIELD_SUM_THIRTY_DAY: Field sum (30 days ago)
FIELD_SUM_THIRTY_DAYS_AVG: 30-day average of field sum
FIELD_SUM_LAST_MONTH: Last month's field sum
FIELD_SUM_LAST_YEAR: Last year's field sum
FIELD_SUM_FIRST_DAY_OF_MONTH: Field sum (first day of month)
FIELD_COUNT: Field count
FIELD_COUNT_ONE_DAY: Field count (yesterday)
FIELD_COUNT_SEVEN_DAY: Field count (7 days ago)
FIELD_COUNT_SEVEN_DAYS_AVG: 7-day average of field count
FIELD_COUNT_THIRTY_DAY: Field count (30 days ago)
FIELD_COUNT_THIRTY_DAYS_AVG: 30-day average of field count
FIELD_COUNT_LAST_MONTH: Last month's field count
FIELD_COUNT_LAST_YEAR: Last year's field count
FIELD_COUNT_FIRST_DAY_OF_MONTH: Field count (first day of month)
FIELD_DISTINCT_COUNT: Field distinct count
FIELD_DISTINCT_COUNT_ONE_DAY: Field distinct count (yesterday)
FIELD_DISTINCT_COUNT_SEVEN_DAY: Field distinct count (7 days ago)
FIELD_DISTINCT_COUNT_SEVEN_DAYS_AVG: 7-day average of field distinct count
FIELD_DISTINCT_COUNT_THIRTY_DAY: Field distinct count (30 days ago)
FIELD_DISTINCT_COUNT_THIRTY_DAYS_AVG: 30-day average of field distinct count
FIELD_DISTINCT_COUNT_LAST_MONTH: Last month's field distinct count
FIELD_DISTINCT_COUNT_LAST_YEAR: Last year's field distinct count
FIELD_DISTINCT_COUNT_FIRST_DAY_OF_MONTH: Field distinct count (first day of month)
FIELD_GROUP_COUNT: Field group count
FIELD_GROUP_COUNT_ONE_DAY: Field group count (yesterday)
FIELD_GROUP_COUNT_SEVEN_DAY: Field group count (7 days ago)
FIELD_GROUP_COUNT_SEVEN_DAYS_AVG: 7-day average of field group count
FIELD_GROUP_COUNT_THIRTY_DAY: Field group count (30 days ago)
FIELD_GROUP_COUNT_THIRTY_DAYS_AVG: 30-day average of field group count
FIELD_GROUP_COUNT_LAST_MONTH: Last month's field group count
FIELD_GROUP_COUNT_LAST_YEAR: Last year's field group count
FIELD_GROUP_COUNT_FIRST_DAY_OF_MONTH: Field group count (first day of month)
FIELD_NULL_COUNT: Field null count
FIELD_NULL_COUNT_ONE_DAY: Field null count (yesterday)
FIELD_NULL_COUNT_SEVEN_DAY: Field null count (7 days ago)
FIELD_NULL_COUNT_SEVEN_DAYS_AVG: 7-day average of field null count
FIELD_NULL_COUNT_THIRTY_DAY: Field null count (30 days ago)
FIELD_NULL_COUNT_THIRTY_DAYS_AVG: 30-day average of field null count
FIELD_NULL_COUNT_LAST_MONTH: Last month's field null count
FIELD_NULL_COUNT_LAST_YEAR: Last year's field null count
FIELD_NULL_COUNT_FIRST_DAY_OF_MONTH: Field null count (first day of month)
FIELD_NULL_RATE: Field null rate (%)
FIELD_NULL_RATE_ONE_DAY: Field null rate (yesterday) (%)
FIELD_NULL_RATE_SEVEN_DAY: Field null rate (7 days ago) (%)
FIELD_NULL_RATE_SEVEN_DAYS_AVG: 7-day average of field null rate (%)
FIELD_NULL_RATE_THIRTY_DAY: Field null rate (30 days ago) (%)
FIELD_NULL_RATE_THIRTY_DAYS_AVG: 30-day average of field null rate (%)
FIELD_NULL_RATE_LAST_MONTH: Last month's field null rate (%)
FIELD_NULL_RATE_LAST_YEAR: Last year's field null rate (%)
FIELD_NULL_RATE_FIRST_DAY_OF_MONTH: Field null rate (first day of month) (%)
FIELD_DUPLICATE_COUNT: Field duplicate row count
FIELD_DUPLICATE_COUNT_ONE_DAY: Field duplicate row count (yesterday)
FIELD_DUPLICATE_COUNT_SEVEN_DAY: Field duplicate row count (7 days ago)
FIELD_DUPLICATE_COUNT_SEVEN_DAYS_AVG: 7-day average of field duplicate row count
FIELD_DUPLICATE_COUNT_THIRTY_DAY: Field duplicate row count (30 days ago)
FIELD_DUPLICATE_COUNT_THIRTY_DAYS_AVG: 30-day average of field duplicate row count
FIELD_DUPLICATE_COUNT_LAST_MONTH: Last month's field duplicate row count
FIELD_DUPLICATE_COUNT_LAST_YEAR: Last year's field duplicate row count
FIELD_DUPLICATE_COUNT_FIRST_DAY_OF_MONTH: Field duplicate row count (first day of month)
FIELD_DUPLICATE_RATE: Field duplicate row rate (%)
FIELD_DUPLICATE_RATE_ONE_DAY: Field duplicate row rate (yesterday) (%)
FIELD_DUPLICATE_RATE_SEVEN_DAY: Field duplicate row rate (7 days ago) (%)
FIELD_DUPLICATE_RATE_SEVEN_DAYS_AVG: 7-day average of field duplicate row rate (%)
FIELD_DUPLICATE_RATE_THIRTY_DAY: Field duplicate row rate (30 days ago) (%)
FIELD_DUPLICATE_RATE_THIRTY_DAYS_AVG: 30-day average of field duplicate row rate (%)
FIELD_DUPLICATE_RATE_LAST_MONTH: Last month's field duplicate row rate (%)
FIELD_DUPLICATE_RATE_LAST_YEAR: Last year's field duplicate row rate (%)
FIELD_DUPLICATE_RATE_FIRST_DAY_OF_MONTH: Field duplicate row rate (first day of month) (%)
VALUE: Original value
COUNT: Count
DISTINCT: Distinct count
SUM: Sum
MAX: Maximum
MIN: Minimum
AVG: Average
FIELD_VALIDATE_FIELDS: Verification field list
CODE_TABLE_DATA: Code table data
METRIC_N_DAY: N-day statistic
Summary statistics metric types
SINGLE_TABLE_STATISTICAL_DIFF_VALUE: Single-table statistical difference
SINGLE_TABLE_STATISTICAL_DIFF_RATE: Single-table statistical difference rate (%)
DOUBLE_TABLE_STATISTICAL_DIFF_VALUE: Two-table statistical difference
DOUBLE_TABLE_STATISTICAL_VALIDATE_VALUE: Validation field value
DOUBLE_TABLE_STATISTICAL_COMPARE_VALUE: Comparison field value
SINGLE_TABLE_STATISTICAL_VALIDATE_VALUE: Validation field value
SINGLE_TABLE_STATISTICAL_COMPARE_VALUE: Comparison field value
DOUBLE_TABLE_STATISTICAL_DIFF_RATE: Two-table statistical difference rate (%)
ERROR_NUMBER: Error count
NORMAL_NUMBER: Success count
ERROR_RATE: Error rate
NORMAL_RATE: Success rate
STATISTICAL_VALUE: Statistical value
ONE_DAY_FLUCTUATE: 1-day fluctuation (%)
SEVEN_DAY_FLUCTUATE: 7-day fluctuation (%)
THIRTY_DAY_FLUCTUATE: 30-day fluctuation (%)
LAST_MONTH_FLUCTUATE: Month-over-month fluctuation (%)
LAST_YEAR_FLUCTUATE: Year-over-year fluctuation (%)
DATASOURCE_CONNECTIVE: Data source connectivity
TABLE_SCHEMA_CHANGED: Table schema change
CUSTOM_STATISTICAL_SQL: Custom statistical SQL
TABLE_SIZE: Table size
TABLE_LINE_COUNT: Table row count
TABLE_PARTITION_COUNT: Table partition count
PARTITION_SIZE: Partition size
PARTITION_LINE_COUNT: Partition row count
FIELD_NULL_COUNT: Field null count
FIELD_NULL_RATE: Field null rate (%)
FIELD_AVG: Field average
FIELD_MAX: Field maximum
FIELD_MIN: Field minimum
FIELD_SUM: Field sum
FIELD_COUNT: Field count
FIELD_DISTINCT_COUNT: Field distinct count
FIELD_GROUP_COUNT: Field group count
FIELD_DUPLICATE_COUNT: Field duplicate count
FIELD_DUPLICATE_RATE: Field duplicate rate
CODE_TABLE_COMPARE_TOTAL_NUMBER: Code table comparison: Total rows
CODE_TABLE_COMPARE_ERROR_NUMBER: Code table comparison: Error rows
ARCHIVE_TABLE_DDL: Archive table DDL
ARCHIVE_TABLE_DQL: Archive table DQL
Getting aggregated statistical metrics
select
tenant_id
,rule_task_id
,rule_id
,watch_id
,metric_type
,metric_name
,bizdate
,metric_value
,bizdate_format
from
(select
tenant_id
,rule_task_id
,rule_id
,watch_id
,get_json_object(metric_value, '$.metricType') as metric_type
,get_json_object(metric_value, '$.metricName') as metric_name
,get_json_object(metric_value, '$.bizDate') as bizdate
,get_json_object(metric_value, '$.metricValue') as metric_value
,get_json_object(metric_value, '$.bizDateFormat') as bizdate_format
,row_number() over(partition by tenant_id,rule_task_id,rule_id,get_json_object(metric_value, '$.metricName'),get_json_object(metric_value, '$.bizDate') order by gmt_create desc) as rank_num
from (
select
--regexp_replace(regexp_replace(execute_context,'\\\\r|\\\\t|\\\\v|\\\\f|\\\\n|\\\\u([[:xdigit:]]){4}|\\{\\\\"|\\\\"\\}|\\\\"',''),'"metricValue":NaN','"metricValue":"NaN"}'),
split(regexp_replace(regexp_replace(get_json_object(
regexp_replace(execute_context,'"metricValue":NaN','"metricValue":"NaN"')
, '$.allMetricValues'), '\\[|\\]', ''),'(\\}(\t\r\v\f\n)*,)','}abcyugongcba'), 'abcyugongcba') as all_metric_values
,*
from fct_dataphin_dp_quality_rule_task_di
where ds='${bizdate}'
) json_table
LATERAL VIEW explode(all_metric_values) exploded_table AS metric_value
)tt
where tt.rank_num = 1
;create table if not exists fct_dataphin_dp_quality_rule_task_di
(
tenant_id bigint comment 'Tenant ID.',
rule_task_id bigint comment 'Rule task ID.',
watch_task_id bigint comment 'Associated watch task ID.',
watch_id bigint comment 'Monitored object ID.',
watch_detail string comment 'Monitored object details.',
watch_type string comment 'Monitored object type.',
watch_task_status string comment 'Watch task status. Values: WAITING, RUNNING, SUCCESS, FAILED.',
data_source_id bigint comment 'Data source ID.',
data_source_type string comment 'Data source type.',
data_source_scope string comment 'Data source scope.',
data_source_owner string comment 'Data source owner.',
table_catalog string comment 'Table catalog.',
table_name string comment 'Table name.',
table_type string comment 'Table type.',
table_owner string comment 'Table owner.',
biz_unit_name string comment 'Business unit name.',
project_name string comment 'Project name.',
template_id bigint comment 'Rule template ID.',
template_type string comment 'Rule template type.',
template_name string comment 'Rule template name.',
template string comment 'Rule template details.',
rule_id bigint comment 'Rule ID.',
rule_detail string comment 'Rule details.',
rule_name string comment 'Rule name.',
rule_strength string comment 'Rule strength.',
rule_catalogs string comment 'Rule categories, separated by commas. Values: CONSISTENT, EFFECTIVE, TIMELINESS, ACCURATE, UNIQUENESS, COMPLETENESS, STABILITY, CUSTOM.',
validate_object_type string comment 'Validated object type.',
validate_object_name string comment 'Validated object name.',
biz_date string comment 'Business date.',
biz_date_format string comment 'Business date format.',
quality_owner_id string comment 'Quality owner ID.',
quality_owner_name string comment 'Quality owner name.',
rule_task_status string comment 'Rule task status. Values: WAITING, RUNNING, SUCCESS, FAILED.',
execute_context string comment 'Rule execution context.',
is_validate_result string comment 'Whether the rule validation passed. Values: Y, N.',
start_time string comment 'Start time.',
end_time string comment 'End time.',
error_code string comment 'Error code.',
schedule_id bigint comment 'Schedule setting ID.',
schedule_type string comment 'Schedule setting type.',
schedule_partition_values string comment 'Calculated partition values for the schedule.',
schedule_params string comment 'Parameter values for the schedule.',
trace_id string comment 'Trace ID. Format: type:id.',
creator_id string comment 'Creator''s ID.',
creator_name string comment 'Creator''s name.',
gmt_create string comment 'Creation time.',
modifier_id string comment 'Last modifier''s ID.',
modifier_name string comment 'Last modifier''s name.',
gmt_modified string comment 'Last modification time.',
tag string comment 'Business tag. Values: DATA_STANDARD_MANUAL, DATA_STANDARD_AUTO, PIPELINE.',
tag_unique_key string comment 'Unique key for the business tag.'
) comment 'Quality rule task table.'
partitioned by (ds string comment 'Partition field.')
lifecycle 90
;
dim_dataphin_dp_quality_watch_member
create table if not exists dim_dataphin_dp_quality_watch_member
(
tenant_id bigint comment 'Tenant ID.',
watch_id bigint comment 'Monitored object ID.',
watch_type string comment 'Monitored object type. Valid values: INDEX (metric), TABLE (Dataphin table), REALTIME_LOGICAL_TABLE (real-time logical table), DATASOURCE (data source), and DATASOURCE_TABLE (data source table).',
watch_status string comment 'Status. Valid values: ENABLE and DISABLE.',
user_id string comment 'User ID.',
user_name string comment 'User name.',
user_type string comment 'Member type. Fixed value: QUALITY_OWNER.',
creator_id string comment 'Creator ID.',
creator_name string comment 'Creator name.',
gmt_create string comment 'Creation time.',
modifier_id string comment 'Last modifier ID.',
modifier_name string comment 'Last modifier name.',
gmt_modified string comment 'Last modification time.'
) comment 'Quality monitoring object owner table.'
partitioned by (ds string comment 'Partition column.')
lifecycle 30;
dim_dataphin_dp_quality_watch_alert
create table if not exists dim_dataphin_dp_quality_watch_alert
(
tenant_id bigint comment 'Tenant ID.',
alert_id bigint comment 'Alert setting ID. For configurations that apply to all rules. For rule-specific configurations, see the `dim_dataphin_dp_quality_rule` table.',
alert_name string comment 'Alert name.',
alert_scope string comment 'Alert scope. The value is `WATCH`.',
alert_users string comment 'User IDs to alert, in JSON format.',
alert_users_channels string comment 'Channels for alerting users, in JSON format. Valid values: `VOICE`, `SMS`, `MAIL` (email), `DINGTALK_ROBOT` (DingTalk robot), and `DINGDING` (DingTalk work notification).',
alert_duties string comment 'On-call schedule to alert.',
alert_duties_channels string comment 'Channels for alerting the on-call schedule. For valid values, see `alert_users_channels`.',
alert_quality_owner string comment 'Specifies whether to alert the quality owner. Valid values: `Y`, `N`.',
alert_quality_owner_channels string comment 'Channels for alerting the quality owner. For valid values, see `alert_users_channels`.',
custom_alert_scope string comment 'Custom alert scope in JSON format, containing the rule ID and name.',
sort_index bigint comment 'Sort index.',
watch_id bigint comment 'Monitored object ID.',
watch_type string comment 'Type of the monitored object. Valid values: `INDEX` (metric), `TABLE` (Dataphin table), `REALTIME_LOGICAL_TABLE` (real-time logical table), `DATASOURCE` (data source), and `DATASOURCE_TABLE` (data source table).',
table_id string comment 'Table ID. If `watch_type` is `TABLE` and `table_type` is `PHYSICAL_TABLE`, this ID is from `dim_dataphin_table`. If `watch_type` is `TABLE` or `INDEX` for a logical table, this ID is from `dim_dataphin_model`.',
table_type string comment 'Table type. Valid values: `LOGIC_DIM_TABLE` (dimension logical), `LOGIC_FACT_TABLE` (fact logical), `LOGIC_SUM_TABLE` (summary logical), `LOGIC_LABEL_TABLE` (label logical), `PHYSICAL_TABLE` (physical), and `REALTIME_LOGICAL_TABLE` (real-time logical).',
table_env string comment 'Table environment. Valid values: `PROD`, `DEV`.',
table_name string comment 'Table name.',
table_catalog string comment 'Table group. If `watch_type` is `TABLE` and `table_type` is `PHYSICAL_TABLE`, this is the project name. If `watch_type` is `TABLE` or `INDEX` for a logical table, this is the business unit name.',
creator_id string comment 'Creator ID.',
creator_name string comment 'Creator name.',
gmt_create string comment 'Creation time.',
modifier_id string comment 'Last modifier ID.',
modifier_name string comment 'Last modifier name.',
gmt_modified string comment 'Last modification time.'
) comment 'Alert configurations for quality monitoring objects.'
partitioned by (ds string comment 'Partition key.')
lifecycle 90
;
fct_dataphin_dp_quality_rule_score_di
create table if not exists fct_dataphin_dp_quality_rule_score_di
(
id bigint comment 'The auto-incrementing ID.',
tenant_id bigint comment 'The ID of the tenant.',
rule_id bigint comment 'The ID of the rule.',
rule_name string comment 'Rule name.',
rule_catalogs string comment 'Rule category tags.',
rule_weight bigint comment 'The scoring weight for the monitored table.',
rule_score double comment 'The rule quality score.',
strength string comment 'Rule strength. Valid values: STRONG, WEAK.',
template_id bigint comment 'The ID of the template.',
template_type string comment 'The template type.',
template_name string comment 'Template name.',
template_cn_name string comment 'Chinese name of the template.',
watch_id bigint comment 'Monitored object ID.',
watch_type string comment 'Type of the monitored object. Valid values: INDEX, TABLE, REALTIME_LOGICAL_TABLE, DATASOURCE, DATASOURCE_TABLE.',
watch_weight bigint comment 'The scoring weight for the rule.',
table_id string comment 'Table ID. This is the ID in `dim_dataphin_table` if `watch_type` is `TABLE` and `table_type` is `PHYSICAL_TABLE`. This is the ID in `dim_dataphin_model` if `watch_type` is `TABLE` or `INDEX` and `is_logical_table` is `Y`.',
table_type string comment 'Table type. Valid values: `LOGIC_DIM_TABLE`, `LOGIC_FACT_TABLE`, `LOGIC_SUM_TABLE`, `LOGIC_LABEL_TABLE`, `PHYSICAL_TABLE`, and `REALTIME_LOGICAL_TABLE`.',
is_logical_table string comment 'Indicates if the table is a logical table (Y/N).',
table_env string comment 'Table environment. Valid values: PROD, DEV.',
table_name string comment 'Table name.',
table_catalog string comment 'Table catalog. Project name if `watch_type` is `TABLE` and `table_type` is `PHYSICAL_TABLE`. Business unit name if `watch_type` is `TABLE` or `INDEX` and `is_logical_table` is `Y`.',
biz_unit_id bigint comment 'ID of the business unit.',
biz_unit_name string comment 'Name of the business unit.',
project_id bigint comment 'Project ID.',
project_name string comment 'Project name.',
validate_object_type string comment 'Validation object type. Valid values: TABLE, COLUMN, DATASOURCE, REALTIME, INDEX, CHAIN, and UNKNOWN.',
validate_object_name string comment 'Validation object name.',
rule_status string comment 'Rule status. Valid values: ENABLE, DISABLE.',
biz_date string comment 'Business date in YYYY-MM-DD format.',
rule_task_id bigint comment 'Rule execution ID.',
rule_task_status string comment 'Rule execution status. Valid values: WAITING, RUNNING, SUCCESS, FAILED.',
is_validate_result string comment 'Indicates if the validation passed (Y/N).',
start_time string comment 'The start time.',
end_time string comment 'The end time.',
gmt_create string comment 'The creation time.',
gmt_modified string comment 'The last modified time.'
) comment 'The quality rule scoring table.'
partitioned by (ds string comment 'The partition column.')
lifecycle 91
;
dim_dataphin_dp_quality_schedule
create table if not exists dim_dataphin_dp_quality_schedule(
tenant_id bigint comment 'The ID of the tenant.',
schedule_id bigint comment 'The ID of the schedule.',
schedule_type string comment 'The schedule type. Valid values include `STATIC_TASK_TRIGGER` (time-based), `TRIGGER_SCHEDULE` (task-triggered), and `CODE_CHECK_TRIGGER` (triggered by data updates).',
schedule_name string comment 'The name of the schedule.',
watch_id bigint comment 'The ID of the associated watch.',
watch_type string comment 'The type of the associated watch.',
partition_type string comment 'The type of the partition expression. Valid value: `CUSTOM`.',
partition_expression string comment 'The partition expression.',
date_format string comment 'The date format.',
trigger_schedule_nodes string comment 'The nodes that trigger the schedule.',
static_task_trigger_type string comment 'The trigger mode for a task-triggered schedule. Valid values: `ALL_TASKS_FINISHED` to trigger after all specified tasks complete successfully, `ONE_TASKS_FINISHED` to trigger after each specified task completes successfully, and `PRE_ONE_TASKS_START` to trigger before each specified task starts.',
cron_expression string comment 'The cron expression.',
schedule_condition string comment 'Schedule conditions in JSON format.',
validate_partition_type string comment 'The type of the partition range to validate. Valid values: `TASK_REFERRED_PARTITION` for the partitions updated by the task, and `USER_DEFINED_PARTITION` for a custom partition range.',
creator_id string comment 'The ID of the creator.',
creator_name string comment 'The name of the creator.',
modifier_id string comment 'The ID of the modifier.',
modifier_name string comment 'The name of the modifier.',
gmt_create string comment 'The creation time.',
gmt_modified string comment 'The modification time.'
)comment 'Stores the data quality schedule configurations.'
partitioned by(ds string comment 'The partition field, in `yyyymmdd` format.')
lifecycle 15;
fct_dataphin_dp_quality_problem_list_df
-
To retrieve detailed information about monitored objects such as data sources, data source tables, or metrics, join the
dim_dataphin_dp_quality_watchtable usingtenant_id/watch_id. -
To retrieve detailed information about a quality rule, join the
dim_dataphin_dp_quality_ruletable usingtenant_id/rule_id. -
Composition of the problem object name and the validation object name (for displaying the problem object):
-
problem_object_big_type = TABLE/DATASOURCE_TABLE/REALTIME_LOGICAL_TABLE
-
When
validate_object_type=TABLE, bothproblem_object_nameandvalidate_object_nameare the table name. -
When
validate_object_type=COLUMN/REALTIME,problem_object_nameistable_name.column_name, andvalidate_object_nameiscolumn_name. -
When
validate_object_type=CHAIN,problem_object_nameis the real-time table name, andvalidate_object_nameis the offline table name.
-
-
problem_object_big_type = DATASOURCE
-
When
validate_object_type=TABLE, bothproblem_object_nameandvalidate_object_nameare the data source table name. -
When
validate_object_type=DATASOURCE, bothproblem_object_nameandvalidate_object_nameare the data source name.
-
-
problem_object_big_type = INDEX
-
For system-detected problems,
problem_object_nameistable_name.metric_name, andvalidate_object_nameismetric_name. -
For manual entries, both
problem_object_nameandvalidate_object_nameare the metric name.
-
-
problem_object_big_type = QD_FEATURE
Currently, label issues must be entered manually. In this case,
problem_object_nameis the label name, andvalidate_object_nameis the label code.
-
create table if not exists fct_dataphin_dp_quality_problem_list_df(
tenant_id bigint,
problem_object_id bigint comment 'The ID of the problem object.',
problem_object_big_type string comment 'The primary category of the problem object. Valid values: TABLE for a Dataphin table, DATASOURCE_TABLE for a global table, INDEX for a metric, DATASOURCE for a data source, REALTIME_LOGICAL_TABLE for a real-time meta table, and QD_FEATURE for a tag.',
validate_object_type string comment 'The type of the validation object. Valid values: TABLE, COLUMN, DATASOURCE, INDEX, REALTIME, CHAIN for a batch pipeline, and QD_FEATURE for a tag.',
problem_object_name string comment 'The name of the problem object.',
validate_object_name string comment 'The name of the validation object.',
watch_id string comment 'The ID of the monitored object.',
table_id string comment 'The ID of the table.',
table_name string comment 'The name of the table.',
table_desc string comment 'The description of the table.',
datasource_type string comment 'The type of the data source.',
datasource_id string comment 'The ID of the data source.',
datasource_name string comment 'The name of the data source.',
index_id string comment 'The ID of the metric.',
index_name_en string comment 'The English name of the metric.',
index_name_cn string comment 'The Chinese name of the metric.',
index_biz_unit_name string comment 'The name of the business unit for the metric.',
qd_feature_id string comment 'The ID of the tag.',
qd_feature_name string comment 'The name of the tag.',
qd_feature_code string comment 'The code for the tag.',
rule_id string comment 'The ID of the rule.',
rule_name string comment 'The name of the rule.',
rule_strength string comment 'The strength of the rule.',
last_problem_status string comment 'The latest status of the quality issue. Valid values: NEW (pending), VERIFY (verifying), FINISHED (resolved), and IGNORE (ignored).',
last_problem_types string comment 'The latest types of the quality issue, in JSON format.',
problem_desc string comment 'The description of the quality issue.',
problem_submit_type string comment 'The submission method for the quality issue. Valid values: SYSTEM (system-detected) and MANUAL (manually entered).',
problem_submit_time string comment 'The submission time of the quality issue. This is the latest validation time for system-detected issues and the submission time for manually entered issues.',
problem_submitter_id string comment 'The ID of the user who manually submitted the quality issue.',
problem_submitter_name string comment 'The name of the user who manually submitted the quality issue.',
last_problem_owner_ids string comment 'A comma-separated list of the latest owner IDs for the quality issue.',
last_problem_owner_names string comment 'A comma-separated list of the latest owner names for the quality issue.',
last_validate_scope string comment 'The scope of the latest system validation.',
last_validate_result string comment 'The result of the latest system validation. Valid values: PASS (passed), NOT_PASS (failed), and FAILED (execution failed).',
last_rectify_id bigint comment 'The ID of the latest rectification.',
last_rectify_name string comment 'The name of the latest rectification.'
) comment 'Fact table for Dataphin quality issues.'
partitioned by (ds string comment 'The partition field, in yyyymmdd format.')
lifecycle 30;
fct_dataphin_dp_quality_rectify_df
create table if not exists fct_dataphin_dp_quality_rectify_df(
tenant_id bigint,
rectify_id bigint comment 'Rectification ID',
rectify_name string comment 'Rectification name',
problem_object_big_type string comment 'Problem object type. Valid values: TABLE (Dataphin table), DATASOURCE_TABLE (global table), INDEX (index), DATASOURCE (data source), REALTIME_LOGICAL_TABLE (real-time logical table), and QD_FEATURE (tag).',
problem_num bigint comment 'Number of problems',
priority string comment 'Priority. Valid values: LOW, MIDDLE, and HIGH.',
last_rectify_status string comment 'Latest rectification status. Valid values: in progress, pending acceptance, requires re-rectification, and accepted.',
start_time string comment 'Start time',
finish_time string comment 'Finish time',
gmt_create string comment 'Creation time',
gmt_modified string comment 'Last modified time',
submit_user_id string comment 'Submitter ID',
submit_user_name string comment 'Submitter name',
last_rectify_user_id string comment 'Latest rectifier ID',
last_rectify_user_name string comment 'Latest rectifier name',
last_verify_user_id string comment 'Latest verifier ID',
last_verify_user_name string comment 'Latest verifier name'
) comment 'Data quality rectification tasks'
partitioned by (ds string comment 'Data partition date, in yyyymmdd format.')
lifecycle 30;
fct_dataphin_dp_quality_rectify_operation_record_df
create table if not exists fct_dataphin_dp_quality_rectify_operation_record_df(
tenant_id bigint,
record_id bigint,
rectify_id bigint comment 'Specifies the ID of the rectification task.',
rectify_name string comment 'Specifies the name of the rectification task.',
rectify_operation string comment 'Specifies the type of rectification operation. Valid values: Add to rectification process, Associate with existing rectification process, Submit for acceptance, Reject, and Accept.',
rectify_status string comment 'Specifies the status of the rectification task. Valid values: In progress, Pending acceptance, Rework required, and Accepted.',
operator_id string comment 'Specifies the ID of the user who performed the operation.',
operator_name string comment 'Specifies the name of the user who performed the operation.',
operation_time string comment 'Specifies when the operation was performed.',
operation_comment string comment 'A comment or note about the operation.'
) comment 'Stores operation records for quality rectification tasks.'
partitioned by (ds string comment 'The partition field, formatted as yyyymmdd.')
lifecycle 30;
fct_dataphin_dp_quality_rectify_problem_relation_df
create table if not exists fct_dataphin_dp_quality_rectify_problem_relation_df(
tenant_id bigint,
rectify_id bigint comment 'The ID of the quality rectification.',
rectify_name string comment 'The name of the quality rectification.',
problem_object_id bigint comment 'The ID of the problem object.',
problem_object_big_type string comment 'The primary category of the problem object. Valid values: TABLE for a Dataphin table, DATASOURCE_TABLE for a global table, INDEX for an index, DATASOURCE for a data source, REALTIME_LOGICAL_TABLE for a real-time logical table, and QD_FEATURE for a tag.',
validate_object_type string comment 'The type of the validation object. Valid values: TABLE for a table, COLUMN for a column, DATASOURCE for a data source, INDEX for an index, REALTIME for real-time, CHAIN for offline, and QD_FEATURE for a tag.',
problem_object_name string comment 'The name of the problem object.',
validate_object_name string comment 'The name of the validation object.',
problem_status string comment 'The status of the problem in the quality rectification snapshot. Valid values: NEW for pending, VERIFY for validating, FINISHED for resolved, and IGNORE for ignored.',
last_problem_types string comment 'The latest problem types, in JSON format.',
problem_desc string comment 'Problem description.',
rule_id string comment 'Rule ID.',
rule_name string comment 'Rule name.',
rule_strength string comment 'Rule strength.',
last_validate_scope string comment 'Latest validation scope.',
last_validate_result string comment 'Latest validation result.',
last_validate_time string comment 'Latest validation time.',
problem_submit_type string comment 'The problem reporting method. Valid values: SYSTEM for system identification and MANUAL for manual entry.',
problem_owner_names string comment 'Comma-separated names of problem owners.'
) comment 'This table stores the relationship between quality rectification processes and their associated problems.'
partitioned by (ds string comment 'The partition column, in yyyymmdd format.')
lifecycle 30;
Default
dim_dataphin_datastandard_standard
Starting from V4.3.1, you can retrieve the relationships between standards and code tables.
select
tenant_id
,standard_id
,standard_name
,standard_name_en
,standard_code
,standard_status
,get_json_object(attribute_relation, '$.attributeId') as attribute_id -- The ID of the standard attribute.
,get_json_object(attribute_relation, '$.attributeName') as attribute_name -- The standard attribute name.
,get_json_object(attribute_relation, '$.relationType') as relation_type -- The relationship type. `LOOKUP_TABLE` indicates a lookup table, and `LOOKUP_TABLE_VALUE` indicates a value in a lookup table.
,get_json_object(attribute_relation, '$.lookupTableId') as lookup_table_id -- The ID of the lookup table.
,get_json_object(attribute_relation, '$.lookupTableName') as lookup_table_name -- The lookup table name.
,get_json_object(attribute_relation, '$.lookupTableCode') as lookup_table_code -- The lookup table code.
from (
select
split(regexp_replace(regexp_replace(
get_json_object(attributes_with_value_config , '$.attributeLookupTableRelations')
, '\\[|\\]', ''),'(\\}(\t\r\v\f\n)*,)','}abcyugongcba'), 'abcyugongcba') as attribute_lookup_table_relations
,*
from dim_dataphin_datastandard_standard
where ds='${bizdate}'
) json_table
LATERAL VIEW explode(attribute_lookup_table_relations) exploded_table AS attribute_relation
;
-
standard_status: standard status. -
The possible values for
standard_stage=PRODare:NOT_ACTIVATED: Not activated;ACTIVE: Active;EXPIRED: Expired. -
The possible values for
standard_stage=DEVare:DRAFT: Draft;UNDER_REVISION: Under revision;UNDER_REVIEW: Under review;REVIEW_PASSED: Review passed;PUBLISHING: Publishing.
create table if not exists dim_dataphin_datastandard_standard
(
tenant_id bigint comment 'Tenant ID.',
standard_id bigint comment 'Standard ID.',
standard_name string comment 'Standard name.',
standard_name_en string comment 'Standard English name.',
standard_code string comment 'Standard code.',
standard_type string comment 'Standard type.',
version bigint comment 'Version number.',
standard_status string comment 'Standard status. `NOT_ACTIVATED`: Not activated. `ACTIVE`: Active. `EXPIRED`: Expired.',
standard_stage string comment 'Standard stage. `DEV`: Draft, in review, or being published. `PROD`: In production.',
standard_set_id bigint comment 'Standard set ID.',
standard_set_name string comment 'Standard set name.',
basic_info string comment 'Basic standard information.',
standard_set_reference string comment 'Associated standard set reference.',
attributes_with_value_config string comment 'Standard attribute configuration.',
owner_id string comment 'Owner ID.',
owner_name string comment 'Owner name.',
creator_id string comment 'Creator ID.',
creator_name string comment 'Creator name.',
modifier_id string comment 'Last modifier ID.',
modifier_name string comment 'Last modifier name.',
gmt_create string comment 'Creation time.',
gmt_modified string comment 'Last modification time.',
monitor_config string comment 'Monitoring configuration in JSON format.',
standard_template_id bigint comment 'Standard template ID.',
standard_template_name string comment 'Standard template name.',
standard_template_reference string comment 'Associated standard template reference.'
)comment 'Standard table.'
partitioned by (ds string comment 'Partition field.')
lifecycle 30;
dim_dataphin_datastandard_rule
create table if not exists dim_dataphin_datastandard_rule
(
tenant_id bigint comment 'Tenant ID.',
standard_rule_id bigint comment 'Standard mapping rule ID.',
standard_rule_name string comment 'Standard mapping rule name.',
standard_rule_desc string comment 'Standard mapping rule description.',
version bigint comment 'Version number.',
asset_type string comment 'Asset type. INDEX for metric, COLUMN for column.',
--standard_set_id bigint comment 'Deprecated. Standard set ID.',
--standard_set_name string comment 'Deprecated. Standard set name.',
basic_info string comment 'Basic standard information.',
evaluate_scope_config string comment 'Evaluation scope configuration.',
mapping_rule_config string comment 'Standard mapping rule configuration.',
is_valid string comment 'Indicates if the rule is active: Y for active, N for inactive.',
owner_id string comment 'Owner ID.',
owner_name string comment 'Owner name.',
creator_id string comment 'Creator ID.',
creator_name string comment 'Creator name.',
modifier_id string comment 'Last modifier ID.',
modifier_name string comment 'Last modifier name.',
gmt_create string comment 'Creation time.',
gmt_modified string comment 'Last modification time.',
standard_set_ids string comment 'List of standard set IDs.',
standard_template_id bigint comment 'Standard template ID.',
standard_template_name string comment 'Standard template name.'
) comment 'Standard mapping rules.'
partitioned by (ds string comment 'Partition key.')
lifecycle 30;
dim_dataphin_datastandard_rule_asset_relation
-
For data standard mapping results, we recommend using the
dim_dataphin_datastandard_asset_relationtable. -
When a mapping rule runs, it first deletes existing mapping relationships and then regenerates them based on the latest configuration. If the meta warehouse shared model task and the standard mapping rule task run concurrently, data loss or inconsistencies may occur.
-
Starting with version V4.3.1, you can view standard mapping monitoring results for your existing mapping relationships.
-
When you delete a standard mapping rule, its data remains in the table, but the corresponding
standard_rule_nameis set to null.
select
tenant_id
,guid
,asset_type
,standard_id
,standard_name
,standard_name_en
,standard_rule_id
,standard_rule_name
,standard_set_id
,standard_set_name
,gmt_create
,monitor_status
,total_monitor_num --Total number of monitoring rules.
,passed_monitor_num --Number of passed monitoring rules.
,get_json_object(single_column_monitor_result, '$.monitorResultStatus') as monitor_result_status --The status of the monitoring result. Valid values: `FAILED` (failed), `PASSED` (passed), `NOT_CONFIGURED` (not configured), and `WAIT_EVALUATE` (waiting for evaluation).
,get_json_object(single_column_monitor_result, '$.monitorType') as monitor_type --The type of the monitoring rule. Valid values: `METADATA` (metadata) and `QUALITY` (data quality).
,get_json_object(single_column_monitor_result, '$.ruleName') as rule_name --The name of the rule.
,get_json_object(single_column_monitor_result, '$.attributeId') as attribute_id --The ID of the standard attribute.
,get_json_object(single_column_monitor_result, '$.attributeName') as attribute_name --The name of the standard attribute.
,get_json_object(single_column_monitor_result, '$.attributeValue') as attribute_value --The value of the standard attribute.
,get_json_object(single_column_monitor_result, '$.assetColumnValue') as asset_column_value --The value of the asset object. Populated only when `monitor_type` = 'METADATA'.
,get_json_object(single_column_monitor_result, '$.qualityRuleId') as quality_rule_id --The ID of the data quality rule. Populated only when `monitor_type` = 'QUALITY'.
,get_json_object(single_column_monitor_result, '$.recentExecuteHistoryId') as quality_rule_task_id --The ID of the data quality task. Populated only when `monitor_type` = 'QUALITY'.
from (
select
split(regexp_replace(regexp_replace(
get_json_object(monitor_result , '$.simpleSingleColumnMonitorResults')
, '\\[|\\]', ''),'(\\}(\t\r\v\f\n)*,)','}abcyugongcba'), 'abcyugongcba') as all_column_monitor_results
,get_json_object(monitor_result , '$.totalMonitorNum') as total_monitor_num
,get_json_object(monitor_result , '$.passedMonitorNum') as passed_monitor_num
,*
from dim_dataphin_datastandard_rule_asset_relation
where ds='${bizdate}'
) json_table
LATERAL VIEW explode(all_column_monitor_results) exploded_table AS single_column_monitor_result
;create table if not exists dim_dataphin_datastandard_rule_asset_relation
(
tenant_id bigint comment 'The tenant ID.',
guid string comment 'The asset GUID.',
asset_type string comment 'The asset type. Valid values: `INDEX` for a metric and `COLUMN` for a column. Other values are custom.',
standard_id bigint comment 'The standard ID.',
standard_name string comment 'The standard name.',
standard_name_en string comment 'The English name of the standard.',
standard_status string comment 'The status of the standard. Valid values: `NOT_ACTIVATED` for inactive, `ACTIVE` for active, and `EXPIRED` for published standards that have expired.',
standard_rule_id bigint comment 'The ID of the standard rule or evaluation job.',
standard_rule_name string comment 'The name of the standard rule. Populated only when `job_type` is `STANDARD_RULE`.',
standard_set_id bigint comment 'The ID of the standard set.',
standard_set_name string comment 'The name of the standard set.',
gmt_create string comment 'The creation time.',
gmt_modified string comment 'The last modified time.',
asset_snapshot string comment 'The asset snapshot.',
standard_version bigint comment 'The standard version.',
monitor_result string comment 'The monitoring result.',
monitor_status string comment 'The status of the monitoring result. Valid values: NOT_CONFIGURED, PASSED, and FAILED.',
is_latest_rule string comment 'Indicates whether this mapping is the result of the latest run of the standard rule for the given standard and asset pair. Valid values: `Y` and `N`.',
job_type string comment 'The job type. Valid values: `STANDARD_RULE` for a standard rule and `EVALUATE_JOB` for an evaluation job.'
) comment 'This table maps standard rules to assets.'
partitioned by (ds string comment 'The partition field.')
lifecycle 30;
dim_dataphin_datastandard_asset_relation
This table is for standard bid selection results.
create table if not exists dim_dataphin_datastandard_asset_relation
(
tenant_id bigint comment 'Tenant ID',
guid string comment 'Asset GUID',
asset_type string comment 'Asset type. `INDEX`: indicator, `COLUMN`: column. Other values are user-defined.',
asset_name string comment 'Asset name',
asset_belong_name string comment 'Name of the parent asset. For a column asset, this is the full table name.',
standard_id bigint comment 'Standard ID',
standard_name string comment 'Standard name',
standard_name_en string comment 'English name of the standard',
standard_status string comment 'Status of the standard. `NOT_ACTIVATED`: not activated, `ACTIVE`: active, `EXPIRED`: expired. A standard that was published but has passed its expiration time.',
standard_set_id bigint comment 'Standard set ID',
standard_set_name string comment 'Standard set name',
gmt_create string comment 'Create time',
gmt_modified string comment 'Last modified time',
standard_template_id bigint comment 'Standard template ID',
standard_template_name string comment 'Standard template name'
) comment 'Table for standard-to-asset mapping'
partitioned by (ds string comment 'The partition column.')
lifecycle 90
;
DataPhin: Data standard lookup table
create table if not exists dim_dataphin_datastandard_lookup_table
(
tenant_id bigint comment 'Tenant ID',
lookup_table_id bigint comment 'Lookup table ID',
lookup_table_name string comment 'Lookup table name',
lookup_table_code string comment 'Lookup table code',
lookup_table_des string comment 'Lookup table description',
lookup_table_value_name string comment 'Lookup value name',
lookup_table_value_value string comment 'Lookup value',
lookup_table_value_en_name string comment 'Lookup value English name',
lookup_table_value_des string comment 'Lookup value description',
lookup_table_directory string comment 'Lookup table directory',
owner_id string comment 'Owner ID',
owner_name string comment 'Owner name',
creator_id string comment 'Creator ID',
creator_name string comment 'Creator name',
modifier_id string comment 'Last modifier ID',
modifier_name string comment 'Last modifier name',
gmt_create string comment 'Creation time',
gmt_modified string comment 'Last modification time'
)
comment 'Contains metadata for data standard lookup tables.'
partitioned by (ds string comment 'Partition key in yyyymmdd format.')
lifecycle 30;
Real-time
dim_DataPhin_stream_table
create table if not exists dim_dataphin_stream_table
(
tenant_id bigint comment 'ID of the tenant.',
project_id bigint comment 'ID of the project.',
project_name string comment 'Name of the project.',
table_name string comment 'Name of the metadata table.',
env string comment 'The environment. Valid values are DEV and PROD.',
owner_id string comment 'ID of the table owner.',
owner_name string comment 'Name of the table owner.',
compute_engine_id bigint comment 'ID of the compute engine.',
compute_engine_name string comment 'Name of the compute engine.',
compute_engine_type string comment 'Type of the compute engine. Valid values are OPEN_FLINK, Flink, and VVP.',
data_source_id bigint comment 'ID of the data source.',
data_source_name string comment 'Name of the data source.',
data_source_type string comment 'Type of the data source. Examples include MaxCompute, MySQL, SQL Server, Hive, ONE_SERVICE, PostgreSQL, FTP, EMR_HIVE, Oracle, HDFS, DRDS, AnalyticDB, MongoDB, Flink, AnalyticDB for PG, Hologres, and HBase 1.1.x.',
src_table_name string comment 'Name of the source table.',
connector string comment 'Connector.',
gmt_create string comment 'Creation time.',
gmt_modified string comment 'Modification time.'
) comment 'Metadata for real-time metadata tables.'
partitioned by (ds string comment 'Partition field, in yyyymmdd format.')
lifecycle 31 ;
dim_dataphin_stream_column
create table if not exists dim_dataphin_stream_column
(
tenant_id bigint comment 'The ID of the tenant.',
project_id bigint comment 'The ID of the project.',
project_name string comment 'The name of the project.',
table_name string comment 'The name of the metatable.',
column_name string comment 'The name of the column.',
column_type string comment 'The data type of the column.',
column_comment string comment 'The comment for the column or partition key.',
column_seq bigint comment 'The ordinal position of the column.',
raw_column_type string comment 'The original data type of the column.',
env string comment 'The environment. Possible values are DEV (development) and PROD (production).',
create_time string comment 'Timestamp of record creation.',
modify_time string comment 'Timestamp of the last modification.'
) comment 'Stores metadata for columns in real-time metatables.'
partitioned by (ds string comment 'The partition field, formatted as yyyymmdd.')
lifecycle 31 ;
Label platform
dim_dataphin_featurex_market
create table if not exists dim_dataphin_featurex_market (
tenant_id bigint comment 'The tenant ID.',
market_id string comment 'The ID of the market.',
market_name string comment 'The name of the market.',
market_type bigint comment 'The market type. Set to 0 for a public market or 1 for a private market.',
market_desc string comment 'The description of the market.',
owner_id string comment 'The ID of the owner.',
owner_name string comment 'The name of the owner.',
gmt_create string comment 'The time when the record was created.',
gmt_modified string comment 'The time when the record was last modified.'
)comment 'Stores market information for the Dataphin label platform.'
partitioned by (ds string comment 'The partition field, in the `yyyymmdd` format.')
lifecycle 30;
dim_dataphin_featurex_lineage
create table if not exists dim_dataphin_featurex_lineage (
tenant_id bigint comment 'The tenant ID.',
from_id string comment 'The ID of the source resource.',
from_version bigint comment 'The version of the source resource.',
from_type string comment 'The type of the source resource. Valid values: view, feature, data_source, application, subscription, behavior, cluster, market, project, entity_identity, physical_table, profile, event, and view_field.',
from_code string comment 'The code of the source resource.',
from_status bigint comment 'The status of the source resource. Valid values: -1 (Publishing) | -3 (Submitting) | 0 (Editing) | 1 (Published) | 2 (Listed) | 3 (Submitted).',
to_id string comment 'The ID of the target resource.',
to_type string comment 'The type of the target resource. For valid values, see the description of `from_type`.',
to_code string comment 'The code of the target resource.',
to_status bigint comment 'The status of the target resource. For valid values, see the description of `from_status`.',
gmt_create string comment 'The time when the record was created.',
gmt_modified string comment 'The time when the record was last modified.'
)comment 'Stores data lineage information for the Dataphin label platform.'
partitioned by (ds string comment 'The partition field, in the `yyyymmdd` format.')
lifecycle 30;
Asset inventory
dim_dataphin_mdc_object
The asset_sub_type field has the following values:
-
Data source:
DATASOURCE_TABLE(data source table),DATASOURCE_VIEW(data source view),DATASOURCE_MATERIALIZED_VIEW(data source materialized view). -
Compute source:
PHYSICAL_TABLE(compute source physical table),PHYSICAL_VIEW(compute source physical view),PHYSICAL_MATERIALIZED_VIEW(compute source physical materialized view). -
Logical table:
DIM_NORMAL(standard dimension logical table),DIM_LEVEL(hierarchical dimension logical table),DIM_ENUM(enumerated dimension logical table),DIM_VIRTUAL(virtual dimension logical table),FACT_PROCESS(process fact logical table),FACT_EVENT(event fact logical table),FACT_SNAPSHOT(snapshot fact logical table),SUM_BIZ_UNIT(business unit summary logical table),LOGIC_VIEW(logical view).
create table if not exists dim_dataphin_mdc_object
(
tenant_id bigint comment 'Tenant ID',
asset_from string comment 'The source of the asset. `dataphin` indicates compute source tables and logical tables. `biz_system` indicates data source tables in business systems.',
object_name string comment 'Table name',
display_name string comment 'Display name',
object_desc string comment 'Table description',
asset_type string comment 'Asset type. Possible values: TABLE, COLUMN, INDEX, PROJECT, BIZ_UNIT, DATA_CELL, FUNCTION, DATASOURCE, DATASERVICE, DATA_STANDARD',
asset_sub_type string comment 'Asset sub-type',
project_id bigint comment 'Project ID',
project_name string comment 'Project name',
biz_unit_id bigint comment 'Business unit ID',
biz_unit_name string comment 'Business unit name',
data_source_id bigint comment 'Data source ID',
data_source_name string comment 'Data source name',
data_source_type string comment 'Data source type',
data_source_desc string comment 'Data source description',
biz_system_id bigint comment 'Associated business system ID',
biz_system_name string comment 'Associated business system name',
env string comment 'Asset environment. Possible values: DEV, PROD',
db_type string comment 'Data source or compute source type',
db_name string comment 'Name of the database or project where the asset resides',
schema_name string comment 'Name of the schema where the asset resides. This applies only to certain data sources or compute sources.',
owner_id string comment 'Table owner ID',
owner_name string comment 'Table owner name',
is_from_logical string comment 'Specifies whether the physical table is generated from a standardized model (Y/N).',
is_partitioned string comment 'Specifies whether the table is a partitioned table (Y/N).',
is_external_table string comment 'Specifies whether the table is an external table (Y/N).',
last_ddl_time string comment 'Last DDL modification time',
last_dml_time string comment 'Last data modification time',
last_access_time string comment 'Last access time',
physical_data_size bigint comment 'Storage size in bytes',
record_num bigint comment 'Record count',
table_lifecycle string comment 'Table lifecycle',
view_text string comment 'View definition',
storage_format string comment 'Storage format. For Hive tables, this is the specific file format. For other sources, this is the data source type.',
tags_list string comment 'Comma-separated list of asset inventory tags',
create_time string comment 'Creation time',
modify_time string comment 'Modification time',
favorites_count bigint comment 'Number of favorites',
pv_count bigint comment 'Number of page views',
table_location string comment 'The storage path of the table. This field applies to all Hive tables and other external tables.',
data_domain_id bigint comment 'Data domain ID',
data_domain_name string comment 'Data domain code',
data_domain_name_cn string comment 'Data domain name'
) comment 'Dataphin global table metadata'
partitioned by (ds string comment 'Partition field in yyyymmdd format')
lifecycle 30;
dim_dataphin_mdc_column
asset_sub_type: Same as the asset_sub_type field in the dim_dataphin_mdc_object table.
create table if not exists dim_dataphin_mdc_column
(
tenant_id bigint comment 'Tenant ID',
asset_from string comment 'The source of the asset. `dataphin` indicates compute source tables and logical tables. `biz_system` indicates data source tables in business systems.',
table_name string comment 'Table name',
column_name string comment 'Column name',
display_name string comment 'Display name',
column_data_type string comment 'Column data type',
column_comment string comment 'Column comment or partition key comment',
column_seq bigint comment 'Column sequence number',
column_key_type string comment 'Column key type. Applies only to certain data sources.',
is_nullable string comment 'Specifies whether the column can contain null values (Y/N). Applies to data source tables.',
default_value string comment 'Default value. Applies to data source tables.',
is_partition_column string comment 'Specifies whether this is a partition field (Y/N).',
partition_type string comment 'Partition type. Applies only to certain data sources.',
is_primary_key string comment 'Specifies whether this is a primary key (Y/N). Applies to data source tables.',
asset_type string comment 'Asset type. The value is always `TABLE`.',
asset_sub_type string comment 'Asset sub-type',
project_id bigint comment 'Project ID',
project_name string comment 'Project name',
biz_unit_id bigint comment 'Business unit ID',
biz_unit_name string comment 'Business unit name',
data_source_id bigint comment 'Data source ID',
data_source_name string comment 'Data source name',
data_source_type string comment 'Data source type',
biz_system_id bigint comment 'Associated business system ID',
biz_system_name string comment 'Associated business system name',
env string comment 'Asset environment. Possible values: DEV, PROD',
db_type string comment 'Data source or compute source type',
db_name string comment 'Name of the database or project where the asset resides',
schema_name string comment 'Name of the schema where the asset resides. This applies only to certain data sources or compute sources.'
) comment 'Dataphin global metadata for columns'
partitioned by (ds string comment 'Partition field in yyyymmdd format')
lifecycle 15;
Asset directory
dim_dataphin_mdc_topic_directory
create table if not exists dim_dataphin_mdc_topic_directory(
tenant_id bigint comment 'The tenant ID.',
directory_id bigint comment 'The directory ID.',
directory_name string comment 'The directory name.',
directory_des string comment 'The directory description.',
directory_parent_id bigint comment 'The parent directory ID.',
directory_parent_path string comment 'The parent directory path.',
directory_modifier_id string comment 'The ID of the user who last modified the directory.',
directory_modifier_name string comment 'The name of the user who last modified the directory.',
directory_gmt_create string comment 'The time the directory was created.',
directory_gmt_modified string comment 'The time the directory was last updated.',
topic_id bigint comment 'The asset topic ID.',
topic_name string comment 'The asset topic name.',
topic_des string comment 'The asset topic description.',
topic_status string comment 'The status of the asset topic. Valid values: DRAFT (draft), IN_PUBLISH (publishing), or PUBLISHED (published).',
topic_asset_type string comment 'The asset type. Valid values: TABLE (table) or INDEX (technical index).',
topic_admin_users string comment 'A comma-separated list of user IDs of the asset topic administrators.',
topic_visibility_type string comment 'The visibility scope of the asset topic. Valid values: PUBLIC (visible to all users) or ADMINS_AND_SPECIFIED (visible only to administrators and specified users).',
topic_visible_users string comment 'A comma-separated list of user IDs allowed to view the asset topic.',
topic_visible_user_groups string comment 'A comma-separated list of user group IDs allowed to view the asset topic.',
topic_enable_publish_approve string comment 'Specifies whether publish approval is enabled (`true`) or disabled (`false`).',
topic_modifier_id string comment 'The ID of the user who last modified the asset topic.',
topic_modifier_name string comment 'The name of the user who last modified the asset topic.',
topic_gmt_create string comment 'The time the asset topic was created.',
topic_gmt_modified string comment 'The time the asset topic was last updated.'
)comment 'Stores information about asset topic directories.'
partitioned by (ds string comment 'The business date, in yyyymmdd format.')
lifecycle 15;
dim_dataphin_mdc_list_object
-
Version 4.5 includes information for table assets only. Versions 5.4 and later also include information for technical and business indexes.
-
owner_user_id: The asset owner ID when the asset was put on-shelf. -
owner_user_name: The asset owner name when the asset was put on-shelf. -
asset_sub_type: Same as theasset_sub_typefield in thedim_dataphin_mdc_objecttable.
create table if not exists dim_dataphin_mdc_list_object
(
tenant_id bigint comment 'The tenant ID.',
name string comment 'The object name.',
display_name string comment 'The display name.',
full_name string comment 'The full name of the object.',
asset_type string comment 'The asset type. Valid values: TABLE, COLUMN, PROJECT, BIZ_UNIT, DATA_CELL, FUNCTION, DATASOURCE, DATASERVICE, DATA_STANDARD, INDEX, and BIZ_INDEX.',
asset_sub_type string comment 'The asset subtype. For indexes, valid values include: INDEX (standardized modeling technical index), CUSTOM_INDEX (custom technical index), and BIZ_INDEX (business index). Other subtypes are not documented.',
env string comment 'The asset environment. Valid values: DEV and PROD.',
asset_source string comment 'The asset source.',
project_id bigint comment 'The project ID.',
project_name string comment 'The project name.',
biz_unit_id bigint comment 'The business unit ID.',
biz_unit_name string comment 'The business unit name.',
data_source_id bigint comment 'The data source ID.',
data_source_name string comment 'The data source name.',
data_source_type string comment 'The data source type.',
directory_ids string comment 'A comma-separated list of IDs of the directories to which the asset belongs.',
directory_names string comment 'A comma-separated list of names of the directories to which the asset belongs.',
topic_ids string comment 'A comma-separated list of IDs of the asset topics to which the asset belongs.',
topic_names string comment 'A comma-separated list of names of the asset topics to which the asset belongs.',
tags_list string comment 'A comma-separated list of tags for the on-shelf asset.',
view_scope_type string comment 'The visibility scope for the on-shelf asset. Valid values: ALL_USERS_CAN_VIEW (visible to all users), PART_USERS_CAN_VIEW (visible to some users), PART_USERS_CAN_NOT_VIEW (not visible to some users), or ALL_USERS_CAN_NOT_VIEW (not visible to any user).',
view_scope_user_ids string comment 'A comma-separated list of user IDs referenced by the visibility scope.',
view_scope_user_groups string comment 'A comma-separated list of user group IDs referenced by the visibility scope.',
asset_from_is_deleted string comment 'Specifies if the source asset has been deleted (`Y` or `N`). This field is deprecated.',
owner_user_id string comment 'The asset owner ID.',
owner_user_name string comment 'The asset owner name.',
on_shelve_time string comment 'The time the asset was last put on-shelf.',
db_type string comment 'The type of the data source or compute engine.',
db_name string comment 'The name of the database or project where the asset resides.',
schema_name string comment 'The name of the schema to which the asset belongs. This field is populated only for certain data sources or compute engines.',
favorites_count bigint comment 'The favorites count.',
pv_count bigint comment 'The PV count.',
shelve_description string comment 'The description of the on-shelf asset.',
custom_attribute string comment 'The custom attributes, in JSON format.',
first_on_shelve_time string comment 'The time the asset was first put on-shelf.',
on_shelve_user_id string comment 'The user ID for the last on-shelf operation. For scheduled automatic operations, this is the rule owner; for manual operations, this is the operator.',
on_shelve_user_name string comment 'The user name for the last on-shelf operation. For scheduled automatic operations, this is the rule owner; for manual operations, this is the operator.',
index_guid string comment 'The GUID of the index.',
index_logic string comment 'The index logic. This field is not populated for custom technical indexes.',
index_attribute string comment 'The attribute information for the index, in JSON format. For attribute information of some standardized modeling indexes, query the dim_dataphin_derived_index table.',
index_belong_table_name string comment 'The name of the table to which the index belongs. This field is not populated for business indexes.',
biz_index_related_tech_index_name_list string comment 'A comma-separated list of names of associated technical indexes. To retrieve details, query this table using the index name and the tenant_id.',
biz_index_related_biz_index_name_list string comment 'A comma-separated list of names of associated business indexes. To retrieve details, query this table using the index name and the tenant_id.',
biz_index_related_biz_index_relation_type_list string comment 'A comma-separated list of relationship types for the associated business indexes. Valid values: POSITIVE (positive correlation), NEGATIVE (negative correlation), or OTHER (other).',
first_on_shelve_user_id string comment 'The user ID for the first on-shelf operation. For scheduled automatic operations, this is the rule owner; for manual operations, this is the operator.',
first_on_shelve_user_name string comment 'The user name for the first on-shelf operation. For scheduled automatic operations, this is the rule owner; for manual operations, this is the operator.'
) comment 'Stores information about on-shelf assets.'
partitioned by (ds string comment 'The business date, in the yyyymmdd format.')
lifecycle 15;
Signal tables
data_share_finish
create table if not exists data_share_finish
(
module_name string comment 'Module name: data_share',
finish_time string comment 'Completion time'
)comment 'Signals the completion of processing for basic shared metadata.'
partitioned by (ds string comment 'Partition key, in yyyymmdd format')
lifecycle 360;
data_security_finish
create table if not exists data_security_finish
(
module_name string comment 'Module name: data_security',
finish_time string comment 'Completion time'
)comment 'Signals the completion of metadata processing for the data security module.'
partitioned by (ds string comment 'Partition key, in yyyymmdd format')
lifecycle 360;
data_service_finish
create table if not exists data_service_finish
(
module_name string comment 'Module name: data_service',
finish_time string comment 'Completion time'
)comment 'Signals the completion of metadata processing for the data service module.'
partitioned by (ds string comment 'Partition key, in yyyymmdd format')
lifecycle 360;
data_quality_finish
create table if not exists data_quality_finish
(
module_name string comment 'Module name: data_quality',
finish_time string comment 'Completion time'
)comment 'Signals the completion of metadata processing for the data quality module.'
partitioned by (ds string comment 'Partition key, in yyyymmdd format')
lifecycle 360;
data_standard_finish
create table if not exists data_standard_finish
(
module_name string comment 'Module name: data_standard',
finish_time string comment 'Completion time'
)comment 'Signals the completion of metadata processing for the data standard module.'
partitioned by (ds string comment 'Partition key, in yyyymmdd format')
lifecycle 360;
data_featurex_finish
create table if not exists data_featurex_finish (
module_name string comment 'Module name: data_featurex',
finish_time string comment 'Completion time'
) comment 'Signals the completion of metadata processing for the tag platform module.'
PARTITIONED BY (ds string comment 'Partition key, in yyyymmdd format')
lifecycle 360;