Metadata warehouse sharing model (MaxCompute)

更新时间: 2026-06-23 10:12:21

Dataphin provides a metadata sharing model for the MaxCompute computing engine, exposing system metadata for data analysis and secondary development.

Overview

Dataphin exposes system metadata through the metadata sharing model for flexible data analysis and secondary development. For usage instructions, see .

Usage

Each partition of a shared model table stores a full copy of the data. Always use the latest partition to avoid issues with downstream services. Historical partitions may contain incomplete data after major version upgrades.

Changelog

Version

Changes

V6.1

Added the input_is_structured_data, output_is_structured_data, input_dataset_version, and output_dataset_version fields to the dim_dataphin_lineage_node_table and dim_dataphin_lineage_node_column data lineage tables. These fields record data lineage for nodes, tables, and fields involving unstructured data.

V6.0

Added the is_cross_tenant, source_tenant_id, and target_tenant_id fields to the dim_dataphin_node_edge (physical scheduling node dependency) and fct_dataphin_node_task_link_di (physical scheduling instance dependency) tables to record cross-tenant information.

V5.5

  • Added the column_level_auth field to the fct_dataphin_security_auth_permission_record_df (permission record) table to record requested column-level permissions.

  • Added the first_on_shelve_user_id and first_on_shelve_user_name fields to the dim_dataphin_mdc_list_object (listed asset information) table to record the ID and name of the user who first listed the asset.

V5.4

  • Added the last_dml_time_dataphin field to the dim_dataphin_table (physical table) table to record the last data modification time of a table as tracked internally by Dataphin.

  • The dim_dataphin_mdc_list_object (listed asset) table now supports metric assets and includes the following new fields to provide more attribute information for business and technical metrics: shelve_description, custom_attribute, first_on_shelve_time, on_shelve_user_id, on_shelve_user_name, index_guid, index_logic, index_attribute, index_belong_table_name, biz_index_related_tech_index_name_list, biz_index_related_biz_index_name_list, and biz_index_related_biz_index_relation_type_list.

  • Added the following tables for the data quality governance workbench to record quality issues and track their rectification processes:

    • fct_dataphin_dp_quality_problem_list_df: Issue list table.

    • fct_dataphin_dp_quality_rectify_df: Quality rectification process table.

    • fct_dataphin_dp_quality_rectify_operation_record_df: Quality rectification operation record table.

    • fct_dataphin_dp_quality_rectify_problem_relation_df: Quality rectification process and issue relationship table.

  • Added the data_domain_id, data_domain_name, and data_domain_name_cn fields to the dim_dataphin_mdc_object (asset inventory) table to record data domain information.

V5.3

  • Added the update_rate_detail, call_mode, and execution_timeout fields to the dim_dataphin_service_api (data service API) table to record the execution timeout and update frequency for asynchronous API calls.

  • Added the api_timeout, update_rate, update_rate_detail, call_mode, and execution_timeout fields to the dim_dataphin_service_api_param (data service API parameter) table to record the SQL execution duration, call duration, and update frequency for asynchronous API calls.

  • Added the system_role_list_name field to the dim_dataphin_tenant_user (tenant user) table to display the Chinese names of system roles.

  • Added the directory_path field to the dim_dataphin_node (physical scheduling node) table to store the directory paths for batch compute tasks and data integration tasks. This currently excludes directory paths for logical table tasks.

V5.2

Added the fields input_data_source_id, input_data_source_name, input_data_source_type, input_db_name, input_schema_name, output_data_source_id, output_data_source_name, output_data_source_type, output_db_name, and output_schema_name to the dim_dataphin_lineage_node_table, dim_dataphin_lineage_table, and dim_dataphin_lineage_node_column data lineage tables. These fields provide input and output data source information for data integration tasks, real-time/batch compute tasks, and logical table tasks.

V5.1

  • Added the reader_schema_name and writer_schema_name fields to the dim_dataphin_dlink_node (DLink task) table to record the schema names for database reads and writes.

  • Added the dim_dataphin_stream_task (real-time task statistics) table to provide basic information and resource configurations for real-time tasks.

  • Added the dim_dataphin_stream_task_related_table (real-time task-related table) table to view the input and output tables associated with real-time tasks.

  • Added the fct_dataphin_stream_task_submit_record_df (real-time task submission records) table to view the submission records for real-time tasks.

V5.0

  • Added the table_location field to the dim_dataphin_table (physical table) and dim_dataphin_mdc_object (asset inventory) tables to record the storage paths of tables.

  • Added the tag_name_list (data source tags) field to the dim_dataphin_data_source (data source) table.

  • Added the data_source_id, datasource_name, and data_source_type fields to the dim_dataphin_security_label (security identification record) table to provide more information for data source table identification results.

V4.5

  • Added the following fields to the dim_dataphin_table table to provide more asset details for statistical analysis: last_access_time, tag_list (asset tags configured in the asset inventory), storage_format, favorites_count, and pv_count (page views). This update also includes a description of how table access counts are calculated.

  • Added the dim_dataphin_mdc_list_object (listed asset information) table to store details about listed assets, including their source, parent directory, asset tags, and visibility scope.

  • Added the dim_dataphin_mdc_object (candidate asset objects) and dim_dataphin_mdc_column (candidate asset columns) tables. These currently include physical tables, logical tables, and data source tables. The asset type is available in the asset_sub_type field.

V4.3.1

  • Added the following fields to the dim_dataphin_datastandard_standard (data standard) table to retrieve information about associated lookup tables and their relationship types:

    attribute_id, attribute_name, relation_type, lookup_table_id, lookup_table_name, lookup_table_code.

  • Added the following fields to the dim_dataphin_datastandard_rule_asset_relation table, which maps data standard rules to asset evaluation results: monitor_result_status, monitor_type, rule_name, attribute_id, attribute_name, attribute_value, asset_column_value, quality_rule_id, and quality_rule_task_id.

  • The execute_context field in the fct_dataphin_dp_quality_rule_task_di (quality rule task) table stores all data for summary statistical metrics. A single rule execution record may contain multiple summary statistical metrics. To retrieve relevant data, use the enumerations for metric names and types as described in the documentation.

V4.3

  • Added the dim_dataphin_datastandard_lookup_table (data standard lookup table information) table, which contains information such as lookup table name, code, ID, value code name, and parent directory.

  • Added the following tables for the tag platform:

    • dim_dataphin_featurex_market: Market table, which includes market ID, name, type, and owner ID.

    • dim_dataphin_featurex_lineage: Data lineage table, which includes the ID, version, type, code, and status of dependent resources.

  • Fixed the value retrieval logic for the resource_owners field in the fct_dataphin_security_auth_permission_record_df (permission record) table. This fix only applies to table-type resources; the logic for function and key resources remains unchanged.

  • Added the data source-related fields data_source_id, datasource_name, and data_source_type, and the business attribute field attribute_config, to the dim_dataphin_dp_quality_rule (quality rule) table.

  • Added the dim_dataphin_mdc_topic_directory (asset topic directory information) table, which includes asset topic name and ID, topic visibility scope, directory name and ID, and directory structure.

V4.2

  • Added the fct_dataphin_publish_record_df (publish record) table, which contains publishing information for compute tasks, standard modeling objects, physical tables, and data integration tasks.

  • Added the dim_dataphin_dp_quality_schedule (quality scheduling configuration) table to view configurations such as triggering nodes, fixed task trigger modes, scheduling conditions, and validation partition scope types.

  • The schedule_ids field in the dim_dataphin_dp_quality_rule (quality rule) table is now populated with a list of quality rule scheduling IDs.

V4.1

  • Added the fct_dataphin_submit_record_df (Dataphin task submission record) table, which contains submission information for compute tasks, standard modeling objects, physical tables, and data integration tasks. You can query the submitter, submission version, submission comments, and submission time.

  • Added the submitter_id (operator ID), submitter_name (operator name), and is_system_submitter (indicates if the task was system-executed, for example, through periodic scheduling or data backfill) fields to the fct_dataphin_node_taskrun_di (node instance run record) table. This table now also includes run records for ad-hoc queries.

  • Added the fct_dataphin_security_auth_permission_record_df (Dataphin permission record) table to query permission information for resources accessible to a specified account (user, user group, or production account). This includes details such as authorization method, account type, permission status, expiration time, permission details, authorization scope, and operator.

  • Added the dim_dataphin_security_user_bind table to view the binding relationships between users and user groups.

  • Added the fct_dataphin_maxcompute_task_di table, which is specific to the MaxCompute engine and only contains data for tasks submitted through Dataphin. You can query information for each execution record, such as the number of input/output records, input/output data volume, lists of input/output tables, job CPU and memory consumption, and the Dataphin operator.

  • Added the order_priority, is_paged_query (result pagination), and order_by_list (sort fields) fields to the dim_dataphin_service_api (data service API) table.

V4.0

Added the fct_dataphin_dp_quality_rule_score_di (quality rule score) table.

V3.14

  • Added the fct_dataphin_monitor_alert_event_di (alert event) table.

  • Added the fct_dataphin_monitor_alert_notification_di (alert notification record) table.

  • Added the dim_dataphin_dp_quality_watch_alert (quality monitoring object alert configuration) table.

  • Added the record_from and field_from fields to the dim_dataphin_security_label (security identification record) table.

V3.13

  • Added the dim_dataphin_monitored_rule (O&M monitoring configuration) table.

  • Added the dim_dataphin_baseline_object (baseline monitoring object) table.

  • Added the fct_dataphin_dlink_node_taskrun_step_metric_di (data integration task node run instance statistics) table.

  • Added the ops_owner_id, ops_owner_name, dev_owner_id, dev_owner_name, and physical_data_size fields to the dim_dataphin_model (logical table) table.

  • Added the data_source_catalog (data source code) field to the dim_dataphin_data_source (data source) table.

V3.12

  • Added the standard_template_id, standard_template_name, and standard_template_reference fields to the dim_dataphin_datastandard_standard (data standard) table.

  • Added the standard_template_id, standard_template_name, and standard_set_ids fields to the dim_dataphin_datastandard_rule (data standard rule) table, and deprecated the standard_set_id and standard_set_name fields.

  • Added the standard_template_id and standard_template_name fields to the dim_dataphin_datastandard_asset_relation (data standard and asset mapping result) table.

  • Added the classfy_parent_path and classfy_level1_path fields to the dim_dataphin_security_label (data security labeling result) table.

  • Added the api_ex_count field to the dws_dataphin_service_api_mi (data service API call count statistics) table.

V3.11

  • Documentation for the is_external_table and schema_name fields in the dim_dataphin_table (physical table) table has been updated.

  • Added the dim_dataphin_project_user (project user) table.

  • Added the status, level_id, level_name, level_index, and parent_path fields to the dim_dataphin_security_classify (security classification) table.

  • Added the classify_ids field to the dim_dataphin_security_identify_rule (security identification rule) table and deprecated some fields. For more information, see the field comments in the table.

  • Added the classify_id and classify_name fields to the dim_dataphin_security_desensitize_rule (data masking rule) table and deprecated some fields. For more information, see the field comments in the table.

  • Added the classify_id and classify_name fields to the dim_dataphin_security_desensitize_rule_white_list (data masking rule whitelist) table and deprecated some fields. For more information, see the field comments in the table.

  • Added the job_type field to the dim_dataphin_datastandard_rule_asset_relation (standard adoption rule and asset mapping) table.

  • Added the dim_dataphin_datastandard_asset_relation (standard and asset mapping result) table. Use this table for adoption results going forward.

V3.10

  • Added data lineage for real-time tables to the dim_dataphin_lineage_node_table and dim_dataphin_lineage_table data lineage tables.

  • Added the user_status and system_role_list fields to the dim_dataphin_tenant_user (tenant user) table. The table now also includes data for project production accounts.

  • Added the dim_dataphin_data_source_config (data source configuration information) table.

  • Added the dim_dataphin_stream_table (real-time metadata table) and dim_dataphin_stream_column (real-time metadata column) tables.

  • Added the archive_mode, archive_store_type, and archive_table_name fields to the dim_dataphin_dp_quality_rule (quality rule) table.

  • Added the dim_dataphin_dp_quality_watch_member (quality monitoring owner) table.

  • Added the is_external_table and schema_name fields to the dim_dataphin_table (physical table) table.

  • Added the schema_name field to the dim_dataphin_table_partition (physical table partition) table.

V3.9

  • Added the monitor_config field to the dim_dataphin_datastandard_standard (data standard) table.

  • Added the tag, tag_unique_key, and tag_properties fields to the dim_dataphin_dp_quality_rule (quality rule) table.

  • Added the tag and tag_unique_key fields to the fct_dataphin_dp_quality_rule_task_di (quality rule task) table.

V3.8

  • Added data classification and security level fields to the dim_dataphin_column and dim_dataphin_field tables.

  • Added standard adoption status to the dim_dataphin_datastandard_rule_asset_relation table.

V3.7

  • Added tables for the quality module, including the dim_dataphin_dp_quality_watch (quality monitoring object), dim_dataphin_dp_quality_rule (quality rule), and fct_dataphin_dp_quality_rule_task_di (quality rule task) tables.

  • Added the dim_dataphin_node_edge (physical scheduling node dependency) and fct_dataphin_node_task_link_di (physical scheduling instance dependency) tables for the physical O&M module.

  • Added the proj_id and proj_name fields to all tables in the data service module except dws_dataphin_service_api_mi.

  • Added fields for level-5 data domains to the dim_datpahin_data_domain table.

  • Added the data standard module, including the dim_dataphin_datastandard_standard (data standard), dim_dataphin_datastandard_rule (adoption rule), and dim_dataphin_datastandard_rule_asset_relation (standard adoption rule-to-asset mapping) tables.

V3.5.4

  • Added the period_type field to the dim_dataphin_atom_index table. The primary key for this table is now atom_index_id+period_type+env.

  • Added the period_type and compute_type fields to the dim_dataphin_derived_index table. The primary key for this table is now derived_index_id+period_type+env.

V3.3.1-hf3

V2.9.7-hf10

Added the dim_dataphin_service_api_param (data service API parameter) table.

V3.2.4

The dim_dataphin_physical_index and dim_dataphin_index_catalog tables are deprecated, as mounted physical metrics were merged into the dim_dataphin_derived_index table in v3.2.4 and later.

V3.2.4

All fields related to table_id and column_id are deprecated and set to null.

V2.9.7

Added detailed comments for the status field in the modeling and physical O&M modules.

V2.9.5.3

Created the document.

Metadata warehouse catalog

Category

Subcategory

Table name

Description

Associated table

Standardized modeling

Planning

dim_dataphin_biz_unit

Business unit table

data_share_finish

dim_dataphin_data_domain

Data domain table

dim_dataphin_time_period

Statistical period table

Modeling

dim_dataphin_dimension

Dimension table

dim_dataphin_biz_process

Business process table

dim_dataphin_atom_index

Atomic metric table

dim_dataphin_adjunct_word

Business qualifier table

dim_dataphin_derived_index

Derived metric table

dim_dataphin_model

Logical table metadata

dim_dataphin_field

Logical table field metadata

Development & O&M

Physical O&M

dim_dataphin_node

Physical scheduling node

data_share_finish

dim_dataphin_datax_node

DataX physical scheduling node

dim_dataphin_dlink_node

DLink physical scheduling node

fct_dataphin_node_taskrun_di

Fact table for physical node scheduling instances

dim_dataphin_node_edge

Physical scheduling node dependency table

fct_dataphin_node_task_link_di

Physical scheduling instance dependency table

dim_dataphin_monitored_rule

O&M monitoring configuration table

fct_dataphin_dlink_node_taskrun_step_metric_di

Statistics for integration task node instances

fct_dataphin_monitor_alert_event_di

Alert event table

fct_dataphin_monitor_alert_notification_di

Alert notification table

Logical O&M

dim_dataphin_model_node

Logical table scheduling node

fct_dataphin_model_node_task_di

Logical table scheduling instance

Engine

fct_dataphin_maxcompute_task_di

MaxCompute job table

OS

fct_dataphin_os_task_table_inputoutput_di

Transactional fact table for table access via SQL (Granularity: execution task ID, input table ID, input environment, output table ID, and output environment)

Development

fct_dataphin_submit_record_df

Submission record table

fct_dataphin_publish_record_df

Publish record table

Permission

fct_dataphin_security_auth_permission_record_df

Permission table

Baseline

dim_dataphin_baseline_object

Baseline monitoring object table

N/A

Real-time development

dim_dataphin_stream_task

Real-time task statistics

N/A

dim_dataphin_stream_task_related_table

Real-time task relationship table

fct_dataphin_stream_task_submit_record_df

Real-time task submission record table

Common

User

dim_dataphin_tenant_user

Tenant user table

data_share_finish

dim_dataphin_project_user

Project user table

dim_dataphin_security_user_bind

User binding table

Project

dim_dataphin_project

Project space table

dim_dataphin_project_compute_engine

Project space to compute engine binding table

Data source

dim_dataphin_data_source

Data source table

dim_dataphin_data_source_config

Data source configuration table

Data assets

Physical table

dim_dataphin_table

Physical table

data_share_finish

dim_dataphin_column

Field table for physical tables

dim_dataphin_table_partition

Physical table partition table

Data service

dim_dataphin_service_logic_unit

Published data service unit table

data_service_finish

dim_dataphin_service_api

Published data service API table

dim_dataphin_service_api_direct

Published direct-connection data source API table

dim_dataphin_service_app

App table

dim_dataphin_service_logic_unit_column

Published service unit field table

dim_dataphin_service_app_api

App-API mapping table

dws_dataphin_service_api_mi

Minute-level summary of data service API calls

dim_dataphin_service_api_param

Published API parameter table

Data lineage

dim_dataphin_lineage_node_column

Fine-grained data lineage (Granularity: node, input table, input field, output table, output field)

data_share_finish

dim_dataphin_lineage_node_table

Fine-grained data lineage (Granularity: node, input table, output table)

dim_dataphin_lineage_table

Fine-grained data lineage (Granularity: input table, output table)

Security

dim_dataphin_security_label

Security label table

data_security_finish

dim_dataphin_security_classify

Security data classification table

dim_dataphin_security_level

Data security level table

dim_dataphin_security_identify_rule

Security identification rule table

dim_dataphin_security_desensitize_rule

Data masking rule table

dim_dataphin_security_desensitize_rule_white_list

Data masking rule whitelist table

Data quality

dim_dataphin_dp_quality_watch

Data quality monitoring object table

data_quality_finish

dim_dataphin_dp_quality_rule

Data quality rule table

fct_dataphin_dp_quality_rule_task_di

Data quality rule task table

dim_dataphin_dp_quality_watch_member

Owner table for data quality monitoring objects

dim_dataphin_dp_quality_watch_alert

Alert configuration table for data quality monitoring objects

fct_dataphin_dp_quality_rule_score_di

Data quality rule scoring table

dim_dataphin_dp_quality_schedule

Data quality scheduling configuration table

fct_dataphin_dp_quality_problem_list_df

Data quality issue table

fct_dataphin_dp_quality_rectify_df

Data quality remediation workflow table

fct_dataphin_dp_quality_rectify_operation_record_df

Operation record table for data quality remediation

fct_dataphin_dp_quality_rectify_problem_relation_df

Relationship table for data quality remediation workflows and issues

Data standard

dim_dataphin_datastandard_standard

Data standard table

data_standard_finish

dim_dataphin_datastandard_rule

Data standard mapping rule table

dim_dataphin_datastandard_rule_asset_relation

Mapping table for data standard rules and assets

dim_dataphin_datastandard_asset_relation

Mapping result table for data standards and assets

dim_dataphin_datastandard_lookup_table

Data standard lookup table

Real-time

dim_dataphin_stream_table

Real-time table metadata

N/A

dim_dataphin_stream_column

Real-time table field metadata

Label

dim_dataphin_featurex_market

Label market table

data_featurex_finish

dim_dataphin_featurex_lineage

Label data lineage table

Asset inventory

dim_dataphin_mdc_object

Global table metadata

N/A

dim_dataphin_mdc_column

Global field metadata

Asset catalog

dim_dataphin_mdc_list_object

Listed asset table

N/A

dim_dataphin_mdc_topic_directory

Asset catalog topic table

Communication tables

A communication table is the last table generated for each module, confirming that all other tables in that module have been successfully generated for the day.

Table name

Description

data_share_finish

Communication table for the basic shared model

data_security_finish

Communication table for the data security module

data_service_finish

Communication table for the data service module

data_quality_finish

Communication table for the data quality module

data_standard_finish

Communication table for the data standard module

data_featurex_finish

Communication table for the tag module

Standardized modeling

Plan

dim_dataphin_biz_unit

create table if not exists dim_dataphin_biz_unit(
	tenant_id 							bigint comment 'Tenant ID.',
	biz_unit_id 						bigint comment 'Business unit ID.',
	biz_unit_name 					string comment 'Business unit name.',
	biz_unit_name_cn 				string comment 'Business unit name in Chinese.',
  biz_unit_desc						string comment 'Business unit description.',
  is_basic_biz						string comment 'Specifies whether the business unit is a basic unit. Valid values: Y/N.',
  biz_unit_spec						string comment 'Business unit type. `datadev`: data development, `distill`: data extraction.',
  env											string comment 'Business unit environment. `PROD`: production environment.',
	owner_id								string comment 'The ID of the business unit owner.',
  owner_name							string comment 'Name of the business unit owner.',
  modifier_id							string comment 'The ID of the last modifier.',
  modifier_name						string comment 'Name of the last modifier.',
  gmt_create 							string comment 'Creation time.',
	gmt_modified 						string comment 'Modification time.'
) comment 'Business unit table.'
partitioned by (ds string comment 'Partition field in yyyymmdd format.') 
lifecycle 30;

dim_dataphin_data_domain

create table if not exists dim_dataphin_data_domain(
  tenant_id                                bigint comment 'tenant ID',
  data_domain_id                           bigint comment 'data domain ID',
  data_domain_name                         string comment 'data domain name',
  data_domain_abbreviation                 string comment 'data domain abbreviation',
  data_domain_name_cn                      string comment 'data domain name in Chinese', 
  data_domain_desc                         string comment 'data domain description',
  biz_unit_id                              bigint comment 'business unit ID',
  biz_unit_name                            string comment 'business unit name',
  owner_id                                 string comment 'owner ID',
  owner_name                               string comment 'owner name',
  modifier_id                              string comment 'last modifier ID',
  modifier_name                            string comment 'last modifier name',
  gmt_create                               string comment 'creation time',
  gmt_modified                             string comment 'last modified time',
  data_domain_level                        bigint comment 'data domain level',
  parent_data_domain_id                    bigint comment 'parent data domain ID',
  parent_data_domain_name                  string comment 'parent data domain name',
  parent_data_domain_level1_abbreviation   string comment 'level 1 parent data domain abbreviation',
  parent_data_domain_level1_name_cn        string comment 'level 1 parent data domain name in Chinese', 
  data_domain_level1_id                    bigint comment 'level 1 data domain ID',
  data_domain_level1_name                  string comment 'level 1 data domain name',
  data_domain_level1_abbreviation          string comment 'level 1 data domain abbreviation',
  data_domain_level1_name_cn               string comment 'level 1 data domain name in Chinese', 
  data_domain_level2_id                    bigint comment 'level 2 data domain ID',
  data_domain_level2_name                  string comment 'level 2 data domain name',
  data_domain_level2_abbreviation          string comment 'level 2 data domain abbreviation',
  data_domain_level2_name_cn               string comment 'level 2 data domain name in Chinese', 
  data_domain_level3_id                    bigint comment 'level 3 data domain ID',
  data_domain_level3_name                  string comment 'level 3 data domain name',
  data_domain_level3_abbreviation          string comment 'level 3 data domain abbreviation',
  data_domain_level3_name_cn               string comment 'level 3 data domain name in Chinese', 
  data_domain_level4_id                    bigint comment 'level 4 data domain ID',
  data_domain_level4_name                  string comment 'level 4 data domain name',
  data_domain_level4_abbreviation          string comment 'level 4 data domain abbreviation',
  data_domain_level4_name_cn               string comment 'level 4 data domain name in Chinese', 
  data_domain_level5_id                    bigint comment 'level 5 data domain ID',
  data_domain_level5_name                  string comment 'level 5 data domain name',
  data_domain_level5_abbreviation          string comment 'level 5 data domain abbreviation',
  data_domain_level5_name_cn               string comment 'level 5 data domain name in Chinese'
) comment 'data domain table'
partitioned by (ds string comment 'partition column, in yyyymmdd format') 
lifecycle 30;

dim_dataphin_time_period

create table if not exists dim_dataphin_time_period(
	tenant_id 								bigint comment 'Tenant ID.',
	time_period_id 						bigint comment 'Time period ID.',
	time_period_name 					string comment 'Time period name.',
  time_period_name_cn				string comment 'Chinese name of the time period.',
  time_period_desc					string comment 'Time period description.',
  period_start							string comment 'Period start date.',
  period_end								string comment 'Period end date.',
  --env                       string comment 'The environment. PROD indicates the production environment.',
  owner_id									string comment 'Owner ID.',
  owner_name								string comment 'Owner name.',
  modifier_id								string comment 'ID of the last modifier.',
  modifier_name							string comment 'Name of the last modifier.',
  gmt_create 								string comment 'Creation time in GMT.',
	gmt_modified 							string comment 'Modification time in GMT.'
) comment 'Time period table.'
partitioned by (ds string comment 'Partition field, in yyyymmdd format.') 
lifecycle 30;

Modeling

dim_dataphin_dimension

create table if not exists dim_dataphin_dimension(
	tenant_id 								bigint comment 'The ID of the tenant.',
	dimension_id							bigint comment 'The ID of the dimension.',
	dimension_name						string comment 'The name of the dimension.',
	dimension_name_cn					string comment 'The Chinese name of the dimension.',
	dimension_desc						string comment 'The description of the dimension.',
	dimension_type						string comment 'The type of the dimension. Valid values: NORMAL, ENUM, VIRTUAL, and LEVEL.',
	dimension_pk							string comment 'The primary key of the dimension.',
	dimension_pk_cn						string comment 'The Chinese name of the primary key.',
	dimension_pk_datatype			string comment 'The data type of the primary key.',
	dimension_pk_logic				string comment 'The primary key logic of the dimension.',
	dimension_status					string comment 'The status of the dimension. 0: draft, 2: submitted, 100: developing, 200: publishing, 201: published.',
	parent_dimension_id				string comment 'The ID of the parent dimension.',
	parent_dimension_name			string comment 'The name of the parent dimension.',
	parent_dimension_name_cn	string comment 'The Chinese name of the parent dimension.',
	dimension_lifecycle				string comment 'The lifecycle of the dimension.',
	data_domain_id						bigint comment 'The ID of the data domain.',
	data_domain_name					string comment 'The name of the data domain.',
	data_domain_abbreviation	string comment 'The abbreviation of the data domain.',
	data_domain_name_cn				string comment 'The Chinese name of the data domain.',
	project_id								bigint comment 'The ID of the project.',
	project_name							string comment 'The name of the project.',
	biz_unit_id								bigint comment 'The ID of the business unit.',
	biz_unit_name							string comment 'The name of the business unit.',
  env                       string comment 'The environment. Valid values: DEV and PROD. DEV: development environment. PROD: production environment.',
  owner_id									string comment 'The ID of the owner.',
  owner_name								string comment 'The name of the owner.',
  modifier_id								string comment 'The ID of the last modifier.',
  modifier_name							string comment 'The name of the last modifier.',
  gmt_create 								string comment 'The time when the dimension was created.',
	gmt_modified 							string comment 'The time when the dimension was last modified.'
) comment 'Dimension table.'
partitioned by (ds string comment 'The partition field in the yyyymmdd format.') 
lifecycle 30;

dim_dataphin_biz_process

create table if not exists dim_dataphin_biz_process(
	tenant_id 								bigint comment 'The tenant ID.',
	biz_process_id						bigint comment 'The business process ID.',
	biz_process_name					string comment 'The business process name.',
	biz_process_name_cn				string comment 'The Chinese name of the business process.',
	biz_process_desc					string comment 'The business process description.',
	data_domain_id						bigint comment 'The data domain ID.',
	data_domain_name					string comment 'The data domain name.',
	data_domain_abbreviation	string comment 'The data domain abbreviation.',
	data_domain_name_cn				string comment 'The Chinese name of the data domain.',
	project_id								bigint comment 'The project ID.',
	project_name							string comment 'The project name.',
	biz_unit_id								bigint comment 'The business unit ID.',
	biz_unit_name							string comment 'The business unit name.',
  env                       string comment 'The environment. Valid values: DEV (development) and PROD (production).',
  owner_id									string comment 'The owner ID.',
  owner_name								string comment 'The owner name.',
  modifier_id								string comment 'The last modifier ID.',
  modifier_name							string comment 'The last modifier name.',
  gmt_create 								string comment 'The creation time.',
	gmt_modified 							string comment 'The modification time.'
) comment 'Business process table'
partitioned by (ds string comment 'The partition key, in yyyymmdd format.')
lifecycle 30;

dim_dataphin_atom_index

Note

For derived atomic metrics, the calculation logic (atom_index_logic) uses the format "@atomic metric ID" to reference other atomic metrics. To restore the initial calculation logic, run a query using the atomic metric ID.

create table if not exists dim_dataphin_atom_index(
	tenant_id 								bigint comment 'Tenant ID.',
  atom_index_id							bigint comment 'Atomic metric ID.',
	atom_index_name						string comment 'Atomic metric name.',
	atom_index_name_cn				string comment 'Chinese name of the atomic metric.',
	atom_index_desc						string comment 'Description of the atomic metric.',
	atom_index_type						string comment 'Atomic metric type. Valid values: NORMAL (normal atomic metric) and COMPOSITE (derived atomic metric).',
	atom_index_data_type			string comment 'Data type of the atomic metric.',
	atom_index_logic					string comment 'Calculation logic for the atomic metric.',
	atom_index_status					string comment 'Atomic metric status. Valid values: 0 (Draft), 2 (Submitted), 100 (Developing), 200 (Publishing), and 201 (Published).',
	is_acc_sum								string comment 'Specifies whether the metric can be summed. Valid values: Y and N.',
	source_path_attribute			string comment 'Primary source field.',
	time_col_attribute_id			bigint comment 'ID of the time period field.',
	time_col_attribute_name		string comment 'Name of the time period field.',
	time_col_format						string comment 'Format of the time period.',
  is_comp_atom_index				string comment 'Specifies whether the metric is a derived atomic metric. Valid values: Y and N.',
  comp_atom_index_ids				string comment 'IDs of component atomic metrics, separated by semicolons.',
  comp_atom_index_names		  string comment 'Names of component atomic metrics, separated by semicolons.',
	model_id									bigint comment 'Logical table ID.',
	model_type								string comment 'Logical table type. Valid values: FCT (fact logical table) and DIM (dimension logical table).',
	model_name								string comment 'Logical table name.',
	model_name_cn							string comment 'Chinese name of the logical table.',
	biz_process_id						bigint comment 'Business process ID.',
	biz_process_name					string comment 'Business process name.',
	biz_process_name_cn				string comment 'Chinese name of the business process.',
	data_domain_id						bigint comment 'Data domain ID.',
	data_domain_name					string comment 'Data domain name.',
	data_domain_abbreviation	string comment 'Abbreviation of the data domain.',
	data_domain_name_cn				string comment 'Chinese name of the data domain.',
	project_id								bigint comment 'Project space ID.',
	project_name							string comment 'Project space name.',
	biz_unit_id								bigint comment 'Business unit ID.',
	biz_unit_name							string comment 'Business unit name.',
  env                       string comment 'Environment. Valid values: DEV (development) and PROD (production).',
  owner_id									string comment 'Owner ID.',
  owner_name								string comment 'Owner name.',
  modifier_id								string comment 'ID of the last modifier.',
  modifier_name							string comment 'Name of the last modifier.',
  gmt_create 								string comment 'Creation time.',
	gmt_modified 							string comment 'Modification time.',
  period_type               string comment 'Data granularity. Valid values: DAY, HOUR, MINUTE, and STREAMING (real-time).'
) comment 'Atomic metric table.'
partitioned by (ds string comment 'Partition key, in yyyymmdd format.') 
lifecycle 30;

dim_dataphin_adjunct_word

create table if not exists dim_dataphin_adjunct_word(
	tenant_id 								bigint comment 'Tenant ID.',
	adjunct_word_id						bigint comment 'Business qualifier ID.',
	adjunct_word_name					string comment 'Business qualifier name.',
	adjunct_word_name_cn			string comment 'Business qualifier Chinese name.',
	adjunct_word_desc					string comment 'Business qualifier description.',
	adjunct_word_logic				string comment 'Business qualifier logic.',
	adjunct_word_status				string comment 'Business qualifier status. 0: draft, 2: submitted, 100: developing, 200: publishing, 201: published.',
	source_path_attribute			string comment 'Primary source field.',
	model_id									bigint comment 'Logical model ID.',
	model_type								string comment 'Logical model type. FCT: fact table, DIM: dimension table.',
	model_name								string comment 'Logical model name.',
	model_name_cn							string comment 'Logical model Chinese name.',
	biz_process_id						bigint comment 'Business process ID.',
	biz_process_name					string comment 'Business process name.',
	biz_process_name_cn				string comment 'Business process Chinese name.',
	data_domain_id						bigint comment 'Data domain ID.',
	data_domain_name					string comment 'Data domain name.',
	data_domain_abbreviation	string comment 'Data domain abbreviation.',
	data_domain_name_cn				string comment 'Data domain Chinese name.',
	project_id								bigint comment 'Project ID.',
	project_name							string comment 'Project name.',
	biz_unit_id								bigint comment 'Business unit ID.',
	biz_unit_name							string comment 'Business unit name.',
  env                       string comment 'Environment. DEV: development, PROD: production.',
  owner_id									string comment 'Owner ID.',
  owner_name								string comment 'Owner name.',
  modifier_id								string comment 'Last modifier ID.',
  modifier_name							string comment 'Last modifier name.',
  gmt_create 								string comment 'Creation time.',
	gmt_modified 							string comment 'Modification time.'
) comment 'Business qualifier table.'
partitioned by (ds string comment 'Partition field. Format: yyyymmdd.') 
lifecycle 30;

dim_dataphin_derived_index

create table if not exists dim_dataphin_derived_index(
	tenant_id 											bigint comment 'Tenant ID.',
  derived_index_id								bigint comment 'Derived index ID.',
	derived_index_name							string comment 'Derived index name.',
	derived_index_name_cn						string comment 'Chinese name of the derived index.',
	derived_index_desc							string comment 'Description of the derived index.',
	derived_index_data_type					string comment 'Data type of the derived index.',
	derived_index_status						string comment 'Status of the derived index. Valid values: `0` (Draft), `2` (Submitted), `100` (Developing), `200` (Publishing), and `201` (Published).',
  derived_index_logic							string comment 'Calculation logic of the derived index.',
  atom_index_id										bigint comment 'Atomic index ID.',
	atom_index_name									string comment 'Atomic index name.',
	atom_index_name_cn							string comment 'Chinese name of the atomic index.',
	atom_index_type									string comment 'Type of the atomic index. Valid values: `NORMAL` (Normal atomic index) and `COMPOSITE` (Composite atomic index).',
	atom_index_data_type						string comment 'Data type of the atomic index.',
	atom_index_logic								string comment 'Calculation logic of the atomic index.',
	atom_index_status								string comment 'Status of the atomic index. Valid values: `0` (Draft), `2` (Submitted), `100` (Developing), `200` (Publishing), and `201` (Published).',
	atom_source_path_attribute			string comment 'Primary source field.',
 	adjunct_word_id									bigint comment 'Business qualifier ID.',
	adjunct_word_name								string comment 'Business qualifier name.',
	adjunct_word_name_cn						string comment 'Chinese name of the business qualifier.',
	adjunct_word_desc								string comment 'Description of the business qualifier.',
	adjunct_word_logic							string comment 'Definition of the business qualifier.',
	adjunct_word_status							string comment 'Status of the business qualifier. Valid values: `0` (Draft), `2` (Submitted), `100` (Developing), `200` (Publishing), and `201` (Published).',
	adjunct_word_source_path_attribute						string comment 'Primary source field.',
	time_period_id 									bigint comment 'Time period ID.',
	time_period_name 								string comment 'Time period name.',
  time_period_name_cn							string comment 'Chinese name of the time period.',
 	granularity_id									bigint comment 'Granularity ID.',
	granularity_name								string comment 'Granularity name.',
	granularity_name_cn							string comment 'Chinese name of the granularity.',
	dimension_ids										string comment 'Dimension IDs, separated by semicolons.',
	dimension_names									string comment 'Dimension names, separated by semicolons.',
	dimension_name_cns							string comment 'Chinese names of the dimensions, separated by semicolons.',
	summary_id											bigint comment 'ID of the summary table for the granularity.',
	summary_name										string comment 'Name of the summary table for the granularity.',
	summary_name_cn									string comment 'Chinese name of the summary table for the granularity.', 
	biz_process_id									bigint comment 'Business process ID.',
	biz_process_name								string comment 'Business process name.',
	biz_process_name_cn							string comment 'Chinese name of the business process.',
	data_domain_id									bigint comment 'Data domain ID.',
	data_domain_name								string comment 'Data domain name.',
	data_domain_abbreviation				string comment 'Abbreviation of the data domain.',
	data_domain_name_cn							string comment 'Chinese name of the data domain.',
	project_id											bigint comment 'Project ID.',
	project_name										string comment 'Project name.',
	biz_unit_id											bigint comment 'Business unit ID.',
	biz_unit_name										string comment 'Business unit name.',
  env                       			string comment 'Environment. Valid values: `DEV` (Development) and `PROD` (Production).',
  owner_id												string comment 'Owner ID.',
  owner_name											string comment 'Owner name.',
  modifier_id											string comment 'ID of the last modifier.',
  modifier_name										string comment 'Name of the last modifier.',
  gmt_create 											string comment 'Creation time.',
	gmt_modified 										string comment 'Last modification time.',
  period_type                 		string comment 'Time granularity. Valid values: `DAY`, `HOUR`, `MINUTE`, and `STREAMING` (real-time).',
  compute_type                 		string comment 'Compute type. Valid values: `AUTO` (derived index), `COMBINE` (combined index), `CUSTOM` (expert-coded), and `MOUNT` (externally registered).'
) comment 'Derived indexes.'
partitioned by (ds string comment 'Partition key, in `yyyymmdd` format.') 
lifecycle 30;

dim_dataphin_model

create table if not exists dim_dataphin_model(
	tenant_id 											bigint comment 'The tenant ID.',
  model_id												bigint comment 'The model ID.',
  model_name											string comment 'The model name.',
  model_name_cn										string comment 'The Chinese model name.',
  model_desc											string comment 'The model description.',
  model_status										string comment 'The model status. Valid values: 0 (Draft), 2 (Submitted), 100 (Developing), 200 (Publishing), and 201 (Published).',
  model_type											string comment 'The model type. Valid values: DIM (dimension logical tables), FACT (fact logical tables), SUM (summary tables), and VIEW (logical views).',
	env															string comment 'The environment. Valid values: DEV (development) and PROD (production).',
	project_id											bigint comment 'The project ID.',
	project_name										string comment 'The project name.',
	biz_unit_id											bigint comment 'The business unit ID.',
	biz_unit_name										string comment 'The business unit name.',
  owner_id												string comment 'The owner ID.',
  owner_name											string comment 'The owner name.',
  modifier_id											string comment 'The last modifier ID.',
  modifier_name										string comment 'The last modifier name.',
  gmt_create 											string comment 'The creation time.',
	gmt_modified 										string comment 'The last modification time.',
  customize_config								string comment 'Custom configuration.',
  table_lifecycle									string comment 'The model lifecycle.',
 	granularity_id									bigint comment 'The granularity ID.',
	granularity_name								string comment 'The granularity name.',
	granularity_name_cn							string comment 'The Chinese granularity name.',
	biz_process_id									bigint comment 'The business process ID.',
	biz_process_name								string comment 'The business process name.',
	biz_process_name_cn							string comment 'The Chinese business process name.',
	data_domain_id									bigint comment 'The data domain ID.',
	data_domain_name								string comment 'The data domain name.',
	data_domain_abbreviation				string comment 'The data domain abbreviation.',
	data_domain_name_cn							string comment 'The Chinese data domain name.',
  biz_detail_time_column					string comment 'The time field for the fact logical table.',
  biz_detail_main_table						string comment 'The main table for the fact logical table.',
  biz_detail_main_table_condition	string comment 'The condition for the main table of the fact logical table.',  
  is_summary_support_cube					string comment 'Indicates whether the summary table supports cubes.',
  view_content										string comment 'The SQL content of the logical view.',
	view_ref_table									string comment 'The tables referenced by the logical view.',
  ops_owner_id                    string comment 'The operations owner ID.',
  ops_owner_name                  string comment 'The operations owner name.',
  dev_owner_id                    string comment 'The development owner ID.',
  dev_owner_name                  string comment 'The development owner name.',
  physical_data_size              bigint comment 'The total size, in bytes, of the physical tables for this model.'
) comment 'The model metadata.'
partitioned by (ds string comment 'The partition field, in yyyymmdd format.') 
lifecycle 30;

dim_dataphin_field

create table if not exists dim_dataphin_field(
	tenant_id 											bigint comment 'The tenant ID.',
 	field_id												bigint comment 'The logical field ID.',
 	field_name											string comment 'The name of the logical field.',
 	field_name_cn										string comment 'The Chinese name of the logical field.',
 	field_seq												bigint comment 'The sequence of the logical field.',
  field_desc											string comment 'The description of the logical field.',
  field_data_type									string comment 'The data type of the logical field.',
  field_logic											string comment 'The definition or calculation logic of the logical field.',
  field_type											string comment 'The field type. Valid values: SYS for system field, DIM for dimension attribute, REF for fact attribute, MEAS for measure, and SUM for summary metric.',
  field_status										string comment 'The status of the logical field. Valid values: 0 for Draft, 2 for Submitted, 100 for In Development, 200 for Publishing, and 201 for Published.',
  field_default_value							string comment 'The default value of the logical field.',
  env															string comment 'The environment. Valid values: DEV for the development environment and PROD for the production environment.',
  model_type											string comment 'The type of the logical table. Valid values: DIM for dimension table, FACT for fact table, VIEW for logical view, and SUM for summary table.',
  model_id												bigint comment 'The ID of the logical table.',
  model_name											string comment 'The name of the logical table.',
  model_name_cn										string comment 'The Chinese name of the logical table.',
  is_partition										string comment 'Whether the field is a partition field. Valid values: Y or N.',
	is_pk														string comment 'Whether the field is a primary key. Valid values: Y or N.',
	is_not_null											string comment 'Whether the field is non-nullable. Valid values: Y or N.',
	is_unique												string comment 'Whether the field value must be unique. Valid values: Y or N.',
  is_physical											string comment 'Whether the field is a physical field. Valid values: Y or N.',
	ref_dimension_id								string comment 'The ID of the referenced dimension.',
	ref_dimension_role							string comment 'The role of the referenced dimension.',
	ref_dimension_role_cn						string comment 'The Chinese name of the referenced dimension role.',
  ref_field_id										bigint comment 'The ID of the referenced logical field (for logical views).', 
	attribute_scope									string comment 'The scope of the attribute.',
	biz_unit_id											bigint comment 'The ID of the business unit.',
	biz_unit_name										string comment 'The name of the business unit.',
  owner_id												string comment 'The ID of the owner.',
  owner_name											string comment 'The name of the owner.',
  modifier_id											string comment 'The ID of the last modifier.',
  modifier_name										string comment 'The name of the last modifier.',
  gmt_create 											string comment 'The creation time.',
	gmt_modified 										string comment 'The modification time.',
  classify                        string comment 'The data classification. Valid values: S for company data, B for business data, and C for personal data. Other values are custom-defined.',
  security_level                  string comment 'The security level. Valid values: L1 for Public, L2 for Internal, L3 for Confidential, and L4 for Top Secret. Other values are custom-defined.'
) comment 'Metadata for logical table fields.'
partitioned by (ds string comment 'The partition field, in yyyymmdd format.') 
lifecycle 30;

DevOps

Physical operations

dim_dataphin_node

Note
  • This table describes offline computing tasks, offline integration tasks, and logical table tasks.

  • Query the dim_dataphin_stream_task and dim_dataphin_stream_task_related_table tables for information about real-time computing tasks. Real-time integration tasks cannot be queried yet.

create table if not exists dim_dataphin_node(
  tenant_id 											bigint comment 'The tenant ID.',
  node_id													string comment 'The node ID.',
  node_name												string comment 'The node name.',
  node_desc												string comment 'The node description.',
  node_type												string comment 'The scheduling type. Valid values: NORMAL for periodic nodes and MANUAL for manual nodes.',
  operator_type										string comment 'The node type.',
  cron_expression									string comment 'The cron expression.',
  cron_type												string comment 'The scheduling cycle. Valid values: MINUTELY, HOURLY, DAILY, MONTHLY, and WEEKLY.',
  priority												bigint comment 'The priority.',
  env															string comment 'The environment. Valid values: DEV for the development environment and PROD for the production environment.',
  node_status											string comment 'The node status. Valid values: NORMAL for normal scheduling, PAUSED for paused scheduling, and IDLE for a dry run.',
  param														string comment 'The node parameters.',
  output_name_list								string comment 'The output list.',
  compute_engine_config						string comment 'The compute source configuration.',
  compute_engine_id								bigint comment 'The compute source ID.',
  compute_engine_name							string comment 'The compute source name.',
  node_content										string comment 'The node content.',	
  is_logical_node									string comment 'Indicates whether this is a scheduling node for a logical table. Valid values: Y and N.',
  resource_group									string comment 'The resource group.',
  node_from                       string comment 'The node source.',
  project_id											bigint comment 'The workspace ID.',
  project_name										string comment 'The workspace name.',
  biz_unit_id         						bigint comment 'The business unit ID.',
  biz_unit_name       					  string comment 'The business unit name.',
  owner_id												string comment 'The owner ID.',
  owner_name											string comment 'The owner name.',
  modifier_id											string comment 'The ID of the last modifier.',
  modifier_name										string comment 'The name of the last modifier.',
  gmt_create 											string comment 'The creation time.',
  gmt_modified 										string comment 'The modification time.',
  directory_path        string comment 'The directory path.'
  ) comment 'Physical scheduling node'
  partitioned by (ds string comment 'The partition field, in yyyymmdd format.') 
  lifecycle 30;

fct_dataphin_node_taskrun_di

create table if not exists fct_dataphin_node_taskrun_di (
    tenant_id                bigint comment 'The tenant ID.'
   ,taskrun_id               string comment 'The ID of the instance run.'
   ,ready_time               string comment 'The time when the instance met its scheduling conditions.'
   ,dispatch_time            string comment 'The time when the instance was dispatched.'
   ,start_execute_time       string comment 'The start time of the instance run.'
   ,finish_execute_time      string comment 'The finish time of the instance run.'
   ,taskrun_status           string comment 'The status of the instance run. Possible values: SUCCESS, FAILED, WAIT_RESOURCE, KILLED, DROPPED, RUNNING, PAUSED, and READY.'
   ,with_downstreams         string comment 'Whether to schedule downstream tasks.'
   ,schedule_type            string comment 'The type of schedule that triggered the run. Possible values: AUTO_SCHEDULED, RERUN, and REDISPATCH.'
   ,task_id                  string comment 'The task ID.'
   ,task_name                string comment 'The task name.'
   ,task_type                string comment 'The task type. Possible values: NORMAL for a normal task, VIRTUAL for a virtual task, TEMP for a temporary task, PAUSED for a paused task, and IDLE for an idle run.'
   ,task_due_time            string comment 'The scheduled execution time for the instance, calculated based on the cron expression.'
   ,task_upstream_ready_time string comment 'The time when all upstream tasks for this instance completed.'
   ,task_last_execute_time   string comment 'The last execution time of the task.'
   ,task_run_times           string comment 'The number of runs for this instance, including retries.'
   ,dagrun_type              string comment 'The trigger type for the task run. Possible values: NORMAL for a periodic run, MANUAL for a manual run, SUPPLEMENT for a backfill, and TEMP for a temporary run.'
   ,operator_type            string comment 'The operator type of the task. Possible values: ONE_SERVICE_SQL, MAX_COMPUTE_SQL, DATAX, DLINK, FLINK_STREAMING, MAX_COMPUTE_MR, PYTHON, SHELL, and VIRTUAL.'
   ,node_id                  string comment 'The node ID.'
   ,node_name                string comment 'The node name.'
   ,biz_date                 string comment 'The business date.'
   ,env                      string comment 'The environment. Possible values: DEV for the development environment and PROD for the production environment.'
   ,index_col                string comment 'The sequence number.'
   ,is_logical_node          string comment 'Indicates whether this is a scheduling node for a logical table. Valid values: Y and N.'
   --node_from				    		string comment 'The node source.',
   ,node_content_version     string comment 'The version of the task code.'
   ,node_content             string comment 'The task code.'
   ,project_id               bigint comment 'The project ID.'
   ,project_name             string comment 'The project name.'
   ,owner_id                 string comment 'The owner ID.'
   ,owner_name               string comment 'The owner name.'
   ,gmt_create               string comment 'The creation time.'
   ,gmt_modified             string comment 'The modification time.'
   ,submitter_id             string comment 'The ID of the user who submitted the instance run.'
   ,submitter_name           string comment 'The name of the user who submitted the instance run.'
   ,is_system_submitter      string comment 'Whether the run was submitted by the system.'
)
comment 'Stores instance runs of physical nodes.'
partitioned by (
    ds string comment 'The partition key, in yyyymmdd format.'
)
lifecycle 90;

dim_dataphin_datax_node

create table if not exists dim_dataphin_datax_node(
	tenant_id 											bigint comment 'Tenant ID.',
 	node_id													string comment 'Node ID.',
 	node_name												string comment 'Node name.',
  node_desc												string comment 'Node description.',
  node_type												string comment 'Scheduling type. Valid values: NORMAL for a periodic node, and MANUAL for a manual node.',
  operator_type										string comment 'Node type.',
  cron_expression									string comment 'Cron expression.',
  cron_type												string comment 'Scheduling period. Valid values: MINUTELY, HOURLY, DAILY, MONTHLY, and WEEKLY.',
  priority												bigint comment 'Execution priority.',
  env															string comment 'Environment. Valid values: DEV (development) and PROD (production).',
  node_status											string comment 'Node status. Valid values: NORMAL (normal scheduling), PAUSED (paused scheduling), and IDLE (dry run).',
  param														string comment 'Node parameters.',
  output_name_list								string comment 'Output names.',
  compute_engine_config						string comment 'Compute engine configuration.',
  compute_engine_id								bigint comment 'Compute engine ID.',
  compute_engine_name							string comment 'Compute engine name.',
  node_content										string comment 'Node content.',	
  is_logical_node									string comment 'Indicates if the node is for a logical table. Valid values: Y and N.',
  resource_group									string comment 'Resource group.',
	project_id											bigint comment 'Project space ID.',
	project_name										string comment 'Project space name.',
	biz_unit_id											bigint comment 'Business unit ID.',
	biz_unit_name										string comment 'Business unit name.',
	reader_type											string comment 'Source data type.',
	reader_ds_id										string comment 'Source data source ID.',
	reader_ds_name									string comment 'Source data source name.',
	reader_db_name									string comment 'Source database name.',
	reader_table_name							  string comment 'Source table name.',
	reader_where										string comment 'Source data filter condition.',
	writer_type											string comment 'Destination data type.',
	writer_ds_id										string comment 'Destination data source ID.',
	writer_ds_name									string comment 'Destination data source name.',
	writer_db_name									string comment 'Destination database name.',
	writer_table_name								string comment 'Destination table name.',
  owner_id												string comment 'Owner ID.',
  owner_name											string comment 'Owner name.',
  modifier_id											string comment 'ID of the last modifier.',
  modifier_name										string comment 'Name of the last modifier.',
  gmt_create 											string comment 'Creation time.',
	gmt_modified 										string comment 'Modification time.'
) comment 'DataX-type physical scheduling node.'
partitioned by (ds string comment 'Partition field, in yyyymmdd format.') 
lifecycle 30;

dim_dataphin_dlink_node

Note

In script mode, DLink tasks cannot parse table names.

create table if not exists dim_dataphin_dlink_node (
    tenant_id             bigint comment 'The tenant ID.'
   ,node_id               string comment 'The node ID.'
   ,node_name             string comment 'The node name.'
   ,node_desc             string comment 'A description of the node.'
   ,node_type             string comment 'The scheduling type. Valid values: NORMAL (periodic node) or MANUAL (manual node).'
   ,operator_type         string comment 'The node type.'
   ,cron_expression       string comment 'The cron expression.'
   ,cron_type             string comment 'The scheduling period. Valid values: MINUTELY, HOURLY, DAILY, MONTHLY, or WEEKLY.'
   ,priority              bigint comment 'The priority.'
   ,env                   string comment 'The environment. Valid values: DEV (development environment) or PROD (production environment).'
   ,node_status           string comment 'The node status. Valid values: NORMAL (scheduled), PAUSED (paused), or IDLE (dry run).'
   ,param                 string comment 'The parameters for the node.'
   ,output_name_list      string comment 'The list of output names.'
   ,compute_engine_config string comment 'The compute engine configuration.'
   ,compute_engine_id     bigint comment 'The compute engine ID.'
   ,compute_engine_name   string comment 'The compute engine name.'
   ,node_content          string comment 'The node content.'
   ,is_logical_node       string comment 'Indicates whether the node schedules a logical table. Valid values: Y or N.'
   ,resource_group        string comment 'The resource group.'
   ,project_id            bigint comment 'The project ID.'
   ,project_name          string comment 'The project name.'
   ,biz_unit_id           bigint comment 'The business unit ID.'
   ,biz_unit_name         string comment 'The business unit name.'
   ,reader_type           string comment 'The type of the reader data source.'
   ,reader_ds_id          string comment 'The ID of the reader data source.'
   ,reader_ds_name        string comment 'The name of the reader data source.'
   ,reader_db_name        string comment 'The name of the reader database.'
   ,reader_table_name     string comment 'The name of the source table.'
   ,writer_type           string comment 'The type of the writer data source.'
   ,writer_ds_id          string comment 'The ID of the writer data source.'
   ,writer_ds_name        string comment 'The name of the writer data source.'
   ,writer_db_name        string comment 'The name of the writer database.'
   ,writer_table_name     string comment 'The name of the destination table.'
   ,owner_id              bigint comment 'The owner ID.'
   ,owner_name            string comment 'The owner name.'
   ,modifier_id           bigint comment 'The ID of the last modifier.'
   ,modifier_name         string comment 'The name of the last modifier.'
   ,gmt_create            string comment 'The creation time.'
   ,gmt_modified          string comment 'The modification time.'
   ,reader_schema_name    string comment 'The source schema name.'
   ,writer_schema_name    string comment 'The destination schema name.'
)
comment 'Stores dlink-type physical scheduling nodes. Note: The value of node_id in this table may not be unique.'
partitioned by (
    ds string comment 'The partition key, in yyyymmdd format.'
)
lifecycle 30;

dim_dataphin_node_edge

Note
  • The combination of source_node_id, source_node_output_name, and target_node_id is unique for each record in this table.

  • In the source_node_name field, nodes with names starting with virtual_root_node are system-built-in root nodes.

  • A logical table displays dependencies for its corresponding logical node and its internal materialized nodes (one or more per logical node). To view only the dependencies between logical nodes, filter by target_operator_type = 'LOGICAL' and source_operator_type = 'LOGICAL'.

  • Version 6.0 introduces support for data lineage in cross-tenant dependencies. If an upstream or downstream task has the same tenant_id, source_tenant_id, and target_tenant_id as the task in the current tenant, then the tenant_id of that task's cross-tenant dependency record matches its target_tenant_id.

create table if not exists dim_dataphin_node_edge (
    tenant_id               bigint comment 'The tenant ID.'
   ,source_node_id          string comment 'The ID of the upstream node.'
   ,source_node_name        string comment 'The name of the upstream node.'
   ,source_node_status      string comment 'The status of the upstream node. Valid values: NORMAL for normal scheduling, PAUSED for paused scheduling, and IDLE for a dry run.'
   ,source_node_type        string comment 'The scheduling type of the upstream node. Valid values: NORMAL for a periodic node and MANUAL for a manual node.'
   ,source_cron_type        string comment 'The scheduling period of the upstream node. Valid values: MINUTELY, HOURLY, DAILY, MONTHLY, and WEEKLY.'
   ,source_operator_type    string comment 'The task type of the upstream node. Valid values: ONE_SERVICE_SQL, MAX_COMPUTE_SQL, DATAX, DLINK, FLINK_STREAMING, MAX_COMPUTE_MR, PYTHON, SHELL, VIRTUAL, ONE_SERVICE_SQL_HIVE_CDP, and VVP_BATCH.'
   ,source_is_logical_node  string comment 'Whether the upstream node is a scheduling node for a logical table. Valid values: Y and N.'
   ,source_project_id       bigint comment 'The workspace ID of the upstream node.'
   ,source_project_name     string comment 'The workspace name of the upstream node.'
   ,source_node_output_name string comment 'The output name of the upstream node.'
   ,target_node_id          string comment 'The ID of the downstream node.'
   ,target_node_name        string comment 'The name of the downstream node.'
   ,target_node_status      string comment 'The status of the downstream node. Valid values: NORMAL for normal scheduling, PAUSED for paused scheduling, and IDLE for a dry run.'
   ,target_node_type        string comment 'The scheduling type of the downstream node. Valid values: NORMAL for a periodic node and MANUAL for a manual node.'
   ,target_cron_type        string comment 'The scheduling period of the downstream node. Valid values: MINUTELY, HOURLY, DAILY, MONTHLY, and WEEKLY.'
   ,target_operator_type    string comment 'The task type of the downstream node. Valid values: ONE_SERVICE_SQL, MAX_COMPUTE_SQL, DATAX, DLINK, FLINK_STREAMING, MAX_COMPUTE_MR, PYTHON, SHELL, VIRTUAL, ONE_SERVICE_SQL_HIVE_CDP, and VVP_BATCH.'
   ,target_is_logical_node  string comment 'Whether the downstream node is a scheduling node for a logical table. Valid values: Y and N.'
   ,target_project_id       bigint comment 'The workspace ID of the downstream node.'
   ,target_project_name     string comment 'The workspace name of the downstream node.'
   ,period_diff             bigint comment 'The difference in scheduling periods. Must be a non-negative integer. Defaults to 1 for a self-dependency (0 is not allowed) and 0 for all other dependencies. A positive value means the target node depends on the source node instance from `period_diff` cycles ago.'
   ,day_diff                bigint comment 'The difference in days. When set, the target node depends on all source node instances that ran `day_diff` days ago, and `period_diff` is ignored. Defaults to null.'
   ,is_self_depend          string comment 'Whether the dependency is a self-dependency. Valid values: Y and N.'
   ,env                     string comment 'The environment. Valid values: PROD and DEV.'
   ,gmt_create              string comment 'The creation time.'
   ,gmt_modified            string comment 'The modification time.'
   ,is_cross_tenant         string comment 'Whether a cross-tenant dependency is enabled. Valid values: Y and N.'
   ,source_tenant_id        bigint comment 'The tenant ID of the upstream node.'
   ,target_tenant_id        bigint comment 'The tenant ID of the downstream node.'
)
comment 'The physical scheduling node dependency table.'
partitioned by (
    ds string comment 'The partition field.'
) stored as orc;

fct_dataphin_node_task_link_di

Note
  • Retains only system-generated scheduled instance data in the production environment, with incremental updates based on scheduled tasks in vdm_link after 11:00 on T-2.

  • V6.0 introduces lineage support for cross-tenant dependent tasks. When an upstream or downstream task shares the same tenant_id, source_tenant_id, and target_tenant_id as a task in the current tenant, the system sets the tenant_id for the associated cross-tenant dependent task to the target_tenant_id.

create table if not exists fct_dataphin_node_task_link_di (
    tenant_id            bigint comment 'Tenant ID.'
   ,source_task_id       string comment 'ID of the upstream task instance.'
   ,source_task_name     string comment 'Name of the upstream task instance.'
   ,source_task_type     string comment 'Instance type. Values: `NORMAL`, `VIRTUAL`, `TEMP`, `PAUSED`, `IDLE` (idle scheduling).'
   ,source_operator_type string comment 'Task type. Values: `ONE_SERVICE_SQL`, `MAX_COMPUTE_SQL`, `DATAX`, `DLINK`, `FLINK_STREAMING`, `MAX_COMPUTE_MR`, `PYTHON`, `SHELL`, `VIRTUAL`, `ONE_SERVICE_SQL_HIVE_CDP`, `VVP_BATCH`, etc.'
   ,source_task_status   string comment 'Upstream instance status. Values: `INIT` (Not started), `WAIT_SCHEDULE`, `DISPATCH_TO_AGENT_BLOCKED` (Dispatch blocked), `WAIT_SUBMISSION`, `WAIT_RESOURCE`, `FAILED`, `RUNNING`, `KILLED`, `PAUSED`, `SUCCESS`.'
   ,source_biz_date      string comment 'Business date.'
   ,source_node_id       string comment 'ID of the upstream node.'
   ,source_node_name     string comment 'Name of the upstream node.'
   ,source_node_status   string comment 'Upstream node status. Values: `NORMAL` (normal), `PAUSED` (paused), `IDLE` (idle).'
   ,source_node_type     string comment 'Upstream node scheduling type. `NORMAL`: periodic node.'
   ,source_cron_type     string comment 'Upstream node scheduling cycle. Values: `MINUTELY`, `HOURLY`, `DAILY`, `MONTHLY`, `WEEKLY`.'
   ,source_project_id    bigint comment 'Upstream instance project ID.'
   ,source_project_name  string comment 'Upstream instance project name.'
   ,target_task_id       string comment 'ID of the downstream task instance.'
   ,target_task_name     string comment 'Name of the downstream task instance.'
   ,target_task_type     string comment 'Instance type. Values: `NORMAL`, `VIRTUAL`, `TEMP`, `PAUSED`, `IDLE` (idle scheduling).'
   ,target_operator_type string comment 'Task type. Values: `ONE_SERVICE_SQL`, `MAX_COMPUTE_SQL`, `DATAX`, `DLINK`, `FLINK_STREAMING`, `MAX_COMPUTE_MR`, `PYTHON`, `SHELL`, `VIRTUAL`, `ONE_SERVICE_SQL_HIVE_CDP`, `VVP_BATCH`, etc.'
   ,target_task_status   string comment 'Downstream instance status. Values: `INIT` (Not started), `WAIT_SCHEDULE`, `DISPATCH_TO_AGENT_BLOCKED` (Dispatch blocked), `WAIT_SUBMISSION`, `WAIT_RESOURCE`, `FAILED`, `RUNNING`, `KILLED`, `PAUSED`, `SUCCESS`.'
   ,target_biz_date      string comment 'Business date.'
   ,target_node_id       string comment 'ID of the downstream node.'
   ,target_node_name     string comment 'Name of the downstream node.'
   ,target_node_status   string comment 'Downstream node status. Values: `NORMAL` (normal), `PAUSED` (paused), `IDLE` (idle).'
   ,target_node_type     string comment 'Downstream node scheduling type. `NORMAL`: periodic node.'
   ,target_cron_type     string comment 'Downstream node scheduling cycle. Values: `MINUTELY`, `HOURLY`, `DAILY`, `MONTHLY`, `WEEKLY`.'
   ,target_project_id    bigint comment 'Downstream instance project ID.'
   ,target_project_name  string comment 'Downstream instance project name.'
   ,dagrun_type          string comment 'Task run type. Values: `NORMAL` (periodic), `MANUAL`, `SUPPLEMENT` (backfill), `TEMP` (temporary).'
   ,is_self_depend       string comment 'Self-dependency flag.'
   ,env                  string comment 'Environment. Values: `PROD`, `DEV`.'
   ,gmt_create           string comment 'Creation timestamp.'
   ,gmt_modified         string comment 'Last modification timestamp.'
   ,is_cross_tenant      string comment 'Cross-tenant dependency flag (`Y`/`N`).'
   ,source_tenant_id     bigint comment 'Upstream node tenant ID.'
   ,target_tenant_id     bigint comment 'Downstream node tenant ID.'
)
comment 'Physical scheduling dependencies between instances.'
partitioned by (
    ds string comment 'Partition key, typically a date string (e.g., `YYYYMMDD`).'
) stored as orc;

dim_dataphin_monitored_rule

create table if not exists dim_dataphin_monitored_rule(
    tenant_id                bigint comment 'The tenant ID.',
    monitored_rule_id        bigint comment 'The monitoring configuration ID.',
    monitored_rule_type      string comment 'The monitoring configuration type. Valid values: LOGIC_FIELD: logical table field, SCHEDULE_PHYSICAL_NODE: scheduled physical task, MANUAL_PHYSICAL_NODE: manual physical task, LOGICAL_TABLE_NODE: logical table task, and PROJECT: project-level.',
    switch_status            string comment 'The switch status. Valid values: TRUE and FALSE.',
    external_node_id         string comment 'The ID of the monitored object. The object type is determined by the monitored_rule_type value.',
    external_node_name       string comment 'The name of the monitored object.',
    project_id               bigint comment 'The project ID.',
    project_name             string comment 'The project name.',
    model_id                 string comment 'The ID of the logical table. Populated only when monitored_rule_type is LOGIC_FIELD.',
    model_name               string comment 'The name of the logical table. Populated only when monitored_rule_type is LOGIC_FIELD.',
    biz_unit_id              string comment 'The business unit ID of the logical table. Populated only when monitored_rule_type is LOGIC_FIELD.',
    biz_unit_name            string comment 'The business unit name of the logical table. Populated only when monitored_rule_type is LOGIC_FIELD.',
    alert_reason_type        string comment 'The alert reason. Valid values: DATA_DELAY: data delay, FAILED: failed run, SUCCESS: successful run, TIME_OUT: timeout, and UNFINISHED: unfinished task.',
    alert_reason_config      string comment 'The alert reason configuration.',
    alert_strategy           string comment 'The alert notification policy.',
    owner_id                 string comment 'The owner ID.',
    owner_name               string comment 'The owner name.',
    last_modifier_id         string comment 'The last modifier ID.',
    last_modifier_name       string comment 'The last modifier name.',
    gmt_create               string comment 'The creation time.',
    gmt_modified             string comment 'The modification time.'
) comment 'The O&M monitoring configuration table.'
partitioned by (ds string comment 'The partition field, in yyyymmdd format.')
lifecycle 30
;

fct_dataphin_dlink_node_taskrun_step_metric_di

create table if not exists fct_dataphin_dlink_node_taskrun_step_metric_di(
     tenant_id               bigint comment 'The tenant ID.',
     project_id              bigint comment 'The project ID.',
     project_name            string comment 'The project name.',
     taskrun_id              string comment 'The ID of the task run.',
     engine_type             string comment 'Engine type: `dlink`.',
     env                     string comment 'Environment. Valid values: `DEV` and `PROD`.',
     gmt_create              string comment 'The creation time.',
     dagrun_type             string comment 'Task run type. `NORMAL`: scheduled run, `MANUAL`: manual run, `SUPPLEMENT`: backfill, `TEMP`: temporary run.',
     taskrun_status          string comment 'Task run status. Valid values include: `INIT`, `WAIT_SCHEDULE`, `DISPATCH_TO_AGENT_BLOCKED`, `WAIT_SUBMISSION`, `WAIT_RESOURCE`, `RUNNING`, `FAILED`, `KILLING`, `KILLED`, `PAUSED`, `SUCCESS`, and `TASK_NOT_EXISTS`.',
     exception_info          string comment 'Exception details if the task run failed.',
     total_time              bigint comment 'The total duration of the task run, in seconds.',
     start_time              string comment 'The start time of the task run.',
     end_time                string comment 'The end time of the task run.',
     node_id                 string comment 'The node ID.',
     schedule_interval_type  string comment 'Node scheduling interval. Valid values: `MINUTELY`, `HOURLY`, `DAILY`, `MONTHLY`, `WEEKLY`, and `YEARLY`.',
     node_config_memory      bigint comment 'The memory configured for the node, in MB.',
     node_config_cpu         double comment 'The number of CPU cores configured for the node.',
     node_config_paralle     bigint comment 'The global parallelism configured for the node.',
     owner_id                string comment 'The ID of the node owner.',
     owner_name              string comment 'The name of the node owner.',
     task_id                 string comment 'The task ID.',
     task_name               string comment 'The task name.',
     biz_date                string comment 'The business date.',
     execute_ip              string comment 'IP address of the machine that ran the task run.',
     total_lines_reader      bigint comment 'Total records successfully read by the task run.',
     total_lines_written     bigint comment 'Total records successfully written by the task run.',
     total_lines_rejected    bigint comment 'Total number of dirty data records in the task run.',
     total_bytes_reader      bigint comment 'Total bytes successfully read by the task run.',
     total_bytes_written     bigint comment 'Total bytes successfully written by the task run.',
     total_wait_reader_time  bigint comment 'The total wait time for read operations in the task run, in milliseconds.',
     total_wait_writer_time  bigint comment 'The total wait time for write operations during the task run, in milliseconds.',
     avg_reader_record       bigint comment 'Average records read per second in the task run.',
     avg_writer_record       bigint comment 'Average records written per second in the task run.',
     avg_reader_bytes        double comment 'Average bytes read per second in the task run.',
     avg_writer_bytes        double comment 'Average bytes written per second in the task run.',
     taskrun_content         string comment 'The JSON content used for the task run.',
     memory_info             string comment 'Memory usage details from the task run.',
     average_cpu             double comment 'The average CPU load during the task run, as a percentage.',
     gc_time                 bigint comment 'The total garbage collection (GC) time during the task run, in milliseconds.',
     step_name               string comment 'The name of the step.',
     category                string comment 'Step category. Valid values: `reader`, `writer`, `transform`, `filter`, and `map`.',
     step_type               string comment 'The type of the dlink step.',
     step_ds_type            string comment 'Step source type. `COMPUTE`: compute source, `DATASOURCE`: data source, `LOGICTABLE`: logical table, `OTHER`: other.',
     ds_id                   bigint comment 'The ID of the data source or compute source.',
     ds_name                 string comment 'The name of the data source or compute source.',
     ds_project_id           bigint comment 'ID of the project bound to the compute source. Populated only when the compute source is a project.',
     ds_project_name         string comment 'Name of the project bound to the compute source. Populated only when the compute source is a project.',
     ds_biz_unit_id          string comment 'ID of the business unit for the logical table. Populated only when the step reads from a logical table.',
     ds_biz_unit_name        string comment 'Name of the business unit for the logical table. Populated only when the step reads from a logical table.',
     ds_type                 string comment 'The type of the data source or compute source.',
     table_name              string comment 'The table read from or written to by the step. If multiple tables are involved, only one is shown.',
     tables_list             string comment 'A comma-separated list of all tables read from or written to by the step.',
     step_lines_reader       bigint comment 'Total records read by the step.',
     step_lines_written      bigint comment 'Total records written by the step.',
     step_lines_rejected     bigint comment 'Total number of dirty data records for the step.',
     step_bytes_reader       bigint comment 'Total bytes read by the step.',
     step_bytes_written      bigint comment 'Total bytes written by the step.',
     step_wait_reader_time   bigint comment 'The wait time for read operations in the step, in milliseconds.',
     step_wait_writer_time   bigint comment 'The wait time for write operations in the step, in milliseconds.'
) comment 'Step-level runtime statistics for integration task nodes. Each record is uniquely identified by `taskrun_id` and `step_name`.'
partitioned by (ds string comment 'The partition key, formatted as `yyyymmdd`.') 
lifecycle 180
;

fct_dataphin_monitor_alert_event_di

create table if not exists fct_dataphin_monitor_alert_event_di
(
     tenant_id                      bigint comment 'The tenant ID.',
     alert_event_id                 bigint comment 'The ID of the alert event.',
     source_system                  string comment 'The source system. Valid values: DQE for data quality, KGB for baseline, VDM_BATCH for scheduling and O&M, OS for data service, STREAM for real-time computing, REAL_TIME_PIPELINE for real-time integration, any value prefixed with QD_ for QD, and CUSTOM for custom.',
     alert_object_name              string comment 'The name of the object that triggered the alert.',
     alert_object_type              string comment 'The type of the object that triggered the alert. Valid values: REAL_TIME_PIPELINE_TASK for a real-time computing task, OS_API for an API interface, OS_APPLICATION_SERVICE for a data service application, KGB for a baseline, QD_DECISION_INVOKE for a tag system, CUSTOM for a custom object, any value prefixed with VDM_ for an offline task, and any value prefixed with DQE_ for a data quality rule.',
     project_id                     bigint comment 'The project ID.',
     project_name                   string comment 'The project name.',
     biz_unit_id                    bigint comment 'The business unit ID.',
     biz_unit_name                  string comment 'The business unit name.',
     biz_date                       string comment 'The business date.',
     alert_reason                   string comment 'The reason for the alert. For a list of possible values, see the product documentation.',
     alert_reason_params            string comment 'Parameters for the alert reason.',
     latest_alert_time              string comment 'Time of the most recent alert.',
     first_alert_time               string comment 'Time of the first alert.',
     url_config                     string comment 'The link information.',
     alert_sending_config           string comment 'Alert frequency settings.',
     status                         string comment 'The alert status. Valid values: ALERTING, DO_NOT_DISTURB, SILENCING (alerting during a silence period), and FINISH (completed).',
     do_not_disturb_end_time        string comment 'The end time of the do not disturb period.',
     total_alert_times              bigint comment 'The total number of times the alert has triggered.',
     alert_channel_list             string comment 'A list of alert channels. Valid values: VOICE (phone call), SMS (text message), MAIL (email), DINGTALK_ROBOT (DingTalk robot), DINGDING (DingTalk work notification), CUSTOM (custom channel), and SILENCE (do not send).',
     alert_user_list                string comment 'A list of alert recipients.',
     alert_advanced_config          string comment 'Advanced configuration settings.',
     owner_id                       string comment 'The ID of the owner.',
     owner_name                     string comment 'The name of the owner.',
     on_call_table_list             string comment 'A list of on-call schedules.',
     gmt_create                     string comment 'The creation time of the alert event.',
     gmt_modified                   string comment 'The modification time of the alert event.',
     source_system_id               string comment 'The value depends on `source_system`. For `KGB`, this is `dim_dataphin_baseline_object.baseline_id`. For `VDM_BATCH`, this is `dim_dataphin_monitored_rule.monitored_rule_id`.',
     source_system_biz_id           string comment 'The value depends on `source_system`. For `DQE`, this is `dim_dataphin_dp_quality_rule.rule_id`. For `VDM_BATCH`, this is `dim_dataphin_node.node_id` or `dim_dataphin_model.model_id`.'
)comment 'The alert event table, incrementally updated daily based on `first_alert_time`.'
partitioned by (ds string comment 'The partition column, in yyyymmdd format.')
lifecycle 90;
Note

The alert_reason enum values are:

  • Data quality:

    • DQE_COLUMN: Column rule exception.

    • DQE_DATA_SOURCE: Data source rule exception.

    • DQE_TABLE: Table rule exception.

    • DQE_REALTIME_TABLE: Real-time table rule exception.

    • DQE_INDEX: Metric rule exception.

  • Data service:

    • OS_AVG_RESPONSE: Average response time exception.

    • OS_CALL_TIMES: Number of calls exception.

    • OS_ERROR_RATE: Error rate exception.

    • OS_OFFLINE: Offline percentage exception.

  • Batch task:

    • VDM_BATCH_ERROR: Task error.

    • VDM_BATCH_FINISH: Task finished successfully.

    • VDM_BATCH_TIME_OUT: Execution timeout.

    • VDM_BATCH_UNDONE: Task incomplete.

    • VDM_BATCH_LOGIC_DATA_DELAY: Logical table data delay.

  • Baseline:

    • KGB_TASK_ERROR: Baseline task error.

    • KGB_TASK_SLOW_DOWN: Baseline task slowdown.

    • KGB_EARLY_WARNING: Baseline early warning.

    • KGB_BROKEN_LINE: Baseline breach.

  • Real-time computing and real-time integration:

    • STREAM_BIZ_DELAY: High business latency.

    • STREAM_DATA_RETENTION: Data backlog exceeds the configured limit.

    • STREAM_MORE_THAN_FAILURE: Failure frequency exceeds the configured limit.

    • STREAM_TPS_OUT_RANGE: TPS is outside the configured range.

    • STREAM_CHECKPOINT_FAILURE: Checkpoint failures exceed the configured limit.

    • STREAM_BACKPRESSURE: Backpressure duration exceeds the configured limit.

    • STREAM_JOB_FAILURE: Job failure.

  • Custom:

    • LOGICAL_INSTANCE_GENERATION: Logical instance generation alert.

fct_dataphin_monitor_alert_notification_di

Note

The enumeration values for alert_reason are defined in the fct_dataphin_monitor_alert_event_di table.

create table if not exists fct_dataphin_monitor_alert_notification_di(
     tenant_id             bigint comment 'The tenant ID.',
     alert_notification_id bigint comment 'The ID of the notification record.',
     alert_event_id        bigint comment 'The ID of the alert event. This ID corresponds to `alert_event_id` in the `fct_dataphin_monitor_alert_event_di` table. A value of -1 indicates consolidated alert events.',
     source_system         string comment 'The source system. Valid values: `DQE` for data quality, `KGB` for baseline, `VDM_BATCH` for scheduling and O&M, `OS` for data service, `STREAM` for real-time computing, `REAL_TIME_PIPELINE` for real-time integration, any value prefixed with `QD_` for QD, and `CUSTOM` for custom.',
     alert_object_name     string comment 'The name of the object that triggered the alert.',
     alert_object_type     string comment 'The type of object that triggered the alert. Valid values: `REAL_TIME_PIPELINE_TASK` for a real-time computing task, `OS_API` for an API, `OS_APPLICATION_SERVICE` for a data service application, `KGB` for a baseline, `QD_DECISION_INVOKE` for a tagging system, `CUSTOM` for a custom object, any value prefixed with `VDM_` for an offline task, and any value prefixed with `DQE_` for a data quality rule.',
     notify_time           string comment 'The notification time.',
     biz_date              string comment 'The business date.',
     alert_reason          string comment 'The reason for the alert. Refer to the relevant documentation for a list of enumerated values.',
     alert_reason_params   string comment 'The parameters for the alert reason.',
     message_content       string comment 'The content of the alert notification.',
     receiver_type         string comment 'The receiver type. Valid values: `ON_CALL_TABLE` for an on-call schedule, `USER_DEFINED` for a custom user, and `OWNER` for the owner.',
     alert_channel         string comment 'The notification channel.',
     alert_user            string comment 'The notification receiver.',
     on_call_table_id      bigint comment 'The ID of the on-call schedule.',
     on_call_table_name    string comment 'The name of the on-call schedule.',
     notify_status         string comment 'The notification status. Valid values: `SUCCESS`, `FAILED`, `SENDING`, and `UNKNOWN`.',
     fail_reason           string comment 'The reason for a `FAILED` notification status.',
     gmt_create            string comment 'The creation time.',
     gmt_modified          string comment 'The modification time.'
)comment 'The alert notification record table. Daily increments are determined by the `notify_time` field.'
partitioned by (ds string comment 'The partition field, using the `yyyymmdd` format.')
lifecycle 90;

Service operations

dim_dataphin_model_node

create table if not exists dim_dataphin_model_node(
	tenant_id 											bigint comment 'Tenant ID.',
 	model_node_id										string comment 'Logical table node ID.',
 	model_node_name									string comment 'Logical table node name.',
 	model_node_name_cn							string comment 'Chinese name of the logical table node.',
  model_id												bigint comment 'Logical table ID.',		
  model_name											string comment 'Logical table name.',
  model_name_cn										string comment 'Chinese name of the logical table.',
  model_type											string comment 'Logical table type.',
 	control_node_id									string comment 'Control node ID. Corresponds to the `node_id` field in the `dim_dataphin_vdm_node` table.',
  cron_type												string comment 'Scheduling cycle. Valid values: MINUTELY, HOURLY, DAILY, MONTHLY, and WEEKLY.',
	fix_time												string comment 'Scheduled time.',
  is_paused												string comment 'Indicates whether the node is paused. Valid values: Y, N.',
  field_cnt												bigint comment 'Number of fields.',
  field_group_cnt									bigint comment 'Number of field groups.',
  priority												string comment 'Priority.',
  env															string comment 'Environment. Valid values: DEV (development) and PROD (production).',
  is_level					              string comment 'Indicates whether this is a hierarchical dimension.',
  level_mid_node_id			          string comment 'ID of the intermediate node for the hierarchical dimension.',
  level_mid_node_name			        string comment 'Name of the intermediate node for the hierarchical dimension.',
  resource_group									string comment 'Resource group.',
  task_group											string comment 'Task group.',
	project_id											bigint comment 'Project ID.',
	project_name										string comment 'Project name.',
	biz_unit_id											bigint comment 'Business unit ID.',
	biz_unit_name										string comment 'Business unit name.',
  owner_id												string comment 'Owner ID.',
  owner_name											string comment 'Owner name.',
  gmt_create 											string comment 'Creation time.',
	gmt_modified 										string comment 'Modification time.'
) comment 'Logical table scheduling node.'
partitioned by (ds string comment 'Partition field. Format: yyyymmdd.') 
lifecycle 30;

fct_dataphin_model_node_task_di

create table if not exists fct_dataphin_model_node_task_di(
	tenant_id 											bigint comment 'The tenant ID.',
 	model_task_id										string comment 'The ID of the logical table scheduling instance.',
 	model_node_id										string comment 'The ID of the logical table node.',
  model_node_name									string comment 'The name of the logical table node.',
 	model_node_name_cn							string comment 'The Chinese name of the logical table node.',
  model_task_type									string comment 'The task run type. Valid values: NORMAL for a periodic run, MANUAL for a manual run, SUPPLEMENT for a backfill, and TEMP for a temporary run.',
  start_time											string comment 'The instance start time.',
  end_time												string comment 'The instance end time.',
  model_id												bigint comment 'The ID of the logical table.',		
  model_name											string comment 'The name of the logical table.',
  model_name_cn										string comment 'The Chinese name of the logical table.',
  model_type											string comment 'The type of the logical table.',
 	control_node_id									string comment 'The ID of the control node. This corresponds to the node_id in the dim_dataphin_vdm_node table.',
 	control_node_task_id						string comment 'The ID of the control node instance. This corresponds to the task_id in the dwd_dataphin_vdm_node_taskrun_di table.',
  cron_type												string comment 'The scheduling period. Valid values: MINUTE, HOUR, DAY, MONTH, and WEEK.',
  fix_time												string comment 'The scheduled time.',
  biz_date                        string comment 'The business date.',
	is_paused												string comment 'Specifies if the task is paused. Valid values: Y and N.',
  paused_type											string comment 'The type of pause.',
  initialized											string comment 'Specifies if the task is initialized. Valid values: Y and N.',
  field_cnt												bigint comment 'The number of fields.',
  field_group_cnt									bigint comment 'The number of field groups.',
  priority												string comment 'The priority of the task.',
  internal_status									string comment 'An internal status field. You can ignore this field.',
  env															string comment 'The environment. Valid values: DEV for the development environment and PROD for the production environment.',
  is_level					              string comment 'Specifies if this is a hierarchical dimension.',
  level_mid_node_id			          string comment 'The ID of the intermediate node for the hierarchical dimension.',
  level_mid_node_name			        string comment 'The name of the intermediate node for the hierarchical dimension.',
  resource_group									string comment 'The resource group.',
  task_group											string comment 'The task group.',
	project_id											bigint comment 'The ID of the project space.',
	project_name										string comment 'The name of the project space.',
	biz_unit_id											bigint comment 'The ID of the business unit.',
	biz_unit_name										string comment 'The name of the business unit.',
  owner_id												string comment 'The ID of the owner.',
  owner_name											string comment 'The name of the owner.',
  gmt_create 											string comment 'The creation time.',
	gmt_modified 										string comment 'The modification time.'
) comment 'Contains a record for each logical table scheduling instance.'
partitioned by (ds string comment 'The partition field, in yyyymmdd format.') 
lifecycle 30;

Engine

Dataphin MaxCompute data integration

Note
  • The data in this table is for reference only. For the most accurate MaxCompute job information, refer to the source data in MaxCompute.

  • This table contains job data only for MaxCompute projects linked to both a Dataphin project and a computing engine.

  • This table is archived by business date based on the start_time. As a result, information like status, end time, and cost for cross-day jobs may be inaccurate and is not backfilled.

  • Dataphin-related information is available only for tasks submitted through Dataphin. This information is not available for tasks that connect directly to MaxCompute, such as meta-warehouse tasks, Dlink tasks for creating tables or adding partitions, or other tasks executed directly on MaxCompute.

  • The table uses the MaxCompute project name to associate records. For tasks not submitted through Dataphin, the record is associated with the Dataphin project that shares the computing engine of the MaxCompute project. Note that if a MaxCompute project is linked to multiple Dataphin tenants, you may encounter duplicate inst_id records with different dataphin_tenant_id values.

create table if not exists fct_dataphin_maxcompute_task_di(
    maxcompute_project_name     string comment 'The name of the MaxCompute project.',
    task_name                   string comment 'The name of the MaxCompute task.',
    task_type                   string comment 'The type of the task. Valid values: SQL for an SQL task, CUPID for a Spark or Mars task, SQLCost for an SQL cost estimation task, SQLRT for a query acceleration SQL task, LOT for a MapReduce task, and PS for a PAI Parameter Server task.',
    inst_id                     string comment 'The unique ID of the MaxCompute instance.',
    inst_status                 string comment 'The task''s execution status at the time of data collection (not a real-time value). Valid values: `Terminated` (the task finished), `Failed` (the task failed), `Canceled` (the task was canceled), or `Other` (any other state).',
    inst_owner_name             string comment 'The name of the Alibaba Cloud account that submitted the task.',
    `result`                    string comment 'The error message if the task failed. This field is null for successful tasks.',
    start_time                  string comment 'The time the task started, in `yyyy-mm-dd hh:mi:ss` format.',
    end_time                    string comment 'The time the task ended, in `yyyy-mm-dd hh:mi:ss` format.',
    input_records               bigint comment 'The number of input records.',
    output_records              bigint comment 'The number of output records.',
    input_bytes                 bigint comment 'The volume of input data, in bytes.',
    output_bytes                bigint comment 'The volume of output data, in bytes.',
    input_tables                string comment 'A list of input tables. Applies to SQL tasks only.',
    output_tables               string comment 'A list of output tables. Applies to SQL tasks only.',
    operation_text              string comment 'The statement executed by the task. Applies to SQL tasks only.',
    cost_cpu                    double comment 'The CPU consumption of the task, where 100 represents 1 core-second (1 core × 1 s). For example, a task using 10 cores for 5 seconds has a cost_cpu of (10 × 5) × 100 = 5000.',
    cost_mem                    double comment 'The memory consumption of the task, measured in MB-seconds (MB × seconds).',
    settings                    string comment 'Information from an upstream scheduler or a user, stored in JSON format.',
    dataphin_tenant_id          string comment 'The ID of the DataPhin tenant.',
    dataphin_project_id         bigint comment 'The ID of the DataPhin project.',
    dataphin_project_name       string comment 'The name of the DataPhin project.',
    dataphin_node_id            string comment 'The ID of the DataPhin node.',
    dataphin_node_name          string comment 'The name of the DataPhin node.',
    dataphin_node_owner_id      string comment 'The ID of the DataPhin account that owns the node.',
    dataphin_node_owner_name    string comment 'The name of the DataPhin account that owns the node.',
    dataphin_task_id            string comment 'The ID of the DataPhin task.',
    dataphin_taskrun_id         string comment 'The ID of the DataPhin task run.',
    dataphin_taskrun_submitter_id         string comment 'The ID of the account that submitted the DataPhin task run.',
    dataphin_taskrun_submitter_name       string comment 'The name of the account that submitted the DataPhin task run.',
    dataphin_is_system_submitter          string comment 'Indicates whether the DataPhin system submitted the task run.',
    dataphin_dagrun_type        string comment 'The type of the DataPhin DAG run. Valid values: `NORMAL` (a scheduled run), `MANUAL` (a manual run or ad-hoc query), and `SUPPLEMENT` (a backfill run).',
    dataphin_biz_date           string comment 'The business date of the DataPhin task run, in `yyyymmdd` format.',
    platform_name               string comment 'The name of the source platform. Fixed value: DATAPHIN'                 
) comment 'MaxCompute task incremental table'
partitioned by (ds string comment 'The partition field, in `yyyymmdd` format. Partitions are based on the `start_time` field.') 
lifecycle 15;

OS

fct_dataphin_os_task_table_inputoutput_di

Note

Granularity is defined by the task execution ID, input table ID, input environment, output table ID, and output environment.

create table if not exists fct_dataphin_os_task_table_inputoutput_di (
  tenant_id             bigint comment 'Tenant ID.',
  os_task_id            string comment 'Execution task ID.',
  os_instance_id        string comment 'Execution instance ID.',
  os_job_id             string comment 'Execution schedule ID.',
  engine_job_id         string comment 'The job ID from the execution engine. For MaxCompute, this is the instance ID.',
  vdm_task_id           string comment 'VDM task ID.',
  vdm_taskrun_id        string comment 'VDM task run ID.',
  os_task_source        string comment 'Execution task source.',
  biz_id                string comment 'Business ID.',
  dagrun_type            string comment 'The run type of the task. Valid values: NORMAL for a scheduled run, MANUAL for a manual run, SUPPLEMENT for a backfill run, and TEMP for a temporary run.',
  operator_type    string comment 'The task type. Valid values: ONE_SERVICE_SQL, MAX_COMPUTE_SQL, DATAX, DLINK, FLINK_STREAMING, MAX_COMPUTE_MR, PYTHON, SHELL, and VIRTUAL.',
  node_id            string comment 'Node ID.',    
  sql_type              string comment 'SQL task type, such as DML.',
  gmt_create            string comment 'Creation time.',
  run_time              string comment 'Task run start time.',
  env                   string comment 'The environment. Valid values: DEV for the development environment and PROD for the production environment.',
  input_project_id            bigint comment 'Input project ID.',
  input_project_name          string comment 'Input project name.',
  input_biz_unit_id           bigint comment 'Input business unit ID.',
  input_biz_unit_name         string comment 'Input business unit name.',
  input_table_id              string comment 'Input table ID.',
  input_table_name            string comment 'Input table name.',
  input_env                   string comment 'The input environment. Valid values: DEV for the development environment and PROD for the production environment.',
  input_is_logical_table      string comment 'Specifies whether the input is a logical table. Valid values: Y and N.',
  output_project_id            bigint comment 'Output project ID.',
  output_project_name          string comment 'Output project name.',
  output_biz_unit_id           bigint comment 'Output business unit ID.',
  output_biz_unit_name         string comment 'Output business unit name.',
  output_table_id              string comment 'Output table ID.',
  output_table_name            string comment 'Output table name.',
  output_env                   string comment 'The output environment. Valid values: DEV for the development environment and PROD for the production environment.',
  output_is_logical_table      string comment 'Specifies whether the output is a logical table. Valid values: Y and N.'
)comment 'Input and output tables parsed from OS SQL tasks.'
partitioned by (ds string comment 'The partition key, in yyyymmdd format.')
lifecycle 90;

Development

fct_dataphin_submit_record_df

Note
  • This table lists submission records for computation tasks, intelligent modeling, physical table management, and data integration.

  • The combination of submit_id and submit_type uniquely identifies each record.

  • Notes for the object_type field:

    • If submit_type is NODE, object_type can be MANUAL for a manual task or NORMAL for a scheduled task.

    • If submit_type is DLINK, object_type can be OFFLINE_PIPELINE for offline data integration or REAL_TIME_PIPELINE for real-time data integration.

    • If submit_type is LOGIC, object_type can be one of the following: FACT_LOGIC_TABLE for a fact logic table, SUMMARY_LOGIC_TABLE for a summary logic table, DIM_LOGIC_TABLE for a dimension logic table, BIZ_PROCESS for a business process, BIZ_OBJECT for a business object, BIZ_CONDITION for a business condition, ATOM_INDEX for an atomic indicator, or DERIVED_INDEX for a derived indicator.

    • If submit_type is TABLE, object_type can be ALTER for an alter table operation, CREATE for a create table operation, or DELETE for a drop table operation.

  • The project_id/project_name fields are null when submit_type='LOGIC' and object_type in ('BIZ_OBJECT','BIZ_PROCESS'). For historical records where submit_type='LOGIC' and object_type = 'SUMMARY_LOGIC_TABLE', these fields are also null.

  • In Basic mode (project_mode=BASIC), submissions are published immediately, so all is_published flags are set to 'N'.

create table if not exists fct_dataphin_submit_record_df(
    submit_id           string comment 'The submission ID.',
    submit_type         string comment 'The submission type. Valid values: NODE for a code task, LOGIC for intelligent modeling, TABLE for physical table management, and DLINK for data integration.',
    tenant_id           bigint comment 'The tenant ID.',
    object_id           string comment 'The object ID. The specific type of this ID depends on the submit_type and object_type fields.',
    object_type         string comment 'The type of the submitted object. See the documentation for details.',
    object_name         string comment 'The object name.',
    object_version      string comment 'The object version.',
    object_content      string comment 'The content of the submitted object.',
    object_owner_id     string comment 'The object owner ID.',
    object_owner_name   string comment 'The object owner name.',
    submit_comment      string comment 'The submission comment.',
    submit_user_id      string comment 'The submitting user ID.',
    submit_user_name    string comment 'The submitting user''s name.',
    is_published        string comment 'The published status. Valid values: Y or N.',
    project_id          bigint comment 'The project ID.',
    project_name        string comment 'The project name.',
    project_mode        string comment 'The project mode. Valid values: BASIC for basic mode and DEV_PROD for development/production mode.',
    biz_unit_id         string comment 'The business unit ID.',
    biz_unit_name       string comment 'The business unit name.',
    submit_time         string comment 'The submission time.'

)comment 'Contains all submission records in Dataphin.'
partitioned by (ds string comment 'The partition field, in yyyymmdd format.')
lifecycle 7;

fct_dataphin_publish_record_df

Note

The object_type field has the following values:

  • Development:

    • VIRTUAL: virtual task; SHELL: shell script; PYTHON: Python task; UDF: user-defined function; IDE_RESOURCE: custom resource; MAX_COMPUTE_SQL: MaxCompute SQL task; HIVE_SQL: Hive SQL task; DATABASE_SQL: database SQL task; SPARK_JAR_ON_HIVE: Spark on Hive task; SPARK_JAR_ON_MAX_COMPUTE: Spark on MaxCompute task; STARROCKS_SQL: StarRocks SQL task; ADB_PG_SQL: ADB for PostgreSQL SQL task; DATA_PROCESS_PHYSICAL_TABLE: physical table; TASK_TEMPLATE: computing template; ARGODB_SQL: ArgoDB SQL task; HOLOGRES_SQL: Hologres SQL task; IMPALA_SQL: Impala SQL task.

  • Integration:

    • OFFLINE_PIPELINE: offline data integration; REAL_TIME_PIPELINE: real-time data integration.

  • Intelligent Modeling:

    • FACT_LOGIC_TABLE: fact logical table; SUMMARY_LOGIC_TABLE: summary logical table; DIM_LOGIC_TABLE: dimension logical table; BIZ_PROCESS: business process; BIZ_OBJECT: business object; BIZ_CONDITION: business qualifier; ATOM_INDEX: atomic metric; DERIVED_INDEX: derived metric; LOGICAL_TABLE_COLUMN: logical table column; DATA_DOMAIN: data domain; BIZ_UNIT: business unit.

  • Real-time:

    • FLINK_SQL: real-time SQL task; META_TABLE: real-time metatable; MIRROR_TABLE: real-time mirror table.

  • Any value prefixed with QUALITY_: Pertains to data quality.

  • Any value prefixed with DATA_SECURITY_: Pertains to data security.

  • Any value prefixed with DATA_STANDARD_: Pertains to data standard.

  • Any value prefixed with QD_: Pertains to the intelligent decision engine.

create table if not exists fct_dataphin_publish_record_df(
    tenant_id          bigint comment 'Tenant ID.',
    object_id          string comment 'Object ID.',
    object_type        string comment 'Object type.',
    object_name        string comment 'Object name.',
    object_version     string comment 'Object version.',
    change_type        string comment 'Change type. Valid values: CREATE, UPDATE, DELETE.',
    publish_id         string comment 'Publish ID.',
    publish_name       string comment 'Publish name.',
    publish_status     string comment 'Publish status. Valid values: NOT_VERIFIED, APPROVING, APPROVE_FAILED, APPROVE_SUCCESS, APPROVE_REVOKED, CHECK_FAILED, CHECK_SUCCESS, PUBLISHING, FAILED, SUCCESS.',
    publish_error      string comment 'Publish error message.',
    publisher_id       string comment 'Publisher ID.',
    publisher_name     string comment 'Publisher name.',
    group_id           string comment 'Publish group ID.',
    group_name         string comment 'Publish group name.',
    finish_time        string comment 'Completion time.',
    data_domain_id     bigint comment 'Data domain ID.',
    data_domain_name   string comment 'Data domain name.',
    node_id            string comment 'Node ID. Corresponds to the node_id in the dim_dataphin_node table.',
    gmt_create         string comment 'Creation time.',
    gmt_modify         string comment 'Modification time.',
    project_id         bigint comment 'Project ID.',
    project_name       string comment 'Project name.',
    biz_unit_id        bigint comment 'Business unit ID.',
    biz_unit_name      string comment 'Business unit name.',
    extend_properties  string comment 'Extended object properties.',
    tag                string comment 'Tag for data migration and marking. Valid values: NORMAL_SUBMIT (user-initiated, UI-visible), SYSTEM_SUBMIT (system-initiated, not UI-visible), UPGRADE (system upgrades).',
    public_status_info string comment 'Extended publish status information in JSON format.',
    sub_item_list      string comment 'JSON string of sub-items for display only.',
    sync_publish       string comment 'Indicates if the publish is synchronous. Valid values: SYNC, ASYNC.',
    biz_tag            string comment 'Business tag. Example: BATCH_PUBLISH.'
)comment 'Dataphin publish records.'
partitioned by (ds string comment 'Partition key in yyyymmdd format.')
lifecycle 15;

Permissions

fct_dataphin_security_auth_permission_record_df

Note
  • Data service permissions are not currently included.

  • Description of the resource_type field:

    • Planning: BIZ_UNIT: business unit; DATA_DOMAIN: data domain; PROJECT: project; GLOBAL_PARAM: global parameter.

    • Data source: DATA_SOURCE: data source.

    • Physical table: PHYSICAL_TABLE: physical table.

    • Function: FUNCTION: function.

    • Logical table: LOGICAL_FIELD: logical table field; PHYSICAL_FIELD: physical table field.

    • Data service: OS_API: data service API; OS_API_FIELD: data service API field; OS_APP: data service application; OS_LOGIC_UNIT: data service unit; OS_DS: data service data source; OS_DS_TABLE: data service data source table; OS_DS_FIELD: data service data source table field; OS_DS_LOGICAL_TABLE: data service logical table; OS_DS_LOGICAL_FIELD: data service logical table field.

    • Resource group: RS_RESOURCE_GROUP: RS resource group.

    • Meta table: REALTIME_LOGICAL_TABLE: real-time meta table; REALTIME_LOGICAL_FIELD: real-time meta table field.

    • Mirror table: REALTIME_MIRROR_TABLE: real-time mirror table; REALTIME_MIRROR_FIELD: real-time mirror table field.

    • View: FEATURE: physical view; PHYSICAL_VIEW: physical view; LOGICAL_VIEW: logical view; PHYSICAL_MATERIALIZED_VIEW: materialized view.

    • Secret key: SECRET_KEY: secret key.

    • Tag: QD_SERVICE_FEATURE: service tag; QD_OFFLINE_FEATURE: offline tag; QD_REALTIME_FEATURE: real-time tag; QD_ADVANCED_FEATURE: advanced tag; QD_CLUSTER: cluster; QD_EVENT: event; QD_OFFLINE_SERVICE: offline tag service task.

  • We recommend that you do not rely on the resource_owners field.

    • Before V4.3, this field listed the approvers from the default system approval template (the initial, unedited version). Modifying the template did not update this field. For data source tables, this field was empty.

    • Starting in V4.3:

      • For table resources, the field now contains the table owner configured on the asset inventory page in Dataphin. For data source tables, this field remains empty.

      • For function and secret key resources, the logic is unchanged. The field still lists approvers from the system approval template.

create table if not exists fct_dataphin_security_auth_permission_record_df (
  record_id                    bigint comment 'The unique ID of the permission record.',
  tenant_id                    bigint comment 'The ID of the tenant.',
  permission_source            string comment 'The permission source. Valid values: `BY_APPLY` (from a permission application) and `BY_GRANT` (from a direct permission grant).',
  account_type                 string comment 'The account type. Valid values: `PERSONAL` (personal account), `USER_GROUP` (user group), and `PRODUCE` (production account).',
  account_user_id              string comment 'The ID of the account.',
  account_user_name            string comment 'The name of the account.',
  account_project_id           string comment 'The ID of the project associated with the production account.',
  account_project_name         string comment 'The name of the project associated with the production account.',
  resource_id                  string comment 'The ID of the resource associated with the permission.',
  resource_type                string comment 'The type of the resource.',
  resource_env                 string comment 'The resource environment. Valid values: `PROD` and `DEV`.',
  resource_name                string comment 'The name of the resource.',
  resource_cn_name             string comment 'The Chinese name of the resource.',
  resource_owners              string comment 'The owners of the resource.',
  resource_project_id          string comment 'The ID of the project that contains the resource.',
  resource_project_name        string comment 'The name of the project that contains the resource.',
  resource_project_env         string comment 'The environment of the project. Valid values: `PROD` and `DEV`.',
  resource_project_cn_name     string comment 'The Chinese name of the project that contains the resource.',
  resource_biz_unit_id         string comment 'The ID of the business unit that contains the resource.',
  resource_biz_unit_env        string comment 'The environment of the business unit. Valid values: `PROD` and `DEV`.',
  resource_biz_unit_name       string comment 'The name of the business unit that contains the resource.',
  resource_biz_unit_cn_name    string comment 'The Chinese name of the business unit that contains the resource.',
  resource_properties          string comment 'Resource properties in JSON format.',
  operate_periods              string comment 'Detailed permission information, in JSON format.',
  nearest_period_type          string comment 'The expiration type for the permission closest to expiry. Valid values: `DAYS_30` (30 days), `DAYS_90` (90 days), `DAYS_180` (180 days), `DAYS_365` (365 days), `LONG_TERM`, and `CUSTOM`.',
  nearest_period_end           string comment 'The expiration date for the permission closest to expiry.',
  is_invalid                   string comment 'Indicates if the permission is invalid. Valid values: `Y` (invalid) and `N` (valid).',
  invalid_date                 string comment 'The timestamp when the permission became invalid.',
  auth_scope                   string comment 'The authorization scope. Valid values: `SELECT_TABLE` (a specific table), `PROJECT_ALL_TABLE` (all tables in a project), and `BIZ_UNIT_ALL_LOGIC_TABLE` (all logical tables in a business unit).',
  creator_id                   string comment 'The ID of the user who created the record.',
  creator_name                 string comment 'The name of the user who created the record.',
  modifier_id                  string comment 'The ID of the user who last modified the record.',
  modifier_name                string comment 'The name of the user who last modified the record.',
  gmt_create                   string comment 'The timestamp when the record was created.',
  gmt_modified                 string comment 'The timestamp when the record was last modified.',
  column_level_auth            string comment 'Column-level permission in JSON format. This field applies only to table-level permission applications.'
)comment 'Stores Dataphin permission records.'
partitioned by (ds string comment 'The date partition key (format: `yyyymmdd`).')
stored as orc;

Baseline

dim_dataphin_baseline_object

create table if not exists dim_dataphin_baseline_object(
    tenant_id                    bigint comment 'The tenant ID.',
    baseline_id                  bigint comment 'The baseline ID.',
    baseline_name                string comment 'The baseline name.',
    baseline_type                string comment 'The baseline type. Valid values: DAY for a daily baseline, HOUR for an hourly baseline, and EMPTY for an empty baseline.',
    baseline_description         string comment 'The baseline description.',
    promise_times                string comment 'The promised completion time, in JSON format.',
    warning_times                string comment 'The alert time, in JSON format.',
    baseline_priority            bigint comment 'The baseline priority. Valid values: 1 (Lowest), 2 (Low), 3 (Medium), 4 (High), and 5 (Highest).',
    warning_setting              string comment 'The baseline alert configuration, in JSON format.',
    baseline_warning_switch      string comment 'Indicates whether baseline alerts are enabled. Valid values: `TRUE` and `FALSE`.',
    baseline_owner_id            string comment 'The ID of the baseline owner.',
    baseline_owner_name          string comment 'The name of the baseline owner.',
    baseline_modifier_id         string comment 'The ID of the last modifier.',
    baseline_modifier_name       string comment 'The name of the last modifier.',
    object_type                  string comment 'The object type. Valid values: PHYSICAL_NODE for a physical node and LOGIC_FIELD for a logical table task.',
    object_id                    string comment 'The object ID. Its value depends on `object_type`.',
    object_name                  string comment 'The object name.',
    object_catalog               string comment 'The object''s catalog, which can be a project or business unit name.',
    object_owner_id              string comment 'The ID of the object owner.',
    object_owner_name            string comment 'The name of the object owner.',
    object_project_id            string comment 'The ID of the project that contains the physical node.',
    object_project_name          string comment 'The name of the project that contains the physical node.',
    object_biz_unit_id           string comment 'The ID of the business unit that contains the logical table task.',
    object_biz_unit_name         string comment 'The name of the business unit that contains the logical table task.',
    object_field_names           string comment 'A comma-separated list of field names for the logical table.',
    object_fix_produce_time      string comment 'The object''s fixed production time.',
    gmt_create                   string comment 'The creation time.',
    gmt_modified                 string comment 'The modification time.'
) comment 'Stores baseline monitoring objects.'
partitioned by (ds string comment 'The partition field, in yyyymmdd format.')
lifecycle 30;

Real-time development

dim_dataphin_stream_task

Note
  • For tasks submitted in version 5.1.0 or earlier, the task_resource_configuration field may be NULL. We recommend saving and resubmitting them.

  • Versions 5.3.0 and earlier do not include real-time integration tasks.

create table if not exists dim_dataphin_stream_task (
 tenant_id                   bigint comment 'Tenant ID.'
,task_id                     bigint comment 'Task ID.'
,task_name                   string comment 'Task name.'
,env                         string comment 'Environment mode. Valid values: DEV, PROD.' 
,task_type                   string comment 'Task type. Valid values: FLINK_SQL, FLINK_DATASTREAM.'
,task_content                string comment 'Task code.'
,task_resource_configuration  string comment 'Task resource configuration.'
,resource_rellocation_type   string comment 'Resource configuration type. Valid values: System Recommended, Custom.'
,customed_resource_context   string comment 'Custom resource configuration content.'
,task_state                  string comment 'Task status. 0: Draft, 1: Submitting, 2: Submitted, 3: Published, 100: In development.'
,task_description            string comment 'Task description.'
,project_id                  bigint comment 'Project ID.'
,project_name                string comment 'Project name.'
,biz_unit_id                 string comment 'Business unit ID.'
,biz_unit_name               string comment 'Business unit name.'
,cluster_queue_info          string comment 'Cluster information.'
,owner_id                    string comment 'Owner ID.'
,owner_name                  string comment 'Owner name.'
,modifier_id                 string comment 'Modifier ID.'
,modifier_name               string comment 'Last modifier name.'
,gmt_create                  string comment 'Creation time.'
,gmt_modified                string comment 'Last modified time.'
)
comment 'Statistics table for real-time tasks. Granularity: tenant_id + task_id.'
partitioned by (
    ds string comment 'Partition field, in `yyyymmdd` format.'
)
lifecycle 30;

dim_dataphin_stream_task_related_table

Note
  • table_type: The table type. Valid values: REAL_TIME_LOGIC_TABLE (a logic table), REAL_TIME_MIRROR_TABLE (a mirror table), PHYSICAL_TABLE (a physical table), and DATA_SOURCE_PHYSICAL_TABLE (a data source table).

  • table_name: For data sources with schemas, such as Hologres or Oracle, the returned table name includes the schema if it is referenced in the task code. Otherwise, the schema is omitted.

  • task_mode: The task mode. Valid values: STREAM (real-time mode) and BATCH (batch mode).

    A real-time task is created on the O&M page. If you also enable batch mode, an additional scheduled task is also created.

  • Granularity:

    • tenant_id + task_id: Uniquely identifies a real-time task.

    • tenant_id + task_id + input_table_id: Uniquely identifies an input table for a real-time task. A single task can have multiple input tables but only one output table.

    • tenant_id + task_id + input_table_id + node_id: Uniquely identifies an O&M node. The node can belong to either the real-time or batch mode of the task, as indicated by the task_mode field.

  • Real-time integration tasks are not available in versions V5.3.0 and earlier.

create table if not exists dim_dataphin_stream_task_related_table(
  tenant_id                  bigint comment'Tenant ID.',
  task_id                    string comment'Task ID.',
  task_name                  string comment'Task name.',
  task_env                   string comment'Task environment. Valid values: DEV, PROD.',
  output_table_name          string comment'Output table name. If the table has a schema (e.g., a Hologres table), the format is schema.table_name.',
  output_table_type          string comment'Output table type. Valid values: REAL_TIME_LOGIC_TABLE, REAL_TIME_MIRROR_TABLE, PHYSICAL_TABLE, DATA_SOURCE_PHYSICAL_TABLE.',
  output_project_id          bigint comment'Output project ID.',
  output_project_name        string comment'Output project name.',
  output_biz_unit_id         bigint comment'Output business unit ID.',
  output_biz_unit_name       string comment'Output business unit name.',
  output_env                 string comment'Output environment. Valid values: DEV, PROD.',
  output_data_source_id      bigint comment'ID of the output data source or compute source.',
  output_data_source_name    string comment'Name of the output data source or compute source. Specifies the compute source for Dataphin physical tables and the data source for data source tables. Populated for meta tables but empty for mirror tables.',
  output_data_source_type    string comment'Type of the output data source or compute source.',
  input_table_name           string comment'Input table name. If the table has a schema (e.g., a Hologres table), the format is schema.table_name.',
  input_table_type           string comment'Input table type. Valid values: REAL_TIME_LOGIC_TABLE, REAL_TIME_MIRROR_TABLE, PHYSICAL_TABLE, DATA_SOURCE_PHYSICAL_TABLE.',
  input_project_id           bigint comment'Input project ID.',
  input_project_name         string comment'Input project name.',
  input_biz_unit_id          bigint comment'Input business unit ID.',
  input_biz_unit_name        string comment'Input business unit name.',
  input_env                  string comment'Input environment. Valid values: DEV, PROD.',
  input_data_source_id       bigint comment'ID of the input data source or compute source.',
  input_data_source_name     string comment'Name of the input data source or compute source. Specifies the compute source for Dataphin physical tables and the data source for data source tables. Populated for meta tables but empty for mirror tables.',
  input_data_source_type     string comment'Type of the input data source or compute source.',
  task_mode                  string comment'Task mode. Valid values: STREAM, BATCH.',
  node_id                    string comment'O&M node ID. Corresponds to node_id in the dim_dataphin_node table.',
  project_id                 bigint comment'Project ID.',
  project_name               string comment'Project name.',
  biz_unit_id                bigint comment'Business unit ID.',
  biz_unit_name              string comment'Business unit name.',
  owner_id                   string comment'Owner ID.',
  owner_name                 string comment'Owner name.'
)comment'Association table for real-time tasks and related tables.'
partitioned by (ds string comment 'Partition key in yyyymmdd format.')
lifecycle 30;

fct_dataphin_stream_task_submit_record_df

Note

Versions V5.3.0 and earlier do not support real-time integration tasks.

create table if not exists fct_dataphin_stream_task_submit_record_df(
    tenant_id           bigint comment 'The tenant ID.',
    submit_id           string comment 'The submission ID.',
    task_id             string comment 'The task ID.',
    task_name           string comment 'The task name.',
    task_type           string comment 'The task type. Valid values: `FLINK_SQL` for an SQL task and `FLINK_DATASTREAM` for a streaming task.',
    submit_version      string comment 'The submission version.',
    submit_content      string comment 'The submission content.',
    submit_comment      string comment 'The submission comment.',
    owner_id            string comment 'The owner ID.',
    owner_name          string comment 'The owner name.',
    modifier_id         string comment 'The modifier ID.',
    modifier_name       string comment 'The name of the last modifier.',
    is_published        string comment 'Indicates whether the task is published. Valid values: Y and N.',
    project_id          bigint comment 'The project ID.',
    project_name        string comment 'The project name.',
    biz_unit_id         bigint comment 'The business unit ID.',
    biz_unit_name       string comment 'The business unit name.',
    gmt_create          string comment 'The creation time.',
    gmt_modified        string comment 'The last modified time.'
)comment 'Contains all records of real-time task submissions in Dataphin. Each record is uniquely identified by the combination of `tenant_id` and `submit_id`.'
partitioned by (ds string comment 'The partition field, formatted as `yyyymmdd`.')
lifecycle 7;

Public

User

dim_dataphin_tenant_user

create table if not exists dim_dataphin_tenant_user(  
  tenant_id						bigint comment 'Tenant ID.',
  user_id             string comment 'Dataphin user ID.',
	account_name        string comment 'Account name.',
	account_type        string comment 'Account type.',
	display_name        string comment 'Display name.',
	nickname            string comment 'Nickname.',
	real_name           string comment 'Real name.',
	source_user_id      string comment 'User ID in the source system.',
	source_type         string comment 'Source system type.',
	gmt_create          string comment 'Creation time.',
	gmt_modified        string comment 'Modification time.',
	mail                string comment 'User''s email.',
  creator_id					string comment 'Creator''s Dataphin user ID.',
  creator_name				string comment 'Creator''s Dataphin account name.',
  modifier_id					string comment 'Last modifier''s Dataphin user ID.',
  modifier_name				string comment 'Last modifier''s Dataphin account name.',
  user_status         string comment 'User status. Valid values: `NORMAL` (default), `DELETE` (deleted), and `DEACTIVATE` (inactive).',
  system_role_list    string comment 'System roles (comma-separated). Valid values: `DATASOURCE_MANAGER` (data source administrator), `DATA_STANDARD_MANAGER` (data standard administrator), `EXPORT_ADMIN` (import/export administrator), `LABELS_BUSINESS_PLANNER` (label planner), `QUALITY_MANAGER` (quality administrator), `SECURITY_ADMIN` (security administrator), `SUPER_ADMIN` (super administrator), and `SYSTEM_ADMIN` (system administrator).',
  system_role_list_name string comment 'Display names of system roles (comma-separated).'
) comment 'Tenant user table.'
partitioned by (ds string comment 'Partition column. Format: yyyymmdd.') 
lifecycle 30;

dim_dataphin_project_user

create table if not exists dim_dataphin_project_user(  
    tenant_id		    bigint comment 'Tenant ID.',
    user_id             string comment 'Dataphin user ID.',
		account_name        string comment 'Account name. For a production account, this is the project name.',
		display_name        string comment 'Display name.',
		nickname            string comment 'Nickname.',
		real_name           string comment 'Real name.',
		project_id          bigint comment 'Project ID.',
    project_name        string comment 'Project name.',
    env                 string comment 'Environment. Valid values: DEV and PROD.',
    role_name           string comment 'Project role. Valid values: PROJECT_OWNER, PROJECT_ADMIN, PROJECT_DEVELOPER, PROJECT_GUEST, PROJECT_ANALYST, PROJECT_PRODUCT, PROJECT_VIEWER, PROJECT_OPS, and PROJECT_BUSINESS_MEMBER.',
    role_display_name   string comment 'Display name of the project role.',
		gmt_create          string comment 'Creation time.',
		gmt_modified        string comment 'Modification time.'
) comment 'Project user table.'
partitioned by (ds string comment 'Partition field in yyyymmdd format.') 
lifecycle 30;

dim_dataphin_security_user_bind

Note

This currently only includes user and user group relationships.

create table if not exists dim_dataphin_security_user_bind(
  tenant_id                    bigint comment 'The ID of the tenant.',
  account_id                   string comment 'The ID of the account.',
  account_name                 string comment 'The name of the account.',
  account_type                 string comment 'The type of the account.',
  bind_type                    string comment 'The binding type, e.g., USER_GROUP for a user group.',
  bind_id                      string comment 'The ID of the bound object, whose type is specified by bind_type.',
  bind_env                     string comment 'The environment. Valid values: PROD, DEV.',
  bind_name                    string comment 'The name of the bound object.',
  bind_is_active               string comment 'Indicates whether the bound object is active. Valid values: Y, N.',
  bind_desc                    string comment 'The description of the bound object.',
  bind_role_name               string comment 'The name of the bound role.',
  creator_id                   string comment 'The ID of the creator.',
  creator_name                 string comment 'The name of the creator.',
  modifier_id                  string comment 'The ID of the last modifier.',
  modifier_name                string comment 'The name of the last modifier.',
  gmt_create                   string comment 'The creation time.',
  gmt_modified                 string comment 'The modification time.'
)comment 'Stores the binding relationships for Dataphin users.'
partitioned by (ds string comment 'The partition field, in yyyymmdd format.')
lifecycle 15;

Project

dim_dataphin_project

create table if not exists dim_dataphin_project(
	tenant_id 					bigint comment 'The tenant ID.',
	project_id 					bigint comment 'The project ID.',
	project_name 				string comment 'The project''s English name.',
	project_name_cn 		string comment 'The project''s Chinese name.',
  project_desc				string comment 'The project description.',
  biz_unit_id					bigint comment 'The business unit ID.',
  biz_unit_name				string comment 'The business unit name.',
  owner_id						string comment 'The project owner ID.',
  owner_name					string comment 'The project owner name.',
  project_mode				string comment 'The project mode. Valid values: BASIC for basic mode and DEV_PROD for development/production mode.',
	env									string comment 'The project environment. Valid values: DEV for the development environment and PROD for the production environment.',
  ns_tag							string comment 'The space type. Valid values: PUBLIC for the intermediate layer, APPLICATION for the application layer, BASE for the source layer, and GENERAL for the common layer.',
  gmt_create 					string comment 'The creation time.',
	gmt_modified 				string comment 'The modification time.'
	
) comment 'The project space table.'
partitioned by (ds string comment 'The partition field, in yyyymmdd format.') 
lifecycle 30;

dim_dataphin_project_compute_engine

create table if not exists dim_dataphin_project_compute_engine(
	tenant_id 										bigint comment 'The ID of the tenant.',
	project_id 										bigint comment 'The ID of the project.',
	project_name 									string comment 'The English name of the project.',
	compute_engine_id						 	bigint comment 'The ID of the compute engine.',
  compute_engine_name						string comment 'The name of the compute engine.',
  env														string comment 'The project environment. `DEV` represents the development environment; `PROD` represents the production environment.',
  compute_engine_support_type		string comment 'The supported type of the compute engine. `OFFLINE` indicates an offline engine, `REALTIME` a real-time engine, and `ANALYTICAL` an analytical engine.',
  compute_engine_desc						string comment 'The description of the compute engine.',
  db_type												string comment 'The type of the compute engine. Possible values include MAX_COMPUTE, HADOOP, FLINK, ANALYTICDB_FOR_PG, HOLOGRES, and VVP.',
  db_name												string comment 'The name of the underlying project or database for the compute engine. For example, for MaxCompute or Hive, this is the project name; for AnalyticDB for PostgreSQL, this is the database name.',
  shema_name                    string comment 'The schema name.',
  maxcompute_project_name       string comment 'The name of the MaxCompute project.',
  end_point                     string comment 'The access endpoint.',
  config_plaintext						  string comment 'The compute engine configuration in JSON format.',
  owner_id											string comment 'The ID of the compute engine owner.',
  owner_name										string comment 'The name of the compute engine owner.',
 	deploy_type										string comment 'The deployment mode of the data source. Possible values include public deployment, RDS deployment, and self-hosted on ECS.',
  access_type										string comment 'The access type of the data source. Possible values include public access and VPC access.',
  gmt_create 										string comment 'The creation timestamp of the binding.',
	gmt_modified 									string comment 'The last modification timestamp of the binding.'
) comment 'This table stores the binding relationship between projects and compute engines.'
partitioned by (ds string comment 'The partition column, in yyyymmdd format.') 
lifecycle 30;

Data sources

dim_dataphin_data_source

create table if not exists dim_dataphin_data_source(
	tenant_id 							bigint comment 'The ID of the tenant.',
	data_source_id 					bigint comment 'The ID of the data source.',
	data_source_name 				string comment 'The name of the data source.',
  data_source_desc				string comment 'The description of the data source.',
  data_source_type				string comment 'The type of the data source. Examples: MaxCompute, MySQL, SQL Server, Hive, PostgreSQL, E-MapReduce, Oracle, HDFS, DRDS, AnalyticDB for MySQL, MongoDB, Flink, AnalyticDB for PostgreSQL, Hologres, and HBase.',
  scope										string comment 'The scope of the data source. Valid values: OFFLINE (offline), STREAMING (real-time), ALL (offline and real-time), DATA_DISTILL (data extraction), and DIP_META_SYNC (metadata warehouse).',
  jdbc_url								string comment 'The connection URL. This is the endpoint for MaxCompute or the JDBC URL for sources like Hive, MySQL, Oracle, and PostgreSQL.',
  db_name									string comment 'The project or database name within the compute engine. This corresponds to the project name for MaxCompute and Hive, or the database name for MySQL and Oracle.',
  shema_name                           string comment 'The schema name.',
  customized_type					string comment 'A user-defined type.',
  env											string comment 'The project environment. Valid values: DEV for the development environment and PROD for the production environment.',
	owner_id								string comment 'The ID of the project owner.',
  owner_name							string comment 'The name of the project owner.',
  gmt_create 							string comment 'The timestamp when the record was created.',
	gmt_modified 						string comment 'The timestamp of the last modification.',
  data_source_catalog            string comment 'A unique identifier for the data source, prefixed with ds_.',
  tag_name_list       string comment 'A comma-separated list of data source tag names.'
) comment 'The data source table.'
partitioned by (ds string comment 'The partition key, in yyyymmdd format.') 
lifecycle 30;

dim_dataphin_data_source_config

create table if not exists dim_dataphin_data_source_config(
	tenant_id 						bigint comment 'ID of the tenant.',
	data_source_id 				bigint comment 'ID of the data source.',
	data_source_name 			string comment 'Name of the data source.',
  data_source_desc			string comment 'Description of the data source.',
  data_source_type			string comment 'Specifies the data source type. Examples: MAX_COMPUTE, MYSQL, SQL_SERVER, HIVE, ONE_SERVICE, POSTGRE_SQL, FTP, EMR_HIVE, ORACLE, HDFS, DRDS, ANALYTICDB, MONGODB, FLINK, ANALYTICDB_FOR_PG, HOLOGRES, and HBASE_1_1_X.',
  env										string comment 'Specifies the project environment. Valid values: DEV (development environment) and PROD (production environment).',
  key_name							string comment 'Specifies the scope of the data source. Valid values: OFFLINE (offline data source), STREAMING (streaming data source), ALL (both offline and streaming), DATA_DISTILL (data distillation source), and DIP_META_SYNC (metadata warehouse source).',
  value						    	string comment 'The connection value: an endpoint for MaxCompute or a JDBC URL for sources like HIVE, MYSQL, ORACLE, and POSTGRE_SQL.',
  gmt_create 						string comment 'The time the record was created.',
	gmt_modified 					string comment 'The time the record was last modified.'
) comment 'Data source configuration table.'
partitioned by (ds string comment 'Specifies the partition field, formatted as `yyyymmdd`.') 
lifecycle 30;

Data asset

Physical table

dim_dataphin_table

Note
  • If a physical table is the source table for a registered metric, the is_from_logical field is set to 'Y'.

  • The table access count is the number of times a table is accessed via SQL queries in Dataphin. To calculate this value, use the count function on the fct_dataphin_os_task_table_inputoutput_di table for a specified time range.

create table if not exists dim_dataphin_table(
	tenant_id 										bigint comment 'The ID of the tenant.',
 	project_id 										bigint comment 'The ID of the Dataphin project.',
	project_name 									string comment 'The name of the Dataphin project.', 
  table_name										string comment 'The name of the table.',
	table_comment								  string comment 'Description of the table.',
  table_type										string comment 'Object type. Valid values: TABLE and VIEW.',
	is_from_logical								string comment 'Specifies if the physical table is derived from a logical model. Valid values: Y and N.',
	is_partitioned								string comment 'Specifies if this is a partitioned table. Valid values: Y and N.',
	last_ddl_time									string comment 'Time of the last DDL modification.',
	last_dml_time									string comment 'Time of the last data modification.',
  view_text											string comment 'The SQL definition of the view.',
  physical_data_size						bigint comment 'The physical storage size of the table.',
  data_size											bigint comment 'The logical storage size of the table.',
  create_time										string comment 'The creation time of the table.',
  modify_time										string comment 'The last modification time.',
  record_num										bigint comment 'Number of rows.',
  life_cycle										bigint comment 'Table lifecycle in days. This field applies only to MaxCompute tables.',
	compute_engine_id						 	bigint comment 'The ID of the compute engine.',
  compute_engine_name						string comment 'The name of the compute engine.',
  env														string comment 'The project environment. Valid values: DEV for the development environment and PROD for the production environment.',
  compute_engine_use_type				string comment 'The usage type of the compute engine. Valid values: OFFLINE for an offline engine, REALTIME for a real-time engine, and ANALYTICAL for an acceleration engine.',
  compute_engine_desc						string comment 'Description of the compute engine.',
  db_type												string comment 'The type of the compute engine. Valid values include max_compute, hadoop, flink, analyticdb_for_pg, hologres, and vvp.',
  db_name												string comment 'Physical namespace of the compute engine. For MaxCompute and Hive, this refers to the project name; for AnalyticDB for PostgreSQL, it is the database name.',
  table_id											string comment 'The Dataphin ID for the table. This field is optional and used for system integration.',
  owner_id											string comment 'The ID of the table owner.',
  owner_name										string comment 'The name of the table owner.',
  is_external_table             string comment 'Specifies if this is an external table. Valid values: Y and N.',
  schema_name                   string comment 'The schema name within the compute engine, used for sources such as Hologres and AnalyticDB for PostgreSQL.',
  last_access_time            string comment 'The last access time.',
  tags_list                   string comment 'Comma-separated list of tags.',
  storage_format              string comment 'The storage format. For Hive tables, this specifies the file format; for other tables, it specifies the data source type.',
  favorites_count             bigint comment 'Number of favorites.',
  pv_count                    bigint comment 'The number of page views.',
  table_location              string comment 'The storage path of the table. This field is populated for Hive and other external tables.',
  last_dml_time_dataphin           string comment 'Last data modification time, tracked internally by Dataphin and supported by all engines.'
) comment 'Metadata for the physical table.'
partitioned by (ds string comment 'The partition key, in yyyymmdd format.') 
lifecycle 30;

dim_dataphin_table_partition

Note

If a physical table is the source table for a registered metric, the is_from_logical field is set to 'Y'.

create table if not exists dim_dataphin_table_partition(
	tenant_id 										bigint comment 'The ID of the tenant.',
 	project_id 										bigint comment 'The ID of the Dataphin project.',
	project_name 									string comment 'The name of the Dataphin project.', 
  table_name										string comment 'The name of the table.',
  partition_name								string comment 'The name of the partition.',
	is_from_logical								string comment 'Indicates whether the physical table is generated from a standardized modeling process. Valid values: Y and N.',
	last_ddl_time									string comment 'The last DDL modification time of the table.',
	last_dml_time									string comment 'The last data modification time of the table.',
  physical_data_size						bigint comment 'The physical storage size of the table.',
  data_size											bigint comment 'The logical storage size of the table.',
  create_time										string comment 'The creation time.',
  modify_time										string comment 'The last modification time.',
  record_num										bigint comment 'The record count.',
	compute_engine_id						 	bigint comment 'The ID of the compute engine.',
  compute_engine_name						string comment 'The name of the compute engine.',
  env														string comment 'The project environment. Valid values: DEV for the development environment and PROD for the production environment.',
  compute_engine_support_type		string comment 'The type of the compute engine. Valid values: OFFLINE for an offline engine, REALTIME for a real-time engine, and ANALYTICAL for an acceleration engine.',
  compute_engine_desc						string comment 'The description of the compute engine.',
  db_type												string comment 'The type of the compute engine, such as MaxCompute, Hadoop, Flink, AnalyticDB for PostgreSQL, Hologres, and Ververica Platform.',
  db_name												string comment 'The name of the physical space for the compute engine. For example, this is the project name for MaxCompute and Hive, and the database name for AnalyticDB for PostgreSQL.',
  table_id											string comment 'The optional ID of the table in Dataphin, used for integration with other systems.',
  owner_id											string comment 'The ID of the table owner.',
  owner_name										string comment 'The name of the table owner.',
  schema_name                   string comment 'The schema name of the compute engine, applicable to engines like Hologres and AnalyticDB for PostgreSQL.'
) comment 'Contains metadata for physical table partitions.'
partitioned by (ds string comment 'The partition key, in yyyymmdd format.') 
lifecycle 30;

dim_dataphin_column

Note

When a physical table is the source table for a registered mounted metric, the is_from_logical field is set to 'Y'.

create table if not exists dim_dataphin_column(
	tenant_id 										bigint comment 'The ID of the tenant.',
 	project_id 										bigint comment 'The ID of the Dataphin project.',
	project_name 									string comment 'The name of the Dataphin project.', 
  table_name										string comment 'The name of the table.',
  column_name										string comment 'The name of the column.',
  column_type										string comment 'The data type of the column.',
  column_comment								string comment 'The comment for the column or partition key.',				
  column_seq										bigint comment 'The sequence number of the column.',
  is_pk													string comment 'Specifies if the column is a primary key. Valid values: Y and N.',
	is_from_logical								string comment 'Specifies if the physical table is generated from a standardized modeling process. Valid values: Y and N.',
	is_partitioned								string comment 'Specifies if this is a partitioned table. Valid values: Y and N.',
	compute_engine_id						 	bigint comment 'The ID of the compute engine.',
  compute_engine_name						string comment 'The name of the compute engine.',
  env														string comment 'The project environment. Valid values: DEV (development environment) and PROD (production environment).',
  compute_engine_support_type		string comment 'The usage type of the compute engine. Valid values: OFFLINE (offline engine), REALTIME (real-time engine), and ANALYTICAL (acceleration engine).',
  compute_engine_desc						string comment 'The description of the compute engine.',
  db_type												string comment 'The type of the compute engine. Possible values: MaxCompute, Hadoop, Flink, AnalyticDB for PostgreSQL, Hologres, and Ververica Platform.',
  db_name												string comment 'The name of the physical space for the compute engine. For example, this refers to the project name for MaxCompute and Hive, or the database name for AnalyticDB for PostgreSQL.',
  table_id											string comment 'Optional. The ID of the Dataphin table, used for integration with other systems.',
  owner_id											string comment 'The ID of the table owner.',
  owner_name										string comment 'The name of the table owner.',
  classify                      string comment 'The data classification. Valid values: S (company data), B (business data), and C (personal data). Custom values are also allowed.',
  security_level                string comment 'The security level. Valid values: L1 (public), L2 (internal), L3 (confidential data), and L4 (top-secret data). Custom values are also allowed.'
) comment 'Physical table column metadata'
partitioned by (ds string comment 'The partition field, in yyyymmdd format.') 
lifecycle 30;

Data service

dim_dataphin_service_logic_unit

create table if not exists dim_dataphin_service_logic_unit(
	tenant_id 										bigint comment 'Tenant ID',
 	logic_unit_id									bigint comment 'Logic unit ID',
 	logic_unit_no									string comment 'Logic unit identifier',
	logic_unit_name								string comment 'Logic unit name',
	logic_unit_type								string comment 'Logic unit type. Valid values: `SINGLE_PHYSICAL` for a single physical table, `MULTI_PHYSICAL` for multiple physical tables, and `DATAPHIN_LOGIC` for a Dataphin logical table.',
	logic_unit_desc								string comment 'Logic unit description',
  model_type										string comment 'Mode. Valid values: `basic` (basic mode) and `prodDev` (development/production mode).',
	status												string comment 'Status. Valid values: `1` (online) and `2` (offline).',
  delete_flag										string comment 'Delete flag. `0`: not deleted, `1`: deleted.',
  logic_unit_version						bigint comment 'Logic unit version',
	lock_version									bigint comment 'Lock version. Used for optimistic locking.',
	group_id											bigint comment 'Group ID',
	group_name										string comment 'Group name',
	record_num										bigint comment 'Maximum number of records to return.',
	source_tables									string comment 'Source tables. Use a semicolon (;) to separate multiple tables.',
	creator_id                    string comment 'Creator ID',
	creator_name                  string comment 'Creator name',
	create_time                   string comment 'Creation time',
	modifier_id                   string comment 'Last modifier ID',
  modifier_name                 string comment 'Last modifier name',
	modified_time									string comment 'Last modified time',
  proj_id                       bigint comment 'Data service project ID',
  proj_name                     string comment 'Data service project name'
) comment 'Data service logic unit table for the production environment'
partitioned by (ds string comment 'Partition key, in yyyymmdd format.') 
lifecycle 30;

dim_dataphin_service_logic_unit_column

create table if not exists dim_dataphin_service_logic_unit_column(
	tenant_id 										bigint comment 'Tenant ID.',
  data_source_id      					bigint comment 'Data source ID.',
  data_source_type    					string comment 'Data source type. Valid values: MYSQL, ADS, ES, HBASE, Oracle, SQL Server, PgSQL, MongoDB, Impala, and adb3.0.',
  data_source_name      				string comment 'Data source name.',
  data_source_url       				string comment 'Data source URL.',
  table_name           					string comment 'Physical table name.',
  original_column      					string comment 'Source column name.',
  original_column_type 					string comment 'Data type of the source column.',
  mapping_column       					string comment 'Mapped column name.',
  column_type          					string comment 'Data type of the logic unit column.',
  datetime_format          			string comment 'Date and time format.',
  delete_flag										string comment 'Logical delete flag. 0: not deleted, 1: deleted.',
	family_name                   string comment 'Column family name for HBase data sources.',
  rowkey_sort                   string comment 'Sort order for the HBase rowkey column.',
  rowkey_separator              string comment 'Separator for the HBase rowkey.',
  is_master_table               string comment 'Whether it is a master table. Valid values: Y, N.',
  is_relate_column              string comment 'Whether it is a related column. Valid values: Y, N.',
  logic_unit_id									bigint comment 'Logic unit ID.',
 	logic_unit_no									string comment 'String identifier of the logic unit.',
	logic_unit_name								string comment 'Logic unit name.',
	logic_unit_type								string comment 'Logic unit type. Valid values: SINGLE_PHYSICAL (single physical table), MULTI_PHYSICAL (multiple physical tables), and DATAPHIN_LOGIC (Dataphin logical table).',
	logic_unit_status							string comment 'Logic unit status. 1: online, 2: offline.',
	creator_id                    string comment 'Creator ID.',
	creator_name                  string comment 'Creator name.',
	create_time                   string comment 'Creation time.',
	modifier_id                   string comment 'Last modifier ID.',
        modifier_name                 string comment 'Last modifier name.',
	modified_time									string comment 'Last modification time.',
  proj_id                       bigint comment 'Data service project ID.',
  proj_name                     string comment 'Data service project name.'
) comment 'Column information for logic units in the production environment.'
partitioned by (ds string comment 'Partition column, in yyyymmdd format.') 
lifecycle 30;

dim_dataphin_service_api

create table if not exists dim_dataphin_service_api (
    tenant_id          bigint comment 'The ID of the tenant.'
   ,api_id             bigint comment 'The ID of the API.'
   ,api_no             string comment 'The identifier of the API.'
   ,api_name           string comment 'The name of the API.'
   ,request_method     string comment 'The request method. Valid values: GET and LIST.'
   ,update_rate        string comment 'The update frequency. Valid values: DAY, HOUR, and MIN.'
   ,api_desc           string comment 'The description of the API.'
   ,return_type        string comment 'The return type. The value is JSON.'
   ,protocol           string comment 'The protocol. Valid values: HTTPS, HTTP, and HTTP&HTTPS.'
   ,sql_statement      string comment 'The SQL statement.'
   ,api_version        bigint comment 'The version of the API.'
   ,lock_version       bigint comment 'The version lock, used for optimistic locking.'
   ,document_url       string comment 'The URL of the API documentation.'
   ,create_type        string comment 'The API creation method. Valid values: SELFDEFINE and GUIDE.'
   ,call_time          bigint comment 'The total number of API calls.'
   ,api_gw_group_id    bigint comment 'The group ID returned by the API gateway.'
   ,api_gw_id          string comment 'The unique API identifier returned by the API gateway.'
   ,status             string comment 'The status of the API. Valid values: 1 (published) and 2 (unpublished).'
   ,delete_flag        string comment 'The logical delete flag. 0 = not deleted, 1 = deleted.'
   ,max_return_num     bigint comment 'The maximum number of records to return.'
   ,api_timeout        bigint comment 'The API timeout, in seconds.'
   ,logic_unit_no      string comment 'The ID of the service unit.'
   ,logic_unit_name    string comment 'The name of the service unit.'
   ,logic_unit_type    string comment 'The type of the service unit. Valid values: SINGLE_PHYSICAL (single physical table), MULTI_PHYSICAL (multiple physical tables), and DATAPHIN_LOGIC (Dataphin logical table).'
   ,logic_unit_desc    string comment 'The description of the service unit.'
   ,model_type         string comment 'The mode. Valid values: basic (basic mode) and prodDev (development and production mode).'
   ,logic_unit_version bigint comment 'The version of the service unit.'
   ,group_id           bigint comment 'The ID of the group to which the API belongs.'
   ,group_name         string comment 'The name of the group to which the API belongs.'
   ,source_tables      string comment 'The source tables. Multiple names are separated by a semicolon.'
   ,api_type           string comment 'The type of the API. Valid values: NORMAL, DIRECTAPI, LOGICTABLE, and COMPOSITEAPI.'
   ,data_source_id     bigint comment 'The ID of the data source for a DIRECTAPI.'
   ,data_source_type   string comment 'The type of the data source for a DIRECTAPI. Examples: impala and HoloLens.'
   ,data_source_name   string comment 'The name of the data source for a DIRECTAPI.'
   ,biz_unit_name      string comment 'The business unit name for a LOGICTABLE API.'
   ,table_name         string comment 'The logical table name for a LOGICTABLE API.'
   ,creator_id         string comment 'The ID of the user who created the API.'
   ,creator_name       string comment 'The name of the user who created the API.'
   ,modifier_id        string comment 'The ID of the user who last modified the API.'
   ,modifier_name      string comment 'The name of the user who last modified the API.'
   ,create_time        string comment 'The time the API was created.'
   ,modified_time      string comment 'The time the API was last modified.'
   ,proj_id            bigint comment 'The ID of the data service project.'
   ,proj_name          string comment 'The name of the data service project.'
   ,order_priority     string comment 'The sort priority. `sql_script`: Uses the sorting specified in the SQL script. `ORDER_BY_LIST`: Uses the sorting specified in the `order_by_list` field.'
   ,is_paged_query     bigint comment 'Indicates whether pagination is enabled for query results. 0 = enabled, 1 = disabled.'
   ,order_by_list      string comment 'The fields to use for sorting.'
   ,update_rate_detail string comment 'The custom update frequency. This field is used when `update_rate` is set to CUSTOM.'
   ,call_mode          string comment 'The API call mode. Valid values: `synchronous call` and `asynchronous call`.'
   ,execution_timeout  bigint comment 'The execution timeout for an asynchronous call, in seconds.'
)
comment 'The data service API table.'
partitioned by (
    ds string comment 'The partition field, in yyyymmdd format.'
)
lifecycle 30;

dim_dataphin_service_api_param

create table if not exists dim_dataphin_service_api_param
(
    id                   bigint comment 'The unique, auto-incrementing ID.',
    tenant_id            bigint comment 'The tenant ID.',
    api_id               bigint comment 'The ID of the associated API, referencing `api_id` in the `dim_dataphin_service_api` table.',
    api_no               string comment 'The API identifier.',
    api_name             string comment 'The name of the API.',
    param_name           string comment 'The name of the parameter.',
    param_type           string comment 'The data type of the parameter.',
    type                 bigint comment 'The parameter category. Valid values: 1 for a response parameter, 2 for a request parameter, and 3 for a common parameter (system-default input parameters, e.g., pagination parameters or sorting parameters).',
    sample               string comment 'An example value.',
    param_desc           string comment 'A description of the parameter.',
    must                 bigint comment 'Specifies if the parameter is required. Valid values: 1 for required, 0 for optional.',
    date_format          string comment 'The date format.',
    operator             string comment 'The operator.',
    mapping_column       string comment 'The source field, which is the name of a service unit field (not a database column). In API wizard mode, this field stores the alias.',
    api_request_method   string comment 'The request method (e.g., GET, LIST), referencing `request_method` in the `dim_dataphin_service_api` table.',
    api_desc             string comment 'A description of the API.',
    api_create_type      string comment 'The creation type (e.g., SELFDEFINE, GUIDE, DIRECTAPI), referencing `create_type` in the `dim_dataphin_service_api` table.',
    api_status           string comment 'The API status (1: online, 2: offline), referencing `status` in the `dim_dataphin_service_api` table.',
    api_delete_flag      string comment 'The logical delete flag for the API (0: not deleted, 1: deleted), referencing `delete_flag` in the `dim_dataphin_service_api` table.',
    create_person_id     string comment 'The creator ID.',
    create_person        string comment 'The name of the creator.',
    create_time          string comment 'The creation time.',
    modifier_id          string comment 'The ID of the last modifier.',
    modifier_name        string comment 'The name of the last modifier.',
    modified_time        string comment 'The last modification time.',
    delete_flag          string comment 'The logical delete flag for the parameter (0: not deleted, 1: deleted).',
    proj_id              bigint comment 'The ID of the data service project.',
    proj_name            string comment 'The name of the data service project.',
    update_rate        string comment 'The update frequency. Valid values: DAY, HOUR, MIN, CUSTOM, and REALTIME.',
    update_rate_detail string comment 'Specifies the custom update frequency, used only when `update_rate` is `CUSTOM`.',
    call_mode          string comment 'The API invocation mode. Valid values: synchronous invocation, asynchronous invocation.',
    execution_timeout  bigint comment 'The execution timeout for an asynchronous invocation, in seconds.'
)comment 'Stores parameters for data service APIs.'
partitioned by (ds string comment 'The business date, in yyyymmdd format.')
lifecycle 30;

dim_dataphin_service_app

create table if not exists dim_dataphin_service_app(
    tenant_id     bigint comment 'The tenant ID.',
    app_id    bigint comment 'The application ID.',
    app_name    string comment 'The application name.',
    app_key    bigint comment 'The app key.',
    app_gw_id    string comment 'The unique API identifier returned by the API Gateway.',
    env    string comment 'Specifies the environment. Valid values: `1` for the staging environment and `2` for the production environment.',
    group_id    bigint comment 'The group ID.',
    group_name    string comment 'The group name.',
    app_secret            string comment 'The app secret. The management service generates this value if an API Gateway is not used.',
    enable_white_list     string comment 'Specifies if the whitelist is enabled. Valid values: `1` (disabled) and `2` (enabled).',
    white_list            string comment 'The whitelisted IP addresses.',
    delete_flag    string comment 'The logical delete flag. A value of `0` indicates the record is active, while `1` indicates it is deleted.',
    creator_id                    string comment 'The ID of the creator.',
    creator_name                  string comment 'The name of the creator.',
    modifier_id                   string comment 'The ID of the last modifier.',
    modifier_name                 string comment 'The name of the last modifier.',
    create_time                   string comment 'The creation time.',
    modified_time    string comment 'The last modification time.',
    proj_id                       bigint comment 'The ID of the data service project.',
    proj_name                     string comment 'The name of the data service project.'
) comment 'The data service application table.'
partitioned by (ds string comment 'The partition field, in `yyyymmdd` format.') 
lifecycle 30;

dim_dataphin_service_app_api

create table if not exists dim_dataphin_service_app_api(
	tenant_id 										bigint comment 'The tenant ID.',
  api_id												bigint comment 'The API ID.',
	api_no												string comment 'The API identifier.',
	api_name											string comment 'The API name.',
  api_version										bigint comment 'The API version.',
  app_id												bigint comment 'The app ID.',
	app_name											string comment 'The app name.',
  app_key												bigint comment 'The AppKey.',
	app_gw_id											string comment 'The unique API identifier returned by the API gateway.',
  delete_flag										string comment 'The logical delete flag. 0 means not deleted, and 1 means deleted.',
	creator_id                    string comment 'The creator ID.',
	creator_name                  string comment 'The creator.',
	modifier_id                   string comment 'The last modifier ID.',
  modifier_name                 string comment 'The last modifier name.',
	create_time                   string comment 'The creation time.',
	modified_time									string comment 'The last modified time.',
  proj_id                       bigint comment 'The data service project ID.',
  proj_name                     string comment 'The data service project name.'
) comment 'This table stores the relationship between data service apps and APIs.'
partitioned by (ds string comment 'Partition field in yyyymmdd format.') 
lifecycle 30;

DWS Dataphin service API MI

create table if not exists dws_dataphin_service_api_mi(
  	tenant_id 										bigint comment 'Tenant ID.',
		api_no												string comment 'API identifier.',
		api_name											string comment 'API name.',
		minute												string comment 'Statistics minute (yyyy_MMdd_HHmm).',
  	app_key												string comment 'Application key.',
  	client_ip											string comment 'Client IP address.',
		total_count          					bigint comment 'Total calls.',
		offline_count        					bigint comment 'Offline calls.',
		client_fail_count    					bigint comment 'Client-side failures.',
		total_time_cost      					bigint comment 'Total time cost for all calls (including failures).',
		total_succ_time_cost 					bigint comment 'Total time cost for successful calls.',
    api_ex_count                  bigint comment 'API exceptions.'
)comment 'Minute-level statistics for data service API calls.'
partitioned by (ds string comment 'Partition field (yyyymmdd).') 
lifecycle 90;

dim_dataphin_service_api_direct

create table if not exists dim_dataphin_service_api_direct(
	tenant_id 										bigint comment 'The tenant ID.',
  api_id												bigint comment 'The API ID.',
	api_no												string comment 'The API identifier.',
	api_name											string comment 'The API name.',
  model_type										string comment 'The mode. Valid values: basic: basic mode, prodDev: development and production mode.',
	data_source_id      					bigint comment 'The data source ID.',
  data_source_type    					string comment 'The data source type. Example values: Impala and Hologres.',
  data_source_name      				string comment 'The data source name.',
  delete_flag										string comment 'The logical delete flag. 0: not deleted, 1: deleted.',
	creator_id                    string comment 'The creator ID.',
	creator_name                  string comment 'The creator name.',
	create_time                   string comment 'The time when the record was created.',
	modifier_id                   string comment 'The ID of the user who last modified the record.',
  modifier_name                 string comment 'The name of the user who last modified the record.',
	modified_time									string comment 'The time when the record was last modified.',
  proj_id                       bigint comment 'The data service project ID.',
  proj_name                     string comment 'The data service project name.'
) comment 'Stores information about direct-connection data source APIs.'
partitioned by (ds string comment 'The partition column. The format is yyyymmdd.') 
lifecycle 30;

Data lineage

Note
  • The dim_dataphin_lineage_node_table and dim_dataphin_lineage_node_column tables contain both structured and unstructured data. The enumeration values for the input_table_type and output_table_type fields are as follows:

    • Structured data: TABLE (table), VIEW (view), DIM_LOGIC_TABLE (dimension logic table), FACT_LOGIC_TABLE (fact logic table), SUM_LOGIC_TABLE (summary logic table), REAL_TIME_TABLE (real-time table), LOGIC_VIEW (logical view), DATASOURCE_TABLE (data source table), DATASOURCE_VIEW (data source view), and DATASOURCE_MATERIALIZED_VIEW (data source materialized view).

    • Unstructured data: DATASOURCE (unstructured data source) and DP_DATASET (unstructured dataset).

  • The dim_dataphin_lineage_node table does not contain unstructured data.

dim_dataphin_lineage_node_column

create table if not exists dim_dataphin_lineage_node_column (
    tenant_id                       bigint comment 'The ID of the tenant.'
   ,node_id                         string comment 'The ID of the node.'
   ,node_name                       string comment 'The name of the node.'
   ,node_env                        string comment 'The environment of the node. Valid values: PROD and DEV.'
   ,node_operator_type              string comment 'The type of the task. Possible values include DATAX, DLINK, ONE_SERVICE_SQL, VIRTUAL, PYTHON, and SHELL.'
   ,lineage_type                    string comment 'The type of data lineage. Valid values: SELECT and OTHER.'
   ,input_column_id                 string comment 'This field is deprecated.'
   ,input_column_name               string comment 'The name of the input column.'
   ,output_column_id                string comment 'This field is deprecated.'
   ,output_column_name              string comment 'The name of the output column.'
   ,input_table_id                  string comment 'This field is deprecated.'
   ,input_table_type                string comment 'The type of the input table. Possible values include TABLE, VIEW, DIM_LOGIC_TABLE, FACT_LOGIC_TABLE, SUM_LOGIC_TABLE, REAL_TIME_TABLE, LOGIC_VIEW, DATASOURCE_TABLE, DATASOURCE_VIEW, and DATASOURCE_MATERIALIZED_VIEW.'
   ,input_table_name                string comment 'The name of the input table.'
   ,input_data_domain_id            bigint comment 'The ID of the data domain for the input table.'
   ,input_data_domain_name          string comment 'The name of the data domain for the input table.'
   ,input_data_domain_abbreviation  string comment 'The abbreviation of the data domain for the input table.'
   ,input_data_domain_name_cn       string comment 'The Chinese name of the data domain for the input table.'
   ,output_table_id                 string comment 'This field is deprecated.'
   ,output_table_type               string comment 'The type of the output table. Possible values include TABLE, VIEW, DIM_LOGIC_TABLE, FACT_LOGIC_TABLE, SUM_LOGIC_TABLE, REAL_TIME_TABLE, LOGIC_VIEW, DATASOURCE_TABLE, DATASOURCE_VIEW, and DATASOURCE_MATERIALIZED_VIEW.'
   ,output_table_name               string comment 'The name of the output table.'
   ,output_data_domain_id           bigint comment 'The ID of the data domain for the output table.'
   ,output_data_domain_name         string comment 'The name of the data domain for the output table.'
   ,output_data_domain_abbreviation string comment 'The abbreviation of the data domain for the output table.'
   ,output_data_domain_name_cn      string comment 'The Chinese name of the data domain for the output table.'
   ,input_project_id                bigint comment 'The ID of the input project.'
   ,input_project_name              string comment 'The English name of the input project.'
   ,input_project_cn_name           string comment 'The Chinese name of the input project.'
   ,input_project_ns_tag            string comment 'The type of the input project. Valid values: BASE, PUBLIC, and APPLICATION.'
   ,output_project_id               bigint comment 'The ID of the output project.'
   ,output_project_name             string comment 'The English name of the output project.'
   ,output_project_cn_name          string comment 'The Chinese name of the output project.'
   ,output_project_ns_tag           string comment 'The type of the output project. Valid values: BASE, PUBLIC, and APPLICATION.'
   ,input_biz_unit_id               bigint comment 'The ID of the input business unit.'
   ,input_biz_unit_name             string comment 'The name of the input business unit.'
   ,output_biz_unit_id              bigint comment 'The ID of the output business unit.'
   ,output_biz_unit_name            string comment 'The name of the output business unit.'
   ,input_env                       string comment 'The input environment. Valid values: PROD and DEV.'
   ,output_env                      string comment 'The output environment. Valid values: PROD and DEV.'
   ,gmt_create                      string comment 'The creation time.'
   ,gmt_modified                    string comment 'The last modification time.'
   ,input_data_source_id            bigint comment 'The ID of the input data source.'
   ,input_data_source_name          string comment 'The name of the input data source.'
   ,input_data_source_type          string comment 'The type of the input data source. For valid values, see the data_source_type field in the dim_dataphin_data_source table.'
   ,input_db_name                   string comment 'The name of the input database.' 
   ,input_schema_name               string comment 'The name of the input schema. For data sources with a schema (such as PostgreSQL, Oracle, SQL Server, and Hologres), this field contains the schema name. For those without a schema (such as MySQL, MaxCompute, DM, CK, and SR), it contains the database or project name.'  
   ,output_data_source_id           bigint comment 'The ID of the output data source.'
   ,output_data_source_name         string comment 'The name of the output data source.'
   ,output_data_source_type         string comment 'The type of the output data source. For valid values, see the data_source_type field in the dim_dataphin_data_source table.'
   ,output_db_name                  string comment 'The name of the output database.' 
   ,output_schema_name              string comment 'The name of the output schema. For data sources with a schema (such as PostgreSQL, Oracle, SQL Server, and Hologres), this field contains the schema name. For those without a schema (such as MySQL, MaxCompute, DM, CK, and SR), it contains the database or project name.' 
   ,input_is_structured_data        string comment 'Specifies whether the input is structured data. Valid values: Y and N.'
   ,output_is_structured_data       string comment 'Specifies whether the output is structured data. Valid values: Y and N.'
   ,input_dataset_version           string comment 'The version of the input dataset.'
   ,output_dataset_version          string comment 'The version of the output dataset.'
)
comment 'Stores column-level data lineage, broken down by node, input table, input column, output table, and output column.'
partitioned by (
    ds string comment 'The partition column, in yyyymmdd format.'
)
lifecycle 30;

dim_dataphin_lineage_node_table

create table if not exists dim_dataphin_lineage_node_column (
    tenant_id                       bigint comment 'The tenant ID.'
   ,node_id                         string comment 'The node ID.'
   ,node_name                       string comment 'The node name.'
   ,node_env                        string comment 'The node''s environment. Valid values: PROD and DEV.'
   ,node_operator_type              string comment 'The task''s operator type. Valid values: DATAX, DLINK, ONE_SERVICE_SQL, VIRTUAL, PYTHON, and SHELL.'
   ,lineage_type                    string comment 'The data lineage type. Valid values: SELECT and OTHER.'
   ,input_column_id                 string comment 'Deprecated field.'
   ,input_column_name               string comment 'The input column name.'
   ,output_column_id                string comment 'Deprecated field.'
   ,output_column_name              string comment 'The output column name.'
   ,input_table_id                  string comment 'Deprecated field.'
   ,input_table_type                string comment 'The input table type. Valid values: TABLE (table), VIEW (view), DIM_LOGIC_TABLE (dimension table), FACT_LOGIC_TABLE (fact table), SUM_LOGIC_TABLE (summary table), REAL_TIME_TABLE (real-time table), LOGIC_VIEW (logical view), DATASOURCE_TABLE (data source table), DATASOURCE_VIEW (data source view), and DATASOURCE_MATERIALIZED_VIEW (data source materialized view).'
   ,input_table_name                string comment 'The input table name.'
   ,input_data_domain_id            bigint comment 'The data domain ID for the input table.'
   ,input_data_domain_name          string comment 'The data domain name for the input table.'
   ,input_data_domain_abbreviation  string comment 'The data domain abbreviation for the input table.'
   ,input_data_domain_name_cn       string comment 'The Chinese name of the input table''s data domain.'
   ,output_table_id                 string comment 'Deprecated field.'
   ,output_table_type               string comment 'The output table type. Valid values: TABLE (table), VIEW (view), DIM_LOGIC_TABLE (dimension table), FACT_LOGIC_TABLE (fact table), SUM_LOGIC_TABLE (summary table), REAL_TIME_TABLE (real-time table), LOGIC_VIEW (logical view), DATASOURCE_TABLE (data source table), DATASOURCE_VIEW (data source view), and DATASOURCE_MATERIALIZED_VIEW (data source materialized view).'
   ,output_table_name               string comment 'The output table name.'
   ,output_data_domain_id           bigint comment 'The data domain ID for the output table.'
   ,output_data_domain_name         string comment 'The data domain name for the output table.'
   ,output_data_domain_abbreviation string comment 'The data domain abbreviation for the output table.'
   ,output_data_domain_name_cn      string comment 'The Chinese name of the output table''s data domain.'
   ,input_project_id                bigint comment 'The input project ID.'
   ,input_project_name              string comment 'The input project''s English name.'
   ,input_project_cn_name           string comment 'The input project''s Chinese name.'
   ,input_project_ns_tag            string comment 'The input project type. Valid values: BASE, PUBLIC, and APPLICATION.'
   ,output_project_id               bigint comment 'The output project ID.'
   ,output_project_name             string comment 'The output project''s English name.'
   ,output_project_cn_name          string comment 'The output project''s Chinese name.'
   ,output_project_ns_tag           string comment 'The output project type. Valid values: BASE, PUBLIC, and APPLICATION.'
   ,input_biz_unit_id               bigint comment 'The input business unit ID.'
   ,input_biz_unit_name             string comment 'The input business unit name.'
   ,output_biz_unit_id              bigint comment 'The output business unit ID.'
   ,output_biz_unit_name            string comment 'The output business unit name.'
   ,input_env                       string comment 'The input environment. Valid values: PROD and DEV.'
   ,output_env                      string comment 'The output environment. Valid values: PROD and DEV.'
   ,gmt_create                      string comment 'The creation time.'
   ,gmt_modified                    string comment 'The modification time.'
   ,input_data_source_id            bigint comment 'The input data source ID.'
   ,input_data_source_name          string comment 'The input data source name.'
   ,input_data_source_type          string comment 'The input data source type. For valid values, see the `data_source_type` column in the `dim_dataphin_data_source` table.'
   ,input_db_name                   string comment 'The input database name.' 
   ,input_schema_name               string comment 'The input schema name. For data sources that use a schema (e.g., PostgreSQL, Oracle, SQL Server, and Hologres), this field contains the schema name. For those that do not (e.g., MySQL, MaxCompute, DM, CK, and SR), it contains the database name (`db_name`) or project name (`project_name`).'  
   ,output_data_source_id           bigint comment 'The output data source ID.'
   ,output_data_source_name         string comment 'The output data source name.'
   ,output_data_source_type         string comment 'The output data source type. For valid values, see the `data_source_type` column in the `dim_dataphin_data_source` table.'
   ,output_db_name                  string comment 'The output database name.' 
   ,output_schema_name              string comment 'The output schema name. For data sources that use a schema (e.g., PostgreSQL, Oracle, SQL Server, and Hologres), this field contains the schema name. For those that do not (e.g., MySQL, MaxCompute, DM, CK, and SR), it contains the database name (`db_name`) or project name (`project_name`).' 
   ,input_is_structured_data        string comment 'Indicates whether the input is structured data. Valid values: Y and N.'
   ,output_is_structured_data       string comment 'Indicates whether the output is structured data. Valid values: Y and N.'
   ,input_dataset_version           string comment 'The input dataset version.'
   ,output_dataset_version          string comment 'The output dataset version.'
)
comment 'Stores data lineage linking input and output columns for each node and table.'
partitioned by (
    ds string comment 'The partition column, in yyyymmdd format.'
)
lifecycle 30;

dim_dataphin_lineage_table

create table if not exists dim_dataphin_lineage_table (
    tenant_id                       bigint comment 'The ID of the tenant.'
   ,input_table_id                  string comment 'Deprecated.'
   ,input_table_type                string comment 'The type of the input table. Valid values: TABLE, VIEW, DIM_LOGIC_TABLE, FACT_LOGIC_TABLE, SUM_LOGIC_TABLE, REAL_TIME_LOGIC_TABLE, LOGIC_VIEW, DATASOURCE_TABLE, DATASOURCE_VIEW, and DATASOURCE_MATERIALIZED_VIEW.'
   ,input_table_name                string comment 'The name of the input table.'
   ,input_data_domain_id            bigint comment 'The ID of the data domain for the input table.'
   ,input_data_domain_name          string comment 'The name of the data domain for the input table.'
   ,input_data_domain_abbreviation  string comment 'The abbreviation of the input data domain name.'
   ,input_data_domain_name_cn       string comment 'The Chinese name of the input data domain.'
   ,output_table_id                 string comment 'Deprecated.'
   ,output_table_type               string comment 'The type of the output table. Valid values: TABLE, VIEW, DIM_LOGIC_TABLE, FACT_LOGIC_TABLE, SUM_LOGIC_TABLE, REAL_TIME_LOGIC_TABLE, LOGIC_VIEW, DATASOURCE_TABLE, DATASOURCE_VIEW, and DATASOURCE_MATERIALIZED_VIEW.'
   ,output_table_name               string comment 'The name of the output table.'
   ,output_data_domain_id           bigint comment 'The ID of the data domain for the output table.'
   ,output_data_domain_name         string comment 'The name of the data domain for the output table.'
   ,output_data_domain_abbreviation string comment 'The abbreviation of the output data domain name.'
   ,output_data_domain_name_cn      string comment 'The Chinese name of the output data domain.'
   ,input_project_id                bigint comment 'The ID of the input project.'
   ,input_project_name              string comment 'The English name of the input project.'
   ,input_project_cn_name           string comment 'The Chinese name of the input project.'
   ,input_project_ns_tag            string comment 'The type of the input project. Valid values: BASE, PUBLIC, and APPLICATION.'
   ,output_project_id               bigint comment 'The ID of the output project.'
   ,output_project_name             string comment 'The English name of the output project.'
   ,output_project_cn_name          string comment 'The Chinese name of the output project.'
   ,output_project_ns_tag           string comment 'The type of the output project. Valid values: BASE, PUBLIC, and APPLICATION.'
   ,input_biz_unit_id               bigint comment 'The ID of the input business unit.'
   ,input_biz_unit_name             string comment 'The name of the input business unit.'
   ,output_biz_unit_id              bigint comment 'The ID of the output business unit.'
   ,output_biz_unit_name            string comment 'The name of the output business unit.'
   ,input_env                       string comment 'The input environment. Valid values: PROD and DEV.'
   ,output_env                      string comment 'The output environment. Valid values: PROD and DEV.'
   ,gmt_create                      string comment 'The time the record was created.'
   ,gmt_modified                    string comment 'The time the record was last modified.'
   ,input_data_source_id            bigint comment 'The ID of the input data source.'
   ,input_data_source_name          string comment 'The name of the input data source.'
   ,input_data_source_type          string comment 'The type of the input data source. For a list of valid values, see the data_source_type column in the dim_dataphin_data_source table.'
   ,input_db_name                   string comment 'The name of the input database.' 
   ,input_schema_name               string comment 'The name of the input schema. For schema-based data sources (e.g., PostgreSQL, Oracle, SQL Server, and Hologres), this field contains the schema name. For other data sources (e.g., MySQL, MaxCompute, DM, ClickHouse, SR), it contains the database or project name.'  
   ,output_data_source_id           bigint comment 'The ID of the output data source.'
   ,output_data_source_name         string comment 'The name of the output data source.'
   ,output_data_source_type         string comment 'The type of the output data source. For a list of valid values, see the data_source_type column in the dim_dataphin_data_source table.'
   ,output_db_name                  string comment 'The name of the output database.' 
   ,output_schema_name              string comment 'The name of the output schema. For schema-based data sources (e.g., PostgreSQL, Oracle, SQL Server, and Hologres), this field contains the schema name. For other data sources (e.g., MySQL, MaxCompute, DM, ClickHouse, SR), it contains the database or project name.' 
)
comment 'Defines the data lineage between input and output tables.'
partitioned by (
    ds string comment 'The partition key, formatted as yyyymmdd.'
)
lifecycle 30;

Security

dim_dataphin_security_classify

create table if not exists dim_dataphin_security_classify(
  	tenant_id 												bigint comment 'Tenant ID.',
  	classify_id												bigint comment 'Classification ID.',
  	classify_name											string comment 'Classification name.',
  	classify_short_name								string comment 'Classification short name.',
  	classify_desc											string comment 'Classification description.',
 		creator_id                    		string comment 'Creator ID.',
		creator_name                  		string comment 'Creator name.',
		modifier_id                   		string comment 'Last modifier ID.',
  	modifier_name                 		string comment 'Last modifier name.',
		gmt_create												string comment 'Creation time.',
		gmt_modified											string comment 'Modification time.',
    status				        						string comment 'Status. Valid values: ENABLE, DISABLE.',
    level_id													bigint comment 'Sensitivity level ID.',
  	level_name												string comment 'Sensitivity level name.',
  	level_index												bigint comment 'Sensitivity level index.',
    parent_path                 			string comment 'Parent classification path.'
)comment 'Security data classification table.'
partitioned by (ds string comment 'Partition field. Format: yyyymmdd.') 
lifecycle 30;

dim_dataphin_security_level

create table if not exists dim_dataphin_security_level(
  	tenant_id 												bigint comment 'tenant id',
  	level_id													bigint comment 'level id',
  	level_name												string comment 'level name',
  	level_short_name									string comment 'level short name',
  	level_desc												string comment 'level description',
  	level_index												bigint comment 'level index',
 		creator_id                    		string comment 'creator id',
		creator_name                  		string comment 'creator name',
		modifier_id                   		string comment 'last modifier id',
  	modifier_name                 		string comment 'last modifier name',
		gmt_create												string comment 'creation time',
		gmt_modified											string comment 'modification time'
)comment 'Stores data security levels'
partitioned by (ds string comment 'Partition key in yyyymmdd format.') 
lifecycle 30;

dim_dataphin_security_identify_rule

create table if not exists dim_dataphin_security_identify_rule(
  	tenant_id 												bigint comment 'The tenant ID.',
 		identify_rule_id             			bigint comment 'The ID of the identification rule.',
    identify_rule_name                string comment 'The name of the identification rule.',
    biz_unit_scope_type          			string comment 'Deprecated in v3.11. The scope of the business unit. Valid values: ALL, ENUMS, and EXPRESSION.',
    biz_unit_ids                 			string comment 'Deprecated in v3.11. The IDs of the enumerated business units.',
    custom_biz_unit_expression   			string comment 'Deprecated in v3.11. The regular expression for matching business unit names.',
    project_scope_type           			string comment 'Deprecated in v3.11. The scope of the project. Valid values: ALL, ENUMS, and EXPRESSION.',
    project_ids                  			string comment 'Deprecated in v3.11. The IDs of the enumerated projects.',
    custom_project_expression    			string comment 'Deprecated in v3.11. The regular expression for matching project names.',
    table_scope_type             			string comment 'Deprecated in v3.11. The scope of the table. Valid values: ALL, ENUMS, and EXPRESSION.',
    custom_table_expression      			string comment 'Deprecated in v3.11. The regular expression for matching table names.',
    enable_by_content            			string comment 'Deprecated in v3.11. Whether content identification is enabled.',
    content_method_name          			string comment 'Deprecated in v3.11. The name of the content identification method.',
    custom_content_expression    			string comment 'Deprecated in v3.11. The custom regular expression for content identification.',
    enable_by_field_name         			string comment 'Deprecated in v3.11. Whether field name identification is enabled.',
    custom_field_name_expression 			string comment 'Deprecated in v3.11. The custom regular expression for field name identification.',
  	priority_level										bigint comment 'The priority.',
  	status								    				string comment 'The status. Valid values: ENABLE and DISABLE.',
  	level_id													bigint comment 'Deprecated in v3.11. The ID of the data level.',
  	level_name												string comment 'Deprecated in v3.11. The name of the data level.',
  	level_index												bigint comment 'Deprecated in v3.11. The index of the data level.',
 		classify_id												bigint comment 'The ID of the data classification.',
  	classify_name											string comment 'The name of the data classification.',
 		owner_id                    			string comment 'The ID of the owner.',
		owner_name                  			string comment 'The name of the owner.',
		creator_id                    		string comment 'The ID of the creator.',
		creator_name                  		string comment 'The name of the creator.',
		modifier_id                   		string comment 'The ID of the modifier.',
  	modifier_name                 		string comment 'The name of the modifier.',
		gmt_create							    			string comment 'The creation time.',
		gmt_modified						    			string comment 'The modification time.',
    classify_ids											string comment 'A comma-separated list of data classification IDs.'
)comment 'This table stores security identification rules.'
partitioned by (ds string comment 'The partition field in yyyymmdd format.') 
lifecycle 30;

dim_dataphin_security_label

create table if not exists dim_dataphin_security_label (
    tenant_id           bigint comment 'The tenant ID.'
   ,label_id            string comment 'The security label ID.'
   ,identify_rule_id    string comment 'The identification rule ID.'
   ,identify_rule_name  string comment 'The identification rule name.'
   ,identify_rate       double comment 'The confidence rate of the identification.'
   ,identify_is_custom  string comment 'Indicates whether the identification rule is custom. Valid values: `TRUE`, `FALSE`.'
   ,biz_unit_id         bigint comment 'The business unit ID.'
   ,biz_unit_name       string comment 'The business unit name.'
   ,project_id          bigint comment 'The project ID.'
   ,project_name        string comment 'The project name.'
   ,table_id            string comment 'The table ID, which corresponds to the `table_id` in the `dim_dataphin_table` table.'
   ,table_catalog       string comment 'The table catalog, such as the project or business unit name.'
   ,table_name          string comment 'The table name.'
   ,table_desc          string comment 'The table description.'
   ,table_partitioned   string comment 'Indicates whether the table is partitioned. Valid values: `TRUE`, `FALSE`.'
   ,table_type          string comment 'The table type. Valid values: `LOGIC_TABLE` (logical table), `LOGIC_DIM_TABLE` (dimension logical table), `LOGIC_FACT_TABLE` (fact logical table), `LOGIC_SUM_TABLE` (summary logical table), `PHYSICAL_TABLE` (physical table), and `REALTIME_LOGICAL_TABLE` (real-time logical table).'
   ,table_env           string comment 'The table environment. Valid values: `PROD`, `DEV`.'
   ,field_id            string comment 'The field ID, which corresponds to the `column_id` in the `dim_dataphin_table` table.'
   ,field_name          string comment 'The field name.'
   ,field_desc          string comment 'The field description.'
   ,biz_date            string comment 'The business date.'
   ,status              string comment 'The status. Valid values: `ENABLE` (enabled), `DISABLE` (disabled).'
   ,level_id            bigint comment 'The security level ID.'
   ,level_name          string comment 'The security level name.'
   ,level_index         bigint comment 'The index of the security level.'
   ,classify_id         bigint comment 'The data category ID.'
   ,classify_name       string comment 'The data category name.'
   ,creator_id          string comment 'The creator ID.'
   ,creator_name        string comment 'The creator name.'
   ,modifier_id         string comment 'The ID of the last modifier.'
   ,modifier_name       string comment 'The name of the last modifier.'
   ,gmt_create          string comment 'The creation time.'
   ,gmt_modified        string comment 'The last modification time.'
   ,classfy_parent_path string comment 'The full path of the data category.'
   ,classfy_level1_path string comment 'The name of the level-1 category.'
   ,record_from         string comment 'The source of the security label. Valid values: `CUSTOM` (manually specified), `AUTO_SCAN` (automatically identified), `AUTO_INHERIT` (automatically inherited).'
   ,field_from          string comment 'A comma-separated list of source field names. This field is populated only when `record_from` is `AUTO_INHERIT`.'
   ,data_source_id      bigint comment 'The data source ID.'
   ,datasource_name     string comment 'The data source name.'
   ,data_source_type    string comment 'The data source type. For valid values, see the `data_source_type` field in the `dim_dataphin_data_source` table.'
)
comment 'Stores the security labels and categories applied to tables and fields by identification scans.'
partitioned by (
    ds string comment 'The partition field, in `yyyymmdd` format.'
)
lifecycle 30;

dim_dataphin_security_desensitize_rule

create table if not exists dim_dataphin_security_desensitize_rule(
  	tenant_id 												bigint comment 'Tenant ID',
  	desensitize_rule_id								bigint comment 'Desensitization rule ID',
  	desensitize_rule_name							string comment 'Desensitization rule name',
 		identify_rule_id             			bigint comment '[Deprecated in v3.11] Identification rule ID',
    identify_rule_name                string comment '[Deprecated in v3.11] Identification rule name',
 		enable_by_temp_query       				string comment 'Enables desensitization for ad-hoc queries.',
    enable_by_read_write       				string comment 'Enables desensitization when writing to development tables.',
    algorithm_type             				string comment 'Algorithm type',
    algorithm_code             				string comment 'Algorithm code',
    rule_method_text_param     				string comment 'Text parameters for the algorithm.',
    rule_method_selected_param 				string comment 'Secret key parameters for the algorithm.',
  	status														string comment 'Status. Valid values: ENABLE, DISABLE.',
 		owner_id                    			string comment 'Owner ID',
		owner_name                  			string comment 'Owner name',
		creator_id                    		string comment 'Creator ID',
		creator_name                  		string comment 'Creator name',
		modifier_id                   		string comment 'Modifier ID',
  	modifier_name                 		string comment 'Modifier name',
		gmt_create												string comment 'Creation time',
		gmt_modified											string comment 'Modification time',
  	classify_id					        			bigint comment 'Classification ID',
  	classify_name				        			string comment 'Classification name'
)comment 'Security desensitization rules.'
partitioned by (ds string comment 'Partition field, yyyymmdd format.') 
lifecycle 30;

dim_dataphin_security_desensitize_rule_allowlist

create table if not exists dim_dataphin_security_desensitize_rule_allowlist(
  	tenant_id 												bigint comment 'The ID of the tenant.',
  	desensitize_allowlist_rule_id		    bigint comment 'The ID of the data desensitization allowlist rule.',
 		identify_rule_id             			bigint comment 'Deprecated in v3.11. The ID of the identification rule.',
    identify_rule_name                string comment 'Deprecated in v3.11. The name of the identification rule.',
 		filter_account                 		string comment 'Accounts exempt from data desensitization.', 
    filter_by_temp_query           		string comment 'Whether the rule applies to ad-hoc queries.', 
    filter_by_read_write           		string comment 'Whether the rule applies to data development (read/write operations).', 
  	biz_unit_scope_type          			string comment 'The scope type for business units. Valid values: ALL, ENUMS, and EXPRESSION.',
    biz_unit_ids                 			string comment 'Deprecated in v3.11. The IDs of the enumerated business units.',
    custom_biz_unit_expression   			string comment 'The regular expression to match business unit names.',
    project_scope_type           			string comment 'The scope type for projects. Valid values: ALL, ENUMS, and EXPRESSION.',
    project_ids                  			string comment 'Deprecated in v3.11. The IDs of the enumerated projects.',
    custom_project_expression    			string comment 'The regular expression to match project names.', 
    table_scope_type             			string comment 'The scope type for tables. Valid values: ALL, ENUMS, and EXPRESSION.',
    custom_table_expression      			string comment 'Deprecated in v3.11. The regular expression to match table names.',
    start_time                     		string comment 'Effective start time.', 
    end_time                       		string comment 'Effective end time.',
  	identify_rule_exists             	string comment 'Whether the identification rule exists.', 
  	status														string comment 'The status of the rule. Valid values: ENABLE (enabled) and DISABLE (disabled).',
 		owner_id                    			string comment 'The ID of the owner.',
		owner_name                  			string comment 'The name of the owner.',
		creator_id                    		string comment 'The ID of the creator.',
		creator_name                  		string comment 'The name of the creator.',
		modifier_id                   		string comment 'The ID of the last modifier.',
  	modifier_name                 		string comment 'The name of the last modifier.',
		gmt_create												string comment 'The creation time.',
		gmt_modified											string comment 'The last modification time.',
  	classify_id					        			bigint comment 'The ID of the classification.',
  	classify_name				        			string comment 'The name of the classification.'
)comment 'Stores the allowlist rules for data desensitization.'
partitioned by (ds string comment 'The partition key, in yyyymmdd format.') 
lifecycle 30;

Quality

dim_dataphin_dp_quality_watch

Note

Field descriptions:

  • table_id

    • If watch_type=TABLE and table_type in (PHYSICAL_TABLE,LOGIC_FACT_TABLE,LOGIC_DIM_TABLE,LOGIC_SUM_TABLE, REALTIME_LOGICAL_TABLE) or watch_type =REALTIME_LOGICAL_TABLE, this is the GUID from the asset inventory.

    • If watch_type=DATASOURCE_TABLE, this is an internal identifier with no specific meaning.

    • If watch_type in (DATASOURCE,INDEX), you can ignore this field.

  • index_id

    If watch_type=INDEX, this is the derived_index_id from the dim_dataphin_derived_index table.

  • data_source_id

    If watch_type=DATASOURCE, this is the data_source_id from the dim_dataphin_data_source table.

create table if not exists dim_dataphin_dp_quality_watch
(
    tenant_id                   bigint comment 'The tenant ID.',
    watch_id                    bigint comment 'The watch object ID.',
    watch_type                  string comment 'The watch object type. Valid values: `INDEX` (index), `TABLE` (Dataphin table), `REALTIME_LOGICAL_TABLE` (real-time logical table), `DATASOURCE` (data source), and `DATASOURCE_TABLE` (global table).',
    table_id                    string comment 'The table ID. If `watch_type` is `TABLE` and `table_type` is `PHYSICAL_TABLE`, this ID references the `dim_dataphin_table` table. If `watch_type` is `TABLE` or `INDEX` and `is_logical_table` is `Y`, this ID references the `dim_dataphin_model` table.',
    table_type                  string comment 'The table type. Valid values include `LOGIC_DIM_TABLE` (dimension logical table), `LOGIC_FACT_TABLE` (fact logical table), `LOGIC_SUM_TABLE` (summary logical table), `LOGIC_LABEL_TABLE` (label logical table), `PHYSICAL_TABLE` (physical table), and `REALTIME_LOGICAL_TABLE` (real-time logical table).',
    is_logical_table            string comment 'Indicates whether the table is a logical table. Valid values: `Y` and `N`.',
    table_env                   string comment 'The table environment.',
    table_name                  string comment 'The table name.',
    table_catalog               string comment 'The table catalog. If `watch_type` is `TABLE` and `table_type` is `PHYSICAL_TABLE`, this contains the project name. If `watch_type` is `TABLE` or `INDEX` and `is_logical_table` is `Y`, this contains the business unit name.',
    table_description           string comment 'The table description.',
    table_is_partitioned        string comment 'Indicates whether the table is a partitioned table. Valid values: `Y` and `N`.',
    table_owner_id              string comment 'The table owner''s ID.',
    table_owner_name            string comment 'The table owner''s name.',
    biz_unit_id                 bigint comment 'The ID of the table''s business unit.',
    biz_unit_name               string comment 'The name of the table''s business unit.',
    project_id                  bigint comment 'The ID of the table''s project.',
    project_name                string comment 'The name of the table''s project.',
    data_source_id              bigint comment 'The data source ID. References the `data_source_id` column in the `dim_dataphin_data_source` table.',
    data_source_type            string comment 'The data source type. For valid values, see the `data_source_type` column in the `dim_dataphin_data_source` table.',
    data_source_env             string comment 'The data source environment.',
    data_source_name            string comment 'The data source name.',
    data_source_scope           string comment 'The data source scope.',
    data_source_owner           string comment 'The data source owner.',
    data_source_from            string comment 'The data source origin.',
    index_id                    string comment 'The GUID of the derived index from the `dim_dataphin_derived_index` table when `watch_type` is `INDEX`.',
    index_name                  string comment 'The index name.',
    index_name_cn               string comment 'The Chinese name of the index.',
    index_env                   string comment 'The index environment.',
    index_owner                 string comment 'The index owner.',
    index_sum_logic_table_name  string comment 'The name of the index''s summary logical table.',
    index_desc                  string comment 'The index description.',
    index_compute_type          string comment 'The index computation type.',
    index_granularity_id        bigint comment 'The index granularity ID.',
    index_granularity_cn        string comment 'The Chinese name of the index granularity.',
    index_period_attribute      string comment 'The index period attribute.',
    index_catalog               string comment 'The index catalog, which is typically the business unit name.',
    quality_owner_id            string comment 'The quality owner''s ID.',
    quality_owner_name          string comment 'The quality owner''s name.',
    watch_status                string comment 'The watch object status. Valid values: `ENABLE` and `DISABLE`.',
    creator_id                  string comment 'The creator''s ID.',
    creator_name                string comment 'The creator''s name.',
    gmt_create                  string comment 'The creation time.',
    modifier_id                 string comment 'The modifier''s ID.',
    modifier_name               string comment 'The modifier''s name.',
    gmt_modified                string comment 'The modification time.'
) comment 'Stores metadata for quality watch objects.'
partitioned by (ds string comment 'The partition column.')
lifecycle 30;

Dataphin DP quality rule

Note

To retrieve details about monitored objects such as data sources, data source tables, or metrics, join the dim_dataphin_dp_quality_watch table on tenant_id and watch_id.

create table if not exists dim_dataphin_dp_quality_rule (
    tenant_id                    bigint comment 'The tenant ID.'
   ,rule_id                      bigint comment 'The rule ID.'
   ,rule_name                    string comment 'The rule name.'
   ,strength                     string comment 'Rule strength. Valid values: STRONG, WEAK.'
   ,rule_desc                    string comment 'Rule description.'
   ,rule_catalogs                string comment 'A comma-separated list of rule category tags. Valid values: CONSISTENT, EFFECTIVE, TIMELINESS, ACCURATE, UNIQUENESS, COMPLETENESS, STABILITY, CUSTOM.'
   ,template_id                  bigint comment 'The template ID.'
   ,template_type                string comment 'Template type.'
   ,template_name                string comment 'Template name.'
   ,watch_id                     bigint comment 'Monitoring object ID.'
   ,watch_type                   string comment 'Type of the monitoring object. Valid values: INDEX (metric), TABLE (Dataphin table), REALTIME_LOGICAL_TABLE (real-time logical table), DATASOURCE (data source), DATASOURCE_TABLE (table in a data source).'
   ,table_id                     string comment 'The table ID. If watch_type is TABLE and table_type is PHYSICAL_TABLE, this ID refers to the dim_dataphin_table. If watch_type is TABLE or INDEX and is_logical_table is ''Y'', this ID refers to the dim_dataphin_model.'
   ,table_type                   string comment 'Table type. Valid values: LOGIC_DIM_TABLE (logical dimension table), LOGIC_FACT_TABLE (logical fact table), LOGIC_SUM_TABLE (logical summary table), LOGIC_LABEL_TABLE (logical label table), PHYSICAL_TABLE (physical table), REALTIME_LOGICAL_TABLE (real-time logical table).'
   ,is_logical_table             string comment 'Indicates whether the table is a logical table. Valid values: Y, N.'
   ,table_env                    string comment 'Table environment. Valid values: PROD, DEV.'
   ,table_name                   string comment 'Table name.'
   ,table_catalog                string comment 'Table group. If watch_type is TABLE and table_type is PHYSICAL_TABLE, this is the project_name. If watch_type is TABLE or INDEX and is_logical_table is ''Y'', this is the biz_unit_name.'
   ,table_description            string comment 'Table description.'
   ,table_is_partitioned         string comment 'Indicates whether the table is partitioned. Valid values: Y, N.'
   ,biz_unit_id                  bigint comment 'ID of the business unit.'
   ,biz_unit_name                string comment 'Name of the business unit.'
   ,project_id                   bigint comment 'ID of the project.'
   ,project_name                 string comment 'Name of the project.'
   ,validate_object_type         string comment 'Type of the validation object. Valid values: TABLE (table-level rule, or rule for a table in a data source), COLUMN (column-level rule), DATASOURCE (data source-level rule), REALTIME (real-time rule), INDEX (metric-level rule), CHAIN (offline pipeline), UNKNOWN.'
   ,validate_object_name         string comment 'Name of the validation object.'
   ,rule_properties              string comment 'Key-value pairs for the dynamic template rule configuration.'
   ,enable_error_archive         string comment 'Indicates whether to enable error archiving. Valid values: Y, N.'
   ,validate_condition           string comment 'Validation condition.'
   ,rule_status                  string comment 'Rule status. Valid values: ENABLE, DISABLE.'
   ,schedule_ids                 string comment 'Comma-separated list of bound scheduling IDs.'
   ,alert_id                     bigint comment 'Alert setting ID.'
   ,alert_scope                  string comment 'Alert scope. Valid values: WATCH, RULE.'
   ,alert_users                  string comment 'IDs of users to be alerted, in JSON format.'
   ,alert_users_channels         string comment 'Channels for sending alerts to users, in JSON format. Valid values: VOICE (voice call), SMS (text message), MAIL (email), DINGTALK_ROBOT (DingTalk chatbot), DingTalk (DingTalk app notification).'
   ,alert_duties                 string comment 'On-call schedule for alerts.'
   ,alert_duties_channels        string comment 'Channels for sending alerts based on the on-call schedule. The values are the same as for alert_users_channels.'
   ,alert_quality_owner          string comment 'Indicates whether to alert the quality owner. Valid values: Y, N.'
   ,alert_quality_owner_channels string comment 'Channels for alerting the quality owner. The values are the same as for alert_users_channels.'
   ,creator_id                   string comment 'Creator ID.'
   ,creator_name                 string comment 'Creator name.'
   ,gmt_create                   string comment 'Creation time.'
   ,modifier_id                  string comment 'ID of the last modifier.'
   ,modifier_name                string comment 'Name of the last modifier.'
   ,gmt_modified                 string comment 'Time of last modification.'
   ,tag                          string comment 'Business tag. Valid values: DATA_STANDARD_MANUAL, DATA_STANDARD_AUTO, PIPELINE.'
   ,tag_unique_key               string comment 'Unique identifier for the business tag.'
   ,tag_properties               string comment 'Additional properties for the business tag, in JSON format.'
   ,archive_mode                 string comment 'Archiving mode. The only valid value is ERROR.'
   ,archive_store_type           string comment 'Archive storage type. The only valid value is FILE_SYSTEM.'
   ,archive_table_name           string comment 'Name of the archive table.',
    data_source_id               bigint comment 'Data source ID.',
    data_source_type             string comment 'Data source type.',
    data_source_name             string comment 'Data source name.',
    attribute_config             string comment 'Key-value pairs for business attributes. Pairs are separated by semicolons (;), and keys are separated from values by equal signs (=). Allowed values include custom text, single-select, and multi-select options, but not ranges.'
)
comment 'Stores data quality rules defined in Dataphin.'
partitioned by (
    ds string comment 'Partition key.'
)
lifecycle 30;

fct_dataphin_dp_quality_rule_task_di

Note

Starting with version 4.3, all validation summary metrics are stored in the execute_context field. A single rule execution record can now contain multiple validation summary metrics.

Summary Statistics

TOTAL_NUMBER: Total row count
NORMAL_NUMBER: Normal row count
NORMAL_RATE: Normal rate (%)
ERROR_NUMBER: Error row count
ERROR_RATE: Error rate (%)
ERROR_DATA: Error data
ERROR_DATA_FULL: Full error record
ERROR_DATA_ARCHIVE_TABLE: Error data archive table
STATISTICAL_VALUE: Statistic
STATISTICAL_DIFF_VALUE: Statistical difference
STATISTICAL_DIFF_RATE: Statistical difference rate (%)
STATISTICAL_VALIDATE_VALUE: Verification field statistic
STATISTICAL_COMPARE_VALUE: Comparison field statistic
ONE_DAY_FLUCTUATE: 1-day fluctuation rate (%)
SEVEN_DAY_FLUCTUATE: 7-day fluctuation rate (%)
THIRTY_DAY_FLUCTUATE: 30-day fluctuation rate (%)
SEVEN_DAYS_AVG_FLUCTUATE: 7-day average fluctuation rate (%)
THIRTY_DAYS_AVG_FLUCTUATE: 30-day average fluctuation rate (%)
FIRST_DAY_OF_MONTH_FLUCTUATE: Fluctuation rate vs. first day of the month (%)
LAST_MONTH_FLUCTUATE: Month-over-month fluctuation rate (%)
LAST_YEAR_FLUCTUATE: Year-over-year fluctuation rate (%)
DATASOURCE_CONNECTIVE: Data source connectivity
TABLE_SCHEMA_CHANGED: Table schema change check
REALTIME_COMPARE_CHAIN_1_DIFF: Chain 1 deviation rate (%)
REALTIME_COMPARE_CHAIN_2_DIFF: Chain 2 deviation rate (%)
REALTIME_COMPARE_CHAIN_1: Real-time chain 1 metric
REALTIME_COMPARE_CHAIN_2: Real-time chain 2 metric
REALTIME_OFFLINE_CHAIN_DIFF: Offline chain deviation rate (%)
REALTIME_STATISTICAL_VALUE: Real-time statistic
REALTIME_ONE_DAY_FLUCTUATE: Real-time 1-day fluctuation rate (%)
REALTIME_SEVEN_DAY_FLUCTUATE: Real-time 7-day fluctuation rate (%)
REALTIME_THIRTY_DAY_FLUCTUATE: Real-time 30-day fluctuation rate (%)
CUSTOM_STATISTICAL_SQL: Custom statistical SQL
CUSTOM_STATISTICAL_SQL_TODAY: Custom metric SQL (today)
CUSTOM_STATISTICAL_SQL_YESTERDAY: Custom metric SQL (yesterday)
CUSTOM_STATISTICAL_SQL_SEVEN_DAY: Custom metric SQL (7-day)
CUSTOM_STATISTICAL_SQL_SEVEN_DAYS_AVG: Custom metric SQL (7-day average)
CUSTOM_STATISTICAL_SQL_THIRTY_DAY: Custom metric SQL (30-day)
CUSTOM_STATISTICAL_SQL_THIRTY_DAYS_AVG: Custom metric SQL (30-day average)
CUSTOM_STATISTICAL_SQL_LAST_MONTH: Custom metric SQL (last month)
CUSTOM_STATISTICAL_SQL_LAST_YEAR: Custom metric SQL (last year)
TABLE_SIZE: Table size (bytes)
TABLE_SIZE_ONE_DAY: Table size (yesterday) (bytes)
TABLE_SIZE_SEVEN_DAY: Table size (7 days ago) (bytes)
TABLE_SIZE_SEVEN_DAYS_AVG: 7-day average table size (bytes)
TABLE_SIZE_THIRTY_DAY: Table size (30 days ago) (bytes)
TABLE_SIZE_THIRTY_DAYS_AVG: 30-day average table size (bytes)
TABLE_SIZE_LAST_MONTH: Last month's table size (bytes)
TABLE_SIZE_LAST_YEAR: Last year's table size (bytes)
TABLE_SIZE_FIRST_DAY_OF_MONTH: Table size (first day of month) (bytes)
TABLE_LINE_COUNT: Table row count
TABLE_LINE_COUNT_ONE_DAY: Table row count (yesterday)
TABLE_LINE_COUNT_SEVEN_DAY: Table row count (7 days ago)
TABLE_LINE_COUNT_SEVEN_DAYS_AVG: 7-day average table row count
TABLE_LINE_COUNT_THIRTY_DAY: Table row count (30 days ago)
TABLE_LINE_COUNT_THIRTY_DAYS_AVG: 30-day average table row count
TABLE_LINE_COUNT_LAST_MONTH: Last month's table row count
TABLE_LINE_COUNT_LAST_YEAR: Last year's table row count
TABLE_LINE_COUNT_FIRST_DAY_OF_MONTH: Table row count (first day of month)
TABLE_PARTITION_COUNT: Table partition count
TABLE_PARTITION_COUNT_ONE_DAY: Table partition count (yesterday)
TABLE_PARTITION_COUNT_SEVEN_DAY: Table partition count (7 days ago)
TABLE_PARTITION_COUNT_SEVEN_DAYS_AVG: 7-day average table partition count
TABLE_PARTITION_COUNT_THIRTY_DAY: Table partition count (30 days ago)
TABLE_PARTITION_COUNT_THIRTY_DAYS_AVG: 30-day average table partition count
TABLE_PARTITION_COUNT_LAST_MONTH: Last month's table partition count
TABLE_PARTITION_COUNT_LAST_YEAR: Last year's table partition count
TABLE_PARTITION_COUNT_FIRST_DAY_OF_MONTH: Table partition count (first day of month)
PARTITION_SIZE: Partition size (bytes)
PARTITION_SIZE_ONE_DAY: Partition size (yesterday) (bytes)
PARTITION_SIZE_SEVEN_DAY: Partition size (7 days ago) (bytes)
PARTITION_SIZE_SEVEN_DAYS_AVG: 7-day average partition size (bytes)
PARTITION_SIZE_THIRTY_DAY: Partition size (30 days ago) (bytes)
PARTITION_SIZE_THIRTY_DAYS_AVG: 30-day average partition size (bytes)
PARTITION_SIZE_LAST_MONTH: Last month's partition size (bytes)
PARTITION_SIZE_LAST_YEAR: Last year's partition size (bytes)
PARTITION_SIZE_FIRST_DAY_OF_MONTH: Partition size (first day of month) (bytes)
PARTITION_LINE_COUNT: Partition row count
PARTITION_LINE_COUNT_ONE_DAY: Partition row count (yesterday)
PARTITION_LINE_COUNT_SEVEN_DAY: Partition row count (7 days ago)
PARTITION_LINE_COUNT_SEVEN_DAYS_AVG: 7-day average partition row count
PARTITION_LINE_COUNT_THIRTY_DAY: Partition row count (30 days ago)
PARTITION_LINE_COUNT_THIRTY_DAYS_AVG: 30-day average partition row count
PARTITION_LINE_COUNT_LAST_MONTH: Last month's partition row count
PARTITION_LINE_COUNT_LAST_YEAR: Last year's partition row count
PARTITION_LINE_COUNT_FIRST_DAY_OF_MONTH: Partition row count (first day of month)
FIELD_AVG: Field average
FIELD_AVG_ONE_DAY: Field average (yesterday)
FIELD_AVG_SEVEN_DAY: Field average (7 days ago)
FIELD_AVG_SEVEN_DAYS_AVG: 7-day average of field average
FIELD_AVG_THIRTY_DAY: Field average (30 days ago)
FIELD_AVG_THIRTY_DAYS_AVG: 30-day average of field average
FIELD_AVG_LAST_MONTH: Last month's field average
FIELD_AVG_LAST_YEAR: Last year's field average
FIELD_AVG_FIRST_DAY_OF_MONTH: Field average (first day of month)
FIELD_MAX: Field maximum
FIELD_MAX_ONE_DAY: Field maximum (yesterday)
FIELD_MAX_SEVEN_DAY: Field maximum (7 days ago)
FIELD_MAX_SEVEN_DAYS_AVG: 7-day average of field maximum
FIELD_MAX_THIRTY_DAY: Field maximum (30 days ago)
FIELD_MAX_THIRTY_DAYS_AVG: 30-day average of field maximum
FIELD_MAX_LAST_MONTH: Last month's field maximum
FIELD_MAX_LAST_YEAR: Last year's field maximum
FIELD_MAX_FIRST_DAY_OF_MONTH: Field maximum (first day of month)
FIELD_MIN: Field minimum
FIELD_MIN_ONE_DAY: Field minimum (yesterday)
FIELD_MIN_SEVEN_DAY: Field minimum (7 days ago)
FIELD_MIN_SEVEN_DAYS_AVG: 7-day average of field minimum
FIELD_MIN_THIRTY_DAY: Field minimum (30 days ago)
FIELD_MIN_THIRTY_DAYS_AVG: 30-day average of field minimum
FIELD_MIN_LAST_MONTH: Last month's field minimum
FIELD_MIN_LAST_YEAR: Last year's field minimum
FIELD_MIN_FIRST_DAY_OF_MONTH: Field minimum (first day of month)
FIELD_SUM: Field sum
FIELD_SUM_ONE_DAY: Field sum (yesterday)
FIELD_SUM_SEVEN_DAY: Field sum (7 days ago)
FIELD_SUM_SEVEN_DAYS_AVG: 7-day average of field sum
FIELD_SUM_THIRTY_DAY: Field sum (30 days ago)
FIELD_SUM_THIRTY_DAYS_AVG: 30-day average of field sum
FIELD_SUM_LAST_MONTH: Last month's field sum
FIELD_SUM_LAST_YEAR: Last year's field sum
FIELD_SUM_FIRST_DAY_OF_MONTH: Field sum (first day of month)
FIELD_COUNT: Field count
FIELD_COUNT_ONE_DAY: Field count (yesterday)
FIELD_COUNT_SEVEN_DAY: Field count (7 days ago)
FIELD_COUNT_SEVEN_DAYS_AVG: 7-day average of field count
FIELD_COUNT_THIRTY_DAY: Field count (30 days ago)
FIELD_COUNT_THIRTY_DAYS_AVG: 30-day average of field count
FIELD_COUNT_LAST_MONTH: Last month's field count
FIELD_COUNT_LAST_YEAR: Last year's field count
FIELD_COUNT_FIRST_DAY_OF_MONTH: Field count (first day of month)
FIELD_DISTINCT_COUNT: Field distinct count
FIELD_DISTINCT_COUNT_ONE_DAY: Field distinct count (yesterday)
FIELD_DISTINCT_COUNT_SEVEN_DAY: Field distinct count (7 days ago)
FIELD_DISTINCT_COUNT_SEVEN_DAYS_AVG: 7-day average of field distinct count
FIELD_DISTINCT_COUNT_THIRTY_DAY: Field distinct count (30 days ago)
FIELD_DISTINCT_COUNT_THIRTY_DAYS_AVG: 30-day average of field distinct count
FIELD_DISTINCT_COUNT_LAST_MONTH: Last month's field distinct count
FIELD_DISTINCT_COUNT_LAST_YEAR: Last year's field distinct count
FIELD_DISTINCT_COUNT_FIRST_DAY_OF_MONTH: Field distinct count (first day of month)
FIELD_GROUP_COUNT: Field group count
FIELD_GROUP_COUNT_ONE_DAY: Field group count (yesterday)
FIELD_GROUP_COUNT_SEVEN_DAY: Field group count (7 days ago)
FIELD_GROUP_COUNT_SEVEN_DAYS_AVG: 7-day average of field group count
FIELD_GROUP_COUNT_THIRTY_DAY: Field group count (30 days ago)
FIELD_GROUP_COUNT_THIRTY_DAYS_AVG: 30-day average of field group count
FIELD_GROUP_COUNT_LAST_MONTH: Last month's field group count
FIELD_GROUP_COUNT_LAST_YEAR: Last year's field group count
FIELD_GROUP_COUNT_FIRST_DAY_OF_MONTH: Field group count (first day of month)
FIELD_NULL_COUNT: Field null count
FIELD_NULL_COUNT_ONE_DAY: Field null count (yesterday)
FIELD_NULL_COUNT_SEVEN_DAY: Field null count (7 days ago)
FIELD_NULL_COUNT_SEVEN_DAYS_AVG: 7-day average of field null count
FIELD_NULL_COUNT_THIRTY_DAY: Field null count (30 days ago)
FIELD_NULL_COUNT_THIRTY_DAYS_AVG: 30-day average of field null count
FIELD_NULL_COUNT_LAST_MONTH: Last month's field null count
FIELD_NULL_COUNT_LAST_YEAR: Last year's field null count
FIELD_NULL_COUNT_FIRST_DAY_OF_MONTH: Field null count (first day of month)
FIELD_NULL_RATE: Field null rate (%)
FIELD_NULL_RATE_ONE_DAY: Field null rate (yesterday) (%)
FIELD_NULL_RATE_SEVEN_DAY: Field null rate (7 days ago) (%)
FIELD_NULL_RATE_SEVEN_DAYS_AVG: 7-day average of field null rate (%)
FIELD_NULL_RATE_THIRTY_DAY: Field null rate (30 days ago) (%)
FIELD_NULL_RATE_THIRTY_DAYS_AVG: 30-day average of field null rate (%)
FIELD_NULL_RATE_LAST_MONTH: Last month's field null rate (%)
FIELD_NULL_RATE_LAST_YEAR: Last year's field null rate (%)
FIELD_NULL_RATE_FIRST_DAY_OF_MONTH: Field null rate (first day of month) (%)
FIELD_DUPLICATE_COUNT: Field duplicate row count
FIELD_DUPLICATE_COUNT_ONE_DAY: Field duplicate row count (yesterday)
FIELD_DUPLICATE_COUNT_SEVEN_DAY: Field duplicate row count (7 days ago)
FIELD_DUPLICATE_COUNT_SEVEN_DAYS_AVG: 7-day average of field duplicate row count
FIELD_DUPLICATE_COUNT_THIRTY_DAY: Field duplicate row count (30 days ago)
FIELD_DUPLICATE_COUNT_THIRTY_DAYS_AVG: 30-day average of field duplicate row count
FIELD_DUPLICATE_COUNT_LAST_MONTH: Last month's field duplicate row count
FIELD_DUPLICATE_COUNT_LAST_YEAR: Last year's field duplicate row count
FIELD_DUPLICATE_COUNT_FIRST_DAY_OF_MONTH: Field duplicate row count (first day of month)
FIELD_DUPLICATE_RATE: Field duplicate row rate (%)
FIELD_DUPLICATE_RATE_ONE_DAY: Field duplicate row rate (yesterday) (%)
FIELD_DUPLICATE_RATE_SEVEN_DAY: Field duplicate row rate (7 days ago) (%)
FIELD_DUPLICATE_RATE_SEVEN_DAYS_AVG: 7-day average of field duplicate row rate (%)
FIELD_DUPLICATE_RATE_THIRTY_DAY: Field duplicate row rate (30 days ago) (%)
FIELD_DUPLICATE_RATE_THIRTY_DAYS_AVG: 30-day average of field duplicate row rate (%)
FIELD_DUPLICATE_RATE_LAST_MONTH: Last month's field duplicate row rate (%)
FIELD_DUPLICATE_RATE_LAST_YEAR: Last year's field duplicate row rate (%)
FIELD_DUPLICATE_RATE_FIRST_DAY_OF_MONTH: Field duplicate row rate (first day of month) (%)
VALUE: Original value
COUNT: Count
DISTINCT: Distinct count
SUM: Sum
MAX: Maximum
MIN: Minimum
AVG: Average
FIELD_VALIDATE_FIELDS: Verification field list
CODE_TABLE_DATA: Code table data
METRIC_N_DAY: N-day statistic

Summary statistics metric types

SINGLE_TABLE_STATISTICAL_DIFF_VALUE: Single-table statistical difference
SINGLE_TABLE_STATISTICAL_DIFF_RATE: Single-table statistical difference rate (%)
DOUBLE_TABLE_STATISTICAL_DIFF_VALUE: Two-table statistical difference
DOUBLE_TABLE_STATISTICAL_VALIDATE_VALUE: Validation field value
DOUBLE_TABLE_STATISTICAL_COMPARE_VALUE: Comparison field value
SINGLE_TABLE_STATISTICAL_VALIDATE_VALUE: Validation field value
SINGLE_TABLE_STATISTICAL_COMPARE_VALUE: Comparison field value
DOUBLE_TABLE_STATISTICAL_DIFF_RATE: Two-table statistical difference rate (%)
ERROR_NUMBER: Error count
NORMAL_NUMBER: Success count
ERROR_RATE: Error rate
NORMAL_RATE: Success rate
STATISTICAL_VALUE: Statistical value
ONE_DAY_FLUCTUATE: 1-day fluctuation (%)
SEVEN_DAY_FLUCTUATE: 7-day fluctuation (%)
THIRTY_DAY_FLUCTUATE: 30-day fluctuation (%)
LAST_MONTH_FLUCTUATE: Month-over-month fluctuation (%)
LAST_YEAR_FLUCTUATE: Year-over-year fluctuation (%)
DATASOURCE_CONNECTIVE: Data source connectivity
TABLE_SCHEMA_CHANGED: Table schema change
CUSTOM_STATISTICAL_SQL: Custom statistical SQL
TABLE_SIZE: Table size
TABLE_LINE_COUNT: Table row count
TABLE_PARTITION_COUNT: Table partition count
PARTITION_SIZE: Partition size
PARTITION_LINE_COUNT: Partition row count
FIELD_NULL_COUNT: Field null count
FIELD_NULL_RATE: Field null rate (%)
FIELD_AVG: Field average
FIELD_MAX: Field maximum
FIELD_MIN: Field minimum
FIELD_SUM: Field sum
FIELD_COUNT: Field count
FIELD_DISTINCT_COUNT: Field distinct count
FIELD_GROUP_COUNT: Field group count
FIELD_DUPLICATE_COUNT: Field duplicate count
FIELD_DUPLICATE_RATE: Field duplicate rate
CODE_TABLE_COMPARE_TOTAL_NUMBER: Code table comparison: Total rows
CODE_TABLE_COMPARE_ERROR_NUMBER: Code table comparison: Error rows
ARCHIVE_TABLE_DDL: Archive table DDL
ARCHIVE_TABLE_DQL: Archive table DQL

Getting aggregated statistical metrics

select
tenant_id
  ,rule_task_id
  ,rule_id
  ,watch_id
  ,metric_type
  ,metric_name
  ,bizdate
  ,metric_value
  ,bizdate_format
from
(select 
  tenant_id
  ,rule_task_id
  ,rule_id
  ,watch_id
  ,get_json_object(metric_value, '$.metricType') as metric_type
  ,get_json_object(metric_value, '$.metricName') as metric_name
  ,get_json_object(metric_value, '$.bizDate') as bizdate
  ,get_json_object(metric_value, '$.metricValue') as metric_value
  ,get_json_object(metric_value, '$.bizDateFormat') as bizdate_format
  ,row_number() over(partition by tenant_id,rule_task_id,rule_id,get_json_object(metric_value, '$.metricName'),get_json_object(metric_value, '$.bizDate')  order by gmt_create desc) as rank_num 
from ( 
    
    select
--regexp_replace(regexp_replace(execute_context,'\\\\r|\\\\t|\\\\v|\\\\f|\\\\n|\\\\u([[:xdigit:]]){4}|\\{\\\\"|\\\\"\\}|\\\\"',''),'"metricValue":NaN','"metricValue":"NaN"}'),
split(regexp_replace(regexp_replace(get_json_object(
regexp_replace(execute_context,'"metricValue":NaN','"metricValue":"NaN"')
    , '$.allMetricValues'), '\\[|\\]', ''),'(\\}(\t\r\v\f\n)*,)','}abcyugongcba'), 'abcyugongcba')  as all_metric_values 
,*
from fct_dataphin_dp_quality_rule_task_di
where ds='${bizdate}'
)  json_table
LATERAL VIEW explode(all_metric_values) exploded_table AS metric_value
)tt
where tt.rank_num = 1
;
create table if not exists fct_dataphin_dp_quality_rule_task_di
(
    tenant_id                   bigint comment 'Tenant ID.',
    rule_task_id                bigint comment 'Rule task ID.',
    watch_task_id               bigint comment 'Associated watch task ID.',
    watch_id                    bigint comment 'Monitored object ID.',
    watch_detail                string comment 'Monitored object details.',
    watch_type                  string comment 'Monitored object type.',
    watch_task_status           string comment 'Watch task status. Values: WAITING, RUNNING, SUCCESS, FAILED.',
    data_source_id              bigint comment 'Data source ID.',
    data_source_type            string comment 'Data source type.',
    data_source_scope           string comment 'Data source scope.',
    data_source_owner           string comment 'Data source owner.',
    table_catalog               string comment 'Table catalog.',
    table_name                  string comment 'Table name.',
    table_type                  string comment 'Table type.',
    table_owner                 string comment 'Table owner.',
    biz_unit_name               string comment 'Business unit name.',
    project_name                string comment 'Project name.',
    template_id                 bigint comment 'Rule template ID.',
    template_type               string comment 'Rule template type.',
    template_name               string comment 'Rule template name.',
    template                    string comment 'Rule template details.',
    rule_id                     bigint comment 'Rule ID.',
    rule_detail                 string comment 'Rule details.',
    rule_name                   string comment 'Rule name.',
    rule_strength               string comment 'Rule strength.',
    rule_catalogs               string comment 'Rule categories, separated by commas. Values: CONSISTENT, EFFECTIVE, TIMELINESS, ACCURATE, UNIQUENESS, COMPLETENESS, STABILITY, CUSTOM.',
    validate_object_type        string comment 'Validated object type.',
    validate_object_name        string comment 'Validated object name.',
    biz_date                    string comment 'Business date.',
    biz_date_format             string comment 'Business date format.',
    quality_owner_id            string comment 'Quality owner ID.',
    quality_owner_name          string comment 'Quality owner name.',
    rule_task_status            string comment 'Rule task status. Values: WAITING, RUNNING, SUCCESS, FAILED.',
    execute_context             string comment 'Rule execution context.',
    is_validate_result          string comment 'Whether the rule validation passed. Values: Y, N.',
    start_time                  string comment 'Start time.',
    end_time                    string comment 'End time.',
    error_code                  string comment 'Error code.',
    schedule_id                 bigint comment 'Schedule setting ID.',
    schedule_type               string comment 'Schedule setting type.',
    schedule_partition_values   string comment 'Calculated partition values for the schedule.',
    schedule_params             string comment 'Parameter values for the schedule.',
    trace_id                    string comment 'Trace ID. Format: type:id.',
    creator_id                  string comment 'Creator''s ID.',
    creator_name                string comment 'Creator''s name.',
    gmt_create                  string comment 'Creation time.',
    modifier_id                 string comment 'Last modifier''s ID.',
    modifier_name               string comment 'Last modifier''s name.',
    gmt_modified                string comment 'Last modification time.',
    tag                         string comment 'Business tag. Values: DATA_STANDARD_MANUAL, DATA_STANDARD_AUTO, PIPELINE.',
    tag_unique_key              string comment 'Unique key for the business tag.'
) comment 'Quality rule task table.'
partitioned by (ds string comment 'Partition field.')
lifecycle 90
;

dim_dataphin_dp_quality_watch_member

create table if not exists dim_dataphin_dp_quality_watch_member
(
    tenant_id                   bigint comment 'Tenant ID.',
    watch_id                    bigint comment 'Monitored object ID.',
    watch_type                  string comment 'Monitored object type. Valid values: INDEX (metric), TABLE (Dataphin table), REALTIME_LOGICAL_TABLE (real-time logical table), DATASOURCE (data source), and DATASOURCE_TABLE (data source table).',
    watch_status                string comment 'Status. Valid values: ENABLE and DISABLE.',
    user_id                     string comment 'User ID.',
    user_name                   string comment 'User name.',
    user_type                   string comment 'Member type. Fixed value: QUALITY_OWNER.',
    creator_id                  string comment 'Creator ID.',
    creator_name                string comment 'Creator name.',
    gmt_create                  string comment 'Creation time.',
    modifier_id                 string comment 'Last modifier ID.',
    modifier_name               string comment 'Last modifier name.',
    gmt_modified                string comment 'Last modification time.'
) comment 'Quality monitoring object owner table.'
partitioned by (ds string comment 'Partition column.')
lifecycle 30;

dim_dataphin_dp_quality_watch_alert

create table if not exists dim_dataphin_dp_quality_watch_alert
(
    tenant_id                   bigint comment 'Tenant ID.',
    alert_id                    bigint comment 'Alert setting ID. For configurations that apply to all rules. For rule-specific configurations, see the `dim_dataphin_dp_quality_rule` table.',
    alert_name                  string comment 'Alert name.',
    alert_scope                 string comment 'Alert scope. The value is `WATCH`.',
    alert_users                 string comment 'User IDs to alert, in JSON format.',
    alert_users_channels        string comment 'Channels for alerting users, in JSON format. Valid values: `VOICE`, `SMS`, `MAIL` (email), `DINGTALK_ROBOT` (DingTalk robot), and `DINGDING` (DingTalk work notification).',
    alert_duties                string comment 'On-call schedule to alert.',
    alert_duties_channels       string comment 'Channels for alerting the on-call schedule. For valid values, see `alert_users_channels`.',
    alert_quality_owner         string comment 'Specifies whether to alert the quality owner. Valid values: `Y`, `N`.',
    alert_quality_owner_channels string comment 'Channels for alerting the quality owner. For valid values, see `alert_users_channels`.',
    custom_alert_scope          string comment 'Custom alert scope in JSON format, containing the rule ID and name.',
    sort_index                  bigint comment 'Sort index.',
    watch_id                    bigint comment 'Monitored object ID.',
    watch_type                  string comment 'Type of the monitored object. Valid values: `INDEX` (metric), `TABLE` (Dataphin table), `REALTIME_LOGICAL_TABLE` (real-time logical table), `DATASOURCE` (data source), and `DATASOURCE_TABLE` (data source table).',
    table_id                    string comment 'Table ID. If `watch_type` is `TABLE` and `table_type` is `PHYSICAL_TABLE`, this ID is from `dim_dataphin_table`. If `watch_type` is `TABLE` or `INDEX` for a logical table, this ID is from `dim_dataphin_model`.',
    table_type                  string comment 'Table type. Valid values: `LOGIC_DIM_TABLE` (dimension logical), `LOGIC_FACT_TABLE` (fact logical), `LOGIC_SUM_TABLE` (summary logical), `LOGIC_LABEL_TABLE` (label logical), `PHYSICAL_TABLE` (physical), and `REALTIME_LOGICAL_TABLE` (real-time logical).',
    table_env                   string comment 'Table environment. Valid values: `PROD`, `DEV`.',
    table_name                  string comment 'Table name.',
    table_catalog               string comment 'Table group. If `watch_type` is `TABLE` and `table_type` is `PHYSICAL_TABLE`, this is the project name. If `watch_type` is `TABLE` or `INDEX` for a logical table, this is the business unit name.',
    creator_id                  string comment 'Creator ID.',
    creator_name                string comment 'Creator name.',
    gmt_create                  string comment 'Creation time.',
    modifier_id                 string comment 'Last modifier ID.',
    modifier_name               string comment 'Last modifier name.',
    gmt_modified                string comment 'Last modification time.'
) comment 'Alert configurations for quality monitoring objects.'
partitioned by (ds string comment 'Partition key.')
lifecycle 90
;

fct_dataphin_dp_quality_rule_score_di

create table if not exists fct_dataphin_dp_quality_rule_score_di
(
    id                          bigint comment 'The auto-incrementing ID.',
    tenant_id                   bigint comment 'The ID of the tenant.',
    rule_id                     bigint comment 'The ID of the rule.',
    rule_name                   string comment 'Rule name.',
    rule_catalogs               string comment 'Rule category tags.',
    rule_weight                 bigint comment 'The scoring weight for the monitored table.',
    rule_score                  double comment 'The rule quality score.',
    strength                    string comment 'Rule strength. Valid values: STRONG, WEAK.',
    template_id                 bigint comment 'The ID of the template.',
    template_type               string comment 'The template type.',
    template_name               string comment 'Template name.',
    template_cn_name            string comment 'Chinese name of the template.',
    watch_id                    bigint comment 'Monitored object ID.',
    watch_type                  string comment 'Type of the monitored object. Valid values: INDEX, TABLE, REALTIME_LOGICAL_TABLE, DATASOURCE, DATASOURCE_TABLE.',
    watch_weight                bigint comment 'The scoring weight for the rule.',
    table_id                    string comment 'Table ID. This is the ID in `dim_dataphin_table` if `watch_type` is `TABLE` and `table_type` is `PHYSICAL_TABLE`. This is the ID in `dim_dataphin_model` if `watch_type` is `TABLE` or `INDEX` and `is_logical_table` is `Y`.',
    table_type                  string comment 'Table type. Valid values: `LOGIC_DIM_TABLE`, `LOGIC_FACT_TABLE`, `LOGIC_SUM_TABLE`, `LOGIC_LABEL_TABLE`, `PHYSICAL_TABLE`, and `REALTIME_LOGICAL_TABLE`.',
    is_logical_table            string comment 'Indicates if the table is a logical table (Y/N).',
    table_env                   string comment 'Table environment. Valid values: PROD, DEV.',
    table_name                  string comment 'Table name.',
    table_catalog               string comment 'Table catalog. Project name if `watch_type` is `TABLE` and `table_type` is `PHYSICAL_TABLE`. Business unit name if `watch_type` is `TABLE` or `INDEX` and `is_logical_table` is `Y`.',
    biz_unit_id                 bigint comment 'ID of the business unit.',
    biz_unit_name               string comment 'Name of the business unit.',
    project_id                  bigint comment 'Project ID.',
    project_name                string comment 'Project name.',
    validate_object_type        string comment 'Validation object type. Valid values: TABLE, COLUMN, DATASOURCE, REALTIME, INDEX, CHAIN, and UNKNOWN.',
    validate_object_name        string comment 'Validation object name.',
    rule_status                 string comment 'Rule status. Valid values: ENABLE, DISABLE.', 
    biz_date			        string comment 'Business date in YYYY-MM-DD format.',
    rule_task_id                bigint comment 'Rule execution ID.',
    rule_task_status            string comment 'Rule execution status. Valid values: WAITING, RUNNING, SUCCESS, FAILED.',
    is_validate_result          string comment 'Indicates if the validation passed (Y/N).',
    start_time                  string comment 'The start time.',
    end_time                    string comment 'The end time.',
    gmt_create                  string comment 'The creation time.',
    gmt_modified                string comment 'The last modified time.'
) comment 'The quality rule scoring table.'
partitioned by (ds string comment 'The partition column.')
lifecycle 91
;

dim_dataphin_dp_quality_schedule

create table if not exists dim_dataphin_dp_quality_schedule(
    tenant_id                               bigint comment 'The ID of the tenant.',
    schedule_id                             bigint comment 'The ID of the schedule.',
    schedule_type                           string comment 'The schedule type. Valid values include `STATIC_TASK_TRIGGER` (time-based), `TRIGGER_SCHEDULE` (task-triggered), and `CODE_CHECK_TRIGGER` (triggered by data updates).',
    schedule_name                           string comment 'The name of the schedule.',
    watch_id                                bigint comment 'The ID of the associated watch.',
    watch_type                              string comment 'The type of the associated watch.',
    partition_type                          string comment 'The type of the partition expression. Valid value: `CUSTOM`.',
    partition_expression                    string comment 'The partition expression.',
    date_format                             string comment 'The date format.',
    trigger_schedule_nodes                  string comment 'The nodes that trigger the schedule.',
    static_task_trigger_type                string comment 'The trigger mode for a task-triggered schedule. Valid values: `ALL_TASKS_FINISHED` to trigger after all specified tasks complete successfully, `ONE_TASKS_FINISHED` to trigger after each specified task completes successfully, and `PRE_ONE_TASKS_START` to trigger before each specified task starts.',
    cron_expression                         string comment 'The cron expression.',
    schedule_condition                      string comment 'Schedule conditions in JSON format.',
    validate_partition_type                 string comment 'The type of the partition range to validate. Valid values: `TASK_REFERRED_PARTITION` for the partitions updated by the task, and `USER_DEFINED_PARTITION` for a custom partition range.',
    creator_id                              string comment 'The ID of the creator.',
    creator_name                            string comment 'The name of the creator.',
    modifier_id                             string comment 'The ID of the modifier.',
    modifier_name                           string comment 'The name of the modifier.',
    gmt_create                              string comment 'The creation time.',
    gmt_modified                            string comment 'The modification time.'
)comment 'Stores the data quality schedule configurations.'
partitioned by(ds string comment 'The partition field, in `yyyymmdd` format.')
lifecycle 15;

fct_dataphin_dp_quality_problem_list_df

Note
  • To retrieve detailed information about monitored objects such as data sources, data source tables, or metrics, join the dim_dataphin_dp_quality_watch table using tenant_id/watch_id.

  • To retrieve detailed information about a quality rule, join the dim_dataphin_dp_quality_rule table using tenant_id/rule_id.

  • Composition of the problem object name and the validation object name (for displaying the problem object):

    • problem_object_big_type = TABLE/DATASOURCE_TABLE/REALTIME_LOGICAL_TABLE

      • When validate_object_type=TABLE, both problem_object_name and validate_object_name are the table name.

      • When validate_object_type=COLUMN/REALTIME, problem_object_name is table_name.column_name, and validate_object_name is column_name.

      • When validate_object_type=CHAIN, problem_object_name is the real-time table name, and validate_object_name is the offline table name.

    • problem_object_big_type = DATASOURCE

      • When validate_object_type=TABLE, both problem_object_name and validate_object_name are the data source table name.

      • When validate_object_type=DATASOURCE, both problem_object_name and validate_object_name are the data source name.

    • problem_object_big_type = INDEX

      • For system-detected problems, problem_object_name is table_name.metric_name, and validate_object_name is metric_name.

      • For manual entries, both problem_object_name and validate_object_name are the metric name.

    • problem_object_big_type = QD_FEATURE

      Currently, label issues must be entered manually. In this case, problem_object_name is the label name, and validate_object_name is the label code.

create table if not exists fct_dataphin_dp_quality_problem_list_df(
    tenant_id             bigint,
    problem_object_id     bigint comment 'The ID of the problem object.',
    problem_object_big_type string comment 'The primary category of the problem object. Valid values: TABLE for a Dataphin table, DATASOURCE_TABLE for a global table, INDEX for a metric, DATASOURCE for a data source, REALTIME_LOGICAL_TABLE for a real-time meta table, and QD_FEATURE for a tag.',
    validate_object_type  string comment 'The type of the validation object. Valid values: TABLE, COLUMN, DATASOURCE, INDEX, REALTIME, CHAIN for a batch pipeline, and QD_FEATURE for a tag.',
    problem_object_name   string comment 'The name of the problem object.',
    validate_object_name  string comment 'The name of the validation object.',
    watch_id              string comment 'The ID of the monitored object.',
    table_id              string comment 'The ID of the table.',
    table_name            string comment 'The name of the table.',
    table_desc            string comment 'The description of the table.',
    datasource_type       string comment 'The type of the data source.',
    datasource_id         string comment 'The ID of the data source.',
    datasource_name       string comment 'The name of the data source.',
    index_id              string comment 'The ID of the metric.',
    index_name_en         string comment 'The English name of the metric.',
    index_name_cn         string comment 'The Chinese name of the metric.',
    index_biz_unit_name   string comment 'The name of the business unit for the metric.',
    qd_feature_id         string comment 'The ID of the tag.',
    qd_feature_name       string comment 'The name of the tag.',
    qd_feature_code       string comment 'The code for the tag.',
    rule_id               string comment 'The ID of the rule.',
    rule_name             string comment 'The name of the rule.',
    rule_strength         string comment 'The strength of the rule.',
    last_problem_status   string comment 'The latest status of the quality issue. Valid values: NEW (pending), VERIFY (verifying), FINISHED (resolved), and IGNORE (ignored).',
    last_problem_types    string comment 'The latest types of the quality issue, in JSON format.',
    problem_desc          string comment 'The description of the quality issue.',
    problem_submit_type   string comment 'The submission method for the quality issue. Valid values: SYSTEM (system-detected) and MANUAL (manually entered).',
    problem_submit_time   string comment 'The submission time of the quality issue. This is the latest validation time for system-detected issues and the submission time for manually entered issues.',
    problem_submitter_id  string comment 'The ID of the user who manually submitted the quality issue.',
    problem_submitter_name string comment 'The name of the user who manually submitted the quality issue.',
    last_problem_owner_ids string comment 'A comma-separated list of the latest owner IDs for the quality issue.',
    last_problem_owner_names   string comment 'A comma-separated list of the latest owner names for the quality issue.',
    last_validate_scope   string comment 'The scope of the latest system validation.',
    last_validate_result  string comment 'The result of the latest system validation. Valid values: PASS (passed), NOT_PASS (failed), and FAILED (execution failed).',
    last_rectify_id       bigint comment 'The ID of the latest rectification.',
    last_rectify_name     string comment 'The name of the latest rectification.'
) comment 'Fact table for Dataphin quality issues.'
partitioned by (ds string comment 'The partition field, in yyyymmdd format.')
lifecycle 30;

fct_dataphin_dp_quality_rectify_df

create table if not exists fct_dataphin_dp_quality_rectify_df(
    tenant_id              bigint,
    rectify_id             bigint comment 'Rectification ID',
    rectify_name           string comment 'Rectification name',
    problem_object_big_type string comment 'Problem object type. Valid values: TABLE (Dataphin table), DATASOURCE_TABLE (global table), INDEX (index), DATASOURCE (data source), REALTIME_LOGICAL_TABLE (real-time logical table), and QD_FEATURE (tag).',
    problem_num            bigint comment 'Number of problems',
    priority               string comment 'Priority. Valid values: LOW, MIDDLE, and HIGH.',
    last_rectify_status    string comment 'Latest rectification status. Valid values: in progress, pending acceptance, requires re-rectification, and accepted.',
    start_time             string comment 'Start time',
    finish_time            string comment 'Finish time',
    gmt_create             string comment 'Creation time',
    gmt_modified           string comment 'Last modified time',
    submit_user_id         string comment 'Submitter ID',
    submit_user_name       string comment 'Submitter name',
    last_rectify_user_id   string comment 'Latest rectifier ID',
    last_rectify_user_name string comment 'Latest rectifier name',
    last_verify_user_id    string comment 'Latest verifier ID',
    last_verify_user_name  string comment 'Latest verifier name'
) comment 'Data quality rectification tasks'
partitioned by (ds string comment 'Data partition date, in yyyymmdd format.')
lifecycle 30;

fct_dataphin_dp_quality_rectify_operation_record_df

create table if not exists fct_dataphin_dp_quality_rectify_operation_record_df(
    tenant_id              bigint,
    record_id              bigint,
    rectify_id             bigint comment 'Specifies the ID of the rectification task.',
    rectify_name           string comment 'Specifies the name of the rectification task.',
    rectify_operation      string comment 'Specifies the type of rectification operation. Valid values: Add to rectification process, Associate with existing rectification process, Submit for acceptance, Reject, and Accept.',
    rectify_status         string comment 'Specifies the status of the rectification task. Valid values: In progress, Pending acceptance, Rework required, and Accepted.',
    operator_id            string comment 'Specifies the ID of the user who performed the operation.',
    operator_name          string comment 'Specifies the name of the user who performed the operation.',
    operation_time         string comment 'Specifies when the operation was performed.',
    operation_comment      string comment 'A comment or note about the operation.'
) comment 'Stores operation records for quality rectification tasks.'
partitioned by (ds string comment 'The partition field, formatted as yyyymmdd.')
lifecycle 30;

fct_dataphin_dp_quality_rectify_problem_relation_df

create table if not exists fct_dataphin_dp_quality_rectify_problem_relation_df(
    tenant_id              bigint,
    rectify_id             bigint comment 'The ID of the quality rectification.',
    rectify_name           string comment 'The name of the quality rectification.',
    problem_object_id      bigint comment 'The ID of the problem object.',
    problem_object_big_type string comment 'The primary category of the problem object. Valid values: TABLE for a Dataphin table, DATASOURCE_TABLE for a global table, INDEX for an index, DATASOURCE for a data source, REALTIME_LOGICAL_TABLE for a real-time logical table, and QD_FEATURE for a tag.',
    validate_object_type  string comment 'The type of the validation object. Valid values: TABLE for a table, COLUMN for a column, DATASOURCE for a data source, INDEX for an index, REALTIME for real-time, CHAIN for offline, and QD_FEATURE for a tag.',
    problem_object_name   string comment 'The name of the problem object.', 
    validate_object_name  string comment 'The name of the validation object.',
    problem_status        string comment 'The status of the problem in the quality rectification snapshot. Valid values: NEW for pending, VERIFY for validating, FINISHED for resolved, and IGNORE for ignored.',
    last_problem_types    string comment 'The latest problem types, in JSON format.',
    problem_desc          string comment 'Problem description.',
    rule_id               string comment 'Rule ID.',
    rule_name             string comment 'Rule name.',
    rule_strength         string comment 'Rule strength.',
    last_validate_scope   string comment 'Latest validation scope.',
    last_validate_result  string comment 'Latest validation result.',
    last_validate_time    string comment 'Latest validation time.',
    problem_submit_type   string comment 'The problem reporting method. Valid values: SYSTEM for system identification and MANUAL for manual entry.', 
    problem_owner_names   string comment 'Comma-separated names of problem owners.'    
) comment 'This table stores the relationship between quality rectification processes and their associated problems.'
partitioned by (ds string comment 'The partition column, in yyyymmdd format.')
lifecycle 30;

Default

dim_dataphin_datastandard_standard

Note

Starting from V4.3.1, you can retrieve the relationships between standards and code tables.

select 
  tenant_id             
   ,standard_id         
   ,standard_name       
   ,standard_name_en    
   ,standard_code       
   ,standard_status
  ,get_json_object(attribute_relation, '$.attributeId') as attribute_id -- The ID of the standard attribute.
  ,get_json_object(attribute_relation, '$.attributeName') as attribute_name  -- The standard attribute name.
  ,get_json_object(attribute_relation, '$.relationType') as relation_type  -- The relationship type. `LOOKUP_TABLE` indicates a lookup table, and `LOOKUP_TABLE_VALUE` indicates a value in a lookup table.
  ,get_json_object(attribute_relation, '$.lookupTableId') as lookup_table_id  -- The ID of the lookup table.
  ,get_json_object(attribute_relation, '$.lookupTableName') as lookup_table_name  -- The lookup table name.
  ,get_json_object(attribute_relation, '$.lookupTableCode') as lookup_table_code  -- The lookup table code.
from ( 
    select
split(regexp_replace(regexp_replace(
    get_json_object(attributes_with_value_config , '$.attributeLookupTableRelations')
    , '\\[|\\]', ''),'(\\}(\t\r\v\f\n)*,)','}abcyugongcba'), 'abcyugongcba')  as attribute_lookup_table_relations
,*
from dim_dataphin_datastandard_standard
where ds='${bizdate}'
)  json_table
LATERAL VIEW explode(attribute_lookup_table_relations) exploded_table AS attribute_relation
;
Note
  • standard_status: standard status.

  • The possible values for standard_stage=PROD are:

    NOT_ACTIVATED: Not activated; ACTIVE: Active; EXPIRED: Expired.

  • The possible values for standard_stage=DEV are:

    DRAFT: Draft; UNDER_REVISION: Under revision; UNDER_REVIEW: Under review; REVIEW_PASSED: Review passed; PUBLISHING: Publishing.

create table if not exists dim_dataphin_datastandard_standard
(
    tenant_id                     bigint comment 'Tenant ID.',
    standard_id                   bigint comment 'Standard ID.',
    standard_name                 string comment 'Standard name.',
    standard_name_en              string comment 'Standard English name.',
    standard_code                 string comment 'Standard code.',
    standard_type                 string comment 'Standard type.',
    version                       bigint comment 'Version number.',
    standard_status               string comment 'Standard status. `NOT_ACTIVATED`: Not activated. `ACTIVE`: Active. `EXPIRED`: Expired.',
    standard_stage                string comment 'Standard stage. `DEV`: Draft, in review, or being published. `PROD`: In production.',
    standard_set_id               bigint comment 'Standard set ID.',
    standard_set_name             string comment 'Standard set name.',
    basic_info                    string comment 'Basic standard information.',
    standard_set_reference        string comment 'Associated standard set reference.',
    attributes_with_value_config  string comment 'Standard attribute configuration.',
    owner_id                      string comment 'Owner ID.',
    owner_name                    string comment 'Owner name.',
    creator_id                    string comment 'Creator ID.',
    creator_name                  string comment 'Creator name.',
    modifier_id                   string comment 'Last modifier ID.',
    modifier_name                 string comment 'Last modifier name.',
    gmt_create                    string comment 'Creation time.',
    gmt_modified                  string comment 'Last modification time.',
    monitor_config                string comment 'Monitoring configuration in JSON format.',
    standard_template_id          bigint comment 'Standard template ID.',
    standard_template_name        string comment 'Standard template name.',
    standard_template_reference   string comment 'Associated standard template reference.'
)comment 'Standard table.'
partitioned by (ds string comment 'Partition field.')
lifecycle 30;

dim_dataphin_datastandard_rule

create table if not exists dim_dataphin_datastandard_rule
(
    tenant_id                   bigint comment 'Tenant ID.',
    standard_rule_id            bigint comment 'Standard mapping rule ID.',
    standard_rule_name          string comment 'Standard mapping rule name.',
    standard_rule_desc          string comment 'Standard mapping rule description.',
    version                     bigint comment 'Version number.',
    asset_type                  string comment 'Asset type. INDEX for metric, COLUMN for column.',    
    --standard_set_id             bigint comment 'Deprecated. Standard set ID.',
    --standard_set_name           string comment 'Deprecated. Standard set name.',
    basic_info                  string comment 'Basic standard information.',
    evaluate_scope_config       string comment 'Evaluation scope configuration.',
    mapping_rule_config         string comment 'Standard mapping rule configuration.',
      is_valid                  string comment 'Indicates if the rule is active: Y for active, N for inactive.',
    owner_id                    string comment 'Owner ID.',
    owner_name                  string comment 'Owner name.',
    creator_id                  string comment 'Creator ID.',
    creator_name                string comment 'Creator name.',
    modifier_id                 string comment 'Last modifier ID.',
    modifier_name               string comment 'Last modifier name.',
    gmt_create                  string comment 'Creation time.',
    gmt_modified                string comment 'Last modification time.',
    standard_set_ids            string comment 'List of standard set IDs.',
    standard_template_id        bigint comment 'Standard template ID.',
    standard_template_name      string comment 'Standard template name.'
) comment 'Standard mapping rules.'
partitioned by (ds string comment 'Partition key.')
lifecycle 30;

dim_dataphin_datastandard_rule_asset_relation

Note
  • For data standard mapping results, we recommend using the dim_dataphin_datastandard_asset_relation table.

  • When a mapping rule runs, it first deletes existing mapping relationships and then regenerates them based on the latest configuration. If the meta warehouse shared model task and the standard mapping rule task run concurrently, data loss or inconsistencies may occur.

  • Starting with version V4.3.1, you can view standard mapping monitoring results for your existing mapping relationships.

  • When you delete a standard mapping rule, its data remains in the table, but the corresponding standard_rule_name is set to null.

select 
  tenant_id
  ,guid
  ,asset_type
  ,standard_id
  ,standard_name
  ,standard_name_en
  ,standard_rule_id
  ,standard_rule_name
  ,standard_set_id
  ,standard_set_name
  ,gmt_create
  ,monitor_status
  ,total_monitor_num  --Total number of monitoring rules.
  ,passed_monitor_num --Number of passed monitoring rules.
  ,get_json_object(single_column_monitor_result, '$.monitorResultStatus') as monitor_result_status --The status of the monitoring result. Valid values: `FAILED` (failed), `PASSED` (passed), `NOT_CONFIGURED` (not configured), and `WAIT_EVALUATE` (waiting for evaluation).
  ,get_json_object(single_column_monitor_result, '$.monitorType') as monitor_type  --The type of the monitoring rule. Valid values: `METADATA` (metadata) and `QUALITY` (data quality).
  ,get_json_object(single_column_monitor_result, '$.ruleName') as rule_name  --The name of the rule.
  ,get_json_object(single_column_monitor_result, '$.attributeId') as attribute_id  --The ID of the standard attribute.
  ,get_json_object(single_column_monitor_result, '$.attributeName') as attribute_name  --The name of the standard attribute.
  ,get_json_object(single_column_monitor_result, '$.attributeValue') as attribute_value  --The value of the standard attribute.
  ,get_json_object(single_column_monitor_result, '$.assetColumnValue') as asset_column_value  --The value of the asset object. Populated only when `monitor_type` = 'METADATA'.
  ,get_json_object(single_column_monitor_result, '$.qualityRuleId') as quality_rule_id  --The ID of the data quality rule. Populated only when `monitor_type` = 'QUALITY'.
  ,get_json_object(single_column_monitor_result, '$.recentExecuteHistoryId') as quality_rule_task_id  --The ID of the data quality task. Populated only when `monitor_type` = 'QUALITY'.
from ( 
    select
split(regexp_replace(regexp_replace(
    get_json_object(monitor_result , '$.simpleSingleColumnMonitorResults')
    , '\\[|\\]', ''),'(\\}(\t\r\v\f\n)*,)','}abcyugongcba'), 'abcyugongcba')  as all_column_monitor_results
,get_json_object(monitor_result , '$.totalMonitorNum') as total_monitor_num
,get_json_object(monitor_result , '$.passedMonitorNum') as passed_monitor_num
,*
from dim_dataphin_datastandard_rule_asset_relation
where ds='${bizdate}'
)  json_table
LATERAL VIEW explode(all_column_monitor_results) exploded_table AS single_column_monitor_result
;
create table if not exists dim_dataphin_datastandard_rule_asset_relation
(
    tenant_id                 bigint comment 'The tenant ID.',
    guid                			string comment 'The asset GUID.',
    asset_type               	string comment 'The asset type. Valid values: `INDEX` for a metric and `COLUMN` for a column. Other values are custom.',
    standard_id               bigint comment 'The standard ID.',
    standard_name							string comment 'The standard name.',
    standard_name_en					string comment 'The English name of the standard.',
    standard_status						string comment 'The status of the standard. Valid values: `NOT_ACTIVATED` for inactive, `ACTIVE` for active, and `EXPIRED` for published standards that have expired.',
    standard_rule_id          bigint comment 'The ID of the standard rule or evaluation job.',
    standard_rule_name        string comment 'The name of the standard rule. Populated only when `job_type` is `STANDARD_RULE`.',
    standard_set_id           bigint comment 'The ID of the standard set.',
    standard_set_name					string comment 'The name of the standard set.',
    gmt_create                string comment 'The creation time.',
    gmt_modified              string comment 'The last modified time.',
    asset_snapshot            string comment 'The asset snapshot.',
    standard_version          bigint comment 'The standard version.',
    monitor_result            string comment 'The monitoring result.',
    monitor_status            string comment 'The status of the monitoring result. Valid values: NOT_CONFIGURED, PASSED, and FAILED.',
    is_latest_rule            string comment 'Indicates whether this mapping is the result of the latest run of the standard rule for the given standard and asset pair. Valid values: `Y` and `N`.',
    job_type                  string comment 'The job type. Valid values: `STANDARD_RULE` for a standard rule and `EVALUATE_JOB` for an evaluation job.'
) comment 'This table maps standard rules to assets.'
partitioned by (ds string comment 'The partition field.')
lifecycle 30;

dim_dataphin_datastandard_asset_relation

Note

This table is for standard bid selection results.

create table if not exists dim_dataphin_datastandard_asset_relation
(
    tenant_id                   bigint comment 'Tenant ID',
    guid                				string comment 'Asset GUID',
    asset_type               		string comment 'Asset type. `INDEX`: indicator, `COLUMN`: column. Other values are user-defined.',
    asset_name                  string comment 'Asset name',
    asset_belong_name           string comment 'Name of the parent asset. For a column asset, this is the full table name.',
    standard_id                 bigint comment 'Standard ID',
    standard_name								string comment 'Standard name',
    standard_name_en						string comment 'English name of the standard',
    standard_status							string comment 'Status of the standard. `NOT_ACTIVATED`: not activated, `ACTIVE`: active, `EXPIRED`: expired. A standard that was published but has passed its expiration time.',
    standard_set_id             bigint comment 'Standard set ID',
    standard_set_name						string comment 'Standard set name',
    gmt_create                  string comment 'Create time',
    gmt_modified                string comment 'Last modified time',
    standard_template_id        bigint comment 'Standard template ID',
    standard_template_name      string comment 'Standard template name'
) comment 'Table for standard-to-asset mapping'
partitioned by (ds string comment 'The partition column.')
lifecycle 90
;

DataPhin: Data standard lookup table

create table if not exists dim_dataphin_datastandard_lookup_table
(
    tenant_id                    bigint comment 'Tenant ID',
    lookup_table_id              bigint comment 'Lookup table ID',
    lookup_table_name            string comment 'Lookup table name',
    lookup_table_code            string comment 'Lookup table code',
    lookup_table_des             string comment 'Lookup table description',
    lookup_table_value_name      string comment 'Lookup value name',
    lookup_table_value_value     string comment 'Lookup value',
    lookup_table_value_en_name   string comment 'Lookup value English name',
    lookup_table_value_des       string comment 'Lookup value description',
    lookup_table_directory       string comment 'Lookup table directory',
    owner_id                     string comment 'Owner ID',
    owner_name                   string comment 'Owner name',
    creator_id                   string comment 'Creator ID',
    creator_name                 string comment 'Creator name',
    modifier_id                  string comment 'Last modifier ID',
    modifier_name                string comment 'Last modifier name',
    gmt_create                   string comment 'Creation time',
    gmt_modified                 string comment 'Last modification time'
)
comment 'Contains metadata for data standard lookup tables.'
partitioned by (ds string comment 'Partition key in yyyymmdd format.')
lifecycle 30;

Real-time

dim_DataPhin_stream_table

create table if not exists dim_dataphin_stream_table 
(
    tenant_id               bigint comment 'ID of the tenant.',
    project_id              bigint comment 'ID of the project.',
    project_name            string comment 'Name of the project.',
    table_name              string comment 'Name of the metadata table.',
    env                     string comment 'The environment. Valid values are DEV and PROD.',
    owner_id                string comment 'ID of the table owner.',
    owner_name			    		string comment 'Name of the table owner.',
    compute_engine_id				bigint comment 'ID of the compute engine.',
    compute_engine_name			string comment 'Name of the compute engine.',
    compute_engine_type			string comment 'Type of the compute engine. Valid values are OPEN_FLINK, Flink, and VVP.',
    data_source_id          bigint comment 'ID of the data source.',
		data_source_name 	    	string comment 'Name of the data source.',
    data_source_type	    	string comment 'Type of the data source. Examples include MaxCompute, MySQL, SQL Server, Hive, ONE_SERVICE, PostgreSQL, FTP, EMR_HIVE, Oracle, HDFS, DRDS, AnalyticDB, MongoDB, Flink, AnalyticDB for PG, Hologres, and HBase 1.1.x.',
    src_table_name          string comment 'Name of the source table.',
    connector               string comment 'Connector.',
    gmt_create              string comment 'Creation time.',
    gmt_modified            string comment 'Modification time.'
) comment 'Metadata for real-time metadata tables.'
partitioned by (ds string comment 'Partition field, in yyyymmdd format.')
lifecycle 31 ;

dim_dataphin_stream_column

create table if not exists dim_dataphin_stream_column 
(
    tenant_id               bigint comment 'The ID of the tenant.',
    project_id              bigint comment 'The ID of the project.',
    project_name            string comment 'The name of the project.',
    table_name              string comment 'The name of the metatable.',
    column_name							string comment 'The name of the column.',
    column_type							string comment 'The data type of the column.',
    column_comment					string comment 'The comment for the column or partition key.',				
    column_seq							bigint comment 'The ordinal position of the column.',
    raw_column_type					string comment 'The original data type of the column.',
    env                     string comment 'The environment. Possible values are DEV (development) and PROD (production).',
    create_time             string comment 'Timestamp of record creation.',
    modify_time             string comment 'Timestamp of the last modification.'
) comment 'Stores metadata for columns in real-time metatables.'
partitioned by (ds string comment 'The partition field, formatted as yyyymmdd.')
lifecycle 31 ;

Label platform

dim_dataphin_featurex_market

create table if not exists dim_dataphin_featurex_market (
    tenant_id           bigint comment 'The tenant ID.',
    market_id           string comment 'The ID of the market.',
    market_name         string comment 'The name of the market.',
    market_type         bigint comment 'The market type. Set to 0 for a public market or 1 for a private market.',
    market_desc         string comment 'The description of the market.',
    owner_id            string comment 'The ID of the owner.',
    owner_name          string comment 'The name of the owner.',
    gmt_create          string comment 'The time when the record was created.',
    gmt_modified        string comment 'The time when the record was last modified.'
)comment 'Stores market information for the Dataphin label platform.'
partitioned by (ds string comment 'The partition field, in the `yyyymmdd` format.')
lifecycle 30;

dim_dataphin_featurex_lineage

create table if not exists dim_dataphin_featurex_lineage (
    tenant_id           bigint comment 'The tenant ID.',
    from_id             string comment 'The ID of the source resource.',
    from_version        bigint comment 'The version of the source resource.',
    from_type           string comment 'The type of the source resource. Valid values: view, feature, data_source, application, subscription, behavior, cluster, market, project, entity_identity, physical_table, profile, event, and view_field.',
    from_code           string comment 'The code of the source resource.',
    from_status			bigint comment 'The status of the source resource. Valid values: -1 (Publishing) | -3 (Submitting) | 0 (Editing) | 1 (Published) | 2 (Listed) | 3 (Submitted).',
    to_id				string comment 'The ID of the target resource.',
    to_type				string comment 'The type of the target resource. For valid values, see the description of `from_type`.',
    to_code				string comment 'The code of the target resource.',
    to_status			bigint comment 'The status of the target resource. For valid values, see the description of `from_status`.',
    gmt_create          string comment 'The time when the record was created.',
    gmt_modified        string comment 'The time when the record was last modified.'
)comment 'Stores data lineage information for the Dataphin label platform.'
partitioned by (ds string comment 'The partition field, in the `yyyymmdd` format.')
lifecycle 30;

Asset inventory

dim_dataphin_mdc_object

Note

The asset_sub_type field has the following values:

  • Data source: DATASOURCE_TABLE (data source table), DATASOURCE_VIEW (data source view), DATASOURCE_MATERIALIZED_VIEW (data source materialized view).

  • Compute source: PHYSICAL_TABLE (compute source physical table), PHYSICAL_VIEW (compute source physical view), PHYSICAL_MATERIALIZED_VIEW (compute source physical materialized view).

  • Logical table: DIM_NORMAL (standard dimension logical table), DIM_LEVEL (hierarchical dimension logical table), DIM_ENUM (enumerated dimension logical table), DIM_VIRTUAL (virtual dimension logical table), FACT_PROCESS (process fact logical table), FACT_EVENT (event fact logical table), FACT_SNAPSHOT (snapshot fact logical table), SUM_BIZ_UNIT (business unit summary logical table), LOGIC_VIEW (logical view).

create table if not exists dim_dataphin_mdc_object
(
    tenant_id                               bigint comment 'Tenant ID',
    asset_from                              string comment 'The source of the asset. `dataphin` indicates compute source tables and logical tables. `biz_system` indicates data source tables in business systems.',
    object_name                             string comment 'Table name',
    display_name                            string comment 'Display name',
    object_desc                             string comment 'Table description',
    asset_type                              string comment 'Asset type. Possible values: TABLE, COLUMN, INDEX, PROJECT, BIZ_UNIT, DATA_CELL, FUNCTION, DATASOURCE, DATASERVICE, DATA_STANDARD',
    asset_sub_type                          string comment 'Asset sub-type',
    project_id                              bigint comment 'Project ID',
    project_name                            string comment 'Project name',
    biz_unit_id                             bigint comment 'Business unit ID',
    biz_unit_name                           string comment 'Business unit name',
    data_source_id                          bigint comment 'Data source ID',
    data_source_name                        string comment 'Data source name',
    data_source_type                        string comment 'Data source type',
    data_source_desc                        string comment 'Data source description',
    biz_system_id                           bigint comment 'Associated business system ID',
    biz_system_name                         string comment 'Associated business system name',
    env                                     string comment 'Asset environment. Possible values: DEV, PROD',
    db_type                                 string comment 'Data source or compute source type',
    db_name                                 string comment 'Name of the database or project where the asset resides',
    schema_name                             string comment 'Name of the schema where the asset resides. This applies only to certain data sources or compute sources.',
    owner_id                                string comment 'Table owner ID',
    owner_name                              string comment 'Table owner name',
    is_from_logical                         string comment 'Specifies whether the physical table is generated from a standardized model (Y/N).',
    is_partitioned                          string comment 'Specifies whether the table is a partitioned table (Y/N).',
    is_external_table                       string comment 'Specifies whether the table is an external table (Y/N).',
    last_ddl_time                           string comment 'Last DDL modification time',
    last_dml_time                           string comment 'Last data modification time',
    last_access_time                        string comment 'Last access time',
    physical_data_size                      bigint comment 'Storage size in bytes',
    record_num                              bigint comment 'Record count',
    table_lifecycle                         string comment 'Table lifecycle',
    view_text                               string comment 'View definition',
    storage_format                          string comment 'Storage format. For Hive tables, this is the specific file format. For other sources, this is the data source type.',
    tags_list                               string comment 'Comma-separated list of asset inventory tags',
    create_time                             string comment 'Creation time',
    modify_time                             string comment 'Modification time',
    favorites_count                         bigint comment 'Number of favorites',
    pv_count                                bigint comment 'Number of page views',
    table_location                          string comment 'The storage path of the table. This field applies to all Hive tables and other external tables.',
    data_domain_id                         bigint comment 'Data domain ID',
    data_domain_name                       string comment 'Data domain code',
    data_domain_name_cn                    string comment 'Data domain name'
) comment 'Dataphin global table metadata'
partitioned by (ds string comment 'Partition field in yyyymmdd format')
lifecycle 30;

dim_dataphin_mdc_column

Note

asset_sub_type: Same as the asset_sub_type field in the dim_dataphin_mdc_object table.

create table if not exists dim_dataphin_mdc_column
(
    tenant_id                               bigint comment 'Tenant ID',
    asset_from                              string comment 'The source of the asset. `dataphin` indicates compute source tables and logical tables. `biz_system` indicates data source tables in business systems.',
    table_name                              string comment 'Table name',
    column_name                             string comment 'Column name',
    display_name                            string comment 'Display name',
    column_data_type                        string comment 'Column data type',
    column_comment                          string comment 'Column comment or partition key comment',
    column_seq                              bigint comment 'Column sequence number',
    column_key_type                         string comment 'Column key type. Applies only to certain data sources.',
    is_nullable                             string comment 'Specifies whether the column can contain null values (Y/N). Applies to data source tables.',
    default_value                           string comment 'Default value. Applies to data source tables.',
    is_partition_column                     string comment 'Specifies whether this is a partition field (Y/N).',
    partition_type                          string comment 'Partition type. Applies only to certain data sources.',
    is_primary_key                          string comment 'Specifies whether this is a primary key (Y/N). Applies to data source tables.',
    asset_type                              string comment 'Asset type. The value is always `TABLE`.',
    asset_sub_type                          string comment 'Asset sub-type',
    project_id                              bigint comment 'Project ID',
    project_name                            string comment 'Project name',
    biz_unit_id                             bigint comment 'Business unit ID',
    biz_unit_name                           string comment 'Business unit name',
    data_source_id                          bigint comment 'Data source ID',
    data_source_name                        string comment 'Data source name',
    data_source_type                        string comment 'Data source type',
    biz_system_id                           bigint comment 'Associated business system ID',
    biz_system_name                         string comment 'Associated business system name',
    env                                     string comment 'Asset environment. Possible values: DEV, PROD',
    db_type                                 string comment 'Data source or compute source type',
    db_name                                 string comment 'Name of the database or project where the asset resides',
    schema_name                             string comment 'Name of the schema where the asset resides. This applies only to certain data sources or compute sources.'
) comment 'Dataphin global metadata for columns'
partitioned by (ds string comment 'Partition field in yyyymmdd format')
lifecycle 15;

Asset directory

dim_dataphin_mdc_topic_directory

create table if not exists dim_dataphin_mdc_topic_directory(
    tenant_id                       bigint comment 'The tenant ID.',
    directory_id                    bigint comment 'The directory ID.',
    directory_name                  string comment 'The directory name.',
    directory_des                   string comment 'The directory description.',
    directory_parent_id             bigint comment 'The parent directory ID.',
    directory_parent_path           string comment 'The parent directory path.',
    directory_modifier_id           string comment 'The ID of the user who last modified the directory.',
    directory_modifier_name         string comment 'The name of the user who last modified the directory.',
    directory_gmt_create            string comment 'The time the directory was created.',
    directory_gmt_modified          string comment 'The time the directory was last updated.',
    topic_id                        bigint comment 'The asset topic ID.',
    topic_name                      string comment 'The asset topic name.',
    topic_des                       string comment 'The asset topic description.',
    topic_status                    string comment 'The status of the asset topic. Valid values: DRAFT (draft), IN_PUBLISH (publishing), or PUBLISHED (published).',
    topic_asset_type                string comment 'The asset type. Valid values: TABLE (table) or INDEX (technical index).',
    topic_admin_users               string comment 'A comma-separated list of user IDs of the asset topic administrators.',
    topic_visibility_type           string comment 'The visibility scope of the asset topic. Valid values: PUBLIC (visible to all users) or ADMINS_AND_SPECIFIED (visible only to administrators and specified users).',
    topic_visible_users             string comment 'A comma-separated list of user IDs allowed to view the asset topic.',
    topic_visible_user_groups       string comment 'A comma-separated list of user group IDs allowed to view the asset topic.',
    topic_enable_publish_approve    string comment 'Specifies whether publish approval is enabled (`true`) or disabled (`false`).',
    topic_modifier_id               string comment 'The ID of the user who last modified the asset topic.',
    topic_modifier_name             string comment 'The name of the user who last modified the asset topic.',
    topic_gmt_create                string comment 'The time the asset topic was created.',
    topic_gmt_modified              string comment 'The time the asset topic was last updated.'
)comment 'Stores information about asset topic directories.'
partitioned by (ds string comment 'The business date, in yyyymmdd format.')
lifecycle 15;

dim_dataphin_mdc_list_object

Note
  • Version 4.5 includes information for table assets only. Versions 5.4 and later also include information for technical and business indexes.

  • owner_user_id: The asset owner ID when the asset was put on-shelf.

  • owner_user_name: The asset owner name when the asset was put on-shelf.

  • asset_sub_type: Same as the asset_sub_type field in the dim_dataphin_mdc_object table.

create table if not exists dim_dataphin_mdc_list_object
(
    tenant_id                               bigint comment 'The tenant ID.',
    name                                    string comment 'The object name.',
    display_name                            string comment 'The display name.',
    full_name                               string comment 'The full name of the object.',
    asset_type                              string comment 'The asset type. Valid values: TABLE, COLUMN, PROJECT, BIZ_UNIT, DATA_CELL, FUNCTION, DATASOURCE, DATASERVICE, DATA_STANDARD, INDEX, and BIZ_INDEX.',
    asset_sub_type                          string comment 'The asset subtype. For indexes, valid values include: INDEX (standardized modeling technical index), CUSTOM_INDEX (custom technical index), and BIZ_INDEX (business index). Other subtypes are not documented.',
    env                                     string comment 'The asset environment. Valid values: DEV and PROD.',
    asset_source                            string comment 'The asset source.',
    project_id                              bigint comment 'The project ID.',
    project_name                            string comment 'The project name.',
    biz_unit_id                             bigint comment 'The business unit ID.',
    biz_unit_name                           string comment 'The business unit name.',
    data_source_id                          bigint comment 'The data source ID.',
    data_source_name                        string comment 'The data source name.',
    data_source_type                        string comment 'The data source type.',
    directory_ids                           string comment 'A comma-separated list of IDs of the directories to which the asset belongs.',
    directory_names                         string comment 'A comma-separated list of names of the directories to which the asset belongs.',
    topic_ids                               string comment 'A comma-separated list of IDs of the asset topics to which the asset belongs.',
    topic_names                             string comment 'A comma-separated list of names of the asset topics to which the asset belongs.',
    tags_list                               string comment 'A comma-separated list of tags for the on-shelf asset.',
    view_scope_type                         string comment 'The visibility scope for the on-shelf asset. Valid values: ALL_USERS_CAN_VIEW (visible to all users), PART_USERS_CAN_VIEW (visible to some users), PART_USERS_CAN_NOT_VIEW (not visible to some users), or ALL_USERS_CAN_NOT_VIEW (not visible to any user).',
    view_scope_user_ids                     string comment 'A comma-separated list of user IDs referenced by the visibility scope.',
    view_scope_user_groups                  string comment 'A comma-separated list of user group IDs referenced by the visibility scope.',
    asset_from_is_deleted                   string comment 'Specifies if the source asset has been deleted (`Y` or `N`). This field is deprecated.',
    owner_user_id                           string comment 'The asset owner ID.',
    owner_user_name                         string comment 'The asset owner name.',
    on_shelve_time                          string comment 'The time the asset was last put on-shelf.',
    db_type                                 string comment 'The type of the data source or compute engine.',
    db_name                                 string comment 'The name of the database or project where the asset resides.',
    schema_name                             string comment 'The name of the schema to which the asset belongs. This field is populated only for certain data sources or compute engines.',
    favorites_count                         bigint comment 'The favorites count.',
    pv_count                                bigint comment 'The PV count.',
    shelve_description                      string comment 'The description of the on-shelf asset.',
    custom_attribute                        string comment 'The custom attributes, in JSON format.',
    first_on_shelve_time                    string comment 'The time the asset was first put on-shelf.',
    on_shelve_user_id                       string comment 'The user ID for the last on-shelf operation. For scheduled automatic operations, this is the rule owner; for manual operations, this is the operator.',
    on_shelve_user_name                     string comment 'The user name for the last on-shelf operation. For scheduled automatic operations, this is the rule owner; for manual operations, this is the operator.',
    index_guid                              string comment 'The GUID of the index.',
    index_logic                             string comment 'The index logic. This field is not populated for custom technical indexes.',
    index_attribute                         string comment 'The attribute information for the index, in JSON format. For attribute information of some standardized modeling indexes, query the dim_dataphin_derived_index table.',
    index_belong_table_name                 string comment 'The name of the table to which the index belongs. This field is not populated for business indexes.',
    biz_index_related_tech_index_name_list  string comment 'A comma-separated list of names of associated technical indexes. To retrieve details, query this table using the index name and the tenant_id.',
    biz_index_related_biz_index_name_list   string comment 'A comma-separated list of names of associated business indexes. To retrieve details, query this table using the index name and the tenant_id.',
    biz_index_related_biz_index_relation_type_list string comment 'A comma-separated list of relationship types for the associated business indexes. Valid values: POSITIVE (positive correlation), NEGATIVE (negative correlation), or OTHER (other).',
    first_on_shelve_user_id                 string comment 'The user ID for the first on-shelf operation. For scheduled automatic operations, this is the rule owner; for manual operations, this is the operator.',
    first_on_shelve_user_name               string comment 'The user name for the first on-shelf operation. For scheduled automatic operations, this is the rule owner; for manual operations, this is the operator.'
) comment 'Stores information about on-shelf assets.'
partitioned by (ds string comment 'The business date, in the yyyymmdd format.')
lifecycle 15;

Signal tables

data_share_finish

create table if not exists data_share_finish
(
    module_name         string comment 'Module name: data_share',
    finish_time         string comment 'Completion time'
)comment 'Signals the completion of processing for basic shared metadata.'
partitioned by (ds string comment 'Partition key, in yyyymmdd format') 
lifecycle 360;

data_security_finish

create table if not exists data_security_finish
(
    module_name         string comment 'Module name: data_security',
    finish_time         string comment 'Completion time'
)comment 'Signals the completion of metadata processing for the data security module.'
partitioned by (ds string comment 'Partition key, in yyyymmdd format') 
lifecycle 360;

data_service_finish

create table if not exists data_service_finish
(
    module_name         string comment 'Module name: data_service',
    finish_time         string comment 'Completion time'
)comment 'Signals the completion of metadata processing for the data service module.'
partitioned by (ds string comment 'Partition key, in yyyymmdd format') 
lifecycle 360;

data_quality_finish

create table if not exists data_quality_finish
(
    module_name         string comment 'Module name: data_quality',
    finish_time         string comment 'Completion time'
)comment 'Signals the completion of metadata processing for the data quality module.'
partitioned by (ds string comment 'Partition key, in yyyymmdd format') 
lifecycle 360;

data_standard_finish

create table if not exists data_standard_finish
(
    module_name         string comment 'Module name: data_standard',
    finish_time         string comment 'Completion time'
)comment 'Signals the completion of metadata processing for the data standard module.'
partitioned by (ds string comment 'Partition key, in yyyymmdd format') 
lifecycle 360;

data_featurex_finish

create table if not exists data_featurex_finish (
    module_name         string comment 'Module name: data_featurex', 
    finish_time         string comment 'Completion time'
) comment 'Signals the completion of metadata processing for the tag platform module.'
PARTITIONED BY (ds string comment 'Partition key, in yyyymmdd format')
lifecycle 360;
上一篇: Standard access method for metadata warehouse shared model 下一篇: Metadata warehouse sharing model (for Hadoop, StarRocks, and other compute engines)
阿里云首页 智能数据建设与治理 Dataphin 相关技术圈