To use Open Data for metadata collection, you must first install the Open Data package in your project. This topic describes the installation and verification process.
DataWorks has launched a new Open Data feature that supports visualization for intuitive and efficient metadata management. For more information, see Manage Open Data.
Recommendation
The DataWorks Open Data feature has transitioned from its early invitation-only phase to a stable, generally available (GA) release. The GA version provides visualization for easier metadata browsing and management. This document is being retired. We recommend that you use the GA version. For instructions, see Manage Open Data.
Limitations
-
The Open Data feature is available only in DataWorks Enterprise Edition. With this invitation-only version, you can only install the Open Data package from the command line.
-
Open Data is supported only in the MaxCompute compute engine.
Install the Open Data package
-
Obtain authorization.
Before you install the Open Data package, you must obtain authorization. You can submit a ticket to provide your project information to DataWorks technical support. After DataWorks technical support authorizes your project, you can install the package to collect metadata.
NoteWhen you submit a ticket, provide your Alibaba Cloud account ID, workspace name, workspace ID, and the region where the workspace resides.
We recommend that the project meets the following requirements for authorization:
-
The project has a robust permission management system to prevent data leaks from the unauthorized sharing of metadata.
-
The project can be shared within your company or team. This allows metadata from DataWorks to be shared among different data development teams.
-
-
Install the Open Data package as the project owner.
This example shows how to install the package in DataStudio. Log on to the DataStudio page of the authorized project and run the installation commands on a MaxCompute node.
Note-
If the authorized project is in standard mode, where the development environment and production environment are isolated, you must install the DataWorks Open Data package in both environments.
-
In the installation commands, replace the u_meta project name and the Open Data package name based on the installation region. The following commands show how to install the Open Data package for the China (Hangzhou) region.
-- Install the Open Data package for the China (Hangzhou) region in the development environment (work_test_2_dev). INSTALL PACKAGE u_meta_hangzhou.systables; -- Install the Open Data package for the China (Hangzhou) region in the production environment (work_test_2). USE work_test_2; INSTALL PACKAGE u_meta_hangzhou.systables;Parameters:
-
u_meta_hangzhou: The name of the u_meta project for the Open Data package in the China (Hangzhou) region.
-
systables: The name of the Open Data package. After installation, you can use its tables and views to collect various types of metadata, including partition and table lineage metadata.
Alibaba Cloud provides Open Data packages for multiple regions. The u_meta project name varies by region. For a list of u_meta project names for each region, see Appendix 2: List of available projects. You must install the correct Open Data package for the region where you intend to use Open Data. Replace the u_meta project name with the one for your region.
-
-
Verify the installation.
In the list of installed packages, find the package where the PackageName and SourceProject match those from the previous step. If the Status is OK, the Open Data package is installed successfully.
-- Check if u_meta_hangzhou.systables is in the list of installed packages for the project. SHOW PACKAGES; -- Sample output +-------------+-----------------+--------------------------+--------+ | PackageName | SourceProject | InstallTime | Status | +-------------+-----------------+--------------------------+--------+ | systables | u_meta_hangzhou | 2020-11-26T15:25:22+0800 | OK | +-------------+-----------------+--------------------------+--------+
View Open Data tables and views
When you use Open Data, you must enter the full name of the table or view. The names may differ between versions of the Open Data package.
Run the following command to list the tables and views in your Open Data installation, along with their names and permissions.
DESCRIBE PACKAGE u_meta_hangzhou.systables;
The following example shows how to run the query command in DataStudio and the resulting output.
The names of tables and views in Open Data include version information, which changes with feature iterations. The version number that you use must be based on the one in the systables package, as it is the definitive source. For example, in the view name raw_v_tenant_user_v1_1, v1_1 is the version number.
-- View the tables or views contained in the u_meta_hangzhou.systables package.
DESCRIBE PACKAGE u_meta_hangzhou.systables;
-- Sample output
CreateTime: 2020-11-18T20:17:24+0800
PackageName: systables
SourceProject: u_meta_hangzhou
Object List
+------------+-----------------------------------+------------------+
| ObjectType | ObjectName | ObjectPrivileges |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_meta_biz_table_wiki_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_meta_column_usage_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_meta_column_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_meta_database_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_meta_partition_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_meta_table_detail_log_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_meta_table_join_map_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_meta_table_lineage_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_meta_table_output_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_meta_table_usage_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_meta_table_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_meta_view_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_schedule_di_resgroup_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_schedule_node_relation_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_schedule_node_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_schedule_task_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_tenant_res_group_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_tenant_user_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_tenant_workspace_user_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_tenant_workspace_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | rpt_v_meta_ind_table_core_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | rpt_v_meta_ind_table_extra_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
View schema and field descriptions
Run the following command to view the schema and field descriptions of a specific table or view.
DESCRIBE u_meta_hangzhou.rpt_v_meta_ind_table_core_v1_1;
Here, rpt_v_meta_ind_table_core_v1_0 is the name of the table or view that you want to view. You can obtain the actual name of the table or view from View tables or views provided by DataWorks Open Data.
DataWorks provides data across multiple dimensions, including but not limited to detail data, metric data, and dimensional data. You can selectively retrieve and save data to your DataWorks project based on your needs. This data can be used as historical data for data governance or O&M to avoid unnecessary storage consumption.
The following example shows how to use DataStudio to view the rpt_v_meta_ind_table_core view, showing the command and its output.
-- View the field descriptions for the rpt_v_meta_ind_table_core view.
DESCRIBE u_meta_hangzhou.rpt_v_meta_ind_table_core_v1_1;
-- Sample output
+------------------------------------------------------------------------------------+
| Owner: ALIYUN$dataworks-datagovernance | Project: u_meta_hangzhou |
| TableComment: Core metrics for tables in the metadata module |
+------------------------------------------------------------------------------------+
| CreateTime: 2020-12-07 20:02:53 |
| LastDDLTime: 2020-12-07 20:02:53 |
| LastModifiedTime: 2020-12-07 20:02:53 |
+------------------------------------------------------------------------------------+
| VirtualView : YES | ViewText: CREATE OR REPLACE VIEW rpt_v_meta_ind_table_core_v1_1 (@param_biz_date STRING)
RETURNS @ret_result TABLE (
tenant_id BIGINT COMMENT 'Dimension: DataWorks tenant ID',
project_id BIGINT COMMENT 'Dimension: DataWorks project (workspace) ID',
catalog_name STRING COMMENT 'Dimension: Catalog. For MaxCompute projects, the value is odps.',
database_name STRING COMMENT 'Dimension: Database or MaxCompute project name',
table_name STRING COMMENT 'Dimension: Table name',
table_uuid STRING COMMENT 'Dimension: Unique table identifier',
owner_yun_acct STRING COMMENT 'Dimension: Alibaba Cloud account of the table owner',
dim_life_cycle BIGINT COMMENT 'Dimension: Lifecycle period. 0 indicates that no lifecycle is set; other values represent specific periods.',
is_partition_table BOOLEAN COMMENT 'Dimension: Specifies whether the table is partitioned. true indicates a partitioned table.',
entity_type BIGINT COMMENT 'Dimension: Entity type. 0: table, 1: view.',
categories STRING COMMENT 'Dimension: Category information',
last_access_time BIGINT COMMENT 'Dimension: Last access time of the table (10-digit UNIX timestamp).',
`size` BIGINT COMMENT 'Table size. This refers to the logical storage size of the data. The value is NULL for a view.',
column_count BIGINT COMMENT 'The column count, including partition columns.',
partition_count BIGINT COMMENT 'The partition count. This value is NULL for non-partitioned tables.',
detail_view_count BIGINT COMMENT 'The number of times the table details were viewed in the UI.',
favorite_count BIGINT COMMENT 'The number of times the table has been added to favorites.',
biz_date STRING COMMENT 'The business date of the data.'
) COMMENT 'Core metrics for tables in the metadata module' AS
SELECT * FROM u_meta_hangzhou.rpt_v_meta_ind_table_core_proxy(@param_biz_date) |
+------------------------------------------------------------------------------------+
| Native Columns: |
+------------------------------------------------------------------------------------+
| Field | Type | Label | Comment |
+------------------------------------------------------------------------------------+
| tenant_id | bigint | | Dimension: DataWorks tenant ID |
| project_id | bigint | | Dimension: DataWorks project (workspace) ID |
| catalog_name | string | | Dimension: Catalog. For MaxCompute projects, the value is odps. |
| database_name | string | | Dimension: Database or MaxCompute project name |
| table_name | string | | Dimension: Table name |
| table_uuid | string | | Dimension: Unique table identifier |
| owner_yun_acct | string | | Dimension: Alibaba Cloud account of the table owner |
| dim_life_cycle | bigint | | Dimension: Lifecycle period. 0 indicates that no lifecycle is set; other values represent specific periods. |
| is_partition_table | boolean | | Dimension: Specifies whether the table is partitioned. true indicates a partitioned table. |
| entity_type | bigint | | Dimension: Entity type. 0: table, 1: view. |
| categories | string | | Dimension: Category information |
| last_access_time | bigint | | Dimension: Last access time of the table (10-digit UNIX timestamp). |
| size | bigint | | Table size. This refers to the logical storage size of the data. The value is NULL for a view. |
| column_count | bigint | | The column count, including partition columns. |
| partition_count | bigint | | The partition count. This value is NULL for non-partitioned tables. |
| detail_view_count | bigint | | The number of times the table details were viewed in the UI. |
| favorite_count | bigint | | The number of times the table has been added to favorites. |
| biz_date | string | | The business date of the data. |
+------------------------------------------------------------------------------------+
For more details on table and view schemas, see Appendix 1: Lists and structure details of metadata tables.
Uninstall the Open Data package
This example shows how to uninstall the Open Data package in DataStudio.
If the authorized project is in standard mode, where the development environment and production environment are isolated, you must uninstall the DataWorks Open Data package from both environments.
-- Uninstall from the development project (work_test_2_dev).
UNINSTALL PACKAGE u_meta_hangzhou.systables;
-- Uninstall from the production project (work_test_2).
USE work_test_2;
UNINSTALL PACKAGE u_meta_hangzhou.systables;
Next steps
After you install the Open Data package, you can use Open Data to collect metadata to support data governance or O&M. For more information about how to use Open Data, see Use Open Data.