DataWorks Open Data provides a centralized and unified collection of metadata from the DataWorks platform. Without complex configuration, you can quickly obtain standardized and traceable metadata by authorizing access through MaxCompute Package views. It provides detailed metadata for tables, task nodes and instances, workspaces, members, projects, Data Quality, and data assets to support data governance and analysis.
Use cases
DataWorks Open Data is a fully upgraded, publicly available version of the previous invitation-only Open Data. The new version replaces the previous command-line method with a visual interface, allowing you to manage metadata more intuitively and efficiently.
You can use Open Data for the following:
-
Data inventory: Obtain clear statistics on data objects, such as tables and tasks, that are managed by workspaces and their owners. Understand data structures, sources, update frequencies, and dependencies to eliminate data black boxes.
-
End-to-end lineage tracking: Query data lineage metadata to trace the upstream and downstream paths of specific tables. This allows you to trace the complete data flow from source to application, helping you quickly locate issues or analyze their impact.
-
Custom metadata analysis: In addition to the existing OpenAPI, you can directly access and query metadata by using SQL. This simplifies the analysis process and shortens the data governance lifecycle.
Prerequisites
You must have a DataWorks workspace that is bound to MaxCompute computing resources.
Limitations
-
Edition requirements: Only DataWorks Enterprise Edition and higher editions are supported.
-
Permission limits: Only users with the tenant owner, tenant administrator, or data governance administrator role, or RAM users granted the DataWorksFullAccess permission, can install and uninstall DataWorks Open Data.
-
Authorization limits: Authorization for metadata views can be granted only through MaxCompute. Other resource types are not supported.
-
Data update latency: Metadata is updated with a T+1 latency. This means that you can query the statistics from the previous day.
Manage Open Data
You can view, install, use, and uninstall DataWorks Open Data.
View Open Data
Understand the metadata and use cases for packages in Open Data to select the ones that meet your needs.
-
Log on to the DataWorks console. In the target region, open Data Map from the left-side navigation pane. On the page that appears, click Go to Data Map.
-
In the left-side navigation pane of the Data Map page, click the Data Catalog icon to go to the Data Catalog page.
-
In the catalog list, click DataWorks Open Data to go to the DataWorks Open Data page.
-
Switch to the Package List tab, then click Details for the desired package. In the Table List, you can view the metadata tables and their descriptions.
-
Differences between the MetaData and Examples packages:
| Package name | Open Data | Use cases |
| --- | --- | --- |
| MetaData | Table information, such as table schemas, field descriptions, and data lineage. Scheduling nodes and instances, such as task execution status and dependencies. Management metadata, such as workspaces, members, and projects. Data Quality details and data governance details. | Suitable for data asset inventory, data lineage analysis, and dependency management. |
| Examples | A collection of metric data derived from metadata. | Combined with the raw data from the MetaData package, you can quickly build common analysis scenarios, such as resource utilization analysis and task health assessment. |
Note: The Examples package is provided by DataWorks and contains a collection of sample metric tables generated from metadata. The actual data is subject to change based on specific business requirements. The data displayed on the UI prevails.
Important: For details about the schemas and fields of the metadata tables, see Details of Open Data table schemas.
-
To learn more about a metadata table, click Details for the table. On the table details page, you can view the Fields and Description to quickly understand the relationships between the Open Data metadata tables.
Install Open Data
Based on your business needs, you can install the relevant data views within your tenant as a package. This process authorizes the MaxCompute computing resources of a specified workspace to access these views.
-
Go to the DataWorks Open Data page. On the Package List tab, find the target package (MetaData or Examples) and click Details in the Actions column.
-
On the package page, install the package that you want to use.
-
First-time installation: In the upper-right corner of the package page, click Load.
-
If an installation record exists: On the Installation History tab of the package page, click Load.
-
In the Install DataWorks Metadata dialog box, select the target workspace and the MaxCompute project to which you want to grant permissions.
Note:
-
The authorized MaxCompute project is the compute engine that is bound to the target workspace.
-
If the MaxCompute compute engine that is bound to the target workspace is in the Installed state, you do not need to install it again.
-
For workspaces in standard mode, we recommend installing and granting permissions to the MaxCompute compute engines for both the development and production environments.
-
Read the installation notes, select the confirmation checkbox, and then click Confirm Installation. After the installation is successful, you can view the new installation record on the Installation History tab.
To use metadata from other packages in the MaxCompute compute engine of the target workspace, repeat the preceding steps for each package to install it and grant permissions.
Use Open Data
After you install Open Data for the MaxCompute computing resources of the target workspace, you can directly access the authorized metadata views in Data Studio or DataAnalysis by using the compute engine of that workspace.
-
In the left-side navigation pane of the Data Map page, click the Data Catalog icon to go to the Data Catalog page.
-
In the catalog list, click MaxCompute, and then click the package whose name is prefixed with u_meta and suffixed with the region name.
-
On the details page, click the entry for Data Studio or DataAnalysis to go to the corresponding module, where you can query and use the authorized metadata views.
-
Use in Data Studio:
-
In the top menu bar of Data Studio, switch to the region and workspace where Open Data is installed.
-
Create a MaxCompute node. On the node editing page, you can use Open Data to develop tasks.
-
You can use the following sample code to verify that the installation is successful.
SELECT dt
      ,COUNT(*) AS database_count
FROM u_meta_hangzhou.databases
GROUP BY dt
ORDER BY dt ASC
LIMIT 32
;
Important:
-
When you test the code, you must use the authorized MaxCompute computing resources.
-
Replace u_meta_hangzhou with the name of the package in your MaxCompute data catalog. The package name is prefixed with u_meta and suffixed with the region name.
-
Use in DataAnalysis:
-
The SQL query page in DataAnalysis provides sample metadata analysis scripts that you can modify for your use.
-
Click the icon in the upper-right corner of the SQL query page. Select your authorized workspace and the data source that was automatically created when you bound the MaxCompute computing resources. The data source has the same name as the compute engine. You can then run queries on the authorized metadata in DataAnalysis.
Note: Before you run the sample script, replace the REPLACE_WITH_WORKSPACE_ID parameter with the ID of the workspace that you want to query. Otherwise, an error occurs. For information about how to obtain a workspace ID, see Configure a workspace.
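Because metadata is updated with a T+1 latency, it can help to confirm which partition is the newest before you build analysis on top of the views. The following query is a minimal sketch that reuses only the databases view and the dt column from the verification sample in the Data Studio steps. It assumes that dt is a date-like string whose sort order matches chronological order (for example, a yyyymmdd-style value), which this document does not state. As before, u_meta_hangzhou is a placeholder for the package name in your own MaxCompute data catalog.

```sql
-- Assumption: dt is a date-like string that sorts in chronological order.
-- Replace u_meta_hangzhou with the package name in your MaxCompute data catalog.
-- Returns the most recent dt value and the number of rows recorded for it.
SELECT dt
      ,COUNT(*) AS database_count
FROM u_meta_hangzhou.databases
GROUP BY dt
ORDER BY dt DESC
LIMIT 1
;
```

If the returned dt is older than the previous day, the latest metadata may not have been generated yet, so results based on it might be incomplete.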
Uninstall Open Data
If you no longer need to use Open Data or want to revoke authorization from a MaxCompute project in a workspace, you can uninstall the corresponding Open Data package.
-
In the left-side navigation pane of the Data Map page, click the Data Catalog icon to go to the Data Catalog page.
-
In the catalog list, click MaxCompute, and then click the package whose name is prefixed with u_meta and suffixed with the region name.
-
On the Accessible Projects tab of the details page, find the target project and click Uninstall in the Actions column. In the Confirm Uninstallation of DataWorks Metadata dialog box that appears, read the uninstallation notes, select the confirmation checkbox, and then click Confirm Uninstallation.
Important: Uninstalling revokes permissions and causes dependent tasks to fail. Ensure that no active tasks rely on these views before you proceed.
FAQ
-
Q: How does the metadata update latency affect usage?
A: Metadata is generated with a T+1 latency, which means that it reflects the activity from the previous day. For real-time metadata, we recommend using the DataWorks OpenAPI.
-
Q: Can I uninstall DataWorks Open Data after it is installed?
A: Yes. Uninstalling revokes authorization for the views. Ensure that no active tasks depend on these views to avoid disruptions.
-
Q: How do I ensure metadata security?
A: Use MaxCompute data access controls to manage the access scope and prevent sensitive metadata from being shared with unauthorized teams.