Manage Open Data

DataWorks Open Data provides a centralized and unified collection of metadata from the DataWorks platform. Without complex configuration, you can quickly obtain standardized and traceable metadata by authorizing access through MaxCompute Package views. It provides detailed metadata for tables, task nodes and instances, workspaces, members, projects, Data Quality, and data assets to support data governance and analysis.

Use cases

DataWorks Open Data is a fully upgraded, publicly available version of the previous invitation-only Open Data. The new version replaces the previous command-line method with a visual interface, allowing you to manage metadata more intuitively and efficiently.

You can use Open Data for the following:

  • Data inventory: Obtain clear statistics on data objects, such as tables and tasks, that are managed by workspaces and their owners. Understand data structures, sources, update frequencies, and dependencies to eliminate data black boxes.

  • End-to-end lineage tracking: Query data lineage metadata to trace the upstream and downstream paths of specific tables. This allows you to trace the complete data flow from source to application, helping you quickly locate issues or analyze their impact.

  • Custom metadata analysis: In addition to the existing OpenAPI, you can directly access and query metadata by using SQL. This simplifies the analysis process and shortens the data governance lifecycle.
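As an illustration of the lineage use case, the following minimal Python sketch walks lineage edges after they have been queried from the metadata views. It assumes the edges are already available as (source table, target table) pairs; the table names and the edge format are hypothetical and do not reflect the actual Open Data schemas.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def downstream_tables(edges: Iterable[Tuple[str, str]], start: str) -> List[str]:
    """Return every table reachable from `start` along source -> target edges."""
    children: Dict[str, Set[str]] = defaultdict(set)
    for source, target in edges:
        children[source].add(target)

    seen: Set[str] = {start}
    queue = deque([start])
    order: List[str] = []
    while queue:
        current = queue.popleft()
        # Sort so that the traversal order is deterministic.
        for nxt in sorted(children[current]):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


# Hypothetical lineage edges, for illustration only.
edges = [
    ("ods_orders", "dwd_orders"),
    ("dwd_orders", "dws_sales"),
    ("dwd_orders", "ads_report"),
    ("ods_users", "dwd_users"),
]

print(downstream_tables(edges, "ods_orders"))
# Reverse each edge to walk upstream instead of downstream.
print(downstream_tables([(t, s) for s, t in edges], "dws_sales"))
```

The breadth-first traversal lists nearer tables first, which can help when you assess the impact of a change on directly dependent tables before the more distant ones.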

Prerequisites

You must have a DataWorks workspace that is bound to MaxCompute computing resources.

Limitations

  • Edition requirements: Only DataWorks Enterprise Edition and higher editions are supported.

  • Permission limits: Only users with the tenant owner, tenant administrator, or data governance administrator role, or RAM users granted the DataWorksFullAccess permission, can install and uninstall DataWorks Open Data.

  • Authorization limits: Authorization for metadata views can be granted only through MaxCompute. Other resource types are not supported.

  • Data update latency: Metadata is updated with T+1 latency. On any given day, the most recent statistics that you can query are those of the previous day.
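To make the T+1 rule concrete, the following minimal Python sketch computes the most recent day whose statistics should be queryable. The `yyyymmdd` format of the `dt` value is an assumption; check the format that your metadata views actually use.

```python
from datetime import date, timedelta


def latest_available_dt(today: date, fmt: str = "%Y%m%d") -> str:
    """Return the most recent day expected to be queryable under T+1 latency.

    Assumption: dt values are formatted as yyyymmdd.
    """
    return (today - timedelta(days=1)).strftime(fmt)


print(latest_available_dt(date(2024, 3, 1)))  # 20240229
```

You can pass `date.today()` to obtain the value for the current day and use it as a filter on `dt` in your own queries.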

Manage Open Data

You can view, install, use, and uninstall DataWorks Open Data.

View Open Data

Review the metadata and use cases of each Open Data package to choose the packages that meet your needs.

  1. Log on to the DataWorks console. In the target region, click Data Governance > Data Map in the left-side navigation pane. On the page that appears, click Go to Data Map.

  2. In the left-side navigation pane of the Data Map page, click the icon to go to the Data Catalog page.

  3. In the catalog list, click DataWorks Open Data to go to the DataWorks Open Data page.

  4. Switch to the Package List tab, then click Details for the desired package. In the Table List, you can view the metadata tables and their descriptions.

    • Differences between the MetaData and Examples packages:

      Package name: MetaData

        Open Data:

          • Table information, such as table schema, field descriptions, and data lineage.

          • Scheduling nodes and instances, such as task execution status and dependencies.

          • Management metadata, such as workspaces, members, and projects.

          • Metadata such as Data Quality details and data governance details.

        Use cases: Suitable for data asset inventory, data lineage analysis, and dependency management.

      Package name: Examples

        Open Data: A collection of metadata metric data.

          Note

          The Examples package is provided by DataWorks and contains a collection of sample metric tables generated from metadata. The actual data is subject to change based on specific business requirements. The data displayed on the UI prevails.

        Use cases: Combined with the raw data from the MetaData package, you can quickly build common analysis scenarios, such as resource utilization analysis and task health assessment.

      Important

      For details about the schemas and fields of the metadata tables, see Details of Open Data table schemas.

  5. To learn more about a metadata table, click Details for the table. On the table details page, you can view the Fields and Description to quickly understand the relationships between the Open Data metadata tables.

Install Open Data

Based on your business needs, you can install the relevant data views within your tenant as a package. This process authorizes the MaxCompute computing resources of a specified workspace to access these views.

  1. Go to the DataWorks Open Data page. On the Package List tab, find the target package (MetaData or Examples) and click Details in the Actions column.

  2. On the package page, install the package that you want to use.

    • First-time installation: In the upper-right corner of the package page, click Load.

    • If an installation record exists: On the Installation History tab of the package page, click Load.

  3. In the Install DataWorks Metadata dialog box, select the target workspace and the MaxCompute project to which you want to grant permissions.

    Note

    • The authorized MaxCompute project is the compute engine that is bound to the target workspace.

    • If the MaxCompute compute engine that is bound to the target workspace is in the Installed state, you do not need to install it again.

    • For workspaces in standard mode, we recommend installing and granting permissions to the MaxCompute compute engines for both the development and production environments.

  4. Read the installation notes, select the confirmation checkbox, and then click Confirm Installation. After the installation is successful, you can view the new installation record on the Installation History tab.

To use metadata from another package in the MaxCompute compute engine of the target workspace, repeat the preceding steps to install that package and grant permissions.

Use Open Data

After you install Open Data for the MaxCompute computing resources of the target workspace, you can directly access the authorized metadata views in Data Studio or DataAnalysis by using the compute engine of that workspace.

  1. In the left-side navigation pane of the Data Map page, click the icon to go to the Data Catalog page.

  2. In the catalog list, click MaxCompute, and then click the package whose name is prefixed with u_meta and suffixed with the region name.

  3. On the details page, click Use Now > DataStudio or Use Now > DataAnalysis to go to the corresponding module to query and use the authorized metadata views.

    • Use in Data Studio:

      1. In the top menu bar of Data Studio, switch to the region and workspace where Open Data is installed.

      2. Create a MaxCompute node. On the node editing page, you can use Open Data to develop tasks.

      3. You can use the following sample code to verify that the installation is successful.

        -- Count the rows in the databases view for each dt value
        -- (oldest first, up to 32 rows).
        SELECT  dt
                ,COUNT(*) AS database_count
        FROM    u_meta_hangzhou.databases
        GROUP BY dt
        ORDER BY dt ASC
        LIMIT   32
        ;

        Important

        • When you test the code, you must use the authorized MaxCompute computing resources.

        • Replace u_meta_hangzhou with the name of the package in your MaxCompute data catalog. The package name is prefixed with u_meta and suffixed with the region name.

    • Use in DataAnalysis:

      1. The SQL query page in DataAnalysis provides sample metadata analysis scripts that you can modify for your use.

      2. Click the icon in the upper-right corner of the SQL query page. Select your authorized workspace and the data source that was automatically created when you bound the MaxCompute computing resources. The data source has the same name as the compute engine. You can then run queries on the authorized metadata in DataAnalysis.

        Note

        Before you run the sample script, replace the REPLACE_WITH_WORKSPACE_ID parameter with the ID of the workspace that you want to query. Otherwise, an error occurs. For information about how to obtain a workspace ID, see Configure a workspace.
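The two substitutions described in this section (the package name and the workspace ID) can also be scripted. The following minimal Python sketch fills in both values before a script is submitted. The template, the `workspace_id` column, the example package name `u_meta_shanghai`, and the assumption that workspace IDs are numeric are all illustrative and are not taken from the actual sample scripts.

```python
PACKAGE_PLACEHOLDER = "u_meta_hangzhou"
WORKSPACE_PLACEHOLDER = "REPLACE_WITH_WORKSPACE_ID"


def prepare_script(script: str, package: str, workspace_id: str) -> str:
    """Return the script with the package name and workspace ID filled in."""
    if not package.startswith("u_meta_"):
        raise ValueError(f"unexpected package name: {package!r}")
    # Assumption: workspace IDs are numeric. Rejecting anything else also
    # keeps stray text out of the generated SQL.
    if not workspace_id.isdigit():
        raise ValueError(f"unexpected workspace ID: {workspace_id!r}")
    return (
        script.replace(PACKAGE_PLACEHOLDER, package)
        .replace(WORKSPACE_PLACEHOLDER, workspace_id)
    )


# Hypothetical template: the workspace_id column is illustrative only.
template = (
    "SELECT dt, COUNT(*) AS database_count\n"
    "FROM u_meta_hangzhou.databases\n"
    "WHERE workspace_id = REPLACE_WITH_WORKSPACE_ID\n"
    "GROUP BY dt;"
)

query = prepare_script(template, "u_meta_shanghai", "12345")
print(query)
```

Validating the inputs before substitution surfaces a forgotten or mistyped value locally instead of as a query error at run time.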

Uninstall Open Data

If you no longer need to use Open Data or want to revoke authorization from a MaxCompute project in a workspace, you can uninstall the corresponding Open Data package.

  1. In the left-side navigation pane of the Data Map page, click the icon to go to the Data Catalog page.

  2. In the catalog list, select MaxCompute and then click the package whose name is prefixed with u_meta and suffixed with the region name.

  3. On the Accessible Projects tab of the details page, find the target project and click Uninstall in the Actions column. In the Confirm Uninstallation of DataWorks Metadata dialog box that appears, read the uninstallation notes, select the confirmation checkbox, and then click Confirm Uninstallation.

    Important

    Uninstalling revokes the project's permissions on the metadata views, so any task that queries these views will fail. Ensure that no active tasks rely on these views before you proceed.
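One way to check for dependent tasks before you uninstall is to scan the SQL text of your tasks for references to the package. The following minimal Python sketch assumes that you have already exported the task scripts as plain text; the script names and contents are hypothetical, and a text scan of this kind cannot detect references that are built dynamically.

```python
import re
from typing import Dict, List

# Matches references such as u_meta_hangzhou.databases.
VIEW_REF = re.compile(r"\b(u_meta_\w+)\.(\w+)", re.IGNORECASE)


def find_open_data_refs(scripts: Dict[str, str], package: str) -> Dict[str, List[str]]:
    """Map each script name to the sorted views of `package` that it references."""
    hits: Dict[str, List[str]] = {}
    for name, sql in scripts.items():
        views = sorted(
            {
                match.group(2).lower()
                for match in VIEW_REF.finditer(sql)
                if match.group(1).lower() == package.lower()
            }
        )
        if views:
            hits[name] = views
    return hits


# Hypothetical exported scripts, for illustration only.
scripts = {
    "daily_report.sql": "SELECT dt, COUNT(*) FROM u_meta_hangzhou.databases GROUP BY dt;",
    "orders.sql": "SELECT * FROM sales.orders;",
}

print(find_open_data_refs(scripts, "u_meta_hangzhou"))
```

An empty result suggests that none of the scanned scripts reference the package, but it does not replace a review of the tasks scheduled in the workspace.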

FAQ

  • Q: How does the metadata update latency affect usage?

    A: Metadata is generated with a T+1 latency, which means that it reflects the activity from the previous day. For real-time metadata, we recommend using the DataWorks OpenAPI.

  • Q: Can I uninstall DataWorks Open Data after it is installed?

    A: Yes. Uninstalling revokes authorization for the views. Ensure that no active tasks depend on these views to avoid disruptions.

  • Q: How do I ensure metadata security?

    A: Use MaxCompute data access controls to manage the access scope and prevent sensitive metadata from being shared with unauthorized teams.