DataWorks data source updates

Dear DataWorks users,

To provide a more unified product experience, starting October 20, 2023, we will consolidate several compute engines within DataWorks. The MaxCompute, Hologres, AnalyticDB for PostgreSQL, AnalyticDB for MySQL, and ClickHouse compute engines will be merged into data source management. The E-MapReduce and CDH engines will be merged into open source cluster management. This document explains the key changes.

Engines merged into data source management

The MaxCompute, Hologres, AnalyticDB for PostgreSQL, AnalyticDB for MySQL, and ClickHouse compute engines in DataWorks will be merged into data source management. The changes are as follows:

  • Creating a compute engine: We have decommissioned the previous interface for binding compute engine instances. To create a compute engine, you must now create a data source. Previously, you would navigate to the Workspace page, select the Compute Engine Information tab, and then click Add Instance. Now, you must go to the Data Source page and click Add Data Source to create and manage the corresponding compute engine data source.

    Note

    Data sources created across different regions, with a different Alibaba Cloud account, or by using an AccessKey pair cannot be used for data development or task scheduling. They can only be used for data synchronization.

    To use a data source for data development, you must bind it in DataStudio after creating it. In the previous interface, you would click Computing Resource in the left-side navigation pane to access the compute resource management page, view the list of created compute resources, and click the Bind button next to a resource to bind it to the workspace.

  • Bound compute engines: Previously bound compute engines are now managed in Computing Resource.

  • Editing a compute engine: You can no longer directly edit a compute engine. To make changes, you must edit the corresponding data source.

  • Unbinding a compute engine: A workspace administrator can directly unbind a compute engine. This action does not delete the underlying data source.
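
The usage restriction in the note above can be read as a simple rule. The following is a minimal sketch, not a DataWorks API: the `DataSource` record, its field names, and the `allowed_uses` function are hypothetical and exist only to illustrate the rule. It also assumes that an unrestricted data source remains usable for data synchronization, which the note implies but does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    """Hypothetical record; field names are illustrative, not a DataWorks API."""
    name: str
    region: str
    owner_account: str
    uses_access_key: bool


def allowed_uses(source: DataSource, workspace_region: str, workspace_account: str) -> set[str]:
    """Return the uses the note permits for a data source in a given workspace."""
    # Assumption: data synchronization is always available.
    uses = {"data_synchronization"}
    restricted = (
        source.region != workspace_region              # created across regions
        or source.owner_account != workspace_account   # different Alibaba Cloud account
        or source.uses_access_key                      # created with an AccessKey pair
    )
    if not restricted:
        # Data development additionally requires binding the source in DataStudio.
        uses |= {"data_development", "task_scheduling"}
    return uses


if __name__ == "__main__":
    local = DataSource("mc_local", "cn-shanghai", "account-a", False)
    remote = DataSource("mc_remote", "cn-beijing", "account-a", False)
    print(sorted(allowed_uses(local, "cn-shanghai", "account-a")))
    print(sorted(allowed_uses(remote, "cn-shanghai", "account-a")))
```

Any one of the three conditions is enough to limit a data source to data synchronization.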

Engines merged into open source cluster management

The E-MapReduce and CDH engines in DataWorks will be merged into open source cluster management. The changes are as follows:

  • Creating a compute engine: We have decommissioned the previous interface for binding compute engine instances. You must now create these compute engines by registering a cluster. After you register the cluster, you can use it for data development tasks.

    Note

    Clusters registered across different regions, with a different Alibaba Cloud account, or by using an AccessKey pair cannot be used for data development or task scheduling. They can only be used for data synchronization.

    Previously, the path was Workspace > Compute Engine Information > Add Instance. The new entry point is the Open Source Cluster page in the left-side navigation pane. Click Register Cluster and select the cluster type (CDH or E-MapReduce) in the pop-up window to complete the registration.

  • Bound compute engines: Previously bound compute engines are now managed under Admin Center > Open Source Cluster.

  • Editing a compute engine: You can no longer directly edit a compute engine. To make changes, you must edit the registered cluster.

  • Unbinding a compute engine: A workspace administrator can directly unbind a compute engine. This action does not delete the registered cluster.
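
For quick reference, the two consolidations described in this document can be written as a lookup from engine type to its new management area. This is only an illustrative sketch: the constant and function names are hypothetical, and the engine lists are taken directly from the text above.

```python
# Engines merged into data source management, as listed in this document.
DATA_SOURCE_ENGINES = {
    "MaxCompute",
    "Hologres",
    "AnalyticDB for PostgreSQL",
    "AnalyticDB for MySQL",
    "ClickHouse",
}

# Engines merged into open source cluster management, as listed in this document.
OPEN_SOURCE_CLUSTER_ENGINES = {"E-MapReduce", "CDH"}


def management_area(engine: str) -> str:
    """Return where a compute engine is created and edited after the change."""
    if engine in DATA_SOURCE_ENGINES:
        return "data source management"
    if engine in OPEN_SOURCE_CLUSTER_ENGINES:
        return "open source cluster management"
    raise ValueError(f"engine not covered by this announcement: {engine}")


if __name__ == "__main__":
    for name in ("Hologres", "CDH"):
        print(name, "->", management_area(name))
```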

Permission changes

To provide a more secure product experience, we are implementing the following permission changes:

  • Authentication method changes: You can no longer create a data source or engine using an AccessKey pair. For existing data sources or engines, you can continue to use the configured AccessKey pair. However, you cannot use an AccessKey pair that belongs to a different Alibaba Cloud account. Data sources or engines created using an AccessKey pair cannot be used for data development.

  • Default access identity changes: If you need to set the default access identity of a data source or cluster to the task owner or an identity other than your own, you must have the AdministratorAccess policy.

    Note

    The default access identity setting applies only to specific data sources and clusters. For details, refer to the options available in the user interface.

  • Configuring cross-account data sources or clusters: To create a data source or cluster using resources from a different Alibaba Cloud account, you must use a RAM role.
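
The permission changes above amount to three checks. The sketch below expresses them as plain functions under stated assumptions: `CreateRequest`, the `auth_method` labels, and both function names are hypothetical and do not correspond to any DataWorks or RAM API. Only the policy name `AdministratorAccess` comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CreateRequest:
    """Hypothetical request to create a data source or register a cluster."""
    auth_method: str      # illustrative labels: "access_key", "ram_role", "account"
    cross_account: bool   # True if the resource belongs to another Alibaba Cloud account


def validate_create(req: CreateRequest) -> list[str]:
    """Return the reasons a creation request would be rejected (empty if none)."""
    errors = []
    if req.auth_method == "access_key":
        errors.append("new data sources or engines cannot be created with an AccessKey pair")
    if req.cross_account and req.auth_method != "ram_role":
        errors.append("cross-account resources require a RAM role")
    return errors


def can_set_default_identity(uses_own_identity: bool, caller_policies: set[str]) -> bool:
    """Check whether the caller may set the default access identity.

    Choosing the task owner or any identity other than the caller's own
    requires the AdministratorAccess policy.
    """
    if uses_own_identity:
        return True
    return "AdministratorAccess" in caller_policies


if __name__ == "__main__":
    print(validate_create(CreateRequest("access_key", cross_account=True)))
    print(can_set_default_identity(False, {"AdministratorAccess"}))
```

These checks cover new requests only. Existing data sources and engines that already use an AccessKey pair keep working, subject to the limits described above.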

If you have questions about these changes, scan the QR code below to join the DataWorks public service DingTalk group for feedback.

[Image: QR code for the DataWorks public service DingTalk group]