Resource group operations and network connectivity

DataWorks network capabilities for batch synchronization

Before you synchronize data using Data Integration, you must know the following:

  • The VPC, vSwitch, and region of the database that you want to synchronize, and the region of your DataWorks workspace.

  • Whether your database and DataWorks workspace are in different Alibaba Cloud accounts or regions.

If you encounter an issue during a data synchronization task, see Supported data sources and read/write plug-ins for troubleshooting steps.

If a data source connectivity test fails, see Network connectivity solutions for solutions.

When you use an exclusive resource group for Data Integration, follow these steps: purchase the resource group, associate it with the VPC of your database, evaluate whether to add a route, configure the database IP address whitelist, and associate the resource group with your workspace. For more information, see Use an exclusive resource group for Data Integration.

Network connectivity with self-managed ECS databases

To synchronize a self-managed database on an ECS instance over an internal network by using an exclusive resource group for Data Integration, you must configure the network for the resource group. For more information, see Use an exclusive resource group for Data Integration. The key points of the connectivity solution are as follows:

  • Associate the exclusive resource group for Data Integration with the VPC where the ECS instance resides. A route to the VPC CIDR block is automatically added. Do not manually delete this route. Deleting it can cause access failures for other databases and lead to task errors.

  • Add the CIDR block of the vSwitch that is associated with the exclusive resource group for Data Integration to your database IP address whitelist. For more information, see Add an IP address whitelist.
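The whitelist requirement above can be checked offline before you run a connectivity test. The following is a minimal sketch, assuming you have copied the vSwitch CIDR block and the whitelist entries by hand from the consoles. The function name `is_whitelisted` and the sample addresses are hypothetical and are not part of any DataWorks API.

```python
import ipaddress


def is_whitelisted(vswitch_cidr, whitelist):
    """Return True if one whitelist entry covers the whole vSwitch CIDR block.

    vswitch_cidr: CIDR block of the vSwitch associated with the resource group.
    whitelist:    iterable of whitelist entries (single IPs or CIDR blocks).
    Both values are assumed to be copied manually from the consoles.
    """
    vswitch = ipaddress.ip_network(vswitch_cidr, strict=False)
    for entry in whitelist:
        allowed = ipaddress.ip_network(entry, strict=False)
        # subnet_of() raises TypeError for mixed IPv4/IPv6, so compare versions first.
        if vswitch.version == allowed.version and vswitch.subnet_of(allowed):
            return True
    return False


# Hypothetical sample values for illustration only.
print(is_whitelisted("192.168.1.0/24", ["10.0.0.0/8", "192.168.0.0/16"]))
print(is_whitelisted("192.168.1.0/24", ["192.168.1.10"]))
```

A whitelist that contains only a single address from the vSwitch range does not cover the block, because the resource group can use any address in that range.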

Cross-region network connectivity

Before you begin, review the solutions in Network connectivity solutions and choose one that fits your scenario. The key point is as follows:

If you synchronize data from a cross-region database over the Internet, you must add the EIP of the exclusive resource group to the database IP address whitelist. For more information, see Add an IP address whitelist.

Note

Data transfer over the Internet incurs fees. For more information, see Internet traffic billing.

Cross-account network connectivity

Before you begin, review the solutions in Network connectivity solutions and choose one that fits your scenario.

  • If you synchronize data from a cross-account database over the Internet, you must add the EIP of the exclusive resource group to the database IP address whitelist. For more information, see Add an IP address whitelist.

    Note

    Data transfer over the Internet incurs fees. For more information, see Internet traffic billing.

  • To synchronize data from a cross-account database over an internal network, follow these steps:

    1. Connect the networks of the two Alibaba Cloud accounts using a service such as VPN Gateway or Express Connect.

    2. Associate the exclusive resource group for Data Integration with the VPC that is connected to the other account's network.

    3. After associating the VPC, add a custom route. Set the next hop to Local IDC and specify the IP address of the target database.

    4. Add the CIDR block of the vSwitch associated with the exclusive resource group to the database IP address whitelist. For more information, see Add an IP address whitelist.
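Whether step 3 applies can be reasoned about with a small check. The sketch below assumes that the automatically added route covers only the CIDR block of the associated VPC, so a target database whose IP address falls outside that block needs a custom route. The function name `needs_custom_route` and the sample addresses are hypothetical.

```python
import ipaddress


def needs_custom_route(db_ip, associated_vpc_cidr):
    """Return True if db_ip lies outside the CIDR block of the associated VPC.

    Assumption: only the associated VPC's CIDR block is routed automatically,
    so any address outside it must be reached through a custom route.
    """
    db = ipaddress.ip_address(db_ip)
    vpc = ipaddress.ip_network(associated_vpc_cidr, strict=False)
    return db not in vpc


# Hypothetical sample values: a database in the other account's network.
print(needs_custom_route("172.16.5.10", "192.168.0.0/16"))
```

This is a planning aid only. The route itself is still added in the network settings of the resource group.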

Troubleshoot VPC connectivity failures

  • If you added the data source using a VPC endpoint:

    1. Confirm that the exclusive resource group for Data Integration is associated with the VPC where the database resides.

    2. Confirm that the CIDR block of the vSwitch associated with the exclusive resource group is added to the database IP address whitelist. For more information, see Add an IP address whitelist.

  • If you added the data source by using a public endpoint and a connectivity test fails when using an exclusive resource group for Data Integration, confirm that the EIP of the resource group is added to the database IP address whitelist. For more information, see Add an IP address whitelist.

    Note

    Data transfer over the Internet incurs fees. For more information, see Internet traffic billing.

Troubleshoot intermittent connectivity

Check whether your tasks run on a shared resource group. Network connectivity for shared resource groups can be unstable. For a stable connection, use an exclusive resource group for Data Integration.

Network connectivity for Serverless and Hologres

The Serverless resource group must be associated with the VPC where the Hologres instance is located. To do this, follow these steps:

  1. Log on to the DataWorks console and find the target Serverless resource group in the resource group list.

  2. Click Network Settings. In the Data Scheduling and Data Integration area, click Add VPC Association.

  3. Select the VPC, availability zone, and vSwitch where the Hologres instance is located, and then click OK.

  4. After the association is complete, create a Hologres data source in Data Integration. Click Test Connectivity to confirm that a connection can be established.

If there is a CIDR block conflict between a VPC already associated with the resource group and the Hologres VPC, you must first remove the conflicting VPC association before adding the Hologres VPC. Associating a VPC creates an elastic network interface in the VPC and consumes a quota. Do not delete the elastic network interface.
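A possible conflict can be spotted before you attempt the association. The following sketch assumes that a "conflict" means overlapping address ranges, which the source does not define precisely. The function name `find_cidr_conflicts` and the sample CIDR blocks are hypothetical.

```python
import ipaddress


def find_cidr_conflicts(associated_cidrs, candidate_cidr):
    """Return the already-associated CIDR blocks that overlap candidate_cidr.

    associated_cidrs: CIDR blocks of VPCs already associated with the resource group.
    candidate_cidr:   CIDR block of the VPC you want to add (for example, the Hologres VPC).
    """
    candidate = ipaddress.ip_network(candidate_cidr, strict=False)
    conflicts = []
    for cidr in associated_cidrs:
        existing = ipaddress.ip_network(cidr, strict=False)
        # Only compare networks of the same IP version.
        if existing.version == candidate.version and existing.overlaps(candidate):
            conflicts.append(cidr)
    return conflicts


# Hypothetical sample values for illustration only.
print(find_cidr_conflicts(["10.0.0.0/16", "172.16.0.0/12"], "10.0.128.0/20"))
```

An empty result suggests that no existing association has to be removed first, under the overlap assumption stated above.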

For detailed instructions on switching Hologres networks, see Guidance for switching networks for DataWorks On Hologres.

Exclusive resource group not found

Ensure the exclusive resource group is associated with the DataWorks workspace. For more information, see Use an exclusive resource group for Data Integration.

Check the resource group type from logs

  • If a task runs on the Default resource group, the log contains a message similar to this: running in Pipeline[basecommon_ group_xxxxxxxxx].

  • If a task runs on a custom resource group, the log contains a message similar to this: running in Pipeline[basecommon_xxxxxxxxx].

  • If a task runs on an exclusive resource group for Data Integration, the log contains a message similar to this: running in Pipeline[basecommon_S_res_group_xxx].
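The three log patterns above can be told apart with a short script when you scan many task logs. This is a minimal sketch that relies only on the sample messages shown in the list. Real log lines may differ, and the function name `classify_resource_group` is hypothetical.

```python
import re

# Captures the text inside "running in Pipeline[...]".
PIPELINE_RE = re.compile(r"running in Pipeline\[([^\]]+)\]")


def classify_resource_group(log_line):
    """Return 'exclusive', 'default', 'custom', or None if no pattern matches."""
    match = PIPELINE_RE.search(log_line)
    if match is None:
        return None
    name = match.group(1)
    # Check the most specific prefix first.
    if name.startswith("basecommon_S_res_group_"):
        return "exclusive"
    # The sample message for the default group contains optional whitespace before "group_".
    if re.match(r"basecommon_\s*group_", name):
        return "default"
    if name.startswith("basecommon_"):
        return "custom"
    return None


for line in (
    "running in Pipeline[basecommon_ group_xxxxxxxxx]",
    "running in Pipeline[basecommon_xxxxxxxxx]",
    "running in Pipeline[basecommon_S_res_group_xxx]",
):
    print(classify_resource_group(line))
```

The order of the checks matters, because every pattern starts with `basecommon_`. A custom resource group whose identifier happens to begin with `group_` would be reported as the default group, so treat the result as a hint.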

Change the resource group for a task

  • To change the scheduling resource group and the resource group for a Data Integration task in Operation Center, go to the Operation Center > Scheduled Task O&M > Scheduled Tasks page. Select the tasks that you want to modify, and then click Modify Scheduling Resource Group or Modify Data Integration Resource Group at the bottom of the page to make batch changes.

  • To change the resource group for production tasks in DataStudio, modify the task by using the methods described below, and then deploy the task.

    Note

    When using these methods to change a resource group, you must deploy the task for the changes to apply. In standard mode, committed changes apply only to the development environment. The changes apply to scheduled tasks in the production environment only after you deploy the task. You can then verify the update on the Scheduled Tasks page in Operation Center.

    1. To change the scheduling resource group, open the Schedule panel for the task node. In the Resource Properties section, click the Scheduling Resource Group drop-down list and select the target resource group, such as shared scheduling resource group.

    2. To change the resource group for a Data Integration task, click the Configure Resource Group for Data Integration tab in the right-side pane. From the Exclusive resource group for Data Integration drop-down list, select the target resource group, such as xiangcui_vpc.

Troubleshoot custom resource group gateway issues

Log on to the DataWorks console. In the left-side navigation pane, click Resource Groups and go to the Custom Resource Groups tab. Click Server Management for the task's scheduling resource group and check whether the server is stopped or occupied by other tasks.

If the preceding steps do not resolve the issue, run the following commands on the server to switch to the admin user and restart the service:

su - admin
/home/admin/alisatasknode/target/alisatasknode/bin/serverctl restart

Find the public IP of a resource group

To synchronize data over the Internet using an exclusive resource group for Data Integration, you must add the EIP address of the resource group to your database IP address whitelist. To obtain the EIP address, follow these steps:

In the DataWorks console, go to the Resource Group page. On the Exclusive Resource Groups tab, click Details next to the Data Integration resource group. In the Basic Information section, copy the EIP Address and add it to your database's IP address whitelist.

Troubleshoot 'insufficient resources' errors

Check the resource group details. This error usually occurs when the remaining resources are insufficient to start a new task, for example, if other tasks are queued for the resource group.

Serverless resource group VPC binding quota

Each Serverless resource group can be associated with a maximum of two VPCs.

Log on to the DataWorks console. On the Resource Groups page, find the target Serverless resource group and click Details. In the Basic Information section of the resource group details page, the VPCs That Can Be Associated field shows the current quota usage in the format Bound: X, Remaining: Y.

When the quota of two VPCs is full (Bound: 2, Remaining: 0), you cannot add more associations. You can resolve this issue in one of the following ways:

  • Deploy the data source in a VPC that is already bound to the resource group.

  • Use Cloud Enterprise Network (CEN) to establish cross-VPC connectivity. Add a bound VPC and the target VPC to the same CEN instance to connect the networks. This allows you to access data sources in the target VPC without consuming the VPC binding quota.

VPC bindings in a resource group's network settings are for two purposes: data service and data scheduling. The two-VPC quota is a total limit for the resource group and is shared by both purposes.
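If you record the quota field for several resource groups, the "Bound: X, Remaining: Y" format described above is easy to parse. The following sketch assumes the text is copied manually from the VPCs That Can Be Associated field. The function names `parse_vpc_quota` and `can_bind_another_vpc` are hypothetical and do not call any DataWorks API.

```python
import re

# Matches the console format "Bound: X, Remaining: Y".
QUOTA_RE = re.compile(r"Bound:\s*(\d+),\s*Remaining:\s*(\d+)")


def parse_vpc_quota(text):
    """Return (bound, remaining) parsed from the quota field text."""
    match = QUOTA_RE.search(text)
    if match is None:
        raise ValueError("unexpected quota format: %r" % text)
    return int(match.group(1)), int(match.group(2))


def can_bind_another_vpc(text):
    """Return True if at least one VPC association is still available."""
    _, remaining = parse_vpc_quota(text)
    return remaining > 0


# Sample value taken from the full-quota case described above.
print(can_bind_another_vpc("Bound: 2, Remaining: 0"))
```

When the result is False, reuse a bound VPC or connect the target VPC through CEN instead of adding a new association.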