Solution 5: Network connectivity for an on-premises data source

This topic uses a MySQL database deployed in an on-premises data center as an example to explain how to establish network connectivity between your data source and DataWorks.

Use cases

This solution is recommended if your data source meets the following condition.

  • The data source is deployed in an on-premises data center.

How it works

If your data source is deployed in an on-premises data center, we recommend a VPC connection. Use Express Connect to connect the on-premises network to the VPC bound to the resource group of your DataWorks workspace. This establishes network connectivity between the data source and DataWorks.

Network connectivity diagram

[Figure: network connectivity diagram]

Prerequisites

Billing

This solution uses the paid Express Connect service. For more information, see Billing of Express Connect.

Configure network connectivity

Step 1: Obtain basic information

Data source side

  • On-premises data center CIDR block

    You can connect to your on-premises data center server to obtain the CIDR block, or contact your network administrator or data center provider.
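If you have shell access to a Linux server in the data center, its directly connected subnets are a reasonable starting point. The sketch below assumes the iproute2 `ip` command is installed; the function name `connected_cidrs` is illustrative and not part of any product. A single server only sees its own subnets, so the data center's full CIDR block may be wider. Confirm the result with your network administrator.

```shell
# Print "<CIDR> <interface>" for each directly connected IPv4 subnet.
# Reads `ip -4 route show` output on stdin, so it also works on saved output.
connected_cidrs() {
  awk '/scope link/ && $2 == "dev" { print $1, $3 }'
}

# Query the live routing table only when the `ip` command exists.
if command -v ip >/dev/null 2>&1; then
  ip -4 route show | connected_cidrs
fi
```

Entries for virtual interfaces (for example, container bridges) may also appear and can be ignored.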

DataWorks side

  • VPC and vSwitch information of the bound resource group

    1. Go to the Resource Groups page in the DataWorks console. Find the target resource group and click Network Settings in the Operation column.

    2. In the section for the relevant feature module, view the bound VPC and the vSwitch CIDR Block.

      For example, if you need to connect a MySQL database deployed in your on-premises data center to DataWorks for data synchronization, note the bound VPC and its vSwitch CIDR Block in the section used for data synchronization.

Step 2: Establish network connectivity

You can use a dedicated physical connection to connect your on-premises data center to Alibaba Cloud. This establishes network connectivity between the VPC and your on-premises data center.

Note

If you encounter issues while establishing the network connection, submit a ticket to contact technical support for the relevant product.

Step 3: Add a resource group route

To allow DataWorks to access the on-premises data source, you must add a custom route in the DataWorks resource group for the CIDR block of your on-premises data center.

  1. Go to the Resource Groups page in the DataWorks console. Find the target resource group and click Network Settings in the Operation column.

  2. In the section for the relevant feature module, find the bound VPC and click Custom Route in the Operation column.

  3. Click Add Route and set Destination CIDR Block to the CIDR block of your on-premises data center.
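The custom route only helps if the database server's IP address actually falls inside the destination CIDR block you entered. The following sketch checks this with plain shell arithmetic. The function names and the sample addresses are hypothetical; substitute your own database IP address and CIDR block.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  IFS=. read -r o1 o2 o3 o4 <<EOF
$1
EOF
  echo $(( (o1 << 24) + (o2 << 16) + (o3 << 8) + o4 ))
}

# Return success (exit status 0) if IPv4 address $1 lies inside CIDR block $2.
ip_in_cidr() {
  prefix=${2#*/}
  net=$(ip_to_int "${2%/*}")
  addr=$(ip_to_int "$1")
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# Hypothetical values: database host 10.10.1.25, destination CIDR block 10.10.0.0/16.
if ip_in_cidr 10.10.1.25 10.10.0.0/16; then
  echo "covered by the route"
else
  echo "NOT covered by the route"
fi
```

With the sample values, the check prints "covered by the route". If your own values report that the address is not covered, widen the destination CIDR block or add another route.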

Step 4: (Optional) Configure an IP allowlist

If your data source uses an IP allowlist for access control, you must add the vSwitch CIDR block bound to the resource group to the IP allowlist.

This topic uses MySQL as an example to show how to configure an IP allowlist that grants a user access to the database only from the resource group's vSwitch CIDR block.

  1. Log on to the database as an administrator.

  2. Create a user account for DataWorks to access the data source and grant the required permissions.

    -- Replace <vswitch-cidr-block> with the vSwitch CIDR block bound to the resource group.
    -- The username "dataworks_user" and the password "StrongPassword123!" are examples. Use your own values.
    CREATE USER 'dataworks_user'@'<vswitch-cidr-block>' IDENTIFIED BY 'StrongPassword123!';
    -- Grant the user access to a specific database (mydatabase in this example) from the vSwitch CIDR block.
    GRANT ALL PRIVILEGES ON mydatabase.* TO 'dataworks_user'@'<vswitch-cidr-block>' WITH GRANT OPTION;

  3. Run FLUSH PRIVILEGES; to refresh privileges, and then run exit to leave the database session.
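MySQL 8.0.23 and later accept CIDR notation in the host part of an account name. Earlier versions expect netmask notation instead, such as 198.51.100.0/255.255.255.0. If you run an older server, the following sketch converts a CIDR block to that form. The function name `cidr_to_netmask_host` and the sample CIDR block are illustrative assumptions.

```shell
# Convert "a.b.c.d/prefix" to "a.b.c.d/netmask" for use as a MySQL account host.
cidr_to_netmask_host() {
  base=${1%/*}
  prefix=${1#*/}
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  printf '%s/%d.%d.%d.%d\n' "$base" \
    $(( (mask >> 24) & 255 )) $(( (mask >> 16) & 255 )) \
    $(( (mask >> 8) & 255 )) $(( mask & 255 ))
}

# Example with a hypothetical vSwitch CIDR block:
cidr_to_netmask_host 192.168.0.0/24   # prints 192.168.0.0/255.255.255.0
```

Use the printed value in place of the CIDR block in the CREATE USER and GRANT statements.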

Step 5: (Optional) Configure the on-premises firewall

Firewall configurations vary by software. This topic uses firewalld as an example. Adapt these steps for your specific firewall software.

Allow the vSwitch CIDR block bound to the resource group to access the MySQL database:

sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="<vswitch-cidr-block>" port port="3306" protocol="tcp" accept'
sudo firewall-cmd --reload
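After the reload, you may want to confirm that the rule is active. The sketch below builds the same rich rule text and passes it to `firewall-cmd --query-rich-rule`, which checks the default zone unless you add `--zone`. The helper names `mysql_rich_rule` and `check_mysql_rule` are hypothetical.

```shell
# Build the rich rule text used above for a given source CIDR block and port.
mysql_rich_rule() {
  printf 'rule family="ipv4" source address="%s" port port="%s" protocol="tcp" accept' "$1" "$2"
}

# Ask firewalld whether the rule is active in the runtime configuration.
# firewall-cmd prints "yes" or "no" and sets its exit status accordingly.
check_mysql_rule() {
  sudo firewall-cmd --query-rich-rule="$(mysql_rich_rule "$1" "${2:-3306}")"
}

# Usage, with your own vSwitch CIDR block substituted:
#   check_mysql_rule "<vswitch-cidr-block>"
#   sudo firewall-cmd --list-rich-rules
```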

Verify network connectivity

  1. Log on to the DataWorks console. In the target region, click Data Integration > Data Integration in the left-side navigation pane. Select a workspace from the drop-down list and click Go to Data Integration.

  2. In the left-side navigation pane, click Data Sources. On the data source page, click Add Data Source, select a data source type, and configure the connection parameters.

  3. In the resource group list at the bottom of the Connection Configuration page, select the resource group that is connected to the data source (such as Serverless_Resource_Group), click Test Connectivity, and confirm that the connectivity status is displayed as Connected.

    Note

    If the connectivity test returns Failed, you can use the Network Connectivity Diagnostic Tool to troubleshoot the issue. If the issue persists, submit a ticket for technical support.
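As a rough supplementary check, you can probe the database port from any host whose IP address lies in the vSwitch CIDR block, for example an ECS instance in the same vSwitch (an assumption, since this topic does not describe such a host). The sketch below relies on bash's `/dev/tcp` feature and the coreutils `timeout` command; the function name `tcp_port_open` is illustrative. A successful probe only shows that the route and firewall path work from that host. It does not replace the Test Connectivity result in DataWorks.

```shell
# Return success if a TCP connection to host $1, port $2 opens within 3 seconds.
# Uses bash's /dev/tcp pseudo-device, so it needs bash and coreutils `timeout`.
tcp_port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage, with your own database address and port substituted:
#   if tcp_port_open <database-ip> 3306; then echo "port reachable"; else echo "port unreachable"; fi
```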