Configure a PolarDB data source

When you synchronize data from PolarDB to Hologres in real time, PolarDB is the source and Hologres is the destination. Before you run a synchronization task, you must configure the network, whitelist, and other settings for the data source as described in this topic. These configurations establish network connectivity and grant the account permissions that data synchronization requires.

Prerequisites

Before you configure the data source, complete the following preparations:
  • Data source preparation: You have a PolarDB for MySQL instance to use as the source data source and a Hologres instance to use as the destination data source. This topic uses an Alibaba Cloud PolarDB for MySQL instance as an example.
  • Resource planning and preparation: You have configured an exclusive resource group for data integration. For more information, see Resource planning and configuration.
  • Network environment evaluation and planning: You must establish a network connection between your data sources and the exclusive resource group for data integration. After the connection is established, configure network access settings, such as vSwitches and whitelists.
    • If the data sources and the exclusive resource group for data integration are in the same region and the same Virtual Private Cloud (VPC), the network is connected by default.
    • If the data sources and the exclusive resource group are in different network environments, connect them by using a VPN gateway or a similar method.

Background

When you synchronize data, the source and destination data sources must be connected to the DataWorks exclusive resource group for data integration, and the account used must have the required permissions.
  • Network whitelist
    The following example assumes that you use a single VPC environment. You must add the CIDR block of the exclusive resource group for data integration to the data source whitelist to ensure that the resource group can access the data source.
  • Account permissions

    You need an account that can access the data source. DataWorks uses this account to read data from the data source.

  • Other access restrictions

    If the source is an Alibaba Cloud PolarDB for MySQL instance, you must enable the binary log, because real-time synchronization reads incremental changes from it. PolarDB for MySQL is a cloud-native database that is fully compatible with MySQL, but by default it relies on physical logs rather than the binary log. For compatibility with tools in the MySQL ecosystem, PolarDB lets you enable the binary log.
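To check whether the binary log is already enabled, you can run a query such as the following. This is a minimal sketch. It assumes that a MySQL client is connected to the primary node of the cluster, and the exact set of returned variables may differ between PolarDB versions.

```sql
-- Run on the primary (read/write) node of the PolarDB for MySQL cluster.
-- Value = ON:  binary logging is enabled.
-- Value = OFF: binary logging still needs to be enabled (see the Procedure section).
SHOW VARIABLES LIKE 'log_bin';
```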

Limits

  • You can synchronize data only from PolarDB for MySQL data sources. Other PolarDB types are not supported. In this topic, PolarDB refers to a PolarDB for MySQL data source.
  • For real-time synchronization with PolarDB, you must use the primary node (read/write node).
  • XA ROLLBACK is not supported. Real-time synchronization replicates the data of an XA transaction to the destination as soon as the transaction is prepared with XA PREPARE. If the transaction is later rolled back with XA ROLLBACK, the data that was already replicated to the destination is not rolled back. To recover from an XA ROLLBACK, manually remove the affected tables from the real-time synchronization task, add them again, and run a new full data initialization followed by incremental real-time synchronization.
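The following sequence illustrates the unsupported case described above. It is a sketch only. The table `demo_orders` and the transaction ID `'demo_xid'` are hypothetical. The comments restate the behavior described in this limit and do not reflect separately verified internals.

```sql
-- Hypothetical table and XA transaction ID, for illustration only.
XA START 'demo_xid';
INSERT INTO demo_orders (id, amount) VALUES (1, 100);
XA END 'demo_xid';
-- Once the transaction is prepared, real-time synchronization can
-- replicate the INSERT to the destination.
XA PREPARE 'demo_xid';
-- The source discards the prepared change, but the row with id = 1
-- is not removed from the destination.
XA ROLLBACK 'demo_xid';
```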

Procedure

  1. Configure a whitelist.
    Add the IP addresses and CIDR blocks associated with the exclusive resource group for data integration to the PolarDB cluster whitelist.
    1. View and record network information.
      1. Log on to the DataWorks console.
      2. In the left-side navigation pane, click Resource Groups.
      3. On the Exclusive Resource Groups tab, find the target data integration resource group and click View Information in the Actions column.
      4. In the dialog box that appears, copy the EIPAddress and CIDR Blocks.
      5. On the Exclusive Resource Groups tab, find the target data integration resource group and click Network Settings in the Actions column.
      6. On the VPC Binding tab, view and record the CIDR block of the vSwitch. This CIDR block must also be added to the database whitelist.
    2. Add the IP addresses to the whitelist.
      Log on to the PolarDB console and go to the details page of the target cluster. In the left-side navigation pane, choose Configurations and Management > Cluster Whitelist. Click Add Whitelist Group and add the EIP, the CIDR blocks, and the vSwitch CIDR block that you recorded to the whitelist. For more information, see Set a whitelist.
  2. Create an account and configure permissions.
    You must create a database account for data synchronization. This account requires the SELECT, REPLICATION SLAVE, and REPLICATION CLIENT permissions.
    1. Create an account.
      For more information, see Create a database account.
    2. Grant permissions.
      You can run the following command to grant the required permissions to the account. Alternatively, you can directly grant the SUPER privilege to the account.
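      -- Optional: create the account with SQL if you did not create it in the console.
      -- CREATE USER 'sync_account'@'%' IDENTIFIED BY 'sync_account_password';
      -- Grant the privileges that real-time synchronization requires.
      GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'sync_account'@'%';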
  3. Enable the binary log.
    For more information, see Enable binary logging.
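After you finish the steps above, you can optionally verify the account privileges and the binary log settings by connecting to the primary node as the synchronization account. This is a sketch under stated assumptions. `sync_account` is the example account name from the GRANT statement. The expectation that `binlog_format` is `ROW` is an assumption based on how binary-log-based change capture typically works, so confirm the exact requirement in the DataWorks documentation.

```sql
-- Connect to the primary node as the synchronization account (for example, sync_account).

-- 1. List the privileges of the current account. The output should include
--    SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.*
SHOW GRANTS FOR CURRENT_USER();

-- 2. List the binary log files. This statement needs REPLICATION CLIENT
--    and returns an error if binary logging is not enabled.
SHOW BINARY LOGS;

-- 3. Check the binary log format (ROW is assumed to be the expected value).
SHOW VARIABLES LIKE 'binlog_format';
```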

Next steps

After you complete these configurations, the source data source, the exclusive resource group for data integration, and the destination data source can communicate over the network, and the synchronization account has the required permissions. You can then add the data sources in DataWorks so that you can select them when you create a data synchronization solution.

For more information about how to add a data source, see Add a data source.