Configure a MySQL data source

更新时间:
复制 MD 格式

When you synchronize data from MySQL to Hologres, you must configure the network, whitelist, and permissions for the data source to set up the required network environment and account permissions for data synchronization tasks.

Prerequisites

Before you configure the data source, ensure that you have completed the following preparations:
  • Prepare data sources: You have purchased the source MySQL data source and the destination Hologres data source.
  • Plan and prepare resources: You have purchased and configured an exclusive resource group for Data Integration. For more information, see Plan and configure resources.
  • Evaluate and plan the network environment: Before you integrate data, you must establish a network connection between the data source and the exclusive resource group for Data Integration based on your business requirements. After the network is connected, refer to this article to configure access settings, such as vSwitches and the whitelist.
    • If the data source and the exclusive resource group for Data Integration are in the same VPC in the same region, they are connected by default.
    • If the data source and the exclusive resource group for Data Integration are in different network environments, you must connect them by using methods such as a VPN gateway.
  • Check whether your database version is MySQL5.x or8.x. You can run the following statement to check the version.
    select version();
    Note Real-time data synchronization from MySQL in DataWorks uses real-time subscriptions to MySQL. This feature supports only ApsaraDB RDS for MySQL instances that run MySQL5.x or8.x. PolarDB-X 1.0 is not supported. If your current database is not an ApsaraDB RDS for MySQL instance that runs version5.x or8.x5.x8.x, switch to a supported version. Otherwise, data integration tasks will fail.

Background information

When you synchronize data from a source data source to a destination data source, you must ensure that a network connection is established between the data source and the exclusive resource group for Data Integration and that account permissions do not restrict access.
  • Network whitelist
    The following example assumes that you are using a single VPC. You must add the CIDR block of the VPC where the exclusive resource group for Data Integration is located to the whitelist to ensure that the resource group can access the data source.网络联通vpc
  • Account permissions

    You need to prepare an account to access the data source. This account is used to perform read and write operations during data synchronization.

  • Other access requirements

    When the source data source is MySQL, you must enable the binary log. The binary log records all changes to database table structures, such as CREATE and ALTER operations, and all changes to table data, such as INSERT, UPDATE, and DELETE operations. You can use the binary log to view the change history of a database, perform incremental database backups and restores, and implement primary-replica replication.

    The binary log has the following formats:
    • Statement: statement-based replication. The binary log stores each SQL statement that is used to modify data.
    • Row: row-based replication. The binary log does not store the context of SQL statements. It only stores the modified rows.
    • Mixed: mixed-mode replication, which combines the Statement and Row formats. By default, Statement format is used for general statements, such as functions. If a replication operation cannot be completed in Statement format, the Row format is used to store the binary log. MySQL automatically selects a format based on each executed SQL statement.

Limits

  • Real-time data synchronization from MySQL in DataWorks uses real-time subscriptions to MySQL. This feature supports only ApsaraDB RDS for MySQL instances that run MySQL5.x or8.x. Do not configure a PolarDB-X 1.0 instance as a MySQL data source. Instead, you can directly configure it as a PolarDB-X 1.0 data source. For more information, see Configure a PolarDB-X 1.0 data source.
  • XA ROLLBACK is not supported. For data in transactions prepared by XA PREPARE, real-time synchronization writes the data to the destination. If an XA ROLLBACK is performed, real-time synchronization does not roll back the prepared data. To handle XA ROLLBACK scenarios, you must manually remove the table affected by the XA ROLLBACK from the real-time synchronization task, add the table again, and then perform a full data initialization and an incremental data synchronization.
  • If you add a MySQL instance that belongs to another Alibaba Cloud account as a data source, synchronization tasks that use this data source can run only on an exclusive resource group for Data Integration. You cannot use a shared resource group for Data Integration to access this data source.

Procedure

  1. Configure a whitelist.
    Add the CIDR block of the VPC where your exclusive resource group for Data Integration is located to the whitelist of MySQL.
    1. View and record the VPC information about the exclusive resource group for Data Integration.
      1. Log on to the DataWorks console.
      2. In the left-side navigation pane, click Resource Groups.
      3. On the Exclusive Resource Groups tab, find the target resource group and click View information in the Actions column.
      4. In the dialog box that appears, copy the elastic IP address (EIP) and CIDR block to the database whitelist.
      5. On the Exclusive Resource Groups tab, find the target resource group and click Network settings in the Actions column.
      6. On the VPC binding tab, view the CIDR block of the vSwitch and add it to the database whitelist.
    2. Add the elastic IP address (EIP) and CIDR block of the exclusive resource group for Data Integration that you recorded to the whitelist of the MySQL cluster.
  2. Create an account and grant permissions to the account.
    You need to prepare a database logon account. This account must have the SELECT, REPLICATION SLAVE, and REPLICATION CLIENT permissions.
    1. Create an account.
      For more information, see Create a MySQL account.
    2. Grant permissions.
      You can run the following statements to grant permissions to the account, or grant theSUPER permission to the account. When you run the following statements, replace 'sync_account' with the account that you created.
      -- Create a synchronization account, set a password for the account, and allow the account to log on to the database from any host. The percentage sign (%) specifies any host.
      CREATE USER 'sync_account'@'%' IDENTIFIED BY 'password'; 
      -- Grant the SELECT, REPLICATION SLAVE, and REPLICATION CLIENT permissions on the database to the synchronization account.
      GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'sync_account'@'%'; 
      The*.* syntax indicates that the synchronization account is granted the specified permissions on all tables in all databases. You can also grant the permissions on a specified table in a specified database to the synchronization account. For example, to grant the permissions on theuser table in thetest database to the synchronization account, you can run theGRANT SELECT, REPLICATION CLIENT ON test.user TO 'sync_account'@'%'; statement.
      Note TheREPLICATION SLAVE permission is a global permission and cannot be granted on a specified table of a specified database.
  3. Enable the binary log for MySQL.
    Check whether the binary log is enabled and query the binary log format.
    • Run the following statement to check whether the binary log is enabled.
      show variables like "log_bin";

      IfON is returned, the binary log is enabled.

    • If you use a standby database to synchronize data, you can also run the following statement to check whether the binary log is enabled.
      show variables like "log_slave_updates";

      IfON is returned, the binary log is enabled for the standby database.

    If a different result is returned, see theofficial MySQL documentation to enable the binary log.
    Run the following statement to query the format of the binary log.
    show variables like "binlog_format";
    The following table describes the returned results.
    • IfROW is returned, the binary log format isROW.
    • IfSTATEMENT is returned, the binary log format isSTATEMENT.
    • IfMIXED is returned, the binary log format isMIXED.

Next steps

After you configure the data source, a network connection is established between the source data source, the resource instance, and the destination data source, and access restrictions are removed. You can then add the source and destination data sources to DataWorks, which allows you to associate them when you create a data synchronization solution.

For more information about how to add a data source, see Add a data source.