Configure a change tracking task

The change tracking feature of Data Transmission Service (DTS) captures real-time incremental data from your databases. You can consume this data for various business needs, such as updating caches, decoupling asynchronous services, synchronizing data between heterogeneous sources, and performing complex extract, transform, and load (ETL) operations. This topic describes how to create a change tracking task for an ApsaraDB RDS for MySQL instance by using a dedicated cluster.

Prerequisites

Considerations

Type

Description

Source database limitations

  • Tables that you want to track must have a primary key or a unique constraint, and the values in the constrained columns must be unique. Otherwise, duplicate data may be tracked.

  • If you track data at the table level, a single change tracking task supports a maximum of 500 tables. If you exceed this limit, an error is reported when you submit the task. In this case, we recommend that you either split the tables into multiple tasks or configure a single task to track the entire database.

  • Binlog:

    • Binary logging is enabled by default for ApsaraDB RDS for MySQL instances. Make sure that the binlog_row_image parameter of the instance is set to full. Otherwise, the precheck fails and the task cannot start.

    • The local Binlog of the ApsaraDB RDS for MySQL instance must be retained for at least 3 days (7 days recommended). For self-managed MySQL databases, the local Binlog must be retained for at least 7 days. If the retention period is shorter, the task may fail because DTS cannot obtain the Binlog. In extreme cases, this can lead to data inconsistency or loss. Issues caused by a Binlog retention period that is shorter than required are not covered by the DTS Service Level Agreement (SLA).

      Note
      • For an ApsaraDB RDS for MySQL instance, the local log retention period is the same as the Binlog retention period.

      • For a self-managed MySQL database, the expire_logs_days parameter (or binlog_expire_logs_seconds in MySQL 8.0 and later) in the configuration file (typically my.cnf or my.ini) specifies the Binlog retention period.

  • If the source instance is a read-only instance or a temporary instance, ensure that it records transaction logs.

    Note
    • In the ApsaraDB RDS console, you can check whether an ApsaraDB RDS for MySQL instance is a read-only instance or a temporary instance.

    • For a self-managed MySQL database, run the SHOW VARIABLES LIKE 'read_only'; statement. If the returned value is ON, the database is read-only.
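For a self-managed MySQL database, the Binlog requirements above can be checked before you submit the task. The following is a minimal sketch and is not part of DTS. It assumes that you have already collected the output of SHOW VARIABLES into a dictionary. The function name check_binlog_settings is hypothetical, and the variable names log_bin, expire_logs_days, and binlog_expire_logs_seconds are standard MySQL system variables rather than names taken from this topic.

```python
SECONDS_PER_DAY = 86400


def check_binlog_settings(variables, min_retention_days=7):
    """Return a list of problems found in a dict of MySQL system variables.

    `variables` maps variable names to string values, as returned by
    SHOW VARIABLES. Pass min_retention_days=3 to apply the minimum that
    this topic states for ApsaraDB RDS for MySQL instances.
    """
    problems = []

    # Assumption: log_bin reports whether binary logging is enabled.
    if variables.get("log_bin", "").upper() != "ON":
        problems.append("binary logging is not enabled")

    if variables.get("binlog_row_image", "").upper() != "FULL":
        problems.append("binlog_row_image is not FULL")

    # Simplification: use binlog_expire_logs_seconds when it is non-zero,
    # otherwise fall back to expire_logs_days. A value of 0 is treated as
    # "logs are never purged automatically".
    seconds = int(variables.get("binlog_expire_logs_seconds") or 0)
    days = int(variables.get("expire_logs_days") or 0)
    retention_days = seconds / SECONDS_PER_DAY if seconds > 0 else days
    if 0 < retention_days < min_retention_days:
        problems.append(
            f"binlog retention is {retention_days:g} days, "
            f"below the required {min_retention_days}"
        )

    return problems
```

An empty list means that none of the checked settings conflicts with the limits listed above. The DTS precheck remains the authoritative check.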

Other limitations

  • Verify that the tracking precision for columns of the FLOAT or DOUBLE data type meets your business requirements. DTS reads values in these columns by using ROUND(COLUMN,PRECISION). If no precision is defined, DTS tracks FLOAT values with a precision of 38 digits and DOUBLE values with a precision of 308 digits.

  • Online DDL changes made by using tools such as pt-online-schema-change are not tracked. This may cause write failures on the destination if the schema becomes inconsistent.

  • If a single row of data exceeds 16 MB, it cannot be consumed. This may cause an out-of-memory (OOM) error in the consumer client.

  • If a task fails, DTS support staff will attempt to restore it within eight hours. During restoration, they may restart the task or adjust its parameters.

    Note

    Adjusting parameters modifies only DTS instance parameters and does not affect your database parameters.
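One way to judge whether a column's precision meets your requirements is to test sample values against it. The sketch below is an illustration under stated assumptions: the sample values are exported as decimal strings, the function names are hypothetical, and the code does not reproduce how DTS rounds values. It only reports which values have more decimal places than a given precision would keep.

```python
from decimal import Decimal


def decimal_places(value):
    """Count the significant decimal places of a decimal string."""
    _sign, digits, exponent = Decimal(value).as_tuple()
    if not isinstance(exponent, int):
        raise ValueError(f"not a finite number: {value!r}")
    if not any(digits):
        return 0  # any spelling of zero needs no decimal places
    places = max(0, -exponent)
    # Trailing zeros after the decimal point carry no information.
    for digit in reversed(digits):
        if places == 0 or digit != 0:
            break
        places -= 1
    return places


def values_losing_precision(samples, precision):
    """Return the samples that would change if rounded to `precision` places."""
    return [s for s in samples if decimal_places(s) > precision]
```

For example, values_losing_precision(["3.14159", "2.50", "100"], 2) flags only "3.14159", which suggests that a precision of 2 would be too low for that column.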

Procedure

  1. Go to the DTS dedicated clusters page.

  2. To the right of the Workbench, select the region where the dedicated cluster is located.

    Note

    If you are logged in to the Data Management Service (DMS) console, select the region to the right of Cluster Task.

  3. Find the target dedicated cluster and in the Actions column, choose Configure Task > Configure Change Tracking Task.

  4. Configure the Source Database and Consumer Network Type.

    Warning

    After you specify the source database instance, read the Limits that are displayed in the upper part of the page. Otherwise, the task may fail or you may be unable to consume the tracked data.

    Section

    Parameter

    Description

    Task Name

    N/A

    The name of the change tracking task. DTS automatically generates a name. We recommend that you specify a descriptive name that makes the task easy to identify. The name does not need to be unique.

    Source Database

    Select Existing Connection

    You can choose whether to use an existing instance, as needed.

    • If you use an existing instance, the database information below is automatically filled in. You do not need to enter it again.

    • If you do not use an existing instance, you must enter the database information below.

    Database Type

    Select MySQL.

    Access Method

    Select Alibaba Cloud Instance.

    Instance Region

    The region where the source ApsaraDB RDS for MySQL instance is located. You specify this when you create the dedicated cluster, and it cannot be changed.

    Replicate Data Across Alibaba Cloud Accounts

    For this example, select No.

    RDS Instance ID

    Select the ID of the source ApsaraDB RDS for MySQL instance.

    Database Account

    Enter the database account of the ApsaraDB RDS for MySQL instance. You can use a read-only account or a custom account with the REPLICATION CLIENT, REPLICATION SLAVE, SHOW VIEW, and SELECT permissions.

    Database Password

    The password that is used to access the database instance.

    Encryption

    Specifies whether to encrypt the connection to the database instance. Select Non-encrypted or SSL-encrypted based on your requirements. If you select SSL-encrypted, you must enable SSL encryption for the ApsaraDB RDS for MySQL instance before you configure the change tracking task. For more information, see Configure the SSL encryption feature.

    Consumer Network Type

    Network Type

    Only Virtual Private Cloud (VPC) is supported for the consumer network type. Select the required VPC and vSwitch.

    Note

    The network type cannot be changed after the task is configured. You must use the same network type to consume the tracked data.
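The permissions listed for the Database Account parameter can be checked with a small helper before you enter the account. This is a sketch under assumptions: the names REQUIRED_PRIVILEGES, missing_privileges, and grant_statement are hypothetical, and the account name dts_tracker in the usage note is a placeholder. The generated statement uses standard MySQL GRANT syntax at the global level. On ApsaraDB RDS for MySQL, accounts are often managed in the console instead.

```python
REQUIRED_PRIVILEGES = {"REPLICATION CLIENT", "REPLICATION SLAVE", "SHOW VIEW", "SELECT"}


def missing_privileges(granted):
    """Return the required privileges that are absent from `granted`."""
    normalized = {p.strip().upper() for p in granted}
    if "ALL PRIVILEGES" in normalized:
        return set()
    return REQUIRED_PRIVILEGES - normalized


def grant_statement(user, host="%"):
    """Build a GRANT statement for the privileges that a custom account needs."""
    privileges = ", ".join(sorted(REQUIRED_PRIVILEGES))
    return f"GRANT {privileges} ON *.* TO '{user}'@'{host}';"
```

For example, missing_privileges({"select", "show view"}) returns the two replication privileges, and grant_statement("dts_tracker") produces a statement that a database administrator can review and run.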

  5. In the lower part of the page, click Test Connectivity and Proceed.

    How the CIDR blocks of DTS servers are added depends on the type of the source database instance:

    • Alibaba Cloud database instance, such as an ApsaraDB RDS for MySQL or ApsaraDB for MongoDB instance: DTS automatically adds the CIDR blocks of DTS servers in the corresponding region to the whitelist of the instance.

    • Self-managed database hosted on an ECS instance: DTS automatically adds the CIDR blocks of DTS servers in the corresponding region to the security group rules of the ECS instance. To allow DTS to access the database, you must also manually add the CIDR blocks of DTS servers in the corresponding region to the security settings of the database.

    • Self-managed database that is deployed in a data center or provided by a third-party cloud service provider: You must manually add the CIDR blocks of DTS servers in the corresponding region to the security settings of the database to allow DTS to access the database.

    For more information, see the CIDR blocks of DTS servers section of the DTS server IP whitelist topic.

    Warning

    If the public CIDR blocks of DTS servers are automatically or manually added to the whitelist of a database instance or to the security group rules of an ECS instance, security risks may arise. Therefore, before you use DTS to track data changes, you must understand and acknowledge the potential risks and take preventive measures, including but not limited to the following measures: enhancing the security of your username and password, limiting the ports that are exposed, authenticating API calls, regularly checking the whitelist or security group rules and forbidding unauthorized CIDR blocks, or connecting the database instance to DTS by using Express Connect, VPN Gateway, or Smart Access Gateway.
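Regularly checking the whitelist, as recommended above, can be partly automated. The following sketch uses the Python standard library ipaddress module to flag whitelist entries that are not contained in a list of expected CIDR blocks. The function name unauthorized_entries is hypothetical, and the addresses in the usage note are reserved documentation ranges, not real DTS server CIDR blocks. Take the actual blocks from the DTS server IP whitelist topic.

```python
import ipaddress


def unauthorized_entries(whitelist, allowed_blocks):
    """Return whitelist entries that are not inside any allowed CIDR block."""
    allowed = [ipaddress.ip_network(block) for block in allowed_blocks]
    flagged = []
    for entry in whitelist:
        # A bare IP address is treated as a single-host network.
        network = ipaddress.ip_network(entry, strict=False)
        covered = any(
            network.version == block.version and network.subnet_of(block)
            for block in allowed
        )
        if not covered:
            flagged.append(entry)
    return flagged
```

For example, with allowed blocks ["192.0.2.0/24", "10.0.0.0/8"], the whitelist ["192.0.2.10", "10.1.0.0/16", "203.0.113.5", "0.0.0.0/0"] yields ["203.0.113.5", "0.0.0.0/0"] as entries to review.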

  6. Configure the task objects.

    1. On the Configure Objects page, specify the objects to track.

      Parameter

      Description

      Data Change Types

      Data Change Types are selected by default and cannot be modified.

      • Data Update

        DTS tracks data updates of the selected objects, including the INSERT, DELETE, and UPDATE operations.

      • Schema Update

        DTS tracks the create, delete, and modify operations that are performed on all object schemas of the source instance. You must use the change tracking client to filter the data to be tracked.

      Source Objects

      Select one or more objects from the Source Objects section and click the Right arrow icon to add the objects to the Selected Objects section.

      Note

      You can select tables or databases as the objects for change tracking.

      • If you select a database as the object, DTS tracks incremental data of all objects, including new objects in the database.

      • If you select a table as the object, DTS tracks only incremental data of this table. In this case, if you want to track data changes of another table, you must add the table to the object list. For more information, see Modify the objects for change tracking.
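When you select tables rather than a whole database, the limit of 500 tables per task stated under Considerations applies. A minimal sketch of splitting a longer table list into groups that each fit into one task is shown below. The constant and function names are hypothetical, and creating the tasks themselves is still done in the console.

```python
MAX_TABLES_PER_TASK = 500


def split_into_tasks(tables, limit=MAX_TABLES_PER_TASK):
    """Split a list of table names into consecutive groups of at most `limit`."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [tables[i:i + limit] for i in range(0, len(tables), limit)]
```

For example, a list of 1,203 tables is split into three groups of 500, 500, and 203 tables. If the number of groups becomes large, tracking the entire database in a single task may be simpler.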

    2. Click Next: Advanced Settings to configure advanced parameters.

      Parameter

      Description

      Dedicated Cluster for Task Scheduling

      The current dedicated cluster is selected by default and cannot be changed.

      Retry Time for Failed Connections

      The retry time range for failed connections. If DTS fails to connect to the source database instance after the change tracking task is started, DTS immediately retries the connection within this time range. Valid values: 10 to 1440. Unit: minutes. Default value: 720. We recommend that you set the parameter to a value greater than 30. If DTS reconnects to the source database instance within the specified time range, DTS resumes the change tracking task. Otherwise, the change tracking task fails.

      Note
      • If multiple change tracking tasks are configured for a database instance, the shortest retry time range takes precedence. For example, Task A and Task B are configured for the same database instance. Task A is configured with a retry time range of 30 minutes, and Task B is configured with a retry time range of 60 minutes. In this case, the retry time range of 30 minutes takes precedence.

      • You are charged for the DTS instance while DTS retries a connection. We recommend that you specify the retry time range based on your business requirements, or release the DTS instance as soon as possible after the source database instance is released.

      Retry Time for Other Issues

      The retry time range for other issues. For example, if DDL or DML operations fail to be performed after the change tracking task is started, DTS immediately retries the operations within the retry time range. Valid values: 1 to 1440. Unit: minutes. Default value: 10. We recommend that you set the parameter to a value greater than 10. If the failed operations are successfully performed within the specified retry time range, DTS resumes the change tracking task. Otherwise, the change tracking task fails.

      Important

      The value of the Retry Time for Other Issues parameter must be smaller than the value of the Retry Time for Failed Connections parameter.

      Environment Tag

      You can select an environment tag to identify the instance. This parameter is optional and not required for this example.

      Whether to delete SQL operations on heartbeat tables of forward and reverse tasks

      Choose whether DTS writes heartbeat SQL information to the source database while the instance is running.

      • Yes: Does not write heartbeat SQL information to the source database. The DTS instance may display latency.

      • No: Writes heartbeat SQL information to the source database. This may interfere with source database operations like physical backups and cloning.

      Monitoring and Alerting

      Specifies whether to enable alerting for the change tracking task. If alerting is configured and the task fails or the latency exceeds the threshold, alert notifications are sent. Valid values:

      • No: does not enable alerting.

      • Yes: enables alerting. In this case, you must also configure the alert threshold and alert contacts. For more information, see Configure monitoring and alerts.
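The constraints on the two retry parameters above can be expressed as a short validation routine. This is a sketch with hypothetical function names. It encodes only the ranges and rules stated in this topic and does not call any DTS API.

```python
def validate_retry_settings(connection_minutes, other_minutes):
    """Return a list of violations of the documented retry-time rules."""
    errors = []
    if not 10 <= connection_minutes <= 1440:
        errors.append("Retry Time for Failed Connections must be 10 to 1440 minutes")
    if not 1 <= other_minutes <= 1440:
        errors.append("Retry Time for Other Issues must be 1 to 1440 minutes")
    if other_minutes >= connection_minutes:
        errors.append(
            "Retry Time for Other Issues must be smaller than "
            "Retry Time for Failed Connections"
        )
    return errors


def effective_connection_retry(minutes_by_task):
    """Return the retry time that takes precedence for one database instance.

    When several tasks share a database instance, the shortest value wins.
    """
    return min(minutes_by_task.values())
```

With the defaults of 720 and 10 minutes, validate_retry_settings returns an empty list. For Task A at 30 minutes and Task B at 60 minutes on the same instance, effective_connection_retry returns 30, which matches the example in the note above.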

  7. Click Next: Save Task Settings and Precheck in the lower part of the page.

    You can move the pointer over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters to view the parameter settings of the API operation that is called to configure the instance.

    Note
    • Before you can start the change tracking task, DTS performs a precheck. You can start the change tracking task only after the task passes the precheck.

    • If the task fails to pass the precheck, click View Details next to each failed item. After you troubleshoot the issues based on the error message, you can run a precheck again.

    • If an alert is generated for an item during the precheck, perform the following operations based on the scenario:

      • If an alert item cannot be ignored, click View Details next to the failed item and troubleshoot the issues. Then, run a precheck again.

      • If the alert item can be ignored, click Confirm Alert Details. In the View Details dialog box, click Ignore. In the message that appears, click OK. Then, click Precheck Again to run a precheck again. If you ignore the alert item, data inconsistency may occur and your business may be exposed to potential risks.

  8. When the Success Rate of the precheck reaches 100%, click Next: Select DTS Instance Type.

  9. In the New Instance Class section, configure the Instance Class for the task. The value can range from 1 DU to the number of DUs that remain available.

  10. After you complete the configuration, read the Data Transmission Service (Pay-as-you-go) Service Terms and select the checkbox.

  11. Click Start Task. The change tracking task begins.

    You can filter for the target task in the Cluster Task List to view its progress.

What's next

After the change tracking task starts running, create consumer groups so your downstream clients can consume the tracked data.