Synchronize data from a MySQL database to Tablestore using DTS

更新时间: 2026-01-20 21:45:27

You can use Data Transmission Service (DTS) to synchronize data from a MySQL database, such as a self-managed MySQL database or an ApsaraDB RDS for MySQL instance, to a Tablestore instance. This simplifies data transfers.

Note

This topic uses an ApsaraDB RDS for MySQL instance as the source database to describe the configuration process. The process for other data sources is similar.

Background information

Data Transmission Service (DTS) is a real-time data streaming service provided by Alibaba Cloud. It supports data exchange between data sources such as relational databases (RDBMS), NoSQL databases, and online analytical processing (OLAP) databases. DTS integrates data synchronization, migration, subscription, integration, and processing to help you build a secure, scalable, and highly available data architecture. For more information, see What is Data Transmission Service?.

Prerequisites

Notes

Category

Description

Source database limits

  • The tables to be synchronized must have a PRIMARY KEY constraint or a UNIQUE constraint, and the fields must be unique. Otherwise, duplicate data may appear in the destination database.

  • If you synchronize data at the table level and need to edit the objects, such as mapping table or column names, you can synchronize a maximum of 1,000 tables in a single task. If this limit is exceeded, an error is reported after you submit the task. In this case, split the tables into multiple synchronization tasks or configure a task to synchronize the entire database.

  • Binary logs:

    • Binary logging is enabled for RDS for MySQL instances by default. You must set the binlog_row_image parameter to full. Otherwise, an error is reported during the precheck and the data synchronization task cannot start. For more information about how to set instance parameters, see Set instance parameters.

      Important
      • If the source instance is a self-managed MySQL database, you must enable binary logging and set binlog_format to row and binlog_row_image to full.

      • If the source self-managed MySQL database is a primary/primary cluster where the two databases are the primary and secondary of each other, you must enable the log_slave_updates parameter. This ensures that DTS can obtain all binary logs. For more information, see Create a database account for a self-managed MySQL database and configure binary logging.

    • Retain binary logs for at least 3 days (7 days recommended) on ApsaraDB RDS for MySQL instances and at least 7 days on self-managed MySQL databases. Otherwise, the DTS task may fail because DTS cannot obtain the binary logs. In extreme cases, data inconsistency or data loss may occur. Issues that are caused by a binary log retention period shorter than the required period are not covered by the DTS Service-Level Agreement (SLA).

      Note

      For more information about how to set the Retention Period of binary logs for an ApsaraDB RDS for MySQL instance, see Automatically delete local logs.

  • During schema synchronization and full data synchronization, do not perform Data Definition Language (DDL) operations that change the schema of databases or tables. Otherwise, the data synchronization task fails.

    Note

    During the full synchronization phase, DTS queries the source database, which acquires metadata locks. This may block DDL operations on the source database.

  • During synchronization, DTS does not synchronize data changes that are not recorded in binary logs (such as data restored from a physical backup or data generated by cascade operations).

    Note

    If this occurs, you can remove the database or table that contains the data from the synchronization objects and then add it back. This is allowed if your business permits. For more information, see Modify synchronization objects.

  • If the source database is a MySQL database of version 8.0.23 or later and the data to be synchronized contains invisible columns, data loss may occur because the data in these columns cannot be obtained.

    Note

    Run ALTER TABLE <table_name> ALTER COLUMN <column_name> SET VISIBLE; to make the invisible columns visible. For more information, see Invisible Columns.

Other limits

  • Before you synchronize data, evaluate the impact of data synchronization on the performance of the source and destination databases. We recommend that you synchronize data during off-peak hours. During initial full data synchronization, DTS uses the read and write resources of the source and destination databases. This may increase the loads of the database servers.

  • During initial full data synchronization, concurrent INSERT operations cause fragmentation in the tables of the destination database. Therefore, after initial full data synchronization is complete, the size of the used tablespace of the destination database is larger than that of the source database.

  • Data inconsistency between the source and destination databases may occur if data from other sources is written to the destination database during data synchronization.

  • During data synchronization, do not use tools such as pt-online-schema-change to perform online DDL operations on the objects to be synchronized from the source database. Otherwise, the data synchronization task fails.

  • Do not synchronize more than 64 tables to the destination Tablestore instance. Otherwise, a request error occurs. If you want to synchronize more than 64 tables to a Tablestore instance, contact Tablestore technical support to lift the limit.

  • The names of the tables and columns to be synchronized must comply with the following naming conventions of Tablestore:

    • The name of a table or column can contain letters, digits, and underscores (_). The name must start with a letter or an underscore (_).

    • The name of a table or column must be 1 to 255 characters in length.

  • If the RDS MySQL instance has the always-confidential database (EncDB) feature enabled, full data synchronization is not supported.

    Note

    For RDS for MySQL instances with Transparent Data Encryption (TDE) enabled, schema synchronization, full data synchronization, and incremental data synchronization are supported.

  • If the task fails, DTS technical support will attempt to recover it within 8 hours. During the recovery process, operations such as restarting the task or adjusting its parameters may be performed.

    Note

    When parameters are adjusted, only DTS task parameters are modified. Database parameters remain unchanged.The parameters that may be modified include but are not limited to those described in Modify instance parameters.

Special cases

  • When the source database is a self-managed MySQL database:

    • If a primary/secondary switchover occurs on the source database during synchronization, the sync task fails.

    • The latency of DTS is calculated by comparing the timestamp of the last synchronized data record in the destination database with the current timestamp. If no DML operations are performed on the source database for a long time, the displayed latency may be inaccurate. If the displayed latency is too high, you can perform a DML operation on the source database to update the latency information.

      Note

      If you choose to synchronize the entire database, you can also create a heartbeat table. The heartbeat table is updated or written to every second.

    • DTS periodically runs the CREATE DATABASE IF NOT EXISTS `test` command on the source database to advance the binary log offset.

    • If the source database is an Amazon Aurora MySQL instance or another cluster-mode MySQL instance, make sure that the domain name or IP address configured for the task and its resolved result always point to the read/write (RW) node address. Otherwise, the sync task may not run as expected.

  • When the source database is an RDS for MySQL instance:

    • RDS for MySQL instances that do not record transaction logs, such as read-only instances of RDS for MySQL 5.6, are not supported as a source.

    • DTS periodically runs the CREATE DATABASE IF NOT EXISTS `test` command on the source database to advance the binary log offset.

Synchronization types

  • Schema synchronization

    DTS synchronizes the schema definitions of objects from the source database to the destination database. These objects can include tables, views, triggers, and stored procedures.

  • Full synchronization

    DTS synchronizes all historical data of the specified objects from the source database to the destination database. This data serves as the baseline for subsequent incremental synchronization. To simplify the data synchronization process, we recommend that you select both schema synchronization and full synchronization when you configure the data synchronization task.

  • Incremental synchronization

    DTS synchronizes incremental data that is generated in the source database to the destination database in real time.

SQL operations that support incremental synchronization

Operation type

SQL statement

DML

INSERT, UPDATE, and DELETE

Permissions required for database accounts

Database

Required permissions

Creation and authorization

Source RDS MySQL

Read and write permissions on the objects to be synchronized

Create an account and Modify the permissions of an account

Note

If you use a source database account that was not created and granted permissions in the RDS for MySQL console, ensure that the account has the REPLICATION CLIENT, REPLICATION SLAVE, SHOW VIEW, and SELECT permissions.

Procedure

  1. Go to the Synchronization Tasks page.

    1. Log on to the Data Management (DMS) console.

    2. In the top menu bar, click Integration and Development.

    3. In the left navigation pane, choose Data Transmission (DTS) > Data Synchronization.

    4. To the right of Synchronization Tasks, select the region where the synchronization instance is located.

    Note
  2. Click Create Task to open the task configuration page.

    1. Optional: In the upper-right corner, click New Configuration Page.

      Note
      • If you are already on the new configuration page (the Back to Previous Version button is in the upper-right corner), you can skip this step.

      • Some parameters differ between the new and old configuration pages. We recommend that you use the new configuration page.

    2. Configure the source and destination database information.

      Category

      Configuration

      Description

      None

      Task Name

      DTS automatically generates a task name. We recommend that you specify a descriptive name for easy identification. The name does not have to be unique.

      Source Database Information

      Select DMS Database Instance

      Select an existing instance as needed.

      • If you use an existing instance, the database information is automatically filled in. You do not need to enter it again.

      • If you do not use an existing instance, you must configure the following database information.

      Note

      Database Type

      Select MySQL.

      Access Method

      Select Alibaba Cloud Instance.

      Instance Region

      Select the region where the source ApsaraDB RDS for MySQL instance resides.

      Replicate Data Across Alibaba Cloud Accounts

      This topic describes synchronization within the same Alibaba Cloud account. Select No. For cross-account scenarios, see Configure a task for cross-account data synchronization.

      RDS Instance ID

      Select the ID of the source ApsaraDB RDS for MySQL instance.

      Database Account

      Enter the database account for the source ApsaraDB RDS for MySQL instance. For permission requirements, see Permissions for database accounts.

      Database Password

      Enter the password that corresponds to the database account.

      Encryption

      Select Non-encrypted or SSL-encrypted based on your database requirements. If you set this parameter to SSL-encrypted, you must enable Secure Sockets Layer (SSL) encryption for the RDS for MySQL instance beforehand. For more information, see Quickly enable SSL encryption using a cloud certificate.

      Destination Database Information

      Select DMS Database Instance

      Select an existing instance as needed.

      • If you use an existing instance, the database information is automatically filled in. You do not need to enter it again.

      • If you do not use an existing instance, you must configure the following database information.

      Note

      Database Type

      Select Tablestore.

      Access Method

      Select Alibaba Cloud Instance.

      Instance Region

      Select the region where the destination Tablestore instance resides.

      Instance ID

      Select the name of the destination Tablestore instance.

      Alibaba Cloud Account AccessKey ID

      Enter the AccessKey ID used to identify the user. For more information about how to obtain an AccessKey ID, see Create an AccessKey pair.

      Important

      If you enter the AccessKey ID of a RAM user, you must grant the AliyunOTSFullAccess permission (permission to manage Tablestore) to the RAM user. For more information, see Grant permissions to a RAM user using a RAM policy.

      Alibaba Cloud Account AccessKey Secret

      Enter the AccessKey secret used to authenticate the user. For more information about how to obtain an AccessKey secret, see Create an AccessKey pair.

    3. After completing the configuration, click Test Connection and Proceed at the bottom of the page.

      If the source or destination database is an Alibaba Cloud database instance, such as an ApsaraDB RDS for MySQL instance or an ApsaraDB for MongoDB instance, DTS automatically adds the IP addresses of the DTS servers in the corresponding region to the whitelist of the Alibaba Cloud database instance. If the source or destination database is a self-managed database hosted on an ECS instance, DTS automatically adds the IP addresses of the DTS servers in the corresponding region to the security rules of the ECS instance. You must also ensure that the self-managed database allows access from the ECS instance. If the database is deployed in a cluster on multiple ECS instances, you must manually add the IP addresses of the DTS servers in the corresponding region to the security rules of each ECS instance. If the source or destination database is a self-managed database in an on-premises data center or another cloud database, you must manually add the IP addresses of the DTS servers in the corresponding region to allow access from the DTS servers. For a list of DTS server IP addresses, see IP address blocks of DTS servers.

      Warning

      Adding the public IP address blocks of DTS servers, whether automatically by DTS or manually by you, may pose security risks. Using this product, you acknowledge and accept these potential risks. You must implement basic security measures, such as using strong passwords, restricting open ports for each CIDR block, enabling authentication for internal API communication, and periodically reviewing and restricting unnecessary CIDR blocks. Alternatively, you can use a private network connection, such as a leased line, VPN Gateway, or Smart Access Gateway.

  3. You can configure the task object.

    1. On the Configure Objects page, you can configure the objects to synchronize.

      Configuration

      Description

      Synchronization Types

      Incremental Data Synchronization is selected by default. You must also select Schema Synchronization and Full Data Synchronization. After the precheck is complete, DTS initializes the full data of the objects to be synchronized from the source instance to the destination cluster, which serves as the baseline data for subsequent incremental synchronization.

      Processing Mode of Conflicting Tables

      • Precheck and Report Errors: Checks whether a table with the same name exists in the destination database. If a table with the same name does not exist, the precheck passes. If a table with the same name exists, an error is reported during the precheck, and the data synchronization task does not start.

        Note

        If you cannot delete or rename the table with the same name in the destination database, you can map it to a different table name. For more information, see Map table and column names.

      • Ignore Errors and Proceed: Skips the check for duplicate table names in the destination database.

        Warning

        Selecting Ignore Errors and Proceed may cause data inconsistency and put your business at risk. For example:

        • If the table schemas are the same and a record in the destination database has the same primary key or unique key value as a record in the source database:

          • During full data synchronization, DTS retains the record in the destination cluster. The corresponding record from the source database is not synchronized.

          • During incremental data synchronization, the record from the source database overwrites the record in the destination database.

        • If the table schemas are different, initial data synchronization may fail, data from only some columns may be synchronized, or the entire synchronization may fail. Proceed with caution.

      Synchronization Operation Types

      Select the operation types to synchronize as needed. By default, all operation types are selected.

      Dirty Data Processing Policy

      Select a policy for handling data write errors. The options are:

      • Skip

      • Block

      Data Write Mode

      Select a data write mode. The options are:

      • Row Update: Uses PutRowChange to perform row-level updates.

      • Row Overwrite: Uses UpdateRowChange to perform row-level overwrites.

      Batch Write Method

      The API operation that is called for batch writes. The options are:

      • BulkImportRequest: Offline write.

      • BatchWriteRowRequest: Batch write.

      Note

      We recommend using BulkImportRequest for improved read/write efficiency and lower billing costs for your Tablestore instance.

      More Settings

      Based on your business requirements, you can click More Settings to configure the following parameters:

      • Queue Size: The queue length of the data write process for the Tablestore instance.

      • Thread Count: The number of callback processing threads for the data write process of the Tablestore instance.

      • Concurrency: The concurrent request limit for the Tablestore instance.

      • Bucket Count: The number of concurrent buckets for sequential writing of incremental data. Increasing this value can improve concurrent write performance.

        Note

        The value of Buckets must be less than or equal to the value of Concurrency.

      Case Sensitivity Policy for Destination Object Names

      You can configure the case sensitivity policy for database, table, and column object names that are synchronized to the destination instance. By default, the DTS default policy is selected. You can also choose to use the default policies of the source and destination databases. For more information, see Case sensitivity policy for destination object names.

      Source Objects

      In the Source Objects box, click an object to sync, and then click 向右 to move it to the Selected Objects box.

      Note

      You can select objects at the database or table level.

      Selected Objects

      • To rename a single synchronization object in the destination instance, right-click the object in the Selected Objects box. For more information about renaming objects, see Map a single database, table, or column.

      • To change the names of multiple synchronization objects in the destination instance in a batch, click Batch Edit in the upper-right corner of the Selected Objects box. For more information, see Map databases, tables, and columns in a batch.

      Note
      • Database name mapping is not supported. You can map table and column names only when you select objects at the table level. If you use the mapping feature, other objects that depend on the mapped object may fail to synchronize.

      • When you use the column name mapping feature, you can modify the type of the corresponding column in the destination Tablestore instance.

      • To filter data using a WHERE clause, right-click the table to synchronize in the Selected Objects box and set the filter condition in the dialog box that appears. For more information, see Set filter conditions.

    2. Click Next: Advanced Settings to configure advanced settings.

      Configuration

      Description

      Dedicated Cluster for Task Scheduling

      By default, DTS schedules tasks on a shared cluster, and you do not need to select a cluster. For more stable performance, you can purchase a dedicated cluster to run DTS synchronization tasks. For more information, see What is a DTS dedicated cluster?.

      Retry Time for Failed Connections

      After a synchronization task starts, if the connection to the source or destination database fails, DTS reports an error and immediately begins to retry the connection. The default retry duration is 720 minutes. You can also specify a custom retry duration from 10 to 1,440 minutes. We recommend that you set the duration to 30 minutes or more. If DTS successfully reconnects to the database within the specified duration, the synchronization task automatically resumes. Otherwise, the task fails.

      Note
      • If you have multiple DTS instances (for example, Instance A and Instance B) that share the same source or destination, and you set the network retry time to 30 minutes for Instance A and 60 minutes for Instance B, the shorter duration of 30 minutes is used for both.

      • Because DTS charges for task runtime during the connection retry period, we recommend that you customize the retry duration based on your business needs or release the DTS instance as soon as possible after the source and destination database instances are released.

      Retry Time for Other Issues

      After the synchronization task starts, if other non-connectivity issues occur with the source or destination database (such as DDL or DML execution exceptions), DTS reports an error and immediately starts continuous retry operations. The default retry duration is 10 minutes. You can also customize the retry duration within the range of 1 to 1,440 minutes. We recommend that you set it to 10 minutes or more. If the relevant operations are successful within the set retry duration, the synchronization task automatically resumes. Otherwise, the task fails.

      Important

      The value for Retry Time for Other Issues must be less than that for Retry Time for Failed Connections.

      Enable Throttling for Full Data Synchronization

      During the full synchronization phase, DTS uses read and write resources from the source and destination databases, which can increase the database load. To reduce the load on the destination database, you can set a rate limit for the full synchronization task by configuring the Queries per second (QPS) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s) parameters.

      Note
      • This configuration item is available only when Synchronization Types is set to Full Data Synchronization.

      • You can also adjust the full synchronization rate after the synchronization instance is running.

      Enable Throttling for Incremental Data Synchronization

      You can also set a rate limit for the incremental synchronization task. To relieve pressure on the destination database, set the RPS of Incremental Data Synchronization and Data synchronization speed for incremental synchronization (MB/s).

      Environment Tag

      You can select an environment tag to identify the instance based on your requirements. For this example, you do not need to select a tag.

      Whether to delete SQL operations on heartbeat tables of forward and reverse tasks

      Choose whether to write heartbeat SQL information to the source database when the DTS instance is running.

      • Yes: Heartbeat SQL information is not written to the source database. This may cause the DTS instance to report a delay.

      • No: Writes heartbeat SQL information to the source database. This may interfere with features such as physical backup and cloning of the source database.

      Configure ETL

      Choose whether to enable the extract, transform, and load (ETL) feature. For more information, see What is ETL? Valid values:

      Monitoring and Alerting

      Specify whether to configure alerts. If the synchronization fails or the latency exceeds the specified threshold, a notification is sent to an alert contact.

    3. Click Next: Configure Table and Field and then click OK in the Note dialog box.

      By default, DTS uses the primary key from the original table as the Primary Key Column. You can change this by setting Definition Status to All.

      Note

      You can select multiple columns under Primary Key Column to form a composite primary key.

  4. At the bottom of the page, click Next: Save Task and Precheck.

    • To view the API operation parameters for configuring this instance, move the pointer over the Next: Save Task Settings and Precheck button, and then click Preview OpenAPI parameters in the bubble.

    • If you have finished viewing the API parameters, click Next: Save Task Settings and Precheck at the bottom of the page.

    Note
    • Before the synchronization job starts, DTS runs a precheck. The synchronization job can start only after the task passes the precheck.

    • If the precheck fails, click View Details next to the failed check item. Fix the issue as prompted, and then run the precheck again.

    • If the precheck generates a warning:

      • For failed check items that cannot be ignored, click View Details next to the item. Fix the issue as prompted, and then run the precheck again.

      • If a check item can be ignored, you can skip its warning and rerun the precheck by clicking Click to Confirm Alert Details, Confirm Mute, OK, and then Rerun Precheck. However, muting a warning may cause issues such as data inconsistency and pose risks to your business.

  5. When the Success Rate is 100%, click Next: Purchase Instance.

  6. Purchase the instance.

    1. On the Purchase page, select the billing method and configure the data synchronization instance. The following table describes the parameters.

      Category

      Parameter

      Description

      New Instance Class

      Billing Method

      • Subscription: Pay when you create the instance. This is suitable for long-term needs and is more cost-effective than pay-as-you-go. The longer the subscription duration, the greater the discount.

      • Pay-as-you-go: Billed on an hourly basis. This is suitable for short-term needs. You can release the instance immediately after use to save costs.

      • Pay-as-you-go Serverless: Billed on an hourly basis. Resources are dynamically adjusted at the minute level based on the instance load, with real-time billing (a billing order is generated every hour). You only pay for what you use, saving significant costs.

      Resource Group Configuration

      The resource group to which the instance belongs. The default is default resource group. For more information, see What is Resource Management?.

      Link Specification

      DTS provides synchronization specifications with different performance levels. The synchronization link specification affects the synchronization rate. You can choose a specification based on your business scenario. For more information, see Data synchronization link specifications.

      Note

      This option is available only when the billing method is Subscription or Pay-as-you-go.

      Serverless Scaling Configuration (DU)

      Serverless DTS uses DTS Units (DUs) for minute-level billing and scaling. The number of DUs is dynamically adjusted within a specified range based on the business load. For more information, see What is a DTS Serverless instance?.

      Note

      This option is available only when the billing method is Pay-as-you-go Serverless.

      Subscription Duration

      In subscription mode, select the duration and quantity for the subscription instance. You can choose a monthly subscription for 1 to 9 months, or a yearly subscription for 1, 2, 3, or 5 years.

      Note

      This option is available only when the billing method is Subscription.

    2. After you complete the configuration, read and select Data Transmission Service (Pay-As-You-Go) Terms of Service.

    3. Click Purchase and Start. In the Confirm dialog box, click OK.

      You can view the task progress on the Data Synchronization page.

Billing

Data synchronization

Migration type

Task configuration fee

Data transfer cost

Schema migration and full data migration

Free of charge.

You are charged for data transfer if you migrate data from Alibaba Cloud over the Internet. For more information, see Billing overview.

Incremental data migration

Charged. For more information, see Billing overview.

Tablestore

  • When you import data into Tablestore, Tablestore charges a storage fee based on the data volume.

  • When you use a migration tool to access Tablestore, Tablestore charges for read and write operations based on the billing method.

    Note

    For more information about instance types and CUs, see Instances and Read/write throughput, respectively.

    Billing mode

    Compute capacity description

    VCU mode (Provisioned mode)

    Pay for compute performance consumption by pre-purchasing reserved VCUs based on resource assessment results or by paying for actual usage after enabling elastic capacity. Compute capacity includes the compute consumption for data reads and writes.

    CU mode (On-demand mode)

    Meters and charges for specific read and write requests based on read and write throughput. Depending on the instance type, billing distinguishes between pay-as-you-go read/write CUs and reserved read/write CUs.

    Note

    For more information about instance types and CUs, see Instances and Read/write throughput, respectively.

上一篇: Synchronize data from MySQL using Canal 下一篇: Tapdata Cloud: Import MySQL data
阿里云首页 表格存储 相关技术圈