Synchronize data from a MySQL database to Tablestore using DTS
You can use Data Transmission Service (DTS) to synchronize data from a MySQL database, such as a self-managed MySQL database or an ApsaraDB RDS for MySQL instance, to a Tablestore instance. This simplifies data transfers.
This topic uses an ApsaraDB RDS for MySQL instance as the source database to describe the configuration process. The process for other data sources is similar.
Background information
Data Transmission Service (DTS) is a real-time data streaming service provided by Alibaba Cloud. It supports data exchange between data sources such as relational databases (RDBMS), NoSQL databases, and online analytical processing (OLAP) databases. DTS integrates data synchronization, migration, subscription, integration, and processing to help you build a secure, scalable, and highly available data architecture. For more information, see What is Data Transmission Service?.
Prerequisites
A Tablestore instance is created. For more information, see Create an instance.
Obtain the AccessKey ID and AccessKey secret of the Alibaba Cloud account that owns the Tablestore instance. For more information, see Create an AccessKey pair for an Alibaba Cloud account (not recommended).
Notes
Category | Description |
Source database limits |
|
Other limits |
|
Special cases |
|
Synchronization types
Schema synchronization
DTS synchronizes the schema definitions of objects from the source database to the destination database. These objects can include tables, views, triggers, and stored procedures.
Full synchronization
DTS synchronizes all historical data of the specified objects from the source database to the destination database. This data serves as the baseline for subsequent incremental synchronization. To simplify the data synchronization process, we recommend that you select both schema synchronization and full synchronization when you configure the data synchronization task.
Incremental synchronization
DTS synchronizes incremental data that is generated in the source database to the destination database in real time.
SQL operations that support incremental synchronization
Operation type | SQL statement |
DML | INSERT, UPDATE, and DELETE |
Permissions required for database accounts
Database | Required permissions | Creation and authorization |
Source RDS MySQL | Read and write permissions on the objects to be synchronized |
If you use a source database account that was not created and granted permissions in the RDS for MySQL console, ensure that the account has the REPLICATION CLIENT, REPLICATION SLAVE, SHOW VIEW, and SELECT permissions.
Procedure
Go to the Synchronization Tasks page.
Log on to the Data Management (DMS) console.
In the top menu bar, click Integration and Development.
In the left navigation pane, choose .
To the right of Synchronization Tasks, select the region where the synchronization instance is located.
NoteThe actual operations may vary depending on the mode and layout of the DMS console. For more information, see Simple mode and Customize the layout and style of the DMS console.
You can also log on to the new DTS Synchronization Tasks page.
Click Create Task to open the task configuration page.
Optional: In the upper-right corner, click New Configuration Page.
NoteIf you are already on the new configuration page (the Back to Previous Version button is in the upper-right corner), you can skip this step.
Some parameters differ between the new and old configuration pages. We recommend that you use the new configuration page.
Configure the source and destination database information.
Category
Configuration
Description
None
Task Name
DTS automatically generates a task name. We recommend that you specify a descriptive name for easy identification. The name does not have to be unique.
Source Database Information
Select DMS Database Instance
Select an existing instance as needed.
If you use an existing instance, the database information is automatically filled in. You do not need to enter it again.
If you do not use an existing instance, you must configure the following database information.
NoteIn the DMS console, you can click Add DMS Database Instance to add a database instance. For more information, see Add a cloud database instance and Add a database instance from another cloud or a self-managed database.
In the DTS console, you can add a database to DTS on the Database Connections page or the new configuration page. For more information, see Data Connection Management.
Database Type
Select MySQL.
Access Method
Select Alibaba Cloud Instance.
Instance Region
Select the region where the source ApsaraDB RDS for MySQL instance resides.
Replicate Data Across Alibaba Cloud Accounts
This topic describes synchronization within the same Alibaba Cloud account. Select No. For cross-account scenarios, see Configure a task for cross-account data synchronization.
RDS Instance ID
Select the ID of the source ApsaraDB RDS for MySQL instance.
Database Account
Enter the database account for the source ApsaraDB RDS for MySQL instance. For permission requirements, see Permissions for database accounts.
Database Password
Enter the password that corresponds to the database account.
Encryption
Select Non-encrypted or SSL-encrypted based on your database requirements. If you set this parameter to SSL-encrypted, you must enable Secure Sockets Layer (SSL) encryption for the RDS for MySQL instance beforehand. For more information, see Quickly enable SSL encryption using a cloud certificate.
Destination Database Information
Select DMS Database Instance
Select an existing instance as needed.
If you use an existing instance, the database information is automatically filled in. You do not need to enter it again.
If you do not use an existing instance, you must configure the following database information.
NoteIn the DMS console, you can click Add DMS Database Instance to add a database instance. For more information, see Add a cloud database instance and Add a database instance from another cloud or a self-managed database.
In the DTS console, you can add a database to DTS on the Database Connections page or the new configuration page. For more information, see Data Connection Management.
Database Type
Select Tablestore.
Access Method
Select Alibaba Cloud Instance.
Instance Region
Select the region where the destination Tablestore instance resides.
Instance ID
Select the name of the destination Tablestore instance.
Alibaba Cloud Account AccessKey ID
Enter the AccessKey ID used to identify the user. For more information about how to obtain an AccessKey ID, see Create an AccessKey pair.
ImportantIf you enter the AccessKey ID of a RAM user, you must grant the AliyunOTSFullAccess permission (permission to manage Tablestore) to the RAM user. For more information, see Grant permissions to a RAM user using a RAM policy.
Alibaba Cloud Account AccessKey Secret
Enter the AccessKey secret used to authenticate the user. For more information about how to obtain an AccessKey secret, see Create an AccessKey pair.
After completing the configuration, click Test Connection and Proceed at the bottom of the page.
If the source or destination database is an Alibaba Cloud database instance, such as an ApsaraDB RDS for MySQL instance or an ApsaraDB for MongoDB instance, DTS automatically adds the IP addresses of the DTS servers in the corresponding region to the whitelist of the Alibaba Cloud database instance. If the source or destination database is a self-managed database hosted on an ECS instance, DTS automatically adds the IP addresses of the DTS servers in the corresponding region to the security rules of the ECS instance. You must also ensure that the self-managed database allows access from the ECS instance. If the database is deployed in a cluster on multiple ECS instances, you must manually add the IP addresses of the DTS servers in the corresponding region to the security rules of each ECS instance. If the source or destination database is a self-managed database in an on-premises data center or another cloud database, you must manually add the IP addresses of the DTS servers in the corresponding region to allow access from the DTS servers. For a list of DTS server IP addresses, see IP address blocks of DTS servers.
WarningAdding the public IP address blocks of DTS servers, whether automatically by DTS or manually by you, may pose security risks. Using this product, you acknowledge and accept these potential risks. You must implement basic security measures, such as using strong passwords, restricting open ports for each CIDR block, enabling authentication for internal API communication, and periodically reviewing and restricting unnecessary CIDR blocks. Alternatively, you can use a private network connection, such as a leased line, VPN Gateway, or Smart Access Gateway.
You can configure the task object.
On the Configure Objects page, you can configure the objects to synchronize.
Configuration
Description
Synchronization Types
Incremental Data Synchronization is selected by default. You must also select Schema Synchronization and Full Data Synchronization. After the precheck is complete, DTS initializes the full data of the objects to be synchronized from the source instance to the destination cluster, which serves as the baseline data for subsequent incremental synchronization.
Processing Mode of Conflicting Tables
Precheck and Report Errors: Checks whether a table with the same name exists in the destination database. If a table with the same name does not exist, the precheck passes. If a table with the same name exists, an error is reported during the precheck, and the data synchronization task does not start.
NoteIf you cannot delete or rename the table with the same name in the destination database, you can map it to a different table name. For more information, see Map table and column names.
Ignore Errors and Proceed: Skips the check for duplicate table names in the destination database.
WarningSelecting Ignore Errors and Proceed may cause data inconsistency and put your business at risk. For example:
If the table schemas are the same and a record in the destination database has the same primary key or unique key value as a record in the source database:
During full data synchronization, DTS retains the record in the destination cluster. The corresponding record from the source database is not synchronized.
During incremental data synchronization, the record from the source database overwrites the record in the destination database.
If the table schemas are different, initial data synchronization may fail, data from only some columns may be synchronized, or the entire synchronization may fail. Proceed with caution.
Synchronization Operation Types
Select the operation types to synchronize as needed. By default, all operation types are selected.
Dirty Data Processing Policy
Select a policy for handling data write errors. The options are:
Skip
Block
Data Write Mode
Select a data write mode. The options are:
Row Update: Uses PutRowChange to perform row-level updates.
Row Overwrite: Uses UpdateRowChange to perform row-level overwrites.
Batch Write Method
The API operation that is called for batch writes. The options are:
BulkImportRequest: Offline write.
BatchWriteRowRequest: Batch write.
NoteWe recommend using BulkImportRequest for improved read/write efficiency and lower billing costs for your Tablestore instance.
More Settings
Based on your business requirements, you can click More Settings to configure the following parameters:
Queue Size: The queue length of the data write process for the Tablestore instance.
Thread Count: The number of callback processing threads for the data write process of the Tablestore instance.
Concurrency: The concurrent request limit for the Tablestore instance.
Bucket Count: The number of concurrent buckets for sequential writing of incremental data. Increasing this value can improve concurrent write performance.
NoteThe value of Buckets must be less than or equal to the value of Concurrency.
Case Sensitivity Policy for Destination Object Names
You can configure the case sensitivity policy for database, table, and column object names that are synchronized to the destination instance. By default, the DTS default policy is selected. You can also choose to use the default policies of the source and destination databases. For more information, see Case sensitivity policy for destination object names.
Source Objects
In the Source Objects box, click an object to sync, and then click
to move it to the Selected Objects box.NoteYou can select objects at the database or table level.
Selected Objects
To rename a single synchronization object in the destination instance, right-click the object in the Selected Objects box. For more information about renaming objects, see Map a single database, table, or column.
To change the names of multiple synchronization objects in the destination instance in a batch, click Batch Edit in the upper-right corner of the Selected Objects box. For more information, see Map databases, tables, and columns in a batch.
NoteDatabase name mapping is not supported. You can map table and column names only when you select objects at the table level. If you use the mapping feature, other objects that depend on the mapped object may fail to synchronize.
When you use the column name mapping feature, you can modify the type of the corresponding column in the destination Tablestore instance.
To filter data using a WHERE clause, right-click the table to synchronize in the Selected Objects box and set the filter condition in the dialog box that appears. For more information, see Set filter conditions.
Click Next: Advanced Settings to configure advanced settings.
Configuration
Description
Dedicated Cluster for Task Scheduling
By default, DTS schedules tasks on a shared cluster, and you do not need to select a cluster. For more stable performance, you can purchase a dedicated cluster to run DTS synchronization tasks. For more information, see What is a DTS dedicated cluster?.
Retry Time for Failed Connections
After a synchronization task starts, if the connection to the source or destination database fails, DTS reports an error and immediately begins to retry the connection. The default retry duration is 720 minutes. You can also specify a custom retry duration from 10 to 1,440 minutes. We recommend that you set the duration to 30 minutes or more. If DTS successfully reconnects to the database within the specified duration, the synchronization task automatically resumes. Otherwise, the task fails.
NoteIf you have multiple DTS instances (for example, Instance A and Instance B) that share the same source or destination, and you set the network retry time to 30 minutes for Instance A and 60 minutes for Instance B, the shorter duration of 30 minutes is used for both.
Because DTS charges for task runtime during the connection retry period, we recommend that you customize the retry duration based on your business needs or release the DTS instance as soon as possible after the source and destination database instances are released.
Retry Time for Other Issues
After the synchronization task starts, if other non-connectivity issues occur with the source or destination database (such as DDL or DML execution exceptions), DTS reports an error and immediately starts continuous retry operations. The default retry duration is 10 minutes. You can also customize the retry duration within the range of 1 to 1,440 minutes. We recommend that you set it to 10 minutes or more. If the relevant operations are successful within the set retry duration, the synchronization task automatically resumes. Otherwise, the task fails.
ImportantThe value for Retry Time for Other Issues must be less than that for Retry Time for Failed Connections.
Enable Throttling for Full Data Synchronization
During the full synchronization phase, DTS uses read and write resources from the source and destination databases, which can increase the database load. To reduce the load on the destination database, you can set a rate limit for the full synchronization task by configuring the Queries per second (QPS) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s) parameters.
NoteThis configuration item is available only when Synchronization Types is set to Full Data Synchronization.
You can also adjust the full synchronization rate after the synchronization instance is running.
Enable Throttling for Incremental Data Synchronization
You can also set a rate limit for the incremental synchronization task. To relieve pressure on the destination database, set the RPS of Incremental Data Synchronization and Data synchronization speed for incremental synchronization (MB/s).
Environment Tag
You can select an environment tag to identify the instance based on your requirements. For this example, you do not need to select a tag.
Whether to delete SQL operations on heartbeat tables of forward and reverse tasks
Choose whether to write heartbeat SQL information to the source database when the DTS instance is running.
Yes: Heartbeat SQL information is not written to the source database. This may cause the DTS instance to report a delay.
No: Writes heartbeat SQL information to the source database. This may interfere with features such as physical backup and cloning of the source database.
Configure ETL
Choose whether to enable the extract, transform, and load (ETL) feature. For more information, see What is ETL? Valid values:
Yes: Enables the ETL feature. Enter data processing statements in the code editor. For more information, see Configure ETL in a data migration or data synchronization task.
No: Disables the ETL feature.
Monitoring and Alerting
Specify whether to configure alerts. If the synchronization fails or the latency exceeds the specified threshold, a notification is sent to an alert contact.
No: No alerts are configured.
Yes: Sets an alert. You also need to specify the alert threshold and alert contact. For more information, see Configure monitoring and alerting during task configuration.
Click Next: Configure Table and Field and then click OK in the Note dialog box.
By default, DTS uses the primary key from the original table as the Primary Key Column. You can change this by setting Definition Status to All.
NoteYou can select multiple columns under Primary Key Column to form a composite primary key.
At the bottom of the page, click Next: Save Task and Precheck.
To view the API operation parameters for configuring this instance, move the pointer over the Next: Save Task Settings and Precheck button, and then click Preview OpenAPI parameters in the bubble.
If you have finished viewing the API parameters, click Next: Save Task Settings and Precheck at the bottom of the page.
NoteBefore the synchronization job starts, DTS runs a precheck. The synchronization job can start only after the task passes the precheck.
If the precheck fails, click View Details next to the failed check item. Fix the issue as prompted, and then run the precheck again.
If the precheck generates a warning:
For failed check items that cannot be ignored, click View Details next to the item. Fix the issue as prompted, and then run the precheck again.
If a check item can be ignored, you can skip its warning and rerun the precheck by clicking Click to Confirm Alert Details, Confirm Mute, OK, and then Rerun Precheck. However, muting a warning may cause issues such as data inconsistency and pose risks to your business.
When the Success Rate is 100%, click Next: Purchase Instance.
Purchase the instance.
On the Purchase page, select the billing method and configure the data synchronization instance. The following table describes the parameters.
Category
Parameter
Description
New Instance Class
Billing Method
Subscription: Pay when you create the instance. This is suitable for long-term needs and is more cost-effective than pay-as-you-go. The longer the subscription duration, the greater the discount.
Pay-as-you-go: Billed on an hourly basis. This is suitable for short-term needs. You can release the instance immediately after use to save costs.
Pay-as-you-go Serverless: Billed on an hourly basis. Resources are dynamically adjusted at the minute level based on the instance load, with real-time billing (a billing order is generated every hour). You only pay for what you use, saving significant costs.
Resource Group Configuration
The resource group to which the instance belongs. The default is default resource group. For more information, see What is Resource Management?.
Link Specification
DTS provides synchronization specifications with different performance levels. The synchronization link specification affects the synchronization rate. You can choose a specification based on your business scenario. For more information, see Data synchronization link specifications.
NoteThis option is available only when the billing method is Subscription or Pay-as-you-go.
Serverless Scaling Configuration (DU)
Serverless DTS uses DTS Units (DUs) for minute-level billing and scaling. The number of DUs is dynamically adjusted within a specified range based on the business load. For more information, see What is a DTS Serverless instance?.
NoteThis option is available only when the billing method is Pay-as-you-go Serverless.
Subscription Duration
In subscription mode, select the duration and quantity for the subscription instance. You can choose a monthly subscription for 1 to 9 months, or a yearly subscription for 1, 2, 3, or 5 years.
NoteThis option is available only when the billing method is Subscription.
After you complete the configuration, read and select Data Transmission Service (Pay-As-You-Go) Terms of Service.
Click Purchase and Start. In the Confirm dialog box, click OK.
You can view the task progress on the Data Synchronization page.
Billing
Data synchronization
Migration type | Task configuration fee | Data transfer cost |
Schema migration and full data migration | Free of charge. | You are charged for data transfer if you migrate data from Alibaba Cloud over the Internet. For more information, see Billing overview. |
Incremental data migration | Charged. For more information, see Billing overview. |
Tablestore
When you import data into Tablestore, Tablestore charges a storage fee based on the data volume.
When you use a migration tool to access Tablestore, Tablestore charges for read and write operations based on the billing method.
NoteFor more information about instance types and CUs, see Instances and Read/write throughput, respectively.
Billing mode
Compute capacity description
Pay for compute performance consumption by pre-purchasing reserved VCUs based on resource assessment results or by paying for actual usage after enabling elastic capacity. Compute capacity includes the compute consumption for data reads and writes.
Meters and charges for specific read and write requests based on read and write throughput. Depending on the instance type, billing distinguishes between pay-as-you-go read/write CUs and reserved read/write CUs.
NoteFor more information about instance types and CUs, see Instances and Read/write throughput, respectively.