This topic describes how to synchronize full and incremental data from an ApsaraDB RDS instance.
Important
The RDS Migration feature was deprecated on March 10, 2023. LTS instances purchased after this date cannot use this feature. LTS instances purchased before March 10, 2023 can continue to use this feature.
Use cases
- Archive historical data from ApsaraDB RDS to reduce costs.
- Migrate full data from ApsaraDB RDS to Lindorm.
Prerequisites
- Your LTS instance was purchased before March 10, 2023.
- Log on to the LTS console. For more information, see Accessing Synchronization Tasks.
- Ensure network connectivity is established between LTS, the destination ApsaraDB for HBase cluster, and the source ApsaraDB RDS instance. This step is not required if they are all deployed in the same virtual private cloud (VPC).
Features
- Synchronize full and incremental data from ApsaraDB RDS to a Lindorm wide-table model instance that is compatible with HBase access.
- Transform ApsaraDB RDS data during migration. For more information, see Configurations.
- Synchronize data from multiple ApsaraDB RDS tables.
Limitations
- The data source for full data synchronization must be MySQL.
- The data source for incremental data synchronization must be Data Transmission Service (DTS).
- The destination data source can be Lindorm with SQL access or a Lindorm wide-table model instance that is compatible with HBase access.
Procedure
- In the LTS console, choose Data Import > RDS Migration.
- Click Create a task.
- Select the ApsaraDB RDS data source, Data Transmission Service (DTS) data source, and the destination data source.
  Enter a name for the task in the Task Name field. In the Select Tables section, select the tables that you want to synchronize, and then click Generate Configuration. In the Operations section, select the required options, such as Schema Migration, Incremental Data Synchronization, and Historical Data Migration. After you complete the configuration, click Create.
- Click Edit to view or modify the default configuration. For more information, see Configurations.
- Select the tables to synchronize, and then click Generate Configurations.
Note
- The RDS Migration task first migrates historical data. After this migration is complete, the task starts incremental data synchronization.
- When you migrate data to a Lindorm instance that is compatible with Cassandra Query Language (CQL), the system by default generates a configuration in which the destination columns match the source ApsaraDB RDS columns in name and type. You can manually edit the configuration to change column names and mappings. For details, see Configurations.
- When you migrate data to an ApsaraDB for HBase or Lindorm instance, the system by default generates a column family named f and maps the columns of the ApsaraDB RDS table to columns in the f column family. The row key is generated by concatenating the primary key strings from the source ApsaraDB RDS table.
- By default, the generated configuration skips delete operations from the source ApsaraDB RDS database. To replicate deletions, manually modify the configuration. For details, see Configurations.
- Click Create.
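The default mapping for ApsaraDB for HBase and Lindorm destinations described in the preceding note can be sketched in Python. This is an illustration only, not LTS code: `default_hbase_config` is a hypothetical helper, and the exact configuration that LTS generates (for example, how several primary key columns are joined into the row key) is an assumption modeled on the HBase sample in Configurations.

```python
def default_hbase_config(source_table, columns, primary_keys, family="f"):
    """Build a config dict shaped like the HBase sample in Configurations.

    Hypothetical helper: the field names come from the sample, but the
    way LTS itself composes the default row key is an assumption.
    """
    if not primary_keys:
        raise ValueError("at least one primary key column is required")

    # One key column: reference it directly. Several key columns:
    # concatenate them with a Jtwig expression, as the sample does.
    if len(primary_keys) == 1:
        rowkey = primary_keys[0]
    else:
        rowkey = "{{concat(" + ", ".join(primary_keys) + ")}}"

    # "db.table" in the source becomes "db:table" in the HBase model.
    database, _, table = source_table.partition(".")
    target = f"{database}:{table}" if table else database

    return {
        "reader": {"querySql": [f"select * from {source_table}"]},
        "writer": {
            # Every source column is mapped into the single column family.
            "columns": [
                {"name": f"{family}:{col}", "value": col, "isPk": False}
                for col in columns
            ],
            "rowkey": {"value": rowkey},
            # Deletes are skipped by default; set to False to replicate them
            # (assumed to be the inverse of the documented default).
            "config": {"skipDelete": True},
            "table": {"name": target},
            "sourceTable": source_table,
        },
    }
```

For example, `default_hbase_config("dts.cluster", ["id", "cluster_id"], ["id"])` returns a dictionary whose `writer.columns` entries are named `f:id` and `f:cluster_id` and whose row key is the `id` column.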
Configurations
The following code provides a sample configuration for synchronizing data to an SQL table. For syntax details, see the Jtwig Reference Manual.
{
  "reader": {
    "querySql": [
      "select * from dts.cluster where id < 1000", // Query for full data synchronization. Each statement corresponds to one read thread.
      "select * from dts.cluster where id >= 1000" // We recommend splitting the query to improve performance and reduce the cost of retries.
    ]
  },
  "writer": {
    "columns": [
      {
        "name": "id", // The name of the column in the destination table.
        "value": "id", // The name of the column in the source table.
        "isPk": true, // Specifies whether the column is a primary key.
        "type": "BIGINT" // Optional. If you do not specify this parameter, the data type of the destination column is the same as that of the source ApsaraDB RDS column.
      },
      {
        "name": "cluster_id",
        "value": "cluster_id",
        "isPk": false
      },
      {
        "name": "id_and_cluster",
        "value": "{{concat(id, cluster_id)}}", // Jtwig expressions are supported for data transformation.
        "isPk": true
      }
    ],
    "config": {
      "skipDelete": true // Skips delete operations.
    },
    "table": {
      "name": "dts.cluster", // The name of the Lindorm table. Use a period (.) as a separator.
      "parameter": {
        "compression": "ZSTD"
      }
    },
    "sourceTable": "dts.cluster"
  }
}
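The `querySql` entries in the sample split the table on `id` at 1000, with each statement read by its own thread. As a minimal sketch, assuming a numeric column and split points chosen by hand, the following Python helper generates such non-overlapping range queries. `split_queries` is a hypothetical name and is not part of LTS.

```python
def split_queries(table, column, boundaries):
    """Return one SELECT per range; together the ranges cover each row once.

    Assumes `column` is numeric and never NULL (for example, a primary key).
    """
    bounds = sorted(set(boundaries))  # drop duplicates to avoid empty ranges
    if not bounds:
        return [f"select * from {table}"]

    # Everything below the first boundary.
    queries = [f"select * from {table} where {column} < {bounds[0]}"]
    # Half-open ranges [low, high) between consecutive boundaries.
    for low, high in zip(bounds, bounds[1:]):
        queries.append(
            f"select * from {table} where {column} >= {low} and {column} < {high}"
        )
    # Everything at or above the last boundary.
    queries.append(f"select * from {table} where {column} >= {bounds[-1]}")
    return queries
```

Calling `split_queries("dts.cluster", "id", [1000])` reproduces the two statements in the sample. Each additional boundary adds one statement, and therefore one read thread.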
The following code provides a sample configuration for synchronizing data by using the HBase API.
{
  "reader": {
    "querySql": [
      "select * from dts.cluster where id < 1000", // Query for full data synchronization. Each statement corresponds to one read thread.
      "select * from dts.cluster where id >= 1000" // We recommend splitting the query to improve performance and reduce the cost of retries.
    ]
  },
  "writer": {
    "columns": [
      {
        "name": "f:id", // The name of the column in the destination table.
        "value": "id", // The name of the column in the source table.
        "isPk": false // This parameter is ignored and does not affect synchronization.
      },
      {
        "name": "f:cluster_id",
        "value": "cluster_id",
        "isPk": false
      },
      {
        "name": "f:id_and_cluster",
        "value": "{{concat(id, cluster_id)}}" // Jtwig expressions are supported for data transformation.
      }
    ],
    "rowkey": {
      "value": "id" // Defines the columns from the ApsaraDB RDS table that compose the row key in the HBase model. Jtwig syntax is supported.
    },
    "config": {
      "skipDelete": true // Skips delete operations.
    },
    "table": {
      "name": "dts:cluster", // The name of the table in Lindorm or ApsaraDB for HBase.
      "parameter": {
        "compression": "ZSTD", // The compression algorithm for the new table in Lindorm or ApsaraDB for HBase. We recommend using ZSTD.
        "split": ["1", "5", "9", "b"] // Specifies the split keys to pre-split the new table.
      }
    },
    "sourceTable": "dts.cluster"
  }
}
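To see how the `split` keys in the sample divide the new table, the following Python sketch looks up which pre-split region a row key would fall into. It assumes HBase-style ordering, where row keys are compared as bytes and each region covers a half-open range that starts at its split key. `region_index` is a hypothetical helper for illustration and is not an LTS or HBase API.

```python
from bisect import bisect_right


def region_index(row_key, split_keys):
    """Return the 0-based index of the region that would hold row_key.

    N split keys produce N + 1 regions. Region 0 holds keys below the
    first split key; region i holds keys from split key i-1 (inclusive)
    up to split key i (exclusive).
    """
    # Compare as bytes, the way HBase orders row keys.
    keys = sorted(k.encode("utf-8") for k in split_keys)
    return bisect_right(keys, row_key.encode("utf-8"))


splits = ["1", "5", "9", "b"]
for key in ["0", "1000", "42", "5", "7", "a3", "c"]:
    print(key, "-> region", region_index(key, splits))
```

Because the comparison is lexicographic, the row keys "1000" and "42" both land in the region that starts at "1". When the row key is a numeric `id` rendered as a string, how evenly the data spreads depends on the leading characters of the keys, not on their numeric value.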