The data shipping feature of Data Transmission Service (DTS) lets you use an SDK to ship data from various sources to DTS. DTS then synchronizes the data to a destination database. This supports a wider variety of data sources.
Use cases
The data shipping feature is suitable for the following scenarios:
-
The source is a database type that DTS does not natively support for data synchronization.
-
The source consists of log data or data of special types.
-
The source database cannot be directly connected to DTS for security reasons, such as the need to keep database credentials confidential.
Prerequisites
-
Create a destination instance to receive data. Currently, only AnalyticDB for PostgreSQL is supported. For instructions, see Create an instance.
-
In the destination AnalyticDB for PostgreSQL instance, create a database and a schema to receive the data. In this topic, the schema is named dts_deliver_test. For instructions, see Import data.
-
If you need to set the Access Method for the source database to Express Connect, VPN Gateway, or Smart Access Gateway, allow DTS to access your database over a VPN gateway. For instructions, see Connect a data center to DTS through VPN Gateway.
Usage notes
-
Programming skills are required to use the SDK to ship source data to DTS.
-
The schema name in your AnalyticDB for PostgreSQL instance, the database name entered during the Drop data object configuration step, and the dbName specified in the SDK must be identical. Otherwise, the destination may not receive the data.
-
You cannot change the number of shards after the instance is created.
-
After you create a data shipping instance, promptly start shipping data by using the SDK. Otherwise, the instance will fail because it cannot collect incremental data.
-
If a task fails, DTS support staff will attempt to restore it within eight hours. During restoration, they may restart the task or adjust its parameters.
Note: Only DTS task parameters are modified, not database parameters. The parameters that may be adjusted include those listed in Modify instance parameters.
Billing
For more information, see Billing overview.
Create a data shipping instance
-
Go to the Data Synchronization Tasks page.
-
Log on to the DMS console.
-
In the top navigation bar, click Data + AI.
-
In the left-side navigation pane, choose .
Note:
-
Operations may vary based on the mode and layout of the DMS console. For more information, see Simple mode console and Customize the layout and style of the DMS console.
-
You can also go to the Data Synchronization Tasks page of the new DTS console.
-
Click Create Task to open the task configuration page.
-
Configure the source and destination databases.
Category
Parameter
Description
N/A
Task Name
DTS automatically generates a task name. We recommend that you specify a descriptive name for easy identification. The name does not need to be unique.
Source Database
Select Existing Connection
Do not select a database instance. For a data shipping task, you must manually enter the database information below.
Database Type
Select Data Shipping.
Access Method
Select an access method based on your requirements. In this topic, Public IP Address is selected.
Note: If you select Express Connect, VPN Gateway, or Smart Access Gateway, you must also select the VPC and vSwitch to which the VPN gateway belongs.
Instance Region
Select the region where your client is located.
Note: If the region where your client is located is not in the list, select the geographically closest region.
Destination Database
Select Existing Connection
Select a database instance that is registered with DTS from the drop-down list. The database information below is then configured automatically.
Note: In the DMS console, this configuration item is named Select a DMS database instance.
If you have not registered the database instance or do not need to use a registered instance, manually configure the database information below.
Database Type
Select AnalyticDB for PostgreSQL.
Access Method
Select Alibaba Cloud Instance.
Instance Region
Select the region where the destination AnalyticDB for PostgreSQL instance is located.
Instance ID
Select the ID of the destination AnalyticDB for PostgreSQL instance.
Database Name
Enter the name of the database in the destination AnalyticDB for PostgreSQL instance to receive data.
Database Account
Enter the account of the destination AnalyticDB for PostgreSQL instance. The account must have read and write permissions. To create an account, see Create and manage users.
Database Password
Enter the password for the specified database account.
After completing the configuration, click Test Connectivity and Proceed at the bottom of the page.
Note: Ensure that the CIDR blocks of the DTS servers are added (either automatically or manually) to the security settings of both the source and destination databases to allow access. For more information, see Add the IP address whitelist of DTS servers.
If the source or destination is a self-managed database (i.e., the Access Method is not Alibaba Cloud Instance), you must also click Test Connectivity in the CIDR Blocks of DTS Servers dialog box.
-
Configure the task objects.
-
On the Configure Objects page, specify the objects to synchronize.
Parameter
Description
Processing Mode of Conflicting Tables
Precheck and Report Errors: Checks for tables with the same names in the destination database. If any tables with the same names are found, an error is reported during the precheck and the data synchronization task does not start. Otherwise, the precheck is successful.
Note: If you cannot delete or rename the table with the same name in the destination database, you can map it to a different name in the destination. For more information, see Database Table Column Name Mapping.
Ignore Errors and Proceed: Skips the check for tables with the same name in the destination database.
Warning: Selecting Ignore Errors and Proceed may cause data inconsistency and put your business at risk. For example:
If the table schemas are consistent and a record in the destination database has the same primary key or unique key value as a record in the source database:
During full data synchronization, DTS retains the destination record and skips the source record.
During incremental synchronization, DTS overwrites the destination record with the source record.
If the table schemas are inconsistent, data initialization may fail. This can result in only partial data synchronization or a complete synchronization failure. Use with caution.
Capitalization of Object Names in Destination Instance
Configure the case-sensitivity policy for database, table, and column names in the destination instance. By default, the DTS default policy is selected. You can also choose to use the default policy of the source or destination database. For more information, see Case policy for destination object names.
Drop data object configuration
-
Click Add Library. In the New Database dialog box, enter the source database name.
Important:
-
The source database name must be the same as the schema name in the AnalyticDB for PostgreSQL instance. This example uses dts_deliver_test.
-
If a database is already in the list and you need to add another, click the Add button next to the existing database.
-
-
Click OK.
-
Click the icon next to the database you just added to expand the database list.
-
Click Add Table next to Table. In the Add Table dialog box, enter the source table name.
Important: The source table name must be the same as the tableName configured in the SDK. This example uses tab1, tab2, and tab3. For more information, see Parameters.
-
Click OK.
-
(Optional) Configure table and column name mappings.
-
Click Edit next to the table you just created.
-
Configure table name mapping.
Modify the Table Name.
-
Configure column name mapping.
Clear the Sync All Columns checkbox, then modify the Column Name and Map column name.
Important:
-
The Column Name and Map column name parameters correspond to settings in the SDK. Column Name is the source column name from the SDK (specifically, the name parameter of createField in the FakeSource.java file), and Map column name is the column name in the destination AnalyticDB for PostgreSQL instance.
-
Click the icon to add more columns.
-
-
After the configuration is complete, click OK.
Note: You can repeat these operations to add, edit, batch edit, or delete databases or tables as needed.
-
-
Click Next: Advanced Settings.
Parameter
Description
Dedicated Cluster for Task Scheduling
By default, DTS schedules the task on a shared cluster, so you do not need to select one. Alternatively, you can purchase a dedicated cluster of a specified instance class to run DTS tasks. For details, see What is a DTS dedicated cluster?
Retry Time for Failed Connections
If the connection to the source or destination database fails after the synchronization task starts, DTS reports an error and immediately begins to retry the connection. The default retry duration is 720 minutes. You can customize the retry time to a value from 10 to 1,440 minutes. We recommend a duration of 30 minutes or more. If the connection is restored within this period, the task resumes automatically. Otherwise, the task fails.
Note: If multiple DTS instances share a source or destination database, DTS applies the shortest configured retry duration to all of them. For example, if Instance A is set to 30 minutes and Instance B to 60 minutes, 30 minutes is used for both.
DTS charges for task runtime during connection retries. Set a custom duration based on your business needs, or release the DTS instance promptly after you release the source/destination instances.
Retry Time for Other Issues
If a non-connection issue (e.g., a DDL or DML execution error) occurs, DTS reports an error and immediately retries the operation. The default retry duration is 10 minutes. You can also customize the retry time to a value from 1 to 1,440 minutes. We recommend a duration of 10 minutes or more. If the related operations succeed within the set retry time, the synchronization task automatically resumes. Otherwise, the task fails.
Important: The value of Retry Time for Other Issues must be less than that of Retry Time for Failed Connections.
Enable Throttling for Incremental Data Synchronization
To reduce the load on the destination database, you can limit the rate of the incremental synchronization task by setting RPS of Incremental Data Migration and BPS of Incremental Data Migration.
Environment Tag
Select an environment tag to identify the instance based on your requirements.
Configure ETL
Choose whether to enable the extract, transform, and load (ETL) feature. For more information, see What is ETL? Valid values:
-
Yes: Enables the ETL feature. Enter data processing statements in the code editor. For more information, see Configure ETL in a data migration or data synchronization task.
-
No: Disables the ETL feature.
Monitoring and Alerting
Select whether to set alerts and receive alert notifications based on your business needs.
-
No: Does not set an alert.
-
Yes: Configure alerts by setting an alert threshold and an alert contact. If a migration fails or the latency exceeds the threshold, the system sends an alert notification.
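The constraints on the two retry settings described above can be checked before you enter values in the console. The following Java sketch is illustrative only: the class and method names (RetrySettings, isValid, effectiveConnectionRetry) are hypothetical and are not part of DTS or its SDK. It encodes only the rules stated in this topic: 10 to 1,440 minutes for failed connections, 1 to 1,440 minutes for other issues, the second value less than the first, and the shortest connection retry duration applied when instances share a source or destination.

```java
import java.util.Arrays;

// Hypothetical helper, not part of DTS: mirrors the retry rules described in this topic.
final class RetrySettings {

    // Returns true if both values are in range and the "other issues" value
    // is less than the "failed connections" value.
    static boolean isValid(int connectionRetryMinutes, int otherRetryMinutes) {
        boolean connectionInRange = connectionRetryMinutes >= 10 && connectionRetryMinutes <= 1440;
        boolean otherInRange = otherRetryMinutes >= 1 && otherRetryMinutes <= 1440;
        return connectionInRange && otherInRange && otherRetryMinutes < connectionRetryMinutes;
    }

    // When several instances share a source or destination, the shortest
    // configured connection retry duration applies to all of them.
    static int effectiveConnectionRetry(int... configuredMinutes) {
        return Arrays.stream(configuredMinutes).min().getAsInt();
    }

    public static void main(String[] args) {
        System.out.println(isValid(720, 10));                 // documented defaults
        System.out.println(effectiveConnectionRetry(30, 60)); // Instance A and Instance B
    }
}
```

With the documented defaults (720 and 10 minutes) the check passes, and for two instances set to 30 and 60 minutes the effective duration is 30 minutes.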
-
Save the task and perform a precheck.
To view the parameters for configuring this instance via an API operation, hover over the Next: Save Task Settings and Precheck button and click Preview OpenAPI parameters in the tooltip.
If you have finished viewing the API parameters, click Next: Save Task Settings and Precheck at the bottom of the page.
Note: Before a synchronization task starts, DTS performs a precheck. You can start the task only if the precheck passes.
If the precheck fails, click View Details next to the failed item, fix the issue as prompted, and then rerun the precheck.
If the precheck generates warnings:
For non-ignorable warnings, click View Details next to the item, fix the issue as prompted, and run the precheck again.
For ignorable warnings, click Confirm Alert Details, then Ignore, and then OK. Finally, click Precheck Again to rerun the precheck with the warning skipped. Ignoring precheck warnings may lead to data inconsistency and other business risks. Proceed with caution.
-
Purchase the instance.
When the Success Rate reaches 100%, click Next: Purchase Instance.
-
On the Purchase page, select the billing method and instance class for the data synchronization instance. The following table describes the parameters.
Category
Parameter
Description
New Instance Class
Billing Method
-
Subscription: You are charged when you create the instance. This method is suitable for long-term needs and is more cost-effective than pay-as-you-go. The longer the subscription period, the higher the discount.
-
Pay-as-you-go: You are charged on an hourly basis. This method is suitable for short-term needs. You can release the instance at any time to save costs.
Number of Shards
The number of partitions for the destination topic.
Important: The number of shards cannot be modified after the instance is created. Select the value with caution.
Resource Group
The resource group to which the instance belongs. The default value is default resource group. For more information, see What is Resource Management?
Instance Class
DTS provides various instance classes with different performance levels. The instance class affects the synchronization speed. Select an instance class based on your business scenario. For more information, see Specifications of data synchronization instances.
Subscription Duration
For the subscription billing method, select the duration of the instance. You can select a monthly subscription for 1 to 9 months, or a yearly subscription for 1, 2, 3, or 5 years.
Note: This option is available only when the billing method is Subscription.
-
Read and select the checkbox for Data Transmission Service (Pay-as-you-go) Service Terms.
Click Buy and Start, and then click OK in the OK dialog box.
You can monitor the task progress on the data synchronization page.
Ship data with the SDK
-
Configure the DTS data shipping SDK.
-
Open an IDE such as IntelliJ IDEA and create a project.
-
In the new project, find the Project Object Model (POM) file: pom.xml.
-
Add the following dependency to the pom.xml file:
<dependency>
    <groupId>com.aliyun.dts.deliver</groupId>
    <artifactId>dts-deliver-client</artifactId>
    <version>1.0.0</version>
</dependency>
Note: You can find the latest Maven dependency on the dts-deliver-client page.
-
Download the sample code and refer to the DtsDeliverTest.java file in the dts-deliver-test folder to ship data. For the sample code, see Use sample code.
Note: In the FakeSource.java file, the read method is an example of a data source, and the name in createField is the source column name. You must modify the code based on your requirements.
In the Data Synchronization Tasks list, click the ID of the target data shipping instance, and in the navigation pane on the left, click Basic Information to obtain the parameters used in the SDK sample code.
Parameter
Description
How to obtain
ip:port
The endpoint of the data shipping channel.
In the Shipping Channel Information section, click Copy next to Public Endpoint or VPC Endpoint.
Important: You can use the VPC Endpoint only if the client's network is within the same Virtual Private Cloud (VPC) as the current instance.
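ak
The AccessKey ID of the account that owns the DTS data shipping instance.
For more information, see Create an AccessKey pair and View the AccessKey information of a RAM user.
secret
The AccessKey secret of the account that owns the DTS data shipping instance.
For more information, see Create an AccessKey pair and View the AccessKey information of a RAM user.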
dts_job_id
Note: This is the value to enter for DTS_JOB_ID.getKey().
The task ID of the DTS data shipping instance.
Important: Do not use the instance ID.
You can call the DescribeDtsJobs API operation to query the ID.
Note: In the Shipped Topic, the dts_job_id is the string between _vpc_ and _data_delivery_. For example, if the Shipped Topic is cn_hangzhou_vpc_cxti86dc11z***_data_delivery_version2, then the dts_job_id is cxti86dc11z***.
Topic
The destination shipping topic.
In the Shipping Channel Information section, click Copy next to Shipped Topic.
partition
The number of partitions in the destination topic.
In the Shipping Channel Information section, view the value of Shards.
region
The region of the DTS data shipping instance.
In the Shipping Channel Information section, view the value of Instance Region.
dbName
The source database name. This must be the same as the schema name in the AnalyticDB for PostgreSQL instance.
Enter the name based on your requirements. This example uses dts_deliver_test.
tableName
The source table name. This must be the same as the table name entered in the Drop data object configuration step.
Enter the names based on your requirements. This example uses tab1, tab2, and tab3.
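The rule for reading the dts_job_id out of the Shipped Topic (the string between _vpc_ and _data_delivery_) can be applied in code instead of by hand. The following Java sketch is illustrative only: ShippedTopicParser and extractJobId are hypothetical names, not part of the dts-deliver-client SDK, and the sketch assumes that the topic always contains both markers in that order.

```java
// Hypothetical helper, not part of the DTS SDK: extracts the task ID from a Shipped Topic.
final class ShippedTopicParser {
    private static final String PREFIX = "_vpc_";
    private static final String SUFFIX = "_data_delivery_";

    // Returns the substring between "_vpc_" and "_data_delivery_".
    static String extractJobId(String shippedTopic) {
        int start = shippedTopic.indexOf(PREFIX);
        if (start < 0) {
            throw new IllegalArgumentException("Topic does not contain " + PREFIX);
        }
        start += PREFIX.length();
        int end = shippedTopic.indexOf(SUFFIX, start);
        if (end < 0) {
            throw new IllegalArgumentException("Topic does not contain " + SUFFIX);
        }
        return shippedTopic.substring(start, end);
    }

    public static void main(String[] args) {
        // Masked example topic from this topic; the asterisks are part of the example.
        String topic = "cn_hangzhou_vpc_cxti86dc11z***_data_delivery_version2";
        System.out.println(extractJobId(topic));
    }
}
```

For the example topic in the table, the method returns cxti86dc11z***. A topic that lacks either marker is rejected with an exception rather than producing a partial ID.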
-
-
After configuring the parameters, record the current time and start the SDK.
-
Modify the current offset of the data shipping instance. For more information, see Change the checkpoint of a data synchronization or migration instance.
Note: By default, the Current Offset of a data shipping instance is the time when the Incremental Write module starts. You must change the offset to the time when the SDK started.
-
View the data in the destination database.
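Because the offset must later be set to the time when the SDK started, it can help to capture that time in the client program itself rather than noting it by hand. The following Java sketch is illustrative only: SdkStartTime is a hypothetical class, the yyyy-MM-dd HH:mm:ss pattern and the use of UTC are assumptions, and the format that the console expects for the offset may differ.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of the DTS SDK: records when the shipping client starts.
final class SdkStartTime {
    // Assumed display format; adjust it to whatever the console expects.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Formats an instant in the given time zone.
    static String format(Instant instant, ZoneId zone) {
        return FORMAT.format(instant.atZone(zone));
    }

    public static void main(String[] args) {
        Instant startedAt = Instant.now();
        System.out.println("SDK start time (UTC): " + format(startedAt, ZoneOffset.UTC));
        // Start shipping data here, for example with code adapted from DtsDeliverTest.java.
    }
}
```

Logging the value before the first record is shipped gives a timestamp to use when you change the Current Offset of the instance.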