How to use the DTS data shipping feature - Data Transmission Service (DTS) - Alibaba Cloud Help Center

Use cases

The data shipping feature is suitable for the following scenarios:

The source is a database type that DTS does not natively support for data synchronization.
The source consists of log data or data of special types.
The source database cannot be directly connected to DTS for security reasons, such as the need to keep database credentials confidential.

Prerequisites

Create a destination instance to receive data. Currently, only AnalyticDB for PostgreSQL is supported. For instructions, see Create an instance.
In the destination AnalyticDB for PostgreSQL instance, create a database and a schema to receive the data. In this topic, the schema is named dts_deliver_test. For instructions, see Import data.
If you need to set the Access Method for the source database to Express Connect, VPN Gateway, or Smart Access Gateway, allow DTS to access your database over a VPN gateway. For instructions, see Connect a data center to DTS through VPN Gateway.

Usage notes

Programming skills are required to use the SDK to ship source data to DTS.
The schema name in your AnalyticDB for PostgreSQL instance, the database name entered during the Drop data object configuration step, and the dbName specified in the SDK must be identical. Otherwise, the destination may not receive the data.
You cannot change the number of shards after the instance is created.
After you create a data shipping instance, promptly start shipping data by using the SDK. Otherwise, the instance will fail because it cannot collect incremental data.
If a task fails, DTS support staff will attempt to restore it within eight hours. During restoration, they may restart the task or adjust its parameters.

Note
Only the parameters of the DTS task are modified, not the parameters of the database. Parameters that may be adjusted include those listed in Modify instance parameters.
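
The naming rule in the usage notes can be checked before any data is shipped. The following is a minimal sketch, not part of the DTS SDK: the class ShippingNameCheck and its method are hypothetical, and the case-sensitive comparison is an assumption of this example.

```java
/** Hypothetical pre-flight check; not part of the DTS SDK. */
final class ShippingNameCheck {

    private ShippingNameCheck() {}

    /**
     * Returns true only when the schema name in AnalyticDB for PostgreSQL,
     * the database name entered in the console, and the dbName used in the
     * SDK are all non-null and exactly equal.
     * The comparison is case-sensitive, which is an assumption of this sketch.
     */
    static boolean namesMatch(String adbSchema, String consoleDatabase, String sdkDbName) {
        return adbSchema != null
                && adbSchema.equals(consoleDatabase)
                && adbSchema.equals(sdkDbName);
    }

    public static void main(String[] args) {
        // All three names use the example value from this topic.
        System.out.println(namesMatch("dts_deliver_test", "dts_deliver_test", "dts_deliver_test"));
        // A mismatch in any one of the three names fails the check.
        System.out.println(namesMatch("dts_deliver_test", "dts_deliver_test", "other_schema"));
    }
}
```

Running such a check at client startup makes a naming mismatch visible immediately, rather than as missing data in the destination.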

Billing

For more information, see Billing overview.

Create a data shipping instance

Go to the Data Synchronization Tasks page.
1. Log on to the DMS console.
2. In the top navigation bar, click Data + AI.
3. In the left-side navigation pane, choose Data Transmission Service (DTS) > Data Synchronization.
Note
- Operations may vary based on the mode and layout of the DMS console. For more information, see Simple mode console and Customize the layout and style of the DMS console.
- You can also go to the Data Synchronization Tasks page of the new DTS console.
Click Create Task to open the task configuration page.

Configure the source and destination databases.

Category	Parameter	Description
N/A	Task Name	DTS automatically generates a task name. We recommend that you specify a descriptive name for easy identification. The name does not need to be unique.
Source Database	Select Existing Connection	Do not select a database instance. For a data shipping task, you must manually enter the database information below.
	Database Type	Select Data Shipping.
	Access Method	Select an access method based on your requirements. In this topic, Public IP Address is selected. Note If you select Express Connect, VPN Gateway, or Smart Access Gateway, you must also select the VPC and vSwitch to which the VPN gateway belongs.
	Instance Region	Select the region where your client is located. Note If the region where your database is located is not in the list, select the geographically closest region.
Destination Database	Select Existing Connection	Select a database instance that is registered with DTS from the drop-down list. The database information below is then filled in automatically. Note In the DMS console, this configuration item is Select a DMS database instance. If you have not registered the database instance or do not need to use a registered instance, manually configure the database information below.
	Database Type	Select AnalyticDB for PostgreSQL.
	Access Method	Select Alibaba Cloud Instance.
	Instance Region	Select the region where the destination AnalyticDB for PostgreSQL instance is located.
	Instance ID	Select the ID of the destination AnalyticDB for PostgreSQL instance.
	Database Name	Enter the name of the database in the destination AnalyticDB for PostgreSQL instance to receive data.
	Database Account	Enter the account of the destination AnalyticDB for PostgreSQL instance. The account must have read and write permissions. To create an account, see Create and manage users.
	Database Password	Enter the password for the specified database account.

After completing the configuration, click Test Connectivity and Proceed at the bottom of the page.
Note
- Ensure that you add the CIDR blocks of the DTS servers (either automatically or manually) to the security settings of both the source and destination databases to allow access. For more information, see Add the IP address whitelist of DTS servers.
- If the source or destination is a self-managed database (i.e., the Access Method is not Alibaba Cloud Instance), you must also click Test Connectivity in the CIDR Blocks of DTS Servers dialog box.

Configure the task objects.

On the Configure Objects page, specify the objects to synchronize.

Parameter	Description
Processing Mode of Conflicting Tables	Precheck and Report Errors: Checks for tables with the same names in the destination database. If any tables with the same names are found, an error is reported during the precheck and the data synchronization task does not start. Otherwise, the precheck is successful. Note If you cannot delete or rename the table with the same name in the destination database, you can map it to a different name in the destination. For more information, see Database Table Column Name Mapping. Ignore Errors and Proceed: Skips the check for tables with the same name in the destination database. Warning Selecting Ignore Errors and Proceed may cause data inconsistency and put your business at risk. For example: If the table schemas are consistent and a record in the destination database has the same primary key or unique key value as a record in the source database: During full data synchronization, DTS retains the destination record and skips the source record. During incremental synchronization, DTS overwrites the destination record with the source record. If the table schemas are inconsistent, data initialization may fail. This can result in only partial data synchronization or a complete synchronization failure. Use with caution.
Capitalization of Object Names in Destination Instance	Configure the case-sensitivity policy for database, table, and column names in the destination instance. By default, the DTS default policy is selected. You can also choose to use the default policy of the source or destination database. For more information, see Case policy for destination object names.
Drop data object configuration	Click Add Library. In the New Database dialog box, enter the source database name. Important The source database name must be the same as the schema name in the AnalyticDB for PostgreSQL instance. This example uses dts_deliver_test. If a database is already in the list and you need to add another, click the Add button next to the existing database. Click OK. Click the icon next to the database you just added to expand the database list. Click Add Table next to Table. In the Add Table dialog box, enter the source table name. Important The source table name must be the same as the tableName configured in the SDK. This example uses tab1, tab2, and tab3. For more information, see Parameters. Click OK. (Optional) Configure table and column name mappings. Click Edit next to the table you just created. To configure table name mapping, modify the Table Name. To configure column name mapping, clear the Sync All Columns checkbox, and then modify the Column Name and Map column name. Important The Column Name and Map column name parameters correspond to settings in the SDK. Column Name is the source column name from the SDK (specifically, the name parameter of createField in the FakeSource.java file), and Map column name is the column name in the destination AnalyticDB for PostgreSQL instance. Click the icon to add more columns. After the configuration is complete, click OK. Note You can repeat these operations to add, edit, batch edit, or delete databases or tables as needed.
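
The conflict behavior described for Ignore Errors and Proceed can be illustrated with a toy model. This is only a sketch of the documented outcome for records that share a primary key, not DTS code; the class ConflictDemo, its methods, and the sample rows are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of the documented conflict behavior; not DTS code. */
final class ConflictDemo {

    private ConflictDemo() {}

    /** Full synchronization: an existing destination row with the same key is kept. */
    static void applyFull(Map<Integer, String> destination, int key, String sourceRow) {
        destination.putIfAbsent(key, sourceRow);
    }

    /** Incremental synchronization: the source row overwrites the destination row. */
    static void applyIncremental(Map<Integer, String> destination, int key, String sourceRow) {
        destination.put(key, sourceRow);
    }

    public static void main(String[] args) {
        // The map stands in for a destination table keyed by primary key.
        Map<Integer, String> destination = new LinkedHashMap<>();
        destination.put(1, "existing");

        applyFull(destination, 1, "from-source");
        System.out.println(destination.get(1)); // the destination row is retained

        applyIncremental(destination, 1, "from-source");
        System.out.println(destination.get(1)); // the source row replaces it
    }
}
```

The two phases treat the same key differently, which is why the destination can end up inconsistent with the source when existing tables are not checked.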

Click Next: Advanced Settings.

Parameter	Description
Dedicated Cluster for Task Scheduling	By default, DTS schedules the task on a shared cluster, which you do not need to select. You can purchase a dedicated cluster of a specified instance class to run DTS tasks. For details, see What is a DTS dedicated cluster?
Retry Time for Failed Connections	If the connection to the source or destination database fails after the synchronization task starts, DTS reports an error and immediately begins to retry the connection. The default retry duration is 720 minutes. You can customize the retry time to a value from 10 to 1,440 minutes. We recommend a duration of 30 minutes or more. If the connection is restored within this period, the task resumes automatically. Otherwise, the task fails. Note If multiple DTS instances (e.g., Instance A and B) share a source or destination, DTS uses the shortest configured retry duration (e.g., 30 minutes for A, 60 for B, so 30 minutes is used) for all instances. DTS charges for task runtime during connection retries. Set a custom duration based on your business needs, or release the DTS instance promptly after you release the source/destination instances.
Retry Time for Other Issues	If a non-connection issue (e.g., a DDL or DML execution error) occurs, DTS reports an error and immediately retries the operation. The default retry duration is 10 minutes. You can also customize the retry time to a value from 1 to 1,440 minutes. We recommend a duration of 10 minutes or more. If the related operations succeed within the set retry time, the synchronization task automatically resumes. Otherwise, the task fails. Important The value of Retry Time for Other Issues must be less than that of Retry Time for Failed Connections.
Enable Throttling for Incremental Data Synchronization	You can choose whether to set a rate limit for the incremental synchronization task by setting the RPS of Incremental Data Migration and BPS of Incremental Data Migration to reduce the load on the destination database.
Environment Tag	Select an environment tag to identify the instance based on your requirements.
Configure ETL	Choose whether to enable the extract, transform, and load (ETL) feature. For more information, see What is ETL? Valid values: Yes: Enables the ETL feature. Enter data processing statements in the code editor. For more information, see Configure ETL in a data migration or data synchronization task. No: Disables the ETL feature.
Monitoring and Alerting	Select whether to set alerts and receive alert notifications based on your business needs. No: Does not set an alert. Yes: Configure alerts by setting an alert threshold and an alert contact. If a migration fails or the latency exceeds the threshold, the system sends an alert notification.
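
The retry-time rules in the table above can be expressed as a small validation routine. This is a minimal sketch based only on the ranges and constraints stated in this topic; the class RetrySettings and its methods are hypothetical and are not part of any DTS API.

```java
/** Hypothetical helper that encodes the retry-time rules from this topic. */
final class RetrySettings {

    static final int DEFAULT_CONNECTION_RETRY_MINUTES = 720;
    static final int DEFAULT_OTHER_RETRY_MINUTES = 10;

    private RetrySettings() {}

    /**
     * Checks the documented constraints:
     * connection retry time in [10, 1440] minutes, other-issue retry time in
     * [1, 1440] minutes, and other-issue retry time strictly less than the
     * connection retry time.
     */
    static boolean isValid(int connectionMinutes, int otherMinutes) {
        boolean connectionInRange = connectionMinutes >= 10 && connectionMinutes <= 1440;
        boolean otherInRange = otherMinutes >= 1 && otherMinutes <= 1440;
        return connectionInRange && otherInRange && otherMinutes < connectionMinutes;
    }

    /**
     * When several instances share a source or destination, the shortest
     * configured connection retry time applies to all of them.
     */
    static int effectiveConnectionRetry(int... configuredMinutes) {
        if (configuredMinutes.length == 0) {
            throw new IllegalArgumentException("at least one value is required");
        }
        int shortest = configuredMinutes[0];
        for (int minutes : configuredMinutes) {
            shortest = Math.min(shortest, minutes);
        }
        return shortest;
    }

    public static void main(String[] args) {
        System.out.println(isValid(DEFAULT_CONNECTION_RETRY_MINUTES, DEFAULT_OTHER_RETRY_MINUTES));
        // Instance A uses 30 minutes and instance B uses 60 minutes.
        System.out.println(effectiveConnectionRetry(30, 60));
    }
}
```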

Save the task and perform a precheck.
- To view the parameters for configuring this instance via an API operation, hover over the Next: Save Task Settings and Precheck button and click Preview OpenAPI parameters in the tooltip.
- If you have finished viewing the API parameters, click Next: Save Task Settings and Precheck at the bottom of the page.
Note
- Before a synchronization task starts, DTS performs a precheck. You can start the task only if the precheck passes.
- If the precheck fails, click View Details next to the failed item, fix the issue as prompted, and then rerun the precheck.
- If the precheck generates warnings:
  For a non-ignorable warning, click View Details next to the item, fix the issue as prompted, and run the precheck again.
  For ignorable warnings, you can bypass them by clicking Confirm Alert Details, then Ignore, and then OK. Finally, click Precheck Again to skip the warning and run the precheck again. Ignoring precheck warnings may lead to data inconsistencies and other business risks. Proceed with caution.

Purchase the instance.

When the Success Rate reaches 100%, click Next: Purchase Instance.

On the Purchase page, select the billing method and instance class for the data synchronization instance. The following table describes the parameters.

Category	Parameter	Description
New Instance Class	Billing Method	Subscription: You are charged when you create the instance. This method is suitable for long-term needs and is more cost-effective than pay-as-you-go. The longer the subscription period, the higher the discount. Pay-as-you-go: You are charged on an hourly basis. This method is suitable for short-term needs. You can release the instance at any time to save costs.
	Number of Shards	The number of partitions for the destination topic. Important The number of shards cannot be modified after the instance is created. Select the value with caution.
	Resource Group	The resource group to which the instance belongs. The default value is default resource group. For more information, see What is Resource Management?
	Instance Class	DTS provides various instance classes with different performance levels. The instance class affects the synchronization speed. Select an instance class based on your business scenario. For more information, see Specifications of data synchronization instances.
	Subscription Duration	For the subscription billing method, select the duration of the instance. You can select a monthly subscription for 1 to 9 months, or a yearly subscription for 1, 2, 3, or 5 years. Note This option is available only when the billing method is Subscription.

Read and select the checkbox for Data Transmission Service (Pay-as-you-go) Service Terms.
Click Buy and Start, and then click OK in the confirmation dialog box.
You can monitor the task progress on the data synchronization page.

Ship data with the SDK

Configure the DTS data shipping SDK.

Open an IDE such as IntelliJ IDEA and create a project.
In the new project, find the Project Object Model (POM) file: pom.xml.

Add the following dependency to the pom.xml file:

<dependency>
  <groupId>com.aliyun.dts.deliver</groupId>
  <artifactId>dts-deliver-client</artifactId>
  <version>1.0.0</version>
</dependency>

Note

You can find the latest Maven dependency on the dts-deliver-client page.

Download the sample code and refer to the DtsDeliverTest.java file in the dts-deliver-test folder to ship data. For the sample code, see Use sample code.

Note

In the FakeSource.java file, the read method is an example of a data source, and the name in createField is the source column name. You must modify the code based on your requirements.

In the Data Synchronization Tasks list, click the ID of the target data shipping instance, and in the navigation pane on the left, click Basic Information to obtain the parameters used in the SDK sample code.

Parameter	Description	How to obtain
ip:port	The endpoint of the data shipping channel.	In the Shipping Channel Information section, click Copy next to Public Endpoint or VPC Endpoint. Important You can use the VPC Endpoint only if the client's network is within the same Virtual Private Cloud (VPC) as the current instance.
ak, secret	The AccessKey ID and AccessKey secret of the account that owns the DTS data shipping instance.	For more information, see Create an AccessKey pair and View the AccessKey information of a RAM user.
dts_job_id Note This is the value to enter for DTS_JOB_ID.getKey().	The task ID of the DTS data shipping instance. Important Do not use the instance ID.	You can call the DescribeDtsJobs API operation to query the ID. Note In the Shipped Topic, the dts_job_id is the string between _vpc_ and _data_delivery_. For example, if the Shipped Topic is cn_hangzhou_vpc_cxti86dc11z_data_delivery_version2, then the dts_job_id is cxti86dc11z.
Topic	The destination shipping topic.	In the Shipping Channel Information section, click Copy next to Shipped Topic.
partition	The number of partitions in the destination topic.	In the Shipping Channel Information section, view the value of Shards.
region	The region of the DTS data shipping instance.	In the Shipping Channel Information section, view the value of Instance Region.
dbName	The source database name. This must be the same as the schema name in the AnalyticDB for PostgreSQL instance.	Enter the name based on your requirements. This example uses dts_deliver_test.
tableName	The source table name. This must be the same as the table name entered in the Drop data object configuration step.	Enter the names based on your requirements. This example uses tab1, tab2, and tab3.
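
The note on dts_job_id describes how the task ID is embedded in the Shipped Topic. The following is a minimal sketch of that extraction, assuming the topic always contains the _vpc_ and _data_delivery_ markers as in the example above; the class ShippedTopicParser is hypothetical and is not part of the SDK. Calling DescribeDtsJobs remains the documented way to query the ID.

```java
/** Hypothetical helper; extracts the task ID from a Shipped Topic string. */
final class ShippedTopicParser {

    private static final String PREFIX = "_vpc_";
    private static final String SUFFIX = "_data_delivery_";

    private ShippedTopicParser() {}

    /**
     * Returns the text between "_vpc_" and "_data_delivery_".
     * Throws IllegalArgumentException if either marker is missing.
     */
    static String extractJobId(String shippedTopic) {
        int start = shippedTopic.indexOf(PREFIX);
        if (start < 0) {
            throw new IllegalArgumentException("missing " + PREFIX + " in " + shippedTopic);
        }
        start += PREFIX.length();
        int end = shippedTopic.indexOf(SUFFIX, start);
        if (end < 0) {
            throw new IllegalArgumentException("missing " + SUFFIX + " in " + shippedTopic);
        }
        return shippedTopic.substring(start, end);
    }

    public static void main(String[] args) {
        // Prints cxti86dc11z for the example topic used in this topic.
        System.out.println(extractJobId("cn_hangzhou_vpc_cxti86dc11z_data_delivery_version2"));
    }
}
```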

After configuring the parameters, record the current time and start the SDK.
Modify the current offset of the data shipping instance. For more information, see Change the checkpoint of a data synchronization or migration instance.

Note
By default, the Current Offset of a data shipping instance is the time when the Incremental Write module starts. You must change the offset to the time when the SDK started.
View the data in the destination database.
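
Because the offset must later be set to the time when the SDK started, it can help to capture that time in code immediately before the shipping client is launched. The following is a minimal sketch that uses only the Java standard library; the class SdkStartTime is hypothetical, and the yyyy-MM-dd HH:mm:ss format is an assumption, so check the format that the console expects when you change the offset.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper that records when the shipping client was started. */
final class SdkStartTime {

    // Assumed display format; adjust it to match what the console expects.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    private SdkStartTime() {}

    /** Formats an instant in the given time zone. */
    static String format(Instant startedAt, ZoneId zone) {
        return FORMAT.format(ZonedDateTime.ofInstant(startedAt, zone));
    }

    public static void main(String[] args) {
        // Capture the time first, then start shipping data with the SDK.
        Instant startedAt = Instant.now();
        System.out.println("SDK started at "
                + format(startedAt, ZoneId.systemDefault())
                + " (epoch ms " + startedAt.toEpochMilli() + ")");
        // Start the SDK client here, for example the logic in DtsDeliverTest.java.
    }
}
```

Logging both the formatted time and the epoch milliseconds avoids ambiguity if the client and the console use different time zones.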