You can use the AI data preparation feature to transfer data from a MySQL database to an AnalyticDB for PostgreSQL instance. This topic demonstrates the procedure using an ApsaraDB RDS for MySQL instance.
Applicability
-
You have created a target AnalyticDB for PostgreSQL instance with more storage capacity than the source ApsaraDB RDS for MySQL instance occupies.
Note-
The engine version must be 7.0 standard edition, the node specification (segment) must be 4C16G or higher, the AI node switch must be set to Open, and the number of AI nodes must be at least 1.
-
For more information, see Create an instance.
-
-
You have enabled vector search engine optimization on the target AnalyticDB for PostgreSQL instance.
-
You have created a database, a schema, and a table with a primary key in the target AnalyticDB for PostgreSQL instance. For more information, see SQL syntax and Data type mapping.
NoteWe recommend matching the table and column names in the target AnalyticDB for PostgreSQL instance to those in the source database.
Considerations
Source database limits
-
The source database server must have sufficient egress bandwidth (≥ 100 Mb/s). Otherwise, the data transmission speed is reduced.
-
Binlog requirements:
-
Binlog is enabled by default for RDS MySQL instances. You must ensure that the
binlog_row_imageparameter is set tofull. Otherwise, the precheck fails and you cannot start the data synchronization task. For information about how to set instance parameters, see Set Instance Parameters.Important-
If the source instance is a self-managed MySQL database, enable binlog, set the
binlog_formatparameter torow, and set thebinlog_row_imageparameter tofull. -
If the source self-managed MySQL database is a dual-master cluster, you must enable the
log_slave_updatesparameter to ensure that DTS can capture all binlog events. For more information, see Create an Account and Set Binlog for a Self-managed MySQL Database.
-
-
Retain local binlog files for at least 3 days (7 days recommended) for an RDS MySQL instance. For a self-managed MySQL database, retain local binlog files for at least 7 days. Otherwise, the data synchronization task may fail because DTS cannot obtain the binlog. In extreme cases, this can cause data inconsistency or data loss. Issues caused by a binlog retention period shorter than required by DTS are not covered by the DTS SLA.
NoteFor information about how to set the retention period for the local binlog of an RDS MySQL instance, see Automatically Delete Local Logs.
-
-
Synchronization object limits:
-
DTS does not synchronize data generated by operations that are not recorded in the binlog, such as physical backups or cascading operations, to the destination database.
-
The VARBIT, GEOMETRY, ARRAY, UUID, TSQUERY, TSVECTOR, TXID_SNAPSHOT, and POINT data types are not supported.
-
Prefix indexes are not supported.
-
If you synchronize objects at the table level and need to edit them (for example, configure table and column mapping), a single data synchronization task supports up to 1,000 tables. If you exceed this limit, split the objects across multiple data synchronization tasks, or configure a data synchronization task for a full database.
-
If the source data contains the datetime value
0000-00-00 00:00:00, the data synchronization task may fail.NoteWhen DTS synchronizes this value to the destination database, DTS converts it to
NULL. You can temporarily change the value in the source database to0001-01-01 00:00:00or set the corresponding column in the destination database to allow NULL values. -
If the source database runs MySQL 8.0.23 or later and the data includes an invisible column, data loss may occur because DTS cannot read data from that column.
Note-
Run the
ALTER TABLE <table_name> ALTER COLUMN <column_name> SET VISIBLE;command to make the invisible column visible. For more information, see Invisible Columns. -
If a table has no primary key, an invisible primary key is automatically generated. You must also make this invisible primary key visible. For more information, see Generated Invisible Primary Keys.
-
-
-
Operational limitations on the source database:
-
If the source database is a self-managed MySQL database and you perform a primary/secondary switchover, the data synchronization task fails.
-
While the data synchronization task is running, do not perform DDL operations that modify primary keys or add comments (for example,
ALTER TABLE table_name COMMENT='table_comment';). Otherwise, the DDL statement may fail.
-
Other limits
-
Destination tables cannot be append-only (AO) tables.
-
If the source table has a primary key, the primary key columns in the destination table must match those in the source table. If the source table does not have a primary key, the primary key columns in the destination table must match the distribution key.
-
The unique key (including primary key columns) of the destination table must include all columns of the distribution key.
-
Before you create a data synchronization task, evaluate the performance of the source and destination databases. We recommend that you start the task during off-peak hours.
-
DTS periodically executes the CREATE DATABASE IF NOT EXISTS
testcommand in the source database to advance the binlog position. -
If the destination table schema does not match the source table schema, data loss or task failure may occur.
-
DTS attempts to resume data synchronization tasks that have failed within the last seven days. Therefore, before the cutover to the destination instance, you must stop or release the task, or use the
revokecommand to revoke the write permissions of the DTS account on the destination instance. This prevents the source data from overwriting the data in the destination instance after DTS automatically resumes the task. -
DTS calculates the latency of an incremental task by comparing the timestamp of the last record written to the destination database with the current timestamp. If the source database has not been updated for a long time, the latency information may be inaccurate. If the reported latency for a data synchronization task is too high, you can run an update operation on the source database to refresh the latency.
-
If a task fails, DTS support staff will attempt to restore it within eight hours. During restoration, they may restart the task or adjust its parameters.
NoteOnly DTS task parameters are modified—not database parameters. Parameters that may be adjusted include those listed in Modify instance parameters.
Pricing
For details, see AI data preparation pricing.
SQL operations
|
Operation type |
SQL statements |
|
DML |
INSERT, UPDATE, and DELETE |
Permission requirements for database accounts
|
Database |
Required permissions |
Actions |
|
Source ApsaraDB RDS for MySQL instance |
Read and write permissions on the source data. |
|
|
Destination AnalyticDB for PostgreSQL instance |
Read and write permissions on the destination database. |
Create and manage users and manage user permissions. Note
You can use the initial account or an account with RDS_SUPERUSER permissions. |
If the database account for the source instance was not created and authorized in the ApsaraDB RDS for MySQL console, you must ensure that the account has the REPLICATION CLIENT, REPLICATION SLAVE, SHOW VIEW, and SELECT permissions.
Procedure
-
Go to the data preparation task list page for the destination region.
-
Log in to the DTS console.
-
In the navigation pane on the left, click Data Preparation.
-
In the upper-left corner of the page, select the region where the data preparation instance is located.
-
Optional: Click the Data Preparation Tasks tab.
-
-
Click Create Task to navigate to the task configuration page.
-
Configure the source and destination databases.
Category
Setting
Description
N/A
Task Name
DTS automatically generates a task name. We recommend that you specify a descriptive name for easy identification. The name does not need to be unique.
Source Database
Select Existing Connection
-
To use a database instance that you have already saved, select the instance from the drop-down list. The database information is then automatically populated.
-
If you have not saved the database instance, or if you want to use a different one, manually configure the database information.
Database Type
Select MySQL.
Access Method
Select Alibaba Cloud Instance.
Instance Region
Select the region where the source RDS MySQL instance is located.
Replicate Data Across Alibaba Cloud Accounts
In this example, a database instance under the current Alibaba Cloud account is used. Select No.
RDS Instance ID
Select the ID of the source RDS MySQL instance.
Database Account
Enter the database account of the source RDS MySQL instance. For the required permissions, see Database account permissions.
Database Password
Enter the password for the database account.
Encryption
Select Non-encrypted or SSL-encrypted based on your database configuration.
NoteFor information about the SSL encryption feature of RDS MySQL instances, see Enable SSL encryption with a cloud certificate.
Destination Database
Select Existing Connection
-
To use a database instance that you have already saved, select the instance from the drop-down list. The database information is then automatically populated.
-
If you have not saved the database instance, or if you want to use a different one, manually configure the database information.
Database Type
Select AnalyticDB for PostgreSQL.
Access Method
Select Alibaba Cloud Instance.
Instance Region
Select the region where the destination AnalyticDB for PostgreSQL instance is located.
Instance ID
Select the ID of the destination AnalyticDB for PostgreSQL instance.
Database Name
Enter the name of the database in the destination AnalyticDB for PostgreSQL instance.
Database Account
Enter the database account of the destination AnalyticDB for PostgreSQL instance. For the required permissions, see Database account permissions.
Database Password
Enter the password for the specified database account.
-
-
After you complete the configuration, click Test Connectivity and Proceed at the bottom of the page.
Note-
Ensure that the IP address segment of the DTS service is automatically or manually added to the security settings of the source and destination databases to allow access from DTS servers. For more information, see Add DTS server IP addresses to a whitelist.
-
If the source or destination database is a self-managed database (the Access Method is not Alibaba Cloud Instance), you must also click Test Connectivity in the CIDR Blocks of DTS Servers dialog box that appears.
-
-
Configure the task objects.
-
In the Configure Objects section, select the objects to synchronize.
Setting
Description
Task Stages
Select options based on your business requirements. DTS supports Full synchronization and Incremental synchronization.
DDL and DML Operations to Be Synchronized
You can select the SQL operations for incremental data synchronization at the instance level.
Processing Mode of Conflicting Tables
No configuration is required. Keep the default value.
Source Objects
In the Source Objects pane, select the objects that you want to synchronize, and then click
to move them to the Selected Objects pane.NoteIf the table and column names in the destination database exactly match those in the source database, you can select objects at the database or table level. Otherwise, you can select objects only at the table level.
Selected Objects
-
In the Selected Objects pane, right-click the database that you moved from the Source Objects pane.
-
In the Edit Database Name dialog box, change the Destination Database Name to the name of the schema in the destination AnalyticDB for PostgreSQL instance.
-
Click OK.
Note-
If the table or column names in the destination AnalyticDB for PostgreSQL instance are different from those in the source database, you must use the schema, table, and column name mapping feature to modify them.
-
To select SQL operations to synchronize at the database or table level, right-click the desired object in the Selected Objects pane and select the SQL operations in the dialog box that appears.
-
To set a WHERE clause to filter data, right-click the desired table in the Selected Objects pane and set a filter condition in the dialog box that appears.
-
-
In the Mapping Column section, configure columns to store the vector data from the data preparation.
-
Click Create Mapping Column.
-
In the Source (Library/Table/Column) column, select the source database, table, and column you want to map.
DTS populates the databases, tables, and columns that you select in Source (Library/Table/Column) into Target (Library/Table) and Destination Mapping Column.
-
Optional: Change the values in the Target (Library/Table) and Destination Mapping Column fields to the database name, table name, and column name in the destination AnalyticDB for PostgreSQL instance.
-
Optional: Repeat the preceding steps to add more mapping columns.
-
-
Click Next: Vector Configuration.
To configure parsing, chunking, and embedding policies for the mapping columns, perform the following steps.
-
In the row of the destination mapping column, click Edit Settings in the Actions column.
-
In the Edit Settings panel that appears, select Recommended policy for the desired policy type.
-
Modify the recommended policy.
Policy name
Parameter name
Description
Parsing policy
Retain elements
This policy filters data in the mapped column at the element level by including elements to retain and exclude. Compliant data is then synchronized to the mapped column of the destination AnalyticDB for PostgreSQL instance.
NoteSeparate multiple elements with commas (,).
Exclude elements
Chunking policy
Chunk algorithm
The algorithm used to split text into chunks.
Maximum number of characters per chunk
The maximum number of characters per chunk (ChunkSize).
Maximum number of overlapping characters between two Chunks.
The maximum number of overlapping characters allowed between two adjacent chunks (ChunkOverlap).
Retain the original Chunk
If you need to retain the original chunk content, you must also enter the name of the column that stores this content in the destination AnalyticDB for PostgreSQL instance.
Retain Chunk ID
If you need to retain the chunk ID, you also need to enter the column name for storing the chunk ID in the destination AnalyticDB for PostgreSQL instance.
Embedding policy
Embedding Model
The default value is AnalyticDB for PostgreSQL and cannot be changed.
-
After you complete the configuration, click OK.
-
-
Click Next: Advanced Settings.
Setting
Description
Retry Time for Failed Connections
If the connection to the source or destination database fails after the task starts, DTS reports an error and immediately starts retrying. The default retry duration is 720 minutes. You can also specify a custom duration from 10 to 1,440 minutes. We recommend a duration of 30 minutes or more. If DTS reconnects to the source or destination database within the specified duration, the task automatically resumes. Otherwise, the task fails.
Note-
If multiple DTS instances (for example, instance A and instance B) share the same source or destination database, and you set the retry duration to 30 minutes for instance A and 60 minutes for instance B, the shorter duration of 30 minutes applies to both.
-
You are charged for the task's runtime during connection retries. We recommend that you specify a custom retry duration based on your business needs, or release the DTS instance as soon as the source and destination database instances are released.
Retry Time for Other Issues
If a non-connection issue (such as a DDL or DML error) occurs after the task starts, DTS reports an error and immediately starts retrying the operation. The default retry duration is 10 minutes. You can also specify a custom duration from 1 to 1,440 minutes. We recommend a duration of 10 minutes or more. If the operation succeeds within the specified duration, the task automatically resumes. Otherwise, the task fails.
ImportantThe value of Retry Time for Other Issues must be less than the value of Retry Time for Failed Connections.
Enable Throttling for Incremental Data Synchronization
To reduce the load on the destination database, you can limit the rate of the incremental data synchronization task by setting the RPS of Incremental Data Synchronization and Data synchronization speed for incremental synchronization (MB/s).
Enclose Object Names in Quotation Marks
Select whether to add quotation marks to destination objects.
-
Select Yes: If the source database meets any of the following conditions, DTS automatically adds single quotation marks (') or double quotation marks (") to the names of applicable schemas, tables, or columns during incremental data synchronization.
-
The source database is case-sensitive and uses mixed-case names.
-
A source table name starts with a character other than a letter, or it contains characters other than letters, digits, or the supported special characters.
NoteThe supported special characters are underscores (_), number signs (#), and dollar signs ($).
-
The name of a schema, table, or column is a keyword, reserved word, or illegal character in the destination database.
-
-
Select No: DTS does not add quotation marks to destination objects.
Whether to delete SQL operations on heartbeat tables of forward and reverse tasks
Choose whether DTS writes heartbeat SQL information to the source database while the instance is running.
Yes: Does not write heartbeat SQL information to the source database. The DTS instance may display latency.
No: Writes heartbeat SQL information to the source database. This may interfere with source database operations like physical backups and cloning.
Environment Tag
You can select an environment tag to identify the instance. This setting is optional.
Configure ETL
The ETL feature is not used in this example. Keep the default setting. If you need to use the ETL feature to process data, select Yes and enter the corresponding data processing statements in the text box.
Monitoring and Alerting
Select whether to configure alerts. If the data preparation task fails or the latency exceeds a specified threshold, a notification is sent to the alert contact.
-
No: No alert is configured.
-
Yes: An alert is configured. You must also set the alert threshold and specify an Alert Contact. For more information, see Configure monitoring and alerts during task setup.
-
-
-
Save the task and run a precheck.
-
To view the API parameters for configuring the instance, hover over the Next: Save Task Settings and Precheck button and click Preview OpenAPI parameters in the tooltip.
-
When you are ready, click Next: Save Task Settings and Precheck at the bottom of the page.
Note-
A precheck runs before the task starts. The task can only start if the precheck is successful.
-
If the precheck fails, click View Details next to the failed check item, resolve the issue as prompted, and then run the precheck again.
-
If the precheck returns a warning:
-
For non-ignorable warnings, click View Details next to the failed check item. After you fix the issue, run the precheck again.
-
For ignorable warnings, you can click Confirm Alert Details, Ignore, OK, and then Precheck Again to bypass the warning and run the precheck again. Be aware that ignoring warnings can cause issues such as data inconsistency and pose risks to your business.
-
-
-
Purchase the instance.
-
When the Success Rate is 100%, click Next: Purchase Instance.
-
On the Purchase page, select the billing method and link specification for the data preparation instance.
Category
Parameter
Description
New Instance Class
Resource Group
The resource group for the instance. The default value is default resource group. For more information, see What is Resource Management?.
Link Specification
The default value is large and cannot be changed.
-
Read and select Data Transmission Service (Pay-as-you-go) Service Terms.
-
Click Buy and Start, and in the pop-up OK dialog box, click OK.
You can view the task progress on the data preparation page.
-