RDS MySQL to AnalyticDB for PostgreSQL data transfer

更新时间:
复制 MD 格式

You can use the AI data preparation feature to transfer data from a MySQL database to an AnalyticDB for PostgreSQL instance. This topic demonstrates the procedure using an ApsaraDB RDS for MySQL instance.

Applicability

  • You have created a target AnalyticDB for PostgreSQL instance with more storage capacity than the source ApsaraDB RDS for MySQL instance occupies.

    Note
    • The engine version must be 7.0 standard edition, the node specification (segment) must be 4C16G or higher, the AI node switch must be set to Open, and the number of AI nodes must be at least 1.

    • For more information, see Create an instance.

  • You have enabled vector search engine optimization on the target AnalyticDB for PostgreSQL instance.

  • You have created a database, a schema, and a table with a primary key in the target AnalyticDB for PostgreSQL instance. For more information, see SQL syntax and Data type mapping.

    Note

    We recommend matching the table and column names in the target AnalyticDB for PostgreSQL instance to those in the source database.

Considerations

Source database limits

  • The source database server must have sufficient egress bandwidth (≥ 100 Mb/s). Otherwise, the data transmission speed is reduced.

  • Binlog requirements:

    • Binlog is enabled by default for RDS MySQL instances. You must ensure that the binlog_row_image parameter is set to full. Otherwise, the precheck fails and you cannot start the data synchronization task. For information about how to set instance parameters, see Set Instance Parameters.

      Important
      • If the source instance is a self-managed MySQL database, enable binlog, set the binlog_format parameter to row, and set the binlog_row_image parameter to full.

      • If the source self-managed MySQL database is a dual-master cluster, you must enable the log_slave_updates parameter to ensure that DTS can capture all binlog events. For more information, see Create an Account and Set Binlog for a Self-managed MySQL Database.

    • Retain local binlog files for at least 3 days (7 days recommended) for an RDS MySQL instance. For a self-managed MySQL database, retain local binlog files for at least 7 days. Otherwise, the data synchronization task may fail because DTS cannot obtain the binlog. In extreme cases, this can cause data inconsistency or data loss. Issues caused by a binlog retention period shorter than required by DTS are not covered by the DTS SLA.

      Note

      For information about how to set the retention period for the local binlog of an RDS MySQL instance, see Automatically Delete Local Logs.

  • Synchronization object limits:

    • DTS does not synchronize data generated by operations that are not recorded in the binlog, such as physical backups or cascading operations, to the destination database.

    • The VARBIT, GEOMETRY, ARRAY, UUID, TSQUERY, TSVECTOR, TXID_SNAPSHOT, and POINT data types are not supported.

    • Prefix indexes are not supported.

    • If you synchronize objects at the table level and need to edit them (for example, configure table and column mapping), a single data synchronization task supports up to 1,000 tables. If you exceed this limit, split the objects across multiple data synchronization tasks, or configure a data synchronization task for a full database.

    • If the source data contains the datetime value 0000-00-00 00:00:00, the data synchronization task may fail.

      Note

      When DTS synchronizes this value to the destination database, DTS converts it to NULL. You can temporarily change the value in the source database to 0001-01-01 00:00:00 or set the corresponding column in the destination database to allow NULL values.

    • If the source database runs MySQL 8.0.23 or later and the data includes an invisible column, data loss may occur because DTS cannot read data from that column.

      Note
      • Run the ALTER TABLE <table_name> ALTER COLUMN <column_name> SET VISIBLE; command to make the invisible column visible. For more information, see Invisible Columns.

      • If a table has no primary key, an invisible primary key is automatically generated. You must also make this invisible primary key visible. For more information, see Generated Invisible Primary Keys.

  • Operational limitations on the source database:

    • If the source database is a self-managed MySQL database and you perform a primary/secondary switchover, the data synchronization task fails.

    • While the data synchronization task is running, do not perform DDL operations that modify primary keys or add comments (for example, ALTER TABLE table_name COMMENT='table_comment';). Otherwise, the DDL statement may fail.

Other limits

  • Destination tables cannot be append-only (AO) tables.

  • If the source table has a primary key, the primary key columns in the destination table must match those in the source table. If the source table does not have a primary key, the primary key columns in the destination table must match the distribution key.

  • The unique key (including primary key columns) of the destination table must include all columns of the distribution key.

  • Before you create a data synchronization task, evaluate the performance of the source and destination databases. We recommend that you start the task during off-peak hours.

  • DTS periodically executes the CREATE DATABASE IF NOT EXISTS test command in the source database to advance the binlog position.

  • If the destination table schema does not match the source table schema, data loss or task failure may occur.

  • DTS attempts to resume data synchronization tasks that have failed within the last seven days. Therefore, before the cutover to the destination instance, you must stop or release the task, or use the revoke command to revoke the write permissions of the DTS account on the destination instance. This prevents the source data from overwriting the data in the destination instance after DTS automatically resumes the task.

  • DTS calculates the latency of an incremental task by comparing the timestamp of the last record written to the destination database with the current timestamp. If the source database has not been updated for a long time, the latency information may be inaccurate. If the reported latency for a data synchronization task is too high, you can run an update operation on the source database to refresh the latency.

  • If a task fails, DTS support staff will attempt to restore it within eight hours. During restoration, they may restart the task or adjust its parameters.

    Note

    Only DTS task parameters are modified—not database parameters. Parameters that may be adjusted include those listed in Modify instance parameters.

Pricing

For details, see AI data preparation pricing.

SQL operations

Operation type

SQL statements

DML

INSERT, UPDATE, and DELETE

Permission requirements for database accounts

Database

Required permissions

Actions

Source ApsaraDB RDS for MySQL instance

Read and write permissions on the source data.

Create an account and modify account permissions.

Destination AnalyticDB for PostgreSQL instance

Read and write permissions on the destination database.

Create and manage users and manage user permissions.

Note

You can use the initial account or an account with RDS_SUPERUSER permissions.

Note

If the database account for the source instance was not created and authorized in the ApsaraDB RDS for MySQL console, you must ensure that the account has the REPLICATION CLIENT, REPLICATION SLAVE, SHOW VIEW, and SELECT permissions.

Procedure

  1. Go to the data preparation task list page for the destination region.

    1. Log in to the DTS console.

    2. In the navigation pane on the left, click Data Preparation.

    3. In the upper-left corner of the page, select the region where the data preparation instance is located.

    4. Optional: Click the Data Preparation Tasks tab.

  2. Click Create Task to navigate to the task configuration page.

  3. Configure the source and destination databases.

    Category

    Setting

    Description

    N/A

    Task Name

    DTS automatically generates a task name. We recommend that you specify a descriptive name for easy identification. The name does not need to be unique.

    Source Database

    Select Existing Connection

    • To use a database instance that you have already saved, select the instance from the drop-down list. The database information is then automatically populated.

    • If you have not saved the database instance, or if you want to use a different one, manually configure the database information.

    Database Type

    Select MySQL.

    Access Method

    Select Alibaba Cloud Instance.

    Instance Region

    Select the region where the source RDS MySQL instance is located.

    Replicate Data Across Alibaba Cloud Accounts

    In this example, a database instance under the current Alibaba Cloud account is used. Select No.

    RDS Instance ID

    Select the ID of the source RDS MySQL instance.

    Database Account

    Enter the database account of the source RDS MySQL instance. For the required permissions, see Database account permissions.

    Database Password

    Enter the password for the database account.

    Encryption

    Select Non-encrypted or SSL-encrypted based on your database configuration.

    Note

    For information about the SSL encryption feature of RDS MySQL instances, see Enable SSL encryption with a cloud certificate.

    Destination Database

    Select Existing Connection

    • To use a database instance that you have already saved, select the instance from the drop-down list. The database information is then automatically populated.

    • If you have not saved the database instance, or if you want to use a different one, manually configure the database information.

    Database Type

    Select AnalyticDB for PostgreSQL.

    Access Method

    Select Alibaba Cloud Instance.

    Instance Region

    Select the region where the destination AnalyticDB for PostgreSQL instance is located.

    Instance ID

    Select the ID of the destination AnalyticDB for PostgreSQL instance.

    Database Name

    Enter the name of the database in the destination AnalyticDB for PostgreSQL instance.

    Database Account

    Enter the database account of the destination AnalyticDB for PostgreSQL instance. For the required permissions, see Database account permissions.

    Database Password

    Enter the password for the specified database account.

  4. After you complete the configuration, click Test Connectivity and Proceed at the bottom of the page.

    Note
    • Ensure that the IP address segment of the DTS service is automatically or manually added to the security settings of the source and destination databases to allow access from DTS servers. For more information, see Add DTS server IP addresses to a whitelist.

    • If the source or destination database is a self-managed database (the Access Method is not Alibaba Cloud Instance), you must also click Test Connectivity in the CIDR Blocks of DTS Servers dialog box that appears.

  5. Configure the task objects.

    1. In the Configure Objects section, select the objects to synchronize.

      Setting

      Description

      Task Stages

      Select options based on your business requirements. DTS supports Full synchronization and Incremental synchronization.

      DDL and DML Operations to Be Synchronized

      You can select the SQL operations for incremental data synchronization at the instance level.

      Processing Mode of Conflicting Tables

      No configuration is required. Keep the default value.

      Source Objects

      In the Source Objects pane, select the objects that you want to synchronize, and then click 向右小箭头 to move them to the Selected Objects pane.

      Note

      If the table and column names in the destination database exactly match those in the source database, you can select objects at the database or table level. Otherwise, you can select objects only at the table level.

      Selected Objects

      1. In the Selected Objects pane, right-click the database that you moved from the Source Objects pane.

      2. In the Edit Database Name dialog box, change the Destination Database Name to the name of the schema in the destination AnalyticDB for PostgreSQL instance.

      3. Click OK.

      Note
      • If the table or column names in the destination AnalyticDB for PostgreSQL instance are different from those in the source database, you must use the schema, table, and column name mapping feature to modify them.

      • To select SQL operations to synchronize at the database or table level, right-click the desired object in the Selected Objects pane and select the SQL operations in the dialog box that appears.

      • To set a WHERE clause to filter data, right-click the desired table in the Selected Objects pane and set a filter condition in the dialog box that appears.

    2. In the Mapping Column section, configure columns to store the vector data from the data preparation.

      1. Click Create Mapping Column.

      2. In the Source (Library/Table/Column) column, select the source database, table, and column you want to map.

        DTS populates the databases, tables, and columns that you select in Source (Library/Table/Column) into Target (Library/Table) and Destination Mapping Column.

      3. Optional: Change the values in the Target (Library/Table) and Destination Mapping Column fields to the database name, table name, and column name in the destination AnalyticDB for PostgreSQL instance.

      4. Optional: Repeat the preceding steps to add more mapping columns.

    3. Click Next: Vector Configuration.

      To configure parsing, chunking, and embedding policies for the mapping columns, perform the following steps.

      1. In the row of the destination mapping column, click Edit Settings in the Actions column.

      2. In the Edit Settings panel that appears, select Recommended policy for the desired policy type.

      3. Modify the recommended policy.

        Policy name

        Parameter name

        Description

        Parsing policy

        Retain elements

        This policy filters data in the mapped column at the element level by including elements to retain and exclude. Compliant data is then synchronized to the mapped column of the destination AnalyticDB for PostgreSQL instance.

        Note

        Separate multiple elements with commas (,).

        Exclude elements

        Chunking policy

        Chunk algorithm

        The algorithm used to split text into chunks.

        Maximum number of characters per chunk

        The maximum number of characters per chunk (ChunkSize).

        Maximum number of overlapping characters between two Chunks.

        The maximum number of overlapping characters allowed between two adjacent chunks (ChunkOverlap).

        Retain the original Chunk

        If you need to retain the original chunk content, you must also enter the name of the column that stores this content in the destination AnalyticDB for PostgreSQL instance.

        Retain Chunk ID

        If you need to retain the chunk ID, you also need to enter the column name for storing the chunk ID in the destination AnalyticDB for PostgreSQL instance.

        Embedding policy

        Embedding Model

        The default value is AnalyticDB for PostgreSQL and cannot be changed.

      4. After you complete the configuration, click OK.

    4. Click Next: Advanced Settings.

      Setting

      Description

      Retry Time for Failed Connections

      If the connection to the source or destination database fails after the task starts, DTS reports an error and immediately starts retrying. The default retry duration is 720 minutes. You can also specify a custom duration from 10 to 1,440 minutes. We recommend a duration of 30 minutes or more. If DTS reconnects to the source or destination database within the specified duration, the task automatically resumes. Otherwise, the task fails.

      Note
      • If multiple DTS instances (for example, instance A and instance B) share the same source or destination database, and you set the retry duration to 30 minutes for instance A and 60 minutes for instance B, the shorter duration of 30 minutes applies to both.

      • You are charged for the task's runtime during connection retries. We recommend that you specify a custom retry duration based on your business needs, or release the DTS instance as soon as the source and destination database instances are released.

      Retry Time for Other Issues

      If a non-connection issue (such as a DDL or DML error) occurs after the task starts, DTS reports an error and immediately starts retrying the operation. The default retry duration is 10 minutes. You can also specify a custom duration from 1 to 1,440 minutes. We recommend a duration of 10 minutes or more. If the operation succeeds within the specified duration, the task automatically resumes. Otherwise, the task fails.

      Important

      The value of Retry Time for Other Issues must be less than the value of Retry Time for Failed Connections.

      Enable Throttling for Incremental Data Synchronization

      To reduce the load on the destination database, you can limit the rate of the incremental data synchronization task by setting the RPS of Incremental Data Synchronization and Data synchronization speed for incremental synchronization (MB/s).

      Enclose Object Names in Quotation Marks

      Select whether to add quotation marks to destination objects.

      • Select Yes: If the source database meets any of the following conditions, DTS automatically adds single quotation marks (') or double quotation marks (") to the names of applicable schemas, tables, or columns during incremental data synchronization.

        • The source database is case-sensitive and uses mixed-case names.

        • A source table name starts with a character other than a letter, or it contains characters other than letters, digits, or the supported special characters.

          Note

          The supported special characters are underscores (_), number signs (#), and dollar signs ($).

        • The name of a schema, table, or column is a keyword, reserved word, or illegal character in the destination database.

      • Select No: DTS does not add quotation marks to destination objects.

      Whether to delete SQL operations on heartbeat tables of forward and reverse tasks

      Choose whether DTS writes heartbeat SQL information to the source database while the instance is running.

      • Yes: Does not write heartbeat SQL information to the source database. The DTS instance may display latency.

      • No: Writes heartbeat SQL information to the source database. This may interfere with source database operations like physical backups and cloning.

      Environment Tag

      You can select an environment tag to identify the instance. This setting is optional.

      Configure ETL

      The ETL feature is not used in this example. Keep the default setting. If you need to use the ETL feature to process data, select Yes and enter the corresponding data processing statements in the text box.

      Monitoring and Alerting

      Select whether to configure alerts. If the data preparation task fails or the latency exceeds a specified threshold, a notification is sent to the alert contact.

  6. Save the task and run a precheck.

    • To view the API parameters for configuring the instance, hover over the Next: Save Task Settings and Precheck button and click Preview OpenAPI parameters in the tooltip.

    • When you are ready, click Next: Save Task Settings and Precheck at the bottom of the page.

    Note
    • A precheck runs before the task starts. The task can only start if the precheck is successful.

    • If the precheck fails, click View Details next to the failed check item, resolve the issue as prompted, and then run the precheck again.

    • If the precheck returns a warning:

      • For non-ignorable warnings, click View Details next to the failed check item. After you fix the issue, run the precheck again.

      • For ignorable warnings, you can click Confirm Alert Details, Ignore, OK, and then Precheck Again to bypass the warning and run the precheck again. Be aware that ignoring warnings can cause issues such as data inconsistency and pose risks to your business.

  7. Purchase the instance.

    1. When the Success Rate is 100%, click Next: Purchase Instance.

    2. On the Purchase page, select the billing method and link specification for the data preparation instance.

      Category

      Parameter

      Description

      New Instance Class

      Resource Group

      The resource group for the instance. The default value is default resource group. For more information, see What is Resource Management?.

      Link Specification

      The default value is large and cannot be changed.

    3. Read and select Data Transmission Service (Pay-as-you-go) Service Terms.

    4. Click Buy and Start, and in the pop-up OK dialog box, click OK.

      You can view the task progress on the data preparation page.