Tapdata Cloud: Import MySQL to Table Store

更新时间: 2026-06-04 05:15:23

Tapdata Cloud's visual interface lets you synchronize data from self-hosted MySQL, Alibaba Cloud RDS MySQL, and third-party cloud databases to Tablestore in real time.

Prerequisites

Before you use Tapdata Cloud to sync data from MySQL to Table Store, ensure you have the following:

Background

Tapdata Cloud is a real-time data service that integrates data replication and data development. It provides millisecond-level real-time data synchronization and data fusion across clouds, regions, and data sources. For more information, see What is Tapdata.

Notes

Data synchronization supports MySQL versions 5.0, 5.1, 5.5, 5.6, 5.7, and 8.x.

Procedure

This section outlines how to synchronize data from MySQL to Table Store using Tapdata Cloud.

  1. Configure a MySQL database and an authorized account. For more information, see Step 1: Configure a MySQL database and an authorized account.

  2. Deploy Tapdata Agent on the machine that hosts the MySQL database or on a machine in the same network. For more information, see Step 2: Deploy Tapdata Agent.

    The Tapdata Agent (Agent) is a core component for data synchronization, data development, and processing heterogeneous data. It uses streaming technology to retrieve data from the source, process and transform it, and send it to the target.

    Important

    Your data does not pass through Tapdata Cloud, and Tapdata Cloud does not upload or store your data.

  3. Use Tapdata Cloud to connect the source and target databases. For more information, see Step 3: Connect the source and target databases.

  4. Create a data replication task. For more information, see Step 4: Configure a data replication task.

Step 1: Configure MySQL and an authorized account

Before you synchronize data from your MySQL database, create a database account for the data replication task. To enable incremental data synchronization, you must also enable Binlog on the database.

  1. Log on to your MySQL database.

  2. Run the appropriate command for your database version to create a database account.

    • For MySQL versions 5.0, 5.1, 5.5, 5.6, and 5.7, run the following command:

      CREATE USER 'username'@'host' IDENTIFIED BY 'password';
    • For MySQL 8.x, run the following command:

      CREATE USER 'username'@'host' IDENTIFIED WITH mysql_native_password BY 'password';

    The following table describes these parameters.

    Parameter

    Example

    Description

    username

    tapdatamysql

    The database username.

    password

    Tap**********

    The database password.

    host

    %

    The host from which the user can log on. The percent sign (%) indicates any host.

  3. Grant privileges to the database account you just created.

    For production environments, grant more granular privileges based on your business requirements. The following examples show how to grant the SELECT privilege on a specific database or grant global privileges.

    • Grant the SELECT privilege on a specific database

      GRANT SELECT, SHOW VIEW, CREATE ROUTINE, LOCK TABLES ON database_name.table_name TO 'username' IDENTIFIED BY 'password';
    • Grant global privileges

      GRANT RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'username' IDENTIFIED BY 'password';

    The following table describes these parameters.

    Parameter

    Example

    Description

    database_name.table_name

    datademo.userinfo

    The database and table to grant privileges on.

    username

    tapdatamysql

    The database username.

    password

    Tap**********

    The database password.

  4. To read incremental data, enable Binlog.

    1. Run the vim command to modify the configuration in $MYSQL_HOME/mysql.cnf. An example configuration is as follows:

      server_id         = 112233
      log_bin           = mysql-binlog
      expire_logs_days  = 1
      binlog_format     = row
      binlog_row_image  = full

      The following table describes these parameters.

      Parameter

      Example

      Description

      server_id

      112233

      A unique ID that identifies a server or replication client in MySQL. The server_id must be unique for each server and replication client. The value must be an integer greater than 0.

      log_bin

      mysql-binlog

      The base name for the sequence of binary log files.

      expire_logs_days

      1

      The number of days to retain binary log files before they are automatically deleted.

      binlog_format

      row

      The format of the binary log. Set this parameter to row.

      binlog_row_image

      full

      Specifies how row images are written to the binary log for row-based replication. Set this parameter to full to log all columns for both the before-image and after-image of changed rows.

    2. After modifying the configuration, run the following command to restart the MySQL process.

      /etc/init.d/mysqld restart
    3. After you log on to the MySQL database, execute the following command to confirm that the Binlog configuration has taken effect, where the value of the format parameter is ROW.

      SHOW VARIABLES LIKE 'binlog_format';

      The following is a sample output:

      +---------------+-------+
      | Variable_name | Value |
      +---------------+-------+
      | binlog_format | ROW   |
      +---------------+-------+
      1 row in set (0.00 sec)

Step 2: Deploy the Tapdata Agent

Important

The machine where you deploy the Agent must have internet access.

Data flow is often time-sensitive. To minimize the impact of network latency, we recommend deploying the Tapdata Agent on the machine that hosts the MySQL database or on a machine within the same network.

Note

This section uses Linux (64-bit) as an example. To deploy on other platforms (such as Windows or Alibaba Cloud Compute Nest), see Install an Agent.

  1. Log in to the Tapdata Cloud platform.

  2. In the left-side navigation pane, click Agent Management.

  3. On the Agent Management page, click Create Agent.

  4. In the Agent Download and Installation dialog box, select Linux (64 bit), and then follow the on-screen instructions to install the Agent.

    Important

    Before you begin, ensure that Java 1.8 is installed in your deployment environment and that the required environment variables are configured correctly.

    1. Log in to the machine where you plan to deploy the Agent.

    2. Run the following command to create a tapdata directory for the Tapdata Agent installation and deployment.

      mkdir tapdata
    3. Click Copy for the command in step 3 of the dialog box, and then run it in the tapdata directory.

      Note

      Running this command automatically downloads, deploys, and starts the Tapdata Agent.

    Wait for the command to complete. The Tapdata Agent is then installed and started.

    The following output indicates a successful Tapdata Agent startup.

    WORK DIR:/root/tapdata
    Init tapdata...
    Download components...
    Downloading 100.00% 225.48 MB, Total size: 225.48 MB, avg speed: 12.3 MB/s, remaining time: 0m 0s
    Downloading 100.00% 0.00 KB .Total size: 1.07 KB
    Downloading 100.00% 0.00 KB .Total size: 0.14 KB
    organize files...
    WORK DIR:/root/tapdata
    Testing JDK...
    java version:1.8
    Java environment OK.
    The max memory will be used by Tapdata is 1119M
    Waiting for the flow engine to start \
    FlowEngine is startup at : 2022-12-22 14:54

Step 3: Connect to source and destination databases

Connect to the source and destination databases in Tapdata Cloud.

  1. Log in to the Tapdata Cloud platform.

  2. In the left-side navigation pane, click Connection Management.

  3. Connect to the MySQL database as the source.

    1. On the Connection Management page, click Create Connection.

    2. In the Create Connection dialog box, on the Verified Data Sources tab, click MySQL.

    3. On the Create Connection page, configure the data source parameters as follows.

      Parameter

      Description

      Connection Name

      The name of the data connection. This name must be unique.

      Connection Type

      MySQL can be used as a source or destination. Valid values are Source and Target and Source.

      Address

      The database connection address.

      Port

      The database service port.

      Database

      The database name. Each connection corresponds to one database.

      Note

      To connect to multiple databases, create a separate connection for each.

      Account

      The database account.

      Password

      The database password.

      Advanced Settings

      Connection parameters

      Additional connection parameters. This parameter is empty by default.

      Time Zone

      Defaults to the database's time zone. You can also specify a time zone based on your business requirements.

      For example, if the source database uses the default time zone (+8:00) and the destination database uses a specified time zone (+0:00), a time value of 2020-01-01 16:00:00 in the source is stored as 2020-01-01 08:00:00 in the destination.

      Include Tables

      The tables to include. Valid values:

      • All (default): Contains all tables in the database.

      • Custom: Specify the tables to include. Use a comma (,) to separate multiple tables or expressions. In expressions, you can use an asterisk (*) to represent any string of characters.

      Exclude Tables

      Excludes specified tables from those matched by the Include Tables rule.

      Turn on the Exclude Tables switch and specify the tables to exclude. Separate multiple tables or expressions with a comma (,). The asterisk (*) can be used as a wildcard.

      Agent Settings

      The agent to use for the connection. Valid values:

      • Platform automatic allocation (default): The platform allocates a node for connection access.

      • User-specified: After setting this parameter to User-specified, select the Agent to use.

      Model Loading Time

      If a data source contains more than 10,000 models, Tapdata Cloud periodically refreshes them at the specified time.

    4. Click Connection Test. If the test is successful, click Save.

      Note

      If the connection test fails, follow the on-screen instructions to resolve the issue.

  4. Connect to the destination Tablestore instance.

    1. On the Connection Management page, click Create Connection.

    2. In the Create Connection dialog box, on the Beta Data Sources tab, click Tablestore.

    3. On the Create Connection page, configure the data source parameters as follows.

      Parameter

      Description

      Connection Name

      The name of the data connection. This name must be unique.

      Connection Type

      The value is fixed to Target. This data connection can only be used as a destination in Tapdata Cloud, not as a source.

      Endpoint

      The endpoint of the Tablestore instance. For more information, see Endpoints.

      Instance Name

      The name of the Tablestore instance. For more information, see Instances.

      AccessKey ID

      The AccessKey ID and AccessKey Secret of the RAM user.

      Note

      When using a temporary access credential from STS, use the AccessKey ID and AccessKey Secret from that credential.

      AccessKey Secret

      AccessKey Token

      The token for the temporary STS credential. This parameter is required only when using an STS credential.

      Agent Settings

      The agent to use for the connection. Valid values:

      • Automatic allocation by the platform (Default): The platform allocates nodes to provide access.

      • Manual specification: After you set this parameter to Manual specification, select the Agent to use.

      Model Loading Frequency

      If a data source contains more than 10,000 models, Tapdata Cloud periodically refreshes them at the specified frequency.

    4. Click Connection Test. If the test is successful, click Save.

      Note

      If the connection test fails, follow the on-screen instructions to resolve the issue.

Step 4: Configure a data replication task

Configure a data replication task to synchronize data in real time between homogeneous or heterogeneous data sources. This feature is ideal for various business scenarios, including data migration, data synchronization, disaster recovery, and read performance scaling.

  1. Log in to the Tapdata Cloud platform.

  2. In the left-side navigation pane, click Data Replication.

  3. On the Data Replication page, click Create.

  4. From the left side of the page, drag the MySQL and Tablestore databases to the canvas on the right, and then connect them.

  5. Click the MySQL database. In the right-side panel, configure the parameters as described in the following table.

    Parameter

    Description

    Node name

    By default, this is the connection name. You can change it as needed.

    DDL event collection

    Since Tablestore does not support DDL statement synchronization, you can skip this parameter.

    Select tables

    Select the tables to replicate. The following options are available:

    • By table name: Select the tables to replicate.

      In the Tables to Replicate area, select the desired tables, and then click the fig_row icon.

      You can also click Bulk Select to enter multiple table names for bulk selection. Separate the table names with commas (,).

    • By regular expression: Use a regular expression to select tables for replication.

      When you use this method, any new tables created in the source database that match the expression are automatically synchronized to the target database.

    Batch read size

    The number of records read per batch during full synchronization. The default is 100.

  6. Click the Tablestore database to preview the data schema and configure advanced settings.

    Parameter

    Description

    Node name

    By default, this is the connection name. You can change it as needed.

    Inferred schema

    View the table schema (such as field names and data types) and modify the field types as needed.

    Note
    • To modify a field type, click the fig_down icon next to the target field type and complete the configuration in the dialog box that appears.

    • A single Tablestore table can have a maximum of 32 columns. If the source MySQL table exceeds this limit, drag a field processor node from the left-side panel and connect it between the MySQL and Tablestore nodes. Then, in the field processor node, you can filter out unnecessary columns to meet this limit.

    Advanced settings

    Duplicate handling policy

    Specifies what to do if the target table already exists. The default is Keep existing schema and data in the target. If the target table has no data and its schema differs from the source table, you can select Clear existing schema and data from the target.

    Data write mode

    Specifies how data is written to Tablestore. The following options are available:

    • Process by event type: Sets separate data write policies for insert, update, and delete events.

    • Append-only: Processes only insert events and discards update and delete events.

    Data write policy

    The policy for writing data based on event type. This parameter applies only when Data write mode is set to Process by event type.

    The policy for insert events is Update if target exists, and the policy for update and delete events is Discard if target does not exist.

    Full multi-threaded write

    The number of concurrent threads for full data writes. The default is 8. You can adjust this value based on the target database's write performance.

    Incremental write threads

    The number of concurrent threads for incremental data writes. This is disabled by default. Turn on the Incremental write threads switch and adjust the value based on the target's write performance.

  7. Optional: Click the fig_settingicon icon at the top of the canvas and configure the task properties as described in the following table.

    Parameter

    Description

    Task name

    A custom name for the synchronization task.

    Synchronization type

    Specifies the data synchronization method. The following options are available:

    • Full + incremental: After completing a full data synchronization, the task synchronizes incremental data.

    • Full: Copies all existing data from the source to the target.

    • Incremental: Copies new data and data changes from the source to the target in real time.

    Task description

    An optional description for the task.

    Advanced settings

    Scheduled start time

    Specifies when the task should start.

    To specify a start time, turn on the Scheduled start time switch and select a date and time.

    Data validation

    If you turn on the Data validation switch, the task automatically performs full and incremental validation on the synchronized data to ensure consistency. The number of inconsistent rows is displayed on the monitoring page.

    Important

    If a task meets any of the following conditions, the task will not be validated, even if you enable the Data Validation switch.

    • The task includes an intermediate processing node.

    • The source connection does not support validation.

    • The target connection does not support validation.

    Incremental concurrent write

    If you turn on the Incremental concurrent write switch, the task writes source data to the target concurrently during incremental synchronization.

    Incremental lag threshold

    Helps you quickly determine if the replication task is lagging.

    Turn on the Incremental lag threshold switch and set a threshold. During incremental synchronization, the task is considered lagging if its latency exceeds this threshold.

    The default value is 1 second.

    Processor threads

    Specifies the number of processor threads.

    Incremental data processing mode

    The processing mode for incremental data during synchronization. Valid values are Batch and Row by row.

    Agent settings

    Specifies which agent to use. The following options are available:

    • Platform-assigned (default): The platform automatically assigns an agent for the connection.

    • Manually specify: If you set this parameter to Manually specify, select the agent to use.

  8. After verifying the settings, click Start.

    After the task starts, you can monitor its status, including metrics such as QPS, latency, and task duration.

上一篇: Synchronize MySQL data using DTS 下一篇: Sync Kafka data
阿里云首页 表格存储 相关技术圈