This topic describes how to use DataX to synchronize data from a MySQL database to Tablestore.
Background
DataX is an offline data synchronization tool from Alibaba Cloud. It connects to a MySQL database over JDBC to retrieve data by using SQL statements. DataX then caches the data in the local JVM and uses writer threads to write it to a Tablestore data table. For more information about DataX, see DataX.
Prerequisites
-
Prepare the required MySQL information, including the username, password, and JDBC connection information.
-
Activate Tablestore, and create an instance and a data table to store the synchronized data. For details, see Activate Tablestore and create an instance and Create a data table.
ImportantUse the original primary key or a unique index from MySQL as the primary key for the Tablestore data table. For the sample MySQL table and Tablestore data table used in this topic, see Appendix: Sample data.
-
Obtain the endpoint and name of your Tablestore instance.
-
Log on to the Tablestore console.
-
In the top navigation bar, select a resource group and a region.
-
On the Overview page, click the instance alias or click Instance Management in the Actions column.
-
On the Instance Details tab, view the instance name and endpoint.
-
-
Obtain an AccessKey pair. You can use the AccessKey pair of an Alibaba Cloud account or a RAM user for configuration. For details on how to obtain an AccessKey pair, see Obtain an AccessKey pair.
ImportantFor security reasons, use a RAM user to access Tablestore. You can create a RAM user, grant the
AliyunOTSFullAccesspermission to the user, and create an AccessKey pair for the RAM user. For details, see Use the AccessKey pair of a RAM user to access Tablestore.
Procedure
The procedures in this topic are performed on an ECS instance that runs Alibaba Cloud Linux 3.2104 LTS 64-bit or Ubuntu 22.04 64-bit.
Step 1: Install dependencies
-
Install Python (Python 2 or Python 3 is supported).
Alibaba Cloud Linux and Ubuntu on ECS come with Python 3 pre-installed. If you use a different server, you must install Python yourself.
-
Install JDK (1.8 or later). JDK 1.8 is recommended.
The following commands show how to install JDK 1.8 on the specified systems. If you use a different server, you must install the JDK yourself.
Alibaba Cloud Linux
yum -y install java-1.8.0-openjdk-devel.x86_64Ubuntu
apt update && apt upgrade apt install openjdk-8-jdk
Step 2: Install DataX
-
Download the DataX package.
wget https://datax-opensource.oss-cn-hangzhou.aliyuncs.com/202309/datax.tar.gz -
Decompress the package.
tar -zxvf datax.tar.gz
If you need to build DataX from source, see the DataX Installation Guide.
Step 3: Create a configuration file
-
Go to the
bindirectory of DataX.cd datax/bin -
Create a configuration file named
mysql_to_ots.json.vi mysql_to_ots.jsonRefer to the following configuration sample. The MySQL reader supports two data reading modes. Select a mode based on your business requirements.
-
QuerySQL mode: Retrieves data by running an SQL query. This mode supports table joins.
-
Table mode: Defines the data to be exported by specifying a table name, column names, and a WHERE clause. DataX automatically assembles an SQL statement to extract the data. This mode also supports parallel synchronization through data sharding.
QuerySQL mode
Replace the placeholder values in the configuration file with your specific information.
{ "job": { "setting": { "speed": { "channel": 1 }, "errorLimit": { "record": 0, "percentage": 0 } }, "content": [ { "reader": { "name": "mysqlreader", "parameter": { "username": "mysql_username", "password": "mysql_password", "connection": [ { "querySql": [ "select * from table_name" ], "jdbcUrl": [ "jdbc:mysql://server_ip:3306/database_name?useSSL=false" ] } ] } }, "writer": { "name": "otswriter", "parameter": { "endpoint":"endpoint", "accessId":"accesskey_id", "accessKey":"accesskey_secret", "instanceName":"instance_name", "table":"table_name", "primaryKey":[ {"name":"order_id", "type":"string"} ], "column":[ {"name":"user_id","type":"string"}, {"name":"sku_id","type":"string"}, {"name":"price","type":"double"}, {"name":"num","type":"int"}, {"name":"total_price","type":"double"}, {"name":"order_status","type":"string"}, {"name":"create_time","type":"string"}, {"name":"modified_time","type":"string"} ], "writeMode":"UpdateRow" } } } ] } }The following table describes the MySQL Reader parameters that you must replace in the configuration.
Parameter
Description
username
The username for the MySQL database connection.
password
The password for the MySQL user.
querySql
The SQL query that defines the scope of data to synchronize.
jdbcUrl
The JDBC connection information.
For more information about MySQL Reader, see MySQL Reader.
The following table describes the Tablestore Writer parameters that you must replace in the configuration.
Parameter
Description
endpoint
The endpoint of the instance.
accessId
The AccessKey ID of your Alibaba Cloud account or RAM user.
accessKey
The AccessKey secret of your Alibaba Cloud account or RAM user.
instanceName
The name of the instance.
table
The name of the target table.
primaryKey
The list of primary key columns for the target table.
column
The attribute columns to be written.
For more information about Tablestore Writer, see Tablestore Writer.
Table mode
Replace the placeholder values in the configuration file with your specific information.
{ "job": { "setting": { "speed": { "channel": 1 }, "errorLimit": { "record": 0, "percentage": 0.02 } }, "content": [ { "reader": { "name": "mysqlreader", "parameter": { "username": "mysql_username", "password": "mysql_password", "column": [ "order_id", "user_id" , "sku_id" , "price", "num", "total_price", "order_status", "create_time", "modified_time" ], "splitPk": "num", "connection": [ { "table": [ "table_name" ], "jdbcUrl": [ "jdbc:mysql://server_ip:3306/database_name?useSSL=false" ] } ] } }, "writer": { "name": "otswriter", "parameter": { "endpoint":"endpoint", "accessId":"accesskey_id", "accessKey":"accesskey_secret", "instanceName":"instance_name", "table":"table_name", "primaryKey":[ {"name":"order_id", "type":"string"} ], "column":[ {"name":"user_id","type":"string"}, {"name":"sku_id","type":"string"}, {"name":"price","type":"double"}, {"name":"num","type":"int"}, {"name":"total_price","type":"double"}, {"name":"order_status","type":"string"}, {"name":"create_time","type":"string"}, {"name":"modified_time","type":"string"} ], "writeMode":"UpdateRow" } } } ] } }The following table describes the MySQL Reader parameters that you must replace in the configuration.
Parameter
Description
username
The username for the MySQL database connection.
password
The password for the MySQL user.
column
The columns of the source table to synchronize.
splitPk
The column used for data sharding. DataX uses this column to split the dataset for parallel processing only when the
channelvalue is greater than 1.NoteFor best performance, shard data on an integer-based column.
table
The name of the MySQL source table.
jdbcUrl
The JDBC connection information.
For more information about MySQL Reader, see MySQL Reader.
The following table describes the Tablestore Writer parameters that you must replace in the configuration.
Parameter
Description
endpoint
The endpoint of the instance.
accessId
The AccessKey ID of your Alibaba Cloud account or RAM user.
accessKey
The AccessKey secret of your Alibaba Cloud account or RAM user.
instanceName
The name of the instance.
table
The name of the target table.
primaryKey
The list of primary key columns for the target table.
column
The attribute columns to be written.
For more information about Tablestore Writer, see Tablestore Writer.
-
Step 4: Run the synchronization command
-
Run the following command to start data synchronization.
python3 datax.py mysql_to_ots.jsonAfter the synchronization job is complete, DataX prints a summary of the job status.
2025-02-10 18:02:17.355 [job-0] INFO JobContainer - Job start time : 2025-02-10 18:02:06 Job end time : 2025-02-10 18:02:17 Total time elapsed : 11s Average throughput : 871.56KB/s Record write speed : 10000rec/s Total records read : 100000 Total read/write failures : 0 -
Verify the result.
View the imported data in the Tablestore console.
Related documents
-
For the data types supported by Tablestore attribute columns, see Data types.
Appendix: Sample data
MySQL source table
The following CREATE TABLE statement is for the sample MySQL source table:
create table orders (
order_id varchar(50) primary key comment 'Order ID',
user_id varchar(10) not null comment 'User ID',
sku_id varchar(10) not null comment 'SKU ID',
price decimal(12, 2) not null comment 'Unit price',
num int not null comment 'Quantity',
total_price decimal(12, 2) not null comment 'Total order price',
order_status varchar(2) not null comment 'Order status',
create_time timestamp not null default current_timestamp comment 'Order creation time',
modified_time timestamp not null default current_timestamp on update current_timestamp comment 'Last modified time'
);
Tablestore target table
Because Tablestore is schema-free, you only need to specify the primary key order_id when you create the target table.
For this example, the order_id primary key is a STRING that also serves as the partition key. The table is configured with Max Versions set to 1, a data lifecycle of Never Expire, no predefined attribute columns, and a disabled search index.