Migrate self-managed MySQL on ECS to MaxCompute

更新时间:
复制 MD 格式

This article describes how to use an exclusive resource group for Data Integration to migrate data from a self-managed MySQL database on an ECS instance to MaxCompute.

Prerequisites

  • You have at least one ECS instance bound to a VPC. Do not use a classic network. You must also have a MySQL database installed on the instance, along with a user and test data. This example uses the following statements to create the test data for the self-managed MySQL database.

    CREATE TABLE IF NOT EXISTS good_sale(
       create_time timestamp,
       category varchar(20),
       brand varchar(20),
       buyer_id varchar(20),
       trans_num varchar(20),
       trans_amount DOUBLE,
       click_cnt varchar(20)
       );
    insert into good_sale values('2018-08-21','coat','brandA','lilei',3,500.6,7),
    ('2018-08-22','food','brandB','lilei',1,303,8),
    ('2018-08-22','coat','brandC','hanmeimei',2,510,2),
    ('2018-08-22','bath','brandA','hanmeimei',1,442.5,1),
    ('2018-08-22','food','brandD','hanmeimei',2,234,3),
    ('2018-08-23','coat','brandB','jimmy',9,2000,7),
    ('2018-08-23','food','brandA','jimmy',5,45.1,5),
    ('2018-08-23','coat','brandE','jimmy',5,100.2,4),
    ('2018-08-24','food','brandG','peiqi',10,5560,7),
    ('2018-08-24','bath','brandF','peiqi',1,445.6,2),
    ('2018-08-24','coat','brandA','ray',3,777,3),
    ('2018-08-24','bath','brandG','ray',3,122,3),
    ('2018-08-24','coat','brandC','ray',1,62,7) ;
  • Take note of the private IP, VPC, and vSwitch of your ECS instance.

  • Configure the security group for the ECS instance to allow inbound traffic on the default MySQL port, 3306. For more information, see Add a security group rule. Take note of your security group name.

  • Create a DataWorks workspace. This example uses a DataWorks workspace in basic mode that uses MaxCompute as the compute engine. Ensure that your ECS instance and DataWorks workspace are in the same region. For information about how to create a workspace, see Create a workspace.

  • Purchase an exclusive resource group for Data Integration and bind it to the VPC where the ECS instance resides. The exclusive resource group must be in the same zone as the ECS instance. For more information, see Use an exclusive resource group for Data Integration. After the binding is complete, you can view your exclusive resource group in the Resource Groups.

  • In the Network Settings > VPC Binding section, check whether the VPC, vSwitch, and Security Group information is consistent with that of the ECS.

  • Add a MaxCompute data source. For more information, see Associate a MaxCompute compute resource.

Background information

An exclusive resource group for Data Integration ensures fast and stable data transfers. The exclusive resource group that you purchase, the data source to be accessed (the self-managed MySQL database on an ECS instance in this article), and the DataWorks workspace must be in the same region. Additionally, the resource group and the ECS instance must be in the same zone.

Procedure

  1. Create a MySQL data source in DataWorks.

      1. Log on to the DataWorks console. In the target region, click More > Management Center in the left-side navigation pane. Select a workspace from the drop-down list and click Go to Management Center.

      2. On the Workspace Management page, click Data Sources in the left-side navigation pane to open the data source page.

    1. Click Add Data Source.

    2. In the Add Data Source dialog box, click MySQL.

    3. In the Create MySQL Data Source dialog box, configure the parameters. For more information, see Configure a MySQL data source.

      This example uses the User-created Data Store with Public IP Addresses. For Connection Address, enter the private IP address of your ECS instance and the default MySQL port number, 3306.

      Note

      The Test Connectivity feature is not supported for self-managed MySQL data sources within a VPC. A failed connectivity test is expected at this stage.

    4. Click Test Connectivity for the required resource group.

      You can use an exclusive resource group for Data Integration to synchronize data. Each task can use only one resource group. If multiple resource groups are available, you must test the connectivity of each resource group that you want to use. This ensures that the resource group used for the synchronization task can connect to the data source. Otherwise, the data synchronization task will fail. For more information, see Network connectivity solutions.

    5. After the connectivity test passes, click Finish.

  2. Create a MaxCompute table.

    You need to create a table in DataWorks to receive the test data from MySQL.

    1. Click the 图标 icon in the upper-left corner and choose All Products > Data Development and O&M > DataStudio.

    2. Create a Workflow. For more information, see Create a scheduled workflow.

    3. Right-click the new workflow and choose Create Table > Table.

    4. Enter a name for the MaxCompute table. This example uses the name good_sale, which is the same as the MySQL source table. Click DDL, enter the DDL statement, and then click Generate Table Schema.

      This example uses the following DDL statement. Note the data type conversions.

      CREATE TABLE IF NOT EXISTS good_sale(
         create_time string,
         category STRING,
         brand STRING,
         buyer_id STRING,
         trans_num BIGINT,
         trans_amount DOUBLE,
         click_cnt BIGINT
         );
    5. Enter a Display Name for the table and click Commit to Production Environment to create the good_sale table in MaxCompute.

  3. Configure a Data Integration task.

    1. Right-click the workflow and choose Create Node > Data Integration > Batch Synchronization to create a Data Integration task.

    2. Select the MySQL data source that you added as the source and the MaxCompute data source as the destination. Click Conversion script to switch the Data Integration task to script mode.

      At this point, if an error occurs or you cannot select a Table from the data source, this is normal. Click image Script Conversion in the toolbar to switch to script mode.

    3. In the right-side pane, click Resource Group configuration and select the exclusive resource group that you purchased.

      If you do not switch the task to the exclusive resource group for Data Integration, the task will fail.

    4. Enter the following script for the Data Integration task.

      {
          "type": "job",
          "steps": [
              {
                  "stepType": "mysql",
                  "parameter": {
                      "column": [//Source column names
                          "create_time",
                          "category",
                          "brand",
                          "buyer_id",
                          "trans_num",
                          "trans_amount",
                          "click_cnt"
                      ],
                      "connection": [
                          {
                              "datasource": "shuai",//Source data source
                              "table": [
                                  "good_sale"//Source database table name. This must be an array in square brackets.
                              ]
                          }
                      ],
                      "where": "",
                      "splitPk": "",
                      "encoding": "UTF-8"
                  },
                  "name": "Reader",
                  "category": "reader"
              },
              {
                  "stepType": "odps",
                  "parameter": {
                      "partition": "",
                      "truncate": true,
                      "datasource": "odps_source",//Destination MaxCompute data source name
                      "column": [//Destination column names
                          "create_time",
                          "category",
                          "brand",
                          "buyer_id",
                          "trans_num",
                          "trans_amount",
                          "click_cnt"
                      ],
                      "emptyAsNull": false,
                      "table": "good_sale"//Destination table name
                  },
                  "name": "Writer",
                  "category": "writer"
              }
          ],
          "version": "2.0",
          "order": {
              "hops": [
                  {
                      "from": "Reader",
                      "to": "Writer"
                  }
              ]
          },
          "setting": {
              "errorLimit": {
                  "record": "0"
              },
              "speed": {
                  "throttle": false,
                  "concurrent": 2
              }
          }
      }
    5. Click Run. You can check the Operational Logs at the bottom of the page to confirm that the data has been transferred to MaxCompute.

Results

You can create an ODPS SQL node to query the data in the MaxCompute table.

Enter the query statement select * from good_sale; and click Running. After the query is complete, the results show the data transferred to the MaxCompute table.