Migrate data from MaxCompute to Tablestore


This topic explains how to use the data integration and development features of DataWorks to migrate data from MaxCompute to Tablestore.

Procedure

  1. Activate services

    1. Activate MaxCompute

    2. Activate DataWorks

  2. Create a table in DataWorks

    1. Log on to the DataWorks console and select a region in the upper-left corner.

    2. In the left navigation pane, click Workspace.

    3. On the Workspaces page, in the Actions column for the target workspace, choose Shortcuts > Data Studio.

    4. On the Data Studio page, create a MaxCompute SQL node. Run the following statement to create a table named transs.

      CREATE TABLE transs
      (name    STRING,
      id    STRING,
      gender    STRING);
  3. Import data into the transs table

    1. Download the sample CSV file to your computer: demo_data.csv.

    2. In the left navigation pane, choose Data Integration > Data Upload and Download.

    3. Click Go to Data Upload and Download.

    4. In the left navigation bar, click the upload icon, and then click Upload Data.

      Upload the local data to the table named transs in MaxCompute.
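
      Before uploading, it can help to check the local file against the three-column layout of transs. The following is a minimal sketch using only the Python standard library. It assumes the CSV columns appear in the order name, id, gender; whether demo_data.csv has a header row is not stated here, so the hypothetical has_header flag covers both cases. The function name and the sample rows are made up for illustration, and rows with an empty name or id are only flagged for review because those columns later become the Tablestore primary key.

```python
import csv
import io


def find_problem_rows(csv_text, expected_columns=3, key_positions=(0, 1), has_header=False):
    """Return (line_number, reason) pairs for rows that do not fit the transs layout.

    Hypothetical helper: column order name, id, gender is an assumption.
    """
    problems = []
    reader = csv.reader(io.StringIO(csv_text))
    for line_number, row in enumerate(reader, start=1):
        if has_header and line_number == 1:
            continue  # skip the header row when the caller says there is one
        if len(row) != expected_columns:
            problems.append(
                (line_number, "expected %d columns, found %d" % (expected_columns, len(row)))
            )
            continue
        # name and id become the Tablestore primary key, so flag empty values
        if any(not row[i].strip() for i in key_positions):
            problems.append((line_number, "empty primary key field"))
    return problems


# Made-up sample data, not the contents of demo_data.csv
sample = "alice,1001,female\nbob,,male\ncarol,1003\n"
problems = find_problem_rows(sample)
for line_number, reason in problems:
    print("line %d: %s" % (line_number, reason))
```

      To check a real file, read it with open(path, newline="", encoding="utf-8").read() and pass the text to find_problem_rows.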

  4. Create a table in the Tablestore console

    1. Log on to the Tablestore console and create an instance.

    2. Create a data table named trans. For more information, see Create a data table.

      Set name and id as the primary key columns, both of the String type.

  5. Add a MaxCompute data source in DataWorks

    1. Log on to the DataWorks console and select a region in the upper-left corner.

    2. In the left navigation pane, click Workspace.

    3. On the Workspaces page, click the name of the target workspace.

    4. On the Workspace Details page, click Data Sources in the left navigation pane.

    5. On the Data Source tab, click Add Data Source, and select MaxCompute.

    6. On the Add MaxCompute Data Source page, fill in the Basic Information.

      For more information, see Configure a MaxCompute data source.

  6. Add a Tablestore data source. For more information, see Configure a Tablestore data source.

  7. Configure MaxCompute Reader and Tablestore Writer

    1. Log on to the DataWorks console and select a region in the upper-left corner.

    2. In the left navigation pane, choose Data Development and O&M > Data Development.

    3. Select Workspace and click Go to Data Studio.

    4. On the left side of Data Studio, click the create icon, and select Create Node > Data Integration > Batch Synchronization.

      • For Source, select MaxCompute.

      • For Destination, select Tablestore.

    5. On the node configuration page, configure the following parameters.

      • Data Source-Source: Select the MaxCompute data source that you added.

      • Data Source-Destination: Select the Tablestore data source that you added.

      • Source-Table: Select the MaxCompute table that you created.

      • Destination-Table: Select the Tablestore table that you created.

      • Runtime Resource Group: Select the exclusive resource group that you created.

      • Keep the default values for the other parameters.

      Alternatively, click the Switch to Code icon above the configuration area to switch to script mode. The following code provides an example:

      {
          "type": "job",
          "steps": [
              {
                  "stepType": "odps",
                  "parameter": {
                      "partition": [],
                      "datasource": "odps_first",
                      "column": [
                          "name",
                          "id",
                          "gender"
                      ],
                      "table": "transs"
                  },
                  "name": "Reader",
                  "category": "reader"
              },
              {
                  "stepType": "ots",
                  "parameter": {
                      "datasource": "transs",
                      "column": [
                          {
                              "name": "gender",
                              "type": "STRING"
                          }
                      ],
                      "writeMode": "UpdateRow",
                      "table": "trans",
                      "primaryKey": [
                          {
                              "name": "name",
                              "type": "STRING"
                          },
                          {
                              "name": "id",
                              "type": "STRING"
                          }
                      ]
                  },
                  "name": "Writer",
                  "category": "writer"
              }
          ],
          "version": "2.0",
          "order": {
              "hops": [
                  {
                      "from": "Reader",
                      "to": "Writer"
                  }
              ]
          },
          "setting": {
              "errorLimit": {
                  "record": "0"
              },
              "speed": {
                  "throttle": false,
                  "concurrent": 1,
                  "dmu": 1
              }
          }
      }
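
      In the example above, the Reader columns (name, id, gender) line up with the Writer's primaryKey entries followed by its column entries. The following is a minimal Python sketch that checks this for a script-mode job. It assumes that the Writer consumes the Reader's fields in that order, which the example suggests but this topic does not state. check_column_mapping is a hypothetical helper, and the embedded job is trimmed to the fields the check needs.

```python
import json


def check_column_mapping(job):
    """Compare Reader column names with the Writer's primaryKey + column names.

    Assumption: fields are matched by position, primary key columns first.
    """
    steps = {step["category"]: step["parameter"] for step in job["steps"]}
    reader_columns = steps["reader"]["column"]
    writer = steps["writer"]
    writer_columns = [c["name"] for c in writer["primaryKey"]]
    writer_columns += [c["name"] for c in writer["column"]]
    return reader_columns == writer_columns, reader_columns, writer_columns


# Trimmed copy of the script-mode example; only the fields the check reads
job = json.loads("""
{
  "steps": [
    {"category": "reader", "parameter": {"column": ["name", "id", "gender"]}},
    {"category": "writer", "parameter": {
      "primaryKey": [{"name": "name", "type": "STRING"},
                     {"name": "id", "type": "STRING"}],
      "column": [{"name": "gender", "type": "STRING"}]
    }}
  ]
}
""")

ok, reader_columns, writer_columns = check_column_mapping(job)
print("columns line up:", ok)
```

      Running the check before saving the node catches a column that was added on one side only.
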
  8. Verify the data in the Tablestore console

    1. Log on to the Tablestore console. In the upper-left corner, select a region.

    2. In the navigation pane on the left, select All Instances.

    3. On the All Instances page, click the instance name to go to the Instance Management page.

    4. On the Instance Management page, click the Instance Details tab.

    5. On the Instance Details tab, in the Tables area, click the name of the data table that you want to view.

    6. On the data table management page, click the Query Data tab to view the data in the table.
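
When you compare the result with the source, keep in mind that a Tablestore row is identified by its primary key, here (name, id), and that the job writes with UpdateRow. Source rows that share the same name and id therefore end up as a single row, so the Tablestore table can hold fewer rows than transs. The following is a minimal sketch that simulates this with a plain Python dictionary. It does not call Tablestore, the rows are made up, and which duplicate wins in a real job depends on write order; in this simulation the later row overwrites the earlier one.

```python
def simulate_update_row(source_rows):
    """Simulate UpdateRow writes keyed by (name, id); not a Tablestore client."""
    table = {}
    for name, row_id, gender in source_rows:
        # Create the row if the key is new, otherwise update its attribute column
        table.setdefault((name, row_id), {})["gender"] = gender
    return table


# Made-up source rows; the first and third share the same primary key
rows = [
    ("alice", "1001", "female"),
    ("bob", "1002", "male"),
    ("alice", "1001", "F"),
]
result = simulate_update_row(rows)
print("source rows:", len(rows), "simulated Tablestore rows:", len(result))
```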