How to configure splits when syncing full data with Tablestore Reader

If the synchronization speed is slow when you sync full data with Tablestore Reader, configure splits in the sync script.

Problem

The synchronization speed is slow when you sync full data with Tablestore Reader. The sync script is configured as follows:

"reader": {
  "plugin": "ots",
  "parameter": {
    "datasource": "",
    "table": "",
    "column": [],
    "range": {
      "begin": [
        {
          "type": "INF_MIN"
        }
      ],
      "end": [
        {
          "type": "INF_MAX"
        }
      ]
    }
  }
}

Cause

The data volume is large, and splits are not configured in the sync script. The sync task uses a single thread to pull data, which slows down the synchronization.

Solution

If the full data volume is large, you can configure splits in the sync script. The procedure is as follows:

Obtain the split points using one of the following methods.

Use the Java software development kit (SDK) to call the ComputeSplitPointsBySize API operation. For more information, see Calculate shards by size.

The following is a sample response:

LowerBound:pkname1:INF_MIN, pkname2:INF_MIN
UpperBound:pkname1:cbcf23c8cdf831261f5b3c052db3479e, pkname2:INF_MIN
LowerBound:pkname1:cbcf23c8cdf831261f5b3c052db3479e, pkname2:INF_MIN
UpperBound:pkname1:INF_MAX, pkname2:INF_MAX

Download the Tablestore command-line interface (CLI) and run the `points -s splitSize -t tablename` command to obtain the split points. For more information, see Command-line interface.

Note

The unit of `splitSize` is 100 MB. For example, a value of 10 requests splits of about 1 GB each. You do not need to configure split points for small data volumes. For large data volumes, choose `splitSize` so that the number of resulting splits matches the maximum concurrency that your sync environment supports.
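As a rough illustration of the sizing advice in this note, the following Python sketch picks a `splitSize` so that the number of splits is close to the concurrency you want. The function name, the table size, and the concurrency figure are hypothetical. Only the 100 MB unit comes from the note, and the actual split boundaries are still decided by Tablestore.

```python
import math

UNIT_MB = 100  # splitSize is expressed in units of 100 MB


def suggest_split_size(table_size_mb: float, concurrency: int) -> int:
    """Return a splitSize (in 100 MB units) that yields roughly `concurrency` splits."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    per_split_mb = table_size_mb / concurrency
    # Round up so the number of splits does not exceed the target concurrency.
    return max(1, math.ceil(per_split_mb / UNIT_MB))


# Hypothetical example: a 50 GB table synced with 10 concurrent readers.
print(suggest_split_size(50 * 1024, 10))
```

Treat the result as a starting value for the `-s` argument and adjust it after checking how many splits are actually returned.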

The following is a sample response from the CLI command:

[
  {
    "LowerBound": {
      "PrimaryKeys": [
        {
          "ColumnName": "pkname1",
          "Value": null,
          "PrimaryKeyOption": 2
        },
        {
          "ColumnName": "pkname2",
          "Value": null,
          "PrimaryKeyOption": 2
        }
      ]
    },
    "UpperBound": {
      "PrimaryKeys": [
        {
          "ColumnName": "pkname1",
          "Value": "cbcf23c8cdf831261f5b3c052db3479e\u0000",
          "PrimaryKeyOption": 0
        },
        {
          "ColumnName": "pkname2",
          "Value": null,
          "PrimaryKeyOption": 2
        }
      ]
    },
    "Location": "80310717938EDF503FB1E26F70710391"
  },
  {
    "LowerBound": {
      "PrimaryKeys": [
        {
          "ColumnName": "pkname1",
          "Value": "cbcf23c8cdf831261f5b3c052db3479e\u0000",
          "PrimaryKeyOption": 0
        },
        {
          "ColumnName": "pkname2",
          "Value": null,
          "PrimaryKeyOption": 2
        }
      ]
    },
    "UpperBound": {
      "PrimaryKeys": [
        {
          "ColumnName": "pkname1",
          "Value": null,
          "PrimaryKeyOption": 3
        },
        {
          "ColumnName": "pkname2",
          "Value": null,
          "PrimaryKeyOption": 3
        }
      ]
    },
    "Location": "80310717938EDF503FB1E26F70710391"
  }
]

Find the `Value` of the first primary key (the partition key) in the response. For example, the `pkname1` value is `null` for the first `LowerBound`, `"cbcf23c8cdf831261f5b3c052db3479e\u0000"` for the first `UpperBound`, `"cbcf23c8cdf831261f5b3c052db3479e\u0000"` for the second `LowerBound`, and `null` for the second `UpperBound`. In this case, configure the `split` parameter in the full data sync script as follows:

"split": [
  {
    "type": "STRING",
    "value": "cbcf23c8cdf831261f5b3c052db3479e\u0000"
  }
]

With this configuration, Tablestore splits the full data into two ranges: (INF_MIN,cbcf23c8cdf831261f5b3c052db3479e\u0000) and [cbcf23c8cdf831261f5b3c052db3479e\u0000,INF_MAX). Tablestore then pulls data from these ranges concurrently to improve the synchronization speed.
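The extraction described above can be sketched in Python as follows. This is a minimal illustration, not part of the CLI or of Tablestore Reader: the function name is hypothetical, the sample is a trimmed copy of the CLI response that keeps only the fields the sketch reads, and the code assumes that the partition key is a string column and is listed first under `PrimaryKeys`.

```python
import json


def split_points_from_cli(response_text: str) -> list:
    """Build the reader's `split` list from the upper bounds in a CLI response."""
    points = []
    for split in json.loads(response_text):
        first_pk = split["UpperBound"]["PrimaryKeys"][0]  # partition key
        value = first_pk["Value"]
        if value is None:
            # A null value marks INF_MIN or INF_MAX, which is not a split point.
            continue
        entry = {"type": "STRING", "value": value}
        if entry not in points:  # skip duplicates, keep the original order
            points.append(entry)
    return points


# Trimmed sample response; the raw string keeps the \u0000 escape for the JSON parser.
sample = r"""
[
  {"UpperBound": {"PrimaryKeys": [
    {"ColumnName": "pkname1", "Value": "cbcf23c8cdf831261f5b3c052db3479e\u0000", "PrimaryKeyOption": 0},
    {"ColumnName": "pkname2", "Value": null, "PrimaryKeyOption": 2}]}},
  {"UpperBound": {"PrimaryKeys": [
    {"ColumnName": "pkname1", "Value": null, "PrimaryKeyOption": 3},
    {"ColumnName": "pkname2", "Value": null, "PrimaryKeyOption": 3}]}}
]
"""

print(json.dumps(split_points_from_cli(sample), indent=2))
```

Reading only the upper bounds is enough because each upper bound, except the final `INF_MAX`, is also the lower bound of the next range.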

Configure the split points in the sync script. The following is a sample script:

"range": {
  "begin": [
    {
      "type": "INF_MIN"
    }
  ],
  "end": [
    {
      "type": "INF_MAX"
    }
  ],
  "split": [
    {
      "type": "STRING",
      "value": "splitPoint1"
    },
    {
      "type": "STRING",
      "value": "splitPoint2"
    },
    {
      "type": "STRING",
      "value": "splitPoint3"
    }
  ]
}
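If you generate the sync script programmatically, a small helper along the following lines could add the split points to the reader configuration. This is a sketch under stated assumptions: the function name is hypothetical, the partition key is assumed to be a string column, and the check that split points are unique and ascending is an assumption made for a range that runs from `INF_MIN` to `INF_MAX`, not a rule taken from this article.

```python
import json


def with_splits(reader: dict, split_values: list) -> dict:
    """Return a copy of the reader config whose range carries the split points."""
    # Assumption: for an INF_MIN..INF_MAX range, split points should be
    # unique and sorted in ascending order.
    if split_values != sorted(set(split_values)):
        raise ValueError("split points must be unique and in ascending order")
    config = json.loads(json.dumps(reader))  # deep copy via a JSON round trip
    config["parameter"]["range"]["split"] = [
        {"type": "STRING", "value": v} for v in split_values
    ]
    return config


# Reader configuration from the Problem section, with placeholder values left empty.
reader = {
    "plugin": "ots",
    "parameter": {
        "datasource": "",
        "table": "",
        "column": [],
        "range": {
            "begin": [{"type": "INF_MIN"}],
            "end": [{"type": "INF_MAX"}],
        },
    },
}

updated = with_splits(reader, ["splitPoint1", "splitPoint2", "splitPoint3"])
print(json.dumps(updated["parameter"]["range"], indent=2))
```

The helper leaves the input dictionary unchanged, so the original single-range configuration stays available for comparison.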

If the synchronization speed does not improve after you configure the split points, submit a ticket or join the DingTalk group 23307953 (Tablestore Technical Exchange Group-2) to contact Tablestore technical support for assistance.