If the synchronization speed is slow when you sync full data with Tablestore Reader, configure splits in the sync script.
Problem
The synchronization speed is slow when you sync full data with Tablestore Reader. The sync script is configured as follows:
"reader": {
    "plugin": "ots",
    "parameter": {
        "datasource": "",
        "table": "",
        "column": [],
        "range": {
            "begin": [
                {
                    "type": "INF_MIN"
                }
            ],
            "end": [
                {
                    "type": "INF_MAX"
                }
            ]
        }
    }
}
Cause
The data volume is large, and splits are not configured in the sync script. The sync task uses a single thread to pull data, which slows down the synchronization.
Solution
If the full data volume is large, you can configure splits in the sync script. The procedure is as follows:
Obtain the split points using one of the following methods.
Use the Java software development kit (SDK) to call the ComputeSplitPointsBySize API operation. For more information, see Calculate shards by size.
The following is a sample response:
LowerBound:pkname1:INF_MIN, pkname2:INF_MIN
UpperBound:pkname1:cbcf23c8cdf831261f5b3c052db3479e, pkname2:INF_MIN
LowerBound:pkname1:cbcf23c8cdf831261f5b3c052db3479e, pkname2:INF_MIN
UpperBound:pkname1:INF_MAX, pkname2:INF_MAX
Download the Tablestore command-line interface (CLI) and run the `points -s splitSize -t tablename` command to obtain the split points. For more information, see Command-line interface.
Note: The unit for `splitSize` is 100 MB. You do not need to configure split points for small data volumes. For large data volumes, set `splitSize` based on the maximum concurrency that your sync environment supports.
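To pick a `splitSize` value, a rough rule follows from the 100 MB unit: if each split holds about `splitSize` × 100 MB, then the table splits into about (table size) / (`splitSize` × 100 MB) ranges. The sketch below chooses the smallest `splitSize` that keeps that count at or below your maximum concurrency. It is a sizing aid under that assumption, not a Tablestore tool: the class and method names are hypothetical, and the table size is an estimate you supply.

```java
/**
 * Hypothetical sizing helper: picks a splitSize (in units of 100 MB) so that the
 * number of splits does not exceed the concurrency of the sync environment.
 * Assumes each split holds roughly splitSize * 100 MB of data.
 */
public final class SplitSizeCalc {
    private static final long UNIT_MB = 100; // splitSize is expressed in units of 100 MB

    /** Smallest splitSize with ceil(tableSizeMb / (splitSize * 100)) <= maxConcurrency. */
    static long splitSize(long tableSizeMb, int maxConcurrency) {
        if (tableSizeMb <= 0 || maxConcurrency <= 0) {
            throw new IllegalArgumentException("table size and concurrency must be positive");
        }
        long perSplitMb = ceilDiv(tableSizeMb, maxConcurrency); // data each concurrent reader handles
        return ceilDiv(perSplitMb, UNIT_MB);                    // round up to whole 100 MB units
    }

    private static long ceilDiv(long a, long b) {
        return (a + b - 1) / b; // ceiling division, valid for positive a and b
    }

    public static void main(String[] args) {
        // Example: about 50 GB (51,200 MB) of data and 16 supported concurrent threads.
        System.out.println("splitSize = " + splitSize(51_200, 16));
    }
}
```

For this example, `main` prints `splitSize = 32`, which yields about 16 splits of roughly 3,200 MB each.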
The following is a sample response:
[
    {
        "LowerBound": {
            "PrimaryKeys": [
                { "ColumnName": "pkname1", "Value": null, "PrimaryKeyOption": 2 },
                { "ColumnName": "pkname2", "Value": null, "PrimaryKeyOption": 2 }
            ]
        },
        "UpperBound": {
            "PrimaryKeys": [
                { "ColumnName": "pkname1", "Value": "cbcf23c8cdf831261f5b3c052db3479e\u0000", "PrimaryKeyOption": 0 },
                { "ColumnName": "pkname2", "Value": null, "PrimaryKeyOption": 2 }
            ]
        },
        "Location": "80310717938EDF503FB1E26F70710391"
    },
    {
        "LowerBound": {
            "PrimaryKeys": [
                { "ColumnName": "pkname1", "Value": "cbcf23c8cdf831261f5b3c052db3479e\u0000", "PrimaryKeyOption": 0 },
                { "ColumnName": "pkname2", "Value": null, "PrimaryKeyOption": 2 }
            ]
        },
        "UpperBound": {
            "PrimaryKeys": [
                { "ColumnName": "pkname1", "Value": null, "PrimaryKeyOption": 3 },
                { "ColumnName": "pkname2", "Value": null, "PrimaryKeyOption": 3 }
            ]
        },
        "Location": "80310717938EDF503FB1E26F70710391"
    }
]
In the response, find the `Value` of the first primary key (the partition key) in each `LowerBound` and `UpperBound`. A `null` value with `PrimaryKeyOption` 2 corresponds to INF_MIN, and a `null` value with `PrimaryKeyOption` 3 corresponds to INF_MAX, matching the SDK sample response. In this example, the `pkname1` value is `null` (INF_MIN) for the first `LowerBound`, `"cbcf23c8cdf831261f5b3c052db3479e\u0000"` for the first `UpperBound`, `"cbcf23c8cdf831261f5b3c052db3479e\u0000"` for the second `LowerBound`, and `null` (INF_MAX) for the second `UpperBound`. The value at which one range ends and the next begins, `"cbcf23c8cdf831261f5b3c052db3479e\u0000"`, is the split point. Configure the `split` parameter in the full data sync script as follows:
"split": [ { "type": "STRING", "value": "cbcf23c8cdf831261f5b3c052db3479e\u0000" } ]
With this configuration, Tablestore Reader splits the full data into two ranges, (INF_MIN, cbcf23c8cdf831261f5b3c052db3479e\u0000) and [cbcf23c8cdf831261f5b3c052db3479e\u0000, INF_MAX), and pulls data from these ranges concurrently to improve the synchronization speed.
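The rule applied above can be sketched in plain Java. Every range except the last contributes its partition-key `UpperBound` as a split point, and the points then divide INF_MIN to INF_MAX into consecutive ranges. The `Split` record is a hypothetical stand-in for the bounds returned by the SDK or CLI, with `null` representing INF_MIN or INF_MAX as in the CLI output. It is not a Tablestore SDK type, and the sketch assumes the splits are already in ascending order.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: derives sync-script split points from computed splits. */
public final class SplitPoints {
    /** Partition-key bounds of one split; null stands for INF_MIN (lower) or INF_MAX (upper). */
    record Split(String lower, String upper) {}

    /** Returns the upper bound of every split except the last, in order. */
    static List<String> splitPoints(List<Split> splits) {
        List<String> points = new ArrayList<>();
        for (int i = 0; i < splits.size() - 1; i++) {
            points.add(splits.get(i).upper());
        }
        return points;
    }

    /** Describes the ranges that the split points produce between INF_MIN and INF_MAX. */
    static List<String> ranges(List<String> points) {
        List<String> result = new ArrayList<>();
        String lower = "(INF_MIN";               // INF_MIN is a sentinel, written with "(" as in the text
        for (String p : points) {
            result.add(lower + ", " + p + ")");  // the upper bound of each range is exclusive
            lower = "[" + p;                     // the next range starts at this split point, inclusive
        }
        result.add(lower + ", INF_MAX)");
        return result;
    }

    public static void main(String[] args) {
        String point = "cbcf23c8cdf831261f5b3c052db3479e\u0000";
        List<Split> splits = List.of(new Split(null, point), new Split(point, null));
        List<String> points = splitPoints(splits);
        System.out.println(points.size() + " split point(s)");
        for (String r : ranges(points)) {
            System.out.println(r.replace("\u0000", "\\u0000")); // make the trailing NUL visible
        }
    }
}
```

Running `main` prints one split point followed by the same two ranges described above, with the trailing NUL character shown as `\u0000`.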
Configure the split points in the sync script. The following is a sample script:
"range": {
    "begin": [ { "type": "INF_MIN" } ],
    "end": [ { "type": "INF_MAX" } ],
    "split": [
        { "type": "STRING", "value": "splitPoint1" },
        { "type": "STRING", "value": "splitPoint2" },
        { "type": "STRING", "value": "splitPoint3" }
    ]
}
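Writing the `split` array by hand becomes error-prone when there are many split points. Values such as the one in the example also end with a NUL character, which must appear in the script as the JSON escape `\u0000`. The following sketch uses only the Java standard library to assemble the `range` object from an ordered list of split points. The class and method names are hypothetical. The sketch handles only the STRING type shown in the sample and assumes the points are already in ascending order.

```java
import java.util.List;

/** Hypothetical helper: builds the "range" object of the sync script with STRING split points. */
public final class RangeJson {
    /** Escapes a value as a JSON string literal; control characters such as NUL become JSON unicode escapes. */
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);                    // escape quotes and backslashes
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c)); // e.g. NUL is written as six ASCII characters
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Builds the range object: INF_MIN to INF_MAX, split at the given points. */
    static String range(List<String> splitPoints) {
        StringBuilder sb = new StringBuilder("\"range\": {\n");
        sb.append("    \"begin\": [ { \"type\": \"INF_MIN\" } ],\n");
        sb.append("    \"end\": [ { \"type\": \"INF_MAX\" } ],\n");
        sb.append("    \"split\": [");
        for (int i = 0; i < splitPoints.size(); i++) {
            sb.append(i == 0 ? "\n" : ",\n"); // comma-separate the entries
            sb.append("        { \"type\": \"STRING\", \"value\": ")
              .append(jsonString(splitPoints.get(i)))
              .append(" }");
        }
        sb.append(splitPoints.isEmpty() ? "]\n" : "\n    ]\n");
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        System.out.println(range(List.of("cbcf23c8cdf831261f5b3c052db3479e\u0000")));
    }
}
```

Running `main` prints a `range` object in the same shape as the sample script, with one STRING split point whose value ends in the JSON escape `\u0000`.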
If the synchronization speed does not improve after you configure the split points, submit a ticket or join the DingTalk group 23307953 (Tablestore Technical Exchange Group-2) to contact Tablestore technical support for assistance.