Shard operations

更新时间:
复制 MD 格式

Shard operations

DataHub provides two modes for managing shards: shard horizontal scaling and split and merge. Choose based on your requirements:

  1. Shard horizontal scaling does not support merging shards; split and merge does.

  2. To consume a topic using the Kafka protocol, enable shard horizontal scaling.

  3. After you enable shard horizontal scaling, the key range feature is unavailable. The BeginHashKey and EndHashKey values are identical across all shards, so writes based on a HashKey or PartitionKey are not supported. Implement custom hash-based routing at the application layer, and account for the fact that scaling operations can change the target shard for your writes.

Shard horizontal scaling mode

DataHub supports horizontal scaling for topic shards. Enable shard auto-scaling mode when creating a topic to use this feature.

Step 1

Enable the Shard Auto-scaling toggle at the bottom of the New Topic dialog box.

Step 2

Click the edit icon to modify the shard count.

On the topic details page, the current shard count is displayed above the Shard list tab. Click the edit icon next to it to change the count.

Step 3

View the shards after scaling. The Shard list shows the updated shard count, and the status of each new shard is ACTIVE.

Split and merge shards

DataHub lets you scale the number of shards in a topic up or down using the SplitShard and MergeShard operations.

Use cases

Use split and merge to respond to traffic changes without over-provisioning. For example, during a major sales event, data traffic can surge and the current shard count may be insufficient. Use the SplitShard operation to scale up. You can increase the number of shards to a maximum of 256, supporting throughput up to 1,280 MB/s under current flow control limits. After the event, when traffic returns to normal, merge adjacent shards using the MergeShard operation to reduce the shard count and reclaim capacity.

Shard properties

The ListShard API returns information about all shards in a topic. Each shard has the following properties:

{
    "ShardId": "string",
    "State": "string",
    "ClosedTime": uint64,
    "BeginHashKey": "string",
    "EndHashKey": "string",
    "ParentShardIds": [string,string,],
    "LeftShardId": "string",
    "RightShardId": "string"
}

SplitShard

Specify a 128-bit hash key and a shard ID—either through the SDK or the console. The SplitShard operation splits the specified shard into two child shards and returns their IDs and key ranges. The original (parent) shard then enters the CLOSED state.

For example, starting with this shard:

ShardId:0 Status:ACTIVE BeginHashKey:00000000000000000000000000000000
    EndHashKey:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

To split it using the SDK:

String shardId = "0";
SplitShardRequest req = new SplitShardRequest(projectName, topicName, shardId, "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA");
SplitShardResult resp = client.splitShard(req);

After the split, you have three shards:

ShardId:0 Status:CLOSED BeginHashKey:00000000000000000000000000000000
                    EndHashKey:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
ShardId:1 Status:ACTIVE BeginHashKey:00000000000000000000000000000000
                    EndHashKey:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
ShardId:2 Status:ACTIVE BeginHashKey:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                    EndHashKey:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

Merge shards

Specify two adjacent shard IDs—either through the SDK or the console. The MergeShard operation combines the two shards into a single new shard and returns its ID and key range. Both parent shards then enter the CLOSED state.

Shard adjacency

Two shards are adjacent when their hash key ranges form a contiguous set with no gaps. Attempting to merge non-adjacent shards will fail.

For example, starting with these two shards:

ShardId:0 Status:ACTIVE BeginHashKey:00000000000000000000000000000000
                    EndHashKey:7FFFFFFFFFFFFFFF7FFFFFFFFFFFFFFF
ShardId:1 Status:ACTIVE BeginHashKey:7FFFFFFFFFFFFFFF7FFFFFFFFFFFFFFF
                    EndHashKey:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

To merge them using the SDK:

String shardId = "0";
String adjacentShardId = "1";
MergeShardRequest req = new MergeShardRequest(projectName, topicName, shardId, adjacentShardId);
MergeShardResult resp = client.mergeShard(req);

After the merge, you have three shards:

ShardId:0 Status:CLOSED BeginHashKey:00000000000000000000000000000000
                    EndHashKey:7FFFFFFFFFFFFFFF7FFFFFFFFFFFFFFF
ShardId:1 Status:CLOSED BeginHashKey:7FFFFFFFFFFFFFFF7FFFFFFFFFFFFFFF
                    EndHashKey:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
ShardId:2 Status:ACTIVE BeginHashKey:00000000000000000000000000000000
                    EndHashKey:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

Usage notes

After a split or merge, the parent shards enter the CLOSED state and progress through the following lifecycle:

  • CLOSED: The shard no longer accepts writes, and you cannot perform another split or merge operation on it. Existing data remains readable by consumers.

  • Reclaimed: After the topic's lifecycle period expires, the system automatically reclaims the CLOSED shard and all data in it becomes inaccessible. If a connector is configured, its task is automatically suspended once it finishes replicating data from the CLOSED shard, and the task is deleted after the shard is reclaimed.

New shards must reach the ACTIVE state before accepting reads or writes. This typically takes less than 5 seconds. To confirm a shard is ready, call ListShard and check that its State field is ACTIVE.