Shard operations
DataHub provides two modes for managing shards: shard horizontal scaling and split and merge. Choose based on your requirements:
Shard horizontal scaling does not support merging shards; split and merge does.
To consume a topic using the Kafka protocol, enable shard horizontal scaling.
After you enable shard horizontal scaling, the key range feature is unavailable. The
BeginHashKeyandEndHashKeyvalues are identical across all shards, so writes based on aHashKeyorPartitionKeyare not supported. Implement custom hash-based routing at the application layer, and account for the fact that scaling operations can change the target shard for your writes.
Shard horizontal scaling mode
DataHub supports horizontal scaling for topic shards. Enable shard auto-scaling mode when creating a topic to use this feature.
Step 1
Enable the Shard Auto-scaling toggle at the bottom of the New Topic dialog box.
Step 2
Click the edit icon to modify the shard count.
On the topic details page, the current shard count is displayed above the Shard list tab. Click the edit icon next to it to change the count.
Step 3
View the shards after scaling. The Shard list shows the updated shard count, and the status of each new shard is ACTIVE.
Split and merge shards
DataHub lets you scale the number of shards in a topic up or down using the SplitShard and MergeShard operations.
Use cases
Use split and merge to respond to traffic changes without over-provisioning. For example, during a major sales event, data traffic can surge and the current shard count may be insufficient. Use the SplitShard operation to scale up. You can increase the number of shards to a maximum of 256, supporting throughput up to 1,280 MB/s under current flow control limits. After the event, when traffic returns to normal, merge adjacent shards using the MergeShard operation to reduce the shard count and reclaim capacity.
Shard properties
The ListShard API returns information about all shards in a topic. Each shard has the following properties:
{
"ShardId": "string",
"State": "string",
"ClosedTime": uint64,
"BeginHashKey": "string",
"EndHashKey": "string",
"ParentShardIds": [string,string,],
"LeftShardId": "string",
"RightShardId": "string"
}
SplitShard
Specify a 128-bit hash key and a shard ID—either through the SDK or the console. The SplitShard operation splits the specified shard into two child shards and returns their IDs and key ranges. The original (parent) shard then enters the CLOSED state.
For example, starting with this shard:
ShardId:0 Status:ACTIVE BeginHashKey:00000000000000000000000000000000
EndHashKey:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
To split it using the SDK:
String shardId = "0";
SplitShardRequest req = new SplitShardRequest(projectName, topicName, shardId, "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA");
SplitShardResult resp = client.splitShard(req);
After the split, you have three shards:
ShardId:0 Status:CLOSED BeginHashKey:00000000000000000000000000000000
EndHashKey:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
ShardId:1 Status:ACTIVE BeginHashKey:00000000000000000000000000000000
EndHashKey:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
ShardId:2 Status:ACTIVE BeginHashKey:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
EndHashKey:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
Merge shards
Specify two adjacent shard IDs—either through the SDK or the console. The MergeShard operation combines the two shards into a single new shard and returns its ID and key range. Both parent shards then enter the CLOSED state.
Shard adjacency
Two shards are adjacent when their hash key ranges form a contiguous set with no gaps. Attempting to merge non-adjacent shards will fail.
For example, starting with these two shards:
ShardId:0 Status:ACTIVE BeginHashKey:00000000000000000000000000000000
EndHashKey:7FFFFFFFFFFFFFFF7FFFFFFFFFFFFFFF
ShardId:1 Status:ACTIVE BeginHashKey:7FFFFFFFFFFFFFFF7FFFFFFFFFFFFFFF
EndHashKey:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
To merge them using the SDK:
String shardId = "0";
String adjacentShardId = "1";
MergeShardRequest req = new MergeShardRequest(projectName, topicName, shardId, adjacentShardId);
MergeShardResult resp = client.mergeShard(req);
After the merge, you have three shards:
ShardId:0 Status:CLOSED BeginHashKey:00000000000000000000000000000000
EndHashKey:7FFFFFFFFFFFFFFF7FFFFFFFFFFFFFFF
ShardId:1 Status:CLOSED BeginHashKey:7FFFFFFFFFFFFFFF7FFFFFFFFFFFFFFF
EndHashKey:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
ShardId:2 Status:ACTIVE BeginHashKey:00000000000000000000000000000000
EndHashKey:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
Usage notes
After a split or merge, the parent shards enter the CLOSED state and progress through the following lifecycle:
CLOSED: The shard no longer accepts writes, and you cannot perform another split or merge operation on it. Existing data remains readable by consumers.
Reclaimed: After the topic's
lifecycleperiod expires, the system automatically reclaims theCLOSEDshard and all data in it becomes inaccessible. If a connector is configured, its task is automatically suspended once it finishes replicating data from theCLOSEDshard, and the task is deleted after the shard is reclaimed.
New shards must reach the ACTIVE state before accepting reads or writes. This typically takes less than 5 seconds. To confirm a shard is ready, call ListShard and check that its State field is ACTIVE.