The Compaction Service is a new feature introduced in EMR Serverless StarRocks version 3.5 (currently in Beta). It offloads compaction tasks from your primary compute group to a dedicated service, providing workload isolation, auto scaling, and performance optimization.
Overview
Core benefits
|
Capability |
Description |
|
Workload isolation |
Running compaction tasks on a dedicated service prevents resource contention with business workloads such as queries and data ingestion. This ensures application stability. |
|
Auto scaling |
Scales automatically based on the compaction workload. Configure Min CU and Max CU to balance compaction timeliness and cost. |
|
Out-of-the-box |
Automatically created for clusters running version 3.5. Enable it with a single click in the console—no additional resource purchase or configuration required. |
Performance optimizations
Beyond workload isolation, the Compaction Service provides the following performance optimizations:
-
Peer cache reads: During a compaction task, the service directly pulls data from cached nodes in the primary compute group (peer cache). This avoids accessing object storage (remote I/O) and significantly improves read performance for compaction.
-
Cache push: After compaction is complete, the Compaction Service asynchronously pushes the merged data files to the nodes in the primary compute group. This prevents query performance from degrading due to cache misses that would otherwise require accessing object storage.
Prerequisites
-
Your EMR Serverless StarRocks cluster is version 3.5 or later.
-
Your cluster uses the storage-compute separation architecture.
Enable the Compaction Service
To enable the Compaction Service, follow these steps in the EMR Serverless StarRocks console:
-
Go to the E-MapReduce Serverless StarRocks instance list page.
-
Log on to the E-MapReduce console.
-
In the navigation pane on the left, choose .
-
In the top menu bar, select the required region.
-
Click the ID of the target instance.
-
Click the Compaction Service tab. On the Basic Information page, click Start the service.
-
In the panel that appears, configure Minimum CU and Maximum CU.
-
Click Start the service.
Stop the Compaction Service
In the console, click Disable the service. In the confirmation panel, select the Risk Confirmation checkbox, and then click Confirm service shutdown. After the service is stopped, compaction tasks revert to running on the primary compute group. In-progress tasks will complete normally.
Auto scaling
CU configuration
Configure the auto scaling range for the Compaction Service:
|
Parameter |
Description |
Recommendation |
|
Min CU |
The minimum number of compute units. The service scales in to this value during idle periods. |
Set this to the minimum value required to meet baseline compaction needs. |
|
Max CU |
The maximum number of compute units. The service scales out to this value during peak loads. |
Set this based on your peak write throughput and compaction score. |
Scaling policy
The Compaction Service scales automatically based on the following metrics:
-
Compaction score: Reflects the accumulation of data versions. A higher score indicates greater compaction pressure.
-
Task load: The ratio of current compaction tasks to available resources.
The system automatically scales out when the compaction score consistently rises or when tasks are queued. After the load decreases, the system gradually scales in to the configured Min CU.
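The exact scaling algorithm is not documented here. As a rough mental model only, the behavior described above can be sketched in Python. The function name, the inputs, the 1 CU step, and the definition of "consistently rising" are all illustrative assumptions, not the service's actual implementation.

```python
def next_cu_target(current_cu, min_cu, max_cu, score_history, queued_tasks, step=1):
    """Return a hypothetical next CU target, clamped to [min_cu, max_cu].

    score_history: recent compaction scores, oldest first (assumed input).
    queued_tasks:  number of compaction tasks waiting for resources (assumed input).
    """
    if min_cu > max_cu:
        raise ValueError("min_cu must not exceed max_cu")

    # "Consistently rising" is modelled here as every sample exceeding the previous one.
    rising = len(score_history) >= 2 and all(
        later > earlier for earlier, later in zip(score_history, score_history[1:])
    )

    if rising or queued_tasks > 0:
        target = current_cu + step  # scale out under pressure
    else:
        target = current_cu - step  # scale in gradually once the load drops

    # Never leave the configured Min CU / Max CU range.
    return max(min_cu, min(max_cu, target))
```

The clamp on the last line is the part that matters when choosing values: Min CU is the floor the service returns to when idle, and Max CU caps how far a backlog can push capacity and cost.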
Best practices
Use the Compaction Service in the following scenarios:
-
High write-throughput workloads: Continuous, high-frequency writes cause the compaction score to rise, degrading query performance.
-
Query-sensitive workloads: The service prevents compaction from competing for query resources, which is ideal for latency-sensitive applications.
-
Cost-optimization scenarios: Auto scaling uses on-demand resources for compaction, reducing standing costs.
Usage notes
-
The Compaction Service is only available for clusters that use the storage-compute separation (shared-data) architecture.
-
After you enable the Compaction Service, compaction tasks for all tables run on the service.
-
The compute unit (CU) resources for the Compaction Service are billed separately based on usage. Configure Min CU and Max CU appropriately.
-
If you stop the Compaction Service, compaction tasks automatically revert to running on their respective primary compute groups. In-progress tasks will complete normally.
-
To minimize potential system impact, enable the Compaction Service for the first time during off-peak hours.
Troubleshooting and tuning
Troubleshoot common compaction performance issues—slow compaction, small file accumulation, and compaction alerts—with the following diagnostic steps and parameter tuning recommendations.
Compaction takes too long or newly added nodes do not participate in tasks
If compaction tasks take longer than expected or newly added compute nodes (CN nodes) do not participate in compaction, follow these steps to diagnose and resolve the issue.
Diagnose the issue
-
Run the following SQL statement to check the status of all CN nodes. Verify that newly added nodes are in the Alive state and belong to the expected compute group.
SHOW COMPUTE NODES;
-
Run the following SQL statement to view the current compaction task status, including the number of running and pending tasks.
SHOW PROC '/compactions';
-
Query the information_schema.be_cloud_native_compactions view for detailed compaction task information, including the specific tablet, transaction, and progress of each task.
SELECT * FROM information_schema.be_cloud_native_compactions;
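When the SHOW COMPUTE NODES output is long, the first check above can be scripted. The following Python sketch assumes the rows have already been fetched as dictionaries with a client of your choice. The column names ComputeNodeId and Alive are assumptions about the output format, so adjust them to what your cluster returns. The compute group check is left out because its column name is not given here.

```python
def nodes_needing_attention(rows, expected_ids):
    """Split expected compute nodes into 'missing' and 'not alive'.

    rows: list of dicts, one per row of SHOW COMPUTE NODES output. The keys
          "ComputeNodeId" and "Alive" are assumed column names.
    expected_ids: IDs of the nodes you expect to see, e.g. the ones just added.
    """
    # Index the result set by node ID (as text, so int and str IDs both match).
    seen = {str(row["ComputeNodeId"]): row for row in rows}
    wanted = [str(node_id) for node_id in expected_ids]

    missing = [node_id for node_id in wanted if node_id not in seen]
    not_alive = [
        node_id
        for node_id in wanted
        if node_id in seen and str(seen[node_id]["Alive"]).lower() != "true"
    ]
    return missing, not_alive
```

A node in the first list has not registered with the cluster at all, while a node in the second list has registered but is not reported as alive. The two cases usually call for different follow-up.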
Tune parameters
Adjust the following BE configuration parameters to improve compaction concurrency and efficiency:
|
Parameter |
Recommended value |
Description |
|
|
8 |
Number of concurrent compaction threads. Increasing this value allows more compaction tasks to run in parallel. |
|
|
100 |
Maximum number of rowsets that a single cumulative compaction can merge. Lowering this value causes more frequent but faster compaction cycles. |
Too many small files causing merge failures or out-of-memory errors
When a table accumulates too many small files (rowsets), compaction may fail due to excessive memory consumption, or the CN node may run out of memory (OOM). Relieve the immediate pressure first, and then apply the long-term optimizations.
Immediate actions
-
Scale out CN nodes: Add elastic CN nodes to distribute the compaction workload and reduce memory pressure on individual nodes.
-
Reduce ingestion frequency: Temporarily lower the frequency of data ingestion or pause ingestion to allow compaction to catch up with the backlog.
-
Tune parameters for faster merging: Adjust the following BE configuration parameters to accelerate compaction:
|
Parameter |
Recommended value |
Description |
|
compact_threads |
8 |
Increase compaction parallelism. |
|
max_cumulative_compaction_num_singleton_deltas |
300 |
Allow merging more rowsets per cumulative compaction cycle. |
|
lake_pk_compaction_max_input_rowsets |
300 |
Increase the maximum number of input rowsets for Primary Key table compaction. |
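How these parameters are applied depends on your environment. As one possibility, the sketch below generates SQL statements that update BE configuration items through the information_schema.be_configs view. This assumes your cluster exposes that view and permits updates through it. In a managed service, the console may be the only supported way to change these values, so verify before running anything. The helper name be_config_statements is hypothetical.

```python
# Values taken from the table above.
SMALL_FILE_TUNING = {
    "compact_threads": 8,
    "max_cumulative_compaction_num_singleton_deltas": 300,
    "lake_pk_compaction_max_input_rowsets": 300,
}


def be_config_statements(settings):
    """Build one UPDATE statement per BE configuration item.

    Assumption: BE settings are exposed through information_schema.be_configs
    and can be changed with SQL. Confirm this for your cluster first.
    """
    statements = []
    for name, value in settings.items():
        # Accept only identifier-like names so the generated SQL stays well-formed.
        if not name.replace("_", "").isalnum():
            raise ValueError(f"unexpected parameter name: {name!r}")
        statements.append(
            f"UPDATE information_schema.be_configs SET value = {int(value)} "
            f"WHERE name = '{name}';"
        )
    return statements


for statement in be_config_statements(SMALL_FILE_TUNING):
    print(statement)
```

Generating the statements rather than executing them keeps the change reviewable, which is useful when the node is already under memory pressure.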
Long-term optimizations
-
Adjust the bucketing strategy to ensure that data is evenly distributed across buckets. Keep the data volume per bucket under 5 GB.
-
Review and optimize the ingestion pipeline to reduce write frequency and use larger batch sizes.
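The 5 GB guideline translates into a lower bound on the bucket count. The following Python sketch shows the arithmetic, assuming data is spread evenly across buckets. The function name is illustrative, and the input is the volume of data that is bucketed together (for a partitioned table, one partition).

```python
import math


def min_bucket_count(data_size_gb, max_gb_per_bucket=5.0):
    """Smallest bucket count that keeps each bucket at or under the limit,
    assuming an even distribution of data across buckets."""
    if data_size_gb < 0 or max_gb_per_bucket <= 0:
        raise ValueError("data_size_gb must be >= 0 and max_gb_per_bucket must be > 0")
    # Round up, and always keep at least one bucket.
    return max(1, math.ceil(data_size_gb / max_gb_per_bucket))
```

For example, 42 GB of data needs at least ceil(42 / 5) = 9 buckets. Treat the result as a floor rather than a target: skewed bucketing keys can still push individual buckets over the limit.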
Compaction alert handling
When you receive compaction alerts (such as high compaction score or task backlog warnings), take the following steps:
-
Switch to batch ingestion: Adjust your data ingestion tasks to batch mode to reduce write frequency. This gives compaction more time to process pending tasks.
-
Increase base compaction check interval: Raise base_compaction_check_interval_seconds from its default of 60 seconds to 600 seconds. This reduces the frequency of base compaction checks and allows the system to prioritize cumulative compaction.
-- Adjust via BE configuration
base_compaction_check_interval_seconds = 600
-
Optimize bucket count: For small tables, use 8 to 10 buckets. Over-bucketing creates more tablets, which increases compaction overhead.
-
Increase cumulative merge limit: Increase max_cumulative_compaction_num_singleton_deltas to allow more rowsets to be merged per cumulative compaction cycle, accelerating the processing of small version files.
-
Monitor compaction status: Run SHOW PROC '/compactions'; regularly to monitor the compaction task queue and overall progress.
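Switching to batch ingestion usually means buffering rows on the client and loading them in larger groups. The following Python sketch illustrates one way to do that. The class name, the thresholds, and flush_fn (a placeholder for whatever performs the actual load) are all hypothetical and not part of StarRocks.

```python
import time


class BatchBuffer:
    """Collect rows and hand them to flush_fn in batches.

    A batch is flushed when it reaches max_rows, or when a row arrives more than
    max_wait_seconds after the first row of the batch. The time limit is only
    checked inside add(), so call flush() on shutdown to drain the remainder.
    """

    def __init__(self, flush_fn, max_rows=10000, max_wait_seconds=60.0,
                 clock=time.monotonic):
        self._flush_fn = flush_fn
        self._max_rows = max_rows
        self._max_wait = max_wait_seconds
        self._clock = clock
        self._rows = []
        self._first_row_at = None

    def add(self, row):
        if not self._rows:
            # Start the wait timer when a new batch begins.
            self._first_row_at = self._clock()
        self._rows.append(row)
        waited = self._clock() - self._first_row_at
        if len(self._rows) >= self._max_rows or waited >= self._max_wait:
            self.flush()

    def flush(self):
        if self._rows:
            batch, self._rows = self._rows, []
            self._flush_fn(batch)
```

Each flush produces one write instead of one write per row, so fewer data versions accumulate between compaction cycles. Larger max_rows values reduce compaction pressure further at the cost of data freshness.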
Risk of increasing compaction thread count under high memory usage
When memory usage on a CN node is high (for example, above 80%), assess the following before increasing the compact_threads parameter:
-
Evaluate resource headroom: Before adjusting compact_threads, check the current CPU and memory utilization of the CN node. Each additional compaction thread consumes additional memory for reading and merging rowsets.
-
Adjust incrementally: Increase the value gradually (for example, from 4 to 8) and monitor the impact on memory usage. Avoid making large jumps in the thread count.
-
OOM risk: If the CN node is already under high memory pressure, increasing compaction threads may trigger an out-of-memory (OOM) error, which causes the node to restart. Only adjust this parameter when the node has sufficient CPU and memory headroom.
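The three points above amount to a simple guard: raise the thread count in small steps, and only while memory headroom remains. The following Python sketch captures that rule. The function name, the step of 2, and the 80% cutoff (taken from the example threshold above) are illustrative choices, not official limits.

```python
def next_compact_threads(current, target, memory_usage,
                         max_memory_usage=0.80, step=2):
    """Return the next compact_threads value to try, or `current` if it should
    not be raised right now.

    memory_usage: current CN memory utilization as a fraction (0.0 to 1.0).
    max_memory_usage and step are illustrative defaults, not official guidance.
    """
    if not 0.0 <= memory_usage <= 1.0:
        raise ValueError("memory_usage must be a fraction between 0 and 1")
    # Hold steady when the node lacks headroom or the target is already reached.
    if memory_usage >= max_memory_usage or current >= target:
        return current
    # Otherwise move one small step toward the target.
    return min(target, current + step)
```

Re-measure memory usage after each step before calling the function again, so that a step that consumes more memory than expected stops the ramp-up instead of triggering an OOM restart.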