Compaction Service for EMR Serverless StarRocks: features, setup, auto scaling configuration, and best practices.

Overview

Core benefits

Capability	Description
Workload isolation	Running compaction tasks on a dedicated service prevents resource contention with business workloads such as queries and data ingestion. This ensures application stability.
Auto scaling	Scales automatically based on the compaction workload. Configure Min CU and Max CU to balance compaction timeliness and cost.
Out-of-the-box	Automatically created for clusters running version 3.5 or later. Enable it with a single click in the console; no additional resource purchase or configuration is required.

Performance optimizations

Beyond workload isolation, the Compaction Service provides the following performance optimizations:

Peer cache reads: During a compaction task, the service directly pulls data from cached nodes in the primary compute group (peer cache). This avoids accessing object storage (remote I/O) and significantly improves read performance for compaction.
Cache push: After compaction is complete, the Compaction Service asynchronously pushes the merged data files to the nodes in the primary compute group. This prevents query performance from degrading due to cache misses that would otherwise require accessing object storage.

Prerequisites

Your EMR Serverless StarRocks cluster is version 3.5 or later.
Your cluster uses the storage-compute separation architecture.

Enable the Compaction Service

To enable the Compaction Service, follow these steps in the EMR Serverless StarRocks console:

1. Go to the E-MapReduce Serverless StarRocks instance list page.
   a. Log on to the E-MapReduce console.
   b. In the navigation pane on the left, choose EMR Serverless > StarRocks.
   c. In the top menu bar, select the required region.
2. Click the ID of the target instance.
3. Click the Compaction Service tab. On the Basic Information page, click Start the service.
4. In the panel that appears, configure Minimum CU and Maximum CU.
5. Click Start the service.

Stop the Compaction Service

In the console, click Disable the service. In the confirmation panel, select the Risk Confirmation checkbox, and then click Confirm service shutdown. After the service is stopped, compaction tasks revert to running on the primary compute group. In-progress tasks will complete normally.

Auto scaling

CU configuration

Configure the auto scaling range for the Compaction Service:

Parameter	Description	Recommendation
Min CU	The minimum number of compute units. The service scales in to this value during idle periods.	Set this to the minimum value required to meet baseline compaction needs.
Max CU	The maximum number of compute units. The service scales out to this value during peak loads.	Set this based on your peak write throughput and compaction score.

Scaling policy

The Compaction Service scales automatically based on the following metrics:

Compaction score: Reflects the accumulation of data versions. A higher score indicates greater compaction pressure.
Task load: The ratio of current compaction tasks to available resources.

The system automatically scales out when the compaction score consistently rises or when tasks are queued. After the load decreases, the system gradually scales in to the configured Min CU.
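
The scaling behavior described above can be pictured with a small sketch. The real policy is internal to the service and is not specified in this topic, so everything here is an assumption made for illustration: the function name `next_cu`, the three-sample window used to decide that the score is "consistently rising", and the one-CU step per evaluation.

```python
def next_cu(current_cu, min_cu, max_cu, score_history, queued_tasks, step=1):
    """Return an illustrative CU target for the next evaluation round.

    score_history: recent compaction scores, oldest first (assumed input).
    queued_tasks:  number of compaction tasks waiting for resources.
    """
    # Assumption: "consistently rising" means at least three samples,
    # each strictly higher than the one before it.
    rising = len(score_history) >= 3 and all(
        later > earlier
        for earlier, later in zip(score_history, score_history[1:])
    )

    if rising or queued_tasks > 0:
        target = current_cu + step   # scale out under pressure
    else:
        target = current_cu - step   # scale in gradually once load drops

    # The configured range always bounds the result.
    return max(min_cu, min(max_cu, target))
```

Whatever the internal signals are, the result stays inside the configured range, which is why Min CU sets the baseline cost and Max CU caps the spend during peaks.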

Best practices

Use the Compaction Service in the following scenarios:

High write-throughput workloads: Continuous, high-frequency writes cause the compaction score to rise, degrading query performance.
Query-sensitive workloads: The service prevents compaction from competing for query resources, which is ideal for latency-sensitive applications.
Cost-optimization scenarios: Auto scaling uses on-demand resources for compaction, reducing standing costs.

Usage notes

The Compaction Service is only available for clusters that use the storage-compute separation (shared-data) architecture.
After you enable the Compaction Service, compaction tasks for all tables run on the service.
The compute unit (CU) resources for the Compaction Service are billed separately based on usage. Configure Min CU and Max CU appropriately.
If you stop the Compaction Service, compaction tasks automatically revert to running on their respective primary compute groups. In-progress tasks will complete normally.
To minimize potential system impact, enable the Compaction Service for the first time during off-peak hours.

Troubleshooting and tuning

This section covers common compaction performance issues (slow compaction, small file accumulation, and compaction alerts) and provides diagnostic steps and parameter tuning recommendations for each.

Compaction takes too long or newly added nodes do not participate in tasks

If compaction tasks take longer than expected, or newly added compute nodes (CNs) do not participate in compaction, follow these steps to diagnose and resolve the issue.

Diagnose the issue

Run the following SQL statement to check the status of all CN nodes. Verify that newly added nodes are in the Alive state and belong to the expected compute group.
```
SHOW COMPUTE NODES;
```
Run the following SQL statement to view the current compaction task status, including the number of running and pending tasks.
```
SHOW PROC '/compactions';
```
Query the information_schema.be_cloud_native_compactions view for detailed compaction task information, including the specific tablet, transaction, and progress of each task.
```
SELECT * FROM information_schema.be_cloud_native_compactions;
```

Tune parameters

Adjust the following BE configuration parameters to improve compaction concurrency and efficiency:

Parameter	Recommended value	Description
`compact_threads`	8	Number of concurrent compaction threads. Increasing this value allows more compaction tasks to run in parallel.
`max_cumulative_compaction_num_singleton_deltas`	100	Maximum number of rowsets that a single cumulative compaction can merge. Lowering this value causes more frequent but faster compaction cycles.

Too many small files causing merge failures or out-of-memory errors

When a table accumulates too many small files (rowsets), compaction may fail due to excessive memory consumption, or the CN node may run out of memory (OOM). Address the problem in two stages: immediate actions to relieve pressure, followed by long-term optimizations.

Immediate actions

Scale out CN nodes: Add elastic CN nodes to distribute the compaction workload and reduce memory pressure on individual nodes.
Reduce ingestion frequency: Temporarily lower the frequency of data ingestion or pause ingestion to allow compaction to catch up with the backlog.

Tune parameters for faster merging: Adjust the following BE configuration parameters to accelerate compaction:

Parameter	Recommended value	Description
`compact_threads`	8	Increase compaction parallelism.
`max_cumulative_compaction_num_singleton_deltas`	300	Allow merging more rowsets per cumulative compaction cycle.
`lake_pk_compaction_max_input_rowsets`	300	Increase the maximum number of input rowsets for Primary Key table compaction.

Long-term optimizations

Adjust the bucketing strategy to ensure that data is evenly distributed across buckets. Keep the data volume per bucket under 5 GB.
Review and optimize the ingestion pipeline to reduce write frequency and use larger batch sizes.
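
As a rough aid for the bucketing guideline, the following sketch estimates the smallest bucket count that keeps each bucket within the 5 GB guideline. It assumes data is spread evenly across buckets, which a skewed bucketing key will not give you. The function name `min_bucket_count` and its parameters are hypothetical and are not part of any StarRocks API.

```python
import math


def min_bucket_count(partition_size_gb, max_bucket_gb=5.0):
    """Smallest bucket count that keeps each bucket at or below max_bucket_gb.

    Assumes an even distribution of data across buckets.
    """
    if partition_size_gb < 0 or max_bucket_gb <= 0:
        raise ValueError("sizes must be non-negative and the limit positive")
    # At least one bucket is always needed, even for an empty partition.
    return max(1, math.ceil(partition_size_gb / max_bucket_gb))
```

For example, a 120 GB partition needs at least 24 buckets under this estimate. Leave some margin above the computed value if the partition is expected to grow.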

Compaction alert handling

When you receive compaction alerts (such as high compaction score or task backlog warnings), take the following steps:

Switch to batch ingestion: Adjust your data ingestion tasks to batch mode to reduce write frequency. This gives compaction more time to process pending tasks.
Increase base compaction check interval: Increase base_compaction_check_interval_seconds from its default of 60 seconds to 600 seconds. This reduces the frequency of base compaction checks and allows the system to prioritize cumulative compaction.
```
# Adjust via BE configuration
base_compaction_check_interval_seconds = 600
```
Optimize bucket count: For small tables, use 8 to 10 buckets. Over-bucketing creates more tablets, which increases compaction overhead.
Increase cumulative merge limit: Increase max_cumulative_compaction_num_singleton_deltas to allow more rowsets to be merged per cumulative compaction cycle, accelerating the processing of small version files.
Monitor compaction status: Run SHOW PROC '/compactions'; regularly to monitor the compaction task queue and overall progress.
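
The batch ingestion step above can be sketched on the client side as a small buffer that groups rows into fewer, larger writes. This is a minimal illustration, not a StarRocks client: the class name `BatchBuffer`, the `flush_fn` callback (standing in for whatever load job your pipeline submits), and both thresholds are assumptions.

```python
import time


class BatchBuffer:
    """Collect rows and hand them to flush_fn in larger, less frequent batches."""

    def __init__(self, flush_fn, max_rows=10_000, max_wait_s=30.0,
                 clock=time.monotonic):
        self.flush_fn = flush_fn      # hypothetical loader: one load job per call
        self.max_rows = max_rows      # assumed size threshold
        self.max_wait_s = max_wait_s  # assumed age threshold, in seconds
        self.clock = clock
        self.rows = []
        self.first_ts = None          # arrival time of the oldest buffered row

    def add(self, row):
        if not self.rows:
            self.first_ts = self.clock()
        self.rows.append(row)
        too_big = len(self.rows) >= self.max_rows
        too_old = self.clock() - self.first_ts >= self.max_wait_s
        if too_big or too_old:
            self.flush()

    def flush(self):
        # Submit everything buffered so far as a single batch.
        if self.rows:
            batch, self.rows = self.rows, []
            self.flush_fn(batch)
```

Each call to `flush_fn` corresponds to one write, so raising either threshold lowers the number of data versions produced per unit of time, at the cost of higher ingestion latency. Call `flush()` once more at shutdown so that buffered rows are not lost.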

Risk of increasing compaction thread count under high memory usage

When the CN node memory usage is high (for example, above 80%), increasing the compact_threads parameter requires careful assessment:

Evaluate resource headroom: Before adjusting compact_threads, check the current CPU and memory utilization of the CN node. Each additional compaction thread consumes additional memory for reading and merging rowsets.
Adjust incrementally: Increase the value gradually (for example, from 4 to 8) and monitor the impact on memory usage. Avoid making large jumps in the thread count.
OOM risk: If the CN node is already under high memory pressure, increasing compaction threads may trigger an out-of-memory (OOM) error, which causes the node to restart. Only adjust this parameter when the node has sufficient CPU and memory headroom.
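
The guidance above amounts to a simple guard: raise the thread count in small steps, and only while memory has headroom. The sketch below expresses that rule. The function name `next_compact_threads`, the step size of 2, and treating 80% as a hard limit are assumptions for illustration. The function only computes a value and does not change any configuration.

```python
def next_compact_threads(current, target, mem_used_pct,
                         mem_limit_pct=80.0, max_step=2):
    """Return the next compact_threads value to try, or `current` to hold."""
    if current >= target:
        return current                      # nothing left to raise
    if mem_used_pct >= mem_limit_pct:
        return current                      # no headroom: do not add threads
    return min(target, current + max_step)  # small step, never past the target
```

Under these assumptions, moving from 4 to 8 threads takes two rounds (4 to 6, then 6 to 8), with a memory check before each round. If memory usage has reached the limit, the value stays where it is.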