Common questions and answers about Data Cache disk usage, eviction behavior, and cross-node variance.
Q1: Why is the disk space occupied by the Data Cache directory, as reported by the du and ls commands, much larger than the actual data size?
Data Cache disk usage reflects the historical high watermark, not the current cache size. For example, if you import 100 GB of data, which grows to 200 GB after compaction and shrinks back to 100 GB after GC, the reported disk usage remains at the 200 GB high watermark, even though only 100 GB is currently cached.
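A toy model makes the high-watermark behavior concrete. This is a hypothetical sketch, not StarRocks code. It assumes that the cache's files grow when more data is cached but are never truncated, so the on-disk footprint is the running maximum of the cached data size.

```python
from itertools import accumulate


def footprint_over_time(cached_sizes_gb):
    """Return the on-disk footprint after each step.

    Hypothetical model: space is reused but never returned to the file
    system, so the footprint is the running maximum of the cached size.
    """
    return list(accumulate(cached_sizes_gb, max))


# Import 100 GB, compaction raises it to 200 GB, GC brings it back to 100 GB.
print(footprint_over_time([100, 200, 100]))  # [100, 200, 200]
```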
Q2: Does Data Cache evict data automatically?
Data Cache evicts data only when disk usage reaches the configured limit (80% of disk capacity by default). Eviction does not delete files. Instead, it marks the space occupied by old cache entries as overwritable so that new data can be written over it. As a result, disk usage does not decrease after eviction. This is expected behavior and does not affect operations.
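The following toy model illustrates this mechanism. It is a hypothetical sketch, not StarRocks code: it assumes the cache is made of fixed-size slots, picks eviction victims in LRU order, and starts evicting only when 80% of the slots are in use. Evicted slots go onto a free list and are overwritten by later writes, so the number of allocated slots (the disk usage) never goes down.

```python
from collections import OrderedDict


class ToyBlockCache:
    """Hypothetical slot-based disk cache; eviction never deletes anything."""

    def __init__(self, disk_slots, limit_percent=80):
        # Eviction starts only once this many slots hold live entries.
        self.limit_slots = disk_slots * limit_percent // 100
        self.lru = OrderedDict()  # key -> slot index, least recently used first
        self.free_slots = []      # slots marked overwritable by eviction
        self.allocated = 0        # slots ever written, i.e. disk usage

    def put(self, key):
        if key in self.lru:
            self.lru.move_to_end(key)  # refresh recency on re-access
            return
        if len(self.lru) >= self.limit_slots:
            # Evict the LRU entry: its slot is only marked overwritable.
            _, slot = self.lru.popitem(last=False)
            self.free_slots.append(slot)
        if self.free_slots:
            slot = self.free_slots.pop()  # overwrite an evicted slot
        else:
            slot = self.allocated         # grow the footprint by one slot
            self.allocated += 1
        self.lru[key] = slot

    def disk_usage_slots(self):
        return self.allocated


cache = ToyBlockCache(disk_slots=10)  # eviction limit: 8 slots
usage = []
for i in range(20):
    cache.put(f"block-{i}")
    usage.append(cache.disk_usage_slots())
print(len(cache.lru), max(usage), usage[-1])  # 8 8 8
```

After the limit is reached, every new write reuses an evicted slot, so disk usage plateaus instead of shrinking.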
Q3: Why doesn't disk usage decrease?
See Q2. Eviction marks old entries as overwritable instead of deleting data files, so the space stays allocated and disk usage does not decrease. This is expected behavior and does not affect operations.
Q4: Why doesn't cache size decrease after DROP TABLE?
DROP TABLE does not immediately delete the table's cached data. Because the dropped table is no longer queried, its entries age out under the cache's internal LRU policy and are evicted over time as new data needs space. This does not affect operations.
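The following hypothetical sketch, not StarRocks code, models a small LRU cache keyed by (table, block) to show why dropped data lingers: dropping the table removes nothing from the cache, and its entries disappear only after newer data pushes them out. The capacity of four entries is arbitrary.

```python
from collections import OrderedDict


class ToyLRUCache:
    """Hypothetical LRU cache keyed by (table, block)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # least recently used first

    def access(self, table, block):
        key = (table, block)
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the LRU entry
        self.entries[key] = True

    def tables(self):
        return {table for table, _ in self.entries}


cache = ToyLRUCache(capacity=4)
for b in range(4):
    cache.access("t_dropped", b)

# DROP TABLE t_dropped: in this model nothing is removed from the cache.
print(cache.tables())  # {'t_dropped'}

# Only t_live is queried afterwards, so the t_dropped entries age out.
for b in range(4):
    cache.access("t_live", b)
print(cache.tables())  # {'t_live'}
```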
Q5: Why is actual disk usage higher than the configured limit?
Data Cache strictly limits its own disk usage and never exceeds the configured limit. If total disk usage exceeds the configured threshold (80% by default), other files on the same disk are likely responsible:
- Log files generated during runtime.
- Core files generated by CN crashes.
- Persistent indexes for primary key tables, stored in the ${storage_root_path}/persist/ directory.
- Co-located BE, CN, and FE instances sharing the same disk.
- External table cache files, stored in the ${STARROCKS_HOME}/block_cache/ directory.
- Discrepancies in disk usage reporting caused by uncommon file systems like ext3.
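To narrow down which of these factors applies, compare the size of each suspect directory with total disk usage. The sketch below is a hypothetical helper, not a StarRocks tool. It reports apparent file sizes (comparable to du --apparent-size), which can differ from the allocated blocks that plain du reports, and the example paths are placeholders for your own ${storage_root_path} and ${STARROCKS_HOME}.

```python
import os
import shutil
import stat


def dir_size_bytes(path):
    """Sum apparent sizes of regular files under path; a missing path yields 0."""
    total = 0
    for root, _dirs, files in os.walk(path):  # does not follow symlinked dirs
        for name in files:
            full = os.path.join(root, name)
            try:
                st = os.lstat(full)
            except OSError:
                continue  # file vanished while walking
            if stat.S_ISREG(st.st_mode):  # skip symlinks and special files
                total += st.st_size
    return total


def report(mount_point, dirs):
    """Print disk usage for mount_point and the size of each labeled directory."""
    usage = shutil.disk_usage(mount_point)
    print(f"disk used: {usage.used / usage.total:.1%} of {usage.total / 2**30:.1f} GiB")
    for label, path in dirs.items():
        print(f"  {label:<22} {dir_size_bytes(path) / 2**30:8.2f} GiB")


# Example (placeholder paths; substitute your own):
# report("/data", {
#     "persistent index": "/data/storage/persist",
#     "external table cache": "/opt/starrocks/block_cache",
# })
```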
Troubleshooting suggestion: To resolve this, lower the Data Cache disk capacity limit by decreasing the starlet_star_cache_disk_size_percent parameter.
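To estimate how much space a lower setting frees, multiply disk capacity by the percentage. This sketch assumes, as Q2 states, that the percentage applies to the capacity of the disk holding the cache; cache_limit_bytes is a hypothetical helper. Where and how to change the parameter (configuration file plus restart, or at runtime) depends on your deployment and version, so check the configuration reference before editing.

```python
TIB = 2**40
GIB = 2**30


def cache_limit_bytes(disk_capacity_bytes, size_percent):
    """Hypothetical helper: Data Cache limit as size_percent of disk capacity."""
    if not 0 < size_percent <= 100:
        raise ValueError("size_percent must be in (0, 100]")
    return disk_capacity_bytes * size_percent // 100


for pct in (80, 70):
    print(f"{pct}% of 1 TiB -> {cache_limit_bytes(TIB, pct) / GIB:.1f} GiB")
# 80% of 1 TiB -> 819.2 GiB
# 70% of 1 TiB -> 716.8 GiB
```

On a 1 TiB disk, lowering the setting from 80 to 70 leaves roughly 100 GiB more headroom for logs, core files, and the other items listed above.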
Q6: Why does cache usage vary across nodes?
Data Cache is node-local, so cache usage naturally varies between nodes. This variance is normal as long as it does not affect query latency. Common causes include:
- The nodes were added to the cluster at different times.
- The nodes store different numbers of tablets.
- The tablets on each node contain different amounts of data.
- The progress of compaction and GC varies by node.
- A node has experienced a crash or OOM event.