StarRocks provides multiple cache types to reduce remote storage access and improve query performance. Select the caching solution that best fits your instance type and workload.
Features
These caches keep hot data in the memory or on the local disks of BE and CN nodes, which reduces repeated reads from remote storage such as HDFS and object storage.
Cache types
| Cache type | Use cases | Default state | Available since |
| --- | --- | --- | --- |
| Shared-data Data Cache | Accelerates queries on internal tables in shared-data (serverless) instances. | Enabled by default | v3.1.7 / v3.2.3 |
| Data lake Data Cache | Accelerates queries on external tables from an External Catalog (such as Hive, Iceberg, and Hudi). | Enabled by default since v3.3.0 | v2.5 |
| Index Cache | Caches indexes for shared-data instances, ideal for scenarios where disk capacity is insufficient to cache the full dataset. | Enabled by default | v3.3.13 |
Since v3.4.0, queries on internal tables in shared-data instances and queries on data lakes share the same Data Cache instance, eliminating the need for separate configurations.
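As a rough illustration of how the default state in the table can be checked for data lake queries, the following sketch uses the open-source StarRocks session variable `enable_scan_datacache`. It assumes your instance exposes this variable under the same name. Versions earlier than v3.2 used a different variable name, and a managed instance may restrict which variables you can change.

```sql
-- Assumption: the instance exposes the open-source StarRocks
-- session variable enable_scan_datacache.

-- Check whether Data Cache is used for scans in the current session.
SHOW VARIABLES LIKE 'enable_scan_datacache';

-- Turn Data Cache on for the current session if it is off.
SET enable_scan_datacache = true;
```

The `SET` statement affects only the current session, so it is a low-risk way to compare query latency with and without the cache.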
Recommendations
- Shared-data instance: Use the shared-data Data Cache. It automatically loads data on demand from remote storage to the local cache, requiring no extra configuration.
- Data lake external tables: Use the data lake Data Cache. It supports caching remote files in formats like Parquet and ORC, making it ideal for repeated scans of large tables in ad-hoc analytics and report queries.
- Insufficient disk capacity for the full dataset: Enable the Index Cache. It caches only indexes, significantly improving query performance with low disk overhead.
- Preloading hot data: Use Data Cache preheating (CACHE SELECT) to load specific data into the cache in advance, to avoid the performance impact of a cold start.
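The preheating recommendation can be sketched as follows. This is a minimal example based on the open-source StarRocks `CACHE SELECT` syntax. The catalog, database, table, and column names (`hive_catalog`, `sales_db`, `orders`, `order_id`, `amount`, `dt`) are hypothetical placeholders, and the versions and table types that support the statement may differ on your instance.

```sql
-- Hypothetical catalog, database, table, and column names.

-- Preheat only the columns and the partition that upcoming
-- queries are expected to read.
CACHE SELECT order_id, amount
FROM hive_catalog.sales_db.orders
WHERE dt = '2024-06-01';

-- Preheat all columns of the same partition and request
-- detailed, per-node warmup metrics in the result.
CACHE SELECT *
FROM hive_catalog.sales_db.orders
WHERE dt = '2024-06-01'
PROPERTIES("verbose"="true");
```

Restricting the statement to specific columns and a `WHERE` filter keeps the warmup from evicting other hot data when cache capacity is limited.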