Data Cache for storage-compute separation

Data Cache improves query performance in storage-compute separation clusters by caching hot data from remote storage on the local CN nodes.

How it works

Starting from StarRocks v3.1.7 and v3.2.3, Data Cache is enabled by default in storage-compute separation clusters and replaces the File Cache feature of earlier versions. Data Cache loads data from remote storage into the local cache on demand, in megabyte-sized blocks.

Note

Starting from v3.4.0, queries on internal tables and data lakes in a storage-compute separation environment share the same Data Cache instance.

Limitations

  • This feature applies only to Serverless StarRocks instances with storage-compute separation.

  • Data Cache is supported, and enabled by default, starting from StarRocks v3.1.7 and v3.2.3.

Configure Data Cache

Configure Data Cache with the following CN configuration items and table properties.

Disk cache capacity

The larger of the datacache_disk_size and starlet_star_cache_disk_size_percent parameters determines the disk cache capacity for a storage-compute separation cluster.

| Parameter | Description |
| --- | --- |
| datacache_disk_size | The maximum amount of data that can be cached on a single disk. Set this as a percentage (for example, 80%) or an absolute size (for example, 2 TB or 500 GB). For example, if you have two cache disks and set this parameter to 20 GB, the total cache capacity is 40 GB. The default value is 0, which means only memory is used for caching. |
| starlet_star_cache_disk_size_percent | The percentage of disk capacity that Data Cache can use. The default value is 80, that is, 80%. |
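As a worked illustration of the "larger value wins" rule, assume a CN node with one 1 TB cache disk (treated as 1,000 GB), datacache_disk_size set to 500 GB, and starlet_star_cache_disk_size_percent left at its default of 80. The disk size and the 500 GB setting are hypothetical, and the sketch assumes that the percentage is applied to the capacity of that disk and that the two resulting sizes are compared directly.

```latex
\begin{aligned}
C_{\text{percent}} &= 80\% \times 1000\,\text{GB} = 800\,\text{GB} \\
C_{\text{size}}    &= 500\,\text{GB} \\
C_{\text{cache}}   &= \max(C_{\text{size}},\, C_{\text{percent}}) = \max(500\,\text{GB},\, 800\,\text{GB}) = 800\,\text{GB}
\end{aligned}
```

Under these assumptions, the percentage-based value is the larger one, so it determines the disk cache capacity of 800 GB.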

Table-level cache configuration

  • Disable caching for a specific table: Set the datacache.enable property of an internal table to false to prevent the table from using Data Cache.

  • Limit the time range of cached data: Use the datacache.partition_duration property to define the time range of hot data. StarRocks does not cache data outside this time range. Supported time units are YEAR, MONTH, DAY, and HOUR, for example, 7 DAY or 12 HOUR. If you do not specify this property, StarRocks treats all data as hot data, so all of it is eligible for caching.

    Note

    This property applies only when datacache.enable is set to true.
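The following sketch shows where the two table-level properties might be set when creating internal tables. The table names, columns, and the expression-based daily partitioning are hypothetical and only illustrate placement of the properties; adapt them to your own schema.

```sql
-- Hypothetical table that uses Data Cache.
-- Only data within the last 7 days is treated as hot data and cached.
CREATE TABLE IF NOT EXISTS sales_events (
    event_date  DATE           NOT NULL,
    user_id     BIGINT         NOT NULL,
    amount      DECIMAL(10, 2)
)
DUPLICATE KEY(event_date, user_id)
PARTITION BY date_trunc('day', event_date)
DISTRIBUTED BY HASH(user_id)
PROPERTIES (
    "datacache.enable" = "true",
    "datacache.partition_duration" = "7 DAY"
);

-- Hypothetical archive table that never uses Data Cache.
CREATE TABLE IF NOT EXISTS sales_events_archive (
    event_date  DATE           NOT NULL,
    user_id     BIGINT         NOT NULL,
    amount      DECIMAL(10, 2)
)
DUPLICATE KEY(event_date, user_id)
DISTRIBUTED BY HASH(user_id)
PROPERTIES (
    "datacache.enable" = "false"
);
```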

Check the Data Cache status

  • Run the following SQL statement to view the disk usage limit for Data Cache:

    SELECT * FROM information_schema.be_configs
    WHERE NAME LIKE '%starlet_star_cache_disk_size_percent%'
       OR NAME LIKE '%datacache_disk_size%';
  • In the Alibaba Cloud console, navigate to Business Insights > Cache Insights to view the cache hit rate for an instance or compute group.

Disable Data Cache

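To disable Data Cache, set the skip_local_disk_cache variable to true. Omit GLOBAL to apply the setting to the current session only, or include GLOBAL to apply it globally:

SET [GLOBAL] skip_local_disk_cache = true;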
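To confirm the current value of the variable, or to turn Data Cache back on for the session later, statements like the following can be used. This is a sketch that assumes the standard StarRocks SHOW VARIABLES and SET syntax.

```sql
-- Check the current value of the variable.
SHOW VARIABLES LIKE 'skip_local_disk_cache';

-- Re-enable Data Cache for the current session.
SET skip_local_disk_cache = false;
```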
Note

To disable caching for a specific table only, configure the table-level datacache.enable property instead of disabling Data Cache globally.