Hot and cold data separation

Lindorm supports hot and cold data separation, which classifies data by access frequency and stores it on different media. Infrequently accessed cold data is stored in cost-effective Capacity storage, while hot data remains on higher-performance storage, reducing overall storage costs.

Background information

In big data scenarios, a table may store large amounts of historical data, such as orders and monitoring data. As this data ages, it is rarely accessed, and its storage cost becomes a challenge. Lindorm addresses this by separating hot and cold data across different storage media. Cold data is stored in Capacity storage, while hot data is stored in Standard, Performance, local SSD, or local HDD storage. The unit price of Capacity storage is 80% lower than that of Standard storage, which significantly reduces the cost of storing cold data.
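To make the 80% figure concrete, the following sketch estimates the relative cost of a table after part of it moves to Capacity storage. It assumes that hot data sits on Standard storage and normalizes the Standard unit price to 1.0. The function names and the normalized price are illustrative assumptions, not Lindorm pricing or a Lindorm API.

```python
# Rough cost estimate for a table split between Standard and Capacity storage.
# Assumption: Standard storage costs 1.0 (relative units) per GB; real prices differ.
CAPACITY_DISCOUNT = 0.80  # Capacity unit price is 80% lower than Standard


def blended_cost(total_gb: float, cold_fraction: float,
                 standard_price: float = 1.0) -> float:
    """Return the storage cost when `cold_fraction` of the data is cold."""
    if not 0.0 <= cold_fraction <= 1.0:
        raise ValueError("cold_fraction must be between 0 and 1")
    capacity_price = standard_price * (1.0 - CAPACITY_DISCOUNT)
    hot_gb = total_gb * (1.0 - cold_fraction)
    cold_gb = total_gb * cold_fraction
    return hot_gb * standard_price + cold_gb * capacity_price


def savings_ratio(cold_fraction: float) -> float:
    """Fraction of cost saved compared with keeping everything on Standard."""
    all_standard = blended_cost(1.0, 0.0)
    return 1.0 - blended_cost(1.0, cold_fraction) / all_standard


if __name__ == "__main__":
    # 1,000 GB of data, 90% of which is cold.
    print(round(blended_cost(1000, 0.9), 2))
    print(round(savings_ratio(0.9), 2))
```

Under these assumptions, the saving equals the cold fraction multiplied by 0.8, so the benefit grows with the share of data that has aged past the boundary.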

How it works

Lindorm stores hot and cold data in the same table but on separate storage media. Data is routed to hot or cold storage based on its timestamp or custom time column value and on the hot-cold data boundary specified for the table. New data is written to hot storage first and is transferred to cold storage after its age exceeds the boundary.

You can access a table with hot and cold data separation enabled in the same way as a normal table. When querying such a table, you can specify hints or a time range to query only hot data.

Lindorm supports two methods of separating hot and cold data: by custom time column or by timestamp.

  • Hot and cold data separation based on custom time columns: You can configure a custom time column and specify a hot-cold data boundary for the table. Lindorm routes data to cold or hot storage based on the custom time column value and the boundary. Rows without a value in the custom time column are stored in hot storage. For more information, see Separate hot and cold data based on a custom time column.

  • Hot and cold data separation based on timestamps: You can specify a timestamp when writing data to a table. Lindorm routes data to cold or hot storage based on the timestamp and the hot-cold data boundary. If no timestamp is specified, the write time is used to determine whether to archive data to cold storage. For more information, see Separately store hot data and cold data based on timestamps.
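The routing rules for the two methods above can be sketched as follows. This is a conceptual model only, not Lindorm's implementation: the function names, the use of seconds as the unit, and the strict "greater than" comparison against the boundary are assumptions made for illustration. The sketch also does not model when Lindorm actually moves data between media.

```python
import time
from typing import Optional

HOT, COLD = "hot", "cold"


def _tier_for_age(age_s: float, boundary_s: float) -> str:
    # Assumption: data becomes cold once its age strictly exceeds the boundary.
    return COLD if age_s > boundary_s else HOT


def route_by_time_column(boundary_s: float,
                         column_value_s: Optional[float],
                         now_s: Optional[float] = None) -> str:
    """Custom time column method: rows without a value stay in hot storage."""
    if column_value_s is None:
        return HOT
    now_s = time.time() if now_s is None else now_s
    return _tier_for_age(now_s - column_value_s, boundary_s)


def route_by_timestamp(boundary_s: float,
                       write_time_s: float,
                       timestamp_s: Optional[float] = None,
                       now_s: Optional[float] = None) -> str:
    """Timestamp method: fall back to the write time if no timestamp was given."""
    now_s = time.time() if now_s is None else now_s
    effective_s = write_time_s if timestamp_s is None else timestamp_s
    return _tier_for_age(now_s - effective_s, boundary_s)


if __name__ == "__main__":
    one_day = 86_400
    now = 1_700_000_000
    print(route_by_time_column(one_day, now - 2 * one_day, now_s=now))
    print(route_by_time_column(one_day, None, now_s=now))
    print(route_by_timestamp(one_day, write_time_s=now - 60, now_s=now))
```

Note that with the timestamp method, a row written just now with an old explicit timestamp is evaluated by that timestamp, not by its write time.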

Limits

  • Hot and cold data separation based on custom time columns: Only tables created with SQL statements are supported. Tables created with the HBase shell or the HBase API are not supported. This method is recommended for SQL-created tables.

  • Hot and cold data separation based on timestamps: Tables created with SQL statements, the HBase shell, or the HBase API are supported. This method is recommended when custom time columns cannot be configured, such as for tables created with the HBase shell or the HBase API.

Usage notes

  • Capacity storage has limited IOPS and is best suited for infrequently queried data.

  • The write throughput of Capacity storage is close to that of Standard storage.

  • Capacity storage is not suitable for high-concurrency read workloads. Errors may occur under heavy concurrent read load.

  • If you have a large Capacity storage allocation, you can adjust the read IOPS to match your business requirements. For more information, contact technical support.

  • We recommend storing no more than 30 TB of cold data per node. To increase this limit, contact technical support.

  • When Capacity storage utilization exceeds 95%, no more data can be written to it. Monitor your instance's Capacity storage utilization. For more information, see View the size of cold storage.
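The two numeric limits above (the 95% write cutoff and the recommended 30 TB of cold data per node) lend themselves to a simple monitoring check. The sketch below is a hypothetical helper, not part of any Lindorm SDK: the usage figures are assumed to come from your own monitoring, and the 85% early-warning threshold is an arbitrary value chosen for illustration.

```python
from typing import List

WRITE_BLOCK_UTILIZATION = 0.95           # above this, writes to Capacity storage stop
RECOMMENDED_MAX_COLD_TB_PER_NODE = 30.0  # recommended cold data per node
EARLY_WARNING_UTILIZATION = 0.85         # hypothetical alert threshold, not a Lindorm value


def check_capacity_storage(used_tb: float, total_tb: float,
                           node_count: int) -> List[str]:
    """Return a list of findings for the given Capacity storage usage."""
    if total_tb <= 0 or node_count <= 0:
        raise ValueError("total_tb and node_count must be positive")

    findings = []
    utilization = used_tb / total_tb
    if utilization > WRITE_BLOCK_UTILIZATION:
        findings.append("writes_blocked")
    elif utilization >= EARLY_WARNING_UTILIZATION:
        findings.append("near_write_limit")

    if used_tb / node_count > RECOMMENDED_MAX_COLD_TB_PER_NODE:
        findings.append("over_per_node_recommendation")
    return findings


if __name__ == "__main__":
    # 90 TB used out of 100 TB purchased, spread across 4 nodes.
    print(check_capacity_storage(90, 100, 4))
```

Alerting before the 95% cutoff leaves time to expand Capacity storage or adjust the hot-cold boundary before writes start failing.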

For more information about the read performance of Capacity storage, see Capacity storage read throttling.