Delta Lake Catalog is a type of External Catalog that lets StarRocks query and import data from Delta Lake without copying it into StarRocks first. Supported starting from StarRocks 2.5.
With a Delta Lake Catalog, you can:
Query Delta Lake tables directly using standard SQL, without data import.
Transform Delta Lake data and load it into StarRocks using INSERT INTO.
Prerequisites
Before you begin, make sure the StarRocks cluster can reach both the storage system and the metadata service of your Delta Lake cluster.
Storage systems
| Storage system | Description |
|---|---|
| HDFS | Hadoop Distributed File System |
| OSS | Alibaba Cloud Object Storage Service |
Metadata services
| Metadata service | Description |
|---|---|
| Hive Metastore (HMS) | Default port: 9083 (Thrift protocol) |
| Data Lake Formation (DLF) | Uses the default DLF Catalog unless a catalog ID is specified |
Limits
StarRocks only supports the Parquet file format when querying Delta Lake data.
Supported Parquet compression formats: SNAPPY, LZ4, ZSTD, GZIP, and NO_COMPRESSION.
MAP and STRUCT data types are not supported.
Create a Delta Lake catalog
Syntax
CREATE EXTERNAL CATALOG <catalog_name>
[COMMENT <comment>]
PROPERTIES
(
"type" = "deltalake",
MetastoreParams
)Parameters
| Parameter | Required | Description |
|---|---|---|
catalog_name | Yes | Name of the Delta Lake Catalog. Must be 1–64 characters, start with a letter, and contain only letters, digits, and underscores (_). Case-sensitive. |
comment | No | Description of the Delta Lake Catalog. |
type | Yes | Type of the data source. Set to deltalake. |
MetastoreParams | Yes | Parameters for StarRocks to access the metadata service. Configuration varies by metadata service type. See below. |
MetastoreParams — Hive Metastore
| Parameter | Required | Description |
|---|---|---|
hive.metastore.type | Yes | Type of the metadata service. Set to hive. |
hive.metastore.uris | Yes | URI of the Hive Metastore service. Format: thrift://<metastore_IP_address>:<port>. Default port: 9083. |
MetastoreParams — Data Lake Formation (DLF)
| Parameter | Required | Description |
|---|---|---|
hive.metastore.type | Yes | Type of the metadata service. Set to dlf. |
dlf.catalog.id | No | ID of an existing data catalog in DLF. If not set, the system uses the default DLF Catalog. |
Example
The following example creates a Delta Lake Catalog named deltalake_catalog_hms using Hive Metastore.
CREATE EXTERNAL CATALOG deltalake_catalog_hms
PROPERTIES
(
"type" = "deltalake",
"hive.metastore.type" = "hive",
"hive.metastore.uris" = "thrift://xx.xx.xx.xx:9083"
);View Delta Lake catalogs
Run SHOW CATALOGS to list all catalogs in the current StarRocks instance:
SHOW CATALOGS;To view the creation statement of a specific catalog, use SHOW CREATE CATALOG. For example:
SHOW CREATE CATALOG deltalake_catalog_hms;Switch to a Delta Lake catalog
Use either of the following methods:
Set the catalog for the current session with
SET CATALOG, then specify the database withUSE:SET CATALOG <catalog_name>; USE <db_name>;Switch directly to a database under the target catalog:
USE <catalog_name>.<db_name>;
View table schema
Query the table schema:
DESC[RIBE] <catalog_name>.<database_name>.<table_name>;View the table structure and file storage location:
SHOW CREATE TABLE <catalog_name>.<database_name>.<table_name>;
Query Delta Lake data
List the databases in the Delta Lake cluster associated with the catalog:
SHOW DATABASES FROM <catalog_name>;Switch to the target catalog and database.
Query the target table:
SELECT count(*) FROM <table_name> LIMIT 10;
Import Delta Lake data into StarRocks
Use INSERT INTO to transform Delta Lake data and write it to a StarRocks internal table. The following example writes all rows from deltalake_table into the StarRocks table olap_tbl in the default_catalog:
INSERT INTO default_catalog.olap_db.olap_tbl SELECT * FROM deltalake_table;Delete a Delta Lake catalog
Use DROP CATALOG to delete an External Catalog. For example:
DROP CATALOG deltalake_catalog_hms;