Delta Lake Catalog

更新时间:
复制 MD 格式

Delta Lake Catalog is a type of External Catalog that lets StarRocks query and import data from Delta Lake without copying it into StarRocks first. Supported starting from StarRocks 2.5.

With a Delta Lake Catalog, you can:

  • Query Delta Lake tables directly using standard SQL, without data import.

  • Transform Delta Lake data and load it into StarRocks using INSERT INTO.

Prerequisites

Before you begin, make sure the StarRocks cluster can reach both the storage system and the metadata service of your Delta Lake cluster.

Storage systems

Storage systemDescription
HDFSHadoop Distributed File System
OSSAlibaba Cloud Object Storage Service

Metadata services

Metadata serviceDescription
Hive Metastore (HMS)Default port: 9083 (Thrift protocol)
Data Lake Formation (DLF)Uses the default DLF Catalog unless a catalog ID is specified

Limits

  • StarRocks only supports the Parquet file format when querying Delta Lake data.

  • Supported Parquet compression formats: SNAPPY, LZ4, ZSTD, GZIP, and NO_COMPRESSION.

  • MAP and STRUCT data types are not supported.

Create a Delta Lake catalog

Syntax

CREATE EXTERNAL CATALOG <catalog_name>
[COMMENT <comment>]
PROPERTIES
(
    "type" = "deltalake",
    MetastoreParams
)

Parameters

ParameterRequiredDescription
catalog_nameYesName of the Delta Lake Catalog. Must be 1–64 characters, start with a letter, and contain only letters, digits, and underscores (_). Case-sensitive.
commentNoDescription of the Delta Lake Catalog.
typeYesType of the data source. Set to deltalake.
MetastoreParamsYesParameters for StarRocks to access the metadata service. Configuration varies by metadata service type. See below.

MetastoreParams — Hive Metastore

ParameterRequiredDescription
hive.metastore.typeYesType of the metadata service. Set to hive.
hive.metastore.urisYesURI of the Hive Metastore service. Format: thrift://<metastore_IP_address>:<port>. Default port: 9083.

MetastoreParams — Data Lake Formation (DLF)

ParameterRequiredDescription
hive.metastore.typeYesType of the metadata service. Set to dlf.
dlf.catalog.idNoID of an existing data catalog in DLF. If not set, the system uses the default DLF Catalog.

Example

The following example creates a Delta Lake Catalog named deltalake_catalog_hms using Hive Metastore.

CREATE EXTERNAL CATALOG deltalake_catalog_hms
PROPERTIES
(
    "type" = "deltalake",
    "hive.metastore.type" = "hive",
    "hive.metastore.uris" = "thrift://xx.xx.xx.xx:9083"
);

View Delta Lake catalogs

Run SHOW CATALOGS to list all catalogs in the current StarRocks instance:

SHOW CATALOGS;

To view the creation statement of a specific catalog, use SHOW CREATE CATALOG. For example:

SHOW CREATE CATALOG deltalake_catalog_hms;

Switch to a Delta Lake catalog

Use either of the following methods:

  • Set the catalog for the current session with SET CATALOG, then specify the database with USE:

    SET CATALOG <catalog_name>;
    USE <db_name>;
  • Switch directly to a database under the target catalog:

    USE <catalog_name>.<db_name>;

View table schema

  • Query the table schema:

    DESC[RIBE] <catalog_name>.<database_name>.<table_name>;
  • View the table structure and file storage location:

    SHOW CREATE TABLE <catalog_name>.<database_name>.<table_name>;

Query Delta Lake data

  1. List the databases in the Delta Lake cluster associated with the catalog:

    SHOW DATABASES FROM <catalog_name>;
  2. Switch to the target catalog and database.

  3. Query the target table:

    SELECT count(*) FROM <table_name> LIMIT 10;

Import Delta Lake data into StarRocks

Use INSERT INTO to transform Delta Lake data and write it to a StarRocks internal table. The following example writes all rows from deltalake_table into the StarRocks table olap_tbl in the default_catalog:

INSERT INTO default_catalog.olap_db.olap_tbl SELECT * FROM deltalake_table;

Delete a Delta Lake catalog

Use DROP CATALOG to delete an External Catalog. For example:

DROP CATALOG deltalake_catalog_hms;

References