Paimon catalog

更新时间:
复制 MD 格式

A Paimon catalog is an external catalog supported by StarRocks since version 3.1. Use it to query Apache Paimon data directly, create Paimon databases and tables, and write data back to Paimon—all from within StarRocks.

Creating Paimon databases and tables, and inserting data into Paimon tables, require StarRocks 3.2.9 or later.
A Paimon catalog in StarRocks maps one-to-one with a catalog in the native Paimon API. The configuration parameters and their meanings are identical.

Prerequisites

Before you begin, make sure your StarRocks cluster can reach both the storage system and the metadata service of your Paimon cluster. Supported combinations:

Metadata service

Storage backends

Hive Metastore (HMS)

Hadoop Distributed File System (HDFS), Object Storage Service (OSS), OSS-HDFS

File system

HDFS, OSS, OSS-HDFS

Data Lake Formation (DLF) 1.0 (Legacy)

HDFS, OSS, OSS-HDFS

Data Lake Formation (DLF)

HDFS, OSS, OSS-HDFS

If you use DLF, configure a Resource Access Management (RAM) user and use it for all operations in StarRocks Manager. For details, see Use aDLFcatalog.

Create a Paimon catalog

Syntax

CREATE EXTERNAL CATALOG <catalog_name>
[COMMENT <comment>]
PROPERTIES
(
    "type" = "paimon",
    CatalogParams,
    StorageCredentialParams
);

Parameters

Common parameters

Parameter

Required

Description

catalog_name

Yes

The name of the Paimon catalog. Must start with a letter and contain only letters (a–z, A–Z), digits (0–9), or underscores (\_). Maximum 64 characters.

comment

No

A description of the Paimon catalog.

type

Yes

The data source type. Set to paimon.

CatalogParams

CatalogParams specifies how StarRocks connects to the Paimon metadata service. Configuration varies by metadata type.

Use HMS

Property

Required

Description

paimon.catalog.type

Yes

Set to hive.

hive.metastore.uris

Yes

The HMS URI. Format: thrift://<HMS IP address>:<port>. Default port: 9083. For high-availability mode, specify multiple addresses separated by commas: "thrift://<ip1>:<port1>,thrift://<ip2>:<port2>".

Use filesystem

Property

Required

Description

paimon.catalog.type

Yes

Set to filesystem.

paimon.catalog.warehouse

Yes

The storage path of the warehouse where Paimon data is stored. Supported formats: oss://<yourBucketName>/<yourPath> for OSS or OSS-HDFS, or an HDFS path. If you use OSS or OSS-HDFS, also configure StorageCredentialParams.

Use DLF 1.0 (Legacy)

Property

Required

Description

paimon.catalog.type

Yes

Set to dlf.

paimon.catalog.warehouse

Yes

The storage path of the warehouse. Format: oss://<yourBucketName>/<yourPath> for OSS or OSS-HDFS. If you use OSS or OSS-HDFS, also configure StorageCredentialParams.

dlf.catalog.id

No

The ID of an existing data catalog in DLF. If omitted, the system uses the default DLF catalog.

Use DLF

Important

If you use DLF, use a configured RAM user to perform operations in StarRocks Manager. For details, see Use a DLF catalog.

Property

Required

Description

type

Yes

Set to paimon.

paimon.catalog.type

Yes

Set to rest.

uri

Yes

The REST API address of DLF. Format: http://<VPC endpoint>. For more information, see Service endpoints. For example, http://cn-hangzhou-vpc.dlf.aliyuncs.com.

paimon.catalog.warehouse

Yes

The name of the Paimon catalog. Find it on the Catalogs page in the Data Lake Formation console.

token.provider

Yes

The REST service provider. Set to dlf.

StorageCredentialParams

StorageCredentialParams specifies how StarRocks accesses the Paimon file storage. This parameter group is only required when using OSS or OSS-HDFS as storage. Skip it if you use HDFS.

"aliyun.oss.endpoint" = "<YourAliyunOSSEndpoint>"

Property

Description

aliyun.oss.endpoint

The endpoint of your OSS or OSS-HDFS storage. Find it on the Overview page of your bucket, under the Port section. See OSS regions and endpoints for region-specific values. Examples: oss-cn-hangzhou.aliyuncs.com (OSS), cn-hangzhou.oss-dls.aliyuncs.com (OSS-HDFS).

Important

After setting aliyun.oss.endpoint, go to the Parameter Configuration page in the EMR Serverless StarRocks console and update fs.oss.endpoint in core-site.xml and jindosdk.cfg to the same value.

Examples

Use HMS

CREATE EXTERNAL CATALOG paimon_catalog
PROPERTIES
(
    "type" = "paimon",
    "paimon.catalog.type" = "hive",
    "hive.metastore.uris" = "thrift://xx.xx.xx.xx:9083"
);

Use filesystem

CREATE EXTERNAL CATALOG paimon_catalog
PROPERTIES
(
    "type" = "paimon",
    "paimon.catalog.type" = "filesystem",
    "paimon.catalog.warehouse" = "oss://<yourBucketName>/<yourPath>",
    "aliyun.oss.endpoint" = "cn-hangzhou.oss-dls.aliyuncs.com"
);

Use DLF 1.0 (Legacy)

CREATE EXTERNAL CATALOG paimon_catalog
PROPERTIES
(
    "type" = "paimon",
    "paimon.catalog.type" = "dlf",
    "paimon.catalog.warehouse" = "oss://<yourBucketName>/<yourPath>",
    "dlf.catalog.id" = "paimon_dlf_test"
);

Use DLF

CREATE EXTERNAL CATALOG dlf_catalog
PROPERTIES
(
    "type" = "paimon",
    "uri" = "http://cn-hangzhou-vpc.dlf.aliyuncs.com",
    "paimon.catalog.type" = "rest",
    "paimon.catalog.warehouse" = "test_paimon",
    "token.provider" = "dlf"
);

View Paimon catalogs

List all catalogs in the current StarRocks cluster:

SHOW CATALOGS;

View the creation statement of a specific catalog:

SHOW CREATE CATALOG paimon_catalog;

Delete a Paimon catalog

DROP CATALOG paimon_catalog;

Create a Paimon database

CREATE DATABASE IF NOT EXISTS <catalog_name>.<database_name>;

Create a Paimon table

Syntax

CREATE TABLE IF NOT EXISTS <catalog_name>.<database_name>.<table_name>
(
   id STRING,
   name STRING,
   day INT
)
PRIMARY KEY (id, day)        -- Omit for append-only tables
PARTITION BY (`day`)         -- Omit for non-partitioned tables
PROPERTIES (                 -- Omit if no properties are needed
  key = value
);

PROPERTIES parameters

Key

Description

file.format

The data storage format. Default: parquet.

bucket

The number of buckets for data distribution and query optimization. Default: -1 (bucketing disabled).

bucket_key

The column used to distribute records into buckets.

Limitations

  • For partitioned tables, all partition key columns must appear at the end of the column definition.

  • For partitioned tables, all partition key columns must be included in the primary key definition.

  • bucket_key cannot be part of a partition key or primary key.

Example

CREATE TABLE dlf_catalog.sr_dlf_db.ads_age_pvalue_analytics(
    final_gender_code STRING COMMENT 'gender',
    age_level STRING COMMENT 'age_level',
    pvalue_level STRING COMMENT 'consumption_level',
    clicks INT COMMENT 'clicks',
    total_behaviors INT COMMENT 'total_behaviors'
);

View a Paimon table schema

View the table schema:

DESC[RIBE] <catalog_name>.<database_name>.<table_name>;

View the schema and file storage location from the CREATE TABLE statement:

SHOW CREATE TABLE <catalog_name>.<database_name>.<table_name>;

Query data in a Paimon table

Switch to the target catalog and database, then run a SELECT statement:

-- Step 1: List databases in the catalog
SHOW DATABASES FROM <catalog_name>;

-- Step 2: Switch to the catalog and database
SET CATALOG <catalog_name>;
USE <db_name>;

-- Or switch directly to a specific database within the catalog
USE <catalog_name>.<db_name>;

-- Step 3: Query the table
SELECT count(*) FROM <table_name> LIMIT 10;

Alternatively, specify the full path in the SELECT statement without switching context:

SELECT * FROM <catalog_name>.<database_name>.<table_name>;

Write data to Paimon

Usage notes

StarRocks writes to Paimon tables in batches (Batch Write). Due to Paimon's limitations, you cannot write to tables whose bucket mode is HASH_DYNAMIC or CROSS_PARTITION. This includes primary key tables where the bucket property is not set—the default value is -1 (Dynamic Bucket Mode), which prevents write operations.

The Paimon software development kit (SDK) requires that partition key or primary key columns in the data being written are not null. StarRocks does not precheck data before writing, so an exception is thrown if a null value is encountered.

Write to a Paimon table

INSERT INTO <catalog_name>.<database_name>.<table_name> (column1, column2, ...) VALUES (value1, value2, ...);

Example—insert rows into ads_age_pvalue_analytics:

INSERT INTO dlf_catalog.sr_dlf_db.ads_age_pvalue_analytics (final_gender_code, age_level, pvalue_level, clicks, total_behaviors)
VALUES
('M', '18-24', 'Low', 1500, 2500),
('F', '25-34', 'Medium', 2200, 3300),
('M', '35-44', 'High', 2800, 4000);

Import Paimon data into an internal table

Transform and load data from a Paimon table into a StarRocks OLAP table:

INSERT INTO default_catalog.olap_db.olap_tbl SELECT * FROM <paimon_catalog>.<db_name>.<table_name>;

What's next