External Catalog

更新时间:
复制 MD 格式

External Catalogs let AnalyticDB for MySQL query external data sources directly with zero ETL—no data import or migration is required. You can plug in Apache Hive, Apache Iceberg, Delta Lake, and Apache Paimon data lakes, plus any MySQL-compatible database (RDS for MySQL, StarRocks, or another AnalyticDB for MySQL instance). The result is transparent cross-source queries, lakehouse analytics, and on-demand data ingestion.

Supported catalog types

Type

Source

Cluster version requirement

Metastore

Supported operations

Paimon Catalog

Apache Paimon

3.2.7

  • FileSystem

  • HMS

  • DLF 2.5

Query, write, DDL (from 3.2.8)

Hive Catalog

Apache Hive

3.2.8

HMS

Query, write, DDL

Iceberg Catalog

Apache Iceberg

3.2.8

HMS

Query, write, DDL

Delta Catalog

Delta Lake

3.2.8

HMS

Query only

MySQL Catalog

RDS for MySQL, AnalyticDB for MySQL, StarRocks, or any MySQL-compatible database

3.2.8

N/A (direct JDBC)

Query only

Prerequisites

  • An AnalyticDB for MySQL cluster is running on a kernel version that meets the cluster version requirement of the catalog type you plan to use.

  • ENI network access is enabled.

  • The OSS bucket is in the same region as the cluster (VPCs in the same region can access OSS over the internal network by default).

  • When using HMS: the cluster and HMS are in the same VPC, and the HMS security group allows traffic from the cluster's ENI CIDR range.

  • When using DLF 2.5: the cluster's VPC is on the DLF allowlist, and the service-linked role AliyunServiceRoleForAnalyticDBForMySQL has been granted permissions on the target Catalog in the DLF console under Data Permissions (select the preset permission type Data Editor).

  • When using MySQL Catalog: the target database's security group or allowlist permits traffic from the cluster's ENI CIDR range.

Important notes

CREATE EXTERNAL CATALOG validates only SQL syntax and parameter format. It does not verify connectivity to the underlying source. Even if the OSS path is missing or the Metastore endpoint is wrong, the statement still returns success—errors surface only when you query through the catalog. After creation, immediately run SHOW DATABASES FROM <catalog_name> to confirm connectivity.

Hive Catalog

Hive Catalog manages and queries Apache Hive data. Available since AnalyticDB for MySQL 3.2.8; HMS only.

Syntax

CREATE EXTERNAL CATALOG <catalog_name>
PROPERTIES
(
    "type" = "hive",
    "metastore-uri" = "<hms_thrift_uri>"
);

Property

Required

Description

type

Yes

Catalog type. Must be hive.

metastore-uri

Yes

The Hive Metastore Thrift endpoint, in the form thrift://<host>:<port>.

Example

CREATE EXTERNAL CATALOG hive_hms
PROPERTIES
(
    "type" = "hive",
    "metastore-uri" = "thrift://hms-host:9083"
);

DDL and DML

From AnalyticDB for MySQL 3.2.8 onward, Hive Catalog supports the following:

-- Create database
CREATE DATABASE IF NOT EXISTS <catalog_name>.<db_name>;

-- Create table (PARQUET only)
CREATE TABLE IF NOT EXISTS <catalog_name>.<db_name>.<table_name> (
    <col1> <data_type>,
    <col2> <data_type>
)
STORED AS PARQUET;

-- Insert
INSERT INTO <catalog_name>.<db_name>.<table_name> VALUES (...);

-- Bulk insert
INSERT INTO <catalog_name>.<db_name>.<table_name>
SELECT * FROM <source_catalog>.<source_db>.<source_table>;

-- CTAS
CREATE TABLE <catalog_name>.<db_name>.<new_table>
AS SELECT * FROM <source_catalog>.<source_db>.<source_table>;

-- Drop
DROP TABLE IF EXISTS <catalog_name>.<db_name>.<table_name>;
DROP DATABASE IF EXISTS <catalog_name>.<db_name>;

Iceberg Catalog

Iceberg Catalog manages and queries Apache Iceberg data. Available since AnalyticDB for MySQL 3.2.8; HMS only.

Syntax

CREATE EXTERNAL CATALOG <catalog_name>
PROPERTIES
(
    "type" = "iceberg",
    "metastore-uri" = "thrift://<host>:<port>"
);

Property semantics are the same as Hive Catalog; just change type to iceberg.

Example

CREATE EXTERNAL CATALOG iceberg_hms
PROPERTIES
(
    "type" = "iceberg",
    "metastore-uri" = "thrift://hms-host:9083"
);

DDL and DML

From 3.2.8 onward, the supported DDL and DML are identical to Hive Catalog. Use STORED AS ICEBERG when creating a table:

CREATE TABLE IF NOT EXISTS <catalog_name>.<db_name>.<table_name> (
    <col1> <data_type>,
    <col2> <data_type>
)
STORED AS ICEBERG;

Delta Catalog

Delta Catalog queries Delta Lake-formatted data. Available since AnalyticDB for MySQL 3.2.8; HMS only; query only—writes and DDL are not supported.

Syntax

CREATE EXTERNAL CATALOG <catalog_name>
PROPERTIES
(
    "type" = "delta",
    "metastore-uri" = "thrift://<host>:<port>"
);

Property semantics are the same as Hive Catalog; just change type to delta.

Example

CREATE EXTERNAL CATALOG delta_hms
PROPERTIES
(
    "type" = "delta",
    "metastore-uri" = "thrift://hms-host:9083"
);

Paimon Catalog

Paimon Catalog manages and queries Apache Paimon data. Available since AnalyticDB for MySQL 3.2.7. Compared with the other types, Paimon Catalog supports more metastores: FileSystem, HMS, and DLF 2.5. Write and DDL operations are supported from 3.2.8.

Supported metastores

Metastore

Description

FileSystem

Paimon writes metadata directly into the warehouse directory. No separate metastore is required. Best for lightweight setups.

DLF 2.5

Alibaba Cloud Data Lake Formation (DLF 2.5) as a unified metadata service. Recommended for production.

HMS

Hive Metastore. Best for environments with an existing Hadoop ecosystem.

FileSystem (no separate metastore)

Paimon writes its metadata directly into the warehouse directory. Best for lightweight setups.

CREATE EXTERNAL CATALOG <catalog_name>
PROPERTIES
(
    "type" = "paimon",
    "paimon.catalog.type" = "paimon",
    "paimon.catalog.metastore" = "filesystem",
    "paimon.catalog.warehouse" = "oss://<bucket>/<path>/",
    -- Optional tuning properties
    "paimon.enable_raw_delete_vector" = "true",
    "paimon.enable_metadata_cache" = "true",
    "paimon.enable_bucket_join" = "true",
    "paimon.meta_cache_expired_time" = "1800"
);

-- Example
CREATE EXTERNAL CATALOG paimon_oss
PROPERTIES
(
    "type" = "paimon",
    "paimon.catalog.type" = "paimon",
    "paimon.catalog.metastore" = "filesystem",
    "paimon.catalog.warehouse" = "oss://my-bucket/paimon/warehouse/"
);

DLF 2.5 (recommended for production)

Use Alibaba Cloud Data Lake Formation (DLF 2.5) as the unified metastore.

Important

DLF 1.0 is not supported. Use DLF 2.5 or later.

Prerequisites:

  1. DLF 2.5 is enabled and a target Catalog has been created in the DLF console under Data Catalogs. Its name is what you fill into paimon.catalog.warehouse.

  2. In the DLF console under Data Permissions, grant the AnalyticDB for MySQL service-linked role AliyunServiceRoleForAnalyticDBForMySQL permissions on the target Catalog: select the preset permission type Data Editor.

Syntax

CREATE EXTERNAL CATALOG <catalog_name>
PROPERTIES
(
    "type" = "paimon",
    "paimon.catalog.type" = "paimon",
    "paimon.catalog.metastore" = "rest",
    "paimon.catalog.warehouse" = "<dlf_catalog_name>",
    "paimon.catalog.uri" = "<dlf_endpoint>",
    "paimon.catalog.token.provider" = "dlf",
    -- The following properties are required only outside the managed-credential scenario
    "paimon.catalog.dlf.access-key-id" = "<access_key_id>",
    "paimon.catalog.dlf.access-key-secret" = "<access_key_secret>"
);

Properties

Property

Required

Description

type

Yes

Catalog type. Must be paimon.

paimon.catalog.type

Yes

Paimon Catalog engine type. Must be paimon.

paimon.catalog.metastore

Yes

Metastore type. Set to rest for DLF 2.5.

paimon.catalog.warehouse

Yes

The DLF Catalog name (not an OSS path).

paimon.catalog.uri

Yes

The VPC-internal endpoint of the DLF service. Do not include a protocol prefix. Format: <region>-vpc.dlf.aliyuncs.com, for example cn-hangzhou-vpc.dlf.aliyuncs.com.

paimon.catalog.token.provider

Yes

Token provider. Must be dlf.

paimon.catalog.dlf.access-key-id

No

Alibaba Cloud AccessKey ID. Not required when the AnalyticDB for MySQL cluster uses managed credentials.

paimon.catalog.dlf.access-key-secret

No

Alibaba Cloud AccessKey Secret. Not required when the AnalyticDB for MySQL cluster uses managed credentials.

Examples

With managed credentials:

CREATE EXTERNAL CATALOG paimon_catalog_dlf
PROPERTIES
(
    "type" = "paimon",
    "paimon.catalog.type" = "paimon",
    "paimon.catalog.metastore" = "rest",
    "paimon.catalog.warehouse" = "my_paimon_catalog",
    "paimon.catalog.uri" = "cn-hangzhou-vpc.dlf.aliyuncs.com",
    "paimon.catalog.token.provider" = "dlf"
);

With an explicit AccessKey pair:

CREATE EXTERNAL CATALOG paimon_catalog_dlf
PROPERTIES
(
    "type" = "paimon",
    "paimon.catalog.type" = "paimon",
    "paimon.catalog.metastore" = "rest",
    "paimon.catalog.warehouse" = "my_paimon_catalog",
    "paimon.catalog.uri" = "cn-hangzhou-vpc.dlf.aliyuncs.com",
    "paimon.catalog.token.provider" = "dlf",
    "paimon.catalog.dlf.access-key-id" = "<your_access_key_id>",
    "paimon.catalog.dlf.access-key-secret" = "<your_access_key_secret>"
);

HMS (existing Hadoop ecosystem)

CREATE EXTERNAL CATALOG <catalog_name>
PROPERTIES
(
    "type" = "paimon",
    "paimon.catalog.type" = "paimon",
    "paimon.catalog.metastore" = "hive",
    "paimon.catalog.metastore.thrift.uri.selection" = "SEQUENTIAL",
    "paimon.catalog.warehouse" = "oss://<bucket>/<path>/",
    "paimon.catalog.uri" = "thrift://<hms_host>:<port>"
);
Warning

paimon.catalog.metastore.thrift.uri.selection must be set to SEQUENTIAL in the current release. Setting it to Random can cause HMS connection failures.

Key properties

Property

Required

Description

type

Yes

Must be paimon.

paimon.catalog.metastore

Yes

Metastore type: filesystem, hive, or rest (DLF 2.5).

paimon.catalog.warehouse

Yes

FileSystem and HMS modes: the OSS path of the Paimon warehouse, in the form oss://<bucket>/<path>/. DLF mode: the DLF Catalog name.

paimon.enable_metadata_cache

No

Whether to enable metadata caching. Default false. Enabling cuts metadata round-trips to OSS.

paimon.meta_cache_expired_time

No

Cache TTL in seconds. Default 1800. Effective only when paimon.enable_metadata_cache=true.

paimon.enable_bucket_join

No

Whether to enable bucket-join optimization. Default false. Speeds up joins on bucket-partitioned Paimon tables.

paimon.enable_raw_delete_vector

No

Whether to enable raw delete-vector optimization. Default false. Improves query performance on tables with deletes.

DDL and DML

From 3.2.8 onward, Paimon Catalog supports the operations below. Table properties such as primary-key, partition, and bucket go inside TBLPROPERTIES. Paimon supports several table types: Append Only, primary-key (MOR/COW/MOW), aggregation, and First Row.

CREATE TABLE IF NOT EXISTS <catalog_name>.<db_name>.<table_name> (
    order_key   BIGINT,
    order_status VARCHAR,
    total_price  DOUBLE,
    order_date   DATE
)
TBLPROPERTIES (
    'file.format'  = 'PARQUET',
    'partition'    = 'order_date',
    'bucket'       = '2',
    'bucket-key'   = 'order_key'
);

MySQL Catalog

MySQL Catalog talks to MySQL-compatible external databases (RDS for MySQL, another AnalyticDB for MySQL instance, StarRocks) over JDBC. Available since AnalyticDB for MySQL 3.2.8; no metastore is required; query only.

Supported data sources

Data source

Description

RDS for MySQL

Alibaba Cloud ApsaraDB RDS for MySQL. Connects through the standard MySQL port (default 3306).

AnalyticDB for MySQL

Another AnalyticDB for MySQL instance. Connects through the standard MySQL port (default 3306).

StarRocks

Connects to the StarRocks FE node over the MySQL protocol (default port 9030). The MariaDB driver must be enabled.

Syntax

CREATE EXTERNAL CATALOG <catalog_name>
PROPERTIES
(
    "type"                = "mysql",
    "connection-url"      = "jdbc:mysql://<host>:<port>",
    "connection-user"     = "<username>",
    "connection-password" = "<password>"
    -- For StarRocks, also set:
    -- , "mysql.use-mariadb-driver" = "true"
);

Property

Required

Description

type

Yes

Must be mysql.

connection-url

Yes

JDBC URL of the target database. Must start with jdbc:mysql://.

connection-user / connection-password

Yes

Credentials for the target database.

mysql.use-mariadb-driver

No

Whether to use the MariaDB JDBC driver. Default false. Set to true when connecting to StarRocks. StarRocks's MySQL protocol is built on MariaDB, so the default MySQL JDBC driver may hit compatibility issues.

Examples

-- Connect to RDS for MySQL
CREATE EXTERNAL CATALOG rds_mysql_catalog
PROPERTIES
(
    "type"                = "mysql",
    "connection-url"      = "jdbc:mysql://rm-bp1xxxxxxx.mysql.rds.aliyuncs.com:3306",
    "connection-user"     = "<your_username>",
    "connection-password" = "<your_password>"
);

-- Connect to another AnalyticDB for MySQL instance
CREATE EXTERNAL CATALOG adb_mysql_catalog
PROPERTIES
(
    "type"                = "mysql",
    "connection-url"      = "jdbc:mysql://am-bp1xxxxxxx.ads.aliyuncs.com:3306",
    "connection-user"     = "<your_username>",
    "connection-password" = "<your_password>"
);

-- Connect to StarRocks (FE port defaults to 9030; MariaDB driver is required)
CREATE EXTERNAL CATALOG sr_catalog
PROPERTIES
(
    "type"                     = "mysql",
    "connection-url"           = "jdbc:mysql://fe-host:9030",
    "connection-user"          = "<your_username>",
    "connection-password"      = "<your_password>",
    "mysql.use-mariadb-driver" = "true"
);

Manage External Catalogs

View and drop

-- View the creation statement of an External Catalog
SHOW CREATE CATALOG <catalog_name>;

-- Drop a catalog (the original data in the external source is untouched)
DROP CATALOG <catalog_name>;

Permission control

External Catalogs offer a separate catalog-level permission model (the Managed Catalog adb reuses the existing database/table/column three-level model):

-- Grant
GRANT <privilege> ON CATALOG <catalog_name> TO '<user_name>'@'%';

-- Revoke
REVOKE <privilege> ON CATALOG <catalog_name> FROM '<user_name>'@'%';

Privilege

Effect

ACCESS

Access databases and tables under the catalog

CREATE

Create catalogs

DROP

Drop catalogs

ALTER

Alter catalogs

Ingestion and query examples

Import from an external source into AnalyticDB for MySQL

Use INSERT INTO ... SELECT to materialize data from an External Catalog into an internal table. The pattern works for every catalog type.

INSERT INTO <adb_db>.<adb_table>
SELECT * FROM <catalog_name>.<lake_db>.<lake_table>;

-- From Hive Catalog
INSERT INTO analytics_db.orders_adb
SELECT * FROM hive_hms.warehouse_db.orders;

-- From Iceberg Catalog
INSERT INTO analytics_db.events_adb
SELECT * FROM iceberg_hms.lakehouse.events;

-- From Delta Catalog
INSERT INTO analytics_db.transactions_adb
SELECT * FROM delta_hms.finance_db.transactions;

-- From Paimon Catalog
INSERT INTO analytics_db.lineitem_adb
SELECT * FROM paimon_oss.tpch.lineitem;

Hive Catalog query

-- List databases in the Hive Catalog
SHOW DATABASES FROM hive_hms;

-- Query a Hive table
SELECT
    order_id,
    customer_name,
    order_date,
    total_amount
FROM hive_hms.sales_db.orders
WHERE order_date >= '2024-01-01'
ORDER BY total_amount DESC
LIMIT 100;

Iceberg Catalog query

-- List databases in the Iceberg Catalog
SHOW DATABASES FROM iceberg_hms;

-- Query an Iceberg table
SELECT
    event_type,
    COUNT(*) AS event_count,
    AVG(duration_ms) AS avg_duration
FROM iceberg_hms.lakehouse.user_events
WHERE event_date >= '2024-01-01'
GROUP BY event_type
ORDER BY event_count DESC;

Delta Catalog query

-- List databases in the Delta Catalog
SHOW DATABASES FROM delta_hms;

-- Query a Delta table
SELECT
    product_category,
    SUM(revenue) AS total_revenue,
    COUNT(DISTINCT customer_id) AS unique_customers
FROM delta_hms.finance_db.transactions
WHERE transaction_date BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY product_category
ORDER BY total_revenue DESC;

Paimon Catalog query

SELECT
    l_returnflag,
    l_linestatus,
    SUM(l_quantity)                                       AS sum_qty,
    SUM(l_extendedprice)                                  AS sum_base_price,
    SUM(l_extendedprice * (1 - l_discount))               AS sum_disc_price,
    SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
    AVG(l_quantity)                                       AS avg_qty,
    AVG(l_extendedprice)                                  AS avg_price,
    AVG(l_discount)                                       AS avg_disc,
    COUNT(*)                                              AS count_order
FROM   paimon_oss.tpch.lineitem
WHERE  l_shipdate <= DATE '1998-12-01' - INTERVAL '90' DAY
GROUP  BY l_returnflag, l_linestatus
ORDER  BY l_returnflag, l_linestatus;

MySQL Catalog query

-- List databases in the MySQL Catalog
SHOW DATABASES FROM rds_mysql_catalog;

-- Query an RDS MySQL table directly
SELECT
    user_id,
    username,
    email,
    created_at
FROM rds_mysql_catalog.user_db.users
WHERE created_at >= '2024-01-01'
ORDER BY created_at DESC
LIMIT 100;

Federated lakehouse query

Join an internal table with an external one in the same statement:

SELECT
    u.username,
    u.email,
    COUNT(o.order_id) AS order_count,
    SUM(o.total_amount) AS total_spent
FROM
    rds_mysql_catalog.user_db.users u
JOIN
    analytics_db.orders o ON u.user_id = o.user_id
GROUP BY u.username, u.email
ORDER BY total_spent DESC
LIMIT 50;

FAQ

Q: Do External Catalogs support writes and DDL?

A: Paimon, Hive, and Iceberg Catalogs support write and DDL operations from 3.2.8. Delta Catalog and MySQL Catalog are query-only at present. To persist data from a query-only catalog into AnalyticDB for MySQL, use INSERT INTO ... SELECT into an internal table.

Q: Can Hive / Iceberg / Delta Catalogs use FileSystem or DLF as the metastore?

A: Not today—HMS only. Use Paimon Catalog if you need FileSystem or DLF.

Q: HMS connection fails. What should I check?

A: Verify ENI access is on, that the cluster and HMS share a subnet, that the HMS security group allows the subnet's IP range, and that the address and port in metastore-uri are correct.

Q: Does dropping a catalog delete the external data?

A: No. DROP CATALOG removes the catalog definition and metadata cache from AnalyticDB for MySQL only; the source data is untouched.

Q: Why does MySQL Catalog need the MariaDB driver for StarRocks?

A: StarRocks's MySQL protocol implementation is based on MariaDB. The default MySQL JDBC driver can hit compatibility issues. Setting mysql.use-mariadb-driver to true avoids them.

Q: CREATE EXTERNAL CATALOG returned success but later queries fail. Why?

A: Catalog creation only validates SQL syntax and parameter format; it never actually contacts the underlying source. Run SHOW DATABASES FROM <catalog_name> right after creation to confirm connectivity. Common causes include a missing or unauthorized OSS bucket, a wrong Metastore address, ENI access being off, or the target database firewall blocking the cluster.

Related documents