DataWorks supports configuring a Paimon Catalog data source to collect and govern metadata for Paimon tables that do not originate from Data Lake Formation (DLF). This specialized data source helps you unify the governance of Paimon data lake assets in Data Map. This topic describes how to configure this data source.
Introduction
With the growing adoption of the lakehouse architecture in enterprises, open table formats like Paimon, Iceberg, and Delta Lake have become the cornerstones for building real-time data warehouses and enabling unified batch-stream processing scenarios. In the Flink stream processing ecosystem, Paimon Catalog is widely used due to its native compatibility.
DataWorks is deeply integrated with Data Lake Formation (DLF) and supports unified management of and access to data lake tables through DLF data sources. However, not every Paimon catalog is managed by DLF. For example, a user might define a Paimon Catalog with the Flink engine and store its metadata and data in Alibaba Cloud OSS.
Existing data sources cannot effectively discover or manage this kind of native lake format metadata that is not managed by DLF. To address this, DataWorks introduces the Paimon Catalog data source, which supports metadata collection and governance for native data lake formats. This feature fills the management gap for self-declared catalogs, making end-to-end lakehouse data visible, manageable, and usable.
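To make the "self-declared catalog" case concrete, the following minimal sketch renders the kind of Flink SQL statement that declares a filesystem-based Paimon catalog on OSS. The option keys (`type`, `warehouse`, `fs.oss.endpoint`) follow the Apache Paimon documentation as commonly described and should be verified against your Paimon version. The function name, catalog name, bucket, and endpoint are hypothetical, and credential options are deliberately omitted.

```python
def render_paimon_catalog_ddl(catalog_name, warehouse, endpoint):
    """Render a Flink SQL statement that declares a Paimon catalog stored on OSS."""
    # Assumed option keys; check them against the Paimon version you run.
    options = {
        "type": "paimon",
        "warehouse": warehouse,
        "fs.oss.endpoint": endpoint,
    }
    body = ",\n".join(f"    '{key}' = '{value}'" for key, value in options.items())
    return f"CREATE CATALOG {catalog_name} WITH (\n{body}\n);"


# Hypothetical values, for illustration only.
ddl = render_paimon_catalog_ddl(
    catalog_name="my_paimon_catalog",
    warehouse="oss://example-bucket/paimon/warehouse",
    endpoint="oss-cn-hangzhou-internal.aliyuncs.com",
)
print(ddl)
```

A catalog declared this way keeps its metadata next to the data under the warehouse path, which is why DLF-based data sources do not see it.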
Limitations
Network connectivity: Only a serverless resource group is supported.
Scenarios: The Paimon Catalog data source currently supports only metadata collection and governance. It does not support data integration or synchronization tasks. To read from or write to Paimon tables for data synchronization, use another data source, such as DLF or OSS.
Procedure
1. Go to the Data Sources page
Log on to the DataWorks console and switch to the target region. In the left navigation bar, click Workspace, and then click Manage in the Actions column of the target workspace to go to the management page.
On the workspace management page, click Data Sources in the left navigation bar to go to the Data Sources page.
2. Add a Paimon Catalog data source
On the Data Sources page, click Add Data Source.
In the Add Data Source dialog box, search for and select Paimon Catalog.
3. Configure parameters
Configure the following parameters:
Parameter | Description |
Data Source Name | A custom name for the data source. |
Catalog | The name of the catalog for the connection. |
MetaStore | The storage type of the catalog. Currently, only Filesystem is supported. |
Filesystem | The file storage type. Currently, only OSS is supported. |
Access Mode | |
Region | Select a bucket in the same region as the workspace for optimal performance. For cross-region data sources, establish a VPC peering connection. For details, see Connect to a data source in a different region under the same Alibaba Cloud account. Alternatively, connect by using a public endpoint. |
Endpoint | For information about endpoint configuration, see Access domain names and data centers. |
Warehouse | The storage path of the Paimon Catalog in OSS. |
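Before you enter these values in the console, a quick local sanity check can catch obvious mistakes. The sketch below is not a DataWorks API. The dictionary keys, the `oss://<bucket>/<path>` warehouse shape, and the `oss-<region>[-internal].aliyuncs.com` endpoint shape are assumptions made for illustration. Access Mode is left out because its allowed values are not described here.

```python
import re

SUPPORTED_METASTORES = {"Filesystem"}
SUPPORTED_FILESYSTEMS = {"OSS"}
# Assumed shapes: oss://<bucket>/<path> and oss-<region>[-internal].aliyuncs.com
WAREHOUSE_PATTERN = re.compile(r"^oss://[a-z0-9][a-z0-9-]{1,61}[a-z0-9](/.*)?$")
ENDPOINT_PATTERN = re.compile(r"^oss-([a-z0-9-]+?)(-internal)?\.aliyuncs\.com$")


def check_config(config, workspace_region):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    if not config.get("data_source_name"):
        problems.append("Data Source Name is empty")
    if not config.get("catalog"):
        problems.append("Catalog is empty")
    if config.get("metastore") not in SUPPORTED_METASTORES:
        problems.append("MetaStore must be Filesystem")
    if config.get("filesystem") not in SUPPORTED_FILESYSTEMS:
        problems.append("Filesystem must be OSS")
    if not WAREHOUSE_PATTERN.match(config.get("warehouse") or ""):
        problems.append("Warehouse does not look like oss://<bucket>/<path>")
    match = ENDPOINT_PATTERN.match(config.get("endpoint") or "")
    if match is None:
        problems.append("Endpoint does not look like an OSS endpoint")
    elif match.group(1) != workspace_region:
        # Cross-region access needs a peering connection or a public endpoint.
        problems.append("Endpoint region differs from the workspace region")
    return problems


# Hypothetical values, for illustration only.
example = {
    "data_source_name": "paimon_catalog_demo",
    "catalog": "my_paimon_catalog",
    "metastore": "Filesystem",
    "filesystem": "OSS",
    "endpoint": "oss-cn-hangzhou-internal.aliyuncs.com",
    "warehouse": "oss://example-bucket/paimon/warehouse",
}
print(check_config(example, workspace_region="cn-hangzhou"))
```

A region mismatch is reported as a problem only to prompt a review. Cross-region configurations remain possible with the network setup described in the Region row.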
4. Test connectivity
After you configure the data source, run a connectivity test to verify the connection between the data source and the resource group.
If Connected is displayed, the configuration is correct.
If Connection failed is displayed, a diagnostic tool opens to help you troubleshoot. Common causes include incorrect credentials and network connectivity issues, such as an unconfigured IP address whitelist or a missing NAT gateway.
In standard mode, you must ensure that both the development environment and the production environment are Connected. Otherwise, errors will occur during subsequent operations such as metadata collection.
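The rule above can be expressed as a small check. This is an illustrative sketch only: the function name, the mode labels, and the status strings are hypothetical, and it assumes that a workspace not in standard mode has a single production environment.

```python
def ready_for_metadata_collection(workspace_mode, test_results):
    """Return True when every environment required by the mode reports 'Connected'."""
    # Assumption: standard mode has development and production environments;
    # any other mode is treated as having only a production environment.
    required_by_mode = {"standard": ("development", "production")}
    required = required_by_mode.get(workspace_mode, ("production",))
    return all(test_results.get(env) == "Connected" for env in required)


# Hypothetical test results, for illustration only.
results = {"development": "Connected", "production": "Connection failed"}
print(ready_for_metadata_collection("standard", results))
```

A missing entry for a required environment counts as not connected, which matches the intent that both environments must pass before metadata collection.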
Next steps
After adding the data source, go to the Data Map module to collect metadata. You can then view and govern this metadata.
For the Warehouse parameter, you can also use the control to the right of the input box to select a path from a list.