Create a Paimon data source

Create a Paimon data source to allow Dataphin to read from or write to Paimon.

Warning

The Paimon data source type is deprecated, and you can no longer create new data sources of this type. You can use a Hive data source (Aliyun EMR 5.x Hive 3.1.x, Paimon data lake format) instead. For more information, see Create a Hive data source. Paimon data sources that you created before version 6.1 continue to function normally.

Permissions

Only users with the super administrator, data source administrator, domain architect, or project administrator role, or a custom global role with the Create Data Source permission, can create data sources.

Limitations

You cannot access Paimon data sources using a data source code or as a compute source physical table.
Only HDFS storage is supported.

Procedure

On the Dataphin homepage, choose Management Center > Data Source Management from the top navigation bar.
On the Data Sources page, click + New Data Source.
On the New Data Source page, go to the Big Data Storage section and select Paimon.
If you have recently used Paimon, you can also select it from the Recently Used section, or enter Paimon as a keyword in the search box to find it quickly.

On the Create Paimon Data Source page, configure the basic information for the data source.

Parameter	Description
Data source name	The name can contain only Chinese characters, letters, digits, underscores (_), and hyphens (-), and cannot exceed 64 characters.
Data source code	After you configure a data source code, you can reference a table from the data source in a Flink SQL task by using the `data_source_code.table_name` or `data_source_code.schema.table_name` format. To automatically access the data source that corresponds to the current environment, use the `${data_source_code}.table` or `${data_source_code}.schema.table` variable format. For more information, see Develop tables by using Dataphin data sources. Important: The data source code cannot be modified after it is configured. You can preview data on the object details page in the asset directory and asset checklist only after the data source code is configured. In Flink SQL, only MySQL, Hologres, MaxCompute, Oracle, StarRocks, Hive, SelectDB, and GaussDB data warehouse service (DWS) data sources are currently supported.
Version	Select the version of the Paimon data source. Supported versions include Aliyun EMR 3.x Hive 2.3.5, Aliyun EMR 5.x Hive 3.1.x, CDH 6.x Hive 2.1.1, CDP 7.x Hive 3.1.3, and AsiaInfo DP 5.x Hive 3.1.0.
Data source description	Enter a brief description of the data source, up to 128 characters.
Data source configuration	Select the data source environments to configure: Select Production + Development Data Source if you use separate production and development environments. Select Production Data Source if you do not use separate production and development environments.
Tag	You can add tags to classify the data source. For information about how to create a tag, see Manage data source tags.
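
The naming and length rules in the table above can be checked before you fill in the form. The following is a minimal sketch, not Dataphin's own validation logic. It assumes that "Chinese characters" means the CJK Unified Ideographs range U+4E00 to U+9FFF, that "letters" means ASCII letters, and that an empty name is not allowed. The function names are hypothetical.

```python
import re

# Assumptions (not stated in the product documentation):
# - "Chinese characters" is approximated by U+4E00..U+9FFF.
# - "letters" means ASCII letters.
# - A name must contain at least one character.
_NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_-]{1,64}")

MAX_DESCRIPTION_LENGTH = 128


def is_valid_data_source_name(name: str) -> bool:
    """Return True if the name uses only allowed characters and is 1 to 64 characters long."""
    return _NAME_PATTERN.fullmatch(name) is not None


def is_valid_description(description: str) -> bool:
    """Return True if the description is at most 128 characters long."""
    return len(description) <= MAX_DESCRIPTION_LENGTH
```

For example, `is_valid_data_source_name("paimon_prod-01")` passes, while a name that contains a space or is longer than 64 characters is rejected.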

Configure the connection parameters.

If you selected Production + Development Data Source, configure the connection information for both the production and development environments. If you selected Production Data Source, configure the connection information for the production environment only.

Note

Typically, you should configure the production and development data sources as separate instances to isolate the environments and reduce the impact of development activities on production. However, Dataphin also allows you to use the same instance for both environments by entering identical connection parameters.

Parameter	Description
Catalog configuration
Catalog type	Only Hive is supported. This parameter cannot be modified.
Warehouse	Enter the root storage path for Paimon tables. We recommend that you derive it from the value of the `fs.defaultFS` parameter in `core-site.xml` and the value of the `hive.metastore.warehouse.dir` parameter in `hive-site.xml`. Note: Object Storage Service (OSS) is not supported.
Hive Thrift URI	Enter the value of the `hive.metastore.uris` parameter from the `hive-site.xml` file.
Metadata configuration
Metadata retrieval method	The supported methods are Metastore database and HMS. If you select Metastore database, configure the following parameters. Database type: Only MySQL is supported. Supported versions include MySQL 5.1.43, MySQL 5.6/5.7, and MySQL 8. JDBC URL: Enter the JDBC URL of the metastore database, for example, `jdbc:mysql://host:port/dbname`. Username and Password: Enter the username and password for accessing the metastore database. If you select HMS, configure the following parameters. Authentication method: The supported methods are No Authentication, LDAP, and Kerberos. Note: To use the Kerberos method, you must enable the Kerberos option in the Cluster configuration section. hive-site.xml: Upload the `hive-site.xml` configuration file. If real-time development is enabled, this configuration file is reused. Keytab file: If you use the Kerberos method, upload the keytab file. Principal: If you use the Kerberos method, enter the principal.
Cluster configuration
NameNode	Enter the address of the cluster's NameNode. To add multiple NameNodes, click + Add.
Configuration files	Upload the cluster's `hdfs-site.xml` and `core-site.xml` configuration files.
Kerberos	To access the cluster by using Kerberos, enable this option and configure the following parameters. Kerberos configuration method: Select the configuration method for the cluster's KDC server. Supported methods are KDC Server and krb5 File Configuration. KDC Server: If you use this method, enter the address of the KDC server. You can specify multiple addresses separated by semicolons (;). krb5 File Configuration: If you use this method, upload the krb5 configuration file. HDFS configuration: Specify the HDFS configuration information for the cluster. HDFS keytab file: Upload the HDFS keytab file for the cluster. HDFS principal: Enter the Kerberos principal for the cluster, for example, `XXXX/hadoopclient@xxx.xxx`.
Hive configuration
JDBC URL	Enter the JDBC URL for Hive, for example, `jdbc:hive2://host:port/dbname`.
Username and Password	If you do not use Kerberos to access the cluster, enter the username and password for Hive. Note: To ensure that tasks can run as expected, make sure the specified user has the required data permissions.
Hive keytab file	If you use Kerberos to access the cluster, upload the Hive keytab file.
Hive principal	If you use Kerberos to access the cluster, enter the Kerberos principal, for example, `XXXX/hadoopclient@xxx.xxx`.
Configuration file	Upload the `hive-site.xml` configuration file for Hive. Important: Flink SQL tasks ignore the authentication information configured for this data source and use the Flink engine's authentication information to access the Hive data source.
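
The Warehouse and Hive Thrift URI values come from the cluster's Hadoop configuration files, so they can be read programmatically instead of copied by hand. The following is a minimal sketch that parses the standard Hadoop `<configuration>/<property>` XML layout with Python's standard library. It assumes that the recommended warehouse path is the `fs.defaultFS` value joined with the `hive.metastore.warehouse.dir` value; the documentation names both parameters but does not spell out how to combine them. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def read_hadoop_property(xml_text: str, name: str) -> Optional[str]:
    """Return the value of a named property from Hadoop-style XML, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return None


def build_warehouse_path(core_site_xml: str, hive_site_xml: str) -> str:
    """Join fs.defaultFS and hive.metastore.warehouse.dir into one path.

    Assumption: joining the two values is what the recommendation intends.
    """
    default_fs = read_hadoop_property(core_site_xml, "fs.defaultFS")
    warehouse_dir = read_hadoop_property(hive_site_xml, "hive.metastore.warehouse.dir")
    if not default_fs or not warehouse_dir:
        raise ValueError("fs.defaultFS or hive.metastore.warehouse.dir is missing")
    if "://" in warehouse_dir:
        # The warehouse directory is already a fully qualified URI.
        return warehouse_dir
    return default_fs.rstrip("/") + "/" + warehouse_dir.lstrip("/")
```

With `fs.defaultFS` set to `hdfs://nameservice1` and `hive.metastore.warehouse.dir` set to `/user/hive/warehouse` (both illustrative values), the sketch returns `hdfs://nameservice1/user/hive/warehouse`. Calling `read_hadoop_property(hive_site_xml, "hive.metastore.uris")` returns the value to enter for Hive Thrift URI.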

Select a default resource group. Dataphin uses this resource group to run tasks related to the data source, including database SQL queries, offline full database migration, and data preview.
Click Test Connection, or click OK to save your changes and create the Paimon data source.
When you click Test Connection, Dataphin tests the connection to the data source. If you click OK directly, Dataphin automatically tests the connectivity to all configured environments. You can create the data source even if a connectivity test fails.
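
Because a data source can be created even when the connectivity test fails, it can help to confirm beforehand that the metastore endpoints are reachable at the network level. The following sketch is not how Dataphin tests connectivity. It only parses a `hive.metastore.uris` value, which is a comma-separated list of `thrift://host:port` entries, and attempts a plain TCP connection to each endpoint. The function names are hypothetical, and a result obtained from your own machine does not guarantee the same result from the Dataphin resource group.

```python
import socket
from typing import List, Tuple
from urllib.parse import urlsplit


def parse_metastore_uris(uris: str) -> List[Tuple[str, int]]:
    """Split a hive.metastore.uris value into (host, port) pairs."""
    endpoints = []
    for raw in uris.split(","):
        raw = raw.strip()
        if not raw:
            continue
        parts = urlsplit(raw)
        if parts.scheme != "thrift" or parts.hostname is None or parts.port is None:
            raise ValueError(f"not a thrift://host:port URI: {raw!r}")
        endpoints.append((parts.hostname, parts.port))
    return endpoints


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A TCP check confirms only that the port accepts connections. It does not verify Kerberos or LDAP authentication, so the Test Connection button remains the authoritative check.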