Create a Paimon data source to allow Dataphin to read from or write to Paimon.
The Paimon data source type is deprecated, and you can no longer create new data sources of this type. You can use a Hive data source (Aliyun EMR 5.x Hive 3.1.x, Paimon data lake format) instead. For more information, see Create a Hive data source. Paimon data sources that you created before version 6.1 continue to function normally.
Permissions
Only users with the super administrator, data source administrator, domain architect, or project administrator role, or a custom global role with the Create Data Source permission, can create data sources.
Limitations
You cannot access Paimon data sources using a data source code or as a compute source physical table.
Only HDFS storage is supported.
Procedure
On the Dataphin homepage, choose Management Center > Data Source Management from the top navigation bar.
On the Data Sources page, click + New Data Source.
On the New Data Source page, go to the Big Data Storage section and select Paimon.
If you have recently used Paimon, you can select it from the Recently Used section. You can also enter Paimon in the search box to find it quickly.
On the Create Paimon Data Source page, configure the basic information for the data source.
Parameter
Description
Data source name
The naming conventions are as follows:
Must contain only Chinese characters, letters, digits, underscores (_), and hyphens (-).
Must not exceed 64 characters.
Data source code
After you configure a data source code, you can reference a table from the data source in a Flink SQL task by using the data_source_code.table_name or data_source_code.schema.table_name format. If you need to automatically access the data source that corresponds to the current environment, use the ${data_source_code}.table or ${data_source_code}.schema.table variable format. For more information, see Develop tables by using Dataphin data sources.
Important:
The data source code cannot be modified after it is configured.
You can preview data on the object details page in the asset directory and asset checklist only after the data source code is configured.
In Flink SQL, only MySQL, Hologres, MaxCompute, Oracle, StarRocks, Hive, SelectDB, and GaussDB data warehouse service (DWS) data sources are currently supported.
Version
Select the version of the Paimon data source. Supported versions include Aliyun EMR 3.x Hive 2.3.5, Aliyun EMR 5.x Hive 3.1.x, CDH 6.x Hive 2.1.1, CDP 7.x Hive 3.1.3, and AsiaInfo DP 5.x Hive 3.1.0.
Data source description
Enter a brief description of the data source, up to 128 characters.
Data source configuration
Select the data source environments to configure:
Select Production + Development Data Source if you use separate production and development environments.
Select Production Data Source if you do not use separate production and development environments.
Tag
You can add tags to classify the data source. For information about how to create a tag, see Manage data source tags.
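The naming and length rules in the table above can be checked before you fill in the form. The following is a minimal sketch, not Dataphin's own validation: the function names are hypothetical, "Chinese characters" is assumed to mean the CJK Unified Ideographs block (U+4E00 to U+9FFF), and the limits are assumed to count characters rather than bytes.

```python
import re

# Assumption: "Chinese characters" is read as U+4E00..U+9FFF only;
# Dataphin's actual check may accept a wider range.
_NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_-]{1,64}")


def is_valid_data_source_name(name):
    """Return True if name uses only allowed characters and has 1 to 64 of them."""
    return _NAME_PATTERN.fullmatch(name) is not None


def is_valid_description(description):
    """Return True if the description has at most 128 characters."""
    return len(description) <= 128
```

For example, is_valid_data_source_name("paimon_prod-01") is accepted, while a name that contains a space or exceeds 64 characters is rejected.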
Configure the connection parameters.
If you selected Production + Development Data Source, configure the connection information for both the production and development environments. If you selected Production Data Source, configure the connection information for the production environment only.
Note: Typically, you should configure the production and development data sources as separate instances to isolate the environments and reduce the impact of development activities on production. However, Dataphin also allows you to use the same instance for both environments by entering identical connection parameters.
Parameter
Description
Catalog configuration
Catalog type
Only Hive is supported. This parameter cannot be modified.
Warehouse
Enter the root storage path for Paimon tables.
We recommend that you use the values of the fs.defaultFS parameter from core-site.xml and the hive.metastore.warehouse.dir parameter from hive-site.xml.
Note: Object Storage Service (OSS) is not supported.
Hive Thrift URI
Enter the value of the hive.metastore.uris parameter from the hive-site.xml file.
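The Warehouse and Hive Thrift URI values described above come from the cluster's Hadoop configuration files, which store settings as property elements with name and value children. The following sketch shows one way to look them up. The helper names and the sample file contents are hypothetical, and joining fs.defaultFS with hive.metastore.warehouse.dir into a single path is an assumption about what the Warehouse field expects.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text):
    """Parse a Hadoop-style <configuration> document into a name -> value dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def build_warehouse_path(default_fs, warehouse_dir):
    """Join fs.defaultFS and hive.metastore.warehouse.dir into one URI.

    Assumption: if the warehouse directory is already a full URI,
    it is returned unchanged.
    """
    if "://" in warehouse_dir:
        return warehouse_dir
    return default_fs.rstrip("/") + "/" + warehouse_dir.lstrip("/")


# Hypothetical file contents, for illustration only.
CORE_SITE = """<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://namenode.example.com:8020</value></property>
</configuration>"""

HIVE_SITE = """<configuration>
  <property><name>hive.metastore.warehouse.dir</name><value>/user/hive/warehouse</value></property>
  <property><name>hive.metastore.uris</name><value>thrift://metastore.example.com:9083</value></property>
</configuration>"""

core = read_hadoop_properties(CORE_SITE)
hive = read_hadoop_properties(HIVE_SITE)

warehouse = build_warehouse_path(core["fs.defaultFS"], hive["hive.metastore.warehouse.dir"])
thrift_uri = hive["hive.metastore.uris"]
print(warehouse)
print(thrift_uri)
```

With real files, read the contents of core-site.xml and hive-site.xml from disk and pass them to read_hadoop_properties in place of the sample strings.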
Metadata configuration
Metadata retrieval method
The supported methods are Metastore database and HMS.
Metastore database
Database type: Only the MySQL database type is supported. Supported versions include MySQL 5.1.43, MySQL 5.6/5.7, and MySQL 8.
JDBC URL: Enter the JDBC URL of the metastore database, for example, jdbc:mysql://host:port/dbname.
Username and Password: Enter the username and password for accessing the metastore database.
HMS
Authentication method: The supported methods are No Authentication, LDAP, and Kerberos.
Note: To use the Kerberos method, you must enable the Kerberos option in the Cluster configuration section.
hive-site.xml: Upload the hive-site.xml configuration file. If real-time development is enabled, this configuration file is reused.
Keytab file: If you use the Kerberos method, upload the keytab file.
Principal: If you use the Kerberos method, enter the principal.
Cluster configuration
NameNode
Enter the address of the cluster's NameNode.
To add multiple NameNodes, click + Add.
Configuration files
Upload the cluster's hdfs-site.xml and core-site.xml configuration files.
Kerberos
To access the cluster by using Kerberos, enable this option and configure the following parameters.
Kerberos configuration method: Select the configuration method for the cluster's KDC server. Supported methods are KDC Server and krb5 File Configuration.
KDC Server: If you use this method, enter the address of the KDC server. You can specify multiple addresses separated by semicolons (;).
krb5 File Configuration: If you use this method, upload the krb5 configuration file.
HDFS configuration: Specify the HDFS configuration information for the cluster.
HDFS keytab file: Upload the HDFS keytab file for the cluster.
HDFS principal: Enter the Kerberos principal for the cluster, for example, XXXX/hadoopclient@xxx.xxx.
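The Kerberos values above follow two simple shapes: KDC server addresses separated by semicolons, and principals in the primary/instance@REALM form. The following sketch checks both before you enter them. The function names are hypothetical and this is not how Dataphin parses these fields.

```python
def split_kdc_servers(value):
    """Split a semicolon-separated KDC server string, dropping blank entries."""
    return [part.strip() for part in value.split(";") if part.strip()]


def parse_principal(principal):
    """Split 'primary/instance@REALM' into (primary, instance, realm).

    instance is None when the principal has no '/instance' part.
    """
    name, sep, realm = principal.rpartition("@")
    if not sep or not name or not realm:
        raise ValueError(f"not a principal with a realm: {principal!r}")
    primary, _, instance = name.partition("/")
    return primary, instance or None, realm
```

For example, parse_principal("hdfs/hadoopclient@EXAMPLE.COM") yields the primary hdfs, the instance hadoopclient, and the realm EXAMPLE.COM. A value without a realm raises ValueError.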
Hive configuration
JDBC URL
Enter the JDBC URL for Hive, for example, jdbc:hive2://host:port/dbname.
Username and Password
If you do not use Kerberos to access the cluster, enter the username and password for Hive.
Note: To ensure that tasks can run as expected, make sure the specified user has the required data permissions.
Hive keytab file
If you use Kerberos to access the cluster, upload the Hive keytab file.
Hive principal
If you use Kerberos to access the cluster, enter the Kerberos principal, for example, XXXX/hadoopclient@xxx.xxx.
Configuration file
Upload Hive's hive-site.xml configuration file.
Important: Flink SQL tasks ignore the authentication information in this integration and use the Flink engine's authentication information to access the Hive data source.
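The JDBC URL examples in the table above (jdbc:mysql://host:port/dbname for the metastore database and jdbc:hive2://host:port/dbname for Hive) share one shape, so a quick check can catch typos before you test the connection. The following is a minimal sketch with a hypothetical function name. It handles only the single-host form shown in the examples and ignores any session settings that follow a semicolon.

```python
from urllib.parse import urlsplit


def parse_jdbc_url(jdbc_url):
    """Split 'jdbc:<scheme>://host:port/dbname' into its parts."""
    prefix = "jdbc:"
    if not jdbc_url.startswith(prefix):
        raise ValueError("JDBC URL must start with 'jdbc:'")
    # Drop anything after the first ';' (for example, Hive session settings).
    body = jdbc_url[len(prefix):].split(";", 1)[0]
    parts = urlsplit(body)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"missing scheme or host in {jdbc_url!r}")
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,  # None when no port is given
        "database": parts.path.lstrip("/") or None,
    }
```

For example, parse_jdbc_url("jdbc:hive2://hive.example.com:10000/default") returns the scheme hive2, the host hive.example.com, the port 10000, and the database default.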
Select a default resource group. Dataphin uses this resource group to run tasks related to the data source, including database SQL queries, offline full database migration, and data preview.
Click Test Connection, or click OK to save your changes and create the Paimon data source.
When you click Test Connection, Dataphin tests the connection to the data source. If you click OK directly, Dataphin automatically tests the connectivity to all configured environments. You can create the data source even if a connectivity test fails.