Create a Paimon data source (Deprecated, use Hive-Paimon)


Create a Paimon data source to allow Dataphin to read from or write to Paimon.

Warning

The Paimon data source type is deprecated, and you can no longer create new data sources of this type. You can use a Hive data source (Aliyun EMR 5.x Hive 3.1.x, Paimon data lake format) instead. For more information, see Create a Hive data source. Paimon data sources that you created before version 6.1 continue to function normally.

Permissions

Only users with the super administrator, data source administrator, domain architect, or project administrator role, or a custom global role with the Create Data Source permission, can create data sources.

Limitations

  • You cannot access Paimon data sources using a data source code or as a compute source physical table.

  • Only HDFS storage is supported.

Procedure

  1. On the Dataphin homepage, choose Management Center > Data Source Management from the top navigation bar.

  2. On the Data Sources page, click + New Data Source.

  3. On the New Data Source page, go to the Big Data Storage section and select Paimon.

    If you have recently used Paimon, you can select it from the Recently Used section instead, or enter Paimon in the search box to find it quickly.

  4. On the Create Paimon Data Source page, configure the basic information for the data source.

    Parameter

    Description

    Data source name

    The naming conventions are as follows:

    • Must contain only Chinese characters, letters, digits, underscores (_), and hyphens (-).

    • Must not exceed 64 characters.

    Data source code

    After you configure a data source code, you can reference a table from the data source in a Flink SQL task by using the data_source_code.table_name or data_source_code.schema.table_name format. If you need to automatically access the data source that corresponds to the current environment, use the ${data_source_code}.table or ${data_source_code}.schema.table variable format. For more information, see Develop tables by using Dataphin data sources or .

    Important
    • The data source code cannot be modified after it is configured.

    • You can preview data on the object details page in the asset directory and asset checklist only after the data source code is configured.

    • In Flink SQL, data source codes are currently supported only for MySQL, Hologres, MaxCompute, Oracle, StarRocks, Hive, SelectDB, and GaussDB Data Warehouse Service (DWS) data sources.

    Version

    Select the version of the Paimon data source. Supported versions include Aliyun EMR 3.x Hive 2.3.5, Aliyun EMR 5.x Hive 3.1.x, CDH 6.x Hive 2.1.1, CDP 7.x Hive 3.1.3, and AsiaInfo DP 5.x Hive 3.1.0.

    Data source description

    Enter a brief description of the data source, up to 128 characters.

    Data source configuration

    Select the data source environments to configure:

    • Select Production + Development Data Source if you use separate production and development environments.

    • Select Production Data Source if you do not use separate production and development environments.

    Tag

    You can add tags to classify the data source. For information about how to create a tag, see Manage data source tags.
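
    You can check the documented limits for Data source name and Data source description before you submit the form. The following Python sketch is an unofficial pre-check, and the function names are hypothetical. It assumes that "Chinese characters" means the CJK Unified Ideographs block (U+4E00 to U+9FFF) and that both limits count characters, not bytes. Dataphin's own validation may differ.

```python
import re

# Assumption: "Chinese characters" = CJK Unified Ideographs (U+4E00..U+9FFF).
# Dataphin's server-side rule may accept a different character range.
_NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_-]+")
MAX_NAME_LENGTH = 64          # documented limit for Data source name
MAX_DESCRIPTION_LENGTH = 128  # documented limit for Data source description


def is_valid_data_source_name(name: str) -> bool:
    """Hypothetical pre-check for the documented naming conventions."""
    # len() counts Unicode characters, so each Chinese character counts as 1.
    if not name or len(name) > MAX_NAME_LENGTH:
        return False
    return _NAME_PATTERN.fullmatch(name) is not None


def is_valid_description(description: str) -> bool:
    """Check only the documented 128-character limit."""
    return len(description) <= MAX_DESCRIPTION_LENGTH


if __name__ == "__main__":
    print(is_valid_data_source_name("paimon_prod-01"))  # True
    print(is_valid_data_source_name("paimon prod"))     # False: contains a space
```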

  5. Configure the connection parameters.

    If you selected Production + Development Data Source, configure the connection information for both the production and development environments. If you selected Production Data Source, configure the connection information for the production environment only.

    Note

    Typically, you should configure the production and development data sources as separate instances to isolate the environments and reduce the impact of development activities on production. However, Dataphin also allows you to use the same instance for both environments by entering identical connection parameters.

    Parameter

    Description

    Catalog configuration

    Catalog type

    Only Hive is supported. This parameter cannot be modified.

    Warehouse

    Enter the root storage path for Paimon tables.

    We recommend that you set this path to the value of the fs.defaultFS parameter in core-site.xml followed by the value of the hive.metastore.warehouse.dir parameter in hive-site.xml.

    Note

    Object Storage Service (OSS) is not supported.

    Hive Thrift URI

    Enter the value of the hive.metastore.uris parameter from the hive-site.xml file.
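
    The Warehouse and Hive Thrift URI values usually come directly from the cluster's configuration files. The following Python sketch reads them from the text of core-site.xml and hive-site.xml by using the standard library's xml.etree.ElementTree. It assumes that the warehouse path is fs.defaultFS joined with hive.metastore.warehouse.dir (unless the latter is already a full URI) and that hive.metastore.uris is a comma-separated list of thrift:// URIs. Because only HDFS storage is supported, it rejects non-HDFS paths such as OSS paths. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List
from urllib.parse import urlsplit


def load_properties(xml_text: str) -> Dict[str, str]:
    """Map <name> to <value> for every <property> in a Hadoop-style *-site.xml."""
    props: Dict[str, str] = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name:
            props[name] = (prop.findtext("value") or "").strip()
    return props


def warehouse_path(core_site: str, hive_site: str) -> str:
    """Join fs.defaultFS and hive.metastore.warehouse.dir into the Warehouse value."""
    default_fs = load_properties(core_site).get("fs.defaultFS", "")
    warehouse_dir = load_properties(hive_site).get("hive.metastore.warehouse.dir", "")
    if not warehouse_dir:
        raise ValueError("hive.metastore.warehouse.dir not found in hive-site.xml")
    if "://" in warehouse_dir:
        path = warehouse_dir  # already a fully qualified URI
    elif default_fs:
        path = default_fs.rstrip("/") + "/" + warehouse_dir.lstrip("/")
    else:
        raise ValueError("fs.defaultFS not found in core-site.xml")
    if not path.startswith("hdfs://"):
        # Only HDFS storage is supported; OSS paths such as oss://... are rejected.
        raise ValueError(f"not an HDFS path: {path}")
    return path


def thrift_uris(hive_site: str) -> List[str]:
    """Return the thrift:// URIs listed in hive.metastore.uris."""
    raw = load_properties(hive_site).get("hive.metastore.uris", "")
    uris = [u.strip() for u in raw.split(",") if u.strip()]
    for uri in uris:
        if urlsplit(uri).scheme != "thrift":
            raise ValueError(f"not a thrift:// URI: {uri}")
    return uris
```

    For example, with fs.defaultFS set to hdfs://emr-cluster and hive.metastore.warehouse.dir set to /user/hive/warehouse, the sketch produces hdfs://emr-cluster/user/hive/warehouse.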

    Metadata configuration

    Metadata retrieval method

    The supported methods are Metastore database and HMS.

    • Metastore database

      • Database type: Only the MySQL database type is supported. Supported versions include MySQL 5.1.43, MySQL 5.6/5.7, and MySQL 8.

      • JDBC URL: Enter the JDBC URL of the metastore database, for example, jdbc:mysql://host:port/dbname.

      • Username and Password: Enter the username and password for accessing the metastore database.

    • HMS

      • Authentication method: The supported methods are No Authentication, LDAP, and Kerberos.

        Note

        To use the Kerberos method, you must enable the Kerberos option in the Cluster configuration section.

      • hive-site.xml: Upload the hive-site.xml configuration file. If real-time development is enabled, real-time development reuses this configuration file.

      • Keytab file: If you use the Kerberos method, upload the keytab file.

      • Principal: If you use the Kerberos method, enter the principal.
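
    You can sanity-check a JDBC URL before you save it. The following Python sketch splits a URL of the form jdbc:mysql://host:port/dbname (the same shape also applies to the Hive form jdbc:hive2://host:port/dbname) into host, port, and database by removing the jdbc: prefix and passing the rest to urllib.parse.urlsplit. The default ports 3306 (MySQL) and 10000 (HiveServer2) are assumptions used only when the URL omits a port. The class and function names are hypothetical.

```python
from typing import NamedTuple
from urllib.parse import urlsplit

# Assumed defaults, used only when the URL has no explicit port.
DEFAULT_PORTS = {"mysql": 3306, "hive2": 10000}


class JdbcTarget(NamedTuple):
    subprotocol: str
    host: str
    port: int
    database: str


def parse_jdbc_url(url: str) -> JdbcTarget:
    """Split jdbc:<subprotocol>://host[:port]/dbname into its parts."""
    if not url.startswith("jdbc:"):
        raise ValueError("a JDBC URL must start with 'jdbc:'")
    # e.g. mysql://10.0.0.5:3306/hivemeta?useSSL=false
    parts = urlsplit(url[len("jdbc:"):])
    if parts.scheme not in DEFAULT_PORTS or not parts.hostname:
        raise ValueError(f"unsupported or malformed JDBC URL: {url}")
    # Drop ;key=value session settings that some Hive URLs append to the database.
    database = parts.path.lstrip("/").split(";", 1)[0]
    if not database:
        raise ValueError("the JDBC URL does not name a database")
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return JdbcTarget(parts.scheme, parts.hostname, port, database)
```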

    Cluster configuration

    NameNode

    Enter the address of the cluster's NameNode.

    To add multiple NameNodes, click + Add.

    Configuration files

    Upload the cluster's hdfs-site.xml and core-site.xml configuration files.

    Kerberos

    To access the cluster by using Kerberos, enable this option and configure the following parameters.

    • Kerberos configuration method: Select the configuration method for the cluster's KDC server. Supported methods are KDC Server and krb5 File Configuration.

      • KDC Server: If you use this method, enter the address of the KDC server. You can specify multiple addresses separated by semicolons (;).

      • krb5 File Configuration: If you use this method, upload the krb5 configuration file.

    • HDFS configuration: Specify the HDFS configuration information for the cluster.

      • HDFS keytab file: Upload the HDFS keytab file for the cluster.

      • HDFS principal: Enter the Kerberos principal for the cluster, for example, XXXX/hadoopclient@xxx.xxx.
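
    Several Cluster configuration values can be cross-checked against files and strings you already have. The following Python sketch, in which all function names are hypothetical, lists NameNode RPC addresses from hdfs-site.xml by using the standard HDFS high-availability properties dfs.nameservices, dfs.ha.namenodes.<nameservice>, and dfs.namenode.rpc-address.<nameservice>.<namenode>, falling back to the non-HA dfs.namenode.rpc-address. It also splits a Kerberos principal of the form primary/instance@REALM and the semicolon-separated KDC Server field. This document does not state whether the NameNode field expects host or host:port, so treat the output as a starting point.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, NamedTuple, Optional


def _props(xml_text: str) -> Dict[str, str]:
    """Map <name> to <value> for every <property> in a *-site.xml file."""
    return {
        (p.findtext("name") or "").strip(): (p.findtext("value") or "").strip()
        for p in ET.fromstring(xml_text).iter("property")
    }


def _split(value: str, sep: str = ",") -> List[str]:
    """Split a delimited string and drop empty entries."""
    return [v.strip() for v in value.split(sep) if v.strip()]


def namenode_addresses(hdfs_site: str) -> List[str]:
    """Collect NameNode RPC addresses (host:port) from hdfs-site.xml."""
    props = _props(hdfs_site)
    addresses = []
    for ns in _split(props.get("dfs.nameservices", "")):
        for nn in _split(props.get(f"dfs.ha.namenodes.{ns}", "")):
            addr = props.get(f"dfs.namenode.rpc-address.{ns}.{nn}")
            if addr:
                addresses.append(addr)
    if not addresses and props.get("dfs.namenode.rpc-address"):
        addresses.append(props["dfs.namenode.rpc-address"])  # non-HA cluster
    return addresses


class Principal(NamedTuple):
    primary: str
    instance: Optional[str]
    realm: str


def parse_principal(text: str) -> Principal:
    """Split primary[/instance]@REALM, for example hdfs/hadoopclient@EXAMPLE.COM."""
    name, at, realm = text.strip().rpartition("@")
    primary, slash, instance = name.partition("/")
    if not at or not primary or not realm or (slash and not instance):
        raise ValueError(f"expected primary[/instance]@REALM, got {text!r}")
    return Principal(primary, instance if slash else None, realm)


def kdc_servers(field: str) -> List[str]:
    """Split the KDC Server field, whose addresses are separated by semicolons."""
    return _split(field, ";")
```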

    Hive configuration

    JDBC URL

    Enter the JDBC URL for Hive, for example, jdbc:hive2://host:port/dbname.

    Username and Password

    If you do not use Kerberos to access the cluster, enter the username and password for Hive.

    Note

    To ensure that tasks can run as expected, make sure the specified user has the required data permissions.

    Hive keytab file

    If you use Kerberos to access the cluster, upload the Hive keytab file.

    Hive principal

    If you use Kerberos to access the cluster, enter the Kerberos principal, for example, XXXX/hadoopclient@xxx.xxx.

    Configuration file

    Upload Hive's hive-site.xml configuration file.

    Important

    Flink SQL tasks ignore the authentication information configured here and instead use the authentication information of the Flink engine to access the Hive data source.

  6. Select a default resource group. Dataphin uses this resource group to run tasks related to the data source, including database SQL queries, offline full database migration, and data preview.

  7. Click Test Connection, or click OK to save your changes and create the Paimon data source.

    When you click Test Connection, Dataphin tests the connection to the data source. If you click OK directly, Dataphin automatically tests the connectivity to all configured environments. You can create the data source even if a connectivity test fails.
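
If a connectivity test fails, you can narrow down the cause by checking whether the configured endpoints are reachable at all from a host on the same network as the resource group. The following Python sketch only opens a TCP connection to each host and port with socket.create_connection. It does not authenticate, does not reproduce Dataphin's own connectivity test, and the function name is hypothetical.

```python
import socket
from typing import Dict, Iterable, Tuple

Endpoint = Tuple[str, int]


def check_endpoints(endpoints: Iterable[Endpoint], timeout: float = 3.0) -> Dict[Endpoint, str]:
    """Try a plain TCP connection to each (host, port) and record the outcome."""
    results: Dict[Endpoint, str] = {}
    for host, port in endpoints:
        try:
            # Connection only; no Thrift, JDBC, or Kerberos handshake is attempted.
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = "reachable"
        except OSError as exc:  # covers refused connections, DNS errors, and timeouts
            results[(host, port)] = f"unreachable: {exc}"
    return results
```

For example, you could pass the Hive Metastore, NameNode, and HiveServer2 endpoints from your configuration, such as ports 9083, 8020, and 10000, which are common defaults rather than values this document specifies. A "reachable" result rules out basic network problems, so a remaining failure more likely points to credentials, Kerberos settings, or permissions.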