Create a Kudu data source

Updated: 2026-06-04 18:30:18

Create a Kudu data source to enable Dataphin to read from or write to Kudu.

Background information

Kudu provides a data model similar to that of a relational database (RDBMS) and supports insert, update, and delete operations. Kudu is a columnar storage engine. It relies on external processing engines in the Hadoop ecosystem, such as MapReduce, Spark, and Impala, for computation, and it stores data in the underlying Linux file system.

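Kudu is designed for hybrid transactional/analytical processing (HTAP) scenarios, such as IoT. In these scenarios, separating OLTP and OLAP systems or adopting a Lambda architecture requires data to be replicated between systems, which adds complexity. Kudu serves both workloads from a unified storage layer and avoids this replication overhead. For more information, see the Kudu official website.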

Permissions

Required role: super administrator, administrator, domain architect, project administrator, or a custom global role with the permission to create data sources.

Procedure

  1. In the top navigation bar of the Dataphin homepage, choose Management Center > Datasource Management.

  2. On the Datasource page, click +Create Data Source.

  3. On the Create Data Source page, select Kudu in the Big Data section.

    Alternatively, select Kudu from the Recently Used section or use the search box to filter.

  4. On the Create Kudu Data Source page, configure the connection parameters.

    1. Configure basic information.

      • Datasource Name: The name of the data source. Requirements:

        • Can contain only Chinese characters, letters, digits, underscores (_), and hyphens (-).

        • Can be up to 64 characters in length.

      • Datasource Code: After you configure the data source code, you can reference tables in Flink_SQL tasks in the format data source code.table name or data source code.schema.table name. To automatically use the data source that matches the current environment, use the variable format ${data source code}.table or ${data source code}.schema.table. For more information, see Flink_SQL task development method.

        Important

        • The data source code cannot be modified after it is configured.

        • You can preview data on the object details page of the asset directory and asset checklist only after the data source code is configured.

        • Currently, Flink SQL supports only MySQL, Hologres, MaxCompute, Oracle, StarRocks, Hive, SelectDB, and GaussDB data warehouse service (DWS) data sources.

      • Version: Select the version of your Kudu service. Supported versions:

        • CDH5: 1.16

        • CDH6: 1.16

        • CDP7.1.3: 1.16

      • Data Source Description: A brief description of the data source. Can be up to 128 characters in length.

      • Data Source Configuration: Select the environment scope of the data source:

        • Production + Development Data Source: Separate connections are configured for the production and development environments.

        • Production Data Source: Only a production connection is configured. The production and development environments are not separated.

      • Tag: Tags used to categorize the data source. For more information, see Manage data source tags.
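      The Datasource Name and Data Source Description rules above can be checked before you submit the form. The following Python sketch is a hypothetical client-side check, not Dataphin's own validation. It assumes that "Chinese characters" means the CJK Unified Ideographs block (U+4E00 to U+9FFF), that lengths are counted in Unicode code points, and that an empty name is rejected. The function names are illustrative only.

```python
import re

# Assumption: "Chinese characters" is approximated by the CJK Unified
# Ideographs block (U+4E00 to U+9FFF); Dataphin's exact rule may differ.
# The hyphen at the end of the character class is a literal "-".
_NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_-]+")

MAX_NAME_LENGTH = 64          # Datasource Name limit from the table above
MAX_DESCRIPTION_LENGTH = 128  # Data Source Description limit


def is_valid_datasource_name(name: str) -> bool:
    """Return True if name uses only allowed characters and is 1 to 64 long.

    Lengths are counted in Unicode code points (an assumption). Rejecting
    the empty string is also an assumption: the name is treated as required.
    """
    if not 0 < len(name) <= MAX_NAME_LENGTH:
        return False
    return _NAME_PATTERN.fullmatch(name) is not None


def is_valid_description(description: str) -> bool:
    """Return True if the description is at most 128 code points long."""
    return len(description) <= MAX_DESCRIPTION_LENGTH
```

      For example, is_valid_datasource_name("kudu_prod-01") returns True, while a name that contains a space, such as "kudu prod", returns False.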

    2. Configure connection parameters.

      The connection parameters to configure depend on the Data Source Configuration you selected. If you selected Production + Development Data Source, configure connection parameters for both the production and development environments. If you selected Production Data Source, configure only the production connection.

      Note

      We recommend that you use different data sources for the production and development environments to isolate them. Dataphin also allows the production and development data sources to use the same connection information.

      • Connection Url: The Kudu connection address, in the format ip1:port1,ip2:port2. Separate multiple addresses with commas (,).

      • Kerberos: Kerberos is a symmetric-key authentication protocol used to verify the identities of services.

        • If Kerberos is enabled for Kudu, turn on Kerberos and configure the following parameters:

          • Krb5 File Configuration or KDC Server: Upload a krb5 file that contains the Kerberos realm, or specify the address of the KDC server.

            Note

            Separate multiple KDC Server addresses with commas (,).

          • Keytab File: Upload the Kerberos keytab file.

          • Principal: The Kerberos principal, in the format xxxx/hadoopclient@xxx.xxx.

        • If Kerberos is not enabled for Kudu, leave Kerberos turned off.

      • Configuration File: Upload the Hadoop configuration file.

        Note

        This parameter is available only when Kerberos is enabled.

      • Table Prefix: A prefix used to distinguish tables across environments or systems. For example, if multiple systems share the same Kudu service, you can use Impala as the prefix to distinguish tables created through Impala from the tables of other systems.
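      The Connection Url, KDC Server, and Principal values above follow simple textual formats. The following Python sketch splits a Connection Url into host and port pairs, splits a comma-separated KDC Server value, and breaks a principal of the form xxxx/hadoopclient@xxx.xxx into its parts. It is an illustrative helper with hypothetical names, not part of Dataphin or any Kudu client library. It requires the instance part (the text between "/" and "@") because the documented principal format includes one.

```python
from typing import List, NamedTuple


class HostPort(NamedTuple):
    host: str
    port: int


class Principal(NamedTuple):
    primary: str   # for example, "xxxx"
    instance: str  # for example, "hadoopclient"
    realm: str     # for example, "xxx.xxx"


def parse_connection_url(url: str) -> List[HostPort]:
    """Split an 'ip1:port1,ip2:port2' Connection Url into (host, port) pairs."""
    addresses = []
    for item in url.split(","):
        item = item.strip()
        # rpartition splits on the last ":" so only the port is taken off.
        host, sep, port_text = item.rpartition(":")
        if not sep or not host or not (port_text.isascii() and port_text.isdigit()):
            raise ValueError(f"expected host:port, got {item!r}")
        port = int(port_text)
        if not 0 < port < 65536:
            raise ValueError(f"port out of range in {item!r}")
        addresses.append(HostPort(host, port))
    return addresses


def parse_kdc_servers(value: str) -> List[str]:
    """Split a comma-separated KDC Server value, dropping blank entries."""
    return [server.strip() for server in value.split(",") if server.strip()]


def parse_principal(principal: str) -> Principal:
    """Split a 'primary/instance@REALM' principal into its three parts."""
    name, at, realm = principal.rpartition("@")
    if not at or not name or not realm:
        raise ValueError(f"missing realm in {principal!r}")
    primary, slash, instance = name.partition("/")
    if not slash or not primary or not instance:
        raise ValueError(f"expected primary/instance, got {name!r}")
    return Principal(primary, instance, realm)
```

      For example, parse_connection_url("192.168.0.1:7051,192.168.0.2:7051") returns two HostPort entries, and a value without a port, such as "192.168.0.1", raises ValueError instead of being passed on silently.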

  5. Select a Default Resource Group. This resource group runs tasks related to the data source, such as SQL execution, offline migration, and data preview.

  6. Click Test Connection to test connectivity, or click OK to save the configuration directly.

    Clicking Test Connection checks whether the data source can communicate with Dataphin. Clicking OK automatically tests the connections of all selected clusters, and the data source is saved even if a test fails.
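    Test Connection reports whether Dataphin can reach the data source. As a rough, hypothetical pre-check that you might run from a machine on the same network as the resource group, the following Python sketch opens a plain TCP connection to each address in a Connection Url. It is not how Dataphin tests connections: it does not speak the Kudu RPC protocol or perform Kerberos authentication, so a successful result only shows that the port accepts TCP connections. The function name is illustrative.

```python
import socket
from typing import Dict, List, Tuple

Address = Tuple[str, int]


def check_reachability(addresses: List[Address],
                       timeout: float = 3.0) -> Dict[Address, bool]:
    """Try a plain TCP connection to each (host, port) and record the outcome.

    Each address is checked independently, so the result shows exactly which
    one is unreachable. No Kudu protocol or Kerberos handshake is attempted.
    """
    results: Dict[Address, bool] = {}
    for host, port in addresses:
        try:
            # create_connection resolves the host and connects within timeout.
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = True
        except OSError:
            # Covers refused connections, timeouts, and DNS failures.
            results[(host, port)] = False
    return results
```

    For example, check_reachability([("192.168.0.1", 7051)]) checks one address on port 7051, the usual default Kudu master RPC port. Substitute the addresses from your own Connection Url.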
