Create a Kudu data source

Create a Kudu data source to enable Dataphin to read from or write to Kudu.

Background information

Kudu provides RDBMS-like data models and supports insert, update, and delete operations. As a columnar storage layer, Kudu depends on external Hadoop processing engines (MapReduce, Spark, Impala) and stores data in the underlying Linux file system.

Kudu is designed for HTAP scenarios (such as IoT) where traditional OLTP/OLAP separation or Lambda architectures introduce data replication complexity. Kudu's unified storage eliminates this overhead. For more information, see the Kudu official website.

Permissions

Required role: super administrator, administrator, domain architect, project administrator, or a custom global role with the permission to create data sources.

Procedure

In the top navigation bar of the Dataphin homepage, choose Management Center > Datasource Management.
On the Datasource page, click +Create Data Source.
On the Create Data Source page, select Kudu in the Big Data section.

Alternatively, select Kudu from the Recently Used section or use the search box to filter.

On the Create Kudu Data Source page, configure the connection parameters.

Configure basic information.

Parameter	Description
Datasource Name	Requirements: Can contain only Chinese characters, letters, digits, underscores (_), and hyphens (-). Maximum 64 characters.
Datasource Code	The code used to reference this data source. After you configure the code, you can reference tables in Flink_SQL tasks using the format `data source code.table name` or `data source code.schema.table name`. To auto-match the current environment, use the variable format `${data source code}.table` or `${data source code}.schema.table`. For more information, see Flink_SQL task development method. Important: The data source code cannot be modified after it is configured. You can preview data on the object details page in the asset directory and asset checklist only after the data source code is configured. In Flink SQL, only MySQL, Hologres, MaxCompute, Oracle, StarRocks, Hive, SelectDB, and GaussDB data warehouse service (DWS) data sources are currently supported.
Version	Select your Kudu version. Supported versions: CDH5:1.16, CDH6:1.16, and CDP7.1.3:1.16.
Data Source Description	Maximum 128 characters.
Data Source Configuration	Select the environment scope. Production + Development Data Source: use separate connections for the production and development environments. Production Data Source: use a single connection, with no separation between the production and development environments.
Tag	Categorize data sources with tags. For more information, see Manage data source tags.
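
The naming limits in the table above can be checked before you fill in the form. The following is a minimal sketch, not Dataphin's own validation logic: the function names are hypothetical, "Chinese characters" is assumed to mean the CJK Unified Ideographs block (U+4E00 to U+9FFF), "letters" is assumed to mean ASCII letters, and an empty name is assumed to be invalid.

```python
import re

# Assumption: "Chinese characters" is approximated by U+4E00-U+9FFF and
# "letters" by ASCII letters; Dataphin's real check may differ.
_NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_-]{1,64}")

# Limit taken from the Data Source Description row of the table.
MAX_DESCRIPTION_LENGTH = 128


def is_valid_datasource_name(name: str) -> bool:
    """Return True if name uses only allowed characters and has 1-64 of them."""
    return _NAME_PATTERN.fullmatch(name) is not None


def is_valid_description(description: str) -> bool:
    """Return True if the description is at most 128 characters long."""
    return len(description) <= MAX_DESCRIPTION_LENGTH


if __name__ == "__main__":
    for candidate in ["kudu_prod-01", "kudu prod", "a" * 65]:
        print(repr(candidate[:20]), is_valid_datasource_name(candidate))
```

The check counts characters rather than bytes, which matches the wording of the table but is still an assumption about how the console measures length.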

Configure connection parameters.

Configure connections based on the Data Source Configuration that you selected. A Production + Development data source requires separate connection details for each environment. A Production data source requires only the production connection.

Note

Best practice: use different data sources for production and development to isolate environments. Dataphin also supports using the same connection for both.

Parameter	Description
Connection Url	The Kudu connection address. Format: `ip1:port1,ip2:port2`.
Kerberos	Kerberos is an authentication protocol based on symmetric-key cryptography that verifies the identity of services. If Kudu has Kerberos enabled, turn on Kerberos and configure the following parameters. Krb5 File Configuration or KDC Server: upload a Krb5 file that contains the Kerberos realm, or specify the KDC server address. Note: Separate multiple KDC Server addresses with commas (,). Keytab File: upload the Kerberos keytab file. Principal: the Kerberos principal, in the format `xxxx/hadoopclient@xxx.xxx`. If Kudu does not use Kerberos, leave Kerberos disabled.
Configuration File	Upload the Hadoop configuration file. Note: Available only when Kerberos is enabled.
Table Prefix	A prefix to isolate tables across environments or storage systems. For example, use Impala as the prefix to distinguish Impala-sourced tables from other storage systems sharing the same Kudu service.
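
The Connection Url and Principal formats in the table above are easy to mistype. The following sketch shows one way to check both strings before pasting them into the form. The function names are hypothetical, the host addresses and realm in the usage lines are placeholders, and the checks only cover the formats stated in the table (`ip1:port1,ip2:port2` and `xxxx/hadoopclient@xxx.xxx`), not what Dataphin itself accepts.

```python
from typing import List, Tuple


def parse_master_addresses(url: str) -> List[Tuple[str, int]]:
    """Split 'host1:port1,host2:port2' into (host, port) pairs."""
    pairs = []
    for entry in url.split(","):
        host, sep, port = entry.strip().rpartition(":")
        if not sep or not host or not port.isdecimal():
            raise ValueError(f"expected host:port, got {entry!r}")
        port_number = int(port)
        if not 1 <= port_number <= 65535:
            raise ValueError(f"port out of range in {entry!r}")
        pairs.append((host, port_number))
    return pairs


def split_principal(principal: str) -> Tuple[str, str, str]:
    """Split 'primary/instance@REALM' into its three components."""
    name, at, realm = principal.rpartition("@")
    primary, slash, instance = name.partition("/")
    if not (at and slash and primary and instance and realm):
        raise ValueError(f"expected primary/instance@REALM, got {principal!r}")
    return primary, instance, realm


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(parse_master_addresses("192.0.2.10:7051,192.0.2.11:7051"))
    print(split_principal("kudu/hadoopclient@EXAMPLE.COM"))
```

Surrounding whitespace in each address entry is stripped here for convenience; whether the console tolerates spaces after the commas is not stated, so it is safer to enter the value without them.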

Select a Default Resource Group to run data source tasks such as SQL execution, offline migration, and data preview.
Click Test Connection to verify connectivity, or click OK to save directly.

Clicking Test Connection checks whether Dataphin can connect to the data source. Clicking OK automatically tests the connection for all selected clusters, but saves the data source regardless of the test results.
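
If Test Connection fails, a plain TCP check can help separate network problems from authentication or configuration problems. The sketch below is an assumption-laden aid, not a reproduction of what Dataphin does internally: `is_reachable` is a hypothetical helper, it only confirms that a TCP port accepts connections (it does not speak the Kudu RPC protocol or perform Kerberos authentication), and it is only meaningful when run from a host in the same network as the resource group. The address in the usage lines is a placeholder.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder address; replace with an entry from your Connection Url.
    host, port = "192.0.2.10", 7051
    status = "reachable" if is_reachable(host, port) else "unreachable"
    print(f"{host}:{port} is {status}")
```

A reachable port with a failing Test Connection points toward the Kerberos settings or the uploaded configuration files rather than the network path.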
