Read and Write Kudu Data Through DLA (Data Lake Analytics; documentation no longer maintained)

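The CU edition of DLA can access a self-managed Kudu cluster. You can query data in Kudu with standard SQL statements or write data directly to Kudu. This topic describes how to read and write Kudu data through DLA.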

Prerequisites

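- Kudu can currently be accessed only through the CU edition, so make sure that the CU edition is activated.
- The data source network bound to the virtual cluster must be in the same VPC as the Kudu cluster.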

Preparations

Before you read or write Kudu data through DLA, create a test table in Kudu. For guidance on Kudu table design, see https://kudu.apache.org/docs/schema_design.html.

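Note: DLA currently cannot create new tables in Kudu through SQL. It can only map to tables that already exist. To add a table, first create it in Kudu, and then associate it with DLA by using a DLA CREATE TABLE statement.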

The following code snippet is an example of creating a table in Java:

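    import java.util.ArrayList;
    import java.util.List;

    import org.apache.kudu.ColumnSchema;
    import org.apache.kudu.Schema;
    import org.apache.kudu.Type;
    import org.apache.kudu.client.CreateTableOptions;
    import org.apache.kudu.client.KuduClient;
    import org.apache.kudu.client.KuduException;

    public class CreateUsersTable {
        // Comma-separated addresses of the Kudu masters.
        private static final String KUDU_MASTERS = "master-1:7051,master-2:7051,master-3:7051";

        public static void main(String[] args) throws KuduException {
            String tableName = "users";

            // KuduClient is AutoCloseable, so try-with-resources releases its connections.
            try (KuduClient client = new KuduClient.KuduClientBuilder(KUDU_MASTERS).build()) {
                // Set up a simple schema: an INT32 primary key and two nullable strings.
                List<ColumnSchema> columns = new ArrayList<>(3);
                columns.add(new ColumnSchema.ColumnSchemaBuilder("user_id", Type.INT32)
                        .key(true)
                        .build());
                columns.add(new ColumnSchema.ColumnSchemaBuilder("first_name", Type.STRING)
                        .nullable(true)
                        .build());
                columns.add(new ColumnSchema.ColumnSchemaBuilder("last_name", Type.STRING)
                        .nullable(true)
                        .build());
                Schema schema = new Schema(columns);

                // Set up the partition schema, which distributes rows to different tablets by hash.
                // Kudu also supports partitioning by key range. Hash and range partitioning can be combined.
                // For more information, see https://kudu.apache.org/docs/schema_design.html.
                CreateTableOptions cto = new CreateTableOptions();
                List<String> hashKeys = new ArrayList<>(1);
                hashKeys.add("user_id");
                int numBuckets = 2;
                cto.addHashPartitions(hashKeys, numBuckets);
                cto.setNumReplicas(1);

                // Create the table.
                client.createTable(tableName, schema, cto);
                System.out.println("Created table " + tableName);
            }
        }
    }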

Procedure

Connect to DLA.
Create a database.
```
CREATE DATABASE `kudu_test`
WITH DBPROPERTIES (
    catalog = 'kudu',
    location = 'master-1:7051,master-2:7051,master-3:7051'
);
```
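The parameters are described as follows:
- CATALOG: Set to kudu, which indicates that a Kudu schema is created.
- LOCATION: The addresses of the Kudu masters, separated by commas.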
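The LOCATION value is just the master addresses joined with commas, so the statement can be assembled from a list of masters. The following is a minimal sketch of that idea, not part of DLA itself: the class name `KuduSchemaDdl`, the method `createDatabase`, and the validation patterns are hypothetical and were chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;

final class KuduSchemaDdl {
    private KuduSchemaDdl() {}

    /** Builds the CREATE DATABASE statement shown above from a list of Kudu master addresses. */
    static String createDatabase(String dbName, List<String> masters) {
        // Assumption: restrict names to simple identifiers so the generated SQL stays well-formed.
        if (dbName == null || !dbName.matches("\\w+")) {
            throw new IllegalArgumentException("invalid database name: " + dbName);
        }
        if (masters == null || masters.isEmpty()) {
            throw new IllegalArgumentException("at least one Kudu master address is required");
        }
        for (String master : masters) {
            // Assumption: each address has the form host:port.
            if (master == null || !master.matches("[A-Za-z0-9.-]+:\\d+")) {
                throw new IllegalArgumentException("expected host:port but got: " + master);
            }
        }
        String location = String.join(",", masters);
        return "CREATE DATABASE `" + dbName + "`\n"
                + "WITH DBPROPERTIES (\n"
                + "    catalog = 'kudu',\n"
                + "    location = '" + location + "'\n"
                + ");";
    }

    public static void main(String[] args) {
        List<String> masters = Arrays.asList("master-1:7051", "master-2:7051", "master-3:7051");
        System.out.println(createDatabase("kudu_test", masters));
    }
}
```

Validating the inputs before building the string keeps a malformed address from silently producing a schema that points at the wrong cluster.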

Create a table

CREATE EXTERNAL TABLE users (
  user_id int primary key,
  first_name varchar,
  last_name varchar
);

Important: The table name and the order and types of the columns must match those of the table in Kudu.
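One way to keep the DLA definition aligned with Kudu is to derive it from the Kudu column list in order. The sketch below is hypothetical (the class `ExternalTableDdl` and its methods are not part of any DLA or Kudu API) and only maps the two type pairs that appear in this topic, INT32 to int and STRING to varchar. It rejects any other type rather than guessing a mapping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ExternalTableDdl {
    private ExternalTableDdl() {}

    /** Maps a Kudu type name to a DLA column type; only the pairs used in this topic are covered. */
    static String dlaType(String kuduType) {
        switch (kuduType) {
            case "INT32":
                return "int";
            case "STRING":
                return "varchar";
            default:
                // Other mappings are deliberately not assumed here.
                throw new IllegalArgumentException("no mapping assumed for Kudu type " + kuduType);
        }
    }

    /** Builds a CREATE EXTERNAL TABLE statement, preserving the iteration order of the columns. */
    static String createExternalTable(String table, String keyColumn, Map<String, String> kuduColumns) {
        StringBuilder sb = new StringBuilder("CREATE EXTERNAL TABLE " + table + " (\n");
        int i = 0;
        for (Map.Entry<String, String> column : kuduColumns.entrySet()) {
            sb.append("  ").append(column.getKey()).append(' ').append(dlaType(column.getValue()));
            if (column.getKey().equals(keyColumn)) {
                sb.append(" primary key");
            }
            if (++i < kuduColumns.size()) {
                sb.append(',');
            }
            sb.append('\n');
        }
        return sb.append(");").toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, which mirrors the column order in Kudu.
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("user_id", "INT32");
        columns.put("first_name", "STRING");
        columns.put("last_name", "STRING");
        System.out.println(createExternalTable("users", "user_id", columns));
    }
}
```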

Access data

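Only CU compute resources have network connectivity to Kudu. Every SQL statement that accesses a Kudu table must therefore carry the hint /*+cluster=your-vc-name*/ so that the statement runs in the CU virtual cluster.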

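For example (in this session the database is named kudu_it_db_vc and the virtual cluster is named vc-test; substitute your own names):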

mysql> /*+ cluster=vc-test */ insert into kudu_it_db_vc.users values(1, 'Donald', 'Duck');
+------+
| rows |
+------+
|    1 |
+------+
1 row in set (0.46 sec)

mysql> /*+ cluster=vc-test */ select user_id,first_name,last_name from kudu_it_db_vc.users where user_id = 1;
+---------+------------+-----------+
| user_id | first_name | last_name |
+---------+------------+-----------+
|       1 | Donald     | Duck      |
+---------+------------+-----------+
1 row in set (0.43 sec)
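Because the hint has to be present on every statement, an application may prefer to add it in one place. The following is a minimal sketch under that assumption. The class `ClusterHint`, the method `withClusterHint`, and the name check are hypothetical helpers, not part of DLA or any client library.

```java
final class ClusterHint {
    private ClusterHint() {}

    /** Prepends the cluster hint so that the statement runs on the given virtual cluster. */
    static String withClusterHint(String vcName, String sql) {
        // Assumption: virtual cluster names consist of letters, digits, '-' and '_'.
        // The check also keeps "*/" out of the name, which would end the hint comment early.
        if (vcName == null || !vcName.matches("[A-Za-z0-9_-]+")) {
            throw new IllegalArgumentException("invalid virtual cluster name: " + vcName);
        }
        if (sql == null || sql.trim().isEmpty()) {
            throw new IllegalArgumentException("SQL statement must not be empty");
        }
        return "/*+ cluster=" + vcName + " */ " + sql.trim();
    }

    public static void main(String[] args) {
        String sql = "select user_id, first_name, last_name from kudu_it_db_vc.users where user_id = 1";
        System.out.println(withClusterHint("vc-test", sql));
    }
}
```

The returned string can then be passed to whichever client submits SQL to DLA. Centralizing the prefix avoids a statement accidentally running outside the CU virtual cluster, where Kudu is unreachable.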

For more information about SQL, see the SQL reference.