Access TableStore data through Hive - E-MapReduce (open-source big data platform) - Alibaba Cloud

This topic walks through an example of how an EMR Hive job processes data stored in TableStore.

Prerequisites

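A DataLake cluster is created. For more information, see Create a cluster.
You are logged on to the cluster. For more information, see Log on to a cluster.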

The JAR packages listed in the following table are downloaded and uploaded to the cluster.

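JAR package	How to obtain	Reference download link
emr-tablestore-X.X.X.jar	Download emr-tablestore from the Maven repository.	emr-tablestore-2.2.0.jar
tablestore-Y.Y.Y-jar-with-dependencies.jar	Download tablestore from the Maven repository. Important: you must download the jar-with-dependencies package.	tablestore-5.13.11-jar-with-dependencies.jar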

Limits

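The operations in this topic apply only to DataLake clusters.
The DataLake cluster and the TableStore instance must be in the same region; the cluster accesses TableStore over the internal network of the EMR VPC.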

Procedure

On the EMR master node, create a directory and copy the JAR packages that Hive needs to access TableStore into it.

```
mkdir -p /path/to/tablestore/jars
cp emr-tablestore-2.2.0.jar tablestore-5.13.11-jar-with-dependencies.jar \
  /path/to/tablestore/jars
```
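Before you change the Hive configuration, you may want to confirm that the directory holds both JARs, in particular the jar-with-dependencies build of the TableStore SDK. The following is a minimal sketch; check_tablestore_jars is a hypothetical function, and its file-name patterns assume the naming shown in the prerequisites table.

```shell
# Hypothetical helper (not an EMR tool): return 0 only if DIR contains both
# JARs that Hive needs to access TableStore; report anything missing on stderr.
check_tablestore_jars() {
  local dir="$1" status=0
  # EMR TableStore connector, for example emr-tablestore-2.2.0.jar.
  if ! ls "$dir"/emr-tablestore-*.jar >/dev/null 2>&1; then
    echo "missing emr-tablestore-X.X.X.jar in $dir" >&2
    status=1
  fi
  # TableStore SDK bundled with its dependencies; a plain tablestore-Y.Y.Y.jar does not match.
  if ! ls "$dir"/tablestore-*-jar-with-dependencies.jar >/dev/null 2>&1; then
    echo "missing tablestore-Y.Y.Y-jar-with-dependencies.jar in $dir" >&2
    status=1
  fi
  return "$status"
}
```

For example, check_tablestore_jars /path/to/tablestore/jars && echo "JARs OK" prints the message only when both files are in place.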

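In the EMR console, modify the Hive service configuration as shown in the following table, save the changes, and enable automatic configuration update.
For more information about how to modify configuration items, see Modify configuration items.

Configuration file	Configuration item	Change
hive-env.sh	hive_aux_jars_path	Append ,/path/to/tablestore/jars to the end of the value.
hive-site.xml	hive.aux.jars.path	Append ,/path/to/tablestore/jars to the end of the value.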
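Both configuration items receive the same change: the JAR directory is appended to the existing comma-separated value. As a sketch of what the resulting value should look like, the hypothetical shell function below appends the directory, omits the leading comma when the current value is empty, and leaves the value unchanged if the directory is already listed. The empty-value and duplicate handling are assumptions rather than documented EMR behavior, and the change itself is still made in the EMR console.

```shell
# Hypothetical helper (not an EMR tool): print CURRENT with DIR appended
# as a new comma-separated entry.
append_aux_path() {
  local current="$1" dir="$2"
  case ",$current," in
    *",$dir,"*) printf '%s\n' "$current" ;;      # DIR already listed: keep the value as-is
    ",,")       printf '%s\n' "$dir" ;;          # empty value: avoid a leading comma
    *)          printf '%s\n' "$current,$dir" ;; # otherwise append ",DIR"
  esac
}
```

For example, append_aux_path /opt/apps/extra.jar /path/to/tablestore/jars prints /opt/apps/extra.jar,/path/to/tablestore/jars (the first path is a made-up existing value).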
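Create a data table in the TableStore console. For more information, see Use the Wide Column model in the TableStore console.
In this example, the data table is named pet and its primary key is name.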
Run the following command to open the Hive CLI.
```
hive
```
Note: If you use Beeline instead of the Hive CLI, you must restart the HiveServer2 service first.

Create a Hive table and query its data.

Run the following statement to create the Hive table.

```
CREATE EXTERNAL TABLE pet
  (name STRING, owner STRING, species STRING, sex STRING, birth STRING, death STRING)
  STORED BY 'com.aliyun.openservices.tablestore.hive.TableStoreStorageHandler'
  WITH SERDEPROPERTIES(
    "tablestore.columns.mapping"="name,owner,species,sex,birth,death")
  TBLPROPERTIES (
    "tablestore.endpoint"="https://<instance_name>.<region>.vpc.tablestore.aliyuncs.com",
    "tablestore.access_key_id"="<yourAccessKeyId>",
    "tablestore.access_key_secret"="<yourAccessKeySecret>",
    "tablestore.table.name"="pet");
```

Replace <instance_name> and <region> with the name and region of your TableStore instance, and replace <yourAccessKeyId> and <yourAccessKeySecret> with your AccessKey ID and AccessKey secret.
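The CREATE statement requires the endpoint and AccessKey pair to be typed in as literals. One way to keep credentials out of saved scripts is to render the statement from environment variables, as in the following sketch. The function render_pet_ddl and the variable names TABLESTORE_ENDPOINT, TABLESTORE_ACCESS_KEY_ID, and TABLESTORE_ACCESS_KEY_SECRET are hypothetical. Note that the values still end up in the Hive metastore as table properties.

```shell
# Hypothetical helper: print the CREATE EXTERNAL TABLE statement for pet, taking
# the endpoint and AccessKey pair from environment variables (names are assumptions).
render_pet_ddl() {
  # Fail with an error message if any required variable is unset or empty.
  : "${TABLESTORE_ENDPOINT:?TABLESTORE_ENDPOINT is not set}"
  : "${TABLESTORE_ACCESS_KEY_ID:?TABLESTORE_ACCESS_KEY_ID is not set}"
  : "${TABLESTORE_ACCESS_KEY_SECRET:?TABLESTORE_ACCESS_KEY_SECRET is not set}"
  cat <<EOF
CREATE EXTERNAL TABLE pet
  (name STRING, owner STRING, species STRING, sex STRING, birth STRING, death STRING)
  STORED BY 'com.aliyun.openservices.tablestore.hive.TableStoreStorageHandler'
  WITH SERDEPROPERTIES(
    "tablestore.columns.mapping"="name,owner,species,sex,birth,death")
  TBLPROPERTIES (
    "tablestore.endpoint"="${TABLESTORE_ENDPOINT}",
    "tablestore.access_key_id"="${TABLESTORE_ACCESS_KEY_ID}",
    "tablestore.access_key_secret"="${TABLESTORE_ACCESS_KEY_SECRET}",
    "tablestore.table.name"="pet");
EOF
}
```

With the three variables exported, hive -e "$(render_pet_ddl)" would run the rendered statement in the Hive CLI.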

Run the following statements to insert data into the table.

```
INSERT INTO pet VALUES("Fluffy", "Harold", "cat", "f", "1993-02-04", null);
INSERT INTO pet VALUES("Claws", "Gwen", "cat", "m", "1994-03-17", null);
INSERT INTO pet VALUES("Buffy", "Harold", "dog", "f", "1989-05-13", null);
INSERT INTO pet VALUES("Fang", "Benny", "dog", "m", "1990-08-27", null);
INSERT INTO pet VALUES("Bowser", "Diane", "dog", "m", "1979-08-31", "1995-07-29");
INSERT INTO pet VALUES("Chirpy", "Gwen", "bird", "f", "1998-09-11", null);
INSERT INTO pet VALUES("Whistler", "Gwen", "bird", null, "1997-12-09", null);
INSERT INTO pet VALUES("Slim", "Benny", "snake", "m", "1996-04-29", null);
INSERT INTO pet VALUES("Puffball", "Diane", "hamster", "f", "1999-03-30", null);
```
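Each single-row INSERT ... VALUES statement runs as its own Hive job, so loading many rows this way is slow. Hive 0.14 and later also accept several rows in one VALUES clause. The hypothetical helper below turns tab-separated rows (with the literal NULL standing for a missing value) into one multi-row INSERT statement. It is only a sketch for simple data such as the pet rows: it does not escape double quotes inside values, and multi-row inserts through the TableStore storage handler are assumed, not confirmed by this topic.

```shell
# Hypothetical helper: read tab-separated rows on stdin and print a single
# multi-row INSERT statement for TABLE. The token NULL becomes a SQL null;
# every other field is emitted as a double-quoted string (no escaping).
build_insert() {
  local table="$1"
  awk -F'\t' -v table="$table" '
    {
      row = ""
      for (i = 1; i <= NF; i++) {
        v = ($i == "NULL") ? "null" : ("\"" $i "\"")
        row = row (i > 1 ? ", " : "") v
      }
      rows[NR] = "(" row ")"
    }
    END {
      if (NR == 0) exit  # no input rows: print nothing
      printf "INSERT INTO TABLE %s VALUES\n", table
      for (i = 1; i <= NR; i++)
        printf "  %s%s\n", rows[i], (i < NR ? "," : ";")
    }'
}
```

For example, build_insert pet < pets.tsv > insert_pet.sql followed by hive -f insert_pet.sql would load a hypothetical tab-separated file pets.tsv in one job instead of nine.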

Run the following statement to query the data.

```
SELECT * FROM pet;
```

The following output is returned:

```
OK
Bowser	Diane	dog	m	1979-08-31	1995-07-29
Buffy	Harold	dog	f	1989-05-13	NULL
Chirpy	Gwen	bird	f	1998-09-11	NULL
Claws	Gwen	cat	m	1994-03-17	NULL
Fang	Benny	dog	m	1990-08-27	NULL
Fluffy	Harold	cat	f	1993-02-04	NULL
Puffball	Diane	hamster	f	1999-03-30	NULL
Slim	Benny	snake	m	1996-04-29	NULL
Whistler	Gwen	bird	NULL	1997-12-09	NULL
Time taken: 1.731 seconds, Fetched: 9 row(s)
```
