本文介绍如何使用Impala或Presto查询JindoFS上的数据。

JindoFS配置

以EMR-3.35版本为例,创建名为emr-jfs的命名空间,相关配置参数示例如下:
  • jfs.namespaces=emr-jfs
  • jfs.namespaces.emr-jfs.oss.uri=oss://oss-bucket/oss-dir
  • jfs.namespaces.emr-jfs.mode=block

使用JindoFS

目前, 在E-MapReduce 3.22.0及以上版本,Impala或Presto支持Hive元数据的读取,对于存储在JindoFS的Hive数据表,E-MapReduce Impala/Presto可以直接读取。

同样,也可以将建表语句的Location设置为JindoFS路径,即可实现表数据落在JindoFS上。

例如,原有HDFS上的建表语句如下:

Create external table lineitem (L_ORDERKEY INT, L_PARTKEY INT, L_SUPPKEY INT, L_LINENUMBER INT, L_QUANTITY DOUBLE, L_EXTENDEDPRICE DOUBLE, L_DISCOUNT DOUBLE, L_TAX DOUBLE, L_RETURNFLAG STRING, L_LINESTATUS STRING, L_SHIPDATE STRING, L_COMMITDATE STRING, L_RECEIPTDATE STRING, L_SHIPINSTRUCT STRING, L_SHIPMODE STRING, L_COMMENT STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'hdfs:///tpch_impala/lineitem';

相应的改成如下命令即可:

Create external table lineitem (L_ORDERKEY INT, L_PARTKEY INT, L_SUPPKEY INT, L_LINENUMBER INT, L_QUANTITY DOUBLE, L_EXTENDEDPRICE DOUBLE, L_DISCOUNT DOUBLE, L_TAX DOUBLE, L_RETURNFLAG STRING, L_LINESTATUS STRING, L_SHIPDATE STRING, L_COMMITDATE STRING, L_RECEIPTDATE STRING, L_SHIPINSTRUCT STRING, L_SHIPMODE STRING, L_COMMENT STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'jfs://emr-jfs/tpch_impala/lineitem';