Dataphin with a CDH compute source: code tasks fail with "java.lang.IllegalArgumentException: Bucket ID out of range: -1"

Problem description

When the Dataphin compute source is CDH, code tasks and ad hoc SQL queries fail with "java.lang.IllegalArgumentException: Bucket ID out of range: -1". The complete exception is as follows:

2022-03-24 16:05:08.652 Task failed : java.sql.SQLException: Error while compiling statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1647589880332_0092_1_00, diagnostics=[Task failed, taskId=task_1647589880332_0092_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1647589880332_0092_1_00_000000_0:java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: java.lang.IllegalArgumentException: Bucket ID out of range: -1
 at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
 at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
 at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
 at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
 at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
 at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
 at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
 at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
 at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
 at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
 at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)

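Cause

The customer's compute engine is Hive 3.1, which treats tables as bucketed tables by default. If a table is not bucketed, its bucket count is -1, so the task fails at execution time with the out-of-range bucket ID shown above.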
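
To check whether a given table is affected, its metadata can be inspected before running the task. The following is a minimal sketch, assuming a placeholder table name `demo_db.demo_orders` that does not come from the original case. In the output of `DESCRIBE FORMATTED`, the `Num Buckets` and `Bucket Columns` fields indicate whether the table was created with buckets.

```sql
-- demo_db.demo_orders is a hypothetical table name; replace it with the table
-- that the failing task reads.
-- Look at the "Num Buckets" and "Bucket Columns" fields in the storage
-- information section of the output.
DESCRIBE FORMATTED demo_db.demo_orders;
```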

Solution

Add the following settings at the very beginning of the ad hoc SQL or code task, then run it again:
set hive.tez.bucket.pruning=true;
set hive.fetch.task.conversion=none;  
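
As an illustration of where the settings go, the sketch below places them ahead of a query within one ad hoc SQL submission. The table name, column names, and filter value are hypothetical placeholders, not taken from the original case.

```sql
-- Session-level settings from this article; they must come before the query.
set hive.tez.bucket.pruning=true;
set hive.fetch.task.conversion=none;

-- Hypothetical query: demo_db.demo_orders, its columns, and the ds value
-- are placeholders for the statement that originally failed.
SELECT order_id, amount
FROM demo_db.demo_orders
WHERE ds = '20220324'
LIMIT 10;
```

Because `set` statements apply to the current session only, they need to be included in every task or query that hits the error.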


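If setting hive.tez.bucket.pruning and hive.fetch.task.conversion does not resolve the error, also add the following parameters: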
set hive.mapred.mode=nonstrict;
set hive.optimize.ppd=true;
set hive.optimize.index.filter=true;
set hive.explain.user=false;
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

Applies to

  • Dataphin