
A Dataphin pipeline task syncing Hive data to an HBase database fails with "java.lang.IllegalArgumentException: KeyValue size too large"


Problem description

A Dataphin pipeline task that syncs Hive data to an HBase database fails at runtime with "java.lang.IllegalArgumentException: KeyValue size too large".

The relevant log output is as follows:

2021-12-22 14:39:38.179 [0-0-99-reader] INFO  ReaderImpl - Reading ORC rows from hdfs://X.X.X.X:XX/user/hive/warehouse/yx_ads_dev.db/rtp_yx_inverted_id/part-00048-a2fff144-91ba-41e8-9399-d0db13136bc8-c000.snappy.orc with {include: null, offset: 0, length: 9223372036854775807}
2021-12-22 14:39:38.844 [0-1-1-writer] ERROR DlinkTaskPluginCollector - 
java.lang.IllegalArgumentException: KeyValue size too large
 at org.apache.hadoop.hbase.client.HTable.validatePut(HTable.java:1521) ~[hbase11xwriter-0.0.1-SNAPSHOT.jar:na]
 at org.apache.hadoop.hbase.client.BufferedMutatorImpl.validatePut(BufferedMutatorImpl.java:147) ~[hbase11xwriter-0.0.1-SNAPSHOT.jar:na]
 at org.apache.hadoop.hbase.client.BufferedMutatorImpl.doMutate(BufferedMutatorImpl.java:134) ~[hbase11xwriter-0.0.1-SNAPSHOT.jar:na]
 at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:98) ~[hbase11xwriter-0.0.1-SNAPSHOT.jar:na]
 at com.alibaba.datax.plugin.writer.hbase11xwriter.HbaseAbstractTask.startWriter(HbaseAbstractTask.java:60) ~[hbase11xwriter-0.0.1-SNAPSHOT.jar:na]
 at com.alibaba.datax.plugin.writer.hbase11xwriter.Hbase11xWriter$Task.startWrite(Hbase11xWriter.java:75) [hbase11xwriter-0.0.1-SNAPSHOT.jar:na]
 at com.alibaba.dt.dlink.core.trans.WriterRunner.run(WriterRunner.java:50) [dlink-engine-0.0.1-SNAPSHOT.jar:na]
 at java.lang.Thread.run(Thread.java:882) [na:1.8.0_152]
2021-12-22 14:39:39.097 [trans metric report timer] INFO  KettleMetricCollector - Total 11762119 records, 883932715 bytes | Speed 22.33MB/s, 191415 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 162.895s |  All Task WaitReaderTime 175.932s | Percentage 95.00%
2021-12-22 14:39:39.140 [0-2-98-reader] INFO  Reader$Task - end read source files...
2021/12/22 14:39:39 - Hive_1.2 - 完成处理 (I=4038806, O=0, R=0, W=4038806, U=0, E=0)
2021-12-22 14:39:39.197 [DlinkTrans - Hive_1] INFO  DlinkLogbackListener - Hive_1 - 完成处理 (I=4038806, O=0, R=0, W=4038806, U=0, E=0)
2021/12/22 14:39:39 - HBase_1.2 - 完成处理 (I=0, O=4038806, R=4038806, W=0, U=0, E=0)
2021-12-22 14:39:39.497 [DlinkTrans - HBase_1] INFO  DlinkLogbackListener - HBase_1 - 完成处理 (I=0, O=4038806, R=4038806, W=0, U=0, E=0)
2021-12-22 14:39:40.625 [0-1-1-writer] ERROR DlinkTaskPluginCollector - 脏数据: 
{"exception":"KeyValue size too large","record":[{"byteSize":15,"index":0,"rawData":"segment_id#6001","type":"STRING"},{"byteSize":107889697,"index":1,"rawData":"[

Cause

A field being written to HBase exceeds the HBase client's maxKeyValueSize limit. In the dirty-data record above, the failing field is 107,889,697 bytes (about 103 MB), well above the default value of hbase.client.keyvalue.maxsize, which is 10485760 bytes (10 MB).

Solution

Because the error is caused by a field exceeding the client-side maxKeyValueSize limit, contact your HBase administrators to edit the HBase client's hbase-site.xml file and increase the value of hbase.client.keyvalue.maxsize.
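As a sketch of that change (the 100 MB value is an assumption sized to the ~103 MB field in the dirty-data record above; size it to your own data), the client-side hbase-site.xml entry might look like:

```xml
<!-- hbase-site.xml on the HBase client. -->
<!-- hbase.client.keyvalue.maxsize caps the size of a single KeyValue; -->
<!-- the default is 10485760 bytes (10 MB). Raise it above the largest -->
<!-- field you write; a value of 0 or less disables the check entirely, -->
<!-- which is risky because oversized cells can destabilize region servers. -->
<property>
  <name>hbase.client.keyvalue.maxsize</name>
  <!-- 104857600 bytes = 100 MB; assumed value for illustration only. -->
  <value>104857600</value>
</property>
```

After saving the file, rerun the pipeline task so the new client configuration is picked up.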

If modifying the client configuration does not take effect, further update the value of hbase.client.keyvalue.maxsize in the server-side hbase-default.xml file.

Applies to

  • Dataphin