Dataphin pipeline task syncing Hive data to an HBase database fails with "java.lang.IllegalArgumentException: KeyValue size too large"
Problem description
When a Dataphin pipeline task synchronizes Hive data to an HBase database, the run fails with "java.lang.IllegalArgumentException: KeyValue size too large".
The relevant log output is shown below:
2021-12-22 14:39:38.179 [0-0-99-reader] INFO ReaderImpl - Reading ORC rows from hdfs://X.X.X.X:XX/user/hive/warehouse/yx_ads_dev.db/rtp_yx_inverted_id/part-00048-a2fff144-91ba-41e8-9399-d0db13136bc8-c000.snappy.orc with {include: null, offset: 0, length: 9223372036854775807}
2021-12-22 14:39:38.844 [0-1-1-writer] ERROR DlinkTaskPluginCollector -
java.lang.IllegalArgumentException: KeyValue size too large
at org.apache.hadoop.hbase.client.HTable.validatePut(HTable.java:1521) ~[hbase11xwriter-0.0.1-SNAPSHOT.jar:na]
at org.apache.hadoop.hbase.client.BufferedMutatorImpl.validatePut(BufferedMutatorImpl.java:147) ~[hbase11xwriter-0.0.1-SNAPSHOT.jar:na]
at org.apache.hadoop.hbase.client.BufferedMutatorImpl.doMutate(BufferedMutatorImpl.java:134) ~[hbase11xwriter-0.0.1-SNAPSHOT.jar:na]
at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:98) ~[hbase11xwriter-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.plugin.writer.hbase11xwriter.HbaseAbstractTask.startWriter(HbaseAbstractTask.java:60) ~[hbase11xwriter-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.plugin.writer.hbase11xwriter.Hbase11xWriter$Task.startWrite(Hbase11xWriter.java:75) [hbase11xwriter-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.dt.dlink.core.trans.WriterRunner.run(WriterRunner.java:50) [dlink-engine-0.0.1-SNAPSHOT.jar:na]
at java.lang.Thread.run(Thread.java:882) [na:1.8.0_152]
2021-12-22 14:39:39.097 [trans metric report timer] INFO KettleMetricCollector - Total 11762119 records, 883932715 bytes | Speed 22.33MB/s, 191415 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 162.895s | All Task WaitReaderTime 175.932s | Percentage 95.00%
2021-12-22 14:39:39.140 [0-2-98-reader] INFO Reader$Task - end read source files...
2021/12/22 14:39:39 - Hive_1.2 - 完成处理 (I=4038806, O=0, R=0, W=4038806, U=0, E=0)
2021-12-22 14:39:39.197 [DlinkTrans - Hive_1] INFO DlinkLogbackListener - Hive_1 - 完成处理 (I=4038806, O=0, R=0, W=4038806, U=0, E=0)
2021/12/22 14:39:39 - HBase_1.2 - 完成处理 (I=0, O=4038806, R=4038806, W=0, U=0, E=0)
2021-12-22 14:39:39.497 [DlinkTrans - HBase_1] INFO DlinkLogbackListener - HBase_1 - 完成处理 (I=0, O=4038806, R=4038806, W=0, U=0, E=0)
2021-12-22 14:39:40.625 [0-1-1-writer] ERROR DlinkTaskPluginCollector - 脏数据:
{"exception":"KeyValue size too large","record":[{"byteSize":15,"index":0,"rawData":"segment_id#6001","type":"STRING"},{"byteSize":107889697,"index":1,"rawData":"[
Root cause
A field being written to HBase exceeds the HBase client's maxKeyValueSize limit. The dirty-data record at the end of the log shows the offending field is 107889697 bytes (about 103 MB), far above the stock hbase.client.keyvalue.maxsize default of 10485760 bytes (10 MB).
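Before re-running the pipeline, it can help to locate which records carry oversized values. The following is a minimal, hypothetical pre-check sketch (not part of Dataphin); the 10 MB constant is HBase's stock `hbase.client.maxsize` default, and the sample row mirrors the sizes seen in the dirty-data log:

```python
# Minimal sketch: flag record fields whose UTF-8 byte size would exceed
# HBase's default client-side limit (hbase.client.keyvalue.maxsize = 10 MB).
MAX_KEYVALUE_SIZE = 10 * 1024 * 1024  # 10485760 bytes, the stock default

def oversized_fields(record):
    """Return (field_name, byte_size) pairs that exceed the limit."""
    return [
        (name, len(str(value).encode("utf-8")))
        for name, value in record.items()
        if len(str(value).encode("utf-8")) > MAX_KEYVALUE_SIZE
    ]

# Illustrative row modeled on the dirty-data log entry: a small key field
# plus a ~103 MB payload field (107889697 bytes, as reported in the log).
row = {"segment_id": "6001", "payload": "x" * 107_889_697}
print(oversized_fields(row))  # → [('payload', 107889697)]
```

A check like this, run over a sample of the Hive source table, quickly tells you whether raising the limit is enough or whether the column itself needs to be split or trimmed upstream.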
Solution
The error occurs because a field written to HBase exceeds the maxKeyValueSize limit. Ask your HBase operations team to increase the hbase.client.keyvalue.maxsize value in the client-side hbase-site.xml file.
If changing the client-side configuration does not take effect, further increase the hbase.client.keyvalue.maxsize value in the server-side hbase-default.xml file.
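A minimal sketch of the client-side change follows. The 134217728 (128 MB) value is illustrative, chosen only because it exceeds the ~103 MB field reported in the dirty-data log; tune it to your actual data, and note that in HBase a value of 0 or less disables the client-side check entirely:

```xml
<!-- hbase-site.xml (HBase client): raise the per-KeyValue size limit.
     128 MB is an illustrative value exceeding the ~103 MB failing field. -->
<property>
  <name>hbase.client.keyvalue.maxsize</name>
  <value>134217728</value>
</property>
```

After changing the file, restart or resubmit the pipeline task so the HBase writer picks up the new client configuration.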
Applies to
- Dataphin