Dataphin: data integration into Hive fails with "org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout"

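Product name

Dataphin

Product module

Data integration, Data sources, Data synchronization

Overview

This article describes how to troubleshoot exceptions like "org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/IP:Port]" that are reported while data integration or data synchronization tasks are running.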

Problem description

A customer's task that integrates data from an Oracle database into Hive failed at runtime with "org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/IP:Port]".

The full exception stack trace is as follows:

2021-07-01 13:38:39.710 [trans metric report timer] INFO  KettleMetricCollector - Total 52872 records, 2603684 bytes | Speed 0B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.434s |  All Task WaitReaderTime 0.692s | Percentage 100.00%
2021-07-01 13:38:39.891 [Thread-17] INFO  DFSClient - Exception in createBlockOutputStream
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/IP:Port]
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534) ~[hadoop-common-2.7.1.jar:na]
 at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1508) ~[hadoop-hdfs-2.7.1.jar:na]
 at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1284) [hadoop-hdfs-2.7.1.jar:na]
 at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1237) [hadoop-hdfs-2.7.1.jar:na]
 at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449) [hadoop-hdfs-2.7.1.jar:na]
2021-07-01 13:38:39.892 [Thread-17] INFO  DFSClient - Abandoning BP-1125737243-10.168.0.11-1622274785060:blk_1073806855_66031
2021-07-01 13:38:39.905 [Thread-17] INFO  DFSClient - Excluding datanode DatanodeInfoWithStorage[IP:Port,DS-eeacc440-0699-4793-afb9-9ef1a6003563,DISK]
2021-07-01 13:38:39.906 [Thread-17] WARN  DFSClient - DataStreamer Exception
java.io.IOException: Unable to create new block.
 at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1250) ~[hadoop-hdfs-2.7.1.jar:na]
 at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449) ~[hadoop-hdfs-2.7.1.jar:na]
2021-07-01 13:38:39.906 [Thread-17] WARN  DFSClient - Could not get block locations. Source file "/user/hive/warehouse/tranning_dev.db/ods_ext_yield_curve__c2383859_4197_46f5_b472_796337b76727/ods_ext_yield_curve" - Aborting...
2021-07-01 13:38:39.907 [0-0-0-writer] ERROR HdfsWriter$Job - 写文件文件[hdfs://10.168.0.27:8020/user/hive/warehouse/tranning_dev.db/ods_ext_yield_curve__c2383859_4197_46f5_b472_796337b76727/ods_ext_yield_curve]时发生IO异常,请检查您的网络是否正常!
2021-07-01 13:38:39.907 [0-0-0-writer] INFO  BaseDfsUtil - start delete tmp dir [hdfs://10.168.0.27:8020/user/hive/warehouse/tranning_dev.db/ods_ext_yield_curve__c2383859_4197_46f5_b472_796337b76727] .
2021-07-01 13:38:39.915 [0-0-0-writer] INFO  BaseDfsUtil - finish delete tmp dir [hdfs://10.168.0.27:8020/user/hive/warehouse/tranning_dev.db/ods_ext_yield_curve__c2383859_4197_46f5_b472_796337b76727] .
2021-07-01 13:38:39.916 [0-0-0-writer] ERROR WriterRunner - Writer Runner Received Exceptions:
com.alibaba.dt.pipeline.plugin.center.exception.DataXException: Code:[HdfsWriter-04], Description:[您配置的文件在写入时出现IO异常.]. - org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/IP:Port]
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534)
 at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1508)
 at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1284)
 at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1237)
 at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
 - org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/IP:Port]
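When the stack trace is long, you can extract the endpoints that the HDFS client failed to reach with a script instead of reading the trace by hand. The following is a minimal Python sketch, not a Dataphin feature. It assumes the raw log text is available as a string. The regular expressions target the two line shapes in the log above: the "remote=/HOST:PORT" part of the ConnectTimeoutException, and "Excluding datanode DatanodeInfoWithStorage[HOST:PORT,...]". The function name unreachable_endpoints is hypothetical. The log above masks the address as IP:Port, so the patterns only match an unmasked log.

```python
import re
from typing import List, Tuple

# "SocketChannel[connection-pending remote=/10.0.0.5:50010]" or "remote=host/10.0.0.5:50010]";
# an optional "hostname/" prefix is skipped and the address after it is captured.
REMOTE_RE = re.compile(r"connection-pending remote=(?:[^/\s\]]*/)?([^\]:\s]+):(\d+)\]")
# "Excluding datanode DatanodeInfoWithStorage[10.0.0.5:50010,DS-...,DISK]"
EXCLUDED_RE = re.compile(r"Excluding datanode DatanodeInfoWithStorage\[([^\],:\s]+):(\d+),")


def unreachable_endpoints(log_text: str) -> List[Tuple[str, int]]:
    """Return unique (host, port) pairs from timeout/exclusion lines, in order of first appearance."""
    matches = [m for pattern in (REMOTE_RE, EXCLUDED_RE) for m in pattern.finditer(log_text)]
    matches.sort(key=lambda m: m.start())  # restore log order across both patterns
    result: List[Tuple[str, int]] = []
    for m in matches:
        endpoint = (m.group(1), int(m.group(2)))
        if endpoint not in result:
            result.append(endpoint)
    return result
```

Each (host, port) pair it returns is a candidate for the telnet check described in the solution.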

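Cause

A port on one node of the output data source (Hive) was not open, so that node could not be reached over the network. The log is consistent with this. The writer could still reach the NameNode at 10.168.0.27:8020, because it deleted its temporary directory successfully. However, the connection to a DataNode timed out while the block write pipeline was being set up. The client then excluded that DataNode and aborted the write.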

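Solution

1) Based on the exception "org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/IP:Port]" alone, the first suspicion was high concurrency: all of the Hive service's connection threads might be occupied, causing the timeout. In that case, advise the customer to lower the pipeline task concurrency and rerun the task to see whether the problem eases.

In this customer's case, the global concurrent scheduling setting was confirmed to be only 3, which is not high, so this factor was ruled out.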

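2) Check whether the data source connection test in the Dataphin console succeeds. If the connection test passes but the pipeline task still fails at runtime, check the following:

  1. For prod/dev mode projects on the public cloud, check that the sandbox whitelist includes the IP addresses and ports of the corresponding data source nodes. Dedicated deployments and Basic mode projects have no sandbox whitelist.
  2. If the database is an Alibaba Cloud database in a VPC network, also check that the corresponding Dataphin IP addresses have been added to the database whitelist.
  3. If items 1 and 2 are fine but the pipeline or sync task still reports a database connection timeout or cannot connect to the database, find the database IP address, port, or domain name in the stack trace. Then create a Shell task on the Dataphin platform and use telnet (for example, telnet IP Port) to check whether it connects. In this case, port XXX on one of the customer's Hive nodes was not open, and telnet reported a connection timeout. The problem was resolved after the port was opened.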
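If telnet is not available where the check runs, a plain TCP connect gives the same reachable/unreachable answer. Below is a minimal Python sketch. The function names check_tcp and check_all are hypothetical. The 5-second timeout is an arbitrary choice; the HDFS client in the log waited 60 seconds. For a Hive target, the sketch assumes you test two kinds of ports: the NameNode RPC port (8020 in the log above) and the data-transfer port of every DataNode. Both matter because the HDFS client writes blocks directly to the DataNodes. In Hadoop 2.x the DataNode port is set by dfs.datanode.address and defaults to 50010, but your cluster may use a different value.

```python
import socket
from typing import Iterable, List, Tuple


def check_tcp(host: str, port: int, timeout_s: float = 5.0) -> Tuple[bool, str]:
    """Attempt a plain TCP connect, similar to `telnet host port`; return (reachable, detail)."""
    try:
        # create_connection resolves the host and connects; the socket is closed on exit.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True, "connected"
    except socket.timeout:
        # No answer at all: often a port silently dropped by a firewall or security group.
        return False, f"timed out after {timeout_s}s"
    except OSError as exc:
        # e.g. ConnectionRefusedError (nothing listening) or a DNS resolution failure.
        return False, f"{type(exc).__name__}: {exc}"


def check_all(endpoints: Iterable[Tuple[str, int]],
              timeout_s: float = 5.0) -> List[Tuple[str, int, bool, str]]:
    """Check each (host, port) and return (host, port, reachable, detail) tuples."""
    return [(host, port, *check_tcp(host, port, timeout_s)) for host, port in endpoints]
```

For example, check_all([("10.168.0.27", 8020), ("<DataNode IP>", 50010)]) returns one (host, port, reachable, detail) tuple per endpoint; "<DataNode IP>" is a placeholder. If the NameNode connects but a DataNode times out, the result matches the pattern in this case.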

More information

Related documentation

Add a sandbox whitelist: https://help.aliyun.com/document_detail/112032.htm

Specify the authorized IP whitelist when configuring a data source: https://help.aliyun.com/document_detail/112037.html