This topic describes the error messages that can appear when a task fails to start, and how to resolve them.
Failed to load the main class
Error message
错误:找不到或无法加载主类 com.alibaba.proxima.CentauriRunner.
OK OK OK OK
错误:找不到或无法加载主类 com.alibaba.proxima.CentauriRunner
FAILED: Run job failed.
REPORT: https://dm.guide/report/Run job fail?data-dm-guide-action=4&data-dm-guide-extra-msg=ID:6192c00f-dc6f-46c5-9a13-93f1722920a4
2021-06-18 16:28:38 INFO ============================================================
2021-06-18 16:28:38 INFO Exit code of the Shell command 1
2021-06-18 16:28:38 INFO --- Invocation of Shell command completed ---
2021-06-18 16:28:38 ERROR Shell run failed!
2021-06-18 16:28:38 ERROR Current task status: ERROR
2021-06-18 16:28:38 INFO Cost time is: 1.25s
/home/admin/alisatasknode/taskinfo//20210618/dide/16/28/32/95u0koh57ra79t6aft71l38t/T3_1231618871.log-END-EOF
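Solution
This error occurs mainly because MaxCompute cannot load the executable JAR package of Proxima CE. To get help, join the MaxCompute developer community DingTalk group (group ID: 11782920), either through the application link or by searching for the group ID, and contact the MaxCompute technical support team.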
Incorrect separator specified
Error message
FAILED: ODPS-0123131:User defined function exception - Traceback:
ProximaCEException(code=20003, msg=参数校验异常, detailMsg=数据向量维度[=1]和config配置的向量维度[=128]不一致,)
    at com.alibaba.proxima.utils.VectorConvert.convert(VectorConvert.java:17)
    at com.alibaba.proxima.mr.BuildMapper.map(BuildMapper.java:58)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at com.aliyun.odps.mapred.bridge.utils.MapReduceUtils.runMapper(MapReduceUtils.java:120)
    at com.aliyun.odps.mapred.bridge.LotMapperUDTF.run(LotMapperUDTF.java:807)
    at com.aliyun.odps.udf.impl.batch.BatchStandaloneUDTFEvaluator.run(BatchStandaloneUDTFEvaluator.java:53)
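Solution
The exception message reports a parameter validation error: the data vector dimension (1) does not match the vector dimension set in the config (128). Use the -vector_separator command-line parameter to specify the correct separator. The default separator is a tilde (~). For more information, see Optional parameters.
Note: Do not enclose the separator in single or double quotation marks. Pass the character itself. For example, ',' is interpreted as the string ',' and not as the separator ,.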
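The following minimal Python sketch illustrates why a mismatched separator plausibly yields a reported dimension of 1. It is not Proxima CE code: the helper `count_dimension` and the sample vector are hypothetical, and the sketch assumes that the vector column is split on the separator as a literal string.

```python
def count_dimension(raw_vector: str, separator: str = "~") -> int:
    """Return how many elements a serialized vector splits into.

    Hypothetical helper for illustration only; not part of Proxima CE.
    """
    return len(raw_vector.split(separator))


# Hypothetical 4-dimensional vector stored with commas between elements.
raw = "0.1,0.2,0.3,0.4"

print(count_dimension(raw))          # default "~" never matches -> 1
print(count_dimension(raw, ","))     # matching separator -> 4
print(count_dimension(raw, "','"))   # quoted form is a 3-character string -> 1
```

Under these assumptions, both the default tilde and the quoted form leave the whole string as a single element, which matches the "dimension 1" symptom in the error message.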
Incorrect JDK in the user resource group
Error message
OK
Exception in thread "main" java.lang.UnsupportedClassVersionError: com/alibaba/proxima/CentauriRunner : Unsupported major.minor version 52.0
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:482)
FAILED: Run job failed.
2021-08-26 16:27:51 INFO ========================================================================
2021-08-26 16:27:51 INFO Exit code of the Shell command 100
2021-08-26 16:27:51 INFO --- Invocation of Shell command completed ---
2021-08-26 16:27:51 ERROR Shell run failed!
2021-08-26 16:27:51 ERROR Current task status: ERROR
2021-08-26 16:27:51 INFO Cost time is: 1.206s
/home/admin/alisatasknode/taskinfo//20210826/phoenixprod/16/27/36/6psm65y7cs39edodxp0sdcxq/T3_6801991809.log-END-EOF
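Solution
When you create a task in MaxCompute, the scheduling configuration offers several gateway resource groups, one of which is usually the default. The task requires JDK 1.8 or later. The error above most likely means that the JDK of the selected resource group is too old: class file version 52.0 corresponds to Java 8, so a JVM that rejects it is older than that. To fix the problem, manually switch to a different resource group.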
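To confirm which Java release a class was compiled for, you can read the version fields at the start of the class file. The sketch below is an illustration rather than part of Proxima CE or MaxCompute; the function name `class_file_java_version` is hypothetical. It relies on the standard class file layout: a 4-byte magic number 0xCAFEBABE, followed by a 2-byte minor version and a 2-byte major version, where major version 52 corresponds to Java 8.

```python
import struct


def class_file_java_version(header: bytes) -> int:
    """Map the major version in a class file header to a Java feature release.

    Hypothetical helper for illustration. `header` must hold at least the
    first 8 bytes of a .class file.
    """
    magic, _minor, major = struct.unpack(">IHH", header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    # Major version 45 is JDK 1.1, so 52 maps to Java 8 and 51 to Java 7.
    return major - 44


# Header of a class compiled for major.minor version 52.0 (0x34 == 52).
header = bytes.fromhex("cafebabe00000034")
print(class_file_java_version(header))  # -> 8
```

For a real class, pass the first 8 bytes read from the .class file in binary mode. A result of 8 means that the class needs a JDK 1.8 or later runtime, which is the requirement stated above.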
Project Volume not found
Error message
ODPS-0010000: System internal error - Lost volume dir
2021-08-27 11:49:45.689 [main] INFO c.a.proxima.utils.CheckSignUtil - [] - project:taobao_machinelearning appId:201160 check sign pass. times(ms):314
[500] com.aliyun.odps.OdpsException: ODPS-0010000: System internal error - Lost volume dir.
    at com.aliyun.odps.rest.RestClient.handleErrorResponse(RestClient.java:395)
    at com.aliyun.odps.rest.RestClient.request(RestClient.java:330)
    at com.aliyun.odps.rest.RestClient.request(RestClient.java:284)
    at com.aliyun.odps.Volume.reload(Volume.java:113)
    at com.aliyun.odps.Volumes.exists(Volumes.java:119)
    at com.aliyun.odps.Volumes.exists(Volumes.java:102)
    at com.alibaba.proxima.config.ConfigConvert.volumeProcess(ConfigConvert.java:182)
    at com.alibaba.proxima.config.ConfigConvert.convert(ConfigConvert.java:31)
    at com.alibaba.proxima.CentauriRunner.main(CentauriRunner.java:236)
Caused by: com.aliyun.odps.rest.RestException: RequestId=612862E170BEC39964445860,Code=InternalServerError,Message=ODPS-0010000: System internal error - Lost volume dir.
    ... 9 more
FAILED: Run job failed.
2021-08-27 11:58:25 INFO =====================================================================
2021-08-27 11:58:25 INFO Exit code of the Shell command 100
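Solution
In this case, the Volume directory may exist but be corrupted. Try deleting the directory manually and then rerunning the task. The ODPS SQL commands are as follows:
vfs -ls /; --Lists directories; the output includes a directory with the prefix 'proxima_v2/xxx'.
vfs -rm -r -f /proxima_v2/xxx; --Deletes that directory (the same Volume directory that is printed in the run log). Use either this command or the next one.
vfs -rmv /proxima_v2; --Deletes the entire Volume. Use either this command or the previous one.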
Error: exceeds the allowed maximum length of '2097152'.
Error message
ODPS-0420031: Invalid xml in HTTP request body - The request body is malformed or the server version doesn't match this sdk/client. XML Schema validation failed: Element 'Value': [facet 'maxLength'] The value has a length of '7238452'; this exceeds the allowed maximum length of '2097152'.
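Solution
One cause of this error is an incorrectly set -classpath startup parameter. See the "Run" topic, set the -classpath startup parameter correctly, and then rerun the task.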
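As a rough pre-check, you can compare the length of a parameter value with the limit quoted in the error message before submitting the task. The sketch below is only an illustration: the helper `exceeds_value_limit` is hypothetical, and the source does not state which request field ends up in the 'Value' element, so treating the -classpath value as the culprit is an assumption.

```python
# Limit quoted in the error message (2097152 == 2 * 1024 * 1024).
MAX_VALUE_LENGTH = 2097152


def exceeds_value_limit(value: str, limit: int = MAX_VALUE_LENGTH) -> bool:
    """Return True if a parameter value is longer than the allowed maximum.

    Hypothetical helper for illustration; not part of any MaxCompute SDK.
    """
    return len(value) > limit


# A value as long as the one reported in the error message.
oversized = "x" * 7238452
print(exceeds_value_limit(oversized))                        # -> True
print(exceeds_value_limit("x" * MAX_VALUE_LENGTH))           # exactly at the limit -> False
```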
Error: java.lang.UnsatisfiedLinkError: no jniproxima in java.library.path
Error message
17:57:05.229 [main] DEBUG org.bytedeco.javacpp.Loader - Loading library jniproxima
17:57:05.229 [main] DEBUG org.bytedeco.javacpp.Loader - Failed to load for jniproxima: java.lang.UnsatisfiedLinkError: no jniproxima in java.library.path
Can not load proxima core:java.lang.UnsatisfiedLinkError: no jniproxima in java.library.path
java.lang.UnsatisfiedLinkError: no jniproxima in java.library.path
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
    at java.lang.Runtime.loadLibrary0(Runtime.java:870)
    at java.lang.System.loadLibrary(System.java:1122)
    at org.bytedeco.javacpp.Loader.loadLibrary(Loader.java:1738)
    at org.bytedeco.javacpp.Loader.load(Loader.java:1345)
    at org.bytedeco.javacpp.Loader.load(Loader.java:1157)
    at org.bytedeco.javacpp.Loader.load(Loader.java:1133)
    at com.alibaba.proxima2.core.global.proxima.<clinit>(proxima.java:12)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.bytedeco.javacpp.Loader.load(Loader.java:1212)
    at org.bytedeco.javacpp.Loader.load(Loader.java:1157)
    at org.bytedeco.javacpp.Loader.load(Loader.java:1133)
    at com.alibaba.proxima2.core.IndexPluginBroker.<clinit>(IndexPluginBroker.java:16)
    at com.alibaba.proxima2.ce.utils.ProximaUtil.<clinit>(ProximaUtil.java:24)
    at com.alibaba.proxima2.ce.utils.ConfigParser.commandLineParserProcess(ConfigParser.java:128)
    at com.alibaba.proxima2.ce.utils.ConfigParser.parse(ConfigParser.java:36)
    at com.alibaba.proxima2.ce.ProximaCERunner.main(ProximaCERunner.java:139)
Caused by: java.lang.UnsatisfiedLinkError: /home/ads/.javacpp/cache/main_sub0.jar/com/alibaba/proxima2/linux-x86_64/libjniproxima.so: /home/ads/.javacpp/cache/main_sub...
    at java.lang.ClassLoader$NativeLibrary.load(Native Method)
    at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1941)
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1824)
    at java.lang.Runtime.load0(Runtime.java:809)
    at java.lang.System.load(System.java:1086)
    at org.bytedeco.javacpp.Loader.loadLibrary(Loader.java:1685)
    ... 14 more
17:57:05.240 [main] INFO com.alibaba.proxima2.ce.utils.ConfigParser - odps.stage.mapper.split.size: 128M
Running job in console.
17:57:05.405 [main] INFO com.alibaba.proxima2.ce.utils.ConfigParser - projectName: alimama_kgb_algo, appId:200696
17:57:05.921 [main] INFO com.alibaba.proxima2.ce.utils.CheckSignUtil - checkSign, projectName:alimama_kgb_algo, appId:200696, fc_name:vector_retrieval, result:{"code"...
17:57:06.080 [main] INFO com.alibaba.proxima2.ce.utils.CheckSignUtil - checkSign, projectName:alimama_kgb_algo, appId:200696, fc_name:vector_retrieval_v2, result:{"co...
ProximaCEException(code=20002, msg=xxx, detailMsg=project:alimama_kgb_algo appId:200696 xxx)
    at com.alibaba.proxima2.ce.utils.CheckSignUtil.checkSign(CheckSignUtil.java:39)
    at com.alibaba.proxima2.ce.utils.ConfigParser.checkSign(ConfigParser.java:452)
    at com.alibaba.proxima2.ce.utils.ConfigParser.parse(ConfigParser.java:38)
    at com.alibaba.proxima2.ce.ProximaCERunner.main(ProximaCERunner.java:139)
ProximaCEException(code=20002, msg=xxx, detailMsg=xxx.)
    at com.alibaba.proxima2.ce.utils.ConfigParser.checkSign(ConfigParser.java:455)
    at com.alibaba.proxima2.ce.utils.ConfigParser.parse(ConfigParser.java:38)
    at com.alibaba.proxima2.ce.ProximaCERunner.main(ProximaCERunner.java:139)
FAILED: Run job failed.
2022-09-28 17:57:06 INFO ================================================================
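Solution
This error generally means that the MaxCompute instance did not load the Proxima SDK successfully, possibly because the machine that hosts the instance is too old or its environment configuration is too low. This happens rarely. In most cases, rerunning the task is enough, because the system schedules the task onto a machine instance that works.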