Dataphin data integration task fails with “org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x00” when syncing data from Microsoft SQL Server to AnalyticDB for PostgreSQL

Problem description

A Dataphin data integration task that syncs data from Microsoft SQL Server to AnalyticDB for PostgreSQL fails with “org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x00”. The task log shows:

2022-10-09 10:22:03.149 [0-0-0-writer] ERROR DlinkTaskPluginCollector - 
org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x00
 at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2476) ~[postgresql-42.1.1.jar:42.1.1]
 at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2189) ~[postgresql-42.1.1.jar:42.1.1]
 at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:300) ~[postgresql-42.1.1.jar:42.1.1]
 at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:428) ~[postgresql-42.1.1.jar:42.1.1]
 at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:354) ~[postgresql-42.1.1.jar:42.1.1]
 at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:169) ~[postgresql-42.1.1.jar:42.1.1]
 at org.postgresql.jdbc.PgPreparedStatement.execute(PgPreparedStatement.java:158) ~[postgresql-42.1.1.jar:42.1.1]
 at com.alibaba.datax.plugin.writer.adbpgwriter.AdbpgCopyProxy.doOneInsert(AdbpgCopyProxy.java:229) [adbpgwriter-0.0.1-SNAPSHOT.jar:na]
 at com.alibaba.datax.plugin.writer.adbpgwriter.AdbpgCopyProxy.startWriteInCopy(AdbpgCopyProxy.java:169) [adbpgwriter-0.0.1-SNAPSHOT.jar:na]
 at com.alibaba.datax.plugin.writer.adbpgwriter.AdbpgWriter$Task$1.startWrite(AdbpgWriter.java:207) [adbpgwriter-0.0.1-SNAPSHOT.jar:na]
 at com.alibaba.datax.plugin.writer.adbpgwriter.AdbpgWriter$Task.startWrite(AdbpgWriter.java:223) [adbpgwriter-0.0.1-SNAPSHOT.jar:na]
 at com.alibaba.dt.dlink.core.trans.WriterRunner.run(WriterRunner.java:50) [dlink-engine-0.0.1-SNAPSHOT.jar:na]
 at java.lang.Thread.run(Thread.java:882) [na:1.8.0_152]

Cause

The source data contains NUL characters ('\0', byte value 0x00) inside string values.

Solution

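“invalid byte sequence for encoding "UTF8": 0x00” is a PostgreSQL-specific error. (If the reported byte is something other than 0x00, the character set is most likely misconfigured.) The direct cause is that PostgreSQL character-type columns and variables, such as varchar, do not accept strings containing '\0', that is, the byte value 0x00 or the Unicode character U+0000 ('\u0000'). The officially recommended fix is to remove '\0' from the strings before they are written, for example with str.replace("\u0000", "") in Java. This appears to be the only workable approach at present.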
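The cleanup step can be sketched as follows, assuming you have a Java step that sees each record's string values before they are handed to the writer. NulStripper, stripNul, and cleanRow are hypothetical names used only for illustration; they are not part of Dataphin or the PostgreSQL JDBC driver.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not a Dataphin or pgjdbc API.
public class NulStripper {

    // Removes every U+0000 (NUL) character from a string.
    // String.replace matches the target literally, so no regex escaping is involved.
    // SQL NULL values (Java null) are passed through unchanged.
    static String stripNul(String s) {
        return s == null ? null : s.replace("\u0000", "");
    }

    // Cleans every field of a row, e.g. before binding it to a PreparedStatement.
    static List<String> cleanRow(List<String> row) {
        return row.stream()
                .map(NulStripper::stripNul)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("abc\u0000def", null, "ok");
        System.out.println(cleanRow(row)); // [abcdef, null, ok]
    }
}
```

Note that a real SQL NULL is a different thing from a NUL character: only the latter triggers this error, so the sketch leaves null fields alone.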

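In C, '\0' marks the end of a character array: it is the string terminator, indicating that the string ends at that position, and the compiler automatically appends it to every string literal.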

You're trying to insert a string which contains a '\0' character. The server can't handle strings containing embedded NULs, as it uses C-style string termination internally.
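To see where the 0x00 in the error message comes from, the following sketch encodes a Java string containing U+0000 as UTF-8 and prints the bytes, then simulates how a C-style reader that stops at the first zero byte would see the same data. NulBytesDemo and its methods are hypothetical names for illustration. PostgreSQL rejects such strings rather than silently truncating them; the truncation here only illustrates why C-style termination cannot represent an embedded NUL.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
public class NulBytesDemo {

    // Formats bytes as space-separated lowercase hex, e.g. "61 00 62".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Simulates a C-style reader: everything from the first 0x00 byte on is ignored.
    static String cStyleView(byte[] bytes) {
        int end = 0;
        while (end < bytes.length && bytes[end] != 0) end++;
        return new String(bytes, 0, end, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+0000 encodes to the single byte 0x00 in standard UTF-8.
        byte[] utf8 = "a\u0000b".getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(utf8));        // 61 00 62
        System.out.println(cStyleView(utf8)); // a
    }
}
```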

Applies to

  • Dataphin