Dataphin管道任务数据中有表情包时运行报“ERROR DlinkTaskPluginCollector”错误
更新时间:
问题描述
管道任务MaxCompute数据库数据集成到MySQL数据库,数据中包含表情包等特殊字符,任务运行报错。
2021-09-09 11:39:05.486 [DlinkTrans - in_article] INFO DlinkLogbackListener - in_article - 完成处理 (I=1, O=0, R=0, W=1, U=0, E=0)
2021-09-09 11:39:05.501 [0-0-0-writer] WARN CommonRdbmsWriter$Task - 回滚此次写入, 采用每次写入一行方式提交. 因为:Incorrect string value: '\xF0\x9F\x91\x87</...' for column 'content' at row 1
2021-09-09 11:39:05.506 [0-0-0-writer] ERROR DlinkTaskPluginCollector -
java.sql.SQLException: Incorrect string value: '\xF0\x9F\x91\x87</...' for column 'content' at row 1
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:965) ~[mysql-connector-java-5.1.47.jar:5.1.47]
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3978) ~[mysql-connector-java-5.1.47.jar:5.1.47]
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3914) ~[mysql-connector-java-5.1.47.jar:5.1.47]
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2530) ~[mysql-connector-java-5.1.47.jar:5.1.47]
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2683) ~[mysql-connector-java-5.1.47.jar:5.1.47]
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2495) ~[mysql-connector-java-5.1.47.jar:5.1.47]
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1903) ~[mysql-connector-java-5.1.47.jar:5.1.47]
at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1242) ~[mysql-connector-java-5.1.47.jar:5.1.47]
at com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter$Task.doOneInsert(CommonRdbmsWriter.java:434) [plugin-rdbms-util-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter$Task.doBatchInsert(CommonRdbmsWriter.java:414) [plugin-rdbms-util-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter$Task.startWriteWithConnection(CommonRdbmsWriter.java:349) [plugin-rdbms-util-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter$Task.startWrite(CommonRdbmsWriter.java:371) [plugin-rdbms-util-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.plugin.writer.mysqlwriter.MysqlWriter$Task.startWrite(MysqlWriter.java:78) [mysqlwriter-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.dt.dlink.core.trans.WriterRunner.run(WriterRunner.java:50) [dlink-engine-0.0.1-SNAPSHOT.jar:na]
at java.lang.Thread.run(Thread.java:882) [na:1.8.0_152]
2021-09-09 11:39:05.510 [0-0-0-writer] ERROR DlinkTaskPluginCollector - 脏数据:
{"exception":"Incorrect string value: '\\xF0\\x9F\\x91\\x87</...' for column 'content' at row 1","record":[{"byteSize":15,"index":0,"rawData":"<p> </p>","type":"STRING"}],"type":"writer"}
2021/09/09 11:39:05 - out_article.0 - 完成处理 (I=0, O=1, R=1, W=0, U=0, E=1)
2021-09-09 11:39:05.585 [DlinkTrans - out_article] INFO DlinkLogbackListener - out_article - 完成处理 (I=0, O=1, R=1, W=0, U=0, E=1)
2021-09-09 11:39:05.592 [DlinkTrans - out_article] INFO KettleMetricCollector - Total 1 records, 29 bytes | Speed 0B/s, 0 records/s | Error 1 records, 29 bytes | All Task WaitWriterTime 0.002s | All Task WaitReaderTime 0.224s | Percentage 100.00%
2021-09-09 11:39:05.602 [DlinkTrans - out_article] INFO DirtyRecordCheckThread - dirty record check thread begin to stop ...
2021-09-09 11:39:05.602 [dirty_record_check_thread] INFO DirtyRecordCheckThread - dirty record check thread stopped ...
2021-09-09 11:39:05.602 [job-1054526] INFO DlinkTrans - trans result : total success read : 1 , total success write : 0 , total dirty record : 1
2021-09-09 11:39:05.607 [job-1054526] ERROR DlinkTrans - Exception when job run
问题原因
MySQL数据库配置不完整,导致通过jdbc读时不能正确解码。
解决方案
1. 数据源配置的URL后缀加上useUnicode=true&autoReconnect=true。
jdbc:mysql://数据库域名或者IP:端口/workroom?useUnicode=true&autoReconnect=true
2.修改数据库配置:/etc/mysql/my.cnf
或者/etc/my.cnf
。
[client]
default-character-set = utf8mb4
[mysql]
default-character-set = utf8mb4
[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
init_connect='SET NAMES utf8mb4'
然后重启数据库,使之生效,可由DBA操作。然后执行SQL检查配置。
SHOW VARIABLES WHERE Variable_name LIKE 'character_set_%' OR Variable_name LIKE
'collation%';
确保以下配置是utf8mb4,尤其是character_set_server。
系统变量 |
描述 |
character_set_client |
客户端来源数据使用的字符集 |
character_set_connection |
连接层字符集 |
character_set_database |
当前选中数据库的默认字符集 |
character_set_results |
查询结果字符集 |
character_set_server |
默认的内部操作字符集 |
适用于
- Dataphin-数据集成
反馈
- 本页导读
文档反馈