阿里云首页

Dataphin管道任务数据中有表情包时,运行报错:“ERROR DlinkTaskPluginCollector - 脏数据: "exception":"Incorrect string value: '\\xF0\\x9F\\x91\\x87

问题描述

管道任务Maxcompute数据库数据集成到MySQL数据库,数据中包含表情包等特殊字符,任务运行报错。

2021-09-09 11:39:05.486 [DlinkTrans - in_article] INFO  DlinkLogbackListener - in_article - 完成处理 (I=1, O=0, R=0, W=1, U=0, E=0)
2021-09-09 11:39:05.501 [0-0-0-writer] WARN  CommonRdbmsWriter$Task - 回滚此次写入, 采用每次写入一行方式提交. 因为:Incorrect string value: '\xF0\x9F\x91\x87</...' for column 'content' at row 1
2021-09-09 11:39:05.506 [0-0-0-writer] ERROR DlinkTaskPluginCollector - 
java.sql.SQLException: Incorrect string value: '\xF0\x9F\x91\x87</...' for column 'content' at row 1
 at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:965) ~[mysql-connector-java-5.1.47.jar:5.1.47]
 at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3978) ~[mysql-connector-java-5.1.47.jar:5.1.47]
 at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3914) ~[mysql-connector-java-5.1.47.jar:5.1.47]
 at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2530) ~[mysql-connector-java-5.1.47.jar:5.1.47]
 at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2683) ~[mysql-connector-java-5.1.47.jar:5.1.47]
 at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2495) ~[mysql-connector-java-5.1.47.jar:5.1.47]
 at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1903) ~[mysql-connector-java-5.1.47.jar:5.1.47]
 at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1242) ~[mysql-connector-java-5.1.47.jar:5.1.47]
 at com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter$Task.doOneInsert(CommonRdbmsWriter.java:434) [plugin-rdbms-util-0.0.1-SNAPSHOT.jar:na]
 at com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter$Task.doBatchInsert(CommonRdbmsWriter.java:414) [plugin-rdbms-util-0.0.1-SNAPSHOT.jar:na]
 at com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter$Task.startWriteWithConnection(CommonRdbmsWriter.java:349) [plugin-rdbms-util-0.0.1-SNAPSHOT.jar:na]
 at com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter$Task.startWrite(CommonRdbmsWriter.java:371) [plugin-rdbms-util-0.0.1-SNAPSHOT.jar:na]
 at com.alibaba.datax.plugin.writer.mysqlwriter.MysqlWriter$Task.startWrite(MysqlWriter.java:78) [mysqlwriter-0.0.1-SNAPSHOT.jar:na]
 at com.alibaba.dt.dlink.core.trans.WriterRunner.run(WriterRunner.java:50) [dlink-engine-0.0.1-SNAPSHOT.jar:na]
 at java.lang.Thread.run(Thread.java:882) [na:1.8.0_152]
2021-09-09 11:39:05.510 [0-0-0-writer] ERROR DlinkTaskPluginCollector - 脏数据: 
{"exception":"Incorrect string value: '\\xF0\\x9F\\x91\\x87</...' for column 'content' at row 1","record":[{"byteSize":15,"index":0,"rawData":"<p>&nbsp;</p>","type":"STRING"}],"type":"writer"}
2021/09/09 11:39:05 - out_article.0 - 完成处理 (I=0, O=1, R=1, W=0, U=0, E=1)
2021-09-09 11:39:05.585 [DlinkTrans - out_article] INFO  DlinkLogbackListener - out_article - 完成处理 (I=0, O=1, R=1, W=0, U=0, E=1)
2021-09-09 11:39:05.592 [DlinkTrans - out_article] INFO  KettleMetricCollector - Total 1 records, 29 bytes | Speed 0B/s, 0 records/s | Error 1 records, 29 bytes |  All Task WaitWriterTime 0.002s |  All Task WaitReaderTime 0.224s | Percentage 100.00%
2021-09-09 11:39:05.602 [DlinkTrans - out_article] INFO  DirtyRecordCheckThread - dirty record check thread begin to stop ...
2021-09-09 11:39:05.602 [dirty_record_check_thread] INFO  DirtyRecordCheckThread - dirty record check thread stopped ...
2021-09-09 11:39:05.602 [job-1054526] INFO  DlinkTrans - trans result : total success read : 1 , total success write : 0 , total dirty record : 1
2021-09-09 11:39:05.607 [job-1054526] ERROR DlinkTrans - Exception when job run

问题原因

MySQL数据库配置不完整,导致通过jdbc读时不能正确解码。

解决方案

1. 数据源配置的url后缀加上useUnicode=true&autoReconnect=true。

jdbc:mysql://数据库域名或者IP:端口/workroom?useUnicode=true&autoReconnect=true

2.修改数据库配置:/etc/mysql/my.cnf或者/etc/my.cnf。

[client] 
default-character-set = utf8mb4
[mysql]
default-character-set = utf8mb4
[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
init_connect='SET NAMES utf8mb4'

然后重启数据库,使之生效,可由DBA操作。然后执行sql检查配置。

SHOW VARIABLES WHERE Variable_name LIKE 'character_set_%' OR Variable_name LIKE 
'collation%';

确保以下配置是utf8mb4,尤其是character_set_server。

系统变量

描述

character_set_client

(客户端来源数据使用的字符集)

character_set_connection

(连接层字符集)

character_set_database

(当前选中数据库的默认字符集)

character_set_results

(查询结果字符集)

character_set_server

(默认的内部操作字符集)

适用于

  • Dataphin-数据集成
首页 Dataphin管道任务数据中有表情包时,运行报错:“ERROR DlinkTaskPluginCollector - 脏数据: "exception":"Incorrect string value: '\\xF0\\x9F\\x91\\x87