mysql字符集utf8mb4失效踩坑

现象

mysql数据表的字符集已经设置成了utf8mb4,但是通过JDBC向数据库写入4字节的emoji表情时报错,但是通过直接使用命令行插入该4字节的emoji表情时却成功了。示例如下:

1
2
3
4
5
6
CREATE TABLE `user_info` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(11) NOT NULL,
`age` int(4) DEFAULT NULL
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

JDBC写入失败

1
2
3
4
User user = new User();
user.setName("\uD83D\uDC8B");
user.setAge(18);
userMapper.insertUser(user);
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
org.springframework.jdbc.UncategorizedSQLException: 
### Error updating database. Cause: java.sql.SQLException: Incorrect string value: '\xF0\x9F\x92\x8B' for column 'name' at row 1
### The error may involve com.wakzz.database.persistence.UserMapper.insertUser-Inline
### The error occurred while setting parameters
### SQL: insert into user_info (name, age) values (?, ? )
### Cause: java.sql.SQLException: Incorrect string value: '\xF0\x9F\x92\x8B' for column 'name' at row 1
; uncategorized SQLException; SQL state [HY000]; error code [1366]; Incorrect string value: '\xF0\x9F\x92\x8B' for column 'name' at row 1; nested exception is java.sql.SQLException: Incorrect string value: '\xF0\x9F\x92\x8B' for column 'name' at row 1

at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:89)
at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:81)
at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:81)
at org.mybatis.spring.MyBatisExceptionTranslator.translateExceptionIfPossible(MyBatisExceptionTranslator.java:73)
at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:446)
at com.sun.proxy.$Proxy81.insert(Unknown Source)
at org.mybatis.spring.SqlSessionTemplate.insert(SqlSessionTemplate.java:278)
at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:58)
at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:59)
at com.sun.proxy.$Proxy92.insertUser(Unknown Source)

命令行写入成功

1
2
mysql> insert into user_info (name,age) values ('💋',18);
Query OK, 1 row affected

解决

修改mysql配置文件,添加以下配置:

1
character_set_server = utf8mb4

修改前字符集

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
mysql> show variables like "%char%";
+--------------------------+----------------------------------------+
| Variable_name | Value |
+--------------------------+----------------------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/soft/mysql-5.6.31/share/charsets/ |
+--------------------------+----------------------------------------+
8 rows in set

mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+--------------------+
| Variable_name | Value |
+----------------------+--------------------+
| collation_connection | utf8mb4_general_ci |
| collation_database | latin1_swedish_ci |
| collation_server | utf8_general_ci |
+----------------------+--------------------+
3 rows in set

修改后字符集

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
mysql> show variables like "%char%";
+--------------------------+----------------------------------------+
| Variable_name | Value |
+--------------------------+----------------------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/soft/mysql-5.6.31/share/charsets/ |
+--------------------------+----------------------------------------+
8 rows in set

mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+--------------------+
| Variable_name | Value |
+----------------------+--------------------+
| collation_connection | utf8mb4_general_ci |
| collation_database | latin1_swedish_ci |
| collation_server | utf8mb4_general_ci |
+----------------------+--------------------+
3 rows in set

原理

通过抓包比较mysql配置character_set_server修改前后的tcp数据,发现每一个mysql的数据库连接建立成功后会JDBC会自动执行一次字符集设置语句SET NAMES xxx。当character_set_serverutf8时JDBC执行SET NAMES utf8,而当character_set_serverutf8mb4时JDBC执行SET NAMES utf8mb4

image

image

在命令行中测试SET NAMES发现即使数据库表的字符集是utf8mb4时,若执行了SET NAMES utf8也会导致4字节字符写入mysql失败。成功复现了JDBC写入emoji写入异常的问题。

1
2
3
4
5
mysql> SET NAMES utf8;
Query OK, 0 rows affected

mysql> insert into user_info (name,age) values ('💋',18);
1366 - Incorrect string value: '\xF0\x9F\x92\x8B' for column 'name' at row 1
1
2
3
4
5
mysql> SET NAMES utf8mb4;
Query OK, 0 rows affected

mysql> insert into user_info (name,age) values ('💋',18);
Query OK, 1 row affected

在官方文档MySQL Connector/Java中,文档表明JDBC会通过character_set_server的值自动执行SET NAMES命令设置字符集编码。其目的是自动检测服务端字符集编码配置而减少JDBC客户端的字符集编码配置。

Functionality Added or Changed

  • Connector/J did not support utf8mb4 for servers 5.5.2 and newer.
  • Connector/J now auto-detects servers configured with character_set_server=utf8mb4 or treats the Java encoding utf-8 passed using characterEncoding=… as utf8mb4 in the SET NAMES= calls it makes when establishing the connection. (Bug #54175)

继续抓包以及查看JDBC源码发现,JDBC成功与mysql建立连接后,mysql会返回其版本号和一些附带信息例如字符集编码。如下图,0x00000167位表示当前mysql的默认字符集编码。图一的字符集编码值为0x21即十进制33,图二的字符集编码值为0x2D即十进制45

通过翻阅JDBC源码发现,33表示utf845表示utf8mb4,即该值就是mysql的character_set_server值。

image

image

mysql-connector-java-5.1.46-sources.jar!/com/mysql/jdbc/CharsetMapping.java:258

1
2
collation[33] = new Collation(33, "utf8_general_ci", 1, MYSQL_CHARSET_NAME_utf8);
collation[45] = new Collation(45, "utf8mb4_general_ci", 1, MYSQL_CHARSET_NAME_utf8mb4);

在获取mysql的服务器参数后,解析字符集编码:

  • character_set_serverutf8时,执行SET NAMES utf8
  • character_set_serverutf8mb4时,执行SET NAMES utf8mb4

mysql-connector-java-5.1.46-sources.jar!/com/mysql/jdbc/ConnectionImpl.java:1697

1
2
3
4
5
6
7
8
9
10
11
12
13
boolean utf8mb4Supported = versionMeetsMinimum(5, 5, 2);
// 解析mysql返回的字符集编码,33表示 utf8;45表示 utf8mb4
boolean useutf8mb4 = utf8mb4Supported && (CharsetMapping.UTF8MB4_INDEXES.contains(this.io.serverCharsetIndex));

if (!getUseOldUTF8Behavior()) {
if (dontCheckServerMatch || !characterSetNamesMatches("utf8") || (utf8mb4Supported && !characterSetNamesMatches("utf8mb4"))) {
// 执行set names命令指定字符集编码从而自动配置mysql字符集
execSQL(null, "SET NAMES " + (useutf8mb4 ? "utf8mb4" : "utf8"), -1, null, DEFAULT_RESULT_SET_TYPE,
DEFAULT_RESULT_SET_CONCURRENCY, false, this.database, null, false);
this.serverVariables.put("character_set_client", useutf8mb4 ? "utf8mb4" : "utf8");
this.serverVariables.put("character_set_connection", useutf8mb4 ? "utf8mb4" : "utf8");
}
}
>