最近做项目的时候发现了一个奇怪的问题,我们在建表的时候设置了唯一键,然后在插入数据的时候无法正常插入,具体什么情况呢?我在下方一步一步的复现一下:
首先,准备一个建表的脚本,数据库服务任意版本都行,本文讨论的问题与数据库版本基本上没啥关系。
create table example
(
id int auto_increment primary key,
name varchar(20) charset utf8mb4 not null,
age int null,
constraint example_pk_2 unique (name)
);
大概这样就可以了,咱们先随便插入一条数据:
test> insert into example (name, age) values ('Tom', 20)
[2024-12-21 20:19:29] 1 row affected in 22 ms
这条数据正常插入了,然后咱们接下来再插入一条数据:
test> insert into example (name, age) values ('TOM', 21)
[2024-12-21 20:20:27] [23000][1062] Duplicate entry 'TOM' for key 'example_pk_2'
报错了,说唯一键冲突了,怎么回事儿?明明 tom ≠ TOM 啊?字段可以不区分大小写,数据值不区分这就有些过分了吧!!!
后经过多方排查,得到了答案,字符集出了问题,对,没错,就是字符集。
建表语句中对name
字段指定了字符集utf8mb4
,其实这个字符集本身没什么问题,但是字段实际使用的字符集除了问题,mysql默认使用utf8mb4_general_ci
。通过查询,该字段确实是使用了这种字符集,可以使用一下语句查询:
show full columns from example;
那么utf8mb4_general_ci
这个字符集有什么问题?需要咱们仔细了解一下这个字符集到底是什么类型的字符集:
- utf8mb4 – 就是4字符长度的utf8字符集,在mysql中utf8的长度是3个字节。
- general – 就是常规的,普通的的意思
- ci – case insensitive, 就是大小写不敏感,也就是说不区分大小写
连起来就是常规的不区分大小写的4字符UTF8字符集
既然知道原因了,咱们还是需要验证一下,咱们将example表name字段的字符集修改成 utf8mb4_bin
试试。
test> insert into example (name, age) values ('TOM', 21)
[2024-12-21 20:38:51] 1 row affected in 17 ms
咱们来看看mysql的默认字符集,可以通过 show charset
查看:
Charset | Description | Default collation | Maxlen |
---|---|---|---|
utf8mb4 | UTF-8 Unicode | utf8mb4_general_ci | 4 |
utf8 | UTF-8 Unicode | utf8_general_ci | 3 |
utf32 | UTF-32 Unicode | utf32_general_ci | 4 |
utf16le | UTF-16LE Unicode | utf16le_general_ci | 4 |
utf16 | UTF-16 Unicode | utf16_general_ci | 4 |
ujis | EUC-JP Japanese | ujis_japanese_ci | 3 |
ucs2 | UCS-2 Unicode | ucs2_general_ci | 2 |
tis620 | TIS620 Thai | tis620_thai_ci | 1 |
swe7 | 7bit Swedish | swe7_swedish_ci | 1 |
sjis | Shift-JIS Japanese | sjis_japanese_ci | 2 |
macroman | Mac West European | macroman_general_ci | 1 |
macce | Mac Central European | macce_general_ci | 1 |
latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 |
latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 |
latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 |
latin1 | cp1252 West European | latin1_swedish_ci | 1 |
koi8u | KOI8-U Ukrainian | koi8u_general_ci | 1 |
koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 |
keybcs2 | DOS Kamenicky Czech-Slovak | keybcs2_general_ci | 1 |
hp8 | HP West European | hp8_english_ci | 1 |
hebrew | ISO 8859-8 Hebrew | hebrew_general_ci | 1 |
greek | ISO 8859-7 Greek | greek_general_ci | 1 |
geostd8 | GEOSTD8 Georgian | geostd8_general_ci | 1 |
gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 |
gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | 2 |
gb18030 | China National Standard GB18030 | gb18030_chinese_ci | 4 |
euckr | EUC-KR Korean | euckr_korean_ci | 2 |
eucjpms | UJIS for Windows Japanese | eucjpms_japanese_ci | 3 |
dec8 | DEC West European | dec8_swedish_ci | 1 |
cp932 | SJIS for Windows Japanese | cp932_japanese_ci | 2 |
cp866 | DOS Russian | cp866_general_ci | 1 |
cp852 | DOS Central European | cp852_general_ci | 1 |
cp850 | DOS West European | cp850_general_ci | 1 |
cp1257 | Windows Baltic | cp1257_general_ci | 1 |
cp1256 | Windows Arabic | cp1256_general_ci | 1 |
cp1251 | Windows Cyrillic | cp1251_general_ci | 1 |
cp1250 | Windows Central European | cp1250_general_ci | 1 |
binary | Binary pseudo charset | binary | 1 |
big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 |
ascii | US ASCII | ascii_general_ci | 1 |
armscii8 | ARMSCII-8 Armenian | armscii8_general_ci | 1 |
可以看出mysql默认情况的字符集都是不区分大小写的,因此咱们在存储大小写敏感数据时需要指定大小写敏感的字符集,不然会出现意想不到的结果!!!