JIYIK CN >

Current Location:Home > Learning > DATABASE > MySQL >

How to support full Unicode in MySQL database

Author:JIYIK Last Updated:2025/04/26 Views:

Are you using MySQL's utf8 character set in your database ? In this article, I'll explain why you should switch to utf8mb4 , and how to do it.


UTF-8

The UTF-8 encoding can represent every symbol in the Unicode character set, ranging from U+0000001 to 1. U+10FFFFThat's 1,114,112 possible symbols. (Not all of these Unicode code points have characters assigned to them, but that doesn't prevent UTF-8 from being able to encode them.)

UTF-8 is a variable-width encoding; see the Unicode character set and UTF-8, UTF-16, UTF-32 encodings article. It uses one to four 8-bit bytes to encode each symbol. Symbols with lower numeric code point values ​​are encoded using fewer bytes. In this way, UTF-8 is optimized for the common case of using ASCII characters and other BMP symbols (whose code points range from U+000000to U+00FFFF) - while still allowing astral symbols (whose code points range from U+010000to U+10FFFF) to be stored.


utf8 in MySQL

For a long time, I have been using MySQL's utf8 character set for my databases, tables, and columns, assuming that it maps to the aforementioned UTF-8 encoding. By using utf8 , I could store any symbols I wanted in my database - or so I thought.

While writing about the internal character encoding of JavaScript, I noticed that I could not U+1D306 TETRAGRAM FOR CENTER (𝌆)insert the ' symbol into the MySQL database behind this site. The column I was trying to update had utf8_unicode_cia collation of , and the connection character set was set to utf8 .

mysql> SET NAMES utf8; # just to emphasize that the connection charset is set to `utf8`
Query OK, 0 rows affected (0.00 sec)

mysql> UPDATE database_name.table_name SET column_name = 'foo𝌆bar' WHERE id = 9001;
Query OK, 1 row affected, 1 warning (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 1

mysql> SELECT column_name FROM database_name.table_name WHERE id = 9001;
+-------------+
| column_name |
+-------------+
| foo         |
+-------------+
1 row in set (0.00 sec)

The content is truncated at the first astral Unicode_ symbol, in this case _ 𝌆— therefore, the attempt to insert foo𝌆bar_ is actually inserted foo, resulting in data loss (and possibly introducing security issues; see below). MySQL also returns a warning message:

mysql> SHOW WARNINGS;
+---------+------+------------------------------------------------------------------------------+
| Level   | Code | Message                                                                      |
+---------+------+------------------------------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\xF0\x9D\x8C\x86' for column 'column_name' at row 1 |
+---------+------+------------------------------------------------------------------------------+
1 row in set (0.00 sec)

It turns out that MySQL's utf8 character set only partially implements correct UTF-8 encoding. It can only store UTF-8 encoded symbols consisting of one to three bytes; encoding symbols that take up four bytes are not supported.

Because astral symbols (whose code points range from U+010000to U+10FFFF) each consist of four bytes in UTF-8, we cannot use MySQL's utf8 implementation to store them.

This affects not only 𝌆characters, but also more important symbols, such as U+01F4A9 PILE OF POO (💩). In total there are 1,048,575 possible code points that we cannot use. In fact, MySQL's utf8 only allows us to store 5.88% of all possible Unicode code points (0x00FFFF + 1) / (0x10FFFF + 1). Proper UTF-8 can encode 100% of all Unicode code points .

MySQL's utf8encoding naming is awkward because it is different from the correct UTF-8encoding. It does not provide full Unicode support, which may lead to data loss or security vulnerabilities.


utf8mb4 in MySQL

Fortunately, MySQL 5.5.3 (released in early 2010) introduced a new encoding called utf8mb4 that maps to proper UTF-8 and thus fully supports Unicode, including astral symbols.

We can refer to the article Switching MySQL from utf8 to utf8mb4 to switch the encoding.


Summarize

Never use utf8 in MySQL - always use utf8mb4 . It may take some time to update your database and your code, but it is definitely worth the effort. Why arbitrarily limit the set of symbols that can be used in a database? Why lose data every time a user enters an astral symbol as part of a comment or message or whatever we store in the database? There is no reason not to strive for full Unicodesupport everywhere. Do the right thing and use utf8mb4 . 🍻

For reprinting, please send an email to 1244347461@qq.com for approval. After obtaining the author's consent, kindly include the source as a link.

Article URL:

Related Articles

Two ways to install mysql-5.5.47 on Linux system and manage mysql

Publish Date:2025/04/26 Views:140 Category:MySQL

We know that there are generally two ways to install software on the Linux system. One is to use rpm or yum to install, which is convenient and fast; the other is to use the compiled source package. Although this method is more troublesome

Mysql master-slave replication simple configuration

Publish Date:2025/04/26 Views:120 Category:MySQL

I'm learning about MySQL master-slave replication recently and want to apply it to a project at work. Since I'm new to it, I don't understand it very well, so I can only share how to make a simple configuration. At the beginning, we configu

MySQL stored procedure details

Publish Date:2025/04/26 Views:163 Category:MySQL

A stored procedure can be thought of as encapsulating a SQL statement that we need to process specially into a function. When needed, we only need to call this function to achieve the desired operation. This process can be called a stored p

How many of these MySQL statement tags have you used?

Publish Date:2025/04/26 Views:122 Category:MySQL

In the article "A Peek into MySQL Stored Procedure Details" , we briefly introduced the use of stored procedures. The syntax for creating stored procedures includes BEGIN...END. In addition to BEGIN...END, the following statement tags can b

Back up the MySQL database to a file

Publish Date:2025/04/26 Views:166 Category:MySQL

Backing up your database is a very important system administration task that should usually cron be run from a job at scheduled intervals. We will use mysqldump the dump utility that comes with mysql to dump the contents of the database to

Scan to Read All Tech Tutorials

Social Media
  • https://www.github.com/onmpw
  • qq:1244347461

Recommended

Tags

Scan the Code
Easier Access Tutorial