MySQL switch from utf8 to utf8mb4
utf8mb4
Fortunately, MySQL 5.5.3 (released in early 2010) introduced a utf8mb4
new encoding called which maps to proper UTF-8 and thus fully supports Unicode, including astral symbols.
Step 1: Create a backup
Create a backup of all databases on the server you are upgrading. Safety first!
Step 2: Upgrade MySQL Server
Upgrade your MySQL server to v5.5.3+, or ask your server administrator to do it for you.
Step 3: Modify database, table and column
Change the character set and collation properties of the database, tables, and columns to use utf8mb4
instead of utf8
.
# For each database:
ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
# For each table:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# For each column:
ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# (Don’t blindly copy-paste this! The exact statement depends on the column type, maximum length, and other properties. The above line is just an example for a `VARCHAR` column.)
Since is fully backwards compatible utf8mb4
with utf8
, no mojibake or other forms of data loss should occur. (But you have backups, right?)
Step 4: Check the maximum length of column and index keys
This is probably the most tedious part of the entire upgrade process.
When utf8
converting from utf8mb4
, the maximum length of a column or index key in bytes is unchanged. Therefore, it is smaller in terms of characters because the maximum length of a character is now four bytes instead of three.
For example, a TINYTEXT
column can hold up to 255 bytes, which corresponds to 85 three-byte characters or 63 four-byte characters. Suppose we have a column that uses utf8TINYTEXT
but must be able to contain more than 63 characters. Given this requirement, we cannot convert this column to utf8mb4 unless we also change the data type to a longer type, such as TEXT
_t — because if we try to fill it with four-byte characters, we will only be able to enter 63 characters, but no more.
The same is true for index keys. InnoDB
The maximum index length for a storage engine is 767 bytes, so we can index up to 255 or 191 characters for a utf8 or utf8mb4 column, respectively. If our current index on a utf8 column is longer than 191 characters, we will need to index fewer characters when using utf8mb4VARCHAR(255)
. (I had to change some indexes on the columns to VARCHAR(191)
.)
Step 5: Modify the connection, client and server character sets
In our application code, set the connection character set to utf8mb4 . This can be done by simply SET NAMES utf8
replacing any variation of with SET NAMES utf8mb4
. If our old SET NAMES
statement specified a collation, make sure to change that as well, e.g. SET NAMES utf8 COLLATE utf8_unicode_ci
to SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci
.
Make sure to set both the client and server character sets. I have the following in my MySQL configuration file ( /etc/my.cnf ):
[client]
default-character-set = utf8mb4
[mysql]
default-character-set = utf8mb4
[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
We can easily confirm that these settings are working correctly:
mysql> SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
+--------------------------+--------------------+
| Variable_name | Value |
+--------------------------+--------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| collation_connection | utf8mb4_unicode_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | utf8mb4_unicode_ci |
+--------------------------+--------------------+
10 rows in set (0.00 sec)
As we can see, all relevant options are set to utf8mb4 , except that should be binary unless the file system we are using supports multi-byte UTF-8 encoded characters character_set_filesystem
in filenames , while is always utf8 and cannot be overridden.character_set_system
注意
: The default character set and collation can also be configured at some other levels.
Step 6: Repair and optimize all tables
After upgrading your MySQL server and making the necessary changes above, make sure to repair and optimize all databases and tables. I did not do this immediately after the upgrade (I did not think it was necessary since everything seemed to be working fine at first glance) and ran into some strange errors where UPDATE
statements had no effect, even though no errors were thrown.
We can run the following MySQL query for each table to be repaired and optimized:
# For each table
REPAIR TABLE table_name;
OPTIMIZE TABLE table_name;
Fortunately, this can be mysqlcheck
easily done in one go using the command line utility:
$ mysqlcheck -u root -p --auto-repair --optimize --all-databases
This will prompt for the root user's password, after which all tables in all databases will be repaired and optimized.
Summarize
Never use utf8
in MySQL - always use utf8mb4 . It may take some time to update your database and your code, but it is definitely worth the effort. Why arbitrarily limit the set of symbols that can be used in a database? Why lose data every time a user enters an astral symbol as part of a comment or message or whatever we store in the database? There is no reason not to strive for full Unicode
support everywhere. Do the right thing and use utf8mb4 . 🍻
For reprinting, please send an email to 1244347461@qq.com for approval. After obtaining the author's consent, kindly include the source as a link.
Related Articles
Two ways to install mysql-5.5.47 on Linux system and manage mysql
Publish Date:2025/04/26 Views:140 Category:MySQL
-
We know that there are generally two ways to install software on the Linux system. One is to use rpm or yum to install, which is convenient and fast; the other is to use the compiled source package. Although this method is more troublesome
Mysql master-slave replication simple configuration
Publish Date:2025/04/26 Views:120 Category:MySQL
-
I'm learning about MySQL master-slave replication recently and want to apply it to a project at work. Since I'm new to it, I don't understand it very well, so I can only share how to make a simple configuration. At the beginning, we configu
Implementation details of MySQL master-slave replication (I) Exploration of the m
Publish Date:2025/04/26 Views:56 Category:MySQL
-
This article mainly discusses the implementation mechanism of master-slave replication, which may not be directly helpful for our actual application, but understanding its principle can achieve twice the result with half the effort for futu
Implementation details of MySQL master-slave replication (II) Exploration from th
Publish Date:2025/04/26 Views:74 Category:MySQL
-
Previously we explored the master server in master-slave replication. Now let's look at how the slave server works in the entire system. In the article Master Server Exploration, we mentioned that three threads are needed in a master-slave
Mysql master-slave replication - what to do if the slave server stops
Publish Date:2025/04/26 Views:97 Category:MySQL
-
You may find this topic a little ridiculous. What can we do if the server stops? Of course, we should restart the service. Restarting the service is no problem. The problem is that a lot of data has been written to the master database durin
MySQL stored procedure details
Publish Date:2025/04/26 Views:163 Category:MySQL
-
A stored procedure can be thought of as encapsulating a SQL statement that we need to process specially into a function. When needed, we only need to call this function to achieve the desired operation. This process can be called a stored p
How many of these MySQL statement tags have you used?
Publish Date:2025/04/26 Views:122 Category:MySQL
-
In the article "A Peek into MySQL Stored Procedure Details" , we briefly introduced the use of stored procedures. The syntax for creating stored procedures includes BEGIN...END. In addition to BEGIN...END, the following statement tags can b
MySQL CURRENT_TIMESTAMP() Function Detailed Introduction (with Examples)
Publish Date:2025/04/26 Views:132 Category:MySQL
-
This article explains how to use MySQL functions with examples CURRENT_TIMESTAMP() . By using it, we can convert or display the current date and time. The output format is "YYYY-MM-DD HH:MM:SS" or "YYYYMMDDHHMMSS", depending on the context
Back up the MySQL database to a file
Publish Date:2025/04/26 Views:166 Category:MySQL
-
Backing up your database is a very important system administration task that should usually cron be run from a job at scheduled intervals. We will use mysqldump the dump utility that comes with mysql to dump the contents of the database to