JIYIK CN >

Current Location:Home > Learning > ALGORITHM >

Base64 encoding pitfalls: ID card number fuzzy query

Author:JIYIK Last Updated:2025/03/19 Views:

First, I will dig out the pits I encountered in the project. In our project, there is such a requirement that the user's ID number cannot be stored in plain text when it is stored in the database. In view of such a requirement, it is natural for us to encode the ID number with base64 encoding and then store it in the database.

After the above processing, we thought there would be no problem. However, since we did not know the encoding principle of base64 encoding very well, we fell into a pit we dug for ourselves in the following development process. The new requirement is this: user information can be fuzzily queried by the last 4 or 6 digits of the ID number, and of course, the user's information can also be found by the complete ID number. It is of course easy to search by the complete ID number. Just base64 encode the input ID number and then search it in the database. |

However, is it feasible to encode the last 4 or 6 digits of the ID number and then match it in the data table using the like keyword? Of course, we did this at the beginning. We took it for granted that the string after the base64 encoding of the complete ID number contains the string of the last 4 or 6 digits of the base64 encoding. For example, encode an ID number first:

<?php
    echo base64_encode('153933199401281934');
    // output --- MTUzOTMzMTk5NDAxMjgxOTM0

Then encode the last 6 digits

<?php
    echo base64_encode('281934');
    // output --- MjgxOTM0j

The result was the same as expected. So I was very happy to use like to match it. But when I encoded the last 4 digits, I found that the result was abnormal.

<?php
    echo base64_encode('1934');
    // output --- MtkzNA==

The base64 encoding string of the complete ID card number does not contain the encoding string of the last four digits. This means that directly using LIKE to match does not work.

At this point, I feel it is necessary to understand how base64 is encoded. Only by knowing yourself and your enemy can you win every battle.

Base64 encoding principle

The reason why base64 encoding is called base64 is because it is an encoding method based on 64 characters. These 64 characters are: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/.

There is a correspondence table between these characters and values, and the entire encoding is carried out according to this correspondence table.

Numeric character Numeric character Numeric character Numeric character
0 A 16 Q 32 g 48 w
1 B 17 R 33 h 45 x
2 C 18 S 34 i 50 y
3 D 19 T 35 j 51 z
4 E 20 U 36 k 52 0
5 F twenty one V 37 l 53 1
6 G twenty two W 38 m 54 2
7 H twenty three X 39 n 55 3
8 I twenty four Y 40 o 56 4
9 J 25 Z 41 p 57 5
10 K 26 a 42 q 58 6
11 L 27 b 43 r 59 7
12 M 28 c 44 s 60 8
13 N 29 d 45 t 61 9
14 O 30 e 46 u 62 +
15 P 31 f 47 v 63 /

Why do we need 64 characters? Because 2 times 6 is 64, which can represent 64 characters. For example, according to the corresponding table above, 000000 corresponds to 'A', 000001 corresponds to 'B', 000002 corresponds to 'C', and 111110 corresponds to '+'.

However, we know that 1 byte is equal to 8 bits. When we get a string, we first find the ASCII code corresponding to each character, and then convert the ASCII code into an 8-bit binary representation ('A'=65=01000001). Then divide the binary string converted from the string into groups of 6 bits. In this way, the decimal number corresponding to each 6-bit binary number is used to retrieve the corresponding character from the comparison table above. For example: Base64 encoding 'PHP'

Original Characters P H P
ASCII code decimal value 80 72 80
Binary Value 0 1 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 1 0 0 0 0
Every 6 groups 0 1 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 1 0 0 0 0
Corresponding decimal value 20 4 33 16
Encoded characters U E h Q

So the string 'PHP' is 'UEhQ' after base64 encoding.

The above situation is rather special. The three characters are calculated as 8 bits per byte, which is exactly 24 bits. Then, each 6 bits is a group, which is exactly four groups. However, if the original string is not 24 bits after binary conversion, but 32 bits, then the last group will only have two bits if it is divided into groups of 6 bits. After the first 5 groups are converted into decimal numbers, the corresponding characters are found in the comparison table. How to convert the last group with two bits into decimal? In fact, it is very simple. Use 0 to fill it to 6 bits, and then convert it. Of course, it is not over yet. Although each group has 6 bits filled, the total number of bits may not be divisible by 8 after being filled with 0. Therefore, it is necessary to use '=' to indicate that a 6-bit is filled in turn until the total number of bits can be divided by 8 again. Next, we encode 'PHPS' based on Example 1.

Original Characters P H P S
 

 
ASCII code decimal value 80 72 80 83
 

 
Binary Value 0 1 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0 1 1
 

 
Every 6 groups 0 1 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0 1 1 0 0 0 0
 

 

 

 

 

 

 

 

 

 

 

 
Corresponding decimal value 20 4 33 16 20 48
 

 
Encoded characters U E h Q U w = =

We can see that after 0-padded, it is 36 bits, which is not divisible by 8. Then '=' is used to represent a 6-bit padded sequence, and finally when two '=' are added, it is 48 bits, which is divisible by 8. At this point, the encoding ends, and finally 'PHPS' is encoded with base64 to become 'UEhQUw=='.

Base64 encoding is mainly used in transmission, storage, binary representation and other fields. It can also be used for simple encryption. However, you may not know what the content is at a glance, but you can decode it according to its encoding rules.

Filling the hole

Through a deep understanding of the base64 encoding principle, it is found that the implementation scheme for ID card number fuzzy query is not feasible at all. Since it does not work, the only way is to reverse decode and then perform fuzzy query.

The so-called reverse is to first decode the field to be retrieved in sql, and then use the decoded value to match the ID number we want to query. In this way, our mysql needs to support base64 encoding/decoding aggregation functions. After searching, it is found that it is only supported in mysqL5.6 and above versions 5.6. So we need to upgrade our mysql to at least version 5.6.

The following is the required sql

	select name,id_card from users where from_base64(id_card) like ‘%1533’;

Of course, in actual applications, 1533 will be dynamically replaced with the last 4 or 6 digits of the ID number we input.

In this way, our problem is perfectly solved. We found that it is actually very simple to solve our problem, just one SQL statement can solve it. However, the process of finding this method is still worth it. Through this problem, we have a deep understanding of the principle of base64 encoding, which will be of great help to us in designing programs in the future.

For reprinting, please send an email to 1244347461@qq.com for approval. After obtaining the author's consent, kindly include the source as a link.

Article URL:

Related Articles

Unicode character set and UTF-8, UTF-16, UTF-32 encoding

Publish Date:2025/03/19 Views:64 Category:ALGORITHM

ASCII Code In the early days of computing, ASCII codes were used to represent characters. The English language only has 26 letters and a few other special characters and symbols. The following table is an ASCII code comparison table contain

如何在 MySQL 中声明和使用变量

Publish Date:2024/03/26 Views:116 Category:MySQL

当你需要在 MySQL 中的脚本中存储单个值时,最好的方法是使用变量。变量有不同的种类,有必要知道何时以及如何使用每种类型。

在 MySQL 中实现刷新权限

Publish Date:2024/03/26 Views:214 Category:MySQL

本教程介绍了 MySQL 中的刷新权限命令,用于刷新授权表并影响允许的更改。

在 MySQL 中设置时区

Publish Date:2024/03/26 Views:94 Category:MySQL

在本教程中,我们将学习如何在 MySQL 服务器中更改时区。

Scan to Read All Tech Tutorials

Social Media
  • https://www.github.com/onmpw
  • qq:1244347461

Recommended

Tags

Scan the Code
Easier Access Tutorial