JIYIK CN >

Current Location:Home > Learning > ALGORITHM >

Base64 encoding pitfalls: ID card number fuzzy query

Author:JIYIK Last Updated:2025/03/19 Views:

First, I will dig out the pits I encountered in the project. In our project, there is such a requirement that the user's ID number cannot be stored in plain text when it is stored in the database. In view of such a requirement, it is natural for us to encode the ID number with base64 encoding and then store it in the database.

After the above processing, we thought there would be no problem. However, since we did not know the encoding principle of base64 encoding very well, we fell into a pit we dug for ourselves in the following development process. The new requirement is this: user information can be fuzzily queried by the last 4 or 6 digits of the ID number, and of course, the user's information can also be found by the complete ID number. It is of course easy to search by the complete ID number. Just base64 encode the input ID number and then search it in the database. |

However, is it feasible to encode the last 4 or 6 digits of the ID number and then match it in the data table using the like keyword? Of course, we did this at the beginning. We took it for granted that the string after the base64 encoding of the complete ID number contains the string of the last 4 or 6 digits of the base64 encoding. For example, encode an ID number first:

<?php
    echo base64_encode('153933199401281934');
    // output --- MTUzOTMzMTk5NDAxMjgxOTM0

Then encode the last 6 digits

<?php
    echo base64_encode('281934');
    // output --- MjgxOTM0j

The result was the same as expected. So I was very happy to use like to match it. But when I encoded the last 4 digits, I found that the result was abnormal.

<?php
    echo base64_encode('1934');
    // output --- MtkzNA==

The base64 encoding string of the complete ID card number does not contain the encoding string of the last four digits. This means that directly using LIKE to match does not work.

At this point, I feel it is necessary to understand how base64 is encoded. Only by knowing yourself and your enemy can you win every battle.

Base64 encoding principle

The reason why base64 encoding is called base64 is because it is an encoding method based on 64 characters. These 64 characters are: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/.

There is a correspondence table between these characters and values, and the entire encoding is carried out according to this correspondence table.

Numeric character Numeric character Numeric character Numeric character
0 A 16 Q 32 g 48 w
1 B 17 R 33 h 45 x
2 C 18 S 34 i 50 y
3 D 19 T 35 j 51 z
4 E 20 U 36 k 52 0
5 F twenty one V 37 l 53 1
6 G twenty two W 38 m 54 2
7 H twenty three X 39 n 55 3
8 I twenty four Y 40 o 56 4
9 J 25 Z 41 p 57 5
10 K 26 a 42 q 58 6
11 L 27 b 43 r 59 7
12 M 28 c 44 s 60 8
13 N 29 d 45 t 61 9
14 O 30 e 46 u 62 +
15 P 31 f 47 v 63 /

Why do we need 64 characters? Because 2 times 6 is 64, which can represent 64 characters. For example, according to the corresponding table above, 000000 corresponds to 'A', 000001 corresponds to 'B', 000002 corresponds to 'C', and 111110 corresponds to '+'.

However, we know that 1 byte is equal to 8 bits. When we get a string, we first find the ASCII code corresponding to each character, and then convert the ASCII code into an 8-bit binary representation ('A'=65=01000001). Then divide the binary string converted from the string into groups of 6 bits. In this way, the decimal number corresponding to each 6-bit binary number is used to retrieve the corresponding character from the comparison table above. For example: Base64 encoding 'PHP'

Original Characters P H P
ASCII code decimal value 80 72 80
Binary Value 0 1 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 1 0 0 0 0
Every 6 groups 0 1 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 1 0 0 0 0
Corresponding decimal value 20 4 33 16
Encoded characters U E h Q

So the string 'PHP' is 'UEhQ' after base64 encoding.

The above situation is rather special. The three characters are calculated as 8 bits per byte, which is exactly 24 bits. Then, each 6 bits is a group, which is exactly four groups. However, if the original string is not 24 bits after binary conversion, but 32 bits, then the last group will only have two bits if it is divided into groups of 6 bits. After the first 5 groups are converted into decimal numbers, the corresponding characters are found in the comparison table. How to convert the last group with two bits into decimal? In fact, it is very simple. Use 0 to fill it to 6 bits, and then convert it. Of course, it is not over yet. Although each group has 6 bits filled, the total number of bits may not be divisible by 8 after being filled with 0. Therefore, it is necessary to use '=' to indicate that a 6-bit is filled in turn until the total number of bits can be divided by 8 again. Next, we encode 'PHPS' based on Example 1.

Original Characters P H P S
 

 
ASCII code decimal value 80 72 80 83
 

 
Binary Value 0 1 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0 1 1
 

 
Every 6 groups 0 1 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0 1 1 0 0 0 0
 

 

 

 

 

 

 

 

 

 

 

 
Corresponding decimal value 20 4 33 16 20 48
 

 
Encoded characters U E h Q U w = =

We can see that after 0-padded, it is 36 bits, which is not divisible by 8. Then '=' is used to represent a 6-bit padded sequence, and finally when two '=' are added, it is 48 bits, which is divisible by 8. At this point, the encoding ends, and finally 'PHPS' is encoded with base64 to become 'UEhQUw=='.

Base64 encoding is mainly used in transmission, storage, binary representation and other fields. It can also be used for simple encryption. However, you may not know what the content is at a glance, but you can decode it according to its encoding rules.

Filling the hole

Through a deep understanding of the base64 encoding principle, it is found that the implementation scheme for ID card number fuzzy query is not feasible at all. Since it does not work, the only way is to reverse decode and then perform fuzzy query.

The so-called reverse is to first decode the field to be retrieved in sql, and then use the decoded value to match the ID number we want to query. In this way, our mysql needs to support base64 encoding/decoding aggregation functions. After searching, it is found that it is only supported in mysqL5.6 and above versions 5.6. So we need to upgrade our mysql to at least version 5.6.

The following is the required sql

	select name,id_card from users where from_base64(id_card) like ‘%1533’;

Of course, in actual applications, 1533 will be dynamically replaced with the last 4 or 6 digits of the ID number we input.

In this way, our problem is perfectly solved. We found that it is actually very simple to solve our problem, just one SQL statement can solve it. However, the process of finding this method is still worth it. Through this problem, we have a deep understanding of the principle of base64 encoding, which will be of great help to us in designing programs in the future.

For reprinting, please send an email to 1244347461@qq.com for approval. After obtaining the author's consent, kindly include the source as a link.

Article URL:

Related Articles

Encoding an image as Base64 in PHP

Publish Date:2025/04/13 Views:198 Category:PHP

There are many ways to encode images into base64 format using several built-in functions in PHP. These functions include: pathinfo file_get_contents base64_encode mime_content_type Encode images to Base64 in PHP using file_get_contents , pa

If ELSE in MySQL

Publish Date:2025/04/11 Views:85 Category:MySQL

In this tutorial, we aim to explore how to use IF ELSE the statement in MySQL. One of the key roles of a data analyst is to gather insights from the data and produce meaningful results. It can be done with the help of several data filtering

DATETIME vs. TIMESTAMP in MySQL

Publish Date:2025/04/11 Views:117 Category:MySQL

DATETIME and TIMESTAMP are two different data types that can be used to store values ​​that must contain both a date and a time portion. In this article, we will understand how it is stored in the database and the memory required for ea

Execute multiple joins in one query in MYSQL

Publish Date:2025/04/11 Views:94 Category:MySQL

Have you ever wondered how to include multiple joins in one query in MySQL? You have come to the right place. Remember that joins allow us to access information from other tables. This information is included separately to avoid redundancy.

Joining 3 tables in MySQL

Publish Date:2025/04/11 Views:187 Category:MySQL

In this tutorial, we will learn how to join three tables in MySQL. Businesses and organizations may have to visualize three tables simultaneously based on certain matching columns common to all three tables. This operation is allowed in MyS

Use of UPDATE JOIN in MySQL

Publish Date:2025/04/11 Views:85 Category:MySQL

This tutorial will explain how to use the statement in MySQL database UPDATE JOIN . We generally use joins to iterate over the rows in a particular table which may or may not have similar rows in other tables. We can UPDATE use JOIN the cla

How to use the Row_Number() function in MySQL

Publish Date:2025/04/11 Views:142 Category:MySQL

In this tutorial, we will explain how to use the VALUES function in MySQL ROW_NUMBER() . This is a sorting method that assigns consecutive numbers within a partition starting from 1. It is important to note that no two rows within a partiti

Multiple primary keys in MySQL

Publish Date:2025/04/11 Views:66 Category:MySQL

In this tutorial, our goal is to explore the concept of multiple primary keys for a table in MySQL. Many times, businesses and organizations have to assign certain columns as primary keys. This primary key has multiple uses and reasons to b

Displaying foreign keys in MySQL

Publish Date:2025/04/11 Views:55 Category:MySQL

In this tutorial, we aim to explore how to display foreign keys for tables and columns in MySQL. The type of key that references a primary key, also known as the primary key of another table, is called a foreign key. Understanding the forei

Scan to Read All Tech Tutorials

Social Media
  • https://www.github.com/onmpw
  • qq:1244347461

Recommended

Tags

Scan the Code
Easier Access Tutorial