
Introduction to Consistent Hashing Algorithm and PHP Code Implementation

Author: JIYIK  Last Updated: 2025/03/19

When designing a distributed system architecture, in order to improve the system's load capacity, different data needs to be distributed to different service nodes. Therefore, a distribution mechanism is needed, which is actually an algorithm, to achieve this function. Here we use the Consistent Hashing algorithm.

Before formally introducing the Consistent Hashing algorithm, let's first look at a simple hash algorithm, which is to select nodes by taking the remainder. The specific steps are as follows:

1. Create a hash table based on the number of nodes in the cluster.
2. Compute an integer hash value from the key name, and take the remainder of that hash value divided by the number of nodes.
3. Use that remainder to look up the node in the hash table.

Suppose a cluster has n server nodes, numbered 0, 1, 2, ..., n-1, and a piece of data (key, value) needs to be stored. How do we choose the server node? Following the steps above, we compute the hash value of the key and take the remainder modulo n (the number of nodes); the result identifies the node we want. Expressed as a formula: num = hash(key) % n, where hash() is a function that computes an integer hash value. There are certain requirements on hash(): if it is well designed, the computed num values are evenly distributed over 0, 1, 2, ..., n-1, so that all server nodes are used instead of the data piling up on one or a few of them. The specific implementation of hash() is not the focus of this article.
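
To make this concrete, here is a minimal PHP sketch of remainder-based selection. crc32() is used as the hash function purely for illustration; the node names and key are hypothetical.

$nodes = array('node0', 'node1', 'node2'); // n = 3 service nodes
$key   = 'user:1001';                      // a hypothetical key
// On 64-bit PHP, crc32() returns a non-negative integer,
// so the remainder is always a valid index into $nodes.
$num = crc32($key) % count($nodes);        // num = hash(key) % n
echo $key . '=>' . $nodes[$num], "<br />";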

Although this remainder method is simple, it causes big problems when applied to a real production system. Suppose we have 23 service nodes. By the method above, a key maps to each node with probability 1/23. Now add one node: hash(key) % 23 becomes hash(key) % 24, so a key keeps its original node with probability only 1/24 and is reassigned to a different node with probability 23/24. Similarly, when a node is removed, a key is reassigned with probability 22/23.
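
This effect is easy to check empirically. The following sketch is a hypothetical experiment, not part of the original article: it counts how many of 100000 generated keys map to a different node when the cluster grows from 23 to 24 nodes under the remainder method.

// Count keys that change nodes when n goes from 23 to 24.
// Roughly 23/24 (about 95.8%) of them are expected to move.
$moved = 0;
$total = 100000;
for ($i = 0; $i < $total; $i++) {
    $hash = crc32('key_' . $i); // non-negative on 64-bit PHP
    if ($hash % 23 != $hash % 24) {
        $moved++;
    }
}
echo 'fraction moved: ' . ($moved / $total), "<br />";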

Given this, we need a way to avoid, or at least reduce, the drop in hit rate when scaling horizontally. That method is the algorithm we are going to introduce: the Consistent Hashing algorithm.

To understand how Consistent Hashing works, imagine the unit interval [0, 1) wrapped evenly around a circle in the clockwise direction.

Suppose there are n service nodes, numbered 0, 1, 2, ..., n-1. We need a hash() function to compute each node's hash value. If the chosen hash() function returns values in the range [0, R), then the formula v = hash(node) / R yields a value v distributed in the unit interval [0, 1). In this way, our service nodes can be placed on the circle.

Of course, using the unit interval [0, 1) as the circle is just one option. There are other ways to draw the circle, for example using the interval [0, 2^32) directly; in that case the chosen hash() function must return values between 0 and 2^32 - 1.

Here we still take [0, 1) as an example.
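
Placing a single node on that circle then takes one line of code. A minimal sketch, assuming crc32() as the hash function (so R = 2^32):

// Map one service node onto the unit circle [0, 1).
// crc32() returns a value in [0, 2^32), so dividing by 2^32 lands in [0, 1).
$node = '192.168.5.201';
$v = crc32($node) / pow(2, 32);
echo $node . ' sits at position ' . $v . ' on the circle', "<br />";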

Let's take 3 service nodes as an example.

Suppose these three nodes fall at scattered positions on the circle. Now suppose we have a piece of data (key, value) that needs to be stored. The next step is to map this key onto the circle using the same method.

Then, starting from the key's position on the circle, we search clockwise for a service node; the first one we encounter is where the data is stored. In our example, suppose that node is service node 1, so the data is stored on node 1.

Similarly, when other (key, value) pairs need to be stored, the service node is selected in the same way as above.

Now let's see whether this method solves the horizontal scaling problem mentioned at the beginning.

Suppose we now add a new service node 3 to the circle.

If node 3 lands between key1 and the node key1 previously mapped to, then only key1 changes its storage node; all other data still finds its original node. In general, for a cluster of n service nodes, when one node is added, a given piece of data changes its storage node with probability only 1/(n+1), which is far smaller than with the remainder method. Removing a node works the same way: only the data that was stored on the removed node, a fraction of about 1/n, has to be reassigned (to the next node clockwise). That probability is also very small.

Here is a simple PHP implementation of this process.

$nodes = array('192.168.5.201','192.168.5.102','192.168.5.111');
$keys = array('onmpw', 'jiyi', 'onmpw_key', 'jiyi_key', 'www','www_key','key1');
$buckets = array(); // hash dictionary of the nodes
$maps = array();    // stores the mapping between keys and nodes
/**
 * Build the node dictionary -- place the nodes on the circle
 * of the unit interval [0, 1)
 */
foreach ($nodes as $node) {
    $crc = crc32($node) / pow(2, 32); // CRC value, scaled into [0, 1)
    $buckets[] = array('index' => $crc, 'node' => $node);
}

/*
 * Sort the buckets by index
 */
sort($buckets);
/*
 * Hash each key to find its position on the circle,
 * then walk clockwise from that position to the first service node.
 */
foreach ($keys as $key) {
    $flag = false; // whether a service node has been found
    $crc = crc32($key) / pow(2, 32); // hash value of the key
    for ($i = 0; $i < count($buckets); $i++) {
        if ($buckets[$i]['index'] > $crc) {
            /*
             * Because buckets is already sorted, the first node whose
             * index is greater than the key's hash is the node we want.
             */
            $maps[$key] = $buckets[$i]['node'];
            $flag = true;
            break;
        }
    }
    if (!$flag) {
        // Nothing found: wrap around and use the first node in buckets
        $maps[$key] = $buckets[0]['node'];
    }
}
foreach ($maps as $key => $val) {
    echo $key . '=>' . $val, "<br />";
}

The result of running this code is as follows

onmpw=>192.168.5.102
jiyi=>192.168.5.201
onmpw_key=>192.168.5.201
jiyi_key=>192.168.5.102
www=>192.168.5.201
www_key=>192.168.5.201
key1=>192.168.5.111
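
One implementation detail worth noting: sort($buckets) compares the nested arrays element by element, which happens to order them by 'index' because that is the first element inserted into each bucket. An explicit comparator states the intent more clearly; a possible alternative (a stylistic sketch, not required by the algorithm):

// Sort buckets explicitly by their position on the circle.
usort($buckets, function ($a, $b) {
    if ($a['index'] == $b['index']) return 0;
    return ($a['index'] < $b['index']) ? -1 : 1;
});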

Then we add a service node and modify the code as follows

$nodes = array('192.168.5.201','192.168.5.102','192.168.5.111','192.168.5.11');

The rest of the code remains unchanged. Running it again produces the following output:

onmpw=>192.168.5.102
jiyi=>192.168.5.201
onmpw_key=>192.168.5.11
jiyi_key=>192.168.5.102
www=>192.168.5.201
www_key=>192.168.5.201
key1=>192.168.5.111

We can see that only onmpw_key was remapped to a different service node; all the other keys kept their original nodes.

Compared with the remainder method, the hit rate after scaling is greatly improved. Does this solve the problem we encountered earlier?

Actually, not yet, because the hash values are not distributed all that uniformly. In a real system the service nodes may end up clustered together on the circle, so that most keys map to one or a few nodes while the remaining nodes go unused. Even if this is not a fatal problem, why waste even a single server?

So this clustering concentrates the data on a few service nodes and wastes the others. How do we solve it? The approach people came up with is to create virtual nodes for each node. What does that mean? For each node j, create m replicas. Each of these m replicas gets a different hash value from the hash() function, but every one of them stores the information of node j itself. These virtual nodes are then scattered around the circle. For example, with two service nodes and three virtual nodes copied for each, all of these points (virtual nodes included) are spread around the circle.

With the virtual nodes in place, the service nodes appear evenly distributed around the circle. In summary, it is just a small refinement of the method above: copy a number of virtual nodes for each node.

Therefore, our code does not need much modification. To keep it easy to read, the entire program is listed here.

$nodes = array('192.168.5.201','192.168.5.102','192.168.5.111');
$keys = array('onmpw', 'jiyi', 'onmpw_key', 'jiyi_key', 'www','www_key','key1');
// Added variable (a change from the first version)
$replicas = 160; // number of virtual replicas per node
$buckets = array(); // hash dictionary of the nodes
$maps = array();    // stores the mapping between keys and nodes
/**
 * Build the node dictionary -- place the nodes (and their replicas)
 * on the circle of the unit interval [0, 1)
 */
foreach ($nodes as $node) {
    // Changed: create $replicas virtual nodes for each physical node
    for ($i = 1; $i <= $replicas; $i++) {
        $crc = crc32($node . '.' . $i) / pow(2, 32); // CRC value, scaled into [0, 1)
        $buckets[] = array('index' => $crc, 'node' => $node);
    }
}
/*
 * Sort the buckets by index
 */
sort($buckets);
/*
 * Hash each key to find its position on the circle,
 * then walk clockwise from that position to the first service node.
 */
foreach ($keys as $key) {
    $flag = false; // whether a service node has been found
    $crc = crc32($key) / pow(2, 32); // hash value of the key
    for ($i = 0; $i < count($buckets); $i++) {
        if ($buckets[$i]['index'] > $crc) {
            /*
             * Because buckets is already sorted, the first node whose
             * index is greater than the key's hash is the node we want.
             */
            $maps[$key] = $buckets[$i]['node'];
            $flag = true;
            break;
        }
    }
    if (!$flag) {
        // Nothing found: wrap around and use the first node in buckets
        $maps[$key] = $buckets[0]['node'];
    }
}
foreach ($maps as $key => $val) {
    echo $key . '=>' . $val, "<br />";
}

The changes are marked with comments in the code; as you can see, they are relatively minor.
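
To see how much the virtual nodes even out the distribution, you can count how many of a batch of generated keys land on each physical node. The sketch below is a hypothetical measurement harness; pick_node() is a helper introduced here for illustration, wrapping the same clockwise lookup loop used above:

// Count the key distribution across physical nodes on the ring.
function pick_node($crc, $buckets) {
    for ($i = 0; $i < count($buckets); $i++) {
        if ($buckets[$i]['index'] > $crc) {
            return $buckets[$i]['node'];
        }
    }
    return $buckets[0]['node']; // wrapped around the circle
}

$counts = array();
for ($i = 0; $i < 10000; $i++) {
    $node = pick_node(crc32('key_' . $i) / pow(2, 32), $buckets);
    $counts[$node] = isset($counts[$node]) ? $counts[$node] + 1 : 1;
}
print_r($counts); // with $replicas = 160 the counts should be roughly equal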

At this point, I believe you have a clearer understanding of Consistent Hashing. Hash algorithms of this kind are widely used, for example in memcached clusters and Nginx load balancing, so understanding them is of great help to us.

If anything in the description of the algorithm above is unclear or inaccurate, you are welcome to point it out.

