JIYIK CN >

Current Location:Home > Learning > ALGORITHM >

Distributed thinking in Memcached

Author:JIYIK Last Updated:2025/03/19 Views:

Memcached is known as a high-performance distributed cache system. Speaking of distribution, Memcached is worth analyzing. Its distribution mechanism is different from that of general distributed service systems. In distributed service systems, there is communication between nodes in order to ensure data consistency. However, for Memcached, there is no communication between nodes, that is, the Memcached cluster is a set of single-point services, and there is no communication between nodes. All its distribution-related aspects are implemented on the client. This is reflected in an article on Memcached's github titled "A Story of Caching". If you are interested, you can read this article, which will give you a clear understanding of Memcached.

Let us first verify our above statement through a case.

Example 1

There are two Memcached servers. We use PHP to add data to Memcached.

Code 1

$MEMCACHE_SERVERS = array(
	"192.168.5.111", //memcached1
	"192.168.5.102" //memcached2
);
$memcache = new Memcache();
foreach($MEMCACHE_SERVERS as $server){
	$memcache->addServer ( $server );  //将两个台服务器地址都添加进连接池
}
$memcache->set("onmpw", 5);
$memcache->set("jiyi", 4);

After the above PHP code is executed, according to the principle of distributed system, both servers should have data with keys onmpw and jiyi, but is this true? We can log in to the two services separately to check.

Connect 5.111
# telnet 192.168.5.111 11211

Trying 192.168.5.111...
Connected to 192.168.5.111 (192.168.5.111).
Escape character is '^]'.
get jiyi
VALUE jiyi 768 1
4
END
get onmpw   //在5.111上并没有onmpw这条数据
END

Connect 5.102
Trying 192.168.5.102...
Connected to 192.168.5.102 (192.168.5.102).
Escape character is '^]'.
get onmpw
VALUE onmpw 768 1
5
END
get jiyi    //在5.102上同样是没有jiyi这条数据
END

This shows that the Memcached server does not synchronize data in the cluster.

How does the data distribute different key-value pairs to different nodes? This is the distribution that the client needs to achieve. Here, an algorithm is involved - the hash algorithm. There are two hash algorithms involved: one is the remainder calculation distribution, and the other is the consistent hash algorithm (Consistent Hashing). The consistent hash algorithm is systematically described in the article Introduction to Consistent Hashing Algorithm and PHP Implementation . However, the description of the remainder method is not detailed enough, so here is a slightly more detailed description to make up for the shortcomings of that article.

General hash algorithm - calculation of dispersion based on remainder

First, a hash table is created based on the number of nodes served by the cluster. Then, the integer hash value of the key name is calculated based on the key name. The remainder of the hash value is taken with the remainder of the number of nodes. Finally, the node is retrieved from the hash table based on the remainder.

The CRC (Cyclic Redundancy Check) is used when calculating the hash value of the key. The following simplified PHP code is used to illustrate

Code 2

$nodes = array('node1','node2','node3');   //nodes代表服务器节点
$keys = array('onmpw', 'jiyi', 'onmpw_key', 'jiyi_key', 'www','www_key'); //这是key值
foreach( $keys as $key) {
	$crc = crc32($key);            // CRC値
	$mod = $crc % ( count($nodes) );
	$server = $nodes[ $mod ];       // 根据余数选择服务器
	printf("%s => %s\n", $key, $server);
}

First, find the crc value of the key, and then take the remainder with the number of nodes. The execution result of the above code is as follows

onmpw => node2
jiyi => node2
onmpw_key => node1
jiyi_key => node2
www => node1
www_key => node3

According to the above results, we can see that onmpw, jiyi and jiyi_key are scattered to node2, onmpw_key and www are scattered to node1, and www is scattered to node3.

PHP client distributed implementation

The Memcache client implemented in PHP supports the remainder dispersion algorithm. Let's take a look at the part of the source code that involves this algorithm. The php-memcache version used here is memcache-3.0.6

mmc_t *mmc_standard_find_server(void *s, const char *key, unsigned int key_len TSRMLS_DC) {      
	mmc_standard_state_t *state = s;     
	if (state->num_servers > 1) {  
	   /* "new-style" hash */
	   //计算key值的哈希值
	   unsigned int hash = (mmc_hash(state->hash, key, key_len) >> 16) & 0x7fff;
	    //用得到的哈希值对节点数取余
	   return state->buckets[(hash ? hash : 1) % state->num_buckets];
	}
	return state->buckets[0];
}

We can see that the way to calculate the hash value is to use the mmc_hash() function to get a value, then shift the value 16 bits to the right, and finally do an 'AND' operation with 0x7fff.

Here we are looking at how mmc_hash() calculates the value of the key.

#define mmc_hash(hash, key, key_len) ((hash)->finish((hash)->combine((hash)->init(), (key), (key_len))))

This is too tricky, what kind of function is this? Don't worry, this is nothing more than a macro definition. You can simply think that mmc_hash(hash, key, key_len) is actually a value. What is this value? It is obtained by ((hash)->finish((hash)->combine((hash)->init(), (key), (key_len)))). In fact, the main parts are still the finish(), combine() and init() functions. Let's look at the implementation of these three functions.

The above three functions are defined by the following structure

typedef struct mmc_hash_function {
	mmc_hash_function_init          init;
	mmc_hash_function_combine       combine;
	mmc_hash_function_finish        finish;
} mmc_hash_function_t;

At the same time, the following mapping is done for mmc_hash_function_t

extern mmc_hash_function_t mmc_hash_crc32;

That is to say, the implementation of these functions is implemented in mmc_hash_crc32. Continuing to search, we found the following code

mmc_hash_function_t mmc_hash_crc32 = {
	mmc_hash_crc32_init,
	mmc_hash_crc32_combine,
	mmc_hash_crc32_finish  
};

To trace the root, in fact, init(), combine() and finish() functions correspond to mmc_hash_crc32_init(), mmc_hash_crc32_combine() and mmc_hash_crc32_finish() respectively. Let's look at the implementation of these functions

static unsigned int mmc_hash_crc32_init()                                       { return  ~0; }
static unsigned int mmc_hash_crc32_finish(unsigned int seed) { return ~seed; }
static unsigned int mmc_hash_crc32_combine(unsigned int seed, const void *key, unsigned int key_len)
{
	const char *p = (const char *)key, *end = p + key_len;
	while (p < end) {
	      CRC32(seed, *(p++));
	}
	return seed;
}

Do we see familiar figures in the above code? The underlying implementation also uses the crc32 function (this function is defined in ext/standard/crc32.h of the PHP source code. If you are interested, you can study it). By the way, the value returned in the mmc_hash_crc32_init() function is ~0, and its value is -1. The '~' indicates bitwise inversion. When defining the type of the function's return value, it is unsigned. So the binary representation of -1 is 11111111111111111111111111111111111 (32 bits in total).

We can see that the implementation method using remainder calculation is very simple, and its data is relatively scattered. However, this method also has a big flaw. When we need to expand the nodes horizontally, the cost of cache reorganization is also very high. Next, we modify Code 2 and add node4 to nodes.

Code three

$nodes = array('node1','node2','node3','node4');   //nodes代表服务器节点
$keys = array('onmpw', 'jiyi', 'onmpw_key', 'jiyi_key', 'www','www_key'); //这是key值
foreach( $keys as $key) {
	$crc = crc32($key);            // CRC値
	$mod = $crc % ( count($nodes) );
	$server = $nodes[ $mod ];       // 根据余数选择服务器
	printf("%s => %s\n", $key, $server);
}

Running the code again yields the following results

onmpw => node4
jiyi => node4
onmpw_key => node3
jiyi_key => node3
www => node2
www_key => node3

We can see from the above results that only www_key is hit. In this case, the hit rate decreases when adding nodes, and the cache efficiency decreases instantly when adding Memcached servers. At this time, the load will be concentrated on the database server, which may cause the database to be unable to provide normal services due to excessive pressure.

Therefore, a new method is needed to solve the problem of cluster horizontal expansion. Therefore, with the emergence of this problem, a new distributed method was born - Consistent Hashing.

Consistent Hashing

The consistent hashing algorithm is the same as the general hashing algorithm, which is to build a dictionary for the nodes in the cluster. Because the algorithm is described in detail in the article "Introduction to Consistent Hashing Algorithm and PHP Implementation", and there is a simple PHP code implementation, this chapter will not go into too much detail about Consistent Hashing.

Unlike general hash algorithms, this method can minimize the movement of data when nodes are added or removed. In other words, the hit rate will be greatly improved. So how does it achieve this?

First, find the hash value of the memcached server (node) and assign it to the circle of 0 to 1. Then use the same method to find the hash value of the key that stores the data and map it to the circle. Then start searching clockwise from the location where the data is mapped and save the data to the first server found. If more than 1 server is still not found, it will be saved to the first memcached server.

The core part of the implementation code in the PHP-Memcache client is as follows

void mmc_consistent_add_server(void *s, mmc_t *mmc, unsigned int weight)
{      
	mmc_consistent_state_t *state = s;
	int i, key_len, points = weight * MMC_CONSISTENT_POINTS;
	unsigned int seed = state->hash->init(), hash;     
	/* buffer for "host:port-i\0" */
	char *key = emalloc(strlen(mmc->host) + MAX_LENGTH_OF_LONG * 2 + 3);
	key_len = sprintf(key, "%s:%d-", mmc->host, mmc->tcp.port);
	seed = state->hash->combine(seed, key, key_len);
	/* add weight * MMC_CONSISTENT_POINTS number of points for this server */
	/* 申请保存所有节点及其各自的虚拟节点的空间 */
	state->points = erealloc(state->points, sizeof(*state->points) * (state->num_points + points));
	for (i=0; i<points; i++) {
	     key_len = sprintf(key, "%d", i);
	     // 对每个节点及虚拟节点计算hash值
	     hash = state->hash->finish(state->hash->combine(seed, key, key_len));
	     // 保存个虚拟节点的hash值和该节点的信息
	     state->points[state->num_points + i].server = mmc;
	     state->points[state->num_points + i].point = hash;
	}       
	state->num_points += points;           
	state->num_servers++;                  
	state->buckets_populated = 0;                 
	efree(key);                                    
}

The above code disperses the service nodes so that each node and its respective virtual node are randomly and evenly distributed on the circle.

mmc_t *mmc_consistent_find_server(void *s, const char *key, unsigned int key_len TSRMLS_DC)
{
	mmc_consistent_state_t *state = s;       
	if (state->num_servers > 1) {  
	    unsigned int hash;    
	    if (!state->buckets_populated) {
	        mmc_consistent_populate_buckets(state);
	    }
	    //计算key的hash值             
	    hash = mmc_hash(state->hash, key, key_len);
	    return state->buckets[hash % MMC_CONSISTENT_BUCKETS];

	}
	//没有找到相应的服务节点,则返回第一个节点
	return state->points[0].server;
}

This core code searches for the service node to be stored according to the key of a key-value pair when storing it. Similarly, the key value is hashed, and then according to the calculated hash value, the key must be distributed at a certain position on the circle, and then the first service node to be encountered is searched clockwise from that position. If no service node is encountered until the end, the first service node is returned directly. This is the meaning of return state->points[0].server.

Summary: Both of the above methods have been implemented in the PHP-Memcache client. Again, the distributed ideas of Memcached are reflected in the client. Different clients may have different implementation methods, but I believe the basic ideas are the same. At the same time, you are welcome to give different opinions.

For reprinting, please send an email to 1244347461@qq.com for approval. After obtaining the author's consent, kindly include the source as a link.

Article URL:

Related Articles

The road to learning sorting algorithms - heap sort

Publish Date:2025/03/19 Views:145 Category:ALGORITHM

Like other sorting algorithms, let's first look at the definition of heap sort. Heapsort : refers to a sorting algorithm designed using the heap data structure. A heap is a structure that approximates a complete binary tree and satisfies th

My understanding of the Paxos algorithm

Publish Date:2025/03/19 Views:147 Category:ALGORITHM

In a distributed system, a core issue is data consistency. Therefore, the consistency algorithm is of utmost importance in a distributed system. The Paxos algorithm is designed to solve the consistency problem, but it has always been consid

Convert a binary tree to a binary search tree

Publish Date:2025/03/18 Views:52 Category:ALGORITHM

A binary tree is a non-linear data structure. It is called a binary tree because each node has at most two children. These children are called left and right children. It can also be interpreted as an undirected graph where the topmost node

In-order descendants in a binary search tree

Publish Date:2025/03/18 Views:107 Category:ALGORITHM

The in-order descendant of a binary tree is the next node in the in-order traversal of the binary tree. So, for the last node in the tree, it is NULL . Since the in-order traversal of a binary search tree is a sorted array. The node with th

Binary Search Tree Deletion

Publish Date:2025/03/18 Views:93 Category:ALGORITHM

In the article Binary Search Trees: Searching and Inserting , we discussed how to insert an element into a binary search tree and how to search for a value in a binary search tree. In this article, we will discuss how to delete a node from

Binary Search Tree Check

Publish Date:2025/03/18 Views:79 Category:ALGORITHM

A binary tree is a non-linear data structure. It is called a binary tree because each node has at most two children. These children are called left children and right children. For a binary tree to be a BST, it must satisfy the following pr

Iterative insertion into a binary search tree

Publish Date:2025/03/18 Views:158 Category:ALGORITHM

In the previous article Binary Search Tree , we discussed the recursive method to insert a node in BST. In this article, we will discuss the iterative method to insert a node in BST. It is better than the recursive method because the iterat

Binary Search Tree

Publish Date:2025/03/18 Views:181 Category:ALGORITHM

A binary search tree (BST) is an ordered binary tree data structure based on nodes. A node has a value and two child nodes (a binary tree has at most two child nodes) connected to each other. A node has a value and two child nodes (a binary

Binary Tree Traversal

Publish Date:2025/03/18 Views:115 Category:ALGORITHM

A binary tree is a non-linear data structure. It is called a binary tree because each node has at most two children. These children are called left and right children. It can also be interpreted as an undirected graph where the topmost node

Scan to Read All Tech Tutorials

Social Media
  • https://www.github.com/onmpw
  • qq:1244347461

Recommended

Tags

Scan the Code
Easier Access Tutorial