JIYIK CN >

Current Location:Home > Learning > ALGORITHM >

Distributed thinking in Memcached

Author:JIYIK Last Updated:2025/03/19 Views:

Memcached is known as a high-performance distributed cache system. Speaking of distribution, Memcached is worth analyzing. Its distribution mechanism is different from that of general distributed service systems. In distributed service systems, there is communication between nodes in order to ensure data consistency. However, for Memcached, there is no communication between nodes, that is, the Memcached cluster is a set of single-point services, and there is no communication between nodes. All its distribution-related aspects are implemented on the client. This is reflected in an article on Memcached's github titled "A Story of Caching". If you are interested, you can read this article, which will give you a clear understanding of Memcached.

Let us first verify our above statement through a case.

Example 1

There are two Memcached servers. We use PHP to add data to Memcached.

Code 1

$MEMCACHE_SERVERS = array(
	"192.168.5.111", //memcached1
	"192.168.5.102" //memcached2
);
$memcache = new Memcache();
foreach($MEMCACHE_SERVERS as $server){
	$memcache->addServer ( $server );  //将两个台服务器地址都添加进连接池
}
$memcache->set("onmpw", 5);
$memcache->set("jiyi", 4);

After the above PHP code is executed, according to the principle of distributed system, both servers should have data with keys onmpw and jiyi, but is this true? We can log in to the two services separately to check.

Connect 5.111
# telnet 192.168.5.111 11211

Trying 192.168.5.111...
Connected to 192.168.5.111 (192.168.5.111).
Escape character is '^]'.
get jiyi
VALUE jiyi 768 1
4
END
get onmpw   //在5.111上并没有onmpw这条数据
END

Connect 5.102
Trying 192.168.5.102...
Connected to 192.168.5.102 (192.168.5.102).
Escape character is '^]'.
get onmpw
VALUE onmpw 768 1
5
END
get jiyi    //在5.102上同样是没有jiyi这条数据
END

This shows that the Memcached server does not synchronize data in the cluster.

How does the data distribute different key-value pairs to different nodes? This is the distribution that the client needs to achieve. Here, an algorithm is involved - the hash algorithm. There are two hash algorithms involved: one is the remainder calculation distribution, and the other is the consistent hash algorithm (Consistent Hashing). The consistent hash algorithm is systematically described in the article Introduction to Consistent Hashing Algorithm and PHP Implementation . However, the description of the remainder method is not detailed enough, so here is a slightly more detailed description to make up for the shortcomings of that article.

General hash algorithm - calculation of dispersion based on remainder

First, a hash table is created based on the number of nodes served by the cluster. Then, the integer hash value of the key name is calculated based on the key name. The remainder of the hash value is taken with the remainder of the number of nodes. Finally, the node is retrieved from the hash table based on the remainder.

The CRC (Cyclic Redundancy Check) is used when calculating the hash value of the key. The following simplified PHP code is used to illustrate

Code 2

$nodes = array('node1','node2','node3');   //nodes代表服务器节点
$keys = array('onmpw', 'jiyi', 'onmpw_key', 'jiyi_key', 'www','www_key'); //这是key值
foreach( $keys as $key) {
	$crc = crc32($key);            // CRC値
	$mod = $crc % ( count($nodes) );
	$server = $nodes[ $mod ];       // 根据余数选择服务器
	printf("%s => %s\n", $key, $server);
}

First, find the crc value of the key, and then take the remainder with the number of nodes. The execution result of the above code is as follows

onmpw => node2
jiyi => node2
onmpw_key => node1
jiyi_key => node2
www => node1
www_key => node3

According to the above results, we can see that onmpw, jiyi and jiyi_key are scattered to node2, onmpw_key and www are scattered to node1, and www is scattered to node3.

PHP client distributed implementation

The Memcache client implemented in PHP supports the remainder dispersion algorithm. Let's take a look at the part of the source code that involves this algorithm. The php-memcache version used here is memcache-3.0.6

mmc_t *mmc_standard_find_server(void *s, const char *key, unsigned int key_len TSRMLS_DC) {      
	mmc_standard_state_t *state = s;     
	if (state->num_servers > 1) {  
	   /* "new-style" hash */
	   //计算key值的哈希值
	   unsigned int hash = (mmc_hash(state->hash, key, key_len) >> 16) & 0x7fff;
	    //用得到的哈希值对节点数取余
	   return state->buckets[(hash ? hash : 1) % state->num_buckets];
	}
	return state->buckets[0];
}

We can see that the way to calculate the hash value is to use the mmc_hash() function to get a value, then shift the value 16 bits to the right, and finally do an 'AND' operation with 0x7fff.

Here we are looking at how mmc_hash() calculates the value of the key.

#define mmc_hash(hash, key, key_len) ((hash)->finish((hash)->combine((hash)->init(), (key), (key_len))))

This is too tricky, what kind of function is this? Don't worry, this is nothing more than a macro definition. You can simply think that mmc_hash(hash, key, key_len) is actually a value. What is this value? It is obtained by ((hash)->finish((hash)->combine((hash)->init(), (key), (key_len)))). In fact, the main parts are still the finish(), combine() and init() functions. Let's look at the implementation of these three functions.

The above three functions are defined by the following structure

typedef struct mmc_hash_function {
	mmc_hash_function_init          init;
	mmc_hash_function_combine       combine;
	mmc_hash_function_finish        finish;
} mmc_hash_function_t;

At the same time, the following mapping is done for mmc_hash_function_t

extern mmc_hash_function_t mmc_hash_crc32;

That is to say, the implementation of these functions is implemented in mmc_hash_crc32. Continuing to search, we found the following code

mmc_hash_function_t mmc_hash_crc32 = {
	mmc_hash_crc32_init,
	mmc_hash_crc32_combine,
	mmc_hash_crc32_finish  
};

To trace the root, in fact, init(), combine() and finish() functions correspond to mmc_hash_crc32_init(), mmc_hash_crc32_combine() and mmc_hash_crc32_finish() respectively. Let's look at the implementation of these functions

static unsigned int mmc_hash_crc32_init()                                       { return  ~0; }
static unsigned int mmc_hash_crc32_finish(unsigned int seed) { return ~seed; }
static unsigned int mmc_hash_crc32_combine(unsigned int seed, const void *key, unsigned int key_len)
{
	const char *p = (const char *)key, *end = p + key_len;
	while (p < end) {
	      CRC32(seed, *(p++));
	}
	return seed;
}

Do we see familiar figures in the above code? The underlying implementation also uses the crc32 function (this function is defined in ext/standard/crc32.h of the PHP source code. If you are interested, you can study it). By the way, the value returned in the mmc_hash_crc32_init() function is ~0, and its value is -1. The '~' indicates bitwise inversion. When defining the type of the function's return value, it is unsigned. So the binary representation of -1 is 11111111111111111111111111111111111 (32 bits in total).

We can see that the implementation method using remainder calculation is very simple, and its data is relatively scattered. However, this method also has a big flaw. When we need to expand the nodes horizontally, the cost of cache reorganization is also very high. Next, we modify Code 2 and add node4 to nodes.

Code three

$nodes = array('node1','node2','node3','node4');   //nodes代表服务器节点
$keys = array('onmpw', 'jiyi', 'onmpw_key', 'jiyi_key', 'www','www_key'); //这是key值
foreach( $keys as $key) {
	$crc = crc32($key);            // CRC値
	$mod = $crc % ( count($nodes) );
	$server = $nodes[ $mod ];       // 根据余数选择服务器
	printf("%s => %s\n", $key, $server);
}

Running the code again yields the following results

onmpw => node4
jiyi => node4
onmpw_key => node3
jiyi_key => node3
www => node2
www_key => node3

We can see from the above results that only www_key is hit. In this case, the hit rate decreases when adding nodes, and the cache efficiency decreases instantly when adding Memcached servers. At this time, the load will be concentrated on the database server, which may cause the database to be unable to provide normal services due to excessive pressure.

Therefore, a new method is needed to solve the problem of cluster horizontal expansion. Therefore, with the emergence of this problem, a new distributed method was born - Consistent Hashing.

Consistent Hashing

The consistent hashing algorithm is the same as the general hashing algorithm, which is to build a dictionary for the nodes in the cluster. Because the algorithm is described in detail in the article "Introduction to Consistent Hashing Algorithm and PHP Implementation", and there is a simple PHP code implementation, this chapter will not go into too much detail about Consistent Hashing.

Unlike general hash algorithms, this method can minimize the movement of data when nodes are added or removed. In other words, the hit rate will be greatly improved. So how does it achieve this?

First, find the hash value of the memcached server (node) and assign it to the circle of 0 to 1. Then use the same method to find the hash value of the key that stores the data and map it to the circle. Then start searching clockwise from the location where the data is mapped and save the data to the first server found. If more than 1 server is still not found, it will be saved to the first memcached server.

The core part of the implementation code in the PHP-Memcache client is as follows

void mmc_consistent_add_server(void *s, mmc_t *mmc, unsigned int weight)
{      
	mmc_consistent_state_t *state = s;
	int i, key_len, points = weight * MMC_CONSISTENT_POINTS;
	unsigned int seed = state->hash->init(), hash;     
	/* buffer for "host:port-i\0" */
	char *key = emalloc(strlen(mmc->host) + MAX_LENGTH_OF_LONG * 2 + 3);
	key_len = sprintf(key, "%s:%d-", mmc->host, mmc->tcp.port);
	seed = state->hash->combine(seed, key, key_len);
	/* add weight * MMC_CONSISTENT_POINTS number of points for this server */
	/* 申请保存所有节点及其各自的虚拟节点的空间 */
	state->points = erealloc(state->points, sizeof(*state->points) * (state->num_points + points));
	for (i=0; i<points; i++) {
	     key_len = sprintf(key, "%d", i);
	     // 对每个节点及虚拟节点计算hash值
	     hash = state->hash->finish(state->hash->combine(seed, key, key_len));
	     // 保存个虚拟节点的hash值和该节点的信息
	     state->points[state->num_points + i].server = mmc;
	     state->points[state->num_points + i].point = hash;
	}       
	state->num_points += points;           
	state->num_servers++;                  
	state->buckets_populated = 0;                 
	efree(key);                                    
}

The above code disperses the service nodes so that each node and its respective virtual node are randomly and evenly distributed on the circle.

mmc_t *mmc_consistent_find_server(void *s, const char *key, unsigned int key_len TSRMLS_DC)
{
	mmc_consistent_state_t *state = s;       
	if (state->num_servers > 1) {  
	    unsigned int hash;    
	    if (!state->buckets_populated) {
	        mmc_consistent_populate_buckets(state);
	    }
	    //计算key的hash值             
	    hash = mmc_hash(state->hash, key, key_len);
	    return state->buckets[hash % MMC_CONSISTENT_BUCKETS];

	}
	//没有找到相应的服务节点,则返回第一个节点
	return state->points[0].server;
}

This core code searches for the service node to be stored according to the key of a key-value pair when storing it. Similarly, the key value is hashed, and then according to the calculated hash value, the key must be distributed at a certain position on the circle, and then the first service node to be encountered is searched clockwise from that position. If no service node is encountered until the end, the first service node is returned directly. This is the meaning of return state->points[0].server.

Summary: Both of the above methods have been implemented in the PHP-Memcache client. Again, the distributed ideas of Memcached are reflected in the client. Different clients may have different implementation methods, but I believe the basic ideas are the same. At the same time, you are welcome to give different opinions.

For reprinting, please send an email to 1244347461@qq.com for approval. After obtaining the author's consent, kindly include the source as a link.

Article URL:

Related Articles

Learning the Sorting Algorithm - Insertion Sort (Concepts)

Publish Date:2025/03/19 Views:96 Category:ALGORITHM

What is "insertion sort"? The concept is as follows: each time a record to be sorted is inserted into the previously sorted sequence according to its key size, until all records are inserted. Concepts are always somewhat abstract, and can a

Learning path of sorting algorithm - direct insertion sort

Publish Date:2025/03/19 Views:176 Category:ALGORITHM

This article follows up on Insertion Sort (Concepts) and presents the implementation steps and code for direct insertion sort. Since the Concepts section already has a large number of illustrations, it would be a bit long-winded to provide

Learning the sorting algorithm - Binary Insertion Sort

Publish Date:2025/03/19 Views:143 Category:ALGORITHM

This article follows the insertion sort (concept article) and presents the implementation steps and implementation code of the binary insertion sort Binary Insertion Sort Algorithm Steps Treat the first element of the first sequence to be s

The road to learning sorting algorithms - table insertion sort

Publish Date:2025/03/19 Views:194 Category:ALGORITHM

Table insertion sort was briefly mentioned in Insertion sort (concept) . I briefly summarized it and wrote this article. You can refer to it if you need it. Table insertion sort, as the name implies, uses an index table to sort the original

The road to learning sorting algorithms - Hill sort

Publish Date:2025/03/19 Views:52 Category:ALGORITHM

Hill sort is named after the designer of the algorithm, Hill. It is an improvement of Hill on the basis of insertion sort and can be said to be a special insertion sort. Here are the properties of insertion sort: First of all, the insertion

Things about the singleton design pattern

Publish Date:2025/03/19 Views:53 Category:ALGORITHM

The singleton design pattern is one of the most commonly used design patterns. The singleton design pattern, just by its name, you can roughly know its meaning. Single means one; instance means instance object. So a singleton has only one i

The road to learning sorting algorithms - merge sort

Publish Date:2025/03/19 Views:158 Category:ALGORITHM

Let's first look at the definition of merge sort Merge sort is an effective sorting algorithm based on the merge operation. This algorithm is a very typical application of the Divide and Conquer method. Merge the ordered subsequences to obt

The road to learning sorting algorithms - quick sort

Publish Date:2025/03/19 Views:122 Category:ALGORITHM

Quick sort is a sorting algorithm developed by Tony Hall. On average, sorting n items requires O(n log n) comparisons. In the worst case, O(n2) comparisons are required, but this is uncommon. In fact, quick sort is often significantly faste

Scan to Read All Tech Tutorials

Social Media
  • https://www.github.com/onmpw
  • qq:1244347461

Recommended

Tags

Scan the Code
Easier Access Tutorial