深入剖析 redis 数据淘汰策略

概述

在 redis 中，允许用户设置最大使用内存大小 server.maxmemory，在内存限定的情况下是很有用的。譬如，在一台 8G 机子上部署了 4 个 redis 服务点，每一个服务点分配 1.5G 的内存大小，减少内存紧张的情况，由此获取更为稳健的服务。

redis 内存数据集大小上升到一定大小的时候，就会施行数据淘汰策略。redis 提供 6种数据淘汰策略：

volatile-lru：从已设置过期时间的数据集（server.db[i].expires）中挑选最近最少使用的数据淘汰
volatile-ttl：从已设置过期时间的数据集（server.db[i].expires）中挑选将要过期的数据淘汰
volatile-random：从已设置过期时间的数据集（server.db[i].expires）中任意选择数据淘汰
allkeys-lru：从数据集（server.db[i].dict）中挑选最近最少使用的数据淘汰
allkeys-random：从数据集（server.db[i].dict）中任意选择数据淘汰
no-enviction（驱逐）：禁止驱逐数据

redis 确定驱逐某个键值对后，会删除这个数据并，并将这个数据变更消息发布到本地（AOF 持久化）和从机（主从连接）。

LRU 数据淘汰机制

在服务器配置中保存了 lru 计数器 server.lrulock，会定时（redis 定时程序 serverCorn()）更新，server.lrulock 的值是根据 server.unixtime 计算出来的。

另外，从 struct redisObject 中可以发现，每一个 redis 对象都会设置相应的 lru。可以想象的是，每一次访问数据的时候，会更新 redisObject.lru。

LRU 数据淘汰机制是这样的：在数据集中随机挑选几个键值对，取出其中 lru 最大的键值对淘汰。所以，你会发现，redis 并不是保证取得所有数据集中最近最少使用（LRU）的键值对，而只是随机挑选的几个键值对中的。

Code Sample
01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53	`// redisServer 保存了 lru 计数器` `struct` `redisServer {` `...` `unsigned lruclock:22;` `/* Clock incrementing every minute, for LRU /` `...` `};` `// 每一个 redis 对象都保存了 lru` `#define REDIS_LRU_CLOCK_MAX ((1<<21)-1) / Max value of obj->lru /` `#define REDIS_LRU_CLOCK_RESOLUTION 10 / LRU clock resolution in seconds /` `typedef` `struct` `redisObject {` `// 刚刚好 32 bits` `// 对象的类型，字符串/列表/集合/哈希表` `unsigned type:4;` `// 未使用的两个位` `unsigned notused:2;` `/ Not used /` `// 编码的方式，redis 为了节省空间，提供多种方式来保存一个数据` `// 譬如：“123456789” 会被存储为整数 123456789` `unsigned encoding:4;` `unsigned lru:22;` `/ lru time (relative to server.lruclock) /` `// 引用数` `int` `refcount;` `// 数据指针` `void` `ptr;` `} robj;` `// redis 定时执行程序。联想：linux cron` `int` `serverCron(struct` `aeEventLoop eventLoop,long` `long` `id,void` `clientData) {` `......` `/* We have just 22 bits per object for LRU information.` `* So we use an (eventually wrapping) LRU clock with 10 seconds resolution.` `* 2^22 bits with 10 seconds resolution is more or less 1.5 years.` `` ` Note that even if this will wrap after 1.5 years it's not a problem,` `* everything will still work but just some object will appear younger` `* to Redis. But for this to happen a given object should never be touched` `* for 1.5 years.` `` ` Note that you can change the resolution altering the` `* REDIS_LRU_CLOCK_RESOLUTION define.` `*/` `updateLRUClock();` `......` `}` `// 更新服务器的 lru 计数器` `void` `updateLRUClock(void) {` `server.lruclock = (server.unixtime/REDIS_LRU_CLOCK_RESOLUTION) &` `REDIS_LRU_CLOCK_MAX;` `}`

TTL 数据淘汰机制

redis 数据集数据结构中保存了键值对过期时间的表，即 redisDb.expires。和 LRU 数据淘汰机制类似，TTL 数据淘汰机制是这样的：从过期时间的表中随机挑选几个键值对，取出其中 ttl 最大的键值对淘汰。同样你会发现，redis 并不是保证取得所有过期时间的表中最快过期的键值对，而只是随机挑选的几个键值对中的。

总结

redis 每服务客户端执行一个命令的时候，会检测使用的内存是否超额。如果超额，即进行数据淘汰。

Code Sample
001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050 051 052 053 054 055 056 057 058 059 060 061 062 063 064 065 066 067 068 069 070 071 072 073 074 075 076 077 078 079 080 081 082 083 084 085 086 087 088 089 090 091 092 093 094 095 096 097 098 099 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207	`// 执行命令` `int` `processCommand(redisClient c) {` `......` `// 内存超额` `/ Handle the maxmemory directive.` `` ` First we try to free some memory if possible (if there are volatile` `* keys in the dataset). If there are not the only thing we can do` `* is returning an error. /` `if` `(server.maxmemory) {` `int` `retval = freeMemoryIfNeeded();` `if` `((c->cmd->flags & REDIS_CMD_DENYOOM) && retval == REDIS_ERR) {` `flagTransaction(c);` `addReply(c, shared.oomerr);` `return` `REDIS_OK;` `}` `}` `......` `}` `// 如果需要，是否一些内存` `int` `freeMemoryIfNeeded(void) {` `size_t` `mem_used, mem_tofree, mem_freed;` `int` `slaves = listLength(server.slaves);` `// redis 从机回复空间和 AOF 内存大小不计算入 redis 内存大小` `/ Remove the size of slaves output buffers and AOF buffer from the` `* count of used memory. /` `mem_used = zmalloc_used_memory();` `// 从机回复空间大小` `if` `(slaves) {` `listIter li;` `listNode ln;` `listRewind(server.slaves,&li);` `while((ln = listNext(&li))) {` `redisClient slave = listNodeValue(ln);` `unsignedlong` `obuf_bytes = getClientOutputBufferMemoryUsage(slave);` `if` `(obuf_bytes > mem_used)` `mem_used = 0;` `else` `mem_used -= obuf_bytes;` `}` `}` `// server.aof_buf && server.aof_rewrite_buf_blocks` `if` `(server.aof_state != REDIS_AOF_OFF) {` `mem_used -= sdslen(server.aof_buf);` `mem_used -= aofRewriteBufferSize();` `}` `// 内存是否超过设置大小` `/ Check if we are over the memory limit. /` `if` `(mem_used <= server.maxmemory)return` `REDIS_OK;` `// redis 中可以设置内存超额策略` `if` `(server.maxmemory_policy == REDIS_MAXMEMORY_NO_EVICTION)` `return` `REDIS_ERR;/ We need to free memory, but policy forbids. /` `/ Compute how much memory we need to free. /` `mem_tofree = mem_used - server.maxmemory;` `mem_freed = 0;` `while` `(mem_freed < mem_tofree) {` `int` `j, k, keys_freed = 0;` `// 遍历所有数据集` `for` `(j = 0; j < server.dbnum; j++) {` `long` `bestval = 0;/ just to prevent warning /` `sds bestkey = NULL;` `struct` `dictEntry de;` `redisDb db = server.db+j;` `dict dict;` `// 不同的策略，选择的数据集不一样` `if` `(server.maxmemory_policy == REDIS_MAXMEMORY_ALLKEYS_LRU \|\|` `server.maxmemory_policy == REDIS_MAXMEMORY_ALLKEYS_RANDOM)` `{` `dict = server.db[j].dict;` `}else` `{` `dict = server.db[j].expires;` `}` `// 数据集为空，继续下一个数据集` `if` `(dictSize(dict) == 0)continue;` `// 随机淘汰随机策略：随机挑选` `/* volatile-random and allkeys-random policy /` `if` `(server.maxmemory_policy == REDIS_MAXMEMORY_ALLKEYS_RANDOM \|\|` `server.maxmemory_policy == REDIS_MAXMEMORY_VOLATILE_RANDOM)` `{` `de = dictGetRandomKey(dict);` `bestkey = dictGetKey(de);` `}` `// LRU 策略：挑选最近最少使用的数据` `/ volatile-lru and allkeys-lru policy /` `else` `if` `(server.maxmemory_policy == REDIS_MAXMEMORY_ALLKEYS_LRU \|\|` `server.maxmemory_policy == REDIS_MAXMEMORY_VOLATILE_LRU)` `{` `// server.maxmemory_samples 为随机挑选键值对次数` `// 随机挑选 server.maxmemory_samples个键值对，驱逐最近最少使用的数据` `for` `(k = 0; k < server.maxmemory_samples; k++) {` `sds thiskey;` `long` `thisval;` `robj o;` `// 随机挑选键值对` `de = dictGetRandomKey(dict);` `// 获取键` `thiskey = dictGetKey(de);` `/* When policy is volatile-lru we need an additional lookup` `* to locate the real key, as dict is set to db->expires. /` `if` `(server.maxmemory_policy == REDIS_MAXMEMORY_VOLATILE_LRU)` `de = dictFind(db->dict, thiskey);` `o = dictGetVal(de);` `// 计算数据的空闲时间` `thisval = estimateObjectIdleTime(o);` `// 当前键值空闲时间更长，则记录` `/ Higher idle time is better candidate for deletion /` `if` `(bestkey == NULL \|\| thisval > bestval) {` `bestkey = thiskey;` `bestval = thisval;` `}` `}` `}` `// TTL 策略：挑选将要过期的数据` `/ volatile-ttl /` `else` `if` `(server.maxmemory_policy == REDIS_MAXMEMORY_VOLATILE_TTL) {` `// server.maxmemory_samples 为随机挑选键值对次数` `// 随机挑选 server.maxmemory_samples个键值对，驱逐最快要过期的数据` `for` `(k = 0; k < server.maxmemory_samples; k++) {` `sds thiskey;` `long` `thisval;` `de = dictGetRandomKey(dict);` `thiskey = dictGetKey(de);` `thisval = (long) dictGetVal(de);` `/ Expire sooner (minor expire unix timestamp) is better` `* candidate for deletion /` `if` `(bestkey == NULL \|\| thisval < bestval) {` `bestkey = thiskey;` `bestval = thisval;` `}` `}` `}` `// 删除选定的键值对` `/ Finally remove the selected key. /` `if` `(bestkey) {` `long` `long` `delta;` `robj keyobj = createStringObject(bestkey,sdslen(bestkey));` `// 发布数据更新消息，主要是 AOF 持久化和从机` `propagateExpire(db,keyobj);` `// 注意， propagateExpire() 可能会导致内存的分配， propagateExpire() 提前执行就是因为 redis 只计算 dbDelete() 释放的内存大小。倘若同时计算 dbDelete() 释放的内存和 propagateExpire() 分配空间的大小，与此同时假设分配空间大于释放空间，就有可能永远退不出这个循环。` `// 下面的代码会同时计算 dbDelete() 释放的内存和 propagateExpire() 分配空间的大小：` `// propagateExpire(db,keyobj);` `// delta = (long long) zmalloc_used_memory();` `// dbDelete(db,keyobj);` `// delta -= (long long) zmalloc_used_memory();` `// mem_freed += delta;` `/////////////////////////////////////////` `/* We compute the amount of memory freed by dbDelete() alone.` `* It is possible that actually the memory needed to propagate` `* the DEL in AOF and replication link is greater than the one` `* we are freeing removing the key, but we can't account for` `* that otherwise we would never exit the loop.` `` ` AOF and Output buffer memory will be freed eventually so` `* we only care about memory used by the key space. /` `// 只计算 dbDelete() 释放内存的大小` `delta = (long` `long) zmalloc_used_memory();` `dbDelete(db,keyobj);` `delta -= (long` `long) zmalloc_used_memory();` `mem_freed += delta;` `server.stat_evictedkeys++;` `// 将数据的删除通知所有的订阅客户端` `notifyKeyspaceEvent(REDIS_NOTIFY_EVICTED,"evicted",` `keyobj, db->id);` `decrRefCount(keyobj);` `keys_freed++;` `// 将从机回复空间中的数据及时发送给从机` `/ When the memory to free starts to be big enough, we may` `* start spending so much time here that is impossible to` `* deliver data to the slaves fast enough, so we force the` `* transmission here inside the loop. /` `if` `(slaves) flushSlavesOutputBuffers();` `}` `}` `// 未能释放空间，且此时 redis 使用的内存大小依旧超额，失败返回` `if` `(!keys_freed)return` `REDIS_ERR;/ nothing to free... */` `}` `return` `REDIS_OK;` `}`

捣乱 2014-5-27