深入剖析 redis 数据淘汰策略

概述

在 redis 中,允许用户设置最大使用内存大小 server.maxmemory,在内存限定的情况下是很有用的。譬如,在一台 8G 机子上部署了 4 个 redis 服务点,每一个服务点分配 1.5G 的内存大小,减少内存紧张的情况,由此获取更为稳健的服务。

redis 内存数据集大小上升到一定大小的时候,就会施行数据淘汰策略。redis 提供 6种数据淘汰策略:

  1. volatile-lru:从已设置过期时间的数据集(server.db[i].expires)中挑选最近最少使用的数据淘汰
  2. volatile-ttl:从已设置过期时间的数据集(server.db[i].expires)中挑选将要过期的数据淘汰
  3. volatile-random:从已设置过期时间的数据集(server.db[i].expires)中任意选择数据淘汰
  4. allkeys-lru:从数据集(server.db[i].dict)中挑选最近最少使用的数据淘汰
  5. allkeys-random:从数据集(server.db[i].dict)中任意选择数据淘汰
  6. no-enviction(驱逐):禁止驱逐数据

redis 确定驱逐某个键值对后,会删除这个数据并,并将这个数据变更消息发布到本地(AOF 持久化)和从机(主从连接)。

LRU 数据淘汰机制

在服务器配置中保存了 lru 计数器 server.lrulock,会定时(redis 定时程序 serverCorn())更新,server.lrulock 的值是根据 server.unixtime 计算出来的。

另外,从 struct redisObject 中可以发现,每一个 redis 对象都会设置相应的 lru。可以想象的是,每一次访问数据的时候,会更新 redisObject.lru。

LRU 数据淘汰机制是这样的:在数据集中随机挑选几个键值对,取出其中 lru 最大的键值对淘汰。所以,你会发现,redis 并不是保证取得所有数据集中最近最少使用(LRU)的键值对,而只是随机挑选的几个键值对中的。

?
Code Sample
01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
// redisServer 保存了 lru 计数器
struct redisServer {
    ...
    unsigned lruclock:22;       /* Clock incrementing every minute, for LRU */
    ...
};
 
// 每一个 redis 对象都保存了 lru
#define REDIS_LRU_CLOCK_MAX ((1<<21)-1) /* Max value of obj->lru */
#define REDIS_LRU_CLOCK_RESOLUTION 10 /* LRU clock resolution in seconds */
typedef struct redisObject {
    // 刚刚好 32 bits
 
    // 对象的类型,字符串/列表/集合/哈希表
    unsigned type:4;
    // 未使用的两个位
    unsigned notused:2;     /* Not used */
    // 编码的方式,redis 为了节省空间,提供多种方式来保存一个数据
    // 譬如:“123456789” 会被存储为整数 123456789
    unsigned encoding:4;
    unsigned lru:22;        /* lru time (relative to server.lruclock) */
 
    // 引用数
    int refcount;
 
    // 数据指针
    void *ptr;
} robj;
 
// redis 定时执行程序。联想:linux cron
int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
    ......
    /* We have just 22 bits per object for LRU information.
     * So we use an (eventually wrapping) LRU clock with 10 seconds resolution.
     * 2^22 bits with 10 seconds resolution is more or less 1.5 years.
     *
     * Note that even if this will wrap after 1.5 years it's not a problem,
     * everything will still work but just some object will appear younger
     * to Redis. But for this to happen a given object should never be touched
     * for 1.5 years.
     *
     * Note that you can change the resolution altering the
     * REDIS_LRU_CLOCK_RESOLUTION define.
     */
    updateLRUClock();
    ......
}
 
// 更新服务器的 lru 计数器
void updateLRUClock(void) {
    server.lruclock = (server.unixtime/REDIS_LRU_CLOCK_RESOLUTION) &
                                                REDIS_LRU_CLOCK_MAX;
}

TTL 数据淘汰机制

redis 数据集数据结构中保存了键值对过期时间的表,即 redisDb.expires。和 LRU 数据淘汰机制类似,TTL 数据淘汰机制是这样的:从过期时间的表中随机挑选几个键值对,取出其中 ttl 最大的键值对淘汰。同样你会发现,redis 并不是保证取得所有过期时间的表中最快过期的键值对,而只是随机挑选的几个键值对中的。

总结

redis 每服务客户端执行一个命令的时候,会检测使用的内存是否超额。如果超额,即进行数据淘汰。

?
Code Sample
001
002
003
004
005
006
007
008
009
010
011
012
013
014
015
016
017
018
019
020
021
022
023
024
025
026
027
028
029
030
031
032
033
034
035
036
037
038
039
040
041
042
043
044
045
046
047
048
049
050
051
052
053
054
055
056
057
058
059
060
061
062
063
064
065
066
067
068
069
070
071
072
073
074
075
076
077
078
079
080
081
082
083
084
085
086
087
088
089
090
091
092
093
094
095
096
097
098
099
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
// 执行命令
int processCommand(redisClient *c) {
    ......
    // 内存超额
    /* Handle the maxmemory directive.
     *
     * First we try to free some memory if possible (if there are volatile
     * keys in the dataset). If there are not the only thing we can do
     * is returning an error. */
    if (server.maxmemory) {
        int retval = freeMemoryIfNeeded();
        if ((c->cmd->flags & REDIS_CMD_DENYOOM) && retval == REDIS_ERR) {
            flagTransaction(c);
            addReply(c, shared.oomerr);
            return REDIS_OK;
        }
    }
    ......
}
 
// 如果需要,是否一些内存
int freeMemoryIfNeeded(void) {
    size_t mem_used, mem_tofree, mem_freed;
    int slaves = listLength(server.slaves);
 
    // redis 从机回复空间和 AOF 内存大小不计算入 redis 内存大小
    /* Remove the size of slaves output buffers and AOF buffer from the
     * count of used memory. */
    mem_used = zmalloc_used_memory();
 
    // 从机回复空间大小
    if (slaves) {
        listIter li;
        listNode *ln;
 
        listRewind(server.slaves,&li);
        while((ln = listNext(&li))) {
            redisClient *slave = listNodeValue(ln);
            unsigned long obuf_bytes = getClientOutputBufferMemoryUsage(slave);
            if (obuf_bytes > mem_used)
                mem_used = 0;
            else
                mem_used -= obuf_bytes;
        }
    }
    // server.aof_buf && server.aof_rewrite_buf_blocks
    if (server.aof_state != REDIS_AOF_OFF) {
        mem_used -= sdslen(server.aof_buf);
        mem_used -= aofRewriteBufferSize();
    }
 
    // 内存是否超过设置大小
    /* Check if we are over the memory limit. */
    if (mem_used <= server.maxmemory) return REDIS_OK;
 
    // redis 中可以设置内存超额策略
    if (server.maxmemory_policy == REDIS_MAXMEMORY_NO_EVICTION)
        return REDIS_ERR; /* We need to free memory, but policy forbids. */
 
    /* Compute how much memory we need to free. */
    mem_tofree = mem_used - server.maxmemory;
    mem_freed = 0;
    while (mem_freed < mem_tofree) {
        int j, k, keys_freed = 0;
 
        // 遍历所有数据集
        for (j = 0; j < server.dbnum; j++) {
            long bestval = 0; /* just to prevent warning */
            sds bestkey = NULL;
            struct dictEntry *de;
            redisDb *db = server.db+j;
            dict *dict;
 
            // 不同的策略,选择的数据集不一样
            if (server.maxmemory_policy == REDIS_MAXMEMORY_ALLKEYS_LRU ||
                server.maxmemory_policy == REDIS_MAXMEMORY_ALLKEYS_RANDOM)
            {
                dict = server.db[j].dict;
            } else {
                dict = server.db[j].expires;
            }
 
            // 数据集为空,继续下一个数据集
            if (dictSize(dict) == 0) continue;
 
            // 随机淘汰随机策略:随机挑选
            /* volatile-random and allkeys-random policy */
            if (server.maxmemory_policy == REDIS_MAXMEMORY_ALLKEYS_RANDOM ||
                server.maxmemory_policy == REDIS_MAXMEMORY_VOLATILE_RANDOM)
            {
                de = dictGetRandomKey(dict);
                bestkey = dictGetKey(de);
            }
 
            // LRU 策略:挑选最近最少使用的数据
            /* volatile-lru and allkeys-lru policy */
            else if (server.maxmemory_policy == REDIS_MAXMEMORY_ALLKEYS_LRU ||
                server.maxmemory_policy == REDIS_MAXMEMORY_VOLATILE_LRU)
            {
                // server.maxmemory_samples 为随机挑选键值对次数
                // 随机挑选 server.maxmemory_samples个键值对,驱逐最近最少使用的数据
                for (k = 0; k < server.maxmemory_samples; k++) {
                    sds thiskey;
                    long thisval;
                    robj *o;
 
                    // 随机挑选键值对
                    de = dictGetRandomKey(dict);
 
                    // 获取键
                    thiskey = dictGetKey(de);
 
                    /* When policy is volatile-lru we need an additional lookup
                     * to locate the real key, as dict is set to db->expires. */
                    if (server.maxmemory_policy == REDIS_MAXMEMORY_VOLATILE_LRU)
                        de = dictFind(db->dict, thiskey);
                    o = dictGetVal(de);
 
                    // 计算数据的空闲时间
                    thisval = estimateObjectIdleTime(o);
 
                    // 当前键值空闲时间更长,则记录
                    /* Higher idle time is better candidate for deletion */
                    if (bestkey == NULL || thisval > bestval) {
                        bestkey = thiskey;
                        bestval = thisval;
                    }
                }
            }
 
            // TTL 策略:挑选将要过期的数据
            /* volatile-ttl */
            else if (server.maxmemory_policy == REDIS_MAXMEMORY_VOLATILE_TTL) {
                // server.maxmemory_samples 为随机挑选键值对次数
                // 随机挑选 server.maxmemory_samples个键值对,驱逐最快要过期的数据
                for (k = 0; k < server.maxmemory_samples; k++) {
                    sds thiskey;
                    long thisval;
 
                    de = dictGetRandomKey(dict);
                    thiskey = dictGetKey(de);
                    thisval = (long) dictGetVal(de);
 
                    /* Expire sooner (minor expire unix timestamp) is better
                     * candidate for deletion */
                    if (bestkey == NULL || thisval < bestval) {
                        bestkey = thiskey;
                        bestval = thisval;
                    }
                }
            }
 
            // 删除选定的键值对
            /* Finally remove the selected key. */
            if (bestkey) {
                long long delta;
 
                robj *keyobj = createStringObject(bestkey,sdslen(bestkey));
 
                // 发布数据更新消息,主要是 AOF 持久化和从机
                propagateExpire(db,keyobj);
 
                // 注意, propagateExpire() 可能会导致内存的分配, propagateExpire() 提前执行就是因为 redis 只计算 dbDelete() 释放的内存大小。倘若同时计算 dbDelete() 释放的内存和 propagateExpire() 分配空间的大小,与此同时假设分配空间大于释放空间,就有可能永远退不出这个循环。
                // 下面的代码会同时计算 dbDelete() 释放的内存和 propagateExpire() 分配空间的大小:
                // propagateExpire(db,keyobj);
                // delta = (long long) zmalloc_used_memory();
                // dbDelete(db,keyobj);
                // delta -= (long long) zmalloc_used_memory();
                // mem_freed += delta;
                /////////////////////////////////////////
 
                /* We compute the amount of memory freed by dbDelete() alone.
                 * It is possible that actually the memory needed to propagate
                 * the DEL in AOF and replication link is greater than the one
                 * we are freeing removing the key, but we can't account for
                 * that otherwise we would never exit the loop.
                 *
                 * AOF and Output buffer memory will be freed eventually so
                 * we only care about memory used by the key space. */
                // 只计算 dbDelete() 释放内存的大小
                delta = (long long) zmalloc_used_memory();
                dbDelete(db,keyobj);
                delta -= (long long) zmalloc_used_memory();
                mem_freed += delta;
 
                server.stat_evictedkeys++;
 
                // 将数据的删除通知所有的订阅客户端
                notifyKeyspaceEvent(REDIS_NOTIFY_EVICTED, "evicted",
                    keyobj, db->id);
                decrRefCount(keyobj);
                keys_freed++;
 
                // 将从机回复空间中的数据及时发送给从机
                /* When the memory to free starts to be big enough, we may
                 * start spending so much time here that is impossible to
                 * deliver data to the slaves fast enough, so we force the
                 * transmission here inside the loop. */
                if (slaves) flushSlavesOutputBuffers();
            }
        }
 
        // 未能释放空间,且此时 redis 使用的内存大小依旧超额,失败返回
        if (!keys_freed) return REDIS_ERR; /* nothing to free... */
    }
    return REDIS_OK;
}

 

捣乱 2014-5-27

原文地址:https://www.cnblogs.com/hosealeo/p/4190495.html