数据备份与恢复 半持久化 全持久化 fork aof rdb Backing up Disaster recovery 备份 容灾

Redis数据备份与恢复 - 流年晕开时光 - 博客园 https://www.cnblogs.com/deny/p/11531355.html

Redis数据备份与恢复

Redis所有数据都是保存在内存中,Redis数据备份可以定期的通过异步方式保存到磁盘上,该方式称为半持久化模式,如果每一次数据变化都写入aof文件里面,则称为全持久化模式。同时还可以基于Redis主从复制实现Redis备份与恢复。

1. 半持久化RDB模式

    半持久化RDB模式也是Redis备份默认方式,是通过快照(snapshotting)完成的,当符合在Redis.conf配置文件中设置的条件时Redis会自动将内存中的所有数据进行快照并存储在硬盘上,完成数据备份。

Redis进行RDB快照的条件由用户在配置文件中自定义,由两个参数构成:时间和改动的键的个数。当在指定的时间内被更改的键的个数大于指定的数值时就会进行快照。在配置文件中已经预置了3个条件:

save       900    1       #900秒内有至少1个键被更改则进行快照;
save       300    10      #300秒内有至少10个键被更改则进行快照;
save       60     10000        #60秒内有至少10000个键被更改则进行快照。

        默认可以存在多个条件,条件之间是“或”的关系,只要满足其中一个条件,就会进行快照。 如果想要禁用自动快照,只需要将所有的save参数删除即可。Redis默认会将快照文件存储在Redis数据目录,默认文件名为:dump.rdb文件,可以通过配置dir和dbfilename两个参数分别指定快照文件的存储路径和文件名。也可以在Redis命令行执行config  get  dir获取Redis数据保存路径,如图12-15(a)、12-15(b)所示:

                                                   

                                                                             图12-15(a) 获取Redis数据目录

                                                  

                                                                         图12-15(b) Redis数据目录及dump.rdb文件

        Redis实现快照的过程,Redis使用fork函数复制一份当前进程(父进程)的副本(子进程),父进程继续接收并处理客户端发来的命令,而子进程开始将内存中的数据写入硬盘中的临时文件,当子进程写入完所有数据后会用该临时文件替换旧的RDB文件,至此一次快照操作完成。

        执行fork的时操作系统会使用写时复制(copy-on-write)策略,即fork函数发生的一刻父子进程共享同一内存数据,当父进程要更改其中某片数据时,操作系统会将该片数据复制一份以保证子进程的数据不受影响,所以新的RDB文件存储的是执行fork一刻的内存数据。

        Redis在进行快照的过程中不会修改RDB文件,只有快照结束后才会将旧的文件替换成新的,也就是说任何时候RDB文件都是完整的。这使得我们可以通过定时备份RDB文件来实 现Redis数据库备份。

RDB文件是经过压缩(可以配置rdbcompression参数以禁用压缩节省CPU占用)的二进制格式,所以占用的空间会小于内存中的数据大小,更加利于传输。除了自动快照,还可以手动发送SAVE和BGSAVE命令让Redis执行快照,两个命令的区别在于,前者是由主进程进行快照操作,会阻塞住其他请求,后者会通过fork子进程进行快照操作。

Redis启动后会读取RDB快照文件,将数据从硬盘载入到内存,根据数据量大小与结构和服务器性能不同,通常将一个记录一千万个字符串类型键、大小为1GB的快照文件载入到内存中需花费20~30秒钟。

通过RDB方式实现持久化,一旦Redis异常退出,就会丢失最后一次快照以后更改的所有数据。此时需要开发者根据具体的应用场合,通过组合设置自动快照条件的方式来将可能发生的数据损失控制在能够接受的范围。

2 .全持久化AOF模式

       如果数据很重要无法承受任何损失,可以考虑使用AOF方式进行持久化,默认Redis没有开启AOF(append only file)方式的全持久化模式。

在启动时Redis会逐个执行AOF文件中的命令来将硬盘中的数据载入到内存中,载入的速度相较RDB会慢一些,开启AOF持久化后每执行一条会更改Redis中的数据的命令,Redis就会将该命令写入硬盘中的AOF文件。AOF文件的保存位置和RDB文件的位置相同,都是通过dir参数设置的,默认的文件名是appendonly.aof,可以通过appendfilename参数修改该名称。

Redis允许同时开启AOF和RDB,既保证了数据安全又使得进行备份等操作十分容易。此时重新启动Redis后Redis会使用AOF文件来恢复数据,因为AOF方式的持久化可能丢失的数据更少,可以在redis.conf中通过appendonly参数开启Redis AOF全持久化模式:

复制代码
appendonly  yes
appendfilename appendonly.aof
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
appendfsync always  
#appendfsync everysec
#appendfsync no
复制代码

Redis AOF持久化参数配置详解:

复制代码
appendonly  yes                    #开启AOF持久化功能;
appendfilename appendonly.aof          #AOF持久化保存文件名;
appendfsync always                     #每次执行写入都会执行同步,最安全也最慢;
#appendfsync everysec                  #每秒执行一次同步操作;
#appendfsync no                    #不主动进行同步操作,而是完全交由操作系统来做,每30秒一次,最快也最不安全;
auto-aof-rewrite-percentage  100      #当AOF文件大小超过上一次重写时的AOF文件大小的百分之多少时会再次进行重写,如果之前没有重写过,则以启动时的AOF文件大小为依据;
auto-aof-rewrite-min-size    64mb      #允许重写的最小AOF文件大小配置写入AOF文件后,要求系统刷新硬盘缓存的机制。
复制代码

3 . Redis主从复制备份

       通过持久化功能,Redis保证了即使在服务器重启的情况下也不会损失(或少量损失)数据。但是由于数据是存储在一台服务器上的,如果这台服务器的硬盘出现故障,也会导致数据丢失。

为了避免单点故障,我们希望将数据库复制多个副本以部署在不同的服务器上,即使只有一台服务器出现故障其他服务器依然可以继续提供服务,这就要求当一台服务器上的数据库更新后,可以自动将更新的数据同步到其他服务器上,Redis提供了复制(replication)功能可以自动实现同步的过程。通过配置文件在Redis从数据库中配置文件中加入slaveof  master-ip  master-port即可,主数据库无需配置。

       Redis主从复制优点及应用场景, WEB应用程序可以基于主从同步实现读写分离以提高服务器的负载能力。在常见的场景中,读的频率一般比较大,当单机Redis无法应付大量的读请求时,可以通过复制功能建立多个从数据库,主数据库只进行写操作,而从数据库负责读操作,还可以基于LVS+keepalived+Redis对Redis实现均和高可用。

       从数据库持久化持久化通常相对比较耗时,为了提高性能,可以通过复制功能建立一个(或若干个)从数据库,并在从数据库中启用持久化,同时在主数据库禁用持久化。

        当从数据库崩溃时重启后主数据库会自动将数据同步过来,所以无需担心数据丢失。而当主数据库崩溃时,需要在从数据库中使用SLAVEOF NO ONE命令将从数据库提升成主数据库继续服务,并在原来的主数据库启动后使用SLAVE  OF命令将其设置成新的主数据库的从数据库,即可将数据同步回来。

 AOF — Redis 设计与实现
http://redisbook.readthedocs.io/en/latest/internal/aof.html

Redis Persistence – Redis https://redis.io/topics/persistence

This page provides a technical description of Redis persistence, it is a suggested read for all Redis users. For a wider overview of Redis persistence and the durability guarantees it provides you may also want to read Redis persistence demystified.

Redis Persistence

Redis provides a different range of persistence options:

  • The RDB persistence performs point-in-time snapshots of your dataset at specified intervals.
  • The AOF persistence logs every write operation received by the server, that will be played again at server startup, reconstructing the original dataset. Commands are logged using the same format as the Redis protocol itself, in an append-only fashion. Redis is able to rewrite the log in the background when it gets too big.
  • If you wish, you can disable persistence completely, if you want your data to just exist as long as the server is running.
  • It is possible to combine both AOF and RDB in the same instance. Notice that, in this case, when Redis restarts the AOF file will be used to reconstruct the original dataset since it is guaranteed to be the most complete.

The most important thing to understand is the different trade-offs between the RDB and AOF persistence. Let's start with RDB:

RDB advantages

  • RDB is a very compact single-file point-in-time representation of your Redis data. RDB files are perfect for backups. For instance you may want to archive your RDB files every hour for the latest 24 hours, and to save an RDB snapshot every day for 30 days. This allows you to easily restore different versions of the data set in case of disasters.
  • RDB is very good for disaster recovery, being a single compact file that can be transferred to far data centers, or onto Amazon S3 (possibly encrypted).
  • RDB maximizes Redis performances since the only work the Redis parent process needs to do in order to persist is forking a child that will do all the rest. The parent instance will never perform disk I/O or alike.
  • RDB allows faster restarts with big datasets compared to AOF.

RDB disadvantages

  • RDB is NOT good if you need to minimize the chance of data loss in case Redis stops working (for example after a power outage). You can configure different save points where an RDB is produced (for instance after at least five minutes and 100 writes against the data set, but you can have multiple save points). However you'll usually create an RDB snapshot every five minutes or more, so in case of Redis stopping working without a correct shutdown for any reason you should be prepared to lose the latest minutes of data.
  • RDB needs to fork() often in order to persist on disk using a child process. Fork() can be time consuming if the dataset is big, and may result in Redis to stop serving clients for some millisecond or even for one second if the dataset is very big and the CPU performance not great. AOF also needs to fork() but you can tune how often you want to rewrite your logs without any trade-off on durability.

AOF advantages

  • Using AOF Redis is much more durable: you can have different fsync policies: no fsync at all, fsync every second, fsync at every query. With the default policy of fsync every second write performances are still great (fsync is performed using a background thread and the main thread will try hard to perform writes when no fsync is in progress.) but you can only lose one second worth of writes.
  • The AOF log is an append only log, so there are no seeks, nor corruption problems if there is a power outage. Even if the log ends with an half-written command for some reason (disk full or other reasons) the redis-check-aof tool is able to fix it easily.
  • Redis is able to automatically rewrite the AOF in background when it gets too big. The rewrite is completely safe as while Redis continues appending to the old file, a completely new one is produced with the minimal set of operations needed to create the current data set, and once this second file is ready Redis switches the two and starts appending to the new one.
  • AOF contains a log of all the operations one after the other in an easy to understand and parse format. You can even easily export an AOF file. For instance even if you flushed everything for an error using a FLUSHALLcommand, if no rewrite of the log was performed in the meantime you can still save your data set just stopping the server, removing the latest command, and restarting Redis again.

AOF disadvantages

  • AOF files are usually bigger than the equivalent RDB files for the same dataset.
  • AOF can be slower than RDB depending on the exact fsync policy. In general with fsync set to every secondperformance is still very high, and with fsync disabled it should be exactly as fast as RDB even under high load. Still RDB is able to provide more guarantees about the maximum latency even in the case of an huge write load.
  • In the past we experienced rare bugs in specific commands (for instance there was one involving blocking commands like BRPOPLPUSH) causing the AOF produced to not reproduce exactly the same dataset on reloading. These bugs are rare and we have tests in the test suite creating random complex datasets automatically and reloading them to check everything is fine. However, these kind of bugs are almost impossible with RDB persistence. To make this point more clear: the Redis AOF works by incrementally updating an existing state, like MySQL or MongoDB does, while the RDB snapshotting creates everything from scratch again and again, that is conceptually more robust. However - 1) It should be noted that every time the AOF is rewritten by Redis it is recreated from scratch starting from the actual data contained in the data set, making resistance to bugs stronger compared to an always appending AOF file (or one rewritten reading the old AOF instead of reading the data in memory). 2) We have never had a single report from users about an AOF corruption that was detected in the real world.

Ok, so what should I use?

The general indication is that you should use both persistence methods if you want a degree of data safety comparable to what PostgreSQL can provide you.

If you care a lot about your data, but still can live with a few minutes of data loss in case of disasters, you can simply use RDB alone.

There are many users using AOF alone, but we discourage it since to have an RDB snapshot from time to time is a great idea for doing database backups, for faster restarts, and in the event of bugs in the AOF engine.

Note: for all these reasons we'll likely end up unifying AOF and RDB into a single persistence model in the future (long term plan).

The following sections will illustrate a few more details about the two persistence models.

Snapshotting

By default Redis saves snapshots of the dataset on disk, in a binary file called dump.rdb. You can configure Redis to have it save the dataset every N seconds if there are at least M changes in the dataset, or you can manually call the SAVE or BGSAVE commands.

For example, this configuration will make Redis automatically dump the dataset to disk every 60 seconds if at least 1000 keys changed:

save 60 1000

This strategy is known as snapshotting.

How it works

Whenever Redis needs to dump the dataset to disk, this is what happens:

  • Redis forks. We now have a child and a parent process.

  • The child starts to write the dataset to a temporary RDB file.

  • When the child is done writing the new RDB file, it replaces the old one.

This method allows Redis to benefit from copy-on-write semantics.

Append-only file

Snapshotting is not very durable. If your computer running Redis stops, your power line fails, or you accidentally kill -9your instance, the latest data written on Redis will get lost. While this may not be a big deal for some applications, there are use cases for full durability, and in these cases Redis was not a viable option.

The append-only file is an alternative, fully-durable strategy for Redis. It became available in version 1.1.

You can turn on the AOF in your configuration file:

appendonly yes

From now on, every time Redis receives a command that changes the dataset (e.g. SET) it will append it to the AOF. When you restart Redis it will re-play the AOF to rebuild the state.

Log rewriting

As you can guess, the AOF gets bigger and bigger as write operations are performed. For example, if you are incrementing a counter 100 times, you'll end up with a single key in your dataset containing the final value, but 100 entries in your AOF. 99 of those entries are not needed to rebuild the current state.

So Redis supports an interesting feature: it is able to rebuild the AOF in the background without interrupting service to clients. Whenever you issue a BGREWRITEAOF Redis will write the shortest sequence of commands needed to rebuild the current dataset in memory. If you're using the AOF with Redis 2.2 you'll need to run BGREWRITEAOF from time to time. Redis 2.4 is able to trigger log rewriting automatically (see the 2.4 example configuration file for more information).

How durable is the append only file?

You can configure how many times Redis will fsync data on disk. There are three options:

  • appendfsync alwaysfsync every time new commands are appended to the AOF. Very very slow, very safe. Note that the commands are apended to the AOF after a batch of commands from multiple clients or a pipeline are executed, so it means a single write and a single fsync (before sending the replies).
  • appendfsync everysecfsync every second. Fast enough (in 2.4 likely to be as fast as snapshotting), and you can lose 1 second of data if there is a disaster.
  • appendfsync no: Never fsync, just put your data in the hands of the Operating System. The faster and less safe method. Normally Linux will flush data every 30 seconds with this configuration, but it's up to the kernel exact tuning.

The suggested (and default) policy is to fsync every second. It is both very fast and pretty safe. The always policy is very slow in practice, but it supports group commit, so if there are multiple parallel writes Redis will try to perform a single fsync operation.

What should I do if my AOF gets truncated?

It is possible that the server crashed while writing the AOF file, or that the volume where the AOF file is stored was full at the time of writing. When this happens the AOF still contains consistent data representing a given point-in-time version of the dataset (that may be old up to one second with the default AOF fsync policy), but the last command in the AOF could be truncated. The latest major versions of Redis will be able to load the AOF anyway, just discarding the last non well formed command in the file. In this case the server will emit a log like the following:

* Reading RDB preamble from AOF file...
* Reading the remaining AOF tail...
# !!! Warning: short read while loading the AOF file !!!
# !!! Truncating the AOF at offset 439 !!!
# AOF loaded anyway because aof-load-truncated is enabled

You can change the default configuration to force Redis to stop in such cases if you want, but the default configuration is to continue regardless the fact the last command in the file is not well-formed, in order to guarantee availability after a restart.

Older versions of Redis may not recover, and may require the following steps:

  • Make a backup copy of your AOF file.
  • Fix the original file using the redis-check-aof tool that ships with Redis:

    $ redis-check-aof --fix

  • Optionally use diff -u to check what is the difference between two files.

  • Restart the server with the fixed file.

What should I do if my AOF gets corrupted?

If the AOF file is not just truncated, but corrupted with invalid byte sequences in the middle, things are more complex. Redis will complain at startup and will abort:

* Reading the remaining AOF tail...
# Bad file format reading the append only file: make a backup of your AOF file, then use ./redis-check-aof --fix <filename>

The best thing to do is to run the redis-check-aof utility, initially without the --fix option, then understand the problem, jump at the given offset in the file, and see if it is possible to manually repair the file: the AOF uses the same format of the Redis protocol and is quite simple to fix manually. Otherwise it is possible to let the utility fix the file for us, but in that case all the AOF portion from the invalid part to the end of the file may be discarded, leading to a massive amount of data loss if the corruption happened to be in the initial part of the file.

How it works

Log rewriting uses the same copy-on-write trick already in use for snapshotting. This is how it works:

  • Redis forks, so now we have a child and a parent process.

  • The child starts writing the new AOF in a temporary file.

  • The parent accumulates all the new changes in an in-memory buffer (but at the same time it writes the new changes in the old append-only file, so if the rewriting fails, we are safe).

  • When the child is done rewriting the file, the parent gets a signal, and appends the in-memory buffer at the end of the file generated by the child.

  • Profit! Now Redis atomically renames the old file into the new one, and starts appending new data into the new file.

How I can switch to AOF, if I'm currently using dump.rdb snapshots?

There is a different procedure to do this in Redis 2.0 and Redis 2.2, as you can guess it's simpler in Redis 2.2 and does not require a restart at all.

Redis >= 2.2

  • Make a backup of your latest dump.rdb file.
  • Transfer this backup into a safe place.
  • Issue the following two commands:
  • redis-cli config set appendonly yes
  • redis-cli config set save ""
  • Make sure that your database contains the same number of keys it contained.
  • Make sure that writes are appended to the append only file correctly.

The first CONFIG command enables the Append Only File. In order to do so Redis will block to generate the initial dump, then will open the file for writing, and will start appending all the next write queries.

The second CONFIG command is used to turn off snapshotting persistence. This is optional, if you wish you can take both the persistence methods enabled.

IMPORTANT: remember to edit your redis.conf to turn on the AOF, otherwise when you restart the server the configuration changes will be lost and the server will start again with the old configuration.

Redis 2.0

  • Make a backup of your latest dump.rdb file.
  • Transfer this backup into a safe place.
  • Stop all the writes against the database!
  • Issue a redis-cli BGREWRITEAOF. This will create the append only file.
  • Stop the server when Redis finished generating the AOF dump.
  • Edit redis.conf end enable append only file persistence.
  • Restart the server.
  • Make sure that your database contains the same number of keys it contained.
  • Make sure that writes are appended to the append only file correctly.

Interactions between AOF and RDB persistence

Redis >= 2.4 makes sure to avoid triggering an AOF rewrite when an RDB snapshotting operation is already in progress, or allowing a BGSAVE while the AOF rewrite is in progress. This prevents two Redis background processes from doing heavy disk I/O at the same time.

When snapshotting is in progress and the user explicitly requests a log rewrite operation using BGREWRITEAOF the server will reply with an OK status code telling the user the operation is scheduled, and the rewrite will start once the snapshotting is completed.

In the case both AOF and RDB persistence are enabled and Redis restarts the AOF file will be used to reconstruct the original dataset since it is guaranteed to be the most complete.

Backing up Redis data

Before starting this section, make sure to read the following sentence: Make Sure to Backup Your Database. Disks break, instances in the cloud disappear, and so forth: no backups means huge risk of data disappearing into /dev/null.

Redis is very data backup friendly since you can copy RDB files while the database is running: the RDB is never modified once produced, and while it gets produced it uses a temporary name and is renamed into its final destination atomically using rename(2) only when the new snapshot is complete.

This means that copying the RDB file is completely safe while the server is running. This is what we suggest:

  • Create a cron job in your server creating hourly snapshots of the RDB file in one directory, and daily snapshots in a different directory.
  • Every time the cron script runs, make sure to call the find command to make sure too old snapshots are deleted: for instance you can take hourly snapshots for the latest 48 hours, and daily snapshots for one or two months. Make sure to name the snapshots with data and time information.
  • At least one time every day make sure to transfer an RDB snapshot outside your data center or at least outside the physical machine running your Redis instance.

If you run a Redis instance with only AOF persistence enabled, you can still copy the AOF in order to create backups. The file may lack the final part but Redis will be still able to load it (see the previous sections about truncated AOF files).

Disaster recovery

Disaster recovery in the context of Redis is basically the same story as backups, plus the ability to transfer those backups in many different external data centers. This way data is secured even in the case of some catastrophic event affecting the main data center where Redis is running and producing its snapshots.

Since many Redis users are in the startup scene and thus don't have plenty of money to spend we'll review the most interesting disaster recovery techniques that don't have too high costs.

  • Amazon S3 and other similar services are a good way for implementing your disaster recovery system. Simply transfer your daily or hourly RDB snapshot to S3 in an encrypted form. You can encrypt your data using gpg -c (in symmetric encryption mode). Make sure to store your password in many different safe places (for instance give a copy to the most important people of your organization). It is recommended to use multiple storage services for improved data safety.
  • Transfer your snapshots using SCP (part of SSH) to far servers. This is a fairly simple and safe route: get a small VPS in a place that is very far from you, install ssh there, and generate an ssh client key without passphrase, then add it in the authorized_keys file of your small VPS. You are ready to transfer backups in an automated fashion. Get at least two VPS in two different providers for best results.

It is important to understand that this system can easily fail if not implemented in the right way. At least make absolutely sure that after the transfer is completed you are able to verify the file size (that should match the one of the file you copied) and possibly the SHA1 digest if you are using a VPS.

You also need some kind of independent alert system if the transfer of fresh backups is not working for some reason.

原文地址:https://www.cnblogs.com/rsapaper/p/6813262.html