Recovering PostgreSQL after WAL files were deleted by mistake

2021-09-10 17:22:42.417183T  @  startup  00000 [2021-09-10 17:22:42 CST] 0 [9298] LOCATION:  StartupXLOG, xlog.c:6347
2021-09-10 17:22:42.417206T  @  startup  XX000 [2021-09-10 17:22:42 CST] 0 [9298] FATAL:  XX000: required WAL directory "pg_wal" does not exist
2021-09-10 17:22:42.417206T  @  startup  XX000 [2021-09-10 17:22:42 CST] 0 [9298] LOCATION:  ValidateXLOGDirectoryStructure, xlog.c:4262
2021-09-10 17:22:42.417407T  @  postmaster  00000 [2021-09-10 17:22:42 CST] 0 [9296] LOG:  00000: startup process (PID 9298) exited with exit code 1
2021-09-10 17:22:42.417407T  @  postmaster  00000 [2021-09-10 17:22:42 CST] 0 [9296] LOCATION:  LogChildExit, postmaster.c:3714
2021-09-10 17:22:42.417417T  @  postmaster  00000 [2021-09-10 17:22:42 CST] 0 [9296] LOG:  00000: aborting startup due to startup process failure
2021-09-10 17:22:42.417417T  @  postmaster  00000 [2021-09-10 17:22:42 CST] 0 [9296] LOCATION:  reaper, postmaster.c:2969
2021-09-10 17:22:42.427171T  @  postmaster  00000 [2021-09-10 17:22:42 CST] 0 [9296] LOG:  00000: database system is shut down
2021-09-10 17:22:42.427171T  @  postmaster  00000 [2021-09-10 17:22:42 CST] 0 [9296] LOCATION:  UnlinkLockFiles, miscinit.c:928

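  Running pg_resetwal -f PGDATA re-initializes the WAL, but the transaction log is lost and the data may be left inconsistent: changes that existed only in WAL and had not yet been written to the data files by a completed checkpoint are gone, and in extreme cases some data blocks are lost. After the reset, the WAL directory looks like this: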

[zjh@lightdb1 pgsql13.2]$ cd data/pg_wal/
[zjh@lightdb1 pg_wal]$ ll
total 1048576
-rw------- 1 zjh zjh 1073741824 Sep 10 21:44 00000001000000BB00000001
drwx------ 2 zjh zjh          6 Sep 10 21:42 archive_status

  Then start PG, take a backup, and rebuild the instance.
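  The sequence above can be sketched as a small shell function. This is only a sketch under assumptions: the function name reset_wal_and_dump, the dump file argument, and the choice of pg_dumpall for the backup are mine, not from the original steps, and pg_dumpall is assumed to reach the server through the usual PGHOST/PGPORT/PGUSER environment. The guard on pg_wal is there because the FATAL message above reports the directory itself as missing, and to my knowledge pg_resetwal expects it to exist.

```shell
# Sketch only: reset WAL, start the server, then dump everything for a rebuild.
# pgdata and dump_file are caller-supplied, hypothetical arguments.
reset_wal_and_dump() {
    pgdata=$1
    dump_file=$2

    # Assumption: pg_resetwal needs an existing pg_wal directory to work in.
    if [ ! -d "$pgdata/pg_wal" ]; then
        echo "pg_wal is missing under $pgdata; recreate it before resetting" >&2
        return 1
    fi

    # Discards the old WAL state and writes a fresh segment; not reversible.
    pg_resetwal -f "$pgdata" || return 1

    # -w waits until the start operation has completed.
    pg_ctl -D "$pgdata" -w start || return 1

    # Logical dump of all databases, to be loaded into a newly initialized cluster.
    pg_dumpall -f "$dump_file" || return 1
}

# Usage (paths are placeholders):
#   reset_wal_and_dump "$PGDATA" /path/to/all_databases.sql
```

  Dumping right after the reset and loading into a fresh cluster avoids continuing to run on data files that may be inconsistent.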

  To gauge how much data is lost, check the latest checkpoint fields in the pg_controldata output.
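  A minimal sketch for pulling those fields out of the pg_controldata output. The helper names checkpoint_fields and redo_wal_file are made up for illustration; the field labels they match are the English ones, so the usage line forces LC_ALL=C.

```shell
# Keep only the checkpoint-related lines of pg_controldata output (read from stdin).
checkpoint_fields() {
    grep -i 'latest checkpoint'
}

# Print just the WAL segment name that holds the latest checkpoint's REDO point.
redo_wal_file() {
    sed -n "s/^Latest checkpoint's REDO WAL file:[[:space:]]*//p"
}

# Usage (assumes pg_controldata is on PATH and PGDATA is set):
#   LC_ALL=C pg_controldata "$PGDATA" | checkpoint_fields
#   LC_ALL=C pg_controldata "$PGDATA" | redo_wal_file
```

  The time of the latest checkpoint gives a rough lower bound on when the lost changes were made; the REDO WAL file name is the segment from which replay would have had to start.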

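  If the WAL size settings (for example max_wal_size) are large and you want to remove old WAL, pg_archivecleanup can delete the WAL files that precede the latest checkpoint, for example: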

  pg_archivecleanup /data1/zjh/coordinator/pg_wal/ 000000010000000900000023

  This removes the WAL files that precede 000000010000000900000023.

  Indeed, the files that sort lower than it are gone, but the problem is that the earlier logs have still not all been deleted.
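  To see in advance which segments count as "before" a given file, the comparison can be approximated in shell. This is a sketch, not pg_archivecleanup itself: wal_before is a hypothetical helper, and it assumes (as I understand the tool's behavior) that the first 8 hex digits, the timeline ID, are ignored when comparing names. pg_archivecleanup also has a real dry-run flag, -n, shown in the usage comment.

```shell
# List WAL segment names in wal_dir that sort before oldest_kept,
# comparing everything after the 8-digit timeline prefix as plain strings.
wal_before() {
    wal_dir=$1
    oldest_kept=$2
    ls "$wal_dir" |
        grep -E '^[0-9A-F]{24}$' |
        LC_ALL=C awk -v keep="$oldest_kept" 'substr($0, 9) < substr(keep, 9)'
}

# Usage:
#   wal_before /data1/zjh/coordinator/pg_wal/ 000000010000000900000023
# Dry run with the real tool (prints what would be removed, deletes nothing):
#   pg_archivecleanup -n /data1/zjh/coordinator/pg_wal/ 000000010000000900000023
```

  Comparing the two listings before deleting anything is a cheap way to check that only the intended segments are affected.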

Original article: https://www.cnblogs.com/zhjh256/p/15253179.html