Do you use ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY for your Oracle databases in Data Guard? Maybe you also use the Fast Recovery Area as archive log destination. That’s good practice! But did you ever check that it works as expected?

What I mean is this:

  • The archived logs that you don’t need are reclaimable by the FRA when space is needed
  • And the archived logs that are required for availability (standby or backup) are not deleted.

It’s not an easy thing to check because Oracle doesn’t show which archive log is reclaimable. Only the total reclaimable space is shown in v$recovery_area_usage. But that is not sufficient to validate which archivelog sequence is concerned. I’ll show you below a query that returns the reclaimable status from the archived logs. And you will see that until 12c the APPLIED ON ALL STANDBY does not work as expected. You’ve probably seen a FRA full at standby site and solved it by deleting archived logs. But this is not the right solution because the FRA is supposed to do that.

Let’s look at an example I encountered recently. The archivelog deletion policy is set correctly:

RMAN> show archivelog deletion policy;   RMAN configuration parameters for database with db_unique_name DATABASE_SITE2 are: CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY;

This configuration, an 11g feature, allows to delete an archive log as soon as it is applied to all standby destinations. Note that it works if I manually do a ‘delete archivelog all;’ but I expect that the archivelogs in the FRA becomes reclaimable automatically.

Unfortunately, this is not the case and the FRA is growing:

SQL> select * from v$recovery_area_usage where file_type='ARCHIVED LOG';   FILE_TYPE            PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES -------------------- ------------------ ------------------------- --------------- ARCHIVED LOG                      61.11                     43.02             467

Let’s check everything. We are on the standby database:

SQL> select open_mode,database_role from v$database;   OPEN_MODE            DATABASE_ROLE -------------------- ---------------- MOUNTED              PHYSICAL STANDBY

The archivelogs are going to the Fast Recovery Area:

SQL> show parameter log_archive_dest_1 NAME                                 TYPE        VALUE ------------------------------------ ----------- ------------------------------ log_archive_dest_1                   string      location=USE_DB_RECOVERY_FILE_                                                  DEST, valid_for=(ALL_LOGFILES,                                                   ALL_ROLES)

All archived logs are applied (we are in SYNC AFFIRM):

DGMGRL> show database 'DATABASE_SITE2';   Database - DATABASE_SITE2     Role:            PHYSICAL STANDBY   Intended State:  APPLY-ON   Transport Lag:   0 seconds   Apply Lag:       0 seconds   Real Time Query: OFF   Instance(s):     DATABASE   Database Status: SUCCESS

Well, with that configuration, I expect that all archivelogs are reclaimable – except the current one.

Let’s investigate. V$RECOVERY_AREA_USAGE is an aggregate view. If we check its definition, we see that the reclaimable size comes from x$kccagf.rectype.

So I’ll use it in in conjunction with v$archived_log in order to give the detail about which archived logs are reclaimable:

SQL> select applied,deleted,decode(rectype,11,'YES','NO') reclaimable            ,count(*),min(sequence#),max(sequence#)      from v$archived_log left outer join sys.x$kccagf using(recid)       where is_recovery_dest_file='YES' and name is not null      group by applied,deleted,decode(rectype,11,'YES','NO') order by 5 /   APPLIED   DELETED RECLAIMABLE   COUNT(*) MIN(SEQUENCE#) MAX(SEQUENCE#) --------- ------- ----------- ---------- -------------- -------------- YES       NO      YES                429           5938           6366 YES       NO      NO                  37           6367           6403 IN-MEMORY NO      NO                   1           6404           6404

The problem is there: Because of a bug (Bug 14227959 : STANDBY DID NOT RELEASE SPACE IN FRA) the archivelogs are not marked as reclaimable when the database is in mount mode.

The workaround is to execute dbms_backup_restore.refreshagedfiles. This is what must be scheduled (maybe daily) on the standby. It can be a good idea to do it at the same time as a daily ‘delete obsolete’, so here is the way to call it from RMAN:
RMAN> sql "begin dbms_backup_restore.refreshagedfiles; end;";
But I’ve found another workaround: just run the CONFIGURE ARCHIVELOG POLICY from RMAN as it refreshes the reclaimable flag – even when there is no change.

Then, you can run a daily job that does CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY on your database, whatever the role is.

It’s different for the database where you do the backup, because you want to be sure that the backup is done before an archivelog is deleted: CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY BACKED UP 1 TIMES TO DISK;

This is a good way to prevent anyone from changing the configuration and keep it close to the backup scripts. At dbi services, we advise to keep the same configuration on all Data Guard sites for the same database so that a switchover can be done without any problem. For this reason, having a script that depends on the place where the backups are done is a better alternative than a configuration that depends on the database role.

Finally, here is the state of our reclaimable archivelogs after any of these solutions:

APPLIED   DELETED RECLAIMABLE   COUNT(*) MIN(SEQUENCE#) MAX(SEQUENCE#) --------- ------- ----------- ---------- -------------- -------------- YES       NO      YES                466           5938           6403 IN-MEMORY NO      NO                   1           6404           6404

FILE_TYPE            PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES -------------------- ------------------ ------------------------- --------------- ARCHIVED LOG                      61.11                     61.09             467

All applied archived logs are reclaimable and the FRA will never be full.
You can check the primary as well. Are you sure that Oracle will never delete an archived log that has not been backed up ? Check your deletion policy.
Here is the full query I use for that:

column deleted format a7 column reclaimable format a11 set linesize 120 select applied,deleted,backup_count  ,decode(rectype,11,'YES','NO') reclaimable,count(*)  ,to_char(min(completion_time),'dd-mon hh24:mi') first_time  ,to_char(max(completion_time),'dd-mon hh24:mi') last_time  ,min(sequence#) first_seq,max(sequence#) last_seq from v$archived_log left outer join sys.x$kccagf using(recid) where is_recovery_dest_file='YES' group by applied,deleted,backup_count,decode(rectype,11,'YES','NO') order by min(sequence#) /

This is the result on primary where the last archivelog backup has run around 21:00

APPLIED   DELETED BACKUP_COUNT RECLAIMABLE COUNT(*) FIRST_TIME   LAST_TIME    FIRST_SEQ LAST_SEQ --------- ------- ------------ ----------- -------- ------------ ------------ --------- -------- NO        YES                1 NO               277 15-jan 17:56 19-jan 09:49      5936     6212 NO        NO                 1 YES              339 19-jan 10:09 22-jan 21:07      6213     6516 NO        NO                 0 NO                33 22-jan 21:27 23-jan 07:57      6517     6549

That is fine according to my policy APPLIED ON ALL STANDBY BACKED UP 1 TIMES TO DISK
And here is the result on standby where the workaround job has run around 06:00 and redo apply is in SYNC

APPLIED   DELETED BACKUP_COUNT RECLAIMABLE COUNT(*) FIRST_TIME   LAST_TIME    FIRST_SEQ LAST_SEQ --------- ------- ------------ ----------- -------- ------------ ------------ --------- -------- YES       YES                0 NO               746 07-jan 13:27 17-jan 11:17      5320     6065 YES       NO                 0 YES              477 17-jan 11:37 23-jan 05:37      6066     6542 YES       NO                 0 NO                 8 23-jan 05:57 23-jan 08:14      6543     6550 IN-MEMORY NO                 0 NO                 1 23-jan 08:15 23-jan 08:15      6551     6551

This is good for my policy APPLIED ON ALL STANDBY – except that because of the bug mentioned above, redo applied since 06:00 are not yet reclaimable.

Update SEP-17

When troubleshooting archivelog deletion policy issue, here is a better query which counts the number of backups for each sequence:
set linesize 200 pagesize 1000 column is_recovery_dest_file format a21 select deleted,status,is_recovery_dest_file,thread#,min(sequence#),max(sequence#),min(first_time),max(next_time),count(distinct sequence#),archived,applied,backup_count,count("x$kccagf") from ( select deleted,thread#,sequence#,status,name ,first_time, next_time,case x$kccagf.rectype when 11 then recid end "x$kccagf" ,count(case archived when 'YES' then 'YES' end)over(partition by thread#,sequence#) archived ,count(case applied when 'YES' then 'YES' end)over(partition by thread#,sequence#) applied ,sum(backup_count)over(partition by thread#,sequence#) backup_count ,listagg(is_recovery_dest_file||':'||dest_id,',')within group(order by dest_id)over(partition by thread#,sequence#) is_recovery_dest_file from v$archived_log left outer join sys.x$kccagf using(recid) ) group by deleted,status,is_recovery_dest_file,thread#,archived,applied,backup_count order by max(sequence#),min(sequence#),thread#,deleted desc,status;
With the following output:
DEL S IS_RECOVERY_DEST_FILE THREAD# MIN(SEQUENCE#) MAX(SEQUENCE#) MIN(FIRST MAX(NEXT_ COUNT(DISTINCTSEQUENCE#) ARCHIVED APPLIED BACKUP_COUNT COUNT("X$KCCAGF") --- - --------------------- ---------- -------------- -------------- --------- --------- ------------------------ ---------- ---------- ------------ ----------------- NO A YES:1 1 3233 3233 23-JUN-17 23-JUN-17 1 1 0 1 1 NO A YES:1,NO:2 1 3234 5387 23-JUN-17 21-JUL-17 2154 2 1 1 2154 NO A YES:1,NO:2 1 5388 11596 21-JUL-17 10-OCT-17 6209 2 1 0 6208 NO A YES:1,NO:2 1 11597 11597 10-OCT-17 10-OCT-17 1 2 0 0 0

#############参考文档5

https://www.cnblogs.com/kerrycode/p/5684768.html

 关于Oracle数据库的RMAN备份,除了邮件外,是否能通过其它方式检查RMAN备份的成功与失败呢?其实我们可以通过下面SQL脚本来检查某个时间段备份失败的记录:

SELECT * FROM V$RMAN_STATUS 
WHERE START_TIME >= TO_DATE(&START_TIME,'YYYY-MM-DD HH24:MI:SS')
  AND END_TIME   <= TO_DATE(&END_TIME  ,'YYYY-MM-DD HH24:MI:SS')
  AND OPERATION ='BACKUP'
  AND STATUS !='COMPLETED' 
  AND STATUS NOT LIKE 'RUNNING%'

查看备份成功的历史记录:

SELECT * FROM V$RMAN_STATUS 
WHERE START_TIME >= TO_DATE(&START_TIME,'YYYY-MM-DD HH24:MI:SS')
  AND END_TIME   <= TO_DATE(&END_TIME  ,'YYYY-MM-DD HH24:MI:SS')
  AND OPERATION ='BACKUP'
  AND STATUS ='COMPLETED'

其中STATUS主要有RUNNING、RUNNING WITH WARNINGS、RUNNING WITH ERRORS、COMPLETED、COMPLETED WITH WARNINGS、COMPLETED WITH ERRORS、FAILED等几种状态。另外,如果你在命令窗口输入

[oracle@MyDB ~]$ date
Tue Jul 19 10:52:02 CST 2016
[oracle@MyDB ~]$ rman target /
 
Recovery Manager: Release 10.2.0.5.0 - Production on Tue Jul 19 10:52:13 2016
 
Copyright (c) 1982, 2007, Oracle.  All rights reserved.
 
connected to target database: SCM2 (DBID=3990839260)
 
RMAN> 

在V$RMAN_STATUS里面,你会看到插入了一条记录STATUS为RUNNING状态

SQL> COL ROW_TYPE FOR A10;
SQL> COL OPERATION FOR A10;
SQL> COL COMMAND_ID FOR A20;
SQL> COL STATUS FOR A30;
SQL> COL OBJECT_TYPE FOR A16;
SQL> SELECT ROW_TYPE, COMMAND_ID, OPERATION, STATUS,OBJECT_TYPE 
  2  FROM V$RMAN_STATUS 
  3  WHERE START_TIME >= TO_DATE('2016-07-19 10:52:00', 'YYYY-MM-DD HH24:MI:SS');
 
ROW_TYPE   COMMAND_ID           OPERATION  STATUS              OBJECT_TYPE
---------- -------------------- ---------- ----------------- ----------------
SESSION    2016-07-19T10:52:13  RMAN       RUNNING

此时如果在RMAN中随意执行一个错误命令,如下所示

[oracle@MyDB ~]$ rman target /
 
Recovery Manager: Release 10.2.0.5.0 - Production on Tue Jul 19 10:52:13 2016
 
Copyright (c) 1982, 2007, Oracle.  All rights reserved.
 
connected to target database: SCM2 (DBID=3990839260)
 
RMAN> /
 
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00558: error encountered while parsing input commands
RMAN-01006: error signalled during parse
RMAN-02001: unrecognized punctuation symbol "/"

clip_image001

SQL> COL ROW_TYPE FOR A10;
SQL> COL OPERATION FOR A10;
SQL> COL COMMAND_ID FOR A20;
SQL> COL STATUS FOR A30;
SQL> COL OBJECT_TYPE FOR A16;
SQL> SELECT ROW_TYPE, COMMAND_ID, OPERATION, STATUS,OBJECT_TYPE 
  2  FROM V$RMAN_STATUS 
  3  WHERE START_TIME >= TO_DATE('2016-07-19 10:52:00', 'YYYY-MM-DD HH24:MI:SS');
 
ROW_TYPE   COMMAND_ID           OPERATION  STATUS                  OBJECT_TYPE
---------- -------------------- ---------- --------------------- ----------------
SESSION    2016-07-19T10:52:13  RMAN       RUNNING WITH ERRORS

在RMAN中退出,此时你会看到STAUS记录从"RUNNING WITH ERRORS"变成了"COMPLETED WITH ERRORS"

clip_image002

也就是说,你可以在这个视图里面查看在RMAN里面执行的一些操作,例如删除归档日志等,另外,如果要查看RMAN的输出,可以查看V$RMAN_OUTPUT,V$RMAN_OUTPUT 视图记录了RMAN生成的信息,这是在内存中额视图,不会记录到控制文件上。最多有37278条记录.

当然也可以从V$RMAN_BACKUP_JOB_DETAILS中查看RMAN备份更详细的信息。

SELECT START_TIME,
      END_TIME,
      OUTPUT_DEVICE_TYPE,
      STATUS,
      ELAPSED_SECONDS,
      COMPRESSION_RATIO,
      INPUT_BYTES_DISPLAY,
      OUTPUT_BYTES_DISPLAY
FROM V$RMAN_BACKUP_JOB_DETAILS
ORDER BY START_TIME DESC ;

另外,如果我们需要查看RMAN备份的一些详细记录,在惜分飞的通过sql查询rman备份信息博客里面分享了下面一些经典的SQL语句。收录在此。

查看所有备份集详细信息:

SELECT A.RECID "BACKUP SET",
       A.SET_STAMP,
        DECODE (B.INCREMENTAL_LEVEL,
                '', DECODE (BACKUP_TYPE, 'L', 'Archivelog', 'Full'),
                1, 'Incr-1级',
                0, 'Incr-0级',
                B.INCREMENTAL_LEVEL)
           "Type LV",
        B.CONTROLFILE_INCLUDED "包含CTL",
        DECODE (A.STATUS,
                'A', 'AVAILABLE',
                'D', 'DELETED',
                'X', 'EXPIRED',
                'ERROR')
           "STATUS",
        A.DEVICE_TYPE "Device Type",
        A.START_TIME "Start Time",
        A.COMPLETION_TIME "Completion Time",
        A.ELAPSED_SECONDS "Elapsed Seconds",
        A.BYTES/1024/1024/1024 "Size(G)",
        A.COMPRESSED,
        A.TAG "Tag",
        A.HANDLE "Path"
   FROM GV$BACKUP_PIECE A, GV$BACKUP_SET B
  WHERE A.SET_STAMP = B.SET_STAMP AND A.DELETED = 'NO'
RDER BY A.COMPLETION_TIME DESC;

查找某个备份集中包含数据文件

SELECT DISTINCT c.file#,A.SET_STAMP, D.NAME, C.CHECKPOINT_CHANGE#, C.CHECKPOINT_TIME
FROM V$BACKUP_DATAFILE C, V$BACKUP_PIECE A, V$DATAFILE D
WHERE A.SET_STAMP = C.SET_STAMP
    AND D.FILE# = C.FILE#
    AND A.DELETED='NO'
    AND c.set_stamp=&set_stamp
 ORDER BY C.FILE#;

查询某个备份集中控制文件

SELECT DISTINCT A.SET_STAMP,
                D.NAME,
                C.CHECKPOINT_CHANGE#,
                C.CHECKPOINT_TIME
 FROM V$BACKUP_DATAFILE C, V$BACKUP_PIECE A, V$CONTROLFILE D
 WHERE A.SET_STAMP = C.SET_STAMP
   AND C.FILE# = 0
   AND A.DELETED = 'NO'
   AND C.SET_STAMP = &SET_STAMP;

查看某个备份集中归档日志:

SELECT DISTINCT B.SET_STAMP,
                B.THREAD#,
                B.SEQUENCE#,
                B.FIRST_TIME,
                B.FIRST_CHANGE#,
                B.NEXT_TIME,
                B.NEXT_CHANGE#
 FROM V$BACKUP_REDOLOG B, V$BACKUP_PIECE A
 WHERE A.SET_STAMP = B.SET_STAMP
   AND A.DELETED = 'NO'
   AND B.SET_STAMP = &SET_STAMP
 ORDER BY THREAD#, SEQUENCE#;

查看某个备份集SPFILE

SELECT DISTINCT A.SET_STAMP, B.COMPLETION_TIME, HANDLE
FROM V$BACKUP_SPFILE B, V$BACKUP_PIECE A
 WHERE A.SET_STAMP = B.SET_STAMP
   AND A.DELETED = 'NO'
   AND B.SET_STAMP = &SET_STAMP;

查看RMAN的配置信息

SELECT NAME,VALUE FROM V$RMAN_CONFIGURATION;


###参考文档6

https://www.cnblogs.com/DataArt/p/10018940.html
也就是RMAN调用了一个内部的包 sys.dbms_backup_restore.get_connect_identifier来获得在备库连接主库时需要用到的串。这时我们需要知道这个串是在哪里设置的,为何为空。

接下来,针对RMAN进行10046 trace:

[oracle@test1 ~]$ rman target / debug trace=/tmp/rman_debug

Recovery Manager: Release 11.2.0.4.0 - Production on Mon Mar 17 09:00:00 2014

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

RMAN-06568: connected to target database: R11204 (DBID=2001766638, not open)

RMAN> sql "alter session set tracefile_identifier=''rman_10046''";

RMAN-06009: using target database control file instead of recovery catalog
RMAN-06162: sql statement: alter session set tracefile_identifier=''rman_10046''

RMAN> sql "alter session set events ''10046 trace name context forever,level 12''";

RMAN-06162: sql statement: alter session set events ''10046 trace name context forever,level 12''

RMAN> backup archivelog all;
RMAN-03090: Starting backup at 25-MAR-14
RMAN-06820: WARNING: failed to archive current log at primary database
RMAN-06613: Connect identifier for DB_UNIQUE_NAME R11204 not configured
...

查看生成的trace file,这个文件在udump下:
$cd /u01/app/diag/rdbms/sdy/SDY/trace
$ls -ltr
-rw-r----- 1 oracle oinstall 1037463 Mar 25 14:11 SDY_ora_3792_rman_10046.trc

PARSING IN CURSOR #140366085001120 len=119 dep=0 uid=0 oct=47 lid=0 tim=1395736859520777 hv=3388798669 ad='7ec65738' sqlid='7pwt2c34
ztxqd'
begin   :lprimary_db_cs :=     sys.dbms_backup_restore.get_connect_identifier       (dbuname=> :primary_dbuname); end; 
END OF STMT
PARSE #140366085001120:c=0,e=285,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=0,tim=1395736859520776
BINDS #140366085001120:
 Bind#0
  oacdty=01 mxl=2000(1536) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=873 siz=2128 off=0
  kxsbbbfp=7fa986a27f08  bln=2000  avl=00  flg=05
 Bind#1
  oacdty=01 mxl=128(90) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=873 siz=0 off=2000
  kxsbbbfp=7fa986a286d8  bln=128  avl=06  flg=01
  value="R11204"
*** ACTION NAME:(0000018 STARTED189) 2014-03-25 14:10:59.521

WAIT #140366085001120: nam='control file sequential read' ela= 10 file#=0 block#=1 blocks=1 obj#=-1 tim=1395736859521532
WAIT #140366085001120: nam='control file sequential read' ela= 4 file#=0 block#=16 blocks=1 obj#=-1 tim=1395736859521566
WAIT #140366085001120: nam='control file sequential read' ela= 4 file#=0 block#=18 blocks=1 obj#=-1 tim=1395736859521580
WAIT #140366085001120: nam='control file sequential read' ela= 4 file#=0 block#=281 blocks=1 obj#=-1 tim=1395736859521594
WAIT #140366085001120: nam='control file sequential read' ela= 4 file#=0 block#=1 blocks=1 obj#=-1 tim=1395736859521614
WAIT #140366085001120: nam='control file sequential read' ela= 3 file#=0 block#=16 blocks=1 obj#=-1 tim=1395736859521627
WAIT #140366085001120: nam='control file sequential read' ela= 2 file#=0 block#=18 blocks=1 obj#=-1 tim=1395736859521638
WAIT #140366085001120: nam='control file sequential read' ela= 3 file#=0 block#=281 blocks=1 obj#=-1 tim=1395736859521650
krsd_get_primary_connect_string: found pcs '' by FAL_SERVER lookup <====================用FAL_SERVER找到了连接串''

所以这个10046 trace,很清楚地告诉我们它是从参数FAL_SERVER上获得了连接串''。

这时,连接到备库,查看参数FAL_SERVER,它的值的确为空:
SQL> show parameter fal

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
fal_client                           string
fal_server                           string  

到此,我们通过RMAN debug和10046 trace,获得了我们想要的信息。

总结一下:

如果在执行RMAN命令后,遇到了性能问题或者需要深入跟踪一个错误,那么可以考虑使用rman debug:

$ rman target <connection> catalog <connection> debug trace=/tmp/rmanDebug.trc log=/tmp/rmanLog.txt
run { 
...Run your backup commands here 
}

如果还需要跟进一步的跟踪可以再使用10046 trace:

$ rman target <connection> catalog <connection> debug trace=/tmp/rmanDebug.trc log=/tmp/rmanLog.txt
RMAN> sql "alter session set tracefile_identifier=''rman_10046''";
RMAN> sql "alter session set events ''10046 trace name context forever,level 12''";
RMAN> run-your-commands;
RMAN> exit;

需要注意的是,上面的这些方法可能会生成大量文件,需要考虑对磁盘空间的压力以及对RMAN的性能的影响。


可以参考MOS文档:RMAN: Quick Debugging Guide (Doc ID 1198753.1)