28 Février 2018
Hi !
Après ce mois de Février riche en articles, je ne pouvais pas laisser le premier jour du mois sans publier. Pour ne pas changer, j'ai décidé de continuer avec mon dataguard dont je vous rappelle la configuration ci-dessous.
GMGRL> DGMGRL> show configuration Configuration - mapogos Protection Mode: MaxPerformance Members: rastadb - Primary database rastastby - Physical standby database Fast-Start Failover: DISABLED Configuration Status: SUCCESS (status updated 20 seconds ago) DGMGRL>
Il ne se passe pas grand chose sur mon environnement. Je vais donc simuler ce que l'on aime moyennement avoir sur un DG: Un GAP !
Et pour cela, je vais commencer par modifier l'état de ma base primaire.
DGMGRL> DGMGRL> edit database rastadb set state='TRANSPORT-OFF'; Succeeded. DGMGRL>
Je vais maintenant créer une table sur un schema de test, et générer quelques archives.
SQL> SQL> create table lao.t5 tablespace lao as select * from dba_objects; Table created. SQL> alter system switch logfile; System altered. SQL> / System altered. SQL> / System altered. SQL>
Vérifions donc ce que cela donne sur ma base primaire & sur ma standby.
Base primaire :
SQL> SQL> select max(sequence#) from v$archived_log; MAX(SEQUENCE#) -------------- 174
Base standby:
SQL> SQL> select max(sequence#) from v$archived_log; MAX(SEQUENCE#) -------------- 171
Toujours sur ma vm rasta (primary), je vais simuler une perte d'archivelog en renommant les trois dernières archives (sequences 172 à 173). En thérorie, dans la vraie vie on essaye toujours d'éviter ce genre de situation, mais toujours est-il que cela peut arriver..
ASMCMD> cp thread_1_seq_172.316.969290245 thread_1_seq_172.316.969290245.old copying +DATA/RASTADB/ARCHIVELOG/2018_02_28/thread_1_seq_172.316.969290245 -> +DATA/RASTADB/ARCHIVELOG/2018_02_28/thread_1_seq_172.316.969290245.old ASMCMD> cp thread_1_seq_173.317.969290245 thread_1_seq_173.317.969290245.old copying +DATA/RASTADB/ARCHIVELOG/2018_02_28/thread_1_seq_173.317.969290245 -> +DATA/RASTADB/ARCHIVELOG/2018_02_28/thread_1_seq_173.317.969290245.old ASMCMD> cp thread_1_seq_174.323.969290247 thread_1_seq_174.323.969290247.old copying +DATA/RASTADB/ARCHIVELOG/2018_02_28/thread_1_seq_174.323.969290247 -> +DATA/RASTADB/ARCHIVELOG/2018_02_28/thread_1_seq_174.323.969290247.old ASMCMD> ASMCMD> rm thread_1_seq_173.317.969290245 ASMCMD> rm thread_1_seq_174.323.969290247 ASMCMD> rm thread_1_seq_172.316.969290245 ASMCMD>
Je vais également réactiver ma primaire (ou du moins le transport des redos).
DGMGRL> DGMGRL> edit database rastadb set state='TRANSPORT-ON'; Succeeded. DGMGRL>
Me voici donc maintenant dans une situation ou sur un dataguard de production, je viens de perdre des archives qui ne sont ni sur disques, ni sauvegardées.
Et l'impact ne se fait pas attendre.
DGMGRL>
DGMGRL> show configuration
Configuration - mapogos
Protection Mode: MaxPerformance
Members:
rastadb - Primary database
Error: ORA-16724: cannot resolve gap for one or more members
rastastby - Physical standby database
Warning: ORA-16809: multiple warnings detected for the member
Fast-Start Failover: DISABLED
Configuration Status:
ERROR (status updated 17 seconds ago)
Ce qui est certain, c'est qu'il préférable ne pas avoir à faire un switchover / failover dans les moments à venir.
DGMGRL> validate database rastastby ; Database Role: Physical standby database Primary Database: rastadb Ready for Switchover: No Ready for Failover: Yes (Primary Running) Flashback Database Status: rastadb : On rastastby: Off Managed by Clusterware: rastadb : YES rastastby: YES Standby Apply-Related Information: Apply State: Running Apply Lag: 19 minutes 20 seconds (computed 0 seconds ago) Apply Delay: 0 minutes DGMGRL>
Nous avons la confirmation que ma standby n'est pas prête pour un switchover. Certes, le failover est possible. Mais cela impliquera une perte de données. On peut d'ailleurs interroger la vue v$archive_gap pour constater qu'un GAP est en cours.
SQL> SQL> select * from v$archive_gap; THREAD# LOW_SEQUENCE# HIGH_SEQUENCE# CON_ID ---------- ------------- -------------- ---------- 1 172 174 1 SQL>
Dans la mesure ou ces archives sont définitivement perdues nous avons deux options:
Sauf à gérer des petites bases, on va pencher assez rapidement vers l'option 2.
Avant la version 12c, il fallait
Oracle 12c a introduit la notion de restauration à travers un service un peu à la manière d'un "duplicate from active".
Avant de débuter les opérations de re-synchro, je vais stopper le transfert des redo & l'apply afin d'éviter des interférences.
DGMGRL> DGMGRL> DGMGRL> edit database rastastby set state='APPLY-OFF'; Succeeded. DGMGRL> DGMGRL> edit database rastadb set state='TRANSPORT-OFF'; Succeeded. DGMGRL> DGMGRL>
Sur ma VM PrettyBoy (standby), je me connecte via rman. Et comme rman permet maintenant de faire du sql (je veux dire directement, sans l'instruction sql), j'en profite pour vérifier que ma standby est en mode mount;
RMAN> RMAN> select open_mode from v$database; OPEN_MODE -------------------- MOUNTED RMAN>
Nous sommes prêt à effectuer un recover de ma standby à travers un service. Dans mon cas, j'utilise l'alias RASTADB de mon tnsnames.ora qui pointe sur le serveur primaire.
RMAN> recover database from service RASTADB noredo using compressed backupset; Starting recover at 28-FEB-18 allocated channel: ORA_DISK_1 channel ORA_DISK_1: SID=269 device type=DISK channel ORA_DISK_1: starting incremental datafile backup set restore channel ORA_DISK_1: using compressed network backup set from service RASTADB destination for restore of datafile 00001: +DATA/RASTASTBY/DATAFILE/system.263.965579849 channel ORA_DISK_1: restore complete, elapsed time: 00:00:04 channel ORA_DISK_1: starting incremental datafile backup set restore channel ORA_DISK_1: using compressed network backup set from service RASTADB destination for restore of datafile 00003: +DATA/RASTASTBY/DATAFILE/sysaux.264.965579865 channel ORA_DISK_1: restore complete, elapsed time: 00:00:03 channel ORA_DISK_1: starting incremental datafile backup set restore channel ORA_DISK_1: using compressed network backup set from service RASTADB destination for restore of datafile 00004: +DATA/RASTASTBY/DATAFILE/undotbs1.265.965579871 channel ORA_DISK_1: restore complete, elapsed time: 00:00:07 channel ORA_DISK_1: starting incremental datafile backup set restore channel ORA_DISK_1: using compressed network backup set from service RASTADB destination for restore of datafile 00005: +DATA/RASTASTBY/DATAFILE/lao.342.966869437 channel ORA_DISK_1: restore complete, elapsed time: 00:00:01 channel ORA_DISK_1: starting incremental datafile backup set restore channel ORA_DISK_1: using compressed network backup set from service RASTADB destination for restore of datafile 00007: +DATA/RASTASTBY/DATAFILE/users.266.965579873 channel ORA_DISK_1: restore complete, elapsed time: 00:00:01 Finished recover at 28-FEB-18 RMAN>
Remarque : Dans le cas de plus grosses bases, et d'un delta plus important, il est tout à faire possible de paralléliser le processus.
Ceci étant fait, il faut maintenant passer la standby en nomount afin de regenerer un fichier controlfile depuis la base primaire. Une fois cela fait, je repasse la base en mode mount.
RMAN> RMAN> restore standby controlfile from service RASTADB; Starting restore at 28-FEB-18 using target database control file instead of recovery catalog allocated channel: ORA_DISK_1 channel ORA_DISK_1: SID=24 device type=DISK channel ORA_DISK_1: starting datafile backup set restore channel ORA_DISK_1: using network backup set from service RASTADB channel ORA_DISK_1: restoring control file channel ORA_DISK_1: restore complete, elapsed time: 00:00:02 output file name=+DATA/RASTASTBY/CONTROLFILE/current.261.965579843 output file name=+DATA/RASTASTBY/CONTROLFILE/current.262.965579843 Finished restore at 28-FEB-18 RMAN> RMAN> alter database mount; Statement processed released channel: ORA_DISK_1 RMAN>
Remarque: Je suis ici dans un cas simple ou le delta est faible & ou il n'y a pas d'activité sur la base primaire. Cependant, il n'est pas impossible qu'entre le début du gap et le moment ou l'on se lance dans la resynchro, de nouveaux datafiles aient été crées. Il faudrait donc également en effectuer la restauration toujours à travers un service. Qui sait, peut être un prochain article, ou un amendement à celui-ci.
Le fait d'avoir restauré un controlfile depuis la primaire implique que pour le moment il peut y avoir incohérence entre les informations contenues dans le controlfile et la réalité. Nous allons remédier à cela.
RMAN>
RMAN> catalog start with '+DATA';
Starting implicit crosscheck backup at 28-FEB-18
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=32 device type=DISK
Crosschecked 10 objects
Finished implicit crosscheck backup at 28-FEB-18
Starting implicit crosscheck copy at 28-FEB-18
using channel ORA_DISK_1
Finished implicit crosscheck copy at 28-FEB-18
searching for all files in the recovery area
cataloging files...
cataloging done
List of Cataloged Files
=======================
File Name: +DATA/rastastby/TEMPFILE/temp.285.965579907
File Name: +DATA/rastastby/ARCHIVELOG/2018_02_28/thread_1_seq_166.452.969289737
...
File Name: +DATA/rastastby/ARCHIVELOG/2018_01_16/thread_1_seq_42.324.965598559
File Name: +DATA/rastastby/DATAFILE/system.263.965579849
File Name: +DATA/rastastby/DATAFILE/sysaux.264.965579865
File Name: +DATA/rastastby/DATAFILE/undotbs1.265.965579871
File Name: +DATA/rastastby/DATAFILE/users.266.965579873
File Name: +DATA/rastastby/DATAFILE/lao.342.966869437
searching for all files that match the pattern +DATA
List of Files Unknown to the Database
=====================================
File Name: +DATA/orapwasm
File Name: +DATA/rastastby/PARAMETERFILE/spfile.300.965594883
File Name: +DATA/rastastby/TEMPFILE/temp.285.965579907
File Name: +DATA/rastastby/ONLINELOG/group_1.269.965579879
File Name: +DATA/rastastby/ONLINELOG/group_1.270.965579879
File Name: +DATA/rastastby/ONLINELOG/group_2.271.965579881
File Name: +DATA/rastastby/ONLINELOG/group_2.272.965579883
File Name: +DATA/rastastby/ONLINELOG/group_3.273.965579883
File Name: +DATA/rastastby/ONLINELOG/group_3.274.965579885
File Name: +DATA/rastastby/ONLINELOG/group_4.275.965579885
File Name: +DATA/rastastby/ONLINELOG/group_4.276.965579887
File Name: +DATA/rastastby/ONLINELOG/group_5.277.965579889
File Name: +DATA/rastastby/ONLINELOG/group_5.278.965579889
File Name: +DATA/rastastby/ONLINELOG/group_6.279.965579891
File Name: +DATA/rastastby/ONLINELOG/group_6.280.965579891
File Name: +DATA/rastastby/ONLINELOG/group_7.284.969198429
File Name: +DATA/rastastby/ONLINELOG/group_7.283.969198429
File Name: +DATA/rastastby/CONTROLFILE/current.257.965579843
File Name: +DATA/rastastby/CONTROLFILE/current.258.965579843
File Name: +DATA/rastastby/CONTROLFILE/current.259.965579843
File Name: +DATA/rastastby/CONTROLFILE/current.260.965579843
File Name: +DATA/ASM/ASMPARAMETERFILE/registry.253.965575363
Do you really want to catalog the above files (enter YES or NO)? YES
cataloging files...
cataloging done
List of Cataloged Files
=======================
File Name: +DATA/rastastby/TEMPFILE/temp.285.965579907
List of Files Which Were Not Cataloged
=======================================
File Name: +DATA/orapwasm
RMAN-07518: Reason: Foreign database file DBID: 0 Database Name:
File Name: +DATA/rastastby/PARAMETERFILE/spfile.300.965594883
RMAN-07518: Reason: Foreign database file DBID: 0 Database Name:
File Name: +DATA/rastastby/ONLINELOG/group_1.269.965579879
RMAN-07529: Reason: catalog is not supported for this file type
File Name: +DATA/rastastby/ONLINELOG/group_1.270.965579879
RMAN-07529: Reason: catalog is not supported for this file type
File Name: +DATA/rastastby/ONLINELOG/group_2.271.965579881
RMAN-07529: Reason: catalog is not supported for this file type
File Name: +DATA/rastastby/ONLINELOG/group_2.272.965579883
RMAN-07529: Reason: catalog is not supported for this file type
File Name: +DATA/rastastby/ONLINELOG/group_3.273.965579883
RMAN-07529: Reason: catalog is not supported for this file type
File Name: +DATA/rastastby/ONLINELOG/group_3.274.965579885
RMAN-07529: Reason: catalog is not supported for this file type
File Name: +DATA/rastastby/ONLINELOG/group_4.275.965579885
RMAN-07529: Reason: catalog is not supported for this file type
File Name: +DATA/rastastby/ONLINELOG/group_4.276.965579887
RMAN-07529: Reason: catalog is not supported for this file type
File Name: +DATA/rastastby/ONLINELOG/group_5.277.965579889
RMAN-07529: Reason: catalog is not supported for this file type
File Name: +DATA/rastastby/ONLINELOG/group_5.278.965579889
RMAN-07529: Reason: catalog is not supported for this file type
File Name: +DATA/rastastby/ONLINELOG/group_6.279.965579891
RMAN-07529: Reason: catalog is not supported for this file type
File Name: +DATA/rastastby/ONLINELOG/group_6.280.965579891
RMAN-07529: Reason: catalog is not supported for this file type
File Name: +DATA/rastastby/ONLINELOG/group_7.284.969198429
RMAN-07529: Reason: catalog is not supported for this file type
File Name: +DATA/rastastby/ONLINELOG/group_7.283.969198429
RMAN-07529: Reason: catalog is not supported for this file type
File Name: +DATA/rastastby/CONTROLFILE/current.257.965579843
RMAN-07517: Reason: The file header is corrupted
File Name: +DATA/rastastby/CONTROLFILE/current.258.965579843
RMAN-07517: Reason: The file header is corrupted
File Name: +DATA/rastastby/CONTROLFILE/current.259.965579843
RMAN-07517: Reason: The file header is corrupted
File Name: +DATA/rastastby/CONTROLFILE/current.260.965579843
RMAN-07517: Reason: The file header is corrupted
File Name: +DATA/ASM/ASMPARAMETERFILE/registry.253.965575363
RMAN-07518: Reason: Foreign database file DBID: 0 Database Name:
RMAN>
Nous y sommes presque..
RMAN> RMAN> switch database to copy; datafile 1 switched to datafile copy "+DATA/rastastby/DATAFILE/system.263.965579849" datafile 3 switched to datafile copy "+DATA/rastastby/DATAFILE/sysaux.264.965579865" datafile 4 switched to datafile copy "+DATA/rastastby/DATAFILE/undotbs1.265.965579871" datafile 5 switched to datafile copy "+DATA/rastastby/DATAFILE/lao.342.966869437" datafile 7 switched to datafile copy "+DATA/rastastby/DATAFILE/users.266.965579873" RMAN>
Il nous suffit maintenant à faire un peu de vide dans les redo & les Standby Redo. Dans mon cas 3 groupes de redos & 4 (3+1) groupes de Standby Redo.
SQL> SQL> SQL> alter database clear logfile group 1; Database altered. SQL> alter database clear logfile group 2; Database altered. SQL> alter database clear logfile group 3; Database altered. SQL> alter database clear logfile group 4; Database altered. SQL> alter database clear logfile group 5; Database altered. SQL> alter database clear logfile group 6; Database altered. SQL> alter database clear logfile group 7; Database altered.
Normalement, nous y sommes... Plus qu'à réctiver transport & APPLY
DGMGRL> DGMGRL> edit database rastadb set state='TRANSPORT-ON'; Succeeded. DGMGRL> DGMGRL> edit database rastastby set state='APPLY-ON'; Succeeded. DGMGRL>
Le moment de vérité est arrivé.. Vérifions.
DGMGRL> DGMGRL> show configuration Configuration - mapogos Protection Mode: MaxPerformance Members: rastadb - Primary database rastastby - Physical standby database Fast-Start Failover: DISABLED Configuration Status: SUCCESS (status updated 21 seconds ago)
Cela a l'air plutôt bien. Mais un petit validate database nous confirmera cela.
DGMGRL>
DGMGRL> validate database rastastby
Database Role: Physical standby database
Primary Database: rastadb
Ready for Switchover: Yes
Ready for Failover: Yes (Primary Running)
Flashback Database Status:
rastadb : On
rastastby: Off
Managed by Clusterware:
rastadb : YES
rastastby: YES
DGMGRL>
Et depuis ma standby database..
SQL> SQL> select * from v$archive_gap; no rows selected SQL>
Cette méthode à le mérite d'être efficace, et d'éviter par exemple de se tromper dans le SCN pour effectuer le backup diff.
Enjoy !