Overblog
Editer l'article Suivre ce blog Administration + Créer mon blog
LAMI DBA

Articles autour des SGBD Oracle, SQL Server & PostgreSQL

Dataguard - Résoudre un GAP - Recover From Service

Hi !
Après ce mois de Février riche en articles, je ne pouvais pas laisser le premier jour du mois sans publier. Pour ne pas changer, j'ai décidé de continuer avec mon dataguard dont je vous rappelle la configuration ci-dessous.

GMGRL>
DGMGRL> show configuration

Configuration - mapogos

  Protection Mode: MaxPerformance
  Members:
  rastadb   - Primary database
    rastastby - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 20 seconds ago)

DGMGRL>


Il ne se passe pas grand chose sur mon environnement. Je vais donc simuler ce que l'on aime moyennement avoir sur un DG: Un GAP !
Et pour cela, je vais commencer par modifier l'état de ma base primaire.

 

DGMGRL>
DGMGRL> edit database rastadb set state='TRANSPORT-OFF';
Succeeded.
DGMGRL>


Je vais maintenant créer une table sur un schema de test, et générer quelques archives.

SQL>
SQL> create table lao.t5 tablespace lao as select * from dba_objects;

Table created.

SQL> alter system switch logfile;

System altered.

SQL> /

System altered.

SQL> /

System altered.

SQL>


Vérifions donc ce que cela donne sur ma base primaire & sur ma standby.
Base primaire :

SQL>
SQL> select max(sequence#) from v$archived_log;

MAX(SEQUENCE#)
--------------
           174


Base standby:

SQL>
SQL> select max(sequence#) from v$archived_log;

MAX(SEQUENCE#)
--------------
           171


Toujours sur ma vm rasta (primary), je vais simuler une perte d'archivelog en renommant les trois dernières archives (sequences 172 à 173). En thérorie, dans la vraie vie on essaye toujours d'éviter ce genre de situation, mais toujours est-il que cela peut arriver..

ASMCMD> cp thread_1_seq_172.316.969290245 thread_1_seq_172.316.969290245.old
copying +DATA/RASTADB/ARCHIVELOG/2018_02_28/thread_1_seq_172.316.969290245 -> +DATA/RASTADB/ARCHIVELOG/2018_02_28/thread_1_seq_172.316.969290245.old
ASMCMD> cp thread_1_seq_173.317.969290245 thread_1_seq_173.317.969290245.old
copying +DATA/RASTADB/ARCHIVELOG/2018_02_28/thread_1_seq_173.317.969290245 -> +DATA/RASTADB/ARCHIVELOG/2018_02_28/thread_1_seq_173.317.969290245.old
ASMCMD> cp thread_1_seq_174.323.969290247 thread_1_seq_174.323.969290247.old
copying +DATA/RASTADB/ARCHIVELOG/2018_02_28/thread_1_seq_174.323.969290247 -> +DATA/RASTADB/ARCHIVELOG/2018_02_28/thread_1_seq_174.323.969290247.old
ASMCMD>


ASMCMD> rm thread_1_seq_173.317.969290245
ASMCMD> rm thread_1_seq_174.323.969290247
ASMCMD> rm thread_1_seq_172.316.969290245
ASMCMD>


Je vais également réactiver ma primaire (ou du moins le transport des redos).

DGMGRL>
DGMGRL>  edit database rastadb set state='TRANSPORT-ON';
Succeeded.
DGMGRL>


Me voici donc maintenant dans une situation ou sur un dataguard de production, je viens de perdre des archives qui ne sont ni sur disques, ni sauvegardées.
Et l'impact ne se fait pas attendre.

DGMGRL>
DGMGRL> show configuration

Configuration - mapogos

  Protection Mode: MaxPerformance
  Members:
  rastadb   - Primary database
    Error: ORA-16724: cannot resolve gap for one or more members

    rastastby - Physical standby database
      Warning: ORA-16809: multiple warnings detected for the member

Fast-Start Failover: DISABLED

Configuration Status:
ERROR   (status updated 17 seconds ago)


Ce qui est certain, c'est qu'il préférable ne pas avoir à faire un switchover  / failover dans les moments à venir.

DGMGRL> validate database rastastby ;

  Database Role:     Physical standby database
  Primary Database:  rastadb

  Ready for Switchover:  No
  Ready for Failover:    Yes (Primary Running)

  Flashback Database Status:
    rastadb  :  On
    rastastby:  Off

  Managed by Clusterware:
    rastadb  :  YES
    rastastby:  YES

  Standby Apply-Related Information:
    Apply State:      Running
    Apply Lag:        19 minutes 20 seconds (computed 0 seconds ago)
    Apply Delay:      0 minutes

DGMGRL>


Nous avons la confirmation que ma standby n'est pas prête pour un switchover. Certes, le failover est possible. Mais cela impliquera une perte de données. On peut d'ailleurs interroger la vue v$archive_gap pour constater qu'un GAP est en cours.

SQL>
SQL> select * from v$archive_gap;

   THREAD# LOW_SEQUENCE# HIGH_SEQUENCE#     CON_ID
---------- ------------- -------------- ----------
         1           172            174          1

SQL>


Dans la mesure ou ces archives sont définitivement perdues nous avons deux options:

 

  1. Recréer la standby. Tout le monde a son petit script pour réinitialiser une standby sur son environnement. Cependant, dans le cas de base à très forte volumétrie, cela va durer longtemps, consommer des ressources.
  2. Resynchroniser la standby en effectuant une sauvegarde différentielle.

Sauf à gérer des petites bases, on va pencher assez rapidement vers l'option 2.
Avant la version 12c, il fallait

  1. identifier le SCN ou la synchro s'est arreté afin d'effectuer la sauvegarde différentielle
  2. Copier les backupset sur le serveur ou se trouve la standby à synchroniser
  3. Procéder à la restauration.

Oracle 12c a introduit la notion de restauration  à travers un service un peu à la manière d'un "duplicate from active".

Avant de débuter les opérations de re-synchro, je vais stopper le transfert des redo & l'apply afin d'éviter des interférences.

DGMGRL>
DGMGRL>
DGMGRL> edit database rastastby set state='APPLY-OFF';
Succeeded.
DGMGRL>
DGMGRL> edit database rastadb set state='TRANSPORT-OFF';
Succeeded.
DGMGRL>
DGMGRL>


Sur ma VM PrettyBoy (standby), je me connecte via rman. Et comme rman permet maintenant de faire du sql (je veux dire directement, sans l'instruction sql), j'en profite pour vérifier que ma standby est en mode mount;

RMAN>

RMAN> select open_mode from v$database;

OPEN_MODE
--------------------
MOUNTED

RMAN>


Nous sommes prêt à effectuer un recover de ma standby à travers un service. Dans mon cas, j'utilise l'alias RASTADB de mon tnsnames.ora qui pointe sur le serveur primaire.

RMAN> recover database from service RASTADB noredo using compressed backupset;

Starting recover at 28-FEB-18
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=269 device type=DISK
channel ORA_DISK_1: starting incremental datafile backup set restore
channel ORA_DISK_1: using compressed network backup set from service RASTADB
destination for restore of datafile 00001: +DATA/RASTASTBY/DATAFILE/system.263.965579849
channel ORA_DISK_1: restore complete, elapsed time: 00:00:04
channel ORA_DISK_1: starting incremental datafile backup set restore
channel ORA_DISK_1: using compressed network backup set from service RASTADB
destination for restore of datafile 00003: +DATA/RASTASTBY/DATAFILE/sysaux.264.965579865
channel ORA_DISK_1: restore complete, elapsed time: 00:00:03
channel ORA_DISK_1: starting incremental datafile backup set restore
channel ORA_DISK_1: using compressed network backup set from service RASTADB
destination for restore of datafile 00004: +DATA/RASTASTBY/DATAFILE/undotbs1.265.965579871
channel ORA_DISK_1: restore complete, elapsed time: 00:00:07
channel ORA_DISK_1: starting incremental datafile backup set restore
channel ORA_DISK_1: using compressed network backup set from service RASTADB
destination for restore of datafile 00005: +DATA/RASTASTBY/DATAFILE/lao.342.966869437
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
channel ORA_DISK_1: starting incremental datafile backup set restore
channel ORA_DISK_1: using compressed network backup set from service RASTADB
destination for restore of datafile 00007: +DATA/RASTASTBY/DATAFILE/users.266.965579873
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01

Finished recover at 28-FEB-18

RMAN>


Remarque : Dans le cas de plus grosses bases, et d'un delta plus important, il est tout à faire possible de paralléliser le processus.
Ceci étant fait, il faut maintenant passer la standby en nomount afin de regenerer un fichier controlfile depuis la base primaire. Une fois cela fait, je repasse la base en mode mount.

RMAN>

RMAN> restore standby controlfile from service RASTADB;

Starting restore at 28-FEB-18
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=24 device type=DISK

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: using network backup set from service RASTADB
channel ORA_DISK_1: restoring control file
channel ORA_DISK_1: restore complete, elapsed time: 00:00:02
output file name=+DATA/RASTASTBY/CONTROLFILE/current.261.965579843
output file name=+DATA/RASTASTBY/CONTROLFILE/current.262.965579843
Finished restore at 28-FEB-18

RMAN>

RMAN> alter database mount;

Statement processed
released channel: ORA_DISK_1

RMAN>


Remarque: Je suis ici dans un cas simple ou le delta est faible & ou il n'y a pas d'activité sur la base primaire. Cependant, il n'est pas impossible qu'entre le début du gap et le moment ou l'on se lance dans la resynchro, de nouveaux datafiles aient été crées. Il faudrait donc également en effectuer la restauration toujours à travers un service. Qui sait, peut être un prochain article, ou un amendement à celui-ci.
Le fait d'avoir restauré un controlfile depuis la primaire implique que pour le moment il peut y avoir incohérence entre les informations contenues dans le controlfile et la réalité. Nous allons remédier à cela.

RMAN>
RMAN> catalog start with '+DATA';

Starting implicit crosscheck backup at 28-FEB-18
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=32 device type=DISK
Crosschecked 10 objects
Finished implicit crosscheck backup at 28-FEB-18

Starting implicit crosscheck copy at 28-FEB-18
using channel ORA_DISK_1
Finished implicit crosscheck copy at 28-FEB-18

searching for all files in the recovery area
cataloging files...
cataloging done

List of Cataloged Files
=======================
File Name: +DATA/rastastby/TEMPFILE/temp.285.965579907
File Name: +DATA/rastastby/ARCHIVELOG/2018_02_28/thread_1_seq_166.452.969289737
...
File Name: +DATA/rastastby/ARCHIVELOG/2018_01_16/thread_1_seq_42.324.965598559
File Name: +DATA/rastastby/DATAFILE/system.263.965579849
File Name: +DATA/rastastby/DATAFILE/sysaux.264.965579865
File Name: +DATA/rastastby/DATAFILE/undotbs1.265.965579871
File Name: +DATA/rastastby/DATAFILE/users.266.965579873
File Name: +DATA/rastastby/DATAFILE/lao.342.966869437

searching for all files that match the pattern +DATA

List of Files Unknown to the Database
=====================================
File Name: +DATA/orapwasm
File Name: +DATA/rastastby/PARAMETERFILE/spfile.300.965594883
File Name: +DATA/rastastby/TEMPFILE/temp.285.965579907
File Name: +DATA/rastastby/ONLINELOG/group_1.269.965579879
File Name: +DATA/rastastby/ONLINELOG/group_1.270.965579879
File Name: +DATA/rastastby/ONLINELOG/group_2.271.965579881
File Name: +DATA/rastastby/ONLINELOG/group_2.272.965579883
File Name: +DATA/rastastby/ONLINELOG/group_3.273.965579883
File Name: +DATA/rastastby/ONLINELOG/group_3.274.965579885
File Name: +DATA/rastastby/ONLINELOG/group_4.275.965579885
File Name: +DATA/rastastby/ONLINELOG/group_4.276.965579887
File Name: +DATA/rastastby/ONLINELOG/group_5.277.965579889
File Name: +DATA/rastastby/ONLINELOG/group_5.278.965579889
File Name: +DATA/rastastby/ONLINELOG/group_6.279.965579891
File Name: +DATA/rastastby/ONLINELOG/group_6.280.965579891
File Name: +DATA/rastastby/ONLINELOG/group_7.284.969198429
File Name: +DATA/rastastby/ONLINELOG/group_7.283.969198429
File Name: +DATA/rastastby/CONTROLFILE/current.257.965579843
File Name: +DATA/rastastby/CONTROLFILE/current.258.965579843
File Name: +DATA/rastastby/CONTROLFILE/current.259.965579843
File Name: +DATA/rastastby/CONTROLFILE/current.260.965579843
File Name: +DATA/ASM/ASMPARAMETERFILE/registry.253.965575363

Do you really want to catalog the above files (enter YES or NO)? YES
cataloging files...
cataloging done

List of Cataloged Files
=======================
File Name: +DATA/rastastby/TEMPFILE/temp.285.965579907

List of Files Which Were Not Cataloged
=======================================
File Name: +DATA/orapwasm
  RMAN-07518: Reason: Foreign database file DBID: 0  Database Name:
File Name: +DATA/rastastby/PARAMETERFILE/spfile.300.965594883
  RMAN-07518: Reason: Foreign database file DBID: 0  Database Name:
File Name: +DATA/rastastby/ONLINELOG/group_1.269.965579879
  RMAN-07529: Reason: catalog is not supported for this file type
File Name: +DATA/rastastby/ONLINELOG/group_1.270.965579879
  RMAN-07529: Reason: catalog is not supported for this file type
File Name: +DATA/rastastby/ONLINELOG/group_2.271.965579881
  RMAN-07529: Reason: catalog is not supported for this file type
File Name: +DATA/rastastby/ONLINELOG/group_2.272.965579883
  RMAN-07529: Reason: catalog is not supported for this file type
File Name: +DATA/rastastby/ONLINELOG/group_3.273.965579883
  RMAN-07529: Reason: catalog is not supported for this file type
File Name: +DATA/rastastby/ONLINELOG/group_3.274.965579885
  RMAN-07529: Reason: catalog is not supported for this file type
File Name: +DATA/rastastby/ONLINELOG/group_4.275.965579885
  RMAN-07529: Reason: catalog is not supported for this file type
File Name: +DATA/rastastby/ONLINELOG/group_4.276.965579887
  RMAN-07529: Reason: catalog is not supported for this file type
File Name: +DATA/rastastby/ONLINELOG/group_5.277.965579889
  RMAN-07529: Reason: catalog is not supported for this file type
File Name: +DATA/rastastby/ONLINELOG/group_5.278.965579889
  RMAN-07529: Reason: catalog is not supported for this file type
File Name: +DATA/rastastby/ONLINELOG/group_6.279.965579891
  RMAN-07529: Reason: catalog is not supported for this file type
File Name: +DATA/rastastby/ONLINELOG/group_6.280.965579891
  RMAN-07529: Reason: catalog is not supported for this file type
File Name: +DATA/rastastby/ONLINELOG/group_7.284.969198429
  RMAN-07529: Reason: catalog is not supported for this file type
File Name: +DATA/rastastby/ONLINELOG/group_7.283.969198429
  RMAN-07529: Reason: catalog is not supported for this file type
File Name: +DATA/rastastby/CONTROLFILE/current.257.965579843
  RMAN-07517: Reason: The file header is corrupted
File Name: +DATA/rastastby/CONTROLFILE/current.258.965579843
  RMAN-07517: Reason: The file header is corrupted
File Name: +DATA/rastastby/CONTROLFILE/current.259.965579843
  RMAN-07517: Reason: The file header is corrupted
File Name: +DATA/rastastby/CONTROLFILE/current.260.965579843
  RMAN-07517: Reason: The file header is corrupted
File Name: +DATA/ASM/ASMPARAMETERFILE/registry.253.965575363
  RMAN-07518: Reason: Foreign database file DBID: 0  Database Name:

RMAN>


Nous y sommes presque..

RMAN>

RMAN> switch database to copy;

datafile 1 switched to datafile copy "+DATA/rastastby/DATAFILE/system.263.965579849"
datafile 3 switched to datafile copy "+DATA/rastastby/DATAFILE/sysaux.264.965579865"
datafile 4 switched to datafile copy "+DATA/rastastby/DATAFILE/undotbs1.265.965579871"
datafile 5 switched to datafile copy "+DATA/rastastby/DATAFILE/lao.342.966869437"
datafile 7 switched to datafile copy "+DATA/rastastby/DATAFILE/users.266.965579873"

RMAN>


Il nous suffit maintenant à faire un peu de vide dans les redo & les Standby Redo. Dans mon cas 3 groupes de redos  &  4 (3+1) groupes de Standby Redo.

SQL>
SQL>
SQL> alter database clear logfile group 1;

Database altered.

SQL> alter database clear logfile group 2;

Database altered.


SQL> alter database clear logfile group 3;

Database altered.

SQL> alter database clear logfile group 4;

Database altered.

SQL> alter database clear logfile group 5;

Database altered.

SQL> alter database clear logfile group 6;

Database altered.

SQL> alter database clear logfile group 7;

Database altered.


Normalement, nous y sommes... Plus qu'à réctiver transport & APPLY

DGMGRL>
DGMGRL> edit database rastadb set state='TRANSPORT-ON';
Succeeded.
DGMGRL>
DGMGRL> edit database rastastby set state='APPLY-ON';
Succeeded.
DGMGRL>

Le moment de vérité est arrivé.. Vérifions.

DGMGRL>
DGMGRL> show configuration

Configuration - mapogos

  Protection Mode: MaxPerformance
  Members:
  rastadb   - Primary database
    rastastby - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 21 seconds ago)


Cela a l'air plutôt bien. Mais un petit validate database nous confirmera cela.

DGMGRL>
DGMGRL> validate database rastastby

  Database Role:     Physical standby database
  Primary Database:  rastadb

  Ready for Switchover:  Yes
  Ready for Failover:    Yes (Primary Running)

  Flashback Database Status:
    rastadb  :  On
    rastastby:  Off

  Managed by Clusterware:
    rastadb  :  YES
    rastastby:  YES

DGMGRL>


Et depuis ma standby database..

SQL>
SQL> select * from v$archive_gap;

no rows selected

SQL>


Cette méthode à le mérite d'être efficace, et d'éviter par exemple de se tromper dans le SCN pour effectuer le backup diff.

Enjoy !


 

Partager cet article
Repost0
Pour être informé des derniers articles, inscrivez vous :
Commenter cet article