Thursday, February 15, 2007

Oracle Error ORA-1578: RMAN to the Rescue

If you are familiar with Oracle error messages, you will recognize that the true error message text for the ORA-1578 error is not, in fact, "RMAN to the Rescue". If you look this error up in the Oracle Database Error Messages manual, you'll find the text is:

ORA-01578: ORACLE data block corrupted (file # string, block # string)
Cause: The data block indicated was corrupted, mostly due to software errors.

Action: Try to restore the segment containing the block indicated. This may involve dropping the segment and recreating it. If there is a trace file, report the errors in it to your ORACLE representative.

Uh-oh. This sounds like it could be some bad news indeed. In particular, the statement "This may involve dropping the segment and recreating it." does not have an especially appealing quality to it when the segment in question is, say, a several hundred million row table, just as an example. It is a time like this that RMAN really can come to the rescue.

"But, we don't use RMAN to backup our database. We only perform user managed backups." I can hear you say. Many people are surprised to learn that even if you don't use RMAN to backup your database (even though you really should!) it can still be used to perform what is known as block media recovery. It is just this situation that I will illustrate here - performing block media recovery on a corrupt data block (ORA-1578) using RMAN but without having an RMAN backup.

Before embarking on this adventure, I think it wise to create a new tablespace, and, thus, datafile dedicated to this purpose. I will, naturally, be using SQL*Plus for this activity:


sqlplus "/ as sysdba"

create tablespace test logging datafile 'D:\ORACLE_DB\LT10GR2\TEST01.DBF' size 1M;

create table ctest (c varchar2(1)) tablespace test;

insert into ctest values ('a');

commit;


So, now I have a test tablespace and a table.

Since I am going to show how to perform block media recovery using a non-RMAN backup, I had better backup the data file! In order to do that, I simply shutdown the database:


shutdown immediate;


Then I backup the file in a Command Prompt window by copying it to a directory called "backup":


D:\oracle_db\lt10gr2>copy TEST01.DBF backup\TEST01.DBF
1 file(s) copied.


NOTE: If your database is in ARCHIVELOG mode, you do not need to shut it down before copying the data file; however, remember to issue an “alter tablespace TEST begin backup;” before copying the data file and an “alter tablespace TEST end backup;” after the file has been copied. As this database is a “sandbox” database on my laptop, it runs in NOARCHIVELOG mode most of the time.


At this time some weird magic happened and a block belonging to the table has become corrupt. It's funny how these things can happen.


Time to start the database back up and see if the block is corrupted from Oracle’s perspective.

Back in my SQLPLUS session:


startup open;


The database has successfully started back up. Now I try to perform a select against the table:


select * from ctest;

ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 6, block # 12)
ORA-01110: data file 6: 'D:\ORACLE_DB\LT10GR2\TEST01.DBF'


Hey! There’s a familiar looking error message.

OK, so the data block belonging to my CTEST table is definitely corrupt. I’d say it is time for that RMAN magic!

Because I did not use RMAN to perform any backup operations on this database – the “backup” was simply copying a single datafile to another directory – I do not have any RMAN Recovery Catalog. Therefore I specify the “nocatalog” option when I start RMAN:


rman target=/ nocatalog


After connecting to the database with RMAN, I need to tell RMAN about the backup file which is also known as the “datafilecopy” in RMAN terms. I refer to this as “registering” the file with RMAN. Here’s how I do that:


catalog datafilecopy 'D:\ORACLE_DB\LT10GR2\BACKUP\TEST01.DBF';


RMAN responds with something similar to:


cataloged datafile copy
datafile copy filename=D:\ORACLE_DB\LT10GR2\BACKUP\TEST01.DBF recid=2 stamp=614641757


RMAN is now “aware” of this file and can use it to perform the block media recovery. Also note that all of the redo that was generated from the time I copied the file up to the present time is also available on my system. To recover the block simply enter:


blockrecover datafile 6 block 12;


RMAN responds with text similar to the following:


Starting blockrecover at 15-FEB-07
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=148 devtype=DISK

channel ORA_DISK_1: restoring block(s) from datafile copy D:\ORACLE_DB\LT10GR2\BACKUP\TEST01.DBF

starting media recovery
media recovery complete, elapsed time: 00:00:03

Finished blockrecover at 15-FEB-07


Now that RMAN has performed the block media recover, I verify if I can access the information in the block that I previously corrupted by issuing the following statement in SQLPLUS:


SQL> select * from ctest;

C
-
a

1 row selected.


How about that? Having never taken an RMAN backup of the database, I was able to use RMAN to completely recover a corrupted block in my table!

Having successfully recovered the data block, I remove the datafilecopy registration in RMAN like so:


change datafilecopy 'D:\ORACLE_DB\LT10GR2\BACKUP\TEST01.DBF' uncatalog;


You may also use the “delete” command if you wish to both remove the registration and delete the physical file in a single command.

One final reminder: the purpose of this was to demonstrate an RMAN capability. If you encounter a corrupt data block in your database and need assistance, please ensure you contact Oracle Support for the proper procedures to recover. Remember this is for demonstration purposes only. Do not perform this in a production server!