Friday, March 19, 2010

Damn you, NetBackup!

Had a really interesting issue this morning at the office. Got a call from my colleague that a mail file was having all kinds of corruption issues. Everything that could be done against the mail file had been done, and it was looking like it was time to bring the server down to fix the mail file. The error that kept flying by on the console was:
[1E90:02D3-14D8] **** DbMarkCorrupt(Folder ($Inbox) corrupt), DB=g:\Lotus\Domino\data\mail\user.nsf  TID=[1E90:02D3-14D8] File=dbfolder.c Line=672 ***
As Domino kept seeing the error, it would try to run a consistency check on the database, which made it difficult for us to remove the database. We operate in a clustered environment, so deleting the database and recreating the replica from the cluster mate is the normal mode of operation for us. But after repeated fixup, updall and compact commands with every imaginable switch thrown in, nothing would fix the database, and no amount of "drop all" and "dbcache flush" commands would release it.
So after almost giving up on it, I asked the backup administrator if she could check that the backup had completed on the database. She said it had. But I wanted to try one more thing. I logged into the server console and stopped the Windows service "NetBackup Client Service". After another "drop all" and "dbcache flush" command, the database deleted. Now we were able to recreate the database from the cluster mate. So if you ever come across an issue like this where a database appears to be held onto by something, check the backup service. It saved me from having to explain why I would have to reboot the server to fix a mail database, when it wasn't even Domino's fault.
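
For anyone who would rather script the service step than click through the Services panel, here is a rough sketch in Python. It only wraps the standard Windows "net stop" / "net start" commands around the service name quoted above. The function names are my own invention for illustration, and it assumes the service is registered under exactly that display name on your server and that you run it from an elevated prompt.

```python
import subprocess

# Display name as quoted in the post; assumption: your install uses the same name.
BACKUP_SERVICE = "NetBackup Client Service"


def service_command(action, service=BACKUP_SERVICE):
    """Build the Windows `net stop` / `net start` command for the service.

    Hypothetical helper for illustration, not part of NetBackup or Domino.
    """
    if action not in ("stop", "start"):
        raise ValueError("action must be 'stop' or 'start'")
    return ["net", action, service]


def run_service_command(action, service=BACKUP_SERVICE):
    """Run the command on the local machine and return the completed process.

    Needs administrator rights on the Domino server to succeed.
    """
    return subprocess.run(
        service_command(action, service),
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Stop the backup client so it lets go of the file.
    result = run_service_command("stop")
    print(result.stdout or result.stderr)
```

With the service stopped, the "drop all" and "dbcache flush" commands still have to be issued on the Domino console before deleting the database. Starting the service again afterwards with run_service_command("start") is my assumption about what you would want, so check with your backup administrator first.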


Keith Taylor said...

Are you otherwise content with NBU as a backup product for Domino?

Andy Donaldson said...

Yes, we don't have any other issues that I've come across. Wasn't really meant as a slam against NetBackup. Just wanted to point out that it's not always Domino's fault when things go wacky like this.

Marc Champoux said...

Hi Andy ... I've had incredibly similar issues and the solution was the same one as yours. The only difference is that we run Symantec Backup Exec V12 instead of NetBackup. I guess they use the same APIs and lock files up at the same place [under certain conditions].

David Schaffer said...

My fix for that: replicate databases to a workstation running Notes client. Back up those files during a break in the replication schedule. Notes client doesn't lock them. You can create encrypted replicas in case the backup media is lost. No extra agents to run on the server. No disruption if you need to restart the workstation to clear up problems.