Some more ideas: We back up the TSM server itself into its own storage pool. Each week we force a move data of all of that pool's offsite tapes, which keeps the data essentially on one tape, and the process is quite fast. I have a perl script that figures all this out and issues the move data reconstruct commands. It is relatively easy to do.

The other piece is that we use mksysb, not the TSM backup of the server itself, to restore our TSM server. That makes recovery quick even for a server with a 100GB database. We do the mksysb restore, which carries all the scripts for the rest of the DR restore (all clients and the server), then restore the database with the TSM server's DSMSERV RESTORE DB command, and we are set. The TSM server's backup of itself is used only if we need a copy of a file that is not on the mksysb. I have nearly automated the mksysb process; it uses a TSM storage pool, with a bunch of empty private tapes assigned to it, to manage the tapes. The perl script that does the management is in testing. It will do the mksysb and handle the checkout and checkin automatically, just by kicking off the script.
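For anyone who wants to try the weekly forced move data described above, here is a rough sketch of the command-generation step. It is in Python rather than perl and is not the script mentioned above. The volume tuples, the pool name TSMSRV_COPY, and the function name are made up for illustration, and the exact MOVE DATA parameters should be checked against the Administrator's Reference for your server level. It only builds command strings; it does not talk to a server.

```python
# Sketch only: builds TSM admin commands as strings and never contacts a server.
# The pool name, volume names, and tuple layout are hypothetical.

def build_move_commands(volumes, copy_pool):
    """Return one MOVE DATA command per offsite volume in copy_pool.

    volumes: iterable of (volume_name, pool_name, access) tuples, for
    example parsed from an export of the server's volume list.
    """
    commands = []
    for name, pool, access in sorted(volumes):
        # Only touch volumes in the TSM server's own copy pool that are offsite.
        if pool == copy_pool and access.upper() == "OFFSITE":
            commands.append(f"move data {name} reconstruct=yes wait=yes")
    return commands


if __name__ == "__main__":
    sample = [
        ("A00002", "TSMSRV_COPY", "Offsite"),
        ("A00001", "TSMSRV_COPY", "Offsite"),
        ("B00007", "OTHER_COPY", "Offsite"),
        ("A00003", "TSMSRV_COPY", "Readwrite"),
    ]
    for cmd in build_move_commands(sample, "TSMSRV_COPY"):
        print(cmd)
```

The printed commands could be written to a macro file and reviewed before anything is run against the server.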
Some things we are toying with: We are thinking that if we create a balanced set of storage pools on tape, do a restore storage pool to disk, and then restore the clients, this is a good way to go. We have not tested it yet. The issue is that both active and inactive data are restored. But if you plan your storage pools by priority and have enough disk at the DR site to restore the largest one, then you can just delete the storage pool when you have finished that particular restore (this may require a dummy migration). We had considered asking for restore storage pool to restore only active files, but given the way aggregates are managed, that is impossible to accomplish with any kind of performance.

So our proposed method is: Restore the storage pool to the primary pool on disk. Fire up the clients to restore their root/c: drives (many at once). Then use tape to restore the databases (they are in different storage pools).

This may seem like a hokie way to accomplish something, but I am from Virginia and the Hokies are our team. So we think we can win with this. Anyone else have some ideas?

Paul D. Seay, Jr.
Technical Specialist
Naptheon Inc.
757-688-8180

-----Original Message-----
From: Robin Sharpe [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, September 10, 2002 12:03 PM
To: [EMAIL PROTECTED]
Subject: Re: Strategies for DR recovery of large clients

Werner,

I feel your pain... ;) You have hit most of the major issues of disaster recovery with TSM squarely on the head. We have had similar experience in our testing... 4-6 hours to get the TSM server up, running through loads of tapes (even though the storage pool is collocated with only three servers), 48-hour window, etc. We have proved that we can get our three critical clients back within 24 hours, but they are not nearly as big as yours. We use DLT8000 drives. Probably the best way for you to get better restore throughput is to add more drives and do concurrent restores.
TSM should only mount the tapes that actually contain the file versions you will restore. The problem is that, even with collocation, after many months of backups on a relatively active system, these files will get scattered across many tapes. Conventional wisdom suggests using collocation by filespace to reduce this effect... and also to guarantee that concurrent restores of different file systems will not compete for the same tape volume. But the cost is, of course, using a lot more tape.

Another approach might be to occasionally (every three months maybe) do a "full" backup (by changing mode to "absolute" to force even unchanged files to get backed up)... this should effectively "defragment" the tape pool and put all active versions on one tape (or a couple). We did this once with an additional machine that we DR'ed and it worked quite well. Some people don't like this concept because it defeats TSM's "progressive" backup methodology, but I think it's an acceptable compromise.

As you said, backup sets are not a good option for DR... for one thing, creating the backupset will take as long as restoring the whole system, and will read the same number of tapes. You will suffer this on a regular schedule, since you'll have to make new backupsets probably every week or two. Secondly, restoring from backupsets effectively single-threads that client, because all of its data is on one or maybe two tapes.

Good luck, and please keep us posted on your results!

Robin Sharpe
Berlex Labs