Re: [zfs-discuss] multiple disk failure (solved?)
On Feb 1, 2011, at 5:56 AM, Mike Tancsa wrote:
> On 1/31/2011 4:19 PM, Mike Tancsa wrote:
>> On 1/31/2011 3:14 PM, Cindy Swearingen wrote:
>>> Hi Mike,
>>>
>>> Yes, this is looking much better.
>>>
>>> Some combination of removing corrupted files indicated in the zpool
>>> status -v output, running zpool scrub and then zpool clear should
>>> resolve the corruption, but it depends on how bad the corruption is.
>>>
>>> First, I would try the least destructive method: try to remove the
>>> files listed below by using the rm command.
>>>
>>> This entry probably means that the metadata is corrupted or some
>>> other file (like a temp file) no longer exists:
>>>
>>> tank1/argus-data:<0xc6>
>>
>> Hi Cindy,
>> I removed the files that were listed, and now I am left with
>>
>> errors: Permanent errors have been detected in the following files:
>>
>>         tank1/argus-data:<0xc5>
>>         tank1/argus-data:<0xc6>
>>         tank1/argus-data:<0xc7>
>>
>> I have started a scrub
>> scrub: scrub in progress for 0h48m, 10.90% done, 6h35m to go
>
> Looks like that was it! The scrub finished in the time it estimated and
> that was all I needed to do. I did not have to do zpool clear or any
> other commands. Is there anything beyond scrub to check the integrity
> of the pool?

That is exactly what scrub does. It validates all data on the disks.

> 0(offsite)# zpool status -v
>   pool: tank1
>  state: ONLINE
>  scrub: scrub completed after 7h32m with 0 errors on Mon Jan 31 23:00:46 2011
> config:
>
>         NAME        STATE   READ WRITE CKSUM
>         tank1       ONLINE     0     0     0
>           raidz1    ONLINE     0     0     0
>             ad0     ONLINE     0     0     0
>             ad1     ONLINE     0     0     0
>             ad4     ONLINE     0     0     0
>             ad6     ONLINE     0     0     0
>           raidz1    ONLINE     0     0     0
>             ada0    ONLINE     0     0     0
>             ada1    ONLINE     0     0     0
>             ada2    ONLINE     0     0     0
>             ada3    ONLINE     0     0     0
>           raidz1    ONLINE     0     0     0
>             ada5    ONLINE     0     0     0
>             ada8    ONLINE     0     0     0
>             ada7    ONLINE     0     0     0
>             ada6    ONLINE     0     0     0
>
> errors: No known data errors

Congrats!
-- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
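Richard's point that scrub *is* the integrity check can be condensed into a small script. This is a sketch, not a command from the thread: the `zpool` shell function below is a stub that only echoes each command, so the script runs anywhere; delete the stub to run the real zpool(8) commands against the pool.

```shell
#!/bin/sh
# Post-recovery health check: scrub validates every block against its
# checksum, and "zpool status" reports what (if anything) failed.
# "tank1" matches the pool in this thread.

POOL=tank1

zpool() {                      # stub standing in for the real zpool(8);
    echo "would run: zpool $*" # remove this function for real use
}

zpool scrub "$POOL"            # read and verify all data on the disks
zpool status -x "$POOL"        # terse health summary
zpool status -v "$POOL"        # full layout plus any per-file error list
```

The `-x` form is handy in cron jobs because it prints a single "healthy" line when there is nothing to report.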
Re: [zfs-discuss] multiple disk failure (solved?)
Excellent.

I think you are good for now as long as your hardware setup is stable.
You survived a severe hardware failure, so say a prayer and make sure
this doesn't happen again. Always have good backups.

Thanks,
Cindy

On 02/01/11 06:56, Mike Tancsa wrote:
> On 1/31/2011 4:19 PM, Mike Tancsa wrote:
>> On 1/31/2011 3:14 PM, Cindy Swearingen wrote:
>>> Hi Mike,
>>>
>>> Yes, this is looking much better.
>>>
>>> Some combination of removing corrupted files indicated in the zpool
>>> status -v output, running zpool scrub and then zpool clear should
>>> resolve the corruption, but it depends on how bad the corruption is.
>>>
>>> First, I would try the least destructive method: try to remove the
>>> files listed below by using the rm command.
>>>
>>> This entry probably means that the metadata is corrupted or some
>>> other file (like a temp file) no longer exists:
>>>
>>> tank1/argus-data:<0xc6>
>>
>> Hi Cindy,
>> I removed the files that were listed, and now I am left with
>>
>> errors: Permanent errors have been detected in the following files:
>>
>>         tank1/argus-data:<0xc5>
>>         tank1/argus-data:<0xc6>
>>         tank1/argus-data:<0xc7>
>>
>> I have started a scrub
>> scrub: scrub in progress for 0h48m, 10.90% done, 6h35m to go
>
> Looks like that was it! The scrub finished in the time it estimated and
> that was all I needed to do. I did not have to do zpool clear or any
> other commands. Is there anything beyond scrub to check the integrity
> of the pool?
>
> 0(offsite)# zpool status -v
>   pool: tank1
>  state: ONLINE
>  scrub: scrub completed after 7h32m with 0 errors on Mon Jan 31 23:00:46 2011
> config:
>
>         NAME        STATE   READ WRITE CKSUM
>         tank1       ONLINE     0     0     0
>           raidz1    ONLINE     0     0     0
>             ad0     ONLINE     0     0     0
>             ad1     ONLINE     0     0     0
>             ad4     ONLINE     0     0     0
>             ad6     ONLINE     0     0     0
>           raidz1    ONLINE     0     0     0
>             ada0    ONLINE     0     0     0
>             ada1    ONLINE     0     0     0
>             ada2    ONLINE     0     0     0
>             ada3    ONLINE     0     0     0
>           raidz1    ONLINE     0     0     0
>             ada5    ONLINE     0     0     0
>             ada8    ONLINE     0     0     0
>             ada7    ONLINE     0     0     0
>             ada6    ONLINE     0     0     0
>
> errors: No known data errors
> 0(offsite)#
>
> ---Mike
Re: [zfs-discuss] multiple disk failure (solved?)
On 1/31/2011 4:19 PM, Mike Tancsa wrote:
> On 1/31/2011 3:14 PM, Cindy Swearingen wrote:
>> Hi Mike,
>>
>> Yes, this is looking much better.
>>
>> Some combination of removing corrupted files indicated in the zpool
>> status -v output, running zpool scrub and then zpool clear should
>> resolve the corruption, but it depends on how bad the corruption is.
>>
>> First, I would try the least destructive method: try to remove the
>> files listed below by using the rm command.
>>
>> This entry probably means that the metadata is corrupted or some
>> other file (like a temp file) no longer exists:
>>
>> tank1/argus-data:<0xc6>
>
> Hi Cindy,
> I removed the files that were listed, and now I am left with
>
> errors: Permanent errors have been detected in the following files:
>
>         tank1/argus-data:<0xc5>
>         tank1/argus-data:<0xc6>
>         tank1/argus-data:<0xc7>
>
> I have started a scrub
> scrub: scrub in progress for 0h48m, 10.90% done, 6h35m to go

Looks like that was it! The scrub finished in the time it estimated and
that was all I needed to do. I did not have to do zpool clear or any
other commands. Is there anything beyond scrub to check the integrity
of the pool?

0(offsite)# zpool status -v
  pool: tank1
 state: ONLINE
 scrub: scrub completed after 7h32m with 0 errors on Mon Jan 31 23:00:46 2011
config:

        NAME        STATE   READ WRITE CKSUM
        tank1       ONLINE     0     0     0
          raidz1    ONLINE     0     0     0
            ad0     ONLINE     0     0     0
            ad1     ONLINE     0     0     0
            ad4     ONLINE     0     0     0
            ad6     ONLINE     0     0     0
          raidz1    ONLINE     0     0     0
            ada0    ONLINE     0     0     0
            ada1    ONLINE     0     0     0
            ada2    ONLINE     0     0     0
            ada3    ONLINE     0     0     0
          raidz1    ONLINE     0     0     0
            ada5    ONLINE     0     0     0
            ada8    ONLINE     0     0     0
            ada7    ONLINE     0     0     0
            ada6    ONLINE     0     0     0

errors: No known data errors
0(offsite)#

---Mike
Re: [zfs-discuss] multiple disk failure (solved?)
On Jan 31, 2011, at 1:19 PM, Mike Tancsa wrote:
> On 1/31/2011 3:14 PM, Cindy Swearingen wrote:
>> Hi Mike,
>>
>> Yes, this is looking much better.
>>
>> Some combination of removing corrupted files indicated in the zpool
>> status -v output, running zpool scrub and then zpool clear should
>> resolve the corruption, but it depends on how bad the corruption is.
>>
>> First, I would try the least destructive method: try to remove the
>> files listed below by using the rm command.
>>
>> This entry probably means that the metadata is corrupted or some
>> other file (like a temp file) no longer exists:
>>
>> tank1/argus-data:<0xc6>
>
> Hi Cindy,
> I removed the files that were listed, and now I am left with
>
> errors: Permanent errors have been detected in the following files:
>
>         tank1/argus-data:<0xc5>
>         tank1/argus-data:<0xc6>
>         tank1/argus-data:<0xc7>
>
> I have started a scrub
> scrub: scrub in progress for 0h48m, 10.90% done, 6h35m to go
>
> I will report back once the scrub is done!

The "permanent" errors report shows the current and previous results.
When you have multiple failures that are recovered, consider running
scrub twice before attempting to correct or delete files.
-- richard
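Richard's "scrub twice" advice can be sketched as a script. This is an illustration, not a command sequence from the thread: the `zpool` function is a stub that only echoes, and `scrub_in_progress` is stubbed to report completion immediately, so the sketch runs without a real pool. For real use, delete the `zpool` stub and use the `grep` check shown in the comment.

```shell
#!/bin/sh
# After recovering from a multi-disk failure, run two full scrub passes
# before correcting or deleting files, since the permanent-errors report
# mixes current and previous results.

POOL=tank1

zpool() { echo "would run: zpool $*"; }    # stub; remove for real use

scrub_in_progress() {
    # Real check: zpool status "$1" | grep -q 'scrub in progress'
    false                                  # stub: scrub finishes instantly
}

for pass in 1 2; do
    zpool scrub "$POOL"
    while scrub_in_progress "$POOL"; do    # wait for the pass to finish
        sleep 60
    done
    echo "scrub pass $pass complete"
done
zpool status -v "$POOL"    # only now act on whatever errors remain
```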
Re: [zfs-discuss] multiple disk failure (solved?)
On 1/31/2011 3:14 PM, Cindy Swearingen wrote:
> Hi Mike,
>
> Yes, this is looking much better.
>
> Some combination of removing corrupted files indicated in the zpool
> status -v output, running zpool scrub and then zpool clear should
> resolve the corruption, but it depends on how bad the corruption is.
>
> First, I would try the least destructive method: try to remove the
> files listed below by using the rm command.
>
> This entry probably means that the metadata is corrupted or some
> other file (like a temp file) no longer exists:
>
> tank1/argus-data:<0xc6>

Hi Cindy,
I removed the files that were listed, and now I am left with

errors: Permanent errors have been detected in the following files:

        tank1/argus-data:<0xc5>
        tank1/argus-data:<0xc6>
        tank1/argus-data:<0xc7>

I have started a scrub
scrub: scrub in progress for 0h48m, 10.90% done, 6h35m to go

I will report back once the scrub is done!

---Mike
Re: [zfs-discuss] multiple disk failure (solved?)
Hi Mike,

Yes, this is looking much better.

Some combination of removing corrupted files indicated in the zpool
status -v output, running zpool scrub and then zpool clear should
resolve the corruption, but it depends on how bad the corruption is.

First, I would try the least destructive method: try to remove the
files listed below by using the rm command.

This entry probably means that the metadata is corrupted or some
other file (like a temp file) no longer exists:

tank1/argus-data:<0xc6>

If you are able to remove the individual file with rm, run another
zpool scrub and then a zpool clear to clear the pool errors. You might
need to repeat the zpool scrub/zpool clear combo.

If you can't remove the individual files, then you might have to
destroy the tank1/argus-data file system.

Let us know what actually works.

Thanks,
Cindy

On 01/31/11 12:20, Mike Tancsa wrote:

On 1/29/2011 6:18 PM, Richard Elling wrote:

On Jan 29, 2011, at 12:58 PM, Mike Tancsa wrote:

On 1/29/2011 12:57 PM, Richard Elling wrote:

0(offsite)# zpool status
  pool: tank1
 state: UNAVAIL
status: One or more devices could not be opened. There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

        NAME        STATE    READ WRITE CKSUM
        tank1       UNAVAIL     0     0     0  insufficient replicas
          raidz1    ONLINE      0     0     0
            ad0     ONLINE      0     0     0
            ad1     ONLINE      0     0     0
            ad4     ONLINE      0     0     0
            ad6     ONLINE      0     0     0
          raidz1    ONLINE      0     0     0
            ada4    ONLINE      0     0     0
            ada5    ONLINE      0     0     0
            ada6    ONLINE      0     0     0
            ada7    ONLINE      0     0     0
          raidz1    UNAVAIL     0     0     0  insufficient replicas
            ada0    UNAVAIL     0     0     0  cannot open
            ada1    UNAVAIL     0     0     0  cannot open
            ada2    UNAVAIL     0     0     0  cannot open
            ada3    UNAVAIL     0     0     0  cannot open
0(offsite)#

This is usually easily solved without data loss by making the disks
available again. Can you read anything from the disks using any program?

That's the strange thing, the disks are readable. The drive cage just
reset a couple of times prior to the crash. But they seem OK now. Same
order as well.

# camcontrol devlist
at scbus0 target 0 lun 0 (pass0,ada0)
at scbus0 target 1 lun 0 (pass1,ada1)
at scbus0 target 2 lun 0 (pass2,ada2)
at scbus0 target 3 lun 0 (pass3,ada3)

# dd if=/dev/ada2 of=/dev/null count=20 bs=1024
20+0 records in
20+0 records out
20480 bytes transferred in 0.001634 secs (12534561 bytes/sec)
0(offsite)#

The next step is to run "zdb -l" and look for all 4 labels. Something like:
zdb -l /dev/ada2

If all 4 labels exist for each drive and appear intact, then look more
closely at how the OS locates the vdevs. If you can't solve the "UNAVAIL"
problem, you won't be able to import the pool.
-- richard

On 1/29/2011 10:13 PM, James R. Van Artsdalen wrote:

On 1/28/2011 4:46 PM, Mike Tancsa wrote:

I had just added another set of disks to my zfs array. It looks like the
drive cage with the new drives is faulty. I had added a couple of files
to the main pool, but not much. Is there any way to restore the pool
below? I have a lot of files on ad0,1,4,6 and ada4,5,6,7 and perhaps
one file on the new drives in the bad cage.

Get another enclosure and verify it works OK. Then move the disks from
the suspect enclosure to the tested enclosure and try to import the pool.

The problem may be cabling or the controller instead - you didn't
specify how the disks were attached or which version of FreeBSD you're
using.

First off thanks to all who responded on and offlist! Good news (for me)
it seems. New cage and all seems to be recognized correctly. The history
is ...

2010-04-22.14:27:38 zpool add tank1 raidz /dev/ada4 /dev/ada5 /dev/ada6 /dev/ada7
2010-06-11.13:49:33 zfs create tank1/argus-data
2010-06-11.13:49:41 zfs create tank1/argus-data/previous
2010-06-11.13:50:38 zfs set compression=off tank1/argus-data
2010-08-06.12:20:59 zpool replace tank1 ad1 ad1
2010-09-16.10:17:51 zpool upgrade -a
2011-01-28.11:45:43 zpool add tank1 raidz /dev/ada0 /dev/ada1 /dev/ada2 /dev/ada3

FreeBSD RELENG_8 from last week, 8G of RAM, amd64.

zpool status -v
  pool: tank1
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE   READ WRITE CKSUM
        tank1       ONLINE     0     0     0
          raidz1    ONLINE     0     0
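Cindy's procedure (rm the listed files, then scrub, then clear, repeating the combo if errors remain) can be sketched as one script. This is an illustration only: the file path is hypothetical (the thread's remaining victims are shown only as object numbers like tank1/argus-data:<0xc6>, which have no path to rm), and `zpool` and the rm wrapper are stubs that echo so the sketch runs without a real pool.

```shell
#!/bin/sh
# Sketch of the rm / scrub / clear sequence described above.

POOL=tank1

zpool() { echo "would run: zpool $*"; }            # stub; remove for real use
remove_corrupt() { echo "would run: rm -- $1"; }   # stub around rm(1)

# 1. Remove corrupted files that still resolve to a real path.
remove_corrupt /tank1/argus-data/corrupt.file      # hypothetical file name

# 2. Scrub, then clear the pool errors; the combo may need repeating.
for attempt in 1 2; do
    zpool scrub "$POOL"
    zpool clear "$POOL"
done
zpool status -v "$POOL"    # check whether the error list is empty now
```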
Re: [zfs-discuss] multiple disk failure (solved?)
On 1/29/2011 6:18 PM, Richard Elling wrote:
> On Jan 29, 2011, at 12:58 PM, Mike Tancsa wrote:
>> On 1/29/2011 12:57 PM, Richard Elling wrote:
>>>> 0(offsite)# zpool status
>>>>   pool: tank1
>>>>  state: UNAVAIL
>>>> status: One or more devices could not be opened. There are insufficient
>>>>         replicas for the pool to continue functioning.
>>>> action: Attach the missing device and online it using 'zpool online'.
>>>>    see: http://www.sun.com/msg/ZFS-8000-3C
>>>>  scrub: none requested
>>>> config:
>>>>
>>>>         NAME        STATE    READ WRITE CKSUM
>>>>         tank1       UNAVAIL     0     0     0  insufficient replicas
>>>>           raidz1    ONLINE      0     0     0
>>>>             ad0     ONLINE      0     0     0
>>>>             ad1     ONLINE      0     0     0
>>>>             ad4     ONLINE      0     0     0
>>>>             ad6     ONLINE      0     0     0
>>>>           raidz1    ONLINE      0     0     0
>>>>             ada4    ONLINE      0     0     0
>>>>             ada5    ONLINE      0     0     0
>>>>             ada6    ONLINE      0     0     0
>>>>             ada7    ONLINE      0     0     0
>>>>           raidz1    UNAVAIL     0     0     0  insufficient replicas
>>>>             ada0    UNAVAIL     0     0     0  cannot open
>>>>             ada1    UNAVAIL     0     0     0  cannot open
>>>>             ada2    UNAVAIL     0     0     0  cannot open
>>>>             ada3    UNAVAIL     0     0     0  cannot open
>>>> 0(offsite)#
>>>
>>> This is usually easily solved without data loss by making the
>>> disks available again. Can you read anything from the disks using
>>> any program?
>>
>> That's the strange thing, the disks are readable. The drive cage just
>> reset a couple of times prior to the crash. But they seem OK now. Same
>> order as well.
>>
>> # camcontrol devlist
>> at scbus0 target 0 lun 0 (pass0,ada0)
>> at scbus0 target 1 lun 0 (pass1,ada1)
>> at scbus0 target 2 lun 0 (pass2,ada2)
>> at scbus0 target 3 lun 0 (pass3,ada3)
>>
>> # dd if=/dev/ada2 of=/dev/null count=20 bs=1024
>> 20+0 records in
>> 20+0 records out
>> 20480 bytes transferred in 0.001634 secs (12534561 bytes/sec)
>> 0(offsite)#
>
> The next step is to run "zdb -l" and look for all 4 labels. Something like:
> zdb -l /dev/ada2
>
> If all 4 labels exist for each drive and appear intact, then look more
> closely at how the OS locates the vdevs. If you can't solve the "UNAVAIL"
> problem, you won't be able to import the pool.
> -- richard

On 1/29/2011 10:13 PM, James R. Van Artsdalen wrote:
> On 1/28/2011 4:46 PM, Mike Tancsa wrote:
>>
>> I had just added another set of disks to my zfs array. It looks like the
>> drive cage with the new drives is faulty. I had added a couple of files
>> to the main pool, but not much. Is there any way to restore the pool
>> below? I have a lot of files on ad0,1,4,6 and ada4,5,6,7 and perhaps
>> one file on the new drives in the bad cage.
>
> Get another enclosure and verify it works OK. Then move the disks from
> the suspect enclosure to the tested enclosure and try to import the pool.
>
> The problem may be cabling or the controller instead - you didn't
> specify how the disks were attached or which version of FreeBSD you're
> using.

First off thanks to all who responded on and offlist! Good news (for me)
it seems. New cage and all seems to be recognized correctly. The history
is ...

2010-04-22.14:27:38 zpool add tank1 raidz /dev/ada4 /dev/ada5 /dev/ada6 /dev/ada7
2010-06-11.13:49:33 zfs create tank1/argus-data
2010-06-11.13:49:41 zfs create tank1/argus-data/previous
2010-06-11.13:50:38 zfs set compression=off tank1/argus-data
2010-08-06.12:20:59 zpool replace tank1 ad1 ad1
2010-09-16.10:17:51 zpool upgrade -a
2011-01-28.11:45:43 zpool add tank1 raidz /dev/ada0 /dev/ada1 /dev/ada2 /dev/ada3

FreeBSD RELENG_8 from last week, 8G of RAM, amd64.

zpool status -v
  pool: tank1
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE   READ WRITE CKSUM
        tank1       ONLINE     0     0     0
          raidz1    ONLINE     0     0     0
            ad0     ONLINE     0     0     0
            ad1     ONLINE     0     0     0
            ad4     ONLINE     0     0     0
            ad6     ONLINE     0     0     0
          raidz1    ONLINE     0     0     0
            ada0    ONLINE     0     0     0
            ada1    ONLINE     0     0     0
            ada2    ONLINE     0     0     0
            ada3    ONLINE     0     0     0
          raidz1    ONLINE     0     0     0
            ada5    ONLINE     0     0     0
            ada8    ONLINE     0     0     0
            ada7    ONLINE     0     0     0
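Richard's zdb suggestion generalizes to every member of the unavailable vdev. A sketch of that label check: "zdb -l <device>" prints every ZFS label it can read, and an intact vdev carries 4. The `zdb` function below is a stub that fakes 4 readable labels per device so the script runs without real disks; remove it to run against the actual devices.

```shell
#!/bin/sh
# Count the readable ZFS labels on each drive of the suspect raidz vdev.

zdb() {                                                 # stub; remove for
    for i in 0 1 2 3; do echo "    version: 15"; done   # real use
}

for dev in /dev/ada0 /dev/ada1 /dev/ada2 /dev/ada3; do
    labels=$(zdb -l "$dev" | grep -c 'version:')        # one "version:" per label
    echo "$dev: $labels of 4 labels readable"
done
```

A device reporting fewer than 4 labels is worth a closer look before attempting the import.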