Re: [zfs-discuss] Checksum errors with SSD.
Dear Cindy and Edward

Many thanks for your input. Indeed there is something wrong with the SSD; smartmontools also confirms a couple of errors. I have opened a case and hopefully they will replace the SSD.

What I learned:
- Be careful of special offers.
- Use rock-solid components, even for a home server.
- Use ZFS and scrub regularly.

Best regards, many thanks for all your help, and keep up the good work!

Benjamin
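For anyone wanting to run the same smartmontools check on a suspect drive, a minimal sketch follows. The device path (and any -d driver option smartctl may need) depends on the controller and driver, so treat /dev/rdsk/c8d0p0 as illustrative only:

  # smartctl -a /dev/rdsk/c8d0p0          full SMART identity, attributes and self-test log
  # smartctl -l error /dev/rdsk/c8d0p0    the ATA error log only

Non-zero reallocated or pending-sector counts, or entries in the error log, are the kind of evidence worth attaching to a replacement case.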
[zfs-discuss] Checksum errors with SSD.
Dear Forum

I use a KINGSTON SNV125-S2/30GB SSD on an ASUS M3A78-CM motherboard (AMD SB700 chipset). The SATA type setting in the BIOS is "SATA".

OS: SunOS homesvr 5.11 snv_134 i86pc i386 i86pc

When I scrub my pool I get a lot of checksum errors:

        NAME        STATE     READ WRITE CKSUM
        rpool       DEGRADED     0     0     5
          c8d0s0    DEGRADED     0     0    71  too many errors

zpool clear rpool works, but after a scrub I am in the same situation again.

fmstat looks like this:

module              ev_recv ev_acpt wait  svc_t  %w  %b  open solve  memsz  bufsz
cpumem-retire             0       0  0.0    0.0   0   0     0     0      0      0
disk-transport            0       0  0.0 1541.1   0   0     0     0    32b      0
eft                       1       0  0.0    4.7   0   0     0     0   1.2M      0
ext-event-transport       3       0  0.0    2.1   0   0     0     0      0      0
fabric-xlate              0       0  0.0    0.0   0   0     0     0      0      0
fmd-self-diagnosis        6       0  0.0    0.0   0   0     0     0      0      0
io-retire                 0       0  0.0    0.0   0   0     0     0      0      0
sensor-transport          0       0  0.0   37.3   0   0     0     0    32b      0
snmp-trapgen              3       0  0.0    1.1   0   0     0     0      0      0
sysevent-transport        0       0  0.0 2836.3   0   0     0     0      0      0
syslog-msgs               3       0  0.0    2.7   0   0     0     0      0      0
zfs-diagnosis            91      77  0.0   28.9   0   0     2     1   336b   280b
zfs-retire               10       0  0.0  387.9   0   0     0     0   620b      0

fmadm looks like this:

--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Jun 30 16:37:28 806072e5-7cd6-efc1-c89d-d40bce4adf72  ZFS-8000-GH    Major

Host        : homesvr
Platform    : System-Product-Name    Chassis_id : System-Serial-Number
Product_sn  :

Fault class : fault.fs.zfs.vdev.checksum
Affects     : zfs://pool=rpool/vdev=f7dad7554a72b3bc
                  faulted but still in service
Problem in  : zfs://pool=rpool/vdev=f7dad7554a72b3bc
                  faulted but still in service

In /var/adm/messages I don't see anything abnormal. I have also tried the SSD on another SATA port, but without success.

My other HDDs run smoothly:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c4d1    ONLINE       0     0     0
            c5d0    ONLINE       0     0     0

iostat gives me the following:

c4d1  Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: WDC WD10EVDS-63 Revision: Serial No: WD-WCAV592 Size: 1000.20GB 1000202305536 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
c5d0  Soft Errors: 981 Hard Errors: 0 Transport Errors: 981
Model: Hitachi HDS7210 Revision: Serial No: JP2921HQ0 Size: 1000.20GB 1000202305536 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
c8d0  Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: KINGSTON SSDNOW Revision: Serial No: 30PM10I Size: 30.02GB 30016659456 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0

Any hints?

Best regards and many thanks for your help!

Benjamin
Re: [zfs-discuss] Checksum errors with SSD.
Hi Benjamin,

I'm not familiar with this disk, but you can see from the fmstat output that the disk, system-event, and ZFS-related diagnosis modules are working overtime on something, and it is probably this disk.

You can get further details from fmdump -eV, where you will probably see lots of checksum errors on this disk.

You might review some of the hardware diagnostic recommendations in this wiki:

http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide

I would recommend replacing the disk soon, or figuring out what other issue might be causing problems for this disk.

Thanks,

Cindy

Benjamin Grogg wrote:
> When I scrub my pool I get a lot of checksum errors:
>
>         NAME        STATE     READ WRITE CKSUM
>         rpool       DEGRADED     0     0     5
>           c8d0s0    DEGRADED     0     0    71  too many errors
>
> [fmstat, fmadm, and iostat output trimmed]
> Any hints?
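As a rough illustration of the fmdump step Cindy suggests (the command and flags are real; the volume of output and the exact event classes you see will depend on the system):

  # fmdump -e | grep checksum      one line per error report fmd has received
  # fmdump -eV | more              full detail, including the pool and vdev each ereport names

A steady stream of ereport.fs.zfs.checksum events against the same vdev, spread over days, points at the device (or its cable/port) rather than a one-off glitch.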
Re: [zfs-discuss] Checksum errors with SSD.
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Benjamin Grogg

> When I scrub my pool I get a lot of checksum errors:
>
>         NAME        STATE     READ WRITE CKSUM
>         rpool       DEGRADED     0     0     5
>           c8d0s0    DEGRADED     0     0    71  too many errors
>
> Any hints?

What's the confusion? Replace the drive.

If you think it's a false positive (the drive is not actually failing), then you would zpool clear (or online, or whatever, until the pool looks normal again) and then scrub. If the errors come back, it definitely means the drive is failing. Or perhaps the SATA cable that connects to it, or perhaps the controller. But 99% certain the drive.
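For completeness, the clear-and-rescrub cycle described above looks roughly like this (pool and device names are the ones from the original post; adjust to taste):

  # zpool clear rpool              reset the error counters
  # zpool scrub rpool              re-read and verify every block in the pool
  # zpool status -v rpool          check whether the CKSUM column climbs again

If the counters stay at zero after a full scrub, it was most likely a transient event; if they climb again, plan on replacing the drive, or the cable/port if swapping those changes the picture.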
[zfs-discuss] Checksum errors on and after resilver
Hi all,

I recently experienced a disk failure on my home server and observed checksum errors while resilvering the pool and on the first scrub after the resilver had completed. Now everything seems fine, but I'm posting this to get help with calming my nerves and to detect any possible future faults. Let's start with some specs:

OSOL 2009.06
Intel SASUC8i (w/ LSI 1.30IT FW)
Gigabyte MA770-UD3 mobo w/ 8GB ECC RAM
Hitachi P7K500 hard drives

When checking the condition of my pool some days ago (yes, I should make it mail me if something like this happens again), one disk in my pool was labeled as Removed with a small number of read errors, nine-ish I think; all other disks were fine. I removed and tested the disk (DFT crashed, so the disk seemed very broken), replaced the drive and started a resilver. Checking the status of the resilver, everything looked good from the start, but when it was finished the status report looked like this:

  pool: sasuc8i
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed after 4h9m with 0 errors on Mon Apr 12 18:12:26 2010
config:

        NAME         STATE     READ WRITE CKSUM
        sasuc8i      ONLINE       0     0     0
          raidz2     ONLINE       0     0     0
            c12t4d0  ONLINE       0     0     5  108K resilvered
            c12t8d0  ONLINE       0     0     0  254G resilvered
            c12t6d0  ONLINE       0     0     0
            c12t7d0  ONLINE       0     0     0
            c12t0d0  ONLINE       0     0     1  21.5K resilvered
            c12t1d0  ONLINE       0     0     2  43K resilvered
            c12t2d0  ONLINE       0     0     4  86K resilvered
            c12t3d0  ONLINE       0     0     1  21.5K resilvered

errors: No known data errors

All I really cared about at this point was "Applications are unaffected" and "No known data errors", and I thought that the checksum errors might be down to the failing drive (c12t5d0 failed; the controller labeled the new drive as c12t8d0) going out during a write. Then again, ZFS is atomic; better clear the errors and run a scrub. It came out like this:

  pool: sasuc8i
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 1h16m with 0 errors on Tue Apr 13 01:29:32 2010
config:

        NAME         STATE     READ WRITE CKSUM
        sasuc8i      ONLINE       0     0     0
          raidz2     ONLINE       0     0     0
            c12t4d0  ONLINE       0     0     5
            c12t8d0  ONLINE       0     0     0
            c12t6d0  ONLINE       0     0     0
            c12t7d0  ONLINE       0     0     4  86K repaired
            c12t0d0  ONLINE       0     0     1
            c12t1d0  ONLINE       0     0     6  86K repaired
            c12t2d0  ONLINE       0     0     4
            c12t3d0  ONLINE       0     0     6  108K repaired

errors: No known data errors

Now I'm getting nervous. Checksum errors, some repaired, others not. Am I going to end up with multiple drive failures, or what the * is going on here?

I ran one more scrub and everything came up roses. I checked SMART status on the drives with checksum errors and they are fine, although I expect only read/write errors would show up there.

I'm not sure how to turn this into a proper question, but what I'm after is: is this normal and to be expected after a resilver, and can I start breathing again? Checksum errors are, as far as I can gather, dodgy data on disk and read/write errors somewhere in the physical link (more or less).

Thank you!
Re: [zfs-discuss] Checksum errors on and after resilver
[this seems to be the question of the day, today...]

On Apr 14, 2010, at 2:57 AM, bonso wrote:
> I recently experienced a disk failure on my home server and observed checksum
> errors while resilvering the pool and on the first scrub after the resilver
> had completed. [...]
> Now I'm getting nervous. Checksum errors, some repaired, others not. Am I
> going to end up with multiple drive failures, or what the * is going on here?

When I see many disks suddenly reporting errors, I suspect a common element: HBA, cables, backplane, mobo, CPU, power supply, etc. If you search the zfs-discuss archives you can find instances where HBA firmware, driver issues, or firmware+driver interactions caused such reports. Cabling and power supplies are less commonly reported.

> I ran one more scrub and everything came up roses. [...] is this normal and
> to be expected after a resilver, and can I start breathing again?

Breathing is good. Then check your firmware releases.
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com
[zfs-discuss] checksum errors increasing on spare vdev?
Hi,

One of my colleagues was confused by the output of 'zpool status' on a pool where a hot spare is being resilvered in after a drive failure:

$ zpool status data
  pool: data
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h56m, 23.78% done, 3h1m to go
config:

        NAME          STATE     READ WRITE CKSUM
        data          DEGRADED     0     0     0
          raidz1      ONLINE       0     0     0
            c0t2d0    ONLINE       0     0     0
            c1t2d0    ONLINE       0     0     0
            c0t4d0    ONLINE       0     0     0
            c0t5d0    ONLINE       0     0     0
            c1t4d0    ONLINE       0     0     0
            c0t7d0    ONLINE       0     0     0
          raidz1      DEGRADED     0     0     0
            spare     DEGRADED     0     0 2.89M
              c0t1d0  REMOVED      0     0     0
              c0t6d0  ONLINE       0     0     0  59.3G resilvered
            c1t5d0    ONLINE       0     0     0
            c0t3d0    ONLINE       0     0     0
            c1t1d0    ONLINE       0     0     0
            c1t3d0    ONLINE       0     0     0
            c1t6d0    ONLINE       0     0     0
        spares
          c0t6d0      INUSE     currently in use

The CKSUM error count is increasing, so he thought that the spare was also failing. I disagreed, because the errors were being recorded on the "fake" spare vdev, but I want to make sure my hunch is correct.

My hunch is that since reads from userland continue to come to the pool, and since it's raidz, some of those reads will be for object addresses on the failed drive, now represented by the spare. Because the data at those addresses is uninitialized, we get checksum errors.

I guess I really have two questions:

1. Am I correct about the source of the checksum errors attributed to the spare vdev?

2. During a raidz resilver, if a read happens for an address that is among what's already been resilvered, will that read succeed, or will ALL reads to that top-level vdev require reconstruction from the other leaf vdevs?

If the answer to #2 is that reads will succeed if they ask for data that has been resilvered, then I might expect my read performance to increase as the resilver progresses, as less and less data requires reconstruction. I haven't measured this in a controlled environment though, so I'm mostly just curious about the theory.

Eric
[zfs-discuss] Checksum errors
  pool: space01
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress, 2.48% done, 4h18m to go
config:

        NAME         STATE     READ WRITE CKSUM
        space01      ONLINE       0     0     0
          raidz      ONLINE       0     0     0
            c0t1d0   ONLINE       0     0     0
            c0t2d0   ONLINE       0     0     0
            c0t3d0   ONLINE       0     0     0
          raidz      ONLINE       0     0     0
            c1t9d0   ONLINE       0     0     0
            c1t10d0  ONLINE       0     0     0
            c1t11d0  ONLINE       0     0     2

errors: No known data errors

The last drive shows two checksum errors, but iostat(1M) shows no hardware errors on that disk:

iostat -Ene | grep Hard | grep c1t11d0
c1t11d0  Soft Errors: 178 Hard Errors: 0 Transport Errors: 0

I'm not sure what I need to do, or how else I can determine whether the device needs to be replaced. Do I perform zpool clear, do I need to replace c1t11d0, or do I rerun the scrub?
Re: [zfs-discuss] Checksum errors
Hi UNIX admin,

I would check the fmdump -eV output to see whether this error is isolated or persistent.

If fmdump says this error is isolated, then you might just monitor the status. For example, if fmdump says that these errors occurred on 6/15, and you moved this system on that date, or you know that someone shouted at c1t11d0 on that date, then those events might explain this issue and you can use zpool clear to clear the error state.

If fmdump says the c1t11d0 error persists over a period of time, then I would consider replacing this device.

You can review more diagnostic tips here:

http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide#Resolving_Hardware_Problems

Cindy

UNIX admin wrote:
> The last drive shows two checksum errors, but iostat(1M) shows no hardware
> errors on that disk. I'm not sure what I need to do, or how else I can
> determine whether the device needs to be replaced. Do I perform zpool clear,
> do I need to replace c1t11d0, or do I rerun the scrub?
> [zpool status output trimmed]
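To make the isolated-versus-persistent distinction concrete, something along these lines works; the dates are placeholders, and the accepted time formats are documented in fmdump(1M):

  # fmdump -e                            one line per error report, with timestamp and class
  # fmdump -e -t 06/15/07 -T 06/16/07    only the ereports logged in that window
  # fmdump -eV | more                    full detail, including which vdev each ereport names

A single burst of ereports clustered around one date supports "clear it and watch"; ereports for c1t11d0 scattered across weeks support replacing the drive.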
[zfs-discuss] checksum errors on Sun Fire X4500
I have b105 running on a Sun Fire X4500, and I am constantly seeing checksum errors reported by zpool status. The errors show up over time on every disk in the pool. In normal operation there might be errors on two or three disks each day, and sometimes there are enough errors that a disk reports "too many errors" and goes into a degraded state. I have had to remove the spares from the pool, because otherwise they get pulled in to replace the drives. There are no reported hardware problems with any of the drives.

I have run scrub multiple times, and this also generates checksum errors. After the scrub completes, the checksum errors continue to occur during normal operation.

This problem also occurred with b103. Before that, Solaris 10u4 was installed on the server, and it never had any checksum errors. With the OpenSolaris builds I am running CIFS Server, and that's the only difference in server function from when Solaris 10u4 was installed on it.

Is this a known issue? Any suggestions or workarounds?

Thank you.
Re: [zfs-discuss] checksum errors on Sun Fire X4500
Hi Jay,

Jay Anderson schrieb:
> I have b105 running on a Sun Fire X4500, and I am constantly seeing checksum
> errors reported by zpool status. The errors show up over time on every disk
> in the pool. [...]
> Is this a known issue? Any suggestions or workarounds?

We had something similar: two or three disk slots started to act weird and failed quite often, usually starting with a high error rate. After we exchanged two hard drives, the Sun hotline arranged for the backplane to be exchanged; essentially the chassis was replaced. Since then we have not encountered anything like this any more.

So it *might* be the backplane or a broken Marvell controller, but it's hard to judge.

HTH
Carsten
[zfs-discuss] checksum errors after online'ing device
Dear all

As we wanted to patch one of our iSCSI Solaris servers, we had to offline the ZFS submirrors on the clients connected to that server. The devices connected to the second server stayed online, so the pools on the clients were still available, but in degraded mode. When the server came back up we onlined the devices on the clients, and the resilver completed pretty quickly as the filesystem is read-mostly (ftp, http server).

Nevertheless, during the first hour of operation after onlining we recognized numerous checksum errors on the formerly offlined device. We decided to scrub the pool, and after several hours we got about 3500 errors in 600GB of data.

I always thought that ZFS would sync the mirror immediately after bringing the device online, not requiring a scrub. Am I wrong?

Both servers and clients run s10u5 with the latest patches, but we saw the same behaviour with OpenSolaris clients.

Any hints?

Thomas
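For reference, the maintenance sequence described above corresponds roughly to the following commands; the pool and device names are placeholders, and the expectation being questioned is that the online step resilvers only the transactions missed while the device was offline:

  # zpool offline tank c2t1d0        detach one half of the mirror before patching the iSCSI server
    ... patch and reboot the iSCSI target server ...
  # zpool online tank c2t1d0         reattach; ZFS starts a resilver of the out-of-date side
  # zpool status -v tank             watch the resilver, then check the CKSUM counters
  # zpool scrub tank                 optional full verification afterwards

In theory the resilver triggered by 'zpool online' should make the mirror consistent; the question above is why a follow-up scrub still found thousands of checksum errors.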
Re: [zfs-discuss] checksum errors after online'ing device
tn == Thomas Nau [EMAIL PROTECTED] writes:

    tn> Nevertheless, during the first hour of operation after onlining
    tn> we recognized numerous checksum errors on the formerly offlined
    tn> device. We decided to scrub the pool, and after several hours we
    tn> got about 3500 errors in 600GB of data.

Did you use 'zpool offline' when you took them down, or did you offline them some other way, like by breaking the network connection, stopping the iSCSI target daemon, or 'iscsiadm remove discovery-address ..' on the initiator?

This is my experience, too (but with old b71). I'm also using iSCSI. It might be a variant of this:

  http://bugs.opensolaris.org/view_bug.do?bug_id=6675685
  checksum errors after 'zfs offline ; reboot'

Aside from the fact the checksum-errored blocks are silently not redundant, it's also interesting because I think, in general, there are a variety of things which can cause checksum errors besides disk/cable/controller problems. I wonder if they're useful for diagnosing disk problems only in very gently-used setups, or not at all?

Another iSCSI problem: for me, the targets I've 'zpool offline'd will automatically ONLINE themselves when iSCSI rediscovers them, but only sometimes. I haven't figured out how to predict when they will and when they won't.
Re: [zfs-discuss] checksum errors after online'ing device
Miles

On Sat, 2 Aug 2008, Miles Nordin wrote:
> Did you use 'zpool offline' when you took them down, or did you offline them
> some other way, like by breaking the network connection, stopping the iSCSI
> target daemon, or 'iscsiadm remove discovery-address ..' on the initiator?

We did a zpool offline, nothing else, before we took the iSCSI server down.

> Another iSCSI problem: for me, the targets I've 'zpool offline'd will
> automatically ONLINE themselves when iSCSI rediscovers them, but only
> sometimes. I haven't figured out how to predict when they will and when
> they won't.

I never experienced that one, but we usually don't touch any of the iSCSI settings as long as a device is offline. At least as long as we don't have to for any reason.

Thomas
Re: [zfs-discuss] checksum errors after online'ing device
tn == Thomas Nau [EMAIL PROTECTED] writes:

    tn> I never experienced that one, but we usually don't touch any of
    tn> the iSCSI settings as long as a device is offline. At least as
    tn> long as we don't have to for any reason.

Usually I do 'zpool offline' followed by 'iscsiadm remove discovery-address ...'

This is for two reasons:

1. At least with my old crappy Linux IET, it doesn't restore the sessions unless I remove and add the discovery-address.

2. The auto-ONLINEing-on-discovery problem. Removing the discovery address makes absolutely sure ZFS doesn't ONLINE something before I want it to.

If you have to do this maintenance again, you might want to try removing the discovery address for reason #2. Maybe when your iSCSI target was coming back up, it bounced a bit, so you might have done the equivalent of removing the target without 'zpool offline'ing first (and then immediately plugging it back in).

That's the ritual I've been using, anyway (sketched below). If anything unexpected happens, I still have to manually scrub the whole pool to seek out all these hidden "checksum" errors.

Hopefully some day you will be able to just look in fmdump and see "yup, the target bounced once as it was coming back up", and targets will be able to bounce as much as they like with failmode=wait (or for short, reasonable timeouts with other failmodes) and automatically do fully-adequate but efficient resilvers with proper dirty-region logging, without causing any latent checksum errors. And zpool offline'd devices will stay offline until reboot as promised, and will never online themselves. And iSCSI sessions will always come up on their own without having to kick the initiator.
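A rough sketch of that ritual, with placeholder pool, device, and discovery-address values; the iscsiadm syntax here is the Solaris initiator's, so adjust for your own target implementation:

  # zpool offline tank c3t600d0                       take the iSCSI-backed half of the mirror offline
  # iscsiadm remove discovery-address 192.168.1.10    make sure nothing can rediscover and auto-ONLINE it
    ... do the maintenance on the target server ...
  # iscsiadm add discovery-address 192.168.1.10       let the initiator rebuild its sessions
  # zpool online tank c3t600d0                        bring the device back and let it resilver
  # zpool scrub tank                                   belt-and-braces check for latent checksum errors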
Re: [zfs-discuss] checksum errors on root pool after upgrade to snv_94
I wrote:
> Bill Sommerfeld wrote:
>> once is accident. twice is coincidence. three times is enemy action :-)
>> I'll file a bug as soon as I can
> I filed 6727872, for the problem with zpool scrub checksum errors on
> unmounted zfs filesystems with an unplayed ZIL.

6727872 has already been fixed, in what will become snv_96. For my zpool, zpool scrub doesn't report checksum errors any more.

But: something is still a bit strange with the data reported by zpool status. The error counts displayed by zpool status are all 0 (during the scrub, and when the scrub has completed), but when zpool scrub completes it tells me "scrub completed after 0h58m with 6 errors". But it doesn't list the errors.

# zpool status -v files
  pool: files
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: scrub in progress for 0h57m, 99.39% done, 0h0m to go
config:

        NAME          STATE     READ WRITE CKSUM
        files         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c8t0d0s6  ONLINE       0     0     0
            c9t0d0s6  ONLINE       0     0     0

errors: No known data errors

# zpool status -v files
  pool: files
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: scrub completed after 0h58m with 6 errors on Wed Jul 23 18:23:00 2008
config:

        NAME          STATE     READ WRITE CKSUM
        files         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c8t0d0s6  ONLINE       0     0     0
            c9t0d0s6  ONLINE       0     0     0

errors: No known data errors
Re: [zfs-discuss] checksum errors on root pool after upgrade to snv_94
Bill Sommerfeld wrote:
> out of curiosity, is this a root pool?

It started as a standard pool, and is using the version 3 zpool format. I'm using a small ufs root, and have /usr as a zfs filesystem on that pool. At some point in the past I did set up a zfs root and /usr filesystem for experimenting with xVM unstable bits.

> A second system of mine with a mirrored root pool (and an additional large
> multi-raidz pool) shows the same symptoms on the mirrored root pool only.
>
> once is accident. twice is coincidence. three times is enemy action :-)
> I'll file a bug as soon as I can (I'm travelling at the moment with spotty
> connectivity), citing my and your reports.

Btw. I also found the scrub checksum errors on a non-mirrored zpool (laptop with only one hdd), and on one zpool that was using a non-mirrored, striped pool on two S-ATA drives.

I think that in my case the cause for the scrub checksum errors is an open ZIL transaction on an *unmounted* zfs filesystem. In the past such a zfs state prevented creating snapshots for the unmounted zfs, see bugs 6482985 and 6462803. That is still the case. But now it also seems to trigger checksum errors for a zpool scrub.

Stack backtrace for the ECKSUM (which gets translated into EIO errors in arc_read_done()):

  1  64703            arc_read_nolock:return, rval 5
              zfs`zil_read_log_block+0x140
              zfs`zil_parse+0x155
              zfs`traverse_zil+0x55
              zfs`scrub_visitbp+0x284
              zfs`scrub_visit_rootbp+0x4e
              zfs`scrub_visitds+0x82
              zfs`dsl_pool_scrub_sync+0x109
              zfs`dsl_pool_sync+0x158
              zfs`spa_sync+0x254
              zfs`txg_sync_thread+0x226
              unix`thread_start+0x8

Does a "zdb -ivv {pool}" report any ZIL headers with a claim_txg != 0 on your pools? Is the dataset that is associated with such a ZIL an unmounted zfs?

# zdb -ivv files | grep claim_txg
        ZIL header: claim_txg 5164405, seq 0
        ZIL header: claim_txg 0, seq 0
        ZIL header: claim_txg 0, seq 0
        ZIL header: claim_txg 0, seq 0
        ZIL header: claim_txg 0, seq 0
        ZIL header: claim_txg 5164405, seq 0
        ZIL header: claim_txg 0, seq 0

# zdb -i files/matrix-usr
Dataset files/matrix-usr [ZPL], ID 216, cr_txg 5091978, 2.39G, 192089 objects

    ZIL header: claim_txg 5164405, seq 0

        first block: [L0 ZIL intent log] 1000L/1000P DVA[0]=0:12421e:1000 zilog
        uncompressed LE contiguous birth=5163908 fill=0
        cksum=c368086f1485f7c4:39a549a81d769386:d8:3
        Block seqno 3, already claimed, [L0 ZIL intent log] 1000L/1000P
        DVA[0]=0:12421e:1000 zilog uncompressed LE contiguous birth=5163908
        fill=0 cksum=c368086f1485f7c4:39a549a81d769386:d8:3

On two of my zpools I've eliminated the zpool scrub checksum errors by mounting / unmounting the zfs with the unplayed ZIL.
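Based on the description above, a sketch of the workaround for the claim_txg case might look like this. The pool and dataset names are taken from the example output; whether a simple mount/unmount replays the ZIL on your build is exactly what the post reports having observed, so treat this as illustrative rather than authoritative:

  # zdb -ivv files | grep claim_txg     look for ZIL headers with claim_txg != 0
                                        (zdb -i <dataset> shows the per-dataset header)
  # zfs list -o name,mounted            find which datasets are currently unmounted
  # zfs mount files/matrix-usr          mounting claims/replays the outstanding ZIL records
  # zfs unmount files/matrix-usr        return the filesystem to its previous state
  # zpool scrub files                   re-run the scrub and see whether the checksum errors are gone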
Re: [zfs-discuss] checksum errors on root pool after upgrade to snv_94
Rustam wrote:
> I'm living with this error for almost 4 months and probably have a record
> number of checksum errors:
> [...]
> errors: Permanent errors have been detected in the following files:
>         box5:0x0
> I've Sol 10 U5 though.

I suspect that this (S10u5) is a different issue, because for my system's pool it seems to be caused by the opensolaris putback on July 07th for these fixes:

6343667 scrub/resilver has to start over when a snapshot is taken
6343693 'zpool status' gives delayed start for 'zpool scrub'
6670746 scrub on degraded pool return the status of 'resilver completed'?
6675685 DTL entries are lost resulting in checksum errors
6706404 get_history_one() can dereference off end of hist_event_table[]
6715414 assertion failed: ds->ds_owner != tag in dsl_dataset_rele()
6716437 ztest gets SEGV in arc_released()
6722838 bfu does not update grub
Re: [zfs-discuss] checksum errors on root pool after upgrade to snv_94
On Fri, 2008-07-18 at 10:28 -0700, Jürgen Keil wrote:
>> I ran a scrub on a root pool after upgrading to snv_94, and got checksum errors:
>
> Hmm, after reading this, I started a zpool scrub on my mirrored pool, on a
> system that is running post snv_94 bits: It also found checksum errors
>
> # zpool status files
>   pool: files
>  state: DEGRADED
> status: One or more devices has experienced an unrecoverable error.  An
>         attempt was made to correct the error.  Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-9P
>  scrub: scrub completed after 0h46m with 9 errors on Fri Jul 18 13:33:56 2008
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         files         DEGRADED     0     0    18
>           mirror      DEGRADED     0     0    18
>             c8t0d0s6  DEGRADED     0     0    36  too many errors
>             c9t0d0s6  DEGRADED     0     0    36  too many errors
>
> errors: No known data errors

out of curiosity, is this a root pool?

A second system of mine with a mirrored root pool (and an additional large multi-raidz pool) shows the same symptoms on the mirrored root pool only.

once is accident. twice is coincidence. three times is enemy action :-)

I'll file a bug as soon as I can (I'm travelling at the moment with spotty connectivity), citing my and your reports.

- Bill
Re: [zfs-discuss] checksum errors on root pool after upgrade to snv_94
On Sun, 20 Jul 2008 11:26:16 -0700, Bill Sommerfeld [EMAIL PROTECTED] wrote:
> once is accident. twice is coincidence. three times is enemy action :-)

I have no access to b94 yet, but as things stand it is probably better to skip that build when it comes out.

--
Dick Hoogendijk -- PGP/GnuPG key: 01D2433D
++ http://nagual.nl/ + SunOS sxce snv91 ++
Re: [zfs-discuss] checksum errors on root pool after upgrade to snv_94
> I ran a scrub on a root pool after upgrading to snv_94, and got checksum errors:

Hmm, after reading this, I started a zpool scrub on my mirrored pool, on a system that is running post snv_94 bits: It also found checksum errors

# zpool status files
  pool: files
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 0h46m with 9 errors on Fri Jul 18 13:33:56 2008
config:

        NAME          STATE     READ WRITE CKSUM
        files         DEGRADED     0     0    18
          mirror      DEGRADED     0     0    18
            c8t0d0s6  DEGRADED     0     0    36  too many errors
            c9t0d0s6  DEGRADED     0     0    36  too many errors

errors: No known data errors

Adding the -v option to zpool status returned:

errors: Permanent errors have been detected in the following files:

        metadata:0x0

OTOH, trying to verify checksums with zdb -c didn't find any problems:

# zdb -cvv files

Traversing all blocks to verify checksums and verify nothing leaked ...

        No leaks (block sum matches space maps exactly)

        bp count:         2804880
        bp logical:    121461614592    avg:  43303
        bp physical:    84585684992    avg:  30156    compression:  1.44
        bp allocated:   85146115584    avg:  30356    compression:  1.43
        SPA allocated:  85146115584   used: 79.30%

951.08u 419.55s 2:24:34.32 15.8%
#
Re: [zfs-discuss] checksum errors on root pool after upgrade to snv_94
>> I ran a scrub on a root pool after upgrading to snv_94, and got checksum errors:
>
> Hmm, after reading this, I started a zpool scrub on my mirrored pool, on a
> system that is running post snv_94 bits: It also found checksum errors
> ...
> OTOH, trying to verify checksums with zdb -c didn't find any problems:

And a zpool scrub under snv_85 doesn't find checksum errors, either.
Re: [zfs-discuss] checksum errors on root pool after upgrade to snv_94
I've been living with this error for almost 4 months and probably have a record number of checksum errors:

core# zpool status -xv
  pool: box5
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        box5        ONLINE       0     0   856
          mirror    ONLINE       0     0   428
            c1d0    ONLINE       0     0   856
            c2d0    ONLINE       0     0   856
          mirror    ONLINE       0     0   428
            c2d1    ONLINE       0     0   856
            c1d1    ONLINE       0     0   856

errors: Permanent errors have been detected in the following files:

        box5:0x0

I have Sol 10 U5, though.

--
Rustam.

Jürgen Keil wrote:
> Hmm, after reading this, I started a zpool scrub on my mirrored pool, on a
> system that is running post snv_94 bits: It also found checksum errors
> [zpool status and zdb -cvv output trimmed]
[zfs-discuss] checksum errors on root pool after upgrade to snv_94
I ran a scrub on a root pool after upgrading to snv_94, and got checksum errors:

  pool: r00t
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 0h26m with 1 errors on Thu Jul 17 14:52:14 2008
config:

        NAME          STATE     READ WRITE CKSUM
        r00t          ONLINE       0     0     2
          mirror      ONLINE       0     0     2
            c4t0d0s0  ONLINE       0     0     4
            c4t1d0s0  ONLINE       0     0     4

I ran it again, and it's now reporting the same errors, but still says applications are unaffected:

  pool: r00t
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 0h27m with 2 errors on Thu Jul 17 20:24:15 2008
config:

        NAME          STATE     READ WRITE CKSUM
        r00t          ONLINE       0     0     4
          mirror      ONLINE       0     0     4
            c4t0d0s0  ONLINE       0     0     8
            c4t1d0s0  ONLINE       0     0     8

errors: No known data errors

I wonder if I'm running into some combination of:

6725341 Running 'zpool scrub' repeatedly on a pool show an ever increasing error count

and maybe:

6437568 ditto block repair is incorrectly propagated to root vdev

Any way to dig further to determine what's going on?

- Bill
Re: [zfs-discuss] Checksum errors in storage pool
In the meantime, the Sun supporter figured out that zdb does not work here because zdb uses the information from /etc/zfs/zpool.cache. However, I had used "zpool import -R" to import the pool, which does not update /etc/zfs/zpool.cache.

Is there another method to map a dataset number to a filesystem?

Hans Schnitzer

H.-J. Schnitzer wrote:
> I know that the number in the OBJECT column identifies the inode number of
> the affected file. However, I have more than 1000 filesystems in each of the
> affected storage pools. So how do I identify the correct filesystem?
> [zpool status -xv listing trimmed]
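A sketch of one way to do the mapping, assuming zdb can actually see the pool (which is precisely the problem with the -R import noted above). The DATASET column is a hex objset ID, so convert it to decimal first; the pool and filesystem names below are placeholders:

  # printf '%d\n' 0x4c0c
  19468
  # zdb -d <pool> | grep 'ID 19468,'
  Dataset <pool>/<some-fs> [ZPL], ID 19468, ...

zdb -d prints one "Dataset ... ID n ..." line per dataset, so grepping for the decimal ID gives the filesystem name; the OBJECT column (also hex) is then the inode number to look up inside that filesystem, for example with find -inum.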
[zfs-discuss] Checksum errors in storage pool
Hi,

I am using ZFS under Solaris 10u3. After the failure of a 3510 RAID controller, I have several storage pools with defective objects. zpool status -xv prints a long list:

    DATASET  OBJECT  RANGE
    4c0c     5dd     lvl=0 blkid=2
    28       b346    lvl=0 blkid=9
    3b31     15d     lvl=0 blkid=1
    3b31     15d     lvl=0 blkid=2
    3b31     15d     lvl=0 blkid=2727
    3b31     190     lvl=0 blkid=0
    ...

I know that the number in the OBJECT column identifies the inode number of the affected file. However, I have more than 1000 filesystems in each of the affected storage pools. So how do I identify the correct filesystem?

According to http://blogs.sun.com/erickustarz/entry/damaged_files_and_zpool_status I have to use zdb, but I can't figure out how to use it. Can you help?

Hans Schnitzer
Re: [zfs-discuss] Checksum errors...
> errors: The following persistent errors have been detected:
>
>           DATASET                      OBJECT  RANGE
>           z_tsmsun1_pool/tsmsrv1_pool  2620    8464760832-8464891904
>
> Looks like I possibly have a single file that is corrupted. My question is
> how do I find the file. Is it as simple as doing a find command using
> -inum 2620?

FYI, I'm finishing up:

6410433 'zpool status -v' would be more useful with filenames

which will give you the complete path to the file (if applicable), so you don't have to do a 'find' on the inum.

eric
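Until that fix lands, the manual route is roughly the following sketch. The OBJECT value may be printed in hex, so convert it to decimal before handing it to find, and the mount point here is illustrative; if child filesystems are mounted beneath it you may want to limit the search to the one filesystem the DATASET column names:

  # printf '%d\n' 0x2620        only needed if the object number is hex (0x2620 = 9760)
  # find /z_tsmsun1_pool/tsmsrv1_pool -inum 9760 -print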
[zfs-discuss] Checksum errors...
Background: Large ZFS pool built on a couple of Sun 3511 SATA arrays. RAID-5 is done in the 3511s; ZFS is non-redundant. We have been using this setup for a couple of months now with no issues.

Problem: Yesterday afternoon we started getting checksum errors. There have been no hardware errors reported at either the Solaris level or the hardware level, and the 3511 logs are clean. Here is the zpool status:

tsmsun1 - /home/root  zpool status -xv
  pool: z_tsmsun1_pool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME                                STATE     READ WRITE CKSUM
        z_tsmsun1_pool                      ONLINE       0     0   180
          c22t600C0FF000678A0A86F3D901d0s0  ONLINE       0     0     0
          c22t600C0FF000678A0A86F3D900d0s0  ONLINE       0     0     0
          c22t600C0FF00068190A86F3D901d0s0  ONLINE       0     0     0
          c22t600C0FF00068190A86F3D900d0s0  ONLINE       0     0     0
          c22t600C0FF00068191A598ED500d0s0  ONLINE       0     0     0
          c22t600C0FF000678A1A598ED500d0s0  ONLINE       0     0     0
          c22t600C0FF00068191A598ED501d0s0  ONLINE       0     0     0
          c22t600C0FF000681943A7223100d0s0  ONLINE       0     0     0
          c22t600C0FF000681943A7223101d0    ONLINE       0     0     0
          c22t600C0FF000681932BBD24400d0s0  ONLINE       0     0     0
          c22t600C0FF000681932BBD24401d0s0  ONLINE       0     0     0
          c22t600C0FF000678A43A7223100d0s0  ONLINE       0     0   180
          c22t600C0FF000678A2055211B01d0s0  ONLINE       0     0     0
          c22t600C0FF000678A2055211B00d0s0  ONLINE       0     0     0
          c22t600C0FF000678A32BBD24401d0s0  ONLINE       0     0     0
          c22t600C0FF000678A1A598ED501d0s0  ONLINE       0     0     0
          c22t600C0FF000678A32BBD24400d0s0  ONLINE       0     0     0
          c22t600C0FF000678A43A7223101d0s0  ONLINE       0     0     0
          c22t600C0FF00068192055211B00d0s0  ONLINE       0     0     0
          c22t600C0FF00068192055211B01d0s0  ONLINE       0     0     0
          c22t600C0FF000678A44F3D81B00d0s0  ONLINE       0     0     0
          c22t600C0FF000678A44F3D81B01d0s0  ONLINE       0     0     0
          c22t600C0FF000681944F3D81B00d0s0  ONLINE       0     0     0
          c22t600C0FF000681944F3D81B01d0s0  ONLINE       0     0     0

errors: The following persistent errors have been detected:

          DATASET                      OBJECT  RANGE
          z_tsmsun1_pool/tsmsrv1_pool  2620    8464760832-8464891904

Looks like I possibly have a single file that is corrupted. My question is how do I find the file. Is it as simple as doing a find command using -inum 2620?

TIA,
john