Hi All,

I just came across a strange (well, at least to me) situation with ZFS and I 
hope you might be able to help me out. I recently built a new machine from 
scratch for my storage needs, which include various CIFS / NFS shares and, most 
importantly, VMware ESX based operations (in conjunction with COMSTAR). The 
machine is based on fairly new hardware and runs x86 OpenSolaris b134 with a 
RAID-Z pool on top of 4 x 1TB SATA-2 Samsung HDDs, plus one additional HDD as a 
hot spare.

Yesterday one of the HDDs started producing errors, and although that didn't 
surprise me, I was surprised to find that there are permanent errors on some 
files.

Here is the output I got right after the resilvering:

--------------------------------------------------------------------------------------------------

  pool: ZPOOL_SAS_1234
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed after 2h45m with 4 errors on Fri Apr  2 03:01:34 2010
config:

        NAME        STATE     READ WRITE CKSUM
        ZPOOL_SAS_1234  DEGRADED   381     0     0
          c7t0d0    ONLINE       0     0     0
          c7t1d0    ONLINE       0     0     0
          c7t2d0    ONLINE       0     0     0
          spare-3   DEGRADED   363     0     1
            c7t3d0  DEGRADED   381     0     3  too many errors
            c7t4d0  ONLINE       0     0   730  326G resilvered
        spares
          c7t4d0    INUSE     currently in use

errors: Permanent errors have been detected in the following files:

        /ZPOOL_SAS_1234/iSCSI/ESX/ESX_Cluster_01/LUN1_DATASTORE01
        /ZPOOL_SAS_1234/iSCSI/ESX/ESX_Cluster_01/LUN2_DATASTORE02
        /ZPOOL_SAS_1234/iSCSI/ESX/ESX_Cluster_01/LUN5_DATASTORE05

--------------------------------------------------------------------------------------------------

Although I'm sure that the "c7t3d0" HDD is having some issues (obviously I'm 
about to replace it), I still don't understand why I would get corruption on 
files when all the other drives show zero problems in their READ, WRITE and 
CKSUM columns. Perhaps I'm missing something about ZFS and its redundancy, but 
my understanding is that RAID-Z operates much like RAID-5, which should mean 
that if one HDD goes down for whatever reason, the data stored on my ZFS pool / 
datasets should remain unharmed thanks to the redundancy.
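For what it's worth, here is how I currently picture it. Single-parity RAID-Z, like RAID-5, can reconstruct exactly one missing block per stripe from parity, so a second error in the same stripe during the rebuild would exceed the redundancy. A minimal sketch of that arithmetic (a conceptual XOR demo, not actual ZFS code; the stripe contents are made up):

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)

# A stripe across four data disks, like the 4-disk pool above
# (contents are hypothetical).
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)

# One disk fails completely: XOR of the three survivors plus the
# parity block reconstructs the lost block exactly.
lost = data[3]
rebuilt = xor_blocks([data[0], data[1], data[2], parity])
assert rebuilt == lost  # single failure: fully recoverable

# But if a second block in the same stripe is corrupt while the
# failed disk is being resilvered, there is only one parity block,
# so two errors exceed the redundancy and the rebuild is wrong.
corrupted = [b"AAAA", b"BXBB", b"CCCC"]  # bit flip on a "healthy" disk
bad_rebuild = xor_blocks(corrupted + [parity])
assert bad_rebuild != lost  # two errors: permanent data loss
```

So if that picture is right, the question becomes where the second error came from, given the zero counts on the other drives.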

As additional information, the LUNX_DATASTOREXX entries are files which I 
exported via COMSTAR to a few ESX machines. I'm not sure whether there is any 
relationship between the indicated errors and the way I use my storage box, but 
I can certainly say that the files were under heavy load at the time of the HDD 
failure.

Any advice would be very much appreciated.

Cheers
-- 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss