Re: [zfs-discuss] ZFS offline ZIL corruption not detected

StorageConcepts Wed, 25 Aug 2010 23:42:19 -0700

Hello, 
actually this is bad news. 

I always assumed that the mirror redundancy of the ZIL could also be used to 
handle bad blocks on the ZIL device (just as self-healing in the main pool 
does for data blocks).


I actually don't know how SSDs "die". Given their wear-out characteristics, 
I would expect an increasing number of bad blocks / bit errors towards the 
end of life of such a device, probably undiscovered.

Because the ZIL is write-only in normal operation, you only find out whether 
it worked when you actually need it, which is bad. So my suggestion was always 
to run with one ZIL device during pre-production and add the ZIL mirror two 
weeks later, when production starts. This way the two devices don't age 
exactly the same, and zil2 has two more weeks of expected lifetime (or even 
more, assuming the usual heavier writes during stress testing). 

I would call this pre-aging. However, if the second ZIL device is not used to 
recover from bad blocks, this does not make a lot of sense.

So I would say there are 2 bugs / missing features here: 

1) The ZIL needs to report truncated transactions on ZIL corruption
2) The ZIL should use its mirrored counterpart to recover from bad block 
   checksums 
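
To make the two points concrete, here is a rough toy model in Python. It is 
not ZFS code and does not reflect the on-disk ZIL format; the names 
(LogBlock, replay, expected_blocks) are made up for illustration. It also 
assumes the replay code somehow knows how many log blocks were committed 
(expected_blocks), which is an assumption of this sketch only. Under those 
assumptions, replay tries every mirror side for each block (point 2) and 
reports where the log was cut short instead of silently stopping (point 1).

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional, Tuple


def digest(seq: int, payload: bytes) -> str:
    """Checksum over the sequence number and payload (SHA-256 chosen arbitrarily)."""
    return hashlib.sha256(seq.to_bytes(8, "big") + payload).hexdigest()


@dataclass
class LogBlock:
    """Hypothetical log block: a payload plus the checksum stored with it."""
    seq: int
    payload: bytes
    checksum: str


def make_block(seq: int, payload: bytes) -> LogBlock:
    return LogBlock(seq, payload, digest(seq, payload))


def is_valid(block: LogBlock) -> bool:
    """A block is valid if its stored checksum matches its contents."""
    return block.checksum == digest(block.seq, block.payload)


def replay(
    sides: List[List[LogBlock]], expected_blocks: int
) -> Tuple[List[bytes], List[int], Optional[int]]:
    """Replay a mirrored log.

    Returns (replayed payloads, positions recovered from another mirror side,
    position at which the log was truncated, or None if it was complete).
    expected_blocks is an assumption of this toy model.
    """
    replayed: List[bytes] = []
    healed: List[int] = []
    for pos in range(expected_blocks):
        copies = [side[pos] for side in sides if pos < len(side)]
        good = [c for c in copies if is_valid(c)]
        if not good:
            # Point 1: no side has a valid copy, so report the truncation.
            return replayed, healed, pos
        if len(good) < len(sides):
            # Point 2: at least one side was bad, another side supplied the data.
            healed.append(pos)
        replayed.append(good[0].payload)
    return replayed, healed, None


if __name__ == "__main__":
    payloads = [b"tx0", b"tx1", b"tx2", b"tx3"]
    zil1 = [make_block(i, p) for i, p in enumerate(payloads)]
    zil2 = [make_block(i, p) for i, p in enumerate(payloads)]
    # Simulate a bad block on zil1 only: contents change, stored checksum does not.
    zil1[2] = LogBlock(2, b"garbage", zil1[2].checksum)
    print(replay([zil1, zil2], expected_blocks=len(payloads)))
```

With a bad block on only one side, the model replays everything and notes 
which position was taken from the other side. If the same position is bad on 
both sides, it stops there and returns that position, so the admin at least 
learns that transactions were lost.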

Now with OpenSolaris being closed by Oracle and Illumos just getting started, 
I don't know how to go about opening bugs :) Is bugs.opensolaris.org still 
maintained?

Regards, 
Robert
-- 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
