> > There is no substitute for cord-yank tests - many and often.  The
> > weird part is, the ZFS design team simulated millions of them.
> > So the full explanation remains to be uncovered?
> 
> We simulated power failure; we did not simulate disks that simply
> blow off write ordering.  Any disk that you'd ever deploy in an
> enterprise or storage appliance context gets this right.
> 
> The good news is that ZFS is getting popular enough on consumer-grade
> hardware.  The bad news is that said hardware has a different set of
> failure modes, so it takes a bit of work to become resilient to them.
> This is pretty high on my short list.
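
To make the write-ordering point above concrete: a copy-on-write commit is
only safe if the drive really persists a txg's blocks before the uberblock
that activates them.  The C sketch below is purely illustrative;
write_blocks(), flush_write_cache() and write_uberblock() are hypothetical
stand-ins, not actual ZFS or driver interfaces.

#include <stdbool.h>

/* Placeholder implementations, for illustration only. */
static bool write_blocks(void)      { return true; }  /* write this txg's new data/metadata    */
static bool flush_write_cache(void) { return true; }  /* "everything above is on stable media" */
static bool write_uberblock(void)   { return true; }  /* switch to the new tree root           */

bool commit_txg(void)
{
    /* Step 1: write all the new blocks of the transaction group. */
    if (!write_blocks())
        return false;

    /*
     * Step 2: flush the drive's write cache.  The commit is only safe if
     * the drive really has those blocks on stable media before it
     * acknowledges this flush.  A drive that ignores the flush, or
     * reorders writes across it, breaks the invariant.
     */
    if (!flush_write_cache())
        return false;

    /* Step 3: write the uberblock that points at the new tree ... */
    if (!write_uberblock())
        return false;

    /* Step 4: ... and flush again so the uberblock itself is durable. */
    return flush_write_cache();
}

If the drive acknowledges the flush in step 2 without actually having
persisted the earlier writes, a power cut can leave a new uberblock on disk
pointing at blocks that never arrived, which looks exactly like a corrupt
pool.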

Jeff,
we lost many zpools on multimillion-dollar EMC, NetApp and HDS arrays
just by simulating FC switch power failures.
The problem is that ZFS can't properly recover itself.

How can anyone even think of adopting ZFS for >100TB pools if
a simple FC switch failure can make a pool totally inaccessible?
I know UFS fsck can only repair metadata, but even that is much better than
losing all your data!
We all know how long it would take to restore 100TB of data from backup ...

ZFS should at least be able to recover pools by discarding the last txg, as
you suggested months ago. Any news about that?
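
As a rough conceptual sketch of what such txg-rewind recovery could look
like: each label keeps a ring of uberblocks, so an import path could fall
back to progressively older ones when the newest points at damaged state.
The types and helpers below (the uberblock_t fields, ub_checksum_ok(),
pool_opens_with()) are hypothetical stand-ins for illustration, not ZFS
source.

#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct uberblock {
    uint64_t ub_txg;        /* transaction group this uberblock commits */
    uint64_t ub_timestamp;  /* when it was written                      */
    /* ... block pointer to the object set, checksum, etc. ...          */
} uberblock_t;

/* Placeholder checks; real code would verify checksums and walk the tree. */
static bool ub_checksum_ok(const uberblock_t *ub)  { return ub->ub_txg != 0; }
static bool pool_opens_with(const uberblock_t *ub) { (void)ub; return true; }

/*
 * Try the newest valid uberblock first; if the on-disk state it points to
 * is inconsistent, step back one txg at a time (up to max_rewind), i.e.
 * discard the most recent transactions.  Returns the uberblock that yields
 * an openable pool, or NULL if none does.
 */
const uberblock_t *
recover_best_uberblock(const uberblock_t ring[], size_t n, int max_rewind)
{
    const uberblock_t *last_tried = NULL;

    for (int rewind = 0; rewind <= max_rewind; rewind++) {
        uint64_t limit = last_tried ? last_tried->ub_txg : UINT64_MAX;
        const uberblock_t *cand = NULL;

        /* Newest verifiable uberblock strictly older than the last try. */
        for (size_t i = 0; i < n; i++) {
            if (!ub_checksum_ok(&ring[i]) || ring[i].ub_txg >= limit)
                continue;
            if (cand == NULL || ring[i].ub_txg > cand->ub_txg)
                cand = &ring[i];
        }
        if (cand == NULL)
            return NULL;        /* nothing older left to try  */
        if (pool_opens_with(cand))
            return cand;        /* found a consistent root    */
        last_tried = cand;      /* discard that txg, go older */
    }
    return NULL;
}

In other words: keep throwing away the newest txg until the pool opens
again, up to some bounded number of steps.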

thanks
gino