Hi Nils,

thanks for the detailed info. I've tried searching the zfs-discuss archive for
both the bug ID and 'resilver', but in both cases the only result I can find
in the whole history is this thread:
http://www.opensolaris.org/jive/thread.jspa?messageID=276358&#276358
Maybe the discussions you recall aren't fully indexed under these keywords, or
they took place in another forum, but thanks for giving me the gist of it. It
is potentially quite an Achilles' heel for ZFS, though.

I've argued locally for migrating our main online data archive (currently
3.5 TB) to ZFS, but if the recovery time for disk failures keeps getting slower
as the archive grows and accumulates snapshots etc., some questions might be
asked about this policy. I've suggested we can do continuous data replication
to a secondary server by sending incremental snapshot streams (roughly as
sketched below), but if the CDR had to be suspended for a significant time
(days) to allow a resilver, this would be a real problem, at least until the
6343667 bugfix from snv_94 finds its way into a Solaris 10 patch (will this
happen?).
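
For reference, the replication scheme I have in mind is roughly the following
(just a sketch; the pool/dataset names 'tank/archive' and 'backup/archive' and
the host 'backuphost' are placeholders, not our actual setup):

    # take a new snapshot on the primary
    zfs snapshot tank/archive@2008-09-02

    # send only the changes since the previous snapshot to the secondary
    zfs send -i tank/archive@2008-09-01 tank/archive@2008-09-02 | \
        ssh backuphost zfs receive -F backup/archive

It's the suspension of that send/receive cycle during a multi-day resilver
that worries me.
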
Does the severity of the problem depend on the access/write patterns used?
I.e., would a simple archiving system where data is only ever added
sequentially be less susceptible to slow rebuilds than a system where data is
written, snapshotted, moved, modified, deleted, etc.?

Does the time taken to scrub the pool give some indication of the likely 
resilvering time, or does that process walk a different kind of tree?
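
(For what it's worth, the scrub timings I'd be comparing against would come
from something like the following; 'tank' is just a placeholder pool name, and
the exact wording of the status output may differ between releases:)

    zpool scrub tank
    # ...some hours later...
    zpool status tank
    # look for the "scrub:" line, e.g.
    #   scrub: scrub completed after 5h32m with 0 errors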

Graham