> And it started replacement/resilvering... after a few minutes the system
> became unavailable. A reboot only gives me a few minutes, then resilvering
> makes the system unresponsive.
> 
> Is there any workaround or patch for this problem?

Argh, sorry -- the problem is that we don't do aggressive enough
scrub/resilver throttling.  The effect is most pronounced on 32-bit
or low-memory systems.  We're working on it.

One thing you might try is reducing txg_time to 1 second (the default
is 5 seconds) by running this: "echo txg_time/W1 | mdb -kw".  That
patches the running kernel only, so you'll need to repeat it after
each reboot.

Let me describe what's happening, and why this may help.

When we kick off a scrub (same code path as resilver, so I'll use
the term generically), we traverse the entire block tree looking
for blocks that need scrubbing.  The tree traversal itself is
single-threaded, but the work it generates is not -- each time
we find a block that needs scrubbing, we schedule an async I/O
to do it.  As you've discovered, we can generate work faster than
the I/O subsystem can process it.  To avoid overloading the disks,
we throttle I/O downstream, but we don't (yet) have an upstream
throttle.  If we discover blocks really fast, we can end up
scheduling lots of I/O -- and sitting on lots of memory -- before
the downstream throttle kicks in.
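
To make the imbalance concrete, here is a toy model in Python.  It is
only a sketch, not ZFS code: the function name, the rates and the
in-flight cap are all made-up numbers chosen for illustration.  It
shows the in-flight count staying pinned at the downstream cap while
the scheduled-but-unissued backlog (the part that sits on memory)
keeps growing.

```python
def simulate(discover_rate, complete_rate, max_inflight, seconds):
    """Toy model: downstream throttle only, no upstream throttle.

    All parameters are hypothetical per-second figures, not ZFS tunables.
    Returns (queued, inflight) after `seconds` one-second steps.
    """
    queued = 0    # I/Os scheduled by the traversal but not yet issued
    inflight = 0  # I/Os issued to the disks, capped by max_inflight
    for _ in range(seconds):
        # The traversal discovers blocks and schedules async I/O for them.
        queued += discover_rate
        # The disks retire as many in-flight I/Os as they can.
        inflight -= min(inflight, complete_rate)
        # The downstream throttle only lets us top up to the cap.
        issue = min(queued, max_inflight - inflight)
        queued -= issue
        inflight += issue
    return queued, inflight


if __name__ == "__main__":
    queued, inflight = simulate(10_000, 2_000, 3_000, 5)
    print(f"queued={queued} inflight={inflight}")
```

With these assumed numbers the disks never see more than 3,000 I/Os
at once, yet the backlog behind them grows by 8,000 every second.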

The reason this relates to txg_time is that every time we sync a
transaction group, we suspend the scrub thread and wait for all
pending scrub I/Os to complete.  This ensures that we won't
asynchronously scrub a block that was freed and reallocated
in a future txg; when coupled with the COW nature of ZFS,
this allows us to run scrubs entirely independent of all
filesystem-level structure (e.g. directories) and locking rules.
This little trick makes the scrubbing algorithms *much* simpler.

The key point is that each spa_sync() throttles the scrub to zero.
By lowering txg_time from 5 to 1, you're cutting down the maximum
number of pending scrub I/Os by roughly 5x.  The unresponsiveness
you're seeing is a threshold effect; I'm hoping that by running
spa_sync() more often, we can get you below that threshold.
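
The roughly-5x claim can be checked with another toy model.  Again,
this is a sketch under stated assumptions rather than the real
algorithm: max_pending() is a hypothetical helper, the rates are
invented, and the drain at each sync is treated as instantaneous.

```python
def max_pending(txg_time, discover_rate=10_000, complete_rate=2_000,
                run_seconds=60):
    """Peak number of pending scrub I/Os when a sync drains them to zero.

    txg_time is the sync interval in whole seconds; the two rates are
    hypothetical per-second figures, not measured values.
    """
    pending = 0
    worst = 0
    for second in range(1, run_seconds + 1):
        pending += discover_rate                 # new scrub I/Os scheduled
        pending -= min(pending, complete_rate)   # I/Os the disks complete
        worst = max(worst, pending)
        if second % txg_time == 0:
            # Stand-in for spa_sync(): the scrub is suspended until every
            # pending scrub I/O has completed, so the backlog returns to 0.
            pending = 0
    return worst


if __name__ == "__main__":
    for t in (5, 1):
        print(f"txg_time={t}: peak pending = {max_pending(t)}")
```

In this model the peak is just the net backlog growth per second
times txg_time, so going from 5 to 1 shrinks it by a factor of 5.
Real behavior won't be that clean, hence "roughly".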

Please let me know if this works for you.

Jeff

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
