Hi,
I believe I have found a race condition in the ZFS ARC code.
The problem manifests itself only when debugging is turned on and
arc_mru->arcs_size is very close to arc_mru->arcs_lsize.
It causes these assertions in arc.c to fail:
1) In remove_reference(): ASSERT3U(state->arcs_size, >=, state->arcs_lsize);
2) In arc_change_state(): ASSERT3U(new_state->arcs_size + to_delta, >=, new_state->arcs_lsize);
Steps to reproduce:
Well, in zfs-fuse it's just a matter of compiling in debug mode and
running 'bonnie++ -f -d /pool'. It fails a couple of minutes into the rewrite
test with the arc_change_state() assertion.
Currently, in zfs-fuse, there is a lot of context switching, which may trigger
this bug more frequently than in Solaris. I have found that, with the
bonnie++ workload, as many as 6 threads enter arc_change_state() at the same
time, just before it fails.
Also, I have a relatively low (64 MB) arc_c_max value, which might be
relevant.
It is much easier to reproduce if you apply this patch (relative to the
current mercurial tip):
diff -r 773fb303fd36 usr/src/uts/common/fs/zfs/arc.c
--- a/usr/src/uts/common/fs/zfs/arc.c Mon Feb 19 05:28:47 2007 -0800
+++ b/usr/src/uts/common/fs/zfs/arc.c Tue Feb 20 05:23:03 2007 +0000
@@ -834,6 +834,8 @@ arc_change_state(arc_state_t *new_state,
if (use_mutex)
mutex_exit(&new_state->arcs_mtx);
+ if (new_state == arc_mru)
+ delay(hz); // sleep 1 second
}
}
With this patch, zfs-fuse will always crash in the remove_reference()
assertion simply by mounting a filesystem (when compiled in debug mode, of
course).
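As a rough illustration of the kind of window involved, here is a minimal,
single-threaded sketch that replays one interleaving by hand. It assumes that
the list size is updated first (under the state mutex) and the total size only
afterwards, which is inferred from where the delay() in the patch above sits
and is not copied from arc.c. The toy_* names and the struct are hypothetical
stand-ins, not the real arc_state_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two counters; not the real arc_state_t. */
typedef struct toy_state {
	uint64_t size;	/* total bytes in the state (cf. arcs_size) */
	uint64_t lsize;	/* bytes on the state's list (cf. arcs_lsize) */
} toy_state_t;

/* The condition that both failing assertions depend on. */
static bool
toy_invariant_holds(const toy_state_t *s)
{
	return (s->size >= s->lsize);
}

/* Step 1 of a state change: list accounting (assumed to be done under the mutex). */
static void
toy_add_to_list(toy_state_t *s, uint64_t delta)
{
	s->lsize += delta;
}

/* Step 2: total size accounting (assumed to happen after the mutex is dropped). */
static void
toy_add_to_size(toy_state_t *s, uint64_t delta)
{
	s->size += delta;
}

/*
 * Replay one interleaving: "thread A" does step 1, "thread B" evaluates
 * the invariant, then A completes step 2. Returns what B observed.
 */
static bool
toy_replay_interleaving(toy_state_t *s, uint64_t delta)
{
	toy_add_to_list(s, delta);			/* thread A, step 1 */
	bool seen_by_b = toy_invariant_holds(s);	/* thread B's check */
	toy_add_to_size(s, delta);			/* thread A, step 2 */
	return (seen_by_b);
}
```

Under these assumptions, a state whose size equals its lsize makes thread B
see the condition violated, even though it holds again once thread A has
finished. A state with plenty of slack between size and lsize hides the
window, which would be consistent with the bug only showing up when the two
values are very close.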
Unfortunately, even with that patch, it's a little hard to trigger the bug
with ztest because arc_mru->arcs_size gets much bigger than
arc_mru->arcs_lsize after a while. I had moderate success with the following
steps:
1) Applying the patch
2) Changing the ztest_dmu_write_parallel test frequency from zopt_always to
zopt_rarely (I'm unsure how much this actually helps)
3) Compiling in debug mode
4) Running ztest with parameters "-T600 -P3"
5) Keep retrying. If it doesn't fail within the first 10 minutes, I think
it's usually better to restart ztest from the beginning.
I have fixed this bug with the attached patch, which I don't really like very
much, but it fixes the race.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: arc.patch
Type: text/x-diff
Size: 823 bytes
Desc: not available
URL: <http://mail.opensolaris.org/pipermail/zfs-code/attachments/20070220/7be98875/attachment.bin>