I've been using it in another CR where destroying one of the snapshots
helped performance. Nevertheless, here it is on that server, over a
short period of time:
bash-3.00# ./metaslab-6495013.d
^C
Loops count
value ------------- Distribution ------------- count
-1 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@ 17674
1 |@@@@@@@ 4418
2 |@@@ 2123
4 |@@ 1257
8 |@ 753
16 |@ 416
32 | 220
64 | 103
128 | 58
256 | 38
512 | 21
1024 | 13
2048 | 10
4096 | 8
8192 | 3
16384 | 3
32768 | 2
65536 | 1
131072 | 26
262144 | 7
524288 | 0
bash-3.00#
Looks like that's it.
Yeah, sometimes doing over 200,000 avl_walks isn't going to be good.
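(The metaslab-6495013.d script itself isn't shown in the thread; a minimal
sketch of how that "Loops count" distribution could be gathered, assuming it
counts avl_walk() calls between metaslab_ff_alloc() entry and return, might
look like this:
"
#!/usr/sbin/dtrace -s

#pragma D option quiet

fbt::metaslab_ff_alloc:entry
{
        /* start counting AVL-tree steps for this allocation attempt */
        self->walks = 0;
        self->in_alloc = 1;
}

fbt::avl_walk:entry
/self->in_alloc/
{
        self->walks++;
}

fbt::metaslab_ff_alloc:return
/self->in_alloc/
{
        @["Loops count"] = quantize(self->walks);
        self->in_alloc = 0;
}
"
Running it for a while and hitting ^C would print a quantize() histogram like
the one above.)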
...
IIRC you've written before that someone is actively working on it
right now, right? Any update? Any approx. ETA? I would like to test it
ASAP even before putback.
I believe this is next on George's list (he's got a couple of fixes
he needs to putback first).
So since you're in this state, would you mind looking at the
'size' (arg1) distribution instead of the avl_walk() counts for
metaslab_ff_alloc()? Something like:
"
#!/usr/sbin/dtrace -s

#pragma D option quiet

BEGIN
{
        self->in_metaslab = 0;
}

fbt::metaslab_ff_alloc:entry
/self->in_metaslab == 0/
{
        self->in_metaslab = 1;
        /* arg1 is the requested allocation size */
        @sizes["metaslab sizes"] = quantize(arg1);
}

fbt::metaslab_ff_alloc:return
/self->in_metaslab/
{
        self->in_metaslab = 0;
}
"
I'm wondering if we can just lower the amount of space we're trying
to alloc as the pool becomes more fragmented - we'll lose a little
I/O performance, but it should limit this bug.
eric
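(Not part of the original exchange: a rough DTrace sketch that could check
whether the long avl_walk() searches are dominated by the larger allocation
requests, which would support backing off the request size as the pool
fragments. The 128K split point is an arbitrary assumption for illustration:
"
#!/usr/sbin/dtrace -s

#pragma D option quiet

fbt::metaslab_ff_alloc:entry
{
        self->size = arg1;      /* requested allocation size */
        self->walks = 0;
}

fbt::avl_walk:entry
/self->size/
{
        self->walks++;
}

fbt::metaslab_ff_alloc:return
/self->size/
{
        /* correlate search effort with the size that was asked for */
        @[self->size >= 131072 ? "large (>= 128K)" : "small (< 128K)"] =
            quantize(self->walks);
        self->size = 0;
}
"
If the "large" bucket shows the long tails, that would point toward trimming
the request size on fragmented pools.)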