On 5 apr 2010, at 04.35, Edward Ned Harvey wrote:

>> When running the card in copyback write cache mode, I got horrible
>> performance (with zfs), much worse than with copyback disabled
>> (which I believe should mean it does write-through), when tested
>> with filebench.
> 
> When I benchmark my disks, I also find that the system is slower with
> WriteBack enabled.  I would not call it "much worse," I'd estimate about 10%
> worse.

Yes, I oversimplified - I have been benchmarking with filebench,
just running the tests shipped with the OS, trimmed down a little
according to <http://www.solarisinternals.com/wiki/index.php/FileBench>.
For most tests I typically get slightly worse performance with
writeback enabled (or "copyback", as they call it on this card);
about 10% worse on average is probably about right for these tests too.
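
(For reference, a run boils down to roughly the sketch below - the
workload names and the "filebench -f" invocation are placeholders for
whatever the local filebench install actually provides, not the exact
commands I used:)

#!/usr/bin/env python
# Rough sketch: run a couple of filebench workloads and report elapsed
# wall-clock time per workload. The workload file names and the "-f"
# flag are assumptions about the local filebench install.
import subprocess
import time

WORKLOADS = ["varmail.f", "fileserver.f", "oltp.f"]  # example names only

for wl in WORKLOADS:
    start = time.time()
    # filebench prints its own per-test summary; we only time the run here
    subprocess.check_call(["filebench", "-f", wl])
    minutes = (time.time() - start) / 60.0
    print("%-14s %.1f minutes" % (wl, minutes))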

The interesting part is that with these tests and writeback disabled,
on a 4-way stripe of stock Sun 2.5" 146 GB 10000 RPM drives, the test
run takes 2 hours and 18 minutes (138 minutes) to complete, but with
writeback enabled it takes 16 hours and 57 minutes (1017 minutes) -
over 7.3 times as long!

I can't (yet) explain why the total test time differs so much while
the reported test results differ so little.

Maybe a hardware - or driver - problem plays a part in this.

I have run a few simple tests with these cards before and was not
really impressed; even with all the bells and whistles turned off,
they merely seemed to be an IOPS and maybe a bandwidth bottleneck.
But the results above just don't seem right.

>  This, naturally, is counterintuitive.  I do have an explanation,
> however, which is partly conjecture:  With the WriteBack enabled, when the
> OS tells the HBA to write something, it seems to complete instantly.  So the
> OS will issue another, and another, and another.  The HBA has no knowledge
> of the underlying pool data structure, so it cannot consolidate the smaller
> writes into larger sequential ones.  It will brainlessly (or
> less-brainfully) do as it was told, and write the blocks to precisely the
> addresses that it was instructed to write.  Even if those are many small
> writes, scattered throughout the platters.  ZFS is smarter than that.  It's
> able to consolidate a zillion tiny writes, as well as some larger writes,
> all into a larger sequential transaction.  ZFS has flexibility, in choosing
> precisely how large a transaction it will create, before sending it to disk.
> One of the variables used to decide how large the transaction should be is
> ... Is the disk busy writing, right now?  If the disks are still busy, I
> might as well wait a little longer and continue building up my next
> sequential block of data to write.  If it appears to have completed the
> previous transaction already, no need to wait any longer.  Don't let the
> disks sit idle.  Just send another small write to the disk.
> 
> Long story short, I think, ZFS simply does a better job of write buffering
> than the HBA could possibly do.  So you benefit by disabling the WriteBack,
> in order to allow ZFS handle that instead.
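
If I read that right, the idea is roughly the toy sketch below -
purely illustrative, nothing like the actual ZFS txg code:

# Toy illustration of the batching idea described above - NOT real ZFS
# logic. Keep accumulating small writes into one large, mostly
# sequential transaction while the disk is still busy with the
# previous one; only push a new transaction when the disk is idle (or
# the batch has grown large enough).
class WriteBatcher(object):
    def __init__(self, device, max_batch_bytes=8 * 1024 * 1024):
        self.device = device        # assumed interface: busy(), submit(bufs)
        self.max_batch_bytes = max_batch_bytes
        self.pending = []
        self.pending_bytes = 0

    def write(self, buf):
        self.pending.append(buf)
        self.pending_bytes += len(buf)
        # Disk still busy? Keep building a bigger batch instead of
        # dribbling out small writes.
        if not self.device.busy() or self.pending_bytes >= self.max_batch_bytes:
            self.flush()

    def flush(self):
        if self.pending:
            self.device.submit(self.pending)  # one large sequential write
            self.pending = []
            self.pending_bytes = 0

With the HBA writeback cache on, every write appears to complete
instantly, so busy() would effectively always answer "no" and the
batches never get a chance to grow - which, if the conjecture is
right, is exactly the problem.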

You would think that ZIL transactions could get a speedup from the
writeback cache, meaning more sync operations per second - and in some
cases that seems to be true - and that the card should be designed to
handle intermittent load such as the bursts of txg completions
(typically every 30 seconds). But something strange obviously happens,
at least on this setup.
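
(A quick way to check whether the writeback cache actually buys more
sync operations per second is to time O_DSYNC writes directly on the
pool - just a sketch, the scratch path is an assumption:)

# Sketch: measure synchronous (O_DSYNC) write latency on the pool, to
# see whether the HBA writeback cache speeds up ZIL-style sync writes.
# The scratch file path is an assumption - point it at the pool under
# test.
import os
import time

PATH = "/pool/scratch/syncwrite.test"  # assumed path on the pool under test
COUNT = 1000
BUF = b"x" * 4096                      # 4 KB synchronous writes

fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_DSYNC, 0o600)
start = time.time()
for _ in range(COUNT):
    os.write(fd, BUF)                  # each write is synchronous (O_DSYNC)
elapsed = time.time() - start
os.close(fd)
os.unlink(PATH)

print("%d sync writes in %.2f s -> %.0f sync IOPS"
      % (COUNT, elapsed, COUNT / elapsed))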

(Actually, I'd prefer to be able to conclude that there is no use for
writeback-caching HBAs - I'd like these machines to be as stable as
they possibly can be, and therefore as plain and simple as possible,
and I'd like us to be able to just quickly move the disks if a machine
should break. With some data stuck in a silly writeback cache inside
an HBA that may or may not cooperate depending on its state of mind,
mood and the phase of the moon, that can't be done, and I'd need a
much more complicated (= error- and mistake-prone) setup. But my tests
so far just don't seem right and probably can't be used to conclude
anything.
I'd rather use slogs, and I have a few Intel X25-Es to test with, but
I just recently read on this list that X25-Es aren't supported for
slog anymore! Maybe because they always have their writeback cache
turned on by default and ignore cache flush commands (and that is "not
a bug" - is the design from outer space?), I don't know yet.
(I don't know why I keep stubbornly fooling around with this Intel
junk - right now they manage to annoy me with a crappy (or broken)
PCI-PCI bridge, a crappy HBA and crappy SSD drives...))

/ragge

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
