On 6/23/06, Roch <[EMAIL PROTECTED]> wrote:
Joe Little writes:

> On 6/22/06, Bill Moore <[EMAIL PROTECTED]> wrote:
> > Hey Joe. We're working on some ZFS changes in this area, and if you
> > could run an experiment for us, that would be great. Just do this:
> >
> >     echo 'zil_disable/W1' | mdb -kw
> >
> > We're working on some fixes to the ZIL so it won't be a bottleneck when
> > fsyncs come around. The above command will let us know what kind of
> > improvement is on the table. After our fixes you could get from 30-80%
> > of that improvement, but this would be a good data point. This change
> > makes ZFS ignore the iSCSI/NFS fsync requests, but we still push out a
> > txg every 5 seconds. So at most, your disk will be 5 seconds out of
> > date compared to what it should be. It's a pretty small window, but it
> > all depends on your appetite for such windows. :)
> >
> > After running the above command, you'll need to unmount/mount the
> > filesystem in order for the change to take effect.
> >
> > If you don't have time, no big deal.
> >
> > --Bill
> >
> > On Thu, Jun 22, 2006 at 04:22:22PM -0700, Joe Little wrote:
> > > On 6/22/06, Jeff Bonwick <[EMAIL PROTECTED]> wrote:
> > > > > a test against the same iscsi targets using linux and XFS and the
> > > > > NFS server implementation there gave me 1.25MB/sec writes. I was
> > > > > about to throw in the towel and deem ZFS/NFS as unusable until B41
> > > > > came along and at least gave me 1.25MB/sec.
> > > >
> > > > That's still super slow -- is this over a 10Mb link or something?
> > > >
> > > > Jeff

I think the performance is in line with expectations for a small-file, single-threaded, open/write/close NFS workload (NFS must commit on close). Therefore I expect: throughput = (avg file size) / (I/O latency). Joe, does this formula approach the 1.25 MB/s?
I still don't know how to calculate the I/O latency here. The average file size, though, is close to the kernel page size -- 4-8k -- which is typical for this kind of NAS workload, so that's the size I tune for.
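Roch's formula above can be checked with back-of-envelope arithmetic. The ~6.4 ms commit latency below is an assumed figure picked for illustration, not a value measured anywhere in this thread; it simply shows that a plausible per-file sync round trip lands near the observed 1.25 MB/s.

```python
# Sketch of Roch's formula: throughput ~ avg_file_size / io_latency,
# for a single-threaded open/write/close NFS workload where every file
# must wait for one synchronous commit round trip.
# NOTE: the 6.4 ms latency is an assumption, not a measurement.

def sync_nfs_throughput(avg_file_bytes: float, io_latency_s: float) -> float:
    """Expected throughput in bytes/s when each file write blocks on
    one commit of io_latency_s seconds."""
    return avg_file_bytes / io_latency_s

# 8 KiB files with an assumed ~6.4 ms commit latency:
mb_per_s = sync_nfs_throughput(8 * 1024, 0.0064) / 1e6
print(f"{mb_per_s:.2f} MB/s")  # ~1.28 MB/s, close to Joe's observed 1.25 MB/s
```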
> > > Nope, gig-e link (single e1000g, or aggregate, doesn't matter) to the
> > > iscsi target, and single gig-e link (nge) to the NFS clients, who are
> > > gig-e. Sun Ultra20 or AMD Quad Opteron, again with no difference.
> > >
> > > Again, the issue is the multiple fsyncs that NFS requires, and likely
> > > the serialization of those iscsi requests. Apparently, there is a
> > > basic latency in iscsi that one could improve upon with FC, but we are
> > > definitely in the all ethernet/iscsi camp for multi-building storage
> > > pool growth and don't have interest in a FC-based SAN.
>
> Well, following Bill's advice and the previous note on disabling the ZIL,
> I ran my test on a B38 Opteron initiator, and if you time the copy from
> the client, 6250 8k files transfer at 6MB/sec now. If you watch the
> entire commit on the backend using "zpool iostat 1", I see that it takes
> a few more seconds, and the actual rate there is 4MB/sec. Beats my best
> of 1.25MB/sec, and this is not B41.

Joe, you know this, but for the benefit of others I have to highlight that running any NFS server this way may cause silent data corruption from the client's point of view. Whenever a server keeps data in RAM this way and does not commit it to stable storage upon request from clients, it opens a time window for corruption: a client writes to a page, then reads the same page, and if the server suffered a crash in between, the data may not match. So this is performance at the expense of data integrity.

-r
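For anyone checking Joe's numbers: 6250 files of 8 KiB is about 51 MB, so the few-second lag he sees in "zpool iostat 1" is consistent with the backend draining the same data at the slower 4 MB/s rate. A minimal sketch of that arithmetic:

```python
# Sanity check on the reported numbers: 6250 x 8 KiB files, copied at
# ~6 MB/s as timed on the client, absorbed at ~4 MB/s on the backend.
# The gap between the two durations is the "few more seconds" Joe sees.

total_bytes = 6250 * 8 * 1024          # 51.2 MB total payload
client_secs = total_bytes / 6e6        # duration of the client-side copy
backend_secs = total_bytes / 4e6       # duration of the backend drain
print(f"client: {client_secs:.1f}s, backend: {backend_secs:.1f}s")
# client: 8.5s, backend: 12.8s -- roughly a four-second lag on the backend
```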
Yes, ZFS in its normal mode has better data integrity. However, this may be an acceptable tradeoff if you have specific read/write patterns. In my case, I'm going to use ZFS initially for my tier2 storage, with nightly write periods (a short-duration rsync from tier1) and mostly read periods throughout the rest of the day. I'd love to use ZFS as a tier1 service as well, but then it would have to perform as a NetApp does: same tricks, same NVRAM or initial write to local stable storage before writing to the backend storage. 6MB/sec is closer to expected behavior for the first tier, at the expense of reliability. I don't know what the answer is for Sun to make ZFS first-tier quality with their NFS implementation and its sync happiness.
_______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss