On 6/23/06, Roch <[EMAIL PROTECTED]> wrote:
Joe Little writes:

> On 6/22/06, Bill Moore <[EMAIL PROTECTED]> wrote:
> > Hey Joe. We're working on some ZFS changes in this area, and if you
> > could run an experiment for us, that would be great. Just do this:
> >
> >     echo 'zil_disable/W1' | mdb -kw
> >
> > We're working on some fixes to the ZIL so it won't be a bottleneck when
> > fsyncs come around. The above command will let us know what kind of
> > improvement is on the table. After our fixes you could get from 30-80%
> > of that improvement, but this would be a good data point. This change
> > makes ZFS ignore the iSCSI/NFS fsync requests, but we still push out a
> > txg every 5 seconds. So at most, your disk will be 5 seconds out of
> > date compared to what it should be. It's a pretty small window, but it
> > all depends on your appetite for such windows. :)
> >
> > After running the above command, you'll need to unmount/mount the
> > filesystem in order for the change to take effect.
> >
> > If you don't have time, no big deal.
> >
> > --Bill
> >
> > On Thu, Jun 22, 2006 at 04:22:22PM -0700, Joe Little wrote:
> > > On 6/22/06, Jeff Bonwick <[EMAIL PROTECTED]> wrote:
> > > > > a test against the same iscsi targets using linux and XFS and the
> > > > > NFS server implementation there gave me 1.25MB/sec writes. I was
> > > > > about to throw in the towel and deem ZFS/NFS as unusable until B41
> > > > > came along and at least gave me 1.25MB/sec.
> > > >
> > > > That's still super slow -- is this over a 10Mb link or something?
> > > >
> > > > Jeff

I think the performance is in line with expectations for a small-file, single-threaded, open/write/close NFS workload (NFS must commit on close). Therefore I expect: throughput = (avg file size) / (I/O latency). Joe, does this formula approach the 1.25 MB/s?
I still don't know how to calculate the I/O latency here. The average file size, though, is close to the kernel page size -- 4-8k -- which is typical for this kind of NAS workload, so that's the size I tune for.
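Roch's formula above can be checked with back-of-envelope arithmetic. The ~6.4 ms commit latency below is an assumed figure picked for illustration, not a value measured anywhere in this thread; it simply shows that a plausible per-file sync round trip lands near the observed 1.25 MB/s.

```python
# Sketch of Roch's formula: throughput ~ avg_file_size / io_latency,
# for a single-threaded open/write/close NFS workload where every file
# must wait for one synchronous commit round trip.
# NOTE: the 6.4 ms latency is an assumption, not a measurement.

def sync_nfs_throughput(avg_file_bytes: float, io_latency_s: float) -> float:
    """Expected throughput in bytes/s when each file write blocks on
    one commit of io_latency_s seconds."""
    return avg_file_bytes / io_latency_s

# 8 KiB files with an assumed ~6.4 ms commit latency:
mb_per_s = sync_nfs_throughput(8 * 1024, 0.0064) / 1e6
print(f"{mb_per_s:.2f} MB/s")  # ~1.28 MB/s, close to Joe's observed 1.25 MB/s
```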
> > > Nope, gig-e link (single e1000g, or aggregate, doesn't matter) to the
> > > iscsi target, and single gig-e link (nge) to the NFS clients, who are
> > > gig-e. Sun Ultra20 or AMD Quad Opteron, again with no difference.
> > >
> > > Again, the issue is the multiple fsyncs that NFS requires, and likely
> > > the serialization of those iscsi requests. Apparently, there is a
> > > basic latency in iscsi that one could improve upon with FC, but we are
> > > definitely in the all ethernet/iscsi camp for multi-building storage
> > > pool growth and don't have interest in a FC-based SAN.
>
> Well, following Bill's advice and the previous note on disabling the ZIL,
> I ran my test on a B38 Opteron initiator, and if you time the copy from
> the client, 6250 8k files transfer at 6MB/sec now. If you watch the
> entire commit on the backend using "zpool iostat 1", I see that it takes
> a few more seconds, and the actual rate there is 4MB/sec. Beats my best
> of 1.25MB/sec, and this is not B41.

Joe, you know this, but for the benefit of others I have to highlight that running any NFS server this way may cause silent data corruption from the client's point of view. Whenever a server keeps data in RAM this way and does not commit it to stable storage upon request from clients, it opens a time window for corruption: a client writes to a page, then reads the same page, and if the server suffered a crash in between, the data may not match. So this is performance at the expense of data integrity.

-r
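For anyone checking Joe's numbers: 6250 files of 8 KiB is about 51 MB, so the few-second lag he sees in "zpool iostat 1" is consistent with the backend draining the same data at the slower 4 MB/s rate. A minimal sketch of that arithmetic:

```python
# Sanity check on the reported numbers: 6250 x 8 KiB files, copied at
# ~6 MB/s as timed on the client, absorbed at ~4 MB/s on the backend.
# The gap between the two durations is the "few more seconds" Joe sees.

total_bytes = 6250 * 8 * 1024          # 51.2 MB total payload
client_secs = total_bytes / 6e6        # duration of the client-side copy
backend_secs = total_bytes / 4e6       # duration of the backend drain
print(f"client: {client_secs:.1f}s, backend: {backend_secs:.1f}s")
# client: 8.5s, backend: 12.8s -- roughly a four-second lag on the backend
```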
Yes, ZFS in its normal mode has better data integrity. However, this may be an acceptable tradeoff if you have specific read/write patterns. In my case, I'm going to use ZFS initially for my tier2 storage, with nightly write periods (a short-duration rsync from tier1) and mostly read periods throughout the rest of the day. I'd love to use ZFS as a tier1 service as well, but then it would have to perform as a NetApp does: same tricks, same NVRAM or initial write to local stable storage before writing to the backend storage. 6MB/sec is closer to expected behavior for the first tier, at the expense of reliability. I don't know what the answer is for Sun to make ZFS first-tier quality with their NFS implementation and its sync happiness.
_______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss