I have a test configuration up and running internally. I'm
not sure I'm seeing the exact same issues. For this testing
I'm using GRITS. I don't know of any really great FS performance
tools; I tend to do my performance testing with vdbench against raw
SCSI I/O, specifically to avoid FS caches. (I'm more focused on
iSCSI performance in the Solaris initiator.)
Configuration:
Ultra 20 (amd64) - GigE connection via nge
Solaris Nevada build 40 - non-debug
Cisco 48-port GigE switch
+-> Ultra 20 (amd64) host
+-> w2100z (2x amd64) host
+-> EqualLogic iSCSI array (3 x 20G LUNs mapped to the Ultra 20)
|
+-> lots of other unrelated hosts, arrays, etc...
3 x 20G iSCSI LUNs -> RAW IO <==> vdbench
  perf: ~100 MB/s avg.
3 x 20G iSCSI LUNs -> UFS <==> localhost GRITS
  perf: ~76.9 MB/s avg.
3 x 20G iSCSI LUNs -> NFS <==> w2100z (2x amd64) GRITS
  perf: ~34 MB/s avg.
3 x 20G iSCSI LUNs -> ZFS raidz <==> localhost GRITS
  perf: ~90 MB/s avg. (more bursty than UFS, 29 MB/s to 36 MB/s)
3 x 20G iSCSI LUNs -> ZFS raidz -> NFS <==> w2100z (2x amd64) GRITS
  perf: ~15 MB/s avg.
GRITS creates 10 directories, writes 10 files in each directory,
reads the files back, verifies their contents, deletes the files
and directories, and then repeats the process. The tool is
Java-based and designed more for file system data verification
than for performance testing.
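Roughly, each GRITS pass has this shape (a minimal Java sketch for
illustration only, not the actual tool; the 1 MB file size, fill
pattern, and directory/file naming are assumptions):

import java.io.*;
import java.util.Arrays;

// Sketch of one GRITS-style pass: make 10 directories, write 10 files
// into each, read the files back and verify their contents, then delete
// the files and directories, and repeat. The 1 MB file size and fill
// pattern are illustrative assumptions, not GRITS defaults.
public class GritsLikePass {
    static final int DIRS = 10, FILES = 10, SIZE = 1 << 20;

    public static void main(String[] args) throws IOException {
        File root = new File(args[0]);       // e.g. a UFS, ZFS, or NFS mount
        byte[] pattern = new byte[SIZE];
        Arrays.fill(pattern, (byte) 0x5a);   // known pattern to verify against

        while (true) {
            // create directories and write files
            for (int d = 0; d < DIRS; d++) {
                File dir = new File(root, "dir" + d);
                if (!dir.mkdir())
                    throw new IOException("mkdir failed: " + dir);
                for (int f = 0; f < FILES; f++) {
                    try (FileOutputStream out =
                             new FileOutputStream(new File(dir, "file" + f))) {
                        out.write(pattern);
                    }
                }
            }
            // read back and verify
            for (int d = 0; d < DIRS; d++) {
                for (int f = 0; f < FILES; f++) {
                    File file = new File(new File(root, "dir" + d), "file" + f);
                    byte[] back = new byte[SIZE];
                    try (DataInputStream in =
                             new DataInputStream(new FileInputStream(file))) {
                        in.readFully(back);
                    }
                    if (!Arrays.equals(pattern, back))
                        throw new IOException("verify failed: " + file);
                }
            }
            // delete files, then directories
            for (int d = 0; d < DIRS; d++) {
                File dir = new File(root, "dir" + d);
                for (int f = 0; f < FILES; f++)
                    new File(dir, "file" + f).delete();
                dir.delete();
            }
            System.out.println("pass complete");
        }
    }
}

Point it at the mount under test, e.g. the raidz pool locally or the
NFS mount of it from the w2100z.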
VDBENCH is used for performance testing and benchmarking.
I will hand off my configuration to the Sun NFS teams internally
to check out.
-David
Joe Little wrote:
Well, I tried some suggested iSCSI tunings to no avail. I did try
something else though: I brought up Samba. Copying my Linux 2.2 source
tree into the ZFS volume (in other words, SMB->ZFS->iSCSI) did far
better, taking a minute to copy 102MB. And that's from a
100MB/sec client.
My original tests, of around 2GB of mixed (mostly small) files, took 51
minutes in the NFS case, but exactly 3.5 minutes in the SMB case
(again, from a 100MB/sec client). I averaged almost 9MB/sec on that
transfer. These are not stellar numbers, but they are far better than
the low KB/sec that I see with NFS. I definitely think the bug is on
the NFS server end, even considering that the SMB protocol is different.
On 5/8/06, Joe Little <[EMAIL PROTECTED]> wrote:
I was asked to also snoop the iSCSI end of things, trying to find
something different between the two. iSCSI being relatively opaque, it
was easiest to look for differences in the packet patterns. In the
local copy to RAIDZ example, the iSCSI link would show packets of 1514
bytes in series of 5-10, with interspersed packets of 60 or 102 bytes,
generally 2-4 in number. With the NFS client hitting the RAIDZ/iSCSI
combo, the link would show on average 3-5 packets of 1514 bytes with
5-7 packets of 60 or 102 bytes in between. Basically, the averages
swapped, likely because of a lot more metadata traffic and/or write
confirmations going on in the NFS case.
At this point in time, I have two very important questions:
1) Are there any options available or planned to make NFS and ZFS work
more in concert, to avoid this overhead, which kills performance when
it turns into many small iSCSI packets?
2) Is iSCSI-backed storage, especially the StorageTek acquired
products, in the planning matrix for supported ZFS (NAS) solutions?
Also, why hasn't this combination been tested to date, since it
appears to be an Achilles' heel? Again, UFS does not have this
problem, nor do other file systems on other OSes (namely XFS, JFS,
etc., which I've tested before).
On 5/8/06, Nicolas Williams <[EMAIL PROTECTED]> wrote:
> On Fri, May 05, 2006 at 11:55:17PM -0500, Spencer Shepler wrote:
> > On Fri, Joe Little wrote:
> > > Thanks. I'm playing with it now, trying to get the most
> > > succinct test. This is one thing that bothers me: regardless of
> > > the backend, it appears that a delete of a large tree (say the
> > > Linux kernel) over NFS takes forever, but it's immediate when
> > > done locally. Does delete over NFS really take such a different
> > > code path?
> >
> > Yes. As mentioned in my other email, the NFS protocol requires
> > that operations like REMOVE, RMDIR, and CREATE have the filesystem
> > metadata written to stable storage/disk before sending a response
> > to the client. That is not required of local access, hence the
> > disparity between the two.
>
> So then multi-threading rm/rmdir on the client-side would help, no?
>
> Are there/should there be async versions of creat(2)/mkdir(2)/
> rmdir(2)/link(2)/unlink(2)/...?
>
> Nico
> --
>
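To make the client-side multi-threading idea above concrete, here is a
rough Java sketch (not from the thread): it overlaps the synchronous
REMOVE round trips by issuing file deletions from a pool of threads.
The thread count of 8 and the separate bottom-up rmdir pass are
arbitrary choices for illustration.

import java.io.File;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of a client-side parallel delete: each file removal over NFS is
// a synchronous REMOVE round trip, so issuing removals from a pool of
// threads lets many round trips be in flight at once instead of one at a
// time.
public class ParallelRm {
    public static void main(String[] args) throws InterruptedException {
        File root = new File(args[0]);
        ExecutorService pool = Executors.newFixedThreadPool(8);
        unlinkFiles(root, pool);                  // remove files in parallel
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        rmdirTree(root);                          // then directories, deepest first
    }

    // Submit every regular file under dir for deletion.
    static void unlinkFiles(File dir, ExecutorService pool) {
        File[] entries = dir.listFiles();
        if (entries == null) return;
        for (File e : entries) {
            if (e.isDirectory()) unlinkFiles(e, pool);
            else pool.submit(() -> { e.delete(); });  // unlink in a worker thread
        }
    }

    // Remove the now-empty directories bottom-up (RMDIR is synchronous too).
    static void rmdirTree(File dir) {
        File[] entries = dir.listFiles();
        if (entries != null)
            for (File e : entries)
                if (e.isDirectory()) rmdirTree(e);
        dir.delete();
    }
}

Whether this actually helps depends on the server being able to service
several synchronous metadata operations concurrently.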