I asked Tano to use the 'snoop' command to capture the Ethernet
packets to a file, while he attempted VMware's 'VMotion'.

 # snoop -d {device} -o {filename} tcp port 3260

Tano made this file available to me on his web server.
The file is nearly 85 Mbytes and contains over 100,000 packets.
I have downloaded the capture file and have been looking at it with
Ethereal and Wireshark.
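For anyone who would rather script against the capture than browse it in
Wireshark: snoop writes its own file format (described in RFC 1761), not
pcap. Below is a minimal Python sketch of walking the records. It assumes a
version 2 snoop file of Ethernet frames, and the function name read_snoop
is my own, not part of any library.

```python
import struct
from typing import Iterator, Tuple

SNOOP_MAGIC = b"snoop\x00\x00\x00"  # 8-byte identification pattern (RFC 1761)

def read_snoop(path: str) -> Iterator[Tuple[float, bytes]]:
    """Yield (timestamp, frame) pairs from a snoop capture file."""
    with open(path, "rb") as f:
        header = f.read(16)
        if len(header) < 16 or header[:8] != SNOOP_MAGIC:
            raise ValueError("not a snoop capture file")
        _version, datalink = struct.unpack(">II", header[8:16])
        if datalink != 4:  # 4 = Ethernet in RFC 1761
            raise ValueError(f"unsupported datalink type {datalink}")
        while True:
            rec = f.read(24)  # fixed-size per-packet record header
            if len(rec) < 24:
                break  # end of file
            orig_len, incl_len, rec_len, drops, secs, usecs = struct.unpack(">6I", rec)
            if rec_len < 24:
                raise ValueError("corrupt packet record length")
            # rec_len covers the 24-byte header, the packet data and any padding
            data = f.read(rec_len - 24)
            yield secs + usecs / 1e6, data[:incl_len]
```

Usage would be something like `for ts, frame in read_snoop("capture.snoop"): ...`,
where the file name is just a placeholder.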

I do not have a corresponding 'iscsisnoop.d' file, but from the pattern
of activity that I see, I can well imagine that it would show the same
pattern that we saw from Eugene, which I reported on here:
http://mail.opensolaris.org/pipermail/storage-discuss/2008-October/006444.html

(So here I'm looking at what's happening at the lower TCP level,
rather than at the iSCSI level.)

In the Ethernet capture file, I can see the pattern of bursts of
writes from the initiator. The target can accept only so many of these,
and then has to slow things down by reducing the TCP window size.
Eventually the target advertises a TCP window size of zero, effectively
asking the initiator to stop sending.
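The window the target advertises is the 16-bit window field in each TCP
header it sends (source port 3260). A raw value of zero means a zero window
whatever window-scale option was negotiated. Here is a minimal Python sketch
of pulling that field out of an Ethernet/IPv4 frame. It ignores VLAN tags
and IPv6, and the names parse_tcp and is_zero_window_from_target are made
up for illustration.

```python
import struct
from typing import NamedTuple, Optional

class TcpInfo(NamedTuple):
    src_port: int
    dst_port: int
    flags: int
    window: int  # raw 16-bit field, before any window-scale shift

def parse_tcp(frame: bytes) -> Optional[TcpInfo]:
    """Extract TCP ports, flags and raw window from an Ethernet/IPv4 frame."""
    if len(frame) < 14 + 20:
        return None
    ethertype = struct.unpack(">H", frame[12:14])[0]
    if ethertype != 0x0800:  # IPv4 only in this sketch
        return None
    ip = frame[14:]
    ihl = (ip[0] & 0x0F) * 4  # IP header length in bytes
    if ip[9] != 6 or len(ip) < ihl + 20:  # protocol 6 = TCP
        return None
    tcp = ip[ihl:]
    src, dst = struct.unpack(">HH", tcp[0:4])
    flags = tcp[13]
    window = struct.unpack(">H", tcp[14:16])[0]
    return TcpInfo(src, dst, flags, window)

def is_zero_window_from_target(info: TcpInfo, target_port: int = 3260) -> bool:
    # Skip SYN/FIN/RST segments (flag bits 0x02/0x01/0x04), which are not
    # flow-control advertisements even if their window field is zero.
    return (info.src_port == target_port and info.window == 0
            and (info.flags & 0x07) == 0)
```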

At first, the target leaves the 'TCP ZeroWindow' in place for only
a fraction of a second. It then reopens the window by sending a
'TCP Window Update', restoring the window to 65160 bytes, and the
transfer resumes. This is normal and expected.
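To put numbers on "a fraction of a second" versus "seconds", one could take
the window values the target advertises, in time order, and measure how long
each zero-window period lasts. A minimal sketch, assuming the
(timestamp, window) pairs have already been extracted from the capture.
Note that the ZeroWindowProbeAck packets also carry a zero window, so they
keep a stall open rather than ending it.

```python
from typing import Iterable, List, Optional, Tuple

def zero_window_stalls(adverts: Iterable[Tuple[float, int]]) -> List[Tuple[float, float]]:
    """Return (start, duration) for each period the advertised window stayed zero.

    adverts: time-ordered (timestamp, window) pairs sent by the target.
    """
    stalls: List[Tuple[float, float]] = []
    start: Optional[float] = None
    last_ts: Optional[float] = None
    for ts, window in adverts:
        last_ts = ts
        if window == 0 and start is None:
            start = ts  # window has just closed
        elif window > 0 and start is not None:
            stalls.append((start, ts - start))  # a window update reopened it
            start = None
    if start is not None and last_ts is not None:
        # Still closed when the data ran out (e.g. the session was reset)
        stalls.append((start, last_ts - start))
    return stalls
```

With made-up values, `zero_window_stalls([(0.0, 65160), (1.0, 0), (1.25, 65160), (2.0, 0), (52.0, 0)])`
gives one short stall of 0.25 s and one long stall of 50 s.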

But eventually we reach a stage where the target sets the TCP 'ZeroWindow'
and leaves it there for an extended period. I'm talking about seconds
here, not fractions of a second.
The initiator starts to send 'TCP ZeroWindowProbe' packets every 5 seconds.
The target promptly responds with a 'TCP ZeroWindowProbeAck' packet.
(Presumably this is just the initiator confirming that the target is still
alive.)
This cycle of probes and ACKs repeats for 50 seconds.
During this period the target shows no sign of wanting to accept any more data.
Then the initiator seems to decide it has had enough and cannot be bothered
to wait any longer, so it [RST,ACK]s the TCP session and then starts a
fresh iSCSI login.
(And then the whole cycle starts again.)
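The exchange above can be written down as a small timeline model. This only
restates what the capture shows: the 5 second probe interval and the roughly
50 second give-up time are taken from this one capture and are not documented
VMware settings (RFC 1122 actually suggests the gap between zero-window probes
should grow). The function name probe_timeline is hypothetical.

```python
from typing import List, Tuple

def probe_timeline(stall_start: float, interval: float = 5.0,
                   give_up_after: float = 50.0) -> List[Tuple[float, str]]:
    """Model of the observed initiator behaviour during one zero-window stall."""
    events: List[Tuple[float, str]] = []
    n = 1
    while n * interval < give_up_after:
        t = stall_start + n * interval
        events.append((t, "ZeroWindowProbe"))      # initiator checks the target
        events.append((t, "ZeroWindowProbeAck"))   # target answers, window still 0
        n += 1
    end = stall_start + give_up_after
    events.append((end, "RST,ACK"))       # initiator abandons the TCP session
    events.append((end, "iSCSI login"))   # ...and starts a fresh login
    return events
```

With the defaults this yields nine probe/ACK pairs followed by the reset at
50 seconds, matching the pattern described above.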

The question is: why did the target refuse to accept any more data for over
50 seconds?

The obvious conclusion would be that the OpenSolaris box is so busy that
it has no time left to empty the network stack buffers.
But that just leads to another question: why?
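The reasoning here rests on how the receive window works: the window the
target advertises is roughly the free space left in its socket receive
buffer, and it only reopens when the application (the iSCSI target) reads
data out. A toy Python model of that, ignoring window scaling and
silly-window-syndrome avoidance; the 65160-byte capacity just reuses the
window size seen in the capture, and the class name is made up.

```python
class ReceiveBuffer:
    """Toy model of a TCP receive buffer and the window it advertises."""

    def __init__(self, capacity: int = 65160) -> None:
        self.capacity = capacity
        self.queued = 0  # bytes received but not yet read by the application

    @property
    def window(self) -> int:
        # Advertised window = free space remaining in the buffer
        return self.capacity - self.queued

    def receive(self, nbytes: int) -> int:
        """Accept up to nbytes from the network; return how many fit."""
        accepted = min(nbytes, self.window)
        self.queued += accepted
        return accepted

    def app_read(self, nbytes: int) -> int:
        """The application drains up to nbytes from the buffer."""
        drained = min(nbytes, self.queued)
        self.queued -= drained
        return drained
```

If the initiator sends 100,000 bytes, only 65160 fit and the window drops to
zero; it stays at zero for as long as the application reads nothing, which
is exactly the long stall in the capture.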

So the mystery deepens, and I am running out of ideas!

Tano, maybe you could check the network performance with the 'iperf'
programs, as mentioned here:
http://mail.opensolaris.org/pipermail/zfs-discuss/2008-October/052136.html

Does the OpenSolaris box give any indication of being busy with other things?
Try running 'prstat' to see if it gives any clues.

Presumably you are using ZFS as the backing store for iSCSI; if so,
maybe try a UFS-formatted disk to see whether that is a factor.

Regards
Nigel Smith
--
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss