Hi Jon,

Read below...

Jon Maloy wrote:
> Hi,
> I looked briefly at your dump with my new Wireshark.
> It seems like things start to go wrong at packet
> 14191, where we suddenly lose 95 packets, and jump
> from seq no 53888 to 53983.
>
> Strangely enough, node 1.1.12 continues to ack packets
> which we don't see in wireshark (is it possible that
> wireshark can miss packets?). It goes on acking packets
> up to the one with sequence number 53967 (one of the
> "invisible" packets), but from there on it stops.
>

I would guess that the packet drop is related to the box load plus the net traffic (more specifically, the number of packets arriving). So this particular dump may show a case where Wireshark first started dropping packets due to load (yes, as far as I can remember, packets can be dropped in userspace/libpcap), and then the Ethernet driver dropped some packets, probably because it could not service the NIC interrupt under that load, which the TIPC stack noticed. Plausible? Probably? :) Whatever the reason, it is something that should not cause any trouble and must be handled.

> 1.1.6 continues to wreak traffic for another while,
> up to packet seq no 54991 (packet 14773), but then
> it stops there too.
> After this point, there are only State messages going
> from 1.1.6 to 1.1.12, while traffic runs normally
> in the opposite direction.
>
> There seems to never be a request for retransmission
> (sequence gap is always 0) in the State messages sent
> out from 1.1.12 to 1.1.6, as there should be. This may
> mean that TIPC never receives any of the packets we see,
> from 53983 and on, and hence never has a chance to detect
> a gap.
>

OK, that's something to begin with, but why is that happening? My understanding is that regardless of how many packets are dropped, the stack should detect the loss and at least try to recover. There is definitely some odd stack behavior here.

> Only a bearer reset can resolve this situation,
> which seems to be your case.
>

My favorite 80% case :)

> As a sum of this, I start to suspect your Ethernet
> driver. It seems like it sometimes delivers packets
> to TIPC which it does not deliver to Wireshark, and
> vice versa. This seems to happen after a period of
> high traffic, and only with messages beyond a certain
> size, since the State messages always go through.
> Can you see any pattern in the direction the links
> go stale, with reference to which driver you are
> using. (E.g., is there always an e1000 driver involved
> on the receiving end in the stale direction?)
> Does this happen when you only run one type of driver?
>

This happens in any combination of the tg3 and e1000 drivers. I can't be very sure, but it seems just a bit more likely to happen with the e1000 NICs at the receiving end of the high traffic, i.e. they seem more prone to packet loss under high load. It also seems less likely to happen when using the NAPI mode of e1000 (which was somewhat expected), but it still happens.

> That's all I can guess for now.
> Any ideas from the others?

:P

I'll try to do more dumps now and try to get both sides of the story.

Regards,
Peter Litov.

_______________________________________________
tipc-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/tipc-discussion
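
P.S. To make the "sequence gap is always 0" point concrete, here is a rough sketch of the kind of receiver-side gap check I would expect. This is not TIPC's actual code: the names `seq_gap` and `GapTracker` are made up for illustration, and I am assuming 16-bit link sequence numbers that wrap around.

```python
SEQ_MOD = 1 << 16  # assumption: 16-bit link sequence numbers that wrap around


def seq_gap(next_expected: int, received: int) -> int:
    """Return how many packets are missing before `received`.

    0 means nothing is missing. A distance in the upper half of the
    sequence space is treated as an old or duplicate packet (also 0).
    """
    gap = (received - next_expected) % SEQ_MOD
    return gap if gap < SEQ_MOD // 2 else 0


class GapTracker:
    """Hypothetical receiver-side bookkeeping, for illustration only."""

    def __init__(self, next_expected: int) -> None:
        self.next_expected = next_expected % SEQ_MOD
        self.deferred = set()  # out-of-order packets held back

    def on_packet(self, seqno: int) -> int:
        """Process one packet and return the gap to report to the sender."""
        seqno %= SEQ_MOD
        if seqno == self.next_expected:
            # In-order packet: advance, then drain any deferred successors.
            self.next_expected = (seqno + 1) % SEQ_MOD
            while self.next_expected in self.deferred:
                self.deferred.remove(self.next_expected)
                self.next_expected = (self.next_expected + 1) % SEQ_MOD
        elif seq_gap(self.next_expected, seqno) > 0:
            # Packet from the future: hold it and report the hole before it.
            self.deferred.add(seqno)
        # Anything else is old or duplicate and is ignored.
        return self.reported_gap()

    def reported_gap(self) -> int:
        """Gap between the next expected packet and the nearest deferred one."""
        if not self.deferred:
            return 0
        return min(seq_gap(self.next_expected, s) for s in self.deferred)
```

With the numbers from the dump: if 1.1.12 had really received everything up to 53967 and then saw 53983, `GapTracker(53968).on_packet(53983)` would report a gap of 15. A receiver that keeps reporting 0 is therefore consistent with Jon's guess that the packets from 53983 on never reach the stack at all.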
