Hi,

This is very interesting.  From the description by Jon (I did not see the 
Wireshark trace posted), the gap value is being calculated as 0 when it should 
be calculated as a non-zero value.

We actually encountered a similar, but different, issue internally and believed 
it to be a compiler problem (we were not compiling with GCC).  The target was 
an E500 (8560 based PPC system) and was compiled with software floating point 
(though no floating point code is in TIPC that I know of).

In our case, the calculation was effectively subtracting 1 from 1 and getting 1 
instead of 0.  The node would then send a NAK, falsely asking for retransmission 
of packets it had in fact received.

The incorrect calculation for us was in tipc_link.c, in the routine 
link_recv_proto_msg(), when calculating the value of the variable 'rec_gap'.  The 
code is:
        if (less_eq(mod(l_ptr->next_in_no), msg_next_sent(msg))) {
                rec_gap = mod(msg_next_sent(msg) - 
                              mod(l_ptr->next_in_no));
        }

This resulted in rec_gap being non-zero even though both operands were the same 
value.  When rec_gap is non-zero, then tipc_link_send_proto_msg() is called 
with a non-zero gap value a bit further down in the same routine.

Adding instrumentation sometimes fixed the problem; doing the exact same 
calculation again immediately following this code would yield the correct gap 
value.  Very bizarre.

We attributed this to a compiler issue.

I don't know if this is the same issue, but it surely sounds similar enough to 
be worth noting.  And this may be another place to check, since 
link_recv_proto_msg() calls tipc_link_send_proto_msg() after receiving a state 
message.

Elmer



-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jon Maloy
Sent: Tuesday, March 04, 2008 7:51 PM
To: Xpl++; [EMAIL PROTECTED]; [email protected]
Subject: Re: [tipc-discussion] RE : Re: Link related question/issue

Hi Peter,

I see two interesting patterns:

a: After the packet loss has started at packet
   14191, state messages from 1.1.12 always come
   in pairs, with the same timestamp. 

b: Also, after the problems have started, all 
   state messages 1.1.6->1.1.12, even when they are
   not probes, are immediately followed by a state 
   message in the opposite direction. 
   This is a strong indication that the receiver
   (1.1.12) actually detects the gap from the state
   message contents, and sends out a new state 
   message (a NACK), but for some reason the gap 
   value never makes it into that message. 
   Hence, tipc_link_send_proto_msg(),
   where the gap is calculated and added 
   (line 2135 in tipc_link.c, tipc-1.7.5), seems 
   to be a good place to start 
   looking.
   I strongly suspect that the gap calculated at
   lines 2128-2129 always yields 0, or that
   no packets ever make it into the deferred queue
   (via tipc_link_defer_pkt()).
   That would be consistent with what we see.

Regards
///jon



-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Xpl++
Sent: March 4, 2008 2:17 PM
To: [EMAIL PROTECTED]; '[email protected]'
Subject: Re: [tipc-discussion] RE : Re: Link related question/issue

Hi,

So .. what about that TODO comment in tipc_link.c regarding the stronger seq# 
checking and stuff? :) Since I managed to stabilize my cluster I must proceed 
with a software upgrade (deadlines :( ...) and will be able to start looking 
into the link code sometime tomorrow evening. In the meantime, any ideas as to 
where/what to look at would be highly appreciated ;)

Regards,
Peter.

Jon Paul Maloy wrote:
> Hi,
> Your analysis makes sense, but it still doesn't explain why TIPC 
> cannot handle this quite commonplace situation.
> Yesterday, I forgot one essential detail: Even State messages contain 
> info to help the receiver detect a gap. The "next_sent" sequence 
> number tells the receiver whether it is out of synch with the sender, and 
> gives it a chance to send a NACK (a State message with gap != 0). Since 
> State packets clearly are received, otherwise the link would go down, 
> there must be some bug in TIPC that causes the gap to be calculated 
> wrongly, or not at all. Nor does it look like the receiver is 
> sending a State _immediately_ after a gap has occurred, which it 
> should.
> So, I think we are looking for some serious bug within TIPC that 
> completely cripples the retransmission protocol. We should try to 
> backtrack and find out in which version it was introduced.
>
> ///jon
>
>
> --- Xpl++ <[EMAIL PROTECTED]> a écrit :
>
>   
>> Hi,
>>
>> Some more info about my systems:
>> - all nodes that tend to drop packets are quite loaded, though very 
>> rarely one can see cpu #0 being 100% busy
>> - there are also a few multithreaded tasks that are bound to cpu #0 and 
>> running in SCHED_RR. All of them use TIPC. None of them uses the 
>> maximum scheduler priority, and they use very little cpu time and do 
>> not tend to produce any peaks
>> - there is one task that runs in SCHED_RR at maximum priority 99/RT 
>> (it really does a very, very important job), which uses around 1ms of 
>> cpu every 4 seconds, and it is explicitly bound to cpu #0
>> - all other tasks (mostly apache & php/perl) are free to run on any 
>> cpu
>> - all of these nodes also have considerable io load
>> - the kernel has irq balancing and pretty much all irqs are balanced, 
>> except for the nic irqs. They are always serviced by cpu #0
>> - to create the packet drop issue I have to mildly stress the node, 
>> which normally means a moment when apache tries to start some 
>> extra children, which also causes the number of simultaneously 
>> running php scripts to rise, while at the same time the incoming 
>> network traffic is also rising. The stress is preceded by a few 
>> seconds of high input packet rate, which may be causing even more 
>> stress on the scheduler and cpu starvation
>> - wireshark is dropping packets (surprisingly many, it seems), TIPC 
>> is confused .. and all of it is related to moments of general cpu 
>> starvation, worst of all on cpu #0
>>
>> Then it all started adding up ..
>> I moved all non-SCHED_OTHER tasks to other cpus, as well as a few other 
>> services. The result: 30% of the nodes showed between 5 and 200 
>> packets dropped over the whole stress routine, which did not affect 
>> TIPC operation; nametables were in sync, and all communications seemed 
>> to work properly.
>> Though this solves my problems, it is still very unclear what may have 
>> been happening in the kernel and in the TIPC stack that causes 
>> this bizarre behavior.
>> SMP systems alone are tricky, and when adding load and 
>> pseudo-realtime tasks the situation seems to become really complicated.
>> One really cool thing to note is that Opteron based nodes handle high 
>> load and cpu starvation much better than Xeon ones .. 
>> which only confirms an 
>> old observation of mine, that for some reason (that must be the 
>> design/architecture?) Opterons appear _much_ more 
>> interactive/responsive than Xeons under heavy load ..
>> Another note, this one on TIPC - the link window for 100mbit nets should be 
>> at least 256 if one wants to do any serious communication between a 
>> dozen or more nodes. Also, for a gbit net, link windows above 1024 seem 
>> to really confuse the stack when faced with a high output packet rate.
>>
>> Regards,
>> Peter Litov.
>>
>>
>> Martin Peylo wrote:
>>> Hi,
>>>
>>> I'll try to help with the Wireshark side of this problem.
>>>
>>> On 3/4/08, Jon Maloy <[EMAIL PROTECTED]> wrote:
>>>> Strangely enough, node 1.1.12 continues to ack packets
>>>> which we don't see in wireshark (is it possible that
>>>> wireshark can miss packets?). It goes on acking packets
>>>> up to the one with sequence number 53967 (one of the
>>>> "invisible" packets), but from there on it stops.
>>>
>>> I've never encountered Wireshark missing packets so far. While it
>>> sounds as if it wouldn't be a problem with the TIPC dissector, could
>>> you please send me a trace file so I can definitely exclude this
>>> cause of defect? I've tried to get it from the link quoted in the
>>> mail from Jon but it seems it was already removed.
>>>
>>>> [...]
>>>>
>>>> As a sum of this, I start to suspect your Ethernet
>>>> driver. It seems like it sometimes delivers packets
>>>> to TIPC which it does not deliver to Wireshark, and
>>>> vice versa. This seems to happen after a period of
>>>> high traffic, and only with messages beyond a certain
>>>> size, since the State messages always go through.
>>>> Can you see any pattern in the direction the links
>>>> go stale, with reference to which driver you are using? (E.g., is
>>>> there always an e1000 driver involved on the receiving end in the
>>>> stale direction?) Does this happen when you only run one type of
>>>> driver?
>>>
>>> I've not yet gone that deep into packet capture, so I can't say much
>>> about that. Peter, could you send a mail to one of the Wireshark
>>> mailing lists describing the problem? Have you tried capturing other
>>> kinds of high traffic with less resource-hungry capture frontends?
>>>
>>> Best regards,
>>> Martin

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) 
Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
tipc-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/tipc-discussion
