Thanks for the patch, merged... 

The cpu tick counters are different on each thread, so calling vlib_time_now 
(wrong_vlib_main_t *) wrecks the victim thread's timebase. Knock-on effects 
include all manner of obscure / hard-to-reproduce failures.

Dave  

-----Original Message-----
From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Elias Rudberg
Sent: Thursday, May 7, 2020 10:17 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] Fix in LACP code to avoid assertion failure in 
vlib_time_now()

Hello VPP experts,

When trying the current VPP master branch using a debug build we encountered an 
assertion failure in vlib_time_now() here:

always_inline f64
vlib_time_now (vlib_main_t * vm)
{
#if CLIB_DEBUG > 0
  extern __thread uword __os_thread_index;
#endif
  /*
   * Make sure folks don't pass &vlib_global_main from a worker thread.
   */
  ASSERT (vm->thread_index == __os_thread_index);
  return clib_time_now (&vm->clib_time) + vm->time_offset;
}

The ASSERT there is triggered because the LACP code passes &vlib_global_main 
when it should pass a thread-specific vlib_main_t. So this looks like precisely 
the kind of issue that the assertion was made to catch.
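
To make the check concrete, here is a minimal standalone sketch of the condition 
the ASSERT tests. It is not VPP code: demo_main_t, demo_mains, 
demo_os_thread_index, demo_get_main() and demo_vm_is_mine() are hypothetical 
stand-ins for vlib_main_t, the per-thread mains, __os_thread_index and 
vlib_get_main(), and the thread index is set by hand instead of by real threads.

```c
#include <assert.h>

/* Hypothetical stand-in for vlib_main_t; only the field needed here. */
typedef struct
{
  unsigned thread_index;	/* index of the thread that owns this struct */
} demo_main_t;

#define DEMO_N_THREADS 2

/* One main struct per thread; entry 0 plays the role of vlib_global_main. */
static demo_main_t demo_mains[DEMO_N_THREADS] = { {0}, {1} };

/* Stand-in for __os_thread_index; each thread has its own copy. */
static _Thread_local unsigned demo_os_thread_index;

/* Stand-in for vlib_get_main (): the calling thread's own main struct. */
static demo_main_t *
demo_get_main (void)
{
  return &demo_mains[demo_os_thread_index];
}

/* The condition the debug ASSERT checks: does vm belong to the caller? */
static int
demo_vm_is_mine (const demo_main_t * vm)
{
  return vm->thread_index == demo_os_thread_index;
}
```

With demo_os_thread_index set to 1 (a worker), demo_vm_is_mine 
(demo_get_main ()) holds, while demo_vm_is_mine (&demo_mains[0]) does not, 
which is the situation the assertion reports.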

To reproduce the problem, I think it should be enough to use LACP in a 
multi-threaded scenario with a debug build; the assertion failure then 
happens directly at startup, every time.

I pushed a fix, here: https://gerrit.fd.io/r/c/vpp/+/26943

With that fix, LACP appears to work without the assertion failure. 
Please have a look and merge if it seems okay.

Best regards,
Elias
