Re: [vpp-dev] Crash in vlib_add_trace with multi worker mode

2020-06-02 Thread Dave Barach via lists.fd.io
Unless you fully communicate your configuration, you’ll have to debug the issue 
yourself. Are you using the standard handoff mechanism, or a mechanism of your 
own design?

The handoff demo plugin seems to work fine... See 
../src/examples/handoffdemo/{README.md, node.c} etc.

DBGvpp# sh trace

--- Start of thread 0 vpp_main ---
No packets in trace buffer
--- Start of thread 1 vpp_wk_0 ---
Packet 1

00:00:19:259770: pg-input
  stream x, 128 bytes, sw_if_index 0
  current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x100
  : 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d
  0020: 
  0040: 
  0060: 
00:00:19:259851: handoffdemo-1
  HANDOFFDEMO: current thread 1

Packet 2

00:00:19:259770: pg-input
  stream x, 128 bytes, sw_if_index 0
  current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x101
  : 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d
  0020: 
  0040: 
  0060: 
00:00:19:259851: handoffdemo-1
  HANDOFFDEMO: current thread 1

Packet 3

00:00:19:259770: pg-input
  stream x, 128 bytes, sw_if_index 0
  current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x102
  : 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d
  0020: 
  0040: 
  0060: 
00:00:19:259851: handoffdemo-1
  HANDOFFDEMO: current thread 1

Packet 4

00:00:19:259770: pg-input
  stream x, 128 bytes, sw_if_index 0
  current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x103
  : 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d
  0020: 
  0040: 
  0060: 
00:00:19:259851: handoffdemo-1
  HANDOFFDEMO: current thread 1

Packet 5

00:00:19:259770: pg-input
  stream x, 128 bytes, sw_if_index 0
  current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x104
  : 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d
  0020: 
  0040: 
  0060: 
00:00:19:259851: handoffdemo-1
  HANDOFFDEMO: current thread 1

--- Start of thread 2 vpp_wk_1 ---
Packet 1

00:00:19:259879: handoff_trace
  HANDED-OFF: from thread 1 trace index 0
00:00:19:259879: handoffdemo-2
  HANDOFFDEMO: current thread 2
00:00:19:259930: error-drop
  rx:local0
00:00:19:259967: drop
  handoffdemo-2: completed packets

Packet 2

00:00:19:259879: handoff_trace
  HANDED-OFF: from thread 1 trace index 1
00:00:19:259879: handoffdemo-2
  HANDOFFDEMO: current thread 2
00:00:19:259930: error-drop
  rx:local0
00:00:19:259967: drop
  handoffdemo-2: completed packets

Packet 3

00:00:19:259879: handoff_trace
  HANDED-OFF: from thread 1 trace index 2
00:00:19:259879: handoffdemo-2
  HANDOFFDEMO: current thread 2
00:00:19:259930: error-drop
  rx:local0
00:00:19:259967: drop
  handoffdemo-2: completed packets

Packet 4

00:00:19:259879: handoff_trace
  HANDED-OFF: from thread 1 trace index 3
00:00:19:259879: handoffdemo-2
  HANDOFFDEMO: current thread 2
00:00:19:259930: error-drop
  rx:local0
00:00:19:259967: drop
  handoffdemo-2: completed packets

Packet 5

00:00:19:259879: handoff_trace
  HANDED-OFF: from thread 1 trace index 4
00:00:19:259879: handoffdemo-2
  HANDOFFDEMO: current thread 2
00:00:19:259930: error-drop
  rx:local0
00:00:19:259967: drop
  handoffdemo-2: completed packets

DBGvpp#

[vpp-dev] Crash in vlib_add_trace with multi worker mode

2020-06-02 Thread Satya Murthy
Hi,

We are seeing a crash when calling vlib_add_trace on a vlib_buffer in our graph node.

#0 0x74ee0feb in raise () from /lib64/libc.so.6
#1 0x74ecb5c1 in abort () from /lib64/libc.so.6
#2 0x0040831c in os_panic () at /fdio/src/fdio.1810/src/vpp/vnet/main.c:368
#3 0x75f28f2f in debugger () at /fdio/src/fdio.1810/src/vppinfra/error.c:84
#4 0x75f2936a in _clib_error (how_to_die=2, function_name=0x0, line_number=0, fmt=0x7fffb2083920 "%s:%d (%s) assertion `%s' fails")
    at /fdio/src/fdio.1810/src/vppinfra/error.c:143
#5 0x7fffb2035d79 in vlib_validate_trace (tm=0x7fffbaccfd58, b=0x2aaab447d940)
    at /fdio/src/fdio.1810/src/vlib/trace_funcs.h:53
#6 0x7fffb2035ec8 in vlib_add_trace (vm=0x7fffbaccfb40, r=0x7fffbb430f40, b=0x2aaab447d940, n_data_bytes=36)

The assert points to the following line in the code:
ASSERT (!pool_is_free_index (tm->trace_buffer_pool,
                             vlib_buffer_get_trace_index (b)));

One specific point about this crash: it happens only when VPP runs with 
multiple workers. The scenario is as follows:
1. A packet arrives on worker-1 from the NIC.
2. A graph node on worker-1 hands the packet off to worker-2.
3. On worker-2, while processing the packet, we try to add a trace using 
vlib_add_trace, and the crash occurs.

Is the trace buffer referenced from vlib_buffer_t specific to a worker?
If so, what happens when the buffer is handed off to another worker?
Could this cause the crash above?

--
Thanks & Regards,
Murthy