None of the routine names in the backtrace exist in master/latest; it's your 
code, so it will be challenging for the community to help you.

See if you can reproduce the problem with a TAG=vpp_debug image (i.e. "make 
build", not "make build-release"). If you're lucky, one of the numerous 
ASSERTs will catch the problem early.
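For reference, the debug image is what the in-tree Makefile builds by default; a sketch of the usual workflow (output paths and targets may differ slightly between VPP releases):

```shell
# one-time: pull in build dependencies
make install-dep

# debug build (TAG=vpp_debug, ASSERTs enabled)
make build

# run the freshly built debug binary under the debug config
make run
```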

vlib_get_frame_to_node(...) is not new code; it's used all over the place, and 
it needs "help" to fail as shown below.

D.

From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Hugo Garza
Sent: Tuesday, November 27, 2018 7:39 PM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] SIGSEGV after calling vlib_get_frame_to_node

Hi vpp-dev,

I'm seeing a crash when I enable our application with multiple workers.
Nov 26 14:29:32  vnet[64035]: received signal SIGSEGV, PC 0x7f6979a12ce8, 
faulting address 0x7fa6cd0bd444
Nov 26 14:29:32  vnet[64035]: #0  0x00007f6a812743d8 0x7f6a812743d8
Nov 26 14:29:32  vnet[64035]: #1  0x00007f6a80bc56d0 0x7f6a80bc56d0
Nov 26 14:29:32  vnet[64035]: #2  0x00007f6979a12ce8 vlib_frame_vector_args + 
0x10
Nov 26 14:29:32  vnet[64035]: #3  0x00007f6979a16a2c tcpo_enqueue_to_output_i + 
0xf4
Nov 26 14:29:32  vnet[64035]: #4  0x00007f6979a16b23 tcpo_enqueue_to_output + 
0x25
Nov 26 14:29:32  vnet[64035]: #5  0x00007f6979a33fba send_packets + 0x7f2
Nov 26 14:29:32  vnet[64035]: #6  0x00007f6979a346f8 connection_tx + 0x17e
Nov 26 14:29:32  vnet[64035]: #7  0x00007f6979a34f08 tcpo_dispatch_node_fn + 
0x7fa
Nov 26 14:29:32  vnet[64035]: #8  0x00007f6a81248cb6 vlib_worker_loop + 0x6a6
Nov 26 14:29:32  vnet[64035]: #9  0x00007f6a8094f694 0x7f6a8094f694

Running on CentOS 7.4  with kernel 3.10.0-693.el7.x86_64
VPP
Version:                  v18.10-13~g00adcce~b60
Compiled by:              root
Compile host:             b0f32e97e93a
Compile date:             Mon Nov 26 09:09:42 UTC 2018
Compile location:         /w/workspace/vpp-merge-1810-centos7
Compiler:                 GCC 7.3.1 20180303 (Red Hat 7.3.1-5)
Current PID:              9612

On a Cisco server with two-socket Intel Xeon E5-2697A v4 @ 2.60 GHz and two 
Intel X520 NICs. A T-Rex traffic generator is hooked up on the other end to 
provide about 5 Gbps of traffic per NIC.
./t-rex-64 --astf -f astf/nginx_wget.py -c 14 -m 40000 -d 3000

startup.conf
unix {
  nodaemon
  interactive
  log /opt/tcpo/logs/vpp.log
  full-coredump
  cli-no-banner
  #startup-config /opt/tcpo/conf/local.conf
  cli-listen /run/vpp/cli.sock
}
api-trace {
  on
}
heapsize 3G
cpu {
  main-core 1
  corelist-workers 2-5
}
tcpo {
  runtime-config /opt/tcpo/conf/runtime.conf
  session-pool-size 1024000
}
dpdk {
  dev 0000:86:00.0 {
    num-rx-queues 1
  }
  dev 0000:86:00.1 {
    num-rx-queues 1
  }
  dev 0000:84:00.0 {
    num-rx-queues 1
  }
  dev 0000:84:00.1 {
    num-rx-queues 1
  }
  num-mbufs 1024000
  socket-mem 4096,4096
}
plugin_path /usr/lib/vpp_plugins
api-segment {
  gid vpp
}

Here's the function where the SIGSEGV is happening:

static void enqueue_to_output_i (tcpo_worker_ctx_t * wrk, u32 bi, u8 flush)
{
    u32 *to_next, next_index;
    vlib_frame_t *f;

    TRACE_FUNC_VAR (bi);

    next_index = tcpo_output_node.index;

    /* Get frame to output node */
    f = wrk->tx_frame;
    if (!f) {
        f = vlib_get_frame_to_node (wrk->vm, next_index);
        ASSERT (clib_mem_is_heap_object (f));
        wrk->tx_frame = f;
    }
    ASSERT (clib_mem_is_heap_object (f));

    to_next = vlib_frame_vector_args (f);
    to_next[f->n_vectors] = bi;
    f->n_vectors += 1;

    if (flush || f->n_vectors == VLIB_FRAME_SIZE) {
        TRACE_FUNC_VAR2 (flush, f->n_vectors);
        vlib_put_frame_to_node (wrk->vm, next_index, f);
        wrk->tx_frame = 0;
    }
}


I've observed that after a few Gbps of traffic have gone through, the pointer 
f returned by vlib_get_frame_to_node points to a chunk of memory that is 
invalid, as confirmed by the ASSERT I added right after the call.

I'm not sure how to make further progress tracking down this issue; any help 
or advice would be much appreciated.

Thanks,
Hugo
View/Reply Online (#11444): https://lists.fd.io/g/vpp-dev/message/11444