Hi All,

We use RHEL 7.3 and OVS 2.5.0. From time to time we hit the problem below:
ovs-vswitchd crashes with a segmentation fault and the guest loses
connectivity. It seems that the monitor process cannot restart OVS
properly. When we restart OVS manually, the errors go away, but we still
have to restart the guest to get connectivity back. We can reproduce this
easily: every time one of the guests sends a multicast member-join message
to join a cluster, we hit this problem.

We have set socket-mem to 4096,1024 and -n 4. Any ideas on how to resolve
this, or is this a known issue that has been fixed upstream?
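For reference, DPDK's EAL option --socket-mem takes a comma-separated list of megabytes to preallocate on each NUMA socket, in socket order, so 4096,1024 asks for 4096 MB on node 0 and 1024 MB on node 1. The Python sketch below only does that arithmetic and turns it into a per-node hugepage count; the 2 MB and 1 GB page sizes are assumptions, since the hugepage size in use is not stated here.

```python
"""Sketch: translate a DPDK --socket-mem value into hugepages per NUMA node.

Assumption: the host uses either 2 MB or 1 GB hugepages; which one is not
stated, so both are computed.
"""
from typing import Dict

# Candidate hugepage sizes in MB (assumed, not taken from the host).
HUGEPAGE_SIZES_MB = {"2M": 2, "1G": 1024}


def parse_socket_mem(value: str) -> Dict[int, int]:
    """Map NUMA node index -> MB requested; list position is the node index."""
    return {node: int(mb) for node, mb in enumerate(value.split(","))}


def hugepages_needed(socket_mem: Dict[int, int], page_mb: int) -> Dict[int, int]:
    """Round each node's request up to a whole number of hugepages."""
    return {node: -(-mb // page_mb) for node, mb in socket_mem.items()}


if __name__ == "__main__":
    requested = parse_socket_mem("4096,1024")
    for label, size in HUGEPAGE_SIZES_MB.items():
        print(label, hugepages_needed(requested, size))
```

With 1 GB pages that means 4 free pages on node 0 and 1 on node 1 every time ovs-vswitchd starts, including the automatic restart after a crash. Whether those pages are actually free at that moment has to be checked on the host, for example in /sys/devices/system/node/node*/hugepages/*/free_hugepages.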

ovs-vsctl show:
cfd11977-ab4a-4a73-a952-4e8ed2941ded
    Manager "ptcp:6640:127.0.0.1"
        is_connected: true
    Bridge "ovsdpdkbr0"
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port "ovsdpdkbr0"
            Interface "ovsdpdkbr0"
                type: internal
        Port "phy-ovsdpdkbr0"
            Interface "phy-ovsdpdkbr0"
                type: patch
                options: {peer="int-ovsdpdkbr0"}
        Port "dpdkbond0"
            Interface "dpdk0"
                type: dpdk
                error: "could not open network device dpdk0 (Cannot allocate memory)"
            Interface "dpdk3"
                type: dpdk
                error: "could not open network device dpdk3 (Cannot allocate memory)"
            Interface "dpdk1"
                type: dpdk
                error: "could not open network device dpdk1 (Cannot allocate memory)"

ovs-vswitchd.log:

2017-02-06T08:11:45.466Z|00287|coverage(vhost_thread1)|INFO|48 events never hit
2017-02-06T08:11:45.466Z|00288|dpdk(vhost_thread1)|INFO|vHost Device '/var/run/openvswitch/vhu67e39e1b-4f' 2 has been removed
2017-02-06T08:11:45.538Z|00289|dpdk(vhost_thread1)|INFO|vHost Device '/var/run/openvswitch/vhue6e88b22-59' 1 has been removed
2017-02-06T08:11:56.719Z|00290|dpdk(vhost_thread1)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/var/run/openvswitch/vhu67e39e1b-4f' 2 changed to 'enabled'
2017-02-06T08:11:56.719Z|00291|dpdk(vhost_thread1)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/var/run/openvswitch/vhue6e88b22-59' 1 changed to 'enabled'
2017-02-06T08:11:59.373Z|00292|dpdk(vhost_thread1)|INFO|vHost Device '/var/run/openvswitch/vhu67e39e1b-4f' 2 has been added
2017-02-06T08:11:59.373Z|00293|dpdk(vhost_thread1)|INFO|vHost Device '/var/run/openvswitch/vhue6e88b22-59' 1 has been added
2017-02-06T08:11:59.373Z|00294|dpdk(vhost_thread1)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/var/run/openvswitch/vhu67e39e1b-4f' 2 changed to 'enabled'
2017-02-06T08:11:59.373Z|00295|dpdk(vhost_thread1)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/var/run/openvswitch/vhue6e88b22-59' 1 changed to 'enabled'
2017-02-06T08:49:08.548Z|00003|daemon_unix(monitor)|ERR|1 crashes: pid 32916 died, killed (Segmentation fault), core dumped, restarting
2017-02-06T08:49:08.589Z|00004|ovs_numa|INFO|Discovered 24 CPU cores on NUMA node 0
2017-02-06T08:49:08.589Z|00005|ovs_numa|INFO|Discovered 24 CPU cores on NUMA node 1
2017-02-06T08:49:08.589Z|00006|ovs_numa|INFO|Discovered 2 NUMA nodes and 48 CPU cores
2017-02-06T08:49:08.589Z|00007|memory|INFO|87988 kB peak resident set size after 3079.5 seconds
2017-02-06T08:49:08.589Z|00008|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2017-02-06T08:49:08.589Z|00009|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2017-02-06T08:49:08.598Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports recirculation
2017-02-06T08:49:08.598Z|00011|ofproto_dpif|INFO|netdev@ovs-netdev: MPLS label stack length probed as 3
2017-02-06T08:49:08.598Z|00012|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports unique flow ids
2017-02-06T08:49:08.598Z|00013|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath does not support ct_state
2017-02-06T08:49:08.598Z|00014|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath does not support ct_zone
2017-02-06T08:49:08.598Z|00015|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath does not support ct_mark
2017-02-06T08:49:08.598Z|00016|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath does not support ct_label
2017-02-06T08:49:08.639Z|00017|bridge|INFO|bridge ovsdpdkbr0: added interface ovsdpdkbr0 on port 65534
2017-02-06T08:49:08.639Z|00018|bridge|WARN|could not open network device dpdk3 (Cannot allocate memory)
2017-02-06T08:49:08.639Z|00019|bridge|WARN|could not open network device dpdk0 (Cannot allocate memory)
2017-02-06T08:49:08.639Z|00020|bridge|WARN|could not open network device dpdk1 (Cannot allocate memory)
2017-02-06T08:49:08.639Z|00021|bridge|INFO|bridge ovsdpdkbr0: added interface phy-ovsdpdkbr0 on port 4
2017-02-06T08:49:08.639Z|00022|bridge|INFO|bridge br-int: added interface int-ovsdpdkbr0 on port 7
2017-02-06T08:49:08.639Z|00023|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhue6e88b22-59
2017-02-06T08:49:08.639Z|00024|bridge|WARN|could not open network device vhue6e88b22-59 (Unknown error -1)
2017-02-06T08:49:08.639Z|00025|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhue1fe2f9a-29
2017-02-06T08:49:08.639Z|00026|bridge|WARN|could not open network device vhue1fe2f9a-29 (Unknown error -1)
2017-02-06T08:49:08.639Z|00027|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhubd9d8402-d6
2017-02-06T08:49:08.639Z|00028|bridge|WARN|could not open network device vhubd9d8402-d6 (Unknown error -1)
2017-02-06T08:49:08.639Z|00029|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhue6013e82-f7
2017-02-06T08:49:08.639Z|00030|bridge|WARN|could not open network device vhue6013e82-f7 (Unknown error -1)
2017-02-06T08:49:08.639Z|00031|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhu8d589307-cc
2017-02-06T08:49:08.639Z|00032|bridge|WARN|could not open network device vhu8d589307-cc (Unknown error -1)
2017-02-06T08:49:08.639Z|00033|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhu949db271-aa
2017-02-06T08:49:08.639Z|00034|bridge|WARN|could not open network device vhu949db271-aa (Unknown error -1)
2017-02-06T08:49:08.639Z|00035|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhu0f0f88cd-a8
2017-02-06T08:49:08.639Z|00036|bridge|WARN|could not open network device vhu0f0f88cd-a8 (Unknown error -1)
2017-02-06T08:49:08.640Z|00037|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhu600b1c3c-47
2017-02-06T08:49:08.640Z|00038|bridge|WARN|could not open network device vhu600b1c3c-47 (Unknown error -1)
2017-02-06T08:49:08.640Z|00039|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhuf72a5c18-95
2017-02-06T08:49:08.640Z|00040|bridge|WARN|could not open network device vhuf72a5c18-95 (Unknown error -1)
2017-02-06T08:49:08.640Z|00041|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhua39fa843-c3
2017-02-06T08:49:08.640Z|00042|bridge|WARN|could not open network device vhua39fa843-c3 (Unknown error -1)
2017-02-06T08:49:08.640Z|00043|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhude918d08-ec
2017-02-06T08:49:08.640Z|00044|bridge|WARN|could not open network device vhude918d08-ec (Unknown error -1)
2017-02-06T08:49:08.640Z|00045|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhu13d398ba-fc
2017-02-06T08:49:08.640Z|00046|bridge|WARN|could not open network device vhu13d398ba-fc (Unknown error -1)
2017-02-06T08:49:08.640Z|00047|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhub51d2503-42
2017-02-06T08:49:08.640Z|00048|bridge|WARN|could not open network device vhub51d2503-42 (Unknown error -1)
2017-02-06T08:49:08.640Z|00049|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhu6dbc3341-c9
2017-02-06T08:49:08.640Z|00050|bridge|WARN|could not open network device vhu6dbc3341-c9 (Unknown error -1)
2017-02-06T08:49:08.640Z|00051|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhua4ab858d-e5
2017-02-06T08:49:08.640Z|00052|bridge|WARN|could not open network device vhua4ab858d-e5 (Unknown error -1)
2017-02-06T08:49:08.640Z|00053|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhu554d961a-6b
2017-02-06T08:49:08.640Z|00054|bridge|WARN|could not open network device vhu554d961a-6b (Unknown error -1)
2017-02-06T08:49:08.640Z|00055|bridge|INFO|bridge br-int: added interface br-int on port 65534
2017-02-06T08:49:08.641Z|00056|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhu67e39e1b-4f
2017-02-06T08:49:08.641Z|00057|bridge|WARN|could not open network device vhu67e39e1b-4f (Unknown error -1)
2017-02-06T08:49:08.641Z|00058|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhufb7d1a45-2f
2017-02-06T08:49:08.641Z|00059|bridge|WARN|could not open network device vhufb7d1a45-2f (Unknown error -1)
2017-02-06T08:49:08.641Z|00060|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhu00e22fb6-ed
2017-02-06T08:49:08.641Z|00061|bridge|WARN|could not open network device vhu00e22fb6-ed (Unknown error -1)
2017-02-06T08:49:08.641Z|00062|bridge|INFO|bridge ovsdpdkbr0: using datapath ID 000002540980bf41
2017-02-06T08:49:08.642Z|00063|connmgr|INFO|ovsdpdkbr0: added service controller "punix:/var/run/openvswitch/ovsdpdkbr0.mgmt"
2017-02-06T08:49:08.642Z|00064|connmgr|INFO|ovsdpdkbr0: added primary controller "tcp:127.0.0.1:6633"
2017-02-06T08:49:08.642Z|00065|rconn|INFO|ovsdpdkbr0<->tcp:127.0.0.1:6633: connecting...
2017-02-06T08:49:08.690Z|00066|bridge|INFO|bridge br-int: using datapath ID 0000065189af7548
2017-02-06T08:49:08.690Z|00067|connmgr|INFO|br-int: added service controller "punix:/var/run/openvswitch/br-int.mgmt"
2017-02-06T08:49:08.690Z|00068|connmgr|INFO|br-int: added primary controller "tcp:127.0.0.1:6633"
2017-02-06T08:49:08.690Z|00069|rconn|INFO|br-int<->tcp:127.0.0.1:6633: connecting...
2017-02-06T08:49:08.740Z|00070|rconn|INFO|ovsdpdkbr0<->tcp:127.0.0.1:6633: connected
2017-02-06T08:49:08.740Z|00071|rconn|INFO|br-int<->tcp:127.0.0.1:6633: connected
2017-02-06T08:49:08.741Z|00072|bridge|WARN|could not open network device dpdk3 (Cannot allocate memory)
2017-02-06T08:49:08.741Z|00073|bridge|WARN|could not open network device dpdk0 (Cannot allocate memory)
2017-02-06T08:49:08.741Z|00074|bridge|WARN|could not open network device dpdk1 (Cannot allocate memory)
2017-02-06T08:49:08.741Z|00075|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhue6e88b22-59
2017-02-06T08:49:08.741Z|00076|bridge|WARN|could not open network device vhue6e88b22-59 (Unknown error -1)
2017-02-06T08:49:08.742Z|00077|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhue1fe2f9a-29
2017-02-06T08:49:08.742Z|00078|bridge|WARN|could not open network device vhue1fe2f9a-29 (Unknown error -1)
2017-02-06T08:49:08.742Z|00079|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhubd9d8402-d6
2017-02-06T08:49:08.742Z|00080|bridge|WARN|could not open network device vhubd9d8402-d6 (Unknown error -1)
2017-02-06T08:49:08.742Z|00081|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhue6013e82-f7
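To see at a glance which ports fail to come back after the automatic restart, the bridge warnings can be grouped by error. A minimal Python sketch, assuming the log is read in its on-disk form (one entry per line, as in /var/log/openvswitch/ovs-vswitchd.log rather than the mail-wrapped copy); the function name is illustrative, not part of OVS:

```python
import re
import sys
from collections import defaultdict

# Matches bridge warnings such as:
#   ...|bridge|WARN|could not open network device dpdk0 (Cannot allocate memory)
OPEN_FAIL = re.compile(r"could not open network device (\S+) \((.+)\)\s*$")


def failed_devices(lines):
    """Return {error text: sorted device names} for every open failure seen."""
    by_reason = defaultdict(set)
    for line in lines:
        match = OPEN_FAIL.search(line)
        if match:
            device, reason = match.groups()
            by_reason[reason].add(device)
    return {reason: sorted(devs) for reason, devs in by_reason.items()}


if __name__ == "__main__":
    # Pass one or more ovs-vswitchd log files on the command line.
    for path in sys.argv[1:]:
        with open(path) as log:
            for reason, devices in failed_devices(log).items():
                print(f"{path}: {reason}: {len(devices)} device(s): {', '.join(devices)}")
```

On the entries quoted above it groups dpdk0, dpdk1 and dpdk3 under "Cannot allocate memory" and the vhu* ports under "Unknown error -1".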


Some other logs, from a different crash:

Feb  6 10:36:32 compute-06 ovs-vswitchd[3122]: VHOST_DATA: Failed to allocate memory for mbuf.
Feb  6 10:36:32 compute-06 ovs-vswitchd[3122]: VHOST_DATA: Failed to allocate memory for mbuf.
Feb  6 10:36:32 compute-06 ovs-vswitchd[3122]: VHOST_DATA: Failed to allocate memory for mbuf.
Feb  6 10:36:32 compute-06 ovs-vswitchd[3122]: VHOST_DATA: Failed to allocate memory for mbuf.
Feb  6 10:36:33 compute-06 ovs-vswitchd[3122]: VHOST_DATA: Failed to allocate memory for mbuf.
Feb  6 10:36:33 compute-06 kernel: vhost_thread1[3124]: segfault at 20 ip 00007f6fc6e753b7 sp 00007f6fc21185d8 error 4 in libc-2.17.so[7f6fc6d41000+1b6000]
Feb  6 10:36:34 compute-06 ovs-vswitchd[3121]: ovs|00003|daemon_unix(monitor)|ERR|1 crashes: pid 3122 died, killed (Segmentation fault), core dumped, restarting
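The kernel line for vhost_thread1 can be decoded by hand: the faulting instruction's offset inside libc-2.17.so is ip minus the mapping base shown in brackets, and on x86-64 the "error" value is the page-fault error code (bit 0: protection violation rather than page not present, bit 1: write rather than read, bit 2: user mode). A small Python sketch of that arithmetic, assuming the x86-64 error-code layout and the kernel message format seen above:

```python
import re

# Kernel message format as seen in the post:
#   ... segfault at 20 ip 00007f6fc6e753b7 sp 00007f6fc21185d8 error 4 in libc-2.17.so[7f6fc6d41000+1b6000]
SEGFAULT = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp [0-9a-f]+ "
    r"error (?P<err>\d+) in (?P<obj>[^\s\[]+)\s*\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line: str) -> dict:
    """Split a kernel segfault line into fault address, offset in the object and error bits."""
    m = SEGFAULT.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    err = int(m["err"])
    return {
        "object": m["obj"],
        "fault_addr": int(m["addr"], 16),
        # Offset to resolve against the same library build (addr2line/gdb).
        "offset_in_object": ip - base,
        "protection_fault": bool(err & 1),  # False: the page was not present
        "write": bool(err & 2),             # False: the access was a read
        "user_mode": bool(err & 4),         # True: fault happened in user space
    }


if __name__ == "__main__":
    sample = ("vhost_thread1[3124]: segfault at 20 ip 00007f6fc6e753b7 sp 00007f6fc21185d8 "
              "error 4 in libc-2.17.so[7f6fc6d41000+1b6000]")
    print(decode_segfault(sample))
```

For this crash it gives a user-mode read of address 0x20 (a small offset from a NULL pointer) at offset 0x1343b7 into libc-2.17.so. Resolving that offset with debuginfo for the same glibc build, or running bt on the core dump in gdb, should show which caller in ovs-vswitchd passed the bad pointer.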


Mon 2017-02-06 10:36:34.134521 +03
[s=339299f7b2f14e9490e6f7d0c4652709;i=1415;b=281e7a2d64be4faa8e1823110f040ed3;m=f1c74de56;t=547d7b0d97330;x=d9afa675780dc6bb]
    _UID=0
    _SYSTEMD_SLICE=system.slice
    _BOOT_ID=281e7a2d64be4faa8e1823110f040ed3
    _MACHINE_ID=29b9358ef5a34f78b9c6a781c33ee5b4
    PRIORITY=3
    SYSLOG_FACILITY=3
    _CAP_EFFECTIVE=1fffffffff
    _HOSTNAME=compute-06.turkcell.com.tr
    _TRANSPORT=syslog
    _GID=107
    _SYSTEMD_CGROUP=/system.slice/openvswitch-nonetwork.service
    _SYSTEMD_UNIT=openvswitch-nonetwork.service
    _SELINUX_CONTEXT=system_u:system_r:openvswitch_t:s0
    SYSLOG_IDENTIFIER=ovs-vswitchd
    _EXE=/usr/sbin/ovs-vswitchd
    SYSLOG_PID=3121
    MESSAGE=ovs|00003|daemon_unix(monitor)|ERR|1 crashes: pid 3122 died, killed (Segmentation fault), core dumped, restarting
    _PID=3121
    _COMM=monitor
    _CMDLINE=ovs-vswit... --dpdk -l 1,2,3,25,26,27,13,37 -n 4 --socket-mem 4096 1024 -- unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach --monitor
    _SOURCE_REALTIME_TIMESTAMP=1486366594134521

Thanks, regards
-- 
BBD
