Hi All,

We use RH 7.3 and OVS 2.5.0. From time to time we hit the problem below: OVS crashes with a segmentation fault and the guest loses connectivity. It seems that the monitor process cannot restart OVS. When we restart OVS manually, we no longer see the issue, but we still need to restart the guest to get connectivity back.

We can easily reproduce this by sending a member join (mcast) message from one of the guests to join a cluster; every time we do that, we hit this problem.
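For anyone trying to reproduce this, here is a minimal sketch of the kind of trigger we mean. It assumes the join is an ordinary IP multicast group join followed by a UDP announcement to the group. The group address, port, payload, and function names are placeholders for illustration only, not the actual cluster settings.

```python
import socket

# Hypothetical values for illustration only (not taken from the cluster config).
GROUP = "239.1.1.1"
PORT = 5000


def membership_request(group: str, iface: str = "0.0.0.0") -> bytes:
    """Build the 8-byte ip_mreq value: group address + local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface)


def join_and_announce(group: str = GROUP, port: int = PORT,
                      payload: bytes = b"join") -> socket.socket:
    """Join the multicast group, then send one datagram to it.

    Joining makes the guest kernel emit an IGMP membership report; the
    datagram stands in for the application-level "member join" message.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    # Keep the traffic on the local segment.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    sock.sendto(payload, (group, port))
    return sock
```

Calling join_and_announce() inside a guest attached to a vhost-user port should generate both the IGMP report and one multicast datagram; whether that alone is enough to trigger the crash is untested.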
We have assigned socket-mem 4096,1024 and -n 4. Any ideas on resolving this, or is this a known issue that has already been fixed upstream?

ovs-vsctl show

cfd11977-ab4a-4a73-a952-4e8ed2941ded
    Manager "ptcp:6640:127.0.0.1"
        is_connected: true
    Bridge "ovsdpdkbr0"
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port "ovsdpdkbr0"
            Interface "ovsdpdkbr0"
                type: internal
        Port "phy-ovsdpdkbr0"
            Interface "phy-ovsdpdkbr0"
                type: patch
                options: {peer="int-ovsdpdkbr0"}
        Port "dpdkbond0"
            *Interface "dpdk0"
                type: dpdk
                error: "could not open network device dpdk0 (Cannot allocate memory)"
            Interface "dpdk3"
                type: dpdk
                error: "could not open network device dpdk3 (Cannot allocate memory)"
            Interface "dpdk1"
                type: dpdk
                error: "could not open network device dpdk1 (Cannot allocate memory)"*

ovs-vswitchd.log:

2017-02-06T08:11:45.466Z|00287|coverage(vhost_thread1)|INFO|48 events never hit
2017-02-06T08:11:45.466Z|00288|dpdk(vhost_thread1)|INFO|vHost Device '/var/run/openvswitch/vhu67e39e1b-4f' 2 has been removed
2017-02-06T08:11:45.538Z|00289|dpdk(vhost_thread1)|INFO|vHost Device '/var/run/openvswitch/vhue6e88b22-59' 1 has been removed
2017-02-06T08:11:56.719Z|00290|dpdk(vhost_thread1)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/var/run/openvswitch/vhu67e39e1b-4f' 2 changed to 'enabled'
2017-02-06T08:11:56.719Z|00291|dpdk(vhost_thread1)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/var/run/openvswitch/vhue6e88b22-59' 1 changed to 'enabled'
2017-02-06T08:11:59.373Z|00292|dpdk(vhost_thread1)|INFO|vHost Device '/var/run/openvswitch/vhu67e39e1b-4f' 2 has been added
2017-02-06T08:11:59.373Z|00293|dpdk(vhost_thread1)|INFO|vHost Device '/var/run/openvswitch/vhue6e88b22-59' 1 has been added
2017-02-06T08:11:59.373Z|00294|dpdk(vhost_thread1)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/var/run/openvswitch/vhu67e39e1b-4f' 2 changed to 'enabled'
2017-02-06T08:11:59.373Z|00295|dpdk(vhost_thread1)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/var/run/openvswitch/vhue6e88b22-59' 1 changed to 'enabled'
*2017-02-06T08:49:08.548Z|00003|daemon_unix(monitor)|ERR|1 crashes: pid 32916 died, killed (Segmentation fault), core dumped, restarting*
2017-02-06T08:49:08.589Z|00004|ovs_numa|INFO|Discovered 24 CPU cores on NUMA node 0
2017-02-06T08:49:08.589Z|00005|ovs_numa|INFO|Discovered 24 CPU cores on NUMA node 1
2017-02-06T08:49:08.589Z|00006|ovs_numa|INFO|Discovered 2 NUMA nodes and 48 CPU cores
2017-02-06T08:49:08.589Z|00007|memory|INFO|87988 kB peak resident set size after 3079.5 seconds
2017-02-06T08:49:08.589Z|00008|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2017-02-06T08:49:08.589Z|00009|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2017-02-06T08:49:08.598Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports recirculation
2017-02-06T08:49:08.598Z|00011|ofproto_dpif|INFO|netdev@ovs-netdev: MPLS label stack length probed as 3
2017-02-06T08:49:08.598Z|00012|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports unique flow ids
2017-02-06T08:49:08.598Z|00013|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath does not support ct_state
2017-02-06T08:49:08.598Z|00014|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath does not support ct_zone
2017-02-06T08:49:08.598Z|00015|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath does not support ct_mark
2017-02-06T08:49:08.598Z|00016|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath does not support ct_label
2017-02-06T08:49:08.639Z|00017|bridge|INFO|bridge ovsdpdkbr0: added interface ovsdpdkbr0 on port 65534
2017-02-06T08:49:08.639Z|00018|bridge|WARN|could not open network device dpdk3 (Cannot allocate memory)
2017-02-06T08:49:08.639Z|00019|bridge|WARN|could not open network device dpdk0 (Cannot allocate memory)
2017-02-06T08:49:08.639Z|00020|bridge|WARN|could not open network device dpdk1 (Cannot allocate memory)
2017-02-06T08:49:08.639Z|00021|bridge|INFO|bridge ovsdpdkbr0: added interface phy-ovsdpdkbr0 on port 4
2017-02-06T08:49:08.639Z|00022|bridge|INFO|bridge br-int: added interface int-ovsdpdkbr0 on port 7
2017-02-06T08:49:08.639Z|00023|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhue6e88b22-59
2017-02-06T08:49:08.639Z|00024|bridge|WARN|could not open network device vhue6e88b22-59 (Unknown error -1)
2017-02-06T08:49:08.639Z|00025|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhue1fe2f9a-29
2017-02-06T08:49:08.639Z|00026|bridge|WARN|could not open network device vhue1fe2f9a-29 (Unknown error -1)
2017-02-06T08:49:08.639Z|00027|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhubd9d8402-d6
2017-02-06T08:49:08.639Z|00028|bridge|WARN|could not open network device vhubd9d8402-d6 (Unknown error -1)
2017-02-06T08:49:08.639Z|00029|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhue6013e82-f7
2017-02-06T08:49:08.639Z|00030|bridge|WARN|could not open network device vhue6013e82-f7 (Unknown error -1)
2017-02-06T08:49:08.639Z|00031|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhu8d589307-cc
2017-02-06T08:49:08.639Z|00032|bridge|WARN|could not open network device vhu8d589307-cc (Unknown error -1)
2017-02-06T08:49:08.639Z|00033|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhu949db271-aa
2017-02-06T08:49:08.639Z|00034|bridge|WARN|could not open network device vhu949db271-aa (Unknown error -1)
2017-02-06T08:49:08.639Z|00035|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhu0f0f88cd-a8
2017-02-06T08:49:08.639Z|00036|bridge|WARN|could not open network device vhu0f0f88cd-a8 (Unknown error -1)
2017-02-06T08:49:08.640Z|00037|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhu600b1c3c-47
2017-02-06T08:49:08.640Z|00038|bridge|WARN|could not open network device vhu600b1c3c-47 (Unknown error -1)
2017-02-06T08:49:08.640Z|00039|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhuf72a5c18-95
2017-02-06T08:49:08.640Z|00040|bridge|WARN|could not open network device vhuf72a5c18-95 (Unknown error -1)
2017-02-06T08:49:08.640Z|00041|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhua39fa843-c3
2017-02-06T08:49:08.640Z|00042|bridge|WARN|could not open network device vhua39fa843-c3 (Unknown error -1)
2017-02-06T08:49:08.640Z|00043|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhude918d08-ec
2017-02-06T08:49:08.640Z|00044|bridge|WARN|could not open network device vhude918d08-ec (Unknown error -1)
2017-02-06T08:49:08.640Z|00045|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhu13d398ba-fc
2017-02-06T08:49:08.640Z|00046|bridge|WARN|could not open network device vhu13d398ba-fc (Unknown error -1)
2017-02-06T08:49:08.640Z|00047|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhub51d2503-42
2017-02-06T08:49:08.640Z|00048|bridge|WARN|could not open network device vhub51d2503-42 (Unknown error -1)
2017-02-06T08:49:08.640Z|00049|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhu6dbc3341-c9
2017-02-06T08:49:08.640Z|00050|bridge|WARN|could not open network device vhu6dbc3341-c9 (Unknown error -1)
2017-02-06T08:49:08.640Z|00051|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhua4ab858d-e5
2017-02-06T08:49:08.640Z|00052|bridge|WARN|could not open network device vhua4ab858d-e5 (Unknown error -1)
2017-02-06T08:49:08.640Z|00053|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhu554d961a-6b
2017-02-06T08:49:08.640Z|00054|bridge|WARN|could not open network device vhu554d961a-6b (Unknown error -1)
2017-02-06T08:49:08.640Z|00055|bridge|INFO|bridge br-int: added interface br-int on port 65534
2017-02-06T08:49:08.641Z|00056|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhu67e39e1b-4f
2017-02-06T08:49:08.641Z|00057|bridge|WARN|could not open network device vhu67e39e1b-4f (Unknown error -1)
2017-02-06T08:49:08.641Z|00058|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhufb7d1a45-2f
2017-02-06T08:49:08.641Z|00059|bridge|WARN|could not open network device vhufb7d1a45-2f (Unknown error -1)
2017-02-06T08:49:08.641Z|00060|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhu00e22fb6-ed
2017-02-06T08:49:08.641Z|00061|bridge|WARN|could not open network device vhu00e22fb6-ed (Unknown error -1)
2017-02-06T08:49:08.641Z|00062|bridge|INFO|bridge ovsdpdkbr0: using datapath ID 000002540980bf41
2017-02-06T08:49:08.642Z|00063|connmgr|INFO|ovsdpdkbr0: added service controller "punix:/var/run/openvswitch/ovsdpdkbr0.mgmt"
2017-02-06T08:49:08.642Z|00064|connmgr|INFO|ovsdpdkbr0: added primary controller "tcp:127.0.0.1:6633"
2017-02-06T08:49:08.642Z|00065|rconn|INFO|ovsdpdkbr0<->tcp:127.0.0.1:6633: connecting...
2017-02-06T08:49:08.690Z|00066|bridge|INFO|bridge br-int: using datapath ID 0000065189af7548
2017-02-06T08:49:08.690Z|00067|connmgr|INFO|br-int: added service controller "punix:/var/run/openvswitch/br-int.mgmt"
2017-02-06T08:49:08.690Z|00068|connmgr|INFO|br-int: added primary controller "tcp:127.0.0.1:6633"
2017-02-06T08:49:08.690Z|00069|rconn|INFO|br-int<->tcp:127.0.0.1:6633: connecting...
2017-02-06T08:49:08.740Z|00070|rconn|INFO|ovsdpdkbr0<->tcp:127.0.0.1:6633: connected
2017-02-06T08:49:08.740Z|00071|rconn|INFO|br-int<->tcp:127.0.0.1:6633: connected
2017-02-06T08:49:08.741Z|00072|bridge|WARN|could not open network device dpdk3 (Cannot allocate memory)
2017-02-06T08:49:08.741Z|00073|bridge|WARN|could not open network device dpdk0 (Cannot allocate memory)
2017-02-06T08:49:08.741Z|00074|bridge|WARN|could not open network device dpdk1 (Cannot allocate memory)
2017-02-06T08:49:08.741Z|00075|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhue6e88b22-59
2017-02-06T08:49:08.741Z|00076|bridge|WARN|could not open network device vhue6e88b22-59 (Unknown error -1)
2017-02-06T08:49:08.742Z|00077|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhue1fe2f9a-29
2017-02-06T08:49:08.742Z|00078|bridge|WARN|could not open network device vhue1fe2f9a-29 (Unknown error -1)
2017-02-06T08:49:08.742Z|00079|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhubd9d8402-d6
2017-02-06T08:49:08.742Z|00080|bridge|WARN|could not open network device vhubd9d8402-d6 (Unknown error -1)
2017-02-06T08:49:08.742Z|00081|dpdk|ERR|vhost-user socket device setup failure for socket /var/run/openvswitch/vhue6013e82-f7

Some other logs from another crash:

Feb 6 10:36:32 compute-06 ovs-vswitchd[3122]: VHOST_DATA: Failed to allocate memory for mbuf.
Feb 6 10:36:32 compute-06 ovs-vswitchd[3122]: VHOST_DATA: Failed to allocate memory for mbuf.
Feb 6 10:36:32 compute-06 ovs-vswitchd[3122]: VHOST_DATA: Failed to allocate memory for mbuf.
Feb 6 10:36:32 compute-06 ovs-vswitchd[3122]: VHOST_DATA: Failed to allocate memory for mbuf.
Feb 6 10:36:33 compute-06 ovs-vswitchd[3122]: VHOST_DATA: Failed to allocate memory for mbuf.
Feb 6 10:36:33 compute-06 kernel: vhost_thread1[3124]: segfault at 20 ip 00007f6fc6e753b7 sp 00007f6fc21185d8 error 4 in libc-2.17.so [7f6fc6d41000+1b6000]
Feb 6 10:36:34 compute-06 ovs-vswitchd[3121]: ovs|00003|daemon_unix(monitor)|ERR|1 crashes: pid 3122 died, killed (Segmentation fault), core dumped, restarting

Mon 2017-02-06 10:36:34.134521 +03 [s=339299f7b2f14e9490e6f7d0c4652709;i=1415;b=281e7a2d64be4faa8e1823110f040ed3;m=f1c74de56;t=547d7b0d97330;x=d9afa675780dc6bb]
    _UID=0
    _SYSTEMD_SLICE=system.slice
    _BOOT_ID=281e7a2d64be4faa8e1823110f040ed3
    _MACHINE_ID=29b9358ef5a34f78b9c6a781c33ee5b4
    PRIORITY=3
    SYSLOG_FACILITY=3
    _CAP_EFFECTIVE=1fffffffff
    _HOSTNAME=compute-06.turkcell.com.tr
    _TRANSPORT=syslog
    _GID=107
    _SYSTEMD_CGROUP=/system.slice/openvswitch-nonetwork.service
    _SYSTEMD_UNIT=openvswitch-nonetwork.service
    _SELINUX_CONTEXT=system_u:system_r:openvswitch_t:s0
    SYSLOG_IDENTIFIER=ovs-vswitchd
    _EXE=/usr/sbin/ovs-vswitchd
    SYSLOG_PID=3121
    MESSAGE=ovs|00003|daemon_unix(monitor)|ERR|1 crashes: pid 3122 died, killed (Segmentation fault), core dumped, restarting
    _PID=3121
    _COMM=monitor
    _CMDLINE=ovs-vswit... --dpdk -l 1,2,3,25,26,27,13,37 -n 4 --socket-mem 4096 1024 -- unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach --monitor
    _SOURCE_REALTIME_TIMESTAMP=1486366594134521

thanks, regards
--
BBD