Ah, spoke too soon. Thirty seconds later the network went down even with IPv6 disabled, so it does appear to be a host forwarding problem rather than a VM problem. I have an oVirt 4.0 cluster on the same network that doesn't have these issues, so it must be a configuration issue somewhere. Here is a dump of my IP config on the host:

[07:57:26]root@ovirt730-01 ~ # ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmNet state UP qlen 1000
    link/ether 18:66:da:eb:8f:c0 brd ff:ff:ff:ff:ff:ff
3: em2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 18:66:da:eb:8f:c1 brd ff:ff:ff:ff:ff:ff
4: em3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 18:66:da:eb:8f:c2 brd ff:ff:ff:ff:ff:ff
5: em4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 18:66:da:eb:8f:c3 brd ff:ff:ff:ff:ff:ff
6: p5p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovirtmgmt state UP qlen 1000
    link/ether f4:e9:d4:a9:7a:f0 brd ff:ff:ff:ff:ff:ff
7: p5p2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether f4:e9:d4:a9:7a:f2 brd ff:ff:ff:ff:ff:ff
8: vmNet: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 18:66:da:eb:8f:c0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.180/24 brd 192.168.1.255 scope global vmNet
       valid_lft forever preferred_lft forever
10: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether f4:e9:d4:a9:7a:f0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.130.180/24 brd 192.168.130.255 scope global ovirtmgmt
       valid_lft forever preferred_lft forever
11: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether fa:f3:48:35:76:8d brd ff:ff:ff:ff:ff:ff
14: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovirtmgmt state UNKNOWN qlen 1000
    link/ether fe:16:3e:3f:fb:ec brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc16:3eff:fe3f:fbec/64 scope link
       valid_lft forever preferred_lft forever
15: vnet1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmNet state UNKNOWN qlen 1000
    link/ether fe:1a:4a:16:01:51 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc1a:4aff:fe16:151/64 scope link
       valid_lft forever preferred_lft forever
16: vnet2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmNet state UNKNOWN qlen 1000
    link/ether fe:1a:4a:16:01:52 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc1a:4aff:fe16:152/64 scope link
       valid_lft forever preferred_lft forever

[07:57:50]root@ovirt730-01 ~ # ip route show
default via 192.168.1.254 dev vmNet
192.168.1.0/24 dev vmNet  proto kernel  scope link  src 192.168.1.180
169.254.0.0/16 dev vmNet  scope link  metric 1008
169.254.0.0/16 dev ovirtmgmt  scope link  metric 1010
192.168.130.0/24 dev ovirtmgmt proto kernel scope link src 192.168.130.180

[07:57:53]root@ovirt730-01 ~ # ip rule show
0:    from all lookup local
32760:    from all to 192.168.130.0/24 iif ovirtmgmt lookup 3232268980
32761:    from 192.168.130.0/24 lookup 3232268980
32762:    from all to 192.168.1.0/24 iif vmNet lookup 2308294836
32763:    from 192.168.1.0/24 lookup 2308294836
32766:    from all lookup main
32767:    from all lookup default
[07:57:58]root@ovirt730-01 ~ #
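
The two non-default rules are pointing at per-network source-routing tables that VDSM seems to create, one per logical network. If it helps to compare against my working 4.0 cluster, I can dump those tables directly (just a quick sketch, using the table IDs from the output above):

# Dump the per-network routing tables referenced by "ip rule show" above:
ip route show table 3232268980
ip route show table 2308294836
# And double-check that forwarding is actually enabled on the host:
sysctl net.ipv4.ip_forward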


On 2017-04-10 07:54 AM, Charles Tassell wrote:

Hi Everyone,

Just an update, I installed a new Ubuntu guest VM and it was doing the same thing regarding the network going down, then I disabled IPv6 and it's been fine for the past 10-15 minutes. So the issue seems to be IPv6 related, and I don't need IPv6 so I can just turn it off. The eth1 NIC disappearing is still worrisome though.
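
(For the record, "disabled IPv6" on the guest just means flipping the sysctls, roughly as below; the sysctl.d file name is just whatever I picked, nothing special:)

# Disable IPv6 at runtime on the guest:
sysctl -w net.ipv6.conf.all.disable_ipv6=1
sysctl -w net.ipv6.conf.default.disable_ipv6=1
# And persist it across reboots (file name is arbitrary):
cat > /etc/sysctl.d/99-disable-ipv6.conf <<EOF
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
EOF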


On 2017-04-10 07:13 AM, Charles Tassell wrote:
Hi Everyone,

  Thanks for the help, answers below.

On 2017-04-10 05:27 AM, Sandro Bonazzola wrote:
Adding Simone and Martin, replying inline.

On Mon, Apr 10, 2017 at 10:16 AM, Ondrej Svoboda <osvob...@redhat.com> wrote:

    Hello Charles,

    First, can you give us more information regarding the duplicated
    IPv6 addresses? Since you are going to reinstall the hosted
    engine, could you make sure that NetworkManager is disabled
    before adding the second vNIC (and perhaps even disable IPv6 and
    reboot as well, so we have a solid base and see what makes the
    difference)?

I disabled NetworkManager on the hosts (systemctl disable NetworkManager ; service NetworkManager stop) before doing the oVirt setup and rebooted to make sure it didn't come back up. Or are you referring to the hosted engine VM? I just removed and re-added the eth1 NIC in the hosted engine, and this is what showed up in dmesg:

[Mon Apr 10 06:46:43 2017] pci 0000:00:08.0: [1af4:1000] type 00 class 0x020000
[Mon Apr 10 06:46:43 2017] pci 0000:00:08.0: reg 0x10: [io 0x0000-0x001f]
[Mon Apr 10 06:46:43 2017] pci 0000:00:08.0: reg 0x14: [mem 0x00000000-0x00000fff]
[Mon Apr 10 06:46:43 2017] pci 0000:00:08.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
[Mon Apr 10 06:46:43 2017] pci 0000:00:08.0: reg 0x30: [mem 0x00000000-0x0003ffff pref]
[Mon Apr 10 06:46:43 2017] pci 0000:00:08.0: BAR 6: assigned [mem 0xc0000000-0xc003ffff pref]
[Mon Apr 10 06:46:43 2017] pci 0000:00:08.0: BAR 4: assigned [mem 0xc0040000-0xc0043fff 64bit pref]
[Mon Apr 10 06:46:43 2017] pci 0000:00:08.0: BAR 1: assigned [mem 0xc0044000-0xc0044fff]
[Mon Apr 10 06:46:43 2017] pci 0000:00:08.0: BAR 0: assigned [io 0x1000-0x101f]
[Mon Apr 10 06:46:43 2017] virtio-pci 0000:00:08.0: enabling device (0000 -> 0003)
[Mon Apr 10 06:46:43 2017] virtio-pci 0000:00:08.0: irq 35 for MSI/MSI-X
[Mon Apr 10 06:46:43 2017] virtio-pci 0000:00:08.0: irq 36 for MSI/MSI-X
[Mon Apr 10 06:46:43 2017] virtio-pci 0000:00:08.0: irq 37 for MSI/MSI-X
[Mon Apr 10 06:46:43 2017] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
[Mon Apr 10 06:46:43 2017] IPv6: eth1: IPv6 duplicate address fe80::21a:4aff:fe16:151 detected!

Then when the network dropped I started getting these:

[Mon Apr 10 06:48:00 2017] IPv6: eth1: IPv6 duplicate address 2001:410:e000:902:21a:4aff:fe16:151 detected!
[Mon Apr 10 06:48:00 2017] IPv6: eth1: IPv6 duplicate address 2001:410:e000:902:21a:4aff:fe16:151 detected!
[Mon Apr 10 06:49:51 2017] IPv6: eth1: IPv6 duplicate address 2001:410:e000:902:21a:4aff:fe16:151 detected!
[Mon Apr 10 06:51:40 2017] IPv6: eth1: IPv6 duplicate address 2001:410:e000:902:21a:4aff:fe16:151 detected!

The network on eth1 would go down for a few seconds and then come back up, but networking stays solid on eth0. I disabled NetworkManager on the HE VM as well to see if that makes a difference, and I also disabled IPv6 with sysctl to see if that helps. I'll install an Ubuntu VM on the cluster later today and see if it has a similar issue.
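
(If it's useful before turning IPv6 off entirely, these are the kinds of checks I'm doing on the guest; eth1 is the affected interface here, and disabling DAD is just an experiment, not a fix:)

# Look for "dadfailed"/"tentative" flags on the interface's IPv6 addresses:
ip -6 addr show dev eth1
# As an experiment, duplicate address detection can be switched off per interface:
sysctl -w net.ipv6.conf.eth1.accept_dad=0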



    What kind of documentation did you follow to install the hosted
    engine? Was it this page?
    https://www.ovirt.org/documentation/how-to/hosted-engine/
    If
    so, could you file a bug against VDSM networking and attach
    /var/log/vdsm/vdsm.log and supervdsm.log, and make sure they
    include the time period from adding the second vNIC to rebooting?

    Second, even the vNIC going missing after reboot looks like a
    bug to me. Even though eth1 does not exist in the VM, can you
    see it defined for the VM in the engine web GUI?


If the HE VM configuration wasn't flushed to the OVF_STORE yet, it makes sense that it disappeared on restart.

The docs I used were https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/self-hosted_engine_guide/chap-deploying_self-hosted_engine#Deploying_Self-Hosted_Engine_on_RHEL, which someone on the list pointed me to last week as being more up to date than what is on the website (the docs on the website don't seem to mention that you need to put the HE on its own datastore, and they look to be geared more towards a bare-metal engine than the self-hosted VM option).

When I went back into the GUI and looked at the hosted engine config, the second NIC was listed there, but it wasn't showing up in lspci on the VM. I removed the NIC in the GUI and re-added it, and the device appeared again on the VM. What is the proper way to "save" the state of the VM so that the OVF_STORE gets updated? When I do anything on the HE VM that I want to test I just type "reboot", but that powers down the VM. I then log in to my host and run "hosted-engine --vm-start", which restarts it, but of course the last time I did that it restarted without the second NIC.
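
For reference, this is roughly the sequence I could run from the host instead of typing "reboot" inside the VM. I'm assuming global maintenance keeps the HA agent from interfering with the manual shutdown, and I haven't confirmed whether any of this is what actually gets the OVF_STORE flushed:

# Put the cluster in global maintenance so the HA agent doesn't fight the restart:
hosted-engine --set-maintenance --mode=global
# Cleanly shut down and restart the engine VM from the host:
hosted-engine --vm-shutdown
hosted-engine --vm-start
# Leave maintenance and check the HA state afterwards:
hosted-engine --set-maintenance --mode=none
hosted-engine --vm-status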


    The steps you took to install the hosted engine with regards to
    networking look good to me, but I believe Sandro (CC'ed) would
    be able to give more advice.

    Sandro, since we want to configure bonding, would you recommend
    to install the engine physically first, move it to a VM,
    according to the following method, and only then reconfigure
    networking?
    
https://www.ovirt.org/documentation/self-hosted/chap-Migrating_from_Bare_Metal_to_an_EL-Based_Self-Hosted_Environment/



I don't see why a direct HE deployment couldn't be done. Simone, Martin, can you help here?



    Thank you,
    Ondra

    On Mon, Apr 10, 2017 at 8:51 AM, Charles Tassell
    <ctass...@gmail.com> wrote:

        Hi Everyone,

          Okay, I'm again having problems getting basic networking
        set up with oVirt 4.1.  Here is my situation.  I have two
        servers I want to use to create an oVirt cluster, with two
        different networks.  My "public" network is a 1G link on
        device em1 connected to my Internet feed, and my "storage"
        network is a 10G link on device p5p1 connected to my file
        server.  Since I need to connect to my storage network in
        order to do the install, I selected p5p1 as the ovirtmgmt
        interface when installing the hosted engine.  That worked
        fine and I got everything installed, so I used some
        ssh-proxy magic to connect to the web console and completed
        the install (set up a storage domain, created a new network
        vmNet for VM networking, and added em1 to it).
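
        (The "ssh-proxy magic" was just a local port forward through
        the host that can reach the engine, something like the sketch
        below; the engine hostname and port choice here are
        placeholders, not my actual setup:)

        # Forward a local port to the engine's web UI through the host,
        # then browse to https://localhost:8443
        ssh -L 8443:engine.example.local:443 root@ovirt730-01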

          The problem was that when I added a second network device
        to the HostedEngine VM (so that I could connect to it from
        my public network) it would intermittently go down.  I did
        some digging and found IPv6 errors in dmesg (IPv6: eth1:
        IPv6 duplicate address 2001:410:e000:902:21a:4aff:fe16:151
        detected!) so I disabled IPv6 on both eth0 and eth1 in the
        HostedEngine and rebooted it.  The problem is that when I
        restarted the VM, the eth1 device was missing.

          So, my question is: Can I add a second NIC to the
        HostedEngine VM and make it stick, or will it be deleted
whenever the engine VM is restarted?

When you change something in the HE VM using the web UI, it also has to be saved to the OVF_STORE for it to persist across reboots.
Martin, can you please elaborate here?


        Is there a better way to do what I'm trying to do, i.e.,
        should I set up ovirtmgmt on the public em1 interface and
        then create the "storage" network after the fact for
        connecting to the datastores and such?  Is that even
        possible, or required?  I was thinking that it would be
        better for migrations and other management functions to
        happen on the faster 10G network, but if the HostedEngine
        doesn't need to be able to connect to the storage network,
        maybe it's not worth the effort?

          Eventually I want to set up LACP on the storage network,
        but I had to wipe the servers and reinstall from scratch the
        last time I tried to set that up.  I was thinking that was
        because I set up the bonding before installing oVirt, so I
        didn't do that this time.
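
          For reference, the kind of 802.3ad config I have in mind
        for the storage link looks roughly like this (just a sketch,
        untested here; I realize oVirt/VDSM may want to create the
        bond itself through the engine UI instead of me editing the
        files by hand):

        ifcfg-bond0:
        ----------------
        DEVICE=bond0
        # LACP (802.3ad) with link monitoring
        BONDING_OPTS="mode=802.3ad miimon=100"
        BOOTPROTO=none
        ONBOOT=yes

        ifcfg-p5p1:
        ----------------
        DEVICE=p5p1
        # Enslave the 10G port to bond0
        MASTER=bond0
        SLAVE=yes
        BOOTPROTO=none
        ONBOOT=yes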

          Here are my /etc/sysconfig/network-scripts/ifcfg-* files
        in case I did something wrong there (I'm more familiar with
        Debian/Ubuntu network setup than with CentOS).

        ifcfg-eth0: (ovirtmgmt aka storage)
        ----------------
        BROADCAST=192.168.130.255
        NETMASK=255.255.255.0
        BOOTPROTO=static
        DEVICE=eth0
        IPADDR=192.168.130.179
        ONBOOT=yes
        DOMAIN=public.net
        ZONE=public
        IPV6INIT=no


        ifcfg-eth1: (vmNet aka Internet)
        ----------------
        BROADCAST=192.168.1.255
        NETMASK=255.255.255.0
        BOOTPROTO=static
        DEVICE=eth1
        IPADDR=192.168.1.179
        GATEWAY=192.168.1.254
        ONBOOT=yes
        DNS1=192.168.1.1
        DNS2=192.168.1.2
        DOMAIN=public.net
        ZONE=public
        IPV6INIT=no






--

SANDRO BONAZZOLA

ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D

Red Hat EMEA




