Christian,

Let us move this discussion to the DGX2 tracker ticket, since we don't
want to share any DGX2/Nvidia-specific details in the generic ticket.

https://bugs.launchpad.net/nvidia-dgx-2/+bug/1818116

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1817998

Title:
  KVM Guest - DHCP lease lost (Ubuntu 18.04)

Status in systemd package in Ubuntu:
  Incomplete

Bug description:
  On an Nvidia DGX2 system, we configured a Linux bridge (br0) on the
  host's physical NIC, and it uses a static IP (see the netplan file
  below). We are using 18.04.2-based BaseOS and guest images.

  - All KVM guests are launched with a virtual network interface attached
    to br0. All VMs get a DHCP-assigned IP address, and the network
    interface works fine for a few hours (possibly up to 24 hours).
  - After that, the VMs lose their IP address, and the following message
    appears in the VM's syslog:
    "Feb 26 17:16:41 test-1g0 systemd-networkd[3479]: enp6s0: DHCP lease lost".
  - At this point, we tried to create new VMs using br0, and none of them
    got an IP address.
  - We then checked the KVM host and the status of the bridge, but did not
    see any errors. We tried to unconfigure br0 by removing the bridge
    configuration from the host netplan and running "sudo netplan apply",
    but br0 is still there. The bridge seems to be in a bad state, and we
    cannot unload its driver.

  Guest
  lab@dgx-server-vm:~$ ssh nvidia@192.168.123.138
  The authenticity of host '192.168.123.138 (192.168.123.138)' can't be established.
  ECDSA key fingerprint is SHA256:k8XpnGH7yle76z46CX16pflYVeYcKoG6kWCymIkv0kk.
  Are you sure you want to continue connecting (yes/no)? yes
  Warning: Permanently added '192.168.123.138' (ECDSA) to the list of known hosts.
  nvidia@192.168.123.138's password: 
   _   _       _     _ _         _            _        _        ___  
  | \ | |_   _(_) __| (_) __ _  | |_ ___  ___| |_     / | __ _ / _ \ 
  |  \| \ \ / / |/ _` | |/ _` | | __/ _ \/ __| __|____| |/ _` | | | |
  | |\  |\ V /| | (_| | | (_| | | ||  __/\__ \ ||_____| | (_| | |_| |
  |_| \_| \_/ |_|\__,_|_|\__,_|  \__\___||___/\__|    |_|\__, |\___/ 
                                                         |___/       

  Welcome to Ubuntu 18.04.2 LTS (4.15.0-45-generic)

  Welcome to NVIDIA DGX KVM VM Server Version 4.0.5 (GNU/Linux
  4.15.0-45-generic x86_64)

   * Documentation:  https://help.ubuntu.com
   * Management:     https://landscape.canonical.com
   * Support:        https://ubuntu.com/advantage
  System information as of: Wed Feb 27 12:20:21 PST 2019

  System load:  0.00                    IP Address:     
  Memory usage: 0.0% (59.36G avail)     System uptime:  21:04 hours
  Usage on /:   8% (44G free)           Swap usage:     0.0%
  Local Users:  1                       Processes:      158

  
    System information as of Wed Feb 27 12:20:22 PST 2019

    System load:  0.0               Processes:              155
    Usage of /:   6.7% of 48.96GB   Users logged in:        1
    Memory usage: 0%                IP address for enp1s0:  192.168.123.138
    Swap usage:   0%                IP address for docker0: 172.17.0.1

   * Canonical Livepatch is available for installation.
     - Reduce system reboots and improve kernel security. Activate at:
       https://ubuntu.com/livepatch

  15 packages can be updated.
  9 updates are security updates.

  Last login: Wed Feb 27 12:05:09 2019
  nvidia@test-1g0:~$ ifconfig
  docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
          inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
          ether 02:42:5c:b9:6f:94  txqueuelen 0  (Ethernet)
          RX packets 0  bytes 0 (0.0 B)
          RX errors 0  dropped 0  overruns 0  frame 0
          TX packets 0  bytes 0 (0.0 B)
          TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

  enp1s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
          inet 192.168.123.138  netmask 255.255.255.0  broadcast 192.168.123.255
          inet6 fe80::5054:ff:feb9:b8a1  prefixlen 64  scopeid 0x20<link>
          ether 52:54:00:b9:b8:a1  txqueuelen 1000  (Ethernet)
          RX packets 38879  bytes 2449778 (2.4 MB)
          RX errors 0  dropped 1  overruns 0  frame 0
          TX packets 977  bytes 132770 (132.7 KB)
          TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

  enp6s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
          inet6 fe80::5055:ff:fe78:faa9  prefixlen 64  scopeid 0x20<link>
          ether 52:55:00:78:fa:a9  txqueuelen 1000  (Ethernet)
          RX packets 93842  bytes 7637062 (7.6 MB)
          RX errors 0  dropped 27  overruns 0  frame 0
          TX packets 1874  bytes 442869 (442.8 KB)
          TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

  lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
          inet 127.0.0.1  netmask 255.0.0.0
          inet6 ::1  prefixlen 128  scopeid 0x10<host>
          loop  txqueuelen 1000  (Local Loopback)
          RX packets 562  bytes 52271 (52.2 KB)
          RX errors 0  dropped 0  overruns 0  frame 0
          TX packets 562  bytes 52271 (52.2 KB)
          TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

  nvidia@test-1g0:~$ uptime
   12:20:35 up 21:04,  2 users,  load average: 0.00, 0.00, 0.00
  nvidia@test-1g0:~$ date
  Wed Feb 27 12:20:44 PST 2019
  nvidia@test-1g0:~$ route
  Kernel IP routing table
  Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
  172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
  192.168.123.0   0.0.0.0         255.255.255.0   U     0      0        0 enp1s0
  nvidia@test-1g0:~$ dmesg | grep -i DHCP
  nvidia@test-1g0:~$ cat /var/log/syslog | grep -i dhcp
  Feb 26 15:15:21 test-1g0 systemd-networkd[569]: enp1s0: DHCPv4 address 192.168.123.138/24 via 192.168.123.1
  Feb 26 15:16:20 test-1g0 systemd-networkd[538]: enp1s0: DHCPv4 address 192.168.123.138/24 via 192.168.123.1
  Feb 26 15:16:20 test-1g0 systemd-networkd[538]: enp6s0: DHCPv4 address 172.18.232.32/25 via 172.18.232.1
  Feb 26 15:16:42 test-1g0 systemd-networkd[3479]: enp1s0: DHCPv4 address 192.168.123.138/24 via 192.168.123.1
  Feb 26 15:16:42 test-1g0 systemd-networkd[3479]: enp6s0: DHCPv4 address 172.18.232.32/25 via 172.18.232.1
  Feb 26 17:16:41 test-1g0 systemd-networkd[3479]: enp6s0: DHCP lease lost

  nvidia@test-1g0:~$ sudo networkctl status enp6s0
  [sudo] password for nvidia: 
  ● 3: enp6s0
         Link File: /lib/systemd/network/99-default.link
      Network File: /run/systemd/network/10-netplan-virtionetworks.network
              Type: ether
             State: degraded (configured)
              Path: pci-0000:06:00.0
            Driver: virtio_net
            Vendor: Red Hat, Inc.
             Model: Virtio network device
        HW Address: 52:55:00:78:fa:a9
           Address: fe80::5055:ff:fe78:faa9

  nvidia@test-1g0:~$ systemctl status  systemd-networkd.service 
  ● systemd-networkd.service - Network Service
     Loaded: loaded (/lib/systemd/system/systemd-networkd.service; enabled-runtime; vendor preset: enabled)
     Active: active (running) since Tue 2019-02-26 15:16:42 PST; 21h ago
       Docs: man:systemd-networkd.service(8)
   Main PID: 3479 (systemd-network)
     Status: "Processing requests..."
      Tasks: 1 (limit: 4915)
     CGroup: /system.slice/systemd-networkd.service
             └─3479 /lib/systemd/systemd-networkd

  Feb 26 15:16:42 test-1g0 systemd[1]: Started Network Service.
  Feb 26 15:16:42 test-1g0 systemd-networkd[3479]: lo: Link is not managed by us
  Feb 26 15:16:42 test-1g0 systemd-networkd[3479]: enp1s0: Link is not managed by us
  Feb 26 15:16:42 test-1g0 systemd-networkd[3479]: docker0: Link is not managed by us
  Feb 26 15:16:42 test-1g0 systemd-networkd[3479]: lo: Link is not managed by us
  Feb 26 15:16:42 test-1g0 systemd-networkd[3479]: docker0: Link is not managed by us
  Feb 26 15:16:42 test-1g0 systemd-networkd[3479]: enp1s0: DHCPv4 address 192.168.123.138/24 via 192.168.123.1
  Feb 26 15:16:42 test-1g0 systemd-networkd[3479]: enp6s0: DHCPv4 address 172.18.232.32/25 via 172.18.232.1
  Feb 26 15:16:42 test-1g0 systemd-networkd[3479]: enp6s0: Configured
  Feb 26 17:16:41 test-1g0 systemd-networkd[3479]: enp6s0: DHCP lease lost
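
  To measure how long each interface held its address before the "DHCP
  lease lost" message, the syslog lines above can be paired up with a
  small script. This is only a rough sketch: it assumes Python 3, an
  English locale for month names, one log entry per line (as in
  /var/log/syslog itself, not as wrapped by mail), and a caller-supplied
  year, since syslog timestamps omit it. The helper name lease_durations
  is hypothetical and is not part of systemd or netplan.

```python
import re
from datetime import datetime

# Matches the two systemd-networkd messages seen in the report:
#   "<iface>: DHCPv4 address <addr> via <gw>"  and  "<iface>: DHCP lease lost"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ systemd-networkd\[\d+\]: "
    r"(?P<iface>\S+): (?P<msg>DHCPv4 address \S+|DHCP lease lost)"
)


def lease_durations(lines, year=2019):
    """Return (iface, timedelta) pairs: time from the most recent
    'DHCPv4 address' line to the following 'DHCP lease lost' line."""
    last_acquired = {}  # iface -> datetime of latest address assignment
    results = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not one of the two messages we care about
        # Syslog timestamps carry no year, so the caller supplies one.
        when = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        iface = m["iface"]
        if m["msg"].startswith("DHCPv4 address"):
            last_acquired[iface] = when
        elif iface in last_acquired:
            results.append((iface, when - last_acquired.pop(iface)))
    return results


# Lines copied from the syslog excerpt in this report.
SAMPLE = [
    "Feb 26 15:16:20 test-1g0 systemd-networkd[538]: enp6s0: DHCPv4 address 172.18.232.32/25 via 172.18.232.1",
    "Feb 26 15:16:42 test-1g0 systemd-networkd[3479]: enp1s0: DHCPv4 address 192.168.123.138/24 via 192.168.123.1",
    "Feb 26 15:16:42 test-1g0 systemd-networkd[3479]: enp6s0: DHCPv4 address 172.18.232.32/25 via 172.18.232.1",
    "Feb 26 17:16:41 test-1g0 systemd-networkd[3479]: enp6s0: DHCP lease lost",
]

if __name__ == "__main__":
    for iface, held in lease_durations(SAMPLE):
        print(f"{iface}: address held for {held}")
```

  On the excerpt above this reports that enp6s0 held its address for
  1:59:59, i.e. almost exactly two hours. That might mean the loss
  coincides with a lease timer expiring rather than with a random event,
  but this is an inference from the timestamps; the logs shown here do
  not confirm it.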

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1817998/+subscriptions

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp
