Default Comment by Bridge

** Attachment added: "New dmesg output captured Aug 23"
   
https://bugs.launchpad.net/bugs/1717224/+attachment/5006612/+files/dmesg_082317

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to procps in Ubuntu.
https://bugs.launchpad.net/bugs/1717224

Title:
  virsh start of virtual guest domain fails with internal error due to
  low default aio-max-nr sysctl value

Status in Ubuntu on IBM z Systems:
  In Progress
Status in kvm package in Ubuntu:
  Confirmed
Status in linux package in Ubuntu:
  In Progress
Status in procps package in Ubuntu:
  New
Status in kvm source package in Xenial:
  New
Status in linux source package in Xenial:
  In Progress
Status in procps source package in Xenial:
  New
Status in kvm source package in Zesty:
  New
Status in linux source package in Zesty:
  In Progress
Status in procps source package in Zesty:
  New
Status in kvm source package in Artful:
  Confirmed
Status in linux source package in Artful:
  In Progress
Status in procps source package in Artful:
  New

Bug description:
  Starting virtual guests with virsh on Ubuntu 16.04.2 LTS, installed with
  its KVM hypervisor in an LPAR on an IBM z14 system, fails on the 18th
  guest with the following error:

  root@zm93k8:/rawimages/ubu1604qcow2# virsh start zs93kag70038
  error: Failed to start domain zs93kag70038
  error: internal error: process exited while connecting to monitor:
  2017-07-26T01:48:26.352534Z qemu-kvm: -drive
  file=/guestimages/data1/zs93kag70038.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none,aio=native:
  Could not open backing file: Could not set AIO state: Inappropriate ioctl for device

  The previous 17 guests started fine:

  root@zm93k8# virsh start zs93kag70020
  Domain zs93kag70020 started

  root@zm93k8# virsh start zs93kag70021
  Domain zs93kag70021 started

  .
  .

  root@zm93k8:/rawimages/ubu1604qcow2# virsh start zs93kag70036
  Domain zs93kag70036 started

  
  We ended up fixing the issue by adding the following line to /etc/sysctl.conf:

  fs.aio-max-nr = 4194304

  ... then reloaded the sysctl config file:

  root@zm93k8:/etc# sysctl -p /etc/sysctl.conf
  fs.aio-max-nr = 4194304

  
  Now, we're able to start more guests...

  root@zm93k8:/etc# virsh start zs93kag70036
  Domain zs93kag70036 started
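  Editing /etc/sysctl.conf works; a drop-in file under /etc/sysctl.d/ is
  another way to persist the setting while keeping it separate from the
  stock file. A minimal bash sketch of that approach, assuming a
  hypothetical file name (60-aio-max-nr.conf) and the 4194304 value chosen
  above. It only changes the live system when run as root with --apply:

```shell
#!/usr/bin/env bash
# Sketch: persist fs.aio-max-nr in a sysctl.d drop-in, then load it.
# Assumption: the file name 60-aio-max-nr.conf is hypothetical; any *.conf
# file under /etc/sysctl.d/ is read at boot by systemd-sysctl on 16.04.
set -euo pipefail

# Write the drop-in into directory $1 (default /etc/sysctl.d) with value $2
# (default 4194304, the zKVM default quoted in this report); print its path.
write_aio_dropin() {
  local dir="${1:-/etc/sysctl.d}"
  local value="${2:-4194304}"
  local file="$dir/60-aio-max-nr.conf"
  printf 'fs.aio-max-nr = %s\n' "$value" > "$file"
  printf '%s\n' "$file"
}

# Only touch the real system when explicitly asked (requires root).
if [[ "${1:-}" == "--apply" ]]; then
  dropin="$(write_aio_dropin /etc/sysctl.d "${2:-4194304}")"
  # Load just this file; 'sysctl --system' would reload every sysctl.d file.
  sysctl -p "$dropin"
fi
```

  For example, saved as aio-dropin.sh: sudo bash aio-dropin.sh --apply 4194304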

  
  The default value was originally set to 65536:

  root@zm93k8:/rawimages/ubu1604qcow2# cat /proc/sys/fs/aio-max-nr
  65536

  
  Note: we chose the 4194304 value because this is what our KVM on System Z
  hypervisor ships as its default value. For example, on our zKVM system:

  [root@zs93ka ~]# cat /proc/sys/fs/aio-max-nr
  4194304
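  fs.aio-max-nr is only the limit. The kernel also reports fs.aio-nr, the
  running total of AIO events reserved by all currently active io_setup()
  contexts; per the kernel's sysctl documentation, io_setup() fails with
  EAGAIN once aio-nr reaches aio-max-nr, which is presumably what stops the
  18th guest's aio=native disks here. Comparing the two values shows how
  close a host is to that point. A minimal bash sketch, assuming the
  standard /proc/sys/fs location (the directory argument exists only so the
  function can be exercised against test files):

```shell
#!/usr/bin/env bash
# Sketch: show how much of the system-wide AIO event budget is in use.
#   aio-nr     = events currently reserved by active io_setup() contexts
#   aio-max-nr = the limit; io_setup() fails with EAGAIN when it is reached
set -euo pipefail

# $1 = directory holding the two files (default /proc/sys/fs).
aio_headroom() {
  local procdir="${1:-/proc/sys/fs}"
  local used max
  used="$(< "$procdir/aio-nr")"
  max="$(< "$procdir/aio-max-nr")"
  (( max > 0 )) || { echo "aio-max-nr is 0" >&2; return 1; }
  # Percentage is integer arithmetic, rounded down.
  printf 'aio-nr=%s aio-max-nr=%s free=%s used=%s%%\n' \
    "$used" "$max" "$(( max - used ))" "$(( used * 100 / max ))"
}

# Example: aio_headroom   (run after each 'virsh start' to watch usage grow)
```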

  ubuntu@zm93k8:/etc$ lsb_release -a
  No LSB modules are available.
  Distributor ID: Ubuntu
  Description:    Ubuntu 16.04.2 LTS
  Release:        16.04
  Codename:       xenial
  ubuntu@zm93k8:/etc$

  ubuntu@zm93k8:/etc$ dpkg -s qemu-kvm |grep Version
  Version: 1:2.5+dfsg-5ubuntu10.8

  Is there already documentation warning Ubuntu KVM users about the low
  default value, with some guidance on how to select an appropriate value?
  Also, would you consider increasing the default aio-max-nr value to
  something much higher, to accommodate significantly more virtual guests?
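  One rough way to pick a value, offered only as an estimate: while the
  guests that do start are running, divide fs.aio-nr by the number of
  running guests to get an approximate per-guest cost, then multiply by the
  number of guests the host should support. This assumes all guests are
  configured alike (as the XML later in this report suggests) and that
  nothing else on the host uses much native AIO. The function name and the
  25% default headroom are hypothetical choices, not values from this report:

```shell
#!/usr/bin/env bash
# Sketch: estimate an fs.aio-max-nr value from observed per-guest usage.
# Assumptions (not from the bug report): guests are configured alike, other
# AIO users on the host are negligible, and the headroom percent is arbitrary.
set -euo pipefail

# $1 = current fs.aio-nr, $2 = guests running now, $3 = guests wanted,
# $4 = extra headroom in percent (default 25, a hypothetical choice).
suggest_aio_max_nr() {
  local aio_nr="$1" running="$2" target="$3" headroom="${4:-25}"
  if (( running <= 0 )); then
    echo "need at least one running guest to measure" >&2
    return 1
  fi
  # Round the per-guest cost up so the estimate errs on the high side.
  local per_guest=$(( (aio_nr + running - 1) / running ))
  local needed=$(( per_guest * target ))
  echo $(( needed * (100 + headroom) / 100 ))
}
```

  On a libvirt host the inputs could come from, for example,
  suggest_aio_max_nr "$(cat /proc/sys/fs/aio-nr)" "$(virsh list --state-running --name | grep -c .)" 100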

  Thanks!

  ---uname output---
  ubuntu@zm93k8:/etc$ uname -a
  Linux zm93k8 4.4.0-62-generic #83-Ubuntu SMP Wed Jan 18 14:12:54 UTC 2017 s390x s390x s390x GNU/Linux
   
  Machine Type = z14 
   
  ---Debugger---
  A debugger is not configured
   
  ---Steps to Reproduce---
   See Problem Description.

  The problem was happening a week ago, so this may not reflect that
  activity.

  This file was collected on Aug 7, one week after we were hitting the
  problem.  If I need to reproduce the problem and get fresh data,
  please let me know.

  /var/log/messages doesn't exist on this system, so I provided syslog
  output instead.

  All of this data was collected more than a week after the problem was
  observed. If you need me to reproduce the problem and collect new data,
  please let me know. That's not a problem.

  Also, we would have to make special arrangements for login access to
  these systems. I'm happy to run traces and collect data for you as
  needed. If that's not sufficient, we'll explore login access for you.

  Thanks...   - Scott G.

  
  I was able to recreate the problem, and have captured and attached new
  debug data.

  Recreate procedure:

  #  Started out with no virtual guests running.

  ubuntu@zm93k8:/home/scottg$ virsh list
   Id    Name                           State
  ----------------------------------------------------

  
  # Set fs.aio-max-nr back to the original Ubuntu "out of the box" value in /etc/sysctl.conf

  ubuntu@zm93k8:~$ tail -1 /etc/sysctl.conf
  fs.aio-max-nr = 65536

  
  ## Before the reload, sysctl -a still shows:

  fs.aio-max-nr = 4194304

  
  ##  Reload sysctl.

  ubuntu@zm93k8:~$ sudo sysctl -p /etc/sysctl.conf
  fs.aio-max-nr = 65536
  ubuntu@zm93k8:~$

  ubuntu@zm93k8:~$ sudo sysctl -a |grep fs.aio-max-nr
  fs.aio-max-nr = 65536

  ubuntu@zm93k8:~$  cat /proc/sys/fs/aio-max-nr
  65536
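  The step above shows that editing /etc/sysctl.conf does not change the
  running value by itself: sysctl -a still reported 4194304 until the file
  was reloaded with sysctl -p. A minimal bash sketch that compares the last
  fs.aio-max-nr assignment in a config file with the live value. The
  function name is hypothetical, and it only understands the dotted
  "key = value" form used in this report (no slash-separated keys, no
  trailing comments):

```shell
#!/usr/bin/env bash
# Sketch: detect when the configured fs.aio-max-nr differs from the live value.
set -euo pipefail

# $1 = sysctl config file (default /etc/sysctl.conf),
# $2 = directory holding aio-max-nr (default /proc/sys/fs).
# Prints both values; returns 1 if they differ or the key is not set.
aio_config_drift() {
  local conf="${1:-/etc/sysctl.conf}" procdir="${2:-/proc/sys/fs}"
  local configured live
  # Later lines override earlier ones when the file is loaded, so keep the last.
  configured="$(awk -F'=' '/^[ \t]*fs\.aio-max-nr[ \t]*=/ { v = $2; gsub(/[ \t]/, "", v) } END { print v }' "$conf")"
  live="$(< "$procdir/aio-max-nr")"
  echo "configured=${configured:-unset} live=$live"
  [[ -n "$configured" && "$configured" == "$live" ]]
}
```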


  # Attempt to start more than 17 qcow2 virtual guests on the Ubuntu host.
  # It fails on the 18th guest.

  Script used to start the guests:

  
  ubuntu@zm93k8:/home/scottg$ date;./start_privs.sh
  Wed Aug 23 13:21:25 EDT 2017
  virsh start zs93kag70015
  Domain zs93kag70015 started

  Started zs93kag70015 succesfully ...

  virsh start zs93kag70020
  Domain zs93kag70020 started

  Started zs93kag70020 succesfully ...

  virsh start zs93kag70021
  Domain zs93kag70021 started

  Started zs93kag70021 succesfully ...

  virsh start zs93kag70022
  Domain zs93kag70022 started

  Started zs93kag70022 succesfully ...

  virsh start zs93kag70023
  Domain zs93kag70023 started

  Started zs93kag70023 succesfully ...

  virsh start zs93kag70024
  Domain zs93kag70024 started

  Started zs93kag70024 succesfully ...

  virsh start zs93kag70025
  Domain zs93kag70025 started

  Started zs93kag70025 succesfully ...

  virsh start zs93kag70026
  Domain zs93kag70026 started

  Started zs93kag70026 succesfully ...

  virsh start zs93kag70027
  Domain zs93kag70027 started

  Started zs93kag70027 succesfully ...

  virsh start zs93kag70028
  Domain zs93kag70028 started

  Started zs93kag70028 succesfully ...

  virsh start zs93kag70029
  Domain zs93kag70029 started

  Started zs93kag70029 succesfully ...

  virsh start zs93kag70030
  Domain zs93kag70030 started

  Started zs93kag70030 succesfully ...

  virsh start zs93kag70031
  Domain zs93kag70031 started

  Started zs93kag70031 succesfully ...

  virsh start zs93kag70032
  Domain zs93kag70032 started

  Started zs93kag70032 succesfully ...

  virsh start zs93kag70033
  Domain zs93kag70033 started

  Started zs93kag70033 succesfully ...

  virsh start zs93kag70034
  Domain zs93kag70034 started

  Started zs93kag70034 succesfully ...

  virsh start zs93kag70035
  Domain zs93kag70035 started

  Started zs93kag70035 succesfully ...

  virsh start zs93kag70036
  error: Failed to start domain zs93kag70036
  error: internal error: process exited while connecting to monitor:
  2017-08-23T17:21:47.131809Z qemu-kvm: -drive
  file=/guestimages/data1/zs93kag70036.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none,aio=native:
  Could not open backing file: Could not set AIO state: Inappropriate ioctl for device

  Exiting script ... start zs93kag70036 failed
  ubuntu@zm93k8:/home/scottg$
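  The contents of start_privs.sh are not included in this report. Below is a
  hypothetical bash reconstruction that prints the same kind of messages,
  stops at the first failure, and additionally logs fs.aio-nr after each
  successful start, which shows how much of the AIO budget each guest takes.
  The VIRSH and AIO_PROCDIR overrides are assumptions added only so the loop
  can be exercised without libvirt:

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the original start_privs.sh: start guests in order,
# stop at the first failure, and log fs.aio-nr after every successful start.
set -euo pipefail

# VIRSH and AIO_PROCDIR are illustrative overrides (defaults: virsh, /proc/sys/fs).
start_guests() {
  local virsh="${VIRSH:-virsh}" procdir="${AIO_PROCDIR:-/proc/sys/fs}" guest
  for guest in "$@"; do
    echo "virsh start $guest"
    if ! "$virsh" start "$guest"; then
      echo "Exiting script ... start $guest failed"
      return 1
    fi
    echo "Started $guest successfully (aio-nr now $(< "$procdir/aio-nr")) ..."
  done
}

# Example (requires libvirt): start_guests zs93kag70015 zs93kag70020 zs93kag70021
```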

  
  # Show that there are only 17 running guests. 

  ubuntu@zm93k8:/home/scottg$ virsh list |grep run |wc -l
  17

  ubuntu@zm93k8:/home/scottg$ virsh list
   Id    Name                           State
  ----------------------------------------------------
   25    zs93kag70015                   running
   26    zs93kag70020                   running
   27    zs93kag70021                   running
   28    zs93kag70022                   running
   29    zs93kag70023                   running
   30    zs93kag70024                   running
   31    zs93kag70025                   running
   32    zs93kag70026                   running
   33    zs93kag70027                   running
   34    zs93kag70028                   running
   35    zs93kag70029                   running
   36    zs93kag70030                   running
   37    zs93kag70031                   running
   38    zs93kag70032                   running
   39    zs93kag70033                   running
   40    zs93kag70034                   running
   41    zs93kag70035                   running


  # For fun, try starting zs93kag70036 again manually.

  ubuntu@zm93k8:/home/scottg$ date;virsh start zs93kag70036
  Wed Aug 23 13:27:28 EDT 2017
  error: Failed to start domain zs93kag70036
  error: internal error: process exited while connecting to monitor:
  2017-08-23T17:27:30.031782Z qemu-kvm: -drive
  file=/guestimages/data1/zs93kag70036.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none,aio=native:
  Could not open backing file: Could not set AIO state: Inappropriate ioctl for device


  # Show the XML (they're all basically the same)...

  ubuntu@zm93k8:/home/scottg$ cat zs93kag70036.xml
  <domain type='kvm'>
    <name>zs93kag70036</name>
    <memory unit='MiB'>4096</memory>
    <currentMemory unit='MiB'>2048</currentMemory>
    <vcpu placement='static'>2</vcpu>
    <os>
      <type arch='s390x' machine='s390-ccw-virtio'>hvm</type>
    </os>
    <clock offset='utc'/>
    <on_poweroff>destroy</on_poweroff>
    <on_reboot>restart</on_reboot>
    <on_crash>preserve</on_crash>
    <devices>
      <emulator>/usr/bin/qemu-kvm</emulator>
      <disk type='file' device='disk'>
        <driver name ='qemu' type='qcow2' cache='none' io='native'/>
        <source file='/guestimages/data1/zs93kag70036.qcow2'/>
        <target dev='vda' bus='virtio'/>
        <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0000'/>
        <boot order='1'/>
      </disk>
      <interface type='network'>
        <source network='privnet1'/>
        <model type='virtio'/>
        <mac address='52:54:00:70:d0:36'/>
        <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0001'/>
      </interface>
  <!--
      <disk type='block' device='disk'>
        <driver name ='qemu' type='raw' cache='none'/>
        <source dev='/dev/disk/by-id/dm-uuid-mpath-36005076802810e5540000000000006e4'/>
        <target dev='vde' bus='virtio'/>
        <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0005'/>
        <readonly/>
      </disk>
  -->
      <disk type='file' device='disk'>
        <driver name ='qemu' type='raw' cache='none' io='native'/>
        <source file='/guestimages/data1/zs93kag70036.prm'/>
        <target dev='vdf' bus='virtio'/>
        <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0006'/>
      </disk>
      <disk type='file' device='cdrom'>
        <driver name='qemu' type='raw'/>
        <source file='/guestimages/data1/zs93kag70036.iso'/>
        <target dev='sda' bus='scsi'/>
        <readonly/>
        <address type='drive' controller='0' bus='0' target='0' unit='0'/>
      </disk>
      <controller type='usb' index='0' model='none'/>
      <memballoon model='none'/>
      <console type='pty'>
        <target type='sclp' port='0'/>
      </console>
    </devices>
  </domain>

  
  This condition is very easy to replicate. However, we may be losing this
  system in the next day or two, so please let me know ASAP if you need any
  more data. Thank you...

  - Scott G.

  == Comment: #11 - Viktor Mihajlovski <mihaj...@de.ibm.com> - 2017-09-14 
  In order to support many KVM guests, it is advisable to raise aio-max-nr
  as suggested in the problem description; see also
  http://kvmonz.blogspot.co.uk/p/blog-page_7.html. I would also suggest
  that the system default setting be increased.
