I also collected iotop output from the same run:
http://paste.ubuntu.com/26502363/

The storage setup on these nodes is writethrough bcache with a 400 GB
NVMe device in front of a 1 TB spinning disk. Since it's writethrough,
writes have to reach the spinning disk before they are counted as synced.

The write numbers look high for random I/O on a spinning disk. It seems
possible that the slow MAAS performance is due to PostgreSQL waiting for
writes to reach disk, with MAAS threads blocking on those commits, so
that DB reads can't be serviced until the commits complete.
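To put "high for random I/O" in perspective, here is a rough service-time estimate for a single spinning disk: each random I/O costs an average seek plus, on average, half a revolution of rotational latency. The 7200 RPM and 8.5 ms seek figures are assumptions (typical for a nearline drive), not measured specs of these nodes, and transfer time is ignored.

```python
# Back-of-envelope random IOPS for one spinning disk.
# Assumed figures, not measured on these nodes: 7200 RPM, 8.5 ms average seek.
# Transfer time for small random I/Os is ignored.


def random_iops(rpm: float, avg_seek_ms: float) -> float:
    """Estimate random IOPS as 1 / (average seek + average rotational latency)."""
    # On average the head waits half a revolution for the target sector.
    rotational_ms = 0.5 * 60_000.0 / rpm
    service_ms = avg_seek_ms + rotational_ms
    return 1000.0 / service_ms


if __name__ == "__main__":
    iops = random_iops(rpm=7200, avg_seek_ms=8.5)
    print(f"~{iops:.0f} random IOPS")
```

With those assumed numbers this prints about 79 IOPS; at 4 KiB per write that is only around 316 KiB/s, so sustained random write rates well above that have to wait on the disk in writethrough mode.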

The VMs running on the machine are using this same bcache setup for
their storage pool.  It looks like most of the disk write traffic is
coming from the VMs.
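One way to check which processes dominate the writes is to tally the DISK WRITE column of the iotop log per process group. The sketch below assumes the default `iotop -b` batch layout (TID, PRIO, USER, DISK READ, DISK WRITE, SWAPIN, IO>, COMMAND, with rates printed as a number plus `B/s`, `K/s`, `M/s`, ...) and no `-t` timestamp prefix. The `classify` prefixes (`qemu`, `postgres`) are hypothetical and should be adjusted to the real process names.

```python
import re
from collections import defaultdict

# Assumed `iotop -b` row layout:
#   TID PRIO USER DISK-READ DISK-WRITE SWAPIN IO> COMMAND
ROW = re.compile(
    r"^\s*\d+\s+\S+\s+\S+\s+"       # TID, PRIO, USER
    r"[\d.]+\s*[BKMGT]/s\s+"        # DISK READ (ignored here)
    r"([\d.]+)\s*([BKMGT])/s\s+"    # DISK WRITE: value and unit
    r"(.*)$"                        # SWAPIN, IO>, COMMAND
)
# SWAPIN and IO> are two percentages, or "?unavailable?" when kernel
# delay accounting is disabled.
PERCENTS = re.compile(r"^(?:[\d.]+\s*%\s+[\d.]+\s*%|\?unavailable\?)\s*")
SCALE = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def classify(command: str) -> str:
    """Hypothetical grouping by command prefix."""
    if command.startswith("qemu"):
        return "VMs"
    if command.startswith("postgres"):
        return "PostgreSQL"
    return "other"


def write_totals(lines):
    """Sum DISK WRITE (bytes/s) per group across all sampled rows."""
    totals = defaultdict(float)
    for line in lines:
        m = ROW.match(line)
        if not m:
            continue  # header, "Total DISK ..." summary, or blank line
        rate = float(m.group(1)) * SCALE[m.group(2)]
        command = PERCENTS.sub("", m.group(3))
        totals[classify(command)] += rate
    return dict(totals)
```

Because each iotop sample adds to the totals, the absolute numbers scale with the number of samples; the ratio between groups is what matters.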

Based on this data we'll make two changes to our setup, which I think
should help alleviate this problem:
- move the VMs' storage to a separate disk.
- change the storage setup to use writeback bcache.
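For the second change, bcache exposes each device's cache mode in sysfs at `/sys/block/<dev>/bcache/cache_mode`. Reading it lists the available modes with the active one in brackets, and writing a mode name (as root) switches it. A minimal sketch, assuming the device is named `bcache0` (a placeholder; use the actual device name on each node):

```python
from pathlib import Path


def cache_mode_path(dev: str = "bcache0") -> Path:
    """sysfs file holding the cache mode; "bcache0" is a placeholder name."""
    return Path("/sys/block") / dev / "bcache" / "cache_mode"


def parse_cache_mode(text: str) -> str:
    """Return the active mode, i.e. the entry wrapped in [brackets]."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode marked in {text!r}")


def set_cache_mode(mode: str, dev: str = "bcache0") -> str:
    """Write a new mode (requires root) and return the mode now reported."""
    path = cache_mode_path(dev)
    path.write_text(mode + "\n")
    return parse_cache_mode(path.read_text())


if __name__ == "__main__":
    path = cache_mode_path()
    if path.exists():
        print("current mode:", parse_cache_mode(path.read_text()))
```

Calling `set_cache_mode("writeback")` should report `writeback` afterwards. The trade-off is that in writeback mode, writes are acknowledged once they are on the NVMe cache, so dirty data not yet flushed to the spinning disk exists only on the cache device.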

** Attachment added: "iotop.txt.gz"
   https://bugs.launchpad.net/maas/+bug/1743249/+attachment/5047065/+files/iotop.txt.gz

-- 
https://bugs.launchpad.net/bugs/1743249

Title:
  Failed Deployment after timeout trying to retrieve grub cfg
