I also collected iotop output from the same run: http://paste.ubuntu.com/26502363/
The storage setup on these nodes is writethrough bcache with a 400 GB NVMe in front of a 1 TB spinning disk. Since it's writethrough, writes have to make it to the spinning disk before being counted as synced. The write numbers look high for random I/O on a spinning disk. It seems possible that the slow MAAS performance is due to PostgreSQL waiting for writes to disk to complete, and MAAS threads blocking on that, so that servicing DB reads is blocked on the commits completing first.

The VMs running on the machine are using this same bcache setup for their storage pool. It looks like most of the disk write traffic is coming from the VMs.

Based on this data we'll make two changes to our setup, which I think should help alleviate this problem:

- move the VMs' storage to a separate disk.
- change the storage setup to use writeback bcache.

** Attachment added: "iotop.txt.gz"
   https://bugs.launchpad.net/maas/+bug/1743249/+attachment/5047065/+files/iotop.txt.gz

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1743249

Title:
  Failed Deployment after timeout trying to retrieve grub cfg

To manage notifications about this bug go to:
https://bugs.launchpad.net/maas/+bug/1743249/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
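
For reference, the switch from writethrough to writeback could be checked and applied through bcache's sysfs interface. The following is only a minimal sketch: the device name bcache0 is an assumption (the actual device on these nodes isn't given above), and the helper names are hypothetical. It relies on the kernel reporting all cache modes in one line, with the active one in square brackets.

```python
from pathlib import Path

# Assumption: the bcache device is bcache0; adjust for the actual node.
CACHE_MODE_PATH = Path("/sys/block/bcache0/bcache/cache_mode")


def active_cache_mode(raw: str) -> str:
    """Return the mode marked with [brackets] in the sysfs cache_mode text."""
    for token in raw.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode marked in {raw!r}")


def set_cache_mode(mode: str, path: Path = CACHE_MODE_PATH) -> None:
    """Write the requested mode to sysfs (hypothetical helper; needs root)."""
    path.write_text(mode + "\n")
```

On a node, active_cache_mode(CACHE_MODE_PATH.read_text()) would report the current mode, and set_cache_mode("writeback") would request the change; reading the file again afterwards confirms whether it took effect.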