Hello folks,

We observed that an Arm VM instance (CentOS 8) runs into a timeout at boot time
when it is launched by QEMU/KVM on an Arm server host that has hugepages
(2MB/1GB) enabled.
We also noticed that Will has a workaround that applies to the guest kernel
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0e1645557d19fc6d88d3c40431f63a3c3a4c417b)
However, it simply increases the timeout from 1s to 5s, and it has not been
accepted upstream so far. Moreover, even if it is accepted upstream, it will
take a long time for distributions like CentOS to backport it.

We are wondering whether we can fix this problem on the host side instead.
Most cloud providers can upgrade their software faster than distributions can,
so eventually there would be no need to upgrade the guest OS.
We have investigated the root cause for a few weeks. Our limited understanding
is that the problem may be related to page locking contention when the guest
kernel allocates per-cpu memory for each CPU at boot time.

However, we are not sure about the root cause and are hoping you could share
your expertise on this. We would really appreciate any insights!


Steps to reproduce:
===================
- Enable 1GB hugepages on the host kernel side.
- Start the VM using the following command (QEMU must support passing the CPU
topology through ACPI & DT):
# qemu-system-aarch64 -name guest=12345,debug-threads=on \
-machine virt,accel=kvm,usb=off,dump-guest-core=off,gic-version=3 \
-cpu host \
-m 122880 \
-object iothread,id=iothread1 \
-object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages,share=yes,size=64424509440,host-nodes=0,policy=preferred \
-numa node,nodeid=0,cpus=0-29,memdev=ram-node0 \
-object memory-backend-file,id=ram-node1,prealloc=yes,mem-path=/dev/hugepages,share=yes,size=64424509440,host-nodes=1,policy=preferred \
-numa node,nodeid=1,cpus=30-59,memdev=ram-node1 \
-smp 60,sockets=2,cores=30,threads=1 \
-bios /usr/share/AAVMF/AAVMF_CODE.fd \
-drive file=centos8.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none \
-device virtio-blk-pci,scsi=off,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
-net none -serial telnet::9001,server,nowait -monitor stdio
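
As a sanity check on the command above: each memory-backend-file object asks for 64424509440 bytes, so each host NUMA node presumably needs at least that much free hugepage memory before QEMU can preallocate it. A minimal Python sketch of that arithmetic follows. The helper name hugepages_needed is hypothetical, and the sysfs path in the comment is the usual per-node location on Linux hosts, which is assumed here rather than confirmed for this machine.

```python
GIB = 1 << 30  # size of one 1GB hugepage, in bytes


def hugepages_needed(backend_bytes, page_bytes=GIB):
    """Number of hugepages required to back a region, rounded up."""
    return -(-backend_bytes // page_bytes)


# Per-node backend size taken from the -object memory-backend-file options.
per_node = hugepages_needed(64424509440)
print(per_node)      # pages to reserve on each of host nodes 0 and 1
print(2 * per_node)  # total across both nodes

# Assumed per-node reservation location (standard Linux sysfs layout):
#   /sys/devices/system/node/node<N>/hugepages/hugepages-1048576kB/nr_hugepages
```

With 1GB pages this works out to 60 pages per node, which matches the -m 122880 (120 GiB) guest size split across two nodes.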

Guest kernel boot timeout log sample:
===============================

[ 0.017636] smp: Bringing up secondary CPUs ...
[ 1.039073] CPU1: failed to come online
[ 1.039486] CPU1: failed in unknown state : 0x0
[ 1.052547] Detected VIPT I-cache on CPU2
[ 1.052563] GICv3: CPU2: found redistributor 2 region 0:0x00000000080e0000
[ 1.052708] GICv3: CPU2: using allocated LPI pending table @0x0000000f006a0000
[ 1.052729] CPU2: Booted secondary processor 0x0000000001 [0x481fd010]
[ 2.079846] CPU3: failed to come online
[ 2.082698] CPU3: failed in unknown state : 0x0
[ 2.086925] Detected VIPT I-cache on CPU4
[ 2.086947] GICv3: CPU4: found redistributor 4 region 0:0x0000000008120000
[ 2.087232] GICv3: CPU4: using allocated LPI pending table @0x0000000f006c0000
[ 2.087259] CPU4: Booted secondary processor 0x0000000002 [0x481fd010]
[ 3.120634] CPU5: failed to come online
[ 3.121691] Detected VIPT I-cache on CPU5
[ 3.121754] CPU5: failed in unknown state : 0x0
[ 3.121774] GICv3: CPU5: found redistributor 5 region 0:0x0000000008140000
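
To count how many secondary CPUs fail in a log like the one above, a small parser can help. This is only a sketch: the function name failed_cpus and the regular expression are illustrative, written against the message format shown in this sample, and may need adjusting for other kernel versions.

```python
import re

# Matches lines such as "[ 1.039073] CPU1: failed to come online".
FAILED = re.compile(r"\[\s*(\d+\.\d+)\]\s+CPU(\d+): failed to come online")


def failed_cpus(log):
    """Return (cpu, timestamp) pairs for CPUs that timed out during bring-up."""
    return [(int(cpu), float(ts)) for ts, cpu in FAILED.findall(log)]


# A few lines copied from the boot log sample.
sample = """\
[ 0.017636] smp: Bringing up secondary CPUs ...
[ 1.039073] CPU1: failed to come online
[ 1.052729] CPU2: Booted secondary processor 0x0000000001 [0x481fd010]
[ 2.079846] CPU3: failed to come online
[ 3.120634] CPU5: failed to come online
"""

for cpu, ts in failed_cpus(sample):
    print(f"CPU{cpu} timed out at {ts:.6f}s")
```

In the sample, each failure is logged roughly one second after the previous bring-up attempt, which is consistent with the 1s timeout mentioned earlier.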


Linux perf output (host side) while the guest kernel is booting:
================================================================

-   19.75%  worker           [kernel.kallsyms]        [k] queued_spin_lock_slowpath
   - queued_spin_lock_slowpath
      - 12.98% kvm_mmu_notifier_invalidate_range_end
           __mmu_notifier_invalidate_range_end
           zap_page_range
           __arm64_sys_madvise
           el0_svc_common.constprop.3
           el0_svc_handler
           el0_svc
           __madvise
           tcmalloc::PageHeap::DecommitSpan
           tcmalloc::PageHeap::MergeIntoFreeList
           tcmalloc::PageHeap::Delete
           tcmalloc::CentralFreeList::ReleaseToSpans
           tcmalloc::CentralFreeList::ReleaseListToSpans
           tcmalloc::CentralFreeList::InsertRange
           tcmalloc::ThreadCache::ReleaseToCentralCache
           tcmalloc::ThreadCache::Cleanup
           tcmalloc::ThreadCache::DeleteCache
           __nptl_deallocate_tsd
           start_thread
           thread_start
      - 6.77% kvm_mmu_notifier_invalidate_range_start
           __mmu_notifier_invalidate_range_start
           zap_page_range
           __arm64_sys_madvise
           el0_svc_common.constprop.3
           el0_svc_handler
           el0_svc
           __madvise
           tcmalloc::PageHeap::DecommitSpan
           tcmalloc::PageHeap::MergeIntoFreeList
           tcmalloc::PageHeap::Delete
           tcmalloc::CentralFreeList::ReleaseToSpans
           tcmalloc::CentralFreeList::ReleaseListToSpans
           tcmalloc::CentralFreeList::InsertRange
           tcmalloc::ThreadCache::ReleaseToCentralCache
           tcmalloc::ThreadCache::Cleanup
           tcmalloc::ThreadCache::DeleteCache
           __nptl_deallocate_tsd
           start_thread
           thread_start
 

Thanks,
Chengdong
