On 20.04.22 22:44, Elliot Berman wrote:
From: Prakruthi Deepak Heragu <quic_pher...@quicinc.com>

During hotplug, the stolen time data structure is unmapped and memset.
There is a possibility of the timer IRQ being triggered before memset
and stolen time is getting updated as part of this timer IRQ handler. This
causes the below crash in timer handler -

   [ 3457.473139][    C5] Unable to handle kernel paging request at virtual 
address ffffffc03df05148
   ...
   [ 3458.154398][    C5] Call trace:
   [ 3458.157648][    C5]  para_steal_clock+0x30/0x50
   [ 3458.162319][    C5]  irqtime_account_process_tick+0x30/0x194
   [ 3458.168148][    C5]  account_process_tick+0x3c/0x280
   [ 3458.173274][    C5]  update_process_times+0x5c/0xf4
   [ 3458.178311][    C5]  tick_sched_timer+0x180/0x384
   [ 3458.183164][    C5]  __run_hrtimer+0x160/0x57c
   [ 3458.187744][    C5]  hrtimer_interrupt+0x258/0x684
   [ 3458.192698][    C5]  arch_timer_handler_virt+0x5c/0xa0
   [ 3458.198002][    C5]  handle_percpu_devid_irq+0xdc/0x414
   [ 3458.203385][    C5]  handle_domain_irq+0xa8/0x168
   [ 3458.208241][    C5]  gic_handle_irq.34493+0x54/0x244
   [ 3458.213359][    C5]  call_on_irq_stack+0x40/0x70
   [ 3458.218125][    C5]  do_interrupt_handler+0x60/0x9c
   [ 3458.223156][    C5]  el1_interrupt+0x34/0x64
   [ 3458.227560][    C5]  el1h_64_irq_handler+0x1c/0x2c
   [ 3458.232503][    C5]  el1h_64_irq+0x7c/0x80
   [ 3458.236736][    C5]  free_vmap_area_noflush+0x108/0x39c
   [ 3458.242126][    C5]  remove_vm_area+0xbc/0x118
   [ 3458.246714][    C5]  vm_remove_mappings+0x48/0x2a4
   [ 3458.251656][    C5]  __vunmap+0x154/0x278
   [ 3458.255796][    C5]  stolen_time_cpu_down_prepare+0xc0/0xd8
   [ 3458.261542][    C5]  cpuhp_invoke_callback+0x248/0xc34
   [ 3458.266842][    C5]  cpuhp_thread_fun+0x1c4/0x248
   [ 3458.271696][    C5]  smpboot_thread_fn+0x1b0/0x400
   [ 3458.276638][    C5]  kthread+0x17c/0x1e0
   [ 3458.280691][    C5]  ret_from_fork+0x10/0x20

As a fix, disable the IRQs during hotplug until we unmap and memset the
stolen time structure.

This will work for the call chain of your observed crash, but are
you sure that para_steal_clock() can't be called from another cpu
concurrently?

In case you verified this can't happen, you can add my:

Reviewed-by: Juergen Gross <jgr...@suse.com>

Otherwise you either need to use RCU for doing the memunmap(), or a
lock to protect stolen_time_region.


Juergen

Attachment: OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key

Attachment: OpenPGP_signature
Description: OpenPGP digital signature

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Reply via email to