Hi Alex,

I noticed the patch you posted at https://lkml.org/lkml/2019/2/18/1315
You said that the previous commit may be prone to deadlock. Could you please
share details on how to reproduce the deadlock, if you know them?
I hit a similar problem: all lspci commands went into D state and libvirtd
went into Z state when destroying a VM with GPU passthrough. The stacks look
like this:

2019-03-20T13:37:14.726514+07:00|err|kernel[-]|[2427373.553663] INFO: task ps:112058 blocked for more than 120 seconds.
2019-03-20T13:37:14.726576+07:00|err|kernel[-]|[2427373.553667] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2019-03-20T13:37:14.726599+07:00|info|kernel[-]|[2427373.553669] ps              D 0000000000000000     0 112058      1 0x00000004
2019-03-20T13:37:14.726620+07:00|warning|kernel[-]|[2427373.553673] Call Trace:
2019-03-20T13:37:14.726640+07:00|warning|kernel[-]|[2427373.553682]  [<ffffffff816b7069>] schedule_preempt_disabled+0x29/0x70
2019-03-20T13:37:14.726668+07:00|warning|kernel[-]|[2427373.553684]  [<ffffffff816b4a21>] __mutex_lock_slowpath+0xe1/0x170
2019-03-20T13:37:14.726689+07:00|warning|kernel[-]|[2427373.553689]  [<ffffffff816b400f>] mutex_lock+0x1f/0x2f
2019-03-20T13:37:14.726707+07:00|warning|kernel[-]|[2427373.553695]  [<ffffffff81379337>] pci_bus_save_and_disable+0x37/0x70
2019-03-20T13:37:14.726725+07:00|warning|kernel[-]|[2427373.553697]  [<ffffffff8137aeb8>] pci_try_reset_bus+0x38/0x80
2019-03-20T13:37:14.726743+07:00|warning|kernel[-]|[2427373.553730]  [<ffffffffa0261045>] vfio_pci_release+0x3d5/0x430 [vfio_pci]
2019-03-20T13:37:14.726761+07:00|warning|kernel[-]|[2427373.553737]  [<ffffffffa0260640>] ? vfio_pci_rw+0xc0/0xc0 [vfio_pci]
2019-03-20T13:37:14.726779+07:00|warning|kernel[-]|[2427373.553745]  [<ffffffffa02529f2>] vfio_device_fops_release+0x22/0x40 [vfio]
2019-03-20T13:37:14.726798+07:00|warning|kernel[-]|[2427373.553751]  [<ffffffff812179dc>] __fput+0xec/0x260
2019-03-20T13:37:14.726821+07:00|warning|kernel[-]|[2427373.553754]  [<ffffffff81217c8e>] ____fput+0xe/0x10
2019-03-20T13:37:14.726840+07:00|warning|kernel[-]|[2427373.553758]  [<ffffffff810b684a>] task_work_run+0xaa/0xe0
2019-03-20T13:37:14.726858+07:00|warning|kernel[-]|[2427373.553763]  [<ffffffff8102ac12>] do_notify_resume+0x92/0xb0
2019-03-20T13:37:14.726876+07:00|warning|kernel[-]|[2427373.553767]  [<ffffffff816c264f>] int_signal+0x12/0x17
2019-03-20T13:37:14.726892+07:00|err|kernel[-]|[2427373.553771] INFO: task lspci:139540 blocked for more than 120 seconds.
2019-03-20T13:37:14.726910+07:00|err|kernel[-]|[2427373.553772] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2019-03-20T13:37:14.726929+07:00|info|kernel[-]|[2427373.553773] lspci           D 0000000000000000     0 139540 139539 0x00000000
2019-03-20T13:37:14.726948+07:00|warning|kernel[-]|[2427373.553776] Call Trace:
2019-03-20T13:37:14.726970+07:00|warning|kernel[-]|[2427373.553778]  [<ffffffff816b5f79>] schedule+0x29/0x70
2019-03-20T13:37:14.726989+07:00|warning|kernel[-]|[2427373.553782]  [<ffffffff81370ca0>] pci_wait_cfg+0xa0/0x110
2019-03-20T13:37:14.727006+07:00|warning|kernel[-]|[2427373.553787]  [<ffffffff810cfe40>] ? wake_up_state+0x20/0x20
2019-03-20T13:37:14.727023+07:00|warning|kernel[-]|[2427373.553790]  [<ffffffff81370e15>] pci_user_read_config_dword+0x105/0x110
2019-03-20T13:37:14.727043+07:00|warning|kernel[-]|[2427373.553794]  [<ffffffff8137e974>] pci_read_config+0x114/0x2c0
2019-03-20T13:37:14.727063+07:00|warning|kernel[-]|[2427373.553799]  [<ffffffff811f4835>] ? __kmalloc+0x55/0x240
2019-03-20T13:37:14.727084+07:00|warning|kernel[-]|[2427373.553804]  [<ffffffff812992fe>] read+0xde/0x1f0
2019-03-20T13:37:14.727103+07:00|warning|kernel[-]|[2427373.553807]  [<ffffffff81215a5f>] vfs_read+0x9f/0x170
2019-03-20T13:37:14.727123+07:00|warning|kernel[-]|[2427373.553809]  [<ffffffff81216812>] SyS_pread64+0x92/0xc0
2019-03-20T13:37:14.727141+07:00|warning|kernel[-]|[2427373.553812]  [<ffffffff816c22ef>] system_call_fastpath+0x1c/0x21

It seems that lspci and vfio_pci_release have deadlocked against each other.

Thanks,
Zongyong Wu

_______________________________________________
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users
