[PATCH v2 7/8] drm/amdgpu: Fix sdma code crash post device unplug

2020-06-20 Thread Andrey Grodzovsky
entity->rq becomes NULL after the device is unplugged, so just return early in that case. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 21 - 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.
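
The early-return guard described in this patch can be illustrated with a small user-space sketch. Everything in it is hypothetical: the struct names, the `prepare_update` function, and the choice of `-ENODEV` as the return value are assumptions for illustration and are not taken from the actual diff, which is truncated here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel's real structures. */
struct run_queue {
    int id;
};

struct sched_entity {
    struct run_queue *rq; /* assumed to become NULL once the device is gone */
};

/*
 * Returns 0 when the entity still has a run queue, or -ENODEV when it
 * does not. The error code is an assumption; the patch description only
 * says "return early".
 */
static int prepare_update(const struct sched_entity *entity)
{
    assert(entity != NULL);

    if (!entity->rq)
        return -ENODEV; /* device unplugged: bail out before touching rq */

    /* The normal path would dereference entity->rq from here on. */
    return 0;
}
```

The point of the guard is only that no code after the check dereferences a NULL run queue.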

[PATCH v2 0/8] RFC Support hot device unplug in amdgpu

2020-06-20 Thread Andrey Grodzovsky
This RFC is more of a proof of concept than a fully working solution, as there are a few unresolved issues we are hoping to get advice on from people on the mailing list. Until now extracting a card either by physical extraction (e.g. eGPU with Thunderbolt connection or by emulation through sysfs

[PATCH v2 3/8] drm/ttm: Add unmapping of the entire device address space

2020-06-20 Thread Andrey Grodzovsky
Helper function to be used to invalidate all BOs CPU mappings once device is removed. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/ttm/ttm_bo.c| 8 ++-- include/drm/ttm/ttm_bo_driver.h | 7 +++ 2 files changed, 13 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/tt

[PATCH v2 4/8] drm/amdgpu: Split amdgpu_device_fini into early and late

2020-06-20 Thread Andrey Grodzovsky
Some of the stuff in amdgpu_device_fini, such as disabling HW interrupts and finalizing pending fences, must be done right away on pci_remove, while most of the stuff which relates to finalizing and releasing driver data structures can be kept until the drm_driver.release hook is called, i.e. when the las

[PATCH v2 6/8] drm/amdgpu: Unmap entire device address space on device remove.

2020-06-20 Thread Andrey Grodzovsky
Use the new TTM interface to invalidate all existing BO CPU mappings from all user processes. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/a

[PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal

2020-06-20 Thread Andrey Grodzovsky
Track sysfs files in a list so they can all be removed during PCI remove; otherwise their removal after that causes a crash because the parent folder was already removed during PCI remove. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 13 +++ drivers/
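
The "track in a list, remove everything in one pass" idea from this patch can be sketched in plain user-space C. This is not the driver's code: `tracked_attr`, `device_ctx`, `track_attr`, and `remove_all_attrs` are hypothetical names, and a simple singly linked list stands in for whatever list type the real patch uses.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record for one created attribute file. */
struct tracked_attr {
    const char *name;
    struct tracked_attr *next;
};

/* Hypothetical per-device context holding the head of the list. */
struct device_ctx {
    struct tracked_attr *attrs;
};

/* Remember an attribute at creation time. Returns 0 on success, -1 on OOM. */
static int track_attr(struct device_ctx *dev, const char *name)
{
    struct tracked_attr *a;

    assert(dev != NULL);
    a = malloc(sizeof(*a));
    if (!a)
        return -1;

    a->name = name;
    a->next = dev->attrs;
    dev->attrs = a;
    return 0;
}

/*
 * Called once from the removal path, while the parent directory still
 * exists. Returns how many tracked entries were released.
 */
static int remove_all_attrs(struct device_ctx *dev)
{
    int removed = 0;

    assert(dev != NULL);
    while (dev->attrs) {
        struct tracked_attr *a = dev->attrs;

        dev->attrs = a->next;
        /* A real driver would remove the sysfs file named a->name here. */
        free(a);
        removed++;
    }
    return removed;
}
```

Because the list is emptied in one pass, a second call is a harmless no-op, which is the property that matters if later teardown code runs after the removal path.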

[PATCH v2 2/8] drm/ttm: Remap all page faults to per process dummy page.

2020-06-20 Thread Andrey Grodzovsky
On device removal reroute all CPU mappings to dummy page per drm_file instance or imported GEM object. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/ttm/ttm_bo_vm.c | 65 - 1 file changed, 57 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/d

[PATCH v2 8/8] drm/amdgpu: Prevent any job recoveries after device is unplugged.

2020-06-20 Thread Andrey Grodzovsky
No point to try recovery if device is gone, just messes up things. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 16 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 8 2 files changed, 24 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH v2 1/8] drm: Add dummy page per device or GEM object

2020-06-20 Thread Andrey Grodzovsky
Will be used to reroute CPU mapped BO's page faults once device is removed. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/drm_file.c | 8 drivers/gpu/drm/drm_prime.c | 10 ++ include/drm/drm_file.h | 2 ++ include/drm/drm_gem.h | 2 ++ 4 files changed, 22 i

Re: [PATCH v3] drm/amd: Fix memory leak according to error branch

2020-06-20 Thread Felix Kuehling
On 2020-06-20 at 9:21 a.m., Julia Lawall wrote: > > On Sat, 20 Jun 2020, Markus Elfring wrote: > >>> The function kobject_init_and_add alloc memory like: >>> kobject_init_and_add->kobject_add_varg->kobject_set_name_vargs >>> ->kvasprintf_const->kstrdup_const->kstrdup->kmalloc_track_caller >>> ->k