Re: [PATCH] drm/amdgpu: Fix regression in adjusting power table/profile

2020-08-03 Thread Paweł Gronowski
Hello again, I just finished a bisect of amd-staging-drm-next and it looks like the hang is first introduced in edad8312cbbf9a33c86873fc4093664f150dd5c1 ("drm/amdgpu: fix system hang issue during GPU reset"). It is a bit tricky, because it is committed on top of my first faulty patch

Re: [PATCH] drm/amdgpu: Fix regression in adjusting power table/profile

2020-08-03 Thread Paweł Gronowski
Hey Matt, I have just tested the amd-staging-drm-next branch (dd654c76d6e854afad716ded899e4404734aaa10) with my patches reverted and I can reproduce your issue with: sudo sh -c 'echo "s 0 305 750" > /sys/class/drm/card0/device/pp_od_clk_voltage' which makes sh hang at 100% CPU usage. The

Re: [PATCH] drm/amdgpu: Fix regression in adjusting power table/profile

2020-07-31 Thread Matt Coffin
My bisect resulted in the same conclusion, that the problem began with edad8312cbbf9a33c86873fc4093664f150dd5c1. That commit has a LOT of changes, so I'm having problems following what might be relevant, so in case Hawking or Dennis have any insight they could contribute towards letting us know

Re: [PATCH] drm/amdgpu: Fix regression in adjusting power table/profile

2020-07-31 Thread Matt Coffin
I actually *just* finished my bisect, and arrived at the same conclusion. The hang appears to be introduced in edad8312cbbf9a33c86873fc4093664f150dd5c1. There are some conflicts with an automatic `git revert`, so I'm picking through the changes now to fully understand what happened and come up

Re: [PATCH] drm/amdgpu: Fix regression in adjusting power table/profile

2020-07-30 Thread Paweł Gronowski
Hello Matt, Thank you for your testing. It seems that my GPU (RX 570) does not support the vc setting, so I cannot exactly reproduce the issue. However, I did trace the code path the test case takes and it seems to correctly pass through the while loop that parses the input and fails only in

Re: [PATCH] drm/amdgpu: Fix regression in adjusting power table/profile

2020-07-30 Thread Matt Coffin
Hey Pawel, I did confirm that this patch *introduced* the issue, both with the bisect and by testing a revert of it. Now, there are a lot of fragile pieces in the dpm handling, so it could be this patch's interaction with something else that's causing it and it may well not be the fault of this

Re: [PATCH] drm/amdgpu: Fix regression in adjusting power table/profile

2020-07-30 Thread Matt Coffin
Hello all, I just did some testing with this applied, and while it no longer returns -EINVAL, running `sudo sh -c 'echo "vc 2 2150 1195" > /sys/class/drm/card1/device/pp_od_clk_voltage'` results in `sh` spiking to, and staying at, 100% CPU usage, with no indicative information in `dmesg` from the

Re: [PATCH] drm/amdgpu: Fix regression in adjusting power table/profile

2020-07-29 Thread Alex Deucher
On Wed, Jul 29, 2020 at 10:20 PM Paweł Gronowski wrote:
>
> Regression was introduced in commit 38e0c89a19fd
> ("drm/amdgpu: Fix NULL dereference in dpm sysfs handlers") which
> made the set_pp_od_clk_voltage and set_pp_power_profile_mode return
> -EINVAL for previously valid input. This was

[PATCH] drm/amdgpu: Fix regression in adjusting power table/profile

2020-07-29 Thread Paweł Gronowski
Regression was introduced in commit 38e0c89a19fd ("drm/amdgpu: Fix NULL dereference in dpm sysfs handlers") which made the set_pp_od_clk_voltage and set_pp_power_profile_mode return -EINVAL for previously valid input. This was caused by an empty string (starting at the \0 character) being passed
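
For readers following along, here is a minimal userspace sketch of how an empty string can end up in such a parse loop. It assumes the sysfs input is split with strsep() on spaces and newlines and that each token goes through a strict integer parser; parse_long_strict() and parse_params() are hypothetical names for illustration and are not taken from the amdgpu source.

```c
#define _DEFAULT_SOURCE /* exposes strsep() on glibc */
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Strict integer parse: rejects empty strings and trailing junk.
 * This only approximates a strict kernel-style parser (assumption).
 */
static int parse_long_strict(const char *s, long *out)
{
	char *end;

	if (*s == '\0')
		return -EINVAL;
	errno = 0;
	*out = strtol(s, &end, 10);
	if (errno != 0 || *end != '\0')
		return -EINVAL;
	return 0;
}

/*
 * Hypothetical helper: split buf on " \n" and parse every token.
 * "0 305 750\n" yields "0", "305", "750" and then "", because the
 * trailing newline is itself a delimiter. With skip_empty == 0 that
 * last empty token reaches the parser and the call fails.
 * Returns the number of parsed values, or -EINVAL.
 */
int parse_params(char *buf, long *vals, int max, int skip_empty)
{
	char *tok;
	int n = 0;

	while ((tok = strsep(&buf, " \n")) != NULL) {
		if (skip_empty && *tok == '\0')
			continue;
		if (n >= max)
			return -EINVAL;
		if (parse_long_strict(tok, &vals[n]) != 0)
			return -EINVAL;
		n++;
	}
	return n;
}
```

Calling parse_params() on a writable copy of "0 305 750\n" with skip_empty set to 0 returns -EINVAL, while the same input with skip_empty set to 1 parses three values. Whether skipping empty tokens is the right fix for the driver is a separate question; the sketch only shows where the empty token comes from.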