Hello again,
I just finished a bisect of amd-staging-drm-next and it looks like
the hang is first introduced in edad8312cbbf9a33c86873fc4093664f150dd5c1
("drm/amdgpu: fix system hang issue during GPU reset").
It is a bit tricky, because it is committed on top of my first faulty patch
Hey Matt,
I have just tested the amd-staging-drm-next branch
(dd654c76d6e854afad716ded899e4404734aaa10) with my patches reverted
and I can reproduce your issue with:
sudo sh -c 'echo "s 0 305 750" > /sys/class/drm/card0/device/pp_od_clk_voltage'
This makes `sh` hang at 100% CPU usage.
The
My bisect resulted in the same conclusion, that the problem began with
edad8312cbbf9a33c86873fc4093664f150dd5c1.
That commit has a LOT of changes, so I'm having problems following what
might be relevant, so in case Hawking or Dennis have any insight they
could contribute towards letting us know
I actually *just* finished my bisect, and arrived at the same
conclusion. The hang appears to be introduced in
edad8312cbbf9a33c86873fc4093664f150dd5c1.
There are some conflicts with an automatic `git revert`, so I'm picking
through the changes now to fully understand what happened and come up
Hello Matt,
Thank you for your testing. It seems that my GPU (RX 570) does not support the
vc setting, so I cannot exactly reproduce the issue. However, I did trace the
code path the test case takes and it seems to correctly pass through the while
loop that parses the input and fails only in
Hey Pawel,
I did confirm that this patch *introduced* the issue both with the
bisect, and by testing reverting it.
Now, there's a lot of fragile pieces in the dpm handling, so it could be
this patch's interaction with something else that's causing it and it
may well not be the fault of this
Hello all, I just did some testing with this applied, and while it no
longer returns -EINVAL, running `sudo sh -c 'echo "vc 2 2150 1195" >
/sys/class/drm/card1/device/pp_od_clk_voltage'` results in `sh` spiking
to, and staying at, 100% CPU usage, with no informative output in
`dmesg` from the
On Wed, Jul 29, 2020 at 10:20 PM Paweł Gronowski wrote:
>
> Regression was introduced in commit 38e0c89a19fd
> ("drm/amdgpu: Fix NULL dereference in dpm sysfs handlers") which
> made the set_pp_od_clk_voltage and set_pp_power_profile_mode return
> -EINVAL for previously valid input. This was
Regression was introduced in commit 38e0c89a19fd
("drm/amdgpu: Fix NULL dereference in dpm sysfs handlers") which
made the set_pp_od_clk_voltage and set_pp_power_profile_mode return
-EINVAL for previously valid input. This was caused by an empty
string (starting at the \0 character) being passed
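
For anyone following along, here is a rough userspace sketch of how an empty string starting at the \0 character can fall out of a strsep()-style token loop when the input ends in a delimiter, as it does with the newline that `echo` appends. This is not the amdgpu code: `parse_longs`, its `skip_empty` flag, and the use of strtol() are all hypothetical stand-ins chosen for illustration, and the real handlers may differ.

```c
#define _DEFAULT_SOURCE /* exposes strsep() on glibc */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a sysfs-style parsing loop (NOT the driver code).
 * Splits buf on spaces and newlines and converts each token to a long.
 * Returns the number of values stored in out, or -EINVAL on a bad token.
 * buf is modified in place, because strsep() overwrites delimiters with '\0'.
 * When skip_empty is zero, empty tokens reach the converter and are rejected.
 */
static int parse_longs(char *buf, long *out, int max, int skip_empty)
{
    char *cursor = buf;
    char *tok;
    int n = 0;

    assert(buf != NULL && out != NULL);

    while ((tok = strsep(&cursor, " \n")) != NULL) {
        char *end;

        /* A trailing delimiter makes strsep() return "" as the last token. */
        if (*tok == '\0' && skip_empty)
            continue;
        if (n >= max)
            return -EINVAL;

        errno = 0;
        out[n] = strtol(tok, &end, 0);
        /* No digits consumed, trailing junk, or overflow: reject the input. */
        if (end == tok || *end != '\0' || errno != 0)
            return -EINVAL;
        n++;
    }
    return n;
}
```

With a writable buffer holding "2 2150 1195\n", calling `parse_longs` with `skip_empty` set to 0 returns -EINVAL because of the final empty token, while `skip_empty` set to 1 returns 3 parsed values. Skipping empty tokens is only one possible way to tolerate the trailing newline; trimming the input before the loop would be another.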