https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #172 from line...@xcpp.org ---
I had dpm=2 as a module option. GPU initialization failure does not occur
without dpm=2
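
For anyone comparing setups, a minimal sketch of checking whether a dpm override is present on the kernel command line. The helper name is made up for illustration, and it assumes the option was passed as amdgpu.dpm=<n> at boot; an override set through a modprobe.d "options" line would not show up here. Treating the last occurrence as the effective one is also an assumption.

```python
import re
from typing import Optional


def amdgpu_dpm_from_cmdline(cmdline: str) -> Optional[int]:
    """Return the amdgpu.dpm value found on a kernel command line, or None.

    Hypothetical helper: if the option appears more than once, the last
    occurrence is returned (assumed, not verified, to be the effective one).
    """
    matches = re.findall(r"(?:^|\s)amdgpu\.dpm=(-?\d+)(?=\s|$)", cmdline)
    return int(matches[-1]) if matches else None
```

On a running system the input would be the contents of /proc/cmdline; a result of None means no override was given on the command line.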
--
You are receiving this mail because:
You are the assignee for the
https://bugs.freedesktop.org/show_bug.cgi?id=110674
Alex Deucher changed:
What|Removed |Added
Attachment #146026|text/x-log |text/plain
mime type|
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #171 from line...@xcpp.org ---
Created attachment 146026
--> https://bugs.freedesktop.org/attachment.cgi?id=146026&action=edit
5.4.0-arch1-1 GPU initialization fails
With kernel version 5.4.0-arch1-1 the GPU can flat out no longer be
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #170 from Peter Hercek ---
Maybe this helps since there is a stack trace. The GUI stopped responding, so I shut
it down over ssh. A kernel crash occurred during the shutdown on 5.3.6-arch1-1-ARCH even
when amdgpu.dpm=0. That is the option which
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #169 from picar...@live.de ---
I am using a Radeon VII with Arch Linux, a 1440p144hz and a 4K60Hz monitor, and
I had similar crashes to the others here if I tried running the 1440p144hz
monitor at 144hz, at 60hz it was stable. This
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #168 from line...@xcpp.org ---
Created attachment 145784
--> https://bugs.freedesktop.org/attachment.cgi?id=145784&action=edit
5.3.7: Fence fallback timer expired on ring
Here is a freeze which went a bit differently.
This time the
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #167 from Alex Deucher ---
(In reply to Peter Hercek from comment #166)
> I got the crash after 4 days of use. It looks the same as before:
> ring sdma0 timeout, gpu reset (allegedly successful), many skipped IBs, and
> failure to
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #166 from Peter Hercek ---
I tried 5.3.6-arch1-1 on Arch Linux with 3 DP monitors. It should contain the
patch based on the comment from line...@xcpp.org.
I got the crash after 4 days of use. It looks the same as before:
ring sdma0
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #165 from Tom B ---
I just tried 5.3.5 (which is the latest in the arch repo) and it's working fine
for me.
I do have an issue on Wayland. If the screen turns off, Wayland crashes and I
have to hard reset. The log shows
Oct 14
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #164 from line...@xcpp.org ---
(In reply to Tom B from comment #163)
> Gargoyle, linedot, can you confirm whether this crash is with both patches
> applied?
>
> I'm still on 5.3.1 patched and haven't had a single crash.
For 5.3.1
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #163 from Tom B ---
Gargoyle, linedot, can you confirm whether this crash is with both patches
applied?
I'm still on 5.3.1 patched and haven't had a single crash.
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #162 from line...@xcpp.org ---
Created attachment 145730
--> https://bugs.freedesktop.org/attachment.cgi?id=145730&action=edit
Freeze/Black screen/Crash on 5.3.6
Apologies, I have been on vacation and thus away from my main System.
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #161 from Gargoyle ---
Hi there. I've been trying to solve some lockups and pauses with my system and
have just read this entire thread.
The good news is that I am another Radeon VII owner having the same problems
and I am willing
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #160 from ReddestDream ---
Well, today I had a hard freeze using more than one display with Radeon VII.
Back to Radeon VII + iGPU . . . :(
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #159 from ReddestDream ---
Oh. Also,
cat /sys/kernel/debug/dri/0/amdgpu_pm_info
Now seems to work on 5.3.4 with more than one monitor in. It doesn't report
nonsense values like 0 watts like it did before. :)
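
A rough sketch of how the "0 watts" symptom could be spotted automatically. It assumes the file reports power as a number followed by "W"; the exact layout of amdgpu_pm_info differs between kernels, and both function names are hypothetical.

```python
import re
from typing import List


def watt_readings(pm_info_text: str) -> List[float]:
    """Collect every '<number> W' value from amdgpu_pm_info-style text."""
    return [float(v) for v in re.findall(r"(\d+(?:\.\d+)?)\s*W\b", pm_info_text)]


def has_zero_power(pm_info_text: str) -> bool:
    """True if any reported wattage is exactly zero (the symptom described above)."""
    return any(v == 0.0 for v in watt_readings(pm_info_text))
```

The input would be the text of /sys/kernel/debug/dri/0/amdgpu_pm_info, which normally needs root and a mounted debugfs to read.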
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #158 from ReddestDream ---
More good news. It seems that 5.3.4 does work for me and doesn't (at least
immediately since I'm typing this from there right now) fall apart into a
glitchy mess.
I'm still not really sure of the complete
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #157 from ReddestDream ---
@Tom B. Well, some good news. Kernel 5.3.4 should have the patches for Radeon
VII included now. I'll do some more tests on that ...
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #156 from Tom B ---
This is strange because with a patched 5.3.1, I have perfect stability. An
uptime of over a week and no issues. Are you saying that the issue comes back
in 5.4? Hopefully not as Linux 5.4 + Mesa 19.3 looks to
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #155 from ReddestDream ---
So, I've done some tests with 5.4-rc1 and it seems like I'm getting similar
results to line...@xcpp.org and sehell...@gmail.com. I'm using GNOME with
Wayland (which works fine with only 1 display).
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #154 from line...@xcpp.org ---
Created attachment 145623
--> https://bugs.freedesktop.org/attachment.cgi?id=145623&action=edit
5.4.0-rc1 hangup
dmesg with 5.4.0-rc1.
System freezes and becomes unresponsive to input like before
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #153 from ReddestDream ---
Just FYI, it appears that kernel 5.3.2 does not have the Vega 20 fix commits
that Alex Deucher mentioned.
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #152 from ReddestDream ---
Kernel 5.4-rc1, the first kernel version that includes the Vega 20 patches
noted by Alex Deucher, is now out and in linux-mainline on Arch Linux AUR. :)
I plan to do some testing of this version over the
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #151 from line...@xcpp.org ---
Created attachment 145583
--> https://bugs.freedesktop.org/attachment.cgi?id=145583&action=edit
5.3.1 patched, xorg crash
And here is a dmesg of just an X session crashing
https://bugs.freedesktop.org/show_bug.cgi?id=110674
line...@xcpp.org changed:
What|Removed |Added
Attachment #145581|0 |1
is obsolete|
https://bugs.freedesktop.org/show_bug.cgi?id=110674
line...@xcpp.org changed:
What|Removed |Added
CC||line...@xcpp.org
--- Comment #149
https://bugs.freedesktop.org/show_bug.cgi?id=110674
Anthony Rabbito changed:
What|Removed |Added
Resolution|--- |FIXED
Status|NEW
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #147 from ReddestDream ---
> Already merged to 5.4. I'll take a look at older kernels as well.
@Alex Deucher Thanks so much for all your help! :)
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #146 from Alex Deucher ---
(In reply to tom91136 from comment #145)
> @Alex any plans for the patches to be merged for 5.4 or even backported to
> 5.3 at some point?
Already merged to 5.4. I'll take a look at older kernels as
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #145 from tom91...@gmail.com ---
@Alex any plans for the patches to be merged for 5.4 or even backported to 5.3
at some point?
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #144 from sehell...@gmail.com ---
I also think this is strange. Since yesterday, they turned off and on many
times successfully without any problems. Most likely, it's connected with
something else, but I don’t know where to look.
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #143 from Tom B ---
I'm not sure how KDE handles monitor power behind the scenes but I have an
uptime of 2 days now since applying the patches and with KDE I've let it turn
off the monitors at least 6 or 7 times and suspend/resume 3
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #142 from sehell...@gmail.com ---
(In reply to Alex Deucher from comment #141)
> (In reply to sehellion from comment #140)
> > Created attachment 145463 [details]
> > 5.3.1 with Alex's patches and dual monitors, crash
>
> That's not
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #141 from Alex Deucher ---
(In reply to sehellion from comment #140)
> Created attachment 145463 [details]
> 5.3.1 with Alex's patches and dual monitors, crash
That's not a crash, it's just a warning.
https://bugs.freedesktop.org/show_bug.cgi?id=110674
Alex Deucher changed:
What|Removed |Added
Attachment #145463|text/x-log |text/plain
mime type|
https://bugs.freedesktop.org/show_bug.cgi?id=110674
sehell...@gmail.com changed:
What|Removed |Added
Attachment #145461|0 |1
is obsolete|
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #139 from sehell...@gmail.com ---
Today, when trying to wake up the monitors, the system crashed again.
WARNING: CPU: 4 PID: 32 at
drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link_dp.c:1720
decide_link_settings+0xe0/0x2a0
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #138 from sehell...@gmail.com ---
Created attachment 145461
--> https://bugs.freedesktop.org/attachment.cgi?id=145461&action=edit
5.3.1 with Alex's patches and dual monitors
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #137 from sehell...@gmail.com ---
(In reply to Alex Deucher from comment #128)
> Do these patches help?
> https://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-fixes&id=c46e5df4ac898108da66a880c4e18f69c74f6c1b
>
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #133 from Anthony Rabbito ---
Created attachment 145459
--> https://bugs.freedesktop.org/attachment.cgi?id=145459&action=edit
dmesg log with Alex's patches
Here's my dmesg with Alex's patches. Going to mess around and see what I can
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #129 from Tom B ---
Thank you Alex! That has fixed it! The card is now correctly setting its
voltages and clocks. I applied the patch to 5.3.1
However, I've noticed a few very minor problems that are probably worth
reporting.
1. I
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #131 from Tom B ---
In addition to my previous comment, [drm] schedsdma0 is not ready, skipping
repeating indefinitely stops after a suspend/resume. After the machine is
resumed these stop appearing but it does suspend and resume
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #132 from Anthony Rabbito ---
Created attachment 145458
--> https://bugs.freedesktop.org/attachment.cgi?id=145458&action=edit
linux-mainline5.3 dmesg without patches
Here's my current dmesg with two out of three monitors running without
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #134 from Anthony Rabbito ---
Wow! All three of my monitors are working again. 2560x1440 @ 144Hz
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #135 from Adrian Brown ---
@reddestdream Thanks. I don't think the active adapter is the problem as it
works perfectly with my Vega 64. However I will try 18.04 and AMD's driver as
suggested.
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #136 from tom91...@gmail.com ---
Been following this thread for a while now as I just got 3 4k 60Hz monitors
connected to the 3 DP ports on my Radeon VII.
I'm getting the exact same errors discussed in this report with matching
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #130 from Anthony Rabbito ---
(In reply to Alex Deucher from comment #128)
> Do these patches help?
> https://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-fixes&id=c46e5df4ac898108da66a880c4e18f69c74f6c1b
>
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #128 from Alex Deucher ---
Do these patches help?
https://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-fixes&id=c46e5df4ac898108da66a880c4e18f69c74f6c1b
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #127 from Alex Deucher ---
(In reply to Tom B from comment #15)
> Have been running 5.0 since release without issue but upgraded this morning
> and got crashes as described here within a few seconds of boot.
>
Can you bisect
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #126 from ReddestDream ---
@Adrian Brown Your Linux issue is potentially related to the active adapter.
Have you tried w/o it?
On Windows, the flickering on/around login, at least for me, has been mostly
resolved by using the
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #125 from Adrian Brown ---
I am also getting frequent crashes with a Radeon VII on Kubuntu 19.10 (kernel
5.0.0-29-generic). I see there is some discussion in this thread about it
possibly being related to multiple monitors. But I
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #124 from ReddestDream ---
Created attachment 145254
--> https://bugs.freedesktop.org/attachment.cgi?id=145254&action=edit
Dmesg 5.3-rc7 w/ Two monitors
This issue is still not fixed on 5.3-rc7. I guess we will probably have to wait
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #123 from ReddestDream ---
A few interesting fixes that touch vega20_hwmgr.c have rolled in from
drm-fixes:
The first is likely the most interesting for our issues, as it touches
min/maxes (tho only the soft ones it seems). The
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #122 from ReddestDream ---
Tested 5.3-rc6. Still has the same issues. Only it's maybe actually worse
because I lose display completely when I use amdgpu.dpm=2 w/Radeon VII
multimonitor on 5.3-rc6, whereas on 5.2.9 I just got
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #121 from ReddestDream ---
Some observations:
1. Nothing at all seems to be up with cur_speed and cur_width. They get set
several times in a row in both runs, but the values are all the same in both.
2. I can't really see anything
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #120 from ReddestDream ---
Created attachment 145159
--> https://bugs.freedesktop.org/attachment.cgi?id=145159&action=edit
DebugAMDiGPU
Also here is the AMD + iGPU one.
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #119 from ReddestDream ---
Created attachment 145158
--> https://bugs.freedesktop.org/attachment.cgi?id=145158&action=edit
DebugAMD2Monitors
>I don't think I have time to try it today but if anyone is recompiling the
>code adding
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #118 from ReddestDream ---
So, this is a crazy idea, but ironically I think it might be getting closer to
the truth.
Tom B. attempted reverting ad51c46eec739c18be24178a30b47801b10e0357, which was
known to cause some issue with an
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #116 from ReddestDream ---
Created attachment 145153
--> https://bugs.freedesktop.org/attachment.cgi?id=145153&action=edit
dmesgAMD2Monitors
I've been doing a few tests. I looked into and compiled 5.3-rc5 along with
these patches, but
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #115 from Tom B ---
I should have noted it earlier, but I had already tried reverting both "golden
values" commits. I've no idea what it does but it didn't fix this crash.
One thing that would be insightful would be logging every
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #114 from ReddestDream ---
5. Tom B., it is probably worth getting a full dmesg with your two monitors in
on a relatively new 5.2.x kernel using at least: amdgpu.dc_log=1 drm.debug=0x1e
log_buf_len=2M
And anything else you might
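
For reference, a small sketch of what the drm.debug=0x1e mask suggested above selects. The bit-to-category table follows the DRM core debug categories as I understand them for kernels of this era; treat it as an assumption and check the drm.debug parameter description for the kernel in use. The function name is illustrative only.

```python
from typing import List

# Assumed DRM debug category bits (drm.debug bitmask), lowest bit first.
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
    0x40: "STATE",
    0x80: "LEASE",
    0x100: "DP",
}


def decode_drm_debug(mask: int) -> List[str]:
    """List the category names whose bits are set in a drm.debug mask."""
    return [name for bit, name in DRM_DEBUG_BITS.items() if mask & bit]


print(decode_drm_debug(0x1E))  # ['DRIVER', 'KMS', 'PRIME', 'ATOMIC']
```

So 0x1e leaves out the very chatty CORE messages while still capturing driver, modesetting, PRIME and atomic output.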
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #113 from ReddestDream ---
4.
> Given that two different versions of the code produce the same result, my
> hunch is that the problem is B. The card is not in a state where it's able to
> receive power changes.
Something to
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #112 from ReddestDream ---
More ideas:
3. Looking through the crash in sehellion's comment 45:
gfx_v9_0_ring_test_ring+0x19e/0x230 [amdgpu]
amdgpu_ring_test_helper+0x1e/0x90 [amdgpu]
gfx_v9_0_hw_fini+0x299/0x690 [amdgpu]
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #111 from ReddestDream ---
A few other ideas to ponder:
1. Looking into DPM, I found this commit for 5.1-rc1 that looks interesting:
https://github.com/torvalds/linux/commit/7ca881a8651bdeffd99ba8e0010160f9bf60673e
Looks like it
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #110 from ReddestDream ---
> 1. The functions in vega20_ppt.c are used with this new patch so that answers
> my question from earlier, that's what this file is for and why it contains
> similar/identical functions.
I was hoping
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #109 from Tom B ---
Created attachment 145080
--> https://bugs.freedesktop.org/attachment.cgi?id=145080&action=edit
dmesg with amdgpu.dpm=2
> Tom B., did you try booting with amdgpu.dpm=1 or amdgpu.dpm=2 (default is
> generally -1 for
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #108 from ReddestDream ---
> Booting with amdgpu.dpm=0 on 5.2.7 works.
Tom B., did you try booting with amdgpu.dpm=1 or amdgpu.dpm=2 (default is
generally -1 for automatic)? Seems like one of those might enable the new
experimental
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #107 from ReddestDream ---
> Booting with amdgpu.dpm=0 on 5.2.7 works.
> It is a DPM issue of some kind so although my earlier tests showed that
> hard_min_level was set correctly, it still could be an issue elsewhere in the
>
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #106 from Tom B ---
Booting with amdgpu.dpm=0 on 5.2.7 works.
Performance is poor and as expected I cannot get any information about power
states because /sys/kernel/debug/dri/0/amdgpu_pm_info doesn't exist. I'm
guessing it runs at
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #105 from Tom B ---
> Also, I considered that both of my monitors have audio out support. I wonder
> if audio initialization might be the missing piece to the puzzle, the thing
> that interrupts/changes the state of the card and
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #104 from Tom B ---
I did get very similar crashing when I was running HDMI + DP at different
refresh rates ( see https://bugs.freedesktop.org/show_bug.cgi?id=110510 ). I
switched to DP + DP because HDMI+DP wasn't stable, it could
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #103 from Peter Hercek ---
I boot in BIOS mode and I'm still getting these errors. Though they are rare in
my case with the "better" kernels (around once a week).
Just a note: There were tearing errors in windows drivers of Radeon
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #102 from Tom B ---
> Grasping at straws a bit here, but it occurred to me that maybe Linux kernel
> testing on Radeon VII was done on an early VBIOS that didn't have full UEFI
> support yet. We know that AMD had to issue a VBIOS
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #101 from ReddestDream ---
Grasping at straws a bit here, but it occurred to me that maybe Linux kernel
testing on Radeon VII was done on an early VBIOS that didn't have full UEFI
support yet. We know that AMD had to issue a VBIOS
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #100 from Tom B ---
I've been trying to work backwards to find the place where screens get
initialised and eventually call vega20_pre_display_configuration_changed_task.
vega20_pre_display_configuration_changed_task is exported as
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #99 from Tom B ---
Created attachment 145062
--> https://bugs.freedesktop.org/attachment.cgi?id=145062&action=edit
a list of commits 5.0.13 - 5.1.0
Attached is a list of all amdgpu and powerplay commits from 5.0.13 - 5.1.0.
I have
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #98 from Sylvain BERTRAND ---
> The code seems very similar to what we see in
> vega20_notify_smc_display_config_after_ps_adjustment near where we get the "
> [SetHardMinFreq] Set hard min uclk failed!" Maybe this
>
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #97 from Tom B ---
I've been investigating this:
https://github.com/torvalds/linux/commit/94ed6d0cfdb867be9bf05f03d682980bce5d0036
Because vega20 doesn't export display_configuration_change, it jumps to the
newly added else block
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #96 from Tom B ---
Created attachment 145047
--> https://bugs.freedesktop.org/attachment.cgi?id=145047&action=edit
logging anywhere the number of screens is set
Again, no closer to a fix but another thing to rule out. In addition to
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #95 from Tom B ---
So here's something interesting. In 5.0.13 there is no function
vega20_display_config_changed. This function issues
smu_send_smc_msg_with_param(smu, SMU_MSG_NumOfDisplays, 0);
In fact, in 5.0.13 there is no
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #94 from Tom B ---
Reverting d1a3e239a6016f2bb42a91696056e223982e8538 didn't fix it for me. But
that commit may give some insight because it is related to uclk which is the
first error we get.
I also tried globally increasing
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #93 from Chris Hodapp ---
Note: It might be good for someone else to double-check my conclusion before
too much stock is put into it. Scientific method and all that.
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #92 from ReddestDream ---
>If you follow the callstack:
I've been thinking all this over. The only thing unfortunately that really
sticks out at me still is how Chris Hodapp says that reverting this commit:
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #91 from ReddestDream ---
>It returns 0 on success and -EIO on failure, which is then in turn returned
>from vega20_set_fclk_to_highest_dpm_level. Where did you see the check/retry on
>EINVAL? Perhaps -EIO should be -EINVAL?
I
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #90 from Tom B ---
I'm not sure this is helpful but I managed to somewhat test the race condition
theory.
If you follow the callstack:
vega20_set_fclk_to_highest_dpm_level -> smum_send_msg_to_smc_with_parameter ->
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #89 from Tom B ---
> It should return -EINVAL instead. Maybe then it would reset and try again
> instead of just ignoring it and continuing with initialization anyway,
> leading to instability.
If you look at
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #88 from ReddestDream ---
>The question then becomes: Why doesn't the race condition happen with only one
>screen? Perhaps it's a matter of speed. With a single display, the driver
>detect the displays, read/parse the EDID data,
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #87 from Tom B ---
> Could be we've got a race condition between the powerplay setup and amdgpu
handing off the card to drm_dev_register to advertise it for normal use.
The question then becomes: Why doesn't the race condition
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #86 from ReddestDream ---
>In addition to that, vega20_set_fclk_to_highest_dpm_level is called several
>times before the card is initialized and even on 5.2.7 works. Something
>happens during or just before the initialization
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #85 from Tom B ---
> Yeah. I've had boots where I have my 2 4K DP monitors in and I don't get
> powerplay error on boot. In fact, it can go a bit and seem stable.
In addition to that, vega20_set_fclk_to_highest_dpm_level is
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #84 from ReddestDream ---
>Need to figure out exactly what is generating the line "[drm] Initialized
>amdgpu 3.27.0 20150101 for :44:00.0 on minor 0."
That "Initialized amdgpu" message seems to be coming from here:
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #83 from ReddestDream ---
> Here's what I found: The value of hard_min_level is 1001 in both 5.0.13 and
> 5.2.7 so the issue is not the value from the dpm table. The dpm table is
> probably correct.
Fantastic! Glad you tested
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #82 from Tom B ---
In addition, I will note that the file vega20_baco.c has been added in 5.1
details: https://www.phoronix.com/scan.php?page=news_item&px=AMD-Vega-12-BACO
commit:
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #81 from Tom B ---
Created attachment 145038
--> https://bugs.freedesktop.org/attachment.cgi?id=145038&action=edit
5.2.7 dmesg with hard_min_level logged
As mentioned in the previous post, I started logging the value of
hard_min_level.
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #80 from Tom B ---
> I tried something like that before but a huge portion of the commits in that
> range won't build kernels that can boot (at least on my system). I ended up
> resorting to trying reverting individual
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #79 from ReddestDream ---
>I tried something like that before but a huge portion of the commits in that
>range won't build kernels that can boot (at least on my system).
It's interesting that you found
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #78 from Chris Hodapp ---
> I don't see anywhere else to go but bisection from 5.0.13 to 5.1. That should
> at least find something . . .
I tried something like that before but a huge portion of the commits in that
range won't
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #77 from ReddestDream ---
>I guess, you are good for a bisection if you have a "working" kernel.
Thing is, based on everything here, I'm not convinced that 5.0.13 has 0 issues.
Only that it seems to have fewer issues. But yeah. I
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #76 from Sylvain BERTRAND ---
> Unfortunately, it does look like going through and slowly disabling features
> and/or bisecting might be the only way to find how this issue got started. At
> least if we could narrow it down, we
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #75 from ReddestDream ---
>Here's some additional investigation.
>[SetUclkToHightestDpmLevel] Set hard min uclk failed! Appears as one of the
>first errors in dmesg. This is from vega20_hwmgr.c:3354 and triggered by:
I agree that
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #74 from Sylvain BERTRAND ---
Forcing the memory clock and voltage is not enough: the dc[en]x memory requests
should be given also the highest priority in the arbiter block. I don't recall
how it interacts with the dc[en]x