On Wed, 14 Apr 2021 at 11:48, Christian König
wrote:
>
> >> commit f63da9ae7584280582cbc834b20cc18bfb203b14
> >> Author: Philip Yang
> >> Date: Thu Apr 1 00:22:23 2021 -0400
> >>
> >> drm/amdgpu: reserve fence slot to update page table
> >>
>
> That is expected behavior, the application
On Tue, 13 Apr 2021 at 12:29, Christian König wrote:
>
> Hi Mikhail,
>
> the crash is a known issue and should be fixed by:
>
> commit f63da9ae7584280582cbc834b20cc18bfb203b14
> Author: Philip Yang
> Date: Thu Apr 1 00:22:23 2021 -0400
>
> drm/amdgpu: reserve fence slot to update page
Video demonstration: https://youtu.be/3nkvUeB0GSw
Here is how the kernel traces look.
1.
[ 7315.156460] amdgpu :0b:00.0: amdgpu: [mmhub] page fault
(src_id:0 ring:0 vmid:6 pasid:32779, for process obs pid 23963 thread
obs:cs0 pid 23977)
[ 7315.156490] amdgpu :0b:00.0: amdgpu: in page starting at
On Wed, 7 Apr 2021 at 15:46, Christian König
wrote:
>
> What hardware are you using
$ inxi -bM
System:Host: fedora Kernel: 5.12.0-0.rc6.184.fc35.x86_64+debug
x86_64 bits: 64 Desktop: GNOME 40.0
Distro: Fedora release 35 (Rawhide)
Machine: Type: Desktop Mobo: ASUSTeK model: ROG
Hi!
During the 5.12 testing cycle I observed a repeatable bug when
launching graphics-heavy applications.
The kernel log is flooded with the message "Unexpected multihop in
swaput - likely driver bug.".
Trace:
[ 8707.814899] [ cut here ]
[ 8707.814920] Unexpected multihop
On Tue, 9 Mar 2021 at 07:31, Hillf Danton wrote:
> At the first glance, the zero pointer goes out of the box of race because
>
> 1/ the Call Trace shows it is the free path (of the supposed race victim),
>
> 2/ on the race winner side however either list_del or list_del_init
>would not leave
On Fri, 5 Mar 2021 at 19:22, Hillf Danton wrote:
>
> Yes, it is the same race as we saw before. But after cutting the race
> between pool->stale_lock and pool->lock with the patch above, the race
> between the free path and isolate/putback path came up.
>
> Try the diff below in combination with
On Mon, 1 Mar 2021 at 08:11, Hillf Danton wrote:
>
> What we learn from your reports is
>
> 1/ in z3fold_free(), kref_put() creates the ground zero for the race
> cases reported,
>
> 2/ the stale_lock in combination with lock makes things more
> complicated than thought.
>
> Instead of dropping
On Sat, 13 Feb 2021 at 08:03, Hillf Danton wrote:
>
> The comment below shows a race instance, though I failed to put things
> together to see how within two hours. Cut it and see what will come up.
>
> --- a/mm/z3fold.c
> +++ b/mm/z3fold.c
> @@ -1129,19 +1129,22 @@ retry:
> page = NULL;
On Tue, 26 Jan 2021 at 13:28, Hillf Danton wrote:
>
>
> BTW better run the reproducer again with KASAN enabled.
>
It happened today again with kernel 5.11 rc7 (e0756cfc7d7c)
Why not try your patch?
list_del corruption, def70143e848->next is LIST_POISON1 (dead0100)
[ cut
Hi folks.
During the 5.11 test cycle I caught a rare but repeatable problem:
after about a day of uptime, "BUG at mm/zswap.c:1275!" appears. I still
have no idea how to reproduce it, but maybe the authors of this code
could explain what happens here?
$ grep "mm/zswap.c" dmesg*.txt
On Mon, 8 Feb 2021 at 14:18, Christian König
wrote:
>
> Are the other problems gone as well?
>
Yes and no.
The issue with the monitor turning off was gone after rc6 (git 3aaf0a27ffc2).
But both traces
1) BUG: sleeping function called from invalid context at
include/linux/sched/mm.h:196 (kernel 5.11
On Sun, 31 Jan 2021 at 22:22, Christian König
wrote:
>
>
> Yeah, known issue. I already pushed Michel's fix to drm-misc-fixes.
> Should land in the next -rc by the weekend.
>
> Regards,
> Christian.
I tested this patch [1] for several days,
and I can confirm that the reported issue is gone.
Hi folks.
On 5.11-rc6 (git 3aaf0a27ffc2) I caught a new issue.
For an unknown reason, sound disappeared in my HyperX Orbit S headset.
After I reconnected it to another USB port, the headset was no longer
detected as a USB device in dmesg,
and a KASAN: use-after-free bug record appeared in the log.
The 5.11-rc5 (git 76c057c84d28) brought a new issue.
Now the kernel log is flooded with the message "page allocation failure".
Trace:
msedge:cs0: page allocation failure: order:10,
mode:0x190cc2(GFP_HIGHUSER|__GFP_NORETRY|__GFP_NOMEMALLOC),
nodemask=(null),cpuset=/,mems_allowed=0
CPU: 18 PID:
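
For readers unfamiliar with the "order" field in such messages: the buddy allocator hands out blocks of 2^order physically contiguous pages. A minimal sketch of the arithmetic, assuming 4 KiB base pages (the usual x86_64 value; the constant and the helper name are illustrative, not taken from the report):

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages, as is typical on x86_64


def order_to_bytes(order: int, page_size: int = PAGE_SIZE) -> int:
    """Return the size in bytes of an allocation of the given order."""
    # An order-N request asks for 2**N physically contiguous pages.
    return (1 << order) * page_size


print(order_to_bytes(10))  # 4194304 bytes, i.e. 4 MiB
```

Under that assumption, an order:10 failure means the kernel could not find 4 MiB of contiguous physical memory, which can happen under fragmentation even when plenty of memory is free in total.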
On Sun, 24 Jan 2021 at 23:23, Mikhail Gavrilov
wrote:
>
> Thanks for looking at the issue.
> Why is the proposed patch not intended for testing?
> Is it not the final (optimal) variant?
>
>
> --
> Best Regards,
> Mike Gavrilov.
With disabled kasan I got slightly dif
On Thu, 21 Jan 2021 at 18:27, Christian König wrote:
>
> I still have no idea what's going on here.
>
> The KASAN messages from the DC code are completely unrelated.
>
> Please add the full dmesg to your bug report.
>
I did it.
https://gitlab.freedesktop.org/drm/amd/-/issues/1439#note_776267
--
On Sun, 24 Jan 2021 at 16:11, Hillf Danton wrote:
>
> If it is supposed due to the race between pool->stale_lock and
> pool->lock that are both protecting the buddy list_head then adding
> another one can be a cure. The diff below is not for any test.
Thanks for looking at the issue.
Why the
Hi folks,
I am testing new kernels under high load and KASAN found some problems:
BUG: KASAN: use-after-free in __list_add_valid+0x81/0xa0
Read of size 8 at addr 8881f2cda008 by task ThreadPoolForeg/110220
CPU: 22 PID: 110220 Comm: ThreadPoolForeg Tainted: GW
- ---
On Fri, 15 Jan 2021 at 03:43, Mikhail Gavrilov
wrote:
>
In rc4, the number of warnings has dropped dramatically.
No more errors "kasan slab-out-of-bounds" and no "DMA-API device
driver failed to check map error".
But still not fixed "sleeping function called from inva
On Thu, 14 Jan 2021 at 18:56, Christian König wrote:
> Unfortunately not off hand.
>
> I also don't see any bug reports from other people and can't reproduce
> the last backtrace you send out TTM here.
Because only the most desperate will install kernels with enabled
debug flags and then load the
On Tue, 12 Jan 2021 at 01:45, Christian König wrote:
>
> But what you have in your logs so far are only unrelated symptoms, the
> root of the problem is that somebody is leaking memory.
>
> What you could do as well is to try to enable kmemleak
I captured some memleaks.
Do they contain any
Hi Christian,
On Tue, 12 Jan 2021 at 01:45, Christian König wrote:
>
> Hi Mike,
>
> Unfortunately not, that's DC stuff. Easiest is to assign this as a bug
> tracker to our DC team.
Ok
> At least some progress. Any objections that I add your e-mail address as
> tested-by tag?
Yes, feel free to add
On Mon, 11 Jan 2021 at 19:01, Christian König wrote:
> Changing the page table attributes while releasing memory might sleep.
> So we can't use a spinlock here.
>
> Thanks for the report, a patch to fix this is on the mailing list now.
Can you also look at the first trace?
Here is the same error
Hi folks,
today I joined the testing of kernel 5.11 and saw that the kernel log was
flooded with BUG messages:
BUG: sleeping function called from invalid context at mm/vmalloc.c:1756
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 266, name: kswapd0
INFO: lockdep is turned off.
CPU: 15 PID: 266
Hi folks!
I started to see this message on every boot after replacing the Radeon VII with a 6900XT.
$ journalctl | grep "BUG: key"
Dec 31 05:19:42 localhost.localdomain kernel: BUG: key
98b59ab01148 has not been registered!
Dec 31 05:25:44 localhost.localdomain kernel: BUG: key
8d425ba01148 has not
On Tue, 29 Dec 2020 at 20:15, Deucher, Alexander
wrote:
>
> It looks like the driver is not able to access the firmware for some reason.
> Please make sure it is available in your initrd or compiled into the kernel
> depending on your config.
Exactly! Thanks!
# lsinitrd
On Sun, 27 Dec 2020 at 21:39, Mikhail Gavrilov
wrote:
> I suppose the root cause of my problem is here:
>
> [3.961326] amdgpu :0b:00.0: Direct firmware load for
> amdgpu/sienna_cichlid_sos.bin failed with error -2
> [3.961359] amdgpu :0b:00.0: amdgpu: failed to in
Hi folks.
I observed a hard-to-reproduce set of bugs.
It always starts with:
1) kworker/u64:2: page allocation failure: order:5,
mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO),
nodemask=(null),cpuset=/,mems_allowed=0
Continues with:
2) WARNING: CPU: 21 PID: 806649 at
Hi folks.
I have observed this issue since 5.3 and it still happens with 5.10 git.
This warning reproduces 100% reliably when I launch
"Wolfenstein: Youngblood"; the version of Mesa doesn't matter.
[73690.883948] [ cut here ]
[73690.883953]
On Fri, 16 Oct 2020 at 12:11, Mikhail Gavrilov
wrote:
>
> Hi folks,
> today I joined the testing of kernel 5.10 and see that this warning
> appears on every boot:
>
> [ 22.180180] [ cut here ]
> [ 22.180193] WARNING: CPU: 28 PID: 1205 at
> net/netfi
On Fri, 16 Oct 2020 at 17:40, Peter Zijlstra wrote:
>
>
> Joy... __zram_bvec_write() and __zram_bvec_read() take these locks in
> opposite order.
>
> Does something like the (_completely_) untested below cure things?
Excellent! This patch (_completely_) cured all other warnings which
were
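
The problem Peter describes (two code paths taking the same pair of locks in opposite order) is the classic AB-BA deadlock pattern that lockdep warns about. The following is only a user-space illustration of the usual cure, a single global acquisition order; it is not the zram patch from this thread, and every name in it is hypothetical:

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
counter = 0


def with_both(first, second, action):
    """Run action() while holding both locks, acquired in one fixed order."""
    # Sort by id() so every caller takes the locks in the same order,
    # regardless of the order in which they were passed in.
    ordered = sorted((first, second), key=id)
    with ordered[0]:
        with ordered[1]:
            action()


def bump():
    global counter
    counter += 1


def write_path():
    # Hypothetical caller that "wants" A then B.
    for _ in range(1000):
        with_both(lock_a, lock_b, bump)


def read_path():
    # Hypothetical caller that "wants" B then A; still safe thanks to sorting.
    for _ in range(1000):
        with_both(lock_b, lock_a, bump)


threads = [threading.Thread(target=write_path), threading.Thread(target=read_path)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 2000
```

If each path instead took the locks in its own preferred order, the two threads could each hold one lock while waiting forever for the other.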
Hi folks,
today I joined the testing of kernel 5.10 and see that this warning
appears on every boot:
[ 22.180180] [ cut here ]
[ 22.180193] WARNING: CPU: 28 PID: 1205 at
net/netfilter/nf_tables_api.c:622 nft_chain_parse_hook+0x224/0x330
[nf_tables]
[ 22.180194] Modules
Hi folks,
today I joined the testing of kernel 5.10 and see that this warning
appears on every boot:
[9.032096] ==
[9.032097] WARNING: possible circular locking dependency detected
[9.032098]
Hi folks!
I have a question.
What happens when dd writes data to a missing device?
For example:
# dd
if=/home/mikhail/Downloads/Fedora-Workstation-Live-x86_64-Rawhide-20201010.n.0.iso
of=/dev/adb
Today I wrongly entered /dev/adb instead of /dev/sdb,
and what a surprise it was when the data
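
As background to the question: dd creates the output file when the path given to of= does not exist, so a mistyped device name yields an ordinary file rather than an error. A minimal sketch of a pre-flight check that could be run before such a write; the helper name is hypothetical and the paths are only examples:

```python
import os
import stat


def is_block_device(path: str) -> bool:
    """Return True only if path exists and is a block device node."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        # A missing path is exactly the mistyped-name case:
        # writing to it would create a regular file instead.
        return False
    return stat.S_ISBLK(mode)


target = "/dev/adb"  # the mistyped name from the message above
if not is_block_device(target):
    print(f"{target} is not a block device, refusing to write")
```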
Paolo, Jens, I am sorry for the noise.
But today I hit a kernel panic, and git blame says that you
created the file in which the panic happened (I saw this from the trace).
$ /usr/src/kernels/`uname -r`/scripts/faddr2line
/lib/debug/lib/modules/`uname -r`/vmlinux
__bfq_deactivate_entity+0x15a
On Mon, 13 Jul 2020 at 12:11, Mikhail Gavrilov
wrote:
>
> On Mon, 13 Jul 2020 at 03:28, Mikhail Gavrilov
> wrote:
> >
> > Hi folks.
> > While testing 5.8 RCs I found that the kernel log is flooded by the message
> > "WARNING: CPU: 28 PID: 211236 at fs/fuse/file.
On Mon, 13 Jul 2020 at 03:28, Mikhail Gavrilov
wrote:
>
> Hi folks.
> While testing 5.8 RCs I found that the kernel log is flooded by the message
> "WARNING: CPU: 28 PID: 211236 at fs/fuse/file.c:1684
> tree_insert+0xaf/0xc0 [fuse]" when I start a podman container.
> In kerne
On Fri, 10 Jul 2020 at 17:15, Alexander Tsoy wrote:
>
> You've probably hit this bug:
> https://bugzilla.kernel.org/show_bug.cgi?id=208353
>
The issue is no longer reproducible, at least as of commit 0bddd227f3dc
--
Best Regards,
Mike Gavrilov.
Hi folks.
While testing 5.8 RCs I found that the kernel log is flooded by the message
"WARNING: CPU: 28 PID: 211236 at fs/fuse/file.c:1684
tree_insert+0xaf/0xc0 [fuse]" when I start a podman container.
Kernel 5.7 does not have this problem.
[92414.864536] [ cut here ]
Beginning with 5.8-rc1 (git 69119673bd50), USB headsets (ASUS ROG
Delta and HyperX Cloud Orbit S) play sound as if in slow motion.
In 5.8-rc4 (git dcde237b9b0e) this is still not fixed.
Bisecting is problematic because rc1 also has another issue,
https://lkml.org/lkml/2020/6/22/21 which
On Mon, 22 Jun 2020 at 10:26, Mikhail Gavrilov
wrote:
>
> Hi folks.
> After upgrading the kernel to 5.8-rc1 (git 69119673bd50) my system stopped
> playing sound.
> In the kernel log, I see the message 'invalid opcode: [#1] SMP
> NOPTI', which is probably related to this issue.
The pro
Hi folks.
After upgrading the kernel to 5.8-rc1 (git 69119673bd50) my system stopped
playing sound.
In the kernel log, I see the message 'invalid opcode: [#1] SMP
NOPTI', which is probably related to this issue.
[ 19.076508] page:eb1b1dc14b00 refcount:1 mapcount:0
mapping:
Hi folks.
I didn’t do anything unusual, I just restarted the computer after the
update, launched all the applications that I usually launch and went
to drink tea.
When I returned, I found that the monitor was on (it should have
turned off since I had set the energy-saving mode for 5 minutes in DE)
On Wed, 4 Sep 2019 at 13:37, Daniel Vetter wrote:
>
> Extend your backtrace warning slightly like
>
> WARN(r, "we're stuck on fence %pS\n", fence->ops);
>
> Also adding Harry and Alex, I'm not really working on amdgpu ...
[ 3511.998320] [ cut here ]
[ 3511.998714]
On Tue, 4 Dec 2018 at 04:36, Bjorn Helgaas wrote:
>
> [Forwarding this to linux-pci since nobody really monitors the bugzilla]
>
> Possibly the same issue reported here:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=109691
> https://bugzilla.kernel.org/show_bug.cgi?id=111601
>
On Tue, 23 Jul 2019 at 10:08, Huang, Ying wrote:
>
> Thanks! I have found another (easier way) to reproduce the panic.
> Could you try the below patch on top of v5.2-rc2? It can fix the panic
> for me.
>
Thanks! Amazing work! The patch fixes the issue completely. The system
worked at a high
On Mon, 22 Jul 2019 at 12:53, Huang, Ying wrote:
>
> Yes. This is quite complex. Is the transparent huge page enabled in
> your system? You can check the output of
>
> $ cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
> And, whether is the swap device you use a SSD or
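
The output quoted above lists every available mode and marks the active one with square brackets, so "always [madvise] never" means madvise is in effect. A small sketch of extracting the active mode from that string; the function name is hypothetical, and the sysfs path in the comment is the one from the quoted command:

```python
import re


def active_thp_mode(text: str) -> str:
    """Return the bracketed (currently active) mode from the sysfs string."""
    match = re.search(r"\[(\w+)\]", text)
    if match is None:
        raise ValueError(f"no active mode marked in {text!r}")
    return match.group(1)


# On a live system the string would come from
# /sys/kernel/mm/transparent_hugepage/enabled; here it is the reply above.
print(active_thp_mode("always [madvise] never"))  # madvise
```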
On Mon, 22 Jul 2019 at 06:37, huang ying wrote:
>
> I am trying to reproduce this bug. Can you give me some information
> about your test case?
It is not easy, but I will try to explain:
1. I have a system with 32GB RAM and 64GB swap, and after boot I always
launch the following applications:
a. Google
On Mon, 17 Jun 2019 at 17:17, Vlastimil Babka wrote:
>
>
> You told bisect that 5.2-rc1 is good, but it probably isn't.
> What you probably need to do is:
> git bisect good v5.1
> git bisect bad v5.2-rc2
>
$ git bisect log
git bisect start
# good: [e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd] Linux
On Mon, 17 Jun 2019 at 17:17, Vlastimil Babka wrote:
>
> That's commit "tcp: fix retrans timestamp on passive Fast Open" which is
> almost certainly not the culprit.
Yes, I have also seen the content of this commit,
and it looks like madness.
But I can prove that my bisect was done properly.
Here I
Regards,
Mike Gavrilov.
On Tue, 11 Jun 2019 at 08:59, Mikhail Gavrilov
wrote:
>
> On Wed, 29 May 2019 at 23:09, Michal Hocko wrote:
> >
> >
> > Do you see the same with 5.2-rc1 resp. 5.1?
>
> I can say with 100% certainty that kernel tag 5.1 is not affected by this bug.
On Wed, 29 May 2019 at 23:09, Michal Hocko wrote:
>
> Do you see the same with 5.2-rc1 resp. 5.1?
The problem still occurs at 5.2-rc3.
Unfortunately, it is hard to reproduce, which does not allow me to bisect.
Any ideas what is wrong?
--
Best Regards,
Mike Gavrilov.
On Wed, 29 May 2019 at 23:09, Michal Hocko wrote:
>
> On Wed 29-05-19 22:32:08, Mikhail Gavrilov wrote:
> > On Wed, 29 May 2019 at 09:05, Mikhail Gavrilov
> > wrote:
> > >
> > > Hi folks.
> > > I am observed kernel panic after update to git ta
On Wed, 29 May 2019 at 09:05, Mikhail Gavrilov
wrote:
>
> Hi folks.
> I observed a kernel panic after updating to git tag 5.2-rc2.
> The crash happens under memory pressure when swap is being used.
>
> Unfortunately, only this was saved in journalctl:
>
Now I have captured a better trace.
:
Hi folks.
I observed a kernel panic after updating to git tag 5.2-rc2.
The crash happens under memory pressure when swap is being used.
Unfortunately, only this was saved in journalctl:
May 29 08:02:02 localhost.localdomain kernel: page:e9095823
refcount:1 mapcount:1 mapping:8f3ffeb36949
On Mon, 27 May 2019 at 21:16, Mikhail Gavrilov
wrote:
>
> I have bisected the issue. I hope it helps to understand what is happening
> on my computer.
>
> $ git bisect log
> git bisect start
> # good: [e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd] Linux
On Sat, 18 May 2019 at 16:07, Mikhail Gavrilov
wrote:
>
> It happened again today.
>
> [18018.969636] EXT4-fs error (device nvme0n1p2): ext4_find_extent:908:
> inode #8: comm jbd2/nvme0n1p2-: pblk 23101439 bad header/extent:
> invalid extent entries - magic f30a, entries 8, max 3
Hi folks.
Yesterday I updated the kernel to 5.2 (git commit 7e9890a3500d).
I always leave the computer running at night.
This morning I found that the computer had hung.
I connected via ssh and looked at the kernel log.
There I saw strange records which I had never seen before:
[28616.429757] EXT4-fs
On Mon, 15 Apr 2019 at 01:07, Mikhail Gavrilov
wrote:
>
>
> Thanks, with this patch the problem is gone.
> Do we have time to land it in 5.1?
>
I received an automated email with the following content:
> [This is an automated email]
>
> This commit has been processed because it contains
On Sun, 14 Apr 2019 at 22:51, Thomas Gleixner wrote:
> Because mails fall through the cracks occasionally.
>
> Does the patch below cure your problem?
>
> Thanks,
>
> tglx
>
> 8<--
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -426,6 +426,8 @@
at arch/x86/kernel/process.c:383
(inlined by) __speculation_ctrl_update at arch/x86/kernel/process.c:439
(inlined by) speculation_ctrl_update at arch/x86/kernel/process.c:482
--
Best Regards,
Mike Gavrilov.
On Sat, 2 Feb 2019 at 23:33, Mikhail Gavrilov
wrote:
>
> Hi folks.
> I at
On Wed, 30 Jan 2019 at 01:24, Michal Hocko wrote:
> I do not think so. I plan to repost tomorrow with the updated changelog
> and gathered review and tested-by tags. Can I assume yours as well?
Sure
--
Best Regards,
Mike Gavrilov.
> Linus, could you take the revert please?
>
> From 817b18d3db36a6900ca9043af8c1416c56358be3 Mon Sep 17 00:00:00 2001
> From: Michal Hocko
> Date: Fri, 25 Jan 2019 19:08:58 +0100
> Subject: [PATCH] Revert "mm, memory_hotplug: initialize struct pages for the
> full memory section"
>
> This
64 matches