Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500

2020-09-01 Thread 'Sebastian Andrzej Siewior'
On 2020-08-12 14:45:22 [+0200], Thomas Graziadei wrote: > Hi Sebastian, Hi Thomas, > any progress on your side? > > Do you think the patch could be applied for the next versions? Yes. The ->active_mm change needs to be protected against scheduling regardless of the arch/mmu. Otherwise the mm wi

Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500

2020-08-19 Thread 'Sebastian Andrzej Siewior'
On 2020-08-12 14:45:22 [+0200], Thomas Graziadei wrote: > Hi Sebastian, Hi Thomas, > any progress on your side? due to lack of time none. But I am on it… > Do you think the patch could be applied for the next versions? So I had a theory why it happens but then you said no so now I need to figur

Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500

2020-08-12 Thread Thomas Graziadei
> Subject: Re: Kernel crash due to memory corruption with v5.4.26-rt17 > and PowerPC e500 > > On 2020-05-29 18:3

RE: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500

2020-07-10 Thread Thomas Graziadei
Subject: Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500 On 2020-05-29 18:37:22 [+0200], To Mark Marshall wrote: > On 2020-05-29 18:15:18 [+0200], To Mark Marshall wrote: > > In order to get it back into the RT queue I need to understand why >

Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500

2020-07-06 Thread Sebastian Andrzej Siewior
On 2020-05-29 18:37:22 [+0200], To Mark Marshall wrote: > On 2020-05-29 18:15:18 [+0200], To Mark Marshall wrote: > > In order to get it back into the RT queue I need to understand why it is > > required. What exactly is it fixing. Let me stare at for a little… > > it used to be local_irq_disable(

Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500

2020-05-29 Thread Mark Marshall
My config is attached. This is the greatly reduced config that I used when trying to narrow down the problem. We normally have much more enabled, but that had no effect on the bug in my testing. We do, unfortunately, have quite a few out-of-tree patches, but they are all in USB or Networking, wh

Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500

2020-05-29 Thread Sebastian Andrzej Siewior
On 2020-05-29 18:15:18 [+0200], To Mark Marshall wrote: > In order to get it back into the RT queue I need to understand why it is > required. What exactly is it fixing. Let me stare at for a little… it used to be local_irq_disable() which then became preempt_disable() local_irq_disable() due to A

Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500

2020-05-29 Thread Sebastian Andrzej Siewior
On 2020-05-29 17:38:39 [+0200], Mark Marshall wrote: > Hi Sebastian & list, Hi, > I had assumed that my e-mail had got lost or overlooked, I was meaning to > post a follow up message this week... > > All I could find from the debugging and tracing that we added was that > something was going wron

Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500

2020-05-29 Thread Mark Marshall
Hi Sebastian & list, I had assumed that my e-mail had got lost or overlooked, I was meaning to post a follow up message this week... All I could find from the debugging and tracing that we added was that something was going wrong with the mm data structures somewhere in the exec code. In the end

Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500

2020-05-29 Thread Sebastian Andrzej Siewior
On 2020-05-04 11:40:08 [+0200], Mark Marshall wrote: > The easiest way we have found to reproduce the crash is to repeatedly > insert and then remove a module. The crash then appears to be related > to either paging in the module or in exiting the mdev process. (The > crash does also happen at ot

Re: Kernel crash after "mm: initialize pages on demand during boot"

2018-06-27 Thread Herton R. Krzesinski
On Wed, Jun 27, 2018 at 11:06:14AM -0400, Pavel Tatashin wrote: > On Wed, Jun 27, 2018 at 10:15 AM Herton R. Krzesinski > wrote: > > Thanks, I'll try it. It'll probably work since I tried memory_reserve > > but "hacking" it at e820__register_nosave_regions, anyway I'll confirm > > it here. > > I

Re: Kernel crash after "mm: initialize pages on demand during boot"

2018-06-27 Thread Pavel Tatashin
On Wed, Jun 27, 2018 at 10:15 AM Herton R. Krzesinski wrote: > Thanks, I'll try it. It'll probably work since I tried memory_reserve > but "hacking" it at e820__register_nosave_regions, anyway I'll confirm > it here. If it works, please send the new "Reserved but unavailable" value from dmesg: [

Re: Kernel crash after "mm: initialize pages on demand during boot"

2018-06-27 Thread Herton R. Krzesinski
On Wed, Jun 27, 2018 at 09:14:21AM -0400, Pavel Tatashin wrote: > It does look similar to what Naoya was seeing. Herton, have you tried > applying: > > http://ozlabs.org/~akpm/mmots/broken-out/x86-e820-put-e820_type_ram-regions-into-memblockreserved.patch Thanks, I'll try it. It'll probably work

Re: Kernel crash after "mm: initialize pages on demand during boot"

2018-06-27 Thread Pavel Tatashin
It does look similar to what Naoya was seeing. Herton, have you tried applying: http://ozlabs.org/~akpm/mmots/broken-out/x86-e820-put-e820_type_ram-regions-into-memblockreserved.patch Thank you, Pavel On Wed, Jun 27, 2018 at 3:35 AM Michal Hocko wrote: > > This smells like an issue Naoya was see

Re: Kernel crash after "mm: initialize pages on demand during boot"

2018-06-27 Thread Michal Hocko
This smells like an issue Naoya was seeing I guess (Cced) On Tue 26-06-18 17:41:52, Herton R. Krzesinski wrote: > Hi, > > While running proc01 test from ltp, or as I later found out if you just read > kpagecount ("cat /proc/kpagecount > /dev/null"), I started to get the > following > oops on la

Re: Kernel crash after "mm: initialize pages on demand during boot"

2018-06-26 Thread Herton R. Krzesinski
On Tue, Jun 26, 2018 at 04:50:27PM -0400, Pavel Tatashin wrote: > Herton, > > Thank you for analysis, could you please attach the config that you are using. Sure, the config file is attached. > > Thank you, > Pavel > On Tue, Jun 26, 2018 at 4:42 PM Herton R. Krzesinski > wrote: > > > > Hi, >

Re: Kernel crash after "mm: initialize pages on demand during boot"

2018-06-26 Thread Pavel Tatashin
Herton, Thank you for analysis, could you please attach the config that you are using. Thank you, Pavel On Tue, Jun 26, 2018 at 4:42 PM Herton R. Krzesinski wrote: > > Hi, > > While running proc01 test from ltp, or as I later found out if you just read > kpagecount ("cat /proc/kpagecount > /dev

Re: Kernel crash in free_pipe_info()

2017-11-10 Thread Linus Torvalds
On Thu, Nov 9, 2017 at 10:07 PM, Simon Brewer wrote: > > This looks familiar... > > https://github.com/moby/moby/issues/34472 > > From the bug report: > "In particular, it looks like either docker-containerd or > docker-containerd-shim (the log is cut off) has a pipe open that is > causing a kerne

Re: Kernel crash in free_pipe_info()

2017-11-10 Thread Cong Wang
Hi, Simon On Thu, Nov 9, 2017 at 10:07 PM, Simon Brewer wrote: > This looks familiar... > > https://github.com/moby/moby/issues/34472 > > From the bug report: > "In particular, it looks like either docker-containerd or > docker-containerd-shim (the log is cut off) has a pipe open that is > causin

Re: Kernel crash in free_pipe_info()

2017-11-09 Thread Simon Brewer
On 1 November 2017 at 14:19, Cong Wang wrote: > On Mon, Oct 30, 2017 at 7:08 PM, Linus Torvalds > wrote: >> On Mon, Oct 30, 2017 at 6:19 PM, Cong Wang wrote: >>> >>> 1. The faulty addresses are all near 0001, with one exception >>> of null (which is the most recent one) >> >> Well, t

Re: Kernel crash in free_pipe_info()

2017-10-31 Thread Cong Wang
On Mon, Oct 30, 2017 at 7:08 PM, Linus Torvalds wrote: > On Mon, Oct 30, 2017 at 6:19 PM, Cong Wang wrote: >> >> 1. The faulty addresses are all near 0001, with one exception >> of null (which is the most recent one) > > Well, they're at 8(%rax), except for that last case. > > And in

Re: Kernel crash in free_pipe_info()

2017-10-31 Thread Linus Torvalds
On Mon, Oct 30, 2017 at 9:44 PM, Al Viro wrote: > On Mon, Oct 30, 2017 at 07:08:46PM -0700, Linus Torvalds wrote: >> >> Well, they're at 8(%rax), except for that last case. > > 0x10(%rax)? Duh, yes. >> Except the offset is that %r12*0x28+0x10, so we're talking a byte >> offset of 330 bytes into

Re: Kernel crash in free_pipe_info()

2017-10-30 Thread Al Viro
On Mon, Oct 30, 2017 at 08:06:23PM -0700, Linus Torvalds wrote: > We do that "free_pipe_info(inode->i_pipe);", but we never actually > clear inode->i_pipe, so now we have an inode that looks like a pipe > inode, and has a stale pointer to a pipe_inode_info. > > It all looks technically correct. I

Re: Kernel crash in free_pipe_info()

2017-10-30 Thread Al Viro
On Mon, Oct 30, 2017 at 07:08:46PM -0700, Linus Torvalds wrote: > On Mon, Oct 30, 2017 at 6:19 PM, Cong Wang wrote: > > > > 1. The faulty addresses are all near 0001, with one exception > > of null (which is the most recent one) > > Well, they're at 8(%rax), except for that last case.

Re: Kernel crash in free_pipe_info()

2017-10-30 Thread Linus Torvalds
On Mon, Oct 30, 2017 at 7:08 PM, Linus Torvalds wrote: > > I'm not seeing anything that makes sense. I'll have to think about this. Al, would you mind taking a look at the error handling in create_pipe_files(). In particular, look here: - we start out allocating the inode with "get_pipe_inode(

Re: Kernel crash in free_pipe_info()

2017-10-30 Thread Linus Torvalds
On Mon, Oct 30, 2017 at 6:19 PM, Cong Wang wrote: > > 1. The faulty addresses are all near 0001, with one exception > of null (which is the most recent one) Well, they're at 8(%rax), except for that last case. And in every case (_including_ that last case), %rax has a very interestin

Re: Kernel crash in free_pipe_info()

2017-10-30 Thread Cong Wang
On Mon, Oct 30, 2017 at 3:14 PM, Linus Torvalds wrote: > On Mon, Oct 30, 2017 at 1:58 PM, Cong Wang wrote: >> >> We got more than a dozen of kernel crashes at free_pipe_info() on our >> 4.1 kernel, they are all very similar to this one (with slightly >> different faulty addresses): > > Were it no

Re: Kernel crash in free_pipe_info()

2017-10-30 Thread Cong Wang
On Mon, Oct 30, 2017 at 3:26 PM, Linus Torvalds wrote: > On Mon, Oct 30, 2017 at 3:14 PM, Linus Torvalds > wrote: >> On Mon, Oct 30, 2017 at 1:58 PM, Cong Wang wrote: >>> >>> We got more than a dozen of kernel crashes at free_pipe_info() on our >>> 4.1 kernel, they are all very similar to this o

Re: Kernel crash in free_pipe_info()

2017-10-30 Thread Linus Torvalds
On Mon, Oct 30, 2017 at 3:14 PM, Linus Torvalds wrote: > On Mon, Oct 30, 2017 at 1:58 PM, Cong Wang wrote: >> >> We got more than a dozen of kernel crashes at free_pipe_info() on our >> 4.1 kernel, they are all very similar to this one (with slightly >> different faulty addresses): > > Were it no

Re: Kernel crash in free_pipe_info()

2017-10-30 Thread Linus Torvalds
On Mon, Oct 30, 2017 at 1:58 PM, Cong Wang wrote: > > We got more than a dozen of kernel crashes at free_pipe_info() on our > 4.1 kernel, they are all very similar to this one (with slightly > different faulty addresses): Were it not for the pointer to the much more recent powerpc version at

Re: Kernel crash at boot time or reboot on ARM64/Hikey

2017-03-18 Thread Kirill A. Shutemov
On Thu, Mar 16, 2017 at 02:33:40PM +0100, Daniel Lezcano wrote: > On Wed, Mar 15, 2017 at 02:42:50PM -0700, Linus Torvalds wrote: > > Should be fixed in current Git already.. > > > > The v4.11-rc2 still has the issue. Have you tried actual master? There's fix after -rc2. -- Kirill A. Shutemov

Re: Kernel crash at boot time or reboot on ARM64/Hikey

2017-03-16 Thread Daniel Lezcano
On Thu, Mar 16, 2017 at 07:10:24AM -0700, Linus Torvalds wrote: > On Mar 16, 2017 6:33 AM, "Daniel Lezcano" wrote: > > On Wed, Mar 15, 2017 at 02:42:50PM -0700, Linus Torvalds wrote: > > Should be fixed in current Git already.. > > > > The v4.11-rc2 still has the issue. > > > Yes. The fix is i

Re: Kernel crash at boot time or reboot on ARM64/Hikey

2017-03-16 Thread Daniel Lezcano
On Wed, Mar 15, 2017 at 02:42:50PM -0700, Linus Torvalds wrote: > Should be fixed in current Git already.. > The v4.11-rc2 still has the issue. It appears at boot time or when rebooting and spits a lot of traces: [ ... ] ** 2467 printk messages dropped ** [ 11.218861] 8540: 8000223e8550

Re: Kernel crash at boot time or reboot on ARM64/Hikey

2017-03-15 Thread Daniel Lezcano
On Wed, Mar 15, 2017 at 02:42:50PM -0700, Linus Torvalds wrote: > Should be fixed in current Git already.. Ok, thanks. -- Daniel

Re: Kernel crash on startup - bisected to commit 3b24d854cb35

2016-04-11 Thread Kalle Valo
Larry Finger writes: >> Can you double check you have this fix ? >> >> commit 8501786929de4616b10b8059ad97abd304a7dddf >> Author: Eric Dumazet >> Date: Wed Apr 6 22:07:34 2016 -0700 >> >> tcp/dccp: fix inet_reuseport_add_sock() >> >> David Ahern reported panics in __inet_hash() cause

Re: Kernel crash on startup - bisected to commit 3b24d854cb35

2016-04-09 Thread Larry Finger
On 04/09/2016 12:33 AM, Eric Dumazet wrote: On Fri, Apr 8, 2016 at 10:28 PM, Larry Finger wrote: Following a recent pull of the wireless-drivers-next repo, my system got a kernel panic on startup at native_apic_msr_write+0x27. The problem was bisected to commit 3b24d854cb35 ("tcp/dccp: do not t

Re: Kernel crash on startup - bisected to commit 3b24d854cb35

2016-04-08 Thread Eric Dumazet
On Fri, Apr 8, 2016 at 10:28 PM, Larry Finger wrote: > Following a recent pull of the wireless-drivers-next repo, my system got a > kernel panic on startup at native_apic_msr_write+0x27. The problem was > bisected to commit 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt > under synflood"

Re: Kernel crash with "xen: avoid early crash of memory limited dom0"

2015-09-04 Thread Juergen Gross
On 09/04/2015 11:48 AM, Roger Pau Monné wrote: Hello, The following commit: commit fa84f27e21200dbe7d18c21af424d1de703b7567 Author: Juergen Gross Date: Wed Aug 19 18:52:34 2015 +0200 xen: avoid early crash of memory limited dom0 Which is queued in: git://git.kernel.org/pub/scm/linux/kerne

Re: kernel crash in bpf_jit on x86_64 when running nmap

2014-10-10 Thread Alexei Starovoitov
On Fri, Oct 10, 2014 at 1:44 PM, Darrick J. Wong wrote: > Hi everyone, > > I was running nmap on a x86_64 qemu guest and experienced the following crash: > > # nmap -sS -O -vvv 192.168.122.1 > Starting Nmap 6.40 ( http://nmap.org ) at 2014-10-10 13:14 PDT > Initiating ARP Ping Scan at 13:14 > Scan

Re: Kernel crash in cgroup_pidlist_destroy_work_fn()

2014-09-18 Thread Cong Wang
Hi, Zefan Thanks for your reply. You are right, vfs refcount should guarantee us there is no more file read before we destroy that cgroup. I thought there is somewhere else could release that cgroup refcount. Maybe I didn't make it clear, this bug is hardly reproducible, we only saw it once (oth

Re: Kernel crash in cgroup_pidlist_destroy_work_fn()

2014-09-17 Thread Li Zefan
On 2014/9/17 13:29, Li Zefan wrote: > On 2014/9/17 7:56, Cong Wang wrote: >> Hi, Tejun >> >> >> We saw some kernel null pointer dereference in >> cgroup_pidlist_destroy_work_fn(), more precisely at >> __mutex_lock_slowpath(), on 3.14. I can show you the full stack trace >> on request. >> > > Yes,

Re: Kernel crash in cgroup_pidlist_destroy_work_fn()

2014-09-16 Thread Li Zefan
On 2014/9/17 7:56, Cong Wang wrote: > Hi, Tejun > > > We saw some kernel null pointer dereference in > cgroup_pidlist_destroy_work_fn(), more precisely at > __mutex_lock_slowpath(), on 3.14. I can show you the full stack trace > on request. > Yes, please. > Looking at the code, it seems flush_

Re: Kernel crash with lspci -vv and SAS Controller LSI Logic / Symbios Logic MegaRAID SAS 9240

2012-07-19 Thread stepping stone GmbH
Dear Adam It did not work (exact same behaviour) with a newer kernel (3.5-rc7). The problem was the firmware. After an upgrade it works; with older kernels too. Great thanks and sorry to bother you (all). Greetings David On 19.07.2012 02:39, adam radford wrote: On 7/18/12, stepping stone

Re: Kernel crash with lspci -vv and SAS Controller LSI Logic / Symbios Logic MegaRAID SAS 9240

2012-07-18 Thread adam radford
On 7/18/12, stepping stone GmbH wrote: ... > > a tail of lspci -vv http://pastebin.com/kjh8ig9q > PCI Config reads from lspci -vvv don't go through the megaraid_sas driver itself. It looks like your system hung up while trying to do a PCI Config read of Capabilities 0xd0: VPD (Vital Product Data

Re: Kernel crash in 2.6.21

2008-01-29 Thread Kristoffer Ericson
On Tue, 29 Jan 2008 12:38:43 -0500 Jerry Geis <[EMAIL PROTECTED]> wrote: > Below is a kernel crash for 2.6.21 > > The kernel runs for a number of days/weeks and then the crash below. > I am not running X windows. Just a server. Without knowing much about the issue I would suggest you compile a f

Re: Kernel crash problem and Madwifi

2005-03-24 Thread Erik Mouw
On Thu, Mar 24, 2005 at 08:35:49PM +0530, govind raj wrote: > kernel version:2.4.29 > board :net4521 > wireless card : Atheros-5212 > driver madwifi-cvs-current.tar.bz2(v 1.30 2005/02/22) According to the FAQ at http://www.mattfoster.clara.co.uk/madwifi-1.htm#2 , Madwifi is based on a binary-onl

Re: Kernel crash using NFSv3 on 2.4.4

2001-04-30 Thread Trond Myklebust
> " " == Steffen Persvold <[EMAIL PROTECTED]> writes: > Hi all, I have compiled a stock 2.4.4 kernel and applied SGI's > kdb patch v1.8. Most of the time this runs just fine, but one > time when I tried to copy a file from a NFS server I got a > kernel fault. Luckily it ju

Re: kernel crash

2001-04-13 Thread nak
Ok, I see this has started to interfere with the bug tracking process so I'll kill it now (I seriously thought it was already dead). Yep, it was an April Fools' joke. For the record: 2 offers for help; 1 post asking me if I mounted a scratch monkey (heh); 1 email begging me to abandon t

Re: kernel crash

2001-04-12 Thread Russell King
On Thu, Apr 12, 2001 at 05:16:33PM +0200, Cyrille Ngalle wrote: > This is just to reinforce the message below. And why is it of interest to LKML? I can think of no one here who'd be interested in it. > This crash is very easy to reproduce. > > Use bootldr (with the last patch from Nico) [it al

RE: kernel crash

2001-04-12 Thread Cyrille Ngalle
Hi all !! This is just to reinforce the message below. This crash is very easy to reproduce. Use bootldr (with the last patch from Nico) [it also happens with Redboot] + Linux 2.4.0 + patch rmk2 + diff rmk2-np2 + a ramdisk. Once logged in Linux, type these commands : SHELL> mknod /dev/dsp c 1

Re: kernel crash

2001-04-02 Thread Wayne . Brown
I hope you were using a scratch monkey... - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/

Re: Kernel crash - reboot or hang

2001-03-08 Thread Keith Owens
On Thu, 8 Mar 2001 16:17:23 +0200, Mircea Damian <[EMAIL PROTECTED]> wrote: >I had two crashes with 2.4.2 and 2.4.2-pre2 on my local SMTP/POP3/SAMBA/WWW >server (once under some load and the second one - with 2.4.2-pre2 - while >it was almost idle). >Should I use kdb or just remote logging would

Re: Kernel crash during resync of raid5 on SMP

2001-03-08 Thread Neil Brown
On Thursday March 8, [EMAIL PROTECTED] wrote: > On Thu, 8 Mar 2001 08:55:28 +1100 (EST), Neil Brown wrote: > > >On Wednesday March 7, [EMAIL PROTECTED] wrote: > >> I run a Dual processor SMP system on 2.4.2-ac12 for a while > >> in degraded mode. Today I put in a new disk to switch to > >> full r

Re: Kernel crash - reboot or hang

2001-03-08 Thread Chris Mason
On Thursday, March 08, 2001 04:17:23 PM +0200 Mircea Damian <[EMAIL PROTECTED]> wrote: > > Hello, > > I NEED TO TRACE THIS!!! > > I had two crashes with 2.4.2 and 2.4.2-pre2 on my local > SMTP/POP3/SAMBA/WWW server (once under some load and the second one - > with 2.4.2-pre2 - while it was a

Re: Kernel crash during resync of raid5 on SMP

2001-03-08 Thread Otto Meier
On Thu, 8 Mar 2001 08:55:28 +1100 (EST), Neil Brown wrote: >On Wednesday March 7, [EMAIL PROTECTED] wrote: >> I run a Dual processor SMP system on 2.4.2-ac12 for a while >> in degraded mode. Today I put in a new disk to switch to >> full raid5 mode. Shortly after the command raidhotadd the >> s

Re: Kernel crash during resync of raid5 on SMP

2001-03-07 Thread Neil Brown
On Wednesday March 7, [EMAIL PROTECTED] wrote: > I run a Dual processor SMP system on 2.4.2-ac12 for a while > in degraded mode. Today I put in a new disk to switch to > full raid5 mode. Shortly after the command raidhotadd the > system crashed with the message lost interrupt on cpu1. Was there

Re: Kernel crash problem

2001-02-12 Thread Alan Cox
> appears can not be closed and I get a lot of data lost. > I have redhat 7;kernel 2.4.1,pentium II. > What can I do against that?? Does it do this with a 2.2 kernel ? And what hardware do you have