Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop

2019-01-16 Thread Gautham R Shenoy
Hello Michael,

On Mon, Jan 14, 2019 at 12:11:44PM -0600, Michael Bringmann wrote:
> On 1/9/19 12:08 AM, Gautham R Shenoy wrote:
>
> > I did some testing during the holidays. Here are the observations:
> >
> > 1) With just your patch (without any additional debug patch), if I run
> > DLPAR on

Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop

2019-01-14 Thread Michael Bringmann
On 1/9/19 12:08 AM, Gautham R Shenoy wrote:
> I did some testing during the holidays. Here are the observations:
>
> 1) With just your patch (without any additional debug patch), if I run
> DLPAR on/off operations on a system that has SMT=off, I am able to
> see a crash involving RTAS stack

Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop

2019-01-08 Thread Gautham R Shenoy
Hello Thiago,

Wish you a happy 2019!

On Sat, Dec 08, 2018 at 12:40:52AM -0200, Thiago Jung Bauermann wrote:
>
> Gautham R Shenoy writes:
> > On Fri, Dec 07, 2018 at 04:13:11PM +0530, Gautham R Shenoy wrote:
> >> Sure. I will test the patch and report back.
> >
> > I added the following debug

Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop

2018-12-11 Thread Michael Bringmann
Note from Scott Mayes on latest crash:

Michael,

Since the partition crashed, I was able to get the last .2 seconds worth of RTAS call trace leading up to the crash. Best I could tell from that bit of trace was that the removal of a processor involved the following steps:

-- Call to stop-self

Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop

2018-12-10 Thread Thiago Jung Bauermann
Hello Michael,

Michael Bringmann writes:
> I have asked Scott Mayes to take a look at one of these crashes from
> the phyp side. I will let you know if he finds anything notable.

Thanks! It might make sense to test whether booting with cede_offline=off makes the bug go away. One suspicion

Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop

2018-12-10 Thread Michael Bringmann
I have asked Scott Mayes to take a look at one of these crashes from the phyp side. I will let you know if he finds anything notable.

Michael

On 12/07/2018 08:40 PM, Thiago Jung Bauermann wrote:
>
> Gautham R Shenoy writes:
>> On Fri, Dec 07, 2018 at 04:13:11PM +0530, Gautham R Shenoy wrote:

Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop

2018-12-07 Thread Thiago Jung Bauermann
Gautham R Shenoy writes:
> On Fri, Dec 07, 2018 at 04:13:11PM +0530, Gautham R Shenoy wrote:
>> Sure. I will test the patch and report back.
>
> I added the following debug patch on top of your patch, and after an
> hour's run, the system crashed. Appending the log at the end.

Thank you very

Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop

2018-12-07 Thread Gautham R Shenoy
On Fri, Dec 07, 2018 at 04:13:11PM +0530, Gautham R Shenoy wrote: > Hi Thiago, > > > Sure. I will test the patch and report back. I added the following debug patch on top of your patch, and after an hour's run, the system crashed. Appending the log at the end. I suppose we still need to

Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop

2018-12-07 Thread Gautham R Shenoy
Hi Thiago,

On Thu, Dec 06, 2018 at 03:28:17PM -0200, Thiago Jung Bauermann wrote:
[..snip..]
>
> I posted a similar patch last year, but I wasn't able to arrive at a
> root cause analysis like you did:
>
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-February/153734.html

Ah! Nice.

Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop

2018-12-06 Thread Thiago Jung Bauermann
Hello Gautham,

Gautham R. Shenoy writes:
> From: "Gautham R. Shenoy"
>
> Currently running DLPAR offline/online operations in a loop on a
> POWER9 system with SMT=off results in the following crash:
>
> [ 223.321032] cpu 112 (hwid 112) Ready to die...
> [ 223.355963] Querying DEAD? cpu 113
