[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-05-24 Thread bugproxy
--- Comment From chngu...@us.ibm.com 2018-05-24 18:16 EDT--- (In reply to comment #259) > In bug #167562, Canonical reports that these fixes have been put in > bionic-proposed (assumed to mean linux-image-4.15.0-23-generic). We need to > test this ASAP in order to prevent the patches from b

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-05-24 Thread bugproxy
--- Comment From dougm...@us.ibm.com 2018-05-24 14:37 EDT--- In bug #167562, Canonical reports that these fixes have been put in bionic-proposed (assumed to mean linux-image-4.15.0-23-generic). We need to test this ASAP in order to prevent the patches from being reverted. Can we get the

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-05-21 Thread bugproxy
--- Comment From dougm...@us.ibm.com 2018-05-21 13:20 EDT--- *** Bug 168018 has been marked as a duplicate of this bug. *** -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1762844 T

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-05-11 Thread bugproxy
--- Comment From dougm...@us.ibm.com 2018-05-02 14:39 EDT--- The SAN incident in the previous dmesg log shows only a single port (WWPN) glitching. The logs from panics showed two ports glitching at the same time. Also, this incident did not show the port logging back in for about 8 minute

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-05-11 Thread bugproxy
--- Comment From dougm...@us.ibm.com 2018-05-11 12:12 EDT--- Some information coming in on the SAN where this reproduces. It appears that there is some undesirable configuration, where fast switches are backed by slower switches between host and disks. The current theory is that other ac

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-05-10 Thread bugproxy
--- Comment From dougm...@us.ibm.com 2018-05-10 14:13 EDT--- Being able to reproduce this on ltc-boston113 seems to have been a temporary condition. I can no longer reproduce there, Pegas or Ubuntu. Without some idea of what external conditions are causing this, it will be very difficult

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-05-10 Thread bugproxy
--- Comment From dougm...@us.ibm.com 2018-05-10 12:59 EDT--- I have had some luck reproducing this, on ltc-boston113 (previously unable to reproduce there). I had altered the boot parameters to remove "quiet splash" and added "qla2xxx.logging=0x1e40", and got the kworker panic during

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-05-09 Thread bugproxy
--- Comment From dougm...@us.ibm.com 2018-05-09 11:34 EDT--- There was a period of SAN instability observed on boslcp1 this morning, at about May 9 05:01:28 to 05:51:56. This involved 2 ports simultaneously handling relogins. This was a Pegas kernel that should be susceptible to the pan

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-05-08 Thread bugproxy
--- Comment From dougm...@us.ibm.com 2018-05-08 12:09 EDT--- It appears that there were some SAN incidents yesterday on boslcp3, approx. times were May 7 12:44:54 through 14:28:17. All were for one port, so not exactly the situation I think caused the panic. If we could correlate these S

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-05-07 Thread bugproxy
--- Comment From dougm...@us.ibm.com 2018-05-07 12:10 EDT--- Of the "boslcp" systems, only 3 appear to have QLogic adapters. Of those, one has been running without the extended error logging and so collected no data, and one has been down (or non-functional) for about 36 hours. Of the dat

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-05-05 Thread bugproxy
--- Comment From dougm...@us.ibm.com 2018-05-05 13:23 EDT--- The boslcp6 logs look characteristic of the qla2xxx issue (panic in process_one_work()). Don't have detailed qla2xxx logging so can't determine SAN disposition.

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-05-05 Thread bugproxy
--- Comment From cdead...@us.ibm.com 2018-05-05 10:31 EDT--- Yesterday, the decision was made at Padma's daily KVM meeting to only track System Firmware Mustfix issues using the LC GA1 Mustfix label since that is all that applies to the Supermicro team. The OS Kernel/KVM issues will be ma

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-05-04 Thread bugproxy
--- Comment From indira.pr...@in.ibm.com 2018-05-04 11:10 EDT--- We were not able to install the 'sar' package due to the 166588 prior patch. Also, 'xfs' was being used on the system from the prior run. To overcome both, we planned a fresh installation. Installed latest ubuntu1804 kernel(4.

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-05-03 Thread bugproxy
--- Comment From dougm...@us.ibm.com 2018-05-03 08:20 EDT--- There were a large number of SAN incidents in the evening, although none involved two ports at the same time. Still, many involved relogin while the logout was still being processed - so there is some confidence that the patches

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-05-03 Thread bugproxy
--- Comment From indira.pr...@in.ibm.com 2018-05-03 03:22 EDT--- boslcp3 host console dumps messages related to qlogic driver. Latest tee logs for boslcp3 host : kte111.isst.aus.stglabs.ibm.com 9.3.111.155 [kte/don2rry] kte111:/LOGS/boslcp3-host-may1.txt [ipjoga@kte (AUS) ~]$ ls -l /LOG

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-05-02 Thread bugproxy
--- Comment From dougm...@us.ibm.com 2018-05-02 14:39 EDT--- The SAN incident in the previous dmesg log shows only a single port (WWPN) glitching. The logs from panics showed two ports glitching at the same time. Also, this incident did not show the port logging back in for about 8 minute

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-05-02 Thread bugproxy
--- Comment From dougm...@us.ibm.com 2018-05-02 14:27 EDT--- Unfortunately, the current test run was executed without "dmesg -n debug" so the captured console output has no value. I corrected that, and so future console output should have what we need. The good news is that the dmesg buf

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-27 Thread bugproxy
--- Comment From indira.pr...@in.ibm.com 2018-04-27 08:14 EDT--- (In reply to comment #211) > - We have decided to replace the qlogic by Emulex. > - Apply the new kernel patch in 208. > - add the slub_debug=FZPU > System is up with latest kernel and ready now. > root@boslcp3:~# uname -a > L

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-27 Thread bugproxy
--- Comment From dougm...@us.ibm.com 2018-04-27 07:33 EDT--- (In reply to comment #211) > - We have decided to replace the qlogic by Emulex. > - Apply the new kernel patch in 208. > - add the slub_debug=FZPU > System is up with latest kernel and ready now. > root@boslcp3:~# uname -a > Linux

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-26 Thread bugproxy
--- Comment From chngu...@us.ibm.com 2018-04-26 20:38 EDT--- - We have decided to replace the qlogic by Emulex. - Apply the new kernel patch in 208. - add the slub_debug=FZPU System is up with latest kernel and ready now. root@boslcp3:~# uname -a Linux boslcp3 4.15.0-20-generic #21+bug16658

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-26 Thread bugproxy
--- Comment From mauri...@br.ibm.com 2018-04-26 13:57 EDT--- The skiroot kernel build is available at: http://dorno.rch.stglabs.ibm.com/~mauricfo/kernel/skiroot/bz166588/zImage.epapr_4.15.14-openpower1.bz166588c132 (In reply to comment #200) > Dwip and I talked, and we don't feel there is

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-26 Thread bugproxy
--- Comment From dnban...@us.ibm.com 2018-04-26 10:58 EDT--- I took a quick look at the crash stacks mentioned in c191-c193. Since we don't have a debug kernel for "4.15.0-15-generic #16+bug166588" I just looked at the stacks. From that it seems reasonable to draw the conclusion that these

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-26 Thread bugproxy
--- Comment From dougm...@us.ibm.com 2018-04-26 10:48 EDT--- The crashdumps that were collected are for a different/custom kernel. That kernel was built using the same name as the stock Ubuntu kernel, which causes more confusion. We need to have the dbgsym version of the kernel to analyze

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-26 Thread bugproxy
--- Comment From mauri...@br.ibm.com 2018-04-26 10:22 EDT--- (In reply to comment #194) > 3) Mauricio will use the information in [2] above to rebuild the Skiroot in > Bug 167103 comment 22, but with Dwip's patch replaced by the patches in [2]. > In other words, a Skiroot with the tlbie fix

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-26 Thread bugproxy
--- Comment From dnban...@us.ibm.com 2018-04-26 10:12 EDT--- I am attaching the FOUR commits identified before: === commit d8630bb95f46ea118dede63bd75533faa64f9612 Author: Quinn Tran Date: Thu Dec 28 12:33:43 2017 -0800 commit 9cd883f07a54e5301d5

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-26 Thread bugproxy
--- Comment From kla...@br.ibm.com 2018-04-26 09:59 EDT--- In the KVM Scrum discussion today, it was decided that: 1) Doug will jump on boslcp3 and reboot (multiple times if needed) in an attempt to reproduce the PETITBOOT issue described in comment 191 (process_one_work crash). Once in th

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-26 Thread bugproxy
--- Comment From indira.pr...@in.ibm.com 2018-04-26 04:18 EDT--- boslcp3 hit 166588 again. It was running with Guestavo's patch mentioned in c151. The system had been running for the last 40 hours, but we observed slowness yesterday evening. This morning it was not reachable and noticed

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-25 Thread bugproxy
--- Comment From dougm...@us.ibm.com 2018-04-25 17:09 EDT--- I have been trying to reproduce this using portdisable/portenable on the FC switch. So far, no problem seen. I made some runs with extra qla2xxx debug logging, and see the timing is not quite the same as seen on the fabric conn

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-25 Thread bugproxy
--- Comment From chngu...@us.ibm.com 2018-04-25 16:45 EDT--- (In reply to comment #181) > boslcp3 host is running with IO run on qlogic disks & stress-ng IO class > & 2 guests are running 30+ hours of stress run. boslcp3g4 guest is facing > out of network issue( updated bug#165570- c41 for

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-25 Thread bugproxy
--- Comment From dougm...@us.ibm.com 2018-04-25 10:20 EDT--- I was not able to find any suspect tasks in the 04/18 crashdump, aside what Dwip already mentioned. I found 3 tasks that were in __queue_work(), but all those target pools were currently empty so they did not exhibit the problem

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-25 Thread bugproxy
--- Comment From kla...@br.ibm.com 2018-04-25 09:04 EDT--- (In reply to comment #181) > boslcp3 host is running with IO run on qlogic disks & stress-ng IO class > & 2 guests are running 30+ hours of stress run. boslcp3g4 guest is facing > out of network issue( updated bug#165570- c41 for gu

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-25 Thread bugproxy
--- Comment From indira.pr...@in.ibm.com 2018-04-25 07:39 EDT--- boslcp3 host is running with IO run on qlogic disks & stress-ng IO class & 2 guests are running 30+ hours of stress run. boslcp3g4 guest is facing out of network issue( updated bug#165570- c41 for guest out of network issue)

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-24 Thread bugproxy
--- Comment From indira.pr...@in.ibm.com 2018-04-24 23:51 EDT--- (In reply to comment #174) > (In reply to comment #85) > > Copied the dump to our kte server > > > > kte111.isst.aus.stglabs.ibm.com 9.3.111.155 [kte/don2rry] > > > > kte111:/LOGS/boslcp3/BZ166588/ > > > > h# ls -l /LOGS/bosl

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-24 Thread bugproxy
--- Comment From dougm...@us.ibm.com 2018-04-24 13:48 EDT--- I was able to compile the upstream qla2xxx driver version 10.00.00.04-k (commit 1d1db6a3ca32ad52e97ed42d5c005d49fda7b589) under Ubuntu kernel 4.15.0-15-generic without errors or warnings. I have not tried it yet, but also can't

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-24 Thread bugproxy
--- Comment From dougm...@us.ibm.com 2018-04-24 12:50 EDT--- My attempts at running stress-ng on ltc-boston1 don't seem to use the QLogic disks. Are there options or config files needed to get it to stress certain disks?

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-24 Thread bugproxy
--- Comment From dougm...@us.ibm.com 2018-04-24 11:27 EDT--- (In reply to comment #85) > Copied the dump to our kte server > > kte111.isst.aus.stglabs.ibm.com 9.3.111.155 [kte/don2rry] > > kte111:/LOGS/boslcp3/BZ166588/ > > h# ls -l /LOGS/boslcp3/BZ166588/ > total 4 > drwxr-xr-x 2 root roo

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-24 Thread bugproxy
--- Comment From dougm...@us.ibm.com 2018-04-24 11:06 EDT--- I'm not up to speed on the double-free issue, but if multiple work queues/pools referenced the same work item, you could get a double free situation. Essentially, the qla2xxx driver doing the double (triple, ...) insertion of a

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-24 Thread bugproxy
--- Comment From dnban...@us.ibm.com 2018-04-24 10:52 EDT--- Obviously this corruption happened a while ago. I poked around a bit to see if there is any smoking gun around but nothing that meets the eye. Since we have been seeing all this in the context of other qla2xxx issues (where ther

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-24 Thread bugproxy
--- Comment From kla...@br.ibm.com 2018-04-24 10:42 EDT--- (In reply to comment #166) > this point, I don't see a connection to KVM or even Ubuntu vs. Pegas. This > appears to be something that will happen in any distro that has the right > vintage of qla2xxx driver. Not sure why we think t

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-24 Thread bugproxy
--- Comment From dnban...@us.ibm.com 2018-04-24 10:39 EDT--- Doug, I somehow missed that note about the dump. It is on boslcp3 (root/don2rry): /var/crash/201804181042 . I believe they may have mirrored it to some other location as well (I thought I saw a note about that, somewhere in this

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-24 Thread bugproxy
--- Comment From dougm...@us.ibm.com 2018-04-24 10:09 EDT--- The two commits that Dwip mentions look very pertinent. There may be others, though, as it appears that a fair amount of work has been done in this area. I still haven't gotten access to the dump(s), but another issue is tha

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-23 Thread bugproxy
--- Comment From dnban...@us.ibm.com 2018-04-24 00:12 EDT--- While at it, please pull in the following commit as well (whenever the next composite test kernel is being built)... ### commit eaf75d1815dad230dac2f1e8f1dc0349b2d50071 Author: Quinn Tran

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-23 Thread bugproxy
--- Comment From dnban...@us.ibm.com 2018-04-23 23:49 EDT--- I decided to take a look at qla2xxx driver's free and delete paths a little more since my gut feeling was that these kinds of issues have to be encountered by others too. Looking a little deeper I discovered these: (Note this was

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-23 Thread bugproxy
--- Comment From chngu...@us.ibm.com 2018-04-23 17:35 EDT--- (In reply to comment #154) > (In reply to comment #153) > Chanh, please also clarify the steps you're using on your test. We have a > Dev P9 system ready to start reproducing/debugging this (comment 144), we > need direction to h

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-23 Thread bugproxy
--- Comment From dougm...@us.ibm.com 2018-04-23 16:14 EDT--- Regarding the "init" condition of the work item in the crash analysis, besides INIT_WORK() this condition would also be present after using list_del_init(), which is done just prior to executing the work function. So, if multipl

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-23 Thread bugproxy
--- Comment From dougm...@us.ibm.com 2018-04-23 16:04 EDT--- Looking closer at the logs of the crash in comment #81, I see that there are 3 calls into qlt_unreg_sess(), for the same port, in a span of less than 2 seconds. Between the time that the first instance of qlt_free_session_done()

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-23 Thread bugproxy
--- Comment From kla...@br.ibm.com 2018-04-23 15:51 EDT--- (In reply to comment #153) > Current status of boslcp3 for record here: > root@boslcp3:~# uname -a > Linux boslcp3 4.15.15tst1 #4 SMP Sat Apr 21 16:57:31 CDT 2018 ppc64le > ppc64le ppc64le GNU/Linux > root@boslcp3:~# uptime > 14:32

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-23 Thread bugproxy
--- Comment From kla...@br.ibm.com 2018-04-23 15:30 EDT--- Chanh, please provide results of the testing with kernel in comment 151 --- Comment From chngu...@us.ibm.com 2018-04-23 15:35 EDT--- Current status of boslcp3 for record here: root@boslcp3:~# uname -a Linux boslcp3 4.15.15t

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-23 Thread bugproxy
--- Comment From dougm...@us.ibm.com 2018-04-23 09:58 EDT--- Regarding rcu_sched stalls, or any other manifestation of hangs, having a work item in this condition (next, prev point to itself) on an active worklist would effectively cause a linked-list loop. When a kworker thread reaches t
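One way a self-pointing entry can end up on an active worklist is sketched below in Python. This is a toy model under stated assumptions, not the kernel's workqueue code: `ListHead`, `walk`, and `demo` are hypothetical names, and the double insertion of the same item into two lists is the scenario suggested elsewhere in this thread, not something confirmed by this comment. The traversal is bounded so the example terminates; an unbounded walk would spin forever.

```python
class ListHead:
    """Hypothetical stand-in for struct list_head."""

    def __init__(self):
        self.next = self
        self.prev = self


def list_add_tail(new, head):
    """Link `new` at the tail of the list anchored at `head`."""
    prev = head.prev
    new.prev = prev
    new.next = head
    prev.next = new
    head.prev = new


def list_del_init(entry):
    """Unlink `entry` using its current pointers, then reinitialize it."""
    entry.prev.next = entry.next
    entry.next.prev = entry.prev
    entry.next = entry
    entry.prev = entry


def walk(head, limit=16):
    """Count entries on the list; return None if the walk never gets back to head."""
    steps = 0
    pos = head.next
    while pos is not head:
        steps += 1
        if steps > limit:
            return None          # traversal is stuck in a loop
        pos = pos.next
    return steps


def demo():
    pool_a = ListHead()
    pool_b = ListHead()
    work = ListHead()

    list_add_tail(work, pool_a)  # first insertion
    list_add_tail(work, pool_b)  # second insertion, without removal from pool_a
    list_del_init(work)          # pool_b dequeues the item before running it

    # pool_a still points at `work`, but `work` now points only at itself.
    return walk(pool_a), walk(pool_b)


print(demo())
```

In this model `pool_b` ends up empty and consistent, while a walk of `pool_a` reaches the work item and never returns to the list head, which matches the kind of hang described above.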

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-23 Thread bugproxy
--- Comment From indira.pr...@in.ibm.com 2018-04-23 09:46 EDT--- Latest update on boslcp3 boslcp3g1, boslcp3g4 guests are up & running for 32 hours without any hang/crash. boslcp3g3 guest run went fine for 24 hours but after that seen "nfs: server 10.33.11.31 not responding, timed out" m

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-22 Thread bugproxy
--- Comment From cha...@us.ibm.com 2018-04-22 11:52 EDT--- *** Bug 167045 has been marked as a duplicate of this bug. *** -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1762844 Tit

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-22 Thread bugproxy
--- Comment From dougm...@us.ibm.com 2018-04-22 11:43 EDT--- I think the fact that two threads are in qlt_free_session_done() for the same fcport at the same time is definitely a problem. I'm not sure how two different kworker pools could contain the same work item. If the same work item

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-22 Thread bugproxy
--- Comment From indira.pr...@in.ibm.com 2018-04-22 10:12 EDT--- (In reply to comment #130) > (In reply to comment #128) > > Machine still up... > > > > root@boslcp3:~# virsh list > > IdName State > > > > 8

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-22 Thread bugproxy
--- Comment From dnban...@us.ibm.com 2018-04-22 09:45 EDT--- In response to Klaus #124 ... This change is not related to anything before ...165988 or the reverted stuff. The fix only relates to the Qlogic adapter's delete/free work handling flow for session unregistration -it attempts to

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-22 Thread bugproxy
--- Comment From dnban...@us.ibm.com 2018-04-22 09:36 EDT--- Machine still up... root@boslcp3:~# virsh list IdName State 8 boslcp3g4 running 14boslcp3g3 runn

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-22 Thread bugproxy
--- Comment From kla...@br.ibm.com 2018-04-22 08:26 EDT--- (In reply to comment #121) > The kernel mentioned in #114 was a quick, rough attempt to force one instance > of free_work/del_work pending at a time. > > With that, the machine still seems to be up. > > root@boslcp3:~# uptime > 22:

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-21 Thread bugproxy
--- Comment From dnban...@us.ibm.com 2018-04-21 23:43 EDT--- The kernel mentioned in #114 was a quick, rough attempt to force one instance of free_work/del_work pending at a time. With that, the machine still seems to be up. root@boslcp3:~# uptime 22:21:50 up 3:41, 2 users, load averag

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-21 Thread bugproxy
--- Comment From dnban...@us.ibm.com 2018-04-21 23:33 EDT--- Continuing from the description at #84... This is going to be a long post. I debated putting it in as an attachment but placing it in the main body will probably help in searching in the future. ==

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-21 Thread bugproxy
--- Comment From dnban...@us.ibm.com 2018-04-21 21:30 EDT--- I gave a test kernel to Chanh to try out on boslcp3, based on the observations from the crash.(it has taken a while ...) - just a quick initial attempt. I will soon be posting the analysis. Meanwhile, boslcp3 still seems to be u

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-21 Thread bugproxy
--- Comment From chngu...@us.ibm.com 2018-04-21 20:09 EDT--- Dwip provided a new kernel and we start test on 3 guests. root@boslcp3:~# date Sat Apr 21 19:00:07 CDT 2018 root@boslcp3:~# uptime 19:00:09 up 20 min, 3 users, load average: 0.01, 0.42, 0.48 root@boslcp3:~# uname -a Linux boslcp

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-21 Thread bugproxy
--- Comment From dougm...@us.ibm.com 2018-04-21 19:17 EDT--- We need to see the console messages *before* things go wrong. Please capture the SOL console output from boot until this happens.

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-21 Thread bugproxy
--- Comment From kla...@br.ibm.com 2018-04-21 18:55 EDT--- (In reply to comment #80) > (In reply to comment #79) > > Machine still seems to be up... will check if I can observe anything > > interesting ... > > System just crashes it now. The vmcore is at /var/crash/201804181042 Can we retr

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-21 Thread bugproxy
--- Comment From chngu...@us.ibm.com 2018-04-21 17:02 EDT--- Not sure what is going on. The SOL console print out all of these messages... rcu_sched self-detected stall on CPU [20705.652053] 95-: (1 GPs behind) idle=c72/2/0 softirq=179/180 fqs=2586003 [20705.652101] (t=5172329 jiffie

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-21 Thread bugproxy
--- Comment From dougm...@us.ibm.com 2018-04-21 14:26 EDT--- I have ltc-boston1 set up with Ubuntu kernel 4.15.0-15, but there is no SAN connected to the QLE2742. I see no problem there right now. I have reserved the system for this bug until Monday evening.

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-21 Thread bugproxy
--- Comment From kla...@br.ibm.com 2018-04-21 09:21 EDT--- Should we go back to the stock Ubuntu kernel in an attempt to identify if bug 167104 is a result of the custom kernel or the newest PNOR? --- Comment From prad...@us.ibm.com 2018-04-21 13:17 EDT--- (In reply to comment #10

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-21 Thread bugproxy
--- Comment From chngu...@us.ibm.com 2018-04-20 16:48 EDT--- Boslcp3 is back with the new kernel from #94. root@boslcp3:~# cat /proc/cmdline root=UUID=bab108a0-d0a6-4609-87f1-6e33d0ad633c ro splash quiet crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M@128M I w

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-21 Thread bugproxy
--- Comment From dougm...@us.ibm.com 2018-04-21 08:45 EDT--- The latest logs show a panic in process_one_work() on CPU 145, some sort of NULL pointer fault, followed by 2 CPUs (22, 125) getting a "Bad interrupt in KVM entry/exit code, sig: 6" panic (possibly in response to the panic IPI).

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-20 Thread bugproxy
--- Comment From indira.pr...@in.ibm.com 2018-04-21 02:01 EDT--- Updated boslcp3 with latest PNOR:0420 & restarted tests on guests with kernel '4.15.0-18-generic'. $ ./ipmis bmc-boslcp3 fru print 47 Product Name : OpenPOWER Firmware Product Version : open-power-SUPERMICRO-

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-20 Thread bugproxy
--- Comment From prad...@us.ibm.com 2018-04-21 01:53 EDT--- Looks like an Oops similar to the previous one in comment#39 starting a sequence of events root@boslcp3:~# [ 2837.030181] Unable to handle kernel paging request for data at address 0x0008 [ 2837.030253] Faulting instruction

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-20 Thread bugproxy
--- Comment From dnban...@us.ibm.com 2018-04-20 16:33 EDT--- Klaus, I am not aware of the particular tests being run. But I pinged Chanh so that he can start a new round of tests. However ... I do see that boslcp3 now has reverted to the prior kernel: Linux boslcp3 4.13.0-25-generic #29-U

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-20 Thread bugproxy
--- Comment From bjki...@us.ibm.com 2018-04-20 15:11 EDT--- Below is a test kernel with the four QLogic commits that were added to the 4.15.0-15.16 kernel reverted, plus the patch from 166877. Please run this and update the bug if the crash is still seen. https://ibm.ent.box.com/s/n29ure

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-20 Thread bugproxy
--- Comment From kla...@br.ibm.com 2018-04-20 15:39 EDT--- Padma is reporting that the boslcp3 is available. Dwip, I think Indira won't be available at this time of the day. Can you jump in and try to reproduce with the debug kernel in comment #94 above? Thanks

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-19 Thread bugproxy
--- Comment From rajanikanth...@in.ibm.com 2018-04-19 03:51 EDT--- Copied the dump to our kte server kte111.isst.aus.stglabs.ibm.com 9.3.111.155 [kte/don2rry] kte111:/LOGS/boslcp3/BZ166588/ h# ls -l /LOGS/boslcp3/BZ166588/ total 4 drwxr-xr-x 2 root root 4096 Apr 19 02:42 201804181042 T

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-18 Thread bugproxy
--- Comment From kla...@br.ibm.com 2018-04-18 12:27 EDT--- Nick made some interesting comments about lockups in LTC bug 166684, comment #24 about the hard lockup watchdog being added in Kernel 4.13. Also other comments about RCU stall warnings being too aggressive, but at least in this l

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-16 Thread bugproxy
--- Comment From dougm...@us.ibm.com 2018-04-16 17:21 EDT--- We're waiting for a reproduce and a kdump. Also more logs, including firmware logs/eSELs/etc.

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-15 Thread bugproxy
--- Comment From indira.pr...@in.ibm.com 2018-04-16 01:24 EDT--- (In reply to comment #54) > Please collect the dmesg log and a crashdump. Collected dl logs from the xmon prompt, but was unable to take a crashdump from the xmon prompt; we have bug#10 opened. Regards, Indira

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-15 Thread bugproxy
--- Comment From dougm...@us.ibm.com 2018-04-15 16:36 EDT--- Please collect the dmesg log and a crashdump. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1762844 Title: ISST-LTE:

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-13 Thread bugproxy
--- Comment From dougm...@us.ibm.com 2018-04-13 14:51 EDT--- I believe that the "1" in c000200e5848b701 is a flag. The address actually used will be c000200e5848b700. The flags PAGE_MAPPING_ANON and/or PAGE_MAPPING_MOVABLE are added to page addresses, and are stripped off before dereferen
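The arithmetic behind that observation can be checked with a short Python sketch. The numeric flag values are taken from include/linux/page-flags.h in 4.15-era kernels (an assumption for other versions), and the helper names `strip_mapping_flags` and `decode_mapping_flags` are hypothetical, introduced only for this illustration.

```python
# Flag values as found in include/linux/page-flags.h around 4.15;
# treat them as an assumption for any other kernel version.
PAGE_MAPPING_ANON = 0x1
PAGE_MAPPING_MOVABLE = 0x2
PAGE_MAPPING_FLAGS = PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE


def strip_mapping_flags(raw):
    """Return the pointer value with the low flag bits cleared."""
    return raw & ~PAGE_MAPPING_FLAGS


def decode_mapping_flags(raw):
    """List the names of the flag bits set in the raw value."""
    names = []
    if raw & PAGE_MAPPING_ANON:
        names.append("PAGE_MAPPING_ANON")
    if raw & PAGE_MAPPING_MOVABLE:
        names.append("PAGE_MAPPING_MOVABLE")
    return names


raw = 0xc000200e5848b701
print(hex(strip_mapping_flags(raw)))   # 0xc000200e5848b700
print(decode_mapping_flags(raw))       # ['PAGE_MAPPING_ANON']
```

Because the pointed-to structures are aligned, the two low bits of a valid pointer are always zero, which is what makes it safe to carry flags there. An odd-looking address in a register dump is therefore not by itself evidence of corruption.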

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-13 Thread bugproxy
--- Comment From bjki...@us.ibm.com 2018-04-13 14:24 EDT--- Dwip - excellent suggestion, I agree with your suggestion on next steps. If this is a double free we need to catch that earlier than where we are crashing.

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-12 Thread bugproxy
--- Comment From indira.pr...@in.ibm.com 2018-04-12 11:10 EDT--- Hi, Today I tried rebooting the boslcp3 system and the crash issue was recreated. On the first attempt, after rebooting, the host booted with the latest kernel & I attempted the disable stop4, 5 commands; it then immediately crashed & enter

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-11 Thread bugproxy
--- Comment From chetj...@in.ibm.com 2018-04-12 01:24 EDT--- (In reply to comment #18) > Can you see if the bug happens with any of these mainline kernels? We can > perform a kernel bisect if we can narrow down to the last good kernel > version and first bad one: > > v4.14 Final: http://ke

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-11 Thread bugproxy
--- Comment From chetj...@in.ibm.com 2018-04-11 03:14 EDT--- (In reply to comment #16) > Can you test again on a third system? > Can this be a hw problem on the first system? No. This cannot be a hardware issue, since we have been running fine on the same system for the last 4 months with multiple

[Kernel-packages] [Bug 1762844] Comment bridged from LTC Bugzilla

2018-04-10 Thread bugproxy
--- Comment From cha...@us.ibm.com 2018-04-10 21:32 EDT--- According to test they have another bostonLC (boslcp4) and they did update to this kernel and system is booting up normally. root@boslcp4:~# uname -a Linux boslcp4 4.15.0-15-generic #16-Ubuntu SMP Wed Apr 4 13:57:51 UTC 2018 ppc64