Public bug reported:

== Comment: #0 - Application Cdeadmin  - 2018-03-19 09:30:53 ==
== Comment: #1 - Application Cdeadmin <> - 2018-03-19 09:30:55 ==
== Comment: #2 - Application Cdeadmin <> - 2018-03-19 09:30:57 ==
------- Comment From brihh 2018-03-19 09:10:30 EDT -------
Needless to say, machine is pretty unusable. Needed for Performance testing for 
release.

== Comment: #3 - Application Cdeadmin <> - 2018-03-19 10:10:59 ==
------- Comment From vaibhav92 2018-03-19 10:06:17 EDT -------
@pridhiviraj Any idea who can look into this ?
------- Comment From dougmill-ibm 2018-03-19 10:09:59 EDT -------
If machine is locked-up and console is unresponsive, try collecting eSEL data 
from the BMC.

== Comment: #4 - Application Cdeadmin <c> - 2018-03-19 10:40:59 ==
------- Comment From pridhiviraj 2018-03-19 10:38:47 EDT -------
@brihh Can you use latest 03/15 PNOR and re-create the issue. And also before 
it hangs please collect OPAL and kernel logs. @vaibhav92 If it is re-creatable 
OPAL/EM team need to look at it.

== Comment: #5 - Application Cdeadmin <> - 2018-03-19 11:01:58 ==
------- Comment From vaibhav92 2018-03-19 10:41:53 EDT -------
Saw this in the kernel-log of the system:

[ 1247.404962] PM: suspend entry (s2idle)
[ 1247.404970] PM: Syncing filesystems ... done.

Looks like its getting suspended after 20 mins of inactivity. Looking at the 
/etc/systemd/logind.conf see IdleAction by default is 'ignore':
#IdleAction=ignore
#IdleActionSec=30min

But clearly someone is issuing a suspend to the system. So this probably need 
to be looked by the distro/Power-Management/EM team
------- Comment From megcurry 2018-03-19 10:43:14 EDT -------
Please advise re. Assignments and Labels that would get the right team working 
on this.....Mirror label gets a Bz opened, and that is necessary or at least 
useful for some of the LTC teams to look at things, right?
------- Comment From brihh 2018-03-19 10:47:45 EDT -------
odd about inactivity - i once had a test running for 20mins and it still froze:

`08:52:23 up 20 min,  4 users,  load average: 64.40, 98.94, 64.32`
------- Comment From brihh 2018-03-19 10:49:30 EDT -------
@pridhiviraj  where is the latest 3/15 PNOR that I can load?

```

== Comment: #8 - Application Cdeadmin <> - 2018-03-19 12:40:54 ==
------- Comment From pridhiviraj 2018-03-19 12:34:18 EDT -------
```
Mar 19 11:35:01 p215n15 rsyslogd-2007: action 'action 13' suspended, next retry 
is Mon Mar 19 11:35:31 2018 [v8.16.0 try http://www.rsyslog.com/e/2007 ]
Mar 19 11:35:01 p215n15 CRON[6464]: (root) CMD (command -v debian-sa1 > 
/dev/null && debian-sa1 1 1)
Mar 19 11:42:39 p215n15 systemd[1]: Starting Cleanup of Temporary Directories...
Mar 19 11:42:39 p215n15 rsyslogd-2007: action 'action 13' suspended, next retry 
is Mon Mar 19 11:43:39 2018 [v8.16.0 try http://www.rsyslog.com/e/2007 ]
Mar 19 11:42:40 p215n15 systemd-tmpfiles[6504]: 
[/usr/lib/tmpfiles.d/var.conf:14] Duplicate line for path "/var/log", ignoring.
Mar 19 11:42:40 p215n15 systemd[1]: Started Cleanup of Temporary Directories.
Mar 19 11:45:01 p215n15 rsyslogd-2007: action 'action 13' suspended, next retry 
is Mon Mar 19 11:46:01 2018 [v8.16.0 try http://www.rsyslog.com/e/2007 ]
Mar 19 11:45:01 p215n15 CRON[6524]: (root) CMD (command -v debian-sa1 > 
/dev/null && debian-sa1 1 1)
Mar 19 11:47:39 p215n15 systemd[1]: Starting Message of the Day...
Mar 19 11:47:39 p215n15 rsyslogd-2007: action 'action 13' suspended, next retry 
is Mon Mar 19 11:48:39 2018 [v8.16.0 try http://www.rsyslog.com/e/2007 ]
Mar 19 11:47:40 p215n15 50-motd-news[6528]:  * Meltdown, Spectre and Ubuntu: 
What are the attack vectors,
Mar 19 11:47:40 p215n15 50-motd-news[6528]:    how the fixes work, and 
everything else you need to know
Mar 19 11:47:40 p215n15 50-motd-news[6528]:    - https://ubu.one/u2Know
Mar 19 11:47:40 p215n15 systemd[1]: Started Message of the Day.
Mar 19 11:48:02 p215n15 NetworkManager[4662]: <info>  [1521474482.7252] 
manager: sleep: sleep requested (sleeping: no  enabled: yes)
Mar 19 11:48:02 p215n15 NetworkManager[4662]: <info>  [1521474482.7259] 
manager: NetworkManager state is now ASLEEP
Mar 19 11:48:02 p215n15 gnome-shell[5538]: Screen lock is locked down, not 
locking
Mar 19 11:48:02 p215n15 whoopsie[5264]: [11:48:02] offline
Mar 19 11:48:02 p215n15 systemd[1]: Reached target Sleep.
Mar 19 11:48:02 p215n15 systemd[1]: Starting Suspend...
Mar 19 11:48:02 p215n15 systemd-sleep[6576]: Suspending system...
Mar 19 11:48:02 p215n15 kernel: [ 1246.906320] PM: suspend entry (s2idle)
```
@vaibhav92 You are right, from the above messages looks like system is 
suspending by some cron job i guess.

== Comment: #9 - Application Cdeadmin <> - 2018-03-19 13:30:57 ==
------- Comment From brihh 2018-03-19 13:28:36 EDT -------
@pridhiviraj interesting. know offhand how i can shut that off? seems a bit 
annoying .. :-)

== Comment: #11 - Application Cdeadmin <> - 2018-03-19 17:10:51 ==
------- Comment From mzipse 2018-03-19 16:52:11 EDT -------
At our daily defect call, it was suggested that we check to see if Opal-PRD is 
running, which is a prereq for the Firmware recovery to work properly.  
Opal-PRD is an app that should be part of the distro and should automatically 
be started.  Never-the-less, if you are want to check on it, here's the command 
to run at the OS..... 

sudo service opal-prd status

You'll need root authority to do this.

The output will look something like this.....
# sudo service opal-prd status
? opal-prd.service - OPAL PRD daemon
   Loaded: loaded (/lib/systemd/system/opal-prd.service; enabled; vendor 
preset: enabled)
   Active: active (running) since Wed 2018-03-14 19:04:36 CDT; 4 days ago
     Docs: man:opal-prd(8)
 Main PID: 5085 (opal-prd)
    Tasks: 1
   CGroup: /system.slice/opal-prd.service
           ??5085 /usr/sbin/opal-prd --pnor /dev/mtd0

Mar 18 19:52:21 ws003p1 opal-prd[5085]: HBRT: HTMGT:~[0x00d0] 1d000000 bd021c00 
0000bf02 1c000000    *................*
Mar 18 19:52:21 ws003p1 opal-prd[5085]: HBRT: HTMGT:~[0x00e0] c1021b00 0000c302 
1d000000 00030000    *................*
Mar 18 19:52:21 ws003p1 opal-prd[5085]: HBRT: HTMGT:~[0x00f0] 0000ff06 21465245 
51000206 16000000    *....!FREQ.......*
Mar 18 19:52:21 ws003p1 opal-prd[5085]: HBRT: HTMGT:~[0x0100] 5d08f600 00006008 
f6000000 6308f600    *].....`.....c...*
Mar 18 19:52:21 ws003p1 opal-prd[5085]: HBRT: HTMGT:~[0x0110] 00006608 f6000000 
6908f600 00006c08    *..f.....i.....l.*
Mar 18 19:52:21 ws003p1 opal-prd[5085]: HBRT: HTMGT:~[0x0120] f6000000 6f08f600 
00007208             *....o.....r.    *
Mar 18 19:52:21 ws003p1 opal-prd[5085]: HBRT: HTMGT:OCC1 rsp status=0x00, 
length=0x01CC
Mar 18 19:52:21 ws003p1 opal-prd[5085]: HBRT: HTMGT:rsp data: (up to 16 bytes)
Mar 18 19:52:21 ws003p1 opal-prd[5085]: HBRT: HTMGT:~[0x0000] 03100200 03000000 
00000000 00000000    *................*
Mar 18 19:52:21 ws003p1 opal-prd[5085]: HBRT: HTMGT:<<processOccError()

== Comment: #13 - Application Cdeadmin <> - 2018-03-19 17:30:51 ==
------- Comment From brihh 2018-03-19 17:22:06 EDT -------
@mzipse seems to be there and running:
```
root@p215n15:~# service opal-prd status
? opal-prd.service - OPAL PRD daemon
   Loaded: loaded (/lib/systemd/system/opal-prd.service; enabled; vendor 
preset: enabled)
   Active: active (running) since Mon 2018-03-19 18:19:58 EDT; 1min 35s ago
     Docs: man:opal-prd(8)
 Main PID: 4648 (opal-prd)
    Tasks: 1 (limit: 27033)
   CGroup: /system.slice/opal-prd.service
           ??4648 /usr/sbin/opal-prd

Mar 19 18:19:58 p215n15 opal-prd[4648]: SCOM: read: chip 0x8, addr 0x8010a50, 
val 0x0, rc 0
Mar 19 18:19:58 p215n15 opal-prd[4648]: SCOM: read: chip 0x8, addr 0x8010a54, 
val 0x0, rc 0
Mar 19 18:19:58 p215n15 opal-prd[4648]: HBRT: PRDF:<<PRDF::noLock_initialize()
Mar 19 18:19:58 p215n15 opal-prd[4648]: HBRT: PRDF:<<PRDF::initialize()
Mar 19 18:19:58 p215n15 opal-prd[4648]: HBRT: 
ATTN_SLOW:I>Service::enableAttns() enter
Mar 19 18:19:58 p215n15 opal-prd[4648]: HBRT: 
ATTN_SLOW:I>Service::enableAttns() exit
Mar 19 18:19:58 p215n15 opal-prd[4648]: HBRT: 
ATTN_SLOW:I><<ATTN_RT::enableAttns rc: 0
Mar 19 18:19:58 p215n15 opal-prd[4648]: HBRT: calling get_ipoll_events
Mar 19 18:19:58 p215n15 opal-prd[4648]: HBRT: enabling IPOLL events 
0x5b90000000000000
Mar 19 18:19:58 p215n15 opal-prd[4648]: FW: writing init message
```

== Comment: #14 - Application Cdeadmin <> - 2018-03-20 03:50:53 ==
------- Comment From vaibhav92 2018-03-20 03:47:08 EDT -------
Did some kernel tracing and it seems that someone is starting the 
systemd-suspend.service that invokes systemd-sleep which then forces a suspend 
by writing to /sys/power/state file. Further investigation is needed as to who 
is invoking the systemd-suspend.service. In the meantime you can issue command 
on the host that will disable the service and prevent system from getting 
suspended:
`systemctl mask systemd-suspend.service`


Actually, it is also possible that the presence of the suspend config indicates 
user error. If Ubuntu Server was installed, I don't think suspend should have 
been configured. Did someone install "desktop Ubuntu" (or some other non-server 
config) on a server?

Also, 18.04 should be used, not 17.10 (and never 17.04), on P9.

== Comment: #25 - Application Cdeadmin <> - 2018-03-23 00:20:53 ==
------- Comment From vaibhav92 2018-03-23 00:19:40 EDT -------
Hi @mzipse @stewart-ibm. AFAIK this issue is not related to CAPP or CAPI at 
all. @brian_horton had confirmed that he was seeing this even without 
enabling/running any CAPI workloads. Disabling the systemd-suspend.service made 
the problem go away. Hence asked the bug to be mirrored to canonical.  @mzipse 
not sure what CAPP errors PRD team saw. Can you please ask them to get in touch 
with me.

== Comment: #26 - Vaibhav Jain <> - 2018-03-23 00:39:45 ==
Summary of the issue:

Ubuntu 18.04 is forcing a system suspend after 20 mins of system boot. Suspend 
is forced even if system is running a workload or user is logged on to the 
terminal and performing any activity.  The issue goes away if systemd-suspend 
service is disabled via:
"systemctl mask systemd-suspend.service"

So requesting canonical to look into this issue as a possible bug in
systemd or user inactivity monitor.

~ Vaibhav

** Affects: ubuntu-power-systems
     Importance: Critical
     Assignee: Canonical Foundations Team (canonical-foundations)
         Status: Triaged

** Affects: systemd (Ubuntu)
     Importance: Undecided
     Assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
         Status: New


** Tags: architecture-ppc64le bugnameltc-165829 severity-critical 
targetmilestone-inin1804 triage-g

** Tags added: architecture-ppc64le bugnameltc-165829 severity-critical
targetmilestone-inin1804

** Changed in: ubuntu
     Assignee: (unassigned) => Ubuntu on IBM Power Systems Bug Triage 
(ubuntu-power-triage)

** Package changed: ubuntu => systemd (Ubuntu)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1758273

Title:
  DD2.2 freezes/hangs after 20mins of uptime

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1758273/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

Reply via email to