Public bug reported:

== Comment: #1 - Application Cdeadmin <cdead...@us.ibm.com> - 2016-12-02 04:55:07 ==

==== State: Open by: tdylla on 01 December 2016 07:24:33 ====
Notice: This Note entry was modified. 2 non-ascii character(s) were replaced with question marks.

BMC yl13u2bmc/9.5.57.84
Gui - ADMIN/admin
ssh - sysadmin/superuser

OS yl13u2os/9.5.57.85
ssh - root/Pumpk1ns

root@YL13U2OS:~# ver
cat: /proc/device-tree/openprom/model: No such file or directory
ver 1.5.4.5 - OS, HTX, Firmware and Machine details
OS: GNU/Linux
OS Version: Ubuntu 16.04.1 LTS \n \l
Kernel Version: 4.4.0-47-generic
HTX Version: htxubuntu-422
Host Name: YL13U2OS
Machine Serial No: 100CC9A
Machine Type/Model: 8335-GTB

root@YL13U2OS:~# uname -a
Linux YL13U2OS 4.4.0-47-generic #68-Ubuntu SMP Wed Oct 26 19:38:24 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

root@YL13U2OS:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.1 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.1 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

Dasd exercisers fail with a write error. These have never failed before.
root@YL13U2OS:~# lsblk -o KNAME,TYPE,SIZE,MODEL,ROTA
KNAME TYPE  SIZE MODEL        ROTA
sda   disk  1.8T ST2000NX0253    1
sda1  part  1.8T                 1
sdb   disk  1.8T ST2000NX0253    1
sdb1  part  1.8T                 1

Getting HTX errors from yl13u2os.rch.stglabs.ibm.com

######################## Result Starts Here ################################
Currently running ECG/MDT : /usr/lpp/htx//mdt/mdt.whit
===========================
---------------------------------------------------------------------
Device id:/dev/sda1
Timestamp:Dec 1 01:22:57 2016
err=00000001
sev=1
Exerciser Name:hxestorage
Serial No:Not Available
Part No:Not Available
Location:Not Available
FRU Number:Not Available
Device:Not Available
Error Text:rule_1_3 numopers= 1907729 loop= 1322123 blk=0xc08768b0 len=262144 dir=DOWN min_blkno=0xaea86084 max_blkno=0xe8e080af
BWRC LBA fencepost Detail:
th_num min_lba  max_lba  status
0      0        2476e9ff R
1      4766ee58 74704057 R
2      74704058 99783457 R
3      c0876ab0 e8e080af R
write error - errno: 1(?)
---------------------------------------------------------------------
---------------------------------------------------------------------
Device id:/dev/sda1
Timestamp:Dec 1 01:22:57 2016
err=00000001
sev=1
Exerciser Name:hxestorage
Serial No:Not Available
Part No:Not Available
Location:Not Available
FRU Number:Not Available
Device:Not Available
Error Text:Hardware Exerciser stopped on error
---------------------------------------------------------------------
---------------------------------------------------------------------
Device id:/dev/sdb1
Timestamp:Dec 1 01:23:08 2016
err=00000001
sev=1
Exerciser Name:hxestorage
Serial No:Not Available
Part No:Not Available
Location:Not Available
FRU Number:Not Available
Device:Not Available
Error Text:rule_1_1 numopers= 1907729 loop= 1394165 blk=0x49e45458 len=262144 dir=DOWN min_blkno=0x3a38202c max_blkno=0x74704057
BWRC LBA fencepost Detail:
th_num min_lba  max_lba  status
0      0        247c47ff R
1      49e45658 74704057 R
2      74704058 99d2a657 R
3      c0d344b0 e8e080af R
write error - errno: 1(?)
---------------------------------------------------------------------
---------------------------------------------------------------------
Device id:/dev/sdb1
Timestamp:Dec 1 01:23:08 2016
err=00000001
sev=1
Exerciser Name:hxestorage
Serial No:Not Available
Part No:Not Available
Location:Not Available
FRU Number:Not Available
Device:Not Available
Error Text:Hardware Exerciser stopped on error
---------------------------------------------------------------------
######################### Result Ends Here #################################

System is still running exercisers. Feel free to play with the system.
System is available for any debug that is needed.

==== State: Open by: mamukul1 on 01 December 2016 15:41:32 ====

write() is failing with errno 1 for both sda1 and sdb1. Some errors are seen in dmesg as well in the same timeframe. Over to hxestorage to debug further.

#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#
<Note by preeti, 2016/12/01 23:47:34 seq: 7 rel: 0 action: note>

Both devices are failing with errno set to 1 for the write() system call, which means "Operation not permitted" (EPERM).

---------------------------------------------------------------------
Device id:/dev/sda1
Timestamp:Dec 1 01:22:57 2016
err=00000001
sev=1
Exerciser Name:hxestorage
Serial No:Not Available
Part No:Not Available
Location:Not Available
FRU Number:Not Available
Device:Not Available
Error Text:rule_1_3 numopers= 1907729 loop= 1322123 blk=0xc08768b0 len=262144 dir=DOWN min_blkno=0xaea86084 max_blkno=0xe8e080af
BWRC LBA fencepost Detail:
th_num min_lba  max_lba  status
0      0        2476e9ff R
1      4766ee58 74704057 R
2      74704058 99783457 R
3      c0876ab0 e8e080af R
write error - errno: 1(??)
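
The errno value 1 reported for write() in the record above can be decoded against the platform's errno table. This is a minimal sketch, assuming a Linux host with Python available; the helper name `describe_errno` is hypothetical and is not part of HTX.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME (message)' for a numeric errno value on this platform."""
    # errno.errorcode maps numeric values to their symbolic names.
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the C library's message text for the code.
    return f"{name} ({os.strerror(code)})"


# errno 1, as reported by the failing write() calls on sda1 and sdb1
print(describe_errno(1))
```

On Linux, errno 1 is EPERM, which matches the "operation not permitted" reading given in the note.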
Below is the corresponding data in the kernel logs (not sure if it is related to the error):

Dec 1 01:22:57 YL13U2OS kernel: [50119.193567] EXT4-fs (sda1): VFS: Can't find ext4 filesystem
Dec 1 01:22:57 YL13U2OS kernel: [50119.201895] EXT4-fs (sda1): VFS: Can't find ext4 filesystem
Dec 1 01:22:57 YL13U2OS kernel: [50119.207728] EXT4-fs (sda1): VFS: Can't find ext4 filesystem
Dec 1 01:22:57 YL13U2OS kernel: [50119.234961] squashfs: SQUASHFS error: Can't find a SQUASHFS superblock on sda1
Dec 1 01:22:57 YL13U2OS kernel: [50119.249926] FAT-fs (sda1): bogus number of FAT structure
Dec 1 01:22:57 YL13U2OS kernel: [50119.250215] FAT-fs (sda1): Can't find a valid FAT filesystem
Dec 1 01:22:58 YL13U2OS kernel: [50119.700556] XFS (sda1): Invalid superblock magic number
Dec 1 01:22:58 YL13U2OS kernel: [50120.448485] FAT-fs (sda1): bogus number of FAT structure
Dec 1 01:22:58 YL13U2OS kernel: [50120.448818] FAT-fs (sda1): Can't find a valid FAT filesystem
Dec 1 01:22:59 YL13U2OS kernel: [50120.463705] VFS: Can't find a Minix filesystem V1 | V2 | V3 on device sda1.
Dec 1 01:22:59 YL13U2OS kernel: [50120.468236] hfsplus: unable to find HFS+ superblock
Dec 1 01:22:59 YL13U2OS kernel: [50120.474019] qnx4: no qnx4 filesystem (no root dir).
Dec 1 01:22:59 YL13U2OS kernel: [50120.477931] ufs: You didn't specify the type of your ufs filesystem
Dec 1 01:22:59 YL13U2OS kernel: [50120.477931]
Dec 1 01:22:59 YL13U2OS kernel: [50120.477931] mount -t ufs -o ufstype=sun|sunx86|44bsd|ufs2|5xbsd|old|hp|nextstep|nextstep-cd|openstep ...
Dec 1 01:22:59 YL13U2OS kernel: [50120.477931]
Dec 1 01:22:59 YL13U2OS kernel: [50120.477931] >>>WARNING<<< Wrong ufstype may corrupt your filesystem, default is ufstype=old
Dec 1 01:22:59 YL13U2OS kernel: [50120.481654] ufs: ufs_fill_super(): bad magic number
Dec 1 01:22:59 YL13U2OS kernel: [50120.487379] hfs: can't find a HFS filesystem on dev sda1

Will transfer to Linux to look further.

<Note by preeti, 2016/12/02 04:35:35 seq: 8 rel: 0 action: assign>

== Comment: #2 - Application Cdeadmin <cdead...@us.ibm.com> - 2016-12-02 09:55:08 ==

==== State: Open by: tdylla on 02 December 2016 09:53:18 ====

I noticed on a different system, which has htxubuntu-424 installed along with a patch from defect sw372840, that the sdb exerciser is running just fine. It currently has a cycle count of 2 and a current stanza of 5. The device on this other system is exactly the same drive type.

sdb  disk 1.8T ST2000NX0253
sdb1 part 1.8T

== Comment: #3 - VIPIN K. PARASHAR <vipar...@in.ibm.com> - 2016-12-05 05:43:45 ==

root@YL13U2OS:~# cat /proc/partitions
major minor    #blocks name
    1     0      65536 ram0
    1     1      65536 ram1
    1     2      65536 ram2
    1     3      65536 ram3
    1     4      65536 ram4
    1     5      65536 ram5
    1     6      65536 ram6
    1     7      65536 ram7
    1     8      65536 ram8
    1     9      65536 ram9
    1    10      65536 ram10
    1    11      65536 ram11
    1    12      65536 ram12
    1    13      65536 ram13
    1    14      65536 ram14
    1    15      65536 ram15
  259     0 3125616984 nvme0n1
  259     1       7168 nvme0n1p1
  259     2 2999266304 nvme0n1p2
  259     3  126342144 nvme0n1p3
    8     0 1953514584 sda
    8     1 1953513560 sda1
    8    16 1953514584 sdb
    8    17 1953513560 sdb1
   11     0    1048575 sr0
   11     1    1048575 sr1
   11     2    1048575 sr2
   11     3    1048575 sr3

root@YL13U2OS:~# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=508856128k,nr_inodes=7950877,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=107151232k,mode=755)
/dev/nvme0n1p2 on / type ext4 (rw,relatime,errors=remount-ro,data=ordered)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup
(rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=27,pgrp=1,timeout=0,minproto=5,maxproto=5,direct)
mqueue on /dev/mqueue type mqueue (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
configfs on /sys/kernel/config type configfs (rw,relatime)
lxcfs on /var/lib/lxcfs type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=107151232k,mode=700)

root@YL13U2OS:~# df
Filesystem      1K-blocks    Used  Available Use% Mounted on
udev            508856128       0  508856128   0% /dev
tmpfs           107151232   32832  107118400   1% /run
/dev/nvme0n1p2 2952071944 7906084 2794186164   1% /
tmpfs           535756096       0  535756096   0% /dev/shm
tmpfs                5120       0       5120   0% /run/lock
tmpfs           535756096       0  535756096   0% /sys/fs/cgroup
tmpfs           107151232       0  107151232   0%
/run/user/0
root@YL13U2OS:~#

== Comment: #7 - VIPIN K. PARASHAR <vipar...@in.ibm.com> - 2016-12-06 05:43:06 ==

root@YL13U2OS:~# df -T
Filesystem     Type      1K-blocks    Used  Available Use% Mounted on
udev           devtmpfs  508856128       0  508856128   0% /dev
tmpfs          tmpfs     107151232   32832  107118400   1% /run
/dev/nvme0n1p2 ext4     2952071944 7931124 2794161124   1% /
tmpfs          tmpfs     535756096       0  535756096   0% /dev/shm
tmpfs          tmpfs          5120       0       5120   0% /run/lock
tmpfs          tmpfs     535756096       0  535756096   0% /sys/fs/cgroup
tmpfs          tmpfs     107151232       0  107151232   0% /run/user/0
root@YL13U2OS:~#

== Comment: #8 - VIPIN K. PARASHAR <vipar...@in.ibm.com> - 2016-12-06 06:33:48 ==

root@YL13U2OS:~# cat /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point> <type> <options> <dump> <pass>
# / was on /dev/nvme0n1p2 during installation
UUID=6cddb0e5-477c-4d64-807a-631b2d12dfac / ext4 errors=remount-ro 0 1
# swap was on /dev/nvme0n1p3 during installation
UUID=00693a84-74f6-4ded-b82d-6a938880ba8a none swap sw 0 0

root@YL13U2OS:~# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda           8:0    1   1.8T  0 disk
└─sda1        8:1    1   1.8T  0 part
sdb           8:16   1   1.8T  0 disk
└─sdb1        8:17   1   1.8T  0 part
sr0          11:0    1  1024M  0 rom
sr1          11:1    1  1024M  0 rom
sr2          11:2    1  1024M  0 rom
sr3          11:3    1  1024M  0 rom
nvme0n1     259:0    0   2.9T  0 disk
├─nvme0n1p1 259:1    0     7M  0 part
├─nvme0n1p2 259:2    0   2.8T  0 part /
└─nvme0n1p3 259:3    0 120.5G  0 part [SWAP]

root@YL13U2OS:~# lsblk --fs
NAME        FSTYPE LABEL UUID                                 MOUNTPOINT
sda
└─sda1
sdb
└─sdb1
sr0
sr1
sr2
sr3
nvme0n1
├─nvme0n1p1
├─nvme0n1p2 ext4         6cddb0e5-477c-4d64-807a-631b2d12dfac /
└─nvme0n1p3 swap         00693a84-74f6-4ded-b82d-6a938880ba8a [SWAP]

root@YL13U2OS:~# grep -B 1 '"hxestorage"' /usr/lpp/htx/mdt/mdt
sda1:
HE_name = "hxestorage" * Hardware Exerciser name, 14 char
--
sdb1:
HE_name = "hxestorage" * Hardware Exerciser name, 14
char
root@YL13U2OS:~#
root@YL13U2OS:~#
root@YL13U2OS:~# grep 'Device id' /tmp/htxerr
Device id:/dev/sda1
Device id:/dev/sda1
Device id:/dev/sdb1
Device id:/dev/sdb1
root@YL13U2OS:~#

sda1 and sdb1 are the only disks being exercised, and both have errored out after a write failure.
The nvme0n1p1 disk is being used by the OS and is thus not being exercised by HTX.

== Comment: #9 - VIPIN K. PARASHAR <vipar...@in.ibm.com> - 2016-12-06 07:52:38 ==

[Thu Dec 1 01:22:57 2016] EXT4-fs (sda1): VFS: Can't find ext4 filesystem
[Thu Dec 1 01:22:57 2016] EXT4-fs (sda1): VFS: Can't find ext4 filesystem
[Thu Dec 1 01:22:57 2016] EXT4-fs (sda1): VFS: Can't find ext4 filesystem
[Thu Dec 1 01:22:57 2016] squashfs: SQUASHFS error: Can't find a SQUASHFS superblock on sda1
[Thu Dec 1 01:22:57 2016] FAT-fs (sda1): bogus number of FAT structure
[Thu Dec 1 01:22:57 2016] FAT-fs (sda1): Can't find a valid FAT filesystem
[Thu Dec 1 01:22:57 2016] XFS (sda1): Invalid superblock magic number
[Thu Dec 1 01:22:58 2016] FAT-fs (sda1): bogus number of FAT structure
[Thu Dec 1 01:22:58 2016] FAT-fs (sda1): Can't find a valid FAT filesystem
[Thu Dec 1 01:22:58 2016] VFS: Can't find a Minix filesystem V1 | V2 | V3 on device sda1.
[Thu Dec 1 01:22:58 2016] hfsplus: unable to find HFS+ superblock
[Thu Dec 1 01:22:58 2016] qnx4: no qnx4 filesystem (no root dir).
[Thu Dec 1 01:22:58 2016] ufs: You didn't specify the type of your ufs filesystem
mount -t ufs -o ufstype=sun|sunx86|44bsd|ufs2|5xbsd|old|hp|nextstep|nextstep-cd|openstep ...
>>>WARNING<<< Wrong ufstype may corrupt your filesystem, default is ufstype=old
[Thu Dec 1 01:22:58 2016] ufs: ufs_fill_super(): bad magic number
[Thu Dec 1 01:22:58 2016] hfs: can't find a HFS filesystem on dev sda1
[Thu Dec 1 01:23:08 2016] EXT4-fs (sdb1): VFS: Can't find ext4 filesystem
[Thu Dec 1 01:23:08 2016] EXT4-fs (sdb1): VFS: Can't find ext4 filesystem
[Thu Dec 1 01:23:08 2016] EXT4-fs (sdb1): VFS: Can't find ext4 filesystem
[Thu Dec 1 01:23:08 2016] squashfs: SQUASHFS error: Can't find a SQUASHFS superblock on sdb1
[Thu Dec 1 01:23:08 2016] FAT-fs (sdb1): bogus number of FAT structure
[Thu Dec 1 01:23:08 2016] FAT-fs (sdb1): Can't find a valid FAT filesystem
[Thu Dec 1 01:23:08 2016] XFS (sdb1): Invalid superblock magic number
[Thu Dec 1 01:23:10 2016] FAT-fs (sdb1): bogus number of FAT structure
[Thu Dec 1 01:23:10 2016] FAT-fs (sdb1): Can't find a valid FAT filesystem
[Thu Dec 1 01:23:10 2016] VFS: Can't find a Minix filesystem V1 | V2 | V3 on device sdb1.
[Thu Dec 1 01:23:10 2016] hfsplus: unable to find HFS+ superblock
[Thu Dec 1 01:23:10 2016] qnx4: no qnx4 filesystem (no root dir).
[Thu Dec 1 01:23:10 2016] ufs: You didn't specify the type of your ufs filesystem
mount -t ufs -o ufstype=sun|sunx86|44bsd|ufs2|5xbsd|old|hp|nextstep|nextstep-cd|openstep ...
>>>WARNING<<< Wrong ufstype may corrupt your filesystem, default is ufstype=old
[Thu Dec 1 01:23:10 2016] ufs: ufs_fill_super(): bad magic number
[Thu Dec 1 01:23:10 2016] hfs: can't find a HFS filesystem on dev sdb1

Linux failed to detect file systems on the sda1 and sdb1 disks, causing write failures for the HTX exerciser.
Similar failures are also reported for the nvme disk in the Linux kernel log.

== Comment: #10 - VIPIN K. PARASHAR <vipar...@in.ibm.com> - 2016-12-06 08:01:35 ==

The Linux errors are being caused by os-prober.
I ran os-prober manually and the FS failures got logged in the Linux log.
So os-prober was invoked while HTX was running.
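
The timing argument made here, that the file-system probe messages and the HTX write errors hit the same device in the same second, can be checked mechanically. The sketch below is only an illustration under stated assumptions: the function name `probes_near_errors`, the 5-second window, and the sample strings are hypothetical, and the parsing assumes English month abbreviations and the two log layouts quoted in this report.

```python
import re
from datetime import datetime

# "Device id" / "Timestamp" pair of an HTX error record.
HTX_RE = re.compile(
    r"Device id:/dev/(\w+)\s+Timestamp:(\w{3}\s+\d+\s+\d\d:\d\d:\d\d\s+\d{4})"
)
# Kernel log line with a human-readable timestamp, e.g.
# "[Thu Dec 1 01:22:57 2016] EXT4-fs (sda1): ...".
KERN_RE = re.compile(r"\[\w{3}\s+(\w{3}\s+\d+\s+\d\d:\d\d:\d\d\s+\d{4})\]\s+(.*)")
DEV_RE = re.compile(r"\b(sd[a-z]+\d+)\b")


def parse_ts(text):
    # Assumption: month names are English abbreviations.
    return datetime.strptime(text, "%b %d %H:%M:%S %Y")


def probes_near_errors(htx_log, kernel_log, window_s=5):
    """Map (device, HTX error time) to kernel messages naming that device
    within window_s seconds. Messages that name no device are skipped."""
    errors = sorted({(d, parse_ts(t)) for d, t in HTX_RE.findall(htx_log)})
    result = {}
    for dev, when in errors:
        hits = []
        for line in kernel_log.splitlines():
            m = KERN_RE.match(line.strip())
            if not m:
                continue
            dev_m = DEV_RE.search(m.group(2))
            delta = abs((parse_ts(m.group(1)) - when).total_seconds())
            if dev_m and dev_m.group(1) == dev and delta <= window_s:
                hits.append(m.group(2))
        result[(dev, when)] = hits
    return result


# Hypothetical excerpts in the same layout as the logs in this report.
HTX_SAMPLE = """Device id:/dev/sda1
Timestamp:Dec 1 01:22:57 2016
Device id:/dev/sdb1
Timestamp:Dec 1 01:23:08 2016
"""
KERN_SAMPLE = """[Thu Dec 1 01:22:57 2016] EXT4-fs (sda1): VFS: Can't find ext4 filesystem
[Thu Dec 1 01:23:08 2016] XFS (sdb1): Invalid superblock magic number
"""

for key, msgs in probes_near_errors(HTX_SAMPLE, KERN_SAMPLE).items():
    print(key[0], key[1].isoformat(), msgs)
```

A match only shows that the two events coincide in time; it does not by itself prove that the probe caused the write failure.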
This caused write failures for the sda1 and sdb1 disks, along with the nvme disks, and also logged the Linux errors.

== Comment: #11 - VIPIN K. PARASHAR <vipar...@in.ibm.com> - 2016-12-06 08:04:55 ==

What operation was attempted while HTX was running when these errors were seen?
Was it apt upgrade or something else?

== Comment: #12 - Application Cdeadmin <cdead...@us.ibm.com> - 2016-12-07 10:56:09 ==

==== State: MoreInfo by: tdylla on 07 December 2016 10:53:58 ====

HTX was started using htx command line commands. From then on, the system was monitored through "System Live Monitor". No other commands were executed by a user. This failure happened during an overnight run. I believe that the Ubuntu OS was set up to automatically load security fixes, which is required.

** Affects: os-prober (Ubuntu)
     Importance: Undecided
     Assignee: Taco Screen team (taco-screen-team)
         Status: New

** Tags: architecture-ppc64le bugnameltc-149477 severity-high targetmilestone-inin16042

** Tags added: architecture-ppc64le bugnameltc-149477 severity-high targetmilestone-inin16042

https://bugs.launchpad.net/bugs/1648561

Title:
  htxubuntu SDB dasd exercisers fail