> On 28 Dec 2017, at 5:12 PM, Lode Lesage <1737...@bugs.launchpad.net> wrote:
>
> I'm starting to think it isn't a software issue.
> I tried to solve it again because it was becoming impossible to work, and
> decided to install Ubuntu MATE 16.04.3 LTS (kernel 4.10.0-42-generic) because
> I read some people had fewer problems with NVMe drives on that.
This issue is distro-agnostic.

>
> The problem persisted however.
> I also noticed that sometimes when I booted and went into the BIOS the drive
> would even lose connection/not be visible there, which to me indicates that
> it might be a hardware issue? Any thoughts on that? Any way I can determine
> for sure that I don't have a faulty drive?

There are three things worth trying:
- Check whether Windows also has this problem.
- Update the system BIOS to the latest version.
- Update the NVMe firmware to the latest version (probably only available
  under Windows).

IIRC, Samsung NVMe drives also have this problem under Windows, and a firmware
update solved the issue.

>
> --
> You received this bug notification because you are subscribed to linux
> in Ubuntu.
> https://bugs.launchpad.net/bugs/1737934
>
> Title:
> Samsung SM961 NVMe SSD randomly unmounts/loses connection/unavailable
>
> Status in linux package in Ubuntu:
> Confirmed
>
> Bug description:
> Seems related to these bugs:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184
> https://bugs.launchpad.net/ubuntu/+source/linux-signed/+bug/1682704
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1705748
>
> Problem:
> At seemingly random times my computer (brand new Lenovo Thinkpad T470) seems
> to lose access to the Samsung SM961 256GB SSD drive it has inside. When this
> happens the whole OS freezes up and when I try to power down I see a black
> terminal-like screen that prints the following errors:
>
> EXT4-fs error (device nvme0n1p2): ext4_find_entry:1431: inode #7471275
> (or #741278): comm gmain (or systemd-journal or ...): reading
> directory iblock 0
>
> This error seems to be repeated endlessly, though I've only let it go
> for a few minutes. No other errors are printed.
>
> This is the only drive it has.
> I don't know if this occurs in Windows too since I removed Windows and
> installed Ubuntu immediately after updating the BIOS.
>
> Info:
> Distro: Ubuntu MATE 17.10
>
> sudo uname -r
> 4.13.0-19-generic
>
> sudo nvme get-feature -f 0x0c -H /dev/nvme0 (with latency set to 250)
> get-feature:0xc (Autonomous Power State Transition), Current value:0x000001
> Autonomous Power State Transition Enable (APSTE): Enabled
> Auto PST Entries .................
> Entry[ 0]
> .................
> Idle Time Prior to Transition (ITPT): 0 ms
> Idle Transition Power State (ITPS): 0
> .................
> Entry[ 1]
> .................
> Idle Time Prior to Transition (ITPT): 0 ms
> Idle Transition Power State (ITPS): 0
> .................
> Entry[ 2]
> .................
> Idle Time Prior to Transition (ITPT): 0 ms
> Idle Transition Power State (ITPS): 0
> .................
> Entry[ 3]
> .................
> Idle Time Prior to Transition (ITPT): 0 ms
> Idle Transition Power State (ITPS): 0
> .................
> Entry[ 4]
> .................
> Idle Time Prior to Transition (ITPT): 0 ms
> Idle Transition Power State (ITPS): 0
> .................
> Entry[ 5]
> .................
> Idle Time Prior to Transition (ITPT): 0 ms
> Idle Transition Power State (ITPS): 0
> .................
> Entry[ 6]
> .................
> Idle Time Prior to Transition (ITPT): 0 ms
> Idle Transition Power State (ITPS): 0
> .................
> Entry[ 7]
> .................
> Idle Time Prior to Transition (ITPT): 0 ms
> Idle Transition Power State (ITPS): 0
> .................
> Entry[ 8]
> .................
> Idle Time Prior to Transition (ITPT): 0 ms
> Idle Transition Power State (ITPS): 0
> .................
> Entry[ 9]
> .................
> Idle Time Prior to Transition (ITPT): 0 ms
> Idle Transition Power State (ITPS): 0
> .................
> Entry[10]
> .................
> Idle Time Prior to Transition (ITPT): 0 ms
> Idle Transition Power State (ITPS): 0
> .................
> Entry[11]
> .................
> Idle Time Prior to Transition (ITPT): 0 ms
> Idle Transition Power State (ITPS): 0
> .................
> Entry[12]
> .................
> Idle Time Prior to Transition (ITPT): 0 ms
> Idle Transition Power State (ITPS): 0
> .................
> Entry[13]
> .................
> Idle Time Prior to Transition (ITPT): 0 ms
> Idle Transition Power State (ITPS): 0
> .................
> Entry[14]
> .................
> Idle Time Prior to Transition (ITPT): 0 ms
> Idle Transition Power State (ITPS): 0
> .................
> Entry[15]
> .................
> Idle Time Prior to Transition (ITPT): 0 ms
> Idle Transition Power State (ITPS): 0
> .................
> Entry[16]
> .................
> Idle Time Prior to Transition (ITPT): 0 ms
> Idle Transition Power State (ITPS): 0
> .................
> Entry[17]
> .................
> Idle Time Prior to Transition (ITPT): 0 ms
> Idle Transition Power State (ITPS): 0
> .................
> Entry[18]
> .................
> Idle Time Prior to Transition (ITPT): 0 ms
> Idle Transition Power State (ITPS): 0
> .................
> Entry[19]
> .................
> Idle Time Prior to Transition (ITPT): 0 ms
> Idle Transition Power State (ITPS): 0
> .................
> Entry[20]
> .................
> Idle Time Prior to Transition (ITPT): 0 ms
> Idle Transition Power State (ITPS): 0
> .................
> Entry[21]
> .................
> Idle Time Prior to Transition (ITPT): 0 ms
> Idle Transition Power State (ITPS): 0
> .................
> Entry[22]
> .................
> Idle Time Prior to Transition (ITPT): 0 ms
> Idle Transition Power State (ITPS): 0
> .................
> Entry[23]
> .................
> Idle Time Prior to Transition (ITPT): 0 ms
> Idle Transition Power State (ITPS): 0
> .................
> Entry[24]
> .................
> Idle Time Prior to Transition (ITPT): 0 ms
> Idle Transition Power State (ITPS): 0
> .................
> Entry[25]
> .................
> Idle Time Prior to Transition (ITPT): 0 ms
> Idle Transition Power State (ITPS): 0
> .................
> Entry[26]
> .................
> Idle Time Prior to Transition (ITPT): 0 ms
> Idle Transition Power State (ITPS): 0
> .................
> Entry[27]
> .................
> Idle Time Prior to Transition (ITPT): 0 ms
> Idle Transition Power State (ITPS): 0
> .................
> Entry[28]
> .................
> Idle Time Prior to Transition (ITPT): 0 ms
> Idle Transition Power State (ITPS): 0
> .................
> Entry[29]
> .................
> Idle Time Prior to Transition (ITPT): 0 ms
> Idle Transition Power State (ITPS): 0
> .................
> Entry[30]
> .................
> Idle Time Prior to Transition (ITPT): 0 ms
> Idle Transition Power State (ITPS): 0
> .................
> Entry[31]
> .................
> Idle Time Prior to Transition (ITPT): 0 ms
> Idle Transition Power State (ITPS): 0
> .................
>
> sudo smartctl -a /dev/nvme0
> smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.13.0-19-generic] (local build)
> Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
>
> === START OF INFORMATION SECTION ===
> Model Number:                       SAMSUNG MZVLW256HEHP-000L7
> Serial Number:                      S35ENX0JA13385
> Firmware Version:                   4L7QCXB7
> PCI Vendor/Subsystem ID:            0x144d
> IEEE OUI Identifier:                0x002538
> Total NVM Capacity:                 256.060.514.304 [256 GB]
> Unallocated NVM Capacity:           0
> Controller ID:                      2
> Number of Namespaces:               1
> Namespace 1 Size/Capacity:          256.060.514.304 [256 GB]
> Namespace 1 Utilization:            17.834.708.992 [17,8 GB]
> Namespace 1 Formatted LBA Size:     512
> Local Time is:                      Wed Dec 13 10:52:39 2017 CET
> Firmware Updates (0x16):            3 Slots, no Reset required
> Optional Admin Commands (0x0017):   Security Format Frmw_DL *Other*
> Optional NVM Commands (0x001f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
> Warning Comp. Temp. Threshold:      69 Celsius
> Critical Comp. Temp. Threshold:     72 Celsius
>
> Supported Power States
> St Op      Max  Active  Idle  RL RT WL WT  Ent_Lat  Ex_Lat
>  0 +     7.60W       -     -   0  0  0  0        0       0
>  1 +     6.00W       -     -   1  1  1  1        0       0
>  2 +     5.10W       -     -   2  2  2  2        0       0
>  3 -   0.0400W       -     -   3  3  3  3      210    1500
>  4 -   0.0050W       -     -   4  4  4  4     2200    6000
>
> Supported LBA Sizes (NSID 0x1)
> Id Fmt  Data  Metadt  Rel_Perf
>  0 +     512       0         0
>
> === START OF SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
> Critical Warning:                   0x00
> Temperature:                        35 Celsius
> Available Spare:                    100%
> Available Spare Threshold:          10%
> Percentage Used:                    0%
> Data Units Read:                    151.517 [77,5 GB]
> Data Units Written:                 160.733 [82,2 GB]
> Host Read Commands:                 1.874.938
> Host Write Commands:                1.650.810
> Controller Busy Time:               10
> Power Cycles:                       96
> Power On Hours:                     14
> Unsafe Shutdowns:                   78
> Media and Data Integrity Errors:    0
> Error Information Log Entries:      38
> Warning Comp. Temperature Time:     0
> Critical Comp. Temperature Time:    0
> Temperature Sensor 1:               35 Celsius
> Temperature Sensor 2:               61 Celsius
>
> Error Information (NVMe Log 0x01, max 64 entries)
> Num  ErrCount  SQId   CmdId  Status  PELoc  LBA  NSID  VS
>   0        38     0  0x0018  0x4004  0x02c    0     0   -
>   1        37     0  0x0017  0x4004  0x02c    0     0   -
>   2        36     0  0x0018  0x4004  0x02c    0     0   -
>   3        35     0  0x0017  0x4004  0x02c    0     0   -
>   4        34     0  0x0018  0x4004  0x02c    0     0   -
>   5        33     0  0x0017  0x4004  0x02c    0     0   -
>   6        32     0  0x0018  0x4004  0x02c    0     0   -
>   7        31     0  0x0017  0x4004  0x02c    0     0   -
>   8        30     0  0x0018  0x4004  0x02c    0     0   -
>   9        29     0  0x0017  0x4004  0x02c    0     0   -
>  10        28     0  0x0018  0x4004  0x02c    0     0   -
>  11        27     0  0x0017  0x4004  0x02c    0     0   -
>  12        26     0  0x0018  0x4004  0x02c    0     0   -
>  13        25     0  0x0017  0x4004  0x02c    0     0   -
>  14        24     0  0x0018  0x4004  0x02c    0     0   -
>  15        23     0  0x0017  0x4004  0x02c    0     0   -
> ... (22 entries not shown)
>
>
> lspci -nn
> 00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers [8086:5904] (rev 02)
> 00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 620 [8086:5916] (rev 02)
> 00:14.0 USB controller [0c03]: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller [8086:9d2f] (rev 21)
> 00:14.2 Signal processing controller [1180]: Intel Corporation Sunrise Point-LP Thermal subsystem [8086:9d31] (rev 21)
> 00:16.0 Communication controller [0780]: Intel Corporation Sunrise Point-LP CSME HECI #1 [8086:9d3a] (rev 21)
> 00:1c.0 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port [8086:9d10] (rev f1)
> 00:1c.6 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port #7 [8086:9d16] (rev f1)
> 00:1d.0 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port #9 [8086:9d18] (rev f1)
> 00:1d.2 PCI bridge [0604]: Intel Corporation Device [8086:9d1a] (rev f1)
> 00:1f.0 ISA bridge [0601]: Intel Corporation Sunrise Point-LP LPC Controller [8086:9d58] (rev 21)
> 00:1f.2 Memory controller [0580]: Intel Corporation Sunrise Point-LP PMC [8086:9d21] (rev 21)
> 00:1f.3 Audio device [0403]: Intel Corporation Device [8086:9d71] (rev 21)
> 00:1f.4 SMBus [0c05]: Intel Corporation Sunrise Point-LP SMBus [8086:9d23] (rev 21)
> 00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (4) I219-V [8086:15d8] (rev 21)
> 04:00.0 Network controller [0280]: Intel Corporation Wireless 8265 / 8275 [8086:24fd] (rev 78)
> 3e:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961 [144d:a804]
>
> sudo nvme list
> Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
> ---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
> /dev/nvme0n1     S35ENX0JA13385       SAMSUNG MZVLW256HEHP-000L7               1         17,83 GB / 256,06 GB       512 B + 0 B      4L7QCXB7
>
> I tried looking for kernel errors with dmesg | grep -i nvme and dmesg
> | grep -i EXT4-fs, but nothing of value shows up (only that the drive
> was mounted).
>
> What have I tried:
> Reading the bug reports mentioned above it seemed that my problem should
> already be fixed since I'm on kernel 4.13.
> Since I still have the problem, I should be able to temporarily fix it by
> setting
> GRUB_CMDLINE_LINUX_DEFAULT="nvme_core.default_ps_max_latency_us=0"
> and running sudo update-grub.
> This doesn't help however, since I still get random loss of connection to
> the drive. Sometimes this happens after minutes of booting, sometimes after
> hours, but I haven't been able to run stable for a full day.
> I have tried all values for the latency I found in various bug reports
> online, specifically I tried: 0, 250, 5500, 6000 and 11000. It seems to run
> most stable with 250 and least stable with 0, where the error happens
> seconds/minutes after boot.
>
> I'm at my wits' end here. I could just stick a normal SATA SSD in there but
> then this brand new one would be a waste. Any help would be greatly
> appreciated.
> If more info is needed I'll do my best to provide it.
>
> ProblemType: Bug
> DistroRelease: Ubuntu 17.10
> Package: linux-image-4.13.0-19-generic 4.13.0-19.22
> ProcVersionSignature: Ubuntu 4.13.0-19.22-generic 4.13.13
> Uname: Linux 4.13.0-19-generic x86_64
> ApportVersion: 2.20.7-0ubuntu3.5
> Architecture: amd64
> AudioDevicesInUse:
> USER PID ACCESS COMMAND
> /dev/snd/controlC0: musilitar 1334 F.... pulseaudio
> CurrentDesktop: MATE
> Date: Wed Dec 13 11:11:54 2017
> InstallationDate: Installed on 2017-12-07 (5 days ago)
> InstallationMedia: Ubuntu-MATE 17.10 "Artful Aardvark" - Release amd64
> (20171018)
> MachineType: LENOVO 20HD0001MB
> ProcFB: 0 inteldrmfb
> ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.13.0-19-generic.efi.signed
> root=UUID=99902074-7315-4905-8a7d-97b65a09bb74 ro quiet splash
> nvme_core.default_ps_max_latency_us=250 vt.handoff=7
> RelatedPackageVersions:
> linux-restricted-modules-4.13.0-19-generic N/A
> linux-backports-modules-4.13.0-19-generic N/A
> linux-firmware 1.169.1
> SourcePackage: linux
> UpgradeStatus: No upgrade log present (probably fresh install)
> dmi.bios.date: 11/10/2017
> dmi.bios.vendor: LENOVO
> dmi.bios.version: N1QET68W (1.43 )
> dmi.board.asset.tag: Not Available
> dmi.board.name: 20HD0001MB
> dmi.board.vendor: LENOVO
> dmi.board.version: SDK0J40697 WIN
> dmi.chassis.asset.tag: No Asset Information
> dmi.chassis.type: 10
> dmi.chassis.vendor: LENOVO
> dmi.chassis.version: None
> dmi.modalias:
> dmi:bvnLENOVO:bvrN1QET68W(1.43):bd11/10/2017:svnLENOVO:pn20HD0001MB:pvrThinkPadT470:rvnLENOVO:rn20HD0001MB:rvrSDK0J40697WIN:cvnLENOVO:ct10:cvrNone:
> dmi.product.family: ThinkPad T470
> dmi.product.name: 20HD0001MB
> dmi.product.version: ThinkPad T470
> dmi.sys.vendor: LENOVO
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1737934/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
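
Since the bug description relies on nvme_core.default_ps_max_latency_us being picked up from the GRUB command line, it may be worth confirming that the running kernel actually uses the requested value. The following is only a rough sketch: the helper name ps_max_latency_from_cmdline is made up for illustration and is not part of any tool mentioned in the report, and the sysfs path assumes the nvme_core module exposes its parameter in the usual /sys/module location.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): print the value of
# nvme_core.default_ps_max_latency_us found in a kernel command line string.
# If the parameter is given more than once, the last occurrence is printed.
ps_max_latency_from_cmdline() {
    printf '%s\n' "$1" | tr ' ' '\n' |
        sed -n 's/^nvme_core\.default_ps_max_latency_us=//p' | tail -n 1
}

# Value requested on the command line of the running kernel.
if [ -r /proc/cmdline ]; then
    requested=$(ps_max_latency_from_cmdline "$(cat /proc/cmdline)")
    echo "requested on cmdline: ${requested:-not set}"
fi

# Value the nvme_core module reports, assuming the parameter is exposed here.
param=/sys/module/nvme_core/parameters/default_ps_max_latency_us
if [ -r "$param" ]; then
    echo "in effect: $(cat "$param")"
fi
```

If the two values differ, or the first one is reported as not set, the GRUB change probably did not take effect (for example because update-grub was not run, or a different boot entry was used).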