> On 28 Dec 2017, at 5:12 PM, Lode Lesage <1737...@bugs.launchpad.net> wrote:
> 
> I'm starting to think it isn't a software issue.
> I tried to solve it again because it was becoming impossible to work, and 
> decided to install Ubuntu MATE 16.04.3 LTS (kernel 4.10.0-42-generic) because
> I read some people had fewer problems with NVMe drives on that.

This issue is distro-agnostic.

> 
> The problem persisted, however.
> I also noticed that sometimes when I booted and went into the BIOS the drive 
> would even lose connection/not be visible there, which to me indicates that 
> it might be a hardware issue? Any thoughts on that? Any way I can determine 
> for sure that I don't have a faulty drive?

There are three things worth trying:
- Check whether Windows also has this problem.
- Update the system BIOS to the latest version.
- Update the NVMe firmware to the latest version (probably only available under Windows).

IIRC, Samsung NVMe drives also had this problem under Windows, and a
firmware update solved the issue.
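To see where the machine stands before and after updating, something like the following Python sketch can record both versions. It assumes the usual sysfs attributes /sys/class/dmi/id/bios_version and /sys/class/nvme/nvme0/firmware_rev and a controller named nvme0; read_versions is just an illustrative name. On this T470 I would expect the values to match the N1QET68W (1.43 ) BIOS and 4L7QCXB7 firmware reported below, but I haven't checked that on the hardware.

```python
from pathlib import Path


def read_versions(sysfs_root="/sys", nvme_ctrl="nvme0"):
    """Return the BIOS version and NVMe firmware revision from sysfs.

    Assumes the attributes class/dmi/id/bios_version and
    class/nvme/<ctrl>/firmware_rev exist; a missing one is reported as None.
    """
    root = Path(sysfs_root)
    paths = {
        "bios_version": root / "class" / "dmi" / "id" / "bios_version",
        "nvme_firmware": root / "class" / "nvme" / nvme_ctrl / "firmware_rev",
    }
    versions = {}
    for key, path in paths.items():
        try:
            # sysfs values end with a newline; strip it off
            versions[key] = path.read_text().strip()
        except OSError:
            # e.g. a different controller name, or no DMI table exposed
            versions[key] = None
    return versions


# Usage on the affected machine: print(read_versions())
```

Running it once before and once after each update makes it easy to confirm the update actually took.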

> 
> -- 
> You received this bug notification because you are subscribed to linux
> in Ubuntu.
> https://bugs.launchpad.net/bugs/1737934
> 
> Title:
>  Samsung SM961 NVMe SSD randomly unmounts/loses connection/unavailable
> 
> Status in linux package in Ubuntu:
>  Confirmed
> 
> Bug description:
>  Seems related to these bugs:
>  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184
>  https://bugs.launchpad.net/ubuntu/+source/linux-signed/+bug/1682704
>  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1705748
> 
>  Problem:
>  At seemingly random times my computer (a brand-new Lenovo ThinkPad T470)
>  seems to lose access to the Samsung SM961 256GB SSD inside it. When this
>  happens the whole OS freezes, and when I try to power down I see a black
>  terminal-like screen that prints the following errors:
> 
>  EXT4-fs error (device nvme0n1p2): ext4_find_entry:1431: inode #7471275
>  (or #741278): comm gmain (or systemd-journal or ...): reading
>  directory iblock 0
> 
>  This error seems to be repeated endlessly, though I've only let it go
>  for a few minutes. No other errors are printed.
> 
>  This is the only drive it has.
>  I don't know whether this also occurs in Windows, since I removed Windows
>  and installed Ubuntu immediately after updating the BIOS.
> 
>  Info:
>  Distro: Ubuntu MATE 17.10
> 
>  sudo uname -r
>  4.13.0-19-generic
> 
>  sudo nvme get-feature -f 0x0c -H /dev/nvme0 (with latency set to 250)
>  get-feature:0xc (Autonomous Power State Transition), Current value:0x000001
>       Autonomous Power State Transition Enable (APSTE): Enabled
>       Auto PST Entries        .................
>       Entry[ 0]   
>       .................
>       Idle Time Prior to Transition (ITPT): 0 ms
>       Idle Transition Power State   (ITPS): 0
>       .................
>       Entry[ 1]   
>       .................
>       Idle Time Prior to Transition (ITPT): 0 ms
>       Idle Transition Power State   (ITPS): 0
>       .................
>       Entry[ 2]   
>       .................
>       Idle Time Prior to Transition (ITPT): 0 ms
>       Idle Transition Power State   (ITPS): 0
>       .................
>       Entry[ 3]   
>       .................
>       Idle Time Prior to Transition (ITPT): 0 ms
>       Idle Transition Power State   (ITPS): 0
>       .................
>       Entry[ 4]   
>       .................
>       Idle Time Prior to Transition (ITPT): 0 ms
>       Idle Transition Power State   (ITPS): 0
>       .................
>       Entry[ 5]   
>       .................
>       Idle Time Prior to Transition (ITPT): 0 ms
>       Idle Transition Power State   (ITPS): 0
>       .................
>       Entry[ 6]   
>       .................
>       Idle Time Prior to Transition (ITPT): 0 ms
>       Idle Transition Power State   (ITPS): 0
>       .................
>       Entry[ 7]   
>       .................
>       Idle Time Prior to Transition (ITPT): 0 ms
>       Idle Transition Power State   (ITPS): 0
>       .................
>       Entry[ 8]   
>       .................
>       Idle Time Prior to Transition (ITPT): 0 ms
>       Idle Transition Power State   (ITPS): 0
>       .................
>       Entry[ 9]   
>       .................
>       Idle Time Prior to Transition (ITPT): 0 ms
>       Idle Transition Power State   (ITPS): 0
>       .................
>       Entry[10]   
>       .................
>       Idle Time Prior to Transition (ITPT): 0 ms
>       Idle Transition Power State   (ITPS): 0
>       .................
>       Entry[11]   
>       .................
>       Idle Time Prior to Transition (ITPT): 0 ms
>       Idle Transition Power State   (ITPS): 0
>       .................
>       Entry[12]   
>       .................
>       Idle Time Prior to Transition (ITPT): 0 ms
>       Idle Transition Power State   (ITPS): 0
>       .................
>       Entry[13]   
>       .................
>       Idle Time Prior to Transition (ITPT): 0 ms
>       Idle Transition Power State   (ITPS): 0
>       .................
>       Entry[14]   
>       .................
>       Idle Time Prior to Transition (ITPT): 0 ms
>       Idle Transition Power State   (ITPS): 0
>       .................
>       Entry[15]   
>       .................
>       Idle Time Prior to Transition (ITPT): 0 ms
>       Idle Transition Power State   (ITPS): 0
>       .................
>       Entry[16]   
>       .................
>       Idle Time Prior to Transition (ITPT): 0 ms
>       Idle Transition Power State   (ITPS): 0
>       .................
>       Entry[17]   
>       .................
>       Idle Time Prior to Transition (ITPT): 0 ms
>       Idle Transition Power State   (ITPS): 0
>       .................
>       Entry[18]   
>       .................
>       Idle Time Prior to Transition (ITPT): 0 ms
>       Idle Transition Power State   (ITPS): 0
>       .................
>       Entry[19]   
>       .................
>       Idle Time Prior to Transition (ITPT): 0 ms
>       Idle Transition Power State   (ITPS): 0
>       .................
>       Entry[20]   
>       .................
>       Idle Time Prior to Transition (ITPT): 0 ms
>       Idle Transition Power State   (ITPS): 0
>       .................
>       Entry[21]   
>       .................
>       Idle Time Prior to Transition (ITPT): 0 ms
>       Idle Transition Power State   (ITPS): 0
>       .................
>       Entry[22]   
>       .................
>       Idle Time Prior to Transition (ITPT): 0 ms
>       Idle Transition Power State   (ITPS): 0
>       .................
>       Entry[23]   
>       .................
>       Idle Time Prior to Transition (ITPT): 0 ms
>       Idle Transition Power State   (ITPS): 0
>       .................
>       Entry[24]   
>       .................
>       Idle Time Prior to Transition (ITPT): 0 ms
>       Idle Transition Power State   (ITPS): 0
>       .................
>       Entry[25]   
>       .................
>       Idle Time Prior to Transition (ITPT): 0 ms
>       Idle Transition Power State   (ITPS): 0
>       .................
>       Entry[26]   
>       .................
>       Idle Time Prior to Transition (ITPT): 0 ms
>       Idle Transition Power State   (ITPS): 0
>       .................
>       Entry[27]   
>       .................
>       Idle Time Prior to Transition (ITPT): 0 ms
>       Idle Transition Power State   (ITPS): 0
>       .................
>       Entry[28]   
>       .................
>       Idle Time Prior to Transition (ITPT): 0 ms
>       Idle Transition Power State   (ITPS): 0
>       .................
>       Entry[29]   
>       .................
>       Idle Time Prior to Transition (ITPT): 0 ms
>       Idle Transition Power State   (ITPS): 0
>       .................
>       Entry[30]   
>       .................
>       Idle Time Prior to Transition (ITPT): 0 ms
>       Idle Transition Power State   (ITPS): 0
>       .................
>       Entry[31]   
>       .................
>       Idle Time Prior to Transition (ITPT): 0 ms
>       Idle Transition Power State   (ITPS): 0
>       .................
> 
>  sudo smartctl -a /dev/nvme0
>  smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.13.0-19-generic] (local build)
>  Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
> 
>  === START OF INFORMATION SECTION ===
>  Model Number:                       SAMSUNG MZVLW256HEHP-000L7
>  Serial Number:                      S35ENX0JA13385
>  Firmware Version:                   4L7QCXB7
>  PCI Vendor/Subsystem ID:            0x144d
>  IEEE OUI Identifier:                0x002538
>  Total NVM Capacity:                 256.060.514.304 [256 GB]
>  Unallocated NVM Capacity:           0
>  Controller ID:                      2
>  Number of Namespaces:               1
>  Namespace 1 Size/Capacity:          256.060.514.304 [256 GB]
>  Namespace 1 Utilization:            17.834.708.992 [17,8 GB]
>  Namespace 1 Formatted LBA Size:     512
>  Local Time is:                      Wed Dec 13 10:52:39 2017 CET
>  Firmware Updates (0x16):            3 Slots, no Reset required
>  Optional Admin Commands (0x0017):   Security Format Frmw_DL *Other*
>  Optional NVM Commands (0x001f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
>  Warning  Comp. Temp. Threshold:     69 Celsius
>  Critical Comp. Temp. Threshold:     72 Celsius
> 
>  Supported Power States
>  St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
>   0 +     7.60W       -        -    0  0  0  0        0       0
>   1 +     6.00W       -        -    1  1  1  1        0       0
>   2 +     5.10W       -        -    2  2  2  2        0       0
>   3 -   0.0400W       -        -    3  3  3  3      210    1500
>   4 -   0.0050W       -        -    4  4  4  4     2200    6000
> 
>  Supported LBA Sizes (NSID 0x1)
>  Id Fmt  Data  Metadt  Rel_Perf
>   0 +     512       0         0
> 
>  === START OF SMART DATA SECTION ===
>  SMART overall-health self-assessment test result: PASSED
> 
>  SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
>  Critical Warning:                   0x00
>  Temperature:                        35 Celsius
>  Available Spare:                    100%
>  Available Spare Threshold:          10%
>  Percentage Used:                    0%
>  Data Units Read:                    151.517 [77,5 GB]
>  Data Units Written:                 160.733 [82,2 GB]
>  Host Read Commands:                 1.874.938
>  Host Write Commands:                1.650.810
>  Controller Busy Time:               10
>  Power Cycles:                       96
>  Power On Hours:                     14
>  Unsafe Shutdowns:                   78
>  Media and Data Integrity Errors:    0
>  Error Information Log Entries:      38
>  Warning  Comp. Temperature Time:    0
>  Critical Comp. Temperature Time:    0
>  Temperature Sensor 1:               35 Celsius
>  Temperature Sensor 2:               61 Celsius
> 
>  Error Information (NVMe Log 0x01, max 64 entries)
>  Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
>    0         38     0  0x0018  0x4004  0x02c            0     0     -
>    1         37     0  0x0017  0x4004  0x02c            0     0     -
>    2         36     0  0x0018  0x4004  0x02c            0     0     -
>    3         35     0  0x0017  0x4004  0x02c            0     0     -
>    4         34     0  0x0018  0x4004  0x02c            0     0     -
>    5         33     0  0x0017  0x4004  0x02c            0     0     -
>    6         32     0  0x0018  0x4004  0x02c            0     0     -
>    7         31     0  0x0017  0x4004  0x02c            0     0     -
>    8         30     0  0x0018  0x4004  0x02c            0     0     -
>    9         29     0  0x0017  0x4004  0x02c            0     0     -
>   10         28     0  0x0018  0x4004  0x02c            0     0     -
>   11         27     0  0x0017  0x4004  0x02c            0     0     -
>   12         26     0  0x0018  0x4004  0x02c            0     0     -
>   13         25     0  0x0017  0x4004  0x02c            0     0     -
>   14         24     0  0x0018  0x4004  0x02c            0     0     -
>   15         23     0  0x0017  0x4004  0x02c            0     0     -
>  ... (22 entries not shown)
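The Status column above is the raw 16-bit Status Field from the NVMe error information log. Assuming the spec layout (bit 0 phase tag, bits 8:1 Status Code, bits 11:9 Status Code Type, bit 14 More, bit 15 Do Not Retry), this small Python sketch decodes it; decode_error_status and the partial status table are mine, not part of smartctl or nvme-cli. For 0x4004 it gives generic status 0x02, Invalid Field in Command, logged on SQId 0, which is the admin queue. That looks more like a rejected admin command than a read/write failure, and would fit the zero Media and Data Integrity Errors count, but I wouldn't rule anything out from this alone.

```python
# Generic (SCT 0) status codes relevant here; deliberately not a full table.
GENERIC_STATUS = {
    0x00: "Successful Completion",
    0x01: "Invalid Command Opcode",
    0x02: "Invalid Field in Command",
}


def decode_error_status(raw):
    """Split a raw 16-bit error-log Status Field into its components.

    Assumed layout: bit 0 phase tag, bits 8:1 Status Code (SC),
    bits 11:9 Status Code Type (SCT), bit 14 More, bit 15 Do Not Retry.
    """
    sc = (raw >> 1) & 0xFF
    sct = (raw >> 9) & 0x7
    if sct == 0:
        meaning = GENERIC_STATUS.get(sc, "other generic status")
    else:
        meaning = "non-generic status code type"
    return {
        "phase": raw & 0x1,
        "sct": sct,
        "sc": sc,
        "more": bool(raw & (1 << 14)),
        "dnr": bool(raw & (1 << 15)),
        "meaning": meaning,
    }


print(decode_error_status(0x4004))
```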
> 
> 
>  lspci -nn
>  00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers [8086:5904] (rev 02)
>  00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 620 [8086:5916] (rev 02)
>  00:14.0 USB controller [0c03]: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller [8086:9d2f] (rev 21)
>  00:14.2 Signal processing controller [1180]: Intel Corporation Sunrise Point-LP Thermal subsystem [8086:9d31] (rev 21)
>  00:16.0 Communication controller [0780]: Intel Corporation Sunrise Point-LP CSME HECI #1 [8086:9d3a] (rev 21)
>  00:1c.0 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port [8086:9d10] (rev f1)
>  00:1c.6 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port #7 [8086:9d16] (rev f1)
>  00:1d.0 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port #9 [8086:9d18] (rev f1)
>  00:1d.2 PCI bridge [0604]: Intel Corporation Device [8086:9d1a] (rev f1)
>  00:1f.0 ISA bridge [0601]: Intel Corporation Sunrise Point-LP LPC Controller [8086:9d58] (rev 21)
>  00:1f.2 Memory controller [0580]: Intel Corporation Sunrise Point-LP PMC [8086:9d21] (rev 21)
>  00:1f.3 Audio device [0403]: Intel Corporation Device [8086:9d71] (rev 21)
>  00:1f.4 SMBus [0c05]: Intel Corporation Sunrise Point-LP SMBus [8086:9d23] (rev 21)
>  00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (4) I219-V [8086:15d8] (rev 21)
>  04:00.0 Network controller [0280]: Intel Corporation Wireless 8265 / 8275 [8086:24fd] (rev 78)
>  3e:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961 [144d:a804]
> 
>  sudo nvme list
>  Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
>  ---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
>  /dev/nvme0n1     S35ENX0JA13385       SAMSUNG MZVLW256HEHP-000L7               1          17,83  GB / 256,06  GB    512   B +  0 B   4L7QCXB7
> 
>  I tried looking for kernel errors with 'dmesg | grep -i nvme' and
>  'dmesg | grep -i EXT4-fs', but nothing of value shows up (only that the
>  drive was mounted).
> 
>  What have I tried:
>  Reading the bug reports mentioned above, it seemed that my problem should
>  already be fixed, since I'm on kernel 4.13.
>  Since I still have the problem, I should be able to work around it by
>  setting
>  GRUB_CMDLINE_LINUX_DEFAULT="nvme_core.default_ps_max_latency_us=0"
>  and running sudo update-grub.
>  This doesn't help, however: I still randomly lose the connection to the
>  drive. Sometimes this happens minutes after booting, sometimes after
>  hours, but I haven't been able to run stably for a full day.
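To make sure the running kernel really picked up the value you set, you can check the command line it booted with. A minimal Python sketch, assuming the parameter appears literally as nvme_core.default_ps_max_latency_us=<n> in /proc/cmdline; the helper names are made up. The module's live value should also be readable from /sys/module/nvme_core/parameters/default_ps_max_latency_us, which I'm assuming exists on this kernel.

```python
from pathlib import Path

PARAM = "nvme_core.default_ps_max_latency_us"


def ps_max_latency_from_cmdline(cmdline):
    """Return the value of PARAM on a kernel command line, or None if absent.

    If the parameter is given more than once, the last occurrence is returned,
    on the assumption that the kernel applies them in order.
    """
    value = None
    for token in cmdline.split():
        key, sep, raw = token.partition("=")
        if key == PARAM and sep:
            value = int(raw)
    return value


def booted_ps_max_latency(proc_cmdline="/proc/cmdline"):
    """Read the current boot's command line and extract PARAM (Linux only)."""
    return ps_max_latency_from_cmdline(Path(proc_cmdline).read_text())


def live_ps_max_latency(
    path="/sys/module/nvme_core/parameters/default_ps_max_latency_us",
):
    """Read the module parameter's current value (assumed sysfs path)."""
    return int(Path(path).read_text().strip())
```

For the boot captured in this report, the ProcKernelCmdLine field further down should yield 250.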
>  I have tried all the latency values I found in various bug reports
>  online, specifically 0, 250, 5500, 6000 and 11000. It seems to run most
>  stably with 250 and least stably with 0, where the error happens seconds
>  or minutes after boot.
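The all-zero APST table you posted is what I'd expect at 250. My reading of nvme_configure_apst() in drivers/nvme/host/core.c is that 0 turns APST off, and otherwise only non-operational states whose exit latency is at or below default_ps_max_latency_us become idle targets, with an idle time of roughly 50 times the state's entry plus exit latency. The Python sketch below applies that rule to the SM961 power-state table from this report; parse_power_states and apst_targets are illustrative names, and this is an approximation of the driver, not a copy of it.

```python
def parse_power_states(table_lines):
    """Parse the rows of smartctl's 'Supported Power States' table.

    Expects 11 columns: St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat,
    where Op is '+' for operational and '-' for non-operational states.
    """
    states = []
    for line in table_lines:
        fields = line.split()
        if len(fields) != 11 or not fields[0].isdigit():
            continue  # header or unrelated line
        states.append({
            "state": int(fields[0]),
            "non_operational": fields[1] == "-",
            "entry_lat_us": int(fields[9]),
            "exit_lat_us": int(fields[10]),
        })
    return states


def apst_targets(states, max_latency_us):
    """Approximate the APST table Linux would program for a given latency.

    Returns {power_state: (idle_target_state, idle_time_ms)}; an empty dict
    means no autonomous transitions. The rule is my reading of the driver:
    0 disables APST, and a non-operational state is only used if its exit
    latency does not exceed max_latency_us.
    """
    if max_latency_us == 0:
        return {}
    table = {}
    target = None
    # Walk from the deepest state upwards so each state idles into the
    # deepest acceptable state below it.
    for ps in sorted(states, key=lambda s: s["state"], reverse=True):
        if target is not None:
            table[ps["state"]] = target
        if not ps["non_operational"] or ps["exit_lat_us"] > max_latency_us:
            continue
        total_us = ps["entry_lat_us"] + ps["exit_lat_us"]
        # Idle time of about 50x the total latency, rounded up to whole ms.
        target = (ps["state"], (total_us + 19) // 20)
    return table


# Power-state table copied from the smartctl output in this report.
SM961_TABLE = """\
 St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
  0 +     7.60W       -        -    0  0  0  0        0       0
  1 +     6.00W       -        -    1  1  1  1        0       0
  2 +     5.10W       -        -    2  2  2  2        0       0
  3 -   0.0400W       -        -    3  3  3  3      210    1500
  4 -   0.0050W       -        -    4  4  4  4     2200    6000
""".splitlines()

for latency in (0, 250, 5500, 6000, 11000):
    print(latency, apst_targets(parse_power_states(SM961_TABLE), latency))
```

Under that rule, 250 leaves the table empty (PS3 needs 1500 µs and PS4 6000 µs to exit), 5500 allows only PS3, and 6000 or 11000 allow both PS3 and PS4. If that's right, APST is effectively off at both 0 and 250, so the fact that 0 was the least stable setting suggests APST alone doesn't explain the drops, which fits your suspicion about the hardware.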
> 
>  I'm at my wits' end here. I could just put a normal SATA SSD in there,
>  but then this brand-new one would go to waste. Any help would be greatly
>  appreciated.
>  If more info is needed I'll do my best to provide it.
> 
>  ProblemType: Bug
>  DistroRelease: Ubuntu 17.10
>  Package: linux-image-4.13.0-19-generic 4.13.0-19.22
>  ProcVersionSignature: Ubuntu 4.13.0-19.22-generic 4.13.13
>  Uname: Linux 4.13.0-19-generic x86_64
>  ApportVersion: 2.20.7-0ubuntu3.5
>  Architecture: amd64
>  AudioDevicesInUse:
>   USER        PID ACCESS COMMAND
>   /dev/snd/controlC0:  musilitar   1334 F.... pulseaudio
>  CurrentDesktop: MATE
>  Date: Wed Dec 13 11:11:54 2017
>  InstallationDate: Installed on 2017-12-07 (5 days ago)
>  InstallationMedia: Ubuntu-MATE 17.10 "Artful Aardvark" - Release amd64 
> (20171018)
>  MachineType: LENOVO 20HD0001MB
>  ProcFB: 0 inteldrmfb
>  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.13.0-19-generic.efi.signed 
> root=UUID=99902074-7315-4905-8a7d-97b65a09bb74 ro quiet splash 
> nvme_core.default_ps_max_latency_us=250 vt.handoff=7
>  RelatedPackageVersions:
>   linux-restricted-modules-4.13.0-19-generic N/A
>   linux-backports-modules-4.13.0-19-generic  N/A
>   linux-firmware                             1.169.1
>  SourcePackage: linux
>  UpgradeStatus: No upgrade log present (probably fresh install)
>  dmi.bios.date: 11/10/2017
>  dmi.bios.vendor: LENOVO
>  dmi.bios.version: N1QET68W (1.43 )
>  dmi.board.asset.tag: Not Available
>  dmi.board.name: 20HD0001MB
>  dmi.board.vendor: LENOVO
>  dmi.board.version: SDK0J40697 WIN
>  dmi.chassis.asset.tag: No Asset Information
>  dmi.chassis.type: 10
>  dmi.chassis.vendor: LENOVO
>  dmi.chassis.version: None
>  dmi.modalias: 
> dmi:bvnLENOVO:bvrN1QET68W(1.43):bd11/10/2017:svnLENOVO:pn20HD0001MB:pvrThinkPadT470:rvnLENOVO:rn20HD0001MB:rvrSDK0J40697WIN:cvnLENOVO:ct10:cvrNone:
>  dmi.product.family: ThinkPad T470
>  dmi.product.name: 20HD0001MB
>  dmi.product.version: ThinkPad T470
>  dmi.sys.vendor: LENOVO
> 
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1737934/+subscriptions

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1737934

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
