Public bug reported:

Some of our Ubuntu machines (16.04 with the 4.15 kernel) suddenly have
their filesystem remounted read-only after approximately 5 minutes of
operation.

The error is the following:

{{{
Jan  7 13:26:12 lj000601 kernel [  311.818652] ata1.00: READ LOG DMA EXT 
failed, trying PIO
Jan  7 13:26:12 lj000601 kernel [  311.823232] ata1.00: exception Emask 0x0 
SAct 0x10000 SErr 0x0 action 0x0
Jan  7 13:26:12 lj000601 kernel [  311.823237] ata1.00: irq_stat 0x40000008
Jan  7 13:26:12 lj000601 kernel [  311.823242] ata1.00: failed command: READ 
FPDMA QUEUED
Jan  7 13:26:12 lj000601 kernel [  311.823250] ata1.00: cmd 
60/08:80:38:1b:c1/00:00:02:00:00/40 tag 16 ncq dma 4096 in
Jan  7 13:26:12 lj000601 kernel [  311.823250]          res 
41/40:00:38:1b:c1/00:00:02:00:00/00 Emask 0x409 (media error) <F>
Jan  7 13:26:12 lj000601 kernel [  311.823254] ata1.00: status: { DRDY ERR }
Jan  7 13:26:12 lj000601 kernel [  311.823257] ata1.00: error: { UNC }
Jan  7 13:26:12 lj000601 kernel [  311.828470] ata1.00: configured for UDMA/133
Jan  7 13:26:12 lj000601 kernel [  311.829567] sd 0:0:0:0: [sda] tag#16 FAILED 
Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jan  7 13:26:12 lj000601 kernel [  311.829571] sd 0:0:0:0: [sda] tag#16 Sense 
Key : Medium Error [current]
Jan  7 13:26:12 lj000601 kernel [  311.829575] sd 0:0:0:0: [sda] tag#16 Add. 
Sense: Unrecovered read error - auto reallocate failed
Jan  7 13:26:12 lj000601 kernel [  311.829579] sd 0:0:0:0: [sda] tag#16 CDB: 
Read(10) 28 00 02 c1 1b 38 00 00 08 00
Jan  7 13:26:12 lj000601 kernel [  311.829582] print_req_error: I/O error, dev 
sda, sector 46209848
Jan  7 13:26:12 lj000601 kernel [  311.829615] EXT4-fs error (device sda1): 
ext4_find_entry:1454: inode #1444593: comm updatedb.mlocat: reading directory 
lblock 0
Jan  7 13:26:12 lj000601 kernel [  311.829617] ata1: EH complete
Jan  7 13:26:12 lj000601 kernel [  311.830654] Aborting journal on device 
sda1-8.
Jan  7 13:26:12 lj000601 kernel [  311.831394] EXT4-fs (sda1): Remounting 
filesystem read-only
Jan  7 13:26:12 lj000601 kernel [  311.831407] EXT4-fs error (device sda1): 
ext4_journal_check_start:61: Detected aborted journal
}}}
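
As a cross-check on the log above, the READ(10) CDB can be decoded by
hand to confirm that it refers to the same sector that print_req_error
reports. The following is a minimal sketch; the helper name
decode_read10 is hypothetical, and the 512-byte logical sector size is
an assumption (it is consistent with the "dma 4096 in" for 8 blocks
shown in the log).

```python
def decode_read10(cdb_hex):
    """Return (lba, block_count) from a 10-byte SCSI READ(10) CDB given as hex."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between bytes is accepted
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2-5: logical block address
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, count


# CDB copied from the "tag#16 CDB: Read(10)" line in the log.
lba, count = decode_read10("28 00 02 c1 1b 38 00 00 08 00")

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors
print(lba, count, count * SECTOR_SIZE)  # prints: 46209848 8 4096
```

The decoded LBA matches "sector 46209848" in the I/O error line, so the
medium error, the SCSI sense data and the EXT4 error all refer to the
same 4 KiB read.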

PS: see further details in the attached kernel.log.

The machines have moderate disk access rates. They are retail
point-of-sale systems (graphical interface, internal web server, local
PostgreSQL and several USB devices), nothing terribly complex.

The recovery process is laborious, requiring local intervention to run
fsck on the faulty block. The machine then comes back as if nothing had
happened, but only for a while: we are starting to see the issue
resurface.
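
When working on a single faulty block, it can help to translate the
absolute disk sector from the kernel log into a filesystem block number
relative to the partition. The sketch below is only illustrative: the
partition start of 2048 sectors is a hypothetical value (the real one
can be read from /sys/block/sda/sda1/start), and the 512-byte sector and
4096-byte ext4 block sizes are assumptions that should be checked on the
affected machine.

```python
def sector_to_fs_block(sector, part_start_sector,
                       sector_size=512, fs_block_size=4096):
    """Map an absolute disk sector to a block number inside the partition."""
    if sector < part_start_sector:
        raise ValueError("sector lies before the start of the partition")
    byte_offset = (sector - part_start_sector) * sector_size
    return byte_offset // fs_block_size


PART_START = 2048  # hypothetical start sector of sda1, not taken from the report
print(sector_to_fs_block(46209848, PART_START))  # prints: 5775975
```

With the real partition offset, the resulting block number can be passed
to tools that work on filesystem blocks, for example the icheck command
of debugfs.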

The easy conclusion is a hardware defect, but the problem happens across
a wide range of SSD manufacturers and levels of usage, as seen in the
attached smartctl.txt.
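
One way to compare machines is to tally which sectors the kernel reports
as failing in each kernel.log: the same sector failing repeatedly would
point at a localized media defect, while scattered sectors would suggest
something else. This is a rough sketch, assuming the log uses the
print_req_error format shown above; the function name failing_sectors is
hypothetical.

```python
import re
from collections import Counter

# Matches e.g. "print_req_error: I/O error, dev sda, sector 46209848".
# \s+ tolerates lines that were wrapped by a mail client.
IO_ERR = re.compile(r"print_req_error: I/O error, dev\s+(\w+), sector\s+(\d+)")


def failing_sectors(log_text):
    """Count (device, sector) pairs reported by print_req_error."""
    return Counter((dev, int(sec)) for dev, sec in IO_ERR.findall(log_text))


# Sample line copied from the log in this report.
sample = (
    "Jan  7 13:26:12 lj000601 kernel [  311.829582] print_req_error: "
    "I/O error, dev sda, sector 46209848\n"
)

for (dev, sector), hits in failing_sectors(sample).most_common():
    print(dev, sector, hits)  # prints: sda 46209848 1
```

Running the same function over the full kernel.log from each affected
machine would give one table per host to set against the smartctl
output.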

Looking forward to any hints on debugging this problem further.

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

** Attachment added: "kernel.log"
   https://bugs.launchpad.net/bugs/1858784/+attachment/5318546/+files/kernel.log

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1858784

Title:
  Read-only filesystem

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1858784/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
