On 6/10/19 9:05 AM, athomp...@merlin.mb.ca wrote:
>>Synopsis: fsck doesn't always flag clean after filesystem corruption
>>Category: system
>>Environment:
> System : OpenBSD 6.5
> Details : OpenBSD 6.5 (GENERIC.MP) #1: Mon May 27 18:27:59 CEST 2019
>
> r...@syspatch-65-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
> Architecture: OpenBSD.amd64
> Machine : amd64
>>Description:
> After seemingly-minor FS corruption, fsck is unable to clear the dirty
> flag despite claiming it did.
> Alternately, mount(2) is looking at something other than just the flag
> fsck(8) manipulates.
>>How-To-Repeat:
> Experience multiple power events and storage failure events (due to bad
> hardware).
> Root filesystem gets some minor corruption, and remounts r/o. Fsck
> auto-prunes the FS, marking it as clean.
> mount(2) *still* cannot, at that point, mount the FS r/w.
> Even running fsck -fy /dev/sd0a from a 6.5 ramdisk didn't allow the
> installed system to boot properly, despite fsck claiming it had marked the FS
> clean.
> Related - system boot continues even if root is r/o, so pretty much
> everything just fails.
>>Fix:
> No known fix. I restored from backups.
> I understand this isn't a very helpful bugreport, more just a heads-up
> that there's SOME corner case where fsck & mount don't agree the filesystem
> is clean.
> Unfortunately, I didn't think to image the disk before restoring, sorry
> :-(
>
>
> dmesg:
> OpenBSD 6.5 (GENERIC.MP) #1: Mon May 27 18:27:59 CEST 2019
>
> r...@syspatch-65-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
...
> pvbus0 at mainbus0: VMware
...
> vmwpvs0 at pci3 dev 0 function 0 "VMware PVSCSI" rev 0x02: apic 1 int 18
> scsibus1 at vmwpvs0: 16 targets
> sd0 at scsibus1 targ 0 lun 0: <VMware, Virtual disk, 2.0> SCSI4 0/direct fixed
> sd0: 20480MB, 512 bytes/sector, 41943040 sectors

Reading your description, I was pretty sure we'd see this: vmwpvs in your dmesg.

Change your VMware host bus adapter from the VMware paravirtual one (PVSCSI) to an LSI or something else. I suspect your problem will go away.

I think there's a long-standing bug in the vmwpvs driver where the first write to disk gets lost. In the fsck case, the first write is the one correcting the disk. It gets lost, so you still have a problem.

You could also do some sacrificial first write to the disk, but I don't think you will see a big problem if you just switch to an LSI SCSI or SAS driver.

Nick.
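
For anyone who wants to try the sacrificial first write mentioned above, here is a rough, untested-on-vmwpvs sketch in Python. It reads one 512-byte sector (the sector size from the dmesg) and writes the same bytes back, so the disk contents are the same whether the write lands or gets lost. The function name `sacrificial_write` and its parameters are made up for illustration, and whether a write like this actually absorbs the suspected lost write is an assumption, not something confirmed in this thread.

```python
import os

SECTOR = 512  # bytes/sector, as reported for sd0 in the dmesg


def sacrificial_write(path, offset=0, length=SECTOR):
    """Read `length` bytes at `offset` and write the same bytes back.

    Hypothetical helper: the data on disk is unchanged either way,
    so nothing is lost if the driver drops this write.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        data = os.pread(fd, length, offset)
        if len(data) != length:
            raise OSError("short read: got %d of %d bytes" % (len(data), length))
        written = os.pwrite(fd, data, offset)
        if written != length:
            raise OSError("short write: wrote %d of %d bytes" % (written, length))
        os.fsync(fd)  # ask the kernel to push the write out to the device
        return written
    finally:
        os.close(fd)
```

It would be called with a path to the disk before running fsck, for example `sacrificial_write("/dev/rsd0c")`. That device name is an assumption based on sd0 in the dmesg, and it needs root plus an environment where the raw device is writable, such as the ramdisk. Switching the adapter is still the simpler route.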