On 6/10/19 9:05 AM, athomp...@merlin.mb.ca wrote:
>>Synopsis:     fsck doesn't always flag clean after filesystem corruption
>>Category:     system
>>Environment:
>       System      : OpenBSD 6.5
>       Details     : OpenBSD 6.5 (GENERIC.MP) #1: Mon May 27 18:27:59 CEST 2019
>                        r...@syspatch-65-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>       Architecture: OpenBSD.amd64
>       Machine     : amd64
>>Description:
>       After seemingly-minor FS corruption, fsck is unable to clear the dirty 
> flag despite claiming it did.
>       Alternatively, mount(2) is looking at something other than just the flag 
> fsck(8) manipulates.
>>How-To-Repeat:
>       Experience multiple power events and storage failure events (due to bad 
> hardware).
>       Root filesystem gets some minor corruption, and remounts r/o.  Fsck 
> auto-prunes the FS, marking it as clean.
>       mount(2) *still* cannot, at that point, mount the FS r/w.
>       Even running fsck -fy /dev/sd0a from a 6.5 ramdisk didn't allow the 
> installed system to boot properly, despite fsck claiming it had marked the FS 
> clean.
>       Related - system boot continues even if root is r/o, so pretty much 
> everything just fails.
>>Fix:
>       No known fix.  I restored from backups.
>       I understand this isn't a very helpful bug report, more just a heads-up 
> that there's SOME corner case where fsck & mount don't agree the filesystem 
> is clean.
>       Unfortunately, I didn't think to image the disk before restoring, sorry 
> :-(
> 
> 
> dmesg:
> OpenBSD 6.5 (GENERIC.MP) #1: Mon May 27 18:27:59 CEST 2019
>     r...@syspatch-65-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
...
> pvbus0 at mainbus0: VMware
...
> vmwpvs0 at pci3 dev 0 function 0 "VMware PVSCSI" rev 0x02: apic 1 int 18
> scsibus1 at vmwpvs0: 16 targets
> sd0 at scsibus1 targ 0 lun 0: <VMware, Virtual disk, 2.0> SCSI4 0/direct fixed
> sd0: 20480MB, 512 bytes/sector, 41943040 sectors

Reading your description, I was pretty sure we'd see this in your dmesg:
vmwpvs0, the VMware PVSCSI adapter.

Change your VM's SCSI controller from the VMware Paravirtual (PVSCSI)
one to an LSI or something else.  I suspect your problem will go away.

I think there's a long-standing bug in the vmwpvs driver where the first
write to the disk gets lost.  With fsck, that first write is the one
correcting the disk; it gets lost, so you still have a problem.  You
could work around it with a sacrificial first write to the disk, but I
don't think you will see any big problems if you just switch to an LSI
SCSI or SAS adapter.
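To make the "first write gets lost" reasoning concrete, here is a toy
simulation, not the real vmwpvs driver or the FFS on-disk format.  It
assumes, as suspected above, that the adapter silently drops the first
write after attach while still reporting success.  The block names,
ToyDisk, and toy_fsck are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field

# Toy model only (assumption): NOT the real vmwpvs(4) driver or FFS layout.
# Hypothetical block indices: repaired data, clean flag, throwaway scratch.
DATA, CLEAN, SCRATCH = 0, 1, 2


@dataclass
class ToyDisk:
    blocks: list = field(default_factory=lambda: [0, 0, 0])
    first_write_seen: bool = False  # assumed to reset on every (re)attach

    def write(self, blk: int, val: int) -> bool:
        """Assumed bug: the first request is dropped but success is reported."""
        if not self.first_write_seen:
            self.first_write_seen = True
            return True  # caller believes the write landed
        self.blocks[blk] = val
        return True


def toy_fsck(disk: ToyDisk) -> None:
    """Hypothetical fsck: repair the data block, then mark the fs clean."""
    disk.write(DATA, 1)   # 1 = repaired
    disk.write(CLEAN, 1)  # 1 = clean


def consistent(disk: ToyDisk) -> bool:
    """True only if both the repair and the clean flag reached the disk."""
    return disk.blocks[DATA] == 1 and disk.blocks[CLEAN] == 1


# fsck's repair is the first write, so it is lost: flag set, damage remains.
plain = ToyDisk()
toy_fsck(plain)
print("without sacrificial write:", plain.blocks, consistent(plain))

# A throwaway write to a scratch block absorbs the dropped write first.
primed = ToyDisk()
primed.write(SCRATCH, 42)
toy_fsck(primed)
print("with sacrificial write:   ", primed.blocks, consistent(primed))
```

In the first case the clean flag is set but the repair never landed,
which is one way fsck and the next mount could disagree.  The scratch
write only helps if it really is the first write after attach and goes
somewhere harmless, which is why switching adapters is the simpler fix.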

Nick.
