On Tue, Jan 12, 2021 at 11:05 PM Chris Murphy <li...@colorremedies.com> wrote:
>
> Short version: Josef (Btrfs dev) and I agree there's probably
> something wrong with the drive. The advice is to replace it, just
> because trying to investigate further isn't worth your time, and also
> necessarily puts your data at risk.
>
> Longer version:
>
> LVM thinp uses device-mapper as its backend to do all the work, and we
> see checksum errors in the months old report. Where LVM thick has a
> simpler on-disk format, so it's not as likely to discover such a
> problem. And LUKS/dm-crypt is also using device-mapper as its backend
> to do all work. So the two problems have two things in common: the
> drive, and device-mapper. It's more probable there's an issue with the
> drive, just because if there was a problem with device-mapper, we'd
> have dozens of reports of it at the rate you're seeing this problem
> once every couple of months (if that trend holds).
>
> Is it possible LVM+ext4 on this same drive is more reliable? I think
> that's specious. The problem can be masked due to much less stringent
> integrity guarantees, i.e. there are no data checksums. Since the data
> is overwhelmingly the largest target for corruption, just being a much
> bigger volume compared to file system metadata. All things being
> equal, there's a greater chance the problem affects data. On the other
> hand, if it adversely impacts metadata, it could be true that e2fsck
> has a better chance of fixing the damage than btrfsck right now. Of
> course no fsck fixes your data.
>
> So if you keep using the drive, you're in a catch-22. Btrfs is more
> sensitive because everything is checksummed, so the good news is
> you'll be informed about it fairly quickly, the bad news is that it's
> harder to repair in this case. If you revert to LVM+ext4 the automatic
> fsck at startup might fix up these problems as they occur, but it's
> possible undetected data corruption shows up later or replicates into
> your backups.
>
> Regardless of what you decide to do, definitely keep frequent backups
> of anything important.
>
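
For anyone following the thread, the point above about data checksums can be illustrated with a small sketch. This is not how Btrfs or ext4 are actually implemented. The class names, the 4096-byte block size, and the use of Python's zlib.crc32 (Btrfs defaults to crc32c, which is a different polynomial) are all assumptions made purely for illustration. It only shows why a store that verifies a per-block checksum reports a flipped bit, while a store without data checksums hands back the damaged data silently.

```python
import zlib

BLOCK_SIZE = 4096  # illustrative block size, not tied to any real filesystem


class PlainStore:
    """Keeps blocks with no data checksums (stand-in for a data path without them)."""

    def __init__(self):
        self.blocks = {}

    def write(self, n, data):
        self.blocks[n] = bytearray(data)

    def read(self, n):
        return bytes(self.blocks[n])


class ChecksummedStore(PlainStore):
    """Records a CRC per block on write and verifies it on every read."""

    def __init__(self):
        super().__init__()
        self.sums = {}

    def write(self, n, data):
        super().write(n, data)
        self.sums[n] = zlib.crc32(data)  # assumption: plain CRC-32 as a stand-in

    def read(self, n):
        data = super().read(n)
        if zlib.crc32(data) != self.sums[n]:
            raise IOError(f"checksum mismatch on block {n}")
        return data


def flip_bit(store, n, byte_index=0, bit=0):
    """Simulate the drive silently corrupting one bit of a stored block."""
    store.blocks[n][byte_index] ^= 1 << bit


payload = b"A" * BLOCK_SIZE
plain, summed = PlainStore(), ChecksummedStore()
for store in (plain, summed):
    store.write(0, payload)
    flip_bit(store, 0)

# The plain store returns the damaged block without any error.
print("plain read matches original:", plain.read(0) == payload)

# The checksummed store refuses to return the damaged block.
try:
    summed.read(0)
except IOError as exc:
    print("detected:", exc)
```

In this toy model the plain store is not more reliable, it is just less informed, which matches the "masked" argument in the quoted message.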

OK, first, I don't mean to imply that I doubt you or Josef when you say
there is something wrong with my HDD. But I have several questions I
would like to discuss:

1) Is it possible that there is nothing wrong with the drive itself, but
something wrong with my BIOS or HDD firmware? Maybe my firmware cannot
handle Btrfs's stringent write requirements? I ask because I have used
Windows with NTFS, Ubuntu with ext4, and Fedora with thick LVM and ext4
on this machine, and none of those configurations gave me any such
problems.

2) Since there is a high likelihood that my filesystem is not completely
fixed, won't those errors be carried over when I take a backup using
partclone, dd, or Clonezilla? Even if I buy a new drive and restore the
backup, I might still get crashes.

3) This is a weird question, but can you recommend an HDD that can
handle Btrfs? Or at least tell me which features to look for when
buying one (an HDD, not an SSD)?

4) My manufacturer, HP, does not provide firmware updates for Linux,
only for Windows. Is there a way to update the firmware (if an update is
available) without being on Windows? Any ideas? Would a Windows PXE
help?

5) When you say "checksum errors in the months old report", which report
are you referring to? The thin-LVM crash or the smartctl crash?

--
Regards,
Sreyan Chakravarty