On Tue, Jan 12, 2021 at 11:05 PM Chris Murphy <li...@colorremedies.com> wrote:
>
>
> Short version: Josef (Btrfs dev) and I agree there's probably
> something wrong with the drive. The advice is to replace it, just
> because trying to investigate further isn't worth your time, and also
> necessarily puts your data at risk.
>
> Longer version:
>
> LVM thinp uses device-mapper as its backend to do all the work, and we
> see checksum errors in the months-old report. LVM thick has a
> simpler on-disk format, so it's not as likely to discover such a
> problem. LUKS/dm-crypt also uses device-mapper as its backend
> to do all the work. So the two problems have two things in common: the
> drive, and device-mapper. It's more probable there's an issue with the
> drive, just because if there was a problem with device-mapper, we'd
> have dozens of reports of it at the rate you're seeing this problem
> once every couple of months (if that trend holds).
>
> Is it possible LVM+ext4 on this same drive is more reliable? I think
> that's specious. The problem can be masked due to much less stringent
> integrity guarantees, i.e. there are no data checksums. Data is
> overwhelmingly the largest target for corruption, simply because it
> occupies far more space than file system metadata. All things being
> equal, there's a greater chance the problem affects data. On the other
> hand, if it adversely impacts metadata, it could be true that e2fsck
> has a better chance of fixing the damage than btrfsck right now. Of
> course no fsck fixes your data.
>
> So if you keep using the drive, you're in a catch-22. Btrfs is more
> sensitive because everything is checksummed, so the good news is
> you'll be informed about it fairly quickly; the bad news is that it's
> harder to repair in this case. If you revert to LVM+ext4 the automatic
> fsck at startup might fix up these problems as they occur, but it's
> possible undetected data corruption shows up later or replicates into
> your backups.
>
> Regardless of what you decide to do, definitely keep frequent backups
> of anything important.
>

First, I don't mean to imply that I doubt you or Josef when you say
there is something wrong with my HDD, or that you are wrong.
But I have a lot of questions that I want to discuss:

1) Is it possible there is nothing wrong with my drive, but there is
something wrong with my BIOS or HDD firmware? Maybe my firmware cannot
meet Btrfs's stringent write requirements?

I say this because I have used Windows with NTFS on this machine, as
well as Ubuntu with ext4 and Fedora with thick LVM and ext4. None
of these configurations gave me any such problems.

2) Since there is a high likelihood that my filesystem is not
completely fixed, when I take a backup using partclone, dd, or
Clonezilla, won't those errors be carried over?

Even if I buy a new drive and restore the backup, I might still get crashes.

3) This is a weird question, but can you recommend an HDD that I can
buy which can handle Btrfs? Or which features should I look for
when buying (an HDD, not an SSD)?

4) My manufacturer, HP, does not provide firmware updates for Linux, only
for Windows. Is there a way to update the firmware (if an update is
available) without being on Windows? Any ideas? Would a Windows PXE help?

5) When you say "checksum errors in the months old report", which
report are you referring to? The thin-LVM crash or the smartctl crash?

-- 
Regards,
Sreyan Chakravarty
