Re: Ongoing Btrfs stability issues

2018-03-14 Thread Goffredo Baroncelli
On 03/14/2018 08:27 PM, Austin S. Hemmelgarn wrote: > On 2018-03-14 14:39, Goffredo Baroncelli wrote: >> On 03/14/2018 01:02 PM, Austin S. Hemmelgarn wrote: >> [...] In btrfs, a checksum mismatch creates an -EIO error during the reading. In a conventional filesystem (or a btrfs

Re: Ongoing Btrfs stability issues

2018-03-14 Thread Austin S. Hemmelgarn
On 2018-03-14 14:39, Goffredo Baroncelli wrote: On 03/14/2018 01:02 PM, Austin S. Hemmelgarn wrote: [...] In btrfs, a checksum mismatch creates an -EIO error during the reading. In a conventional filesystem (or a btrfs filesystem w/o datasum) there is no checksum, so this problem doesn't

Re: Ongoing Btrfs stability issues

2018-03-14 Thread Goffredo Baroncelli
On 03/14/2018 01:02 PM, Austin S. Hemmelgarn wrote: [...] >> >> In btrfs, a checksum mismatch creates an -EIO error during the reading. In a >> conventional filesystem (or a btrfs filesystem w/o datasum) there is no >> checksum, so this problem doesn't exist. >> >> I am curious how ZFS solves

Re: Ongoing Btrfs stability issues

2018-03-14 Thread Austin S. Hemmelgarn
On 2018-03-13 15:36, Goffredo Baroncelli wrote: On 03/12/2018 10:48 PM, Christoph Anton Mitterer wrote: On Mon, 2018-03-12 at 22:22 +0100, Goffredo Baroncelli wrote: Unfortunately no, the likelihood might be 100%: there are some patterns which trigger this problem quite easily. See the link

Re: Ongoing Btrfs stability issues

2018-03-13 Thread Christoph Anton Mitterer
On Tue, 2018-03-13 at 20:36 +0100, Goffredo Baroncelli wrote:
> A checksum mismatch is returned as -EIO by a read() syscall. This is
> an event handled badly by most programs.
Then these programs must simply be fixed... otherwise they'll also fail under normal circumstances with
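As a rough illustration of what handling this case could look like in an application, the sketch below reads a file and treats EIO as a reportable condition instead of letting it propagate as a crash. The function name `read_all` and its `os_read` parameter are made up for this example (the parameter exists only so the error path can be exercised without a damaged file). It is a sketch under those assumptions, not code posted in the thread.

```python
import errno
import os


def read_all(path, chunk_size=65536, os_read=os.read):
    """Read a whole file and return (data, error).

    error is None on success. If the kernel reports errno.EIO (which is how
    a data checksum mismatch surfaces to read() on Btrfs), the bytes read so
    far are returned together with the OSError so the caller can decide what
    to do. Any other OSError is re-raised unchanged.
    """
    chunks = []
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            try:
                chunk = os_read(fd, chunk_size)
            except OSError as exc:
                if exc.errno == errno.EIO:
                    return b"".join(chunks), exc
                raise
            if not chunk:  # empty read means end of file
                return b"".join(chunks), None
            chunks.append(chunk)
    finally:
        os.close(fd)
```

A caller would typically log the failing path, refuse to use the partial data, and fall back to a replica or backup when `error` is not None.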

Re: Ongoing Btrfs stability issues

2018-03-13 Thread Goffredo Baroncelli
On 03/12/2018 10:48 PM, Christoph Anton Mitterer wrote: > On Mon, 2018-03-12 at 22:22 +0100, Goffredo Baroncelli wrote: >> Unfortunately no, the likelihood might be 100%: there are some >> patterns which trigger this problem quite easily. See the link which >> I posted in my previous email. There

Re: Ongoing Btrfs stability issues

2018-03-13 Thread Patrik Lundquist
On 9 March 2018 at 20:05, Alex Adriaanse wrote:
> Yes, we have PostgreSQL databases running these VMs that put a heavy I/O load
> on these machines.
Dump the databases and recreate them with --data-checksums and the Btrfs No_COW attribute. You can add this to
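One way to picture the suggestion above: the No_COW attribute only takes effect for files created after it is set, so it has to be applied to an empty data directory before the cluster is initialised with data checksums. The sketch below only builds the command lines and does not run them. The helper name `nocow_pgdata_commands` and the example path are hypothetical, while `chattr +C` and `initdb --data-checksums -D` are the standard tool options.

```python
import shlex


def nocow_pgdata_commands(data_dir):
    """Return argv lists for preparing a No_COW PostgreSQL data directory.

    Order matters: the directory is created empty, marked No_COW so that new
    files inherit the attribute, and only then populated by initdb.
    """
    return [
        ["mkdir", "-p", data_dir],
        ["chattr", "+C", data_dir],
        ["initdb", "--data-checksums", "-D", data_dir],
    ]


if __name__ == "__main__":
    # Hypothetical path, for illustration only.
    for argv in nocow_pgdata_commands("/srv/pgdata-example"):
        print(shlex.join(argv))
```

The dumped databases would then be restored into the new cluster, so every data file is created under the attribute.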

Re: Ongoing Btrfs stability issues

2018-03-12 Thread Christoph Anton Mitterer
On Mon, 2018-03-12 at 22:22 +0100, Goffredo Baroncelli wrote: > Unfortunately no, the likelihood might be 100%: there are some > patterns which trigger this problem quite easily. See the link which > I posted in my previous email. There was a program which creates a > bad checksum (in COW+DATASUM

Re: Ongoing Btrfs stability issues

2018-03-12 Thread Goffredo Baroncelli
On 03/11/2018 11:37 PM, Christoph Anton Mitterer wrote: > On Sun, 2018-03-11 at 18:51 +0100, Goffredo Baroncelli wrote: >> >> COW is needed to properly checksum the data. Otherwise it is not >> possible to ensure the coherency between data and checksum (however I >> have to point out that BTRFS fails

Re: Ongoing Btrfs stability issues

2018-03-11 Thread Christoph Anton Mitterer
On Sun, 2018-03-11 at 18:51 +0100, Goffredo Baroncelli wrote: > > COW is needed to properly checksum the data. Otherwise it is not > possible to ensure the coherency between data and checksum (however I > have to point out that BTRFS fails even in this case [*]). > We could rearrange this sentence,

Re: Ongoing Btrfs stability issues

2018-03-11 Thread Goffredo Baroncelli
On 03/10/2018 03:29 PM, Christoph Anton Mitterer wrote: > On Sat, 2018-03-10 at 14:04 +0200, Nikolay Borisov wrote: >> So for OLTP workloads you definitely want nodatacow enabled, bear in >> mind this also disables crc checksumming, but your db engine should >> already have such functionality

Re: Ongoing Btrfs stability issues

2018-03-10 Thread Christoph Anton Mitterer
On Sat, 2018-03-10 at 14:04 +0200, Nikolay Borisov wrote: > So for OLTP workloads you definitely want nodatacow enabled, bear in > mind this also disables crc checksumming, but your db engine should > already have such functionality implemented in it. Unlike repeated claims made here on the list

Re: Ongoing Btrfs stability issues

2018-03-10 Thread Nikolay Borisov
On 9.03.2018 21:05, Alex Adriaanse wrote:
> Am I correct to understand that nodatacow doesn't really avoid CoW when
> you're using snapshots? In a filesystem that's snapshotted
Yes, so nodatacow won't interfere with how snapshots operate. For more information on that topic check the

Re: Ongoing Btrfs stability issues

2018-03-09 Thread Alex Adriaanse
On Mar 9, 2018, at 3:54 AM, Nikolay Borisov wrote: > >> Sorry, I clearly missed that one. I have applied the patch you referenced >> and rebooted the VM in question. This morning we had another FS failure on >> the same machine that caused it to go into readonly mode. This

Re: Ongoing Btrfs stability issues

2018-03-09 Thread Nikolay Borisov
> Sorry, I clearly missed that one. I have applied the patch you referenced and > rebooted the VM in question. This morning we had another FS failure on the > same machine that caused it to go into readonly mode. This happened after > that device was experiencing 100% I/O utilization for some

Re: Ongoing Btrfs stability issues

2018-03-08 Thread Alex Adriaanse
On Mar 2, 2018, at 11:29 AM, Liu Bo wrote: > On Thu, Mar 01, 2018 at 09:40:41PM +0200, Nikolay Borisov wrote: >> On 1.03.2018 21:04, Alex Adriaanse wrote: >>> Thanks so much for the suggestions so far, everyone. I wanted to report >>> back on this. Last Friday I made the

Re: Ongoing Btrfs stability issues

2018-03-02 Thread Liu Bo
On Thu, Mar 01, 2018 at 09:40:41PM +0200, Nikolay Borisov wrote: > > > On 1.03.2018 21:04, Alex Adriaanse wrote: > > On Feb 16, 2018, at 1:44 PM, Austin S. Hemmelgarn > > wrote: ... > > > [496003.641729] BTRFS: error (device xvdc) in __btrfs_free_extent:7076: > >

Re: Ongoing Btrfs stability issues

2018-03-01 Thread Qu Wenruo
On 2018-03-02 03:04, Alex Adriaanse wrote: > On Feb 16, 2018, at 1:44 PM, Austin S. Hemmelgarn > wrote: >> I would suggest changing this to eliminate the balance with '-dusage=10' >> (it's redundant with the '-dusage=20' one unless your filesystem is in >>

Re: Ongoing Btrfs stability issues

2018-03-01 Thread Nikolay Borisov
On 1.03.2018 21:04, Alex Adriaanse wrote: > On Feb 16, 2018, at 1:44 PM, Austin S. Hemmelgarn > wrote: >> I would suggest changing this to eliminate the balance with '-dusage=10' >> (it's redundant with the '-dusage=20' one unless your filesystem is in >>

Re: Ongoing Btrfs stability issues

2018-03-01 Thread Alex Adriaanse
On Feb 16, 2018, at 1:44 PM, Austin S. Hemmelgarn wrote: > I would suggest changing this to eliminate the balance with '-dusage=10' > (it's redundant with the '-dusage=20' one unless your filesystem is in > pathologically bad shape), and adding equivalent filters for
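To illustrate why the '-dusage=10' pass is redundant: the balance `usage` filter selects block groups whose usage is below the given percentage, so everything matched at 10 is also matched at 20. A toy sketch with invented figures follows (the `select_for_balance` helper and the usage numbers are hypothetical, not taken from the reporter's filesystem).

```python
def select_for_balance(usage_percent_by_group, threshold):
    """Return ids of block groups whose usage is below `threshold` percent."""
    return {
        group_id
        for group_id, used in usage_percent_by_group.items()
        if used < threshold
    }


# Invented data block groups: id -> percent of the block group in use.
data_block_groups = {1: 3, 2: 9, 3: 15, 4: 42, 5: 88}

at_10 = select_for_balance(data_block_groups, 10)
at_20 = select_for_balance(data_block_groups, 20)

# Every block group the usage=10 pass would touch is also touched by usage=20.
assert at_10 <= at_20
print(sorted(at_10), sorted(at_20))
```

The subset relation holds for any input, which is the sense in which the lower threshold adds nothing once the higher one is run.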

Re: Ongoing Btrfs stability issues

2018-02-17 Thread Shehbaz Jaffer
> First of all, the ssd mount option does not have anything to do with having single or DUP metadata.
Sorry about that, I agree with you. -nossd would not help in increasing reliability in any way. One alternative would be to format and force duplication of metadata during filesystem creation on

Re: Ongoing Btrfs stability issues

2018-02-17 Thread Hans van Kranenburg
On 02/17/2018 05:34 AM, Shehbaz Jaffer wrote: >> It's hosted on an EBS volume; we don't use ephemeral storage at all. The EBS >> volumes are all SSD > > I have recently done some SSD corruption experiments on a small set of > workloads, so I thought I would share my experience. > > While creating

Re: Ongoing Btrfs stability issues

2018-02-16 Thread Shehbaz Jaffer
> It's hosted on an EBS volume; we don't use ephemeral storage at all. The EBS
> volumes are all SSD
I have recently done some SSD corruption experiments on a small set of workloads, so I thought I would share my experience. While creating btrfs using the mkfs.btrfs command for SSDs, by default the

Re: Ongoing Btrfs stability issues

2018-02-16 Thread Duncan
Austin S. Hemmelgarn posted on Fri, 16 Feb 2018 14:44:07 -0500 as excerpted: > This will probably sound like an odd question, but does BTRFS think your > storage devices are SSD's or not? Based on what you're saying, it > sounds like you're running into issues resulting from the >

Re: Ongoing Btrfs stability issues

2018-02-16 Thread Austin S. Hemmelgarn
On 2018-02-15 11:18, Alex Adriaanse wrote: We've been using Btrfs in production on AWS EC2 with EBS devices for over 2 years. There is so much I love about Btrfs: CoW snapshots, compression, subvolumes, flexibility, the tools, etc. However, lack of stability has been a serious ongoing issue 

Re: Ongoing Btrfs stability issues

2018-02-15 Thread Nikolay Borisov
On 16.02.2018 06:54, Alex Adriaanse wrote: > >> On Feb 15, 2018, at 2:42 PM, Nikolay Borisov wrote: >> >> On 15.02.2018 21:41, Alex Adriaanse wrote: >>> On Feb 15, 2018, at 12:00 PM, Nikolay Borisov wrote: So in all of the cases you are

Re: Ongoing Btrfs stability issues

2018-02-15 Thread Alex Adriaanse
> On Feb 15, 2018, at 2:42 PM, Nikolay Borisov wrote: > > On 15.02.2018 21:41, Alex Adriaanse wrote: >> >>> On Feb 15, 2018, at 12:00 PM, Nikolay Borisov wrote: >>> >>> So in all of the cases you are hitting some form of premature enospc. >>> There was a

Re: Ongoing Btrfs stability issues

2018-02-15 Thread Nikolay Borisov
On 15.02.2018 21:41, Alex Adriaanse wrote: > >> On Feb 15, 2018, at 12:00 PM, Nikolay Borisov wrote: >> >> So in all of the cases you are hitting some form of premature enospc. >> There was a fix that landed in 4.15 that should have fixed a rather >> long-standing issue with

Re: Ongoing Btrfs stability issues

2018-02-15 Thread Alex Adriaanse
> On Feb 15, 2018, at 12:00 PM, Nikolay Borisov wrote: > > So in all of the cases you are hitting some form of premature enospc. > There was a fix that landed in 4.15 that should have fixed a rather > long-standing issue with the way metadata reservations are satisfied, >

Re: Ongoing Btrfs stability issues

2018-02-15 Thread Nikolay Borisov
On 15.02.2018 18:18, Alex Adriaanse wrote: > We've been using Btrfs in production on AWS EC2 with EBS devices for over 2 > years. There is so much I love about Btrfs: CoW snapshots, compression, > subvolumes, flexibility, the tools, etc. However, lack of stability has been > a serious ongoing

Ongoing Btrfs stability issues

2018-02-15 Thread Alex Adriaanse
We've been using Btrfs in production on AWS EC2 with EBS devices for over 2 years. There is so much I love about Btrfs: CoW snapshots, compression, subvolumes, flexibility, the tools, etc. However, lack of stability has been a serious ongoing issue for us, and we're getting to the point that