Re: Ongoing Btrfs stability issues

2018-03-14 Thread Goffredo Baroncelli
On 03/14/2018 08:27 PM, Austin S. Hemmelgarn wrote: > On 2018-03-14 14:39, Goffredo Baroncelli wrote: >> On 03/14/2018 01:02 PM, Austin S. Hemmelgarn wrote: >> [...] In btrfs, a checksum mismatch creates an -EIO error during the reading. In a conventional filesystem (or a btrfs file

Re: Ongoing Btrfs stability issues

2018-03-14 Thread Austin S. Hemmelgarn
On 2018-03-14 14:39, Goffredo Baroncelli wrote: On 03/14/2018 01:02 PM, Austin S. Hemmelgarn wrote: [...] In btrfs, a checksum mismatch creates an -EIO error during the reading. In a conventional filesystem (or a btrfs filesystem w/o datasum) there is no checksum, so this problem doesn't exis

Re: Ongoing Btrfs stability issues

2018-03-14 Thread Goffredo Baroncelli
On 03/14/2018 01:02 PM, Austin S. Hemmelgarn wrote: [...] >> >> In btrfs, a checksum mismatch creates an -EIO error during the reading. In a >> conventional filesystem (or a btrfs filesystem w/o datasum) there is no >> checksum, so this problem doesn't exist. >> >> I am curious how ZFS solves thi

Re: Ongoing Btrfs stability issues

2018-03-14 Thread Austin S. Hemmelgarn
On 2018-03-13 15:36, Goffredo Baroncelli wrote: On 03/12/2018 10:48 PM, Christoph Anton Mitterer wrote: On Mon, 2018-03-12 at 22:22 +0100, Goffredo Baroncelli wrote: Unfortunately no, the likelihood might be 100%: there are some patterns which trigger this problem quite easily. See the link whi

Re: Ongoing Btrfs stability issues

2018-03-13 Thread Christoph Anton Mitterer
On Tue, 2018-03-13 at 20:36 +0100, Goffredo Baroncelli wrote: > A checksum mismatch is returned as -EIO by a read() syscall. This is > an event handled badly by most programs. Then these programs must simply be fixed... otherwise they'll also fail under normal circumstances with btrfs,

Re: Ongoing Btrfs stability issues

2018-03-13 Thread Goffredo Baroncelli
On 03/12/2018 10:48 PM, Christoph Anton Mitterer wrote: > On Mon, 2018-03-12 at 22:22 +0100, Goffredo Baroncelli wrote: >> Unfortunately no, the likelihood might be 100%: there are some >> patterns which trigger this problem quite easily. See the link which >> I posted in my previous email. There w

Re: Ongoing Btrfs stability issues

2018-03-13 Thread Patrik Lundquist
On 9 March 2018 at 20:05, Alex Adriaanse wrote: > > Yes, we have PostgreSQL databases running these VMs that put a heavy I/O load > on these machines. Dump the databases and recreate them with --data-checksums and Btrfs No_COW attribute. You can add this to /etc/postgresql-common/createcluster.

Re: Ongoing Btrfs stability issues

2018-03-12 Thread Christoph Anton Mitterer
On Mon, 2018-03-12 at 22:22 +0100, Goffredo Baroncelli wrote: > Unfortunately no, the likelihood might be 100%: there are some > patterns which trigger this problem quite easily. See the link which > I posted in my previous email. There was a program which creates a > bad checksum (in COW+DATASUM m

Re: Ongoing Btrfs stability issues

2018-03-12 Thread Goffredo Baroncelli
On 03/11/2018 11:37 PM, Christoph Anton Mitterer wrote: > On Sun, 2018-03-11 at 18:51 +0100, Goffredo Baroncelli wrote: >> >> COW is needed to properly checksum the data. Otherwise it is not >> possible to ensure the coherency between data and checksum (however I >> have to point out that BTRFS fails

Re: Ongoing Btrfs stability issues

2018-03-11 Thread Christoph Anton Mitterer
On Sun, 2018-03-11 at 18:51 +0100, Goffredo Baroncelli wrote: > > COW is needed to properly checksum the data. Otherwise it is not > possible to ensure the coherency between data and checksum (however I > have to point out that BTRFS fails even in this case [*]). > We could rearrange this sentence, s

Re: Ongoing Btrfs stability issues

2018-03-11 Thread Goffredo Baroncelli
On 03/10/2018 03:29 PM, Christoph Anton Mitterer wrote: > On Sat, 2018-03-10 at 14:04 +0200, Nikolay Borisov wrote: >> So for OLTP workloads you definitely want nodatacow enabled, bear in >> mind this also disables crc checksumming, but your db engine should >> already have such functionality imple

Re: Ongoing Btrfs stability issues

2018-03-10 Thread Christoph Anton Mitterer
On Sat, 2018-03-10 at 14:04 +0200, Nikolay Borisov wrote: > So for OLTP workloads you definitely want nodatacow enabled, bear in > mind this also disables crc checksumming, but your db engine should > already have such functionality implemented in it. Unlike repeated claims made here on the list a

Re: Ongoing Btrfs stability issues

2018-03-10 Thread Nikolay Borisov
On 9.03.2018 21:05, Alex Adriaanse wrote: > Am I correct to understand that nodatacow doesn't really avoid CoW when > you're using snapshots? In a filesystem that's snapshotted Yes, so nodatacow won't interfere with how snapshots operate. For more information on that topic check the following

Re: Ongoing Btrfs stability issues

2018-03-09 Thread Alex Adriaanse
On Mar 9, 2018, at 3:54 AM, Nikolay Borisov wrote: > >> Sorry, I clearly missed that one. I have applied the patch you referenced >> and rebooted the VM in question. This morning we had another FS failure on >> the same machine that caused it to go into readonly mode. This happened >> after th

Re: Ongoing Btrfs stability issues

2018-03-09 Thread Nikolay Borisov
> Sorry, I clearly missed that one. I have applied the patch you referenced and > rebooted the VM in question. This morning we had another FS failure on the > same machine that caused it to go into readonly mode. This happened after > that device was experiencing 100% I/O utilization for some t

Re: Ongoing Btrfs stability issues

2018-03-08 Thread Alex Adriaanse
On Mar 2, 2018, at 11:29 AM, Liu Bo wrote: > On Thu, Mar 01, 2018 at 09:40:41PM +0200, Nikolay Borisov wrote: >> On 1.03.2018 21:04, Alex Adriaanse wrote: >>> Thanks so much for the suggestions so far, everyone. I wanted to report >>> back on this. Last Friday I made the following changes per su

Re: Ongoing Btrfs stability issues

2018-03-02 Thread Liu Bo
On Thu, Mar 01, 2018 at 09:40:41PM +0200, Nikolay Borisov wrote: > > > On 1.03.2018 21:04, Alex Adriaanse wrote: > > On Feb 16, 2018, at 1:44 PM, Austin S. Hemmelgarn > > wrote: ... > > > [496003.641729] BTRFS: error (device xvdc) in __btrfs_free_extent:7076: > > errno=-28 No space left > >

Re: Ongoing Btrfs stability issues

2018-03-01 Thread Qu Wenruo
On 2018年03月02日 03:04, Alex Adriaanse wrote: > On Feb 16, 2018, at 1:44 PM, Austin S. Hemmelgarn > wrote: >> I would suggest changing this to eliminate the balance with '-dusage=10' >> (it's redundant with the '-dusage=20' one unless your filesystem is in >> pathologically bad shape), and addi

Re: Ongoing Btrfs stability issues

2018-03-01 Thread Nikolay Borisov
On 1.03.2018 21:04, Alex Adriaanse wrote: > On Feb 16, 2018, at 1:44 PM, Austin S. Hemmelgarn > wrote: >> I would suggest changing this to eliminate the balance with '-dusage=10' >> (it's redundant with the '-dusage=20' one unless your filesystem is in >> pathologically bad shape), and addin

Re: Ongoing Btrfs stability issues

2018-03-01 Thread Alex Adriaanse
On Feb 16, 2018, at 1:44 PM, Austin S. Hemmelgarn wrote: > I would suggest changing this to eliminate the balance with '-dusage=10' > (it's redundant with the '-dusage=20' one unless your filesystem is in > pathologically bad shape), and adding equivalent filters for balancing > metadata (which

Re: Ongoing Btrfs stability issues

2018-02-17 Thread Shehbaz Jaffer
>First of all, the ssd mount option does not have anything to do with having single or DUP metadata. Sorry about that, I agree with you. -nossd would not help in increasing reliability in any way. One alternative would be to format and force duplication of metadata during filesystem creation on SS

Re: Ongoing Btrfs stability issues

2018-02-17 Thread Hans van Kranenburg
On 02/17/2018 05:34 AM, Shehbaz Jaffer wrote: >> It's hosted on an EBS volume; we don't use ephemeral storage at all. The EBS >> volumes are all SSD > > I have recently done some SSD corruption experiments on a small set of > workloads, so I thought I would share my experience. > > While creating

Re: Ongoing Btrfs stability issues

2018-02-16 Thread Shehbaz Jaffer
>It's hosted on an EBS volume; we don't use ephemeral storage at all. The EBS >volumes are all SSD I have recently done some SSD corruption experiments on a small set of workloads, so I thought I would share my experience. While creating btrfs using the mkfs.btrfs command for SSDs, by default the meta

Re: Ongoing Btrfs stability issues

2018-02-16 Thread Duncan
Austin S. Hemmelgarn posted on Fri, 16 Feb 2018 14:44:07 -0500 as excerpted: > This will probably sound like an odd question, but does BTRFS think your > storage devices are SSD's or not? Based on what you're saying, it > sounds like you're running into issues resulting from the > over-aggressive

Re: Ongoing Btrfs stability issues

2018-02-16 Thread Austin S. Hemmelgarn
into readonly mode. We've spent an enormous amount of time trying to recover corrupted filesystems, and the time that servers were down as a result of Btrfs instability has accumulated to many days. We've made many changes to try to improve Btrfs stability: upgrading to newer kernels, se

Re: Ongoing Btrfs stability issues

2018-02-15 Thread Nikolay Borisov
On 16.02.2018 06:54, Alex Adriaanse wrote: > >> On Feb 15, 2018, at 2:42 PM, Nikolay Borisov wrote: >> >> On 15.02.2018 21:41, Alex Adriaanse wrote: >>> On Feb 15, 2018, at 12:00 PM, Nikolay Borisov wrote: So in all of the cases you are hitting some form of premature enospc. >>>

Re: Ongoing Btrfs stability issues

2018-02-15 Thread Alex Adriaanse
> On Feb 15, 2018, at 2:42 PM, Nikolay Borisov wrote: > > On 15.02.2018 21:41, Alex Adriaanse wrote: >> >>> On Feb 15, 2018, at 12:00 PM, Nikolay Borisov wrote: >>> >>> So in all of the cases you are hitting some form of premature enospc. >>> There was a fix that landed in 4.15 that should ha

Re: Ongoing Btrfs stability issues

2018-02-15 Thread Nikolay Borisov
On 15.02.2018 21:41, Alex Adriaanse wrote: > >> On Feb 15, 2018, at 12:00 PM, Nikolay Borisov wrote: >> >> So in all of the cases you are hitting some form of premature enospc. >> There was a fix that landed in 4.15 that should have fixed a rather >> long-standing issue with the way metadata re

Re: Ongoing Btrfs stability issues

2018-02-15 Thread Alex Adriaanse
> On Feb 15, 2018, at 12:00 PM, Nikolay Borisov wrote: > > So in all of the cases you are hitting some form of premature enospc. > There was a fix that landed in 4.15 that should have fixed a rather > long-standing issue with the way metadata reservations are satisfied, > namely: > > 996478ca9c

Re: Ongoing Btrfs stability issues

2018-02-15 Thread Nikolay Borisov
freezing, or the filesystem > going into readonly mode. We've spent an enormous amount of time trying to > recover corrupted filesystems, and the time that servers were down as a > result of Btrfs instability has accumulated to many days. > > We've made many changes to

Ongoing Btrfs stability issues

2018-02-15 Thread Alex Adriaanse
[...] enormous amount of time trying to recover corrupted filesystems, and the time that servers were down as a result of Btrfs instability has accumulated to many days. We've made many changes to try to improve Btrfs stability: upgrading to newer kernels, setting up nightly balances, setting up

Re: btrfs stability

2016-05-26 Thread Roman Mamedov
On Fri, 27 May 2016 00:42:07 +0200 Diego Torres wrote: > Btrfs is the only fs that can add drives one by one to an existing raid > setup, and use the new space immediately, without replacing all the drives. Ext4, XFS, JFS or pretty much any FS which can be resized upwards can also do that, when

btrfs stability

2016-05-26 Thread Diego Torres
Hi there, I've been using btrfs with a raid5 configuration with 3 disks for 6 months, and then with 4 disks for a couple of months more. I run a weekly scrub, and a monthly balance. Btrfs is the only fs that can add drives one by one to an existing raid setup, and use the new space immediately, wi

Re: btrfs stability

2013-01-28 Thread Josef Bacik
On Sat, Jan 26, 2013 at 01:27:11PM -0700, Andrew McNabb wrote: > Here's an update. I tried the new kernel, and I seem to be having some > new (possibly worse) problems. In my ssh session, I'm seeing many errors > of this sort: > > Message from syslogd@guru at Jan 26 13:13:14 ... > kernel:[ 308.

Re: btrfs stability

2013-01-28 Thread Josef Bacik
On Sat, Jan 26, 2013 at 01:27:11PM -0700, Andrew McNabb wrote: > Here's an update. I tried the new kernel, and I seem to be having some > new (possibly worse) problems. In my ssh session, I'm seeing many errors > of this sort: > > Message from syslogd@guru at Jan 26 13:13:14 ... > kernel:[ 308.

Re: btrfs stability

2013-01-26 Thread Andrew McNabb
Here's an update. I tried the new kernel, and I seem to be having some new (possibly worse) problems. In my ssh session, I'm seeing many errors of this sort: Message from syslogd@guru at Jan 26 13:13:14 ... kernel:[ 308.223834] BUG: soft lockup - CPU#0 stuck for 23s! [btrfs-endio-wri:2073] Me

Re: btrfs stability

2013-01-25 Thread Andrew McNabb
On Fri, Jan 25, 2013 at 03:53:22PM -0500, Josef Bacik wrote: > > Actually for this one, how did you remove the disk? Did you just yank it out > while the box was running? Did you mount -o degraded and then delete the > device > and then remove it? How exactly did you get to this situation? Th

Re: btrfs stability

2013-01-25 Thread Andrew McNabb
On Fri, Jan 25, 2013 at 03:37:17PM -0500, Josef Bacik wrote: > > https://bugzilla.redhat.com/show_bug.cgi?id=903794 > > This one is just an allocator warning because the relocator doesn't do the > right > accounting for relocation. It's just complaining, we need to fix it but it > won't > keep it

Re: btrfs stability

2013-01-25 Thread Josef Bacik
On Fri, Jan 25, 2013 at 01:05:14PM -0700, Andrew McNabb wrote: > I tried creating a multi-device btrfs filesystem for the first time (on > Fedora 18 with 3.7.2-204.fc18.x86_64), and I ran into some problems. I > had heard that btrfs is now reasonably stable, and though I expected to > possibly see

Re: btrfs stability

2013-01-25 Thread Josef Bacik
On Fri, Jan 25, 2013 at 01:05:14PM -0700, Andrew McNabb wrote: > I tried creating a multi-device btrfs filesystem for the first time (on > Fedora 18 with 3.7.2-204.fc18.x86_64), and I ran into some problems. I > had heard that btrfs is now reasonably stable, and though I expected to > possibly see

btrfs stability

2013-01-25 Thread Andrew McNabb
I tried creating a multi-device btrfs filesystem for the first time (on Fedora 18 with 3.7.2-204.fc18.x86_64), and I ran into some problems. I had heard that btrfs is now reasonably stable, and though I expected to possibly see a problem here or there, I was a little surprised at just how many pro