Re: [ceph-users] 2x replication: A BIG warning

2016-12-12 Thread Oliver Humpage
> On 12 Dec 2016, at 07:59, Wido den Hollander wrote:
>
> As David already said, when all OSDs are up and in for a PG, Ceph will wait for ALL OSDs to ack the write. Writes in RADOS are always synchronous.

Apologies, I missed that. Clearly I’ve been misunderstanding min_size for a while then:
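For anyone double-checking their own pools after this exchange, both settings can be read back per pool. A minimal sketch, assuming a pool named "rbd" (the pool name is illustrative, not from the thread):

    ceph osd pool get rbd size        # replicas kept for each object
    ceph osd pool get rbd min_size    # replicas that must be available for I/O to proceed

With all replicas up and in, writes are acknowledged only once every replica has the data; min_size only governs whether a degraded PG still accepts I/O.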

Re: [ceph-users] 2x replication: A BIG warning

2016-12-11 Thread Wido den Hollander
> On 9 December 2016 at 22:31, Oliver Humpage wrote:
>
>> On 7 Dec 2016, at 15:01, Wido den Hollander wrote:
>>
>> I would always run with min_size = 2 and manually switch to min_size = 1 if the situation really requires it at that moment.
>
> Thanks for this thread, it’s been
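As a sketch of the manual switch Wido describes, again assuming a pool named "rbd" (illustrative only): drop min_size for the duration of the incident, then restore it.

    ceph osd pool set rbd min_size 1    # emergency only: accept I/O with a single replica
    # ...recover or replace the failed OSDs...
    ceph osd pool set rbd min_size 2    # restore the safer setting once the PGs are clean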

Re: [ceph-users] 2x replication: A BIG warning

2016-12-09 Thread David Turner
rshed.co.uk] Sent: Friday, December 09, 2016 2:31 PM
To: ceph-us...@ceph.com
Subject: Re: [ceph-users] 2x replication: A BIG warning

On 7 Dec 2016, at 15:01, Wido den Hollander <w...@42on.com> wrote:
> I would always run with min_size = 2 and manually switch to min_size = 1 if the si

Re: [ceph-users] 2x replication: A BIG warning

2016-12-09 Thread Oliver Humpage
> On 7 Dec 2016, at 15:01, Wido den Hollander wrote:
>
> I would always run with min_size = 2 and manually switch to min_size = 1 if the situation really requires it at that moment.

Thanks for this thread, it’s been really useful. I might have misunderstood, but does min_size=2 also mean th

Re: [ceph-users] 2x replication: A BIG warning

2016-12-09 Thread Kees Meijs
Hi Wido,

Since it's a Friday night, I decided to just go for it. ;-) It took a while to rebalance the cache tier but all went well. Thanks again for your valuable advice!

Best regards, enjoy your weekend,
Kees

On 07-12-16 14:58, Wido den Hollander wrote:
>> Anyway, any things to consider or cou

Re: [ceph-users] 2x replication: A BIG warning

2016-12-07 Thread Wido den Hollander
> On 7 December 2016 at 15:54, LOIC DEVULDER wrote:
>
> Hi Wido,
>
>> As a Ceph consultant I get numerous calls throughout the year to help people with getting their broken Ceph clusters back online.
>>
>> The causes of downtime vary vastly, but one of the biggest causes is that

Re: [ceph-users] 2x replication: A BIG warning

2016-12-07 Thread LOIC DEVULDER
> -----Original Message-----
> From: Wido den Hollander [mailto:w...@42on.com]
> Sent: Wednesday, 7 December 2016 16:01
> To: ceph-us...@ceph.com; LOIC DEVULDER - U329683
> Subject: RE: [ceph-users] 2x replication: A BIG warning
>
>> On 7 December 201

Re: [ceph-users] 2x replication: A BIG warning

2016-12-07 Thread LOIC DEVULDER
Hi Wido,

> As a Ceph consultant I get numerous calls throughout the year to help people with getting their broken Ceph clusters back online.
>
> The causes of downtime vary vastly, but one of the biggest causes is that people use replication 2x. size = 2, min_size = 1.

We are building a Ceph

Re: [ceph-users] 2x replication: A BIG warning

2016-12-07 Thread Peter Maloney
On 12/07/16 14:58, Wido den Hollander wrote:
>> On 7 December 2016 at 11:29, Kees Meijs wrote:
>>
>> Hi Wido,
>>
>> Valid point. At this moment, we're using a cache pool with size = 2 and would like to "upgrade" to size = 3.
>>
>> Again, you're absolutely right... ;-)
>>
>> Anyway, any thin

Re: [ceph-users] 2x replication: A BIG warning

2016-12-07 Thread Wido den Hollander
> On 7 December 2016 at 11:29, Kees Meijs wrote:
>
> Hi Wido,
>
> Valid point. At this moment, we're using a cache pool with size = 2 and would like to "upgrade" to size = 3.
>
> Again, you're absolutely right... ;-)
>
> Anyway, any things to consider or could we just:
>
> 1. Run "cep

Re: [ceph-users] 2x replication: A BIG warning

2016-12-07 Thread Wido den Hollander
> On 7 December 2016 at 10:06, Dan van der Ster wrote:
>
> Hi Wido,
>
> Thanks for the warning. We have one pool as you described (size 2, min_size 1), simply because 3 replicas would be too expensive and erasure coding didn't meet our performance requirements. We are well aware of th

Re: [ceph-users] 2x replication: A BIG warning

2016-12-07 Thread Дмитрий Глушенок
Hi,

The assumptions are:

- the OSD is nearly full
- the HDD vendor does not hide a real LSE (latent sector error) rate like 1 in 10^18 under "not more than 1 unrecoverable error in 10^15 bits read"

In case of disk (OSD) failure, Ceph has to read a copy of the disk from other nodes (to restore redundancy). More yo

Re: [ceph-users] 2x replication: A BIG warning

2016-12-07 Thread Christian Balzer
Hello,

On Wed, 7 Dec 2016 14:49:28 +0300 Дмитрий Глушенок wrote:
> RAID10 will also suffer from LSE on big disks, won't it?

If LSE stands for latent sector errors, then yes, but that's not limited to large disks per se. And you counter it by having another replica and checksums like in ZFS o

Re: [ceph-users] 2x replication: A BIG warning

2016-12-07 Thread Wolfgang Link
Hi,

I'm very interested in this calculation. What assumptions have you made? Network speed, OSD fill level, etc.?

Thanks,
Wolfgang

On 12/07/2016 11:16 AM, Дмитрий Глушенок wrote:
> Hi,
>
> Let me add a little math to your warning: with an LSE rate of 1 in 10^15 on modern 8 TB disks

Re: [ceph-users] 2x replication: A BIG warning

2016-12-07 Thread Дмитрий Глушенок
RAID10 will also suffer from LSE on big disks, won't it?

> On 7 Dec 2016, at 13:35, Christian Balzer wrote:
>
> Hello,
>
> On Wed, 7 Dec 2016 13:16:45 +0300 Дмитрий Глушенок wrote:
>
>> Hi,
>>
>> Let me add a little math to your warning: with an LSE rate of 1 in 10^15 on modern 8

Re: [ceph-users] 2x replication: A BIG warning

2016-12-07 Thread Christian Balzer
Hello,

On Wed, 7 Dec 2016 13:16:45 +0300 Дмитрий Глушенок wrote:
> Hi,
>
> Let me add a little math to your warning: with an LSE rate of 1 in 10^15 on modern 8 TB disks there is a 5.8% chance of hitting an LSE during recovery of an 8 TB disk. So, every 18th recovery will probably fail. Similarly to RAI

Re: [ceph-users] 2x replication: A BIG warning

2016-12-07 Thread Kees Meijs
Hi Wido,

Valid point. At this moment, we're using a cache pool with size = 2 and would like to "upgrade" to size = 3.

Again, you're absolutely right... ;-)

Anyway, any things to consider or could we just:

1. Run "ceph osd pool set cache size 3".
2. Wait for rebalancing to complete.
3. Run "c
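A sketch of that sequence, assuming the cache pool is literally named "cache". Note that step 3 is cut off above; min_size = 2 there is inferred from Wido's earlier advice rather than quoted from the thread:

    ceph osd pool set cache size 3        # step 1: raise the replica count; rebalancing starts
    ceph -s                               # step 2: repeat until all PGs report active+clean
    ceph osd pool set cache min_size 2    # step 3 (inferred): require two replicas for I/O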

Re: [ceph-users] 2x replication: A BIG warning

2016-12-07 Thread Дмитрий Глушенок
Hi,

Let me add a little math to your warning: with an LSE rate of 1 in 10^15, there is a 5.8% chance of hitting an LSE during recovery of a modern 8 TB disk. So, every 18th recovery will probably fail. Similarly to RAID6 (two parity disks), size=3 mitigates the problem. By the way - why it is a c
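For reference, a sketch of the usual back-of-envelope calculation behind that figure; the exact percentage depends on whether the 8 TB is counted in decimal or binary bytes, which presumably accounts for the gap between this estimate and the quoted 5.8%:

    bits read during recovery:  n = 8 * 10^12 bytes * 8 = 6.4 * 10^13 bits
    P(at least one LSE) = 1 - (1 - 10^-15)^n
                        ≈ 1 - e^(-n * 10^-15)
                        = 1 - e^(-0.064)
                        ≈ 6.2%

And since 1/0.058 ≈ 17, a 5.8% chance means roughly "every 18th recovery" fails.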

Re: [ceph-users] 2x replication: A BIG warning

2016-12-07 Thread Dan van der Ster
Hi Wido,

Thanks for the warning. We have one pool as you described (size 2, min_size 1), simply because 3 replicas would be too expensive and erasure coding didn't meet our performance requirements. We are well aware of the risks, but of course this is a balancing act between risk and cost. Anywa

[ceph-users] 2x replication: A BIG warning

2016-12-07 Thread Wido den Hollander
Hi,

As a Ceph consultant I get numerous calls throughout the year to help people with getting their broken Ceph clusters back online.

The causes of downtime vary vastly, but one of the biggest causes is that people use replication 2x: size = 2, min_size = 1.

In 2016 the number of cases I have
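For new clusters, the safer settings this thread argues for can be made the cluster-wide default, so that every newly created pool starts out with them; a minimal ceph.conf sketch using the standard options:

    [global]
    # Defaults applied to pools created after this change:
    # keep three copies, and block I/O when fewer than two are available.
    osd pool default size = 3
    osd pool default min size = 2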