[ceph-users] Re: Is ceph itself a single point of failure?

2021-11-22 Thread Marc
> Many of us deploy ceph as a solution for storage high-availability.
> Over time, I've encountered a couple of moments when ceph refused to
> deliver I/O to VMs even when only a tiny part of the PGs were stuck in
> non-active states due to challenges on the OSDs.

I do not know what you mean by this, you can tune this with your min size
and replication. ...
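As a rough sketch of the kind of check Marc is pointing at (the pool name
"vmpool" below is only a placeholder), the stuck PGs and a pool's
replication settings can be inspected with:

    # show why the cluster is not HEALTH_OK, including inactive PGs
    ceph health detail
    # list PGs that are stuck in an inactive state
    ceph pg dump_stuck inactive
    # check the replication settings of a given pool
    ceph osd pool get vmpool size
    ceph osd pool get vmpool min_size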

[ceph-users] Re: Is ceph itself a single point of failure?

2021-11-22 Thread Marius Leustean
> I do not know what you mean by this, you can tune this with your min size
> and replication. It is hard to believe that hard drives fail in exactly
> the same PG. I wonder if this is not more related to your 'non-default'
> config?

In my setup size=2 and min_size=1. I had cases when 1 PG being stuck in a
non-active state ...
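For context, a quick way to review how every pool in a cluster is
configured (output will vary by cluster) is:

    # one line per pool, including size, min_size and the crush rule in use
    ceph osd pool ls detail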

[ceph-users] Re: Is ceph itself a single point of failure?

2021-11-22 Thread Janne Johansson
On Mon, 22 Nov 2021 at 11:40, Marius Leustean wrote:
> > I do not know what you mean by this, you can tune this with your min
> > size and replication. It is hard to believe that hard drives fail in
> > exactly the same PG. I wonder if this is not more related to your
> > 'non-default' config?
>
> In my setup size=2 and min_size=1. ...

[ceph-users] Re: Is ceph itself a single point of failure?

2021-11-22 Thread Martin Verges
> In my setup size=2 and min_size=1

Just don't.

> Real case: host goes down, individual OSDs from other hosts started
> consuming >100GB RAM during backfill and get OOM-killed

Configuring your cluster in a better way can help. There will never be a
single system so redundant that it has 100% uptime ...
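One possible way (not necessarily what Martin has in mind) to rein in
backfill pressure and OSD memory use; the values below are illustrative
only, not recommendations for any particular hardware:

    # allow only one concurrent backfill per OSD
    ceph config set osd osd_max_backfills 1
    # limit concurrent recovery operations per OSD
    ceph config set osd osd_recovery_max_active 1
    # ask each OSD to keep its memory footprint around 4 GiB
    ceph config set osd osd_memory_target 4294967296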

[ceph-users] Re: Is ceph itself a single point of failure?

2021-11-22 Thread Eino Tuominen
On Monday, November 22, 2021 at 12:39, Marius Leustean wrote:
> In my setup size=2 and min_size=1.

I'm sorry, but that's the root cause of the problems you're seeing. You
really want size=3, min_size=2 for your production cluster unless you have
some specific uncommon use case and you really know what you are doing.
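A minimal sketch of the change Eino recommends ("mypool" is a placeholder
for your pool name):

    # raise the replica count to 3 and require 2 replicas before serving I/O
    ceph osd pool set mypool size 3
    ceph osd pool set mypool min_size 2
    # verify the new settings
    ceph osd pool get mypool size
    ceph osd pool get mypool min_size

Note that raising size on an existing pool triggers backfill to create the
extra replicas, so it is best done while the cluster is otherwise healthy.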