[ceph-users] Re: Latency increase after upgrade 14.2.8 to 14.2.16

2021-02-12 Thread Björn Dolkemeier
Thanks for the quick reply, Frank. Sorry, the graphs/attachment were filtered. Here is an example of one latency: https://drive.google.com/file/d/1qSWmSmZ6JXVweepcoY13ofhfWXrBi2uZ/view?usp=sharing I’m aware

[ceph-users] Latency increase after upgrade 14.2.8 to 14.2.16

2021-02-12 Thread Björn Dolkemeier
Hi, after upgrading Ceph from 14.2.8 to 14.2.16 we experienced increased latencies. There were no changes in hardware, configuration, workload or networking, just a rolling update via ceph-ansible on a running production cluster. The cluster consists of 16 OSDs (all SSD) over 4 Nodes. The VMs
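
For comparing per-OSD latency before and after such an upgrade, a minimal sketch using the built-in counters, assuming Nautilus command names (osd.0 is a placeholder):

    # snapshot of per-OSD commit/apply latency
    ceph osd perf
    # detailed op latency counters for one OSD, run on that OSD's host
    ceph daemon osd.0 perf dump | grep -A 3 op_latency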

[ceph-users] Re: Removing secondary data pool from mds

2021-02-12 Thread Frank Schilder
Hi Michael, I also think it would be safe to delete. The object count might be an incorrect reference count of lost objects that didn't get decremented. This might be fixed by running a deep scrub over all PGs in that pool. I don't know rados well enough to find out where such an object count
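
A minimal sketch of deep-scrubbing every PG in one pool (the pool name is a placeholder; the JSON layout of `ceph pg ls-by-pool` may differ between releases):

    for pg in $(ceph pg ls-by-pool mypool -f json | jq -r '.pg_stats[].pgid'); do
        ceph pg deep-scrub "$pg"
    done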

[ceph-users] Re: Network design issues

2021-02-12 Thread Frank Schilder
By the way, thanks for reminding me of bmon! Of course. I have a decent collection of live monitoring tools installed and bmon was one of the first. How could I forget? Another tool I became good friends with is atop. It gives a really good overview of the entire system, including network,
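
Example invocations of the two tools mentioned (the interface glob and interval are placeholders):

    bmon -p 'eth*'   # live per-interface bandwidth for matching NICs
    atop 2           # whole-system view (CPU, disk, network) refreshed every 2s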

[ceph-users] Re: Removing secondary data pool from mds

2021-02-12 Thread Michael Thomas
Hi Frank, We're not using snapshots. I was able to run: ceph daemon mds.ceph1 dump cache /tmp/cache.txt ...and scan for the stray object to find the cap id that was accessing the object. I matched this with the entity name in: ceph daemon mds.ceph1 session ls ...to determine the
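
The workflow described there, as a sketch (mds.ceph1 is the daemon name from the message; the grep pattern is a placeholder):

    # dump the MDS cache and locate the stray object and its cap id
    ceph daemon mds.ceph1 dump cache /tmp/cache.txt
    grep -i stray /tmp/cache.txt
    # list sessions and match the cap id to a client entity name
    ceph daemon mds.ceph1 session ls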

[ceph-users] Re: Backups of monitor

2021-02-12 Thread Frank Schilder
> So if you are doing maintenance on a mon host in a 5 mon cluster you will still have 3 in the quorum. Exactly. I was in exactly this situation, doing maintenance on 1 and screwing up number 2. Service outage. I will update to 5 as soon as I can. Secondly: I actually do not believe a MON

[ceph-users] Re: Increasing QD=1 performance (lowering latency)

2021-02-12 Thread Mark Nelson
FWIW, the current RDMA implementation is part of the async messenger: https://github.com/ceph/ceph/tree/master/src/msg/async/rdma Haomai is probably the leading authority on it, though we just got a shipment of new performance test gear for the community lab that theoretically will let us
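
For anyone who wants to experiment with it, the RDMA transport is selected through the async messenger options; a hedged sketch of a ceph.conf fragment (the device name is a placeholder, and option availability depends on how Ceph was built):

    [global]
    ms_type = async+rdma
    ms_async_rdma_device_name = mlx5_0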

[ceph-users] Re: Backups of monitor

2021-02-12 Thread Anthony D'Atri
>> So if you are doing maintenance on a mon host in a 5 mon cluster you will still have 3 in the quorum. > Exactly. I was in exactly this situation, doing maintenance on 1 and screwing up number 2. Service outage. Been there. I had a cluster that nominally had 5 mons. Two suffered

[ceph-users] Re: Storage-class split objects

2021-02-12 Thread Marcelo
Hello Casey. Thanks again. I still couldn't quite understand this issue of objects in Ceph, and with your explanation it became clearer. Thank you, Marcelo On Thu, Feb 11, 2021 at 13:36, Casey Bodley wrote: > On Thu, Feb 11, 2021 at 9:31 AM Marcelo wrote: > > Hi Casey, thank you

[ceph-users] Re: Backups of monitor

2021-02-12 Thread Freddy Andersen
I would say everyone recommends at least 3 monitors, and since they need to be 1, 3, 5 or 7, I always read that as 5 being the best number (if you have 5 servers in your cluster). The other reason is high availability: since the MONs use Paxos for the quorum, and I like to have 3 in the quorum, you need
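
The arithmetic behind that: Paxos needs a majority, i.e. floor(n/2)+1 monitors. With 3 MONs the quorum is 2, so only 1 may be down; with 5 MONs the quorum is 3, so you can take 1 down for maintenance and still survive 1 unplanned failure.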

[ceph-users] Re: Backups of monitor

2021-02-12 Thread huxia...@horebdata.cn
Why are 5 instead of 3 MONs required? huxia...@horebdata.cn From: Freddy Andersen Date: 2021-02-12 16:05 To: huxia...@horebdata.cn; Marc; Michal Strnad; ceph-users Subject: Re: [ceph-users] Re: Backups of monitor I would say production should have 5 MON servers From: huxia...@horebdata.cn

[ceph-users] Re: Backups of monitor

2021-02-12 Thread Freddy Andersen
I would say production should have 5 MON servers From: huxia...@horebdata.cn Date: Friday, February 12, 2021 at 7:59 AM To: Marc , Michal Strnad , ceph-users Subject: [ceph-users] Re: Backups of monitor Normally any production Ceph cluster will have at least 3 MONs, does it reall need a

[ceph-users] Re: Backups of monitor

2021-02-12 Thread huxia...@horebdata.cn
Normally any production Ceph cluster will have at least 3 MONs; does it really need a backup of the MONs? samuel huxia...@horebdata.cn From: Marc Date: 2021-02-12 14:36 To: Michal Strnad; ceph-users@ceph.io Subject: [ceph-users] Re: Backups of monitor So why not create an extra MON and start it only when

[ceph-users] Network design issues

2021-02-12 Thread Frank Schilder
Dear cephers, I believe we are facing a bottleneck due to an inappropriate overall network design and would like to hear about experience and recommendations. I start with a description of the urgent problem/question and follow up with more details/questions. These observations are on our HPC

[ceph-users] Re: CEPHFS - MDS gracefull handover of rank 0

2021-02-12 Thread Stefan Kooman
On 1/27/21 9:08 AM, Martin Hronek wrote: So before the next MDS the FS config was changed to one active and one standby-replay node; the idea was that since the standby-replay node follows the active one, the handover would be smoother. The active state was reached faster, but we still
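
For reference, standby-replay is enabled per filesystem (the fs name is a placeholder):

    ceph fs set cephfs allow_standby_replay true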

[ceph-users] Re: Backups of monitor

2021-02-12 Thread Marc
So why not create an extra MON and start it only when you want to make a backup, wait until it is up to date, then stop it and back it up? > -----Original Message----- > From: Michal Strnad > Sent: 11 February 2021 21:15 > To: ceph-users@ceph.io > Subject: [ceph-users] Backups of monitor >

[ceph-users] Backups of monitor

2021-02-12 Thread Michal Strnad
Hi all, We are looking for a proper solution for backups of the monitors (all the maps they hold). On the internet we found advice that we have to stop one of the monitors, back it up (dump) and start the daemon again. But this is not the right approach due to the risk of losing quorum and the need of synchronization
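
A minimal sketch of that stop-and-copy approach (the mon id and paths are placeholders; only safe while the remaining MONs hold quorum):

    systemctl stop ceph-mon@mon1
    tar czf /backup/mon1-store-$(date +%F).tar.gz /var/lib/ceph/mon/ceph-mon1/
    systemctl start ceph-mon@mon1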

[ceph-users] Re: reinstalling node with orchestrator/cephadm

2021-02-12 Thread Kenneth Waegeman
On 08/02/2021 16:52, Kenneth Waegeman wrote: Hi Eugen, all, Thanks for sharing your results! Since we have multiple clusters and clusters with +500 OSDs, this solution is not feasible for us. In the meantime I created an issue for this : https://tracker.ceph.com/issues/49159 Hi all, For

[ceph-users] Re: Cannot access "Object Gateway" in dashboard after setting rgw api keys

2021-02-12 Thread Troels Hansen
Completely missed the "set-rgw-api-user-id" as it's not mentioned in the guide for adding the users. That helped, as I can now access the "clients" list. However, I now get a 500 error when accessing anything but the "clients" page. I suspect I'm missing something, but the documentation is pretty much non-existent.
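
The dashboard settings referred to above, for reference; a sketch assuming the Nautilus-era command names (the user id and keys are placeholders):

    ceph dashboard set-rgw-api-user-id rgw-admin
    ceph dashboard set-rgw-api-access-key ACCESS_KEY
    ceph dashboard set-rgw-api-secret-key SECRET_KEY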

[ceph-users] Re: Increasing QD=1 performance (lowering latency)

2021-02-12 Thread Max Krasilnikov
Good day! Thu, Feb 11, 2021 at 04:00:31PM +0100, joachim.kraftmayer wrote: > Hi Wido, > > do you know what happened to Mellanox's ceph rdma project of 2018? We tested ceph/rdma on Mellanox ConnectX-4 Lx for one year and saw no visible benefits. But there were strange connection outages