[ceph-users] Re: How to avoid 'bad port / jabber flood' = ceph killer?

2022-01-27 Thread Daniel Poelzleithner
On 27/01/2022 16:25, Harry G. Coin wrote:
> 1: What's a better way at 'mid-failure diagnosis time' to know directly
> which cable to pull instead of 'one by one until the offender is found'?
> 2: Related, in the same spirit as ceph's 'devicehealth', is there a way
> to profile 'usual and customary'
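A quick way to spot the offending port at mid-failure time is to compare per-interface error and drop counters across the nodes; the flooding port usually stands out. A minimal sketch, assuming the NIC is named enp1s0 (adjust per host):

    # ip -s link show enp1s0
    # ethtool -S enp1s0 | grep -iE 'err|drop|crc'

Running the ethtool line twice a few seconds apart and comparing the numbers narrows it down to one cable without pulling them one by one.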

[ceph-users] Re: MON slow ops and growing MON store

2022-01-10 Thread Daniel Poelzleithner
Hi,

> Like last time, after I restarted all five MONs, the store size
> decreased and everything went back to normal. I also had to restart MGRs
> and MDSs afterwards.

This starts looking like a bug to me. In our case, we had a real database corruption in the rocksdb that caused version counter
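For context, the mon store size can be checked on disk and a compaction triggered without restarting the daemon. A rough sketch, assuming the default data path and a mon id equal to the short hostname:

    # du -sh /var/lib/ceph/mon/ceph-$(hostname -s)/store.db
    # ceph tell mon.$(hostname -s) compact

If compaction barely shrinks the store, the space is being held by untrimmed map/log versions rather than rocksdb garbage, which points at a trimming problem.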

[ceph-users] Re: URGENT: logm spam in ceph-mon store

2021-12-16 Thread Daniel Poelzleithner
Hi, I think I found the culprit: I changed the paxos debug level to 20 and found this in the mon store log:

2021-12-16T18:35:07.814+0100 7fec66e79700 20 mon.server6@0(leader).paxosservice(logm 5064..29067286) maybe_trim 5064~29067236
2021-12-16T18:35:07.814+0100 7fec66e79700 10 mon.serve
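The same trace can be reproduced by raising the paxos debug level at runtime and watching for maybe_trim in the mon log. A sketch against the mon named above, run on that mon's host and assuming the stock log path:

    # ceph daemon mon.server6 config set debug_paxos 20/20
    # grep maybe_trim /var/log/ceph/ceph-mon.server6.log | tail
    # ceph daemon mon.server6 config set debug_paxos 1/5

Level 20 is very chatty, so it is worth dropping the level back (1/5 is the default) once the trim behaviour has been captured.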

[ceph-users] URGENT: logm spam in ceph-mon store

2021-12-16 Thread Daniel Poelzleithner
releases ? kind regards poelzi

On 30/11/2021 02:59, Daniel Poelzleithner wrote:
> Hi, for reasons we are not sure of yet, our ceph-mon grew to a whopping 40GB
> for a small 6 node cluster. Rebuilding a mon causes the same size on the new
> node. After dumping the keys with the monstore tools:
> # cat /tmp/ceph-mon-keys.log
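For anyone wanting to reproduce the key dump referenced above: it can be generated with ceph-monstore-tool while the mon is stopped. A sketch assuming a non-containerized install with the default data path:

    # systemctl stop ceph-mon@$(hostname -s)
    # ceph-monstore-tool /var/lib/ceph/mon/ceph-$(hostname -s) dump-keys > /tmp/ceph-mon-keys.log
    # systemctl start ceph-mon@$(hostname -s)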

[ceph-users] logm spam in ceph-mon store

2021-11-29 Thread Daniel Poelzleithner
Hi, for reasons we are not sure of yet, our ceph-mon grew to a whopping 40GB for a small 6 node cluster. Rebuilding a mon causes the same size on the new node. After dumping the keys with the monstore tools:

> # cat /tmp/ceph-mon-keys.log | awk '{print $1}' | uniq -c
>     200 auth
>       2 config
>      11 health
>     203
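Given a dump like that, the logm share can be counted directly, and once trimming works again the store can be compacted offline with ceph-kvstore-tool. A sketch reusing the paths above, assuming the mon is stopped for the compaction step:

    # grep -c '^logm' /tmp/ceph-mon-keys.log
    # ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-$(hostname -s)/store.db compact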

[ceph-users] Re: [IMPORTANT NOTICE] Potential data corruption in Pacific

2021-10-29 Thread Daniel Poelzleithner
On 29/10/2021 11:23, Tobias Fischer wrote:
> I would propose to either create a separate Mailing list for these kind
> of Information from the Ceph Dev Community or use a Mailing list where
> not that much is happening, e.g. ceph-announce
>
> What do you think?

I like that, low traffic ML are e

[ceph-users] Re: Cephfs + inotify

2021-10-08 Thread Daniel Poelzleithner
On 08/10/2021 21:19, David Rivera wrote:
> I've used inotify against a kernel mount a few months back. Worked fine for
> me if I recall correctly.

It can very much depend on the source of changes. It is easy to imagine that changes originating from localhost get inotify events, while changes from othe
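A quick way to test this on a given mount is inotifywait from inotify-tools; the path below is only an example:

    # inotifywait -m -r -e create,modify,delete /mnt/cephfs/testdir

Touching files through the same mount should produce events immediately, while writes made by another client go through that client's VFS rather than the local one, so they may never show up as local inotify events.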

[ceph-users] Re: Benchmark WAL/DB on SSD and HDD for RGW RBD CephFS

2020-09-18 Thread Daniel Poelzleithner
On 2020-09-17 19:21, vita...@yourcmc.ru wrote:
> It does, RGW really needs SSDs for bucket indexes. CephFS also needs SSDs for
> metadata in any setup that's used by more than 1 user :).

Nah. I crashed my first cephfs with my music library, a 2 TB git annex repo, just me alone (slow ops on mds).
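For what it's worth, putting the CephFS metadata pool on SSD-backed OSDs only needs a device-class CRUSH rule. A sketch, assuming the device class is ssd and the metadata pool is named cephfs_metadata:

    # ceph osd crush rule create-replicated ssd-meta default host ssd
    # ceph osd pool set cephfs_metadata crush_rule ssd-meta

The existing metadata objects migrate in the background once the rule is switched, so this can be done on a live filesystem.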

[ceph-users] Re: RGW versioned objects lost after Octopus 15.2.3 -> 15.2.4 upgrade

2020-08-05 Thread Daniel Poelzleithner
On 2020-08-05 15:23, Matt Benjamin wrote:
> There is new lifecycle processing logic backported to Octopus, it
> looks like, in 15.2.3. I'm looking at the non-current calculation to
> see if it could incorrectly rely on a stale value (from an earlier
> entry).

So, you don't care about semver ?
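To see what actually survived on the RGW side, the version listing can be pulled over the S3 API; the endpoint and bucket below are placeholders:

    # aws --endpoint-url http://rgw.example.net:8080 s3api list-object-versions \
          --bucket mybucket --prefix path/to/object

Missing noncurrent versions or unexpected delete markers in that output point at the lifecycle run rather than at a listing problem.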

[ceph-users] Slow Ops start piling up, Mon Corruption ?

2020-06-16 Thread Daniel Poelzleithner
Hi, we had bad blocks on one OSD and around the same time a network switch outage, which seems to have caused some corruption on the mon service.

> # ceph -s
>   cluster:
>     id:     d7c5c9c7-a227-4e33-ab43-3f4aa1eb0630
>     health: HEALTH_WARN
>             1 daemons have recently crashed
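When slow ops pile up, the stuck requests can be listed straight from the mon's admin socket on the affected host, next to the recent crash reports. A sketch, assuming the mon id matches the short hostname:

    # ceph health detail
    # ceph crash ls
    # ceph daemon mon.$(hostname -s) ops

The ops dump shows how long each request has been queued and at which stage it is stuck, which helps tell a corrupted store from a mon that is merely catching up after the switch outage.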