Hi Folks,
We've noticed that, in a cluster of 21 nodes (5 mgrs & mons, and 504 OSDs at 24
per node), the mgrs are dropping out of the cluster after a non-specific period
of time. The logs only show the following:
debug 2020-12-10T02:02:50.409+ 7f1005840700 0 log_channel(cluster) log [DBG]
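For completeness, this is roughly how we're watching the failover, with the
stock CLI (nothing cluster-specific assumed here):

# ceph -s | grep mgr           # shows the active mgr and the standbys
# ceph mgr dump | grep active_name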
Do you have the prometheus module enabled? Turn that off; it's causing
issues. I replaced it with another Ceph exporter from GitHub and almost
forgot about it.
Here's the relevant issue report:
https://tracker.ceph.com/issues/39264#change-179946
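Disabling the module is a one-liner with the standard mgr CLI (this only stops
the module; the assumption is that you scrape a replacement exporter instead):

# ceph mgr module disable prometheus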
On 10/12/2020 16:43, Welby McRoberts wrote:
FYI, this is the ceph-exporter we're using at the moment:
https://github.com/digitalocean/ceph_exporter
It's not as good, but it mostly does the job. Some of the more specific
metrics are missing, but the majority are there.
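We run it in Docker, roughly like this (image name and the 9128 listen port
are as I remember them from the project's README, so double-check there; it
needs a readable ceph.conf and keyring under /etc/ceph):

# docker run -d -v /etc/ceph:/etc/ceph -p 9128:9128 digitalocean/ceph_exporter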
On 10/12/2020 19:01, Janek Bevendorff wrote:
Hi all,
Got an odd issue that I'm not sure how to solve on our Nautilus 14.2.9 EC
cluster.
The primary OSD of an EC 8+3 PG died this morning with a very sad disk
(thousands of pending sectors). After the down-out interval, a new 'up' primary
was assigned and the backfill started. Twenty minutes
Hi,
I am using 15.2.7 on CentOS 8.1. I have a number of old buckets that are listed
with
# radosgw-admin metadata list bucket.instance
but are not listed with:
# radosgw-admin bucket list
Let's say that one of them is:
'old-bucket' and its instance is 'c100feda-5e16-48a4-b908-7be61aa877ef.123.1'
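In case it matters, I can still fetch the instance metadata directly (key
format as I understand it is bucket.instance:<bucket>:<instance-id>):

# radosgw-admin metadata get bucket.instance:old-bucket:c100feda-5e16-48a4-b908-7be61aa877ef.123.1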
Hi Janek,
We realize this; we referenced that issue in our initial email. We do want
the metrics exposed by Ceph internally, and would prefer to work towards a
fix upstream. We appreciate the suggestion for a workaround, however!
Again, we're happy to provide whatever information we can that woul
A few more things of note after more poking with the help of Dan vdS.
1) The object that the backfill is crashing on has an mtime from a few minutes
before the original primary died this morning, and a 'rados get' gives an
input/output error. So it looks like a new object that was possibly corrupt
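(For reference, this is roughly how we checked; pool and object names below
are placeholders:)

# rados -p <pool> stat <object>          # prints the object's size and mtime
# rados -p <pool> get <object> /tmp/obj  # fails with an input/output error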
Hi all,
I want to benchmark my production cluster with cbt. I read a bit of the code
and saw something strange in it: for example, it creates ceph-osd processes by
itself (https://github.com/ceph/cbt/blob/master/cluster/ceph.py#L373) and also
shuts down the whole cluster! (
https://github.com