[ceph-users] Re: Need easy way to calculate Ceph cluster space for SolarWinds

2024-03-20 Thread Anthony D'Atri
.84 to 1.22 is a pretty big spread. I suspect your balancer is turned off or something in your CRUSH map is confounding it. > On Mar 20, 2024, at 5:20 PM, Michael Worsham > wrote: > > It seems to be relatively close to that +/- 1.00 range. > > ubuntu@juju-5dcfd8-3-lxd-2:~$ sudo ceph osd df > ID
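For reference, a minimal way to check and enable the balancer (mode choice is only a suggestion and depends on the cluster):

    # show whether the balancer is active and which mode it uses
    ceph balancer status
    # if it is off, upmap mode usually evens out VAR across OSDs
    ceph balancer mode upmap
    ceph balancer on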

[ceph-users] Re: CephFS space usage

2024-03-20 Thread Anthony D'Atri
Grep through the ls output for ‘rados bench’ leftovers; it’s easy to leave them behind. > On Mar 20, 2024, at 5:28 PM, Igor Fedotov wrote: > > Hi Thorne, > > unfortunately I'm unaware of any tools high level enough to easily map files > to rados objects without deep understanding how this w
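rados bench writes its objects with a benchmark_data prefix, so a hedged sketch of that check could look like this (cephfs_data is an assumed pool name):

    # list leftover benchmark objects, if any
    rados -p cephfs_data ls | grep '^benchmark_data' | head
    # rados cleanup can remove them; check the man page for your release first
    rados -p cephfs_data cleanup --prefix benchmark_data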

[ceph-users] Re: CephFS space usage

2024-03-20 Thread Igor Fedotov
Thorne, if that's a bug in Ceph which causes space leakage you might be unable to reclaim the space without a total purge of the pool. The problem is that we are still uncertain whether this is a leak or something else. Hence the need for more thorough research. Thanks, Igor On 3/20/2024 9:13 PM

[ceph-users] Re: CephFS space usage

2024-03-20 Thread Igor Fedotov
Hi Thorne, unfortunately I'm unaware of any tools high level enough to easily map files to rados objects without a deep understanding of how this works. You might want to try the "rados ls" command to get the list of all the objects in the cephfs data pool. And then learn how that mapping is performed
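As a rough sketch of that mapping (paths and pool name are placeholders): CephFS data objects are named <inode-in-hex>.<block-index>, so a single file can be located like this:

    # inode of the file on the mounted filesystem
    ls -i /mnt/cephfs/path/to/file        # e.g. prints 1099511627776
    # convert the inode number to hex
    printf '%x\n' 1099511627776           # -> 10000000000
    # list the rados objects backing that inode
    rados -p cephfs_data ls | grep '^10000000000\.'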

[ceph-users] Re: Need easy way to calculate Ceph cluster space for SolarWinds

2024-03-20 Thread Michael Worsham
It seems to be relatively close to that +/- 1.00 range. ubuntu@juju-5dcfd8-3-lxd-2:~$ sudo ceph osd df ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS 1 ssd 18.19040 1.0 18 TiB 10 TiB 10 TiB 11 GiB 38 GiB 8.1 TiB

[ceph-users] Re: OSD does not die when disk has failures

2024-03-20 Thread Igor Fedotov
Hi Robert, I presume the plan was to support handling EIO at upper layers. But apparently that hasn't been completed. Or there are some bugs... Will take a look. Thanks, Igor On 3/19/2024 3:36 PM, Robert Sander wrote: Hi, On 3/19/24 13:00, Igor Fedotov wrote: translating EIO to upper l

[ceph-users] Re: Need easy way to calculate Ceph cluster space for SolarWinds

2024-03-20 Thread Anthony D'Atri
Looks like you have one device class and the same replication on all pools, which makes that simpler. Your MAX AVAIL figures are lower than I would expect if you're using size=3, so I'd check whether you have the balancer enabled and whether it's working properly. Run ceph osd df and look at the VAR colum
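A hedged one-liner for the VAR spread, assuming the json output of ceph osd df carries a per-OSD "var" field (recent releases do, but check yours) and that jq is installed:

    ceph osd df -f json | jq '[.nodes[].var] | {min: min, max: max}'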

[ceph-users] Re: Need easy way to calculate Ceph cluster space for SolarWinds

2024-03-20 Thread Michael Worsham
I had a request from upper management wanting to use SolarWinds to extract what I am looking at and have SolarWinds track it in terms of total available space, remaining space of the overall cluster, and I guess the current RGW pools/buckets we have and their allocated si
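One hedged way to give SolarWinds something it can poll is the json output of ceph df; the field names below are from recent releases and may differ on yours:

    # cluster-wide totals in bytes
    ceph df -f json | jq '.stats | {total_bytes, total_used_bytes, total_avail_bytes}'
    # per-pool stored bytes and max avail
    ceph df -f json | jq '.pools[] | {name, stored: .stats.stored, max_avail: .stats.max_avail}'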

[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-20 Thread Bandelow, Gunnar
Hi, I just wanted to mention that I am running a cluster with reef 18.2.1 with the same issue. 4 PGs start to deep scrub but haven't finished since mid-February. In the pg dump they are shown as scheduled for deep scrub. They sometimes change their status from active+clean to active+clean+scrubbing+de

[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-20 Thread Michel Jouvin
Hi Rafael, Good to know I am not alone! Additional information ~6h after the OSD restart: of the 20 PGs impacted, 2 have been processed successfully... I don't have a clear picture of how Ceph prioritizes the scrub of one PG over another; I had thought that the oldest/expired scrubs are take
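For what it's worth, the settings that influence scrub scheduling can at least be inspected; this is only a starting point, not an explanation of the priority logic:

    ceph config get osd osd_max_scrubs
    ceph config get osd osd_scrub_max_interval
    ceph config get osd osd_deep_scrub_interval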

[ceph-users] node-exporter error

2024-03-20 Thread quag...@bol.com.br
Hello, After some time, I'm adding some more disks on a new machine in the ceph cluster. However, there is a container that is not coming up: the "node-exporter". Below is an excerpt from the log that reports the error: Mar 20 15:51:08 adafn02 ceph-da43a27a-eee8-11eb-9c87-525
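A hedged first pass at debugging a cephadm-managed node-exporter that won't start (the fsid below is a placeholder; the hostname is taken from the log line above):

    # state of the daemon as the orchestrator sees it
    ceph orch ps --daemon-type node-exporter
    # recent cephadm log messages
    ceph log last cephadm
    # on the affected host, the systemd unit and its journal
    journalctl -u ceph-<fsid>@node-exporter.adafn02 --no-pager | tail -n 50
    # redeploy once the underlying error is fixed
    ceph orch daemon redeploy node-exporter.adafn02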

[ceph-users] Re: Need easy way to calculate Ceph cluster space for SolarWinds

2024-03-20 Thread Anthony D'Atri
> On Mar 20, 2024, at 14:42, Michael Worsham > wrote: > > Is there an easy way to poll a Ceph cluster to see how much space is available `ceph df` The exporter has percentages per pool as well. > and how much space is available per bucket? Are you using RGW quotas? > > Looking for a wa
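If quotas are the direction you go, a hedged sketch of per-user bucket quotas (uid, bucket and size are placeholders; check radosgw-admin quota --help on your release):

    # define and enable a bucket-scope quota for a user
    radosgw-admin quota set --quota-scope=bucket --uid=someuser --max-size=1099511627776
    radosgw-admin quota enable --quota-scope=bucket --uid=someuser
    # per-bucket usage, regardless of quotas
    radosgw-admin bucket stats --bucket=somebucket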

[ceph-users] Need easy way to calculate Ceph cluster space for SolarWinds

2024-03-20 Thread Michael Worsham
Is there an easy way to poll a Ceph cluster to see how much space is available and how much space is available per bucket? Looking for a way to use SolarWinds to monitor the entire Ceph cluster space utilization and then also be able to break down each RGW bucket to see how much space it was pr
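Since SolarWinds can usually scrape an HTTP endpoint, the mgr prometheus module is another option; the metric names below are believed current but worth verifying against your release:

    ceph mgr module enable prometheus
    # default exporter endpoint on an active mgr host
    curl -s http://<mgr-host>:9283/metrics | grep -E 'ceph_cluster_total_bytes|ceph_cluster_total_used_bytes|ceph_pool_stored'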

[ceph-users] Re: CephFS space usage

2024-03-20 Thread Thorne Lawler
Alexander, Thanks for explaining this. As I suspected, this is a highly abstract pursuit of what caused the problem, and while I'm sure this makes sense for Ceph developers, it isn't going to happen in this case. I don't care how it got this way; the tools used to create this pool will never b

[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-20 Thread quag...@bol.com.br
Hi, I upgraded a cluster 2 weeks ago here. The situation is the same as Michel's. A lot of PGs not scrubbed/deep-scrubbed. Rafael.

[ceph-users] Re: CephFS space usage

2024-03-20 Thread Alexander E. Patrakov
Hi Thorne, The idea is quite simple. By retesting the leak with a separate pool used by nobody except you, in the case that the leak exists and is reproducible (which is not a given), you can definitely pinpoint it without giving any chance to the alternate hypothesis "somebody wrote some data in p
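A minimal sketch of such a test pool, assuming an fs named "cephfs" mounted at /mnt/cephfs (all names are placeholders):

    ceph osd pool create cephfs_leaktest 32
    ceph fs add_data_pool cephfs cephfs_leaktest
    mkdir /mnt/cephfs/leaktest
    # pin the new directory to the test pool via the layout vxattr
    setfattr -n ceph.dir.layout.pool -v cephfs_leaktest /mnt/cephfs/leaktest
    # after writing/deleting test data, compare file sizes against pool usage
    ceph df detail | grep cephfs_leaktest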

[ceph-users] Re: Why a lot of pgs are degraded after host(+osd) restarted?

2024-03-20 Thread Joshua Baergen
Hi Jaemin, It is normal for PGs to become degraded during a host reboot, since a copy of the data was taken offline and needs to be resynchronized after the host comes back. Normally this is quick, as the recovery mechanism only needs to modify those objects that have changed while the host is dow
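For planned reboots, the usual way to keep the degraded window small is the noout flag (this only stops OSDs from being marked out; PGs still show degraded until the returning OSDs catch up):

    ceph osd set noout
    # ... reboot the host ...
    ceph osd unset noout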

[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-20 Thread Anthony D'Atri
Suggest issuing an explicit deep scrub against one of the subject PGs to see if it takes. > On Mar 20, 2024, at 8:20 AM, Michel Jouvin > wrote: > > Hi, > > We have a Reef cluster that started to complain a couple of weeks ago about > ~20 PGs (over 10K) not scrubbed/deep-scrubbed in time. Looki
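For example (the pgid is a placeholder, taken from whatever ceph health detail reports):

    ceph pg deep-scrub 2.1f
    # then watch whether the deep scrub timestamp actually moves
    ceph pg 2.1f query | grep -i deep_scrub_stamp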

[ceph-users] Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-20 Thread Michel Jouvin
Hi, We have a Reef cluster that started to complain a couple of weeks ago about ~20 PGs (over 10K) not scrubbed/deep-scrubbed in time. Looking at it for a few days, I saw this affects only those PGs that could not be scrubbed since mid-February. All the other PGs are regularly scrubbed. I d
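A quick, hedged way to enumerate the affected PGs is the health detail output:

    ceph health detail | grep -i 'not deep-scrubbed'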

[ceph-users] Re: CephFS space usage

2024-03-20 Thread Thorne Lawler
Alexander, I'm happy to create a new pool if it will help, but I don't presently see how creating a new pool will help us to identify the source of the 10TB discrepancy in this original cephfs pool. Please help me to understand what you are hoping to find...? On 20/03/2024 6:35 pm, Alexander

[ceph-users] cephadm auto disk preparation and OSD installation incomplete

2024-03-20 Thread Kuhring, Mathias
Dear ceph community, We have trouble with new disks not being properly prepared, or rather OSDs not being fully installed, by cephadm. We just added one new node, each with ~40 HDDs, to two of our ceph clusters. In one cluster all but 5 disks got installed automatically. In the other none got instal
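A hedged first look at why cephadm skips devices (hostname is a placeholder):

    # reject reasons per device, as the orchestrator sees them
    ceph orch device ls <new-host> --wide
    # the OSD service spec currently driving automatic deployment
    ceph orch ls --service-type osd --export
    # on the new host itself, what ceph-volume thinks of the disks
    cephadm shell -- ceph-volume inventory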

[ceph-users] Why a lot of pgs are degraded after host(+osd) restarted?

2024-03-20 Thread Jaemin Joo
Hi all, While I am testing host failover, there are a lot of degraded PGs after the host(+osd) comes back up. Even though the restart only takes a short time, I don't understand why the PGs should check all objects related to the failed host(+osd). I'd like to know how to prevent PGs from becoming degraded when an osd resta

[ceph-users] Re: CephFS space usage

2024-03-20 Thread Alexander E. Patrakov
Thorne, That's why I asked you to create a separate pool. All writes go to the original pool, and it is possible to see object counts per-pool. On Wed, Mar 20, 2024 at 6:32 AM Thorne Lawler wrote: > Alexander, > > Thank you, but as I said to Igor: The 5.5TB of files on this filesystem > are vir
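For example, both of these show per-pool object counts, which a dedicated test pool would make easy to read:

    ceph df detail
    rados df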

[ceph-users] Re: mon stuck in probing

2024-03-20 Thread faicker mo
Hi, this is the debug log:
2024-03-13T11:14:28.087+0800 7f6984a95640 4 mon.memb4@3(probing) e6 probe_timeout 0x5650c2b0c3a0
2024-03-13T11:14:28.087+0800 7f6984a95640 10 mon.memb4@3(probing) e6 bootstrap
2024-03-13T11:14:28.087+0800 7f6984a95640 10 mon.memb4@3(probing) e6 sync_reset_requester 2024
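A hedged starting point when a mon sits in probing is its own view of the monmap, taken from the admin socket on the node running it (daemon name as in the log above):

    ceph daemon mon.memb4 mon_status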