[ceph-users] Re: fixing future rctimes

2021-03-23 Thread Byrne, Thomas (STFC,RAL,SC)
Thanks Dan (and Theo). Ah, that's a shame; that PR looks good to me, and would certainly allow me to restore some order to our futuristic rctimes (and use them for backup purposes). I'm not sure if there is anything I can do to help get it merged, but happy to help if possible. Cheers,

[ceph-users] Re: should I increase the amount of PGs?

2021-03-23 Thread Dan van der Ster
While you're watching things, if an OSD is getting too close for comfort to the full ratio, you can temporarily increase it, e.g. ceph osd set-full-ratio 0.96. But don't set that too high -- you can really break an OSD if it gets 100% full (and then can't delete objects or whatever...) -- dan
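A minimal sketch of that temporary bump and its rollback (0.95 is the usual default full ratio; the exact values here are illustrative):

    # check the current ratios
    ceph osd dump | grep ratio
    # temporarily raise the full ratio, as suggested above
    ceph osd set-full-ratio 0.96
    # once the fullest OSDs have drained, restore the default
    ceph osd set-full-ratio 0.95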

[ceph-users] Re: should I increase the amount of PGs?

2021-03-23 Thread Boris Behrens
Ok, then I will try to reweight the most filled OSDs to .95 and see if this helps. On Tue, 23 Mar 2021 at 19:13, Dan van der Ster < d...@vanderster.com> wrote: > Data goes to *all* PGs uniformly. > Max_avail is limited by the available space on the most full OSD -- > you should pay close a
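A sketch of that reweighting step, with osd.42 standing in for one of the most filled OSDs:

    # lower the override weight of a nearly full OSD
    ceph osd reweight 42 0.95
    # or let Ceph pick the overloaded OSDs automatically (dry run first)
    ceph osd test-reweight-by-utilization
    ceph osd reweight-by-utilization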

[ceph-users] Re: should I increase the amount of PGs?

2021-03-23 Thread Dan van der Ster
Data goes to *all* PGs uniformly. Max_avail is limited by the available space on the most full OSD -- you should pay close attention to those and make sure they are moving in the right direction (decreasing!). Another point -- IMHO you should aim to get all PGs active+clean before you add yet anoth
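Two standard commands for keeping an eye on this, nothing cluster-specific assumed:

    # per-OSD utilization; watch %USE on the fullest OSDs trend downward
    ceph osd df
    # and confirm recovery/backfill is actually making progress
    ceph -s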

[ceph-users] Re: Advice needed: stuck cluster halfway upgraded, comms issues and MON space usage

2021-03-23 Thread Stefan Kooman
On 3/23/21 2:52 PM, Dan van der Ster wrote: Not sure. But anyway ceph has been skipping interfaces named "lo" since v10, but then dropped that in 14.2.18 (by accident, IMO). You should be able to get your osds listening to the correct IP using cluster network = 10.1.50.0/8 public network = 10.1
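As a ceph.conf fragment, the suggestion looks like this; note that 10.1.50.0/8 normalizes to the whole 10.0.0.0/8 range, so a tighter mask such as /24 may be what is actually intended:

    [global]
    public network  = 10.1.50.0/24
    cluster network = 10.1.50.0/24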

[ceph-users] Re: should I increase the amount of PGs?

2021-03-23 Thread Boris Behrens
So, do nothing and wait for Ceph to recover? In theory there should be enough disk space (more disks arriving tomorrow), but I fear there might be an issue when the backups get exported overnight to this S3. Currently the max_avail lingers around 13TB and I hope that the data will g

[ceph-users] Re: should I increase the amount of PGs?

2021-03-23 Thread Dan van der Ster
Hi, backfill_toofull is not a bad thing when the cluster is really full like yours. You should expect some of the most full OSDs to eventually start decreasing in usage, as the PGs are moved to the new OSDs. Those backfill_toofull states should then resolve themselves as the OSD usage flattens out
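To watch those states resolve, something along these lines (standard commands, no assumptions about the pool layout):

    # PGs currently held up by full OSDs
    ceph pg ls backfill_toofull
    # health detail names the PGs/OSDs involved
    ceph health detail | grep -i toofull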

[ceph-users] Re: should I increase the amount of PGs?

2021-03-23 Thread Boris Behrens
Ok, I should have listened to you :) In the last week we added more storage but the issue got worse instead. Today I realized that the PGs were up to 90GB (the bytes column in ceph pg ls said 95705749636), and the balancer kept mentioning the 2048 PGs for this pool. We were at 72% utilization (ceph osd
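A sketch of the checks and the split being discussed, with <pool> as a placeholder for the RGW data pool:

    # per-PG size: the bytes column referenced above
    ceph pg ls-by-pool <pool>
    # raise pg_num toward the suggested 2048 (the split proceeds gradually)
    ceph osd pool set <pool> pg_num 2048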

[ceph-users] Re: Advice needed: stuck cluster halfway upgraded, comms issues and MON space usage

2021-03-23 Thread Stefan Kooman
On 3/23/21 8:29 AM, Dan van der Ster wrote: Hi Sam, Yeah somehow `lo:` is not getting skipped, probably due to those patches. (I guess it is because the 2nd patch looks for `lo:` but in fact the ifa_name is probably just `lo` without the colon) https://github.com/ceph/ceph/blob/master/src/
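Two quick checks related to this (the kernel reports the loopback name as plain "lo", without a colon):

    # interface names as getifaddrs() will report them
    ip -o link show
    # verify which addresses the OSDs actually registered with the mons
    ceph osd dump | grep '^osd\.'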

[ceph-users] Nautilus block-db resize - ceph-bluestore-tool

2021-03-23 Thread Dave Hall
Hello, Based on other discussions in this list I have concluded that I need to add NVMe to my OSD nodes and expand the NVMe (DB/WAL) for each OSD. Is there a way to do this without destroying and rebuilding each OSD (after safe removal from the cluster, of course)? Is there a way to use ceph-blu
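For context, the ceph-bluestore-tool path being asked about looks roughly like this; a sketch where the OSD id and paths are illustrative, and the OSD must be stopped first:

    systemctl stop ceph-osd@0
    # after enlarging the underlying DB partition/LV, let BlueFS grow into it
    ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-0
    systemctl start ceph-osd@0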

[ceph-users] Re: Device class not deleted/set correctly

2021-03-23 Thread Stefan Kooman
On 3/23/21 11:00 AM, Nico Schottelius wrote: Stefan Kooman writes: OSDs from the wrong class (hdd). Does anyone have a hint on how to fix this? Do you have: osd_class_update_on_start enabled? So this one is a bit funky. It seems to be off, but the behaviour would indicate it isn't. Checkin
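Two places worth comparing when a setting seems off but behaves as if it were on (osd.4 is the OSD from this thread):

    # what the mon config database says
    ceph config get osd osd_class_update_on_start
    # what the running daemon actually uses (this also reflects the local ceph.conf)
    ceph daemon osd.4 config get osd_class_update_on_start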

[ceph-users] Re: Advice needed: stuck cluster halfway upgraded, comms issues and MON space usage

2021-03-23 Thread Sam Skipsey
Hi, Indeed, we ended up with a config like that yesterday, and the cluster is pretty healthy now [just moving a few pgs around as ceph is wont to do]. Sam On Tue, 23 Mar 2021 at 14:11, Stefan Kooman wrote: > On 3/23/21 2:52 PM, Dan van der Ster wrote: > > Not sure. But anyway ceph has been ski

[ceph-users] Re: Advice needed: stuck cluster halfway upgraded, comms issues and MON space usage

2021-03-23 Thread Dan van der Ster
Not sure. But anyway ceph has been skipping interfaces named "lo" since v10, but then dropped that in 14.2.18 (by accident, IMO). You should be able to get your osds listening to the correct IP using cluster network = 10.1.50.0/8 public network = 10.1.50.0/8 does that work? - dan On Tue, Mar

[ceph-users] Re: Advice needed: stuck cluster halfway upgraded, comms issues and MON space usage

2021-03-23 Thread Sam Skipsey
I should note that our cluster is entirely IPv4 [because it's on a private network, so there's no need to go IPv6], which maybe influences matters? Sam On Tue, 23 Mar 2021 at 11:43, Stefan Kooman wrote: > On 3/23/21 8:29 AM, Dan van der Ster wrote: > > Hi Sam, > > > > Yeah somehow `lo:` is not

[ceph-users] Re: Nautilus block-db resize - ceph-bluestore-tool

2021-03-23 Thread Igor Fedotov
Hi Dave, For sure ceph-bluestore-tool can be used for that. Unfortunately it lacks the LVM tag manipulation required to properly set up a DB/WAL volume for Ceph. See https://tracker.ceph.com/issues/42928 This means that the LVM tags have to be updated manually if pure ceph-bluestore-tool is used.
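A sketch of the manual tag update being described, with placeholder VG/LV names and device paths:

    # inspect the tags ceph-volume stored on the OSD's block LV
    lvs -o +lv_tags ceph-vg/osd-block-lv
    # repoint the DB tag at the new device (tag names/values are illustrative)
    lvchange --deltag "ceph.db_device=/dev/nvme0n1p1" ceph-vg/osd-block-lv
    lvchange --addtag "ceph.db_device=/dev/nvme1n1p1" ceph-vg/osd-block-lv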

[ceph-users] Re: Device class not deleted/set correctly

2021-03-23 Thread Stefan Kooman
On 3/22/21 3:52 PM, Nico Schottelius wrote: Hello, follow up from my mail from 2020 [0], it seems that OSDs sometimes have "multiple classes" assigned: [15:47:15] server6.place6:/var/lib/ceph/osd/ceph-4# ceph osd crush rm-device-class osd.4 done removing class of osd(s): 4 [15:47:17] server6.
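The usual two-step for correcting a class, as used in this thread; ssd here is a guess, substitute the intended class:

    ceph osd crush rm-device-class osd.4
    ceph osd crush set-device-class ssd osd.4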

[ceph-users] Re: How to reset and configure replication on multiple RGW servers from scratch?

2021-03-23 Thread Scheurer François
Dear All We have the same question here, if anyone can help ... Thank you! We did not find any documentation about the steps to reset & restart the sync. Especially the implications of 'bilog trim', 'mdlog trim' and 'datalog trim'. Our secondary zone is read-only. Both master and secondary zone
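The inspection side of this is safe to run; the trim semantics are exactly what the thread is asking about. A sketch with <bucket> as a placeholder:

    radosgw-admin sync status
    radosgw-admin mdlog list | head
    radosgw-admin datalog list | head
    radosgw-admin bilog list --bucket=<bucket> | head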

[ceph-users] Re: fixing future rctimes

2021-03-23 Thread Dan van der Ster
Hi Tom, Teo prepared a PR but we didn't get feedback: https://github.com/ceph/ceph/pull/37938 --- dan On Tue, Mar 23, 2021 at 11:55 AM Byrne, Thomas (STFC,RAL,SC) wrote: > > Hi Dan, > > Did you get anywhere with fixing your future rctimes, or understanding why > you were getting them in the fi

[ceph-users] Re: fixing future rctimes

2021-03-23 Thread Byrne, Thomas (STFC,RAL,SC)
Hi Dan, Did you get anywhere with fixing your future rctimes, or understanding why you were getting them in the first place? I think we've run into this problem, future rctimes with no associated future subdir/item. The other similarity is the future rctimes always seem to end in .090, compare
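For anyone wanting to reproduce the check: rctime is exposed as a virtual xattr on any CephFS directory (the mount point here is illustrative):

    # value is seconds.nanoseconds since the epoch
    getfattr -n ceph.dir.rctime /mnt/cephfs/some/dir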

[ceph-users] Re: New Issue - Mapping Block Devices

2021-03-23 Thread 胡 玮文
> On 23 Mar 2021, at 13:12, duluxoz wrote: > > Hi All, > > I've got a new issue (hopefully this one will be the last). > > I have a working Ceph (Octopus) cluster with a replicated pool (my-pool), an > erasure-coded pool (my-pool-data), and an image (my-image) created - all > *seems* to be working co
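For context, an image backed by an EC data pool is normally created along these lines (pool and image names are from the thread; the size is illustrative):

    # metadata in the replicated pool, data objects in the EC pool
    rbd create my-pool/my-image --size 100G --data-pool my-pool-data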

[ceph-users] Re: How to know which client hold the lock of a file

2021-03-23 Thread Eugen Block
Hi, you can list all clients of an active MDS like this (within the container): ceph daemon mds.cephfs.ses7-host4.qvrwta session ls or list blocked operations with ceph daemon mds.cephfs.ses7-host4.qvrwta dump_blocked_ops There are a couple of other commands that could be useful, e.g. dump_

[ceph-users] Re: Device class not deleted/set correctly

2021-03-23 Thread Nico Schottelius
Stefan Kooman writes: >> OSDs from the wrong class (hdd). Does anyone have a hint on how to fix >> this? > > Do you have: osd_class_update_on_start enabled? So this one is a bit funky. It seems to be off, but the behaviour would indicate it isn't. Checking the typical configurations: [10:38:53

[ceph-users] Re: Multisite RGW - Large omap objects related with bilogs

2021-03-23 Thread Scheurer François
Dear All We have the same question here, if anyone can help ... Thank you! Cheers Francois From: ceph-users on behalf of P. O. Sent: Friday, August 9, 2019 11:05 AM To: ceph-us...@lists.ceph.com Subject: [ceph-users] Multisite RGW - Large omap objects relate

[ceph-users] Re: New Issue - Mapping Block Devices

2021-03-23 Thread Paweł Sadowski
Hello, Can you show the output of the 'lsblk' command? Regards, On 3/23/21 9:38 AM, duluxoz wrote: > Hi Ilya, > > OK, so I've updated the my-id permissions to include 'profile rbd > pool=my-pool-data'. > > Yes, "rbd device map" does succeed (both before and after the my-id > update). > > The full dmes

[ceph-users] Re: New Issue - Mapping Block Devices

2021-03-23 Thread duluxoz
Hi Ilya, OK, so I've updated the my-id permissions to include 'profile rbd pool=my-pool-data'. Yes, "rbd device map" does succeed (both before and after the my-id update). The full dmesg from the "rbd device map" command is: [18538.539416] libceph: mon0 (1):6789 session established [18538.55
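The cap update described above would look something like this (client and pool names are from the thread):

    ceph auth caps client.my-id \
        mon 'profile rbd' \
        osd 'profile rbd pool=my-pool, profile rbd pool=my-pool-data'
    # with both pools allowed, the map should give read/write access
    rbd device map my-pool/my-image --id my-id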

[ceph-users] Re: Advice needed: stuck cluster halfway upgraded, comms issues and MON space usage

2021-03-23 Thread Dan van der Ster
Sam, see https://tracker.ceph.com/issues/49938 and https://github.com/ceph/ceph/pull/40334 On Tue, Mar 23, 2021 at 8:29 AM Dan van der Ster wrote: > > Hi Sam, > > Yeah somehow `lo:` is not getting skipped, probably due to those > patches. (I guess it is because the 2nd patch looks for `lo:` but i

[ceph-users] Re: Advice needed: stuck cluster halfway upgraded, comms issues and MON space usage

2021-03-23 Thread Dan van der Ster
Hi Sam, Yeah somehow `lo:` is not getting skipped, probably due to those patches. (I guess it is because the 2nd patch looks for `lo:` but in fact the ifa_name is probably just `lo` without the colon) https://github.com/ceph/ceph/blob/master/src/common/ipaddr.cc#L110 I don't know why this im