[ceph-users] Re: `ceph features` on Nautilus still reports "luminous"

2023-05-25 Thread Oliver Schmidt
> To be honest I am not confident that "ceph osd set-require-min-compat-client nautilus" is a necessary step for you. What prompted you to run that command? > > That step is not listed here: > https://docs.ceph.com/en/latest/releases/nautilus/#upgrading-from-mimic-or-luminous You're

[ceph-users] Re: `ceph features` on Nautilus still reports "luminous"

2023-05-25 Thread Anthony D'Atri
This is my understanding as well: as with CRUSH tunable sets, features that *happen* to be named after releases don't always correlate 1:1 with them. > On May 25, 2023, at 15:49, Wesley Dillingham wrote: > > Fairly confident this is normal. I just checked a pacific cluster and they > all report luminous as

[ceph-users] Re: `ceph features` on Nautilus still reports "luminous"

2023-05-25 Thread Wesley Dillingham
Fairly confident this is normal. I just checked a pacific cluster and they all report luminous as well. I think some of the backstory of this is that luminous is the release where upmaps were introduced and there hasn't been a reason to increment the feature release of subsequent daemons. To be honest
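The behavior described above (Nautilus or Pacific daemons still reporting "luminous") can be checked by parsing the JSON output of `ceph features`. The sketch below is illustrative: the sample JSON is a hypothetical excerpt, not output captured from a real cluster, though the real command emits groups per entity type with a `release` string and feature bitmask in this general shape.

```python
import json

# Hypothetical excerpt of `ceph features -f json` output; real output lists
# mon/osd/client groups, each with a feature bitmask, a release name, and a count.
sample = json.loads("""
{
  "mon": [{"features": "0x3ffddff8ffacffff", "release": "luminous", "num": 3}],
  "osd": [{"features": "0x3ffddff8ffacffff", "release": "luminous", "num": 12}],
  "client": [{"features": "0x3ffddff8ffacffff", "release": "luminous", "num": 5}]
}
""")

def reported_releases(features_json):
    """Collect the distinct release name(s) each entity group reports."""
    return {kind: sorted({grp["release"] for grp in groups})
            for kind, groups in features_json.items()}

print(reported_releases(sample))
```

Seeing "luminous" everywhere here is expected: the release string reflects the newest *feature bit* in use, and no feature bit newer than luminous's upmap support existed at that time.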

[ceph-users] Re: Orchestration seems not to work

2023-05-25 Thread Thomas Widhalm
I now ran the command on every host. And I did find two that couldn't connect. They were the last two I added and never got any daemons. I fixed that (copied /etc/ceph and installed cephadm) and rebooted them, but it didn't change a thing for now. All others could connect to all others

[ceph-users] Re: `ceph features` on Nautilus still reports "luminous"

2023-05-25 Thread Oliver Schmidt
Hi Marc, > > I think for an upgrade the rocksdb is necessary. Check this for your monitors > > cat /var/lib/ceph/mon/ceph-a/kv_backend Thanks, but I already had migrated all mons to use rocksdb when upgrading to Luminous. ~ # cat /srv/ceph/mon/ceph-host1/kv_backend rocksdb Is this what you

[ceph-users] Re: Orchestration seems not to work

2023-05-25 Thread Thomas Widhalm
Hi, so sorry I didn't see your reply. Had some tough weeks (my father-in-law died and that gave us some turmoil). I just came back to debugging and didn't realize until now that you did in fact answer my e-mail. I just ran your script on the host that is running the active manager. Thanks a lot

[ceph-users] Re: `ceph features` on Nautilus still reports "luminous"

2023-05-25 Thread Marc
> > on our way towards getting our cluster to a current Ceph release, we > updated all hosts and clients to Nautilus 14.2.22. I think rocksdb is necessary for the upgrade. Check this for your monitors: cat /var/lib/ceph/mon/ceph-a/kv_backend
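The per-monitor check above (reading each mon's `kv_backend` marker file) can be automated with a small sketch. The function and its default path layout are assumptions for illustration; adjust the root to your cluster's mon data directory (e.g. `/var/lib/ceph/mon` or, as in Oliver's reply, `/srv/ceph/mon`).

```python
import os

def mon_kv_backends(mon_root):
    """Read the kv_backend marker file under each monitor data dir in mon_root.

    Returns a dict mapping mon dir name (e.g. "ceph-a") to the backend string
    recorded there ("rocksdb" or "leveldb").
    """
    backends = {}
    for entry in sorted(os.listdir(mon_root)):
        marker = os.path.join(mon_root, entry, "kv_backend")
        if os.path.isfile(marker):
            with open(marker) as f:
                backends[entry] = f.read().strip()
    return backends

# Example (hypothetical path): mon_kv_backends("/var/lib/ceph/mon")
```

Any monitor still reporting "leveldb" would need migrating to rocksdb before proceeding with the upgrade.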

[ceph-users] `ceph features` on Nautilus still reports "luminous"

2023-05-25 Thread Oliver Schmidt
Dear Ceph community, on our way towards getting our cluster to a current Ceph release, we updated all hosts and clients to Nautilus 14.2.22. But despite setting `ceph osd set-require-min-compat-client nautilus`, the release reported by `ceph features` is still "luminous". Is this supposed to

[ceph-users] Re: Unexpected behavior of directory mtime after being set explicitly

2023-05-25 Thread Sandip Divekar
Hi Chris, I think you have missed one step, which is to change the mtime for the directory explicitly. Please have a look at the highlighted steps. CEPHFS ===

[ceph-users] Re: BlueStore fragmentation woes

2023-05-25 Thread Stefan Kooman
On 5/25/23 18:17, Igor Fedotov wrote: Perhaps... I don't like the idea to use fragmentation score as a real index. IMO it's mostly like a very imprecise first turn marker to alert that something might be wrong. But not a real quantitative high-quality estimate. Chiming in on the high

[ceph-users] Re: BlueStore fragmentation woes

2023-05-25 Thread Igor Fedotov
Yeah this looks fine. Please collect all of them for a given OSD. Then restart the OSD, wait for more to come (1-2 days) and collect them too. A side note - in the attached probe I can't see any fragmentation at all - the number of allocations is equal to the number of fragments, e.g. cnt: 27637 frags:

[ceph-users] Re: BlueStore fragmentation woes

2023-05-25 Thread Fox, Kevin M
Ok, I'm gathering the "allocation stats probe" stuff. Not sure I follow what you mean by the historic probes. just: | egrep "allocation stats probe|probe" ? That gets something like: May 24 11:24:34 cf8 ceph-4e4184f5-7733-453b-b72c-2b43422fd027-osd-183[2282674]: debug

[ceph-users] Re: BlueStore fragmentation woes

2023-05-25 Thread Igor Fedotov
Just run through available logs for a specific OSD (which you suspect suffer from high fragmentation) and collect all allocation stats probes you can find ("allocation stats probe" string is a perfect grep pattern, please append lines with historic probes following day-0 line as well. Given
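The probes Igor asks for can be extracted and summarized with a small parser. The regex below assumes the approximate field layout `cnt: N frags: N size: N` visible in the probe quoted elsewhere in this thread; the exact log prefix varies with your logging setup, so treat this as a sketch rather than a canonical parser.

```python
import re

# Approximate shape of a BlueStore "allocation stats probe" log line; the
# cnt/frags/size fields are what matter for spotting fragmentation.
PROBE_RE = re.compile(
    r"allocation stats probe (\d+):\s*cnt:\s*(\d+)\s*frags:\s*(\d+)\s*size:\s*(\d+)")

def frags_per_alloc(line):
    """Return (probe_no, frags/cnt) for a probe line, or None if no match.

    A ratio of 1.0 means every allocation was satisfied by a single extent,
    i.e. no fragmentation; a rising ratio over successive probes suggests
    allocations are increasingly being split into multiple fragments.
    """
    m = PROBE_RE.search(line)
    if not m:
        return None
    probe, cnt, frags, _size = map(int, m.groups())
    return probe, frags / cnt

line = "... allocation stats probe 12: cnt: 27637 frags: 27637 size: 113196462080"
print(frags_per_alloc(line))  # -> (12, 1.0), i.e. no fragmentation
```

Running this over all probe lines grepped from an OSD's logs, before and after a restart, gives the time series Igor describes.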

[ceph-users] Re: BlueStore fragmentation woes

2023-05-25 Thread Fox, Kevin M
If you can give me instructions on what you want me to gather before the restart and after restart I can do it. I have some running away right now. Thanks, Kevin From: Igor Fedotov Sent: Thursday, May 25, 2023 9:17 AM To: Fox, Kevin M; Hector Martin;

[ceph-users] Re: BlueStore fragmentation woes

2023-05-25 Thread Igor Fedotov
Perhaps... I don't like the idea to use fragmentation score as a real index. IMO it's mostly like a very imprecise first turn marker to alert that something might be wrong. But not a real quantitative high-quality estimate. So in fact I'd like to see a series of allocation probes showing

[ceph-users] Re: Unexpected behavior of directory mtime after being set explicitly

2023-05-25 Thread Chris Palmer
Hi Sandip Ceph servers (debian11/ceph base with Proxmox installed on top - NOT the ceph that comes with Proxmox!): ceph@pve1:~$ uname -a Linux pve1 5.15.107-2-pve #1 SMP PVE 5.15.107-2 (2023-05-10T09:10Z) x86_64 GNU/Linux ceph@pve1:~$ ceph version ceph version 17.2.6

[ceph-users] Re: BlueStore fragmentation woes

2023-05-25 Thread Fox, Kevin M
Is this related to https://tracker.ceph.com/issues/58022 ? We still see run away osds at times, somewhat randomly, that causes runaway fragmentation issues. Thanks, Kevin From: Igor Fedotov Sent: Thursday, May 25, 2023 8:29 AM To: Hector Martin;

[ceph-users] Re: Unexpected behavior of directory mtime after being set explicitly

2023-05-25 Thread Sandip Divekar
Copy-pasting reply from Joseph. = Hello Gregory, We are setting the mtime to 01 Jan 1970 00:00 1. Create a directory "dir1" 2. Set mtime of "dir1" to 0, i.e. 1 Jan 1970 3. Create child

[ceph-users] Re: BlueStore fragmentation woes

2023-05-25 Thread Igor Fedotov
Hi Hector, I can advise two tools for further fragmentation analysis: 1) One might want to use ceph-bluestore-tool's free-dump command to get a list of free chunks for an OSD and try to analyze whether it's really highly fragmented and lacks long enough extents. free-dump just returns a list
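The free-dump analysis Igor suggests can be sketched as follows. The metric below (fraction of free space held in small extents) is an illustrative heuristic of my own choosing, not BlueStore's internal fragmentation score; the input is assumed to be the list of (offset, length) free extents parsed from `ceph-bluestore-tool`'s free-dump output.

```python
def small_extent_fraction(free_extents, threshold=64 * 1024):
    """Fraction of total free space held in extents smaller than threshold.

    free_extents: list of (offset, length) pairs, e.g. parsed from the
    output of `ceph-bluestore-tool free-dump`. A high fraction means the
    free space is chopped into chunks too small to satisfy large
    allocations, even when plenty of space is nominally free.
    """
    total = sum(length for _off, length in free_extents)
    small = sum(length for _off, length in free_extents if length < threshold)
    return small / total if total else 0.0

# Example: one 4 KiB sliver next to one 1 MiB extent -> mostly healthy.
extents = [(0, 4096), (1 << 20, 1 << 20)]
print(small_extent_fraction(extents))
```

This kind of summary makes it easy to compare an OSD's free-space shape before and after a restart, complementing the allocation-probe series discussed above.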

[ceph-users] Feature/Change Request: Don't send alert emails for --sticky muted WARN conditions

2023-05-25 Thread Edward R Huyer
I recently upgraded to Quincy, and toggled on the BULK flag of a few pools. As a result, my cluster has been spending the last several days shuffling data while growing the pool pg counts. That in turn has resulted in a steadily increasing number of pgs being flagged PG_NOT_DEEP_SCRUBBED.

[ceph-users] Re: Unexpected behavior of directory mtime after being set explicitly

2023-05-25 Thread Gregory Farnum
I haven’t checked the logs, but the most obvious way this happens is if the mtime set on the directory is in the future compared to the time on the client or server making changes — CephFS does not move times backwards. (This causes some problems but prevents many, many others when times are not
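The rule Gregory describes (CephFS never moves times backwards) can be modeled in isolation. The function below is an illustrative sketch of the semantics, not CephFS code: an update only advances a directory's mtime, so an mtime explicitly set into the future sticks until real time catches up.

```python
def update_dir_mtime(current_mtime, event_time):
    """Sketch of the "never move times backwards" rule: a directory's
    mtime only advances. If the stored mtime is already ahead of the
    modifying event's timestamp, it is left untouched."""
    return max(current_mtime, event_time)

# Setting mtime to 0 (1 Jan 1970), then creating a child: mtime advances.
print(update_dir_mtime(0, 1685000000))
# mtime set in the future (year 2100): creating a child does not rewind it.
print(update_dir_mtime(4102444800, 1685000000))
```

Under this model, the behavior reported in the thread (a child creation "overwriting" an explicitly set epoch-0 mtime) is expected, since the create event's timestamp is later than the explicit value.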

[ceph-users] Re: Unexpected behavior of directory mtime after being set explicitly

2023-05-25 Thread Sandip Divekar
Hi Chris, Kindly follow the steps given in the previous mail and paste the output here. The reason behind this request is that we have encountered an issue which is easily reproducible on the latest versions of both Quincy and Pacific; we have also thoroughly investigated the matter and

[ceph-users] Re: BlueStore fragmentation woes

2023-05-25 Thread Mark Nelson
On 5/24/23 09:18, Hector Martin wrote: On 24/05/2023 22.07, Mark Nelson wrote: Yep, bluestore fragmentation is an issue.  It's sort of a natural result of using copy-on-write and never implementing any kind of defragmentation scheme.  Adam and I have been talking about doing it now, probably

[ceph-users] Re: Unexpected behavior of directory mtime after being set explicitly

2023-05-25 Thread Chris Palmer
Hi Milind I just tried this using the ceph kernel client and ceph-common 17.2.6 package in the latest Fedora kernel, against Ceph 17.2.6 and it worked perfectly... There must be some other factor in play. Chris On 25/05/2023 13:04, Sandip Divekar wrote: Hello Milind, We are using Ceph

[ceph-users] Re: Orchestration seems not to work

2023-05-25 Thread Thomas Widhalm
What caught my eye is that this is also true for Disks on Hosts. I added another disk to an OSD host. I can zap it with cephadm, I can even make it an OSD with "ceph orch daemon add osd ceph06:/dev/sdb" and it will be listed as new OSD in Ceph Dashboard. But, when I look at the "Physical

[ceph-users] Re: Unexpected behavior of directory mtime after being set explicitly

2023-05-25 Thread Sandip Divekar
Hello Milind, We are using the Ceph kernel client, but we found the same behavior while using the libcephfs library. Should we treat this as a bug? Or is there an existing bug for a similar issue? Thanks and Regards, Sandip Divekar From: Milind Changire Sent: Thursday, May 25, 2023 4:24 PM To:

[ceph-users] Re: Troubleshooting "N slow requests are blocked > 30 secs" on Pacific

2023-05-25 Thread Milind Changire
try the command with the --id argument: # ceph --id admin --cluster floki daemon mds.icadmin011 dump cache /tmp/dump.txt I presume that your keyring has an appropriate entry for the client.admin user On Wed, May 24, 2023 at 5:10 PM Emmanuel Jaep wrote: > Absolutely! :-) > >

[ceph-users] Re: Unexpected behavior of directory mtime after being set explicitly

2023-05-25 Thread Milind Changire
Sandip, What type of client are you using ? kernel client or fuse client ? If it's the kernel client, then it's a bug. FYI - Pacific and Quincy fuse clients do the right thing On Wed, May 24, 2023 at 9:24 PM Sandip Divekar < sandip.dive...@hitachivantara.com> wrote: > Hi Team, > > I'm writing

[ceph-users] Re: MDS Upgrade from 17.2.5 to 17.2.6 not possible

2023-05-25 Thread achhen
Thanks. In the meantime we were able to narrow down the cause of the RAM consumption a little. ceph mds cache status shows that the cache is within the limit (32G): { "pool": { "items": 758820483, "bytes": 32642572344 } } The remaining memory belongs to buffer_anon:
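The check described above can be sketched as a small comparison of the `ceph mds cache status` JSON against the configured limit. The JSON literal below mirrors the figures quoted in the message; the helper function is an illustrative assumption, not a Ceph API.

```python
import json

def cache_within_limit(status_json, limit_bytes):
    """Compare the cache usage reported by `ceph mds cache status`-style
    JSON against a configured limit. Returns (ok, bytes_used)."""
    used = status_json["pool"]["bytes"]
    return used <= limit_bytes, used

# Figures from the message: ~30.4 GiB used against a 32 GiB limit.
status = json.loads('{"pool": {"items": 758820483, "bytes": 32642572344}}')
ok, used = cache_within_limit(status, 32 * 1024**3)
print(ok, used)
```

Since the reported bytes are under the 32 GiB limit, the excess RAM must live outside the cache pool, which matches the observation that the remainder is in buffer_anon.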

[ceph-users] Re: ceph Pacific - MDS activity freezes when one the MDSs is restarted

2023-05-25 Thread achhen
Hi Emmanuel, regarding the stopping state: we had a similar issue, see subject: MDS Upgrade from 17.2.5 to 17.2.6 not possible. We solved this by failing the MDS, which was in the stopping state, but I don't know if that's a good idea in general. What does the log of the mds (stopping) show? We

[ceph-users] Re: ceph Pacific - MDS activity freezes when one the MDSs is restarted

2023-05-25 Thread Emmanuel Jaep
Hi Eugen, Also, do you know why you use a multi-active MDS setup? To be completely candid, I don't really know why this choice was made. I assume the goal was to provide fault-tolerance and load-balancing. Was that a requirement for subtree pinning (otherwise multiple active daemons would

[ceph-users] Re: ceph Pacific - MDS activity freezes when one the MDSs is restarted

2023-05-25 Thread Emmanuel Jaep
Hi Wes, thanks for the heads-up. Best, Emmanuel On Wed, May 24, 2023 at 5:47 PM Wesley Dillingham wrote: > There was a memory issue with standby-replay that may have been resolved > since and fix is in 16.2.10 (not sure), the suggestion at the time was to > avoid standby-replay. > > Perhaps