[ceph-users] Re: The ceph balancer sets upmap items which violates my crushrule

2020-12-16 Thread Andras Pataki
Hi Manuel, We also had a similar problem: for a two-step crush selection rule, the balancer kept proposing upmaps that were invalid:
step take root-disk
step choose indep 3 type pod
step choose indep 3 type rack
step chooseleaf indep 1 type osd
step
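A quick way to cross-check a report like this (a sketch only; the rule name below is a placeholder, not taken from the thread) is to dump the rule and the upmap exceptions the balancer has installed and compare them by eye:
```
# Show the selection steps of the rule in question ("mypool_rule" is a placeholder)
ceph osd crush rule dump mypool_rule

# List the pg_upmap_items entries currently in the osdmap
ceph osd dump | grep pg_upmap_items
```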

[ceph-users] Re: Monitors not starting, getting "e3 handle_auth_request failed to assign global_id"

2020-12-16 Thread Suresh Rama
We had the same issue, and it has been stable after upgrading from 14.2.11 to 14.2.15. Also, the size of the DB is not the same for the one that failed to join, since the information it had to sync is huge. The compact-on-reboot does the job, but it takes a long time to catch up. You can force the join by quorum e
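If mon store compaction is the bottleneck, a minimal sketch of the usual knobs (the mon id is a placeholder; verify against your own setup first):
```
# Compact a monitor's store on demand
ceph tell mon.mon1 compact

# Or compact automatically at every daemon start (ceph.conf on the mon host)
[mon]
    mon compact on start = true
```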

[ceph-users] Re: ceph-fuse false passed X_OK check

2020-12-16 Thread Patrick Donnelly
On Wed, Dec 16, 2020 at 5:46 PM Alex Taylor wrote: > > Hi Cephers, > > I'm using VSCode remote development with a docker server. It worked OK > but fails to start the debugger after /root mounted by ceph-fuse. The > log shows that the binary passes access X_OK check but cannot be > actually execut

[ceph-users] v14.2.16 Nautilus released

2020-12-16 Thread David Galloway
This is the 16th backport release in the Nautilus series. This release fixes a security flaw in CephFS. We recommend that all users update to this release.
Notable Changes
---
* CVE-2020-27781: OpenStack Manila use of the ceph_volume_client.py library allowed tenant access to any Ceph credent

[ceph-users] v15.2.8 Octopus released

2020-12-16 Thread David Galloway
We're happy to announce the 8th backport release in the Octopus series. This release fixes a security flaw in CephFS and includes a number of bug fixes. We recommend that all users update to this release. For detailed release notes with links & changelog, please refer to the official blog entry at h

[ceph-users] Re: Ceph Outage (Nautilus) - 14.2.11 [EXT]

2020-12-16 Thread Suresh Rama
Thanks Stefan. Will review your feedback. Matt suggested the same. On Wed, Dec 16, 2020, 4:38 AM Stefan Kooman wrote: > On 12/16/20 10:21 AM, Matthew Vernon wrote: > > Hi, > > > > On 15/12/2020 20:44, Suresh Rama wrote: > > > > TL;DR: use a real NTP client, not systemd-timesyncd > > +1. We ha

[ceph-users] Re: ceph-fuse false passed X_OK check

2020-12-16 Thread Alex Taylor
Sorry, I forgot to mention: the ceph version is Luminous v12.2.13. On Thu, Dec 17, 2020 at 9:45 AM Alex Taylor wrote: > > Hi Cephers, > > I'm using VSCode remote development with a docker server. It worked OK > but fails to start the debugger after /root mounted by ceph-fuse. The > log shows that the

[ceph-users] [OSSN-0087] Ceph user credential leakage to consumers of OpenStack Manila

2020-12-16 Thread gouthampravi
Hello, Forwarding a security note that was shared with the OpenStack community here for your awareness. This concerns a security vulnerability that has now been addressed. I'd like to thank Ceph contributors: Patrick Donnelly, Kotresh Hiremath Ravis

[ceph-users] ceph-fuse false passed X_OK check

2020-12-16 Thread Alex Taylor
Hi Cephers, I'm using VSCode remote development with a docker server. It worked OK but fails to start the debugger after /root is mounted by ceph-fuse. The log shows that the binary passes the access X_OK check but cannot actually be executed. See:
```
strace_log: access("/root/.vscode-server/extension
```
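For what it's worth, a rough way to see both calls side by side (the binary path here is only a placeholder for the truncated one above):
```
# Trace the permission check and the actual exec of the binary on the ceph-fuse mount
strace -f -e trace=access,faccessat,execve \
    /root/.vscode-server/some-binary --version
```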

[ceph-users] block.db Permission denied

2020-12-16 Thread Seena Fallah
Hi. When I deployed an OSD with a separate db block, I got a Permission denied on its path! I don't have any idea why, but the only change from my previous deployments was that I changed osd_crush_initial_weight from 0 to 1. When I restart the host, the OSD comes up without any errors. I h
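A minimal check that has helped in similar "Permission denied on block.db" cases (osd id 12 is a placeholder; adjust the paths to your deployment):
```
# Who owns the block.db symlink and the device it resolves to?
ls -lL /var/lib/ceph/osd/ceph-12/block.db

# If the underlying device ended up owned by root, hand it back to the ceph user
chown -h ceph:ceph /var/lib/ceph/osd/ceph-12/block.db
chown ceph:ceph "$(readlink -f /var/lib/ceph/osd/ceph-12/block.db)"
```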

[ceph-users] Re: allocate_bluefs_freespace failed to allocate / ceph_abort_msg("bluefs enospc")

2020-12-16 Thread Igor Fedotov
Hi Stephan, it looks like you've faced the following bug: https://tracker.ceph.com/issues/47883 To work around it, you might want to switch both the bluestore and bluefs allocators back to bitmap for now. The fixes for Octopus/Nautilus are on their way: https://github.com/ceph/ceph/pull/38474
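A sketch of that workaround via the central config (the OSDs need a restart afterwards, since the allocator is chosen at startup):
```
# Switch both allocators back to bitmap cluster-wide
ceph config set osd bluestore_allocator bitmap
ceph config set osd bluefs_allocator bitmap
```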

[ceph-users] bug? cant turn off rbd cache?

2020-12-16 Thread Philip Brown
Oops... sent this to the "wrong" list previously (lists.ceph.com). Let's try the proper one this time :-/ Not sure if this is an actual bug or I'm doing something else wrong, but in Octopus, I have on the master node:
# ceph --version
ceph version 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8) oct
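For reference, one way this is usually done on Octopus (a sketch, not necessarily what went wrong here):
```
# Disable the librbd cache for all clients via the central config
ceph config set client rbd_cache false

# Confirm what the cluster will now hand out to clients
ceph config get client rbd_cache
```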

[ceph-users] Re: The ceph balancer sets upmap items which violates my crushrule

2020-12-16 Thread Dan van der Ster
Hi Manuel, Take a look at this tracker, in which I was initially confused by something similar: https://tracker.ceph.com/issues/47361 In my case it was a mistake in our crush tree. So please check if something similar applies; otherwise I suggest opening a new bug with all the details. Cheers, D
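A quick sanity check along those lines (this only prints the trees; the mistake itself still has to be spotted by eye):
```
# Compare the visible crush tree with the shadow (device-class) tree the rules select from
ceph osd crush tree --show-shadow
```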

[ceph-users] Re: Possibly unused client

2020-12-16 Thread Alexander E. Patrakov
Yes, thanks. This client was indeed unused. On Wed, Dec 16, 2020 at 5:54 PM Eugen Block wrote: > > Hi, > > the /var/log/ceph/ceph.audit.log file contains the client names: > > 2020-12-16 13:51:11.534010 mgr. (mgr.897778001) 1089671 : audit > [DBG] from='client.908207535 v1::0/3495403341' > entity

[ceph-users] Re: Ceph Outage (Nautilus) - 14.2.11 [EXT]

2020-12-16 Thread Stefan Kooman
On 12/16/20 10:21 AM, Matthew Vernon wrote: Hi, On 15/12/2020 20:44, Suresh Rama wrote: TL;DR: use a real NTP client, not systemd-timesyncd +1. We have a lot of "ntp" daemons running, but on Ceph we use "chrony", and it's way faster at converging (especially with very unstable clock sourc

[ceph-users] Re: Possibly unused client

2020-12-16 Thread Eugen Block
Hi, the /var/log/ceph/ceph.audit.log file contains the client names:
2020-12-16 13:51:11.534010 mgr. (mgr.897778001) 1089671 : audit [DBG] from='client.908207535 v1::0/3495403341' entity='client.admin' cmd=[{"prefix": "pg stat", "target": ["mgr", ""]}]: dispatch
Does that help? Regards,
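A simple follow-up, assuming the suspicious entity is called client.backup (a placeholder), would be to grep the audit log and its rotated copies for it:
```
# Any recent activity from the client in question?
grep 'client.backup' /var/log/ceph/ceph.audit.log
zgrep 'client.backup' /var/log/ceph/ceph.audit.log*.gz
```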

[ceph-users] Possibly unused client

2020-12-16 Thread Alexander E. Patrakov
Hello, While working with a customer, I went through the output of "ceph auth list", and found a client entry that nobody can tell what it is used for. There is a strong suspicion that it is an unused left-over from old times, but again, nobody is sure. How can I confirm that it was not used for,

[ceph-users] Re: Ceph Outage (Nautilus) - 14.2.11

2020-12-16 Thread Frédéric Nass
Hi Suresh, 24 HDDs backed by only 2 NVMes looks like a high ratio. What rings a bell in your post is "upgraded from Luminous to Nautilus" and "Elasticsearch", which mainly reads to index data, and also "memory leak". You might want to take a look at the current value of bluefs_buffered_i

[ceph-users] Re: OSD reboot loop after running out of memory

2020-12-16 Thread Frédéric Nass
Regarding RocksDB compaction: if you were in a situation where RocksDB had spilled over to the HDDs (if your cluster is using a hybrid setup), the compaction should have moved the bits back to the fast devices. So it might have helped in this situation too. Regards, Frédéric. On 16/12/2020 at 09:57, Fr
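Two quick checks along these lines (osd.7 is a placeholder; the compaction is triggered on the OSD's host via its admin socket):
```
# Has RocksDB spilled over from the fast device onto the HDD?
ceph health detail | grep -i spillover

# Trigger a manual compaction on one OSD
ceph daemon osd.7 compact
```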

[ceph-users] Re: Ceph Outage (Nautilus) - 14.2.11 [EXT]

2020-12-16 Thread Matthew Vernon
Hi, On 15/12/2020 20:44, Suresh Rama wrote: TL;DR: use a real NTP client, not systemd-timesyncd 1) We audited the network (inspecting TOR, iperf, MTR) and nothing indicated any issue, but the OSD logs kept complaining about BADAUTHORIZER ...this is quite possibly due to clock skew on yo
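For anyone following along, a couple of quick checks (assuming chrony is the client in use):
```
# What the monitors themselves report about clock skew
ceph time-sync-status

# What chrony thinks on each node
chronyc tracking
```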

[ceph-users] Re: ceph stuck removing image from trash

2020-12-16 Thread Eugen Block
I haven't done much with rbd trash yet, but you should probably still see rbd_data.43def5e07bf47 objects in that pool, correct? What if you deleted those objects in a for loop to "help purge"? I'm not sure if that would work, though. Quoting Anthony D'Atri: Perhaps setting the object-ma
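A rough sketch of that idea, assuming the image lived in a pool called "rbd" (a placeholder) and using the data prefix quoted above; be careful, this removes objects directly:
```
# Are data objects for that prefix still around?
rados -p rbd ls | grep rbd_data.43def5e07bf47 | head

# Remove them one by one to "help purge" (destructive, use with care)
rados -p rbd ls | grep rbd_data.43def5e07bf47 | xargs -n1 rados -p rbd rm
```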

[ceph-users] Re: OSD reboot loop after running out of memory

2020-12-16 Thread Frédéric Nass
Hi Stefan, This has me thinking that the issue your cluster may be facing is probably bluefs_buffered_io being set to true, as this has been reported to induce excessive swap usage (and OSDs flapping or OOMing as consequences) in some versions, starting from Nautilus I believe. Can you check th
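A minimal way to check that value (osd.0 is a placeholder):
```
# What the OSD is actually running with, via its admin socket
ceph daemon osd.0 config get bluefs_buffered_io

# Or query / override it centrally
ceph config get osd bluefs_buffered_io
ceph config set osd bluefs_buffered_io false
```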

[ceph-users] Re: ceph stuck removing image from trash

2020-12-16 Thread Anthony D'Atri
Perhaps setting the object-map feature on the image, and/or running rbd object-map rebuild? Though I suspect that might perform an equivalent process and take just as long? > On Dec 15, 2020, at 11:49 PM, 胡 玮文 wrote: > > Hi Andre, > > I once faced the same problem. It turns out that ceph nee
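A sketch of the rebuild suggestion (pool/image names are placeholders; for an image already moved to the trash this may not apply directly):
```
# Rebuild the object map so a removal only touches objects that actually exist
rbd object-map rebuild rbd/big-image
```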