[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-21 Thread Zakhar Kirpichenko
Thanks, Eugen. It is similar in the sense that the mgr is getting OOM-killed. It started happening in our cluster after the upgrade to 16.2.14. We haven't had this issue with earlier Pacific releases. /Z On Tue, 21 Nov 2023, 21:53 Eugen Block, wrote: > Just checking it on the phone, but isn’t

[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-21 Thread Anthony D'Atri
I encountered mgr ballooning multiple times with Luminous, but have not since. At the time, I could often achieve relief by sending the admin socket a heap release - it would show large amounts of memory unused but not yet released. That experience is one reason I got Rook recently to allow
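The heap release Anthony describes is issued through the daemon's admin socket on tcmalloc-built daemons; a minimal sketch, assuming the active mgr is named after the short hostname (the daemon name here is an example, adjust to your deployment):

```shell
# Show tcmalloc heap stats for the mgr (look for memory "unused but not released")
ceph daemon mgr.$(hostname -s) heap stats

# Ask tcmalloc to return unused pages to the operating system
ceph daemon mgr.$(hostname -s) heap release
```

If the stats show a large "freed by application but not returned to OS" figure, the release typically brings RSS down without restarting the daemon.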

[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-21 Thread Eugen Block
Just checking it on the phone, but isn’t this quite similar? https://tracker.ceph.com/issues/45136 Quoting Zakhar Kirpichenko: Hi, I'm facing a rather new issue with our Ceph cluster: from time to time ceph-mgr on one of the two mgr nodes gets oom-killed after consuming over 100 GB RAM:

[ceph-users] Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-21 Thread Zakhar Kirpichenko
Hi, I'm facing a rather new issue with our Ceph cluster: from time to time ceph-mgr on one of the two mgr nodes gets oom-killed after consuming over 100 GB RAM: [Nov21 15:02] tp_osd_tp invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0 [ +0.10]
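Kernel oom-killer lines like the one quoted above follow a fixed format; a small Python sketch (using a synthetic line modeled on the excerpt, not the real host's log) that extracts the invoking task and allocation flags:

```python
import re

# Synthetic oom-killer line in the same shape as the excerpt above
line = ("[Nov21 15:02] tp_osd_tp invoked oom-killer: "
        "gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0")

# Capture the task name after the timestamp and the gfp_mask value up to the comma
m = re.search(r"\]\s+(\S+) invoked oom-killer: gfp_mask=(\S+?),", line)
task, gfp = m.group(1), m.group(2)
print(task, gfp)  # tp_osd_tp 0x100cca(GFP_HIGHUSER_MOVABLE)
```

Note that the invoking task (here an OSD thread pool, tp_osd_tp) is merely the process that triggered the allocation failure; the kernel's victim selection is reported further down in the same log block.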

[ceph-users] Re: reef 18.2.1 QE Validation status

2023-11-21 Thread Venky Shankar
Hi Yuri, On Fri, Nov 10, 2023 at 1:22 PM Venky Shankar wrote: > > Hi Yuri, > > On Fri, Nov 10, 2023 at 4:55 AM Yuri Weinstein wrote: > > > > I've updated all approvals and merged PRs in the tracker and it looks > > like we are ready for gibba, LRC upgrades pending approval/update from > >

[ceph-users] Re: Service Discovery issue in Reef 18.2.0 release ( upgrading )

2023-11-21 Thread Stefan Kooman
On 15-11-2023 07:09, Brent Kennedy wrote: Greetings group! We recently reloaded a cluster from scratch using cephadm and reef. The cluster came up, no issues. We then decided to upgrade two existing cephadm clusters that were on quincy. Those two clusters came up just fine but there is

[ceph-users] [RGW][STS] How to use Roles to limit access to only buckets of one user?

2023-11-21 Thread Rudenko Aleksandr
Hi, I have a setup with one default tenant and the following user/bucket structure: user1 (bucket1, bucket11), user2 (bucket2), user3 (bucket3). IAM and STS APIs are enabled, and user1 has roles=* capabilities. When user1 permits user2 to assume a role with
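One way to scope an assumed role to a single user's buckets is a permission policy that only names those buckets; a hedged sketch using radosgw-admin (role name, policy name, and bucket ARNs are examples for the structure above, not a verified answer to the question):

```shell
# Create a role that user1 is allowed to assume (trust policy)
radosgw-admin role create --role-name=Bucket2Access \
  --assume-role-policy-doc='{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"AWS":["arn:aws:iam:::user/user1"]},"Action":["sts:AssumeRole"]}]}'

# Attach a permission policy restricting the role to user2's bucket only
radosgw-admin role-policy put --role-name=Bucket2Access --policy-name=Bucket2Only \
  --policy-doc='{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":["s3:*"],"Resource":["arn:aws:s3:::bucket2","arn:aws:s3:::bucket2/*"]}]}'
```

Temporary credentials obtained via sts:AssumeRole against this role would then be limited to bucket2, regardless of what the assuming user could otherwise access.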

[ceph-users] Re: really need help how to save old client out of hang?

2023-11-21 Thread Eugen Block
Hi, were you able to resolve that situation in the meantime? If not, I would probably try to 'umount -l' and see if that helps. If not, you can check if the client is still blacklisted: ceph osd blocklist ls (or blacklist) If it's still blocklisted, you could try to remove it: ceph osd
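The sequence Eugen suggests can be sketched as follows (mount point and client address are placeholders; copy the exact addr:port/nonce entry from the blocklist output):

```shell
# Lazy-unmount the hung CephFS mount so processes can detach
umount -l /mnt/cephfs

# List current client blocklist entries (older releases call this "blacklist")
ceph osd blocklist ls

# Remove the stale entry for the affected client, if present
ceph osd blocklist rm 192.168.0.10:0/123456789
```

After the entry is removed, the client can usually re-establish its session on the next mount attempt.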

[ceph-users] Re: After hardware failure tried to recover ceph and followed instructions for recovery using OSDS

2023-11-21 Thread Eugen Block
Hi, I guess you could just redeploy the third MON which fails to start (after the orchestrator is responding again) unless you figured it out already. What is it logging? 1 osds exist in the crush map but not in the osdmap This could be due to the input/output error, but it's just a
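Redeploying the failed MON through cephadm, once the orchestrator responds again, can be sketched as (the daemon name mon.host3 is an example; take the real name from the ps output):

```shell
# Find the exact daemon name and its current state
ceph orch ps --daemon-type mon

# Redeploy the failing monitor daemon in place
ceph orch daemon redeploy mon.host3
```

If the daemon is too broken to redeploy in place, removing it and letting the mon service spec place a fresh one is the usual fallback.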