[ceph-users] Re: MDS Behind on Trimming...

2024-03-27 Thread Xiubo Li
On 3/28/24 04:03, Erich Weiler wrote: Hi All, I've been battling this for a while and I'm not sure where to go from here.  I have a Ceph health warning as such: # ceph -s   cluster:     id: 58bde08a-d7ed-11ee-9098-506b4b4da440     health: HEALTH_WARN     1 MDSs report slow

[ceph-users] Re: Failed adding back a node

2024-03-27 Thread Alex
Hi Adam! In addition to my earlier question of whether there is a way to try a more targeted upgrade first, so we don't risk accidentally breaking the entire production cluster, `ceph config dump | grep container_image` shows: global basic container_image

[ceph-users] Re: Failed adding back a node

2024-03-27 Thread Alex
Thanks! Is there a way of trying out the update on one OSD first to make sure we don't nuke the entire production cluster?

[ceph-users] Re: Failed adding back a node

2024-03-27 Thread Adam King
From the ceph versions output I can see "osd": { "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 160 }, It seems like all the OSD daemons on this cluster are using that 16.2.10-160 image, and I'm guessing most of them are running, so it
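
A hedged sketch of what a more targeted test could look like: newer cephadm releases support "staggered" upgrades, and a single daemon can also be redeployed against a specific image. The daemon name (osd.12) and image tag below are placeholders, and the exact flags vary by release, so check `ceph orch upgrade start -h` on your version first.

# staggered upgrade: only OSD daemons, one at a time (newer cephadm releases)
ceph orch upgrade start --image registry.redhat.io/rhceph/rhceph-5-rhel8:<tag> --daemon-types osd --limit 1
# or redeploy a single OSD daemon with a specific image
ceph orch daemon redeploy osd.12 --image registry.redhat.io/rhceph/rhceph-5-rhel8:<tag>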

[ceph-users] Re: Call for Interest: Managed SMB Protocol Support

2024-03-27 Thread Angelo Hongens
Yes, I'd love this! A lot of companies want samba for simple file access from windows/mac clients. I know quite some companies that buy netapp as 'easy smb storage'. Having ceph do built-in (or bolt-on) samba instead of having to manage external samba clusters would be nice, and would make

[ceph-users] Re: Cephadm host keeps trying to set osd_memory_target to less than minimum

2024-03-27 Thread Adam King
I missed a step in the calculation. The total_memory_kb I mentioned earlier is also multiplied by the value of the mgr/cephadm/autotune_memory_target_ratio before doing the subtractions for all the daemons. That value defaults to 0.7. That might explain it seeming like it's getting a value lower
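
As a rough illustration of that calculation, with hypothetical numbers rather than the actual host in this thread:

  autotune pool = total_memory * autotune_memory_target_ratio = 64 GB * 0.7 ≈ 44.8 GB
  per-OSD value = (44.8 GB - memory set aside for non-OSD daemons, e.g. 13 GB) / number of OSDs
                ≈ 31.8 GB / 8 ≈ 4 GB per OSD

If what remains after the subtractions, divided by the OSD count, falls below the minimum Ceph accepts for osd_memory_target, cephadm logs the warning from the thread title.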

[ceph-users] Failed adding back a node

2024-03-27 Thread Alex
Hello. We're rebuilding our OSD nodes. One cluster worked without any issues, but this one is being stubborn. I attempted to add a node back to the cluster and am seeing the error below in our logs: cephadm ['--image', 'registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160', 'pull'] 2024-03-27

[ceph-users] Re: Return value from cephadm host-maintenance?

2024-03-27 Thread John Mulligan
(adding the list back to the thread) On Wednesday, March 27, 2024 12:54:34 PM EDT Daniel Brown wrote: > John > > > I got curious and was taking another quick look through the python script > for cephadm. > That's always welcome. :-D > This is probably too simple of a question to be asking —

[ceph-users] MDS Behind on Trimming...

2024-03-27 Thread Erich Weiler
Hi All, I've been battling this for a while and I'm not sure where to go from here. I have a Ceph health warning as such:

# ceph -s
  cluster:
    id: 58bde08a-d7ed-11ee-9098-506b4b4da440
    health: HEALTH_WARN
            1 MDSs report slow requests
            1 MDSs behind on
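
For anyone comparing notes, a hedged first step (not a guaranteed fix) is to see how far the journal is past the trimming threshold; the daemon name below is a placeholder:

# which MDS is behind, and by how many segments
ceph health detail
# the trimming threshold the warning is measured against (default 128)
ceph config get mds mds_log_max_segments
# live journal counters for a given MDS
ceph tell mds.<fs_name>:0 perf dump mds_log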

[ceph-users] Ceph user/bucket usage metrics

2024-03-27 Thread Kushagr Gupta
Hi team, I am new to Ceph and I am looking to monitor user/bucket usage, as described at the following link: https://docs.ceph.com/en/latest/radosgw/metrics/ But when I enabled the same using the command 'ceph config set client.rgw CONFIG_VARIABLE VALUE' I could only see the following perf
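
A hedged guess at what may be missing, assuming the goal is per-user/per-bucket usage rather than raw RGW perf counters: the usage log needs to be enabled for the RGW instances and is then queried through radosgw-admin. The user and bucket names below are placeholders.

# enable the usage log, then restart/redeploy the RGW daemons
ceph config set client.rgw rgw_enable_usage_log true
# per-user and per-bucket usage once entries have accumulated
radosgw-admin usage show --uid=<user> --show-log-entries=false
radosgw-admin bucket stats --bucket=<bucket>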

[ceph-users] Re: RGW Data Loss Bug in Octopus 15.2.0 through 15.2.6

2024-03-27 Thread xu chenhui
Hi Eric Ivancich, I have a similar problem in ceph version 16.2.5. Has this problem been completely resolved in the Pacific releases? Our bucket has no lifecycle rules and no copy operations. This is a very serious data loss issue for us and it happens occasionally in our environment. Detail

[ceph-users] Re: Call for Interest: Managed SMB Protocol Support

2024-03-27 Thread John Mulligan
On Tuesday, March 26, 2024 10:53:29 PM EDT David Yang wrote: > This is great, we are currently using the smb protocol heavily to > export kernel-mounted cephfs. > But I encountered a problem. When there are many smb clients > enumerating or listing the same directory, the smb server will >

[ceph-users] nvme hpe

2024-03-27 Thread Albert Shih
Hi. I noticed that in the logs I get entries like the following from each node: Mar 27 01:12:59 cthulhu1 sudo: ceph : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/sbin/nvme mo000800kxprv smart-log-add --json /dev/nvme1n1 Mar 27 01:13:06 cthulhu1 sudo: ceph : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/sbin/nvme hpe
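
My hedged guess is that those sudo entries come from Ceph's device health monitoring, which periodically runs smartctl/nvme on each host to collect SMART data; assuming that is the source, it can be inspected or switched off with:

# is device monitoring enabled, and how often does it scrape?
ceph config get mgr mgr/devicehealth/enable_monitoring
ceph config get mgr mgr/devicehealth/scrape_frequency
# disable it entirely if the log noise is unwanted
ceph device monitoring off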

[ceph-users] Re: Cephadm host keeps trying to set osd_memory_target to less than minimum

2024-03-27 Thread Mads Aasted
Hi Adam. Doing the calculations with what you are stating here, I arrive at a total of roughly 13.3 GB for all the listed processes, everything except the OSDs, leaving well in excess of 4 GB for each OSD. Besides the mon daemon, which I can tell on my host has a limit of 2 GB, none of

[ceph-users] Re: Erasure Code with Autoscaler and Backfill_toofull

2024-03-27 Thread Alexander E. Patrakov
Hello Daniel, The situation is not as bad as you described. It is just PG_BACKFILL_FULL, which means: if the backfills proceed, then one osd will become backfillfull (i.e., over 90% by default). This is definitely something that the balancer should be able to resolve if it were allowed to act.
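
For reference, a minimal sketch of letting the balancer act, assuming upmap mode is suitable for the cluster's client versions:

ceph balancer status
ceph osd set-require-min-compat-client luminous   # upmap needs luminous or newer clients
ceph balancer mode upmap
ceph balancer on
# or evaluate the current distribution first
ceph balancer eval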

[ceph-users] Re: Erasure Code with Autoscaler and Backfill_toofull

2024-03-27 Thread Daniel Williams
The backfilling was caused by decommissioning an old host and moving a bunch of OSDs to new machines. The balancer has not been activated since the backfill started / the OSDs were moved around between hosts. Busy OSD level? Do you mean fullness? The cluster is relatively unused in terms of busyness. # ceph

[ceph-users] Re: Erasure Code with Autoscaler and Backfill_toofull

2024-03-27 Thread David C.
Hi Daniel, Changing pg_num while some OSDs are almost full is not a good strategy (it can even be dangerous). What is causing this backfilling? Loss of an OSD? The balancer? Something else? What is the least busy OSD's level (sort -nrk17)? Is the balancer activated (upmap)? Once the situation stabilizes, it becomes
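
A hedged reading of that sort hint, assuming it refers to the output of `ceph osd df` (the column number depends on the release and layout):

# list OSDs sorted by the chosen column, most loaded first
ceph osd df | sort -nrk17
# or just read the %USE and PGS columns directly
ceph osd df tree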

[ceph-users] Re: Ha proxy and S3

2024-03-27 Thread Gheorghiță Butnaru
yes, you can deploy an ingress service with cephadm [1]. You can customize the haproxy config if you need something specific [2]. ceph config-key set mgr/cephadm/services/ingress/haproxy.cfg -i haproxy.cfg.j2 [1]
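
To make that concrete, a minimal ingress spec might look roughly like the sketch below; the service_id, backend name, VIP, and ports are placeholders rather than values from this thread, and it would be applied with `ceph orch apply -i ingress.yaml`:

service_type: ingress
service_id: rgw.myrgw
placement:
  count: 2
spec:
  backend_service: rgw.myrgw   # the RGW service haproxy load-balances
  virtual_ip: 192.0.2.10/24    # VIP managed by keepalived
  frontend_port: 8080
  monitor_port: 1967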

[ceph-users] Re: Ha proxy and S3

2024-03-27 Thread Marc
> But is it a good practice to use cephadm to deploy the HA proxy, or is it better to deploy it manually on another server (one that does only that)? Afaik cephadm's only viable option is podman. As I understand it, podman does nothing to manage tasks so that they can move to other hosts automatically.

[ceph-users] Erasure Code with Autoscaler and Backfill_toofull

2024-03-27 Thread Daniel Williams
Hey, I'm running ceph version 18.2.1 (reef), but this problem must have existed long before reef. The documentation says the autoscaler will target 100 PGs per OSD, but I'm only seeing ~10. My erasure coding profile is a stripe of 6 data + 3 parity. Could that be the reason? PG numbers for that EC
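
Rough arithmetic (with a hypothetical OSD count, not Daniel's cluster) for why a wide EC profile lowers the pg_num the autoscaler picks: each 6+3 PG places shards on 9 OSDs, so the autoscaler divides its per-OSD target by the pool size. With, say, 36 OSDs and a target of 100 PG shards per OSD, the pool would be sized around 100 * 36 / 9 = 400 PGs, which is ~11 PGs per OSD if each PG is counted once (pg_num divided by the OSD count), even though each OSD still carries roughly 100 PG shards.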

[ceph-users] Ha proxy and S3

2024-03-27 Thread Albert Shih
Hi, If I'm correct, in an S3 installation it is good practice to have an HA proxy in front, and I also read somewhere that the cephadm tool can deploy the HA proxy. But is it a good practice to use cephadm to deploy the HA proxy, or is it better to deploy it manually on another server (one that does only that)? Regards