[ceph-users] Re: how to upgrade host os under ceph

2022-10-26 Thread shubjero
We've done 14.04 -> 16.04 -> 18.04 -> 20.04, all at various stages of our Ceph cluster's life. The latest, 18.04 to 20.04, was painless and we ran:
apt update && apt dist-upgrade -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold"
do-release-upgrade --allow-third-party -f
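
When doing this on a live cluster the usual precaution is to upgrade and reboot one host at a time with recovery suppressed; a minimal sketch of the Ceph side of that (standard practice, not spelled out above):

  ceph osd set noout      # keep CRUSH from marking the host's OSDs out during the reboot
  # ... run the apt dist-upgrade / do-release-upgrade on the host, then reboot ...
  ceph -s                 # wait for all OSDs to rejoin and PGs to return to active+clean
  ceph osd unset noout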

[ceph-users] Trying to debug "Failed to send data to Zabbix"

2021-10-19 Thread shubjero
Hey all, we recently upgraded to Ceph Octopus (15.2.14). We also run Zabbix 5.0.15 and have had Ceph/Zabbix monitoring for a long time. After the Octopus update I installed the latest version of the Ceph template in Zabbix
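
When the module reports "Failed to send data to Zabbix", the usual first checks are the module's own configuration and a manual send; a generic diagnostic sketch (not taken from the thread):

  ceph zabbix config-show     # confirm zabbix_host, zabbix_port and the identifier
  ceph zabbix send            # force an immediate send and watch for errors
  ceph health detail          # the mgr surfaces the send failure here
  which zabbix_sender         # the module shells out to zabbix_sender on the active mgr host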

[ceph-users] Re: radosgw breaking because of too many open files

2021-10-05 Thread shubjero
ss and we seem to be humming along nicely now. On Tue, Oct 5, 2021 at 4:55 PM shubjero wrote: > > Just upgraded from Ceph Nautilus to Ceph Octopus on Ubuntu 18.04 using > standard ubuntu packages from the Ceph repo. > > Upgrade has gone OK but we are having issues with ou
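
For reference, the usual fix for radosgw hitting "too many open files" is raising the file-descriptor limit via a systemd override; a minimal sketch, assuming the stock ceph-radosgw@ unit and a limit of 65536 (neither value is confirmed by the thread):

  sudo mkdir -p /etc/systemd/system/ceph-radosgw@.service.d
  printf '[Service]\nLimitNOFILE=65536\n' | sudo tee /etc/systemd/system/ceph-radosgw@.service.d/nofile.conf
  sudo systemctl daemon-reload
  sudo systemctl restart ceph-radosgw.target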

[ceph-users] radosgw breaking because of too many open files

2021-10-05 Thread shubjero
Just upgraded from Ceph Nautilus to Ceph Octopus on Ubuntu 18.04 using the standard Ubuntu packages from the Ceph repo. The upgrade went OK, but we are having issues with our radosgw service, which eventually fails after some load. Here's what we see in the logs: 2021-10-05T15:55:16.328-0400 7fa47700
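
A quick way to see the descriptor limit the running radosgw process actually has, and how close it is to it (a generic check, assuming a single radosgw instance on the host):

  RGW_PID=$(pgrep -x radosgw | head -1)
  grep 'open files' /proc/$RGW_PID/limits    # current soft/hard descriptor limits
  ls /proc/$RGW_PID/fd | wc -l               # descriptors currently in use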

[ceph-users] Re: Multipart uploads with partsizes larger than 16MiB failing on Nautilus

2020-09-10 Thread shubjero
rgw_max_chunk_size > rgw_put_obj_min_window_size, > because we try to write in units of chunk size but the window is too > small to write a single chunk. > > On Wed, Sep 9, 2020 at 8:51 AM shubjero wrote: > > > > Will do Matt > > > > On Tue, Sep 8, 2020 at 5
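
The two options referenced above can be inspected and, if necessary, aligned from the config database; a sketch (the 64 MiB value is purely illustrative, and older setups may carry these options in ceph.conf instead):

  ceph config get client.rgw rgw_max_chunk_size
  ceph config get client.rgw rgw_put_obj_min_window_size
  # make sure the window is at least as large as the chunk size, then restart the rgw daemons
  ceph config set client.rgw rgw_put_obj_min_window_size 67108864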

[ceph-users] Re: Multipart uploads with partsizes larger than 16MiB failing on Nautilus

2020-09-09 Thread shubjero
Will do, Matt. On Tue, Sep 8, 2020 at 5:36 PM Matt Benjamin wrote: > > thanks, Shubjero > > Would you consider creating a ceph tracker issue for this? > > regards, > > Matt > > On Tue, Sep 8, 2020 at 4:13 PM shubjero wrote: > > > > I had been looking

[ceph-users] Multipart uploads with partsizes larger than 16MiB failing on Nautilus

2020-09-08 Thread shubjero
Hey all, I'm creating a new post for this issue as we've narrowed the problem down to a part-size limitation on multipart uploads. We have discovered in both our production Nautilus (14.2.11) cluster and our lab Nautilus (14.2.10) cluster that multipart uploads with a configured part size of
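
One way to reproduce the behaviour from the client side is to force a part size above 16 MiB explicitly; a sketch with the AWS CLI (the endpoint, bucket, and 32 MiB value are placeholders):

  aws configure set default.s3.multipart_threshold 32MB
  aws configure set default.s3.multipart_chunksize 32MB
  aws --endpoint-url https://objects.example.com s3 cp ./bigfile s3://test-bucket/bigfile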

[ceph-users] Re: RadosGW and DNS Round-Robin

2020-09-04 Thread shubjero
We have our object storage endpoint FQDN DNS round-robining to 2 IPs. Those 2 IPs are managed by keepalived across 3 servers running haproxy, where each haproxy instance listens on each round-robined IP and load balances to 5 servers running radosgw. On Fri, Sep 4, 2020 at 12:35 PM
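
A stripped-down illustration of the haproxy side of that layout (addresses, ports, and the health check are placeholders, not the actual production config):

  frontend rgw_front
      mode http
      bind 203.0.113.10:80
      bind 203.0.113.11:80
      default_backend rgw_back

  backend rgw_back
      mode http
      balance roundrobin
      option httpchk GET /
      server rgw1 10.0.0.11:8080 check
      server rgw2 10.0.0.12:8080 check
      server rgw3 10.0.0.13:8080 check
      server rgw4 10.0.0.14:8080 check
      server rgw5 10.0.0.15:8080 check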

[ceph-users] Re: Multipart upload issue from Java SDK clients

2020-09-04 Thread shubjero
, Sep 2, 2020 at 3:15 PM shubjero wrote: > > Good day, > > I am having an issue with some multipart uploads to radosgw. I > recently upgraded my cluster from Mimic to Nautilus and began having > problems with multipart uploads from clients using the Java AWS SDK > (specifi

[ceph-users] Multipart upload issue from Java SDK clients

2020-09-02 Thread shubjero
Good day, I am having an issue with some multipart uploads to radosgw. I recently upgraded my cluster from Mimic to Nautilus and began having problems with multipart uploads from clients using the Java AWS SDK (specifically 1.11.219). I do NOT have issues with multipart uploads from other clients
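
A handy way to narrow down whether the problem is SDK-specific is to drive the same upload from a non-Java client with an explicit multipart chunk size; a sketch with s3cmd (assumes the endpoint and keys are already in ~/.s3cfg; the bucket name is a placeholder):

  s3cmd --multipart-chunk-size-mb=64 put ./bigfile s3://test-bucket/bigfile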

[ceph-users] OSD node OS upgrade strategy

2020-06-19 Thread shubjero
Hi all, I have a 39-node Ceph Mimic cluster with 1404 spinning disks across 6 racks, for a total of 9.1 PiB raw, about 40% utilized. These storage nodes started their life on Ubuntu 14.04 and were in-place upgraded to 16.04 two years ago; however, I have started a project to do fresh installs of each OSD node
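
For a fresh reinstall that keeps the existing OSDs intact, the general sequence looks roughly like this (a sketch; it assumes ceph-volume/LVM OSDs, while ceph-disk-era OSDs would need "ceph-volume simple scan" and "ceph-volume simple activate --all" instead):

  ceph osd set noout
  ceph osd set norebalance
  # ... reinstall the OS, reinstall the ceph packages, restore /etc/ceph and the bootstrap-osd keyring ...
  ceph-volume lvm activate --all     # bring the existing OSDs back without rebuilding them
  ceph osd unset norebalance
  ceph osd unset noout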

[ceph-users] Re: No reply or very slow reply from Prometheus plugin - ceph-mgr 13.2.8 mimic

2020-03-27 Thread shubjero
I've reported stability problems with ceph-mgr with the prometheus plugin enabled on all versions we ran in production, which were several versions of Luminous and Mimic. Our solution was to disable the prometheus exporter; I am using Zabbix instead. Our cluster is 1404 OSDs in size with about 9PB raw
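
The switch itself is just two mgr module commands plus pointing the zabbix module at the server; a sketch (the hostname is a placeholder):

  ceph mgr module disable prometheus
  ceph mgr module enable zabbix
  ceph zabbix config-set zabbix_host zabbix.example.com
  ceph zabbix send                  # push an immediate update to verify the pipeline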

[ceph-users] Re: Question about ceph-balancer and OSD reweights

2020-02-28 Thread shubjero
I talked to some guys on IRC about revisiting the OSDs with a non-1 reweight and setting them back to 1.0. I went from a standard deviation of 2+ to 0.5. Awesome. On Wed, Feb 26, 2020 at 10:08 AM shubjero wrote: > > Right, but should I be proactively returning any reweighted OSD's that > are n
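
For anyone wanting to do the same, the non-1 reweights can be listed and reset from the CLI; a sketch (the OSD id is a placeholder, and the jq filter assumes the field names in the JSON output of "ceph osd df"):

  ceph osd df --format json | jq -r '.nodes[] | select(.reweight < 1.0) | "\(.id) \(.reweight)"'
  ceph osd reweight 123 1.0         # repeat per OSD, ideally a few at a time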

[ceph-users] Re: Question about ceph-balancer and OSD reweights

2020-02-26 Thread shubjero
Right, but should I be proactively returning any reweighted OSDs that are not 1.0 back to 1.0? On Wed, Feb 26, 2020 at 3:36 AM Konstantin Shalygin wrote: > > On 2/26/20 3:40 AM, shubjero wrote: > > I'm running a Ceph Mimic cluster 13.2.6 and we use the ceph-balancer >

[ceph-users] Question about ceph-balancer and OSD reweights

2020-02-25 Thread shubjero
Hi all, I'm running a Ceph Mimic cluster (13.2.6) and we use the ceph-balancer in upmap mode. This cluster is fairly old, and pre-Mimic we used to set OSD reweights to balance the standard deviation of the cluster. Since moving to Mimic about 9 months ago I enabled the ceph-balancer with upmap mode
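
For reference, turning on the upmap balancer on Mimic is the standard sequence below (a sketch; run it only once all clients are Luminous or newer):

  ceph osd set-require-min-compat-client luminous
  ceph balancer mode upmap
  ceph balancer on
  ceph balancer status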

[ceph-users] HEALTH_WARN due to large omap object wont clear even after trim

2019-09-19 Thread shubjero
Hey all, Yesterday our cluster went into HEALTH_WARN due to 1 large omap object in the .usage pool (I've posted about this in the past). Last time we resolved the issue by trimming the usage log below the alert threshold, but this time it seems like the alert won't clear even after trimming and
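
The warning normally only clears once the PG holding the large omap object is deep-scrubbed again after the trim; a sketch of the usual steps (the dates, log path, and PG id are placeholders):

  radosgw-admin usage trim --start-date=2018-01-01 --end-date=2019-08-31
  ceph health detail | grep -i 'large omap'
  zgrep -i 'large omap object found' /var/log/ceph/ceph.log*   # shows the pool, PG and object
  ceph pg deep-scrub 5.1f                                      # deep-scrub that PG so the omap count is re-evaluated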

[ceph-users] Bucket policies with OpenStack integration and limiting access

2019-09-09 Thread shubjero
Good day, We have a Ceph cluster and make use of object storage integrated with OpenStack. Each OpenStack project/tenant is given a radosgw user, which allows all keystone users of that project to access the object storage as that single radosgw user. The radosgw user is the project id of the
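
Bucket policies are one way to selectively open a bucket across those per-project radosgw users; a hedged sketch of what such a policy could look like (the bucket name and the other project's id are placeholders, and the principal ARN follows the radosgw bucket-policy format, with the tenant field left empty as in a default keystone setup). Save something like this as policy.json:

  {
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"AWS": ["arn:aws:iam:::user/OTHER_PROJECT_ID"]},
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": ["arn:aws:s3:::shared-bucket", "arn:aws:s3:::shared-bucket/*"]
    }]
  }

then attach and verify it with s3cmd:

  s3cmd setpolicy policy.json s3://shared-bucket
  s3cmd info s3://shared-bucket      # shows the policy now attached to the bucket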

[ceph-users] Re: Mgr stability

2019-08-14 Thread shubjero
I'm having a similar issue with ceph-mgr stability problems since upgrading from 13.2.5 to 13.2.6. I have isolated the crashing to the prometheus module being enabled, and I notice much better stability when the prometheus module is NOT enabled. No more failovers; however, I do notice that even with
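
A quick way to confirm what is enabled and which mgr is currently active while testing this (generic commands, not from the thread):

  ceph mgr module ls      # enabled vs. available modules (JSON output)
  ceph mgr stat           # currently active mgr and standbys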