[ceph-users] 1 Large omap object found

2023-07-30 Thread Mark Johnson
I've been going round and round in circles trying to work this one out but I'm getting nowhere. We're running a 4-node Quincy cluster (17.2.6) which recently reported the following: ceph.log-20230729.gz:2023-07-28T08:31:42.390003+0000 osd.26 (osd.26) 13834 : cluster [WRN] Large omap object found…
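For anyone hitting the same warning, the quickest way to see the full message and the PG it refers to is the health detail output and the cluster log itself. A minimal sketch, assuming the standard log location (the rotated log file name above suggests /var/log/ceph on this cluster):

    # Show the active warning, including pool/PG and key count
    ceph health detail
    # Pull the complete warning line out of the (rotated) cluster log
    zgrep 'Large omap object' /var/log/ceph/ceph.log*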

[ceph-users] Re: 1 Large omap object found

2023-07-31 Thread Mark Johnson
…users.uid   admin > users.keys  JBWPRAPP1AQG471AMGC4 > users.uid   e434b82737cf4138b899c0785b49112d.buckets > users.uid   e434b82737cf4138b899c0785b49112d > > > > Quoting Mark Johnson: > > > I've been going round and round in circles trying to work t…
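The listing quoted above looks like an enumeration of objects in the default.rgw.meta pool. A hedged sketch of how one might find the object with the most omap keys; note that RGW metadata objects live in rados namespaces (users.uid, users.keys, ...), so objects in the default (empty) namespace would need separate handling:

    # List namespace + object for everything in the pool, then count omap
    # keys per object; the largest counts point at the offender
    rados -p default.rgw.meta ls --all | while read -r ns obj; do
      echo "$(rados -p default.rgw.meta -N "$ns" listomapkeys "$obj" | wc -l) $ns/$obj"
    done | sort -rn | head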

[ceph-users] Re: 1 Large omap object found

2023-07-31 Thread Mark Johnson
…migrating from Jewel to this new Quincy cluster and have no prior experience with autoscale, so we were just assuming autoscale would manage PG counts better than we could manually. As you can probably guess, we don't have much experience with Ceph. Regards, Mark Johnson On Mon, 2023-07-31 at…

[ceph-users] Re: 1 Large omap object found

2023-08-01 Thread Mark Johnson
…is recommended. But could you also share the output of: > > ceph pg ls-by-pool default.rgw.meta > > That's where the large omap was reported; maybe you'll need to increase the pg_num for that pool as well. Personally, I always disable the autoscaler. >
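For reference, the commands implied by the quoted advice, with the target pg_num as a placeholder value rather than a recommendation:

    # Where do the pool's PGs live, and how many are there?
    ceph pg ls-by-pool default.rgw.meta
    ceph osd pool get default.rgw.meta pg_num
    # Raise pg_num (pick a value appropriate for the cluster)
    ceph osd pool set default.rgw.meta pg_num 32
    # Stop the autoscaler from overriding the manual choice
    ceph osd pool set default.rgw.meta pg_autoscale_mode off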

[ceph-users] Re: 1 Large omap object found

2023-08-01 Thread Mark Johnson
…so far we didn't get any reports of this causing any issue at all. But I'd be curious if the devs or someone with more experience has better advice. [1] https://www.suse.com/support/kb/doc/?id=19698 Quoting Mark Johnson <ma...@iovox.com>: Here you go. It doesn't format ver…

[ceph-users] Re: 1 Large omap object found

2023-08-01 Thread Mark Johnson
…it didn't fail as such, it just didn't apply any changes. Is there a way to apply this on the fly without restarting the cluster? On Tue, 2023-08-01 at 22:44 +0000, Mark Johnson wrote: Thanks for that. That's pretty much how I was reading it, but the text you provided is a lot more explanatory…
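As later messages in the thread suggest, the setting does apply at runtime; the warning simply isn't re-evaluated until the PG is deep-scrubbed again. A sketch for verifying the live value, using the threshold and OSD from this thread:

    # Persist the new value in the mon config database
    ceph config set osd osd_deep_scrub_large_omap_object_key_threshold 200
    # Confirm a running OSD has picked it up
    ceph config show osd.26 osd_deep_scrub_large_omap_object_key_threshold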

[ceph-users] Re: 1 Large omap object found

2023-08-01 Thread Mark Johnson
…ran a deep scrub on that PG, as it probably only checks the value against the threshold at scrub time, and once that was done the health warning cleared. Not sure if that persists across restarts or not, but I'll cross that bridge if/when I come to it. On Wed, 2023-08-02 at 05:31 +0000, Mark Johnson…
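The deep scrub step, with the PG ID as a placeholder since the actual PG is named in the warning:

    # Force a deep scrub so the omap key count is re-checked against
    # the (new) threshold; replace 10.4 with the PG from the warning
    ceph pg deep-scrub 10.4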

[ceph-users] Re: 1 Large omap object found

2023-08-01 Thread Mark Johnson
…ue. Quoting Mark Johnson <ma...@iovox.com>: Never mind, I think I worked it out. I consulted the Quincy documentation, which just said to do this: ceph config set osd osd_deep_scrub_large_omap_object_key_threshold 200 But when I did that, the health warning didn't clear.

[ceph-users] Global AVAIL vs Pool MAX AVAIL

2021-01-11 Thread Mark Johnson
Can someone please explain the difference between the global "AVAIL" and the per-pool "MAX AVAIL" in the pools table when I do a "ceph df detail"? The reason I ask is that we have a total of 14 pools, but almost all of our data lives in one pool. A "ceph df detail" shows the following: GLOBAL:…
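The short version of the usual answer: global AVAIL is raw free space summed across all OSDs, while a pool's MAX AVAIL estimates how much more data that pool can accept, divided by its replica count and capped by its fullest OSD. A rough sketch with made-up numbers:

    # Replicated pool, size = 3, 42 TiB raw free cluster-wide:
    #   naive expectation: 42 TiB / 3 replicas = 14 TiB MAX AVAIL
    # CRUSH stops placing data once the fullest OSD fills up, though, so a
    # badly unbalanced cluster reports MAX AVAIL well below raw/size.
    ceph df detail
    ceph osd df    # compare the fullest OSD against the average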

[ceph-users] Re: Global AVAIL vs Pool MAX AVAIL

2021-01-11 Thread Mark Johnson
…unbalanced OSD utilization. What version of Ceph? Do you have any balancing enabled? Do ceph osd df | sort -nk8 | head and ceph osd df | sort -nk8 | tail and I'll bet you have OSDs way more full than others. I suspect the STDDEV value that ceph df reports is accordingly high. On Jan 11, 2021,…
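If the OSDs do turn out to be skewed, the built-in balancer (available since Luminous) is the usual fix; a sketch, assuming the cluster and its clients are recent enough to support upmap mode:

    # Even out PG placement automatically
    ceph balancer mode upmap
    ceph balancer on
    ceph balancer status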

[ceph-users] Gradually Increasing PG/PGP

2021-02-21 Thread Mark Johnson
Hi, probably a basic/stupid question, but I'm asking anyway. Through lack of knowledge and experience at the time we set up our pools, the pool that holds the majority of our data was created with a PG/PGP num of 64. As the amount of data has grown, this has started causing issues with b…
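A sketch of the incremental approach being described, with the pool name and step size as placeholders; on releases before Nautilus, pgp_num must be raised to match pg_num at each step:

    # One increment: raise pg_num/pgp_num, then wait for the cluster to settle
    ceph osd pool set mypool pg_num 128
    ceph osd pool set mypool pgp_num 128
    ceph -s    # wait until recovery/backfill finishes before the next step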

[ceph-users] 'ceph df' %USED explanation

2021-02-28 Thread Mark Johnson
I'm in the middle of increasing the PG count for one of our pools by making small increments, waiting for the process to complete, rinse and repeat. I'm doing it this way so I can control when all this activity happens and keep it away from the busier production traffic times. I'm expectin…

[ceph-users] Can't get one OSD (out of 14) to start

2021-04-16 Thread Mark Johnson
Really not sure where to go with this one. Firstly, a description of my cluster. Yes, I know there are a lot of "not ideals" here, but this is what I inherited. The cluster is running Jewel and has two storage/mon nodes and an additional mon-only node, with a pool size of 2. Today, we had a s…

[ceph-users] Re: Can't get one OSD (out of 14) to start

2021-04-16 Thread Mark Johnson
…pool size of 3. But that's not going to help me right now. -----Original Message----- From: Mark Johnson <ma...@iovox.com> To: ceph-users@ceph.io Subject: [ceph-users] Can't get on…

[ceph-users] Re: Can't get one OSD (out of 14) to start

2021-04-16 Thread Mark Johnson
From: Alex Gorbachev <...@iss-integration.com> To: Mark Johnson <ma...@iovox.com> Cc: ceph-users@ceph.io Subject: Re: [ceph-users] Re: Can't get one OSD…

[ceph-users] Re: Can't get one OSD (out of 14) to start

2021-04-16 Thread Mark Johnson
…issuing a 'ceph pg repair' to one of those PGs and got the following: # ceph pg repair 1.38 instructing pg 1.38 on osd.17 to repair But it doesn't appear to be doing anything. Health status still says the exact same thing. No idea where to go from here. -----Original Message----- Fro…
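A repair is queued and only actually runs when the OSD gets around to scrubbing the PG, so "nothing happening" right away is common. A sketch for checking progress (the PG ID comes from the message above):

    # Is the PG still flagged, and what is it doing right now?
    ceph health detail | grep 1.38
    ceph pg 1.38 query | grep -A5 recovery_state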

[ceph-users] Re: Can't get one OSD (out of 14) to start

2021-04-16 Thread Mark Johnson
…is: RuntimeError: "None": exception "['{"prefix": "get_command_descriptions", "pgid": "30.7a"}']": exception 'int' object is not iterable and RuntimeError: "None": exception "['{"prefix": "…

[ceph-users] Single OSD crash/restarting during scrub operation on specific PG

2021-04-20 Thread Mark Johnson
…ghobject_t&, const Index&, IndexedPath*)' thread 7f43382c7700 time 2021-04-21 03:55:10.081373
2021-04-21 03:55:10.157208 7f43382c7700 -1 *** Caught signal (Aborted) ** in thread 7f43382c7700 thread_name:tp_osd_tp
0> 2021-04-21 03:55:10.157208 7f43382c7700 -1 *** Caught signal…

[ceph-users] pgs stuck backfill_toofull

2020-10-28 Thread Mark Johnson
I've been struggling with this one for a few days now. We had an OSD report as near full a few days ago. This has happened a couple of times before, and a reweight-by-utilization has sorted it out in the past. Tried the same again, but this time we ended up with a couple of PGs in a state of backfill_toofull…
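For reference, a sketch of the reweight step being described, using the dry-run variant first; the 110 argument means only OSDs above 110% of mean utilization are touched:

    # Preview which OSDs would be reweighted, then apply
    ceph osd test-reweight-by-utilization 110
    ceph osd reweight-by-utilization 110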

[ceph-users] Re: pgs stuck backfill_toofull

2020-10-29 Thread Mark Johnson
…per OSD. Please be aware that PG planning requires caution, as you cannot reduce the PG count of a pool in your version. You need to know how much data is in the pools right now and what the future plan is. Best regards, Frank Schilder, AIT Risø Campus, Bygning 109, rum S14…

[ceph-users] Re: pgs stuck backfill_toofull

2020-10-29 Thread Mark Johnson
…Frank Schilder, AIT Risø Campus, Bygning 109, rum S14 From: Mark Johnson <ma...@iovox.com> Sent: 29 October 2020 08:19:01 To: ceph-users@ceph.io; Frank Schilder Subject: Re…