Hi Everyone,
I'm putting together an HDD cluster with an EC pool dedicated to the backup
environment. Traffic is via S3. Version 18.2, 7 OSD nodes, 12 * 12TB HDD +
1 NVMe each, 4+2 EC pool.
Wondering if there is some general guidance for initial setup/tuning with
regard to S3 object size. Files are
When running a CephFS scrub, the MDS crashes with the following backtrace:
-1> 2024-05-25T09:00:23.028+1000 7ef2958006c0 -1
/usr/src/debug/ceph/ceph-18.2.2/src/mds/MDSRank.cc: In function 'void
MDSRank::abort(std::string_view)' thread 7ef2958006c0 time
2024-05-25T09:00:23.031373+1000
On 24.05.2024 21:07, Mazzystr wrote:
I did the obnoxious task of updating ceph.conf and restarting all my
osds.
ceph --admin-daemon /var/run/ceph/ceph-osd.*.asok config get
osd_op_queue
{
"osd_op_queue": "wpq"
}
I have some spare memory on my target host/osd and increased the target
Now that you're on wpq, you can try tweaking osd_max_backfills (up)
and osd_recovery_sleep (down).
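Both knobs can be changed at runtime with `ceph config set`; the values below are illustrative starting points, not recommendations from the thread:

```shell
# Allow more concurrent backfills per OSD (default is lower)
ceph config set osd osd_max_backfills 4
# Remove the per-op recovery throttle (there are also _hdd/_ssd variants)
ceph config set osd osd_recovery_sleep 0
```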
Josh
On Fri, May 24, 2024 at 1:07 PM Mazzystr wrote:
>
> I did the obnoxious task of updating ceph.conf and restarting all my osds.
>
> ceph --admin-daemon /var/run/ceph/ceph-osd.*.asok config get
I did the obnoxious task of updating ceph.conf and restarting all my osds.
ceph --admin-daemon /var/run/ceph/ceph-osd.*.asok config get osd_op_queue
{
"osd_op_queue": "wpq"
}
I have some spare memory on my target host/osd and increased the target
memory of that OSD to 10 Gb and restarted.
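Raising a single OSD's memory target can also be done without editing ceph.conf (the OSD id here is a placeholder):

```shell
# Set a 10 GiB memory target for one OSD; osd.12 is a placeholder id
ceph config set osd.12 osd_memory_target 10737418240
# Verify the running value
ceph config get osd.12 osd_memory_target
```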
Hi,
I guess you mean use something like "step take DCA class hdd"
instead of "step take default class hdd" as in:
rule rule-ec-k7m11 {
id 1
type erasure
min_size 3
max_size 18
step set_chooseleaf_tries 5
step set_choose_tries 100
step
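A complete rule of that shape, with the root swapped from `default` to the datacenter bucket as suggested, might look like this (the `DCA` bucket name comes from the message; the chooseleaf and emit steps are assumptions about the elided lines):

```
rule rule-ec-k7m11 {
    id 1
    type erasure
    min_size 3
    max_size 18
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take DCA class hdd
    step chooseleaf indep 0 type host
    step emit
}
```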
It requires an OSD restart, unfortunately.
Josh
On Fri, May 24, 2024 at 11:03 AM Mazzystr wrote:
>
> Is that a setting that can be applied runtime or does it req osd restart?
>
> On Fri, May 24, 2024 at 9:59 AM Joshua Baergen
> wrote:
>
> > Hey Chris,
> >
> > A number of users have been
Is that a setting that can be applied runtime or does it req osd restart?
On Fri, May 24, 2024 at 9:59 AM Joshua Baergen
wrote:
> Hey Chris,
>
> A number of users have been reporting issues with recovery on Reef
> with mClock. Most folks have had success reverting to
> osd_op_queue=wpq. AIUI
Hey Chris,
A number of users have been reporting issues with recovery on Reef
with mClock. Most folks have had success reverting to
osd_op_queue=wpq. AIUI 18.2.3 should have some mClock improvements but
I haven't looked at the list myself yet.
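The revert Josh describes amounts to the following (a sketch; note the setting only takes effect once each OSD has been restarted):

```shell
# Switch the op queue scheduler back to wpq for all OSDs
ceph config set osd osd_op_queue wpq
# Then restart the OSDs with your usual tooling, e.g. on each host:
# systemctl restart ceph-osd.target
```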
Josh
On Fri, May 24, 2024 at 10:55 AM Mazzystr
Hi all,
Goodness, I'd say it's been at least 3 major releases since I had to do a
recovery. I have disks with 60-75,000 power_on_hours. I just updated from
Octopus to Reef last month and I'm hit with 3 disk failures and the mClock
ugliness. My recovery is moving at a wondrous 21 MB/sec after
Thanks Enrico,
We are only syncing metadata between sites, so I don't think that bug will be
the cause of our issues.
I have been able to delete ~30k objects without causing the RGW to stop
processing.
Thanks
Iain
From: Enrico Bocchi
Sent: 22 May 2024 13:48
Hi Eugen,
so it is partly "unexpectedly expected" and partly buggy. I really wish the
crush implementation honoured a few obvious invariants. It is extremely
counter-intuitive that mappings taken from a sub-set change even if both the
sub-set and the mapping instructions themselves
Hello Sebastian,
I just checked the survey and you're right, the issue was within the question.
It got me a bit confused when I read it, but I clicked anyway. Who doesn't
like clicking? :-D
What best describes your deployment target? *
1/ Bare metal (RPMs/Binary)
2/ Containers (cephadm/Rook)
3/
Hi Frédéric,
I agree. Maybe we should re-frame things? Containers can run on
bare-metal and containers can run virtualized. And distribution packages
can run bare-metal and virtualized as well.
What about asking independently about:
* Do you run containers or distribution packages?
* Do
I'm starting to think that the root cause of the remapping is just the fact
that the crush rule(s) contain(s) the "step take default" line:
step take default class hdd
My interpretation is that crush simply tries to honor the rule:
consider everything underneath the "default" root, so
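One way to confirm which root a rule starts from is to dump its steps (standard commands; the rule name is a placeholder):

```shell
# Show the decompiled steps of one rule, including its "take" step
ceph osd crush rule dump rule-ec-k7m11
# Or extract and decompile the whole crush map for inspection
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
```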
Thanks for being my rubber ducky.
Turns out I didn't have the rgw_zonegroup configured in the first apply.
Adding it to the config and re-applying afterwards does not restart or
reconfigure the containers.
After doing a ceph orch restart rgw.customer it seems to work now.
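For reference, the zonegroup ends up in the RGW service spec, along these lines (realm/zonegroup/zone names are placeholders, not from the message):

```yaml
service_type: rgw
service_id: customer
spec:
  rgw_realm: customer
  rgw_zonegroup: customer-zg
  rgw_zone: customer-zone
```

applied with `ceph orch apply -i rgw-customer.yaml`.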
Happy weekend everybody.
On
Hi,
we are currently in the process of adopting the main s3 cluster to
orchestrator.
We have two realms (one for us and one for the customer).
The old config worked fine and depending on the port I requested, I got
different x-amz-request-id header back:
x-amz-request-id:
Hi,
thanks for picking that up so quickly!
I haven't used a host spec file yet to add new hosts, but if you read
my thread about the unknown PGs, this might be my first choice to do
that in the future. So thanks again for bringing it to my attention. ;-)
Regards,
Eugen
Quoting Matthew
Hello everyone,
Nice talk yesterday. :-)
Regarding containers vs RPMs and orchestration, and the related discussion from
yesterday, I wanted to share a few things (which I wasn't able to share
yesterday on the call due to a headset/bluetooth stack issue) to explain why we
use cephadm and ceph
Hi Frank,
thanks for looking up those trackers. I haven't looked into them yet,
I'll read your response in detail later, but I wanted to add some new
observation:
I added another root bucket (custom) to the osd tree:
# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS
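Adding a root bucket like that is done with the standard crush commands (the bucket name "custom" is from the message; the host name is a placeholder):

```shell
# Create a new root bucket named "custom"
ceph osd crush add-bucket custom root
# Optionally move a host under it; "host1" is a placeholder
ceph osd crush move host1 root=custom
```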
Hi,
just for the archives:
On Tue, 5 Mar 2024, Anthony D'Atri wrote:
* Try applying the settings to global so that mons/mgrs get them.
Setting osd_deep_scrub_interval at global instead of at osd immediately turns
health to OK and removes the false warning from PGs not scrubbed in time.
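Concretely, that amounts to setting the option at global scope so the mons/mgrs use the same interval for the warning check (the interval value is illustrative):

```shell
# Apply at global scope so mons/mgrs see it, not only the OSDs
ceph config set global osd_deep_scrub_interval 1209600   # 14 days, in seconds
```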
HTH,