[ceph-users] Strange behavior when using storage classes

2023-02-21 Thread Michal Strnad
Hi all, we encountered some strange behavior when using storage classes with the S3 protocol. Some objects end up in a different pool than we would expect. Below is the list of commands used to create an account with a replicated storage class, upload some files to the bucket, and check that they
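For readers following along, a minimal sketch of the kind of setup being described, assuming made-up placement, pool, and bucket names (this is not the poster's actual command list):

  # add a REPLICATED storage class to the default placement target
  radosgw-admin zonegroup placement add --rgw-zonegroup default \
      --placement-id default-placement --storage-class REPLICATED
  radosgw-admin zone placement add --rgw-zone default \
      --placement-id default-placement --storage-class REPLICATED \
      --data-pool default.rgw.replicated.data
  radosgw-admin period update --commit   # only needed when running with a realm

  # upload an object with that storage class and check which pool it landed in
  aws --endpoint-url http://rgw.example.com s3api put-object \
      --bucket testbucket --key hello.txt --body hello.txt --storage-class REPLICATED
  rados -p default.rgw.replicated.data ls | grep hello.txt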

[ceph-users] Re: [ext] Re: Re: kernel client osdc ops stuck and mds slow reqs

2023-02-21 Thread Kuhring, Mathias
Hey Li, thank you for the quick reply. So the kernel on the cluster nodes might be the issue here? I thought the client kernel was the only relevant one (since we use cephadm). Anyhow, we plan to upgrade the cluster nodes to Rocky 8 soon. We'll see if this helps with the issue. Best, Mathias On

[ceph-users] Re: Undo "radosgw-admin bi purge"

2023-02-21 Thread Richard Bade
Hi Robert, A colleague and I ran into this a few weeks ago. The way we managed to get access back to delete the bucket properly (using radosgw-admin bucket rm) was to reshard the bucket. This created a new bucket index, so it was then possible to delete it. If you are looking to get
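A rough sketch of the sequence described above, with a placeholder bucket name and shard count (check radosgw-admin bucket stats for sensible values on your own cluster):

  # reshard the bucket to build a fresh bucket index
  radosgw-admin bucket reshard --bucket=broken-bucket --num-shards=11
  # with a valid index in place, the bucket can be removed again
  radosgw-admin bucket rm --bucket=broken-bucket --purge-objects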

[ceph-users] Re: Undo "radosgw-admin bi purge"

2023-02-21 Thread J. Eric Ivancich
When the admin runs “bi purge” they have the option of supplying a bucket_id with the “--bucket-id” command-line argument. This was useful back when resharding did not automatically remove the older bucket index shards (which it now does), which had a different bucket_id from the current bucket
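A hedged example of the arguments mentioned above (bucket name and ID are made up; radosgw-admin metadata get is one way to look up the current bucket_id so you do not purge the live index):

  # look up the bucket's current bucket_id
  radosgw-admin metadata get bucket:mybucket
  # purge only the index shards that belong to a specific (stale) bucket instance
  radosgw-admin bi purge --bucket=mybucket --bucket-id=<old_bucket_id>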

[ceph-users] Re: increasing PGs OOM kill SSD OSDs (octopus) - unstable OSD behavior

2023-02-21 Thread Boris Behrens
Thanks a lot Josh. That really seems like my problem. That does not look healthy in the cluster. oof. ~# ceph tell osd.* perf dump |grep 'osd_pglog\|^osd\.[0-9]' osd.0: { "osd_pglog_bytes": 459617868, "osd_pglog_items": 2955043, osd.1: { "osd_pglog_bytes": 598414548,
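For anyone who wants to repeat the check, a small sketch that totals the pglog memory reported by the counters above across all OSDs (assumes the same counter names as in Boris' Octopus output):

  ceph tell 'osd.*' perf dump \
    | grep -o '"osd_pglog_bytes": [0-9]*' \
    | awk '{sum += $2} END {printf "total pglog bytes: %d (%.1f GiB)\n", sum, sum/2^30}'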

[ceph-users] Re: increasing PGs OOM kill SSD OSDs (octopus) - unstable OSD behavior

2023-02-21 Thread Josh Baergen
Hi Boris, This sounds a bit like https://tracker.ceph.com/issues/53729. https://tracker.ceph.com/issues/53729#note-65 might help you diagnose whether this is the case. Josh On Tue, Feb 21, 2023 at 9:29 AM Boris Behrens wrote: > > Hi, > today I wanted to increase the PGs from 2k -> 4k and

[ceph-users] Re: Do not use SSDs with (small) SLC cache

2023-02-21 Thread Alvaro Soto
Hey, I have seen that kind of behavior in the past, and we managed to flash the firmware to increase the cache size, which will wear out the drive a bit faster, so only use this in lab environments. I'm unaware if the Samsung Magician software can do that. BTW, a few EVO 9xx models are listed by Samsung

[ceph-users] increasing PGs OOM kill SSD OSDs (octopus) - unstable OSD behavior

2023-02-21 Thread Boris Behrens
Hi, today I wanted to increase the PGs from 2k -> 4k and random OSDs went offline in the cluster. After some investigation we saw that the OSDs got OOM killed (I've seen a host go from 90GB used memory to 190GB before the OOM kills happened). We have around 24 SSD OSDs per host and
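Not a fix for the underlying problem, but a hedged sketch of a more cautious way to step pg_num up while investigating (pool name and step size are placeholders):

  # raise pg_num in smaller steps instead of jumping 2k -> 4k in one go
  ceph osd pool set mypool pg_num 2304
  # let things settle and watch PG states and per-host memory before the next step
  ceph -s
  ceph osd df tree
  # note: osd_memory_target only steers the cache autotuner and does not bound pglog memory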

[ceph-users] Re: Do not use SSDs with (small) SLC cache

2023-02-21 Thread Sven Kieske
Hi, I'm just writing to share some old knowledge, which is: never, ever use consumer SSDs for Ceph! See e.g. https://old.reddit.com/r/Proxmox/comments/izg6e5/questions_on_running_ceph_using_consumer_ssds/g6it9uv/ -- Kind regards / Regards Sven Kieske Systems developer / systems

[ceph-users] Re: Stuck OSD service specification - can't remove

2023-02-21 Thread Eugen Block
Hi, did you ever resolve that? I'm stuck with the same "deleting" service in 'ceph orch ls' and found your thread. Thanks, Eugen
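In case it helps anyone landing here later, a hedged sketch of the commands usually involved (the service name is a placeholder, and whether --force gets a spec out of the "deleting" state depends on the release):

  ceph orch ls osd                        # shows the spec stuck in "deleting"
  ceph orch rm osd.my-drive-group --force # re-issue the removal
  ceph orch ls --export                   # dump the remaining specs to verify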

[ceph-users] Re: Do not use SSDs with (small) SLC cache

2023-02-21 Thread Michael Wodniok
Hi Marc, I would try something near your example, yes. You could add "size=" so the test writes the whole disk once (by IO, not by all available cells), independent of time. If the results show either high deviation or numbers far from the specification, it's very likely there

[ceph-users] Re: kernel client osdc ops stuck and mds slow reqs

2023-02-21 Thread Ilya Dryomov
On Tue, Feb 21, 2023 at 1:01 AM Xiubo Li wrote: > > > On 20/02/2023 22:28, Kuhring, Mathias wrote: > > Hey Dan, hey Ilya > > > > I know this issue is two years old already, but we are having similar > > issues. > > > > Do you know, if the fixes got ever backported to RHEL kernels? > > It's

[ceph-users] Re: Do not use SSDs with (small) SLC cache

2023-02-21 Thread Marc
What fio test would indicate this behaviour up front? I guess something like this, but with a duration long enough to exhaust the disk's cache? [randwrite-4k-seq] stonewall bs=4k rw=randwrite fsync=1 > thank you for your hint - any input is appreciated. Please note that > Ceph does highly random IO
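The job above expanded into a single runnable command line, with a placeholder device and size; it is destructive to whatever is on the target, and the point is simply to write more data than the SLC cache can absorb:

  fio --name=slc-cache-test --filename=/dev/sdX --direct=1 --ioengine=libaio \
      --rw=randwrite --bs=4k --fsync=1 --iodepth=1 --size=500G \
      --write_bw_log=slc-cache-test --log_avg_msec=1000
  # a sharp, sustained drop in the bandwidth log part-way through suggests the
  # drive has fallen out of its SLC cache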

[ceph-users] Re: Do not use SSDs with (small) SLC cache

2023-02-21 Thread Phil Regnauld
Michael Wodniok (wodniok) writes: > Hi all, > > digging around debugging why our (small: 10 hosts/~60 OSDs) cluster is so > slow even while recovering, I found out that one of our key issues is some SSDs > with SLC cache (in our case Samsung SSD 870 EVO) - which we just recycled > from other use

[ceph-users] Re: Do not use SSDs with (small) SLC cache

2023-02-21 Thread Michael Wodniok
Hi Ken, thank you for your hint - any input is appreciated. Please note that Ceph does highly random IO (especially with small object sizes). AnandTech also states: "Some of our other tests have shown a few signs that the 870 EVO's write performance can drop when the SLC cache runs

[ceph-users] Re: Do not use SSDs with (small) SLC cache

2023-02-21 Thread mailing-lists
Dear Michael, I don't have an explanation for your problem unfortunately, but I was just surprised that you experience a drop in performance that this SSD shouldn't have. Your SSD drives (Samsung 870 EVO) should not get slower on large writes. You can verify this in the post you've attached [1]