Hi,
this is not an easy topic and there is no formula that can be applied
to all clusters. From my experience, it is exactly as the discussion
went in the thread you mentioned: trial & error.
Looking at your session ls output, this reminds me of a debug session we
had a few years ago:
Hi Goetz,
Which method did you finally choose?
We've done a successful migration from CentOS 8 to Ubuntu 20.04, but we have a
CentOS 7 Nautilus cluster which we'd like to move to Ubuntu 20.04 Octopus, same
as you.
I wonder whether any of you tried to skip Rocky 8 in the flow?
Thank you
This is my active MDS perf dump output:
root@ud-01:~# ceph tell mds.ud-data.ud-02.xcoojt perf dump
{
"AsyncMessenger::Worker-0": {
"msgr_recv_messages": 17179307,
"msgr_send_messages": 15867134,
"msgr_recv_bytes": 445239812294,
"msgr_send_bytes": 42003529245,
All of my clients are servers located two hops away on a 10 Gbit network, with
2x Xeon CPUs (16+ cores each), a minimum of 64 GB RAM, and an SSD OS drive + 8 GB spare.
I use ceph kernel mount only and this is the command:
- mount.ceph admin@$fsid.ud-data=/volumes/subvolumegroup ${MOUNT_DIR} -o
name=admin,secret=XXX
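In case an fstab entry is more convenient than the manual mount, a minimal sketch using the same new-style device string; the mount point and secretfile path here are made up and need to be adapted:
admin@<fsid>.ud-data=/volumes/subvolumegroup /mnt/ud-data ceph secretfile=/etc/ceph/admin.secret,noatime,_netdev 0 0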
Let me share some outputs about my cluster.
root@ud-01:~# ceph fs status
ud-data - 84 clients
===
RANK  STATE           MDS              ACTIVITY      DNS    INOS   DIRS   CAPS
 0    active  ud-data.ud-02.xcoojt  Reqs: 31 /s   3022k  3021k  52.6k   385k
      POOL        TYPE     USED  AVAIL
Hello Eugen.
Thank you for the answer.
Based on the knowledge and test results in this issue:
https://github.com/ceph/ceph/pull/38574
I followed their advice and applied the following changes.
max_mds = 4
standby_mds = 1
mds_cache_memory_limit = 16GB
mds_recall_max_caps = 4
When I set t
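As a sketch, those values could be applied at runtime roughly like this, assuming the filesystem name ud-data and assuming that "standby_mds = 1" maps to the standby_count_wanted filesystem setting (my interpretation, not confirmed):
# filesystem-level settings
ceph fs set ud-data max_mds 4
ceph fs set ud-data standby_count_wanted 1
# MDS settings via the config database; 16 GB expressed in bytes
ceph config set mds mds_cache_memory_limit 17179869184
ceph config set mds mds_recall_max_caps 4   # value as listed above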
>Groovy. Channel drives are IMHO a pain, though in the case of certain
>manufacturers it can be the only way to get firmware updates. Channel drives
>often only have a 3 year warranty, vs 5 for generic drives.
Yes, we have run into this with Kioxia as far as being able to find new
firmware. W
Hi Jan,
I've just filed an upstream ticket for your case, see
https://tracker.ceph.com/issues/64053 for more details.
You might want to tune (or preferably just remove) your custom
bluestore_cache_.*_ratio settings to fix the issue.
This is reproducible and fixable in my lab this way.
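In case others hit the same thing, a rough sketch of how to find and drop such overrides; which ratio options are actually set will differ per cluster:
# list any custom bluestore cache ratio overrides stored in the config database
ceph config dump | grep bluestore_cache_
# remove them so the defaults apply again (adapt to the options you really set)
ceph config rm osd bluestore_cache_meta_ratio
ceph config rm osd bluestore_cache_kv_ratio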
Hop
>
> NVMe SSDs shouldn’t cost significantly more than SATA SSDs. Hint: certain
> tier-one chassis manufacturers mark both the fsck up. You can get a better
> warranty and pricing by buying drives from a VAR.
>
> We stopped buying “Vendor FW” drives a long time ago.
Groovy. Cha
Sridhar,
Thanks a lot for this explanation. It's clearer now.
So at the end of the day (at least with the balanced profile) it's a lower bound
with no upper limit, and a balanced distribution between client and cluster IOPS.
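As a small sketch of how to look at that in practice (osd.0 is just an example daemon):
# select the balanced mClock profile
ceph config set osd osd_mclock_profile balanced
# inspect the QoS values the profile derives for one OSD
ceph config show osd.0 osd_mclock_scheduler_client_res
ceph config show osd.0 osd_mclock_scheduler_client_lim
ceph config show osd.0 osd_mclock_scheduler_client_wgt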
Regards,
Frédéric.
-Original Message-
From: Sr
By HBA I suspect you mean a non-RAID HBA?
Yes, something like the HBA355
NVMe SSDs shouldn’t cost significantly more than SATA SSDs. Hint: certain
tier-one chassis manufacturers mark both the fsck up. You can get a better
warranty and pricing by buying drives from a VAR.
We st
Hi,
just in case someone else might run into this or similar issues.
The following helped to solve the issue:
1. restarting the active mgr brought the PG into an inactive state without a
last acting set:
pg 10.17 is stuck inactive for 18m, current state unknown, last acting []
2. so we recreated the p
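For the archives, a sketch of the kind of commands involved; the mgr name is a placeholder, and force-create-pg is only an option once the data in that PG is accepted as lost:
# fail the active mgr so a standby takes over
ceph mgr fail <active-mgr-name>
# check what the PG reports afterwards
ceph pg 10.17 query
# last resort: recreate the PG empty, losing whatever it contained
ceph osd force-create-pg 10.17 --yes-i-really-mean-it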
Hello Ceph users,
we see a strange issue on our most recent Ceph installation, v17.2.6. We store
data on an HDD pool, the index pool is on SSD. Each OSD stores its WAL on an NVMe
partition. Benchmarks didn't expose any issues with the cluster, but since we
placed production load on it we see constantly growing OSD late
Good morning Eugen,
I just found this thread and saw that I had a test image for rgw in the
config.
After removing the global and the rgw config value everything was instantly
fine.
Cheers and a happy week
Boris
On Tue, 16 Jan 2024 at 10:20, Eugen Block wrote:
> Hi,
>
> there have bee
Hi owners of the ceph-users list, I've been trying to post a new message for the first time. The first one bounced because I had registered but not subscribed to the list. Then I subscribed and sent a message with a picture, which was larger than the allowed 500 KB and went into quarantine as well. I've decided to
Hi,
there have been a few threads with this topic, one of them is this one
[1]. The issue there was that different ceph container images were in
use. Can you check your container versions? If you don't configure a
global image for all ceph daemons, e.g.:
quincy-1:~ # ceph config set globa
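As a sketch of what such a global image setting and the follow-up check could look like (the image tag is only an example and should match the release in use):
ceph config set global container_image quay.io/ceph/ceph:v17.2.7
# verify that all daemons report the same version and image afterwards
ceph versions
ceph orch ps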
Hi,
I don't really have an answer, I just wanted to mention that I created
a tracker issue [1] because I believe there's a bug in the LRC plugin.
But there hasn't been any response yet.
[1] https://tracker.ceph.com/issues/61861
Quoting Ansgar Jazdzewski:
hi folks,
I currently test er
Hi,
could you provide more details on what exactly you tried and which
configs you set? Which compression mode are you running?
In a small Pacific test cluster I just set the mode to "force"
(default "none"):
storage:~ # ceph config set osd bluestore_compression_mode force
And then after a
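A sketch of how the effect can be checked afterwards; the pool name and the aggressive mode are only examples:
# compression can also be set per pool instead of globally on the OSDs
ceph osd pool set mypool compression_mode aggressive
ceph osd pool get mypool compression_mode
# ceph df detail shows the USED COMPR / UNDER COMPR columns per pool
ceph df detail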
Hi,
I have dealt with this topic multiple times; the SUSE team helped me
understand what's going on under the hood. The summary can be found
in this thread [1].
What helped in our case was to reduce mds_recall_max_caps from 30k
(the default) to 3k. We tried it in steps of 1k IIRC. So I su
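As a sketch, such a stepwise reduction can be done via the config database; the intermediate value is only an example:
# lower the recall threshold for all MDS daemons step by step
ceph config set mds mds_recall_max_caps 10000
# observe cap release behaviour, then keep lowering towards 3000
ceph config set mds mds_recall_max_caps 3000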
Did you find an existing tracker issue for that? I suggest reporting
your findings there.
Thanks!
Eugen
Quoting Reto Gysi:
Hi Eugen
LV tags seem to look ok to me.
LV_tags:
-
root@zephir:~# lvs -a -o +devices,tags | egrep 'osd1| LV' | grep -v osd12
LV
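A sketch of another way to inspect the tags ceph-volume sets on OSD LVs, in case the plain lvs output gets unwieldy:
# prints ceph.osd_id, ceph.osd_fsid, ceph.block_device, ... per OSD
ceph-volume lvm list
# or ask LVM directly for just the tags
lvs -a -o lv_name,lv_tags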