[ceph-users] Re: Please discuss about Slow Peering

2024-05-16 Thread Frank Schilder
This is a long shot: if you are using Octopus, you might be hit by this pglog-dup problem: https://docs.clyso.com/blog/osds-with-unlimited-ram-growth/. They don't mention slow peering explicitly in the blog, but it's also a consequence, because the up+acting OSDs need to go through the PG_log during peering…
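
For anyone reading the archive: a minimal sketch of how one might check whether the PG log (including dups) is consuming outsized OSD memory. This is not from the thread itself; it assumes admin-socket access on the OSD host, that jq is installed, and "osd.0" is a placeholder OSD id:

```shell
# Dump the OSD's memory pools and pull out the PG-log pool.
# A very large "bytes" value here is consistent with the pglog-dup
# growth issue described in the Clyso blog post.
ceph daemon osd.0 dump_mempools | jq '.mempool.by_pool.osd_pglog'
```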

[ceph-users] Re: Please discuss about Slow Peering

2024-05-16 Thread Anthony D'Atri
If using jumbo frames, also ensure that they're consistently enabled on all OS instances and network devices.
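
One way to verify that jumbo frames actually pass end-to-end, rather than just being configured locally, is a do-not-fragment ping sized to the MTU. This is a sketch, not from the thread; eth0 and peer-host are placeholders:

```shell
# 8972 = 9000 (MTU) - 20 (IP header) - 8 (ICMP header).
# "-M do" sets the don't-fragment bit, so the ping fails if any hop
# along the path has a smaller MTU.
ping -M do -c 3 -s 8972 peer-host

# Confirm the interface itself is configured for jumbo frames:
ip link show eth0 | grep -o 'mtu [0-9]*'
```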

[ceph-users] Re: Please discuss about Slow Peering

2024-05-21 Thread 서민우
We used the "kioxia kcd6xvul3t20" model. Is there any known issue with this model?

[ceph-users] Re: Please discuss about Slow Peering

2024-05-21 Thread Frank Schilder
We are using the read-intensive Kioxia drives (Octopus cluster) in RBD pools and are very happy with them. I don't think it's the drives. The last possibility I can think of is CPU. We run 4 OSDs per 1.92TB Kioxia drive to utilize their performance (a single OSD per disk doesn't cut it at all) and…
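
For reference, splitting one NVMe drive into multiple OSDs as described above can be done with ceph-volume's batch mode. This is a sketch, not from the thread; /dev/nvme0n1 is a placeholder for the Kioxia device, and the command wipes it for OSD use:

```shell
# Provision 4 OSDs on a single NVMe device; ceph-volume carves the
# drive into 4 LVs and creates one OSD on each.
ceph-volume lvm batch --osds-per-device 4 /dev/nvme0n1
```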

[ceph-users] Re: Please discuss about Slow Peering

2024-05-21 Thread 서민우
I compared your advice to my current setup. Increasing 'osd_memory_target' seems to have worked: I changed it from 4G to 8G, and the latency in our test decreased by about 50%. I have additional questions. We use 13 disks (3.2TB NVMe) per server and allocate one OSD to each disk; in other words, 1 node has 13 OSDs. Do you think this is inefficient? Is it better to create more OSDs by creating LVs on the disk?
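
The 4G-to-8G change described above would look roughly like this with the central config store (a sketch; osd.0 is a placeholder for verifying on one daemon):

```shell
# osd_memory_target is in bytes; 8 GiB = 8 * 1024^3 = 8589934592.
ceph config set osd osd_memory_target 8589934592

# Check the value a specific daemon resolves:
ceph config get osd.0 osd_memory_target
```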

[ceph-users] Re: Please discuss about Slow Peering

2024-05-21 Thread Anthony D'Atri
> Is it better to create more OSD by creating LV on the disk?

Not with the most recent Ceph releases. I suspec…

[ceph-users] Re: Please discuss about Slow Peering

2024-05-21 Thread Frank Schilder
> Not with the most recent Ceph releases.

Actually, this depends. If it's SSDs whose IOPS profit from higher iodepth, it is very likely to improve performance, because until today each OSD has only one kv_sync_thread, and this is typically the bottleneck under heavy IOPS load. Having 2-4 kv_sync_threads…
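
A quick way to check whether a given SSD's IOPS actually scale with iodepth, and hence whether multiple OSDs per drive could pay off, is a pair of fio runs. This is a sketch, not from the thread; /tmp/fio.test is a placeholder scratch file (never point fio at a device backing a live OSD):

```shell
# 4k random writes at queue depth 1 vs. 32; if the qd32 run reports
# much higher IOPS, the drive benefits from deeper queues.
fio --name=qd1  --filename=/tmp/fio.test --size=1G --rw=randwrite \
    --bs=4k --ioengine=libaio --direct=1 --iodepth=1  --runtime=30 --time_based
fio --name=qd32 --filename=/tmp/fio.test --size=1G --rw=randwrite \
    --bs=4k --ioengine=libaio --direct=1 --iodepth=32 --runtime=30 --time_based
```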