[ceph-users] Re: pacific doesn't defer small writes for pre-pacific hdd osds

2022-07-07 Thread Dan van der Ster
Hi again, I'm not sure the html mail made it to the lists -- resending in plain text. I've also opened https://tracker.ceph.com/issues/56488 Cheers, Dan On Wed, Jul 6, 2022 at 11:43 PM Dan van der Ster wrote: > > Hi Igor and others, > > (apologies for html, but i want to share a plot ;) ) > >

[ceph-users] Re: pacific doesn't defer small writes for pre-pacific hdd osds

2022-07-07 Thread Konstantin Shalygin
Hi, > On 7 Jul 2022, at 13:04, Dan van der Ster wrote: > > I'm not sure the html mail made it to the lists -- resending in plain text. > I've also opened https://tracker.ceph.com/issues/56488 > I think with pacific you need to redeploy all OSDs to respec

[ceph-users] Re: pacific doesn't defer small writes for pre-pacific hdd osds

2022-07-07 Thread Dan van der Ster
Hi, On Thu, Jul 7, 2022 at 2:37 PM Konstantin Shalygin wrote: > > Hi, > > On 7 Jul 2022, at 13:04, Dan van der Ster wrote: > > I'm not sure the html mail made it to the lists -- resending in plain text. > I've also opened https://tracker.ceph.com/issues/56488 > > > I think with pacific you need

[ceph-users] Re: pacific doesn't defer small writes for pre-pacific hdd osds

2022-07-07 Thread Konstantin Shalygin
On 7 Jul 2022, at 15:41, Dan van der Ster wrote: > > How is one supposed to redeploy OSDs on a multi-PB cluster while the > performance is degraded? This is a very strong point! It's good that this case can be fixed by setting bluestore_prefer_deferred_size_hdd to 128k, and I think we need anal

[ceph-users] Re: pacific doesn't defer small writes for pre-pacific hdd osds

2022-07-12 Thread Igor Fedotov
Hi Dan, I can confirm this is a regression introduced by https://github.com/ceph/ceph/pull/42725. Indeed, the strict comparison is the key point in your specific case, but generally it looks like this piece of code needs more redesign to better handle fragmented allocations (and issue deferred writ

[ceph-users] Re: pacific doesn't defer small writes for pre-pacific hdd osds

2022-07-12 Thread Konstantin Shalygin
Hi Igor, > On 12 Jul 2022, at 14:16, Igor Fedotov wrote: > > Meanwhile you can adjust bluestore_min_alloc_size_hdd indeed but I'd prefer > not to raise it that high as 128K to avoid too many writes being deferred > (and hence DB overburden). For clarification, perhaps you mean bluestore_prefe

[ceph-users] Re: pacific doesn't defer small writes for pre-pacific hdd osds

2022-07-12 Thread Igor Fedotov
Yep! Thanks, and sorry for the confusion. On 7/12/2022 2:23 PM, Konstantin Shalygin wrote: Hi Igor, On 12 Jul 2022, at 14:16, Igor Fedotov wrote: Meanwhile you can adjust bluestore_min_alloc_size_hdd indeed but I'd prefer not to raise it that high as 128K to avoid too many writes being defe

[ceph-users] Re: pacific doesn't defer small writes for pre-pacific hdd osds

2022-07-12 Thread Dan van der Ster
Hi Igor, Thank you for the reply and information. I confirm that `ceph config set osd bluestore_prefer_deferred_size_hdd 65537` correctly defers writes in my clusters. Best regards, Dan On Tue, Jul 12, 2022 at 1:16 PM Igor Fedotov wrote: > > Hi Dan, > > I can confirm this is a regression int
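The value 65537 in the message above looks like "one byte more than 64 KiB". A minimal sketch of that arithmetic follows, assuming the affected pre-pacific HDD OSDs were created with a 64 KiB min_alloc_size (the thread snippets do not state this explicitly). The helper names are hypothetical; only the `ceph config set` command line comes from the thread, and the script merely prints it rather than running it.

```python
# Sketch only: derive a bluestore_prefer_deferred_size_hdd value that sits
# just above an OSD's min_alloc_size and print the matching command.
# Assumption: pre-pacific HDD OSDs use a 64 KiB min_alloc_size.

PRE_PACIFIC_MIN_ALLOC_HDD = 64 * 1024  # assumed, not stated in the thread


def min_deferring_threshold(min_alloc_size: int) -> int:
    """Smallest threshold strictly greater than min_alloc_size (hypothetical helper)."""
    return min_alloc_size + 1


def config_command(threshold: int) -> str:
    """Format the runtime config command quoted in the thread (hypothetical helper)."""
    return f"ceph config set osd bluestore_prefer_deferred_size_hdd {threshold}"


if __name__ == "__main__":
    threshold = min_deferring_threshold(PRE_PACIFIC_MIN_ALLOC_HDD)
    print(config_command(threshold))
```

For a 64 KiB min_alloc_size this prints the same command Dan reports using, with the value 65537.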

[ceph-users] Re: pacific doesn't defer small writes for pre-pacific hdd osds

2022-07-13 Thread David Orman
Is this something that makes sense to do the 'quick' fix on for the next pacific release to minimize impact to users until the improved iteration can be implemented? On Tue, Jul 12, 2022 at 6:16 AM Igor Fedotov wrote: > Hi Dan, > > I can confirm this is a regression introduced by > https://githu

[ceph-users] Re: pacific doesn't defer small writes for pre-pacific hdd osds

2022-07-13 Thread Igor Fedotov
Maybe. My plan is to attempt a general fix and, if that doesn't work out within a short time frame, to publish a 'quick' one. On 7/13/2022 4:58 PM, David Orman wrote: Is this something that makes sense to do the 'quick' fix on for the next pacific release to minimize impact to users until the

[ceph-users] Re: pacific doesn't defer small writes for pre-pacific hdd osds

2022-07-13 Thread Zakhar Kirpichenko
Hi! My apologies for butting in. Please confirm that bluestore_prefer_deferred_size_hdd is a runtime option, which doesn't require OSDs to be stopped or rebuilt? Best regards, Zakhar On Tue, 12 Jul 2022 at 14:46, Dan van der Ster wrote: > Hi Igor, > > Thank you for the reply and information. >

[ceph-users] Re: pacific doesn't defer small writes for pre-pacific hdd osds

2022-07-13 Thread Dan van der Ster
Yes, that is correct. No need to restart the osds. .. Dan On Thu., Jul. 14, 2022, 07:04 Zakhar Kirpichenko, wrote: > Hi! > > My apologies for butting in. Please confirm > that bluestore_prefer_deferred_size_hdd is a runtime option, which doesn't > require OSDs to be stopped or rebuilt? > > Bes

[ceph-users] Re: pacific doesn't defer small writes for pre-pacific hdd osds

2022-07-13 Thread Zakhar Kirpichenko
Many thanks, Dan. Much appreciated! /Z On Thu, 14 Jul 2022 at 08:43, Dan van der Ster wrote: > Yes, that is correct. No need to restart the osds. > > .. Dan > > > On Thu., Jul. 14, 2022, 07:04 Zakhar Kirpichenko, > wrote: > >> Hi! >> >> My apologies for butting in. Please confirm >> that blues

[ceph-users] Re: pacific doesn't defer small writes for pre-pacific hdd osds

2022-07-14 Thread Konstantin Shalygin
Dan, have you tested redeploying one of your OSDs with the default pacific bluestore_min_alloc_size_hdd (4096)? Would this also resolve the issue (i.e. is an OSD simply not affected when all options are at their defaults)? Thanks, k > On 14 Jul 2022, at 08:43, Dan van der Ster wrote: >

[ceph-users] Re: pacific doesn't defer small writes for pre-pacific hdd osds

2022-07-14 Thread Dan van der Ster
OK I recreated one OSD. It now has 4k min_alloc_size: 2022-07-14T10:52:58.382+0200 7fe5ec0aa200 1 bluestore(/var/lib/ceph/osd/ceph-0/) _open_super_meta min_alloc_size 0x1000 and I tested all these bluestore_prefer_deferred_size_hdd values: 4096: not deferred 4097: "_do_alloc_write deferring 0x1
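The 4096 / 4097 results above are consistent with the "strict comparison" Igor mentions earlier in the thread. The following is a simplified model of that behaviour, not the actual BlueStore code: it assumes a write is deferred only when its length is strictly less than bluestore_prefer_deferred_size_hdd, and that the test writes were exactly min_alloc_size (0x1000) bytes long. Both points are assumptions, and `is_deferred` is a hypothetical name.

```python
# Simplified model (assumption): a write is deferred only if its length is
# strictly less than the prefer_deferred_size threshold.

MIN_ALLOC_SIZE = 0x1000  # 4096, from the _open_super_meta log line above


def is_deferred(write_len: int, prefer_deferred_size: int) -> bool:
    """Strict '<' comparison, as a stand-in for the real allocation logic."""
    return write_len < prefer_deferred_size


if __name__ == "__main__":
    # Assumed test write: exactly one allocation unit.
    for threshold in (4096, 4097):
        state = "deferred" if is_deferred(MIN_ALLOC_SIZE, threshold) else "not deferred"
        print(f"{threshold}: {state}")
```

Under this model a threshold equal to min_alloc_size never defers a write of one full allocation unit, which would also explain why 65537 rather than 65536 was needed on OSDs with a 64 KiB min_alloc_size.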