[ceph-users] Re: OSD crash on Onode::put

2023-01-16 Thread Dongdong Tao
Hi Frank, I don't have an operational workaround; the patch https://github.com/ceph/ceph/pull/46911/commits/f43f596aac97200a70db7a70a230eb9343018159 is simple and can be applied cleanly. Yes, restarting the OSD will clear pool entries; you can restart it when the bluestore_onode items are very
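
For context, the bluestore_onode item count mentioned above can be read from a running OSD over its admin socket. A minimal sketch in Python (it assumes the usual "ceph daemon osd.N dump_mempools" JSON layout; the OSD id and threshold are illustrative, the thread gives no hard number):

import json
import subprocess

OSD_ID = 0            # illustrative OSD id
THRESHOLD = 100_000   # illustrative cutoff; pick your own

def onode_items(osd_id):
    """Return the bluestore_onode mempool item count for one OSD."""
    out = subprocess.check_output(
        ["ceph", "daemon", "osd.%d" % osd_id, "dump_mempools"])
    by_pool = json.loads(out)["mempool"]["by_pool"]
    return by_pool["bluestore_onode"]["items"]

if onode_items(OSD_ID) < THRESHOLD:
    print("osd.%d onode cache has shrunk -- consider a restart" % OSD_ID)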

[ceph-users] Re: OSD crash on Onode::put

2023-01-13 Thread Frank Schilder
ceph-users@ceph.io Cc: d...@ceph.io Subject: Re: [ceph-users] Re: OSD crash on Onode::put Hi Frank, IMO all the below logic is a bit of overkill and no one can provide 100% valid guidance on specific numbers atm. Generally I agree with Dongdong's point that a crash is effectively an OSD restart and hence

[ceph-users] Re: OSD crash on Onode::put

2023-01-13 Thread Frank Schilder
To: Serkan Çoban; Anthony D'Atri Cc: ceph-users@ceph.io Subject: Re: [ceph-users] Re: OSD crash on Onode::put Hi Anthony and Serkan, I checked the drive temperatures and there is nothing special about this slot. The disks in this slot are from different vendors and were not populated incrementally

[ceph-users] Re: OSD crash on Onode::put

2023-01-12 Thread Igor Fedotov
asking here how few onode items are acceptable before performance drops painfully. Best regards, Frank Schilder AIT Risø Campus Bygning 109, rum S14 ____________________ From: Igor Fedotov Sent: 09 January 2023 13:34:42 To: Dongdong Tao; ceph-users@ceph.io Cc: d...@ceph.io
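
For the "how few onode items are acceptable" question, one observable proxy is the onode cache hit rate from the OSD perf counters. A rough sketch (counter names differ between Ceph releases, e.g. bluestore_onode_hits vs onode_hits, so both spellings are tried; no threshold from the thread is implied):

import json
import subprocess

def onode_hit_rate(osd_id):
    """Compute the onode cache hit rate for one OSD from perf dump."""
    out = subprocess.check_output(
        ["ceph", "daemon", "osd.%d" % osd_id, "perf", "dump"])
    bs = json.loads(out)["bluestore"]
    # Counter names vary between releases; try both spellings.
    hits = bs.get("onode_hits", bs.get("bluestore_onode_hits", 0))
    misses = bs.get("onode_misses", bs.get("bluestore_onode_misses", 0))
    total = hits + misses
    return hits / total if total else 1.0

print("osd.0 onode hit rate: %.2f%%" % (100 * onode_hit_rate(0)))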

[ceph-users] Re: OSD crash on Onode::put

2023-01-11 Thread Frank Schilder
Hi Anthony and Serkan, I checked the drive temperatures and there is nothing special about this slot. The disks in this slot are from different vendors and were not populated incrementally. It might be a very weird coincidence. I seem to have an OSD developing this problem in another slot on a

[ceph-users] Re: OSD crash on Onode::put

2023-01-11 Thread Frank Schilder
.@gmail.com Subject: Re: [ceph-users] Re: OSD crash on Onode::put Hi Frank, I don't have an operational workaround; the patch https://github.com/ceph/ceph/pull/46911/commits/f43f596aac97200a70db7a70a230eb9343018159 is simple and can be applied cleanly. Yes, restarting the OSD will clear pool entries

[ceph-users] Re: OSD crash on Onode::put

2023-01-10 Thread Anthony D'Atri
Could this be a temporal coincidence? E.g. each host got a different model drive in slot 19 via an incremental expansion. > On Jan 10, 2023, at 05:27, Frank Schilder wrote: > > Following up on my previous post, we have identical OSD hosts. The very > strange observation now is that all

[ceph-users] Re: OSD crash on Onode::put

2023-01-10 Thread Serkan Çoban
Slot 19 is inside the chassis? Did you check the chassis temperature? I sometimes see a higher failure rate for HDDs inside the chassis than for those at the front of the chassis; in our case it was related to the temperature difference. On Tue, Jan 10, 2023 at 1:28 PM Frank Schilder wrote: > > Following up on my previous post,
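
If chassis temperature is the suspect, per-drive temperatures are easy to collect. A small sketch using smartctl's JSON output (requires smartmontools 7 or newer; the device list is a placeholder):

import json
import subprocess

def drive_temp_c(dev):
    """Read the current drive temperature (Celsius) via smartctl."""
    # smartctl encodes status bits in its exit code, so read stdout
    # without raising on a nonzero exit.
    out = subprocess.run(["smartctl", "-j", "-A", dev],
                         capture_output=True).stdout
    return json.loads(out).get("temperature", {}).get("current")

for dev in ("/dev/sda", "/dev/sdb"):  # placeholder devices
    print(dev, drive_temp_c(dev), "C")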

[ceph-users] Re: OSD crash on Onode::put

2023-01-10 Thread Frank Schilder
Following up on my previous post, we have identical OSD hosts. The very strange observation now is that all outlier OSDs are in exactly the same disk slot on these hosts. We have 5 problematic OSDs and they are all in slot 19 on 5 different hosts. This is an extremely strange and unlikely
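
A correlation like this can be cross-checked from the cluster side: "ceph osd metadata" reports the host and the stable by-path device ids, which encode the physical slot on most HBAs. A sketch (the OSD ids are placeholders; it assumes the metadata contains hostname and device_paths, as recent releases report):

import json
import subprocess

SUSPECT_OSDS = [1, 2, 3]  # placeholder ids for the problematic OSDs

for osd_id in SUSPECT_OSDS:
    out = subprocess.check_output(
        ["ceph", "osd", "metadata", str(osd_id), "-f", "json"])
    meta = json.loads(out)
    # by-path ids encode the controller/phy, i.e. the physical slot
    print(osd_id, meta.get("hostname"), meta.get("device_paths"))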

[ceph-users] Re: OSD crash on Onode::put

2023-01-10 Thread Frank Schilder
asking here how few onode items are acceptable before performance drops painfully. Best regards, Frank Schilder AIT Risø Campus Bygning 109, rum S14 ____________________ From: Igor Fedotov Sent: 09 January 2023 13:34:42 To: Dongdong Tao; ceph-users@ceph.io Cc: d...@ceph.io

[ceph-users] Re: OSD crash on Onode::put

2023-01-09 Thread Igor Fedotov
Hi Dongdong, thanks a lot for your post; it's really helpful. Thanks, Igor On 1/5/2023 6:12 AM, Dongdong Tao wrote: I see many users recently reporting that they have been struggling with this Onode::put race condition issue [1] on both the latest Octopus and Pacific. Igor opened a PR [2]
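
For readers new to this bug class: the crash comes from a race on the onode reference count between releasing a reference and trimming the cache. Purely as an illustration of that failure shape in Python (this is not the BlueStore code; the real fix is the PR referenced above):

import threading

cache = {}
cache_lock = threading.Lock()

class Entry:
    def __init__(self):
        self.nref = 1  # one reference held by the creator

def get(key):
    """Look up an entry and take a reference, under the cache lock."""
    with cache_lock:
        e = cache.get(key)
        if e is not None:
            e.nref += 1
        return e

def put_buggy(key, e):
    # The decrement happens outside the cache lock: between hitting
    # zero here and erasing below, another thread's get() can hand out
    # a fresh reference to an entry that is about to be destroyed.
    e.nref -= 1
    if e.nref == 0:
        with cache_lock:
            cache.pop(key, None)

def put_fixed(key, e):
    # Decrement and erase under the same lock get() uses, so the
    # zero-check can never interleave with a concurrent lookup.
    with cache_lock:
        e.nref -= 1
        if e.nref == 0:
            cache.pop(key, None)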