[ceph-users] Re: After upgrading from 17.2.6 to 18.2.0, OSDs are very frequently restarting due to livenessprobe failures

2023-09-21 Thread Igor Fedotov
Hi! Can you share OSD logs demonstrating such a restart? Thanks, Igor On 20/09/2023 20:16, sbeng...@gmail.com wrote: Since upgrading to 18.2.0, OSDs are very frequently restarting due to livenessprobe failures, making the cluster unusable. Has anyone else seen this behavior? Upgrade path:

[ceph-users] Re: After upgrading from 17.2.6 to 18.2.0, OSDs are very frequently restarting due to livenessprobe failures

2023-09-21 Thread Travis Nielsen
If there is nothing obvious in the OSD logs such as failing to start, and if the OSDs appear to be running until the liveness probe restarts them, you could disable or change the timeouts on the liveness probe. See https://rook.io/docs/rook/latest/CRDs/Cluster/ceph-cluster-crd/#health-settings . B
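For reference, a minimal sketch of what that change could look like, assuming a Rook-operated CephCluster named rook-ceph in the rook-ceph namespace; the field paths follow the health-settings doc linked above, but verify them against your Rook version:

  # Disable the OSD liveness probe entirely (revert once the root cause is found)
  kubectl -n rook-ceph patch cephcluster rook-ceph --type merge \
    -p '{"spec":{"healthCheck":{"livenessProbe":{"osd":{"disabled":true}}}}}'

  # Or keep the probe but give it more slack
  kubectl -n rook-ceph patch cephcluster rook-ceph --type merge \
    -p '{"spec":{"healthCheck":{"livenessProbe":{"osd":{"probe":{"timeoutSeconds":10,"periodSeconds":30,"failureThreshold":6}}}}}}'

The operator reconciles the change into the OSD deployments, so no manual pod edits are needed.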

[ceph-users] Re: After upgrading from 17.2.6 to 18.2.0, OSDs are very frequently restarting due to livenessprobe failures

2023-09-21 Thread Sudhin Bengeri
Igor, Travis, Thanks for your attention to this issue. We extended the timeout for the liveness probe yesterday, and also extended the time after which a down OSD deployment is deleted by the operator. Once all the OSD deployments were recreated by the operator, we observed two OSD restarts - whi

[ceph-users] Re: After upgrading from 17.2.6 to 18.2.0, OSDs are very frequently restarting due to livenessprobe failures

2023-09-22 Thread Peter Goron
Hi, For the record, in the past we faced a similar issue with OSDs being killed one after another every day, starting at midnight. The root cause was linked to the device_health_check launched by the mgr on each OSD. While an OSD is running device_health_check, its admin socket is busy and can't answer to
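A hedged way to check for, and temporarily rule out, that interaction using standard Ceph CLI commands; the devicehealth option names are mgr module settings and the values shown are only examples:

  # See whether device-health scraping is enabled and how often it runs
  ceph config get mgr mgr/devicehealth/enable_monitoring
  ceph config get mgr mgr/devicehealth/scrape_frequency   # in seconds; the default is daily

  # Temporarily switch device monitoring off and watch whether the probe failures stop
  ceph device monitoring off

  # Re-enable it after the test
  ceph device monitoring on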

[ceph-users] Re: After upgrading from 17.2.6 to 18.2.0, OSDs are very frequently restarting due to livenessprobe failures

2023-09-26 Thread sbengeri
Hi Igor, Please let me know where I can upload the OSD logs. Thanks. Sudhin

[ceph-users] Re: After upgrading from 17.2.6 to 18.2.0, OSDs are very frequently restarting due to livenessprobe failures

2023-09-26 Thread Igor Fedotov
Hi Sudhin, any publicly available cloud storage, e.g. Google Drive, should work. Thanks, Igor On 26/09/2023 22:52, sbeng...@gmail.com wrote: Hi Igor, Please let me know where I can upload the OSD logs. Thanks. Sudhin

[ceph-users] Re: After upgrading from 17.2.6 to 18.2.0, OSDs are very frequently restarting due to livenessprobe failures

2023-09-27 Thread sbengeri
Hi Igor, I have copied three OSD logs to https://drive.google.com/file/d/1aQxibFJR6Dzvr3RbuqnpPhaSMhPSL--F/view?usp=sharing Hopefully they include some meaningful information. Thank you. Sudhin

[ceph-users] Re: After upgrading from 17.2.6 to 18.2.0, OSDs are very frequently restarting due to livenessprobe failures

2023-09-28 Thread Igor Fedotov
Hi Sudhin, It looks like manual DB compactions are (periodically?) issued via the admin socket for your OSDs, which (my working hypothesis) triggers DB access stalls. Here are the log lines indicating such calls: debug 2023-09-22T11:24:55.234+ 7fc4efa20700  1 osd.1 1192508 triggering manual
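For context, the kind of admin-socket call that leaves a "triggering manual compaction" line in the OSD log is an on-demand compaction such as the following; osd.1 is used here only because it is the daemon in the quoted log, and the point is to check whether any cron job, probe, or tooling on the host issues something like this:

  # Via the admin socket on the host running the daemon
  ceph daemon osd.1 compact

  # Or remotely through the cluster
  ceph tell osd.1 compact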

[ceph-users] Re: After upgrading from 17.2.6 to 18.2.0, OSDs are very frequently restarting due to livenessprobe failures

2023-09-28 Thread Mark Nelson
There is some pretty strange compaction behavior happening in these logs.  For instance, in osd0, we see an O-1 CF L1 compaction that's taking ~204 seconds: 2023-09-21T20:03:59.378+ 7f16a286c700  4 rocksdb: (Original Log Time 2023/09/21-20:03:59.381808) EVENT_LOG_v1 {"time_micros": 169532