Re: [ceph-users] 14.2.2 - OSD Crash

2019-08-07 Thread Igor Fedotov
Users' *Subject:* Re: [ceph-users] 14.2.2 - OSD Crash Hi Manuel, as Brad pointed out, timeouts and suicides are rather consequences of some other issues with OSDs. I recall at least two recent relevant tickets: https://tracker.ceph.com/issues/36482 https://tracker.ceph.com/issues/40741 (see

Re: [ceph-users] 14.2.2 - OSD Crash

2019-08-07 Thread EDH - Manuel Rios Fernandez
to an NVMe of 480 GB per node helps in this situation, but I am not sure. Manuel From: Igor Fedotov Sent: Wednesday, 7 August 2019 13:10 To: EDH - Manuel Rios Fernandez ; 'Ceph Users' Subject: Re: [ceph-users] 14.2.2 - OSD Crash Hi Manuel, as Brad pointed out, timeouts

Re: [ceph-users] 14.2.2 - OSD Crash

2019-08-07 Thread Igor Fedotov
Hi Manuel, as Brad pointed out, timeouts and suicides are rather consequences of some other issues with OSDs. I recall at least two recent relevant tickets: https://tracker.ceph.com/issues/36482 https://tracker.ceph.com/issues/40741 (see last comments) Both had massive and slow reads from

Re: [ceph-users] 14.2.2 - OSD Crash

2019-08-06 Thread Brad Hubbard
-63> 2019-08-07 00:51:52.861 7fe987e49700 1 heartbeat_map clear_timeout 'OSD::osd_op_tp thread 0x7fe987e49700' had suicide timed out after 150 You hit a suicide timeout, that's fatal. On line 80 the process kills the thread based on the assumption it's hung. src/common/HeartbeatMap.cc: 66
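To make the difference between an ordinary timeout and a suicide timeout concrete, here is a minimal sketch in Python of the two-deadline idea Brad describes. It is not the code in src/common/HeartbeatMap.cc: the names HeartbeatHandle, check and SuicideTimeout are hypothetical, the 150-second suicide grace is taken from the log line above, and the 15-second ordinary grace is an assumed value for illustration only. An exception stands in for the fatal abort.

```python
from dataclasses import dataclass
from typing import Optional


class SuicideTimeout(RuntimeError):
    """Stand-in for the fatal abort; a real daemon would terminate here."""


@dataclass
class HeartbeatHandle:
    name: str
    grace: float                      # assumed: seconds until "unhealthy"
    suicide_grace: float              # seconds until the daemon gives up
    deadline: Optional[float] = None
    suicide_deadline: Optional[float] = None

    def reset_timeout(self, now: float) -> None:
        # A worker calls this when it starts (or makes progress on) an op.
        self.deadline = now + self.grace
        self.suicide_deadline = now + self.suicide_grace

    def clear_timeout(self) -> None:
        # A worker calls this when it goes idle; no deadline is armed.
        self.deadline = None
        self.suicide_deadline = None


def check(h: HeartbeatHandle, now: float) -> bool:
    """Return True if healthy, False if past the grace, raise if past suicide grace."""
    healthy = True
    if h.deadline is not None and h.deadline < now:
        print(f"'{h.name}' had timed out after {h.grace:g}")
        healthy = False
    if h.suicide_deadline is not None and h.suicide_deadline < now:
        # Assumption: the thread is treated as hung, so the process gives up.
        raise SuicideTimeout(
            f"'{h.name}' had suicide timed out after {h.suicide_grace:g}"
        )
    return healthy
```

In this model a worker that is merely slow is reported as unhealthy and can recover, while one that stays stuck past the suicide grace takes the whole process down, which is why the suicide message is the symptom to chase rather than the root cause.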

[ceph-users] 14.2.2 - OSD Crash

2019-08-06 Thread EDH - Manuel Rios Fernandez
Hi, we have a pair of OSDs located in a node that have been crashing randomly since 14.2.2. OS version: CentOS 7.6. There are a ton of lines before the crash; the unexpected part: -- 3045> 2019-08-07 00:39:32.013 7fe9a4996700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe987e49700' had timed
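When the crash dump contains thousands of lines, it can help to pull out only the heartbeat_map entries. The following is a rough sketch, assuming log lines shaped like the complete one quoted in Brad's reply; the regular expression and the function name heartbeat_events are illustrative and have not been validated against other Ceph log formats.

```python
import re

# Assumed line shape, based on the one complete line quoted in this thread:
#   -63> 2019-08-07 00:51:52.861 7fe987e49700 1 heartbeat_map clear_timeout
#   'OSD::osd_op_tp thread 0x7fe987e49700' had suicide timed out after 150
HEARTBEAT_RE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<thread>[0-9a-f]+)\s+(?P<level>-?\d+)\s+heartbeat_map\s+(?P<func>\w+)\s+"
    r"'(?P<worker>[^']+)'\s+had\s+(?P<kind>suicide timed out|timed out)"
    r"\s+after\s+(?P<secs>\d+)"
)


def heartbeat_events(lines):
    """Yield a dict of fields for every line that matches HEARTBEAT_RE."""
    for line in lines:
        m = HEARTBEAT_RE.search(line)
        if m:
            yield m.groupdict()


if __name__ == "__main__":
    sample = (
        "-63> 2019-08-07 00:51:52.861 7fe987e49700 1 heartbeat_map "
        "clear_timeout 'OSD::osd_op_tp thread 0x7fe987e49700' "
        "had suicide timed out after 150"
    )
    for ev in heartbeat_events([sample]):
        print(ev["time"], ev["worker"], ev["kind"], ev["secs"])
```

Sorting the resulting events by time shows how long a worker thread was reported as timed out before the suicide timeout fired.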