Users'
*Subject:* Re: [ceph-users] 14.2.2 - OSD Crash
Hi Manuel,
As Brad pointed out, timeouts and suicides are rather consequences of
some other issues with the OSDs.
I recall at least two recent relevant tickets:
https://tracker.ceph.com/issues/36482
https://tracker.ceph.com/issues/40741 (see
to an NVMe of 480GB per node helps in this
situation, but not sure.
Manuel
From: Igor Fedotov
Sent: Wednesday, 7 August 2019 13:10
To: EDH - Manuel Rios Fernandez ; 'Ceph Users'
Subject: Re: [ceph-users] 14.2.2 - OSD Crash
Hi Manuel,
As Brad pointed out, timeouts and suicides are rather consequences of
some other issues with the OSDs.
I recall at least two recent relevant tickets:
https://tracker.ceph.com/issues/36482
https://tracker.ceph.com/issues/40741 (see last comments)
Both had massive and slow reads from
-63> 2019-08-07 00:51:52.861 7fe987e49700 1 heartbeat_map
clear_timeout 'OSD::osd_op_tp thread 0x7fe987e49700' had suicide timed
out after 150
You hit a suicide timeout, that's fatal. On line 80 the process kills
the thread based on the assumption it's hung.
src/common/HeartbeatMap.cc:
66
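To make the difference between the two timeouts concrete, here is a minimal sketch of a heartbeat check with a grace period and a suicide grace period. It is a simplified model written for illustration, not the code in HeartbeatMap.cc: the names Handle, reset_timeout, clear_timeout, check and SuicideTimeout are hypothetical, and the 15-second grace is an assumed value (only the 150 comes from the log above).

```python
from dataclasses import dataclass


class SuicideTimeout(RuntimeError):
    """Raised where a real daemon would abort the whole process."""


@dataclass
class Handle:
    name: str
    grace: float                  # seconds before the thread is reported unhealthy
    suicide_grace: float          # seconds before the daemon gives up on the thread
    timeout: float = 0.0          # absolute deadline; 0 means "not armed"
    suicide_timeout: float = 0.0  # absolute deadline; 0 means "not armed"


def reset_timeout(h, now):
    """Called by a worker thread when it starts (or makes progress on) an op."""
    h.timeout = now + h.grace
    h.suicide_timeout = now + h.suicide_grace


def clear_timeout(h):
    """Called by a worker thread when it finishes an op."""
    h.timeout = 0.0
    h.suicide_timeout = 0.0


def check(h, now):
    """Return False if the grace period has passed; raise if the suicide grace has."""
    healthy = True
    if h.timeout and h.timeout < now:
        print(f"'{h.name}' had timed out after {h.grace:g}")
        healthy = False
    if h.suicide_timeout and h.suicide_timeout < now:
        print(f"'{h.name}' had suicide timed out after {h.suicide_grace:g}")
        raise SuicideTimeout(h.name)
    return healthy


# Hypothetical usage: grace=15 is an assumption, suicide_grace=150 is from the log.
h = Handle("OSD::osd_op_tp thread", grace=15, suicide_grace=150)
reset_timeout(h, now=0.0)
check(h, now=10.0)   # within grace: healthy
check(h, now=20.0)   # past grace: reported, but not fatal
```

In this model a plain timeout only marks the thread unhealthy, while passing the suicide deadline is fatal, which matches the point above that the timeout is a symptom of an op that never completed rather than the root cause.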
Hi
We have a pair of OSDs located in one node that have been crashing randomly since 14.2.2
OS version: CentOS 7.6
There are a ton of lines before the crash, which was unexpected:
--
3045> 2019-08-07 00:39:32.013 7fe9a4996700 1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fe987e49700' had timed
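With that many lines before the crash, it may help to pull out only the heartbeat_map entries. The following is a rough sketch that assumes each log entry sits on a single line (the mail client wrapped them here) and that the wording matches the lines quoted in this thread; HEARTBEAT_RE and find_timeouts are hypothetical names, not part of any Ceph tool.

```python
import re

# Pattern modelled on the heartbeat_map lines quoted in this thread.
HEARTBEAT_RE = re.compile(
    r"heartbeat_map (?P<func>\w+) '(?P<thread>[^']+)' "
    r"had (?P<suicide>suicide )?timed out after (?P<secs>\d+)"
)


def find_timeouts(lines):
    """Yield (function, thread, is_suicide, seconds) for each heartbeat timeout line."""
    for line in lines:
        m = HEARTBEAT_RE.search(line)
        if m:
            yield (m["func"], m["thread"], m["suicide"] is not None, int(m["secs"]))


# Sample taken from the log quoted earlier in the thread, joined onto one line.
sample = (
    "-63> 2019-08-07 00:51:52.861 7fe987e49700 1 heartbeat_map "
    "clear_timeout 'OSD::osd_op_tp thread 0x7fe987e49700' "
    "had suicide timed out after 150"
)

for func, thread, is_suicide, secs in find_timeouts([sample]):
    kind = "suicide timeout" if is_suicide else "timeout"
    print(f"{thread}: {kind} after {secs}s (reported by {func})")
```

Feeding it the OSD log file line by line should show which threads stalled first, and so where to start looking for the slow operation behind the timeouts.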