Re: [ceph-users] Cluster broken and OSDs crash with failed assertion in PGLog::merge_log

2018-10-09 Thread Jonas Jelten
Yay! I've managed to get the cluster up and running again. Deep scrub is still fixing inconsistencies. I had to do a depth-first search through the tree of startup errors. My procedure was the one already described: find and delete the PGs that trigger the assertion from the affected OSDs. I've created a script to a

Re: [ceph-users] Cluster broken and OSDs crash with failed assertion in PGLog::merge_log

2018-10-05 Thread Neha Ojha
Hi JJ, In this case, the condition olog.head >= log.tail is not true, therefore it crashes. Could you please open a tracker issue (https://tracker.ceph.com/) and attach the OSD logs and the pg dump output? Thanks, Neha On Thu, Oct 4, 2018 at 9:29 AM, Jonas Jelten wrote: > Hello! > > Unfortunately
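
To illustrate the condition mentioned in this reply, here is a minimal Python sketch. It is not Ceph's actual code: it assumes that log versions can be modeled as (epoch, version) pairs compared in that order, and the names Eversion and logs_overlap are hypothetical, chosen only for this example.

```python
from typing import NamedTuple


class Eversion(NamedTuple):
    """Hypothetical stand-in for a PG log version: (epoch, version)."""
    epoch: int
    version: int


def logs_overlap(olog_head: Eversion, log_tail: Eversion) -> bool:
    """Return True when the other log's head reaches at least our log's tail.

    This mirrors only the condition quoted in the thread
    (olog.head >= log.tail); tuples compare epoch first, then version.
    """
    return olog_head >= log_tail


# The other log ends after our log begins: the condition holds.
print(logs_overlap(Eversion(10, 5), Eversion(10, 3)))

# The other log ends before our log begins: the condition fails,
# which is the situation in which the assertion would fire.
print(logs_overlap(Eversion(9, 100), Eversion(10, 1)))
```

In this model, a failed check means the two logs share no common range of entries, so there is no overlapping point from which to merge them.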

[ceph-users] Cluster broken and OSDs crash with failed assertion in PGLog::merge_log

2018-10-04 Thread Jonas Jelten
Hello! Unfortunately, our single-node "cluster" with 11 OSDs is broken because some OSDs crash when they start peering. I'm on Ubuntu 18.04 with Ceph Mimic (13.2.2). The problem arose when RAM filled up and OSD processes then crashed because of memory allocation failures. No weird