Re: [ceph-users] Major ceph disaster

2019-05-24 Thread Kevin Flöh
Please excuse any typos. On Fri, May 24, 2019, 4:42 AM Kevin Flöh <kevin.fl...@kit.edu> wrote: Hi, we already tried "rados -p ec31 getxattr 10004dfce92.003d parent" but this is just hanging forever if we are looking for unfound objects. It works fine f...
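
For context, the "parent" xattr holds an encoded CephFS backtrace; a rough sketch of how it is usually decoded (pool and object name taken from the thread, output filename chosen for illustration):

    # dump the backtrace stored in the object's "parent" xattr to a file
    rados -p ec31 getxattr 10004dfce92.003d parent > parent.bin
    # decode it into readable JSON with ceph-dencoder (ships with Ceph)
    ceph-dencoder type inode_backtrace_t import parent.bin decode dump_json

As noted above, this only works when the object is actually readable; for unfound objects the rados call blocks.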

Re: [ceph-users] Major ceph disaster

2019-05-24 Thread Kevin Flöh
and found nothing. This also works for non-unfound objects. Is there another way to find the corresponding file? On 24.05.19 11:12 AM, Burkhard Linke wrote: Hi, On 5/24/19 9:48 AM, Kevin Flöh wrote: We got the object ids of the missing objects with "ceph pg 1.24c li..."
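
When the xattr route fails, another approach often suggested is to derive the file from the object name itself: the hex string before the dot is the file's inode number. A minimal sketch (the CephFS mount point /mnt/cephfs is an assumption):

    # convert the hex prefix of the object name to a decimal inode number
    ino=$(printf '%d' 0x10004dfce92)
    # search the mounted file system for that inode (this walks the whole tree)
    find /mnt/cephfs -inum "$ino"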

Re: [ceph-users] Major ceph disaster

2019-05-24 Thread Kevin Flöh
...those objects with: ceph pg 1.24c mark_unfound_lost revert. But first we would like to know which file(s) are affected. Is there a way to map the object id to the corresponding file? On 23.05.19 3:52 PM, Alexandre Marangone wrote: The PGs will stay active+recovery_wait+degraded until you so...
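
For reference, the relevant commands here (PG id taken from the thread) look roughly like this; note that, per the Ceph documentation, the revert option is not available for erasure-coded pools, so delete is normally the only choice on a pool like ec31:

    ceph pg 1.24c list_unfound                 # show the unfound objects first
    ceph pg 1.24c mark_unfound_lost revert     # roll back to an older copy (replicated pools)
    ceph pg 1.24c mark_unfound_lost delete     # give up on the objects entirely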

Re: [ceph-users] Major ceph disaster

2019-05-23 Thread Kevin Flöh
. If anything else happens, you should stop and let us know. -- dan On Thu, May 23, 2019 at 10:59 AM Kevin Flöh wrote: This is the current status of ceph: cluster: id: 23e72372-0d44-4cad-b24f-3641b14b86f4 health: HEALTH_ERR 9/125481144 objects unfound (0.000
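
To see which PGs are behind numbers like these, the usual drill-down (PG id illustrative, taken from earlier in the thread) is something like:

    ceph health detail | grep unfound    # which PGs report unfound objects
    ceph pg 1.24c query                  # peering / recovery state of one of them
    ceph pg 1.24c list_unfound           # names of the unfound objects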

Re: [ceph-users] Major ceph disaster

2019-05-23 Thread Kevin Flöh
ing another PG. On Thu, May 23, 2019 at 10:53 AM Kevin Flöh wrote: Hi, we have set the PGs to recover and now they are stuck in active+recovery_wait+degraded and instructing them to deep-scrub does not change anything. Hence, the rados report is empty. Is there a way to stop the recovery w
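
For completeness, commands commonly used to inspect or nudge stuck recovery look roughly like this (PG id illustrative); they will not help while unfound objects remain, which is exactly the situation described above:

    ceph pg dump_stuck unclean      # list PGs stuck in degraded/unclean states
    ceph pg force-recovery 1.24c    # raise recovery priority for one PG (Luminous and later)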

Re: [ceph-users] Major ceph disaster

2019-05-23 Thread Kevin Flöh
the recovery_wait might be caused by missing objects. Do we need to delete them first to get the recovery going? Kevin On 22.05.19 6:03 PM, Robert LeBlanc wrote: On Wed, May 22, 2019 at 4:31 AM Kevin Flöh <kevin.fl...@kit.edu> wrote: Hi, thank you, it worked. The PGs are not i...

Re: [ceph-users] Major ceph disaster

2019-05-22 Thread Kevin Flöh
to repair? Regards, Kevin On 21.05.19 4:52 PM, Wido den Hollander wrote: On 5/21/19 4:48 PM, Kevin Flöh wrote: Hi, we gave up on the incomplete PGs since we do not have enough complete shards to restore them. What is the procedure to get rid of these PGs? You need to start with markin...
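
The procedure hinted at here usually involves telling the cluster the dead OSDs are gone for good and, if a PG still cannot peer, recreating it empty. A sketch only, with OSD ids taken from elsewhere in the thread and a placeholder PG id; both commands discard data irrevocably:

    ceph osd lost 4 --yes-i-really-mean-it
    ceph osd lost 23 --yes-i-really-mean-it
    # last resort for a PG that can no longer peer at all: recreate it empty
    ceph osd force-create-pg <pgid> --yes-i-really-mean-it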

Re: [ceph-users] Major ceph disaster

2019-05-21 Thread Kevin Flöh
Hi, we gave up on the incomplete PGs since we do not have enough complete shards to restore them. What is the procedure to get rid of these PGs? regards, Kevin On 20.05.19 9:22 AM, Kevin Flöh wrote: Hi Frederic, we do not have access to the original OSDs. We exported the remaining...

Re: [ceph-users] Major ceph disaster

2019-05-20 Thread Kevin Flöh
then. Best, Kevin On 17.05.19 2:36 PM, Frédéric Nass wrote: On 14/05/2019 at 10:04, Kevin Flöh wrote: On 13.05.19 11:21 PM, Dan van der Ster wrote: Presumably the 2 OSDs you marked as lost were hosting those incomplete PGs? It would be useful to double confirm that: check with `ceph...
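
The check being suggested here can be done along these lines (a sketch; the PG id is a placeholder):

    ceph pg dump pgs_brief | grep incomplete   # incomplete PGs and their up/acting OSD sets
    ceph pg <pgid> query                       # look at "down_osds_we_would_probe" and the peering history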

Re: [ceph-users] Major ceph disaster

2019-05-17 Thread Kevin Flöh
...ceph pg {pg-id} mark_unfound_lost revert|delete Cheers, Kevin On 15.05.19 8:55 AM, Kevin Flöh wrote: The HDDs of OSDs 4 and 23 are completely lost, we cannot access them in any way. Is it possible to use the shards which are maybe stored on working OSDs as shown in the all_participants list? On 14.05.19...

Re: [ceph-users] Major ceph disaster

2019-05-15 Thread Kevin Flöh
ceph osd pool get ec31 min_size min_size: 3 On 15.05.19 9:09 AM, Konstantin Shalygin wrote: ceph osd pool get ec31 min_size

Re: [ceph-users] Major ceph disaster

2019-05-15 Thread Kevin Flöh
The HDDs of OSDs 4 and 23 are completely lost, we cannot access them in any way. Is it possible to use the shards which are maybe stored on working OSDs as shown in the all_participants list? On 14.05.19 5:24 PM, Dan van der Ster wrote: On Tue, May 14, 2019 at 5:13 PM Kevin Flöh wrote...
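
If surviving shards do exist on other drives, they are normally moved around with ceph-objectstore-tool while the OSDs involved are stopped. A rough sketch, with OSD ids and the EC shard suffix ("s0") purely illustrative:

    # export one EC shard of a PG from a stopped OSD
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-24 \
        --pgid 1.24cs0 --op export --file /root/1.24cs0.export
    # import it into another stopped OSD that should hold it
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 \
        --op import --file /root/1.24cs0.export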

Re: [ceph-users] Major ceph disaster

2019-05-15 Thread Kevin Flöh
Hi, since we have 3+1 EC I didn't try before. But when I run the command you suggested I get the following error: ceph osd pool set ec31 min_size 2 Error EINVAL: pool min_size must be between 3 and 4 On 14.05.19 6:18 PM, Konstantin Shalygin wrote: peering does not seem to be blocked...
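
That error is expected for this pool layout: with k=3 and m=1, Ceph enforces k <= min_size <= k+m, and a 3+1 pool needs 3 shards to reconstruct any object, so it can never serve I/O from only 2. The profile behind the pool can be checked with something like this (the profile sharing the pool's name is an assumption):

    ceph osd pool get ec31 erasure_code_profile
    ceph osd erasure-code-profile get ec31      # should show k=3 m=1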

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Kevin Flöh
quot;: "4(1),23(2),24(0)"     }     ]     }     ],     "probing_osds": [     "2(0)",     "4(1)",     "23(2)",     "24(0)",   
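
This fragment looks like output from a PG query; the fields of interest can be pulled out roughly as follows (PG id illustrative):

    ceph pg 1.24c query > pg_query.json
    # "probing_osds" and "down_osds_we_would_probe" show which OSDs peering still wants to reach
    grep -A10 '"probing_osds"' pg_query.json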

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Kevin Flöh
On 14.05.19 10:08 AM, Dan van der Ster wrote: On Tue, May 14, 2019 at 10:02 AM Kevin Flöh wrote: On 13.05.19 10:51 PM, Lionel Bouton wrote: On 13/05/2019 at 16:20, Kevin Flöh wrote: Dear ceph experts, [...] We have 4 nodes with 24 OSDs each and use 3+1 erasure coding. [...] Here...

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Kevin Flöh
...the old one and copy whatever is left. Best regards, Kevin On Mon, May 13, 2019 at 4:20 PM Kevin Flöh wrote: Dear ceph experts, we have several (maybe related) problems with our ceph cluster, let me first show you the current ceph status: cluster: id: 23e72372-0d44-4cad-b24f-36...

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Kevin Flöh
On 13.05.19 10:51 PM, Lionel Bouton wrote: On 13/05/2019 at 16:20, Kevin Flöh wrote: Dear ceph experts, [...] We have 4 nodes with 24 OSDs each and use 3+1 erasure coding. [...] Here is what happened: One OSD daemon could not be started and therefore we decided to mark the OSD as lost...
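
As background on why two lost OSDs are fatal here: a k=3, m=1 profile tolerates the loss of exactly one shard per PG, so any PG that had both dead OSDs in its acting set becomes incomplete. Such a pool would have been created with something like this (profile/pool names, PG count, and failure domain are assumptions):

    ceph osd erasure-code-profile set ec31 k=3 m=1 crush-failure-domain=host
    ceph osd pool create ec31 512 512 erasure ec31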

[ceph-users] Major ceph disaster

2019-05-13 Thread Kevin Flöh
Dear ceph experts, we have several (maybe related) problems with our ceph cluster, let me first show you the current ceph status:

  cluster:
    id: 23e72372-0d44-4cad-b24f-3641b14b86f4
    health: HEALTH_ERR
            1 MDSs report slow metadata IOs
            1 MDSs report slow ...
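
The MDS warnings are usually a symptom of the underlying RADOS problems rather than a separate issue; the standard first steps to expand a status like this are along these lines:

    ceph health detail      # expands every HEALTH_ERR line with the affected PGs/daemons
    ceph fs status          # shows the MDS ranks and the data/metadata pools behind CephFS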