Re: [ceph-users] Adding disks -> getting unfound objects [Luminous]

2018-01-23 Thread Nico Schottelius
Hey Burkhard, we did actually restart osd.61, which led to the current status. Best, Nico. Burkhard Linke writes: > On 01/23/2018 08:54 AM, Nico Schottelius wrote: >> Good morning, >> the osd.61 actually just crashed and the disk is still

Re: [ceph-users] Adding disks -> getting unfound objects [Luminous]

2018-01-23 Thread Burkhard Linke
Hi, On 01/23/2018 08:54 AM, Nico Schottelius wrote: Good morning, the osd.61 actually just crashed and the disk is still intact. However, after 8 hours of rebuilding, the unfound objects are still missing: *snipsnap* Is there any chance to recover those pgs or did we actually lose data

Re: [ceph-users] Adding disks -> getting unfound objects [Luminous]

2018-01-23 Thread Nico Schottelius
... while trying to locate which VMs are potentially affected by a revert/delete, we noticed that root@server1:~# rados -p one-hdd ls hangs. Where does ceph store the index of block devices found in a pool? And is it possible that this information is in one of the damaged pgs? Nico
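For RBD pools, the image index is itself a RADOS object (`rbd_directory`) stored inside the pool, so it can indeed land on a damaged PG. A sketch of commands that could check this, assuming a Luminous-era cluster and the `one-hdd` pool named in the message:

```shell
# List RBD image names; this reads the rbd_directory object rather
# than enumerating every object in the pool, so it may succeed even
# when a full "rados ls" hangs on a damaged PG.
rbd -p one-hdd ls

# Show which PG (and therefore which OSDs) the directory object maps to,
# to check whether it sits on one of the damaged PGs.
ceph osd map one-hdd rbd_directory
```

If `ceph osd map` reports one of the down/unfound PGs, that would explain both the hang and the inability to enumerate images.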

Re: [ceph-users] Adding disks -> getting unfound objects [Luminous]

2018-01-22 Thread Nico Schottelius
Good morning, the osd.61 actually just crashed and the disk is still intact. However, after 8 hours of rebuilding, the unfound objects are still missing: root@server1:~# ceph -s cluster: id: 26c0c5a8-d7ce-49ac-b5a7-bfd9d0ba81ab health: HEALTH_WARN noscrub,nodeep-scrub

Re: [ceph-users] Adding disks -> getting unfound objects [Luminous]

2018-01-22 Thread David Turner
Weight the remaining disks you added to 0.0. They seem to be a bad batch. This will start moving their data off them and back onto the rest of the cluster. I generally suggest not adding more storage than you can afford to lose, unless you trust your burn-in process. So if you have a
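The reweighting step David describes can be done per OSD with `ceph osd crush reweight`; a sketch, where `osd.65` is a placeholder id for one of the remaining suspect disks:

```shell
# Set the CRUSH weight of a suspect disk to zero; backfill then
# migrates its PGs onto the rest of the cluster.
ceph osd crush reweight osd.65 0.0

# Watch the resulting data movement and recovery progress.
ceph -w
```

Unlike `ceph osd out`, a CRUSH reweight to 0.0 keeps the OSD up and serving reads while its data drains, which is usually gentler during an ongoing recovery.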

Re: [ceph-users] Adding disks -> getting unfound objects [Luminous]

2018-01-22 Thread Nico Schottelius
While writing, yet another disk (osd.61 now) died and now we have 172 pgs down: [19:32:35] server2:~# ceph -s cluster: id: 26c0c5a8-d7ce-49ac-b5a7-bfd9d0ba81ab health: HEALTH_WARN noscrub,nodeep-scrub flag(s) set 21033/2263701 objects misplaced (0.929%)

Re: [ceph-users] Adding disks -> getting unfound objects [Luminous]

2018-01-22 Thread David Turner
I do remember seeing that exactly. As the number of recovery_wait pgs decreased, the number of unfound objects decreased until they were all found. Unfortunately it blocked some IO from happening during the recovery, but in the long run we ended up with full data integrity again. On Mon, Jan 22,

Re: [ceph-users] Adding disks -> getting unfound objects [Luminous]

2018-01-22 Thread Nico Schottelius
Hey David, thanks for the fast answer. All our pools are running with size=3, min_size=2 and the two disks were in 2 different hosts. What I am a bit worried about is the output of "ceph pg 4.fa query" (see below) that indicates that ceph already queried all other hosts and did not find the
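The per-PG query mentioned here can be paired with the unfound-object listing; a sketch using standard Luminous commands against the PG id from the message:

```shell
# Show peering state and which OSDs have been probed for missing objects.
ceph pg 4.fa query

# List the specific unfound objects in this PG.
ceph pg 4.fa list_missing

# Last resort only, once every candidate OSD is truly gone: revert
# unfound objects to their last known copy (or use "delete" to drop them).
# ceph pg 4.fa mark_unfound_lost revert
```

The `might_have_unfound` section of the query output shows whether any OSD remains to be probed before giving objects up as lost.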

Re: [ceph-users] Adding disks -> getting unfound objects [Luminous]

2018-01-22 Thread David Turner
I have had the same problem before with unfound objects that happened while backfilling after losing a drive. We didn't lose drives outside of the failure domains and ultimately didn't lose any data, but we did have to wait until after all of the PGs in recovery_wait state were caught up. So if
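Watching the recovery_wait backlog described here can be done with the standard health commands; a sketch:

```shell
# Per-PG detail, including counts of unfound objects and which
# PGs are still in recovery_wait.
ceph health detail

# PGs stuck in a non-clean state longer than the threshold.
ceph pg dump_stuck unclean
```

As recovery drains the recovery_wait queue, the unfound count in `ceph health detail` should shrink correspondingly, matching the behaviour David reports.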

[ceph-users] Adding disks -> getting unfound objects [Luminous]

2018-01-22 Thread Nico Schottelius
Hello, we added about 7 new disks yesterday/today and our cluster became very slow. While the rebalancing took place, 2 of the 7 new added disks died. Our cluster is still recovering, however we spotted that there are a lot of unfound objects. We lost osd.63 and osd.64, which seem not to be
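To see which PGs the two failed disks were holding, Luminous offers a per-OSD PG listing; a sketch using the OSD ids from the message (output may be empty if the OSDs have already been removed from the map):

```shell
# PGs that currently map to (or were last acting on) the failed OSDs.
ceph pg ls-by-osd 63
ceph pg ls-by-osd 64

# Cluster-wide summary of degraded/unfound counts.
ceph -s
```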