Hey Burkhard,
we did actually restart osd.61, which led to the current status.
Best,
Nico
Burkhard Linke writes:

> On 01/23/2018 08:54 AM, Nico Schottelius wrote:
>> Good morning,
>>
>> the osd.61 actually just crashed and the disk is still intact.
Hi,

On 01/23/2018 08:54 AM, Nico Schottelius wrote:
> Good morning,
>
> the osd.61 actually just crashed and the disk is still intact. However,
> after 8 hours of rebuilding, the unfound objects are still missing:

*snipsnap*

> Is there any chance to recover those pgs or did we actually lose data?
While trying to locate which VMs are potentially affected by a
revert/delete, we noticed that
root@server1:~# rados -p one-hdd ls
hangs. Where does ceph store the index of block devices found in a pool?
And is it possible that this information is in one of the damaged pgs?
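On the question of where the image index lives: for RBD pools the image names are kept in the omap of a single `rbd_directory` object, whereas `rados ls` enumerates every object by walking all PGs, so one down PG is enough to make it hang. A hedged sketch of how one might check this directly (pool name `one-hdd` taken from the thread):

```shell
# "rados ls" walks every PG in the pool; reading a single object only
# touches the PG that holds it.
rbd ls -p one-hdd                            # the usual way to list images
rados -p one-hdd listomapvals rbd_directory  # raw image-name omap

# If rbd_directory itself maps to a damaged PG, the commands above will
# hang as well; this shows which PG holds it:
ceph osd map one-hdd rbd_directory
```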
Nico
Good morning,

the osd.61 actually just crashed and the disk is still intact. However,
after 8 hours of rebuilding, the unfound objects are still missing:

root@server1:~# ceph -s
  cluster:
    id:     26c0c5a8-d7ce-49ac-b5a7-bfd9d0ba81ab
    health: HEALTH_WARN
            noscrub,nodeep-scrub flag(s) set
Weight the remaining disks you added down to 0.0; they seem to be a bad
batch. This will start moving their data off of them and back onto the
rest of the cluster. I generally suggest not adding more storage at once
than you can afford to lose, unless you trust your burn-in process. So if
you have a
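A hedged sketch of what that reweighting might look like; the ids of the remaining disks from the batch are not given in the thread, so `osd.65` is a placeholder:

```shell
# Drain a suspect disk: CRUSH weight 0 makes the cluster migrate its
# data back onto the other OSDs while the OSD stays up and can still
# serve reads. Repeat for each remaining disk of the bad batch.
ceph osd crush reweight osd.65 0.0

# For future expansions, new OSDs can join with zero weight and be
# ramped up in small steps instead of taking a full share at once,
# e.g. via ceph.conf on the new hosts:
#   [osd]
#   osd crush initial weight = 0
ceph osd crush reweight osd.65 0.2   # then raise gradually
```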
While writing, yet another disk (osd.61 now) died and now we have
172 pgs down:

[19:32:35] server2:~# ceph -s
  cluster:
    id:     26c0c5a8-d7ce-49ac-b5a7-bfd9d0ba81ab
    health: HEALTH_WARN
            noscrub,nodeep-scrub flag(s) set
            21033/2263701 objects misplaced (0.929%)
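For down pgs like these, the usual starting points look roughly like this (a sketch; `<pgid>` is a placeholder for an actual pg id from the output):

```shell
# Find out which PGs are down and why peering is blocked.
ceph health detail            # lists the affected PGs and OSDs
ceph pg dump_stuck inactive   # PGs that are not active (down/peering)
ceph pg <pgid> query          # "recovery_state" explains what blocks the
                              # PG, e.g. waiting on a down OSD
```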
I do remember seeing exactly that. As the number of pgs in recovery_wait
decreased, the number of unfound objects decreased until they were all
found. Unfortunately it blocked some IO from happening during the
recovery, but in the long run we ended up with full data integrity again.
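Since the thread already weighs a revert/delete, a hedged sketch of the commands involved (pg 4.fa is the one quoted elsewhere in the thread):

```shell
ceph health detail          # which PGs still have unfound objects
ceph pg 4.fa list_unfound   # enumerate the unfound objects in one PG
# Last resort only, once no surviving OSD can still supply the objects:
#   ceph pg 4.fa mark_unfound_lost revert  # roll back to a prior version
#   ceph pg 4.fa mark_unfound_lost delete  # give the objects up entirely
```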
On Mon, Jan 22,
Hey David,
thanks for the fast answer. All our pools are running with size=3,
min_size=2 and the two disks were in 2 different hosts.
What I am a bit worried about is the output of "ceph pg 4.fa query" (see
below) that indicates that ceph already queried all other hosts and did
not find the
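The part of the query output worth checking is sketched below; field names follow the standard `ceph pg query` output:

```shell
# "recovery_state" contains a "might_have_unfound" list, one entry per
# candidate OSD, each with a status:
#   "already probed"  - OSD was asked and does not have the objects
#   "osd is down"     - objects may still turn up if that OSD returns
ceph pg 4.fa query | grep -B2 -A6 might_have_unfound
```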
I have had the same problem before with unfound objects that happened while
backfilling after losing a drive. We didn't lose drives outside of the
failure domains and ultimately didn't lose any data, but we did have to
wait until after all of the PGs in recovery_wait state were caught up. So
if
Hello,

we added about 7 new disks yesterday/today, and our cluster became very
slow. While the rebalancing took place, 2 of the 7 newly added disks
died.

Our cluster is still recovering; however, we spotted that there are a lot
of unfound objects.

We lost osd.63 and osd.64, which seem not to be