The "out" OSD was "out" before the crash and doesn't hold any data as it
was weighted out prior.
Restarting the OSDs that 'ceph health detail' listed as repeat offenders
has cleared the problems.
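For anyone hitting this later, a rough sketch of that loop (the grep
pattern and the Upstart restart syntax are assumptions; on systemd hosts
use 'systemctl restart ceph-osd@<id>' instead):

    # count which OSDs appear most often in the stuck-pg acting sets
    ceph health detail | grep -oE 'acting \[[0-9,]+\]' \
        | grep -oE '[0-9]+' | sort | uniq -c | sort -rn | head
    # then, on the host carrying a repeat offender, e.g. osd.12:
    sudo restart ceph-osd id=12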
Thanks to all for the guidance and suffering my panic,
--
Eric
Hi,
Looks like one of your OSDs has been marked as out. Just make sure it's in, so
that you read '67 osds: 67 up, 67 in' rather than '67 osds: 67 up, 66 in' in
the 'ceph -s' output.
You can quickly check which one is not in with the 'ceph osd tree' command.
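A minimal sketch of that check and the fix (osd.12 is just a placeholder
for whichever OSD turns out to be the one that is out):

    ceph osd tree    # the out OSD shows reweight 0 in the last column
    ceph osd in 12   # mark osd.12 back in, if it should be holding data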
JC
> On Apr 12, 2016, at 11:21, Joao Eduardo Luis wrote:
On 04/12/2016 07:16 PM, Eric Hall wrote:
Removed mon on mon1, added mon on mon1 via ceph-deploy. mons now have
quorum.
I am left with:
cluster 5ee52b50-838e-44c4-be3c-fc596dc46f4e
health HEALTH_WARN 1086 pgs peering; 1086 pgs stuck inactive; 1086
pgs stuck unclean; pool vms has too few pgs
monmap e5: 3 mons at
{cephsecur
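For the archives, the ceph-deploy version of that mon swap is roughly the
following, run from the admin node (the hostname is an assumption):

    ceph-deploy mon destroy mon1
    ceph-deploy mon add mon1
    # confirm all three mons have rejoined quorum
    ceph quorum_status --format json-pretty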
On 04/12/2016 06:38 PM, Eric Hall wrote:
Ok, mon2 and mon3 are happy together, but mon1 dies with
mon/MonitorDBStore.h: 287: FAILED assert(0 == "failed to write to db")
I take this to mean mon1:store.db is corrupt as I see no permission issues.
So... remove mon1 and add a mon?
Nothing special to worry about re-adding a mon on mon1, o
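For completeness, a hedged sketch of the manual remove/re-add dance (paths
and temp-file names are assumptions following the standard docs; keep a
copy of the corrupt store rather than deleting it):

    ceph mon remove mon1
    mv /var/lib/ceph/mon/ceph-mon1 /root/mon1.store.corrupt
    ceph auth get mon. -o /tmp/mon.keyring
    ceph mon getmap -o /tmp/monmap
    mkdir /var/lib/ceph/mon/ceph-mon1
    ceph-mon -i mon1 --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
    # start ceph-mon on mon1 and let it sync from the surviving mons;
    # depending on the release you may also need
    #   ceph mon add mon1 <mon1-ip>:6789
    # before it is allowed back into the monmap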
On 4/12/16 9:53 AM, Joao Eduardo Luis wrote:
So this looks like the monitors didn't remove version 1, but this may
just be a red herring.
What matters, really, is the values in 'first_committed' and
'last_committed'. If either first or last_committed happens to be '1',
then there may be a bug s
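A quick way to inspect those values with the mon stopped (the store path
and the hex-dump output format are assumptions; decode with care):

    ceph-kvstore-tool /var/lib/ceph/mon/ceph-mon1/store.db \
        get osdmap first_committed
    ceph-kvstore-tool /var/lib/ceph/mon/ceph-mon1/store.db \
        get osdmap last_committed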
On 4/12/16 9:02 AM, Gregory Farnum wrote:
On Tue, Apr 12, 2016 at 4:41 AM, Eric Hall wrote:
On 4/12/16 12:01 AM, Gregory Farnum wrote:
Exactly what values are you reading that are giving you those numbers?
The "real" OSDMap epoch is going to be at least 38630...if you're very
lucky it will be exa
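If it comes to rewriting the stored epoch, the IRC log linked below sketches
the kind of surgery involved. A heavily hedged version (stop the mon and back
up store.db first; treat it as a last resort, and note that 'set ... ver'
support in ceph-kvstore-tool and the epoch value are assumptions):

    cp -a /var/lib/ceph/mon/ceph-mon1/store.db /root/store.db.bak
    # bump last_committed to at least the newest epoch the OSDs hold
    ceph-kvstore-tool /var/lib/ceph/mon/ceph-mon1/store.db \
        set osdmap last_committed ver 38630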
On Mon, Apr 11, 2016 at 3:45 PM, Eric Hall wrote:
A power failure in the data center has left 3 mons unable to start with
mon/OSDMonitor.cc: 125: FAILED assert(version >= osdmap.epoch)
Have found a similar problem discussed at
http://irclogs.ceph.widodh.nl/index.php?date=2015-05-29, but am unsure
how to proceed.
If I read
ceph-kvstore-tool /var/lib
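For anyone retracing this, the mon's stored osdmap keys can be listed with
the daemon stopped (the store path is an assumption based on the default
layout):

    ceph-kvstore-tool /var/lib/ceph/mon/ceph-mon1/store.db list osdmap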