Re: [ceph-users] Flapping OSDs

2017-04-07 Thread Vlad Blando
>>>> I am curious if those OSDs are flapping all at once? If a single host is affected I would consider the network connectivity (bottlenecks and misconfigured bonds can generate strange situations), storage co

Re: [ceph-users] Flapping OSDs

2017-04-06 Thread Vlad Blando
>>> would consider the network connectivity (bottlenecks and misconfigured bonds can generate strange situations), storage controller and firmware.
>>>
>>> Cheers,
>>> Maxime

Re: [ceph-users] Flapping OSDs

2017-04-03 Thread Brian :
>> e
>>
>> From: ceph-users on behalf of Vlad Blando
>> Date: Sunday 2 April 2017 16:28
>> To: ceph-users
>> Subject: [ceph-users] Flapping OSDs
>>
>> Hi,
>>
>> One of my c

Re: [ceph-users] Flapping OSDs

2017-04-03 Thread Vlad Blando
> ituations), storage controller and firmware.
>
> Cheers,
> Maxime
>
> From: ceph-users on behalf of Vlad Blando
> Date: Sunday 2 April 2017 16:28
> To: ceph-users
> Subject: [ceph-users] Flapping OSDs
>
> Hi,

Re: [ceph-users] Flapping OSDs

2017-04-02 Thread Maxime Guyot
Blando
Date: Sunday 2 April 2017 16:28
To: ceph-users
Subject: [ceph-users] Flapping OSDs
Hi,
One of my ceph nodes has flapping OSDs. The network between nodes is fine; it's on a 10GBase-T network. I don't see anything wrong with the network, but these OSDs are going up/down.
[root@ava

[ceph-users] Flapping OSDs

2017-04-02 Thread Vlad Blando
Hi,
One of my ceph nodes has flapping OSDs. The network between nodes is fine; it's on a 10GBase-T network. I don't see anything wrong with the network, but these OSDs are going up/down.
[root@avatar0-ceph4 ~]# ceph osd tree
# id weight type name up/down reweight
-1 174.7 root defa

Re: [ceph-users] Flapping OSDs, Large meta directories in OSDs

2015-12-03 Thread Tom Christensen
> >> set in some of the test software…
> >>
> >> Paul
> >>
> >> From: ceph-users on behalf of Tom Christensen
> >> Date: Monday, 30 November 2015 at 23:20
> >> To: "ceph-users@lists.ceph.com"
> >> Subject:

Re: [ceph-users] Flapping OSDs, Large meta directories in OSDs

2015-12-02 Thread Gregory Farnum
>> in Ceph – can anybody confirm this?
>> I could not find any usage in the Ceph source code except that the value is set in some of the test software…
>>
>> Paul
>>
>> From: ceph-users on behalf of Tom Christensen
>> Date: Mond

Re: [ceph-users] Flapping OSDs, Large meta directories in OSDs

2015-12-01 Thread Tom Christensen
>> xattr use omap’ is no longer used in Ceph – can anybody confirm this?
>> I could not find any usage in the Ceph source code except that the value is set in some of the test software…
>>
>> Paul
>>
>> From: ceph-users on behalf of Tom

Re: [ceph-users] Flapping OSDs, Large meta directories in OSDs

2015-12-01 Thread Tom Christensen
> h source code except that the value is set in some of the test software…
>
> Paul
>
> From: ceph-users on behalf of Tom Christensen
> Date: Monday, 30 November 2015 at 23:20
> To: "ceph-users@lists.ceph.com"
> Subject: Re: [ceph-users] Flapping OSDs

Re: [ceph-users] Flapping OSDs, Large meta directories in OSDs

2015-12-01 Thread HEWLETT, Paul (Paul)
of Tom Christensen
Date: Monday, 30 November 2015 at 23:20
To: "ceph-users@lists.ceph.com"
Subject: Re: [ceph-users] Flapping OSDs, Large meta directories in OSDs
What counts as

Re: [ceph-users] Flapping OSDs, Large meta directories in OSDs

2015-12-01 Thread Dan van der Ster
On Tue, Dec 1, 2015 at 12:20 AM, Tom Christensen wrote:
> What counts as ancient? Concurrent with our Hammer upgrade we went from 3.16->3.19 on Ubuntu 14.04. We are looking to revert to the 3.16 kernel we'd been running because we're also seeing an intermittent (it's happened twice in 2 weeks

Re: [ceph-users] Flapping OSDs, Large meta directories in OSDs

2015-11-30 Thread Tom Christensen
What counts as ancient? Concurrent with our Hammer upgrade we went from 3.16->3.19 on Ubuntu 14.04. We are looking to revert to the 3.16 kernel we'd been running because we're also seeing an intermittent (it's happened twice in 2 weeks) massive load spike that completely hangs the osd node (we're ta

Re: [ceph-users] Flapping OSDs, Large meta directories in OSDs

2015-11-30 Thread Dan van der Ster
The trick with debugging heartbeat errors is to grep back through the log to find the last thing the affected thread was doing, e.g. is 0x7f5affe72700 stuck in messaging, writing to the disk, reading through the omap, etc. I agree this doesn't look to be network related, but if you want to rule i
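A minimal Python sketch of that "grep back through the log" step. It assumes the OSD log puts the logging thread's id (hex, without the 0x prefix) in the third whitespace-separated field of each line; check that against your own logs before relying on it. The function name and the sample lines are invented for illustration and are not taken from the thread.

```python
def last_lines_for_thread(log_lines, thread_id, before, count=3):
    """Return up to `count` lines logged by `thread_id` before index `before`, oldest first."""
    tid = thread_id.lower()
    if tid.startswith("0x"):
        tid = tid[2:]
    hits = []
    # Walk backwards from the heartbeat warning so the most recent activity is found first.
    for line in reversed(log_lines[:before]):
        fields = line.split()
        # Assumption: fields are "<date> <time> <thread id> <level> <message...>".
        if len(fields) > 2 and fields[2].lower() == tid:
            hits.append(line)
            if len(hits) == count:
                break
    hits.reverse()
    return hits


# Made-up sample lines, only to show the shape of the input.
SAMPLE_LOG = [
    "2015-11-30 12:00:01.000000 7f5affe72700 10 example: started omap read",
    "2015-11-30 12:00:01.500000 7f5b01234700 10 example: unrelated thread",
    "2015-11-30 12:00:02.000000 7f5affe72700 10 example: still reading omap",
    "2015-11-30 12:00:20.000000 7f5b0aaaa700  1 heartbeat_map is_healthy "
    "'OSD::osd_op_tp thread 0x7f5affe72700' had timed out after 15",
]

if __name__ == "__main__":
    # Index 3 is the heartbeat warning; look at what the named thread logged before it.
    for line in last_lines_for_thread(SAMPLE_LOG, "0x7f5affe72700", before=3):
        print(line)
```

On a real node the same idea is a plain grep for the thread id in the OSD's log file, reading the matches just above the timeout message.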

Re: [ceph-users] Flapping OSDs, Large meta directories in OSDs

2015-11-30 Thread Tom Christensen
No, CPU and memory look normal. We haven't been fast/lucky enough with iostat to see if we're just slamming the disk itself; I continue to attempt to catch one, get logged into the node, find the disk and get iostat running before the OSD comes back up. We haven't flapped that many OSDs, and most

Re: [ceph-users] Flapping OSDs, Large meta directories in OSDs

2015-11-30 Thread Wido den Hollander
On 11/30/2015 08:56 PM, Tom Christensen wrote:
> We recently upgraded to 0.94.3 from Firefly and now for the last week have had intermittent slow requests and flapping OSDs. We have been unable to nail down the cause, but it's feeling like it may be related to our osdmaps not getting deleted

[ceph-users] Flapping OSDs, Large meta directories in OSDs

2015-11-30 Thread Tom Christensen
We recently upgraded to 0.94.3 from Firefly and now for the last week have had intermittent slow requests and flapping OSDs. We have been unable to nail down the cause, but it's feeling like it may be related to our osdmaps not getting deleted properly. Most of our OSDs are now storing over 100GB

Re: [ceph-users] Flapping OSDs. Safe to upgrade?

2014-05-14 Thread Craig Lewis
> Anything in dmesg?
Just
[188924.137100] init: ceph-osd (ceph/6) main process (8262) killed by ABRT signal
[188924.137138] init: ceph-osd (ceph/6) main process ended, respawning
> When you say restart, do you mean a physical restart, or just restarting the daemon? If it takes a physical re

Re: [ceph-users] Flapping OSDs. Safe to upgrade?

2014-05-14 Thread Brian Rak
Anything in dmesg? When you say restart, do you mean a physical restart, or just restarting the daemon? If it takes a physical restart and you're using Intel NICs, it might be worth upgrading network drivers. Old versions have some bugs that cause them to just drop traffic.
On 5/14/2014 9:0

[ceph-users] Flapping OSDs. Safe to upgrade?

2014-05-14 Thread Craig Lewis
I have 4 OSDs that won't stay in the cluster. I restart them, they join for a bit, then get kicked out because they stop responding to pings from the other OSDs. I don't know what the issue is. The disks look fine. SMART reports no errors or reallocated sectors. iostat says the disks are n