Re: osd down (for 2 about 2 minutes) error after adding a new host to my cluster

2013-02-15 Thread Sam Lang
On Mon, Feb 11, 2013 at 7:39 PM, Isaac Otsiabah zmoo...@yahoo.com wrote: Yes, there were osd daemons running on the same node that the monitor was running on. If that is the case then I will run a test case with the monitor running on a different node where no osd is running and see what

Re: osd down (for 2 about 2 minutes) error after adding a new host to my cluster

2013-02-15 Thread Isaac Otsiabah
Hello Sam and Gregory, I got machines today and tested with the monitor process running on a separate system with no osd daemons, and I did not see the problem. On Monday I will run a few tests to confirm. Isaac - Original Message - From: Sam Lang sam.l...@inktank.com To: Isaac

Re: osd down (for 2 about 2 minutes) error after adding a new host to my cluster

2013-02-11 Thread Gregory Farnum
Isaac, I'm sorry I haven't been able to wrangle any time to look into this more yet, but Sage pointed out in a related thread that there might be some buggy handling of things like this if the OSD and the monitor are located on the same host. Am I correct in assuming that with your small cluster,

Re: osd down (for 2 about 2 minutes) error after adding a new host to my cluster

2013-02-11 Thread Isaac Otsiabah
Yes, there were osd daemons running on the same node that the monitor was running on. If that is the case then I will run a test case with the monitor running on a different node where no osd is running and see what happens. Thank you. Isaac From: Gregory

Re: osd down (for 2 about 2 minutes) error after adding a new host to my cluster

2013-01-28 Thread Isaac Otsiabah
Gregory, I recreated the osd down problem again this morning on two nodes (g13ct, g14ct). First, I created a 1-node cluster on g13ct (with osd.0, 1, 2) and then added host g14ct (osd.3, 4, 5). osd.1 went down for about a minute and a half after osd.3, 4, 5 were added. I have included the

Re: osd down (for 2 about 2 minutes) error after adding a new host to my cluster

2013-01-25 Thread Isaac Otsiabah
Gregory, the physical network layout is simple: the two networks are separate. The 192.168.0 and the 192.168.1 are not subnets within a network. Isaac - Original Message - From: Gregory Farnum g...@inktank.com To: Isaac Otsiabah zmoo...@yahoo.com Cc: ceph-devel@vger.kernel.org

Re: osd down (for 2 about 2 minutes) error after adding a new host to my cluster

2013-01-25 Thread Sam Lang
On Fri, Jan 25, 2013 at 11:51 AM, Isaac Otsiabah zmoo...@yahoo.com wrote: Gregory, the network physical layout is simple, the two networks are separate. the 192.168.0 and the 192.168.1 are not subnets within a network. Hi Isaac, Could you send us your routing tables on the osds (route -n).
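As a rough illustration of the "separate networks" statement quoted above, the following sketch checks whether two address ranges overlap. The /24 masks are an assumption (the thread gives only the 192.168.0 and 192.168.1 prefixes), and the helper name `are_separate` is hypothetical; the actual routing tables from `route -n` would still be needed to confirm how traffic flows between the hosts.

```python
import ipaddress

# Assumption: both networks use a /24 mask. The thread only names the
# prefixes 192.168.0 and 192.168.1, not the netmasks.
net_a = ipaddress.ip_network("192.168.0.0/24")
net_b = ipaddress.ip_network("192.168.1.0/24")


def are_separate(a, b):
    """Return True when the two networks share no addresses."""
    return not a.overlaps(b)


# Hypothetical counter-case: a wider /23 mask would make 192.168.1.x
# part of the same network as 192.168.0.x.
wide = ipaddress.ip_network("192.168.0.0/23")

print("assumed /24 networks separate:", are_separate(net_a, net_b))
print("/23 vs /24 separate:", are_separate(wide, net_b))
```

If the real netmasks turn out to be wider than /24, the two ranges could belong to one network after all, which is one reason the routing tables are worth inspecting.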

Re: osd down (for 2 about 2 minutes) error after adding a new host to my cluster

2013-01-24 Thread Isaac Otsiabah
Gregory, I tried to send the attached debug output several times and the mail server rejected them all, probably because of the file size, so I cut the log file size down and it is attached. You will see the reconnection failures by the error message line below. The ceph version is 0.56 it

Re: osd down (for 2 about 2 minutes) error after adding a new host to my cluster

2013-01-24 Thread Gregory Farnum
What's the physical layout of your networking? This additional log may prove helpful as well, but I really need a bit more context in evaluating the messages I see from the first one. :) -Greg On Thursday, January 24, 2013 at 9:24 AM, Isaac Otsiabah wrote: Gregory, I tried to send the

Re: osd down (for 2 about 2 minutes) error after adding a new host to my cluster

2013-01-07 Thread Gregory Farnum
On Monday, January 7, 2013 at 1:00 PM, Isaac Otsiabah wrote: When I add a new host (with osds) to my existing cluster, 1 or 2 of the existing osds go down for about 2 minutes and then come back up. [root@h1ct ~]# ceph osd tree # id weight type name up/down reweight -1 3 root