Re: [Gluster-users] 1/4 glusterfsd's runs amok; performance suffers;

2012-08-12 Thread Vijay Bellur
On 08/11/2012 10:16 PM, Harry Mangalam wrote: On Sat, Aug 11, 2012 at 9:41 AM, Brian Candler b.cand...@pobox.com Maybe worth trying an strace (strace -f -p <pid> 2>strace.out) on the glusterfsd process, or whatever it is which is causing the high load, during such a burst,
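A minimal sketch of that capture, assuming the busy PID is found with pgrep (strace writes its trace to stderr, hence the 2> redirection; adjust if several brick daemons run on the host):

  pid=$(pgrep -o glusterfsd)        # oldest glusterfsd; if several bricks run here, pick the busy PID from top instead
  timeout 30 strace -f -tt -p "$pid" 2> strace.out    # -f follows threads, -tt timestamps each syscall
  strace -c -f -p "$pid" 2> strace-summary.out        # -c tallies time per syscall; detach with Ctrl-C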

Re: [Gluster-users] 1/4 glusterfsd's runs amok; performance suffers;

2012-08-11 Thread Brian Candler
On Sat, Aug 11, 2012 at 12:11:39PM +0100, Nux! wrote: On 10.08.2012 22:16, Harry Mangalam wrote: pbs3:/dev/md127 8.2T 5.9T 2.3T 73% /bducgl --- Harry, The name of that md device (127) indicates there may be something dodgy going on there. A device shouldn't be named 127 unless some
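A quick way to check what is behind the 127 number (md127 is typically the kernel's fallback when an array is assembled without a matching mdadm.conf entry or with a foreign homehost); /dev/sdb1 below is a stand-in for an actual member device:

  cat /proc/mdstat                            # kernel's view of every md array and its members
  mdadm --detail /dev/md127                   # array state, RAID level, and member list
  mdadm --examine /dev/sdb1 | grep -i name    # the Name field embeds the homehost the array was created on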

Re: [Gluster-users] 1/4 glusterfsd's runs amok; performance suffers;

2012-08-11 Thread Harry Mangalam
Thanks for your comments. I use mdadm on many servers and I've seen md numbering like this a fair bit. Usually it occurs after another RAID has been created and the numbering shifts. Neil Brown (mdadm's author) seems to think it's fine. So I don't think that's the problem. And you're right
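If the shifting number ever becomes a nuisance, it can be pinned; a sketch assuming a Debian-style layout (the config path and initramfs command differ by distro):

  mdadm --detail --scan >> /etc/mdadm/mdadm.conf   # record ARRAY lines keyed by UUID
  update-initramfs -u                              # rebuild the initramfs so boot-time assembly uses them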

Re: [Gluster-users] 1/4 glusterfsd's runs amok; performance suffers;

2012-08-11 Thread Brian Candler
On Sat, Aug 11, 2012 at 08:31:51AM -0700, Harry Mangalam wrote: Re the size difference, I'll explicitly rebalance the brick after the fix-layout finishes, but I'm even more worried about this fantastic increase in CPU usage and its effect on user performance. This presumably means you
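For reference, the two-phase sequence being described maps onto the 3.3 CLI roughly as follows; 'bducgl' is a stand-in for the real volume name:

  gluster volume rebalance bducgl fix-layout start   # phase 1: recompute directory hash ranges only
  gluster volume rebalance bducgl status             # poll until fix-layout reports completed
  gluster volume rebalance bducgl start              # phase 2: actually migrate files between bricks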

Re: [Gluster-users] 1/4 glusterfsd's runs amok; performance suffers;

2012-08-11 Thread Harry Mangalam
On Sat, Aug 11, 2012 at 9:41 AM, Brian Candler b.cand...@pobox.com wrote: On Sat, Aug 11, 2012 at 08:31:51AM -0700, Harry Mangalam wrote: Re the size difference, I'll explicitly rebalance the brick after the fix-layout finishes, but I'm even more worried about this fantastic
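One way to see what the hot brick is actually doing while the CPU spikes is per-brick op statistics; again, 'bducgl' is a placeholder volume name:

  gluster volume profile bducgl start   # enable io-stats collection on every brick
  gluster volume profile bducgl info    # dump cumulative and interval latency/op counts
  gluster volume profile bducgl stop    # disable afterwards; profiling adds some overhead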

Re: [Gluster-users] 1/4 glusterfsd's runs amok; performance suffers;

2012-08-11 Thread Joe Julian
Check your client logs. I have seen that with network issues causing disconnects. Harry Mangalam hjmanga...@gmail.com wrote: Thanks for your comments. I use mdadm on many servers and I've seen md numbering like this a fair bit. Usually it occurs after another RAID has been created and the
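A sketch of the log check being suggested; the path is the usual default for a FUSE client (logs are named after the mount point) but is an assumption here:

  # e.g. /var/log/glusterfs/bducgl.log for a volume mounted at /bducgl
  grep -iE 'disconnect|ping.timeout|connection.*refused' /var/log/glusterfs/*.log | tail -50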

[Gluster-users] 1/4 glusterfsd's runs amok; performance suffers;

2012-08-10 Thread Harry Mangalam
Running 3.3 distributed on IPoIB on 4 nodes, 1 brick per node. Any idea why, on one of those nodes, glusterfsd would go berserk, running up to 370% CPU and driving load to 30 (file performance on the clients slows to a crawl)? While very slow, it continued to serve out files. This is the second
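To pin down which daemon and which threads are burning CPU during such a burst, something like the following works (pidstat comes from the sysstat package; that it is installed is an assumption):

  top -H -p "$(pgrep -o glusterfsd)"            # per-thread CPU usage for the brick daemon
  pidstat -u -t -p "$(pgrep -o glusterfsd)" 1   # 1-second user/system time breakdown per thread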