Re: [Ganglia-developers] reorganizing clusters

2006-03-23 Thread Alex Balk
A setup which "solves" the update issue while maintaining a level of HA is to have 2 (or more) unicast send channels from each node to a pair (or more) of gmond aggregators and to have a multicast channel setup between the aggregators themselves. The cost is more network traffic, but it's pretty i

Re: [Ganglia-developers] reorganizing clusters

2006-03-23 Thread Chuck Simmons
Thanks for the clarifications. (1) When a node receives a broadcast from another node that it hasn't seen before, it may want to send its data back to the first node. If I start node A and it broadcasts to an empty cluster, then I start node B and it broadcasts to A, then it might be

Re: [Ganglia-developers] reorganizing clusters

2006-03-23 Thread Jason A. Smith
On Thu, 2006-03-23 at 15:47 -0800, Chuck Simmons wrote:
> Alex --
>
> Thanks for the details. Telnetting to a gmond XML port to dump
> internal state is a nice debugging technique.
>
> One of my problems is that I'm running a secondary daemon using the
> gmetric subroutine libraries, and it took

Re: [Ganglia-developers] reorganizing clusters

2006-03-23 Thread Chuck Simmons
Alex -- Thanks for the details. Telnetting to a gmond XML port to dump internal state is a nice debugging technique. One of my problems is that I'm running a secondary daemon using the gmetric subroutine libraries, and it took me a while to realize that daemon is in some ways equivalent to 'g
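The debugging technique mentioned here, connecting to a gmond XML port to dump its internal state, can be sketched in Python as an alternative to telnet. This is a minimal sketch under stated assumptions: gmond is accepting TCP connections on port 8649 (the usual default; check the tcp_accept_channel section of your gmond.conf), and it writes a single XML document and then closes the connection. The function names `dump_gmond_xml` and `list_hosts` are hypothetical, chosen for illustration only.

```python
import socket
import xml.etree.ElementTree as ET


def dump_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Connect to a gmond XML port and return everything it sends.

    Assumes the daemon writes its whole XML state and then closes
    the connection, so we read until recv() returns no more data.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # peer closed the connection: dump is complete
                break
            chunks.append(data)
    return b"".join(chunks)


def list_hosts(xml_bytes):
    """Return (NAME, REPORTED) pairs for every HOST element in the dump.

    REPORTED is the timestamp string gmond attaches to each host; a stale
    value is a hint that the host has not been heard from recently.
    """
    root = ET.fromstring(xml_bytes)
    return [(h.get("NAME"), h.get("REPORTED")) for h in root.iter("HOST")]


if __name__ == "__main__":
    for name, reported in list_hosts(dump_gmond_xml()):
        print(name, reported)
```

Running it against each gmond in turn and comparing the host lists is one way to see which daemons have not yet heard from a recently restarted node.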

Re: [Ganglia-developers] reorganizing clusters

2006-03-23 Thread Alex Balk
Hi Chuck, See below...
Chuck Simmons wrote:
> The number of CPUs does get sorted out, but I don't believe that
> restarting 'gmond' is a solution. The problem occurs after restarting
> a number of 'gmond' processes, and the problem is caused because
> 'gmond' is not reporting the information

Re: [Ganglia-developers] reorganizing clusters

2006-03-23 Thread Chuck Simmons
The number of CPUs does get sorted out, but I don't believe that restarting 'gmond' is a solution. The problem occurs after restarting a number of 'gmond' processes, and the problem is caused because 'gmond' is not reporting the information. Does 'gmond' maintain a timestamp on disk as to whe

[Ganglia-developers] Re: [torqueusers] ANNOUNCE: Public release of Ganglia Job Monarch v0.1.0

2006-03-23 Thread Ramon Bastiaans
Hi Bernard, Thanks for the suggestion. Yes, I agree, this is on my wish/todo list, along with clickable nodes in the clusterimage. There are some caveats with implementing this, since the clusterimage I draw uses live data from the XML stream and the HTML tooltips over the image would have to u