RE: [Ganglia-general] Sending alert

2004-03-02 Thread Jason A. Smith
Instead of cluttering the main ganglia program with this, it might be better to write a separate application that periodically polls a gmond/gmetad and parses the XML data to look for potential problems. In my opinion, this is the proper place since it would act more like a client tool that uses
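The separate polling client suggested above could start from something like the following minimal sketch. It assumes gmond's default TCP XML port 8649 (gmetad normally serves XML on 8651), the usual GANGLIA_XML / CLUSTER / HOST / METRIC element layout with NAME and VAL attributes, and a hypothetical per-metric threshold table. The function names are illustrative only and are not part of Ganglia.

```python
import socket
import xml.etree.ElementTree as ET

# Assumption: gmond's default TCP port for its XML dump (gmetad typically uses 8651).
GMOND_PORT = 8649


def fetch_xml(host, port=GMOND_PORT, timeout=5.0):
    """Connect to a gmond/gmetad and read the XML dump until the peer closes."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # the daemon closes the connection after sending its XML
                break
            chunks.append(data)
    return b"".join(chunks)


def find_problems(xml_bytes, thresholds):
    """Return (host, metric, value, limit) tuples for every metric above its limit.

    `thresholds` is a hypothetical dict mapping metric NAME -> maximum allowed value.
    """
    root = ET.fromstring(xml_bytes)
    problems = []
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            name = metric.get("NAME")
            if name not in thresholds:
                continue
            try:
                value = float(metric.get("VAL"))
            except (TypeError, ValueError):
                continue  # string-valued metrics (e.g. os_name) cannot be compared
            if value > thresholds[name]:
                problems.append((host.get("NAME"), name, value, thresholds[name]))
    return problems
```

A periodic poller would simply call `find_problems(fetch_xml("somehost"), {"load_one": 4.0})` inside a loop with `time.sleep`, sending an alert whenever the returned list is non-empty; the host name and threshold here are placeholders.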

Re: [Ganglia-general] Sending alert

2004-03-02 Thread Daniel Rich
On Mar 2, 2004, at 8:15 AM, Jason A. Smith wrote: Instead of cluttering the main ganglia program with this, it might be better to write a separate application that periodically polls a gmond/gmetad and parses the XML data to look for potential problems. In my opinion, this is the proper place

Re: [Ganglia-general] Sending alert

2004-03-02 Thread Leif Nixon
Daniel Rich [EMAIL PROTECTED] writes: My personal choice would be a Nagios plugin that could return both host and/or cluster status. I have it on my plate to write one, if it ever makes it higher than the mass of other things on my plate these days. Sadly, Nagios doesn't play well with
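A plugin that reports both host and cluster status would ultimately have to express its result as a standard Nagios plugin exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) plus a one-line status message. The sketch below shows one possible mapping; the warn/crit thresholds, the "cluster = worst host" rule, and the severity ranking that places UNKNOWN between OK and WARNING are hypothetical design choices, not anything specified in the thread.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Hypothetical ranking used to pick the "worst" state, least to most severe.
SEVERITY = [OK, UNKNOWN, WARNING, CRITICAL]


def classify(value, warn, crit):
    """Map a single metric value onto a Nagios state; None means no data."""
    if value is None:
        return UNKNOWN
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def cluster_state(host_values, warn, crit):
    """Cluster status is the most severe state among its hosts."""
    states = [classify(v, warn, crit) for v in host_values.values()]
    if not states:
        return UNKNOWN  # an empty cluster gives us nothing to judge
    return max(states, key=SEVERITY.index)


def report(name, state, detail):
    """Print the one-line status Nagios displays and return the exit code."""
    print(f"{LABELS[state]} - {name}: {detail}")
    return state
```

A real plugin would end with `sys.exit(report(...))` so that Nagios sees the exit code.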

Re: [Ganglia-general] Sending alert

2004-03-02 Thread canon
I'm not sure I agree with your conclusion. Why couldn't the Nagios plugin connect to a gmond on one of the nodes in each cluster and parse the XML? It should also have a timeout to go to another node, if the primary happens to be down. I see ganglia's purpose as to collect performance
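The "timeout, then try another node" behaviour proposed here can be sketched as a loop over candidate gmond hosts. This assumes the default gmond XML port 8649; the host list and function names are hypothetical, and `fetch` is injectable so the failover logic can be exercised without a live cluster.

```python
import socket


def fetch_xml(host, port=8649, timeout=5.0):
    """Read gmond's XML dump; raises OSError (including timeouts) on failure."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def fetch_with_failover(hosts, fetch=fetch_xml, timeout=5.0):
    """Try each candidate node in order; return (host, xml) from the first that answers."""
    errors = []
    for host in hosts:
        try:
            return host, fetch(host, timeout=timeout)
        except OSError as exc:  # covers refused connections and socket timeouts
            errors.append(f"{host}: {exc}")
    raise ConnectionError("no gmond reachable: " + "; ".join(errors))
```

Because every gmond in a multicast cluster holds the state of the whole cluster, any node that answers is equally good, which is what makes this simple ordered failover workable.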

Re: [Ganglia-general] Sending alert

2004-03-02 Thread Leif Nixon
[EMAIL PROTECTED] writes: I'm not sure I agree with your conclusion. Why couldn't the Nagios plugin connect to a gmond on one of the nodes in each cluster and parse the XML? It should also have a timeout to go to another node, if the primary happens to be down. A Nagios plugin can only

Re: [Ganglia-general] Sending alert

2004-03-02 Thread canon
This is similar to the status checks I do for our batch scheduler system. After I grab the status from the batch system, I cache it. That way it doesn't have to redo the query. I was assuming that the normal host ping (done for the host check) would still be done by nagios. I guess your goal
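The cache-then-reuse approach described here, where the status is queried once and later checks read the cached copy instead of redoing the query, might look roughly like this. The cache path, the `max_age` window, and the `fetch` callable are hypothetical; the write goes to a temporary file and is then swapped in with `os.replace`, so a concurrent check never reads a half-written cache.

```python
import os
import time


def cached_fetch(cache_path, max_age, fetch):
    """Return cached bytes if younger than max_age seconds, else refetch and store."""
    try:
        age = time.time() - os.path.getmtime(cache_path)
        if age < max_age:
            with open(cache_path, "rb") as f:
                return f.read()
    except OSError:
        pass  # no cache yet (or unreadable): fall through to a fresh query
    data = fetch()
    tmp = f"{cache_path}.{os.getpid()}.tmp"  # per-process temp name avoids writer collisions
    with open(tmp, "wb") as f:
        f.write(data)
    os.replace(tmp, cache_path)  # atomic rename on the same filesystem
    return data
```

With this in place, one expensive query (of the batch system or a gmond) can serve many per-host service checks within the same `max_age` window.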

Re: [Ganglia-general] Sending alert

2004-03-02 Thread Dan Rich
[EMAIL PROTECTED] wrote: I'm not sure I agree with your conclusion. Why couldn't the Nagios plugin connect to a gmond on one of the nodes in each cluster and parse the XML? It should also have a timeout to go to another node, if the primary happens to be down. I see ganglia's purpose as to