Hi all,

I have Ganglia (2.4.1 and CVS) set up on Solaris, Debian, Redhat and
IRIX. Each box seems to work fine individually (i.e. ganglia compiled
and gathers data okay), but I seem to be hitting the problem described
by Steven Wagner in this post:
http://sourceforge.net/mailarchive/message.php?msg_id=1777620. I had
a quick grep through the source and couldn't find the timeouts Steven
mentions in metrics.h (but it is late here, so I could just have
missed them...).

My ganglia setup looks something like this:

Monitoring box, running Debian
Running Gmetad
(also running gmond)
     |
     |
Linux Redhat cluster
(30 boxes running gmond)
     |
     |
Two IRIX boxes
(running gmond)
     |
     |
One IRIX, one Solaris, one Redhat box
(running gmond)

(these boxes are all on the same network with switches between them)

The monitoring box and the three boxes at the bottom are part of one
multicast group, the two IRIX boxes are in their own multicast group,
and the cluster is in a third. I'm only seeing the timeout problem in
the first group (the one with IRIX, Solaris and Linux in it), not in
the other two. The gmetad on the monitoring box is watching all three
groups. Just to restate: the Redhat cluster and the two lone IRIX boxes
are working fine.

What I normally see is that the gmond running on the monitoring box is
the only one that is always present in the output of gstat -a. The
Redhat box shows up from time to time, and the IRIX and Solaris boxes
rarely, if ever, show up. If I restart all the gmonds in that group,
everything appears, but the other boxes disappear after 30 seconds to a
minute, leaving just the monitoring box's gmond in the gstat output and
on the PHP web pages. After another 2 - 5 minutes the Redhat box shows
up again, and occasionally the IRIX box follows. Then, within a minute
or so, all of them disappear again, leaving only the monitoring box.
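One way to narrow this down might be to check whether the multicast packets from the IRIX, Solaris and Redhat boxes actually reach the monitoring box at all, independently of gmond/gmetad. Below is a rough sketch (not part of Ganglia) that joins the group and reports which sender IPs it has heard from and which have gone quiet. It assumes the group uses the usual Ganglia defaults of 239.2.11.71, port 8649; substitute whatever your gmond.conf for that group actually says. The helper names (build_mreq, record, silent_hosts, listen) are just made up for this sketch. If the bind fails because the local gmond already holds the port, running it on a spare box on the same segment should work.

```python
import socket
import struct
import time

# ASSUMPTION: usual Ganglia defaults; replace with this group's values from gmond.conf.
GROUP = "239.2.11.71"
PORT = 8649


def build_mreq(group: str, iface: str = "0.0.0.0") -> bytes:
    """Pack an ip_mreq struct: multicast group address, then local interface address."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))


def record(last_seen: dict, addr: tuple, now: float) -> None:
    """Remember when we last received a packet from this sender (keyed by source IP)."""
    last_seen[addr[0]] = now


def silent_hosts(last_seen: dict, now: float, window: float) -> list:
    """Return the sorted sender IPs not heard from within `window` seconds."""
    return sorted(ip for ip, t in last_seen.items() if now - t > window)


def listen(group: str = GROUP, port: int = PORT, window: float = 60.0) -> None:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Ask the kernel to join the multicast group on the default interface.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, build_mreq(group))
    sock.settimeout(5.0)
    last_seen = {}
    next_report = 0.0
    while True:
        try:
            _data, addr = sock.recvfrom(65535)
            record(last_seen, addr, time.time())
        except socket.timeout:
            pass  # no packets in the last 5 s; still print a report below
        now = time.time()
        if now >= next_report:  # report at most every 5 seconds
            print(time.strftime("%H:%M:%S"), "heard:", sorted(last_seen),
                  f"silent >{window:.0f}s:", silent_hosts(last_seen, now, window))
            next_report = now + 5.0


if __name__ == "__main__":
    listen()
```

If a box never shows up in the "heard" list, the packets aren't getting through (switch/IGMP or TTL territory); if it shows up steadily while still vanishing from gstat, the problem is more likely in gmond itself.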

Does anybody have any ideas on fixing this? (Sorry if this is the wrong
list to post to.)
Let me know if you need more info/output/whatever...

Cheers, James

-- 
Surely the 4 sysadmins of the apocalypse should be: 
edquota, rm -rf, kill -9, and shutdown
