I believe that with a gmetad polling interval of 5 minutes you will
probably end up seeing a lot of your nodes as dead.  See the host_alive
function in the ganglia.php file.  The webfrontend will consider a host
alive only as long as it has heard from it within the last 4*TMAX
seconds, and I believe that TMAX is set to 20 seconds in the gmond code,
so that window works out to just 80 seconds.  Therefore, if you reload
the webfrontend shortly before gmetad is about to get fresh data, there
is a good chance that most nodes will have TN greater than 4*TMAX.  It
looks like ganglia3 has TMAX hard coded to 20 seconds for hosts, see:

ganglia-3.0.1/gmond/gmond.c - line 960

I couldn't find it in the code for ganglia2, but with a running gmond it
appears to be set to 70 seconds.
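
To make the arithmetic concrete, here is a rough sketch in C (not the
actual PHP from ganglia.php) of the check as I understand it, where TN
is the number of seconds since the host was last heard from:

  #include <stdio.h>

  /*
   * Minimal sketch of the liveness test described above: a host counts
   * as alive only while TN stays below 4*TMAX.  With TMAX = 20 that
   * window is only 80 seconds.
   */
  #define TMAX 20

  static int host_alive(int tn)
  {
      return tn < 4 * TMAX;
  }

  int main(void)
  {
      /* Just before gmetad gets fresh data, the TN the webfrontend
       * sees can approach the 300-second polling interval. */
      int samples[] = { 10, 79, 80, 299 };
      for (int i = 0; i < 4; i++)
          printf("TN = %3d -> %s\n", samples[i],
                 host_alive(samples[i]) ? "up" : "down");
      return 0;
  }

In other words, anything that lets TN drift past 80 seconds, which a
300-second polling interval easily does, flips the host to down until
the next refresh.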

~Jason


On Thu, 2005-08-18 at 18:43, Utsav Agarwal wrote:
> Hello all,
> 
> A quick response would help!
> 
> Our cluster nodes send udp unicast packets to a gmond ‘collector’. The
> gmond.conf on all the nodes (compute and collector) has the following
> values: 
> 
> cleanup_threshold = 300 secs, heartbeat = 20 secs, collect_every = 300
> secs, time_threshold = 900 secs
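
For reference, in a Ganglia 3.0-style gmond.conf those settings would
normally sit in sections roughly like the sketch below.  This assumes
the stock 3.0 layout; "heartbeat = 20 secs" presumably maps to the
time_threshold of the heartbeat collection group, and the collector
address is just a placeholder:

  globals {
    cleanup_threshold = 300 /* secs */
  }

  udp_send_channel {
    host = <collector-address>   /* unicast to the gmond collector */
    port = 8649
  }

  collection_group {
    collect_once = yes
    time_threshold = 20          /* heartbeat announced every 20 secs */
    metric {
      name = "heartbeat"
    }
  }

  collection_group {
    collect_every = 300          /* sample metrics every 300 secs */
    time_threshold = 900         /* send them at least every 900 secs */
    metric {
      name = "load_one"          /* ...and the rest of the metrics */
    }
  }
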
> 
> Now, the gmetad server polls the gmond ‘collector’ every 300 secs (5
> minutes). What we see is that the nodes are sometimes shown as up and
> sometimes as down; they flap often. Generally, either all nodes are
> shown as up or all nodes are shown as down. Even while reporting the
> nodes as down, it also shows that it received a heartbeat within the
> last 20 seconds.
> 
> We need to know the exact reason this is happening.
> 
> The gmetad.conf file has default values for the RRD archives. Changing
> the gmetad server to poll every 120 seconds does not seem to solve the
> problem either.
> 
> Any suggestions or guidelines on the gmetad polling interval and the
> gmond cleanup_threshold values would be appreciated.
> 
> Thanks,
> 
> ------------------------------------------------------------------------------------
> 
> Utsav Agarwal
> 
> Systems Analyst
> 
> ------------------------------------------------------------------------------------
