What do you mean exactly by "not responding to Ganglia"?
Are you using unicast or multicast?
Have you tried tcpdump to see if data is being sent?
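For example, a hedged sketch of what that tcpdump would look like. It assumes the Ganglia defaults (UDP port 8649, multicast group; adjust interface, port and group to match your gmond.conf:

```shell
# Unicast: watch for gmond packets going to the collector
tcpdump -n -i eth0 udp port 8649

# Multicast: watch for packets to the default multicast group
tcpdump -n -i eth0 host and udp port 8649
```

If nothing shows up on the sending side, the problem is in gmond's send channel; if packets leave but never arrive, look at switches/routers (IGMP snooping is a classic culprit for multicast).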
Cheers,
Sergio
On 24 Jun 2015, at 20:08, Kamran Khan wrote:
> Anybody?
>
> Anything?
>
> --
> Kamran Khan
> PSSC Labs
> HPC Software / Techn
Hello Jared,
yes, most likely this is because of multicasting.
Unless you really want to use multiple gmond as collectors, it's simpler and
more robust to use unicast to the gmond on the host which runs gmetad.
Otherwise, to debug multicast, the first thing would be to tcpdump on the host
running
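The unicast setup suggested above would look roughly like this; a hedged gmond.conf sketch, where "collector.example.com" is a placeholder for the host that runs gmetad:

```
# On every monitored node: send by unicast to the collector
udp_send_channel {
  host = collector.example.com
  port = 8649
}

# On the collector host only: listen for the nodes,
# and let gmetad poll over TCP
udp_recv_channel {
  port = 8649
}
tcp_accept_channel {
  port = 8649
}
```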
Hello Doug,
On 5 Aug 2014, at 02:52, Doug Johnson wrote:
> Hi Sergio (and Ganglia community),
>
> Thanks so much for the prompt and helpful response. This is just what I was
> looking for. One unexpected "surprise" was that my "simple" four-data-node
> AWS/EC2 MapReduce cluster has almost 1,
Hello Doug,
I'm not sure if it's the kind of "extract stats" that you have in mind, but you
can simply use rrdtool to dump the rrd files created by Ganglia to an XML
format.
Then you can delete the rrd and start anew.
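A hedged sketch of those two steps (the rrd path is an example only; adjust to your gmetad rrd_rootdir):

```shell
# Export one rrd to XML, then remove it so gmetad
# recreates a fresh one on the next poll
rrdtool dump /var/lib/ganglia/rrds/MyCluster/node01/cpu_user.rrd > cpu_user.xml
rm /var/lib/ganglia/rrds/MyCluster/node01/cpu_user.rrd
```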
Cheers,
Sergio
On 12 Jul 2014, at 00:39, Doug Johnson wrote:
> I'm sure
gmond?
>
> We have no control over what happens in the clusters, and consequently, if it
> works like you're saying, chances are that we'll have lots of problems like
> these, lots of times...without quick fixing actions.
>
> Cumprimentos / Best regards,
> Cristóvão
Hi Cristovao,
that depends on how many metrics and on the rrd creation settings. Sure,
150 MB looks like a lot. An "ls -la" may give more hints...
Ciao,
Sergio
On 30 Jun 2014 16:35, "Cristovao Jose Domingues Cordeiro" <
cristovao.corde...@cern.ch> wrote:
> Someone?
>
> Cumprimentos / Best regards,
On 28 Jan 2014, at 20:10, Adam Compton wrote:
> The gmond "globals" configuration option "host_tmax" controls how long a host
> can go without a heartbeat before being seen as "down"; it's set to 20 by
> default, but the value in the config file gets multiplied by 4, so the
> default timeout
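Spelling out the arithmetic from the quote above as a hedged gmond.conf sketch:

```
# With host_tmax = 20 (the default), the effective timeout is
# 4 * 20 = 80 seconds of missed heartbeats before a host is
# flagged as down.
globals {
  host_tmax = 20   # seconds; multiplied by 4 internally
}
```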
Hello Adrian,
On 25 Jan 2014, at 21:34, Adrian Sevcenco wrote:
> On 01/25/2014 09:37 PM, Sergio Ballestrero wrote:
>> Hello Adrian, if the host for which you send gmetrics is not a gmond
>> "client", you need to also spoof a "heartbeat" metric, else the
>&
Hello Adrian,
if the host for which you send gmetrics is not a gmond "client", you need to
also spoof a "heartbeat" metric, else the collector will think the host is down:
HOST=ups.example.com
IP=$(getent hosts $HOST | cut -d" " -f1)
GMON=/etc/ganglia/gmond.UPS.conf
T=$(ping -qn -c 1 -w 6 $IP
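The script above is truncated; as a hedged sketch (not the original script), the spoofed heartbeat itself can be sent with gmetric, reusing the same variables. The metric name "heartbeat" and the -S "IP:hostname" spoofing form are standard gmetric usage, but check your gmetric version:

```shell
HOST=ups.example.com
IP=$(getent hosts $HOST | cut -d" " -f1)
GMON=/etc/ganglia/gmond.UPS.conf

# Spoof a heartbeat so the collector keeps the host marked "up"
gmetric -c $GMON -S "$IP:$HOST" -n heartbeat -v 0 -t uint32 -u ""
```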
Hello Hector,
that lowercase m stands for the "milli" unit prefix. So 20m is 20 millipercent
in this case, or 0.020%. You should just use your CPUs more... ;-)
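The unit arithmetic is easy to sanity-check; a small hypothetical helper (not part of Ganglia) that converts an SI-suffixed value the way rrdtool/Ganglia graphs label them:

```python
# Convert a value with an SI suffix to a plain float,
# e.g. "20m" (milli) -> 0.02. Illustrative subset of prefixes.
SI = {"m": 1e-3, "k": 1e3, "M": 1e6, "G": 1e9}

def parse_si(value: str) -> float:
    if value and value[-1] in SI:
        return float(value[:-1]) * SI[value[-1]]
    return float(value)

parse_si("20m")  # ~0.02, i.e. 20 millipercent = 0.020%
```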
Cheers,
Sergio
On 12 Jul 2013, at 00:02, Hector Fernandez wrote:
> Dear all,
>
> I am looking at my monitoring graphs about the cpu
Hello Benjamin,
yes, of course - clusters are "just" many physical (and/or virtual) machines.
Cheers,
Sergio
On 26 Jun 2013, at 04:35, Benjamin Wang (gendwang) wrote:
> Hi,
> The description of Ganglia describes that Ganglia can monitor the
> high-performance computing systems such as clust
Hello Adrian,
On 13 Jun 2013, at 15:32, Adrian Sevcenco wrote:
> On 06/09/2013 11:26 AM, Sergio Ballestrero wrote:
>> We run a single gmetad with 11 collector gmond (listen-only) and 1
>> "client" gmond (send-only) to monitor the host itself, all using unicast.
> Hi!
On 9 Jun 2013, at 10:26, Sergio Ballestrero wrote:
> Hello Christophe,
> (since we've spoken before) as you know we're also using Ganglia. We've
> recently added all HLT nodes to the monitoring, so we're up to 1949 nodes
> monitored now by a single server - an
Hello Christophe,
(since we've spoken before) as you know we're also using Ganglia. We've
recently added all HLT nodes to the monitoring, so we're up to 1949 nodes
monitored now by a single server - and on the same server we also run Icinga
(although with a "light" config)
We have one server [
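The split between listen-only collectors and a send-only "client" gmond maps onto gmond's deaf/mute flags; a hedged sketch (ports are placeholders):

```
# Collector gmond: receives and aggregates, never sends its own metrics
globals {
  mute = yes    # do not send
}
udp_recv_channel {
  port = 8650
}
tcp_accept_channel {
  port = 8650   # gmetad polls here
}

# "Client" gmond on the same host: sends only, receives nothing
globals {
  deaf = yes    # do not receive
}
udp_send_channel {
  host = localhost
  port = 8650
}
```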
ng nodes are RHEL5 and some
> RHEL6, some are VMs and some are physical. There does not seem to be any
> specific pattern. (Also network_report seems to be the only report for which
> this happens.)
>
> -----Original Message-
> From: Sergio Ballestrero [mailto:sergio.bal
Eric,
(for v3.2) look at libmetrics/linux/metrics.c . It uses /proc/net/dev .
Anything in common on the non-reporting nodes? HW? OS version?
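To see what libmetrics is reading, here is a hedged Python sketch (not Ganglia code) of parsing /proc/net/dev; the field positions match the standard kernel format, and the sample text is made up for illustration:

```python
# Parse /proc/net/dev text into {iface: (rx_bytes, tx_bytes)}.
def parse_net_dev(text: str) -> dict:
    stats = {}
    for line in text.splitlines()[2:]:      # skip the two header lines
        if ":" not in line:
            continue
        iface, rest = line.split(":", 1)
        fields = rest.split()
        # field 0 = bytes received, field 8 = bytes transmitted
        stats[iface.strip()] = (int(fields[0]), int(fields[8]))
    return stats

sample = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:  104013    1024    0    0    0     0          0         0   104013    1024    0    0    0     0       0          0
  eth0: 9876543   12345    0    0    0     0          0         0  1234567    6789    0    0    0     0       0          0
"""
print(parse_net_dev(sample)["eth0"])  # (9876543, 1234567)
```

On a non-reporting node, `cat /proc/net/dev` by hand is the quickest way to see whether the kernel is even exposing the counters gmond expects.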
Cheers, Sergio
On 7 Aug 2012, at 16:54, Pronko, Eric wrote:
> I asked back in July and did not see any replies, but figured it wouldn’t
> hurt to ask aga
On 25 Jul 2012, at 18:30, Douglas Wagner wrote:
>> On Wed, Jul 25, 2012 at 2:22 AM, karthik
>> wrote:
>> Hi,
>> I have built an application. I need to make Ganglia monitor that application.
>> Can someone help me with how to modify Ganglia? What are the steps involved
>> in it?
>>
>> Thank
Hello Jeff,
On 21 Jul 2012, at 16:55, Jeff Layton wrote:
> Good morning,
>
> Apologies for the simple question. I've got a simple cluster
> with a master node and one compute node. I installed the
> latest Ganglia on the master node (3.4.0) - libganglia,
> ganglia-gmond, ganglia-metad, ganglia
That depends: shorter time, shorter gaps but more traffic. Longer time, longer
gaps, less traffic. I use ~2 minutes.
Note also this from a previous thread :
On 5 May 2012, at 10:56, Sergio Ballestrero wrote:
> I had tried this but I was ending up with incomplete *_reports (e.g. missing
>
On 9 May 2012, at 19:26, Morteza wrote:
> Hi every body
>
> Is it possible to see the network traffic that transfers between each two
> node? For example I want to know how much data node1 has sent to node2 and
> vice versa.
> If ganglia doesn’t have this capability, how to add this to it?
T
Hi Eric, all,
I had tried this but I was ending up with incomplete *_reports (e.g. missing #
of CPUs, total memory etc.).
Seeing this thread, I've tried again, and realised that I also had to adjust
time_threshold for constant or long lived metrics. Hope this helps others who
may stumble in t
Hello Arnau,
We've done better - exabytes per second! ;-)
We've seen that on some specific types of NICs on Scientific Linux 5, running
Ganglia 3.2.0
I have updated Roger's patch (attached), and that seems to have mostly solved
the issue.
I recently saw another occurrence of it but didn't have
Hello Ganglia users and devels,
I have a deployment of Ganglia 3.2 with two monitoring servers and ~300
clients.
Each server runs gWeb2, a single gmetad (the C one, not Python) and multiple
gmond on separate ports, to separate the different "clusters".
Mostly because of network constraints, I am using
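On the gmetad side, separating "clusters" by port typically looks like this; hostnames and ports are placeholders:

```
# One data_source per "cluster", each polling a gmond
# that listens on its own port
data_source "cluster-A" server1.example.com:8650
data_source "cluster-B" server1.example.com:8651
```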