Re: [Ganglia-general] Ganglia HeadNode 'gmond' problem

2015-06-24 Thread Sergio Ballestrero
What do you mean exactly by "not responding to Ganglia"? Are you using unicast or multicast? Have you tried to tcpdump to see if data is being sent? Cheers, Sergio On 24 Jun 2015, at 20:08, Kamran Khan wrote: > Anybody? > > Anything? > > -- > Kamran Khan > PSSC Labs > HPC Software / Techn

Re: [Ganglia-general] Ganglia 3.6.1 and CentOS 6.5

2015-02-19 Thread Sergio Ballestrero
Hello Jared, yes, most likely this is because of multicasting. Unless you really want to use multiple gmond as collectors, it's simpler and more robust to use unicast to the gmond on the host which runs gmetad. Otherwise, to debug multicast the first thing would be to tcpdump on the host running

Re: [Ganglia-general] Extract Ganglia data for processing in R and python

2014-08-04 Thread Sergio Ballestrero
Hello Doug, On 5 Aug 2014, at 02:52, Doug Johnson wrote: > Hi Sergio (and Ganglia community), > > Thanks so much for the prompt and helpful response. This is just what I was > looking for. One unexpected "surprise" was that my "simple" four-data-node > AWS/EC2 MapReduce cluster has almost 1,

Re: [Ganglia-general] Extract Ganglia data for processing in R and python

2014-07-12 Thread Sergio Ballestrero
Hello Doug, I'm not sure if it's the kind of "extract stats" that you have in mind, but you can simply use rrdtool to dump the rrd files created by Ganglia to an XML format. Then you can delete the rrd and start anew. Cheers, Sergio On 12 Jul 2014, at 00:39, Doug Johnson wrote: > I'm sure
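
As a rough illustration of the rrdtool suggestion above, the XML written by `rrdtool dump` can be read into Python with the standard library alone. This is only a sketch: `SAMPLE_DUMP` is a made-up fragment that follows the general layout of a dump, and `read_rras` is a hypothetical helper name, not something from the thread or from Ganglia.

```python
import xml.etree.ElementTree as ET

# Synthetic sample following the layout of `rrdtool dump` output.
# The values are invented for illustration only.
SAMPLE_DUMP = """<rrd>
  <version>0003</version>
  <step>15</step>
  <lastupdate>1405123200</lastupdate>
  <ds><name> sum </name><type> GAUGE </type></ds>
  <rra>
    <cf>AVERAGE</cf>
    <pdp_per_row>1</pdp_per_row>
    <database>
      <row><v>1.5000000000e+00</v></row>
      <row><v>NaN</v></row>
      <row><v>2.5000000000e+00</v></row>
    </database>
  </rra>
</rrd>"""


def read_rras(xml_text):
    """Return one dict per round-robin archive (RRA) found in the dump."""
    root = ET.fromstring(xml_text)
    step = int(root.findtext("step"))  # base step in seconds
    rras = []
    for rra in root.findall("rra"):
        # Each <row> holds one <v> per data source; NaN marks unknown data.
        rows = [
            [float(v.text) for v in row.findall("v")]
            for row in rra.findall("database/row")
        ]
        rras.append({
            "cf": rra.findtext("cf").strip(),
            "seconds_per_row": step * int(rra.findtext("pdp_per_row")),
            "rows": rows,
        })
    return rras


if __name__ == "__main__":
    for rra in read_rras(SAMPLE_DUMP):
        print(rra["cf"], rra["seconds_per_row"], rra["rows"])
```

In practice the text would come from running `rrdtool dump` on one of the .rrd files Ganglia writes; the resulting lists can then be handed to whatever analysis tooling is preferred.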

Re: [Ganglia-general] Huge metrics' size being reported to gmetad

2014-07-01 Thread Sergio Ballestrero
gmond? > > We have no control over what happens in the clusters, and consequently, if it > works like you're saying, chances are that we'll have lots of problems like > these, lots of times...without quick fixing actions. > > Cumprimentos / Best regards, > Cristóvão

Re: [Ganglia-general] Huge metrics' size being reported to gmetad

2014-06-30 Thread Sergio Ballestrero
Hi Cristovao, that depends on how many metrics and on the rrd creation settings. Sure, 150MB looks like a lot. An ls -la may give more hints... Ciao, Sergio On 30 Jun 2014 16:35, "Cristovao Jose Domingues Cordeiro" < cristovao.corde...@cern.ch> wrote: > Someone? > > Cumprimentos / Best regards,

Re: [Ganglia-general] multiple clusters with just one collector

2014-01-28 Thread Sergio Ballestrero
On 28 Jan 2014, at 20:10, Adam Compton wrote: > The gmond "globals" configuration option "host_tmax" controls how long a host > can go without a heartbeat before being seen as "down"; it's set to 20 by > default, but the value in the config file gets multiplied by 4, so the > default timeout
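
The arithmetic described in the quoted message (configured `host_tmax` multiplied by 4) can be written out as a small sketch. The function names are hypothetical, and the strict "greater than" comparison is an assumption; only the default of 20 and the factor of 4 come from the message.

```python
# Factor stated in the quoted message; not verified against the gmond source here.
HOST_TMAX_MULTIPLIER = 4


def host_timeout_seconds(host_tmax=20):
    """Effective timeout in seconds for a given host_tmax config value."""
    return host_tmax * HOST_TMAX_MULTIPLIER


def is_host_down(seconds_since_heartbeat, host_tmax=20):
    """Assumed rule: a host counts as down once the timeout is exceeded."""
    return seconds_since_heartbeat > host_timeout_seconds(host_tmax)


if __name__ == "__main__":
    print(host_timeout_seconds())   # default configuration
    print(is_host_down(81))
```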

Re: [Ganglia-general] multiple clusters with just one collector

2014-01-28 Thread Sergio Ballestrero
Hello Adrian, On 25 Jan 2014, at 21:34, Adrian Sevcenco wrote: > On 01/25/2014 09:37 PM, Sergio Ballestrero wrote: >> Hello Adrian, if the host for which you send gmetrics is not a gmond >> "client", you need to also spoof a "heartbeat" metric, else the >&

Re: [Ganglia-general] multiple clusters with just one collector

2014-01-25 Thread Sergio Ballestrero
Hello Adrian, if the host for which you send gmetrics is not a gmond "client", you need to also spoof a "heartbeat" metric, else the collector will think the host is down: HOST=ups.example.com IP=$(getent hosts $HOST | cut -d" " -f1) GMON=/etc/ganglia/gmond.UPS.conf T=$(ping -qn -c 1 -w 6 $IP
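
The shell snippet above is cut off in this listing, so the following is not a reconstruction of it. It is a separate sketch of how a spoofed heartbeat command line might be assembled, assuming the `--conf`, `--spoof` (taking `ip:host`) and `--heartbeat` options of `gmetric`; check `gmetric --help` on your version before relying on them. `heartbeat_command` is a hypothetical helper, and the IP address is a documentation placeholder rather than a value from the thread.

```python
def heartbeat_command(host, ip, gmond_conf):
    """Build (but do not run) a gmetric command that spoofs a heartbeat.

    Assumes gmetric accepts --conf, --spoof "ip:host" and --heartbeat.
    """
    return [
        "gmetric",
        "--conf", gmond_conf,
        "--spoof", f"{ip}:{host}",
        "--heartbeat",
    ]


if __name__ == "__main__":
    cmd = heartbeat_command(
        host="ups.example.com",                    # host name used in the message
        ip="192.0.2.10",                           # placeholder address
        gmond_conf="/etc/ganglia/gmond.UPS.conf",  # config path used in the message
    )
    print(" ".join(cmd))
```

The list can be passed to `subprocess.run` from a cron job or similar on a machine where gmetric is installed.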

Re: [Ganglia-general] Units of time CPU utilized

2013-07-11 Thread Sergio Ballestrero
Hello Hector, that lowercase m stands for the "milli" unit prefix. So 20m is 20 millipercent in this case, or 0.020%. You should just use your CPUs more... ;-) Cheers, Sergio On 12 Jul 2013, at 00:02, Hector Fernandez wrote: > Dear all, > > I am looking at my monitoring graphs about the cpu
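
To make the prefix arithmetic concrete, here is a small sketch that turns a graph label such as "20m" into a plain number. The prefix table is a common SI subset chosen for illustration, not a list taken from rrdtool or Ganglia, and `parse_si` is a hypothetical helper name.

```python
# Illustrative subset of SI prefixes; assumed, not taken from rrdtool.
SI_PREFIXES = {
    "u": 1e-6,
    "m": 1e-3,
    "k": 1e3,
    "M": 1e6,
    "G": 1e9,
    "T": 1e12,
}


def parse_si(label):
    """Convert a label like '20m' or '1.5k' to a float; plain numbers pass through."""
    label = label.strip()
    if label and label[-1] in SI_PREFIXES:
        return float(label[:-1]) * SI_PREFIXES[label[-1]]
    return float(label)


if __name__ == "__main__":
    # A label of "20m" on a percent axis means about 0.02 percent.
    print(parse_si("20m"))
```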

Re: [Ganglia-general] What type of device can be monitored by Ganglia?

2013-06-25 Thread Sergio Ballestrero
Hello Benjamin, yes, of course - clusters are "just" many physical (and/or virtual) machines. Cheers, Sergio On 26 Jun 2013, at 04:35, Benjamin Wang (gendwang) wrote: > Hi, > The description of Ganglia describes that Ganglia can monitor the > high-performance computing systems such as clust

Re: [Ganglia-general] Scalability issue

2013-06-18 Thread Sergio Ballestrero
Hello Adrian, On 13 Jun 2013, at 15:32, Adrian Sevcenco wrote: > On 06/09/2013 11:26 AM, Sergio Ballestrero wrote: >> We run a single gmetad with 11 collector gmond (listen-only) and 1 >> "client" gmond (send-only) to monitor the host itself, all using unicast. > Hi!

Re: [Ganglia-general] Scalability issue

2013-06-09 Thread Sergio Ballestrero
) On 9 Jun 2013, at 10:26, Sergio Ballestrero wrote: > Hello Christophe, > (since we've spoken before) as you know we're also using Ganglia. We've > recently added all HLT nodes to the monitoring, so we're up to 1949 nodes > monitored now by a single server - an

Re: [Ganglia-general] Scalability issue

2013-06-09 Thread Sergio Ballestrero
Hello Christophe, (since we've spoken before) as you know we're also using Ganglia. We've recently added all HLT nodes to the monitoring, so we're up to 1949 nodes monitored now by a single server - and on the same server we also run Icinga (although with a "light" config). We have one server [

Re: [Ganglia-general] empty network_report

2012-08-07 Thread Sergio Ballestrero
ng nodes are RHEL5 and some > RHEL6, some are VMs and some are physical. There does not seem to be any > specific pattern. (Also network_report seems to be the only report for which > this happens.) > > -----Original Message- > From: Sergio Ballestrero [mailto:sergio.bal

Re: [Ganglia-general] empty network_report

2012-08-07 Thread Sergio Ballestrero
Eric, (for v3.2) look at libmetrics/linux/metrics.c. It uses /proc/net/dev. Anything in common on the non-reporting nodes? HW? OS version? Cheers, Sergio On 7 Aug 2012, at 16:54, Pronko, Eric wrote: > I asked back in July and did not see any replies, but figured it wouldn’t > hurt to ask aga
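
When comparing reporting and non-reporting nodes, it can help to look at the same /proc/net/dev counters directly. The sketch below parses that file's text into per-interface byte and packet counts. It is not a port of metrics.c; `parse_proc_net_dev` is a hypothetical helper and `SAMPLE_PROC_NET_DEV` is invented sample data in the usual Linux layout.

```python
# Invented sample in the usual /proc/net/dev layout (two header lines, then
# one line per interface: 8 receive fields followed by 8 transmit fields).
SAMPLE_PROC_NET_DEV = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0: 5000000    4000    0    0    0     0          0         0  2500000    3000    0    0    0     0       0          0
"""


def parse_proc_net_dev(text):
    """Return {interface: {rx_bytes, rx_packets, tx_bytes, tx_packets}}."""
    counters = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        counters[name.strip()] = {
            "rx_bytes": int(fields[0]),
            "rx_packets": int(fields[1]),
            "tx_bytes": int(fields[8]),
            "tx_packets": int(fields[9]),
        }
    return counters


if __name__ == "__main__":
    for iface, stats in parse_proc_net_dev(SAMPLE_PROC_NET_DEV).items():
        print(iface, stats)
```

On a real node the text would be read from /proc/net/dev itself; an interface name that is missing or unexpected on the non-reporting hosts would be a first thing to check.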

Re: [Ganglia-general] Modifying ganglia.

2012-07-25 Thread Sergio Ballestrero
On 25 Jul 2012, at 18:30, Douglas Wagner wrote: >> On Wed, Jul 25, 2012 at 2:22 AM, karthik >> wrote: >> Hi, >> I have built an application. I need to make ganglia monitor that application. >> .Can someone help me how to modify the ganglia. What are the steps involved >> in >> it? >> >> Thank

Re: [Ganglia-general] Ganglia not recognizing other nodes?

2012-07-22 Thread Sergio Ballestrero
Hello Jeff, On 21 Jul 2012, at 16:55, Jeff Layton wrote: > Good morning, > > Apologies for the simple question. I've got a simple cluster > with a master node and one compute node. I installed the > latest Ganglia on the master node (3.4.0) - libganglia, > ganglia-gmond, ganglia-metad, ganglia

Re: [Ganglia-general] Help: metric is missing as systems rebooted

2012-07-16 Thread Sergio Ballestrero
That depends: shorter time, shorter gaps but more traffic. Longer time, longer gaps, less traffic. I use ~2 minutes. Note also this from a previous thread: On 5 May 2012, at 10:56, Sergio Ballestrero wrote: > I had tried this but I was ending up with incomplete *_reports (e.g. missing >

Re: [Ganglia-general] inter node traffic

2012-05-09 Thread Sergio Ballestrero
On 9 May 2012, at 19:26, Morteza wrote: > Hi every body > > Is it possible to see the network traffic that transfers between each two > node? For example I want to know how much data node1 has sent to node2 and > vice versa. > If ganglia doesn’t have this capability, how to add this to it? T

Re: [Ganglia-general] Having to restart gmond on sender nodes if a collector node restarts

2012-05-05 Thread Sergio Ballestrero
Hi Eric, all, I had tried this but I was ending up with incomplete *_reports (e.g. missing # of CPUs, total memory etc.). Seeing this thread, I've tried again, and realised that I also had to adjust time_threshold for constant or long lived metrics. Hope this helps others who may stumble in t

Re: [Ganglia-general] Petabyte network peaks

2012-04-27 Thread Sergio Ballestrero
Hello Arnau, We've done better - exabytes per second! ;-) We've seen that on some specific types of NICs on Scientific Linux 5, running Ganglia 3.2.0. I have updated Roger's patch (attached), and that seems to have mostly solved the issue. I recently saw another occurrence of it but didn't have

[Ganglia-general] missing metadata after restart of unicast gmond receiver

2012-02-19 Thread Sergio Ballestrero
Hello Ganglia users and devels, I have a deployment of Ganglia 3.2 with two monitoring servers and ~300 clients. Each server runs gWeb2, a single gmetad (C, not Python) and multiple gmond, on separate ports, for separating different "clusters". Mostly because of network constraints, I am using