[Ganglia-general] gmond is running but not responding (ganglia 3.1.2)

2009-07-08 Pavel Shevaev
Hi folks, I have the following problem: gmond stops responding on its socket once every 2-3 days. It's definitely running ("ps aux | grep gmond" shows it), but there is no output from the "nc localhost 8649" command; it actually hangs. What are the best ways to pinpoint the problem? I guess it make
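Because `nc localhost 8649` simply blocks, one way to catch the hang from cron or a monitoring check is a probe with a timeout. Below is a minimal Python sketch. It assumes gmond's default TCP port 8649 and that gmond writes its XML dump and then closes the connection. `probe_gmond` is a hypothetical helper name.

```python
import socket

# Hypothetical helper: assumes gmond listens on TCP port 8649 (the default)
# and closes the connection after writing its XML dump.
def probe_gmond(host="localhost", port=8649, timeout=5.0):
    """Return the number of bytes gmond sent before closing the connection.

    Raises socket.timeout (TimeoutError) if the connect or any single read
    stalls for longer than `timeout` seconds, and ConnectionRefusedError if
    nothing is listening on the port.
    """
    received = 0
    # create_connection applies `timeout` to the connect and to later recv calls.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # empty read: the peer closed the connection
                break
            received += len(data)
    return received
```

A hung daemon can still appear to accept connections, because the kernel completes the TCP handshake into the listen backlog. In that case the connect succeeds and the first `recv` times out, which is exactly the symptom described. Calling `probe_gmond()` inside a `try`/`except socket.timeout` and logging the time of the first failure would help narrow down when the hang begins.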

Re: [Ganglia-general] not a simple integer

2009-07-08 Daniel Kolvik
Hi, Sorry, the RRD version I'm on is 1.2.27. Yes, I start gmetad with debug level 10. The output is from that run. I've also tried wiping the RRD folder so the RRDs get created again, but I get the same message and outcome all over again. The graph that corresponds to the error code has no

Re: [Ganglia-general] Incorrect boottime and uptime

2009-07-08 Ken Teague
On 7/8/2009 3:30 PM, Bernard Li wrote: > Hi Ken: > > Okay, try this: > > Figure out the user gmond is running as (common examples are: ganglia, > nobody, etc.). See if you can cat /proc/stat as that user. master3:~ # ps aux |grep gmond nobody 31801 0.0 0.0 23128 2884 ? Ss 15:30

Re: [Ganglia-general] Incorrect boottime and uptime

2009-07-08 Bernard Li
Hi Ken: Okay, try this: Figure out the user gmond is running as (common examples are: ganglia, nobody, etc.). See if you can cat /proc/stat as that user. The root user being able to read /proc/stat doesn't necessarily mean Ganglia/gmond can. I suspect you have some different security settings

Re: [Ganglia-general] not a simple integer

2009-07-08 Bernard Li
Hi Daniel: On Wed, Jul 8, 2009 at 8:31 AM, Daniel Kolvik wrote: > RRD 1.2.7 This is pretty old -- have you tried updating to a more recent 1.2.x release? > Debug messages from gmond prints: > > RRD_update > (/var/lib/ganglia/rrds/XXX/__SummaryInfo__/Apahe_Bytes_p_sec.rrd): not a > simple intege

Re: [Ganglia-general] not a simple integer

2009-07-08 Daniel Kolvik
Got a response from the RRDtool devs. The problem is that Ganglia creates the RRDs with a COUNTER data source. Is it possible to configure or change this behavior? Even though I've declared the metric as float/double, the RRD is created with a COUNTER data source, and COUNTER is only compatible with integers.
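The COUNTER/GAUGE distinction explains why '4070.7' is refused. Here is a simplified Python model; the function names are hypothetical, and this is not RRDtool's implementation. A GAUGE keeps each submitted value as-is, while a COUNTER turns successive integer readings into a per-second rate.

```python
# Simplified model of two RRDtool data-source types; not RRDtool's code.
# Counter wrap-around, heartbeat and unknown-value handling are ignored.

def gauge_store(value_str):
    """GAUGE: the submitted value is stored as-is (suits an already-computed rate)."""
    return float(value_str)

def counter_rate(prev_str, curr_str, seconds):
    """COUNTER: the per-second increase between two updates is stored.

    Readings must be integers; int("4070.7") raises ValueError here, the
    analogue of RRDtool's "not a simple integer" error.
    """
    prev, curr = int(prev_str), int(curr_str)
    return (curr - prev) / seconds
```

Under this model, a value like Apache_Bytes_p_sec, which is already a rate when gmetric submits it, would normally belong in a GAUGE. Feeding it to a COUNTER would be wrong twice over: the float is rejected, and even an integer would be differentiated a second time.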

Re: [Ganglia-general] Incorrect boottime and uptime

2009-07-08 Ken Teague
On 7/8/2009 2:21 PM, Bernard Li wrote: > You should be looking at /proc/stat on your *nodes*, not on your > masters. I am guessing that perhaps your nodes don't have the /proc > filesystem mounted or something like that. btime in /proc/stat is fine on the nodes as well. I should also note that

Re: [Ganglia-general] Incorrect boottime and uptime

2009-07-08 Bernard Li
Hi Ken: On Wed, Jul 8, 2009 at 1:45 PM, Ken Teague wrote: >> So, the question is, what is btime on your cluster2/cluster3 nodes' >> /proc/stat? > > FYI: master is the cluster that's reporting correctly.  master2 and master3 > are the two reporting incorrectly. > > > master:~ # grep btime /proc/s

Re: [Ganglia-general] Incorrect boottime and uptime

2009-07-08 Ken Teague
On 7/8/2009 1:24 PM, Bernard Li wrote: > I just looked at the code, Ganglia determines boottime based on btime > of /proc/stat. If it fails to get the value of btime, it sets > boottime to 0 (which is what you are observing). I also want to point out that what you're stating here is correct, as

Re: [Ganglia-general] Incorrect boottime and uptime

2009-07-08 Ken Teague
On 7/8/2009 1:24 PM, Bernard Li wrote: > I just looked at the code, Ganglia determines boottime based on btime > of /proc/stat. If it fails to get the value of btime, it sets > boottime to 0 (which is what you are observing). > > uptime is derived from boottime. > > So, the question is, what is

Re: [Ganglia-general] Incorrect boottime and uptime

2009-07-08 Bernard Li
Hi Ken: I just looked at the code, Ganglia determines boottime based on btime of /proc/stat. If it fails to get the value of btime, it sets boottime to 0 (which is what you are observing). uptime is derived from boottime. So, the question is, what is btime on your cluster2/cluster3 nodes' /proc
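The lookup described here, and why a missing btime produces a bogus uptime, can be modelled in a few lines of Python. This is a sketch based on the thread's description, not gmond's actual C code; the function names are hypothetical. Treating any read error as "no btime" is an assumption. It reflects the point made elsewhere in the thread that the gmond user, not root, must be able to read /proc/stat.

```python
import time

# A Python model of the behaviour described above, not gmond's C code.

def parse_btime(stat_text):
    """Return the btime value (boot time, seconds since the epoch) or 0 if absent."""
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "btime":
            return int(fields[1])
    return 0  # the fallback that shows up as a 1969/1970 boottime

def read_boottime(path="/proc/stat"):
    """Read btime from `path`; any read error (permissions, /proc not mounted) gives 0."""
    try:
        with open(path) as f:
            return parse_btime(f.read())
    except OSError:
        return 0

def uptime_seconds(boottime, now=None):
    """Uptime derived from boottime: seconds elapsed since boot."""
    if now is None:
        now = time.time()
    return now - boottime
```

With a boottime of 0, `uptime_seconds` returns the whole time since the Unix epoch. That is why the web page shows an absurdly large uptime rather than an error.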

Re: [Ganglia-general] Incorrect boottime and uptime

2009-07-08 Ken Teague
On 7/8/2009 11:16 AM, Bernard Li wrote: > Hi Ken: Hi Bernard > What OS/arch are the nodes in cluster2/cluster3 running on? Is it > different from cluster1? They're all running SUSE. cluster1 is on SUSE 10.1 and cluster2 and cluster3 are running openSUSE 10.3. master:~ # cat /etc/*release SU

Re: [Ganglia-general] RRDs/Graphs not always showing up

2009-07-08 Daniel Kolvik
No, I compiled from source, not from RPMs. The second time I used the default config files; I only changed them to not use mcast, plus some names on grid/hosts... /D On Wed, Jul 8, 2009 at 8:05 PM, Bernard Li wrote: > Hi Daniel: > > On Wed, Jul 8, 2009 at 8:39 AM, Daniel Kolvik wrote: > > > I solved it

Re: [Ganglia-general] Incorrect boottime and uptime

2009-07-08 Bernard Li
Hi Ken: On Wed, Jul 8, 2009 at 9:39 AM, Ken Teague wrote: > I have 3 separate clusters; cluster1, cluster2, and cluster3.  On > cluster2 and cluster3, if I go into the Ganglia web interface and click > on, say, node2 of that cluster, it's reporting an incorrect boottime and > uptime. > > > boott

Re: [Ganglia-general] RRDs/Graphs not always showing up

2009-07-08 Bernard Li
Hi Daniel: On Wed, Jul 8, 2009 at 8:39 AM, Daniel Kolvik wrote: > I solved it by reinstalling/recompiling the installation. > > I believe it had something to do with the PHP files fetching the XML. Glad that you got it resolved. You mentioned that you re-compiled Ganglia, where did you get the

Re: [Ganglia-general] Ganglia v3.1.2 install problems with AS5.1 (64)

2009-07-08 Bernard Li
Hi Nigel: As Richard mentioned, you'll need expat-devel. Also, you are better off building RPMs instead of installing from source on RPM-based systems. With the tarball, simply do: rpmbuild -tb --target noarch,x86_64 You need to specify both arch targets because ganglia-web is not architectur

[Ganglia-general] Incorrect boottime and uptime

2009-07-08 Ken Teague
I have 3 separate clusters: cluster1, cluster2, and cluster3. On cluster2 and cluster3, if I go into the Ganglia web interface and click on, say, node2 of that cluster, it's reporting an incorrect boottime and uptime. boottime Wed, 31 Dec 1969 19:00:00 -0500 uptime 14433 days,
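Both odd values are consistent with a boottime of 0, which is what Bernard's reply in this thread concludes. The Python check below runs through the arithmetic. It assumes a fixed UTC-5 offset to match the "-0500" shown, and the helper names are hypothetical. Epoch 0 rendered at UTC-5 is the evening of 31 Dec 1969, and on 8 Jul 2009 the time elapsed since the epoch is 14433 whole days.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the web page renders times at UTC-5, matching the "-0500" shown.
UTC_MINUS_5 = timezone(timedelta(hours=-5))

def render_boottime(epoch_seconds, tz=UTC_MINUS_5):
    """Format a boot time the way the web page shows it (C locale day/month names)."""
    return datetime.fromtimestamp(epoch_seconds, tz).strftime("%a, %d %b %Y %H:%M:%S %z")

def uptime_days(boottime, now):
    """Whole days between boottime and now (both in seconds since the epoch)."""
    return int((now - boottime) // 86400)

# boottime = 0 is the Unix epoch, 1970-01-01 00:00 UTC.
print(render_boottime(0))  # Wed, 31 Dec 1969 19:00:00 -0500
# Noon UTC on the day of the report: 14433 whole days since the epoch.
report_time = datetime(2009, 7, 8, 12, 0, tzinfo=timezone.utc).timestamp()
print(uptime_days(0, report_time))  # 14433
```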

Re: [Ganglia-general] not a simple integer

2009-07-08 Ofer Inbar
Daniel Kolvik wrote: > Debug messages from gmond prints: > > RRD_update > (/var/lib/ganglia/rrds/XXX/__SummaryInfo__/Apahe_Bytes_p_sec.rrd): not a > simple integer: '4070.7' > > The gmetric command that submits the value is: > > /usr/bin/gmetric --tfloat --name='Apache_Bytes_p_sec' --value='407

Re: [Ganglia-general] RRDs/Graphs not always showing up

2009-07-08 Daniel Kolvik
I solved it by reinstalling/recompiling the installation. I believe it had something to do with the PHP files fetching the XML. /D On Wed, Jul 8, 2009 at 5:29 PM, Jesse Becker wrote: > It almost sounds like a caching issue. Are you using squid or another > proxy? Have you tried forcing a full pa

Re: [Ganglia-general] Ganglia v3.1.2 install problems with AS5.1 (64)

2009-07-08 Richard Edward Horner
yum install expat-devel Richard On Wed, Jul 8, 2009 at 12:02 PM, wrote: > > Having some problems installing v3.1.2 on Redhat AS 5.1 - 64 Bit. > > ## cat /etc/redhat-release > Red Hat Enterprise Linux Server release 5.1 (Tikanga) > ## uname -a > Lin

[Ganglia-general] not a simple integer

2009-07-08 Daniel Kolvik
Hi! I'm using Ganglia 3.1.2 and RRD 1.2.7. Some metrics don't get updated in the RRDs. Debug messages from gmond print: RRD_update (/var/lib/ganglia/rrds/XXX/__SummaryInfo__/Apahe_Bytes_p_sec.rrd): not a simple integer: '4070.7' The gmetric command that submits the value is: /usr/bin/gmetric --tflo

Re: [Ganglia-general] RRDs/Graphs not always showing up

2009-07-08 Jesse Becker
It almost sounds like a caching issue. Are you using squid or another proxy? Have you tried forcing a full page reload and clearing the browser cache? On Wed, Jul 8, 2009 at 07:57, Daniel Kolvik wrote: > After reviewing the value TMAX and the other parameters I believe the > problem lies in

[Ganglia-general] Ganglia v3.1.2 install problems with AS5.1 (64)

2009-07-08 nigel . leach
Having some problems installing v3.1.2 on Redhat AS 5.1 - 64 Bit. ## cat /etc/redhat-release Red Hat Enterprise Linux Server release 5.1 (Tikanga) ## uname -a Linux 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15 EDT 2008 x86_64 x86_64 x86_64 GNU/

Re: [Ganglia-general] RRDs/Graphs not always showing up

2009-07-08 Daniel Kolvik
After reviewing the value TMAX and the other parameters, I believe the problem lies in some other part. The resulting XML contains the value reported to gmetric. The RRD file is updated, but the graphs don't always show up on the web. Accessing them directly via graph.php outputs the graphs correctly. Any m