Hi folks,
I have the following problem: gmond stops responding on its socket
once every 2-3 days.
It's definitely running ("ps aux | grep gmond" shows it), but
"nc localhost 8649" produces no output; the command just hangs.
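One way to tell a hang apart from a refused connection, without sitting in front of nc, is a small probe with a timeout. This is only a sketch: the function name `probe_gmond` and the status strings are made up for illustration, and it assumes the same host and port as the nc command above.

```python
import socket


def probe_gmond(host="localhost", port=8649, timeout=5.0):
    """Connect, read until EOF, and return (status, bytes_received).

    status is one of:
      'ok'      - data arrived and the peer closed the connection
      'empty'   - the peer closed the connection without sending anything
      'timeout' - connect or read did not finish within `timeout` seconds
      'refused' - nothing is listening on the port
    """
    total = 0
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            while True:
                chunk = sock.recv(65536)
                if not chunk:  # EOF: the peer finished sending
                    break
                total += len(chunk)
    except socket.timeout:
        return ("timeout", total)
    except ConnectionRefusedError:
        return ("refused", total)
    return ("ok" if total else "empty", total)


if __name__ == "__main__":
    print(probe_gmond())
```

Run from cron, a 'timeout' result with zero bytes would mark the moment the daemon stopped answering while the process was still alive.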
What are the best ways to pinpoint the problem? I guess it make
Hi,
Sorry, the RRD version I'm on is 1.2.27.
Yes, I start gmetad with debug level 10. The output is from that run.
I've also tried wiping the RRD folder to force the RRDs to be recreated.
But I get the same message and the same outcome all over again.
The graph that corresponds to the error code has no
On 7/8/2009 3:30 PM, Bernard Li wrote:
> Hi Ken:
>
> Okay, try this:
>
> Figure out the user gmond is running as (common examples are: ganglia,
> nobody, etc.). See if you can cat /proc/stat as that user.
master3:~ # ps aux |grep gmond
nobody 31801 0.0 0.0 23128 2884 ? Ss 15:30
Hi Ken:
Okay, try this:
Figure out the user gmond is running as (common examples are: ganglia,
nobody, etc.). See if you can cat /proc/stat as that user.
The root user being able to read /proc/stat doesn't necessarily mean
Ganglia/gmond can. I suspect you have some different security
settings
Hi Daniel:
On Wed, Jul 8, 2009 at 8:31 AM, Daniel Kolvik wrote:
> RRD 1.2.7
This is pretty old -- have you tried updating to a more recent 1.2.x release?
> Debug messages from gmond prints:
>
> RRD_update
> (/var/lib/ganglia/rrds/XXX/__SummaryInfo__/Apahe_Bytes_p_sec.rrd): not a
> simple intege
Got a response from the RRDtool developers.
The problem is that Ganglia creates the RRDs with a COUNTER data source.
Is it possible to configure or change this behavior? Even though I've
declared the metric as float/double, the RRD is created with a COUNTER
data source, and COUNTER only accepts integer values.
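
To illustrate the mismatch, here is a minimal sketch of the kind of check that rejects a value like '4070.7'. It is not RRDtool's actual code: the function names are hypothetical and the regular expression is only an approximation of what "simple integer" means.

```python
import re

# Approximation of a "simple integer": optional minus sign, then digits only.
_SIMPLE_INTEGER = re.compile(r"-?\d+")


def is_simple_integer(value: str) -> bool:
    """True if the string contains nothing but an optionally signed integer."""
    return _SIMPLE_INTEGER.fullmatch(value) is not None


def check_update(ds_type: str, value: str) -> str:
    """Mimic the outcome of an update for the given data source type.

    Only COUNTER is restricted here, as described above; a type such as
    GAUGE accepts floating-point values.
    """
    if ds_type == "COUNTER" and not is_simple_integer(value):
        return f"not a simple integer: '{value}'"
    return "ok"


if __name__ == "__main__":
    for ds_type, value in [("COUNTER", "4070.7"), ("COUNTER", "4070"), ("GAUGE", "4070.7")]:
        print(ds_type, value, "->", check_update(ds_type, value))
```

So the float metric itself is fine; it is the data source type chosen for the RRD that makes the update fail.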
On 7/8/2009 2:21 PM, Bernard Li wrote:
> You should be looking at /proc/stat on your *nodes*, not on your
> masters. I am guessing that perhaps your nodes don't have the /proc
> filesystem mounted or something like that.
btime in /proc/stat is fine on the nodes as well. I should also note
that
Hi Ken:
On Wed, Jul 8, 2009 at 1:45 PM, Ken Teague wrote:
>> So, the question is, what is btime on your cluster2/cluster3 nodes'
>> /proc/stat?
>
> FYI: master is the cluster that's reporting correctly. master2 and master3
> are the two reporting incorrectly.
>
>
> master:~ # grep btime /proc/s
On 7/8/2009 1:24 PM, Bernard Li wrote:
> I just looked at the code, Ganglia determines boottime based on btime
> of /proc/stat. If it fails to get the value of btime, it sets
> boottime to 0 (which is what you are observing).
I also want to point out that what you're stating here is correct, as
On 7/8/2009 1:24 PM, Bernard Li wrote:
> I just looked at the code, Ganglia determines boottime based on btime
> of /proc/stat. If it fails to get the value of btime, it sets
> boottime to 0 (which is what you are observing).
>
> uptime is derived from boottime.
>
> So, the question is, what is
Hi Ken:
I just looked at the code: Ganglia determines boottime based on btime
of /proc/stat. If it fails to get the value of btime, it sets
boottime to 0 (which is what you are observing).
uptime is derived from boottime.
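
A rough Python sketch of that logic, for illustration only: this is not the gmond C code, the function names are made up, and the -0500 offset is simply taken from the boottime line reported for the affected nodes.

```python
from datetime import datetime, timedelta, timezone


def parse_btime(stat_text: str) -> int:
    """Return the btime value from /proc/stat contents, or 0 if it is missing."""
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "btime" and fields[1].isdigit():
            return int(fields[1])
    return 0  # fallback described above: no btime means boottime 0


def uptime_days(now: int, boottime: int) -> int:
    """Whole days between boottime and now, both in seconds since the epoch."""
    return (now - boottime) // 86400


def format_boottime(boottime: int, utc_offset_hours: int = -5) -> str:
    """Render boottime the way the web interface line looks (assumed format)."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(boottime, tz).strftime("%a, %d %b %Y %H:%M:%S %z")


if __name__ == "__main__":
    with open("/proc/stat") as f:
        boottime = parse_btime(f.read())
    now = int(datetime.now(timezone.utc).timestamp())
    print("boottime", format_boottime(boottime))
    print("uptime", uptime_days(now, boottime), "days")
```

With boottime 0 this prints "Wed, 31 Dec 1969 19:00:00 -0500", and on 8 July 2009 the uptime works out to 14433 days, which matches the reported symptom.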
So, the question is, what is btime on your cluster2/cluster3 nodes' /proc
On 7/8/2009 11:16 AM, Bernard Li wrote:
> Hi Ken:
Hi Bernard
> What OS/arch are the nodes in cluster2/cluster3 running on? Is it
> different from cluster1?
They're all running SUSE. cluster1 is on SUSE 10.1 and cluster2 and
cluster3 are running openSUSE 10.3.
master:~ # cat /etc/*release
SU
No, I compiled from source. Not RPMs.
The second time I used the default config files. I only changed them to not use
mcast, and changed some of the grid/host names...
/D
On Wed, Jul 8, 2009 at 8:05 PM, Bernard Li wrote:
> Hi Daniel:
>
> On Wed, Jul 8, 2009 at 8:39 AM, Daniel Kolvik wrote:
>
> > I solved it
Hi Ken:
On Wed, Jul 8, 2009 at 9:39 AM, Ken Teague wrote:
> I have 3 separate clusters; cluster1, cluster2, and cluster3. On
> cluster2 and cluster3, if I go into the Ganglia web interface and click
> on, say, node2 of that cluster, it's reporting an incorrect boottime and
> uptime.
>
>
> boott
Hi Daniel:
On Wed, Jul 8, 2009 at 8:39 AM, Daniel Kolvik wrote:
> I solved it by recompiling and reinstalling.
>
> I believe it had something to do with the PHP files fetching the XML.
Glad you got it resolved.
You mentioned that you re-compiled Ganglia, where did you get the
Hi Nigel:
As Richard mentioned, you'll need expat-devel. Also, you are better
off building RPMs instead of installing from source on RPM-based
systems. With the tarball, simply do:
rpmbuild -tb --target noarch,x86_64
You need to specify both arch targets because ganglia-web is not
architectur
I have 3 separate clusters; cluster1, cluster2, and cluster3. On
cluster2 and cluster3, if I go into the Ganglia web interface and click
on, say, node2 of that cluster, it's reporting an incorrect boottime and
uptime.
boottime Wed, 31 Dec 1969 19:00:00 -0500
uptime 14433 days,
Daniel Kolvik wrote:
> Debug messages from gmond prints:
>
> RRD_update
> (/var/lib/ganglia/rrds/XXX/__SummaryInfo__/Apahe_Bytes_p_sec.rrd): not a
> simple integer: '4070.7'
>
> The gmetric command that submits the value is:
>
> /usr/bin/gmetric --tfloat --name='Apache_Bytes_p_sec' --value='407
I solved it by recompiling and reinstalling.
I believe it had something to do with the PHP files fetching the XML.
/D
On Wed, Jul 8, 2009 at 5:29 PM, Jesse Becker wrote:
> It almost sounds like a caching issue. Are you using squid or another
> proxy? Have you tried forcing a full pa
yum install expat-devel
Richard
On Wed, Jul 8, 2009 at 12:02 PM, wrote:
>
> Having some problems installing v3.1.2 on Redhat AS 5.1 - 64 Bit.
>
> ## cat /etc/redhat-release
> Red Hat Enterprise Linux Server release 5.1 (Tikanga)
> ## uname -a
> Lin
Hi!
I'm using Ganglia 3.1.2 with RRD 1.2.7.
Some metrics don't get written to the RRDs.
The debug messages from gmond print:
RRD_update
(/var/lib/ganglia/rrds/XXX/__SummaryInfo__/Apahe_Bytes_p_sec.rrd): not a
simple integer: '4070.7'
The gmetric command that submits the value is:
/usr/bin/gmetric --tflo
It almost sounds like a caching issue. Are you using squid or another
proxy? Have you tried forcing a full page reload, and clearing the
browser cache?
On Wed, Jul 8, 2009 at 07:57, Daniel Kolvik wrote:
> After reviewing the value TMAX and the other parameters I believe the
> problem consist in
Having some problems installing v3.1.2 on Redhat AS 5.1 - 64 Bit.
## cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.1 (Tikanga)
## uname -a
Linux 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15
EDT 2008 x86_64 x86_64 x86_64 GNU/
After reviewing the TMAX value and the other parameters, I believe the
problem lies elsewhere.
The resulting XML contains the value reported via gmetric, and the RRD file is
updated.
But the graphs do not always show up in the web interface.
Accessing graph.php directly renders the graphs correctly.
Any m