FYI.
http://www.aouk83.dsl.pipex.com
has a link to a cygwin based windows agent (not as an installer package
though),
and also a link to a WMI native Ganglia agent coded by APR consulting in
Switzerland.
Enjoy.
Richard Grevis
Production Architecture
Barclays Capital, Canary Wharf, London, E14
Chris,
I fully agree with your clean and simple comment. Part of
Ganglia's real strength is what it doesn't have, rather than
what it does. Examples:
- metric data is not written locally on the monitored host
- The metric set is fixed in compiled code.
- No ability to customise graphs.
- No serve
>
> I've been under the impression for a while ganglia wasn't getting a
> whole lot of development and was mostly in maintenance mode.
> It hasn't
> changed a whole lot in the few years I've been using it
> (except perhaps
> the config file format, a change that was much appreciated)
You ar
All,
Like many Ganglia users, we have modified the PHP a lot, changed some
C code a bit, and added a whole lot of functionality by creating scripts
of various flavours.
I have also entirely failed to push these mods back to the
community,
and one reason for this is that I have no idea how ot
10336 cores - golly!
A grid level screen shot for us:
http://www.aouk83.dsl.pipex.com/
21,676 cores. More golly!
But to be fair, ours is not one big cluster - we have hundreds.
Richard Grevis
Production Architecture
Barclays Capital, Canary Wharf, London, E14 4BB
> -Original Message-
>
Michael,
Use different multicast addresses for each cluster,
unless you are sure the multicast can't leak
from 1 cluster to another.
Remember that when you list hosts after the data_source
for gmetad.conf that is for resilience only. You do not have to
mention all nodes in the cluster there.
Giv
You will have to remove the old rrds to allow your new definition to be
applied.
The RRA is only used at the initial creation of each rrd file.
If you want to keep your old data, you will have to do magic
(dump/export/import/perl-script)
regards,
Richard Grevis
Production Architecture
Barcla
Saundry,
It sort of looks like you can, but actually you can't.
gmetad writes to rrd databases as local files,
and the web and php read rrd databases as local files
(actually the php invokes rrdtool itself).
I imagine you could separate the two using NFS filesystems,
but I have not tried this.
kind rega
See comments below, although it may or may not be really right.
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On
> Behalf Of Matthias Blankenhaus
> Sent: 16 February 2007 04:29
> To: ganglia-general@lists.sourceforge.net
> Subject: [Ganglia-general] GRID /
All,
A Swiss consultancy has implemented a native Windows gmond, and the
binaries
are in the public domain and free. Follow the trail here:
http://aprconsulting.ch/product.htm
I believe they are also offering ganglia consulting, support, and
customisations for a fee.
This daemon is much better bec
. But that was years ago.
Richard Grevis
Production Architecture
Barclays Capital, Canary Wharf, London, E14 4BB
*DDI : +44 (0) 20 7773 4915
* richard.grevis
For more information about Barclays Capital, please visit our web site
Ian,
it is unclear what you are really trying to do.
Do you want a complete normal running ganglia with some separate
java rrd4j thing able to separately extract/graph rrd data populated
by ganglia?
If so, the key data source to parse is connecting to
the gmetad server's port 8651 which dumps
4BB
*DDI : +44 (0) 20 7773 4915
* richard.grevis
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On
apr_socket_send
> Behalf Of Bernard Li
> Sent: 15 January 2007 21:04
> To: ganglia-general@lists.sourceforge.net
> Subject: [Ganglia-general
+44 (0) 20 7773 4915
* richard.grevis
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On
> Behalf Of Vladimir
> Sent: 08 January 2007 02:22
> To: Carlo Marcelo Arenas Belon
> Cc: ganglia-general@lists.sourceforge.net
> Subject: Re: [Ga
7773 4915
* richard.grevis
> -Original Message-
> From: Grevis, Richard: IT (LDN)
> Sent: 08 January 2007 11:52
> To: 'Vladimir Vuksan'; ganglia-general@lists.sourceforge.net
> Subject: RE: [Ganglia-general] Windows port issues
> Importance: Low
>
>
Architecture
Barclays Capital, Canary Wharf, London, E14 4BB
*DDI : +44 (0) 20 7773 4915
* richard.grevis
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On
> Behalf Of Martin Knoblauch
> Sent: 04 January 2007 13:43
> To: Vladimir
>
ent the results back.
Richard Grevis
Production Architecture
Barclays Capital, Canary Wharf, London, E14 4BB
*DDI : +44 (0) 20 7773 4915
* richard.grevis
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On
> Behalf Of Martin Knoblauch
> Sent: 04 Jan
*DDI : +44 (0) 20 7773 4915
* richard.grevis
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Jason Faulkner
Sent: 04 January 2007 04:16
To: [EMAIL PROTECTED]; ganglia-general@lists.sourceforge.net
Subject: Re: [Gangl
mpile under cygwin.
another note - do your hosts contain other running cygwin processes?
If so, you will need to build ganglia against the version of cygwin
already used. The issue here is that only 1 cygwin dll version can run,
so all your cygwin based processes must use the same dll.
Richard Grevis
Can I ask whether you will keep the existing semantics of the
existing metrics unchanged? I would not be comfortable with
my cpu loads (and cpu count) suddenly doubling or halving.
Also remember about the cygwin agent build, which also processes
from cygwin's /proc.
kind regards,
Richard
Sam,
I imagine this has already been well answered for you, but the host
names you
see are the result of a reverse DNS lookup on the headnode, or whatever
node you get the XML from. You will get IP addresses if the reverse
lookup failed,
although the failure is at the headnode level - not from the
Vitaly,
my version does. The only problem was that I hacked the PHP left
right and centre before I understood everything.
So it will take some work to create a patch. Still, someone else
expressed a desire for that functionality, so I will work on it
this week.
Here are some screen shot samples
kind regards,
Richard Grevis
Infrastructure Architecture
Barclays Capital, Canary Wharf, London, E14 4BB
*DDI : +44 (0) 20 7773 4915
* richard.grevis
> -Original Message-
> From: john allspaw [mailto:[EMAIL PROTECTED]
> Sent: 09 November 2006 03:03
> To: Grevis, Richard: IT (LDN)
Yes,
Bernard is right. If you have configuration problems
I usually recommend first trying a unicast configuration.
And the only way you get a node to appear in 2 clusters
is to configure the node agent itself to send data to
two different headnodes.
The above configuration is "clunky" to say th
Dave,
you may need to be more precise about what you want to happen.
If you are adding hosts to an existing cluster, simply
give them the same configuration as the others and all will be fine.
By fine I mean that the new hosts will appear in the cluster view, even
if their data history is not as
John,
I assume you have configured for multicast and the multicast address
you use does not travel outside the local subnets? That is your current
situation?
option 1 is to make a multicast address on the routers that scopes to
all
your subnets.
option 2 is to unicast to 1 or 2 nominated headnod
Dave,
I tried this, and for me last_update gets changed at the same rate as my
poll rate, so I don't see what you see.
What is your poll rate of the cluster in gmetad.conf? Perhaps you could
mail me the RRA lines in gmetad.conf and
the full output of rrdtool info.
I'm sure you already know thi
Ahh yes,
I forgot about Yemi's spoofing code. Hacking that sounds
the easiest way.
regards,
Richard
-Original Message-
From: Dr. Dave Blunt [mailto:[EMAIL PROTECTED]
Sent: 24 August 2006 16:58
To: ganglia-general; harper.mann; Grevis, Richard: IT (LDN)
Subject: RE: [Ganglia-general] Gan
Harper,
I think that the RRD disk I/O from gmetad will be the first limit you
reach.
If you want to load up the gmond process, you could write a program to
send
properly formatted gmond packets but with a spoofed and always changing
source address.
the headnode gmond only determines the host from
Absolutely agreed,
subsecond makes no sense and the ganglia design is not appropriate for
that anyway.
I was originally asked to do 5 seconds, but I have increased that to 10
seconds
as there was no meaningful change in the shape of the graphs anyway.
But 10 second polling is useful to me for a s
If you do want to do fast polling on the Linux or cygwin gmond, I found
some hardwired code in there which effectively limits the polling rate
for
some metrics no matter what you put in the config files. (Sorry martin,
have not raised a bug report yet). Anyway:
> the code below is in the cygwin and
Ian,
it is the gmetad process which writes the rrd files, not gmond. Are you
using "rrdtool fetch" to
get the numbers? If you don't specify an end time, rrdtool will choose
"now", so
it is almost certain you will have some NaNs at the end.
What I do is to do a "rrdtool last" first, then use that
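A rough Python sketch of the "rrdtool last, then fetch" idea. It assumes the timestamp from "rrdtool last" is passed as the end time of the fetch, which is only my reading of the truncated sentence above. The function names, the AVERAGE consolidation function and the window length are illustrative choices, not anything from ganglia itself.

```python
import subprocess

def last_update(rrd_path):
    """Ask rrdtool for the timestamp of the most recent update (needs rrdtool on PATH)."""
    out = subprocess.check_output(["rrdtool", "last", rrd_path], text=True)
    return int(out.strip())

def fetch_command(rrd_path, end_time, window_seconds, cf="AVERAGE"):
    """Build an 'rrdtool fetch' command that ends at end_time instead of 'now'."""
    start_time = end_time - window_seconds
    return ["rrdtool", "fetch", rrd_path, cf,
            "--start", str(start_time), "--end", str(end_time)]

# Typical use (hypothetical file name):
#   end = last_update("load_one.rrd")
#   subprocess.run(fetch_command("load_one.rrd", end, 3600), check=True)
```

Ending the fetch at the last update rather than at "now" should reduce the trailing NaN rows, although the final row can still be NaN depending on the step alignment.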
All,
I am observing 2 problems occasionally occurring which may or may not be
related.
The first is that very rarely the time reported back to gmetad from the
gmond XML will
leap backwards by maybe a month and a half. Checking the time on the
server running
the gmond reveals that the server tim
Ron,
do the following:
- choose one of your w2k servers as a "headnode".
- configure gmetad.conf to have a SINGLE data_source entry pointing to
this headnode.
- configure gmond.conf on all hosts (including headnode) to have:
udp_send_channel {
host = headnode-hostname
Ron, gmonds only send UDP data to other gmonds. From your post it is
unclear
what is listening on 140.203.7.43 port 2344. It should be a gmond.
To test a single host, you should configure a udp_send_channel on the
w2k server
to send data to its proper address or hostname (not 127.0.0.1 which is
on
Ahh yes, aggregating data in different ways after the fact.
We had a need to do that, and also a need to provide more than one
cluster hierarchy (e.g. clusters grouped by region, but also clusters
grouped by technology owner (say)).
I have written some perl code to do this - sucking the data out of
Mark,
the hostnames that you see in the web interface are the result of
reverse DNS lookup of the IP addresses of the hosts in your clusters.
You will find the differences there and this is what you have to change.
Bear in mind that the host doing the reverse DNS lookup is the headnode
for each c
Have you checked whether your reverse DNS entries are correct?
The ganglia agents take the source address of the UDP packets that are
transmitted to them and do a reverse DNS lookup on it to yield the
hostname seen in the XML.
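As an illustration only, here is that behaviour sketched in Python: reverse-resolve the packet's source address and fall back to the bare IP when the lookup fails. This is not gmond's actual code. The function name and the injectable resolver argument are my own, added so the logic can be tried without touching real DNS.

```python
import socket

def name_for_source(ip, resolver=socket.gethostbyaddr):
    """Return the reverse-DNS name for ip, or ip itself if the lookup fails."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
        return hostname
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError
        return ip
```

Whatever name the headnode's resolver returns is what ends up displayed, so a wrong or missing PTR record shows up as a wrong hostname or a raw IP address.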
- richard
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
John,
this may not particularly help you, but on your ganglia server
I would try netcatting localhost and checking out TN numbers for a start.
e.g.
nc localhost 8651 | grep 'HOST NAME'
and check out TN values, or maybe just wc the above to see if the data
is always coming in properly. Do th
RRDtool - look to the site?
rrdtool windows binary distributions - see page
http://oss.oetiker.ch/rrdtool/download.en.html scroll down to "binary
distributions",
http://oss.oetiker.ch/rrdtool/pub/?M=D and
http://www.cacti.net/downloads/rrdtool/win32
Find documentation as required on his site.
I a
Joshua,
to the best of my knowledge, gmetad has never been compiled for windows.
So you will not be
able to run the server code under windows.
Gmond and gmetric have been compiled by myself and others in a cygwin
environment. There is no windows build documentation.
The windows gmond does not
Steve,
it may seem strange, but that is the way gmond behaves.
If in all your gmond instances you specify a single unicast
headnode, the only place you will get the XML data payload
is the headnode. The other nodes dump the DTD and nothing else.
If you want to see the data on each of your workers
I was hoping that someone would do it properly!
If I get time today I will get the patch working against 3.0.3
- richard
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Rick
Mohr
Sent: 04 May 2006 15:08
To: ganglia-general@lists.sourceforge.net
Subject: R
Ben,
As you probably already know, the code is in header.php -
if( $context == "cluster" )
{
    if (!count($metrics)) {
        echo "Cannot find any metrics for selected cluster \"$clustername\", exiting.\n";
        echo "Check ganglia XML tree (telnet $ganglia_ip $ganglia_port)\n";
        exit;
Did you specify different ports for each copy of gmond?
This means you will need 2 copies of gmetad.conf (port numbers
are specified there), and depending on how you want it to
work you will need to have 2 php web directories and set
$ganglia_port = ???; inside conf.php
also remember that you ca
Chris,
with unicast, the cluster derives its name from the head-node
configuration only.
By head-node, I mean the nodes that appear in the gmetad configuration
as you have
detailed below.
So in your case, if you have two separate head-nodes for each cluster,
there is
no need to use different port
All,
I have just seen that the date as reported in the XML stream
for one of my clusters went backwards by about 20 days recently.
This of course caused the RRD update to fail because it was attempting
to update with an earlier timestamp.
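To make the failure mode concrete: rrdtool refuses any update whose timestamp is not newer than the last one it stored. The Python sketch below shows the kind of guard a collector could apply before calling rrdtool. It is a hypothetical helper, not something gmetad does.

```python
def filter_samples(last_update, samples):
    """Keep only (timestamp, value) samples that move strictly forward in time.

    last_update is the timestamp of the newest point already stored.
    """
    accepted = []
    for timestamp, value in samples:
        if timestamp > last_update:
            accepted.append((timestamp, value))
            last_update = timestamp
        # else: the clock went backwards (or stood still), so the sample is dropped
    return accepted
```

With a guard like this, a host whose clock jumps back 20 days simply produces a gap until its time overtakes the last stored timestamp again.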
A netcat of the cluster port 8649 revealed that the clust
I agree with Steve, Chris,
I would suggest (at least until you are more confident) that
you change the gmond.conf config to be unicast to your headnode, i.e:
udp_send_channel {
#mcast_join = 239.255.160.2
host = your-headnode-hostname
port = 8649
}
And also confirm all seems OK by net
Bernard,
sure - no problem. Give me a day as I need to tease apart other customisations
I have done
that people would not be interested in.
- richard
-Original Message-
From: Bernard Li [mailto:[EMAIL PROTECTED]
Sent: 18 April 2006 21:57
To: Grevis, Ri
Chris,
3.0 gmond?
This version of the agent will have the truncated XML problem,
although I have only seen "no element found" errors on parse as opposed
to what chris is seeing - which sounds like perhaps a partially
constructed XML tree in gmetad memory which then blows up the
subsequent?
Chris
Chris,
possibility 1 - look for "Possible bug in hosts up calculation when
federating clusters"
in the mail archive. But if you are using the 3.0.3 release this should
be fixed.
The reason that the XML stream version affects hosts up is because the
test of liveness
changed between before 2.5, and
Martin,
what you are looking at is a customisation that UC Berkeley have done e.g.:
http://monitor.millennium.berkeley.edu/graph.php?g=load_report&z=huge&c=PSI%20Cluster&m=&r=week&s=descending&hc=4&st=1145366202
The graph size is set to huge, which is not standard in ganglia.
If you want to do
Eli and others,
just realized that the pattern for /etc/gmetad data_source spec is not
quite good enough.
($headnode, $port) = ($_ =~
m/data_source\s+"[^"]+"\s+\d+\s+([^:]+)(:\d+){0,1}/)
only works when a single headnode is mentioned. This is better, but
still only matches the first headnode:
(($h
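For comparison, here is a Python sketch (not the perl above) that collects every headnode on a data_source line rather than only the first. It assumes the usual gmetad.conf layout of a quoted cluster name, an optional polling interval, then one or more host[:port] entries, with 8649 as the port when none is given. The function name is made up, and IPv6 addresses are not handled.

```python
import shlex

def parse_data_source(line):
    """Parse a data_source line into (cluster, interval, [(host, port), ...]).

    Returns None for lines that are not data_source entries.
    """
    tokens = shlex.split(line, comments=True)  # honours quotes and '#' comments
    if len(tokens) < 3 or tokens[0] != "data_source":
        return None
    cluster, rest = tokens[1], tokens[2:]
    interval = None
    if rest[0].isdigit():  # optional polling interval in seconds
        interval = int(rest[0])
        rest = rest[1:]
    headnodes = []
    for item in rest:
        host, _, port = item.partition(":")
        headnodes.append((host, int(port) if port else 8649))
    return cluster, interval, headnodes
```

Splitting into tokens first avoids the "first match only" trap that a single regex with a repeated group falls into.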
Fresh off the presses - others may find it useful too. This iterates
through your clusters
and finds dead hosts or duplicated host entries. Note that you can't
find duplicated
host entries by netcatting gmetad port 8651. You must do it as below:
You will need to compile or otherwise have netcat (nc
Eli,
Martin is most surely right. If you are running an unpatched 3.0.2,
let me share with you the many ways it can all go wrong.
gmond generates the hostnames found in the XML stream by reverse DNS
lookup only. Its internal structures treat every different IP address
it sees as a different host,
There are a few simple and obvious steps.
BTW, it is good that TN is greater than TMAX in some sense, because this
means
that gmetad and all the php stuff is not saying anything that is wrong
with respect to
the XML stream.
So have you done a simple tcpdump of UDP port 8649 on the headnode? Do
the UDP pac
This is the classic behaviour that comes from a "truncated" XML stream.
There is now a full patch for this in the CVS repository. But
if you suffer from this, you should get a /var/log/messages entry like:
Mar 30 09:56:37 ldndsr0163 /apps/ganglia/sbin/gmetad[15336]: Process XML
(LDN FIP QA Scenar
Steven,
if the problem is routing or actual packet loss, then that should be
reflected by the XML output of the master gmond - the "down" host will
have a TN (much) greater than the TMAX. e.g.:
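The check can be sketched in Python as below. The XML is a made-up, cut-down sample and not the example from the original mail. It only assumes that each HOST element carries NAME, TN and TMAX attributes, as the gmond XML does. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

def stale_hosts(xml_text):
    """Return (name, tn, tmax) for every HOST whose TN exceeds its TMAX."""
    root = ET.fromstring(xml_text)
    stale = []
    for host in root.iter("HOST"):
        tn = int(host.get("TN", "0"))
        tmax = int(host.get("TMAX", "0"))
        if tn > tmax:
            stale.append((host.get("NAME"), tn, tmax))
    return stale

# Invented sample: node2 has not been heard from for 950 seconds.
SAMPLE = """<GANGLIA_XML VERSION="3.0.2" SOURCE="gmond">
<CLUSTER NAME="demo">
<HOST NAME="node1" IP="10.0.0.1" TN="7" TMAX="20"/>
<HOST NAME="node2" IP="10.0.0.2" TN="950" TMAX="20"/>
</CLUSTER>
</GANGLIA_XML>"""

print(stale_hosts(SAMPLE))  # [('node2', 950, 20)]
```

In practice the text would come from netcatting the master gmond rather than from a string literal.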
There is also a very small chance that what you are seeing is related to
the "Possible bug in hosts
All,
I just know that no-one else is doing this, but
I updated the windows gmond with a current cygwin install
and fixed the processor count metric. That is all I did.
Simple recompile, slightly newer cygwin1.dll
However when I used this agent, when gmetad did the tcp poll,
instead of the 10
Exactly.
~Richard
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jason
A. Smith
Sent: 09 February 2006 18:04
To: [EMAIL PROTECTED]
Cc: Ganglia General
Subject: Re: [Ganglia-general] config file confusion
Just a guess: you probably have all 500 nodes
The php, being what it is, kind of encourages everyone
to do their own thing. The problem is which changes
are appropriate for the whole community, and which are appropriate
to only a few.
The second problem is an engineering one. Hacks are easy - it is usually
what you do first. Generating a prop
Has anyone extended the Linux gmond to include disk I/O
or disk latency stats?
kind regards,
richard grevis
For more information about Barclays Capital, please
visit our web site at http://www.barcap.com.
Internet communi
Multicast.
If you unicast to 2 separate nodes for resiliency, then
you are actually sending double the UDP traffic of a multicast solution.
So unicast does not save on UDP network traffic - the opposite actually
in the resiliency
case. What unicast does do of course is reduce the load and memory
My experience so far:
RRD files on ramdisk is a good idea. RRD is very basic with its I/O, it
writes as soon
as it gets a data point (and reads as well). In my case, a simple blade
engineering server
with simple local disk was really being hammered with 100 nodes, except
at a 5 second poll,
with 5
Call me old fashioned, but:
who | wc -l | awk '{print $1}'
strikes me as safer
regards,
richard
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Martin
Knoblauch
Sent: 25 January 2006 09:25
To: Ben Hartshorne; ganglia-general@lists.sourceforge.net
Subjec
There is another way this failure can occur, although it is unlikely
(it happened to me though).
gmond appears to do a reverse IP lookup of the udp packets'
source address to generate the hostname in the XML. We had an error
in the reverse DNS, and 2 separate hosts in the cluster ended up having
t
And of course, being a reverse DNS lookup, it will depend on
nsswitch.conf, which determines whether the data comes from
/etc/hosts, NIS, or a DNS server.
And in our local environment, we have AD as well as bind DNS,
and even the case (upper/lower) of the fully qualified domain
name can depend on
Martin, Matt, and others
with help from a colleague, we have a 3.0.2 release binary for gmond,
and also a patch that corrects the "number of processors" misreporting.
It is also married to the most recent version of cygwin1.dll, and it
seems that windows gmond multicast works, although maybe I was
Or you could hack the load value itself by dividing by 5 in
cluster_view.php.
regards,
richard
p.s.
this is a bit yuk, but is certainly easy.
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Alexei
Rodriguez
Sent: 04 January 2006
This may or may not be the answer to you.
gmond itself does a reverse lookup of the IP of packets
sent to it from other gmonds. The resultant name is purely up to
a reverse DNS lookup. If this lookup works you get a hostname.
If it fails you get an IP. It is the reverse DNS lookup domain
as executed f
Does anyone have compiled solaris 8 binaries they can send me?
My efforts to compile ganglia on solaris 8 with gcc did not
get very far. Not too sure why, but if someone can send me the binaries,
that would make me a happy puppy.
kind regards,
Richard
ps, if interested:
Making all in examples
/bin
Exactly.
I should have been clearer. The default windows/cygwin client is neither
correct enough
(cygwin's fault) nor provides all the metrics we want (in fact, because
some of our farms are not just HPC
farms, we want some other metrics as well). I remain grateful to whoever
developed it, none-th
All,
against all probability, but for reasonable historical reasons, we run
windows based HPC applications.
We also have large networks of similar function windows farms (e.g. web
farms). We want to improve
the visibility of the state of our estate, we like ganglia (rollups and
all that),
and wan